IPR2018-01594
EXHIBIT 2052
PATENT OWNER DIRECTSTREAM, LLC
EX. 2069
High-Performance Computer Architecture

Harold S. Stone
IBM Watson Research Center
and
Courant Institute
New York University

Addison-Wesley Publishing Company
Reading, Massachusetts
Menlo Park, California • Don Mills, Ontario
Wokingham, England • Amsterdam
Sydney • Singapore • Tokyo • Madrid
Bogotá • Santiago • San Juan
This book is in the Addison-Wesley Series in Electrical and Computer Engineering

Sponsoring Editor • Tom Robbins
Production Supervisor • Bette J. Aaronson
Copy Editor • Sarah Meyer
Text Designer • Herb Caswell
Illustrator • Hardlines
Technical Art Consultant • Joseph Vetere
Manufacturing Supervisor • Hugh Crawford
Cover Designer • Jean Depoian

Library of Congress Cataloging-in-Publication Data

Stone, Harold S., 1938-
High-performance computer architecture.

Bibliography: p.
Includes index.
1. Computer architecture. I. Title.
QA76.9.A73S76 1987    004.22    87-1073
ISBN 0-201-16802-2

Copyright © 1987 by Addison-Wesley Publishing Company

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

ABCDEFGHIJ-AL-8987
To Jan-colleague and companion

A precis from the pages of history

Chapter 1
Architecture is preeminently the art of significant forms in space-that is, forms significant of their functions.
- Claude Bragdon, 1931

Chapter 2
I know of no way of judging the future but by the past.
- Patrick Henry, 1775

Chapter 3
Comparisons do ofttime great grievance.
- John Lydgate, c. 1440

Chapter 4
The fickle multitude, which veers with every wind!
- J. C. F. Schiller, 1800

Chapter 5
The tucked-up sempstress walks with hasty strides,
While streams run down her oil'd umbrella's sides.
- Jonathan Swift, 1711

Chapter 6
Sat cito si sat bene. [It is done quickly enough if it is done well.]
- Latin proverb

Chapter 7
Who depends upon another man's table often dines late.
- John Ray, 1678
Preface

Teaching computer architecture is an interesting challenge for the instructor because the field is in constant flux. What the architect does depends strongly on the devices available, and the devices have been changing every two to three years, with major breakthroughs once or twice a decade. Within the brief life of this textbook, there may be a complete turnover in the devices used in computers.

What then should be taught to prepare students for what lies ahead? What information will remain important over the technical career of a student, and what information will soon become obsolete, of historical interest only? This text stresses design ideas embodied in many machines and the techniques for evaluating the ideas. The ideas and the evaluation techniques are the principles that will survive. The specific implementations of machines that one might choose in 1987, 1990, or 1993 reflect the basic principles described here as applied to the device technology currently prevailing. Effective designs are those that use technology cleverly and achieve balanced, efficient structures matched well to the class of problems they attack. This text stresses the means to achieve balance and efficiency regardless of the underlying technology.

We use a multifaceted approach to teaching the reader how to prepare for the future. The major features are the following:

1. Each topic is a general architectural approach-memory designs, pipeline techniques, and a variety of parallel structures.
2. Within each topic the focus is on fundamental bottlenecks-memory bandwidth, processing bandwidth, communications, and synchronization-and how to overcome these bottlenecks for each specific topic area.
3. The material addresses evaluation techniques to help the reader isolate aspects that are highly efficient from those that are not.
4. A few machines whose structure is of historical interest are described to illustrate how the concepts can be implemented.
5. Where appropriate, the text draws on examples of real applications and their architectural requirements.
6. Exercises at the end of chapters give the reader an opportunity to sketch out designs and perform evaluation under a variety of technology-oriented constraints.

The exercises are particularly important because the reader learns to master the material by integrating a number of different ideas, often by working through a paper design that must meet some unusual set of constraints. In several exercises, the student is asked to produce a series of designs, each reflecting a different set of underlying devices. This helps the student gain experience in adapting basic techniques to new situations.

The text is intended for advanced undergraduates and first-year graduate students. It assumes the student has had a course in machine organization so that the basic operation of a processor is well understood. Some experience with assembly language is helpful, but not essential. Programming in a high-level language such as Pascal, however, is necessary to understand the applications used as examples. Mathematical background in linear systems or numerical methods is helpful for Chapters 4 and 5, and some exposure to operating systems will assist understanding Chapter 7. In neither case is the material absolutely required because the text contains sufficient background discussion to support the presentation.

The text purposely avoids highly detailed descriptions of popular machines because in time the machines described will inevitably be obsolete. In future years, a reader of such material may be led to think that the specific details of the successful machine represent good design decisions for the future as well as for the time frame in which the design was actually done. A better approach is for the individual instructor to discuss one or two current machines while using the text, with the notion that the current machines can change each year at the discretion of the instructor. It is also possible to use the text without such supplementary material because the design exercises provide challenges that represent technology through the end of the 80s and into the 90s.

We jokingly tell students that the subject matter enjoys a positive benefit from the rapid changes in technology. The instructor need not create new exercises and examinations for each new class. The questions may be the same each year, but the answers will be different.

As an aid for the student and instructor, there is a floppy disk available with stripped traces of program execution. The student should find this useful for cache evaluation studies. The disk is usable directly on IBM personal computers and compatible equipment and can be accessed by programs written in a variety of programming languages. Prior to the publication of this text, thorough studies of cache behavior required main-frame computers for analysis because of the massive amounts of data to process. Techniques described in Chapter 2 show how to reduce the processing by as much as two orders of magnitude. These techniques make possible the use of a personal computer as the primary analysis tool. Instructors may find that exercises in cache analysis are particularly illuminating. Instructors who would like to obtain the disk should contact the publisher or the author.
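To make the idea concrete, a cache evaluation of the sort such traces support can be sketched in a few lines. The fragment below is only an illustration, not the software or trace format distributed with the disk; it assumes a hypothetical trace file containing one hexadecimal byte address per line and simulates a single direct-mapped cache, reporting the miss ratio.

    # Minimal trace-driven simulation of a direct-mapped cache.
    # Hypothetical trace format: one hexadecimal byte address per line.
    import sys

    def miss_ratio(trace_path, cache_bytes=64 * 1024, line_bytes=32):
        num_lines = cache_bytes // line_bytes
        tags = [None] * num_lines          # block number currently held by each cache line
        refs = misses = 0
        with open(trace_path) as trace:
            for text in trace:
                text = text.strip()
                if not text:
                    continue
                address = int(text, 16)
                block = address // line_bytes   # block (line) address
                index = block % num_lines       # direct-mapped placement
                refs += 1
                if tags[index] != block:        # miss: fetch the block
                    misses += 1
                    tags[index] = block
        return misses / refs if refs else 0.0

    if __name__ == "__main__":
        print("miss ratio:", miss_ratio(sys.argv[1]))

Re-running the same trace while varying cache_bytes and line_bytes is the kind of parameter study that, as noted above, a personal computer can now handle directly.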
The material in the text is structured in a modular fashion, with each chapter reasonably independent of every other chapter. The instructor can put together a course by selecting individual chapters and individual sections according to the background of the students, the prerequisites available, and the successor courses in the curriculum.

Chapters 2 and 3 form the core material. Cache memories and pipeline structures are widely used today, and they are likely to be effective in the technologies that will emerge in the next several years. These chapters should be taught in all course offerings.

For courses in which students have a strong mathematical preparation, Chapters 4 and 5 are particularly well suited because they treat techniques for high-speed numerical computations. Although the information is of interest for general-purpose computers, it is biased to supercomputers that are used on large-scale numerical problems.

Chapters 6 and 7 treat multiprocessors, which are more general purpose than the machines of Chapters 4 and 5. These chapters are recommended for curricula that stress systems programming and computer engineering.

In one semester, it is reasonable to complete selected sections of all chapters, or to cover Chapters 2 and 3 and two other chapters in depth. Chapter 1, which has no exercises, is to be used as background reading to set the tone of the exposition. The text can easily satisfy the needs of a two-quarter sequence if the instructor chooses to use the full material.

No matter which portion of the text is covered, working the exercises is critical for a thorough appreciation of the material. The design-oriented exercises can be rather frustrating at first because there is no clear indication of a correct answer. The reader wants to be able to jot down a simple answer to a question after a small amount of thought. What a pleasure to crank through a calculation and find the answer is 17.5. The design exercises are nothing like this. No specific quality distinguishes a right answer from a wrong answer. The answer is a design, and if it meets the design constraints it must be correct.

The point of working such exercises is not the final design, but rather the process of arriving at the final design. What alternatives were considered? How does the final design overcome basic problems? Did the student consider a reasonable set of alternatives, or was there a valid approach missed that should have been considered? Is the evaluation of the design reasonable? For what assumptions concerning technology factors and workload characteristics is the given design an efficient one?

After working through several such problems the reader becomes familiar with the thought processes of the designer and gains both experience and insight in architectural design. Many exercises seem to capture real situations, and this is as intended. As in real situations the reader may discover that there is no good solution, and a compromise has to be invented. Or there may be several reasonable solutions, and the reader has to pick one, possibly on the basis of characteristics that are secondary in importance because all the solutions available have satisfactory primary characteristics. Many exercises have actually been drawn from design problems faced by the author, with constraints updated for the present and future.

The preparation of this text represents the fruits of labor of many parties. The author's students, Tom Puzak, Zarka Cvetanovic, and Dominique Thiebaut, contributed a substantial number of the ideas presented. They also offered helpful comments and criticism as the project progressed. Other reviewers whose comments are reflected in these pages are William F. Applebe, Georgia Institute of Technology; Richard A. Erdrich, UNISYS Corporation; John L. Hennessy, Stanford University; K. C. Murphy, Advanced Micro Devices; Paul Pederson, New York University; Richard L. Sites, Digital Equipment Corporation; and Phil Emma, Jeff Lee, Peter Hsu, K. S. Natarajan, Howard Sachar, and Marc Surette, all with IBM. Collectively and individually, their work has aided greatly in the process of developing the material to make it easily accessible to the intended audience. The publication crew at Addison-Wesley did a remarkable job in putting the project together. Bette Aaronson and Sarah Meyer demonstrated that they know pipelining in practice better than the author does in theory, smoothly flowing the chapters through the tedious process of markup, text edit, and page composition to demonstrate their proficiency in high-performance publishing. To Tom Robbins we offer gratitude for support and encouragement in this project from its inception to its completion.

Chappaqua, New York
H.S.S.
Contents

1 Introduction  1
1.1 Technology and Architecture  1
1.2 But Is It Art?  3
1.2.1 The Cost Factor  4
1.2.2 Hardware Considerations  7
1.3 High-Performance Techniques  9
1.3.1 Measuring Costs  10
1.3.2 The Role of Applications  12
1.3.3 The Impact of VLSI  13
1.3.4 The Effect of Technological Change on Cost  14
1.3.5 Algorithms and Architecture  17
1.4 Historical References  19

2 Memory-System Design  21
2.1 Exploiting Program Characteristics  23
2.2 Cache Memory  29
2.2.1 Basic Cache Structure  29
2.2.2 Cache Design  32
2.2.3 Cache Analysis  39
2.2.4 Replacement Policies  52
2.2.5 Footprints in the Cache  58
2.2.6 Writing to the Cache  66
2.3 Virtual Memory  69
2.3.1 Virtual-Memory Structure  70
2.3.2 Virtual-Memory Mapping  74
2.3.3 Improving Program Locality  81
2.3.4 Replacement Algorithms  84
2.3.5 Buffering Effects in Virtual-Memory Systems  90
Exercises  94

3 Pipeline Design Techniques  102
3.1 Principles of Pipeline Design  103
3.2 Memory Structures in Pipeline Computers  115
3.3 Performance of Pipelined Computers  117
3.4 Control of Pipeline Stages  127
3.4.1 Design of a Multi-function Pipeline  127
3.4.2 The Collision Vector and Pipeline Control  132
3.4.3 Maximum Performance Pipelines  138
3.4.4 Using Delays to Increase Performance  140
3.4.5 Interlock Elimination  148
3.5 Exploiting Pipeline Techniques  150
3.5.1 Conditional Branches  150
3.5.2 Internal Forwarding and Deferred Instructions  155
3.5.3 Machines with Both Cache and Virtual Memory  165
3.5.4 RISC Architectures  168
3.6 Historical References  171
Exercises  172

4 Characteristics of Numerical Applications  177
4.1 Classification of Large-Scale Numerical Problems  178
4.1.1 Continuum Models  180
4.1.2 Particle Models  182
4.2 Design Constraints for High-Performance Machines  184
4.3 Architectures for the Continuum Model  186
4.4 Algorithms for the Continuum Model  194
4.4.1 The Cosmic Cube  194
4.4.2 Data-Flow Requirements  195
4.4.3 Parallel Solutions  200
4.4.4 Recursive Doubling and Cyclic Reduction  206
4.5 The Perfect Shuffle  210
4.5.1 The Perfect-Shuffle Interconnection Pattern  210
4.5.2 Applications of the Perfect Shuffle  217
4.6 Architectures for the Continuum Model-Which Direction?  227
Exercises  229

5 Vector Computers  233
5.1 A Generic Vector Processor  234
5.1.1 Multiple Memory Modules  236
5.1.2 Intermediate Memories  244
5.2 Access Patterns for Numerical Algorithms  248
5.2.1 Gaussian Elimination  249
5.3 Data-Structuring Techniques for Vector Machines  253
5.4 Attached Vector-Processors  261
5.5 Sparse-Matrix Techniques  266
5.6 The GF-11, A Very High-Speed Vector Processor  268
5.7 Final Comments on Vector Computers  271
Exercises  274

6 Multiprocessors  278
6.1 Background  279
6.2 Multiprocessor Performance  283
6.2.1 The Basic Model-Two Processors with Unoverlapped Communications  285
6.2.2 Extension to N Processors  286
6.2.3 A Stochastic Model  290
6.2.4 A Model with Linear Communication Costs  291
6.2.5 An Optimistic Model-Fully Overlapped Communication  293
6.2.6 A Model with Multiple Communication Links  295
6.2.7 Multiprocessor Models  297
6.3 Multiprocessor Interconnections  299
6.3.1 Bus Interconnections  299
6.3.2 Ring Interconnections  304
6.3.3 Crossbar Interconnections  305
6.3.4 The Shuffle-Exchange Interconnection and the Combining Switch  310
6.3.5 The Butterfly Operation and the Reverse-Binary Transformation  312
6.3.6 The Combining Network and Fetch-and-Add  318
6.4 Cache Coherence in Multiprocessors  324
6.5 Summary  329
Exercises  330

7 Multiprocessor Algorithms  332
7.1 Easy Parallelism  333
7.1.1 The do par and do seq Constructions  335
7.1.2 Barrier Synchronization  336
7.1.3 Performance Considerations  338
7.1.4 Increasing Granularity  341
7.1.5 Initiating Tasks  345
7.2 Synchronization Techniques  347
7.2.1 Synchronization with Test-and-Set  348
7.2.2 Synchronization with Increment and Decrement  352
7.2.3 Synchronization with Compare-and-Swap  355
7.2.4 Synchronization with Fetch-and-Add  362
7.3 Parallel Search-How to Use and Not Use Parallelism  365
7.3.1 Searching for the Maximum of a Unimodal Function  366
7.3.2 Parallel Branch-and-Bound-The Traveling-Salesman Problem  369
7.4 Transforming Serial Algorithms into Parallel Algorithms  374
7.4.1 Dependency Analysis  375
7.4.2 Exploiting Parallelism Across Iterations  377
7.4.3 The Effects of Scheduling on Parallelism  382
7.5 Final Comments on Multiprocessors  383
Exercises  385

References  389

Index and Glossary  397

1 Introduction
ridiculous proposal for today may be ideal for tomorrow. There are no absolute rules that say that one architecture is better than another.

The key to learning about computer architecture is learning how to evaluate architecture in the context of the technology available. It is as important to know if a computer system makes effective use of processor cycles, memory capacity, and input/output bandwidth as it is to know its raw computational speed. The objective is to look at both cost and performance, not performance alone, in evaluating architectures. Because of changes in technology, relative costs among modules as well as absolute costs change dramatically every few years, so the best proportion of different types of modules in a cost-effective design changes with technology.

This text takes the approach that it is methodology, not conclusions, that needs to be taught. We present a menu of possibilities, some reasonable today and some not. We show how to construct high-performance systems by making selections from the menus, and we evaluate the systems produced in terms of technology that exists in the mid-1980s. The conclusions reached by these evaluations are probably reasonable through the end of the decade, but in no way do we claim that the architectures that look strongest today will be the best in the next decade.

The methodology, however, is timeless. From time to time the computer architect needs to construct a new menu of design choices. With that menu and the design and evaluation techniques described in this text, the architect should be able to produce high-quality systems in any decade for the technology at that time.

Performance analysis should be based on the architecture of the total system. Design and analysis of high-performance systems is very complex, however, and is best approached by breaking the large system into a hierarchy of functional blocks, each with an architecture that can be analyzed in isolation. If any single function is very complicated, it too can be further refined into a collection of more primitive functions. Processor architecture, for example, involves putting together registers, arithmetic units, and control logic to create processors-the computational elements of a computer system.

An important facet of processor architecture is the design of the instruction set for the processor, and we shall learn in the course of this text that there are controversies raging today concerning whether instruction sets should be very simple or very complex. We do not settle this controversy here; there cannot be a single answer. But we do illuminate the factors that determine the answer, and in any technology an architect can measure those factors in the course of a new design.

Computer architecture is sometimes confused with the design of computer hardware. Because computer architecture deals with modules at a functional level, not exclusively at a hardware level, computer architecture must encompass more than hardware. We can specify, for example, that a processor performs arithmetic and logic functions, and we can be reasonably sure that these functions will be built into the hardware and not require additional programming. If we specify memory management functions in the processor, the actual implementation of those functions may be some mix of hardware and software, with the exact mix depending on performance, availability of existing hardware or software components, and costs.

When very-large-scale integration (VLSI) was in its infancy, memory-management functions were implemented in software, and the processor architecture had to support such software by providing only a collection of registers for address mapping and protection. With VLSI it becomes possible to embed a greater portion of memory management in hardware. Many systems employ sophisticated algorithms in hardware for performing memory-management functions once exclusively implemented in software.

The line between hardware and software becomes somewhat fuzzy when last year's software is embedded directly in read-only memory on a memory-management chip where it is invisibly invoked by the programs being managed. Once such a chip is packaged and is then a "black box" that does memory management, the solution becomes a hardware solution. The architect who uses the chip need not provide additional software for memory management. If a chip does most, but not all, memory-management functions internally, then the architect must look into providing the missing features by incorporating software modules.

In retrospect, computer architecture makes systems from components, and the components can be hardware, software, or a mixture of both. The skill involved in architecture is to select a good collection of components and put them together so they work effectively as a total system. Later chapters show various examples of architectures, some proven successful and some proposals that might succeed.
1.2 But Is It Art?

An article in the New York Times in January 1985 described a discovery of an unsigned painting by de Kooning that raised a few eyebrows among art critics. Although it does not bear his signature, there was no doubt that it was his work, and it was hung in a gallery for public viewing. The piece is a bench from the outhouse of his summer beach house that de Kooning painted abstractly to give the appearance of marble. Is this piece a great work of art by a renowned master, or is it just a painted privy seat? The point is that art appreciation is based on aesthetics, for which we have no absolute measures. We have no absolute test to conclude whether the work is a masterpiece or a piece of junk. If the art world agrees that it is a masterpiece, then it is a masterpiece.

Computer architecture, too, has an aesthetic side, but it is quite different from the arts. We can evaluate the quality of an architecture in terms of maximum number of results per cycle, program and data capacity, and cost, as well as other measures that tend to be important in various contexts. We need never debate a question such as, "but is it fast?"

Architectures can be compared on critical measures when choices must be made. The challenge comes because technology gives us new choices each year, and the decisions from last year may not hold this year. Not only must the architect understand the best decision for today, but the architect must factor in the effects of expected changes in technology over the life of a design. Therefore, not only do evaluation techniques play a crucial role in individual decisions, but by using these techniques over a period of years, the architect gains experience in understanding the impact of technological developments on new architectures and is able to judge trends for several years in the future.

Here are the principal criteria for judging an architecture:

• Performance;
• Cost; and
• Maximum program and data size.

There are a dozen or more other criteria, such as weight, power consumption, volume, and ease of programming, that may have relatively high significance in particular cases, but the three listed here are important in all applications and critical in most of them.
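To make the idea of judging on more than raw speed concrete, here is a small sketch with invented figures (not taken from the text) that tabulates two imaginary designs against these criteria and adds a simple performance-per-dollar figure of merit.

    # Comparing two hypothetical designs on the three principal criteria.
    # All numbers are invented for illustration.
    designs = {
        "Design A": {"results_per_cycle": 4.0, "cost_dollars": 250_000, "max_memory_mb": 64},
        "Design B": {"results_per_cycle": 2.5, "cost_dollars": 90_000, "max_memory_mb": 64},
    }

    for name, d in designs.items():
        # A simple figure of merit: performance delivered per dollar spent.
        perf_per_dollar = d["results_per_cycle"] / d["cost_dollars"]
        print(f'{name}: {d["results_per_cycle"]} results/cycle, '
              f'${d["cost_dollars"]:,}, {d["max_memory_mb"]} MB max memory, '
              f'{perf_per_dollar:.2e} results/cycle per dollar')

With these numbers Design A wins on results per cycle and Design B wins on performance per dollar, which is the kind of trade-off the list above is meant to expose.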
1.2.1 The Cost Factor

The cost criterion deserves a bit more explanation because so many people are confused about what it means. The cost of a computer system to a user is the money that the user pays for the system, namely its price. To the designer, cost is not so clearly defined. In most cases, cost is the cost of manufacturing, including a fair amortization of the cost of development and capital tools for construction. All too often we see comparisons of architectures that compare the parts cost of System A with the purchase price of System B, where System A is a novel architecture that is being proposed as an innovation, and System B represents a model in commercial production.

Another fallacious comparison is often made when relating hardware to software. In the early years of computing, software was often bundled free of charge with hardware, but, as the industry matured, software itself became a commodity of value to be sold.

We now discover that what was once a free good now commands a significant portion of a computing budget. The trends that people quote are depicted in Fig. 1.1, where we see the cost of software steadily rising with
Our analysis also shows why in years to come hardware costs will still prove to be significant compared to software costs. At issue here is the cost of manufacturing. Software manufacturing costs are near zero today and can only go lower, so that software pricing in a competitive market mainly reflects the amortization of development costs.

Hardware manufacturing costs, while small on a per-chip basis, are many times more than software manufacturing costs. It is far less costly today to replicate accurate copies of software than it is to replicate hardware. Hardware requires assembly and testing to make sure that each copy is a faithful copy of the original design. This is far more complex today than the quality assurance on a software manufacturing line that simply has to compare each bit of information in software to see if it agrees with the original program.

We see that hardware pricing carries the burden of per-unit manufacturing costs together with development costs, whereas software pricing reflects development costs to a much greater extent. When computers fit on a single chip, their prices should bear some similarity with software prices. Indeed, we see hand calculators sold for roughly the same price as the most popular simple software tools. But computers that contain hundreds or thousands of individual components are far more complex to reproduce than any software package. At the very least, the hardware manufacturer has to test the chips and systems to reject the failures, and the corresponding process in software manufacturing is negligible because copying software is low cost, reliable, and inexpensively verified. In a competitive market, it is very unlikely that computers of moderate or high performance will be given away to purchasers of the accompanying software.
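One way to see the asymmetry described here is a toy pricing model, not from the text and with all figures invented: approximate each product's competitive price as its per-unit manufacturing cost plus its development cost spread over the units sold.

    # Per-unit price as manufacturing cost plus amortized development cost.
    # All figures are invented for illustration.
    def unit_price(development_cost, manufacturing_cost_per_unit, units_sold):
        return manufacturing_cost_per_unit + development_cost / units_sold

    UNITS = 50_000
    software = unit_price(5_000_000, 2, UNITS)      # copying media is nearly free
    hardware = unit_price(5_000_000, 400, UNITS)    # assembly and testing paid on every unit
    print(f"software: ${software:,.2f} per unit   hardware: ${hardware:,.2f} per unit")
    # With these invented figures the software price is almost entirely the $100 of
    # amortized development; the hardware price adds a large per-unit manufacturing cost.

The point of the sketch is only the shape of the two prices: software's is dominated by development amortization, while hardware's carries that same burden plus a substantial per-unit term.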
1.2.2 Hardware Considerations

Another fallacious argument about new designs for the future concerns the lavish use of hardware components in a system. The architects state convincingly that with current trends in force, the cost of hardware will be negligible, so that we can afford to build systems of much greater hardware complexity in the future than we can today. Clearly, there is truth in this argument to the extent that future systems will surely be more powerful and complex at equal cost to today's systems. But the argument must be used with care because it does not excuse gross waste of hardware.

In the future, given System A, with 100 times the logic of present systems, and System B, whose performance is essentially identical to A's but has only 10 or 20 times the logic of present systems, System A will be at a serious competitive disadvantage. For a few hundred or a few thousand copies of System A sold, System A may be priced competitively with System B. For higher volumes of production, however, the inefficiency of the architecture of
System A will force its price higher than System B's for equal system value. Of course, this presumes that both System A and System B are built from components of the same generation of technology. If A's chips are ten times as dense as B's chips and therefore 10 times less costly per device, then the argument changes, and device technology, not architecture, is the determining factor in the price of the system.
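The volume effect can be sketched numerically with a toy model (again invented, not from the text): both systems amortize an identical development cost, but System A pays roughly five times as much per unit for logic as System B, so the two prices are within about ten percent at a few hundred units and diverge by nearly a factor of five at high volume.

    # Toy model of how production volume exposes an inefficient architecture.
    # All figures are invented for illustration.
    DEVELOPMENT_COST = 200_000_000     # assumed equal for both systems
    LOGIC_COST_A = 50_000              # per-unit logic cost of System A (about 5x B's)
    LOGIC_COST_B = 10_000              # per-unit logic cost of System B

    def price(per_unit_logic_cost, units):
        # Per-unit price: logic cost plus amortized development cost.
        return per_unit_logic_cost + DEVELOPMENT_COST / units

    for units in (500, 5_000, 50_000, 500_000):
        a = price(LOGIC_COST_A, units)
        b = price(LOGIC_COST_B, units)
        print(f"{units:>7,} units:  A = ${a:>11,.0f}   B = ${b:>11,.0f}   A/B = {a / b:.2f}")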
Throughout this text we explore the study of architecture by considering innovations of the future that depend on low-cost components. But we shall always heed the efficiency of the architectures we examine to be sure that we are using our building blocks well.

Consider, for example, a multiprocessor system in which there exists no shared memory, and suppose that we want to run a parallel program in which each processor executes the same program. Obviously, we can load identical copies of the program in all processors. When the program is small or the number of processors is rather modest, the memory consumed by the multiple copies may be quite tolerable.

But what if the program is a megabyte in size, and what if we plan to use 1000 processors in our system? Then the copies of the program account for a gigabyte of storage, which need not be present if there were some way to share one copy of code across all processors.
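Spelled out with the figures just given (one megabyte per copy, 1000 processors), the replicated code alone comes to

$$1000 \text{ copies} \times 1\ \text{MB per copy} \approx 10^{9} \text{ bytes} = 8 \times 10^{9} \text{ bits} \approx 10^{10} \text{ bits},$$

which is the multiplier of roughly 10^10 bits discussed in the next paragraph.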
If System A uses multiple copies of programs, and System B, through a clever design, achieves nearly equal performance with a single copy, then the extra gigabyte of memory required by System A could well make System A totally uncompetitive with System B, unless the cost of storage becomes so insignificant that a gigabyte of memory accounts for a paltry fraction of the cost of a system. System A's architect hopes that the cost per bit of memory will tumble in the future, but System A requires 10^10 more bits, and this is an enormous multiplier. If current historical trends continue, a drop in cost per bit to offset an inefficiency of this magnitude would probably take twenty to thirty years.

In the example just presented, the architect of System A has to be aware of other approaches that could overcome a basic flaw in System A for the particular application. System A might be totally effective for other applications in which each processor requires a different program. But in the given context, System B has a tremendous, probably insurmountable advantage.

The ar
