IPR2018-01594
EXHIBIT 2052

PATENT OWNER DIRECTSTREAM, LLC
EX. 2069, p. 1
High-Performance
Computer Architecture
High-Performance
Computer Architecture

Harold S. Stone
IBM Watson Research Center
and
Courant Institute
New York University

Addison-Wesley Publishing Company
Reading, Massachusetts
Menlo Park, California • Don Mills, Ontario
Wokingham, England • Amsterdam
Sydney • Singapore • Tokyo • Madrid
Bogota • Santiago • San Juan
This book is in the Addison-Wesley Series in Electrical and Computer Engineering

Sponsoring Editor • Tom Robbins
Production Supervisor • Bette J. Aaronson
Copy Editor • Sarah Meyer
Text Designer • Herb Caswell
Illustrator • Hardlines
Technical Art Consultant • Joseph Vetere
Manufacturing Supervisor • Hugh Crawford
Cover Designer • Jean Depoian

Library of Congress Cataloging-in-Publication Data

Stone, Harold S., 1938-
High-performance computer architecture.

Bibliography: p.
Includes index.
1. Computer architecture. I. Title.
QA76.9.A73S76 1987
004.22
87-1073
ISBN 0-201-16802-2

Copyright © 1987 by Addison-Wesley Publishing Company

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

ABCDEFGHIJ-AL-8987
To Jan - colleague and companion

A précis from the pages of history

Chapter 1
Architecture is preeminently the art of significant forms in space-that is, forms significant of their functions.
- Claude Bragdon, 1931

Chapter 2
I know of no way of judging the future but by the past.
- Patrick Henry, 1775

Chapter 3
Comparisons do ofttime great grievance.
- John Lydgate, c. 1440

Chapter 4
The fickle multitude, which veers with every wind!
- J. C. F. Schiller, 1800

Chapter 5
The tucked-up sempstress walks with hasty strides,
While streams run down her oil'd umbrella's sides.
- Jonathan Swift, 1711

Chapter 6
Sat cit si sat bene. [It is done quickly enough if it is done well.]
- Latin proverb

Chapter 7
Who depends upon another man's table often dines late.
- John Ray, 1678
Preface

Teaching computer architecture is an interesting challenge for the instructor because the field is in constant flux. What the architect does depends strongly on the devices available, and the devices have been changing every two to three years, with major breakthroughs once or twice a decade. Within the brief life of this textbook, there may be a complete turnover in the devices used in computers.

What then should be taught to prepare students for what lies ahead? What information will remain important over the technical career of a student, and what information will soon become obsolete, of historical interest only? This text stresses design ideas embodied in many machines and the techniques for evaluating the ideas. The ideas and the evaluation techniques are the principles that will survive. The specific implementations of machines that one might choose in 1987, 1990, or 1993 reflect the basic principles described here as applied to the device technology currently prevailing. Effective designs are those that use technology cleverly and achieve balanced, efficient structures matched well to the class of problems they attack. This text stresses the means to achieve balance and efficiency regardless of the underlying technology.

We use a multifaceted approach to teaching the reader how to prepare for the future. The major features are the following:

1. Each topic is a general architectural approach-memory designs, pipeline techniques, and a variety of parallel structures.
2. Within each topic the focus is on fundamental bottlenecks-memory bandwidth, processing bandwidth, communications, and synchronization-and how to overcome these bottlenecks for each specific topic area.
3. The material addresses evaluation techniques to help the reader isolate aspects that are highly efficient from those that are not.
4. A few machines whose structure is of historical interest are described to illustrate how the concepts can be implemented.
5. Where appropriate, the text draws on examples of real applications and their architectural requirements.
6. Exercises at the end of chapters give the reader an opportunity to sketch out designs and perform evaluation under a variety of technology-oriented constraints.

The exercises are particularly important because the reader learns to master the material by integrating a number of different ideas, often by working through a paper design that must meet some unusual set of constraints. In several exercises, the student is asked to produce a series of designs, each reflecting a different set of underlying devices. This helps the student gain experience in adapting basic techniques to new situations.

The text is intended for advanced undergraduates and first-year graduate students. It assumes the student has had a course in machine organization so that the basic operation of a processor is well understood. Some experience with assembly language is helpful, but not essential. Programming in a high-level language such as Pascal, however, is necessary to understand the applications used as examples. Mathematical background in linear systems or numerical methods is helpful for Chapters 4 and 5, and some exposure to operating systems will assist understanding Chapter 7. In neither case is the material absolutely required because the text contains sufficient background discussion to support the presentation.

The text purposely avoids highly detailed descriptions of popular machines because in time the machines described will inevitably be obsolete. In future years, a reader of such material may be led to think that the specific details of the successful machine represent good design decisions for the future as well as for the time frame in which the design was actually done. A better approach is for the individual instructor to discuss one or two current machines while using the text, with the notion that the current machines can change each year at the discretion of the instructor. It is also possible to use the text without such supplementary material because the design exercises provide challenges that represent technology through the end of the 80s and into the 90s.

We jokingly tell students that the subject matter enjoys a positive benefit from the rapid changes in technology. The instructor need not create new exercises and examinations for each new class. The questions may be the same each year, but the answers will be different.

As an aid for the student and instructor, there is a floppy disk available with stripped traces of program execution. The student should find this useful for cache evaluation studies. The disk is usable directly on IBM personal computers and compatible equipment and can be accessed by programs written in a variety of programming languages. Prior to the publication of this text, thorough studies of cache behavior required main-frame computers for analysis because of the massive amounts of data to process. Techniques described in Chapter 2 show how to reduce the processing by as much as two orders of magnitude. These techniques make possible the use of a personal computer as the primary analysis tool. Instructors may find that exercises in cache analysis are particularly illuminating. Instructors who would like to obtain the disk should contact the publisher or the author.
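The trace-driven cache studies described above reduce to a simple loop: replay a stream of addresses through a cache model and count hits and misses. The sketch below is a minimal illustration under assumed parameters (a direct-mapped cache and a synthetic trace of byte addresses); it does not model the distributed disk's actual trace format or the data-reduction techniques of Chapter 2.

```python
# Minimal trace-driven cache simulation of the kind used for the
# cache evaluation exercises. Cache geometry and trace are
# illustrative assumptions, not the book's distributed materials.

def simulate_direct_mapped(trace, num_lines=64, line_size=16):
    """Count (hits, misses) for a direct-mapped cache over an address trace."""
    tags = [None] * num_lines           # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size       # memory block holding this address
        index = block % num_lines       # cache line the block maps to
        tag = block // num_lines        # high-order bits identifying the block
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag           # fill the line on a miss
    return hits, misses

# A synthetic trace with strong locality: four sweeps over a small
# working set, so only the first touch of each block misses.
trace = [i * 4 for i in range(64)] * 4
print(simulate_direct_mapped(trace))    # (240, 16)
```

Replacing the synthetic trace with addresses read from a real trace file turns this into the kind of miss-ratio study the exercises call for.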
The material in the text is structured in a modular fashion, with each chapter reasonably independent of every other chapter. The instructor can put together a course by selecting individual chapters and individual sections according to the background of the students, the prerequisites available, and the successor courses in the curriculum.

Chapters 2 and 3 form the core material. Cache memories and pipeline structures are widely used today, and they are likely to be effective in the technologies that will emerge in the next several years. These chapters should be taught in all course offerings.

For courses in which students have a strong mathematical preparation, Chapters 4 and 5 are particularly well suited because they treat techniques for high-speed numerical computations. Although the information is of interest for general-purpose computers, it is biased to supercomputers that are used on large-scale numerical problems.

Chapters 6 and 7 treat multiprocessors, which are more general purpose than the machines of Chapters 4 and 5. These chapters are recommended for curricula that stress systems programming and computer engineering.

In one semester, it is reasonable to complete selected sections of all chapters, or to cover Chapters 2 and 3 and two other chapters in depth. Chapter 1, which has no exercises, is to be used as background reading to set the tone of the exposition. The text can easily satisfy the needs of a two-quarter sequence if the instructor chooses to use the full material.

No matter which portion of the text is covered, working the exercises is critical for a thorough appreciation of the material. The design-oriented exercises can be rather frustrating at first because there is no clear indication of a correct answer. The reader wants to be able to jot down a simple answer to a question after a small amount of thought. What a pleasure to crank through a calculation and find the answer is 17.5. The design exercises are nothing like this. No specific quality distinguishes a right answer from a wrong answer. The answer is a design, and if it meets the design constraints it must be correct.

The point of working such exercises is not the final design, but rather the process of arriving at the final design. What alternatives were considered?
How does the final design overcome basic problems? Did the student consider a reasonable set of alternatives or was there a valid approach missed that should have been considered? Is the evaluation of the design reasonable? For what assumptions concerning technology factors and workload characteristics is the given design an efficient one?

After working through several such problems the reader becomes familiar with the thought processes of the designer and gains both experience and insight in architectural design. Many exercises seem to capture real situations, and this is as intended. As in real situations the reader may discover that there is no good solution, and a compromise has to be invented. Or there may be several reasonable solutions, and the reader has to pick one, possibly on the basis of characteristics that are secondary in importance because all the solutions available have satisfactory primary characteristics. Many exercises have actually been drawn from design problems faced by the author, with constraints updated for the present and future.

The preparation of this text represents the fruits of labor of many parties. The author's students, Tom Puzak, Zarka Cvetanovic, and Dominique Thiebaut, contributed a substantial number of the ideas presented. They also offered helpful comments and criticism as the project progressed. Other reviewers whose comments are reflected in these pages are William F. Applebe, Georgia Institute of Technology; Richard A. Erdrich, UNISYS Corporation; John L. Hennessy, Stanford University; K. C. Murphy, Advanced Micro Devices; Paul Pederson, New York University; Richard L. Sites, Digital Equipment Corporation; and Phil Emma, Jeff Lee, Peter Hsu, K. S. Natarajan, Howard Sachar, and Marc Surette, all with IBM. Collectively and individually, their work has aided greatly in the process of developing the material to make it easily accessible to the intended audience. The publication crew at Addison-Wesley did a remarkable job in putting the project together. Bette Aaronson and Sarah Meyer demonstrated that they know pipelining in practice better than the author does in theory, smoothly flowing the chapters through the tedious process of markup, text edit, and page composition to demonstrate their proficiency in high-performance publishing. To Tom Robbins we offer gratitude for support and encouragement in this project from its inception to its completion.

Chappaqua, New York

H.S.S.
Contents

1 Introduction  1

1.1 Technology and Architecture  1
1.2 But Is It Art?  3
1.2.1 The Cost Factor  4
1.2.2 Hardware Considerations  7
1.3 High-Performance Techniques  9
1.3.1 Measuring Costs  10
1.3.2 The Role of Applications  12
1.3.3 The Impact of VLSI  13
1.3.4 The Effect of Technological Change on Cost  14
1.3.5 Algorithms and Architecture  17
1.4 Historical References  19

2 Memory-System Design  21

2.1 Exploiting Program Characteristics  23
2.2 Cache Memory  29
2.2.1 Basic Cache Structure  29
2.2.2 Cache Design  32
2.2.3 Cache Analysis  39
2.2.4 Replacement Policies  52
2.2.5 Footprints in the Cache  58
2.2.6 Writing to the Cache  66
2.3 Virtual Memory  69
2.3.1 Virtual-Memory Structure  70
2.3.2 Virtual-Memory Mapping  74
2.3.3 Improving Program Locality  81
2.3.4 Replacement Algorithms  84
2.3.5 Buffering Effects in Virtual-Memory Systems  90
Exercises  94

3 Pipeline Design Techniques  102

3.1 Principles of Pipeline Design  103
3.2 Memory Structures in Pipeline Computers  115
3.3 Performance of Pipelined Computers  117
3.4 Control of Pipeline Stages  127
3.4.1 Design of a Multi-function Pipeline  127
3.4.2 The Collision Vector and Pipeline Control  132
3.4.3 Maximum Performance Pipelines  138
3.4.4 Using Delays to Increase Performance  140
3.4.5 Interlock Elimination  148
3.5 Exploiting Pipeline Techniques  150
3.5.1 Conditional Branches  150
3.5.2 Internal Forwarding and Deferred Instructions  155
3.5.3 Machines with Both Cache and Virtual Memory  165
3.5.4 RISC Architectures  168
3.6 Historical References  171
Exercises  172

4 Characteristics of Numerical Applications  177

4.1 Classification of Large-Scale Numerical Problems  178
4.1.1 Continuum Models  180
4.1.2 Particle Models  182
4.2 Design Constraints for High-Performance Machines  184
4.3 Architectures for the Continuum Model  186
4.4 Algorithms for the Continuum Model  194
4.4.1 The Cosmic Cube  194
4.4.2 Data-Flow Requirements  195
4.4.3 Parallel Solutions  200
4.4.4 Recursive Doubling and Cyclic Reduction  206
4.5 The Perfect Shuffle  210
4.5.1 The Perfect-Shuffle Interconnection Pattern  210
4.5.2 Applications of the Perfect Shuffle  217
4.6 Architectures for the Continuum Model-Which Direction?  227
Exercises  229
5 Vector Computers  233

5.1 A Generic Vector Processor  234
5.1.1 Multiple Memory Modules  236
5.1.2 Intermediate Memories  244
5.2 Access Patterns for Numerical Algorithms  248
5.2.1 Gaussian Elimination  249
5.3 Data-Structuring Techniques for Vector Machines  253
5.4 Attached Vector-Processors  261
5.5 Sparse-Matrix Techniques  266
5.6 The GF-11, A Very High-Speed Vector Processor  268
5.7 Final Comments on Vector Computers  271
Exercises  274

6 Multiprocessors  278

6.1 Background  279
6.2 Multiprocessor Performance  283
6.2.1 The Basic Model-Two Processors with Unoverlapped Communications  285
6.2.2 Extension to N Processors  286
6.2.3 A Stochastic Model  290
6.2.4 A Model with Linear Communication Costs  291
6.2.5 An Optimistic Model-Fully Overlapped Communication  293
6.2.6 A Model with Multiple Communication Links  295
6.2.7 Multiprocessor Models  297
6.3 Multiprocessor Interconnections  299
6.3.1 Bus Interconnections  299
6.3.2 Ring Interconnections  304
6.3.3 Crossbar Interconnections  305
6.3.4 The Shuffle-Exchange Interconnection and the Combining Switch  310
6.3.5 The Butterfly Operation and the Reverse-Binary Transformation  312
6.3.6 The Combining Network and Fetch-and-Add  318
6.4 Cache Coherence in Multiprocessors  324
6.5 Summary  329
Exercises  330
7 Multiprocessor Algorithms  332

7.1 Easy Parallelism  333
7.1.1 The do par and do seq Constructions  335
7.1.2 Barrier Synchronization  336
7.1.3 Performance Considerations  338
7.1.4 Increasing Granularity  341
7.1.5 Initiating Tasks  345
7.2 Synchronization Techniques  347
7.2.1 Synchronization with Test-and-Set  348
7.2.2 Synchronization with Increment and Decrement  352
7.2.3 Synchronization with Compare-and-Swap  355
7.2.4 Synchronization with Fetch-and-Add  362
7.3 Parallel Search-How to Use and Not Use Parallelism  365
7.3.1 Searching for the Maximum of a Unimodal Function  366
7.3.2 Parallel Branch-and-Bound-The Traveling-Salesman Problem  369
7.4 Transforming Serial Algorithms into Parallel Algorithms  374
7.4.1 Dependency Analysis  375
7.4.2 Exploiting Parallelism Across Iterations  377
7.4.3 The Effects of Scheduling on Parallelism  382
7.5 Final Comments on Multiprocessors  383
Exercises  385

References  389

Index and Glossary  397
2

Introduction

Chap. 1
ridiculous proposal for today may be ideal for tomorrow. There are no absolute rules that say that one architecture is better than another.

The key to learning about computer architecture is learning how to evaluate architecture in the context of the technology available. It is as important to know if a computer system makes effective use of processor cycles, memory capacity, and input/output bandwidth as it is to know its raw computational speed. The objective is to look at both cost and performance, not performance alone, in evaluating architectures. Because of changes in technology, relative costs among modules as well as absolute costs change dramatically every few years, so the best proportion of different types of modules in a cost-effective design changes with technology.

This text takes the approach that it is methodology, not conclusions, that needs to be taught. We present a menu of possibilities, some reasonable today and some not. We show how to construct high-performance systems by making selections from the menus, and we evaluate the systems produced in terms of technology that exists in the mid-1980s. The conclusions reached by these evaluations are probably reasonable through the end of the decade, but in no way do we claim that the architectures that look strongest today will be the best in the next decade.

The methodology, however, is timeless. From time to time the computer architect needs to construct a new menu of design choices. With that menu and the design and evaluation techniques described in this text, the architect should be able to produce high-quality systems in any decade for the technology at that time.

Performance analysis should be based on the architecture of the total system. Design and analysis of high-performance systems is very complex, however, and is best approached by breaking the large system into a hierarchy of functional blocks, each with an architecture that can be analyzed in isolation. If any single function is very complicated, it too can be further refined into a collection of more primitive functions. Processor architecture, for example, involves putting together registers, arithmetic units, and control logic to create processors-the computational elements of a computer system.

An important facet of processor architecture is the design of the instruction set for the processor, and we shall learn in the course of this text that there are controversies raging today concerning whether instruction sets should be very simple or very complex. We do not settle this controversy here; there cannot be a single answer. But we do illuminate the factors that determine the answer, and in any technology an architect can measure those factors in the course of a new design.

Computer architecture is sometimes confused with the design of computer hardware. Because computer architecture deals with modules at a functional level, not exclusively at a hardware level, computer architecture
must encompass more than hardware. We can specify, for example, that a processor performs arithmetic and logic functions, and we can be reasonably sure that these functions will be built into the hardware and not require additional programming. If we specify memory management functions in the processor, the actual implementation of those functions may be some mix of hardware and software, with the exact mix depending on performance, availability of existing hardware or software components, and costs.

When very-large-scale integration (VLSI) was in its infancy, memory-management functions were implemented in software, and the processor architecture had to support such software by providing only a collection of registers for address mapping and protection. With VLSI it becomes possible to embed a greater portion of memory management in hardware. Many systems employ sophisticated algorithms in hardware for performing memory-management functions once exclusively implemented in software.

The line between hardware and software becomes somewhat fuzzy when last year's software is embedded directly in read-only memory on a memory-management chip where it is invisibly invoked by the programs being managed. Once such a chip is packaged, it is then a "black box" that does memory management, and the solution becomes a hardware solution. The architect who uses the chip need not provide additional software for memory management. If a chip does most, but not all, memory-management functions internally, then the architect must look into providing the missing features by incorporating software modules.

In retrospect, computer architecture makes systems from components, and the components can be hardware, software, or a mixture of both. The skill involved in architecture is to select a good collection of components and put them together so they work effectively as a total system. Later chapters show various examples of architectures, some proven successful and some proposals that might succeed.
1.2 But Is It Art?

An article in the New York Times in January 1985 described a discovery of an unsigned painting by de Kooning that raised a few eyebrows among art critics. Although it does not bear his signature, there was no doubt that it is his work, and it was hung in a gallery for public viewing. The piece is a bench from the outhouse of his summer beach house that de Kooning painted abstractly to give the appearance of marble. Is this piece a great work of art by a renowned master, or is it just a painted privy seat? The point is that art appreciation is based on aesthetics, for which we have no absolute measures. We have no absolute test to conclude whether the work is a masterpiece or a piece of junk. If the art world agrees that it is a masterpiece, then it is a masterpiece.
Computer architecture, too, has an aesthetic side, but it is quite different from the arts. We can evaluate the quality of an architecture in terms of maximum number of results per cycle, program and data capacity, and cost, as well as other measures that tend to be important in various contexts. We need never debate a question such as, "but is it fast?"

Architectures can be compared on critical measures when choices must be made. The challenge comes because technology gives us new choices each year, and the decisions from last year may not hold this year. Not only must the architect understand the best decision for today, but the architect must factor in the effects of expected changes in technology over the life of a design. Therefore, not only do evaluation techniques play a crucial role in individual decisions, but by using these techniques over a period of years, the architect gains experience in understanding the impact of technological developments on new architectures and is able to judge trends for several years in the future.

Here are the principal criteria for judging an architecture:

• Performance;
• Cost; and
• Maximum program and data size.

There are a dozen or more other criteria, such as weight, power consumption, volume, and ease of programming, that may have relatively high significance in particular cases, but the three listed here are important in all applications and critical in most of them.
1.2.1 The Cost Factor

The cost criterion deserves a bit more explanation because so many people are confused about what it means. The cost of a computer system to a user is the money that the user pays for the system, namely its price. To the designer, cost is not so clearly defined. In most cases, cost is the cost of manufacturing, including a fair amortization of the cost of development and capital tools for construction. All too often we see comparisons of architectures that compare the parts cost of System A with the purchase price of System B, where System A is a novel architecture that is being proposed as an innovation, and System B represents a model in commercial production.

Another fallacious comparison is often made when relating hardware to software. In the early years of computing, software was often bundled free of charge with hardware, but, as the industry matured, software itself became a commodity of value to be sold.

We now discover that what was once a free good now commands a significant portion of a computing budget. The trends that people quote are depicted in Fig. 1.1, where we see the cost of software steadily rising with
Our analysis also shows why in years to come hardware costs will still prove to be significant compared to software costs. At issue here is the cost of manufacturing. Software manufacturing costs are near zero today and can only go lower, so that software pricing in a competitive market mainly reflects the amortization of development costs.

Hardware manufacturing costs, while small on a per-chip basis, are many times more than software manufacturing costs. It is far less costly today to replicate accurate copies of software than it is to replicate hardware. Hardware requires assembly and testing to make sure that each copy is a faithful copy of the original design. This is far more complex today than the quality assurance on a software manufacturing line that simply has to compare each bit of information in software to see if it agrees with the original program.

We see that hardware pricing carries the burden of per-unit manufacturing costs together with development costs, whereas software pricing reflects development costs to a much greater extent. When computers fit on a single chip, their prices should bear some similarity with software prices. Indeed, we see hand calculators sold for roughly the same price as the most popular simple software tools. But computers that contain hundreds or thousands of individual components are far more complex to reproduce than any software package. At the very least, the hardware manufacturer has to test the chips and systems to reject the failures, and the corresponding process in software manufacturing is negligible because copying software is low cost, reliable, and inexpensively verified. In a competitive market, it is very unlikely that computers of moderate or high performance will be given away to purchasers of the accompanying software.
1.2.2 Hardware Considerations

Another fallacious argument about new designs for the future concerns the lavish use of hardware components in a system. The architects state convincingly that with current trends in force, the cost of hardware will be negligible, so that we can afford to build systems of much greater hardware complexity in the future than we can today. Clearly, there is truth in this argument to the extent that future systems will surely be more powerful and complex at equal cost to today's systems. But the argument must be used with care because it does not excuse gross waste of hardware.

In the future, given System A, with 100 times the logic as present systems, and System B, whose performance is essentially identical to A's but has only 10 or 20 times the logic as present systems, System A will be at a serious competitive disadvantage. For a few hundred or a few thousand copies of System A sold, System A may be priced competitively with System B. For higher volumes of production, however, the inefficiency of the architecture of
System A will force its price higher than System B's for equal system value. Of course, this presumes that both System A and System B are built from components of the same generation of technology. If A's chips are ten times as dense as B's chips and therefore 10 times less costly per device, then the argument changes, and device technology, not architecture, is the determining factor in the price of the system.

Throughout this text we explore the study of architecture by considering innovations of the future that depend on low-cost components. But we shall always heed the efficiency of the architectures we examine to be sure that we are using our building blocks well.

Consider, for example, a multiprocessor system in which there exists no shared memory, and suppose that we want to run a parallel program in which each processor executes the same program. Obviously, we can load identical copies of the program in all processors. When the program is small or the number of processors is rather modest, the memory consumed by the multiple copies may be quite tolerable.

But what if the program is a megabyte in size, and what if we plan to use 1000 processors in our system? Then the copies of the program account for a gigabyte of storage, which need not be present if there were some way to share one copy of code across all processors.

If System A uses multiple copies of programs, and System B, through a clever design, achieves nearly equal performance with a single copy, then the extra gigabyte of memory required by System A could well make System A totally uncompetitive with System B, unless the cost of storage becomes so insignificant that a gigabyte of memory accounts for a paltry fraction of the cost of a system. System A's architect hopes that the cost per bit of memory will tumble in the future, but System A requires 10^10 more bits, and this is an enormous multiplier. If current historical trends continue, a drop in cost per bit to offset an inefficiency of this magnitude would probably take twenty to thirty years.
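The storage arithmetic in this example is easy to verify directly; a quick back-of-the-envelope check, assuming decimal units for the megabyte and gigabyte:

```python
# Replicating a one-megabyte program across 1000 processors,
# as in the example above (decimal units assumed).
processors = 1000
program_bytes = 10**6                     # one copy of the program
extra_bytes = processors * program_bytes  # total replicated storage
extra_bits = 8 * extra_bytes
print(extra_bytes == 10**9)               # True: the gigabyte cited above
print(extra_bits)                         # 8000000000, on the order of 10**10 bits
```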
In the example just presented, the architect of System A has to be aware of other approaches that could overcome a basic flaw in System A for the particular application. System A might be totally effective for other applications in which each processor requires a different program. But in the given context, System B has a tremendous, probably insurmountable advantage.

The ar