`
`Reference 23
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2135, p. 1
`
`
`
`Third Edition
`
`High-Performance
`Computer Architecture
`
`
`Harold S. Stone
`IBM T.J. Watson
`Research Center
`and
`Courant Institute
`New York University
`
`Addison-Wesley Publishing Company
`
`Reading, Massachusetts
`Menlo Park, California • New York
`Don Mills, Ontario • Wokingham, England
`Amsterdam • Bonn • Sydney • Singapore
`Tokyo • Madrid • San Juan • Milan • Paris
`
`
`This book is in the Addison-Wesley Series in Electrical and Computer Engineering
`
`Library of Congress Cataloging-in-Publication Data
`
`Stone, Harold S.
`High-performance computer architecture / Harold S. Stone. - 3rd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 0-201-52688-3
`1. Computer architecture. I. Title.
`QA76.9.A73S76 1993
`004.2'2-dc20
`92-32243
`CIP
`
`Copyright © 1993 by Addison-Wesley Publishing Company, Inc.
`All rights reserved. No part of this publication may be reproduced, stored in a
`retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
`photocopying, recording, or otherwise, without the prior written permission of the
`publisher. Printed in the United States of America.
`1 2 3 4 5 6 7 8 9 10-HA-95949392
`
`
`To Jan, colleague and companion
`
`
`Preface
`
`Teaching computer architecture is an interesting challenge for the instructor
`because the field is in constant flux. What the architect does depends strongly
`on the devices available, and the devices have been changing every two to three
`years, with major breakthroughs once or twice a decade. Within the brief life
`of the first edition of this textbook a whole generation of processor and memory
`chips were first offered for sale, appeared in popular computers, and then
`gradually disappeared from the marketplace as their successors took their places.
`The particular features and strengths of those devices have given way to other
`features in various new combinations and new relative costs. Design practices
`are evolving to exploit the new devices for a new generation of machines. And
`they will evolve again as the next wave of devices appears in the coming years.
`What then should be taught to prepare students for what lies ahead? What
`information will remain important over the technical career of a student, and
`what information will soon become obsolete, of historical interest only? This
`text stresses design ideas embodied in many machines and the techniques for
`evaluating those ideas. The ideas and the evaluation techniques are the principles
`that will survive. The specific implementations of machines that one might
`choose in 1995, 2000, or 2005 reflect the basic principles described here as applied
`to the device technology currently prevailing. Effective designs are those that
`use technology cleverly and achieve balanced, efficient structures matched well
`to the class of problems they attack. This text stresses the means to achieve
`balance and efficiency in the context of any device technology.
`
`We use a multifaceted approach to teaching the reader how to prepare for
`the future. The major features are the following:
`1. Each topic is a general architectural approach: memory designs, pipeline
`techniques, and a variety of parallel structures.
`2. Within each topic the focus is on fundamental bottlenecks (memory bandwidth,
`processing bandwidth, communications, and synchronization) and
`how to overcome these bottlenecks for each topic area.
`3. The material addresses evaluation techniques to help the reader isolate aspects
`that are highly efficient from those that are not.
`4. A few machines whose structure is of historical interest are described to
`illustrate how the concepts can be implemented.
`5. Where appropriate, the text draws on examples of real applications and their
`architectural requirements.
`6. Exercises at the end of chapters give the reader an opportunity to sketch
`out designs and perform evaluation under a variety of technology-oriented
`constraints.
`The exercises are particularly important. They help the reader master the material
`by integrating a number of different ideas, often by working through a paper
`design that must satisfy some unusual set of constraints. In several exercises,
`the student is asked to produce a series of designs, each reflecting a different
`set of underlying devices. This helps the student gain experience in adapting
`basic techniques to new situations.
`The text is intended for advanced undergraduate and first-year graduate
`students. It assumes the student has had a course in machine organization so
`that the basic operation of a processor is well understood. Some experience with
`assembly language is helpful, but not essential. Programming in a high-level
`language such as Pascal, however, is necessary to understand the applications
`used as examples. Mathematical background in probability is helpful for Chapter 2,
`linear systems or numerical methods for Chapters 4 and 5, and some
`exposure to operating systems will assist understanding of Chapter 7. In no case
`is the material absolutely required because the text contains sufficient discussion
`and references to source material to support the presentation.
`The text purposely avoids detailed descriptions of popular machines because
`in time the machines so described will inevitably be obsolete. In future years,
`a reader of such material may be led to think that the specific details of a
`successful machine represent good design decisions for the future as well as for
`the period in which the design was actually done. A better approach is for the
`individual instructor to discuss one or two current machines while using the
`text, with the notion that current machines can change each year at the discretion
`of the instructor. It is also possible to use the text without such supplementary
`material because the design exercises provide challenges that represent technology
`through the 1990s.
`We jokingly tell students that the subject matter enjoys a positive benefit
`from the rapid change in technology. The instructor need not create new exercises
`and examinations for each new class. The questions may be the same
`each year, but the answers will be different.
`A number of teaching aids are available with this edition. The exercises in
`Chapter 2 make use of traces of instruction execution for which a floppy disk
`with sample traces is available from the publisher for course adopters. The disk
`is in IBM-compatible format and can be accessed by programs written in a variety
`of programming languages.
`Prior to the publication of this text, thorough studies of cache behavior
`required main-frame computers for analysis due to the massive amounts of data
`to process. The techniques described in Chapter 2 show how to reduce the
`processing by as much as two orders of magnitude and make possible the use
`of a personal computer as the primary analysis tool. The analysis techniques
`were first made widely available in the first edition of this text, and have now
`become standard among computer architects. The exercises for Chapter 2 give
`the student ample opportunity to practice cache analysis on the sample traces
`and to practice evaluating design alternatives.
`An instructor's guide with solutions to selected exercises is also available
`from the publisher to course adopters. Among the solutions in the manual are
`sample solutions to some of the design exercises. The instructor should bear in
`mind that the design exercises can be satisfied by many different designs, and
`that the sample solutions are illustrative of good approaches, but are definitely
`not the only acceptable solutions. What is important is the reasoning used by
`the student to establish that a particular design meets the constraints imposed
`and is both efficient and effective in solving the given design problem.
`Three sets of video-taped lectures provide instructional aid in a different
`form. A set of eight lectures that cover the highlights of the entire text can be
`ordered by writing to Addison-Wesley, Reading, MA 01867, Attn: Engineering
`Editor. A set of three lectures on the topics of multiprocessor cache coherence
`and synchronization is available from the IEEE Computer Society Press, 10662
`Los Vaqueros Circle, Los Alamitos, CA 90720. Another set of three lectures on
`advanced topics in cache behavior and cache analysis is available from the National
`Technological University, 700 Centre Ave., Ft. Collins, CO 80526, Attn:
`Richard Soderberg. The videotapes focus on central issues, and describe these
`topics visually and orally in a way that cannot be done in writing. Students and
`instructors will find the video tapes very useful for intensive study in short
`courses or self-paced instruction. The video medium is an effective means for
`fast transfer of information, and it is a useful supplement to a slower paced
`program of classroom lecture and intensive reading that encourages deeper
`understanding.
`Instructors familiar with the first edition will find new material on program
`behavior models, RISC architecture, and parallel synchronization. The material
`on program behavior has been introduced because machines have changed so
`quickly in recent years that designers are forced to produce new generations of
`processors without the benefit of traces of workloads for those processors. In
`such cases, the evaluation techniques described in Chapter 2 cannot be brought
`into play. The next best tool is to produce estimates of program behavior that
`can be used as input to design evaluations. We have incorporated some interesting
`new developments in program modeling that appeared after the publication
`of the first edition.
`Similarly, RISC architecture and parallel synchronization have been developing
`very quickly in recent years and demanded additional space in the new
`edition. Beyond these topics, small incremental changes in the remaining topics
`have helped bring them up to date and streamlined their presentation.
`The material in the text is structured in a modular fashion, with each chapter
`reasonably independent of every other chapter. The instructor can put together
`a course by selecting individual chapters and individual sections according to
`the background of the students, the prerequisites available, and the successor
`courses in the curriculum.
`Chapters 2 and 3 form the core material. Cache memories and pipeline
`structures are widely used today, and they are likely to be effective in the
`technologies that will emerge in the next several years. These chapters should
`be taught in all course offerings.
`For courses in which students have a good background in numerical methods,
`Chapters 4 and 5 show how parallel computer architectures are matched
`to problem domains. Students unfamiliar with the underlying mathematical
`applications will gain an understanding of computational methods in wide use
`from these chapters, and all readers will appreciate how data flow and synchronization
`of mathematical actions in an algorithm are directly supported by
`architectural features. The chapters are biased toward supercomputers and large-scale
`computations, but the material is useful as well for general purpose
`computers.
`Chapters 6 and 7 treat multiprocessors, which are more general purpose
`than the machines of Chapters 4 and 5. Multiprocessors were almost exclusively
`research vehicles in the 1970s, and were in commercial use in niche areas in the
`1980s. The 1990s will find a much broader use of multiprocessors as the speed
`of individual processors reaches the limit of metal interconnections. The highest
`sustainable clock rate for metal interconnections is roughly 200 to 250 MHz for
`a typical conductor geometry, although the clock rate can be boosted even higher
`at great expense by reducing the dimensions of all components and conductors.
`Computers in all classes from microprocessors to high-end machines started the
`1990s within one to two generations of this clock limit. To sustain increases in
`performance through the decade, the industry must embrace multiprocessing
`in virtually all computers, or must abandon metal interconnection technology
`for another technology such as optical fiber or optical waveguide technology.
`In this text, we explore the use of multiprocessors and leave the topic of
`optical interconnections for another time and another text. The multiprocessor
`discussion is oriented to where to seek performance improvement by using
`resources efficiently. The interplay of multiple disciplines is central to this discussion.
`Each specialist on a design team should have a broad shallow knowledge
`of the full scope of a design, including hardware, software, architecture, and
`applications, while enjoying a much deeper knowledge of a specialty area. Chapters
`6 and 7 give a broad view of multiprocessors and delve deeply into particular
`topics such as algorithm design and performance models that are relevant to all
`specialties. These chapters are recommended especially for curricula that emphasize
`systems programming and computer engineering.
`In one semester, it is reasonable to complete selected sections of all chapters,
`or to cover Chapters 2 and 3 and two other chapters in depth. Chapter 1, which
`has no exercises, is to be used as background reading to set the tone of the
`exposition. The text can easily satisfy the needs of a two-quarter or two-semester
`sequence if the instructor chooses to use the full material.
`No matter which portion of the text is covered, working the exercises is
`critical for a thorough appreciation of the material. The design-oriented exercises
`can be rather frustrating at first because there is no clear indication of a correct
`answer. The reader wants to see exercises that can be answered quickly by
`jotting down a simple answer after a small amount of thought. What a pleasure
`to crank through a calculation and find the answer is 17.5. The design exercises
`are nothing like this. In a sense an answer is correct if it meets the constraints
`of the design. The reality is that the answer should be more than correct-it
`must be competitive.
`The point of working such exercises is not the final design, but rather the
`process of arriving at the final design. What alternatives were considered? How
`does the final design overcome basic problems? Did the student consider a
`reasonable set of alternatives or was there a valid approach missed that should
`have been considered? Is the evaluation of the design reasonable? For what
`assumptions concerning technology factors and workload characteristics is the
`given design an efficient one?
`After working through such problems the reader becomes familiar with the
`thought processes of the designer and gains both experience and insight into
`architectural design. Many exercises seem to capture real situations, and this is
`as intended. As in real situations, the reader may discover that there is no good
`solution, and a compromise has to be invented. Or there may be several reasonable
`solutions, and the reader has to pick one, possibly on the basis of
`characteristics that are secondary in importance because all solutions available
`have satisfactory primary characteristics. Many exercises have been drawn from
`design problems faced by the author, with constraints updated for the present
`and future.
`The preparation of this text represents the fruits of labor of many parties.
`The author's students, Tom Puzak, Zarka Cvetanovic, Dominique Thiebaut, and
`John Turek contributed a number of ideas to the text and exercises. They also
`offered helpful comments and criticisms as the project progressed. Kevin Donovan,
`David Epstein, and Robert Hinkley produced high-quality solutions to
`the exercises that appear in the instructor's guide. Other reviewers whose comments
`are reflected in these pages are William F. Applebe, Georgia Institute of
`Technology; Richard A. Erdrich, Unisys Corporation; John L. Hennessy, Stanford
`University; K. C. Murphy, Advanced Micro Devices; Paul Pederson, New
`York University; Richard L. Sites, Digital Equipment Corporation; Henry Levy,
`University of Washington; Glen Langdon, University of California at Santa Cruz;
`Peter Hsu, Sun Microcomputers, and Phil Emma, Jeff Lee, K. S. Natarajan,
`Howard Sachar, and Marc Surette, all with IBM. Collectively and individually,
`their work has aided greatly the process of developing material to make it easily
`accessible to the intended audience. The publication crew at Addison-Wesley
`did a remarkable job in putting the project together. Patsy DuMoulin, Bette
`Aaronson, and Karen Myer demonstrated that they know pipelining in practice
`better than the author does in theory, smoothly flowing the chapters through
`the tedious process of markup, text editing, and page composition in a remarkable
`example of proficiency in high-performance publishing. To Tom Robbins,
`we offer gratitude for support and encouragement in the project from its inception
`to its completion.
`
`Chappaqua, New York
`
`H. S. S.
`
`Contents
`
`1 Introduction  1
`1.1 Technology and Architecture  1
`1.2 But Is It Art?  3
`1.2.1 The Cost Factor  4
`1.2.2 Hardware Considerations  8
`1.3 High-Performance Techniques  10
`1.3.1 Measuring Costs  11
`1.3.2 The Role of Applications  12
`1.3.3 The Impact of VLSI  14
`1.3.4 The Impact of Digital Communications  15
`1.3.5 The Effect of Technological Change on Cost  16
`1.3.6 Algorithms and Architecture  19
`1.4 Historical References  21
`
`2 Memory-System Design  24
`2.1 Exploiting Program Characteristics  26
`2.2 Cache Memory  32
`2.2.1 Basic Cache Structure  32
`2.2.2 Cache Design  36
`2.2.3 Cache Analysis: Trace Generation and Trace Length  44
`2.2.4 Efficient Cache Analysis  57
`2.2.5 Replacement Policies  70
`2.2.6 Footprints in the Cache  76
`2.2.7 Writing to the Cache  84
`2.2.8 Other Cache Metrics  87
`2.2.9 Modeling System Performance  90
`2.2.10 Modeling Cache Behavior  95
`2.3 Virtual Memory  102
`2.3.1 Virtual-Memory Structure  103
`2.3.2 Virtual-Memory Mapping  107
`2.3.3 Improving Program Locality  115
`2.3.4 Replacement Algorithms  118
`2.3.5 Buffering Effects in Virtual-Memory Systems  125
`Exercises  129
`
`3 Pipeline Design Techniques  142
`3.1 Principles of Pipeline Design  143
`3.2 Memory Structures in Pipeline Computers  155
`3.3 Performance of Pipelined Computers  157
`3.4 Control of Pipeline Stages  169
`3.4.1 Design of a Multi-Function Pipeline  169
`3.4.2 The Collision Vector and Pipeline Control  174
`3.4.3 Maximum Performance Pipelines  180
`3.4.4 Using Delays to Increase Performance  182
`3.4.5 Interlock Elimination  190
`3.5 Exploiting Pipeline Techniques  192
`3.5.1 Conditional Branches  192
`3.5.2 Internal Forwarding and Deferred Instructions  197
`3.5.3 Machines with Both Cache and Virtual Memory  207
`3.5.4 RISC Architectures  210
`3.5.5 Superscalar Architectures  218
`3.6 Historical References  227
`Exercises  228
`
`4 Characteristics of Numerical Applications  235
`4.1 Classification of Large-Scale Numerical Problems  236
`4.1.1 Continuum Models  238
`4.1.2 Particle Models  240
`4.2 Design Constraints for High-Performance Machines  242
`4.3 Architectures for the Continuum Model  244
`4.4 Algorithms for the Continuum Model  251
`4.4.1 The Cosmic Cube versus the ILLIAC IV  252
`4.4.2 Data-Flow Requirements  254
`4.4.3 Parallel Solutions  259
`4.4.4 Recursive Doubling and Cyclic Reduction  265
`4.5 The Perfect Shuffle  268
`4.5.1 The Perfect-Shuffle Interconnection Pattern  269
`4.5.2 Applications of the Perfect Shuffle  275
`4.6 Architectures for the Continuum Model: Which Direction?  285
`Exercises  288
`
`5 Vector Computers  292
`5.1 A Generic Vector Processor  293
`5.1.1 Multiple Memory Modules  295
`5.1.2 Intermediate Memories  302
`5.2 Access Patterns for Numerical Algorithms  307
`5.2.1 Gaussian Elimination  308
`5.3 Data-Structuring Techniques for Vector Machines  312
`5.4 Attached Vector-Processors  319
`5.5 Sparse-Matrix Techniques  324
`5.6 The GF-11: A Very High-Speed Vector Processor  327
`5.7 Final Comments on Vector Computers  329
`Exercises  332
`
`6 Multiprocessors  337
`6.1 Background  338
`6.2 Multiprocessor Performance  342
`6.2.1 The Basic Model: Two Processors with Unoverlapped Communications  344
`6.2.2 Extension to N Processors  346
`6.2.3 A Stochastic Model  349
`6.2.4 A Model with Linear Communication Costs  350
`6.2.5 An Optimistic Model: Fully Overlapped Communication  352
`6.2.6 A Model with Multiple Communication Links  353
`6.2.7 Multiprocessor Models  356
`6.3 Multiprocessor Interconnections  358
`6.3.1 Bus Interconnections  358
`6.3.2 Ring Interconnections  363
`6.3.3 Crossbar Interconnections  365
`6.3.4 Two- and Three-Dimensional Meshes  370
`6.3.5 The Shuffle-Exchange Interconnection and the Combining Switch  371
`6.3.6 The Butterfly Operation and the Reverse-Binary Transformation  373
`6.3.7 The Combining Network and Fetch-and-Add  378
`6.3.8 Hypercube Interconnections  384
`
`1
`
`Architecture is preeminently the art
`of significant forms in space-that is,
`forms significant of their functions.
`-Claude Bragdon, 1931
`
`Introduction
`
`1.1 Technology and Architecture
`1.2 But Is It Art?
`1.3 High-Performance Techniques
`1.4 Historical References
`
`This text is devoted to the study of the architecture of high-speed computer
`systems, with emphasis on design and analysis. We view a computer system
`as being constructed from a variety of functional modules such as processors,
`memories, input/output channels, and switching networks. By architecture, we
`mean the structure of the modules as they are organized in a computer system.
`The architectural design of a computer system involves selecting various functional
`modules such as processors and memories and organizing them into a
`system by designing the interconnections that tie them together. This is analogous
`to the architectural design of buildings, which involves selecting materials
`and fitting the pieces together to form a viable structure.
`
`1.1 Technology and Architecture
`Computer architecture is driven by technology. Every year brings new devices,
`new functions, and new possibilities. An imaginative and effective architecture
`for today could be a klunker for tomorrow, and likewise, a ridiculous proposal
`
`for today may be ideal for tomorrow. There are no absolute rules that say that
`one architecture is better than another.
`The key to learning about computer architecture is learning how to evaluate
`architecture in the context of the technology available. It is as important to know
`if a computer system makes effective use of processor cycles, memory capacity,
`and input/output bandwidth as it is to know its raw computational speed. The
`objective is to look at both cost and performance, not performance alone, in
`evaluating architectures. Because of changes in technology, relative costs among
`modules as well as absolute costs change dramatically every few years, so the
`best proportion of different types of modules in a cost-effective design changes
`with technology.
`This text takes the approach that it is methodology, not conclusions, that
`needs to be taught. We present a menu of possibilities, some reasonable today
`and some not. We show how to construct high-performance systems by making
`selections from the menus, and we evaluate the systems produced in terms of
`technology that exists at the start of the 1990s. The conclusions reached by these
`evaluations are probably reasonable through the middle of the decade, but in
`no way do we claim that the architectures that look strongest today will be the
`best as we turn to a new millennium.
`The methodology, however, is timeless. From time to time the computer
`architect needs to construct a new menu of design choices. With that menu and
`the design and evaluation techniques described in this text, the architect should
`be able to produce high-quality systems in any decade for the technology at that
`time.
`Performance analysis should be based on the architecture of the total system.
`Design and analysis of high-performance systems is very complex, however,
`and is best approached by breaking the large system into a hierarchy of functional
`blocks, each with an architecture that can be analyzed in isolation. If any single
`function is very complicated, it too can be further refined into a collection of
`more primitive functions. Processor architecture, for example, involves putting
`together registers, arithmetic units, and control logic to create processors, the
`computational elements of a computer system.
`An important facet of processor architecture is the design of the instruction
`set for the processor. In years past, there were controversies raging over whether
`instruction sets should be very simple or very complex. The controversies were
`not settled with a single solution; instruction sets continue to evolve with different
`underlying philosophies. But as part of the evolution, each different
`approach is influenced by the others, and incorporates advantages of other
`approaches where possible. We illuminate the factors that determine the quality
`of an instruction set, and in any technology an architect can measure those
`factors for a new design to guide the design process.
`Computer architecture is sometimes confused with the design of computer
`
`
`aesthetics, for which we have no absolute measures. We have no absolute test
`to conclude whether the work is a masterpiece or a piece of junk. If the art world
`agrees that it is a masterpiece, then it is a masterpiece.
`Computer architecture, too, has an aesthetic side, but it is quite different
`from the arts. We can evaluate the quality of an architecture in terms of maximum
`number of results per cycle, program and data capacity, and cost, as well as
`other measures that tend to be important in various contexts. We need never
`debate a question such as, "but is it fast?"
`Architectures can be compared on critical measures when choices must be
`made. The challenge comes because technology gives us new choices each year,
`and the decisions from last year may not hold this year. Not only must the
`architect understand the best decision for today, but the architect must factor
`in the effects of expected changes in technology over the life of a design. Therefore,
`not only do evaluation techniques play a crucial role in individual decisions,
`but by using these techniques over a period of years, the architect gains experience
`in understanding the impact of technological developments on new architectures
`and is able to judge trends for several years in the future.
`Here are the principal criteria for judging an architecture:
`
`• Performance;
`• Cost; and
`• Maximum program and data size.
`
`There are a dozen or more other criteria, such as weight, power consumption,
`volume, and ease of programming, that may have relatively high significance
`in particular cases, but the three listed here are important in all applications and
`critical in most of them.
`
`1.2.1 The Cost Factor
`
`The cost criterion deserves a bit more explanation because so many people are
`confused about what it means. The cost of a computer system to a user is the
`money that the user pays for the system, namely its price. To the designer, cost
`is not so clearly defined. In most cases, cost is the cost of manufacturing, including
`a fair amortization of the cost of development and capital tools for
`construction. All too often we see comparisons of architectures that compare
`the parts cost of System A with the purchase price of System B, where System
`A is a novel architecture that is being proposed as an innovation, and System
`B represents a model in commercial production.
`Another fallacious comparison is often made when relating hardware to
`software. In the early years of computing, software was often bundled free of
`charge with hardware, but, as the industry matured, software itself became a
`commodity of value to be sold.
`We now discover that what was once a free good now commands a significant
`portion of a computing budget. The trends that people quote are depicted
`in Fig. 1.1, where we see the cost of software steadily rising with inflation and
`complexity, and with apparently little relief from advances in software tools.
`Plotted on the same curve is the general trend for hardware in the same period
`of time. Hardware components appear to be diminishing in cost at an unbelievable
`rate. If we project these trends forward ten to twenty years, we may
`believe that hardware might be bundled with software, given free with the
`purchase of the software that runs on it. But this view is rather naive.
`Software and hardware costs each have two components:
`
`1. A one-time development cost; and
`2. A per-unit manufacturing cost.
`
`The actual cost of a product, be it software or hardware, is shown in Fig. 1.2 as
`a function of the volume of production of a product. Note that the cost of the
`first unit is equal to the cost of the development. The cost curve moves upward
`with volume, but the slope tends to diminish with very high volumes because
`of manufacturing experience that tends to reduce per-unit costs over large volumes
`of production. The curve in Fig. 1.2 shows accumulated cost of the total
`volume of a product. The price of the product is the cost shown on the curve
`divided by the volume, plus a markup for profit. So price is very sensitive to
`volume when development costs are high.
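As a rough illustration of the relationship just described, the sketch below computes a unit price from accumulated cost divided by volume, plus a markup. It is not taken from the text: the function names, the `experience` exponent that makes the slope of the cost curve diminish at high volume, the additive per-unit markup, and the sample figures are all assumptions chosen for illustration.

```python
def cumulative_cost(volume, development_cost, unit_cost, experience=1.0):
    """Accumulated cost of producing `volume` units (hypothetical model).

    development_cost: one-time cost, paid regardless of volume.
    unit_cost: manufacturing cost of the first unit.
    experience: exponent in (0, 1]; values below 1 flatten the curve at
        high volume, standing in for manufacturing experience.
    """
    if volume < 1:
        raise ValueError("volume must be at least 1")
    if not 0.0 < experience <= 1.0:
        raise ValueError("experience must be in the interval (0, 1]")
    return development_cost + unit_cost * volume ** experience


def unit_price(volume, development_cost, unit_cost, markup=0.0, experience=1.0):
    """Price per unit: accumulated cost divided by volume, plus a markup."""
    total = cumulative_cost(volume, development_cost, unit_cost, experience)
    return total / volume + markup


if __name__ == "__main__":
    # Assumed figures: 1,000,000 to develop, 100 per unit to build, 20 markup.
    for volume in (1, 100, 10_000, 1_000_000):
        price = unit_price(volume, 1_000_000, 100, markup=20)
        print(f"volume {volume:>9}: price per unit {price:,.2f}")
```

With a large development cost and a small unit cost, the computed price falls steeply as volume grows and then levels off near the unit cost plus markup, which is the sensitivity to volume that the paragraph points out.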
`When software was essentially free, the development costs were either bun-
`
`Fig. 1.1 A naive view of computer-cost trends. [Plot omitted: software cost and hardware cost (log of normalized cost) versus year, 1950 to 1990.]
`
`
`of the database-management software may be sold. This alone can account for
`a factor-of-ten difference in price.
`Our analysis also shows why in years to come hardware costs will still prove
`to be significant compared to software costs. At issue here is the cost of manufacturing.
`Software manufacturing costs are near zero today and can only go
`lower, so that software pricing in a competitive market mainly reflects the amortization
`of development costs.
`Hardware manufacturing costs, while small on a per-chip basis, are many
`times more than software manufacturing costs. It is far less costly today to
`replicate accurate copies of software than it is to replicate hardware. Hardware
`requires assembly and testing to make sure that each copy is a faithful copy of
`the original design. This is far more complex today than the quality assurance
`on a software manufacturing line that simply has to compare each bit of information
`in software to see if it agrees with the original program.
`Figure 1.2 suggests a strategy for the development and pricing of VLSI chips,
`hardware, and software. Development costs have to be amortized over the
`volume of units sold. The price of a unit