throbber
COMPUTER
`
`'
`
`I ARCHITECTURE
`
`
`
`‘—
`
`QUANTITATIVE
`5APPROACH
`
`_
`
`
`
`fl‘IIIIIlI
`:
`I
`I
`JOHNL HENNESSY
`&
`I
`
`DAVID A PATTERSON
`
`
`
`_
`
`_
`
`1
`
`DELL Ex.1035.001
`Ex.1035.001
`
`DELL
`
`

`

`
`
`DELL Ex.1035.002
`Ex.1035.002
`
`DELL
`
`

`

`._>H._.;o>n=n
`..__Egg>.____.._:.%.OzPc>qu>i<m
`
`
`
`..012F52233nozgqmwg>Fn:_._..mn._.c_mm
`
`
`
`MORGAN
`KAUFMANN
`PUBLISHERS -
`INC
`'
`
`= g;-
`" j;
`
`.,
`'
`
`DELL Ex.1035.003
`Ex.1035.003
`
`DELL
`
`

`

`····rm11r11HHr-·
`
`D f" •1.
`t
`1 aoooo6534
`_._. ... ..., ... ..:;:" Mr .. nnec ure e 1n1 ions,
`and Rules of Thumb
`
`I
`T . . F
`r1v1a, ormu as,
`
`Definitions
`
`Big Endian: the byte with the binary address "x ... xOO" is in the most significant position
`("big end") of a 32-bit word (page 95).
`Clock rate: inverse of clock cycle time, usually measured in MHz (page 36).
`CPI: clock cycles per instruction (page 36).
`Hit rate: fraction of memory references found in the cache, equal to 1-Miss rate (page 404).
`Hit time: memory-access time for a cache hit, including time to determine if hit or miss (page 405).
`Instruction count: number of instructions executed while running a program (page 36).
`Little Endian: the byte with the binary address "x ... xOO" is in the least significant position
`("little end") of a 32-bit word (page 95).
`MIMD: (multiple instruction stream, multiple data stream) a multiprocessor or multicomputer
`(page 572).
`Miss penalty: time to replace a block in the top level of a cache system with the corresponding block
`from the lower level (page 405).
`Miss rate: fraction of memory references not found in the cache, equal to 1 - Hit rate (page 404 ).
`N112: the vector length needed to reach one-half of R00 (page 384).
`Nv: the vector length needed so that vector mode is faster than scalar mode (page 384).
`: the megaflop rate of an infinite-length vector (page 384).
`R 00
`RAW data hazard: (read after write) instruction tries to read a ~ource before a prior instruction writes
`it; so it incorrectly gets the old value (page 264).
`SIMD: (single instruction stream, multiple data stream) an array processor (page 572).
`SISD: (single instruction stream, single data stream) a uniprocessor (page 572).
`Spatial locality: (locality in space) if an item is referenced, nearby items will tend to be referenced
`soon (page 403).
`Temporal locality: (locality in time) if an item is referenced, it will tend to be referenced again soon
`(page 403).
`WAR data hazard: (write after read) instruction tries to write a destination before it is read by a prior
`instruction, so prior instruction incorrectly gets the new value (page 264).
`WAW data hazard: (write after write) instruction tries to write an operand before it is written by a
`prior instruction. The writes are performed in the wrong order, incorrectly leaving the value of
`the prior instruction in the destination (page 264).
`
`Trivia
`
`Byte order of machines (page 95)
`Big Endian: IBM 360, MIPS, Motorola, SPARC, DLX
`Little Endian: DEC VAX, DEC RISC, Intel 80x86
`
`Year and User Address Size of Generations of IBM and Intel Computer Families
`
`Year
`
`Model
`
`~
`
`1964
`1971
`1983
`1986
`
`IBM360
`IBM370
`IBM370-XA
`IBMESA/370
`
`User address size
`
`Year
`
`Model
`
`User address size
`
`24
`24
`31
`16+31
`
`1978
`1981
`1982
`1985
`1989
`
`Intel 8086
`Intel 80186
`Intel 80286
`Intel 80386
`Intel 80486
`
`4+16
`4+16
`16+16
`16+32 or 32
`16+32 or 32
`
`Ex.1035.004
`
`DELL
`
`

`

`Formulas
`
`1
`1. Amdahl's Law: Speedup=---------------
`.
`Fractionenhanced
`( l-Fract10nenhanced) + S
`d
`pee UPenhanced
`
`(page 8)
`
`2. CPU time= Instruction count* Clock cycles per instruction* Clock cycle time (page 36)
`
`3 Average memory-access time = Hit time + Miss rate * Miss penalty (page 405)
`
`4. Means-arithmetic(AM), weighted arithmetic(W AM), harmonic(HM) and weighted harmonic(WHM):
`n
`n
`AM= l L,Timei, WAM = L,Weighti * Timei, HM=
`n
`l
`. 1
`n
`n
`n. 1
`l=
`l=
`~Weighti
`~ 1
`..£...J Ratei
`~Ratei
`i=l
`i=l
`where Timei is the execution time for the ith program of a total of n in the workload, W eighti is the
`weighting of the ith program in the workload, and Ratei is a function of l!fimei (page 51).
`
`, WHM =
`
`5.
`
`. _ Cost of die + Cost of testing die + Cost of packaging (page
`.
`d
`.-r .
`C
`p· 1 t
`ost o1 integrate circuit -
`t
`· ld
`ma es y1e
`{ 1 Defects per unit area * Die area }--ex
`. Id W .c
`6 D .
`• ld
`+
`a
`.
`ie yie =
`aier y1e *
`where Wafer yield accounts for wafers that are so bad they need not be tested and a corresponds
`to the number of masking levels critical to die yield (usually a 2:: 2.0, page 59).
`
`)
`
`55
`
`d _ Clock cycle timen0 pipelining *
`.
`p .
`1
`ipe me spee up - Clock cycle timepipelined
`
`Ideal CPI * Pipeline depth
`Ideal CPI + Pipeline stall cycles per instruction
`
`7 ·
`
`where Pipeline stall cycles accounts for clock cycles lost due to pipeline hazards (page 258).
`
`8. System performance:
`Time0verlap
`Timeuo
`Timecpu
`.
`+ S
`d
`S

`M
`(S
`d
`d
`T1meworkload = S
`d
`pee upcpu
`ax1mum pee upcpu, pee up11o)
`pee up1;0
`where Timecpu means the time the CPU is busy, Time1;0 means the time the I/0 system is busy,
`and Timeoverlap means the time both are busy. This formula assumes the overlap scales linearly
`with speedup (page 506).
`
`Rules of Thumb
`
`1. Amdahl/Case Rule: A balanced compu,ter system need~ about 1 megabyte of main memory
`capacity and 1 megabit per second of I/0 bandwidth per MIPS of CPU performance (page 17).
`2. 90110 Locality Rule: A program executes about 90% of its instructions in 10% of its code (pages
`11-12).
`3. DRAM-Growth Rule: Density increases by about 60% per year, quadrupling in 3 years (page 17).
`4. Disk-Growth Rule: Den~ity increases by about 25% per year, doubling in 3 years (page 17).
`5. Address-Consumption Rule: The memory needed by the average program grows by about a factor
`of 1.5 to 2 per year; thus, it consumes between 1/2 and 1 address bit per year (page 16).
`6. 90150 Branch-Taken Rule: About 90% of backward-going branches are taken while about 50% of
`forward-going branches are taken (page 108).
`7. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size Xis about the same as a 2-way(cid:173)
`set-associative cache of size X/2 (page 421).
`
`Ex.1035.005
`
`DELL
`
`

`

`DELL Ex.1035.006
`Ex.1035.006
`
`DELL
`
`

`

`Computer
`Architecture
`A
`Quantitative
`Approach
`
`Ex.1035.007
`
`DELL
`
`

`

`DELL Ex.1035.008
`Ex.1035.008
`
`DELL
`
`

`

`Computer
`Architecture
`A
`Quantitative
`Approach
`
`David A. Patterson
`UNIVERSITY OF CALIFORNIA AT BERKELEY
`
`John L. Hennessy
`STANFORD UNIVERSITY
`
`With a Contribution by
`David Goldberg
`Xerox Palo Alto Research Center
`
`MORGAN KAUFMANN PUBLISHERS, INC.
`SAN MATEO, CALIFORNIA
`
`Ex.1035.009
`
`DELL
`
`

`

`Sponsoring Editor Bruce Spatz
`Production Manager Shirley Jowell
`Technical Writer Walker Cunningham
`Text Design Gary Head
`Cover Design David Lance Goines
`Copy Editor Linda Medoff
`Proofreader Paul Medoff
`Computer Typesetting and Graphics Fifth Street Computer Services
`
`Library of Congress Cataloging-in-Publication Data
`Patterson, David A.
`Computer architecture : a quantitative approach I David A.
`Patterson, John L. Hennessy
`p.
`cm.
`Includes bibliographical references
`ISBN 1-55860- 069-8
`1. Computer architecture. I. Hennessy, John L. II. Title.
`QA76.9.A73P377 1990
`004.2'2--dc20
`
`Morgan Kaufmann Publishers, Inc.
`Editorial Office: 2929 Campus Drive. San Mateo, CA 94403
`Order from: P.O. Box 50490, Palo Alto, CA 94303-9953
`
`© 1990 by Morgan Kaufmann Publishers, Inc.
`All rights reserved.
`
`89-85227
`
`CIP
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted
`in any form or by any means-electronic, mechanical, recording, or otherwise-without
`the prior permission of the publisher.
`
`All instruction sets and other design information of the DLX computer system contained
`herein is copyrighted by the publisher and may not be incorporated in other publications
`or distributed by media without formal acknowledgement and written consent from the
`publisher. Use of the DLX in other publications for educational purposes is encouraged
`and application for permission is welcomed.
`ADVICE, PRAISE, & ERRORS: Any correspondence related to this publication or
`intended for the authors should be addressed to the editorial offices of Morgan Kaufmann
`Publishers, Inc., Dept. P&H APE. Information regarding error sightings is encouraged.
`Any error sightings that are accepted for correction in subsequent printings will be
`rewarded by the authors with a payment of $1.00 (U.S.) per correction upon availability
`of the new printing. Electronic mail can be sent to bugs3@vsop.stanford.edu. (Please
`include your full name and permanent mailing address.)
`INSTRUCTOR SUPPORT: For information on classroom software and other instructor
`materials available to adopters, please contact the editorial offices of Morgan Kaufmann
`Publishers, Inc. (415) 578-9911.
`Third printing, i993
`
`Ex.1035.010
`
`DELL
`
`

`

`To Andrea, Linda, and our four sons
`
`Ex.1035.011
`
`DELL
`
`

`

`Trademarks
`The following trademarks are the property of the following organizations:
`
`Alliant is a trademark of Alliant Computers.
`AMD 29000 is a trademark of AMD.
`
`trademarks of Cypress
`
`TeX is a trademark of American Mathematical Society.
`AMI 6502 is a trademark of AMI.
`Apple I, Apple II, and Macintosh are trademarks of Apple Computer,
`Inc.
`ZS- I is a trademark of Astronautics.
`UNIX and UNIX F77 are trademarks of AT&T Bell Laboratories.
`Turbo C is a trademark of Borland International.
`The Cosmic Cube is a trademark of California Institute of Technology.
`Warp, C.mmp, and Cm* are trademarks of Carnegie-Mellon University.
`CP3 IOO is a trademark of Conner Peripherals.
`CDC 6600, CDC 7600, CDC STAR-100, CYBER-180, CYBER
`180/990, and CYBER-205 are trademarks of Control Data Corporation.
`Conve~, C-1, C-2, and C series are trademarks of Convex.
`CRA Y-3 is a trademark of Cray Computer Corporation.
`CRAY-I, CRAY-IS, CRAY-2, CRAY X-MP, CRAY X-MP/416,
`CRAY Y-MP, CFT77 V3.0, CFT, and CFT2 Vl.3a are trademarks of
`Cray Research.
`Cydra 5 is a trademark of Cydrome.
`CY7C601, 7C601, 7C604, and 7C157 are
`Semiconductor.
`Nova is a trademark of Data General Corporation.
`HEP is a trademark of Denelcor.
`CV AX, DEC, DECsystem, DECstation, DECstation 3100, DECsystem
`10/20, fort, LPl I, Massbus, MicroVAX-I, MicroVAX-II, PDP-8, PDP-
`10, PDP-I I, RS-1 !M/IAS, Unibus, Ultrix, Ultrix 3.0, VAX,
`V AXstation, V AXstation 2000, V AXstation 3100, VAX-I I, VAX-
`11/780, V AX-11/785, VAX Model 730, Model 750, Model 780, VAX
`8600, VAX 8700, VAX 8800, VS FORTRAN V2.4, and VMS are
`trademarks of Digital Equipment Corporation.
`BINAC is a trademark of Eckert-Mauchly Computer Corporation.
`Multimax is a trademark of Encore Computers.
`ET A 10 is a trademark of the ET A Corporation.
`SYMBOL is a trademark of Fairchild Corporation.
`Pegasus is a trademark of Ferranti, Ltd.
`Ferrari and Testarossa are trademarks of Ferrari Motors.
`AP-120B is a trademark of Floating Point Systems.
`Ford and Escort are trademarks Ford Motor Co.
`Gnu C Compiler is a trademark of Free Software Foundation.
`M2361A, Super Eagle, VPlOO, and VP200 are trademarks of Fujitsu
`Corporation.
`Chevrolet and Corvette are trademarks of General Motors Corporation.
`HP Precision Architecture, HP 850, HP 3000, HP 3000/70, Apollo DN
`300, Apollo DN 10000, and Precision are trademarks of Hewlett-Packard
`Company.
`S810, S810/200, and S820 are trademarks of Hitachi Corporation.
`Hyundai and Excel are trademarks of the Hyundai Corporation.
`432, 960 CA, 4004, 8008, 8080, 8086, 8087, 8088, 80186, 80286,
`80386, 80486, iAPX 432, i860, Intel, Multibus, Multibus II, and Intel
`Hypercube are trademarks of Intel Corporation.
`Inmos and Transputer are trademarks of Inmos.
`Clipper ClOO is a trademark of Intergraph.
`IBM, 360, 360/30, 360/40, 360/50, 360/65, 360/85, 360/91, 370,
`370/135, 370/138, 370/145, 370/155, 370/158, 370/165, 370/168, 370-
`XA, ESA/370, System/360, System/370, 701, 704, 709, 801, 3033, 3080,
`3080 series, 3080 VF, 3081, 3090, 3090/100, 3090/200, 3090/400,
`
`3090/600, 3090/600S, 3090 VF, 3330, 3380, 3380D, 3380 Disk Model
`AK4, 3380J, 3390, 3880-23, 3990, 7030, 7090, 7094, IBM FORTRAN,
`ISAM, MYS, IBM PC, IBM PC-AT, PL.8, RT-PC, SAGE Stretch
`IBM SYS, Vector Facility, and VM are trademarks of Inte~ationai
`Business Machines Corporation.
`FutureBus is a trademark of the Institute of Electrical and Electronic
`Engineers.
`Lamborghini and Countach are trademarks of Nuova Automobili
`Ferrucio Lamborghini, SP A.
`Lotus 1-2-3 is a trademark of Lotus Development Corporation.
`MB8909 is a trademark of LSI Logic.
`NuBus is a trademark of Massachusetts Institute of Technology.
`Miata and Mazda are trademarks of Mazda.
`MASM, Microsoft Macro Assembler, MS DOS, MS DOS 3.1, and OS/2
`are trademarks of Microsoft Corporation.
`MIPS, MIPS 120, MIPS/120A, M/500, M/1000, RC6230, RC6280,
`R2000, R2000A, R2010, R3000, and R3010 are trademarks of MIPS
`Computer Systems.
`Delta Series 8608, System V/88 R32V!, VME bus, 6809, 68000, 68010,
`68020, 68030, 68882, 88000, 88000 l.8.4ml4, 88100, and 88200 are
`trademarks of Motorola Corporation.
`Multiflow is a trademark of Multiflow Corporation.
`National 32032 and 32x32 are trademarks of National Semiconductor
`Corporation.
`Ncube is a trademark of Ncube Corporation.
`SX/2, SX/3, and FORTRAN 77/SX V.040 are
`Information Systems.
`NYU Ultracomputer is a trademark of New York University.
`VAST-2 v.2.21 is a trademark of Pacific Sierra.
`Wren IV, Imprimis, Sabre, Sabre 97209, and IPI-2 are trademarks of
`Seagate Corporation.
`Sequent, Balance 800, Balance 21000, and Symmetry are trademarks of
`Sequent Computers.
`Silicon Graphics 4D/60, 4D/240, and Silicon Graphics 4D Series are
`trademarks of Silicon Graphics.
`Stellar GS 1000, Stardent-1500, and Ardent Titan-I are trademarks of
`Stardent.
`Sun 2, Sun 3, Sun 3/75, Sun 3/260, Sun 3/280, Sun 4, Sun 4/110, Sun
`4/260, Sun 4/280, SunOS 4.0.3c, Sun 1.2 FORTRAN compiler, SPARC,
`and SPARCstation I are trademarks of Sun Microsystems.
`Synapse N+ I is a trademark of Synapse.
`Tandem and Cyclone are trademarks of Tandem Computers.
`TI 8847 and TI ASC are trademarks of Texas Instruments Corporation.
`Connection Machine and CM-2 are trademarks of Thinking Machines.
`Burroughs 6500, B5000, B5500, D-machine, UNIV AC, UNIV AC I,
`UNIV AC 1103 are trademarks of UNISYS.
`Spice and 4.2 BSD UNIX are trademarks of University of California,
`Berkeley.
`Illiac, Illiac IV, and Cedar are trademarks of University of Illinois.
`Ada is a trademark of the U.S. Government (Ada Joint Program Office).
`Weitek 3364, Weitek 1167, WTL 3110, and WTL 3170 are trademarks
`of Weitek Computers.
`Alto, Ethernet, PARC, Palo Alto Research Center, Smalltalk, and Xerox
`are trademarks of Xerox Corporation.
`Z-80 is a trademark of Zilog.
`
`trademarks of NEC
`
`Ex.1035.012
`
`DELL
`
`

`

`Foreword
`
`by C. Gordon Bell
`
`I am delighted and honored to write the foreword for this landmark book.
`The authors have gone beyond the contributions of Thomas to Calculus and
`Samuelson to Economics. They have provided the definitive text and reference
`for computer architecture and design. To advance computing, I urge publishers
`to withdraw the scores of books on this topic so a new breed of architect/
`engineer can quickly emerge. This book won't eliminate the complex and
`errorful microprocessors from semicomputer companies, but it will hasten the
`education of engineers who can design better ones.
`The book presents the critical tools to analyze uniprocessor computers. It
`shows the practicing engineer how technology changes over time and offers the
`empirical constants one needs for design. It motivates the designer about func(cid:173)
`tion, which is a welcome departure from the usual exhaustive shopping list of
`mechanisms that a naive designer might attempt to include in a single design.
`The authors establish a baseline for analysis and comparisons by using the
`most important machine in each class: mainframe (IBM 360), mini (DEC VAX),
`and micro/PC (Intel 80x86). With this foundation, they show the coming
`mainline of simpler pipelined and parallel processors. These new technologies
`are shown as variants of their pedagogically useful, but highly realizable,
`processor (DLX). The authors stress technology independence by measuring
`work done per clock (parallelism), and time to do work (efficiency and latency).
`These methods should also improve the quality of research on new architectures
`and parallelism.
`Thus, the book is required understanding for anyone working with architec(cid:173)
`ture or hardware, including architects, chip and computer system engineers, and
`compiler and operating system engineers. It is especially useful for software
`engineers writing programs for pipelined and vector computers. Managers and
`marketers will benefit by knowing the Fallacies and Pitfalls sections of the book.
`One can lay the demise of many a computer-and, occasionally, a company--on
`engineers who fail to understand the subtleties of computer design.
`The first two chapters establish the essence of computer design through
`measurement and the understanding of price/performance. These concepts are
`applied to the instruction set architecture and how it is measured. They discuss
`the implementation of processors and include extensive discussions of tech(cid:173)
`niques for designing pipelined and vector processors. Chapters are also devoted
`to memory hierarchy and the often-neglected input/output. The final chapter
`
`ix
`
`Ex.1035.013
`
`DELL
`
`

`

`x
`
`Foreword
`
`presents the opportunities and questions about machines and directions of the
`future. Now, we need their next book on how to build these machines.
`The reason this book sets a standard above all others and is unlikely to be
`superseded in any foreseeable future is the understanding, experience, taste, and
`uniqueness of the authors. They have stimulated the major change in architecture
`by their work on RISC (Patterson coined the word). Their university research
`leading to product development at MIPS and Sun Microsystems established
`important architectures for the 1990s. Thus, they have done the analysis,
`evaluated the trade-offs, worked on the compilers and operating systems, and
`seen their machines achieve significance in use. Furthermore, as teachers, they
`have seen that the book is pedagogically sound (and have solicited opinions
`from others through the unprecedented Beta testing program). I know this will
`be the book of the decade in computer systems. Perhaps its greatest
`accomplishment would be to stimulate other great architects and designers of
`higher-level systems (databases, communications systems, languages and
`operating systems) to write similar books about their domains.
`I've already enjoyed and learned from the book, and surely you will too.
`
`-C. Gordon Bell
`
`Ex.1035.014
`
`DELL
`
`

`

`1
`
`2
`
`3
`
`Computer Architecture: A Quantitative Approach
`
`xi
`
`Contents
`
`Foreword
`by C. GORDON BELL
`
`Preface
`
`Acknowledgements
`
`Fundamentals of Computer Design
`1.1
`Introduction
`1.2
`Definitions of Performance
`1.3
`Quantitative Principles of Computer Design
`1.4
`The Job of a Computer Designer
`1.5
`Putting It All Together: The Concept of Memory Hierarchy
`1.6
`Fallacies and Pitfalls
`1.7
`Concluding Remarks
`1.8
`Historical Perspective and References
`Exercises
`
`Performance and Cost
`2.1
`Introduction
`2.2
`Performance
`2.3
`Cost
`2.4
`Putting It All Together: Price/Performance of Three Machines
`2.5
`Fallacies and Pitfalls
`2.6
`Concluding Remarks
`2.7
`Historical Perspective and References
`Exercises
`
`Instruction Set Design:
`Alternatives and Principles
`3.1
`Introduction
`3.2
`Classifying Instruction Set Architectures
`3.3
`Operand Storage in Memory: Classifying General-Purpose
`Register Machines
`3.4
`Memory Addressing
`3.5
`Operations in the Instruction Set
`3.6
`Type and Size of Operands
`3.7
`The Role of High-Level Languages and Compilers
`3.8
`Putting It All Together: How Programs Use Instruction Sets
`3.9
`Fallacies and Pitfalls
`3.10 Concluding Remarks
`3.11 Historical Perspective and References
`Exercises
`
`ix
`
`xvii
`xx iii
`
`2
`3
`5
`8
`13
`18
`21
`22
`23
`28
`
`32
`33
`35
`53
`66
`70
`76
`77
`81
`
`88
`89
`90
`
`92
`94
`103
`109
`111
`122
`124
`126
`127
`132
`
`Ex.1035.015
`
`DELL
`
`

`

`xii
`
`4
`
`5
`
`6
`
`Contents
`
`Instruction Set Examples and
`Measurements of Use
`Instruction Set Measurements: What and Why
`4.1
`4.2
`The VAX Architecture
`The 360/370 Architecture
`4.3
`The 8086 Architecture
`4.4
`The DLX Architecture
`4.5
`Putting It All Together: Measurements
`4.6
`of Instruction Set Usage
`Fallacies and Pitfalls
`Concluding Remarks
`Historical Perspective and References
`Exercises
`
`4.7
`4.8
`4.9
`
`Basic Processor Implementation Techniques
`5.1
`Introduction
`5.2
`Processor Datapath
`5.3
`Basic Steps of Execution
`Hardwired Control
`5.4
`5.5
`Microprogrammed Control
`Interrupts and Other Entanglements
`5.6
`5.7
`Putting It All Together: Control for DLX
`5.8
`Fallacies and Pitfalls
`5.9
`Concluding Remarks
`5.10 Historical Perspective and References
`Exercises
`
`Pipelining
`6.1 What Is Pipelining?
`6.2
`The Basic Pipeline for DLX
`Making the Pipeline Work
`6.3
`The Major Hurdle of Pipelining-Pipeline Hazards
`6.4
`6.5 What Makes Pipelining Hard to Implement
`Extending the DLX Pipeline to Handle Multicycle Operations
`6.6
`6.7
`Advanced Pipelining-Dynamic Scheduling in Pipelines
`Advanced Pipelining-Taking Advantage of More
`6.8
`Instruction-Level Parallelism
`Putting It All Together: A Pipelined VAX
`6.9
`6.10 Fallacies and Pitfalls
`6.11 Concluding Remarks
`6.12 Historical Perspective and References
`Exercises
`
`138
`139
`142
`148
`153
`160
`
`167
`183
`185
`186
`191
`
`198
`199
`201
`202
`204
`208
`214
`220
`238
`240
`241
`244
`
`250
`251
`252
`255
`257
`278
`284
`290
`
`314
`328
`334
`337
`338
`343
`
`r
`
`Ex.1035.016
`
`DELL
`
`

`

`7
`
`8
`
`9
`
`Computer Architecture: A Quantitative Approach
`
`xiii
`
`Vector Processors
`7.1
`Why Vector Machines?
`7.2
`Basic Vector Architecture
`Two Real-World Issues: Vector Length and Stride
`7.3
`A Simple Model for Vector Performance
`7.4
`Compiler Technology for Vector Machines
`7.5
`Enhancing Vector Performance
`7.6
`7.7
`Putting It All Together: Evaluating the
`Performance of Vector Processors
`Fallacies and Pitfalls
`7.8
`Concluding Remarks
`7.9
`7.10 Historical Perspective and References
`Exercises
`
`Memory-Hierarchy Design
`Introduction: Principle of Locality
`8.1
`General Principles of Memory Hierarchy
`8.2
`8.3
`Caches
`Main Memory
`8.4
`8.5
`Virtual Memory
`Protection and Examples of Virtual Memory
`8.6
`More Optimizations Based on Program Behavior
`8.7
`Advanced Topics-Improving Cache-Memory Performance
`8.8
`Putting It All Together: The VAX-11/780 Memory Hierarchy
`8.9
`8.10 Fallacies and Pitfalls
`8.11 Concluding Remarks
`8.12 Historical Perspective and References
`Exercises
`
`Input/Output
`9.1
`Introduction
`9.2
`Predicting System Performance
`1/0 Performance Measures
`9.3
`Types of 1/0 Devices
`9.4
`Buses-Connecting 1/0 Devices to CPU/Memory
`9.5
`Interfacing to the CPU
`9.6
`9.7
`Interfacing to an Operating System
`Designing an 1/0 System
`9.8
`Putting It All Together:
`9.9
`The IBM 3990 Storage Subsystem
`9.10 Fallacies and Pitfalls
`9.11 Concluding Remarks
`9.12 Historical Perspective and References
`Exercises
`
`350
`351
`353
`364
`369
`371
`377
`
`383
`390
`392
`393
`397
`402
`403
`404
`408
`425
`432
`438
`449
`454
`475
`480
`484
`485
`490
`498
`499
`501
`506
`512
`528
`533
`535
`539
`
`546
`554
`559
`560
`563
`
`Ex.1035.017
`
`DELL
`
`

`

`xiv
`
`10
`
`Contents
`
`Future Directions
`10.1
`Introduction
`10.2 Flynn Classification of Computers
`10.3 SIMD Computers-Single Instruction
`Stream, Multiple Data Streams
`10.4 MIMD Computers-Multiple Instruction
`Streams, Multiple Data Streams
`10.5 The Roads to El Dorado
`10.6 Special-Purpose Processors
`10.7 Future Directions for Compilers
`10.8 Putting It All Together: The Sequent Symmetry
`Multiprocessor
`10.9 Fallacies and Pitfalls
`10.1 O Concluding Remarks-Evolution Versus
`Revolution in Computer Architecture
`10.11 Historical Perspective and References
`Exercises
`
`Appendix A: Computer Arithmetic
`by DAVID GOLDBERG
`Xerox Palo Alto Research Center
`Introduction
`A.1
`Basic Techniques of Integer Arithmetic
`A.2
`Floating Point
`A.3
`Floating-Point Addition
`A.4
`Floating-Point Multiplication
`A.5
`Division and Remainder
`A.6
`Precisions and Exception Handling
`A.7
`Speeding Up Integer Addition
`A.8
`Speeding Up Integer Multiplication and Division
`A.9
`A.1 O Putting It All Together
`A.11 Fallacies and Pitfalls
`A.12 Historical Perspective and References
`Exercises
`
`Appendix B: Complete Instruction Set Tables
`B.1
`VAX User Instruction Set
`B.2
`System/360 Instruction Set
`B.3
`8086 Instruction Set
`
`570
`571
`572
`
`572
`
`574
`576
`580
`581
`
`582
`585
`
`587
`588
`592
`
`A·1
`
`A-1
`A-2
`A-12
`A-16
`A-20
`A-23
`A-28
`A-31
`A-39
`A-53
`A-57
`A-58
`A-63
`
`B·1
`B-2
`B-6
`B-9
`
`Appendix C: Detailed Instruction Set Measurements C·1
`C.1
`VAX Detailed Measurements
`C-2
`C-3
`C.2
`360 Detailed Measurements
`C.3
`Intel 8086 Detailed Measurements
`C-4
`C.4
`DLX Detailed Instruction Set Measurements
`C-5
`
`Ex.1035.018
`
`DELL
`
`

`

`Computer Architecture: A Quantitative Approach
`
`xv
`
`Appendix D: Time Versus Frequency Measurements D·1
`D.1
`Time Distribution on the VAX-11 /780
`D-2
`D.2
`Time Distribution on the IBM 370/168
`D-4
`D.3
`Time Distribution on an 8086 in an IBM PC
`D-6
`D.4
`Time Distribution on a DLX Relative
`D-8
`
`Appendix E: Survey of RISC Architectures
`E.1
`Introduction
`E.2
`Addressing Modes and Instruction Formats
`E.3
`Instructions: The DLX Subset
`E.4
`Instructions: Common Extensions to DLX
`E.5
`Instructions Unique to MIPS
`E.6
`Instructions Unique to SPARC
`E.7
`Instructions Unique to M88000
`E.8
`Instructions Unique to i860
`E.9
`Concluding Remarks
`E.10 References
`References
`Index
`
`E·1
`E-1
`E-2
`E-4
`E-9
`E-12
`E-15
`E-17
`E-19
`E-23
`E-24
`
`R·1
`1·1
`
`Ex.1035.019
`
`DELL
`
`

`

`DELL Ex.1035.020
`Ex.1035.020
`
`DELL
`
`

`

`Computer Architecture: A Quantitative Approach
`
`xvii
`
`Preface
`
`I started in 1962 to write a single book with this sequence of chapters, but soon
`found that it was more important to treat the subjects in depth rather than to
`skim over them lightly. The resulting length has meant that each chapter by itself
`contains enough material for a one semester course, so it has become necessary
`to publish the series in separate volumes ...
`
`Donald Knuth, The Art of Computer Programming,
`Preface to Volume 1 (of 7) (1968)
`
`Why We Wrote This Book
`
`Welcome to this book! We're glad to have the opportunity to communicate with
`you! There are so many exciting things happening in computer architecture, but
`we feel available materials just do not adequately make people aware of this.
`This is not a dreary science of paper machines that will never work. No! It's a
`discipline of keen intellectual interest, requiring balance of marketplace forces
`and cost/performance, leading to glorious failures and some notable successes.
`And it is hard to match the excitement of seeing thousands of people use the
`machine that you designed.
`Our primary goal in writing this book is to help change the way people learn
`about computer architecture. We believe that the field has changed from one that
`can only be taught with definitions and historical information, to one that can be
`studied with real examples and real measurements. We envision this book as
`suitable for a course in computer architecture as well as a primer or reference for
`professional engineers and computer architects. This book embodies a new
`approach to demystifying computer architecture-it emphasizes a quantitative
`approach to cost/performance tradeoffs. This does not imply an overly formal
`approach, but simply one that is grounded in good engineering design. To
`accomplish this, we've included lots of data about real machines, so that a reader
`can understand design tradeoffs in a quantitative as well as qualitative fashion. A
`significant component of this approach can be found in the problem sets at the
`end of every chapter, as well as the software that accompanies the book. Such
`exercises have long formed the core of science and engineering education. With
`
`Ex.1035.021
`
`DELL
`
`

`

`xviii
`
`Preface
`
`the emergence of a quantitative basis for teaching computer architecture, we feel
`the field has the potential to move toward the rigorous quantitative foundation of
`other disciplines.
`
`Topic Selection and Organization
`
`We have a conservative approach to topic selection, for there are many
`interesting ideas in the field. Rather than attempting a comprehensive survey of
`every architecture a reader might encounter today in practice or in the literature,
`we've chosen the core concepts of computer architecture that are likely to be
`included in any new machine. In making these decisions, a key criterion has
`been to emphasize ideas that have been sufficiently examined to be discussed in
`quantitative terms. For example, we concentrate on uniprocessors until the final
`chapter, where a bus-oriented, shared-memory multiprocessor is described. We
`believe this class of computer architecture will increase in popularity, but despite
`this perception it only met our criteria by a ·slim margin. Only recently has this
`class of architecture been examined in ways that allow us to discuss it
`quantitatively; a short time ago even this wouldn't have been included. Although
`large-scale parallel processors are of obvious importance to the future, it is our
`feeling that a firm basis in the principles of uniprocessor design is necessary
`before any practicing engineer tries to build a better computer of any
`organization; especially one incorporating multiple uniprocessors.
`Readers familiar with our research might expect this book to be only about
`reduced instruction set computers (RISCs). This is a mistaken judgment about
`the content of this book. Our hope is that design principles and quantitative data
`in this book will restrict discussions of architecture styles to terms like "faster"
`or "cheaper," unlike previous debates.
`The material we have selected has been stretched upon a consistent structure
`that is followed in every chapter. After explaining the ideas of a chapter, we
`include a "Putting It All Together" section that ties these ideas together by
`showing how they are used in a real machine. This is followed by a section,
`entitled "Fallacies and Pitfalls," that lets readers learn from the mistakes of
`others.We show examples of common misunderstandings and architectural
`traps that are difficult to avoid even when you know they are lying in wait for
`you. Each chapter ends with a "Concluding Remarks" section, followed by a
`"Historical Perspective and References" section that attempts to give proper
`credit for the ideas in the chapter and a sense of the history surrounding the
`inventions, presenting the human drama of computer design. It also supplies
`references that the student of architecture may want to pursue. If you have time,
`we recommend reading some of the classic papers in the field that are mentioned
`in these sections. It is both enjoyable and educational to hear the ideas from the
`mouths of the creators. Each chapter ends with Exercises, over 200 in total,
`which vary from one-minute reviews to term projects.
`
`Ex.1035.022
`
`DELL
`
`

`

`Computer Architecture: A Quantitative Approach
`
`xix
`
`A glance at the Table of Contents shows that neither the amount nor the depth
`of the material is equal from chapter to chapter. In the early chapters, for
`example, we have more basic material to ensure a common terminology and
`background. In talking with our colleagues, we found widely varying opinions
`of the backgrounds readers have, the pace at which they can pick up new
`material, and even the order in which ideas should be introduced. Our
`assumption is that the reader is familiar with logic design, and has had some
`exposure to at least one instruction set and basic software concepts. The pace
`varies with the chapters, with the first half gentler than the last half. The
`organizational decisions were formed in response to reviewer advice. The final
`organization was selected to conveniently suit the majority of courses (beyond
`Berkeley and Stanford!) with only minor modifications. Depending on your
`goals, we see three paths through this material:
`
`Introductory coverage: Chapters 1, 2, 3, 4, 5, 6.1-6.5, 8.1-8.5, 9.1-9.5, 10,
`and A.1-A.3.
`
`Intermediary coverage: Chapters 1, 2, 3, 4, 5, 6.1-6.6, 6.9-6.12, 8.1-8.7, 8.9-
`8.12, 9, 10, A (except skip division in Section A.9),
`and E.
`
`Advanced coverage:
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket