`
`'
`
`I ARCHITECTURE
`
`
`
`‘—
`
`QUANTITATIVE
`5APPROACH
`
`_
`
`
`
`fl‘IIIIIlI
`:
`I
`I
`JOHNL HENNESSY
`&
`I
`
`DAVID A PATTERSON
`
`
`
`_
`
`_
`
`1
`
`DELL Ex.1035.001
`Ex.1035.001
`
`DELL
`
`
`
`
`
`DELL Ex.1035.002
`Ex.1035.002
`
`DELL
`
`
`
`._>H._.;o>n=n
`..__Egg>.____.._:.%.OzPc>qu>i<m
`
`
`
`..012F52233nozgqmwg>Fn:_._..mn._.c_mm
`
`
`
`MORGAN
`KAUFMANN
`PUBLISHERS -
`INC
`'
`
`= g;-
`" j;
`
`.,
`'
`
`DELL Ex.1035.003
`Ex.1035.003
`
`DELL
`
`
`
`····rm11r11HHr-·
`
`D f" •1.
`t
`1 aoooo6534
`_._. ... ..., ... ..:;:" Mr .. nnec ure e 1n1 ions,
`and Rules of Thumb
`
`I
`T . . F
`r1v1a, ormu as,
`
`Definitions
`
`Big Endian: the byte with the binary address "x ... xOO" is in the most significant position
`("big end") of a 32-bit word (page 95).
`Clock rate: inverse of clock cycle time, usually measured in MHz (page 36).
`CPI: clock cycles per instruction (page 36).
`Hit rate: fraction of memory references found in the cache, equal to 1-Miss rate (page 404).
`Hit time: memory-access time for a cache hit, including time to determine if hit or miss (page 405).
`Instruction count: number of instructions executed while running a program (page 36).
`Little Endian: the byte with the binary address "x ... xOO" is in the least significant position
`("little end") of a 32-bit word (page 95).
`MIMD: (multiple instruction stream, multiple data stream) a multiprocessor or multicomputer
`(page 572).
`Miss penalty: time to replace a block in the top level of a cache system with the corresponding block
`from the lower level (page 405).
`Miss rate: fraction of memory references not found in the cache, equal to 1 - Hit rate (page 404 ).
`N112: the vector length needed to reach one-half of R00 (page 384).
`Nv: the vector length needed so that vector mode is faster than scalar mode (page 384).
`: the megaflop rate of an infinite-length vector (page 384).
`R 00
`RAW data hazard: (read after write) instruction tries to read a ~ource before a prior instruction writes
`it; so it incorrectly gets the old value (page 264).
`SIMD: (single instruction stream, multiple data stream) an array processor (page 572).
`SISD: (single instruction stream, single data stream) a uniprocessor (page 572).
`Spatial locality: (locality in space) if an item is referenced, nearby items will tend to be referenced
`soon (page 403).
`Temporal locality: (locality in time) if an item is referenced, it will tend to be referenced again soon
`(page 403).
`WAR data hazard: (write after read) instruction tries to write a destination before it is read by a prior
`instruction, so prior instruction incorrectly gets the new value (page 264).
`WAW data hazard: (write after write) instruction tries to write an operand before it is written by a
`prior instruction. The writes are performed in the wrong order, incorrectly leaving the value of
`the prior instruction in the destination (page 264).
`
`Trivia
`
`Byte order of machines (page 95)
`Big Endian: IBM 360, MIPS, Motorola, SPARC, DLX
`Little Endian: DEC VAX, DEC RISC, Intel 80x86
`
`Year and User Address Size of Generations of IBM and Intel Computer Families
`
`Year
`
`Model
`
`~
`
`1964
`1971
`1983
`1986
`
`IBM360
`IBM370
`IBM370-XA
`IBMESA/370
`
`User address size
`
`Year
`
`Model
`
`User address size
`
`24
`24
`31
`16+31
`
`1978
`1981
`1982
`1985
`1989
`
`Intel 8086
`Intel 80186
`Intel 80286
`Intel 80386
`Intel 80486
`
`4+16
`4+16
`16+16
`16+32 or 32
`16+32 or 32
`
`Ex.1035.004
`
`DELL
`
`
`
`Formulas
`
`1
`1. Amdahl's Law: Speedup=---------------
`.
`Fractionenhanced
`( l-Fract10nenhanced) + S
`d
`pee UPenhanced
`
`(page 8)
`
`2. CPU time= Instruction count* Clock cycles per instruction* Clock cycle time (page 36)
`
`3 Average memory-access time = Hit time + Miss rate * Miss penalty (page 405)
`
`4. Means-arithmetic(AM), weighted arithmetic(W AM), harmonic(HM) and weighted harmonic(WHM):
`n
`n
`AM= l L,Timei, WAM = L,Weighti * Timei, HM=
`n
`l
`. 1
`n
`n
`n. 1
`l=
`l=
`~Weighti
`~ 1
`..£...J Ratei
`~Ratei
`i=l
`i=l
`where Timei is the execution time for the ith program of a total of n in the workload, W eighti is the
`weighting of the ith program in the workload, and Ratei is a function of l!fimei (page 51).
`
`, WHM =
`
`5.
`
`. _ Cost of die + Cost of testing die + Cost of packaging (page
`.
`d
`.-r .
`C
`p· 1 t
`ost o1 integrate circuit -
`t
`· ld
`ma es y1e
`{ 1 Defects per unit area * Die area }--ex
`. Id W .c
`6 D .
`• ld
`+
`a
`.
`ie yie =
`aier y1e *
`where Wafer yield accounts for wafers that are so bad they need not be tested and a corresponds
`to the number of masking levels critical to die yield (usually a 2:: 2.0, page 59).
`
`)
`
`55
`
`d _ Clock cycle timen0 pipelining *
`.
`p .
`1
`ipe me spee up - Clock cycle timepipelined
`
`Ideal CPI * Pipeline depth
`Ideal CPI + Pipeline stall cycles per instruction
`
`7 ·
`
`where Pipeline stall cycles accounts for clock cycles lost due to pipeline hazards (page 258).
`
`8. System performance:
`Time0verlap
`Timeuo
`Timecpu
`.
`+ S
`d
`S
`·
`M
`(S
`d
`d
`T1meworkload = S
`d
`pee upcpu
`ax1mum pee upcpu, pee up11o)
`pee up1;0
`where Timecpu means the time the CPU is busy, Time1;0 means the time the I/0 system is busy,
`and Timeoverlap means the time both are busy. This formula assumes the overlap scales linearly
`with speedup (page 506).
`
`Rules of Thumb
`
`1. Amdahl/Case Rule: A balanced compu,ter system need~ about 1 megabyte of main memory
`capacity and 1 megabit per second of I/0 bandwidth per MIPS of CPU performance (page 17).
`2. 90110 Locality Rule: A program executes about 90% of its instructions in 10% of its code (pages
`11-12).
`3. DRAM-Growth Rule: Density increases by about 60% per year, quadrupling in 3 years (page 17).
`4. Disk-Growth Rule: Den~ity increases by about 25% per year, doubling in 3 years (page 17).
`5. Address-Consumption Rule: The memory needed by the average program grows by about a factor
`of 1.5 to 2 per year; thus, it consumes between 1/2 and 1 address bit per year (page 16).
`6. 90150 Branch-Taken Rule: About 90% of backward-going branches are taken while about 50% of
`forward-going branches are taken (page 108).
`7. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size Xis about the same as a 2-way(cid:173)
`set-associative cache of size X/2 (page 421).
`
`Ex.1035.005
`
`DELL
`
`
`
`DELL Ex.1035.006
`Ex.1035.006
`
`DELL
`
`
`
`Computer
`Architecture
`A
`Quantitative
`Approach
`
`Ex.1035.007
`
`DELL
`
`
`
`DELL Ex.1035.008
`Ex.1035.008
`
`DELL
`
`
`
`Computer
`Architecture
`A
`Quantitative
`Approach
`
`David A. Patterson
`UNIVERSITY OF CALIFORNIA AT BERKELEY
`
`John L. Hennessy
`STANFORD UNIVERSITY
`
`With a Contribution by
`David Goldberg
`Xerox Palo Alto Research Center
`
`MORGAN KAUFMANN PUBLISHERS, INC.
`SAN MATEO, CALIFORNIA
`
`Ex.1035.009
`
`DELL
`
`
`
`Sponsoring Editor Bruce Spatz
`Production Manager Shirley Jowell
`Technical Writer Walker Cunningham
`Text Design Gary Head
`Cover Design David Lance Goines
`Copy Editor Linda Medoff
`Proofreader Paul Medoff
`Computer Typesetting and Graphics Fifth Street Computer Services
`
`Library of Congress Cataloging-in-Publication Data
`Patterson, David A.
`Computer architecture : a quantitative approach I David A.
`Patterson, John L. Hennessy
`p.
`cm.
`Includes bibliographical references
`ISBN 1-55860- 069-8
`1. Computer architecture. I. Hennessy, John L. II. Title.
`QA76.9.A73P377 1990
`004.2'2--dc20
`
`Morgan Kaufmann Publishers, Inc.
`Editorial Office: 2929 Campus Drive. San Mateo, CA 94403
`Order from: P.O. Box 50490, Palo Alto, CA 94303-9953
`
`© 1990 by Morgan Kaufmann Publishers, Inc.
`All rights reserved.
`
`89-85227
`
`CIP
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted
`in any form or by any means-electronic, mechanical, recording, or otherwise-without
`the prior permission of the publisher.
`
`All instruction sets and other design information of the DLX computer system contained
`herein is copyrighted by the publisher and may not be incorporated in other publications
`or distributed by media without formal acknowledgement and written consent from the
`publisher. Use of the DLX in other publications for educational purposes is encouraged
`and application for permission is welcomed.
`ADVICE, PRAISE, & ERRORS: Any correspondence related to this publication or
`intended for the authors should be addressed to the editorial offices of Morgan Kaufmann
`Publishers, Inc., Dept. P&H APE. Information regarding error sightings is encouraged.
`Any error sightings that are accepted for correction in subsequent printings will be
`rewarded by the authors with a payment of $1.00 (U.S.) per correction upon availability
`of the new printing. Electronic mail can be sent to bugs3@vsop.stanford.edu. (Please
`include your full name and permanent mailing address.)
`INSTRUCTOR SUPPORT: For information on classroom software and other instructor
`materials available to adopters, please contact the editorial offices of Morgan Kaufmann
`Publishers, Inc. (415) 578-9911.
`Third printing, i993
`
`Ex.1035.010
`
`DELL
`
`
`
`To Andrea, Linda, and our four sons
`
`Ex.1035.011
`
`DELL
`
`
`
`Trademarks
`The following trademarks are the property of the following organizations:
`
`Alliant is a trademark of Alliant Computers.
`AMD 29000 is a trademark of AMD.
`
`trademarks of Cypress
`
`TeX is a trademark of American Mathematical Society.
`AMI 6502 is a trademark of AMI.
`Apple I, Apple II, and Macintosh are trademarks of Apple Computer,
`Inc.
`ZS- I is a trademark of Astronautics.
`UNIX and UNIX F77 are trademarks of AT&T Bell Laboratories.
`Turbo C is a trademark of Borland International.
`The Cosmic Cube is a trademark of California Institute of Technology.
`Warp, C.mmp, and Cm* are trademarks of Carnegie-Mellon University.
`CP3 IOO is a trademark of Conner Peripherals.
`CDC 6600, CDC 7600, CDC STAR-100, CYBER-180, CYBER
`180/990, and CYBER-205 are trademarks of Control Data Corporation.
`Conve~, C-1, C-2, and C series are trademarks of Convex.
`CRA Y-3 is a trademark of Cray Computer Corporation.
`CRAY-I, CRAY-IS, CRAY-2, CRAY X-MP, CRAY X-MP/416,
`CRAY Y-MP, CFT77 V3.0, CFT, and CFT2 Vl.3a are trademarks of
`Cray Research.
`Cydra 5 is a trademark of Cydrome.
`CY7C601, 7C601, 7C604, and 7C157 are
`Semiconductor.
`Nova is a trademark of Data General Corporation.
`HEP is a trademark of Denelcor.
`CV AX, DEC, DECsystem, DECstation, DECstation 3100, DECsystem
`10/20, fort, LPl I, Massbus, MicroVAX-I, MicroVAX-II, PDP-8, PDP-
`10, PDP-I I, RS-1 !M/IAS, Unibus, Ultrix, Ultrix 3.0, VAX,
`V AXstation, V AXstation 2000, V AXstation 3100, VAX-I I, VAX-
`11/780, V AX-11/785, VAX Model 730, Model 750, Model 780, VAX
`8600, VAX 8700, VAX 8800, VS FORTRAN V2.4, and VMS are
`trademarks of Digital Equipment Corporation.
`BINAC is a trademark of Eckert-Mauchly Computer Corporation.
`Multimax is a trademark of Encore Computers.
`ET A 10 is a trademark of the ET A Corporation.
`SYMBOL is a trademark of Fairchild Corporation.
`Pegasus is a trademark of Ferranti, Ltd.
`Ferrari and Testarossa are trademarks of Ferrari Motors.
`AP-120B is a trademark of Floating Point Systems.
`Ford and Escort are trademarks Ford Motor Co.
`Gnu C Compiler is a trademark of Free Software Foundation.
`M2361A, Super Eagle, VPlOO, and VP200 are trademarks of Fujitsu
`Corporation.
`Chevrolet and Corvette are trademarks of General Motors Corporation.
`HP Precision Architecture, HP 850, HP 3000, HP 3000/70, Apollo DN
`300, Apollo DN 10000, and Precision are trademarks of Hewlett-Packard
`Company.
`S810, S810/200, and S820 are trademarks of Hitachi Corporation.
`Hyundai and Excel are trademarks of the Hyundai Corporation.
`432, 960 CA, 4004, 8008, 8080, 8086, 8087, 8088, 80186, 80286,
`80386, 80486, iAPX 432, i860, Intel, Multibus, Multibus II, and Intel
`Hypercube are trademarks of Intel Corporation.
`Inmos and Transputer are trademarks of Inmos.
`Clipper ClOO is a trademark of Intergraph.
`IBM, 360, 360/30, 360/40, 360/50, 360/65, 360/85, 360/91, 370,
`370/135, 370/138, 370/145, 370/155, 370/158, 370/165, 370/168, 370-
`XA, ESA/370, System/360, System/370, 701, 704, 709, 801, 3033, 3080,
`3080 series, 3080 VF, 3081, 3090, 3090/100, 3090/200, 3090/400,
`
`3090/600, 3090/600S, 3090 VF, 3330, 3380, 3380D, 3380 Disk Model
`AK4, 3380J, 3390, 3880-23, 3990, 7030, 7090, 7094, IBM FORTRAN,
`ISAM, MYS, IBM PC, IBM PC-AT, PL.8, RT-PC, SAGE Stretch
`IBM SYS, Vector Facility, and VM are trademarks of Inte~ationai
`Business Machines Corporation.
`FutureBus is a trademark of the Institute of Electrical and Electronic
`Engineers.
`Lamborghini and Countach are trademarks of Nuova Automobili
`Ferrucio Lamborghini, SP A.
`Lotus 1-2-3 is a trademark of Lotus Development Corporation.
`MB8909 is a trademark of LSI Logic.
`NuBus is a trademark of Massachusetts Institute of Technology.
`Miata and Mazda are trademarks of Mazda.
`MASM, Microsoft Macro Assembler, MS DOS, MS DOS 3.1, and OS/2
`are trademarks of Microsoft Corporation.
`MIPS, MIPS 120, MIPS/120A, M/500, M/1000, RC6230, RC6280,
`R2000, R2000A, R2010, R3000, and R3010 are trademarks of MIPS
`Computer Systems.
`Delta Series 8608, System V/88 R32V!, VME bus, 6809, 68000, 68010,
`68020, 68030, 68882, 88000, 88000 l.8.4ml4, 88100, and 88200 are
`trademarks of Motorola Corporation.
`Multiflow is a trademark of Multiflow Corporation.
`National 32032 and 32x32 are trademarks of National Semiconductor
`Corporation.
`Ncube is a trademark of Ncube Corporation.
`SX/2, SX/3, and FORTRAN 77/SX V.040 are
`Information Systems.
`NYU Ultracomputer is a trademark of New York University.
`VAST-2 v.2.21 is a trademark of Pacific Sierra.
`Wren IV, Imprimis, Sabre, Sabre 97209, and IPI-2 are trademarks of
`Seagate Corporation.
`Sequent, Balance 800, Balance 21000, and Symmetry are trademarks of
`Sequent Computers.
`Silicon Graphics 4D/60, 4D/240, and Silicon Graphics 4D Series are
`trademarks of Silicon Graphics.
`Stellar GS 1000, Stardent-1500, and Ardent Titan-I are trademarks of
`Stardent.
`Sun 2, Sun 3, Sun 3/75, Sun 3/260, Sun 3/280, Sun 4, Sun 4/110, Sun
`4/260, Sun 4/280, SunOS 4.0.3c, Sun 1.2 FORTRAN compiler, SPARC,
`and SPARCstation I are trademarks of Sun Microsystems.
`Synapse N+ I is a trademark of Synapse.
`Tandem and Cyclone are trademarks of Tandem Computers.
`TI 8847 and TI ASC are trademarks of Texas Instruments Corporation.
`Connection Machine and CM-2 are trademarks of Thinking Machines.
`Burroughs 6500, B5000, B5500, D-machine, UNIV AC, UNIV AC I,
`UNIV AC 1103 are trademarks of UNISYS.
`Spice and 4.2 BSD UNIX are trademarks of University of California,
`Berkeley.
`Illiac, Illiac IV, and Cedar are trademarks of University of Illinois.
`Ada is a trademark of the U.S. Government (Ada Joint Program Office).
`Weitek 3364, Weitek 1167, WTL 3110, and WTL 3170 are trademarks
`of Weitek Computers.
`Alto, Ethernet, PARC, Palo Alto Research Center, Smalltalk, and Xerox
`are trademarks of Xerox Corporation.
`Z-80 is a trademark of Zilog.
`
`trademarks of NEC
`
`Ex.1035.012
`
`DELL
`
`
`
`Foreword
`
`by C. Gordon Bell
`
`I am delighted and honored to write the foreword for this landmark book.
`The authors have gone beyond the contributions of Thomas to Calculus and
`Samuelson to Economics. They have provided the definitive text and reference
`for computer architecture and design. To advance computing, I urge publishers
`to withdraw the scores of books on this topic so a new breed of architect/
`engineer can quickly emerge. This book won't eliminate the complex and
`errorful microprocessors from semicomputer companies, but it will hasten the
`education of engineers who can design better ones.
`The book presents the critical tools to analyze uniprocessor computers. It
`shows the practicing engineer how technology changes over time and offers the
`empirical constants one needs for design. It motivates the designer about func(cid:173)
`tion, which is a welcome departure from the usual exhaustive shopping list of
`mechanisms that a naive designer might attempt to include in a single design.
`The authors establish a baseline for analysis and comparisons by using the
`most important machine in each class: mainframe (IBM 360), mini (DEC VAX),
`and micro/PC (Intel 80x86). With this foundation, they show the coming
`mainline of simpler pipelined and parallel processors. These new technologies
`are shown as variants of their pedagogically useful, but highly realizable,
`processor (DLX). The authors stress technology independence by measuring
`work done per clock (parallelism), and time to do work (efficiency and latency).
`These methods should also improve the quality of research on new architectures
`and parallelism.
`Thus, the book is required understanding for anyone working with architec(cid:173)
`ture or hardware, including architects, chip and computer system engineers, and
`compiler and operating system engineers. It is especially useful for software
`engineers writing programs for pipelined and vector computers. Managers and
`marketers will benefit by knowing the Fallacies and Pitfalls sections of the book.
`One can lay the demise of many a computer-and, occasionally, a company--on
`engineers who fail to understand the subtleties of computer design.
`The first two chapters establish the essence of computer design through
`measurement and the understanding of price/performance. These concepts are
`applied to the instruction set architecture and how it is measured. They discuss
`the implementation of processors and include extensive discussions of tech(cid:173)
`niques for designing pipelined and vector processors. Chapters are also devoted
`to memory hierarchy and the often-neglected input/output. The final chapter
`
`ix
`
`Ex.1035.013
`
`DELL
`
`
`
`x
`
`Foreword
`
`presents the opportunities and questions about machines and directions of the
`future. Now, we need their next book on how to build these machines.
`The reason this book sets a standard above all others and is unlikely to be
`superseded in any foreseeable future is the understanding, experience, taste, and
`uniqueness of the authors. They have stimulated the major change in architecture
`by their work on RISC (Patterson coined the word). Their university research
`leading to product development at MIPS and Sun Microsystems established
`important architectures for the 1990s. Thus, they have done the analysis,
`evaluated the trade-offs, worked on the compilers and operating systems, and
`seen their machines achieve significance in use. Furthermore, as teachers, they
`have seen that the book is pedagogically sound (and have solicited opinions
`from others through the unprecedented Beta testing program). I know this will
`be the book of the decade in computer systems. Perhaps its greatest
`accomplishment would be to stimulate other great architects and designers of
`higher-level systems (databases, communications systems, languages and
`operating systems) to write similar books about their domains.
`I've already enjoyed and learned from the book, and surely you will too.
`
`-C. Gordon Bell
`
`Ex.1035.014
`
`DELL
`
`
`
`1
`
`2
`
`3
`
`Computer Architecture: A Quantitative Approach
`
`xi
`
`Contents
`
`Foreword
`by C. GORDON BELL
`
`Preface
`
`Acknowledgements
`
`Fundamentals of Computer Design
`1.1
`Introduction
`1.2
`Definitions of Performance
`1.3
`Quantitative Principles of Computer Design
`1.4
`The Job of a Computer Designer
`1.5
`Putting It All Together: The Concept of Memory Hierarchy
`1.6
`Fallacies and Pitfalls
`1.7
`Concluding Remarks
`1.8
`Historical Perspective and References
`Exercises
`
`Performance and Cost
`2.1
`Introduction
`2.2
`Performance
`2.3
`Cost
`2.4
`Putting It All Together: Price/Performance of Three Machines
`2.5
`Fallacies and Pitfalls
`2.6
`Concluding Remarks
`2.7
`Historical Perspective and References
`Exercises
`
`Instruction Set Design:
`Alternatives and Principles
`3.1
`Introduction
`3.2
`Classifying Instruction Set Architectures
`3.3
`Operand Storage in Memory: Classifying General-Purpose
`Register Machines
`3.4
`Memory Addressing
`3.5
`Operations in the Instruction Set
`3.6
`Type and Size of Operands
`3.7
`The Role of High-Level Languages and Compilers
`3.8
`Putting It All Together: How Programs Use Instruction Sets
`3.9
`Fallacies and Pitfalls
`3.10 Concluding Remarks
`3.11 Historical Perspective and References
`Exercises
`
`ix
`
`xvii
`xx iii
`
`2
`3
`5
`8
`13
`18
`21
`22
`23
`28
`
`32
`33
`35
`53
`66
`70
`76
`77
`81
`
`88
`89
`90
`
`92
`94
`103
`109
`111
`122
`124
`126
`127
`132
`
`Ex.1035.015
`
`DELL
`
`
`
`xii
`
`4
`
`5
`
`6
`
`Contents
`
`Instruction Set Examples and
`Measurements of Use
`Instruction Set Measurements: What and Why
`4.1
`4.2
`The VAX Architecture
`The 360/370 Architecture
`4.3
`The 8086 Architecture
`4.4
`The DLX Architecture
`4.5
`Putting It All Together: Measurements
`4.6
`of Instruction Set Usage
`Fallacies and Pitfalls
`Concluding Remarks
`Historical Perspective and References
`Exercises
`
`4.7
`4.8
`4.9
`
`Basic Processor Implementation Techniques
`5.1
`Introduction
`5.2
`Processor Datapath
`5.3
`Basic Steps of Execution
`Hardwired Control
`5.4
`5.5
`Microprogrammed Control
`Interrupts and Other Entanglements
`5.6
`5.7
`Putting It All Together: Control for DLX
`5.8
`Fallacies and Pitfalls
`5.9
`Concluding Remarks
`5.10 Historical Perspective and References
`Exercises
`
`Pipelining
`6.1 What Is Pipelining?
`6.2
`The Basic Pipeline for DLX
`Making the Pipeline Work
`6.3
`The Major Hurdle of Pipelining-Pipeline Hazards
`6.4
`6.5 What Makes Pipelining Hard to Implement
`Extending the DLX Pipeline to Handle Multicycle Operations
`6.6
`6.7
`Advanced Pipelining-Dynamic Scheduling in Pipelines
`Advanced Pipelining-Taking Advantage of More
`6.8
`Instruction-Level Parallelism
`Putting It All Together: A Pipelined VAX
`6.9
`6.10 Fallacies and Pitfalls
`6.11 Concluding Remarks
`6.12 Historical Perspective and References
`Exercises
`
`138
`139
`142
`148
`153
`160
`
`167
`183
`185
`186
`191
`
`198
`199
`201
`202
`204
`208
`214
`220
`238
`240
`241
`244
`
`250
`251
`252
`255
`257
`278
`284
`290
`
`314
`328
`334
`337
`338
`343
`
`r
`
`Ex.1035.016
`
`DELL
`
`
`
`7
`
`8
`
`9
`
`Computer Architecture: A Quantitative Approach
`
`xiii
`
`Vector Processors
`7.1
`Why Vector Machines?
`7.2
`Basic Vector Architecture
`Two Real-World Issues: Vector Length and Stride
`7.3
`A Simple Model for Vector Performance
`7.4
`Compiler Technology for Vector Machines
`7.5
`Enhancing Vector Performance
`7.6
`7.7
`Putting It All Together: Evaluating the
`Performance of Vector Processors
`Fallacies and Pitfalls
`7.8
`Concluding Remarks
`7.9
`7.10 Historical Perspective and References
`Exercises
`
`Memory-Hierarchy Design
`Introduction: Principle of Locality
`8.1
`General Principles of Memory Hierarchy
`8.2
`8.3
`Caches
`Main Memory
`8.4
`8.5
`Virtual Memory
`Protection and Examples of Virtual Memory
`8.6
`More Optimizations Based on Program Behavior
`8.7
`Advanced Topics-Improving Cache-Memory Performance
`8.8
`Putting It All Together: The VAX-11/780 Memory Hierarchy
`8.9
`8.10 Fallacies and Pitfalls
`8.11 Concluding Remarks
`8.12 Historical Perspective and References
`Exercises
`
`Input/Output
`9.1
`Introduction
`9.2
`Predicting System Performance
`1/0 Performance Measures
`9.3
`Types of 1/0 Devices
`9.4
`Buses-Connecting 1/0 Devices to CPU/Memory
`9.5
`Interfacing to the CPU
`9.6
`9.7
`Interfacing to an Operating System
`Designing an 1/0 System
`9.8
`Putting It All Together:
`9.9
`The IBM 3990 Storage Subsystem
`9.10 Fallacies and Pitfalls
`9.11 Concluding Remarks
`9.12 Historical Perspective and References
`Exercises
`
`350
`351
`353
`364
`369
`371
`377
`
`383
`390
`392
`393
`397
`402
`403
`404
`408
`425
`432
`438
`449
`454
`475
`480
`484
`485
`490
`498
`499
`501
`506
`512
`528
`533
`535
`539
`
`546
`554
`559
`560
`563
`
`Ex.1035.017
`
`DELL
`
`
`
`xiv
`
`10
`
`Contents
`
`Future Directions
`10.1
`Introduction
`10.2 Flynn Classification of Computers
`10.3 SIMD Computers-Single Instruction
`Stream, Multiple Data Streams
`10.4 MIMD Computers-Multiple Instruction
`Streams, Multiple Data Streams
`10.5 The Roads to El Dorado
`10.6 Special-Purpose Processors
`10.7 Future Directions for Compilers
`10.8 Putting It All Together: The Sequent Symmetry
`Multiprocessor
`10.9 Fallacies and Pitfalls
`10.1 O Concluding Remarks-Evolution Versus
`Revolution in Computer Architecture
`10.11 Historical Perspective and References
`Exercises
`
`Appendix A: Computer Arithmetic
`by DAVID GOLDBERG
`Xerox Palo Alto Research Center
`Introduction
`A.1
`Basic Techniques of Integer Arithmetic
`A.2
`Floating Point
`A.3
`Floating-Point Addition
`A.4
`Floating-Point Multiplication
`A.5
`Division and Remainder
`A.6
`Precisions and Exception Handling
`A.7
`Speeding Up Integer Addition
`A.8
`Speeding Up Integer Multiplication and Division
`A.9
`A.1 O Putting It All Together
`A.11 Fallacies and Pitfalls
`A.12 Historical Perspective and References
`Exercises
`
`Appendix B: Complete Instruction Set Tables
`B.1
`VAX User Instruction Set
`B.2
`System/360 Instruction Set
`B.3
`8086 Instruction Set
`
`570
`571
`572
`
`572
`
`574
`576
`580
`581
`
`582
`585
`
`587
`588
`592
`
`A·1
`
`A-1
`A-2
`A-12
`A-16
`A-20
`A-23
`A-28
`A-31
`A-39
`A-53
`A-57
`A-58
`A-63
`
`B·1
`B-2
`B-6
`B-9
`
`Appendix C: Detailed Instruction Set Measurements C·1
`C.1
`VAX Detailed Measurements
`C-2
`C-3
`C.2
`360 Detailed Measurements
`C.3
`Intel 8086 Detailed Measurements
`C-4
`C.4
`DLX Detailed Instruction Set Measurements
`C-5
`
`Ex.1035.018
`
`DELL
`
`
`
`Computer Architecture: A Quantitative Approach
`
`xv
`
`Appendix D: Time Versus Frequency Measurements D·1
`D.1
`Time Distribution on the VAX-11 /780
`D-2
`D.2
`Time Distribution on the IBM 370/168
`D-4
`D.3
`Time Distribution on an 8086 in an IBM PC
`D-6
`D.4
`Time Distribution on a DLX Relative
`D-8
`
`Appendix E: Survey of RISC Architectures
`E.1
`Introduction
`E.2
`Addressing Modes and Instruction Formats
`E.3
`Instructions: The DLX Subset
`E.4
`Instructions: Common Extensions to DLX
`E.5
`Instructions Unique to MIPS
`E.6
`Instructions Unique to SPARC
`E.7
`Instructions Unique to M88000
`E.8
`Instructions Unique to i860
`E.9
`Concluding Remarks
`E.10 References
`References
`Index
`
`E·1
`E-1
`E-2
`E-4
`E-9
`E-12
`E-15
`E-17
`E-19
`E-23
`E-24
`
`R·1
`1·1
`
`Ex.1035.019
`
`DELL
`
`
`
`DELL Ex.1035.020
`Ex.1035.020
`
`DELL
`
`
`
`Computer Architecture: A Quantitative Approach
`
`xvii
`
`Preface
`
`I started in 1962 to write a single book with this sequence of chapters, but soon
`found that it was more important to treat the subjects in depth rather than to
`skim over them lightly. The resulting length has meant that each chapter by itself
`contains enough material for a one semester course, so it has become necessary
`to publish the series in separate volumes ...
`
`Donald Knuth, The Art of Computer Programming,
`Preface to Volume 1 (of 7) (1968)
`
`Why We Wrote This Book
`
`Welcome to this book! We're glad to have the opportunity to communicate with
`you! There are so many exciting things happening in computer architecture, but
`we feel available materials just do not adequately make people aware of this.
`This is not a dreary science of paper machines that will never work. No! It's a
`discipline of keen intellectual interest, requiring balance of marketplace forces
`and cost/performance, leading to glorious failures and some notable successes.
`And it is hard to match the excitement of seeing thousands of people use the
`machine that you designed.
`Our primary goal in writing this book is to help change the way people learn
`about computer architecture. We believe that the field has changed from one that
`can only be taught with definitions and historical information, to one that can be
`studied with real examples and real measurements. We envision this book as
`suitable for a course in computer architecture as well as a primer or reference for
`professional engineers and computer architects. This book embodies a new
`approach to demystifying computer architecture-it emphasizes a quantitative
`approach to cost/performance tradeoffs. This does not imply an overly formal
`approach, but simply one that is grounded in good engineering design. To
`accomplish this, we've included lots of data about real machines, so that a reader
`can understand design tradeoffs in a quantitative as well as qualitative fashion. A
`significant component of this approach can be found in the problem sets at the
`end of every chapter, as well as the software that accompanies the book. Such
`exercises have long formed the core of science and engineering education. With
`
`Ex.1035.021
`
`DELL
`
`
`
`xviii
`
`Preface
`
`the emergence of a quantitative basis for teaching computer architecture, we feel
`the field has the potential to move toward the rigorous quantitative foundation of
`other disciplines.
`
`Topic Selection and Organization
`
`We have a conservative approach to topic selection, for there are many
`interesting ideas in the field. Rather than attempting a comprehensive survey of
`every architecture a reader might encounter today in practice or in the literature,
`we've chosen the core concepts of computer architecture that are likely to be
`included in any new machine. In making these decisions, a key criterion has
`been to emphasize ideas that have been sufficiently examined to be discussed in
`quantitative terms. For example, we concentrate on uniprocessors until the final
`chapter, where a bus-oriented, shared-memory multiprocessor is described. We
`believe this class of computer architecture will increase in popularity, but despite
`this perception it only met our criteria by a ·slim margin. Only recently has this
`class of architecture been examined in ways that allow us to discuss it
`quantitatively; a short time ago even this wouldn't have been included. Although
`large-scale parallel processors are of obvious importance to the future, it is our
`feeling that a firm basis in the principles of uniprocessor design is necessary
`before any practicing engineer tries to build a better computer of any
`organization; especially one incorporating multiple uniprocessors.
`Readers familiar with our research might expect this book to be only about
`reduced instruction set computers (RISCs). This is a mistaken judgment about
`the content of this book. Our hope is that design principles and quantitative data
`in this book will restrict discussions of architecture styles to terms like "faster"
`or "cheaper," unlike previous debates.
`The material we have selected has been stretched upon a consistent structure
`that is followed in every chapter. After explaining the ideas of a chapter, we
`include a "Putting It All Together" section that ties these ideas together by
`showing how they are used in a real machine. This is followed by a section,
`entitled "Fallacies and Pitfalls," that lets readers learn from the mistakes of
`others.We show examples of common misunderstandings and architectural
`traps that are difficult to avoid even when you know they are lying in wait for
`you. Each chapter ends with a "Concluding Remarks" section, followed by a
`"Historical Perspective and References" section that attempts to give proper
`credit for the ideas in the chapter and a sense of the history surrounding the
`inventions, presenting the human drama of computer design. It also supplies
`references that the student of architecture may want to pursue. If you have time,
`we recommend reading some of the classic papers in the field that are mentioned
`in these sections. It is both enjoyable and educational to hear the ideas from the
`mouths of the creators. Each chapter ends with Exercises, over 200 in total,
`which vary from one-minute reviews to term projects.
`
`Ex.1035.022
`
`DELL
`
`
`
`Computer Architecture: A Quantitative Approach
`
`xix
`
`A glance at the Table of Contents shows that neither the amount nor the depth
`of the material is equal from chapter to chapter. In the early chapters, for
`example, we have more basic material to ensure a common terminology and
`background. In talking with our colleagues, we found widely varying opinions
`of the backgrounds readers have, the pace at which they can pick up new
`material, and even the order in which ideas should be introduced. Our
`assumption is that the reader is familiar with logic design, and has had some
`exposure to at least one instruction set and basic software concepts. The pace
`varies with the chapters, with the first half gentler than the last half. The
`organizational decisions were formed in response to reviewer advice. The final
`organization was selected to conveniently suit the majority of courses (beyond
`Berkeley and Stanford!) with only minor modifications. Depending on your
`goals, we see three paths through this material:
`
`Introductory coverage: Chapters 1, 2, 3, 4, 5, 6.1-6.5, 8.1-8.5, 9.1-9.5, 10,
`and A.1-A.3.
`
`Intermediary coverage: Chapters 1, 2, 3, 4, 5, 6.1-6.6, 6.9-6.12, 8.1-8.7, 8.9-
`8.12, 9, 10, A (except skip division in Section A.9),
`and E.
`
`Advanced coverage:
`
`