`
`I, Rachel J. Watters, am a librarian, and the Director of Wisconsin TechSearch
`
`("WTS"), located at 728 State Street, Madison, Wisconsin, 53706. WTS is an
`
`interlibrary loan department at the University of Wisconsin-Madison. I have worked as
`
`a librarian at the University of Wisconsin library system since 1998. I have been
`
`employed at WTS since 2002, first as a I ibrarian and, beginning in 2011, as the Director.
`
`Through the course of my employment, I have become well informed about the
`
`operations of the University of Wisconsin library system, which follows standard library
`
`practices.
`
`This Declaration relates to the dates of receipt and availability of the following:
`
`Hennessy, J.L. and Patterson, D.A. (1998) Computer
`Organization and Design: the Hardware/Software Interface. San
`Francisco, CA: Morgan Kaufmann Publishers.
`
`Standard operaNng procedures for materials at the University of Wisconsin-
`
`Madison Libraries. When a volume was received by the Library, it would be checked
`
`in, added to library holdings records, and made available to readers as soon after its
`
`arrival as possible. The procedure normally took a few days or at most 2 to 3 weeks.
`
`Exhibit A to this Declaration is true and accurate copy of the front matter of the
`
`Computer Organi::ation and Design: the Hardware/5ic~ftware Jnte1:face ( 1998)
`
`publication, which incl udcs a stamp on the verso page showing that this book is the
`
`property of the University of Wisconsin-Madison Libraries.
`
`1
`
`INTEL - 1013
`
`
`
`Declaration of Rachel J. Watters on Authentication of Publication
`
`Attached as Exhibit B is the cataloging system record of the University of
`
`Wisconsin-Madison Libraries for its copy of the Computer Organization and Design:
`
`the Hardware/Sofnvare Interface ( 1998) publication. As shown in the "Receiving date"
`
`field of this Exhibit, the University of Wisconsin-Madison Libraries owned this book
`
`and had it cataloged in the system as of June 6, 1999.
`
`Members of the interested public could locate the Computer Organization and
`
`Design: the Hardware/Software Interface (1998) publication after it was cataloged by
`
`searching the public library catalog or requesting a search through WTS. The search
`
`could be done by title, author, and/or subject key words. Members of the interested
`
`public could access the publication by locating it on the library's shelves or requesting it
`
`from WTS.
`
`I declare that all statements made herein of my own kQ.owledge are true and that
`
`all statements made on information and belief are believed to be true; and further that
`
`these statements were made with the knowledge that willful false statements and the like
`
`so made are punishable by fine or imprisonment, or both, under Section I 001 of Title I 8
`
`of the United States Code.
`
`Date: January 30, 2020
`
`Wisconsin TechSearch
`Memorial Library
`728 State Street
`Madison, Wisconsin 53706
`
`Director
`
`2
`
`INTEL - 1013
`
`
`
`SECOND ED IT ION
`
`Computer Organization and Design
`
`THE HARDWARE / SOFTWARE INTERFACE
`
`INTEL - 1013
`
`
`
`T R A D E M A R K S
`
`Th" following trad emarks arc the property ol the following organiz,,lions:
`
`T,·X is a trademark of America! M,1the matical Sodety.
`
`Ap ple II and Macintosh are t rad e mMks of Apple Compuk r~, Inc.
`CDC 6b0~, CDC 7b00, CDC STAR·lOO, CYBER-JSU. CYBl::R·
`1811/990, and CYBER-20.:; are trademarks of Control Data C,irpor,, (cid:173)
`tion.
`
`11,e Cosmic Cub" is a tr,,dcmark o f C.11ifornia lns Htutc of Te,:hnnl(cid:173)
`ogy.
`C PJJOIJ is a trademark of Conner Pe ripherals.
`Cra y. CllAY-1, CRAY J<IIJ, CRAY T<l0, C RAY X-MP/ 416, and
`CRAY Y-M I' are lr~d.imarks of Cray Res~arch.
`
`Alpha , AlphaSer,·t:r, AlphnStation, D EC, UEC~ystem, DECsyst.-m
`3100, DECstation, l'OP-8. PDP-11, Unib us, VAX, VAX 37ll0, and
`VAXl 1 /7&J MC trademarks of Digit,11 E<1uipmen t Corporation.
`MP2361A, Su per Eagle, VPIOO, VP2lllJ, and Vl'l'300,u,• tra demarks
`of Fujitsu Corpora tion.
`
`Gnu C Compiler is a trademark of Fre.i Softw,irc Fo111\t1ath:,n.
`
`Goodyear MPP is a trad emark of Goodyear Tire am\ Rubber Co.,
`Inc.
`Apollo ON 300, Apollo DN 10000, Corw~x. I IP, HP Pred,ion
`Archil~cturc, HPPA, H P850, HP 30Ull, HP 300/70, P/\-RTSC, and
`Prl!Cision M<' r.igist.ired trademarks of H.,wlet-l',1eka rd Cllmpany.
`432,960 CA, 4004, 8008, 8080, SU86, llll117, 8088, 80136, 11028b, 8038b,
`80480, Della, iAJ'X ~32. i8ol!, Intel, lnt.il4&, Intel Hypercub.-. iP(cid:173)
`SC/2, MMX, Mu ltib~s, Multibus JI, Parai,;on, and Pentium a re
`trado1marks of Intel Corporation. ln td ln~id o1 is a registered trade·
`m a rk vf Intel Corporation.
`
`360, 360/30, 3b0/ 40, 300/50, 3()()/C,5, 360/ 85, 3bll/<11, 370, 370/158,
`370/165, 370/ 168, 371>-XA, F.SA/370, 701, 704, 709, ~01, 3033, 3080,
`3080 series, 30RO VI', 3081, 3090, 30911/ltlO, 30'l0/20U, 3090/ 400,
`3090/600, 3090/6005, 30'10 VI', ,1330, 3380, 33801), 3380 Disk Modd
`AK4, 3380), 3390, 38110-23, J<J9il, 7090, 7W4, IBM, IBM l'C, IBM PC(cid:173)
`AT, IBM SVS, ISAM, MVS, Pl..8, Pow.,rPC. POWERstation, RT-I'C,
`RAMAC, RS/6000, Sage, Stretch, System/:160, Vector Faility . aml
`VM ar.- trademarks of International Business Machines Corpor,1•
`lion. PO WERserver, RISC Sy~tem/60Ull, ~nd SP2 nrc rcKistered
`trademarks <>f ln t<:mational Business Machines Corp<>r,, tiun.
`
`ICL DAI' is~ tradt.'mark nf lnterMtional C nmputers Lim\to!d.
`In mos and Transputer Mc trademarks of In mos.
`
`FulurcBus is a tradcmMk of the Institute o f 1::lectrical and Electron(cid:173)
`ic l:ngin.,ers.
`
`KSR-1 is a trademark of Kendall S,1u,,re Research.
`
`MASP/\R MP-1 And MASPAR MI'-2 ,,r~ tr~dcmarks of MMPar
`Corp<•rilti,m.
`Mll"S, R2000, R31)0U, ,,ml RI 0000 are regis te red trad"marks o f
`MIPS Tcdm,,logy, Inc.
`
`Windows is a trademark of Microsoft Corporation.
`
`Nu Bus is., tr.1dcmark of Massachu~••tts Institute of Technology.
`D•lta Series l:!61!~, System V/AA R32V1, VME bus, b809, b8UOO.
`btlOIO, 1>81120, hROJll, ol\881, (>81!82, 8800U, K!«l<lO 1.8 .4m14, ~8100,
`and ~ll20IJ Me trademarks uf Mt>tnmla Corf)(>ration.
`
`Ncube an d nCu bc / ten are tradt1m,uks of Nt-ube Corporation.
`NEC is a ro,gl~tercd tr,1d~n,ark of :--JECCurporation.
`Network Com pute r is a trademark of Ornd• Corpor,,tion.
`
`l'a rsyt,·c C.C is a tT.iti~mark o f l'arsytcc, Inc.
`lm primi,, I l'l-2, S.1br.i, Sabre '17W<i, S¢ai;atl', ,md Wren IV are
`trad,•marks of SeaKat~ Teclnmloi,;y, Inc.
`
`NUMA•Q , S...")""111, and Symmd ry are trademarks of Sequent
`Cllmputcrs.
`
`Power C h,,11.,ngc, Silicon Gra phics, Silicon Graphics 43/ 240,
`5ilicon Gr,,phlc~ 4D/ 611, Silicon Grnphic. 4D/240, and Silkun
`Gr,,p hics 4 D !kri.-$ are trademarks of SIiicon Graphics. O rlgin2000
`is a registered trademark of Silicon Graphics.
`SPEC is a r'-'11isle rL'<f lr,,d ~mark of the Standard Performance Eval(cid:173)
`uatio n Corpora tion.
`Spice is a trad~milrk o( Uniwn-ity of California al Berkeley.
`
`En k>rprise, Java, Sun, 5un Ultra, Sun Microsystems, and Ultra are
`trademarks o f Su n Microsystems, Inc. SPARC and UltraSPARC
`a,., registered trademarks of SPARC In terna tional, Inc., licensed tv
`Sun Microsy.stenis, Inc.
`
`Conncctit>n Machin e, C M· 2, and C M-.5are trad emarks of Thinking
`Machines.
`Burroui,;hts b511ll, B500ll, B5500, 0 -machine, UNIV /\C, U NIV AC !,
`and U NIV AC 1103 are trademarks of UN !SYS.
`
`Alto, PARC, P,,lo Alto Rcs..,arch Cc11tcr. and Xerox are trademarks
`of Xerox Corpor,,ti<•n.
`Tht1 UNIX tradcn,ark i~ licensed excluslw ly through X/ Ope11
`Comp,,ny Ltd.
`
`All 0 th..,. product nanws arc trad emarks or rcgiskred trademarks
`of their respcctiv.., m mJMni,.,., Where tradema rks ap pear in this
`book and Morgan K,rnfm,,nn Publishe rs w as aw.ue of a tr,,d.,mark
`dai,n, the tr.,dcmarks haw bcc11 print~d in initial caps <>r a ll ca ps.
`
`INTEL - 1013
`
`
`
`S E C O N D ED IT ION
`
`Computer Organization and Design
`
`THE HARDWARE / SOFTWARE
`
`INTERFACE
`
`John L. Henne"y
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With a contribution by
`James R. Larus
`University of Wisconsin
`
`M~(cid:141)
`
`Morgan Kaufmann Publishers, Inc.
`San Francisco, California
`
`INTEL - 1013
`
`
`
`Kurt 1<. Wendt Librnry
`University of Wizconsii,-Madison
`215 N. Randall A11enue
`Madison, WI 53706-1680
`
`Spon~rlng Editor Denise Pcnrust:
`Production Manager Yonic Overhm
`Production Editor Julie Pabst
`Editorial Coordinator Jane Elliott
`Text and Cover Dealgn Ross Carron Design
`Uluetratlon Alexand~r Tesb ln Associates, with second edition modifications by Dartmouth
`Publishing, Inc.
`Chapter Opener Illustrations Canary Studios
`Copyedltor Ken De!lilPcnt,1
`Composition Nancy Lc,~an
`Proofreader ),mnifcr McClain
`Indexer Steve Rath
`Printer Courier Corporation
`
`Morgan Kaufmann Pul,lisher.i, Inc.
`Editorial and Sales Office:
`340 Pine Street, Sixth Floor
`San Fnmcisco, CA <,(cid:157) J0-1-320~
`USA
`
`Tdephone
`-115/392-26~5
`F-'<:simile 415/<182-2!>65
`Email mkp@mkf• . .-o1111
`WWW hftp;//11•111w.mk11.com
`Onfor toll fl>'(' 800/745-7323
`
`•£> 1998 by Morgan Kaufmann Publishers, Inc.
`All rights reserved
`Printed in lht: United States t'f Americd
`
`()2 01 00 <l<J
`
`5 4 3 2
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
`or by any means-elt'ctronic, mechanical, photocopying, recording, or otherwise-without the prior
`written permission of the publisher.
`
`Advice, Prelse, and Errors: Any cmr,·~pondence rel,11,,d to this p ublication nr int<>ndcd for the nuthnrs
`shouM be st!nt dectronically to n n/21•11s,@111kj1.COIII. lnform,1tinn ,egarding error sightings is ,•11cournged.
`Any t>rrm sightings that ar,· i\cc:cptl'd for corredion in subsequent p rintin11s will be rewarded by th,,
`,rnthors with ,1 pa>•mcnt of SUH) (U.S.) per correction at the linw of their implcnwntalion in a reprint.
`
`Library of Congress Cataloglng-iit-Publlcatlon Data
`Patterson, Dav id /1..
`Co rnput.-r org,,nization ,md d,•s ign : the ha rdw.nc / software int,·rlace
`I D,wid A. Palterson, John L. H,mnessy.-2nd ed .
`p. cm.
`Includes bibliographic,,! rderences and index.
`IS!JN 1-5;,X60--lQI-X (p:1pcr)
`ISBN l·SS~hll--l21H> (dnth).-
`1. Comp1•t~r organization. 2. c,.»nputers-Desi11,n and constructio n.
`I. li,•nnt'~sy, John L.
`II. Tille
`3. Compute r interfaces.
`QA76.<J.CM3H46
`1997
`tl04.2'2-dc2.I
`
`<J7-Jf,(}5U
`
`INTEL - 1013
`
`
`
`viii
`
`Content•
`
`Contents
`
`Foreword vi
`by John H. Crawfl11·d
`
`Worked Examples xiii
`
`Computer Organization and Design Online xvi
`
`Preface xix
`
`CHAPTERS
`
`a Computer Abstractions and Technology 2
`
`J...,t.
`1.2
`·--1.a
`":i.4
`._..t..5
`1.e
`.' 1.7
`__ 1.8
`1.9
`. -1.t.O
`
`Introduction 3
`Below Your Program 5
`Under the Covers 10
`Integrated Circuits: Fueling Innovation 21
`Real Stuff: Manufacturing Pentium Chips 24
`Fallaclea and Pitfalls 29
`Concluding Remarks 30
`Hiatorical Perapective end Further Reading 32
`KeyTerms 44
`Exercises 45
`
`II The Role of Performance s2
`
`2,1
`lntrocluction 54
`2.2 Measuring Performance 58
`2.3 Relating the Metric• 60
`2.4 Clloosing Programs to Evaluate Performance 66
`2 .5 Comparing and Summarizing Performance 69
`2.8 Real Stuff: The SPEC95 Benchmark• and Performance of Recent
`Processors 71
`2. 7 Fallaciea and Pitfall& 75
`2.8 Concluding Remarka 82
`
`INTEL - 1013
`
`
`
`Contents
`
`ix
`
`2.9 Historical Perspective and Further Reading 8:1
`2.10 Key Terms 8CJ
`2.11 Exercises 90
`
`-II Instructions: Language of the Machine
`
`104
`
`Introduction 106
`3.1
`3.2 Operations of the Computer Hardware 107
`3.3 Operands of the Computer Hardware 109
`3.4 Representing Instructions in the Computer 116
`Instructions for Making Decisions 122
`3.5
`3,6 Supporting Procedures In Computer Hardware 132
`3.7 Beyond Numbers 142
`3.8 other Styles of MIPS Addressing 145
`3.9 Starting a Program 15b
`3.10 An Example to Put It All Together 163
`3.11 Arrays versus Pointers 171
`3.12 Real Stuff: PowerPC and 80x86 Instructions 175
`3 .13 Fallacies and Pitfalls 185
`3.14 Concluding Remarks 11'7
`3.15 Historical Perspective and Further Reading 18'1
`3,16 Key Terms 196
`3.17 Exercises 19h
`
`a Arithmetic for Computers 20H
`
`4.1
`Introduction 210
`':,4 ,2 Signed and Unsigned Numbers 210
`_, 4,3 Addition and Subtraction 220
`4.4 Logical Operations 225
`4.5 Constructing an Arithmetic Logic Unit 2:10
`4.6 Multiplication 250
`4. 7 Division 265
`4.8 Floating Point 275
`4.9 Real Stuff: Floating Point in the PowerPC and 80x86 301
`4.10 Fallacies and Pitfalls 304
`.11 Concluding Remarks 3()8
`4 .12 Historical Perspective and Further Reading 312
`4.13 Key Terms 322
`4.14 Exercises 322
`
`II The Processor: Datapath and Control 330
`
`5 1
`Introduction 338
`U,- Building a Datapath 343
`5.3 A Simple Implementation Scheme 351
`
`INTEL - 1013
`
`
`
`X
`
`Contents
`
`5.4 A Multicycle Implementation 377
`5.5 Microprogramming: Simpllfylng Control Design 399
`5.6 Exceptions 410
`5.7 Real Stuff: The Pentium Pro lmplementatlon 416
`5.8 Fallacies and Pitfalls 419
`5.9 Concluding Remark& 421
`5.10 Historical Perspective and Further Reading 423
`5.11 Kay Terms 426
`5.12 Exercises 427
`
`II Enhancing Performance with Pipelining
`
`434
`
`~ An Overview of Pipelining 436
`6.2 A Pipelined Datapath 449
`6.3 Pipelined Control 466
`6.4 Data Ha:rards and Forwarding 476
`6.5 Data Hazards and Stall• 489
`6.6 Branch Hazards 496
`8. 7 Exceptions 505
`6.8 Supenicalar and Dynamic Pipelining 510
`6.9 Real Stuff: PowerPC 604 and Pentium Pro Pipelines 517
`6.10 Fallacies and Pitfalls 520
`6.11 Concluding Remarks 521
`6.12 Hlstoricel Perspective and Further Reading 525
`6.13 Key Terms 529
`6.14 Exercises 529
`
`(cid:127) Large and Fast: Exploiting Memory Hierarchy 538
`
`Introduction 540
`7.1
`7.2 The Basics of Caches 545
`7.3 Measuring and Improving Cache Performance 564
`7.4 Virtual Memory 579
`7.5 A Common Framework for Memory Hierarchies 603
`7.8 Real Stuff: The Pentium Pro and PowerPC 604 Memory Hierarchies 611
`7.7 Fallacies and Pitfalls 615
`7,8 Concluding Remarks 618
`7.9 Historical Perspective and Further Reading 621
`7.10 Key Terms 627
`7.11 Exercises 628
`
`a Interfacing Processors and Peripherals 636
`
`Introduction 638
`8.1
`8.2 1/0 Performance Measures: Some Examples from Disk and FIie
`Syetems 641
`8.3 Types and Characteristic(cid:127) of 1/0 Devices 644
`
`INTEL - 1013
`
`
`
`Content"
`
`xi
`
`8.4 Buses: Connecting 1/0 Devices to Processor and Memory b55
`Interfacing 1/ 0 Devices to the Memory, Processor, and Operating
`8.5
`System 673
`8.6 Designing an 1/0 System b84
`8.7 Real Stuff: A Typical Desktop 1/ 0 System 687
`8.8 Fallacies and Pitfalls 688
`8.9 Concluding Remarks 690
`8.10 Hlstorical Perspective and Further Reading
`8.11 Key Terms 700
`8.12 Ell.ercises 700
`
`t,94
`
`a Multiprocessors 710
`
`Introduction 712
`9.1
`9.2 Programming Multiprocesaors 714
`9.3 Multiprocessors Connected by a Single Bus 717
`9.4 Multiprocessors Connected by a Network 727
`9.5 Clusters 734
`9.6 Network Topologies 73h
`9. 7 Real Stuff: Future Directions for Multiprocessors 740
`9,8 Fallacies and Pitfalls 743
`9.9 Concluding Remarks-Evolution versus Revolution in Computer
`Architecturo 746
`9.10 Historical Perspective and Further Reading 748
`9.11 Key Terms 756
`9.12 Exercises 750
`
`A-2
`
`APPENDICES
`
`II
`
`Assemblers, Linkers, and the SPIM Simulator
`by James R. um1s, Uni,•crsity of Wisconsin
`A.1
`Introduction A-3
`A.2 Assemblers A-10
`A.3 Linkers A-17
`A.4 Loading A-19
`/\-20
`A.5 Memory Usage
`A.6 Procedure Call Convention A -22
`A, 7 Exceptions and Interrupts A-32
`A.8
`Input and Output A-3t,
`A.9 SPIM A-38
`A.10 MIPS R2000 Assembly Language A-49
`A.11 Concluding Remarks /\-75
`A.12 Key Terms A -7o
`A.13 Exercises A-7n
`
`INTEL - 1013
`
`
`
`xii
`
`Contents
`
`II The Basics of Logic Design B-2
`
`Introduction B-3
`B.1
`8.2 Gates, Truth Tables, and Logic Equations B-4
`B.3 Combinational Logic B-8
`B.4 Clocks B-18
`B.S Memory Elements B-21
`B.6 Finite State Machines B-35
`B. 7 Timing Methodologies B-39
`B.8 Concluding Remarks B-44
`B.9 Key Terms B-45
`B.10 Exercise& 8-45
`
`II Mapping Control to Hardware c-2
`
`Introduction C-3
`C.J.
`Implementing Combinational Control Units C-4
`C.2
`C.3
`Implementing Finite State Machine Control C-8
`Implementing the Next-State Function with a Sequencer C-21
`C.4
`C.5 Translating a Microprogram to Hardware C-28
`C.8 Concluding Remarks C-31
`C. 7 Key Terms C-32
`C.B Ellerclses C-32
`
`Glossary G-1
`
`Index 1-1
`
`INTEL - 1013
`
`
`
`K.F. WENDT LIBRARY
`UW COLLEGE OF ENGR.
`215 N. RANDALL t\VENUE
`MADISON ,. 11 53706
`
`INTEL - 1013
`
`
`
`234
`
`Chapter 4 Arithmetic for Computers
`
`The Sum bit is set when exactly one input is 1 or when all three inputs are 1..
`The Sum results in a complex Boolean equation (recall that a mec1ns NOT a):
`Sum = (a· b · Carryln) +{a· b · Carry In)+ (a· b · Carryln) +(a · b · Carryln)
`
`)· 1[) The drawing of the logic for the Sum bit in the adder black box is left as an
`
`/
`
`exercise (see Exercise 4.43).
`Figure 4.14 shows a I-bit ALU derived by combining the c1dder with the ear-
`lier components. Sometimes designers also want the ALU to perform a few
`more simple operations, such as generating 0. The easiest way to add an oper-
`11tion is to expand the multiplexor controlled by the Operation line and, for this
`example, to connect 0 directly to the new input of that expanded multiplexor.
`
`A 32-Bit ALU
`Now that we have completed the 1-bit ALU, the full 32-bit ALU is created by
`connecting adjacent ''black boxes." Using xi to mean the ith bit of x, Figure 4.15
`shows a 32-bit ALU. Just as a single stone can cause ripples to radiate to the
`shores of a quiet lake, a single carry out of the least significant bit (Result0)
`can ripple all the way through the adder, causing a carry out of the most sig(cid:173)
`nificant bit (Result31). Hence, the adder created by directly linking the cc1rries
`of 1-bit adders is called a ripple carry adder. We'll see a faster way to connect
`the 1-bit adders starting on page 241.
`
`Operation
`Carryln
`
`Result
`
`1
`
`2
`
`;..
`
`FIGURE 4.14 A 1•blt ALU that performs AND, OR, and addition (see Flgure 4.13).
`
`CarryOut
`
`INTEL - 1013
`
`
`
`4.5 Conltl"l.lctlng an Arithmetic Logic Unit
`
`235
`
`Carryln
`
`Operation
`
`ao
`
`bO
`
`al
`
`b1
`
`a2
`
`b2
`
`Carryln
`
`ALUO
`
`Carryout
`
`Carryln
`
`ALU1
`
`Carryout
`
`Carryln
`
`ALU2
`
`Carryout
`
`ResultO
`
`Resultl
`
`Result2
`
`a31
`
`b31
`
`Carryln
`
`ALU31
`
`1------- Result31
`
`FIGURE 4.15 A 32-blt ALU con1tructed from 32 1-blt ALU•. CarryOut of the less significant
`bit is connected to the Carryln of the more significant bit. Thi~ organizotiun is callt'CI ripple carry.
`
`Subtraction is the same as adding the negative version of an operand, and
`this is how adders perform subtraction. Recall that the shortcut for negating a
`two's complement number is to invert each bit (sometimes called the one's co111-
`plcmc11f as explained in the elaboration on page 219) and then add 1. To invert
`each bit, we simply add a 2:1 multiplexor that chooses between b and b, as
`Figure 4.16 shows.
`Suppose we connect 32 of these 1-bit ALUs, as we did in Figure 4.15. The
`added multiplexor gives the option of b or its inverted value, depending on
`Binvert, but this is only one step in negating a two's complement number. No(cid:173)
`tice that the least significant bit still has a Carryln signal, even though it's un(cid:173)
`necessary for addition. What happens if we set this Carryln to l inste21d of O?
`
`INTEL - 1013
`
`
`
`236
`
`Chapter 4 Arithmetic for Computers
`
`Binvert
`
`Operation
`Carryln
`
`a --1---- -1 - - -- - i
`
`0
`
`1
`
`2
`
`Result
`
`Carryout
`
`FIGURE 4.16 A 1-blt ALU that performs AND, OR, and addition on a and b or a and b, By
`selecting b Cl3invert = 1) and selling Carryln to I in thi: lens! significant bit of the ALU, we get
`two's complement ,ubtrnction of b from .1 in!-IPad of ~ddition of b to a .
`
`The adder will then calculate c1 + b + 1. Dy selecting the inverted version of b,
`we get exactly what w~ ant:
`;;_
`-
`.
`a + b + Ii = a + (b + I) == a + ( -b) = a - b
`'
`The simplicity of the hardware design of a two's complement adder helps
`explain why two's complement representation hns become the universal stan·
`dard for integer computer arithmetic.
`
`Tailoring the 32-Bit ALU to MIPS
`This set of operations-add, subtract, AND, OR-is found in the ALU of
`almost every computer. If w e look at Figure 4.7 un page 228, we see that the
`operations of most MIPS instructions can be performed by this ALU. But the
`design of the ALU is incomplete.
`One instruction that still needs support is the set on less than instruction
`(s1 t). Rccdll that the operation produces 1 if rs< rt, and O otherwise. Conse·
`,1ucntly, s l t will set all but the least significant bit to 0, with the least signifi(cid:173)
`cant bit set according to the comparison. For the ALU to perform s 1 t, we first
`need to expand the three-input multiplexor in Figure 4.16 to add an input for
`the s l t result. We call that new input Less. and use it only for s l t .
`
`X
`
`..
`
`INTEL - 1013
`
`
`
`i
`
`,(
`
`(
`
`4 .5 Constructing an Arithmetic Logic Unit
`
`237
`
`,
`
`The top dra\•ving of Figure 4.17 shows the nev,, 1-bit ALU w ilh the expanded
`multiplexor. From the description of s l t above, we must connect Oto the Less
`input for the upper 31 bits of the ALU, since those bits me always set to O. What
`remains to consider is how to compare and set the least sig11ificn11t bit for set on
`less than instructions.
`What happens if we subtract b from a? If the difference is negative, then a<
`b since
`')
`f.
`
`(a-b) < O(cid:157)
`((a - b)+b)<(O+b)
`(cid:157) a<b
`We want the least significant bit oL.a sel on less than operation to be a l il a <
`b; t~at is, a 1 if a - bis neg__ativc and a O if it's positive. This desired result cor(cid:173)
`responds exactly to the sign-bit values: 1 means negative and O means posi(cid:173)
`tive. Following this line of argument, we need only connect the sign bit from
`the adder output to the least significan t bit to get set on less than.
`Unfortunately, the Result output from the most significant ALU bit in the
`top of Figure 4.17 for the s l t operation is not the output of the adder; the ALU
`Ol1tput for the s l t operation is obviously the input value Less.
`Thus, we need a new 1-bit ALU for the most significant bit that has an extra
`output bit: the adder output. The bottom d rawing of Figure 4.17 shows the de(cid:173)
`sign, with this new adder output line called Set, and used only for s l t. As long
`as we need a special ALU for tht! most significant bit, we added the overflow
`detection logic since it is also associated with that bit.
`Alas, the test of less than is a little more complicated than just described be(cid:173)
`cause of overflow; Exercise 4.23 on page 326 explores what must be done.
`Figure4.18 shows the 32-bit ALU.
`Notice that every time we want the ALU to subtract, we set both Carryln
`and Binvert to 1. For adds or logical operations, we want both control lines to
`be 0. We can therefore simplify control of the ALU by combining the Carry In
`and Binvcrt to a single control line called B11egnte.
`To furth er tailor the ALU to the MIPS instruction set, we must support con(cid:173)
`ditional branch instructions. These instructions branch either if two registers
`are equal or if they are unequal. The easiest way to test equality with the ALU
`is to subtract b from a and then test to see if the result is O since
`(a - b = 0) (cid:157)
`a = b
`Thus, if we add hardware to test if the result is 0, we can test for equality.
`The simplest way is to OR all the outputs together and then send that signal
`through an inverter:
`
`Zero = (Result31 + Result30 + . .. + Result2 + Resultl + ResultO)
`Figure 4.19 shows the revised 32-bit ALU. We can think of the combination
`of the 1-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for
`
`INTEL - 1013
`
`
`
`238
`
`Chapter 4 Artthrnetlc for Computer•
`
`Blnvert
`
`Operation
`Carryln
`
`a_,._- ----+----...i
`
`0
`
`1
`
`2
`
`Result
`
`Less -+-----------+---+1 3
`
`Carryout
`
`B1nvert
`
`Operation
`Carryln
`
`a -+-----+----+--i
`
`0
`
`1
`
`2
`
`Result
`
`Less -+ - - -----+-11-+---1t--tr-+13
`
`- - - - -set
`
`Over11ow
`detection
`
`r-----+-... Overflow
`
`FIGURE 4.17 (Top) A 1·blt ALU that perform• AND, OR, and addllloa on II and b orb, and
`(bottom) • 1•blt ALU for the moat elgnlflcant bit. The top draw in~ includes n direct input th.it
`is C'Onnected to perform the set on less than operiltinn (see Fi~urc 4. 18); the bottom has a direct out·
`put from the adder for the less than compariso11 called Set. (Refer to Exercis.- 4.42 to sw how to c:il·
`culate overflow with fower inputs.)
`
`INTEL - 1013
`
`
`
`4.1 Constructing •• Arithmetic Logic Unit
`
`239
`
`Binvert
`
`Carryln
`
`Operation
`
`ao
`bO
`
`a1
`b1
`0
`
`a2
`b2
`0
`
`Less
`Carryout
`
`Carryln
`ALU1
`
`ResultO
`
`=lesultl
`
`Carryln
`ALU2 1------1---+ Result2
`Less
`Carryout
`
`Carryln
`
`a31
`b31
`0
`
`Carryln 1------- Result31
`1 - - - - - - Set
`ALU31
`1-------.. ... Overflow
`Less
`
`FIGURE 4.18 A 32-blt ALU constructed from the 31 copies of the 1.-1t ALU In the top of
`Figure 4.17 and one 1•blt AW In the bottom of that figure. The Less inputs are connected to
`0 except for the least significant bit, and that is connected to th<! Set output of the most significant
`bit. If the ALU performs a - b and we select th<! input 3 in the multiplexor in Figure 4.17, then
`Result = 0 ... 001 if n < b, and Result = 0 .. . 000 otherwise.
`
`the ALU, telling it to perform add, subtract, AND, OR, or set on less than.
`Figure 4.20 shows the ALU control lines and the corresponding ALU opera(cid:173)
`tion.
`Finally, now that we have seen what is inside a 32-bit ALU, we will use the
`universal symbol for a complete ALU, as shown in Figure 4.21.
`
`INTEL - 1013
`
`
`
`240
`
`Chapter 4 Arithmetic for Computer•
`
`Bnegate
`
`Operation
`
`ao
`bO
`
`al
`bl
`0
`
`a2
`b2
`0
`
`ResultO
`
`Resultl
`
`Result2
`
`Carryln
`ALU1
`Less
`Carryout
`
`Carryln
`ALU2
`Less
`Carryout
`
`[p-,,m
`
`a31
`b31
`0
`
`Result31
`Carryln
`1------, Set
`ALU31
`Less 1---- -+--- -- -- -- -----+ Overflow
`
`FIGURE 4,19 The final 32-blt ALU. This adds a Zero detector to Figure 4.18.
`
`ALU control lines
`
`Function
`
`000
`001
`010
`110
`111
`
`r - -
`
`and
`or
`add
`subtract
`set on less than
`
`FIGURE 4.20 Th• value& of the t,.ree ALU c:ontrol lines Bnegate and Operation and the
`corresponding ALU operations.
`
`INTEL - 1013
`
`
`
`4.5 Com1tructlng an Arithmetic Logic Unit
`
`241
`
`ALU operation
`
`Zero
`Result
`Overflow
`
`carryout
`
`FIGURE 4.21 The symbol co111monly used to represent an ALU, as shown In Figure 4.19.
`Thi~ symb()I is also used to r1cprc~ent an ,,ddcr, s,, it is norm,,lly lnbelcu either with ALU or
`Adder.
`
`Carry Lookahead
`The next quc::;tion is, How quick1y cc1n this ALU <1dd two 32-bit operands? We
`Ci\n d etermine the a and b inputs, but the Carry In input depends on the oper(cid:173)
`ation in the adjacent 1-bit adder. If w e trace all the way through the chain of
`dependencies, we connect the most signific.int bit to the le<1st significant bit,
`so the most significant bit of the sum must wait for tlv.~ scqucntinl evalu.ition of
`all 32 1-bit adders. This sequential chilin reaction is too slow to be used in
`time-criticc1l hardware.
`There are a variety of schemes to anticipate the car ry so that the worst-case
`scenario is a function of the log2 of the number of bits in the adder. These a n(cid:173)
`ticipa tory signals are foster because they go through few er gates in sequence,
`but it takes many more gates to anticipate the proper carry.
`A key to understanding fast carry schemes is to remember that, unlike soft(cid:173)
`ware, hardware executes in parallel whenever inputs change.
`
`Fast Carry Using "Infinite" Hardware
`Appendix B mentions that any equation can be represented in two levels of
`logic. Since the only external inputs are the two upe rnnds a nd the Carryln to
`the least significant bit of the adder, in theory we could calculate the Carry In
`values to all the remaining bits of the adder in just twu levels of logic.
`For example, the Carry In for bit 2 of tlw adder is exactly the CarryOut of l;iit
`l, so the formirla is
`Carryin2 == ( b I · Carry In 1 }+ ( a I · Carry lnl J + (a I · b I )
`
`Simila rly, Carryln1 is defined as
`Carrylnl = l bll - Carryln0J + (a0 · Carryin0) + (aO · bO)
`
`INTEL - 1013
`
`
`
`-
`
`250
`
`Chapter 4 Arithmetic for Computera
`
`(cid:127)
`
`MuHipllcation
`
`Multiplicafio11 is vexnlion,
`Division is as bad;
`The rnle of three doth puzzlr: me,
`And pmctici: drives me mad.
`
`Anonymous, Elizabethan manuscript, 1570
`
`With the construction of the ALU and explanation of addition, subtraction,
`and shifts, we are ready to build the more vexing operation of multiply.
`• But first let's review the multiplication of decimal numbers in longhand to
`remind ourselves of the steps and the names of the operands. For reasons that
`will become clear shortly, we limit this decimal example to using only the dig(cid:173)
`its O and 1. Multiplying lOOOtt!n by 10011en;
`
`Multiplicand
`Multiplier
`
`lOOOten
`· 1001 ten
`
`X
`
`Product
`
`1000
`0000
`0000
`1000
`1001 OOOten
`The first operand is called the 11111/tiplimnd and the second the multiplier. The
`final result is called the product. As you may recall, the algorithm learned in
`grammar school is to take the digits of the multiplier one at a time from right
`to left, multiplying the multiplicand by the single digit of the multiplier and
`shifting the intermediate product one digit to the left of the earlier inter(cid:173)
`mediate products.
`The firs t observation is that the number of digits in the product is consider(cid:173)
`ably larger than the number in either the multiplicand or the multiplier. In fact,
`if we ignore the sign bits, the length of the multiplication of an 11-bit multipli(cid:173)
`cand and an 111-bit multiplier is a product that is I!+ m bits long. That is, 11 + m
`bits are required to represent all possible products. Hence, like add, multiply
`must cope with overflow because we frequently want a 32-bit product as the
`res ult of multiplying two 32-bit numbers.
`In this example we restricted the decimal digits to O and 1. With only two
`choices, each step of the multiplication is simple:
`
`1. Just place a copy of the multiplicand (1 x multiplicand) in the proper
`place if the multiplier digit is a 1, or
`
`2. Place O (0 x m ultiplicand) in the proper place if the digit is O.
`
`INTEL - 1013
`
`
`
`4.6 Multlpllcatlon
`
`2U
`
`Although the decimal example above happened to use only O and 1, multipli(cid:173)
`cation of binary numbers must always use O ,rnd 1, and thus always offers
`only these two choices.
`Now that we have reviewed the bnsics of multiplication, the traditional next
`step is to provide the h ighly optimized multiply hardware. We break with tra(cid:173)
`dition in the belief that you will gain a better understanding by seeing the evo(cid:173)
`lution of the multiply hardwc1re and algorithm through three generations. The
`rest of this section presents successive refinements of the hardware and the
`algorithm until we have a version used in some computers. For now, let's
`assume that we are multiplying only positive numbers.
`
`First Version of the Multiplication Algorithm and Hardware
`The initial design mimics the algorithm we learned in grammar school; the
`hardware is shown in Figure 4.25. We have drawn the hardware so that data
`flows from top to bottom to more closely resemble the paper-and-pencil
`method.
`Let's assume that the multiplier is in the 32-bit Multiplier register and that
`the 64-bit Product register is initialized to 0. From the paper-and-pencil exam(cid:173)
`ple above, it's dear that i.ve will need to move the multiplicand left one digit
`each step as it may be added to the intermediate products. Over 32 steps a
`
`--Mult1phcand
`
`Shirt ie1't
`
`64-bit ALU
`
`Product
`
`64 bits
`
`Write , __ ....,
`
`Control test
`
`-Multiplier
`
`Shi~ right
`32 bits
`
`FIGURE 4.25 First version of tlle multlplic:atlon h11rdw11re. The Multiplicand regbter, ALU.
`and Product rcgbter arc all r,4 bil~ wide, with only tht> Multiplier register t·ontainin~ :12 bit~. The
`32-bit m11ltiplic,md :,,tarts in the right half of the Multiplicand register, ,rnd is shifted left 1 bit nn
`each step. The multiplier is ~hifted in the opposite direction at e,Kh step. The al~orithm starts
`with the product infti,ilized toll. Control decides wlwn to ~hift the ~foltiplic,rnd and \.lultiplier
`n:gi~ters ,md when tu write new ,·alue:,, into the Prnduct n:gister.
`
`INTEL - 1013
`
`
`
`252
`
`Chapter 4 Arithmetic for Computer•
`
`-
`
`32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Mul(cid:173)
`tiplicand register, initialized with the 32-bit multiplicand in the right half and
`O in the left half. This register is then shifted left