`
`Computer Organization and Design
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With a contribution by
`James R. Larus
`University of Wisconsin
`
`M ::(cid:141)
`
`Morgan Kaufmann Publishers, Inc.
`San Francisco, California
`
`INTEL - 1012
`
`
`
`Indexer Steve Rath
`Printer Courier Corporation
`
`Morgan Kaufmann Publishers, Inc.
`Editorial and Sales Office:
`340 Pine Street, Sixth Floor
`San Francisco, CA 94104-3205
`USA
`
`Telephone 415/392-2665
`Facsimile 415/982-2665
`111kp@111kp.co111
`WWW
`lzttp://www.111kp.co111
`Order toll free 800/745-7323
`
`© 1998 by Morgan Kaufmann Publishers, Inc.
`All rights reserved
`Printed in the United States of America
`
`02 01 00 99
`
`5 4 3 2
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
`or by any means-electronic, mechanical, photocopying, recording, or otherwise--without the prior
`written permission of the publisher.
`
`Advice, Praise, and Errors: Any correspondence related to this publication or intended for the authors
`should be sent electronically to cod2l111gs@111kp.co111. Information regarding error sightings is encouraged.
`Any error sightings that are accepted for correction in subsequent printings will be rewarded by the
`authors with a payment of $1.00 (U.S.) per correction at the time of their implementation in a reprint.
`
`Library of Congress Cataloging-in-Publication Data
`Patterson, David A.
`Computer organiza tion and design: the hardware/ software interface
`I David A. Patterson, John L. Hennessy.-2nd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 1-55860-428-6 (cloth).-ISBN 1-55860-491-X (paper)
`1. Computer organization. 2. Computers-Design and construction.
`I. Hennessy, John L.
`3. Computer interfaces.
`II. Title
`QA76.9.C643H46
`1997
`004.2'2-dc21
`
`97-16050
`
`INTEL - 1012
`
`
`
`14
`
`Cliapter 1 C~mp11ter Abstractions and Technology
`
`lmd!le a personal compllter. Th-, n,rtic,11 board in the back is a print-,d drm1t
`f!GUR£ t ,!I
`L><.w·d (PC bnard). called th• 1:1,1t11c,·h•nrd 1n a PC, th~t c:ontairn, mo,t uf the eledronic5 of thie com(cid:173)
`p1.1ter: Figure 1.11 is a11 "'·erhe~d phot<1graph of that board, rotated 9(1 degree,. TI,e proces"c>r i~
`the iarf,t' hlack 1wtan5l<' m :he lower-right corner 0! th<' board lFigure 1.9 is a ph.:itngrarh o f the
`proc<";,,,r bef<,re it is f>ln1ted in the black packdge. I The two lar);e board, a lldched perpendiLularly
`in the top third oi the m"therboard on the right c·ontain input / o utput interface, to th" Ethern<et
`hxal are,1 netwcrk ar.d il ,·idco c,rd frir" CRT. The tWP.!-mall board, att.J,hed pe·rpendiculJrly to
`the middle oi the mothcrbc•ard contain the memory chips. The large box on the h,wer Jett is~
`,agt> for 111ou11ting: ,1cld,tic,nal ,irives: above it are ~ hont disk drive and a floppy disk dnve. The
`pmv1o1• sllp)Yly is n•xn1,;l'v 111 th1c nght <'f lhh bL.>x, but ii w<1, remL>\'ed lo gel a bell<'r vie\'\ of th~
`tni,theTbo.ard.
`
`The proces,or is the active part of the board, following the instructions of a
`program to the letter. It ;;idds numbers, tests numbers, signals 1/0 d evices t0
`activate, and so on. ·1·he processor is the !,uge square below the memory boards
`in the lower-right corner of Figure 1.8. Occasionally, reople call the processc,r
`the Cl-'U, for the more bureaucratic-stiunding antral proa-%or 1m2t.
`Descending ;,ven lmver into the hardware, Figure 1.q reveals details of the
`processor in Figure LS. The proce">Sor comprisec; two main components:
`Jatilpath and contro), the respective bra,..,·n and brain of the proces:C.or. The
`datll'path performs the arithmetic operations, and co11fn1/ tells the datapath,
`nwmnrr, :md f ;n ckvicr:,, wh;it to cit1 .~fcnnling to tlw wir.h(',. of thP in~tn1r-
`1i,11i•, nl i!w f'l"< 'F.J';,n, ( 'h,,pit'r r;
`,·,111,,111:, the d,1i,1p,1tli ,1t1d u,ntr,,] fur ,,
`',I r,11_,;hl 1·Pn,·,1 r, l 1111pk•·nL'Hl.1 ll<>!l . . rnd l. ·1;,1 ptl'r h dt•si ril,,_•s t hv chcinge" lWt'<h•d
`It:, .i hl;½h<·r p,~,·f<Jn·r'i.lHh ,· dt 1:-,1~n.
`
`INTEL - 1012
`
`
`
`mucn naraer 10 come up mr an excuse tni:ll:, 1 LI u aoes not sign extenu its 1m111ernate.
`Since andi and ori normally work with unsigned integers, the immediates are
`treated as unsigned integers as well, meaning that they are expanded to 32 bits by pad(cid:173)
`ding with leading Os instead of sign extension. Thus if the bit fields in the third line of
`the example above extended beyond the 16 least significant bits, the and i instruction
`would need a 32-bit constant to avoid clearing the upper portion of the fields.
`The MIPS assembler creates 32-bit constants with the pair of instructions l u i and
`or i; see Chapter 3, page 14 7 for an example of creating 32-bit constants using 1 u i
`and addi.
`
`(cid:127) Constructing an Arithmetic Logic Unit
`
`ALU n. [Arthritic Logic Unit or (rare) Arithmetic Logic Unit] A random-number
`generator supplied as standard with all computer systems.
`
`Stan Kelly-Bootle, Tlze Devil's DP DictionanJ, 1981
`
`The arithmetic logic unit or ALU is the brawn of the computer, the device that
`performs the arithmetic operations like addition and subtraction or logical
`operations like AND and OR. This section constructs an ALU from the four
`hardware building blocks shown in Figure 4.8 (see Appendix B for more
`details on these building blocks). Cases 1, 2, and 4 in Figure 4.8 all have two
`inputs. We will sometimes use versions of these components with more than
`two inputs, confident that you can generalize from this simple example. In
`any case, Appendix B provides examples with more inputs. (You may wish to
`review sections B. l through B.3 before proceeding further.)
`Because the MIPS word is 32 bits wide, we need a 32-bit-wide ALU. Let's as(cid:173)
`sume that we will connect 32 1-bit ALUs to create the desired ALU. We'll
`therefore start by constructing a 1-bit ALU.
`
`A 1-Bit ALU
`The logical operations are easiest, because they map directly onto the hard(cid:173)
`ware components in Figure 4.8.
`
`INTEL - 1012
`
`
`
`more simple operations, such as generating 0. The easiest way to add an oper(cid:173)
`ation is to expand the multiplexor controlled by the Operation line and, for this
`example, to connect 0 directly to the new input of that expanded multiplexor.
`
`A 32-Bit ALU
`Now that we have completed the 1-bit ALU, the full 32-bit ALU is created by
`connecting adjacent 'black boxes." Using xi to mean the ith bit of x, Figure 4.15
`shows a 32-bit ALU. Just as a single stone can cause ripples to radiate to\ the
`shores of a quiet lake, a single carry out of the least significant bit (Result0)
`can ripple all the way through the adder, causing a carry out of the most sig(cid:173)
`nificant bit (Result31). Hence, the adder created by directly linking the carries
`of 1-bit adders is called a ripple carry adder. We'll see a faster way to connect
`the 1-bit adders starting on page 241.
`
`Operation
`Carryln
`
`Result
`
`FIGURE 4.14 A 1-blt ALU that performs AND, OR, and addition (see Figure 4.13).
`
`Carryout
`
`INTEL - 1012
`
`
`
`4.S Constructing an Arithmetic I.ogle Unit
`
`235
`
`Carryln
`
`Operation
`
`ao--+ Carryln -----+-- ResultO
`
`bO --+ ALUO
`CarryOut
`
`a1
`
`b1
`
`Carryln
`ALUl 1 - - - -- - 1 - -(cid:141) Result1
`
`a2 --+ Carryln
`
`b2 --+ ALU2
`Carryout
`
`1------+-- Result2
`
`!
`
`a31 --+ Carrylri
`
`b31 --+
`
`ALU31
`
`Rcsult31
`
`FIGURE 4.15 A 32-blt AL.U constructed from 32 1-blt ALU(cid:127) . Carryout of the less signifirnnt
`bit is connected to the Carry In of the more significant bit. This organizlltion is caUcd ripple carry.
`
`Subtraction is the same as adding the negative version of an operand, and
`this is how adders perform subtraction. Reci\U that the shortcut for negating a
`two's complement number is to invert each bit (sometimes called the one's com(cid:173)
`plemrnt as explained in the elaboration on page 21 9) alld then add 1. To invert
`each bit, we simply add a 2:1 multiplexor tha t chooses between band b , as
`Figure 4.16 shows.
`Suppose we connect 32 of the~e 1-bit ALUs, as we did in Figure 4.1 5. The
`added multiplexor gives the option of b or its inverted value, depending on
`Binvert, bu t this is only one step in negc1ting" two's complement number. No(cid:173)
`tice that the least i:;ignificant bit still has a Cc1rryln signal, even though it's un(cid:173)
`necessary for addition. What happens if we set this Carry In to 1 instead of O?
`
`INTEL - 1012
`
`
`
`238
`
`Chapter 4 Arithmetic fOf' Co..,.,..,..
`
`Blnvert
`
`Operation
`Carryln
`
`a-+---~----i
`
`..
`
`Result
`
`0
`
`1
`
`2
`
`Carry()ut
`
`FIGURE 4.18 A 1-lllt ALU that JMlflorma AND, OR, and addition on a and II Of a and b. By
`selecting b (Binvert " 1) and setting Carryln to I in the least significant bit of the ALU, we get
`two's romplement subtraction of b from ,1 instead of addition of b to a.
`
`Tl1e adder will then calculate a + b + 1. By selecting the inverted version of b,
`we get exactly what we want:
`
`a+b+ I = a+(b+ I) = a+(-b) = a -b
`
`The simplicity of the hardware d esign of a two's complement adder helps
`explain why two's complement representation has become the universal stan(cid:173)
`dard for integer computer arithmetic,
`
`Tailoring the 32-Bit ALU to MIPS
`This set of operations-add, subtract, AND, OR-is found in the ALU of
`c1lmost every computer. If we look at Figure 4.7 on page 228, we see that the
`operations of most MIPS instructions can be performed by this ALU. But the
`design of the ALU is incomplete.
`One inslruction that still needs support is the set on less than instruction
`(s l t ). Recall that the operation produces 1 if rs < rt, and O otherwise. Conse(cid:173)
`quently, s 1 t will set all but the least significant bit to 0, with the least signifi(cid:173)
`cant bit set according to the comparison. For the ALU to perfom1 s l t, we first
`need to expand the three-input multiplexor in Figure 4.16 to add an input for
`the s 1 L result. We call that new input /,ess, and use it only for s 1 t.
`
`INTEL - 1012
`
`
`
`4.5 Constructing an Arithmetic Logic Unit
`
`237
`
`The top drawing of Figure 4.17 shows the new 1-bit ALU with the expanded
`multiplexor. From the description of s 1 t above, we must connect Oto the Less
`input for the upper 31 bits of the ALU, since those bits are always set to 0. What
`remains to consider is how to compare and set the least significant bit for set on
`less than instructions.
`What happens if we subtract b from a? If the difference is negative, then a<
`b since
`
`(a-b)<0 (cid:157)
`((a - b)+b)<(0+b)
`(cid:157)
`a<b
`We want the least significant bit of a set on less than operation to be a 1 if a <
`b; that is, a 1 if a - bis negative and a 0 if it's positive. This desired result cor(cid:173)
`responds exactly to the sign-bit values: 1 means negative and 0 means posi(cid:173)
`tive. Following this line of argument, we need only connect the sign bit from
`the adder output to the least significant bit to get set on less than.
`Unfortunately, the Result output from the most significant ALU bit in the
`top of Figure 4.17 for the s 1 t operation is not the output of the adder; the ALU
`output for the s 1 t operation is obviously the input value Less.
`Thus, we need a new 1-bit ALU for the most significant bit that has an extra
`output bit: the adder output. The bottom drawing of Figure 4.17 shows the de(cid:173)
`sign, with this new adder output line called Set, and used only for s 1 t . As long
`as we need a special ALU for the most significant bit, we added the overflow
`detection logic since it is also associated with that bit.
`Alas, the test of less than is a little more complicated than just described be(cid:173)
`cause of overflow; Exercise 4.23 on page 326 explores what must be done.
`Figure 4.18 shows the 32-bit ALU.
`Notice that every time we want the ALU to subtract, we set both Carryln
`and Binvert to 1. For adds or logical operations, we want both control lines to
`be 0. We can therefore simplify control of the ALU by combining the Carryln
`and Binvert to a single control line called Bnegate.
`To further tailor the ALU to the MIPS instruction set, we must support con(cid:173)
`ditional branch instructions. These instructions branch either if two registers
`are equal or if they are unequal. The easiest way to test equality with the ALU
`is to subtract b from a and then test to see if the result is 0 since
`(a - b = 0) (cid:157) a = b
`Thus, if we add hardware to test if the result is 0, we can test for equality.
`The simplest way is to OR all the outputs together and then send that signal
`through an inverter:
`
`Zero = (Result31 + Result30 + . . . + Result2 + Resultl + Result0)
`Figure 4.19 shows the revised 32-bit ALU. We can think of the combination
`of the 1-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for
`
`INTEL - 1012
`
`
`
`238
`
`Chapter 4 Arithmetic: for Computers
`
`Binvert
`
`Operat ion
`Carryln
`
`a
`
`b
`
`Less
`
`0
`
`1
`
`Result
`
`1
`
`2
`
`3
`
`CarryOUl
`
`Binvert
`
`Operat ion
`Carryln
`
`a -1------1--.... - - ,
`
`b
`
`Less
`
`1
`
`2
`
`3
`
`Result
`
`Set
`
`Overflow
`
`Overflow
`detection
`
`(Top) A 1-blt ALU that perform& AND, OR, end addition on a and b orb, and
`FIGURE 4.17
`(bottom) a 1-blt ALU for the moat algnltic:11nt bit. The top drnwing i ndudcs a direct input that
`is connected to perform the set on less than operation (see Figure 4.18); the bottom has a direct out(cid:173)
`put from the adder for the less than comparison called Set. (Rcfor to Exercise 4.42 to see how to cal(cid:173)
`culate overAow with fewer inputs.)
`
`INTEL - 1012
`
`
`
`4.5 Constructing an Arithmetic Logic Unit
`
`239
`
`Blnvert
`
`Carryln
`
`Operation
`
`aO
`bO
`
`al
`bl
`0
`
`a2
`b2
`0
`
`1 - - -- -1--- ResultO
`
`1--------11--.. Result!
`
`1--------11--.. Resu lt2
`
`Less
`Carryout
`
`Carryln
`a31
`b31--+ ALU31
`O
`Less
`
`1--------+ Result31
`1------set
`l--------11--.. overflow
`
`FIGURE 4.18 A 32-blt ALU constructed from the 31 copies of the 1-blt ALU In the top of
`Figure 4.17 and one 1-blt ALU In the bottom of that figure. The Less inputs are connec1te<I tr,
`0 except for the least significant bit, and that is connected to !he Set output o f the most sign i limnt
`bit. If the ALU performs a - b and we select the input 3 in the multiplexor in Figur" 4.17, ,then
`Result= O . . . 001 if a < b, and Resull = 0 .. . 000 otherwise.
`
`the ALU, telling ii to perform add, subtract, AND, OR, or set on less than.
`Figure 4.20 shows the ALU control lines and the corresponding ALU opera(cid:173)
`tion.
`Finally, now that we have seen what is inside a 32-bit ALU, we will use the
`universal symbol for a complete ALU, as shown in Figure 4.21.
`
`INTEL - 1012
`
`
`
`240
`
`Chapter 4 Arithmetic for Computers
`
`Bnegate
`
`Operation
`
`ao
`bO
`
`a1
`b1
`0
`
`a2
`b2
`0
`
`a31
`b31
`0
`
`ResultO
`
`Less
`Carryout
`
`ResulU
`
`Result2
`
`Less
`Carryout
`
`Zero
`
`1------t--------------- Overflow
`
`FIGURIE 4.19 The 1in•I 32-blt ALU. This acids a Zero detector to Figure 4.18.
`
`ALU control lines
`
`Function
`
`000
`001
`010
`110
`111
`
`and
`or
`add
`subtract
`se1 on less than
`
`j
`
`FIGURE 4.20 The value• of the three ALU control lines Bnegate and OperQtion and the
`corre9ponding ALU operationa.
`
`INTEL - 1012
`
`
`
`4.5 Constructing an Arithmetic Logic Unit
`
`241
`
`ALU operation
`
`a
`
`b
`
`Carryout
`
`Zero
`Result
`Overflow
`
`FIGURE 4.21 The symbol commonly used to represent an ALU, as shown in Figure 4.19.
`This symbol is also used to represent an adder, so it is normally labeled either with ALU or
`Adder.
`
`Carry Lookahead
`The next question is, How quickly can this ALU add two 32-bit operands? We
`can determine the a and b inputs, but the Carry In input depends on the oper(cid:173)
`ation in the adjacent 1-bit adder. If we trace all the way through the chain of
`dependencies, we connect the most significant bit to the least significant bit,
`so the most significant bit of the sum must wait for the sequential evaluation of
`all 32 I-bit adders. This sequential chain reaction is too slow to be used in
`time-critical hardware.
`There are a variety of schemes to anticipate the carry so that the worst-case
`scenario is a function of the log2 of the number of bits in the adder. These an(cid:173)
`ticipatory signals are faster because they go through fewer gates in sequence,
`but it takes many more gates to anticipate the proper carry.
`A key to understanding fast carry schemes is to remember that, unlike soft(cid:173)
`ware, hardware executes in parallel whenever inputs change.
`
`Fast Carry Using "Infinite" Hardware
`Appendix B mentions that any equation can be represented in two levels of
`logic. Since the only external inputs are the two operands and the Carryln to
`the least significant bit of the adder, in theory we could calculate the Carryln
`values to all the remaining bits of the adder in just two levels of logic.
`For example, the Carry In for bit 2 of the adder is exactly the CarryOut of bit
`1, so the formula is
`Carryln2 = (b 1 · Carrylnl) + (a 1 · Carry In I)+ (al · b 1)
`Similarly, Carrylnl is defined as
`Carrylnl = (b0 · Carryln0) + (a0 · Carryln0) + (a0 · b0)
`
`INTEL - 1012
`
`
`
`Anonymous, Elizabethan manuscript, 1570
`
`x
`
`Product
`
`With the construction of the ALU and explanation of addition, subtraction,
`and shifts, we are ready to build the more vexing operation of multiply.
`• But first let's review the multiplication of decimal numbers in longhand to
`remind ourselves of the steps and the names of the operands. For reasons that
`will become clear shortly, we limit this decimal example to using only the dig(cid:173)
`its 0 and 1. Multiplying 10001en by 10011en:
`Multiplicand
`1 000 t en
`. l00l te n
`Multiplier
`1000
`0000
`0000
`1000
`1001000ten
`The first operand is called the multiplicand and the second the multiplier. The
`final result is called the product. As you may recall, the algorithm learned in
`grammar school is to take the digits of the multiplier one at a time from right
`to left, multiplying the multiplicand by the single digit of the multiplier and
`shifting the intermediate product one digit to the left of the earlier inter(cid:173)
`mediate products.
`The first observation is that the number of digits in the product is consider(cid:173)
`ably larger than the number in either the multiplicand or the multiplier. In fact,
`if we ignore the sign bits, the length of the multiplication of an n-bit multipli(cid:173)
`cand and an m-bit multiplier is a product that is n + m bits long. That is, n + m
`bits are required to represent all possible produ<:cts. Hence, like add, multiply
`must cope with overflow because we frequently want a 32-bit product as the
`result of multiplying two 32-bit numbers.
`In this example we restricted the decimal digits to O and 1. With only two
`choices, each step of the multiplication is simple:
`
`1. Just place a copy of the multiplicand (1 x multiplicand) in the proper
`place if the multiplier digit is a 1, or
`
`2. Place O (0 x multiplicand) in the proper place if the digit is 0.
`
`INTEL - 1012
`
`
`
`4.6 Multiplication
`
`251
`
`Although the decimal example above happened to use only O and 1, multipli(cid:173)
`cation of binary numbers must always use O and 1, and thus always offers
`only these two choices.
`Now that we have reviewed the basics of multiplication, the traditional next
`step is to provide the highly optimized multiply hardware. We break with tra(cid:173)
`dition in the belief that you will gain a better understanding by seeing the evo(cid:173)
`lution of the multiply hardware and algorithm through three generations. The
`rest of this section presents successive refinements of the hardware and the
`algorithm until we have a version used in some computers. For now, let's
`assume that we are multiplying only positive numbers.
`
`First Version of the Multiplication Algorithm and Hardware
`The initial design mimics the algorithm we learned in grammar school; the
`hardware is shown in Figure 4.25. We have drawn the hardware so that data
`flows from top to bottom to more closely resemble the paper-and-pencil
`method.
`Let's assume that the multiplier is in the 32-bit Multiplier register and that
`the 64-bit Product register is initialized to 0. From the paper-and-pencil exam(cid:173)
`ple above, it's clear that we will need to move the multiplicand left one digit
`each step as it may be added to the intermediate products. Over 32 steps a
`
`-Multiplicand
`
`Shift left
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`Product
`
`Write
`
`64 bits
`
`FIGURE 4.25 First version of the multiplication hardware. The Multiplicand register, ALU,
`and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The
`32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on
`each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts
`with the product initialized to 0. Control decides when to shift the Multiplicand and Multiplier
`registers and when to write new values into the Product register.
`
`INTEL - 1012
`
`
`
`252
`
`Chapter 4 Alitllmatlc for Computer(cid:127)
`
`32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Mul(cid:173)
`tiplicand register, initialized with the 32-bit multiplicand in the right half and
`0 in the left half. This register is then shifted left 1 bit each step to align the mul(cid:173)
`tiplicand with the sum being accumulated in the 64-bit Product register.
`Figure 4.26 shows the three basic steps needed for each bit. The least signif(cid:173)
`icant bit of the multiplier (MultiplicrO) determines whether the multiplicand is
`
`Start
`
`MultiplierO : 1
`
`Multipliero = o
`
`1a. Add multiplicand to product and
`place the result In Product register
`
`2. Shift the Multiplicand register left 1 bit
`
`3. Shift the Multiplier register right 1 bit
`
`Done
`
`FIGURE 4.28 The first multlpllcatlon algorltllm, u•ln,i the hardware shown In Figure 4.25,
`If thl' l~ast signific,mt bit of the multiplier is 1, add the mulliplicand to the product. If nut. gu to
`the n,•xt step. Shift tlw multiplicand ldt ,md thi, multiplier ri~hl in the ne>.t twu steps. These three
`step, an• repeated 32 limes.
`
`INTEL - 1012
`
`
`
`4,8 Multlpllcatlon
`
`253
`
`added to the Product register. The left shift in step 2 has the effect of moving
`the intermediate operands to the left, just as w hen multiplying by hand. The
`shift right in step 3 gives us the next bit of the multiplier to examine in the fol(cid:173)
`lowing iteration. These three steps are repeated 32 times to obtain the
`product.
`
`Example
`
`Answer
`
`First Multiply Algorithm
`
`Using 4-bit numbers to save space, multiply 21en x 31en, or 00101wo
`X 0011two·
`
`Figure 4.27 shows the value of each register for each of the steps labeled
`according to Figure 4.26, with the final value of 0000 01101wo or 61en· Color
`is used to indicate the register values that change on that step, and the bit
`circled is the one examined to determine the operation of the next step.
`
`Step
`
`it:Mi@iMll~MIWiifi:i·IM#tffiN
`
`IIMtM
`Initial values
`la: 1 (cid:157) Prod = Prod + Mcand
`2: Shift left Multiplicand
`3: Shift right Multiplier
`la: 1 (cid:157)
`Prod = Prod + Mcand
`2: Shift left Multiplicand
`3: Shift right Multiplier
`- - - -+
`1: o ~ no operation
`3
`2: Shift left Multiplicand
`3: Shift right Multlpller
`1: 0 (cid:157)
`no operation
`0000
`- - ---+--
`2: Shift left Multiplicand
`0000
`- - - ---+--
`-
`-
`-0000
`'--- -- ~3_: _S_hi_ft~rlght M_ul_tipc...l_ie_r - - -~-
`
`0
`1
`
`2
`
`4
`
`00
`0011
`0011
`00
`0001
`0001
`00
`0000
`0000
`•
`- -+--== ...
`0
`
`00000010
`0000 0010
`0000 0100
`00000100
`0000 0100
`00001000
`00001000
`00001000
`00010000
`00010000
`00010000
`0010 0000
`0010 0000
`
`0000 0000
`00000010
`0000 0010
`0000 0010
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`
`FIGURE 4.27 Multlply •J1:(cid:127) mple u•lng flrat algorlthm In Figure 4.28. The bit examined l{l
`determine the next step Is circled in color.
`
`If each step took a clock cycle, this algorithm would require almost 100 dc;,ck
`cycles to multiply two 32-bit numbers. The relative importance of arithmetic
`operations like multiply varies with the program, but addition and subtraction
`may be anywhere fromS to 100 times more popular than multiply. According(cid:173)
`ly, in many applications, multiply can take multiple clock cycles without sig(cid:173)
`nificantly affecting performance. Yet Amdahl's law (see Chapter 2, page 75)
`reminds us that even a moderate frequency for a slow operation can limit
`performance.
`
`INTEL - 1012
`
`
`
`254
`
`C haptar 4 Arithmetic for Comp11tan1
`
`Multiplicand
`
`Shift right
`Write
`
`Product
`
`64 bits
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`FIGURE 4.28 Second version of the multlpllcatlon hard""are. CompAre with the firr,t vt•r(cid:173)
`sion in Figurl' 4.25. Tlu- Multiplicand rqiisler, ALU, nnd Multiplier Tl'~ister MP all J2 bits wid,•,
`"'ith only the, l'rodud rcgisMr lt>ft at 64 bit,. Now the product is shifl·c•d rii,:;ht. The,., d1nnges arc
`hi~hli)'\htect in color.
`
`Second Version of the Multiplication Algorlthm
`and Hardware
`Computer pioneers recognized that half of the bits of the multiplicand in the
`first algorithm were always 0, so only half could contain useful bit values. A
`full M-hit ALU thus seemed wasteful rmd slow since half of the adder bit~
`were adding Oto the intermediate sum,
`The original algorithm s'llifts the multiplicand left with Os inst•rted in tlw
`new positions, so the multiplicand cannot affect the least significant bits of the
`product after they settle down. lns te.:id of shifting the multiplicand left, they
`wondL•red, whal if we shiH the l'roducf l'i:,?/lf? Now lhe multiplicand would be
`fixed rel,1tive to tht.• product, ,md since we are adding only 32 bits, tl1e adder
`need be only 32 bits wide. Figure 4.28 shows how this change halves the
`widths of both the ALU ,md the multiplicand.
`Figure 4,29 shows the multiply algorithm inspired by this observation. This
`algorithm st.:irts with the 32-bit Multiplicand and 32-bit Multiplier registers set
`to their named values and the 64-bit Product register set to 0. This algorithm
`only forms a 32-bit sum, so only the left half of the 64-bit Product regislL'r is
`changed by the .idditinn.
`
`INTEL - 1012
`
`
`
`4,8 Multlpllcatlon
`
`255
`
`Start
`
`MultlpllerO = 1
`
`MultiplierO = 0
`
`la. Add multlpllcand to the left hair of
`the product and place the result In
`the left half of the Product register
`
`2. Shift the Product register right 1 bit
`
`3. Shift the Multiplier register right 1 bit
`
`Done
`
`FIGURE 4.29 The second multlpllcatlOII elgorlthm, ualnS: the hardware In FIS:ure 4.28. In
`this version, the Product register is shifted right instead of shif!ing the multiplicand. Color type
`shows the chal\ges from Figure 4.26.
`
`INTEL - 1012
`
`
`
`256
`
`Ch11pter 4 Arithmetic for Computers
`
`Second Multiply Algorithm
`
`Multiply 00101wo x 0011 twllusing the algorithm in Figure 4.29.
`
`Figure 4.30 shows the revised 4-bit example, again giving a product of
`0000 0l l01w,,·
`iiMiiM:F
`
`Step
`
`i@\MiH&i?MiMHi:i·MMM@N
`
`o
`1
`
`2
`
`3
`
`0000 0000
`0010
`00
`Initial values
`--4- --'----jf-------+--
`- - - - - -- --
`0010 0000
`la: 1 => Prod = Prod + Mcand
`0011
`0010
`00010000
`2: Shift right Product
`0011
`0010
`- - -----+- --~-ae--
`00010000
`00~
`0010
`3: Shift right Multlpl_ie_r ___ -+--- - -- -+ - - - --
`la: 1 ,,> Prod ,, Prod + Mcand
`0011 0000
`0001
`0010
`00011000
`2: Shift right Product
`0001
`0010
`00
`3: Shift right Multiplier
`0010
`00011000
`0010
`1: 0 => no operation ·
`0000
`00011000
`00001100
`0010
`2: Shift rilllll Proauct
`0000
`0000
`0010
`0000 1100
`3: Shifl right Multiplier
`1 : o => 110 operation
`0000
`0010
`00001100
`0000 0110
`0000
`0010
`2: Shifl right Product
`- - - - - --+---=~-
`0000
`0000 0110
`3: Shift right Multiplier
`0010
`
`FIGURE 4,30 Multiply example ualng Mcond algorithm in Figura 4. 29. The bit examined to
`determi.ne the next step is circled in color.
`
`Final Version of the Multiplication-Algorithm and
`Hardware
`The final ob!;ervali1m of the frugal computer piuneers was that the Product
`register had wallled ~pace that matched exactly the size of the multiplier: As
`the wasted space in the product disappears, so do the bits of tlw multiplier. In
`response, the third version of the multiplication algorithm combi.11eti the right(cid:173)
`most half of the product with the multiplier. figure 4.31 shows the hardware.
`The least significant bit of the 64-bit Product register (Product0) now is the bit
`to be tc~ted.
`The algorithm starts by assigning the multiplier to the right ha lf of the Prod(cid:173)
`uct register, placing O in the upper half. Figure 4.32 shows the new steps.
`
`INTEL - 1012
`
`
`
`4.6 Multlpllcatlon
`
`257
`
`Multiplicand
`
`Shift right
`Write
`
`Control
`test
`
`Product
`
`64 bits
`
`FIGURE 4.31 Third version of the multlpllcatlo(cid:127) hardware. Comparing w ith the second ver(cid:173)
`sion in Figure 4.28 on page 2S4, the separate Multiplier register has disappeared. The multiplier
`is placed instead in the right half of the Product register.
`
`Third Multiply Algorithm
`
`Multiply 001 Otwo x 00111wo using the algorithm in Figure 4.32.
`
`Figure 4.33 shows the revised 4-bit example for the final algorithm.
`
`Signed Multiplication
`Sofor we have dealt with positive numbers. The easiest way to understand
`how to deal with signed numbers is to first convert the multiplier and multi(cid:173)
`plicand to positive numbers and then remember the original signs. The algo(cid:173)
`rithms should then be run for 31 iterations, leaving the signs out of the
`calculation. As we learned in grammar school, we need negate the product
`only if the original signs disagree.
`It turns o ut that the last algorithm will work for signed numbers provided
`that we remember that the numbers we are dealing with have infinite digits,
`and that we are only representing them w1fh 32 bits. Hence the shifting steps
`would need to extend the sign of the product for signed numbers. When the
`algorithm completes, the lower word would have the 32-bit product.
`
`INTEL - 1012
`
`
`
`Chapter 4 Arithmetic for Computers
`
`Start
`
`Pro<:1uctO = 1
`
`ProduclO = O
`
`1a. Add multiplicand to the lett half of
`the product and place the result 111
`the left half of the Product register
`
`2. Shift the Product register right 1 bit
`
`Done
`
`FIGURE 4 .32 The third multlpllcatlon (cid:127)
`ICortlhm. It nel'<iS only two steps because thl! J'roduct
`and Multiplier regi,tl!rs have been combined. Color type shows changes from Figure 4.2'1.
`
`Mulllpllcand -
`
`NIMtdi:E
`
`Step
`
`Initial values
`la: 1 => Proo = Prod + Mcand
`2: Shirt right Product
`la: 1 => Prod = Prod + Mcand
`2: Shilt right Product - -- -
`1: 0 => no operation - - - -
`2: Shilt right Product
`- ----
`1; 0 •> no operation
`2: Shift right P-rOd_u_c_l _ _ _
`
`•
`
`I 0000 ~
`0010
`10- - -+ , - -0010 0011
`10
`0001 Ooo:!)
`10
`0011 0001
`00011ocg
`0010
`0001 1000
`0000 ll<@
`00001100
`0000 0110
`
`---- 0010
`
`00_10 _ _ _
`0010
`0010
`
`f'IGURE 4.83 MuNlply exampla ualn, third al.orlthm In Figure 4.32. The bit l!Xamined to
`determine, the next ,-tep is circled in rnlo r.
`
`INTEL - 1012
`
`
`
`4.6 Multlplicatlon
`
`259
`
`Booth's Algorithm
`A more elegant approach to multiplying signed numbers than above is called
`Booth's algorithm. It starts with the observation that with the ability to both
`add and subtract there are multiple ways to compute a product. Suppose we
`want to multiply 2ten by 6ten, or 0010two by OllOtwo:
`00l0two
`0ll0two
`
`X
`
`+
`0000
`+
`0010
`+
`0010
`+ 0000
`
`s hift (0 in multi pl i er)
`( 1 in multi pl i er)
`add
`( 1 in multi pl i er)
`add
`shift (0 in multi pl i er)
`
`00001 l00two
`Booth observed that an ALU that could add or subtract could get the same
`result in more than one way. For example, since
`
`6ten
`
`= - 2ten + Bten
`
`or
`
`0ll0two
`
`= - 00l0two + l000two
`
`we could replace a string of ls in the multiplier with an initial subtract when
`we first see a 1 and then later add when we see the bit after the last 1. For
`example,
`
`00l0two
`0ll0t\o/0
`
`X
`
`+
`
`0000 shift (0 in multiplier)
`sub (fir st 1 in multiplier)
`0010
`shift (middle of string of ls)
`+
`0000
`+_0_0_l_0 __ add (p rior step had la st 1)
`
`00001 l00two
`Booth invented this approach in a quest for speed because in machines of
`his era shifting was faster than addition. Indeed, for some patterns his algo(cid:173)
`rithm would be faster; it's our good fortune that it handles signed numbers as
`
`INTEL - 1012
`
`
`
`.....
`
`..., .............. b
`
`....., ..
`
`..., ..,
`
`..,.. .. .,...,..,.._,.,J
`
`...................... ......... ......... ,
`alone.
`If we are limited to looking at just 2 bits, we can t