throbber
S E C O N D ED IT ION
`
`Computer Organization and Design
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With a contribution by
`James R. Larus
`University of Wisconsin
`
`M ::(cid:141)
`
`Morgan Kaufmann Publishers, Inc.
`San Francisco, California
`
`INTEL - 1012
`
`

`

`Indexer Steve Rath
`Printer Courier Corporation
`
`Morgan Kaufmann Publishers, Inc.
`Editorial and Sales Office:
`340 Pine Street, Sixth Floor
`San Francisco, CA 94104-3205
`USA
`
`Telephone 415/392-2665
`Facsimile 415/982-2665
`Email
`111kp@111kp.co111
`WWW
`lzttp://www.111kp.co111
`Order toll free 800/745-7323
`
`© 1998 by Morgan Kaufmann Publishers, Inc.
`All rights reserved
`Printed in the United States of America
`
`02 01 00 99
`
`5 4 3 2
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
`or by any means-electronic, mechanical, photocopying, recording, or otherwise--without the prior
`written permission of the publisher.
`
`Advice, Praise, and Errors: Any correspondence related to this publication or intended for the authors
`should be sent electronically to cod2l111gs@111kp.co111. Information regarding error sightings is encouraged.
`Any error sightings that are accepted for correction in subsequent printings will be rewarded by the
`authors with a payment of $1.00 (U.S.) per correction at the time of their implementation in a reprint.
`
`Library of Congress Cataloging-in-Publication Data
`Patterson, David A.
`Computer organiza tion and design: the hardware/ software interface
`I David A. Patterson, John L. Hennessy.-2nd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 1-55860-428-6 (cloth).-ISBN 1-55860-491-X (paper)
`1. Computer organization. 2. Computers-Design and construction.
`I. Hennessy, John L.
`3. Computer interfaces.
`II. Title
`QA76.9.C643H46
`1997
`004.2'2-dc21
`
`97-16050
`
`INTEL - 1012
`
`

`

`14
`
`Cliapter 1 C~mp11ter Abstractions and Technology
`
`lmd!le a personal compllter. Th-, n,rtic,11 board in the back is a print-,d drm1t
`f!GUR£ t ,!I
`L><.w·d (PC bnard). called th• 1:1,1t11c,·h•nrd 1n a PC, th~t c:ontairn, mo,t uf the eledronic5 of thie com(cid:173)
`p1.1ter: Figure 1.11 is a11 "'·erhe~d phot<1graph of that board, rotated 9(1 degree,. TI,e proces"c>r i~
`the iarf,t' hlack 1wtan5l<' m :he lower-right corner 0! th<' board lFigure 1.9 is a ph.:itngrarh o f the
`proc<";,,,r bef<,re it is f>ln1ted in the black packdge. I The two lar);e board, a lldched perpendiLularly
`in the top third oi the m"therboard on the right c·ontain input / o utput interface, to th" Ethern<et
`hxal are,1 netwcrk ar.d il ,·idco c,rd frir" CRT. The tWP.!-mall board, att.J,hed pe·rpendiculJrly to
`the middle oi the mothcrbc•ard contain the memory chips. The large box on the h,wer Jett is~
`,agt> for 111ou11ting: ,1cld,tic,nal ,irives: above it are ~ hont disk drive and a floppy disk dnve. The
`pmv1o1• sllp)Yly is n•xn1,;l'v 111 th1c nght <'f lhh bL.>x, but ii w<1, remL>\'ed lo gel a bell<'r vie\'\ of th~
`tni,theTbo.ard.
`
`The proces,or is the active part of the board, following the instructions of a
`program to the letter. It ;;idds numbers, tests numbers, signals 1/0 d evices t0
`activate, and so on. ·1·he processor is the !,uge square below the memory boards
`in the lower-right corner of Figure 1.8. Occasionally, reople call the processc,r
`the Cl-'U, for the more bureaucratic-stiunding antral proa-%or 1m2t.
`Descending ;,ven lmver into the hardware, Figure 1.q reveals details of the
`processor in Figure LS. The proce">Sor comprisec; two main components:
`Jatilpath and contro), the respective bra,..,·n and brain of the proces:C.or. The
`datll'path performs the arithmetic operations, and co11fn1/ tells the datapath,
`nwmnrr, :md f ;n ckvicr:,, wh;it to cit1 .~fcnnling to tlw wir.h(',. of thP in~tn1r-
`1i,11i•, nl i!w f'l"< 'F.J';,n, ( 'h,,pit'r r;
`,·,111,,111:, the d,1i,1p,1tli ,1t1d u,ntr,,] fur ,,
`',I r,11_,;hl 1·Pn,·,1 r, l 1111pk•·nL'Hl.1 ll<>!l . . rnd l. ·1;,1 ptl'r h dt•si ril,,_•s t hv chcinge" lWt'<h•d
`It:, .i hl;½h<·r p,~,·f<Jn·r'i.lHh ,· dt 1:-,1~n.
`
`INTEL - 1012
`
`

`

`mucn naraer 10 come up mr an excuse tni:ll:, 1 LI u aoes not sign extenu its 1m111ernate.
`Since andi and ori normally work with unsigned integers, the immediates are
`treated as unsigned integers as well, meaning that they are expanded to 32 bits by pad(cid:173)
`ding with leading Os instead of sign extension. Thus if the bit fields in the third line of
`the example above extended beyond the 16 least significant bits, the and i instruction
`would need a 32-bit constant to avoid clearing the upper portion of the fields.
`The MIPS assembler creates 32-bit constants with the pair of instructions l u i and
`or i; see Chapter 3, page 14 7 for an example of creating 32-bit constants using 1 u i
`and addi.
`
`(cid:127) Constructing an Arithmetic Logic Unit
`
`ALU n. [Arthritic Logic Unit or (rare) Arithmetic Logic Unit] A random-number
`generator supplied as standard with all computer systems.
`
`Stan Kelly-Bootle, Tlze Devil's DP DictionanJ, 1981
`
`The arithmetic logic unit or ALU is the brawn of the computer, the device that
`performs the arithmetic operations like addition and subtraction or logical
`operations like AND and OR. This section constructs an ALU from the four
`hardware building blocks shown in Figure 4.8 (see Appendix B for more
`details on these building blocks). Cases 1, 2, and 4 in Figure 4.8 all have two
`inputs. We will sometimes use versions of these components with more than
`two inputs, confident that you can generalize from this simple example. In
`any case, Appendix B provides examples with more inputs. (You may wish to
`review sections B. l through B.3 before proceeding further.)
`Because the MIPS word is 32 bits wide, we need a 32-bit-wide ALU. Let's as(cid:173)
`sume that we will connect 32 1-bit ALUs to create the desired ALU. We'll
`therefore start by constructing a 1-bit ALU.
`
`A 1-Bit ALU
`The logical operations are easiest, because they map directly onto the hard(cid:173)
`ware components in Figure 4.8.
`
`INTEL - 1012
`
`

`

`more simple operations, such as generating 0. The easiest way to add an oper(cid:173)
`ation is to expand the multiplexor controlled by the Operation line and, for this
`example, to connect 0 directly to the new input of that expanded multiplexor.
`
`A 32-Bit ALU
`Now that we have completed the 1-bit ALU, the full 32-bit ALU is created by
`connecting adjacent 'black boxes." Using xi to mean the ith bit of x, Figure 4.15
`shows a 32-bit ALU. Just as a single stone can cause ripples to radiate to\ the
`shores of a quiet lake, a single carry out of the least significant bit (Result0)
`can ripple all the way through the adder, causing a carry out of the most sig(cid:173)
`nificant bit (Result31). Hence, the adder created by directly linking the carries
`of 1-bit adders is called a ripple carry adder. We'll see a faster way to connect
`the 1-bit adders starting on page 241.
`
`Operation
`Carryln
`
`Result
`
`FIGURE 4.14 A 1-blt ALU that performs AND, OR, and addition (see Figure 4.13).
`
`Carryout
`
`INTEL - 1012
`
`

`

`4.S Constructing an Arithmetic I.ogle Unit
`
`235
`
`Carryln
`
`Operation
`
`ao--+ Carryln -----+-- ResultO
`
`bO --+ ALUO
`CarryOut
`
`a1
`
`b1
`
`Carryln
`ALUl 1 - - - -- - 1 - -(cid:141) Result1
`
`a2 --+ Carryln
`
`b2 --+ ALU2
`Carryout
`
`1------+-- Result2
`
`!
`
`a31 --+ Carrylri
`
`b31 --+
`
`ALU31
`
`Rcsult31
`
`FIGURE 4.15 A 32-blt AL.U constructed from 32 1-blt ALU(cid:127) . Carryout of the less signifirnnt
`bit is connected to the Carry In of the more significant bit. This organizlltion is caUcd ripple carry.
`
`Subtraction is the same as adding the negative version of an operand, and
`this is how adders perform subtraction. Reci\U that the shortcut for negating a
`two's complement number is to invert each bit (sometimes called the one's com(cid:173)
`plemrnt as explained in the elaboration on page 21 9) alld then add 1. To invert
`each bit, we simply add a 2:1 multiplexor tha t chooses between band b , as
`Figure 4.16 shows.
`Suppose we connect 32 of the~e 1-bit ALUs, as we did in Figure 4.1 5. The
`added multiplexor gives the option of b or its inverted value, depending on
`Binvert, bu t this is only one step in negc1ting" two's complement number. No(cid:173)
`tice that the least i:;ignificant bit still has a Cc1rryln signal, even though it's un(cid:173)
`necessary for addition. What happens if we set this Carry In to 1 instead of O?
`
`INTEL - 1012
`
`

`

`238
`
`Chapter 4 Arithmetic fOf' Co..,.,..,..
`
`Blnvert
`
`Operation
`Carryln
`
`a-+---~----i
`
`..
`
`Result
`
`0
`
`1
`
`2
`
`Carry()ut
`
`FIGURE 4.18 A 1-lllt ALU that JMlflorma AND, OR, and addition on a and II Of a and b. By
`selecting b (Binvert " 1) and setting Carryln to I in the least significant bit of the ALU, we get
`two's romplement subtraction of b from ,1 instead of addition of b to a.
`
`Tl1e adder will then calculate a + b + 1. By selecting the inverted version of b,
`we get exactly what we want:
`
`a+b+ I = a+(b+ I) = a+(-b) = a -b
`
`The simplicity of the hardware d esign of a two's complement adder helps
`explain why two's complement representation has become the universal stan(cid:173)
`dard for integer computer arithmetic,
`
`Tailoring the 32-Bit ALU to MIPS
`This set of operations-add, subtract, AND, OR-is found in the ALU of
`c1lmost every computer. If we look at Figure 4.7 on page 228, we see that the
`operations of most MIPS instructions can be performed by this ALU. But the
`design of the ALU is incomplete.
`One inslruction that still needs support is the set on less than instruction
`(s l t ). Recall that the operation produces 1 if rs < rt, and O otherwise. Conse(cid:173)
`quently, s 1 t will set all but the least significant bit to 0, with the least signifi(cid:173)
`cant bit set according to the comparison. For the ALU to perfom1 s l t, we first
`need to expand the three-input multiplexor in Figure 4.16 to add an input for
`the s 1 L result. We call that new input /,ess, and use it only for s 1 t.
`
`INTEL - 1012
`
`

`

`4.5 Constructing an Arithmetic Logic Unit
`
`237
`
`The top drawing of Figure 4.17 shows the new 1-bit ALU with the expanded
`multiplexor. From the description of s 1 t above, we must connect Oto the Less
`input for the upper 31 bits of the ALU, since those bits are always set to 0. What
`remains to consider is how to compare and set the least significant bit for set on
`less than instructions.
`What happens if we subtract b from a? If the difference is negative, then a<
`b since
`
`(a-b)<0 (cid:157)
`((a - b)+b)<(0+b)
`(cid:157)
`a<b
`We want the least significant bit of a set on less than operation to be a 1 if a <
`b; that is, a 1 if a - bis negative and a 0 if it's positive. This desired result cor(cid:173)
`responds exactly to the sign-bit values: 1 means negative and 0 means posi(cid:173)
`tive. Following this line of argument, we need only connect the sign bit from
`the adder output to the least significant bit to get set on less than.
`Unfortunately, the Result output from the most significant ALU bit in the
`top of Figure 4.17 for the s 1 t operation is not the output of the adder; the ALU
`output for the s 1 t operation is obviously the input value Less.
`Thus, we need a new 1-bit ALU for the most significant bit that has an extra
`output bit: the adder output. The bottom drawing of Figure 4.17 shows the de(cid:173)
`sign, with this new adder output line called Set, and used only for s 1 t . As long
`as we need a special ALU for the most significant bit, we added the overflow
`detection logic since it is also associated with that bit.
`Alas, the test of less than is a little more complicated than just described be(cid:173)
`cause of overflow; Exercise 4.23 on page 326 explores what must be done.
`Figure 4.18 shows the 32-bit ALU.
`Notice that every time we want the ALU to subtract, we set both Carryln
`and Binvert to 1. For adds or logical operations, we want both control lines to
`be 0. We can therefore simplify control of the ALU by combining the Carryln
`and Binvert to a single control line called Bnegate.
`To further tailor the ALU to the MIPS instruction set, we must support con(cid:173)
`ditional branch instructions. These instructions branch either if two registers
`are equal or if they are unequal. The easiest way to test equality with the ALU
`is to subtract b from a and then test to see if the result is 0 since
`(a - b = 0) (cid:157) a = b
`Thus, if we add hardware to test if the result is 0, we can test for equality.
`The simplest way is to OR all the outputs together and then send that signal
`through an inverter:
`
`Zero = (Result31 + Result30 + . . . + Result2 + Resultl + Result0)
`Figure 4.19 shows the revised 32-bit ALU. We can think of the combination
`of the 1-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for
`
`INTEL - 1012
`
`

`

`238
`
`Chapter 4 Arithmetic: for Computers
`
`Binvert
`
`Operat ion
`Carryln
`
`a
`
`b
`
`Less
`
`0
`
`1
`
`Result
`
`1
`
`2
`
`3
`
`CarryOUl
`
`Binvert
`
`Operat ion
`Carryln
`
`a -1------1--.... - - ,
`
`b
`
`Less
`
`1
`
`2
`
`3
`
`Result
`
`Set
`
`Overflow
`
`Overflow
`detection
`
`(Top) A 1-blt ALU that perform& AND, OR, end addition on a and b orb, and
`FIGURE 4.17
`(bottom) a 1-blt ALU for the moat algnltic:11nt bit. The top drnwing i ndudcs a direct input that
`is connected to perform the set on less than operation (see Figure 4.18); the bottom has a direct out(cid:173)
`put from the adder for the less than comparison called Set. (Rcfor to Exercise 4.42 to see how to cal(cid:173)
`culate overAow with fewer inputs.)
`
`INTEL - 1012
`
`

`

`4.5 Constructing an Arithmetic Logic Unit
`
`239
`
`Blnvert
`
`Carryln
`
`Operation
`
`aO
`bO
`
`al
`bl
`0
`
`a2
`b2
`0
`
`1 - - -- -1--- ResultO
`
`1--------11--.. Result!
`
`1--------11--.. Resu lt2
`
`Less
`Carryout
`
`Carryln
`a31
`b31--+ ALU31
`O
`Less
`
`1--------+ Result31
`1------set
`l--------11--.. overflow
`
`FIGURE 4.18 A 32-blt ALU constructed from the 31 copies of the 1-blt ALU In the top of
`Figure 4.17 and one 1-blt ALU In the bottom of that figure. The Less inputs are connec1te<I tr,
`0 except for the least significant bit, and that is connected to !he Set output o f the most sign i limnt
`bit. If the ALU performs a - b and we select the input 3 in the multiplexor in Figur" 4.17, ,then
`Result= O . . . 001 if a < b, and Resull = 0 .. . 000 otherwise.
`
`the ALU, telling ii to perform add, subtract, AND, OR, or set on less than.
`Figure 4.20 shows the ALU control lines and the corresponding ALU opera(cid:173)
`tion.
`Finally, now that we have seen what is inside a 32-bit ALU, we will use the
`universal symbol for a complete ALU, as shown in Figure 4.21.
`
`INTEL - 1012
`
`

`

`240
`
`Chapter 4 Arithmetic for Computers
`
`Bnegate
`
`Operation
`
`ao
`bO
`
`a1
`b1
`0
`
`a2
`b2
`0
`
`a31
`b31
`0
`
`ResultO
`
`Less
`Carryout
`
`ResulU
`
`Result2
`
`Less
`Carryout
`
`Zero
`
`1------t--------------- Overflow
`
`FIGURIE 4.19 The 1in•I 32-blt ALU. This acids a Zero detector to Figure 4.18.
`
`ALU control lines
`
`Function
`
`000
`001
`010
`110
`111
`
`and
`or
`add
`subtract
`se1 on less than
`
`j
`
`FIGURE 4.20 The value• of the three ALU control lines Bnegate and OperQtion and the
`corre9ponding ALU operationa.
`
`INTEL - 1012
`
`

`

`4.5 Constructing an Arithmetic Logic Unit
`
`241
`
`ALU operation
`
`a
`
`b
`
`Carryout
`
`Zero
`Result
`Overflow
`
`FIGURE 4.21 The symbol commonly used to represent an ALU, as shown in Figure 4.19.
`This symbol is also used to represent an adder, so it is normally labeled either with ALU or
`Adder.
`
`Carry Lookahead
`The next question is, How quickly can this ALU add two 32-bit operands? We
`can determine the a and b inputs, but the Carry In input depends on the oper(cid:173)
`ation in the adjacent 1-bit adder. If we trace all the way through the chain of
`dependencies, we connect the most significant bit to the least significant bit,
`so the most significant bit of the sum must wait for the sequential evaluation of
`all 32 I-bit adders. This sequential chain reaction is too slow to be used in
`time-critical hardware.
`There are a variety of schemes to anticipate the carry so that the worst-case
`scenario is a function of the log2 of the number of bits in the adder. These an(cid:173)
`ticipatory signals are faster because they go through fewer gates in sequence,
`but it takes many more gates to anticipate the proper carry.
`A key to understanding fast carry schemes is to remember that, unlike soft(cid:173)
`ware, hardware executes in parallel whenever inputs change.
`
`Fast Carry Using "Infinite" Hardware
`Appendix B mentions that any equation can be represented in two levels of
`logic. Since the only external inputs are the two operands and the Carryln to
`the least significant bit of the adder, in theory we could calculate the Carryln
`values to all the remaining bits of the adder in just two levels of logic.
`For example, the Carry In for bit 2 of the adder is exactly the CarryOut of bit
`1, so the formula is
`Carryln2 = (b 1 · Carrylnl) + (a 1 · Carry In I)+ (al · b 1)
`Similarly, Carrylnl is defined as
`Carrylnl = (b0 · Carryln0) + (a0 · Carryln0) + (a0 · b0)
`
`INTEL - 1012
`
`

`

`Anonymous, Elizabethan manuscript, 1570
`
`x
`
`Product
`
`With the construction of the ALU and explanation of addition, subtraction,
`and shifts, we are ready to build the more vexing operation of multiply.
`• But first let's review the multiplication of decimal numbers in longhand to
`remind ourselves of the steps and the names of the operands. For reasons that
`will become clear shortly, we limit this decimal example to using only the dig(cid:173)
`its 0 and 1. Multiplying 10001en by 10011en:
`Multiplicand
`1 000 t en
`. l00l te n
`Multiplier
`1000
`0000
`0000
`1000
`1001000ten
`The first operand is called the multiplicand and the second the multiplier. The
`final result is called the product. As you may recall, the algorithm learned in
`grammar school is to take the digits of the multiplier one at a time from right
`to left, multiplying the multiplicand by the single digit of the multiplier and
`shifting the intermediate product one digit to the left of the earlier inter(cid:173)
`mediate products.
`The first observation is that the number of digits in the product is consider(cid:173)
`ably larger than the number in either the multiplicand or the multiplier. In fact,
`if we ignore the sign bits, the length of the multiplication of an n-bit multipli(cid:173)
`cand and an m-bit multiplier is a product that is n + m bits long. That is, n + m
`bits are required to represent all possible produ<:cts. Hence, like add, multiply
`must cope with overflow because we frequently want a 32-bit product as the
`result of multiplying two 32-bit numbers.
`In this example we restricted the decimal digits to O and 1. With only two
`choices, each step of the multiplication is simple:
`
`1. Just place a copy of the multiplicand (1 x multiplicand) in the proper
`place if the multiplier digit is a 1, or
`
`2. Place O (0 x multiplicand) in the proper place if the digit is 0.
`
`INTEL - 1012
`
`

`

`4.6 Multiplication
`
`251
`
`Although the decimal example above happened to use only O and 1, multipli(cid:173)
`cation of binary numbers must always use O and 1, and thus always offers
`only these two choices.
`Now that we have reviewed the basics of multiplication, the traditional next
`step is to provide the highly optimized multiply hardware. We break with tra(cid:173)
`dition in the belief that you will gain a better understanding by seeing the evo(cid:173)
`lution of the multiply hardware and algorithm through three generations. The
`rest of this section presents successive refinements of the hardware and the
`algorithm until we have a version used in some computers. For now, let's
`assume that we are multiplying only positive numbers.
`
`First Version of the Multiplication Algorithm and Hardware
`The initial design mimics the algorithm we learned in grammar school; the
`hardware is shown in Figure 4.25. We have drawn the hardware so that data
`flows from top to bottom to more closely resemble the paper-and-pencil
`method.
`Let's assume that the multiplier is in the 32-bit Multiplier register and that
`the 64-bit Product register is initialized to 0. From the paper-and-pencil exam(cid:173)
`ple above, it's clear that we will need to move the multiplicand left one digit
`each step as it may be added to the intermediate products. Over 32 steps a
`
`-Multiplicand
`
`Shift left
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`Product
`
`Write
`
`64 bits
`
`FIGURE 4.25 First version of the multiplication hardware. The Multiplicand register, ALU,
`and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The
`32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on
`each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts
`with the product initialized to 0. Control decides when to shift the Multiplicand and Multiplier
`registers and when to write new values into the Product register.
`
`INTEL - 1012
`
`

`

`252
`
`Chapter 4 Alitllmatlc for Computer(cid:127)
`
`32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Mul(cid:173)
`tiplicand register, initialized with the 32-bit multiplicand in the right half and
`0 in the left half. This register is then shifted left 1 bit each step to align the mul(cid:173)
`tiplicand with the sum being accumulated in the 64-bit Product register.
`Figure 4.26 shows the three basic steps needed for each bit. The least signif(cid:173)
`icant bit of the multiplier (MultiplicrO) determines whether the multiplicand is
`
`Start
`
`MultiplierO : 1
`
`Multipliero = o
`
`1a. Add multiplicand to product and
`place the result In Product register
`
`2. Shift the Multiplicand register left 1 bit
`
`3. Shift the Multiplier register right 1 bit
`
`Done
`
`FIGURE 4.28 The first multlpllcatlon algorltllm, u•ln,i the hardware shown In Figure 4.25,
`If thl' l~ast signific,mt bit of the multiplier is 1, add the mulliplicand to the product. If nut. gu to
`the n,•xt step. Shift tlw multiplicand ldt ,md thi, multiplier ri~hl in the ne>.t twu steps. These three
`step, an• repeated 32 limes.
`
`INTEL - 1012
`
`

`

`4,8 Multlpllcatlon
`
`253
`
`added to the Product register. The left shift in step 2 has the effect of moving
`the intermediate operands to the left, just as w hen multiplying by hand. The
`shift right in step 3 gives us the next bit of the multiplier to examine in the fol(cid:173)
`lowing iteration. These three steps are repeated 32 times to obtain the
`product.
`
`Example
`
`Answer
`
`First Multiply Algorithm
`
`Using 4-bit numbers to save space, multiply 21en x 31en, or 00101wo
`X 0011two·
`
`Figure 4.27 shows the value of each register for each of the steps labeled
`according to Figure 4.26, with the final value of 0000 01101wo or 61en· Color
`is used to indicate the register values that change on that step, and the bit
`circled is the one examined to determine the operation of the next step.
`
`Step
`
`it:Mi@iMll~MIWiifi:i·IM#tffiN
`
`IIMtM
`Initial values
`la: 1 (cid:157) Prod = Prod + Mcand
`2: Shift left Multiplicand
`3: Shift right Multiplier
`la: 1 (cid:157)
`Prod = Prod + Mcand
`2: Shift left Multiplicand
`3: Shift right Multiplier
`- - - -+
`1: o ~ no operation
`3
`2: Shift left Multiplicand
`3: Shift right Multlpller
`1: 0 (cid:157)
`no operation
`0000
`- - ---+--
`2: Shift left Multiplicand
`0000
`- - - ---+--
`-
`-
`-0000
`'--- -- ~3_: _S_hi_ft~rlght M_ul_tipc...l_ie_r - - -~-
`
`0
`1
`
`2
`
`4
`
`00
`0011
`0011
`00
`0001
`0001
`00
`0000
`0000
`•
`- -+--== ...
`0
`
`00000010
`0000 0010
`0000 0100
`00000100
`0000 0100
`00001000
`00001000
`00001000
`00010000
`00010000
`00010000
`0010 0000
`0010 0000
`
`0000 0000
`00000010
`0000 0010
`0000 0010
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`00000110
`
`FIGURE 4.27 Multlply •J1:(cid:127) mple u•lng flrat algorlthm In Figure 4.28. The bit examined l{l
`determine the next step Is circled in color.
`
`If each step took a clock cycle, this algorithm would require almost 100 dc;,ck
`cycles to multiply two 32-bit numbers. The relative importance of arithmetic
`operations like multiply varies with the program, but addition and subtraction
`may be anywhere fromS to 100 times more popular than multiply. According(cid:173)
`ly, in many applications, multiply can take multiple clock cycles without sig(cid:173)
`nificantly affecting performance. Yet Amdahl's law (see Chapter 2, page 75)
`reminds us that even a moderate frequency for a slow operation can limit
`performance.
`
`INTEL - 1012
`
`

`

`254
`
`C haptar 4 Arithmetic for Comp11tan1
`
`Multiplicand
`
`Shift right
`Write
`
`Product
`
`64 bits
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`FIGURE 4.28 Second version of the multlpllcatlon hard""are. CompAre with the firr,t vt•r(cid:173)
`sion in Figurl' 4.25. Tlu- Multiplicand rqiisler, ALU, nnd Multiplier Tl'~ister MP all J2 bits wid,•,
`"'ith only the, l'rodud rcgisMr lt>ft at 64 bit,. Now the product is shifl·c•d rii,:;ht. The,., d1nnges arc
`hi~hli)'\htect in color.
`
`Second Version of the Multiplication Algorlthm
`and Hardware
`Computer pioneers recognized that half of the bits of the multiplicand in the
`first algorithm were always 0, so only half could contain useful bit values. A
`full M-hit ALU thus seemed wasteful rmd slow since half of the adder bit~
`were adding Oto the intermediate sum,
`The original algorithm s'llifts the multiplicand left with Os inst•rted in tlw
`new positions, so the multiplicand cannot affect the least significant bits of the
`product after they settle down. lns te.:id of shifting the multiplicand left, they
`wondL•red, whal if we shiH the l'roducf l'i:,?/lf? Now lhe multiplicand would be
`fixed rel,1tive to tht.• product, ,md since we are adding only 32 bits, tl1e adder
`need be only 32 bits wide. Figure 4.28 shows how this change halves the
`widths of both the ALU ,md the multiplicand.
`Figure 4,29 shows the multiply algorithm inspired by this observation. This
`algorithm st.:irts with the 32-bit Multiplicand and 32-bit Multiplier registers set
`to their named values and the 64-bit Product register set to 0. This algorithm
`only forms a 32-bit sum, so only the left half of the 64-bit Product regislL'r is
`changed by the .idditinn.
`
`INTEL - 1012
`
`

`

`4,8 Multlpllcatlon
`
`255
`
`Start
`
`MultlpllerO = 1
`
`MultiplierO = 0
`
`la. Add multlpllcand to the left hair of
`the product and place the result In
`the left half of the Product register
`
`2. Shift the Product register right 1 bit
`
`3. Shift the Multiplier register right 1 bit
`
`Done
`
`FIGURE 4.29 The second multlpllcatlOII elgorlthm, ualnS: the hardware In FIS:ure 4.28. In
`this version, the Product register is shifted right instead of shif!ing the multiplicand. Color type
`shows the chal\ges from Figure 4.26.
`
`INTEL - 1012
`
`

`

`256
`
`Ch11pter 4 Arithmetic for Computers
`
`Second Multiply Algorithm
`
`Multiply 00101wo x 0011 twllusing the algorithm in Figure 4.29.
`
`Figure 4.30 shows the revised 4-bit example, again giving a product of
`0000 0l l01w,,·
`iiMiiM:F
`
`Step
`
`i@\MiH&i?MiMHi:i·MMM@N
`
`o
`1
`
`2
`
`3
`
`0000 0000
`0010
`00
`Initial values
`--4- --'----jf-------+--
`- - - - - -- --
`0010 0000
`la: 1 => Prod = Prod + Mcand
`0011
`0010
`00010000
`2: Shift right Product
`0011
`0010
`- - -----+- --~-ae--
`00010000
`00~
`0010
`3: Shift right Multlpl_ie_r ___ -+--- - -- -+ - - - --
`la: 1 ,,> Prod ,, Prod + Mcand
`0011 0000
`0001
`0010
`00011000
`2: Shift right Product
`0001
`0010
`00
`3: Shift right Multiplier
`0010
`00011000
`0010
`1: 0 => no operation ·
`0000
`00011000
`00001100
`0010
`2: Shift rilllll Proauct
`0000
`0000
`0010
`0000 1100
`3: Shifl right Multiplier
`1 : o => 110 operation
`0000
`0010
`00001100
`0000 0110
`0000
`0010
`2: Shifl right Product
`- - - - - --+---=~-
`0000
`0000 0110
`3: Shift right Multiplier
`0010
`
`FIGURE 4,30 Multiply example ualng Mcond algorithm in Figura 4. 29. The bit examined to
`determi.ne the next step is circled in color.
`
`Final Version of the Multiplication-Algorithm and
`Hardware
`The final ob!;ervali1m of the frugal computer piuneers was that the Product
`register had wallled ~pace that matched exactly the size of the multiplier: As
`the wasted space in the product disappears, so do the bits of tlw multiplier. In
`response, the third version of the multiplication algorithm combi.11eti the right(cid:173)
`most half of the product with the multiplier. figure 4.31 shows the hardware.
`The least significant bit of the 64-bit Product register (Product0) now is the bit
`to be tc~ted.
`The algorithm starts by assigning the multiplier to the right ha lf of the Prod(cid:173)
`uct register, placing O in the upper half. Figure 4.32 shows the new steps.
`
`INTEL - 1012
`
`

`

`4.6 Multlpllcatlon
`
`257
`
`Multiplicand
`
`Shift right
`Write
`
`Control
`test
`
`Product
`
`64 bits
`
`FIGURE 4.31 Third version of the multlpllcatlo(cid:127) hardware. Comparing w ith the second ver(cid:173)
`sion in Figure 4.28 on page 2S4, the separate Multiplier register has disappeared. The multiplier
`is placed instead in the right half of the Product register.
`
`Third Multiply Algorithm
`
`Multiply 001 Otwo x 00111wo using the algorithm in Figure 4.32.
`
`Figure 4.33 shows the revised 4-bit example for the final algorithm.
`
`Signed Multiplication
`Sofor we have dealt with positive numbers. The easiest way to understand
`how to deal with signed numbers is to first convert the multiplier and multi(cid:173)
`plicand to positive numbers and then remember the original signs. The algo(cid:173)
`rithms should then be run for 31 iterations, leaving the signs out of the
`calculation. As we learned in grammar school, we need negate the product
`only if the original signs disagree.
`It turns o ut that the last algorithm will work for signed numbers provided
`that we remember that the numbers we are dealing with have infinite digits,
`and that we are only representing them w1fh 32 bits. Hence the shifting steps
`would need to extend the sign of the product for signed numbers. When the
`algorithm completes, the lower word would have the 32-bit product.
`
`INTEL - 1012
`
`

`

`Chapter 4 Arithmetic for Computers
`
`Start
`
`Pro<:1uctO = 1
`
`ProduclO = O
`
`1a. Add multiplicand to the lett half of
`the product and place the result 111
`the left half of the Product register
`
`2. Shift the Product register right 1 bit
`
`Done
`
`FIGURE 4 .32 The third multlpllcatlon (cid:127)
`ICortlhm. It nel'<iS only two steps because thl! J'roduct
`and Multiplier regi,tl!rs have been combined. Color type shows changes from Figure 4.2'1.
`
`Mulllpllcand -
`
`NIMtdi:E
`
`Step
`
`Initial values
`la: 1 => Proo = Prod + Mcand
`2: Shirt right Product
`la: 1 => Prod = Prod + Mcand
`2: Shilt right Product - -- -
`1: 0 => no operation - - - -
`2: Shilt right Product
`- ----
`1; 0 •> no operation
`2: Shift right P-rOd_u_c_l _ _ _
`
`•
`
`I 0000 ~
`0010
`10- - -+ , - -0010 0011
`10
`0001 Ooo:!)
`10
`0011 0001
`00011ocg
`0010
`0001 1000
`0000 ll<@
`00001100
`0000 0110
`
`---- 0010
`
`00_10 _ _ _
`0010
`0010
`
`f'IGURE 4.83 MuNlply exampla ualn, third al.orlthm In Figure 4.32. The bit l!Xamined to
`determine, the next ,-tep is circled in rnlo r.
`
`INTEL - 1012
`
`

`

`4.6 Multlplicatlon
`
`259
`
`Booth's Algorithm
`A more elegant approach to multiplying signed numbers than above is called
`Booth's algorithm. It starts with the observation that with the ability to both
`add and subtract there are multiple ways to compute a product. Suppose we
`want to multiply 2ten by 6ten, or 0010two by OllOtwo:
`00l0two
`0ll0two
`
`X
`
`+
`0000
`+
`0010
`+
`0010
`+ 0000
`
`s hift (0 in multi pl i er)
`( 1 in multi pl i er)
`add
`( 1 in multi pl i er)
`add
`shift (0 in multi pl i er)
`
`00001 l00two
`Booth observed that an ALU that could add or subtract could get the same
`result in more than one way. For example, since
`
`6ten
`
`= - 2ten + Bten
`
`or
`
`0ll0two
`
`= - 00l0two + l000two
`
`we could replace a string of ls in the multiplier with an initial subtract when
`we first see a 1 and then later add when we see the bit after the last 1. For
`example,
`
`00l0two
`0ll0t\o/0
`
`X
`
`+
`
`0000 shift (0 in multiplier)
`sub (fir st 1 in multiplier)
`0010
`shift (middle of string of ls)
`+
`0000
`+_0_0_l_0 __ add (p rior step had la st 1)
`
`00001 l00two
`Booth invented this approach in a quest for speed because in machines of
`his era shifting was faster than addition. Indeed, for some patterns his algo(cid:173)
`rithm would be faster; it's our good fortune that it handles signed numbers as
`
`INTEL - 1012
`
`

`

`.....
`
`..., .............. b
`
`....., ..
`
`..., ..,
`
`..,.. .. .,...,..,.._,.,J
`
`...................... ......... ......... ,
`alone.
`If we are limited to looking at just 2 bits, we can t

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket