IPR2020-00532, No. 1012-14 Exhibit - Part 3 John L Hennessy David A Patterson, Computer Organization and Design The HardwareSoftware Interface 2d ed 1998 Hennessy (P.T.A.B. Feb. 7, 2020)

236
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`237
`
`11
`
`Binvert
`
`Operation
`
`Carryln
`
`a-+-----+--.--i
`
`Result
`
`Carryout
`
`FIGURE 4.16 A 1-bit ALU that performs AND, OR, and addition on a and b or a and b. By
`selecting b (Binvert = 1) and setting Carry In to 1 in the _least significant bit of the ALU, we get
`two's complement subtraction of b from a instead of add1t1on of b to a.
`
`The adder will then calculate a + b + l. By selecting the inverted version of b,
`we get exactly what we want:
`
`a+b + I= a+(b+ I)= a+(- b) = a-b
`
`The simplicity of the hardware design of a two's complement adder helps
`explain why two's complement representation has become the universal stan(cid:173)
`dard for integer computer arithmetic.
`
`Tailoring the 32-Bit ALU to MIPS
`This set of operations-add, subtract, AND, OR-is found in the ALU of
`almost every computer. If we look at Figure 4.7 on page 228, we see that the
`operations of most MIPS instructions can be performed by this ALU. But the
`design of the ALU is incomplete.
`.
`.
`One instruction that still needs support is the set on less than mstruct10n
`(s l t ). Recall that the operation produces 1 if rs < rt, and 0 otherwise. Conse(cid:173)
`quently, s l t will set all but the least significant bit to 0, with the least signifi(cid:173)
`cant bit set according to the comparison. For the ALU to perform s l t , we first
`need to expand the three-input multiplexor in Figure 4.16 to add an input for
`the s l t result. We call that new input Less, and use it only for s l t .
`
`The top drawing of Figure 4.17 shows the new I-bit ALU with the expanded
`multiplexor. From the description of s l t above, we must connect Oto the Less
`input for the upper 31 bits of the ALU, since those bits are always set to 0. What
`remains to consider is how to compare and set the least significant bit for set on
`less than instructions.
`What happens if we subtract b from a? If the difference is negative, then a <
`b since
`
`(a - b) < 0 (cid:157)
`
`((a - b) + b) < (0 + b)
`a<b
`
`We want the least significant bit of a set on less than operation to be a 1 if a <
`b; that is, a 1 if a - b is negative and a 0 if it's positive. This desired result cor(cid:173)
`responds exactly to the sign-bit values: 1 means negative and 0 means posi(cid:173)
`tive. Following this line of argument, we need only connect the sign bit from
`the adder output to the least significant bit to get set on less than .
`Unfortunately, the Result output from the most significant ALU bit in the
`top of Figure 4.17 for the s l t operation is not the output of the adder; the ALU
`output for the s l t operation is obviously the input value Less.
`Thus, we need a new I-bit ALU for the most significant bit that has an extra
`output bit: the adder output. The bottom drawing of Figure 4.17 shows the de(cid:173)
`sign, with this new adder output line called Set, and used only for s l t. As long
`as we need a special ALU for the most significant bit, we added the overflow
`detection logic since it is also associated with that bit.
`Alas, the test of less than is a little more complicated than just described be(cid:173)
`cause of overflow; Exercise 4.23 on page 326 explores what must be done.
`Figure 4.18 shows the 32-bit ALU.
`Notice that every time we want the ALU to subtract, we set both Carryin
`and Binvert to l. For adds or logical operations, we want both control lines to
`be 0. We can therefore simplify control of the ALU by combining the Carryln
`and Binvert to a single control line called Bnegate.
`To further tailor the ALU to the MIPS instruction set, we must support con(cid:173)
`ditional branch instructions. These instructions branch either if two registers
`are equa l or if they are unequal. The easiest way to test equality with the ALU
`is to subtract b from a and then test to see if the result is 0 since
`(a - b = 0) (cid:157)
`a = b
`Thus, if we add hardware to test if the result is 0, we can test for equality.
`The simplest way is to OR all the outputs together and then send that signal
`through an inverter:
`Zero = (Result31 + Result30 + ... + Result2 + Resultl + Result0)
`Figure 4.19 shows the revised 32-bit ALU. We can think of the combination
`of the I-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for
`
`INTEL - 1012
`
`(cid:157)
`

`238
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`239
`
`Binvert
`
`Operation
`Carryln
`
`a - - i - - - - - - - t - - - - -...
`
`Result
`
`Less -+------------+---+t3
`
`Carryout
`
`Binvert
`
`Operation
`Carryln
`
`a -+-----+----11>1
`
`0
`
`1
`
`2
`
`Result
`
`1
`
`Less -+------+-+-t-----t---t--i3
`
`------r-- Set
`
`Overflow
`detection
`
`r----"T""+ Overflow
`
`(Top) A 1-blt ALU that performs AND, OR, and addition on a and b orb, and
`FIGURE 4.17
`(bottom) a 1-blt ALU for the most significant bit. The top drawing includes a direct input that
`is connected to perform the set on less than operation (see Figure 4.18); the bottom has a direct out(cid:173)
`put from the adder for the less than comparison called Set. (Refer to Exercise 4.42 to see how to cal(cid:173)
`culate overflow with fewer inputs.)
`
`Binvert
`
`Carryln
`
`Operation
`
`aO
`bO
`
`al
`bl
`0
`
`a2
`b2
`0
`
`r-------r--- ResultO
`
`r-----"T""-. Resu ltl
`
`Carryln
`ALU2
`Less
`Carryout
`
`r-----"T""-. Result2
`
`Carryln
`
`a31
`b31
`0
`
`r - - - - - - - . Result31
`
`r------1--. Overflow
`
`FIGURE 4.18 A 32-bit ALU constructed from the 31 copies of the 1-bit ALU in the top of
`Figure 4.17 and one 1-bit ALU in the bottom of that figure. The Less inputs are connected to
`0 except for the least significant bit, and that is connected to the Set output of the most significant
`bit. If the ALU performs a - band w e select the input 3 in the multiplexor in Figure 4.17, then
`Result = 0 . .. 001 if a < b, and Result = 0 ... 000 otherwise.
`
`the ALU, telling it to perform add, subtract, AND, OR, or set on less than.
`Figure 4.20 shows the ALU control lines and the corresponding ALU opera(cid:173)
`tion.
`Finally, now that we have seen what is inside a 32-bit ALU, we will use the
`universal symbol for a complete ALU, as shown in Figure 4.21.
`
`INTEL - 1012
`
`

`240
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`241
`
`Bnegate
`
`Operation
`
`ao
`bO
`
`a1
`b1
`0
`
`a2
`b2
`0
`
`a31
`b31
`0
`
`ResultO
`
`Less
`Carryout
`
`Result1 Dv--z,rn
`
`Result2
`
`Less
`Carryout
`
`Result31
`
`1 - - - - - - - , Set
`1-------+- - - - - - - - - - - - - Overflow
`
`Less
`
`FIGURE 4.19 The final 32-bit ALU. This adds a Zero detector to Figure 4.18.
`
`ALU control lines
`
`Function
`
`000
`001
`010
`110
`111
`
`and
`or
`add
`subtract
`set on less than
`
`FIGURE 4.20 The values of the three ALU control lines Bnegate and Operation and the
`corresponding ALU operations.
`
`ALU operation
`
`a
`
`b
`
`Carryout
`
`Zero
`Result
`Overflow
`
`FIGURE 4.21 The symbol commonly used to represent an ALU, as shown in Figure 4.19.
`This symbol is also used to represent an adder, so it is normally labeled either with ALU or
`Adder.
`
`Carry Lookahead
`
`The next question is, How quickly can this ALU add two 32-bit operands? We
`can determine the a and b inputs, but the Carry In input depends on the oper(cid:173)
`ation in the adjacent 1-bit adder. If we trace all the way through the chain of
`dependencies, we connect the most significant bit to the least significant bit,
`so the most significant bit of the sum must wait for the sequential evaluation of
`all 32 1-bit adders. This sequential chain reaction is too slow to be used in
`time-critical hardware.
`There are a variety of schemes to anticipate the carry so that the worst-case
`scenario is a function of the log2 of the number of bits in the adder. These an(cid:173)
`ticipatory signals are faster because they go through fewer gates in sequence,
`but it takes many more gates to anticipate the proper carry.
`A key to understanding fast carry schemes is to remember that, unlike soft(cid:173)
`ware, hardware executes in parallel whenever inputs change.
`
`Fast Carry Using "Infinite" Hardware
`Appendix B mentions that any equation can be represented in two levels of
`logic. Since the only external inputs are the two operands and the Carryln to
`the least significant bit of the adder, in theory we could calculate the Carryln
`values to all the remaining bits of the adder in just two levels of logic.
`For example, the Carry In for bit 2 of the adder is exactly the CarryOut of bit
`1, so the formula is
`Carryln2 = (b I · Carry In I)+ (a I · Carrylnl) + (a I· b I)
`Similarly, Carrylnl is defined as
`Carrylnl = (b0 · Carryln0) + (a0 · Carryln0) + (a0 · 60)
`
`INTEL - 1012
`
`

`242
`
`Chapter 4 Arithmetic for Computers
`
`Using the shorter and more traditional abbreviation of ci for Carrylni we can
`rewrite the formulas as
`'
`
`c2 = ( b 1 · c 1 ) + ( a 1 · c 1) + ( a 1 · b 1 )
`cl= (b0-c0)+(a0-c0)+(aO-bO)
`
`Substituting the definition of cl for the first equation results in this formula:
`c2 = (al ·a0-b0)+(al-a0-c0)+(al -b0-cO)
`+(bl -a0-b0)+(bl ·a0-cO)+(bl -b0-cO)+(al -bl)
`
`You ca~ imagine how the equation expands as we get to higher bits in the
`adder; rt _grows exponentially with the number of bits. This complexity is
`refle~t~~ m the cost of the hardware for fast carry, making this simple scheme
`proh1b1hvely expensive for wide adders.
`
`Fast Carry Using the First Level of Abstraction: Propagate and
`Generate
`
`Most fast carr~ sch~mes li~it the complexity of the equations to simplify the
`hardware, while still makmg substantial speed improvements over ripple
`carry. One such scheme is a carry-lookahead adder. In Chapter 1, we said com(cid:173)
`puter systems cope _with complexity by using levels of abstraction. A carry(cid:173)
`lookahead adder relies on levels of abstraction in its implementation.
`Let's factor our original equation as a first step:
`
`ci+l = (bi· ci) + (ai • ci) + (ai. bi)
`= (ai ·bi)+ (ai +bi)· ci
`
`If we were to rewrite the equation for c2 using this formula, we would see
`some repeated patterns:
`
`c2= (al -bl)+(al +bl)·((a0-b0)+(aO+bO)-cO)
`
`Note the repeated appearance of (ai • bi) and (ai + bi) in the formula above.
`T~ese two important factors are traditionally called generate (gi) and propagate
`(pz):
`
`I
`I
`I I
`
`Using them to define ci+ 1, we get
`
`gi = ai · bi
`pi = ai + bi
`
`ci+ 1 = gi + pi • ci
`To see where the signals get their names, suppose gi is 1. Then
`
`ci+l = gi +pi· ci = I+ pi• ci = 1
`
`4.5 Constructing an Arithmetic Logic Unit
`
`243
`
`That is, the adder generates a CarryOut (ci+ 1) independent of the value of Car(cid:173)
`ry In (ci). Now suppose that gi is O and pi is 1. Then
`
`ci+l = gi +pi· ci = 0 + I · ci = ci
`
`That is, the adder propagates Carryln to a CarryOut. Putting the two together,
`Carrylni+l is a 1 if either gi is 1 or both pi is 1 and Carrylni is 1.
`As an analogy, imagine a row of dominoes set on edge. The end domino can
`be tipped over by pushing one far away provided there are no gaps between
`the two. Similarly, a carry out can be made true by a generate far away provid(cid:173)
`ed all the propagates between them are true.
`Relying on the definitions of propagate and generate as our first level of ab(cid:173)
`straction, we can express the Carryln signals more economically. Let's show it
`for 4 bits:
`
`cl = go+ (pO • cO)
`
`c2 = gl + (pl · gO) + (pl · pO · cO)
`
`c3 = g2 + (p2 · gl) + (p2 · pl · gO) + (p2 · pl · pO · c0)
`
`c4 = g3 + (p3 · g2) + (p3 · p2 · gl) + (p3 · p2 · pl · g0)
`+ (p3 · p2 · pl · pO · cO)
`
`These equations just represent common sense: Carrylni is a 1 if some earlier
`adder generates a carry and all intermediary adders propagate a carry. Figure
`4.22 uses plumbing to try to explain carry lookahead.
`Even this simplified form leads to large equations and, hence, considerable
`logic even for a 16-bit adder. Let's try moving to two levels of abstraction.
`
`Fast Carry Using the Second Level of Abstraction
`First we consider this 4-bit adder with its carry-lookahead logic as a single
`building block. If we connect them in ripple carry fashion to form a 16-bit
`adder, the add will be faster than the original with a little more hardware.
`To go faster, we'll need carry lookahead at a higher level. To perform carry
`lookahead for 4-bit adders, we need propagate and generate signals at this
`higher level. Here they are for the four 4-bit adder blocks:
`
`PO = p3 · p2 · p 1 · pO
`Pl = p7 · p6 · pS · p4
`P2 = p 11 · p 10 · p9 · p8
`P3 = p15 · p14 · p13 · p12
`
`That is, the "super" propagate signal for the 4-bit abstraction (Pi) is true only
`if each of the bits in the group will propagate a carry.
`
`INTEL - 1012
`
`

`244
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`245
`
`CO = g3 + (p3 · g2) + (p3 · p2 · gl) + (p3 · p2 · p1 · gO)
`GI = g7 + (p7 · g6) + (p7 · p6 · g5) + (p7 · p6 · p5 · g4)
`C2 = gll + (p11 · glO) + (pll · p10 · g9) + (pll · p10 · p9 · g8)
`C3 = g15 + (p15 · g14) + (p15 · p14 · g13) + (p15 · p14 · p13 · g12)
`Figure 4.23 updates our plumbing analogy to show PO and GO.
`
`c4
`
`FIGURE 4.22 A plumbing analogy for carry lookahead for 1 bit, 2 bits, and 4 bits using
`water, pipes, and valves. The wrenches are turned to open and close valves. Water is shown in
`color. The output of the pipe (ci+l) will be full if either the nearest generate value (gi) is turned on
`or if the i propagate value (pi) is on and there is water further upstream, either from an earlier
`generate, or propagate with water behind it. Carryln (cO) can result in a carry out without the
`help of any generates, but with the help of nil propagates.
`
`For the "super" generate signal (Gi), we care only if there is a carry out of
`the most significant bit of the 4-bit group. This obviously occurs if generate is
`true for that most significant bit; it also occurs if an earlier generate is true and
`all the intermediate propagates, including that of the most significant bit, are
`also true:
`
`GO
`
`FIGURE 4.23 A plumbing analogy for the next-level carry-lookahead signals PO and GO.
`PO i, open only if all four propagates (pi) are open, while water flows in GO only if at least one
`generate (gi) is open and all the propagates downstream from that generate arc open.
`
`INTEL - 1012
`
`

`246
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`247
`
`Carryln
`I
`!
`
`Carryln
`
`ALUO
`PO
`GO
`
`!
`Carryln
`
`ALUl
`Pl
`Gl
`
`l
`
`Carryln
`
`ALU2
`P2
`G2
`
`l
`
`Carryln
`
`ALU3
`P3
`G3
`
`ao (cid:173)
`bo (cid:173)
`al (cid:173)
`bl (cid:173)
`a2 -
`b2 -
`a3 -
`b3 -
`
`a4 -
`b4 -
`a5 -
`b5 -
`a6 -
`b6 -
`a7 -
`b7 -
`
`a8 -
`b8 -
`a9 -
`b9 -
`a1O (cid:173)
`b10 -
`a11 -
`bll -
`
`a12 -
`b12 -
`a13 -
`b13 -
`a14 -
`b14 -
`a15 -
`b15 -
`
`pi
`gi
`
`Cl
`
`ci + 1
`
`pi+ 1
`gi+ 1
`
`C2
`
`ci+ 2
`
`pi+ 2
`gi+ 2
`
`C3
`
`Ci+ 3
`
`pi+ 3
`gi + 3
`
`ci + 4
`
`C4
`
`ResultO- 3
`
`C arry-lookahead unit
`
`Result4-7
`
`Result8- 11
`
`Result12- 15
`
`Carryout
`
`FIGURE 4.24 Four 4-blt ALUs using carry lookahead to form a 16-blt adder. Note that the
`carries come from the carry-lookahead unit, not from the 4-bit AL Us.
`
`Then the equations at this higher level of abstraction for the carry in for each
`4-bit group of the 16-bit adder (Cl, C2, C3, C4 in Figure 4.24) are very similar
`to the carry out equations for each bit of the 4-bit adder (cl, c2, c3, c4) on page
`243:
`
`Cl = GO+ (PO · c0 )
`
`C2 = Gl + (Pl · GO ) + (Pl · PO· c0 )
`C3 = G2 + (P2 · Gl) + (P2 · Pl · GO)+ (P2 · Pl · PO· c0)
`
`C4 = G3 + (P3 · G2) + (P3 · P2 · Gl) + (P3 · P2 ·Pl · GO)
`+ (P3 · P2 · Pl · PO · c0 )
`
`Figure 4.24 shows 4-bit adders connected with such a carry lookahead unit.
`Exercises 4.44 through 4.48 explore the speed differences between these carry
`schemes, different notations for multibit propagate and generate signals, and
`the design of a 64-bit adder.
`
`Example
`
`Answer
`
`Both Levels of the Propagate and Generate
`
`Determine the gi, pi, Pi, and Gi values of these two 16-bit numbers:
`a :
`0001 1010 0011 OOlltwo
`b:
`1110 0101 1110 lOlltwo
`Also, what is CarryOutlS (C4)?
`
`Aligning the bits makes it easy to see the values of generate gi (ai · bi) and
`propagate pi (ai +bi) :
`a :
`0001 1010 0011 0011
`b:
`1110 0101 1110 1011
`0000 0000 0010 0011
`gi :
`1111 1111 1111 1011
`pi :
`where the bits are numbered 15 to 0 from left to right. Next, the "super"
`propagates (P3, P2, Pl , PO) are simply the AND of the lower-level propa(cid:173)
`gates:
`P3 = 1 . I. I. I =
`I =
`P2 = 1 . I
`I
`I =
`Pl = I
`I
`I = 0
`PO = l · 0 · I
`
`I
`
`INTEL - 1012
`
`

`248
`
`Chapter 4 Arithmetic for Computers
`
`4.5 Constructing an Arithmetic Logic Unit
`
`249
`
`The "super" generates are more complex, so use the following equations:
`GO = g3 + (p3 · g2) + (p3 · p2 · gl) + (p3 · p2 · pl · g0)
`= 0 + (1 · 0) + (1 · 0 · 1) + (1 · 0 · 1 · 1) = 0 + 0 + 0 + 0 = 0
`GI = g7 + (p7 · g6) + (p7 · p6 · g5) + (p7 · p6 · p5 · g4)
`= 0 + (1 · 0) + (1 · 1 · 1) + (1 · 1 · 1 · 0) = 0 + 0 + 1 + 0 = 1
`G2 = gll + (pll · glO) + (pll · plO · g9) + (pll · plO · p9 · g8)
`0 + (1 · 0) + (1 · 1 · 0) + (1 · 1 · 1 · 0) = 0 + 0 + 0 + 0 = 0
`G3 = g15 + (p15 · g14) + (p15 · p14 · g13) + (p15 · p14 · p13· g12)
`= 0 + (1 · 0) + (1 · 1 · 0) + (1 · 1 · 1 · 0) = 0 + 0 + 0 + 0 = 0
`Finally, CarryOut15 is
`C4 = G3 + (P3 · G2) + (P3 · P2 · Gl) + (P3 · P2 · Pl · GO)
`+ (P3 · P2 · Pl ·PO· c0)
`= 0 + (1 · 0) + (1 · 1 · 1) + (1 · 1 · 1 · 0) + (1 · 1 · 1 · 0 · 0)
`= 0+0+1+0+0=1
`Hence there is a carry out when adding these two 16-bit numbers.
`
`The reason carry lookahead can make carries faster is that all logic begins
`evaluating the moment the clock cycle begins, and the result will not change
`once the output of each gate stops changing. By taking a shortcut of going
`through fewer gates to send the carry in signal, the output of the gates will
`stop changing sooner, and hence the time for the adder can be less.
`To appreciate the importance of carry lookahead, we need to calculate the
`relative performance between it and ripple carry adders.
`
`Example
`
`Speed of Ripple Carry versus Carry Lookahead
`
`One simple way to model time for logic is to assume each AND or OR gate
`takes the same time for a signal to pass through it. Time is estimated by
`simply counting the number of gates along the longest path through a
`piece of logic. Compare the number of gate delays for the critical paths of
`two 16-bit adders, one using ripple carry and one using two-level carry
`lookahead.
`
`Figure 4.13 on page 233 shows that the carry out signal takes two gate de(cid:173)
`lays per bit. Then the number of gate delays between a carry in to the least
`significant bit and the carry out of the most significant is 16 x 2 = 32.
`
`For carry lookahead, the carry out of the most significant bit is just C4,
`defined in the example. It takes two levels of logic to specify C4 in terms
`of Pi and Gi (the OR of several AND terms) . Pi is specified in one level of
`logic (AND) using pi, and Gi is specified in two levels using pi and gi, so
`the worst case for this next level of abstraction is two levels of logic. pi and
`gi are each one level of logic, defined in terms of ai and bi. If we assume
`one gate delay for each level of logic in these equations, the worst case is
`2 + 2 + I = 5 gate delays.
`Hence for 16-bit addition a carry-lookahead adder is six times faster,
`using this simple estimate of hardware speed.
`
`Summary
`
`The primary point of this section is that the traditional ALU can be con(cid:173)
`structed from a multiplexor and a few gates that are replicated 32 times. To
`make it more useful to the MIPS architecture, we expand the traditional ALU
`with hardware to test if the result is 0, detect overflow, and perform the basic
`operation for set on less than.
`Carry lookahead offers a faster path than waiting for the carries to ripple
`through all 32 1-bit adders. This faster path is paved by two signals, generate
`and propagate. The former creates a carry regardless of the carry input, and the
`other passes a carry along. Carry lookahead also gives another example of how
`abstraction is important in computer design to cope with complexity.
`
`Elaboration: We have now accounted for all but one of the arithmetic and logical
`operations for the core MIPS instruction set: the ALU in Figure 4 .21 omits support of
`shift instructions. It would be possible to widen the ALU multiplexor to include a left
`shift by 1 bit or right shift by 1 bit. But hardware designers have created a circuit called
`a barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add
`two 32-bit numbers, so shifting is normally done outside the ALU.
`
`Elaboration: The logic equation for the Sum output of the full adder on page 234 can
`be expressed more simply by using a more powerful gate than AND and OR. An exclu(cid:173)
`sive OR gate is true if the two operands disagree; that is,
`x ic- y (cid:157)
`1 and x == y (cid:157) 0
`In some technologies , exclusive OR is more efficient than two levels of AND and OR
`gates. Using the symbol EB to represent exclusive OR, here is the new equation:
`
`Sum= a EB b EB Carryln
`
`Also, we have drawn the ALU the traditional way, using gates. Computers are
`designed today in CMOS transistors , which are basically switches. CMOS ALU and bar(cid:173)
`rel shifters take advantage of these switches and have many fewer multiplexors than
`shown in our designs, but the design principles are similar.
`
`INTEL - 1012
`
`

`250
`
`Chapter 4 Arithmetic for Computers
`
`II Multiplication
`
`Multiplication is vexation,
`Division is as bad;
`The rule of three doth puzzle me,
`And practice drives me mad.
`
`Anonymous, Elizabethan manuscript, 1570
`
`With the construction of the ALU and explanation of addition, subtraction,
`and shifts, we are ready to build the more vexing operation of multiply.
`But first let's review the multiplication of decimal numbers in longhand to
`remind ourselves of the steps and the names of the operands. For reasons that
`will become clear shortly, we limit this decimal example to using only the dig(cid:173)
`its O and 1. Multiplying lOOOten by lOOlten:
`
`X
`
`Multiplicand
`Multiplier
`
`l000ten
`l00lten
`1000
`0000
`0000
`1000
`Product
`1001 000ten
`The first operand is called the multiplicand and the second the multiplier. The
`final result is called the product. As you may recall, the algorithm learned in
`grammar school is to take the digits of the multiplier one at a time from right
`to left, multiplying the multiplicand by the single digit of the multiplier and
`shifting the intermediate product one digit to the left of the earlier inter(cid:173)
`mediate products.
`The first observation is that the number of digits in the product is consider(cid:173)
`ably larger than the number in either the multiplicand or the multiplier. In fact,
`if we ignore the sign bits, the length of the multiplication of an n-bit multipli(cid:173)
`cand and an m-bit multiplier is a product that is n + m bits long. That is, n + m
`bits are required to represent all possible products. Hence, like add, multiply
`must cope with overflow because we frequently want a 32-bit product as the
`result of multiplying two 32-bit numbers.
`In this example we restricted the decimal digits to O and 1. With only two
`choices, each step of the multiplication is simple:
`
`1. Just place a copy of the multiplicand (1 x multiplicand) in the proper
`place if the multiplier digit is a 1, or
`
`2. Place O (0 x multiplicand) in the proper place if the digit is 0.
`
`4.6 Multiplication
`
`251
`
`Although the decimal example above happened to use only O and 1, multipli(cid:173)
`cation of binary numbers must always use O and 1, and thus always offers
`only these two choices.
`Now that we have reviewed the basics of multiplication, the traditional next
`step is to provide the highly optimized multiply hardware. We break with tra(cid:173)
`dition in the belief that you will gain a better understanding by seeing the evo(cid:173)
`lution of the multiply hardware and algorithm through three generations. The
`rest of this section presents successive refinements of the hardware and the
`algorithm until we have a version used in some computers. For now, let's
`assume that we are multiplying only positive numbers.
`
`First Version of the Multiplication Algorithm and Hardware
`The initial design mimics the algorithm we learned in grammar school; the
`hardware is shown in Figure 4.25. We have drawn the hardware so that data
`flows from top to bottom to more closely resemble the paper-and-pencil
`method.
`Let's assume that the multiplier is in the 32-bit Multiplier register and that
`the 64-bit Product register is initialized to 0. From the paper-and-pencil exam(cid:173)
`ple above, it's clear that we will need to move the multiplicand left one digit
`each step as it may be added to the intermediate products. Over 32 steps a
`
`-Multiplicand
`
`Shift left
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`Write,__ .....
`
`Control test
`
`64-bit ALU
`
`Product
`
`64 bits
`
`FIGURE 4.25 First version of the multiplication hardware. The Multiplicand register, ALU,
`and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The
`32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on
`each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts
`with the product initialized to 0. Control decides when to shift the Multiplicand and Multiplier
`registers and when to write new values into the Product register.
`
`INTEL - 1012
`
`

`252
`
`Chapter 4 Arithmetic for Computers
`
`4.6 Multiplication
`
`253
`
`32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Mul(cid:173)
`tiplicand register, initialized with the 32-bit multiplicand in the right half and
`0 in the left half. This register is then shifted left 1 bit each step to align the mul(cid:173)
`tiplicand with the sum being accumulated in the 64-bit Pro~uct register. .
`.
`Figure 4.26 shows the three basic steps needed for each bit. The l~ast s1gm~(cid:173)
`icant bit of the multiplier (Multiplier0) determines whether the multiplicand 1s
`
`added to the Product register. The left shift in step 2 has the effect of moving
`the intermediate operands to the left, just as when multiplying by hand. The
`shift right in step 3 gives us the next bit of the multiplier to examine in the fol(cid:173)
`lowing iteration. These three steps are repeated 32 times to obtain the
`product.
`
`Start
`
`First Multiply Algorithm
`
`Using 4-bit numbers to save space, multiply 2ten x 3tew or 0010two
`X 0011two·
`
`Figure 4.27 shows the value of each register for each of the steps labeled
`according to Figure 4.26, with the final value of 0000 0110two or 6ten· Color
`is used to indicate the register values that change on that step, and the bit
`circled is the one examined to determine the operation of the next step.
`
`~•~1~M~- ~a~.1~.•~~~~~~~~~~~~~~~~~~~~~~~•~x~M~1~™~·~1m~•~~~~~~~-MMM
`1----o __ t--ln_it_ia_l v_a_lu_e_s ______ --+ __ 0_0_1_(JJ _+--_0_0_0_0_0_0_1_0~
`0000 0~
`la: 1 (cid:157)
`Prod = Prod + Mcand
`0000 0010
`1
`0011
`0000 0010
`j
`f-2- :_S_h-ift- le_ft_M_u-lt-ip-lic_a_n_d _ ____ 0_0_1_1_-+-~00~0~0- 0~1~0~0--+-' 0000 0010
`
`1j
`
`
`
`Step
`
`0000 0010
`00000110~
`
`J
`I
`
`MultiplierO = 1
`
`MultiplierO = 0
`
`la. Add multiplicand to product and
`place the result in Product register
`
`2 . Shift the Multiplicand register left 1 bit
`
`3 . Shift the Multiplier register right 1 bit
`
`No: < 32 repetitions
`
`32 repetitions
`
`(_oo_ne )
`
`FIGURE 4.26 The first multiplication algorithm, using the hardware shown in Figure 4.25.
`If the least significant bit of the multiplier is 1, add the multiplicand to the product. If not, go to
`the next step. Shift the multiplicand left and the multiplier right in the next two steps. These three
`steps are repeated 32 times.
`
`OOLQJ
`3: Shift right Multiplier
`0000 0100
`~ - - - t - - - - - - - - - - - - - - - t - - - - - -+ - - - - - -~
`la: 1 (cid:157) Prod = Prod + Mcand
`2
`0001
`0000 0100
`2: Shift left Multiplicand
`0001
`_ 0000 1000
`/
`0000 0110
`0000 ~~~~ ~-
`0000 0110
`0000 ~
`0000 0110
`0001 0000
`0000 0110
`0000 0110
`0000 0110
`0000 0110
`0000 0110
`
`-
`
`3
`
`4
`
`3: Shift right Multiplier
`1: 0 (cid:157)
`no operation
`2: Shift left Multiplicand
`3: Shift right Multiplier
`1: O (cid:157)
`no operation
`2: Shift left Multiplicand
`3: Shift right Multiplier
`
`00~
`0000
`0000
`OOl\;)
`
`0000
`0000
`0000
`
`-~
`
`I
`I
`
`0001 0000
`
`00010~~
`001000~ I
`1
`0010 0000
`- - - I
`
`FIGURE 4.27 Multiply example using first algorithm in Figure 4.26. The bit exilmined to
`determine the next step is circled in color.
`
`If each step took a clock cycle, this algorithm would require almost 100 clock
`cycles to multiply two 32-bit numbers. The relative importance of arithmetic
`operations like multiply varies with the program, but addition and subtraction
`may be anywhere from 5 to 100 times more popular than multiply. According(cid:173)
`ly, in many applications, multiply can take multiple clock cycles without sig(cid:173)
`nificantly affecting performance. Yet Amdahl's law (see Chapter 2, page 75)
`reminds us that even a moderate frequency for a slow operation can limit
`performance.
`
`INTEL - 1012
`
`

`254
`
`Chapter 4 Arithmetic for Computers
`
`4.6 Multiplication
`
`255
`
`Multiplicand
`
`Shift right ,_ _ _
`Write ..---...
`
`Product
`
`64 bits
`
`-Multiplier
`
`Shift right
`
`32 bits
`
`FIGURE 4.28 Second version of the multiplication hardware. Compare with the first ver(cid:173)
`sion in Figure 4.25. The Multiplicand register, ALU, and Multiplier register are all 32 bits wide,
`with only the Product register left at 64 bits. Now the product is shifted right. These changes are
`highlighted in color.
`
`Second Version of the Multiplication Algorithm
`and Hardware
`Computer pioneers recognized that half of the bits of the multiplicand in the
`first algorithm were always 0, so only half could contain useful bit values. A
`full 64-bit ALU thus seemed wasteful and slow since half of the adder bits
`were adding Oto the intermediate sum.
`The original algorithm shifts the multiplicand left with Os inserted in the
`new positions, so the multiplicand cannot affect the least significant bits of the
`product after they settle down. Instead of shifting the multiplicand left, they
`wondered, what if we shift the product right? Now the multiplicand would be
`fixed relative to the product, and since we are adding only 32 bits, the adder
`need be only 32 bits wide. Figure 4.28 shows how this change halves the
`widths of both the ALU and the multiplicand.
`Figure 4.29 shows the multiply algorithm inspired by this observation. This
`algorithm starts with the 32-bit Multiplicand and 32-bit Multiplier registers set
`to their named values and the 64-bit Product register set to 0. This algorithm
`only forms a 32-bit sum, so only the left half of the 64-bit Product register is
`changed by the addition.
`
`Start
`
`MultiplierO = 1
`
`MultiplierO = 0
`
`1a. Add multiplicand to the left half of
`the product and place the result in
`the left half of the Product register
`
`2. Shift the Product register right 1 bit
`
`3. Shift the Multiplier register right 1 bit
`
`Done
`
`FIGURE 4.29 The second multiplication algorithm, using the hardware In Figure 4.28. Jn
`this vers10n, the Product register is shifted right instead of shifting the multiplicand. Color type
`shows the changes from Figure 4.26.
`
`INTEL - 1012
`
`

`256
`
`Chapter 4 Arithmetic for Computers
`
`4.6 Multiplication
`
`257
`
`Second Multiply Algorithm
`
`Multiply 0010two X 001ltwousing the algorithm in Figure 4.29.
`
`Figure 4.30 shows the revised 4-bit example, again giving a product of
`0000 0110two·
`
`Multiplicand
`
`32 bits
`
`-Product
`
`64 bits
`
`Shift right .._ _ _,
`Write - - -
`
`Control
`test
`
`FIGURE 4.31 Third version of the multiplication hardware. Comparing with the second ver(cid:173)
`sion in Figure 4.28 on page 254, the separate Multiplier register has disappeared. The multiplier
`is placed instead in the right half of the Product register.
`
`Third Multiply Algorithm
`
`Multiply 0010two x 0011two using the algorithm in Figure 4.32.
`
`Figure 4.33 shows the revised 4-bit example for the final algorithm.
`
`Signed Multiplication
`
`Sofar we have dealt with positive numbers. The easiest way to understand
`how to deal with signed numbers is to first convert the multiplier and multi(cid:173)
`plicand to positive numbers and then remember the original signs. The algo(cid:173)
`rithms should then be run for 31 iterations, leaving the signs out of the
`calculation. As we learned in grammar school, we need negate the product
`only if the original signs disagree.
`It turns out that the last algorithm will work for signed numbers provided
`that we remember that th

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases