IPR2020-00532, No. 1012-13 Exhibit - Part 2 John L Hennessy David A Patterson, Computer Organization and Design The HardwareSoftware Interface 2d ed 1998 Hennessy (P.T.A.B. Feb. 7, 2020)

168
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.10 An Example to Put It All Together
`
`169
`
`add
`add
`lw
`lw
`s lt
`beq
`
`$tl. $tl,$tl
`HZ. $a0 . $tl
`$t3 . 0( HZ)
`H4 . 4( HZ)
`H O, H4. $t3
`HO, $zero. ex it2
`
`j * 4
`# reg $tl
`V + ( j * 4)
`# reg HZ
`v[j]
`# reg H3
`v[j + 1]
`# reg $t4
`reg $t0 = 0 if $t4
`#
`ff go to exit2 if $t4
`
`~ $t3
`~ $t3
`
`(body of second for loop)
`
`addi
`j
`exit2:
`
`$sl . $sl . -1
`for2tst
`
`ff
`- 1
`j = j
`ff jump to test of inner loop
`
`The Procedure Call in sort
`The next step is the body of the second for loop:
`swap(v,j);
`
`Calling s wap is easy enough:
`
`jal
`
`swap
`
`Passing Parameters in sort
`The problem comes when we want to pass parameters because the s o rt pro(cid:173)
`cedure needs the values in registers $a0 and $al, ~et the swap pr~ce~ure
`needs to have its parameters placed in those same reg1st~rs. _One solution 1s to
`the parameters for sort into other registers earlier m the procedure,
`:r:ing registers $ a 0 and $ a 1 available for the call of swap . (This copy_is faster
`than saving and restoring on the stack.) We first copy $ a 0 and $ a 1 mto $ s 2
`and $s3 during the procedure:
`ff copy parameter $a0 into $s2
`move
`$s2. $a0
`ff copy parameter $a l into $s3
`move
`$s3. $al
`Then we pass the parameters to swap with these two instructions:
`ff first swap parameter is v
`move
`$a0
`$s2
`ff second swap parameter is j
`move
`$al: $sl
`
`Preserving Registers in sort
`The only remaining code is the saving and re~toring of registers. Clearly w_e
`must save the return address in register $r a, smce sort 1s a procedure and 1s
`
`called itself. The so rt procedure also uses the saved registers $s 0, $s 1, $s 2,
`and $s3, so they must be saved . The prologue of th e s ort procedure is th en
`ff make r oom on stack fo r 5 r egs
`addi
`$sp , $sp , - 20
`ff s ave $ ra on s tack
`sw
`$r a , 16 ($s p)
`ff s av e $s 3 on s t ack
`sw
`$s 3 , 12 ( $sp )
`ff s ave $s 2 on stac k
`sw
`$s 2 . 8 ( $sp )
`ff s av e $sl on s t ac k
`sw
`$sl , 4($sp )
`ff save $s0 on s t ac k
`sw
`$s0, 0($sp)
`The tail of the procedure simply reverses all these instructions, then adds a j r
`to return.
`
`The Full Procedure sort
`Now we put all the pieces together in Figure 3.26, being careful to replace ref(cid:173)
`erences to registers $ a 0 and $ a 1 in th e for loops with re ferences to registers
`$ s 2 and $ s 3. Once again to make the code easier to follow, we identify each
`block of code with its purpose in the procedure. In this example, 9 lines of th e
`sort procedure in C beca me the 35 lines in the MIPS assembly language.
`
`Elaboration: One optimization that would work we ll
`in this example is procedure
`inlining. Instead of passing arguments in parameters and invoking the code with a j a l
`instruction , the compiler would copy the code from the body of the s wap procedure
`where the call to s wap appears in the code. lnlining would avoid four instructions in
`this example. The downside of the inlining optimization is that the compi led code would
`be bigger, assuming that the inlined procedure is cal led from several locations . Such a
`code expansion might turn into lower performance if it increased the cache miss rate;
`see Chapter 7.
`The MIPS compilers always save room on the stack for the arguments in case they
`need to be stored, so in reality they always decrement $ s p by 16 to make room for all
`4 argument registers (16 bytes) . One reason is that C provides a v a r a rg option that
`allows a pointer to pick, say, the third argument to a procedure. Wh en the com piler
`encounters the rare va r a r g, it copies the registers onto the stack into the reserved
`locations.
`
`INTEL - 1012
`
`

`170
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.11 Arrays versus Pointers
`
`171
`
`sort :
`
`addi
`SW
`
`SW
`
`SW
`
`SW
`
`SW
`
`$sp , $sp . -20
`$ ra , 16($spl
`$s3 , 12( $sp)
`$s2 , 8($sp)
`$s 1. 4($sp)
`$s0 , 0($sp)
`
`make room on stack for 5 registers
`save $ ra on stack
`save $s3 on stack
`save $s2 on stack
`save $sl on stack
`save $s0 on stack
`
`$s2, $a0
`$s3 , $al
`$s0 , $zero
`HO , $s0 , $s3
`H O. $ze r o . exitl
`$s 1. $s0 . -1
`H O, $s 1. 0
`$t0 , $zero , exit2
`Hl , $s 1. $sl
`$tl' $tl, Hl
`$t2 , $s2 , Hl
`$t3 , 0( $t2)
`$t4 , 4($t2)
`$t0 , $t 4 , H3
`H O, $zero , exit2
`$a0 , $s2
`$a 1. $s1
`swap
`$s 1. $5 l. - 1
`for2tst
`$s0 , $s0 ,
`fo r ltst
`
`Move parameters I
`
`Outer loop
`
`Inner loop
`
`Pass parameters
`and call
`
`Inner loop
`
`Outer loop
`
`move
`move
`move
`forltst : slt
`be q
`ad di
`I for2tst : s lt i
`bne
`add
`add
`add
`lw
`lw
`slt
`beq
`mo ve
`move
`jal
`addi
`j
`exit2 : addi
`j
`
`exi tl :
`
`lw
`l w
`lw
`lw
`lw
`addi
`
`$s0 . 0($sp)
`$s 1. 4 ( $s p )
`$s2, 8( $sp)
`$s3 , 12($sp)
`$r a , 16($sp)
`$sp , $sp , 20
`
`It restore $s0 from stack
`j/ restore $s1 from stac k
`It restore $s2 from stack
`j/ restore $s3 from stack
`j/ restore $ ra from stack
`It restore stack pointer
`Procedure return
`It return to calling routine
`
`jr
`
`$ra
`
`FIGURE 3.26 MIPS assembly version of procedure sort in Figure 3.25 on page 166.
`
`Saving registers
`It
`It
`It
`It
`It
`It
`Procedure body
`It
`copy parameter $a0 into $ s2 (save $a0 )
`
`It
`copy parameter $al into $
`s3 (save $al)
`It
`i = 0
`It
`reg $t0 = 0 if $s0 2 $s3
`It
`go to e xi tl if $s0 2 $s3
`It
`j = i - 1
`reg HO = 1 i f $sl < 0 ( j
`It
`go to exit2 if $sl < 0 ( j
`It
`reg Hl = j * 2
`It
`It
`j * 4
`reg Hl
`V + ( j * 4 )
`It
`reg $t2
`It
`V [ j]
`reg $t3
`v[j + 1]
`It
`reg $t4
`It
`reg HO= 0 if H4 2 H3
`It
`go t o exit2 if $t4 2 $t3
`It
`1st pa r ameter of s wap is
`V
`It
`j
`2nd parameter of swap is
`It swap code shown in Figur e 3 . 24
`It j = j - 1
`It jump to test of inner loop
`It i = i + 1
`It jump to test of outer loop
`Restoring registers
`
`II
`
`Arrays versus Pointers
`
`A challenging topic for any new programmer is understanding pointers.
`Comparing assembly code that uses arrays and array indices to the assembly
`code that uses pointers offers insight into that difference. This section shows
`C and MIPS assembly versions of two procedures to clear a sequence of
`words in memory: one using array indices and one using pointers.
`Figure 3.27 shows the two C procedures.
`
`Hardware
`Software
`Interface
`
`People used to be taught to use pointers in C to get greater effi(cid:173)
`ciency than available with arrays: "Use pointers, even if you
`can't understand
`the code." The procedure cl ea r 2 in
`Figure 3.27 is such an example. Modern optimizing compilers
`can produce just as good code for the array version of the code.
`The purpose of this section is to show how pointers map into
`MIPS instructions, and not to endorse a questionable style.
`
`clea r l( i nt array[] . int size)
`I
`
`i nt i;
`for ( i = O; i < size : i
`array[i) = 0 ;
`
`i + 1)
`
`clear2(int *array , int size)
`I
`
`int *P ;
`for (p = &array[0J ; p < &array[size ]; p
`*P = 0 ;
`
`p + 1)
`
`FIGURE 3.27 Two C procedures for setting an array to all zeros. Cl ear 1 uses indices,
`while cl ear 2 uses pointers. The second procedure needs some explanation for those unfamiliar
`with C. The address of a variable is indicated by & and referring to the object pointed to by a
`pointer is indicated by *. The declarations declare that array and p are pointers to integers. The
`first part of the for loop in cl ea r2 assigns the address of the first element of array to the pointer
`p. The second part of the for loop tests to see if the pointer is pointing beyond the last element of
`array. Incrementing a pointer by one, in the last part of the for loop, means moving the pointer
`to the next sequential object of its declared size. Since p is a pointer to integers, the compiler will
`generate MIPS instructions to increment p by four, the number of bytes in a MIPS integer. The
`assignment in the loop places 0 in the object pointed to by p.
`
`( i 2 n)
`( i 2 n)
`
`< 0)
`< 0)
`
`(old $a0)
`
`- 7
`
`INTEL - 1012
`
`

`172
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.11 Arrays versus Pointers
`
`173
`
`F
`
`Array Version of Clear
`
`Let's start with the array version, clear 1, focusing on the body of the loop
`and ignoring the procedure linkage code. We assume that the two parameters
`array and s i z e are found in the registers $ a O and $ a 1, and that i is allocated
`to register $t0.
`The initialization of i, the first part of the for loop, is straightforward:
`ff
`move HO, $zero
`i = 0 (register HO = 0)
`To set array [ i J to O we must first get its address. Start by multiplying i by 4
`to get the byte address:
`loopl: add $tl,$t0,$t0
`add $tl,$tl,$tl
`
`i * 2
`11 $ t 1 =
`# $tl = i * 4
`
`Since the starting address of the array is in a register, we must add it to the
`index to get the address of a r ray [ i J using an add instruction:
`add
`$t2,$a0,$tl
`# $t2 = address of array[i]
`
`(This example is an ideal situation for indexed addressing; see page 175.)
`Finally we can store O in that address:
`s w
`$zero , 0 ( $t 2 )
`
`fl array [ i J = 0
`This instruction is the end of the body of the loop, so the next step is to incre(cid:173)
`ment i:
`
`Pointer Version of Clear
`
`The second procedure that uses pointers allocates the two parameters array
`and size to the registers $ a O and $ a 1 and allocates p to register $t 0. The code
`for the second procedure starts with assigning the pointer p to the address of
`the first element of the array:
`
`move
`$t0,$a0
`# p = address of array[OJ
`The next code is the body of the for loop, which simply stores O into p:
`loop2:
`sw
`$zero,0($t0) # Memory[p] = 0
`
`This instruction implements the body of the loop, so the next code is the itera(cid:173)
`tion increment, which changes p to point to the next word:
`# p = p + 4
`addi
`$t0,$t0,4
`Incrementing a pointer by 1 means moving the pointer to the next sequential
`object in C. Since p is a pointer to integers, each of which use 4 bytes, the com(cid:173)
`piler increments p by 4.
`The loop test is next. The first step is calculating the address of the last ele(cid:173)
`ment of array. Start with multiplying size by 4 to get its byte address:
`I/ $t 1 = s i z e * 2
`add
`$ tl . $ a 1 , $ a 1
`add $tl,$tl,$tl
`# $tl =size* 4
`
`and then we add the product to the starting address of the array to get the
`address of the first word after the array:
`add $t2,$a0,$tl
`# $t2 = address of array[size]
`
`loop2:
`
`The loop test is simply to see if p is less than the last element of a r ray:
`fl $t3 = (p<&array[size]l
`slt
`$t3,$t0,$t2
`$ t3 . $zero , l o op 211 i f ( p < & array [ s i z e J ) go to I o op 2
`b n e
`With all the pieces completed, we can show a pointer version of the code to
`zero an array:
`move $t0, $a0
`# p = address of array[OJ
`fl Memory[p] = 0
`$zero,0($t(J)
`SW
`addi $t(J,$t0,4
`# p = p + 4
`fl $tl
`add $tl,$al,$al
`size* 2
`add $tl,$tl,$tl
`11 $tl
`size* 4
`add $t2. $a0, $tl
`# $t2
`address of array[size]
`s lL $t3,$t0,$t?
`11 $t3
`(p<&arrayCsizeJ)
`bne $t3. $zero. l oop2# if (p<&array[size]) go to loop?
`As in the first example, this code assumes size is greater than 0.
`
`addi $t0,$t0,l
`
`11
`The loop test checks if i is less than size:
`fl $t3 = (i < size)
`slt
`$t3,$t0,$al
`bne
`$t3,$zero,loopl # if (i < size) go to loopl
`
`i
`
`i + 1
`
`0
`
`We have now seen all the pieces of the procedure. Here is the MIPS code for
`clearing an array using indices:
`move
`$t0. $zero
`loopl: add
`$tl,$t0,$t0
`add
`$tl,$tl,$tl
`add
`$t2,$a0,$tl
`sw
`$zero, 0($t2)
`addi
`$t0,$t0,l
`slt
`$t3,$t0,$al
`bne
`$t3. $zero, l oopl
`
`ff i
`i * 2
`if $ t 1
`i * 4
`# $tl
`address of array[i]
`# $t2
`fl array[i J = 0
`# i = i + 1
`# $t3 = (i < size)
`fl if (i < size) go to loopl
`(This code works as long as size is greater than 0.)
`
`,I'
`
`Ii'
`:,1,
`,,
`I
`
`11 I,
`
`,I•
`, I
`
`INTEL - 1012
`
`

`174
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`175
`
`Note that this program calculates the address of the end of the array every
`iteration of the loop, even though it does not change. A faster version of the
`code moves this calculation outside the loop:
`# p = address of array[ (cid:143)]
`move $t0 ,$ a0
`add $tl , $al,$al
`# $tl - s i ze* 2
`# $tl - size* 4
`add $tl,$tl . $Ll
`it $1
`3cirj HZ . ian . $tl
`= addrPss
`# Memory[p] = 0
`loop2 : sw
`$zero , 0($t0)
`# p = p + 4
`addi $t0 , $t0 , 4
`slt $t3 , $t0 , $t2
`# $t3 = (p< &array[size])
`bne
`$t3 ,$ zero , loo p2 # if (p< &a r r ay [size]l go to loop2
`
`t ar-riJy[sizc
`
`Comparing the Two Versions of Clear
`Comparing the two code sequences side by side illustrates the difference
`between array indices and pointers (the changes introduced by the pointer
`version are highlighted):
`
`move
`loopl : add
`add
`add
`SW
`addi
`slt
`bne
`
`# i = 0
`HO , $zero
`i * z
`$tl,$t0 , $t0 # $tl
`II $tl = i * 4
`$tl , $tl , $tl
`II HZ= &array[i J
`HZ , $a0 , $tl
`$zero , 0 ( H Z ) II array [ i J = 0
`II i = i + 1
`HO , HO , 1
`II H3 = ( i < size)
`H3,HO , $al
`$t3,$zero , loopl# if () go to loopl
`
`loopZ :
`
`ii r & JI 1· a 'y [
`move HO, f.J
`ize ~ z
`add $tl , ;"l,Sal II $tl
`add $tl , $tl , Hl II $tl
`ze * 4
`II
`&array[ s i Zl']
`add HZ , $a0 , Hl HZ
`$zero , 0($ t0 )# Mrmo rylp] = 0
`SW
`p + 4
`II 1
`addi $t0 , $t0, 4
`s lt H3 . HO,H2 II $t3=Cr<&ar,ay[
`bne $t3 . $zero , loopZ# if () go to loopZ
`
`The version on the left must have the "multiply" and add inside the loop
`because i is incremented and each address must be recalculated from the new
`index; the memory pointer version on the right increments the pointer p
`directly. The pointer version reduces the instructions executed per iteration
`from 7 to 4. Many modern compilers will optimize the C code in cl earl to
`produce code similar to the assembly code above on the right-hand side.
`
`Elaboration: The C compiler would add a test to be sure that size is greater than 0.
`One way would be to add a jump just before the first instruction of the loop to the s l t
`instruction.
`
`II
`
`Real Stuff: PowerPC and 80x86
`Instructions
`
`Ben11ty is nltogether in the eye of the beholder.
`
`Margaret Wolfe Hungerford , Molly 811w11, 1877
`
`Designers of instruction sets sometimes provide more powerful operations
`than those found in MIPS. The goal is generally to reduce the number of
`instructions executed by a program. The danger is that this reduction can
`occur at the cost of simplicity, increasing the time a program takes to execute
`because the instructions are slower. This slowness may be the result of a
`slower clock cycle time or of requiring more clock cycles than a simpler
`sequence (see section 2.8 on page 82).
`The path toward operation complexity is thus fraught with peril. To avoid
`these problems, designers have moved toward simpler instructions. Section
`3.13 demonstrates the pitfalls of complexity.
`
`The IBM/ Motorola PowerPC
`The PowerPC, made by IBM and Motorola and used in the Apple Macintosh,
`shares many similarities to MIPS: both have 32 integer registers, instructions
`are all 32 bits long, and data transfer is possible only with loads and stores.
`The primary difference is two more addressing modes plus a few operations.
`
`Indexed Addressing
`In the examples above we saw cases where we needed one register to hold the
`base of the array and the other to hold the index of the array. PowerPC pro(cid:173)
`vides an addressing mode, often called indexed nddressing, that allows two reg(cid:173)
`isters to be added together. The MIPS code
`St0 , $a0 , Ss3 # Sao has base of an array , Ss3 is index
`add
`# reg Stl gets Memory[$a0+$s3]
`lw
`Stl , 0($t0)
`could be replaced by the following single instruction in PowerPC:
`lw
`Stl , $a0+$s3 # reg Stl gets Memory[$a0+$s3J
`Using the same notation as Figure 3.17, Figure 3.28 shows indexed address(cid:173)
`ing. It is available with both loads and stores.
`
`Update Addressing
`Imagine the case of a code sequence marching through an array of words in
`memory, such as in the array version of cl ear 1 on page 172. A frequent pair
`
`INTEL - 1012
`
`

`176
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`177
`
`a. Indexed addressing
`
`op
`
`rs
`
`rt
`
`rd
`
`I
`l
`Register
`L--':_-:._-:._-:._-_-_-_-_-_-_-_- -- -- ,- - - - - - -~, __ &---~; 1--------w_o_rd ______
`7
`
`Register
`
`_
`
`Memory
`
`b. Update addressing
`
`op
`
`rs
`
`I
`
`[I
`
`rt
`
`Address
`
`Register
`
`l
`0
`r
`
`· I
`
`Memory
`
`Word
`
`FIGURE 3 .28
`
`Illustration of indexed and update addressing mode. The operand is shaded in color.
`
`of operations would be loading a word and then incrementi1:g the base regis(cid:173)
`ter to point to the next word. The idea of update address,_ng is t_o have a new
`version of data transfer instructions that will automatically increment the
`base register to point to the next word each time data is transferr~d. Since the
`MIPS architecture uses byte addresses and words are 4 bytes, this new form
`would be equivalent to this pair of MIPS instructions:
`lw
`$t0 , 4($s3)
`# reg $t0 gets Memory[$s3+4J
`addi
`$s3 ,$ s3 ,4
`# $s3 = $s3 + 4
`
`The PowerPC includes an instruction like this:
`fl reg HO=Memory[$s3+4] ; $s3 = $s3+4
`lwu
`H0 , 4($s3)
`That is, the register is updated with the address calculated as part of the lo~d.
`Figure 3.28 also shows update addressing. PowerPC has update addressing
`options for both base and indexed addressing, and for both loads and stores.
`
`Unique PowerPC Instructions
`The PowerPC instructions follow the same architecture style as MIPS, largely
`relying on fast execution of simple instructions for performance. Here are a
`few exceptions.
`
`The first is load multiple and store multiple. These can transfer up to 32
`words of data in a single instruction and are intended to make fast copies of
`locations in memory by using load multiple and store multiple back to back.
`They also save code size when saving or restoring registers.
`A second example is loops. The Power PC has a special counter register, sep(cid:173)
`arate from the other 32 registers, to try to improve performance of a for loop.
`Suppose we wanted to execute the following C code:
`for (i = n ; i != O; i = i - 1)
`/ . . . I ;
`
`If we want to decrement a register, compare to 0, and then branch as long as
`the register is not 0, we could use the following MIPS instructions:
`Loop :
`
`addi
`bne
`
`1
`# $t0 = $t0
`$t0,$t0,-l
`# if HO != 0 go to Loop
`$t0,$zero, Loop
`In Power PC we could use a single instruction instead:
`l;
`be
`Loop , $ctr!=O
`#$ctr= $ctr -
`# if $ctr != 0 go to Loop
`
`Hardware
`Software
`Interface
`
`In addition to going against the advice of simplicity, such
`sophisticated operations ma y not exactly match what the
`compiler needs to produce. For example, suppose that
`instead of decrementing by one, the compiler wanted to
`increment by four, or instead of branching on not equal
`zero, the compiler wanted to branch if the index was less
`than or equal to
`the limit. Then
`the instruction just
`described would be a mismatch . When faced with such objections, the
`instruction set designer might then generalize the operation, adding another
`operand to specify the increment and perhaps an option on which branch
`condition to use. Then the danger is that a common case, say, incrementing by
`one, will be slower than a sequence of simple operations.
`
`The Intel 80x86
`
`MIPS was the vision of a single small group in 1985; the pieces of this archi(cid:173)
`tecture fit nicely together, and the whole architecture can be described suc(cid:173)
`cinctly. Such is not the case for the 80x86; it is the product of several
`independent groups who evolved the architecture over almost 20 years, add(cid:173)
`ing new features to the original instruction set as someone might add clothing
`to a packed bag. Here are important 80x86 milestones:
`
`•(cid:141)
`
`INTEL - 1012
`
`

`178
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`179
`
`,
`
`(cid:127) 1978: The Intel 8086 architecture was announced as an assembly(cid:173)
`language-compatible extension of the then-successful Intel 8080, an 8-bit
`microprocessor. The 8086 is a 16-bit architecture, with all internal registers
`16 bits wide. Unlike MIPS, the registers have dedicated uses, and hence
`the 8086 is not considered a general-purpose register architecture.
`(cid:127) 1980: The Intel 8087 floating-point coprocessor is announced. This ar(cid:173)
`chitecture extends the 8086 with about 60 floating-point instructions. In(cid:173)
`stead of using registers, it relies on a stack (see section 3.15 and section
`4.9).
`(cid:127) 1982: The 80286 extended the 8086 architecture by increasing the ad(cid:173)
`dress space to 24 bits, by creating an elaborate memory-mapping and
`protection model (see Chapter 7), and by adding a few instructions to
`round out the instruction set and to manipulate the protection model.
`(cid:127) 1985: The 80386 extended the 80286 architecture to 32 bits. In addition
`to a 32-bit architecture with 32-bit registers and a 32-bit address space,
`the 80386 added new addressing modes and additional operations. The
`added instructions make the 80386 nearly a general-purpose register
`machine. The 80386 also added paging support in addition to segment(cid:173)
`ed addressing (see Chapter 7). Like the 80286, the 80386 has a mode to
`execute 8086 programs without change.
`(cid:127) 1989-95: The subsequent 80486 in 1989, Pentium in 1992, and Pentium
`Pro in 1995 were aimed at higher performance, with only four instruc(cid:173)
`tions added to the user-visible instruction set: three to help with multi(cid:173)
`processing (Chapter 9) and a conditional move instruction.
`(cid:127) 1997: After the Pentium and Pentium Pro were shipping, Intel an(cid:173)
`nounced that it would expand the Pentium and the Pentium Pro archi(cid:173)
`tectures with MMX. This new set of 57 instructions uses the floating(cid:173)
`point stack to accelerate multimedia and communication applications.
`MMX instructions typically operate on multiple short data elements at
`a time, in the tradition of single instruction, multiple data (SIMD) archi(cid:173)
`tectures (see Chapter 9).
`This history illustrates the impact of the "golden handcuffs" of compatibility
`on the 80x86, as the existing software base at each step was too important to
`jeopardize with significant architectural changes.
`Whatever the artistic failures of the 80x86, keep in mind that there are more
`instances of this architectural family than of any other in the world, perhaps
`300 million in 1997. Nevertheless, this checkered ancestry has led to an archi(cid:173)
`tecture that is difficult to explain and impossible to love.
`Brace yourself for what you are about to see! Do not try to read this section
`with the care you would need to write 80x86 programs; the goal instead is to
`give you familiarity with the strengths and weaknesses of the world's most
`popular architecture.
`
`Rather than show the entire 16-bit and 32-bit instruction set, in this section
`we concentrate on the 32-bit subset that originated with the 80386, as this por(cid:173)
`tion of the architecture will be increasingly dominant over time. We start our
`explanation with the registers and addressing modes, move on to the integer
`operations, and conclude with an examination of instruction encoding.
`
`80x86 Registers and Data Addressing Modes
`The evolution of the instruction set can be seen in the registers of the 80386
`(Figure 3.29). The 80386 basically extended all 16-bit registers (except the seg(cid:173)
`ment registers) to 32 bits, prefixing an E to their name to indicate the 32-bit
`version. We'll refer to them generically as GPRs (general-purpose registers).
`The 80386 contains only eight GPRs. This means MIPS programs can use four
`times as many.
`
`Name
`
`31
`
`EAX
`
`ECX
`
`EDX
`
`EBX
`
`ESP
`
`EBP
`
`ESI
`
`EDI
`
`EIP
`
`EFLAGS
`
`Use
`
`0
`
`GPR 0
`
`GPR 1
`
`GPR 2
`
`GPR 3
`
`GPR 4
`
`GPR 5
`
`GPR 6
`
`GPR 7
`
`cs
`
`ss
`
`DS
`
`ES
`
`FS
`
`GS
`
`Code segment pointer
`
`Stack segment pointer (top of stack)
`
`Data segment pointer O
`
`Data segment pointer 1
`
`Data segment pointer 2
`
`Data segment pointer 3
`
`Instruction pointer (PC)
`
`Condition codes
`
`FIGURE 3.29 The 80386 register set. Starting with the 80386, the top eight registers were extended to 32
`bits and could also be used as general-purpose registers.
`
`INTEL - 1012
`
`

`180
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`181
`
`Description
`
`Register
`restrictions
`
`Address is in a register.
`
`Address is contents of base register
`plus displacement.
`
`not ESP or EBP
`
`not ESP or EBP
`
`Base: any GPR
`Index: not ESP
`
`MIPS equivalent
`-
`-
`
`-
`
`1
`
`Mode
`I Register indirect
`I Based mode
`I with 8- or 32-bit
`displacement
`Base plus
`scaled index
`
`Base plus scaled
`I
`index with
`8- or 32-bit
`displacement
`
`Source/destination operand type
`
`Second source operand
`
`-
`
`Register
`Register
`- - - - - -- - - - - - - - - - - - - - - - - - - - -
`Register
`Immediate
`
`-
`
`-
`
`Memory
`Register
`- ~ - - - - - - - - - - - - - - - - - - -
`Memory
`Register
`
`-
`
`-
`
`Memory
`
`Immediate
`
`Instruction types for the arithmetic, logical, and data transfer instructions.
`FIGURE 3.30
`The 80x86 ;illows the combinations shown. The only restriction is the absence of a memory(cid:173)
`memory mode. lmmediates may be 8, 16, or 32 bits in length; a register is any one of the 14 m;ijor
`registers in Figure 3.29 (not EIP or EFLAGS).
`
`The arithmetic, logical, and data transfer instructions are two-operand in(cid:173)
`structions that allow the combinations shown in Figure 3.30. There are two
`important differences here. The 80x86 arithmetic and logical instructions must
`have one operand act as both a source and a destination; MIPS allows separate
`registers for source and destination. This restriction puts more pressure on the
`limited registers, since one source register must be modified. The second im(cid:173)
`portant difference is that one of the operands can be in memory. Thus virtually
`any instruction may have one operand in memory, unlike MIPS and PowerPC.
`The seven data memory-addressing modes, described in detail below, offer
`two sizes of addresses within the instruction. These so-called displnce111ents can
`be 8 bits or 32 bits.
`Although a memory operand can use any addressing mode, there are re(cid:173)
`strictions on which registers can be used in a mode. Figure 3.31 shows the 80x86
`addressing modes and which GPRs cannot be used with that mode, plus how
`you would get the same effect using MIPS instructions.
`
`80x86 Integer Operations
`The 8086 provides support for both 8-bit (byte) and 16-bit (word) data types.
`The 80386 adds 32-bit addresses and data (double words) in the 80x86. The data
`type distinctions apply to register operations as well as memory accesses.
`Almost every opera tion works on both 8-bit data and on one longer data size.
`That size is determined by the mode, and is either 16 bits or 32 bits.
`Clearly some programs want to operate on data of all three sizes, so the
`80386 architects provide a convenient way to specify each version without ex(cid:173)
`panding code size significantly. They decided that most programs would be
`dominated by either 16-bit or 32-bit data, and so it made sense to be able to set
`a default large size. This default data size is set by a bit in the code segment
`register. To override the default data size, an 8-bit prefix is attached to the in(cid:173)
`struction to tell the machine to use the other large size for this instruction.
`The prefix solution was borrowed from the 8086, which allows multiple pre(cid:173)
`fixes to modify instruction behavior. The three original prefixes override the
`
`l w $ s O , 0 ( $ s l )
`l w $s0 , 10()($51 )// <; ] 6- bit 7
`II d is p laceme nt
`---r---=:T----- - -- -- - -- - -----1----- -- - -- __j_ - -- - --
`he address is
`Base:anyGPR
`mul $t0 ,$ s2 .4
`Index: not ESP ~ add $ t O, $t O, $ s 1
`Base + (2Scale x Index)
`where Scale has the value 0, 1, 2, or 3 .
`l w $ s0 . O ( HO )
`mu l $t O, $ s z. 4
`The address is
`a dd HO . $t O, $ s 1
`Base + (2scale x Index) + displacement
`l w $s0 . 100( HO ) fl <; 16-bi t
`where Scale has the value 0, 1, 2, or 3.
`II displacement
`
`_
`
`_ ________
`
`_J_ __ _ ~ _ __J
`
`_
`
`FIGURE 3.31 80x86 32-bit addressing modes with register restrictions and the equivalent MIPS code. The Base
`plus Scaled Index add_ressing mode'. not found in MIPS or the PowerPC, is included to avoid the multiplies by four (scale
`factor of 2) to turn an mdex ma register mto a byte address (see Figures 3.24 and 3.26). A scale factor of 1 is used for 16-bit
`data, and_a scale factor of 3 for 64-bit data. Scale factor of O means the address is not scaled. If the displacement is longer
`than 16 bits Ill the second or fourth modes, then the MIPS equivalent mode would need two more instructions: a
`L, i to
`load the upper 16 bits of the displacement and an add to sum the upper address with the base register $ J. (Intel gi,·es t\\'O
`different names to what 1s called Based addressing mode-Based and Indexed- but they are essenti;illv kkntical ,rnd we
`combme them here.)
`-
`
`default segment register, lock the bus to support a semaphore (see Chapter 9),
`or repeat the following instruction until the register ECX counts down to 0.
`This last prefix was intended to be paired with a byte move instruction to move
`a variable number of bytes. The 80386 also added a prefix to override the de(cid:173)
`fault address size.
`The 80x86 integer operations can be divided into four major classes:
`
`1. Data movement instructions, including move, push, and pop
`
`2. Arithmetic and logic instructions, including test and integer and deci(cid:173)
`mal arithmetic operations
`
`3. Control flow, including conditional branches, unconditional Jumps,
`calls, and returns
`
`4. String instructions, including string move and string compare
`
`The first two categories are unremarkable, except that the arithmetic and
`logic instruction operations allow the destination to either be a register or a
`memory location. Figure 3.32 shows some typical 80x86 instructions and their
`functions.
`Conditional branches on the PowerPC and the 80x86 are based on co11ditio11
`codes or flngs . Condition codes are set as a side effect of an operation; most Me
`used to compare the value of a result to 0. Branches then test the conditiun
`codes. The argument for condition codes is that they occur as part of normal
`operations and are faster to test than it is to compare registers as MIPS does for
`
`,
`
`l
`
`INTEL - 1012
`
`

`182
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`183
`
`I
`
`Instruction
`
`J E name
`
`JMP name
`CALL name
`MOVW EBX,[ ED l+45]
`PUSH ES!
`POP EDI
`ADD
`EAX , #6765
`TEST EDX,#4 2
`MOVSL
`
`Function
`
`{EIP=name ) ;
`
`if equal(condition code)
`EIP - 128 ~ name< EIP+l28
`EIP=name
`SP=SP - 4 ; M[SPJ=EIP+5; EIP=name ;
`EB X=M[ ED I +45 J
`SP=SP- 4 ; M[SP]= ES I
`EDl=M[SPJ ; SP=SP+4
`EAX= EAX + 6765
`Set condition code (flags) with EDX and 42hex
`
`M [EDI J =M [ES I J ;
`EDI=EDI+4 ; ESI=E SI+4
`
`FIGURE 3.32 Some typical 80x86 instructions and their functions. A list of frequent oper(cid:173)
`ations appears in Figure 3.33. The CALL saves the EIP of the next instruction on the stack. (EIP is
`the Intel PC.)
`
`beq and bne. The argument against condition codes is that the compare to 0 ex(cid:173)
`tends the time of the operation, since it uses extra hardware after the operation,
`and that often the programmer must use compare instructions to test a value
`that is not the result of an operation. Also, PC-relative branch addresses must
`be specified in the number of bytes, since unlike MIPS, 80386 instructions are
`not all 4 bytes in length.
`String instructions are part of the 8080 ancestry of the 80x86 and are not
`commonly executed in most programs. They are often slower than equivalent
`software routines (see the fallacy on page 185).
`Figure 3.33 lists some of the integer 80x86 instructions. Many of the instruc(cid:173)
`tions are available in both byte and word formats.
`
`80x86 Instruction Encoding
`Saving the worst for last, the encoding of instructions in the 8086 is complex,
`with many different instruction formats. Instructions for the 80386 may vary
`from 1 byte, when there are no operands, up to 17 bytes.
`Figure 3.34 shows the instruction format for several of the example instruc(cid:173)
`tions in Figure 3.32. The opcode byte usually contains a bit saying whether the
`operand is 8 bits or 32 bits. For some instructions the opcode may include the
`addressing mode and the register; this is true in many instructions that have
`the form "register= register op immediate." Other instructions use a "post(cid:173)
`byte" or extra opcode byte, labeled "mod, reg, r/m," which con

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases