`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.10 An Example to Put It All Together
`
`169
`
`add
`add
`lw
`lw
`s lt
`beq
`
`$tl. $tl,$tl
`HZ. $a0 . $tl
`$t3 . 0( HZ)
`H4 . 4( HZ)
`H O, H4. $t3
`HO, $zero. ex it2
`
`j * 4
`# reg $tl
`V + ( j * 4)
`# reg HZ
`v[j]
`# reg H3
`v[j + 1]
`# reg $t4
`reg $t0 = 0 if $t4
`#
`ff go to exit2 if $t4
`
`~ $t3
`~ $t3
`
`(body of second for loop)
`
`addi
`j
`exit2:
`
`$sl . $sl . -1
`for2tst
`
`ff
`- 1
`j = j
`ff jump to test of inner loop
`
`The Procedure Call in sort
`The next step is the body of the second for loop:
`swap(v,j);
`
`Calling s wap is easy enough:
`
`jal
`
`swap
`
`Passing Parameters in sort
`The problem comes when we want to pass parameters because the s o rt pro(cid:173)
`cedure needs the values in registers $a0 and $al, ~et the swap pr~ce~ure
`needs to have its parameters placed in those same reg1st~rs. _One solution 1s to
`the parameters for sort into other registers earlier m the procedure,
`:r:ing registers $ a 0 and $ a 1 available for the call of swap . (This copy_is faster
`than saving and restoring on the stack.) We first copy $ a 0 and $ a 1 mto $ s 2
`and $s3 during the procedure:
`ff copy parameter $a0 into $s2
`move
`$s2. $a0
`ff copy parameter $a l into $s3
`move
`$s3. $al
`Then we pass the parameters to swap with these two instructions:
`ff first swap parameter is v
`move
`$a0
`$s2
`ff second swap parameter is j
`move
`$al: $sl
`
`Preserving Registers in sort
`The only remaining code is the saving and re~toring of registers. Clearly w_e
`must save the return address in register $r a, smce sort 1s a procedure and 1s
`
`called itself. The so rt procedure also uses the saved registers $s 0, $s 1, $s 2,
`and $s3, so they must be saved . The prologue of th e s ort procedure is th en
`ff make r oom on stack fo r 5 r egs
`addi
`$sp , $sp , - 20
`ff s ave $ ra on s tack
`sw
`$r a , 16 ($s p)
`ff s av e $s 3 on s t ack
`sw
`$s 3 , 12 ( $sp )
`ff s ave $s 2 on stac k
`sw
`$s 2 . 8 ( $sp )
`ff s av e $sl on s t ac k
`sw
`$sl , 4($sp )
`ff save $s0 on s t ac k
`sw
`$s0, 0($sp)
`The tail of the procedure simply reverses all these instructions, then adds a j r
`to return.
`
`The Full Procedure sort
`Now we put all the pieces together in Figure 3.26, being careful to replace ref(cid:173)
`erences to registers $ a 0 and $ a 1 in th e for loops with re ferences to registers
`$ s 2 and $ s 3. Once again to make the code easier to follow, we identify each
`block of code with its purpose in the procedure. In this example, 9 lines of th e
`sort procedure in C beca me the 35 lines in the MIPS assembly language.
`
`Elaboration: One optimization that would work we ll
`in this example is procedure
`inlining. Instead of passing arguments in parameters and invoking the code with a j a l
`instruction , the compiler would copy the code from the body of the s wap procedure
`where the call to s wap appears in the code. lnlining would avoid four instructions in
`this example. The downside of the inlining optimization is that the compi led code would
`be bigger, assuming that the inlined procedure is cal led from several locations . Such a
`code expansion might turn into lower performance if it increased the cache miss rate;
`see Chapter 7.
`The MIPS compilers always save room on the stack for the arguments in case they
`need to be stored, so in reality they always decrement $ s p by 16 to make room for all
`4 argument registers (16 bytes) . One reason is that C provides a v a r a rg option that
`allows a pointer to pick, say, the third argument to a procedure. Wh en the com piler
`encounters the rare va r a r g, it copies the registers onto the stack into the reserved
`locations.
`
`INTEL - 1012
`
`
`
`170
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.11 Arrays versus Pointers
`
`171
`
`sort :
`
`addi
`SW
`
`SW
`
`SW
`
`SW
`
`SW
`
`$sp , $sp . -20
`$ ra , 16($spl
`$s3 , 12( $sp)
`$s2 , 8($sp)
`$s 1. 4($sp)
`$s0 , 0($sp)
`
`make room on stack for 5 registers
`save $ ra on stack
`save $s3 on stack
`save $s2 on stack
`save $sl on stack
`save $s0 on stack
`
`$s2, $a0
`$s3 , $al
`$s0 , $zero
`HO , $s0 , $s3
`H O. $ze r o . exitl
`$s 1. $s0 . -1
`H O, $s 1. 0
`$t0 , $zero , exit2
`Hl , $s 1. $sl
`$tl' $tl, Hl
`$t2 , $s2 , Hl
`$t3 , 0( $t2)
`$t4 , 4($t2)
`$t0 , $t 4 , H3
`H O, $zero , exit2
`$a0 , $s2
`$a 1. $s1
`swap
`$s 1. $5 l. - 1
`for2tst
`$s0 , $s0 ,
`fo r ltst
`
`Move parameters I
`
`Outer loop
`
`Inner loop
`
`Pass parameters
`and call
`
`Inner loop
`
`Outer loop
`
`move
`move
`move
`forltst : slt
`be q
`ad di
`I for2tst : s lt i
`bne
`add
`add
`add
`lw
`lw
`slt
`beq
`mo ve
`move
`jal
`addi
`j
`exit2 : addi
`j
`
`exi tl :
`
`lw
`l w
`lw
`lw
`lw
`addi
`
`$s0 . 0($sp)
`$s 1. 4 ( $s p )
`$s2, 8( $sp)
`$s3 , 12($sp)
`$r a , 16($sp)
`$sp , $sp , 20
`
`It restore $s0 from stack
`j/ restore $s1 from stac k
`It restore $s2 from stack
`j/ restore $s3 from stack
`j/ restore $ ra from stack
`It restore stack pointer
`Procedure return
`It return to calling routine
`
`jr
`
`$ra
`
`FIGURE 3.26 MIPS assembly version of procedure sort in Figure 3.25 on page 166.
`
`Saving registers
`It
`It
`It
`It
`It
`It
`Procedure body
`It
`copy parameter $a0 into $ s2 (save $a0 )
`
`It
`copy parameter $al into $
`s3 (save $al)
`It
`i = 0
`It
`reg $t0 = 0 if $s0 2 $s3
`It
`go to e xi tl if $s0 2 $s3
`It
`j = i - 1
`reg HO = 1 i f $sl < 0 ( j
`It
`go to exit2 if $sl < 0 ( j
`It
`reg Hl = j * 2
`It
`It
`j * 4
`reg Hl
`V + ( j * 4 )
`It
`reg $t2
`It
`V [ j]
`reg $t3
`v[j + 1]
`It
`reg $t4
`It
`reg HO= 0 if H4 2 H3
`It
`go t o exit2 if $t4 2 $t3
`It
`1st pa r ameter of s wap is
`V
`It
`j
`2nd parameter of swap is
`It swap code shown in Figur e 3 . 24
`It j = j - 1
`It jump to test of inner loop
`It i = i + 1
`It jump to test of outer loop
`Restoring registers
`
`II
`
`Arrays versus Pointers
`
`A challenging topic for any new programmer is understanding pointers.
`Comparing assembly code that uses arrays and array indices to the assembly
`code that uses pointers offers insight into that difference. This section shows
`C and MIPS assembly versions of two procedures to clear a sequence of
`words in memory: one using array indices and one using pointers.
`Figure 3.27 shows the two C procedures.
`
`Hardware
`Software
`Interface
`
`People used to be taught to use pointers in C to get greater effi(cid:173)
`ciency than available with arrays: "Use pointers, even if you
`can't understand
`the code." The procedure cl ea r 2 in
`Figure 3.27 is such an example. Modern optimizing compilers
`can produce just as good code for the array version of the code.
`The purpose of this section is to show how pointers map into
`MIPS instructions, and not to endorse a questionable style.
`
`clea r l( i nt array[] . int size)
`I
`
`i nt i;
`for ( i = O; i < size : i
`array[i) = 0 ;
`
`i + 1)
`
`clear2(int *array , int size)
`I
`
`int *P ;
`for (p = &array[0J ; p < &array[size ]; p
`*P = 0 ;
`
`p + 1)
`
`FIGURE 3.27 Two C procedures for setting an array to all zeros. Cl ear 1 uses indices,
`while cl ear 2 uses pointers. The second procedure needs some explanation for those unfamiliar
`with C. The address of a variable is indicated by & and referring to the object pointed to by a
`pointer is indicated by *. The declarations declare that array and p are pointers to integers. The
`first part of the for loop in cl ea r2 assigns the address of the first element of array to the pointer
`p. The second part of the for loop tests to see if the pointer is pointing beyond the last element of
`array. Incrementing a pointer by one, in the last part of the for loop, means moving the pointer
`to the next sequential object of its declared size. Since p is a pointer to integers, the compiler will
`generate MIPS instructions to increment p by four, the number of bytes in a MIPS integer. The
`assignment in the loop places 0 in the object pointed to by p.
`
`( i 2 n)
`( i 2 n)
`
`< 0)
`< 0)
`
`(old $a0)
`
`- 7
`
`INTEL - 1012
`
`
`
`172
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.11 Arrays versus Pointers
`
`173
`
`F
`
`Array Version of Clear
`
`Let's start with the array version, clear 1, focusing on the body of the loop
`and ignoring the procedure linkage code. We assume that the two parameters
`array and s i z e are found in the registers $ a O and $ a 1, and that i is allocated
`to register $t0.
`The initialization of i, the first part of the for loop, is straightforward:
`ff
`move HO, $zero
`i = 0 (register HO = 0)
`To set array [ i J to O we must first get its address. Start by multiplying i by 4
`to get the byte address:
`loopl: add $tl,$t0,$t0
`add $tl,$tl,$tl
`
`i * 2
`11 $ t 1 =
`# $tl = i * 4
`
`Since the starting address of the array is in a register, we must add it to the
`index to get the address of a r ray [ i J using an add instruction:
`add
`$t2,$a0,$tl
`# $t2 = address of array[i]
`
`(This example is an ideal situation for indexed addressing; see page 175.)
`Finally we can store O in that address:
`s w
`$zero , 0 ( $t 2 )
`
`fl array [ i J = 0
`This instruction is the end of the body of the loop, so the next step is to incre(cid:173)
`ment i:
`
`Pointer Version of Clear
`
`The second procedure that uses pointers allocates the two parameters array
`and size to the registers $ a O and $ a 1 and allocates p to register $t 0. The code
`for the second procedure starts with assigning the pointer p to the address of
`the first element of the array:
`
`move
`$t0,$a0
`# p = address of array[OJ
`The next code is the body of the for loop, which simply stores O into p:
`loop2:
`sw
`$zero,0($t0) # Memory[p] = 0
`
`This instruction implements the body of the loop, so the next code is the itera(cid:173)
`tion increment, which changes p to point to the next word:
`# p = p + 4
`addi
`$t0,$t0,4
`Incrementing a pointer by 1 means moving the pointer to the next sequential
`object in C. Since p is a pointer to integers, each of which use 4 bytes, the com(cid:173)
`piler increments p by 4.
`The loop test is next. The first step is calculating the address of the last ele(cid:173)
`ment of array. Start with multiplying size by 4 to get its byte address:
`I/ $t 1 = s i z e * 2
`add
`$ tl . $ a 1 , $ a 1
`add $tl,$tl,$tl
`# $tl =size* 4
`
`and then we add the product to the starting address of the array to get the
`address of the first word after the array:
`add $t2,$a0,$tl
`# $t2 = address of array[size]
`
`loop2:
`
`The loop test is simply to see if p is less than the last element of a r ray:
`fl $t3 = (p<&array[size]l
`slt
`$t3,$t0,$t2
`$ t3 . $zero , l o op 211 i f ( p < & array [ s i z e J ) go to I o op 2
`b n e
`With all the pieces completed, we can show a pointer version of the code to
`zero an array:
`move $t0, $a0
`# p = address of array[OJ
`fl Memory[p] = 0
`$zero,0($t(J)
`SW
`addi $t(J,$t0,4
`# p = p + 4
`fl $tl
`add $tl,$al,$al
`size* 2
`add $tl,$tl,$tl
`11 $tl
`size* 4
`add $t2. $a0, $tl
`# $t2
`address of array[size]
`s lL $t3,$t0,$t?
`11 $t3
`(p<&arrayCsizeJ)
`bne $t3. $zero. l oop2# if (p<&array[size]) go to loop?
`As in the first example, this code assumes size is greater than 0.
`
`addi $t0,$t0,l
`
`11
`The loop test checks if i is less than size:
`fl $t3 = (i < size)
`slt
`$t3,$t0,$al
`bne
`$t3,$zero,loopl # if (i < size) go to loopl
`
`i
`
`i + 1
`
`0
`
`We have now seen all the pieces of the procedure. Here is the MIPS code for
`clearing an array using indices:
`move
`$t0. $zero
`loopl: add
`$tl,$t0,$t0
`add
`$tl,$tl,$tl
`add
`$t2,$a0,$tl
`sw
`$zero, 0($t2)
`addi
`$t0,$t0,l
`slt
`$t3,$t0,$al
`bne
`$t3. $zero, l oopl
`
`ff i
`i * 2
`if $ t 1
`i * 4
`# $tl
`address of array[i]
`# $t2
`fl array[i J = 0
`# i = i + 1
`# $t3 = (i < size)
`fl if (i < size) go to loopl
`(This code works as long as size is greater than 0.)
`
`,I'
`
`Ii'
`:,1,
`,,
`I
`
`11 I,
`
`,I•
`, I
`
`INTEL - 1012
`
`
`
`174
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`175
`
`Note that this program calculates the address of the end of the array every
`iteration of the loop, even though it does not change. A faster version of the
`code moves this calculation outside the loop:
`# p = address of array[ (cid:143)]
`move $t0 ,$ a0
`add $tl , $al,$al
`# $tl - s i ze* 2
`# $tl - size* 4
`add $tl,$tl . $Ll
`it $1
`3cirj HZ . ian . $tl
`= addrPss
`# Memory[p] = 0
`loop2 : sw
`$zero , 0($t0)
`# p = p + 4
`addi $t0 , $t0 , 4
`slt $t3 , $t0 , $t2
`# $t3 = (p< &array[size])
`bne
`$t3 ,$ zero , loo p2 # if (p< &a r r ay [size]l go to loop2
`
`t ar-riJy[sizc
`
`Comparing the Two Versions of Clear
`Comparing the two code sequences side by side illustrates the difference
`between array indices and pointers (the changes introduced by the pointer
`version are highlighted):
`
`move
`loopl : add
`add
`add
`SW
`addi
`slt
`bne
`
`# i = 0
`HO , $zero
`i * z
`$tl,$t0 , $t0 # $tl
`II $tl = i * 4
`$tl , $tl , $tl
`II HZ= &array[i J
`HZ , $a0 , $tl
`$zero , 0 ( H Z ) II array [ i J = 0
`II i = i + 1
`HO , HO , 1
`II H3 = ( i < size)
`H3,HO , $al
`$t3,$zero , loopl# if () go to loopl
`
`loopZ :
`
`ii r & JI 1· a 'y [
`move HO, f.J
`ize ~ z
`add $tl , ;"l,Sal II $tl
`add $tl , $tl , Hl II $tl
`ze * 4
`II
`&array[ s i Zl']
`add HZ , $a0 , Hl HZ
`$zero , 0($ t0 )# Mrmo rylp] = 0
`SW
`p + 4
`II 1
`addi $t0 , $t0, 4
`s lt H3 . HO,H2 II $t3=Cr<&ar,ay[
`bne $t3 . $zero , loopZ# if () go to loopZ
`
`The version on the left must have the "multiply" and add inside the loop
`because i is incremented and each address must be recalculated from the new
`index; the memory pointer version on the right increments the pointer p
`directly. The pointer version reduces the instructions executed per iteration
`from 7 to 4. Many modern compilers will optimize the C code in cl earl to
`produce code similar to the assembly code above on the right-hand side.
`
`Elaboration: The C compiler would add a test to be sure that size is greater than 0.
`One way would be to add a jump just before the first instruction of the loop to the s l t
`instruction.
`
`II
`
`Real Stuff: PowerPC and 80x86
`Instructions
`
`Ben11ty is nltogether in the eye of the beholder.
`
`Margaret Wolfe Hungerford , Molly 811w11, 1877
`
`Designers of instruction sets sometimes provide more powerful operations
`than those found in MIPS. The goal is generally to reduce the number of
`instructions executed by a program. The danger is that this reduction can
`occur at the cost of simplicity, increasing the time a program takes to execute
`because the instructions are slower. This slowness may be the result of a
`slower clock cycle time or of requiring more clock cycles than a simpler
`sequence (see section 2.8 on page 82).
`The path toward operation complexity is thus fraught with peril. To avoid
`these problems, designers have moved toward simpler instructions. Section
`3.13 demonstrates the pitfalls of complexity.
`
`The IBM/ Motorola PowerPC
`The PowerPC, made by IBM and Motorola and used in the Apple Macintosh,
`shares many similarities to MIPS: both have 32 integer registers, instructions
`are all 32 bits long, and data transfer is possible only with loads and stores.
`The primary difference is two more addressing modes plus a few operations.
`
`Indexed Addressing
`In the examples above we saw cases where we needed one register to hold the
`base of the array and the other to hold the index of the array. PowerPC pro(cid:173)
`vides an addressing mode, often called indexed nddressing, that allows two reg(cid:173)
`isters to be added together. The MIPS code
`St0 , $a0 , Ss3 # Sao has base of an array , Ss3 is index
`add
`# reg Stl gets Memory[$a0+$s3]
`lw
`Stl , 0($t0)
`could be replaced by the following single instruction in PowerPC:
`lw
`Stl , $a0+$s3 # reg Stl gets Memory[$a0+$s3J
`Using the same notation as Figure 3.17, Figure 3.28 shows indexed address(cid:173)
`ing. It is available with both loads and stores.
`
`Update Addressing
`Imagine the case of a code sequence marching through an array of words in
`memory, such as in the array version of cl ear 1 on page 172. A frequent pair
`
`INTEL - 1012
`
`
`
`176
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`177
`
`a. Indexed addressing
`
`op
`
`rs
`
`rt
`
`rd
`
`I
`l
`Register
`L--':_-:._-:._-:._-_-_-_-_-_-_-_- -- -- ,- - - - - - -~, __ &---~; 1--------w_o_rd ______
`7
`
`Register
`
`_
`
`Memory
`
`b. Update addressing
`
`op
`
`rs
`
`I
`
`[I
`
`rt
`
`Address
`
`Register
`
`l
`0
`r
`
`· I
`
`Memory
`
`Word
`
`FIGURE 3 .28
`
`Illustration of indexed and update addressing mode. The operand is shaded in color.
`
`of operations would be loading a word and then incrementi1:g the base regis(cid:173)
`ter to point to the next word. The idea of update address,_ng is t_o have a new
`version of data transfer instructions that will automatically increment the
`base register to point to the next word each time data is transferr~d. Since the
`MIPS architecture uses byte addresses and words are 4 bytes, this new form
`would be equivalent to this pair of MIPS instructions:
`lw
`$t0 , 4($s3)
`# reg $t0 gets Memory[$s3+4J
`addi
`$s3 ,$ s3 ,4
`# $s3 = $s3 + 4
`
`The PowerPC includes an instruction like this:
`fl reg HO=Memory[$s3+4] ; $s3 = $s3+4
`lwu
`H0 , 4($s3)
`That is, the register is updated with the address calculated as part of the lo~d.
`Figure 3.28 also shows update addressing. PowerPC has update addressing
`options for both base and indexed addressing, and for both loads and stores.
`
`Unique PowerPC Instructions
`The PowerPC instructions follow the same architecture style as MIPS, largely
`relying on fast execution of simple instructions for performance. Here are a
`few exceptions.
`
`The first is load multiple and store multiple. These can transfer up to 32
`words of data in a single instruction and are intended to make fast copies of
`locations in memory by using load multiple and store multiple back to back.
`They also save code size when saving or restoring registers.
`A second example is loops. The Power PC has a special counter register, sep(cid:173)
`arate from the other 32 registers, to try to improve performance of a for loop.
`Suppose we wanted to execute the following C code:
`for (i = n ; i != O; i = i - 1)
`/ . . . I ;
`
`If we want to decrement a register, compare to 0, and then branch as long as
`the register is not 0, we could use the following MIPS instructions:
`Loop :
`
`addi
`bne
`
`1
`# $t0 = $t0
`$t0,$t0,-l
`# if HO != 0 go to Loop
`$t0,$zero, Loop
`In Power PC we could use a single instruction instead:
`l;
`be
`Loop , $ctr!=O
`#$ctr= $ctr -
`# if $ctr != 0 go to Loop
`
`Hardware
`Software
`Interface
`
`In addition to going against the advice of simplicity, such
`sophisticated operations ma y not exactly match what the
`compiler needs to produce. For example, suppose that
`instead of decrementing by one, the compiler wanted to
`increment by four, or instead of branching on not equal
`zero, the compiler wanted to branch if the index was less
`than or equal to
`the limit. Then
`the instruction just
`described would be a mismatch . When faced with such objections, the
`instruction set designer might then generalize the operation, adding another
`operand to specify the increment and perhaps an option on which branch
`condition to use. Then the danger is that a common case, say, incrementing by
`one, will be slower than a sequence of simple operations.
`
`The Intel 80x86
`
`MIPS was the vision of a single small group in 1985; the pieces of this archi(cid:173)
`tecture fit nicely together, and the whole architecture can be described suc(cid:173)
`cinctly. Such is not the case for the 80x86; it is the product of several
`independent groups who evolved the architecture over almost 20 years, add(cid:173)
`ing new features to the original instruction set as someone might add clothing
`to a packed bag. Here are important 80x86 milestones:
`
`•(cid:141)
`
`INTEL - 1012
`
`
`
`178
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`179
`
`,
`
`(cid:127) 1978: The Intel 8086 architecture was announced as an assembly(cid:173)
`language-compatible extension of the then-successful Intel 8080, an 8-bit
`microprocessor. The 8086 is a 16-bit architecture, with all internal registers
`16 bits wide. Unlike MIPS, the registers have dedicated uses, and hence
`the 8086 is not considered a general-purpose register architecture.
`(cid:127) 1980: The Intel 8087 floating-point coprocessor is announced. This ar(cid:173)
`chitecture extends the 8086 with about 60 floating-point instructions. In(cid:173)
`stead of using registers, it relies on a stack (see section 3.15 and section
`4.9).
`(cid:127) 1982: The 80286 extended the 8086 architecture by increasing the ad(cid:173)
`dress space to 24 bits, by creating an elaborate memory-mapping and
`protection model (see Chapter 7), and by adding a few instructions to
`round out the instruction set and to manipulate the protection model.
`(cid:127) 1985: The 80386 extended the 80286 architecture to 32 bits. In addition
`to a 32-bit architecture with 32-bit registers and a 32-bit address space,
`the 80386 added new addressing modes and additional operations. The
`added instructions make the 80386 nearly a general-purpose register
`machine. The 80386 also added paging support in addition to segment(cid:173)
`ed addressing (see Chapter 7). Like the 80286, the 80386 has a mode to
`execute 8086 programs without change.
`(cid:127) 1989-95: The subsequent 80486 in 1989, Pentium in 1992, and Pentium
`Pro in 1995 were aimed at higher performance, with only four instruc(cid:173)
`tions added to the user-visible instruction set: three to help with multi(cid:173)
`processing (Chapter 9) and a conditional move instruction.
`(cid:127) 1997: After the Pentium and Pentium Pro were shipping, Intel an(cid:173)
`nounced that it would expand the Pentium and the Pentium Pro archi(cid:173)
`tectures with MMX. This new set of 57 instructions uses the floating(cid:173)
`point stack to accelerate multimedia and communication applications.
`MMX instructions typically operate on multiple short data elements at
`a time, in the tradition of single instruction, multiple data (SIMD) archi(cid:173)
`tectures (see Chapter 9).
`This history illustrates the impact of the "golden handcuffs" of compatibility
`on the 80x86, as the existing software base at each step was too important to
`jeopardize with significant architectural changes.
`Whatever the artistic failures of the 80x86, keep in mind that there are more
`instances of this architectural family than of any other in the world, perhaps
`300 million in 1997. Nevertheless, this checkered ancestry has led to an archi(cid:173)
`tecture that is difficult to explain and impossible to love.
`Brace yourself for what you are about to see! Do not try to read this section
`with the care you would need to write 80x86 programs; the goal instead is to
`give you familiarity with the strengths and weaknesses of the world's most
`popular architecture.
`
`Rather than show the entire 16-bit and 32-bit instruction set, in this section
`we concentrate on the 32-bit subset that originated with the 80386, as this por(cid:173)
`tion of the architecture will be increasingly dominant over time. We start our
`explanation with the registers and addressing modes, move on to the integer
`operations, and conclude with an examination of instruction encoding.
`
`80x86 Registers and Data Addressing Modes
`The evolution of the instruction set can be seen in the registers of the 80386
`(Figure 3.29). The 80386 basically extended all 16-bit registers (except the seg(cid:173)
`ment registers) to 32 bits, prefixing an E to their name to indicate the 32-bit
`version. We'll refer to them generically as GPRs (general-purpose registers).
`The 80386 contains only eight GPRs. This means MIPS programs can use four
`times as many.
`
`Name
`
`31
`
`EAX
`
`ECX
`
`EDX
`
`EBX
`
`ESP
`
`EBP
`
`ESI
`
`EDI
`
`EIP
`
`EFLAGS
`
`Use
`
`0
`
`GPR 0
`
`GPR 1
`
`GPR 2
`
`GPR 3
`
`GPR 4
`
`GPR 5
`
`GPR 6
`
`GPR 7
`
`cs
`
`ss
`
`DS
`
`ES
`
`FS
`
`GS
`
`Code segment pointer
`
`Stack segment pointer (top of stack)
`
`Data segment pointer O
`
`Data segment pointer 1
`
`Data segment pointer 2
`
`Data segment pointer 3
`
`Instruction pointer (PC)
`
`Condition codes
`
`FIGURE 3.29 The 80386 register set. Starting with the 80386, the top eight registers were extended to 32
`bits and could also be used as general-purpose registers.
`
`INTEL - 1012
`
`
`
`180
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`181
`
`Description
`
`Register
`restrictions
`
`Address is in a register.
`
`Address is contents of base register
`plus displacement.
`
`not ESP or EBP
`
`not ESP or EBP
`
`Base: any GPR
`Index: not ESP
`
`MIPS equivalent
`-
`-
`
`-
`
`1
`
`Mode
`I Register indirect
`I Based mode
`I with 8- or 32-bit
`displacement
`Base plus
`scaled index
`
`Base plus scaled
`I
`index with
`8- or 32-bit
`displacement
`
`Source/destination operand type
`
`Second source operand
`
`-
`
`Register
`Register
`- - - - - -- - - - - - - - - - - - - - - - - - - - -
`Register
`Immediate
`
`-
`
`-
`
`Memory
`Register
`- ~ - - - - - - - - - - - - - - - - - - -
`Memory
`Register
`
`-
`
`-
`
`Memory
`
`Immediate
`
`Instruction types for the arithmetic, logical, and data transfer instructions.
`FIGURE 3.30
`The 80x86 ;illows the combinations shown. The only restriction is the absence of a memory(cid:173)
`memory mode. lmmediates may be 8, 16, or 32 bits in length; a register is any one of the 14 m;ijor
`registers in Figure 3.29 (not EIP or EFLAGS).
`
`The arithmetic, logical, and data transfer instructions are two-operand in(cid:173)
`structions that allow the combinations shown in Figure 3.30. There are two
`important differences here. The 80x86 arithmetic and logical instructions must
`have one operand act as both a source and a destination; MIPS allows separate
`registers for source and destination. This restriction puts more pressure on the
`limited registers, since one source register must be modified. The second im(cid:173)
`portant difference is that one of the operands can be in memory. Thus virtually
`any instruction may have one operand in memory, unlike MIPS and PowerPC.
`The seven data memory-addressing modes, described in detail below, offer
`two sizes of addresses within the instruction. These so-called displnce111ents can
`be 8 bits or 32 bits.
`Although a memory operand can use any addressing mode, there are re(cid:173)
`strictions on which registers can be used in a mode. Figure 3.31 shows the 80x86
`addressing modes and which GPRs cannot be used with that mode, plus how
`you would get the same effect using MIPS instructions.
`
`80x86 Integer Operations
`The 8086 provides support for both 8-bit (byte) and 16-bit (word) data types.
`The 80386 adds 32-bit addresses and data (double words) in the 80x86. The data
`type distinctions apply to register operations as well as memory accesses.
`Almost every opera tion works on both 8-bit data and on one longer data size.
`That size is determined by the mode, and is either 16 bits or 32 bits.
`Clearly some programs want to operate on data of all three sizes, so the
`80386 architects provide a convenient way to specify each version without ex(cid:173)
`panding code size significantly. They decided that most programs would be
`dominated by either 16-bit or 32-bit data, and so it made sense to be able to set
`a default large size. This default data size is set by a bit in the code segment
`register. To override the default data size, an 8-bit prefix is attached to the in(cid:173)
`struction to tell the machine to use the other large size for this instruction.
`The prefix solution was borrowed from the 8086, which allows multiple pre(cid:173)
`fixes to modify instruction behavior. The three original prefixes override the
`
`l w $ s O , 0 ( $ s l )
`l w $s0 , 10()($51 )// <; ] 6- bit 7
`II d is p laceme nt
`---r---=:T----- - -- -- - -- - -----1----- -- - -- __j_ - -- - --
`he address is
`Base:anyGPR
`mul $t0 ,$ s2 .4
`Index: not ESP ~ add $ t O, $t O, $ s 1
`Base + (2Scale x Index)
`where Scale has the value 0, 1, 2, or 3 .
`l w $ s0 . O ( HO )
`mu l $t O, $ s z. 4
`The address is
`a dd HO . $t O, $ s 1
`Base + (2scale x Index) + displacement
`l w $s0 . 100( HO ) fl <; 16-bi t
`where Scale has the value 0, 1, 2, or 3.
`II displacement
`
`_
`
`_ ________
`
`_J_ __ _ ~ _ __J
`
`_
`
`FIGURE 3.31 80x86 32-bit addressing modes with register restrictions and the equivalent MIPS code. The Base
`plus Scaled Index add_ressing mode'. not found in MIPS or the PowerPC, is included to avoid the multiplies by four (scale
`factor of 2) to turn an mdex ma register mto a byte address (see Figures 3.24 and 3.26). A scale factor of 1 is used for 16-bit
`data, and_a scale factor of 3 for 64-bit data. Scale factor of O means the address is not scaled. If the displacement is longer
`than 16 bits Ill the second or fourth modes, then the MIPS equivalent mode would need two more instructions: a
`L, i to
`load the upper 16 bits of the displacement and an add to sum the upper address with the base register $ J. (Intel gi,·es t\\'O
`different names to what 1s called Based addressing mode-Based and Indexed- but they are essenti;illv kkntical ,rnd we
`combme them here.)
`-
`
`default segment register, lock the bus to support a semaphore (see Chapter 9),
`or repeat the following instruction until the register ECX counts down to 0.
`This last prefix was intended to be paired with a byte move instruction to move
`a variable number of bytes. The 80386 also added a prefix to override the de(cid:173)
`fault address size.
`The 80x86 integer operations can be divided into four major classes:
`
`1. Data movement instructions, including move, push, and pop
`
`2. Arithmetic and logic instructions, including test and integer and deci(cid:173)
`mal arithmetic operations
`
`3. Control flow, including conditional branches, unconditional Jumps,
`calls, and returns
`
`4. String instructions, including string move and string compare
`
`The first two categories are unremarkable, except that the arithmetic and
`logic instruction operations allow the destination to either be a register or a
`memory location. Figure 3.32 shows some typical 80x86 instructions and their
`functions.
`Conditional branches on the PowerPC and the 80x86 are based on co11ditio11
`codes or flngs . Condition codes are set as a side effect of an operation; most Me
`used to compare the value of a result to 0. Branches then test the conditiun
`codes. The argument for condition codes is that they occur as part of normal
`operations and are faster to test than it is to compare registers as MIPS does for
`
`,
`
`l
`
`INTEL - 1012
`
`
`
`182
`
`Chapter 3
`
`Instructions: Language of the Machine
`
`3.12 Real Stuff: PowerPC and 80x86 Instructions
`
`183
`
`I
`
`Instruction
`
`J E name
`
`JMP name
`CALL name
`MOVW EBX,[ ED l+45]
`PUSH ES!
`POP EDI
`ADD
`EAX , #6765
`TEST EDX,#4 2
`MOVSL
`
`Function
`
`{EIP=name ) ;
`
`if equal(condition code)
`EIP - 128 ~ name< EIP+l28
`EIP=name
`SP=SP - 4 ; M[SPJ=EIP+5; EIP=name ;
`EB X=M[ ED I +45 J
`SP=SP- 4 ; M[SP]= ES I
`EDl=M[SPJ ; SP=SP+4
`EAX= EAX + 6765
`Set condition code (flags) with EDX and 42hex
`
`M [EDI J =M [ES I J ;
`EDI=EDI+4 ; ESI=E SI+4
`
`FIGURE 3.32 Some typical 80x86 instructions and their functions. A list of frequent oper(cid:173)
`ations appears in Figure 3.33. The CALL saves the EIP of the next instruction on the stack. (EIP is
`the Intel PC.)
`
`beq and bne. The argument against condition codes is that the compare to 0 ex(cid:173)
`tends the time of the operation, since it uses extra hardware after the operation,
`and that often the programmer must use compare instructions to test a value
`that is not the result of an operation. Also, PC-relative branch addresses must
`be specified in the number of bytes, since unlike MIPS, 80386 instructions are
`not all 4 bytes in length.
`String instructions are part of the 8080 ancestry of the 80x86 and are not
`commonly executed in most programs. They are often slower than equivalent
`software routines (see the fallacy on page 185).
`Figure 3.33 lists some of the integer 80x86 instructions. Many of the instruc(cid:173)
`tions are available in both byte and word formats.
`
`80x86 Instruction Encoding
`Saving the worst for last, the encoding of instructions in the 8086 is complex,
`with many different instruction formats. Instructions for the 80386 may vary
`from 1 byte, when there are no operands, up to 17 bytes.
`Figure 3.34 shows the instruction format for several of the example instruc(cid:173)
`tions in Figure 3.32. The opcode byte usually contains a bit saying whether the
`operand is 8 bits or 32 bits. For some instructions the opcode may include the
`addressing mode and the register; this is true in many instructions that have
`the form "register= register op immediate." Other instructions use a "post(cid:173)
`byte" or extra opcode byte, labeled "mod, reg, r/m," which con