throbber
(12) United States Patent
`Blomgren et al.
`
`I 1111111111111111 11111 111111111111111 IIIII IIIII IIIII 111111111111111 11111111
`US006334183Bl
`US 6,334,183 Bl
`Dec. 25, 2001
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54) METHOD AND APPARATUS FOR
`HANDLING PARTIAL REGISTER ACCESSES
`
`(75)
`
`Inventors: James S. Blomgren; Anthony M.
`Petro, both of Austin, TX (US)
`
`(73) Assignee: Intrinsity, Inc., Austin, TX (US)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by O days.
`
`(21) Appl. No.: 09/195,757
`
`(22)
`
`Filed:
`
`Nov. 18, 1998
`
`Related U.S. Application Data
`Provisional application No. 60/065,643, filed on Nov. 18,
`1997.
`Int. Cl.7 ...................................................... G06F 15/00
`U.S. Cl. ........................... 712/221; 712/223; 708/236
`Field of Search ..................................... 712/221, 222,
`712/223, 224; 708/236, 495, 525, 523,
`490
`
`References Cited
`
`(60)
`
`(51)
`(52)
`(58)
`
`(56)
`
`U.S. PATENT DOCUMENTS
`5,144,576 * 9/1992 Briggs et al. ........................ 708/493
`
`5,204,828 * 4/1993 Kohn .................................... 708/501
`5,524,263
`6/1996 Griffith et al. .
`5,835,394 * 11/1998 Wong ................................... 708/670
`5,880,983 * 3/1999 Elliott et al. ......................... 708/501
`5,930,159 * 7/1999 Wong ................................... 708/550
`5,935,198 * 8/1999 Blomgren ............................. 708/290
`* cited by examiner
`
`Primary Examiner-Eddie Chan
`Assistant Examiner-Gautam R. Patel
`(74) Attorney, Agent, or Firm-Booth & Wright, L.L.P.;
`Matthew J. Booth
`
`(57)
`
`ABSTRACT
`
`The present invention includes a partial register write han(cid:173)
`dler. The write handler receives either two or three operands.
`An execution unit operates on portions of two operands,
`rather than on full operands. The result of the execution unit
`has fewer bits than an "additional" operand, which may be
`any of the two or three operands received by the write
`handler. An output multiplexer receives all of the bits of an
`execution unit result and selected bits of the additional
`operand, and produces an output that has as many bits as the
`additional operand. If the output of the multiplexer is a string
`of bits, the string of bits contains the execution unit result as
`a substring of bits. The remaining bits of the output of the
`multiplexer are selected from the additional operand.
`
`36 Claims, 2 Drawing Sheets
`
`-------------------------------------------------------------------,
`A<l:0> 8<7:0>
`A<31:16> 8<31:16> A< 15:8> 8< 15:8>
`
`168
`
`TOP
`
`166
`
`164
`
`162
`
`-------- ----------'(_160 _______ ---------------------- ------------
`RESULT<31:16>
`RESULT<15:8>
`RESULT<7:0>
`
`ADI) BH,AL- >AL
`ADD BL,AL->AL
`ADD BH,AH->AH
`ADD BL,AH->AH
`ADD BX,AX->AX
`ADD EBX,EAX->EAX
`
`TOP
`pass
`pass
`pass
`pass
`pass
`add
`
`HIGH
`pass
`pass
`add
`don•tcare
`add
`add
`
`MIX-H
`don•tcare
`don•tcare
`don't care
`add
`don•tcare
`don•tcare
`
`MIX-L
`add
`don•tcare
`don•tcare
`don't care
`don•tcare
`don•tcare
`
`LOW
`don•tcare
`add
`pass
`pass
`add
`add
`
`Oracle-1008 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Dec. 25, 2001
`
`Sheet 1 of 2
`
`US 6,334,183 Bl
`
`Register
`File
`
`~
`
`/
`
`~
`
`C
`
`A
`
`buffer ~/
`
`ALU, or
`Arithmetic Logic Unit
`(instruction unit)
`boolean--.
`
`~shifter
`---~adder
`/
`
`"'
`
`160_/
`
`FIG. 1
`
`bypass
`
`/
`'-150
`
`B
`
`\. 102
`
`Oracle-1008 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`I■-
`~
`~
`~
`I■-
`_,.,l;,..
`~
`~
`O'I
`rJ'J.
`e
`
`N
`0 ....,
`N
`~ ....
`'JJ. =(cid:173)~
`
`'"""'
`0
`0
`N
`~Ul
`N
`ri
`~
`~
`
`~ = ......
`~ ......
`~
`•
`r:JJ.
`d •
`
`add
`add
`pass
`pass
`add
`don't care
`LOW
`
`don't care
`don't care
`don't care
`don't care
`don't care
`add
`MIX-L
`
`don't care
`don't care
`add
`don't care
`don't care
`don't care
`MIX-H
`
`FIG. 2
`
`add
`add
`don't care
`add
`pass
`pass
`HIGH
`
`add
`pass
`pass
`pass
`pass
`pass
`TOP
`
`ADD EBX,EAX->EAX
`ADD BX,AX->AX
`ADD BL,AH->AH
`ADD BH,AH->AH
`ADD BL,AL->AL
`ADD BH,AL->AL
`
`RESULT<l:0>
`
`RESULT<15:8>
`
`RESULT<31:16>
`
`~--------~---------""'--160-------~----------------------+-------------
`
`.
`
`I
`
`162 i
`
`/
`
`172
`
`174
`
`164
`
`166
`
`170
`
`TOP
`
`-------------------------------------------------------------------,
`
`A<7:0> 8<7:0>
`
`A<31:16> 8<31:16> A< 15:8> 8< 15:8>
`
`Oracle-1008 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 6,334,183 Bl
`
`1
`METHOD AND APPARATUS FOR
`HANDLING PARTIAL REGISTER ACCESSES
`
`This application claims the benefit of the earlier filed
`U.S. Provisional Application Ser. No. 60/065,643, filed Nov.
`18, 1997, which is incorporated by reference for all purposes
`into this application.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention relates to the field of computer
`operation, and more specifically to register files in a com(cid:173)
`puter.
`2. Description of the Related Art
`The original 8086 used eight general purpose registers in
`a sequential processor. Each instruction was received and
`was processed one at a time. However, as processors have
`become pipelined to achieve higher clock rates, dealing with
`instructions has become more problematic. Pipelined
`instructions have added complexity in dealing with bypassed
`operands, and have made handling partial registers more
`difficult. Pipelined processors have been able to process
`instructions out of order, "shelving" instructions until all
`operands needed for the operation are valid.
`When an instruction that requires operands is received,
`the pipelined processor has typically responded by examin(cid:173)
`ing the register file to determine whether all of the operands
`are available. If some operands are not available, the instruc(cid:173)
`tion is "shelved" until the operands become available. An
`instruction may remain shelved for several clock cycles. For
`example, this could occur when operands missing from an
`instruction must be read from a memory having a high
`latency. While an instruction is shelved, however, later
`instructions received by the processor may attempt to update
`the values of the operands that were available in the shelved
`instruction. Consequently, performing operations out of
`order can be a complicated task, as updates to operand
`values require complicated register file management.
`Similarly, instructions have been allowed to produce
`intermediate values, which have not been ready to be written
`to the register file. "Intermediate" values refer to those
`values generated by an execution unit such an arithmetic
`logic unit (ALU), but which may not be complete. When an
`instruction that has generated a result is not yet known to be
`on the execution path, the instruction is not yet known to be
`complete. For example, consider an instruction to add the
`values of AX and BX and place the result in AX. The value
`of BX may be available, but the original value of AX may 50
`not be available. In such a case, the instruction is not
`complete; an operand is still necessary to complete the
`operation. The result, therefore, is "intermediate" in that it
`depends on a yet-to-be-determined value. Such intermediate
`values may not be written to the register file until the 55
`instruction is known to be complete.
`Instructions are not known to be complete until all older
`instructions in the instruction path have completed and have
`not encountered any exceptional conditions. However, pipe(cid:173)
`lined processors have generally been permitted to execute 60
`instructions on intermediate data, and then later to determine
`that the intermediate data is valid and may be written to the
`register file. Pipelined processors have therefore achieved
`much of their speed performance by allowing results to be
`used before those results are committed to the register file. 65
`When a processor has received updates for some, but not
`all, of the operands needed for a particular operation, the
`
`30
`
`2
`pipelined processor has required additional logic to prevent
`use of "old" values for the missing operands. Handling
`instructions out of order can greatly complicate this task.
`Moreover, when partial register handling has been
`5 supported, updates to a portion of a register have compli(cid:173)
`cated handling the register, where portions of the register
`have been valid and portions of the register have been
`invalid.
`Intermediate results have therefore been held in a pending
`10 state until written to the register file. The pending state has
`been implemented as an additional "pending" file somewhat
`structurally similar to, and in some implementations larger
`than, the register file. Bypassing, or reading intermediate
`values from the pending state, has allowed pipelined pro-
`15 cessors to execute instructions before determining that the
`instructions that had generated those intermediate values are
`complete.
`Typically, the pending state has been implemented as a
`number of stages along a pipeline. Data has been copied
`20 from one stage to the next until, at the final stage, that data
`has been written to the register file. At each point along the
`pipeline, data has been available to "younger" instructions,
`with more recent updates to the register value being written
`to the first stage of the pipeline. Instructions have been able
`25 to select values from the various stages of the pipeline, and
`from the register file.
`To handle the relatively large number of sources (the
`various stages of the pipeline and the register file) from
`which an instruction can read values of an operand, a bypass
`multiplexer ("bypass multiplexer") has typically been
`included. The implementation of the bypass multiplexer,
`however, is different in many CISC architectures than it is in
`RISC architectures, because the former supports partial
`register write operations, while the latter does not.
`The x86 instruction set originally supported eight 16-bit
`general purpose registers, four of which could be divided
`into two 8-bit general purpose registers. Division of registers
`AX, BX, CX, and DX allowed byte-access (8-bit access) to
`40 the upper and lower bytes of these registers. As a result, not
`only have registers AX, BX, CX, and DX been supported,
`but also registers AH, AL, BH, BL, CH, CL, DH, and DL,
`referring to the high-order byte and low-order byte within
`each 16-bit register. The register count effectively increased
`45 to 16 registers: registers AX, BX, CX, DX,AH,AL, BH, BL,
`CH, CL, DH, DL, SP, BP, SI, and DI. Registers SP, BP, SI,
`and DI have not been divided.
`Moreover, with the introduction of the 386, the x86
`architecture grew to support 32-bit registers. However, the
`prevalence of code using the original 16-bit register set
`instruction set necessitated support of both 16-bit and 32-bit
`register sizes. The 386 instruction set allowed access to all
`the aforementioned registers, as well as allowing access to
`"extended" (32-bit) registers. The 16-bit registers were
`considered partial registers of the new 32-bit registers. Each
`instruction was provided with four partial register options:
`the instruction could select the full "extended" register, or
`the "lower" 16-bits of the extended register, or the high(cid:173)
`order byte or the low-order byte of the lower 16-bit register.
`For example, in the 386 instruction set, an instruction was
`permitted to access an extended register, for example EAX.
`Such an access would access bits [31:0] of a 32-bit register.
`Another instruction could access a 16-bit register, for
`example AX. Such an access would access bits [15:0] of the
`same 32-bit register. Another instruction could access an
`8-bit register, for example, AL. Such an access would access
`bits [7:0] of the same 32-bit register. Another instruction
`
`35
`
`Oracle-1008 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 6,334,183 Bl
`
`3
`could access another 8-bit register, for example, AX. Such
`an access would access bits [15:8] of the same 32-bit
`register.
`In part due to the problem of handling partial registers in
`a pipelined processor, and in part due to the low cost of 5
`registers, RISC microprocessors have not supported partial
`register write operations. Adding registers has become far
`less expensive than dividing existing registers, particularly
`in light of the added complexity. Therefore, the RISC
`processor historically has operated on registers in their 10
`entirety. Consequently, the bypass multiplexer in a pipelined
`RISC processor configuration has been required only to
`select one of the sources (the various stages of the pipeline
`or the register file) from which an instruction can read values
`of an operand.
`On the other hand, CISC architectures allow a variety of
`portions of the registers to be altered. A pipelined CISC
`processor configuration that allows portions of registers to
`be operated upon has typically required an additional field to
`indicate which portion of the register has been updated
`between one stage and the next. The same result stage
`registers have been used for the result, and the additional
`field has been used to indicate how to write ( commit) the
`result to the register file at the end of the pipeline. Since
`CISC architecture allows a variety of portions of registers to 25
`be altered, the bypass multiplexer must select more than one
`place that an operand might be found, since portions of that
`operand may be generated by different instructions.
`These approaches have proven unsatisfactory. The imple(cid:173)
`mentation of the bypass multiplexer has proven extremely
`complicated in CISC architectures. A particularly problem(cid:173)
`atic example is the situation in which multiple instructions
`have written to different portions of a register, and then an
`instruction requiring the value of the register is encountered.
`In such a case, various stages of the pipeline contain
`different portions of the register value to be used. According
`to one approach, the instruction requiring the value of the
`register has caused the bypass multiplexer to select from the
`relatively large number of sources of values. According to
`another approach, the instruction requiring the value of the
`register has been stalled until enough of the results have
`been written that the operand comes from a single result
`register or from the register file itself. According to still
`another approach, a combination of complex multiplexing 45
`and stalling has been used.
`
`20
`
`4
`to receive a second operand having a second plurality of bits.
`The third input is configured to receive a third operand
`having a third plurality of bits.
`The exemplary embodiment of the present invention also
`includes an execution unit. The execution unit is configured
`to perform an operation upon a portion of the first operand
`and upon a portion of the second operand. The execution
`unit is also configured to provide a result having a number
`of bits no greater than the number of bits of the third
`operand.
`The exemplary embodiment of the present invention also
`includes a multiplexer, configured to receive all of the bits
`of the execution unit result and all of the bits of the third
`operand. The multiplexer is also configured to select a
`15 portion of the third operand, and to provide a multiplexer
`output. The multiplexer output comprises the selected por(cid:173)
`tion of the third operand and the execution unit result. Once
`the multiplexer output is produced, it replaces the third
`operand.
`In some embodiments, the first operand and the third
`operand are the same operand. The execution unit is also
`configured to provide a result having a number of bits no
`greater than the number of bits of the third operand. The
`multiplexer output comprises the selected portion of the first
`operand and the execution unit result. Once the multiplexer
`output is produced, it replaces the first operand.
`In some embodiments, the first and second operands ( and,
`when a third operand is included, the third operand) all have
`30 equal numbers of bits. Moreover, the bits of each of the
`operands are numbered, and there is a one-to-one correspon(cid:173)
`dence between the bits of the various operands. However, in
`some embodiments, the selected portion of the first operand
`and the selected portion of the second operand contain bits
`35 that correspond to one another; for example, the selected
`portion of the first operand may include bits O to 7 of the first
`operand, and the selected portion of the second operand may
`include bits O to 7 of the second operand. In other
`embodiments, no bit in the selected portion of the first
`40 operand corresponds to any bit in the selected portion of the
`second operand; for example, the selected portion of the first
`operand may include bits O and 7 of the first operand, and the
`selected portion of the second operand may include bits 8 to
`15 of the second operand.
`Although there is a one-to-one correspondence between
`the bits of the third operand and the bits of the first and
`second operands, the selected portion of the third operand
`includes all of the bits of the third operand except those that
`correspond to a bit in either of the first and second operands.
`For example, when the selected portion of the first operand
`includes bits O to 7 of the first operand, and the selected
`portion of the second operand includes bits O to 7 of the
`second operand, the selected portion of the third operand
`includes bits 8 to 31 of the third operand. When the selected
`55 portion of the first operand includes the bits O to 7 of the first
`operand, and the selected portion of the second operand
`includes bits 8 to 15 of the second operand, then the bits
`selected from the third operand are the bits which are not
`provided from the execution unit.
`In some embodiments, when the execution unit is con-
`figured to perform an operation, the operation performed is
`an ADD operation performed by the arithmetic logic unit
`within the execution unit. The ADD operation produces a
`sum of the selected portion of the first operand and the
`65 selected portion of the second operand. In some such
`embodiments, the multiplexer is configured to provide bits
`0 to 7 of the third operand as bits O to 7 of the multiplexer
`
`SUMMARY OF THE INVENTION
`
`The present invention includes a partial register write
`handler. The write handler receives either two or three 50
`operands. An execution unit operates on portions of two
`operands, rather than on full operands. The result of the
`execution unit has fewer bits than an "additional" operand,
`which may be any of the two or three operands received by
`the write handler. An output multiplexer receives all of the
`bits of an execution unit result and selected bits of the
`additional operand, and produces an output that has as many
`bits as the additional operand. If the output of the multi(cid:173)
`plexer is a string of bits, the string of bits can contain the
`execution unit result as a substring of bits, and the remaining 60
`bits of the output of the multiplexer are selected from the
`additional operand.
`The present invention includes, according to an exem(cid:173)
`plary embodiment, an apparatus that comprises a first input
`for receiving a first operand, a second input, and a third
`input. The first input is configured to receive a first operand
`having a first plurality of bits. The second input is configured
`
`Oracle-1008 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 6,334,183 Bl
`
`6
`register may be located. Tables 1-5 illustrate how one
`embodiment of the present invention handles a read instruc(cid:173)
`tion that attempts to read at least some of the bits of a
`register, but which follows a previous instruction that has
`written to at least some of the bits of a register.
`Table 1 describes a write-then-read situation. The "B" in
`Table 1 indicates that the value can be bypassed. In Table 1,
`a register value is written to a register by an execution unit
`performing an instruction, and then immediately read from
`10 the first pipeline stage by the immediately following instruc(cid:173)
`tion. In other words, the most recently generated instance of
`the register value is read. As shown in Table 1, in such a
`situation, the register file can be bypassed. Bypassing the
`register file indicates that the instruction that generated the
`15 value is complete, and that other instructions may use the
`value.
`
`TABLE 1
`
`Partial register written-to
`by the first instruction
`
`EAX
`
`AX
`
`AH
`
`AL
`
`Partial register
`read by the second
`instruction
`
`EAX
`AX
`AH
`AL
`
`B
`
`B
`
`B
`
`B
`
`Bypassed values are still written to the register file, as
`intermediate values before they are written to the register
`30 file.
`Table 2 describes a different write-then-read situation. In
`Table 2, a portion of the register value is generated by a first
`instruction, and then a portion of that portion is needed by
`a second instruction. In such a case, the number of bits being
`written and the number of bits thereafter being read are
`unequal, but the bits being read by the second instruction are
`assuredly the updated value. In other words, Table 2 illus(cid:173)
`trates a situation in which the read instruction "wants" a
`smaller portion of the data that was produced by the write.
`For example, if an instruction that reads AX follows an
`instruction that writes all of EAX, then the read instruction
`receives only updated bits.
`Table 2 encompasses the Table 1 case, i.e., where the same
`register is written and then read. Table 2 also adds the
`situation in which a portion of the register value is generated
`by a first instruction, and then a portion of that portion is
`needed by a second instruction. Write operations to EAX or
`AX cannot be followed by a read operation from AH, due to
`bit alignment uncertainties.
`
`45
`
`5
`output, the sum as bits 8 to 15 of the multiplexer output, and
`bits 16 to 31 of the third operand as bits 16 to 31 of the
`multiplexer output. In other such embodiments, the multi(cid:173)
`plexer is configured to provide the sum as bits O to 7 of the
`multiplexer output, and bits 8 to 31 of the third operand as 5
`bits 8 to 31 of the multiplexer output.
`In still other embodiments of the present invention, a
`system includes a register file containing a first plurality of
`registers. The first plurality of registers includes a first
`register having a first number of bits, and a second register
`having a second number of bits, and a third register having
`a third number of bits. The system also includes a pending
`file containing a second plurality of registers. The second
`plurality of registers includes at least one register corre(cid:173)
`sponding to the first register, at least one register corre(cid:173)
`sponding to the second register, and at least one register
`corresponding to the third register.
`In such embodiments, the system also includes a bypass
`multiplexer configured to select a first operand either from
`the first register or from one of the at least one register 20
`corresponding to the first register in the pending file. The
`bypass multiplexer configured to select a second operand
`either from the second register or from one of the at least one
`register corresponding to the second register in the pending
`file. The system also includes an execution unit, configured 25
`to perform an operation upon a portion of the first operand
`and upon a portion of the second operand. The execution
`unit is further configured to provide a result having a number
`of bits no greater than the number of bits of the third register.
`In such embodiments, the system also includes an output
`multiplexer that receives all of the bits of the execution unit
`result, and selects a third operand either from the third
`register or from one of the at least one register corresponding
`to the third register in the pending file. However, in some of 35
`these embodiments, the third register is actually the first
`register. The output multiplexer also selects a portion of the
`third operand having a number of bits equal to the difference
`between the number of bits in the third register and the
`number of bits in the execution unit result. Accordingly, the 40
`multiplexer provides a multiplexer output comprising the
`selected portion of the third operand and the execution unit
`result, and provides the multiplexer output to either the third
`register or to one of the registers corresponding to the third
`register in the pending file.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 shows a partial write handler having both a bypass
`multiplexer and an output multiplexer, according to one
`embodiment of the present invention.
`FIG. 2 shows a destination merging adder residing within
`a write handler according to the second embodiment of the
`present invention.
`
`50
`
`TABLE 2
`
`Partial register written-to
`by the first instruction
`
`EAX
`
`AX
`
`AH
`
`AL
`
`Partial register
`read by the second
`instruction
`
`EAX
`AX
`AH
`AL
`
`B
`B
`
`B
`
`B
`
`B
`
`B
`
`B
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`55
`
`The present invention comprises partial register access
`handler. This disclosure describes numerous specific details
`that include specific structures, circuits, and logic functions 60
`in order to provide a thorough understanding of the present
`invention. One skilled in the art will appreciate that one may
`practice the present invention without these specific details.
`According to one embodiment of the present invention, a
`partial write handler allows the bypass multiplexer to com(cid:173)
`bine a variety of portions of the registers, despite the large
`number of locations in which various pipelined stages of the
`
`Table 3 shows the write-then-read situation where either
`the write operation writes data to register AH, or the read
`operation reads from AH, or both. When AH is written, data
`65 is provided to bits [15:8] of register EAX. However, the
`remaining bits, bits [7:0] and bits [31:16], are not up
`Consequently, reading from EAX or AX would require
`
`Oracle-1008 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 6,334,183 Bl
`
`7
`obtaining bits [15:8] from the updated value provided by the
`write command, and also obtaining other bits from another
`location. While this could be done by a complex multiplexer,
`generally, this is performed by stalling until all the necessary
`bits have been updated. Stalling, or interlocking, is indicated 5
`by an "I" in Table 3.
`
`TABLE 3
`
`Partial register written-to
`by the first instruction
`
`EAX
`
`AX
`
`AH
`
`AL
`
`Partial register
`read by the second
`instruction
`
`EAX
`AX
`AH
`AL
`
`B
`B
`
`B
`
`B
`
`B
`
`B
`
`B
`
`Referring now to Table 4, when an entire different portion
`of a register is written than is being requested, then the
`operand is obtained from the register file rather than from the
`updated bits. This is indicated by the letter "R" in Table 4.
`The updating of the bits that are not requested is irrelevant
`to the read instruction.
`
`TABLE 4
`
`Partial register written-to
`by the first instruction
`
`EAX
`
`AX
`
`AH
`
`AL
`
`Partial register
`read by the second
`instruction
`
`EAX
`AX
`AH
`AL
`
`B
`B
`
`B
`
`B
`
`B
`
`B
`R
`
`R
`B
`
`Referring now to Table 5, when more of the register is
`read than had been written, the bypass multiplexer must
`combine bits from different sources, or stall. Writing to AX
`and then reading EAX requires the read instruction to obtain
`bits [15:0] from the updated AX, and the remaining bits
`[31:16] from a pipeline stage representing a previous write
`instance of the register value. Similarly, writing to AL and
`then reading from AX or EAX requires the read instruction
`to obtain bits [7:0] from the updated AL, and the remaining
`bits [15:8] (in the case of reading AX) or [31:8] (in the case
`of reading EAX) from a pipeline stage representing a
`previous write instance of the register value. While this
`could be done by a complex bypass multiplexer, generally
`pipeline processors merely stall on such instructions until all
`the necessary bits have been updated. Stalling, or
`interlocking, is indicated by an "I" in Table 5.
`
`TABLE 5
`
`Partial register written-to
`by the first instruction
`
`EAX
`
`AX
`
`AH
`
`AL
`
`Partial register
`read by the second
`instruction
`
`EAX
`AX
`AH
`AL
`
`B
`B
`
`B
`
`B
`
`B
`
`B
`R
`
`R
`B
`
`When complex bypass multiplexers are used in a pipeline
`processor, the additional internal interconnect requirements
`of the control hardware increase by a factor of three, since
`there are three portions of a register which can be indepen(cid:173)
`dently written.
`
`8
`Mapping Register File Bits to an Output Multiplexer
`FIG. 1 shows one embodiment of the present invention
`comprising a partial write handler having both a bypass
`multiplexer 150 and an output multiplexer 160. The bypass
`multiplexer 150 provides an input to an execution unit 102
`such as an arithmetic logic unit (ALU). The execution unit
`102 receives and executes an instruction. The instruction is
`selected from an instruction set that allows access to partial
`registers AH, AL, BH, BL, CH, CL, DH, and DL, referring
`10 to the high-order byte and low-order byte within each 16-bit
`register, as well as to the full 16-bit registers AX, BX, CX,
`and DX, and their 32-bit extended versions. The output of
`the execution unit 102 contains a number of bits that
`depends on the instruction being performed.
`According to the embodiment of the present invention
`shown in FIG. 1, the partial register write handler also
`includes output multiplexer 160. Output multiplexer 160
`receives the output of the execution unit 102 as well as
`additional bits, and then provides a full 32-bit output upon
`20 every clock cycle. The use of the output multiplexer 160 to
`produce a 32-bit output obviates the need for the bypass
`multiplexer 150 to select bits from multiple locations in the
`cases shown above in Tables 3-5.
`The output multiplexer produces results that appear as if
`the entire 32-bit registers were updated upon each clock
`cycle. In effect, a substitute instruction set is created, in
`which instructions that write to partial registers are replaced
`with instructions that not only write to partial registers, but
`also read enough bits from the register file to generate a full
`32-bit register value. A, B, and C in FIG. 1 represent
`operands residing in partial registers that are provided to
`ALU 102 and/or the output multiplexer 160. If, for example,
`the bottom (least significant) bits of register EAX are
`incremented (AL) then the upper 24 bits are set to the value
`35 that would be in the register file entry for EAX after the
`increment completes. Since the increment does not modify
`the upper 24 bits, the value after the increment is the same
`as the value before the increment.
`According to a first embodiment of the present invention,
`the output multiplexer and the execution unit 102 perform
`their respective operations in parallel. To accomplish this,
`the output multiplexer 160 determines the destination reg(cid:173)
`ister of the instruction being executed in the execution unit
`102 before the execution unit 102 begins executing the
`instruction. The output multiplexer 160 also determines the
`number of bits the instruction being executed in the execu-
`tion unit 102 can generate, also before the execution unit 102
`begins executing the instruction. The output multiplexer 160
`then accesses the register file, and reads the prior value of the
`destination register. The output multiplexer 160 then merges
`the bits of the destination register's prior value with the
`destination register's update to provide a 32-bit register
`value.
`Assuming the Destination Register is One of the Source
`55 Registers
`Since the number of instructions within the x86 instruc(cid:173)
`tion set is quite extensive, providing an output multiplexer
`160 that maps bits of all of the registers in the register file
`would be an enormous task. However, most instructions in
`the x86 instruction set select and operate on two (source)
`operands, and then write the result back to one of the two
`source operands rather than to a third operand. The output
`multiplexer 160 of the present invention takes advantage of
`this feature of the x86 instruction set. For example, all of the
`65 x86 instruction set ADD operations fall into six categories,
`all of which overwrite at least part of one of the source
`registers. In other words, for any two registers, identified in
`
`15
`
`25
`
`30
`
`40
`
`45
`
`50
`
`60
`
`Oracle-1008 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 6,334,183 Bl
`
`9
`a particular order, there are six ADD operations in the x86
`instruction set, all of

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket