`Liu et al.
`
`54 METHOD FORSCHEDULING A FLAG
`GENERATING INSTRUCTION AND A
`SUBSEQUENT INSTRUCTION BY
`EXECUTING THE FLAG GENERATING
`INSTRUCTION IN A MICROPROCESSOR
`
`(75)
`
`56)
`
`Inventors: Kin-Yip Liu, Millbrae; Ken
`Shoemaker, Los Altos Hills; Gary
`Hammond, Campbell; Anand Pai, San
`Jose; Krishna Yellamilli, Cupertino, all
`of Calif.
`Assignee: Intel Corporation, Santa Clara, Calif.
`Notice:
`This patent issued on a continued pros
`ecution application filed under 37 CFR
`1.53(d), and is subject to the twenty year
`patent term provisions of 35 U.S.C.
`154(a)(2).
`Appl. No.: 08/699,910
`Filed:
`Aug. 20, 1996
`Int. Cl." ............................... G06F 9/40; G06F 9/38
`U.S. Cl. .......................... 712/214; 712/215; 712/216;
`712/221; 712/223; 712/23; 712/32; 712/41;
`712/245
`Field of Search ..................................... 395/376-394,
`395/561, 564, 566, 567, 800.23, 800.41,
`800.42, 800.32; 712/200, 214-216, 220-221,
`223, 226, 245, 23, 32, 41
`References Cited
`U.S. PATENT DOCUMENTS
`
`1/1973 Batcher ................................... 708/210
`3,711,692
`4,161,784 7/1979 Cushing et al.
`... 712/222
`4,189,716 2/1980 Krambeck .....
`... 708/210
`4,371,951
`2/1983 Kort et al. .
`395/800.02
`4,393,468 7/1983 New ..............
`... 708/518
`4,418,383 11/1983 Doyle et al.
`... 710/127
`4,435,753 3/1984 Rizzi .............
`... 395/709
`4,486,848 12/1984 Kaminski ..
`... 708/210
`4,498,177 2/1985 Larson ..........
`... 708/210
`4,567,574
`1/1986 Saadé et al. ..
`... 395/709
`4,630,192 12/1986 Wassel et al. ........................... 708/210
`(List continued on next page.)
`
`addf;2X = X2 - X3
`
`prodf32 AFLAGS = X1, ALFAGS
`
`
`
`
`
`
`
`
`
`
`
`
`
`X1 {32} = carry out of bit 3
`X133} = carry out of bit 6
`X1 {34} = carry out of bit 7
`X1 {35} = carry Out of bit 14
`X1 {36} = carry Out of bit 15
`X1 {37} = carry Out of bit 30
`X1 (38} = carry out of bit 31
`
`aux = location of Auxiliary Carry in AFLAGS
`carry = location of Carry in AFLAGS
`Overf= location of Overflow in AFLAGS
`parity = locatin Parity in AFLAGS
`sign = location of Sign in AFLAGS
`Zero = location of Zero in AFLAGS
`
`
`
`US006049864A
`Patent Number:
`11
`(45) Date of Patent:
`
`6,049,864
`*Apr. 11, 2000
`
`FOREIGN PATENT DOCUMENTS
`318957A3 6/1989 European Pat. Off. .......... G06F 5/OO
`OTHER PUBLICATIONS
`MC88110 Second Generation Risc Microprocessor User's
`Manual, Motorola, pp. 1-23, date unknown.
`MC88110 Errata to Second Generation Risc Microproces
`sor User's Manual, Motorola, pp. 1-11, Sep. 22, 1992.
`MC88110 User Manual Index, Motorola, date unknown.
`MC8811OPRG Programmer's Reference Guide, Motorola,
`pp. 1-4, Dec. 1992.
`MC88110 User's Manual, Section 2 Programming Model,
`Motorola, pp. 1-20, date unknown.
`MC88110 User's Manual, Section 3 Addressing Modes and
`Instruction Set Summary, Motorola, pp. 1-32, date
`unknown.
`MC88110 User's Manual, Section 5 Graphics Unit Imple
`mentation, Motorola, pp. 1-26, date unknown.
`MC88110 User's Manual Padd Pixel Add, Section 10, pp.
`62-71, date unknown.
`(List continued on next page.)
`Primary Examiner Zarni Maung
`ASSistant Examiner Bharat Barot
`Attorney, Agent, or Firm Blakely, Sokoloff, Taylor &
`Zafman LLP
`ABSTRACT
`57
`A method for Scheduling a flag generating instruction and a
`Subsequent instruction. The Subsequent instruction has a
`data dependency on the flag generating instruction. The flag
`generating instruction is translated into first and Second
`instructions. The Subsequent instruction is translated into at
`least a third instruction. The first instruction, when executed,
`generates a result and intermediate flag generation data. The
`Second instruction, when executed, generates a plurality of
`flags. The first instruction is scheduled to execute before the
`Second and third instructions. The Second instruction is
`Scheduled to execute before the third instruction if the third
`instruction has a data dependency on the Second instruction,
`otherwise the third instruction may be Scheduled to execute
`before the Second instruction.
`
`19 Claims, 4 Drawing Sheets
`
`Translate the flag generating instruction
`into at least first and second
`instructions.
`
`Translate the subsequent instruction
`into at least a third instruction.
`
`
`
`
`
`Schedule the first instruction to
`execute.
`
`
`
`
`
`
`
`Does
`the third
`instruction have a data
`dependency On the Second
`instruction
`
`Schedule the Second
`instruction to execute
`before the third
`instruction.
`
`
`
`Schedule the third
`instruction to execute
`before the second
`instruction.
`
`LzLabs GmbH. Ex. 1007-1
`
`
`
`6,049,864
`Page 2
`
`
`
`U.S. PATENT DOCUMENTS
`
`7/1996 Ashkenzai ............................... 708/210
`5,541,865
`5,555,432 9/1996 Hinton et al. ...................... 395/800.23
`5,590,359 12/1996 Sharangpani.
`... 395/800.32
`4,656,582 4/1987 Chaitin et al. .......................... 395/707
`5,613,080 3/1997 Ray et al. ............................... 395/390
`4,656,583 4/1987 Auslander et al.
`395/709
`5,632,023 5/1997 White et al. ............................ 395/394
`4,707,800 11/1987 Montrone et al.
`708/714
`4,710,872 12/1987 Scarborough .
`395/707
`OTHER PUBLICATIONS
`4,773,007 9/1988 Kanada et al. .
`395/704
`it. H1. Milit al. .
`3,. SPARC Technology, UltraSPARC Multimedia Capabilities
`4,785,421 11/1988 Takahashi et al.
`708/205 CE Srps s RTime Video and Advanced
`4,821,181 4/1989 Iwasawa et al. ..
`34.5/500
`irri
`•s
`4,833,606 5/1989 Iwasawa et al. ........................
`7,
`Case, TriMedia Processors Aimed at Future Multimedia
`4.885,684 12/1989 Austin et al. ........................... 70/10s
`Embedded App, pp. 128 (1994).
`4,901.270 2/1990 Galbi et al. ...
`708/708
`Gwennap, HPS PA 7100LC Uses New Instructions to
`4,965,724 10/1990 Utsumi et al. .
`305,707
`Eliminate Decoder Chip, pp. 16-17 (1994).
`4,989,168
`1/1991 Kuroda et al. .
`708/216
`U.S. Patent Application entitled, “Method and Apparatus
`5,201,056 4/1993 Daniel et al. ..
`712/221
`Using Novel Operations in a Processor Inventors: Peleg et
`5,321,820 6/1994 Nakajima ...
`395/586
`al. Filed: Dec. 30, 1993.
`5,339,447 8/1994 Balmer .................................... 708/210
`U.S. Patent Application entitled, “Apparatus for Providing a
`5,404.552 4/1995 Ikenaga .............................. 395/800.41
`Two-Instruction Implementation of a Population Count
`5,418,736 5/1995 Widigen et al. ........................ 708/785
`Function” Inventors: Dreyer et al., Filed: Dec. 29, 1995.
`
`LzLabs GmbH. Ex. 1007-2
`
`
`
`U.S. Patent
`
`Apr. 11, 2000
`
`Sheet 1 of 4
`
`6,049,864
`
`
`
`
`
`Jo pºppé ale slºquinú þaufils üb?, sinôpôleinuð?pu002 SI MOJAMO
`
`
`
`
`
`
`
`
`
`TO?I: || V | Z | S || 0 ||
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`| 037 || Z |
`
`LzLabs GmbH. Ex. 1007-3
`
`
`
`
` SOVAVUlUBISJoUOReIO|=ubisSoydiyulAueAlelixnyJouoyeso]=xne
`
`({o:z}.x)sedsey=Awedsgydyuaaa
`
`SOYTIUlMO|J9AQJOUOIEIO]=J4aA0
`(0=={O:LE}LX)=O18Z'SOVTAV
`
`{Ze}iXvi{selLx=Haao'sovidy
`SOWTV‘LX=SUV1dVc&spoud
`SOVTAYulAuedJouole90]=Aued
`SOV140Ul0497JOUOIE90|=0192
`
`{getLX=Aueosoyldy
`
`SOVTVUlAqueguljeso|=Aqued
`{LE}LX=ubis'SOWTdy
`{Ze}LX=xne'SsoyTy
`
`uolonysu|Guryessuay6e|4
`EX+cX=1Xo€Iddv
`
`uononssujJuanbasqns
`{o:retex+{o:Le}ex={0
`pLUdJoInoAueo={o¢}1x2UGJonoAueo={pe}LXQIqJonoAued={ee}Lx
`GLUdJoynoAureos={9¢}LX
`OFUgJoInoAueo={7¢}1X
`L¢ug'soJnoAueo={9c}1X
`€ygJonoAueo={ze}1x
`GX+OX=Ixceippe
`
`
`
`
`
`
`
`¢Old
`
`
`
`eOl
`
`‘LEFLX
`
`U.S. Patent
`U.S. Patent
`
`Apr. 11, 2000
`
`Sheet 2 of 4
`
`6,049,864
`6,049,864
`
`
`
`‘BJM
`
`LzLabs GmbH. Ex. 1007-4
`
`LzLabs GmbH. Ex. 1007-4
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Apr. 11, 2000
`
`Sheet 3 of 4
`
`6,049,864
`
`FIG. 4
`
`Format: prodf AFLAGS =X1, mode, Opsize
`Operation: switch (opsize) {
`Case 8-bit
`top bit p0S = 7
`Carry poS = 34;
`break,
`Case 16-bit
`top bit p0S = 15,
`Carry poS = 36;
`break,
`Case 32-bit:
`top bit p0S =31;
`Carry pós = 38;
`break)
`Switch (mode) {
`case ADDF:/*OO/
`AFLAGS.aux=X1{32};
`AFLAGS.Carry =X1(carry pos});
`AFLAGS.Overf=X1(carry pos}^X1(carry pos-1};
`break,
`Case SUBF:/*O1*/
`AFLAGS.aux = -(X1(32});
`AFLAGS.Carry = -(X1(carry pos});
`AFLAGS.Overf=X1(carry pos} ^X1(carry pos-1};
`break,
`Case shift1:/*10*/
`/* left shifts/rotates;
`AFLAGS.Carry =X1{top bit poS + 1};
`AFLAGS.Overf= AFLAGS.Carry
`X1 (top bit pos};
`break,
`Case shiftr/11/
`/* rightshifts/rotates */
`AFLAGS.carry =X1(top bit p0s + 1};
`AFLAGS.Overf=X1{top bit pos-1} ^X1 (top bit p0s),
`break;
`
`AFLAGS.parity = has even parity (X17:0});
`AFLAGS.sign =X1(top bit p0S};
`AFLAGS.Zero = (X1(top bit p0S:0} = = 0);
`
`LzLabs GmbH. Ex. 1007-5
`
`
`
`U.S. Patent
`
`Apr. 11, 2000
`
`Sheet 4 of 4
`
`6,049,864
`
`Translate the flag generating instruction
`into at least first and SeCOnd
`instructionS.
`
`
`
`Translate the Subsequent instruction
`into at least a third instruction.
`
`Schedule the first instruction to
`eXeCute.
`
`
`
`
`
`DOeS
`the third
`instruction have a data
`dependency On the second
`instruction
`?
`
`
`
`
`
`
`
`
`
`
`
`
`
`Schedule the SeCOnd
`instruction to eXecute
`before the third
`instruction.
`
`
`
`Schedule the third
`instruction to eXecute
`before the SeCOnd
`instruction.
`
`F.G. 5
`
`LzLabs GmbH. Ex. 1007-6
`
`
`
`6,049,864
`
`1
`METHOD FOR SCHEDULING A FLAG
`GENERATING INSTRUCTION AND A
`SUBSEQUENT INSTRUCTION BY
`EXECUTING THE FLAG GENERATING
`INSTRUCTION IN AMCROPROCESSOR
`
`2
`A flag is a bit that may indicate the condition of a
`microprocessor. A flag may also control the operation of the
`microprocessor. Flags are typically grouped together into a
`flag register. For example, a microprocessor could group
`arithmetic flags into the AFLAGS register as shown in FIG.
`1(a). Descriptions of these arithmetic flags are shown in
`FIG. 1(b).
`The arithmetic portion of the above CISC arithmetic
`instruction efficiently translates into a Single RISC instruc
`tion or at least a relatively small number of RISC instruc
`tions. However, the flag generation portion conventionally
`translates into many RISC instructions. Numerous instruc
`tions are required because of the complexity of flag genera
`tion. AS an example, the previously discussed ADD instruc
`tion could modify the contents of the sign, Zero, carry,
`auxiliary carry, parity, and Overflow flags. As a result of the
`large number of RISC instructions necessary to generate the
`above flags, the performance of the microprocessor
`decreases dramatically. Thus, translating CISC flag gener
`ating arithmetic instructions into RISC instructions is not an
`optimal Solution.
`There is a need for a more efficient method to execute flag
`generating arithmetic and logic instructions. Such a method
`should avoid the expense, complexity, and Slow Speed of
`CISC microprocessors. In addition, such a method should
`take advantage of RISC performance enhancements while
`avoiding generating a large number of RISC instructions
`that decrease performance.
`2. SUMMARY OF THE INVENTION
`The invention relates to a method, performed by a
`microprocessor, for Scheduling a flag generating instruction
`and a Subsequent instruction. The Subsequent instruction has
`a data dependency on the flag generating instruction. The
`flag generating instruction is translated into first and Second
`instructions. The Subsequent instruction is translated into at
`least a third instruction.
`The first instruction, when executed, generates a result
`and intermediate flag generation data. The Second
`instruction, when executed, generates a plurality of flags.
`The first instruction is scheduled to execute before the
`Second and third instructions. The Second instruction is
`Scheduled to execute before the third instruction if the third
`instruction has a data dependency on the Second instruction,
`otherwise the third instruction may be Scheduled to execute
`before the Second instruction.
`
`50
`
`55
`
`60
`
`65
`
`3. BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1(a) is a diagram of an AFLAGS register.
`FIG. 1(b) is a table containing a description of flags
`contained in the AFLAGS register of FIG. 1(a).
`FIG. 2 is a diagram of a plurality of computer instructions.
`FIG. 3 is a high level diagram of one embodiment of a
`method to translate a flag generating instruction.
`FIG. 4 presents pseudo code for a prodf instruction.
`FIG. 5 is a high level diagram of a method for executing
`the flag generating and Subsequent instructions of FIG. 2.
`4. DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`Referring to FIG. 2, a plurality of instructions is shown.
`The plurality of instructions are typically CISC instructions,
`although they may be RISC instructions.
`The plurality of instructions include at least one flag
`generating instruction and a Subsequent instruction. The flag
`
`15
`
`25
`
`35
`
`40
`
`45
`
`BACKGROUND OF THE INVENTION
`This invention relates to a method for executing a flag
`generating instruction and a Subsequent instruction. This
`method is particularly useful for rapidly executing flag
`generating CISC instructions Such as Intel x86 instructions.
`When many computer languages that are in widespread
`use today were developed, memory was a very expensive
`commodity. In addition, during the development of the early
`microprocessors, instructions took a long time to fetch when
`compared with the time needed to execute. Thus, the lan
`guage developerS needed to ensure that instructions could be
`Stored compactly. As a result, they developed computer
`languages that contained a very rich Vocabulary of computer
`operations. A microprocessor that executes these operations
`is known as a complex instruction set computer (CISC).
`While CISC microprocessors can rapidly handle CISC
`instructions, Such microprocessors are often more complex,
`expensive, and often slower than a microprocessor designed
`to handle Simpler instructions.
`Today, because memory is relatively inexpensive, modem
`computer languages emphasize Speed of execution rather
`than compactness of code. Thus, more modern reduced
`instruction set computer (RISC) languages utilize a smaller
`number of Simple instructions that may be rapidly handled
`by a RISC microprocessor. One of the primary reasons that
`RISC instructions may be handled faster than CISC instruc
`tions is that RISC instructions are easier to schedule and
`execute in parallel than CISC instructions. A microprocessor
`that can execute instructions in parallel is known as a
`SuperScalar microprocessor.
`Most microprocessor instructions generate at least one
`result that is utilized by Subsequent instructions. If a Subse
`quent instruction utilizes a result produced by a previous
`instruction, the Subsequent instruction is said to have a data
`dependency on the previous instruction. Thus, if a Super
`Scalar microprocessor attempts to execute the two instruc
`tions at the same time, the execution of the Subsequent
`instruction must wait until the previous instruction has
`produced its result. Typically, any instruction must be
`delayed until all of its inputs have been produced. This
`requirement places a Significant performance limitation on
`SuperScalar microprocessors.
`The above described execution, known as out-of-order
`execution, is a conventional method to Significantly increase
`the performance of modem microprocessors. It is known in
`the art that RISC instructions can be more easily executed
`out-of-order than their CISC counterparts.
`In an effort to ensure compatibility with older CISC
`Software, the vast majority of computer programmerS Still
`use CISC instructions today. However, in an effort to take
`advantage of the RISC performance enhancements, Some
`microprocessor designers break down lengthy CISC instruc
`tions into Simpler operations that more closely resemble
`RISC instructions.
`When a CISC arithmetic instruction, Such as ADD
`X1=X2+X3 executes, it generates a result equal to X2+X3.
`X2 and X3 are known as operands. In addition, the arith
`metic instruction generates Several flags. CISC logical
`instructions such as AND, OR, and XOR, also generate
`Similar flags.
`
`LzLabs GmbH. Ex. 1007-7
`
`
`
`3
`generating instruction may be any instruction that generates
`at least one flag. Examples of flag generating arithmetic
`instructions include add, Subtract, Shift left, and shift right.
`Examples of flag generating logical instructions include
`instructions that perform the logical operations AND, OR,
`and XOR.
`The Subsequent instruction has a data dependency on the
`flag generating instruction. Thus, the Subsequent instruction
`may take the result and/or a flag generated by the flag
`generating instruction as an input.
`While only two instructions are shown in FIG. 2, it is to
`be understood that the two instructions are part of a larger
`computer program consisting of numerous instructions. In
`addition, even though the Subsequent instruction is shown as
`immediately following the flag generating instruction, there
`may be additional instructions between the flag generating
`instruction and the Subsequent instruction.
`4.1 Translating the flag generating instruction
`Referring to FIG. 3, a method to execute the instructions
`shown in FIG. 2 is presented. A flag generating instruction,
`such as a 32 bit add instruction (ADD32), is translated into
`a first instruction (addf32) and a second instruction
`(prodf32). The first and second instructions are typically
`RISC instructions, however, under Some circumstances,
`they may be CISC instructions. (In an effort to facilitate
`readability, all CISC instructions will be upper case and all
`RISC instructions will be lower case.) While the terms
`“first and “second” are used to describe the above two
`instructions, there may be additional instructions generated
`from the flag generating instruction. In addition, there may
`be additional instructions generated between the generation
`of the first and Second instructions.
`Referring again to FIG.3, ADD32X1=X2+X3 is intended
`to place the sum of the two 32 bit operands X2 and X3 into
`register X1. Further, ADD32 is intended to update some or
`all of the 6 arithmetic flags in the AFLAGS register of FIG.
`1(a). ADD32 X1=X2+X3 may be broken down into two
`RISC instructions. Addf32 generates the arithmetic result
`X2+X3. In addition, addf32 produces intermediate flag
`generation data Such as carry out bits or other bits that may
`be utilized in producing flags. Prodf32 produces the arith
`metic flags using the intermediate flag generation data
`produced by addf32.
`4.1.1 addf32
`As shown in FIG. 3, addf32 places the arithmetic result
`X2--X3 into the lower 32 bits of X1. Note that both X2 and
`X3 are 32 bit numbers. However, the registers that hold X1,
`X2, and X3 may be greater than 32 bits in length. A length
`of 64 bits may be optimal. If 64 bit registers are used, then
`the upper 32 bits may be used to Store intermediate flag
`generation data and flags while the lower 32 bits may be
`used to Store the above arithmetic result. Referring again to
`FIG. 3, the various carry out bits may be placed into the
`higher order bits of X1. Alternatively, the carry out bits may
`be placed in any memory Storage location, Such as another
`general purpose register. However, placing the arithmetic
`result, intermediate flag generation data, and even AFLAGS
`in a Single register may be useful in certain circumstances
`Such as data dependency checking.
`4.1.2 prodf32
`As shown in FIG. 3, the prodf32 instruction produces six
`arithmetic flags by performing conventional operations on
`the intermediate flag generation data. Because prodf32 takes
`the intermediate flag generation data as an input, prodf32 is
`data dependent on add32. In the embodiment shown in FIG.
`3, prodf32 sets the Auxiliary Carry flag to the value of
`X1{32}. X132 contains the value of the carry out of bit 3.
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6,049,864
`
`15
`
`25
`
`4
`Similarly, prodf32 sets the Carry flag to the value of X1:38,
`the carry out of bit 31. The Overflow flag is set to the result
`of X1{38 XOR X1{37. X1{37 contains the value of the
`carry out of bit 30. The Parity flag is set if the first 8 bits in
`X1 have even parity. Similarly, the Zero flag is set if the first
`32 bits of X1 are equal to zero.
`AS is known by those skilled in the art, flag generation
`depends on both the operand Size and the flag generating
`instruction. Thus, different versions of prodf may be needed
`for the differing operand Sizes and differing flag generating
`instructions. FIG. 4 presents C pseudo code for a prodf
`instruction that Supports operand sizes of 8, 16, and 32 bits.
`This C pseudo code also Supports Several different flag
`generating instructions.
`There may be circumstances when a designer would
`desire to keep Some of the flags unchanged. For example, in
`an effort to maintain compatibility with certain Intel x86
`CISC instructions, some instructions may be allowed to
`modify only a Subset of the flags. In this case, flags that the
`instruction should not modify could be masked. Thus, prodf
`could take an AFLAGS mask as an additional input. Prodf
`could then generate a temporary AFLAGS, check the given
`AFLAGS mask, and merge the temporary and original
`AFLAGS together to form the final AFLAGS.
`4.2 Translating the Subsequent instruction
`Referring to FIG. 5, after the flag generating instruction is
`translated into first and Second instructions, the Subsequent
`instruction is translated into one or more instructions. Typi
`cally these instructions are RISC instructions. One of these
`RISC instructions will be referred to as a third instruction.
`Recall that the Subsequent instruction is data dependent on
`the flag generating instruction. Thus, the third instruction is
`data dependent on either the first and/or the Second instruc
`tions.
`4.3 Scheduling the first instruction
`AS discussed previously, the Second instruction takes the
`result generated by the first instruction as an input. Similarly,
`the third instruction takes the result of the first instruction
`and/or a flag generated by the Second instruction as an input.
`Thus, because the Second and third instructions are data
`dependent on at least the first instruction, the first instruction
`is Scheduled to execute first by the microprocessor instruc
`tion Scheduler.
`4.4 Scheduling the Second and third instructions
`Again referring to FIG. 5, the third instruction may or may
`not take the flags generated by the Second instruction as an
`input. If the third instruction does take the flags as an input,
`then the third instruction is data dependent on the Second
`instruction. Thus, the Second instruction is Scheduled to
`execute prior to the third instruction. However, if the third
`instruction is not data dependent on the Second instruction,
`then an increase in performance may be obtained. In this
`case, the third instruction may be Scheduled to execute
`before the Second instruction.
`4.5 Storage of certain flag combinations
`When determining a conditional branch, a microprocessor
`may often use certain combinations of Some of the bits in
`AFLAGS. Some examples of these combinations are “below
`or equal” (Carry OR Zero), “less” (Sign XOR Overflow),
`“less or equal” (Sign XOR Overflow) OR Zero), and the
`complements of these cases. In an effort to increase micro
`processor performance, the values for each of these combi
`nations may be produced by prodf and Stored in the
`AFLAGS register. If the combinations are so stored, then
`many branch decisions may be made by testing only one bit.
`5. REMARKS
`One of the advantages of the invention is that it provides
`an efficient method for executing flag generating instruc
`
`LzLabs GmbH. Ex. 1007-8
`
`
`
`S
`tions. By breaking Such instructions into a Small number of
`instructions, Subsequent instructions may be expedited if
`they do not take flags as an input. In addition, by breaking
`Such instructions into only a Small number of instructions,
`the invention avoids the large number of instructions that are
`generated by conventional CISC to RISC translations.
`Another advantage of the invention is that it allows Some
`arithmetic instructions to be executed in a single clock cycle
`in a modem microprocessor. For example, it may not be
`possible to execute a CISC ADD32 instruction in a single
`clock cycle of a modern microprocessor. However, it would
`be possible to execute a RISC add32 in a single instruction
`cycle.
`Another advantage of breaking CISC flag generating
`arithmetic instructions into RISC instructions, is that it
`simplifies the design effort. It is known by those skilled in
`the art that designing two Separate modules often requires
`less effort than designing a Single complex module.
`Still another primary advantage of this invention is that it
`decreases the complexity of a microprocessor. CISC micro
`processors arc inherently complex. However, by breaking
`down CISC flag generating instructions into more manage
`able RISC instructions, CISC instructions can be efficiently
`executed using RISC performance enhancements.
`It will be appreciated by those of ordinary skill having the
`benefit of this disclosure that the illustrative embodiments
`described above are capable of numerous variations without
`departing from the Scope and Spirit of the invention. For
`example, while an ADD32 instruction has been discussed as
`an example of a flag generating instruction, other ADD
`instructions that operate on different length operands are
`also flag generating instructions. There are many other flag
`generating instructions, Such as but not limited to Subtract,
`left shift, right shift, and logical instructions. Accordingly,
`the exclusive rights Sought to be patented are as described in
`the claims below.
`What is claimed is:
`1. A method, performed by a microprocessor, for execut
`ing a flag generating instruction, the method comprising:
`(a) translating the flag generating instruction into at least
`a first instruction and a Second instruction;
`(b) the first instruction generating intermediate flaggen
`eration data upon execution of the first instruction; and
`(c) the Second instruction, utilizing the intermediate flag
`generation data, generating a plurality of flags upon
`execution of the Second instruction.
`2. The method of claim 1, further comprising Scheduling
`the first instruction for execution before the Second instruc
`tion.
`3. The method of claim 1 wherein the flag generating
`instruction comprises a CISC instruction and the first and
`Second instructions comprise RISC instructions.
`4. The method of claim 1 wherein the flag generating
`instruction comprises an Intel x86 instruction.
`5. The method of claim 1 wherein the flag generating
`instruction comprises an arithmetic instruction.
`6. The method of claim 1 wherein the flag generating
`instruction comprises a logical instruction.
`7. The method of claim 1 wherein the microprocessor
`comprises a SuperScalar out-of-order microprocessor.
`8. A method, performed by a microprocessor, for Sched
`uling a flag generating instruction and a Subsequent
`instruction, the Subsequent instruction having a data depen
`dency on the flag generating instruction, the method com
`prising:
`
`6
`(a) translating the flag generating instruction into a first
`instruction and a Second instruction, the first
`instruction, when executed, generating intermediate
`flag generation data, the Second instruction, when
`executed, utilizing the intermediate flag generation
`data, generating a plurality of flags;
`(b) translating the Subsequent instruction into at least a
`third instruction;
`(c) Scheduling the first instruction to execute before the
`Second and third instructions, and
`(d) Scheduling the Second instruction to execute before the
`third instruction if the third instruction has a data
`dependency on the Second instruction.
`9. The method of claim 8 wherein the flag generating
`instruction and the Subsequent instruction comprise CISC
`instructions and the first, Second, and third instructions
`comprise RISC instructions.
`10. The method of claim 8 wherein the flag generating
`instruction comprises an arithmetic instruction.
`11. The method of claim 8 wherein the flag generating
`instruction comprises a logical instruction.
`12. The method of claim 8 wherein the flag generating
`instruction and the Subsequent instruction comprise Intel
`x86 instructions.
`13. The method of claim 8 wherein the microprocessor is
`a SuperScalar out-of-order microprocessor.
`14. A method, performed by a microprocessor, for Sched
`uling a flag generating instruction and a Subsequent
`instruction, the Subsequent instruction having a data depen
`dency on the flag generating instruction, the method com
`prising:
`(a) translating the flag generating instruction into a first
`instruction and a Second instruction, the first
`instruction, when executed, generating intermediate
`flag generation data, the Second instruction, when
`executed, utilizing the intermediate flag generation
`data, generating a plurality of flags;
`(b) translating the Subsequent instruction into at least a
`third instruction;
`(c) Scheduling the first instruction to execute before the
`Second and third instructions,
`(d) Scheduling the Second instruction to execute before the
`third instruction if the third instruction has a data
`dependency on the Second instruction; and
`(e) otherwise, Scheduling the third instruction to execute
`before the Second instruction.
`15. The method of claim 14 wherein the flag generating
`instruction and the Subsequent instruction comprise CISC
`instructions and the first, Second, and third instructions
`comprise RISC instructions.
`16. The method of claim 14 wherein the flag generating
`instruction comprises an arithmetic instruction.
`17. The method of claim 14 wherein the flag generating
`instruction comprises a logical instruction.
`18. The method of claim 14 wherein the flag generating
`instruction and the Subsequent instruction comprise Intel
`x86 instructions.
`19. The method of claim 14 wherein the microprocessor
`is a SuperScalar out-of-order microprocessor.
`
`k
`
`k
`
`k
`
`k
`
`k
`
`6,049,864
`
`1O
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`LzLabs GmbH. Ex. 1007-9
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`CERTIFICATE OF CORRECTION
`PATENT NO. :6,049,864
`DATED
`: April 11, 2000
`INVENTOR(S) : Liu et al.
`It is certified that error appears in the above-identified patent and that said Letters Patent is hereby
`corrected as shown below:
`
`In column 1, at line 26, delete "modem" and insert-modern--.
`
`Attest:
`
`Signed and Sealed this
`Fifteenth Day of May, 2001
`Zaaé, f-34.
`
`NICHOLAS P. GODC
`
`Attesting Officer
`
`Acting Director of the United States Patent and Trade mark Office
`
`LzLabs GmbH. Ex. 1007-10
`
`