`Cole
`
`USOO6865272B2
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6,865,272 B2
`Mar. 8, 2005
`
`(54) EXECUTING PERMUTATIONS
`(75) Inventor: Anthony James Cole, Bristol (GB)
`(73) Assignee: STMicroelectronics Limited,
`Almondsbury Bristol (GB)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 138 days.
`
`(*) Notice:
`
`(21) Appl. No.: 10/117,934
`(22) Filed:
`Apr. 8, 2002
`(65)
`Prior Publication Data
`US 2003/0138098 A1 Jul. 24, 2003
`Related U.S. Application Data
`(63) Continuation of application No. 09/994,382, filed on Nov.
`26, 2001, now abandoned, which is a continuation of appli
`cation No. 09/893,305, filed on Jun. 27, 2001, now aban
`doned, which is a continuation of application No. 09/698,
`486, filed on Oct. 27, 2000, now abandoned, which is a
`continuation of application No. 09/505.397, filed on Feb. 16,
`2000, now abandoned, which is a continuation of application
`No. 09/362,407, filed on Jul. 28, 1999, now abandoned,
`which is a continuation of application No. 09/226,981, filed
`on Jan. 8, 1999, now abandoned.
`Foreign Application Priority Data
`(30)
`(GB) ............................................. 98O1713
`Jan. 27, 1998
`(51) Int. Cl. .............................. H04L 9/00; G06F 7/58
`(52) U.S. Cl. ........................... 380/28; 380/37; 708/209;
`708/250; 712/300
`(58) Field of Search ............................ 380/252, 28, 37,
`380/265, 268, 29; 712/300; 708/209, 253,
`250
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`12/1992 Delaporte et al.
`5,168,521 A
`5,412,728 A
`5/1995 Besnard et al.
`5,822,619 A * 10/1998 Sidwell ...................... 712/300
`5,884,069 A * 3/1999
`... 712/221
`6,145,077 A * 11/2000 Sidwell et al. .............. 712/300
`
`FOREIGN PATENT DOCUMENTS
`
`WO
`
`WO 98.38767
`
`9/1998
`
`OTHER PUBLICATIONS
`
`Standard Search Report dated Oct. 16, 1998, issued by the
`European Patent Office.
`Multiple Master Keys, IBM Technical Disclosure Bulletin,
`vol. 32, No. 8A, Jan. 1, 1990, pp. 239-242.
`
`* cited by examiner
`
`Primary Examiner-Gilberto Barron
`Assistant Examiner Jung W Kim
`(74) Attorney, Agent, or Firm-Lisa K. Jorgenson; James
`H. Morris; Wolf, Greenfield & Sacks, P.C.
`(57)
`ABSTRACT
`A method for changing the bit-order of a data value in a data
`processing System having a register capable of Storing data
`Strings which each comprise a plurality of Sub-Strings that
`are not individually addressable, the method comprising
`assigning an output data String by the Steps of loading the
`data value into a first data String, generating, for each
`Sub-String of the output data String, a corresponding inter
`mediate data String, each Sub-String of which corresponds to
`a Selected bit on the first data String and has all its bits equal
`to the value of the Selected bit; and generating the output
`data String, in each Sub-String of which each bit has the same
`value as the bits in a Selected Sub-String of the intermediate
`data String that corresponds to that Sub-String of the output
`data String.
`
`3,383,661. A *
`
`5/1968 Goldstein ................... 708/200
`
`20 Claims, 10 Drawing Sheets
`
`
`
`Oracle-1023 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 1 of 10
`
`US 6,865,272 B2
`
`...
`
`I | | | | |
`F.G. 1
`
`502
`
`5O 42 34
`52 44 36
`54 46 38
`56.48 40
`49 41 33
`51 43 35
`53 45 37
`55 47. 39
`
`26 18
`28 20
`3O 22
`32 24
`25 17
`27 19
`29 21
`31 23
`
`FIG. 2
`
`
`
`40
`39
`38
`37
`36
`35
`34
`33
`
`48 6
`47 15
`46 14
`45 13
`44 12
`43 1
`42 10
`1 41 9
`
`56 24
`55 23
`54 22
`53 24
`52 20
`51 19
`50 18
`49 17
`
`64
`63
`62
`61
`60
`59
`58
`57
`
`32
`31
`30
`29
`28
`27
`26
`25
`
`FG. A.
`
`Oracle-1023 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 2 of 10
`
`US 6,865,272 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`OOUBLE
`OP
`
`SRl
`PACKED
`ARITHMETIC st
`
`
`
`RES
`
`MEMORY
`REEESS
`
`SEs
`S.
`
`
`
`30
`
`
`
`G
`
`RES
`
`ARTEMETIC
`
`PACK OPS
`ACCESS
`
`MEM OPS
`CONCITION
`
`I
`
`
`
`H
`- sects POINTER
`/\
`HT-1-5
`-
`DOUBLE
`E1 E2
`ATCH
`REGISTER
`SR
`is
`"fit
`s:
`V/
`OEST DEST
`REG in
`- - - CONSTANT
`
`
`
`G
`
`NST
`
`
`
`a
`
`
`
`OPCODE
`NEW NST
`
`BRANCH
`
`to
`FETCH
`
`50
`
`53
`
`- is 3.
`Li. E
`- NEWS
`ADDRESS
`REACY READ s T :
`
`SELECT
`SELECT
`
`
`
`WRITE
`
`0 22
`
`MEMORY
`
`FIG. 5
`PROCESSOR & MEMORY
`
`AOORESS
`DATA
`
`26
`
`
`
`
`
`
`
`
`
`Oracle-1023 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 3 of 10
`
`US 6,865,272 B2
`
`ROUTE OPCODE
`
`118
`
`70
`BYTE REPLICATE
`
`
`
`
`
`is
`
`100
`
`-
`
`OBVIOUS PACKED
`IWIST & ZIP
`AREMEric
`-
`S.
`52-SR1' 'SR2-5,
`F.G. 6
`PACKED UNIT
`
`
`
`
`
`
`
`
`
`
`
`
`
`DOUBLE
`56
`
`RES
`
`
`
`RESULT
`FIG. 8
`OBVIOUS PACKED ARTHMETIC
`
`Oracle-1023 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 4 of 10
`
`US 6,865,272 B2
`
`
`
`
`
`S?08 WAS
`
`Oracle-1023 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 5 of 10
`
`US 6,865,272 B2
`
`92
`
`90
`
`FIG.9
`OBVIOUS PACKED ARITHMETIC WITH UNPACKED OPERAND
`
`RESULT
`
`
`
`FIG 10
`BYTE REPLICATE
`
`Oracle-1023 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 6 of 10
`
`US 6,865,272 B2
`
`
`
`51,32 Š 2
`25S
`
`SHOEofows show syMMETRICA operations
`F.G. 11
`ZIP AND UNZIP
`
`Oracle-1023 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 7 of 10
`
`US 6,865,272 B2
`
`
`
`DATA
`FORMAT
`
`FLP
`
`ARRAY LAYOUT
`O 1
`
`Oracle-1023 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 8 of 10
`
`US 6,865,272 B2
`
`FIG. 13
`6l-BT ZIPS AND UNZIPS
`SRC1
`
`130
`
`S--> S >
`
`T Y p s i
`2.A.re
`ity:
`EI),
`
`
`
`
`
`
`
`
`
`
`168
`
`A.
`
`Oracle-1023 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 9 of 10
`
`US 6,865,272 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`E
`
`
`
`
`
`
`
`sessssssssssss
`fifteeplete 52.22:5???
`ASX
`(KKKK'
`A788.7 A7A7A2-182A%
`488 AY
`196 te See
`Ž>£S S.
`CN/
`1981 12 SSNN.I.
`D
`2OO
`
`17s
`RESULT
`FIG. 14,
`OOUBLE LENGTH 8-BIT ZIP AND UNZIP
`
`174
`
`
`
`SRC1
`
`2O2
`
`
`
`S4
`
`Sea &:
`i
`it is,
`YJ" Witt/
`N
`17 s. YN
`D diffiti for to
`talk
`"T" 2O6
`
`
`
`A2
`
`YA
`
`
`
`LL Le
`
`
`
`
`
`
`
`
`
`232
`
`ES-20
`
`FIG. 15
`DOUBLE LENGTH 16-BIT AND 32-BT ZIP AND UNZIP
`
`Oracle-1023 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Mar. 8, 2005
`
`Sheet 10 of 10
`
`US 6,865,272 B2
`
`OPCOOE
`258 - ||
`
`236
`SRC2
`|| 1
`
`|| ||
`
`Y P E
`
`
`
`23,
`
`SRC1
`||
`
`
`
`160
`
`
`
`
`
`58
`
`sigsseesa
`syyyAAA
`:
`*(Krk (2):
`H SS,
`3/3// KCV
`
`2
`
`2
`
`
`
`%
`
`
`
`FIG 1 6 I-
`
`RESULT
`
`
`
`Y P E
`
`i
`
`16-BIT ANO 32-8T FLPS
`
`Oracle-1023 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`1
`EXECUTING PERMUTATIONS
`
`US 6,865,272 B2
`
`5
`
`2
`Preferably one or more of the intermediate data strings is
`generated by the Steps of masking, at the first data String, all
`but the Selected bits corresponding to the Sub-Strings of the
`intermediate data String, and allocating to all the bits of each
`Sub-String in the intermediate data String the value of the
`umasked Selected bit corresponding to that Sub-String of the
`intermediate data String.
`Preferably the Step of generating the output data String
`comprises the Step of, for each intermediate data String,
`generating a corresponding Second intermediate data String
`by masking all but one bit in each Sub-String of the respec
`tive intermediate data String.
`Preferably the Step of generating the output data String
`comprises the Step of, for each Sub-String of the output data
`String generating an intermediate Sub-String corresponding
`to the respective Second intermediate data String by per
`forming an OR operation on the Sub-Strings of that Second
`intermediate data String. The OR operation could be per
`formed by any suitable method that has an ORing effect.
`Preferably the Step of generating the output data String
`comprises bit shifting at least one intermediate Sub-String to
`a desired position in the output data String.
`Preferably for each intermediate data String no more than
`one of the Selected bits is located in any Sub-String of the first
`data String, Suitably with each Sub-String of the intermediate
`data String corresponding to a single Sub-String in the first
`data string. Preferably all the selected bits are located at a
`Single Selected bit-position in each Sub-String of the first data
`String, Suitably with each intermediate data String being
`asSociated with a Selected bit position in the Sub-String of the
`first data String.
`According to the present invention from a Second aspect
`there is provided a data processing System comprising:
`processing means,
`a register capable of Storing data Strings which each
`comprise a plurality of Sub-Strings that are not indi
`vidually addressable; and
`a program memory for Storing a set of instructions for the
`processor to change the bit-order of a data value by
`assigning an output data String according to the fol
`lowing Steps:
`a. loading the data value into a first data String,
`b. generating, for each Sub-String of the output data String,
`a corresponding intermediate data String, each Sub
`String of which corresponds to a Selected bit in the first
`data String and has all its bits equal to the value of the
`Selected bit; and
`c. generating the output data String, in each Sub-String of
`which each bit has the same value as the bits in a
`Selected Sub-String of the intermediate data String that
`corresponds to that Sub-String of the output data String.
`According to the present invention from a third aspect
`there is provided a method for changing the bit-order of a
`data value in a data processing System having a register
`capable of Storing data Strings which each comprise a
`plurality of Sub-Strings that arc not individually addressable,
`the method comprising assigning an output data String by the
`Steps of
`loading the data value into a first data String;
`generating, for each bit-position in the Sub-Strings of the
`output data String, a corresponding intermediate data
`String associated with a Selected Sub-String of the input
`data String, each Sub-String of that intermediate data
`String corresponding to a bit-position in the Sub-Strings
`of the first data String and having all its bits equal to the
`value of the bit at that bit-position in the respective
`Sub-String of the first data String, and
`
`25
`
`BACKGROUND OF THE INVENTION
`This invention relates to methods and Systems for chang
`ing the bit-order of data.
`There are numerous circumstances when the bit order of
`data needs to be changed. Examples are in cryptography and
`interleaving algorithms (e.g. Reed Solomon).
`SUMMARY OF THE INVENTION
`One example in cryptography is in encoding or decoding
`data according to the data encryption Standard (DES).
`Details of the DES algorithm are given in National Institute
`15
`of Standards and Technology-FIPS PUB 46-1, Data
`Encryption Standard and in “Privacy and Authentication: An
`Introduction to Cryptography”, Diffie & Hellman, Proceed
`ings of the IEEE, Vol. 67, pp. 397–427, 1979. The DES
`algorithm is a block cipher that operates on 64-bit blocks of
`plaintext data and uses a 56-bit key. ESSentially the same
`algorithm is used for encryption and decryption. The overall
`transformations employed in the DES algorithm may be
`written as P' FP(X))}, where X is the plaintext, P is a
`certain permutation and the function F combines Substitu
`tions and transpositions. The permutation P is known as the
`initial permutation (IP) because it is performed at the outset
`of encoding a block of data. The inverse permutation P' is
`known as the inverse initial permutation (IIP) and is per
`formed at the end of encoding a block of data. FIG. 1
`illustrates the IP. A 64-bit input value 501 is re-ordered to
`form a 64-bit output value 502 in which the 1st bit has the
`value of the 58th input bit, the 2nd bit has the value of the
`50th input bit and so on. The full forward ordering scheme
`is given in the table in FIG. 2. FIG. 3 illustrates the IIP. A
`35
`64-bit input value 503 is re-ordered to form a 64-bit output
`value 504 in which the 1st bit has the value of the 40th input
`bit, the 2nd bit has the value of the 8th input bit and so on.
`The full inverse ordering scheme is given in the table in FIG.
`4. (In FIGS. 1 to 4 bit-position 1 is the most significant bit
`(MSB) and position 64 is the least significant bit (LSB)).
`When the DES algorithm is being executed there is a need
`to perform these permutations as quickly as possible. The
`permutations are normally performed by the simplest means:
`a look-up table. Furthermore, the procedure used to imple
`ment the look-up table is usually of a loop Structure, the
`branching of which does not make best use of modern
`processors equipped with predictive capabilities. Similar
`considerations apply to other applications that require per
`mutations to be performed.
`According to the present invention from a first aspect
`there is provided a method for changing the bit-order of a
`data value in a data processing System having a register
`capable of Storing data Strings which each comprise a
`plurality of Sub-Strings that are not individually addressable,
`the method comprising assigning an output data String by the
`Steps of:
`loading the data value into a first data String,
`generating, for each Sub-String of the output data String, a
`corresponding intermediate data String, each Sub-String
`of which corresponds to a selected bit in the first data
`String and has all its bits equal to the value of the
`Selected bit; and
`generating the output data String, in each Sub-String of
`which each bit has the same value as the bits in a
`Selected Sub-String of the intermediate data String that
`corresponds to that Sub-String of the output data String.
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Oracle-1023 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`3
`generating the output data String, in which each of the bits
`at each bit-position in the Sub-Strings of which has the
`Same value as the bits in a Selected Sub-String of the
`intermediate data String corresponding to that bit
`position.
`Preferably each of the intermediate data Strings is gener
`ated by the Step of expanding the Sub-String associated with
`that intermediate data String So that each of the bits of that
`sub-string determines the value of all the bits in the corre
`sponding Sub-String of the intermediate data String.
`Preferably the Step of generating the intermediate data
`Strings comprises bit shifting at least one Sub-String of the
`input data String.
`Preferably the Step of generating the output data String
`comprises the Step of generating a Second intermediate data
`String corresponding to each intermediate data String by
`masking all but one bit in each Sub-String of the respective
`intermediate data String.
`Preferably the Step of generating the output data String
`comprises the Step of performing an OR operation on the
`Second intermediate data Strings. The OR operation could be
`performed by any suitable method that has an ORing effect.
`According to the present invention from a fourth aspect
`there is provided a data processing System comprising:
`processing means;
`a register capable of Storing data Strings which each
`comprise a plurality of Sub-Strings that are not indi
`vidually addressable; and
`a program memory for Storing a set of instructions for the
`processor to change the bit-order of a data value by
`assigning an output data String according to the fol
`lowing Steps:
`a loading the data value into a first data String;
`b. generating, for each Sub-String of the output data String,
`a corresponding intermediate data String, each Sub
`String of which corresponds to a Selected bit in the first
`data String and has all its bits equal to the value of the
`Selected bit; and
`c. generating the output data String, in which each of the
`bits at each bit-position in the Sub-Strings of which has
`the same value as the bits in a Selected Sub-String of the
`intermediate data String that corresponds to that bit
`position.
`In all the aspects of the invention it is preferred that each
`bit represents a binary digit. Suitably all the data Strings have
`the same bit-length; preferred lengths are 32, 64 and 128
`bits. Suitably all the data strings have the same number of
`sub-strings; preferred numbers are 4, 8 and 16. Suitably all
`the Sub-Strings have the same bit-length; preferred lengths
`are 4, 8 and 16. Suitably the number of bits in each
`Sub-String equals the number of Sub-Strings in each data
`String. One most preferred arrangement is for each data
`string to have 64 bits and be composed of 8 8-bit sub-strings.
`The data value may be formed of several smaller data
`values.
`In all the aspects of the invention the order of the steps
`could be Such that only one or Some of the intermediate data
`Strings are generated before the output data String is begun
`to be generated. Preferably the relevant bits derived from an
`intermediate data String are included in the output data String
`before the next intermediate data String is generated. Pref
`erably each bit in the first data String has a corresponding bit
`in the output data-String. A bit's bit-position is Suitably
`determined by its bit-significance (in the range from MSB to
`LSB) in its sub-string.
`In all the aspects of the invention the output data String is
`preferably generated by Setting its bits to be equal to the
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,865,272 B2
`
`4
`appropriate bits of the other Strings and/or SubStrings as
`described above.
`The methods and Systems may be used for implementing
`permutations of various Sorts. The methods and Systems may
`be used in, for example, encryption, decryption, interleaving
`or de-interleaving. The data value may represent data to be,
`for example, compressed, decompressed, interleaved or
`de-interleaved.
`Some or all of the Steps and/or instructions are Suitably
`performed by a computer, Suitably according to a stored
`program. The computer may provide specific instructions
`for performing one or more of the Steps. In particular there
`may be specific instructions for performing masking (for
`example AND instructions), ORing (for example OR
`instructions or addacr instructions or other instructions of
`that type to OR Sub-Strings together) or expanding bits to
`occupy a Sub-String (for example cmpne instructions or
`other instructions of that type). The instructions preferably
`operate on data Strings whose Sub-Strings are not individu
`ally accessible. The instructions may be of the packed
`arithmetic type.
`The computer or the data processing System may be part
`of an encryption, decryption, interleaving, de-interleaving or
`communication System. Particular examples of the algo
`rithms that may be performed include the DES permutation,
`the DES inverse permutation, Reed-Solomon interleaving
`and Reed-Solomon de-interleaving.
`BRIEF DESCRIPTION OF THE DRAWINGS
`The present invention will now be described by way of
`example with reference to the accompanying drawings in
`which:
`FIG. 1 illustrates the IP permutation in the DES algo
`rithm;
`FIG. 2 shows the IP ordering scheme;
`FIG. 3 illustrates the IIP permutation in the DES algo
`rithm;
`FIG. 4 shows the lip ordering scheme;
`FIG. 5 is a block diagram of a processor and memory of
`a computer,
`FIG. 6 is a block diagram of a packed arithmetic unit;
`FIG. 7 shows the meaning of symbols used in the figures;
`FIG. 8 is a block diagram of an obvious packed arithmetic
`unit operating on two packed Source operands,
`FIG. 9 is a block diagram of an obvious packed arithmetic
`unit which operates on a packed Source operand and an
`unpacked Source operand;
`FIG. 10 shows a byte replicate unit;
`FIG. 11 ShowS Zip and unzip restructuring operations,
`FIG. 12 shows flip restructuring operations,
`FIG.13 shows part of the twist and zip unit for performing
`64 bit ZipS and unzips,
`FIG. 14 shows part of the twist and zip unit for performing
`double length 8 bit Zips and unzips,
`FIG. 15 shows part of the twist and zip unit for performing
`double length 16 bit and 32 bit zips and unzips;
`FIG.16 shows part of the twist and zip unit for performing
`8 bit flips; and
`FIG. 17 shows part of the twist and zip unit for performing
`16 bit and 32 bit flips.
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`A computer Suitable for performing the method of the
`present invention will now be described.
`
`Oracle-1023 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`S
`FIG. 5 shows a processor in accordance with one embodi
`ment of the present invention. The processor has three
`execution units including a conventional arithmetic unit 2
`and a memory acceSS unit 4. In addition there is a packed
`arithmetic unit 6. The processor also includes an instruction
`fetcher 8, an instruction register 10, a register file 12 and an
`instruction pointer 14 all of which operate under the control
`of a control unit 16 of the processor. The register file
`comprises a set of registers each having a predetermined bit
`capacity and each being addressable with a single address.
`It is not possible to address individual locations within a
`register. When a register is accessed, the entire contents of
`the register are concerned. The processor further includes a
`constant unit 18 and a select unit 20. The constant unit 18
`and select unit 20 are also operated under the control of the
`control unit 16. The processor operates in conjunction with
`a memory 22 which holds instructions and data values for
`effecting operations of the processor. Data values and
`instructions are Supplied to and from the memory 22 via a
`data bus 24. The data bus 24 Supplies data values to and from
`the memory 22 via a memory data input 26. The data bus 24
`also Supplies data to the instruction fetcher 8 via a fetcher
`data input 28 and to the memory acceSS unit 4 via a memory
`access read input 30. The memory is addressed via the select
`unit 20 on address input 32. The select unit 20 is controlled
`via a fetch signal 34 from the control unit 16 to select an
`address 36 from the fetcher 8 or an address 38 from the
`memory access unit 4. Read and write control lines 40,42
`from the control unit 16 control read and write operations to
`and from the memory 22. The instruction fetcher 8 fetches
`instructions from the memory 22 under the control of the
`control unit 16 as follows. An address 36 from which
`instructions are to be read is provided to the memory 22 via
`the select unit 20. These instructions are provided via the
`data bus 24 to the fetcher data input 28. When the instruction
`fetcher has fetched its next instruction, or in any event has
`a next instruction ready, it issues a Ready Signal on line 44
`to the control unit 16. The instruction which is to be
`executed is Supplied to the instruction register 10 along
`instruction line Inst 46 and held there during its execution.
`The instruction pointer 14 holds the address of the instruc
`tion being executed Supplied to it from the fetcher 8 via
`instruction pointer line 48. A Get signal 47 responsive to a
`New Inst signal 53 from the control unit 16 causes the
`instruction register 10 to Store the next instruction on Inst
`line 46 and causes the fetcher 8 to prepare the next instruc
`tion. The New Inst signal 53 also causes the instruction
`pointer 14 to Store the address of the next instruction. A
`branch line 50 from the control unit 16 allows the instruction
`fetcher 8 to execute branches.
`The instruction register 10 provides Source 1 and Source
`2 register addresses to the register file 12 as Reg1 and Reg2.
`A result register address is provided as Dest. Opcode is
`provided to the control unit 16 along line 51. In addition,
`Some instructions will provide a constant operand instead of
`encoding one or both Source registers. The constant is
`provided by the constant unit 18. The instruction's source
`values are provided on Source 1 and Source 2 buses 52.54
`by the appropriate Settings of the S1 Reg and S2 Reg Signals
`at inputs E1.E2. The correct execution unit is enabled by
`providing the appropriate values for Pack Ops, Mem Ops
`and ALU Ops Signals from the control unit 16 in accordance
`with the Opcode on line 51. The enabled unit will normally
`provide a result Res on a result bus 56. This is normally
`Stored in the Selected result register Dest in the register file
`12. There are Some exceptions to this.
`Some instructions provide a Double length result. These
`store the first part of the result in the normal way. In a
`
`6
`Subsequent additional Stage, the Second part of the result is
`Stored in the next register in the register file 12 by asserting
`a Double signal 58.
`Branches 50 need to read and adjust the instruction
`pointer 14. These cause the S1 Reg Signal not to be asserted,
`and so the instruction pointer 14 provides the Source 1 value
`online 60. The Source 2 value is provided in the normal way
`(either from a register in the register file 12, or the constant
`unit 18). The arithmetic unit 2 executes the branch calcula
`tions and its result is stored into the fetcher 8 on the New IP
`input 64, rather than the register file 12, Signalled by the
`Branch line 50 from the control unit 16. This starts the
`fetcher from a new address.
`Conditional branches must execute in two stages depend
`ing on the State of condition line 62. The first stage uses the
`Dest Register as another Source, by asserting a Read Dest
`signal 45. If the condition is satisfied, then the normal branch
`Source operands are read and a branch is executed.
`Calls must Save a return address. This is done by Storing
`the instruction pointer value in a destination register prior to
`calculating the branch target.
`The computer described herein has several noteworthy
`general qualities.
`Source operands are always the natural word length.
`There can be one, two or three Source operands.
`The result is always the natural word length, or twice the
`natural word length. There is a performance penalty when it
`is twice the natural word length as it takes an extra Stage to
`Store and occupies two, rather than one, registers. For this
`computer, assume a natural word length of 64 bits. That is,
`each register in the register file has a predetermined capacity
`of 64 bits.
`The execution units 2,4,6 do not hold any state between
`instruction execution. Thus Subsequent instructions are inde
`pendent.
`Non-Packed Instructions
`The arithmetic unit 2 and memory acceSS unit 4, along
`with the control unit 16 can execute the following instruc
`tions of a conventional instruction Set. In the following
`definitions, a register is used to denote the contents of a
`register as well as a register itself as a storage location, in a
`manner familiar to a perSon Skilled in the art.
`
`OW
`add
`
`sub
`load
`
`Store
`cmpe
`
`Move a constant or a register into a register
`Add two registers together and store the result in a third register
`(which could be the same as either of the sources)
`Subtract two registers and store the result in a third register
`Use one register as an address and read from that location in
`memory, storing the result into another register
`Use one register as an address and store the contents of another
`register into memory at the location specified by the address
`Compare two registers (or a register and a constant) for equality.
`If they are equal, store 1 into the destination register otherwise
`StOre Zero
`cmpge Compare two registers (or a register and a constant) for order
`ability. If the second is not less than the first, store 1 into the
`destination register other wise store zero
`jump
`Unconditional jump to a new location
`jumpz Jump to a new program location, if the contents of a specified
`register is zero
`jumpinz, Jump to a new program location, if the contents of a specified
`register is not zero
`Perform a bitwise right shift of a register by a constant or
`another register and store the result in a destination register.
`The shift is signed because the sign bit is duplicated when
`shifting
`
`shr
`
`US 6,865,272 B2
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Oracle-1023 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`US 6,865,272 B2
`
`8
`The instructions are named as follows:
`
`-continued
`
`shl
`orfxor
`
`Perform a bitwise left shift of a register by a constant or another
`register and store the result in a destination register
`Perform a bitwise logical operation (orfxor) on two registers and
`store result in destination register.
`
`5
`
`Packed Unit
`FIG. 6 shows in a block diagram the packed arithmetic
`unit 6. This is shown as a collection of Separate units each
`responsible for Some Subset of packed arithmetic instruc
`tions. Another implementation could combine the functions
`in different ways. The units include a byte replicate unit 70,
`a mask unit 72, a twist and Zip unit 74, an obvious packed
`arithmetic unit 80 and other units 76 and 78 which are not
`described herein. These are all operated responsive to a route
`opcode unit 82 which selectively controls the arithmetic
`units 70 to 80. Operands for the arithmetic units 70 to 80 are
`supplied along the Source 1 and Source 2 buses 52,54.
`Results from the arithmetic units are Supplied to the result
`bus 56. The opinput to the route opcode unit 82 receives the
`Pack Ops instruction from the control unit 16 (FIG. 1). It
`will be appreciated that the operands Supplied on the Source
`1 and Source 2 buses are loaded into respective input buffers
`of the arithmetic units and the results Supplied from one or
`two output buffers to one or two destination registers in the
`register file 12.
`Obvious Packed Arithmetic
`The obvious packed arithmetic unit 80 performs opera
`tions taking the two Source operands as containing Several
`packed objects each and operating on respective pairs of
`objects in the two operands to produce a result also con
`taining the same number of packed objects as each Source.
`The operations Supported can be addition, Subtraction,
`comparison, multiplication, left shift, right shift etc. AS
`explained above, by addressing a register using a single
`address an operand will be accessed. The operand comprises
`a plurality of objects which cannot be individually
`addressed.
`FIG. 7 shows the symbols used in the diagrams illustrat
`ing the arithmetic units of the packed arithmetic unit 6.
`FIG. 8 shows an obvious packed arithmetic unit which
`can perform addition, Subtraction, comparison and multipli
`cation of packed 16 bit numbers. AS, in this case, the Source
`and result bus widths are 64 bit, there are four packed
`objects, each 16 bits long, on each bus.
`The obvious packed arithmetic unit 80 comprises four
`arithmetic logical units ALU0-ALU3, each of which are
`controlled by opcode on line 100 which is derived form the
`route opcode unit 82 in FIG. 3. The 64 bit word supplied
`from Source register 1 SRC1 contains four packed objects
`S10-S13). The 64 bit word Supplied from source register
`2 SRC2 contains four packed objects S20-S23). These are
`stored in first and second input buffers 90.92. The first
`arithmetic logic unit ALU0 operates on the first packed
`object in each operand, S10 and S20 to generate a result
`RO). The second to fourth arithmetic logic units
`ALU1-ALU3 similarly take the second to fourth pairs of
`objects and provide respective results R1 to R3). These
`are stored in a result buffer 102. The result word thus
`contains four packed objects. An enable unit 101 determines
`if any of the unit should be active and controls whether the
`output buffer asserts its output.
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Add each respective S1i to S2i as 2's complement
`add2p
`numbers producing Ri. Overflow is ignored.
`addacr2p Or each respective S1i with S2i as 2's complement
`numbers producing Ri.
`sub2p
`Subtract each respective S2i from S1i as 2's complement
`numbers producing Ri. Overflow is ignored.
`cmpe2p
`Compare each respective S1i with S2i). If they are equal,
`set Ri to all ones; if they are different, set Ri to zero.
`cmpge2ps Compare each respective S1i with S2i as signed 2's
`complement numbers. If S1i) is greater than or equal to S2i.
`set Ri to all ones; if S1i is less than S2i set Ri
`to Zero.
`cmpne2p Compare each respective S1i with S2i). If they are
`different, set Ri to all ones; if they are equal, set Ri to
`ZCO.
`Multiply each respective S1i by S2i as signed 2’s
`complement numbers setting Ri to the least significant 16
`bits of the full (32bit) product.
`
`mul2ps
`