`
`Transmitted herewith for filing under 35 U.S.C. 111 and 37 C.F.R. 1.53 is the patent application of:
`Lee, R.
`For: A METHOD AND SYSTEM FOR PERFORMING PERMUTATIONS WITH BIT PERMUTATION
`INSTRUCTIONS
`
`EL847628152US
`
`Enclosed are:
`l8l Certificate of Mailing with Express Mail Mailing Label No.
`l8l 27
`sheets of drawings.
`(cid:143) A certified copy of a
`(cid:143)
`l8l Declaration
`l8l Power of Attorney
`(cid:143)
`Information Disclosure Statement
`(cid:143) Preliminary Amendment
`l8l Not needed
`Verified Statement(s) to Establish Small Entity Status Under 37 C.F.R. 1.9 and 1.27.
`(cid:143) Other:
`
`Signed.
`
`l8l Unsigned.
`
`application.
`
`For
`
`#Filed
`
`#Allowed
`
`#Extra
`
`CLAIMS AS FILED
`
`Total Claims
`
`lndep. Claims
`
`77
`
`23
`
`-20 =
`
`- 3 =
`
`57
`
`20
`
`X
`
`X
`
`Multiple Dependent Claims (check if applicable)
`
`(cid:143)
`
`Rate
`
`$9.00
`
`$40.00
`
`Fee
`
`$513.00
`
`$800.00
`
`$0.00
`
`BASIC FEE
`
`$355.00
`
`TOT AL FILING FEE
`
`$1,668.00
`
`l8l A check in the amount of
`$1,668.00
`to cover the filing fee is enclosed.
`l8l The Commissioner is hereby authorized to charge and credit Deposit Account No.
`as described below. A duplicate copy of this sheet is enclosed.
`(cid:143) Charge the amount of
`as filing fee.
`l8l Credit any overpayment.
`l8l Charge any additional filing fees required under 37 C.F.R. 1.16 and 1.17.
`D Charge the issue fee set in 37 C.F.R. 1.18 at the mailing of the Notice of Allowance,
`pursuant to 37 C.F.R. 1.311(b).
`
`11
`
`13-2165
`
`Dated: May 7, 2001
`
`cc:
`
`Diane Dunn McKay
`Reg. No. 34,586
`Mathews, Collins, Shepherd & Gould, P.A.
`100 Thanet Circle, Suite 306
`Princeton, NJ 08540
`(609) 924-8555 Telephone
`(609) 924-3036 Facsimile
`
`P01 SMALUREV06
`
`Oracle-1002 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104 us
`
`EXPRESS MAIL CERTIFICATE
`
`"Express Mail" Mailing Label Number: EL847628152US
`
`Express Mail Corporate Account Number: X079384
`
`Date of Deposit:
`
`May 7, 2001
`
`Type of Documents:
`
`1.
`2.
`3.
`4.
`5.
`6.
`7.
`
`Acknowledgment Post Card;
`"Express Mail" Certificate;
`Patent Application Transmittal Letter (Small Entity)
`Specification (34 pages), Claims (19 pages) & Abstract (1 page);
`27 Sheets of Drawings;
`Declaration and Power of Attorney (unsigned);
`Check in the amount of $1,668.00.
`
`I hereby certify that this paper or fee is being deposited with the United States Postal
`Service "Express Mail Post Office to Addressee" service under 3 7 CFR 1.10 on the date
`indicated above and is addressed to the Assistant Commissioner of Patents, Washington,
`D.C. 20231; BOX: PATENT APPLICATION.
`
`Diane Dunn McKay
`(Typed or printed name of person mailing paper or fee)
`
`Oracle-1002 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`A Method and System for Performing Permutations
`With Bit Permutation Instructions
`
`4759-104
`
`5
`
`Background of the Invention
`
`1. Field of the Invention
`
`The present invention relates to a method and system for performing
`
`10
`
`permutations of a sequence of bits in a programmable processor.
`
`2. Description of the Related Art
`
`The need for secure information processing has increased with the increasing use
`
`15
`
`of the public internet and wireless communications in e-commerce, e-business and
`
`personal use. Typical use of the internet is not secure. Secure information processing
`
`typically includes authentication of users and host machines, confidentiality of messages
`
`sent over public networks, and assurances that messages, programs and data have not
`
`been maliciously changed. Conventional solutions have provided security functions by
`
`20
`
`using different security protocols employing different cryptographic algorithms, such as
`
`public key, symmetric key and hash algorithms.
`
`For encrypting large amounts of data symmetric key cryptography algorithms
`have been used, see Bruce Schneier, "Applied Cryptography", 2nd Ed., John Wiley &
`Sons, Inc., 1996. These algorithms use the same secret key to encrypt and decrypt a
`
`25
`
`given message, and encryption and decryption have the same computational complexity.
`
`In symmetric key algorithms, the cryptographic techniques of "confusion" and
`
`"diffusion" are synergistically employed. "Confusion" obscures the relationship between
`
`the plaintext ( original message) and the ciphertext ( encrypted message), for example,
`through substitution of arbitrary bits for bits in the plaintext. "Diffusion" spread the
`
`30
`
`redundancy of the plaintext over the ciphertext, for example through permutation of the
`bits of the plaintext block. Such bit-level permutations have the drawback of being slow
`
`1
`
`Oracle-1002 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`when implemented with conventional instructions available in microprocessors and other
`
`programmable processors.
`
`Bit-level permutations are particularly difficult for processors, and have been
`
`5
`
`avoided in the design of new cryptography algorithms, where it is desired to have fast
`
`software implementations, for example in the Advanced Encryption Standard, as
`described in NIST, "Announcing Request for Candidate Algorithm Nominations for the
`
`Advanced Encryption Standard (AES)", http:/ /csrc.nist.gov/encryption/aes/pre(cid:173)
`roundl/aes 9709.htm, Since conventional microprocessors are word-oriented, performing
`bit-level permutations is difficult and tedious. Every bit has to be extracted from the
`
`source register, moved to its new location in the destination register, and combined with
`the bits that have already been moved. This requires 4 instructions per bit (mask
`
`generation, AND, SHIFT, OR), and 4n instructions to perform an arbitrary permutation
`of n bits. Conventional microprocessors, for example Precision Architecture (PA-RISC)
`have been described to provide more powerful bit-manipulation capabilities using
`EXTRACT and DEPOSIT instructions, which can essentially perform the four operations
`required for each bit in 2 instructions (EXTRACT, DEPOSIT), resulting in 2n
`instructions for any arbitrary permutation of n bits, see Ruby Lee, "Precision
`Architecture", IEEE Computer, Vol. 22, No. 1, pp. 78-91, Jan. 1989. Accordingly, an
`arbitrary 64-bit permutation could take 128 or 256 instructions on this type of
`conventional microprocessor. Pre-defined permutations with some regular patterns have
`been implemented in fewer instructions, for example, the permutations in DES, as
`described in Bruce Schneier, "Applied Cryptography", 2nd Ed., John Wiley & Sons, Inc.,
`
`1996.
`
`Conventional techniques have also used table lookup methods to implement fixed
`permutations. To achieve a fixed permutation of n input bits with one table lookup, a
`table with 2n entries is used with each entry being n bits. For a 64-bit permutation, this
`type of table lookup would use 267 bytes, which is clearly infeasible. Alternatively, the
`table can be broken up into smaller tables, and several table lookup operations could be
`
`10
`
`15
`
`20
`
`25
`
`30
`
`2
`
`Oracle-1002 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`used. For example, a 64-bit permutation could be implemented by permuting 8
`consecutive bits at a time, then combining these 8 intermediate permutations into a final
`permutation. This method requires 8 tables, each with 256 entries, each entry being 64
`bits. Each entry has zeros in all positions, except the 8 bit positions to which the selected
`8 bits in the source are permuted. After the eight table lookups done by 8 LOAD
`instructions, the results are combined with 7 OR instructions to get the final permutation.
`In addition, 8 instructions are needed to extract the index for the LOAD instruction, for a
`total of 23 instructions. The memory requirement is 8*256*8=16 kilobytes for eight
`tables. Although 23 instructions is less than the 128 or 256 instructions used in the
`previous method, the actual execution time can be much longer due to cache miss
`penalties or memory access latencies. For example, if half of the 8 Load instructions
`miss in the cache, and each cache miss takes 50 cycles to fetch the missing cache line
`from main memory, the actual execution time is more than 4*50=200 cycles.
`Accordingly, this method can be longer than the previously described 128 cycles using
`EXTRACT and DEPOSIT. This method also has the drawback of a memory requirement
`of 16 kilobytes for the tables.
`
`5
`
`10
`
`15
`
`20
`
`Permutations are a requirement for fast processing of digital multimedia
`information, using subword-parallel instructions, more commonly known as multimedia
`instructions, as described in Ruby Lee, "Accelerating Multimedia with Enhanced Micro(cid:173)
`processors", IEEE Micro, Vol. 15, No. 2, pp.22-32, April 1995, and Ruby Lee, "Subword
`Parallelism in MAX-2", IEEE Micro, Vol. 16, No. 4, pp.51-59, August 1996. The MAX-
`2 general-purpose PERMUTE instructions can do any permutation, with and without
`repetitions, of the subwords packed in a 64-bit register. However, it is only defined for
`16-bit subwords. MIX and MUX instructions have been implemented in the IA-64
`architectures, which are extensions to the MIX and PERMUTE instructions ofMAX-2,
`see Intel Corporation, "IA-64 Application Developer's Architecture Guide", Intel
`Corporation, May, 1999. The IA-64 uses MUX instruction, which is a fully general
`permute instruction for 16-bit subwords, with five new permute byte variants. A
`30 VPERM instruction has been used in an AltiVec extension to the Power PC™ available
`
`25
`
`3
`
`Oracle-1002 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`from IBM Corporation, Armonk, NY, see Motorola Corporation, "'AltiVec Extensions to
`PowerPC' Instruction Set Architecture Specification", Motorola Corporation, May 1998.
`The Altivec VPERM instruction extends the general permutation capabilities ofMAX-2's
`PERMUTE instruction to 8-bit subwords selected from two 128-bit source registers, into
`a single 128-bit destination register. Since there are 32 such subwords from which 16 are
`selected, this requires 16*lg32=80 bits for specifying the desired permutation. This
`means that VPERM has to use another 128-bit register to hold the permutation control
`bits, making it a very expensive instruction with three source registers and one
`destination register, all 128 bits wide.
`
`It is desirable to provide significantly faster and more economical ways to
`perform arbitrary permutations of n bits, without any need for table storage, which can be
`used for encrypting large amounts of data for confidentiality or privacy.
`
`5
`
`10
`
`15
`
`SUMMARY OF THE INVENTION
`
`;,
`
`The present invention provides permutation instructions which can be used in
`software executed in a programmable processor for solving permutation problems in
`cryptography, multimedia and other applications. For fast cryptography, bit-level
`permutations are used, whereas for multimedia, permutations on subwords of typically 8
`bits or 16 bits are used. Permutation instructions of the present invention can be used to
`provide any arbitrary permutation of sixty-four 1-bit subwords in a 64-bit processor, i.e.,
`a processor with 64-bit words, registers and datapaths, for use in fast cryptography. The
`permutation instructions of the present invention can also be used for permuting
`subwords greater than 1 bit in size, for use in fast multimedia processing. For example,
`in addition to being able to permute sixty-four 1-bit subwords in a register, the
`permutation instructions and underlying functional unit can permute thirty-two 2-bit
`subwords, sixteen 4-bit subwords, eight 8-bit subwords, four 16-bit subwords, or two 32-
`bit subwords. The permutation instructions of the present invention can be added as new
`instructions to the Instruction Set Architecture of a conventional microprocessor, or they
`
`20
`
`25
`
`30
`
`4
`
`Oracle-1002 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`can be used in the design of new processors or coprocessors to be efficient for both
`cryptography and multimedia software.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`A PPERM instruction is defined to perform permutations by a sequence of
`instructions with each sequence specifying the position in the source for each bit in the
`destination. In the PPERM instruction bits in the destination register that change are
`updated and bits in the destination register that do not change are set to zero.
`Alternatively, a PPERM3R instruction is defined to perform permutations. The
`PPERM3R instruction is similar to the PPERM instruction except that the bits from the
`destination register which do not change are copied unchanged, rather than set to zero as
`in PPERM. Accordingly, the PPERM3R instruction uses three source registers because
`the destination register is also a source register since the unchanged bits are held in the
`destination register. For every one of n bits to be changed in the final permutation, 1 gn
`bits can be used in the PPERM instruction or the PPERM3R instruction to specify which
`bit in the source register should replace the bit to be changed in the destination register.
`
`In an alternate embodiment, a GRP instruction is defined to perform
`permutations. The GRP instruction divides the initial sequence in the source register into
`two groups depending on configuration bits. The first group is concatenated with the
`second group to form the result of one GRP instruction, which is also an intermediate bit
`sequence toward the final permutation. The total number of GRP instructions for a
`permutation of n bits is up to lgn.
`
`In an embodiment of the present invention, multibit subwords are permuted with
`the GRP instruction. In a further embodiment of the invention, the method and system
`are scaled for performing permutations of 2n bits in which subwords are packed into two
`or more registers. In this embodiment, at most 2lgn+4 instructions are used to permute
`2n bits using n-bit words.
`
`30
`
`For a better understanding of the present invention, reference may be made to the
`accompanying drawings.
`
`5
`
`~
`
`Oracle-1002 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`Brief Description of the Drawings
`
`5
`
`10
`
`15
`
`Fig. 1 is a schematic diagram of a system for implementing permutation instructions in
`accordance with an embodiment of the present invention.
`Fig. 2 is a flow diagram of a method for determining a permutation instruction
`sequence to achieve a desired permutation in accordance with an embodiment of the present
`invention.
`Fig. 3A is a schematic diagram of operation of a PPERM instruction.
`Fig. 3B is a schematic diagram of an example extracting 8 bits from multiple registers
`with PPERM instructions with "otherreg" bit.
`Fig. 4A illustrates a schematic diagram of an alternate system with three source
`registers.
`Fig. 4B is a schematic diagram of a circuit for implementing a 64-bit PPERM
`instruction.
`Fig. 5 is a schematic diagram of an operation of a GRP instruction.
`Fig. 6 is a schematic diagram of a method for determining a sequence of GRP
`instruction and control bits for the GRP instructions.
`Fig. 7 is an example of determining a GRP instruction sequence for an 8-bit
`permutation.
`Fig. SA is a schematic diagram of a unit for serial implementation of a GRP
`operation with one control signal.
`Fig. 8B is a circuit diagram for a unit for a GRP operation with one control signal.
`Fig. 8C is a schematic diagram of an alternate unit for serial implementation of a
`GRP operation with two control signals.
`Fig. SD is a circuit diagram for a unit for a GRP operation with two control
`signals.
`Fig. 9A is an example of a circuit for serial implementation of a first step of a
`30 GRP operation using half of a GRP function unit.
`Fig. 9B is an example of a whole 8-bit GRP function unit.
`
`20
`
`25
`
`6
`
`Oracle-1002 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`Fig. 10 is a circuit diagram of a serial implementation of a GRP operation.
`Fig. 11 is a schematic diagram of principle for a parallel scheme for a GRP
`
`operation.
`Fig. 12 is a schematic diagram of a unit for parallel implementation of a GRP
`operation.
`Fig. 13 is a circuit diagram for extracting z bits in an 8-bit group by combining z
`bits in two 4-bit groups.
`Fig. 14 is a circuit diagram of addition of two one-hot encoded numbers.
`Fig. 15 is a circuit diagram of a circuit to generate a final result of an n bit GRP
`operation, wherein n=4.
`Fig. 16 is a schematic diagram of a parallel GRP implementation.
`Fig. 17 is a schematic diagram of a module for generating select signals.
`Fig. 18A is a flow diagram of a method for 2n-bit permutation in accordance with
`an embodiment of the present invention.
`Fig. 18B is a schematic diagram of the method shown in 18A.
`Fig. 19 is a graph of the number of instructions for encryption and key generation
`in DES.
`Fig. 20 is a circuit diagram for extracting z bits in an 8-bit group by combining z
`bits in two 4-bit groups and making non-z bits zero.
`Fig. 21 is a schematic diagram of an embodiment of a parallel implementation of
`8-bit GRP operation.
`Fig. 22 is a schematic diagram of an embodiment of a parallel implementation of
`64-bit GRP operation.
`Fig. 23 is a schematic diagram of an embodiment of a module for generating
`select signals.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`Detailed Description
`
`Reference will now be made in greater detail to a preferred embodiment of the
`invention, an example of which is illustrated in the accompanying drawings. Wherever
`
`30
`
`=
`
`7
`
`Oracle-1002 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`possible, the same reference numerals will be used throughout the drawings and the
`description to refer to the same or like parts.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`Fig. 1 illustrates a schematic diagram of a system 10 for implementing efficient
`permutation instructions in accordance with the teachings of the present invention. Register
`file 12 includes source register 1 la, source register 11 b and destination register l lc. System
`10 can provide bit-level permutations of all n bits of any register in register file 12. The same
`solution can be applied to different subword sizes of i bits, for i=0, 1, 2, ... , m, where n=2m
`bits. For a fixed word size of n bits, and 1-bit subwords, there are n subwords to be permuted.
`Source register values to be permuted 13 from register l la and configuration bits 15 from
`source register 11 b are applied over data paths to permutation functional unit 14. Permutation
`function unit 14 generates permutation result 16. Permutation result 16 can be an intermediate
`result if additional permutations are performed by permutation functional unit 14. For other
`instructions, arithmetic logic unit (ALU) 17 and shifter 18 receive source register values 13
`from source register 1 la and source register values 15 from source register I lb and generate
`respective ALU result 20 and shifter result 21 over a data path to destination register I le.
`System 10 can be implemented in any programmable processor, for example, a conventional
`microprocessor, digital signal processor (DSP), cryptographic processor, multimedia processor,
`media processor, programmable system-on-a-chip (SOC), and can be used in developing
`processors or coprocessors for providing cryptography, multimedia and other operations.
`
`Fig. 2 is a flow diagram of a method of determining permutation instruction sequences
`for permutations 22. The determined permutation instruction sequences can be executed in
`permutation functional unit 14. In block 23, bit positions in a source sequence of bits are
`defined for a group of bits in a destination register. In block 24, a permutation instruction is
`determined with the bit positions to assemble bits from the source sequence of bits. In block
`25, the permutation instruction is performed. The assembled bits are inserted into a destination
`register as determined by the bit positions. Blocks 23-25 can be conditionally repeated for
`every non-overlapping group of bits in the destination, in block 26. After the final permutation
`is determined, the desired permutation of the source register is determined in block
`27.Alternatively, all block 23 can be performed for all bits in the destination register and block
`
`8
`
`Oracle-1002 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`24 is performed for all bit positions determined in block 23. Thereafter when the permutation
`needs to be performed, block 25 is performed. All permutation instructions can be performed
`in block 25 in parallel.
`
`4759-104
`
`A PPERM instruction can be used as the permutation instruction described above
`for dynamically specified permutations of n subwords. Each PPERM instruction defines
`a subset of bits which subsets can be combined to define the desired permutation. The
`source positions for k bits can be specified with one instruction. PPERM instructions can
`be defined as follows:
`
`5
`
`10
`
`PPERM,x
`
`Rl, R2, R3
`
`wherein Rl and R2 are the source registers and R3 is a destination register. Rl contains
`the bits to be permuted. R2 contains configuration bits. x specifies which k bits in R3
`15 will change. In R3, only k bits specified by x are updated, the other bits are set to zero.
`klgn bits in R2 can be used to specify where to extract the k consecutive bits to be
`changed in R3.
`
`20
`
`25
`
`Fig. 3A illustrates a schematic diagram of an example operation of the PPERM
`instruction. The PPERM instruction is PPERM, 1 Rl, R2, R3 wherein R2 is
`0x020E160820252C33 in hexadecimal notation. This is the same as the decimal values
`shown in R2 as (2, 14, 22, 8, 32, 37, 44, 51). Configuration bits ofregister R2 can be
`applied with source bits to be permuted of register RI to a 64-8 crossbar 30. 64-8
`crossbar 30 assembles bits from RI according to configuration bits ofR2 into assembled
`bits 32. Assembled bits 32 are inserted in R3 at byte I as determined by x. In this
`embodiment, 8 bits are permuted each time, and 56 bits are set to zero.
`In order to store the position information in one register, the following inequality
`should hold
`klgn::,; n
`
`(1)
`
`30
`
`Therefore,
`
`9
`
`Oracle-1002 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`ks..!!:_
`lgn
`
`4759-104
`
`(2)
`
`Approximately nllgn bits can be specified with one instruction. In total, n/k
`PPERM instructions which is approximately equivalent to lg n PPERM instructions are
`used for an n-bit permutation. For example, when n=64, k=8 is selected. Eight PPERM
`instructions for a 64-bit permutation are used, and seven OR instructions to merge these
`results to get the desired permutation. For every one of the k bits to be copied in the final
`permutation, lgn bits are used to specify which bit in the source register should be copied.
`
`The PPERM instruction is scalable to multiple n bits wherein subwords are packed in
`more than one register. To allow PPERM to permute bits from more than one source register,
`an extra bit (denoted "otherreg") is used to select each bit in the source register. Accordingly,
`different PPERM instructions can pick bits from more than one register. In this embodiment,
`for n=64 bits, each index into the source register is (lgn+ 1 )=7 bits. If the "otherreg" bit=0, then
`the remaining 6-bit index selects a bit in the source register to place in the destination register,
`as described above. If the "otherreg" bit=l, the corresponding bit in the destination register is
`forced to zero. The pseudo code for the operation performed by PPERM instructions on 64-bit
`architecture is shown in Table 1.
`
`Instruction
`PPERM,x Rl, R2, R3
`
`Pseudo code
`
`O;
`R3 [0 .. n-1]
`<: k; i ++}
`0; i
`for (i
`otherreg R2[i*(lg(n)+l)];
`j = R2[i*(lg(n}+l}+l .. ( (i+l) * (lg(n} +1} -1];
`0)
`if (otherreg
`R3[x*k+i]= Rl[j];
`
`Table 1
`
`To permute 2n bits, two source registers must be used, and two destination registers are
`produced. For each destination register, 8 PPERM instructions are used on each source
`register, requiring a total of 16 PPERM instructions and 15 OR instructions to combine the
`results into one destination register. The same must be repeated to produce the other
`destination register. Hence, a total of 2(16+ 15)=62 instructions are needed to permute 2n bits.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`10
`
`Oracle-1002 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`The PPERM instruction with the "otherreg" bit can permute more than 2n bits stored in
`multiple registers. In Fig 3B, the 8 bits to be collected in register Rl can come from 5 different
`source registers RI 1, Rl2, R13, Rl4 and R15. In this example the bits collected are: bit 2 from
`RI 1, bit 14 from R12, bit 22 from R13, bit 8 from R14, and bit 32, bit 37, bit 44 and bit 51
`from R15. Registers R21, R22, R23, R24 and R25 are used for storing configuration bits. 8
`configuration bits are used to select one data bit from one ofregisters RI 1, R12, R13, R14 or
`Rl 5. The lower 6 bits are the position of selected bits in source registers, shown as the lower
`rwo of numbers in the configuration registers in Fig. 3B. The most significant bit is the
`"otherreg" bit, shown as the upper row of numbers in Fig. 3B. The PPERM instructions shown
`in Table 1.1 can be used to collect the desired 8 bits from the 5 data registers RI 1, R12, Rl3,
`R14 and RlS. In each of instructions 1, 2, 3 and 4, only one bit is extracted to RI, R2, R3, and
`R4 because only one index in each of the configuration bits has a O "otherreg" bit. For
`example, in R21, only 02 has O in "otherreg" bit. In R22, only OE has O in "otherreg" bit.
`Instruction 5 puts 4 bits in RS because 4 indices have O in their "otherreg" bit. Thereafter, the
`desired 8 bits are merged in Rl with OR instructions.
`
`1:
`2:
`3:
`4:
`5:
`6:
`7:
`8:
`9:
`
`PPERM,2 Rll, R21, Rl
`R12, R22, R2
`PPERM,2
`R13, R23, R3
`PPERM,2
`R14, R24, R4
`PPERM,2
`R15, R25, RS
`PPERM,2
`Rl, R2, Rl
`OR
`R3, R4, R3
`OR
`Rl, RS, Rl
`OR
`Rl, R3, Rl
`OR
`
`0x02 OE 16 08 20 25 2C
`;R21=0x02 80 80 80 80 80 80
`;R22=0x80 OE 80 80 80 80 80
`;R23=0x80 80 16 80 80 80 80
`;R24=0x80 80 80 08 80 80 80
`;R2S=0x80 80 80 80 20 25 2C
`
`33
`80
`80
`80
`80
`33
`
`Table 1.1
`
`In an alternate embodiment, the number of configuration registers are reduced. An
`additional parameter can be used in the PPERM instruction such that the PPERM instruction
`can be defined as:
`
`11
`
`~
`
`5
`
`10
`
`15
`
`20
`
`25
`
`Oracle-1002 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`PPERM, x, regid
`
`Rl, R2, R3
`
`wherein Rl and R2 are the source registers and R3 is a destination register. Rl
`contains a subset of the bits to be permuted. x specifies which k bits in R3 are changed
`by copying bits from the source register. Regid specifies which subset of bits are stored
`in Rl. The configuration register R2 contains the index of the bit to be copied, and a
`srcid field, for each of the k bits. In R3, a bit is copied if it is one of the k bits specified
`by x and its "srcid" is equal to the "regid" encoded in the instruction. Otherwise, this bit
`in R3 is set to zero. "regid" and "srcid" can be any reasonable size, but both "regid" and
`"scrid" must contain the same number of bits. If regid and srcid have m bits, k(lgn+m)
`bits in R2 are used to specify where to extract the k bits and from which register. If m=O,
`the 11PPERM,x,regid" instruction is reduced back to the above-described "PPERM,x"
`instruction.
`
`For example, the PPERM instructions shown in Table 1.2 can be used for reducing the
`number of configuration registers used in the previous example, shown in Table 1.1. 8 bits are
`used to specify the location for each selected bit. The lower 6 bits are the bit position in the
`source register and higher 2 bits are "srcid". 2 configuration registers are used. Instruction 1
`grabs one bit because only 02' s srcid 0 matches the regid=0 of instruction 1. Instruction 2
`grabs one bit because only 4E's srcid 1 matches the regid=l of instruction 2. Instruction 3
`grabs one bit because only D6's srcid 3 matches the regid=3 of instruction 3. Instruction 4
`grabs one bit because only 48's srcid 1 matches the regid=l of instruction 4. Notice that
`instruction 4 uses a different configuration register 25 than the configuration register 21 used
`by the first 3 instructions. This allows more than 4 source registers to be used to supply bits to
`be permuted. Instruction 5 grabs 4 bits because the srcid 0 of bit 20, bit 25, bit 2C and bit 33
`all match the regid=0 of instruction 5.
`
`In this embodiment, only one configuration register is needed, if the data bits to be
`permuted are stored in at most 4 source registers. This is because 8 data bits are permuted in
`one PPERM instruction, each data bit requiring 6 configuration bits to specify its positional
`location in a source register, so only 2 configuration bits are left to specify a source register. If
`
`12
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`~
`
`Oracle-1002 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`two configuration registers are used, then the data bits to be permuted can be stored in at most
`
`8 registers, and so forth.
`
`4759-104
`
`1:
`
`PPERM,2,0
`
`Rll, R21, Rl
`
`2:
`
`PPERM,2,1
`
`R12, R21, R2
`
`3:
`
`PPERM,2,3
`
`R13, R21, R3
`
`4:
`
`PPERM,2,l
`
`R14, R25, R4
`
`PPERM,2,0
`
`RlS, R25, RS
`
`OR
`OR
`OR
`OR
`
`Rl, R2, Rl
`R3, R4, R3
`Rl, RS, Rl
`Rl, R3, Rl
`
`5:
`
`6:
`7:
`8:
`9:
`
`5
`
`0x02 OE 16 08 20 25 2C 33
`
`;R21=0x02 4E D6 80 80 80 80 80
`;srcid:0 1
`2
`2
`2
`2
`2
`3
`
`;R21=0X02 4E D6 80 80 80 80 80
`l
`2
`2
`2
`2
`;srcid:0
`3
`2
`
`;R21=0X02 4E D6 80 80 80 80 80
`3
`2
`;srcid:0 1
`2
`2
`2
`2
`
`;R25=0x80 80 80 48 20 25 2C 33
`;srcid:2
`2
`2
`0
`0
`1
`0
`0
`
`;R25=0X80 80 80 48 20 25 2C 33
`1
`0
`;srcid:2 2
`0
`0
`0
`2
`
`Table 1.2
`
`Fig. 4A illustrates a schematic diagram of an alternate embodiment of system 100 with
`
`3 source registers rather than two. Register file 112 includes source register 11 la, source
`
`register 11 lb, source register 11 ld and destination register 11 lc. System 100 can provide bit-
`
`10
`
`level permutations of all n bits of any register in register file 112. Source register values to be
`
`permuted 113 from register 11 la and configuration bits 115 from source register 11 lb and
`
`intermediate result 122 from source register 11 ld are applied over data paths to permutation
`
`functional unit 114. Permutation function unit 114 generates permutation result 116 which is
`
`written back to destination register 11 lc. PPERM3R instructions are defined on system 100 to
`
`15
`
`perform any n-bit permutation.
`
`The PPERM3R instruction can be defined as follows:
`
`PPERM3R,x RI, R2, R3
`
`RI and R2 are source registers, and R3 is a destination register. Rl contains the bits to be
`
`permuted. R2 contains the configuration bits. x specifies which k bits in R3 will change.
`
`20
`
`In R3, only k bits specified by x are updated, the other bits are copied from R3
`
`13
`
`Oracle-1002 p. 15
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`4759-104
`
`unchanged. The R3 destination register must also be a source register because the
`unchanged bits must be copied from R3 (used as a source register) to R3 (used as a
`destination register). The PPERM3R instruction is similar to the PPPERM instruction
`described above, except in the PPERM3R instructions three source registers are used.
`
`PPERM3R does not use the OR instructions used to accumulate the intermediate
`results produced by each PPERM instruction. For example, if 8 PPERM instructions are
`performed to permute 64 bits, then 7 OR instructions are used to accumulate the final
`permuted result, as described previously. To achieve the same permutation, 8
`PPERM3R instructions are used, since the partial result can be accumulated with each
`PPERM3R instruction. Accordingly, system 100 for PPERM3R requires 3 source
`registers, whereas system 10 for PPERM3 requires only 2 source registers.
`
`The following codes in Table 2 give an example of PPERM3R instruction which
`can be used to do an initial permutation in the data encryption standard (DES). All
`registers are 64 bits in width. Rl is the source and R2 is the target register. RIO through
`Rl 7 are registers containing permutation configuration bits. 6 of 8 bits are used to
`represent the position of each bit in the source register. Each PPERM3R instruction
`produces an intermediate state with 8 bits permuted. 8 instructions are required to
`permute all 64 bits. For example, the first byte in Rl0, Ox39 (in hexadecimal notation),
`indicates that the first bit in the target register R2 is bit 57 (in decimal notation) in the
`source register Rl.
`
`PPERM3R,O Rl, RlO, R2
`PPERM3R,1 Rl, Rll, R2
`PPERM3R,2 Rl, Rl2, R2
`PPERM3R,3 Rl, Rl3, R2
`PPERM3R,4 Rl, R14, R2
`PPERM3R,5 Rl, R15, R2
`PPERM3R,6 Rl, Rl6, R2
`PPERM3R,7 Rl, R17, R2
`
`Ox3931292119110901
`RlO
`R