`US007174014B2
`
`c12) United States Patent
`Lee et al.
`
`(IO) Patent No.:
`(45) Date of Patent:
`
`US 7,174,014 B2
`Feb.6,2007
`
`(54) METHOD AND SYSTEM FOR PERFORMING
`PERMUTATIONS WITH BIT PERMUTATION
`INSTRUCTIONS
`
`(75)
`
`Inventors: Ruby B. Lee, Princeton, NJ (US);
`Zhijie Shi, Princeton, NJ (US)
`
`(73) Assignee: Teleputers, LLC, Princeton, NJ (US)
`
`5,483,541 A
`5,524,256 A *
`5,546,393 A
`5,623,548 A
`5,673,321 A
`
`1/1996 Linsky ....................... 371/2.1
`6/1996 Turkowski .................. 712/300
`8/1996 Mine ......................... 370/60.1
`4/1997 Akiyama et al.
`............. 380/28
`9/1997 Lee ............................. 380/37
`
`(Continued)
`
`OTHER PUBLICATIONS
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 881 days.
`
`Zhijie Shi, Ruby B. Lee: "Bit Permutation Instructions for Accel(cid:173)
`erating Software Cryptography", in Proc. IEEE Intl. Conf. Appli(cid:173)
`cation-Specific Systems, Architectures and Processors, pp. 138-148,
`Jul. 2000.*
`
`(21) Appl. No.: 09/850,239
`
`(22)
`
`Filed:
`
`May 7, 2001
`
`(65)
`
`(51)
`
`(52)
`
`(58)
`
`Prior Publication Data
`
`US 2002/0078011 Al
`
`Jun. 20, 2002
`
`Int. Cl.
`H04K 1100
`(2006.01)
`U.S. Cl. .......................... 380/28; 380/44; 380/265;
`377/54; 377/60; 377/67; 377/81; 340/825.68;
`365/78; 711/109; 712/1; 712/10; 712/24;
`712/223
`Field of Classification Search .................. 380/44,
`380/265, 28; 377/54, 60, 75, 67, 81; 711/109;
`340/825.68; 365/73, 78; 712/1, 24, 10,
`712/223
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,796,830 A
`3,962,539 A
`4,275,265 A
`4,937,574 A
`4,972,481 A
`5,001,753 A
`5,297,207 A
`5,442,705 A
`
`3/1974 Smith .......................... 178/22
`6/1976 Ehrsam et al. ................ 178/22
`.. ... ... ... 178/27 .09
`6/ 1981 Davida et al.
`6/1990 Wright ....................... 341/106
`ll/ 1990 Santesson .. ... ... ... ... ... .. . 380/49
`3/1991 Davio et al ................... 380/29
`3/ 1994 Degele . ... .. ... ... ... ... ... .. . 380/46
`8/1995 Miyano ....................... 380/29
`
`(Continued)
`
`Primary Examiner-Emmanuel L. Moise
`Assistant Examiner-Paul Callahan
`(74)Attorney, Agent, or Firm-Mathews, Shepherd, McKay
`& Bruneau, P.A.
`
`(57)
`
`ABSTRACT
`
`The present invention provides permutation instructions
`usable in a programmable processor for solving permutation
`problems in cryptography, multimedia and other applica(cid:173)
`tions. PPERM and PPERM3R instructions are defined to
`perform permutations by a sequence of instructions with
`each sequence specifying the position in the source for each
`bit in the destination. In the PPERM instruction bits in the
`destination register that change are updated and bits in the
`destination register that do not change are set to zero. In the
`PPERM3R instruction bits in the destination register that
`change are updated and bits in the destination register that do
`not change are copied from intermediate result of previous
`PPERM3R instructions. Both PPERM and PPERM3R
`instructions can individually do permutation with bit rep(cid:173)
`etition. Both PPERM and PPERM3R instructions can indi(cid:173)
`vidually do permutation of bits stored in more than one
`register. In an alternate embodiment, a GRP instruction is
`defined to perform permutations.
`
`64 Claims, 27 Drawing Sheets
`
`n
`
`lla
`llb
`
`13
`15
`
`Register
`
`llc File
`
`12
`
`Permutation
`Functional
`Unit
`14
`
`16
`
`ALU
`
`17
`
`20
`
`Shifter
`
`18
`
`21
`
`Oracle-1001 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`US 7,174,014 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`5,734,721 A *
`5,768,493 A
`5,956,405 A
`6,009,505 A
`6,072,873 A
`6,081,896 A
`6,119,224 A
`6,195,026 Bl
`6,275,587 Bl
`
`3/ 1998 Clark ... ... ... .. ... ... ... ... .. . 380/46
`6/1998 Kumar ....................... 395/181
`9/ 1999 Yuval ... ... ... .. ... ... ... ... .. . 380/29
`12/1999 Thayer et al.
`................. 712/6
`6/2000 Bewick ...................... 380/217
`6/2000 Johns-Vano et al. ........ 713/200
`9/2000 Roth .......................... 712/300
`2/2001 Acharya .. .. ... ... ... ... ... .. . 341/60
`8/2001 Amerige ..................... 380/255
`
`4/2002 Lee ............................ 721/223
`6,381,690 Bl
`6,865,272 B2 * 3/2005 Cole ........................... 380/28
`
`OTHER PUBLICATIONS
`
`Advanced Encryption Standard, "Announcing Request for Candi(cid:173)
`date Algorithm Nominations for the Advanced Encryption Standard
`( AES), "http://csrc.nist.gov/ encryption/aes/pre-roundl/aes_9709.
`htm.
`
`* cited by examiner
`
`Oracle-1001 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 1 of 27
`
`US 7,174,014 B2
`
`n
`
`lla
`
`llb
`
`13
`15
`
`Register
`
`l lc File
`
`12
`
`Permutation
`Functional
`Unit
`14
`
`16
`
`20
`
`Fig.1
`
`Shifter
`
`18
`
`21
`
`Oracle-1001 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 2 of 27
`
`US 7,174,014 B2
`
`23-
`
`Determine bit positions in source
`
`l
`l
`
`Determine permutation instruction
`
`Perform permutation instruction
`
`24-
`
`25-
`
`26-
`
`Conditionally repeat 23 - 25
`
`,,
`27
`
`(
`
`]
`
`Fig. 2
`
`Oracle-1001 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 3 of 27
`
`US 7,174,014 B2
`
`2
`
`14
`
`22
`
`8
`
`32
`
`37
`
`44
`
`51
`
`R2
`
`Rl
`
`30--.__
`
`64-8 Crossbar
`
`__.-- 32
`
`R3
`
`0
`
`0
`
`0
`
`PPERM,1 Rl,R2,R3
`
`;R2 = 0x020E160820252C33
`
`Fig. 3A
`
`Oracle-1001 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 4 of 27
`
`US 7,174,014 B2
`
`R21
`
`Rll
`
`R22
`
`R12
`
`R23
`
`R13
`
`R24
`
`R14
`
`R25
`
`R15
`
`R21=0x02808D8080808D80
`1
`1
`1
`1
`1
`1
`0
`2
`0
`0
`0
`0
`0
`0
`
`1
`0
`
`PPERM
`FU
`
`Rl
`
`0
`
`0
`
`1
`0
`
`R2
`
`0
`
`1
`0
`
`R3
`
`0
`
`1
`0
`
`PPERM
`FU
`
`0
`
`0
`
`PPERM
`FU
`
`R4
`
`0
`
`0
`
`0
`51
`
`PPERM
`FU
`
`1 bit: otherreg
`
`RS
`
`0
`
`0
`
`OR
`
`~ - po•lion ;n
`ource register
`1
`
`rn
`
`/
`\
`8 bits value. 7 bits used.
`unused bit is 0
`
`R1
`
`0
`
`0 D 0
`
`0
`
`Fig. 3B
`
`Oracle-1001 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 5 of 27
`
`US 7,174,014 B2
`
`100
`
`n
`
`113
`
`llla
`
`11 lb
`Register
`
`11 lc File
`
`112
`
`······~······· ................... ······=·
`.
`.
`
`Permutation
`Functional
`Unit
`114
`
`ALU
`
`117
`
`116
`
`120
`
`Shifter
`
`118
`
`121
`
`Fig. 4A
`
`Oracle-1001 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 6 of 27
`
`US 7,174,014 B2
`
`33
`
`8 6-bit control
`
`64 bits to be permuted
`
`-30
`
`X
`
`3(cid:157)
`8
`decoder
`
`Sell ... Sel7
`
`\
`
`8 permuted bits
`y
`
`_,1
`
`G
`G=O for PPERM~-s--,.'!-..-i---,.t--'l"'l-""""l"'l-,,.-""'l'"i'-..-----~"'1-1---,.._"l"'i-__,.t--,.;--,-,~~
`G=uochanged
`bit value in R3
`for PPERM3R SelO
`
`SeJ7
`
`\_______ bit O-bit7 (byte 0)
`
`bit 56 -
`~
`64 bits result
`
`bit 63 (byte 7) ___J
`
`34
`
`Fig. 4B
`
`Oracle-1001 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb. 6, 2007
`
`Sheet 7 of 27
`
`US 7,174,014 B2
`
`0
`
`7
`
`Control Bits R2
`
`Data
`
`R1
`
`I 1101011I1Joi1 1 01
`j a I b I c i d I e j f I g I h I
`
`Result
`
`R3
`
`Fig. 5
`
`Oracle-1001 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 8 of 27
`
`US 7,174,014 B2
`
`40
`
`41-
`
`Determine the final arrangement
`
`-42
`
`43-
`
`44-
`
`45-
`
`46-
`
`Combine groups of MISes
`
`Sort the merged groups
`
`ement
`
`Determine control bits for a permutation
`instruction
`
`No
`
`arrangement is the initial
`arrangement
`
`-47
`
`48-
`
`Determine GRP instruction sequence
`
`Yes
`
`Fig. 6
`
`Oracle-1001 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 9 of 27
`
`US 7,174,014 B2
`
`50
`
`52-
`
`p
`
`Iteration 1
`(5, 0, 1, 2, 4, 3, 7, 6)
`
`51
`
`/
`Iteration 2
`(3, 5, 7, 0, 1, 2, 4, 6)
`
`53- MISes in P
`
`(5)(0, 1, 2, 4)(31 7)(6)
`
`(3, 5, 7), (Oi l, 2, 4, 6)
`
`54-
`
`55-
`
`56-
`
`57-
`
`Combining
`MISes
`Sorting
`
`New
`Arrangement
`Control bits
`forGRP
`instruction
`
`(5, 3, 7),(0, 1, 2, 4, 6)
`
`(3,5,7,0, 1,2,4,6)
`
`(3, 5, 7), (0, 1, 2, 4, 6)
`
`(0, 1, 2, 3, 4, 5, 6, 7)
`
`(3, 5, 7, 0, 1, 2, 4, 6)
`
`(0, 1,2,3,4,5,6,7)
`
`(1,0, 1,0,o,o,o, 1)
`
`(1, 1, 1, 0, 1, 0, 1, 0)
`
`Fig. 7
`
`Oracle-1001 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 10 of 27
`
`US 7,174,014 B2
`
`60
`
`61a
`
`63c
`
`62x
`
`62y
`
`A)
`
`Fig. 8A
`
`63c'
`
`61a
`
`63c
`
`62x
`
`62y
`
`C)
`
`61a
`
`6 C
`
`62x ---+----'
`
`B)
`
`62y
`
`Fig. SB
`
`61a
`
`62x ---.--
`
`D)
`
`62y
`
`Fig. SC
`
`Fig. 8D
`
`Oracle-1001 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 11 of 27
`
`US 7,174,014 B2
`
`D
`unit 60
`
`Row0
`
`Row1
`
`Row4
`
`Rows
`
`Row7
`
`Rows
`
`Fig. 9A
`
`Oracle-1001 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 12 of 27
`
`US 7,174,014 B2
`
`71
`
`70
`
`(cid:143) unit 60
`
`72 -----
`
`Fig. 9B
`
`Oracle-1001 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 13 of 27
`
`US 7,174,014 B2
`
`bits to be permuted
`control bits
`
`output
`
`Fig. 10
`
`Oracle-1001 p. 15
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 14 of 27
`
`US 7,174,014 B2
`
`80
`
`nJ2 bits
`
`81
`
`nl2 bits
`
`82
`
`83
`
`n bits
`
`D z bits: bits whose control bits are 0
`flt! bits whose control bits are 1
`ml invalid values
`
`Fig. 11
`
`Oracle-1001 p. 16
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 15 of 27
`
`US 7,174,014 B2
`
`85
`
`861
`
`870
`
`Fig. 12
`
`Oracle-1001 p. 17
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 16 of 27
`
`US 7,174,014 B2
`
`90
`
`00
`
`Ol
`
`02
`
`03
`
`04
`
`05
`
`06
`
`07
`
`Fig.13
`
`Oracle-1001 p. 18
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 17 of 27
`
`US 7,174,014 B2
`
`92
`
`0
`
`0
`
`0
`
`0
`
`T4
`
`T3
`
`T2
`
`Tl
`
`TO
`
`0
`
`so
`
`Sl
`
`S2
`
`S3
`
`S4
`
`U8
`
`U7
`
`U6
`
`us
`
`U4
`
`U3
`
`U2
`
`Ul
`
`uo
`
`Fig. 14
`
`Oracle-1001 p. 19
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 18 of 27
`
`US 7,174,014 B2
`
`94
`
`IO
`
`II
`
`12
`
`13
`
`JO
`
`so--
`
`SI--
`
`S2 - - -1
`
`S3--
`
`S4--
`
`00
`
`01
`
`02
`
`03
`
`Fig. 15
`
`Oracle-1001 p. 20
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 19 of 27
`
`US 7,174,014 B2
`
`64 bits input
`
`64 bits foput
`
`2 bit (cid:157)
`
`4 bits
`
`4 bit (cid:157)
`
`8 bits
`
`:
`:
`:
`:
`... ~ .. ········: .. ···· ...... , ................ , .... ~ ... ·····-:-··········· , .... ···················~ ....................................... , .. :- ··························
`.
`.
`.
`..
`.
`.
`..
`.
`
`16 bit (cid:157)
`
`32 bits
`
`........... ··-,1.- ...... ............................ l········· .. · .. · ........... ................ ;·· .. ··········· ..................... ; .............................. .
`32 bit (cid:157)
`
`64 bits
`
`............................... ·•··t·· .. •··········•····· .. ···· ... · ........................................ ,. ............ .......................... ··············•·····"
`
`output
`
`Fig. 16
`
`Oracle-1001 p. 21
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 20 of 27
`
`US 7,174,014 B2
`
`96
`
`64 bits control bit
`
`1 :encode number of 1 's in groups of 2 bits
`
`2:encode number of 1 'sin groups of 4 bits
`
`••••••••••• ................................. ·········--· ............ ·••,,=••··· ·········-········· •••
`
`. .
`.
`.
`.
`.
`
`.
`.
`.
`.
`
`············· .......... ; ..................................... t .............................. .
`6:encode number of 1 'sin groups of 64 bits
`
`5:encode number of 1 'sin groups of 32 bits
`
`Fig. 17
`
`Oracle-1001 p. 22
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 21 of 27
`
`US 7,174,014 B2
`
`200-
`
`201-
`
`202-
`
`Divide the bits of each register into two
`groups each going to separate
`destination registers using two GRP
`instructions
`
`Put bits of each register into the
`respective group in the destination
`re ister
`
`Perform GRP instructions to perform
`n-permutations on bits in each
`destination re ister
`
`Fig. 18A
`
`Oracle-1001 p. 23
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 22 of 27
`
`US 7,174,014 B2
`
`R1
`
`R1
`
`l
`r .J,
`
`205a
`
`R3
`
`I l
`
`R2
`
`205b
`
`207a
`
`207b
`
`GRP
`
`I
`
`SHIFT PAIR
`
`Fig. 18B
`
`Oracle-1001 p. 24
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 23 of 27
`
`US 7,174,014 B2
`
`3.03
`
`~Table
`Lookup
`(cid:143) PPERM3R
`
`3.5
`
`3
`
`2.5
`
`2
`
`1.5
`
`1
`
`0.5
`
`0
`
`Encryption
`
`Key generation
`
`Fig. 19
`
`Oracle-1001 p. 25
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 24 of 27
`
`US 7,174,014 B2
`
`110
`
`IO
`
`II
`
`12
`
`I3
`
`14
`
`15
`
`16
`
`17
`
`0
`
`00
`
`01
`
`02
`
`03
`
`04
`
`05
`
`06
`
`07
`
`Fig. 20
`
`Oracle-1001 p. 26
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 25 of 27
`
`US 7,174,014 B2
`
`114
`
`Control bits
`
`Inverted Control bits
`
`115
`/
`
`IO
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`17
`
`16
`
`15
`
`14
`
`13
`
`12
`
`11
`
`IO
`
`~ ~ n
`
`o
`
`o
`
`o
`
`o
`
`o
`
`04 03 02 01 00
`
`0
`
`0
`
`0
`
`Z0
`
`Zl
`
`Z2
`
`00 01 02
`
`03
`
`04
`
`113
`
`Fig. 21
`
`Oracle-1001 p. 27
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 26 of 27
`
`US 7,174,014 B2
`
`64 bits input
`
`64 bits input in reverse order
`
`1:1 bit (cid:157)
`
`2 bits
`
`2:2 bit (cid:157)
`
`4 bits
`
`.
`.
`.
`.
`.
`.
`.
`.
`'
`..
`..
`..
`'"" '°"" ""•}•• • "••••• "• ••" ••• •" ••" ,,.,,, • ••••-=•••• •• ••• •••• • ,o • • • •n ••• ' " •••• t••• •• • " •••• • ''" ••••" •• .,,,,., "''" :, ••••" ••••• ••r•••" •••••
`:
`•
`:
`•
`(cid:127)
`(cid:127)
`:
`:
`
`•
`
`3:4 bit (cid:157)
`
`8 bits
`
`···············l················· ··········· ·--···;l··················· ........................ l ······· .. ··························f············· ··· ············
`6:32 bit (cid:157)
`
`64 bits
`
`5:16 bit (cid:157)
`
`32 bits
`
`64 OR gates
`t
`output
`
`I
`
`Fig. 22
`
`Oracle-1001 p. 28
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`U.S. Patent
`
`Feb.6,2007
`
`Sheet 27 of 27
`
`US 7,174,014 B2
`
`64 bits control bit
`
`1 :encode number of 1 's in groups of 2 bits
`
`2:encode number of l's in groups of 4 bits
`
`~----~I ._I _____ __.I 5:encode nwnber of 1 's in groups of 32 bits
`
`Fig. 23
`
`Oracle-1001 p. 29
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`US 7,174,014 B2
`
`1
`METHOD AND SYSTEM FOR PERFORMING
`PERMUTATIONS WITH BIT PERMUTATION
`INSTRUCTIONS
`
`BACKGROUND OF THE INVENTION
`
`2
`example, the permutations in DES, as described in Bruce
`Schneier, "Applied Cryptography", 2nd Ed., John Wiley &
`Sons, Inc., 1996.
`Conventional techniques have also used table lookup
`5 methods to implement fixed permutations. To achieve a
`fixed permutation of n input bits with one table lookup, a
`table with r entries is used with each entry being n bits. For
`a 64-bit permutation, this type of table lookup would use 2 67
`bytes, which is clearly infeasible. Alternatively, the table can
`be broken up into smaller tables, and several table lookup
`operations could be used. For example, a 64-bit permutation
`could be implemented by permuting 8 consecutive bits at a
`time, then combining these 8 intermediate permutations into
`15 a final permutation. This method requires 8 tables, each with
`256 entries, each entry being 64 bits. Each entry has zeros
`in all positions, except the 8 bit positions to which the
`selected 8 bits in the source are permuted. After the eight
`table lookups done by 8 LOAD instructions, the results are
`20 combined with 7 OR instructions to get the final permuta(cid:173)
`tion. In addition, 8 instructions are needed to extract the
`index for the LOAD instruction, for a total of 23 instruc(cid:173)
`tions. The memory requirement is 8*256*8=16 kilobytes for
`eight tables. Although 23 instructions is less than the 128 or
`25 256 instructions used in the previous method, the actual
`execution time can be much longer due to cache miss
`penalties or memory access latencies. For example, if half of
`the 8 Load instructions miss in the cache, and each cache
`miss takes 50 cycles to fetch the missing cache line from
`30 main memory, the actual execution time is more than
`4*50=200 cycles. Accordingly, this method can be longer
`than the previously described 128 cycles using EXTRACT
`and DEPOSIT. This method also has the drawback of a
`memory requirement of 16 kilobytes for the tables.
`Permutations are a requirement for fast processing of
`digital multimedia information, using subword-parallel
`instructions, more commonly kuown as multimedia instruc(cid:173)
`tions, as described in Ruby Lee, "Accelerating Multimedia
`with Enhanced Micro-processors", IEEE Micro, Vol. 15, No.
`2, pp.22-32, April 1995, and Ruby Lee, "Subword Paral(cid:173)
`lelism in MAX-2 ", IEEE Micro, Vol. 16, No. 4, pp.51-59,
`August 1996. The MAX-2 general-purpose PERMUTE
`instructions can do any permutation, with and without
`repetitions, of the subwords packed in a 64-bit register.
`45 However, it is only defined for 16-bit subwords. MIX and
`MUX instructions have been implemented in the IA-64
`architectures, which are extensions to the MIX and PER(cid:173)
`MUTE instructions of MAX-2, see Intel Corporation, "IA-
`64 Application Developer's Architecture Guide", Intel Cor-
`50 poration, May, 1999. The IA-64 uses MUX instruction,
`which is a fully general permute instruction for 16-bit
`subwords, with five new permute byte variants. A VPERM
`instruction has been used in an AltiVec extension to the
`Power PC™ available from IBM Corporation, Armonk,
`N.Y., see Motorola Corporation, "'AltiVec Extensions to
`PowerPC'
`Instruction Set Architecture Specification",
`Motorola Corporation, May 1998. The Altivec VPERM
`instruction extends the general permutation capabilities of
`MAX-2's PERMUTE instruction to 8-bit subwords selected
`from two 128-bit source registers, into a single 128-bit
`destination register. Since there are 32 such subwords from
`which 16 are selected, this requires 16*lg32=80 bits for
`specifying the desired permutation. This means that VPERM
`has to use another 128-bit register to hold the permutation
`control bits, making it a very expensive instruction with
`three source registers and one destination register, all 128
`bits wide.
`
`10
`
`1. Field of the Invention
`The present invention relates to a method and system for
`performing permutations of a sequence of bits in a program(cid:173)
`mable processor.
`2. Description of the Related Art
`The need for secure information processing has increased
`with the increasing use of the public internet and wireless
`communications in e-commerce, e-business and personal
`use. Typical use of the internet is not secure. Secure infor(cid:173)
`mation processing typically includes authentication of users
`and host machines, confidentiality of messages sent over
`public networks, and assurances that messages, programs
`and data have not been maliciously changed. Conventional
`solutions have provided security functions by using different
`security protocols employing different cryptographic algo(cid:173)
`rithms, such as public key, symmetric key and hash algo(cid:173)
`rithms.
`For encrypting large amounts of data symmetric key
`cryptography algorithms have been used, see Bruce
`Schneier, "Applied Cryptography", 2nd Ed., John Wiley &
`Sons, Inc., 1996. These algorithms use the same secret key
`to encrypt and decrypt a given message, and encryption and
`decryption have the same computational complexity. In
`symmetric key algorithms, the cryptographic techniques of
`"confusion" and "diffusion" are synergistically employed.
`"Confusion" obscures the relationship between the plaintext
`( original message) and the ciphertext ( encrypted message),
`for example, through substitution of arbitrary bits for bits in 35
`the plaintext. "Diffusion" spread the redundancy of the
`plaintext over the ciphertext, for example through permuta(cid:173)
`tion of the bits of the plaintext block. Such bit-level permu(cid:173)
`tations have the drawback of being slow when implemented
`with conventional instructions available in microprocessors 40
`and other programmable processors.
`Bit-level permutations are particularly difficult for pro(cid:173)
`cessors, and have been avoided in the design of new
`cryptography algorithms, where it is desired to have fast
`software implementations, for example in the Advanced
`Encryption Standard, as described in NIST, "Announcing
`Request for Candidate Algorithm Nominations for the
`Advanced Encryption Standard (AES)". Since conventional
`microprocessors are word-oriented, performing bit-level
`permutations is difficult and tedious. Every bit has to be
`extracted from the source register, moved to its new location
`in the destination register, and combined with the bits that
`have already been moved. This requires 4 instructions per bit
`(mask generation, AND, SHIFT, OR), and 4n instructions to
`perform an arbitrary permutation of n bits. Conventional 55
`microprocessors, for example Precision Architecture (PA(cid:173)
`RISC) have been described to provide more powerful bit(cid:173)
`manipulation capabilities using EXTRACT and DEPOSIT
`instructions, which can essentially perform the four opera(cid:173)
`tions required for each bit in 2 instructions (EXTRACT, 60
`DEPOSIT), resulting in 2n instructions for any arbitrary
`permutation of n bits, see Ruby Lee, "Precision Architec(cid:173)
`ture", IEEE Computer, Vol. 22, No. 1, pp. 78-91, January
`1989. Accordingly, an arbitrary 64-bit permutation could
`take 128 or 256 instructions on this type of conventional 65
`microprocessor. Pre-defined permutations with some regular
`patterns have been implemented in fewer instructions, for
`
`Oracle-1001 p. 30
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`US 7,174,014 B2
`
`4
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`3
`It is desirable to provide significantly faster and more
`economical ways to perform arbitrary permutations of n bits,
`without any need for table storage, which can be used for
`encrypting large amounts of data for confidentiality or
`privacy.
`
`SUMMARY OF THE INVENTION
`
`The present invention provides permutation instructions
`which can be used in software executed in a programmable 10
`processor for solving permutation problems in cryptography,
`multimedia and other applications. For fast cryptography,
`bit-level permutations are used, whereas for multimedia,
`permutations on subwords of typically 8 bits or 16 bits are
`used. Permutation instructions of the present invention can 15
`be used to provide any arbitrary permutation of sixty-four
`1-bit subwords in a 64-bit processor, i.e., a processor with
`64-bit words, registers and datapaths, for use in fast cryp(cid:173)
`tography. The permutation instructions of the present inven(cid:173)
`tion can also be used for permuting subwords greater than 1 20
`bit in size, for use in fast multimedia processing. For
`example, in addition to being able to permute sixty-four
`1-bit subwords in a register, the permutation instructions and
`underlying functional unit can permute thirty-two 2-bit
`subwords, sixteen 4-bit subwords, eight 8-bit subwords, four 25
`16-bit subwords, or two 32-bit subwords. The permutation
`instructions of the present invention can be added as new
`instructions to the Instruction Set Architecture of a conven-
`tional microprocessor, or they can be used in the design of
`new processors or coprocessors to be efficient for both 30
`cryptography and multimedia software.
`A PPERM instruction is defined to perform permutations
`by a sequence of instructions with each sequence specifying
`the position in the source for each bit in the destination. In
`the PPERM instruction bits in the destination register that 35
`change are updated and bits in the destination register that do
`not change are set to zero. Alternatively, a PPERM3R
`instruction
`is defined
`to perform permutations. The
`PPERM3R instruction is similar to the PPERM instruction
`except that the bits from the destination register which do
`not change are copied unchanged, rather than set to zero as
`in PPERM. Accordingly, the PPERM3R instruction uses
`three source registers because the destination register is also
`a source register since the unchanged bits are held in the
`destination register. For every one of n bits to be changed in
`the final permutation, lgn bits can be used in the PPERM
`instruction or the PPERM3R instruction to specify which bit
`in the source register should replace the bit to be changed in
`the destination register.
`In an alternate embodiment, a GRP instruction is defined
`to perform permutations. The GRP instruction divides the
`initial sequence in the source register into two groups
`depending on configuration bits. The first group is concat(cid:173)
`enated with the second group to form the result of one GRP
`instruction, which is also an intermediate bit sequence
`toward the final permutation. The total number of GRP
`instructions for a permutation of n bits is up to lgn.
`In an embodiment of the present invention, multibit
`subwords are permuted with the GRP instruction. In a
`further embodiment of the invention, the method and system
`are scaled for performing permutations of 2n bits in which
`subwords are packed into two or more registers. In this
`embodiment, at most 21gn+4 instructions are used to per(cid:173)
`mute 2n bits using n-bit words.
`For a better understanding of the present invention, ref(cid:173)
`erence may be made to the accompanying drawings.
`
`FIG. 1 is a schematic diagram of a system for implement(cid:173)
`ing permutation instructions in accordance with an embodi-
`5 ment of the present invention.
`FIG. 2 is a flow diagram of a method for determining a
`permutation instruction sequence to achieve a desired per(cid:173)
`mutation in accordance with an embodiment of the present
`invention.
`FIG. 3A is a schematic diagram of operation of a PPERM
`instruction.
`FIG. 3B is a schematic diagram of an example extracting
`8 bits from multiple registers with PPERM instructions with
`"otherreg" bit.
`FIG. 4A illustrates a schematic diagram of an alternate
`system with three source registers.
`FIG. 4B is a schematic diagram of a circuit for imple(cid:173)
`menting a 64-bit PPERM instruction.
`FIG. 5 is a schematic diagram of an operation of a GRP
`instruction.
`FIG. 6 is a schematic diagram of a method for determin(cid:173)
`ing a sequence of GRP instruction and control bits for the
`GRP instructions.
`FIG. 7 is an example of determining a GRP instruction
`sequence for an 8-bit permutation.
`FIG. SA is a schematic diagram of a unit for serial
`implementation of a GRP operation with one control signal.
`FIG. SB is a circuit diagram for a unit for a GRP operation
`with one control signal.
`FIG. SC is a schematic diagram of an alternate unit for
`serial implementation of a GRP operation with two control
`signals.
`FIG. SD is a circuit diagram for a unit for a GRP operation
`with two control signals.
`FIG. 9A is an example of a circuit for serial implemen(cid:173)
`tation of a first step of a GRP operation using half of a GRP
`function unit.
`FIG. 9B is an example of a whole 8-bit GRP function unit.
`FIG. 10 is a circuit diagram of a serial implementation of
`40 a GRP operation.
`FIG. 11 is a schematic diagram of principle for a parallel
`scheme for a GRP operation.
`FIG. 12 is a schematic diagram of a unit for parallel
`implementation of a GRP operation.
`FIG. 13 is a circuit diagram for extracting z bits in an 8-bit
`group by combining z bits in two 4-bit groups.
`FIG. 14 is a circuit diagram of addition of two one-hot
`encoded numbers.
`FIG. 15 is a circuit diagram of a circuit to generate a final
`result of an n bit GRP operation, wherein n=4.
`FIG. 16 is a schematic diagram of a parallel GRP imple(cid:173)
`mentation.
`FIG. 17 is a schematic diagram of a module for generating
`55 select signals.
`FIG. lSA is a flow diagram of a method for 2n-bit
`permutation in accordance with an embodiment of the
`present invention.
`FIG. lSB is a schematic diagram of the method shown in
`60 lSA.
`FIG. 19 is a graph of the number of instructions for
`encryption and key generation in DES.
`FIG. 20 is a circuit diagram for extracting z bits in an 8-bit
`group by combining z bits in two 4-bit groups and making
`65 non-z bits zero.
`FIG. 21 is a schematic diagram of an embodiment of a
`parallel implementation of 8-bit GRP operation.
`
`50
`
`45
`
`Oracle-1001 p. 31
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`US 7,174,014 B2
`
`5
`FIG. 22 is a schematic diagram of an embodiment of a
`parallel implementation of 64-bit GRP operation.
`FIG. 23 is a schematic diagram of an embodiment of a
`module for generating select signals.
`
`6
`desired permutation. The source positions for k bits can be
`specified with one instruction. PPERM instructions can be
`defined as follows:
`
`PPERM,x Rl,R2,R3
`
`DETAILED DESCRIPTION
`
`Reference will now be made in greater detail to a pre(cid:173)
`ferred embodiment of the invention, an example of which is
`illustrated in the accompanying drawings. Wherever pos(cid:173)
`sible, the same reference numerals will be used throughout
`the drawings and the description to refer to the same or like
`parts.
`FIG. 1 illustrates a schematic diagram of a system 10 for
`implementing efficient permutation instructions in accor(cid:173)
`dance with the teachings of the present invention. Register
`file 12 includes source register lla, source register llb and
`destination register llc. System 10 can provide bit-level
`permutations of all n bits of any register in register file 12.
`The same solution can be applied to different subword sizes
`of 2' bits, for i=0, 1, 2, ... , m, where n=2m bits. For a fixed
`word size of n bits, and 1-bit subwords, there are n subwords
`to be permuted. Source register values to be permuted 13
`from register lla and configuration bits 15 from source
`register llb are applied over data paths to permutation
`functional unit 14. Permutation function unit 14 generates
`permutation result 16. Permutation result 16 can be an
`intermediate result if additional permutations are performed 30
`by permutation functional unit 14. For other instructions,
`arithmetic logic unit (ALU) 17 and shifter 18 receive source
`register values 13 from source register lla and source
`register values 15 from source register llb and generate
`respective ALU result 20 and shifter result 21 over a data 35
`path to destination register llc. System 10 can be imple(cid:173)
`mented in any programmable processor, for example, a
`conventional microprocessor, digital
`signal processor
`(DSP), cryptographic processor, multimedia processor,
`media processor, programmable system-on-a-chip (SOC), 40
`and can be used in developing processors or coprocessors
`for providing cryptography, multimedia and other opera(cid:173)
`tions.
`FIG. 2 is a flow diagram of a method of determining
`permutation instruction sequences for permutations 22. The
`determined permutation instruction sequences can be
`executed in permutation functional unit 14. In block 23, bit
`positions in a source sequence of bits are defined for a group
`of bits in a destination register. In block 24, a permutation
`instruction is determined with the bit positions to assemble
`bits from the source sequence of bits. In block 25, the
`permutation instruction is performed. The assembled bits are
`inserted into a destination register as determined by the bit
`positions. Blocks 23-25 can be conditionally repeated for
`every non-overlapping group of bits in the destination, in
`block 26. After the final permutation is determined, the
`desired permutation of the source register is determined in
`block 27. Alternatively, all block 23 can be performed for all
`bits in the destination register and block 24 is performed for
`all bit positions determined in block 23. Thereafter when the 60
`permutation needs to be performed, block 25 is performed.
`All permutation instructions can be performed in block 25 in
`parallel.
`A PPERM instruction can be used as the permutation
`instruction described above for dynamically specified per- 65
`mutations of n subwords. Each PPERM instruction defines
`a subset of bits which subsets can be combined to define the
`
`wherein Rl and R2 are the source registers and R3 is a
`destination register. Rl contains the bits to be permuted. R2
`contains configuration bits. x specifies which k bits in R3
`will change. In R3, only k bits specified by x are updated, the
`10 other bits are set to zero. k lgn bits in R2 can be used to
`specify where to extract the k consecutive bits to be changed
`in R3.
`FIG. 3A illustrates a schematic diagram of an example
`operation of the PPERM instruction. The PPERM instruc-
`is PPERM, 1 Rl, R2, R3 wherein R2
`15 tion
`is
`0x020E160820252C33 in hexadecimal notation. This is the
`same as the decimal values shown in R2 a