throbber
I 1111111111111111 11111 1111111111 111111111111111 1111111111111111 IIII IIII IIII
`US007174014B2
`
`c12) United States Patent
`Lee et al.
`
`(IO) Patent No.:
`(45) Date of Patent:
`
`US 7,174,014 B2
`Feb.6,2007
`
`(54) METHOD AND SYSTEM FOR PERFORMING
`PERMUTATIONS WITH BIT PERMUTATION
`INSTRUCTIONS
`
`(75)
`
`Inventors: Ruby B. Lee, Princeton, NJ (US);
`Zhijie Shi, Princeton, NJ (US)
`
`(73) Assignee: Teleputers, LLC, Princeton, NJ (US)
`
`5,483,541 A
`5,524,256 A *
`5,546,393 A
`5,623,548 A
`5,673,321 A
`
`1/1996 Linsky ....................... 371/2.1
`6/1996 Turkowski .................. 712/300
`8/1996 Mine ......................... 370/60.1
`4/1997 Akiyama et al.
`............. 380/28
`9/1997 Lee ............................. 380/37
`
`(Continued)
`
`OTHER PUBLICATIONS
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 881 days.
`
`Zhijie Shi, Ruby B. Lee: "Bit Permutation Instructions for Accel(cid:173)
`erating Software Cryptography", in Proc. IEEE Intl. Conf. Appli(cid:173)
`cation-Specific Systems, Architectures and Processors, pp. 138-148,
`Jul. 2000.*
`
`(21) Appl. No.: 09/850,239
`
`(22)
`
`Filed:
`
`May 7, 2001
`
`(65)
`
`(51)
`
`(52)
`
`(58)
`
`Prior Publication Data
`
`US 2002/0078011 Al
`
`Jun. 20, 2002
`
`Int. Cl.
`H04K 1100
`(2006.01)
`U.S. Cl. .......................... 380/28; 380/44; 380/265;
`377/54; 377/60; 377/67; 377/81; 340/825.68;
`365/78; 711/109; 712/1; 712/10; 712/24;
`712/223
`Field of Classification Search .................. 380/44,
`380/265, 28; 377/54, 60, 75, 67, 81; 711/109;
`340/825.68; 365/73, 78; 712/1, 24, 10,
`712/223
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,796,830 A
`3,962,539 A
`4,275,265 A
`4,937,574 A
`4,972,481 A
`5,001,753 A
`5,297,207 A
`5,442,705 A
`
`3/1974 Smith .......................... 178/22
`6/1976 Ehrsam et al. ................ 178/22
`.. ... ... ... 178/27 .09
`6/ 1981 Davida et al.
`6/1990 Wright ....................... 341/106
`ll/ 1990 Santesson .. ... ... ... ... ... .. . 380/49
`3/1991 Davio et al ................... 380/29
`3/ 1994 Degele . ... .. ... ... ... ... ... .. . 380/46
`8/1995 Miyano ....................... 380/29
`
`(Continued)
`
`Primary Examiner-Emmanuel L. Moise
`Assistant Examiner-Paul Callahan
`(74)Attorney, Agent, or Firm-Mathews, Shepherd, McKay
`& Bruneau, P.A.
`
`(57)
`
`ABSTRACT
`
`The present invention provides permutation instructions
`usable in a programmable processor for solving permutation
`problems in cryptography, multimedia and other applica(cid:173)
`tions. PPERM and PPERM3R instructions are defined to
`perform permutations by a sequence of instructions with
`each sequence specifying the position in the source for each
`bit in the destination. In the PPERM instruction bits in the
`destination register that change are updated and bits in the
`destination register that do not change are set to zero. In the
`PPERM3R instruction bits in the destination register that
`change are updated and bits in the destination register that do
`not change are copied from intermediate result of previous
`PPERM3R instructions. Both PPERM and PPERM3R
`instructions can individually do permutation with bit rep(cid:173)
`etition. Both PPERM and PPERM3R instructions can indi(cid:173)
`vidually do permutation of bits stored in more than one
`register. In an alternate embodiment, a GRP instruction is
`defined to perform permutations.
`
`64 Claims, 27 Drawing Sheets
`
`n
`
`lla
`llb
`
`13
`15
`
`Register
`
`llc File
`
`12
`
`Permutation
`Functional
`Unit
`14
`
`16
`
`ALU
`
`17
`
`20
`
`Shifter
`
`18
`
`21
`
`Oracle-1001 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 7,174,014 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`5,734,721 A *
`5,768,493 A
`5,956,405 A
`6,009,505 A
`6,072,873 A
`6,081,896 A
`6,119,224 A
`6,195,026 Bl
`6,275,587 Bl
`
`3/ 1998 Clark ... ... ... .. ... ... ... ... .. . 380/46
`6/1998 Kumar ....................... 395/181
`9/ 1999 Yuval ... ... ... .. ... ... ... ... .. . 380/29
`12/1999 Thayer et al.
`................. 712/6
`6/2000 Bewick ...................... 380/217
`6/2000 Johns-Vano et al. ........ 713/200
`9/2000 Roth .......................... 712/300
`2/2001 Acharya .. .. ... ... ... ... ... .. . 341/60
`8/2001 Amerige ..................... 380/255
`
`4/2002 Lee ............................ 721/223
`6,381,690 Bl
`6,865,272 B2 * 3/2005 Cole ........................... 380/28
`
`OTHER PUBLICATIONS
`
`Advanced Encryption Standard, "Announcing Request for Candi(cid:173)
`date Algorithm Nominations for the Advanced Encryption Standard
`( AES), "http://csrc.nist.gov/ encryption/aes/pre-roundl/aes_9709.
`htm.
`
`* cited by examiner
`
`Oracle-1001 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 1 of 27
`
`US 7,174,014 B2
`
`n
`
`lla
`
`llb
`
`13
`15
`
`Register
`
`l lc File
`
`12
`
`Permutation
`Functional
`Unit
`14
`
`16
`
`20
`
`Fig.1
`
`Shifter
`
`18
`
`21
`
`Oracle-1001 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 2 of 27
`
`US 7,174,014 B2
`
`23-
`
`Determine bit positions in source
`
`l
`l
`
`Determine permutation instruction
`
`Perform permutation instruction
`
`24-
`
`25-
`
`26-
`
`Conditionally repeat 23 - 25
`
`,,
`27
`
`(
`
`]
`
`Fig. 2
`
`Oracle-1001 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 3 of 27
`
`US 7,174,014 B2
`
`2
`
`14
`
`22
`
`8
`
`32
`
`37
`
`44
`
`51
`
`R2
`
`Rl
`
`30--.__
`
`64-8 Crossbar
`
`__.-- 32
`
`R3
`
`0
`
`0
`
`0
`
`PPERM,1 Rl,R2,R3
`
`;R2 = 0x020E160820252C33
`
`Fig. 3A
`
`Oracle-1001 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 4 of 27
`
`US 7,174,014 B2
`
`R21
`
`Rll
`
`R22
`
`R12
`
`R23
`
`R13
`
`R24
`
`R14
`
`R25
`
`R15
`
`R21=0x02808D8080808D80
`1
`1
`1
`1
`1
`1
`0
`2
`0
`0
`0
`0
`0
`0
`
`1
`0
`
`PPERM
`FU
`
`Rl
`
`0
`
`0
`
`1
`0
`
`R2
`
`0
`
`1
`0
`
`R3
`
`0
`
`1
`0
`
`PPERM
`FU
`
`0
`
`0
`
`PPERM
`FU
`
`R4
`
`0
`
`0
`
`0
`51
`
`PPERM
`FU
`
`1 bit: otherreg
`
`RS
`
`0
`
`0
`
`OR
`
`~ - po•lion ;n
`ource register
`1
`
`rn
`
`/
`\
`8 bits value. 7 bits used.
`unused bit is 0
`
`R1
`
`0
`
`0 D 0
`
`0
`
`Fig. 3B
`
`Oracle-1001 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 5 of 27
`
`US 7,174,014 B2
`
`100
`
`n
`
`113
`
`llla
`
`11 lb
`Register
`
`11 lc File
`
`112
`
`······~······· ................... ······=·
`.
`.
`
`Permutation
`Functional
`Unit
`114
`
`ALU
`
`117
`
`116
`
`120
`
`Shifter
`
`118
`
`121
`
`Fig. 4A
`
`Oracle-1001 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 6 of 27
`
`US 7,174,014 B2
`
`33
`
`8 6-bit control
`
`64 bits to be permuted
`
`-30
`
`X
`
`3(cid:157)
`8
`decoder
`
`Sell ... Sel7
`
`\
`
`8 permuted bits
`y
`
`_,1
`
`G
`G=O for PPERM~-s--,.'!-..-i---,.t--'l"'l-""""l"'l-,,.-""'l'"i'-..-----~"'1-1---,.._"l"'i-__,.t--,.;--,-,~~
`G=uochanged
`bit value in R3
`for PPERM3R SelO
`
`SeJ7
`
`\_______ bit O-bit7 (byte 0)
`
`bit 56 -
`~
`64 bits result
`
`bit 63 (byte 7) ___J
`
`34
`
`Fig. 4B
`
`Oracle-1001 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb. 6, 2007
`
`Sheet 7 of 27
`
`US 7,174,014 B2
`
`0
`
`7
`
`Control Bits R2
`
`Data
`
`R1
`
`I 1101011I1Joi1 1 01
`j a I b I c i d I e j f I g I h I
`
`Result
`
`R3
`
`Fig. 5
`
`Oracle-1001 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 8 of 27
`
`US 7,174,014 B2
`
`40
`
`41-
`
`Determine the final arrangement
`
`-42
`
`43-
`
`44-
`
`45-
`
`46-
`
`Combine groups of MISes
`
`Sort the merged groups
`
`ement
`
`Determine control bits for a permutation
`instruction
`
`No
`
`arrangement is the initial
`arrangement
`
`-47
`
`48-
`
`Determine GRP instruction sequence
`
`Yes
`
`Fig. 6
`
`Oracle-1001 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 9 of 27
`
`US 7,174,014 B2
`
`50
`
`52-
`
`p
`
`Iteration 1
`(5, 0, 1, 2, 4, 3, 7, 6)
`
`51
`
`/
`Iteration 2
`(3, 5, 7, 0, 1, 2, 4, 6)
`
`53- MISes in P
`
`(5)(0, 1, 2, 4)(31 7)(6)
`
`(3, 5, 7), (Oi l, 2, 4, 6)
`
`54-
`
`55-
`
`56-
`
`57-
`
`Combining
`MISes
`Sorting
`
`New
`Arrangement
`Control bits
`forGRP
`instruction
`
`(5, 3, 7),(0, 1, 2, 4, 6)
`
`(3,5,7,0, 1,2,4,6)
`
`(3, 5, 7), (0, 1, 2, 4, 6)
`
`(0, 1, 2, 3, 4, 5, 6, 7)
`
`(3, 5, 7, 0, 1, 2, 4, 6)
`
`(0, 1,2,3,4,5,6,7)
`
`(1,0, 1,0,o,o,o, 1)
`
`(1, 1, 1, 0, 1, 0, 1, 0)
`
`Fig. 7
`
`Oracle-1001 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 10 of 27
`
`US 7,174,014 B2
`
`60
`
`61a
`
`63c
`
`62x
`
`62y
`
`A)
`
`Fig. 8A
`
`63c'
`
`61a
`
`63c
`
`62x
`
`62y
`
`C)
`
`61a
`
`6 C
`
`62x ---+----'
`
`B)
`
`62y
`
`Fig. SB
`
`61a
`
`62x ---.--
`
`D)
`
`62y
`
`Fig. SC
`
`Fig. 8D
`
`Oracle-1001 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 11 of 27
`
`US 7,174,014 B2
`
`D
`unit 60
`
`Row0
`
`Row1
`
`Row4
`
`Rows
`
`Row7
`
`Rows
`
`Fig. 9A
`
`Oracle-1001 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 12 of 27
`
`US 7,174,014 B2
`
`71
`
`70
`
`(cid:143) unit 60
`
`72 -----
`
`Fig. 9B
`
`Oracle-1001 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 13 of 27
`
`US 7,174,014 B2
`
`bits to be permuted
`control bits
`
`output
`
`Fig. 10
`
`Oracle-1001 p. 15
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 14 of 27
`
`US 7,174,014 B2
`
`80
`
`nJ2 bits
`
`81
`
`nl2 bits
`
`82
`
`83
`
`n bits
`
`D z bits: bits whose control bits are 0
`flt! bits whose control bits are 1
`ml invalid values
`
`Fig. 11
`
`Oracle-1001 p. 16
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 15 of 27
`
`US 7,174,014 B2
`
`85
`
`861
`
`870
`
`Fig. 12
`
`Oracle-1001 p. 17
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 16 of 27
`
`US 7,174,014 B2
`
`90
`
`00
`
`Ol
`
`02
`
`03
`
`04
`
`05
`
`06
`
`07
`
`Fig.13
`
`Oracle-1001 p. 18
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 17 of 27
`
`US 7,174,014 B2
`
`92
`
`0
`
`0
`
`0
`
`0
`
`T4
`
`T3
`
`T2
`
`Tl
`
`TO
`
`0
`
`so
`
`Sl
`
`S2
`
`S3
`
`S4
`
`U8
`
`U7
`
`U6
`
`us
`
`U4
`
`U3
`
`U2
`
`Ul
`
`uo
`
`Fig. 14
`
`Oracle-1001 p. 19
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 18 of 27
`
`US 7,174,014 B2
`
`94
`
`IO
`
`II
`
`12
`
`13
`
`JO
`
`so--
`
`SI--
`
`S2 - - -1
`
`S3--
`
`S4--
`
`00
`
`01
`
`02
`
`03
`
`Fig. 15
`
`Oracle-1001 p. 20
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 19 of 27
`
`US 7,174,014 B2
`
`64 bits input
`
`64 bits foput
`
`2 bit (cid:157)
`
`4 bits
`
`4 bit (cid:157)
`
`8 bits
`
`:
`:
`:
`:
`... ~ .. ········: .. ···· ...... , ................ , .... ~ ... ·····-:-··········· , .... ···················~ ....................................... , .. :- ··························
`.
`.
`.
`..
`.
`.
`..
`.
`
`16 bit (cid:157)
`
`32 bits
`
`........... ··-,1.- ...... ............................ l········· .. · .. · ........... ................ ;·· .. ··········· ..................... ; .............................. .
`32 bit (cid:157)
`
`64 bits
`
`............................... ·•··t·· .. •··········•····· .. ···· ... · ........................................ ,. ............ .......................... ··············•·····"
`
`output
`
`Fig. 16
`
`Oracle-1001 p. 21
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 20 of 27
`
`US 7,174,014 B2
`
`96
`
`64 bits control bit
`
`1 :encode number of 1 's in groups of 2 bits
`
`2:encode number of 1 'sin groups of 4 bits
`
`••••••••••• ................................. ·········--· ............ ·••,,=••··· ·········-········· •••
`
`. .
`.
`.
`.
`.
`
`.
`.
`.
`.
`
`············· .......... ; ..................................... t .............................. .
`6:encode number of 1 'sin groups of 64 bits
`
`5:encode number of 1 'sin groups of 32 bits
`
`Fig. 17
`
`Oracle-1001 p. 22
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 21 of 27
`
`US 7,174,014 B2
`
`200-
`
`201-
`
`202-
`
`Divide the bits of each register into two
`groups each going to separate
`destination registers using two GRP
`instructions
`
`Put bits of each register into the
`respective group in the destination
`re ister
`
`Perform GRP instructions to perform
`n-permutations on bits in each
`destination re ister
`
`Fig. 18A
`
`Oracle-1001 p. 23
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 22 of 27
`
`US 7,174,014 B2
`
`R1
`
`R1
`
`l
`r .J,
`
`205a
`
`R3
`
`I l
`
`R2
`
`205b
`
`207a
`
`207b
`
`GRP
`
`I
`
`SHIFT PAIR
`
`Fig. 18B
`
`Oracle-1001 p. 24
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 23 of 27
`
`US 7,174,014 B2
`
`3.03
`
`~Table
`Lookup
`(cid:143) PPERM3R
`
`3.5
`
`3
`
`2.5
`
`2
`
`1.5
`
`1
`
`0.5
`
`0
`
`Encryption
`
`Key generation
`
`Fig. 19
`
`Oracle-1001 p. 25
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 24 of 27
`
`US 7,174,014 B2
`
`110
`
`IO
`
`II
`
`12
`
`I3
`
`14
`
`15
`
`16
`
`17
`
`0
`
`00
`
`01
`
`02
`
`03
`
`04
`
`05
`
`06
`
`07
`
`Fig. 20
`
`Oracle-1001 p. 26
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 25 of 27
`
`US 7,174,014 B2
`
`114
`
`Control bits
`
`Inverted Control bits
`
`115
`/
`
`IO
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`17
`
`16
`
`15
`
`14
`
`13
`
`12
`
`11
`
`IO
`
`~ ~ n
`
`o
`
`o
`
`o
`
`o
`
`o
`
`04 03 02 01 00
`
`0
`
`0
`
`0
`
`Z0
`
`Zl
`
`Z2
`
`00 01 02
`
`03
`
`04
`
`113
`
`Fig. 21
`
`Oracle-1001 p. 27
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 26 of 27
`
`US 7,174,014 B2
`
`64 bits input
`
`64 bits input in reverse order
`
`1:1 bit (cid:157)
`
`2 bits
`
`2:2 bit (cid:157)
`
`4 bits
`
`.
`.
`.
`.
`.
`.
`.
`.
`'
`..
`..
`..
`'"" '°"" ""•}•• • "••••• "• ••" ••• •" ••" ,,.,,, • ••••-=•••• •• ••• •••• • ,o • • • •n ••• ' " •••• t••• •• • " •••• • ''" ••••" •• .,,,,., "''" :, ••••" ••••• ••r•••" •••••
`:
`•
`:
`•
`(cid:127)
`(cid:127)
`:
`:
`
`•
`
`3:4 bit (cid:157)
`
`8 bits
`
`···············l················· ··········· ·--···;l··················· ........................ l ······· .. ··························f············· ··· ············
`6:32 bit (cid:157)
`
`64 bits
`
`5:16 bit (cid:157)
`
`32 bits
`
`64 OR gates
`t
`output
`
`I
`
`Fig. 22
`
`Oracle-1001 p. 28
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`U.S. Patent
`
`Feb.6,2007
`
`Sheet 27 of 27
`
`US 7,174,014 B2
`
`64 bits control bit
`
`1 :encode number of 1 's in groups of 2 bits
`
`2:encode number of l's in groups of 4 bits
`
`~----~I ._I _____ __.I 5:encode nwnber of 1 's in groups of 32 bits
`
`Fig. 23
`
`Oracle-1001 p. 29
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 7,174,014 B2
`
`1
`METHOD AND SYSTEM FOR PERFORMING
`PERMUTATIONS WITH BIT PERMUTATION
`INSTRUCTIONS
`
`BACKGROUND OF THE INVENTION
`
`2
`example, the permutations in DES, as described in Bruce
`Schneier, "Applied Cryptography", 2nd Ed., John Wiley &
`Sons, Inc., 1996.
`Conventional techniques have also used table lookup
`5 methods to implement fixed permutations. To achieve a
`fixed permutation of n input bits with one table lookup, a
`table with r entries is used with each entry being n bits. For
`a 64-bit permutation, this type of table lookup would use 2 67
`bytes, which is clearly infeasible. Alternatively, the table can
`be broken up into smaller tables, and several table lookup
`operations could be used. For example, a 64-bit permutation
`could be implemented by permuting 8 consecutive bits at a
`time, then combining these 8 intermediate permutations into
`15 a final permutation. This method requires 8 tables, each with
`256 entries, each entry being 64 bits. Each entry has zeros
`in all positions, except the 8 bit positions to which the
`selected 8 bits in the source are permuted. After the eight
`table lookups done by 8 LOAD instructions, the results are
`20 combined with 7 OR instructions to get the final permuta(cid:173)
`tion. In addition, 8 instructions are needed to extract the
`index for the LOAD instruction, for a total of 23 instruc(cid:173)
`tions. The memory requirement is 8*256*8=16 kilobytes for
`eight tables. Although 23 instructions is less than the 128 or
`25 256 instructions used in the previous method, the actual
`execution time can be much longer due to cache miss
`penalties or memory access latencies. For example, if half of
`the 8 Load instructions miss in the cache, and each cache
`miss takes 50 cycles to fetch the missing cache line from
`30 main memory, the actual execution time is more than
`4*50=200 cycles. Accordingly, this method can be longer
`than the previously described 128 cycles using EXTRACT
`and DEPOSIT. This method also has the drawback of a
`memory requirement of 16 kilobytes for the tables.
`Permutations are a requirement for fast processing of
`digital multimedia information, using subword-parallel
`instructions, more commonly kuown as multimedia instruc(cid:173)
`tions, as described in Ruby Lee, "Accelerating Multimedia
`with Enhanced Micro-processors", IEEE Micro, Vol. 15, No.
`2, pp.22-32, April 1995, and Ruby Lee, "Subword Paral(cid:173)
`lelism in MAX-2 ", IEEE Micro, Vol. 16, No. 4, pp.51-59,
`August 1996. The MAX-2 general-purpose PERMUTE
`instructions can do any permutation, with and without
`repetitions, of the subwords packed in a 64-bit register.
`45 However, it is only defined for 16-bit subwords. MIX and
`MUX instructions have been implemented in the IA-64
`architectures, which are extensions to the MIX and PER(cid:173)
`MUTE instructions of MAX-2, see Intel Corporation, "IA-
`64 Application Developer's Architecture Guide", Intel Cor-
`50 poration, May, 1999. The IA-64 uses MUX instruction,
`which is a fully general permute instruction for 16-bit
`subwords, with five new permute byte variants. A VPERM
`instruction has been used in an AltiVec extension to the
`Power PC™ available from IBM Corporation, Armonk,
`N.Y., see Motorola Corporation, "'AltiVec Extensions to
`PowerPC'
`Instruction Set Architecture Specification",
`Motorola Corporation, May 1998. The Altivec VPERM
`instruction extends the general permutation capabilities of
`MAX-2's PERMUTE instruction to 8-bit subwords selected
`from two 128-bit source registers, into a single 128-bit
`destination register. Since there are 32 such subwords from
`which 16 are selected, this requires 16*lg32=80 bits for
`specifying the desired permutation. This means that VPERM
`has to use another 128-bit register to hold the permutation
`control bits, making it a very expensive instruction with
`three source registers and one destination register, all 128
`bits wide.
`
`10
`
`1. Field of the Invention
`The present invention relates to a method and system for
`performing permutations of a sequence of bits in a program(cid:173)
`mable processor.
`2. Description of the Related Art
`The need for secure information processing has increased
`with the increasing use of the public internet and wireless
`communications in e-commerce, e-business and personal
`use. Typical use of the internet is not secure. Secure infor(cid:173)
`mation processing typically includes authentication of users
`and host machines, confidentiality of messages sent over
`public networks, and assurances that messages, programs
`and data have not been maliciously changed. Conventional
`solutions have provided security functions by using different
`security protocols employing different cryptographic algo(cid:173)
`rithms, such as public key, symmetric key and hash algo(cid:173)
`rithms.
`For encrypting large amounts of data symmetric key
`cryptography algorithms have been used, see Bruce
`Schneier, "Applied Cryptography", 2nd Ed., John Wiley &
`Sons, Inc., 1996. These algorithms use the same secret key
`to encrypt and decrypt a given message, and encryption and
`decryption have the same computational complexity. In
`symmetric key algorithms, the cryptographic techniques of
`"confusion" and "diffusion" are synergistically employed.
`"Confusion" obscures the relationship between the plaintext
`( original message) and the ciphertext ( encrypted message),
`for example, through substitution of arbitrary bits for bits in 35
`the plaintext. "Diffusion" spread the redundancy of the
`plaintext over the ciphertext, for example through permuta(cid:173)
`tion of the bits of the plaintext block. Such bit-level permu(cid:173)
`tations have the drawback of being slow when implemented
`with conventional instructions available in microprocessors 40
`and other programmable processors.
`Bit-level permutations are particularly difficult for pro(cid:173)
`cessors, and have been avoided in the design of new
`cryptography algorithms, where it is desired to have fast
`software implementations, for example in the Advanced
`Encryption Standard, as described in NIST, "Announcing
`Request for Candidate Algorithm Nominations for the
`Advanced Encryption Standard (AES)". Since conventional
`microprocessors are word-oriented, performing bit-level
`permutations is difficult and tedious. Every bit has to be
`extracted from the source register, moved to its new location
`in the destination register, and combined with the bits that
`have already been moved. This requires 4 instructions per bit
`(mask generation, AND, SHIFT, OR), and 4n instructions to
`perform an arbitrary permutation of n bits. Conventional 55
`microprocessors, for example Precision Architecture (PA(cid:173)
`RISC) have been described to provide more powerful bit(cid:173)
`manipulation capabilities using EXTRACT and DEPOSIT
`instructions, which can essentially perform the four opera(cid:173)
`tions required for each bit in 2 instructions (EXTRACT, 60
`DEPOSIT), resulting in 2n instructions for any arbitrary
`permutation of n bits, see Ruby Lee, "Precision Architec(cid:173)
`ture", IEEE Computer, Vol. 22, No. 1, pp. 78-91, January
`1989. Accordingly, an arbitrary 64-bit permutation could
`take 128 or 256 instructions on this type of conventional 65
`microprocessor. Pre-defined permutations with some regular
`patterns have been implemented in fewer instructions, for
`
`Oracle-1001 p. 30
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 7,174,014 B2
`
`4
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`3
`It is desirable to provide significantly faster and more
`economical ways to perform arbitrary permutations of n bits,
`without any need for table storage, which can be used for
`encrypting large amounts of data for confidentiality or
`privacy.
`
`SUMMARY OF THE INVENTION
`
`The present invention provides permutation instructions
`which can be used in software executed in a programmable 10
`processor for solving permutation problems in cryptography,
`multimedia and other applications. For fast cryptography,
`bit-level permutations are used, whereas for multimedia,
`permutations on subwords of typically 8 bits or 16 bits are
`used. Permutation instructions of the present invention can 15
`be used to provide any arbitrary permutation of sixty-four
`1-bit subwords in a 64-bit processor, i.e., a processor with
`64-bit words, registers and datapaths, for use in fast cryp(cid:173)
`tography. The permutation instructions of the present inven(cid:173)
`tion can also be used for permuting subwords greater than 1 20
`bit in size, for use in fast multimedia processing. For
`example, in addition to being able to permute sixty-four
`1-bit subwords in a register, the permutation instructions and
`underlying functional unit can permute thirty-two 2-bit
`subwords, sixteen 4-bit subwords, eight 8-bit subwords, four 25
`16-bit subwords, or two 32-bit subwords. The permutation
`instructions of the present invention can be added as new
`instructions to the Instruction Set Architecture of a conven-
`tional microprocessor, or they can be used in the design of
`new processors or coprocessors to be efficient for both 30
`cryptography and multimedia software.
`A PPERM instruction is defined to perform permutations
`by a sequence of instructions with each sequence specifying
`the position in the source for each bit in the destination. In
`the PPERM instruction bits in the destination register that 35
`change are updated and bits in the destination register that do
`not change are set to zero. Alternatively, a PPERM3R
`instruction
`is defined
`to perform permutations. The
`PPERM3R instruction is similar to the PPERM instruction
`except that the bits from the destination register which do
`not change are copied unchanged, rather than set to zero as
`in PPERM. Accordingly, the PPERM3R instruction uses
`three source registers because the destination register is also
`a source register since the unchanged bits are held in the
`destination register. For every one of n bits to be changed in
`the final permutation, lgn bits can be used in the PPERM
`instruction or the PPERM3R instruction to specify which bit
`in the source register should replace the bit to be changed in
`the destination register.
`In an alternate embodiment, a GRP instruction is defined
`to perform permutations. The GRP instruction divides the
`initial sequence in the source register into two groups
`depending on configuration bits. The first group is concat(cid:173)
`enated with the second group to form the result of one GRP
`instruction, which is also an intermediate bit sequence
`toward the final permutation. The total number of GRP
`instructions for a permutation of n bits is up to lgn.
`In an embodiment of the present invention, multibit
`subwords are permuted with the GRP instruction. In a
`further embodiment of the invention, the method and system
`are scaled for performing permutations of 2n bits in which
`subwords are packed into two or more registers. In this
`embodiment, at most 21gn+4 instructions are used to per(cid:173)
`mute 2n bits using n-bit words.
`For a better understanding of the present invention, ref(cid:173)
`erence may be made to the accompanying drawings.
`
`FIG. 1 is a schematic diagram of a system for implement(cid:173)
`ing permutation instructions in accordance with an embodi-
`5 ment of the present invention.
`FIG. 2 is a flow diagram of a method for determining a
`permutation instruction sequence to achieve a desired per(cid:173)
`mutation in accordance with an embodiment of the present
`invention.
`FIG. 3A is a schematic diagram of operation of a PPERM
`instruction.
`FIG. 3B is a schematic diagram of an example extracting
`8 bits from multiple registers with PPERM instructions with
`"otherreg" bit.
`FIG. 4A illustrates a schematic diagram of an alternate
`system with three source registers.
`FIG. 4B is a schematic diagram of a circuit for imple(cid:173)
`menting a 64-bit PPERM instruction.
`FIG. 5 is a schematic diagram of an operation of a GRP
`instruction.
`FIG. 6 is a schematic diagram of a method for determin(cid:173)
`ing a sequence of GRP instruction and control bits for the
`GRP instructions.
`FIG. 7 is an example of determining a GRP instruction
`sequence for an 8-bit permutation.
`FIG. SA is a schematic diagram of a unit for serial
`implementation of a GRP operation with one control signal.
`FIG. SB is a circuit diagram for a unit for a GRP operation
`with one control signal.
`FIG. SC is a schematic diagram of an alternate unit for
`serial implementation of a GRP operation with two control
`signals.
`FIG. SD is a circuit diagram for a unit for a GRP operation
`with two control signals.
`FIG. 9A is an example of a circuit for serial implemen(cid:173)
`tation of a first step of a GRP operation using half of a GRP
`function unit.
`FIG. 9B is an example of a whole 8-bit GRP function unit.
`FIG. 10 is a circuit diagram of a serial implementation of
`40 a GRP operation.
`FIG. 11 is a schematic diagram of principle for a parallel
`scheme for a GRP operation.
`FIG. 12 is a schematic diagram of a unit for parallel
`implementation of a GRP operation.
`FIG. 13 is a circuit diagram for extracting z bits in an 8-bit
`group by combining z bits in two 4-bit groups.
`FIG. 14 is a circuit diagram of addition of two one-hot
`encoded numbers.
`FIG. 15 is a circuit diagram of a circuit to generate a final
`result of an n bit GRP operation, wherein n=4.
`FIG. 16 is a schematic diagram of a parallel GRP imple(cid:173)
`mentation.
`FIG. 17 is a schematic diagram of a module for generating
`55 select signals.
`FIG. lSA is a flow diagram of a method for 2n-bit
`permutation in accordance with an embodiment of the
`present invention.
`FIG. lSB is a schematic diagram of the method shown in
`60 lSA.
`FIG. 19 is a graph of the number of instructions for
`encryption and key generation in DES.
`FIG. 20 is a circuit diagram for extracting z bits in an 8-bit
`group by combining z bits in two 4-bit groups and making
`65 non-z bits zero.
`FIG. 21 is a schematic diagram of an embodiment of a
`parallel implementation of 8-bit GRP operation.
`
`50
`
`45
`
`Oracle-1001 p. 31
`Oracle v. Teleputers
`IPR2021-00078
`
`

`

`US 7,174,014 B2
`
`5
`FIG. 22 is a schematic diagram of an embodiment of a
`parallel implementation of 64-bit GRP operation.
`FIG. 23 is a schematic diagram of an embodiment of a
`module for generating select signals.
`
`6
`desired permutation. The source positions for k bits can be
`specified with one instruction. PPERM instructions can be
`defined as follows:
`
`PPERM,x Rl,R2,R3
`
`DETAILED DESCRIPTION
`
`Reference will now be made in greater detail to a pre(cid:173)
`ferred embodiment of the invention, an example of which is
`illustrated in the accompanying drawings. Wherever pos(cid:173)
`sible, the same reference numerals will be used throughout
`the drawings and the description to refer to the same or like
`parts.
`FIG. 1 illustrates a schematic diagram of a system 10 for
`implementing efficient permutation instructions in accor(cid:173)
`dance with the teachings of the present invention. Register
`file 12 includes source register lla, source register llb and
`destination register llc. System 10 can provide bit-level
`permutations of all n bits of any register in register file 12.
`The same solution can be applied to different subword sizes
`of 2' bits, for i=0, 1, 2, ... , m, where n=2m bits. For a fixed
`word size of n bits, and 1-bit subwords, there are n subwords
`to be permuted. Source register values to be permuted 13
`from register lla and configuration bits 15 from source
`register llb are applied over data paths to permutation
`functional unit 14. Permutation function unit 14 generates
`permutation result 16. Permutation result 16 can be an
`intermediate result if additional permutations are performed 30
`by permutation functional unit 14. For other instructions,
`arithmetic logic unit (ALU) 17 and shifter 18 receive source
`register values 13 from source register lla and source
`register values 15 from source register llb and generate
`respective ALU result 20 and shifter result 21 over a data 35
`path to destination register llc. System 10 can be imple(cid:173)
`mented in any programmable processor, for example, a
`conventional microprocessor, digital
`signal processor
`(DSP), cryptographic processor, multimedia processor,
`media processor, programmable system-on-a-chip (SOC), 40
`and can be used in developing processors or coprocessors
`for providing cryptography, multimedia and other opera(cid:173)
`tions.
`FIG. 2 is a flow diagram of a method of determining
`permutation instruction sequences for permutations 22. The
`determined permutation instruction sequences can be
`executed in permutation functional unit 14. In block 23, bit
`positions in a source sequence of bits are defined for a group
`of bits in a destination register. In block 24, a permutation
`instruction is determined with the bit positions to assemble
`bits from the source sequence of bits. In block 25, the
`permutation instruction is performed. The assembled bits are
`inserted into a destination register as determined by the bit
`positions. Blocks 23-25 can be conditionally repeated for
`every non-overlapping group of bits in the destination, in
`block 26. After the final permutation is determined, the
`desired permutation of the source register is determined in
`block 27. Alternatively, all block 23 can be performed for all
`bits in the destination register and block 24 is performed for
`all bit positions determined in block 23. Thereafter when the 60
`permutation needs to be performed, block 25 is performed.
`All permutation instructions can be performed in block 25 in
`parallel.
`A PPERM instruction can be used as the permutation
`instruction described above for dynamically specified per- 65
`mutations of n subwords. Each PPERM instruction defines
`a subset of bits which subsets can be combined to define the
`
`wherein Rl and R2 are the source registers and R3 is a
`destination register. Rl contains the bits to be permuted. R2
`contains configuration bits. x specifies which k bits in R3
`will change. In R3, only k bits specified by x are updated, the
`10 other bits are set to zero. k lgn bits in R2 can be used to
`specify where to extract the k consecutive bits to be changed
`in R3.
`FIG. 3A illustrates a schematic diagram of an example
`operation of the PPERM instruction. The PPERM instruc-
`is PPERM, 1 Rl, R2, R3 wherein R2
`15 tion
`is
`0x020E160820252C33 in hexadecimal notation. This is the
`same as the decimal values shown in R2 a

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket