`ARCHITECTURE, IMPLEMENTATION,
`AND CRYPTOGRAPHIC PROPERTIES
`
`Zhijie Jerry Shi
`
`A DISSERTATION
`PRESENTED TO THE FACULTY
`OF PRINCETON UNIVERSITY
`IN CANDIDACY FOR THE DEGREE
`OF DOCTOR OF PHILOSOPHY
`
`RECOMMENDED FOR ACCEPTANCE
`BY THE DEPARTMENT OF
`ELECTRICAL ENGINEERING
`
`June 2004
`
`
`
`Oracle-1036 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`© Copyright by Zhijie Jerry Shi, 2004.
`
`All rights reserved.
`
`
`
`ii
`
`Oracle-1036 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`I certify that I have read this thesis and that in my opinion it
`is fully adequate, in scope and in quality, as a dissertation
`for the degree of Doctor of Philosophy.
`
`
`
`
`
`
`Ruby B. Lee
`(Principal Adviser)
`
` I
`
` certify that I have read this thesis and that in my opinion it
`is fully adequate, in scope and in quality, as a dissertation
`for the degree of Doctor of Philosophy.
`
`
`
`
`
`Niraj K. Jha
`
` I
`
` certify that I have read this thesis and that in my opinion it
`is fully adequate, in scope and in quality, as a dissertation
`for the degree of Doctor of Philosophy.
`
`
`
`
`
`
`Yiqun L. Yin
`
`Approved for the Princeton University Graduate School:
`
`
`
`
`
`Dean of the Graduate School
`
`iii
`
`Oracle-1036 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Abstract
`
`Bit permutation operations are interesting and important from both cryptographic and
`architectural points of view. Cryptographically, bit-level permutations naturally provide
`certain effects which are not easily obtained
`through word-level operations.
`Architecturally, the ability to support very fast bit permutations may be the next step in
`the evolution of word-oriented processors to support new multimedia and secure
`information processing workloads, which frequently manipulate data items smaller than a
`word. Any new instructions introduced into a general-purpose processor should ideally
`be useful in many applications; and their impact on the cycle time, and datapath and
`control complexity evaluated in terms of cost versus benefit.
`
`This thesis investigates the benefits and cost of architectural support for fast bit
`permutations in programmable processors. The new permutation methods presented in
`the thesis reduce the number of instructions required to permute n bits from O(n) to
`O(log2(n)). This both accelerates existing block ciphers and enables new ciphers that use
`arbitrary data-dependent permutations. The thesis also demonstrates that the bit
`permutation instructions can be used to achieve significant speedup in other applications,
`such as sorting bytes. The cost of the bit permutation instructions is studied in terms of
`their implementation complexity and their impact on a processor’s cycle time.
`
`The cryptographic properties of the bit permutation instructions are studied with
`respect to the two most effective cryptanalytic attacks, linear and differential
`cryptanalysis. The results show that the new bit permutation operations can enhance the
`
`iv
`
`Oracle-1036 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`cryptographic strength of the round function in block ciphers, so fewer rounds can
`achieve the same level of security. This leads to significant improvements in
`performance and energy consumption.
`
`This thesis is a detailed example of a more general exploration of new instruction-set
`architecture features motivated by cryptographic algorithms, as well as of architectural
`features occurring for other reasons, e.g., multimedia, that may influence the design of
`cryptographic algorithms. We hope to have initiated a continuing dialog between cipher
`designers and processor architects. Future research in this direction will lead to more
`architectural and algorithmic innovations helpful for achieving pervasive security in
`information processing, networking, and storage.
`
`
`
`v
`
`Oracle-1036 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Acknowledgments
`
`First of all, I would like to thank my advisor Professor Ruby B. Lee for her guidance and
`support throughout my graduate studies. Her dedication to research and patience for
`students and colleagues will always be an example for me to follow. Without the
`numerous discussions and brainstorms with her, the results presented in this thesis would
`never have existed.
`
`I am indebted to Dr. Yiqun Lisa Yin, for her valuable comments and suggestions on
`my research. I would also like to thank Professor Neil Burgess (Cardiff University,
`U.K.), Professor Ronald Rivest (Massachusetts Institute of Technology), and Dr. Matt
`Robshaw (Royal Holloway College, University of London, U.K.) for their fruitful
`discussions. I would also like to thank Professor Niraj K. Jha (Princeton University) for
`taking the time to read my thesis.
`
`I am grateful to the members of PALMS group for their support. I would also like to
`thank all my friends at Princeton University, who make life at Princeton a wonderful
`experience.
`
`I would like to thank my parents and my wife Kate for their constant love and
`support. It is impossible for me to express my gratitude towards them in mere words. I
`dedicate this thesis to them.
`
`
`
`vi
`
`Oracle-1036 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Contents
`
`Abstract
`
`Acknowledgments
`
`1 Introduction
`
`iv
`
`vi
`
`1
`
`1.1 Subword and bit permutations············································································ 3
`
`1.1.1 Subword operations and permutations for multimedia processing·········· 4
`
`1.1.2 Bit permutation······················································································· 8
`
`1.2 Thesis contributions ························································································· 10
`
`1.3 Thesis organization··························································································· 11
`
`2 Bit Permutations in Secret-key Encryption
`
`13
`
`2.1 Cryptographic algorithms················································································· 13
`
`2.1.1 Symmetric-key algorithms···································································· 14
`
`2.1.2 Public-key algorithms··········································································· 16
`
`vii
`
`Oracle-1036 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`2.1.3 Hash algorithms···················································································· 17
`
`2.2 Block ciphers···································································································· 20
`
`2.3 Basic operations in block ciphers ····································································· 25
`
`2.4 Processor support for basic operations in block ciphers ··································· 31
`
`2.5 Summary ·········································································································· 37
`
`3 Bit Permutation Instructions
`
`39
`
`3.1 Past work·········································································································· 39
`
`3.2 Design goals····································································································· 42
`
`3.3 The GRP instruction························································································· 44
`
`3.3.1 Definition of the GRP instruction ························································· 44
`
`3.3.2 Configuration of GRP instructions for arbitrary permutations·············· 45
`
`3.4 Other bit permutation instructions ···································································· 52
`
`3.4.1 PPERM3R and PPERM········································································ 52
`
`3.4.2 CROSS ································································································· 55
`
`3.4.3 OMFLIP ······························································································· 66
`
`3.4.4 SWPERM and SIEVE ·········································································· 67
`
`3.5 Comparison of permutation alternatives··························································· 70
`
`3.5.1 Subword permutations·········································································· 70
`
`3.5.2 Scalability to 2n bits ············································································· 72
`
`3.5.3 Mappings (permutations with repetitions) ············································ 76
`
`viii
`
`Oracle-1036 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`3.5.4 Performance for permuting 64 bits ······················································· 77
`
`3.6 Performance ····································································································· 80
`
`3.7 Permutation in ASIP architectures···································································· 82
`
`3.7.1 LdState ································································································· 83
`
`3.7.2 Register pair ························································································· 86
`
`3.7.3 Two-length ISA ···················································································· 87
`
`3.7.4 Bundled instruction··············································································· 88
`
`3.7.5 Superscalar execution ··········································································· 88
`
`3.7.6 VLIW execution ··················································································· 90
`
`3.7.7 Comparison ·························································································· 91
`
`3.8 Summary ·········································································································· 93
`
`4 Hardware Implementation of Bit Permutation Instructions
`
`95
`
`4.1 Logical effort···································································································· 96
`
`4.2 The implementation of CROSS and BFLY ···················································· 103
`
`4.2.1 The latency of the butterfly network··················································· 104
`
`4.2.2 The latency of the CROSS instruction ················································ 109
`
`4.3 The implementation of OMFLIP ···································································· 114
`
`4.3.1 The latency of a 4-stage omega and flip network································ 116
`
`4.3.2 The latency of a 2-stage omega and flip network································ 120
`
`4.4
`
`Implementations of GRP ················································································ 122
`
`ix
`
`Oracle-1036 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`4.5 The latency of the GRP unit ··········································································· 131
`
`4.6 Summary and discussion ················································································ 138
`
`Appendices·············································································································· 141
`
`Appendix 4A. The capacitance of wires ························································· 141
`
`Appendix 4B. Latency of the inverse butterfly network ································· 144
`
`5 Cryptographic Properties of Bit Permutation Operations
`
`146
`
`5.1 Overview of cryptanalytic techniques ···························································· 149
`
`5.2 Differential properties of permutation instructions········································· 151
`
`5.3 Linear properties of permutation instructions ················································· 158
`
`5.4 Enhancing existing ciphers using permutation operations ······························ 163
`
`5.4.1 RC5 and existing attacks····································································· 164
`
`5.4.2 Enhancing RC5 using GRP································································· 165
`
`5.4.3 Performance improvement·································································· 170
`
`5.4.4 RC5-grp versus RC6··········································································· 171
`
`5.4.5 Decryption and the UNGRP operation················································ 176
`
`5.5 Summary and discussion ················································································ 177
`
`6 Bit Permutations in Other Applications
`
`180
`
`6.1 Subword sorting ····························································································· 181
`
`6.2 Sorting subwords in a single register······························································ 182
`
`6.3 Sorting subwords in multiple registers ··························································· 185
`
`x
`
`Oracle-1036 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`6.4 Performance of sorting subwords with GRP··················································· 194
`
`6.5 Configuring GRP instructions for permutations ············································· 197
`
`6.6 Permutation of multi-bit subwords in multiple registers································· 202
`
`6.7 Summary ········································································································ 206
`
`7 Conclusions and Future Research
`
`207
`
`7.1 Efficient bit permutation instructions ····························································· 207
`
`7.2 Bit permutation instructions in block ciphers ················································· 209
`
`7.3 Future research ······························································································· 211
`
` Bibliography
`
`
`213
`
`xi
`
`Oracle-1036 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`List of Tables
`
`Table 2.1: Block ciphers used in commonly-used security protocols .............................. 26
`
`Table 2.2: Operations in block ciphers for encrypting a block......................................... 27
`
`Table 2.3: Operations used in block ciphers..................................................................... 33
`
`Table 3.1: Traditional methods to do a 64-bit permutation on 64-bit systems................. 41
`
`Table 3.2: Configuring GRP instructions for an 8-bit permutation.................................. 51
`
`Table 3.3: Configuring the GRP instructions for (0, 1, 2, 3, 4, 5, 6, 7) ............................ 51
`
`Table 3.4: Number of the PPERM3R instructions for one permutation........................... 53
`
`Table 3.5: Maximum number of instructions for a permutation....................................... 72
`
`Table 3.6: Comparison of permutation alternatives.......................................................... 77
`
`Table 3.7: Maximum number of instructions (or cycles) for permuting 64 bits............... 79
`
`Table 3.8: Memory requirement for 64-bit permutations ................................................. 80
`
`Table 3.9: Comparison of alternatives on ASIPs for 64-bit permutations........................ 92
`
`Table 4.1: The logical effort and parasitic delay of common logic gates......................... 98
`
`xii
`
`Oracle-1036 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Table 4.2: Calculating the delay of a 64-bit butterfly network....................................... 109
`
`Table 4.3: Calculating the delay of 64-bit CROSS instructions ..................................... 113
`
`Table 4.4: Input capacitance, logical effort, and parasitic delay of 3:1 MUX................ 117
`
`Table 4.5: Calculating the delay for a 64-bit 4-stage omega-flip network ..................... 119
`
`Table 4.6: Input capacitance, logical effort, and parasitic delay of 4:1 MUX................ 120
`
`Table 4.7: Calculating the delay for a 64-bit 2-stage omega-flip network ..................... 121
`
`Table 4.8: The input capacitance, logical effort, and parasitic delay of TG and ITG .... 132
`
`Table 4.9: Branching effort, logical effort, and parasitic delay for stages in GRP64..... 137
`
`Table 4.10: The latency of different functional units (64 bits)....................................... 139
`
`Table 4.11: Estimation of wire capacitance.................................................................... 142
`
`Table 4.12: Branching effort, logical effort, and parasitic delay of the inverse butterfly
`network ......................................................................................................... 145
`
`Table 5.1: Differential properties of permutation operations ......................................... 153
`
`Table 5.2: Linear properties of DDR and GRP............................................................... 160
`
`Table 5.3: Single-bit characteristics for DDR in RC5 .................................................... 166
`
`Table 5.4: Characteristics for GRP following DDR in RC5-grp.................................... 168
`
`Table 5.5: The number of chosen plaintexts required for differential attack against RC5
`and RC5-grp (64-bit block size). .................................................................. 169
`
`Table 5.6: Performance improvement of RC5-grp over RC5 with the same security level
`....................................................................................................................... 171
`
`Table 5.7: The number of chosen plaintexts required for differential attacks against RC5-
`grp-64 and RC6 (128-bit block size) ............................................................ 173
`
`xiii
`
`Oracle-1036 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Table 5.8: Number of plaintexts required for linear attacks against RC5-grp-64 and RC6
`(128-bit blocks)............................................................................................. 174
`
`Table 5.9: Performance of RC5-grp-64 and RC6 (128-bit blocks) ................................ 175
`
`Table 6.1: Example of the BroadcastBit instruction....................................................... 183
`
`Table 6.2: Number of instructions for sorting 64 subwords using GRP and MIX on 64-bit
`registers......................................................................................................... 193
`
`Table 6.3: Speedup of GRP and BroadcastBit................................................................ 195
`
`Table 6.4: Speedup of GRP and MIX for 64 values....................................................... 196
`
`
`
`xiv
`
`Oracle-1036 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`List of Figures
`
`Figure 1.1: Parallel add of subwords .................................................................................. 4
`
`Figure 1.2: Basic datapath in single-issue processors......................................................... 9
`
`Figure 2.1: Symmetric-key algorithms ............................................................................. 15
`
`Figure 2.2: Public-key algorithms..................................................................................... 16
`
`Figure 2.3: Digital signature ............................................................................................. 18
`
`Figure 2.4: The authentication process in CHAP ............................................................. 19
`
`Figure 2.5: An example of the substitution-permutation network.................................... 23
`
`Figure 2.6: Round function in a Feistel network .............................................................. 23
`
`Figure 2.7: Structure of a typical block cipher ................................................................. 24
`
`Figure 2.8: The extract operation...................................................................................... 29
`
`Figure 2.9: The concatenation operation .......................................................................... 30
`
`Figure 2.10: The DEPOSIT instruction ............................................................................ 34
`
`Figure 2.11: The Shift Right Pair instruction.................................................................... 35
`
`xv
`
`Oracle-1036 p. 15
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Figure 3.1: Standard processor datapath with a new permutation unit............................. 43
`
`Figure 3.2: Description of the GRP operation in pseudocode .......................................... 44
`
`Figure 3.3: An 8-bit GRP operation.................................................................................. 45
`
`Figure 3.4: Algorithm to configure a single GRP instruction........................................... 48
`
`Figure 3.5: Configure GRP instructions ........................................................................... 50
`
`Figure 3.6: Doing the initial permutation in DES with PPERM3R.................................. 54
`
`Figure 3.7: A 16-bit butterfly network.............................................................................. 56
`
`Figure 3.8: An 8-bit Benes network.................................................................................. 57
`
`Figure 3.9: Algorithm config_cross.................................................................................. 59
`
`Figure 3.10: The partitioning algorithm in config_cross.................................................. 61
`
`Figure 3.11: Disjoint rings in an example of BN(8) ......................................................... 62
`
`Figure 3.12: cross_partition on eight bits ......................................................................... 64
`
`Figure 3.13: Configuring Benes network for an 8-bit permutation .................................. 65
`
`Figure 3.14: A 16-bit OMFLIP functional unit ................................................................ 67
`
`Figure 3.15: An example of SWPERM instruction .......................................................... 68
`
`Figure 3.16: Doing bit permutation with SWPERM and SIEVE ..................................... 69
`
`Figure 3.17: Separation of bits when doing 2n-bit permutations...................................... 73
`
`Figure 3.18: Generation of n permuted bits when permuting 2n bits ............................... 74
`
`Figure 3.19: Scheduling of permutation instructions on 2-way superscalar processors... 78
`
`Figure 3.20: Instructions for table lookup......................................................................... 79
`
`Figure 3.21: Speedup of DES ........................................................................................... 81
`
`xvi
`
`Oracle-1036 p. 16
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Figure 3.22: Six CROSS instructions permuting 64 bits .................................................. 83
`
`Figure 3.23: Permutation units and processor datapaths .................................................. 84
`
`Figure 3.24: BFLY and IBFLY instruction variants......................................................... 85
`
`Figure 3.25: BLFY and IBLFY with 4-way superscalar execution.................................. 89
`
`Figure 4.1: An inverter driving four identical inverters.................................................... 98
`
`Figure 4.2: Two 2:1 MUXIs for a pair of bits in the butterfly network.......................... 104
`
`Figure 4.3: Transistor diagram of a 2:1 MUXI............................................................... 105
`
`Figure 4.4: Transistor diagram of a minimum-sized inverter........................................ 105
`
`Figure 4.5: The critical path of the 64-bit butterfly network .......................................... 107
`
`Figure 4.6: Generating the select signals in the CROSS unit ......................................... 111
`
`Figure 4.7: The critical path of a CROSS unit................................................................ 112
`
`Figure 4.8: Implementation of a 3:1 MUX with 2:1 MUXIs.......................................... 115
`
`Figure 4.9: 4:1 MUX for 2-stage OMFLIP implementation .......................................... 116
`
`Figure 4.10: The critical path of a 64-bit 4-stage omega-flip network........................... 118
`
`Figure 4.11: The critical path of a 2-stage omega and flip network............................... 121
`
`Figure 4.12: Three steps performing a GRP operation................................................... 123
`
`Figure 4.13: The GRP implementation with shift registers............................................ 124
`
`Figure 4.14: Diagram of GRP64..................................................................................... 125
`
`Figure 4.15: Grabbing z bits recursively......................................................................... 126
`
`Figure 4.16: GRP1Z: the first stage in GRP units .......................................................... 127
`
`Figure 4.17: The basic cell used in GRP unit ................................................................. 128
`
`xvii
`
`Oracle-1036 p. 17
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Figure 4.18: Diagram of GRP8D.................................................................................... 129
`
`Figure 4.19: Diagram of GRP8S..................................................................................... 130
`
`Figure 4.20: Transistor diagram of the basic cell (TG) in the GRP unit ........................ 131
`
`Figure 4.21: Transistor diagram of ITG that uses an inverted select signal ................... 133
`
`Figure 4.22: Number of cells in GRPmS and GRPmD................................................... 134
`
`Figure 4.23: The critical path of GRP64......................................................................... 136
`
`Figure 5.1: The encryption procedure of RC5................................................................ 164
`
`Figure 5.2: The round function of RC5-grp.................................................................... 166
`
`Figure 5.3: The encryption procedure of RC6................................................................ 172
`
`Figure 5.4: Instructions performing one round of RC6 .................................................. 175
`
`Figure 5.5: Programmatic definition of UNGRP............................................................ 177
`
`Figure 6.1: Sorting subwords in a register with GRP and BroadcastBit ........................ 184
`
`Figure 6.2: The pre-transpose conversion....................................................................... 185
`
`Figure 6.3: Sort subwords after the pre-transpose .......................................................... 186
`
`Figure 6.4: The MIX instruction..................................................................................... 187
`
`Figure 6.5: 8x8 matrix transpose with MIX.................................................................... 188
`
`Figure 6.6: Post-transpose after sorting .......................................................................... 190
`
`Figure 6.7: Sorting 16 4-bit subwords ............................................................................ 191
`
`Figure 6.8: GRP instructions for sorting eight 4-bit subwords....................................... 192
`
`Figure 6.9: Speedup of GRP and MIX over quicksort for sorting different number of 8-bit
`and 16-bit elements....................................................................................... 197
`
`xviii
`
`Oracle-1036 p. 18
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`
`
`Figure 6.10: Using GRP to configure GRP instructions for a permutation.................... 199
`
`Figure 6.11: Configuring an 8-bit permutation with GRP instructions .......................... 201
`
`Figure 6.12: Zigzag scanning of DCT coefficients......................................................... 203
`
`Figure 6.13: Performing zigzag scan with bit permutation instructions......................... 204
`
`
`
`xix
`
`Oracle-1036 p. 19
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Chapter 1
`
`Introduction
`
`With the rapid proliferation of the Internet and wireless networks, security becomes an
`increasingly important issue for networked computing devices. These include high-end
`servers, desktops, laptops, handheld devices such as personal digital assistants (PDAs),
`and sensors. Awareness of security has increased because more and more events and
`activities are conducted electronically over the public networks. For example, many
`financial transactions are made only with online banking or stock exchange systems, and
`elections may also be held through online electronic voting systems. However, the
`security of networked computer systems is far from satisfactory. According to the
`statistics from the CERT Coordination Center [1], the number of reported security
`incidents has been rising every year from six in 1988 to 137,529 in 2003.
`
`Security features in computing devices may be implemented by either hardware or
`software mechanisms, or a combination of both. Hardware-implemented security
`features are often considered more reliable and correctly implemented [2], and
`customized hardware implementations often permit higher performance than software
`implementations. Software-implemented security features are much more flexible, and
`can more readily accommodate new standards, algorithms, policies and environments.
`
`1
`
`Oracle-1036 p. 20
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Chapter 1: Introduction
`
`
`
`2
`
`By providing new hardware-implemented security enablers in a programmable processor,
`the benefits of software flexibility can be combined with hardware performance. In
`particular, this thesis studies how new processor features can accelerate software-
`implemented cryptography algorithms.
`
`Although computers have changed incredibly since the first one was built in 1940s
`[3], one thing has not changed. All the tasks performed are expressed, eventually, as
`basic commands that a computer’s underlying processor can recognize and execute.
`These basic commands are known as instructions. The instructions visible to a
`programmer or compiler are referred to as the instruction set architecture (ISA) of the
`programmable processor [4]. All software programs are converted to the processor’s
`instructions before being executed. Efficient ISAs reduce the implementation cost of a
`processor and provide superior performance at the same time. This thesis addresses the
`problem of efficient instructions for bit permutation.
`
`Bit permutation operations are interesting and important from both cryptographic
`and architectural points of view. Cryptographically, bit-level permutations naturally
`provide certain effects which are not easily obtained through word-level operations.
`Architecturally, the ability to support very fast bit permutations may be the next step in
`the evolution of word-oriented processors to support new multimedia and secure
`information processing workloads, which frequently manipulate data items smaller than a
`word. It is desirable to add to processors new instructions that operate on small data
`items efficiently. However, any new instructions introduced into a general-purpose
`processor should ideally be useful in many applications. In addition, a new instruction’s
`implementation complexity needs to be considered, and the cost versus benefit evaluated.
`This thesis considers these issues in the context of adding bit permutation instructions to
`processors.
`
`
`
`Oracle-1036 p. 21
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Chapter 1: Introduction
`
`
`
`3
`
`Today, performing cryptographic processing in software rather than in add-on,
`special-purpose hardware is highly desirable to facilitate more pervasive use of secure
`information processing. This thesis is a detailed example of a more general exploration
`of new instruction-set architecture features motivated by cryptographic algorithms, as
`well as of architectural features occurring for other reasons, e.g., multimedia, that may
`influence the design of cryptographic algorithms. We hope to initiate a dialog between
`cipher designers and processor architects. Research in this direction could lead to
`architectural and algorithmic innovations that are helpful for achieving more pervasive
`security in information processing, networking, and storage.
`
`Section 1.1, together with Chapter 2, provides the background for the bit permutation
`problem that will be addressed in this thesis. Section 1.1 describes the processor
`architecture background for considering subword and bit permutations in word-oriented
`processor architectures, while Chapter 2 provides the security background by describing
`important classes of cryptography algorithms used in secure information processing.
`Also, Section 1.2 provides a brief summary of the thesis contributions, and Section 1.3
`provides an outline of the work covered in the thesis.
`
`1.1 Subword and bit permutations
`
`As technology advances and computers evolve,