throbber
Design and Test of ICs for Secure Embedded Computing
`
`A Survey of Lightweight-
`Cryptography
`Implementations
`
`Thomas Eisenbarth
`Ruhr University Bochum
`
`Sandeep Kumar
`Philips Research Europe
`
`Christof Paar and Axel Poschmann
`Ruhr University Bochum
`
`Leif Uhsadel
`Catholic University of Leuven
`
`Editor’s note:
`The tight cost and implementation constraints of high-volume products,
`including secure RFID tags and smart cards, require specialized crypto-
`graphic implementations. The authors review recent developments in this
`area for symmetric and asymmetric ciphers, targeting embedded hardware
`and software.
`
`—Patrick Schaumont, Virginia Tech
`
`three
`to optimize all
`very difficult
`design goals at once. For example,
`a secure and high-performance hard-
`ware implementation can be achieved
`by a pipelined, side-channel-resistant
`architecture, resulting in a high area
`requirement, and thus high costs. On
`the other hand, it’s possible to design
`a secure, low-cost hardware implemen-
`tation with the drawback of limited performance.
`In this article, we present a selection of recently
`published lightweight-cryptography implementations
`and compare them to state-of-the-art results in their
`field. This survey covers recent hardware and software
`implementations of symmetric as well as asymmetric
`ciphers. We will discuss software and hardware
`implementations separately, because they have differ-
`ent and sometimes contrary characteristics. For
`example, bit permutations are virtually free in
`hardware, whereas in software they can significantly
`slow down implementations. Also, large substitution
`tables are often software friendly, but hardware
`realizations can be relatively costly. Finally,
`the
`evaluation metric is different: For software implemen-
`tations, we compare both RAM and ROM requirements
`and the required number of clock cycles. For
`hardware implementations, we focus on the required
`chip size and the number of clock cycles. We don’t
`compare power consumption for
`the hardware
`implementations, because different standard-cell tech-
`nologies were used and estimates from simulating
`environments are not accurate. Software implementa-
`tions let us achieve a rough estimate of power
`
`&THE UPCOMING ERA of pervasive computing will be
`
`characterized by many smart devices that—because
`of the tight cost constraints inherent in mass deploy-
`ments—have very limited resources in terms of
`memory, computing power, and battery supply. Here,
`it’s necessary to interpret Moore’s law differently:
`Rather than a doubling of performance, we see
`a halving of the price for constant computing power
`every 18 months. Because many foreseen applications
`have extremely tight cost constraints—for example,
`RFID in tetrapacks—over
`time, Moore’s law will
`increasingly enable such applications. Many applica-
`tions will process sensitive health-monitoring or bio-
`metric data, so the demand for cryptographic compo-
`nents that can be efficiently implemented is strong and
`growing. For such implementations, as well as for
`ciphers that are particularly suited for this purpose, we
`use the generic term lightweight cryptography in this
`article.
`Every designer of lightweight cryptography must
`cope with the trade-offs between security, cost, and
`performance. It’s generally easy to optimize any two of
`the three design goals—security and cost, security and
`performance, or cost and performance; however, it is
`
`522
`
`0740-7475/07/$25.00 G 2007 IEEE
`
`Copublished by the IEEE CS and the IEEE CASS
`
`IEEE Design & Test of Computers
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 1
`Juniper v Huawei
`
`

`

`consumption by multiplying the processing time by
`the average power consumption of the target device.
`Another distinction is between symmetric and
`asymmetric ciphers, because the latter offer more
`security functionality and therefore have different
`application scenarios. Symmetric ciphers serve mainly
`for message integrity checks, entity authentication,
`and encryption, whereas asymmetric ciphers addi-
`tionally provide key-management advantages and
`nonrepudiation. Asymmetric ciphers are computation-
`ally far more demanding,
`in both hardware and
`software. The performance gap on constrained
`devices such as 8-bit microcontrollers is huge. For
`example, an optimized asymmetric algorithm such as
`elliptic-curve cryptography (ECC) performs 100 to
`1,000 times more slowly than a standard symmetric
`cipher such as the Advanced Encryption Standard
`(AES) algorithm, which correlates with a two- to three-
`orders-of-magnitude higher power consumption. Un-
`like block ciphers, which are well investigated and
`understood,
`stream ciphers have received little
`attention from the scientific community. Although this
`has recently started to change, we cite stream ciphers
`only for comparison. (The increasing interest in stream
`ciphers is apparent in projects such as eStream, within
`the European Network of Excellence in Cryptography,
`which aims to foster knowledge about stream ciphers.)
`The ‘‘Related work’’ sidebar summarizes some recent
`developments in lightweight cryptography.
`
`Symmetric ciphers
`Many works on symmetric ciphers have been
`published during the past two decades. Because most
`main applications of symmetric ciphers use software
`implementations,
`it’s no surprise that nearly all
`algorithms—for example, the AES—have been de-
`veloped with good software performance in mind. The
`paradigm shift that we foresee will likely lead to an
`increasing demand for lightweight ciphers that per-
`form well in hardware. Therefore, we focus here on
`recently published works on ciphers that have been
`developed for minimal hardware requirements—
`namely, DESL1 and Present.2
`
`Hardware implementations of symmetric ciphers
`The only well-established cipher designed with
`a strong focus on low hardware cost is the Data
`Encryption Standard (DES). Comparing a standard
`one-round implementation of AES and DES, we find
`that the latter consumes only about 6% of the logic
`
`November–December 2007
`
`resources of AES and has a shorter critical path.
`However, researchers have described a low-power,
`low-cost AES implementation that requires only 3,400
`gate equivalents (GEs) and encrypts a plaintext within
`1,032 clock cycles.3 This impressive result seems to
`have achieved the limit in area minimization. This
`implementation, as well as the implementations of
`DESL and Present, features encryption only, because
`encryption is sufficient for many lightweight target
`applications, such as authentication with a challenge-
`response protocol.
`
`Inspired by the one-round implementation
`DES.
`results of AES and DES, we implemented a serialized
`version of DES that processes 4-bit and 6-bit data words
`rather than those with 32 bits and 48 bits. Our
`implementation requires 2,310 GEs and encrypts
`a plaintext within 144 clock cycles.1 To our
`knowledge,
`this
`is
`the smallest
`reported DES
`implementation, sacrificing throughput
`to achieve
`minimal area requirements. However, the 56-bit key
`limits the security provided. Brute-forcing this key
`space using software takes a few months and
`hundreds of PCs, but only a few days with a special-
`purpose machine such as Copacobana.4 Hence, this
`implementation is relevant only for applications
`needing short-term security or where the values
`protected are relatively low.
`In certain low-cost
`applications, such a security level is adequate. When
`a higher security level
`is needed, so-called key
`whitening, can be added to standard DES, yielding
`DESX. The key-whitening technique requires only two
`additional XOR gates: one gate to add a prewhitening
`key to the plaintext before the cipher processes it, and
`another to add a postwhitening key to the resulting
`ciphertext. In the case of DES, this enlarges the key
`space from 56 bits to 184 bits. However, because of
`time-memory trade-offs (birthday attack), the security
`level of DESX is bounded by 118 bits.
`
`DESL and DESXL. In our serialized DES implemen-
`tation, substitution boxes (S-boxes) take up approxi-
`mately 32% of the area. We can further decrease the
`gate complexity of DES by replacing the eight original
`S-boxes with a single new one, eliminating seven S-
`boxes as well as the multiplexer. This lightweight DES
`variant is called DESL and results in an approximately
`20% smaller chip than DES (1,850 GEs versus 2,310
`GEs). The S-box has been carefully selected and highly
`optimized, enabling DESL to resist common attacks
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`523
`
`Juniper Ex. 1043-p. 2
`Juniper v Huawei
`
`

`

`Design and Test of ICs for Secure Embedded Computing
`
`Related work
`There are many recent symmetric ciphers—for
`and 8051 (the Chipcon CC1010 platform). They show
`example, Hight, Clefia, DESXL,1 and Present2—with
`that for the Atmel ATmega128 clocked at 8 MHz, a point
`special implementation properties proposed. Hight was
`multiplication using a 160-bit ECC GF(p) standard
`first presented at the 2006 Workshop on Cryptographic
`curve required 0.81 second. A security equivalent
`Hardware and Embedded Systems and was designed
`1,024-bit RSA encryption requires about 11 seconds.
`with good hardware performance in mind.
`In their
`Consequently, this article considers only ECC in the
`paper, the authors provide hardware figures for a one-
`asymmetric case. Although hyperelliptic curves hold
`round implementation—that is, one round is performed
`promise too, their lack of standardization makes them
`in one cycle—and they conclude that Hight is well-
`less promising at
`the moment. Optimum extension
`suited for ubiquitous computing devices such as
`fields (OEFs), which can be parameterized for small
`wireless sensor nodes and RFID tags. Their figures
`CPUs, offer an alternative method for
`fast ECC
`implementations.4 Several hardware implementations
`show that Hight requires approximately the same chip
`for standardized ECC have been suggested, but few
`size as the Advanced Encryption Standard (AES)
`are aimed at low-end devices. Most implementations
`algorithm (3,048 versus 3,400 gate equivalents, or
`focus on speed and, owing to their huge area
`GEs) but is much faster. However, figures for imple-
`requirements, are suitable mostly for server-end appli-
`mentations with a smaller footprint in hardware are not
`cations only.5
`yet available. Clefia was designed with a broader
`application range in mind—to perform well
`in both
`hardware and software implementations. Two ciphers
`especially optimized for software architectures are the
`Tiny Encryption Algorithm (TEA) family and the In-
`ternational Data Encryption Algorithm (IDEA). They
`consist only of arithmetic operations on 16-bit words
`(IDEA) and 32-bit words (TEA).
`IDEA consists of
`addition, XOR addition, and multiplication. The TEA
`family uses only addition, XOR addition, and shifts. In
`both cases, each operation can be implemented
`efficiently on 8-bit platforms. Neither cipher uses
`a substitution box (S-box), so they don’t need much
`memory. The Scalable Encryption Algorithm (SEA) can
`be parameterized according to processor size as well
`as plaintext size and key size; the goal
`is to enable
`efficient implementations on different platforms.
`With respect
`to asymmetric algorithms on small
`processors, Gura et al.3 make a key contribution,
`comparing RSA (Rivest, Shamir, Adleman) and elliptic-
`curve cryptography (ECC) on two different, commonly
`used 8-bit CPUs: AVR (the Atmel ATmega128 platform)
`
`References
`
`1. G. Leander et al., ‘‘New Lightweight DES Variants,’’ Proc.
`
`Fast Software Encryption (FSE 07), LNCS 4593, Springer-
`
`Verlag, 2007, pp. 196-210.
`
`2. A. Bogdanov et al., ‘‘PRESENT: An Ultra-Lightweight Block
`
`Cipher,’’ Proc. Workshop Cryptographic Hardware and
`
`Embedded Systems (CHES 07), LNCS 4727, Springer,
`
`2007, pp. 450-466.
`
`3. N. Gura et al., ‘‘Comparing Elliptic Curve Cryptography and
`
`RSA on 8-bit CPUs,’’ Proc. 6th Int’l Workshop Cryptograph-
`
`ic Hardware and Embedded Systems (CHES 04), Springer-
`
`Verlag, LNCS 3156, 2004, pp. 119-132.
`
`4. A. Woodbury, D.V. Bailey, and C. Paar,
`
`‘‘Elliptic Curve
`
`Cryptography on Smart Cards without Coprocessors,’’
`
`Proc. 4th Working IFIP Conf. Smart Card Research and
`
`Advanced Applications (CARDIS 2000), Kluwer Academic,
`
`2000, pp. 71-92.
`
`5. L. Batina et al., ‘‘Hardware Architectures for Public Key
`
`Cryptography Integration,’’ VLSI J., vol. 34, no. 6, 2003,
`
`pp. 1-64.
`
`such as linear and differential cryptanalysis and the
`Davies-Murphy attack. Thus, DESL achieves a security
`level
`appropriate
`for many
`applications. Key
`whitening can be applied to strengthen the cipher,
`yielding the DESXL cipher, with a security level of
`approximately 118 bits. DESXL requires 2,170 GEs and
`encrypts a plaintext within 144 clock cycles. (Further
`details are available elsewhere.1)
`
`Present. Besides efficiently implementing or slightly
`modifying an established cipher, an alternative for
`lightweight cryptography is to design a new hardware-
`optimized cipher from scratch. We followed this
`approach when designing Present, an SPN-based
`(substitution permutation network) block cipher
`with 32 rounds, a block size of 64 bits, and a key size
`of 80 or 128 bits. The main design philosophy was
`
`524
`
`IEEE Design & Test of Computers
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 3
`Juniper v Huawei
`
`

`

`Figure 1. Data path of the Present implementation. (S: substitution layer; P:
`
`permutation layer.) (Source: Bogdanov et al.2)
`
`simplicity: No part of the cipher was
`added without a good reason, such as
`thwarting an attack. Figure 1 depicts
`Present’s very simple design. Let’s take
`the round counter XOR in the key
`scheduling as an example of how to
`thwart a whole class of attacks with
`minimal
`area overhead. Hardware
`designers favor repetition because it
`lets them reuse parts of the chip and
`hence reduce chip size. However, if the
`key-update functions are similar in each
`round, this property can be exploited by
`related-key attacks. Among the possi-
`bilities for achieving deviating round functions, we
`chose the one that is most hardware efficient and
`simple: adding a round-dependent constant in the key
`scheduling. Because a round counter
`is needed
`anyway,
`this
`constitutes
`no
`additional
`area
`requirements; the XOR requires only 13 GEs.
`Present, like any other SPN, comprises three stages:
`a key-mixing step, a substitution layer, and a permuta-
`tion layer. For the key mixing, we chose a simple XOR
`because this operation can be implemented efficiently
`in both hardware and software. The key schedule
`consists essentially of a 61-bit rotation together with an
`S-box and a round counter. (Present-80 uses a single S-
`box, whereas Present-128 requires two S-boxes.) The
`substitution layer comprises 16 S-boxes with 4-bit
`inputs and 4-bit outputs (4 3 4). We decided to use
`similar S-boxes in both the data path and the key
`scheduling because we learned from DESL that this
`can result in significant area savings when a serialized
`implementation is desired. The choice of 4 3 4 rather
`than 8 3 8 S-boxes was also hardware driven; 8-bit
`S-boxes require about 40 times more area than 4-bit
`S-boxes (1,000 GEs versus 25 GEs). However, 4-bit
`S-boxes must be selected very carefully because they
`are cryptographically weaker
`than 8-bit S-boxes.
`Nevertheless, through careful selection, it’s possible
`to achieve an appropriate security level. The permu-
`tation layer is a very regular and simple bit trans-
`position. It comes virtually free in hardware because it
`is realized by simple wiring and hence needs no
`transistors. The permutation layer ensures that an S-
`box’s four output bits will be distributed to four distinct
`S-boxes in the following round, which ensures the
`avalanche effect. This is required to thwart linear and
`differential cryptanalyses. (Further details are available
`elsewhere.2)
`
`Table 1 compares the implementation results of
`various lightweight ciphers.
`
`Software implementations of symmetric ciphers
`The many design choices within the AES reflect the
`wide discussion of efficient symmetric cryptography
`for software implementations. Yet,
`for constrained
`devices in particular, designers must take into account
`the target platform’s special properties when choosing
`cryptographic algorithms.
`In many areas where cost and energy considera-
`tions dominate, computational power comes in the
`form of a small, inexpensive CPU. By a wide margin,
`8-bit controllers have the largest share of the worldwide
`CPU market. These small microcontrollers are con-
`strained in program memory (flash or ROM), RAM,
`clock speed, register width, and arithmetic capabilities.
`In this context, efficiency means more than simply
`throughput: Resources needed to implement a cipher
`should be kept small.
`In fact,
`in many situations,
`resource efficiency (measured mainly by memory
`consumption) is more critical than throughput, espe-
`cially because many embedded applications encrypt
`only small payloads. Typically, these 8-bit microcon-
`trollers offer as little as tens of kilobytes of program
`memory, and sometimes less than 1 Kbyte of SRAM;
`they usually operate at clock speeds of a few MHz.
`Nevertheless, especially for battery-powered de-
`vices, low computational complexity can be of great
`value too because processing time directly correlates
`with power consumption. Modern microcontrollers
`can enter a variety of power-down and power-saving
`modes as soon as they have finished computation.
`Hence, a fast-executing algorithm can reduce energy
`consumption and lengthen the lifetime of a battery-
`powered device.
`
`November–December 2007
`
`525
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 4
`Juniper v Huawei
`
`

`

`Design and Test of ICs for Secure Embedded Computing
`
`Table 1. Comparison of lightweight ciphers.
`
`Key
`
`bits
`
`80
`
`128
`
`128
`
`128
`
`96
`
`56
`
`184
`
`Cipher
`
`Block ciphers
`
`Present
`
`AES
`
`Hight
`
`Clefia
`
`mCrypton
`
`DES
`
`DESXL
`
`Stream ciphers
`
`Block
`
`bits
`
`Cycles per
`
`Throughput at
`
`Logic
`
`block
`
`100 kHz (Kbps)
`
`process
`
`64
`
`128
`
`64
`
`128
`
`64
`
`64
`
`64
`
`32
`
`1,032
`
`34
`
`36
`
`13
`
`144
`
`144
`
`200.00
`
`12.40
`
`188.20
`
`355.56
`
`492.30
`
`44.40
`
`44.40
`
`0.18 mm
`0.35 mm
`0.25 mm
`0.09 mm
`0.13 mm
`0.18 mm
`0.18 mm
`
`Area
`
`(GEs)
`
`1,570
`
`3,400
`
`3,048
`
`4,993
`
`2,681
`
`2,309
`
`2,168
`
`Trivium5
`
`80
`
`1
`
`0.13 mm
`0.13 mm
`Grain5
`1,294
`100.00
`1
`1
`80
`*AES: Advanced Encryption Standard; DES: Data Encryption Standard; DESXL: lightweight DES with key whitening.
`
`1
`
`100.00
`
`2,599
`
`In this context, we compare the previously
`discussed ciphers that are primarily optimized for
`hardware with several software-friendly ciphers. For
`comparison, we added two software-oriented stream
`ciphers of the eStream project: Salsa20 and LEX.6 The
`latter is a modified AES in which several bytes of the
`intermediate states are extracted and used as a key
`stream. Salsa20, like the Tiny Encryption Algorithm
`(TEA) and the International Data Encryption Algo-
`rithm (IDEA), is based on simple arithmetic opera-
`tions. We chose these ciphers because their consump-
`tion of ROM and RAM is suited for small embedded
`processors. Stream ciphers usually have lengthy setup
`phases. LEX needs only one AES encryption for setup,
`and Salsa20’s setup phase is even shorter. Hence, these
`ciphers can provide efficient encryption of the small
`payloads often found in embedded systems. All the
`discussed ciphers were implemented for 8-bit AVR
`microcontrollers. AVRs are a popular family of 8-bit
`RISC microcontrollers. The ATmega family offers 8
`Kbytes to 128 Kbytes of flash memory and 1 Kbyte to 8
`Kbytes of SRAM. The devices of the ATmega series
`have 32 general-purpose registers with a word size of 8
`bits. Most of the microcontrollers’ 130 instructions are
`one cycle, and the microcontrollers can be clocked at
`up to 16 MHz.
`To keep the source code small, we used a straight-
`forward approach for all our software implementa-
`tions. Only the substitution tables are realized as
`lookup tables (LUTs), where applicable, because this
`provides an enormous speedup for reasonable mem-
`ory consumption. Many fast software implementations
`of ciphers use larger LUTs to achieve a higher
`
`throughput. Unfortunately, this leads to an unaccept-
`able increase in code size for many embedded
`applications. The LUTs are stored in the program
`memory (ROM). TEA, IDEA, and Salsa20 do not use
`substitution tables, which allows for a smaller code.
`For the inversion needed in IDEA, we used a slow but
`extremely small algorithm. This explains the small
`code as well as the huge discrepancy between
`encryption and decryption time.
`in
`The results of our implementations appear
`Table 2. As expected, the software-oriented ciphers
`perform better on our platform. We had problems
`decreasing the code size of Hight, but it still shows
`good encryption performance. Although Present
`shows poor performance, its code, along with that of
`IDEA, is extremely small. LEX is a modified AES cipher,
`yet its code is smaller than that of AES because it lacks
`the decryption part.
`Considering the trade-offs between security, cost,
`and performance,
`the stream ciphers seem to be
`a good choice. LEX and Salsa20 do well in both
`throughput and size, yet they are good choices only if
`the encrypted payload is sufficiently large. Otherwise,
`they produce a computational overhead because of
`their huge block length and their setup phase.
`When code size is extremely critical, TEA, IDEA,
`and even Present seem to be reasonable choices. For
`most other cases, AES again shows its strength in
`software.
`
`Asymmetric ciphers
`there are three
`Among public-key algorithms,
`established families of practical relevance: ECC, RSA
`
`526
`
`IEEE Design & Test of Computers
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 5
`Juniper v Huawei
`
`

`

`Table 2. Comparison of software implementations of ciphers.
`
`Block
`
`Encryption
`
`Throughput
`
`Decryption
`
`Relative
`
`(cycles/
`
`at 4 MHz
`
`(cycles/
`
`throughput
`
`Code
`
`size
`
`SRAM
`
`Relative
`
`size
`
`code size
`
`Key
`
`size
`
`Cipher
`
`(bits)
`
`size
`
`(bits)
`
`block)
`
`(Kbps)
`
`block)
`
`(% of AES)
`
`(bytes)
`
`(bytes)
`
`(% of AES)
`
`Hardware-oriented block ciphers
`
`DES
`
`DESXL
`
`Hight
`
`56
`
`184
`
`128
`
`64
`
`64
`
`64
`
`8,633
`
`8,531
`
`2,964
`
`29.6
`
`30.4
`
`80.3
`
`23.7
`
`8,154
`
`7,961
`
`2,964
`
`11,239
`
`38.4
`
`39.4
`
`104.2
`
`30.7
`
`4,314
`
`3,192
`
`5,672
`
`936
`
`0
`
`0
`
`0
`
`0
`
`152.4
`
`112.8
`
`200.4
`
`33.1
`
`Present
`
`80
`
`64
`
`10,723
`
`Software-oriented block ciphers
`
`AES
`
`IDEA
`
`TEA
`
`SEA
`
`128
`
`128
`
`128
`
`96
`
`128
`
`64
`
`64
`
`96
`
`Software-oriented stream ciphers
`
`6,637
`
`2,700
`
`6,271
`
`9,654
`
`77.1
`
`94.8
`
`40.8
`
`39.7
`
`7,429
`
`15,393
`
`6,299
`
`9,654
`
`100.0
`
`123.0
`
`53.0
`
`51.5
`
`2,606
`
`596
`
`1,140
`
`2,132
`
`224
`
`0
`
`0
`
`0
`
`Salsa20
`
`128
`
`512
`
`18,400
`
`111.3
`
`NA
`
`144.4
`
`1,452
`
`280
`
`304
`1,598
`287.3
`NA
`214.6
`5,963
`320
`128
`LEX
`*IDEA: International Data Encryption Algorithm; TEA: Tiny Encryption Algorithm; SEA: Scalable Encryption Algorithm.
`
`100.0
`
`21.1
`
`40.3
`
`75.3
`
`61.2
`
`67.2
`
`(Rivest-Shamir-Adleman), and discrete logarithms.
`ECC is considered the most attractive family for
`embedded environments because of
`its
`smaller
`operand lengths and relatively lower computational
`requirements. ECC has been accepted commercially
`and has also been adopted by standardizing bodies
`such as the American National Standards Institute
`(ANSI), the IEEE, the International Organization for
`Standardization (ISO),
`the Standards for Efficient
`Cryptography Group (SECG), and the National In-
`stitute of Standards and Technology (NIST).
`
`A lightweight elliptic-curve engine
`Interest
`is growing in stand-alone asymmetric
`cipher engines in small, constrained devices for
`applications such as sensor networks and contactless
`smart cards (for example, e-passports). This interest is
`normally dictated by the need for better performance
`to satisfy a communication protocol or energy
`constraints.
`Here, we present a hardware implementation of
`a low-area, stand-alone, public-key processor
`for
`standardized ECC curves, details of which can be
`found elsewhere.7 We sacrifice flexibility to save area
`by setting the design to fit a specific standardized
`binary-field curve that is quite reasonable for con-
`strained devices. Standardized binary fields that pro-
`vide short-term security (113 bits), as well as fields that
`are required for medium-term security applications
`
`(193 bits), are implemented. For some constrained
`applications, 113-bit fields provide adequate security.
`The main reason for choosing a binary field rather
`than a prime field is the carry-free arithmetic, which is
`well-suited for hardware implementations. A second
`reason is the simplified squaring structure, which is
`a central idea used in the algorithms chosen for the
`processor design.
`
`Inversion. Itoh and Tsujii proposed the construction
`of an addition chain such that the inversion could be
`performed in O(log m) multiplications.8 Although the
`algorithm was proposed for optimal normal-basis
`implementations, where squarings are almost
`free
`(cyclic rotations),
`the area requirement
`for
`the
`squaring structure in our implementation is within
`bounds but has the same timing efficiency of 1 clock
`cycle as in the normal basis. Our algorithm exploits the
`fact that squaring is very efficient in the standard basis
`as long as the field is fixed. It’s easy to show that the
`inverse A21 can then be obtained in (tlog2(m 2 1)s +
`Hw(m 2 1) 2 1) multiplications and (m 2 1)
`squarings using this addition chain, where Hw
`denotes
`the Hamming weight of
`the binary
`representation.
`
`Point multiplication. We used a modified version of
`the Montgomery algorithm for implementing the
`point multiplication. We require one inversion and
`
`November–December 2007
`
`527
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 6
`Juniper v Huawei
`
`

`

`Design and Test of ICs for Secure Embedded Computing
`
`Figure 2. Area-optimized GF(2m) elliptic-curve cryptography (ECC) processor.
`
`four multiplications in each iteration. The algorithm
`also lets us compute each iteration without requiring
`extra temporary memory locations, thus reducing
`area.
`
`ECC processor design. The overall design appears
`in Figure 2. The three units—GF(2 m) addition; GF(2 m)
`multiplication,
`implemented as an MSB-first (most-
`significant-bit) multiplier; and GF(2 m) squaring—are
`closely interconnected inside a single arithmetic unit
`sharing the common input data bus A. The adder
`needs an additional data bus B for the second operand,
`and the multiplier requires a single-bit bi signal for the
`multiplicand. The operands are stored in the memory
`as registers (some of them as cyclic registers) with the
`output being selected for A, B, and bi using
`multiplexers with control signals (Asel, Bsel, and bi_sel)
`from the controller. All
`the operand registers are
`connected in parallel
`to data bus C, with the
`appropriate register being loaded on the basis of the
`controller load signal Cld_reg.
`requires no additional
`Inversion, as described,
`hardware apart from the preexisting multiplying unit
`and squaring unit, with some additional control
`circuitry to enable loading the proper variables to
`the appropriate unit.
`
`The implementation was synthesized for a custom
`ASIC design using AMI Semiconductor 0.35-micron
`CMOS technology. The designs have a total area
`ranging from 10 K GEs for a 113-bit field for short-term
`security to 18 K GEs for a 193-bit field for medium-term
`security applications. This implementation shows that
`an extremely small-area implementation of an ECC
`processor is possible in affine coordinates. Table 3
`shows the ECC processor’s total area in GEs and
`latency in clock cycles for a single scalar multiplica-
`tion and compares them with those of other imple-
`mentations.9-12
`
`Hardware-software codesign for ECC
`An ECC coprocessor, as we’ve defined it, can be
`small; it can also be prohibitively expensive for many
`pervasive applications and can be capable of perfor-
`mance that those applications don’t need. Hardware
`assistance in the form of instruction set extensions
`(ISEs) is more favorable in such situations because the
`cost of extra hardware is quite low compared with that
`of a coprocessor. Here, we present an efficient ISE
`implementation for ECC that
`is a tightly coupled
`hardware and software codesign. As a first step, we
`used a software-only ECC implementation to identify
`the functional elements and code segments that would
`
`528
`
`IEEE Design & Test of Computers
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 7
`Juniper v Huawei
`
`

`

`Table 3. ECC processor performance for scalar multiplication.
`
`Total area
`
`Technology
`
`Frequency
`
`Source
`
`This work
`
`Batina et al.9
`
`Gaubatz et al.10
`
`Wolkerstorfer11
`O¨ tztu¨ rk et al.12
`
`Field
`GF (2113)
`GF (2131)
`GF (2163)
`GF (2193)
`GF (267)2
`GF (2131)
`GF (p100)
`GF (2191)
`GF (p166)
`
`(GEs)
`
`10,113
`
`11,970
`
`15,094
`
`17,723
`
`12,944
`
`14,735
`
`18,720
`
`23,000
`
`30,333
`
`(mm)
`
`0.35
`
`0.35
`
`0.35
`
`0.35
`
`0.25
`
`0.25
`
`0.13
`
`0.35
`
`0.13
`
`(MHz)
`
`13.560
`
`13.560
`
`13.560
`
`13.560
`
`0.175
`
`0.175
`
`0.500
`
`68.500
`
`20.000
`
`Time
`
`(ms)
`
`14.4
`
`18.0
`
`31.8
`
`41.7
`
`2,390.0
`
`430.0
`
`410.5
`
`6.7
`
`31.9
`
`provide efficiency gains if implemented as an ISE.
`Then, a hardware model of
`the new processor
`determined the effects of the new extension on such
`parameters as execution time, code size, and data
`RAM usage.
`We used the AT94K family of field-programmable,
`system-level
`ICs as a development platform. This
`architecture integrates an AVR 8-bit microcontroller
`core and FPGA resources on a single chip. We chose
`the standardized 163-bit elliptic curve over GF(2 m), as
`recommended by NIST and ANSI. We used scalar
`point multiplication over this curve for determining the
`benefits of the ISE.
`The pure software implementation was done first.
`Table 4 includes the point arithmetic performance; it
`shows that GF(2 m) multiplication is the most costly
`operation with respect to execution time and memory
`requirement. Moreover,
`in the point multiplication
`algorithms,
`field multiplications are extremely fre-
`quent and therefore constitute the bottleneck opera-
`
`tion for ECC. A closer analysis of the multiplication
`block shows that most of the time was spent for load
`and store operations because the small number of
`registers available in the AVR processor could not hold
`the large operands. Therefore, a hardware extension
`for this functional block would potentially reduce the
`memory bottleneck and speed up the ECC.
`We present a complete GF(2163) multiplier as an ISE
`requiring the minimum possible area. We implemen-
`ted two multiplier architectures that provide a trade-off
`between performance and the extra area requirement.
`is a 163 3 163 least-significant-bit-first
`The first
`multiplier. The multiplier requires 163 AND gates, 167
`XOR gates, and 489 flip-flops. A 1633 163 multiplica-
`tion computes in 163 clock cycles, excluding data
`input and output. In our implementation, overhead
`from control and memory access leads to a total
`execution time of 313 clock cycles.
`An additional trade-off between area and speed is
`possible using the second option, a digit-serial multipli-
`
`Table 4. ECC scalar point multiplication performance at 4 MHz.
`
`Time
`
`Code size
`
`Data RAM
`
`Precomputation
`
`Field multiplier
`
`logic blocks
`
`multiplier
`
`(s)
`
`(bytes)
`
`(bytes)
`
`(bytes)
`
`Combinational
`
`Point
`
`Software multiplier
`
`163 3 163 multiplier
`
`245
`
`163 3 163, 4 digits
`
`498
`
`*NAF: nonadjacent form.
`
`Binary
`
`NAF
`
`Montgomery
`
`Binary
`
`NAF
`
`Montgomery
`
`Binary
`
`NAF
`
`Montgomery
`
`6.039
`
`5.031
`
`4.140
`
`0.264
`
`0.224
`
`0.169
`
`0.183
`
`0.115
`
`0.113
`
`10,170
`
`10,654
`
`8,208
`
`2,936
`
`3,014
`
`2,048
`
`2,936
`
`3,014
`
`2,048
`
`379
`
`379
`
`358
`
`294
`
`294
`
`273
`
`294
`
`294
`
`273
`
`544
`
`0
`
`0
`
`November–December 2007
`
`529
`
`Authorized licensed use limited to: Ellie Daw. Downloaded on June 16,2020 at 15:31:38 UTC from IEEE Xplore. Restrictions apply.
`
`Juniper Ex. 1043-p. 8
`Juniper v Huawei
`
`

`

`Design and Test of ICs for Secure Embedded Computing
`
`Table 5. Performance of ECC scalar multiplication in software.
`
`Finite-field multiplication
`
`Binary multiplication, 160-bit prime field14
`
`Binary multiplication, 160-bit prime field13
`
`Binary multiplication, 134-bit OEF15
`
`Window multiplication, 134-bit OEF15
`*OEF: optimum extension field.
`
`Platform
`
`ATmega128L
`
`ATmega128L
`
`8,051
`
`8,051
`
`Time (ms)
`
`Time (103 cycles)
`
`810
`
`1,040
`
`8,370
`
`1,830
`
`6,480
`
`8,383
`
`100,440
`
`21,960
`
`the
`In a bit-serial multiplier, only one bit of
`er.
`multiplicand is used in each iteration; however, in
`a digit-serial multiplier, multiple

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket