`for DRM Applications(cid:63)
`
`S. Chow1, P. Eisen1, H. Johnson1, P.C. van Oorschot2
`
`1 Cloakware Corporation, Ottawa, Canada
`2 Carleton University, Ottawa, Canada
`(This research was carried out at Cloakware Corp.)
`{stanley.chow, phil.eisen, harold.johnson}@cloakware.com,
`vanoorschot@scs.carleton.ca
`
`Abstract. For applications such as digital rights management (drm)
`solutions employing cryptographic implementations in software, white-
`box cryptography (or more formally: a cryptographic implementation
`designed to withstand the white-box attack context) is more appropriate
`than traditional black-box cryptography. In the white-box context, the
`attacker has total visibility into software implementation and execution,
`and our objective is to prevent the extraction of secret keys from the
`program. We present methods to make key extraction difficult in the
`white-box context, with focus on symmetric block ciphers implemented
`by substitution boxes and linear transformations. A des implementation
`(useful also for triple-des) is presented as a concrete example.
`
`1 Introduction
`
`In typical software digital rights management (drm) implementations, crypto-
`graphic algorithms are part of the security solution. However, the traditional
`cryptographic model — employing a strong known algorithm, and relying on
`the secrecy of the cryptographic key — is inappropriate surprisingly often, since
`the platforms on which many drm applications execute are subject to the control
`of a potentially hostile end-user. This is the challenge we seek to address.
`A traditional threat model used in black-box symmetric-key cryptography is
`the adaptive chosen plaintext attack model. It assumes the attacker does not
`know the encryption key, but knows the algorithm, controls the plaintexts en-
`crypted (their number and content), and has access to the resulting ciphertexts.
`However, the dynamic encryption operation is hidden — the attacker has no
`visibility into its execution.
`We make steps towards providing software cryptographic solutions suitable in
`the more realistic (for drm applications) white-box attack context: the attacker
`is assumed to have all the advantages of an adaptive chosen-text attack, plus full
`access to the encrypting software and control of the execution environment. This
`includes arbitrary trace execution, examining sub-results and keys in memory,
`
`(cid:63) Version: Oct 15, 2002; Pre-proceedings for ACM DRM-2002 workshop.
`
`APPLE 1013
`
`1
`
`
`
`2
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`performing arbitrary static analyses on the software, and altering results of sub-
`computation (e.g. via breakpoints) for perturbation analysis.
`Our main goal is to make key extraction difficult. While an attacker con-
`trolling the execution environment can clearly make use of the software itself
`(e.g. for decryption) without explicitly extracting the key, forcing an attacker to
`use the installed instance at hand is often of value to drm systems providers. How
`strong an implementation can be made against white-box threats is unknown.
`We presently have no security proofs for our methods. Nonetheless, regardless
`of the security of our particular proposal, we believe the general approach of-
`fers useful levels of security in the form of additional protection suitable in the
`commercial world, forcing an attacker to expend additional effort (compared to
`conventional black-box implementations). Our goal is similar to Aucsmith and
`Graunke’s split encryption/decryption [1]; the solutions differ.
`White-box solutions are inherently (and currently, quite significantly) bulkier
`and slower than black-box cryptography. These drawbacks are offset by ad-
`vantages justifying white-box solutions in certain applications. Software-only
`white-box key-hiding components may be cost-effectively installed and updated
`periodically (cf. Jakobsson and Reiter [8]), whereas smart cards and hardware
`alternatives can’t be transmitted electronically. Hardware solutions also cannot
`protect encryption within mobile code. While white-box implementations are
`clearly not appropriate for all cryptographic applications (see [4]), over time, we
`expect increases in processing power, memory capacity and transmission band-
`width, along with decreasing costs, to ameliorate the efficiency concerns.
`In black-box cryptography, differences in implementation details among func-
`tionally equivalent instances are generally irrelevant with respect to security
`implications. In contrast, for white-box cryptography, changing implementation
`details becomes a primary means for providing security. (This is also true, to
`a lesser extent, for cryptographic solutions implemented on smart cards and
`environments subject to so-called side-channel attacks.)
`In this paper, we focus on general techniques that are useful in producing
`White-Box implementations of Feistel ciphers. We use des (e.g. see [11]) to
`provide a detailed example of hiding a key in the software. des-like ciphers
`are challenging in the White-Box context since each round leaves half the bits
`unchanged and the expansions and permutations boxes are very simple (and
`known). We propose techniques to handle these problems.
`We largely ignore space and time requirements in the present paper, noting
`only that white-box implementations have been successfully used in commercial
`practice. In the present paper we restrict attention to the embedded (fixed) key
`case; dynamic-key white-box cryptography is the subject of ongoing research.
`The motivation for using des is twofold: (1) des needs only linear transforma-
`tions and substitution boxes, simplifying our discussion; and (2) our technique
`readily extends to triple-des which remains popular. We outline a white-box im-
`plementation for aes [5] elsewhere — see Chow et al. [4], to which we also refer
`for further discussion of the goals of white-box cryptography, related literature,
`
`2
`
`
`
`A White-Box DES Implementation
`
`3
`
`and why theoretical results such as that of Barak et al. [2] are not roadblocks to
`practical solutions.
`Following terminology and notation in §2, §3 outlines basic white-box con-
`struction techniques. §4 presents a blocking method for building encoded net-
`works. §5 provides an example white-box des implementation, with a recom-
`mended variant discussed in §5.3. Concluding remarks are found in §6.
`
`2 Terminology and Notation
`
`We follow the terminology of [4]. The main concept is the encoding of a transfor-
`mation, we use this extensively in this paper — we consider a substitution-box
`lookup to be a transformation; we also consider the whole des to be a transfor-
`mation. We use input/output encodings to protect all these transforms.
`
`Definition 1 (encoding) Let X be a transformation from m to n bits. Choose
`an m-bit bijection F and an n-bit bijection G. Call X(cid:48) = G◦X◦F −1 an encoded
`version of X. F is an input encoding and G is an output encoding.
`
`In many places, the transformations have wide inputs and/or outputs (in
`the des construction, some are 96 bits input and output), which means the
`implementation can get very large. To avoid huge tables, we can construct an
`encoding as the concatenation of smaller bijections. Consider bijections Fi of
`size ni, where n1 + n2 + . . . + nk = n. Let (cid:107) denote vector concatenation.
`Definition 2 (concatenated encoding) The function concatenation F1(cid:107)F2(cid:107)
`. . . (cid:107)Fk is the bijection F such that, for any n-bit vector b = (b1, b2, . . . , bn),
`F (b) = F1(b1,. . ., bn1)(cid:107)F2(bn1+1,. . ., bn1+n2)(cid:107) . . . (cid:107)Fk(bn1+...+nk−1+1,. . ., bn). For
`2 (cid:107) . . . (cid:107)F −1
`1 (cid:107)F −1
`such a bijection F , plainly F −1 = F −1
`k .
`Generally, output of a transformation will become the input to another subse-
`quent transformation, which means the output encoding of the first must match
`the input encoding of the second.
`
`Definition 3 (networked encoding) A networked encoding for computing Y
`◦ X (i.e. transformation X followed by transformation Y ) is an encoding of the
`form
`Y (cid:48) ◦ X(cid:48) = (H ◦ Y ◦ G−1) ◦ (G ◦ X ◦ F −1) = H ◦ (Y ◦ X) ◦ F −1 .
`
`In the subsequent sections, it is often useful to keep track of the sizes of the
`inputs and outputs. We use the following notations.
`P (cid:48) denotes an encoded implementation derived from the function P . n
`mP de-
`notes P , emphasizing that it maps m-vectors to n-vectors. For a matrix M, n
`mM
`denotes explicitly that M has m columns and n rows. (Interpreting application
`of M to a vector as function application, this notation is the same as above,
`though there is not necessarily a matrix representation M for every arbitrary
`function P .)
`
`3
`
`
`
`4
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`I is the identity function on k-vectors. n
`mE (a mnemonic for an entropy-
`transference function) is any function from m-vectors to n-vectors such that, if
`m ≤ n, the mapping loses no bits of information, and if m > n, it loses at most
`n − m bits.
`(cid:104)e1, e2, e3, . . . , ek(cid:105) is a k-vector with elements ei; it will be evident from
`context whether the elements are bits. x(cid:107)y is the concatenation of vectors x
`and y. vi is the ith element. vi··j is the sub-vector containing elements i through
`j. kv denotes explicitly that v has k elements. ke is any vector with k elements
`(mnemonically: an entropy-vector). ⊕ denotes bitwise xor.
`at (mnemonic: affine transformation) denotes a vector-to-vector transforma-
`
`tion function P which can be defined for all me by nmP (me) = n
`mMme + nd, or
`P (e) = Me + d, where M is a constant matrix and d is a constant displacement
`vector. We consider ats over gf(2). Note that if A and B are ats, then so are
`A(cid:107)B and A ◦ B (where defined).
`
`kk
`
`3 Producing Encoded Implementations
`
`As is well known, des consists of “permutations”, S-Box lookups and xor oper-
`ations. We will follow our approach of applying encodings to each of these steps.
`For S-Box lookups and xor operations, encoding each operation (along with its
`input and output) seem to produce good security. For the various permutations,
`the problem is more difficult.
`These permutations are, by nature, very simple; and it is difficult to hide the
`information being manipulated. We address this by changing the domain from
`permutations to linear algebra so that we have more tools at our disposal. We
`start by expressing each of the des permutations and the bitwise xor operations
`as ats. Clearly, the resultant ats are still very simple and do not hide information
`well, but subsequent use of non-linear encoding (see 4) will make such hiding
`much more effective.
`
`3.1 Some techniques
`
`In general, we are working towards an implementation consisting entirely of
`substitution boxes, none of which implement ats. Several intermediate methods
`are needed to transform a normal cipher implementation to such a form. We
`describe such intermediate methods below. We will use a subset of these in our
`des example.
`
`Partial Evaluation If part of our input to P is known at implementation
`creation time, we can simply put in the known values to P (cid:48) and pre-evaluate all
`constant expressions. For example, in the present case where the key is known
`in advance, we can pre-evaluate all the operations involving the key. In the case
`of des this essentially means we replace the standard S-Boxes with key-specific
`S-Boxes.
`
`4
`
`
`
`A White-Box DES Implementation
`
`5
`
`Mixing Bijection A mixing bijection is a bijective at which attempts to maxi-
`mize the dependency of each output bit on all input bits. (Clearly, it is invertible
`and the inverse is also an at.)
`In des, for example, the permutations, represented as ats, have very sparse
`matrices (one or two 1-bits per row or column). In order to diffuse information
`over more bits, we can represent such a permutation P by J ◦ K, where K is
`a mixing bijection and J = P K−1, thereby replacing a sparse matrix with two
`non-sparse ones (which is advantageous in subsequent de-linearizing encoding
`steps).
`
`mP , where m and n are large, cannot
`I/O-Blocked Encoding An arbitrary n
`simply be encoded using two arbitrary bijective encodings as P (cid:48) = G ◦ P ◦ F −1
`using a substitution-box representation, since the size of a substitution-box varies
`exponentially with the number of input bits.
`We can produce practically implementable encodings for such a P by dividing
`its input into j blocks of a bits each, and its output into k blocks of b bits each, so
`
`mJ and nnK be two mixing bijections (see above).
`that m = ja and n = kb. Let m
`We choose (arbitrary) coding bijections for each block of the input and out-
`bGk. We then define FP = (F1(cid:107) · · · (cid:107)Fj) ◦ J and
`put: a
`aFj and b
`aF1, . . . , a
`bG1, . . . , b
`GP = (G1(cid:107) · · · (cid:107)Gk) ◦ K. Then P (cid:48) = GP ◦ P ◦ F −1
`as usual.
`P
`(We still have the problem of how wide-input ats are to be represented by
`networks of substitution boxes. Methods for this are described in §4.)
`This permits us to use networked encoding (def. 3) with a ‘wide I/O’ linear
`function in encoded form, since, prior to encoding, as a preliminary step, we
`only need to deal with J and K (i.e., we replace P with K ◦ P ◦ J−1), which
`
`i s and G(cid:48)is which we add
`can be done using the smaller blocking factors of the F (cid:48)
`during encoding. That is, if the input to P is provided by an at X, and the
`output from P is used by an at Y , we would use J ◦ X and Y ◦ K−1 instead.
`Then the input and output coding of the parts can ignore J and K — they have
`already been handled — and deal only with the concatenated non-linear partial
`I/O encodings F1(cid:107) · · · (cid:107)Fj and G1(cid:107) · · · (cid:107)Gk, which conform to smaller blocking
`factors easily handled by substitution boxes. This easily extends to non-uniform
`I/O blocked encoding (where blocks vary in size).
`
`Combined Function Encoding For functions P and Q that happen to be
`evaluated together, we could choose an encoding of P(cid:107)Q such as G◦(P(cid:107)Q)◦F −1.
`Essentially, we combine P and Q into a single function, then encode the combined
`input and output. The encoding mixes P ’s input and output entropy with Q’s,
`making it harder for an attacker to separate and determine the components P
`and Q. Note that this differs from concatenated encoding (def. 2) in how the
`encoding is applied. Here, the encoding applies to all components as a single
`unit.
`
`By-Pass Encoding In general, we want the implementation of each transform
`to have extra entropy at both the input and output, so that it is more difficult to
`
`5
`
`
`
`6
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`identify the transform. For example, for a transform n
`mP , we carry a extra bits
`of entropy at the input, and b extra bits of entropy at the output, where a ≥ b,
`aE the by-pass componentm+aP (cid:48) as G ◦ (P(cid:107) baE) ◦ F −1. We call b
`
`
`we can encode n+b
`of P (cid:48). (If b
`
`aE has the specific form aaI, so that a = b, we call it identity by-pass.)
`
`Split-Path Encoding For a function nmP , we may use an encoding that is
`
`
`really a concatanation of two separate encodings. That is, we define n+km Q(me) =
`P (me)(cid:107) k
`mR(me) for all me, for some fixed function R. The effect is that, if P is
`lossy, Q may lose less information (or no information). We sometimes use this
`method to achieve local security (see §3.2.)
`
`Output Splitting This technique is useful for disguising outputs where input
`information can be well hidden. This does not appear to be the case for des:
`for implementations of des, output splitting is not recommended since it cannot
`provide much security.
`Where the technique is appropriate, we may encode a function P as k ≥ 2
`functions, P1, P2, . . . , Pk, where the encoded implementation for each part can
`mix in additional entropy as described above, and where the output of all of the
`encoded Pi’s is needed to determine the original output of P .
`mP1 to be a
`mP , we can choose k = 2, define n
`For example, given a function n
`mP2(me) = P (me) ⊕ P1(me) for all me.
`mE, and define n
`randomly selected fixed n
`At this point, we can compute the P output from the xor of the outputs of
`the two Pi’s. However, after we then independently encode the Pi’s, the output
`1 and P (cid:48)
`of P (cid:48)
`2 is not combinable via an at into information about P ’s output.
`
`3.2 Substitution Boxes and Local Security
`
`We can represent any function n
`mP by a substitution box (S-Box): an array of
`2m entries, each of n bits. To compute P (x), find the array entry indexed by the
`binary magnitude x. The exponential growth in S-Box size with its input width
`limits S-Boxes to the representation of narrow input functions.
`When the underlying P is bijective, the encoded S-Box for P (cid:48) is locally secure
`in the sense that it is not possible to extract any useful information by examining
`the encoded S-Box. The reason is that given a S-Box for P (cid:48), every possible
`bijective P fits (just as in an One-Time Pad, every possible plaintext can result
`in every possible ciphertext). Of course, this just means that an attack must be
`non-local.
`The lossy case is not locally secure. When a slightly lossy encoded function is
`represented as an S-Box, some information about the function beyond its input
`and output widths can be found by examining its S-Box. While it still requires
`non-local attacks to completely unravel such a lossy box, it often leaks enough
`information to allow attacks such as the statistical bucketing attack (see §5.4).
`
`6
`
`
`
`4 Wide-Input Encoded ATs: Building Encoded Networks
`
`A White-Box DES Implementation
`
`7
`
`(cid:80)n#
`
`(cid:80)m#
`
`Clearly, if we want to construct a S-Box with wide-input, say 32 bits or even
`96 bits, we will quickly use immense amounts of storage. This means we cannot
`represent a wide-input encoded at by an S-Box. We can, however, construct
`networks of S-Boxes to implement a wide-input encoded at. The following con-
`struction handles ats in considerable generality, including compositions of ats,
`
`mA encoded as nmA(cid:48). The form of the
`and for a wide variety of ats of the form n
`network can remain invariant except for variations in the bit patterns within its
`S-Boxes.
`For an at A, we simply partition the matrix and vectors into blocks, giving
`us well-known formulas using the blocks from the partition which subdivide the
`computation of A. We can then use (smaller) S-Boxes to encode the functions
`defined by the blocks, and combine the result into a network, using the methods
`in §3.1 above, so that the resulting network is an encoding of A.
`mM me + nd for all me. We choose
`
`Consider an at A, defined by nmA(me) = n
`partition counts m# and n# and sequences (cid:104)m1, . . . , mm#(cid:105) and (cid:104)n1, . . . , nn#(cid:105),
`such that
`1 ni = n. That is, the former sequence (the m-
`1 mi = m and
`partition) is an additive partition of m, and the latter sequence (the n-partition)
`is an additive partition of n.
`The m-partition partitions the inputs and the columns of M; the n-partition
`partitions d and the outputs. Hence the i, jth block in partitioned M contains
`mi columns and nj rows, the ith partition of the input contains mi elements,
`and the jth partition of d or the output contains nj elements.
`At this point, it is straightforward to encode the components (of the network
`forming A) to obtain an encoded network, by the methods of §3.1, and then
`representing it as a network of S-Boxes (see §3.2.) In such a network,none of
`the subcomputations is linear: each is encoded and represented as a non-linear
`S-Box.
`A naive version of this consists of a forest of n# trees of binary ‘vector add’
`S-Boxes, with m#(m# − 1) ‘vector add’ nodes per tree. At the leaves are m#
`unary ‘constant vector multiply’ nodes, and at the root is either a binary ‘vector
`add’ node (if there is no displacement) or a unary ‘constant vector add’ node (if
`there is a displacement).
`However, we can eliminate the unary ‘constant vector add’ and ‘constant
`vector multiply’ nodes entirely. We simply compose them into their adjacent
`binary ‘vector add’ nodes, thereby saving some space by eliminating their S-
`Boxes.
`A potential weakness of this entire approach is that the blocking of A may
`produce blocks, such as zero blocks, which convert to S-Boxes whose output
`contains none, or little, of their input information. This narrows the search space
`for an attacker seeking to determine the underlying at from the content and
`behavior of the network. However, so far as we have yet determined, such blocked
`implementations remain combinatorially quite difficult to crack, especially if we
`apply the following proposal.
`
`7
`
`
`
`8
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`To address the above potential weakness, instead of encoding n
`mA, find trans-
`
`forms nmA1 and mmA2, such that A2 is a mixing bijection (see §3.1), and A1 =
`
`A◦ A−1
`2 . Encode the two functions separately into networks of S-Boxes, and con-
`2 representation to the inputs of the A(cid:48)
`nect the outputs of the A(cid:48)
`1 representation,
`
`1 ◦ A(cid:48)2 = A(cid:48).
`thus creating a representation of A(cid:48)
`While the above proposal helps, it is not easy, in general, to eliminate m × n
`blocks which lose more bits of input information than the minimum indicated
`knM into k × k blocks, we
`by m and n. For example, if we partition a matrix kn
`cannot guarantee that all of the k × k blocks are non-singular, even if the rank
`of M is greater than k. Hence if M is non-singular, a partition of M into square
`blocks may contain some singular (lossy) blocks.
`Therefore, some information about an encoded at may leak in its represen-
`tation as a blocked and de-linearized network of S-Boxes when this blocking
`method is used.
`
`5 A White-Box DES Implementation Example
`
`We now construct an embedded, fixed-key des implementation. We will start
`with a simple construction with weaknesses, both in security and efficiency. We
`discuss how to address these in §5.3.
`
`5.1 Replacing the DES SBs
`
`des is performed in 16 rounds, each employing the same eight des S-Boxes
`(dsbs), S1, . . . S8, and the same ats, sandwiched between initial and final ats
`(the initial and final permutations). Each dsb is an instance of 4
`6E (see e.g. [11]).
`
`In Figure 1(a), we see an unrolling of two typical des rounds. The round
`structure implements a Feistel network with a by-pass left-side data-path (Lr−1,
`Lr, Lr+1 in the figure) and an active right-side data-path (everything else in the
`figure). Kr is the round subkey of round r.
`???Check change fig 1b to be unrolled to match fig 1a??? The encoded im-
`plementation will look like Figure 1(b), where each round is represented by 12
`T boxes. Between rounds, the left and right sides are combined into a single 96
`bit representation and we uses a single rM2 transform to to subsume the P-Box,
`xor, side flip, and E-box. (See The Transfer Functions in §5.2 for a more
`detailed description.)
`We also need M1 to do an initial expansion of the input from 64 bits to the
`internal 96 bits form and M3 for the final shrink of the output.
`
`Eliminating the Overt Key by Partial Evaluation In each round, a dsb’s
`input is the xor of ‘unpredictable’ information (i.e. data), and ‘predictable’ infor-
`mation, determined by the algorithm and the key. We can merge the ‘predictable’
`information and the dsbs into new S-Boxes that are dependent on the key and
`round.
`
`8
`
`
`
`A White-Box DES Implementation
`
`9
`
`Fig. 1. Original and Modified DES
`
`96
`
`8
`
`8
`
`8
`
`r−1
`K T1
`
`r−1
`K T2
`
`r−1
`K T3
`
`. . .
`
`8
`
`8
`
`8
`
`8
`
`r−1
`K T12
`
`8
`
`96
`
`r−1
`M
`
`2
`
`2
`
`8
`
`T12
`
`r K
`
`8
`
`. . .
`
`96
`
`96
`
`8
`
`8
`
`8
`
`Tr
`K 1
`
`Tr
`K 2
`
`Tr
`K 3
`
`8
`
`8
`
`8
`
`L r−1
`
`rR −1
`
`Expansion
`
`r−1K
`
`...
`
`S1
`
`S8
`
`P−Box
`
`L r
`
`rR
`
`Expansion
`
`rK
`
`...
`
`S1
`
`S8
`
`P−Box
`
`L r+1
`
`rR +1
`
`} }}
`
`−1r
`TK
`
`r−1
`M
`
`Tr
`
`K
`
`(a) Two Rounds of DES
`
`(b) Modified DES Before
`De−Linearization and Encoding
`
`9
`
`
`
`10
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`Let us therefore produce new S-Boxes identified as r
`KSi, where K is the
`encryption key, r is the round number, and i is the corresponding dsb number,
`KSi yields the same result as Si would produce in
`such that, for any given input, r
`round r if the des key were K, but the xors of the inputs of the original dsbs have
`been eliminated (see partial evaluation in §3.1). Each of the 16 × 8 = 128
`
`Si’s is still in 46E form of 6 input bits and 4 output bits.
`At this point, the overt key K has disappeared from the algorithm: it is
`
`represented in the contents of the rKSi’s. This permits us to remove the xors
`(“⊕”) with the inputs to S1, . . . , S8 shown in Figure 1(a).
`
`rK
`
`Preparing the Modified DSBs for Local Security In grey-box (smart card)
`implementations of des the dsbs have proven to be effective sites for statistical
`attacks. To make such attacks more difficult for our white-box implementation,
`we would like to end up with locally secure (see §3.2) S-Boxes. This means we
`need to replace the lossy S-Boxes with something that is bijective. We convert
`8E form by means of split-path encoding (see §3.1):
`
`the lossy rKSi’s into 8
`KSi(8e1··6)(cid:107) R(8e)
`Ti(8e) = r
`for all 8e, for the fixed key K, for round r = 1, . . . , 16, for S-Box number
`i = 1, . . . , 8, where we define R(8e) = (cid:104) 8e1, 8e6, 8e7, 8e8 (cid:105) for all 8e. This is an
`application of split-path encoding
`The plan is that the first six bits of the input of a rKTi will be the 6-bit input
`
`to dsb i in round r. We then add two extra input bits. The left 4-bit half of
`
`the output of a rKTi is the output of dsb i in round r, and the right 4-bit half
`contains the first and last input bits of dsb i in round r followed by the two
`extra input bits. That is, the right half of the output contains copies of four of
`the input bits.
`To see that each rKTi is a bijection, we note that the function Fa,b,c,d defined
`
`KTi((cid:104)a(cid:105)(cid:107)4e(cid:107)(cid:104)b, c, d(cid:105)) is a bijec-
`for any constant bits a, b, c, d by Fa,b,c,d(4e) = r
`tion. (Every row of every dsb contains a permutation of (cid:104)0, . . . , 15(cid:105), with the
`row selected by the bits corresponding to a, b above. The xor with the relevant
`bits of the key K effectively re-orders this permutation into a new permutation.
`The output of Fa,b,c,d is therefore a bijection mapping the 4e according to a 1-to-
`
`1 mapping of the input space determined by a permutation. Since rKTi simply
`KTi preserves all of its
`copies the bits corresponding to a, b, c, d to the output, r
`input entropy; i.e., it is a bijection.)
`
`rK
`
`Providing 64 bits of By-Pass Capacity In our construction, we want to
`hide the difference between the left and right sides of the Feistel data-path, so
`each rM2 expects more than just the 32 bits of S-Box outputs, both the left and
`(unchanged) right sides are needed. We refer to this as needing 64 bit of by-pass.
`As converted above, each r
`KTi carries eight bits to the next rM2: 4 bits of
`S-Box output, 2 bits from the right side and 2 bits that we can choose to be
`from the left. This means eight T boxes will only carry 16 bits from the left and
`
`10
`
`
`
`A White-Box DES Implementation
`
`11
`
`16 bits from the right. This means the by-pass capacity of the r
`KTi’s is too small
`by 32 bits.
`So we add four more S-Boxes for each round, designated as r
`KT9, . . . ,rK T12.
`
`Each is a bijective at of 8 bits to 8 bits. These extra S-Boxes are at’s to make it
`easier to access the bypassed bits for subsequent processing. (Subsequent steps
`will de-linearize every S-Box, so use of ats for these by-pass paths need not
`compromise security.) These new S-Boxes provide the remaining 32 bits: 16 bits
`of right-side by-pass capacity, and 16 bits of left-side by-pass capacity.
`
`5.2 Connecting and Encoding the New SBs to Implement DES
`
`The overall data-flow structure of our des implementation immediately prior
`to de-linearization of ats and encoding of S-Boxes (see §3.1, §3.2), is shown in
`Figure 1(b). It would look just the same after de-linearization and encoding,
`except that each Mi would be replaced by a corresponding M(cid:48)
`i and each r
`KTi
`KT(cid:48)
`i. Except for the addition of these “ (cid:48) ”
`would be replaced by a corresponding r
`characters, the figure would be identical.
`
`Data-Flow and Algorithm Before de-linearization and encoding, each Mi
`96M3, respectively.
`96M2, and 64
`64M1, 96
`is representable as a matrix, with forms 96
`(We briefly discuss the role of each one in §5.1 and in more detail below in The
`Transfer Functions.)
`In Figure 1(b), italic numbers such as 8 and 64 denote the length of the
`vectors traversing the data path to their left. Arrows represent data-paths and
`indicate their direction of data-flow.
`KTi’s in order by i in Figure 1(b) does not indicate
`The appearance of rows of r
`any ordering of their appearance in the implementation: the intervening M2
`transformations can handle any such re-ordering.
`
`The Transfer Functions In constructing M1, M2’s, and M3, we must deal
`with the sparseness of the matrices for the ats used in standard des. The bit-
`reorganizations, such as the Expansion and P-box transforms appearing in
`Figure 1(a), are all 0-bits except for one or two 1-bits in each row and column.
`The xor operations (“⊕” in Figure 1(a)) are similarly sparse.
`Therefore, we use the second method proposed for handling sparseness in §4:
`doubling the implementations into two blocked implementations, with the initial
`portion of each pair being a mixing bijection. We will regard this as part of
`the encoding process, and discuss the nature of the Mi’s prior to this ‘anti-
`sparseness’ treatment.
`The following constructions all involve only various combinations, compo-
`sitions, simple reorganizations, and concatenations of ats, and are therefore
`straightforward.
`M1 combines the following: (1) the initial permutation of des, (2) the Ex-
`pansion (see Figure 1(a)), modified to deliver its output bits to the first six in-
`KTi, combined with (3) the delivery of the 32 left-side data-path bits
`puts of each r
`
`11
`
`
`
`12
`
`S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot
`
`to be passed through the by-pass provided by inputs 7 and 8 of r
`KT1, . . . ,r
`K T8
`and 16 bits of by-pass provided at randomly chosen positions in the four ‘dum-
`mies’, r
`K T12, all in randomly chosen order.
`KT9, . . . ,r
`M2 for each round combines the following: (1) the P-box transform (see
`Figure 1(a)), (2) the xor of the left-side data with the P-box output, (3) ex-
`traction of the original input of the right-side data-path, (4) the Expansion, as
`in M1, (5) the left-side by-pass, as in M1.
`M3 combines the following: (1) ignoring the inputs provided for simultaneous
`by-pass, (2) the left-side by-pass, as in M1, (3) inversion of the Expansion,
`ignoring half of each redundant bit pair, (4) swapping the left-side and right-
`side data (des effectively swaps the left and right halves after the last round),
`and (5) the final permutation.
`
`Blocking and Encoding Details We recommend using 4× 4 blocking for the
`Mi’s. As a result of the optimization noted in §4, this means that the entire
`implementation consists entirely of networked 8 × 4 (‘vector add’) and 8 × 8
`KT(cid:48)
`(r
`i) S-Boxes.
`Aside from M1’s input coding and M3’s output coding, both of which are
`64I (appropriately blocked), all S-Boxes are input- and output-coded
`simply 64
`using the method of §3.1 in order to match the 4-bit blocking factor required for
`each input by the binary ‘vector add’ S-Boxes.
`
`5.3 Recommended Variants
`
`The above section completes a naked variant of white-box des. The recommended
`variant applies input and output encodings to the whole des operations. That is,
`we modify the scheme shown in Figure 1(b), so that M1 is replaced by M1 ◦ M0
`and M3 is replaced by M4◦M3, where the M0 and M4 ats are mixing bijections.
`Each of M1 ◦ M0 and M4 ◦ M3 is, of course, a single at. When it is encoded in
`4-bit blocks, it becomes non-linear.
`One issue that arises is whether this recommended variant of des (or other ci-
`phers) is still an implementation of the standard algorithm. Although it employs
`an encoded input and output, we can pre- and post-process the input to this
`computation by the inverses of the pre- and post-encodings, to effectively cancel
`both. One might refer to this as operating on de-encoded intext and outtext. The
`de-encoding process can be done in any one or a combination of several places,
`for example: the software immediately surrounding the cryptographic computa-
`tion; more distant surrounding software; or even software executing on a separate
`node (with obvious coordination required). T