Lecture Notes in
Computer Science

1008

Bart Preneel (Ed.)

Fast Software Encryption

Second International Workshop
Leuven, Belgium, December 1994
Proceedings

Springer

Page 1 of 24

SAMSUNG EXHIBIT 1007
Lecture Notes in Computer Science
Edited by G. Goos, J. Hartmanis and J. van Leeuwen

1008

Advisory Board: W. Brauer   D. Gries   J. Stoer

Bart Preneel (Ed.)

Fast Software Encryption

Second International Workshop
Leuven, Belgium, December 14-16, 1994
Proceedings

Springer
`

`Series Editors
`Gerhard Goos
`Universitat Karlsruhe
`Vincenz-Priessnitz-Stra8e 3, D-76128 Karlsruhe, Germany
`
`Juris Hartmanis
`Department of Computer Science, Cornell University
`4130 Upson Hall, Ithaca, NY 14853, USA
`Jan van Leeuwen
`
`Volume
`
`tlA,b
`lq
`tA1__5T~3-r
`(Cf'q5
`o O p1 I
`
`Department Ele
`M, Katholieke Universiteit Leuven
`Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium
`
Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Fast software encryption : second international workshop, Leuven, Belgium, December 14-16, 1994 ; proceedings / Bart Preneel (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Tokyo : Springer, 1995
(Lecture notes in computer science ; Vol. 1008)
ISBN 3-540-60590-8
NE: Preneel, Bart [Hrsg.]; GT
CR Subject Classification (1991): E.3, F.2.1, E.4, G.2.1, G.4

ISBN 3-540-60590-8 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1995
Printed in Germany

Typesetting: Camera-ready by author
SPIN 10487173    06/3142 - 5 4 3 2 1 0

Printed on acid-free paper
CONTENTS

Introduction .......................................................... 1
Bart Preneel

Session 1: Stream Ciphers-Design
Chair: Dieter Gollmann

Clock-controlled pseudorandom generators on finite groups ............. 6
Ulrich Baum and Simon Blackburn

On random mappings and random permutations ........................... 22
William G. Chambers

Binary cyclotomic generators ......................................... 29
Cunsheng Ding

Construction of bent functions and balanced Boolean functions
with high nonlinearity ............................................... 61
Hans Dobbertin

Additive and linear structures of cryptographic functions ............ 75
Xuejia Lai

Session 2: Block Ciphers-Design
Chair: James Massey

The RC5 encryption algorithm ......................................... 86
Ronald L. Rivest

The MacGuffin block cipher algorithm ................................. 97
Matthew Blaze and Bruce Schneier

S-boxes and round functions with controllable linearity and
differential uniformity ............................................. 111
Kaisa Nyberg

Properties of linear approximation tables ........................... 131
Luke O'Connor

Session 3: Stream Ciphers-Cryptanalysis
Chair: Cunsheng Ding

Searching for the optimum correlation attack ........................ 137
Ross Anderson

A known plaintext attack on the PKZIP stream cipher ................. 144
Eli Biham and Paul C. Kocher

Linear cryptanalysis of stream ciphers .............................. 154
Jovan Dj. Golić

Feedback with carry shift registers over finite fields .............. 170
Andrew Klapper

A free energy minimization framework for inference problems
in modulo 2 arithmetic .............................................. 179
David J.C. MacKay

Session 4: Block Ciphers-Differential Cryptanalysis
Chair: Eli Biham

Truncated and higher order differentials ............................ 196
Lars R. Knudsen

SAFER K-64: One year later .......................................... 212
James L. Massey

Improved characteristics for differential cryptanalysis of hash functions
based on block ciphers .............................................. 242
Vincent Rijmen and Bart Preneel

Session 5: Block Ciphers-Linear Cryptanalysis
Chair: Bart Preneel

Linear cryptanalysis using multiple approximations and FEAL ......... 249
Burton S. Kaliski Jr. and Matt J.B. Robshaw

Problems with the linear cryptanalysis of DES using more than one
active S-box per round .............................................. 265
Uwe Blöcher and Markus Dichtl

Correlation matrices ................................................ 275
Joan Daemen, René Govaerts, and Joos Vandewalle

Session 6: Odds and Ends
Chair: Ross Anderson

On the need for multipermutations: Cryptanalysis of MD4 and SAFER ... 286
Serge Vaudenay

How to exploit the intractability of exact TSP for cryptography ..... 298
Stefan Lucks

Session 7: New Algorithms and Protocols
Chair: Eli Biham

How to reverse engineer an EES device ............................... 305
Michael Roe

A fast homophonic coding algorithm based on arithmetic coding ....... 329
Walter T. Penzhorn

Session 8: Recent Results
Chair: Bart Preneel

On Fibonacci keystream generators ................................... 346
Ross Anderson

Cryptanalysis of MacGuffin .......................................... 353
Vincent Rijmen and Bart Preneel

Performance of block ciphers and hash functions - one year later .... 359
Michael Roe

TEA, a tiny encryption algorithm .................................... 363
David J. Wheeler and Roger M. Needham

Author Index ........................................................ 367

This material may be protected by Copyright law (Title 17 U.S. Code)

A Free Energy Minimization Framework for
Inference Problems in Modulo 2 Arithmetic

David J.C. MacKay*

Cavendish Laboratory
Cambridge CB3 0HE
United Kingdom

mackay@mrao.cam.ac.uk
Abstract. This paper studies the task of inferring a binary vector s given noisy observations of the binary vector t = As modulo 2, where A is an M x N binary matrix. This task arises in correlation attack on a class of stream ciphers and in the decoding of error correcting codes. The unknown binary vector is replaced by a real vector of probabilities that are optimized by variational free energy minimization. The derived algorithms converge in computational time of order between w_A and N w_A, where w_A is the number of 1s in the matrix A, but convergence to the correct solution is not guaranteed.
Applied to error correcting codes based on sparse matrices A, these algorithms give a system with empirical performance comparable to that of BCH and Reed-Muller codes.
Applied to the inference of the state of a linear feedback shift register given the noisy output sequence, the algorithms offer a principled version of Meier and Staffelbach's (1989) algorithm B, thereby resolving the open problem posed at the end of their paper. The algorithms presented here appear to give superior performance.
1 The problem addressed in this paper

Consider three binary vectors: s with components s_n, n = 1 ... N, and r and n with components r_m, n_m, m = 1 ... M, related by:

    (As + n) mod 2 = r                                                  (1)

where A is a binary matrix. Our task is to infer s given r and A, and given assumptions about the statistical properties of s and n.
This problem arises, for example, in the decoding of a noisy signal transmitted using a linear code A. As a simple illustration, the (7,4) Hamming code takes

* Supported by the Royal Society Smithson Research Fellowship

180

D.J.C. MacKay

N = 4 signal bits, s, and transmits them followed by three parity check bits. The M = 7 transmitted symbols are given by t = As mod 2, where

    A = [ 1 0 0 0 ]
        [ 0 1 0 0 ]
        [ 0 0 1 0 ]
        [ 0 0 0 1 ]
        [ 1 0 1 1 ]
        [ 1 1 0 1 ]
        [ 1 1 1 0 ]

The noise vector n describes the corruption of these bits by the communication channel. The received message is r = t + n mod 2. The receiver's task is to infer s given r and the assumed noise properties of the channel. For the Hamming code above this is not a difficult task, but as N and M become large and as the number of 1s in the matrix A increases, the inference of s in a time less than exponential in N becomes more challenging, for general A. Indeed, the general decoding problem for linear codes is NP-complete (Berlekamp, McEliece and van Tilborg 1978).
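The encoding step for this example can be spelled out in a few lines. This is an illustrative sketch, not code from the paper; the function name `encode` is mine.

```python
# The (7,4) Hamming generator matrix given above: four identity rows
# (the signal is transmitted as-is) followed by three parity-check rows.
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]

def encode(s):
    """Return t = A s mod 2 as a list of M = 7 bits."""
    return [sum(a * b for a, b in zip(row, s)) % 2 for row in A]

t = encode([1, 0, 1, 1])
# The first four bits of t are the signal itself (a systematic code).
```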
The problem defined in equation (1) also arises in the inference of the sequence of a linear feedback shift register (LFSR) from noisy observations (Meier and Staffelbach 1989, Mihaljević and Golić 1992, Mihaljević and Golić 1993, Anderson 1995).
This paper presents a fast algorithm for attempting to solve these tasks. The algorithm is similar in spirit to Gallager's (1963) soft decoding algorithm for low-density parity check codes.
Assumptions

I assume that the prior probability distribution of s and n is separable, i.e., P(s, n) = P(s)P(n) = Π_n P(s_n) Π_m P(n_m). Defining the transmission t(s) = As mod 2, the probability of the observed data r as a function of s (the 'likelihood function') can be written:

    P(r|s, A) = Π_m e_m^{t_m(s)} (1 - e_m)^{1 - t_m(s)},

where e is a vector of probabilities indexed by m given by e_m = P(n_m = 1) if r_m = 0 and e_m = P(n_m = 0) if r_m = 1. This likelihood function is fundamental to any probabilistic approach. The log likelihood can be written:

    log P(r|s, A) = Σ_m t_m(s) log[e_m / (1 - e_m)] + const.            (2)
                  = Σ_m t_m(s) g_m(e_m) + const.                        (3)

where g_m(e_m) = log[e_m / (1 - e_m)]. Thus the natural norm for measuring the distance between t and e is Σ_m t_m g_m(e_m).
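The quantities e_m and g_m(e_m) are straightforward to compute. The sketch below is my own, not the paper's; it assumes a binary symmetric channel with flip probability f, so that P(n_m = 1) = f for every m.

```python
import math

def log_likelihood(t, r, f):
    """log P(r|t) up to an additive constant: sum_m t_m * g_m(e_m),
    where e_m = P(n_m = 1) = f if r_m = 0 and e_m = P(n_m = 0) = 1 - f
    if r_m = 1, for a binary symmetric channel with flip probability f."""
    total = 0.0
    for t_m, r_m in zip(t, r):
        e_m = f if r_m == 0 else 1.0 - f
        total += t_m * math.log(e_m / (1.0 - e_m))
    return total
```

A transmission t that agrees with r scores higher than one that disagrees, as expected.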
A Free Energy Minimization Framework for Inference Problems

181

The task is to infer s given A and the data r. The posterior distribution of s is, by Bayes's theorem:

    P(s|r, A) = P(r|s, A) P(s) / P(r|A).                                (4)

I assume our aim is to find the most probable s, i.e., the s that maximizes the expression (4) over s. I assume however that an exhaustive search over the 2^N possible sequences s is not permitted. One way to attack a discrete combinatorial problem is to create a related problem in which the discrete variables s are replaced by real variables, over which a continuous optimization can then be performed [see for example (Hopfield and Tank 1985, Aiyer, Niranjan and Fallside 1990, Durbin and Willshaw 1987, Peterson and Söderberg 1989, Gee and Prager 1994, Blake and Zisserman 1987)]. In the present context, the question then is 'how should one generalize the posterior probability (4) to the case where s is replaced by a vector with real components?' An appealing way to answer this question is to derive the continuous representation in terms of an approximation to probabilistic inference.

2 Free Energy Minimization

The variational free energy minimization method (Feynman 1972) takes an 'awkward' probability distribution such as the one in (4), and attempts to approximate it by a simpler distribution Q(s; θ), parameterized by a vector of parameters θ. For brevity in the following general description I will denote the complex probability distribution P(s|A, r) by P(s). The measure of quality of the approximating distribution is the variational free energy,

    F(θ) = Σ_s Q(s; θ) log [ Q(s; θ) / P(s) ].

The function F(θ) has a lower bound of zero which is attained only by a θ such that Q(s; θ) = P(s) for all s. Alternatively, if P(s) is not normalized, and we define Z = Σ_s P(s), then F has a lower bound of -log Z, attained only by θ such that Q = P/Z. The variational method used in this paper is traditionally used in statistical physics to estimate log Z, but here, log Z is just an additive constant which we ignore.
When Q has a sufficiently simple form, the optimization of F over θ may be a tractable problem, even though F is defined by a sum over all values of s. We find a θ* that minimizes F(θ) in the hope that the approximating distribution Q(s; θ*) will give useful information about P(s). Specifically, we might hope that the s that maximizes Q(s; θ*) is a good guess for the s that maximizes P(s). Generically, free energy minimization produces approximating distributions Q(s; θ*) that are more compact than the true distribution P(s).
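For intuition, the variational free energy can be evaluated exhaustively on a tiny problem. The following toy sketch is mine, not from the paper; it uses a separable two-bit target, for which F attains its lower bound of zero at the matching q.

```python
import math
from itertools import product

def free_energy(P, q):
    """F = sum_s Q(s) log(Q(s)/P(s)), with separable Q given by
    q[n] = Q(s_n = 1).  P maps bit-tuples to (possibly unnormalized)
    probability mass."""
    F = 0.0
    for s in product([0, 1], repeat=len(q)):
        Q = 1.0
        for s_n, q_n in zip(s, q):
            Q *= q_n if s_n == 1 else 1.0 - q_n
        if Q > 0.0:
            F += Q * math.log(Q / P[s])
    return F

# A separable two-bit target with P(s_1 = 1) = 0.8 and P(s_2 = 1) = 0.3.
P = {s: (0.8 if s[0] else 0.2) * (0.3 if s[1] else 0.7)
     for s in product([0, 1], repeat=2)}
```

At q = [0.8, 0.3] the distributions match and F = 0; any other q gives F > 0 (here F is exactly the Kullback-Leibler divergence, since Z = 1).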

3 Free Energy Minimization for Equation (4)

I take Q to be a separable distribution,

    Q(s; θ) = Π_n q_n(s_n; θ_n),

with θ_n defining the probabilities q_n thus:

    q_n(s_n = 1; θ_n) = 1 / (1 + e^{-θ_n}) ≡ q^1_n ;
    q_n(s_n = 0; θ_n) = 1 / (1 + e^{θ_n})  ≡ q^0_n .

This is a nice parameterization because the log probability ratio is θ_n = log(q^1_n / q^0_n). The variational free energy is defined to be:

    F(θ) = Σ_s Q(s; θ) log [ Q(s; θ) / (P(r|s, A) P(s)) ].

I now derive an algorithm for computing F and its gradient with respect to θ in a time that is proportional to the weight of A, w_A (i.e., the number of ones in A). The free energy separates into three terms, F(θ) = E_L(θ) + E_P(θ) - S(θ), where the 'entropy' is:

    S(θ) = -Σ_s Q(s; θ) log Q(s; θ) = -Σ_n [ q^1_n log q^1_n + q^0_n log q^0_n ],

with derivative ∂S/∂θ_n = -q^0_n q^1_n θ_n; the 'prior energy' is:

    E_P(θ) = -Σ_s Q(s; θ) log P(s) = -Σ_n b_n q^1_n ,

where b_n = log[P(s_n = 1)/P(s_n = 0)], and has derivative ∂E_P/∂θ_n = -q^0_n q^1_n b_n; and the 'likelihood energy' is:

    E_L(θ) = -Σ_s Q(s; θ) log P(r|s, A) = -Σ_m g_m Σ_s Q(s; θ) t_m(s) + const.

The additive constant is independent of θ and will now be omitted. To evaluate E_L, we need the average value of t_m(s) under the separable distribution Q(s; θ), that is, the probability that Σ_n A_mn s_n mod 2 = 1 under that distribution.

Forward algorithm. We can compute this probability for each m by a recursion involving a sequence of probabilities p^1_{m,v} and p^0_{m,v} for v = 1 ... N. These are defined to be the probabilities that the partial sum t^v_m = Σ_{n=1}^{v} A_mn s_n mod 2 is equal to 1 and 0 respectively. These probabilities satisfy:

    p^1_{m,v} = q^0_v p^1_{m,v-1} + q^1_v p^0_{m,v-1}  }
    p^0_{m,v} = q^0_v p^0_{m,v-1} + q^1_v p^1_{m,v-1}  }  if A_mv = 1;
                                                                        (5)
    p^1_{m,v} = p^1_{m,v-1}  }
    p^0_{m,v} = p^0_{m,v-1}  }  if A_mv = 0.

The initial condition is p^1_{m,0} = 0, p^0_{m,0} = 1. The desired quantity is obtained in a time that is linear in the weight of row m of A:

    Σ_s Q(s; θ) t_m(s) = p^1_{m,N} .

The quantity p^1_{m,N} is a generalization to a continuous space of the product t_m = A_m s mod 2, with the satisfying property that if any one of the contributing terms q^1_n is equal to 0.5, then the resulting value of p^1_{m,N} is 0.5.
The energy E_L is then given by:

    E_L(θ) = -Σ_m g_m p^1_{m,N} .
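The forward recursion can be sketched compactly; this is an illustrative implementation of equation (5) for a single row (the function name is mine, not the paper's).

```python
def prob_parity_one(row, q):
    """Forward recursion (5): the probability that sum_n A_mn s_n mod 2 = 1
    under independent bits with P(s_n = 1) = q[n].  Only the 1-entries of
    the row update (p0, p1), so the cost is linear in the row's weight."""
    p0, p1 = 1.0, 0.0  # the empty partial sum is 0 with certainty
    for a, q_n in zip(row, q):
        if a == 1:
            p0, p1 = (1 - q_n) * p0 + q_n * p1, (1 - q_n) * p1 + q_n * p0
    return p1
```

If any contributing q_n is 0.5, the result is pinned at 0.5, as noted in the text.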
Reverse algorithm. To obtain the derivative of the energy E_L with respect to θ_n it is necessary to perform the same number of computations again. We introduce another 'reverse' sequence of probabilities r^1_{m,v} and r^0_{m,v} for v = N ... 1, defined to be the probabilities that the partial sum t^{v+}_m = Σ_{n=v}^{N} A_mn s_n mod 2 is equal to 1 and 0 respectively. These can be computed by a recursive procedure equivalent to that in equation (5). Now note that p^1_{m,N} can be written thus, for any n:

    p^1_{m,N} = q^1_n ( p^1_{m,n-1} r^1_{m,n+1} + p^0_{m,n-1} r^0_{m,n+1} )
              + q^0_n ( p^1_{m,n-1} r^0_{m,n+1} + p^0_{m,n-1} r^1_{m,n+1} ).

Having pulled out the θ_n dependence (in the two factors q^1_n and q^0_n), it is easy to obtain the derivative:

    ∂E_L/∂θ_n = -q^0_n q^1_n Σ_m g_m d_mn ,

where

    d_mn = ( p^1_{m,n-1} r^1_{m,n+1} + p^0_{m,n-1} r^0_{m,n+1} )
         - ( p^1_{m,n-1} r^0_{m,n+1} + p^0_{m,n-1} r^1_{m,n+1} ).

Total derivative

The derivative of the free energy is:

    ∂F/∂θ_n = q^0_n q^1_n [ θ_n - b_n - Σ_m g_m d_mn ].                 (6)

Optimizers

This derivative can be inserted into a variety of continuous optimizers. I have implemented both conjugate gradient optimizers and 'reestimation' optimizers and found the latter, which I now describe, to be superior. The reestimation method is motivated by the form of the derivative (6); setting it to zero, we obtain the iterative prescription:

    θ_n := b_n + Σ_m g_m d_mn ,
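Putting the forward pass, the reverse pass and the reestimation prescription together gives the following sketch of the synchronous optimizer. It is my own compact rendering of the text above, not the paper's code; `sigmoid` and `reestimate` are names I chose, and d_mn is accumulated only where A_mn = 1, since the gradient term vanishes elsewhere.

```python
import math

def sigmoid(x):
    """q^1_n = 1 / (1 + exp(-theta_n))."""
    return 1.0 / (1.0 + math.exp(-x))

def reestimate(A, g, b, theta, iters=50):
    """Synchronous reestimation: theta_n := b_n + sum_m g_m d_mn,
    where d_mn is built from forward (p) and reverse (r) partial-parity
    probabilities as described in the text."""
    M, N = len(A), len(theta)
    for _ in range(iters):
        q1 = [sigmoid(t) for t in theta]
        new = list(b)
        for m in range(M):
            # Forward pass: p[v] = (P(partial sum over n < v is 0),
            #                       P(partial sum over n < v is 1)).
            p = [(1.0, 0.0)]
            for n in range(N):
                p0, p1 = p[-1]
                if A[m][n]:
                    p0, p1 = ((1 - q1[n]) * p0 + q1[n] * p1,
                              (1 - q1[n]) * p1 + q1[n] * p0)
                p.append((p0, p1))
            # Reverse pass: r[v] = parity probabilities over n >= v.
            r = [(1.0, 0.0)] * (N + 1)
            for n in range(N - 1, -1, -1):
                r0, r1 = r[n + 1]
                if A[m][n]:
                    r0, r1 = ((1 - q1[n]) * r0 + q1[n] * r1,
                              (1 - q1[n]) * r1 + q1[n] * r0)
                r[n] = (r0, r1)
            # Accumulate g_m d_mn; d_mn = 0 wherever A_mn = 0.
            for n in range(N):
                if A[m][n]:
                    (p0, p1), (r0, r1) = p[n], r[n + 1]
                    new[n] += g[m] * ((p1 * r1 + p0 * r0) -
                                      (p1 * r0 + p0 * r1))
        theta = new
    return theta
```

Guessing s from the sign of each θ_n then recovers the most probable bits under the approximating distribution.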

which I call the reestimation optimizer. It can be implemented with either synchronous or sequential dynamics, that is, we can update all θ_n simultaneously, or one at a time. The sequential reestimation optimizer is guaranteed to reduce the free energy on every step, because everything to the right of θ_n in equation (6) is independent of θ_n. The sequential optimizer can be efficiently interleaved with the reverse recursion; the reverse probability r_{m,v} is evaluated from r_{m,v+1} for each m after θ_v has been updated. The synchronous optimizer does not have an associated Lyapunov function, so it is not guaranteed to be a stable optimizer, but empirically it sometimes performs better.
Optimizers of the free energy can be modified by introducing 'annealing' or 'graduated non-convexity' techniques (Blake and Zisserman 1987, Van den Bout and Miller 1989), in which the non-convexity of the objective function F is switched on gradually by varying an inverse temperature parameter β from 0 to 1. This annealing procedure is intended to prevent the algorithm running headlong into the minimum that the initial gradient points towards. We define:

    F(θ, β) = β E_L(θ) + β_P E_P(θ) - S(θ),                             (7)

and perform a sequence of minimizations of this function with successively larger values of β, each starting where the previous one stopped. If we choose β_P = β then β influences both the likelihood energy and the prior energy, and if β_P = 1 then β influences just the likelihood energy. The gradient of F(θ, β) with respect to θ is identical to the gradient of F in equation (6) except that the energy terms are multiplied by β_P and β. This annealing procedure is deterministic and does not involve any simulated noise.
Comments

None of these algorithms is expected to work in all cases, because a) there may be multiple free energy optima; b) the globally optimal s might not be associated with any of these free energy optima; c) the task is a probabilistic task, so even an exhaustive search is not guaranteed always to identify the correct vector s.
Any particular problem can be reparameterized into other representations s' = sU with new matrices A' = U^{-1} A. The success of the algorithms is expected to depend crucially on the choice of representation. The algorithms are expected to work best if A is sparse and the true posterior distribution over s is close to separable.

Computational complexity

The gradient of the free energy can be calculated in time linear in w_A. The algorithms are expected to take of order 1, or at most N, gradient evaluations to converge, so that the total time taken is of order between w_A and w_A N. The space requirements of the sequential reestimation optimizer are the most demanding (but not severe): memory proportional to w_A is required.

[Figure 1: six scatter plots, panels labelled 'No annealing' and 'With annealing' for M = 500, 1000 and 2000; the plotted data are not recoverable from the scan.]

Fig. 1. Performance of the synchronous reestimation algorithm without and with annealing: N = 50. Horizontal axis: average number of 1s per row of A. Vertical axis: noise level f_n. Three experiments were conducted at a grid of values. Outcome is represented by: box = 'correct'; star = 'ok'; dot = 'wrong'. Two points in each triplet have been displaced so that they do not overlap. The horizontal line on each graph indicates an information-theoretic bound on the noise level beyond which the task is not expected to be soluble.
4 Demonstration on Random Sparse Codes A

Mock data were created as follows. The first N lines of A were set to the identity matrix, and of the remaining bits, a fraction f_A were randomly set to 1. This matrix can be viewed as defining a systematic error correcting code in which the signal s is transmitted, followed by (M - N) sparse parity checks. Each component of s was set to 1 with probability 0.5. The vector t = As mod 2

was calculated and each of its bits was flipped with noise probability f_n. Four parameters were varied: the vector length N, the number of measurements M, the noise level f_n of the measurements, and the density f_A of the matrix A.
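The mock-data construction can be sketched as follows; this is my illustrative reading of the text, with function and parameter names of my own choosing.

```python
import random

def make_problem(N, M, fA, fn, rng=random):
    """Build A (N identity rows followed by M - N sparse random rows),
    a uniform random signal s, and a noisy observation r of t = As mod 2."""
    A = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
    A += [[1 if rng.random() < fA else 0 for _ in range(N)]
          for _ in range(M - N)]
    s = [rng.randint(0, 1) for _ in range(N)]
    t = [sum(a * b for a, b in zip(row, s)) % 2 for row in A]
    r = [bit ^ (1 if rng.random() < fn else 0) for bit in t]
    return A, s, r
```

Because the code is systematic, the first N bits of t are the signal itself; with zero noise, r reproduces s directly.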
When optimizing θ the following procedure was adopted. The initial condition was θ_n = 0, ∀n. If annealing was used, a sequence of 20 optimizations was performed, with values of β increasing linearly from 0.1 to 1.0. Without annealing, just a single optimization was performed with β = 1. The optimizers were iterated until the gradient had magnitude smaller than a predefined threshold, or the change in F was smaller than a threshold, or until a maximum number of iterations was completed. At the end, s was guessed using the sign of each element of θ.
We can calculate a bound on the noise level f_n beyond which the task is definitely not expected to be soluble by equating the capacity of the binary symmetric channel, (1 - H_2(f_n)), to the rate of the code, N/M. Here H_2(f_n) is the binary entropy, H_2(f_n) = -f_n log_2 f_n - (1 - f_n) log_2 (1 - f_n). This bound is indicated on the graphs that follow by a solid horizontal line. For noise levels significantly below this bound we expect the correct solution typically to have significantly greater likelihood than any other s.
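The bound can be evaluated numerically by solving 1 - H_2(f_n) = N/M for f_n. This sketch is mine; it uses bisection on [0, 0.5], where 1 - H_2 is decreasing.

```python
import math

def H2(f):
    """Binary entropy in bits."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * math.log2(f) - (1 - f) * math.log2(1 - f)

def noise_bound(N, M, tol=1e-10):
    """Largest f_n in [0, 0.5] with 1 - H2(f_n) >= N/M, found by
    bisection (1 - H2 is monotonically decreasing on [0, 0.5])."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - H2(mid) >= N / M:
            lo = mid
        else:
            hi = mid
    return lo
```

For the three noise levels used later, 1 - H_2(f_n) reproduces the capacities 0.53, 0.39 and 0.28 quoted in table 1.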
Results

For the results reported here, I set N to 50 and M to 500, 1000 and 2000, and ran the synchronous reestimation algorithm multiple times with different seeds, with density f_A varying from 0.05 to 0.50, and error probability f_n varying from 0.05 to 0.45. In each run there are three possible outcomes: 'correct', where the answer is equal to the true vector s; 'wrong', where the answer is not equal to the true vector, and has a smaller likelihood than it; and 'ok', where the algorithm has found an answer with greater likelihood than the true vector (in which case, one cannot complain).
Annealing helps significantly when conjugate gradient optimizers are used, but does not seem to make much difference to the performance of the reestimation algorithms. As was already mentioned, the reestimators work much better than the conjugate gradient minimizers (even with annealing).
Figure 1 shows the outcomes as a function of the weight of A (x-axis) and error probability (y-axis). There seems to be a sharp transition from solvability to unsolvability. It is not clear whether this boundary constitutes a fundamental bound on what free energy minimization can achieve; performance might possibly be improved by smarter optimizers. Another idea would be to make a hybrid of discrete search methods with a free energy minimizer.
Experiments with larger values of N have also been conducted. The success region looks the same if plotted as a function of the average number of 1s per row of A and the noise level.

Table 1. Error rates of error correcting codes using free energy minimization.
a) Sparse random code matrix (section 4). These results use N = 100 and f_A = 0.05, and various noise levels f_n and M = 400, 1000, 2000. The synchronous reestimation optimizer was used without annealing, and with a maximum of 50 gradient evaluations. For every run a new random matrix, signal and noise vector were created. One thousand runs were made for each set of parameters. The capacity of a binary symmetric channel with f_n = 0.1, 0.15, 0.2 is 0.53, 0.39, 0.28 respectively.
b) LFSR code (section 5). The number of taps was 5, selected at random. The synchronous reestimation optimizer was used with the annealing procedure described in section 5.

a)
  f_n  | N   | M    | Number of errors | Information rate
  0.1  | 100 | 400  | 14/1000          | 0.25
  0.1  | 100 | 1000 | 0/1000           | 0.1
  0.15 | 100 | 1000 | 3/1000           | 0.1
  0.2  | 100 | 1000 | 54/1000          | 0.1
  0.2  | 100 | 2000 | 11/1000          | 0.05

b)
  f_n  | k   | N    | Number of errors | Information rate
  0.1  | 100 | 400  | 69/1000          | 0.25
  0.15 | 100 | 1000 | 1/1000           | 0.1
  0.2  | 100 | 2000 | 26/1000          | 0.05
Table 2. BCH codes and Reed-Muller codes.
For each noise level f_n = 0.1, 0.15, 0.2, I give n, k, t for selected BCH codes and Reed-Muller codes with ε_block = P(more than t errors | n) < 0.1, and rate > 0.05, 0.03, 0.01 respectively, ranked by ε_block. Values of n, k, t from Peterson and Weldon (1972).

f_n = 0.1
  n    | k   | t   | ε_block | rate
  BCH CODES
  63   | 10  | 13  | 0.003   | 0.159
  127  | 29  | 21  | 0.008   | 0.228
  1023 | 153 | 125 | 0.009   | 0.150
  63   | 16  | 11  | 0.021   | 0.254
  511  | 85  | 63  | 0.037   | 0.166
  REED-MULLER CODES
  128  | 8   | 31  | 8.8e-7  | 0.063
  64   | 7   | 15  | 4.4e-4  | 0.109
  1024 | 56  | 127 | 0.0055  | 0.055
  32   | 6   | 7   | 0.012   | 0.188
  512  | 46  | 63  | 0.038   | 0.0898
  16   | 5   | 3   | 0.068   | 0.313

f_n = 0.15
  n    | k   | t   | ε_block | rate
  BCH CODES
  127  | 8   | 31  | 0.002   | 0.063
  511  | 40  | 95  | 0.011   | 0.078
  63   | 7   | 15  | 0.021   | 0.111
  127  | 15  | 27  | 0.022   | 0.118
  255  | 21  | 55  | 0.002   | 0.082
  1023 | 91  | 181 | 0.008   | 0.089
  1023 | 101 | 175 | 0.028   | 0.099
  255  | 29  | 47  | 0.056   | 0.114
  REED-MULLER CODES
  256  | 9   | 63  | 2e-5    | 0.035
  128  | 8   | 31  | 0.0021  | 0.063
  32   | 6   | 7   | 0.096   | 0.188

f_n = 0.2
  n    | k   | t   | ε_block | rate
  BCH CODES
  1023 | 26  | 239 | 0.004   | 0.025
  255  | 9   | 63  | 0.028   | 0.035
  511  | 19  | 119 | 0.030   | 0.037
  255  | 13  | 59  | 0.093   | 0.051
  1023 | 46  | 219 | 0.123   | 0.045
  511  | 28  | 111 | 0.152   | 0.055
  REED-MULLER CODES
  1024 | 11  | 255 | 5.7e-5  | 0.011
  512  | 10  | 127 | 0.0034  | 0.020
  128  | 8   | 31  | 0.098   | 0.063
Potential of this method for error correcting codes

Might an error correcting code using this free energy minimization algorithm be practically useful? I restrict attention to the ideal case of a binary symmetric channel and make comparisons with BCH codes, which are described by Peterson and Weldon (1972) as "the best known constructive codes" for memoryless noisy channels, and with Reed-Muller (RM) codes. These are multiple random error correcting codes that can be characterized by three parameters (n, k, t). The block length is n (this section's M), of which k bits are data bits (this section's N) and the remainder are parity bits. Up to t errors can be corrected in one block. How do the information rate and probability of error of a sparse random code using free energy minimization compare with those of BCH codes?
To estimate the probability of error of the present algorithm I made one thousand runs with N = 100 and f_A = 0.05 for a small set of values of M and f_n (table 1a). A theoretical analysis will be given, along with a number of other codes using free energy minimization, in (MacKay and Neal 1995).
For comparison, table 2 shows the best BCH and RM codes appropriate for the same noise levels, giving their probability of block error and their rate. I assumed that the probability of error for these codes was simply the probability of more than t errors in n bits. In principle, it may be possible in some cases to make a BCH decoder that corrects more than t errors, but according to Berlekamp (1968), "little is known about ... how to go about finding the solutions" and "if there are more than t + 1 errors then the situation gets very complicated very quickly."
Comparing tables 1a and 2 it seems that the new code performs as well as or better than equivalent BCH and RM codes, in that no BCH or RM code has both a greater information rate and a smaller probability of block error.
The complexity of the BCH decoding algorithm is n log n (here, n corresponds to M), whereas that of the FE algorithm is believed to be w_A(r) M, where w_A(r) is the number of 1s per row of A, or at worst w_A(r) M N. There is therefore no obvious computational disadvantage.
5 Application to a cryptanalysis problem

The inference of the state of a shift register from probabilistic observations of the output sequence is a task arising in certain code breaking problems (Meier and Staffelbach 1989, Anderson 1995).
A cheap 'stream cipher' for binary communication is obtained by sending the bitwise sum modulo 2 of the plain text to be communicated and one bit of a linear feedback shift register (LFSR) of length k bits that is configured to produce an extremely long sequence of pseudo-random bits. This cipher can be broken by an adversary who obtains part of the plain text and the corresponding encrypted message. Given k bits of plain text, the state of the shift register can be deduced, and the entire encrypted message decoded. Even without a piece of plain text, an adversary may be able to break this code if the plain text has high redundancy (for example, if it is an ASCII file containing English words), by guessing part of the plain text.
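The cheap stream cipher described above can be sketched in a few lines. The register length, tap positions and bit values below are illustrative choices of mine, not taken from the paper. Note how XORing known plain text with the corresponding cipher text recovers the keystream, which is exactly the leverage the known-plaintext attack uses.

```python
def lfsr_stream(state, taps, nbits):
    """Generate nbits of keystream from a Fibonacci LFSR.
    state: list of k bits (state[0] is the output end); taps: positions
    XORed together to form the feedback bit.  Both are illustrative."""
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[0])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = state[1:] + [fb]
    return out

def xor_cipher(bits, key):
    """Encrypt or decrypt: bitwise sum modulo 2 of text and keystream."""
    return [b ^ k for b, k in zip(bits, key)]
```

Applying `xor_cipher` twice with the same keystream returns the original text, and `xor_cipher(plain, cipher)` returns the keystream itself.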
To defeat this simple attack, the cipher may be modified as follows. Instead of using one bit of the shift register as the key for encryption, the key is defined to be a binary function of a subset of bits of the shift register. Let the number of bits in that subset be h. If this function is an appropriately chosen many-to-one function, it might be hoped that it would be hard to invert, so that even if an adversary obtained a piece of plain text and encrypted text, he would still not be able to deduce the underlying state of the shift register. However, such codes can still be broken (Anderson 1995). Consider a single bit moving along the shift register. This bit participates in the creation of h bits of the key string. It is possible that these h emitted bits together sometimes give away information about the hidden bit. To put it another way, consider the augmented function that maps from 2h - 1 successive bits in the shift register to h successive key bits. Think of the single bit of the preceding discussion as being the middle bit of these 2h - 1 bits; call this middle bit b. Write down all 2^h possible outputs of this mapping, and run through all 2^{2h-1} possible inputs, counting how often each output was produced by an input in which b = 1, and how often b = 0. If these two counts are un
