`Computer Science
`
`1008
`
`Bart Preneel (Ed.)
`
`Fast Software Encryption
`
`Second International Workshop
`Leuven, Belgium, December 1994
Proceedings
`
`Springer
`
`
`
`
`
`Lecture Notes in Computer Science
`Edited by G. Goos, J. Hartmanis and J. van Leeuwen
`
`1008
`
Advisory Board: W. Brauer, D. Gries, J. Stoer
`
`
`
`
`Bart Preneel (Ed.)
`
Fast Software Encryption
`
`Second International Workshop
Leuven, Belgium, December 14-16, 1994
`Proceedings
`
`Springer
`
`
`
`
`Series Editors
`Gerhard Goos
Universität Karlsruhe
Vincenz-Prießnitz-Straße 3, D-76128 Karlsruhe, Germany
`
`Juris Hartmanis
`Department of Computer Science, Cornell University
`4130 Upson Hall, Ithaca, NY 14853, USA
Jan van Leeuwen

Volume Editor

Bart Preneel
Department Elektrotechniek-ESAT, Katholieke Universiteit Leuven
Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium
`
`Cataloging-in-Publication data applied for
`
`Die Deutsche Bibliothek - CIP-Einheitsaufnahme
`
Fast software encryption : second international workshop,
Leuven, Belgium, December 14 - 16, 1994 ; proceedings / Bart
Preneel (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ;
Budapest ; Hong Kong ; London ; Milan ; Paris ; Tokyo :
Springer, 1995
(Lecture notes in computer science ; Vol. 1008)
ISBN 3-540-60590-8
NE: Preneel, Bart [Hrsg.]; GT
`
CR Subject Classification (1991): E.3, F.2.1, E.4, G.2.1, G.4
`
`ISBN 3-540-60590-8 Springer-Verlag Berlin Heidelberg New York
`
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.
`
`© Springer-Verlag Berlin Heidelberg 1995
`Printed in Germany
`
`Typesetting: Camera-ready by author
`SPIN 10487173
06/3142 - 5 4 3 2 1 0

Printed on acid-free paper
`
`
`
`
`\
`
`CONTENTS
`
Introduction ............................................................... 1
Bart Preneel
`
`Session 1: Stream Ciphers-Design
Chair: Dieter Gollmann
`
`Clock-controlled pseudorandom generators on finite groups .... . . . ... . . . . ... 6
`Ulrich Baum and Simon Blackburn
`
`On random mappings and random permutations . ......... . ... .. . . . . ... .. . 22
`William G . Chambers
`
Binary cyclotomic generators .............................................. 29
Cunsheng Ding
`
`Construction of bent functions and balanced Boolean functions
`with high nonlinearity ....... . . ... . . ........ ..... ........... ......... . . . . . 61
`Hans Dobbertin
`
Additive and linear structures of cryptographic functions ................. 75
`Xuejia Lai
`
`Session 2: Block Ciphers-Design
`Chair: James Massey
`
The RC5 encryption algorithm .............................................. 86
`Ronald L. Rivest
`
The MacGuffin block cipher algorithm ...................................... 97
`Matthew Blaze and Bruce Schneier
`
`S-boxes and round functions with controllable linearity and
`differential uniformity ........ ... ........... .... .......... . . ... .. . . ... . . . 111
`Kaisa Nyberg
`
`Properties of linear approximation tables . . . . ... .. . .. . . .... . ...... . . . .... 131
`Luke O'Connor
`
`
`
`
`
`Session 3: Stream Ciphers-Cryptanalysis
`Chair: Cunsheng Ding
`
Searching for the optimum correlation attack ............................. 137
`Ross Anderson
`
`A known plaintext attack on the PKZIP stream cipher . . .. .. .. .... . . . .. .. 144
`Eli Biham and Paul C. Kocher
`
`Linear cryptanalysis of stream ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Jovan Dj. Golić
`
`Feedback with carry shift registers over finite fields . . . . . . . . . . . . . . . . . . . . . . 170
`Andrew Klapper
`
`A free energy minimization framework for inference problem s
`in modulo 2 arithmetic . . ... . .. . ... . .. . ... .. .. . . . .. . . . ........ ... .. . . .... 179
`David J.C. MacKay
`
`Session 4: Block Ciphers-Differential Cryptanalysis
`Chair: Eli Biham
`
Truncated and higher order differentials ................................. 196
`Lars R . Knudsen
`
`SAFER K-64: One year later . .. . .. . . .. ... .. . . . . .. . . ... . . . . . .. ..... . . .. . . 212
`James L. Massey
`
Improved characteristics for differential cryptanalysis of hash functions
`based on block ciphers ..... ..... .. ... . .. . .. ...... . . .. . . . ... . .. .... . . . . ... 242
`Vincent Rijmen and Bart Preneel
`
`Session 5: Block Ciphers-Linear Cryptanalysis
`Chair: Bart Preneel
`
`Linear cryptanalysis using multiple approximations and FEAL . . . . .. . . . .. 249
`Burton S. Kaliski Jr. and Matt J.B. Robshaw
`
`Problems with the linear cryptanalysis of DES using more than one
active S-box per round ................................................... 265
Uwe Blöcher and Markus Dichtl
`
`
`
`
`Correlation matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
`Joan Daemen, Rene Govaerts, and Joos Vandewalle
`
`
`Session 6: Odds and Ends
`Chair: Ross Anderson
`
`On the need for multipermutations: Cryptanalysis of MD4 and SAFER . . 286
`Serge Vaudenay
How to exploit the intractability of exact TSP for cryptography .......... 298
`Stefan Lucks
`
`Session 7: New Algorithms and Protocols
`Chair: Eli Biham
`
How to reverse engineer an EES device .................................... 305
`Michael Roe
`A fast homophonic coding algorithm based on arithmetic coding . . . . . . . . . 329
`Walter T. Penzhorn
`
`Session 8: Recent Results
`Chair: Bart Preneel
`
`On Fibonacci keystream generators ... .. . . ....... . ......... . . .. . . . .. .. .. . 346
`Ross Anderson
`
Cryptanalysis of McGuffin ................................................ 353
Vincent Rijmen and Bart Preneel

Performance of block ciphers and hash functions - one year later ......... 359
Michael Roe

TEA, a tiny encryption algorithm ......................................... 363
David J. Wheeler and Roger M. Needham
`
Author Index ............................................................. 367
`
`
`
`
`
`A Free Energy Minimization Framework for
`Inference Problems in Modulo 2 Arithmetic
`
David J.C. MacKay*

Cavendish Laboratory
Cambridge CB3 0HE
United Kingdom

mackay@mrao.cam.ac.uk
`
`Abstract. This paper studies the task of inferring a binary vector s
`given noisy observations of the binary vector t = As modulo 2, where A
`is an M x N binary matrix. This task arises in correlation attack on a
`class of stream ciphers and in the decoding of error correcting codes.
`The unknown binary vector is replaced by a real vector of probabilities
`that are optimized by variational free energy minimization. The derived
algorithms converge in computational time of order between W_A and
N W_A, where W_A is the number of 1s in the matrix A, but convergence
to the correct solution is not guaranteed.
Applied to error correcting codes based on sparse matrices A, these
algorithms give a system with empirical performance comparable to that
of BCH and Reed-Muller codes.
`Applied to the inference of the state of a linear feedback shift register
`given the noisy output sequence, the algorithms offer a principled version
`of Meier and Staffelbach's (1989) algorithm B, thereby resolving the open
`problem posed at the end of their paper. The algorithms presented here
`appear to give superior performance.
`
`1 The problem addressed in this paper
`
Consider three binary vectors: s with components s_n, n = 1 ... N, and r and n
with components r_m, n_m, m = 1 ... M, related by:

    (As + n) mod 2 = r        (1)
`
`where A is a binary matrix. Our task is to infer s given r and A, and given
`assumptions about the statistical properties of s and n.
`This problem arises, for example, in the decoding of a noisy signal transmitted
`using a linear code A. As a simple illustration, the (7,4) Hamming code takes
`
`* Supported by the Royal Society Smithson Research Fellowship
`
`
N = 4 signal bits, s, and transmits them followed by three parity check bits.
The M = 7 transmitted symbols are given by t = As mod 2, where

        [ 1 0 0 0 ]
        [ 0 1 0 0 ]
        [ 0 0 1 0 ]
    A = [ 0 0 0 1 ]
        [ 1 0 1 1 ]
        [ 1 1 0 1 ]
        [ 1 1 1 0 ]
`
The noise vector n describes the corruption of these bits by the communication
channel. The received message is r = t + n mod 2. The receiver's task is to infer
s given r and the assumed noise properties of the channel. For the Hamming
code above this is not a difficult task, but as N and M become large and as the
number of 1s in the matrix A increases, the inference of s in a time less than
exponential in N becomes more challenging, for general A. Indeed, the general
decoding problem for linear codes is NP-complete (Berlekamp, McEliece and van
Tilborg 1978).
The problem defined in equation (1) also arises in the inference of the
sequence of a linear feedback shift register (LFSR) from noisy observations
(Meier and Staffelbach 1989, Mihaljević and Golić 1992, Mihaljević and Golić
1993, Anderson 1995).
This paper presents a fast algorithm for attempting to solve these tasks. The
algorithm is similar in spirit to Gallager's (1963) soft decoding algorithm for
low-density parity check codes.
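The encoding and noise model of the (7,4) example above can be sketched in a few lines; a minimal illustration, in which the particular signal s and noise vector n are arbitrary examples and not from the text:

```python
# Minimal sketch of the (7,4) Hamming example: t = As mod 2, then the
# channel adds a noise vector n to give the received word r = (t + n) mod 2.
# A is the matrix given above: four identity rows, then three parity rows.

A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]

def encode(A, s):
    """Compute t = As mod 2."""
    return [sum(a * x for a, x in zip(row, s)) % 2 for row in A]

s = [1, 0, 1, 1]                      # N = 4 signal bits (arbitrary example)
t = encode(A, s)                      # -> [1, 0, 1, 1, 1, 0, 0]
n = [0, 0, 0, 0, 1, 0, 0]             # channel flips the fifth bit
r = [(ti + ni) % 2 for ti, ni in zip(t, n)]
print(r)                              # the receiver sees r and must infer s
```

The receiver's inference problem is precisely equation (1): recover s from r given A.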
`
`Assumptions
`
I assume that the prior probability distribution of s and n is separable, i.e.,
P(s, n) = P(s)P(n) = ∏_n P(s_n) ∏_m P(n_m). Defining the transmission t(s) =
As mod 2, the probability of the observed data r as a function of s (the 'likelihood
function') can be written:

    P(r|s, A) = ∏_m e_m^{t_m(s)} (1 − e_m)^{1 − t_m(s)},

where e is a vector of probabilities indexed by m given by e_m = P(n_m = 1) if
r_m = 0 and e_m = P(n_m = 0) if r_m = 1. This likelihood function is fundamental to
any probabilistic approach. The log likelihood can be written:

    log P(r|s, A) = Σ_m t_m(s) log[e_m/(1 − e_m)] + const.        (2)
                  = Σ_m t_m(s) g_m(e_m) + const.                  (3)

where g_m(e_m) = log[e_m/(1 − e_m)]. Thus the natural norm for measuring the
distance between t and e is Σ_m t_m g_m(e_m).
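As a concrete illustration of this bookkeeping, here is a minimal sketch of computing e and the weights g_m; the received vector and flip probability are arbitrary examples, not values from the text:

```python
import math

# Sketch of the likelihood bookkeeping above: from the received bits r and a
# channel flip probability P(n_m = 1) = f, form e_m (= f where r_m = 0, and
# 1 - f where r_m = 1) and the weights g_m(e_m) = log[e_m/(1 - e_m)].
# The 'distance' between a candidate t and e is then sum_m t_m g_m(e_m).

def likelihood_weights(r, f):
    e = [f if rm == 0 else 1.0 - f for rm in r]
    g = [math.log(em / (1.0 - em)) for em in e]
    return e, g

def log_likelihood(t, g):
    """log P(r|s, A) up to an additive constant, for t = t(s)."""
    return sum(tm * gm for tm, gm in zip(t, g))

e, g = likelihood_weights([0, 1, 1], f=0.1)
# g_m is negative where r_m = 0 (t_m = 1 is then unlikely) and positive
# where r_m = 1, so candidate vectors t matching r score higher.
```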
`
`
The task is to infer s given A and the data r. The posterior distribution of
s is, by Bayes's theorem:

    P(s|r, A) = P(r|s, A) P(s) / P(r|A).        (4)
`
I assume our aim is to find the most probable s, i.e., the s that maximizes the
expression (4) over s. I assume however that an exhaustive search over the 2^N
possible sequences s is not permitted. One way to attack a discrete combinatorial
problem is to create a related problem in which the discrete variables s are
replaced by real variables, over which a continuous optimization can then be
performed [see for example (Hopfield and Tank 1985, Aiyer, Niranjan and Fallside
1990, Durbin and Willshaw 1987, Peterson and Söderberg 1989, Gee and Prager
1994, Blake and Zisserman 1987)]. In the present context, the question then is
'how should one generalize the posterior probability (4) to the case where s is
replaced by a vector with real components?' An appealing way to answer this
question is to derive the continuous representation in terms of an approximation
to probabilistic inference.
`
`2 Free Energy Minimization
`
The variational free energy minimization method (Feynman 1972) takes an
'awkward' probability distribution such as the one in (4), and attempts to
approximate it by a simpler distribution Q(s; θ), parameterized by a vector of
parameters θ. For brevity in the following general description I will denote the
complex probability distribution P(s|A, r) by P(s). The measure of quality of
the approximating distribution is the variational free energy,

    F(θ) = Σ_s Q(s; θ) log [ Q(s; θ) / P(s) ].

The function F(θ) has a lower bound of zero which is attained only by a θ such
that Q(s; θ) = P(s) for all s. Alternatively, if P(s) is not normalized, and we
define Z = Σ_s P(s), then F has a lower bound of − log Z, attained only by θ
such that Q = P/Z. The variational method used in this paper is traditionally
used in statistical physics to estimate log Z, but here, log Z is just an additive
constant which we ignore.
When Q has a sufficiently simple form, the optimization of F over θ may
be a tractable problem, even though F is defined by a sum over all values of
s. We find a θ* that minimizes F(θ) in the hope that the approximating
distribution Q(s; θ*) will give useful information about P(s). Specifically, we
might hope that the s that maximizes Q(s; θ*) is a good guess for the s that
maximizes P(s). Generically, free energy minimization produces approximating
distributions Q(s; θ*) that are more compact than the true distribution P(s).
`
`
3 Free Energy Minimization for Equation (4)

I take Q to be a separable distribution,

    Q(s; θ) = ∏_n q_n(s_n; θ_n),

with θ_n defining the probabilities q_n thus:

    q_n(s_n = 1; θ_n) = q_n^1 = 1/(1 + e^{−θ_n});   q_n(s_n = 0; θ_n) = q_n^0 = 1/(1 + e^{θ_n}).

This is a nice parameterization because the log probability ratio is θ_n = log(q_n^1 / q_n^0).
The variational free energy is defined to be:

    F(θ) = Σ_s Q(s; θ) log [ Q(s; θ) / (P(r|s, A) P(s)) ].

I now derive an algorithm for computing F and its gradient with respect to θ in
a time that is proportional to the weight of A, W_A (i.e., the number of ones in
A). The free energy separates into three terms, F(θ) = E_L(θ) + E_P(θ) − S(θ),
where the 'entropy' is:

    S(θ) = − Σ_s Q(s; θ) log Q(s; θ) = − Σ_n [ q_n^1 log q_n^1 + q_n^0 log q_n^0 ],

with derivative ∂S(θ)/∂θ_n = −q_n^0 q_n^1 θ_n; the 'prior energy' is:

    E_P(θ) = − Σ_s Q(s; θ) log P(s) = − Σ_n b_n q_n^1,

where b_n = log[P(s_n = 1)/P(s_n = 0)], and has derivative ∂E_P(θ)/∂θ_n = −q_n^0 q_n^1 b_n;
and the 'likelihood energy' is:

    E_L(θ) = − Σ_s Q(s; θ) log P(r|s, A) = − Σ_m g_m Σ_s Q(s; θ) t_m(s) + const.

The additive constant is independent of θ and will now be omitted. To evaluate
E_L, we need the average value of t_m(s) under the separable distribution Q(s; θ),
that is, the probability that Σ_n A_mn s_n mod 2 = 1 under that distribution.
`
Forward algorithm. We can compute this probability for each m by a recursion
involving a sequence of probabilities p^1_{m,ν} and p^0_{m,ν} for ν = 1 ... N. These
are defined to be the probabilities that the partial sum t_m^ν = Σ_{n=1}^{ν} A_mn s_n mod 2
is equal to 1 and 0 respectively. These probabilities satisfy:

    if A_mν = 1:   p^1_{m,ν} = q^0_ν p^1_{m,ν−1} + q^1_ν p^0_{m,ν−1},
                   p^0_{m,ν} = q^0_ν p^0_{m,ν−1} + q^1_ν p^1_{m,ν−1};       (5)
    if A_mν = 0:   p^1_{m,ν} = p^1_{m,ν−1},
                   p^0_{m,ν} = p^0_{m,ν−1}.
`
`
The initial condition is p^1_{m,0} = 0, p^0_{m,0} = 1. The desired quantity is obtained in
a time that is linear in the weight of row m of A:

    Σ_s Q(s; θ) t_m(s) = p^1_{m,N}.

The quantity p^1_{m,N} is a generalization to a continuous space of the product
t_m = A_m s mod 2, with the satisfying property that if any one of the contributing
terms q^1_ν is equal to 0.5, then the resulting value of p^1_{m,N} is 0.5.
The energy E_L is then given by:

    E_L(θ) = − Σ_m g_m p^1_{m,N}.
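A minimal sketch of the forward recursion (5) for a single row of A, under the separable distribution with Q(s_n = 1) = q1[n]:

```python
# Sketch of the forward recursion (5): for one row of A, compute the
# probability p1 that sum_n A[m][n] s_n mod 2 = 1 when each s_n is an
# independent bit with Q(s_n = 1) = q1[n].  Columns with A[m][n] = 0
# leave the partial sum unchanged, so the cost is linear in the row weight.

def forward(row, q1):
    p1, p0 = 0.0, 1.0                 # initial condition: the empty sum is 0
    for a, q in zip(row, q1):
        if a == 1:
            p1, p0 = (1 - q) * p1 + q * p0, (1 - q) * p0 + q * p1
    return p1

# Deterministic check: with q1 = (1, 1) the parity of two certain 1s is 0;
# and if any contributing q is 0.5, the result is pinned at 0.5, as the
# text observes.
```

The likelihood energy then follows as E_L = −Σ_m g_m p1 over the rows of A.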
`
Reverse algorithm. To obtain the derivative of the energy E_L with respect to
θ_n it is necessary to perform the same number of computations again. We
introduce another 'reverse' sequence of probabilities r^1_{m,ν} and r^0_{m,ν} for ν = N ... 1,
defined to be the probabilities that the partial sum t_m^{ν:} = Σ_{n=ν}^{N} A_mn s_n mod 2 is
equal to 1 and 0 respectively. These can be computed by a recursive procedure
equivalent to that in equation (5). Now note that p^1_{m,N} can be written thus, for
any n:

    p^1_{m,N} = q_n^1 (p^1_{m,n−1} r^1_{m,n+1} + p^0_{m,n−1} r^0_{m,n+1})
              + q_n^0 (p^1_{m,n−1} r^0_{m,n+1} + p^0_{m,n−1} r^1_{m,n+1}).

Having pulled out the θ_n dependence (in the two factors q_n^1 and q_n^0), it is easy
to obtain the derivative:

    ∂E_L/∂θ_n = − q_n^0 q_n^1 Σ_m g_m d_mn,

where d_mn = (p^1_{m,n−1} r^1_{m,n+1} + p^0_{m,n−1} r^0_{m,n+1}) − (p^1_{m,n−1} r^0_{m,n+1} + p^0_{m,n−1} r^1_{m,n+1}).
`
Total derivative

The derivative of the free energy is:

    ∂F/∂θ_n = q_n^0 q_n^1 [ θ_n − b_n − Σ_m g_m d_mn ].        (6)
`
Optimizers

This derivative can be inserted into a variety of continuous optimizers. I have
implemented both conjugate gradient optimizers and 'reestimation' optimizers
and found the latter, which I now describe, to be superior. The reestimation
method is motivated by the form of the derivative (6); setting it to zero, we
obtain the iterative prescription:

    θ_n := b_n + Σ_m g_m d_mn,
`
`
which I call the reestimation optimizer. It can be implemented with either
synchronous or sequential dynamics, that is, we can update all θ_n simultaneously,
or one at a time. The sequential reestimation optimizer is guaranteed to reduce
the free energy on every step, because everything to the right of θ_n in equation
(6) is independent of θ_n. The sequential optimizer can be efficiently interleaved
with the reverse recursion; the reverse probability r_ν is evaluated from r_{ν+1} for
each m after θ_ν has been updated. The synchronous optimizer does not have an
associated Lyapunov function, so it is not guaranteed to be a stable optimizer,
but empirically it sometimes performs better.
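Putting the forward and reverse recursions together, one synchronous sweep of the reestimation rule can be sketched as follows; the tiny A, g, b values in the examples are arbitrary illustrations, not data from the paper:

```python
import math

# Sketch of one synchronous reestimation sweep, theta_n := b_n + sum_m g_m d_mn.
# For each row m we store all forward partial-sum probabilities, obtain the
# reverse ones by running the same recursion on the reversed row, and read
# off d_mn for every n with A[m][n] = 1.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def partial_sums(row, q1):
    """probs[v] = (p1, p0): parity probabilities of the first v positions."""
    probs = [(0.0, 1.0)]
    for a, q in zip(row, q1):
        p1, p0 = probs[-1]
        if a == 1:
            p1, p0 = (1 - q) * p1 + q * p0, (1 - q) * p0 + q * p1
        probs.append((p1, p0))
    return probs

def reestimate(A, g, b, theta):
    """One synchronous update of all theta_n."""
    q1 = [sigmoid(t) for t in theta]
    new_theta = list(b)
    for row, gm in zip(A, g):
        N = len(row)
        fwd = partial_sums(row, q1)                 # parity of positions < n
        rev = partial_sums(row[::-1], q1[::-1])     # parity of positions > n
        for n in range(N):
            if row[n] == 1:
                p1, p0 = fwd[n]
                r1, r0 = rev[N - 1 - n]
                d = (p1 * r1 + p0 * r0) - (p1 * r0 + p0 * r1)
                new_theta[n] += gm * d
    return new_theta
```

As a sanity check on the mod-2 logic: with a single parity check over two bits whose observation favours t = 1 (g > 0), and one bit already confidently 1, the sweep pushes the other bit towards 0, as the parity constraint requires.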
Optimizers of the free energy can be modified by introducing 'annealing'
or 'graduated non-convexity' techniques (Blake and Zisserman 1987, Van den
Bout and Miller 1989), in which the non-convexity of the objective function F
is switched on gradually by varying an inverse temperature parameter β from
0 to 1. This annealing procedure is intended to prevent the algorithm running
headlong into the minimum that the initial gradient points towards. We define:

    F(θ, β) = β E_L(θ) + β_P E_P(θ) − S(θ),        (7)

and perform a sequence of minimizations of this function with successively larger
values of β, each starting where the previous one stopped. If we choose β_P = β
then β influences both the likelihood energy and the prior energy, and if β_P = 1
then β influences just the likelihood energy. The gradient of F(θ, β) with respect
to θ is identical to the gradient of F in equation (6) except that the energy terms
are multiplied by β_P and β. This annealing procedure is deterministic and does
not involve any simulated noise.
`
`Comments
`
`None of these algorithms is expected to work in all cases, because a) there may
`be multiple free energy optima; b) the globally optimal s might not be associated
`with any of these free energy optima; c) the task is a probabilistic task, so even
`an exhaustive search is not guaranteed always to identify the correct vector s.
Any particular problem can be reparameterized into other representations
s' = sU with new matrices A' = U⁻¹A. The success of the algorithms is
expected to depend crucially on the choice of representation. The algorithms are
expected to work best if A is sparse and the true posterior distribution over s
is close to separable.
`
`Computational complexity
`
`T he gradien t of t he free energy can be calcul ated in time linear in WA- T h
`algorit hms are expected to take of order 1., or at most N gradient evaluation
`to co1w rge, so hat the otal t.ime tak n is of order between W A and WA N .
`t he most
`The space requirements of'the sequen ial reestimaLion opt imizer ar
`demanding (but not severe): memory proportional to WA is required.
`
`
[Fig. 1 scatter plots not reproduced: six panels ('No annealing' and 'With annealing' columns; M = 500, 1000, 2000 rows).]

Fig. 1. Performance of the synchronous reestimation algorithm without and with annealing: N = 50.
Horizontal axis: average number of 1s per row of A. Vertical axis: noise level f_n. Three
experiments were conducted at a grid of values. Outcome is represented by: box =
'correct'; star = 'ok'; dot = 'wrong'. Two points in each triplet have been displaced so
that they do not overlap. The horizontal line on each graph indicates an information
theoretic bound on the noise level beyond which the task is not expected to be soluble.
`
`4 Demonstration on Random Sparse Codes A
`
Mock data were created as follows. The first N lines of A were set to the identity
matrix, and of the remaining bits, a fraction f_A were randomly set to 1. This
matrix can be viewed as defining a systematic error correcting code in which
the signal s is transmitted, followed by (M − N) sparse parity checks. Each
component of s was set to 1 with probability 0.5. The vector t = As mod 2
`
was calculated and each of its bits was flipped with noise probability f_n. Four
parameters were varied: the vector length N, the number of measurements M,
the noise level f_n of the measurements, and the density f_A of the matrix A.
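The mock-data construction above can be sketched as follows; the seed and parameter values are arbitrary illustrative choices:

```python
import random

# Sketch of the mock-data construction: the first N rows of A are the
# identity, the remaining M - N rows are sparse random parity checks of
# density f_A; s is uniform random, and each bit of t = As mod 2 is
# flipped with probability f_n to give the observations r.

def make_problem(N, M, f_A, f_n, rng):
    A = [[1 if n == m else 0 for n in range(N)] for m in range(N)]
    A += [[1 if rng.random() < f_A else 0 for _ in range(N)]
          for _ in range(M - N)]
    s = [rng.randint(0, 1) for _ in range(N)]
    t = [sum(a * x for a, x in zip(row, s)) % 2 for row in A]
    r = [(tm + (1 if rng.random() < f_n else 0)) % 2 for tm in t]
    return A, s, r

A, s, r = make_problem(N=50, M=500, f_A=0.05, f_n=0.1, rng=random.Random(0))
```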
When optimizing θ the following procedure was adopted. The initial condition
was θ_n = 0, ∀n. If annealing was used, a sequence of 20 optimizations was
performed, with values of β increasing linearly from 0.1 to 1.0. Without
annealing, just a single optimization was performed with β = 1. The optimizers were
iterated until the gradient had magnitude smaller than a predefined threshold,
or the change in F was smaller than a threshold, or until a maximum number
of iterations was completed. At the end, s was guessed using the sign of each
element of θ.
We can calculate a bound on the noise level f_n beyond which the task is
definitely not expected to be soluble by equating the capacity of the binary
symmetric channel, (1 − H_2(f_n)), to the rate of the code, N/M. Here H_2(f_n) is
the binary entropy, H_2(f_n) = −f_n log_2 f_n − (1 − f_n) log_2 (1 − f_n). This bound
is indicated on the graphs that follow by a solid horizontal line. For noise levels
significantly below this bound we expect the correct solution typically to have
significantly greater likelihood than any other s.
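The bound can be computed numerically; a minimal sketch (the grid-search helper is an illustrative implementation, not from the paper):

```python
import math

# Sketch of the information-theoretic bound above: the largest noise level
# f_n for which the binary symmetric channel capacity 1 - H2(f_n) still
# exceeds the code rate N/M.

def H2(f):
    """Binary entropy H2(f) = -f log2 f - (1-f) log2 (1-f)."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * math.log2(f) - (1 - f) * math.log2(1 - f)

def noise_bound(N, M, steps=10**5):
    """Grid-search the largest f in [0, 0.5] with 1 - H2(f) >= N/M."""
    rate = N / M
    best = 0.0
    for i in range(steps + 1):
        f = 0.5 * i / steps
        if 1.0 - H2(f) >= rate:
            best = f
    return best

# For N = 50, M = 500 (rate 0.1) the bound comes out at about f_n = 0.32.
```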
`
`Results
`
For the results reported here, I set N to 50 and M to 500, 1000 and 2000, and
ran the synchronous reestimation algorithm multiple times with different seeds,
with density f_A varying from 0.05 to 0.50, and error probability f_n varying from
0.05 to 0.45. In each run there are three possible outcomes: 'correct', where the
answer is equal to the true vector s; 'wrong', where the answer is not equal to the
true vector, and has a smaller likelihood than it; and 'ok', where the algorithm
has found an answer with greater likelihood than the true vector (in which case,
one cannot complain).
Annealing helps significantly when conjugate gradient optimizers are used,
but does not seem to make much difference to the performance of the
reestimation algorithms. As was already mentioned, the reestimators work much better
than the conjugate gradient minimizers (even with annealing).
Figure 1 shows the outcomes as a function of the weight of A (x-axis) and
error probability (y-axis). There seems to be a sharp transition from solvability
to unsolvability. It is not clear whether this boundary constitutes a fundamental
bound on what free energy minimization can achieve; performance might
possibly be improved by smarter optimizers. Another idea would be to make a hybrid
of discrete search methods with a free energy minimizer.
Experiments with larger values of N have also been conducted. The success
region looks the same if plotted as a function of the average number of 1s per
row of A and the noise level.
`
`
Table 1. Error rates of error correcting codes using free energy minimization.
a) Sparse random code matrix (section 4). These results use N = 100 and f_A = 0.05,
and various noise levels f_n and M = 400, 1000, 2000. The synchronous reestimation
optimizer was used without annealing, and with a maximum of 50 gradient evaluations.
For every run a new random matrix, signal and noise vector were created. One
thousand runs were made for each set of parameters. The capacity of a binary symmetric
channel with f_n = 0.1, 0.15, 0.2 is 0.53, 0.39, 0.28 respectively.
b) LFSR code (section 5). The number of taps was 5, selected at random. The
synchronous reestimation optimizer was used with the annealing procedure described in
section 5.

a)
    f_n    N     M     Number     Information
                       of errors  rate
    0.1    100   400   14/1000    0.25
    0.1    100   1000  0/1000     0.1
    0.15   100   1000  3/1000     0.1
    0.2    100   1000  54/1000    0.1
    0.2    100   2000  11/1000    0.05

b)
    f_n    k     N     Number     Information
                       of errors  rate
    0.1    100   400   69/1000    0.25
    0.15   100   1000  1/1000     0.1
    0.2    100   2000  26/1000    0.05
`
Table 2. BCH codes and Reed-Muller codes.
For each noise level f_n = 0.1, 0.15, 0.2, I give n, k, t for selected BCH codes and Reed-
Muller codes with ε_block = P(more than t errors | n) < 0.1, and rate > 0.05, 0.03, 0.01
respectively, ranked by ε_block. Values of n, k, t from Peterson and Weldon (1972).

f_n = 0.1
    BCH codes:          n     k    t    ε_block  rate
                        63    10   13   0.003    0.159
                        127   29   21   0.008    0.228
                        1023  153  125  0.009    0.150
                        63    16   11   0.021    0.254
                        511   85   63   0.037    0.166
    Reed-Muller codes:  128   8    31   8.8e-7   0.063
                        64    7    15   4.4e-4   0.109
                        1024  56   127  0.0055   0.055
                        32    6    7    0.012    0.188
                        512   46   63   0.038    0.0898
                        16    5    3    0.068    0.313

f_n = 0.15
    BCH codes:          127   8    31   0.002    0.063
                        511   40   95   0.011    0.078
                        63    7    15   0.021    0.111
                        127   15   27   0.022    0.118
                        255   21   55   0.002    0.082
                        1023  91   181  0.008    0.089
                        1023  101  175  0.028    0.099
                        255   29   47   0.056    0.114
    Reed-Muller codes:  256   9    63   2e-5     0.035
                        128   8    31   0.0021   0.063
                        32    6    7    0.096    0.188

f_n = 0.2
    BCH codes:          1023  26   239  0.004    0.025
                        255   9    63   0.028    0.035
                        511   19   119  0.030    0.037
                        255   13   59   0.093    0.051
                        1023  46   219  0.123    0.045
                        511   28   111  0.152    0.055
    Reed-Muller codes:  1024  11   255  5.7e-5   0.011
                        512   10   127  0.0034   0.020
                        128   8    31   0.098    0.063
`
`Potential of this method for error correcting codes
`
Might an error correcting code using this free energy minimization algorithm be
practically useful? I restrict attention to the ideal case of a binary symmetric
channel and make comparisons with BCH codes, which are described by Peterson
and Weldon (1972) as "the best known constructive codes" for memoryless noisy
channels, and with Reed-Muller (RM) codes. These are multiple random error
correcting codes that can be characterized by three parameters (n, k, t). The
block length is n (this section's M), of which k bits are data bits (this section's
N) and the remainder are parity bits. Up to t errors can be corrected in one
block. How do the information rate and probability of error of a sparse random
code using free energy minimization compare with those of BCH codes?
To estimate the probability of error of the present algorithm I made one
thousand runs with N = 100 and f_A = 0.05 for a small set of values of M and
f_n (table 1a). A theoretical analysis will be given, along with a number of other
codes using free energy minimization, in (MacKay and Neal 1995).
For comparison, table 2 shows the best BCH and RM codes appropriate for
the same noise levels, giving their probability of block error and their rate. I
assumed that the probability of error for these codes was simply the probability of
more than t errors in n bits. In principle, it may be possible in some cases to make
a BCH decoder that corrects more than t errors, but according to Berlekamp
(1968), "little is known about ... how to go about finding the solutions" and "if
there are more than t + 1 errors then the situation gets very complicated very
quickly."
Comparing tables 1a and 2 it seems that the new code performs as well as
or better than equivalent BCH and RM codes, in that no BCH or RM code has
both a greater information rate and a smaller probability of block error.
The complexity of the BCH decoding algorithm is n log n (here, n corresponds
to M), whereas that of the FE algorithm is believed to be W_A^(r) M, where W_A^(r)
is the number of 1s per row of A, or at worst W_A^(r) M N. There is therefore no
obvious computational disadvantage.
`
`5 Application to a cryptanalysis problem
`
`The inference of the state of a shift register from probabilistic observations of
`the output sequence is a task arising in certain code breaking problems (Meier
`and Staffelbach 1989, Anderson 1995).
`A cheap 'stream cipher' for binary communication is obtained by sending
`the bitwise sum modulo 2 of the plain text to be communicated and one bit
`of a linear feedback shift register (LFSR) of length k bits that is configured to
`produce an extremely long sequence of pseudo-random bits. This cipher can be
`broken by an adversary who obtains part of the plain text and the corresponding
`encrypted message. Given k bits of plain text, the state of the shift register can
`be deduced, and the entire encrypted message decoded. Even without a piece
`of plain text, an adversary may be able to break this code if the plain text has
`
`
`high redundancy (for example, if it is an ASCII file containing English words),
`by guessing part of the plain text.
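The cheap stream cipher described above can be sketched in a few lines; the register length and tap positions below are arbitrary illustrative choices, not a recommendation:

```python
# Sketch of the cheap stream cipher: a linear feedback shift register (LFSR)
# produces a keystream, and the ciphertext is the bitwise XOR of plaintext
# and keystream.  In this toy implementation the first k output bits expose
# the initial register contents directly (in reverse order), which is exactly
# why k known plaintext bits let an adversary recover the state and decode
# the entire message.

def lfsr_stream(state, taps, nbits):
    """Fibonacci LFSR: output the last bit, feed back the XOR of the taps."""
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]

key = lfsr_stream([1, 0, 0, 1], taps=(0, 3), nbits=8)
plaintext = [1, 1, 0, 0, 1, 0, 1, 0]
ciphertext = xor_bits(plaintext, key)   # decryption is the same XOR again
```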
`To defeat this simple attack, the cipher may be modified as follows. Instead
`of using one bit of the shift register as the key for encryption, the key is defined
`to be a binary function of a subset of bits of the shift register. Let the number of
`bits in that subset be h. If this function is an appropriately chosen many-to-one
`function, it might be hoped that it would be hard to invert, so that even if an
`adversary obtained a piece of plain text and encrypted text, he would still not
`be able to deduce the underlying state of the shift register. However, such codes
can still be broken (Anderson 1995). Consider a single bit moving along the
shift register. This bit participates in the creation of h bits of the key string. It
is possible that these h emitted bits together sometimes give away information
about the hidden bit. To put it another way, consider the augmented function
that maps from 2h − 1 successive bits in the shift register to h successive key
bits. Think of the single bit of the preceding discussion as being the middle bit
of these 2h − 1 bits; call this middle bit b. Write down all 2^h possible outputs
of this mapping, and run through all 2^{2h−1} possible inputs, counting how often
each output was produced by an input in which b = 1, and how often b = 0. If
these two counts are un