`
David J. C. MacKay (mackay@mrao.cam.ac.uk)
Cavendish Laboratory, Cambridge, CB3 0HE, United Kingdom.
`
`Abstract
`
In 1948, Claude Shannon posed and solved one of the fundamental
problems of information theory. The question was whether it is possible
to communicate reliably over noisy channels, and, if so, at what rate.
He defined a theoretical limit, now known as the Shannon limit, up
to which communication is possible, and beyond which communication
is not possible. Since 1948, coding theorists have attempted to design
error-correcting codes capable of getting close to the Shannon limit.
In the last decade remarkable progress has been made using codes
that are defined in terms of sparse random graphs, and which are
decoded by a simple probability-based message-passing algorithm.
This paper reviews low-density parity-check codes (Gallager codes),
repeat-accumulate codes, and turbo codes, emphasising recent advances.
Some previously unpublished results are then presented, describing (a)
experiments on Gallager codes with small blocklengths; (b) a stopping
rule for decoding of repeat-accumulate codes, which saves computer
time and allows block decoding errors to be detected and flagged; and
(c) the empirical power laws obeyed by decoding times of sparse graph
codes.
`
1 Introduction
The central problem of communication theory is to construct an encoding and
a decoding system that make it possible to communicate reliably over a noisy
channel. The encoding system uses the source data to select a codeword
from a set of codewords. The decoding algorithm ideally infers, given the
output of the channel, which codeword in the code is the most likely to have
been transmitted; for an appropriate definition of distance, this is the 'closest'
codeword to the received signal. A good code is one in which the codewords
are well spaced apart, so that codewords are unlikely to be confused.
Designing a good and practical error-correcting code is difficult because
(a) it is hard to find an explicit set of well-spaced codewords; and (b) for a
generic code, decoding, i.e., finding the closest codeword to a received signal,
is intractable.
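Point (b) can be made concrete with a toy code: the ideal decoder enumerates all 2^K codewords and returns the one at minimum Hamming distance from the received word, a cost that grows exponentially with the message length. A minimal sketch (the generator matrix here is an arbitrary illustrative example, not one from this paper):

```python
import numpy as np
from itertools import product

# Toy linear code: codewords are all mod-2 combinations of the rows of G.
G = np.array([[1, 0, 1, 1, 0],
              [0, 1, 0, 1, 1]])
codewords = [(np.array(m) @ G) % 2 for m in product([0, 1], repeat=len(G))]

def nearest_codeword(received):
    """Ideal (minimum-distance) decoding by exhaustive search.
    The search visits all 2^K codewords, which is what makes
    generic decoding intractable for long codes."""
    return min(codewords, key=lambda c: int(np.sum(c != received)))

r = np.array([1, 0, 1, 0, 0])   # the codeword [1,0,1,1,0] with one bit flipped
print(nearest_codeword(r))      # -> [1 0 1 1 0]
```

This toy code has minimum distance 3, so the single flipped bit is corrected; for a blocklength of a few thousand bits the same search would be hopeless.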
However, a simple method for designing codes, first pioneered by Gallager
(1962), has recently been rediscovered and generalized. The codes are
defined in terms of sparse random graphs. Because the graphs are constructed
randomly, the codes are likely to have well-spaced codewords. And because
the codes' constraints are defined by a sparse graph, the decoding problem
can be solved, almost optimally, by message-passing on the graph. The
practical performance of Gallager's codes and their modern cousins is vastly
better than the performance of the codes with which textbooks have been
filled in the intervening years.

Hughes, Exh. 1041, p. 1
`
`2 Sparse Graph Codes
In a sparse graph code, the nodes in the graph represent the transmitted
bits and the constraints they satisfy. For a linear code with codeword length
N and rate R = K/N, the number of constraints is of order M = N - K.
[There could be more constraints, if they happen to be redundant.] Any linear
code can be described by a graph, but what makes a sparse graph code special
is that each constraint involves only a small number of variables in the graph:
the number of edges in the graph scales roughly linearly with N, rather than
as N^2.
The graph defining a low-density parity-check code, or Gallager code
(Gallager 1962; Gallager 1963; MacKay 1999), contains two types of node:
codeword bits, and parity constraints. In a regular (j, k) Gallager code
(figure 1a), each codeword bit is connected to j parity constraints and each
constraint is connected to k bits. The connections in the graph are made at
random.
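Such graphs are easy to construct at random. A minimal sketch of Gallager's original construction, using the parameters of figure 1a (N = 16, j = 3, k = 4): stack j bands of N/k disjoint weight-k checks, with the columns of each band randomly permuted.

```python
import numpy as np

rng = np.random.default_rng(0)

def regular_ldpc(N, j, k, rng):
    """Gallager's construction of a regular (j, k) parity-check matrix.
    Requires N divisible by k; produces M = N*j/k parity checks."""
    assert N % k == 0
    rows_per_band = N // k
    # Band 0: row i checks the k consecutive bits i*k .. i*k + k - 1.
    band = np.zeros((rows_per_band, N), dtype=int)
    for i in range(rows_per_band):
        band[i, i * k:(i + 1) * k] = 1
    # The remaining j - 1 bands are random column permutations of band 0.
    bands = [band] + [band[:, rng.permutation(N)] for _ in range(j - 1)]
    return np.vstack(bands)

H = regular_ldpc(16, 3, 4, rng)
print(H.shape)        # (12, 16): M = N*j/k = 12 constraints
print(H.sum(axis=1))  # every row (check) has weight k = 4
print(H.sum(axis=0))  # every column (bit) has weight j = 3
```

Each column has exactly one 1 per band, so every bit participates in exactly j checks, matching the figure.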
Repeat-accumulate codes (Divsalar et al. 1998) can be represented
by a graph with four types of node (figure 1b): equality constraints,
intermediate binary variables (black circles), parity constraints, and the
transmitted bits (white circles). The encoder sets each group of intermediate
bits to values read from the source. These bits are put through a fixed random
permutation. The transmitted stream is the accumulated sum (modulo 2) of
the permuted intermediate bits.
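The three encoder stages (repeat, permute, accumulate) are a few lines of code. A minimal sketch, with toy parameters (K = 4 source bits, q = 3 repetitions) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def ra_encode(source_bits, q, perm):
    """Repeat-accumulate encoding: repeat each source bit q times,
    apply a fixed random permutation, then accumulate modulo 2."""
    repeated = np.repeat(source_bits, q)   # repetition stage
    permuted = repeated[perm]              # fixed random interleaver
    return np.cumsum(permuted) % 2         # accumulator: mod-2 running sum

K, q = 4, 3
perm = rng.permutation(K * q)              # the fixed permutation
source = np.array([1, 0, 1, 1])
t = ra_encode(source, q, perm)
print(t)   # blocklength N = K*q = 12, rate 1/q = 1/3
```

Each transmitted bit differs from its predecessor exactly when the corresponding permuted intermediate bit is 1, which is what the accumulator's chain of parity constraints in figure 1b expresses.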
In a turbo code (Berrou and Glavieux 1996), the K source bits drive two
linear feedback shift registers, which emit parity bits (figure 1c).
All these codes can be decoded by a local message-passing algorithm on
the graph, the sum-product algorithm (MacKay and Neal 1996; McEliece
et al. 1998), and, while this algorithm is not the optimal decoder, the empirical
results are record-breaking. Figure 2 shows the performance of various sparse
graph codes on a Gaussian channel. In figure 2(a) turbo codes with rate 1/4
are compared with regular and irregular Gallager codes over GF(2), GF(8)
and GF(16). In figure 2(b) the performance of repeat-accumulate codes of
various blocklengths and rate 1/3 is shown.
`
`THE BEST SPARSE GRAPH CODES
Which of the three types of sparse graph code is 'best' depends on the chosen
rate and blocklength, the permitted encoding and decoding complexity, and
`
`
`
`
[Figure 1 graphics: (a) Gallager code; (b) repeat-accumulate code, with its accumulator trellis; (c1) turbo code, built from the (21/37)_8 recursive convolutional code shown in (c2).]
Figure 1. Graphs of three sparse graph codes.
(a) A rate-1/4 low-density parity-check code (Gallager code) with
blocklength N = 16 and M = 12 constraints. Each white circle
represents a transmitted bit. Each bit participates in j = 3 constraints,
represented by squares. Each constraint forces the sum of the
k = 4 bits to which it is connected to be even.
(b) A repeat-accumulate code with rate 1/3. Each white circle
represents a transmitted bit. Each black circle represents an intermediate
binary variable. Each constraint forces the variables to which it is
connected to be equal.
(c) A turbo code with rate 1/3. (c1) The circles represent the
codeword bits. The two rectangles represent rate-1/2 convolutional codes
(c2), with the systematic bits {t(a)} occupying the left half of the
rectangle and the parity bits {t(b)} occupying the right half.
`
`
`
`
[Figure 2 graphics: block error probability versus Eb/N0 (dB). (a) A rate-1/4 turbo code compared with regular and irregular Gallager codes over GF(2), GF(8) and GF(16). (b) Total and undetected errors of rate-1/3 repeat-accumulate codes with blocklengths N = 204, 408, 816, 3000, 9999 and 30000.]

[Figure 3 graphics: block error probability of a rate-0.936 Gallager code compared with the Shannon limit at that rate and with RS(0.942), RS(0.935), BCH(0.941) and BCH(0.932) codes; and a GF(16) Gallager code compared with RS(252,224) and two interleaved RS codes.]
`
`
DIFFERENCE-SET CYCLIC CODES

  N     7    21    73   273   1057   4161
  M     4    10    28    82    244    730
  K     3    11    45   191    813   3431
  d     4     6    10    18     34     66
  k     3     5     9    17     33     65

[Figure 4 graphics: block error probability versus Eb/N0 (dB) for the Gallager(273,82) and DSC(273,82) codes.]
`
Figure 4. Difference-set cyclic codes (low-density parity-check codes satisfying
many redundant constraints) outperform equivalent Gallager codes.
The table shows the N, M, K, distance d, and row weight k of some
difference-set cyclic codes, highlighting the codes that have large d/N,
small k, and large N/M. All DSC codes satisfy N constraints of weight
k.
In the comparison the Gallager code had (j, k) = (4,13), and rate
identical to the DSC code. Vertical axis: block error probability; hor-
izontal axis: Eb/N0 (dB). Data on DSC code performance provided by
R. Lucas and M. Fossorier.
`
also be beaten by irregular Gallager codes defined over finite fields GF(q)
(Davey and MacKay 1998). However, these successes for Gallager codes have
only been obtained by careful work, and it is notable how easily simple turbo
codes and repeat-accumulate codes achieve almost as good results. The good
performance of simple turbo codes and repeat-accumulate codes compared
with simple Gallager codes can presumably be attributed to the use of state
variables. It seems plausible therefore that the best codes will make use of
state variables. Probably what is holding up the development of even better
turbo codes is the need for a theory, comparable to the theory of irregular
Gallager code design (Urbanke et al. 1999), for the construction of irregular
graphs with state variables.
CODES WITH REDUNDANT CONSTRAINTS: 'THE TANNER CHALLENGE'
`
The performance of Gallager codes can be enhanced by making a non-random
code with redundant sparse constraints (Tanner 1981; Lucas et al. 1999).
There is a difference-set cyclic code, for example, that has N = 273 and
K = 191, but the code satisfies not M = 82 but N, i.e., 273, low-weight
constraints (figure 4). It is impossible to make random Gallager codes that
have anywhere near this much redundancy among their checks. The redundant
checks allow the sum-product algorithm to work substantially better,
as shown in figure 4, in which a DSC code outperforms a comparable regular
binary Gallager code by about 0.7 dB. The (73,45) DSC code has been
implemented on a chip by Karplus and Krit (1991) following a design of Tanner
(1981). Product codes are another family of codes with redundant constraints.
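The smallest code in the table of figure 4 makes this redundancy concrete: the N = 7, K = 3 difference-set cyclic code satisfies all N = 7 cyclic shifts of a single weight-3 check, of which only M = 4 are linearly independent. A sketch; the particular check polynomial used here, 1 + x + x^3, is an assumption (one valid choice of weight-3 check for a cyclic code of these parameters):

```python
import numpy as np

def gf2_rank(A):
    """Row-reduce a binary matrix over GF(2) and return its rank."""
    A = A.copy() % 2
    rank = 0
    for col in range(A.shape[1]):
        pivot = next((r for r in range(rank, A.shape[0]) if A[r, col]), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]       # move pivot row up
        for r in range(A.shape[0]):
            if r != rank and A[r, col]:
                A[r] ^= A[rank]                   # eliminate column entries
        rank += 1
    return rank

# Weight-3 check 1 + x + x^3 and all 7 of its cyclic shifts:
# 7 valid low-weight constraints, but only M = 4 independent ones.
h = np.array([1, 1, 0, 1, 0, 0, 0])
H = np.array([np.roll(h, s) for s in range(7)])
print(len(H), gf2_rank(H))   # 7 checks, rank 4
```

The null space of this circulant has dimension K = 3, so the decoder can exploit 7 weight-3 checks where a random Gallager code of the same parameters would have only 4.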
`
`
`
`
[Figure 5 graphics: total and undetected block error rates versus Eb/N0 (dB) for regular binary Gallager codes: (left) rate 1/2, j = 3, blocklengths N = 96 to 816; (right) rate 1/3, j = 4, blocklengths N = 48 to 816.]
`
`
`
zero probability; nevertheless, they appear to be so rare as to be effectively
nonexistent. Since this property is an important advantage of Gallager codes,
we have explored empirically how far regular binary Gallager codes can be
pushed before undetected errors show up. Figure 5 shows that undetected
errors occur when we reduce the blocklength N to sufficiently small values:
below 200 bits in the case of rate-1/3 codes with j = 4, and below 400 bits
in the case of rate-1/2 codes with j = 3. The frequency of undetected errors
appears to fall very rapidly with increasing blocklength and with increasing
column weight j.
`
`4 Stopping rules for the decoding of sparse graph codes
When we decode Gallager codes, we test the bit-by-bit best guess at each
iteration to see if it is a valid codeword. If it is, then our decoder halts.
Otherwise, the decoder continues, declaring a failure (a detected error) if some
maximum number of iterations (e.g., 200 or 1000) occurs without successful
decoding.
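The validity test is simply a syndrome check: the current bit-by-bit guesses form a codeword exactly when every parity constraint is satisfied. A sketch, using a (7,4) Hamming parity-check matrix as a stand-in for a sparse Gallager matrix (the message-passing loop itself is omitted):

```python
import numpy as np

def is_codeword(H, x_hat):
    """A tentative decoding x_hat is a valid codeword iff every parity
    check is satisfied, i.e. the syndrome H @ x_hat is zero modulo 2."""
    return not np.any((H @ x_hat) % 2)

# (7,4) Hamming parity-check matrix, standing in for a sparse H.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
print(is_codeword(H, np.zeros(7, dtype=int)))            # True: all-zeros codeword
print(is_codeword(H, np.array([1, 0, 0, 0, 0, 0, 0])))   # False: one flipped bit
```

The check costs one sparse matrix-vector product per iteration, negligible next to the message-passing itself.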
In the turbo code and repeat-accumulate code community, a different
decoding procedure is widely used. The decoding algorithm is run for a fixed
number of iterations (irrespective of whether the decoder has actually settled
in a consistent state at some earlier time), and performance curves are
displayed as a function of the number of iterations. This practice is wasteful of
computer time, and it blurs the distinction between undetected and detected
errors. Undetected errors are of scientific interest because they reveal distance
properties of a code. And in engineering practice, it would seem preferable
for the detected errors to be labelled as erasures if practically possible;
undetected errors are a great nuisance in many communication contexts.
I therefore demonstrate here that it is possible to detect convergence of
the sum-product decoder of a repeat-accumulate code, just as in the case of
turbo codes (Frey 1998). This assertion may be found confusing if the role
of the decoder is viewed as 'finding the most probable state of the source
bits'. Surely any hypothesis about the K source bits is a valid hypothesis,
so how can we detect that the decoder has finished decoding? The answer is
that decoding should be viewed as finding the most probable state of all the
variables in the graph, not just the source bits. Early on in the decoding, there
will be inconsistencies among the most probable states of internal variables
in the graph. Only when a sensible decoding has been found will there be a
state of harmony in the network. [Note that detecting convergence doesn't
imply that we get rid of undetected errors; undetected errors will still occur
if the decoder converges to an incorrect codeword.]
I used the following method to decide when to stop the decoding of a
repeat-accumulate code.
`
`
`
`
THE STOP-WHEN-IT'S-DONE RULE FOR DECODING
REPEAT-ACCUMULATE CODES.

1. While running the sum-product algorithm up and down the
accumulator trellis, note the most probable state at each
time. This state sequence, if you unaccumulate it, defines
a sequence of guesses for the source bits, with each source bit
being guessed q times, where q is the number of repetitions
in the RA code.
2. When reading out the likelihoods from the trellis, and
combining the q likelihoods for each of the source bits, compute
the most probable state of each source bit. This is the state
that maximizes the product of the likelihoods.
3. If all the guesses in (1) agree with the most probable state
of the source bit found in (2), then the decoder has reached a
valid state, so HALT. Otherwise continue.
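The steps above can be sketched in code. This is only a schematic of the halting test, not a full trellis decoder: the per-time state guesses and the per-source-bit posteriors are assumed to be handed in (in a real decoder they come from the sum-product sweeps), and all names here are hypothetical:

```python
import numpy as np

def unaccumulate(z):
    """Invert the mod-2 accumulator: recover the permuted input bits
    from the running-sum states z_0, z_1, ..."""
    z = np.asarray(z)
    return np.concatenate(([z[0]], (z[1:] + z[:-1]) % 2))

def stop_when_its_done(trellis_states, perm, q, source_posteriors):
    """Step (1): unaccumulate the most probable trellis states and undo
    the permutation, giving q guesses per source bit. Step (2): hard-decide
    each source bit from its combined posterior. Step (3): halt iff all
    q guesses agree with the hard decision for every source bit."""
    guesses = np.empty_like(trellis_states)
    guesses[perm] = unaccumulate(trellis_states)   # undo accumulate + permute
    per_bit = guesses.reshape(-1, q)               # q repeated guesses per bit
    hard = (source_posteriors > 0.5).astype(int)   # most probable source bits
    return bool(np.all(per_bit == hard[:, None]))

# Consistent toy example: encode [1, 0] with q = 2, identity permutation.
source = np.array([1, 0])
perm = np.arange(4)
states = np.cumsum(np.repeat(source, 2)[perm]) % 2   # converged trellis states
print(stop_when_its_done(states, perm, 2, np.array([0.9, 0.2])))  # True: halt
```

If the posteriors disagreed with the trellis-derived guesses (for instance, posteriors of [0.2, 0.2] here), the test would return False and decoding would continue.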
`
The cost of these extra operations is small compared to the cost of decoding.
Using this procedure, we can distinguish between undetected errors and
detected errors in a repeat-accumulate code, as shown in the results of figure
2(b).
We can also evaluate how many iterations it takes to decode these codes,
and quantify the potential savings from applying the above stopping rule. If,
for example, the old-fashioned method runs all decodings for 20 iterations,
the histogram in figure 6(b) shows that there would, for a particular code
at 0.75 dB, be a block error probability of about 0.8: roughly 80% of the
decodings took more than 20 iterations. Since a small but substantial number
took more than 40 iterations, you could run the old-fashioned decoder for
twice as long, and the block error probability would fall by a factor of ten
or so. However, using the stop-when-it's-done decoder, you can use roughly
the same amount of computer time and get the error probability down by a
factor of about one hundred.
Nothing is lost, because (if you log the stopping time in a file) you can
always recover the old-fashioned graphs if anyone still wants them.
`
`5 Empirical distribution of decoding times
`'Ve have investigated the number of iterations T of the sum product algorithm
`required to decode Gallager codes and repeat-accumulate codes. Given one
`code and a Ret of channel conditionR the decoding t.ime varies randomly from
`trial t.o t.rial. "\Ve find that the hiRtogram of decoding t.imes followR a pmver
`la\V 1 P(r) ex r-v, for larger. The po1ver p depends on the signal to noise
`ratio and becomes smaller (so that the distribution is more heavy tailed) as
`the signal to noise ratio decreases.
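Given logged iteration counts, such a tail exponent can be read off a straight-line fit to the log/log histogram. A minimal sketch; the data here are synthetic (Pareto-distributed), used only to illustrate the fitting procedure, not taken from the experiments in this paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic decoding times with a power-law tail P(tau) ~ tau^(-p);
# in practice tau would be the logged iteration counts of real decodings.
p_true = 3.5
taus = np.ceil(10 * (1 + rng.pareto(p_true - 1, size=200_000))).astype(int)

# Histogram the integer counts and fit a line to the log/log tail,
# restricted to bins with enough samples for a stable fit.
values, counts = np.unique(taus, return_counts=True)
tail = (values >= 20) & (values <= 100)
slope, _ = np.polyfit(np.log(values[tail]), np.log(counts[tail]), 1)
print(f"fitted tail exponent p = {-slope:.2f}")   # should sit near p_true
```

Restricting the fit range matters: the far tail contains bins with only a handful of samples, which would otherwise bias the unweighted least-squares slope.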
`
`
`
`
[Figure 6 graphics: total, detected and undetected block error rates of a repeat-accumulate code versus Eb/N0 (dB), and histograms (linear and log/log) of the number of decoding iterations at two signal-to-noise ratios.]

[Figure 7 graphics: histogram of the number of decoding iterations of an irregular Gallager code, and its log/log plot.]
`
Figure 7.
(a) Histogram of number of iterations for an irregular binary Gallager
code, rate 1/2, blocklength N = 9972, at Eb/N0 = 1.4 dB. (b) Log/log
plot of iterations histogram showing that the tail of the distribution is
well approximated by a power law. The straight line has slope -8.5.
From MacKay et al. (1999).
`
We have observed power laws in repeat-accumulate codes and in irregular
and regular Gallager codes. Figures 6(ii) and (iii) show the distribution of
decoding times of a repeat-accumulate code at two different signal-to-noise
ratios. The power laws extend over several orders of magnitude. Figure 7
shows the distribution of decoding times for an irregular Gallager code. The
tail is well approximated by the power law P(τ) ∝ τ^(-8.5).
It would be interesting to understand the reason for these power laws.
`
References

Berrou, C., and Glavieux, A. (1996) Near optimum error correcting coding
and decoding: Turbo-codes. IEEE Transactions on Communications
44: 1261-1271.

Davey, M. C., and MacKay, D. J. C. (1998) Low density parity check codes
over GF(q). In Proceedings of the 1998 IEEE Information Theory Workshop.
IEEE.

Divsalar, D., Jin, H., and McEliece, R. J. (1998) Coding theorems for
'turbo-like' codes. In Proceedings of the 36th Allerton Conference on
Communication, Control, and Computing, Sept. 1998, pp. 201-210, Monticello,
Illinois. Allerton House.

Frey, B. J. (1998) Graphical Models for Machine Learning and Digital
Communication. Cambridge MA: MIT Press.

Gallager, R. G. (1962) Low density parity check codes. IRE Trans. Info.
Theory IT-8: 21-28.

Gallager, R. G. (1963) Low Density Parity Check Codes. Number 21 in
Research monograph series. Cambridge, Mass.: MIT Press.

Karplus, K., and Krit, H. (1991) A semi-systolic decoder for the PDSC-73
error-correcting code. Discrete Applied Mathematics 33: 109-128.

Luby, M. G., Mitzenmacher, M., Shokrollahi, M. A., and Spielman, D. A.
(1998) Improved low density parity check codes using irregular graphs and
belief propagation. In Proceedings of the IEEE International Symposium on
Information Theory (ISIT), p. 117.

Lucas, R., Fossorier, M., Kou, Y., and Lin, S. (1999) Iterative decoding of
one-step majority logic decodable codes based on belief propagation.
Submitted.

MacKay, D. J. C. (1999) Good error correcting codes based on very sparse
matrices. IEEE Transactions on Information Theory 45 (2): 399-431.

MacKay, D. J. C., and Davey, M. C. (1998) Evaluation of Gallager
codes for short block length and high rate applications. Available from
wol.ra.phy.cam.ac.uk/mackay/.

MacKay, D. J. C., and Neal, R. M. (1996) Near Shannon limit performance
of low density parity check codes. Electronics Letters 32 (18): 1645-1646.
Reprinted Electronics Letters, 33 (6): 457-458, March 1997.

MacKay, D. J. C., Wilson, S. T., and Davey, M. C. (1999) Comparison of
constructions of irregular Gallager codes. IEEE Transactions on
Communications. In press.

McEliece, R. J., MacKay, D. J. C., and Cheng, J.-F. (1998) Turbo decoding
as an instance of Pearl's 'belief propagation' algorithm. IEEE Journal on
Selected Areas in Communications 16 (2): 140-152.

Tanner, R. M. (1981) A recursive approach to low complexity codes. IEEE
Transactions on Information Theory 27 (5): 533-547.

Urbanke, R., Richardson, T., and Shokrollahi, A. (1999) Design of provably
good low density parity check codes. Submitted.
`
`
`