Good Error-Correcting Codes based on Very Sparse Matrices
`
David J. C. MacKay
Cavendish Laboratory, Cambridge, CB3 0HE,
United Kingdom. mackay@mrao.cam.ac.uk

June. Submitted to IEEE-IT.
`
`Abstract
`
We present two families of error-correcting codes defined in terms of very sparse matrices.
'MN codes' are new, and 'GL codes' were first investigated by Gallager in 1962, but appear to
have been largely forgotten, in spite of their excellent properties. The decoding of both codes
can be tackled with a practical belief propagation algorithm.
We prove that these codes are 'very good', in that sequences of codes exist which, when
optimally decoded, achieve information rates up to the Shannon limit. This result holds not
only for the binary symmetric channel but also for any channel with symmetric stationary
ergodic noise.
We give experimental results for binary symmetric channels and Gaussian channels demonstrating
that practical performance substantially better than that of standard convolutional and
concatenated codes can be achieved; indeed the performance of GL codes is almost as close to
the Shannon limit as that of Turbo codes.
`
`Keywords
`
Error-correction codes, iterative probabilistic decoding, Shannon limit.
`
`
`
1 Introduction
`
For a glossary of symbols used in this paper, please see appendix A.
`
1.1 Background
`
In 1948, Shannon [ ] proved that for any channel there exist block codes that achieve arbitrarily
small probability of error at any communication rate up to the capacity of the channel. We will
refer to such code families as 'very good' codes. By 'good' codes we mean code families that achieve
arbitrarily small probability of error at non-zero communication rates up to some maximum rate
that may be less than the capacity of the given channel. By 'bad' codes we mean code families that
can only achieve arbitrarily small probability of error by decreasing the information rate to zero.
(Bad codes are not necessarily useless for practical purposes.) By 'practical' codes we mean code
families which can be encoded and decoded in time and space polynomial in the block length.
Shannon's proof was non-constructive and employed random codes for which there is no practical
encoding or decoding algorithm. Since 1948 it has been proved that there exist very good linear
codes (non-constructively); that there exist very good cyclic codes (non-constructively) [ ]; that
there exist very good codes that have structure (non-constructively) [ ]; and that very good codes
can be produced with a short description in terms of permutations [ ]. But no practical decoding
algorithm is known for any of these codes, and it is known that the general linear decoding problem
`
`
`
`
`
`
(find the maximum likelihood s in the equation G^T s + n = r) is NP-complete [ ]. Convolutional
codes (which can be viewed as block codes with memory) can approach the Shannon limit as their
constraint length increases, but the complexity of their decoding grows exponentially with the
constraint length. For a long time a generally held view was that for practical purposes a channel's
effective capacity was a rate 'R_0', smaller than the Shannon capacity, if convolutional codes were
used; and many believed this conjecture applied to all codes, speculating that practical
communication beyond R_0 was impossible. Forney proved that there do exist very good 'concatenated'
codes that are practical [ ], but the proof was also non-constructive [ ].
When it comes to practical, constructive codes, constructions have been demonstrated of codes
based on concatenation that are good, though not very good, but most known practical codes are
asymptotically bad [ ]. Goppa's recent algebraic geometry codes (reviewed in [ ]) appear to be
both practical and good (with practical decoding proven possible up to the Gilbert bound), but
we believe that the literature has not established whether they are very good. The most practical
decoding algorithm [ ] appears to be prohibitively costly (of order N^3) to implement, and algebraic
geometry codes do not appear to be destined for practical use.
Thus the conventional view is that there are few known constructive codes that are good, fewer
still that are practical, and none at all that are both practical and very good. It seems to be widely
believed that whereas almost any random linear code is good, codes with structure that allows
practical coding are likely to be bad [ ], [ ]. Battail expresses an alternative view, however, that
'we can think of good codes, and we can decode them' [ ]. This statement is supported by the
results of the present paper.
In this paper we present two code families. Gallager's low-density parity-check codes ('GL
codes') are defined in terms of a very sparse random parity check matrix [ ]. 'MN codes'
are also defined in terms of very sparse random matrices, and were first presented in [ ]. (We
generalized MN codes to GL codes, then realised that we had rediscovered Gallager's work.) MN
codes are unconventional in that redundancy can be incorporated in the transmitted codewords not
only by using a K × N generator matrix with transmitted block length N greater than the source
block length K, but also by using a source that is itself redundant.
These code families both have two important properties. First, because of the construction in
terms of sparse matrices, practical decoding seems to be possible at good communication rates.
Second, we prove in section 2 that in spite of their simple construction these codes are very good:
that is, sequences of codes exist which, when optimally decoded, achieve information rates up to the
Shannon limit of the binary symmetric channel. We further prove that the same codes are in fact
good for any ergodic symmetric channel model. Our proof may be viewed as a semi-constructive
proof of Shannon's noisy channel coding theorem. It is indeed easy to think of good codes.
Later in the paper we present a 'belief propagation' algorithm for solving the decoding problem,
first presented by Gallager [ ], and we give an analysis of this decoding algorithm. These
results lead us to conjecture that there exist GL and MN codes which are not only good but which
also achieve error rates approaching zero at a non-zero information rate when decoded using a
practical algorithm.
In subsequent sections we describe empirical results of computer experiments
using the belief propagation algorithm to decode GL and MN codes. Our experiments show that
practical performance significantly superior to that of textbook codes can be achieved by these
codes. A final section contains discussion specific to MN codes.
`
1.2 Definitions
`
The input and output alphabets of the binary symmetric channel (BSC) will be denoted {0, 1}.
We will denote the error probability of the BSC by f_n, where f_n < 0.5.
Definition 1. The binary entropy functions H_2(f) and H_e(f) are

    H_2(f) = f log_2(1/f) + (1 − f) log_2(1/(1 − f))    (1)

    H_e(f) = f log_e(1/f) + (1 − f) log_e(1/(1 − f)).    (2)
`
Definition 2. The weight of a binary vector or matrix is the number of 1s in it. We denote the
weight of a vector x by w(x). The density of a source of random bits is the expected fraction of
1 bits. A source is sparse if its density is less than 0.5. A vector v is very sparse if its density
vanishes as its length increases, for example, if a constant number t of its bits are 1s. The overlap
between two vectors is the number of 1s in common between them.
`
Definition 3. The capacity C(f_n) of a binary symmetric channel with noise density f_n is, in bits
per cycle,

    C(f_n) = 1 − H_2(f_n).    (3)

The rate R_0(f_n) is

    R_0(f_n) ≡ 1 − log_2(1 + 2√(f_n(1 − f_n))).    (4)

This is the computational cutoff of sequential decoding for convolutional codes: the rate beyond
which the expected cost of achieving vanishing error probability using sequential decoding becomes
infinite.
The Gilbert bound GV(f_n) is

    GV(f_n) = { 1 − H_2(2 f_n),  f_n < 1/4
              { 0,               f_n ≥ 1/4.    (5)

This is the maximum rate at which one can communicate with a code whose codewords satisfy the
Gilbert-Varshamov minimum distance bound, assuming bounded distance decoding [ ].
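These quantities are straightforward to evaluate numerically. A minimal Python sketch (the
function names are illustrative) computing equations (1) and (3)-(5) for a BSC with f_n = 0.1:

import math

def H2(f):
    # Binary entropy of equation (1), in bits; H2(0) = H2(1) = 0.
    if f in (0.0, 1.0):
        return 0.0
    return f * math.log2(1.0 / f) + (1.0 - f) * math.log2(1.0 / (1.0 - f))

def capacity(fn):
    # BSC capacity, equation (3).
    return 1.0 - H2(fn)

def R0(fn):
    # Computational cutoff rate of sequential decoding, equation (4).
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(fn * (1.0 - fn)))

def GV(fn):
    # Gilbert bound, equation (5).
    return 1.0 - H2(2.0 * fn) if fn < 0.25 else 0.0

fn = 0.1
print(capacity(fn), R0(fn), GV(fn))
# -> 0.531, 0.322, 0.278: capacity > R0 > Gilbert bound, as expected.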
`
`
`
Definition 4. A model that defines a probability distribution over strings x of any length N,
P(x|N), has mean entropy H_x if for any ε > 0 and η > 0 there exists an N_0 such that for all
N > N_0,

    P[ | (1/N) log_2 (1/P(x|N)) − H_x | > ε ] < η.    (6)

For example, a memoryless binary symmetric channel's noise has mean entropy H_n = H_2(f_n),
where f_n is the density of the noise; the proof of this statement, by the law of large numbers, is
well known [ ]. We will prove that the codes presented in this paper are good codes not only for
the binary symmetric channel but also for a wide class of channels with memory.
`
Definition 5. A binary channel with symmetric stationary ergodic noise is a channel whose output
in response to a transmitted binary vector t is given by r = t + n mod 2, where n, the noise vector,
has a probability distribution that is (a) independent of t and (b) stationary and ergodic.

For example, burst noise might be modelled by a stationary and ergodic Markov process. Such
a process has a mean entropy, though the evaluation of this quantity may be challenging. The
standard Gaussian channel with binary inputs is also equivalent to a binary channel with symmetric
stationary ergodic noise.
We will concentrate on the case of a binary channel with symmetric noise (see definition 5) in
the body of this paper. Channels with memory whose inputs are binary and whose outputs are in
some more general alphabet are addressed in appendix H.
`
`
`
`
`
`
1.3 Linear codes

A linear error correcting code can be represented by an N by K binary matrix G^T (the generator
matrix), such that a K-bit binary message s is encoded as the N-bit vector t = G^T s mod 2. (Note
that we have chosen to use column vectors, so the generator matrices act to the right rather than
the left.) The generator matrix is in 'systematic form' if it can be written as

    G^T = [ I_K ; P ],    (7)

where I_K is the K × K identity matrix, P is a binary matrix, and the semicolon denotes vertical
stacking. The channel adds noise n to the vector t, with the resulting received signal r being given
by:

    r = (G^T s + n) mod 2.    (8)

The decoder's task is to infer s given the received message r, and the assumed noise properties of
the channel. The optimal decoder returns the message s that maximizes the posterior probability

    P(s|r, G) = P(r|s, G) P(s) / P(r|G).    (9)

It is often not practical to implement the optimal decoder.
If the prior probability of s is assumed uniform, and the probability of n is assumed to be
independent of s (c.f. definition 5), then it is convenient to introduce the (N − K) × N parity
check matrix, H, which in systematic form is [P I_{N−K}]. The parity check matrix has the property
H G^T = 0 mod 2, so that, applying H to equation (8),

    H n = H r mod 2.    (10)

Any other (N − K) × N matrix A whose rows span the same space as H is a valid parity check
matrix.
The decoding problem thus reduces, given the above assumptions, to the task of finding the
most probable noise vector n such that

    H n mod 2 = z,    (11)

where the syndrome vector z = H r mod 2.
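As a concrete illustration, a minimal Python sketch (dense arithmetic mod 2, with a toy K and N)
that builds a systematic generator and parity check matrix and verifies the identities of
equations (7)-(11):

import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 7
M = N - K

P = rng.integers(0, 2, size=(M, K))          # random binary P
GT = np.vstack([np.eye(K, dtype=int), P])    # G^T = [I_K ; P], equation (7)
H = np.hstack([P, np.eye(M, dtype=int)])     # H = [P I_{N-K}], systematic form

assert np.all(H @ GT % 2 == 0)               # H G^T = 0 mod 2

s = rng.integers(0, 2, size=K)               # K-bit message
n = np.zeros(N, dtype=int); n[2] = 1         # a single-bit noise vector
r = (GT @ s + n) % 2                         # received vector, equation (8)

z = H @ r % 2                                # syndrome z = Hr mod 2
assert np.all(z == H @ n % 2)                # equation (10): Hr = Hn mod 2
print(z)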
`
1.4 Description of the two code families
`
We define two code families. We explain the more conventional GL codes first.
`
1.4.1 The idea behind GL codes
`
We construct a linear code by first making a very sparse random parity check matrix A. (Very
sparse, but not systematic.) We then use linear algebra to obtain a corresponding generator matrix.
This can be done by first putting A into systematic form H, then deducing a systematic G. This
simple construction is complemented by a straightforward decoding algorithm which implements
an approximation to the ideal Bayesian inference of equation (9).
`
`
`
`
`
`
1.4.2 Construction of GL codes
`
The parity check matrix A can be constructed as follows. We will describe variations on this
construction later.
A transmitted block length N and a source block length K are selected. We define M = N − K
to be the number of parity checks. We select a column weight t, an integer greater than or equal
to 3. We create a rectangular M × N matrix (M rows and N columns) A at random with weight
exactly t per column and a weight per row as uniform as possible. If N/M is chosen to be
an appropriate ratio of integers then the number per row can be constrained to be exactly tN/M.
We then use Gaussian elimination and reordering of columns to derive an equivalent parity check
matrix in systematic form [P I_M]. There is a possibility that the rows of A are not independent
(though for odd t, this has small probability); in this case, A is a parity check matrix for a code with
the same N and with smaller M, that is, a code with greater rate than assumed in the following
sections. Redefining A to be the original matrix with its columns reordered as in the Gaussian
elimination, we have the following situation.
The matrix A = [C_1 C_2] is composed of two very sparse matrices C_1 and C_2 as follows.
The matrix C_2 is a square M × M matrix that is very sparse and invertible. The inverse C_2^{-1}
of this matrix has been computed during the Gaussian elimination which produced the matrix
P = C_2^{-1} C_1. The inversion takes of order M^2 N time and is performed once only.
The matrix C_1 is a rectangular M × K matrix that is very sparse.
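The construction and the one-off elimination are easy to prototype. A minimal Python sketch
of the procedure just described, using dense arithmetic over GF(2) (the helper names
make_sparse_A and systematize are illustrative, and no attempt is made at the efficiency a real
implementation would need):

import numpy as np

def make_sparse_A(M, N, t, rng):
    # Weight exactly t per column; row weights are kept as uniform as
    # possible by always placing each column's t ones in the rows that
    # are currently least used (ties broken at random).
    A = np.zeros((M, N), dtype=np.uint8)
    row_weight = np.zeros(M, dtype=int)
    for j in range(N):
        order = np.lexsort((rng.random(M), row_weight))
        A[order[:t], j] = 1
        row_weight[order[:t]] += 1
    return A

def systematize(A):
    # Gaussian elimination over GF(2) with row and column reordering.
    # Returns (H, perm) with H = [P I_M] and perm recording the column
    # reordering, or None if the rows of A are not independent.
    H = A.copy()
    M, N = H.shape
    perm = np.arange(N)
    for i in range(M):
        pivot = None
        for j in range(i, N):
            rows = np.nonzero(H[i:, j])[0]
            if rows.size:
                pivot = (rows[0] + i, j)
                break
        if pivot is None:
            return None                    # dependent rows
        pr, pc = pivot
        H[[i, pr]] = H[[pr, i]]            # row swap
        H[:, [i, pc]] = H[:, [pc, i]]      # column swap
        perm[[i, pc]] = perm[[pc, i]]
        for r in range(M):                 # clear the rest of column i
            if r != i and H[r, i]:
                H[r] ^= H[i]
    # H is now [I_M P']; rotate the columns to obtain [P I_M].
    H = np.hstack([H[:, M:], H[:, :M]])
    perm = np.concatenate([perm[M:], perm[:M]])
    return H, perm

rng = np.random.default_rng(1)
M, N, t = 6, 12, 3
A = make_sparse_A(M, N, t, rng)
result = systematize(A)
if result is not None:
    H, perm = result
    P = H[:, :N - M]
    # A with its columns reordered by perm is [C_1 C_2], and P = C_2^{-1} C_1.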
`
Encoding. We define the generator matrix of the GL code to be

    G^T = [ I_K ; P ] = [ I_K ; C_2^{-1} C_1 ] mod 2,    (12)

where I_K is the K × K identity matrix.

1.4.3 Variations
`
1. When generating the matrix A, one can constrain all pairs of columns in the matrix to have
an overlap ≤ 1. This is expected to improve the properties of the ensemble of codes, for
reasons that will become apparent in a later section.

2. One can further constrain the matrix A so that the topology of the corresponding bipartite
graph does not contain short cycles. This is discussed further in a later section.
`
1.4.4 The decoding problem for GL codes
`
A source vector s of length K is encoded into a transmitted vector t defined by:

    t = G^T s mod 2.    (13)

If the generator matrix has been computed explicitly (which takes of order M^2 N time) then the
transmitted vector can be computed by explicit multiplication in N K time. However, encoding
might be possible in less time using sparse matrix methods.
The received vector is

    r = t + n mod 2,    (14)

where the noise is n. In the case of a binary symmetric channel, n is assumed to be a sparse random
vector with independent, identically-distributed bits of density f_n. We will discuss more general
channels below.
`
`
`
`
`
`
By construction, A is a parity check matrix for G (that is, A G^T = 0 mod 2), so the decoding
problem is to recover t by finding the most probable n that satisfies the equation:

    A n = z mod 2,    (15)

where z is the syndrome vector:

    z = A r mod 2,    (16)

computation of which takes time of order N t, if the sparseness of A is exploited.
The optimal decoder, in the case of a binary symmetric channel, is an algorithm that finds the
sparsest vector n̂ that satisfies A n̂ = z. From n̂ we obtain our guess for the transmitted signal
t̂ = r + n̂, the first K bits of which are the optimal guess ŝ for the original message.
Both the matrix A and the unknown vector n are sparse. One might therefore hope that it is
practical to solve this decoding problem (though perhaps not right up to the theoretical limits of
the optimal decoder). We will demonstrate a practical decoder later.
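The order-Nt cost of computing the syndrome (16) is easy to see in code. A minimal sketch,
assuming A is stored column-wise as the list of the t rows in which each column carries a 1:

def syndrome(cols, M, r):
    # cols[j] lists the rows in which column j of A contains a 1.
    # Each 1 of A is touched at most once, so the cost is of order N*t
    # rather than the N*M of a dense matrix-vector product.
    z = [0] * M
    for j, bit in enumerate(r):
        if bit:                  # only the 1s of r contribute mod 2
            for i in cols[j]:
                z[i] ^= 1
    return z

# Toy example with M = 3, N = 4, t = 2; columns 0 and 3 are identical,
# so the received vector [1, 0, 0, 1] has a zero syndrome.
cols = [[0, 1], [1, 2], [0, 2], [0, 1]]
print(syndrome(cols, 3, [1, 0, 0, 1]))   # -> [0, 0, 0]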
`
1.4.5 Construction of MN codes
`
Having created the matrices C_2^{-1} and C_1, we can define the generator matrix G of an MN code
by the non-systematic matrix G^T = C_2^{-1} C_1. The novel idea behind MN codes is that we can
constrain the source vectors to be sparse and exploit this unconventional form of redundancy in
the decoder [ ]. We will discuss properties and possible applications of MN codes in a later section.
The decoding of these codes involves a problem isomorphic to the GL codes' decoding problem
A x = z: the receiver computes z = C_2 r = C_1 s + C_2 n mod 2, so the concatenated vector
x = (s; n), which is sparse by assumption, satisfies [C_1 C_2] x = z.
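This isomorphism can be checked numerically. A toy sketch in the same dense GF(2) style as
above (here C_1 and C_2 are small random matrices rather than the very sparse constructions of
section 1.4.2, and gf2_inv is an illustrative helper):

import numpy as np

def gf2_inv(C):
    # Invert a binary matrix over GF(2) by Gauss-Jordan elimination.
    M = C.shape[0]
    aug = np.hstack([C % 2, np.eye(M, dtype=np.uint8)]).astype(np.uint8)
    for i in range(M):
        pivots = np.nonzero(aug[i:, i])[0]
        if pivots.size == 0:
            raise ValueError('singular')
        p = pivots[0] + i
        aug[[i, p]] = aug[[p, i]]
        for r in range(M):
            if r != i and aug[r, i]:
                aug[r] ^= aug[i]
    return aug[:, M:]

rng = np.random.default_rng(2)
M, K = 8, 4
C1 = rng.integers(0, 2, size=(M, K)).astype(np.uint8)
while True:                                  # draw C2 until it is invertible
    C2 = rng.integers(0, 2, size=(M, M)).astype(np.uint8)
    try:
        C2inv = gf2_inv(C2)
        break
    except ValueError:
        pass

GT = C2inv @ C1 % 2                          # MN generator matrix G^T = C_2^{-1} C_1
s = np.array([1, 0, 0, 0], dtype=np.uint8)   # sparse source vector
n = np.zeros(M, dtype=np.uint8); n[3] = 1    # sparse noise vector
r = (GT @ s + n) % 2                         # received vector

z = C2 @ r % 2                               # z = C_2 r = C_1 s + C_2 n mod 2
x = np.concatenate([s, n])                   # the unknown of the GL-style problem
A = np.hstack([C1, C2])
assert np.all(A @ x % 2 == z)                # [C_1 C_2] x = z, as claimed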
`
1.4.6 Overview
`
The theoretical effectiveness of GL and MN codes as error correcting codes depends on the
properties of very sparse parity check matrices A in relation to the solvability of the decoding
problem (15). We address the question: how well would these codes work if we had the best possible
algorithm for solving the decoding problem?
The practical effectiveness of GL and MN codes depends on our finding a practical algorithm
for solving equation (15) that is close enough to the optimal decoder that the desirable theoretical
properties are not lost.
We now show theoretically that there exist GL and MN codes for which the optimal decoder
would achieve information rates arbitrarily close to the Shannon limit for a wide variety of channels.
We then describe empirical results with a decoding algorithm which, although suboptimal, gives
excellent results. The decoding takes of order N t computational operations.
`
2 Limits of Optimal Decoding
`
We prove properties of GL and MN codes by studying properties of the decoding problem A x = z,
where the unknown vector x is sparse and A is very sparse. We make use of two standard tools:
we prove properties of the optimal decoder by proving properties of a slightly sub-optimal 'typical
set decoder', which is easier to analyse; and we average the performance of this decoder over an
ensemble of random very sparse matrices A. A 'good' average performance proves that there exist
'good' matrices A, and indeed that any random matrix A from the ensemble is likely to be 'good'.
As in all proofs of goodness of coding systems, we employ a transmission block length that can be
increased to a sufficiently large value that an error probability smaller than a desired ε is achieved.
To prove that GL and MN codes are very good we will also increase the weight per column, t, of
the matrix A, but only in such a way as to keep the matrix very sparse, i.e., t/M → 0.
Previous work on low density parity check codes has already established some good properties
of GL codes. Gallager [ ] proved some good properties of GL codes in terms of their distance
properties. Zyablov and Pinsker proved that GL codes are good and gave a practical decoder, but
only for communication rates substantially below the Gilbert bound. Our approach in terms of an
ideal decoder allows us to prove that the codes are good not only for the binary symmetric channel
but also for arbitrary ergodic symmetric channel models; we also prove that GL codes are very
good, a result not explicitly proven in [ ].
The properties that we prove depend on the details of the ensemble of matrices A that is used.
We find it easiest to prove the desired properties by weakening the ensemble of matrices from that
described in section 1.4.2. We introduce the following ensembles, which we believe are ordered such
that the later ensembles define GL and MN codes that have smaller average probability of error,
though we do not have a proof of this statement.
`
1. Matrix A generated by starting from an all-zero matrix and randomly flipping t not necessarily
distinct bits in each column.

2. Matrix A generated by randomly creating columns of weight t.

3. Matrix A generated with weight t per column and (as near as possible) uniform weight per
row.

4. Matrix A generated with weight t per column and uniform weight per row, and no two
columns having overlap greater than 1 (see the sketch after this list).

5. Matrix A further constrained so that its bipartite graph has large girth.

6. Matrix A = [C_1 C_2] further constrained or slightly modified so that C_2 is an invertible matrix.
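The overlap constraint that distinguishes ensemble 4 is cheap to verify; a minimal sketch (the
helper name is illustrative):

def max_column_overlap(cols):
    # cols[j] is the set of rows in which column j has a 1; the overlap
    # of two columns is the number of 1s they share (definition 2).
    # Ensemble 4 requires the maximum pairwise overlap to be at most 1.
    worst = 0
    for j in range(len(cols)):
        for k in range(j + 1, len(cols)):
            worst = max(worst, len(cols[j] & cols[k]))
    return worst

cols = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}]     # three columns of weight t = 3
print(max_column_overlap(cols))              # -> 1: constraint satisfied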
`
Our proofs are for the first ensemble. Our demonstrations use matrices from the later ensembles.
The properties also depend on the assumed noise model. We will give theoretical results for
three cases. First, we give a general theorem for a broad class of symmetric noise models with
memory (definition 5). Second, we discuss a popular special case, the memoryless binary symmetric
channel, corresponding to a vector x of uniform density f. Third, the generalization to channels
with continuous outputs will be discussed in appendix H.
`
2.1 Decodability for arbitrary P(x)
`
To avoid confusion between GL and MN codes when discussing their common decoding problem
A x = z, we refer to the number of columns in A as L and the number of rows as M. [For a glossary
of all symbols used in this paper, see appendix A.] Let the ratio of the length of x to the length of
z be γ ≡ L/M. The decoding problem is equivalent to playing a game in which a vector x is drawn
from a probability distribution P(x), and the vector z = A x mod 2 is revealed; the player's task
is then to identify the original vector x, given z, A, and the known distribution P. The optimal
decoder is an algorithm that identifies the vector x that maximizes P(x) subject to the constraint
z = A x mod 2.
There is a Shannon limit for this game beyond which we clearly cannot hope to recover x from z
reliably. The maximum information content of z is clearly M bits. We assume that the probability
P is stationary and ergodic so that a mean entropy H_x can be defined. Then the Shannon limit
says reliable recovery of x from z is only possible if L H_x ≤ M, i.e.:

    γ H_x ≤ 1.    (17)

If there are matrices A for which we can play the decoding game well at a value of γ H_x close to
this limit, then there exist GL codes which can communicate correspondingly close to the Shannon
limit of a noisy channel whose noise distribution is given by P(x).
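For the memoryless binary symmetric channel this limit reduces to the familiar capacity
condition. As a worked check: for a GL code the unknown x is the noise vector, so L = N and
M = N − K, and (17) gives

    \gamma H_2(f_n) \le 1
    \iff \frac{N}{N-K} H_2(f_n) \le 1
    \iff R = \frac{K}{N} \le 1 - H_2(f_n) = C(f_n).

For example, at f_n = 0.1 we have H_2(0.1) ≈ 0.469, so γ ≤ 2.13 and R ≤ 0.531, which is the
capacity of equation (3).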
`
`
`
`
`
`
Theorem 1. Given an integer t ≥ 3 and a ratio γ > 1, there exists an entropy H_x(γ, t) > 0 such
that, for any P(x) of mean entropy H_x < H_x(γ, t), and a desired block error probability ε > 0,
there exists an integer M, an integer L ≥ γM and a matrix A having M rows and L columns with
weight t or less per column, with the following property: if x is generated from P(x), the optimal
decoder from z = A x back to x̂ achieves a probability of block error less than ε.
`
Theorem 2. Given a distribution P(x) of mean entropy H_x < 1 (bits) and a desired γ < 1/H_x,
there exists an integer t(H_x, γ) ≥ 3 such that for any desired block error probability ε > 0 there is
an integer M_min such that for any M > M_min, there is a matrix A having M rows and L ≥ γM
columns with weight t(H_x, γ) or less per column, with the following property: when x is generated
from P(x), the optimal decoder from z = A x back to x̂ achieves a probability of error less than ε.
`
Implications of theorems 1 and 2 for GL codes. The first theorem effectively states that
GL codes with any value of t ≥ 3 are good, i.e., for any channel with appropriate entropy, there
are GL codes which can achieve virtually error-free transmission at rates up to some non-zero rate
1 − 1/γ, if the block length L is made sufficiently large.
The second theorem effectively states that GL codes are very good: if we are allowed to choose
t, then we can get arbitrarily close to capacity, still using very sparse matrices with t/M arbitrarily
small.
In section 2.2 we prove these theorems; that is, we derive expressions for a function H_x(γ, t) > 0
satisfying theorem 1, and a function t(H_x, γ) satisfying theorem 2. Let the largest function for which
the first theorem is true be H_x^max(γ, t). In a later section we evaluate a tighter numerical lower
bound for H_x^max(γ, t). These are worst case results, true for any source of mean entropy H_x. We
then give numerical results for the case of the binary symmetric channel, where considerably more
optimistic bounds can be derived.
We also prove the following minimum distance theorem for GL codes, which uses the function
H_x(γ, t) of theorem 1.
`
Theorem 3. Given an integer t ≥ 3, a fraction δ < 1/2, and a γ such that H_2(δ) < H_x(γ, t), there
exist integers M and L ≥ γM and a matrix A having M rows and L columns with weight t or less
per column, such that the GL code whose parity check matrix is A has minimum distance at least
δL.
`
We can also prove that the Gilbert bound can be attained as t → ∞, still with A very sparse.

Theorem 4. Given a fraction δ < 1/2, and a γ such that H_2(δ) < 1/γ, there exists a t and an
M_min such that for any M > M_min there is a matrix A having M rows and L ≥ γM columns with
weight t or less per column, such that the GL code whose parity check matrix is A has minimum
distance at least δL.
`
Implication of theorem 1 contrasted with theorem 3. If one only aims to decode noise
patterns of weight up to half of the minimum distance d = δL (as is conventional in much of coding
theory), then one can only handle noise levels up to (d/L)/2. In fact the optimal decoder can
decode (with vanishing probability of error) at noise levels up to (d/L). Thus GL codes can serve
as good codes at noise levels twice as great as the maximum noise level that is attainable if one
restricts attention to bounded distance decoding. The intuition for this result is that in a very
high dimensional binary space, two spheres of radius r whose centres are a distance d apart have
a vanishingly small volume of intersection if r is less than d, even though they have non-zero
overlap for any r greater than d/2.
GL codes, as Gallager showed [ ], and as we will show later, can in practice be decoded beyond
their minimum distance.
`
`
`
`
`
`
2.2 Proof of theorem 1
`
Consider the problem, given A and z, of inferring x, where A x = z and x has probability
distribution P(x) with mean entropy H_x. We consider the probability of error of a typical set
decoder [ ], averaging it over all very sparse random matrices A. We establish the existence of a
function H_x(γ, t) > 0 such that the probability of error can be bounded by a sum of terms which
decrease as inverse powers of L or exponentially with L, if H_x < H_x(γ, t).

Typical set decoder. We consider the typical set:

    T = T_(L,β) = { x ∈ {0,1}^L : | (1/L) log_2 (1/P(x)) − H_x | < β }