`
Low-Density Parity-Check Codes

Robert G. Gallager
`
`1963
`
`
`
`
`Preface
`
`The Noisy Channel Coding Theorem discovered by C. E. Shannon in 1948 of-
`fered communication engineers the possibility of reducing error rates on noisy
`channels to negligible levels without sacrificing data rates. The primary obstacle
`to the practical use of this theorem has been the equipment complexity and the
`computation time required to decode the noisy received data.
`This monograph presents a technique for achieving high data rates and neg-
`ligible error probabilities on noisy channels with a reasonable amount of equip-
ment. The advantages and disadvantages of this technique over other techniques
`for the same purpose are neither simple nor clear-cut, and depend primarily
`upon the channel and the type of service required. More important than the
`particular technique, however, is the hope that the concepts here will lead to
`new and better coding procedures.
`The chapters of the monograph are arranged in such a way that with the
`exception of Chapter 5 each chapter can be read independently of the others.
Chapter 1 sets the background of the study, summarizes the results, and briefly
`compares low-density coding with other coding schemes. Chapter 2 analyzes
`the distances between code words in low-density codes and Chapter 3 applies
`these results to the problem of bounding the probability of decoding error that
`can be achieved for these codes on a broad class of binary-input channels. The
`results of Chapter 3 can be immediately applied to any code or class of codes
`for which the distance properties can be bounded. Chapter 4 presents a simple
`decoding algorithm for these codes and analyzes the resulting error probabil-
`ity. Chapter 5 briefly extends all the previous results to multi-input channels,
`and Chapter 6 presents the results of computer simulation of the low-density
`decoding algorithm.
`The work reported here is an expanded and revised version of my doctoral
dissertation, completed in 1960 in the Department of Electrical Engineering,
M.I.T. I am grateful to my thesis supervisor, Professor Peter Elias, and to
my thesis readers, Professors Robert M. Fano and John M. Wozencraft, for
assistance and encouragement both during the course of the thesis and later.
`This research was made possible in part by support extended by the Research
`Laboratory of Electronics of the Massachusetts Institute of Technology, which is
`supported in part by the U.S. Army, the Air Force Office of Scientific Research,
`and the Office of Naval Research; additional support was received through the
National Science Foundation (Grant G-16526) and the National Institutes of
Health (Grant MH-04737-03).
Much of Chapter 4 is reprinted with permission of the editors from an article
by the author in the Transactions of the I.R.E., IT-9, pages 21 to 28.
`The experimental results in Chapter 6 were obtained in part through the
`support of the Rome Air Development Center and in part through the support
of the M.I.T. Computation Center.
`
`Cambridge, Mass.
`July, 1963
`
`Robert G. Gallager
`
`
`
Contents

1 Introduction
1.1 Coding for Digital Data Transmission
1.2 Low-Density Parity-Check Codes
1.3 Summary of Results
1.4 Comparison with Other Schemes

2 Distance Functions
2.1 Equiprobable Ensemble of Parity-Check Codes
2.2 Distance Properties of Low-Density Codes

3 Probability of Decoding Error
3.1 Symmetric Binary Input Channels
3.2 Distance Properties
3.3 Upper Bound on Probability of Decoding Error
3.4 Chernov Bounds
3.5 P_e for Codes and Ensembles of Codes
3.6 Error Probability for Equiprobable Code Ensemble
3.7 Binary Symmetric Channel
3.8 Upper Bound on Rate for Low-Density Codes

4 Decoding
4.1 Introduction
4.2 Probabilistic Decoding
4.3 Probability of Error Using Probabilistic Decoding

5 Low-Density Codes with Arbitrary Alphabet Sizes
5.1 Distance Functions
5.2 Probability of Decoding Error
5.3 Probabilistic Decoding
5.4 Probability of Error Using Probabilistic Decoding

6 Experimental Results
6.1 Code Generation
6.2 Binary Symmetric Channel
6.3 White Gaussian Noise Channel
6.4 Rayleigh Fading Channel

A Properties of the Function B(λ)

B Miscellaneous Mathematical Derivations for Chapter 3

C Analysis of Number of Independent Decoding Iterations

References

Index
`
`
`
`1
`
`Introduction
`
`1.1 Coding for Digital Data Transmission
The need for efficient and reliable digital data communication systems has been
rising rapidly in recent years. This need has been brought on by a variety of
reasons, among them being the increase in automatic data-processing equipment
and the increased need for long-range communication. Attempts to develop data
systems through the use of conventional modulation and voice transmission
techniques have generally resulted in systems with relatively low data rates and
high error probabilities.
A more fundamental approach to the problems of efficiency and reliability
in communication systems is contained in the Noisy Channel Coding theorem
developed by C. E. Shannon [15, 4] in 1948. In order to understand the meaning
of this theorem, consider Figure 1.1. The source produces binary digits, or
binits, at some fixed time rate R_t. The encoder is a device that performs data
`
[Figure 1.1 shows four blocks in cascade: a source emitting R_t binits per
second, an encoder operating on v binits at a time, a noisy channel, and a
decoder producing a replica of the source binits.]

Figure 1.1: Block diagram of a communication system.
`
`processing, modulation, and anything else that might be necessary to prepare
`the data for transmission over the channel. We shall assume, however, that
`the encoder separates the source sequence into blocks of v binits and operates
`on only one block at a time. The encoder output is then transmitted over the
channel and changed by some sort of random disturbance or noise. The decoder
processes the channel output and produces a delayed replica of the source binits.
The coding theorem states that for a large variety of channel models, encoders
and decoders exist such that the probability of the decoder reproducing a source
binit in error, P_e, is bounded by

$$e^{-v[E_L(R_t) + o(v)]} \le P_e \le e^{-vE(R_t)}$$

The functions E(R_t) and E_L(R_t) depend upon the channel but not upon v; they
are positive when R_t = 0, and decrease with R_t until they become 0 at some
time rate C_t, known as the channel capacity. The exact nature of these functions
and the particular class of channels for which this theorem has been proved need
not concern us here. The important result is that the coding constraint length
v is a fundamental parameter of a communication system. If a channel is to be
used efficiently, that is with R_t close to C_t, then v must be made correspondingly
large to achieve a satisfactory error probability.
`The obvious response of an engineer to such a theorem is: “Splendid, but how
does one build encoders and decoders that behave in this way when v is large?"
It is rather sobering to observe that if an encoder stores a waveform or code
`
`
`
`word for each possible block of v binits, then the storage requirement must be
proportional to 2^v, which is obviously impractical when v is large. Fortunately,
`Elias [3] and Reiffen [14] have proved that for a wide variety of channel models,
`the results of the Noisy Channel Coding theorem can be achieved with little
`equipment complexity at the encoder by the use of parity-check coding. This
`will be described in more detail later.
`Unfortunately, the problem of decoding simply but effectively when v is
large appears to be much more difficult than the problem of encoding. Enough
`approaches to this problem have been developed to assure one that the Cod-
`ing theorem has engineering importance. On the other hand these approaches
`have not been carried far enough for the design of an efficient, reliable data
`communication system to become a matter of routine engineering.
`This monograph contains a detailed study of one of the three or four most
`promising approaches to simple decoding for long constraint length codes. The
`purpose of publishing this work is primarily to show how such a coding and
`decoding scheme would work and where it might be useful. Also, naturally,
`it is hoped that this will stimulate further research on the subject. Further
`mathematical analysis will probably be fruitless, but there are many interesting
`modifications of the scheme that might be made and much experimental work
`that should be done.
In order to prove mathematically some results about low-density parity-
`check codes, we shall assume that the codes are to be used on a somewhat
`restricted and idealized class of channels. It is obvious that results using such
`channel models can be applied only to channels that closely approximate the
`model. However, when studying the probability of decoding error, we are in-
`terested primarily in the extremely atypical events that cause errors. It is not
`easy to find models that approximate both these atypical events and typical
`events. Consequently the analysis of codes on idealized channels can provide
only limited insight about real channels, and such insight should be used with
`caution.
`The channel models to be considered here are called symmetric binary-input
`channels. By this we mean a time-discrete channel for which the input is a
sequence of the binary digits 0 and 1 and the output is a corresponding sequence
of letters from a discrete or continuous alphabet. The channel is memoryless in
the sense that given the input at a given time, the output at the corresponding
time is statistically independent of all other inputs and outputs. The symmetry
requirement will be defined precisely in Chapter 3, but roughly it means that the
outputs can be paired in such a way that the probability of one output given an
input is the same as that of the other output of the pair given the other input.
The binary symmetric channel, abbreviated BSC, is a member of this class of
channels in which there are only two output symbols, one corresponding to each
input. The BSC can be entirely specified by the probability of a crossover from
one input to the other output.
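As a minimal illustration of this channel model (a Python sketch added for illustration; none of the names below come from the monograph), a BSC can be simulated by flipping each transmitted digit independently with the crossover probability:

    import random

    def bsc(bits, p, rng):
        """Binary symmetric channel: flip each input bit independently
        with crossover probability p."""
        return [b ^ (rng.random() < p) for b in bits]

    rng = random.Random(1)
    sent = [rng.getrandbits(1) for _ in range(100000)]
    received = bsc(sent, 0.05, rng)
    # Without coding the receiver guesses each symbol as received,
    # so the fraction of binits in error is simply p.
    print(sum(s != r for s, r in zip(sent, received)) / len(sent))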
If a symmetric binary-input channel were to be used without coding, a sequence
of binary digits would be transmitted through the channel and the receiver
would guess the transmitted symbols one at a time from the received symbols.
If coding were to be used, however, the coder would first take sequences of
binary digits carrying the information from the source and would map these
sequences into longer redundant sequences, called code words, for transmission
over the channel. We define the rate R of such codes to be the ratio of the
length of the information sequence to the length of the code word sequence.
If the code words are of length n, then there are 2^{nR} possible sequences
from the source that are mapped into n-length code words. Thus only a fraction
2^{-n(1-R)} of the 2^n different n-length sequences can be used as code words.
`At the receiver, the decoder, with its knowledge of which sequences are code
`words, can separate the transmitted n-length code word from the channel noise.
Thus the code word is mapped back into the nR information digits. Many
decoding schemes find the transmitted code word by first making a decision on
each received digit and then using a knowledge of the code words to correct the
errors. This intermediate decision, however, destroys a considerable amount of
information about the transmitted message, as discussed in detail for several
channels in Reference [1]. The decoding scheme to be described here avoids this
`intermediate decision and operates directly with the a posteriori probabilities
`of the input symbols conditional on the corresponding received symbols.
The codes to be discussed here are special examples of parity-check codes.¹
The code words of a parity-check code are formed by combining a block of
binary information digits with a block of check digits. Each check digit is the
modulo 2 sum² of a pre-specified set of information digits. These formation rules
for the check digits can be represented conveniently by a parity-check matrix,
as in Figure 1.2. This matrix represents a set of linear homogeneous modulo 2
equations called parity-check equations, and the set of code words is the set of
solutions of these equations. We call the set of digits contained in a parity-check
equation a parity-check set. For example, the first parity-check set in Figure 1.2
is the set of digits (1, 2, 3, 5).
`
    x1  x2  x3  x4  x5  x6  x7
     1   1   1   0   1   0   0        x5 = x1 + x2 + x3
     1   1   0   1   0   1   0        x6 = x1 + x2 + x4
     1   0   1   1   0   0   1        x7 = x1 + x3 + x4

Figure 1.2: Example of a parity-check matrix.
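To make the formation rules concrete, the following sketch (Python, added for illustration, assuming the check equations of Figure 1.2 as reconstructed above) forms a code word by computing each check digit as a modulo 2 sum over its parity-check set:

    def encode(x1, x2, x3, x4):
        """Code word of the Figure 1.2 code: four information digits
        followed by three check digits, each a modulo 2 sum."""
        x5 = (x1 + x2 + x3) % 2   # first parity-check set (1, 2, 3, 5)
        x6 = (x1 + x2 + x4) % 2
        x7 = (x1 + x3 + x4) % 2
        return [x1, x2, x3, x4, x5, x6, x7]

    # The 2^4 = 16 information blocks map into 16 of the 2^7 = 128
    # possible 7-digit sequences, a fraction 2^{-n(1-R)} with R = 4/7.
    print(encode(1, 0, 1, 1))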
`
The use of parity-check codes makes coding (as distinguished from decoding)
relatively simple to implement. Also, as Elias [3] has shown, if a typical parity-
check code of long block length is used on a BSC, and if the code rate is between
critical rate and channel capacity, then the probability of decoding error will be
`almost as small as that for the best possible code of that rate and block length.
`
¹For a more detailed discussion of parity-check codes, see Peterson [12].
²The modulo 2 sum is 1 if the ordinary sum is odd and 0 if the ordinary sum is even.
`
`
`
`Unfortunately, the decoding of parity-check codes is not inherently simple
`to implement; thus we must look for special classes of parity-check codes, such
`as described in Section 1.2, for which reasonable decoding procedures exist.
`
`1.2 Low-Density Parity-Check Codes
`
Low-density parity-check codes are codes specified by a matrix containing mostly
0's and relatively few 1's. In particular, an (n, j, k) low-density code is a code of
block length n with a matrix like that of Figure 2.1, where each column contains
a small fixed number j of 1's and each row contains a small fixed number k of
1's. Note that this type of matrix does not have the check digits appearing
in diagonal form as do those in Figure 1.2. However, for coding purposes, the
equations represented by these matrices can always be solved to give the check
digits as explicit sums of information digits.
Low-density codes are not optimum in the somewhat artificial sense of min-
imizing the probability of decoding error for a given block length, and it can
be shown that the maximum rate at which they can be used is bounded below
channel capacity. However, the existence of a simple decoding scheme more
than compensates for these disadvantages.
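The fact that a matrix with nj/k rows gives a rate of at least 1 − j/k, with equality only when the rows are independent over the modulo 2 field, can be checked mechanically. The sketch below (Python, added for illustration; gf2_rank is not from the monograph) computes the true rate of any parity-check matrix from its rank over the modulo 2 field:

    def gf2_rank(matrix):
        """Rank of a 0/1 matrix over the modulo 2 field, by Gaussian
        elimination on rows packed into integers."""
        rows = [int("".join(map(str, r)), 2) for r in matrix]
        rank = 0
        while rows:
            row = rows.pop()
            if row == 0:
                continue          # dependent row: contributes nothing
            rank += 1
            msb = 1 << (row.bit_length() - 1)
            rows = [r ^ row if r & msb else r for r in rows]
        return rank

    # For the matrix of Figure 1.2: rank 3, so R = 1 - 3/7 = 4/7.
    H = [[1, 1, 1, 0, 1, 0, 0],
         [1, 1, 0, 1, 0, 1, 0],
         [1, 0, 1, 1, 0, 0, 1]]
    print(1 - gf2_rank(H) / len(H[0]))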
`
`1.3 Summary of Results
`
An ensemble of (n, j, k) codes will be formed in Chapter 2, and this ensemble
will be used to analyze the distance properties of (n, j, k) codes. The distance
between two words in a code is simply the number of digits in which they differ.
Clearly an important parameter in a code is the set of distances separating one
code word from all the other code words. In a parity-check code, it can be shown
that all code words have the same set of distances to the other code words [12].
Thus the distance properties for the ensemble can be summarized by the typical
number of code words at each distance from the all-zero code word. It is found
that the typical (n, j, k) code for j ≥ 3 has a minimum distance that increases
linearly with the block length for j and k constant. Figure 2.4 plots the ratio
of minimum distance to block length for several values of j and k and compares
the ratio with the same ratio for ordinary parity-check codes. The (n, j, k)
codes with j = 2 exhibit markedly different behavior, and it is shown that the
minimum distance of an (n, 2, k) code can increase at most logarithmically with
the block length.
In Chapter 3, a general upper bound on the probability of decoding error for
symmetric binary-input channels with maximum-likelihood decoding is derived
for both individual codes and for arbitrary ensembles of codes. The bound is
a function of the code only through its distance properties. The assumption of
maximum-likelihood decoding is made partly for analytic convenience and partly
so as to be able to evaluate codes independently of their decoding algorithms.
Any practical decoding algorithm, such as that described in Chapter 4, involves
`
`
`
`decoder minimizes the error probability but is totally impractical if the block
`length is large.
`It is shown in Chapter 3 that if the distance properties of the code are
`exponentially related to the block length, and if the code rate is sufficiently low,
`then the bound to P(e) is an exponentially decreasing function of the block
`length. For the appropriate ensemble of codes, these bounds reduce to the
usual random coding bounds [3, 4].
For the special case of the binary symmetric channel, a particularly simple
bound to P(e) is found; this is used to show that over a range of channel
`crossover probabilities, a typical low-density code has the same error behavior
`as the optimum code of a slightly higher rate. Figure 3.5 illustrates this loss of
`effective rate associated with low-density codes.
In Chapter 4, two decoding schemes are described. In the first, which is par-
ticularly simple, the decoder first makes a decision on each digit, then computes
the parity checks and changes any digit that is contained in more than some
fixed number of unsatisfied parity-check equations. The process is repeated,
each time using the changed digits, until the sequence is decoded. The second
decoding scheme is based on a procedure for computing the conditional proba-
bility that an input symbol is 1; this is conditioned on all the received symbols
that are in any of the parity-check sets containing the digit in question. Once
again, the procedure is iterated until the sequence is decoded. The computation
per digit per iteration in each scheme is independent of the code length. The
probabilistic, or second, scheme entails slightly more computation than the first
`scheme, but decodes with a lower error probability.
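The first scheme can be stated in a few lines. The sketch below is an editorial rendering in Python of the description just given, not the monograph's own program; the threshold and the iteration limit are free parameters chosen here for illustration:

    def flip_decode(H, received, threshold, max_iters=50):
        """Iterative bit flipping: recompute the parity checks, flip
        every digit lying in more than `threshold` unsatisfied checks,
        and repeat until the sequence is decoded or progress stops."""
        x = list(received)
        n = len(x)
        for _ in range(max_iters):
            unsat = [sum(h[i] * x[i] for i in range(n)) % 2 for h in H]
            if not any(unsat):
                return x          # all parity checks satisfied: decoded
            flips = [i for i in range(n)
                     if sum(u for h, u in zip(H, unsat) if h[i]) > threshold]
            if not flips:
                return x          # no digit exceeds the threshold: give up
            for i in flips:
                x[i] ^= 1         # change the digit and repeat
        return x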
`A mathematical analysis of the probability of decoding error using proba-
`bilistic decoding is difficult because of statistical dependencies. However, for a
BSC with sufficiently small crossover probabilities and for codes with j ≥ 4,
a very weak upper bound on the probability of error is derived that decreases
exponentially with a root of the code length. Figure 3.5 plots crossover proba-
`bilities for which the probability of decoding error is guaranteed to approach 0
`with increasing code length. It is hypothesized that the probability of decoding
`error actually decreases exponentially with block length, while the number of
`iterations necessary to decode increases logarithmically.
`Chapter 5 extends all the major results of Chapters 2, 3, and 4 to non-
`binary low-density parity-check codes. Although the theory generalizes in a
very natural way, the expressions for minimum distance, error probability, and
probabilistic decoding performance are sufficiently complicated that little
insight is gained into the advantages or disadvantages of a multilevel system over
a binary system. Some experimental work would be helpful here in evaluating
`these codes.
`Some experimental results for binary low-density codes are presented in
`Chapter 6. An IBM 7090 computer was used to simulate both probabilistic
decoding and the noise generated by several different types of channels. Due
to limitations on computer time, the only situations investigated were those in
which the channel was sufficiently noisy to yield a probability of decoding er-
ror greater than 10^{-4}. The most spectacular data from these experiments are
`
`
`
`given in Figure 6.8, which emphasizes the advantages of a decoding scheme that
`operates from a likelihood receiver instead of a decision receiver.
`
`1.4 Comparison with Other Schemes
`Some other coding and decoding schemes that appear extremely promising for
`achieving low error probabilities and high data rates at reasonable cost are the
`following: first, convolutional codes [3] with sequential decoding as developed
`by Wozencraft [17], Fano [5], and Reiffen [14]; second, convolutional codes with
Massey's threshold decoding [10]; and third, the Bose-Chaudhuri codes [2] with
`the decoding schemes developed by Peterson [12] and Zierler and Gorenstein [18].
It has been shown by Fano [5] that for arbitrary discrete memoryless chan-
nels, sequential decoding has a probability of decoding error that is upper
bounded by a function of the form e^{-αn}. Here n is the constraint length of
the code and α is a function of both the channel and the code; α is positive for
rates below channel capacity C. Fano also shows that for rates below a certain
quantity called R_comp, where R_comp < C, the average amount of computation
in decoding a digit is bounded by a quantity independent of constraint length.
`An experimental sequential decoder has been built at Lincoln Laboratories,
Lexington, Massachusetts [11]. By using this decoder in a system with a feed-
`back link and an appropriately designed modulator and demodulator, reliable
`transmission has been achieved experimentally [9] over a telephone circuit at
`about 7500 bits per second rather than the 1200 or 2400 bits per second possi-
`ble without coding.
`The two principal weaknesses of sequential decoding are as follows: First,
the amount of computation required per digit is a random variable, and this
creates a waiting line problem at the decoder; second, if the decoder once makes
an error, a large block of errors can be made before the decoder gets back on
the proper track. If a feedback link is available, these problems are not serious,
`but considerably more study is required for cases in which no feedback exists.
`Threshold decoding is the simplest scheme to implement that is discussed
here; it involves only shift registers, a few binary adders, and a threshold device.
It is most effective at relatively short constraint lengths, and has a somewhat
higher error probability and less flexibility than sequential decoding.
`The computation per digit associated with the Bose-Chaudhuri codes on the
`BSC increases roughly as the cube of the block length but does not fluctuate
widely. The decoding scheme guarantees correction of all combinations of up to
some fixed number of errors and corrects nothing beyond. For moderately long
block lengths, this restriction in the decoding procedure causes a large increase
in P_e. No way is known to make use of the a posteriori probabilities at the
`output of more general binary input channels. This inability to make use of a
`posteriori probabilities appears to be a characteristic limitation of algebraic as
`opposed to probabilistic decoding techniques.
`The computation per digit associated with low-density parity-check codes
appears to increase at most logarithmically with block length and not to fluctu-
ate widely with the noise. The probability of decoding error is unknown, but is
`
`
`
`believed to decrease exponentially with block length at a reasonable rate. The
`ability to decode the digits of a block in parallel makes it possible to handle
`higher data rates than is possible with other schemes.
`For many channels with memory, retaining the a posteriori probabilities from
`the channel makes it practically unnecessary to take account of the memory in
`any other way. For instance, on a fading channel when the fade persists for
`several baud lengths, the a posteriori probabilities will indicate the presence of
a fade. If this channel were used as a BSC, however, it would be necessary for
the decoder to account for the fact that bursts of errors are more probable than
isolated errors. Thus, using a posteriori probabilities gives low-density decoding
and sequential decoding great flexibility in handling channels with dependent
noise. For channels in which the noise is rigidly constrained to occur in short,
`severe bursts, on the other hand, there is a particularly simple procedure for
`decoding the Bose-Chaudhuri codes [12].
When transmitting over channels subject to long fades or long noise bursts,
it is often impractical to correct errors in these noisy periods. In such cases it is
advantageous to use a combination of error correction and error detection with
feedback and retransmission [16]. All of the coding and decoding schemes being
considered here fit naturally into such a system, but in cases where little or no
error correction is attempted, low-density codes appear at a disadvantage.
`In conclusion, all these schemes have their own advantages, and clearly no
scheme is optimum for all communication situations. It appears that enough
`coding and decoding alternatives now exist for serious consideration of the use
`of coding on particular channels.
`
`10
`
`
`
`2 Distance Functions
`
The distance function N(ℓ) of a parity-check code is defined as the number of
code words in the code of weight ℓ. From the group properties of a parity-
check code, it easily follows [12] that N(ℓ) is also the number of code words
at distance ℓ from any given code word. The minimum distance D of a code
is then defined as the smallest value of ℓ > 0 for which N(ℓ) ≠ 0. Clearly, in
a code of given block length n and rate R it is desirable to make D as large
as possible and to make N(ℓ) as small as possible for those ℓ just larger than
D. However, the next chapter, which discusses bounding the probability of
decoding error for symmetric binary-input channels, will make the exact effect
of N(ℓ) on error-correcting capability clearer.
`For a parity-check code of long block length it is usually impractical to
`calculate exactly the distance function or even the minimum distance because
`of the enormous number of code words involved. It is often simpler to analyze the
average distance function of an ensemble of codes; the statistics of an ensemble
permit one to average over quantities that are not tractable in individual codes.
From the ensemble average, one can then make statistical statements about the
`member codes.
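For short block lengths the enumeration is still feasible and makes the definitions concrete. The following sketch (Python, added for illustration only) computes N(ℓ) and D for any parity-check matrix by testing all 2^n sequences against the parity checks; its exponential running time is precisely why the ensemble-average approach below is needed.

    from itertools import product

    def distance_function(H):
        """N(l) for l = 0..n: the number of code words of weight l,
        where a code word is any solution of all the parity checks."""
        n = len(H[0])
        N = [0] * (n + 1)
        for x in product([0, 1], repeat=n):
            if all(sum(h[i] * x[i] for i in range(n)) % 2 == 0 for h in H):
                N[sum(x)] += 1
        return N

    H = [[1, 1, 1, 0, 1, 0, 0],    # the matrix of Figure 1.2
         [1, 1, 0, 1, 0, 1, 0],
         [1, 0, 1, 1, 0, 0, 1]]
    N = distance_function(H)
    D = min(l for l in range(1, len(N)) if N[l] != 0)
    print(N, D)   # 16 code words in all; minimum distance D = 3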
`
`2.1 Equiprobable Ensemble of Parity-Check Codes
`This chapter will be concerned primarily with the distance functions of low-
`density parity-check codes, but for comparison purposes, the average distance
`function of another ensemble of parity-check codes will be derived first. Since a
parity-check code is completely specified by a parity-check matrix, an ensemble
of parity-check codes may be defined in terms of an ensemble of parity-check
matrices. The equiprobable ensemble of parity-check codes of rate R and block
length n will be defined as the ensemble in which the n(1 − R) by n parity-check
matrix is filled with statistically independent equiprobable binary digits. This
is essentially the same ensemble as that considered by Elias [3] in his random
coding bounds for parity-check codes; the minor difference is that codes in this
ensemble may have a rate slightly higher than R, since the rows of a matrix in
this ensemble are not necessarily independent over the modulo 2 field.
`
Theorem 2.1. Let N̄(ℓ) be the average number of code words of weight ℓ in a
code averaged over the equiprobable ensemble of parity-check codes of length n
and rate R. Then for ℓ > 0,

$$\bar{N}(\ell) = \binom{n}{\ell} 2^{-n(1-R)} \le [2\pi n\lambda(1-\lambda)]^{-1/2} \exp n[H(\lambda) - (1-R)\ln 2] \qquad (2.1)$$

where

$$\lambda = \frac{\ell}{n}; \qquad H(\lambda) = \lambda \ln\frac{1}{\lambda} + (1-\lambda)\ln\frac{1}{1-\lambda}$$
`
`11
`
`
`
Proof. Let P(ℓ) be the probability of the set of codes for which some particular
sequence of weight ℓ is a code word. Stated differently, P(ℓ) is the probability
that a particular sequence of weight ℓ will be a code word in a code chosen at
random from the ensemble. Since the all-zero sequence is a code word in any
parity-check code, P(ℓ) = 1 for ℓ = 0. For ℓ ≠ 0, a particular parity check will
check with probability 1/2 on the last position in which the weight-ℓ sequence has
a one. This makes the probability 1/2 that the parity check is satisfied, regardless
of whether the first ℓ − 1 ones were checked an even or an odd number of times. A
sequence will be a code word if and only if it satisfies all the n(1 − R) parity checks,
so that

$$P(\ell) = 2^{-n(1-R)}, \qquad \ell \ne 0 \qquad (2.2)$$

The probability P(ℓ) can also be interpreted as an expectation of a random
variable that is 1 if the sequence is a code word and 0 otherwise. Now we observe
that there are $\binom{n}{\ell}$ sequences of weight ℓ and that the expected number of code
words among these sequences is the sum of the expectations that the individual
sequences are code words. Thus

$$\bar{N}(\ell) = \binom{n}{\ell} 2^{-n(1-R)}$$

We now bound $\binom{n}{\ell}$ by the Stirling approximation:

$$\sqrt{2\pi n}\, n^n \exp\left(-n + \frac{1}{12n+1}\right) \le n! \le \sqrt{2\pi n}\, n^n \exp\left(-n + \frac{1}{12n}\right) \qquad (2.3)$$

It follows after some manipulation that for λn = ℓ,

$$[8n\lambda(1-\lambda)]^{-1/2} \exp nH(\lambda) \le \binom{n}{\lambda n} \le [2\pi n\lambda(1-\lambda)]^{-1/2} \exp nH(\lambda) \qquad (2.4)$$

where

$$H(\lambda) = -\lambda\ln\lambda - (1-\lambda)\ln(1-\lambda)$$

Combining Equations (2.4) and (2.2), we get the statement of the theorem. □
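Equation (2.1) is easy to check numerically. The sketch below (Python, an editorial illustration; the function names are not from the monograph) computes the exact ensemble average from the proof alongside the bound of the theorem; the bound always lies on or above the exact value.

    from math import comb, exp, log, pi, sqrt

    def n_bar(n, R, l):
        """Exact ensemble average of Theorem 2.1."""
        return comb(n, l) * 2.0 ** (-n * (1 - R))

    def n_bound(n, R, l):
        """Right-hand side of Equation (2.1)."""
        lam = l / n
        H = lam * log(1 / lam) + (1 - lam) * log(1 / (1 - lam))
        return exp(n * (H - (1 - R) * log(2))) / sqrt(2 * pi * n * lam * (1 - lam))

    n, R = 100, 0.5
    for l in (5, 25, 50):
        print(l, n_bar(n, R, l), n_bound(n, R, l))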
`
`Next we observe that over the equiprobable ensemble of parity-check codes,
`the minimum distance of the individual codes is a random variable whose dis-
`tribution function can be bounded by the following theorem.
`
Theorem 2.2. Over the equiprobable ensemble of parity-check codes of length n
and rate R, the minimum-distance distribution function Pr(D ≤ δn) is bounded
by both the following inequalities for δ < 1/2 and δn an integer:

$$\Pr(D \le \delta n) \le \frac{1-\delta}{1-2\delta}\,[2\pi n\delta(1-\delta)]^{-1/2} \exp n[H(\delta) - (1-R)\ln 2] \qquad (2.5)$$

$$\Pr(D \le \delta n) \le 1$$
`
`12
`
`
`
Proof. It was shown in Theorem 2.1 that the probability that a nonzero sequence
is a code word over the ensemble of codes is 2^{-n(1-R)}. The probability that any
sequence of weight nδ or less is a code word is certainly less than the sum of
the probabilities that the individual sequences are code words. Thus,

$$\Pr(D \le n\delta) \le \sum_{\ell=1}^{n\delta} \binom{n}{\ell} 2^{-n(1-R)} \qquad (2.6)$$

$$\sum_{\ell=1}^{n\delta} \binom{n}{\ell} = \binom{n}{n\delta}\left[1 + \frac{n\delta}{n-n\delta+1} + \frac{n\delta(n\delta-1)}{(n-n\delta+1)(n-n\delta+2)} + \cdots\right]$$

Bounding this by a geometric series, we get

$$\sum_{\ell=1}^{n\delta} \binom{n}{\ell} \le \binom{n}{n\delta}\frac{1-\delta}{1-2\delta} \qquad (2.7)$$

Bounding Equation (2.7) by (2.4) and substituting into Equation (2.6), we get
the statement of the theorem. □
`
As n gets larger, this bound to Pr(D ≤ δn) as a function of δ approaches
a step function with the step at that δ₀ < 1/2 for which H(δ₀) = (1 − R) ln 2.
Figure 2.4 plots δ₀ as a function of rate. This result is closely related to the
Gilbert bound on minimum distance [6]. The asymptotic form of the Gilbert
bound for large n states that there exists a code for which D ≥ nδ₀. Theorem 2.2
states that for any ε > 0, the probability of the set of parity-check codes for
which D < n(δ₀ − ε) approaches 0 exponentially with n.
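Since H(δ) is increasing on (0, 1/2), the step location δ₀ defined by H(δ₀) = (1 − R) ln 2 is easily found by bisection; the sketch below (Python, added for illustration) does so and reproduces the kind of values plotted in Figure 2.4.

    from math import log

    def delta_0(R, tol=1e-12):
        """Solve H(d) = (1 - R) ln 2 for d in (0, 1/2) by bisection."""
        H = lambda d: -d * log(d) - (1 - d) * log(1 - d)
        target = (1 - R) * log(2)
        lo, hi = 1e-15, 0.5
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if H(mid) < target:
                lo = mid          # H too small: move toward 1/2
            else:
                hi = mid
        return lo

    print(delta_0(0.5))   # about 0.110 at rate R = 1/2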
`
`2.2 Distance Properties of Low-Density Codes
`In this section an ensemble of low-density parity-check codes will be defined, and
`theorems similar to Theorems 2.1 and 2.2 will be proved. Then a new ensemble
`will be formed by expurgating those codes that have small minimum distances.
This expurgated ensemble will be used in the next chapter to derive bounds on
`the probability of decoding error for various channels.
Define an (n, j, k) parity-check matrix as a matrix of n columns that has j
ones in each column, k ones in each row, and zeros elsewhere. It follows from
this definition that an (n, j, k) parity-check matrix has nj/k rows and thus a rate
R ≥ 1 − j/k. In order to construct an ensemble of (n, j, k) matrices, consider
first the special (n, j, k) matrix in Figure 2.1, for which n = 20, j = 3, and
k = 4. This matrix is divided into j submatrices, each containing a single 1 in
each column. The first of these submatrices contains all its 1's in descending
order; that is, the ith row contains 1's in columns (i − 1)k + 1 to ik. The
other submatrices are merely column permutations of the first. We define the
ensemble of (n, j, k) codes as the ensemble resulting from random permutations
of the columns of each of the bottom j − 1 submatrices of a matrix such as in
Figure 2.1, with equal probability assigned to each permutation.
`13
`
`
`
[The 15 × 20 matrix of Figure 2.1 is not reproduced here; it consists of three
5 × 20 submatrices stacked vertically, each containing a single 1 in each column
and four 1's in each row.]

Figure 2.1: Example of a low-density code matrix for n = 20, j = 3, and k = 4.
`
This definition is somewhat arbitrary and is made for mathematical convenience. In fact such
`an ensemble does not include all (n
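The construction just described is short enough to state as code. The following sketch (Python, an editorial illustration; the function name is not from the monograph) builds the special matrix with its first submatrix in descending order and then draws a random member of the ensemble by permuting the columns of each of the bottom j − 1 submatrices:

    import random

    def ldpc_matrix(n, j, k, rng=None):
        """A random (n, j, k) parity-check matrix: j stacked submatrices
        of n/k rows each, the first in descending order, the rest
        independent random column permutations of the first."""
        rng = rng or random.Random(0)
        assert n % k == 0
        # Row i of the first submatrix has 1's in columns ik+1 to (i+1)k
        # (counting from zero here rather than one, as in the text).
        first = [[1 if i * k <= col < (i + 1) * k else 0 for col in range(n)]
                 for i in range(n // k)]
        H = [row[:] for row in first]
        for _ in range(j - 1):
            perm = rng.sample(range(n), n)   # random column permutation
            H += [[row[perm[col]] for col in range(n)] for row in first]
        return H

    H = ldpc_matrix(20, 3, 4)                      # the case of Figure 2.1
    assert all(sum(col) == 3 for col in zip(*H))   # j = 3 ones per column
    assert all(sum(row) == 4 for row in H)         # k = 4 ones per row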