`
Low-Density Parity-Check Codes

Robert G. Gallager

1963
`
`1
`
`Apple 1007
`Apple 1007
`
`
`
`Preface
`
The Noisy Channel Coding Theorem discovered by C. E. Shannon in 1948 offered communication engineers the possibility of reducing error rates on noisy channels to negligible levels without sacrificing data rates. The primary obstacle to the practical use of this theorem has been the equipment complexity and the computation time required to decode the noisy received data.
This monograph presents a technique for achieving high data rates and negligible error probabilities on noisy channels with a reasonable amount of equipment. The advantages and disadvantages of this technique over other techniques for the same purpose are neither simple nor clear-cut, and depend primarily upon the channel and the type of service required. More important than the particular technique, however, is the hope that the concepts here will lead to new and better coding procedures.
The chapters of the monograph are arranged in such a way that, with the exception of Chapter 5, each chapter can be read independently of the others. Chapter 1 sets the background of the study, summarizes the results, and briefly compares low-density coding with other coding schemes. Chapter 2 analyzes the distances between code words in low-density codes, and Chapter 3 applies these results to the problem of bounding the probability of decoding error that can be achieved for these codes on a broad class of binary-input channels. The results of Chapter 3 can be immediately applied to any code or class of codes for which the distance properties can be bounded. Chapter 4 presents a simple decoding algorithm for these codes and analyzes the resulting error probability. Chapter 5 briefly extends all the previous results to multi-input channels, and Chapter 6 presents the results of computer simulation of the low-density decoding algorithm.
The work reported here is an expanded and revised version of my doctoral dissertation, completed in 1960 in the Department of Electrical Engineering, M.I.T. I am grateful to my thesis supervisor, Professor Peter Elias, and to my thesis readers, Professors Robert M. Fano and John M. Wozencraft, for assistance and encouragement both during the course of the thesis and later.
This research was made possible in part by support extended by the Research Laboratory of Electronics of the Massachusetts Institute of Technology, which is supported in part by the U.S. Army, the Air Force Office of Scientific Research, and the Office of Naval Research; additional support was received through the National Science Foundation (Grant G-16526) and the National Institutes of Health (Grant MH-04737-03).
Much of Chapter 4 is reprinted with permission of the editors from an article by the author in the Transactions of the I.R.E., IT-9, pages 21 to 28.
The experimental results in Chapter 6 were obtained in part through the support of the Rome Air Development Center and in part through the support of the M.I.T. Computation Center.
`
`Cambridge, Mass.
`July, 1963
`
`Robert G. Gallager
`
`
`
Contents

1 Introduction
  1.1 Coding for Digital Data Transmission
  1.2 Low-Density Parity-Check Codes
  1.3 Summary of Results
  1.4 Comparison with Other Schemes

2 Distance Functions
  2.1 Equiprobable Ensemble of Parity-Check Codes
  2.2 Distance Properties of Low-Density Codes

3 Probability of Decoding Error
  3.1 Symmetric Binary Input Channels
  3.2 Distance Properties
  3.3 Upper Bound on Probability of Decoding Error
  3.4 Chernov Bounds
  3.5 P(e) for Codes and Ensembles of Codes
  3.6 Error Probability for Equiprobable Code Ensemble
  3.7 Binary Symmetric Channel
  3.8 Upper Bound on Rate for Low-Density Codes

4 Decoding
  4.1 Introduction
  4.2 Probabilistic Decoding
  4.3 Probability of Error Using Probabilistic Decoding

5 Low-Density Codes with Arbitrary Alphabet Sizes
  5.1 Distance Functions
  5.2 Probability of Decoding Error
  5.3 Probabilistic Decoding
  5.4 Probability of Error Using Probabilistic Decoding

6 Experimental Results
  6.1 Code Generation
  6.2 Binary Symmetric Channel
  6.3 White Gaussian Noise Channel
  6.4 Rayleigh Fading Channel

A Properties of the Function B(λ)

B Miscellaneous Mathematical Derivations for Chapter 3

C Analysis of Number of Independent Decoding Iterations

References

Index
`
`
`1
`
`Introduction
`
`1.1 Coding for Digital Data Transmission
`
The need for efficient and reliable digital data communication systems has been rising rapidly in recent years. This need has been brought on by a variety of reasons, among them the increase in automatic data-processing equipment and the increased need for long-range communication. Attempts to develop data systems through the use of conventional modulation and voice transmission techniques have generally resulted in systems with relatively low data rates and high error probabilities.
A more fundamental approach to the problems of efficiency and reliability in communication systems is contained in the Noisy Channel Coding theorem developed by C. E. Shannon [15, 4] in 1948. In order to understand the meaning of this theorem, consider Figure 1.1. The source produces binary digits, or binits, at some fixed time rate R_t. The encoder is a device that performs data
`
[Figure 1.1: Block diagram of a communication system — source (R_t binits per second) → encoder (operates on ν source binits at a time) → channel → decoder (produces a replica of the source binits).]
`
processing, modulation, and anything else that might be necessary to prepare the data for transmission over the channel. We shall assume, however, that the encoder separates the source sequence into blocks of ν binits and operates on only one block at a time. The encoder output is then transmitted over the channel and changed by some sort of random disturbance or noise. The decoder processes the channel output and produces a delayed replica of the source binits. The coding theorem states that for a large variety of channel models, encoders and decoders exist such that the probability of the decoder reproducing a source binit in error, P_e, is bounded by

$$e^{-\nu[E_L(R_t)+o(\nu)]} \le P_e \le e^{-\nu E(R_t)}$$

The functions E(R_t) and E_L(R_t) depend upon the channel but not upon ν; they are positive when R_t = 0, and decrease with R_t until they become 0 at some time rate C_t known as the channel capacity. The exact nature of these functions and the particular class of channels for which this theorem has been proved need not concern us here. The important result is that the coding constraint length ν is a fundamental parameter of a communication system. If a channel is to be used efficiently, that is, with R_t close to C_t, then ν must be made correspondingly large to achieve a satisfactory error probability.
The obvious response of an engineer to such a theorem is: "Splendid, but how does one build encoders and decoders that behave in this way when ν is large?" It is rather sobering to observe that if an encoder stores a waveform or code word for each possible block of ν binits, then the storage requirement must be proportional to 2^ν, which is obviously impractical when ν is large. Fortunately, Elias [3] and Reiffen [14] have proved that for a wide variety of channel models, the results of the Noisy Channel Coding theorem can be achieved with little equipment complexity at the encoder by the use of parity-check coding. This will be described in more detail later.
`
Unfortunately, the problem of decoding simply but effectively when ν is large appears to be much more difficult than the problem of encoding. Enough approaches to this problem have been developed to assure one that the Coding theorem has engineering importance. On the other hand, these approaches have not been carried far enough for the design of an efficient, reliable data communication system to become a matter of routine engineering.
This monograph contains a detailed study of one of the three or four most promising approaches to simple decoding for long constraint length codes. The purpose of publishing this work is primarily to show how such a coding and decoding scheme would work and where it might be useful. Also, naturally, it is hoped that this will stimulate further research on the subject. Further mathematical analysis will probably be fruitless, but there are many interesting modifications of the scheme that might be made and much experimental work that should be done.
`
In order to prove mathematically some results about low-density parity-check codes, we shall assume that the codes are to be used on a somewhat restricted and idealized class of channels. It is obvious that results using such channel models can be applied only to channels that closely approximate the model. However, when studying the probability of decoding error, we are interested primarily in the extremely atypical events that cause errors. It is not easy to find models that approximate both these atypical events and typical events. Consequently, the analysis of codes on idealized channels can provide only limited insight about real channels, and such insight should be used with caution.
`
`The channel models to be considered here are called symmetric binary-input
`channels. By this we mean a time-discrete channel for which the input is a
`sequence of the binary digits 0 and 1 and the output is a corresponding sequence
`of letters from a discrete or continuous alphabet. The channel is memoryless in
`the sense that given the input at a given time, the output at the corresponding
`time is statistically independent of all other inputs and outputs. The symmetry
`requirement will be defined precisely in Chapter 3, but roughly it means that the
`outputs can be paired in such a way that the probability of one output given an
`input is the same as that of the other output of the pair given the other input.
`The binary symmetric channel, abbreviated BSC, is a member of this class of
`channels in which there are only two output symbols, one corresponding to each
`input. The BSC can be entirely specified by the probability of a crossover from
`one input to the other output.
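As a concrete rendering of this channel model, the following minimal sketch (Python with numpy; illustrative only, and the function name bsc is ours, not the monograph's) simulates a BSC by flipping each transmitted digit independently with the crossover probability:

```python
import numpy as np

def bsc(bits, p, rng=np.random.default_rng(1)):
    """Pass a 0/1 array through a binary symmetric channel.

    Each digit is flipped independently with crossover probability p.
    """
    flips = rng.random(bits.shape) < p
    return bits ^ flips.astype(bits.dtype)

sent = np.array([0, 1, 1, 0, 1, 0, 0], dtype=np.uint8)
print(bsc(sent, 0.1))   # received sequence; on average 10% of digits differ
```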
If a symmetric binary-input channel were to be used without coding, a sequence of binary digits would be transmitted through the channel and the receiver would guess the transmitted symbols one at a time from the received symbols. If coding were to be used, however, the coder would first take sequences of binary digits carrying the information from the source and would map these sequences into longer redundant sequences, called code words, for transmission over the channel. We define the rate R of such codes to be the ratio of the length of the information sequence to the length of the code word sequence. If the code words are of length n, then there are 2^{nR} possible sequences from the source that are mapped into n-length code words. Thus only a fraction 2^{-n(1-R)} of the 2^n different n-length sequences can be used as code words. For example, with n = 7 and R = 4/7, only 2^4 = 16 of the 2^7 = 128 possible sequences are code words.
`
At the receiver, the decoder, with its knowledge of which sequences are code words, can separate the transmitted n-length code word from the channel noise. Thus the code word is mapped back into the nR information digits. Many decoding schemes find the transmitted code word by first making a decision on each received digit and then using a knowledge of the code words to correct the errors. This intermediate decision, however, destroys a considerable amount of information about the transmitted message, as discussed in detail for several channels in Reference. The decoding scheme to be described here avoids this intermediate decision and operates directly with the a posteriori probabilities of the input symbols conditional on the corresponding received symbols.
The codes to be discussed here are special examples of parity-check codes.¹ The code words of a parity-check code are formed by combining a block of binary information digits with a block of check digits. Each check digit is the modulo 2 sum² of a pre-specified set of information digits. These formation rules for the check digits can be represented conveniently by a parity-check matrix, as in Figure 1.2. This matrix represents a set of linear homogeneous modulo 2 equations called parity-check equations, and the set of code words is the set of solutions of these equations. We call the set of digits contained in a parity-check equation a parity-check set. For example, the first parity-check set in Figure 1.2 is the set of digits (1, 2, 3, 5).
`
`271
`
`$2
`
`133
`
`I04
`
`935
`
`336
`
`$7
`
`$7=-7I1+$3+.'v4
`
`:lI5=£E1‘-+-(172-l-(I33
`a:6=a:1 +:1:2+a:4
`
`Figure 1.2: Example of a parity-check matrix.
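To make the encoding rule concrete, here is a minimal sketch (Python; illustrative only, not code from the monograph) that forms the code words of the Figure 1.2 code by computing the three check digits as modulo 2 sums of the four information digits:

```python
from itertools import product

def encode(x1, x2, x3, x4):
    """Map 4 information digits to a 7-digit code word of Figure 1.2."""
    x5 = (x1 + x2 + x3) % 2   # first parity-check equation
    x6 = (x1 + x2 + x4) % 2
    x7 = (x1 + x3 + x4) % 2
    return (x1, x2, x3, x4, x5, x6, x7)

# All 2^4 = 16 code words of this rate-4/7 parity-check code:
for info in product((0, 1), repeat=4):
    print(encode(*info))
```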
`
The use of parity-check codes makes coding (as distinguished from decoding) relatively simple to implement. Also, as Elias [3] has shown, if a typical parity-check code of long block length is used on a BSC, and if the code rate is between critical rate and channel capacity, then the probability of decoding error will be almost as small as that for the best possible code of that rate and block length.

¹For a more detailed discussion of parity-check codes, see Peterson [12].
²The modulo 2 sum is 1 if the ordinary sum is odd and 0 if the ordinary sum is even.
`
`
`
Unfortunately, the decoding of parity-check codes is not inherently simple to implement; thus we must look for special classes of parity-check codes, such as described in Section 1.2, for which reasonable decoding procedures exist.
`
`1.2 Low-Density Parity-Check Codes
`
Low-density parity-check codes are codes specified by a matrix containing mostly 0's and relatively few 1's. In particular, an (n, j, k) low-density code is a code of block length n with a matrix like that of Figure 2.1, where each column contains a small fixed number j of 1's and each row contains a small fixed number k of 1's. Note that this type of matrix does not have the check digits appearing in diagonal form as do those in Figure 1.2. However, for coding purposes, the equations represented by these matrices can always be solved to give the check digits as explicit sums of information digits.
Low-density codes are not optimum in the somewhat artificial sense of minimizing the probability of decoding error for a given block length, and it can be shown that the maximum rate at which they can be used is bounded below channel capacity. However, the existence of a simple decoding scheme more than compensates for these disadvantages.
`
`1.3 Summary of Results
`
An ensemble of (n, j, k) codes will be formed in Chapter 2, and this ensemble will be used to analyze the distance properties of (n, j, k) codes. The distance between two words in a code is simply the number of digits in which they differ. Clearly an important parameter in a code is the set of distances separating one code word from all the other code words. In a parity-check code, it can be shown that all code words have the same set of distances to the other code words [12]. Thus the distance properties for the ensemble can be summarized by the typical number of code words at each distance from the all-zero code word. It is found that the typical (n, j, k) code for j ≥ 3 has a minimum distance that increases linearly with the block length for j and k constant. Figure 2.4 plots the ratio of minimum distance to block length for several values of j and k and compares the ratio with the same ratio for ordinary parity-check codes. The (n, j, k) codes with j = 2 exhibit markedly different behavior, and it is shown that the minimum distance of an (n, 2, k) code can increase at most logarithmically with the block length.
In Chapter 3, a general upper bound on the probability of decoding error for symmetric binary-input channels with maximum-likelihood decoding is derived for both individual codes and for arbitrary ensembles of codes. The bound is a function of the code only through its distance properties. The assumption of maximum-likelihood decoding is made partly for analytic convenience and partly so as to be able to evaluate codes independently of their decoding algorithms. Any practical decoding algorithm, such as that described in Chapter 4, involves a trade-off between error probability and simplicity; the maximum-likelihood decoder minimizes the error probability but is totally impractical if the block length is large.
`It is shown in Chapter 3 that if the distance properties of the code are
`exponentially related to the block length, and if the code rate is sufficiently low,
`then the bound to P(e) is an exponentially decreasing function of the block
`length. For the appropriate ensemble of codes, these bounds reduce to the
`usual random coding bounds [3, 4].
For the special case of the binary symmetric channel, a particularly simple bound to P(e) is found; this is used to show that over a range of channel crossover probabilities, a typical low-density code has the same error behavior as the optimum code of a slightly higher rate. Figure 3.5 illustrates this loss of effective rate associated with low-density codes.
In Chapter 4, two decoding schemes are described. In the first, which is particularly simple, the decoder first makes a decision on each digit, then computes the parity checks and changes any digit that is contained in more than some fixed number of unsatisfied parity-check equations. The process is repeated, each time using the changed digits, until the sequence is decoded. The second decoding scheme is based on a procedure for computing the conditional probability that an input symbol is 1, conditioned on all the received symbols that are in any of the parity-check sets containing the digit in question. Once again, the procedure is iterated until the sequence is decoded. The computation per digit per iteration in each scheme is independent of the code length. The probabilistic, or second, scheme entails slightly more computation than the first scheme, but decodes with a lower error probability.
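The first of these schemes can be written out compactly. The sketch below is a hedged illustration (Python with numpy); the flip threshold and iteration limit are our illustrative choices, not the monograph's exact rule:

```python
import numpy as np

def flip_decode(H, y, threshold=1, max_iters=50):
    """Bit-flipping decoding sketch.

    H: m x n 0/1 parity-check matrix; y: hard-decision received digits.
    Repeatedly flips every digit contained in more than `threshold`
    unsatisfied parity-check equations, as described above.
    """
    H = np.asarray(H, dtype=int)
    x = np.asarray(y, dtype=int).copy()
    for _ in range(max_iters):
        syndrome = H @ x % 2              # 1 marks an unsatisfied check
        if not syndrome.any():
            break                         # decoded: all checks satisfied
        unsatisfied = H.T @ syndrome      # per-digit count of unsatisfied checks
        flips = unsatisfied > threshold
        if not flips.any():
            break                         # stuck: no digit exceeds threshold
        x[flips] ^= 1                     # change those digits and iterate
    return x
```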
A mathematical analysis of the probability of decoding error using probabilistic decoding is difficult because of statistical dependencies. However, for a BSC with sufficiently small crossover probabilities and for codes with j ≥ 4, a very weak upper bound on the probability of error is derived that decreases exponentially with a root of the code length. Figure 3.5 plots crossover probabilities for which the probability of decoding error is guaranteed to approach 0 with increasing code length. It is hypothesized that the probability of decoding error actually decreases exponentially with block length, while the number of iterations necessary to decode increases logarithmically.
Chapter 5 extends all the major results of Chapters 2, 3, and 4 to nonbinary low-density parity-check codes. Although the theory generalizes in a very natural way, the expressions for minimum distance, error probability, and probabilistic decoding performance are sufficiently complicated that little insight is gained into the advantages or disadvantages of a multilevel system over a binary system. Some experimental work would be helpful here in evaluating these codes.
`
Some experimental results for binary low-density codes are presented in Chapter 6. An IBM 7090 computer was used to simulate both probabilistic decoding and the noise generated by several different types of channels. Due to limitations on computer time, the only situations investigated were those in which the channel was sufficiently noisy to yield a probability of decoding error greater than 10⁻⁴. The most spectacular data from these experiments are
`
`
`
`given in Figure 6.8, which emphasizes the advantages of a decoding scheme that
`operates from a likelihood receiver instead of a decision receiver.
`
1.4 Comparison with Other Schemes

Some other coding and decoding schemes that appear extremely promising for achieving low error probabilities and high data rates at reasonable cost are the following: first, convolutional codes [3] with sequential decoding as developed by Wozencraft [17], Fano [5], and Reiffen [14]; second, convolutional codes with Massey's threshold decoding [10]; and third, the Bose-Chaudhuri codes [2] with the decoding schemes developed by Peterson [12] and Zierler and Gorenstein [18].
It has been shown by Fano [5] that for arbitrary discrete memoryless channels, sequential decoding has a probability of decoding error that is upper bounded by a function of the form e^{-αn}. Here n is the constraint length of the code and α is a function of both the channel and the code; α is positive for rates below channel capacity C. Fano also shows that for rates below a certain quantity called R_comp, where R_comp < C, the average amount of computation in decoding a digit is bounded by a quantity independent of constraint length.
An experimental sequential decoder has been built at Lincoln Laboratories, Lexington, Massachusetts [11]. By using this decoder in a system with a feedback link and an appropriately designed modulator and demodulator, reliable transmission has been achieved experimentally [9] over a telephone circuit at about 7500 bits per second rather than the 1200 or 2400 bits per second possible without coding.
`The two principal weaknesses of sequential decoding are as follows: First,
`the amount of computation required per digit is a random variable, and this
`creates a waiting line problem at the decoder; second, if the decoder once makes
`an error, a large block of errors can be made before the decoder gets back on
`the proper track. If a feedback link is available, these problems are not serious,
`but considerably more study is required for cases in which no feedback exists.
`Threshold decoding is the simplest scheme to implement that is discussed
`here; it involves only shift registers, a few binary adders, and a threshold device.
`It is most effective at relatively short constraint lengths, and has a somewhat
`higher error probability and less flexibility than sequential decoding.
The computation per digit associated with the Bose-Chaudhuri codes on the BSC increases roughly as the cube of the block length but does not fluctuate widely. The decoding scheme guarantees correction of all combinations of up to some fixed number of errors and corrects nothing beyond. For moderately long block lengths, this restriction in the decoding procedure causes a large increase in P_e. No way is known to make use of the a posteriori probabilities at the output of more general binary input channels. This inability to make use of a posteriori probabilities appears to be a characteristic limitation of algebraic as opposed to probabilistic decoding techniques.
The computation per digit associated with low-density parity-check codes appears to increase at most logarithmically with block length and not to fluctuate widely with the noise. The probability of decoding error is unknown, but is
`
`
`
`believed to decrease exponentially with block length at a reasonable rate. The
`ability to decode the digits of a block in parallel makes it possible to handle
`higher data rates than is possible with other schemes.
For many channels with memory, retaining the a posteriori probabilities from the channel makes it practically unnecessary to take account of the memory in any other way. For instance, on a fading channel when the fade persists for several baud lengths, the a posteriori probabilities will indicate the presence of a fade. If this channel were used as a BSC, however, it would be necessary for the decoder to account for the fact that bursts of errors are more probable than isolated errors. Thus, using a posteriori probabilities gives low-density decoding and sequential decoding a great flexibility in handling channels with dependent noise. For channels in which the noise is rigidly constrained to occur in short, severe bursts, on the other hand, there is a particularly simple procedure for decoding the Bose-Chaudhuri codes [12].
`When transmitting over channels subject to long fades or long noise bursts,
`it is often impractical to correct errors in these noisy periods. In such cases it is
`advantageous to use a combination of error correction and error detection with
`feedback and retransmission [16]. All of the coding and decoding schemes being
`considered here fit naturally into such a system, but in cases where little or no
`error correction is attempted, low-density codes appear at a disadvantage.
In conclusion, all these schemes have their own advantages, and clearly no scheme is optimum for all communication situations. It appears that enough coding and decoding alternatives now exist for serious consideration of the use of coding on particular channels.
`
`
`
`2 Distance Functions
`
The distance function N(ℓ) of a parity-check code is defined as the number of code words in the code of weight ℓ. From the group properties of a parity-check code, it easily follows [12] that N(ℓ) is also the number of code words at distance ℓ from any given code word. The minimum distance D of a code is then defined as the smallest value of ℓ > 0 for which N(ℓ) ≠ 0. Clearly, in a code of given block length n and rate R it is desirable to make D as large as possible and to make N(ℓ) as small as possible for those ℓ just larger than D. However, the next chapter, which discusses bounding the probability of decoding error for symmetric binary-input channels, will make the exact effect of N(ℓ) on error-correcting capability clearer.
For a parity-check code of long block length it is usually impractical to calculate exactly the distance function or even the minimum distance because of the enormous number of code words involved. It is often simpler to analyze the average distance function of an ensemble of codes; the statistics of an ensemble permit one to average over quantities that are not tractable in individual codes. From the ensemble average, one can then make statistical statements about the member codes.
`
`2.1 Equiprobable Ensemble of Parity-Check Codes
`
This chapter will be concerned primarily with the distance functions of low-density parity-check codes, but for comparison purposes, the average distance function of another ensemble of parity-check codes will be derived first. Since a parity-check code is completely specified by a parity-check matrix, an ensemble of parity-check codes may be defined in terms of an ensemble of parity-check matrices. The equiprobable ensemble of parity-check codes of rate R and block length n will be defined as the ensemble in which the n(1 − R) by n parity-check matrix is filled with statistically independent equiprobable binary digits. This is essentially the same ensemble as that considered by Elias [3] in his random coding bounds for parity-check codes; the minor difference is that codes in this ensemble may have a rate slightly higher than R, since the rows of a matrix in this ensemble are not necessarily independent over the modulo 2 field.
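As a concrete rendering of this definition, the following sketch (Python with numpy; illustrative, not the author's code) draws one matrix from the equiprobable ensemble:

```python
import numpy as np

def equiprobable_H(n, R, rng=np.random.default_rng(0)):
    """Draw an n(1-R) x n parity-check matrix whose entries are
    statistically independent equiprobable binary digits."""
    m = round(n * (1 - R))
    return rng.integers(0, 2, size=(m, n), dtype=np.uint8)

print(equiprobable_H(8, 0.5))   # a 4 x 8 random parity-check matrix
```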
`
Theorem 2.1. Let N(ℓ) be the average number of code words of weight ℓ in a code averaged over the equiprobable ensemble of parity-check codes of length n and rate R. Then for ℓ > 0,

$$N(\ell) = \binom{n}{\ell}\,2^{-n(1-R)} \le \left[2\pi n\lambda(1-\lambda)\right]^{-1/2} \exp n\left[H(\lambda) - (1-R)\ln 2\right] \tag{2.1}$$

where

$$\lambda = \frac{\ell}{n}, \qquad H(\lambda) = \lambda\ln\frac{1}{\lambda} + (1-\lambda)\ln\frac{1}{1-\lambda}$$
`
`
`
Proof. Let P(ℓ) be the probability of the set of codes for which some particular sequence of weight ℓ is a code word. Stated differently, P(ℓ) is the probability that a particular sequence of weight ℓ will be a code word in a code chosen at random from the ensemble. Since the all-zero sequence is a code word in any parity-check code, P(ℓ) = 1 for ℓ = 0. For ℓ ≠ 0, a particular parity check will check, with probability ½, the last position in which the weight-ℓ sequence has a one. This makes the probability ½ that the parity check is satisfied, regardless of whether the first ℓ − 1 ones were checked an even or an odd number of times. A sequence will be a code word if and only if it satisfies all the n(1 − R) parity checks, so that

$$P(\ell) = 2^{-n(1-R)} \qquad \text{for } \ell \ne 0$$
`
The probability P(ℓ) can also be interpreted as the expectation of a random variable that is 1 if the sequence is a code word and 0 otherwise. Now we observe that there are $\binom{n}{\ell}$ sequences of weight ℓ and that the expected number of code words among these sequences is the sum of the expectations that the individual sequences are code words. Thus

$$N(\ell) = \binom{n}{\ell}\,2^{-n(1-R)} \tag{2.2}$$
`
We now bound $\binom{n}{\ell}$ by the Stirling approximation to the factorial:

$$\sqrt{2\pi n}\, n^n e^{-n} \exp\!\left(\frac{1}{12n} - \frac{1}{360n^3}\right) \le n! \le \sqrt{2\pi n}\, n^n e^{-n} \exp\!\left(\frac{1}{12n}\right) \tag{2.3}$$

It follows after some manipulation that for λn = ℓ,

$$\frac{1}{\sqrt{2\pi n\lambda(1-\lambda)}} \exp\!\left(nH(\lambda) - \frac{1}{12n\lambda(1-\lambda)}\right) \le \binom{n}{\lambda n} \le \frac{1}{\sqrt{2\pi n\lambda(1-\lambda)}} \exp nH(\lambda) \tag{2.4}$$

where

$$H(\lambda) = -\lambda\ln\lambda - (1-\lambda)\ln(1-\lambda)$$

Combining Equations (2.4) and (2.2), we get the statement of the theorem. □
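As a quick sanity check on Theorem 2.1, the short script below (illustrative; not part of the monograph) compares the exact ensemble average of Equation (2.2) with the bound of Equation (2.1) for a few weights:

```python
from math import comb, exp, log, pi, sqrt

n, R = 100, 0.5
for l in (10, 30, 50):
    lam = l / n
    H = -lam * log(lam) - (1 - lam) * log(1 - lam)
    exact = comb(n, l) * 2 ** (-n * (1 - R))            # Eq. (2.2)
    bound = exp(n * (H - (1 - R) * log(2))) / sqrt(2 * pi * n * lam * (1 - lam))
    print(l, exact <= bound, f"{exact:.3e} <= {bound:.3e}")   # Eq. (2.1) holds
```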
`
Next we observe that over the equiprobable ensemble of parity-check codes, the minimum distance of the individual codes is a random variable whose distribution function can be bounded by the following theorem.
`
Theorem 2.2. Over the equiprobable ensemble of parity-check codes of length n and rate R, the minimum-distance distribution function Pr(D ≤ δn) is bounded by both the following inequalities for δ < ½ and δn an integer:

$$\Pr(D \le \delta n) \le \frac{1-\delta}{1-2\delta}\left[2\pi n\delta(1-\delta)\right]^{-1/2} \exp n\left[H(\delta) - (1-R)\ln 2\right] \tag{2.5}$$

$$\Pr(D \le \delta n) \le 1$$
`
`
`
Proof. It was shown in Theorem 2.1 that the probability that a nonzero sequence is a code word over the ensemble of codes is 2^{-n(1-R)}. The probability that any sequence of weight nδ or less is a code word is certainly less than the sum of the probabilities that the individual sequences are code words. Thus,

$$\Pr(D \le n\delta) \le \sum_{\ell=1}^{n\delta} \binom{n}{\ell}\,2^{-n(1-R)} \tag{2.6}$$

Bounding the sum of binomial coefficients by a geometric series, we get

$$\sum_{\ell=1}^{n\delta} \binom{n}{\ell} \le \binom{n}{n\delta}\,\frac{1-\delta}{1-2\delta} \tag{2.7}$$

Bounding Equation (2.7) by (2.4) and substituting into Equation (2.6), we get the statement of the theorem. □
`
As n gets larger, this bound to Pr(D ≤ δn) as a function of δ approaches a step function with the step at that δ₀ < ½ for which H(δ₀) = (1 − R) ln 2. Figure 2.4 plots δ₀ as a function of rate. This result is closely related to the Gilbert bound on minimum distance. The asymptotic form of the Gilbert bound for large n states that there exists a code for which D ≥ nδ₀. Theorem 2.2 states that for any ε > 0, the probability of the set of parity-check codes for which D < n(δ₀ − ε) approaches 0 exponentially with n.
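The step point δ₀ is easy to compute numerically; the bisection sketch below (illustrative, using natural logarithms as in the text) solves H(δ₀) = (1 − R) ln 2:

```python
from math import log

def H(x):
    return -x * log(x) - (1 - x) * log(1 - x)

def delta0(R):
    """Solve H(d) = (1 - R) ln 2 for d on (0, 1/2), where H is increasing."""
    lo, hi = 1e-12, 0.5 - 1e-12
    target = (1 - R) * log(2)
    for _ in range(200):
        mid = (lo + hi) / 2
        if H(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(delta0(0.5))   # ~0.110 for rate 1/2, the ratio plotted in Figure 2.4
```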
`
`2.2 Distance Properties of Low-Density Codes
`
In this section an ensemble of low-density parity-check codes will be defined, and theorems similar to Theorems 2.1 and 2.2 will be proved. Then a new ensemble will be formed by expurgating those codes that have small minimum distances. This expurgated ensemble will be used in the next chapter to derive bounds on the probability of decoding error for various channels.
Define an (n, j, k) parity-check matrix as a matrix of n columns that has j ones in each column, k ones in each row, and zeros elsewhere. It follows from this definition that an (n, j, k) parity-check matrix has nj/k rows and thus a rate R ≥ 1 − j/k. In order to construct an ensemble of (n, j, k) matrices, consider first the special (n, j, k) matrix in Figure 2.1, for which n = 20, j = 3, and k = 4.
This matrix is divided into j submatrices, each containing a single 1 in each column. The first of these submatrices contains all its 1's in descending order; that is, the i-th row contains 1's in columns (i − 1)k + 1 to ik. The other submatrices are merely column permutations of the first. We define the ensemble of (n, j, k) codes as the ensemble resulting from random permutations of the columns of each of the bottom j − 1 submatrices of a matrix such as in Figure 2.1; a sketch of this construction is given after the figure.
`
`
`
[Figure 2.1: Example of a low-density (n, j, k) parity-check matrix with n = 20, j = 3, k = 4.]
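The ensemble construction just described can be rendered as a short sketch (Python with numpy; illustrative, not the author's code — the column permutations here are pseudorandom):

```python
import numpy as np

def gallager_matrix(n, j, k, rng=np.random.default_rng(0)):
    """Build an (n, j, k) matrix: the first submatrix has its 1's in
    descending order (row i checks columns ik .. ik+k-1), and each of
    the other j-1 submatrices is a random column permutation of it."""
    assert n % k == 0, "n must be a multiple of k"
    rows = n // k
    base = np.zeros((rows, n), dtype=np.uint8)
    for i in range(rows):
        base[i, i * k:(i + 1) * k] = 1
    blocks = [base] + [base[:, rng.permutation(n)] for _ in range(j - 1)]
    return np.vstack(blocks)                 # nj/k rows in all

H = gallager_matrix(20, 3, 4)
print(H.shape)            # (15, 20)
print(H.sum(axis=0))      # every column contains j = 3 ones
print(H.sum(axis=1))      # every row contains k = 4 ones
```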