`
`Robert G. Gallager
`
`1963
`
`
`
`Preface
`
The Noisy Channel Coding Theorem discovered by C. E. Shannon in 1948 offered
communication engineers the possibility of reducing error rates on noisy
channels to negligible levels without sacrificing data rates. The primary obstacle
to the practical use of this theorem has been the equipment complexity and the
computation time required to decode the noisy received data.
`This monograph presents a technique for achieving high data rates and neg-
`ligible error probabilities on noisy channels with a reasonable amount of equip-
ment. The advantages and disadvantages of this technique over other techniques
for the same purpose are neither simple nor clear-cut, and depend primarily
upon the channel and the type of service required. More important than the
`particular technique, however, is the hope that the concepts here will lead to
`new and better coding procedures.
`The chapters of the monograph are arranged in such a way that with the
`exception of Chapter 5 each chapter can be read independently of the others.
`Chapter 1 sets the background of the study, summarizes the results, and briefly
compares low-density coding with other coding schemes. Chapter 2 analyzes
the distances between code words in low-density codes, and Chapter 3 applies
these results to the problem of bounding the probability of decoding error that
can be achieved for these codes on a broad class of binary-input channels. The
results of Chapter 3 can be immediately applied to any code or class of codes
for which the distance properties can be bounded. Chapter 4 presents a simple
decoding algorithm for these codes and analyzes the resulting error probability.
Chapter 5 briefly extends all the previous results to multi-input channels,
and Chapter 6 presents the results of computer simulation of the low-density
decoding algorithm.
`The work reported here is an expanded and revised version of my doctoral
dissertation, completed in 1960 in the Department of Electrical Engineering,
`M.I.T.
`I am grateful to my thesis supervisor, Professor Peter Elias, and to
`my thesis readers, Professors Robert M. Fano and John M. Wozencraft, for
`assistance and encouragement both during the course of the thesis and later.
`This research was made possible in part by support extended by the Research
`Laboratory of Electronics of the Massachusetts Institute of Technology, which is
supported in part by the U.S. Army, the Air Force Office of Scientific Research,
and the Office of Naval Research; additional support was received through the
`National Science Foundation (Grant G-16526) and the National Institute of
`Health (Grant MH-04737-03).
`Much of Chapter 4 is reprinted with permission of the editors from an article
by the author in the Transactions of the I.R.E., IT-9, pages 21 to 28.
`The experimental results in Chapter 6 were obtained in part through the
`support of the Rome Air Development Center and in part through the support
`of the M.I.T. Computation Center.
`
`Cambridge, Mass.
`July, 1963
`
`Robert G. Gallager
`
`
`
Contents

1 Introduction
1.1 Coding for Digital Data Transmission
1.2 Low-Density Parity-Check Codes
1.3 Summary of Results
1.4 Comparison with Other Schemes

2 Distance Functions
2.1 Equiprobable Ensemble of Parity-Check Codes
2.2 Distance Properties of Low-Density Codes

3 Probability of Decoding Error
3.1 Symmetric Binary Input Channels
3.2 Distance Properties
3.3 Upper Bound on Probability of Decoding Error
3.4 Chernov Bounds
3.5 P_e for Codes and Ensembles of Codes
3.6 Error Probability for Equiprobable Code Ensemble
3.7 Binary Symmetric Channel
3.8 Upper Bound on Rate for Low-Density Codes

4 Decoding
4.1 Introduction
4.2 Probabilistic Decoding
4.3 Probability of Error Using Probabilistic Decoding

5 Low-Density Codes with Arbitrary Alphabet Sizes
5.1 Distance Functions
5.2 Probability of Decoding Error
5.3 Probabilistic Decoding
5.4 Probability of Error Using Probabilistic Decoding

6 Experimental Results
6.1 Code Generation
6.2 Binary Symmetric Channel
6.3 White Gaussian Noise Channel
6.4 Rayleigh Fading Channel

A Properties of the Function B(λ)

B Miscellaneous Mathematical Derivations for Chapter 3

C Analysis of Number of Independent Decoding Iterations

References

Index
`
`
`
`1
`
`Introduction
`
`1.1 Coding for Digital Data Transmission
`
The need for efficient and reliable digital data communication systems has been
rising rapidly in recent years. This need has been brought on by a variety of
reasons, among them being the increase in automatic data processing equipment
and the increased need for long-range communication. Attempts to develop data
systems through the use of conventional modulation and voice transmission
techniques have generally resulted in systems with relatively low data rates and
high error probabilities.
A more fundamental approach to the problems of efficiency and reliability
in communication systems is contained in the Noisy Channel Coding theorem
developed by C. E. Shannon [15, 4] in 1948. In order to understand the meaning
of this theorem, consider Figure 1.1. The source produces binary digits, or
binits, at some fixed time rate R_t. The encoder is a device that performs data
processing, modulation, and anything else that might be necessary to prepare
the data for transmission over the channel.

[Figure 1.1: Block diagram of a communication system: source -> encoder (operates on ν binits at a time) -> channel -> decoder (produces a replica of the source binits).]

We shall assume, however, that
`
the encoder separates the source sequence into blocks of ν binits and operates
on only one block at a time. The encoder output is then transmitted over the
channel and changed by some sort of random disturbance or noise. The decoder
processes the channel output and produces a delayed replica of the source binits.
The coding theorem states that for a large variety of channel models, encoders
and decoders exist such that the probability of the decoder reproducing a source
binit in error, P_e, is bounded by

e^{-\nu[E_L(R_t) + o(\nu)]} \le P_e \le e^{-\nu E(R_t)}

The functions E(R_t) and E_L(R_t) depend upon the channel but not upon ν; they
are positive when R_t = 0, and decrease with R_t until they become 0 at some
time rate C_t known as the channel capacity. The exact nature of these functions
and the particular class of channels for which this theorem has been proved need
not concern us here. The important result is that the coding constraint length
ν is a fundamental parameter of a communication system. If a channel is to be
used efficiently, that is with R_t close to C_t, then ν must be made correspondingly
large to achieve a satisfactory error probability.
The obvious response of an engineer to such a theorem is: "Splendid, but how
does one build encoders and decoders that behave in this way when ν is large?"
It is rather sobering to observe that if an encoder stores a waveform or code
word for each possible block of ν binits, then the storage requirement must be
proportional to 2^ν, which is obviously impractical when ν is large. Fortunately,
Elias [3] and Reiffen [14] have proved that for a wide variety of channel models,
the results of the Noisy Channel Coding theorem can be achieved with little
equipment complexity at the encoder by the use of parity-check coding. This
will be described in more detail later.
`
Unfortunately, the problem of decoding simply but effectively when ν is
large appears to be much more difficult than the problem of encoding. Enough
approaches to this problem have been developed to assure one that the Coding
theorem has engineering importance. On the other hand, these approaches
have not been carried far enough for the design of an efficient, reliable data
communication system to become a matter of routine engineering.
`This monograph contains a detailed study of one of the three or four most
`promising approaches to simple decoding for long constraint length codes. The
`purpose of publishing this work is primarily to show how such a coding and
`decoding scheme would work and where it might be useful. Also, naturally,
`it is hoped that this will stimulate further research on the subject. Further
mathematical analysis will probably be fruitless, but there are many interesting
`modifications of the scheme that might be made and much experimental work
`that should be done.
`
In order to prove mathematically some results about low-density parity-
check codes, we shall assume that the codes are to be used on a somewhat
restricted and idealized class of channels. It is obvious that results using such
channel models can be applied only to channels that closely approximate the
model. However, when studying the probability of decoding error, we are in-
terested primarily in the extremely atypical events that cause errors. It is not
easy to find models that approximate both these atypical events and typical
events. Consequently, the analysis of codes on idealized channels can provide
only limited insight about real channels, and such insight should be used with
caution.
`
The channel models to be considered here are called symmetric binary-input
channels. By this we mean a time-discrete channel for which the input is a
sequence of the binary digits 0 and 1 and the output is a corresponding sequence
of letters from a discrete or continuous alphabet. The channel is memoryless in
the sense that, given the input at a given time, the output at the corresponding
time is statistically independent of all other inputs and outputs. The symmetry
requirement will be defined precisely in Chapter 3, but roughly it means that the
outputs can be paired in such a way that the probability of one output given an
input is the same as that of the other output of the pair given the other input.
The binary symmetric channel, abbreviated BSC, is a member of this class of
channels in which there are only two output symbols, one corresponding to each
input. The BSC can be entirely specified by the probability of a crossover from
one input to the other output.
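The BSC is simple enough to simulate in a few lines. The sketch below is a hypothetical illustration (not part of the monograph): each transmitted digit is flipped independently with the crossover probability p.

```python
import random

def bsc_transmit(bits, p, rng=None):
    """Pass a binary sequence through a binary symmetric channel:
    each digit is crossed over independently with probability p."""
    rng = rng or random.Random()
    return [b ^ (rng.random() < p) for b in bits]

# p = 0 leaves the sequence untouched; p = 1 flips every digit.
```

With p = 0.1, roughly one digit in ten is received in error, which is the regime studied experimentally in Chapter 6.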
If a symmetric binary-input channel were to be used without coding, a se-
quence of binary digits would be transmitted through the channel and the re-
ceiver would guess the transmitted symbols one at a time from the received
symbols. If coding were to be used, however, the coder would first take se-
quences of binary digits carrying the information from the source and would
map these sequences into longer redundant sequences, called code words, for
transmission over the channel. We define the rate R of such codes to be the
ratio of the length of the information sequence to the length of the code word
sequence. If the code words are of length n, then there are 2^{nR} possible se-
quences from the source that are mapped into n-length code words. Thus only
a fraction 2^{-n(1-R)} of the 2^n different n-length sequences can be used as code
words.
`
At the receiver, the decoder, with its knowledge of which sequences are code
words, can separate the transmitted n-length code word from the channel noise.
Thus the code word is mapped back into the nR information digits. Many
decoding schemes find the transmitted code word by first making a decision on
each received digit and then using a knowledge of the code words to correct the
errors. This intermediate decision, however, destroys a considerable amount of
information about the transmitted message, as discussed in detail for several
channels in Reference. The decoding scheme to be described here avoids this
intermediate decision and operates directly with the a posteriori probabilities
of the input symbols conditional on the corresponding received symbols.
The codes to be discussed here are special examples of parity-check codes.¹
The code words of a parity-check code are formed by combining a block of
binary information digits with a block of check digits. Each check digit is the
modulo 2 sum² of a pre-specified set of information digits. These formation rules
for the check digits can be represented conveniently by a parity-check matrix,
as in Figure 1.2. This matrix represents a set of linear homogeneous modulo 2
equations called parity-check equations, and the set of code words is the set of
solutions of these equations. We call the set of digits contained in a parity-check
equation a parity-check set. For example, the first parity-check set in Figure 1.2
is the set of digits (1, 2, 3, 5).
`
`.171
`
`(E2
`
`.113
`
`$4
`
`$5
`
`IE3
`
`IE7
`
`.'.":—,v=:E1-i-:.‘23+5."4
`
`:.t‘5=.."L'1+I-g+:t‘,-3
`..":5=i'?1+$«;+:r4
`
`Figure 1.2: Example of a parity—check matrix.
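As a concrete illustration, the sketch below enumerates the code words of a small parity-check code as the solutions of its parity-check equations. The matrix H is an assumed example, not necessarily that of Figure 1.2; its first row is chosen to realize the parity-check set (1, 2, 3, 5) named in the text.

```python
from itertools import product

# Hypothetical 3-by-7 parity-check matrix in the spirit of Figure 1.2;
# row 1 encodes the parity-check set (1, 2, 3, 5) cited in the text.
H = [
    [1, 1, 1, 0, 1, 0, 0],  # x1 + x2 + x3 + x5 = 0 (mod 2)
    [1, 1, 0, 1, 0, 1, 0],  # x1 + x2 + x4 + x6 = 0 (mod 2)
    [1, 0, 1, 1, 0, 0, 1],  # x1 + x3 + x4 + x7 = 0 (mod 2)
]

def is_codeword(x):
    """A word is a code word iff every parity-check equation is satisfied."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)

codewords = [x for x in product([0, 1], repeat=7) if is_codeword(x)]
# 3 independent checks on 7 digits leave 2**(7-3) = 16 code words (R = 4/7).
```

Note that the all-zero word always satisfies every check, in agreement with the remark in Chapter 2 that the all-zero sequence is a code word of any parity-check code.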
`
The use of parity-check codes makes coding (as distinguished from decoding)
relatively simple to implement. Also, as Elias [3] has shown, if a typical parity-
check code of long block length is used on a BSC, and if the code rate is between
critical rate and channel capacity, then the probability of decoding error will be
almost as small as that for the best possible code of that rate and block length.

¹For a more detailed discussion of parity-check codes, see Peterson [12].
²The modulo 2 sum is 1 if the ordinary sum is odd and 0 if the ordinary sum is even.
`
`
`
Unfortunately, the decoding of parity-check codes is not inherently simple
to implement; thus we must look for special classes of parity-check codes, such
as those described in Section 1.2, for which reasonable decoding procedures exist.
`
`1.2 Low-Density Parity-Check Codes
`
Low-density parity-check codes are codes specified by a matrix containing mostly
0's and relatively few 1's. In particular, an (n, j, k) low-density code is a code of
block length n with a matrix like that of Figure 2.1, where each column contains
a small fixed number j of 1's and each row contains a small fixed number k of
1's. Note that this type of matrix does not have the check digits appearing
in diagonal form as do those in Figure 1.2. However, for coding purposes, the
equations represented by these matrices can always be solved to give the check
digits as explicit sums of information digits.
Low-density codes are not optimum in the somewhat artificial sense of min-
imizing the probability of decoding error for a given block length, and it can
be shown that the maximum rate at which they can be used is bounded below
channel capacity. However, the existence of a simple decoding scheme more
than compensates for these disadvantages.
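The defining property is easy to state as a check. A minimal sketch (illustrative only, not from the monograph): verify that a candidate matrix has exactly j ones per column, k ones per row, and the nj/k rows that the definition implies.

```python
def is_low_density(H, j, k):
    """Check the (n, j, k) property: j ones in every column, k ones in
    every row, and nj/k rows in all."""
    n = len(H[0])
    rows_ok = all(sum(row) == k for row in H)
    cols_ok = all(sum(col) == j for col in zip(*H))
    return rows_ok and cols_ok and len(H) == n * j // k

# A degenerate (4, 2, 4) example: two all-ones rows over four columns.
H_example = [[1, 1, 1, 1],
             [1, 1, 1, 1]]
```

Since such a matrix has nj/k rows, at most nj/k of the parity checks are independent, which is why the rate satisfies R >= 1 - j/k, as stated in Chapter 2.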
`
`1.3 Summary of Results
`
An ensemble of (n, j, k) codes will be formed in Chapter 2, and this ensemble
will be used to analyze the distance properties of (n, j, k) codes. The distance
between two words in a code is simply the number of digits in which they differ.
Clearly an important parameter in a code is the set of distances separating one
code word from all the other code words. In a parity-check code, it can be shown
that all code words have the same set of distances to the other code words [12].
Thus the distance properties for the ensemble can be summarized by the typical
number of code words at each distance from the all-zero code word. It is found
that the typical (n, j, k) code for j >= 3 has a minimum distance that increases
linearly with the block length for j and k constant. Figure 2.4 plots the ratio
of minimum distance to block length for several values of j and k and compares
the ratio with the same ratio for ordinary parity-check codes. The (n, j, k)
codes with j = 2 exhibit markedly different behavior, and it is shown that the
minimum distance of an (n, 2, k) code can increase at most logarithmically with
the block length.
In Chapter 3, a general upper bound on the probability of decoding error for
symmetric binary-input channels with maximum-likelihood decoding is derived
for both individual codes and for arbitrary ensembles of codes. The bound is
a function of the code only through its distance properties. The assumption of
maximum-likelihood decoding is made partly for analytic convenience and partly
so as to be able to evaluate codes independently of their decoding algorithms.
Any practical decoding algorithm, such as that described in Chapter 4, involves
a trade-off between error probability and simplicity; the maximum-likelihood
decoder minimizes the error probability but is totally impractical if the block
length is large.
It is shown in Chapter 3 that if the distance properties of the code are
exponentially related to the block length, and if the code rate is sufficiently low,
then the bound to P(e) is an exponentially decreasing function of the block
length. For the appropriate ensemble of codes, these bounds reduce to the
usual random coding bounds [3, 4].
For the special case of the binary symmetric channel, a particularly simple
bound to P(e) is found; this is used to show that over a range of channel
crossover probabilities, a typical low-density code has the same error behavior
as the optimum code of a slightly higher rate. Figure 3.5 illustrates this loss of
effective rate associated with low-density codes.
In Chapter 4, two decoding schemes are described. In the first, which is par-
ticularly simple, the decoder first makes a decision on each digit, then computes
the parity checks and changes any digit that is contained in more than some
fixed number of unsatisfied parity-check equations. The process is repeated,
each time using the changed digits, until the sequence is decoded. The second
decoding scheme is based on a procedure for computing the conditional proba-
bility that an input symbol is 1; this is conditioned on all the received symbols
that are in any of the parity-check sets containing the digit in question. Once
again, the procedure is iterated until the sequence is decoded. The computation
per digit per iteration in each scheme is independent of the code length. The
probabilistic, or second, scheme entails slightly more computation than the first
scheme, but decodes with a lower error probability.
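The first (flipping) scheme can be sketched directly from this description. The code below is a hypothetical illustration: the matrix H and the threshold value are assumed examples, and H is a small hand-made parity-check matrix rather than a true low-density one.

```python
# Hypothetical 3-by-7 parity-check matrix, used only for illustration.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]

def flip_decode(H, received, threshold, max_iters=50):
    """Sketch of the first decoding scheme: compute all parity checks,
    flip every digit contained in more than `threshold` unsatisfied
    parity-check equations, and repeat with the changed digits."""
    x = list(received)
    for _ in range(max_iters):
        unsatisfied = [row for row in H
                       if sum(h * b for h, b in zip(row, x)) % 2 == 1]
        if not unsatisfied:
            return x  # every parity check is satisfied: decoded
        flipped = False
        for i in range(len(x)):
            if sum(row[i] for row in unsatisfied) > threshold:
                x[i] ^= 1
                flipped = True
        if not flipped:
            break  # no digit exceeds the threshold: decoding stalls
    return x

# A single error in the first digit violates all three checks containing
# it, so with threshold 2 only that digit is flipped and decoding succeeds.
```

The computation per digit per iteration here is a fixed number of parity evaluations, in line with the remark that it is independent of the code length.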
A mathematical analysis of the probability of decoding error using proba-
bilistic decoding is difficult because of statistical dependencies. However, for a
BSC with sufficiently small crossover probabilities and for codes with j >= 4,
a very weak upper bound on the probability of error is derived that decreases
exponentially with a root of the code length. Figure 3.5 plots crossover proba-
bilities for which the probability of decoding error is guaranteed to approach 0
with increasing code length. It is hypothesized that the probability of decoding
error actually decreases exponentially with block length, while the number of
iterations necessary to decode increases logarithmically.
Chapter 5 extends all the major results of Chapters 2, 3, and 4 to non-
binary low-density parity-check codes. Although the theory generalizes in a
very natural way, the expressions for minimum distance, error probability, and
probabilistic decoding performance are sufficiently complicated that little
insight is gained into the advantages or disadvantages of a multilevel system over
a binary system. Some experimental work would be helpful here in evaluating
these codes.
Some experimental results for binary low-density codes are presented in
Chapter 6. An IBM 7090 computer was used to simulate both probabilistic
decoding and the noise generated by several different types of channels. Due
to limitations on computer time, the only situations investigated were those in
which the channel was sufficiently noisy to yield a probability of decoding er-
ror greater than 10^{-4}. The most spectacular data from these experiments are
`
`
`
`given in Figure 6.8, which emphasizes the advantages of a decoding scheme that
`operates from a likelihood receiver instead of a decision receiver.
`
`1.4 Comparison with Other Schemes
`
Some other coding and decoding schemes that appear extremely promising for
achieving low error probabilities and high data rates at reasonable cost are the
following: first, convolutional codes [3] with sequential decoding as developed
by Wozencraft [17], Fano [5], and Reiffen [14]; second, convolutional codes with
Massey's threshold decoding [10]; and third, the Bose-Chaudhuri codes [2] with
the decoding schemes developed by Peterson [12] and Zierler and Gorenstein [18].
It has been shown by Fano [5] that for arbitrary discrete memoryless chan-
nels, sequential decoding has a probability of decoding error that is upper
bounded by a function of the form e^{-αn}. Here n is the constraint length of
the code and α is a function of both the channel and the code; α is positive for
rates below channel capacity C. Fano also shows that for rates below a certain
quantity called R_comp, where R_comp < C, the average amount of computation
in decoding a digit is bounded by a quantity independent of constraint length.
An experimental sequential decoder has been built at Lincoln Laboratories,
Lexington, Massachusetts [11]. By using this decoder in a system with a feed-
back link and an appropriately designed modulator and demodulator, reliable
transmission has been achieved experimentally [9] over a telephone circuit at
about 7500 bits per second rather than the 1200 or 2400 bits per second possi-
ble without coding.
`The two principal weaknesses of sequential decoding are as follows: First,
`the amount of computation required per digit is a random variable, and this
`creates a waiting line problem at the decoder; second, if the decoder once makes
`an error, a large block of errors can be made before the decoder gets back on
`the proper track. If a feedback link is available, these problems are not serious,
`but considerably more study is required for cases in which no feedback exists.
`Threshold decoding is the simplest scheme to implement that is discussed
`here; it involves only shift registers, a few binary adders, and a threshold device.
`It is most effective at relatively short constraint lengths, and has a somewhat
`higher error probability and less flexibility than sequential decoding.
The computation per digit associated with the Bose-Chaudhuri codes on the
BSC increases roughly as the cube of the block length but does not fluctuate
widely. The decoding scheme guarantees correction of all combinations of up to
some fixed number of errors and corrects nothing beyond. For moderately long
block lengths, this restriction in the decoding procedure causes a large increase
in P_e. No way is known to make use of the a posteriori probabilities at the
output of more general binary-input channels. This inability to make use of a
posteriori probabilities appears to be a characteristic limitation of algebraic as
opposed to probabilistic decoding techniques.
The computation per digit associated with low-density parity-check codes
appears to increase at most logarithmically with block length and not to fluctu-
ate widely with the noise. The probability of decoding error is unknown, but is
believed to decrease exponentially with block length at a reasonable rate. The
ability to decode the digits of a block in parallel makes it possible to handle
higher data rates than is possible with other schemes.
For many channels with memory, retaining the a posteriori probabilities from
the channel makes it practically unnecessary to take account of the memory in
any other way. For instance, on a fading channel, when the fade persists for
several baud lengths, the a posteriori probabilities will indicate the presence of
a fade. If this channel were used as a BSC, however, it would be necessary for
the decoder to account for the fact that bursts of errors are more probable than
isolated errors. Thus, using a posteriori probabilities gives low-density decoding
and sequential decoding a great flexibility in handling channels with dependent
noise. For channels in which the noise is rigidly constrained to occur in short,
severe bursts, on the other hand, there is a particularly simple procedure for
decoding the Bose-Chaudhuri codes [12].
`When transmitting over channels subject to long fades or long noise bursts,
`it is often impractical to correct errors in these noisy periods. In such cases it is
`advantageous to use a combination of error correction and error detection with
`feedback and retransmission [16]. All of the coding and decoding schemes being
`considered here fit naturally into such a system, but in cases where little or no
error correction is attempted, low-density codes appear at a disadvantage.
`In conclusion, all these schemes have their own advantages, and clearly no
`scheme is optimum for all communication situations.
`It appears that enough
`coding and decoding alternatives now exist for serious consideration of the use
`of coding on particular channels.
`
`10
`
`
`
`2 Distance Functions
`
The distance function N(ℓ) of a parity-check code is defined as the number of
code words in the code of weight ℓ. From the group properties of a parity-
check code, it easily follows [12] that N(ℓ) is also the number of code words
at distance ℓ from any given code word. The minimum distance D of a code
is then defined as the smallest value of ℓ > 0 for which N(ℓ) ≠ 0. Clearly, in
a code of given block length n and rate R it is desirable to make D as large
as possible and to make N(ℓ) as small as possible for those ℓ just larger than
D. However, the next chapter, which discusses bounding the probability of
decoding error for symmetric binary-input channels, will make the exact effect
of N(ℓ) on error-correcting capability clearer.
For a parity-check code of long block length it is usually impractical to
calculate exactly the distance function or even the minimum distance because
of the enormous number of code words involved. It is often simpler to analyze the
average distance function of an ensemble of codes; the statistics of an ensemble
permit one to average over quantities that are not tractable in individual codes.
From the ensemble average, one can then make statistical statements about the
member codes.
`
2.1 Equiprobable Ensemble of Parity-Check Codes
`
This chapter will be concerned primarily with the distance functions of low-
density parity-check codes, but for comparison purposes, the average distance
function of another ensemble of parity-check codes will be derived first. Since a
parity-check code is completely specified by a parity-check matrix, an ensemble
of parity-check codes may be defined in terms of an ensemble of parity-check
matrices. The equiprobable ensemble of parity-check codes of rate R and block
length n will be defined as the ensemble in which the n(1 - R) by n parity-check
matrix is filled with statistically independent equiprobable binary digits. This
is essentially the same ensemble as that considered by Elias [3] in his random
coding bounds for parity-check codes; the minor difference is that codes in this
ensemble may have a rate slightly higher than R, since the rows of a matrix in
this ensemble are not necessarily independent over the modulo 2 field.
`
Theorem 2.1. Let \bar{N}(\ell) be the average number of code words of weight \ell in a
code averaged over the equiprobable ensemble of parity-check codes of length n
and rate R. Then for \ell > 0,

\bar{N}(\ell) = \binom{n}{\ell} 2^{-n(1-R)} \le \bigl[2\pi n\lambda(1-\lambda)\bigr]^{-1/2} \exp n\bigl[H(\lambda) - (1-R)\ln 2\bigr]    (2.1)

where

\lambda = \frac{\ell}{n}, \qquad H(\lambda) = \lambda \ln\frac{1}{\lambda} + (1-\lambda)\ln\frac{1}{1-\lambda}
`
`11
`
`
`
Proof. Let P(\ell) be the probability of the set of codes for which some particular
sequence of weight \ell is a code word. Stated differently, P(\ell) is the probability
that a particular sequence of weight \ell will be a code word in a code chosen at
random from the ensemble. Since the all-zero code word is a code word in any
parity-check code, P(\ell) = 1 for \ell = 0. For \ell ≠ 0, a particular parity check will
check with probability 1/2 on the last position in which the weight-\ell sequence has
a one. This makes the probability 1/2 that a parity check is satisfied, regardless of
whether the first \ell - 1 ones were checked an even or an odd number of times. A
sequence will be a code word if and only if it satisfies all the n(1 - R) parity checks,
so that

P(\ell) = 2^{-n(1-R)} \qquad \text{for } \ell \ne 0

The probability P(\ell) can also be interpreted as the expectation of a random
variable that is 1 if the sequence is a code word and 0 otherwise. Now we observe
that there are \binom{n}{\ell} sequences of weight \ell and that the expected number of code
words among these sequences is the sum of the expectations that the individual
sequences are code words. Thus

\bar{N}(\ell) = \binom{n}{\ell} 2^{-n(1-R)}    (2.2)

We now bound \binom{n}{\ell} by the Stirling approximation:

\sqrt{2\pi n}\, n^n \exp\Bigl(-n + \frac{1}{12n+1}\Bigr) \le n! \le \sqrt{2\pi n}\, n^n \exp\Bigl(-n + \frac{1}{12n}\Bigr)    (2.3)

It follows after some manipulation that for \lambda n = \ell,

\bigl[2\pi n\lambda(1-\lambda)\bigr]^{-1/2} \exp\Bigl(nH(\lambda) - \frac{1}{12n\lambda(1-\lambda)}\Bigr) \le \binom{n}{\lambda n} \le \bigl[2\pi n\lambda(1-\lambda)\bigr]^{-1/2} \exp nH(\lambda)    (2.4)

where

H(\lambda) = -\lambda\ln\lambda - (1-\lambda)\ln(1-\lambda)

Combining Equations (2.4) and (2.2), we get the statement of the theorem. □
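Theorem 2.1 is straightforward to check numerically. The sketch below (a hypothetical illustration, not from the monograph) evaluates the exact ensemble average of Equation (2.2) against the right-hand side of Equation (2.1):

```python
from math import comb, exp, log, pi

def avg_codewords(n, R, l):
    """Exact ensemble average of Equation (2.2): C(n, l) * 2^(-n(1-R))."""
    return comb(n, l) * 2.0 ** (-n * (1.0 - R))

def upper_bound(n, R, l):
    """Right-hand side of Equation (2.1)."""
    lam = l / n
    H = -lam * log(lam) - (1.0 - lam) * log(1.0 - lam)
    coeff = (2.0 * pi * n * lam * (1.0 - lam)) ** -0.5
    return coeff * exp(n * (H - (1.0 - R) * log(2.0)))

# e.g. n = 100, R = 1/2, l = 30: the bound exceeds the exact average
# only by the small correction term in the Stirling approximation.
```

For n = 100, R = 1/2, ℓ = 30, the two values agree to within a fraction of a percent, showing how tight the exponential form of the bound is for moderate n.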
`
`Next we observe that over the equiprobable ensemble of parity-check codes,
`the minimum distance of the individual codes is a random variable whose dis-
`tribution function can be bounded by the following theorem.
`
Theorem 2.2. Over the equiprobable ensemble of parity-check codes of length n
and rate R, the minimum-distance distribution function Pr(D ≤ δn) is bounded
by both the following inequalities for δ < 1/2 and δn an integer:

\Pr(D \le \delta n) \le \frac{1}{1-2\delta}\left[\frac{1-\delta}{2\pi n\delta}\right]^{1/2} \exp n\bigl[H(\delta) - (1-R)\ln 2\bigr]    (2.5)

\Pr(D \le \delta n) \le 1
`
`
`
Proof. It was shown in Theorem 2.1 that the probability that a nonzero sequence
is a code word over the ensemble of codes is 2^{-n(1-R)}. The probability that any
sequence of weight δn or less is a code word is certainly less than the sum of
the probabilities that the individual sequences are code words. Thus,

\Pr(D \le \delta n) \le \sum_{\ell=1}^{\delta n} \binom{n}{\ell}\, 2^{-n(1-R)}    (2.6)

The ratio of successive terms \binom{n}{\ell-1} \big/ \binom{n}{\ell} = \ell/(n-\ell+1) is at most \delta/(1-\delta) for
\ell \le \delta n, so bounding the sum by a geometric series, we get

\sum_{\ell=1}^{\delta n} \binom{n}{\ell} \le \frac{1-\delta}{1-2\delta} \binom{n}{\delta n}    (2.7)

Bounding Equation (2.7) by (2.4) and substituting into Equation (2.6), we get
the statement of the theorem. □
`
As n gets larger, this bound to Pr(D ≤ δn) as a function of δ approaches
a step function with the step at that δ₀ < 1/2 for which H(δ₀) = (1 - R) ln 2.
Figure 2.4 plots δ₀ as a function of rate. This result is closely related to the
Gilbert bound on minimum distance [6]. The asymptotic form of the Gilbert
bound for large n states that there exists a code for which D ≥ δ₀n. Theorem 2.2
states that for any ε > 0, the probability of the set of parity-check codes for
which D < n(δ₀ - ε) approaches 0 exponentially with n.
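The threshold δ₀ defined by H(δ₀) = (1 - R) ln 2 is easy to compute numerically: H is increasing on (0, 1/2), so bisection applies. A minimal sketch (illustrative, not from the monograph):

```python
from math import log

def H(x):
    """Natural-logarithm entropy function of Theorem 2.1."""
    return -x * log(x) - (1.0 - x) * log(1.0 - x)

def delta0(R, tol=1e-12):
    """Bisect on (0, 1/2) for the delta_0 with H(delta_0) = (1 - R) ln 2."""
    lo, hi = tol, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if H(mid) < (1.0 - R) * log(2.0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# For rate R = 1/2, delta_0 is about 0.11: a typical parity-check code
# of rate 1/2 has minimum distance close to 0.11 n.
```

This is the curve that Figure 2.4 plots against rate for comparison with the low-density codes of Section 2.2.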
`
`2.2 Distance Properties of Low-Density Codes
`
In this section an ensemble of low-density parity-check codes will be defined, and
theorems similar to Theorems 2.1 and 2.2 will be proved. Then a new ensemble
will be formed by expurgating those codes that have small minimum distances.
This expurgated ensemble will be used in the next chapter to derive bounds on
the probability of decoding error for various channels.
Define an (n, j, k) parity-check matrix as a matrix of n columns that has j
ones in each column, k ones in each row, and zeros elsewhere. It follows from
this definition that an (n, j, k) parity-check matrix has nj/k rows and thus a rate
R ≥ 1 - j/k. In order to construct an ensemble of (n, j, k) matrices, consider
first the special (n, j, k) matrix in Figure 2.1, for which n = 20, j = 3, and
k = 4.
This matrix is divided into j submatrices, each containing a single 1 in
each column. The first of these submatrices contains all its 1's in descending
order; that is, the i-th row contains 1's in columns (i - 1)k + 1 to ik. The
other submatrices are merely column permutations of the first. We define the
ensemble of (n, j, k) codes as the ensemble resulting from random permutations
of the columns of each of the bottom j - 1 submatrices of a matrix such as in
Figure 2.1.
`
`
`
`$336!--‘EGGS:-|~DGG@I-I
`
`SS6}--|©SDClId®GGG$I—|
`
`SEHSGGGBGGGGGSH
`
`31-03333)-‘©©©$$©|=|—I
`
`I—|$$3©Q
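The construction just described translates directly into code. The sketch below follows the stated rules (first submatrix with its 1's in descending order, the remaining j - 1 submatrices random column permutations of it); the choice of pseudo-random generator is an implementation assumption.

```python
import random

def gallager_matrix(n, j, k, rng=None):
    """Sample from the (n, j, k) ensemble: row i of the first submatrix
    has 1's in columns (i-1)k+1 through ik (1-indexed); each of the
    other j-1 submatrices is a random column permutation of the first."""
    assert n % k == 0, "n must be a multiple of k"
    rng = rng or random.Random()
    base = [[1 if r * k <= c < (r + 1) * k else 0 for c in range(n)]
            for r in range(n // k)]
    H = [row[:] for row in base]
    for _ in range(j - 1):
        perm = list(range(n))
        rng.shuffle(perm)
        H.extend([row[c] for c in perm] for row in base)
    return H

H = gallager_matrix(20, 3, 4)  # the n = 20, j = 3, k = 4 case of Figure 2.1
# H has nj/k = 15 rows, j = 3 ones per column, and k = 4 ones per row.
```

Each submatrix contributes exactly one 1 to every column, and a column permutation preserves that property, so every matrix in the ensemble satisfies the (n, j, k) definition.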