`
`Thesis by
`
`Aamod Khandekar
`
`In Partial Fulfillment of the Requirements
`
`for the Degree of
`
`Doctor of Philosophy
`
`California Institute of Technology
`
`Pasadena, California
`
`2002
`
`(Submitted June 10, 2002)
`
`
© 2002
`
`Aamod Khandekar
`
`All Rights Reserved
`
`
`
`
`Acknowledgements
`
`I would like to begin by expressing my heartfelt gratitude to my advisor, Prof. Robert
`
`McEliece, for his active guidance and support throughout my stay at Caltech. His
`
`infectious enthusiasm and his many insights have been invaluable for the progress of
`
`my research career.
`
`I would like to thank Prof. Robert McEliece, Prof. P. P. Vaidyanathan, Prof.
`
`Jehoshua (Shuki) Bruck, Prof. John Preskill, Prof. Babak Hassibi and Dr. Dariush
`
`Divsalar for serving on my candidacy and defense committees, and for their advice
`
`on this thesis.
`
`I am grateful to all the current and former occupants of Moore 155, and to current
`
`and former residents of Braun graduate house, for making my stay at Caltech a very
`
`pleasant one.
`
`
`
`
`Abstract
`
`The field of error correcting codes was revolutionized by the introduction of turbo
`
`codes [7] in 1993. These codes demonstrated dramatic performance improvements
`
`over any previously known codes, with significantly lower complexity. Since then,
`
`much progress has been made towards understanding the performance of these codes,
`
`as well as in using this understanding to design even better codes.
`
`This thesis takes a few more steps in both these directions. We develop a new
`
`technique, called the typical set bound, for analyzing the asymptotic performance
`
`of code ensembles based on their weight enumerators. This technique yields very
`
`tight bounds on the maximum-likelihood decoding threshold of code ensembles, and
`
`is powerful enough to reproduce Shannon’s noisy coding theorem for the class of
`
`binary-input symmetric channels.
`
`We also introduce a new class of codes called irregular repeat-accumulate (IRA)
`
`codes, which are adapted from the previously known class of repeat-accumulate (RA)
`
`codes. These codes are competitive in terms of decoding performance with the class
`
`of irregular low-density parity-check (LDPC) codes, which are arguably the best class
`
`of codes known today, at least for long block lengths. In addition, IRA codes have a
`
`significant advantage over irregular LDPC codes in terms of encoding complexity.
`
We also derive an analytical lower bound on the iterative decoding thresholds of code
`
`ensembles on general binary-input symmetric channels, an area in which theoretical
`
`results are currently lacking.
`
`
`
Contents

Acknowledgements

Abstract

1 Introduction
   1.1 Some Basic Concepts
       1.1.1 Channel Models
       1.1.2 Codes and Code Ensembles
       1.1.3 Decoding Algorithms
   1.2 Some Graphical Code Ensembles
       1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)
       1.2.2 Serial Concatenation of Convolutional Codes (SCCC)
       1.2.3 Codes Defined on Tanner Graphs
       1.2.4 Decoding on Tanner Graphs
       1.2.5 Low-Density Parity-Check (LDPC) Codes
       1.2.6 Repeat-Accumulate (RA) Codes
   1.3 Density Evolution
       1.3.1 Density Evolution on the BEC
   1.4 Thesis Outline

2 The Typical Set Bound
   2.1 The Union Bound
   2.2 The Typical Set Decoder
   2.3 The Typical Set Bound
       2.3.1 Properties of K(δ)
       2.3.2 The Main Theorem
   2.4 The Typical Set Bound for Specific Channel Models
       2.4.1 The Binary Symmetric Channel
       2.4.2 The Binary Erasure Channel
   2.5 The Typical Set Bound for Specific Code Ensembles
       2.5.1 The Ensemble of Random Linear Codes
       2.5.2 LDPC Codes
       2.5.3 Cycle Codes
       2.5.4 RA Codes
   2.6 Generalization to Arbitrary BISC’s

3 Irregular Repeat-Accumulate Codes
   3.1 Irregular LDPC Codes: Definition
   3.2 Irregular LDPC Codes on the BEC
   3.3 IRA Codes: Definition and Notation
   3.4 IRA Codes on the BEC
       3.4.1 Fixed Point Analysis of Iterative Decoding
       3.4.2 Capacity Achieving Sequences of Degree Distributions
       3.4.3 Some Numerical Results
   3.5 IRA Codes on the BIAGN Channel
       3.5.1 Gaussian Approximation
       3.5.2 Numerical Results
   3.6 Complexity Analysis
   3.7 Conclusions

4 A Lower Bound on Iterative Decoding Thresholds for General BISC’s
   4.1 Introduction
       4.1.1 The Consistency Condition
       4.1.2 The Stability Condition
   4.2 The Main Result
   4.3 Conclusions

5 IRA Codes on Non-Binary Channels
   5.1 The 2-D Gaussian Channel
   5.2 The Binary Adder Channel
   5.3 Conclusions

6 Conclusions

A Miscellaneous Derivations for Chapter 2

B Miscellaneous Derivations for Chapter 3

C Miscellaneous Derivations for Chapter 4

Bibliography
`
`
`
List of Figures

1.1 A canonical communication system.
1.2 Parallel concatenation of two convolutional codes, connected through a random interleaver.
1.3 Serial concatenation of two convolutional codes, connected through a random interleaver.
1.4 A repeat-accumulate (RA) code.
1.5 A small Tanner graph.
1.6 The Tanner graph of an RA code.
2.1 The function K_p(δ) for the BSC.
2.2 The function K_p(δ) for the BEC.
2.3 The asymptotic spectral shape of the ensemble of rate 1/3 random linear codes.
2.4 The asymptotic spectral shape of the ensemble of (4, 8) LDPC codes.
2.5 The asymptotic spectral shape of the ensemble of q = 5 RA codes.
3.1 The Tanner graph of an irregular LDPC code.
3.2 The Tanner graph of an IRA code.
3.3 Performance of rate 1/2 IRA codes on the BIAGN channel.
3.4 Variation of decoded BER with the number of iterations.
4.1 BIAGN channel thresholds of codes optimized for the BEC.
4.2 BSC thresholds of codes optimized for the BEC.
5.1 Performance of an IRA code on the 2-D Gaussian channel with 8-PSK modulation.
5.2 Performance of an IRA code on the 2-D Gaussian channel with 16-QAM modulation.
5.3 Graphical representation of a BAC coding scheme.
`
`
`
List of Tables

2.1 BSC and BEC typical set thresholds for LDPC codes.
2.2 BSC and BEC typical set thresholds for RA codes.
2.3 Some BIAGN channel typical set thresholds.
3.1 BEC thresholds for some ensembles of IRA codes.
3.2 Some good degree sequences for rate 1/3 IRA codes on the BIAGN channel.
3.3 Some good degree sequences for rate 1/2 IRA codes on the BIAGN channel.
4.1 Comparison between capacity and threshold achieved by codes optimized for the BEC.
`
`
`
`1
`
`Chapter 1 Introduction
`
`In his seminal paper in 1948, Shannon [45] derived the fundamental limits on the rate
`
`of communication on a noisy channel. Shannon’s “noisy coding theorem” states that
`
`every channel has a capacity C, which is the highest rate (in bits per channel use)
`
`at which reliable communication is possible. Shannon also showed that for a long
`
`enough block length, almost any block code of rate R < C, with optimal decoding,
`
`provides reliable communication over this channel. This scheme, however, is not a
`
`practical one, since both encoding and decoding are prohibitively expensive.
`
`Ever since Shannon proved his noisy coding theorem, the construction of practical
`
`capacity-achieving schemes has been the supreme goal of coding theory. The classical
`
`approaches to this problem included algebraic block codes and convolutional codes.
`
`The field was, however, revolutionized by the introduction of turbo codes by Berrou,
`
`Glavieux, and Thitimajshima [7]. The performance of turbo codes was much closer
`
`to capacity than that of any previous codes, and with significantly lower complexity.
`
`The power of turbo codes comes not only from the code construction, but also
`
`from the iterative decoding algorithm used. The code construction consists of simple
`
`blocks connected by a pseudorandom interleaver. The interleaver introduces enough
`
`randomness into the code to ensure good performance, yet keeps enough structure
`
`to allow simple encoding and decoding algorithms. The invention of turbo codes led
`
`to an explosion of interest in the field of codes on graphs and iterative decoding,
`
`which led among other things to the rediscovery of low-density parity-check (LDPC)
`
`codes [17, 37], and the invention of repeat-accumulate (RA) codes [15].
`
`While the turbo-decoding algorithm worked extremely well in practice, there was
`
`very little theory available to explain its dramatic performance. It was soon discov-
`
`
`
`2
`
`ered that the turbo-decoding algorithm was an instance of Pearl’s belief propagation
`
`algorithm over loopy graphs [39], and several frameworks were developed to formalize
`
`this notion [39, 4, 31]. The belief propagation algorithm is known to terminate in
`
`finite time in the case of non-loopy graphs (trees) and is optimal in this case. Its
`
`behavior on general graphs was, however, not known. Some results in this direction
`
`were presented in [2, 51, 52]. Luby et al. [33, 34], followed by Richardson and Ur-
`
`banke [42], showed how LDPC codes could be approximated by the cycle-free case,
`
`and were able to prove asymptotic results about their iterative decoding performance.
`
`Luby et al. also introduced the concept of irregularity, which seems to provide hope
`
`of operating arbitrarily close to channel capacity in a practical manner, on a wide
`
`class of channel models. More recently, connections have been discovered between
`
`coding theory and statistical physics [54, 55, 5], which show some hope of providing
`
`insight into the general problem of decoding on loopy graphs.
`
`All the codes mentioned in the preceding paragraph are so-called “graphical code
`
`ensembles.” Most of this thesis deals with the analysis and design of such code
`
`ensembles, and hence we devote the rest of this chapter to some basic background
`
`material concerning them.
`
`1.1 Some Basic Concepts
`
`1.1.1 Channel Models
`
`A canonical communication system is depicted in Figure 1.1. The objective is to
`
communicate an input string U across a “noisy” channel. The encoder converts this
`
`string into another string X of symbols over the channel’s input alphabet. A string
`
`Y is seen at the other end of the channel, which is a non-deterministic function of
`
`the channel input X, and the decoder tries to reconstruct the input string U based
`
`on the knowledge of Y. To analyze such a system, we need to have a model for the
`
`
`
`3
`
`Encoder
`
`Channel
`
`U
`
`X
`
`Y
`
`Decoder
`
`Û
`
`Figure 1.1: A canonical communication system.
`
`channel. Typically, we assume that the output string Y of the channel has the same
`
`length as the input string X, and depends on X via a conditional probability density
function (pdf) p_{Y|X}(y|x). This is still an extremely general model and we therefore
`
`define several special cases.
`
Definition 1.1 A channel is called memoryless if the channel output at any time instant depends only on the input at that time instant, i.e., if y = y_1 y_2 . . . y_n and x = x_1 x_2 . . . x_n, then

    p_{Y|X}(y|x) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i).

In this case, the channel is completely described by its input and output alphabets, and the conditional pdf p_{Y|X}(y|x) for one time instant.
`
`Definition 1.2 (Capacity) Let the input to a memoryless channel be generated by
`
`independent and identically distributed (iid) copies of a random variable X. Then
`
`the output of the channel will be iid copies of some random variable Y whose pdf
`
can be computed. Clearly, I(X; Y), the mutual information between the random variables X and Y (see [10] for a definition), is a function of the pdf of X. The capacity of the memoryless channel is then defined as sup_{p_X(x)} I(X; Y). The motivation behind this
`
`definition will be clear in Section 1.1.2.
`
Example 1.1 A very commonly used channel model is the additive Gaussian noise channel. This channel has ℝ as its input and output alphabets, and is parametrized by a non-negative real number σ. The channel output Y is given by X + N, where X is the channel input and N is a Gaussian random variable with mean 0 and variance σ². The conditional pdf p_{Y|X}(y|x) is therefore a Gaussian pdf with mean x and variance σ².
`
`4
`
`Definition 1.3 A binary input channel is merely a channel with a binary input alpha-
`bet. We will interchangeably use the sets {0, 1} and {+1,−1} for the input alphabet
`
`with 0 mapping to +1 and 1 to -1.
`
`Example 1.2 (The Z-Channel) This is a binary input channel with a parameter
`p and output alphabet {0, 1}, in which a 0 is always received as a 0, but a 1 could be
received as a 0 with probability p. That is, p_{Y|X}(0|0) = 1, p_{Y|X}(1|0) = 0, p_{Y|X}(0|1) = p, and p_{Y|X}(1|1) = 1 − p.
`
`Suppose we want to compute the input distribution of a binary-input channel
`
`conditioned on the knowledge of the received value y, i.e., we want to compute the
`a posteriori probabilities Pr(X = 0|Y = y) and Pr(X = 1|Y = y), which sum to 1.
`
`This knowledge can be efficiently packaged into one number, for example their ratio.
`
By Bayes’ rule, we have

    \frac{\Pr(X = 0 | Y = y)}{\Pr(X = 1 | Y = y)} = \frac{p_{Y|X}(y|0)}{p_{Y|X}(y|1)} \cdot \frac{\Pr(X = 0)}{\Pr(X = 1)}.    (1.1)
`
Hence the quantity p_{Y|X}(y|0)/p_{Y|X}(y|1) is a sufficient statistic for estimating the input to the channel. So, of course, is any invertible function of it. This leads us to the following definition.

Definition 1.4 The quantity p_{Y|X}(y|0)/p_{Y|X}(y|1) corresponding to the output y of a binary-input channel is called its likelihood ratio. Its logarithm log(p_{Y|X}(y|0)/p_{Y|X}(y|1)) is called the log-likelihood ratio (LLR).
`
`Notice that there does not have to be an explicit channel for a (log)-likelihood
`
`ratio to be defined. It is well defined in the context of a noisy (probabilistic) estimate
`
`of a binary random variable.
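As a small numerical illustration (our own sketch, not from the thesis; the binary symmetric channel used here is formally defined in Example 1.3 below), the LLR of a BSC output takes only two values:

    import math

    def llr(p_y_given_0, p_y_given_1):
        """Log-likelihood ratio log(p(y|0) / p(y|1)) for one channel output."""
        return math.log(p_y_given_0 / p_y_given_1)

    # BSC with crossover probability p:
    p = 0.1
    print(llr(1 - p, p))  # output 0: log((1-p)/p) = +2.197..., evidence for input 0
    print(llr(p, 1 - p))  # output 1: -2.197..., evidence for input 1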
`
`
`
`5
`
Definition 1.5 A binary input symmetric channel (BISC) is a memoryless binary input channel with ℝ as its output alphabet, satisfying p_{Y|X}(y|0) = p_{Y|X}(−y|1). Thus a BISC is completely described by the conditional pdf p_{Y|X}(y|0).
`
`In the BISC case, by symmetry, the optimizing pdf in Definition 1.2 is uniform,
`
`i.e., (1/2, 1/2), and hence the capacity computation is much simplified.
`
`The BISC assumption leads to many such simplifications in analysis and design,
`
`and most of the work presented in this thesis will be for this case. Also, many natural
`
`channel models do fall under this class. We present some of the most important ones
`
`here.
`
`Example 1.3 (The binary symmetric channel (BSC)) This is a binary-input,
`
`binary-output channel with parameter p. To view it as a BISC, it is convenient to
`let the input and output alphabets be {+1,−1}. Then the output is equal to the
`input with probability 1 − p, and is the negative of the input with probability p. p is
`
`called the crossover probability of the channel. We will omit writing the conditional
`pdf explicitly. The capacity of this channel is given by 1 − H(p), where H(·) is the
`
`entropy function.
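For example, this capacity is easy to evaluate numerically (a minimal sketch of our own, with logarithms in base 2):

    import math

    def binary_entropy(p):
        """H(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity 1 - H(p) of the BSC."""
        return 1.0 - binary_entropy(p)

    print(bsc_capacity(0.0))   # 1.0: a noiseless bit per channel use
    print(bsc_capacity(0.11))  # about 0.5
    print(bsc_capacity(0.5))   # 0.0: the output is independent of the input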
`
`Example 1.4 (The binary erasure channel (BEC)) This channel, with param-
`eter p, has input alphabet {+1,−1} and output alphabet {+1, 0,−1}. The output
`
`symbol 0 is also called an erasure. The output is equal to the input with probability
`1 − p and is 0 with probability p. p is called the probability of erasure. This channel
`
`is arguably the simplest nontrivial channel model, and will be one of the focal points
`of this thesis. The capacity of this channel is 1 − p.
`
`Example 1.5 (The binary input additive Gaussian noise (BIAGN) channel)
`
`This is the additive Gaussian noise channel restricted to inputs +1 and -1. The ex-
`
`pression for the capacity of a general BISC is derived in Chapter 2, and is given by
`
`
`
eq. (2.12). For the BIAGN channel, this expression simplifies to

    C = 1 - \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}} H\!\left(\frac{1}{1 + e^{2x/\sigma^2}}\right) e^{-(x-1)^2/(2\sigma^2)} \, dx,    (1.2)

where H(·) is again the entropy function.
`
`1.1.2 Codes and Code Ensembles
`
`The basic idea in coding theory is to add redundancy to the transmitted information
`
`in order to help combat channel errors. Thus, in Figure 1.1, we restrict the set of
`
`strings X that can be used, so that no two legal strings are “close to” one another.
`
`In this way, a channel error is unlikely to cause confusion between two legal strings.
`
`Definition 1.6 A block code of length n over an alphabet A is a subset of An, the
`set of n-length strings over A. The elements of this subset are called codewords.
`
`We typically need to introduce more structure into our codes to aid in both analysis
`
`and design. Most codes used in practice are so-called linear codes, which are defined
`when the elements of the alphabet A form a field.
`
`Definition 1.7 An (n, k) linear code over a field F is a k-dimensional vector subspace
of F^n. k is called the dimension of the code, r = n − k its redundancy, and R = k/n
`
`its rate. A binary linear code is merely a linear code over the binary field.
`
`Linear codes have several nice properties, for example, they look exactly the same
`around any codeword. That is, if C is a linear code and c ∈ C is a codeword, then
`the set C − c is identical to C. Also, in order to describe a linear code, we don’t have
`
`to list all its elements, but merely a basis. Such a description is called a generator
`
`matrix representation.
`
Definition 1.8 A generator matrix for an (n, k) linear code C is a k × n matrix G whose rows form a basis for C. As u varies over the space F^k, uG varies over the set of codewords. Thus, the matrix G provides an encoding mechanism for the code.
`of codewords. Thus, the matrix G provides an encoding mechanism for the code.
`
`Another useful representation of a linear code is a parity-check matrix representa-
`
`tion.
`
`Definition 1.9 A parity-check matrix for an (n, k) linear code C is an (n − k) × n
`matrix H whose rows form a basis for the space of vectors orthogonal to C. That is,
H is a full-rank matrix such that Hc = 0 ⇐⇒ c ∈ C.
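As a concrete illustration of Definitions 1.8 and 1.9 (our own sketch, not from the thesis), here are the systematic-form matrices G = [I | P] and H = [Pᵀ | I] of the (7, 4) binary Hamming code:

    import itertools

    # Generator and parity-check matrices of the (7, 4) binary Hamming code.
    G = [[1, 0, 0, 0, 1, 1, 0],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]
    H = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

    def encode(u):
        """Codeword uG over GF(2) (Definition 1.8)."""
        return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

    def is_codeword(c):
        """Check Hc = 0 over GF(2) (Definition 1.9)."""
        return all(sum(h[j] * c[j] for j in range(7)) % 2 == 0 for h in H)

    # Every uG satisfies all the parity checks.
    assert all(is_codeword(encode(list(u)))
               for u in itertools.product([0, 1], repeat=4))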
`
`To formalize the notion of codewords being “close to” each other, we will make
`
`use of the Hamming metric.
`
`Definition 1.10 The Hamming distance between two vectors is the number of com-
`
`ponents in which they differ. The minimum distance of a code is the smallest Ham-
`
`ming distance between two distinct codewords. For a linear code, this is the same as
`
`the least weight of any nonzero codeword.
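A small sketch of these notions (our own example): for a linear code one can verify by brute force that the minimum distance equals the least nonzero-codeword weight.

    import itertools

    def hamming_distance(a, b):
        """Number of positions in which two vectors differ."""
        return sum(x != y for x, y in zip(a, b))

    # The (3, 1) repetition code {000, 111} has minimum distance 3.
    code = [(0, 0, 0), (1, 1, 1)]
    d_min = min(hamming_distance(a, b)
                for a, b in itertools.combinations(code, 2))
    assert d_min == 3 == min(sum(c) for c in code if any(c))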
`
`Another useful notion is that of an ensemble of codes, which is used when we wish
`
`to average some quantity over a set of codes.
`
`Definition 1.11 An ensemble of codes is a set of codes of the same length, and
`
`typically having approximately the same rates, with an associated pdf. We use the
`
`same term for sequences of such sets, often with length going to infinity.
`
Example 1.6 (The ensemble of random codes) To construct this ensemble, fix a length n and a rate R. Then the number of codewords should be 2^{nR}. Also fix a pdf p_X(x) over the code alphabet, and pick each element of each codeword independently according to this pdf. This ensemble is known as the ensemble of random codes.
`
For a memoryless channel, Shannon [45] showed that if p_X(x) was chosen to be the optimizing pdf in Definition 1.2, then for any rate R less than the capacity of the channel, the average probability of error for the ensemble of random codes tends to
`
`zero as n tends to infinity. He also showed that the probability of error for a code
`
`whose rate exceeded the capacity was bounded away from zero.
`
If p_X(x) is uniform over the input alphabet, then this ensemble is very similar to
`
`the one in which we pick any code of the correct length and the correct rate with
`
`equal probability, and we use the term “random codes” for this ensemble also.
`
`Example 1.7 (The ensemble of random linear codes) This is the set of all lin-
`
`ear codes of some fixed rate R (and length n), each selected with equal probability.
`
`For a BISC, this ensemble is known to achieve capacity [18] (i.e., the average proba-
`
`bility of error tends to zero as n tends to infinity for any rate less than the capacity
`
`of the channel). This is another reason why the BISC assumption is so useful. One
`
`proof of this fact will be given in Chapter 2.
`
`This ensemble is sometimes defined by saying that every entry in the generator
`
`(or parity check) matrix is picked independently with a uniform pdf. While the
`
`ensembles so defined are not identical to the first definition, most properties for large
`
`n (in particular the weight enumerator, to be defined in Chapter 2) are indeed the
`
`same. We will use the term “random linear codes” for either description.
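For instance, under the second (generator-matrix) definition the ensemble is trivial to sample (a minimal sketch of our own; for simplicity it ignores the possibility that G is not full rank, and at such tiny parameters the weight distribution is of course far from its asymptotic shape):

    import itertools
    import random

    def random_generator_matrix(k, n):
        """Each entry iid uniform over GF(2)."""
        return [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

    def weight_enumerator(G):
        """Number of codewords uG of each Hamming weight."""
        k, n = len(G), len(G[0])
        counts = [0] * (n + 1)
        for u in itertools.product([0, 1], repeat=k):
            c = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            counts[sum(c)] += 1
        return counts

    random.seed(1)
    print(weight_enumerator(random_generator_matrix(4, 8)))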
`
`1.1.3 Decoding Algorithms
`
`Given the output of a noisy channel, we need to form an estimate of the transmitted
`
`codeword. We now define notions of “optimal” decoding algorithms.
`
`Given a received vector y, the codeword most likely to have been transmitted
is the one that maximizes p_{X|Y}(x|y). If each of the codewords is equally likely, then this reduces to the codeword x which maximizes p_{Y|X}(y|x) (which factorizes if the channel is memoryless). This is known as the
`
`maximum likelihood (ML) estimate of the transmitted codeword.
`
`
`
`9
`Definition 1.12 Given a code C and a received vector y, the maximum likelihood
`
`decoder has as its output
`
    \hat{x} = \arg\max_{x \in C} p_{Y|X}(y|x).    (1.3)
`
`Clearly this decoder has the least possible (word) probability of error.
`
`If, on the other hand, we wish to minimize the bit error probability, then we need
to maximize Pr(x_i = x|y) over all x.
`
`Definition 1.13 Given a received vector y, the MAP (maximum a posteriori) de-
`
`coder has as its output in the ith position
`
    \hat{x}_i = \arg\max_{x \in A} \Pr(x_i = x | y),    (1.4)
`
`where the maximization is over the input alphabet of the channel.
`
`In the case of a binary code, the maximization is over a binary alphabet. Taking
`
`the maximum of two values is equivalent to comparing their ratio to 1, or the log of
`
`the ratio to 0. This latter quantity is nothing but the LLR. Therefore, MAP decoding
`
`in the case of a binary code is equivalent to taking the sign of the a posteriori LLR.
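A minimal sketch of this observation (our own illustration, not from the thesis):

    import math

    def map_decision(llr):
        """MAP estimate of a bit from its a posteriori LLR
        l = log(Pr(x = 0 | y) / Pr(x = 1 | y)): decide 0 iff l > 0."""
        return 0 if llr > 0 else 1

    def posteriors(llr):
        """Recover (Pr(x = 0 | y), Pr(x = 1 | y)) from the LLR."""
        p0 = 1.0 / (1.0 + math.exp(-llr))
        return p0, 1.0 - p0

    print(map_decision(1.2), posteriors(1.2))  # 0 (0.768..., 0.231...)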
`
`1.2 Some Graphical Code Ensembles
`
`In this section, we will introduce some important graphical code ensembles, which
`
`is a generic term we will use to describe all “turbo-like” codes, or codes amenable
`
`to iterative decoding. All of these codes can be viewed under a unified graphical
`
`framework. Several such frameworks have been proposed, for example, see [4], [31],
`
`and [16]. In all these cases, the iterative decoding algorithm reduces to an instance of
`
`Pearl’s belief propagation algorithm [41] on loopy graphs. Such a general view will,
`
`however, not be necessary for the purposes of this thesis, and we will describe each
`
`
`
`10
`
`Convolutional
`Encoder 1 (IIR)
`
`Convolutional
`Encoder 2 (IIR)
`
`π
`
`Figure 1.2: Parallel concatenation of two convolutional codes, connected through a
`random interleaver (denoted by π).
`
`ensemble separately with its own decoding algorithm. The specific decoding algorithm
`
`we will describe is called the sum-product algorithm, which aims to minimize the bit
`
`error probability. The idea in every case is that we use “long, random-like codes”
`
as suggested by Shannon, made practical by the existence of a simple though
`
`suboptimal decoding algorithm.
`
1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)
`
`These codes are also known as “parallel turbo codes.” The original turbo code intro-
`
`duced by Berrou et al. [7] was a code of this type. The general structure is shown
`
`in Figure 1.2. As mentioned earlier, it consists of two relatively simple constituent
`
`codes, more specifically truncated binary IIR convolutional codes with a short con-
`
`straint length. They are connected by an interleaver (labeled π in the figure), which
`
`is merely a pseudo-random permutation of the information bits. In the figure, there is
`
`also a systematic (information) bit stream, which could be absent. There could also
`
`be more than two constituent codes, each with its own interleaver. Here, we briefly
`
`
`
`11
`
`describe the decoding algorithm in the case when there are two constituent encoders
`
`and a systematic data stream.
`
`The aim of the sum-product algorithm is to approximate MAP decoding, as de-
`
`fined in Definition 1.13, or equivalently to compute the a posteriori log-likelihoods
`
`of the individual transmitted bits given the received vector. The MAP decoding al-
`
`gorithm for the constituent convolutional codes can be implemented with the well
`
`known forward-backward or BCJR [6] algorithm, which is feasible in this case be-
`
`cause these codes have a short constraint length. Given an a priori estimate (or LLR)
`
`on each information bit and an LLR for each transmitted bit, the BCJR algorithm
`
`outputs the correct a posteriori LLR for each information bit.
`
`The turbo-decoding algorithm iterates between the MAP decoders corresponding
`
`to the two constituent codes. The received values corresponding to the systematic bits
`
`are used to initialize the a priori LLR’s for the information bits. One of the constituent
`
`decoders then outputs the a posteriori LLR’s by running the BCJR algorithm, the
`
`idea being to use these as a priori LLR’s for the other decoder. However, in order not
`
`to form short loops in the so-called “computation tree,” the difference between the
`
`a posteriori and the a priori LLR’s (this is known as extrinsic information) is fed to
`
`the other decoder as a priori LLR’s, and the same operation is repeated over and over
`
`again. Various stopping rules are used to decide on convergence and guard against
`
`limit-cycles.
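To make the message flow concrete, here is a structural sketch in Python (our own illustration, not code from the thesis; bcjr_decode, interleave, and deinterleave are hypothetical stand-ins for a constituent MAP decoder and the interleaver maps, and the extrinsic bookkeeping follows one common convention):

    def turbo_decode(sys_llrs, par1_llrs, par2_llrs,
                     bcjr_decode, interleave, deinterleave, num_iters=20):
        """Structural sketch of the iteration described above.

        bcjr_decode(a_priori, parity_llrs) is a hypothetical helper returning
        a posteriori LLRs for the information bits of one constituent code.
        """
        n = len(sys_llrs)
        ext_2to1 = [0.0] * n  # decoder 2 has provided no information yet
        ext_1to2 = [0.0] * n
        for _ in range(num_iters):
            # Decoder 1: a priori = channel evidence on the systematic bits
            # plus the extrinsic information from decoder 2.
            pri1 = [s + e for s, e in zip(sys_llrs, ext_2to1)]
            post1 = bcjr_decode(pri1, par1_llrs)
            # Pass on only the *extrinsic* part: a posteriori minus a priori.
            ext_1to2 = [po - pr for po, pr in zip(post1, pri1)]

            # Decoder 2: the same, in the interleaved bit order.
            pri2 = interleave([s + e for s, e in zip(sys_llrs, ext_1to2)])
            post2 = bcjr_decode(pri2, par2_llrs)
            ext_2to1 = deinterleave([po - pr for po, pr in zip(post2, pri2)])

        # Final decisions from the combined evidence.
        totals = [s + e1 + e2
                  for s, e1, e2 in zip(sys_llrs, ext_1to2, ext_2to1)]
        return [0 if l > 0 else 1 for l in totals]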
`
`1.2.2 Serial Concatenation of Convolutional Codes (SCCC)
`
`These codes are also known as “serial turbo-codes.” In this case each convolutional
`
`encoder acts on the interleaved output of the previous one, instead of on the infor-
`
`mation stream directly. The general structure is shown in Figure 1.3. In this case
`
`also, the sum-product algorithm iterates between the decoders corresponding to the
`
`constituent codes. Some slight modifications are needed from the PCCC case, but
`
`
`
`12
`
`π
`
`Convolutional
` Encoder 1
`
`Convolutional
`Encoder 2 (IIR)
`
`Figure 1.3: Serial concatenation of two convolutional codes, connected through a
`random interleaver (denoted by π).
`
`the basic idea is the same, and we will omit giving a detailed description here.
`
`One example of the SCCC case is the ensemble of repeat-accumulate (RA) codes
`
`introduced in [15]. An RA code is shown in Figure 1.4. It is the concatenation of
`
`two particularly simple constituent codes, an outer “repeat by q” code and an inner
`
`“accumulate” code. This simple structure was intended to make analysis possible,
`
`but their performance under iterative decoding is surprisingly good, especially consid-
`
ering that the constituent decoders have extremely low complexity. These codes play an
`
`important role in this thesis, particularly in Chapter 3, and we will give an alternative
`
`description of them in Section 1.2.3.
`
[Block diagram: Repetition by q (rate 1/q) → π → 1/(1+D) (rate 1)]

Figure 1.4: A repeat-accumulate (RA) code.
`
`1.2.3 Codes Defined on Tanner Graphs
`
`A Tanner graph is a general way of representing any linear code. An example is
`
`shown in Figure 1.5. A Tanner graph has two kinds of nodes, called variable nodes,
`
`represented by hollow circles in the figure, and check nodes, represented by filled
`
`circles. The graph is bipartite between these two types of nodes, i.e., every edge has
`
`
`
`13
`
`Figure 1.5: A small Tanner graph.
`
`a variable node at one end and a check node at the other end. The variable nodes
`
`represent actual variables, for example, elements of the codeword. The check nodes
`
`represent constraints among these variables. All the graphs we will look at will be for
`
`binary linear codes, in which case the variable nodes represent binary variables, and a
`
`check node says that the binary sum of all its neighbors is 0. Clearly, any linear code
`
`may be represented in this manner, by directly transcribing its parity-check matrix.
`
`In fact, it will have many possible representations, because it has many possible
`
`parity-check matrices, and also because we can add dummy or state variables to the
`
`graph, as we shall soon see in the case of RA codes.
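As an illustration of this transcription (our own sketch, with an arbitrary 3 × 6 parity-check matrix):

    def tanner_graph(H):
        """Adjacency lists of the Tanner graph of parity-check matrix H:
        one check node per row, one variable node per column."""
        n_chk, n_var = len(H), len(H[0])
        var_adj = [[] for _ in range(n_var)]
        chk_adj = [[] for _ in range(n_chk)]
        for r in range(n_chk):
            for c in range(n_var):
                if H[r][c]:
                    chk_adj[r].append(c)
                    var_adj[c].append(r)
        return var_adj, chk_adj

    # A length-6 code with three parity checks.
    H = [[1, 1, 0, 1, 0, 0],
         [0, 1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0, 1]]
    var_adj, chk_adj = tanner_graph(H)
    print(chk_adj[0])  # check node 0 says variables 0, 1 and 3 sum to 0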
`
`1.2.4 Decoding on Tanner Graphs
`
`The sum-product algorithm takes a particularly elegant form on a Tanner graph.
`
`It was first described in this case by Gallager [17] in the context of LDPC codes.
`
`It is a completely distributed algorithm, with each node acting as an independent
`
`entity, communicating with other nodes through the edges. The message sent by a
`
`variable node to a check node, say in LLR form, is its estimate of its own value. The
`
`message sent by a check node to a variable node is its estimate of the variable node’s
`
`
`
`14
`
`value. The update rules at the nodes are essentially MAP estimators, given that the
`
`incoming messages along the different edges are independent. Again, in order not to
`
`form short cycles in the computation tree, the output along any edge is based only on
`
input from the other edges. At a variable node of degree j, if l_1, l_2, . . . , l_{j−1} denote the incoming LLR’s along j − 1 edges, and l_0 the LLR corresponding to the channel evidence, then the outgoing LLR l_out along the jth edge is merely the MAP estimate of the underlying binary random variable given j independent estimates of it, and is given by

    l_{out} = l_0 + \sum_{i=1}^{j-1} l_i.    (1.5)
`
At a check node, the situation is similar, though the update rule is more complicated. If l_1, l_2, . . . , l_{k−1} denote the incoming LLR’s at a check node of degree k, then the outgoing LLR l_out along the kth edge corresponds to the pdf of the binary sum of k − 1 independent random variables, and works out to be

    \tanh(l_{out}/2) = \prod_{i=1}^{k-1} \tanh(l_i/2).    (1.6)
`
`(For a derivation of eqs. (1.5) and (1.6), see [42, Section 3.2].)
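The two update rules translate directly into code. A minimal sketch of our own (numerical safeguards, such as clipping the tanh product away from ±1, are omitted):

    import math

    def variable_node_update(l0, incoming):
        """Eq. (1.5): channel LLR plus the LLRs from the other edges."""
        return l0 + sum(incoming)

    def check_node_update(incoming):
        """Eq. (1.6): tanh(l_out / 2) = product of tanh(l_i / 2)."""
        prod = 1.0
        for l in incoming:
            prod *= math.tanh(l / 2.0)
        return 2.0 * math.atanh(prod)

    print(variable_node_update(0.5, [1.2, -0.3]))  # 1.4
    print(check_node_update([1.2, -0.3]))          # about -0.16: a weak estimate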
`
`Given these update rules, we only need a schedule for updating the various mes-
`
`sages to complete the description of the decoding