Graph-based Codes and Iterative Decoding

Thesis by
Aamod Khandekar

In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy

California Institute of Technology
Pasadena, California

2002
(Submitted June 10, 2002)
© 2002
Aamod Khandekar
All Rights Reserved
Acknowledgements

I would like to begin by expressing my heartfelt gratitude to my advisor, Prof. Robert McEliece, for his active guidance and support throughout my stay at Caltech. His infectious enthusiasm and his many insights have been invaluable for the progress of my research career.

I would like to thank Prof. Robert McEliece, Prof. P. P. Vaidyanathan, Prof. Jehoshua (Shuki) Bruck, Prof. John Preskill, Prof. Babak Hassibi and Dr. Dariush Divsalar for serving on my candidacy and defense committees, and for their advice on this thesis.

I am grateful to all the current and former occupants of Moore 155, and to current and former residents of Braun graduate house, for making my stay at Caltech a very pleasant one.
Abstract

The field of error correcting codes was revolutionized by the introduction of turbo codes [7] in 1993. These codes demonstrated dramatic performance improvements over any previously known codes, with significantly lower complexity. Since then, much progress has been made towards understanding the performance of these codes, as well as in using this understanding to design even better codes.

This thesis takes a few more steps in both these directions. We develop a new technique, called the typical set bound, for analyzing the asymptotic performance of code ensembles based on their weight enumerators. This technique yields very tight bounds on the maximum-likelihood decoding threshold of code ensembles, and is powerful enough to reproduce Shannon's noisy coding theorem for the class of binary-input symmetric channels.

We also introduce a new class of codes called irregular repeat-accumulate (IRA) codes, which are adapted from the previously known class of repeat-accumulate (RA) codes. These codes are competitive in terms of decoding performance with the class of irregular low-density parity-check (LDPC) codes, which are arguably the best class of codes known today, at least for long block lengths. In addition, IRA codes have a significant advantage over irregular LDPC codes in terms of encoding complexity.

We also derive an analytical bound on the iterative decoding thresholds of code ensembles on general binary-input symmetric channels, an area in which theoretical results are currently lacking.
Contents

Acknowledgements

Abstract

1 Introduction
  1.1 Some Basic Concepts
    1.1.1 Channel Models
    1.1.2 Codes and Code Ensembles
    1.1.3 Decoding Algorithms
  1.2 Some Graphical Code Ensembles
    1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)
    1.2.2 Serial Concatenation of Convolutional Codes (SCCC)
    1.2.3 Codes Defined on Tanner Graphs
    1.2.4 Decoding on Tanner Graphs
    1.2.5 Low-Density Parity-Check (LDPC) Codes
    1.2.6 Repeat Accumulate (RA) Codes
  1.3 Density Evolution
    1.3.1 Density Evolution on the BEC
  1.4 Thesis Outline

2 The Typical Set Bound
  2.1 The Union Bound
  2.2 The Typical Set Decoder
  2.3 The Typical Set Bound
    2.3.1 Properties of K(δ)
    2.3.2 The Main Theorem
  2.4 The Typical Set Bound for Specific Channel Models
    2.4.1 The Binary Symmetric Channel
    2.4.2 The Binary Erasure Channel
  2.5 The Typical Set Bound for Specific Code Ensembles
    2.5.1 The Ensemble of Random Linear Codes
    2.5.2 LDPC Codes
    2.5.3 Cycle Codes
    2.5.4 RA Codes
  2.6 Generalization to Arbitrary BISC's

3 Irregular Repeat-Accumulate Codes
  3.1 Irregular LDPC Codes: Definition
  3.2 Irregular LDPC Codes on the BEC
  3.3 IRA Codes: Definition and Notation
  3.4 IRA Codes on the BEC
    3.4.1 Fixed Point Analysis of Iterative Decoding
    3.4.2 Capacity Achieving Sequences of Degree Distributions
    3.4.3 Some Numerical Results
  3.5 IRA Codes on the BIAGN Channel
    3.5.1 Gaussian Approximation
    3.5.2 Numerical Results
  3.6 Complexity Analysis
  3.7 Conclusions

4 A Lower Bound on Iterative Decoding Thresholds for General BISC's
  4.1 Introduction
    4.1.1 The Consistency Condition
    4.1.2 The Stability Condition
  4.2 The Main Result
  4.3 Conclusions

5 IRA Codes on Non-Binary Channels
  5.1 The 2-D Gaussian Channel
  5.2 The Binary Adder Channel
  5.3 Conclusions

6 Conclusions

A Miscellaneous Derivations for Chapter 2
B Miscellaneous Derivations for Chapter 3
C Miscellaneous Derivations for Chapter 4

Bibliography
List of Figures

1.1 A canonical communication system.
1.2 Parallel concatenation of two convolutional codes, connected through a random interleaver.
1.3 Serial concatenation of two convolutional codes, connected through a random interleaver.
1.4 A repeat-accumulate (RA) code.
1.5 A small Tanner graph.
1.6 The Tanner graph of an RA code.
2.1 The function Kp(δ) for the BSC.
2.2 The function Kp(δ) for the BEC.
2.3 The asymptotic spectral shape of the ensemble of rate 1/3 random linear codes.
2.4 The asymptotic spectral shape of the ensemble of (4, 8) LDPC codes.
2.5 The asymptotic spectral shape of the ensemble of q = 5 RA codes.
3.1 The Tanner graph of an irregular LDPC code.
3.2 The Tanner graph of an IRA code.
3.3 Performance of rate 1/2 IRA codes on the BIAGN channel.
3.4 Variation of decoded BER with the number of iterations.
4.1 BIAGN channel thresholds of codes optimized for the BEC.
4.2 BSC thresholds of codes optimized for the BEC.
5.1 Performance of an IRA code on the 2-D Gaussian channel with 8-PSK modulation.
5.2 Performance of an IRA code on the 2-D Gaussian channel with 16-QAM modulation.
5.3 Graphical representation of a BAC coding scheme.
List of Tables

2.1 BSC and BEC typical set thresholds for LDPC codes.
2.2 BSC and BEC typical set thresholds for RA codes.
2.3 Some BIAGN channel typical set thresholds.
3.1 BEC thresholds for some ensembles of IRA codes.
3.2 Some good degree sequences for rate 1/3 IRA codes on the BIAGN channel.
3.3 Some good degree sequences for rate 1/2 IRA codes on the BIAGN channel.
4.1 Comparison between capacity and threshold achieved by codes optimized for the BEC.
Chapter 1 Introduction

In his seminal paper in 1948, Shannon [45] derived the fundamental limits on the rate of communication on a noisy channel. Shannon's "noisy coding theorem" states that every channel has a capacity C, which is the highest rate (in bits per channel use) at which reliable communication is possible. Shannon also showed that for a long enough block length, almost any block code of rate R < C, with optimal decoding, provides reliable communication over this channel. This scheme, however, is not a practical one, since both encoding and decoding are prohibitively expensive.

Ever since Shannon proved his noisy coding theorem, the construction of practical capacity-achieving schemes has been the supreme goal of coding theory. The classical approaches to this problem included algebraic block codes and convolutional codes. The field was, however, revolutionized by the introduction of turbo codes by Berrou, Glavieux, and Thitimajshima [7]. The performance of turbo codes was much closer to capacity than that of any previous codes, and with significantly lower complexity.

The power of turbo codes comes not only from the code construction, but also from the iterative decoding algorithm used. The code construction consists of simple blocks connected by a pseudorandom interleaver. The interleaver introduces enough randomness into the code to ensure good performance, yet keeps enough structure to allow simple encoding and decoding algorithms. The invention of turbo codes led to an explosion of interest in the field of codes on graphs and iterative decoding, which led among other things to the rediscovery of low-density parity-check (LDPC) codes [17, 37], and the invention of repeat-accumulate (RA) codes [15].

While the turbo-decoding algorithm worked extremely well in practice, there was very little theory available to explain its dramatic performance. It was soon discovered that the turbo-decoding algorithm was an instance of Pearl's belief propagation algorithm over loopy graphs [39], and several frameworks were developed to formalize this notion [39, 4, 31]. The belief propagation algorithm is known to terminate in finite time in the case of non-loopy graphs (trees) and is optimal in this case. Its behavior on general graphs was, however, not known. Some results in this direction were presented in [2, 51, 52]. Luby et al. [33, 34], followed by Richardson and Urbanke [42], showed how LDPC codes could be approximated by the cycle-free case, and were able to prove asymptotic results about their iterative decoding performance. Luby et al. also introduced the concept of irregularity, which seems to provide hope of operating arbitrarily close to channel capacity in a practical manner, on a wide class of channel models. More recently, connections have been discovered between coding theory and statistical physics [54, 55, 5], which show some hope of providing insight into the general problem of decoding on loopy graphs.

All the codes mentioned in the preceding paragraph are so-called "graphical code ensembles." Most of this thesis deals with the analysis and design of such code ensembles, and hence we devote the rest of this chapter to some basic background material concerning them.
1.1 Some Basic Concepts

1.1.1 Channel Models

A canonical communication system is depicted in Figure 1.1. The objective is to communicate an input string U across a "noisy" channel. The encoder converts this string into another string X of symbols over the channel's input alphabet. A string Y is seen at the other end of the channel, which is a non-deterministic function of the channel input X, and the decoder tries to reconstruct the input string U based on the knowledge of Y. To analyze such a system, we need to have a model for the channel. Typically, we assume that the output string Y of the channel has the same length as the input string X, and depends on X via a conditional probability density function (pdf) pY|X(y|x). This is still an extremely general model and we therefore define several special cases.

Figure 1.1: A canonical communication system.
Definition 1.1 A channel is called memoryless if the channel output at any time instant depends only on the input at that time instant, i.e., if y = y1 y2 . . . yn and x = x1 x2 . . . xn, then $p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i)$. In this case, the channel is completely described by its input and output alphabets, and the conditional pdf pY|X(y|x) for one time instant.
Definition 1.2 (Capacity) Let the input to a memoryless channel be generated by independent and identically distributed (iid) copies of a random variable X. Then the output of the channel will be iid copies of some random variable Y whose pdf can be computed. Clearly, I(X; Y), the mutual information between the random variables X and Y (see [10] for a definition), is a function of the pdf of X. The capacity of the memoryless channel is then defined as $\sup_{p_X(x)} I(X; Y)$. The motivation behind this definition will be clear in Section 1.1.2.
Example 1.1 A very commonly used channel model is the additive Gaussian noise channel. This channel has R as its input and output alphabets, and is parametrized by a non-negative real number σ. The channel output Y is given by X + N, where X is the channel input and N is a Gaussian random variable with mean 0 and variance σ². The conditional pdf pY|X(y|x) is therefore a Gaussian pdf with mean x and variance σ².
Definition 1.3 A binary input channel is merely a channel with a binary input alphabet. We will interchangeably use the sets {0, 1} and {+1, −1} for the input alphabet, with 0 mapping to +1 and 1 to −1.

Example 1.2 (The Z-Channel) This is a binary input channel with a parameter p and output alphabet {0, 1}, in which a 0 is always received as a 0, but a 1 could be received as a 0 with probability p. That is, pY|X(0|0) = 1, pY|X(1|0) = 0, pY|X(0|1) = p, and pY|X(1|1) = 1 − p.
Suppose we want to compute the input distribution of a binary-input channel conditioned on the knowledge of the received value y, i.e., we want to compute the a posteriori probabilities Pr(X = 0|Y = y) and Pr(X = 1|Y = y), which sum to 1. This knowledge can be efficiently packaged into one number, for example their ratio. By Bayes' rule, we have

$$\frac{\Pr(X = 0 \mid Y = y)}{\Pr(X = 1 \mid Y = y)} = \frac{p_{Y|X}(y|0)}{p_{Y|X}(y|1)} \cdot \frac{\Pr(X = 0)}{\Pr(X = 1)}. \qquad (1.1)$$

Hence the quantity pY|X(y|0)/pY|X(y|1) is a sufficient statistic for estimating the input to the channel. So, of course, is any invertible function of it. This leads us to the following definition.
Definition 1.4 The quantity pY|X(y|0)/pY|X(y|1) corresponding to the output y of a binary-input channel is called its likelihood ratio. Its logarithm, log(pY|X(y|0)/pY|X(y|1)), is called the log-likelihood ratio (LLR).

Notice that there does not have to be an explicit channel for a (log-)likelihood ratio to be defined. It is well defined in the context of a noisy (probabilistic) estimate of a binary random variable.
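As a concrete illustration of eq. (1.1) and of Definition 1.4, the following short Python sketch computes the likelihood ratio, the LLR, and the posterior ratio for the Z-channel of Example 1.2; the parameter value and the uniform prior are illustrative choices, not values used elsewhere in this thesis.

```python
# Sketch: likelihood ratio, LLR, and posterior ratio for the Z-channel of
# Example 1.2. The parameter value and the uniform prior are illustrative only.
import math

p = 0.2  # probability that a transmitted 1 is received as a 0

def p_y_given_x(y, x):
    """Conditional pmf p_{Y|X}(y|x) of the Z-channel."""
    if x == 0:
        return 1.0 if y == 0 else 0.0
    return p if y == 0 else 1.0 - p

for y in (0, 1):
    lr = p_y_given_x(y, 0) / p_y_given_x(y, 1)       # likelihood ratio
    llr = math.log(lr) if lr > 0 else float("-inf")  # log-likelihood ratio
    post_ratio = lr * 1.0                            # eq. (1.1) with a uniform prior
    print(f"y={y}:  LR={lr:.3f}  LLR={llr:.3f}  posterior ratio={post_ratio:.3f}")
```

For y = 1 the likelihood ratio is 0 (a 0 can never be received when a 0 is sent over the Z-channel's complementary transition), so the LLR is −∞, i.e., certainty that a 1 was transmitted.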
Definition 1.5 A binary input symmetric channel (BISC) is a memoryless binary input channel with R as its output alphabet, satisfying pY|X(y|0) = pY|X(−y|1). Thus a BISC is completely described by the conditional pdf pY|X(y|0).

In the BISC case, by symmetry, the optimizing pdf in Definition 1.2 is uniform, i.e., (1/2, 1/2), and hence the capacity computation is much simplified.

The BISC assumption leads to many such simplifications in analysis and design, and most of the work presented in this thesis will be for this case. Also, many natural channel models do fall under this class. We present some of the most important ones here.
Example 1.3 (The binary symmetric channel (BSC)) This is a binary-input, binary-output channel with parameter p. To view it as a BISC, it is convenient to let the input and output alphabets be {+1, −1}. Then the output is equal to the input with probability 1 − p, and is the negative of the input with probability p. p is called the crossover probability of the channel. We will omit writing the conditional pdf explicitly. The capacity of this channel is given by 1 − H(p), where H(·) is the entropy function.

Example 1.4 (The binary erasure channel (BEC)) This channel, with parameter p, has input alphabet {+1, −1} and output alphabet {+1, 0, −1}. The output symbol 0 is also called an erasure. The output is equal to the input with probability 1 − p and is 0 with probability p. p is called the probability of erasure. This channel is arguably the simplest nontrivial channel model, and will be one of the focal points of this thesis. The capacity of this channel is 1 − p.
Example 1.5 (The binary input additive Gaussian noise (BIAGN) channel) This is the additive Gaussian noise channel restricted to inputs +1 and −1. The expression for the capacity of a general BISC is derived in Chapter 2, and is given by eq. (2.12). For the BIAGN channel, this expression simplifies to

$$C = 1 - \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}} H\!\left(\frac{1}{1 + e^{2x/\sigma^2}}\right) e^{-\frac{(x-1)^2}{2\sigma^2}}\, dx, \qquad (1.2)$$

where H(·) is again the entropy function.
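The capacity expressions in Examples 1.3–1.5 are easy to evaluate numerically. The sketch below computes 1 − H(p) for the BSC, 1 − p for the BEC, and approximates the integral in eq. (1.2) for the BIAGN channel with a simple trapezoidal rule; the parameter values and the quadrature scheme are illustrative choices only.

```python
# Sketch: capacities of the BSC, BEC, and BIAGN channel (eq. (1.2)).
# Parameter values and the quadrature scheme are illustrative choices.
import math

def H(x):
    """Binary entropy function (in bits)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def capacity_bsc(p):
    return 1.0 - H(p)

def capacity_bec(p):
    return 1.0 - p

def capacity_biagn(sigma, lo=-20.0, hi=20.0, steps=100000):
    """Numerically evaluate eq. (1.2) with a simple trapezoidal rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, steps) else 1.0
        gauss = math.exp(-(x - 1.0) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        total += w * H(1.0 / (1.0 + math.exp(2 * x / sigma ** 2))) * gauss
    return 1.0 - total * dx

print("BSC,   p = 0.11    :", round(capacity_bsc(0.11), 4))
print("BEC,   p = 0.5     :", round(capacity_bec(0.5), 4))
print("BIAGN, sigma = 1.0 :", round(capacity_biagn(1.0), 4))
```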
1.1.2 Codes and Code Ensembles

The basic idea in coding theory is to add redundancy to the transmitted information in order to help combat channel errors. Thus, in Figure 1.1, we restrict the set of strings X that can be used, so that no two legal strings are "close to" one another. In this way, a channel error is unlikely to cause confusion between two legal strings.
Definition 1.6 A block code of length n over an alphabet A is a subset of A^n, the set of n-length strings over A. The elements of this subset are called codewords.

We typically need to introduce more structure into our codes to aid in both analysis and design. Most codes used in practice are so-called linear codes, which are defined when the elements of the alphabet A form a field.

Definition 1.7 An (n, k) linear code over a field F is a k-dimensional vector subspace of F^n. k is called the dimension of the code, r = n − k its redundancy, and R = k/n its rate. A binary linear code is merely a linear code over the binary field.
Linear codes have several nice properties, for example, they look exactly the same around any codeword. That is, if C is a linear code and c ∈ C is a codeword, then the set C − c is identical to C. Also, in order to describe a linear code, we don't have to list all its elements, but merely a basis. Such a description is called a generator matrix representation.

Definition 1.8 A generator matrix for an (n, k) linear code C is a k × n matrix G whose rows form a basis for C. As u varies over the space F^k, uG varies over the set of codewords. Thus, the matrix G provides an encoding mechanism for the code.

Another useful representation of a linear code is a parity-check matrix representation.
Definition 1.9 A parity-check matrix for an (n, k) linear code C is an (n − k) × n matrix H whose rows form a basis for the space of vectors orthogonal to C. That is, H is a full rank matrix s.t. Hc = 0 ⇐⇒ c ∈ C.
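To make Definitions 1.8 and 1.9 concrete, the following sketch encodes with a generator matrix and checks codewords with a parity-check matrix over the binary field, using the (7, 4) Hamming code in systematic form as an illustrative example.

```python
# Sketch: generator- and parity-check-matrix view of the (7, 4) Hamming code.
# The specific G and H below are a standard systematic choice, used here only
# as an illustration of Definitions 1.8 and 1.9.
import itertools

G = [  # k x n generator matrix, k = 4, n = 7
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [  # (n - k) x n parity-check matrix
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(u):
    """c = uG over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(len(u))) % 2 for j in range(len(G[0]))]

def syndrome(c):
    """Hc over GF(2); the all-zero syndrome means c is a codeword."""
    return [sum(H[r][j] * c[j] for j in range(len(c))) % 2 for r in range(len(H))]

for u in itertools.product([0, 1], repeat=4):
    c = encode(list(u))
    assert syndrome(c) == [0, 0, 0]  # every uG satisfies Hc = 0

print("all 16 codewords satisfy Hc = 0")
```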
To formalize the notion of codewords being "close to" each other, we will make use of the Hamming metric.

Definition 1.10 The Hamming distance between two vectors is the number of components in which they differ. The minimum distance of a code is the smallest Hamming distance between two distinct codewords. For a linear code, this is the same as the least weight of any nonzero codeword.
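As a small illustration of Definition 1.10, the next sketch computes the minimum distance of the same (7, 4) code by brute force, both as the smallest pairwise Hamming distance and as the least weight of a nonzero codeword; for a linear code the two agree.

```python
# Sketch: minimum distance of the (7, 4) code above, computed two ways
# (Definition 1.10). Reuses the illustrative generator matrix G.
import itertools

G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(u):
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

codewords = [encode(list(u)) for u in itertools.product([0, 1], repeat=4)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

d_pairwise = min(hamming(a, b) for a, b in itertools.combinations(codewords, 2))
d_weight = min(sum(c) for c in codewords if any(c))
print(d_pairwise, d_weight)  # both equal 3 for this code
```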
Another useful notion is that of an ensemble of codes, which is used when we wish to average some quantity over a set of codes.

Definition 1.11 An ensemble of codes is a set of codes of the same length, and typically having approximately the same rates, with an associated pdf. We use the same term for sequences of such sets, often with length going to infinity.
Example 1.6 (The ensemble of random codes) To construct this ensemble, fix a length n and a rate R. Then the number of codewords should be 2^{nR}. Also fix a pdf pX(x) over the code alphabet, and pick each element of each codeword independently according to this pdf. This ensemble is known as the ensemble of random codes.

For a memoryless channel, Shannon [45] showed that if pX(x) was chosen to be the optimizing pdf in Definition 1.2, then for any rate R less than the capacity of the channel, the average probability of error for the ensemble of random codes tends to zero as n tends to infinity. He also showed that the probability of error for a code whose rate exceeded the capacity was bounded away from zero.

If pX(x) is uniform over the input alphabet, then this ensemble is very similar to the one in which we pick any code of the correct length and the correct rate with equal probability, and we use the term "random codes" for this ensemble also.
Example 1.7 (The ensemble of random linear codes) This is the set of all linear codes of some fixed rate R (and length n), each selected with equal probability. For a BISC, this ensemble is known to achieve capacity [18] (i.e., the average probability of error tends to zero as n tends to infinity for any rate less than the capacity of the channel). This is another reason why the BISC assumption is so useful. One proof of this fact will be given in Chapter 2.

This ensemble is sometimes defined by saying that every entry in the generator (or parity check) matrix is picked independently with a uniform pdf. While the ensembles so defined are not identical to the first definition, most properties for large n (in particular the weight enumerator, to be defined in Chapter 2) are indeed the same. We will use the term "random linear codes" for either description.
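The second description is easy to experiment with: draw every entry of a k × n generator matrix independently and uniformly from {0, 1}. The sketch below estimates how often such a matrix has full rank (in which case the sampled code has dimension exactly k and rate k/n); for the parameters shown this happens with high probability. The experiment is purely illustrative and not an argument used later in the thesis.

```python
# Sketch: sampling from the "random generator matrix" description of the
# ensemble of random linear codes (illustrative experiment only).
import random

def random_generator_matrix(k, n):
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

def rank_gf2(rows):
    """Rank over GF(2) by Gaussian elimination on integer bitmasks."""
    n = len(rows[0])
    masks = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    for bit in reversed(range(n)):
        pivot_idx = next((i for i, m in enumerate(masks) if (m >> bit) & 1), None)
        if pivot_idx is None:
            continue
        pivot = masks.pop(pivot_idx)
        masks = [m ^ pivot if (m >> bit) & 1 else m for m in masks]
        rank += 1
    return rank

random.seed(0)
k, n, trials = 10, 20, 1000
full_rank = sum(rank_gf2(random_generator_matrix(k, n)) == k for _ in range(trials))
print(f"fraction of full-rank G over {trials} trials: {full_rank / trials:.3f}")
```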
1.1.3 Decoding Algorithms

Given the output of a noisy channel, we need to form an estimate of the transmitted codeword. We now define notions of "optimal" decoding algorithms.

Given a received vector y, the codeword most likely to have been transmitted is the one that maximizes pX|Y(x|y). If the channel is memoryless and each of the codewords is equally likely, then this reduces to the codeword x which maximizes pY|X(y|x) (which factorizes if the channel is memoryless). This is known as the maximum likelihood (ML) estimate of the transmitted codeword.
Definition 1.12 Given a code C and a received vector y, the maximum likelihood decoder has as its output

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x} \in C}\, p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}). \qquad (1.3)$$

Clearly this decoder has the least possible (word) probability of error.
If, on the other hand, we wish to minimize the bit error probability, then we need to maximize Pr(xi = x|y) over all x.

Definition 1.13 Given a received vector y, the MAP (maximum a posteriori) decoder has as its output in the ith position

$$\hat{x}_i = \arg\max_{x \in A}\, \Pr(x_i = x \mid \mathbf{y}), \qquad (1.4)$$

where the maximization is over the input alphabet of the channel.
In the case of a binary code, the maximization is over a binary alphabet. Taking the maximum of two values is equivalent to comparing their ratio to 1, or the log of the ratio to 0. This latter quantity is nothing but the LLR. Therefore, MAP decoding in the case of a binary code is equivalent to taking the sign of the a posteriori LLR.
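The two decoders of Definitions 1.12 and 1.13 can be carried out by brute force on a small code. The sketch below does this for the BSC of Example 1.3: ML decoding picks the codeword maximizing pY|X(y|x), and bitwise MAP decoding marginalizes the posterior over codewords. The code, crossover probability, and received word are illustrative choices; practical decoders avoid enumerating all codewords.

```python
# Sketch: brute-force ML (eq. (1.3)) and bitwise-MAP (eq. (1.4)) decoding of a
# small binary code on the BSC. Code, parameter, and received word are
# illustrative only.
import math

codewords = [  # a small length-6 linear code (illustrative)
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
p = 0.1                 # BSC crossover probability
y = [1, 0, 0, 0, 1, 1]  # received word (with channel errors)

def likelihood(y, c):
    """p_{Y|X}(y|c) for the memoryless BSC."""
    return math.prod(p if yi != ci else 1 - p for yi, ci in zip(y, c))

# ML decoding: the codeword maximizing the likelihood.
ml = max(codewords, key=lambda c: likelihood(y, c))

# Bitwise MAP decoding: marginalize the (uniform-prior) posterior over codewords.
map_bits = []
for i in range(len(y)):
    p1 = sum(likelihood(y, c) for c in codewords if c[i] == 1)
    p0 = sum(likelihood(y, c) for c in codewords if c[i] == 0)
    map_bits.append(1 if p1 > p0 else 0)

print("ML :", ml)
print("MAP:", map_bits)
```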
1.2 Some Graphical Code Ensembles

In this section, we will introduce some important graphical code ensembles, which is a generic term we will use to describe all "turbo-like" codes, or codes amenable to iterative decoding. All of these codes can be viewed under a unified graphical framework. Several such frameworks have been proposed, for example, see [4], [31], and [16]. In all these cases, the iterative decoding algorithm reduces to an instance of Pearl's belief propagation algorithm [41] on loopy graphs. Such a general view will, however, not be necessary for the purposes of this thesis, and we will describe each ensemble separately with its own decoding algorithm. The specific decoding algorithm we will describe is called the sum-product algorithm, which aims to minimize the bit error probability. The idea in every case is that we use "long, random-like codes" as suggested by Shannon, made practical by the existence of a simple though suboptimal decoding algorithm.

Figure 1.2: Parallel concatenation of two convolutional codes, connected through a random interleaver (denoted by π).
1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)

These codes are also known as "parallel turbo codes." The original turbo code introduced by Berrou et al. [7] was a code of this type. The general structure is shown in Figure 1.2. As mentioned earlier, it consists of two relatively simple constituent codes, more specifically truncated binary IIR convolutional codes with a short constraint length. They are connected by an interleaver (labeled π in the figure), which is merely a pseudo-random permutation of the information bits. In the figure, there is also a systematic (information) bit stream, which could be absent. There could also be more than two constituent codes, each with its own interleaver. Here, we briefly describe the decoding algorithm in the case when there are two constituent encoders and a systematic data stream.
The aim of the sum-product algorithm is to approximate MAP decoding, as defined in Definition 1.13, or equivalently to compute the a posteriori log-likelihoods of the individual transmitted bits given the received vector. The MAP decoding algorithm for the constituent convolutional codes can be implemented with the well known forward-backward or BCJR [6] algorithm, which is feasible in this case because these codes have a short constraint length. Given an a priori estimate (or LLR) on each information bit and an LLR for each transmitted bit, the BCJR algorithm outputs the correct a posteriori LLR for each information bit.
The turbo-decoding algorithm iterates between the MAP decoders corresponding to the two constituent codes. The received values corresponding to the systematic bits are used to initialize the a priori LLR's for the information bits. One of the constituent decoders then outputs the a posteriori LLR's by running the BCJR algorithm, the idea being to use these as a priori LLR's for the other decoder. However, in order not to form short loops in the so-called "computation tree," the difference between the a posteriori and the a priori LLR's (this is known as extrinsic information) is fed to the other decoder as a priori LLR's, and the same operation is repeated over and over again. Various stopping rules are used to decide on convergence and guard against limit-cycles.
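The exchange of extrinsic information just described can be summarized in a short skeleton. In the sketch below the constituent MAP decoders are abstract function arguments (a real implementation would plug in BCJR decoders), the interleaving of the extrinsic values between the two decoders is omitted, and the stand-in decoder used to exercise the loop is a trivial placeholder; none of this is an implementation from this thesis.

```python
# Sketch: the extrinsic-information exchange of turbo decoding. The constituent
# MAP decoders are passed in as functions; the stand-in below is a trivial
# placeholder, NOT a BCJR implementation, and interleaving is omitted.
def turbo_decode(channel_llrs, decoder1, decoder2, iterations=10):
    extrinsic = [0.0] * len(channel_llrs)  # a priori LLRs for decoder 1
    post2 = list(channel_llrs)
    for _ in range(iterations):
        post1 = decoder1(channel_llrs, extrinsic)
        extrinsic = [p - a for p, a in zip(post1, extrinsic)]  # extrinsic info to decoder 2
        post2 = decoder2(channel_llrs, extrinsic)
        extrinsic = [p - a for p, a in zip(post2, extrinsic)]  # extrinsic info back to decoder 1
    return [0 if l >= 0 else 1 for l in post2]  # hard decisions (positive LLR -> bit 0)

# Trivial stand-in "decoder": just adds the a priori LLRs to the channel LLRs.
def dummy_decoder(channel_llrs, apriori):
    return [c + a for c, a in zip(channel_llrs, apriori)]

print(turbo_decode([1.1, -0.3, 2.0, -0.7], dummy_decoder, dummy_decoder, iterations=3))
```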
1.2.2 Serial Concatenation of Convolutional Codes (SCCC)

These codes are also known as "serial turbo-codes." In this case each convolutional encoder acts on the interleaved output of the previous one, instead of on the information stream directly. The general structure is shown in Figure 1.3. In this case also, the sum-product algorithm iterates between the decoders corresponding to the constituent codes. Some slight modifications are needed from the PCCC case, but the basic idea is the same, and we will omit giving a detailed description here.

Figure 1.3: Serial concatenation of two convolutional codes, connected through a random interleaver.
One example of the SCCC case is the ensemble of repeat-accumulate (RA) codes introduced in [15]. An RA code is shown in Figure 1.4. It is the concatenation of two particularly simple constituent codes, an outer "repeat by q" code and an inner "accumulate" code. This simple structure was intended to make analysis possible, but the performance of these codes under iterative decoding is surprisingly good, especially considering that the constituent decoders have extremely low complexity. These codes play an important role in this thesis, particularly in Chapter 3, and we will give an alternative description of them in Section 1.2.3.

Figure 1.4: A repeat-accumulate (RA) code.
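A minimal sketch of the RA encoder just described: repeat each information bit q times, permute the result with a pseudo-random interleaver, and pass it through the rate-1 accumulator 1/(1 + D), i.e., a running mod-2 sum. The block length, q, and the particular interleaver are illustrative choices.

```python
# Sketch: a repeat-accumulate (RA) encoder -- repeat by q, interleave, accumulate.
# Block length, q, and the interleaver are illustrative choices.
import random

def ra_encode(info_bits, q, interleaver):
    repeated = [b for b in info_bits for _ in range(q)]  # outer "repeat by q" code
    permuted = [repeated[i] for i in interleaver]        # pseudo-random interleaver
    out, acc = [], 0
    for b in permuted:                                   # inner 1/(1+D) accumulator:
        acc ^= b                                         # each output is the running
        out.append(acc)                                  # mod-2 sum of the inputs so far
    return out

random.seed(1)
k, q = 4, 3
interleaver = list(range(k * q))
random.shuffle(interleaver)

info = [1, 0, 1, 1]
print(ra_encode(info, q, interleaver))
```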
1.2.3 Codes Defined on Tanner Graphs

A Tanner graph is a general way of representing any linear code. An example is shown in Figure 1.5. A Tanner graph has two kinds of nodes, called variable nodes, represented by hollow circles in the figure, and check nodes, represented by filled circles. The graph is bipartite between these two types of nodes, i.e., every edge has a variable node at one end and a check node at the other end. The variable nodes represent actual variables, for example, elements of the codeword. The check nodes represent constraints among these variables. All the graphs we will look at will be for binary linear codes, in which case the variable nodes represent binary variables, and a check node says that the binary sum of all its neighbors is 0. Clearly, any linear code may be represented in this manner, by directly transcribing its parity-check matrix. In fact, it will have many possible representations, because it has many possible parity-check matrices, and also because we can add dummy or state variables to the graph, as we shall soon see in the case of RA codes.

Figure 1.5: A small Tanner graph.
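The transcription from a parity-check matrix to a Tanner graph is mechanical: each column of H becomes a variable node, each row a check node, and each 1 an edge. The sketch below builds the two adjacency lists for a small illustrative H (not a matrix taken from this thesis).

```python
# Sketch: building the Tanner graph of a binary linear code directly from a
# parity-check matrix H (illustrative matrix; each 1 in H is one edge).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

n_checks, n_vars = len(H), len(H[0])
check_neighbors = [[j for j in range(n_vars) if H[r][j]] for r in range(n_checks)]
var_neighbors = [[r for r in range(n_checks) if H[r][j]] for j in range(n_vars)]

for r, nbrs in enumerate(check_neighbors):
    print(f"check {r}: binary sum of variables {nbrs} is 0")
for j, nbrs in enumerate(var_neighbors):
    print(f"variable {j}: participates in checks {nbrs}")
```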
1.2.4 Decoding on Tanner Graphs

The sum-product algorithm takes a particularly elegant form on a Tanner graph. It was first described in this case by Gallager [17] in the context of LDPC codes. It is a completely distributed algorithm, with each node acting as an independent entity, communicating with other nodes through the edges. The message sent by a variable node to a check node, say in LLR form, is its estimate of its own value. The message sent by a check node to a variable node is its estimate of the variable node's value. The update rules at the nodes are essentially MAP estimators, given that the incoming messages along the different edges are independent. Again, in order not to form short cycles in the computation tree, the output along any edge is based only on input from the other edges. At a variable node of degree j, if l1, l2, . . . , lj−1 denote the incoming LLR's along j − 1 edges, and l0 the LLR corresponding to the channel evidence, then the outgoing LLR lout along the jth edge is merely the MAP estimate of the underlying binary random variable given j independent estimates of it, and is given by

$$l_{\mathrm{out}} = l_0 + \sum_{i=1}^{j-1} l_i. \qquad (1.5)$$
At a check node, the situation is similar, though the update rule is more complicated. If l1, l2, . . . , lk−1 denote the incoming LLR's at a check node of degree k, then the outgoing LLR lout along the kth edge corresponds to the pdf of the binary sum of k − 1 independent random variables, and works out to be

$$\tanh(l_{\mathrm{out}}/2) = \prod_{i=1}^{k-1} \tanh(l_i/2). \qquad (1.6)$$

(For a derivation of eqs. (1.5) and (1.6), see [42, Section 3.2].)
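The update rules (1.5) and (1.6) translate directly into code. The sketch below implements them as stand-alone functions on lists of incoming LLR's; the clamping in the check-node rule is a numerical guard added for the sketch, not part of eq. (1.6), and the example LLR values are illustrative.

```python
# Sketch: the sum-product node update rules of eqs. (1.5) and (1.6).
import math

def variable_node_update(l0, incoming):
    """Eq. (1.5): outgoing LLR = channel LLR plus the other incoming LLRs."""
    return l0 + sum(incoming)

def check_node_update(incoming):
    """Eq. (1.6): tanh(l_out/2) = product of tanh(l_i/2) over the other edges."""
    prod = 1.0
    for l in incoming:
        prod *= math.tanh(l / 2.0)
    # Clamp to keep atanh finite; a numerical guard only, not part of eq. (1.6).
    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
    return 2.0 * math.atanh(prod)

# Example: a degree-3 variable node and a degree-3 check node (illustrative LLRs).
print(variable_node_update(l0=0.8, incoming=[1.2, -0.4]))  # 0.8 + 1.2 + (-0.4)
print(check_node_update(incoming=[1.2, -0.4]))
```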
Given these update rules, we only need a schedule for updating the various messages to complete the description of the decoding algorithm.