`
`Thesis by
`
`Aamod Khandekar
`
`In Partial Fulfillment of the Requirements
`
`for the Degree of
`
`Doctor of Philosophy
`
`California Institute of Technology
`
`Pasadena, California
`
`2002
`
`(Submitted June 10, 2002)
`
`
© 2002
`
`Aamod Khandekar
`
`All Rights Reserved
`
`
`
`
`Acknowledgements
`
`I would like to begin by expressing my heartfelt gratitude to my advisor, Prof. Robert
`
`McEliece, for his active guidance and support throughout my stay at Caltech. His
`
`infectious enthusiasm and his many insights have been invaluable for the progress of
`
`my research career.
`
`I would like to thank Prof. Robert McEliece, Prof. P. P. Vaidyanathan, Prof.
`
`Jehoshua (Shuki) Bruck, Prof. John Preskill, Prof. Babak Hassibi and Dr. Dariush
`
`Divsalar for serving on my candidacy and defense committees, and for their advice
`
`on this thesis.
`
`I am grateful to all the current and former occupants of Moore 155, and to current
`
`and former residents of Braun graduate house, for making my stay at Caltech a very
`
`pleasant one.
`
`
`
`
`Abstract
`
`The field of error correcting codes was revolutionized by the introduction of turbo
`
`codes [7] in 1993. These codes demonstrated dramatic performance improvements
`
`over any previously known codes, with significantly lower complexity. Since then,
`
`much progress has been made towards understanding the performance of these codes,
`
`as well as in using this understanding to design even better codes.
`
`This thesis takes a few more steps in both these directions. We develop a new
`
`technique, called the typical set bound, for analyzing the asymptotic performance
`
`of code ensembles based on their weight enumerators. This technique yields very
`
`tight bounds on the maximum-likelihood decoding threshold of code ensembles, and
`
`is powerful enough to reproduce Shannon’s noisy coding theorem for the class of
`
`binary-input symmetric channels.
`
`We also introduce a new class of codes called irregular repeat-accumulate (IRA)
`
`codes, which are adapted from the previously known class of repeat-accumulate (RA)
`
`codes. These codes are competitive in terms of decoding performance with the class
`
`of irregular low-density parity-check (LDPC) codes, which are arguably the best class
`
`of codes known today, at least for long block lengths. In addition, IRA codes have a
`
`significant advantage over irregular LDPC codes in terms of encoding complexity.
`
We also derive an analytical lower bound on the iterative decoding thresholds of code
`
`ensembles on general binary-input symmetric channels, an area in which theoretical
`
`results are currently lacking.
`
`
`
Contents

Acknowledgements

Abstract

1 Introduction
   1.1 Some Basic Concepts
       1.1.1 Channel Models
       1.1.2 Codes and Code Ensembles
       1.1.3 Decoding Algorithms
   1.2 Some Graphical Code Ensembles
       1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)
       1.2.2 Serial Concatenation of Convolutional Codes (SCCC)
       1.2.3 Codes Defined on Tanner Graphs
       1.2.4 Decoding on Tanner Graphs
       1.2.5 Low-Density Parity-Check (LDPC) Codes
       1.2.6 Repeat-Accumulate (RA) Codes
   1.3 Density Evolution
       1.3.1 Density Evolution on the BEC
   1.4 Thesis Outline

2 The Typical Set Bound
   2.1 The Union Bound
   2.2 The Typical Set Decoder
   2.3 The Typical Set Bound
       2.3.1 Properties of K(δ)
       2.3.2 The Main Theorem
   2.4 The Typical Set Bound for Specific Channel Models
       2.4.1 The Binary Symmetric Channel
       2.4.2 The Binary Erasure Channel
   2.5 The Typical Set Bound for Specific Code Ensembles
       2.5.1 The Ensemble of Random Linear Codes
       2.5.2 LDPC Codes
       2.5.3 Cycle Codes
       2.5.4 RA Codes
   2.6 Generalization to Arbitrary BISC’s

3 Irregular Repeat-Accumulate Codes
   3.1 Irregular LDPC Codes: Definition
   3.2 Irregular LDPC Codes on the BEC
   3.3 IRA Codes: Definition and Notation
   3.4 IRA Codes on the BEC
       3.4.1 Fixed Point Analysis of Iterative Decoding
       3.4.2 Capacity Achieving Sequences of Degree Distributions
       3.4.3 Some Numerical Results
   3.5 IRA Codes on the BIAGN Channel
       3.5.1 Gaussian Approximation
       3.5.2 Numerical Results
   3.6 Complexity Analysis
   3.7 Conclusions

4 A Lower Bound on Iterative Decoding Thresholds for General BISC’s
   4.1 Introduction
       4.1.1 The Consistency Condition
       4.1.2 The Stability Condition
   4.2 The Main Result
   4.3 Conclusions

5 IRA Codes on Non-Binary Channels
   5.1 The 2-D Gaussian Channel
   5.2 The Binary Adder Channel
   5.3 Conclusions

6 Conclusions

A Miscellaneous Derivations for Chapter 2

B Miscellaneous Derivations for Chapter 3

C Miscellaneous Derivations for Chapter 4

Bibliography
`
`
`
List of Figures

1.1 A canonical communication system.
1.2 Parallel concatenation of two convolutional codes, connected through a random interleaver.
1.3 Serial concatenation of two convolutional codes, connected through a random interleaver.
1.4 A repeat-accumulate (RA) code.
1.5 A small Tanner graph.
1.6 The Tanner graph of an RA code.
2.1 The function K_p(δ) for the BSC.
2.2 The function K_p(δ) for the BEC.
2.3 The asymptotic spectral shape of the ensemble of rate 1/3 random linear codes.
2.4 The asymptotic spectral shape of the ensemble of (4, 8) LDPC codes.
2.5 The asymptotic spectral shape of the ensemble of q = 5 RA codes.
3.1 The Tanner graph of an irregular LDPC code.
3.2 The Tanner graph of an IRA code.
3.3 Performance of rate 1/2 IRA codes on the BIAGN channel.
3.4 Variation of decoded BER with the number of iterations.
4.1 BIAGN channel thresholds of codes optimized for the BEC.
4.2 BSC thresholds of codes optimized for the BEC.
5.1 Performance of an IRA code on the 2-D Gaussian channel with 8-PSK modulation.
5.2 Performance of an IRA code on the 2-D Gaussian channel with 16-QAM modulation.
5.3 Graphical representation of a BAC coding scheme.
`
`
`
List of Tables

2.1 BSC and BEC typical set thresholds for LDPC codes.
2.2 BSC and BEC typical set thresholds for RA codes.
2.3 Some BIAGN channel typical set thresholds.
3.1 BEC thresholds for some ensembles of IRA codes.
3.2 Some good degree sequences for rate 1/3 IRA codes on the BIAGN channel.
3.3 Some good degree sequences for rate 1/2 IRA codes on the BIAGN channel.
4.1 Comparison between capacity and threshold achieved by codes optimized for the BEC.
`
`
`
`1
`
`Chapter 1 Introduction
`
`In his seminal paper in 1948, Shannon [45] derived the fundamental limits on the rate
`
`of communication on a noisy channel. Shannon’s “noisy coding theorem” states that
`
`every channel has a capacity C, which is the highest rate (in bits per channel use)
`
`at which reliable communication is possible. Shannon also showed that for a long
`
`enough block length, almost any block code of rate R < C, with optimal decoding,
`
`provides reliable communication over this channel. This scheme, however, is not a
`
`practical one, since both encoding and decoding are prohibitively expensive.
`
`Ever since Shannon proved his noisy coding theorem, the construction of practical
`
`capacity-achieving schemes has been the supreme goal of coding theory. The classical
`
`approaches to this problem included algebraic block codes and convolutional codes.
`
`The field was, however, revolutionized by the introduction of turbo codes by Berrou,
`
`Glavieux, and Thitimajshima [7]. The performance of turbo codes was much closer
`
`to capacity than that of any previous codes, and with significantly lower complexity.
`
`The power of turbo codes comes not only from the code construction, but also
`
`from the iterative decoding algorithm used. The code construction consists of simple
`
`blocks connected by a pseudorandom interleaver. The interleaver introduces enough
`
`randomness into the code to ensure good performance, yet keeps enough structure
`
`to allow simple encoding and decoding algorithms. The invention of turbo codes led
`
`to an explosion of interest in the field of codes on graphs and iterative decoding,
`
`which led among other things to the rediscovery of low-density parity-check (LDPC)
`
`codes [17, 37], and the invention of repeat-accumulate (RA) codes [15].
`
`While the turbo-decoding algorithm worked extremely well in practice, there was
`
`very little theory available to explain its dramatic performance. It was soon discov-
`
`
`
`2
`
`ered that the turbo-decoding algorithm was an instance of Pearl’s belief propagation
`
`algorithm over loopy graphs [39], and several frameworks were developed to formalize
`
`this notion [39, 4, 31]. The belief propagation algorithm is known to terminate in
`
`finite time in the case of non-loopy graphs (trees) and is optimal in this case. Its
`
`behavior on general graphs was, however, not known. Some results in this direction
`
`were presented in [2, 51, 52]. Luby et al. [33, 34], followed by Richardson and Ur-
`
`banke [42], showed how LDPC codes could be approximated by the cycle-free case,
`
`and were able to prove asymptotic results about their iterative decoding performance.
`
`Luby et al. also introduced the concept of irregularity, which seems to provide hope
`
`of operating arbitrarily close to channel capacity in a practical manner, on a wide
`
`class of channel models. More recently, connections have been discovered between
`
`coding theory and statistical physics [54, 55, 5], which show some hope of providing
`
`insight into the general problem of decoding on loopy graphs.
`
`All the codes mentioned in the preceding paragraph are so-called “graphical code
`
`ensembles.” Most of this thesis deals with the analysis and design of such code
`
`ensembles, and hence we devote the rest of this chapter to some basic background
`
`material concerning them.
`
`1.1 Some Basic Concepts
`
`1.1.1 Channel Models
`
`A canonical communication system is depicted in Figure 1.1. The objective is to
`
communicate an input string U across a “noisy” channel. The encoder converts this
`
`string into another string X of symbols over the channel’s input alphabet. A string
`
`Y is seen at the other end of the channel, which is a non-deterministic function of
`
`the channel input X, and the decoder tries to reconstruct the input string U based
`
`on the knowledge of Y. To analyze such a system, we need to have a model for the
`
`
`
`3
`
`Encoder
`
`Channel
`
`U
`
`X
`
`Y
`
`Decoder
`
`Û
`
`Figure 1.1: A canonical communication system.
`
`channel. Typically, we assume that the output string Y of the channel has the same
`
`length as the input string X, and depends on X via a conditional probability density
function (pdf) p_{Y|X}(y|x). This is still an extremely general model and we therefore
`
`define several special cases.
`
Definition 1.1 A channel is called memoryless if the channel output at any time instant depends only on the input at that time instant, i.e., if y = y_1 y_2 . . . y_n and x = x_1 x_2 . . . x_n, then

    p_{Y|X}(y|x) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i).

In this case, the channel is completely described by its input and output alphabets, and the conditional pdf p_{Y|X}(y|x) for one time instant.
`
`Definition 1.2 (Capacity) Let the input to a memoryless channel be generated by
`
`independent and identically distributed (iid) copies of a random variable X. Then
`
`the output of the channel will be iid copies of some random variable Y whose pdf
`
can be computed. Clearly, I(X; Y), the mutual information between the random variables X and Y (see [10] for a definition), is a function of the pdf of X. The capacity of the memoryless channel is then defined as sup_{p_X(x)} I(X; Y). The motivation behind this
`
`definition will be clear in Section 1.1.2.
`
Example 1.1 A very commonly used channel model is the additive Gaussian noise channel. This channel has ℝ as its input and output alphabets, and is parametrized by a non-negative real number σ. The channel output Y is given by X + N, where X is the channel input and N is a Gaussian random variable with mean 0 and variance σ². The conditional pdf p_{Y|X}(y|x) is therefore a Gaussian pdf with mean x and variance σ².
`
`4
`
`Definition 1.3 A binary input channel is merely a channel with a binary input alpha-
`bet. We will interchangeably use the sets {0, 1} and {+1,−1} for the input alphabet
`
`with 0 mapping to +1 and 1 to -1.
`
`Example 1.2 (The Z-Channel) This is a binary input channel with a parameter
`p and output alphabet {0, 1}, in which a 0 is always received as a 0, but a 1 could be
received as a 0 with probability p. That is, p_{Y|X}(0|0) = 1, p_{Y|X}(1|0) = 0, p_{Y|X}(0|1) = p, and p_{Y|X}(1|1) = 1 − p.
`
`Suppose we want to compute the input distribution of a binary-input channel
`
`conditioned on the knowledge of the received value y, i.e., we want to compute the
`a posteriori probabilities Pr(X = 0|Y = y) and Pr(X = 1|Y = y), which sum to 1.
`
`This knowledge can be efficiently packaged into one number, for example their ratio.
`
By Bayes’ rule, we have

    \frac{\Pr(X = 0 | Y = y)}{\Pr(X = 1 | Y = y)} = \frac{p_{Y|X}(y|0)}{p_{Y|X}(y|1)} \cdot \frac{\Pr(X = 0)}{\Pr(X = 1)}.    (1.1)
`
Hence the quantity p_{Y|X}(y|0)/p_{Y|X}(y|1) is a sufficient statistic for estimating the input to the channel. So, of course, is any invertible function of it. This leads us to the following definition.

Definition 1.4 The quantity p_{Y|X}(y|0)/p_{Y|X}(y|1) corresponding to the output y of a binary-input channel is called its likelihood ratio. Its logarithm log(p_{Y|X}(y|0)/p_{Y|X}(y|1)) is called the log-likelihood ratio (LLR).
`
`Notice that there does not have to be an explicit channel for a (log)-likelihood
`
`ratio to be defined. It is well defined in the context of a noisy (probabilistic) estimate
`
`of a binary random variable.
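As a small numerical illustration (our own sketch, not from the thesis; the binary symmetric channel used here is formally defined in Example 1.3 below), the LLR of a BSC output takes only two values:

    import math

    def llr(p_y_given_0, p_y_given_1):
        """Log-likelihood ratio log(p(y|0) / p(y|1)) for one channel output."""
        return math.log(p_y_given_0 / p_y_given_1)

    # BSC with crossover probability p:
    p = 0.1
    print(llr(1 - p, p))  # output 0: log((1-p)/p) = +2.197..., evidence for input 0
    print(llr(p, 1 - p))  # output 1: -2.197..., evidence for input 1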
`
`
`
`5
`
Definition 1.5 A binary input symmetric channel (BISC) is a memoryless binary input channel with ℝ as its output alphabet, satisfying p_{Y|X}(y|0) = p_{Y|X}(−y|1). Thus a BISC is completely described by the conditional pdf p_{Y|X}(y|0).
`
`In the BISC case, by symmetry, the optimizing pdf in Definition 1.2 is uniform,
`
`i.e., (1/2, 1/2), and hence the capacity computation is much simplified.
`
`The BISC assumption leads to many such simplifications in analysis and design,
`
`and most of the work presented in this thesis will be for this case. Also, many natural
`
`channel models do fall under this class. We present some of the most important ones
`
`here.
`
`Example 1.3 (The binary symmetric channel (BSC)) This is a binary-input,
`
`binary-output channel with parameter p. To view it as a BISC, it is convenient to
`let the input and output alphabets be {+1,−1}. Then the output is equal to the
`input with probability 1 − p, and is the negative of the input with probability p. p is
`
`called the crossover probability of the channel. We will omit writing the conditional
`pdf explicitly. The capacity of this channel is given by 1 − H(p), where H(·) is the
`
`entropy function.
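For example, this capacity is easy to evaluate numerically (a minimal sketch of our own, with logarithms in base 2):

    import math

    def binary_entropy(p):
        """H(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity 1 - H(p) of the BSC."""
        return 1.0 - binary_entropy(p)

    print(bsc_capacity(0.0))   # 1.0: a noiseless bit per channel use
    print(bsc_capacity(0.11))  # about 0.5
    print(bsc_capacity(0.5))   # 0.0: the output is independent of the input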
`
`Example 1.4 (The binary erasure channel (BEC)) This channel, with param-
`eter p, has input alphabet {+1,−1} and output alphabet {+1, 0,−1}. The output
`
`symbol 0 is also called an erasure. The output is equal to the input with probability
`1 − p and is 0 with probability p. p is called the probability of erasure. This channel
`
`is arguably the simplest nontrivial channel model, and will be one of the focal points
`of this thesis. The capacity of this channel is 1 − p.
`
`Example 1.5 (The binary input additive Gaussian noise (BIAGN) channel)
`
`This is the additive Gaussian noise channel restricted to inputs +1 and -1. The ex-
`
`pression for the capacity of a general BISC is derived in Chapter 2, and is given by
`
`
`
eq. (2.12). For the BIAGN channel, this expression simplifies to

    C = 1 - \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}} H\!\left(\frac{1}{1 + e^{2x/\sigma^2}}\right) e^{-(x-1)^2/(2\sigma^2)} \, dx,    (1.2)

where H(·) is again the entropy function.
`
`1.1.2 Codes and Code Ensembles
`
`The basic idea in coding theory is to add redundancy to the transmitted information
`
`in order to help combat channel errors. Thus, in Figure 1.1, we restrict the set of
`
`strings X that can be used, so that no two legal strings are “close to” one another.
`
`In this way, a channel error is unlikely to cause confusion between two legal strings.
`
`Definition 1.6 A block code of length n over an alphabet A is a subset of An, the
`set of n-length strings over A. The elements of this subset are called codewords.
`
`We typically need to introduce more structure into our codes to aid in both analysis
`
`and design. Most codes used in practice are so-called linear codes, which are defined
`when the elements of the alphabet A form a field.
`
`Definition 1.7 An (n, k) linear code over a field F is a k-dimensional vector subspace
of F^n. k is called the dimension of the code, r = n − k its redundancy, and R = k/n
`
`its rate. A binary linear code is merely a linear code over the binary field.
`
`Linear codes have several nice properties, for example, they look exactly the same
`around any codeword. That is, if C is a linear code and c ∈ C is a codeword, then
`the set C − c is identical to C. Also, in order to describe a linear code, we don’t have
`
`to list all its elements, but merely a basis. Such a description is called a generator
`
`matrix representation.
`
Definition 1.8 A generator matrix for an (n, k) linear code C is a k × n matrix G whose rows form a basis for C. As u varies over the space F^k, uG varies over the set of codewords. Thus, the matrix G provides an encoding mechanism for the code.
`of codewords. Thus, the matrix G provides an encoding mechanism for the code.
`
`Another useful representation of a linear code is a parity-check matrix representa-
`
`tion.
`
`Definition 1.9 A parity-check matrix for an (n, k) linear code C is an (n − k) × n
`matrix H whose rows form a basis for the space of vectors orthogonal to C. That is,
H is a full-rank matrix such that Hc = 0 ⇐⇒ c ∈ C.
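As a concrete illustration of Definitions 1.8 and 1.9 (our own sketch, not from the thesis), here are the systematic-form matrices G = [I | P] and H = [Pᵀ | I] of the (7, 4) binary Hamming code:

    import itertools

    # Generator and parity-check matrices of the (7, 4) binary Hamming code.
    G = [[1, 0, 0, 0, 1, 1, 0],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]
    H = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

    def encode(u):
        """Codeword uG over GF(2) (Definition 1.8)."""
        return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

    def is_codeword(c):
        """Check Hc = 0 over GF(2) (Definition 1.9)."""
        return all(sum(h[j] * c[j] for j in range(7)) % 2 == 0 for h in H)

    # Every uG satisfies all the parity checks.
    assert all(is_codeword(encode(list(u)))
               for u in itertools.product([0, 1], repeat=4))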
`
`To formalize the notion of codewords being “close to” each other, we will make
`
`use of the Hamming metric.
`
`Definition 1.10 The Hamming distance between two vectors is the number of com-
`
`ponents in which they differ. The minimum distance of a code is the smallest Ham-
`
`ming distance between two distinct codewords. For a linear code, this is the same as
`
`the least weight of any nonzero codeword.
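A small sketch of these notions (our own example): for a linear code one can verify by brute force that the minimum distance equals the least nonzero-codeword weight.

    import itertools

    def hamming_distance(a, b):
        """Number of positions in which two vectors differ."""
        return sum(x != y for x, y in zip(a, b))

    # The (3, 1) repetition code {000, 111} has minimum distance 3.
    code = [(0, 0, 0), (1, 1, 1)]
    d_min = min(hamming_distance(a, b)
                for a, b in itertools.combinations(code, 2))
    assert d_min == 3 == min(sum(c) for c in code if any(c))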
`
`Another useful notion is that of an ensemble of codes, which is used when we wish
`
`to average some quantity over a set of codes.
`
`Definition 1.11 An ensemble of codes is a set of codes of the same length, and
`
`typically having approximately the same rates, with an associated pdf. We use the
`
`same term for sequences of such sets, often with length going to infinity.
`
Example 1.6 (The ensemble of random codes) To construct this ensemble, fix a length n and a rate R. Then the number of codewords should be 2^{nR}. Also fix a pdf p_X(x) over the code alphabet, and pick each element of each codeword independently according to this pdf. This ensemble is known as the ensemble of random codes.
`
For a memoryless channel, Shannon [45] showed that if p_X(x) was chosen to be the optimizing pdf in Definition 1.2, then for any rate R less than the capacity of the channel, the average probability of error for the ensemble of random codes tends to
`
`zero as n tends to infinity. He also showed that the probability of error for a code
`
`whose rate exceeded the capacity was bounded away from zero.
`
If p_X(x) is uniform over the input alphabet, then this ensemble is very similar to
`
`the one in which we pick any code of the correct length and the correct rate with
`
`equal probability, and we use the term “random codes” for this ensemble also.
`
`Example 1.7 (The ensemble of random linear codes) This is the set of all lin-
`
`ear codes of some fixed rate R (and length n), each selected with equal probability.
`
`For a BISC, this ensemble is known to achieve capacity [18] (i.e., the average proba-
`
`bility of error tends to zero as n tends to infinity for any rate less than the capacity
`
`of the channel). This is another reason why the BISC assumption is so useful. One
`
`proof of this fact will be given in Chapter 2.
`
`This ensemble is sometimes defined by saying that every entry in the generator
`
`(or parity check) matrix is picked independently with a uniform pdf. While the
`
`ensembles so defined are not identical to the first definition, most properties for large
`
`n (in particular the weight enumerator, to be defined in Chapter 2) are indeed the
`
`same. We will use the term “random linear codes” for either description.
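For instance, under the second (generator-matrix) definition the ensemble is trivial to sample (a minimal sketch of our own; for simplicity it ignores the possibility that G is not full rank, and at such tiny parameters the weight distribution is of course far from its asymptotic shape):

    import itertools
    import random

    def random_generator_matrix(k, n):
        """Each entry iid uniform over GF(2)."""
        return [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

    def weight_enumerator(G):
        """Number of codewords uG of each Hamming weight."""
        k, n = len(G), len(G[0])
        counts = [0] * (n + 1)
        for u in itertools.product([0, 1], repeat=k):
            c = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            counts[sum(c)] += 1
        return counts

    random.seed(1)
    print(weight_enumerator(random_generator_matrix(4, 8)))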
`
`1.1.3 Decoding Algorithms
`
`Given the output of a noisy channel, we need to form an estimate of the transmitted
`
`codeword. We now define notions of “optimal” decoding algorithms.
`
`Given a received vector y, the codeword most likely to have been transmitted
is the one that maximizes p_{X|Y}(x|y). If each of the codewords is equally likely, then this reduces to the codeword x which maximizes p_{Y|X}(y|x) (which factorizes if the channel is memoryless). This is known as the
`
`maximum likelihood (ML) estimate of the transmitted codeword.
`
`
`
`9
`Definition 1.12 Given a code C and a received vector y, the maximum likelihood
`
`decoder has as its output
`
    \hat{x} = \arg\max_{x \in C} p_{Y|X}(y|x).    (1.3)
`
`Clearly this decoder has the least possible (word) probability of error.
`
`If, on the other hand, we wish to minimize the bit error probability, then we need
to maximize Pr(x_i = x|y) over all x.
`
`Definition 1.13 Given a received vector y, the MAP (maximum a posteriori) de-
`
`coder has as its output in the ith position
`
    \hat{x}_i = \arg\max_{x \in A} \Pr(x_i = x | y),    (1.4)
`
`where the maximization is over the input alphabet of the channel.
`
`In the case of a binary code, the maximization is over a binary alphabet. Taking
`
`the maximum of two values is equivalent to comparing their ratio to 1, or the log of
`
`the ratio to 0. This latter quantity is nothing but the LLR. Therefore, MAP decoding
`
`in the case of a binary code is equivalent to taking the sign of the a posteriori LLR.
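A minimal sketch of this observation (our own illustration, not from the thesis):

    import math

    def map_decision(llr):
        """MAP estimate of a bit from its a posteriori LLR
        l = log(Pr(x = 0 | y) / Pr(x = 1 | y)): decide 0 iff l > 0."""
        return 0 if llr > 0 else 1

    def posteriors(llr):
        """Recover (Pr(x = 0 | y), Pr(x = 1 | y)) from the LLR."""
        p0 = 1.0 / (1.0 + math.exp(-llr))
        return p0, 1.0 - p0

    print(map_decision(1.2), posteriors(1.2))  # 0 (0.768..., 0.231...)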
`
`1.2 Some Graphical Code Ensembles
`
`In this section, we will introduce some important graphical code ensembles, which
`
`is a generic term we will use to describe all “turbo-like” codes, or codes amenable
`
`to iterative decoding. All of these codes can be viewed under a unified graphical
`
`framework. Several such frameworks have been proposed, for example, see [4], [31],
`
`and [16]. In all these cases, the iterative decoding algorithm reduces to an instance of
`
`Pearl’s belief propagation algorithm [41] on loopy graphs. Such a general view will,
`
`however, not be necessary for the purposes of this thesis, and we will describe each
`
`
`
`10
`
`Convolutional
`Encoder 1 (IIR)
`
`Convolutional
`Encoder 2 (IIR)
`
`π
`
`Figure 1.2: Parallel concatenation of two convolutional codes, connected through a
`random interleaver (denoted by π).
`
`ensemble separately with its own decoding algorithm. The specific decoding algorithm
`
`we will describe is called the sum-product algorithm, which aims to minimize the bit
`
`error probability. The idea in every case is that we use “long, random-like codes”
`
as suggested by Shannon, made practical by the existence of a simple though
`
`suboptimal decoding algorithm.
`
1.2.1 Parallel Concatenation of Convolutional Codes (PCCC)
`
`These codes are also known as “parallel turbo codes.” The original turbo code intro-
`
`duced by Berrou et al. [7] was a code of this type. The general structure is shown
`
`in Figure 1.2. As mentioned earlier, it consists of two relatively simple constituent
`
`codes, more specifically truncated binary IIR convolutional codes with a short con-
`
`straint length. They are connected by an interleaver (labeled π in the figure), which
`
`is merely a pseudo-random permutation of the information bits. In the figure, there is
`
`also a systematic (information) bit stream, which could be absent. There could also
`
`be more than two constituent codes, each with its own interleaver. Here, we briefly
`
`
`
`11
`
`describe the decoding algorithm in the case when there are two constituent encoders
`
`and a systematic data stream.
`
`The aim of the sum-product algorithm is to approximate MAP decoding, as de-
`
`fined in Definition 1.13, or equivalently to compute the a posteriori log-likelihoods
`
`of the individual transmitted bits given the received vector. The MAP decoding al-
`
`gorithm for the constituent convolutional codes can be implemented with the well
`
`known forward-backward or BCJR [6] algorithm, which is feasible in this case be-
`
`cause these codes have a short constraint length. Given an a priori estimate (or LLR)
`
`on each information bit and an LLR for each transmitted bit, the BCJR algorithm
`
`outputs the correct a posteriori LLR for each information bit.
`
`The turbo-decoding algorithm iterates between the MAP decoders corresponding
`
`to the two constituent codes. The received values corresponding to the systematic bits
`
`are used to initialize the a priori LLR’s for the information bits. One of the constituent
`
`decoders then outputs the a posteriori LLR’s by running the BCJR algorithm, the
`
`idea being to use these as a priori LLR’s for the other decoder. However, in order not
`
`to form short loops in the so-called “computation tree,” the difference between the
`
`a posteriori and the a priori LLR’s (this is known as extrinsic information) is fed to
`
`the other decoder as a priori LLR’s, and the same operation is repeated over and over
`
`again. Various stopping rules are used to decide on convergence and guard against
`
`limit-cycles.
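To make the message flow concrete, here is a structural sketch in Python (our own illustration, not code from the thesis; bcjr_decode, interleave, and deinterleave are hypothetical stand-ins for a constituent MAP decoder and the interleaver maps, and the extrinsic bookkeeping follows one common convention):

    def turbo_decode(sys_llrs, par1_llrs, par2_llrs,
                     bcjr_decode, interleave, deinterleave, num_iters=20):
        """Structural sketch of the iteration described above.

        bcjr_decode(a_priori, parity_llrs) is a hypothetical helper returning
        a posteriori LLRs for the information bits of one constituent code.
        """
        n = len(sys_llrs)
        ext_2to1 = [0.0] * n  # decoder 2 has provided no information yet
        ext_1to2 = [0.0] * n
        for _ in range(num_iters):
            # Decoder 1: a priori = channel evidence on the systematic bits
            # plus the extrinsic information from decoder 2.
            pri1 = [s + e for s, e in zip(sys_llrs, ext_2to1)]
            post1 = bcjr_decode(pri1, par1_llrs)
            # Pass on only the *extrinsic* part: a posteriori minus a priori.
            ext_1to2 = [po - pr for po, pr in zip(post1, pri1)]

            # Decoder 2: the same, in the interleaved bit order.
            pri2 = interleave([s + e for s, e in zip(sys_llrs, ext_1to2)])
            post2 = bcjr_decode(pri2, par2_llrs)
            ext_2to1 = deinterleave([po - pr for po, pr in zip(post2, pri2)])

        # Final decisions from the combined evidence.
        totals = [s + e1 + e2
                  for s, e1, e2 in zip(sys_llrs, ext_1to2, ext_2to1)]
        return [0 if l > 0 else 1 for l in totals]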
`
`1.2.2 Serial Concatenation of Convolutional Codes (SCCC)
`
`These codes are also known as “serial turbo-codes.” In this case each convolutional
`
`encoder acts on the interleaved output of the previous one, instead of on the infor-
`
`mation stream directly. The general structure is shown in Figure 1.3. In this case
`
`also, the sum-product algorithm iterates between the decoders corresponding to the
`
`constituent codes. Some slight modifications are needed from the PCCC case, but
`
`
`
`12
`
`π
`
`Convolutional
` Encoder 1
`
`Convolutional
`Encoder 2 (IIR)
`
`Figure 1.3: Serial concatenation of two convolutional codes, connected through a
`random interleaver (denoted by π).
`
`the basic idea is the same, and we will omit giving a detailed description here.
`
`One example of the SCCC case is the ensemble of repeat-accumulate (RA) codes
`
`introduced in [15]. An RA code is shown in Figure 1.4. It is the concatenation of
`
`two particularly simple constituent codes, an outer “repeat by q” code and an inner
`
`“accumulate” code. This simple structure was intended to make analysis possible,
`
`but their performance under iterative decoding is surprisingly good, especially consid-
`
ering that the constituent decoders have extremely low complexity. These codes play an
`
`important role in this thesis, particularly in Chapter 3, and we will give an alternative
`
`description of them in Section 1.2.3.
`
[Block diagram: Repetition by q (rate 1/q) → π → 1/(1+D) (rate 1)]

Figure 1.4: A repeat-accumulate (RA) code.
`
`1.2.3 Codes Defined on Tanner Graphs
`
`A Tanner graph is a general way of representing any linear code. An example is
`
`shown in Figure 1.5. A Tanner graph has two kinds of nodes, called variable nodes,
`
`represented by hollow circles in the figure, and check nodes, represented by filled
`
`circles. The graph is bipartite between these two types of nodes, i.e., every edge has
`
`
`
`13
`
`Figure 1.5: A small Tanner graph.
`
`a variable node at one end and a check node at the other end. The variable nodes
`
`represent actual variables, for example, elements of the codeword. The check nodes
`
`represent constraints among these variables. All the graphs we will look at will be for
`
`binary linear codes, in which case the variable nodes represent binary variables, and a
`
`check node says that the binary sum of all its neighbors is 0. Clearly, any linear code
`
`may be represented in this manner, by directly transcribing its parity-check matrix.
`
`In fact, it will have many possible representations, because it has many possible
`
`parity-check matrices, and also because we can add dummy or state variables to the
`
`graph, as we shall soon see in the case of RA codes.
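As an illustration of this transcription (our own sketch, with an arbitrary 3 × 6 parity-check matrix):

    def tanner_graph(H):
        """Adjacency lists of the Tanner graph of parity-check matrix H:
        one check node per row, one variable node per column."""
        n_chk, n_var = len(H), len(H[0])
        var_adj = [[] for _ in range(n_var)]
        chk_adj = [[] for _ in range(n_chk)]
        for r in range(n_chk):
            for c in range(n_var):
                if H[r][c]:
                    chk_adj[r].append(c)
                    var_adj[c].append(r)
        return var_adj, chk_adj

    # A length-6 code with three parity checks.
    H = [[1, 1, 0, 1, 0, 0],
         [0, 1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0, 1]]
    var_adj, chk_adj = tanner_graph(H)
    print(chk_adj[0])  # check node 0 says variables 0, 1 and 3 sum to 0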
`
`1.2.4 Decoding on Tanner Graphs
`
`The sum-product algorithm takes a particularly elegant form on a Tanner graph.
`
`It was first described in this case by Gallager [17] in the context of LDPC codes.
`
`It is a completely distributed algorithm, with each node acting as an independent
`
`entity, communicating with other nodes through the edges. The message sent by a
`
`variable node to a check node, say in LLR form, is its estimate of its own value. The
`
`message sent by a check node to a variable node is its estimate of the variable node’s
`
`
`
`14
`
`value. The update rules at the nodes are essentially MAP estimators, given that the
`
`incoming messages along the different edges are independent. Again, in order not to
`
`form short cycles in the computation tree, the output along any edge is based only on
`
input from the other edges. At a variable node of degree j, if l_1, l_2, . . . , l_{j−1} denote the incoming LLR’s along j − 1 edges, and l_0 the LLR corresponding to the channel evidence, then the outgoing LLR l_out along the jth edge is merely the MAP estimate of the underlying binary random variable given j independent estimates of it, and is given by

    l_{out} = l_0 + \sum_{i=1}^{j-1} l_i.    (1.5)
`
At a check node, the situation is similar, though the update rule is more complicated. If l_1, l_2, . . . , l_{k−1} denote the incoming LLR’s at a check node of degree k, then the outgoing LLR l_out along the kth edge corresponds to the pdf of the binary sum of k − 1 independent random variables, and works out to be

    \tanh(l_{out}/2) = \prod_{i=1}^{k-1} \tanh(l_i/2).    (1.6)
`
`(For a derivation of eqs. (1.5) and (1.6), see [42, Section 3.2].)
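The two update rules translate directly into code. A minimal sketch of our own (numerical safeguards, such as clipping the tanh product away from ±1, are omitted):

    import math

    def variable_node_update(l0, incoming):
        """Eq. (1.5): channel LLR plus the LLRs from the other edges."""
        return l0 + sum(incoming)

    def check_node_update(incoming):
        """Eq. (1.6): tanh(l_out / 2) = product of tanh(l_i / 2)."""
        prod = 1.0
        for l in incoming:
            prod *= math.tanh(l / 2.0)
        return 2.0 * math.atanh(prod)

    print(variable_node_update(0.5, [1.2, -0.3]))  # 1.4
    print(check_node_update([1.2, -0.3]))          # about -0.16: a weak estimate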
`
`Given these update rules, we only need a schedule for updating the various mes-
`
`sages to complete the description of the decoding