`
Daniel A. Spielman
`
`Massachusetts Institute of Technology
`
Abstract. By concatenating linear-time codes with small, good codes,
it is possible to construct in polynomial time a family of asymptotically
good codes that approach the Shannon bound and that can be encoded and
decoded in linear time. Moreover, their probability of decoder error is
exponentially small in the block length of the codes. In this survey, we
will explain exactly what this statement means, how it is derived, and
what problems in the complexity of error-correcting codes remain open.
Along the way, we will survey some key developments in the complexity
of error-correcting codes.
`
1 Introduction
`
Error-correcting codes are the means by which we compensate for the corruption
that occurs in communication over imperfect channels. In a coding
system, a message first passes through an encoder, which transforms it
into a codeword; this codeword is then transmitted over the channel. The
channel modifies the codeword by adding noise, so that the word
received by the receiver may differ from the codeword that was transmitted.
The received word is processed by a decoder, which uses the received
word to guess which codeword was transmitted, and outputs its guess.
Much of the research on error-correcting codes is devoted to improving
the trade-off between the probability that the decoder's guess is correct
and the complexity of the encoders and decoders.
In this survey, we examine the software complexity1 of this problem
from an asymptotic perspective. We will restrict our attention to communication
over the binary symmetric channel (BSC), as it seems natural
to most computer scientists.2 A channel is binary if it allows the transmission
of only two symbols, which we take to be 0 and 1. The binary
symmetric channel with error-probability $p$ ($BSC_p$) is the channel that
`
1 We should point out that, at the present time, almost every implementation of error-correcting
codes uses special-purpose hardware. Moreover, coding schemes that are
efficient in software can be inefficient in hardware, and vice versa.
2 Results similar to those in this survey may be obtained for any reasonable memoryless
channel.
`
`
`
`
transmits one bit at a time and flips the value of that bit with probability
$p$. If it does not flip a bit, then it transmits the bit correctly. For
each bit transmitted by the channel, the probability that it is flipped is
independent of the others.
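As a concrete illustration (a minimal sketch, not from the original text; the function name and parameters are ours), the following Python fragment simulates transmission over the $BSC_p$:

```python
import random

def bsc_transmit(bits, p, rng=random.Random(0)):
    """Flip each bit independently with probability p, as the BSC_p does."""
    return [b ^ (rng.random() < p) for b in bits]

# Example: send a codeword through a BSC with error probability 0.1.
received = bsc_transmit([0, 1, 1, 0, 1, 0, 0, 1], p=0.1)
```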
`In the rest of this introduction, we will review the definitions needed
`to describe the types of error-correcting codes we will use and then state
`the complexity and coding problems we will consider. We will conclude
`the introduction with an overview of the rest of the paper.
The framework presented in this survey builds on those developed
in [4] and [29, 30]. Most of the material in Sections 2 and 3 can be found
in those papers. The material in Section 2 is standard, and we refer the
reader to a reference such as [21], [40], or [3] for a more thorough treatment.
We also recommend the recent survey by Vardy [41]. For references
on error-correcting codes that emphasize an engineer's perspective, we
recommend [26], [42], and [6].
`
`1.1 Coding Definitions
`
A code is an injective mapping from strings to strings. We will consider
families of block codes over the alphabet {0, 1}. A binary block code is
one whose messages are strings over $\{0,1\}^m$ and whose codewords are
strings over $\{0,1\}^n$, for some $n \ge m$; the length of the code is $n$, and
its rate is $m/n$. The strings of $\{0,1\}^m$ are the possible messages, and we
assume that each is equally likely.3 Each image of a string from $\{0,1\}^m$ is
a codeword, and the word "code" is sometimes used to refer just to the set
of codewords. An encoder maps a message to its codeword, and a decoder
maps a word in $\{0,1\}^n$ to either a codeword or a message indicating that
it cannot decide on a codeword.
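For a toy instance of these definitions (our example, not the survey's), the 3-fold repetition code has $m = 1$, $n = 3$, and rate $1/3$:

```python
def encode(message_bit):
    """Map the 1-bit message to its codeword in {0,1}^3."""
    return [message_bit] * 3

def decode(word):
    """Map a word in {0,1}^3 to a message by majority vote."""
    return 1 if sum(word) >= 2 else 0
```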
`
A family of codes can be defined as an infinite sequence of codes, each
of a different length, indexed by their length. A code constructor for a
`family of codes is a device that takes as input a block length, and outputs
`a description of an encoder and a decoder for the code in the family
`of that length. We do not insist that this decoder be the best possible
`decoder for the resulting code, as we will measure the quality of the coding
`system by the number of errors that the decoder can actually correct. By
`measuring the complexity of the code constructor, we allow ourselves
`
3 The task of adjusting one's message space so that each message is equally likely
is that of compression. Compression schemes need to take advantage of the special
structure of the data to which they are applied. We would like to avoid such concerns.
In situations in which error-free communication is desired, it is reasonable to assume
that data has been compressed before it is encoded with an error-correcting code.
`
`
`
`
to consider the non-uniform complexity of the encoder and decoder. We
separate the complexity of constructing the encoders and decoders from
the complexities of these devices themselves because this is how they are
used: the same encoders and decoders are used many times to transmit
messages of the same block length. In particular, a long message is usually
broken up into a sequence of blocks, which are then transmitted separately.
It would be reasonable to change the definition of a family of codes so that
it allows additional parameters, such as the rate of the code and perhaps
a quality parameter; in this case, the complexity of the constructor should
also be given in terms of these parameters.
`Systematic codes are a particularly convenient class of codes. In a
`systematic code, the message appears as a substring of the codeword.
`Thus, we can divide the bits of a codeword into message bits and check
`bits, where the message bits contain the message and the encoder need
`only compute the check bits from the message bits. All the codes that
`we consider will be linear codes, and we will see that all such codes can
`easily be made systematic. Linear codes are codes in which the alphabet
`
`is a field, such as the field GF(2), and the encoding function is a linear
`map. Thus, the codewords form a vector space.
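As a small illustration of a linear code (our example; this is one systematic form of the [7,4] Hamming code), the rows of a generator matrix over $GF(2)$ span the set of codewords:

```python
import numpy as np

# Generator matrix of a [7,4] code: an identity block (message bits)
# followed by parity columns (check bits), so the code is systematic.
G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def encode(message, G):
    """A codeword is a GF(2)-linear combination of the rows of G."""
    return (message @ G) % 2

codeword = encode(np.array([1, 0, 1, 1], dtype=np.uint8), G)
```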
`
`1.2 Coding Problems
`
Intuitively, if an error-correcting code is going to provide strong protection
against errors, then most codewords should be far from every other
codeword. Our notion of distance between two words is the Hamming
distance, the number of bits in which the words differ. The decoding
algorithm for transmission over the BSC that achieves the minimal probability
of error will map a received word to the codeword to which it is
closest. This approach is known as maximum likelihood decoding, as it
selects the codeword that maximizes the probability that the channel will
output the received word. If there are codewords $w_1$ and $w_2$ that differ
in only $k$ places, and if $w_1$ is transmitted and more than $k/2$ of those
bits are flipped, then the optimal decoder will return $w_2$. On a BSC with
error probability $p$, the probability of this happening is at least

$$\binom{k}{(k+1)/2} p^{(k+1)/2} (1-p)^{(k-1)/2},$$

for $k$ odd.
This observation led to the definition of the minimum distance of a
code: the minimum distance achieved by a pair of codewords in the code.
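A brute-force rendering of maximum likelihood decoding over the BSC (a sketch with our function names; for $p < 1/2$ it reduces to nearest-codeword decoding in Hamming distance):

```python
def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def ml_decode(received, codewords):
    """Return a codeword closest to the received word; over the BSC with
    p < 1/2 this maximizes the probability of the received word."""
    return min(codewords, key=lambda c: hamming_distance(c, received))
```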
`
`
`
`
If $w_1$, $w_2$, and $v$ are codewords in a linear code, then

$$d(w_1, w_2) = d(w_1 - v, w_2 - v) = d(0, w_2 - w_1),$$

and $w_2 - w_1$ is a codeword; so, the minimum distance of the code is equal
to the minimum weight of a codeword, where the weight of a codeword is
equal to its distance from the all-0 word. It is possible to construct families
of codes in which the rate remains constant while the minimum distance
grows linearly with the block length. Such codes are called asymptotically
good, and we measure their minimum relative distance: their minimum
distance divided by their block length.
Sufficiently long codes from an asymptotically good family can provide
excellent error-protection: the maximum likelihood decoder can correct
any number of errors up to half the minimum distance. Moreover, if
one uses an asymptotically good code to transmit over a channel whose
error probability is less than the relative minimum distance of the code,
then the probability of decoding error decreases exponentially in the block
length of the code. Since the rate of the code remains constant, the communication
overhead of the code does not change: one just divides the
communication into longer blocks. Of course, the computation time of
most decoders will increase with the block length. These observations
lead to a natural complexity-of-coding problem: to design asymptotically
good codes that have fast encoding, decoding, and constructing algorithms,
where the complexity is measured in terms of the number of
message bits in the codes. In this case, we consider the number of errors
that the proposed decoder can correct rather than the number of
errors that the maximum likelihood decoder can correct, as this provides
a more accurate measure of the quality of the resulting coding system. Of
course, one should consider the tradeoff between the rate and minimum
distance of the codes as well, although this affects the length of the transmission
more than the complexity of the algorithms. Bassalygo, Zyablov,
and Pinsker [4] approached versions of this problem using random linear
codes, Reed-Solomon codes, and some graph-theoretic codes (from [12]).
`A more natural problem is to measure the average-case performance
`of a coding system. Rather than measure the maximum number of errors
`that the decoder can correct, we will measure the number of errors that
`the decoder can usually correct when the errors are chosen at random.
`
`In his paper, “A Mathematical Theory of Communication”, Shannon [32]
`proved that every channel has a fixed capacity, which is a rate beyond
which it is not possible to communicate reliably. Moreover, he demonstrated
that for every rate less than the capacity of the channel, it is
`
`
`
`
`possible to transmit information with arbitrarily small error if one uses
`codes of sufficiently long block length. Thus, we are presented with a
`natural complexity problem, first posed by Savage [31]: given a binary
symmetric channel with error probability $p$ and an error-tolerance $\epsilon$, find
the coding system4 of least complexity that enables one to communicate
over the channel so that the probability that the output of the decoder
is incorrect is at most $\epsilon$.
`
Neither of the previous problem statements has compared the rate of
the code produced with the optimal rate possible for the channel. The rate
of the code is important, as it dictates how much redundancy occurs in
the transmission. As the time required to transmit the message across the
channel may dominate the time of the computation, such factors cannot
be ignored. Moreover, part of the channel may be a network, in which case
an unnecessarily long transmission may slow the transmissions of others
using the network. Thus, our analyses should include a comparison of
the rate of our codes with the best rate possible. The existence of the
Shannon bound makes this reasonable in the analysis of the average-case
performance of a coding system. Shannon's bound states that the capacity
of the $BSC_p$ is $1 - H(p)$, where $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the
binary entropy function. It is trickier to examine the tradeoff between the
rate and minimum relative distance of a code because the best tradeoff
achieved by known codes, the Gilbert-Varshamov bound, does not meet
the best known upper bound, the Linear Programming Bound [24]. Most
linear codes meet the Gilbert-Varshamov bound, which means that their
rate $r$ and relative minimum distance $\delta$ satisfy $r \ge 1 - H(\delta)$. As we are
unaware of any recent advances in the construction of binary codes that
approach the Gilbert-Varshamov bound, we will not have to worry about
this problem in this survey.
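The two bounds just quoted are easy to evaluate numerically (a sketch with our function names):

```python
from math import log2

def H(x):
    """The binary entropy function H(x) = -x log2 x - (1-x) log2 (1-x)."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def bsc_capacity(p):
    """Shannon's bound: the capacity of the BSC_p."""
    return 1 - H(p)

def gv_rate(delta):
    """Gilbert-Varshamov: rate achieved at relative minimum distance delta."""
    return 1 - H(delta)

print(bsc_capacity(0.11))  # about 0.5: rate-1/2 codes suffice at p = 0.11
print(gv_rate(0.11))       # the same expression, read as a rate/distance tradeoff
```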
All the complexity statements in this paper are designed to hold for
the unit-cost model of a RAM (see [1]). We have verified that the statements
of Section 5 also hold for Pointer Machines (see [16]) and, with
minor modifications of the coding system such as those outlined in [34],
for the log-cost model of a RAM [1] as well.
`
`1.3 Overview
`
`In Section 2, we examine the random linear codes introduced by Elias [7]
`and point out that they meet both the Gilbert-Varshamov and Shannon
`bounds. We then consider the Wozencraft ensemble, a smaller family of
`
`4 Actually, Savage just considered the complexity of the decoder.
`
`
`
`
linear codes with similar performance, and show that its complexity is
polynomially related with its probability of decoder error. In Section 3,
we introduce Forney's [9] method of concatenating codes to obtain low-complexity
codes that approach the Shannon bound and have a probability
of decoder error that is almost exponential in their complexity.
Section 4 provides a very brief overview of constructions of codes based
on sparse bipartite graphs. These have been used to obtain coding systems
in which the probability of decoder error is exponential in the complexity
of encoding and decoding. In Section 5, we concatenate the codes of
Section 2 with the codes of Section 4 to obtain codes that do meet the
Shannon bound and have a probability of decoder error that is exponential
in the complexity of encoding and decoding. We conclude by pointing
out that we have not analyzed the complexities of these codes in terms
of how much computation is required to obtain a given probability of decoder
error at a given distance from the Shannon bound. We argue that
this is an important measure of the performance of a coding system, and
that more work on this problem is needed.
`
`2 Random Linear Codes
`
A linear code is a subspace of $GF(2)^n$. One can show that a linear
code chosen uniformly at random meets the Gilbert-Varshamov bound
with high probability. Similarly, for any rate $r < 1 - H(p)$, most
linear codes of rate $r$ enable the maximum likelihood decoder to achieve
an error probability that decreases exponentially in the block length when
communicating over the $BSC_p$.
`
A linear code is usually described by its generator matrix or its check
matrix. The generator matrix of a linear code of length $n$ and rate $r$ is a
matrix over $GF(2)$ with $rn$ rows and $n$ columns such that the words of
the code are precisely the linear combinations of the rows of the matrix.
The check matrix of such a code is a matrix of rank $(1-r)n$ with $(1-r)n$
rows and $n$ columns such that the codewords are exactly those words that
have inner product zero with each row of this matrix. The check matrix
is sometimes called the parity check matrix, since each row indicates a
subset of the message bits whose parity must be zero in order for a word
to be a codeword. Note that elementary row operations do not change the
code defined by a generator or check matrix. Since the choices for these
matrices are not unique, one should presume that some choices will lead
to simpler representations of the code than others.
`
`
`
`
Encoding: By performing row operations on the check matrix of a
linear code, one can obtain a check matrix that has the $(1-r)n$-by-$(1-r)n$
identity matrix as a submatrix. Using this check matrix, one can see that a
linear code can be made systematic: the bits corresponding to the columns
in the identity matrix can be labeled the check bits, and the other $rn$ bits
can be used as the message bits. For any setting of the $rn$ message bits,
there is exactly one setting of the check bits that produces a word that
has inner product 0 with each row of this check matrix; moreover, these
check bits can be computed by multiplying the vector of $rn$ message bits
by the $(1-r)n$-by-$rn$ submatrix of the check matrix corresponding to the
columns of the message bits. Thus, the code constructor could construct5
a matrix from which the encoder could encode a message in quadratic
time.
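The following sketch carries out this procedure over $GF(2)$ with numpy; it assumes the check matrix has full rank and that its last $(1-r)n$ columns can be brought to the identity without permuting columns (function names are ours):

```python
import numpy as np

def systematic_form(H):
    """Row-reduce a check matrix over GF(2) so its last rows-many columns
    form the identity; those columns then index the check bits."""
    H = (H % 2).astype(np.uint8)
    rows, cols = H.shape
    for i in range(rows):
        pivot = cols - rows + i
        j = i + int(np.flatnonzero(H[i:, pivot])[0])  # assumes a pivot exists
        H[[i, j]] = H[[j, i]]
        for k in range(rows):
            if k != i and H[k, pivot]:
                H[k] ^= H[i]                          # eliminate over GF(2)
    return H

def encode_systematic(message, H_sys):
    """With H_sys = [A | I], the unique valid check bits are c = A m."""
    rows, cols = H_sys.shape
    A = H_sys[:, : cols - rows]
    checks = (A @ message) % 2
    return np.concatenate([message, checks])
```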
`
Code Construction: If we choose a linear code at random, it is
very unlikely that its performance will be much worse than that of most
linear codes. However, we know of no efficient algorithm for certifying
that a particular linear code will perform well, even assuming that we
are going to use a maximum likelihood decoder. In general, computing
the minimum distance of a binary code is NP-hard (see [41]), and we
see no reason that it should be easier for a randomly chosen code. In
fact, we know of no algorithm for approximating the minimum distance
of a linear code that is substantially faster than enumerating all low-weight
words and then checking whether they are codewords. However,
we can find explicit constructions of linear codes that meet the Gilbert-Varshamov
bound in less time than that taken by the naive algorithm that
enumerates all linear codes. An algorithm for doing this was presented
in [4]. We will present a different approach in Section 2.1.
Decoding: The best known decoding algorithms for arbitrary linear
codes require time exponential in the lengths of the codewords. One algorithm
is to enumerate all codewords, compute the distance of each from
the received word, and output a word closest to the received word.
`
`If one is willing to make a decoder that uses a lot of space, then one can
`speed up this algorithm by constructing a giant look-up table indexed by
`every possible received word. The entry corresponding to a received word
would be the codeword closest to it. This results in a table with $2^n$ entries.
`
To obtain a smaller but almost as useful table, one can use syndrome
decoding. The syndrome of a received word $w$ is the vector obtained by
`
5 A naive algorithm will produce this matrix in time $O(n^3)$. An asymptotic advantage
can be obtained by observing that the time required is asymptotically the same as
the time required for matrix multiplication (see [15] or [5, Section 16.4]).
`
`
`
`
computing the inner product of $w$ with each row of the check matrix.
Note that an error pattern produces the same syndrome regardless of the
codeword to which it is added. Thus, if one associates to each syndrome
a minimal weight error-pattern that produces this syndrome, then one
can perform maximum-likelihood decoding of a linear code by mapping
each received word to the codeword obtained by adding the error pattern
associated with the received word's syndrome. Such a list of error patterns
associated with syndromes has $2^{(1-r)n}$ entries of length at most $n$, and
can be produced in time at most $2^n \cdot n^2$ by enumerating all possible
error-patterns. If one just desires an exponentially decreasing probability of
decoding error when using a BSC with error-probability $p$, then one can
speed up this algorithm by only enumerating over error-patterns of weight
at most $(p + \epsilon)n$ for some small $\epsilon > 0$.
`
Using syndrome decoding, one can check whether a given code meets
the Shannon bound in time $O(2^{(1-r+\epsilon)n})$, for any $\epsilon > 0$. First, construct
the syndrome table for all error-patterns of weight at most $(p + \epsilon')n$ in
time $2^{(1-r+\epsilon)n} n^2$. Then, for each error pattern of weight at most $(p +
\epsilon')n$, compute its syndrome and check whether the syndrome decoding
algorithm would decode it to the 0 word. Using this information, one
can estimate the probability of decoding error of the syndrome decoding
algorithm.
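A direct (exponential-time, as described above) rendering of the table construction and of syndrome decoding, with our function names:

```python
import numpy as np
from itertools import combinations

def build_syndrome_table(H, max_weight):
    """Associate to each syndrome a minimal-weight error pattern producing
    it; enumerating weights in increasing order guarantees minimality.
    Restricting to max_weight ~ (p + eps) * n gives the speed-up above."""
    rows, n = H.shape
    table = {}
    for w in range(max_weight + 1):
        for support in combinations(range(n), w):
            e = np.zeros(n, dtype=np.uint8)
            e[list(support)] = 1
            table.setdefault(tuple((H @ e) % 2), e)
    return table

def syndrome_decode(received, H, table):
    """Add (over GF(2)) the stored error pattern for the received syndrome."""
    e = table.get(tuple((H @ received) % 2))
    return None if e is None else (received ^ e)
```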
`
`2.1 The Wozencraft Ensemble
`
The Wozencraft ensemble6 is a set of codes much smaller than the set
of linear codes, but from which a randomly chosen code has an error
tolerance similar to that of a linear code chosen uniformly at random.
The property of the Wozencraft ensemble that gives its codes their power
is that every word appears as a codeword in an equal number of codes
in the ensemble. The proof that this condition suffices is almost identical
to the proofs that a random linear code meets the Gilbert-Varshamov
and Shannon bounds.
`
Theorem 1. Let $S$ be a set of codes of length $n$ and rate $r$ such that
$|\{C \in S : w \in C\}|$ does not depend on $w$. Then,

1. a code chosen at random from $S$ probably meets the Gilbert-Varshamov
bound, and
`
6 The Wozencraft ensemble was presented by Massey [23] and attributed to
Wozencraft.
`
`
`
`
`2. a code chosen at random from S probably approaches the Shannon
`bound.
`
Proof. [Sketch] The probability that a randomly chosen code from $S$
contains a particular non-zero word is $2^{-(1-r)n}$. Thus, the probability
that a randomly chosen code from $S$ contains a non-zero codeword
of weight less than $\delta n$ is at most

$$2^{-(1-r)n} \sum_{i=1}^{\delta n} \binom{n}{i},$$

which becomes less than any constant as $n$ grows large, provided that
$r \le 1 - H(\delta)$.
To prove that a code chosen at random from $S$ probably meets the
Shannon bound, we assume, without loss of generality, that the 0 word is
transmitted over the $BSC_p$. The expected number of errors produced by
the channel is $pn$, and Chebyshev's inequality implies that the probability
that it produces more than $pn + \sqrt{cn}$ errors is less than $1/c$, for any $c \ge 1$.
Assuming that at most $pn + \sqrt{cn}$ errors are produced by the channel,
the probability that there is another codeword closer than the 0 word to
the received word is at most

$$2^{-(1-r)n} \sum_{i=1}^{pn + \sqrt{cn}} \binom{n}{i},$$

which becomes less than any constant as $n$ grows large, provided that
$r < 1 - H(p)$. Thus, the bound is obtained by letting $c$ grow large and
then letting $n$ grow much more quickly.
`
The rate-1/2 codes of length $n$ from the Wozencraft ensemble can
be indexed by elements of the field $GF(2^{n/2})$. For each element $\alpha \in
GF(2^{n/2})$, the code $W_\alpha$ will consist of the binary representations of all
pairs $(x, \alpha x)$, where $x \in GF(2^{n/2})$, and by $\alpha x$ we mean $\alpha$ multiplied by
$x$. Thus, $x$ represents the message bits, and $\alpha x$ its check bits. It is clear
that each word appears as a codeword exactly once; in fact, the word
$(x, y)$ appears in the code $W_{y/x}$. To produce codes of shorter length, we
can puncture the code by ignoring some check bits. If we ignore $c$ check
bits, then each word will appear as a codeword $2^c$ times. To produce
longer codes,7 we can append more check bits by constructions such as

7 These shorter and longer codes are different from those described by Wozencraft.
We have presented them because they are easier to describe in this framework. If we
use Wozencraft's original codes, we should be able to replace the $r'$ with $r$ in what
follows.
`
`
`
`
setting $W_{\alpha,\beta}$ to be the set of words of the form $(x, \alpha x, \beta x)$, where $\alpha$ and
$\beta$ are chosen from $GF(2^{n/2})$. A simple analysis shows that these codes
also meet the requirements of Theorem 1.
Encoding: The codes can be encoded using Discrete Fourier Transform
based algorithms for multiplication over finite fields (see [27] and [1]).
These algorithms take time $O(n \log n)$.
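For intuition, here is a toy sketch of the rate-1/2 ensemble with $n = 8$, using schoolbook multiplication in $GF(2^4)$ (modulus $x^4 + x + 1$, chosen for illustration; for long codes one would use the DFT-based multiplication cited above):

```python
MOD, K = 0b10011, 4   # x^4 + x + 1; message and check halves have K bits

def gf_mul(a, b):
    """Schoolbook multiplication in GF(2^K), reducing modulo MOD."""
    r = 0
    for _ in range(K):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):
            a ^= MOD
    return r

def wozencraft_encode(x, alpha):
    """Codeword of W_alpha: the message x followed by the check half alpha*x."""
    return (x << K) | gf_mul(alpha, x)
```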
Decoding: We would decode these codes with the syndrome decoding
algorithm presented in Section 2. This algorithm takes time $O(n^2)$
and space $O(2^{(1-r+\epsilon)n})$, provided that the code constructor spends time
$O(2^{(1-r+\epsilon)n} n^2)$, for $\epsilon > 0$, to build the syndrome table.
Construction: One can find a code in these ensembles that meets the
Gilbert-Varshamov bound in time $O(2^{(1-r')n} n^2)$, where $r'$ is the greatest
reciprocal of an integer that is less than $r$. To do this, enumerate all
words of weight at most $\delta n$, where $H(\delta) = 1 - r'$. For each of these words,
determine the codes in which it appears. Then, choose a code in which
none of these words appears.
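In the rate-1/2 ensemble this search is especially simple, since the word $(x, y)$ with $x \neq 0$ lies only in $W_{y/x}$; a sketch (our function names) reusing gf_mul from the previous fragment:

```python
def gf_inv(a):
    """Invert a nonzero element of GF(2^K) via a^(2^K - 2) = a^(-1)."""
    r, e = 1, (1 << K) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def find_gv_code(max_weight):
    """Mark the code W_{y/x} of every low-weight word (x, y); any unmarked
    alpha indexes a code whose minimum weight exceeds max_weight."""
    bad = set()
    for x in range(1, 1 << K):
        for y in range(1 << K):
            if bin((x << K) | y).count("1") <= max_weight:
                bad.add(gf_mul(y, gf_inv(x)))
    return next((a for a in range(1 << K) if a not in bad), None)
```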
Finding a code in these ensembles that meets the Shannon bound
does not seem to be as easy. One approach would be to enumerate over
the codes, and then choose one that meets the Shannon bound. If we test
the quality of a code using the syndrome decoding approach outlined at
the end of Section 2, then this algorithm takes time $2^{(2-r-r')n}$, where $r'$
is the greatest reciprocal of an integer that is less than $r$.
`
3 Concatenation of Codes
`
Forney [9] introduced the method of concatenating codes to obtain low-complexity
codes that approach the Shannon bound. Concatenation is
a means of combining two codes, referred to as the inner code and the
outer code. We'll assume that the inner code is a binary code with $m_1$
message bits, rate $r_1$, and length $n_1$. The outer code will be a code over
an alphabet with at most $2^{m_1}$ symbols. If the outer code has length $n_2$
and rate $r_2$, then the concatenation of the inner code with the outer code
will be a binary code of rate $r_1 r_2$ with $r_2 n_2 m_1$ message bits and length
$n_1 n_2$.
`
To encode the concatenated code, one collects the message bits into
$r_2 n_2$ symbols in the alphabet of the outer code. These symbols are treated
as message symbols and encoded using the encoding algorithm for the
outer code, resulting in $n_2$ symbols. Each of these $n_2$ symbols of $m_1$
bits each is then treated as a set of message bits and encoded using the
inner code. The code is then decoded in the reverse order. Thus, the time
`
`
`
`
required to encode (decode) the concatenated code is the sum of the time
required to encode (decode) the outer code and $n_2$ times the time required
to encode (decode) the inner code. The idea behind the construction is
that the length of the inner code should be small so that its complexity is
small relative to $n_2$. On the other hand, the outer code should be a code
for which correction of errors in its symbols is very simple.
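The two-stage structure can be written down directly (a schematic sketch with our function names; the component encoders and decoders are passed in as functions):

```python
def concat_encode(message_symbols, outer_encode, inner_encode):
    """Outer-encode the r2*n2 message symbols, then inner-encode each of
    the resulting n2 symbols (m1 bits each) into an n1-bit block."""
    return [inner_encode(sym) for sym in outer_encode(message_symbols)]

def concat_decode(received_blocks, inner_decode, outer_decode):
    """Decode in the reverse order: each inner block first, then let the
    outer decoder correct the symbol errors the inner decoders made."""
    return outer_decode([inner_decode(block) for block in received_blocks])
```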
`
As an outer code, Forney suggested using a Reed-Solomon code.
Reed-Solomon codes can be built with any field as their alphabet, and
have length one less than the number of elements in the field. Using algorithms
based on the Discrete Fourier Transform, Reed-Solomon codes can
be encoded in time $O(n \log n \log \log n)$ and decoded in time $O(n \log^2 n \log \log n)$.8
Using a new result of Pan [25], one can decode Reed-Solomon codes over
prime fields in time $O(n \log n \log \log n)$. The decoding algorithm for a
Reed-Solomon code of length $n$ and rate $r$ can correct up to $(n - rn)/2$
errors, which is the best one could hope for.
`
We now sketch how one can use concatenated codes to approach the
Shannon bound with low complexity. For any $p$, find a binary code $C_1$ of
rate $r_1 < 1 - H(p)$ and length $n_1$ such that the probability of decoding
error using the maximum likelihood decoder is at most $\epsilon$. Concatenate this
binary code with a Reed-Solomon code $C_2$ of rate $1 - 3\epsilon$ and length
$n_2$ over the field $GF(2^{m_1})$. If fewer than $(3\epsilon/2) n_2$ of the inner code decoding
operations fail, then the decoder for the Reed-Solomon code will correctly
decode the entire code. One can use a Chernoff bound to show that the
probability that at least $(3\epsilon/2) n_2$ of the inner code decoding operations fail
is exponentially small in $n_2$. As the rate of the concatenated code is
$(1 - 3\epsilon) r_1$, we can decrease $\epsilon$ to obtain codes that approach the Shannon
bound. A more careful analysis reveals that, for any rate $r < 1 - H(p)$
and for sufficiently long block lengths,9 the probability that an inner
code decoding operation fails will be $c_1^{-n_1}$ for some $c_1 > 1$. Thus, the
probability that the concatenated decoder will fail will be $c_2^{-n_1 n_2}$ for some
$c_2 > 1$.
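The Chernoff step can be made concrete (a sketch using the relative-entropy form of the bound; the particular exponent is ours, not the survey's):

```python
from math import exp, log

def chernoff_failure_bound(eps, n2):
    """Upper bound on the probability that at least (3*eps/2)*n2 of n2
    independent inner decodings fail when each fails with probability
    at most eps: exp(-n2 * KL(3*eps/2 || eps))."""
    a, p = 1.5 * eps, eps
    kl = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))
    return exp(-n2 * kl)
```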
`
Encoding: If we use a code from the Wozencraft ensemble as the
inner code, then the total time of the inner code encoding operations
will be $O(n_2 n_1 \log n_1)$. As $m_1 = \lceil \log n_2 \rceil$, the encoding time will be
dominated by the encoding time of the outer Reed-Solomon code.
`
8 See [14], [28], [1], [38], and [5, Chapter 3, Sections 1 and 2].
9 Note that we must have $1 - H(p) - r \gg \sqrt{p/n_1}$ before this analysis becomes applicable.
Without this restriction, there is a constant probability that the number of errors
will be more than the inner decoder can handle.
`
`
`
`
Decoding: Assuming that the code constructor has created look-up
tables to aid the decoding of the inner code, the decoding time of the
concatenated code will be dominated by the time required to encode and
decode the outer Reed-Solomon code. For sufficiently large $m_1$, we can find
primes close to and less than $2^{m_1}$, so we can actually use a Reed-Solomon
code over a prime field as the outer code and make use of Pan's [25]
improved decoding algorithm.
Construction: As the total block length is $n = n_1 n_2$ and $m_1 =
\lceil \log n_2 \rceil$, the discussion in Section 2.1 yields an algorithm for choosing the
inner code that takes time $O(2^{(1-r') n_1} n_1^2)$, which is polynomial in $n_2$, where $r'$ is the greatest reciprocal
of an integer such that $r' < r$. This factor will dominate, as it is larger
than the time required to build the decoder and the time required to
describe the Reed-Solomon codes.
`
While concatenation allows us to construct codes with good average-case
performance, it does not seem to help much with worst-case performance.
It would be difficult to construct new codes that beat the Gilbert-Varshamov
bound by concatenation: if the inner code has minimum relative
distance $\delta_1$ and the outer code has minimum relative distance $\delta_2$,
then their concatenation could have minimum relative distance $\delta_1 \delta_2$.
`
3.1 Justesen Codes
`
Justesen [13] was the first to find an explicit construction of an asymptotically
good family of error-correcting codes. Exactly what is meant
by explicit construction is not completely clear, but it certainly excludes
any construction that involves a search over small codes. Ideally, it should
mean that there is a "closed form" description of the code. One might like
to formalize the notion of an explicit construction by defining it in terms
of the complexity of the code constructor. However, no such definition
will avoid the possibility of performing a constant-size search.
Justesen's main idea was to concatenate an outer Reed-Solomon code
with many different inner codes. That is, instead of using just one inner
code, he would use a different inner code for each outer code symbol. In
particular, he used a Reed-Solomon code constructed over a field of the
form $GF(2^m)$ and the rate-1/2 Wozencraft ensemble indexed by elements
of $GF(2^m)$ as the inner codes. The previous analysis proving that concatenated
codes can be asymptotically good can easily be modified to show
that the concatenated code will be asymptotically good provided that
almost all of the inner codes are asymptotically good. While it was not
Justesen's main objective, one can also show that his codes approach the
Shannon bound for rate 1/2 by using as an outer code a Reed-Solomon
`
`
`
`
code of rate $1 - \epsilon$, and letting $\epsilon$ slowly go to zero as the block length grows
large.
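Schematically (our sketch, reusing wozencraft_encode from the fragment in Section 2.1; positions are mapped to nonzero field elements so that the degenerate code $W_0$ is avoided, and $n_2 < 2^K$ is assumed):

```python
def justesen_encode(message_symbols, rs_encode):
    """Encode with the outer Reed-Solomon code, then encode the i-th outer
    symbol with its own inner code W_i from the Wozencraft ensemble."""
    outer = rs_encode(message_symbols)
    return [wozencraft_encode(sym, alpha)
            for alpha, sym in enumerate(outer, start=1)]
```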
`
`4 Graph Theoretic Codes
`
A major advance in the development of low-complexity codes was introduced
in Gallager's [11] thesis, "Low Density Parity-Check Codes". This
led to the development of codes that we will refer to as graph-theoretic
codes. We do not have the time or space to provide a good explanation
of the development of graph-theoretic coding; so, we will just mention a
few of the points that are relevant to this work.
`
Gallager [11] defined a family of linear codes by setting their check
matrix to be the bipartite adjacency matrix of a high-girth sparse
bipartite graph. If this graph is chosen at random and the codes are decoded
by a maximum likelihood decoder, then they will approach the
Shannon bound. Gallager proved that there was a linear-time algorithm
for decoding these codes that would correct some constant fraction of randomly
chosen errors with probability $1 - 2^{-O(\sqrt{n})}$. As we are now beginning
to use codes of sufficiently long block lengths to make Gallager's codes
useful, they have become the subject of many recent papers (see, for example,
[10, 8, 20, 43, 44]). Tanner [39] presented an important generalization
of Gallager's construction, along with generalizations of his decoding algorithms.
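A simple decoder in this family (a sketch in the spirit of Gallager's hard-decision decoding, not his exact algorithm) repeatedly flips any bit that violates a majority of the parity checks containing it:

```python
import numpy as np

def flip_decode(received, H, max_rounds=100):
    """Bit-flipping decoding for a code whose check matrix H is the
    adjacency matrix of a sparse bipartite graph."""
    word = received.copy() % 2
    deg = H.sum(axis=0)                    # number of checks touching each bit
    for _ in range(max_rounds):
        syndrome = (H @ word) % 2          # 1 marks an unsatisfied check
        if not syndrome.any():
            break                          # every parity check is satisfied
        unsat = H.T @ syndrome             # unsatisfied checks touching each bit
        flips = (2 * unsat > deg)
        if not flips.any():
            break                          # no bit sees a majority; give up
        word = (word + flips) % 2
    return word
```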
`
Zyablov and Pinsker [45] showed that, with high probability over
the choice of graph, Gallager's decoder would always correct some constant
fraction of errors, regardless of where they appeared. Sipser and