Proceedings of the Allerton Conference on Communication, Control and Computing, Champaign-Urbana, Illinois.

Trellis-Constrained Codes

Brendan J. Frey
Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
bfrey@turbo.beckman.uiuc.edu

David J. C. MacKay
Department of Physics, Cavendish Laboratory
Cambridge University
mackay@mrao.cam.ac.uk

Abstract

We introduce a class of iteratively decodable trellis-constrained codes as a generalization of turbocodes, low-density parity-check codes, serially-concatenated convolutional codes, and product codes. In a trellis-constrained code, multiple trellises interact to define the allowed set of codewords. As a result of these interactions, the minimum-complexity single trellis for the code can have a state space that grows exponentially with the block length. However, as with turbocodes and low-density parity-check codes, a decoder can approximate bit-wise maximum a posteriori decoding by using the sum-product algorithm on the factor graph that describes the code. We present two new families of codes, homogeneous trellis-constrained codes and ring-connected trellis-constrained codes, and give results showing that these codes perform in the same regime as do turbocodes and low-density parity-check codes.

1 Introduction

Recent interest in the impressive performances of turbocodes and low-density parity-check codes has led to several attempts at generalizing these codes in ways that lead to efficient iterative decoders. In one view that is now quickly propagating across the research network, a code is described by graphical constraints on a system of variables, and the iterative decoder is based on a decade-old expert systems algorithm applied to the graph that describes the constraints [ ]. This probability propagation algorithm [ , ] is exact only in cycle-free graphs. However, as evidenced by the excellent error-correcting capabilities of the iterative decoders for turbocodes [ ] and low-density parity-check codes [ ], the algorithm works impressively well in the graphs that describe these codes, even though they contain many cycles. See [ ] and [ ] for extensive dissertations on this subject.

[Figure 1 appears here: five factor graphs, panels (a) through (e), each annotated with the constituent trellis rates R_t and the fractions n_t of codeword bits checked, e.g., R_1 = R_2 = 3/4 with n_1 = n_2 = 1 for the homogeneous TCC in (d), and R_t = 3/4, n_t = 2/3 for the ring-connected TCC in (e).]

Figure 1: A codeword in a trellis-constrained code must simultaneously be a codeword of all the constituent trellises, with the codeword bits reordered. The factor graphs are shown for (a) a single convolutional code; (b) a turbocode, where each parity bit is a codeword bit of only one trellis; (c) a low-density parity-check code; (d) a homogeneous trellis-constrained code; and (e) a ring-connected trellis-constrained code. The small unfilled discs represent codeword bits.

Fig. 1a shows the factor graph (see [ ] in these proceedings) for a trellis. Unlike in a trellis, in a factor graph the values that each state variable (large white discs) can take on are not shown explicitly. Any two adjacent state variables and the corresponding codeword bits (small white discs) must satisfy a linear set of equations, represented by the small black discs with a "+" inside. This representation is called a "factor graph" because it shows how the indicator function for allowed trellis behaviors factors into a product of local functions. Associated with each black disc is a local function of its neighboring variables. Each function evaluates to 1 if its neighboring variables satisfy the local set of linear equations, and to 0 otherwise. The global function is equal to the product of the local functions. A given configuration of the codeword bits is a codeword if the global function evaluates to 1 for some configuration of the state variables. In general, the local functions may be nonlinear, the factor graph variables may be real-valued, and the local functions may evaluate to elements of a semi-ring.
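
To make the product-of-local-functions definition concrete, here is a minimal sketch (ours, not from the paper): the local functions implement a running-parity trellis on four bits, and a configuration is accepted as a codeword exactly when some assignment of the state variables makes the global product evaluate to 1.

    from itertools import product

    N = 4  # toy block length

    def local(s_prev, bit, s_next):
        # Local indicator function: 1 if the next state equals the running
        # parity of the bits seen so far, and 0 otherwise.
        return 1 if s_next == (s_prev ^ bit) else 0

    def is_codeword(bits):
        # Global function: the product of the local functions, checked over
        # all configurations of the state variables.  Fixing the final state
        # to 0 enforces even overall parity.
        for states in product([0, 1], repeat=N + 1):
            if states[0] != 0 or states[-1] != 0:
                continue
            g = 1
            for i in range(N):
                g *= local(states[i], bits[i], states[i + 1])
            if g == 1:
                return True
        return False

    print([b for b in product([0, 1], repeat=N) if is_codeword(b)])

Brute-force enumeration over the states is exponential, which is exactly why the forward-backward algorithm replaces it in practice.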

Although factor graphs are less explicit about local relationships than are trellises, they allow us to represent a richer set of systems. Fig. 1b shows the factor graph for a simple turbocode. A single trellis for the same turbocode would have an unwieldy number of states. More important than representation, a factor graph provides a framework for iterative decoding via message passing on the graph. The probability propagation algorithm [ , ], a.k.a. the sum-product algorithm [ , ], can be applied to a factor graph to approximate bit-wise maximum a posteriori (MAP) decoding.
(In the special case of a turbocode, this general algorithm reduces to turbodecoding [ , ].) Two different factor graphs for the same code may give decoders with different performances.

As another example, Fig. 1c shows the factor graph for a simple low-density parity-check code. Each of the six trellises is a simple parity-check trellis that enforces even parity on its six codeword bits [ ].

In a sense, whereas the trellis assisted in the design of low-complexity codes and exact linear-time probabilistic decoders (the Viterbi algorithm and the forward-backward algorithm), the factor graph assists in the design of high-complexity codes and approximate linear-time probabilistic decoders. In this paper, we present a general class of high-complexity, linear-time decodable codes that locally retain the chain-type structure of trellises, as do turbocodes and, to a lesser degree, low-density parity-check codes. A codeword in a trellis-constrained code (TCC) must simultaneously be a codeword of multiple constituent trellises. So, if f_t(x) is the constituent codeword indicator function for trellis t \in \{1, \ldots, T\}, the global codeword indicator function is

f(x) = \prod_{t=1}^{T} f_t(x).    (1)

Each constituent indicator function is given by a product of the local functions within the corresponding trellis. Usually, the codeword bits interact with the constituent trellises through permuters that rearrange the order of the codeword bits.

For the turbocode in Fig. 1b, there are two constituent functions, f_1(.) and f_2(.). f_1(.) indicates that the upper row of codeword bits is a valid output of a convolutional encoder with the middle row of codeword bits as input. f_1(.) does not directly place any restrictions on the lower row of codeword bits, so it effectively checks only 2/3 of the codeword bits. f_2(.) indicates that the lower row of codeword bits is a valid output of a convolutional encoder with the middle row of codeword bits as input. In contrast to f_1(.), f_2(.) does not place any restrictions on the upper row of codeword bits.

The rate R of a TCC is related to the rates R_t of the constituent trellises, t = 1, \ldots, T, in a simple way. If n_t is the fraction of codeword bits that trellis t checks, then trellis t removes at most (1 - R_t) n_t N binary degrees of freedom from the code. It may remove slightly fewer if some of its constraint equations are linearly dependent on those given by the other constituent trellises. For large, randomly generated permuters this effect is quite small, so we ignore it when computing rates in the remainder of this paper. (As a result, the actual rates may be slightly higher than the given rates.) The total number of binary degrees of freedom left after all trellis constraints are satisfied is N - \sum_{t=1}^{T} (1 - R_t) n_t N, so the rate of the TCC is

R = 1 - \sum_{t=1}^{T} (1 - R_t) n_t.    (2)

From this equation, it is easy to verify that the turbocode in Fig. 1b has rate 1/3 and that the low-density parity-check code in Fig. 1c also has rate 1/3. (Note that a k-bit parity-check trellis has rate (k - 1)/k.)
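
As a quick numerical check of the rate formula (2), the two examples can be evaluated directly. The constituent numbers below are our reading of Fig. 1; in particular, the 9-bit block length assumed for the parity-check example is an illustration, not a value given in the text.

    def tcc_rate(constituents):
        # constituents: list of (R_t, n_t) pairs; implements equation (2).
        return 1.0 - sum((1.0 - R) * n for R, n in constituents)

    # Turbocode of Fig. 1b: two rate-1/2 trellises, each checking 2/3 of the bits.
    print(tcc_rate([(1/2, 2/3)] * 2))   # 1/3

    # LDPC code of Fig. 1c: six 6-bit parity-check trellises (rate 5/6) on an
    # assumed block of 9 bits, so each trellis checks n_t = 6/9 of the bits.
    print(tcc_rate([(5/6, 6/9)] * 6))   # 1/3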

Unlike turbocodes and serially-concatenated convolutional codes, a general TCC takes quadratic time in N to encode. In a general TCC, we can designate a subset of the codeword bits as the systematic bits of the entire code and then use Gaussian elimination to compute a generator matrix (once only). Using such a systematic generator matrix for encoding requires R(1 - R)N^2 binary operations.
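
The one-time Gaussian elimination step might look as follows. This is a sketch under the assumption that the code is specified by a full-rank parity-check matrix H over GF(2); column swaps are recorded so that the systematic positions can be mapped back to the original bit order.

    import numpy as np

    def systematic_generator(H):
        # Reduce H to [A | I] by row operations and column swaps, then return
        # the systematic generator matrix G = [I | A^T] together with the
        # column permutation that was applied.
        H = H.copy() % 2
        m, n = H.shape
        k = n - m
        perm = np.arange(n)
        for r in range(m):
            c = k + r
            if not H[r:, c].any():
                swap = next(j for j in range(n) if H[r:, j].any())
                H[:, [c, swap]] = H[:, [swap, c]]
                perm[[c, swap]] = perm[[swap, c]]
            pivot = r + np.nonzero(H[r:, c])[0][0]
            H[[r, pivot]] = H[[pivot, r]]
            for rr in range(m):            # clear column c in all other rows
                if rr != r and H[rr, c]:
                    H[rr] ^= H[r]
        A = H[:, :k]
        return np.hstack([np.eye(k, dtype=int), A.T]), perm

    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1]])
    G, perm = systematic_generator(H)
    u = np.array([1, 0, 1])       # systematic bits
    x = u @ G % 2                 # encoding: ~R(1-R)N^2 binary operations
    print(x, (G @ H.T % 2).sum())  # 0: valid here since no column swap occurred
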
Decoding a TCC involves running the forward-backward algorithm on each constituent trellis and exchanging information between the trellises in the fashion specified by the sum-product algorithm. The constituent trellises may be processed in parallel or sequentially.
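
The following toy decoder (our illustration, not the paper's simulation code) shows this message-passing structure on a very small TCC: two constituent parity-check "trellises" over the same six bits, joined by a random permuter. Each constituent is processed exactly; for a parity check, the forward-backward recursion collapses to the familiar tanh rule.

    import numpy as np

    rng = np.random.default_rng(1)
    perm = rng.permutation(6)              # the permuter between constituents

    def constituent_extrinsic(llr):
        # Exact sum-product output of one constituent that enforces even
        # parity separately on llr[:3] and llr[3:].
        out = np.empty_like(llr)
        for span in (slice(0, 3), slice(3, 6)):
            t = np.tanh(llr[span] / 2)
            for i in range(3):
                prod = np.prod(np.delete(t, i))
                out[span][i] = 2 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
        return out

    # Transmit the all-zeros codeword with BPSK over an AWGN channel.
    sigma = 0.8
    llr_ch = 2 * (1.0 + rng.normal(0.0, sigma, 6)) / sigma**2

    e1, e2 = np.zeros(6), np.zeros(6)      # extrinsic messages
    for _ in range(20):                    # sequential schedule
        e1 = constituent_extrinsic(llr_ch + e2)
        tmp = constituent_extrinsic((llr_ch + e1)[perm])
        e2 = np.zeros(6)
        e2[perm] = tmp                     # map back through the permuter
        bits = ((llr_ch + e1 + e2) < 0).astype(int)   # bit-wise MAP estimates
        b2 = bits[perm]
        if (bits[:3].sum() % 2 == 0 and bits[3:].sum() % 2 == 0
                and b2[:3].sum() % 2 == 0 and b2[3:].sum() % 2 == 0):
            break                          # all local functions evaluate to 1
    print(bits)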

In the following two sections, we present two new families of TCCs and show that they perform in the same regime as do turbocodes and low-density parity-check codes.

2 Homogeneous Trellis-Constrained Codes

In a turbocode, the constituent trellises share only a systematic subset of their codeword bits. The parity bits of each constituent encoder are not constrained by the other encoders. Fig. 1d shows the factor graph for a simple homogeneous TCC with T = 2, in which all of the bits are constrained by each constituent trellis. From the general rate formula in (2), we see that the rate of a homogeneous TCC is

R = 1 - T(1 - R_avg),    (3)

where R_avg = (\sum_{t=1}^{T} R_t)/T.

One difference between the homogeneous TCC and the turbocode is that the rate of a homogeneous TCC decreases directly with the number of trellises T. In the simulations discussed below, we used T = 2 and R_1 = R_2 = R_avg = 3/4 to get R = 1/2. To obtain the same rate with T = 3 would require R_avg = 5/6. In contrast, the rate of a turbocode varies roughly inversely with T; a rate-1/2 turbocode with T = 3 can be obtained with R_1 = R_2 = R_3 = 3/4. Another difference is that the permuter length of a homogeneous TCC is N, whereas for a turbocode the permuter length is RN.
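
A two-line check of the rate formula (3) confirms these numbers:

    # Matching R = 1/2 requires higher-rate constituents as T grows.
    for T, Ravg in [(2, 3/4), (3, 5/6)]:
        print(T, 1 - T * (1 - Ravg))   # prints 0.5 in both cases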

2.1 Encoding and Decoding

The trellises in a homogeneous TCC share all their bits, so we cannot simply encode by dividing the bits in each constituent trellis into a systematic set and a parity set and running a linear-time encoding method for each trellis, as is possible in a turbocode. Instead, we apply a previously computed generator matrix to a previously selected systematic subset of the codeword bits, which takes R(1 - R)N^2 binary operations.

The iterative decoder processes each constituent trellis using the forward-backward algorithm, and passes "extrinsic information" between the trellises in the manner specified by the sum-product algorithm. For two trellises, the decoding schedule is straightforward. For T > 2, different decoding schedules are possible. The trellises may be processed sequentially, in which case the current trellis uses the most recently computed probabilities produced by the other trellises. Alternatively, the trellises may be processed in parallel, in which case the current trellis uses the probabilities produced by the other trellises in the previous decoding iteration.

For the sake of gaining insight into these new compound codes and the behavior of their iterative decoders, we prefer to decode until a codeword is found or we are sure the iterative decoder has failed to find one. After each decoding iteration, the current bit-wise MAP estimates are used to determine whether a valid codeword has been found, in which case the iterative procedure is terminated. If the maximum number of iterations is completed without finding a codeword, we label the block a decoding failure. Notice that, given the factor graph of a code, determining that a codeword is valid is simply a matter of checking that all the local functions evaluate to 1.

2.2 Performance on an AWGN Channel

Using Monte Carlo simulation, we estimated the performance of a T = 2 homogeneous TCC with R_1 = R_2 = 3/4, giving R = 1/2. (See App. A for a description of how the BER confidence intervals were computed.) Each rate-3/4 trellis was obtained by shortening every fifth bit of a rate-4/5 nonsystematic convolutional code with maximum d_min, whose generator polynomials are given in [ ]. Fig. 2 shows the performance of this homogeneous TCC relative to the turbocode introduced by Berrou et al. [ ] and the best rate-1/2 low-density parity-check code published to date [ ]. Although it does not perform as well as the turbocode, it performs significantly better than the low-density parity-check code. We believe there is room for improvement here, since we chose the set of generator polynomials that gave maximum d_min, and this is quite likely not the best choice. (Keep in mind, however, that the performance of a homogeneous TCC is not necessarily governed by the same constituent trellis properties that govern the performance of a turbocode.) Of significant importance compared to turbocodes, we have observed that this homogeneous TCC does not have low-weight codewords.

[Figure 2 appears here: BER versus E_b/N_0 (dB) curves for the turbocode, the ring-connected TCC, the homogeneous TCC, and the low-density parity-check code, with uncoded transmission and Shannon's limit shown for reference. The E_b/N_0 axis spans 0 to 1.6 dB and the BER axis spans 1 down to 1e-6.]

Figure 2: The performances of a homogeneous TCC and a ring-connected TCC compared to the best rate-1/2 turbocode and low-density parity-check code performances published to date [ , ].

3 Ring-Connected Trellis-Constrained Codes

Fig. 1e shows the factor graph for a simple ring-connected TCC with T = 3. This code can be viewed as a serially-concatenated convolutional code [ , ] in which some of the
output bits are constrained to be equal to some of the input bits. The factor graph thus forms a ring of connected trellises. In the ring-connected TCC shown, each constituent trellis checks exactly 2/T of the codeword bits. Substituting n_t = 2/T into the general rate formula in (2), the rate of such a ring-connected TCC is

R = 2(R_avg - 1/2).    (4)

Unlike turbocodes and homogeneous TCCs, the rate of a ring-connected TCC does not depend on the number of constituent trellises T. However, the permuter lengths are only 2N/T, i.e., a fraction 2/T of the block length.

3.1 Encoding and Decoding

For ring-connected TCCs, we cannot use the same type of encoding procedure that is used for serially-concatenated convolutional codes, since some of the "output" bits must match some of the "input" bits. As with the homogeneous TCC, we can encode with approximately R(1 - R)N^2 binary operations by applying a previously computed generator matrix to a previously selected systematic subset of the codeword bits. However, if R_t = 3/4 for t = 1, \ldots, T (giving R = 1/2), encoding computations can be saved in the following way. First, we pick 3N/2T systematic bits for trellis 1 and generate the corresponding N/2T parity bits, using a number of computations that is linear in N. Then, we work around the ring and for each trellis pick N/2T systematic bits and generate the corresponding N/2T parity bits. When all but two trellises have been encoded, the last two trellises are used to form a binary vector that, when multiplied by a previously computed N/T x N/T binary matrix, yields the final set of N/T parity bits. The computations for the last step dominate and require approximately N^2/T^2 binary operations. For R_avg = 3/4, R = 1/2 and T = 3, the first method described above takes N^2/4 operations, whereas the second method takes roughly N^2/9 operations.
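
In rough numbers (our arithmetic, for an illustrative block length with R = 1/2 and T = 3):

    # Binary-operation counts for the two encoding methods described above.
    N, R, T = 10_000, 0.5, 3
    print(R * (1 - R) * N**2)   # generator-matrix encoding: R(1-R)N^2 = N^2/4
    print((N / T) ** 2)         # around-the-ring encoding: ~(N/T)^2 = N^2/9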

As with homogeneous TCCs, ring-connected TCCs with T > 2 can be iteratively decoded using a variety of schedules. In our simulations, we process the trellises sequentially while passing probabilities around the ring in one direction. Iterative decoding continues until a valid codeword is found or until the maximum number of iterations is completed.

3.2 Performance on an AWGN Channel

We estimated the performance of a T = 3 ring-connected TCC with n_1 = n_2 = n_3 = 2/3 and R_1 = R_2 = R_3 = 3/4, giving R = 1/2. Each rate-3/4 trellis used generator polynomials that we found by trial and error. The constituent codeword bits were shared alternately with the two neighboring trellises. Fig. 2 shows the performance of this ring-connected TCC. It performs significantly better than the homogeneous TCC and the low-density parity-check code, and only slightly worse than the turbocode.

4 Discussion

There is some literature on the properties of the constituent trellises that make for good turbocodes [ , , ] and serially-concatenated convolutional codes [ ]. We are currently exploring similar arguments for choosing the properties of the constituent trellises that
will make good homogeneous TCCs and ring-connected TCCs. The behavior of these two new families of TCCs is quite different from that of turbocodes, so we expect different properties to be relevant. For example, in order to avoid low-weight codewords in a turbocode, we try to avoid permuters that keep a pair of bits the same distance apart in the two constituent trellises [ ]. This degenerate effect is "broken" by the ring of a ring-connected TCC, which requires not only that two neighboring trellises have an equal set of shared "input" bits, but also that their "output" bits satisfy the constraints given by the remainder of the ring. However, in a ring-connected TCC, low-weight codewords are introduced by permuters that preserve a degenerate pattern around the entire ring. Another important question is: what properties of a TCC lead to the initial drop in the BER curve? These properties may very well be independent of (or even contrary to) those properties that give a high minimum distance.

We believe that the general class of trellis-constrained codes presented in this paper is a fruitful generalization of several other iteratively decodable codes. If we think of entire trellises as nodes in a graph whose edges represent bundles of codeword bits, it is evident that there are a variety of other interesting new TCCs, aside from those shown in Fig. 1, that have yet to be explored.

Acknowledgements

We appreciate discussions with members of the "Ulm group": Dave Forney, Ralph Koetter, Frank Kschischang, Andi Loeliger, Bob McEliece, Michael Tanner, and Niclas Wiberg, as well as discussions with Rob Calderbank and Alexander Vardy.

Appendix A: Computing BER Confidence Intervals

When analytic methods are not available for computing bit error rates in error-correcting coding systems, we must resort to simulation. Estimated BERs can vary significantly from experiment to experiment, so it is often desirable to include confidence intervals. This is especially important for the long block length codes discussed in this paper, since significant variability can be introduced by our inability to simulate enough blocks to pin down the word error rate. Also, for low bit error rates we may not be able to measure the distribution of bit errors within erroneously decoded words. In this section, we present a Monte Carlo approach for estimating the median and a central (2.5% to 97.5%) confidence interval for the BER, based on errors measured by Monte Carlo simulation.

The error model contains two parameters: the probability p_w of word error, and the probability p_b of bit error within erroneous words. This is a rather crude approximation, since in practice we expect there to be more than one failure mode, i.e., there ought to be several p_b's corresponding to different failure modes.

Let M be the number of words transmitted and let n_w be the number of measured word errors. Let K be the number of information bits per word, and let n_b be the total number of bit errors measured while transmitting all M blocks. We refer to the measured values as the data, D = {n_w, n_b}. From the Bayesian perspective, before observing D we place a prior distribution p(p_w, p_b) on the error model parameters. After observing D, we draw conclusions (e.g., compute a confidence interval) from the posterior distribution p(p_w, p_b | D), where

p(p_w, p_b | D) \propto p(p_w, p_b) P(D | p_w, p_b).    (5)

In this equation, the constant of proportionality does not depend on p_w or p_b. The last factor, P(D | p_w, p_b), is called the likelihood. We let p_w and p_b be independent beta-distributed random variables under the prior: p(p_w, p_b) = p(p_w) p(p_b), where

p(p_w) \propto p_w^{\alpha_w - 1} (1 - p_w)^{\beta_w - 1}   and   p(p_b) \propto p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}.    (6)

In frequentist terms, \alpha_w and \beta_w have the effect of shrinking our measurements toward a word error rate of \alpha_w/(\alpha_w + \beta_w), where the influence of this shrinkage grows with \alpha_w + \beta_w. Typically, we choose \alpha_w = \beta_w = 1, which gives a uniform prior over p_w, as shown in Fig. 3a.

As for the prior over p_b, it should be chosen while keeping in mind the behavior of the decoder. If the main source of bit errors is a failure to decode, and if we believe that for failures the decoder will produce a probability of bit error that is roughly equal to the probability p_u of bit error for uncoded transmission, then the prior should place weight on p_b = p_u. In this case, we choose \alpha_b = 2 and \beta_b = 1/p_u, which ensures that the mode of the prior occurs at p_u (the mode of a beta density is (\alpha - 1)/(\alpha + \beta - 2)) and that the prior is relatively broad. For example, computing p_u at the E_b/N_0 used in the simulations below gives the prior distribution for p_b shown in Fig. 3b.

[Figure 3 appears here: panel (a) plots the uniform prior density p(p_w) over 0 <= p_w <= 1; panel (b) plots the prior density p(p_b) over 0 <= p_b <= 1.]

Figure 3: (a) The prior distribution over the probability of word error p_w. (b) The prior distribution over the probability of bit error p_b within erroneous words. This distribution is designed so that its mode is equal to the probability of bit error for uncoded transmission.

It is straightforward to show that the likelihood is

P(D | p_w, p_b) = P(n_w, n_b | p_w, p_b) \propto p_w^{n_w} (1 - p_w)^{M - n_w} p_b^{n_b} (1 - p_b)^{n_w K - n_b}.    (7)

This distribution is the product of a binomial distribution for the number of word errors and a binomial distribution for the number of bit errors. Combining this likelihood with the prior, we obtain the posterior,

p(p_w, p_b | D) \propto p_w^{\alpha_w - 1 + n_w} (1 - p_w)^{\beta_w - 1 + M - n_w} p_b^{\alpha_b - 1 + n_b} (1 - p_b)^{\beta_b - 1 + n_w K - n_b},    (8)

which is just the product of a beta distribution over p_w and a separate beta distribution over p_b. Of course, we are actually interested in the posterior distribution p(p_w p_b | D) over the total probability of bit error, p_w p_b. A sample is obtained from p(p_w p_b | D) by drawing (p_w, p_b) pairs from the posterior in (8) and taking the product of p_w and p_b in each pair. This sample is sorted in ascending order, and the value of p_w p_b occurring half-way through the sorted list is taken as an estimate of the median of p(p_w p_b | D). Similarly, the values of p_w p_b occurring 2.5% and 97.5% of the way through the sorted list are taken as the confidence interval.
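
The whole procedure fits in a few lines, because the posterior (8) factors into two beta distributions. The sketch below uses made-up measurement counts; M, K, n_w and n_b are illustrative, not the paper's values.

    import numpy as np

    rng = np.random.default_rng(0)
    M, K = 2000, 5000            # blocks sent, information bits per block
    n_w, n_b = 7, 12000          # measured word and bit errors (made up)
    a_w, b_w = 1.0, 1.0          # uniform prior over p_w
    p_u = 0.05                   # assumed uncoded bit error probability
    a_b, b_b = 2.0, 1.0 / p_u    # prior over p_b with mode at p_u

    S = 100_000                  # posterior sample size
    p_w = rng.beta(a_w + n_w, b_w + M - n_w, S)
    p_b = rng.beta(a_b + n_b, b_b + n_w * K - n_b, S)
    ber = np.sort(p_w * p_b)     # sorted sample of the total bit error rate

    median = ber[S // 2]
    lo, hi = ber[int(0.025 * S)], ber[int(0.975 * S)]
    print(f"median {median:.2e}, interval ({lo:.2e}, {hi:.2e})")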

For the homogeneous TCC, we simulated the transmission of M blocks at a fixed E_b/N_0. We measured n_w word errors and a total of n_b bit errors. Using the prior presented above, computed for a slightly higher E_b/N_0, a sample of points from the posterior over p_w and p_b was obtained and is shown in Fig. 4a. As described above, for \alpha = 0.025, 0.5 and 0.975, we found the values for p*

[Figure 4 appears here: panels (a) and (b) plot samples from the posterior in the (p_w, p_b) plane, with p_w on a logarithmic axis from 1e-7 to 1 and p_b ranging from 0 to 0.4. Overlaid are the three curves p_w p_b = 9.9e-4, 1.7e-3 and 2.6e-3 in panel (a), and p_w p_b = 1.6e-7, 5.1e-6 and 4.8e-5 in panel (b).]

Figure 4: (a) A sample of points from p(p_w, p_b | D) for the first simulation described in the main text, under the prior described there. (b) A sample of points from p(p_w, p_b | D) for the second simulation, in which n_w = 0 and n_b = 0, under the same prior.

such that \hat{P}(p_w p_b <= p* | D) = \alpha, where \hat{P} is the sample distribution. The corresponding three curves of the form p_w p_b = p* are shown in Fig. 4a, and the corresponding values of p* give a median of 1.7e-3 and a confidence interval of (9.9e-4, 2.6e-3). Clearly, in this case it is the values for p_w that determine the p*'s for these curves, whereas the values for p_b are well determined by the measurements. We could have assumed that p_b took on its measured value instead of sampling from the posterior.

For the homogeneous TCC described above, we also simulated the transmission of M blocks at a higher E_b/N_0. In this case, we measured n_w = 0 and n_b = 0. Using naive methods, we might conclude that the bit error rate is 0 and that there is no variation in this value. However, the Bayesian technique gives the sample from the posterior shown in Fig. 4b. In this case, the values of both p_w and p_b play a role in determining the p*'s for the three curves. The median is 5.1e-6 and the confidence interval is (1.6e-7, 4.8e-5).

References

[1] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory, vol. IT-27, no. 5, pp. 533-547, September 1981.

[2] N. Wiberg, H.-A. Loeliger, and R. Kötter, "Codes and iterative decoding on general graphs," European Transactions on Telecommunications, vol. 6, pp. 513-525, 1995.