IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 16, NO. 2, FEBRUARY 1998
219

Iterative Decoding of Compound Codes by
Probability Propagation in Graphical Models

Frank R. Kschischang, Member, IEEE, and Brendan J. Frey
Abstract—We present a unified graphical model framework for describing compound codes and deriving iterative decoding algorithms. After reviewing a variety of graphical models (Markov random fields, Tanner graphs, and Bayesian networks), we derive a general distributed marginalization algorithm for functions described by factor graphs. From this general algorithm, Pearl’s belief propagation algorithm is easily derived as a special case. We point out that recently developed iterative decoding algorithms for various codes, including “turbo decoding” of parallel-concatenated convolutional codes, may be viewed as probability propagation in a graphical model of the code. We focus on Bayesian network descriptions of codes, which give a natural input/state/output/channel description of a code and channel, and we indicate how iterative decoders can be developed for parallel- and serially concatenated coding systems, product codes, and low-density parity-check codes.

Index Terms—Concatenated coding, decoding, graph theory, iterative methods, product codes.
I. INTRODUCTION

COMPOUND codes are codes composed of a collection of interacting constituent codes, each of which can be decoded tractably. In this paper, we describe various graphical models that can be used not only to describe a wide variety of compound codes, but also to derive a variety of iterative decoding algorithms for these codes. Prominent among compound codes are the turbo codes introduced by Berrou et al. [1], in which the constituent convolutional codes interact in “parallel concatenation” through an interleaver. It is probably fair to say that the near-capacity error-rate performance of turbo codes has sparked much of the current interest in iterative decoding techniques, as evidenced by this special issue. Other examples of compound codes include classical serially concatenated codes [2] (see also [3], [4]), Gallager’s low-density parity-check codes [5], and various product codes [6], [7].

In [8] and [9], we observed that iterative decoding algorithms developed for these compound codes are often instances of probability propagation algorithms that operate in a graphical model of the code. These algorithms have been developed in the past decade in the expert systems literature, most notably by Pearl [10] and Lauritzen and Spiegelhalter [11]. (See [12]–[14] for textbook treatments on probability or “belief” propagation algorithms and [15] for an extensive treatment of graphical models.)

The first to connect Pearl’s “belief propagation” algorithm with coding were MacKay and Neal [16]–[18], who showed that Gallager’s 35-year-old algorithm [5] for decoding low-density parity-check codes is essentially an instance of Pearl’s algorithm. Extensive simulation results of MacKay and Neal show that Gallager codes can perform nearly as well as turbo codes, indicating that we probably “sailed” much closer to capacity 35 years ago than might have been appreciated in the interim. McEliece et al. [19] have also independently observed that turbo decoding is an instance of “belief” propagation. They provide a description of Pearl’s algorithm, and make explicit the connection to the basic turbo decoding algorithm described in [1].

Recently, and independently of developments in the expert systems literature, Wiberg et al. in [20] and Wiberg in his doctoral dissertation [21] have refocused attention on graphical models for codes. They show that a type of graphical model called a “Tanner graph” (first introduced by Tanner [22] to describe a generalization of Gallager codes) provides a natural setting in which to describe and study iterative soft-decision decoding techniques, much as the code trellis [23] is an appropriate model in which to describe and study “conventional” maximum likelihood soft-decision decoding using the Viterbi algorithm. Forney [24] gives a nice description of the history of various “two-way” algorithms and their connections with coding theory.

In this paper, we seek to unify this recent work by developing a graphical model framework that can be used to describe a broad class of compound codes and derive corresponding iterative decoding algorithms. In Section II, we review and relate various graphical models, such as Markov random fields, Tanner graphs, and Bayesian networks. These graphs all support the basic probability propagation algorithm, which is developed in Section III in the general setting of a “factor graph,” and in Section IV for the specific case of a Bayesian network.

Given a graphical code model, probability propagation can be used to compute the conditional probability of a message symbol given the observed channel output. For richly connected graphs containing cycles, exact probability propagation becomes computationally infeasible, in which case iterative decoding can still yield excellent results. The basic iterative decoding algorithm proceeds as if no cycles were present in the graph, with no guarantee that the computed “conditional probabilities” are close to the correct values, or that they even converge! Nevertheless, the excellent performance of turbo codes and Gallager codes is testimony to the efficacy of these iterative decoding procedures.

Manuscript received September 27, 1996; revised May 8, 1997 and July 29, 1997.
F. R. Kschischang is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ont., Canada, on leave at the Massachusetts Institute of Technology, Cambridge, MA 02138 USA.
B. J. Frey is with the Beckman Institute, University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA.
Publisher Item Identifier S 0733-8716(98)00225-X.

0733–8716/98$10.00 © 1998 IEEE
Fig. 1. Graphical models for the (7, 4) Hamming code. (a) Markov random field with a maximal clique indicated. (b) Tanner graph. (c) Bayesian network.
In Section V, we describe Bayesian network models for a variety of compound codes, and describe how probability propagation can be used to decode these codes. As it is a straightforward exercise to develop Bayesian network models for many coding schemes such as multilevel codes and coset codes, and also for channels more general than the usual memoryless channels, we believe that there are many possibilities for application of iterative decoding techniques beyond what has been described in the literature to date.
II. GRAPHICAL CODE MODELS

In this section, we present several graph-based models that can be used to describe the conditional dependence structure in codes and channels. Given a set V = {v_1, ..., v_n} of random variables with joint probability distribution p(v_1, ..., v_n), a graphical model attempts to capture the conditional dependency structure inherent in this distribution, essentially by expressing how the distribution factors as a product of “local functions” (e.g., conditional probabilities) involving various subsets of V. Graphical models are useful for describing the structure of codes, and are the key to “probability propagation” and iterative decoding.
A. Markov Random Fields

A Markov random field (see, e.g., [25]) is a graphical model based on an undirected graph G = (V, E) in which each vertex corresponds to a random variable, i.e., an element of V. Denote by N(v) the neighbors of v, i.e., the set of vertices of G connected to v by a single edge of E. The graph G is a Markov random field (MRF) if the distribution p(v_1, ..., v_n) satisfies the local Markov property

p(v | V \ {v}) = p(v | N(v)).

In other words, G is an MRF if every variable v is independent of nonneighboring variables in the graph, given the values of its immediate neighbors. MRF’s are well developed in statistics, and have been used in a variety of applications (see, e.g., [25]–[28]).

The joint probability mass (or density) function for the vertices of an MRF G is often expressed in terms of a Gibbs potential function defined on the maximal cliques of G. A clique in G is a collection of vertices which are all pairwise neighbors, and such a clique is maximal if it is not properly contained in any other clique. Corresponding to each clique C in the set of maximal cliques Q is the collection V_C of vertices contained in C. Denote by A_v the sample space for the random variable v. Given a nonnegative potential function α_C for each clique C ∈ Q, i.e., a function defined over ∏_{v ∈ V_C} A_v, the joint probability mass (or density) function is given by

p(v_1, ..., v_n) = (1/Z) ∏_{C ∈ Q} α_C(V_C)     (1)

where Z is a normalizing constant, assuming that the product in (1) is not everywhere zero. It is possible to define an MRF in terms of potential functions defined over all cliques in G, not just the maximal cliques, but any potential function defined over a nonmaximal clique can be absorbed into the potential function defined over the maximal clique containing it.

From the structure of the potential functions, it is a straightforward exercise (see, e.g., [25]) to show that the resulting probability distribution satisfies the local Markov property. Indeed, every strictly positive MRF can be expressed in terms of a Gibbs potential, although the proof of this result (given, e.g., in [26, ch. 1]) is less straightforward. Lauritzen [15, pp. 37–38] gives an example due to Moussouris of a nonstrictly positive MRF satisfying the local Markov property for which it is impossible to express the joint distribution as a product of potentials as in (1).
To illustrate how MRF’s can be used to describe codes, consider the MRF with seven binary variables shown in Fig. 1(a). There are four maximal cliques: C_1 (indicated by the dashed loop), C_2, C_3, and C_4. From (1), the joint probability distribution for v_1, ..., v_7 can be written as a product of Gibbs potential functions defined over the variable subsets indicated by these four cliques. This MRF can be used to describe a Hamming code by setting α_{C_4} ≡ 1 (which is equivalent to neglecting C_4), and by letting the first three potentials be even-parity indicator functions. That is, α_{C_i} = 1 if its arguments form a configuration with even parity and 0 otherwise. The MRF places a uniform probability distribution on all configurations that satisfy even parity in cliques C_1, C_2, and C_3, and zero probability on configurations not satisfying these parity relations.
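As a concrete check of (1) with indicator potentials, the following sketch enumerates all 2^7 binary configurations and keeps those with a positive Gibbs product. The clique index sets used below are an assumption for illustration (Fig. 1(a) fixes the actual cliques); any three independent parity checks of a (7, 4) Hamming code behave the same way.

```python
from itertools import product

def parity_potential(*bits):
    """Even-parity indicator potential: 1 if the bits sum to even, else 0."""
    return 1 if sum(bits) % 2 == 0 else 0

# Hypothetical clique index sets (0-based) standing in for C_1, C_2, C_3.
cliques = [(0, 1, 2, 4), (0, 1, 3, 5), (1, 2, 3, 6)]

# Unnormalized Gibbs product over all configurations, as in (1).
weights = {}
for v in product((0, 1), repeat=7):
    w = 1
    for c in cliques:
        w *= parity_potential(*(v[i] for i in c))
    weights[v] = w

Z = sum(weights.values())                       # normalizing constant
codewords = [v for v, w in weights.items() if w > 0]
probs = {v: w / Z for v, w in weights.items()}
print(len(codewords), probs[codewords[0]])      # 16 codewords, each 1/16
```

The three checks are linearly independent, so exactly 2^(7-3) = 16 configurations receive the uniform probability 1/16, and all others receive zero, exactly as the text describes.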
While the potential functions chosen in this example define a linear code, it is clear that such potential functions can be used to determine a code satisfying any set of local check conditions. In particular, given a set of variables V = {v_1, ..., v_n}, let S be a collection of subsets of V. Corresponding to each element s of S, a local check condition enforces structure on the variables contained in s by restricting the values that these variables can assume. (For example, the check condition could enforce even parity, as in the example above.) By defining an
KSCHISCHANG AND FREY: ITERATIVE DECODING OF COMPOUND CODES
when the edge directions are ignored). As in an MRF, a random variable is associated with each graph vertex. Given a directed graph G = (V, E), let the parents (or direct ancestors) pa(v) of vertex v be the set of vertices of G that have directed edges connecting to v. For a Bayesian network, the joint probability distribution can be written

p(v_1, ..., v_n) = ∏_{i=1}^{n} p(v_i | pa(v_i))     (3)
indicator function for each local check condition that assumes the value unity for valid configurations and zero for invalid configurations, and by defining a graph in which each element of S forms a clique, an MRF description that assigns a uniform probability distribution over the valid configurations is obtained, provided that at least one valid configuration exists. As we shall see, a Tanner graph is another way to represent the same local check structure.
B. Tanner Graphs

Tanner graphs were introduced in [22] for the construction of good long error-correcting codes in terms of shorter codes. Our treatment follows the slightly different presentation of Wiberg et al. [20].

A Tanner graph is a bipartite graph representation for a check structure, similar to the one described above. In such a graph, there are two types of vertices corresponding to the variables and the “checks,” respectively, with no edges connecting vertices of the same type. For example, a Tanner graph corresponding to the Hamming code described above is shown in Fig. 1(b). Each check vertex s in the set of check vertices S is shown as a filled circle. In this case, a check vertex ensures that its set of neighbors satisfies even parity in a valid configuration.

We see that the check vertices play precisely the same role in a Tanner graph as do the maximal cliques in an MRF. In general, for each check vertex s with neighbors N(s), we can associate a nonnegative real-valued potential function α_s(N(s)) that assigns positive potential only to valid configurations of its arguments. We then write a probability distribution over the variables as

p(v_1, ..., v_n) = (1/Z) ∏_{s ∈ S} α_s(N(s))     (2)

where Z is a normalizing constant. Of course, (2) is analogous to (1).

An MRF can be converted to a Tanner graph by introducing a check vertex for each maximal clique, with edges connecting that check vertex to each variable in the clique. The potential function assigned to the check vertex would be the same as that assigned to the clique.

A Tanner graph can be converted to an MRF by eliminating the check vertices and forming cliques from all variables originally connected to the same check vertex. The potential associated with the clique would be the same as that assigned to the check vertex. It is possible that some new cliques may be formed in this process, which are not associated with a check vertex of the Tanner graph. A unit potential is assigned to these “induced” cliques. Different Tanner graphs may map to the same MRF; hence, Tanner graphs may be more specific about dependencies than MRF’s. For example, the graph in Fig. 1(b) with an additional check vertex connected to the variables of the fourth maximal clique C_4 will also map to the MRF in Fig. 1(a).
C. Bayesian Networks

We now introduce Bayesian networks that, unlike MRF’s and Tanner graphs, are directed acyclic graphs [12]. A directed acyclic graph is one where there are no graph cycles when the edge directions are followed (although there may be cycles
where, if pa(v_i) is empty (i.e., v_i has no parents), then we take p(v_i | pa(v_i)) = p(v_i).

Every distribution can be described by a Bayesian network since, by the chain rule of probability,

p(v_1, ..., v_n) = p(v_1) p(v_2 | v_1) ⋯ p(v_n | v_1, ..., v_{n-1}).

It follows that we can pick any ordering of the variables, and then condition each variable on all variables that precede it. However, this trivial network does not capture any useful probabilistic structure because the last factor p(v_n | v_1, ..., v_{n-1}) contains all n variables, and so is really just as complicated as the full joint distribution.
A Bayesian network for the Hamming code described above is shown in Fig. 1(c). The joint distribution is obtained from (3) using the parent–child relationships

p(v_1, ..., v_7) = p(v_1) p(v_2) p(v_3) p(v_4) p(v_5 | pa(v_5)) p(v_6 | pa(v_6)) p(v_7 | pa(v_7)).

The first four factors express the prior probabilities of v_1, ..., v_4, while the last three factors capture the parity checks: e.g., p(v_5 | pa(v_5)) = 1 if v_5 and its parents have even parity and 0 otherwise.

A Tanner graph (and by extension, an MRF) can be converted into a Bayesian network simply by directing edges toward the check vertices. A binary indicator random variable u_s is introduced at each check site s such that u_s = 1 only if the random variables in the set N(s) satisfy the constraint checked by the corresponding vertex in the Tanner graph.
A potential advantage of Bayesian networks is that the directed edges (arrows) can be used to model causality explicitly. By inspecting the arrows in such models, it is easy to determine which variables directly influence others. This often makes it possible to simulate the network, i.e., draw a configuration of variables consistent with the distribution specified by the network. One simply draws a configuration for variables having no parents, consistent with the (prior) distribution affecting those variables. Once a configuration has been drawn for all parents pa(v) of a variable v, a configuration for v can be drawn consistent with the conditional probability p(v | pa(v)). For example, in Fig. 1(c), we simply pick values for the parentless vertices v_1, v_2, v_3, and v_4, and then determine the remainder of the codeword v_5, v_6, and v_7. This explicit representation of causality is also useful for modeling physical effects, such as channel noise and intersymbol interference.
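The ancestral sampling procedure just described can be sketched as follows. The parent sets below are hypothetical stand-ins for those fixed by Fig. 1(c); they encode one valid set of (7, 4) Hamming parity relations.

```python
import random

# Hypothetical parent sets (0-based) for the three parity-bit vertices.
parents = {4: (0, 1, 2), 5: (0, 1, 3), 6: (1, 2, 3)}

def simulate():
    """Ancestral sampling: draw parentless vertices from their priors,
    then draw each child given its already-sampled parents."""
    v = [None] * 7
    for i in range(4):                    # parentless information bits
        v[i] = random.randint(0, 1)       # uniform prior p(v_i)
    for child, pa in parents.items():     # deterministic parity children
        v[child] = sum(v[i] for i in pa) % 2
    return v

v = simulate()
# Every sampled configuration is a codeword: each parity check is satisfied.
for child, pa in parents.items():
    assert (v[child] + sum(v[i] for i in pa)) % 2 == 0
```

Because every child here is deterministic given its parents, a single forward pass always produces a valid codeword; the difficulty discussed next arises only when some child is clamped to an observed value.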
It should be noted that simulating a Bayesian network can become a hard problem when variables v for which pa(v) is nonempty are required to take on a specific value, i.e., when some child
simplify this computation, as we now show. A derivation along similar lines has also been carried out recently by Aji and McEliece [29], who also develop an algorithm for “information distribution” on a graph.
A. Notation

We begin by introducing some notation. Let N be a finite index set, and let {A_i : i ∈ N} be a collection of finite sets called symbol alphabets, indexed by N. The configuration space X is defined as the Cartesian product of symbol alphabets, X = ∏_{i ∈ N} A_i, and elements of X are called configurations. For J ⊆ N, let X_J denote the projection of X onto the coordinates indexed by J, so that X_J = ∏_{i ∈ J} A_i, which is taken to be empty when J is empty. For a configuration x ∈ X and nonempty J, we denote by x_J the image of x under this projection. We denote the complement of J relative to N as J^c. By abuse of notation, we equate the pair (x_J, x_{J^c}) with x, although formally, some reordering of coordinates may be necessary for this equality strictly to hold.

A function Z : X → R over the set of configurations is said to be a global function. Initially, we assume that the codomain R is the set of real numbers, but later, we will allow R to be an arbitrary commutative semiring [21], [29]–[31].

It will often be useful to introduce families of global functions, indexed by a set of finite-dimensional real-valued parameters y, which are fixed in any instance of distributed marginalization. In this case, we write Z(x; y) for the value the function assumes at configuration x. Introducing such parameters allows us to take into account the influence of continuous-valued variables such as channel outputs. However, for notational convenience, we will sometimes omit the explicit dependence on y.

For a set J ⊆ N, we define the marginal function Z_J with respect to J as

Z_J(x_J) = Σ_{x_{J^c} ∈ X_{J^c}} Z(x_J, x_{J^c}).

In other words, the value of the marginal function with respect to J at the point x_J is obtained by summing the global function over all configurations that agree with x_J in the coordinates indexed by J. Any variable not indexed by J is said to be marginalized out in Z_J. Note that Z_∅ is the constant obtained by summing over all configurations of variables, while Z_N = Z. We have chosen the symbol Z for the global function, as we view Z_∅ as a “Zustandssumme” (a sum-over-states), i.e., a partition function as in statistical physics (see, e.g., [32, p. 13]).
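For a global function stored exhaustively as a table, the marginal Z_J can be rendered directly as a summation; the following sketch uses an arbitrary toy global function on three binary variables.

```python
from itertools import product

def marginal(Z, J):
    """Marginal Z_J: sum the global function over coordinates not in J.
    Z is a dict mapping configuration tuples to values."""
    J = tuple(sorted(J))
    ZJ = {}
    for x, val in Z.items():
        xJ = tuple(x[i] for i in J)       # projection of x onto J
        ZJ[xJ] = ZJ.get(xJ, 0.0) + val
    return ZJ

# Arbitrary toy global function on three binary variables.
Z = {x: 1 + x[0] + 2 * x[1] * x[2] for x in product((0, 1), repeat=3)}

Z_empty = marginal(Z, [])[()]     # the partition function ("Zustandssumme")
Z_full = marginal(Z, [0, 1, 2])   # marginalizing nothing recovers Z itself
print(Z_empty)                    # 16
```

This exhaustive computation costs |X| operations per marginal; the point of the rest of the section is to exploit factorization so that all marginals can be computed far more cheaply.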
If the function Z is the joint probability mass function of a collection of random variables indexed by N, then Z_J is the marginal joint probability mass function for the random variables indexed by J, and Z_∅ = 1. Reintroducing the parameter y, suppose the function Z(x; y) is the conditional joint probability mass function of a collection of random variables given the observation of a continuous-valued random vector y. Then the marginal functions represent conditional probability mass functions. For example, Z_{{1}}(x_1; y) gives, for each x_1, the conditional probability mass function for x_1 given the observed value of y. Such formulations will often be useful in decoding problems, when the continuous-valued output of a noisy channel is observed.
Fig. 2. General Bayesian network for channel coding.

variables are “clamped.” For example, drawing a configuration of variables consistent with the observed output of a channel is essentially as hard as (or harder than) decoding. Similarly, when a Tanner graph is converted into a Bayesian network in the manner described above, it may be difficult to draw a valid configuration of the variables, as all indicator variables have a nonempty set of parents and all are required to take on the value one.
In coding, the relationships among the information symbols u, the encoder state variables s (if there are any), the transmitted codeword symbols x, and the received signals y completely define the encoding and decoding problem for a given code. Without loss of generality, these relationships can be expressed probabilistically and depicted pictorially using graphical models. The Bayesian network for channel coding in general is shown in Fig. 2. By inspection of the network, the joint distribution is

p(u, s, x, y) = p(u) p(s | u) p(x | u, s) p(y | x).

Usually, p(u) is a uniform distribution and p(s | u) and p(x | u, s) are deterministic (i.e., all probability mass is placed on a single outcome). The channel likelihood p(y | x) expresses the noise and intersymbol interference introduced by the channel.
Fig. 3(a) shows the Bayesian network for a systematic convolutional code with a memoryless channel. The systematic codeword symbols are simply copies of the information symbols. The other codeword symbols are outputs of the encoder; each such symbol depends on an information symbol and a state variable. By inspecting the parents of the received signals, we find that p(y | x) = ∏_k p(y_k | x_k), which expresses the absence of memory in the channel. Fig. 3(b) shows a cycle-free network for the same code, obtained by grouping information and state variables together. This eliminates undirected cycles at the expense of increasing the complexity of some of the network variables.
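As an illustration of the memoryless factorization p(y | x) = ∏_k p(y_k | x_k), the following sketch assumes a concrete channel not fixed by the text: BPSK signaling over additive white Gaussian noise, with a hypothetical noise level sigma.

```python
import math

sigma = 1.0  # hypothetical noise standard deviation (assumption)

def likelihood_symbol(y_k, x_k):
    """p(y_k | x_k): Gaussian density centered on the BPSK point +/-1."""
    s = 1.0 - 2.0 * x_k            # map bit 0 -> +1, bit 1 -> -1
    return (math.exp(-(y_k - s) ** 2 / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

def likelihood(y, x):
    """Memorylessness: the channel likelihood factors symbol by symbol."""
    p = 1.0
    for y_k, x_k in zip(y, x):
        p *= likelihood_symbol(y_k, x_k)
    return p

y = [0.9, -1.2, 0.3]
print(likelihood(y, [0, 1, 0]) > likelihood(y, [1, 0, 1]))  # True
```

It is these per-symbol likelihoods that enter the marginalization framework of the next section as the parameters y of a family of global functions.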
Further examples of Bayesian networks for codes will be discussed in Section V. In the next section, we will describe the basic distributed marginalization algorithm that will form the basis for iterative decoding.
III. A FRAMEWORK FOR DISTRIBUTED MARGINALIZATION

In this section, we develop the basic “probability propagation” algorithm that can be used to compute marginal probabilities in graphical models, given some observations. A common feature of the graphical models described in the previous section is that they can be used to describe a “global” joint probability distribution as a product of “local” functions. The computation of a conditional probability then amounts essentially to a “marginalization” of this global function. Using the structure of the local functions, it may be possible to greatly
Fig. 3. (a) Bayesian network for a systematic convolutional code and a memoryless channel. (b) Cycle-free connected network for the same code and channel.
When n, the number of arguments of Z, is small, we will sometimes use a modified notation for the marginal functions. We replace an argument of Z with a “+” sign to indicate that the corresponding variable is to be summed over, i.e., marginalized out. Thus, if N = {1, 2, 3}, then Z(x_1, +, x_3) denotes the marginal function Z_{{1,3}}(x_1, x_3), and so on.

It will often be useful to marginalize some variables while holding other variables constant, for example, in the case of computing a conditional probability mass function given that some variables are observed. Since the key operation in the computation of a marginal function or in the computation of a conditional probability is marginalization, we shall focus attention on developing efficient algorithms for this operation.
B. Local Functions and Factor Graphs

The key to efficient marginalization is to take into account any structure that the global function Z possesses. Suppose that Z is “separable,” i.e., that Z can be written as the product of a number of local functions, each a function of the variables contained in a subset of N. More precisely, let Q be a collection of nonempty subsets of N, and suppose

Z(x) = ∏_{q ∈ Q} f_q(x_q).     (4)

The functions f_q are called local functions.

For example, suppose that x_1, x_2, x_3, x_4 are random variables forming a Markov chain (in that order) given a specific observation y. (For example, these random variables might represent the state sequence of a convolutional code in successive time intervals, and y might represent the corresponding channel output.) The conditional joint probability mass function can be written as

p(x_1, x_2, x_3, x_4 | y) = p(x_1 | y) p(x_2 | x_1, y) p(x_3 | x_2, y) p(x_4 | x_3, y).

Translating to the notation of this section, and observing that a conditional probability mass function such as p(x_2 | x_1, y) is essentially a function of two variables (since y is a constant), we write

Z(x; y) = f_{{1}}(x_1; y) f_{{1,2}}(x_1, x_2; y) f_{{2,3}}(x_2, x_3; y) f_{{3,4}}(x_3, x_4; y).     (5)

We will have occasion to consider products of local functions. For example, in (5), the product of f_{{1,2}} and f_{{2,3}} is a function of three variables that we denote f_{{1,2}}f_{{2,3}}(x_1, x_2, x_3; y).
Fig. 4. Factor graphs for (a) a Markov chain, (b) a loopy example, and their corresponding second higher power graphs, (c) and (d), omitting self-loops.
We will also apply the “+”-sign notation to local functions and their products.
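A small numerical instance of the factorization (5) can be written down directly; the probability tables below are arbitrary stand-ins, with the dependence on y folded into the numbers.

```python
from itertools import product

# Toy local functions for a 4-variable binary Markov chain; each table
# is a normalized (conditional) pmf with arbitrary illustrative values.
f1 = {(0,): 0.6, (1,): 0.4}                                    # f_{1}(x1)
f12 = {(a, b): 0.7 if a == b else 0.3 for a in (0, 1) for b in (0, 1)}
f23 = {(a, b): 0.8 if a == b else 0.2 for a in (0, 1) for b in (0, 1)}
f34 = {(a, b): 0.9 if a == b else 0.1 for a in (0, 1) for b in (0, 1)}

def Z(x):
    """Global function (5): the product of the four local functions."""
    x1, x2, x3, x4 = x
    return f1[(x1,)] * f12[(x1, x2)] * f23[(x2, x3)] * f34[(x3, x4)]

# Because each local function is a properly normalized conditional
# probability table, Z is a joint pmf and sums to one over X.
total = sum(Z(x) for x in product((0, 1), repeat=4))
print(round(total, 10))  # 1.0
```

Each local function touches at most two variables, which is precisely the structure the factor graph of Fig. 4(a) displays and the message-passing algorithm below exploits.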
It will be useful to display a particular factorization of the global function by means of a bipartite graph called a factor graph. Suppose Z factors as in (4). The corresponding factor graph F is a bipartite graph with vertex set N ∪ Q. The only edges in F are those that connect a vertex i ∈ N to a vertex q ∈ Q if and only if i ∈ q. In words, each vertex of a factor graph corresponds to either a variable or a local function. An edge joins a variable x_i to a local function f_q if and only if x_i appears as an argument of f_q. For example, Fig. 4 shows the factor graph corresponding to the Markov chain (5). Note that a factor graph is essentially a generalization of a Tanner graph, in which local “checks” involving the incident variables have been replaced with local functions involving the incident variables.

It is a straightforward exercise to convert the various graphical models described in Section II into a factor graph representation. A Markov random field that expresses a Gibbs potential function yields a factor graph with one local function vertex for every maximal clique, i.e., a local function vertex for every factor in (1). A Tanner graph directly yields a factor graph by associating with each check vertex a binary indicator function that indicates whether the local check condition is satisfied. More generally, each factor of (2) can be associated with a local function vertex, as in the MRF case. Finally, a Bayesian network is converted into a factor graph by introducing a local function vertex for every factor of (3), and a variable vertex for each variable. Clearly, the local function vertex associated with
Fig. 5. Computation of marginal functions by message passing. The messages are descriptions of (possibly marginalized) local function products, and are passed along the graph edges as indicated by the arrows. (The arrows themselves are not part of the graph.)

the factor p(v_i | pa(v_i)) is connected by an edge to the variable vertices in the set {v_i} ∪ pa(v_i). Thus, a Bayesian network with n variables yields a factor graph with 2n vertices.
C. Marginalization by Message Passing

Our aim is to derive a graph-based algorithm for computing the marginal functions Z_{{i}} for all i ∈ N. The functions f_q are called “local” functions because we assume that access to these functions is local to a specific vertex in the graph. Knowledge of local functions (or functions derived from local functions) can be propagated to nonlocal vertices by “message passing” along the edges of the graph. The “messages” are descriptions of (possibly marginalized) local function products. To illustrate what we mean by this, and to motivate the general situation in which this “graph-based message-passing” paradigm is made more precise, we consider the specific case where Z is defined in (5).

Consider the computation of Z(x_1, +, +, +). We write

Z(x_1, +, +, +) = f_{{1}}(x_1) Σ_{x_2} f_{{1,2}}(x_1, x_2) [Σ_{x_3} f_{{2,3}}(x_2, x_3) [Σ_{x_4} f_{{3,4}}(x_3, x_4)]]     (6)

where we have identified the various factors that need to be computed to obtain Z(x_1, +, +, +). Our primary observation is that Z(x_1, +, +, +) can be computed knowing just f_{{1}}, f_{{1,2}}, and the bracketed factor Σ_{x_3} f_{{2,3}}(x_2, x_3) [Σ_{x_4} f_{{3,4}}(x_3, x_4)]. The latter factor can be computed knowing just f_{{2,3}} and Σ_{x_4} f_{{3,4}}(x_3, x_4).

Analyzing the computation of the remaining marginal functions in the same manner, we find that

Z(+, x_2, +, +) = [Σ_{x_1} f_{{1}}(x_1) f_{{1,2}}(x_1, x_2)] [Σ_{x_3} f_{{2,3}}(x_2, x_3) [Σ_{x_4} f_{{3,4}}(x_3, x_4)]]     (7)

Z(+, +, x_3, +) = [Σ_{x_2} [Σ_{x_1} f_{{1}}(x_1) f_{{1,2}}(x_1, x_2)] f_{{2,3}}(x_2, x_3)] [Σ_{x_4} f_{{3,4}}(x_3, x_4)].     (8)

Examining (6)–(8), we see that all marginal functions can be computed recursively from a chain of local function products, which we view as messages passed between the vertices of the “propagation tree” shown in Fig. 5. Comparing with (6)–(8), we observe that the information passed to the vertex associated with a given local function is precisely that needed to compute the marginal function for one of its arguments, and so we choose that vertex as a “fusion site” for that variable.
We consider now the general case. Let F be a factor graph, describing the way in which the global function factors as a product of local functions as in (4). Consider the second higher power graph F², defined as having the same vertex set as F, but with edges connecting two vertices if and only if there is a path of length two between the vertices in F. Self-loops are ignored. Since F is bipartite, F² will always split into at least two disconnected components, one having vertices associated with variables and one having vertices associated with local functions. For example, Fig. 4(c) and (d) shows the second higher power graphs associated with the factor graphs shown in (a) and (b).

Observe that in F², two variables will be joined by an edge if they are both arguments of the same local function, so all arguments of a particular local function form a clique. When the local functions of the factor graph correspond to Gibbs potential functions, then the variable component of F² is the corresponding Markov random field. In other words, an MRF can be recovered as a component of the second higher power graph (omitting self-loops) of the corresponding factor graph.
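A minimal sketch of the second higher power graph construction, applied to the chain factor graph of (5); the vertex names are illustrative.

```python
from itertools import combinations

# Factor graph for the chain (5): variable vertices 1-4, and one function
# vertex per local function, named here by its argument set.
edges = {("f1", 1), ("f12", 1), ("f12", 2), ("f23", 2), ("f23", 3),
         ("f34", 3), ("f34", 4)}

def second_power_components(edges):
    """Edges of F^2 (paths of length two, self-loops omitted). Because F is
    bipartite these split into a variable-variable component and a
    function-function component (the propagation graph)."""
    var_edges, fun_edges = set(), set()
    for (f, v), (g, w) in combinations(edges, 2):
        if f == g and v != w:            # path v - f - w between variables
            var_edges.add(tuple(sorted((v, w))))
        if v == w and f != g:            # path f - v - g between functions
            fun_edges.add(tuple(sorted((f, g))))
    return var_edges, fun_edges

var_edges, fun_edges = second_power_components(edges)
print(sorted(var_edges))  # [(1, 2), (2, 3), (3, 4)] -- the MRF of the chain
print(sorted(fun_edges))  # [('f1', 'f12'), ('f12', 'f23'), ('f23', 'f34')]
```

The variable component reproduces the chain-shaped MRF, and the function component is the propagation graph used in the remainder of this section.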
Consider now the component of F² corresponding to the local functions, which we call the propagation graph P. We assume that P consists of a single connected component; if not, marginalization can be carried out independently in the various components. The vertices of P correspond to local functions. Two vertices are joined by an edge for each argument that the corresponding local functions have in common, although we will collapse multiple edges between two vertices to a single edge. A description of the general message-passing algorithm is simplified by imagining that a vertex is an active processing element, capable of receiving and transmitting messages (i.e., marginalized local function products) along the edges incident on the vertex, and capable of performing computations involving messages and the local function associated with the vertex.

We now describe a general distributed marginalization algorithm that operates in a tree spanning the propagation graph; we refer to this spanning tree as a propagation tree. Given a factor graph F, we must

1) specify a spanning tree T for P, and
2) identify a “fusion vertex” in T for each marginal function to be computed.

Note that T is in general quite different from the graphical model (MRF, Tanner graph, or Bayesian network) from which it is derived. In general, it may be a difficult problem to choose T optimally so as, e.g., to minimize overall computational complexity. For now, we assume that T is chosen in some (arbitrary) manner.
For a given propagation tree, we say that a variable is involved at a vertex if the variable is an argument of the corresponding local function. We say that a given variable must be carried over an edge of the tree if that edge is part of a path joining any vertex in which the variable is involved with the fusion vertex for that variable. In essence, a variable must be carried over every edge of the subtree that spans its fusion vertex and the other vertices in which it is involved. Outside this subtree, only marginal knowledge of the variable is needed, and hence it can be marginalized out. Given a propagation tree and an assignment of fusion vertices, it is easy to determine which variables must be carried over any given edge. (For example, each edge of the trees shown in Figs. 5 and 6 is labeled with the indexes of the variables to be carried over that edge.)
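The carried-over test can likewise be sketched. In the hypothetical code below (names and data structures are our own), removing an edge splits the propagation tree into two components; a variable must cross the edge exactly when some vertex involving it lies in the component away from the variable's fusion vertex.

```python
from collections import defaultdict

def carried_variables(tree_edges, involved, fusion):
    """tree_edges: (u, v) pairs forming a tree over local-function vertices.
    involved: variable -> set of vertices whose function takes it as argument.
    fusion:   variable -> the fusion vertex chosen for that variable.
    Returns a dict mapping each edge to the set of variables carried over it."""
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)

    def side_of(start, blocked):
        # Vertices reachable from `start` without crossing edge `blocked`.
        seen, stack = {start}, [start]
        while stack:
            w = stack.pop()
            for n in adj[w]:
                if {w, n} != set(blocked) and n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    carried = {}
    for u, v in tree_edges:
        near_u = side_of(u, (u, v))
        carried[(u, v)] = set()
        for x, verts in involved.items():
            # The side of the cut that does NOT contain x's fusion vertex:
            far = near_u if fusion[x] not in near_u else set(adj) - near_u
            if verts & far:  # some involved vertex lies on the far side
                carried[(u, v)].add(x)
    return carried

# Tiny chain example (our own): g = f1(x1,x2) f2(x2,x3) f3(x3,x4)
tree = [("f1", "f2"), ("f2", "f3")]
involved = {"x1": {"f1"}, "x2": {"f1", "f2"},
            "x3": {"f2", "f3"}, "x4": {"f3"}}
fusion = {"x1": "f1", "x2": "f2", "x3": "f2", "x4": "f3"}
print(carried_variables(tree, involved, fusion))
# x2 must cross edge (f1,f2); x3 must cross edge (f2,f3)
```

In this chain, x1 and x4 need cross no edge at all, since each appears at a single vertex that is also its fusion vertex.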
The size of the messages sent over an edge is greatly influenced by the number of variables that must be carried over the edge, and by the number of possible values that each such variable can assume. A simplistic measure of the complexity associated with an edge is its thickness, defined as the number of variables that are to be carried over that edge. (A more useful measure would be the product of the sizes of the symbol alphabets corresponding to these variables, or the size of a minimal description of the corresponding local function product.) It may be desirable to find a propagation tree and an assignment of fusion vertices so that the maximum thickness is minimized, but we conjecture that this is a hard problem in general. (Given a propagation tree, the maximum thickness is minimized whenever the fusion vertex for each variable lies in the subtree spanning the vertices in which that variable is involved, so the problem is to find a suitable propagation tree.)
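Both complexity measures mentioned above are straightforward to compute once the carried-variable sets are known. The edge labels and alphabet sizes below are hypothetical illustrations of our own, not those of Figs. 5 and 6:

```python
import math

# Hypothetical carried-variable sets for the two edges of a small
# propagation tree (illustrative data, not taken from the paper).
carried = {
    ("f1", "f2"): {"x2"},
    ("f2", "f3"): {"x3", "x4"},
}
alphabet_size = {"x2": 2, "x3": 4, "x4": 2}  # assumed symbol alphabets

# Simplistic measure: thickness = number of carried variables.
thickness = {e: len(vs) for e, vs in carried.items()}
max_thickness = max(thickness.values())      # here, 2 on edge (f2, f3)

# More useful measure: product of the carried variables' alphabet sizes.
message_size = {e: math.prod(alphabet_size[x] for x in vs)
                for e, vs in carried.items()}
print(max_thickness, message_size[("f2", "f3")])  # → 2 8
```

The second measure makes plain why thickness alone is simplistic: an edge carrying one variable over a large alphabet can demand a larger message than an edge carrying several binary variables.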
To illustrate that it is not always possible to achieve unit maximum thickness, consider the global function whose factor graph and second higher power graph are shown in Fig. 4(b) and (d). By symmetry, there is essentially only one propagation tree for this function, as shown in Fig. 6. Numbering the vertices in t
