Marco Gomes, Gabriel Falcão, Vitor Silva, Vitor Ferreira, Alexandre Sengo and Miguel Falcão*
Instituto de Telecomunicações, Pólo II da Universidade de Coimbra, 3030-290 Coimbra, Portugal
*Chipidea Microelectrónica S.A., Rua Frederico Ulrich, n. 2650, 4470-605 Moreira da Maia, Portugal
e-mail: marco@co.it.pt, gff@co.it.pt, vitor@co.it.pt, vitorhugo@co.it.pt, sengo@co.it.pt, mfalcao@chipidea.com
Abstract - State-of-the-art decoders for DVB-S2 low-density parity-check (LDPC) codes explore semi-parallel architectures based on the periodicity factor $M = 360$ of the special type of LDPC-IRA codes adopted. This paper addresses the generalization of a well-known hardware M-kernel parallel structure and proposes an efficient partitioning by any factor of M, without addressing overhead and keeping the efficient message memory mapping scheme unchanged. Our method provides a simple and efficient way to reduce the decoder complexity. Synthesizing the decoder for a Xilinx FPGA shows a minimum throughput above the mandatory 90 Mbps.
I. INTRODUCTION

The recent Digital Video Satellite Broadcast Standard (DVB-S2) [1] [2] has adopted a powerful FEC scheme based on the serial concatenation of BCH and Low-Density Parity-Check (LDPC) codes. This new FEC structure, combined with the adoption of high-order modulations (QPSK, 8PSK, 16APSK and 32APSK), is able to provide capacity gains of about 30% over the previous DVB-S standard [2], with the LDPC codes playing a fundamental role in this performance gain.

LDPC codes are linear block codes defined by sparse parity-check matrices, H [3] [4] [5], and are usually represented by Tanner graphs [6]. A Tanner graph is a bipartite graph formed by two types of nodes: check nodes ($\nu_C$), one per code constraint, and bit nodes, one per codeword bit (information and parity, respectively $\nu_I$ and $\nu_P$), with the connection edges between them being given by H. LDPC codes are decoded using low-complexity iterative belief propagation algorithms operating over the Tanner graph description [7]. However, a major drawback is their high encoding complexity, caused by the fact that the generator matrix, G, is, in general, not sparse. In order to overcome this problem, the DVB-S2 standard has adopted a special class of LDPC codes with linear encoding complexity, known as Irregular Repeat-Accumulate (IRA) codes [8] [9].

An important issue in the design of LDPC encoder and decoder architectures for DVB-S2 is the fact that the standard supports two different frame lengths (16200 bits for low-delay applications and 64800 bits otherwise), a set of different code rates (1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9 and 9/10, the last one only for the 64800-bit frames) and different modulation schemes [1] [9]. A different LDPC code is defined for each mode of operation and, although these codes share a similar structure and properties, this still poses an enormous challenge to the development of an encoder and a decoder fully compliant with all operating modes.

The state-of-the-art decoder is based on a flexible partial parallel architecture that explores the $M = 360$ periodicity of DVB-S2 LDPC codes [10]. Although capable of providing a throughput far above the minimum mandatory rate of 90 Mbps, this architecture requires an ASIC area of 22.74 mm² in ST Microelectronics 0.13 µm technology, mainly due to the high number (360) of computation kernels, or functional units (FU), and to the large width of the barrel shifter. In order to decrease the number of computation kernels to only 45 FUs and to reduce the width of the barrel shifter, an alternative solution that uses a re-structured version of H was proposed in [11]. As a consequence, this approach increases the complexity of the DVB-S2 de-interleaver and almost doubles the input memory relative to [10].

In this paper we generalize the architecture [10] and overcome its disadvantages. We will show that it is possible to reduce the number of computation kernels by any integer factor of $M = 360$, without addressing overhead and keeping the efficient message memory mapping scheme [10] unchanged. Our strategy also reduces the length of the barrel shifter by the same factor and considerably simplifies the routing problem. The throughput is reduced by the same factor, but this is not a real problem, since the architecture [10] provides a throughput far above the mandatory minimum rate. Thus, we provide a simple and efficient method to reduce the decoder complexity without compromising the throughput goals.

The next section briefly describes DVB-S2 LDPC-IRA codes. Section III addresses LDPC decoding for DVB-S2 using a partial parallel architecture and its generalization by sub-sampling by any integer factor of M. Synthesis results are presented in Section IV and final conclusions are drawn in Section V.
CALTECH - EXHIBIT 2007
Apple Inc. v. California Institute of Technology
IPR2017-00219

II. DVB-S2 LDPC-IRA CODES

The new DVB-S2 standard [1] [9] adopted a special class of LDPC codes, known as IRA codes [8], as the main solution for the FEC system. An IRA code is characterized by a parity-check matrix, H, of the form

$$
H_{(n-k)\times n} = \left[\, A_{(n-k)\times k} \;\middle|\; B_{(n-k)\times(n-k)} \,\right] =
\left[\begin{array}{cccc|cccccc}
a_{0,0} & a_{0,1} & \cdots & a_{0,k-1} & 1 & 0 & 0 & \cdots & 0 & 0 \\
a_{1,0} & a_{1,1} & \cdots & a_{1,k-1} & 1 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & 0 & \ddots & \ddots & \ddots & \vdots & \vdots \\
a_{n-k-2,0} & a_{n-k-2,1} & \cdots & a_{n-k-2,k-1} & 0 & 0 & \cdots & 1 & 1 & 0 \\
a_{n-k-1,0} & a_{n-k-1,1} & \cdots & a_{n-k-1,k-1} & 0 & 0 & \cdots & 0 & 1 & 1
\end{array}\right], \qquad (1)
$$

where B is a staircase lower triangular matrix. By restricting A to be sparse, an LDPC-IRA code is obtained [9].

The H matrices of the DVB-S2 LDPC codes have other properties beyond being of IRA type. Some periodicity constraints were put on the pseudo-random design of the A matrices, which allows a significant reduction of the storage requirement without code performance loss.

The construction technique of matrix A is based on dividing the $\nu_I$ nodes into disjoint groups of M consecutive nodes. All the $\nu_I$ nodes of a group l must have the same weight, $w_l$, and it is only necessary to choose the $\nu_C$ nodes that connect to the first $\nu_I$ of the group in order to specify the $\nu_C$ nodes that connect to each one of the remaining $M-1$ nodes. The connection choice for the first element of group l is pseudo-random, with the restrictions that the resulting LDPC code is free of length-4 cycles, that the number of length-6 cycles is as small as possible, and that all the $\nu_C$ nodes connect to the same number of $\nu_I$ nodes.

Denoting by $r_1, r_2, \ldots, r_{w_l}$ the indices of the $\nu_C$ nodes that connect to the first $\nu_I$ of group l, the indices of the $\nu_C$ nodes that connect to the i-th $\nu_I$ of group l, with $0 \le i \le M-1$, can be obtained by

$$
\{(r_1 + i\times q) \bmod (n-k),\ (r_2 + i\times q) \bmod (n-k),\ \ldots,\ (r_{w_l} + i\times q) \bmod (n-k)\}, \qquad (2)
$$

with $M = 360$ (a common factor for all DVB-S2 supported codes) and $q = (n-k)/M$.

Another property of matrix A is that, for each supported code, there is a set of groups of $\nu_I$ nodes with constant weight $w > 3$ (w is code dependent), while all the remaining groups have weight 3.

III. DVB-S2 LDPC DECODING

The huge dimensions of the LDPC-IRA codes adopted by the DVB-S2 standard make impractical the adoption of a fully parallel architecture that maps the Tanner graph structure [12]. Besides that, such a solution is code dependent, which means that a different fully parallel decoder would be required for each code defined in the standard.

The best known solutions are based on highly vectorized partial parallel architectures [10] [11] that explore the particular characteristics of the DVB-S2 LDPC-IRA codes, namely the periodic nature ($M = 360$) shared by all the codes. One solution was proposed in [10], whose architecture uses M functional units working in parallel. In this paper we will show that it is possible to reduce the number of functional units by any integer factor of M, without addressing overhead and keeping its efficient memory mapping scheme unchanged. Our approach not only overcomes the disadvantages of the architecture [10], but also makes it flexible and easily reconfigurable according to the decoder constraints.

A. Modulo M parallel architecture

As previously described, DVB-S2 adopted a special class of structured LDPC-IRA codes with the properties stated in (2). This makes possible the simultaneous processing of $\nu_I$ and $\nu_C$ node sets whose indices are given by

$$
\begin{aligned}
C^{(c)} &= \{c,\ c+1,\ \ldots,\ c+M-1\}, && \text{with } c \bmod M = 0, \ \text{and} \\
R^{(r)} &= \{r,\ r+q,\ r+2q,\ \ldots,\ r+(M-1)\times q\}, && \text{with } 0 \le r \le q-1,
\end{aligned} \qquad (3)
$$

respectively (the superscript is the index of the first element of the set, and 'r' and 'c' mean row and column of H), which significantly simplifies the decoder control. In fact, according to (2), if $\nu_I^{c}$ is connected to $\nu_C^{r}$, then $\nu_I^{\tilde c + ((c - \tilde c + i) \bmod M)}$, with $0 \le i \le M-1$, will be connected to $\nu_C^{(r + i\times q) \bmod (n-k)}$, where $\tilde c = M \times (c \operatorname{div} M)$ is the index of the first $\nu_I$ of the group $C^{(\tilde c)}$ to which $\nu_I^{c}$ belongs.

The architecture shown in Fig. 1 is based on M functional units (FU) working in parallel with shared control signals [12], which process both $\nu_C$ nodes (in check mode) and $\nu_I$ nodes (in bit mode) in a flooding schedule manner [13] [14]. Given the zigzag connectivity between $\nu_P$ and $\nu_C$ nodes, these are updated jointly in check mode following a horizontal schedule approach [15]. A detailed description of the FU operation can be found in [12].

[Figure 1: M functional units (FU0, FU1, ..., FUM-1) under shared control ('0' - IN mode, '1' - CN mode), a barrel shifter, message memories Message MEM 0 to Message MEM (M-1) in RAM (q x wCN mem positions), ROMs for SHIFTS and ADDRESSES, and a sequential counter.]
Figure 1. Modulo M parallel architecture for DVB-S2 LDPC decoding.

Memory mapping and shuffling mechanism

As mentioned before, a single FU is shared by a constant number of $\nu_I$, $\nu_C$ and $\nu_P$ nodes (the last two are processed jointly), depending on the code length and rate.
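To make this workload and the index rule (2) concrete, here is a minimal Python sketch. It assumes k = rate × 64800 for the normal frames and computes α = k/M (the $\nu_I$ nodes per FU) and q = (n − k)/M (the $\nu_C$/$\nu_P$ nodes per FU). It then expands, with (2), the check neighbours of all M nodes of a group, and checks that following one edge across the whole group visits exactly one set $R^{(r)}$ of (3). The helper names and the first-node indices [7, 1000, 20000] are hypothetical; they are not values from the standard's tables.

```python
from fractions import Fraction

M = 360           # periodicity factor common to all DVB-S2 LDPC codes
N_NORMAL = 64800  # normal (long) frame length in bits

# Code rates of the 64800-bit frames; k = rate * n is assumed here.
RATES = ["1/4", "1/3", "2/5", "1/2", "3/5", "2/3", "3/4", "4/5", "5/6", "8/9", "9/10"]


def code_constants(n, k):
    """Return (alpha, q): nu_I nodes per FU (bit mode), nu_C/nu_P nodes per FU (check mode)."""
    if k % M or (n - k) % M:
        raise ValueError("n and k must be multiples of M = 360")
    return k // M, (n - k) // M


def group_neighbours(first_rows, q, n_k):
    """Eq. (2): check indices of the M nu_I nodes of a group, given r_1..r_wl of its first node."""
    return [[(r + i * q) % n_k for r in first_rows] for i in range(M)]


def r_set(r0, q):
    """Eq. (3): R(r0) = {r0, r0+q, ..., r0+(M-1)q}, with 0 <= r0 <= q-1."""
    return {r0 + j * q for j in range(M)}


if __name__ == "__main__":
    for rate in RATES:
        k = int(Fraction(rate) * N_NORMAL)
        alpha, q = code_constants(N_NORMAL, k)
        print(f"rate {rate:>4}: k = {k:5d}, alpha = {alpha:3d}, q = {q:3d}")

    # Hypothetical first-node connections of one weight-3 group of the rate-1/2 code.
    k = N_NORMAL // 2
    _, q = code_constants(N_NORMAL, k)
    first_rows = [7, 1000, 20000]
    nbrs = group_neighbours(first_rows, q, N_NORMAL - k)
    # Following edge t across the whole group visits exactly one R set of (3).
    for t, r in enumerate(first_rows):
        assert {row[t] for row in nbrs} == r_set(r % q, q)
    print("each edge of the group spans one R set")
```

For the rate-1/2 code, for instance, the sketch gives α = q = 90, so each of the 360 FUs updates 90 $\nu_I$ nodes in bit mode and 90 $\nu_C$/$\nu_P$ pairs in check mode.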
More precisely, for an (n, k) DVB-S2 LDPC-IRA code, $\mathrm{FU}_i$, with $0 \le i \le M-1$, sequentially updates in bit mode the $\nu_I$ nodes $\{i,\ i+M,\ i+2M,\ \ldots,\ i+(\alpha-1)\times M\}$, with $\alpha = k/M$. In check mode, the same FU updates the $\nu_C$ nodes $\{j,\ j+1,\ \ldots,\ j+q-1\}$ and the $\nu_P$ nodes $\{j,\ j+1,\ \ldots,\ j+q-1\}$, with $j = i\times q$. This guarantees that, when the group $C^{(c)}$ is processed simultaneously, the computed messages have as destination a set $R^{(r)}$, each element of which will be processed by a different FU. Considering (2), the newly computed messages only need to be right-rotated to be handled by the correct $\nu_C$ nodes. The same happens when processing each $R^{(r)}$ set, where, according to (2), the right rotation must be reversed so that the newly computed messages reach exactly the right $\nu_I$ nodes. The shuffling network (barrel shifter) is responsible for the correct message exchange between $\nu_C$ and $\nu_I$ nodes, emulating the code Tanner graph. The shift values stored in ROM (Fig. 1) can be easily obtained from the tables in annexes B and C of the DVB-S2 standard [1].

The messages sent along the Tanner graph edges are stored in RAM (see Fig. 1). If we adopt sequential RAM access in bit mode, then the access in check mode must be indexed, or vice versa. Both options are valid, so, without loss of generality, we assume sequential access in bit mode. Denoting by $\mathbf{r}_i = [r_1^i\ r_2^i\ \cdots\ r_{w_i}^i]^T$ the vector of $\nu_C$ node indices connected to the $\nu_I^i$ node of weight $w_i$, the message memory mapping can be obtained using the following matrix,

$$
\mathbf{R}_{(q\times w_C)\times M} =
\begin{bmatrix}
\mathbf{r}_0 & \mathbf{r}_1 & \cdots & \mathbf{r}_{M-1} \\
\mathbf{r}_M & \mathbf{r}_{M+1} & \cdots & \mathbf{r}_{2M-1} \\
\vdots & \vdots & & \vdots \\
\mathbf{r}_{(\alpha-1)\times M} & \mathbf{r}_{(\alpha-1)\times M+1} & \cdots & \mathbf{r}_{\alpha\times M-1}
\end{bmatrix}, \qquad (4)
$$

where $w_C$ is a code constant (the $\nu_C$ weight is $w_C + 2$, except for the first one, whose weight is $w_C + 1$; see (1)). In order to process each $R^{(r)}$ set in check mode, the required memory addresses can be obtained by finding the rows of matrix R where the index r appears.

B. Sub-sampling by a factor of M

The simplicity of the shuffling mechanism and the efficient memory mapping scheme constitute the major strengths of the architecture just described [10]. However, the high number of FUs and the large width of the barrel shifter require a huge silicon area. Since this architecture is able to provide a throughput far above the minimum mandatory rate of 90 Mbps, we may reduce the number of FUs. In fact, we will show that this can be done by any factor of M.

Let $L, N \in \mathbb{N}$ be factors of M, with $M = L\times N$, and consider a $C^{(c)}$ set (3). This group can be decomposed by down-sampling into L subgroups as:

$$
\begin{aligned}
C_0^{(c)} &= \{c,\ c+L,\ c+2L,\ \ldots,\ c+(N-1)\times L\}, \\
C_1^{(c)} &= \{c+1,\ c+L+1,\ c+2L+1,\ \ldots,\ c+(N-1)\times L+1\}, \\
&\ \ \vdots \\
C_{L-1}^{(c)} &= \{c+L-1,\ c+2L-1,\ c+3L-1,\ \ldots,\ c+N\times L-1\}.
\end{aligned} \qquad (5)
$$

Each subgroup $C_\gamma^{(c)}$, with $0 \le \gamma \le L-1$, can be described in terms of its first node, $\nu_I^{c+\gamma}$, using (2). If $\nu_C^{r}$ is connected to the first information node of the subgroup, $\nu_I^{c+\gamma}$, then $\nu_C^{(r + i\times L\times q) \bmod (n-k)}$ is connected to the i-th $\nu_I$ node, with $0 \le i \le N-1$, of that subgroup.

Equally, the same down-sampling process by L can be applied to each $R^{(r)}$ group as:

$$
\begin{aligned}
R_0^{(r)} &= \{r,\ r+L\times q,\ r+2\times L\times q,\ \ldots,\ r+(N-1)\times L\times q\}, \\
R_1^{(r)} &= \{r+q,\ r+(L+1)\times q,\ r+(2L+1)\times q,\ \ldots,\ r+((N-1)\times L+1)\times q\}, \\
&\ \ \vdots \\
R_{L-1}^{(r)} &= \{r+(L-1)\times q,\ r+(2L-1)\times q,\ r+(3L-1)\times q,\ \ldots,\ r+(N\times L-1)\times q\},
\end{aligned} \qquad (6)
$$

and, in a similar way, each subgroup $R_\beta^{(r)}$, with $0 \le \beta \le L-1$, can be described just in terms of its first element, $\nu_C^{r+\beta\times q}$. If $\nu_I^{c}$ is connected to the first node of subset $R_\beta^{(r)}$, then $\nu_I^{\tilde c + ((c - \tilde c + i\times L) \bmod M)}$, with $\tilde c = M\times(c \operatorname{div} M)$, is connected to the i-th $\nu_C$, with $0 \le i \le N-1$, of that subgroup.

From the framework just described in (5) and (6), we conclude that the down-sampling approach preserves the key modulo M properties. Thus, we can process each $C_\gamma^{(c)}$ and $R_\beta^{(r)}$ subgroup individually, and the same architecture [10] can be used with only N processing units, as shown in Fig. 2. In fact, when a group $C_\gamma^{(c)}$ is processed simultaneously, the computed messages have as destination a set $R_\beta^{(r)}$, and vice versa.

Memory mapping and shuffling mechanism

The down-sampling strategy allows a linear reduction (by a factor of L) of the hardware resources occupied by the FU blocks, significantly reduces the complexity of the barrel shifter ($O(N \log_2 N)$) and simplifies the routing problem. Yet, at first glance, it may seem that this strategy implies an increase by L in the size of the system ROM (Shifts and Addresses). Fortunately, if we know the properties of the subgroups $C_0^{(c)}$ and $R_0^{(r)}$, we automatically know the properties of the remaining subgroups, $C_\gamma^{(c)}$ and $R_\beta^{(r)}$ respectively, with $0 \le \gamma, \beta \le L-1$. By a proper message memory mapping based on a convenient reshape by L of the matrix R (4), we can keep the size of the system ROM unchanged and compute on the fly the new shift and address values as functions of the ones stored in the ROM of Fig. 2, i.e., those of all the $C_0^{(c)}$ and $R_0^{(r)}$ groups.
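The decomposition in (5) and (6), and the claim that the sub-sampled groups keep the modulo M structure, can be checked numerically. Below is a minimal Python sketch with 0-based indices as in (3), using the rate-1/2 normal-frame constants q = 90 and n − k = 32400 (from q = (n−k)/M), L = 8 as in Section IV, and a hypothetical first-node check index. The function names are illustrative rather than taken from the architecture [10], and the sketch does not model the reshaped memory mapping or the on-the-fly shift and address computation.

```python
M = 360  # periodicity factor of the DVB-S2 LDPC-IRA codes


def factor_pairs(m=M):
    """All (L, N) with L * N = m: L = down-sampling factor, N = number of FUs."""
    return [(L, m // L) for L in range(1, m + 1) if m % L == 0]


def c_subgroups(c, L):
    """Eq. (5): split C(c) = {c, ..., c+M-1} into L subgroups of N = M/L nodes."""
    N = M // L
    return [[c + g + i * L for i in range(N)] for g in range(L)]


def r_subgroups(r, q, L):
    """Eq. (6): split R(r) = {r, r+q, ..., r+(M-1)q} into L subgroups of N = M/L nodes."""
    N = M // L
    return [[r + (b + i * L) * q for i in range(N)] for b in range(L)]


def check_of(offset, r_first, q, n_k):
    """Eq. (2): check index reached from the offset-th nu_I of a group
    through the edge whose first-node check index is r_first."""
    return (r_first + offset * q) % n_k


if __name__ == "__main__":
    L, q, n_k = 8, 90, 32400  # L = 8 gives N = 45 FUs; q, n-k of the rate-1/2 normal frame
    N = M // L
    cs, rs = c_subgroups(0, L), r_subgroups(5, q, L)
    # The subgroups partition the original modulo-M sets.
    assert sorted(x for s in cs for x in s) == list(range(M))
    assert sorted(x for s in rs for x in s) == [5 + j * q for j in range(M)]

    r_first = 1000  # hypothetical first-node check index
    for g in range(L):
        r = check_of(g, r_first, q, n_k)
        # If the first node of C_gamma reaches r, its i-th node reaches r + i*L*q.
        for i in range(N):
            assert check_of(g + i * L, r_first, q, n_k) == (r + i * L * q) % n_k
        # Processing C_gamma therefore hits exactly one R_beta set.
        hits = {check_of(g + i * L, r_first, q, n_k) for i in range(N)}
        assert hits == set(r_subgroups(r_first % q, q, L)[(r_first // q + g) % L])

    print(len(factor_pairs()), "choices of (L, N); e.g.",
          [p for p in factor_pairs() if p[0] in (4, 8)])
```

Running it lists 24 admissible (L, N) pairs for M = 360, including (4, 90) and (8, 45), the two configurations discussed in Section IV.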
For the configuration shown in Fig. 2, each $\mathrm{FU}_i$, with $0 \le i \le N-1$, is now responsible for processing $L\times\alpha$ information nodes in the following order:

$$
\begin{aligned}
\{\, & i\times L,\ i\times L+M,\ i\times L+2M,\ \ldots,\ i\times L+(\alpha-1)\times M; \\
& i\times L+1,\ i\times L+1+M,\ \ldots,\ i\times L+1+(\alpha-1)\times M; \\
& \ldots; \\
& i\times L+L-1,\ i\times L+L-1+M,\ \ldots,\ i\times L+L-1+(\alpha-1)\times M \,\},
\end{aligned} \qquad (7)
$$

and $L\times q$ check and parity nodes, $\{j,\ j+1,\ \ldots,\ j+L\times q-1\}$, with $j = i\times L\times q$. In this way, when a subgroup $C_\gamma^{(c)}$ of (5) is processed, $\mathrm{FU}_i$ handles its i-th element, $c+\gamma+i\times L$.

[Figure 2: N functional units (FU0, FU1, ..., FUN-1) under shared control ('0' - IN mode, '1' - CN mode), a barrel shifter, message memory blocks A0 to A359 (q x wCN positions), ROMs for SHIFTS and ADDRESSES, a sequential counter, and Address Calculator and Shift Calculator blocks.]
Figure 2. Factorizable modulo M parallel architecture for DVB-S2 LDPC decoding.

IV. SYNTHESIS RESULTS

The architecture of Fig. 2 was synthesized on Xilinx Virtex-II Pro FPGAs (XC2VP). For the XC2VPxx family it is necessary to use a factor $L = 8$ (45 FUs) due to internal memory limitations. In fact, synthesis results show that at least the XC2VP50 FPGA is required to guarantee the minimum memory resources needed to implement all code rates and lengths. However, this particular choice uses less than 50% of the available FPGA slices. Using external memory, it would be possible to choose the lower-cost XC2VP30 FPGA. The XC2VP100 FPGA allows the implementation of the architecture of Fig. 2 with 90 FUs ($L = 4$), which doubles the throughput.

V. CONCLUSIONS

This paper addresses the generalization of a state-of-the-art M-kernel parallel structure for LDPC-IRA DVB-S2 decoding to any integer factor of $M = 360$ by means of sub-sampling, keeping the efficient message memory mapping structure unchanged and without addressing overheads. This architecture proves to be flexible and easily reconfigurable according to the decoder constraints, and represents a trade-off between silicon area and decoder throughput. Synthesis results show that a complete LDPC-IRA DVB-S2 decoder can be implemented with 45 functional units on the Xilinx XC2VP FPGA family.

REFERENCES

[1] ETSI, "Digital video broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broad-band satellite applications," EN 302 307 V1.1.1, 2005.
[2] A. Morello and V. Mignone, "DVB-S2: The second generation standard for satellite broad-band services," Proceedings of the IEEE, vol. 94, pp. 210-227, 2006.
[3] R. G. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. 8, pp. 21-28, 1962.
[4] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, pp. 399-431, 1999.
[5] S. Y. Chung, G. D. Forney, T. J. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Communications Letters, vol. 5, pp. 58-60, 2001.
[6] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory, vol. 27, pp. 533-547, 1981.
[7] J. H. Chen and M. P. C. Fossorier, "Near optimum universal belief propagation based decoding of low-density parity check codes," IEEE Transactions on Communications, vol. 50, pp. 406-414, 2002.
[8] H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate codes," in Proc. 2nd International Symposium on Turbo Codes & Related Topics, Brest, France, Sept. 2000.
[9] M. Eroz, F. W. Sun, and L. N. Lee, "DVB-S2 low density parity check codes with near Shannon limit performance," International Journal of Satellite Communications and Networking, vol. 22, pp. 269-279, 2004.
[10] F. Kienle, T. Brack, and N. Wehn, "A synthesizable IP core for DVB-S2 LDPC code decoding," in Proc. Design, Automation and Test in Europe (DATE'05), Munich, Germany, Mar. 2005.
[11] J. Dielissen, A. Hekstra, and V. Berg, "Low cost LDPC decoder for DVB-S2," in Proc. Design, Automation and Test in Europe: Designers' Forum (DATE'06), Munich, Germany, Mar. 2006.
[12] M. Gomes, G. Falcão, J. Gonçalves, V. Silva, M. Falcão, and P. Faia, "HDL library of processing units for generic and DVB-S2 LDPC decoding," in Proc. International Conference on Signal Processing and Multimedia Applications (SIGMAP2006), Setúbal, Portugal, Aug. 2006.
[13] J. T. Zhang and M. P. C. Fossorier, "Shuffled iterative decoding," IEEE Transactions on Communications, vol. 53, pp. 209-213, 2005.
[14] H. Xiao and A. H. Banihashemi, "Graph-based message-passing schedules for decoding LDPC codes," IEEE Transactions on Communications, vol. 52, pp. 2098-2105, 2004.
[15] E. Sharon, S. Litsyn, and J. Goldberger, "An efficient message-passing schedule for LDPC decoding," in Proc. 23rd IEEE Convention of Electrical and Electronics Engineers in Israel, pp. 223-226, 2004.