`
`1
`
`Context-Based Adaptive Binary Arithmetic Coding
`in the H.264/AVC Video Compression Standard
`
`Detlev Marpe, Member, IEEE, Heiko Schwarz, and Thomas Wiegand
`
`Abstract—Context-based Adaptive Binary Arithmetic Cod-
`ing (CABAC) as a normative part of the new ITU-T | ISO/IEC
`standard H.264/AVC for video compression is presented. By
`combining an adaptive binary arithmetic coding technique
`with context modeling, a high degree of adaptation and redun-
`dancy reduction is achieved. The CABAC framework also in-
`cludes a novel low-complexity method for binary arithmetic
`coding and probability estimation that is well suited for effi-
`cient hardware and software implementations. CABAC sig-
`nificantly outperforms the baseline entropy coding method of
`H.264/AVC for the typical area of envisaged target applica-
`tions. For a set of test sequences representing typical material
`used in broadcast applications and for a range of acceptable
`video quality of about 30 to 38 dB, average bit-rate savings of 9
`to 14% are achieved.
`
`Index Terms—CABAC, entropy coding, context modeling,
`binary arithmetic coding, H.264, MPEG-4 AVC.
`
I. INTRODUCTION

Natural camera-view video signals show non-stationary statistical behavior. The statistics of these
`signals largely depend on the video content and the acquisi-
`tion process. Traditional concepts of video coding that rely
`on a mapping from the video signal to a bitstream of vari-
`able length-coded syntax elements exploit some of the non-
stationary characteristics but certainly not all of them. More-
`over, higher-order statistical dependencies on a syntax ele-
`ment level are mostly neglected in existing video coding
`schemes. Designing an entropy coding scheme for a video
`coder by taking into consideration these typically observed
`statistical properties, however, offers room for significant
`improvements in coding efficiency.
Context-based Adaptive Binary Arithmetic Coding (CABAC) is one of the two entropy coding methods of the new ITU-T | ISO/IEC standard for video coding, H.264/AVC [1],[2]. The algorithm was first introduced in a
`rudimentary form in [7] and evolved over a period of suc-
`cessive refinements [8]−[17]. In this paper, we present a de-
`scription of the main elements of the CABAC algorithm in
`its final, standardized form as specified in [1]. Unlike the
specification in [1], the presentation in this paper is intended to also provide some information on the underlying conceptual ideas as well as the theoretical and historical background of CABAC.

Manuscript received May 21, 2003.
The authors are with the Fraunhofer Institute for Communications – Heinrich Hertz Institute, Berlin, Germany.

`Entropy coding in today’s hybrid block-based video cod-
`ing standards such as MPEG-2 [3], H.263 [4], and MPEG-4
`[5] is generally based on fixed tables of variable length
`codes (VLC). For coding the residual data in these video
`coding standards, a block of transform coefficient levels is
`first mapped onto a one-dimensional list using an inverse
`scanning pattern. This list of transform coefficient levels is
`then coded using a combination of run-length and variable
`length coding. Due to the usage of variable length codes,
`coding events with a probability greater than 0.5 cannot be
`efficiently represented and hence, a so-called alphabet ex-
`tension of “run” symbols representing successive levels
`with value zero is used in the entropy coding schemes of
`MPEG-2, H.263, and MPEG-4. Moreover, the usage of
`fixed VLC tables does not allow an adaptation to the actual
`symbol statistics, which may vary over space and time as
`well as for different source material and coding conditions.
`Finally, since there is a fixed assignment of VLC tables and
`syntax elements, existing inter-symbol redundancies cannot
`be exploited within these coding schemes.
Although, from a conceptual point of view, it has long been well known that all these deficiencies can be most easily resolved by arithmetic codes [23], little of this
`knowledge was actually translated into practical entropy
`coding schemes specifically designed for block-based hy-
`brid video coding. One of the first hybrid block-based video
`coding schemes that incorporate an adaptive binary arithme-
`tic coder capable of adapting the model probabilities to the
`existing symbol statistics was presented in [6]. The core of
`that entropy coding scheme was inherited from the JPEG
`standard (at least for coding of DCT coefficients) [25], and
`an adjustment of its modeling part to the specific statistical
`characteristics of typically observed residual data in a hy-
`brid video coder was not carried out. As a result, the per-
`formance of this JPEG-like arithmetic entropy coder in the
`hybrid block-based video coding scheme of [6] was not
`substantially better for inter-coded pictures than that of its
`VLC-based counterpart.
`The first and – until H.264/AVC was officially released –
`the only standardized arithmetic entropy coder within a hy-
`brid block-based video coder is given by Annex E of H.263
[4]. Three major drawbacks in the design of that optional
`arithmetic coding scheme can be identified. First, Annex E
`is applied to the same syntax elements as the VLC method
`of H.263 including the combined symbols for coding of
transform coefficient levels. Thus, one of the fundamental advantages of arithmetic coding, namely that a non-integer code length can be assigned to coding events, is unlikely to be exploited. Second, all probability models in Annex E of H.263
`are non-adaptive in the sense that their underlying probabil-
ity distributions are assumed to be static. Although multiple
`probability distribution models are defined and chosen in a
`frequency-dependent way for the combined symbols of run,
`level and “last” information, this conditioning does not re-
`sult in a significant gain in coding efficiency, since an adap-
`tation to the actual symbol statistics is not possible. Finally,
`the generic m-ary arithmetic coder used in Annex E in-
`volves a considerable amount of computational complexity,
`which may not be justified in most application scenarios,
`especially in view of the typically observed, small margins
`of coding gains.
Entropy coding schemes based on arithmetic coding are quite frequently employed in the field of non-block-based video coding. Most of these alternative approaches to video
`coding are based on the discrete wavelet transform (DWT)
`in combination with disparate methods of temporal predic-
`tion, such as overlapped block motion compensation, grid-
`based warping or motion-compensated temporal filtering
`[18],[19],[20]. The corresponding entropy coding schemes
are often derived from DWT-based still image coding schemes like SPIHT [21] or other predecessors of JPEG2000 [35].
`In our prior work on wavelet-based hybrid video coding,
`which led to one of the proposals for the H.26L standardiza-
`tion [19], the entropy coding method of partitioning, ag-
`gregation and conditional coding (PACC) was developed
`[22]. One of its main distinguishing features is related to the
`partitioning strategy: Given a source with a specific alpha-
`bet size, for instance, quantized transform coefficients, it
`was found to be useful to first reduce the alphabet size by
`partitioning the range according to a binary selector, which
`e.g. in the case of transform coefficients would be typically
`given by the decision whether the coefficient is quantized to
`zero or not. In fact, range partitioning using binary selectors
`can be viewed as a special case of a binarization scheme,
`where a symbol of a non-binary alphabet is uniquely
`mapped to a sequence of binary decisions prior to further
`processing.
This operation, which is in some sense dual to the aforementioned alphabet extension and which in the sequel we will therefore refer to as alphabet reduction, is mainly motivated by the fact
`that it allows the subsequent modeling stage to operate more
`efficiently on this maximally reduced (binary) alphabet. In
`this way, the design and application of higher-order condi-
`tioning models is greatly simplified and, moreover, the risk
`of “overfitting” the model is reduced. As a positive side ef-
`fect, a fast table-driven binary arithmetic coder can be util-
`
`ized for the final arithmetic coding stage.
`The design of CABAC is in the spirit of our prior work.
`To circumvent the drawbacks of the known entropy coding
`schemes for hybrid block-based video coding such as An-
`nex E of H.263, we combine an adaptive binary arithmetic
`coding technique with a well-designed set of context mod-
`els. Guided by the principle of alphabet reduction, an addi-
`tional binarization stage is employed for all non-binary val-
`ued symbols. Since the increased computational complexity
`of arithmetic coding in comparison to variable length cod-
`ing is generally considered as its main disadvantage, great
`importance has been devoted to the development of an algo-
`rithmic design that allows efficient hardware and software
`implementations.
For some applications, however, the computational requirements of CABAC may still be too high given today’s
`silicon technology. Therefore, the baseline entropy coding
`method of H.264/AVC [1] offers a different compression-
`complexity trade-off operating at reduced coding efficiency
`and complexity level compared to CABAC. It mostly relies
on a single infinitely extended codeword set consisting of
`zero-order Exp-Golomb codes, which are used for all syntax
`elements except for the residual data. For coding the resid-
`ual data, a more sophisticated method called Context-
`Adaptive Variable Length Coding (CAVLC) is employed.
`In this scheme, inter-symbol redundancies are exploited by
`switching VLC tables for various syntax elements depend-
`ing on already transmitted coding symbols [1],[2]. The
`CAVLC method cannot provide an adaptation to the actu-
`ally given conditional symbol statistics. Furthermore, cod-
`ing events with symbol probabilities greater than 0.5 cannot
`be efficiently coded due to the fundamental lower limit of 1
`bit/symbol imposed on variable length codes. This restric-
`tion prevents the usage of coding symbols with a smaller al-
`phabet size for coding the residual data, which could allow
`a more suitable construction of contexts for switching be-
`tween the model probability distributions.
`The remainder of the paper is organized as follows. In
`Section II, we present an overview of the CABAC frame-
`work including a high-level description of its three basic
`building blocks of binarization, context modeling and bi-
`nary arithmetic coding. We also briefly discuss the motiva-
`tion and the principles behind the algorithmic design of
`CABAC. A more detailed description of CABAC is given
`in Section III, where the individual steps of the algorithm
`are presented in depth. Finally, in Section IV we provide
`experimental results to demonstrate the performance gains
`of CABAC relative to the baseline entropy coding mode of
`H.264/AVC for a set of interlaced video test sequences.
`
`II. THE CABAC FRAMEWORK
Fig. 1. CABAC encoder block diagram.

Fig. 1 shows the generic block diagram for encoding a single syntax element in CABAC.1 The encoding process consists of at most three elementary steps:

1) binarization
2) context modeling
3) binary arithmetic coding

1 For simplicity and for clarity of presentation we restrict our exposition of CABAC to an encoder only view. In the text of the H.264/AVC standard [1] itself, the converse perspective dominates – the standard normatively specifies only how to decode the video content without specifying how to encode it.
`
`In the first step, a given non-binary valued syntax element
`is uniquely mapped to a binary sequence, a so-called bin
`string. When a binary valued syntax element is given, this
`initial step is bypassed, as shown in Fig. 1. For each element
`of the bin string or for each binary valued syntax element,
`one or two subsequent steps may follow depending on the
`coding mode.
`In the so-called regular coding mode, prior to the actual
`arithmetic coding process the given binary decision, which,
`in the sequel, we will refer to as a bin, enters the context
`modeling stage, where a probability model is selected such
`that the corresponding choice may depend on previously
`encoded syntax elements or bins. Then, after the assignment
`of a context model the bin value along with its associated
`model is passed to the regular coding engine, where the fi-
`nal stage of arithmetic encoding together with a subsequent
`model updating takes place (see Fig. 1).
`Alternatively, the bypass coding mode is chosen for se-
`lected bins in order to allow a speedup of the whole encod-
`ing (and decoding) process by means of a simplified coding
`engine without the usage of an explicitly assigned model, as
`illustrated by the lower right branch of the switch in Fig. 1.
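
To make the dispatch between the two coding modes concrete, the following C sketch mirrors the three steps as just described. It is only an illustration: the function and type names (binarize, use_bypass, select_context, encode_bin_regular, encode_bin_bypass) are hypothetical and not taken from the H.264/AVC specification.

    typedef struct {
        unsigned state;   /* probability state of the model */
        unsigned mps;     /* value of the most probable symbol */
    } ContextModel;

    /* assumed to be provided elsewhere in this sketch: */
    extern int  binarize(int value, unsigned char bins[]);        /* step 1 */
    extern int  use_bypass(int bin_idx);                          /* mode decision */
    extern ContextModel *select_context(int bin_idx);             /* step 2 */
    extern void encode_bin_regular(ContextModel *cm, unsigned b); /* step 3 + model update */
    extern void encode_bin_bypass(unsigned b);                    /* simplified engine */

    void encode_syntax_element(int value)
    {
        unsigned char bins[32];
        int n = binarize(value, bins);        /* bypassed for binary-valued elements */
        for (int i = 0; i < n; i++) {
            if (use_bypass(i))
                encode_bin_bypass(bins[i]);   /* no model attached */
            else
                encode_bin_regular(select_context(i), bins[i]);
        }
    }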
`In the following, the three main functional building
`blocks, which are binarization, context modeling, and bi-
`nary arithmetic coding, along with their interdependencies
`are discussed in more detail.
`
`A. Binarization
`1) General Approach
`For a successful application of context modeling and
`adaptive arithmetic coding in video coding we found that
`the following two requirements should be fulfilled:
`
`
`i) a fast and accurate estimation of conditional probabili-
`ties must be achieved in the relatively short time inter-
`val of a slice coding unit
`ii) the computational complexity involved in performing
`each elementary operation of probability estimation
`and subsequent arithmetic coding must be kept at a
`minimum to facilitate a sufficiently high throughput of
`these inherently sequentially organized processes.
`
`To fulfill both requirements we introduce the important
`“pre-processing” step of first reducing the alphabet size of
`the syntax elements to encode. Alphabet reduction in
`CABAC is performed by the application of a binarization
`scheme to each non-binary syntax element resulting in a
`unique intermediate binary codeword for a given syntax
`element, called a bin string. The advantages of this ap-
`proach are both in terms of modeling and implementation.
`First, it is important to note that nothing is lost in terms of
`modeling, since the individual (non-binary) symbol prob-
`abilities can be recovered by using the probabilities of the
`individual bins of the bin string. For illustrating this aspect,
`let us consider the binarization for the syntax element
`mb_type of a P/SP slice.
`As depicted in Fig. 2 (left), the terminal nodes of the bi-
`nary tree correspond to the symbol values of the syntax
`element such that the concatenation of the binary decisions
`for traversing the tree from the root node to the correspond-
`ing terminal node represents the bin string of the corre-
`sponding symbol value. For instance, consider the value “3”
`of mb_type, which signals the macroblock type “P_8x8”,
`i.e., the partition of the macroblock into four 8×8 sub-
`macroblocks in a P/SP slice. In this case the corresponding
bin string is given by “001”. As an obvious consequence, the symbol probability p(“3”) is equal to the product of the probabilities p_C0(“0”), p_C1(“0”) and p_C2(“1”), where C0, C1 and C2 denote the (binary) probability models of the corresponding internal nodes, as shown in Fig. 2. This rela-
`tion is true for any symbol represented by any such binary
`tree, which can be deduced by the iterated application of the
`Total Probability Theorem [26].
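
As a numeric illustration with made-up model estimates (not values from the standard): if the models at the three internal nodes currently estimate p_C0(“0”) = 0.6, p_C1(“0”) = 0.5 and p_C2(“1”) = 0.4, then the implied symbol probability is p(“3”) = 0.6 · 0.5 · 0.4 = 0.12, exactly as if a single estimate of 0.12 were maintained for that symbol in an m-ary model.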
`
`
Fig. 2. Illustration of the binarization for mb_type (left) and sub_mb_type (right), both for P/SP slices.

Although at this stage nothing seems to be gained, there is already the advantage of using a binary arithmetic coding engine on the bin string instead of an m-ary arithmetic coder operating on the original m-ary source alphabet. Adaptive m-ary arithmetic coding (for m > 2) is in general a computationally complex operation requiring at least two multiplications for each symbol to encode as well as a number of fairly complex operations to perform the update of the probability estimation [36]. In contrast to that, there are fast, multiplication-free variants of binary arithmetic coding, one of which was specifically developed for the CABAC framework, as further described below. Since the probability of symbols with larger bin strings is typically very low, the computational overhead of coding all bins of that bin string instead of using only one pass in an m-ary arithmetic coder is fairly small and can easily be compensated for by using a fast binary coding engine.

Finally, as the most important advantage, binarization enables context modeling on a sub-symbol level. For specific bins, which, in general, are the most frequently observed bins, conditional probabilities can be used, whereas other, usually less frequently observed bins can be treated using a joint, typically zero-order probability model. Compared to the conventional approach of using context models in the original domain of the source with a typically large alphabet size (such as components of motion vector differences or transform coefficient levels), this additional freedom in the design offers a flexible instrument for using higher-order conditional probabilities without suffering from context “dilution” effects. These effects are often observed in cases where a large number of conditional probabilities have to be adaptively estimated on a relatively small (coding) time interval, such that there are not enough samples to reach a reliable estimate for each model.2

For instance, when operating in the original alphabet domain, a quite moderately chosen 2nd order model for a given syntax element alphabet of size m = 256 will result in the intractably large number of 256^2 · (256 − 1) ≈ 2^24 symbol probabilities to be estimated for that particular syntax element only. Even for a zero-order model, the task of tracking 255 individual probability estimates according to the previous example is quite demanding. However, typically measured probability density functions (pdf) of prediction residuals or transformed prediction errors can be modeled by highly peaked Laplacian, generalized Gaussian or geometric distributions [28], where it is reasonable to restrict the estimation of individual symbol statistics to the area of the largest statistical variations at the peak of the pdf. Thus, if, for instance, a binary tree resulting from a Huffman code design were chosen as a binarization for such a source and its related pdf, only the nodes located in the vicinity of the root node would be natural candidates for being modeled individually, whereas a joint model would be assigned to all nodes on deeper tree levels corresponding to the “tail” of the pdf. Note that this design is different from the example given in Fig. 2, where each (internal) node has its own model.

In the CABAC framework, typically, only the root node would be modeled using higher-order conditional probabilities. In the above example, a 2nd order model would thus result in only 4 different binary probability models instead of m^2 different m-ary probability models with m = 256.

2 A more rigorous treatment of that problem can be found in [23],[24].

2) Design of CABAC Binarization Schemes
As already indicated above, a binary representation for a given non-binary valued syntax element provided by the binarization process should be close to a minimum-redundancy code. On the one hand, this allows easy access to the most probable symbols by means of the binary decisions located at or close to the root node for the subsequent modeling stage. On the other hand, such a code tree minimizes the number of binary symbols to encode on the average, hence minimizing the computational workload induced by the binary arithmetic coding stage.

However, instead of choosing a Huffman tree for a given training sequence, the design of binarization schemes in CABAC (mostly) relies on a few basic code trees, whose structure enables a simple on-line computation of all code words without the need for storing any tables. There are four such basic types: the unary code, the truncated unary code, the kth order Exp-Golomb code and the fixed-length code. In addition, there are binarization schemes based on a concatenation of these elementary types. As an exception to these structured types, there are five specific, mostly unstructured binary trees that have been manually chosen for the coding of macroblock types and sub-macroblock types. Two examples of such trees are shown in Fig. 2.

In the remaining part of this section, we explain in more detail the construction of the four basic types of binarization and their derivatives.

Unary and Truncated Unary Binarization Scheme: For each unsigned integer valued symbol x ≥ 0 the unary code word in CABAC consists of x “1” bits plus a terminating “0” bit. The truncated unary (TU) code is only defined for x with 0 ≤ x ≤ S, where for x < S the code is given by the unary code, whereas for x = S the terminating “0” bit is neglected such that the TU code of x = S is given by a codeword
consisting of x “1” bits only.

    while (1) {
        // unary prefix part of EGk
        if (x >= (1<<k)) {
            put( 1 )
            x = x - (1<<k)
            k++
        } else {
            put( 0 )              // terminating “0” of prefix part
            while (k--)           // binary suffix part of EGk
                put( (x>>k) & 0x01 )
            break
        }
    }

Fig. 3. Construction of kth order Exp-Golomb (EGk) code for a given unsigned integer symbol x.

TABLE I
UEG0 BINARIZATION FOR ENCODING OF ABSOLUTE VALUES OF TRANSFORM COEFFICIENT LEVELS

Abs. value | TU prefix (bins 1–14)        | EG0 suffix (bins 15–19, ...)
    1      | 0                            |
    2      | 1 0                          |
    3      | 1 1 0                        |
    4      | 1 1 1 0                      |
    5      | 1 1 1 1 0                    |
   ...     | ...                          |
   13      | 1 1 1 1 1 1 1 1 1 1 1 1 0    |
   14      | 1 1 1 1 1 1 1 1 1 1 1 1 1 0  |
   15      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 0
   16      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 1 0 0
   17      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 1 0 1
   18      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 1 1 0 0 0
   19      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 1 1 0 0 1
   20      | 1 1 1 1 1 1 1 1 1 1 1 1 1 1  | 1 1 0 1 0
   ...     | ...                          | ...
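
As a small illustrative sketch of the unary and TU schemes described above (put_bin() is a hypothetical stand-in for handing one bin to the subsequent stages):

    extern void put_bin(unsigned b);

    void binarize_unary(unsigned x)           /* x "1" bits plus a terminating "0" */
    {
        while (x--) put_bin(1);
        put_bin(0);
    }

    void binarize_tu(unsigned x, unsigned S)  /* defined for 0 <= x <= S */
    {
        if (x < S) binarize_unary(x);
        else       while (x--) put_bin(1);    /* x = S: terminating "0" omitted */
    }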
`kth order Exp-Golomb Binarization Scheme: Exponen-
`tial Golomb codes were first proposed by Teuhola [29] in
`the context of run-length coding schemes. This parameter-
`ized family of codes is a derivative of Golomb codes, which
`have been proven to be optimal prefix-free codes for geo-
`metrically distributed sources [30]. Exp-Golomb codes are
`constructed by a concatenation of a prefix and a suffix code
`word. Fig. 3 shows the construction of the kth order Exp-
`Golomb (EGk) code word for a given unsigned integer sym-
bol x. The prefix part of the EGk code word consists of a unary code corresponding to the value l(x) = ⌊log2( x / 2^k + 1 )⌋. The EGk suffix part is computed as the binary representation of x + 2^k ( 1 − 2^l(x) ) using k + l(x) significant bits, as
`can be seen from the pseudo-C code in Fig. 3.
`Consequently, for the EGk binarization the number of
`symbols having the same code length of k + 2 · l( x ) + 1 is
`geometrically growing. By inverting Shannon’s relationship
`between ideal code length and symbol probability, we can
e.g. easily deduce that EG0 is the optimal code for a pdf p(x) = ½ · (x + 1)^−2 with x ≥ 0. This implies that for an ap-
`propriately chosen parameter k the EGk code represents a
`fairly good first-order approximation of the ideal prefix-free
`code for tails of typically observed pdfs, at least for syntax
`elements that are representing prediction residuals.
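
As a quick plausibility check of this optimality claim: under the pdf p(x) = ½ · (x + 1)^−2, the value x = 3 has probability p(3) = ½ · 4^−2 = 1/32 and hence an ideal code length of −log2(1/32) = 5 bits; the EG0 code word for x = 3 is indeed the 5-bit string “11000” (l(3) = 2, prefix “110”, suffix “00”).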
`Fixed-Length Binarization Scheme: For the applica-
`tion of fixed-length (FL) binarization, a finite alphabet of
`values of the corresponding syntax element is assumed. Let
`x denote a given value of such a syntax element, where
`0 ≤ x < S. Then, the FL code word of x is simply given by
the binary representation of x with a fixed (minimum) number l_FL = ⌈log2 S⌉ of bits. Typically, FL binarization is applied to syntax elements with a nearly uniform distribution
`or to syntax elements, where each bit in the FL binary rep-
`
`resentation represents a specific coding decision as e.g. in
`the part of the coded block pattern symbol related to the lu-
`minance residual data.
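
A minimal sketch of the FL binarization, reusing the hypothetical put_bin() helper from above:

    extern void put_bin(unsigned b);

    void binarize_fl(unsigned x, unsigned S)  /* 0 <= x < S */
    {
        unsigned len = 0;
        while ((1u << len) < S) len++;        /* len = ceil(log2(S)) */
        for (int i = (int)len - 1; i >= 0; i--)
            put_bin((x >> i) & 1u);           /* emit x MSB-first */
    }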
`Concatenation of Basic Binarization Schemes: From
`the basic binarization schemes as described above three
`more binarization schemes are derived. The first one is a
`concatenation of a 4-bit FL prefix as a representation of the
`luminance related part of the coded block pattern and a TU
`suffix with S = 2 representing the chrominance related part
`of coded_block_pattern.
Both the second and third concatenated schemes are derived from the TU and the EGk binarization. These schemes, which are referred to as Unary / kth order Exp-
`Golomb (UEGk) binarizations, are applied to motion vector
`differences and absolute values of transform coefficient lev-
`els. The design of these concatenated binarization schemes
`is motivated by the following observations. First, the unary
`code is the simplest prefix-free code in terms of implemen-
`tation cost. Secondly, it permits a fast adaptation of the in-
`dividual symbol probabilities in the subsequent context
`modeling stage, since the arrangement of the nodes in the
`corresponding tree is typically such that with increasing dis-
`tance of the internal nodes from the root node the corre-
`sponding binary probabilities are less skewed.3 These ob-
`servations are only accurate for small values of the absolute
`motion vector differences and transform coefficient levels.
For larger values, there is not much benefit from adaptive modeling, leading to the idea of concatenating an adapted truncated unary tree as a prefix and a static Exp-Golomb code
`tree as a suffix. Typically, for larger values, the EGk suffix
`part represents already a fairly good fit to the observed
`probability distribution, as already mentioned above. Thus,
it is reasonable to speed up the encoding of the bins related
`to the EGk suffix part in CABAC by using the fast bypass
`coding engine for uniformly distributed bins, as further de-
`scribed in Section III.D.

3 The aspect of a suitable ordering of nodes in binary trees for optimal modeling and fast adaptation has been addressed in [31], although in a slightly different context.
`For motion vector differences UEGk binarization is con-
`structed as follows. Let us assume the value mvd of a mo-
`tion vector component is given. For the prefix part of the
`UEGk bin string, a TU binarization is invoked using the ab-
`solute value of mvd with a cut-off value of S = 9. If mvd is
`equal to zero, the bin string consists only of the prefix code
`word “0”. If the condition |mvd| ≥ 9 holds, the suffix is con-
`structed as an EG3 codeword for the value of |mvd| − 9, to
`which the sign of mvd is appended using the sign bit “1” for
`a negative mvd and the sign bit “0” otherwise. For mvd val-
`ues with 0 < |mvd| < 9, the suffix consists only of the sign
`bit. Noting that the component of a motion vector difference
`represents the prediction error at quarter-sample accuracy,
`the prefix part always corresponds to a maximum error
`component of ±2 samples. With the choice of the Exp-
Golomb parameter k = 3, the suffix code words are given
`such that a geometrical increase of the prediction error in
`units of 2 samples is captured by a linear increase in the
`corresponding suffix code word length.
`UEGk binarization of absolute values of transform coef-
`ficient levels (abs_level) is specified by the cut-off value
`S = 14 for the TU prefix part and the order k = 0 for the
EGk suffix part. Note that the binarization and subsequent coding process is applied to the syntax element coeff_abs_value_minus1 = abs_level − 1, since zero valued
`transform coefficient levels are encoded using a significance
`map, as described in more detail in Section III.B below. The
construction of a bin string for a given value of coeff_abs_value_minus1 is similar to the construction of UEGk bin strings for the motion vector difference components except that no sign bit is appended to the suffix. Table I shows the corresponding bin strings for values of abs_level from 1 to 20, where the TU prefix and EG0 suffix parts are shown in separate columns.
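
Both UEGk variants can be sketched with one parameterized routine that reuses binarize_tu() from above and follows the EGk construction of Fig. 3; as before, the code is an illustration, not the normative specification. For motion vector differences it would be called with S = 9, k = 3 and a sign bin; for coeff_abs_value_minus1 with S = 14, k = 0 and no sign bin.

    extern void put_bin(unsigned b);
    extern void binarize_tu(unsigned x, unsigned S);

    static void binarize_egk(unsigned x, unsigned k)  /* as in Fig. 3 */
    {
        while (x >= (1u << k)) {              /* unary prefix part */
            put_bin(1);
            x -= 1u << k;
            k++;
        }
        put_bin(0);                           /* terminating "0" of prefix part */
        while (k--)                           /* binary suffix part */
            put_bin((x >> k) & 1u);
    }

    void binarize_uegk(int v, unsigned S, unsigned k, int with_sign)
    {
        unsigned a = (v < 0) ? (unsigned)(-v) : (unsigned)v;
        binarize_tu(a < S ? a : S, S);        /* TU prefix on min(|v|, S) */
        if (a >= S)
            binarize_egk(a - S, k);           /* EGk suffix for the excess */
        if (with_sign && v != 0)
            put_bin(v < 0 ? 1u : 0u);         /* "1" for negative, "0" otherwise */
    }

For example, binarize_uegk(15, 14, 0, 0), applied to coeff_abs_value_minus1 = 15, reproduces the bin string of abs_level = 16 in Table I: fourteen “1” bins from the TU prefix followed by the EG0 suffix “100”.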
`
`B. Context Modeling
`One of the most important properties of arithmetic coding
`is the possibility to utilize a clean interface between model-
`ing and coding such that in the modeling stage, a model
`probability distribution is assigned to the given symbols,
`which then, in the subsequent coding stage, drives the actual
`coding engine to generate a sequence of bits as a coded rep-
`resentation of the symbols according to the model distribu-
tion. Since it is the model that determines the code and its efficiency in the first place, it is of paramount importance to design an adequate model that exploits the statistical dependencies to a large degree and to keep this model “up to date” during encoding. However, significant model costs are involved in adaptively estimating higher-order conditional probabilities.
`Suppose a pre-defined set T of past symbols, a so-called
`context template, and a related set C = {0,…, C−1} of con-
`texts is given, where the contexts are specified by a model-
`ing function F: T→ C operating on the template T. For each
`symbol x to be coded, a conditional probability p( x|F( z ) )
`is estimated by switching between different probability
`models according to the already coded neighboring symbols
`z ∈ T. After encoding x using the estimated conditional
`probability p( x|F( z ) ), the probability model is updated
`with the value of the encoded symbol x. Thus, p( x|F( z ) ) is
`estimated on the fly by tracking the actual source statistics.
`Since the number τ of different conditional probabilities to
`be estimated for an alphabet size of m is equal to
`τ = C ⋅ (m − 1), it is intuitively clear that the model cost,
`which represents the cost of “learning” the model distribu-
tion, is proportional to τ.4 This implies that by increasing the number C of different context models, there is a point where overfitting of the model may occur such that inaccurate estimates of p( x|F( z ) ) will be the result.

4 Rissanen derived a refinement of that model cost measure by also taking into account that the precision of estimating the probabilities increases with the number of observations [24].
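
The mechanics of F and the on-the-fly estimation can be illustrated by the following hedged sketch, in which the template T consists of just two already coded binary neighbors (left and top), F maps them to one of four contexts, and a simple counts-based estimator stands in for the table-driven probability estimation actually used in CABAC (Section III); arithmetic_encode() is a hypothetical coding-engine call.

    extern void arithmetic_encode(unsigned bin, double p_one);

    typedef struct {
        unsigned n0, n1;                      /* occurrence counts of 0 and 1 */
    } BinModel;

    static BinModel ctx[4] = { {1,1}, {1,1}, {1,1}, {1,1} };  /* C = {0,...,3} */

    static unsigned F(unsigned left_bin, unsigned top_bin)    /* F: T -> C */
    {
        return 2u * left_bin + top_bin;
    }

    void code_bin(unsigned bin, unsigned left_bin, unsigned top_bin)
    {
        BinModel *m = &ctx[F(left_bin, top_bin)];
        double p_one = (double)m->n1 / (m->n0 + m->n1);  /* estimate p(bin=1 | F(z)) */
        arithmetic_encode(bin, p_one);        /* coding stage, driven by the model */
        if (bin) m->n1++; else m->n0++;       /* model update after encoding */
    }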
`In CABAC, this problem is solved by imposing two se-
`vere restrictions on the choice of the context models. First,
`very limited context templates T consisting of a few
`neighbors of the current symbol to encode are employed
`such that only a small number of different context models C
`is effectively used. Second, as already motivated in the last
`section, context modeling is restricted to selected bins of the
`binarized symbols. As a result, the model cost is drastically
`reduced, even though the ad-hoc design of context models
`under these restrictions may not result in the optimal choice
with respect to coding efficiency. In fact, recently conducted research has shown that additional gains can
`be obtained by applying the novel GRASP algorithm for an
`optimized selection of context models using larger context
`templates within the CABAC framework [31]. However, the
`improvements are quite moderate compared to the drastic
`increase in complexity required for performing the two-pass
`GRASP algorithm.
`
Fig. 4. Illustration of a context template consisting of two neighboring syntax elements A and B to the left of and on top of the current syntax element C.