Probability estimation for the Q-Coder adaptive binary arithmetic coder

by W. B. Pennebaker
J. L. Mitchell

The Q-Coder is an important new development in binary arithmetic coding. It combines a simple but efficient arithmetic approximation for the multiply operation, a new formalism which yields optimally efficient hardware and software implementations, and a new technique for estimating symbol probabilities which matches the performance of any method known. This paper describes the probability-estimation technique. The probability changes are estimated solely from renormalizations in the coding process and require no additional counters. The estimation process can be implemented as a finite-state machine, and is simple enough to allow precise theoretical modeling of single-context coding. Approximate models have been developed for a more complex multi-rate version of the estimator and for mixed-context coding. Experimental studies verifying the modeling and showing the performance achieved for a variety of image-coding models are presented.

1. Introduction
Arithmetic coding, introduced several years ago by Rissanen [1] and Pasco [2] and generalized by Langdon and Rissanen [3] (see Langdon [4] for a comprehensive review article), is a

Copyright 1988 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

powerful technique for coding of strings of data symbols. It derives its power from an ability to approach the entropy limit in coding efficiency and to dynamically alter the estimate of the probability of the symbol being encoded.
A new binary arithmetic coding system, the Q-Coder, has been developed as a joint effort between the authors of this paper and colleagues at the IBM Almaden Research Center. The new probability-estimation technique used in the Q-Coder is presented in this paper; companion papers describe the basic principles of the Q-Coder [5], software implementations of the Q-Coder [6], and the arithmetic coding procedures which allow compatible yet optimal hardware and software structures [7, 8]. The Q-Coder is part of a proposal submitted to the CCITT and ISO Joint Photographic Experts Group (JPEG) for color photographic image compression [9].
A description of the general structure of the Q-Coder arithmetic coding section is given in [5, 6]. Briefly, the arithmetic coder contains two key registers, the interval register A and the code register C. The interval register contains the measure of the current probability interval, and the code register contains a pointer to the interval. In order to use fixed-precision integer arithmetic for the coding process, the interval and code registers must be periodically renormalized.
When a given symbol is coded, the interval measure in A is reduced to the subinterval for that symbol, and the code string is repositioned to point within the subinterval. Ideally, the scaling of the interval is done by multiplying the current interval measure A by the probability estimate of the symbol which occurred. If the less probable symbol (LPS or L)

probability q is estimated as Qe and the more probable symbol (MPS or M) probability p is estimated as 1 − Qe, the binary coding process divides the interval into two subintervals, A × Qe and A − (A × Qe). The multiplication

IBM J. RES. DEVELOP. VOL. 32 NO. 6 NOVEMBER 1988

W. B. PENNEBAKER AND J. L. MITCHELL

Unified Patents, Ex. 1013

for mixed contexts and a random-interval model. Theory and experiments are compared for a single context in Section 5. Section 6 extends the probability estimation to a multi-rate system. Mixed-context coding is analyzed in Section 7.

2. Estimation process
The basic concept is as follows: The estimated LPS probability, Qe, is taken from a fixed table of allowed values. Renormalization occurs either when an L event is encountered or when the interval falls below 0.75 following an M event. When renormalization after an LPS is encountered, the index to the current Qe is shifted to a larger Qe. Conversely, whenever the MPS renormalization is encountered, the index is shifted to a smaller Qe. (The terms MPS renormalization and LPS renormalization are usually abbreviated as "MPS renorm" and "LPS renorm" in the text following.)
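The update rule just described can be sketched as a small finite-state machine. The Qe values below are a handful of entries from the 5-bit table given later in Table 1; the single-state moves on each renorm are a simplification of this sketch (the actual table specifies per-state index changes):

```python
# Sketch of the renormalization-driven estimator as a finite-state machine.
# Q_E is a subset of the 5-bit table (Table 1); moving the index by exactly
# one state per renorm is a simplifying assumption of this sketch.

Q_E = [0.50409, 0.46893, 0.42206, 0.32831, 0.28143, 0.23456, 0.16425]

class Estimator:
    def __init__(self):
        self.k = len(Q_E) // 2   # current state index (larger k = smaller Qe)

    @property
    def qe(self):
        return Q_E[self.k]

    def lps_renorm(self):
        # LPS renorm: shift the index toward a larger Qe, if possible.
        if self.k > 0:
            self.k -= 1

    def mps_renorm(self):
        # MPS renorm: shift the index toward a smaller Qe, if possible.
        if self.k < len(Q_E) - 1:
            self.k += 1
```

No counters are needed: the estimator state changes only when the coder renormalizes.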
The following approximate calculation suggests that the estimate of the probability obtained from the table of allowed Qe values will adapt to and closely approach the true LPS probability q of a binary symbol sequence. Given a starting value A for the interval register immediately following the last renormalization, N successive MPS events must occur to reach the MPS renorm point:

N = 1 + ⌊ΔA/Qe⌋,   (1)

where Qe is the current estimated value of q, ΔA is the change in the interval (0.75 > ΔA ≥ 0), and the brackets denote the greatest integer function (rounding down to the nearest integer). The probability of getting N MPS events in a row (and an MPS renorm) is

P_mps = (1 − q)^N.   (2)

For simplicity, consider the case where q is small. Taking the natural logarithm of Equation (2) and approximating ln(1 − q) by −q,

P_mps ≈ e^(−qΔA/Qe).   (3)

The magnitude of ΔA is dependent on the type of renormalization. If the MPS renorm occurred last, ΔA is close to 0.75; if the LPS renorm occurred last, ΔA is typically somewhat smaller than 0.75. If the effective value of ΔA is assumed to be an appropriate average and the change in Qe is the same for both types of renorms, the renorm probabilities are balanced at the point where

P_mps = 0.5,   (4)

that is, where

Qe = qΔA/ln 2.   (5)

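The approximation in Equations (1)–(3) can be checked numerically; the values of q, Qe, and ΔA below are arbitrary illustrative choices:

```python
import math

# Numerical check of Equations (1)-(3): for small q the exact MPS-renorm
# probability (1 - q)**N is close to the exponential approximation.
q = 0.02      # true LPS probability
Qe = 0.02     # current estimate of q
dA = 0.70     # assumed interval change since the last renormalization
N = 1 + math.floor(dA / Qe)           # Equation (1)
p_exact = (1.0 - q) ** N              # Equation (2)
p_approx = math.exp(-q * dA / Qe)     # Equation (3)
print(N, p_exact, p_approx)
```

The two probabilities differ only through the "+1" in Equation (1) and the second-order terms of ln(1 − q).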
can be avoided by introducing a tight constraint on the renormalization [3-5, 10]. If the probability-interval measure A falls within the bounds 0.75 ≤ A < 1.5, A can be approximated by 1 when multiplying by Qe. The subintervals are then approximated by Qe and A − Qe, and the multiplication is avoided.
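A minimal sketch of this multiply-free interval update, working with the decimal interval bounds used in the text (code-register updates, carry handling, and the integer representation are omitted):

```python
# Multiply-free interval update sketch: while 0.75 <= A < 1.5, A is treated
# as 1, so the LPS subinterval is approximated by Qe and the MPS subinterval
# by A - Qe. Renormalization doubles A back into range; each doubling
# corresponds to one bit added to the code string.

def code_symbol(A, is_lps, qe):
    """Return (new_A, bits), where bits counts renormalization doublings."""
    if is_lps:
        A = qe           # LPS subinterval (approximation of A * qe)
    else:
        A = A - qe       # MPS subinterval
    bits = 0
    while A < 0.75:      # renormalize back into [0.75, 1.5)
        A *= 2.0
        bits += 1
    return A, bits
```

Note that an LPS always triggers a renormalization, while an MPS does so only when A drops below 0.75.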
Although the renormalization is required for the arithmetic approximation, it can serve another important purpose: it can also be used to estimate the probability of the symbol being coded.
A number of different but somewhat related techniques have been used to estimate symbol probabilities. Langdon and Rissanen [11] and Pennebaker and Mitchell [12] have both used confidence-interval techniques to determine whether the current estimate Qe of the LPS probability should be changed. In [12], the degree to which the confidence limit is exceeded is used to determine the degree to which Qe should be changed; this gives a multi-rate estimation process. Goertzel and Mitchell [13] used a counting technique with periodic renormalization; although in principle a divide operation is required, the precision is small enough that a lookup table inversion and multiply can be used. Mohiuddin, Rissanen, and Wax [14] devised an intriguing multi-rate adapter in which the choice of estimation rate is based on a local minimization of the code string being generated. Although relatively complex to implement, this technique is quite powerful; we regard it as a standard against which other estimation techniques can be measured. Finally, Helman et al. [15] used a Monte Carlo technique involving the LPS renormalization and symbol counts for updating the estimate of the LPS probability in the Skew Coder [3].
The rest of this paper is devoted to an analysis of a probability-estimation technique in which the probability is estimated solely from renormalization.¹ The renewed attempt to use renormalization as the basis for probability estimation was inspired by earlier work on the Log Coder [12]. In that work the computations for probability estimation were minimized by estimating probability each time one byte of compressed data was generated. While this system proved to be simple to implement and provided accurate estimates, it failed to adapt quickly enough in coding of facsimile data sets. On the other hand, calculating a new probability after the coding of each symbol, as was done by Rissanen and Mohiuddin [10], provided good estimates and fast adaptation but involved far too much computation. Estimation after each renormalization appeared to be an attractive compromise between these two schemes.
In Section 2 the estimation process is described. Section 3 develops the exact theoretical modeling of that process for a single context. Section 4 continues the theoretical modeling

¹ In unpublished work G. Goertzel and J. L. Mitchell explored and abandoned this possibility because they were unable to obtain good coding efficiencies.

The equilibrium is stable at this balance point. If Qe is too large, P_mps is also large and the system tends to move to smaller Qe. Conversely, if Qe is too small, P_mps is small and the system tends to move to larger Qe. Therefore, the system adapts to and balances approximately at the point q = Qe.
Although these calculations are approximate, exact calculations which follow the same general approach and prove the point more rigorously are described below.
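The adaptation argument can also be illustrated by a small simulation; the geometric table of allowed Qe values and the single-state index moves below are illustrative assumptions, not the tables used in the Q-Coder:

```python
import random

# Monte Carlo sketch of the adaptation argument: renormalization-driven
# moves on a table of allowed Qe values settle near the true LPS
# probability q. Table and move sizes here are illustrative assumptions.

random.seed(1)
TABLE = [0.5 * 0.8 ** i for i in range(40)]  # allowed Qe values, decreasing
q = 0.05                                     # true LPS probability
k, A = 0, 1.0
total = 0.0
n_symbols = 200_000
for _ in range(n_symbols):
    total += TABLE[k]                        # estimate in use for this symbol
    if random.random() < q:                  # LPS: renorm always occurs
        A = TABLE[k]
        k = max(k - 1, 0)                    # move to larger Qe
    else:                                    # MPS
        A -= TABLE[k]
        if A < 0.75:
            k = min(k + 1, len(TABLE) - 1)   # MPS renorm: move to smaller Qe
    while A < 0.75:                          # renormalize by doubling
        A *= 2.0
avg_qe = total / n_symbols
print(avg_qe)
```

With these assumptions the time-averaged estimate hovers within a few table steps of q, as the balance argument predicts.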
As will be seen, the coding efficiencies achieved by the Q-Coder with this probability estimator for pseudorandom data sets are usually not as good as those obtainable with simple estimates of probability from counts [13]. Coding inefficiency is due partly to the lower coding efficiency inherent in the arithmetic approximation to the multiply, partly to small but systematic errors in Qe, partly to the granularity of the set of allowed values of Qe, and partly to the intrinsic distribution in Qe resulting from the stochastic estimation process. However, the estimation process tracks variations in symbol probability very well. Consequently, the coding efficiency achieved with the less stable symbol probabilities encountered in many real coding environments is extremely good, competitive with the best that can be done by any technique currently known.
This estimation process works well for both single-context and mixed-context coding. For a single-context system, the renormalization process is used to estimate only one probability. For mixed-context coding, the coding decisions are conditioned by past history, and a different probability must be estimated for each conditioning state or context. It is perhaps somewhat unexpected that renormalization of a single A register can be used to estimate the many different probabilities required in the mixed-context case.

3. Modeling of the estimation process for a single context
The estimator can be defined as a finite-state machine, that is, a table of Qe values and associated next states for each type of renorm (i.e., new table positions). The rate of change of Qe is determined by the granularity of the table of Qe values and by the new state associated with each Qe value for the two types of renorms. Figure 1 diagrams sections of the actual finite-state machine used to estimate the probabilities. The leftmost section illustrates the exchange of MPS and LPS definitions at Qe ≈ 0.5 (k_ex is defined to be the particular state index, k, where this exchange occurs). The center section shows a region where the finite-state machine changes from a single-state jump on LPS to a double-state jump. Some parts of the finite-state machine require a jump of more than one state in order to correctly estimate the probability. The rightmost section shows the diagram for the smallest values of Qe. This last section shows how the transition at the MPS renorm for the smallest Qe value is returned to that state.
Conditional changes in estimated probability, such as changing Qe only after the occurrence of two MPS renorms in a row, can readily be incorporated by allowing multiple entries of a given Qe value. A related form of this can be seen in the diagram for the lowest Qe state, where entry to that state can only occur after two MPS renorms in sequence. Handling conditional effects in this manner greatly simplifies the theoretical treatment, the only penalty being the need to solve a relatively large number of simultaneous equations when complex conditional structures are being considered.
Figure 2 illustrates the sequencing of the probability estimator for an LPS followed by a sequence of MPSs. In Figure 2 the ordinate is the interval (A-register) value, and the abscissa is the discrete allowed values of Qe. The LPS renormalization causes a transition to a known A-register value and a known state in the finite-state machine (in this case from a Qe of 0.42206 to the appropriate starting A-register value at Qe = 0.46893). The particular Qe values in the figure are taken from the actual optimized 5-bit Qe values in Table 1, shown later. As MPSs are coded, the interval decays until it drops below 0.75. At that point a transition is made to a smaller Qe, and the interval is renormalized by doubling until it is greater than 0.75. In most cases only one doubling is needed. Thus, the pair of doublings shown at Qe = 0.32831 is the exception rather than the rule. Whether one or two doublings occur is of no consequence for the probability estimation. However, since each doubling produces one bit in the code string (ignoring bit stuffing for a carry), the extra doubling is important in the calculation of the coding efficiency.
Figure 2 illustrates the sequencing behavior which underlies the calculation of the estimation process. The first half of the problem is determining the probability that the estimate will be at each of the allowed Qe values. If we define n_k as the occupation probability for the state corresponding to Qe[k] (the kth allowed value of Qe), balance of transition probabilities into and out of the kth state gives

Σ_j n_j X_j [(1 − X_j)^(t_kj) − (1 − X_j)^(r_kj)] = n_k X_k,   (6)

where X_k = q for k ≥ k_ex, and X_k = 1 − q for k < k_ex. The symbols r_kj and t_kj are defined below. The table of allowed Qe is defined to have mirror symmetry at the boundary between k_ex and k_ex − 1. Thus, k_ex is the index in the table of Qe where the definitions of least probable symbol and most probable symbol are exchanged. (For k < k_ex the table provides an estimate of 1 − q rather than q.)
The first term in the summation represents the transition probability into the state at Qe[k] from all states j which can reach the state k by an LPS followed by a sequence of MPSs. The exponent t_kj is the number of MPSs needed to just enter the kth state when starting from an LPS at state j. Thus, for the example sketched in Figure 2, state j is marked with an asterisk, and state k could be any one of the states which is reached by the MPS sequence following the LPS.
The second term in the summation represents the transition probability out of the state k, given that the MPS
sequence continues until the interval decays below 0.75 for the kth state. The exponent r_kj is the number of MPSs needed to just leave the kth state, counting from the symbol after the LPS at state j.

Figure 1  Sections of the state diagram for the probability estimator.

The summation therefore represents the net gain in n_k due to transitions from all states j which can reach state k through an LPS followed by a sequence of MPSs. It is equated to the probability of transition out of state k due to an LPS event. (All probabilities are per-symbol encoded.) The probability of transition out of state k via the LPS path is the probability of the LPS multiplied by the occupation probability n_k.
Normalization requires

Σ_j n_j = 1.   (7)

The numerical solution of these equations can present problems, in that the n_k can be vanishingly small when the index k is far from the value for the most probable value of Qe. Therefore, the equations are reduced to a subset involving only the n_k near the most probable value of Qe. Contributions from members outside this range are assumed to be zero. The set of equations must be large enough that the error in truncating the set is small, yet small enough to avoid arithmetic precision problems in the calculation of determinants by the method of Gaussian elimination. Except near the end of the table, the center value of k for the subset is defined as the index for which Qe[k] is closest to q.
Because the table of allowed values of Qe is finite in extent, the equations must be reformulated to take end conditions into account. This is done by assuming that either t_kj = 0 or r_kj = ∞ in Equation (6). The latter condition exactly describes the closure of the state diagram for k_max in Figure 1, in that the system cannot exit from the k_max state after the MPS renorm. It also approximates the closure if the equation set is truncated before k reaches k_max. Truncation near the other endpoint, k = k_min, is described by one of the two approximations, the choice depending on whether a
particular k is less than the exchange index, k_ex, or not. The two assumptions are equivalent to assuming that either LPS or MPS renormalization is highly unlikely. Note that the approximate closure condition at the smallest value of k is not needed, as it is replaced by Equation (7).
Given a table of Qe values and associated index changes to new Qe values for each renormalization path, these equations provide an exact solution of the probability of the system being at each Qe[k]. However, they hold only for single-context coding.
The second half of the problem is the calculation of coding rate. Refer again to Figure 2. The current occupation of each state in the system is determined by the balance of LPS and MPS transitions into and out of that state. For each Qe the probability of the LPS renormalization is known by definition (q), and the probability of each succeeding MPS renormalization is readily calculated. The bits generated by each renormalization are also readily calculated. The net bit rate R_k for the kth state is thus

R_k = n_k X_k [ B_LPS,k + Σ_j (1 − X_k)^(r_kj) B_MPS,j ],   (8)

where j ranges over all MPS renormalizations which can occur following the LPS renorm. B_LPS,k is the number of bits generated in renormalizing Qe[k] to the allowed interval range. B_MPS,j is the number of bits generated by the jth MPS renorm. As defined earlier, X_k = q for k ≥ k_ex, X_k = 1 − q for k < k_ex, and the exponent r_kj is the number of MPSs needed to reach the jth MPS renorm after the LPS event from state k.
The total coding rate in bits per symbol is therefore

R = Σ_k R_k.   (9)

4. Mixed contexts: The random-interval model
The calculations in Section 3 are not applicable to coding of mixed-context symbols. If the context varies from one symbol to the next, Qe also varies. The calculation of probabilities of MPS renormalization and the associated bit rate is therefore far more complex. Let us consider the following hypothesis: The probability of the various interval values is sufficiently randomized by the effects of multiple contexts that the interval-register values are uniformly distributed in the interval from 0.75 to 1.5.
Assuming that the above hypothesis is valid, the following equations give the LPS renormalization probability P_l,k and the MPS renormalization probability P_m,k:

P_l,k = X_k,   (10)

P_m,k = (1 − X_k) Qe[k]/0.75.   (11)
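The per-MPS renorm probability Qe/0.75 is what the uniform-interval hypothesis implies (our derivation: a single MPS renormalizes exactly when A − Qe < 0.75, and A is uniform over a range of width 0.75). A quick Monte Carlo check of that consequence:

```python
import random

# Check of the uniform-interval consequence: with A uniform in [0.75, 1.5),
# a single MPS triggers a renorm exactly when A - Qe < 0.75, which should
# happen with probability Qe/0.75.

random.seed(7)
qe = 0.2
n = 100_000
hits = sum(1 for _ in range(n) if random.uniform(0.75, 1.5) - qe < 0.75)
print(hits / n, qe / 0.75)
```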
Figure 2  Example of the sequencing of the probability estimator following an LPS. (Abscissa: allowed values of Qe; ordinate: interval value.)

The equations describing the balance in transition probabilities are similar to those developed in Section 3, except that the probability of the MPS renorm is calculated from Equation (11) rather than from the equations for the probability of a sequence of MPSs. Therefore, the balance for state k is given by

Σ_s n_s P_l,s + Σ_t n_t P_m,t − n_k (P_m,k + P_l,k) = 0,   (12)

where s is summed over all states which can make a transition to k via LPS, and t is summed over all states which can make a transition to k via a single MPS renorm. The normalization condition, Equation (7), completes the set of equations to be solved. Numerical precision again requires that the set of equations be truncated. Therefore, endpoint conditions are handled in the same manner as discussed in Section 3.
The calculation of coding efficiency is done differently for the random-interval model. For a given interval A and a given estimated LPS probability Qe, the relative coding efficiency is

η_k = H/R_k,   (13)

where H is the entropy and R_k is the bit rate per symbol for state k. Defining p = 1 − q, the entropy is given by

H = −q log₂(q) − p log₂(p),   (14)

and, for a uniform distribution of A values in the interval 0.75 to 1.5, R_k is given by

Figure 3  … single context using Table 1. The measured points were obtained from pseudorandom data sets of fixed q.

Table 1  Probability estimates for 5-bit Qe.

Qe index  Qe value (hex)  Qe value (decimal)  Decr (LPS)  Incr (MPS)  MPS exch
   0       X'0AC1'          0.50409              0           1           1
   1       X'0A81'          0.49237              1           1           0
   2       X'0A01'          0.46893              1           1           0
   3       X'0901'          0.42206              1           1           0
   4       X'0701'          0.32831              1           1           0
   5       X'0681'          0.30487              1           1           0
   6       X'0601'          0.28143              1           1           0
   7       X'0501'          0.23456              1           1           0
   8       X'0481'          0.21112              1           1           0
   9       X'0441'          0.19940              1           1           0
  10       X'0381'          0.16425              1           1           0
  11       X'0301'          0.14081              2           1           0
  12       X'02C1'          0.12909              2           1           0
  13       X'0281'          0.11737              2           1           0
  14       X'0241'          0.10565              2           1           0
  15       X'0181'          0.07050              2           1           0
  16       X'0121'          0.05292              2           1           0
  17       X'00E1'          0.04120              2           1           0
  18       X'00A1'          0.02948              2           1           0
  19       X'0071'          0.02069              2           1           0
  20       X'0059'          0.01630              2           1           0
  21       X'0053'          0.01520              2           1           0
  22       X'0027'          0.00714              2           1           0
  23       X'0017'          0.00421              2           1           0
  24       X'0013'          0.00348              2           1           0
  25       X'000B'          0.00201              2           1           0
  26       X'0007'          0.00128              2           1           0
  27       X'0005'          0.00092              2           1           0
  28       X'0003'          0.00055              2           1           0
  29       X'0001'          0.00018              2           0           0

R_k,LPS = [X₁ ln(X₁) − X₀ ln(X₀) − X₁ + X₀] / [(X₁ − X₀) ln 2],   (17)

where X₀ = 0.75/Qe[k] and X₁ = 1.5/Qe[k].

5. Comparison between theory and experiments for single-context coding
In Figure 3, the results of calculations (solid curve) based on the equations derived in Sections 3 and 4 are compared to experimental results (circles). For single-context coding, the agreement between experiment and the exact theory for that case is excellent (as expected). The corresponding table of estimated Qe and the associated schedule of changes in index following renormalization are given in Table 1. This table was selected after much experimentation [5]; it represents the best compromise among simplicity, minimum storage requirements for each context (6 bits),² reasonable coding efficiency for fixed statistics, and good performance on mixed-context data obtained from both facsimile-compression models and continuous-tone image-compression models.
The Qe values in Table 1 are expressed as hexadecimal integers. Divide these Qe values by X'1000' × 4/3 to convert to the decimal fractional representation. The "Decr" column shows the decrement in the Qe index when moving to larger Qe following an LPS renormalization. The "Incr" column shows the increment in the Qe index when moving to smaller Qe following an MPS renormalization. Where the "MPS exch" column entry is 1, an LPS will cause an exchange in the MPS definition.
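The stated conversion between the hexadecimal and decimal columns of Table 1 can be expressed directly:

```python
# Conversion rule from the text: decimal Qe = hex integer / (X'1000' * 4/3),
# which is the same as (value / 0x1000) * 0.75.

def qe_decimal(hex_value):
    return hex_value / 0x1000 * 0.75

print(round(qe_decimal(0x0AC1), 5))  # first entry of Table 1 -> 0.50409
```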
Figure 4 shows several theoretical (dashed curves) and experimental (solid curves) distributions of Qe for the 5-bit case. Again, agreement between calculation and measurement is excellent.
Figures 5 and 6 show similar experimental and theoretical calculations for the 6-bit Qe table (Table 2). As might be expected, the finer granularity of this table significantly decreases the coding inefficiency for stationary statistics. (Compare Figure 3 with Figure 5.)
Figure 6 shows several calculated and measured distributions of Qe for the 6-bit case. Comparing these distributions to those in Figure 4, the increase in coding

² The initial impetus to use very small amounts of storage per context came from G. G. Langdon. Somewhat to our surprise, he was able to demonstrate that our estimation technique could achieve relatively good performance with a table of allowed Qe having fewer than 32 entries. We subsequently worked jointly to optimize this single-rate system.

efficiency with a 6-bit Qe is due to the reduced spread in the distribution. In both cases, the peak in the distribution is quite close to the desired Qe. The coarser granularity of the 5-bit Qe table is not the only factor. The changes in table position for the 5-bit Qe are often larger as well. This increases the spread in the Qe distribution.
The effect of the estimation process can be eliminated by choosing the Qe so as to minimize the coding inefficiency. Such a procedure might be appropriate when the probability is known a priori. The coding inefficiency for this special case is shown in Figure 7 for two cases. The first case (solid line) is for the 12-bit integer representation of Qe with no additional granularity introduced. This curve gives the lower bound for the coding inefficiency that can be achieved with this integer representation. Quantization of Qe to the 12-bit integer representation causes the sequence of distinct minima which is noticeable for q < 2⁻¹¹. The sequence of distinct minima for q larger than about 2⁻³ reflects the fact that the bit rate stays constant over short intervals in Qe. These intervals of constant bit rate result from the arithmetic approximation and renormalization used in the Q-Coder. The dashed line in Figure 7 represents the 12-bit integer representation and the granularity of the 5-bit Qe table.
One feature of Tables 1 and 2 is the avoidance of Qe values which renormalize to X'1000'. For the integer representation chosen for the tables this is the minimum A value allowed before MPS renormalization must occur. Consequently, when the Qe value renormalizes to an A value too close to X'1000', the probability of the MPS renorm becomes very large. If A is exactly X'1000', any MPS will
immediately trigger an MPS renorm. In single-context coding, this creates a trap for the estimator. Note, however, that the smallest Qe entry in the table must renormalize to
X'1000'. When the system is at that entry, the LPS renorm moves the Qe pointer such that at least two MPS renorms in succession are needed to return to the smallest entry. This avoids the trap.³

Figure 7  Coding inefficiency as a function of q (abscissa: −log₂ q). The solid curve is for a Qe table which allows all values between X'0AC1' and 1; the Qe value used for any given q is the smallest value in the table that minimizes the coding inefficiency. The dashed curve is for the 5-bit Qe table (Table 1), again with the smallest value of Qe chosen that minimizes the coding inefficiency.

The calculations and measurements described in this section show that coding inefficiency is strongly influenced by the Qe table granularity and the amount of change in index following renormalization. Table granularity and the amount of change directly influence the rate at which the system adapts to a change in q. However, the faster the adaptation rate, the more the distribution in Qe is spread, and the larger the coding inefficiency is for stationary statistics.

6. Multi-rate probability estimation
In real data sets the symbol probability can vary quite widely. Consequently, the adaptive nature of the probability estimator is extremely important in achieving good coding efficiencies. The two different Qe table granularities discussed in the preceding section exhibit quite different coding efficiencies for fixed-probability coding. However, the coarser granularity of the 5-bit table allows the estimator to arrive

³ G. G. Langdon suggested the particular integer representation (X'1000' corresponding to 0.75) used in developing these tables. His suggestion was motivated by simplicity in hardware implementation, in that this representation allowed a single-bit test to determine when renormalization of the interval was needed. However, this representation also guaranteed that the estimator would be trapped at the smallest integer value, Qe = 1. We had explored conditional renormalization as a means of centering the estimator at the correct Qe. Langdon suggested using it to avoid the trap.

at the appropriate Qe value at a cost of about half as many bits in the compressed data string. This illustrates the trade-off that must be made between good coding efficiency for stationary probabilities and rapid estimation of changing probabilities.
A second-order process has been developed, however, which allows use of the 6-bit Qe table (retaining most of the coding efficiency for stable statistics) and yet provides very fast estimation of changing probabilities. This second-order process is based on measurement of the correlation of renormalizations. If the probability of a given renormalization is about 0.5, the probability of two renormalizations in sequence being the same is also about 0.5. Only if the current estimate of Qe is poor will the renormalization correlation probability significantly exceed 0.5.
Correlation of renormalizations is detected by comparing the previous renormalization (for a given context) to the current one. A 4-bit counter, kept individually for each context, is used to determine the degree to which this correlation is occurring. The counter is incremented by one if the renormalization is the same as the last and decremented by two if the renormalization is different. The one exception to this rule is at the minimum value of Qe. A spurious indication of correlation would result if the counter were incremented in that state. If the counter is zero, the schedule for change in the Qe index shown
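The counter update just described can be sketched as follows; the clamping to a 4-bit range is our assumption about how the counter is bounded:

```python
# Sketch of the renorm-correlation counter update described above: +1 when
# the current renormalization matches the previous one, -2 when it differs.
# Clamping to the range of a 4-bit counter (0..15) is an assumption.

def update_counter(r, same_as_last):
    r = r + 1 if same_as_last else r - 2
    return max(0, min(15, r))
```

With a well-matched Qe the two renorm types alternate often enough that the −2 steps keep the counter near zero; sustained agreement drives it upward, signaling a poor estimate.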