Practical Implementations of
Arithmetic Coding

Paul G. Howard and Jeffrey Scott Vitter

Brown University
Department of Computer Science
Technical Report No. 92-18
Revised version, April 1992
(Formerly Technical Report No. CS-91-45)

Appears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, pages 85-112.

A shortened version appears in the proceedings of the International Conference on Advances in Communication and Control (COMCON 3), Victoria, British Columbia, Canada, October 16-18, 1991.

PRACTICAL IMPLEMENTATIONS OF
ARITHMETIC CODING¹

Paul G. Howard²
Jeffrey Scott Vitter³
Department of Computer Science
Brown University
Providence, R.I. 02912-1910

Abstract

We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.

Index terms: Data compression, arithmetic coding, adaptive modeling, analysis of algorithms, data structures, low precision arithmetic.

¹A similar version of this paper appears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, 85-112. A shortened version of this paper appears in the proceedings of the International Conference on Advances in Communication and Control (COMCON 3), Victoria, British Columbia, Canada, October 16-18, 1991.
²Support was provided in part by NASA Graduate Student Researchers Program grant NGT-50420 and by a National Science Foundation Presidential Young Investigators Award grant with matching funds from IBM. Additional support was provided by a Universities Space Research Association/CESDIS associate membership.
³Support was provided in part by National Science Foundation Presidential Young Investigator Award CCR-9047466 with matching funds from IBM, by NSF research grant CCR-9007851, by Army Research Office grant DAAL03-91-G-0035, and by the Office of Naval Research and the Defense Advanced Research Projects Agency under contract N00014-91-J-4052, ARPA Order No. 8225. Additional support was provided by a Universities Space Research Association/CESDIS associate membership.

1 Data Compression and Arithmetic Coding

Data can be compressed whenever some data symbols are more likely than others. Shannon [54] showed that for the best possible compression code (in the sense of minimum average code length), the output length contains a contribution of $-\lg p$ bits from the encoding of each symbol whose probability of occurrence is p. If we can provide an accurate model for the probability of occurrence of each possible symbol at every point in a file, we can use arithmetic coding to encode the symbols that actually occur; the number of bits used by arithmetic coding to encode a symbol with probability p is very nearly $-\lg p$, so the encoding is very nearly optimal for the given probability estimates.

In this paper we show by theorems and examples how arithmetic coding achieves its performance. We also point out some of the drawbacks of arithmetic coding in practice, and propose a unified compression system for overcoming them. We begin by attempting to clear up some of the false impressions commonly held about arithmetic coding; it offers some genuine benefits, but it is not the solution to all data compression problems.

The most important advantage of arithmetic coding is its flexibility: it can be used in conjunction with any model that can provide a sequence of event probabilities. This advantage is significant because large compression gains can be obtained only through the use of sophisticated models of the input data. Models used for arithmetic coding may be adaptive, and in fact a number of independent models may be used in succession in coding a single file. This great flexibility results from the sharp separation of the coder from the modeling process [47]. There is a cost associated with this flexibility: the interface between the model and the coder, while simple, places considerable time and space demands on the model's data structures, especially in the case of a multi-symbol input alphabet.

The other important advantage of arithmetic coding is its optimality. Arithmetic coding is optimal in theory and very nearly optimal in practice, in the sense of encoding using minimal average code length. This optimality is often less important than it might seem, since Huffman coding [25] is also very nearly optimal in most cases [8,9,18,39]. When the probability of some single symbol is close to 1, however, arithmetic coding does give considerably better compression than other methods. The case of highly unbalanced probabilities occurs naturally in bilevel (black and white) image coding, and it can also arise in the decomposition of a multi-symbol alphabet into a sequence of binary choices.

The main disadvantage of arithmetic coding is that it tends to be slow. We shall see that the full precision form of arithmetic coding requires at least one multiplication per event and in some implementations up to two multiplications and two divisions per event. In addition, the model lookup and update operations are slow because of the input requirements of the coder. Both Huffman coding and Ziv-Lempel [59,60] coding are faster because the model is represented directly in the data structures used for coding. (This reduces the coding efficiency of those methods by narrowing the range of possible models.) Much of the current research in arithmetic coding concerns finding approximations that increase coding speed without compromising compression efficiency. The most common method is to use an approximation to the multiplication operation [10,27,29,43]; in this paper we present an alternative approach using table lookups and approximate probability estimation.

Another disadvantage of arithmetic coding is that it does not in general produce a prefix code. This precludes parallel coding with multiple processors. In addition, the potentially unbounded output delay makes real-time coding problematical in critical applications, but in practice the delay seldom exceeds a few symbols, so this is not a major problem. A minor disadvantage is the need to indicate the end of the file.

One final minor problem is that arithmetic codes have poor error resistance, especially when used with adaptive models [5]. A single bit error in the encoded file causes the decoder's internal state to be in error, making the remainder of the decoded file wrong. In fact this is a drawback of all adaptive codes, including Ziv-Lempel codes and adaptive Huffman codes [12,15,18,26,55,56]. In practice, the poor error resistance of adaptive coding is unimportant, since we can simply apply appropriate error correction coding to the encoded file. More complicated solutions appear in [5,20], in which errors are made easy to detect, and upon detection of an error, bits are changed until no errors are detected.

Overview of this paper. In Section 2 we give a tutorial on arithmetic coding. We include an introduction to modeling for text compression. We also restate several important theorems from [22] relating to the optimality of arithmetic coding in theory and in practice.

In Section 3 we present some of our current research into practical ways of improving the speed of arithmetic coding without sacrificing much compression efficiency. The center of this research is a reduced-precision arithmetic coder, supported by efficient data structures for text modeling.

2 Tutorial on Arithmetic Coding

In this section we explain how arithmetic coding works and give implementation details; our treatment is based on that of Witten, Neal, and Cleary [58]. We point out the usefulness of binary arithmetic coding (that is, coding with a 2-symbol alphabet), and discuss the modeling issue, particularly high-order Markov modeling for text compression. Our focus is on encoding, but the decoding process is similar.

2.1 Arithmetic coding and its implementation

Basic algorithm. The algorithm for encoding a file using arithmetic coding works conceptually as follows:

[Figure 1: Subdivision of the current interval based on the probability of the input symbol $a_i$ that occurs next.]

1. We begin with a "current interval" [L, H) initialized to [0, 1).

2. For each symbol of the file, we perform two steps (see Figure 1):

   (a) We subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.

   (b) We select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the new current interval.

3. We output enough bits to distinguish the final current interval from all other possible final intervals.

The length of the final subinterval is clearly equal to the product of the probabilities of the individual symbols, which is the probability p of the particular sequence of symbols in the file. The final step uses almost exactly $-\lg p$ bits to distinguish the file from all other possible files. We need some mechanism to indicate the end of the file, either a special end-of-file symbol coded just once, or some external indication of the file's length.

In step 2, we need to compute only the subinterval corresponding to the symbol $a_i$ that actually occurs. To do this we need two cumulative probabilities, $P_C = \sum_{k=1}^{i-1} p_k$ and $P_N = \sum_{k=1}^{i} p_k$. The new subinterval is $[L + P_C(H-L),\ L + P_N(H-L))$. The need to maintain and supply cumulative probabilities requires the model to have a complicated data structure; Moffat [35] investigates this problem, and concludes for a multi-symbol alphabet that binary search trees are about twice as fast as move-to-front lists.

Example 1: We illustrate a non-adaptive code, encoding the file containing the symbols bbb using arbitrary fixed probability estimates $p_a = 0.4$, $p_b = 0.5$, and $p_{EOF} = 0.1$. Encoding proceeds as follows:

    Current interval   Action      Subinterval a     Subinterval b     Subinterval EOF   Input
    [0.000, 1.000)     Subdivide   [0.000, 0.400)    [0.400, 0.900)    [0.900, 1.000)    b
    [0.400, 0.900)     Subdivide   [0.400, 0.600)    [0.600, 0.850)    [0.850, 0.900)    b
    [0.600, 0.850)     Subdivide   [0.600, 0.700)    [0.700, 0.825)    [0.825, 0.850)    b
    [0.700, 0.825)     Subdivide   [0.700, 0.750)    [0.750, 0.812)    [0.812, 0.825)    EOF
    [0.812, 0.825)

The final interval (without rounding) is [0.8125, 0.825), which in binary is approximately [0.11010 00000, 0.11010 01100). We can uniquely identify this interval by outputting 1101000. According to the fixed model, the probability p of this particular file is $(0.5)^3(0.1) = 0.0125$ (exactly the size of the final interval) and the code length (in bits) should be $-\lg p = 6.322$. In practice we have to output 7 bits. □
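
As a concrete illustration of steps 1-3, here is a minimal Python sketch using exact rational arithmetic; the model table, function names, and the bit-selection routine are ours, chosen to reproduce Example 1:

    from fractions import Fraction

    # Fixed model of Example 1: cumulative probability bounds (P_C, P_N)
    # for each symbol, as exact fractions.
    MODEL = {'a':   (Fraction(0),      Fraction(4, 10)),
             'b':   (Fraction(4, 10),  Fraction(9, 10)),
             'EOF': (Fraction(9, 10),  Fraction(1))}

    def final_bits(low, high):
        # Emit bits b1..bk until the dyadic interval
        # [0.b1..bk, 0.b1..bk + 2^-k) lies entirely inside [low, high).
        bits, a, b = [], Fraction(0), Fraction(1)
        while not (low <= a and b <= high):
            mid = (a + b) / 2
            if mid <= low:
                bits.append('1'); a = mid   # refine toward the right half
            else:
                bits.append('0'); b = mid   # refine toward the left half
        return ''.join(bits)

    def encode(symbols):
        # Steps 1-2: shrink the current interval [low, high) once per symbol.
        low, high = Fraction(0), Fraction(1)
        for s in list(symbols) + ['EOF']:
            p_c, p_n = MODEL[s]
            low, high = low + p_c * (high - low), low + p_n * (high - low)
        return final_bits(low, high)        # step 3

    print(encode('bbb'))   # prints 1101000, matching Example 1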

The idea of arithmetic coding originated with Shannon in his seminal 1948 paper on information theory [54]. It was rediscovered by Elias about 15 years later, as briefly mentioned in [1].

Implementation details. The basic implementation of arithmetic coding described above has two major difficulties: the shrinking current interval requires the use of high precision arithmetic, and no output is produced until the entire file has been read. The most straightforward solution to both of these problems is to output each leading bit as soon as it is known, and then to double the length of the current interval so that it reflects only the unknown part of the final interval. Witten, Neal, and Cleary [58] add a clever mechanism for preventing the current interval from shrinking too much when the endpoints are close to 1/2 but straddle 1/2. In that case we do not yet know the next output bit, but we do know that whatever it is, the following bit will have the opposite value; we merely keep track of that fact, and expand the current interval symmetrically about 1/2. This follow-on procedure may be repeated any number of times, so the current interval size is always longer than 1/4.

Mechanisms for incremental transmission and fixed precision arithmetic have been developed through the years by Pasco [40], Rissanen [48], Rubin [52], Rissanen and Langdon [49], Guazzo [19], and Witten, Neal, and Cleary [58]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the additions is roughly equivalent to the follow-on procedure described above.

We now describe in detail how the coding and interval expansion work. This process takes place immediately after the selection of the subinterval corresponding to an input symbol. We repeat the following steps (illustrated schematically in Figure 2) as many times as possible:

a. If the new subinterval is not entirely within one of the intervals [0, 1/2), [1/4, 3/4), or [1/2, 1), we stop iterating and return.

[Figure 2: Interval expansion process. (a) No expansion. (b) Interval in [0, 1/2). (c) Interval in [1/2, 1). (d) Interval in [1/4, 3/4) (follow-on case).]

b. If the new subinterval lies entirely within [0, 1/2), we output 0 and any 1s left over from previous symbols; then we double the size of the interval [0, 1/2), expanding toward the right.

c. If the new subinterval lies entirely within [1/2, 1), we output 1 and any 0s left over from previous symbols; then we double the size of the interval [1/2, 1), expanding toward the left.

d. If the new subinterval lies entirely within [1/4, 3/4), we keep track of this fact for future output; then we double the size of the interval [1/4, 3/4), expanding in both directions away from the midpoint.
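
A minimal Python sketch of this expansion loop (the function and variable names are ours); it tracks pending follow-on bits and applies cases a-d until none fires:

    from fractions import Fraction

    HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)

    def expand(low, high, pending, out):
        # Repeat steps a-d as many times as possible for one new subinterval;
        # `pending` counts follow-on bits, `out` collects output bits.
        while True:
            if high <= HALF:                              # case b: in [0, 1/2)
                out.append('0'); out.extend('1' * pending); pending = 0
                low, high = 2 * low, 2 * high
            elif low >= HALF:                             # case c: in [1/2, 1)
                out.append('1'); out.extend('0' * pending); pending = 0
                low, high = 2 * low - 1, 2 * high - 1
            elif QUARTER <= low and high <= 3 * QUARTER:  # case d: follow-on
                pending += 1
                low, high = 2 * low - HALF, 2 * high - HALF
            else:                                         # case a: stop
                return low, high, pending

For instance, called on the subinterval [0.75, 0.80) of Example 2 below with one pending follow-on bit, it emits 10100 and returns the state [0, 0.8).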

Example 2: We show the details of encoding the same file as in Example 1.

    Current interval   Action             Subinterval a   Subinterval b   Subinterval EOF   Input
    [0.00, 1.00)       Subdivide          [0.00, 0.40)    [0.40, 0.90)    [0.90, 1.00)      b
    [0.40, 0.90)       Subdivide          [0.40, 0.60)    [0.60, 0.85)    [0.85, 0.90)      b
    [0.60, 0.85)       Output 1
                       Expand [1/2, 1)
    [0.20, 0.70)       Subdivide          [0.20, 0.40)    [0.40, 0.65)    [0.65, 0.70)      b
    [0.40, 0.65)       follow
                       Expand [1/4, 3/4)
    [0.30, 0.80)       Subdivide          [0.30, 0.50)    [0.50, 0.75)    [0.75, 0.80)      EOF
    [0.75, 0.80)       Output 10
                       Expand [1/2, 1)
    [0.50, 0.60)       Output 1
                       Expand [1/2, 1)
    [0.00, 0.20)       Output 0
                       Expand [0, 1/2)
    [0.00, 0.40)       Output 0
                       Expand [0, 1/2)
    [0.00, 0.80)       Output 0

The "follow" output in the sixth line indicates the follow-on procedure: we keep track of our knowledge that the next output bit will be followed by its opposite; this "opposite" bit is the 0 output in the ninth line. The encoded file is 1101000, as before. □

Clearly the current interval contains some information about the preceding inputs; this information has not yet been output, so we can think of it as the coder's state. If $l$ is the length of the current interval, the state holds $-\lg l$ bits not yet output. In the basic method (illustrated by Example 1) the state contains all the information about the output, since nothing is output until the end. In the implementation illustrated by Example 2, the state always contains fewer than two bits of output information, since the length of the current interval is always more than 1/4. The final state in Example 2 is [0, 0.8), which contains $-\lg 0.8 \approx 0.322$ bits of information.

Use of integer arithmetic. In practice, the arithmetic can be done by storing the current interval in sufficiently long integers rather than in floating point or exact rational numbers. (We can think of Example 2 as using the integer interval [0, 100) by omitting all the decimal points.) We also use integers for the frequency counts used to estimate symbol probabilities. The subdivision process involves selecting non-overlapping intervals (of length at least 1) with lengths approximately proportional to the counts. To encode symbol $a_i$ we need two cumulative counts, $C = \sum_{k=1}^{i-1} c_k$ and $N = \sum_{k=1}^{i} c_k$, and the sum $T$ of all counts, $T = \sum_{k=1}^{n} c_k$. (Here and elsewhere we denote the alphabet size by $n$.) The new subinterval is $[L + \lfloor C(H-L)/T \rfloor,\ L + \lfloor N(H-L)/T \rfloor)$.

(In this discussion we continue to use half-open intervals as in the real arithmetic case. In implementations [58] it is more convenient to subtract 1 from the right endpoints and use closed intervals. Moffat [36] considers the calculation of cumulative frequency counts for large alphabets.)

Example 3: Suppose that at a certain point in the encoding we have symbol counts $c_a = 4$, $c_b = 5$, and $c_{EOF} = 1$ and current interval [25, 89) from the full interval [0, 128). Let the next input symbol be b. The cumulative counts for b are $C = 4$ and $N = 9$, and $T = 10$, so the new interval is $[25 + \lfloor 4(89-25)/10 \rfloor,\ 25 + \lfloor 9(89-25)/10 \rfloor) = [50, 82)$; we then increment the follow-on count and expand the interval once about the midpoint 64, giving [36, 100). It is possible to maintain higher precision, truncating (and adjusting to avoid overlapping subintervals) only when the expansion process is complete; this makes it possible to prove a tight analytical bound on the lost compression caused by the use of integer arithmetic, as we do in [22], restated as Theorem 1 below. In practice this refinement makes the coding more difficult without improving compression. □
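
The integer subdivision and expansion of Example 3 can be checked mechanically; the following Python sketch (function name ours) reproduces the numbers above:

    def subdivide_int(L, H, C, N, T):
        # Integer subdivision: subinterval of [L, H) for a symbol with
        # cumulative counts C (before the symbol) and N (through the
        # symbol), where T is the total count.
        return L + C * (H - L) // T, L + N * (H - L) // T

    # Example 3: counts c_a = 4, c_b = 5, c_EOF = 1; encode b in [25, 89)
    # within the full interval [0, 128).
    L, H = subdivide_int(25, 89, 4, 9, 10)
    print(L, H)                    # 50 82
    # [50, 82) lies inside [32, 96) = [1/4, 3/4) of [0, 128), so we take
    # the follow-on case and expand about the midpoint 64:
    L, H = 2 * L - 64, 2 * H - 64
    print(L, H)                    # 36 100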

Analysis. In [22] we prove a number of theorems about the code lengths of files coded with arithmetic coding. Most of the results involve the use of arithmetic coding in conjunction with various models of the input; these will be discussed in Section 2.3. Here we note two results that apply to implementations of the arithmetic coder. The first shows that using integer arithmetic has negligible effect on code length.

Theorem 1 If we use integers from the range [0, N) and use the high precision algorithm for scaling up the subrange, the code length is provably bounded by $1/(N \ln 2)$ bits per input symbol more than the ideal code length for the file.

For a typical value N = 65,536, the excess code length is less than $10^{-4}$ bit per input symbol.

The second result shows that if we indicate end-of-file by encoding a special symbol just once for the entire file, the additional code length is negligible.

Theorem 2 The use of a special end-of-file symbol when coding a file of length t using integers from the range [0, N) results in additional code length of less than $8t/(N \ln 2) + \lg N + 7$ bits.

Again the extra code length is negligible, less than 0.01 bit per input symbol for a typical 100,000 byte file.
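
A quick numeric check of the two bounds in Python, using the constants as restated above (the end-of-file constant term follows our reading of Theorem 2):

    import math

    N, t = 65536, 100_000
    print(1 / (N * math.log(2)))          # Theorem 1: ~2.2e-05 bit/symbol
    eof_bits = 8 * t / (N * math.log(2)) + math.log2(N) + 7
    print(eof_bits / t)                   # Theorem 2: ~4.1e-04 bit/symbol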

Since we seldom know the exact probabilities of the process that generated an input file, we would like to know how errors in the estimated probabilities affect the code length. We can estimate the extra code length by a straightforward asymptotic analysis. The average code length L for symbols produced by a given model in a given state is given by

    $$L = -\sum_{i=1}^{n} p_i \lg q_i,$$

where $p_i$ is the actual probability of the $i$th alphabet symbol and $q_i$ is its estimated probability. The optimal average code length for symbols in the state is the entropy of the state, given by

    $$H = -\sum_{i=1}^{n} p_i \lg p_i.$$

The excess code length is $E = L - H$; if we let $d_i = q_i - p_i$ and expand asymptotically in $d_i$, we obtain

    $$E = \frac{1}{2 \ln 2} \sum_{i=1}^{n} \frac{d_i^2}{p_i} + O(d_i^3).$$

(This corrects a similar derivation in [5], in which the factor of $1/\ln 2$ is omitted.) The vanishing of the linear terms means that small errors in the probabilities used by the coder lead to very small increases in code length. Because of this property, any coding method that uses approximately correct probabilities will achieve a code length close to the entropy of the underlying source. We use this fact in Section 3.1 to design a class of fast approximate arithmetic coders with small compression loss.
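
As a numeric illustration (the probabilities below are arbitrary, ours), the exact excess code length and the leading term of the expansion agree closely even for noticeably wrong estimates:

    import math

    def excess(ps, qs):
        # Exact excess code length L - H in bits per symbol.
        return sum(p * math.log2(p / q) for p, q in zip(ps, qs))

    def leading_term(ps, qs):
        # Leading term of the asymptotic expansion above.
        return sum((q - p) ** 2 / p for p, q in zip(ps, qs)) / (2 * math.log(2))

    p = [0.4, 0.5, 0.1]        # actual probabilities
    q = [0.38, 0.51, 0.11]     # slightly wrong estimates
    print(excess(p, q), leading_term(p, q))   # both ~0.0016 bit/symbol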

2.2 Binary arithmetic coding

The preceding discussion and analysis has focused on coding with a multi-symbol alphabet, although in principle it applies to a binary alphabet as well. It is useful to distinguish the two cases, since both the coder and the interface to the model are simpler for a binary alphabet. The coding of bilevel images, an important problem with a natural two-symbol alphabet, often produces probabilities close to 1, indicating the use of arithmetic coding to obtain good compression.
