Arithmetic Coding Revisited

ALISTAIR MOFFAT
The University of Melbourne
RADFORD M. NEAL
University of Toronto
and
IAN H. WITTEN
The University of Waikato
Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol probabilities, and the use of low-precision arithmetic, permitting implementation by fast shift/add operations. We also describe a modular structure that separates the coding, modeling, and probability estimation components of a compression system. To motivate the improved coder, we consider the needs of a word-based text compression program. We report a range of experimental results using this and other models. Complete source code is available.

Categories and Subject Descriptors: E.4 [Data]: Coding and Information Theory—data compaction and compression; E.1 [Data]: Data Structures

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Approximate coding, arithmetic coding, text compression, word-based model

This investigation was supported by the Australian Research Council and the Natural Sciences and Engineering Research Council of Canada. A preliminary presentation of this work was made at the 1995 IEEE Data Compression Conference.
Authors' addresses: A. Moffat, Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia; email: alistair@cs.mu.oz.au; R. M. Neal, Department of Statistics and Department of Computer Science, University of Toronto, Canada; email: radford@cs.utoronto.ca; I. H. Witten, Department of Computer Science, The University of Waikato, Hamilton, New Zealand; email: ihw@waikato.ac.nz.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1998 ACM 1046-8188/98/0700-0256 $5.00

ACM Transactions on Information Systems, Vol. 16, No. 3, July 1998, Pages 256–294.

1. INTRODUCTION

During its long gestation in the 1970's and early 1980's, arithmetic coding [Rissanen 1976; Rissanen and Langdon 1979; Rubin 1979; Rissanen and Langdon 1981; Langdon 1984] was widely regarded more as a curiosity than a practical coding technique. This was particularly true for applications where the alphabet has many symbols, as Huffman coding is usually reasonably effective in such cases [Manstetten 1992]. One factor that helped arithmetic coding gain the popularity it enjoys today was the publication of source code for a multisymbol arithmetic coder by Witten et al. [1987] in Communications of the ACM, which we refer to as the CACM implementation. One decade later, our understanding of arithmetic coding has further matured, and it is timely to review the components of that implementation and summarize the improvements that have emerged. We also describe a novel, previously unpublished, method for performing the underlying calculation needed for arithmetic coding. Software is available that implements the revised method.

The major improvements discussed in this article and implemented in the software are as follows:

—Enhanced models that allow higher-performance compression.

—A more modular division into modeling, estimation, and coding subsystems.

—Data structures that support arithmetic coding on large alphabets.

—Changes to the coding procedure that reduce the number of multiplications and divisions and which permit most of them to be done with low-precision arithmetic.

—Support for larger alphabet sizes and for more accurate representations of probabilities.

—A reformulation of the decoding procedure that greatly simplifies the decoding loop and improves decoding speed.

—An extension providing efficient coding for binary alphabets.
To motivate these changes, we examine in detail the needs of a word-based model for text compression. While not the best-performing model for text (see, for example, the compression results listed by Witten et al. [1994]), word-based modeling combines several attributes that test the capabilities and limitations of an arithmetic coding system.
The new implementation of arithmetic coding is both more versatile and more efficient than the CACM implementation. When combined with the same character-based model as the CACM implementation, the changes that we advocate result in up to two-fold speed improvements, with only a small loss of compression. This faster coding will also be of benefit in any other compression system that makes use of arithmetic coding (such as the block-sorting method of Burrows and Wheeler [1994]), though the percentage of overall improvement will of course vary depending on the time used in other operations and on the exact nature of the hardware being used.
The new implementation is written in C, and is publicly available through the Internet by anonymous ftp, at munnari.oz.au, directory /pub/arith_coder, file arith_coder.tar.Z or arith_coder.tar.gz. The original CACM package [Witten et al. 1987] is at ftp.cpsc.ucalgary.ca in file /pub/projects/ar.cod/cacm-87.shar. Software that implements the new method for performing the arithmetic coding calculations, but is otherwise similar to the CACM version, can be found at ftp.cs.toronto.edu in the directory /pub/radford, file lowp_ac.shar.
In the remainder of this introduction we give a brief review of arithmetic coding, describe modeling in general, and word-based models in particular, and discuss the attributes that the arithmetic coder must embody if it is to be usefully coupled with a word-based model. Section 2 examines the interface between the model and the coder, and explains how it can be designed to maximize their independence. Section 3 shows how accurate probability estimates can be maintained efficiently in an adaptive compression system, and describes an elegant structure due to Fenwick [1994]. In Section 4 the CACM arithmetic coder is reexamined, and our improvements are described. Section 5 analyzes the cost in compression effectiveness of using low precision for arithmetic operations. Low-precision operations may be desirable because they permit a shift/add implementation, details of which are discussed in Section 6. Section 7 describes the restricted coder for binary alphabets, and examines a simple binary model for text compression. Finally, Section 8 reviews the results and examines the various situations in which arithmetic coding should and should not be used.

1.1 The Idea of Arithmetic Coding

We now give a brief overview of arithmetic coding. For additional background the reader is referred to the work of Langdon [1984], Witten et al. [1987; 1994], Bell et al. [1990], and Howard and Vitter [1992; 1994].
Suppose we have a message composed of symbols over some finite alphabet. Suppose also that we know the probability of appearance of each of the distinct symbols, and seek to represent the message using the smallest possible number of bits. The well-known algorithm of Huffman [1952] takes a set of probabilities and calculates, for each symbol, a code word that unambiguously represents that symbol. Huffman's method is known to give the best possible representation when all of the symbols must be assigned discrete code words, each an integral number of bits long. The latter constraint in effect means that all symbol probabilities are approximated by negative powers of two. In an arithmetic coder the exact symbol probabilities are preserved, and so compression effectiveness is better, sometimes markedly so. On the other hand, use of exact probabilities means that it is not possible to maintain a discrete code word for each symbol; instead an overall code for the whole message must be calculated.
The mechanism that achieves this operates as follows. Suppose that $p_i$ is the probability of the $i$th symbol in the alphabet, and that variables $L$ and $R$ are initialized to 0 and 1 respectively. Value $L$ represents the smallest binary value consistent with a code representing the symbols processed so far, and $R$ represents the product of the probabilities of those symbols. To encode the next symbol, which (say) is the $j$th of the alphabet, both $L$ and $R$ must be refined: $L$ is replaced by $L + R\sum_{i=1}^{j-1} p_i$ and $R$ is replaced by $R \cdot p_j$, preserving the relationship between $L$, $R$, and the symbols so far processed. At the end of the message, any binary value between $L$ and $L + R$ will unambiguously specify the input message. We transmit the shortest such binary string, $c$. Because $c$ must have at least $-\lceil \log_2 R \rceil$ and at most $-\lceil \log_2 R \rceil + 2$ bits of precision, the procedure is such that a symbol with probability $p_j$ is effectively coded in approximately $-\log_2 p_j$ bits, thereby meeting the entropy-based lower bound of Shannon [1948].
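As a concrete, deliberately idealized illustration of this narrowing of the interval, the following fragment encodes a short message using ordinary double-precision arithmetic. The probability table, the message, and the use of floating point are our own illustrative choices; the coders described in this article work with fixed-precision integer arithmetic instead.

#include <stdio.h>

/* Idealized arithmetic "encoder": maintains L (low end) and R (range)
   exactly as described above.  Doubles stand in for the unbounded-
   precision values; a real coder renormalizes and emits bits instead. */
int main(void) {
    const double p[3] = {0.5, 0.3, 0.2};   /* assumed 3-symbol alphabet  */
    const int message[] = {0, 2, 1, 0};    /* symbols to encode          */
    const int n = sizeof(message) / sizeof(message[0]);

    double L = 0.0, R = 1.0;
    for (int k = 0; k < n; k++) {
        int j = message[k];
        double cum = 0.0;                  /* sum of p[i] for i < j      */
        for (int i = 0; i < j; i++)
            cum += p[i];
        L = L + R * cum;                   /* L <- L + R * sum_{i<j} p_i */
        R = R * p[j];                      /* R <- R * p_j               */
    }
    /* Any value in [L, L + R) now identifies the whole message. */
    printf("code interval is [%.10f, %.10f)\n", L, L + R);
    return 0;
}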
This simple description has ignored a number of important problems. Specifically, the process described above requires extremely high precision arithmetic, since $L$ and $R$ must potentially be maintained to a million bits or more of precision. We may also wonder how best to calculate the cumulative probability distribution, and how best to perform the arithmetic. Solving these problems has been a major focus of past research, and of the work reported here.

1.2 The Role of the Model

The CACM implementation [Witten et al. 1987] included two driver programs that coupled the coder with a static zero-order character-based model, and with a corresponding adaptive model. These were supplied solely to complete a compression program, and were certainly not intended to represent excellent models for compression. Nevertheless, several people typed in the code from the printed page and compiled and executed it, only—much to our chagrin—to express disappointment that the new method was inferior to widely available benchmarks such as Compress [Hamaker 1988; Witten et al. 1988].
In fact, all that the CACM article professed to supply was a state-of-the-art coder with two simple, illustrative, but mediocre models. One can think of the model as the “intelligence” of a compression scheme, which is responsible for deducing or interpolating the structure of the input, whereas the coder is the “engine room” of the compression system, which converts a probability distribution and a single symbol drawn from that distribution into a code [Bell et al. 1990; Rissanen and Langdon 1981]. In particular, the arithmetic coding “engine” is independent of any particular model. The example models in this article are meant purely to illustrate the demands placed upon the coder, and to allow different coders to be compared in a uniform test harness. Any improvements to the coder will primarily yield better compression efficiency, that is, a reduction in time or space usage. Improvements to the model will yield improved compression effectiveness, that is, a decrease in the size of the encoded data. In this article we are primarily interested in compression efficiency, although we will also show that the approximations inherent in the revised coder do not result in any substantial loss of compression effectiveness.
The revised implementation does, however, include a more effective word-based model [Bentley et al. 1986; Horspool and Cormack 1992; Moffat 1989], which represents the stream as a sequence of words and nonwords rather than characters, with facilities for spelling out new words as they are encountered using a subsidiary character mode. Since the entropy of words in typical English text is around 10–15 bits each, and that of nonwords is around 2–3 bits, between 12 and 18 bits are required to encode a typical five-character word and the following one-character nonword. Large texts are therefore compressed to around 30% of their input size (2.4 bits per character)—a significant improvement over the 55%–60% (4.4–4.8 bits per character) achieved by zero-order character-based models of English. Witten et al. [1994] give results comparing character-based models with word-based models.
A word-based compressor can also be faster than a character-based one. Once a good vocabulary has been established, most words are coded as single symbols rather than as character sequences, reducing the number of time-consuming coding operations required.
What is more relevant, for the purposes of this article, is that word-based models illustrate several issues that do not arise with character-based models:

—An efficient data structure is needed to accumulate frequency counts for a large alphabet.

—Multiple coding contexts are necessary, for tokens, characters, and lengths, for both words and nonwords. Here, a coding context is a conditioning class on which the probability distribution for the next symbol is based.

—An escape mechanism is required to switch from one coding context to another.

—Data structures must be resizable because there is no a priori bound on alphabet size.

All of these issues are addressed in this article.
Arithmetic coding is most useful for adaptive compression, especially with large alphabets. This is the application envisioned in this article, and in the design of the new implementation. For static and semistatic coding, in which the probabilities used for encoding are fixed, Huffman coding is usually preferable to arithmetic coding [Bookstein and Klein 1993; Moffat and Turpin 1997; Moffat et al. 1994].
Fig. 1. Modeling, statistics, and coder modules.

2. COOPERATING MODULES

It is useful to divide the process of data compression into three logically disjoint activities: modeling, statistics-gathering, and coding. This separation was first articulated by Rissanen and Langdon [1981], although the CACM implementation of Witten et al. [1987] combined statistics gathering with modeling to give a two-way split. This section describes the three-way partition, which is reflected in our implementation by three cooperating modules. Examples are given that show how the interfaces are used.

2.1 Modeling, Statistics, and Coding

Of the three modules, the most visible is that which performs the modeling. Least visible is the coder. Between these two is the statistics module, which manipulates a data structure to record symbol frequency counts (or some other estimate of symbol probabilities). In detail, a statistics module used with an arithmetic coder must be able to report the cumulative frequency of all symbols earlier in the alphabet than a given symbol, and to record that this symbol has occurred one more time. Both the model and the coder are oblivious to the exact mechanism used to accomplish this: the model is unaware of the probability attached to each symbol; and the coder is unaware of symbol identifiers and the size of the alphabet. This organization is shown in Figure 1.
The CACM implementation [Witten et al. 1987] has just one interface level, reflecting the two-way modeling/coding division of activity. An array cumfreq containing cumulative frequencies and an actual symbol identifier s are passed from model to coder to achieve the transmission of each symbol. This forces both modules to use an array to maintain their information—an unnecessarily restrictive requirement. By divorcing the statistics module from both model and coder, any suitable data structure can be used to maintain the statistics. Section 3 below considers some alternatives.

Table I. Module Interface Functions

Module       Encoder                        Decoder
Statistics   C ← create_context()           C ← create_context()
             encode(C, s)                   s ← decode(C)
             install_symbol(C, s)           install_symbol(C, s)
             purge_context(C)               purge_context(C)
Coder        start_encode()                 start_decode()
             arithmetic_encode(l, h, t)     target ← decode_target(t)
                                            arithmetic_decode(l, h, t)
             finish_encode()                finish_decode()
The main routines required to interface the modules are listed in Table I. (The implementation includes several other routines, mainly for initializing and terminating compression.) The routine install_symbol() in both encoder and decoder has the same functionality as encode() except that no output bits are transmitted or consumed: its purpose is to allow contexts to be primed, as if text had preceded the beginning of the transmission.

The routine purge_context removes all records for that context, leaving it as if it had just been created. This allows “synchronization points” to be inserted in the compressed output using a finish_encode and start_encode pair, from which points adaptive decoding can commence without needing to process the entire compressed message. Purging model frequencies and inserting synchronization points does, of course, reduce the compression rate.
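To make the three-way division concrete, the interface of Table I might be declared along the following lines in C. This is only a sketch: the type names and the exact return conventions are our own illustrative choices, not necessarily those used in the distributed arith_coder package.

/* Sketch of the module interface of Table I; the types `context',
   `symbol', and `freq' and the encode() return convention are
   illustrative choices, not a transcription of the released code. */
typedef struct context context;     /* opaque statistics structure */
typedef unsigned long  symbol;      /* symbol identifier           */
typedef unsigned long  freq;        /* frequency count             */

/* Statistics module (called by the model). */
context *create_context(void);
int      encode(context *C, symbol s);          /* may report escape_transmitted */
symbol   decode(context *C);                    /* may report an escape instead  */
void     install_symbol(context *C, symbol s);  /* make s available, no output   */
void     purge_context(context *C);             /* forget all records            */

/* Coder module (called by the statistics module). */
void start_encode(void);
void arithmetic_encode(freq l, freq h, freq t); /* code a symbol with range [l, h) of t  */
void finish_encode(void);

void start_decode(void);
freq decode_target(freq t);                     /* value in [0, t) locating the symbol   */
void arithmetic_decode(freq l, freq h, freq t); /* consume the identified symbol's range */
void finish_decode(void);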
A zero-order character-based model requires just one context and relatively simple control structures, as shown in the pseudocode of Algorithm Zero-Order Character-Based (Figure 2), which closely corresponds to the adaptive model described by Witten et al. [1987]. A context C is created, install_symbol() is used to make each valid character available, and encode() is called once for each character in the input. The compressed message is terminated by an end_of_message symbol which has also been previously installed in C. The method of Algorithm Zero-Order Character-Based can easily be extended to a first-order character-based model using an array of contexts, one for each possible conditioning character. This would require considerably more memory, but would improve the compression effectiveness without impacting execution speed. Many other modifications are possible.
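The figure itself is not reproduced in this extract, so the following fragment gives a minimal sketch of the encoding loop it describes, built on the interface declarations sketched above; the constants NUM_CHARS and END_OF_MESSAGE are illustrative assumptions.

#include <stdio.h>

/* Sketch of Algorithm Zero-Order Character-Based (encoder side only).
   NUM_CHARS and END_OF_MESSAGE are illustrative constants. */
#define NUM_CHARS      256
#define END_OF_MESSAGE NUM_CHARS    /* the 257th symbol terminates the message */

void zero_order_char_encode(void) {
    context *C = create_context();

    /* Prime the context: make every character and the terminator available. */
    for (symbol s = 0; s <= END_OF_MESSAGE; s++)
        install_symbol(C, s);

    start_encode();
    int ch;
    while ((ch = getchar()) != EOF)     /* one encode() call per input character */
        encode(C, (symbol) ch);
    encode(C, END_OF_MESSAGE);          /* terminate the compressed message      */
    finish_encode();
}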
Fig. 2. Algorithm Zero-Order Character-Based.

Complex models require the use of multiple contexts. The word-based model described in Section 1.2 uses six contexts: a zero-order context for words, a zero-order context for nonwords (sequences of spaces and punctuation), a zero-order character context for spelling out new words, a zero-order character context for spelling out new nonwords, and contexts for specifying the lengths of words and of nonwords. The encoder for that model is sketched as Algorithm Zero-Order Word-Based (Figure 3), except that for brevity the input is treated as a sequence of words rather than alternating “word, nonword” pairs and so only three contexts, denoted W, C, and L, are used. To cater for nonwords as well requires additional contexts W′, C′, and L′, along with an outer loop that alternates between words and nonwords by using each set of contexts in turn. Note that Algorithm Zero-Order Word-Based assumes that the length of each word is bounded, so that context L can be initialized. In our implementation the actual definition of a word was “a string of at most 16 alphanumeric characters”; long symbols are handled by breaking them into shorter ones with zero-length opposite symbols between.
Fig. 3. Algorithm Zero-Order Word-Based.

The decoder, omitted in Figure 3, is the natural combination of the ideas presented in Algorithms Zero-Order Character-Based (Figure 2) and Zero-Order Word-Based (Figure 3).

2.2 Coding Novel Symbols

The character-based model of Algorithm Zero-Order Character-Based (Figure 2) codes at most 257 different symbols (256 different eight-bit characters plus the end_of_message symbol), all of which are made available in that context by explicit install_symbol() calls. In contrast to this, in the word-based model there is no limit to the number of possible symbols—the number of distinct word tokens in an input stream might be hundreds, thousands, or even millions. To cater for situations such as this in which the alphabet size is indeterminate, the function call encode(C, s) returns a flag escape_transmitted if the symbol s is not known in context C, or if, for some other reason, s has zero probability. In this event, the word is
encoded using the length and character models, and is then installed into the lexicon. As well as returning a flag to indicate a zero-probability symbol, the encoder must also explicitly transmit an escape code to the decoder so that the corresponding call decode(C) can similarly return an exception flag.
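The control flow just described might be summarized, for the word side of the word-based encoder, by the following fragment (context names W, L, and C as in Algorithm Zero-Order Word-Based). The helper word_symbol() and the flag ESCAPE_TRANSMITTED are illustrative names of our own, not the identifiers used in the released code.

/* Sketch: encode one word, escaping to the length context L and the
   character context C when the word is novel.  word_symbol() and
   ESCAPE_TRANSMITTED are illustrative assumptions only. */
#define ESCAPE_TRANSMITTED 1

symbol word_symbol(const char *word, int len);  /* illustrative lexicon lookup */

void encode_word(context *W, context *L, context *C,
                 const char *word, int len) {
    symbol s = word_symbol(word, len);
    if (encode(W, s) == ESCAPE_TRANSMITTED) {
        /* Novel word: spell it out, then add it to the lexicon. */
        encode(L, (symbol) len);                          /* transmit its length */
        for (int i = 0; i < len; i++)
            encode(C, (symbol) (unsigned char) word[i]);  /* ... and its letters */
        install_symbol(W, s);             /* later occurrences are coded directly */
    }
}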
This raises the vexatious question as to what probability should be assigned to this escape code—the so-called zero-frequency problem. Of the methods described in the literature (summarized by Moffat et al. [1994]), we chose a modified version of method XC [Witten and Bell 1991] which we call method AX, for “approximate X.” Method XC approximates the number of symbols of frequency zero by the number of symbols of frequency one. If $t_1$ symbols have appeared exactly once, symbol $s_i$ has appeared $c_i$ times, and $t = \sum_i c_i$ is the total number of symbols that have been coded to date, then the escape code probability $p_{escape}$ is given by $t_1/t$ and the probability of symbol $s_i$ is estimated as $p_i = (1 - t_1/t) \cdot (c_i/t)$.

The drawback of method XC is the need for a two-stage coding process when the symbol is not novel—one step to transmit the information “not novel” (probability $1 - t_1/t$), and a second to indicate which nonnovel symbol it is (probability $c_i/t$). It is not feasible to calculate the exact probability for use in a single coding step, since the need to represent the product $p_i$ restricts $t$ to a very small range of values if overflow is to be avoided (see also the discussion in Section 4 below). Instead, for method AX we advocate

$p_{escape} = (t_1 + 1)/(t + t_1 + 1)$

$p_i = c_i/(t + t_1 + 1)$.
The “+1” allows novel events to be represented even when there are no events of frequency one (in method XC this exception is handled by reverting to another method called “method C” in the literature); and $t_1$ is incorporated additively in the denominator by taking a first-order binomial approximation to $(1 - t_1/t)^{-1}$. With this method a single coding step suffices, as $t + t_1 + 1$ can be represented in roughly half the number of bits as the denominator $t^2$ required by method XC. The difference is crucial, since for flexibility we desire $t$ to be similar in magnitude to the largest integer that can be represented in a machine word. The change distorts the probabilities slightly, but all zero-frequency methods are heuristics anyway, and the effect is small.
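In terms of the (l, h, t) triple passed to arithmetic_encode(), method AX can be realized with a single call by working over the adjusted total $t + t_1 + 1$. The fragment below is one possible rendering of that arrangement: the helpers low_count() and high_count() and the placement of the escape range at the top of the total are illustrative assumptions, not a transcription of the released code.

/* Illustrative helpers assumed to be provided by the statistics structure. */
freq low_count(context *ctx, symbol s);   /* l_s: counts of symbols before s   */
freq high_count(context *ctx, symbol s);  /* h_s: counts up to and including s */

/* Method AX in a single coding step: p_i = c_i / (t + t_1 + 1) for a known
   symbol, and the escape symbol takes the top t_1 + 1 units (assumed layout). */
void encode_with_AX(context *ctx, symbol s, int is_novel, freq t, freq t1) {
    freq adjusted = t + t1 + 1;                      /* t + t_1 + 1 */
    if (is_novel)
        arithmetic_encode(t, adjusted, adjusted);    /* p_escape = (t_1 + 1)/(t + t_1 + 1) */
    else
        arithmetic_encode(low_count(ctx, s), high_count(ctx, s), adjusted);
}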
Once the escape code has been transmitted, the new token can be spelled out and added to the W context. Both encoder and decoder assign it the next available symbol number, so there is no need to specify an identifier.

2.3 Storage Management

An industrial-strength compression system must provide some mechanism to bound the amount of memory used during encoding and decoding. For
example, some compressors reclaim list items using a least-recently-used policy, so that the model structure continues to evolve when memory is exhausted. Others purge the model when memory is full, but retain a sliding window buffer of recent symbols so that a smaller replacement model can be rebuilt (using install_symbol) immediately. The exact mechanism used in any application will depend upon the needs of that application, in particular, on the amount of memory consumed by the structural model (Figure 1). Because of this dependence, the only facility we provide in our implementation is the routine purge_context(), which removes all records for that context, as if it had just been created. One rationalization for this abrupt “trash and start again” approach is that memory is typically so plentiful that trashing should be rare enough to cause little deterioration in the average compression rate. Furthermore, in the particular case of the word-based model the impact is softened by the fact that the underlying character-based models do not need to be purged, so transmission of novel words while the lexicon is rebuilt is less expensive than it was originally. In practice, purging of the lexicon in the word-based compressor is rare. A memory limit of one megabyte is ample to process texts with a vocabulary of about 30,000 distinct words, such as The Complete Works of Shakespeare.

3. DATA STRUCTURES

We now turn to the statistics module, which is responsible for managing the data structure that records cumulative symbol frequencies. In particular, a call encode(C, s) to encode symbol s in context C must result in a call to the coder module of arithmetic_encode($l_{C,s}$, $h_{C,s}$, $t_C$), where $l_{C,s}$ and $h_{C,s}$ are the cumulative frequency counts in context C of symbols respectively prior to and including s, according to some symbol ordering, and $t_C$ is the total frequency of all symbols recorded in context C, possibly adjusted upward by the inclusion of a “count” for the escape symbol. To avoid excessive subscripting, we suppress explicit references to the context C whenever there is no ambiguity and use $l_s$, $h_s$, and $t$ to denote the values that must be calculated.

3.1 Ordered List

In the CACM implementation [Witten et al. 1987] cumulative frequency counts are stored in an array, which is summed backward from the end of the alphabet. The alphabet is ordered by decreasing symbol frequency so as to place common symbols near the beginning. Symbols are indexed through a permutation vector, allowing $l_s$ and $h_s$ to be calculated very quickly. In order to increment the count of symbol s in the cumulative frequency array, a loop is used to increment each symbol to the left of—that is, of greater frequency than—s, thereby keeping the cumulative counts correct. To maintain this structure, 3n words of storage are needed for an alphabet of n symbols.
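A simplified rendering of this arrangement is sketched below. For clarity it omits the permutation vector and the reordering by frequency, and the array and routine names are ours rather than those of the CACM code; the coding parameters follow the $l_s$, $h_s$, $t$ convention defined above.

/* Simplified CACM-style structure: cum[i] holds the total count of
   symbols i, i+1, ..., n (summed backward from the end of the alphabet),
   with cum[n+1] = 0 and t = cum[1]. */
#define MAX_N 1000                    /* illustrative alphabet bound */
static unsigned long cum[MAX_N + 2];

/* Coding parameters for symbol s: l_s, h_s, and t. */
void ordered_list_range(int s, unsigned long *l, unsigned long *h,
                        unsigned long *t) {
    *t = cum[1];
    *l = cum[1] - cum[s];             /* count of symbols before s        */
    *h = cum[1] - cum[s + 1];         /* ... and up to and including s    */
}

/* Record one more occurrence of s: every cumulative entry at or to the
   left of s (the more frequent symbols) gains one. */
void ordered_list_increment(int s) {
    for (int i = s; i >= 1; i--)
        cum[i] += 1;
}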
Although compact compared to the demands of adaptive Huffman coders, this mechanism is only effective for small alphabets, or for highly skewed distributions. An alternative structure has been described which bounds the number of loop iterations required to calculate the coding parameters for each symbol by the number of bits produced when that symbol is entropy-coded [Moffat 1990b]. The structure is asymptotically optimal for both uniform and skew distributions, taking time linear in the number of input symbols and output bits; it is also efficient in practice [Moffat et al. 1994]. About 4n words are required for an alphabet of n symbols.

3.2 Implicit Tree Structure

More recently, Fenwick [1994] proposed an implicit tree organization for storing an array of cumulative frequency counts that preserves the symbol ordering and so requires only n words to represent an n-symbol alphabet. Fenwick's method calculates and updates $l_s = \sum_{i=1}^{s-1} f_i$ and $h_s = \sum_{i=1}^{s} f_i$, where $f_s$ is the observed frequency of symbol s, in $\Theta(\log n)$ time per symbol, irrespective of the underlying entropy of the distribution.¹ Although this is suboptimal for very skewed distributions (see, for example, the results of Moffat et al. [1994]), Fenwick's method works well for most of the probability distributions encountered in practice, and is employed in our revised arithmetic coding implementation because of its low space requirements.

¹We make use of the standard functional comparators $O()$, $\Theta()$, and $\Omega()$ to mean “grows no faster than,” “grows at the same rate as,” and “grows at least as quickly as,” respectively. For precise definitions of these comparators see, for example, Cormen et al. [1990].
Fenwick proposed that a single array F[1..n] be used to record partial cumulative frequencies in a regular pattern based upon powers of two. At positions that are one less than a multiple of two the array records the corresponding symbol frequency; at positions that are two less than a multiple of four the value stored is the sum of the frequency for two symbols; at positions that are four less than a multiple of eight the value stored is the sum of the immediately preceding four symbol frequencies; and so on.
Cumulative counts can be updated in logarithmic time in this structure by using the following device to impose an implicit tree structure on the array. Suppose that s is some symbol number in the range $1 \le s \le n$. Define backward(s) to be the integer obtained from s by subtracting the binary number corresponding to the rightmost one-bit in s. For example, the binary representation of 13 is 1101, and so backward(13) = 1100₂ = 12; backward(12) = 1000₂ = 8; and backward(8) = 0. Similarly, define forward(s) to be $s + 2^i$ where i is again the position of the rightmost one-bit in s. For example, forward(13) = 14, forward(14) = 16, and forward(16) = 32. Both backward and forward can be readily implemented using bitwise operations if integers are represented in two's-complement form:² backward(i) as either “i − (i AND −i)” or “i AND (i − 1)”; and forward(i) as “i + (i AND −i)”, where AND is a bitwise logical “and” operator.

Array F is assigned values so that $F[s] = h_s - h_{backward(s)}$, where $h_0 = 0$. Table II shows an example set of $f_s$ values, and the corresponding F[i] values stored by Fenwick's method.
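In C the two index functions are one-line operations; the following is a direct rendering of the expressions quoted above, assuming two's-complement integers:

/* backward(i) and forward(i) as two's-complement bit operations;
   (i & -i) isolates the rightmost one-bit of i. */
static int backward(int i) { return i & (i - 1); }   /* equivalently: i - (i & -i) */
static int forward(int i)  { return i + (i & -i); }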
Fig. 4. Algorithm Fenwick Get Frequency.

Given the array F, calculating $h_s$ and incrementing the frequency count of symbol s is achieved by the loop shown in Algorithm Fenwick Get Frequency (Figure 4), which calculates $h_s$ using F, where $1 \le s \le n$. For example, $h_{13}$ is calculated as F[13] + F[12] + F[8], which by definition of F is the telescoping sum $(h_{13} - h_{12}) + (h_{12} - h_8) + (h_8 - h_0) = h_{13}$, as required. This is achieved in steps (1) and (2) of Algorithm Fenwick Get Frequency (Figure 4). It is also straightforward to increment a frequency count, using a process that updates the power-of-two aligned entries to the right of F[s], as shown by steps (3) and (4) in Algorithm Fenwick Get Frequency (Figure 4). The value of variable increment can for the moment be taken to be one, but, as is explained below, may be larger than one if some further constraint—such as that the frequencies be partially normalized in some range—is also to be maintained. Since $l_s = h_{s-1}$, the same process can be used to calculate $l_s$—without, of course, steps (3) and (4). (In practice, the two values are calculated jointly, since much of the work is common.) The decoding process, which must determine a symbol s whose cumulative frequency range straddles an integer target, is sketched in Algorithm Fenwick Get Symbol in Figure 5 (see Fenwick [1994]
for details). The important point to note is that all access and update operations on F require only $\Theta(\log n)$ time.
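Figures 4 and 5 themselves are not reproduced in this extract, so the following is a minimal sketch of the access-and-update loop that Figure 4 describes, written around the backward() and forward() helpers given above; bounds handling and the choice of increment are simplified for illustration, and the decoding search of Figure 5 is not shown.

/* Sketch of Algorithm Fenwick Get Frequency: return h_s and then add
   `increment' to the count of symbol s (1 <= s <= n).  F[] is the
   Fenwick array described in the text; l_s is obtained by querying
   position s - 1 and omitting the update. */
unsigned long fenwick_get_and_increment(unsigned long F[], int n,
                                        int s, unsigned long increment) {
    unsigned long h = 0;

    /* Steps (1)-(2): accumulate h_s by following backward pointers. */
    for (int i = s; i > 0; i = backward(i))
        h += F[i];

    /* Steps (3)-(4): update the power-of-two aligned entries at and
       to the right of position s. */
    for (int i = s; i <= n; i = forward(i))
        F[i] += increment;

    return h;
}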
²Slightly altered calculations are necessary for one's-complement or sign-magnitude representations [Fenwick 1994].