Arithmetic Coding for Data Compression
Article in Proceedings of the IEEE · July 1994
DOI: 10.1109/5.286189
`
`
CS-1994-09

Arithmetic Coding for Data Compression

Paul G. Howard, Jeffrey S. Vitter

Department of Computer Science
Duke University
Durham, North Carolina 27708-0129

March 25, 1994
`
`
`
`Arithmetic Coding for Data Compression
`
`PAUL G. HOWARD AND JEFFREY SCOTT VITTER, FELLOW, IEEE
`
`Arithmetic coding provides an effective mechanism for remov-
`ing redundancy in the encoding of data. We show how arithmetic
`coding works and describe an efficient implementation that uses
`table lookup as a fast alternative to arithmetic operations. The
`reduced-precision arithmetic has a provably negligible effect on the
`amount of compression achieved. We can speed up the implemen-
`tation further by use of parallel processing. We discuss the role of
`probability models and how they provide probability information
`to the arithmetic coder. We conclude with perspectives on the
`comparative advantages and disadvantages of arithmetic coding.
Index terms: Data compression, arithmetic coding, lossless
compression, text modeling, image compression, text compression,
adaptive, semi-adaptive.
`
`I. ARITHMETIC CODING
`
`The fundamental problem of lossless compression is to de-
`compose a data set (for example, a text file or an image)
`into a sequence of events, then to encode the events using as
`few bits as possible. The idea is to assign short codewords
`to more probable events and longer codewords to less prob-
`able events. Data can be compressed whenever some events
`are more likely than others. Statistical coding techniques
`use estimates of the probabilities of the events to assign the
codewords. Given a set of mutually distinct events e1, e2, e3,
..., er, and an accurate assessment of the probability dis-
tribution P of the events, Shannon [1] proved that the
smallest possible expected number of bits needed to encode
an event is the entropy of P, denoted by

H(P) = Σ_{k=1}^{r} −p{ek} log2 p{ek},
`
where p{ek} is the probability that event ek occurs. An op-
timal code outputs −log2 p bits to encode an event whose
probability of occurrence is p. Pure arithmetic codes sup-
plied with accurate probabilities provide optimal compres-
sion. The older and better-known Huffman codes [2] are
`optimal only among instantaneous codes, that is, those in
`which the encoding of one event can be decoded before en-
`coding has begun for the next event.
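For concreteness, the entropy bound can be evaluated directly from a probability distribution; the following sketch (ours, not part of the original paper) computes H(P):

```python
from math import log2

def entropy(probs):
    # H(P) = sum over k of -p{e_k} * log2 p{e_k}; zero-probability
    # events contribute nothing to the sum
    return sum(-p * log2(p) for p in probs if p > 0)

# A distribution over three events; an optimal code averages 1.5 bits/event
print(round(entropy([0.5, 0.25, 0.25]), 2))  # 1.5
```

Here an optimal code assigns 1, 2, and 2 bits to the three events, matching the entropy exactly because each probability is a power of 1/2.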
`In theory, arithmetic codes assign one "codeword" to each
`possible data set. The codewords consist of half-open subin-
`tervals of the half-open unit interval [0, 1), and are expressed
`by specifying enough bits to distinguish the subinterval cor-
`responding to the actual data set from all other possible
`subintervals. Shorter codes correspond to larger subinter-
`vals and thus more probable input data sets. In practice,
`
`Manuscript received Mmm DD, 1993. Some of this work was per-
`formed while both authors were at Brown University and while the
`first author was at Duke University. Support was provided in part by
`NASA Graduate Student Researchers Program grant NGT-50420, by
`a National Science Foundation Presidential Young Investigator Award
`with matching funds from IBM, and by Air Force Office of Scientific
Research grant number F49620-92-J-0515. Additional support was
`provided by Universities Space Research Association/CESDIS asso-
`ciate memberships.
`P. G. Howard is with AT&T Bell Laboratories, Visual Communica-
`tions Research, Room 4C-516, 101 Crawfords Corner Road, Holmdel,
`NJ 07733-3030.
`J. S. Vitter is with the Department of Computer Science, Duke
`University, Box 90129, Durham, NC 27708-0129.
`IEEE Log Number 0000000.
`
`the subinterval is refined incrementally using the probabili-
`ties of the individual events, with bits being output as soon
`as they are known. Arithmetic codes almost always give bet-
`ter compression than prefix codes, but they lack the direct
`correspondence between the events in the input data set and
`bits or groups of bits in the coded output file.
`A statistical coder must work in conjunction with a mod-
`eler that estimates the probability of each possible event at
`each point in the coding. The probability model need not
`describe the process that generates the data; it merely has
`to provide a probability distribution for the data items. The
`probabilities do not even have to be particularly accurate,
`but the more accurate they are, the better the compression
`will be. If the probabilities are wildly inaccurate, the file may
`even be expanded rather than compressed, but the original
`data can still be recovered. To obtain maximum compres-
`sion of a file, we need both a good probability model and
`an efficient way of representing (or learning) the probability
`model.
`To ensure decodability, the encoder is limited to the use
`of model information that is available to the decoder. There
`are no other restrictions on the model; in particular, it can
`change as the file is being encoded. The models can be adap-
`tive (dynamically estimating the probability of each event
`based on all events that precede it), semi-adaptive (using
`a preliminary pass of the input file to gather statistics), or
`non-adaptive (using fixed probabilities for all files). Non-
`adaptive models can perform arbitrarily poorly [3]. Adaptive
`codes allow one-pass coding but require a more complicated
`data structure. Semi-adaptive codes require two passes and
`transmission of model data as side information; if the model
`data is transmitted efficiently they can provide slightly bet-
`ter compression than adaptive codes, but in general the cost
`of transmitting the model is about the same as the "learn-
`ing" cost in the adaptive case [4].
`To get good compression we need models that go beyond
`global event counts and take into account the structure of
`the data. For images this usually means using the numeric
`intensity values of nearby pixels to predict the intensity of
`each new pixel and using a suitable probability distribution
`for the residual error to allow for noise and variation between
`regions within the image. For text, the previous letters form
`a context, in the manner of a Markov process.
`In Section II, we provide a detailed description of pure
`arithmetic coding, along with an example to illustrate the
`process. We also show enhancements that allow incremental
`transmission and fixed-precision arithmetic. In Section III
`we extend the fixed-precision idea to low precision, and show
`how we can speed up arithmetic coding with little degrada-
`tion of compression performance by doing all the arithmetic
`ahead of time and storing the results in lookup tables. We
`call the resulting procedure quasi-arithmetic coding. In Sec-
`tion IV we briefly explore the possibility of parallel coding
`using quasi-arithmetic coding. In Section V we discuss the
`modeling process, separating it into structural and probabil-
`ity estimation components. Each component can be adap-
`tive, semi-adaptive, or static; there are two approaches to
`the probability estimation problem. Finally, Section VI pro-
`
`1
`
`
`
`vides a discussion of the advantages and disadvantages of
`arithmetic coding and suggestions of alternative methods.
`
II. HOW ARITHMETIC CODING WORKS
`
`In this section we explain how arithmetic coding works
`and give operational details; our treatment is based on that
`of Witten, Neal, and Cleary [5]. Our focus is on encoding,
`but the decoding process is similar.
`
`
A. Basic algorithm for arithmetic coding
`
`The algorithm for encoding a file using arithmetic coding
`works conceptually as follows:
1. We begin with a "current interval" [L, H) initialized to
[0, 1).
`2. For each event in the file, we perform two steps.
`(a) We subdivide the current interval into subintervals,
one for each possible event. The size of an event's
`subinterval is proportional to the estimated proba-
`bility that the event will be the next event in the
`file, according to the model of the input.
`(b) We select the subinterval corresponding to the
`event that actually occurs next, and make it the
`new current interval.
`3. We output enough bits to distinguish the final current
`interval from all other possible final intervals.
`The length of the final subinterval is clearly equal to the
`product of the probabilities of the individual events, which
`is the probability p of the particular sequence of events in
the file. The final step uses at most ⌊−log2 p⌋ + 2 bits to
`distinguish the file from all other possible files. We need some
`mechanism to indicate the end of the file, either a special end-
`of-file event coded just once, or some external indication of
the file's length. Either method adds only a small amount
`to the code length.
In step 2, we need to compute only the subinterval corre-
sponding to the event ai that actually occurs. To do this it is
convenient to use two "cumulative" probabilities: the cumu-
lative probability P_C = Σ_{k=1}^{i−1} pk and the next-cumulative
probability P_N = P_C + pi = Σ_{k=1}^{i} pk. The new subinterval
is [L + P_C(H − L), L + P_N(H − L)). The need to main-
tain and supply cumulative probabilities requires the model
to have a complicated data structure, especially when many
more than two events are possible.
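The subinterval update can be written out directly; the following sketch (ours; the function name and data layout are illustrative) computes the new interval from the cumulative probabilities:

```python
from fractions import Fraction

def select_subinterval(low, high, probs, i):
    """Select the subinterval for event a_i (1-based index i),
    given the event probabilities p_1..p_r as Fractions."""
    p_c = sum(probs[:i - 1], Fraction(0))   # P_C = sum of p_k for k < i
    p_n = p_c + probs[i - 1]                # P_N = P_C + p_i
    width = high - low
    return low + p_c * width, low + p_n * width

# First event of the running example: b1 is event 2 of (a1, b1)
lo, hi = select_subinterval(Fraction(0), Fraction(1),
                            [Fraction(2, 3), Fraction(1, 3)], 2)
print(lo, hi)  # 2/3 1
```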
`We now provide an example, repeated a number of times
`to illustrate different steps in the development of arithmetic
`coding. For simplicity we choose between just two events at
`each step, although the principles apply to the multi-event
`case as well. We assume that we know a priori that we
`have a file consisting of three events (or three letters in the
case of text compression); the first event is either a1 (with
probability p{a1} = 2/3) or b1 (with probability p{b1} = 1/3);
the second event is a2 (with probability p{a2} = 1/2) or b2
(with probability p{b2} = 1/2); and the third event is a3 (with
probability p{a3} = 3/5) or b3 (with probability p{b3} = 2/5).
The actual file to be encoded is the sequence b1 a2 b3.
The steps involved in pure arithmetic coding are illus-
trated in Table 1 and Fig. 1. In this example the final interval
corresponding to the actual file b1 a2 b3 is [23/30, 5/6). The length
of the interval is 1/15, which is the probability of b1 a2 b3, com-
puted by multiplying the probabilities of the three events:
p{b1} p{a2} p{b3} = (1/3)(1/2)(2/5) = 1/15. In binary, the final inter-
val is [0.110001...₂, 0.110101...₂). Since all binary numbers
that begin with 0.11001 are entirely within this interval, out-
putting 11001 suffices to uniquely identify the interval.
`
Table 1 Example of pure arithmetic coding

Action                                      Subintervals
Start                                       [0, 1)
Subdivide with left prob. p{a1} = 2/3       [0, 2/3), [2/3, 1)
Input b1, select right subinterval          [2/3, 1)
Subdivide with left prob. p{a2} = 1/2       [2/3, 5/6), [5/6, 1)
Input a2, select left subinterval           [2/3, 5/6)
Subdivide with left prob. p{a3} = 3/5       [2/3, 23/30), [23/30, 5/6)
Input b3, select right subinterval          [23/30, 5/6) = [0.110001...₂, 0.110101...₂)
Output 11001                                0.11001₂ is the shortest binary fraction
                                            that lies within [23/30, 5/6)
`
`0
`
`•Ir
`
`0
`
`L
`
`initial current interval
`subdivide
`-1
`3.1C
`') .
`
`2
`s
`
`al
`
`subdivide
`
`---
`
`1'
`
`1
`
`_,..
`
`s
`
`bl
`select b1
`
`subdivide
`
`b.)
`
`1
`
`s
`
`a2
`select a,
`4
`3.
`i
`4-4
`bs u5
`as
` l
`
`select bs
`a
`
`output 11001 .4— 0.110012
`
`0.110102
`
`Fig. 1. Pure ariLlutieLic coding graphically illusLra,Led.
`
B. Incremental output
`
The basic implementation of arithmetic coding described
above has two major difficulties: the shrinking current in-
terval requires the use of high precision arithmetic, and no
output is produced until the entire file has been read. The
most straightforward solution to both of these problems is
to output each leading bit as soon as it is known, and then
to double the length of the current interval so that it re-
flects only the unknown part of the final interval. Witten,
Neal, and Cleary [5] add a clever mechanism for preventing
the current interval from shrinking too much when the end-
points are close to 1/2 but straddle 1/2. In that case we do not
yet know the next output bit, but we do know that what-
ever it is, the following bit will have the opposite value; we
merely keep track of that fact, and expand the current inter-
val symmetrically about 1/2. This follow-on procedure may
be repeated any number of times, so the current interval size
is always strictly longer than 1/4.
Mechanisms for incremental transmission and fixed-preci-
sion arithmetic have been developed through the years by
Pasco [6], Rissanen [7], Rubin [8], Rissanen and Langdon
[9], Guazzo [10], and Witten, Neal, and Cleary [5]. The
bit-stuffing idea of Langdon and others at IBM that limits
the propagation of carries in the additions serves a function
similar to that of the follow-on procedure described above.
`
`
`
`
Table 2 Example of pure arithmetic coding with incremental
transmission and interval expansion

Action                                      Subintervals
Start                                       [0, 1)
Subdivide with left prob. p{a1} = 2/3       [0, 2/3), [2/3, 1)
Input b1, select right subinterval          [2/3, 1)
Output 1, expand                            [1/3, 1)
Subdivide with left prob. p{a2} = 1/2       [1/3, 2/3), [2/3, 1)
Input a2, select left subinterval           [1/3, 2/3)
Increment follow count, expand              [1/6, 5/6)
Subdivide with left prob. p{a3} = 3/5       [1/6, 17/30), [17/30, 5/6)
Input b3, select right subinterval          [17/30, 5/6)
Output 1, output 0 (follow bit), expand     [2/15, 2/3)
Output 01                                   [1/4, 1/2) is entirely within [2/15, 2/3)
`
Fig. 2. Pure arithmetic coding with incremental transmission
and interval expansion, graphically illustrated.
`
`We now describe in detail how the incremental output and
`interval expansion work. We add the following step immedi-
`ately after the selection of the subinterval corresponding to
`an input event, step 2(b) in the basic algorithm above.
2. (c) We repeatedly execute the following steps in se-
quence until the loop is explicitly halted:
1. If the new subinterval is not entirely within one
of the intervals [0, 1/2), [1/4, 3/4), or [1/2, 1), we exit
the loop and return.
2. If the new subinterval lies entirely within [0, 1/2),
we output 0 and any following 1s left over from
previous events; then we double the size of the
subinterval by linearly expanding [0, 1/2) to [0, 1).
3. If the new subinterval lies entirely within [1/2, 1),
we output 1 and any following 0s left over from
previous events; then we double the size of the
subinterval by linearly expanding [1/2, 1) to [0, 1).
4. If the new subinterval lies entirely within [1/4, 3/4),
we keep track of this fact for future output by
incrementing the follow count; then we double
the size of the subinterval by linearly expanding
[1/4, 3/4) to [0, 1).
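Steps 2(a)-(c) can be sketched as follows (a minimal illustration of ours in exact rational arithmetic, not the authors' implementation). Events are 'a' (left) or 'b' (right), and the final two bits are chosen by one common termination convention: increment the follow count, then output 0 or 1 according to whether the low endpoint is below 1/4.

```python
from fractions import Fraction

HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)

def encode(events, left_probs):
    low, high = Fraction(0), Fraction(1)
    follow = 0      # opposite bits pending from the follow-on procedure
    out = []

    def emit(bit):
        nonlocal follow
        out.append(bit)
        out.extend([1 - bit] * follow)   # flush pending follow-on bits
        follow = 0

    for ev, p in zip(events, left_probs):
        # step 2(a)-(b): subdivide at p and select the actual event
        mid = low + p * (high - low)
        if ev == 'a':
            high = mid
        else:
            low = mid
        # step 2(c): output known bits and expand the interval
        while True:
            if high <= HALF:                              # within [0, 1/2)
                emit(0)
                low, high = 2 * low, 2 * high
            elif low >= HALF:                             # within [1/2, 1)
                emit(1)
                low, high = 2 * low - 1, 2 * high - 1
            elif low >= QUARTER and high <= 3 * QUARTER:  # within [1/4, 3/4)
                follow += 1
                low, high = 2 * low - HALF, 2 * high - HALF
            else:
                break
    # termination: one more bit (plus pending follow-on bits) pins down
    # the final interval
    follow += 1
    emit(0 if low < QUARTER else 1)
    return out

# The running example b1 a2 b3 from Table 2:
bits = encode('bab', [Fraction(2, 3), Fraction(1, 2), Fraction(3, 5)])
print(''.join(map(str, bits)))  # 11001
```

For the example file this produces 11001, the same code as in Table 2.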
`
`Table 2 and Fig. 2 illustrate this process. In the exam-
`ple, interval expansion occurs exactly once for each input
`event, but in other cases it may occur more than once or not
`at all. The follow-on procedure is applied when processing
`the second input event a2 . The 1 output after processing
`the third event b3 is therefore followed by its complement 0.
The final interval is [2/15, 2/3). Since all binary numbers that
start with 0.01 are within this range, outputting 01 suffices
to uniquely identify the range. The encoded file is 11001,
`as before. This is no coincidence: the computations are es-
`sentially the same. The final interval is eight times as long
`as in the previous example because of the three doublings of
`the current interval.
`
`Clearly the current interval contains some information
`about the preceding inputs; this information has not yet
`been output, so we can think of it as the coder's state. If a
`is the length of the current interval, the state holds — log2 a
`bits not yet output. In the basic method the state contains
`all the information about the output, since nothing is out-
`put until the end. In the incremental implementation, the
`state always contains fewer than two bits of output informa-
tion, since the length of the current interval is always more
than 1/4. The final state in the incremental example is [2/15, 2/3),
which contains −log2 (8/15) ≈ 0.907 bits of information; the fi-
nal two output bits are needed to unambiguously transmit
this information.
`
`C. Use of integer arithmetic
`
`In practice, the arithmetic can be done by storing the
`endpoints of the current interval as sufficiently large inte-
`gers rather than in floating point or exact rational numbers.
`We also use integers for the frequency counts used to esti-
`mate event probabilities. The subdivision process involves
`selecting non-overlapping intervals (of length at least 1) with
`lengths approximately proportional to the counts. Table 3
`illustrates the use of integer arithmetic using a full inter-
`val of [0, N) = [0, 1024). (The graphical version of Table 3
`is essentially the same as Fig. 2 and is not included.) The
length of the current interval is always at least N/4 + 2 = 258
in this case, so we can always use probabilities precise to at
least 1/258; often the precision will be near 1/1024. In prac-
tice we use even larger integers; the interval [0, 65536) is a
`common choice, and gives a practically continuous choice of
`probabilities at each step. The subdivisions in this example
`are not quite the same as those in Table 2 because the re-
`sulting intervals are rounded to integers. The encoded file
`is 11001 as before, but for a longer input file the encodings
`would eventually diverge.
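One integer subdivision step can be sketched as follows (our own sketch; the round-to-nearest convention is chosen because it reproduces the subdivisions of Table 3, and may differ in detail from the authors' code). Counts c_left and c_right play the role of the frequency counts mentioned above.

```python
# One integer-arithmetic subdivision step on [low, high), where the
# full interval is [0, N); a sketch, not the paper's exact procedure.
def subdivide(low, high, c_left, c_right):
    width, total = high - low, c_left + c_right
    # left length proportional to c_left, rounded to the nearest
    # integer, and at least 1 so that no subinterval is empty
    left_len = max(1, (width * c_left + total // 2) // total)
    return (low, low + left_len), (low + left_len, high)

# The three subdivisions of Table 3:
print(subdivide(0, 1024, 2, 1))    # ((0, 683), (683, 1024))
print(subdivide(342, 1024, 1, 1))  # ((342, 683), (683, 1024))
print(subdivide(172, 854, 3, 2))   # ((172, 581), (581, 854))
```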
`
`
`
Table 3 Example of arithmetic coding with incremental trans-
mission, interval expansion, and integer arithmetic. Full interval
is [0, 1024), so in effect subinterval endpoints are constrained to
be multiples of 1/1024.

Action                                                      Subintervals
Start                                                       [0, 1024)
p{a1} = 2/3; subdivide with left probability 683/1024 ≈ 0.66699   [0, 683), [683, 1024)
Input b1, select right subinterval                          [683, 1024)
Output 1, expand                                            [342, 1024)
p{a2} = 1/2; subdivide with left probability 341/682 = 0.5        [342, 683), [683, 1024)
Input a2, select left subinterval                           [342, 683)
Increment follow count, expand                              [172, 854)
p{a3} = 3/5; subdivide with left probability 409/682 ≈ 0.59971    [172, 581), [581, 854)
Input b3, select right subinterval                          [581, 854)
Output 1, output 0 (follow bit), expand                     [138, 684)
Output 01                                                   [256, 512) is entirely within [138, 684)
`
Table 4 Example of arithmetic coding with incremental trans-
mission, interval expansion, and small integer arithmetic. Full
interval is [0, 8), so in effect subinterval endpoints are constrained
to be multiples of 1/8.

Action                                          Subintervals
Start                                           [0, 8)
p{a1} = 2/3; subdivide with left probability 5/8      [0, 5), [5, 8)
Input b1, select right subinterval              [5, 8)
Output 1, expand                                [2, 8)
p{a2} = 1/2; subdivide with left probability 3/6      [2, 5), [5, 8)
Input a2, select left subinterval               [2, 5)
Increment follow count, expand                  [0, 6)
p{a3} = 3/5; subdivide with left probability 4/6      [0, 4), [4, 6)
Input b3, select right subinterval              [4, 6)
Output 1, output 0 (follow bit), expand         [0, 4)
Output 0, expand                                [0, 8)
`
`[0,8) 0.000 - 0.182
`0.182 - 0.310
`0.310 - 0.437
`0.437 - 0.563
`0.563 - 0.690
`0.690 - 0.818
`0.818 - 1.000
`[0,6) 0.000 - 0.244
`0.244 - 0.415
`0.415 - 0.585
`0.585 - 0.756
`0.756 - 1.000
`
`[2,8) 0.000 - 0.244
`0.244 - 0.415
`0.415 - 0.585
`0.585 - 0.756
`0.756 - 1.000
`
`000
`00
`0
`0
`-
`-
`-
`000
`00
`0
`0
`-
`
`010
`01
`f
`f
`-
`
`[0, 8)
`[0, 8)
`[0, 6)
`[0, 8)
`[0, 5)
`[0, 6)
`[0, 7)
`[0, 8)
`[0, 8)
`[0, 6)
`[0, 8)
`[0, 5)
`
`[0, 8)
`[0, 8)
`[0, 6)
`[0, 8)
`[2, 7)
`
`[1, 8)
`[2, 8)
`[3, 8)
`[0, 8)
`[2, 8)
`[0, 8)
`[0, 8)
`[1, 6)
`[0, 8)
`[2, 8)
`[0, 8)
`[0, 8)
`
`[3, 8)
`[0, 8)
`[2, 8)
`[0, 8)
`[0, 8)
`
`1
`1
`11
`111
`
`f
`f
`10
`101
`
`1
`1
`11
`111
`
Table 6 Example of operation of quasi-arithmetic coding

Start in state [0, 8).
Prob{a1} = 2/3, so choose range 0.563 - 0.690 in [0, 8).
First event is b1, so choose right (b) input.
Output 1. Next state is [2, 8).

Prob{a2} = 1/2, so choose range 0.415 - 0.585 in [2, 8).
Second event is a2, so choose left (a) input.
f means increment follow count. Next state is [0, 6).

Prob{a3} = 3/5, so choose range 0.585 - 0.756 in [0, 6).
Third event is b3, so choose right (b) input.
Indicated output is 10. Output 1; output 0 to account
for follow bit; output 0. Next state is [0, 8).
`
`III. LIMITED-PRECISION ARITHMETIC CODING
`
`A. Development of binary quasi-arithmetic coding
`
`Arithmetic coding as it is usually implemented is slow be-
`cause of the multiplications (and in some implementations,
`divisions) required in subdividing the current interval ac-
`cording to the probability information. Since small errors
`in probability estimates cause very small increases in code
`length, we expect that by introducing approximations into
`the arithmetic coding process in a controlled way we can
`improve coding speed without significantly degrading com-
`pression performance. In the Q-Coder work at IBM [11],
`the time-consuming multiplications are replaced by additions
`and shifts, and low-order bits are ignored. In [12] we describe
`a different approach to approximate arithmetic coding. Re-
`calling that the fractional bits characteristic of arithmetic
`coding are stored as state information in the coder, we re-
`duce the number of possible states, and replace arithmetic
`operations by table lookups; the lookup tables can be pre-
`computed. Here we review this reduced precision binary
`arithmetic coder, which we call a quasi-arithmetic coder. It
`should be noted that the compression is still completely re-
`versible; using reduced precision merely affects the average
`code length.
`
`We have seen that doing arithmetic coding with large in-
`tegers instead of real or rational numbers hardly degrades
`compression performance at all. In Table 4 we show the en-
`coding of the same file using small integers: the full interval
`[0, N) is only [0, 8).
`The number of possible states (after applying the interval
`expansion procedure) of an arithmetic coder using the inte-
ger interval [0, N) is 3N²/16. If we can reduce the number
`of states to a more manageable level, we can precompute all
`state transitions and outputs and substitute table lookups
`for arithmetic in the coder. The obvious way to reduce the
`number of states is to reduce N. The value of N should be a
`power of 2; its value must be at least 4. If we choose N = 8
`and apply the arithmetic coding algorithm in a straightfor-
`ward way, we obtain a table with 12 states, each state offering
`a choice of between 3 and 7 probability ranges. Portions of
`the table are shown in Table 5.
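Encoding then reduces to table lookups. The sketch below (our own; the data layout is hypothetical, and only the three states of Table 5 are included, so inputs other than the running example would need the full 12-state table) drives a coder from such a table:

```python
# Quasi-arithmetic coding by table lookup, N = 8 (a sketch, not the
# paper's code). Each row of a state is (range_upper_bound,
# (left_output, left_next_state), (right_output, right_next_state));
# an 'f' in an output string means "increment the follow count".
TABLE = {
    (0, 8): [
        (0.182, ('000', (0, 8)), ('',    (1, 8))),
        (0.310, ('00',  (0, 8)), ('',    (2, 8))),
        (0.437, ('0',   (0, 6)), ('',    (3, 8))),
        (0.563, ('0',   (0, 8)), ('1',   (0, 8))),
        (0.690, ('',    (0, 5)), ('1',   (2, 8))),
        (0.818, ('',    (0, 6)), ('11',  (0, 8))),
        (1.000, ('',    (0, 7)), ('111', (0, 8))),
    ],
    (0, 6): [
        (0.244, ('000', (0, 8)), ('',    (1, 6))),
        (0.415, ('00',  (0, 8)), ('f',   (0, 8))),
        (0.585, ('0',   (0, 6)), ('f',   (2, 8))),
        (0.756, ('0',   (0, 8)), ('10',  (0, 8))),
        (1.000, ('',    (0, 5)), ('101', (0, 8))),
    ],
    (2, 8): [
        (0.244, ('010', (0, 8)), ('',    (3, 8))),
        (0.415, ('01',  (0, 8)), ('1',   (0, 8))),
        (0.585, ('f',   (0, 6)), ('1',   (2, 8))),
        (0.756, ('f',   (0, 8)), ('11',  (0, 8))),
        (1.000, ('',    (2, 7)), ('111', (0, 8))),
    ],
}

def encode(events, left_probs):
    state, follow, out = (0, 8), 0, []
    for ev, p in zip(events, left_probs):
        for upper, left, right in TABLE[state]:
            if p <= upper:                  # probability range containing p
                bits, state = left if ev == 'a' else right
                break
        for b in bits:
            if b == 'f':
                follow += 1                 # follow-on procedure
            else:
                out.append(b)
                out.extend(('0' if b == '1' else '1') * follow)
                follow = 0
    return ''.join(out)

# Table 6 example: file b1 a2 b3 with probabilities 2/3, 1/2, 3/5
print(encode('bab', [2/3, 1/2, 3/5]))  # 1100
```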
`Table 6 shows how Table 5 is used to encode our sam-
`ple file. Before coding each input event the coder is in a
`certain current state, corresponding to the current subinter-
`val. For each state there are a number of probability ranges;
`
`4
`
`
`
we choose the one that includes the estimated probability
for the next event. Then we simply select the input event
that actually occurs and perform the indicated actions: out-
putting bits, incrementing the follow count, and changing to
a new current state. In the example the encoded output file
is 1100. Because we were using such low precision, the sub-
division probabilities became distorted, leading to a lower
probability for the file (1/16), but one which ends in the full
interval [0, 8), requiring no disambiguating bits. We usually
use a somewhat larger value of N; in practice the compres-
sion inefficiency of a binary quasi-arithmetic coder (neglect-
ing very large and very small probabilities) is less than one
percent for N = 32 and about 0.2 percent for N = 128.
In implementations, the coding tables can be stripped
down so that the numerical values of the interval endpoints
and probability ranges do not appear. Full details and ex-
amples appear in [13]. The next section explains the compu-
tation of the probability ranges. Decoding uses essentially
the same tables, and in fact is easier than encoding.
B. Analysis of binary quasi-arithmetic coding

We now prove that using binary quasi-arithmetic coding
causes an insignificant increase in the code length compared
with pure arithmetic coding. We mathematically analyze
several cases.
In this analysis we exclude very large and very small prob-
abilities, namely those that are less than 1/W or greater than
(W − 1)/W, where W is the width of the current interval.
For these probabilities the relative excess code length can be
large.
First we assume that we know the probability p of the left
branch of each event, and we show both how to minimize
the average excess code length and how small the excess
is. In arithmetic coding we divide the current interval (of
width W) into subintervals of length L and R; this gives
an effective coding probability q = L/W, since the resulting
code length is −log2 q for the left branch and −log2 (1 − q) for
the right. When we encode a sequence of binary events with
probabilities p and 1 − p using effective coding probabilities q
and 1 − q, the average code length L(p, q) is given by

L(p, q) = −p log2 q − (1 − p) log2 (1 − q).

If we use pure arithmetic coding, we can subdivide the in-
terval into lengths pW and (1 − p)W, thus making q = p
and giving an average code length equal to the entropy,
H(p) = −p log2 p − (1 − p) log2 (1 − p); this is optimal.
Consider two probabilities p1 and p2 that are adjacent
based on the subdivision of an interval of width W; in other
words, p1 = (W − Δ1)/W, p2 = (W − Δ2)/W, and Δ2 =
Δ1 − 1. For any probability p between p1 and p2, either
p1 or p2 should be chosen, whichever gives a shorter average
code length. There is a cutoff probability p* for which p1 and
p2 give the same average code length. We can compute p*
by solving the equation L(p*, p1) = L(p*, p2), giving

p* = 1 / (1 + log2 (p2/p1) / log2 ((1 − p1)/(1 − p2))).     (1)
We can use Equation (1) to compute the probability ranges
in the coding tables. As an example, we compute the cutoff
probability used in deciding whether to subdivide interval
[0, 6) as {[0, 3), [3, 6)} or {[0, 4), [4, 6)}; this is the number
0.585 that separates the corresponding probability ranges in Table 5.
`