`
`Edgar H. Sibley
`Panel Editor
`
`The state of the art in data compression is arithmetic coding, not the better-
`known Huffman method. Arithmetic coding gives greater compression, is
`faster for adaptive models, and clearly separates the model from the channel
`encoding.
`
`ARITHMETIC CODING FOR
`DATA COIUPRESSION
`
`IAN H. WIllEN, RADFORD M. NEAL, and JOHN G. CLEARY
`
`in most respects to the
`coding is superior
`Arithmetic
`better-known Huffman
`[lo] method. It represents in-
`formation at least as compactly-sometimes
`consid-
`erably more so. Its performance
`is optimal without
`the need for blocking of input data. It encourages a
`clear separation between the model for representing
`data and the encoding of information with respect to
`that model. It accommodates adaptive models easily
`and is computationally
`efficient. Yet many authors
`and practitioners seem unaware of the technique.
`Indeed there is a widespread belief that Huffman
`coding cannot be improved upon.
`We aim to rectify
`this situation by presenting an
`accessible implementation
`of arithmetic coding and
`by detailing
`its performance characteristics. We start
`by briefly
`reviewing basic concepts of data compres-
`sion and introducing
`the model-based approach that
`underlies most modern techniques. We then outline
`the idea of arithmetic coding using a simple exam-
`ple, before presenting programs for both encoding
`and decoding. In these programs the model occupies
`a separate module so that different models can easily
`be used. Next we discuss the construction of fixed
`and adaptive models and detail the compression
`efficiency and execution
`time of the programs,
`including
`the effect of different arithmetic word
`lengths on compression efficiency. Finally, we out-
`line a few applications where arithmetic coding is
`appropriate.
`
`Financial support for this work has been provided by the Natural Sciences
`and E@neering Research Council of Canada.
`
`UNIX
`
`is a registered trademark of AT&T Bell Laboratories.
`
`0 1987 ACM OOOl-0782/87/OtiOO-0520 750
`
`DATA COMPRESSION
`To many, data compression conjures up an assort-
`ment of ad hoc techniques such as conversion of
`spaces in text to tabs, creation of special codes for
`common words, or run-length coding of picture data
`(e.g., see [8]). This contrasts with
`the more modern
`model-based paradigm for coding, where, from an
`input string of symbols and a model, an encoded string
`is produced that is (usually) a compressed version of
`the input. The decoder, which must have access to
`the same model, regenerates the exact input string
`from the encoded string. Input symbols are drawn
`from some well-defined
`set such as the ASCII or
`binary alphabets; the encoded string is a plain se-
`quence of bits. The model is a way of calculating,
`in
`any given context,
`the distribution
`of probabilities
`for the next input symbol. It must be possible for the
`decoder to produce exactly
`the same probability dis-
`tribution
`in the same context. Compression
`is
`achieved by transmitting
`the more probable symbols
`in fewer bits than the less probable ones.
`For example, the model may assign a predeter-
`mined probability
`to each symbol in the ASCII
`alphabet. No context
`is involved. These probabilities
`can be determined by counting
`frequencies
`in repre-
`sentative samples of text to be transmitted. Such a
`fixed model is communicated
`in advance to both en-
`coder and decoder, after which
`it is used for many
`messages.
`that an adaptive
`the probabilities
`Alternatively,
`model assigns may change as each symbol is trans-
`mitted, based on the symbol frequencies seen so far
`in the message. There is no need for a representative
`
`520
`
`Communications of the ACM
`
`June 1987 Volume 30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 1
`
`
`
`Computing Practices
`
`sample of text, because each message is treated as
`an independent unit, starting from scratch. The en-
`coder’s model changes with each symbol transmit-
`ted, and the decoder’s changes with each symbol
`received, in sympathy.
`More complex models can provide more accurate
`probabilistic predictions and hence achieve greater
`compression. For example, several characters of pre-
`vious context could condition
`the next-symbol prob-
`ability. Such methods have enabled mixed-case Eng-
`lish text to be encoded in around 2.2 bits/character
`with
`two quite different kinds of model [4, 61. Tech-
`niques that do not separate modeling
`from coding
`so distinctly,
`like that of Ziv and Lempel (231, do
`not seem to show such great potential
`for compres-
`sion, although
`they may be appropriate when the
`aim is raw speed rather than compression per-
`formance [22].
`The effectiveness of any model can be measured
`by the entropy of the message with respect to it,
`usually expressed in bits/symbol. Shannon’s funda-
`mental theorem of coding states that, given messages
`randomly generated from a model, it is impossible
`to
`encode them into less bits (on average) than the en-
`tropy of that model [21].
`A message can be coded with respect to a model
`using either Huffman or arithmetic coding. The for-
`mer method is frequently advocated as the best pos-
`sible technique
`for reducing
`the encoded data rate.
`It is not. Given that each symbol in the alphabet
`must translate into an integral number of bits in the
`encoding, Huffman coding indeed achieves “mini-
`mum redundancy.”
`In other words, it performs opti-
`mally if all symbol probabilities are integral powers
`of %. But this is not normally
`the case in practice;
`indeed, Huffman coding can take up to one extra bit
`per symbol. The worst case is realized by a source
`in which one symbol has probability approaching
`unity. Symbols emanating
`from such a source con-
`vey negligible
`information on average, but require at
`least one bit to transmit
`[7]. Arithmetic
`coding dis-
`penses with
`the restriction
`that each symbol must
`translate into an integral number of bits, thereby
`coding more efficiently.
`It actually achieves the the-
`oretical entropy bound to compression efficiency
`for
`any source, including
`the one just mentioned.
`In general, sophisticated models expose the defi-
`ciencies of Huffman coding more starkly
`than simple
`ones. This is because they more often predict sym-
`bols with probabilities close to one, the worst case
`for Huffman coding. For example, the techniques
`mentioned above that code English text in 2.2 bits/
`character both use arithmetic coding as the final
`step, and performance would be impacted severely
`
`if Huffman coding were substituted. Nevertheless,
`since our topic is coding and not modeling,
`the illus-
`trations
`in this article all employ simple models.
`Even so, as we shall see, Huffman coding is inferior
`to arithmetic coding.
`The basic concept of arithmetic coding can be
`traced back to Elias in the early 1960s (see [l,
`pp. 61-621). Practical
`techniques were first intro-
`duced by Rissanen [16] and Pasco [15], and de-
`veloped further by Rissanen [17]. Details of the
`implementation
`presented here have not appeared
`in the literature before; Rubin
`[2O] is closest to our
`approach. The reader interested
`in the broader class
`of arithmetic codes is referred to [18]; a tutorial
`is
`available
`in [l3]. Despite these publications,
`the
`method is not widely known. A number of recent
`books and papers on data compression mention
`it
`only in passing, or not at all.
`
`CODING
`IDEA OF ARITHMETIC
`THE
`In arithmetic coding, a message is represented by an
`interval of real numbers between 0 and 1. As the
`message becomes longer, the interval needed’to rep-
`resent it becomes smaller, and the number of bits
`needed to specify that interval grows. Successive
`symbols of the message reduce the size of the inter-
`val in accordance with
`the symbol probabilities gen-
`erated by the model. The more likely symbols re-
`duce the range by less than the unlikely symbols
`and hence add fewer bits to the message.
`Before anything
`is transmitted,
`the range for the
`message is the entire interval
`l), denoting
`the
`[0,
`half-open
`interval 0 5 x < 1. As each symbol is
`processed, the range is narrowed
`to that portion of it
`allocated to the symbol. For example, suppose the
`alphabet is (a, e, i, O, u, !I, and a fixed model is used
`with probabilities shown in Table I. Imagine trans-
`
`TABLE I. Example Fixed Model for Alphabet (a, e, i, o, u, !)
`
`Symbol
`
`Probability
`
`.2
`.3
`.l
`.2
`.l
`.l
`
`Range
`LO, 0.2)
`[0.2, 0.5)
`[0.5, 0.6)
`[0.6,0.8)
`[0.8, 0.9)
`[0.9, 1.0)
`
`the message eaii!. Initially, both encoder
`mitting
`and decoder know that the range is [0, 1). After
`it to
`seeing the first symbol, e, the encoder narrows
`the range the model allocates to this sym-
`[0.2, 04,
`bol. The second symbol, a, will narrow
`this new
`range to the first one-fifth of it, since a has been
`
`June 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`521
`
`IPR2018-01413
`Sony EX1013 Page 2
`
`
`
`Computing Practices
`
`[0, 0.2). This produces [O.Z, 0.26), since the
`allocated
`previous range was 0.3 units long and one-fifth of
`that is 0.06. The next symbol, i, is allocated
`[0.5, 0.6),
`which when applied to [0.2, 0.26) gives the smaller
`range [0.23, 0.236). Proceeding in this way, the en-
`coded message builds up as follows:
`
`Initially
`After seeing e
`a
`i
`i
`!
`
`1)
`0.5)
`;::2,
`0.26)
`p.2,
`0.236)
`[0.23,
`0.2336)
`[0.233,
`[0.23354, 0.2336)
`
`Figure 1 shows another representation of the en-
`coding process. The vertical bars with
`ticks repre-
`sent the symbol probabilities stipulated by the
`model. After the first symbol has been processed, the
`model is scaled into the range [0.2, 0.5), as shown in
`
`After
`seeing
`
`Nothing
`
`’ ! U
`0 ri e
`
`0
`
`i
`
`a
`
`e
`
`a
`
`3
`
`Figure la. The second symbol scales it again into the
`range [0.2, 0.26). But the picture cannot be contin-
`ued in this way without a magnifying glass! Conse-
`quently, Figure lb shows the ranges expanded to
`full height at every stage and marked with a scale
`that gives the endpoints as numbers.
`Suppose all the decoder knows about the message
`is the final range, [0.23354, 0.2336). It can -immedi-
`ately deduce that the first character was e! since the
`range lies entirely within
`the space the model of
`Table I allocates for e. Now it can simulate
`the oper-
`ation of the encoder:
`
`Initially
`After seeing e
`
`1)
`P,
`[0.2, 0.5)
`
`This makes it clear that the second character
`since this will produce the range
`
`is a,
`
`After seeing a
`
`[0.2, 0.26),
`
`which entirely encloses the given range [0.23354,
`0.2336). Proceeding like this, the decoder can iden-
`tify the whole message.
`It is not really necessary for the decoder to know
`both ends of the range produced by the encoder.
`Instead, a single number within
`the range--for ex-
`ample, 0.23355-will
`suffice. (Other numbers, like
`0.23354, 0.23357, or even 0.23354321, would do just
`as well.) However,
`the decoder will
`face the problem
`of detecting the end of the message, to determine
`when to stop decoding. After all, the single number
`0.0 could represent any of a, aa, aaa, aaaa, . . . . To
`resolve the ambiguity, we ensure that each message
`ends with a special EOF symbol known
`to both en-
`coder and decoder. For the alphabet of Table I, ! will
`be used to terminate messages, and only to termi-
`
`i
`
`i
`
`!
`
`FIGURE la. Representation of the Arithmetic Coding Process
`
`After
`seeing
`
`Nothing
`
`!
`u
`
`0
`
`i
`
`e
`
`a
`
`1
`
`i
`0
`
`e
`
`0.5
`
`a
`
`0.26
`
`a
`
`0.2 i
`
`0.2 i
`
`!
`u
`
`0
`/
`
`i
`
`e
`
`a
`\
`
`!
`U
`
`0
`/
`
`i
`
`e
`
`a
`\
`
`FIGURE lb. Representation of the Arithmetic Coding
`Process with the interval Scaled Up at Each Stage
`
`522
`
`Communications of the ACM
`
`June 1987 Volume 30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 3
`
`
`
`Computing Practices
`
`/* ARITHMETIC ENCODING ALGORITHM. */
`
`repeatedly
`encode-symbol
`/* Call
`/* Ensure
`that
`a distinguished
`/*
`transmit
`any value
`in
`the
`
`in
`for each symbol
`"terminator"
`symbol
`range
`[low,
`high).
`
`the message.
`is encoded
`last,
`
`then
`
`*/
`*/
`*/
`
`encode-symbol(symbo1,
`range = high
`-
`high
`= low
`low
`= low
`
`f
`f
`
`cum-freq)
`
`low
`range*cum-freq[symbol-11
`range*cum-freq(symbol1
`
`/* ARITHMETIC DECODING ALGORITHM. */
`
`"Value"
`/*
`/* Continue
`
`that has been received.
`the number
`is
`calling
`decode-symbol
`until
`the
`terminator
`
`symbol
`
`is
`
`returned.
`
`*/
`*/
`
`decode-symbol(cum-freq)
`find
`symbol such
`cum-freq[symbol]
`
`that
`
`(value-low)/(high-low)
`<=
`/* This ensures
`that
`;*
`(low,
`high)
`range
`/*
`the
`following
`lines
`
`< cum-freqrsymbol-11
`l /
`value
`lies within
`the new
`that will
`be calculated
`by */
`of code.
`*/
`
`range = high
`high
`t
`= low
`t
`1OW = low
`return
`symbol
`
`-
`
`low
`range*cum-freq[symbol-11
`range*cum-freq[symbol]
`
`FIGURE 2. Pseudocode for the Encoding and Decoding Procedures
`
`nate messages. When the decoder sees this symbol,
`it stops decoding.
`Relative to the fixed model of Table I, the entropy
`of the five-symbol message eaii! is
`
`-log 0.3 - log 0.2 - log 0.1 - log 0.1 - log 0.1
`
`= -log 0.00006 = 4.22
`
`(using base 10, since the above encoding was per-
`formed in decimal). This explains why it takes five
`decimal digits to encode the message. In fact, the
`size of the final range is 0.2336 - 0.23354 = 0.00006,
`and the entropy is the negative logarithm of this
`figure. Of course, we normally work in binary,
`transmitting binary digits and measuring entropy
`in bits.
`Five decimal digits seems a lot to encode a mes-
`sage comprising
`four vowels! It is perhaps unfortu-
`nate that our example ended up by expanding
`rather than compressing. Needless to say, however,
`different models will give different entropies. The
`best single-character model of the message eaii! is
`the set of symbol frequencies
`(e(O.2), a(0.2), i(O.4),
`!(0.2)), which gives an entropy of 2.89 decimal digits.
`Using this model the encoding would be only three
`digits long. Moreover, as noted earlier, more sophis-
`ticated models give much better performance
`in general.
`
`CODING
`FOR ARITHMETIC
`A PROGRAM
`Figure 2 shows a pseudocode fragment that summa-
`rizes the encoding and decoding procedures devel-
`oped in the last section. Symbols are numbered, 1, 2,
`range for the ith symbol is
`. . . The frequency
`3
`from cum-freq[i]
`to cum-freq[i
`- 11. As i decreases,
`cum-freq[i]
`increases, and cum-freq[O] = 1. (The
`reason for this “backwards” convention
`is that
`cum-freq[O] will
`later contain a normalizing
`factor,
`and it will be convenient
`to have it begin the array.]
`The “current
`interval”
`is [Zozu, high), and for both
`encoding and decoding, this should be initialized
`to [O, 1).
`In
`Figure 2 is overly simplistic.
`Unfortunately,
`practice, there are several factors that complicate
`both encoding and decoding:
`Incremental transmission and reception. The encode
`algorithm as described does not transmit anything
`until
`the entire message has been encoded; neither
`does the decode algorithm begin decoding until
`it
`has received the complete transmission.
`In most
`applications an incremental mode of operation
`is
`necessary.
`The desire to use integer arithmetic. The precision
`required
`to represent the [low, high) interval grows
`with
`the length of the message. Incremental opera-
`tion will help overcome this, but the potential
`for
`
`]une 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`523
`
`IPR2018-01413
`Sony EX1013 Page 4
`
`
`
`Computing Practices
`
`overflow and underflow must still be examined
`carefully.
`
`Representing the model so thnt it can be consulted
`efficiently.
`The representation used for the model
`should minimize
`the time required
`for the decode
`algorithm
`to identify
`the next symbol. Moreover,
`an adaptive model should be organized to minimize
`the time-consuming
`task of maintaining
`cumulative
`frequencies.
`
`Figure 3 shows working code, in C, for arithmetic
`encoding and decoding. It is considerably
`lmore de-
`tailed than the bare-bones sketch of Figure Z! Imple-
`mentations of two different models are given in
`Figure 4; the Figure 3 code can use either one.
`The remainder of this section examines the code
`of Figure 3 more closely, and includes a proof that
`decoding is still correct in the integer implementa-
`tion and a review of constraints on word lengths in
`the program.
`
`arithmetic-coding-h
`
`/' DECLARATIONS USED FOR ARITHMETIC ENCODING AND DECODING l /
`
`/* SIZE OF ARITHMETIC CODE VALUES. l /
`
`#define Code-value-bits
`typedef
`long code-value:
`
`16
`
`in a code value
`/* Number of bits
`/* Type of an arithmetic
`code value
`
`fdefine
`
`Top-value
`
`(((long)l<<Code_value_blts)-1)
`
`/* Largest
`
`code value
`
`/' HALF AND QUARTER POINTS IN THE CODE VALUE RANGE. l /
`
`First-qtr
`*define
`#define Half
`Idefine
`Third-qtr
`
`(Top-value/ltl)
`(Z'First-qtr)
`(3’Firat-qtr)
`
`/* Point
`/* Point
`/* Point
`
`after
`after
`after
`
`first
`first
`third
`
`quarter
`half
`quarter
`
`mode1.h
`
`/'
`
`INTERFACE TO THE MODEL. '/
`
`/' THE SET OF SYMBOLS THAT MAY BE ENCODED. l /
`
`#define No-of-chars
`#define
`EOF-symbol
`
`256
`(No-of-charetl)
`
`/* Number of character
`/*
`Index of EOF symbol
`
`symbols
`
`#define No-of-symbols
`
`(No-of-charstll
`
`/* Total
`
`number of symbols
`
`/' TRANSLATION TABLES BETWEEN CHARACTERS AND SYMBOL INDEXES.
`
`l /
`
`char-to-index[No-of-chars];
`int
`unsigned
`char
`index_to_char[No_of_symbols+l]:
`
`from character
`/* To index
`/* To character
`from
`
`index
`
`/* CUMULATIVE FREQUENCY TABLE. */
`
`Idefine Max-frequency
`
`16383
`
`int
`
`cum_frsq[No_of_symbols+l];
`
`/* Maximum allowed
`2a14 - 1
`/*
`/* Cumulative
`
`frequency
`
`count
`
`symbol
`
`frequencies
`
`l /
`l /
`
`l /
`
`l /
`"/
`l /
`
`'/
`'/
`
`*/
`
`'/
`l /
`
`l /
`l /
`l /
`
`1
`2
`3
`4
`5
`6
`7
`a
`9
`10
`11
`12
`13
`14
`15
`:6
`
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding
`
`524
`
`Communicationso~
`
`the ACM
`
`June 1987 Volume 30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 5
`
`
`
`encode.c
`
`/' MAIN PROGRAM FOR ENCODING. l /
`
`#include
`#include
`
`<stdlo.h>
`"mode1.h"
`
`main0
`startmodel~):
`1
`start-outputlng-bitsO;
`start-encodingo;
`for
`(::I
`(
`symbol;
`lnt
`int
`ch;
`ch - getc(stdin);
`If
`(ch--EOF)
`break;
`symbol-
`char-to.-lndex[ch];
`encode-symbol(symbol,cum-freq);
`update-model(symbo1);
`
`1
`encode_symbol(EOF_~symbol,cum_freq);
`done-encoding();
`done-outputinebitso;
`exit
`(0) :
`
`arithmetic
`
`encode.c
`
`-
`
`/* ARITHMETIC ENCODING ALGORITHM. */
`
`finclude
`
`"arithmetic-cod1ng.h"
`
`/* Set up other modules.
`
`/'
`
`Loop
`
`through
`
`characters.
`
`/* Read the next character.
`/* Exit
`loop on end-of-file.*/
`/* Translate
`to an
`index.
`/* Encode
`that
`symbol.
`/* Update
`the model.
`
`the EOF symbol.
`/* Encode
`/* Send the
`last
`few blts.
`
`static
`
`vold bll-plus-follov();
`
`/* Routine
`
`that
`
`follows
`
`/' CURRENT STATE OF THE ENCODING. '/
`
`static
`static
`
`low, high;
`code value
`long-bits-to-follow;
`
`the current
`/* Ends of
`/* Number of opposite
`/*
`the next bit.
`
`code
`bits
`
`region
`to output
`
`after
`
`/* START ENCODING A STREAM OF SYMBOLS. l /
`
`start-encoding0
`low
`- 0;
`I
`hiah
`- Top-value;
`bits
`to
`follow
`--
`
`1
`
`- 0;
`
`/* ENCODE A SYMBOL. l /
`
`/* Full
`
`code
`
`range.
`
`/* No bits
`
`to
`
`follow
`
`next.
`
`I
`
`(high-low)+l;
`
`encode-symhoI(symbol,cum-freq)
`int
`symbol:
`int
`cum-freql]:
`long
`range;
`(long)
`range
`-
`-
`high
`low +
`(ranye'cum-freq[symbol-ll)/cum-freq[O)-1:
`low
`-
`low +
`~ranpe*cum~freq(symbol))/cum~freq(O);
`
`to encode
`/* Symbol
`/* Cumulative
`symbol
`/' Size of
`the current
`
`frequencies
`code
`region
`
`/* Narrow
`/*
`to
`that
`/'
`symbol.
`
`the code
`allotted
`
`region
`to
`this
`
`39
`40
`41
`42
`43
`44
`45
`46
`47
`40
`49
`50
`51
`52
`53
`54
`55
`56
`57
`58
`59
`60
`
`61
`62
`63
`64
`65
`66
`67
`68
`69
`70
`71
`72
`73
`74
`75
`76
`17
`78
`79
`80
`Bl
`82
`83
`84
`85
`86
`81
`88
`a9
`90
`91
`92
`93
`94
`
`Computing Practices
`
`l /
`
`l /
`
`l /
`
`l /
`l /
`l /
`
`l /
`l /
`
`‘/
`
`*/
`l /
`'/
`
`'/
`
`l /
`
`'/
`'/
`l /
`
`l /
`l /
`l /
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding (continued)
`
`]une 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`525
`
`IPR2018-01413
`Sony EX1013 Page 6
`
`
`
`95
`96
`97
`98
`99
`100
`101
`102
`103
`104
`105
`106
`107
`108
`109
`110
`111
`112
`113
`114
`115
`116
`117
`118
`119
`120
`121
`122
`123
`124
`125
`126
`127
`128
`129
`130
`131
`132
`133
`134
`135
`
`136
`137
`138
`139
`140
`141
`142
`143
`144
`145
`146
`147
`148
`149
`150
`151
`152
`153
`154
`
`for
`
`(:;I
`if
`
`I
`I
`(high<Half)
`bit~plus~follow(0);
`
`I
`else
`
`I
`else
`
`(
`
`(low>-Half)
`if
`bit~plus~follow(1):
`low -- Half:
`high
`-* Half:
`
`qtr
`(low>-Flrst
`if
`sc hiQh<ThirdIqtr)
`bits-to-foliow
`t*
`low
`-- First-qtr;
`high
`-- First-qtr;
`
`1;
`
`(
`
`1
`else break:
`low * 2'low;
`* 2*hightl;
`high
`
`/' FINISH ENCODING THE STREAM. l /
`
`done-encoding0
`bits-to-follow
`I
`if
`(low<First-qtr)
`else bit-plus-follow(l):
`
`1
`
`+* 1:
`bit~plus~follow(0):
`
`/' Loop
`
`to ouLput bits.
`
`/* Output 0 if
`
`In
`
`low half.
`
`/* Output 1 if
`
`in hiQh half.*/
`
`/+ Subtract
`
`offset
`
`to
`
`top.
`
`/* Output an opposite
`in middle
`/.
`later
`if
`
`bit
`half.
`
`/* Subtract
`
`offset
`
`to middle'/
`
`/+ Otherwise
`
`exit
`
`loop.
`
`/* Scale up code
`
`range.
`
`'1
`
`l /
`
`*/
`
`l /
`./
`
`'1
`
`l /
`
`that
`two bits
`/* Output
`the quarter
`that
`/' Select
`/*
`the current
`code
`range
`/* contains.
`
`'1
`l /
`l /
`'/
`
`/* OUTPUT BITS PLUS FOLLOWING OPPOSITE BITS.
`
`l /
`
`void bitglus-foIlow(blt)
`static
`int bit:
`output-bitcbit);
`while
`(bits-to-follow>O)
`output-bit
`(Ibit);
`bits
`to-follow
`
`i
`
`/* Output
`
`the bit.
`
`/* Output bits-to-follow
`bits.
`Set
`/*
`opposfte
`/' bits-to-follow
`to zero.
`
`-* 1;
`
`f
`
`1
`
`-
`
`decode.c
`
`/* MAIN PROGPAM FOR DECODING. */
`
`#include
`(include
`
`<stdio.h>
`"mode1.h"
`
`main ()
`0 :
`I
`start-model
`start-inpUtinQ-bitso;
`start-decodingo:
`for
`(;;)
`(
`symboi;
`int
`lnt
`ch:
`- decode-symbolicurn-freq):
`symbol
`if
`(symbol**EOF
`symbol)
`break;
`ch *
`lndex_to_c?;ar[symbolJ;
`putc(ch,stdout):
`update-model(symbo1);
`
`1
`exit.
`
`(0) ;
`
`1
`
`/* Set up other modules.
`
`/* Loop
`
`through
`
`characters.
`
`symbol.
`/* Decode next
`loop
`if EOF symbol.
`/' Exit
`/* Translate
`to a character.*/
`/' Write
`that
`character.
`/* Update
`the model.
`
`l /
`
`l /
`'/
`l /
`
`'/
`
`'/
`
`'/
`'I
`
`'/
`l /
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding (continued)
`
`526
`
`Communications
`
`of the ACM
`
`]une 1987 Volume 30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 7
`
`
`
`arithmetic
`--
`
`decode.c
`
`-
`
`--
`
`/* ARITHMETIC
`
`DECODING ALGORITHM.
`
`l /
`
`#include
`
`“arithmetic-cod1ng.h”
`
`/' CURRENT STATE OF THE DECODING.
`
`l /
`
`static
`static
`
`value
`code
`code:value
`
`value;
`low,
`
`high;
`
`/* Currently-seen
`/*
`Ends
`of
`current
`
`code
`code
`
`value
`replan
`
`/* START DECODING
`
`A STREAM OF SYMBOLS.
`
`'/
`
`Start-deC”di”Q()
`lnt
`(
`value
`for
`
`i:
`
`- 0;
`I<-Code-value-bits;
`- 1;
`(1
`value
`- 2'value,lnput_bit();
`
`1
`low
`hiQh
`
`" 0:
`- Top-value;
`
`1
`
`/* DECODE THE NEXT SYMBOL.
`
`l '/
`
`lnt
`
`!
`
`(cum-freq)
`I;
`
`it+)
`
`(
`
`/*
`/*
`
`Input
`code
`
`bits
`value.
`
`to
`
`flll
`
`the
`
`/*
`
`Full
`
`code
`
`ranpe.
`
`Cumulative
`/*
`/* Size
`of
`/* Cumulative
`/*
`Symbol
`
`frequencies
`symbol
`current
`region
`code
`frequency
`calculated
`decoded
`
`/*
`
`Flnd
`
`cum
`
`freq
`
`for
`
`value.
`
`symboltt)
`
`find
`/* Then
`code
`the
`allotted
`
`;
`Narrow
`to
`that
`symbol.
`
`/*
`/*
`/*
`
`symbol.
`reQlon
`to
`
`this
`
`Computing Practices
`
`l /
`‘/
`
`l /
`l /
`
`l /
`
`l /
`l /
`‘/
`l /
`
`l /
`
`l /
`l /
`l /
`l /
`
`155
`156
`157
`158
`159
`160
`161
`162
`163
`164
`165
`166
`167
`168
`169
`170
`171
`172
`173
`174
`175
`176
`177
`178
`179
`180
`181
`182
`183
`184
`185
`186
`187
`188
`189
`190
`191
`192
`193
`194
`195
`196
`197
`198
`199
`200
`201
`202
`203
`204
`205
`206
`207
`208
`209
`210
`211
`212
`213
`214
`215
`
`decode-symbol
`int
`cwn-freq[
`lO”Q
`ra”Qe;
`int
`cum;
`int
`symbol:
`(lO"Q)(hlQh-1oW)tl;
`ra”Qe
`-
`-
`cum
`(((lonQl(value-lou)tl)*cum~freq[O]-l)/ranQe:
`~eyn!bol
`- 1;
`cum-fraq[symbolJ%wm;
`for
`t
`-
`low
`hiQh
`(ranQe*cum-freq[symbol-ll)/cum-freq[O]-1;
`low
`-
`low
`t
`(ra"Qe'cum~freq[symbo1])/cum~freq[O];
`I::1
`(
`if
`(hiQh<Half)
`/’
`nothing
`
`for
`
`I
`l /
`
`1
`else
`
`else
`
`(
`
`If
`value
`low
`hlQh
`
`(low>-Half)
`-- Half;
`Half:
`-- Half;
`
`--
`
`if
`‘L
`value
`low
`high
`
`(low>-First-qtr
`hiQh<Thlrd-qtr)
`--
`First-qtr;
`First-qtr;
`First-qtr;
`
`--
`--
`
`1
`else
`low
`high
`value
`
`bxeak:
`- 2*1ow:
`- 2'hiQhtl:
`- 2*valuetlnput_bit():
`
`1
`relucn
`
`symbol;
`
`t
`
`/*
`
`/’
`
`/*
`
`/*
`
`Loop
`
`to QOt
`
`rld
`
`of
`
`bits.
`
`Expand
`
`low half.
`
`Expand
`
`hlph
`
`half.
`
`Subtract
`
`offset
`
`to
`
`top.
`
`(
`
`/*
`
`Expand
`
`middle
`
`half.
`
`/*
`
`Subtract
`
`offset
`
`to mIddlr*/
`
`/*
`
`Otherwise
`
`exit
`
`loop.
`
`/* Scale
`/* Move
`
`up code
`in next
`
`range.
`input
`
`blt.
`
`l /
`
`'/
`
`'/
`
`*I
`
`l /
`
`l /
`
`l /
`l /
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding (continued)
`
`]une 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`527
`
`IPR2018-01413
`Sony EX1013 Page 8
`
`
`
`bit-input-c
`
`/* BIT
`
`INPUT ROUTINES. "/
`
`+include
`flnclude
`
`<stdlo.h>
`"arithmetic-cod1ng.h"
`
`/' THE BIT BUFFER.
`
`‘/
`
`static
`static
`static
`
`buffer:
`int
`lnt blts_to-go:
`lnt
`garbage-bits:
`
`/*
`
`INITIALIZE
`
`BIT
`
`INPUT.
`
`*/
`
`St6rt_inpUting_blt6()
`bits-to-go
`l
`garbage-bits
`
`1
`
`-
`
`0;
`-
`
`0:
`
`/*
`
`INPUT A BIT.
`
`l /
`
`/* Bits waiting
`of
`/*
`Number
`bits
`/*
`Number
`of
`bit6
`
`to
`
`be
`still
`past
`
`input
`in buffer
`end-of-file
`
`l /
`l /
`‘/
`
`/* Buffer
`/*
`no
`bits
`
`starts
`in
`
`it.
`
`out with
`
`l /
`l /
`
`int
`(
`
`input-bit0
`lnt
`t:
`1
`if
`(bits-to-go--O)
`buffer
`- getc(stdln):
`if
`(buffer--EOF)
`(
`1;
`garbage-bits
`t-
`if
`(garbage~,blts>Code~value~blts-2)
`fprlntf(stderr,“Bad
`input
`exit
`(-1);
`
`/*
`/*
`
`Read
`bit6
`
`the
`are
`
`next
`left
`
`if no '/
`byte
`in buffer.
`*'/
`
`1
`
`Return
`/*
`/* after
`for
`/*
`
`arbitrary
`eof,
`but
`too many
`
`bits*/
`check
`such.
`
`l /
`l /
`
`flle\n’j;
`
`)
`
`1
`bits-to-go
`
`1
`- buffer&l;
`t
`buffer
`a>- 1;
`bits-to-go
`-*
`return
`
`t;
`
`I:
`
`-
`
`8:
`
`the next bit
`/* Return
`from
`/*
`the bottom of
`the
`byte.
`
`l /
`l /
`
`216
`217
`218
`219
`220
`221
`222
`223
`224
`225
`226
`227
`228
`229
`230
`231
`232
`233
`234
`235
`236
`237
`238
`239
`240
`241
`242
`243
`244
`245
`246
`247
`248
`249
`250
`251
`252
`253
`254
`255
`256
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding (continued)
`
`528
`
`Communications of the ACM
`
`June 1987 Volume 30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 9
`
`
`
`bit-output.c
`
`/* BIT OUTPUT ROUTINES. '/
`
`#include
`
`<stdlo.h>
`
`/" THE BIT BUFFER. */
`
`static
`static
`
`int buffer:
`int bits-to-go;
`
`/*
`
`INITIALIZE
`
`FOR BIT OUTPUT. l /
`
`start_.outputing-..bits()
`buffer
`- 0;
`1
`bitS_tO-QO-
`
`8;
`
`)
`
`/* OUTPUT A BIT.
`
`l /
`
`output-blt(bit)
`lnt bit:
`>>- 1;
`buffer
`buffer
`if
`(bit)
`-- 1;
`bits-to-go
`if
`(bits-to-go--O)
`putc(buffer,stdout):
`bits-to-go
`- 8;
`
`(
`
`t
`
`l /
`l /
`
`'/
`l /
`
`buffered
`/* Bits
`/* Number of bits
`
`for output
`free
`in buffer
`
`/* Buffer
`/' with.
`
`Is empty
`
`to start
`
`/* Dot bit
`
`In
`
`top of buffer.*/
`
`I- 0x00;
`
`(
`
`/* Output buffer
`/' now full.
`
`if
`
`It 1s
`
`l /
`l /
`
`t
`
`257
`258
`259
`260
`261
`262
`263
`264
`265
`266
`267
`268
`269
`270
`271
`272
`273
`274
`215
`276
`277
`270
`279
`200
`281
`282
`283
`204
`285
`286
`287
`288
`209
`290
`291
`292
`293
`294
`
`/* FLUSH OUT THE LAST BITS.
`
`l /
`
`done-outputlng-bits0
`putc(buffer>>blts_to-go,stdout):
`
`FIGURE 3. C Implementation of Arithmetic Encoding and Decoding (continued)
`
`]une 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`529
`
`IPR2018-01413
`Sony EX1013 Page 10
`
`
`
`Computing Practices
`
`fixed-model.
`
`c
`
`/* THE FIXED SDDRCE MODEL l /
`
`1
`2
`
`finclude
`
`'mode1.h'
`
`int
`
`freq[No-of-symbolstlj
`0,
`1.
`1,
`
`1,
`1,
`
`1.
`1,
`
`-
`
`(
`
`1,
`1,
`
`t
`
`1,
`
`5
`7,
`
`1,
`1,
`
`$
`
`3,
`
`4
`4,
`
`1.
`1,
`
`+
`
`9,
`
`3
`5,
`
`/*
`1236,
`
`/'012
`15,
`
`/*@
`1,
`
`I
`
`1,
`
`l
`
`21,
`
`15,
`
`A
`24,
`
`8,
`
`B
`15,
`
`1,
`1,
`
`(
`
`2,
`
`S
`4,
`
`1,
`1,
`
`1,
`1,
`
`2:,
`
`15,
`
`6
`5,
`
`7
`4,
`
`C
`9,
`
`1, 124,
`1,
`1,
`
`1
`
`2,
`
`9:;C
`6,
`
`l
`
`2,
`
`3,
`
`J
`8,
`
`1,
`1,
`
`+
`
`1,
`
`2,
`
`K
`6,
`
`1,
`1,
`
`1,
`1,
`
`1,
`1,
`
`7b,
`
`19,
`
`60,
`
`1,
`1,
`
`/
`1,
`
`l /
`
`-
`1,
`
`>
`1,
`
`?*/
`1,
`
`M
`23,
`
`N
`13,
`
`O*/
`11,
`
`1,
`
`L
`12,
`
`..
`1,
`
`l /
`
`5,
`
`C
`22,
`
`D
`12,
`
`E
`15,
`
`F
`10,
`
`H
`16,
`
`I
`16,
`
`4
`5
`6
`7
`a
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`40
`49
`50
`51
`52
`53
`54
`55
`56
`57
`58
`59
`60
`61
`
`/'P
`14,
`
`Q
`1,
`
`R
`14,
`
`S
`26,
`
`T
`29,
`
`U
`6,
`
`V
`3,
`
`W
`11,
`
`X
`1,
`
`Y
`3,
`
`/*
`
`'
`1, 49:.
`
`/*p
`111,
`
`d
`232, 74:,
`
`b
`85, 17;,
`8
`q
`t
`r
`5, 388, 375, 531, 159,
`
`f
`127,
`
`ll:,
`
`i
`h
`293, 418,
`
`5:,
`
`91,
`
`x
`Y
`12, 101,
`
`1.
`1,
`1.
`1,
`1.
`1,
`1,
`1,
`
`1.
`1,
`1.
`1,
`1.
`1,
`1,
`1,
`
`1,
`1,
`1.
`1.
`1.
`1,
`1,
`1,
`
`1,
`1,
`1,
`1.
`1,
`1,
`1,
`1,
`
`1,
`1,
`1,
`1,
`1,
`1,
`1,
`i,
`
`1.
`1.
`1,
`1,
`1,
`1,
`1,
`I,
`
`1,
`1,
`1,
`1,
`1,
`1,
`1,
`I,
`
`1.
`1.
`1.
`1,
`1,
`1,
`1,
`I,
`
`1.
`1,
`1,
`1.
`1.
`1,
`1.
`1,
`
`1.
`1.
`1.
`1.
`1,
`1,
`1,
`1,
`
`Z
`1,
`
`j
`6,
`
`f.
`
`1,
`1,
`1,
`1.
`1.
`1,
`1,
`1,
`
`[
`1,
`
`\
`1,
`
`I
`1,
`
`m
`1
`0
`k
`39, 250, 139, 42:, 446.
`
`l /
`
`(
`2,
`
`1.
`1.
`1,
`1,
`1.
`1,
`1,
`1,
`
`I
`1,
`
`1.
`1.
`1,
`1,
`1.
`1,
`1,
`1,
`
`)
`2,
`
`1.
`1,
`1,
`1,
`1.
`1,
`1,
`I,
`
`-
`3.
`
`1,
`1.
`1.
`1,
`1.
`1,
`1.
`I,
`
`*/
`
`1.
`
`1,
`1.
`1,
`1,
`1.
`1,
`1.
`1,
`
`1’ ,
`
`/*
`
`INITIALIZE
`
`THE MODEL. '/
`
`start-model0
`!
`lnt
`1:
`i<No-of-chars:
`- 0;
`for
`(i
`char-to-index[i]
`-
`index_to-char[i+l)
`
`itl;
`-
`i;
`
`it+)
`
`(
`
`that
`/* Set up tables
`symbol
`/*
`translate
`between
`characters.
`/*
`indexes
`and
`
`I
`- 0;
`cum-freq[No-of-symbols]
`for
`(1 - No-of-symbols;
`i>O;
`cum-freq[i-lj
`- cum-freq[i]
`
`I
`if
`
`i--l
`
`(
`freq(iJ;
`
`t
`
`/* Set up cumulative
`/*
`frequency
`counts.
`
`(cum-freq[Ol
`
`> Max-frequency)
`
`abort0;
`
`/* Check counts within
`
`limit*/
`
`l /
`l /
`l /
`
`l /
`'/
`
`/* UPDATE THE MODEL TO ACCOUNT FOR A NEW SYMBOL. 'I
`
`(symbol)
`symbol;
`
`update-model
`lnt
`
`I
`I
`
`/*
`
`Do
`
`nothing.
`
`l /
`
`FIGURE 4. Fixed and Adaptive Models for Use with Figure 3
`
`530
`
`Communications of the ACM
`
`]une 1987 Volume .30 Number 6
`
`IPR2018-01413
`Sony EX1013 Page 11
`
`
`
`adaptive-mode1.c
`
`/* THE ADAPTIVE SOURCE MODEL l /
`
`#include
`
`"modo1.h"
`
`int
`
`freq]No-of-symbolstl]:
`
`/+ symbol
`
`frequencies
`
`/'
`
`INITIALIZE
`
`THE MODEL. l /
`
`start-model0
`int
`i:
`I
`(i
`i<No-of-chars;
`- 0;
`for
`char-to-index[l]
`-
`index-to-char[itl]
`
`it+)
`
`(
`
`itl:
`-
`
`i:
`
`I
`for
`
`- 0;
`(i
`freq[i]
`cum-freq[i]
`
`i<-No-of-symbols;
`- 1;
`
`- No-of-symbols-i;
`
`it+)
`
`(
`
`t
`freq[O]
`
`- 0;
`
`t
`
`/* Set up tables
`that
`symbol
`/*
`translate
`between
`/*
`indexes
`and characters.
`
`frequency
`/* Set up initial
`/* counts
`to be one
`for all
`/* symbols.
`
`/* Freq[Ol
`/* same as
`
`must not be the
`freq[l].
`
`/' UPDATE THE MODEL TO ACCOUNT FOR A NEW SYMBOL. '/
`
`(
`
`update~model(symbo1)
`int
`symbol:
`lnt 1;
`if
`(cum-freq]O]--Max-frequency)
`int
`cum;
`cum - 0:
`for
`(1 - No-of-symbols;
`freq[i]
`-
`(freq(i]+1]/2:
`cum-freq[l]
`- cum;
`cum
`t-
`freq[l];
`
`Index of new symbol
`/*
`symbol
`/* New index
`for
`(
`if
`/* See
`/* are at
`
`count8
`frequency
`thslr maxlawn.
`
`la-O;
`
`i--)
`
`(
`
`so, halve all
`If
`/*
`/* counts
`(keeplng
`/* non-zero).
`
`the
`them
`
`Computing Practices
`
`l /
`
`l /
`l /
`l /
`
`l /
`l /
`l /
`
`l /
`l /
`
`'1
`l /
`l /
`l /
`
`l /
`l /
`l /
`
`t
`
`t
`ior
`if
`
`freq(i]--freq[i-1):
`
`:
`
`- symbol;
`(i
`(i<symbol)
`]
`int
`ch-i,
`ch-symbol
`ch-i
`-
`index-to-char[i];
`ch-symbol
`-
`index-to-char[aymbol];
`index-to-char[i]
`- ch-symbol:
`index-to-char[symbol]
`- ch-1;
`char-to-index]ch-i]
`- symbol:
`dex[ch-symbol]
`-
`i;
`char-to-in'
`
`I
`freq[i]
`while
`
`l--l
`
`:
`
`/* Find symbol's
`
`new Index.
`
`l /
`
`/* Update
`/*
`tables
`/* moved.
`
`the
`lf
`
`translation
`the symbol has
`
`l /
`l /
`l /
`
`+- 1;
`(i>O)
`(
`-- 1:
`i
`cum-freq]!
`
`1 t- 1;
`
`frequency
`the
`Increment
`/*
`the symbol and
`/* count
`for
`/* update
`the cumulative
`/*
`frequencies.
`
`l /
`l /
`l /
`'/
`
`t
`
`t
`
`1
`2
`3
`4
`5
`6
`7
`0
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`21
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`40
`49
`50
`51
`52
`53
`
`FIGURE 4. Fixed and Adaptive Models for Use with Figure 3 (continued)
`
`/me 1987 Volume 30 Number 6
`
`Communications of the ACM
`
`531
`
`IPR2018-01413
`Sony EX1013 Page 12
`
`
`
`Computing Practices
`
`the Model
`Repr