`PRENTICE HALL SIGNAL PROCESSING SERIES
`
`Digital Video Processing
`
`A. Murat Tekalp
`University of Rochester
`
For book and bookstore information

http://www.prenhall.com
`
`
`Prentice Hall PTR
Upper Saddle River, NJ 07458
`
`
`
`
`
`
Tekalp, A. Murat.
Digital video processing / A. Murat Tekalp.
p. cm. -- (Prentice-Hall signal processing series)
ISBN 0-13-190075-7 (alk. paper)
1. Digital video. I. Title. II. Series.
TK6680.5.T45 1995
621.388'33--dc20

95-16650
CIP
`
`Editorial/production supervision: Ann Sullivan
`Cover design: Design Source
`Manufacturing manager: Alexis R. Heydt
`Acquisitions editor: Karen Gettman
`Editorial assistant: Barbara Alfieri
`
Printed on Recycled Paper
`
`
`
`©1995 by Prentice Hall PTR
`Prentice-Hall, Inc.
`A Simon and Schuster Company
`Upper Saddle River, NJ 07458
`
The publisher offers discounts on this book when ordered in bulk quantities.

For more information, contact:
`
`Corporate Sales Department
`Prentice Hall PTR
`One Lake Street
`Upper Saddle River, NJ 07458
`
`Phone: 800-382-3419
`Fax: 201-236-7141
`
email: corpsales@prenhall.com
`
All rights reserved. No part of this book may be reproduced, in any form or by any means,
without permission in writing from the publisher.
`
`Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
`Printed in the United States of America
`
10 9 8 7 6 5 4 3 2 1
`
`ISBN: 0-13-190075-7
`
`Prentice-Hall International (UK) Limited, London
`Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
`Prentice-Hall Hispanoamericana, S.A., Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Simon & Schuster Asia Pte. Ltd., Singapore
`
`
`
`
` Chapter 18
`
`LOSSLESS
`COMPRESSION
`
`The need for effective data compression is evident in almost all applications where
`storage and transmission of digital images are involved. For example, an 8.5 x 11 in
`document scanned at 300 pixels/in with 1 bit/pixel generates 8.4 Mbits data, which
`without compression requires about 15 min transmission time over a 9600 baud line.
A 35 mm film scanned at 12 micron resolution results in a digital image of size 3656
`pixels x 2664 lines. With 8 bits/pixel per color and three color channels, the storage
`required per picture is approximately 233 Mbits. The storage capacity of a CD is
`about 5 Gbits, which without compression can hold approximately 600 pages of a
`document, or 21 color images scanned from 35 mm film. Several world standards
`for image compression, such as ITU (formerly CCITT) Group 3 and 4 codes, and
`ISO/IEC/CCITT JPEG, have recently been developed for efficient transmission
`and storage of binary, gray-scale, and color images.
`Compression of image data without significant degradation of the visual quality
is usually possible because images contain a high degree of i) spatial redundancy, due
to correlation between neighboring pixels, ii) spectral redundancy, due to correlation
among the color components, and iii) psychovisual redundancy, due to properties
`of the human visual system. The higher the redundancy, the higher the achievable
`compression. This chapter introduces the basics of image compression, and discusses
`some lossless compression methods.
`It is not intended as a formal review of the
`related concepts, but rather aims to provide the minimum information necessary to
follow the popular still-image and video compression algorithms/standards, which
will be discussed in the subsequent chapters. Elements of an image compression
system as well as some information theoretic concepts are introduced in Section 18.1.
Section 18.2 discusses symbol coding, and in particular entropy coding, which is an
`integral part of lossless compression methods. Finally, three commonly used lossless
`compression algorithms are presented in Section 18.3.
`
`
`
`
`
`
`
`18.1 Basics of Image Compression
`
`In this section, we first present the elements of a general image compression system,
then summarize some results from information theory which provide bounds on
the achievable compression ratios and bitrates.
`
`18.1.1 Elements of an Image Compression System
`
`In information theory, the process of data compression by redundancy reduction is
referred to as source encoding. Images contain two types of redundancy, statistical
(spatial) and psychovisual. Statistical redundancy is present because certain spatial
patterns are more likely than others, whereas psychovisual redundancy originates
from the fact that the human eye is insensitive to certain spatial frequencies. The
`block diagram of a source encoder is shown in Figure 18.1. It is composed of the
`following blocks:
`
Input image → Transformer (T) → Quantizer (Q) → Coder (C) → Binary bit stream

Figure 18.1: Block diagram of an image compression system.
`
`i) Transformer (T) applies a one-to-one transformation to the input image data.
`The output of the transformer is an image representation which is more amenable
to efficient compression than the raw image data. Typical transformations are linear
predictive mapping, which maps the pixel intensities onto a prediction error signal
by subtracting the predictable part of the pixel intensities; unitary mappings such as
`the discrete cosine transform, which pack the energy of the signal to a small number
`of coefficients; and multiresolution mappings, such as subband decompositions and
`the wavelet transform.
ii) Quantizer (Q) generates a limited number of symbols that can be used in the rep-
resentation of the compressed image. Quantization is a many-to-one mapping which
`is irreversible. It can be performed by scalar or vector quantizers. Scalar quantiza-
`tion refers to element-by-element quantization of the data, whereas quantization of
`a block of data at once is known as vector quantization.
iii) Coder (C) assigns a codeword, a binary bitstream, to each symbol at the output
`of the quantizer. The coder may employ fixed-length or variable-length codes.
`Variable-length coding (VLC), also known as entropy coding, assigns codewords in
`such a way as to minimize the average length of the binary representation of the
`
`5
`
`
`
`
`symbols. This is achieved by assigning shorter codewords to more probable symbols,
`which is the fundamental principle of entropy coding.
`Different image compression systems implement different combinations of these
`choices. Image compression methods can be broadly classified as:
`i) Lossless (noiseless) compression methods, which aim to minimize the bitrate
`without any distortion in the image.
ii) Lossy compression methods, which aim to obtain the best possible fidelity for a
`given bitrate, or to minimize the bitrate to achieve a given fidelity measure.
The transformation and encoding blocks are lossless. However, quantization is lossy.
Therefore, lossless methods, which only make use of the statistical redundancies, do
`not employ a quantizer. In most practical cases a small degradation in the image
`quality must be allowed to achieve the desired bitrate. Lossy compression methods
`make use of both the statistical and psychovisual redundancies.
In the following, we first briefly review some results from information theory
which give bounds on the achievable bitrates in both lossless and lossy compres-
sion. In Section 18.2, we present techniques for symbol coding (the third box in
Figure 18.1). The discussion of the second box, the quantizer, is deferred until the
next chapter. Finally, some commonly used lossless image compression methods are
presented in Section 18.3.
`
`
`
18.1.2 Information Theoretic Concepts
A source X with an alphabet A is defined as a discrete random process (a sequence
of random variables Xi, i = 1, 2, ...) in the form X = X1 X2 ..., where each random
variable Xi takes a value from the alphabet A. In the following, we assume that
the alphabet contains a finite number (M) of symbols, i.e., A = {a1, a2, ..., aM}.
Here, we introduce two source models, a discrete memoryless source (DMS) and a
Markov-K source.
A DMS is such that successive symbols are statistically independent. It is
completely specified by the probabilities p(ai) = pi, i = 1, ..., M, such that
p1 + ... + pM = 1.
In VLC, the optimum length of the binary code for a sym-
bol is equal to the information (in bits) that the symbol conveys. According to
information theory, the information content of a symbol is related to the extent
that the symbol is unpredictable or unexpected. If a symbol with low probability
occurs, a larger amount of information is transferred than in the occurrence of a
more likely symbol. This quantitative concept of surprise is formally expressed by
the relation

I(ai) = log2(1/p(ai)),  for ai ∈ A    (18.1)

where I(ai) is the amount of information that the symbol ai with probability p(ai)
carries. The unit of information is the bit when logarithms with base 2 are used. Observe
that if p = 1, then I = 0 as expected, and as p → 0, I → ∞. In practice,
the probability of occurrence of each symbol is estimated from the histogram of a
specific source, or a training set of sources.
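Equation (18.1) is easy to verify numerically; the following is a small Python sketch (the function name is ours, not from the text):

```python
import math

def information_bits(p):
    """Information content of a symbol with probability p, Eq. (18.1):
    I(a) = log2(1/p(a))."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return math.log2(1.0 / p)

# A certain symbol carries no information; rarer symbols carry more.
print(information_bits(1.0))    # 0.0
print(information_bits(0.5))    # 1.0
print(information_bits(0.125))  # 3.0
```

Note that probabilities which are powers of 2 give integer bit counts, a fact used in the Huffman coding examples later in this chapter.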
`
`6
`
`
`
`
The entropy H(X) of a DMS X with an alphabet A is defined as the average
`information per symbol in the source, given by
`
H(X) = Σ_{a∈A} p(a) log2(1/p(a)) = − Σ_{a∈A} p(a) log2 p(a)    (18.2)
`
`The more skewed the probability distribution of the symbols, the smaller the entropy
`of the source. The entropy is maximized for a flat distribution, that is, when all
`symbols are equally likely. It follows that a source where some symbols are more
likely than others has a smaller entropy than a source where all symbols are equally
`likely.
`
`Example: Entropy of raw image data
Suppose an 8-bit image is taken as a realization of a DMS X. The
symbols are the gray levels of the pixels, and the alphabet A is the
collection of all gray levels between 0 and 255. Then the entropy of the
image is given by

H(X) = − Σ_{i=0}^{255} p(i) log2 p(i)

where p(i) denotes the relative frequency of occurrence of the gray level
i in the image. Note that the entropy of an image consisting of a single
gray level (a constant image) is zero.
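The histogram-based entropy estimate above can be sketched in a few lines of Python (a minimal illustration; the function name is ours):

```python
import math
from collections import Counter

def entropy_bits(pixels):
    """Estimate the entropy of Eq. (18.2) from the histogram of a
    sequence of symbols (here, pixel gray levels)."""
    counts = Counter(pixels)
    n = len(pixels)
    # p(i) = c/n; each term is p(i) * log2(1/p(i))
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(entropy_bits([128] * 16))    # 0.0  (constant image)
print(entropy_bits([0, 255] * 8))  # 1.0  (two equally likely levels)
```

As the text notes, the more skewed the histogram, the smaller the estimate; a flat histogram over all 256 levels would give the maximum of 8 bits/pixel.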
`
Most realistic sources can be better modeled by Markov-K random processes.
That is, the probability of occurrence of a symbol depends on the values of the K preced-
ing symbols. A Markov-K source can be specified by the conditional probabilities
p(Xj = ai | Xj−1, ..., Xj−K), for all j, ai ∈ A. The entropy of a Markov-K source
is defined as

H(X) = Σ_{S^K} p(Xj−1, ..., Xj−K) H(X | Xj−1, ..., Xj−K)    (18.3)

where S^K denotes all possible realizations of Xj−1, ..., Xj−K, and

H(X | Xj−1, ..., Xj−K) = − Σ_{ai∈A} p(ai | Xj−1, ..., Xj−K) log2 p(ai | Xj−1, ..., Xj−K)
`
`In the following, we present two fundamental theorems, the Lossless Coding The-
`orem and the Source Coding Theorem, which are used to measure the performance
`of lossless coding and lossy coding systems, respectively.
`
`
Lossless Coding Theorem: [Shannon, 1948] The minimum bitrate that
can be achieved by lossless coding of a discrete memoryless source X is
given by

min{R} = H(X) + ε  bits/symbol    (18.4)

where R is the transmission rate, H(X) is the entropy of the source, and
ε is a positive quantity that can be made arbitrarily close to zero.
The Lossless Coding Theorem establishes the lower bound for the bitrate neces-
sary to achieve zero coding-decoding error. In the case of a DMS, we can approach
this bound by encoding each symbol independently. For sources with memory, we
need to encode blocks of N source symbols at a time to come arbitrarily close to
the bound. For example, a Markov-K source should be encoded K symbols at a
time. In the next section, we introduce two coding techniques, Huffman coding and
arithmetic coding, that approach the entropy bound.
In lossy coding schemes, the achievable minimum bitrate is a function of the
distortion that is allowed. This relationship between the bitrate and distortion is
given by the rate-distortion function [Ber 71].
Source Coding Theorem: There exists a mapping from the source sym-
bols to codewords such that for a given distortion D, R(D) bits/symbol
are sufficient to enable source reconstruction with an average distortion
that is arbitrarily close to D. The actual rate R should obey

R ≥ R(D)    (18.5)

for the fidelity level D. The function R(D) is called the rate-distortion
function. Note that R(0) = H(X).
`
Figure 18.2: Rate-distortion function. (The rate R falls from H(X) at D = 0 to
zero at D = Dmax.)
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`8
`
`
`
`
`A typical rate-distortion function is depicted in Figure 18.2. The rate distortion
`function can be computed analytically for simple source and distortion models.
Computer algorithms exist to compute R(D) when analytical methods fail or are
impractical [Ber 71]. In general, we are interested in designing a compression system
to achieve either the lowest bitrate for a given distortion or the lowest distortion at
a given bitrate. Note that the source coding theorem does not state how to design
algorithms to achieve these desired limits. Some well-known lossy coding algorithms
`are discussed in Chapters 19, 20, and 21.
`
`18.2 Symbol Coding
`
Symbol coding is the process of assigning a bit string to individual symbols or to
a block of symbols comprising the source. The simplest scheme is to assign equal-
length codewords to individual symbols or a fixed-length block of symbols, which
is known as fixed-length coding. Because compression is generally achieved by as-
signing shorter-length codewords to more probable symbols, we next describe two
variable-length coding, also known as entropy coding, schemes. The first, Huffman
coding, assigns variable-length codes to a fixed-length block of symbols, where the
block length can be one. In Huffman coding, the length of the codewords is propor-
tional to the information (in bits) of the respective symbols or blocks of symbols. The
latter, arithmetic coding, assigns variable-length codes to a variable-length block of
symbols.
`
`18.2.1 Fixed-Length Coding
`
In fixed-length coding, we assign equal-length codewords to each symbol in
the alphabet A regardless of their probabilities. If the alphabet has M different
symbols (or blocks of symbols), then the length of the codewords is the smallest
integer greater than or equal to log2 M. Two commonly used fixed-length coding
schemes are natural codes and Gray codes, which are shown in Table 18.1 for the
case of a four-symbol source. Notice that in Gray coding, consecutive codewords
differ in only one bit position. This property of the Gray codes may provide an
advantage in error detection. We will see in Section 18.3 that Gray codes are also
better suited for run-length encoding of bit-planes.
`
`Table 18.1: Fixed-length codes for a four-symbol alphabet.
`
`
`
`
`
`Symbol Natural code Gray code
`ay
`00
`00
`
`aa
`01
`O1
`ag
`10
`ll
`
`11
`a4
`
`a
`
`
`
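The reflect-and-prefix construction behind the Gray code column of Table 18.1 generalizes to any word length; the following Python sketch (function names ours) produces both code types:

```python
import math

def natural_codes(m):
    """Fixed-length natural binary codes for an m-symbol alphabet,
    using ceil(log2 m) bits per codeword."""
    bits = max(1, math.ceil(math.log2(m)))
    return [format(i, "0{}b".format(bits)) for i in range(m)]

def gray_codes(bits):
    """Reflected binary Gray codes: prefix the current list with 0,
    then its reversal with 1, so consecutive words differ in one bit."""
    codes = ["0", "1"]
    for _ in range(bits - 1):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

print(natural_codes(4))  # ['00', '01', '10', '11']
print(gray_codes(2))     # ['00', '01', '11', '10']
```

The two-bit outputs match the natural and Gray columns of Table 18.1.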
`
`It can easily be shown that fixed-length coding is optimal only when:
`
`1) the number of symbols is equal to a power of 2, and
`2) all the symbols are equiprobable.
Only then would the entropy of the source be equal to the average length of the
codewords, which is equal to the length of each codeword in the case of fixed-length
coding. For the example shown in Table 18.1, both the entropy of the source and the
average codeword length are 2 bits, assuming all symbols are equally likely. Most
often, some symbols are more probable than others, in which case it would be more
advantageous to use entropy coding. Actually, the goal of the transformation box in
Figure 18.1 is to obtain a set of symbols with a skewed probability distribution, to
minimize the entropy of the transformed source.
`
`18.2.2 Huffman Coding
`Huffman coding yields the optimal integer prefix codes given a source with a finite
`number of symbols and their probabilities. In prefix codes, no codeword is a prefix
`of another codeword. Such codes are uniquely decodable since a given binary string
can only be interpreted in one way. Huffman codes are optimal in the sense that no
`other integer-length VLC can be found to yield a smaller average bitrate. In fact,
`the average length of Huffman codes per codeword achieves the lower bound, the
`entropy of the source, when the symbol probabilities are all powers of 2.
Huffman codes can be designed by following a very simple procedure. Let X
denote a DMS with the alphabet A and the symbol probabilities p(ai), ai ∈ A,
i = 1, ..., M. Obviously, if M = 2, we must have

c(a1) = 0  and  c(a2) = 1    (18.6)

where c(ai) denotes the codeword for the symbol ai, i = 1, 2. If A has more than
two symbols, the Huffman procedure requires a series of source reduction steps. In
each step, we find and merge the two symbols with the smallest probabilities, which
results in a new source with a reduced alphabet. The probability of the new symbol
in the reduced alphabet is the sum of the probabilities of the two "merged" symbols
from the previous alphabet. This procedure is continued until we reach a source
with only two symbols, for which the codeword assignment is given by (18.6). Then
we work backwards towards the original source, each time splitting the codeword
of the "merged" symbol into two new codewords by appending a zero and a one,
respectively. The following examples demonstrate this procedure.
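The reduction-and-splitting procedure above can be sketched with a binary heap (a minimal illustration, not the book's implementation; ties between equal probabilities are broken arbitrarily, as the text permits):

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman codes by repeatedly merging the two least probable
    entries, then unwinding the merges. probs: symbol -> probability."""
    tiebreak = count()  # avoids comparing symbols/tuples on equal probabilities
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # two smallest probabilities
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (a, b)))
    codes = {}
    def assign(node, code):
        if isinstance(node, tuple):      # "merged" symbol: split its code again
            assign(node[0], code + "0")
            assign(node[1], code + "1")
        else:
            codes[node] = code or "0"
    assign(heap[0][2], "")
    return codes

# Probabilities that are powers of 2 (as in Table 18.2) yield codeword
# lengths equal to the symbol information: 1, 2, 3, and 3 bits.
codes = huffman_codes({"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125})
print(sorted(len(c) for c in codes.values()))  # [1, 2, 3, 3]
```

The resulting average length, 0.5 × 1 + 0.25 × 2 + 0.125 × 3 + 0.125 × 3 = 1.75 bits, equals the source entropy, as the first example below confirms.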
`
`Example: Symbol probabilities are powers of 2
`Let the alphabet A consist of four symbols, shown in Table 18.2. The
`probabilities and the information content of the symbols in the alphabet
`are also listed in the table. Note that all symbol probabilities are powers
`of 2, and consequently the symbols have integer information values.
`
`
Table 18.2: An alphabet where the symbol probabilities are powers of 2.

Symbol   Probability   Information
a1       0.50          1 bit
a2       0.25          2 bits
a3       0.125         3 bits
a4       0.125         3 bits
`
The Huffman coding procedure is demonstrated for this alphabet in
Table 18.3. The reduced alphabet in Step 1 is obtained by merging
the symbols a3 and a4 in the original alphabet, which have the lowest
two probabilities. Likewise, the reduced alphabet in Step 2 is obtained
by merging the two symbols with the lowest probabilities after Step 1.
Since the reduced alphabet in Step 2 has only two symbols, we assign
the codes 0 and 1 to these symbols in arbitrary order. Next, we assign
codes to the reduced alphabet in Step 1. We recall that symbol 2
in Step 2 is obtained by merging symbols 2 and 3 in Step 1. Thus,
we assign codes to symbols 2 and 3 in Step 1 by appending the code for
symbol 2 in Step 2 with a zero and a one in arbitrary order. Finally, the codes
for the original alphabet are obtained in a similar fashion.
`
Table 18.3: Illustration of alphabet reduction.

Original Alphabet        Reduced Alphabet, Step 1   Reduced Alphabet, Step 2
Symbol  P      Code      P      Code                P      Code
a1      0.50   0         0.50   0                   0.50   0
a2      0.25   10        0.25   10                  0.50   1
a3      0.125  110       0.25   11
a4      0.125  111
`
This procedure can alternatively be described by the tree diagram shown
in Figure 18.3. Observe that in this case, the average codeword length is

R = 0.5 × 1 + 0.25 × 2 + 0.125 × 3 + 0.125 × 3 = 1.75 bits/symbol

and the entropy of the source is

H = −0.5 log2 0.5 − 0.25 log2 0.25 − 0.125 log2 0.125 − 0.125 log2 0.125 = 1.75 bits/symbol

which is consistent with the result that Huffman coding achieves the
entropy of the source when the symbol probabilities are powers of 2.
`
`
Figure 18.3: Tree diagram for Huffman coding. (The leaves a1 (p = 0.5),
a2 (p = 0.25), a3 (p = 0.125), and a4 (p = 0.125) are merged pairwise, and each
merge appends a 0 or a 1 to the codewords.)
`
Next, we present an example in which the symbol probabilities are not
powers of 2.

Example: Symbol probabilities are not powers of 2
The information content of each symbol is a real number, as shown in
Table 18.4, when the probabilities of the symbols are not powers of 2.
`
`
`
`
`
Table 18.4: An alphabet with arbitrary symbol probabilities.

Symbol   Probability   Information
a1       0.40          1.32 bits
a2       0.25          2.00 bits
a3       0.15          2.73 bits
a4       0.15          2.73 bits
a5       0.05          4.32 bits
`
Since the length of each codeword must be an integer, it is not possible
to design codewords whose lengths are equal to the information of the
respective symbols in this case. Huffman code design for the alphabet
in Table 18.4 is shown in Table 18.5. It can be easily seen that for this
example the average length of codewords is 2.15 bits, and the entropy of
the source is 2.07 bits.

Notice that Huffman codes are uniquely decodable, with proper syn-
chronization, because no codeword is a prefix of another. For example,
a received binary string

001101101110000...

can be decoded uniquely as

a3 a2 a1 a2 a1 a1 a4 ...
`
`
Table 18.5: Huffman coding when probabilities are not powers of 2.

Symbol   Probability   Code
a1       0.40          1
a2       0.25          01
a3       0.15          0011
a4       0.15          000
a5       0.05          0010
`
Huffman coding can also be used as a block coding scheme where we assign
codewords to combinations of L symbols from the original alphabet at a time. Of
course, this requires building a new block alphabet with all possible combinations
of the L symbols from the original alphabet and computing their respective prob-
abilities. Huffman codes for all possible combinations of the L symbols from the
original alphabet can be formed using the above design procedure with the new
block alphabet. Thus, Huffman coding is a block coding scheme, where we assign
variable-length codes to fixed-length (L) blocks of symbols. The case L = 1 refers to
assigning an individual codeword to each symbol of the original alphabet, as shown
in the above two examples. It has been shown that for sources with memory, the
coding efficiency improves as L gets larger, although the design of Huffman codes
gets more complicated.
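Building the block alphabet can be sketched as follows (a DMS is assumed here, so block probabilities are simple products; for a source with memory one would substitute the joint probabilities; the function name is ours):

```python
from itertools import product

def block_alphabet(probs, L):
    """Product alphabet for block Huffman coding: every length-L
    combination of source symbols with its probability, assuming
    independent symbols (a DMS)."""
    blocks = {}
    for combo in product(probs.items(), repeat=L):
        syms = "".join(s for s, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        blocks[syms] = p
    return blocks

pairs = block_alphabet({"a": 0.5, "b": 0.5}, L=2)
print(pairs)  # {'aa': 0.25, 'ab': 0.25, 'ba': 0.25, 'bb': 0.25}
```

Note the cost implied in the text: the block alphabet has M^L entries, which is why Huffman design becomes complicated as L grows.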
`
18.2.3 Arithmetic Coding

In arithmetic coding a one-to-one correspondence between the symbols of an alpha-
bet A and the codewords does not exist. Instead, arithmetic coding assigns a single
variable-length code to a source X composed of N symbols, where N is variable.
The distinction between arithmetic coding and block Huffman coding is that in
arithmetic coding the length of the input sequence, i.e., the block of symbols for
which a single codeword is assigned, is variable. Thus, arithmetic coding assigns
variable-length codewords to variable-length blocks of symbols. Because arithmetic
coding does not require assignment of integer-length codes to fixed-length blocks
of symbols, in theory it can achieve the lower bound established by the noiseless
coding theorem.
Arithmetic coding associates a given realization of X, x = {x1, ..., xN}, with a
subinterval of [0, 1) whose length equals the probability of the sequence, p(x). The
encoder processes the input stream of symbols one by one, starting with N = 1,
where the length of the subinterval associated with the sequence gets smaller as
N increases. Bits are sequentially sent to the channel starting from the most sig-
nificant bit towards the least significant bit as they are determined according to
a procedure, which is presented in algorithmic form in the following. At the
end of the transmission, the transmitted bitstream is a uniquely decodable code-
word representing the source, which is a binary number pointing to the subinterval
associated with this sequence.
`
The Procedure
Consider an alphabet A that has M symbols ai, i = 1, ..., M, with the proba-
bilities p(ai) = pi, such that p1 + ... + pM = 1. The procedure starts by assigning
each individual symbol in the alphabet a subinterval, within 0 to 1, whose length is
equal to its probability. It is assumed that this assignment is known to the decoder.
Let Pi = p1 + ... + pi denote the cumulative probability, with P0 = 0.
1. If the first input symbol x1 = ai, i = 1, ..., M, then define the initial
subinterval as I1 = [l1, r1) = [Pi−1, Pi−1 + pi). Set n = 1, L = l1,
R = r1, and d = r1 − l1.
2. Obtain the binary expansions of L and R as

L = Σ_{k=1}^{∞} uk 2^{−k}  and  R = Σ_{k=1}^{∞} vk 2^{−k}

where uk and vk are 0 or 1.
Compare u1 and v1. If they are not the same, send nothing to the channel at
this time, and go to step 3.
If u1 = v1, then send the binary symbol u1, and compare u2 and v2. If they are
not the same, go to step 3.
If u2 = v2, also send the binary symbol u2, and compare u3 and v3, and so on,
until the next two corresponding binary symbols do not match, at which time go
to step 3.
3. Increment n, and read the next symbol. If the nth input symbol xn = ai,
then subdivide the interval from the previous step as

In = [ln, rn) = [ln−1 + Pi−1 d, ln−1 + (Pi−1 + pi) d).

Set L = ln, R = rn, and d = rn − ln, and go to step 2.
Note that the decoder may decode one binary symbol into several source sym-
bols, or it may require several binary symbols before it can decode one or more
source symbols. The arithmetic coding procedure is illustrated by means of the
following example.
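The interval-narrowing recursion of steps 1 and 3 can be sketched in Python as follows (floating point is used only for illustration; as noted at the end of this section, practical coders use scaled finite-precision arithmetic; the function name is ours):

```python
def final_interval(sequence, probs):
    """Narrow [0, 1) to the subinterval associated with a symbol
    sequence; any binary fraction inside the final interval is a valid
    codeword. probs: symbol -> probability, in a fixed order known to
    both encoder and decoder."""
    # Cumulative lower bounds P_{i-1} for each symbol's slot in [0, 1).
    lows, acc = {}, 0.0
    for sym, p in probs.items():
        lows[sym] = acc
        acc += p
    low, width = 0.0, 1.0
    for sym in sequence:
        low = low + lows[sym] * width   # l_n = l_{n-1} + P_{i-1} d
        width = width * probs[sym]      # new d = p_i * old d
    return low, low + width

# Sequence a2 a1 a3 from Table 18.2 (p = 0.5, 0.25, 0.125, 0.125):
lo, hi = final_interval(["a2", "a1", "a3"],
                        {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125})
print(lo, hi)  # 0.59375 0.609375
```

The result matches the interval I3 = [0.59375, 0.609375) derived in the example below, and the final width, 0.25 × 0.5 × 0.125, is exactly the probability of the sequence.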
`
`
Example
Suppose we wish to determine an arithmetic code to represent a sequence
of symbols,

a2 a1 a3 ...

from the source shown in Table 18.2. Because we have four symbols
in the alphabet, the interval from 0 to 1 is initially subdivided into four
subintervals, whose lengths are equal to 0.5, 0.25, 0.125, and 0.125,
respectively. This is depicted in Figure 18.4.
`
Figure 18.4: Illustration of the concept of arithmetic coding. (The unit interval
is subdivided in proportion to the symbol probabilities: a1 → [0, 0.5),
a2 → [0.5, 0.75), a3 → [0.75, 0.875), a4 → [0.875, 1). The subinterval of the
sequence so far is then subdivided again for each new symbol, shrinking to
[0.5, 0.625) after a2 a1, and to [0.59375, 0.609375) after a2 a1 a3. Decimal and
binary values are shown for each boundary.)
`
The first symbol defines the initial interval as I1 = [0.5, 0.75), where the
binary representations of the left and right boundaries are L = 2^−1 = 0.1
and R = 2^−1 + 2^−2 = 0.11, respectively. According to step 2,
u1 = v1 = 1; thus, 1 is sent to the channel. Noting that u2 = 0
and v2 = 1, we read the second symbol, a1. Step 3 indicates that
I2 = [0.5, 0.625), with L = 0.10 and R = 0.101. Now that u2 = v2 = 0,
we send 0 to the channel. However, u3 = 0 and v3 = 1, so we read the
third symbol, a3. It can be easily seen that I3 = [0.59375, 0.609375),
with L = 0.10011 and R = 0.100111. Note that u3 = v3 = 0,
u4 = v4 = 1, and u5 = v5 = 1, but u6 = 0 and v6 = 1. At this stage, we
send 011 to the channel, and read the next symbol. A reserved symbol
usually signals the end of a sequence.
`
Let's now briefly look at how the decoder operates, which is illustrated
in Figure 18.5. The first bit restricts the interval to [0.5, 1). However,

Received bit   Interval                Decoded symbol
1              [0.5, 1)                -
0              [0.5, 0.75)             a2
0              [0.5, 0.609375)         a1
1              [0.5625, 0.609375)      -
1              [0.59375, 0.609375)     a3

Figure 18.5: The operation of the decoder.
`
`
three symbols are within this range; thus, the first bit does not contain
sufficient information. After receiving the second bit, we have 10, which
points to the interval [0.5, 0.75). All possible combinations of two sym-
bols pointing to this range start with a2. Hence, we can now decode
the first symbol as a2. The information that becomes available after the
receipt of each bit is summarized in Figure 18.5.
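The decoder's interval search can be sketched as follows (an idealized version that is handed the final codeword value directly rather than bit by bit, and that assumes the number of symbols is known; names are ours):

```python
def decode(value, probs, n):
    """Decode n symbols from a number in [0, 1) by repeatedly finding
    which symbol's subinterval contains it, then rescaling back to
    [0, 1). probs: symbol -> probability, in the agreed order."""
    out = []
    for _ in range(n):
        acc = 0.0
        for sym, p in probs.items():
            if acc <= value < acc + p:
                out.append(sym)
                value = (value - acc) / p   # rescale to [0, 1)
                break
            acc += p
    return out

probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
# 0.6 lies in [0.59375, 0.609375), the interval for a2 a1 a3:
print(decode(0.6, probs, 3))  # ['a2', 'a1', 'a3']
```

In a real coder, the end of the sequence is signaled by the reserved end-of-message symbol mentioned above rather than by a known count n.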
`
In practice, two factors cause the performance of the arithmetic encoder to fall
short of the theoretical bound: the addition of an end-of-message indicator, and
the use of finite-precision arithmetic. Practical implementations of the arithmetic
coder overcome the precision problem by a scaling and rounding strategy.
`
18.3 Lossless Compression Methods
`
Error-free coding is the only acceptable means of compression in some applications
for various reasons. For example, in the transmission or archival of medical im-
ages lossy compression is not allowed for legal reasons. Recalling the elements of
a compression system, lossless coding schemes do not employ a quantizer. They
consist of a transformation, which generates symbols whose probability distribution
is highly peaked, followed by an entropy coder. The transformation aims to mini-
mize the entropy of its output, so that significant compression becomes possible by
variable-length coding of the generated symbols.
In this section, we present three popular methods for lossless compression:
i) Lossless predictive coding, where an integer predictive mapping is employed, fol-
lowed by entropy coding of the integer prediction errors. ii) Run-length coding of
bit-planes, where the image is decomposed into individual bit-planes (binary im-
ages), and the run-lengths of zeros and ones in these planes are entropy coded.
iii) Ziv-Lempel coding, which is a deterministic coding procedure, where the input
bit string is parsed into blocks of variable length to form a dictionary of blocks
(symbols), each of which is represented by a fixed-length codeword. The achiev-
able compression ratio using lossless coding methods ranges from 2:1 to 5:1,
depending on the characteristics of the input image.
`
`18.3.1 Lossless Predictive Coding
The first step in lossless predictive coding is to form an integer-valued prediction
of the next pixel intensity to be encoded based on a set of previously encoded
neighboring pixels. Then the difference between the actual intensity of the pixel
and its prediction is entropy coded. Assuming each pixel is integer-valued, the
prediction errors are also integer-valued, which facilitates lossless compression. The
block diagram of a simple predictor is shown in Figure 18.6, where the prediction
is taken as the intensity of the previously encoded pixel. Some other commonly used
integer predictors are shown in the following example.
`
`
`
`Residual
`
`Sample
`
`Previous
`Sample
`
`(a)
`
`Residual
`
`Reconstructed
`
`Sample Previous.
`
`Sample
`
`(b)
`
`Figure 18.6: Block diagram of a) an encoder, and b) a decoder using a simple
`predictor.
`
Example: Integer prediction
In order for the decoder to be able to duplicate the prediction step, the
prediction operation must be based on already-encoded pixels. In 2-D,
such prediction models are called recursively computable. The support
of a recursively computable predictor is shown in Figure 18.7, where
a, b, c, and d denote the intensities of the respective pixels.
Two simple predictors based on this model can be written as

x̂ = int{(a + b)/2}    (18.7)

Figure 18.7: Prediction schemes.
`
`
`
`
x̂ = int{(a + b + c + d)/4}    (18.8)

where x̂ denotes the predicted value for the pixel x. In both expressions,
the prediction is rounded to the nearest integer to ensure an integer
prediction error. We note that many other forms of prediction, such as
edge-adaptive prediction, also exist.
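A sketch of the predictive mapping of Eq. (18.7) follows (the assignment of a to the pixel above and b to the pixel to the left, and zero padding outside the image, are our assumptions for illustration; the book's Figure 18.7 fixes the actual support):

```python
def prediction_residual(image):
    """Lossless predictive mapping with the two-pixel predictor of
    Eq. (18.7): x_hat = int((a + b)/2). Here a is taken as the pixel
    above and b as the pixel to the left (assumed 0 outside the image)."""
    h, w = len(image), len(image[0])
    residual = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a = image[i - 1][j] if i > 0 else 0   # causal neighbors only,
            b = image[i][j - 1] if j > 0 else 0   # so the decoder can repeat this
            residual[i][j] = image[i][j] - (a + b) // 2
    return residual

# On a smooth ramp, interior residuals shrink to 1, while the first row
# and column (predicted from zero padding) carry larger values.
img = [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(prediction_residual(img))  # [[10, 6, 7], [6, 1, 1], [7, 1, 1]]
```

The decoder recovers each pixel exactly by adding the residual back to the same prediction, which is why the mapping is lossless.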
If the input image intensity x has a dynamic range of (0, 255), then the prediction
error x − x̂ has a theoretical dynamic range of (−255, 255). The reader may have
noticed that the predictive mapping in fact results in an expansion of the dynamic
range of the signal to be encoded. However, inspection of the histogram (probability
density) of the prediction error shows that it is highly peaked about 0, as compared
to the histogram of the actual image intensities, as shown in Figure 18.8. Therefore,
the prediction error usually has much smaller entropy than the original intensity
values, which implies that the prediction process removes a great deal of interpixel
(statistical) redundancy.
`
Figure 18.8: Histograms of a) the original image intensity, spread over (0, 255),
and b) the integer prediction error, highly peaked about 0 over (−255, 255).
`
For lossless coding, every possible difference value in the range (−255, 255) needs
to be taken as a different symbol. Binary codewords for these symbols can be
assigned by entropy coding, such as Huffman coding or arithmetic coding. However,
because it is usually very costly to assign a different codeword to 511 different
symbols, we usually assign a unique codeword to every difference value in the range
(−15, 16). In addition, codewords are assigned to a shift up (SU) symbol and a
shift down (SD) symbol, which shift a given difference value up or down by 32,
respectively. Using these codewords, every possible difference value can be uniquely
coded by using an appropriate number of SU or SD operators followed by a difference
code in the range (−15, 16). For example, a difference value of 100 can be represented
`
by cascading the codes for the symbols SD, SD, SD, and 4. This scheme results in a
slightly higher bitrate than does designing 511 different codes, but offers a significant
reduction in complexity. The probabilities of each of these symbols are estimated
by analyzing a histogram of the prediction errors obtained from a training set of