PRENTICE HALL SIGNAL PROCESSING SERIES
Digital Video Processing

A. Murat Tekalp
University of Rochester

For book and bookstore information:
http://www.prenhall.com

Prentice Hall PTR
Upper Saddle River, NJ 07458

Tekalp, A. Murat.
    Digital video processing / A. Murat Tekalp.
    p. cm. -- (Prentice-Hall signal processing series)
    ISBN 0-13-190075-7 (alk. paper)
    1. Digital video. I. Title. II. Series.
    TK6680.5.T45 1995
    621.388'33--dc20    95-16650
    CIP
`
Editorial/production supervision: Ann Sullivan
Cover design: Design Source
Manufacturing manager: Alexis R. Heydt
Acquisitions editor: Karen Gettman
Editorial assistant: Barbara Alfieri

Printed on Recycled Paper

© 1995 by Prentice Hall PTR
Prentice-Hall, Inc.
A Simon and Schuster Company
Upper Saddle River, NJ 07458
`
The publisher offers discounts on this book when ordered in bulk quantities.
For more information, contact:

    Corporate Sales Department
    Prentice Hall PTR
    One Lake Street
    Upper Saddle River, NJ 07458

    Phone: 800-382-3419
    Fax: 201-236-7141
    email: corpsales@prenhall.com

All rights reserved. No part of this book may be reproduced, in any form or by any means,
without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

ISBN: 0-13-190075-7

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
Chapter 18

LOSSLESS COMPRESSION
`
The need for effective data compression is evident in almost all applications where
storage and transmission of digital images are involved. For example, an 8.5 x 11 in
document scanned at 300 pixels/in with 1 bit/pixel generates 8.4 Mbits of data, which
without compression requires about 15 min transmission time over a 9600 baud line.
A 35 mm film scanned at 12 micron resolution results in a digital image of size 3656
pixels x 2664 lines. With 8 bits/pixel per color and three color channels, the storage
required per picture is approximately 233 Mbits. The storage capacity of a CD is
about 5 Gbits, which without compression can hold approximately 600 pages of a
document, or 21 color images scanned from 35 mm film. Several world standards
for image compression, such as ITU (formerly CCITT) Group 3 and 4 codes, and
ISO/IEC/CCITT JPEG, have recently been developed for efficient transmission
and storage of binary, gray-scale, and color images.

Compression of image data without significant degradation of the visual quality
is usually possible because images contain a high degree of i) spatial redundancy, due
to correlation between neighboring pixels, ii) spectral redundancy, due to correlation
among the color components, and iii) psychovisual redundancy, due to properties
of the human visual system. The higher the redundancy, the higher the achievable
compression. This chapter introduces the basics of image compression and discusses
some lossless compression methods. It is not intended as a formal review of the
related concepts, but rather aims to provide the minimum information necessary to
follow the popular still-image and video compression algorithms/standards, which
will be discussed in the subsequent chapters. Elements of an image compression
system as well as some information theoretic concepts are introduced in Section 18.1.
Section 18.2 discusses symbol coding, and in particular entropy coding, which is an
integral part of lossless compression methods. Finally, three commonly used lossless
compression algorithms are presented in Section 18.3.
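As a quick check on these figures, the following Python sketch reproduces the arithmetic above from the stated page size, scan resolution, line rate, and frame size; it is purely illustrative.

```python
# Back-of-the-envelope storage/transmission figures quoted in the text.

# Scanned binary document: 8.5 x 11 in at 300 pixels/in, 1 bit/pixel.
doc_bits = 8.5 * 300 * 11 * 300 * 1
print(f"Document: {doc_bits / 1e6:.1f} Mbits")            # ~8.4 Mbits
print(f"At 9600 baud: {doc_bits / 9600 / 60:.1f} min")    # ~15 min

# 35 mm film frame: 3656 x 2664 pixels, 8 bits/pixel, 3 color channels.
film_bits = 3656 * 2664 * 8 * 3
print(f"Film frame: {film_bits / 1e6:.0f} Mbits")         # close to the 233 Mbits quoted above

# A ~5 Gbit CD without compression.
cd_bits = 5e9
print(f"Pages per CD: {cd_bits / doc_bits:.0f}")          # roughly 600 pages
print(f"Film images per CD: {cd_bits / film_bits:.0f}")   # ~21 images
```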
`
`
18.1 Basics of Image Compression

In this section, we first present the elements of a general image compression system,
then summarize some results from information theory which provide bounds on
the achievable compression ratios and bitrates.
`
18.1.1 Elements of an Image Compression System

In information theory, the process of data compression by redundancy reduction is
referred to as source encoding. Images contain two types of redundancy, statistical
(spatial) and psychovisual. Statistical redundancy is present because certain spatial
patterns are more likely than others, whereas psychovisual redundancy originates
from the fact that the human eye is insensitive to certain spatial frequencies. The
block diagram of a source encoder is shown in Figure 18.1. It is composed of the
following blocks:
`
[Figure 18.1: Block diagram of an image compression system. The input image is passed through a transformer (T), a quantizer (Q), and a coder (C) to produce the binary bitstream.]
`
i) Transformer (T) applies a one-to-one transformation to the input image data.
The output of the transformer is an image representation which is more amenable
to efficient compression than the raw image data. Typical transformations are linear
predictive mapping, which maps the pixel intensities onto a prediction error signal
by subtracting the predictable part of the pixel intensities; unitary mappings such as
the discrete cosine transform, which pack the energy of the signal into a small number
of coefficients; and multiresolution mappings, such as subband decompositions and
the wavelet transform.
ii) Quantizer (Q) generates a limited number of symbols that can be used in the
representation of the compressed image. Quantization is a many-to-one mapping which
is irreversible. It can be performed by scalar or vector quantizers. Scalar quantization
refers to element-by-element quantization of the data, whereas quantization of
a block of data at once is known as vector quantization.
iii) Coder (C) assigns a codeword, a binary bitstream, to each symbol at the output
of the quantizer. The coder may employ fixed-length or variable-length codes.
Variable-length coding (VLC), also known as entropy coding, assigns codewords in
such a way as to minimize the average length of the binary representation of the
`
`
symbols. This is achieved by assigning shorter codewords to more probable symbols,
which is the fundamental principle of entropy coding.
Different image compression systems implement different combinations of these
choices. Image compression methods can be broadly classified as:
i) Lossless (noiseless) compression methods, which aim to minimize the bitrate
without any distortion in the image.
ii) Lossy compression methods, which aim to obtain the best possible fidelity for a
given bitrate, or to minimize the bitrate to achieve a given fidelity measure.
The transformation and encoding blocks are lossless. However, quantization is lossy.
Therefore, lossless methods, which only make use of the statistical redundancies, do
not employ a quantizer. In most practical cases a small degradation in the image
quality must be allowed to achieve the desired bitrate. Lossy compression methods
make use of both the statistical and psychovisual redundancies.
In the following, we first briefly review some results from information theory
which give bounds on the achievable bitrates in both lossless and lossy compression.
In Section 18.2, we present techniques for symbol coding (the third box in
Figure 18.1). The discussion of the second box, the quantizer, is deferred until the
next chapter. Finally, some commonly used lossless image compression methods are
presented in Section 18.3.
`
18.1.2 Information Theoretic Concepts
`
A source X with an alphabet A is defined as a discrete random process (a sequence
of random variables X_i, i = 1, 2, ...) in the form X = X_1 X_2 ..., where each random
variable X_i takes a value from the alphabet A. In the following, we assume that
the alphabet contains a finite number (M) of symbols, i.e., A = {a_1, a_2, ..., a_M}.
Here, we introduce two source models, a discrete memoryless source (DMS) and a
Markov-K source.
A DMS is such that successive symbols are statistically independent. It is
completely specified by the probabilities p(a_i) = p_i, i = 1, ..., M, such that
p_1 + ... + p_M = 1. In VLC, the optimum length of the binary code for a symbol
is equal to the information (in bits) that the symbol conveys. According to
information theory, the information content of a symbol is related to the extent
that the symbol is unpredictable or unexpected. If a symbol with low probability
occurs, a larger amount of information is transferred than in the occurrence of a
more likely symbol. This quantitative concept of surprise is formally expressed by
the relation

    I(a_i) = \log_2 (1/p(a_i)),   for a_i \in A                (18.1)

where I(a_i) is the amount of information that the symbol a_i with probability p(a_i)
carries. The unit of information is the bit when we use the logarithm with base 2. Observe
that if p = 1, then I = 0 as expected, and as p \to 0, I \to \infty. In practice,
the probability of occurrence of each symbol is estimated from the histogram of a
specific source, or a training set of sources.
`
`
`
`
The entropy H(X) of a DMS X with an alphabet A is defined as the average
information per symbol in the source, given by

    H(X) = \sum_{a \in A} p(a) \log_2 (1/p(a)) = -\sum_{a \in A} p(a) \log_2 p(a)        (18.2)

The more skewed the probability distribution of the symbols, the smaller the entropy
of the source. The entropy is maximized for a flat distribution, that is, when all
symbols are equally likely. It follows that a source where some symbols are more
likely than others has a smaller entropy than a source where all symbols are equally
likely.
`
Example: Entropy of raw image data

Suppose an 8-bit image is taken as a realization of a DMS X. The
symbols i are the gray levels of the pixels, and the alphabet A is the
collection of all gray levels between 0 and 255. Then the entropy of the
image is given by

    H(X) = -\sum_{i=0}^{255} p(i) \log_2 p(i)

where p(i) denotes the relative frequency of occurrence of the gray level
i in the image. Note that the entropy of an image consisting of a single
gray level (constant image) is zero.
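As an illustration, the entropy in (18.2) can be estimated directly from the gray-level histogram of an image. The Python sketch below does this; the constant and random test images are only stand-ins for real data.

```python
import numpy as np

def image_entropy(img):
    """Estimate the entropy (bits/pixel) of an 8-bit image from its histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    p = hist / hist.sum()                 # relative frequency of each gray level
    p = p[p > 0]                          # skip gray levels that never occur
    return -np.sum(p * np.log2(p))        # H = -sum p(i) log2 p(i)

# A constant image has zero entropy; a uniform random image approaches 8 bits/pixel.
constant = np.full((64, 64), 128, dtype=np.uint8)
random_img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(image_entropy(constant))    # 0.0
print(image_entropy(random_img))  # close to 8
```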
`
Most realistic sources can be better modeled by Markov-K random processes.
That is, the probability of occurrence of a symbol depends on the values of the K preceding
symbols. A Markov-K source can be specified by the conditional probabilities
p(X_j = a_i | X_{j-1}, ..., X_{j-K}), for all j, a_i \in A. The entropy of a Markov-K source
is defined as

    H(X) = \sum_{S_K} p(X_{j-1}, ..., X_{j-K}) H(X | X_{j-1}, ..., X_{j-K})        (18.3)

where S_K denotes all possible realizations of X_{j-1}, ..., X_{j-K}, and

    H(X | X_{j-1}, ..., X_{j-K}) = -\sum_{a_i \in A} p(a_i | X_{j-1}, ..., X_{j-K}) \log p(a_i | X_{j-1}, ..., X_{j-K})

In the following, we present two fundamental theorems, the Lossless Coding Theorem
and the Source Coding Theorem, which are used to measure the performance
of lossless coding and lossy coding systems, respectively.
`
`
Lossless Coding Theorem: [Shannon, 1948] The minimum bitrate that
can be achieved by lossless coding of a discrete memoryless source X is
given by

    min{R} = H(X) + \epsilon   bits/symbol                (18.4)

where R is the transmission rate, H(X) is the entropy of the source, and
\epsilon is a positive quantity that can be made arbitrarily close to zero.
`
The Lossless Coding Theorem establishes the lower bound for the bitrate necessary
to achieve zero coding-decoding error. In the case of a DMS, we can approach
this bound by encoding each symbol independently. For sources with memory, we
need to encode blocks of N source symbols at a time to come arbitrarily close to
the bound. For example, a Markov-K source should be encoded K symbols at a
time. In the next section, we introduce two coding techniques, Huffman coding and
arithmetic coding, that approach the entropy bound.
In lossy coding schemes, the achievable minimum bitrate is a function of the
distortion that is allowed. This relationship between the bitrate and distortion is
given by the rate distortion function [Ber 71].
`
Source Coding Theorem: There exists a mapping from the source symbols
to codewords such that for a given distortion D, R(D) bits/symbol
are sufficient to enable source reconstruction with an average distortion
that is arbitrarily close to D. The actual rate R should obey

    R \geq R(D)                (18.5)

for the fidelity level D. The function R(D) is called the rate-distortion
function. Note that R(0) = H(X).
`
[Figure 18.2: Rate distortion function. The rate R decreases from H(X) at D = 0 to zero at D = Dmax.]
`
`
`
`
`
`
A typical rate-distortion function is depicted in Figure 18.2. The rate distortion
function can be computed analytically for simple source and distortion models.
Computer algorithms exist to compute R(D) when analytical methods fail or are
impractical [Ber 71]. In general, we are interested in designing a compression system
to achieve either the lowest bitrate for a given distortion or the lowest distortion at
a given bitrate. Note that the source coding theorem does not state how to design
algorithms to achieve these desired limits. Some well-known lossy coding algorithms
are discussed in Chapters 19, 20, and 21.
`
18.2 Symbol Coding

Symbol coding is the process of assigning a bit string to individual symbols or to
a block of symbols comprising the source. The simplest scheme is to assign equal-length
codewords to individual symbols or a fixed-length block of symbols, which
is known as fixed-length coding. Because compression is generally achieved by
assigning shorter-length codewords to more probable symbols, we next describe two
variable-length coding, also known as entropy coding, schemes. The first, Huffman
coding, assigns variable-length codes to a fixed-length block of symbols, where the
block length can be one. In Huffman coding, the length of the codewords is proportional
to the information (in bits) of the respective symbols or block of symbols. The
second, arithmetic coding, assigns variable-length codes to a variable-length block of
symbols.
`
18.2.1 Fixed-Length Coding

In fixed-length coding, we assign equal-length codewords to each symbol in
the alphabet A regardless of their probabilities. If the alphabet has M different
symbols (or blocks of symbols), then the length of the codewords is the smallest
integer greater than or equal to \log_2 M. Two commonly used fixed-length coding schemes are
natural codes and Gray codes, which are shown in Table 18.1 for the case of a
four-symbol source. Notice that in Gray coding, the consecutive codewords differ in
only one bit position. This property of the Gray codes may provide an advantage in
error detection. We will see in Section 18.3 that Gray codes are also better suited
for run-length encoding of bit-planes.
`
`
Table 18.1: Fixed-length codes for a four-symbol alphabet.

    Symbol    Natural code    Gray code
    a1        00              00
    a2        01              01
    a3        10              11
    a4        11              10
`
`
`
`
It can easily be shown that fixed-length coding is optimal only when:

1) the number of symbols is equal to a power of 2, and
2) all the symbols are equiprobable.

Only then would the entropy of the source be equal to the average length of the
codewords, which is equal to the length of each codeword in the case of fixed-length
coding. For the example shown in Table 18.1, both the entropy of the source and the
average codeword length are 2 bits, assuming all symbols are equally likely. Most often,
some symbols are more probable than others, in which case it would be more advantageous
to use entropy coding. Actually, the goal of the transformation box in Figure 18.1
is to obtain a set of symbols with a skewed probability distribution, to minimize the
entropy of the transformed source.
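For concreteness, the following Python sketch generates the natural and binary-reflected Gray codewords of Table 18.1 for an M-symbol alphabet; the helper names are illustrative only.

```python
def natural_codes(m):
    """Fixed-length natural binary codewords for m symbols."""
    length = max(1, (m - 1).bit_length())        # smallest L with 2**L >= m
    return [format(i, f"0{length}b") for i in range(m)]

def gray_codes(m):
    """Fixed-length binary-reflected Gray codewords for m symbols."""
    length = max(1, (m - 1).bit_length())
    return [format(i ^ (i >> 1), f"0{length}b") for i in range(m)]

# Four-symbol alphabet, as in Table 18.1: consecutive Gray codewords
# differ in exactly one bit position.
print(natural_codes(4))  # ['00', '01', '10', '11']
print(gray_codes(4))     # ['00', '01', '11', '10']
```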
`
18.2.2 Huffman Coding

Huffman coding yields the optimal integer prefix codes given a source with a finite
number of symbols and their probabilities. In prefix codes, no codeword is a prefix
of another codeword. Such codes are uniquely decodable since a given binary string
can only be interpreted in one way. Huffman codes are optimal in the sense that no
other integer-length VLC can be found to yield a smaller average bitrate. In fact,
the average length of Huffman codes per codeword achieves the lower bound, the
entropy of the source, when the symbol probabilities are all powers of 2.
Huffman codes can be designed by following a very simple procedure. Let X
denote a DMS with the alphabet A and the symbol probabilities p(a_i), a_i \in A,
i = 1, ..., M. Obviously, if M = 2, we must have
`
    c(a_1) = 0    and    c(a_2) = 1                (18.6)

where c(a_i) denotes the codeword for the symbol a_i, i = 1, 2. If A has more than
two symbols, the Huffman procedure requires a series of source reduction steps. In
each step, we find and merge the two symbols with the smallest probabilities, which
results in a new source with a reduced alphabet. The probability of the new symbol
in the reduced alphabet is the sum of the probabilities of the two "merged" symbols
from the previous alphabet. This procedure is continued until we reach a source
with only two symbols, for which the codeword assignment is given by (18.6). Then
we work backwards towards the original source, each time splitting the codeword
of the "merged" symbol into two new codewords by appending it with a zero and
one, respectively. The following examples demonstrate this procedure.
`
Example: Symbol probabilities are powers of 2

Let the alphabet A consist of four symbols, shown in Table 18.2. The
probabilities and the information content of the symbols in the alphabet
are also listed in the table. Note that all symbol probabilities are powers
of 2, and consequently the symbols have integer information values.
`
`
Table 18.2: An alphabet where the symbol probabilities are powers of 2.

    Symbol    Probability    Information
    a1        0.50           1 bit
    a2        0.25           2 bits
    a3        0.125          3 bits
    a4        0.125          3 bits
`
The Huffman coding procedure is demonstrated for this alphabet in
Table 18.3. The reduced alphabet in Step 1 is obtained by merging
the symbols a3 and a4 in the original alphabet, which have the lowest
two probabilities. Likewise, the reduced alphabet in Step 2 is obtained
by merging the two symbols with the lowest probabilities after Step 1.
Since the reduced alphabet in Step 2 has only two symbols, we assign
the codes 0 and 1 to these symbols in arbitrary order. Next, we assign
codes to the reduced alphabet in Step 1. We recall that the symbol 2
in Step 2 is obtained by merging the symbols 2 and 3 in Step 1. Thus,
we assign codes to symbols 2 and 3 in Step 1 by appending the code for
symbol 2 in Step 2 by a zero and one in arbitrary order. The appended
zero and one are shown in bold in Table 18.3. Finally, the codes
for the original alphabet are obtained in a similar fashion.
`
`
Table 18.3: Illustration of alphabet reduction.

    Original Alphabet          Reduced Alphabet (Step 1)    Reduced Alphabet (Step 2)
    Symbol  Prob.   Code       Symbol  Prob.   Code         Symbol  Prob.   Code
    1       0.50    0          1       0.50    0            1       0.50    0
    2       0.25    10         2       0.25    10           2       0.50    1
    3       0.125   110        3       0.25    11
    4       0.125   111
`
This procedure can alternatively be described by the tree diagram shown
in Figure 18.3.

Observe that in this case, the average codeword length is

    R = 0.5 \times 1 + 0.25 \times 2 + 0.125 \times 3 + 0.125 \times 3 = 1.75

and the entropy of the source is

    H = -0.5 \log_2 0.5 - 0.25 \log_2 0.25 - 0.125 \log_2 0.125 - 0.125 \log_2 0.125 = 1.75

which is consistent with the result that Huffman coding achieves the
entropy of the source when the symbol probabilities are powers of 2.
`
`
[Figure 18.3: Tree diagram for Huffman coding: leaves a1 (p = 0.5), a2 (p = 0.25), a3 (p = 0.125), and a4 (p = 0.125).]
`
Next, we present an example in which the symbol probabilities are not
powers of 2.

Example 2: Symbol probabilities are not powers of 2

The information content of each symbol is a real number, as shown in
Table 18.4, when the probabilities of the symbols are not powers of 2.
`
`
`
`
`
Table 18.4: An alphabet with arbitrary symbol probabilities.

    Symbol    Probability    Information
    a1        0.40           1.32 bits
    a2        0.25           2.00 bits
    a3        0.15           2.73 bits
    a4        0.15           2.73 bits
    a5        0.05           4.32 bits
`
Since the length of each codeword must be an integer, it is not possible
to design codewords whose lengths are equal to the information of the
respective symbols in this case. Huffman code design for the alphabet
in Table 18.4 is shown in Table 18.5. It can easily be seen that for this
example the average length of the codewords is 2.15 bits and the entropy of the
source is 2.07 bits.

Notice that Huffman codes are uniquely decodable, with proper synchronization,
because no codeword is a prefix of another. For example,
a received binary string

    001101101110000...

can be decoded uniquely as

    a3 a1 a2 a1 a2 a1 a1 a4 ...
`
`
Table 18.5: Huffman coding when probabilities are not powers of 2.

    Symbol    Probability    Code
    a1        0.40           1
    a2        0.25           01
    a3        0.15           001
    a4        0.15           0000
    a5        0.05           0001
Huffman coding can also be used as a block coding scheme where we assign
codewords to combinations of L symbols from the original alphabet at a time. Of
course, this requires building a new block alphabet with all possible combinations
of the L symbols from the original alphabet and computing their respective probabilities.
Huffman codes for all possible combinations of the L symbols from the
original alphabet can be formed using the above design procedure with the new
block alphabet. Thus, Huffman coding is a block coding scheme, where we assign
variable-length codes to fixed-length (L) blocks of symbols. The case L = 1 refers to
assigning an individual codeword to each symbol of the original alphabet, as shown
in the above two examples. It has been shown that for sources with memory, the
coding efficiency improves as L gets larger, although the design of Huffman codes
gets more complicated.
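For concreteness, the following Python sketch implements the source reduction procedure with a heap and builds a code for the alphabet of Table 18.4. Because the 0/1 assignments and the tie-break between the two equally probable symbols are arbitrary, the individual codewords may differ from Table 18.5, but the result is a valid prefix code with the same 2.15 bits/symbol average length.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code. probs: dict mapping symbol -> probability."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        # Prepend 0 to one group's codewords and 1 to the other's (arbitrary order).
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a1": 0.40, "a2": 0.25, "a3": 0.15, "a4": 0.15, "a5": 0.05}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)                                            # one valid set of prefix codewords
print(f"average length = {avg_len:.2f} bits/symbol")   # 2.15
```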
`
18.2.3 Arithmetic Coding

In arithmetic coding a one-to-one correspondence between the symbols of an alphabet
A and the codewords does not exist. Instead, arithmetic coding assigns a single
variable-length code to a source X, composed of N symbols, where N is variable.
The distinction between arithmetic coding and block Huffman coding is that in
arithmetic coding the length of the input sequence, i.e., the block of symbols for
which a single codeword is assigned, is variable. Thus, arithmetic coding assigns
variable-length codewords to variable-length blocks of symbols. Because arithmetic
coding does not require assignment of integer-length codes to fixed-length blocks
of symbols, in theory it can achieve the lower bound established by the noiseless
coding theorem.
Arithmetic coding associates a given realization of X, x = {x_1, ..., x_N}, with a
subinterval of [0, 1) whose length equals the probability of the sequence, p(x). The
encoder processes the input stream of symbols one by one, starting with N = 1,
where the length of the subinterval associated with the sequence gets smaller as
N increases. Bits are sequentially sent to the channel, starting from the most significant
bit towards the least significant bit, as they are determined according to
the procedure presented in algorithmic form in the following. At the
end of the transmission, the transmitted bitstream is a uniquely decodable
`
codeword representing the source, which is a binary number pointing to the subinterval
associated with this sequence.
`
The Procedure

Consider an alphabet A that has M symbols a_i, i = 1, ..., M, with the probabilities
p(a_i) = p_i, such that p_1 + ... + p_M = 1. The procedure starts with assigning
each individual symbol in the alphabet a subinterval, within 0 to 1, whose length is
equal to its probability. It is assumed that this assignment is known to the decoder.
Let P_i = p_1 + ... + p_i denote the cumulative probability, with P_0 = 0.

1. If the first input symbol x_1 = a_i, i = 1, ..., M, then define the initial
subinterval as I_1 = [l_1, r_1) = [P_{i-1}, P_{i-1} + p_i). Set n = 1, L = l_1,
R = r_1, and d = r_1 - l_1.

2. Obtain the binary expansions of L and R as

    L = \sum_{k=1}^{\infty} u_k 2^{-k}    and    R = \sum_{k=1}^{\infty} v_k 2^{-k}

where u_k and v_k are 0 or 1.
Compare u_1 and v_1. If they are not the same, send nothing to the channel at
this time, and go to step 3. If u_1 = v_1, then send the binary symbol u_1,
and compare u_2 and v_2. If they are not the same, go to step 3.
If u_2 = v_2, also send the binary symbol u_2, and compare u_3 and v_3, and so on,
until the next two corresponding binary symbols do not match, at which time go
to step 3.

3. Increment n, and read the next symbol. If the nth input symbol x_n = a_i,
then subdivide the interval from the previous step as

    I_n = [l_n, r_n) = [l_{n-1} + P_{i-1} d, l_{n-1} + (P_{i-1} + p_i) d).

Set L = l_n, R = r_n, and d = r_n - l_n, and go to step 2.

Note that the decoder may decode one binary symbol into several source symbols,
or it may require several binary symbols before it can decode one or more
source symbols. The arithmetic coding procedure is illustrated by means of the
following example.
`
Example

Suppose we wish to determine an arithmetic code to represent a sequence
of symbols,

    a2 a1 a3 ...

from the source shown in Table 18.2. Because we have four symbols
in the alphabet, the interval from 0 to 1 is initially subdivided into 4,
where the lengths of the subintervals are equal to 0.5, 0.25, 0.125, and
0.125, respectively. This is depicted in Figure 18.4.
`
`
[Figure 18.4: Illustration of the concept of arithmetic coding. The unit interval is divided into [0, 0.5), [0.5, 0.75), [0.75, 0.875), and [0.875, 1.0) for a1, a2, a3, and a4; the subinterval for a2 is then divided into [0.5, 0.625), [0.625, 0.6875), [0.6875, 0.71875), and [0.71875, 0.75) for a2a1 through a2a4; and the subinterval for a2a1 is divided into [0.5, 0.5625), [0.5625, 0.59375), [0.59375, 0.609375), and [0.609375, 0.625) for a2a1a1 through a2a1a4. Both decimal and binary boundary values are shown.]
`
The first symbol defines the initial interval as I_1 = [0.5, 0.75), where the
binary representations of the left and right boundaries are L = 2^{-1} = 0.1
and R = 2^{-1} + 2^{-2} = 0.11, respectively. According to step 2,
u_1 = v_1 = 1; thus, 1 is sent to the channel. Noting that u_2 = 0
and v_2 = 1, we read the second symbol, a1. Step 3 indicates that
I_2 = [0.5, 0.625), with L = 0.10 and R = 0.101. Now that u_2 = v_2 = 0,
we send 0 to the channel. However, u_3 = 0 and v_3 = 1, so we read the
third symbol, a3. It can be easily seen that I_3 = [0.59375, 0.609375),
with L = 0.10011 and R = 0.100111. Note that u_3 = v_3 = 0,
u_4 = v_4 = 1, and u_5 = v_5 = 1, but u_6 = 0 and v_6 = 1. At this stage, we
send 011 to the channel, and read the next symbol. A reserved symbol
usually signals the end of a sequence.
`
Let's now briefly look at how the decoder operates, which is illustrated
in Figure 18.5. The first bit restricts the interval to [0.5, 1). However,
`
`
three symbols are within this range; thus, the first bit does not contain
sufficient information. After receiving the second bit, we have 10, which
points to the interval [0.5, 0.75). All possible combinations of two symbols
pointing to this range start with a2. Hence, we can now decode
the first symbol as a2. The information that becomes available after the
receipt of each bit is summarized in Figure 18.5.

Figure 18.5: The operation of the decoder.

    Received bit    Interval                Symbol
    1               [0.5, 1)                -
    0               [0.5, 0.75)             a2
    0               [0.5, 0.609375)         a1
    1               [0.5625, 0.609375)      -
    1               [0.59375, 0.609375)     a3
`
In practice, two factors cause the performance of the arithmetic encoder to fall
short of the theoretical bound: the addition of an end-of-message indicator, and
the use of finite-precision arithmetic. Practical implementations of the arithmetic
coder overcome the precision problem by a scaling and rounding strategy.
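A minimal Python sketch of the interval-subdivision idea is given below. It uses exact rational arithmetic (the standard fractions module) to sidestep the precision issue, and it reports the bits shared by the binary expansions of the final interval boundaries, which are exactly the bits the incremental procedure above would have sent for the example sequence a2 a1 a3. It illustrates the mechanics only and is not a practical coder.

```python
from fractions import Fraction

# Alphabet of Table 18.2.
probs = {"a1": Fraction(1, 2), "a2": Fraction(1, 4),
         "a3": Fraction(1, 8), "a4": Fraction(1, 8)}

def cumulative(probs):
    """Map each symbol to its subinterval [P_{i-1}, P_{i-1} + p_i) of [0, 1)."""
    intervals, acc = {}, Fraction(0)
    for s, p in probs.items():
        intervals[s] = (acc, acc + p)
        acc += p
    return intervals

def common_bits(low, high, max_bits=64):
    """Leading binary digits shared by the expansions of low and high (both in [0, 1))."""
    bits = ""
    for _ in range(max_bits):
        lb, hb = int(low * 2), int(high * 2)       # next binary digit of each bound
        if lb != hb:
            break
        bits += str(lb)
        low, high = low * 2 - lb, high * 2 - hb
    return bits

def arithmetic_encode(symbols, probs):
    """Return the subinterval for the sequence and the bits determined so far."""
    intervals = cumulative(probs)
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        d = high - low
        s_low, s_high = intervals[s]
        low, high = low + s_low * d, low + s_high * d
    return low, high, common_bits(low, high)

low, high, bits = arithmetic_encode(["a2", "a1", "a3"], probs)
print(low, high)   # 19/32 39/64, i.e. [0.59375, 0.609375)
print(bits)        # 10011, the bits sent after three symbols in the example above
```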
`
18.3 Lossless Compression Methods

Error-free coding is the only acceptable means of compression in some applications
for various reasons. For example, in the transmission or archival of medical images
lossy compression is not allowed for legal reasons. Recalling the elements of
a compression system, lossless coding schemes do not employ a quantizer. They
consist of a transformation, which generates symbols whose probability distribution
is highly peaked, followed by an entropy coder. The transformation aims to minimize
the entropy of its output, so that significant compression becomes possible by
variable-length coding of the generated symbols.
In this section, we present three popular methods for lossless compression:
i) Lossless predictive coding, where an integer predictive mapping is employed, followed
by entropy coding of the integer prediction errors.
ii) Run-length coding of bit-planes, where the image is decomposed into individual
bit-planes (binary images), and the run-lengths of zeros and ones in these planes are
entropy coded.
iii) Ziv-Lempel coding, which is a deterministic coding procedure, where the input
bit string is parsed into blocks of variable length to form a dictionary of blocks
(symbols), each of which is represented by a fixed-length codeword.
The achievable compression ratio using lossless coding methods ranges from 2:1 to 5:1,
depending on the characteristics of the input image.
`
18.3.1 Lossless Predictive Coding

The first step in lossless predictive coding is to form an integer-valued prediction
of the next pixel intensity to be encoded based on a set of previously encoded
neighboring pixels. Then the difference between the actual intensity of the pixel
and its prediction is entropy coded. Assuming each pixel is integer-valued, the
prediction errors are also integer-valued, which facilitates lossless compression. The
block diagram of a simple predictor is shown in Figure 18.6, where the prediction
is taken as the intensity of the previously encoded pixel. Some other commonly used
integer predictors are shown in the following example.
`
`
[Figure 18.6: Block diagram of a) an encoder, and b) a decoder using a simple predictor. The encoder subtracts the previous sample from the current sample to form the residual; the decoder adds the residual to the previous sample to obtain the reconstructed sample.]
`
Example: Integer prediction

In order for the decoder to be able to duplicate the prediction step, the
prediction operation must be based on already-encoded pixels. In 2-D,
such prediction models are called recursively computable. The support
of a recursively computable predictor is shown in Figure 18.7, where the
coefficients a, b, c, and d denote the intensities of the respective pixels.
Two simple predictors based on this model can be written as

    \hat{x} = int{(a + c)/2}                (18.7)

[Figure 18.7: Prediction schemes.]
`
`
    \hat{x} = int{(a + b + c + d)/4}                (18.8)

where \hat{x} denotes the predicted value for the pixel x. In both expressions,
the prediction is rounded to the nearest integer to ensure an integer
prediction error. We note that many other forms of prediction, such as
edge-adaptive prediction, also exist.

If the input image intensity x has a dynamic range of (0,255), then the prediction
error x - \hat{x} has a theoretical dynamic range of (-255,255). The reader may have
noticed that the predictive mapping in fact results in an expansion of the dynamic
range of the signal to be encoded. However, inspection of the histogram (probability
density) of the prediction error shows that it is highly peaked about 0, as compared
to the histogram of the actual image intensities, as shown in Figure 18.8. Therefore,
the prediction error always has much smaller entropy than the original intensity
values, which implies that the prediction process removes a great deal of interpixel
(statistical) redundancy.
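This entropy reduction is easy to verify numerically. The Python sketch below is our own illustration: it uses the simple previous-pixel predictor of Figure 18.6 along each row and compares the entropy of a synthetic image with that of its integer prediction errors.

```python
import numpy as np

def entropy(values):
    """Entropy in bits/sample of an integer-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def prediction_errors(img):
    """Integer prediction errors using the previous pixel in each row as the prediction.
    The first pixel of each row is predicted as 0 (i.e., sent as-is)."""
    img = img.astype(np.int16)             # allow negative differences
    pred = np.zeros_like(img)
    pred[:, 1:] = img[:, :-1]
    return img - pred

# A smooth synthetic ramp image stands in for real data.
x, y = np.meshgrid(np.arange(256), np.arange(256))
img = ((x + y) // 2).astype(np.uint8)

err = prediction_errors(img)
print(f"image entropy:            {entropy(img):.2f} bits/pixel")
print(f"prediction error entropy: {entropy(err):.2f} bits/pixel")  # much smaller
```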
`
[Figure 18.8: Histograms (relative frequency) of a) the original image intensity, spread over (0, 255), and b) the integer prediction error, sharply peaked about 0.]
`
For lossless coding, every possible difference value in the range (-255,255) needs
to be taken as a different symbol. Binary codewords for these symbols can be
assigned by entropy coding, such as Huffman coding or arithmetic coding. However,
because it is usually very costly to assign a different codeword to 511 different
symbols, we usually assign a unique codeword to every difference value in the range
(-15,16). In addition, codewords are assigned to a shift up (SU) symbol and a
shift down (SD) symbol, which shift a given difference value up or down by 32,
respectively. Using these codewords, every possible difference value can be uniquely
coded by using an appropriate number of SU or SD operators followed by the
codeword of the remaining difference value in the range (-15,16).
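One possible realization of this mapping is sketched below in Python; the direction convention for SU and SD and the helper names are assumptions for illustration, since the text only outlines the scheme.

```python
def difference_symbols(diff):
    """Map a prediction error in (-255, 255) to shift symbols plus a residual value.
    Each 'SU'/'SD' shifts the residual up/down by 32 until it falls in -15..16."""
    symbols = []
    while diff > 16:
        symbols.append("SU")     # decoder adds 32 for every SU
        diff -= 32
    while diff < -15:
        symbols.append("SD")     # decoder subtracts 32 for every SD
        diff += 32
    symbols.append(diff)         # residual value, coded with its own codeword
    return symbols

def decode_symbols(symbols):
    """Invert difference_symbols."""
    value = symbols[-1]
    return value + 32 * symbols.count("SU") - 32 * symbols.count("SD")

for d in (5, 40, -100, 255):
    s = difference_symbols(d)
    assert decode_symbols(s) == d
    print(d, s)    # e.g. 40 -> ['SU', 8]
```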