`
`32777 009559381
`
`I
`
`1
`
`11 ON1__
`
`1 11 I
`
`V
`
`w
`
`Ill
`
`iti
`
`I
`
`ffoe
`
`•
`
`•
`
`•
`
`•
`
`V
`
`•
`
`•
`
`•
`
`11111:1171114d
`
`r
`
`PRENTICE HALL SIGNAL PROCESSING SERIES
`
`Comcast - Exhibit 1018, page 1
`
`
`
`
`Digital Video Processing
`
A. Murat Tekalp
`University of Rochester
`
For book and bookstore information
`
`http://www.prenhall.com
`
`Prentice Hall PTR
`Upper Saddle River, NJ 07458
`
`
`Tekalp, A. Murat.
`Digital video processing / A. Murat Tekalp.
`p. cm. -- (Prentice-Hall signal processing series)
`ISBN 0-13-190075-7 (alk. paper)
`1. Digital video. I. Title. II. Series.
`TK6680.5.T45 1995
`621.388'33--dc20
`
`95-16650
`CIP
`
`Editorial/production supervision: Ann Sullivan
`Cover design: Design Source
`Manufacturing manager: Alexis R. Heydt
`Acquisitions editor: Karen Gettman
`Editorial assistant: Barbara Alfieri
`
`
`Printed on Recycled Paper
`
`©1995 by Prentice Hall PTR
`Prentice-Hall, Inc.
`A Simon and Schuster Company
`Upper Saddle River, NJ 07458
`
`The publisher offers discounts on this book when ordered in bulk quantities.
`
`For more information, contact:
`
`Corporate Sales Department
`Prentice Hall PTR
`One Lake Street
`Upper Saddle River, NJ 07458
`
`Phone: 800-382-3419
`Fax: 201-236-7141
`
`email: corpsales@prenhall.com
`
`All rights reserved. No part of this book may be reproduced, in any form or by any means,
`without permission in writing from the publisher.
`
`Printed in the United States of America
`
`10 9 8 7 6 5 4 3 2 1
`
`ISBN: 0-13-190075-7
`
`Prentice-Hall International (UK) Limited, London
`Prentice-Hall of Australia Pty. Limited, Sydney
`Prentice-Hall Canada Inc., Toronto
`Prentice-Hall Hispanoamericana, S.A., Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Simon & Schuster Asia Pte. Ltd., Singapore
`Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
`
`
`
`Chapter 18
`
`LOSSLESS
`COMPRESSION
`
`The need for effective data compression is evident in almost all applications where
`storage and transmission of digital images are involved. For example, an 8.5 x 11 in
document scanned at 300 pixels/in with 1 bit/pixel generates 8.4 Mbits of data, which
`without compression requires about 15 min transmission time over a 9600 baud line.
`A 35 mm film scanned at 12 micron resolution results in a digital image of size 3656
`pixels x 2664 lines. With 8 bits/pixel per color and three color channels, the storage
`required per picture is approximately 233 Mbits. The storage capacity of a CD is
`about 5 Gbits, which without compression can hold approximately 600 pages of a
`document, or 21 color images scanned from 35 mm film. Several world standards
`for image compression, such as ITU (formerly CCITT) Group 3 and 4 codes, and
`ISO/IEC/CCITT JPEG, have recently been developed for efficient transmission
`and storage of binary, gray-scale, and color images.
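The storage and transmission figures quoted above follow from simple arithmetic; as a quick sanity check (a sketch of my own, using only the numbers given in the text):

```python
# Back-of-the-envelope checks for the figures quoted above.
doc_bits = 8.5 * 11 * 300 * 300 * 1    # scanned page, 300 pixels/in, 1 bit/pixel
print(doc_bits / 1e6)                  # ~8.4 Mbits
print(doc_bits / 9600 / 60)            # ~14.6 minutes over a 9600 baud line

film_bits = 3656 * 2664 * 8 * 3        # 35 mm scan, 8 bits/pixel, 3 color channels
print(film_bits / 1e6)                 # ~233.8 Mbits

cd_bits = 5e9                          # ~5 Gbits CD capacity
print(cd_bits / doc_bits)              # ~594 document pages per CD
print(cd_bits / film_bits)             # ~21 film images per CD
```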
`Compression of image data without significant degradation of the visual quality
`is usually possible because images contain a high degree of i) spatial redundancy, due
`to correlation between neighboring pixels, ii) spectral redundancy, due to correlation
`among the color components, and iii) psychovisual redundancy, due to properties
`of the human visual system. The higher the redundancy, the higher the achievable
`compression. This chapter introduces the basics of image compression, and discusses
`some lossless compression methods. It is not intended as a formal review of the
`related concepts, but rather aims to provide the minimum information necessary to
`follow the popular still-image and video compression algorithms/standards, which
`will be discussed in the subsequent chapters. Elements of an image compression
`system as well as some information theoretic concepts are introduced in Section 18.1.
`Section 18.2 discusses symbol coding, and in particular entropy coding, which is an
`integral part of lossless compression methods. Finally, three commonly used lossless
`compression algorithms are presented in Section 18.3.
`
`348
`
`Comcast - Exhibit 1018, page 4
`
`Comcast - Exhibit 1018, page 4
`
`
`
`
`18.1 Basics of Image Compression
`
`In this section, we first present the elements of a general image compression system,
then summarize some results from information theory which provide bounds on
`the achievable compression ratios and bitrates.
`
`18.1.1 Elements of an Image Compression System
`
`In information theory, the process of data compression by redundancy reduction is
`referred to as source encoding. Images contain two types of redundancy, statistical
`(spatial) and pyschovisual. Statistical redundancy is present because certain spatial
`patterns are more likely than others, whereas psychovisual redundancy originates
`from the fact that the human eye is insensitive to certain spatial frequencies. The
`block diagram of a source encoder is shown in Figure 18.1. It is composed of the
`following blocks:
`
`
`
[Figure: input image → Transformer (T) → Quantizer (Q) → Coder (C) → binary bitstream; the quantizer output is the symbol stream.]

Figure 18.1: Block diagram of an image compression system.
`
`i) Transformer (T) applies a one-to-one transformation to the input image data.
`The output of the transformer is an image representation which is more amenable
`to efficient compression than the raw image data. Typical transformations are linear
predictive mapping, which maps the pixel intensities onto a prediction error signal
by subtracting the predictable part of the pixel intensities; unitary mappings such as
`the discrete cosine transform, which pack the energy of the signal to a small number
`of coefficients; and multiresolution mappings, such as subband decompositions and
`the wavelet transform.
ii) Quantizer (Q) generates a limited number of symbols that can be used in the
representation of the compressed image. Quantization is a many-to-one mapping which
`is irreversible. It can be performed by scalar or vector quantizers. Scalar quantiza-
`tion refers to element-by-element quantization of the data, whereas quantization of
`a block of data at once is known as vector quantization.
`iii) Coder (C) assigns a codeword, a binary bitstream, to each symbol at the output
`of the quantizer. The coder may employ fixed-length or variable-length codes.
`Variable-length coding (VLC), also known as entropy coding, assigns codewords in
`such a way as to minimize the average length of the binary representation of the
`
`
`
`
`350
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`symbols. This is achieved by assigning shorter codewords to more probable symbols,
`which is the fundamental principle of entropy coding.
`Different image compression systems implement different combinations of these
`choices. Image compression methods can be broadly classified as:
`i) Lossless (noiseless) compression methods, which aim to minimize the bitrate
`without any distortion in the image.
`ii) Lossy compression methods, which aim to obtain the best possible fidelity for a
`given bitrate, or to minimize the bitrate to achieve a given fidelity measure.
The transformation and encoding blocks are lossless. However, quantization is lossy.
Therefore, lossless methods, which only make use of the statistical redundancies, do
`not employ a quantizer. In most practical cases a small degradation in the image
`quality must be allowed to achieve the desired bitrate. Lossy compression methods
`make use of both the statistical and psychovisual redundancies.
In the following, we first briefly review some results from information theory
which give bounds on the achievable bitrates in both lossless and lossy compression.
In Section 18.2, we present techniques for symbol coding (the third box in
`Figure 18.1). The discussion of the second box, the quantizer, is deferred until the
`next chapter. Finally, some commonly used lossless image compression methods are
`presented in Section 18.3.
`
18.1.2 Information Theoretic Concepts
`A source X with an alphabet A is defined as a discrete random process (a sequence
of random variables Xi, i = 1, 2, ...) in the form X = X1 X2 ..., where each random
variable Xi takes a value from the alphabet A. In the following, we assume that
the alphabet contains a finite number (M) of symbols, i.e., A = {a1, a2, ..., aM}.
`Here, we introduce two source models, a discrete memoryless source (DMS) and a
`Markov-K source.
A DMS is such that successive symbols are statistically independent. It is
completely specified by the probabilities p(ai) = pi, i = 1, ..., M, such that
p1 + ... + pM = 1. In VLC, the optimum length of the binary code for a symbol
is equal to the information (in bits) that the symbol conveys. According to
`information theory, the information content of a symbol is related to the extent
`that the symbol is unpredictable or unexpected. If a symbol with low probability
occurs, a larger amount of information is transferred than in the occurrence of a
`more likely symbol. This quantitative concept of surprise is formally expressed by
`the relation
`
I(ai) = log2 (1/p(ai)),   for ai ∈ A                                 (18.1)
`
where I(ai) is the amount of information that the symbol ai with probability p(ai)
carries. The unit of information is the bit when the logarithm is taken in base 2.
Observe that if p = 1, then I = 0 as expected, and as p → 0, I → ∞. In practice,
`the probability of occurrence of each symbol is estimated from the histogram of a
`specific source, or a training set of sources.
`
`
`
`
`
The entropy H(X) of a DMS X with an alphabet A is defined as the average
information per symbol in the source, given by

H(X) = Σ_{a∈A} p(a) log2 (1/p(a)) = − Σ_{a∈A} p(a) log2 p(a)         (18.2)
`
`The more skewed the probability distribution of the symbols, the smaller the entropy
`of the source. The entropy is maximized for a flat distribution, that is, when all
`symbols are equally likely. It follows that a source where some symbols are more
`likely than others has a smaller entropy than a source where all symbols are equally
`likely.
`
`Example: Entropy of raw image data
Suppose an 8-bit image is taken as a realization of a DMS X. The
symbols are the gray levels of the pixels, and the alphabet A is the
`collection of all gray levels between 0 and 255. Then the entropy of the
`image is given by
`
`255
`
`H(X) = — E p(i) log2 p(i)
`i=o
`
`where p(i) denotes the relative frequency of occurrence of the gray level
`i in the image. Note that the entropy of an image consisting of a single
`gray level (constant image) is zero.
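The entropy of (18.2) is simple to evaluate from a histogram of relative frequencies; the following is a small sketch (the function name is my own) that confirms the constant-image case and the uniform 256-level case:

```python
import math
from collections import Counter

def entropy(pixels):
    """Entropy in bits/symbol of a pixel sequence modeled as a DMS."""
    counts = Counter(pixels)           # histogram of gray levels
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A constant image has zero entropy; an image using all 256 gray levels
# equally often attains the maximum of 8 bits/pixel.
print(entropy([128] * 100))            # 0.0
print(entropy(list(range(256))))       # 8.0
```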
`
Most realistic sources can be better modeled by Markov-K random processes.
That is, the probability of occurrence of a symbol depends on the values of the K
preceding symbols. A Markov-K source can be specified by the conditional probabilities
p(Xj = ai | Xj−1, ..., Xj−K), for all j, ai ∈ A. The entropy of a Markov-K source
is defined as

H(X) = Σ_{S_K} p(Xj−1, ..., Xj−K) H(X | Xj−1, ..., Xj−K)             (18.3)

where S_K denotes all possible realizations of Xj−1, ..., Xj−K, and

H(X | Xj−1, ..., Xj−K) = − Σ_{a∈A} p(a | Xj−1, ..., Xj−K) log2 p(a | Xj−1, ..., Xj−K)
`
`In the following, we present two fundamental theorems, the Lossless Coding The-
`orem and the Source Coding Theorem, which are used to measure the performance
`of lossless coding and lossy coding systems, respectively.
`
`
`
`
`352
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`Lossless Coding Theorem: [Shannon, 1948] The minimum bitrate that
`can be achieved by lossless coding of a discrete memoryless source X is
`given by
`
min{R} = H(X) + ε   bits/symbol
`
`(18.4)
`
where R is the transmission rate, H(X) is the entropy of the source, and
ε is a positive quantity that can be made arbitrarily close to zero.
`
`The Lossless Coding Theorem establishes the lower bound for the bitrate neces-
`sary to achieve zero coding-decoding error. In the case of a DMS, we can approach
`this bound by encoding each symbol independently. For sources with memory, we
`need to encode blocks of N source symbols at a time to come arbitrarily close to
`the bound. For example, a Markov-M source should be encoded M symbols at a
`time. In the next section, we introduce two coding techniques, Huffman coding and
`arithmetic coding, that approach the entropy bound.
In lossy coding schemes, the achievable minimum bitrate is a function of the
`distortion that is allowed. This relationship between the bitrate and distortion is
`given by the rate distortion function [Ber 71].
`
`Source Coding Theorem: There exists a mapping from the source sym-
`bols to codewords such that for a given distortion D, R(D) bits/symbol
`are sufficient to enable source reconstruction with an average distortion
`that is arbitrarily close to D. The actual rate R should obey
`
`R > R(D)
`
`(18.5)
`
`for the fidelity level D. The function R(D) is called the rate-distortion
`function. Note that R(0) = H(X).
`
[Figure: a typical rate-distortion curve, with rate R on the vertical axis, equal to H(X) at D = 0 and decreasing to zero at D = Dmax on the distortion axis.]

Figure 18.2: Rate distortion function.
`
`
`
`
`
`A typical rate-distortion function is depicted in Figure 18.2. The rate distortion
`function can be computed analytically for simple source and distortion models.
`Computer algorithms exist to compute R(D) when analytical methods fail or are
impractical [Ber 71]. In general, we are interested in designing a compression system
`to achieve either the lowest bitrate for a given distortion or the lowest distortion at
`a given bitrate. Note that the source coding theorem does not state how to design
`algorithms to achieve these desired limits. Some well-known lossy coding algorithms
`are discussed in Chapters 19, 20, and 21.
`
`18.2 Symbol Coding
`
`Symbol coding is the process of assigning a bit string to individual symbols or to
`a block of symbols comprising the source. The simplest scheme is to assign equal-
`length codewords to individual symbols or a fixed-length block of symbols, which
`is known as fixed-length coding. Because compression is generally achieved by as-
signing shorter-length codewords to more probable symbols, we next describe two
variable-length coding (also known as entropy coding) schemes. The first, Huffman
`coding, assigns variable-length codes to a fixed-length block of symbols, where the
`block length can be one. In Huffman coding, the length of the codewords is propor-
`tional to the information (in bits) of the respective symbols or block of symbols. The
`latter, arithmetic coding, assigns variable-length codes to a variable-length block of
`symbols.
`
`18.2.1 Fixed-Length Coding
`
`In fixed-length coding, we assign equal-length code words to each symbol in
`the alphabet A regardless of their probabilities. If the alphabet has M different
`symbols (or blocks of symbols), then the length of the code words is the smallest
integer that is greater than or equal to log2 M. Two commonly used fixed-length coding schemes are
`natural codes and Gray codes, which are shown in Table 18.1 for the case of a
`four-symbol source. Notice that in Gray coding, the consecutive codewords differ in
`only one bit position. This property of the Gray codes may provide an advantage in
`error detection. We will see in Section 18.3 that Gray codes are also better suited
`for run-length encoding of bit-planes.
`
Table 18.1: Fixed-length codes for a four-symbol alphabet.

Symbol   Natural code   Gray code
a1       00             00
a2       01             01
a3       10             11
a4       11             10
`
`
`
`
`354
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`It can easily be shown that fixed-length coding is optimal only when:
`
`1) the number of symbols is equal to a power of 2, and
`2) all the symbols are equiprobable.
`
`Only then would the entropy of the source be equal to the average length of the
`codewords, which is equal to the length of each codeword in the case of fixed-length
`coding. For the example shown in Table 18.1, both the entropy of the source and the
average codeword length are 2 bits, assuming all symbols are equally likely. Most often,
some symbols are more probable than others, in which case it would be more advantageous
`to use entropy coding. Actually, the goal of the transformation box in Figure 18.1
`is to obtain a set of symbols with a skew probability distribution, to minimize the
`entropy of the transformed source.
`
`18.2.2 Huffman Coding
`
`Huffman coding yields the optimal integer prefix codes given a source with a finite
`number of symbols and their probabilities. In prefix codes, no codeword is a prefix
`of another codeword. Such codes are uniquely decodable since a given binary string
`can only be interpreted in one way. Huffman codes are optimal in the sense that no
`other integer-length VLC can be found to yield a smaller average bitrate. In fact,
`the average length of Huffman codes per codeword achieves the lower bound, the
`entropy of the source, when the symbol probabilities are all powers of 2.
Huffman codes can be designed by following a very simple procedure. Let X
denote a DMS with the alphabet A and the symbol probabilities p(ai), ai ∈ A,
i = 1, ..., M. Obviously, if M = 2, we must have
`
c(a1) = 0 and c(a2) = 1
`
`(18.6)
`
`where c(ai) denotes the codeword for the symbol ai, i = 1, 2. If A has more than
`two symbols, the Huffman procedure requires a series of source reduction steps. In
`each step, we find and merge the two symbols with the smallest probabilities, which
`results in a new source with a reduced alphabet. The probability of the new symbol
`in the reduced alphabet is the sum of the probabilities of the two "merged" symbols
`from the previous alphabet. This procedure is continued until we reach a source
`with only two symbols, for which the codeword assignment is given by (18.6). Then
`we work backwards towards the original source, each time splitting the codeword
`of the "merged" symbol into two new codewords by appending it with a zero and
`one, respectively. The following examples demonstrate this procedure.
`
`Example: Symbol probabilities are powers of 2
`Let the alphabet A consist of four symbols, shown in Table 18.2. The
`probabilities and the information content of the symbols in the alphabet
`are also listed in the table. Note that all symbol probabilities are powers
`of 2, and consequently the symbols have integer information values.
`
`
`
`
`
Table 18.2: An alphabet where the symbol probabilities are powers of 2.

Symbol   Probability   Information
a1       0.50          1 bit
a2       0.25          2 bits
a3       0.125         3 bits
a4       0.125         3 bits
`
`The Huffman coding procedure is demonstrated for this alphabet in
`Table 18.3. The reduced alphabet in Step 1 is obtained by merging
`the symbols a3 and a4 in the original alphabet which have the lowest
`two probabilities. Likewise, the reduced alphabet in Step 2 is obtained
`by merging the two symbols with the lowest probabilities after Step 1.
`Since the reduced alphabet in Step 2 has only two symbols, we assign
the codes 0 and 1 to these symbols in arbitrary order. Next, we assign
`codes to the reduced alphabet in Step 1. We recall that the symbol 2
`in Step 2 is obtained by merging the symbols 2 and 3 in Step 1. Thus,
`we assign codes to symbols 2 and 3 in Step 1 by appending the code for
`symbol 2 in Step 2 by a zero and one in arbitrary order. The appended
`zero and one are shown by bold fonts in Table 18.3. Finally, the codes
`for the original alphabet are obtained in a similar fashion.
`
Table 18.3: Illustration of alphabet reduction.

Original Alphabet    Reduced Alphabet    Reduced Alphabet
                     Step 1              Step 2
p       c            p      c            p      c
0.50    0            0.50   0            0.50   0
0.25    10           0.25   10           0.50   1
0.125   110          0.25   11
0.125   111
`
`This procedure can alternatively be described by the tree diagram shown
`in Figure 18.3.
Observe that in this case, the average codeword length is

R̄ = 0.5 × 1 + 0.25 × 2 + 0.125 × 3 + 0.125 × 3 = 1.75 bits/symbol

and the entropy of the source is

H = −0.5 log2 0.5 − 0.25 log2 0.25 − 0.125 log2 0.125 − 0.125 log2 0.125 = 1.75 bits/symbol

which is consistent with the result that Huffman coding achieves the
entropy of the source when the symbol probabilities are powers of 2.
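The merge-and-split procedure above is short enough to sketch in code. The following is an illustrative sketch (my own, not the book's notation) that finds the two least probable groups with a binary heap, merges them, and prepends a 0 or 1 to the codes of each half:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build Huffman codes for a {symbol: probability} map by repeated merging."""
    tiebreak = count()   # avoids comparing dicts when probabilities are equal
    heap = [(p, next(tiebreak), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}   # split the merged symbol:
        merged.update({s: '1' + c for s, c in c2.items()})  # append 0 and 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {'a1': 0.5, 'a2': 0.25, 'a3': 0.125, 'a4': 0.125}
codes = huffman(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(avg)   # 1.75, matching the source entropy
```

For the alphabet of Table 18.2 the resulting codeword lengths are 1, 2, 3, and 3 bits, as in Table 18.3 (the 0/1 assignments at each split are arbitrary).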
`
`
`
`
`356
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`0
`
`1
`
`p=0.5
`
`a1
`
`p=0.5
`
`a2 p=0.25
`
`0
`
`1
`
`a3 p=0.125
`
`a4 p=0.125
`
`0
`
`1
`
`p=0.25
`
`Figure 18.3: Tree-diagram for Huffman coding.
`
`Next, we present an example in which the symbol probabilities are not
`powers of 2.
`
`Example 2: Symbol probabilities are not powers of 2
`The information content of each symbol is a real number, as shown in
`Table 18.4, when the probabilities of the symbols are not powers of 2.
`
Table 18.4: An alphabet with arbitrary symbol probabilities.

Symbol   Probability   Information
a1       0.40          1.32 bits
a2       0.25          2.00 bits
a3       0.15          2.73 bits
a4       0.15          2.73 bits
a5       0.05          4.32 bits
`
`Since the length of each codeword must be an integer, it is not possible
`to design codewords whose lengths are equal to the information of the
`respective symbols in this case. Huffman code design for the alphabet
in Table 18.4 is shown in Table 18.5. It can be easily seen that for this
example the average length of the codewords is 2.15 bits/symbol, and the
entropy of the source is 2.07 bits/symbol.
`Notice that Huffman codes are uniquely decodable, with proper syn-
`chronization, because no codeword is a prefix of another. For example,
`a received binary string
`
`001101101110000. . .
`can be decoded uniquely as
a3 a1 a2 a1 a2 a1 a1 a4
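That unique decodability is easy to verify in code; a sketch of a greedy prefix decoder (the function name is my own), using the codes designed in Table 18.5:

```python
def decode(bits, codebook):
    """Greedily decode a prefix-coded bit string into a symbol list."""
    inverse = {c: s for s, c in codebook.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:       # prefix property: at most one codeword matches
            out.append(inverse[buf])
            buf = ''
    return out

# Codes from Table 18.5
codes = {'a1': '1', 'a2': '01', 'a3': '001', 'a4': '0000', 'a5': '0001'}
print(decode('001101101110000', codes))
# ['a3', 'a1', 'a2', 'a1', 'a2', 'a1', 'a1', 'a4']
```

Because no codeword is a prefix of another, the greedy scan never has to back up: the first codeword that matches the buffer is the only possible one.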
`
`
`
`
`
Table 18.5: Huffman coding when probabilities are not powers of 2.

Original Alphabet    Step 1         Step 2         Step 3
p      c             p     c        p     c        p     c
0.40   1             0.40  1        0.40  1        0.60  0
0.25   01            0.25  01       0.35  00       0.40  1
0.15   001           0.20  000      0.25  01
0.15   0000          0.15  001
0.05   0001
`
`Huffman coding can also be used as a block coding scheme where we assign
`codewords to combinations of L symbols from the original alphabet at a time. Of
`course, this requires building a new block alphabet with all possible combinations
`of the L symbols from the original alphabet and computing their respective prob-
`abilities. Huffman codes for all possible combinations of the L symbols from the
`original alphabet can be formed using the above design procedure with the new
`block alphabet. Thus, Huffman coding is a block coding scheme, where we assign
variable-length codes to fixed-length (L) blocks of symbols. The case L = 1 refers to
`assigning an individual codeword to each symbol of the original alphabet, as shown
`in the above two examples. It has been shown that for sources with memory, the
`coding efficiency improves as L gets larger, although the design of Huffman codes
`gets more complicated.
`
`18.2.3 Arithmetic Coding
`
`In arithmetic coding a one-to-one correspondence between the symbols of an alpha-
`bet A and the codewords does not exist. Instead, arithmetic coding assigns a single
`variable-length code to a source X, composed of N symbols, where N is variable.
`The distinction between arithmetic coding and block Huffman coding is that in
`arithmetic coding the length of the input sequence, i.e., the block of symbols for
`which a single codeword is assigned, is variable. Thus, arithmetic coding assigns
`variable-length codewords to variable-length blocks of symbols. Because arithmetic
`coding does not require assignment of integer-length codes to fixed-length blocks
`of symbols, in theory it can achieve the lower bound established by the noiseless
`coding theorem.
Arithmetic coding associates a given realization of X, x = {x1, ..., xN}, with a
`subinterval of [0, 1) whose length equals the probability of the sequence p(x). The
`encoder processes the input stream of symbols one by one, starting with N = 1,
`where the length of the subinterval associated with the sequence gets smaller as
`N increases. Bits are sequentially sent to the channel starting from the most sig-
`nificant bit towards the least significant bit as they are determined according to
`a procedure, which is presented in an algorithmic form in the following. At the
`end of the transmission, the transmitted bitstream is a uniquely decodable code-
`
`
`
`
`1,
`
`358
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`word representing the source, which is a binary number pointing to the subinterval
`associated with this sequence.
`
The Procedure
Consider an alphabet A that has M symbols ai, i = 1, ..., M, with the probabilities
p(ai) = pi, such that p1 + ... + pM = 1. The procedure starts with assigning
each individual symbol in the alphabet a subinterval, within 0 to 1, whose length is
equal to its probability. It is assumed that this assignment is known to the decoder.

1. If the first input symbol x1 = ai, i = 1, ..., M, then define the initial
subinterval as I1 = [l1, r1) = [Pi−1, Pi−1 + pi), where Pi−1 = p1 + ... + pi−1
and P0 = 0. Set n = 1, L = l1, R = r1, and d = r1 − l1.

2. Obtain the binary expansions of L and R as

L = Σ_{k=1}^{∞} uk 2^{−k}   and   R = Σ_{k=1}^{∞} vk 2^{−k}

where uk and vk are 0 or 1.
Compare u1 and v1. If they are not the same, send nothing to the channel at
this time, and go to step 3.
If u1 = v1, then send the binary symbol u1, and compare u2 and v2. If they are
not the same, go to step 3.
If u2 = v2, also send the binary symbol u2, and compare u3 and v3, and so on,
until the next two corresponding binary symbols do not match, at which time go
to step 3.

3. Increment n, and read the next symbol. If the nth input symbol xn = ai,
then subdivide the interval from the previous step as

In = [ln, rn) = [ln−1 + Pi−1 d, ln−1 + (Pi−1 + pi) d)

Set L = ln, R = rn, and d = rn − ln, and go to step 2.
`Note that the decoder may decode one binary symbol into several source sym-
`bols, or it may require several binary symbols before it can decode one or more
`source symbols. The arithmetic coding procedure is illustrated by means of the
`following example.
`
`Example
`Suppose we wish to determine an arithmetic code to represent a sequence
`of symbols,
`
`1
`
`a2 a1 a3
`
`from the source shown in Table 18.2. Because we have four symbols
`in the alphabet, the interval from 0 to 1 is initially subdivided into 4,
`where the lengths of the subintervals are equal to 0.5, 0.25, 0.125 and
`0.125, respectively. This is depicted in Figure 18.4.
`
`
`
`
`
[Figure: successive subdivision of the unit interval, with boundaries shown in both decimal and binary. The first symbol a2 selects [0.5, 0.75) (binary 0.1 to 0.11); within it, a2a1 selects [0.5, 0.625) (0.1 to 0.101); within that, a2a1a3 selects [0.59375, 0.609375) (0.10011 to 0.100111).]

Figure 18.4: Illustration of the concept of arithmetic coding.
`
The first symbol defines the initial interval as I1 = [0.5, 0.75), where the
binary representations of the left and right boundaries are L = 2^−1 = 0.1
and R = 2^−1 + 2^−2 = 0.11, respectively. According to step 2,
u1 = v1 = 1; thus, 1 is sent to the channel. Noting that u2 = 0
and v2 = 1, we read the second symbol, a1. Step 3 indicates that
I2 = [0.5, 0.625), with L = 0.10 and R = 0.101. Now that u2 = v2 = 0,
we send 0 to the channel. However, u3 = 0 and v3 = 1, so we read the
third symbol, a3. It can be easily seen that I3 = [0.59375, 0.609375),
with L = 0.10011 and R = 0.100111. Note that u3 = v3 = 0,
u4 = v4 = 1, and u5 = v5 = 1, but u6 = 0 and v6 = 1. At this stage, we
send 011 to the channel, and read the next symbol. A reserved symbol
usually signals the end of a sequence.
`
Let's now briefly look at how the decoder operates, which is illustrated
in Figure 18.5. The first bit restricts the interval to [0.5, 1). However,

Received Bit   Interval                Symbol
1              [0.5, 1)
0              [0.5, 0.75)             a2
0              [0.5, 0.609375)
1              [0.5625, 0.609375)
1              [0.59375, 0.609375)     a3

Figure 18.5: The operation of the decoder.
`
`
`three symbols are within this range; thus, the first bit does not contain
`sufficient information. After receiving the second bit, we have 10 which
`points to the interval [0.5, 0.75). All possible combinations of two sym-
`bols pointing to this range start with a2 . Hence, we can now decode
`the first symbol as a2 . The information that becomes available after the
`receipt of each bit is summarized in Figure 18.5.
`
`In practice, two factors cause the performance of the arithmetic encoder to fall
`short of the theoretical bound: the addition of an end-of-message indicator, and
`the use of finite precision arithmetic. Practical implementations of the arithmetic
`coder overcome the precision problem by a scaling and a rounding strategy.
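The interval-narrowing core of the procedure (omitting the incremental bit output of step 2) can be sketched as follows. This is an illustrative sketch of my own; it uses exact rational arithmetic to sidestep the precision issue for this short example, and reproduces the final subinterval computed above for the sequence a2 a1 a3:

```python
from fractions import Fraction

def arithmetic_interval(sequence, probs, order):
    """Return the [low, high) subinterval associated with a symbol sequence."""
    low, width = Fraction(0), Fraction(1)
    for sym in sequence:
        # cumulative probability of all symbols preceding sym in the alphabet
        cum = sum((Fraction(probs[s]) for s in order[:order.index(sym)]),
                  Fraction(0))
        low += cum * width            # step 3: subdivide the current interval
        width *= Fraction(probs[sym])
    return low, low + width

probs = {'a1': '1/2', 'a2': '1/4', 'a3': '1/8', 'a4': '1/8'}
lo, hi = arithmetic_interval(['a2', 'a1', 'a3'], probs, ['a1', 'a2', 'a3', 'a4'])
print(float(lo), float(hi))   # 0.59375 0.609375
```

The length of the final interval, 1/64, equals the probability of the sequence (1/4 × 1/2 × 1/8), so about log2 64 = 6 bits suffice to point into it, consistent with the bits sent in the example.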
`
`18.3 Lossless Compression Methods
`
`Error-free coding is the only acceptable means of compression in some applications
for various reasons. For example, in the transmission or archival of medical
images, lossy compression is not allowed for legal reasons. Recalling the elements of
`a compression system, lossless coding schemes do not employ a quantizer. They
`consist of a transformation, which generates symbols whose probability distribution
`is highly peaked, followed by an entropy coder. The transformation aims to mini-
`mize the entropy of its output, so that significant compression becomes possible by
`variable-length coding of the generated symbols.
`In this section, we present three popular methods for lossless compression: i)
`Lossless predictive coding, where an integer predictive mapping is employed, fol-
`lowed by entropy coding of the integer prediction errors. ii) Run-length coding of
`bit-planes, where the image is decomposed into individual bit-planes (binary im-
`ages), and the run-lengths of zeros and ones in these planes are entropy coded.
`iii) Ziv-Lempel coding, which is a deterministic coding procedure, where the input
`bit string is parsed into blocks of variable length to form a dictionary of blocks
(symbols), each of which is represented by a fixed-length codeword. The achievable
compression ratio using lossless coding methods ranges from 2:1 to 5:1,
depending on the characteristics of the input image.
`
`18.3.1 Lossless Predictive Coding
`
`The first step in lossless predictive coding is to form an integer-valued prediction
`of the next pixel intensity to be encoded based on a set of previously encoded
`neighboring pixels. Then the difference between the actual intensity of the pixel
`and its prediction is entropy coded. Assuming each pixel is integer-valued, then the
`prediction errors are also integer-valued, which facilitates lossless compression. The
`block diagram of a simple predictor is shown in Figure 18.6, where the prediction
`is taken as the intensity of the previous pixel encoded. Some other commonly used
`integer predictors are shown in the following example.
`
`
`
`
`
[Figure: (a) the encoder forms the residual by subtracting the previous sample from the current sample; (b) the decoder adds the received residual to the previous sample to obtain the reconstructed sample.]

Figure 18.6: Block diagram of a) an encoder, and b) a decoder using a simple
predictor.
`
`Example: Integer prediction
`In order for the decoder to be able to duplicate the prediction step, the
`prediction operation must be based on already-encoded pixels. In 2-D,
`such prediction models are called recursively computable. The support
`of a recursively computable predictor is shown in Figure 18.7, where the
`coefficients a, b, c, and d denote the intensities of the respective pixels.
`Two simple predictors based on this model can be written as
`
x̂ = int{(a + b)/2}                                                   (18.7)
`
[Figure: prediction support — pixels b, c, and d on the line above, and pixel a immediately to the left of the current pixel x.]

Figure 18.7: Prediction schemes.
`
`
`
`
`362
`
`and
`
`CHAPTER 18. LOSSLESS COMPRESSION
`
`= int{(a
`
`b
`
`c d)/4}
`
`(18.8)
`
where x̂ denotes the predicted value for the pixel x. In both expressions,
the prediction is rounded to the nearest integer to ensure an integer
prediction error. We note that many other forms of prediction, such as
edge-adaptive prediction, also exist.
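The previous-pixel predictor of Figure 18.6 can be sketched in one dimension as follows (a sketch of my own; a real coder would entropy-code the residuals rather than print them). The round trip is exactly lossless because the residuals are integers:

```python
def encode(row):
    """Previous-pixel predictor: residual = current - previous (first pixel raw)."""
    prev = 0
    residuals = []
    for x in row:
        residuals.append(x - prev)   # integer prediction error
        prev = x
    return residuals

def decode(residuals):
    """Invert the predictor by accumulating the residuals."""
    prev = 0
    row = []
    for e in residuals:
        prev += e
        row.append(prev)
    return row

row = [100, 102, 101, 101, 150]
res = encode(row)
print(res)                    # [100, 2, -1, 0, 49] -- peaked around zero
assert decode(res) == row     # lossless round trip
```

On smooth image regions the residuals cluster tightly around zero, giving the skewed distribution that entropy coding exploits.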
`
`If the input image intensity x has a dynamic range of (0,