`International Bureau
`
`
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(11) International Publication Number:
`
`WO 96/15620
`
(51) International Patent Classification 6:
`
`H04N 1/41 A1
`
`(43) International Publication Date:
23 May 1996 (23.05.96)
`
(81) Designated States: CA, JP, US, European patent (AT, BE, CH,
DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).
`
`(21) International Application Number:
`
`PCT/CA95/00635
`
`(22) International Filing Date:
`
8 November 1995 (08.11.95)
`
`(30) Priority Data:
`94227386
`
10 November 1994 (10.11.94)
`
`GB
`
`
`
`
Published
With international search report.
Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt of
amendments.
`
`
`
(71) Applicant (for all designated States except US): THE CHINESE
UNIVERSITY OF HONG KONG [GB/GB]; Research Administration Office,
Room 226, Pi Chiu Building, Shatin, New Territories (HK).
`
(71)(72) Applicant and Inventor: WU, Xiaolin [CA/CA]; 683
Windemere Road, London, Ontario N5X 3T9 (CA).
`
(72) Inventor; and
(75) Inventor/Applicant (for US only): MEMON, Nasir [IN/US];
128 Buena Vista Drive, DeKalb, IL 60115 (US).
`
(74) Agent: MANNING, Gavin, N.; Oyen Wiggs Green & Mutala,
480 - 601 West Cordova Street, Vancouver, British Columbia
V6B 1G1 (CA).
`
`
`
`(54) Title: CONTEXT-BASED, ADAPTIVE, LOSSLESS IMAGE CODEC
`
`10
`
`CONDITIGIAL
`PROBABlLlTlES
`ESTIMATION
`
`
`
`G RAD|ENT-
`ADJUSTED
`
`(57) Abstract
`
`
`
`
`
`An encoding/decoding method is provided for lossless (reversible) compression of digital pictures of all types, including continuous-
`tone images, graphics, multimedia images of mixed text, graphics and photographs, binary documents and drawings. Continuous-tone mode
`
`
and binary mode are identified on a pixel-by-pixel basis (12). In continuous-tone mode, context modeling and prediction (20, 22, 26) are
`employed involving mostly integer arithmetic and simple logic in a conceptually sophisticated scheme. Both the encoding and decoding
`
`
`techniques are suitable for sequential and progressive transmission, although different specific algorithms may be employed for the different
`
`specific cases. The system is symmetric, meaning that the encoder and decoder have the same time and space complexities.
`
`1
`LG 1023
`
`
`
`
`
`LG 1023
`
`1
`
`
`
`
`
`
`
`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international
`applications under the PCT.
`
AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CS  Czechoslovakia
CZ  Czech Republic
DE  Germany
DK  Denmark
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgystan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakhstan
LI  Liechtenstein
LK  Sri Lanka
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SI  Slovenia
SK  Slovakia
SN  Senegal
TD  Chad
TG  Togo
TJ  Tajikistan
TT  Trinidad and Tobago
UA  Ukraine
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
`
`2
`
`
`
`W0 96,1562‘)
`
`PCTICA95l00635
`
`
`
`CONTEXT-BASED, ADAPTIVE, LOSSLESS IMAGE CODEC
`
`CLAIM OF PRIORITY
`
`The present application claims partial priority of
`
British Provisional Patent Application Serial No. 9422738.6
`
`filed 10 November 1994.
`
`10
`
`BACKGROUND OF THE INVENTION
`
With rapidly-advancing computer, telecommunication, and digital
imaging technologies, there is an astronomical amount of image data
for a wide range of applications such as education, entertainment,
medical imaging, space exploration, electronic publishing, visual
arts, etc. This rapid growth of image data puts punishing burdens on
computer storage and visual communication bandwidth. Thus image
compression becomes a pressing technical challenge in visual
communications and computing, without which it will be difficult to
build, deploy, and use cost-effective multimedia information
systems.
`
Lossless compression is a form of compression where an image can be
reconstructed without any loss of information. Lossless image
compression is required by medical imaging, satellite/aerial
imaging, image archiving, preservation of precious art work and
documents, the press, and any application demanding ultra-high image
fidelity. Furthermore, lossless image coding is the necessary last
step of many lossy image compression systems, such as lossless
compression of codeword indices in vector quantization (VQ), and
lossless compression of transform coefficients in Discrete Cosine
Transform (DCT) and wavelet/subband-based coding.
`
There exists a large body of literature on lossless image
compression algorithms and systems, such as the IBM Q-coder and the
JPEG lossless coder. Among notable patents and publications are the
US patents and research publications listed below:
`
`
`
`
`
`
`
`4,463,342 1984 IBM.
`
`4,749,983 O7/1988 Langdon.
`
`4,969,204 11/1989 Melnychuck et al.
`
`5
`
`5,050,230 O9/1990 Jones et al.
`Universal Modeling and Coding —- J. Rissanen and G. Langdon,
`1981,
`IEEE, vol. IT-27.
`
`A Universal Data Compression System -- J. Rissanen, 1983,
`IEEE, vol.
`IT-29.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
Parameter Reduction and Context Selection for Compression of
Gray-Scale Images -- S. Todd, G. Langdon, and J. Rissanen, 1985,
IBM J. Res. & Develop., vol. 29.

Comparing the Lossless Image Compression Standards and Universal
Context Modelling -- R. Arps, M. Weinberger, T. Truong, and
J. Rissanen, Proc. of the Picture Coding Symposium, Sacramento,
September 1994.

On the JPEG Model for Lossless Image Compression -- G. Langdon,
A. Gulati, and E. Seiler, Proc. of 1992 Data Compression Conf.

New Methods for Lossless Image Compression Using Arithmetic Coding
-- P. Howard and J. Vitter, 1992, Info. Proc. & Manag., vol. 28.
`
The currently achievable lossless compression ratio is still modest,
typically from 1.5:1 to 2.5:1. For instance, in contrast to the
success of JPEG's lossy compression standard, the current JPEG
lossless compression standard has sufficiently poor coding
efficiency that it is seldom used in practice.

In 1994, ISO and JPEG solicited proposals for the next international
standard for lossless image compression. The present invention is a
result of the inventors' response to the ISO solicitation. The lead
inventor, Xiaolin Wu, developed a context-based, adaptive, lossless
image coding and decoding technique (herein CALIC). Among nine
proposals that were submitted to ISO for its initial evaluation as
candidates for the lossless image compression standard in 1995, the
present CALIC system ranked first according to a criterion that
accounts for both coding efficiency and algorithm simplicity.
`
`4
`
`
`
`
`
`
`
`wo 95/15520
`
`PCTICA95/00635
`
`3
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
Known prior art on lossless compression of continuous-tone images is
based on the principle of predictive coding. An image is traversed,
and pixels are encoded in a fixed order, typically in raster scan
sequence. Previously encoded pixels that are known to both the
encoder and the decoder are used to predict the upcoming pixels. The
prediction errors, rather than the pixels themselves, are entropy
encoded by Huffman or arithmetic coding. The original image is
reconstructed by adding the error term back to the prediction value.
Predictive coding works because the histogram of the errors is much
more concentrated (heavily biased toward 0) than the histogram of
the pixel values, resulting in a significantly smaller zero-order
entropy for the former than for the latter. Among the numerous
prediction schemes in the literature, the simplest type is a fixed
linear predictor such as those used under the current lossless JPEG
standard.
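The effect described above can be illustrated with a small sketch
(not part of the patent; the synthetic ramp scanline, the
previous-pixel predictor, and the function name are illustrative
assumptions): the error histogram concentrates on a few small
values, so its zero-order entropy is well below that of the raw
samples.

```python
from collections import Counter
from math import log2

def zero_order_entropy(symbols):
    """Empirical zero-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())

# A smooth synthetic scanline: many distinct values, but small steps.
scanline = [10, 12, 13, 15, 18, 20, 21, 23, 26, 28, 29, 31]

# Fixed previous-pixel (DPCM-style) predictor: predict each sample
# by its left neighbour; the residuals are the prediction errors.
errors = [x - p for p, x in zip(scanline, scanline[1:])]

h_values = zero_order_entropy(scanline)   # 12 distinct values
h_errors = zero_order_entropy(errors)     # only the values 1, 2, 3
```

Here every raw sample is distinct, so h_values equals log2(12) bits,
while the errors take only three values and need far fewer bits.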
`
A linear predictor can be optimized on an image-by-image or even
block-by-block basis via linear regression. However, such an
optimization is expensive and brings only modest improvement in
coding efficiency. Moreover, the performance of linear predictors is
not robust in the areas of edges. Adaptive, non-linear predictors
can adjust parameters according to the local edge strengths and
orientations, if edges exist. The adjustment of predictor parameters
can be made very efficient since it is based on local information.
Historically, lossless image compression inherited the theoretical
framework and methodology of text compression. Statistical modeling
of the source being compressed plays a central role in any data
compression system. Suppose that we encode a finite source
x1, x2, ..., xn sequentially. The optimal code length of the
sequence in bits is then

    -log Π p(x_{i+1} | x_i, ..., x_1),                          (1)

given the assignments of conditional probabilities. Arithmetic
coding can approach this code length of the source.
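Equation (1) can be evaluated numerically in a small sketch (the
probability values below are made up for illustration): the ideal
code length is the negative log of the product of the conditional
probabilities, i.e. the sum of the per-symbol information contents.

```python
from math import log2

# Conditional probabilities p(x_{i+1} | x_i, ..., x_1) that some
# model assigns to each symbol of a short sequence, in coding order.
cond_probs = [0.5, 0.8, 0.9, 0.25]

# Equation (1): optimal code length in bits = -log2 of the product.
code_length = -sum(log2(p) for p in cond_probs)
```

A better model assigns higher conditional probabilities and hence
yields a shorter code length, which is exactly what the context
modeling in the sequel aims for.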
`
`5
`
`
`
`
`
`wo 95/15520
`
`PCl‘lCA95I00635
`
`4
`
The challenge is to assign the conditional probabilities
p(x_{i+1}|x_i,...,x_1) so as to maximize the product given above,
and hence minimize the code length. The achievable compression is
governed by a scheme, or a model, that can assign high conditional
probability distributions p(x_{i+1}|x_i,...,x_1) to the given
sequence.

Fitting a given source well with statistical models is a difficult
and computationally very expensive task. Context modeling of
continuous-tone images is made even more difficult by the large
alphabet size (>= 256) of grey-scale images. Context modeling of the
source symbols (pixel values) would lead to an unwieldily large
number of possible model states (contexts). This is more than a
problem of high time and space complexities for modeling. If the
number of model parameters is too large with respect to the image
resolution, there will be insufficient samples to obtain good
estimates of conditional probabilities on the model states, leading
to poor coding efficiency. This is known as the "context dilution
problem." This problem was theoretically formulated by Rissanen in
the framework of stochastic complexity as the "model cost."
Rissanen's work proves that the high complexity of a model can
reduce coding efficiency, as observed by many data compression
practitioners. What is needed are innovative techniques improving
both coding and computational efficiency.
`
SUMMARY OF THE INVENTION

According to the invention, an encoding/decoding method is provided
for lossless (reversible) compression of digital pictures of all
types, including continuous-tone images, graphics, multimedia images
of mixed text, graphics and photographs, binary documents and
drawings. Continuous-tone mode and binary mode are identified on a
pixel-by-pixel basis. In continuous-tone mode, context modeling and
prediction are employed, involving mostly integer arithmetic and
simple logic in a conceptually sophisticated scheme. Both the
encoding and decoding techniques are suitable for sequential and
progressive transmission, although different specific algorithms may
be employed for the different specific cases. The system is
symmetric, meaning that the encoder and decoder have the same time
and space complexities.
`
A primary reason for the improved coding efficiency from context
modeling of errors and error feedback lies in the fact that the
prediction error sequence is a composition of multiple sources with
distinct underlying statistics. The use of proper contexts to
separate the statistically distinct sources of errors in the code
stream yields a significant reduction in conditional entropies.

The invention will be better understood by reference to the
following detailed description in connection with the accompanying
drawings.
`
`15
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Fig. 1 is a schematic description of an encoding
`
`process for compression according to the invention in which
`
`the decoding process is a reverse.
`
`Fig. 2 illustrates labeling of neighboring pixels
`
`20
`
`for context-based prediction and modeling.
`
`Fig.
`
`3 shows four possible convexity relationships
`
`between a GAP predictor f and two neighboring pixels.
`
`Fig. 4 illustrates a two-stage adaptive prediction
`
`scheme via context modeling of prediction errors and error
`
`25
`
`feedback.
`
`Fig. 5 is a chart depicting a method for determining
`
`a possible hierarchy of contexts (causal templates around a
`
`coded pixel marked by '?') in which optimal modeling context
`is selected.
`
`30
`
`Fig.
`
`6 is a two-part graph illustrating a sign
`
`flipping technique to sharpen conditional error probability
`
`density functions for entropy reduction without increasing
`
`modeling space.
`
`35
`
`scheme for progressive transmission according to the
`
`Fig. 7 is the block diagram of a four-pass encoding
`
`invention.
`
`7
`
`
`
`
`
`wo 95/15520
`
`PCI‘ICA95I00635
`
`6
`
Fig. 8 is a diagram and key illustrating interlaced sampling for the
first pass and the configuration of prediction and modeling context
for such a pass.

Fig. 9 is a diagram and key illustrating interlaced sampling for the
second pass and the configuration of prediction and modeling context
for such a pass.

Fig. 10 is a diagram and key illustrating functionally lossless
reconstruction at the third pass and the configuration of prediction
and modeling context used in this pass.
`
`10
`
`DESCRIPTION OF SPECIFIC EMBODIMENTS
`as '
`
`Co
`
`Referring to Fig. 1, before actual compression, a
`system 10 operating according to the invention includes a
`
`15
`
`20
`
`25
`
`decision maker 12 which dynamically selects on parts of the
`image between a compression scheme for continuous-tone image
`data and a compression scheme for binary image data based on
`context of the pixel data.
`In binary mode,
`the image data is
`compressed by a first encoder 14 using context-based,
`adaptive,
`ternary entropy encoding.
`In the continuous-tone
`
`mode,
`
`the image data is compressed by a second encoder 16
`
`using context-based, adaptive, non—linear predictive coding.
`Before encoding or decoding an image,
`the system is designed
`to provide or selects between sequential transmission and
`
`to encode or decode image data by
`progressive transmission,
`the appropriated types of context-sensitive entropy-encoding
`and decoding processors.
`Image data encoded and decoded in
`
`raster scan order in a single pass through the image is
`
`30
`
`preferably transmitted by sequential transmission.
`
`The coding
`
`process uses prediction contexts that involve only the
`
`previous two or three scan lines of coded pixels.
`
`the encoding and decoding processes require only
`Consequently,
`a simple double buffer 18 that holds only the two or three
`
`35
`
`rows of pixels that immediately precede the current pixel.
`
`(Where greater buffer space is available,
`
`the system can
`
`achieve progressive transmission via interlaced subsampling
`
`8
`
`
`
`
`
`
`
`W0 96/15620
`
`PCTlCA95I00635
`
`7
`
`that encodes, and accordingly decodes, an image in four
`
`passes.)
`
In coding a picture, the CALIC system dynamically operates in two
modes: binary and continuous-tone. The binary mode is for the
situation in which the current locality of interest in the input
image has no more than two distinct intensity values; hence it is
designed for a more general class of images or subimages than the
class of black-and-white images.

An innovation of the system is to distinguish between binary and
continuous-tone types of pictures on a local, rather than a global,
basis. The system automatically selects one of the two modes
depending on the context of the current pixel. In the binary mode,
the context-based adaptive entropy coder 14 is used to code three
symbols, including an escape symbol. This distinction between binary
and continuous-tone modes is important because the coding
methodologies are very different in the two modes, and it
contributes to the universality and robustness of the CALIC system
over a wide range of pictures, including those that mix text,
graphics and photographs in multimedia applications.
`
The selection between the continuous-tone and binary modes is based
on pixel context. The mode selection is automatic and completely
transparent to the user. No side information about mode switching is
required. The coded image needs only a header of five bytes: two for
width, two for height, and one for depth (the number of bits of
intensity resolution).
`
In the continuous-tone mode, a mechanism 20 is provided to compress
pixel values by context-based, adaptive, predictive coding. To
predict the current pixel value, a predictor 22 is provided for a
local intensity gradient to be estimated based on the neighboring
pixels in the horizontal and vertical directions. These estimates
are used to detect the magnitude and orientation of edges in the
input image so that adjustments can be made to the prediction
accordingly. This gradient-adjusted prediction (GAP) of I, denoted
by f, is further corrected based on the particular pattern that the
neighboring pixels exhibit (shown here as an output of an error
modeling function 26). This is necessary because gradients alone
cannot adequately characterize some of the more complex
relationships between the predicted pixel and its surroundings.
Context modeling can exploit higher-order structures such as texture
patterns in the image. This is done by quantizing the neighborhood
into different conditioning classes, and estimating the expectation
of the GAP prediction error e = I - f in each of these classes
separately. The reasons for estimating conditional expectations,
rather than the conditional density functions as commonly suggested
in the literature, are to prevent the problem of context dilution
(sample counts being insufficient to obtain good estimates), and to
reduce time and space complexities. The conditional error
expectation estimate ē is then fed back to the GAP predictor 22 to
generate a second and improved prediction, denoted by f̄, which is
equal to f + ē. The net effect is a context-based, adaptive,
non-linear predictor that can correct itself by learning from the
mistakes it made in the past in the same context. This is a key
feature distinguishing the prediction scheme of the CALIC system
from existing prediction schemes.
`
The final prediction error ε = I - f̄ is entropy encoded. In driving
the entropy coder 16, a modest number (between 8 and 32) of
conditional probabilities of the prediction errors ε are estimated
(by an estimator 24) depending on an estimate of the prediction
error energy. The error energy estimator is a linear function of the
horizontal and vertical gradients and of the prediction errors at
the preceding pixels. It can be optimized off line via linear
regression for specific types of images. The conditioning classes
for entropy coding are generated by quantizing the error energy
estimator. The quantization can also be optimized via off-line
dynamic programming to minimize the total code length of a set of
training images. In addition, the CALIC system uses a novel and
elegant technique to "sharpen" the conditional probabilities (thus
reducing the underlying entropy) for entropy coding, by a simple
sign flipping mechanism 28 without any increase in modeling space.
`
`3.
`
`_E:.
`
`!
`
`3
`
`E
`
`i.
`
`!
`
A nonlinear predictor according to the invention adapts itself to
the image gradients near the predicted pixel. This predictor
improves the robustness of traditional DPCM predictors, particularly
in areas of edges. The basic function is now explained.

Denote an input image of width W and height H by I[i,j], 0 ≤ i < W,
0 ≤ j < H. In order not to obscure the concept of the proposed
compression algorithm, we consider in the following development only
the sequential coding of interior pixels I[i,j], 2 ≤ i < W-2,
2 ≤ j < H-2. Many treatments of boundary pixels are possible, and
they do not make a significant difference in the final compression
ratio, due to the small population of boundary pixels. For instance,
boundary pixels can be coded by a simple DPCM scheme as they are
encountered in the raster scan.
`
To facilitate the prediction of I[i,j] and entropy coding of the
prediction error via context modeling, we compute the following
quantities:

    dh = |I[i-1,j] - I[i-2,j]| + |I[i,j-1] - I[i-1,j-1]| +
         |I[i+1,j-1] - I[i,j-1]|
                                                                (2)
    dv = |I[i-1,j] - I[i-1,j-1]| + |I[i,j-1] - I[i,j-2]| +
         |I[i+1,j-1] - I[i+1,j-2]|
`25
`
`30
`
`Clearly, dh and dv are estimates, within a scaling
`
`factor, of the gradients of the intensity function near pixel
`
`I[i,j] in horizontal and vertical directions.
`
`The values of
`
`dh and dv are used to detect the magnitude and orientation of
`edges in the input image, and make necessary adjustments in
`
`the prediction. We aim to alleviate the problem that the
`
`precision of existing DPCM-type predictors can be adversely
`
`11
`
`11
`
`
`
`
`
`W0 96/15620
`
`PCT/CA95l00635
`
`10
`
`affected by edges.
`
`In Equations (2)
`
`three absolute
`
`differences are used for d in each direction. This has been
`
`found to give the best compression results.
`
`Two or one
`
`absolute differences can be used here for lower complexity
`with a small loss in performance. Efficient incremental
`
`5
`
`and/or parallel schemes for evaluating dh and dv are
`straightforward.
`For instance,
`to avoid unnecessary repeated
`computations, one can store the values of [I[-,-] - I[-,-|
`associated with preceding pixels for future reference. This
`
`10
`
`only requires an array of the size W.
`
For simple denotation in the sequel, we let

    n = I[i,j-1],   w = I[i-1,j],   ne = I[i+1,j-1],
    nw = I[i-1,j-1],   nn = I[i,j-2],   ww = I[i-2,j],          (3)

meaning north, west, northeast, northwest, north-north, and
west-west, respectively. The locations of these pixels with respect
to I[i,j] are given in Fig. 2.
`
Based on dh and dv, a simple technique, described by the following
conditional statements, is used to make a gradient-adjusted
prediction (GAP) f[i,j] of I[i,j]:

    IF (dv - dh > 80)  {sharp horizontal edge}
        f[i,j] = w
    ELSE IF (dv - dh < -80)  {sharp vertical edge}
        f[i,j] = n
    ELSE {
        f[i,j] = (w+n)/2 + (ne-nw)/4
        IF (dv - dh > 32)  {horizontal edge}
            f[i,j] = (f[i,j]+w)/2
        ELSE IF (dv - dh > 8)  {weak horizontal edge}
            f[i,j] = (3f[i,j]+w)/4
        ELSE IF (dv - dh < -32)  {vertical edge}
            f[i,j] = (f[i,j]+n)/2
        ELSE IF (dv - dh < -8)  {weak vertical edge}
            f[i,j] = (3f[i,j]+n)/4
    }

The procedure given above is parallelizable. This
`
technique differs from existing linear predictors in that it weights
the neighboring pixels of I[i,j] according to the estimated
gradients of the image. In effect, f[i,j] is a simple, adaptive,
nonlinear predictor. The predictor coefficients and thresholds given
above were empirically chosen. A major criterion in choosing these
coefficients is ease of computation. For instance, most coefficients
are powers of 2 to avoid multiplications and divisions. The
multiples of 3 in the procedure above can be computed by a bit shift
and an addition.
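The conditional statements of the GAP procedure translate directly
into code. A minimal Python sketch follows (the function name and
the floating-point divisions are illustrative; a fixed-point
implementation would use the shifts described above):

```python
def gap_predict(n, w, ne, nw, dv, dh):
    """Gradient-adjusted prediction (GAP) of the current pixel from
    its north, west, northeast and northwest neighbours, given the
    gradient estimates dv and dh of Equation (2)."""
    if dv - dh > 80:       # sharp horizontal edge: predict from west
        return w
    if dv - dh < -80:      # sharp vertical edge: predict from north
        return n
    f = (w + n) / 2 + (ne - nw) / 4
    if dv - dh > 32:       # horizontal edge
        f = (f + w) / 2
    elif dv - dh > 8:      # weak horizontal edge
        f = (3 * f + w) / 4
    elif dv - dh < -32:    # vertical edge
        f = (f + n) / 2
    elif dv - dh < -8:     # weak vertical edge
        f = (3 * f + n) / 4
    return f
```

On a sharp horizontal edge (dv - dh > 80) the west neighbour is
returned unchanged, and in a smooth area the predictor reduces to
the plain average-plus-diagonal-correction term.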
`
It is possible, albeit quite expensive, to optimize the coefficients
and thresholds for an image or a class of images so that a norm of
the expected prediction error E{||I - f||} is minimized. It is not
recommended that such an optimization process be carried out on an
image-by-image basis. However, it is important to point out that the
coefficients and thresholds used in computing f[i,j] can be set by
the user, if the user knows the optimal or nearly optimal
coefficients and thresholds for the target images.
`
`13
`
`
`
`13
`
`
`
`
`
`W0 96/15620
`
`PCI‘lCA95l00635
`
`12
`
Error Energy Quantization for Minimum Entropy

Although the nonlinear predictor f[i,j] outperforms linear
predictors, it does not completely remove the statistical redundancy
in the image. The variance of the prediction errors e = I - f
strongly correlates with the smoothness of the image around the
predicted pixel I[i,j]. To model this correlation at a small
computational cost, we define an error energy estimator to be

    Δ = a·dh + b·dv + c·|ew|,                                   (4)
`
`10
`
`15
`
`where dh and dv are defined in Equation (2) which are reused
`
`here to quantify the variability of pixel values, and
`ew = I[i-1,j] - f[i-1,j].
`|ew|
`is chosen because large errors
`
`tend to occur consecutively.
`
`The coefficients a, b, and c can
`
`be determined,
`in an off-line design process, by the standard
`linear regression technique so that A is the least—squares
`estimator of the error strength |e| based on dh, dv,
`few].
`For algorithm efficiency, we recommend a = b = 1 and c = 2
`
`in
`
`Equation (4).
`
`In our experiments we also find that other
`
`20
`
`definitions of A can sometimes estimate the error energy
`better, such as
`
`A = amin {div dv} + bmax X {lewlt
`
`ienl}r
`
`(5)
`
`25
`
`depending on images, where en = I[i,j—1]
`
`- f[i,j—1].
`
By conditioning the error distribution on Δ, we can separate the
prediction errors into classes of different variances. Thus entropy
coding of errors using the estimated conditional probability p(e|Δ)
improves coding efficiency over using p(e). For time and space
efficiency, we quantize Δ to L levels. In practice, L = 8 is found
to be sufficient; larger L improves coding efficiency only
marginally. Denote the Δ quantizer by Q, i.e., Q(Δ) ∈ {0,1,...,7}.
In entropy coding of prediction errors, we estimate and use eight
conditional probabilities p(e|Q(Δ)). Since Δ is a scalar random
variable, it requires only scalar quantization. The quantizer is
designed to minimize the conditional entropy of the errors.
`
In an off-line design process, we get a training set of (e,Δ) pairs
from test images, and use the standard dynamic programming technique
to choose 0 = q0 < q1 < ... < q_{L-1} < qL = ∞, partitioning Δ into
L intervals such that

    - Σ_{d=0}^{L-1}  Σ_{qd ≤ Δ < q_{d+1}}  p(e) log p(e)        (6)
`
is minimized. In practice, we found that an image-independent Δ
quantizer whose bins are fixed,

    q1=5, q2=15, q3=25, q4=42, q5=60, q6=85, q7=140,            (7)

worked almost as well as the optimal image-dependent Δ quantizer.
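The estimator of Equation (4) with the recommended a = b = 1, c = 2,
and the fixed bins of (7), can be sketched as follows (an
illustrative Python sketch; the function names and the convention
that a boundary value falls into the higher bin are assumptions):

```python
import bisect

# Fixed, image-independent bin boundaries from Equation (7);
# q0 = 0 and qL = infinity are implicit.
Q_BINS = [5, 15, 25, 42, 60, 85, 140]

def error_energy(dh, dv, ew, a=1, b=1, c=2):
    """Equation (4): Delta = a*dh + b*dv + c*|ew|, where ew is the
    prediction error at the west neighbour."""
    return a * dh + b * dv + c * abs(ew)

def quantize(delta):
    """Map Delta to one of the L = 8 levels Q(Delta) in {0,...,7}."""
    return bisect.bisect_right(Q_BINS, delta)
```

In a smooth area Δ is small and falls into the lowest conditioning
level; near strong edges it saturates at level 7.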
`
Estimating L = 8 conditional error probabilities p(e|Q(Δ)) requires
only a modest amount of memory in the phase of entropy coding of
prediction errors. Furthermore, the small number of conditional
error probabilities involved means that even small images will
provide enough samples for context modeling to learn p(e|Q(Δ))
quickly in adaptive entropy coding.
`
`20
`
`Context Modeling of Prediction Errors
`
`The precision of the nonlinear GAP predictor I[i,j]
`
`can be significantly improved via context modeling, because
`
`gradients alone cannot adequately characterize some of the
`
`25
`
`more complex relationships between the predicted pixel I[i,j]
`
`and its surrounding. Context modeling of the prediction error
`
`e = I - I can exploit higher-order structures such as texture
`
`patterns in the image for further compression gains.
`
`Denote by a K-tuple C = {x0,x1,~,xx_1} a modeling
`context of f[i,j] that consists of K events xi.
`
`30
`
`For both space and time efficiencies and to avoid
`
`the problem of context dilution, we need to drastically reduce
`
`15
`
`15
`
`
`
`
`
`W0 96/15620
`
`PCTlCA95/00635
`
`14
`
`the number of possible contexts C by vector quantization of C.
`
Specifically, we consider a prediction context of eight events

    C = {x0, ..., x6, x7} = {n, w, nw, ne, nn, ww, 2n-nn, 2w-ww}  (8)

and quantize the context C = {x0, x1, ..., x7} to an 8-bit binary
number B = b7·b6···b0, using the prediction value f[i,j] as the
threshold, namely

    bk = 0 if xk ≥ f[i,j],  bk = 1 if xk < f[i,j],  0 ≤ k < K = 8.  (9)
`
Clearly, the number B (a binary vector codeword) encodes the texture
patterns in the modeling context, which are indicative of the
behavior of e. Also note that an event xk in a prediction context
need not be a neighboring pixel of I[i,j]; it can be a function of
some neighboring pixels. By letting x6 = 2n-nn and x7 = 2w-ww in the
above, and consequently setting b6 and b7 depending on whether
2n-nn < f[i,j] and 2w-ww < f[i,j], we can detect whether the
prediction value f[i,j] forms a convex or concave waveform with
respect to the neighboring pixels in the vertical and horizontal
directions, as depicted by Fig. 3. This convexity information is
useful in the context modeling of e.
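The quantization of Equations (8) and (9) can be sketched as follows
(an illustrative Python sketch; the function name, and the placement
of event x_k in bit k of B, are assumptions consistent with (9)):

```python
def texture_context(n, w, nw, ne, nn, ww, f):
    """Equations (8)-(9): quantize the eight-event vector
    C = {n, w, nw, ne, nn, ww, 2n-nn, 2w-ww} into an 8-bit number B,
    using the prediction value f as threshold (bit k = 1 iff x_k < f)."""
    events = [n, w, nw, ne, nn, ww, 2 * n - nn, 2 * w - ww]
    b = 0
    for k, x in enumerate(events):
        if x < f:
            b |= 1 << k
    return b
```

With every neighbour at or above f no comparison fires and B = 0;
with every neighbour below f all eight bits are set and B = 255.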
`
Since the variability of the neighboring pixels also influences the
error distribution, we combine the quantized error energy
0 ≤ ⌊Q(Δ)/2⌋ < L/2 with the quantized texture pattern 0 ≤ B < 2^8
to form compound prediction contexts, denoted by C(⌊Q(Δ)/2⌋, B).
This scheme can be viewed as a product quantization of two
independently treated image features: spatial texture patterns and
the energy of prediction errors.

At a glance, we would seemingly use 4 · 2^8 = 1024 different
compound contexts, since 0 ≤ ⌊Q(Δ)/2⌋ < L/2 = 4 and
0 ≤ B < 2^8 = 256. However, not all 2^8 binary codewords of B under
the quantizer defined by (9) are possible. If the prediction value
f[i,j] lies between n and nn, then the condition 2n-nn < f[i,j] is
either always true or always false. In other words, the outcome of
the test 2n-nn < f[i,j] in (9) is uniquely determined if n > f[i,j]
and nn < f[i,j], or if n < f[i,j] and nn > f[i,j]. This should be
apparent from Fig. 3.
`
By referring to (8) and (9), we see that in B, b6 is fixed whenever
b0 and b4 differ; likewise, b7 is fixed whenever b1 and b5 differ.
Because b6, b0, and b4 are not independent, they can yield only
2·2 + 2·1 = 6 combinations, and the same holds for b7, b1, and b5.
Thus, B represents only 6·6·4 = 144 rather than 256 valid texture
patterns. Finally, the total number of valid compound contexts in
error modeling is only 4 · 144 = 576.
`
`15
`
`Th model cost is still to high to estimate
`
`p(e|C(LQ(A)/2J,B)) for 576 compound contexts. Therefore, we
`estimate only the conditional expectations E{e}C(6,B)} using
`
`the corresponding sample means E(6,B) for different compound
`
`contexts. Computing §(6,B)
`
`involves only accumulating the
`
`20
`
`error terms in each compound context and maintaining a count
`
`on the occurrence of each context.
`
`For each compound context
`
`C(6,B), 0 5 6 < 4,
`
`0 S B < 144, we count the number N(6,B) of
`
`occurrences of C(6,B) for the purpose of computing the
`
`conditional sample mean ?(6,B). We only use one byte to store
`
`25
`
`the occurrence count. Whenever the count reaches some value
`
`nm‘x < 256, we scale down the count but let the corresponding
`?(6,B)
`intact. We recommend nm‘x = 128.
`
In order to update ē(δ,B) in error modeling, we use an accumulator
S(δ,B) of the errors e observed in context C(δ,B), and compute

    ē(δ,B) = S(δ,B) / N(δ,B)                                    (10)

whenever a compound context C(δ,B) occurs. We use a 16-bit signed
integer to store S(δ,B), whose range is more than sufficient in
practice. It would take more than 128 consecutive errors e = -255 or
e = 255 to violate the bounds -2^15 < S(δ,B) < 2^15, but we have
N_max = 128. Once N(δ,B) reaches 128, we re-scale the variables by
setting

    S(δ,B) = S(δ,B)/2,   N(δ,B) = 64.                           (11)
`
In summary, the coding system needs one byte for N(δ,B) and two
bytes for S(δ,B) for each of the 576 compound contexts. Thus the
total size of the memory required for context modeling of prediction
errors can be as small as 1728 bytes.

Besides being a technique to reduce memory use, rescaling also has
the beneficial side effect of aging the observed data. This is an
inexpensive way of adapting the context error modeling to
time-varying sources. Indeed, the scaling technique gave slightly
better compression in our experiments.
`
Context-based Adaptive Predictor via Error Feedback

The idea of gaining coding efficiency by context modeling of the
expected errors E{e|C(δ,B)} came from the observation that the
conditional mean ē(δ,B) is generally not zero in a given context
C(δ,B). This does not contradict the well-known fact that the DPCM
prediction error, without conditioning on the contexts, follows a
Laplacian (symmetric exponential) distribution, and hence is zero
mean, for most continuous-tone images. The Laplacian distribution
can be viewed as a composition of distributions of different means
and different variances. Context modeling of prediction errors is a
means to separate these distributions. Therefore, the more biased
ē(δ,B) is from zero, the more effective the modeling is. Since the
conditional mean ē(δ,B) is the most likely prediction error in a
given compound context C(δ,B), we can improve the pre
`can improve the pre