WORLD INTELLECTUAL PROPERTY ORGANIZATION
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
(51) International Patent Classification 6: H04N 1/41
`
(11) International Publication Number: WO 96/15620 A1
`
(43) International Publication Date: 23 May 1996 (23.05.96)
`
(21) International Application Number: PCT/CA95/00635
`
(81) Designated States: CA, JP, US, European patent (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).
`
(22) International Filing Date: 8 November 1995 (08.11.95)
`
(30) Priority Data:
9422738-6    10 November 1994 (10.11.94)    GB

Published
With international search report.
Before the expiration of the time limit for amending the claims and to be republished in the event of the receipt of amendments.
`
(71) Applicant (for all designated States except US): THE CHINESE UNIVERSITY OF HONG KONG [GB/GB]; Research Administration Office, Room 226, Pi Chiu Building, Shatin, New Territories (HK).
`
(71)(72) Applicant and Inventor: WU, Xiaolin [CA/CA]; 17 - 683 Windemere Road, London, Ontario N5X 3T9 (CA).
`
(72) Inventor; and
(75) Inventor/Applicant (for US only): MEMON, Nasir [IN/US]; 128 Buena Vista Drive, DeKalb, IL 60115 (US).
`
(74) Agent: MANNING, Gavin N.; Oyen Wiggs Green & Mutala, 480 - 601 West Cordova Street, Vancouver, British Columbia V6B 1G1 (CA).
`
(54) Title: CONTEXT-BASED, ADAPTIVE, LOSSLESS IMAGE CODEC
`
[Abstract figure: block diagram of system 10 showing gradient-adjusted prediction, conditional probabilities estimation, error modeling, and entropy coding stages.]
`
`(57) Abstract
`
An encoding/decoding method is provided for lossless (reversible) compression of digital pictures of all types, including continuous-tone images, graphics, multimedia images of mixed text, graphics and photographs, binary documents and drawings. Continuous-tone mode and binary mode are identified on a pixel-by-pixel basis (12). In continuous-tone mode, context modeling and prediction (20, 22, 26) are employed, involving mostly integer arithmetic and simple logic in a conceptually sophisticated scheme. Both the encoding and decoding techniques are suitable for sequential and progressive transmission, although different specific algorithms may be employed for the different specific cases. The system is symmetric, meaning that the encoder and decoder have the same time and space complexities.
`
`Sony, Ex. 1023, p.1
`
`
`
FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.

AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CS  Czechoslovakia
CZ  Czech Republic
DE  Germany
DK  Denmark
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakhstan
LI  Liechtenstein
LK  Sri Lanka
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SI  Slovenia
SK  Slovakia
SN  Senegal
TD  Chad
TG  Togo
TJ  Tajikistan
TT  Trinidad and Tobago
UA  Ukraine
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
`
`
`CONTEXT-BASED, ADAPTIVE, LOSSLESS IMAGE CODEC
`
`CLAIM OF PRIORITY
`
The present application claims partial priority of British Provisional Patent Application Serial No. 9422738-6, filed 10 November 1994.
`
`BACKGROUND OF THE INVENTION
`
With rapidly-advancing computer, telecommunication, and digital imaging technologies, there is an astronomical amount of image data for a wide range of applications such as education, entertainment, medical imaging, space exploration, electronic publishing, visual arts, etc. This rapid growth of image data puts punishing burdens on computer storage and visual communication bandwidth. Thus image compression becomes a pressing technical challenge in visual communications and computing, without which it will be difficult to build, deploy, and use cost-effective multimedia information systems.
`
Lossless compression is a form of compression where an image can be reconstructed without any loss of information. Lossless image compression is required by medical imaging, satellite/aerial imaging, image archiving, preservation of precious art work and documents, the press, and any applications demanding ultra-high image fidelity. Furthermore, lossless image coding is the necessary last step of many lossy image compression systems, such as lossless compression of codeword indices in vector quantization (VQ), and lossless compression of transform coefficients in Discrete Cosine Transform (DCT) and wavelet/subband-based coding.
`
There exists a large body of literature on lossless image compression algorithms and systems, such as the IBM Q-coder and the JPEG lossless coder. Among notable patents and publications are the US patents and research publications listed below:
`
`
4,463,342   1984   IBM.
4,749,983   07/1988   Langdon.
4,969,204   11/1989   Melnychuck et al.
5,050,230   09/1990   Jones et al.

Universal Modeling and Coding -- J. Rissanen and G. Langdon, 1981, IEEE Trans. Inform. Theory, vol. IT-27.

A Universal Data Compression System -- J. Rissanen, 1983, IEEE Trans. Inform. Theory, vol. IT-29.

Parameter Reduction and Context Selection for Compression of the Gray-Scale Images -- S. Todd, G. Langdon, and J. Rissanen, 1985, IBM J. Res. & Develop.

Comparing the Lossless Image Compression Standards and Universal Context Modelling -- R. Arps, M. Weinberger, T. Truong, and J. Rissanen, Proc. of the Picture Coding Symposium, Sacramento, September 1994.

On the JPEG Model for Lossless Image Compression -- G. Langdon, A. Gulati, and E. Seiler, Proc. of 1992 Data Compression Conf.

New Methods for Lossless Image Compression Using Arithmetic Coding -- P. Howard and J. Vitter, 1992, Inf. Process. & Manag., vol. 28.
`
The currently achievable lossless compression ratio is still modest, typically from 1.5:1 to 2.5:1. For instance, in contrast to the success of JPEG's lossy compression standard, the current JPEG lossless compression standard has sufficiently poor coding efficiency that it is seldom used in practice.

In 1994, ISO and JPEG solicited proposals for the next international standard for lossless image compression. The present invention is a result of the inventors' response to the ISO solicitation. The lead inventor, Xiaolin Wu, developed a context-based, adaptive, lossless image coding and decoding technique (herein CALIC). Among nine proposals that were submitted to ISO for its initial evaluation as candidates for the lossless image compression standard in 1995, the present CALIC system ranked first according to a criterion that accounts for both coding efficiency and algorithm simplicity.
`
`
Known prior art on lossless compression of continuous-tone images is based on the principle of predictive coding. An image is traversed, and pixels are encoded, in a fixed order, typically in raster scan sequence. Previously encoded pixels that are known to both the encoder and the decoder are used to predict the upcoming pixels. The prediction errors, rather than the pixels themselves, are entropy encoded by Huffman or arithmetic coding. The original image is reconstructed by adding the error term back to the prediction value. Predictive coding works because the histogram of the errors is much more concentrated (heavily biased toward 0) than the histogram of the pixel values, resulting in a significantly smaller zero-order entropy for the former than for the latter. Among the numerous prediction schemes in the literature, the simplest type is a fixed linear predictor such as those used under the current lossless JPEG standard.
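The predict-encode-reconstruct loop described above can be sketched in a few lines of Python. This is a generic one-dimensional DPCM illustration with a west-neighbor predictor; the function names and the initial prediction of 0 are illustrative assumptions, not the patent's own scheme.

```python
def dpcm_errors(row):
    """Encode a scan line as prediction errors against the west neighbor."""
    errors, prev = [], 0  # 0 is an assumed prediction for the first pixel
    for pixel in row:
        errors.append(pixel - prev)  # transmit the error, not the pixel
        prev = pixel
    return errors

def dpcm_reconstruct(errors):
    """Reverse the process: add each error back onto the prediction."""
    row, prev = [], 0
    for e in errors:
        prev += e  # prediction + error = original pixel
        row.append(prev)
    return row

row = [100, 102, 101, 180, 181]        # a small edge at the fourth pixel
errs = dpcm_errors(row)                # errors cluster near 0 except at the edge
assert errs == [100, 2, -1, 79, 1]
assert dpcm_reconstruct(errs) == row   # lossless round trip
```

Note how the error sequence is heavily biased toward 0 except where the predictor crosses the edge, which is exactly the concentration of the error histogram that makes predictive coding effective.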
`
A linear predictor can be optimized on an image-by-image or even block-by-block basis via linear regression. However, such an optimization is expensive and brings only modest improvement in coding efficiency. Moreover, the performance of linear predictors is not robust in the areas of edges. Adaptive, non-linear predictors can adjust parameters according to the local edge strengths and orientations, if edges exist. The adjustment of predictor parameters can be made very efficient since it is based on local information.
`lossless image compression inherited
Historically, lossless image compression inherited the theoretical framework and methodology of text compression. Statistical modeling of the source being compressed plays a central role in any data compression system. Suppose that we encode a finite source x1, x2, ..., xn sequentially. The optimal code length of the sequence in bits is then

-log ∏_{i=0}^{n-1} p(x_{i+1} | x_i, ..., x_1),    (1)
`
given the assignments of conditional probabilities. Arithmetic coding can approach this code length of the source. The challenge is to assign the conditional probabilities p(x_{i+1} | x_i, ..., x_1) so as to maximize the product given above, and hence minimize the code length. The achievable compression is therefore governed by a scheme, or a model, that can assign high conditional probability distributions p(x_{i+1} | x_i, ..., x_1) to the given sequence.
`
Fitting a given source well with statistical models is a difficult and computationally very expensive task. Context modeling of continuous-tone images is made even more difficult by the large number of model states (contexts), and by the attendant time and space complexities for modeling. If the number of model parameters is too large with respect to the image resolution, there will be insufficient samples to obtain good estimates of conditional probabilities on the model states, leading to poor coding efficiency. This is known as a "context dilution problem." This problem was theoretically formulated by Rissanen in the framework of stochastic complexity as the "model cost." Rissanen's work proves that the high complexity of a model can reduce coding efficiency, as observed by many data compression practitioners. What is needed are innovative algorithmic techniques to reduce the model complexity, improving both coding and computational efficiency.
`
SUMMARY OF THE INVENTION

According to the invention, an encoding/decoding method is provided for lossless (reversible) compression of digital pictures of all types, including continuous-tone images, graphics, multimedia images of mixed text, graphics and photographs, binary documents and drawings. Continuous-tone mode and binary mode are identified on a pixel-by-pixel basis. In continuous-tone mode, context modeling and prediction are employed, involving mostly integer arithmetic and simple logic in a conceptually sophisticated scheme. Both the encoding and decoding techniques are suitable for sequential and progressive transmission, although different specific algorithms may be employed for the different specific cases. The system is symmetric, meaning that the encoder and decoder have the same time and space complexities.

A primary reason for the improved coding efficiency by context modeling of errors and error feedback lies in the fact that the prediction error sequence is a composition of multiple sources of distinct underlying statistics. The use of proper contexts to separate the statistically distinct sources of errors from the code stream can yield a significant reduction in conditional entropies.

The invention will be better understood by reference to the following detailed description in connection with the accompanying drawings.
`
`The invention will be better understood by reference
`
`to the following detailed description in connection with the
`
`accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic description of an encoding process for compression according to the invention, in which the decoding process is the reverse.

Fig. 2 illustrates labeling of neighboring pixels for context-based prediction and modeling.

Fig. 3 shows four possible convexity relationships between a GAP predictor Î and two neighboring pixels.

Fig. 4 illustrates a two-stage adaptive prediction scheme via context modeling of prediction errors and error feedback.

Fig. 5 is a chart depicting a method for determining a possible hierarchy of contexts (causal templates around a coded pixel marked by '?') in which the optimal modeling context is selected.

Fig. 6 is a two-part graph illustrating a sign-flipping technique to sharpen conditional error probability density functions for entropy reduction without increasing modeling space.

Fig. 7 is a block diagram of a four-pass encoding scheme for progressive transmission according to the invention.

Fig. 8 is a diagram and key illustrating interlaced sampling for the first pass and the configuration of prediction and modeling context for such a pass.

Fig. 9 is a diagram and key illustrating interlaced sampling for the second pass and the configuration of prediction and modeling context for such a pass.

Fig. 10 is a diagram and key illustrating functionally lossless reconstruction at the third pass and the configuration of prediction and modeling context used in this pass.
`
DESCRIPTION OF SPECIFIC EMBODIMENTS

Referring to Fig. 1, before actual compression, a system 10 operating according to the invention includes a decision maker 12 which dynamically selects, on parts of the image, between a compression scheme for continuous-tone image data and a compression scheme for binary image data, based on the context of the pixel data. In binary mode, the image data is compressed by a first encoder 14 using context-based, adaptive, ternary entropy encoding. In the continuous-tone mode, the image data is compressed by a second encoder 16 using context-based, adaptive, non-linear predictive coding. Before encoding or decoding an image, the system is designed to provide or select between sequential transmission and progressive transmission, and to encode or decode image data by the appropriate types of context-sensitive entropy-encoding and decoding processors. Image data encoded and decoded in raster scan order in a single pass through the image is preferably transmitted by sequential transmission. The coding process uses prediction contexts that involve only the previous two or three scan lines of coded pixels. Consequently, the encoding and decoding processes require only a simple double buffer 18 that holds only the two or three rows of pixels that immediately precede the current pixel. (Where greater buffer space is available, the system can achieve progressive transmission via interlaced subsampling that encodes, and accordingly decodes, an image in four passes.)
`
In coding a picture, the CALIC system dynamically operates in two modes: binary and continuous-tone modes. The binary mode is for the situation in which the current locality of interest in the input image has no more than two distinct intensity values; hence it is designed for a more general class of images or subimages than the class of black-and-white images. An innovation of the system is to distinguish between binary and continuous-tone types of pictures on a local, rather than a global, basis. The system automatically selects one of the two modes depending on the context of the current pixel. In the binary mode, the context-based adaptive entropy coder 14 is used to code three symbols, including an escape symbol. This distinction between binary and continuous-tone modes is important because the coding methodologies are very different in the two modes, and it contributes to the universality and robustness of the CALIC system over a wide range of pictures, including those that mix text, graphics and photographs in multimedia applications.

The selection between the continuous-tone and binary modes is based on pixel context. The mode selection is automatic and completely transparent to the user. No side information about mode switching is required. The coded image only needs a header of five bytes: two for width, two for height, and one for depth, or the number of bits of intensity resolution.
`
In the continuous-tone mode, a mechanism 20 is provided to compress pixel values by context-based, adaptive, predictive coding. To predict the current pixel value, a predictor 22 is provided for a local intensity gradient to be estimated based on the neighboring pixels in the horizontal and vertical directions. These estimates are used to detect the magnitude and orientation of edges in the input image so that adjustments can be made to the prediction accordingly. This gradient-adjusted prediction (GAP), denoted by Î, of I is further corrected based on the particular pattern that the neighboring pixels exhibit (shown here as an output of an error modeling function 26). This is necessary because gradients alone cannot adequately characterize some of the more complex relationships between the predicted pixel and its surroundings. Context modeling can exploit higher-order structures such as texture patterns in the image. This is done by quantizing the neighborhood into different conditioning classes, and estimating the expectation of the GAP prediction error e = I - Î in each of these classes separately. The reasons for estimating conditional expectations rather than the conditional density functions, as commonly suggested in the literature, are to prevent the problem of context dilution (sample counts are insufficient to obtain good estimates), and to reduce time and space complexities. The conditional error expectation estimate ē is then fed back to the GAP predictor 22 to generate a second and improved prediction, denoted by Ĩ, which is equal to Î + ē. The net effect is a context-based, adaptive, non-linear predictor that can correct itself by learning from the mistakes it made in the past and in the same context. This is a key feature distinguishing the prediction scheme of the CALIC system from existing prediction schemes.
`
The final prediction error ε = I - Ĩ is entropy encoded. In driving the entropy coder 16, a modest number (between 8 and 32) of conditional probabilities of prediction errors ε are estimated (by an estimator 24) depending on an estimate of the prediction error energy. The error energy estimator is a linear function of the horizontal and vertical gradients and of the prediction errors at the preceding pixels. It can be optimized off line via linear regression for specific types of images. The conditioning classes for entropy coding are generated by quantizing the error energy estimator. The quantization can also be optimized via off-line dynamic programming to minimize the total code length of a set of training images. In addition, the CALIC system uses a novel and elegant technique to "sharpen" the conditional probabilities (thus reducing the underlying entropy) for entropy coding, by a simple sign-flipping mechanism 28, without any increase in modeling space.
`
Gradient-Adjusted Prediction

A nonlinear predictor according to the invention adapts itself to the image gradients near the predicted pixel. This predictor improves the robustness of traditional DPCM predictors, particularly in areas of edges. The basic function is now explained.
`
Denote an input image of width W and height H by I[i,j], 0 ≤ i < W, 0 ≤ j < H. In order not to obscure the concept of the proposed compression algorithm, we only consider in the following development the sequential coding of interior pixels I[i,j], 2 ≤ i < W-2, 2 ≤ j < H-2. Many treatments of boundary pixels are possible, and they do not make a significant difference in the final compression ratio due to the small population of boundary pixels. For instance, boundary pixels can be coded by a simple DPCM scheme as they are encountered in the raster scan.

To facilitate the prediction of I[i,j] and entropy coding of the prediction error via context modeling, we compute the following quantities:
`
dh = |I[i-1,j] - I[i-2,j]| + |I[i,j-1] - I[i-1,j-1]| + |I[i+1,j-1] - I[i,j-1]|
dv = |I[i-1,j] - I[i-1,j-1]| + |I[i,j-1] - I[i,j-2]| + |I[i+1,j-1] - I[i+1,j-2]|    (2)
`
Clearly, dh and dv are estimates, within a scaling factor, of the gradients of the intensity function near pixel I[i,j] in the horizontal and vertical directions. The values of dh and dv are used to detect the magnitude and orientation of edges in the input image, and to make the necessary adjustments in the prediction. We aim to alleviate the problem that the precision of existing DPCM-type predictors can be adversely affected by edges. In Equations (2), three absolute differences are used for d in each direction. This has been found to give the best compression results. Two or one absolute differences can be used here for lower complexity, with a small loss in performance. Efficient incremental and/or parallel schemes for evaluating dh and dv are straightforward. For instance, to avoid unnecessary repeated computations, one can store the values of |I[·,·] - I[·,·]| associated with preceding pixels for future reference. This only requires an array of size W.
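Assuming the image is held as a 2-D array indexed `I[i][j]`, with `i` the column (0 ≤ i < W) and `j` the row (0 ≤ j < H) as in the text, the gradient estimates of Equation (2) can be sketched as:

```python
def gradients(I, i, j):
    """Estimate, within a scaling factor, the horizontal (dh) and vertical
    (dv) gradients of the intensity function near I[i][j], per Equation (2).
    Valid for interior pixels 2 <= i < W-2, 2 <= j < H-2."""
    dh = (abs(I[i-1][j] - I[i-2][j]) +
          abs(I[i][j-1] - I[i-1][j-1]) +
          abs(I[i+1][j-1] - I[i][j-1]))
    dv = (abs(I[i-1][j] - I[i-1][j-1]) +
          abs(I[i][j-1] - I[i][j-2]) +
          abs(I[i+1][j-1] - I[i+1][j-2]))
    return dh, dv

# Intensity rising with the column index i: a purely horizontal gradient,
# so dh is large and dv is zero.
ramp = [[10 * i] * 8 for i in range(8)]
assert gradients(ramp, 3, 3) == (30, 0)
```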
`
For simple denotation in the sequel, we let

n = I[i, j-1],  w = I[i-1, j],  ne = I[i+1, j-1],
nw = I[i-1, j-1],  nn = I[i, j-2],  ww = I[i-2, j],    (3)

meaning north, west, northeast, northwest, north-north, and west-west, respectively. The locations of these pixels with respect to I[i,j] are given in Fig. 2.
`
Based on dh and dv, a simple technique, as described by the following conditional statements, is used to make a gradient-adjusted prediction (GAP) Î[i,j] of I[i,j].
`
`
IF (dv - dh > 80) {sharp horizontal edge}
    Î[i,j] = w
ELSE IF (dv - dh < -80) {sharp vertical edge}
    Î[i,j] = n
ELSE {
    Î[i,j] = (w+n)/2 + (ne-nw)/4;
    IF (dv - dh > 32) {horizontal edge}
        Î[i,j] = (Î[i,j]+w)/2
    ELSE IF (dv - dh > 8) {weak horizontal edge}
        Î[i,j] = (3Î[i,j]+w)/4
    ELSE IF (dv - dh < -32) {vertical edge}
        Î[i,j] = (Î[i,j]+n)/2
    ELSE IF (dv - dh < -8) {weak vertical edge}
        Î[i,j] = (3Î[i,j]+n)/4
}
`
The procedure given above is parallelizable. This technique differs from the existing linear predictors in that it weights the neighboring pixels of I[i,j] according to the estimated gradients of the image. In effect, Î[i,j] is a simple, adaptive, nonlinear predictor. The predictor coefficients and thresholds given above were chosen empirically. A major criterion in choosing these coefficients is the ease of computation. For instance, most coefficients are powers of 2, to avoid multiplications and divisions. The multiples of 3 in the procedure above can be computed by a bit shift and an addition.
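A direct transcription of the GAP procedure into Python might look as follows; Python's floor division `//` stands in for the bit-shift divisions, so roundings may differ slightly from a particular hardware implementation:

```python
def gap_predict(n, w, ne, nw, dh, dv):
    """Gradient-adjusted prediction (GAP) of the current pixel from its
    causal neighbors n, w, ne, nw, following the thresholded procedure
    in the text (integer arithmetic throughout)."""
    if dv - dh > 80:        # sharp horizontal edge: predict from the west
        return w
    if dv - dh < -80:       # sharp vertical edge: predict from the north
        return n
    pred = (w + n) // 2 + (ne - nw) // 4
    if dv - dh > 32:        # horizontal edge
        pred = (pred + w) // 2
    elif dv - dh > 8:       # weak horizontal edge
        pred = (3 * pred + w) // 4
    elif dv - dh < -32:     # vertical edge
        pred = (pred + n) // 2
    elif dv - dh < -8:      # weak vertical edge
        pred = (3 * pred + n) // 4
    return pred

# Smooth area: the prediction is the average of the neighbors.
assert gap_predict(n=100, w=100, ne=100, nw=100, dh=5, dv=5) == 100
# Sharp horizontal edge (dv much larger than dh): the west pixel wins.
assert gap_predict(n=50, w=200, ne=50, nw=200, dh=10, dv=100) == 200
```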
`
It is possible, albeit quite expensive, to optimize the coefficients and thresholds for an image or a class of images, so that a norm of the expected prediction error E{‖I - Î‖} is minimized. It is not recommended that such an optimization process be carried out on an image-by-image basis. However, it is important to point out that the coefficients and thresholds in computing Î[i,j] can be set by the user, if the user knows the optimal or nearly optimal coefficients and thresholds for the target images.
`
`
Error Energy Quantization for Minimum Entropy
`
Although the nonlinear predictor Î[i,j] outperforms linear predictors, it does not completely remove the statistical redundancy in the image. The variance of prediction errors e = I - Î strongly correlates to the smoothness of the image around the predicted pixel I[i,j]. To model this correlation at a small computational cost, we define an error energy estimator to be

Δ = a·dh + b·dv + c·|ew|    (4)
`
where dh and dv are defined in Equation (2), which are reused here to quantify the variability of pixel values, and ew = I[i-1,j] - Î[i-1,j]. |ew| is chosen because large errors tend to occur consecutively. The coefficients a, b, and c can be determined, in an off-line design process, by the standard linear regression technique, so that Δ is the least-squares estimator of the error strength |e| based on dh, dv, and |ew|. For algorithm efficiency, we recommend a = b = 1 and c = 2 in Equation (4). In our experiments we also found that other definitions of Δ can sometimes estimate the error energy better, such as

Δ = a·min{dh, dv} + b·max{|ew|, |en|},    (5)

depending on the images, where en = I[i,j-1] - Î[i,j-1].
`
By conditioning the error distribution on Δ, we can separate the prediction errors into classes of different variances. Thus entropy coding of errors using the estimated conditional probability p(e|Δ) improves coding efficiency over using p(e). For time and space efficiency, we quantize Δ to L levels. In practice, L = 8 is found to be sufficient; larger L improves coding efficiency only marginally. Denote the Δ quantizer by Q, i.e., Q(Δ) in {0, 1, ..., 7}. In entropy coding of prediction errors, we estimate and use eight conditional probabilities p(e|Q(Δ)). Since Δ is a scalar random variable, it requires only scalar quantization. The quantizer is designed to minimize the conditional entropy of the prediction errors.
`
In an off-line design process, we get a training set of (e, Δ) pairs from test images, and use the standard dynamic programming technique to choose 0 = q_0 < q_1 < ··· < q_{L-1} < q_L = ∞, partitioning Δ into L intervals, such that

∑_{d=0}^{L-1} ∑_{q_d ≤ Δ < q_{d+1}} -p(e) log p(e)    (6)

is minimized.
`
In practice, we found that an image-independent Δ quantizer whose bins are fixed,

q_1 = 5, q_2 = 15, q_3 = 25, q_4 = 42, q_5 = 60, q_6 = 85, q_7 = 140,    (7)

worked almost as well as the optimal image-dependent Δ quantizer.
`
Estimating L = 8 conditional error probabilities p(e|Q(Δ)) requires only a modest amount of memory in the phase of entropy coding of prediction errors. Furthermore, the small number of involved conditional error probabilities means that even small images will provide enough samples for context modeling to learn p(e|Q(Δ)) quickly in adaptive entropy coding.
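A sketch of the error energy estimator of Equation (4), with the recommended coefficients, together with the fixed-bin quantizer of (7); placing a Δ that lands exactly on a boundary q_d into the upper bin is an illustrative choice, not mandated by the text:

```python
import bisect

Q_BINS = [5, 15, 25, 42, 60, 85, 140]  # q1..q7 of Equation (7)

def error_energy(dh, dv, ew, a=1, b=1, c=2):
    """Delta = a*dh + b*dv + c*|ew| of Equation (4), with the recommended
    coefficients a = b = 1, c = 2."""
    return a * dh + b * dv + c * abs(ew)

def quantize(delta):
    """Q(Delta) in {0, ..., 7}: the index of the interval containing delta."""
    return bisect.bisect_right(Q_BINS, delta)

assert quantize(0) == 0    # small energy: smooth area, low-variance class
assert quantize(200) == 7  # energy above q7 = 140: the top class
assert quantize(error_energy(dh=30, dv=40, ew=-12)) == 6
```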
`
`Context Modeling of Prediction Errors
`
The precision of the nonlinear GAP predictor Î[i,j] can be significantly improved via context modeling, because gradients alone cannot adequately characterize some of the more complex relationships between the predicted pixel I[i,j] and its surroundings. Context modeling of the prediction error e = I - Î can exploit higher-order structures such as texture patterns in the image for further compression gains.

Denote by a K-tuple C = {x_0, x_1, ..., x_{K-1}} a modeling context of Î[i,j] that consists of K events x_i.
For both space and time efficiency, and to avoid the problem of context dilution, we need to drastically reduce the number of possible contexts C by vector quantization of C. Specifically, we consider a prediction context of eight events

C = {x_0, ..., x_6, x_7} = {n, w, nw, ne, nn, ww, 2n-nn, 2w-ww}    (9)

and quantize the context C = {x_0, x_1, ..., x_7} to an 8-bit binary number B = b_7 b_6 ··· b_0, using the prediction value Î as the threshold, namely

b_k = 0 if x_k ≥ Î[i,j],  b_k = 1 if x_k < Î[i,j],    0 ≤ k < K = 8.
`
Clearly, the number B (a binary vector codeword) encodes the texture patterns in the modeling context which are indicative of the behavior of e. Also note that an event x_i in a prediction context need not be a neighboring pixel of I[i,j]; it can be a function of some neighboring pixels. By letting x_6 = 2n-nn and x_7 = 2w-ww in the above, and consequently setting b_6 and b_7 depending on whether 2n-nn < Î[i,j] and 2w-ww < Î[i,j], we can detect whether the prediction value Î[i,j] forms a convex or concave waveform with respect to the neighboring pixels in the vertical and horizontal directions, as depicted in Fig. 3. This convexity information is useful in the context modeling of e.
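The texture codeword B can be sketched as follows, with the events ordered as in (9); bit k of B is set when event x_k falls below the prediction Î:

```python
def texture_pattern(n, w, nw, ne, nn, ww, pred):
    """Quantize the eight context events of (9) into the codeword B,
    setting bit k when event x_k lies below the prediction pred = I_hat."""
    events = [n, w, nw, ne, nn, ww, 2 * n - nn, 2 * w - ww]
    b = 0
    for k, x in enumerate(events):
        if x < pred:
            b |= 1 << k
    return b

# Every neighbor below the prediction: all eight bits set.
assert texture_pattern(10, 10, 10, 10, 10, 10, pred=50) == 0b11111111
# A flat neighborhood at the prediction value: B = 0.
assert texture_pattern(50, 50, 50, 50, 50, 50, pred=50) == 0
```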
`
Since the variability of neighboring pixels also influences the error distribution, we combine the quantized error energy 0 ≤ ⌊Q(Δ)/2⌋ < L/2 with the quantized texture pattern 0 ≤ B < 2^8 to form compound prediction contexts, denoted by C(⌊Q(Δ)/2⌋, B). This scheme can be viewed as a product quantization of two independently treated image features: spatial texture patterns and the energy of prediction errors.
`
At a glance, we would seemingly use 4 · 2^8 = 1024 different compound contexts, since 0 ≤ ⌊Q(Δ)/2⌋ < L/2 = 4 and 0 ≤ B < 2^8 = 256. However, not all 2^8 binary codewords of the B quantizer defined by (9) are possible. If the prediction value Î[i,j] lies in between n and nn, then the condition 2n-nn < Î[i,j] is either always true or always false. In other words, the outcome of the test 2n-nn < Î[i,j] in (9) is uniquely determined if n > Î[i,j] and nn < Î[i,j], or if n < Î[i,j] and nn > Î[i,j]. This should be apparent from Fig. 3. By referring to (8) and (9), we see that in B, b_6 is fixed if b_0 and b_4 are different; likewise, b_7 is fixed if b_1 and b_5 are different. Because b_6, b_0, and b_4 are not independent, they can only yield 2·2 + 2·1 = 6 combinations. The same holds for b_7, b_1, and b_5. Thus, B represents only 6·6·4 = 144, rather than 256, valid texture patterns. Finally, the total number of valid compound contexts in error modeling is only 4 · 144 = 576.
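The count of 144 valid patterns can be verified by brute-force enumeration. The sketch below encodes the constraint that a convexity bit is forced to agree with its pixel bit when the two pixels straddle the prediction (e.g., when n < Î ≤ nn, the value 2n-nn also lies below Î):

```python
from itertools import product

def is_valid(bits):
    """bits = (b0, ..., b7) per (9): b6 must equal b0 when b0 != b4
    (n and nn straddle the prediction), and b7 must equal b1 when b1 != b5
    (w and ww straddle the prediction)."""
    b0, b1, b2, b3, b4, b5, b6, b7 = bits
    if b0 != b4 and b6 != b0:
        return False
    if b1 != b5 and b7 != b1:
        return False
    return True

valid = sum(is_valid(bits) for bits in product((0, 1), repeat=8))
assert valid == 144  # matches the 6 * 6 * 4 count in the text
```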
`
The model cost is still too high to estimate p(e|C(⌊Q(Δ)/2⌋, B)) for 576 compound contexts. Therefore, we estimate only the conditional expectations E{e|C(δ,β)}, using the corresponding sample means ē(δ,β) for the different compound contexts. Computing ē(δ,β) involves only accumulating the error terms in each compound context and maintaining a count of the occurrences of each context. For each compound context C(δ,β), 0 ≤ δ < 4, 0 ≤ β < 144, we count the number N(δ,β) of occurrences of C(δ,β) for the purpose of computing the conditional sample mean ē(δ,β). We use only one byte to store the occurrence count. Whenever the count reaches some value N_max < 256, we scale down the count but leave the corresponding ē(δ,β) intact. We recommend N_max = 128.
`
In order to update ē(δ,β) in error modeling, we use an accumulator S(δ,β) of e(δ,β), and compute

ē(δ,β) = S(δ,β) / N(δ,β)

whenever a compound context C(δ,β) occurs. We use a 16-bit signed integer to store S(δ,β), whose range is more than sufficient in practice. It would take more than 128 consecutive errors of e(δ,β) = -255 or e(δ,β) = 255 to violate the bounds -2^15 < S(δ,β) < 2^15, but we have N_max = 128. Once N(δ,β) reaches 128, we rescale the variables by setting

S(δ,β) = S(δ,β)/2;  N(δ,β) = 64.    (11)

In summary, the coding system needs one byte for N(δ,β) and two bytes for S(δ,β) for each of the 576 compound contexts. Thus the total size of the memory required by context modeling of prediction errors can be as small as 1728 bytes.
`
Besides being a technique to reduce memory use, rescaling also has the beneficial side effect of aging the observed data. This is an inexpensive way of adapting the context error modeling to time-varying sources. Indeed, the scaling technique gave slightly better compression in our experiments.
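The per-context bookkeeping described above (a one-byte count N, a 16-bit accumulator S, and the halving of Equation (11)) can be sketched as follows; integer floor division stands in for the S/2 of (11):

```python
class ContextStats:
    """Running error statistics for one compound context C(delta, beta)."""
    N_MAX = 128  # recommended rescaling threshold

    def __init__(self):
        self.n = 0  # occurrence count N, fits in one byte
        self.s = 0  # error accumulator S, fits in a 16-bit signed integer

    def update(self, error):
        """Accumulate a prediction error, rescale at N = 128 per Equation
        (11), and return the conditional sample mean S/N."""
        self.s += error
        self.n += 1
        if self.n >= self.N_MAX:
            self.s //= 2   # integer halving of S
            self.n = 64
        return self.s / self.n

ctx = ContextStats()
for _ in range(300):       # a long run of errors biased toward +3
    mean = ctx.update(3)
assert mean == 3.0         # the conditional mean tracks the bias
assert ctx.n <= 128        # the count never overflows its one byte
```

Halving S while resetting N to 64 preserves the running mean but progressively discounts older observations, which is the aging effect noted in the text.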
`
Context-based Adaptive Predictor via Error Feedback
The idea of gaining coding efficiency by context modeling of expected errors E{e|C(δ,β)} came from the observation that the conditional mean ē(δ,β) is generally not zero in a given context C(δ,β). This does not contradict the well-known fact that the DPCM prediction error, without conditioning on the contexts, follows a Laplacian (symmetric exponential) distribution, and hence is zero-mean, for most continuous-tone images. The Laplacian distribution can be viewed as a composition of distributions of different means and different variances. Context modeling of prediction errors is a means to separate these distributions. Therefore, the more biased ē(δ,β) is from zero, the more effective the modeling is.
`
Since the conditional mean ē(δ,β) is the most likely prediction error in a given compound context C(δ,β), we can improve the prediction by feeding back ē(δ,β) and adjusting the prediction Î to Ĩ = Î + ē(δ,β).

In order not to over-adjust the GAP predictor, we actually consider the new prediction error ε = I - Ĩ, rather than e = I - Î, in context-based error modeling. The context modeling of ε in turn leads to an improved predictor for