High-Resolution Still Picture Compression*
Mladen Victor Wickerhauser
Department of Mathematics, Washington University,
Campus Box 1146, St. Louis, Missouri 63130
`
1. INTRODUCTION

We shall consider the problem of storing, transmitting, and manipulating digital electronic images. Because of the file sizes involved, transmitting images will always consume large amounts of bandwidth, and storing images will always require hefty resources. Because of the large number N of pixels in a high-resolution image, manipulation of digital images is infeasible without low-complexity algorithms, i.e., O(N) or O(N log(N)). Our goal is to describe some new methods, firmly grounded in harmonic analysis and the mathematical theory of function spaces, which promise to combine effective image compression with low-complexity image processing. We shall take a broad perspective, but we shall also compare specific new algorithms to the state of the art.
Roughly speaking, most image compression algorithms split into three parts: invertible transformation, lossy quantization or rank reduction, and entropy coding (or redundancy removal). There are a few algorithms which differ fundamentally from this scheme, e.g., the collage coding algorithm [4], or pure vector quantization of the pixels. The former uses the deep observation that pictures of natural objects exhibit self-similarity at different scales; we prefer to avoid relying on this phenomenon, since our images may not be "natural." The latter uses a complex algorithm to build a superefficient empirical vocabulary to describe an ensemble of images; we prefer to avoid training our algorithm with any sample of images, to avoid the problem of producing a sufficiently large and suitable ensemble.
There has emerged an international standard for picture compression, promulgated by the Joint Photographic Experts Group (JPEG), which is remarkably effective in reducing the size of digitized image files. JPEG is two-dimensional discrete cosine transform (DCT) coding of 8 × 8 blocks of pixels, followed by a possibly proprietary quantization scheme on the DCT amplitudes, followed by either Huffman, Lempel-Ziv-Welch, or arithmetic coding of the quantized coefficients. It has some drawbacks; for example, several incompatible implementations are allowed under the standard. Also, JPEG degrades ungracefully at high and ultrahigh compression ratios, and it makes certain assumptions about the picture that are violated by zooming in or out, or other transformations. It works so well on typical photographs and many other images, however, that it has become the algorithm to beat in most applications. JPEG fails most noticeably on high-resolution (i.e., oversampled) data, and on images which must be closely examined by humans or machines.

* Research supported in part by AFOSR Grant F49620-92-J-0106 and by FBI Contract A10?183.
Alternatives to JPEG have recently appeared, and we shall discuss three of these: the fast discrete wavelet transform, the local trigonometric or lapped orthogonal transform, and the best-basis algorithm. These differ in the transform coding step; i.e., instead of DCT they first apply the wavelet transform, lapped orthogonal transform, or wavelet packet transform, possibly followed by a best-basis search. The resulting stream of amplitudes is then quantized and coded to remove redundancy.
Existing image processing algorithms work on the original pixels or else on the (2-dimensional) Fourier transform of the pixels. If the image has been compressed, it must be uncompressed prior to such processing. Alternatively, we can try to devise algorithms which transform the compressed parameters. If compression is accomplished by retaining only a low-rank approximation to the signal, then we can use more complex algorithms for subsequent processing. To
`
1061-2004/92 $4.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.
`
`
`
put this idea into practice, we need to retain useful analytic properties such as the large derivatives used in edge detection. These will not be preserved by purely information-theoretic coding such as pure vector quantization, but we can choose transform coding methods whose mathematical properties combine efficient compression with good analytic behavior.
`
2. TRANSFORM CODING IMAGE COMPRESSION

A digitally sampled image can represent only a band-limited function, since there is no way of resolving spatial frequencies higher than half the pixel pitch. Band-limited functions are smooth; in fact they are entire analytic, which means that at each point they can be differentiated arbitrarily often and the resulting Taylor series converges arbitrarily far away. Since digitally sampled images faithfully reproduce the originals as far as our eyes can tell, we may confidently assume that our images are in fact smooth and well approximated by band-limited functions. Another way of saying this is that adjacent pixels are highly correlated, or that there is a much lower-rank description of the image which captures virtually all of the independent features. In transform coding, we seek a basis of these features, in which the coordinates are less highly correlated or even uncorrelated. These coordinates are then approximated to some precision, and that approximate representation is further passed through a lossless redundancy remover.
Figure 1 depicts a generic image compression transform coder. It embodies a three-step algorithm. The first block (Transform) applies an invertible coordinate transformation to the image. We think of this transformation as implemented in real arithmetic, with enough precision to keep the truncation error below the quantization error introduced by the original sampling. The output of this block will be treated as a stream of real numbers, though in practice we are always limited to a fixed precision.
The second block (Quantize) replaces the real number coordinates with lower-precision approximations which can be coded in a (small) finite number of digits. If the transform step is effective, then the new coordinates are mostly very small and can be set to zero, while only a few coordinates are large enough to survive. The output of this block is a stream of small integers, most of which are the same (namely 0). If our goal is to reduce the rank of the representation, we can now stop and take only the surviving amplitudes and tag them with some identifiers. If our goal is to reduce the number of bits that we must transmit or store, then we should proceed to the next step.
The third block (Remove redundancy) replaces the stream of small integers with some more efficient alphabet of variable-length characters. In this alphabet the frequently occurring letters (like "0") are represented more compactly than rare letters.
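As a toy illustration of the last two blocks, the sketch below pairs a uniform scalar quantizer with a run-length coder. Both are simplified stand-ins (a real coder would use Huffman or arithmetic coding for redundancy removal), and all function names are our own:

```python
def quantize(coeffs, step):
    # Uniform scalar quantizer: coefficients smaller than about step/2 map to 0.
    return [round(c / step) for c in coeffs]

def dequantize(q, step):
    # Approximate inverse; this rounding is the only loss in the pipeline.
    return [v * step for v in q]

def run_length_encode(q):
    # Stand-in redundancy remover: collapse runs into (value, count) pairs,
    # which is effective when most quantized values are 0.
    out = []
    for v in q:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out
```

A stream like [0.04, 3.2, -0.01] quantizes (with step 0.5) to [0, 6, 0]: one surviving amplitude and a run of zeros for the coder to compress.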
`
3. DECORRELATION BY TRANSFORMATION

We will consider six pixel transformations which have proven useful in decorrelating smooth pictures.
`
3.1. Karhunen-Loeve
Let us now fix an image size, say height H and width W, with N = H × W pixels, and treat the individual pixels as random variables. Our probability space will consist of some collection of pictures S = {S_1, S_2, ..., S_M}, where M is a large number. The intensity of the nth pixel S(n), 1 ≤ n ≤ N, is a random variable that takes a nonnegative real value for each individual picture S ∈ S. Nearby pixels in a smooth image are correlated, which means that the value of one pixel conveys information about the likelihood of its neighbors' values. This implies that having transmitted the one pixel value at full expense, we should be able to exploit this correlation to reduce the cost of transmitting the neighboring pixel values. This is done by transforming the picture into a new set of coordinates which are uncorrelated over the collection S and then transmitting the uncorrelated values.
More precisely, the collection of smooth pictures S has off-diagonal terms in the autocovariance matrix A = (A(i, j))_{i,j=1}^{N} of the pixels in S,

    A(i, j) = (1/M) Σ_{m=1}^{M} S̃_m(i) × S̃_m(j),        (1)

where S̃_m = S_m - (1/M) Σ_m S_m. A can be diagonalized because it is symmetric (see [2, Theorem 5.4, page 120] for a proof of this very general fact). We can write T for the orthogonal matrix that diagonalizes A;
`
FIG. 1. Idealized transform coder (Scanned image → Transform → Quantize → Remove redundancy → Storage).
`
`
`
`
then TAT* is diagonal, and T is called the Karhunen-Loeve transform, or alternatively the principal orthogonal decomposition. The rows of T are the vectors of the Karhunen-Loeve basis for the collection S, or equivalently for the matrix A. The number of positive eigenvalues on the diagonal of TAT* is the actual number of uncorrelated parameters, or degrees of freedom, in the collection of pictures. Each eigenvalue is the variance of its degree of freedom. TS_m is S_m written in these uncorrelated parameters, which is what we should transmit.
Unfortunately, the above method is not practical because of the amount of computation required. For typical pictures, N is in the range 10^4 to 10^6. To diagonalize A and find T requires O(N^3) operations in the general case. Furthermore, to apply T to each picture requires O(N^2) operations in general. Hence several simplifications are usually made.
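For a toy ensemble of 2-pixel "pictures" the whole construction fits in a few lines. The data and names below are hypothetical, and the single Jacobi rotation suffices only for the 2 × 2 case; this is an illustrative sketch, not a practical algorithm:

```python
import math

# Toy ensemble of 2-pixel "pictures" (hypothetical data): the two pixels
# are strongly correlated, as adjacent pixels in a smooth image would be.
pictures = [(2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1)]
M = len(pictures)
mean = [sum(p[i] for p in pictures) / M for i in range(2)]

# Autocovariance matrix A of Eq. (1), built from mean-centered pictures.
A = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pictures) / M
      for j in range(2)] for i in range(2)]

# One Jacobi rotation diagonalizes a symmetric 2 x 2 matrix; the rows of
# T are the Karhunen-Loeve basis vectors for this ensemble.
theta = 0.5 * math.atan2(2 * A[0][1], A[0][0] - A[1][1])
c, s = math.cos(theta), math.sin(theta)
T = [[c, s], [-s, c]]

# D = T A T^t is diagonal; its entries are the variances of the two
# uncorrelated degrees of freedom.
D = [[sum(T[i][k] * A[k][l] * T[j][l] for k in range(2) for l in range(2))
      for j in range(2)] for i in range(2)]
```

For real pictures the same computation needs O(N^3) work to diagonalize, which is exactly the impracticality noted above.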
`
3.2. DCT
For smooth signals, the autocovariance matrix is assumed to be of the form

    A(i, j) = r^{|i-j|},        (2)

where r is the adjacent pixel correlation coefficient and is assumed to be just slightly less than 1. The expression |i - j| should be interpreted as |i_r - j_r| + |i_c - j_c|, where i_r and i_c are respectively the row and column indices of pixel i, and similarly for j. Experience shows that this is quite close to the truth for small sections of large collections of finely sampled smooth pictures. It is possible to compute the Karhunen-Loeve basis exactly in the limit N → ∞: in that case A is the matrix of a two-dimensional convolution with an even function, so it is diagonalized by the two-dimensional DCT. In one dimension, this transform is an inner product with functions such as the one in Fig. 2. This limit transform can be used
`
FIG. 2. Example DCT basis function.
`
instead of the exact Karhunen-Loeve basis; it has the added advantage of being rapidly computable via the fast DCT derived from the fast Fourier transform.
The JPEG algorithm uses this transform and one other simplification. N is limited to 64 by taking 8 × 8 subblocks of the picture. JPEG applies the two-dimensional DCT to the subblocks, and then treats the 64 vectors of amplitudes individually in a manner which we will discuss in the next section.
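A minimal sketch of this transform step: the orthonormal 1-D DCT-II matrix applied separably to the rows and columns of an 8 × 8 block. The ramp block is a stand-in for a smooth image patch; on such data almost all of the energy lands in a few low-frequency amplitudes:

```python
import math

N = 8
# Orthonormal 1-D DCT-II matrix; the 2-D transform applies it separably
# to the rows and then the columns of each 8 x 8 block.
C = [[math.sqrt((1.0 if k == 0 else 2.0) / N) *
      math.cos(math.pi * k * (j + 0.5) / N)
      for j in range(N)] for k in range(N)]

def dct2(block):
    # amplitudes = C . block . C^t
    tmp = [[sum(C[u][j] * block[i][j] for j in range(N)) for u in range(N)]
           for i in range(N)]
    return [[sum(C[k][i] * tmp[i][u] for i in range(N)) for u in range(N)]
            for k in range(N)]

# A smooth ramp block: its energy concentrates in the lowest frequencies,
# which is what makes the subsequent quantization cheap.
ramp = [[(i + j) / 14.0 for j in range(N)] for i in range(N)]
amps = dct2(ramp)
```

Because the matrix is orthonormal, the transform preserves energy exactly; the compression comes entirely from the later quantization of the small amplitudes.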
`
3.3. LCT or LOT
Rather than use disjoint 8 × 8 blocks as in JPEG, it is possible to use "lapped" or "localized" (but still orthogonal) discrete cosine functions which are supported on overlapping patches of the picture. These local cosine transforms (LCT, as in [7]) or lapped orthogonal transforms (LOT, as in [23]) are modifications of DCT which attempt to solve the blockiness problem by using smoothly overlapping blocks. This can be done in such a way that the overlapping blocks are still orthogonal; i.e., there is no added redundancy from using amplitudes in more than one block to represent a single pixel. For the smooth blocks to be orthogonal we must use DCT-IV, which is the discrete cosine transform using half-integer grid points and half-integer frequencies. The formulas for the smooth overlapping basis functions in two dimensions are derived from the following formulas in one dimension. For definiteness we will use a particular symmetric bump function
`
    b(x) = { sin((π/4)(1 + sin πx)),  if -1/2 < x < 3/2,
           { 0,                       otherwise.        (3)

This function is symmetric about the value x = 1/2. It is smooth on (-1/2, 3/2) with vanishing derivatives at the boundary points, so that it has a continuous derivative on R. Note that we can modify b to obtain additional continuous derivatives by iterating the innermost sin πx. Let b_1(x) = b(x), s_1(x) = sin πx, and define

    b_{n+1}(x) = sin((π/4)(1 + s_{n+1}(x))),  where s_{n+1}(x) = sin((π/2) s_n(x)).        (4)

Then b_n will have (use L'Hopital's rule!) at least 2^{n-1} vanishing derivatives at -1/2 and 3/2.
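One way to sketch the bell b numerically and verify the property needed later for orthogonality, namely that the squares of b(x) and b(-x) sum to 1 on the overlap. The function name and the optional iteration argument are our own:

```python
import math

def bell(x, n=1):
    # Eq. (3) for n = 1: b(x) = sin((pi/4)(1 + sin(pi x))) on (-1/2, 3/2).
    # Larger n iterates the innermost sine as in Eq. (4), flattening the
    # bell further at the endpoints.
    if x <= -0.5 or x >= 1.5:
        return 0.0
    s = math.sin(math.pi * x)
    for _ in range(n - 1):
        s = math.sin(0.5 * math.pi * s)
    return math.sin(0.25 * math.pi * (1.0 + s))
```

The checks below confirm the symmetry about x = 1/2 and the sum-of-squares identity, for the basic bell and an iterated one.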
Now consider the interval of integers I = {0, 1, 2, ..., N - 1}, where N = 2^n is a positive integer power of 2. This can be regarded as the "current block" of N samples in an array; there are previous samples I' = {..., -2, -1} and future samples I'' = {N, N + 1, ...} as well. The lapped orthogonal functions are mainly supported on I, but they take values on {-N/2, ..., -1} ⊂ I' and {N, ..., 3N/2 - 1} ⊂ I'' as well;
`
`
`
`
those are the overlapping parts. For integers k ∈ {0, 1, ..., N - 1}, we can define the function

    ψ_k(j) = √(2/N) b((j + 1/2)/N) cos(π(k + 1/2)(j + 1/2)/N).        (5)

Apart from b, these are evidently the basis functions for the so-called DCT-IV transform. Figure 3 shows one such function, with N chosen large enough so that the smoothness is evident. The orthogonality of such functions may be checked by verifying the equations
`
    Σ_{j=-N/2}^{3N/2-1} ψ_k(j) ψ_{k'}(j) = { 1, if k = k',
                                            { 0, if k ≠ k'.        (6)
`
The chosen window function or "bell" allows cosines on adjacent intervals to overlap while remaining orthogonal. For example, the function ψ_k(j + N) is centered over the range j ∈ {-N, -N + 1, ..., -1} and overlaps the function ψ_{k'}(j) at values j ∈ {-N/2, -N/2 + 1, ..., N/2 - 1}. Yet these two functions are orthogonal, which may be checked by verifying the equation

    Σ_{j=-N/2}^{N/2-1} ψ_k(j + N) ψ_{k'}(j) = 0,  for all integers k, k'.        (7)
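The orthogonality relations (6) and (7) can be checked numerically. The sketch below assumes the bell of Eq. (3) and the smoothed DCT-IV form of the ψ_k; the function names are ours:

```python
import math

def bell(x):
    # The bell of Eq. (3).
    if x <= -0.5 or x >= 1.5:
        return 0.0
    return math.sin(0.25 * math.pi * (1.0 + math.sin(math.pi * x)))

N = 16

def psi(k, j):
    # Smoothed DCT-IV basis function, supported on j = -N/2, ..., 3N/2 - 1.
    t = (j + 0.5) / N
    return math.sqrt(2.0 / N) * bell(t) * math.cos(math.pi * (k + 0.5) * t)

def inner(k, kp, shift=0):
    # <psi_k(. + shift), psi_kp>; the summation range covers both supports.
    return sum(psi(k, j + shift) * psi(kp, j) for j in range(-N, 2 * N))
```

With shift = 0 this is Eq. (6); with shift = N it is the cross-block orthogonality of Eq. (7).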
`
Of course, rather than calculate inner products with the sequences ψ_k, we can preprocess data so that standard fast DCT-IV algorithms may be used. This may be visualized as "folding" the overlapping parts of the bells back into the interval. This folding can be transposed onto the data, and the result will be disjoint intervals of samples which can be "unfolded" to produce smooth overlapping segments. This is best illustrated by an example. Suppose we wish to fold a
`
FIG. 3. Example LCT basis function.
`
smooth function across 0, onto the intervals {-N/2, ..., -1} and {0, 1, ..., N/2 - 1}, using the bell b defined above. Then folding replaces the function f = f(j) with the left and right parts f_0^- and f_0^+:

    f_0^-(j) = f(j),  if j < -N/2,
             = b(-(j + 1/2)/N) f(j) - b((j + 1/2)/N) f(-j - 1),  if j ∈ {-N/2, ..., -1};

    f_0^+(j) = b((j + 1/2)/N) f(j) + b(-(j + 1/2)/N) f(-j - 1),  if j ∈ {0, 1, ..., N/2 - 1},
             = f(j),  if j ≥ N/2.        (8)
`
`The symmetry of b allows us to use b (- x) instead of
`introducing the bell attached to the left interval. This
`action divides f into two independent functions (the
`even and odd parts of f ) which merge smoothly
`around the grid point 0. The process is an orthogonal
`transformation. We can fold the smooth function
`around the grid point N in a similar manner:
`
    f_1^-(j) = f(j),  if j < N/2,
             = b(-(j - N + 1/2)/N) f(j) - b((j - N + 1/2)/N) f(2N - j - 1),  if j ∈ {N/2, ..., N - 1};

    f_1^+(j) = b((j - N + 1/2)/N) f(j) + b(-(j - N + 1/2)/N) f(2N - j - 1),  if j ∈ {N, N + 1, ..., 3N/2 - 1},
             = f(j),  if j ≥ 3N/2.        (9)
`
The new function f_0 defined below is a smooth, independent segment of the original smooth function f, restricted to the interval of values {0, 1, ..., N - 1}:

    f_0(j) = { f_0^+(j),  if j ∈ {0, 1, ..., N/2 - 1},
             { f_1^-(j),  if j ∈ {N/2, N/2 + 1, ..., N - 1}.        (10)
`
`
`
`
We can now apply the N-point DCT-IV transform directly to f_0.
We can likewise define f_m(j) for the values j ∈ {mN, mN + 1, ..., (m + 1)N - 1} by the same folding process, which segments a smooth function f into smooth independent blocks. Folding to intervals of different lengths is easily defined as well. We can also generalize to two dimensions by separably folding in x and then in y.
Unfolding reconstructs f from f_0^- and f_0^+ by the formulas

    f(j) = b(-(j + 1/2)/N) f_0^-(j) + b((j + 1/2)/N) f_0^+(-j - 1),  if j ∈ {-N/2, ..., -1},
         = b((j + 1/2)/N) f_0^+(j) - b(-(j + 1/2)/N) f_0^-(-j - 1),  if j ∈ {0, 1, ..., N/2 - 1}.        (11)

Composing these relations yields f(j) = [b((j + 1/2)/N)^2 + b(-(j + 1/2)/N)^2] f(j). This equation is verified by the bell b defined above, for which the sum of the squares is 1.
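Folding at the grid point 0 and its inverse can be sketched directly from Eqs. (8) and (11). The test function is arbitrary, and since the bell satisfies the sum-of-squares identity, the roundtrip should reproduce it exactly:

```python
import math

def bell(x):
    # The bell of Eq. (3).
    if x <= -0.5 or x >= 1.5:
        return 0.0
    return math.sin(0.25 * math.pi * (1.0 + math.sin(math.pi * x)))

N = 8

def fold0(f):
    # Eq. (8): fold f across grid point 0 into left and right parts;
    # only samples with -N/2 <= j < N/2 are modified.
    fm, fp = {}, {}
    for j in range(-N // 2, 0):
        x = (j + 0.5) / N
        fm[j] = bell(-x) * f(j) - bell(x) * f(-j - 1)
    for j in range(0, N // 2):
        x = (j + 0.5) / N
        fp[j] = bell(x) * f(j) + bell(-x) * f(-j - 1)
    return fm, fp

def unfold0(fm, fp):
    # Eq. (11): invert the folding.
    g = {}
    for j in range(-N // 2, 0):
        x = (j + 0.5) / N
        g[j] = bell(-x) * fm[j] + bell(x) * fp[-j - 1]
    for j in range(0, N // 2):
        x = (j + 0.5) / N
        g[j] = bell(x) * fp[j] - bell(-x) * fm[-j - 1]
    return g

f = lambda j: math.cos(0.3 * j) + 0.1 * j   # arbitrary smooth test data
fm, fp = fold0(f)
g = unfold0(fm, fp)
```

The folding acts as a rotation on each sample pair (j, -j-1), so it also preserves energy, which is the orthogonality claimed in the text.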
`
3.4. Adapted Block Cosines
We can also build a library of block LCT bases (or block DCT bases) and search it for the minimum of some cost function. The chosen "best LCT basis" will be a patchwork of different-sized blocks, adapted to different-sized embedded textures in the picture. It will then be necessary to encode the basis choice together with the amplitudes. A description of two versions of this algorithm and some experiments may be found in [13].
`
3.5. Subband Coding
A (one-dimensional) signal may be divided into frequency subbands by repeated application of convolution by a pair of digital filters, one high-pass and one low-pass, with mutual orthogonality properties. Let {h_k}_{k=0}^{M-1}, {g_k}_{k=0}^{M-1} be two finite sequences, and define two operators H and G as

    (Hf)_k = Σ_{j=0}^{M-1} h_j f_{j+2k},    (Gf)_k = Σ_{j=0}^{M-1} g_j f_{j+2k}.        (12)

H and G are defined on square-summable signal sequences of any length. They are also defined for periodic sequences of (even) period P, where we simply interpret the index of f as j + 2k (mod P). In that case, the filtered sequences will be periodic with period P/2.
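Equation (12) for periodic sequences is a few lines of code. The Haar pair below is the simplest example of a QMF pair (an illustrative assumption; the paper's figures use the 30-tap "C30" filter), and the function names are ours:

```python
import math

def filt(c, f, k):
    # One output sample of Eq. (12) on a periodic sequence:
    # (Hf)_k = sum_j h_j f_{j+2k}, with the index taken mod the period P.
    P = len(f)
    return sum(c[j] * f[(j + 2 * k) % P] for j in range(len(c)))

def analyze(h, g, f):
    # Split a period-P sequence into period-P/2 low and high subbands.
    P = len(f)
    low = [filt(h, f, k) for k in range(P // 2)]
    high = [filt(g, f, k) for k in range(P // 2)]
    return low, high

# Haar pair: the shortest QMF pair.
r = 1.0 / math.sqrt(2.0)
h = [r, r]
g = [r, -r]
```

On a locally constant signal the high-pass output is zero, which is the decorrelation that makes subband coding effective on smooth pictures.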
`
The adjoints H* and G* of H and G are defined by

    (H*f)_k = Σ_{0 ≤ k-2j < M} h_{k-2j} f_j,    (G*f)_k = Σ_{0 ≤ k-2j < M} g_{k-2j} f_j.        (13)

H and G are called (perfect reconstruction) quadrature mirror filters (or QMFs) if they satisfy a pair of orthogonality conditions:

    HG* = GH* = 0;    H*H + G*G = I.        (14)
`
Here I is the identity operator. These conditions translate to restrictions on the sequences {h_k}, {g_k}. Let m_0, m_1 be the bounded periodic functions defined by

    m_0(ξ) = Σ_{k=0}^{M-1} h_k e^{ikξ},    m_1(ξ) = Σ_{k=0}^{M-1} g_k e^{ikξ}.        (15)

Then H, G are quadrature mirror filters if and only if the matrix below is unitary for all ξ:

    (1/√2) ( m_0(ξ)   m_0(ξ + π) )
           ( m_1(ξ)   m_1(ξ + π) ).        (16)
`
This fact is proved in [11]. QMFs can be obtained by constructing a sequence {h_k} with the desired low-pass filter response, and then putting g_k = (-1)^k h_{M-1-k}. That reference also contains an algorithm for constructing a family of such {h_k}, one for each even filter length M.
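The construction g_k = (-1)^k h_{M-1-k} and the unitarity of the modulation matrix of Eq. (16) can be checked numerically. The Daubechies 4-tap filter below is an illustrative stand-in for the 30-tap C30 filter:

```python
import cmath, math

# Daubechies 4-tap low-pass filter, normalized so that sum(h_k^2) = 1
# (an illustrative stand-in for C30).
s3 = math.sqrt(3.0)
h = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]
M = len(h)

# High-pass partner: g_k = (-1)^k h_{M-1-k}.
g = [(-1) ** k * h[M - 1 - k] for k in range(M)]

def m(c, xi):
    # m_0 or m_1 of Eq. (15).
    return sum(ck * cmath.exp(1j * k * xi) for k, ck in enumerate(c))

def modulation_rows(xi):
    # Rows of the normalized matrix of Eq. (16); for a QMF pair they
    # are orthonormal at every xi.
    r = math.sqrt(0.5)
    return ([r * m(h, xi), r * m(h, xi + math.pi)],
            [r * m(g, xi), r * m(g, xi + math.pi)])
```

At ξ = 0 the low-pass response has modulus √2, the expected DC gain for this normalization, while the high-pass response vanishes there.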
The frequency response of one particular pair of QMFs ("C30") is depicted in Fig. 4. We have plotted the absolute values of m_0 and m_1, respectively, over one period [-π, π]. Note that m_0 attenuates frequencies far from 0, while m_1 attenuates those near 0.
The traditional block diagram describing the action of a pair of quadrature mirror filters is shown in Fig. 5. On the left is convolution and downsampling (by 2); on the right is upsampling (by 2) and adjoint convolution, followed by summing of the components. The broken lines in the middle represent either transmission or storage.
The underlying functions of subband filtering are produced by iterating H* and G* until we have enough points. For example, 10 iterations of H* applied to the sequence e_0 = {..., 0, 0, 1, 0, 0, ...} produce a 1024-point approximation to the smooth function whose translates span the lowest-frequency subband. Likewise, a single G* after 9 iterations of H* applied to e_0 produces a 1024-point approximation to the next lowest-frequency function. These are distinguished examples; the first is called the "scaling" or
`
`
`
FIG. 4. Absolute values of m_0 (low-pass C30) and m_1 (high-pass C30).
`
`"father" function, while the second is called the
`"wavelet" or "mother" function. They are depicted in
Fig. 6 for a particular QMF ("C30," with 30 taps).
Higher-frequency subbands are spanned by functions with more oscillations, which are produced by using G* earlier in the iteration. The sequence of filters used to generate a function can be converted to an integer in binary notation as follows. Put F_0* = H* and F_1* = G* in the formula for the function (respectively, put F_0 = H and F_1 = G). Then for any pair of integers n and L with 0 ≤ n < 2^L we can write n = n_0 2^0 + n_1 2^1 + ... + n_{L-1} 2^{L-1}, where n_i ∈ {0, 1} for all i = 0, 1, ..., L - 1. To that combination (n, L) we can associate a vector F*_{n_0} ∘ F*_{n_1} ∘ ... ∘ F*_{n_{L-1}} e_0.
For example, the functions given in Fig. 7 are 1024-point sequences H*G*(H*)^8 e_0 and G*G*(H*)^8 e_0, respectively, given by (n, L) = (2, 10) and (n, L) = (3, 10). It is well known (and shown in [17]) that the number of oscillations of the vector produced in this manner increases with n', where n is the Gray-code permutation of n'. The renumbering n → n' relates Paley order to sequency order for Walsh functions and has an analogous effect in the smooth case. This fact can be used to analyze the spectrum of acoustic signals by measuring the amplitudes of wavelet packet coefficients. Acoustic signal compression by wavelet packet and best-basis methods was discussed in [32].
Now we can define four two-dimensional convolution-decimation operators in terms of H and G, namely the tensor products of the pair of quadrature mirror filters:

    F_0 = H ⊗ H,    F_0 v(x, y) = Σ_{i,j} h_i h_j v(i + 2x, j + 2y),        (17)
    F_1 = H ⊗ G,    F_1 v(x, y) = Σ_{i,j} h_i g_j v(i + 2x, j + 2y),        (18)
    F_2 = G ⊗ H,    F_2 v(x, y) = Σ_{i,j} g_i h_j v(i + 2x, j + 2y),        (19)
    F_3 = G ⊗ G,    F_3 v(x, y) = Σ_{i,j} g_i g_j v(i + 2x, j + 2y).        (20)
`
These convolution-decimations have the following adjoints:

    F_0* v(x, y) = Σ_{i,j} h_{x-2i} h_{y-2j} v(i, j),        (21)
    F_1* v(x, y) = Σ_{i,j} h_{x-2i} g_{y-2j} v(i, j),        (22)
    F_2* v(x, y) = Σ_{i,j} g_{x-2i} h_{y-2j} v(i, j),        (23)
    F_3* v(x, y) = Σ_{i,j} g_{x-2i} g_{y-2j} v(i, j).        (24)

The orthogonality relations for this collection are

    F_n F_m* = δ_{nm} I;        (25)

    I = F_0* F_0 + F_1* F_1 + F_2* F_2 + F_3* F_3.        (26)
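With the Haar pair standing in for a longer QMF, the four operators of Eqs. (17)-(24) and the completeness relation (26) can be sketched on a small periodic picture; all names below are ours:

```python
import math

r = 1.0 / math.sqrt(2.0)
h, g = [r, r], [r, -r]   # Haar QMF pair (illustrative stand-in for C30)

def conv_dec(a, b, v):
    # F v(x, y) = sum_{i,j} a_i b_j v(i + 2x, j + 2y), Eqs. (17)-(20),
    # on a picture v that is periodic in both coordinates.
    P, Q = len(v), len(v[0])
    return [[sum(a[i] * b[j] * v[(i + 2 * x) % P][(j + 2 * y) % Q]
                 for i in range(len(a)) for j in range(len(b)))
             for y in range(Q // 2)] for x in range(P // 2)]

def adjoint(a, b, w):
    # F* w(x, y) = sum_{i,j} a_{x-2i} b_{y-2j} w(i, j), Eqs. (21)-(24).
    P, Q = 2 * len(w), 2 * len(w[0])
    out = [[0.0] * Q for _ in range(P)]
    for i in range(len(w)):
        for j in range(len(w[0])):
            for di, ai in enumerate(a):
                for dj, bj in enumerate(b):
                    out[(di + 2 * i) % P][(dj + 2 * j) % Q] += ai * bj * w[i][j]
    return out

# Completeness, Eq. (26): the four subband branches reassemble the picture.
v = [[float(3 * x + y) for y in range(4)] for x in range(4)]
recon = [[0.0] * 4 for _ in range(4)]
for a, b in [(h, h), (h, g), (g, h), (g, g)]:
    w = adjoint(a, b, conv_dec(a, b, v))
    for x in range(4):
        for y in range(4):
            recon[x][y] += w[x][y]
```

The same functions also exhibit Eq. (25): applying F_1 after F_1* reproduces a subband picture, while F_0 after F_1* annihilates it.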
`
By a "picture" we will mean a finite sequence indexed by two coordinates S = S(x, y). It is convenient to regard pictures as periodic in both x and y, though this is not absolutely necessary. For simplicity of implementation, we shall also assume that the x period (the "width" N_x = 2^{n_x}) and the y period (the "height" N_y = 2^{n_y}) are both positive integer powers of 2, so that
`
`FIG. 5. Block diagram of subband filtering.
`
`
`
`
FIG. 6. Lowest-frequency subband basis functions: the C30 scaling ("father") function and wavelet ("mother") function.
`
we can always decimate by 2 and get an integer period. The space of such pictures may be decomposed into a partially ordered set W of subspaces W(n, m) called subbands (see below), where m ≥ 0 and 0 ≤ n < 4^m. These are the images of orthogonal projections composed of products of convolution-decimations. Denote the space of N_x × N_y pictures by W(0, 0) (it is N_x × N_y-dimensional), and define recursively

    W(4n + i, m + 1) = F_i* F_i W(n, m) ⊂ W(0, 0),  for i = 0, 1, 2, 3.        (27)
`
The orthogonality condition on the QMFs implies that the projections from W(0, 0) onto W(n, m) are orthogonal; i.e., they conserve energy. The subspace W(n, m) is (N_x 2^{-m}) × (N_y 2^{-m})-dimensional. These subspaces may be partially ordered by a relation which we define recursively as follows. We say W is a precursor of W' (write W ≺ W') if they are equal or if W' = F*F W for a convolution-decimation F in the set {F_0, F_1, F_2, F_3}. We also say that W ≺ W' if there is a finite sequence V_1, ..., V_s of subspaces in W such that W ≺ V_1 ≺ ... ≺ V_s ≺ W'. This is well defined, since each application of F*F increases the index m.
Subspaces of a single precursor W ∈ W will be called its descendents, while the first generation of descendents will naturally be called children. By the orthogonality condition,

    W = F_0* F_0 W ⊕ F_1* F_1 W ⊕ F_2* F_2 W ⊕ F_3* F_3 W.

The right-hand side contains all the children of W.
The subspaces W(n, m) are called subbands, and the transform coding method that first transforms a signal into subband coordinates is called subband coding. If S ∈ W(0, 0) is a picture, then its orthogonal projection onto W(n, m) can be computed in the standard coordinates of W(n, m) by the formula F^(1) ··· F^(m) S, where the particular filters F^(1), ..., F^(m) are determined uniquely by n. Therefore we can express in standard coordinates the orthogonal projections of W(0, 0) onto the complete tree of subspaces W by recursively convolving and decimating with the filters.
The quadrature mirror filters H and G form a partition of unity in the Fourier transform (or wavenumber) space. The same is true for the separable filters F_i. They can be described as nominally dividing the support set of the Fourier transform of the picture into dyadic squares. If the filters were perfectly sharp, then this would be literally true, and the children of W would correspond to the four dyadic subsquares one scale smaller. We illustrate this in Fig. 8.
Figure 9 shows two generations of descendents, the complete decomposition of R^4 × R^4. The subbands are labeled by the "n" index in W(n, m). Within the dyadic squares are the n indices of the corresponding subspaces at that level. If we had started with a picture of N × N pixels, then we could repeat this decomposition process log_2(N) times.
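The recursion of Eq. (27) makes the quadtree bookkeeping trivial to sketch (labels only, no filtering; the function names are ours):

```python
def children(n, m):
    # Eq. (27): subband W(n, m) has the four children W(4n + i, m + 1).
    return [(4 * n + i, m + 1) for i in range(4)]

def level(m):
    # All subbands at depth m: W(n, m) for 0 <= n < 4**m.
    out = [(0, 0)]
    for _ in range(m):
        out = [c for nm in out for c in children(*nm)]
    return out
```

For example, level(2) enumerates the sixteen subbands (0, 2) through (15, 2) that appear in Fig. 9.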
All subbands together form a quadtree, in which each subspace forms a node and directed edges go from precursors to descendents. The orthogonality relation among the subspaces implies that every connected subtree which contains the root W(0, 0) corresponds to an orthonormal subband decomposition of the original picture: the subbands correspond
`
FIG. 7. Higher-frequency subband basis functions (higher-frequency C30 wavelet packets).
`
`
`
`
`FIG. 8. Four subband descendants of a picture.
`
`to the leaves of the subtree. Having stated this general
`nonsense result, let us consider specific examples.
The subbands W(n, m) as we define them are in one-to-one correspondence with rectangular regions (in fact squares) in wavenumber space, and the quadtree stacks these regions one on top of the other. We can idealize various orthogonal subband bases as disjoint covers of wavenumber space. A few of these are schematically represented in Fig. 10.
The leftmost decomposition is a subband decomposition in which we have all the subbands at a fixed level, in this case level 2. The subbands are labeled (0, 2), (1, 2), (2, 2), (3, 2), ..., (15, 2) as in Fig. 9.
The middle decomposition produces two-dimensional "isotropic" wavelets, i.e., those which have the same scale in both the x and y directions. The subbands in this decomposition are labeled by (0, 4), (1, 4), (2, 4), (3, 4), (1, 3), (2, 3), (3, 3), (1, 2), (2, 2), (3, 2), (1, 1), (2, 1), and (3, 1).
The rightmost decomposition is an adapted subband basis such as might be discovered by minimizing some cost function over all subband decompositions. In that particular example, the subspaces are labeled by (0, 4), (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (7, 4), (8, 4), (9, 4), (10, 4), (11, 4), (3, 3), (16, 4), (17, 4), (18, 4), (19, 4), (5, 3), (6, 3), (7, 3), (32, 4), (33, 4), (34, 4), (35, 4), (9, 3), (10, 3), (11, 3), (3, 2), (4, 2), (5, 2), (24, 3), (25, 3), (26, 3), (27, 3), (7, 2), (8, 2), (36, 3), (37, 3), (38, 3), (39, 3), (10, 2), (11, 2), (12, 2), (13, 2), (14, 2), and (15, 2). It is similar to a subband basis which is particularly effective at representing fingerprint images.
3.5.1. Complete subband level. We can divide a picture into patches and code the spatial frequencies of the patches by a decimation-in-space algorithm (subband coding) rather than a decimation-in-frequency algorithm (block DCT). This is computationally equivalent, but has the advantage of using smooth basis functions (if the filters are chosen appropriately) and also greater flexibility, since there are additional degrees of freedom available in the choice of filters. The drawback is poorer time-frequency localization. Subband basis functions spread out in the spatial domain in proportion to their smoothness, and they have rather complicated Fourier transforms.
The subband correlations are computed recursively as depicted in Fig. 11. This method is popular largely due to its simplicity. The amplitudes within subbands correspond to equal-sized basis functions distinguished only by their spatial frequencies. Hence variable bit allocation is simple to implement, using the many transform values within a subband as the random variables, and allocating bits to a subband according to its variance. In addition, it is possible to use a simple visibility weighting formula based on the eye's spatial frequency response. See the quantization section below for further details. With these additional steps, subband coding becomes quite competitive with other transform coding compression algorithms.
3.5.2. Wavelets, or multiresolution analysis. We can attempt pixel decorrelation by assuming that the picture looks more or less the same at various different scales. This assumption underlies certain models of vision (see [24]), and partially accounts for the natural appearance of fractal computer images. In this case, the decorrelating transformation should do the same thing at each scale of the picture.
We can build an orthonormal basis with this property (in one dimension) by choosing a special "mother function" ψ = ψ(x) and then generating its descendants ψ_{ab} = ψ_{ab}(x) = 2^{a/2} ψ(2^a x - b), with integers a and b. A (separable) two-dimensional wavelet basis can be built by taking products ψ_{ab}(x) ψ_{cd}(y), although recent work by several authors (for example, [6, 21, 30]) has produced nonseparable multidimensional wavelet bases which could also be used. It is not immediately obvious that a function ψ with the requisite properties exists. The surprising and fortunate recent discovery of many such functions ([11, 22, 36] and others) also provided a fast O(N) algorithm for computing the associated wavelet transforms.
The wavelet basis down to level L consists of the elements spanning the subbands W(1, 1), W(2, 1), W(3, 1), W(1, 2), W(2, 2), W(3, 2), ..., W(1, L-1), W(2, L-1), W(3, L-1), W(1, L), W(2, L), W(3, L) and the largest-scale average W(0, L). The pixel values may be transformed into this basis via the two-dimensional version of the "pyramid scheme" described in [22]. This is depicted graphically in Fig. 12. The pyramid scheme is also called a multiresolution analysis, and has been extensively studied. It provides an algorithm of complexity O(N) for the transformation of an N-pixel picture.
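A one-dimensional sketch of the pyramid scheme with the Haar pair (an illustrative assumption; smoother QMFs work the same way): each pass splits off a detail band and halves the running average sequence, so the total work is O(N).

```python
import math

r = 1.0 / math.sqrt(2.0)

def haar_step(f):
    # One pyramid level with the Haar pair: a coarse average band and a
    # detail (wavelet) band, each half the length of f.
    low = [(f[2 * k] + f[2 * k + 1]) * r for k in range(len(f) // 2)]
    high = [(f[2 * k] - f[2 * k + 1]) * r for k in range(len(f) // 2)]
    return low, high

def pyramid(f, L):
    # Keep the detail band at each of L levels plus the final average;
    # this mirrors keeping W(1..3, m) for m = 1, ..., L and W(0, L).
    details = []
    for _ in range(L):
        f, d = haar_step(f)
        details.append(d)
    return f, details

avg, details = pyramid([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 3)
```

The transform is orthogonal, so the energies of the average and detail bands sum to the energy of the original sequence.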
`
FIG. 9. Two levels of subband decomposition.
`
`
`
`
FIG. 10. Various decompositions into subbands: a basis of subbands at one level; a multiresolution or wavelet basis; an adapted subband or wavelet packet basis.
`
`3.5.3. Custom subbands. There may be features
`which are most efficiently described