`
`Block Matching
`
`As mentionedin the previous chapter, displacement vector measurement andits usage in motion
`compensation in interframe coding for a TV signal can be traced back to the 1970s. Netravali and
`Robbins (1979) developed a pel-recursive technique, whichestimates the displacement vector for
`eachpixel recursively fromits neighboring pixels using an optimization method. Limb and Murphy
`(1975), Rocca and Zanoletti (1972), Cafforio and Rocea (1976), and Brofferio and Rocca (1977)
`developed techniques for the estimation of displacement vectors of a block ofpixels. In the latter
`approach, an image is first segmented into areas with each having an approximately uniform
`translation. Then the motion vector is estimated for each area. The segmentation and motion
`estimation associated withthese arbitrarily shaped blocks are very difficult. When there are muluple
`moving areas in images, the situation becomes more challenging. In addition to motion vectors,
`the shape information of these areas needs to be coded. Hence, when moving areas have various
`complicated shapes, both computational complexity and coding load will increase remarkably.
`In contrast,
`the block matching technique, which is
`the focus of this chapter,
`is simple,
`straightforward, and yet very efficient.
`It has been by far the most popularly uulized motion
`estimation technique in video coding.
`In fact, it has been adopted by all the international video
`coding standards; ISO, MPEG-I and MPEG-2, and ITU H.261, and H.263. These standards will
`be introduced in detail in Chapters 16, 17, and 19, respectively.
`It is interesting to note that even nowadays, with the tremendous advancements in multimedia
`engineering, object-based and/or content-based manipulation of audiovisual informationis still very
`demanding, particularly in audiovisual data storage, retrieval, and distribution. The applications
`include digital library, video on demand, audiovisual databases, and so on. Therefore, the coding
`of arbitrarily shaped objects has attracted great researchattention these days. It has been included
`in the MPEG-4 activities (Brailean, 1997), and will be discussed in Chapter 18,
`In this chapter various aspects of block matching are addressed. They include the concept and
`algorithm, matching criteria, searching strategies, limitations, and new improvements.
`
`11.1 NONOVERLAPPED, EQUALLY SPACED, FIXED SIZE,
`SMALL RECTANGULAR BLOCK MATCHING
`
`To avoid the kind of difficulties encountered in motion estimation and motion compensation with
`arbitrarily shaped blocks, the block matching technique was proposed by Jain and Jain (1981) based
`on the following simple motion model.
`An imageis partitioned into a set of nonoverlapped, equally spaced,fixed size, small rectangular
`blocks; and thetranslation motion within each block is assumed to be uniform. Althoughthis simple
`model considers translation motion only, other types of motions, such as rotation and zooming of
`large objects, may be closely approximated by the piecewise translation of these small blocks
`provided that these blocks are small enough. This observation, originally made by Jain and Jain,
`has been confirmed again and again since then.
`Displacementvectors for these blocks are estimated by finding their best matched counterparts
`in the previous frame.
`In this manner, motion estimation is significantly easier than that for
`arbitrarily shaped blocks. Since the motion of each block is described by only one displacement
`vector,
`the side information on motion vectors decreases. Furthermore,
`the rectangular shape
`
`221
`
`IPR2018-01413
`Sony EX1008 Page 247
`
`IPR2018-01413
`Sony EX1008 Page 247
`
`
`
`222
`
`Image and Video Compression for Multimedia Engineering
`
`the best block matching
`
`correlation window
`
` rth displacementvector
`
`
`
`An original block
`
`(a) t, frame
`
`(b)t,., frame
`
`FIGURE 11.1 Block matching.
`
`information is known to both the encoderand the decoder, and hence does not needto be encoded,
`which saves both computation load and side information,
`the smaller the block size, the more
`The block size needs to be chosen properly. In general,
`accurate is the approximation. It is apparent, however, that the smaller block size leads to more
`motion vectors being estimated and encoded, which means anincrease in both computation and
`side information. As a compromise,a size of 16 x 16 is considered to be a good choice. (This has
`been specified in international video coding standards such as H.261, H.263, and MPEG-1 and
`MPEG-2.) Notethat for finer estimation a block size of 8 x 8 is sometimes used.
`Figure 11.1 is utilized to illustrate the block matching technique.
`In Figure !1.1(a) an image
`frame at moment£, is segmented into nonoverlapped p x g rectangular blocks. As mentioned above,
`in commonpractice, square blocks of p = q = 16 are used most often. Consider one of the blocks
`centered at (x, y). It is assumed that the block is translated as a whole. Consequently, only one
`displacement vector needsto be estimated for this block. Figure | 1.1(b) shows the previous frame:
`the frame at momentt,.,. In order to estimate the displacement vector, a rectangular search window
`is opened in the framer,, and centered atthe pixel (x, y). Consider a pixel in the search window,
`a rectangular correlation window of the same size p x q is opened with the pixel
`located in its
`center. A certain type ofsimilarity measure (correlation) is calculated, After this matching process
`has been completed for all candidate pixels in the search window,the correlation window corre
`spondingto thelargest similarity becomesthe best matchof the block under consideration in frame
`1,. The relative position betweenthese two blocks(the block and its best match) gives the displace-
`ment vector. This is shown in Figure 11.1(b).
`The size of the search window is determined by the size of the correlation window and the
`maximum possible displacementalongfour directions: upward, downward,rightward, and lefuward.
`In Figure 11.2 these four quantities are assumed to be the same and are denoted by d. Notethat d
`is estimated from a priori knowledge about the translation motion, which includes the largest
`possible motion speed and the temporal interval between two conseculive frames, 1-€.,
`fy ~ Unt:
`
`11.2} MATCHING CRITERIA
`
`Block matching belongs to image matching and can be viewed from a wider perspective. In many
`image processing tasks, we need to examine two imagesor twoportions of images on a pixel-by-pixel
`
`IPR2018-01413
`Sony EX1008 Page 248
`
`IPR2018-01413
`Sony EX1008 Page 248
`
`
`
`Block Matching
`
`223
`
`qt2d
`
`)
`
`pr2d
`
`FIGURE 11.2 Search window and correlation window,
`
`basis. These two images or two image regions canbe selected from a spatial image sequence, i.e.,
`from twoframes taken at the same time with \wo different sensors aiming at the same object, or
`from a temporal image sequence,i.c., from two frames taken al Lwo different moments by the same
`sensor. The purpose ofthe examinationis to determinethe similarity between the two images or
`two portions of images. Examples of this type of application include image registration (Pratt,
`1974) and template matching (Jain, 1989). The former deals with spatial registration of images,
`while the latter extracts and/or recognizes an object in an image by matching the object template
`and a certain area ofthe image.
`The similarity measure, or correlation measure, is a key element in the matching process, The
`basic correlation measure between two images1, and f,,, C (s, 0), is defined as follows (Anuta, 1969).
`
`=
`
`Pa2a (11.1)
`[SFShnoe! Or Deeines
`
`Thisis also referred to as a normalized two-dimensionalcross-correlation function (Musmannet al.,
`1985).
`Instead offinding the maximum similarity or correlation, an equivalent but yet more compu-
`tationally efficient way of block matchingis to find the minimumdissimilarity, or matchingerror.
`The dissimilarity (sometimesreferred to as the error, distortion, or distance) between (wo images
`i, and t,.,, D (s, t) is defined as follows.
`
`r= LYmi(Kfatnk+0),
`
`j=l k=l
`
`(11.2)
`
`where M(u,v)is a metric that measures the dissimilarity between the two arguments u and v. The
`D (s, t) is also referred to as the matchingcriterion or the D values.
`In the literature there are several types of matching criteria, among which the mean square
`error (MSE) (Jain and Jain, 1981) and mean absolute difference (MAD) (Koga et al., 1981) are
`used mostoften. It is noted that the sum of the squared difference (SSD) (Anandan, 1987) or the
`sum of the squared error (SSE) (Chanet al., 1990) is essentially the same as MSE. The mean
`
`IPR2018-01413
`Sony EX1008 Page 249
`
`IPR2018-01413
`Sony EX1008 Page 249
`
`
`
`294
`
`Image and Video Compression for Multimedia Engineering
`
`absolute difference is sometimes referred to as the mean absolute error (MAB) in the literature
`(Nogaki and Ohta, 1972).
`In the MSE matchingcriterion, the dissimilarity metric M (u, v) 1s defined as
`
`In the MAD,
`
`M(u,v)=(u—v)’.
`
`M(u,v)=|u—y.
`
`(11.3)
`
`(11.4)
`
`Obviously,both criteria are simpler than the normalized two-dimensional cross-correlation measure
`defined in Equation 11.1.
`Before proceeding to the next section, acommentontheselection ofthe dissimilarity measure
`is due. A study based on experimental works reported that the matching criterion does not signif-
`icantly affect the search (Srinivasan, 1984). Hence, the MADis preferred due toits simplicity in
`implementation (Musmann et al,, 1985),
`
`11.3.
`
`SEARCHING PROCEDURES
`
`The searching strategy is another importantissue to deal with in block matching. Several searching
`Strategies are discuused below.
`
`11.3.1
`
`Futt Searcy
`
`Figure 11.2 showsa search window, a correlation window,andtheir sizes. In searching for the best
`match,the correlation window is moved to each candidate position within the search window. That
`is, there are a total (2 d+1) x (2 d+!) positions that need to be examined. The minimum dissimilarity
`gives the best match. Apparently, this full search procedureis brute force in nature. While the full
`search delivers good accuracy in searching for the best match (thus, good accuracy in motion
`estimation), a large amount of computation is involved.
`In order to lower computational complexity, several
`developed. They are introduced below.
`
`fast searching procedures have been
`
`11.3.2
`
`2-D LocaritHmic SearcH
`
`Jain and Jain (1981) developed a 2-D logarithmic searching procedure. Based on a 1-D logarithmic
`search procedure (Knuth, 1973), the 2-D procedure successively reduces the search area,
`thus
`reducing the computational burden, Thefirst steps computes the matching criteria for five points
`in the search window.Thesefive points are as follows: the central point of the search window and
`the four points surroundingit, with each being a midpoint between the central point and one of
`the four boundaries of the window. Amongthesefive points, the one corresponding to the minimum
`dissimilarity is picked as the winner, In the next step, surrounding this winner, anotherset of five
`points are selected in a similar fashionto that in thefirst step, with the distances between the five
`points remaining unchanged. The exception takes place wheneither a central point of a set of five
`points or a boundary pointofthe search window gives a minimumDvalue.In these circumstances,
`the distances betweenthe five points need to be reduced, The procedure continues until the final
`step, in whicha set of candidate points are located in a 3 x 3 2-D grid. Figure 11.3 demonstrates
`two cases of the procedure. Figure 11.3(a) shows that the minimum D value takes place on 4
`boundary, while Figure 11.3(b) shows the minimum D valuein the central position,
`
`IPR2018-01413
`Sony EX1008 Page 250
`
`IPR2018-01413
`Sony EX1008 Page 250
`
`
`
`Block Matching
`
`225
`
`
`
`(b)
`
`(a) A 2-D logarithmic search procedure, Points at (j, k+2), (+2, k+2), (+2, k+4), and G+1,
`FIGURE 11.3
`k+4) are foundto give the minimum dissimilarity in steps 1, 2, 3, and 4, respectively. (>) A: 2-D loganthmte
`‘ search procedure, Pointsat (j, k-2), (j+2, k-2), and (+2, k-1) are found to give the minimum dissimilarity in
`Steps 1, 2, 3, and 4, respectively.
`
`A convergenceproof ofthe procedureis presented by Jain and Jain (1981), underthe assumption
`that the dissimilarity monotonically increases as the search point moves away from the point
`corresponding to the minimum dissimilarity.
`
`IPR2018-01413
`Sony EX1008 Page 251
`
`IPR2018-01413
`Sony EX1008 Page 251
`
`
`
`226
`
`Image and Video Compression for Multimedia Engineering
`
`ol a
`
`°4
`
`=
`
`
`werosoewaaieee (~)
`
`=)
`
`No
`
`
`
`
`
`
`es
`
`el
`iigelalel
`lta
`
`ee
`(er[aeeaa
`Dalgetyer|
`
`
`ieaEd
`
`
`Cleteaaaeaesteee
`
`
`FIGURE 11.4 Three-step search procedure. Points (j+4, k-4), (j+4, k-6), and (j+5, k-7) give the minimum
`dissimilarity in steps 1, 2, and 3, respectively.
`
`11.3.3 Coarse-Fine THree-Step SEARCH
`
`Another important work on the block matching technique was completed at almost the same time
`by Koga et al. (1981). A coarse-fine three-step procedure was developed for fast searching.
`The three-step search is very similar to the 2-D logarithm search. There are, however, three
`main differences between the two procedures.First, each step in the three-step search comparesa
`set of nine points that form a 3 x 3 2-D grid structure. Second, the distances between the points in
`the 3 x 3 2-D grid structurein the three-step search decrease monotonically in steps 2 and 3. Third,
`a total of only three steps are carried out. Obviously, these three itemsare different from the 2-D
`logarithmic search described in Section 11.3.2. Anillustrative example of the three-step search is
`shown in Figure 11.4.
`
`11.3.4 Conyucate Direction SEARCH
`
`The conjugate direction search is another fast search algorithm that was developed by Srinivasan
`and Rao (1984). In principle, the procedure consists of two parts. In the first part,
`it finds the
`minimum dissimilarity along the horizontal direction with the vertical coordinatefixed at aninitial
`position. In the second part, it finds the minimum D value along the vertical direction with the
`horizontal coordinate fixed in the position determined in the first part. Starting with the vertical
`direction followed by the horizontaldirection is, of course, functionally equivalent. It was reported
`that this search procedure works quite efficiently (Srinivasan and Rao, 1984).
`Figure 11.5 illustrates the principle of the conjugate direction search. In this example, each
`step involves a comparison between three testing points. If a point assumes theminimum D ae
`compared with both of its two immediate neighbors (in one direction), then it is considered e e
`the best match along this direction, and the search along anotherdirection is started. See y:
`the procedure starts to compare the D valuesfor three points (j, k-1), (j, k), and G, k+1). I ay
`value of point (j, kK-1) appears to be the minimum amongthe three, then points (j, k-2), U,
`;
`
`IPR2018-01413
`Sony EX1008 Page 252
`
`IPR2018-01413
`Sony EX1008 Page 252
`
`
`
`227
`
`Block Matching
`
`
`
`FIGURE 11.5 Conjugate direction search.
`
`and (j, k) are examined. The procedure continues, finding point (j, k-3) as the best match along
`the horizontal direction since its D value is smaller than that of points (j, k-4) and (j, k-2), The
`procedure is then conducted alongthe vertical direction. In this example the best matchingis finally
`found at point (j+2, k—3).
`
`11.3.5
`
` SuBsamMPLING IN THE CORRELATION WINDOW
`
`In the evaluation of the matching criterion, either MAD or MSE,all pixels within a correlation
`windowatthe f,.; frame and an original blockatthe ¢, frame are involved in the computation. Note
`that the correlation window and theoriginal block are the samesize (refer to Figure 11.1). In order
`to further reduce the computational effort, a subsampling inside the window and the block is
`performed (Bierling, 1988). Aliasing effects can be avoided by using low-passfiltering. For instance,
`only every second pixel, both horizontally and vertically inside the window and the block,is taken
`into account for the evaluation of the matching criterion. Obviously, by using this subsampling
`technique, the computational burdenis reduced by a factor of 4. Since 3/4 of the pixels within the
`window and the block are not involved in the matching computation, however, the use of such a
`subsampling procedure may affect the accuracy ofthe estimated motion vectors, especially in the
`case of small-size blocks. Therefore,
`the subsampling technique is recommended only for those
`cases with a large enough blocksize so that the matching accuracy will not be seriously affected.
`Figure 11.6 shows an example of 2 x 2 subsampling applied to both an original block of 16x 16
`at the 1, frame and a correlation window of the samesize at the f,_, frame.
`
`11.3.6 MuttiresoLuTION BLock MATCHING
`
`is a very
`is well known that a multiresolution structure, also known as a pyramid structure,
`Tt
`powerful computational configuration for various image processing tasks, To save computation in
`block matching,it is natural to resort to the pyramid structure. In fact, the multiresolution technique
`has been regarded as one of the most efficient methods in block matching (Tzovaras et al., 1994).
`In a named top-down multiresolution technique, a typical Gaussian pyramid is formed first.
`
`IPR2018-01413
`Sony EX1008 Page 253
`
`IPR2018-01413
`Sony EX1008 Page 253
`
`
`
`228
`
`Image and Video Compressionfor Multimedia Engineering
`
`an original block
`
`a corelation window
`
`(a) An original block of 16%16 in frame alt,
`
`(b) A correlation window of 146%16 in frameat t
`
`FIGURE 11.6 An example of 2x 2 subsampling in the original block and correlation window for a fast
`search.
`
`Before diving into further description, let us pause here to give those readers who have not been
`exposed to the Gaussian pyramid a short introduction to the concept. For those who knowthe
`concept, this paragraph can be skipped. Briefly speaking, a Gaussian pyramid can be understood
`as a set of images with different resolutions related to an original image in a certain way. The
`original image has the highestresolution and is considered as the lowest level, sometimes called
`the bottom level,
`in the set. From the bottom level
`to the top level,
`the resolution decreases
`monotonically. Specifically, between two consecutive levels, the upper level is half as large as the
`lower level in both horizontal and vertical directions. The upper level is generated by applying a
`low-passfilter (which has a group ofweights) to the lowerlevel, followed by a 2 x 2 subsampling.
`That is, cach pixel in the upperlevel is a weighted average of some pixels in the lower level.
`In
`general, this iterative procedure of generating a level in the set is equivalent to convolving a specific
`weight function with the original imageat the bottom level followed by an appropriate subsampling.
`Undercertain conditions, these weight functions can closely approximate the Gaussian probability
`density function, which is why the pyramid is namedafter Gauss. (For a detailed discussion, readers
`are referred to Burt and Adelson [1983, 1984].) A Gaussian pyramid structure 1s depicted in
`Figure 11.7. Note that the Gaussian pyramid depicted in Figure 11.7 resembles a so-called quad-
`tree structure in which each node hasfourchildren nodes. In the simplest quad-tree pyramid, each
`pixel in an upperlevel is assigned an average valueof ils corresponding four pixels in the next
`lowerlevel,
`Nowlet’s return to our discussion on the top-down multiresolution technique. After a Gaussian
`pyramid has been constructed, motion search rangés are allocated among the different pyramid
`levels. Block matchingis initiated at the lowest resolution level to obtain an initial estimation of
`motion vectors. These computed motion vectors are then propagated to the next higher resolution
`level, where they are corrected and then propagated to the next level. This procedure continues
`until the highest resolution level is reached. As a result, a large amount of computation can be
`saved. Tzovaras et al. (1994) showed that a two-level Gaussian pyramid outperforms a three-level
`pyramid. Compared with full search block matching, the top-down multiresolution block search
`sayes up to 67% of computations withoutseriously affecting the quality ofthe reconstructed images.
`In conclusion,it has been demonstrated that multiresolution is indeedan efficient computational
`Structure in block matching. This once again confirms the high computational efficiency of the
`multiresolution structure.
`
`IPR2018-01413
`Sony EX1008 Page 254
`
`IPR2018-01413
`Sony EX1008 Page 254
`
`
`
`Block Matching
`
`229
`
`
`
`level
`increasing
`resolution
`ae a
`
`level 4
`
`level 3
`
`level 2
`
`level |
`
`level 0
`
`FIGURE 11.7 Gaussian pyramid structure.
`
`11.3.7 THRESHOLDING MULTIRESOLUTION BLock MATCHING
`
`Withthe multiresolution technique discussed above, the computed motion vectors at any interme-
`diate pyramid level are projected to the next higher resolution level. In reality, some computed
`motion vectors at
`the lower resolution levels may be inaccurate and have to be further refined,
`while others may be relatively accurate and able to provide satisfactory motion compensation for
`the corresponding block. From a computation-saving point of view,for the latter class it may not
`be worth propagating the motion vectors to the next higher resolution level for further processing.
`Motivated by the above observation, a new multiresolution block matching method with a
`thresholding technique was developed by Shi and Xia (1997). The thresholding technique prevents
`those blocks, whose estimated motion vectors provide satisfactory motion compensation, from
`further processing, thus saving a lot of computation. In what follows, this technique is presented
`in detail so as to provide readers with an insight to both multiresolution block matching and
`thresholding multiresolution block matching techniques.
`Algorithm — Letf,(x, y) be the frame of an image sequence al current moment n.First, two
`Gaussian pyramids are formed, pyramids n and n — 1, from image frames f(x, y) and f,_\(x,y),
`respectively, Let the levels of the pyramids be denoted by /, /= 0,
`I, ..., L, where 0 is the lowest
`resolution level (top level), L is the full resolution level (bottom level), and L+1 is the total number
`of layers in the pyramids. If(i, j) are the coordinates of the upper-left corner of a block at level /
`of pyramid , the block is referred to as block(i, j)!. The horizontal and vertical dimensions ofa
`block at level / are denoted by b! and b!, respectively. Like the variable block size method (refer
`to Method |
`in Tzovarasetal. [1994]), the size of the block in this work varies with the pyramid
`levels. Thatis, if the size of a block atlevel/ is b!, then the size ofthe blockat level / + 1 becomes
`2b; x 2b). The variable block size methodis used becauseit gives more efficient motion estimation
`than the fixed block size method. Here, the matching criterion used for motion estimation is the
`MADbecauseit does not require multiplication and performs similar to the MSE. The MAD
`between block (i, j)'b! of the current frame and block (i + v,,j + vy)'b}_; of the previous frame at
`leyel / can be calculated as
`
`IPR2018-01413
`Sony EX1008 Page 255
`
`IPR2018-01413
`Sony EX1008 Page 255
`
`
`
`230
`
`Image and Video Compression for Multimedia Engineering
`
`MAD, (“=o SYelithit m) = fy_,(i+ k+vij+m+ vi)
`I xb,
`xb, k=0 m=0
`
`]
`
`plat bi-1
`
`(11,5)
`
`where V' = (vj, vj) is one ofthe candidates of the motion vector of block (i, j)},, v/, wf are the two
`components of the motion vector along the x and y directions, respectively.
`A block diagram of the algorithm is shown in Figure 11.8. The threshold in terms of MAD
`needs to be determined in advance according to the accuracy requirement of the motion estimation
`Determining the threshold is discussed below in Part B of this subsection. Gaussian pyramids are
`formed for two consecutive frames of an image sequence [rom which motion estimationis desired
`Block matching is then performed at
`the top level with the full-search scheme. ‘The estimated
`motion vectors are checkedto see if they provide satisfactory motion compensation. If the accuracy
`requirementis met, then the motion vectors will be directly wansformed to the bottomlevel ofthe
`pyramid, Otherwise, the motion vectors will be propagated to the next higher resolution level for
`further refinement. This thresholding process is discussed below in Part C ofthis subsection. The
`algorithm continuesin this fashion until either the threshold has beensatisfied or the bottomlevel
`has been reached. The skipping of someintermediate-level calculations provides for computational
`saving. Experimental work with quite different motion complexities demonstrates that the proposed
`algorithm reducesthe processing time from 14 to 20%, while maintaining almost the same quality
`in the reconstructed image compared with the fastest existing multiresolution block matching
`algorithm (Tzovaras et al., 1994),
`
`
`Block matching
`
`and subsampling
`
`
`and subsampling
`
`
`
`
`
`
`
`Low passfiltering
`and subsampling
`
`
`
`Sausfying
`threshold
`
`Low pass filtering
`
`Block matching
`
`Low pass filtering
`and subsampling
`
`Satisfying
`threshold
`
`Low passfiltering
`
`Block matching
`
`Motion field
`
`FIGURE 11.8 Block diagram fora three-level threshold multiresolution block matching.
`
`IPR2018-01413
`Sony EX1008 Page 256
`
`IPR2018-01413
`Sony EX1008 Page 256
`
`
`
`Block Matching
`
`231
`
`a T
`
`ABLE 11.1
`Parameters Used in the Experiments
`
`
`
`Parameters at Level Low Resolution Level_Full Resolution Level
`
`Search range
`Block size
`Thresholding value
`
`Search range
`Block size
`
`Thresholding value
`
`Miss America
`3x3
`4x4
`2
`
`Train
`
`4x4
`4x4
`
`3
`
`Football
`
`Ix]
`8x8
`None (not applicable)
`
`Ixl
`8x8
`
`None (not applicable)
`
`Search range
`4x4
`Ixl
`Block size
`4x4
`gx 8
`4Thresholding value None (not applicable)
`
`
`
`
`
`Threshold Determination — The MADaccuracycriterion is used in this work for the sake of
`Saving computations. The threshold value has a direct impact on the performance ofthe proposed
`algorithm. A small threshold value can improve the reconstructed image quality at the expense of
`increased computational effort, On the other hand, a large threshold value can reduce the compu-
`tational complexity, but the quality of the reconstructed image may be degraded. One possible way
`to determine a threshold value, which was used in many experiments by Shi and Xia (1997), is as
`follows.
`The peak signal-to-noise ratio (PSNR) is commonly used as a measure of the quality of the
`reconstructed image. As introduced in Chapter1, it is defined as
`
`PSNR =10102,,
`
`255°
`MSE
`
`(11.6)
`
`Fromthe given required PSNR, one can find the necessary MSE value. A square rootofthis
`MSEvalue can be chosen as a threshold value, which is applied to the first two images from the
`sequence. If the resulting PSNR and required processing timeare satisfactory, it is then used for
`the rest of the sequence. Otherwise, the threshold can be slightly adjusted accordingly and applied
`to the secondand third imagesto check the PSNRand processing time. It was reported in numerous
`experiments that this adjusted threshold value was accurate enough, and that there was no need for
`further adjustment. As shown in Table 11.1,
`the threshold values used for the “Miss America,”
`“Train,” and “Football” sequences(three sequences having quite different motion complexities) are
`2, 3, and 4, respectively. They are all determined in this fashion and give satisfactory performance,
`as shown in the three rows marked “New Method (TH=2),” “New Method (TH=3)” and “New
`Method (TH=4),” respectively, in Table 11.2. That is, the PSNR experiences only about 0.1 dB loss
`and the processing time decreases drastically. In the experiments, the threshold value of 3, i.e., the
`average value of 2, 3, and 4, was also tried. Refer to the three rows marked “New Method (TH=3)"
`in Table 11.2. It is noted that this average threshold value 3 has already given satisfactory perfor-
`mance for all three sequences. Specifically, for the “Miss America” sequence, since the criterion
`increases from 2 to 3, the PSNRloss increases from 0.12 to 0.48 dB, and the reductionin processing
`lime increases from 20 to 38%. For the “Football” sequence, since the criterion decreases from
`4 to 3, the PSNRloss decreases from 0.08 to 0.05 dB, and the reduction in processing time decreases
`
`IPR2018-01413
`Sony EX1008 Page 257
`
`IPR2018-01413
`Sony EX1008 Page 257
`
`
`
`232
`
`Image and Video Compression for Multimedia Engineering
`
`from 14 to 9%. Obviously, for the “Train” sequence, the criterion, as well as the performance,
`remains the same. One can therefore conclude that the threshold determination may not require
`much computation atall.
`
`Thresholding — Motion vectors estimated at each pyramid level will be checked to see if they
`provide satisfactory motion compensation. Assume V!(i,j) =(v{, v\) is the estimated motion vector
`for block (i, j)', at level / of pyramid n. For thresholding, V! (/,/) should be directly projected to
`the bottom level L. The corresponding motion vector for the same block at the bottom level of
`pyramid n will be V4 (2¢-? 7,2) j), and is given as
`
`ye(atte getj) = att yl(i, 9)
`
`(11.7)
`
`The MAD betweentheblockat the bottom pyramid level of the current frame and its counterpart
`in the previous frame can be determined according to Equation 11.5, where the motion vectoris
`VE = VE (2) |,20-1) Ff). This computed MADvalue can be compared with the predefined threshold.
`If this MAD valueis less than the threshold, the computed motion vector V5 (2!) 7.2" j) wall
`be assigned to block (2i,2"j)- at level L in the current frame and motion estimation for this
`block will be stopped. If not, the estimated motion vector V!(i, }) at
`level
`/ will be propagated to
`level / + | for further refinement. Figure 11.9 gives an illustration ofthe above thresholding process.
`
`Experiments — Toverify the effectiveness of the proposed algorithm, extensive experiments have
`been conducted. The performance of the new algorithm is evaluated and compared with that of
`Method1, oneofthe mostefficient multiresolution block matching methods (Tzovaraset al., 1994)
`in terms of PSNR,error image entropy, motion vector entropy, the number of blocks stopped at
`the top level vs. the total number of blocks, and processing time. The number of blocks stopped
`at the top level is the number of blocks withheld from further processing, while the total number
`of blocks is the numberof blocks existing at
`the top level. It
`is noted that the (otal number of
`blocksis the samefor each levelin the pyramid. The processing time is the sum ofthe total number
`of additions involved in the evaluation of the MAD and the thresholding operation.
`In the experiments, two-level pyramids are used since they give better performance for motion
`estimation purposes (Tzovaras et al., 1994). The algorithms are tested on three video sequences
`with different motion complexities, i.e., the “Miss America,” “Train,” and “Football.” The “Miss
`America” sequence has a speaker imposed on a static background and contains less motion. The
`“Train” sequence has moredetail and contains a fast-moving object (train). The 20th frameof the
`sequenceis shownin Figure 11.10. The “Football” sequence contains the most complicated motion
`
`Pyramid
`n-]
`
`LE
`
`Estimation of motion vector
`ofa block at level |
`
`
`Pyramid
`n
`
`LF
`
`Pyramid
`level
`
`Projection of
`the block and
`its estimated
`motion vector
`at level L
`
`Calculation of the MAD of
`the block at level L
`
`
`
`FIGURE 11.9
`
`Thethresholding process.
`
`IPR2018-01413
`Sony EX1008 Page 258
`
`IPR2018-01413
`Sony EX1008 Page 258
`
`
`
`Block Matching
`
`233
`
`FIGURE 11.10 The 20th frame of the “Train” sequence.
`
`
`etee tttaseealanee coed
`
`FIGURE 11.11 The 20th frame in the “Football” sequence.
`
`compared with the other two sequences. The 20th frame is shown in Figure 11.11. Table 11.1 is
`the list of implementing parameters used in the experiments. Tables 11.2 and 11.3 give the perfor-
`manceofthe proposed algorithm compared with Method1. In all three cases, the motion estimation
`has a half-pixel accuracy, the meaning of which will be explained in the next section. All perfor-
`mance measureslisted there are averaged for the first 25 frames ofthe testing sequences.
`Each frameof the “Miss America” sequence is of 360 x 288 pixels. For convenience, only the
`central portion, 320 x 256 pixels, is processed. Using the operational parameters listed in Table 11.1
`(with a criterion value of 2), 38% of the total blocks at the top level satisfy the predefined criterion
`and are not propagated to the bottom level. The processing time needed by the proposed algorithm
`is 20% less than Method ‘1, while the PSNR, the error image entropy, and the vector entropy are
`almost the same. Compa