`
`’|
`
`Block Matching
`
`As mentioned in the previous chapter. displacement vector measurement and its usage in motion
`compensation in intei li'ai‘i‘ic. coding ['or it TV signal can be traced back to the 19703. Netravali and
`Robbins ([9791 developed a pet-recursive technique. which estimates the displacement vector for
`each pixel recursively Ironi its neighboring pixels using an optimization method. Limb and Murphy
`(1975). Rocca and 7.;tnoletti Il'El‘i'E). Cal‘l‘orio and Rocea (1W6). and Broilerio and Rocco (1977)
`developed techniques for the estimation of diSplaccntent vectors of a block of pixels.
`[11 the latter
`approach. an image Is
`lirst segmented into areas with each having an approximately uniform
`translation. Then the motion vector is estimated for each area. The segmentation and motion
`estimation asst-mated with these arbitrarily shaped blocks are very dilliculi. When there are multiple
`moving areas in images. the situation becomes more challenging. In addition to motion vecmt‘s.
`the shape iiili‘ii'iiiatioii ot‘ these areas needs to be coded. Hence. when moving areas have various
`complicated shapes. hotli computational complexity and coding load will increase remarkably.
`In contrast,
`the block matching technique. Which is
`the locus of this chapter.
`is simple,
`straightforward. and yet very cllictent.
`It has been by far the most popularly utilized motion
`estimation technique in video coding In fact. it has been adopted by all the international video
`coding standards: lSO. MPEG—1 and MPEG-2. and [TU H.261. and H.263. These standards will
`be introduced in detail in Chapters lo.
`l'i'. and IQ. respectively.
`it is interesting to note that even nowadays, with the tremendous advancements in multimedia
`engineering. object-based andlor content-based manipulation ofiiudiovisual information is still very
`demanding. particularly in audiovisual data storage. retrieval. and distribution. The applications
`include digital library. video on demand. audiovisual databases. and so on. Therefore. the coding
`of arbitrarily shaped objects has attracted great research attention these days. It has been included
`in the MPEG—4 activities (Brailean. 199?}. and will he discussed in Chapter 18.
`In this chapter various aspects of block matching are addressed. They include the concept and
`algorithm. matching criteria. searching strategies. limitations. and new improvements.
`
`11.1 NONOVERLAPPED. EQUALLY SPACED. FIXED SIZE,
`SMALL RECTANGULAR BLOCK MATCHING
`
`To avoid the kind of difficulties encountered ll'l motion estimatEOn and motion compensation with
`arbitrarily shaped blocks. the block matching technique was prop05cd by Jain and Jain (198 I) based
`on the following simple motion model.
`_
`’
`An image is partitioned into a set of nonoverlapped. equally spaced. llxed 812C. Emil” rectangular
`blocks: and the translation motion within each block is assumed to be uniform. Although this Simple
`model considers translation motion only. other types of motions, such as rotation and zooming of
`large Obit-1018, may be closely approximated by the piecewise translation of these small blocks
`provided that these blocks are small enough. This observation. originally made by Jain and Jain.
`has been confirmed again and again since then.
`.
`‘
`Di5placement vectors for these blocks are estimated by finding their best matched counterparts
`in the previous frame.
`In this manner. motion estimation is significantly easier than that for
`arbitrarily shaped blocks. Since the motion of each block is described by only one displacement
`“301m.
`the side information on motion vectors decreases. Furthermore.
`the rectangular shape
`
`221
`
`|PR2018—01413
`
`Sony EX1008 Page 247
`
`IPR2018-01413
`Sony EX1008 Page 247
`
`
`
`222
`
`image and Video Compression for Multimedia Engineering
`
`
`the best block matching
`
`correlation window
`
`_|III displacement vector
`
`[Illa-III.-
`IIIIIIIIIII
`
`
`
`An original block
`
`(a) t. frame
`
`[13) [M frame
`
`FIGURE 11.1 Block matching.
`
`information is known to both the encoder and the decoder. and hence does not need to be chOdEd.
`
`which saves both computation load and side information.
`the smaller the block size. the more
`The block size needs to be chosen properly. In general.
`accurate is the approximatiOn. It is apparent. however. that the smaller block size leads to more
`motion vectors being estimated and encoded. which means an increase in both computation and
`side information. As a compromise. a size of 16 x 16 is considered to he a good choicc. (This “35
`been specified in international video coding standards such as H.26I, H.2o3. and MPEG-1 and
`MPEG-2.) Note that for finer estimation :1 block size of 8 X S is sometimes used.
`Figure 11.] is utilized to illustrate the block stretching technique.
`in Figure ll.l(a) an image
`Frame at moment in is segmented into nonoverlapped p x q rectangular blocks. As mentioned above.
`in common practice. square blocks ofp = q = 16 are used most ol'ten. Consider one of the blocks
`centered at (x. 3;). It is assumed that the block is translated as a whole. Consequently. only 0113
`displacement vector needs to be estimated for this block. Figure l l.l(b) shows the previous frame:
`the frame at moment t“. In order to estimate the displacement vector. a rectangular search window
`is opened in the frame rm. and centered at the pixel (it. y). Consider a pixel in the search window.
`a rectangular correlation window of the same size p x r; is opened with the pixel
`located in its
`center. A certain type of similarity measure (correlation) is calculated. After this matching process
`has been completed for all candidate pixels in the search window, the correlation window corre—
`spending 10 the 131363! Similarity becomes the best match of the block under consideration in frame
`r... The relative position between these two blocks (the biock and its best match) gives the displace—
`ment vector. This is shown in Figure 1 1.1(b).
`The size of the search window is determined by the size of the correlation window and the
`maximum POSSibIB diSplacement along four directions: upward, downward. rightward. and leftward.
`1“ Figure l 1-2 these four quantities are assumed to be the same and are denoted by d. Note that r!
`is estimated from apr‘r’orr' knowledge about the translation motion. which includes the largest
`PDSSible motion Speed and the temporal interval between two consecutive trainer». i.e.. 1.. ‘ r"...
`
`1 1 .2 MATCHING CRITERIA
`
`Block matching belongs to image matching and can be viewed from a wider perspective. In mil"),
`image processing tasks. we need to examine two images or two portions of images on a PiXEI‘bY'P'xel
`
`|PR2018—01413
`
`Sony EX1008 Page 248
`
`IPR2018-01413
`Sony EX1008 Page 248
`
`
`
`Block Matching
`
`223
`
`q+2d
`
`l
`
`FIGURE 11.2
`
`Search window and correlation window.
`
`basis. These two images or two image regions can be selected From a spatial image sequence. i.e.,
`From two ['rames taken at the same time with two different sensors aiming at the same object. 01'
`from a temporal image sequence. i.e.. from two frames taken at two different moments by the same
`sensor. The purpose at the examination is to determine the similarity between the two images or
`two portions of images. Examples of this type of application include image registration (Pratt,
`1974) and template matching (Jain,
`l989). The former deals with spatial registration of images.
`while the latter extracts andi’or i'cCOgnixes an object in an image by matching the object template
`and a certain area of the image.
`The similarity measure. or correlation measure. is 21 key element in ”“3 matching process. The
`basic correlation measure between two images I" and r,,_,. C (s. t). is defined as follows (Anuta. 1969).
`
`I):
`
`2.” z:tf“(U‘k)ffl_l[j+s,k+t)
`:41“..kaMix: Hff—I(j+5.k+!)2
`
`.
`
`J2;
`
`(11.1)
`
`This is also referred to as a normalized two-dimensional cross-correlation function (Musmann et at.
`1985)
`Instead ot finding the maximum similarity or correlation an equivalent but yet mote compu-
`tationally efficient way of block matching is to find the minimum dissimilarity. or matching error.
`The dissimilarity (sometimes relerred to as the cum distortion. or distance) between two images
`I" and t“. D (s, t) is defined as tollows.
`
`D('§‘(!)=—EZMUnflj’klfJt-f_|(j+5,.k+f))
`
`j==lkl
`
`(11.2)
`
`where M(u,v) is a metric that measures the dissimilarity between the two arguments u and v. The
`D (S. t) is also referred to as the matching criterion or the D values.
`In the literature there are several types of matching criteria. among which the mean square
`error (MSE) (Jain and Jain. 1981) and mean absolute dill'erence (MAD) (Koga et al., I981) are
`used most often. It is noted that the sum of the squared difference (SSD) (Anandan, 1987) or the
`Sum of the squared error (SSE) (Chan et al.. 1990) IS essentially the same as MSE. The mean
`
`|PR2018—01413
`
`Sony EX1008 Page 249
`
`IPR2018-01413
`Sony EX1008 Page 249
`
`
`
`224
`
`Image and Video Compression for Multimedia Engineering
`
`absolute difference is sometimes referred to as the mean absolute error (MAE) in the literature
`
`(Nogaki and Ohta. 1972).
`In the MSE matching criterion. the dissimilarity metric M (L1. v) is defined as
`
`In the MAD.
`
`’1
`
`M(tt.v)=(rr—v)‘.
`
`M(n.v):|tr—vl.
`
`[l l.3)
`
`{11.4}
`
`Obviously, both criteria are simpler than the normalized two-dimensional cross-correlation measure
`defined in Equation 11.].
`Before proceeding to the next section. a comment on the selection ot' the dissimilarity measure
`is due. A studyr based on experimental works reported that the matching criterion does not signif-
`icantly affect the search {Srinivasam 1984). Hence. the MAD is preferred due to its simplicity in
`implementation (Musmann ct al.. 1985).
`
`11.3
`
`SEARCHING PROCEDURES
`
`'I‘he searching strategy is another important issue to deal with in block matching. Several searching
`strategies are discuused below.
`
`11.3.1
`
`FULL SEARCH
`
`Figure 1 1.2 shows a search window, a correlation window, and their sizes. In searching for the best
`match. the correlation window is moved to each candidate position within the search window. That
`is. there are a total (2 d+l) x (2 d+l) positions that need to be examined. The minimum dissimilaril)‘
`gives the best match. Apparently, this full search procedure is brute force in nature. While the full
`search delivers good accuracy in searching for the best match (thus, good accuracy in motion
`estimation). a large amount of computation is involved.
`In order to lower computational complexity. several
`developed. They are introduced below.
`
`fast searching procedures have been
`
`11.3.2
`
`2-D [OCARITHMIC SEARCH
`
`Jain and Jain (1931) developed a 2-D logarithmic searching procedure. Based on a 1—D logarithmic
`search procedure (Kmflh. 1973). the 2-D procedure successively reduces the search area.
`thus
`reducing the computational burden. The first steps computes the matching criteria for five 1301““;
`i“ the search window. Thesc five Points are as follows: the central point of the search window and
`the four points surrounding it. with each being a midpoint between the central point and one of
`the four boundaries of the window. Among these five points. the one corresponding to the minimum
`dissimilarity is picked as the winner. In the next step. surrounding this winner. another set of five
`points are selected in a similar fashion to that in the first step. with the distances between the five
`points remaining unchanged. The exception takes place when either a central point of a set of five
`points 01' 3 boundary POW 0m“? SCEIICh Window gives a minimum D value. In these circumstanceS.
`the distances between the five points need to be reduced. The procedure continues until the final
`step. in which a set of candidate points are located in a 3 x 3 2-D grid. Figure l 1.3 demonstrates
`No cases of the procedure. Figure 11.3(a) shows that the minimum D value takes place on a
`boundary. while Figure 11.3(b) shows the minimum D value in the central position.
`
`|PR2018—01413
`
`Sony EX1008 Page 250
`
`IPR2018-01413
`Sony EX1008 Page 250
`
`
`
`Block Matching
`
`225
`
`
`
`01}
`
`(a) A 2-D logarithmic search procedure. Points at (i. k+2L 0+2: k‘l'2)- 0+2: “4}: ancl 0+}:
`FIGURE 11.3
`k+4J are found to give the minimum dissimilarity in steps 1. 3. 3. and 4. IE§P€CIIV€ly:
`(‘b} A 2? loganl-hmic
`. search procedure. Points at U. k,2)‘ 0+2, 192). and 0+2. k.]) are found to give the minimum diSSlmiIarity In
`sleps ]. 2. 3, and 4, respectively.
`
`A convergence proofofthe procedure is presented by Jain and Jain (1931). under the assumption
`that the dissimilarity monotonically increases as the search pom: moves away from the point
`correSponding to the minimum dissimilarity.
`
`|PR2018—01413
`
`Sony EX1008 Page 251
`
`IPR2018-01413
`Sony EX1008 Page 251
`
`
`
`226
`
`if on
`
`W
`
`Image and Video Compre55ion for Mu|timedia Engineering
`
`
`3% ‘b”#1gush-l
`
`'0'ta0
`
`I:0Go,a
`
`Three—step search procedure. Points (j+~'l. k—4). U+4. k—o), and tj+5. k-7) give the minimum
`FIGURE 11.4
`dissimilarity in steps 1, 2. and 3. respectively.
`
`11.3.3
`
`COARSE—FINE THREE-STEP SEARCH
`
`Another important work on the block matching technique was completed at almost the same time
`by Koga et a1. (1981). A coarse-fine three—step procedure was developed for fast searching.
`The three-step search is very similar to the 2—D logarithm search. There are, however, three
`main differences between the two procedures. First, each step in the three-step search COITIpfll’BS‘fl
`set of nine points that form a 3 x 3 2—D grid structure. Second, the distances between the points tn
`the 3 X 3 2—D grid structure in the three-step search decrease monotonically in steps 2 and 3. Third,
`a total of Only three steps are carried out, Obviously. these three items are different from the 2—D
`logarithmic search described in Section 11.3.2. An illustrative example of the three-step search is
`shown in Figure 11.4.
`
`11.3.4 CONJUGATE DIRECTION SEARCH
`
`The conjugate direction search is another fast search algorithm that was developed by Srinivasan
`and R30 (1934)- In PfinCiPle. the procedure consists of two parts. In the first part,
`it findsthe
`minimum dissimilarity along the horizontal direction with the vertical coordinate fixed at an’initial
`position. In the second part, it finds the minimum D value along the vertical direction With the
`horizontal coordinate fixed in the position determined in the first part. Starting with the vertical
`direction followed by the horizontal direction is, of course, functionally equivalent. It was reported
`that this search procedure works quite efficiently (Srinivasan and Rao, 1984).
`Figure 11.5 illustrates the principle of the conjugate direction search. In this example, each
`step involves a comparison between three testing points. If a point assumes ihE-mlmm-um D V3126
`compared with both of its two immediate neighbors (in one direction), then it ts consrdereld to“ 6
`the best match along this direction, and the search along another direction is started. Spwffi‘: I);
`the procedure starts to compare the D values tor three points (i, k—-l), (i, k), and (i, k+l)- 1_ :1)
`value of point (j, k—l) appears l0 be the minimum among the three, then points (1. k-Z). (l.
`.
`
`|PR2018—01413
`
`Sony EX1008 Page 252
`
`IPR2018-01413
`Sony EX1008 Page 252
`
`
`
`Block Matching
`
`227
`
`
`
`FIGURE 11.5 Conjugate direction search.
`
`and (j. k) are examined. The procedure continues. finding point (j, k—3) as the best match along
`the horizontal direction since its D value is smaller than that of points (j, k—4} and (j, k—2). The
`procedure is then conducted along the vertical direction. In this example the best matching is finally
`found at point lj+2t k—3).
`
`11.3.5
`
`SUBSAMPLING IN THE CORRELATION WINDOW
`
`In the evaluation of the matching criterion. either MAD or MSE, all pixels within a correlation
`window at the {M frame and an original block at the I” frame are involved in the computation. Note
`that the correlation window and the original block are the same size (refer to Figure l 1.1). In order
`to further reduce the computational effort. a subsantpling inside the window and the block is
`performed (Bierling, 1988). Aliasing effects can be avoided by using low-pass filtering. For instance.
`GHIY every second pixel, both horizontally and vertically inside the window and the block, is taken
`into account for the evaluation of the matching criterion. Obviously, by using this subsampling
`technique, the computational burden is reduced by a factor 0M. Since 3.34 of the pixels within the
`window and the block are not involved in the matching computation. however, the use of such a
`suhsampling procedure may affect the accuracy of the estimated motion vectors. especially in the
`case of small-size blocks. Therefore.
`the subsampting technique is recommended only for those
`cases with a large enough block size so that the matching accuracy will not be seriously affected.
`Figure l 1.6 shows an example of 2 x 2 subsampling applied to both an original block of 16 x 16
`at the I" frame and a correlation window of the same size at the t,” frame.
`
`11.3.6 MULTIRESOLUTION BLOCK MATCHING
`
`is a very
`is well known that a muttiresolution structure. also known as a pyramid structure.
`It
`Powerful computational configuration for various image processing tasks. To save computation in
`block matching, it is natural to resort to the pyramid structure. In fact, the multiresolution technique
`has been regarded as one of the most efficient methods in block matching (Tzovaras et al., 1994).
`In a named top-down multiresolution technique, a typical Gaussian pyramid is formed first.
`
`|PR2018—01413
`
`Sony EX1008 Page 253
`
`IPR2018-01413
`Sony EX1008 Page 253
`
`
`
`223
`
`Image and Video Compression for Multimedia Engineering
`
`an original block
`
`a correlation windmi-
`
`(a) An original block of l6>¢15 in fi'ame at I .
`
`(b) A correlation mndnw of 15): I 5 Ln Emma atl
`
`FIGURE 11.6 An example of 2 x 2 subsarnpling in the original block and correlation window for a faSI
`search.
`
`Before diving into further description. let us pause here to give those readers who have not been
`exposed to the Gaussian pyramid a short introduction to the concept. For those who know the
`coacept. this paragraph can be skipped. Briclly speaking, a Gaussian pyramid can he understood
`as a set of images with different resolutions related to an original image in a certain way. The
`original image has the highest resolution and is considered as the lowest level. sometimes called
`the bottom level.
`in the set. From the bonorn level
`to the top level.
`the resolution decreases
`monotonically. Specifically, between two consecutive levels, the upper level is half as large as the
`lower level in both horizontal and vertical directions. The upper level is generated by applying a
`low-pass filter {which has a group of weights) to the lower level. followed by a 2 >< 2 subsampllng‘
`That is. each pixel in the upper level is a weighted average of some pixels tn the lower level
`In
`general, this iterative procedure of generating a level in the set is equivalent to convolving a specific
`weight function With the original image at the bottom level followed by an appropriate subsantpling'
`Under certain conditions. these weight functions can closely approximate the Gaussian probabilit)’
`density fUflCtion. which is why the pyramid is named after Gauss. (For a detailed discussion. readers
`are referred to Burt and Adelson [1933. 1984].)l A Gaussian pyramid structure is depicted tn
`Figure It]. Note that the Gaussian pyramid depicted in Figure l 1.? resembles a so-called quad—
`trce structure in which each node has four children nodes. In the simplest quad-tree pyramid, eaCh
`pixel in an upper level is assigned an average value of its correSponding four pixels in the DES“
`lower level.
`
`Now let‘s return to our discussion on the top~down multiresolution technique After a Gaussran
`Piramid has been constructed. motion search ranges are allocated among the different Pyram'd
`levels. Block matching is initiated at the lowest resolution level to obtain an initial estimation of
`motion vectors. These computed motion vectors are then propagated to the next higher resolution
`level. where they are corrected and then propagated to the next level. This procedure continues
`until the highest resolution level is reached. As a result. a large amount of computation can be
`saved. Tzovaras et a1. (1994) showed that a two~level Gaussian pyramid outperforms a three-level
`pyramid. Compared with full search block. matching. the top-down multiresolution block searClt
`saves up to 67% ofcomputations withoutseriously affecting the quality of the reconstructed images-
`In conclusion. it has been demonstrated that multiresolution is indeed an efficient compuml‘onal
`structure in block matching. This once again confines the high computational efficiency of ”“3
`multiresolution structure.
`
`|PR2018—01413
`
`Sony EX1008 Page 254
`
`IPR2018-01413
`Sony EX1008 Page 254
`
`
`
`Block Matching
`
`229
`
`level 4
`
`level
`increasing
`resolution
`
`de
`
`tag
`
`level I
`
`
`level 3
`
`level 2
`
`FIGURE 11.7 Gaussian pyramid structure.
`
`11.3.7 THRESHOLDING MULTIRESOLUTIDN Btocx MATCHING
`
`With the multiresolution technique disenssed above. the computed motion vectors at any interrac—
`diate pyramid level are projected to the next higher resolution level. In reality. some computed
`motion vectors at
`the lower resolution levels may be inaccurate and have to be further refined.
`while others may be relatively accurate and able to provide satisfactory motion compensation for
`the corresponding block. From a computation-saving point of view. for the latter class it may not
`be worth propagating the motion vectors to the next higher resolution level For further processing.
`Motivated by the above observation,
`at new inultircsolution block matching method with a
`thresholding technique was developed by Shi and Xia (1997). The thresholding technique prevents
`those blocks, whose estimated motion vectors provide satisfactory motion compensation. from
`further processing. thus saving a lot of computation. In what follows. this technique is presented
`in detail so as to proVide readers with an insight to both multiresolution block matching and
`thresholding multiresolution block matching techniques.
`
`Algerithm —— Let fact, y) be the frame of an image sequence at current moment it. First. two
`Gaussian pyramids are formed, pyramids u and n — I. from image frames fits, y) and fi,_,(x.y).
`respectively. Let the levels of the pyramids be denoted by l. l: 0.
`l.
`L, where 0 is the lowest
`resolution level (rep level). L is the full resolution level (bortom level), and L+l is the total number
`0f layers in the pyramids. If (Lj) are the coordinates of the upper-left corner of a block at level l
`01" pyramid n. the block is referred to as block (i.j),§. The horizontal and vertical dimensions of a
`block at level I are denoted by b; and b'y. respectively. Like the variable block size method (refer
`to Method 1
`in Tzovaras et al. {1994]}, the size of the block in this work varies with the pyramid
`levels. That is. ifthe size ofa block at level i is bi, then the size of the block at level H» 1 becomes
`2b; 3': 2b;. The variable block size method is used because it gives more efficient metion estimation
`than the fixed block size method. Here. the matching criterion used for motion estimation is the
`MAD because it does not require ntultiplication and performs similar to the MSE. The MAD
`between block (i._i)'b,'1 of the current frame and block (i + v‘, j + v,)'b:._l of the previous frame at
`level l can be calculated as
`
`|PR2018—01413
`
`Sony EX1008 Page 255
`
`IPR2018-01413
`Sony EX1008 Page 255
`
`
`
`230
`
`Image and Video Compression for Multimedia Engineering
`
`MADW],[V_:.1’:_)=fiE Elfif(t+k,j+ m)—fif_,(r+ k + vfi.j+ m + vi)!
`
`3' k1“ mzfl
`
`{115)
`
`bl-t bi-l
`
`where V1 = (vi. vy'.) is one of the candidates of the motion vector of block (id-)1.- u". it: are the two
`components of the motion vector along the it and y directions, respectively.
`A block diagram of the algorithm is shown in Figure 11.8. The threshold in terms of MAD
`needs to be determined in advance according to the accuracy requirement ol‘ the motion estimation
`Determining the threshold is discussed below in Part B of this subsection. Gaussian pyramids are
`formed for two consecutive frames of an image sequence from which motion estimation is desired
`Block matching is then performed at
`the top level with the full-search scheme. The estimated
`motion vectors are checked to see if they provide satisfactory motion compensation. it the accuracy
`requirement is met. then the motion vectors will be directly transformed to the. bottom level of the
`pyramid. Otherwise, the motion vectors will be propagated to the next higher resolution level for
`further refinement. This thresholding process is discussed below in Part C of this Subsection. The
`algorithm continues in this fashion until either the threshold has been satisfied or the bottom level
`has been reached. The skipping of some intermediate~level calculations provides for computational
`saving. Experimental work with quite different motion complexities demonstrates that the proposed
`algorithm reduces the processing time from 14 to 20%. while maintaining almost the same quality
`in the reconstructed image compared with the fastest existing mnltiresolution block matching
`algorithm (Tzovaras et al.. t994).
`
`Block matching
`
`Low pass filtering
`and subsampling
`
`Block matching
`
`Satisfying
`thrflhold
`
`Low pass filtering
`
`
`
`and substimpling
`
`
`
`and suhsarnpling
`
`
`Low pass filtering
`
` Satisfying
`
`
`Low pass filtering
`and subsampling
`
`threshold
`
`Motion field
`
`
`FIGURE 11.8 Block diagram for a three-level threshold multiresolution block matching-
`
`|PR2018—01413
`
`Sony EX1008 Page 256
`
`IPR2018-01413
`Sony EX1008 Page 256
`
`
`
`Block Matching
`
`231
`
`——_._._—_____—_____—
`
`TABLE 11.1
`
`Parameters Used in the Esperiments
`
`Parameters at Level
`
`Low Resolution Level
`
`Full Resolution level
`
`Search range
`Block size
`
`Thresholding value
`
`Search range
`Block size
`
`Tltrcsholding value
`
`MiSS America
`3 x 3
`4 X 4
`
`| x l
`3 x 8
`
`2
`
`Train
`
`4 a 4
`4x4
`
`3
`
`Football
`
`None {not applicable}
`
`Ix I
`8x8
`
`None (not applicable)
`
`Search range
`4 x 4
`l x |
`Block size
`4 a: 4
`Bx 8
`4Thresholding value None (not applicable)
`
`
`
`
`
`Threshold Determination — The MAD accuracy criterion is used in this work for the sake of
`saving computations. The threshold value has a direct impact on the performance of the proposed
`algorithm. A small threshold value can improve the reconstructed image quality at the expense of
`increased computational effort. On the other hand. a large threshold value can reduce the compu-
`tational complexity. but the quality of the reconstructed image may be degraded. One possible way
`to determine a threshold value. which was used in many experiments by Shi and Xia U997), is as
`follows.
`
`The peak signal-to-noise ratio (PSNR) is commonly used as a measure of the quality of the
`reconstructed image. As introduced in Chapter 1. it is defined as
`
`PSNR = 10 logm
`
`
`2552
`MSE
`
`(t 1.6)
`
`From the given required PSNR, one can find the necessary MSE value. A square root of this
`MSE value can be chosen as a threshold value, which is applied to the first two images from the
`sequence. If the resulting PSNR and required processing time are satisfactory. it is then used for
`the rest of the sequence. Otherwise. the threshold can be slightly adjusted accordingly and applied
`to the second and third images to check the PSNR and processing time. it was reported in numerous
`experiments that this adjusted threshold value was accurate enough. and that there was no need for
`further adjustment. As shown in Table NJ, the threshold values used for the “Miss America.“
`“Train." and “Football" sequences (three sequences having quite different motion complexities) are
`2. 3, and 4, respectively. They are all determined in this fashion and give satisfactory performance.
`as shown in the three rows marked “New Method (TH=2}," ”New Method (TI—1:3)" and “New
`Method (TH=4),“ respectively, in Table l 1.2. That is, the PSNR experiences only about 0-1 dB 1055
`and the processing time decreases drastically. In the experiments. the threshold value of 3. Le, the
`average value of2, 3. and 4. was also tried. Refer to the three rows marked "New Method {TH=3}
`in Table 11.2. It is noted that this average. threshold value 3 has already given satisfactory perfor-
`mance for all three sequences. Specifically, for the "Miss America“ sequence, since the criterion
`increases from 2 to 3, the PSNR loss increases from 0.12 to 0.48 dB. and the reduction in processing
`time increases from 20 to 38%. For the ”Football“ sequence, since the criterion decreases from
`4 10 3. the PSNR loss decreases from 0.08 lo 0.05 dB, and the reduction in PFOCBSSIUS time decreases
`
`|PR2018—01413
`
`Sony EX1008 Page 257
`
`IPR2018-01413
`Sony EX1008 Page 257
`
`
`
`232
`
`image and Video Compression for Multimedia Engineering
`
`from 14 to 9%. Obviously. for the "Train“ sequence. the criterion, as well as the performance,
`remains the same. One can therefore conclude that the threshold determination may not require
`much computation at all.
`
`Thresholding — Motion vectors estimated at each pyramid level will be checked to see it they
`provide satisfactory motion compensation. Assume V’ (Lj) = (vi. vi) is the estimated motion vector
`for block (i.j)in at level i of pyramid H. For thresliolding. V’ {Lj} should be directly projected to
`the bottom level L. The corresponding motion vector for the same block at the bottom level of
`pyramid it will be VL (2“‘43 r'.2“-'“j), and is given as
`
`VL(2“-'”t. THU): Z‘L'i'V’lLJ)
`
`(11.7}
`
`The MAD between the block at the bottom pyramid level of the current frame and its counterpart
`in the previous frame can be determined according to Equation 1 IS. where the motion vector is
`Vb: VL (Eur-ll LEW-”J0. This computed MAD value can be compared with the predefined threshold.
`If this MAD value is less than the threshold. the computed motion vector l/L (Z‘H’ i.2""“j) will
`be assigned to block (2"~"" i.2"-‘”j){; at level L in the current frame and motion estimation for this
`block will be stopped. If not. the estimated motion vector V“ (i._l) at
`level
`1’ will be propagated to
`level l+ l for further refinement. Figure 11.9 gives an illustration of the above thresholding process.
`
`Experiments — To verify the effeCIiveness of the proposed algorithm. extensive experiments have.
`been conducted. The performance of the new algorithm is evaluated and compared with that of
`Method 1. one of the most efficient multiresolution block matching methods (Trovaras et at. 1994}
`in terms of PSNR, error image entropy, motion vector entropy, the number of blocks stopped at
`the top level vs. the total number of blocks, and processing time. The number of blocks stopped
`at the top level is the number of blocks withheld from further processing. whilc the total number
`of blocks is the number of blocks existing at
`the top level. It
`is noted that the total number of
`blocks is the same for each level in the pyramid. The processing time is the sum ot’ the total number
`of additions involved in the evaluation of the MAD and the thresholding operation.
`In the experiments. two-level pyramids are used since they give better performance for motion
`estimation purposes [Tzovaras etal.. 1994). The algorithms are tested on three video sequences
`with different motion complexities. i.e.. the “Miss America." “Train." and “Football." The “Miss
`America" sequence has a Speaker imposed on a Static background and contains less motion. The
`“Train" sequence has more detail and contains a fast-moving object (train). The 20th frame of the
`sequence is shown in Figure l 1.10. The “Football“ sequence contains the most complicated motion
`
`Prrmfid
`n-l
`
`fl
`
`Estimation of motion vector
`ofa block at level I
`
`
`Pyramid
`:1
`
`f
`
`Pyramid
`level
`
`I
`
`Projection of
`the block and
`its esnma'ted
`motion vector
`at level I.
`
`Calculation of the MAD of
`the block at level I.
`
`
`
`FIGURE 11.9 The thresholding proceSs.
`
`|PR2018—01413
`
`Sony EX1008 Page 258
`
`IPR2018-01413
`Sony EX1008 Page 258
`
`
`
`Block Matching
`
`233
`
`
`
`FIGURE 11.111 The ltttlt frame of the “Train" sequence.
`
`‘ F
`
`IGURE [1.1] The 20th frame in the “Football" sequence.
`
`.‘h >"I—""":"'""_-_a'9' _...
`
`mmPared with the other two sequences. The 20th frame is shown in Figure 11.11. Table 11.1 is
`the list of implementing parameters used in the experiments. Tables 1 1.2 and l [.3 give the perfor~
`mance ofthe proposed algorithm compared with Method 1. In all three cases, the motion estimation
`has a half-pixel accuracy, the meaning of which will be explained in the next section. All perfor—
`mance measures listed there are averaged for the first 25 frames of the testing sequences.
`Each frame of the ”Miss America" sequence is of 360 x 288 pixels. For convenience, only the
`central