throbber
|
`
`Block Matching
`
`As mentionedin the previous chapter, displacement vector measurement andits usage in motion
`compensation in interframe coding for a TV signal can be traced back to the 1970s. Netravali and
`Robbins (1979) developed a pel-recursive technique, whichestimates the displacement vector for
`eachpixel recursively fromits neighboring pixels using an optimization method. Limb and Murphy
`(1975), Rocca and Zanoletti (1972), Cafforio and Rocea (1976), and Brofferio and Rocca (1977)
`developed techniques for the estimation of displacement vectors of a block ofpixels. In the latter
`approach, an image is first segmented into areas with each having an approximately uniform
`translation. Then the motion vector is estimated for each area. The segmentation and motion
`estimation associated withthese arbitrarily shaped blocks are very difficult. When there are muluple
`moving areas in images, the situation becomes more challenging. In addition to motion vectors,
`the shape information of these areas needs to be coded. Hence, when moving areas have various
`complicated shapes, both computational complexity and coding load will increase remarkably.
`In contrast,
`the block matching technique, which is
`the focus of this chapter,
`is simple,
`straightforward, and yet very efficient.
`It has been by far the most popularly uulized motion
`estimation technique in video coding.
`In fact, it has been adopted by all the international video
`coding standards; ISO, MPEG-I and MPEG-2, and ITU H.261, and H.263. These standards will
`be introduced in detail in Chapters 16, 17, and 19, respectively.
`It is interesting to note that even nowadays, with the tremendous advancements in multimedia
`engineering, object-based and/or content-based manipulation of audiovisual informationis still very
`demanding, particularly in audiovisual data storage, retrieval, and distribution. The applications
`include digital library, video on demand, audiovisual databases, and so on. Therefore, the coding
`of arbitrarily shaped objects has attracted great researchattention these days. It has been included
`in the MPEG-4 activities (Brailean, 1997), and will be discussed in Chapter 18,
`In this chapter various aspects of block matching are addressed. They include the concept and
`algorithm, matching criteria, searching strategies, limitations, and new improvements.
`
`11.1 NONOVERLAPPED, EQUALLY SPACED, FIXED SIZE,
`SMALL RECTANGULAR BLOCK MATCHING
`
`To avoid the kind of difficulties encountered in motion estimation and motion compensation with
`arbitrarily shaped blocks, the block matching technique was proposed by Jain and Jain (1981) based
`on the following simple motion model.
`An imageis partitioned into a set of nonoverlapped, equally spaced,fixed size, small rectangular
`blocks; and thetranslation motion within each block is assumed to be uniform. Althoughthis simple
`model considers translation motion only, other types of motions, such as rotation and zooming of
`large objects, may be closely approximated by the piecewise translation of these small blocks
`provided that these blocks are small enough. This observation, originally made by Jain and Jain,
`has been confirmed again and again since then.
`Displacementvectors for these blocks are estimated by finding their best matched counterparts
`in the previous frame.
`In this manner, motion estimation is significantly easier than that for
`arbitrarily shaped blocks. Since the motion of each block is described by only one displacement
`vector,
`the side information on motion vectors decreases. Furthermore,
`the rectangular shape
`
`221
`
`IPR2018-01413
`Sony EX1008 Page 247
`
`IPR2018-01413
`Sony EX1008 Page 247
`
`

`

`222
`
`Image and Video Compression for Multimedia Engineering
`
`the best block matching
`
`correlation window
`
` rth displacementvector
`
`
`
`An original block
`
`(a) t, frame
`
`(b)t,., frame
`
`FIGURE 11.1 Block matching.
`
`information is known to both the encoderand the decoder, and hence does not needto be encoded,
`which saves both computation load and side information,
`the smaller the block size, the more
`The block size needs to be chosen properly. In general,
`accurate is the approximation. It is apparent, however, that the smaller block size leads to more
`motion vectors being estimated and encoded, which means anincrease in both computation and
`side information. As a compromise,a size of 16 x 16 is considered to be a good choice. (This has
`been specified in international video coding standards such as H.261, H.263, and MPEG-1 and
`MPEG-2.) Notethat for finer estimation a block size of 8 x 8 is sometimes used.
`Figure 11.1 is utilized to illustrate the block matching technique.
`In Figure !1.1(a) an image
`frame at moment£, is segmented into nonoverlapped p x g rectangular blocks. As mentioned above,
`in commonpractice, square blocks of p = q = 16 are used most often. Consider one of the blocks
`centered at (x, y). It is assumed that the block is translated as a whole. Consequently, only one
`displacement vector needsto be estimated for this block. Figure | 1.1(b) shows the previous frame:
`the frame at momentt,.,. In order to estimate the displacement vector, a rectangular search window
`is opened in the framer,, and centered atthe pixel (x, y). Consider a pixel in the search window,
`a rectangular correlation window of the same size p x q is opened with the pixel
`located in its
`center. A certain type ofsimilarity measure (correlation) is calculated, After this matching process
`has been completed for all candidate pixels in the search window,the correlation window corre
`spondingto thelargest similarity becomesthe best matchof the block under consideration in frame
`1,. The relative position betweenthese two blocks(the block and its best match) gives the displace-
`ment vector. This is shown in Figure 11.1(b).
`The size of the search window is determined by the size of the correlation window and the
`maximum possible displacementalongfour directions: upward, downward,rightward, and lefuward.
`In Figure 11.2 these four quantities are assumed to be the same and are denoted by d. Notethat d
`is estimated from a priori knowledge about the translation motion, which includes the largest
`possible motion speed and the temporal interval between two conseculive frames, 1-€.,
`fy ~ Unt:
`
`11.2} MATCHING CRITERIA
`
`Block matching belongs to image matching and can be viewed from a wider perspective. In many
`image processing tasks, we need to examine two imagesor twoportions of images on a pixel-by-pixel
`
`IPR2018-01413
`Sony EX1008 Page 248
`
`IPR2018-01413
`Sony EX1008 Page 248
`
`

`

`Block Matching
`
`223
`
`qt2d
`
`)
`
`pr2d
`
`FIGURE 11.2 Search window and correlation window,
`
`basis. These two images or two image regions canbe selected from a spatial image sequence, i.e.,
`from twoframes taken at the same time with \wo different sensors aiming at the same object, or
`from a temporal image sequence,i.c., from two frames taken al Lwo different moments by the same
`sensor. The purpose ofthe examinationis to determinethe similarity between the two images or
`two portions of images. Examples of this type of application include image registration (Pratt,
`1974) and template matching (Jain, 1989). The former deals with spatial registration of images,
`while the latter extracts and/or recognizes an object in an image by matching the object template
`and a certain area ofthe image.
`The similarity measure, or correlation measure, is a key element in the matching process, The
`basic correlation measure between two images1, and f,,, C (s, 0), is defined as follows (Anuta, 1969).
`
`=
`
`Pa2a (11.1)
`[SFShnoe! Or Deeines
`
`Thisis also referred to as a normalized two-dimensionalcross-correlation function (Musmannet al.,
`1985).
`Instead offinding the maximum similarity or correlation, an equivalent but yet more compu-
`tationally efficient way of block matchingis to find the minimumdissimilarity, or matchingerror.
`The dissimilarity (sometimesreferred to as the error, distortion, or distance) between (wo images
`i, and t,.,, D (s, t) is defined as follows.
`
`r= LYmi(Kfatnk+0),
`
`j=l k=l
`
`(11.2)
`
`where M(u,v)is a metric that measures the dissimilarity between the two arguments u and v. The
`D (s, t) is also referred to as the matchingcriterion or the D values.
`In the literature there are several types of matching criteria, among which the mean square
`error (MSE) (Jain and Jain, 1981) and mean absolute difference (MAD) (Koga et al., 1981) are
`used mostoften. It is noted that the sum of the squared difference (SSD) (Anandan, 1987) or the
`sum of the squared error (SSE) (Chanet al., 1990) is essentially the same as MSE. The mean
`
`IPR2018-01413
`Sony EX1008 Page 249
`
`IPR2018-01413
`Sony EX1008 Page 249
`
`

`

`294
`
`Image and Video Compression for Multimedia Engineering
`
`absolute difference is sometimes referred to as the mean absolute error (MAB) in the literature
`(Nogaki and Ohta, 1972).
`In the MSE matchingcriterion, the dissimilarity metric M (u, v) 1s defined as
`
`In the MAD,
`
`M(u,v)=(u—v)’.
`
`M(u,v)=|u—y.
`
`(11.3)
`
`(11.4)
`
`Obviously,both criteria are simpler than the normalized two-dimensional cross-correlation measure
`defined in Equation 11.1.
`Before proceeding to the next section, acommentontheselection ofthe dissimilarity measure
`is due. A study based on experimental works reported that the matching criterion does not signif-
`icantly affect the search (Srinivasan, 1984). Hence, the MADis preferred due toits simplicity in
`implementation (Musmann et al,, 1985),
`
`11.3.
`
`SEARCHING PROCEDURES
`
`The searching strategy is another importantissue to deal with in block matching. Several searching
`Strategies are discuused below.
`
`11.3.1
`
`Futt Searcy
`
`Figure 11.2 showsa search window, a correlation window,andtheir sizes. In searching for the best
`match,the correlation window is moved to each candidate position within the search window. That
`is, there are a total (2 d+1) x (2 d+!) positions that need to be examined. The minimum dissimilarity
`gives the best match. Apparently, this full search procedureis brute force in nature. While the full
`search delivers good accuracy in searching for the best match (thus, good accuracy in motion
`estimation), a large amount of computation is involved.
`In order to lower computational complexity, several
`developed. They are introduced below.
`
`fast searching procedures have been
`
`11.3.2
`
`2-D LocaritHmic SearcH
`
`Jain and Jain (1981) developed a 2-D logarithmic searching procedure. Based on a 1-D logarithmic
`search procedure (Knuth, 1973), the 2-D procedure successively reduces the search area,
`thus
`reducing the computational burden, Thefirst steps computes the matching criteria for five points
`in the search window.Thesefive points are as follows: the central point of the search window and
`the four points surroundingit, with each being a midpoint between the central point and one of
`the four boundaries of the window. Amongthesefive points, the one corresponding to the minimum
`dissimilarity is picked as the winner, In the next step, surrounding this winner, anotherset of five
`points are selected in a similar fashionto that in thefirst step, with the distances between the five
`points remaining unchanged. The exception takes place wheneither a central point of a set of five
`points or a boundary pointofthe search window gives a minimumDvalue.In these circumstances,
`the distances betweenthe five points need to be reduced, The procedure continues until the final
`step, in whicha set of candidate points are located in a 3 x 3 2-D grid. Figure 11.3 demonstrates
`two cases of the procedure. Figure 11.3(a) shows that the minimum D value takes place on 4
`boundary, while Figure 11.3(b) shows the minimum D valuein the central position,
`
`IPR2018-01413
`Sony EX1008 Page 250
`
`IPR2018-01413
`Sony EX1008 Page 250
`
`

`

`Block Matching
`
`225
`
`
`
`(b)
`
`(a) A 2-D logarithmic search procedure, Points at (j, k+2), (+2, k+2), (+2, k+4), and G+1,
`FIGURE 11.3
`k+4) are foundto give the minimum dissimilarity in steps 1, 2, 3, and 4, respectively. (>) A: 2-D loganthmte
`‘ search procedure, Pointsat (j, k-2), (j+2, k-2), and (+2, k-1) are found to give the minimum dissimilarity in
`Steps 1, 2, 3, and 4, respectively.
`
`A convergenceproof ofthe procedureis presented by Jain and Jain (1981), underthe assumption
`that the dissimilarity monotonically increases as the search point moves away from the point
`corresponding to the minimum dissimilarity.
`
`IPR2018-01413
`Sony EX1008 Page 251
`
`IPR2018-01413
`Sony EX1008 Page 251
`
`

`

`226
`
`Image and Video Compression for Multimedia Engineering
`
`ol a
`
`°4
`
`=
`
`
`werosoewaaieee (~)
`
`=)
`
`No
`
`
`
`
`
`
`es
`
`el
`iigelalel
`lta
`
`ee
`(er[aeeaa
`Dalgetyer|
`
`
`ieaEd
`
`
`Cleteaaaeaesteee
`
`
`FIGURE 11.4 Three-step search procedure. Points (j+4, k-4), (j+4, k-6), and (j+5, k-7) give the minimum
`dissimilarity in steps 1, 2, and 3, respectively.
`
`11.3.3 Coarse-Fine THree-Step SEARCH
`
`Another important work on the block matching technique was completed at almost the same time
`by Koga et al. (1981). A coarse-fine three-step procedure was developed for fast searching.
`The three-step search is very similar to the 2-D logarithm search. There are, however, three
`main differences between the two procedures.First, each step in the three-step search comparesa
`set of nine points that form a 3 x 3 2-D grid structure. Second, the distances between the points in
`the 3 x 3 2-D grid structurein the three-step search decrease monotonically in steps 2 and 3. Third,
`a total of only three steps are carried out. Obviously, these three itemsare different from the 2-D
`logarithmic search described in Section 11.3.2. Anillustrative example of the three-step search is
`shown in Figure 11.4.
`
`11.3.4 Conyucate Direction SEARCH
`
`The conjugate direction search is another fast search algorithm that was developed by Srinivasan
`and Rao (1984). In principle, the procedure consists of two parts. In the first part,
`it finds the
`minimum dissimilarity along the horizontal direction with the vertical coordinatefixed at aninitial
`position. In the second part, it finds the minimum D value along the vertical direction with the
`horizontal coordinate fixed in the position determined in the first part. Starting with the vertical
`direction followed by the horizontaldirection is, of course, functionally equivalent. It was reported
`that this search procedure works quite efficiently (Srinivasan and Rao, 1984).
`Figure 11.5 illustrates the principle of the conjugate direction search. In this example, each
`step involves a comparison between three testing points. If a point assumes theminimum D ae
`compared with both of its two immediate neighbors (in one direction), then it is considered e e
`the best match along this direction, and the search along anotherdirection is started. See y:
`the procedure starts to compare the D valuesfor three points (j, k-1), (j, k), and G, k+1). I ay
`value of point (j, kK-1) appears to be the minimum amongthe three, then points (j, k-2), U,
`;
`
`IPR2018-01413
`Sony EX1008 Page 252
`
`IPR2018-01413
`Sony EX1008 Page 252
`
`

`

`227
`
`Block Matching
`
`
`
`FIGURE 11.5 Conjugate direction search.
`
`and (j, k) are examined. The procedure continues, finding point (j, k-3) as the best match along
`the horizontal direction since its D value is smaller than that of points (j, k-4) and (j, k-2), The
`procedure is then conducted alongthe vertical direction. In this example the best matchingis finally
`found at point (j+2, k—3).
`
`11.3.5
`
` SuBsamMPLING IN THE CORRELATION WINDOW
`
`In the evaluation of the matching criterion, either MAD or MSE,all pixels within a correlation
`windowatthe f,.; frame and an original blockatthe ¢, frame are involved in the computation. Note
`that the correlation window and theoriginal block are the samesize (refer to Figure 11.1). In order
`to further reduce the computational effort, a subsampling inside the window and the block is
`performed (Bierling, 1988). Aliasing effects can be avoided by using low-passfiltering. For instance,
`only every second pixel, both horizontally and vertically inside the window and the block,is taken
`into account for the evaluation of the matching criterion. Obviously, by using this subsampling
`technique, the computational burdenis reduced by a factor of 4. Since 3/4 of the pixels within the
`window and the block are not involved in the matching computation, however, the use of such a
`subsampling procedure may affect the accuracy ofthe estimated motion vectors, especially in the
`case of small-size blocks. Therefore,
`the subsampling technique is recommended only for those
`cases with a large enough blocksize so that the matching accuracy will not be seriously affected.
`Figure 11.6 shows an example of 2 x 2 subsampling applied to both an original block of 16x 16
`at the 1, frame and a correlation window of the samesize at the f,_, frame.
`
`11.3.6 MuttiresoLuTION BLock MATCHING
`
`is a very
`is well known that a multiresolution structure, also known as a pyramid structure,
`Tt
`powerful computational configuration for various image processing tasks, To save computation in
`block matching,it is natural to resort to the pyramid structure. In fact, the multiresolution technique
`has been regarded as one of the most efficient methods in block matching (Tzovaras et al., 1994).
`In a named top-down multiresolution technique, a typical Gaussian pyramid is formed first.
`
`IPR2018-01413
`Sony EX1008 Page 253
`
`IPR2018-01413
`Sony EX1008 Page 253
`
`

`

`228
`
`Image and Video Compressionfor Multimedia Engineering
`
`an original block
`
`a corelation window
`
`(a) An original block of 16%16 in frame alt,
`
`(b) A correlation window of 146%16 in frameat t
`
`FIGURE 11.6 An example of 2x 2 subsampling in the original block and correlation window for a fast
`search.
`
`Before diving into further description, let us pause here to give those readers who have not been
`exposed to the Gaussian pyramid a short introduction to the concept. For those who knowthe
`concept, this paragraph can be skipped. Briefly speaking, a Gaussian pyramid can be understood
`as a set of images with different resolutions related to an original image in a certain way. The
`original image has the highestresolution and is considered as the lowest level, sometimes called
`the bottom level,
`in the set. From the bottom level
`to the top level,
`the resolution decreases
`monotonically. Specifically, between two consecutive levels, the upper level is half as large as the
`lower level in both horizontal and vertical directions. The upper level is generated by applying a
`low-passfilter (which has a group ofweights) to the lowerlevel, followed by a 2 x 2 subsampling.
`That is, cach pixel in the upperlevel is a weighted average of some pixels in the lower level.
`In
`general, this iterative procedure of generating a level in the set is equivalent to convolving a specific
`weight function with the original imageat the bottom level followed by an appropriate subsampling.
`Undercertain conditions, these weight functions can closely approximate the Gaussian probability
`density function, which is why the pyramid is namedafter Gauss. (For a detailed discussion, readers
`are referred to Burt and Adelson [1983, 1984].) A Gaussian pyramid structure 1s depicted in
`Figure 11.7. Note that the Gaussian pyramid depicted in Figure 11.7 resembles a so-called quad-
`tree structure in which each node hasfourchildren nodes. In the simplest quad-tree pyramid, each
`pixel in an upperlevel is assigned an average valueof ils corresponding four pixels in the next
`lowerlevel,
`Nowlet’s return to our discussion on the top-down multiresolution technique. After a Gaussian
`pyramid has been constructed, motion search rangés are allocated among the different pyramid
`levels. Block matchingis initiated at the lowest resolution level to obtain an initial estimation of
`motion vectors. These computed motion vectors are then propagated to the next higher resolution
`level, where they are corrected and then propagated to the next level. This procedure continues
`until the highest resolution level is reached. As a result, a large amount of computation can be
`saved. Tzovaras et al. (1994) showed that a two-level Gaussian pyramid outperforms a three-level
`pyramid. Compared with full search block matching, the top-down multiresolution block search
`sayes up to 67% of computations withoutseriously affecting the quality ofthe reconstructed images.
`In conclusion,it has been demonstrated that multiresolution is indeedan efficient computational
`Structure in block matching. This once again confirms the high computational efficiency of the
`multiresolution structure.
`
`IPR2018-01413
`Sony EX1008 Page 254
`
`IPR2018-01413
`Sony EX1008 Page 254
`
`

`

`Block Matching
`
`229
`
`
`
`level
`increasing
`resolution
`ae a
`
`level 4
`
`level 3
`
`level 2
`
`level |
`
`level 0
`
`FIGURE 11.7 Gaussian pyramid structure.
`
`11.3.7 THRESHOLDING MULTIRESOLUTION BLock MATCHING
`
`Withthe multiresolution technique discussed above, the computed motion vectors at any interme-
`diate pyramid level are projected to the next higher resolution level. In reality, some computed
`motion vectors at
`the lower resolution levels may be inaccurate and have to be further refined,
`while others may be relatively accurate and able to provide satisfactory motion compensation for
`the corresponding block. From a computation-saving point of view,for the latter class it may not
`be worth propagating the motion vectors to the next higher resolution level for further processing.
`Motivated by the above observation, a new multiresolution block matching method with a
`thresholding technique was developed by Shi and Xia (1997). The thresholding technique prevents
`those blocks, whose estimated motion vectors provide satisfactory motion compensation, from
`further processing, thus saving a lot of computation. In what follows, this technique is presented
`in detail so as to provide readers with an insight to both multiresolution block matching and
`thresholding multiresolution block matching techniques.
`Algorithm — Letf,(x, y) be the frame of an image sequence al current moment n.First, two
`Gaussian pyramids are formed, pyramids n and n — 1, from image frames f(x, y) and f,_\(x,y),
`respectively, Let the levels of the pyramids be denoted by /, /= 0,
`I, ..., L, where 0 is the lowest
`resolution level (top level), L is the full resolution level (bottom level), and L+1 is the total number
`of layers in the pyramids. If(i, j) are the coordinates of the upper-left corner of a block at level /
`of pyramid , the block is referred to as block(i, j)!. The horizontal and vertical dimensions ofa
`block at level / are denoted by b! and b!, respectively. Like the variable block size method (refer
`to Method |
`in Tzovarasetal. [1994]), the size of the block in this work varies with the pyramid
`levels. Thatis, if the size of a block atlevel/ is b!, then the size ofthe blockat level / + 1 becomes
`2b; x 2b). The variable block size methodis used becauseit gives more efficient motion estimation
`than the fixed block size method. Here, the matching criterion used for motion estimation is the
`MADbecauseit does not require multiplication and performs similar to the MSE. The MAD
`between block (i, j)'b! of the current frame and block (i + v,,j + vy)'b}_; of the previous frame at
`leyel / can be calculated as
`
`IPR2018-01413
`Sony EX1008 Page 255
`
`IPR2018-01413
`Sony EX1008 Page 255
`
`

`

`230
`
`Image and Video Compression for Multimedia Engineering
`
`MAD, (“=o SYelithit m) = fy_,(i+ k+vij+m+ vi)
`I xb,
`xb, k=0 m=0
`
`]
`
`plat bi-1
`
`(11,5)
`
`where V' = (vj, vj) is one ofthe candidates of the motion vector of block (i, j)},, v/, wf are the two
`components of the motion vector along the x and y directions, respectively.
`A block diagram of the algorithm is shown in Figure 11.8. The threshold in terms of MAD
`needs to be determined in advance according to the accuracy requirement of the motion estimation
`Determining the threshold is discussed below in Part B of this subsection. Gaussian pyramids are
`formed for two consecutive frames of an image sequence [rom which motion estimationis desired
`Block matching is then performed at
`the top level with the full-search scheme. ‘The estimated
`motion vectors are checkedto see if they provide satisfactory motion compensation. If the accuracy
`requirementis met, then the motion vectors will be directly wansformed to the bottomlevel ofthe
`pyramid, Otherwise, the motion vectors will be propagated to the next higher resolution level for
`further refinement. This thresholding process is discussed below in Part C ofthis subsection. The
`algorithm continuesin this fashion until either the threshold has beensatisfied or the bottomlevel
`has been reached. The skipping of someintermediate-level calculations provides for computational
`saving. Experimental work with quite different motion complexities demonstrates that the proposed
`algorithm reducesthe processing time from 14 to 20%, while maintaining almost the same quality
`in the reconstructed image compared with the fastest existing multiresolution block matching
`algorithm (Tzovaras et al., 1994),
`
`
`Block matching
`
`and subsampling
`
`
`and subsampling
`
`
`
`
`
`
`
`Low passfiltering
`and subsampling
`
`
`
`Sausfying
`threshold
`
`Low pass filtering
`
`Block matching
`
`Low pass filtering
`and subsampling
`
`Satisfying
`threshold
`
`Low passfiltering
`
`Block matching
`
`Motion field
`
`FIGURE 11.8 Block diagram fora three-level threshold multiresolution block matching.
`
`IPR2018-01413
`Sony EX1008 Page 256
`
`IPR2018-01413
`Sony EX1008 Page 256
`
`

`

`Block Matching
`
`231
`
`a T
`
`ABLE 11.1
`Parameters Used in the Experiments
`
`
`
`Parameters at Level Low Resolution Level_Full Resolution Level
`
`Search range
`Block size
`Thresholding value
`
`Search range
`Block size
`
`Thresholding value
`
`Miss America
`3x3
`4x4
`2
`
`Train
`
`4x4
`4x4
`
`3
`
`Football
`
`Ix]
`8x8
`None (not applicable)
`
`Ixl
`8x8
`
`None (not applicable)
`
`Search range
`4x4
`Ixl
`Block size
`4x4
`gx 8
`4Thresholding value None (not applicable)
`
`
`
`
`
`Threshold Determination — The MADaccuracycriterion is used in this work for the sake of
`Saving computations. The threshold value has a direct impact on the performance ofthe proposed
`algorithm. A small threshold value can improve the reconstructed image quality at the expense of
`increased computational effort, On the other hand, a large threshold value can reduce the compu-
`tational complexity, but the quality of the reconstructed image may be degraded. One possible way
`to determine a threshold value, which was used in many experiments by Shi and Xia (1997), is as
`follows.
`The peak signal-to-noise ratio (PSNR) is commonly used as a measure of the quality of the
`reconstructed image. As introduced in Chapter1, it is defined as
`
`PSNR =10102,,
`
`255°
`MSE
`
`(11.6)
`
`Fromthe given required PSNR, one can find the necessary MSE value. A square rootofthis
`MSEvalue can be chosen as a threshold value, which is applied to the first two images from the
`sequence. If the resulting PSNR and required processing timeare satisfactory, it is then used for
`the rest of the sequence. Otherwise, the threshold can be slightly adjusted accordingly and applied
`to the secondand third imagesto check the PSNRand processing time. It was reported in numerous
`experiments that this adjusted threshold value was accurate enough, and that there was no need for
`further adjustment. As shown in Table 11.1,
`the threshold values used for the “Miss America,”
`“Train,” and “Football” sequences(three sequences having quite different motion complexities) are
`2, 3, and 4, respectively. They are all determined in this fashion and give satisfactory performance,
`as shown in the three rows marked “New Method (TH=2),” “New Method (TH=3)” and “New
`Method (TH=4),” respectively, in Table 11.2. That is, the PSNR experiences only about 0.1 dB loss
`and the processing time decreases drastically. In the experiments, the threshold value of 3, i.e., the
`average value of 2, 3, and 4, was also tried. Refer to the three rows marked “New Method (TH=3)"
`in Table 11.2. It is noted that this average threshold value 3 has already given satisfactory perfor-
`mance for all three sequences. Specifically, for the “Miss America” sequence, since the criterion
`increases from 2 to 3, the PSNRloss increases from 0.12 to 0.48 dB, and the reductionin processing
`lime increases from 20 to 38%. For the “Football” sequence, since the criterion decreases from
`4 to 3, the PSNRloss decreases from 0.08 to 0.05 dB, and the reduction in processing time decreases
`
`IPR2018-01413
`Sony EX1008 Page 257
`
`IPR2018-01413
`Sony EX1008 Page 257
`
`

`

`232
`
`Image and Video Compression for Multimedia Engineering
`
`from 14 to 9%. Obviously, for the “Train” sequence, the criterion, as well as the performance,
`remains the same. One can therefore conclude that the threshold determination may not require
`much computation atall.
`
`Thresholding — Motion vectors estimated at each pyramid level will be checked to see if they
`provide satisfactory motion compensation. Assume V!(i,j) =(v{, v\) is the estimated motion vector
`for block (i, j)', at level / of pyramid n. For thresholding, V! (/,/) should be directly projected to
`the bottom level L. The corresponding motion vector for the same block at the bottom level of
`pyramid n will be V4 (2¢-? 7,2) j), and is given as
`
`ye(atte getj) = att yl(i, 9)
`
`(11.7)
`
`The MAD betweentheblockat the bottom pyramid level of the current frame and its counterpart
`in the previous frame can be determined according to Equation 11.5, where the motion vectoris
`VE = VE (2) |,20-1) Ff). This computed MADvalue can be compared with the predefined threshold.
`If this MAD valueis less than the threshold, the computed motion vector V5 (2!) 7.2" j) wall
`be assigned to block (2i,2"j)- at level L in the current frame and motion estimation for this
`block will be stopped. If not, the estimated motion vector V!(i, }) at
`level
`/ will be propagated to
`level / + | for further refinement. Figure 11.9 gives an illustration ofthe above thresholding process.
`
`Experiments — Toverify the effectiveness of the proposed algorithm, extensive experiments have
`been conducted. The performance of the new algorithm is evaluated and compared with that of
`Method1, oneofthe mostefficient multiresolution block matching methods (Tzovaraset al., 1994)
`in terms of PSNR,error image entropy, motion vector entropy, the number of blocks stopped at
`the top level vs. the total number of blocks, and processing time. The number of blocks stopped
`at the top level is the number of blocks withheld from further processing, while the total number
`of blocks is the numberof blocks existing at
`the top level. It
`is noted that the (otal number of
`blocksis the samefor each levelin the pyramid. The processing time is the sum ofthe total number
`of additions involved in the evaluation of the MAD and the thresholding operation.
`In the experiments, two-level pyramids are used since they give better performance for motion
`estimation purposes (Tzovaras et al., 1994). The algorithms are tested on three video sequences
`with different motion complexities, i.e., the “Miss America,” “Train,” and “Football.” The “Miss
`America” sequence has a speaker imposed on a static background and contains less motion. The
`“Train” sequence has moredetail and contains a fast-moving object (train). The 20th frameof the
`sequenceis shownin Figure 11.10. The “Football” sequence contains the most complicated motion
`
`Pyramid
`n-]
`
`LE
`
`Estimation of motion vector
`ofa block at level |
`
`
`Pyramid
`n
`
`LF
`
`Pyramid
`level
`
`Projection of
`the block and
`its estimated
`motion vector
`at level L
`
`Calculation of the MAD of
`the block at level L
`
`
`
`FIGURE 11.9
`
`Thethresholding process.
`
`IPR2018-01413
`Sony EX1008 Page 258
`
`IPR2018-01413
`Sony EX1008 Page 258
`
`

`

`Block Matching
`
`233
`
`FIGURE 11.10 The 20th frame of the “Train” sequence.
`
`
`etee tttaseealanee coed
`
`FIGURE 11.11 The 20th frame in the “Football” sequence.
`
`compared with the other two sequences. The 20th frame is shown in Figure 11.11. Table 11.1 is
`the list of implementing parameters used in the experiments. Tables 11.2 and 11.3 give the perfor-
`manceofthe proposed algorithm compared with Method1. In all three cases, the motion estimation
`has a half-pixel accuracy, the meaning of which will be explained in the next section. All perfor-
`mance measureslisted there are averaged for the first 25 frames ofthe testing sequences.
`Each frameof the “Miss America” sequence is of 360 x 288 pixels. For convenience, only the
`central portion, 320 x 256 pixels, is processed. Using the operational parameters listed in Table 11.1
`(with a criterion value of 2), 38% of the total blocks at the top level satisfy the predefined criterion
`and are not propagated to the bottom level. The processing time needed by the proposed algorithm
`is 20% less than Method ‘1, while the PSNR, the error image entropy, and the vector entropy are
`almost the same. Compa

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket