`
`Image and Video Comp ressio n fo r M L1ltim edi a Engi neerin g
`
`Cons equently, this scheme is capable of achievin g high accur acy in moti on es tim ation , a nd at the
`san1e time it does not cause a large increa se in side information dt1e to the n1otion fie ld segmentation.
`Another key issue is how to achieve a recon .structed n1otion field with pixel res olut ion alon g
`moving boundaries _. In order to avoid extra moti on vector s that need to be enco ded and tran smitted,
`the n1otion vector s applied to the se segmented region s in the area s of motion d iscontinui ty are
`selected from a set of neighboring motion vectors. As a result , the propose d tec hnique is capable
`of reconstructing discontinuitie s in the motion field at pixel resolution whil e maint ainin g the same
`amount of motion vectors as the conventio 'na.l block matchin g tec h11ique.
`A number of algorithm s usin g this type of moti on field segmentatio n tec h·niqL1e have bee n
`developed and their performance has been tested and evalu ated on some rea l v ideo sequences
`(Orchard, 1993). T\vo of the 40-fr&.me test sequences used were Ll1e ''Tab le Tenni s'' and the
`''Football '' sequence s. The former contain s fast ball motion and can1era zoo min g , while the latter
`contains sm,all object s \Vith relatively moderat e amount s of motion and ca n1era pa nnin g . Severa l
`propo sed algorithm s \Vere compar ed with conventional block matching in ter1n s of aver age pixe l
`prediction error energy and bits ·per fra1ne required for codin g predic tion error. Fo r the aver~ge
`pixel prediction error energy , the propo sed algorithn1s achieve a significa nt reducti on, ran gi ng from
`.
`--0.7 to -2.8 d.B with respect to the ''Table Tenni s'' sequ ence , and from - 1.3 to -4.8 dB \Vith the
`''Football'' sequence . For bits per frame required for codin g prediction error, a reduction of 20 to
`30% was reported.
`
`.
`
`11.6.4 OVERLAPPED BLOCK MATCHING "
`
`All the ·technique s discussed so fru-in this section aim at more re liable motio n e ti mation . As a
`result, they aJso alleviate anno ying block artifa cts to a ce rtain exte nt. In thi
`ubsect ion \Ve discuss
`a group of tec.hnigue s, ten11ed overla ,pped block matching, deve lo.ped to a·uevi ate or elimin ate block
`artifacts (Watanabe, 1991; Nogaki and Ohta , 1992; Au yeun g et al. 1992) .
`The idea is to reJax the restriction ·of a nonoverlapp ed block partition
`imp ose d in the bloc k(cid:173)
`based model in block matching . After the nono verl apped, fixed size, small rec tan guJar block
`partition has been . made , each block is enlarged along aU four dire ction s from the cent er of the
`block. Refer to Figure 11.21. Both motion estimation (block rnatchin g) and moti on-comp ensa ted
`prediction are conducted in tb.e same manner as that in block matchin g except for the inclu sion of
`
`an origina1 non-overlapped block
`
`estimated motion vector
`
`•
`
`an enlarged
`.
`.
`. .
`·target bloct· a neiglibonng overlapped block
`
`best matched enlarged · block
`
`(a) frame-at ta
`
`(b) frame at ta-1
`
`FIGURE 11.21 Overlapped block matching.
`
`IPR2021-00827
`Unified EX1008 Page 270
`
`
`
`Block Matching
`
`245
`
`-
`
`a \vin?o':' function. That is, a 2-D window function is utilized in order to mainta·in an appropriat'e
`quant1tat1ve level along tl1e overlapped portion. The window function decays towards the bound(cid:173)
`aries. In (Nogaki and Ohta, 1992) a sine-shaped window function was used.
`. Next, we use the algoritl1rn proposed by Nogaki and Ohta as an example to specifically illustrate
`this type of technique. Consider one of tl1e enlarged, overlapped original (also known as target)
`blocks, T(x,y), with a dimension of l x l. Assume Ll1at a vector vi is one of the candidate displacement
`vectors under conside.ra~ion. Th.e pr~dicted version of the· target block witl1 vi is denoted by v1, ~ ';
`(.x·, y) .. Tl1us, the pred1ct1on error with v;, Ev, (x, )') can be calculated according to the following
`equat1on
`
`Tl1e window function W(x , y) is applied at tl1is stage as follows, resulting in a window-·operat:ed
`prediction error with v;, WE11
`
`• •
`I
`
`( 1 I . 8)
`
`iiV Ev. ( .,r, )1) = E1
`• ( ): , ) ' ) X W ( X, ) ')
`
`I
`
`I
`
`( 11 .9)
`
`Assume that the MAD is used as the matching criterion. Tl can then be deter1nined as usual by
`using the \vindow-operated prediction error WE"1 (x, )'). That is,
`
`•
`
`( 11.10)
`
`The best 1natch, which corresponds to the mir1irnu111 MAD, produces the displace111ent vector v.
`In 111otion-compe11sated prediction, the predicted version of the enlarged target block, P" (x, y)
`is deri\ 1ed from Lhe frame at ti.t by using estimated vector v. The same wir1dow function W (x, y)
`is used lo generate the fir1al window-operated predicted version of the target block. That is,
`
`(11.ll)
`
`It was reported _by Nogaki (l 9·92) that tl1e lur11inance si·gnal of an HDTV sequence was u_sed
`in computer simulation. A block size of 16 x 16 was used for conventional block m.atching, \vl1ile
`a block size of 32 x 32 was employed for tl1e proposed overlapped block. n1atching. T11·e n1axiniun1
`displacement range d w·as taken as d = 15, i.e., fron1 - 15 to + 15 in both the horizontal and vertical
`directions. The simulation indicated a reduction in the power of the prediction error by about 19%.
`Subjectively, it was observed that tl1e blocking edg_es originally existing in tl1e prediction error
`signal with conventional block n1atching was largely removed with the proposed overlapped block
`n1atching technique.
`·
`
`11.7 SUMMARY
`
`By far, bloc_k matching is used. more frequently than any otl1er motion estimation techniqµe in
`motion-compensated coding. By partj.tioning ·a frame into nonoverlapped, equally spaced, fixed
`size, small rectangular blocks and assuming that all the pixels in a block experience the san1e
`translational motion, block matching avoids tl1e djfficulty encountered in motion estin1atio11 qf
`arbitrarily shaped blocks. Consequently, block matching is much sin1p]er and involves less side
`inforrnation compared with moti·on estimation witl1 arbitrarily shaped blocks .
`
`•
`
`IPR2021-00827
`Unified EX1008 Page 271
`
`
`
`246
`
`Image and Video Con1pre ssion for Multir11ecJia Engi neering
`
`Altllou.gl1 tl1is simple n1odel considers translatio11 motion only, otl1er types of 111otions, SLJCh as
`rotation and zoon1in·g of large objects, ,nay be closely approxin1ated by the piece\vise tra.nslation
`of tl1ese s111all blocks, provided that tl1ese blocks are small enougl1. Tl1is i1111Jorta11t observation ,
`originally made by 1-ain and Jain, has beer, con(in11ed again and agai11 since tl1en.
`Various issues related to block matcl1ing such as selectio,1 of .block size , matcl1ing criteria,
`searcl1 strategies, 111atching accuracy, and its li1nitatio11s a11d i111pro,,e,11e11ts are discussed in this
`cl1apter. Specifically, a block size of 16 x 16 is used n1ost ofte11. For ,nore accurate n1otio11 esti1nation,
`tl1e size of 8 x 8 is used so111eti111es.
`Ir1 tl1e latter case, 1.nore accurate n1otion estin1atio11 i obtained
`at tl1e cost of 111ore $ide infom1.ation and l1igl1er computation.al co111plexiL)1•
`Tl1ere are several differe11t types of n1,1tcl1i11g critericl tl1at can be used in l1lock matching. Since
`it ,was s·l10\vn tl1at tl1e differen't criteria ,do 11ot cause sigr1ificant dil'ferences i11 block n1atching, the
`111ean absolute difference is l1ence preferred due to its si111plicity in in1plen1entatio11.
`On the or1e l1and, a full-search procedure delivers good accuraC)' ir1 earcl1ing for the best n1atch.
`011 the other l1and, it requires a large ar11ount of co111putation. 111 order to lo,,,cr computational
`complexity, several fast searching procedures \Vere developed: 2-D logari tl1n1 ic se,lrcl1, coa rse- fine
`three-step search, and conjugate direction searcl1, to n<1me a t·e,.v.
`Besides these suboptin1un1 searcl1 procedures, tl1ere are ·on1c otl1cr n1ea Lire developed to
`lo\.ver computation. One of tl1en1 is su'bsampling ir1 the original block. r1nd tl1c co11·elation ,vindo,,,s.
`gy subsampling, tl1e .co111putational burde11 in block rn.atcl1i11g can be reduced dra t ical I y, ,vhi le the
`accuracy of tl1e estimated motior1 vectors n1ay be affected. Tl1erefore, the s·ubsa111pling procedure
`is only recon1mended for tl1e case \Vith a large block size.
`Naturally, the multiresolution structure, a po,verru l con1pu La Li on al configura tion i 11 i n1age pro(cid:173)
`cessing, lends itself \vell to a fast search in block matchir1g .. It signif1cantly red.uces tl1e computations
`involved. Thresholding multiresolution block matcl1ir1g furtl1er saves co111putation.
`In terms of matching accuracy, several comnio11 choices are one-pixel 1 half-pixel, and quarter (cid:173)
`pixel aGcuracies. Spatial jnterpolation is usually required for half-pixel and quarter-pixel accuracies .
`That is, a l1igl1er accuracy is achieved \.Vith more co1nputatio11.
`Limitations wilh block matcl1ing tccl1niques are mainly ar1 unreliable n10Lion vector field and
`block artifacts. B·olh are caused by the simple model: each block is assL1111ed to expe rience a uni for 111
`trar1slation. Much efforts have been made to in1prove tl1ese drawbacks. Several tecl1niques tl1ar a.re
`an improvement over the conventional block matching technique are di·scussed in this chap ter.
`111 the l1ierarchical block matching technique a set of different sizes for botl1 the original block
`and the correlation windo\v are used. The first leve·I in tl1e hierarchy with a large wir1dow size and
`a large displacen1ent range determines a rnajor portion of the displacen1ent vector reljability. 1l1e
`success.ive levels \Vith smaller \vindow sizes and sn1aller displacement ranges are capable of
`adaptively estimating motion vectors more loc·ally.
`The multigrid. block matching technique uses multigrid structure , another powerful con1puta(cid:173)
`t:ional stru·cture in image processing, to provide a variable size block rnatchin,g. With a split-and(cid:173)
`merge strategy, the thresh.olding multigrid block 1nat.ching technique segments an image into a set
`of variabJe size blocks, each of wllich experiences an a.pprox-imately unifo11111notion. A tree structure
`(bin-tree or quad-tree) is used to record tl1e relationship between tl1ese variable size blocks. With
`the flexibility provided through the variable-size n1ethodology, tl1e tl1resholding block 1natcl1ing
`technique is capabl~ of making lhe motion model of the uniform n1otion within each block rnore
`accurate th·an fixe.d-size block n1atching can do.
`As pointed out in Chapter I 0, the ultimate goal of n1otion con1pensation in video coding i .. s to
`ach.ieve a high coding efficiency. In otl1er words, ac,curate true motion estin1ation is not tl1e f1nal
`goal. From this point o·f view, in .the above-mentioned multigrid block matching, the decision of
`splitting a block is made 0nly when the bits us~d to encode extra 1notion vectors involved in tile
`splitting are less than the bits sa·ved from encoding reduced prediction error due to more acc~ra~e
`estimation. To this end,. an ada.ptive entropy criterion is proposed and used jn the optimal multigricl
`
`IPR2021-00827
`Unified EX1008 Page 272
`
`
`
`Bloc k Mat c hing
`
`247
`
`block n1atching techniqu e. Not only is it opli1nal in the sense of bit saving, but it also eliminates
`lhe need for selling a threshold.
`A[)parently the block-based 111odel encounters a more severe problem along moving boundaries.
`To solve tl1e prob]e111, the f)redictive 1norion field segrnentalion te.chnique make the blocks involving
`111oving boundarie s have tt1e n1otion field measured \vitt1 pixel resolution instead of block resolution.
`In. order to save sh,1pe overl1ead, seg111e11tatio11 is can·iGd out backwards, i.e., based on previously
`decoded fra111es. Ir1 order lo avoid a large i11crease of side informati·or1 associated \vith extra motion
`vectors, tl1e 1notio11 vectors applied to these seg1ne nled regions along n1oving botJndaries are selected
`fron1 a set of 11eigl1boring 111otion vectors. As a result, the technique is capable of reconstructing
`discontir1uities i11 the 1notion field at pixel resolution while maintaining tl1e same amount of motion
`\rectors as the conventional block n1atcl1ing tecl1nigue.
`Tl1e last i11.1proven1enl over conventional block 111atching discussed i11 this cl1apter is overlapped
`block matcl1ing. I11 co11trast to dealir1g with blocks independently of eacl1 otl1er, the overlapped
`block n1atcl1i ng tecl1r1 ique enlarges b.locks so as to make tl1em overlap. A window function is tl1en
`constructed and used in both 1notior1 estimation and n1otion compensation. Because it relaxes tl1e
`restriction or a 11or1overlapped block partition imposed by conve11tional block matching, it achieves
`better performance tba11 tl1c conventional block matcl1ing.
`
`11 .8 EXE RC I SES
`
`11-1. Refer to Figure 11.2. Jt is s,1id tt1at tl1ere are a total of (2d + I ) x (2d + I ) positions tl1at
`need to be exa111i11ed i11 block matcl1ing \iVitl1 fulJ search if one-pixel accuracy is required.
`How ma11y positions are there that need to be ex_mined in block 1natcl1ing vvith full
`search if hair-pixel and quarter-pixel accuracies are required?
`11.-2. What are the 1wo effects that subsan1pling in tl1e original block and the correlation block
`may bring out?
`11-3. Read Burt ,1nd Adelson ( 1983 ) or Burt (1984), and explai11 why Lhe pyranud is named
`a fter Gc:1uss.
`11-4. Read Burt and Adelson ( 1983) or Burt ( I 984 ), and explain \vhy a pyran1id structure is
`co nsidered as a powerful co,nputational con.figuration. Specifically, in multireso lutional
`block 1natcl1ir1g, l1ow a11d to \vhal exte.nt does it save computation dran1atically, con1pared
`with
`tl1e co nven ti 011al block n1atcl1ing techniqu e? You rnay want to refe r to
`Sectior1 1 I .3.7.
`11-5. How is tl1e tl1re-shold dele.1111i11ed in the thresholding mulrjdin1er1sional block 1natching
`technique (refer to Section 11.3.7). It is said tl1at tl1e square root of tl1e MSE value,
`derived fro1n ll1e given PSNR accordi11g to Equation I I .6, is used as an i11itial tl1reshold
`value. Justify tl1e necessity of the square root operation.
`11-6. Re'fer to Section 11.6.1 or tl1e pa1)er by Bierling ( 1988). State the different requirements
`.in tl1e applications of motior1-c·ompensated i11terpolation and 1notion-compensated cod(cid:173)
`ing. Discuss where a full resolution of the tra11slatio11al n1otion vector field may be used?
`11-7. Read the paper Du faux and Mosct1eni ( J 995), and explain the n1ain f'eature of opti111al
`multigrid block matching. State l1ow the adaptive entropy criterion is established. In1ple(cid:173)
`rnent the algorithn1 and compare its perforn1ance witl1 tl1at presented 'by CJ1a11 et al.
`( 1990 ).
`i 1-8. Learn tl1e predict ive moti·on field seg1ne11tatio11 tecl1nique (Orchard, 1993) . Explain hovv
`
`the -algoritl11ns avoid a large increase in overl1ead due to 111otion field segme11tation.
`Implement the overlapped block matching algorjthm introduced by N9gaki ( 1992).
`Compare its perfor~mance witl1 tl1al of tf1e co:nver1tio11al bloc.k n1atcl1ing t.ecl1nique.
`
`11-9.
`
`IPR2021-00827
`Unified EX1008 Page 273
`
`
`
`248
`
`REFERENCES
`
`Image and Video Compression for Multir11edia Engineering
`
`Anandan, P. Measurement Visual Motion Frorn Image Sequences, Ph.D. tl1esis, COINS Department, University
`of Massac'husetts, Atnherst, 1987.
`Anuta, P. F. D'igital registration ,of multispectral video itnagery, Soc. Pl1oto-01;t. /12srr1,111. E,zg. 1 .. 7, 168-175,
`1969.
`AU)'eung, C., J. KosmacJ1, M. Orchard, and T. Kalafatis, Overlapped block motion compensation, SPIE Proc.
`
`\lis11al Co11v111111. !,,,age Process. '92, Boston, MA, Nov. 1992, vol. 18 18. 56 1-571.
`Bierlin,g, M. Displacement estimation by hierarct1icaJ blockmatcl1ing. SPIE P,-oc. \/js1,al Co,111111111. r,,zage
`Process .. 100], 942-951, 1988.,
`Brailean, J. Universal Access,ibility a.nd Object-Based Functio11ality, JSCAS T,,ro,·ial or1 NI PEG 4, June 1997,
`Chap. 3.3.
`Brofferio, S. and F. Rocca, lnterframe redundancy reduction of video signals generated by translating objects,
`IEEE Trans. Co,111111,11.
`, COM-25, 448-455, I 977.
`Burt, P.,J. and E. H. Adel&on, The Laplacian pyran1id as a con1pacl image code , IEEE Tra11s. Co,111111,12., COlv1-
`31 (4), 532-540, 1983.
`Burt, P. J. Tile pyran1id as a structure for effieient computation. in !v!,,Lci re~;o/1,rio,1 !111c1ge Processi,zg a11d
`A11·a/;1sis, A. Rosenfeld., Ed., Springer-Verlag, New York, 1984, 6.
`Cafforio, C. and F. Rocca, Method for 111easur;ing small displacemenl of television images, IEEE Tra,zs. /, if.
`T/1eory,, IT-22, 573a.579, 1976.
`Chan, M. H., Y. B. Yu, and A. G. Constantinides, Variable size block matchir1g r11otion compen sation \V1lh
`applications to video coding. IEEE Proc., 137( 4). 205-212, 1990.
`D,ufaux, F. and M. Kunt, Mu,Jtigrid block n1atching motion estimation vvitl1 an adap.tive loca l mes h refinement,
`
`SPIE Proc. ~s1,al Co1n11i1,11. /111age Process. '92, I 818, 97-109, 1992.
`Dufaux, F. Multigrid Blook Matching Motion Estimalion for Generic Video Coding, Pl1.D. disse rtation. S\viss
`Federal Institute of Technology, Lausanne, Switzerland, 1994.
`Dufaux. F. an,d F. Moscheni, Motion estimation tecl111iques for digital TY: A revie\V and a ne\.v contribution,
`Proc . IEEE, 83(6), 858-876. 1995.
`Hackbusch, W. and U. Trotte_nbe.rg, Eds., M1,ltigrid Metlzocls, Springer-Verlag, Ne\V York, 1982.
`Haskell, B. G. and J. O'. Limb, Predictive video encod,ing using 111easured subject velocity, U.S. Patent
`3.632,865, January 1972.
`Jain, J. R. and A. K. Jain·, Displacement measurement and its application in interframe image cod ing, IEEE
`Tra11s. Co111n11,,11., COM-29( 12), 1799-1808, 1981.
`Jain,, A. K. Fz1nda,11e111als of Digital /111age Processi11g, Prentice-Hall, Engle\vood Cliffs, NJ. 1989.
`Koga, T., K. Linuma, A. Hirano, Y. lijima, and T. Ishiguro, Motion-compe nsated inrerframe codi ng for video
`conferencing, Proc. NTC'81 , G5.3. l-G5.3.5, Ne\v Orleans, LA, Dec. 1981.
`Knuth, D. E. Searching and Sorting, T/1e Ari of Co111p1,ter Progra,n111i11g, Vol. 3, Addison-Wesley, Reading,
`MA, 1973 .
`L,imb, J. 0. and J. A. M.urphy, Measuring rhe speed of moving 9bjects from televisio11 signals, IEEE T,·a,is.
`Co111,,1ii,11., COM-23, 474-478, 1975.
`Lin, S., Y. Q. Shi-, and Y.-Q. Zhang, An optical flow-based motion co111pensation algoritl1m for very lo\v bit(cid:173)
`rate video coding, Proc. J 997 IEEE /1zt. Co,if. Aco1,stics, Speec/1 Sig rial Pro·cess., 2869-2872, Municll,
`Germany, April J 9,97; Int. J. /111aging S),sf. Tec/1nol., 9(4), 230-237, 1998.
`Moscheni, F., F. Dufaux., and H. Nicolas, Entropy criterion for optimal bit alloc ation between motion and
`prediction error information, ·in SPIE 1993 Proc. Visz,al Co11z111un. Jr11age Process., 235-242, Cambridge.
`MA:, Nov. l 993.
`Musmann. H. G., P. Pirsc,h, and. H.J. Grallert, Advances in picture coding, Proc. IEEE, 73(4), 523-548, I 985 .
`Netravali, ,A. N. and J. D. Robbins, Motion-comp,ensated television coding: Part I, Bell S;1st. Teclt 1., 58(3),
`6{3,1-670, 1979.
`Nogaki, S. ancl Ohta, M., An overlapped blo_ck motion compensation for high-qualjty motion picture coding.
`Proc. IEEE Int. S,y11zp. Circz,its a,zd Syste111s, vol. I, 184-187., ,San Di~go, 1992.
`Orchard, M. T.' Predictive motion-field segmentation fo.r image sequence coding, IEEE T1·a12s. Ci1·c11ils and
`S)'St. Video Technol . ., 3( I), 54-69, J 993.
`Pratt, W. K. Correlation techniques o.f irnage registraJion, IEEE Tra,zs. Aerasp. Ele~tto,z. Syst., Al!S-10(3).
`3S3-358., J-974.
`
`.
`
`.
`
`IPR2021-00827
`Unified EX1008 Page 274
`
`
`
`Block Matcl1i ng
`
`249
`
`Rocca., F. ancl Zanoletti> S., Bandwidtl1 reduction via movement compensation on a model of the random video
`process, IEEE Trc111s. Co,11111., vol. COM-20, 960-965, Oct. 1972.
`Shi, Y. Q. and X. Xia, A ll1resholdin g n1ultidin1ensional block matching algorithm, IEEE Ttar1s. Cir c1,irs a11cl
`S),st. Video Tec/111ol., 7(2), 437-440, AprjJ 1997.
`Srinivasan, R. and K. R. Rao, Predi ctive codi ng based on efficient motion estimation, Proc . of/CC, 521-526.,
`May 1984.
`T zovaras, D., M. G. Strintzis, and H. Sahinolou. Evaluation of multiresolution block matching · techniques for
`motion and disparity estimati on, Sig11al Process. /111age Co1111r111n., 6, 56-67, 1994.
`Watanabe, H. a11d Singl1al, S., Windo\v ed mo.lion compensation, SPIB, vol. 1605, in Vis1,al Co1111111,11icario11s
`ancl !t1zage P1·ocessi1ig, I 99 I: Vis,~al Co1r11111,11icc1tio1i,
`582-589, November 1991.
`Xia, X. and Y. Q. Shi, A thresholdin g hierarchical block matchi ng algorithm, Proc. IEEE 1996 /11t. Sy111p.
`Circ1,its S)1st.1 I I, pp. 624-627, Atlanta, GA, M ay 1996; J. Co111p1,1. Sci. Inf. Manage ., l (2), 83-90, 1998 .
`
`•
`
`•
`
`•
`
`IPR2021-00827
`Unified EX1008 Page 275
`
`
`
`•
`
`•
`
`|PR2021-00827
`
`Unified EX1008 Page 276
`
`IPR2021-00827
`Unified EX1008 Page 276
`
`
`
`-,
`
`--
`
`...
`
`Pel
`
`As discussecl in Cl1apler 10, tl1e pel recursive lechnique is 011e of t'l1e three 111ajor ap.proacl1es to
`t\vo-di111ensio nal displacen1ent eslimalion in i111age planes for Lhe signal processi 11g com n1unity.
`Concept ually speak ing, it is one lype of region-1natching tech11ique. In contrast to block n1atcl1.ing
`(\vhicl1 was discussed i11 Ll1e previou s cl1aplcr), it 1·ect11·sivel)' estin1ates displacen1ent vectors for
`ec1c/1 /Ji)~el i11 an i1nctge fra111e. Tl1e displacen1cnl vector of a pixel is estimated by recursively
`minin1izing ,1 no11linear function of Ll1e dissin1ilarity between two certain regions located in two
`consec utive fran1es. Nate ll1al r·egio,, 111eans a group of pixels, but it could be as sn1all as a single
`pixel. Also note tl1~1t tl1e tern1s /Je/ a11Ci /Ji.x:el have tl1e same meaning. Both te1-n1s are used freque11tl.y
`in Lhe field of s ignal and image proces i11g.
`This cl1,1pter is organized ,1s follows . A general descriptio11 of Lhe recursive technique is provided
`in Sec tion 12.1 . So,ne fundan1ental techniques in optin1i,zation are covere d i11 Sectio11 12.2.
`Section 12.3 describe s the Net ravali and R'obbins algoritl1m, tl1·e pioneering work in this catego ry.
`Severa l ott1er typica l pe l recursive algoritl1n1s are inLroduced in Sectio n 12.4. In Sec tion 12.5, a
`perf orn1a11cc co111pariso11 bet wee11 these algorithn1s is made.
`
`12.1 PROBLEM FORMULATION
`
`In 1979 Netravali and Robbi 11s published Lhe first pe1 recursive algorithm to estimate displacement
`vector s for n1otion-co n1pensated i11terfra1n e in1age cod ing. Netravali and Robbi ns ( 1979) defined a
`quantity , ca lled the displaced frarne difference (DFD), as follo\VS.
`
`D FD(.,\:, )'; cl.r, d.,·) = !,, ( x, )') - f,, _ 1 ( x - cl_r , )) - cl Y) ,
`
`( l 2. 1)
`
`where the subscript ,z and 11 -
`l i11dicate two mon1ents associa ted with two success i\1e frar11es b,1sed
`on wl1icl1 mot ior1 vectors are to be estin1ated; .,t,, ), are coordinate s in image planes, d.\'' ciy are th.e
`t \VO con1por1ents o·f the disp.lace111ent vector, d, along the horizonta l and vertical d~ections . in the
`imag e pla11es, respectively. DFD(.:i:, y; (/x, dy) ca,1 aJs.0. be expressed as DFD(.,r, )'; cl. Wl1enever it
`does not ca use co r1fusion, it can be written as DFD for the snke of brevity. Obviously, if there is
`no error in the estimation, i:e., tl1e esti111ated displtlCe111er1t vector is exactly ·equal Lo tl1e true motion
`vector, then DFD wil l be zero.
`A nonlin ear functio11 ot· rhe DFD was tl1en proposed as a dissimilarity 1,1eas ure by Netravali
`and Robbin s (1979), which is a square function of fJFD, i.e., DFD2
`.
`Netrava li and Robbins tl1us converted disp,lacen1ent estin1ation into a mi11im.iz·ation prob lem .
`That is, each pixel corresponds to a pair of i 11tegers ( .. r, .)'), de11oting i ls spatial position _in tl1e image
`plane. Th erefo re, the DFD is a function of cl. Tl1e estimated disp lacen1er1t \,ector cl = (dx, dy)1:
`wher e ( )T denqtes tl1e tra11spositjor1 of tl1e argL1111ent vector or 111atrix, 9an be detelimined by
`minimi zing tl1e DFD2. This is a typical nonlinear prograrnming problem, on wl1icl1 a large body
`of research l1as been report ed in the literature. In. tl1e 11ext sectio r1, severa l tecl1niques that rely on
`a m·etl1od, calle d desce nt niethod, in optimization are introduced. The Netravali arid Robbins
`algorithm can be applied to a pixel once or iteratively applied several tin1es for displacement
`estimat ion. Then the algarit l1m mo\res to the next pixel. The estin,ated dis-placen1ent vecto r of a
`pixel can be used as a,1 itiitial estin1ate for the next pixel. Tl1is recursion can be carried .out
`
`251
`
`IPR2021-00827
`Unified EX1008 Page 277
`
`
`
`252
`
`lrnage and Video Compr ession for Mu ltim edia Engineering
`
`0
`(x, y) (x.y+ / )
`0
`0
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0 0
`
`0 0
`
`0 0
`
`0 0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`(a) Horizo ntal
`
`0
`(.r. y)
`0
`(x + Ly)
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`(b) Vertical
`
`X
`
`y
`
`t
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`- o
`
`0
`0 (.r.y.,.6
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`(c_) Temporal
`
`FIGURE 12.1 Thr ee Lypes of recursions: (a) horizo ntal; (b) verti cal; (c) temporal.
`
`horizontally, vertically, or temporally. By ·te11zpo1·lzlly, we mean that the estimated displaceme nt
`vector can be passed to the pixel of the sam.e spatial position within imag e planes in a temporally
`neighboring frame. Figure 12.1 illustrates these three different types of recursion.
`
`12.2 DESCENT METH,QDS
`Consider a nonlinear real-valued function z of a vector variable .x,
`
`z =f(x),
`
`( 12 .. 2)
`
`with i e Rn, where Rn represents t'he set of all n-tu.ples of real numbers. The question we face no\v
`is how to find such a vector denoted by x* that the function z is minimized. This is classified as
`an unconstrained nonlinear programming problem.
`
`12.2.1
`
`flRST-0RDE '.R NECESSARY CONDITIONS
`
`.AocoJiding to the optimization th eory, if f (x) has continuou$ fi·rst-order partial derivatives, then the
`first-o r der necessary condition s that i* has to satisfy are
`
`(12.3)
`
`IPR2021-00827
`Unified EX1008 Page 278
`
`
`
`Pel Recursive Tech niqu e
`
`253
`
`\vhere V denot es tl1e grad ient operation with respect to x evaluated at x*. Note that whenever there
`is only one vector variable in the function z to wt1ich the gradient operator is applied, the sign V
`would remain without a subscript, as in Ec1uation I 2.3. Otherwise, i.e., if there is more than one
`vector variable in the function, we will explicitly write o·ut the variable, to which the gradient
`operator is app lied, as a subscript of tt1e sign V. In tl1e component form, Equation 12.3 can be
`expressed as
`
`0
`
`aJ(x) =
`dX l
`df(i) = O
`dX 2
`of(x) _
`a - o.
`X
`11
`
`•
`•
`•
`
`•
`
`( 12.4)
`
`12.2.2
`
`SECOND-ORDER SUFFICIENT CONDITIONS
`
`If F (x) has seco nd-order continuous derivatives, tt1en the second-order sufficient conditions ·for
`F(i *) to reach the n1ir1i1num are knowr1 as
`
`where H denotes the Hessian n1atrix and is defined as follows.
`
`H(x·) > 0,
`
`( 12.5)
`
`(12.6)
`
`H(i) =
`
`a2J(x) a2 f(x)
`dx1d):2
`a2J(x) a2 J(.fJ
`d2:r,,
`d.-t2dX1
`-•
`a2 J(x) a·2t(x)
`dX dX
`ax ax
`2
`I
`
`0 2 ).:I
`
`•
`•
`•
`
`II
`
`••
`
`•
`
`•••
`
`•
`•
`•
`
`••
`
`•
`
`a2 J(x)
`ax,a_.,'(,,
`a2 f(x)
`ax ax
`2
`n
`•
`•
`a2 f(x)
`•
`a2x II
`
`•
`
`•
`
`( 12.7)
`
`•
`•
`
`II
`
`We can thus see that the Hessian matrix consists of all tl1e second-order partial derivatives of fwi th
`respect to the components of .x. Equation 12.6 means tl1at the Hessian matrix H is positive definite.
`
`12.2.3
`
`UNDERLYING STRATEGY
`
`Our aim is to derive an iterative procedure for tl1e minimizatio.n. That is, we want to find a sequence
`
`such that
`
`-
`- - -
`x0 , x 1, X1, • • ·, X 11
`
`, • • ·,
`
`and the sequen,ce converges to the minin1un1 off (i), f(x*)
`
`.
`
`( 12.8)
`
`( 12.9)
`
`IPR2021-00827
`Unified EX1008 Page 279
`
`
`
`254
`
`Image and Video Cornpr essior1 for Mu lti med ia Engineering
`
`-k+I
`.r
`
`. 0
`
`FIGURE 12.2 Descent method.
`
`A funda111ental underlying strategy for aln1ost al I tl1e descent ,1lgori 111 n1s (LL1er1bcrger, 1984) is
`described next. We start \Vith an initial point in the space; we detern1ine a direction to mO\'e
`according to a certain rule; then \Ve mo,1e along the direction to a relati\,e r11ini111un1 of the function
`z. This minin1um point becomes the initial point for tl1e next itera tion.
`This strategy can be better vjsualized using a 2-D exan1ple, sl1own in Figure 12.2. Tl1ere, x =
`(.,t·1 • • t 2) 1• Several closed curves are referred to as co ,110111· CL11·ve~· or Level c 1,11)es. Tl1at is, each of
`the curves represents
`
`( 12.10)
`
`w-itl1 c bein.g a constant.
`Assum.e t'l1at at the ktl1 iteration, \V e have a guess: .rk . For the (k + l)th iteration, \ Ve need to
`
`• F·in,d a search direction, pointed by a vector ook;
`• Det e, 1nine a11 optimal step size a1: \vitl1 a1: > 0,
`
`such that the next g·uess xk+ i is
`
`- k+ I
`x
`
`- k
`k - k
`= ,r +a ro
`
`( 12. 11)
`
`and PT1 satisfies f(x") > f(x"+1).
`In Equation 12.11, .i.k can be viewed as a prediction vector for
`vecto .r, vk. Hence, using the Taylor series expansion, we can have
`
`j k+I,
`
`\.Vhile a k ci)k an updat e
`
`where (s, /) denotes the inner product between vectors s and t~ and e repre sents tl1e higher -order
`ter111s in the ex.pansio·n. Consid.er tl1at the incre1nent of ak il:Jk i's sma ll enough and, thus, e car1 be
`ignored. From Equation 12.10, it is obvious that in order to l1ave / (xk+1) < F( .,"ik) \Ve 111ust ha,,e
`fJ!&k) < 0. That is,
`(Vf(xk),
`
`( 12.13)
`
`(12 . 12)
`
`IPR2021-00827
`Unified EX1008 Page 280
`
`
`
`Pel Recursive Tecl1nique
`
`255
`Choosing ,l diff erent update vector, i .e., ll1c product of the {if vector and the step size a'·:, results
`
`i11 a diff erer1t algoritl1111 ir1 i111ple111enting tl1e descent n1etl1od.
`I11 tl1e san1e category of tl1e descent 111etl1od. a variety of tecl111iques have been developed. The
`reader 111ay refer to Lu e11berger ( 1984) or the 111a11y other existin g books on opt i1n'izati on. Two
`co111n1only used tccl111iqL1es or tl1c dcscc11t n1.etl1oc.t are discussed below. One is call ed the steepest
`de ccn t n1ethod, in wl1 icl1 tl1e sca re!, dir ection represented by the w vector is chosen to be opposite
`ro tl1nt of tt1e gradie nt vector, and a real parameter of tl1e step size ak is used; the other is the
`Ne\vtor1-Rapl1so11 n1ethod, in wl1ich tl1e u.pdate vector in estin1ation, determined joint ly by the
`enrch dire cti o11 and tl1e step size, is related to the Hessj·an matrix , de.fin ed in Equ ation 12.7. The~e
`l\VO tccl1niques ,1rc rurtl1er di scussed i,1 Sections 12.2.5 a11d 12.2.6, respectively.
`
`12.2.4
`CONVERGEN CE SPEED
`Speed or co 11vergcnce i
`,1n in1portant is uc in discus ing th e de cent n1etl1od. IL is utili zed to
`c\,al uate tl1c pcrf'on11ar1cc of di rrcren t algorj tl1ms.
`
`Order of Convergence
`As ume a seq.uence of \iectors { ,tk}, \Viti, k = 0 1 l , · · ·, oo, converges to
`a 111i n i m.u1n denoted by .x*. We say that tl1e convergence is of order JJ i r the fol lowin g formu la
`l1olds (Lu e11berger
`l 984) :
`
`I-k +I -•1
`
`.\'
`Q~ liill k-l co - k
`.t
`
`-
`
`,
`
`( 12.14)
`
`- .\"
`, < 00
`_ .
`.,r
`
`1,
`
`the li 111it superio r, and I I indi cates the 1nag11itude or norn1 of a
`\vhere p is posi ti ve, Jin1 denote
`vector argun1e11t. For the two latter noti ons, 111ore description s foll o\v.
`T l1e co nce 1) l of Lhe lin1it uperi or is based on lhe concept of supremum. Hence, let us fir st
`discuss tl1e supremun1. Consider a set or real nun1bers, denoted by Q, 1i1at is bounded ab·ove. Then
`tl1ere n1L1st ex i·Sl a s111allest real nun1ber (J such tl1at for all t~1c real nun1bers i11 tt1e set Q, i .e., q E
`Q, we l1ave Cf~ o. Tl1is real nu111ber o is referred to as tl1e least upper bound or the supre111un1 of
`lhe set Q, and is denoted by
`
`( 12. I 5)
`
`No\v tum to a real bounded above sequence ,.k, k = 0, I ,·· ,oo. If s·k = sup { ri: j ~ k}, tl1e11 tl1e sequence
`{sk} converg es to a real 11uml)er s*. Tl1is real r1u111ber s* is referred to as tl1e li·111it superior of tl1e
`sequence { ,.k} and is de11oted by
`
`The 111ag,nitude or norrn of a vector x, denoted by lxJ, is d.efined as
`
`(12.16)
`
`( 12. I 7)
`
`•
`
`where (s, t) is. tlie i.nne( product bet\veen tl1e vector s and T. Tl1roughout .tl1is di scuss~on, \Vl1e11 \V e
`say vector we ,nean column vector. (Row ,,ectors ca11 ·be J1a11d