`
`1 Iref• Icum aBM• M
`
`Quasi-Newton minimization with
`translational motion model and
`4-smoothed images
`
`1 amin• MSEm;n
`
`Quasi-Newtein minimi:z:ation with
`affine motion model and
`2-smoothed images
`
`1 3min• MSEmin
`
`Quasi-Newton minimization with
`quadratic motion model and
`2-smoothed images
`
`1 Rmin• MSEmin
`
`Quasi-Newton minimization with
`aquadratic motion model and
`non-smoothed images
`
`lllQN· MSEQN
`
`Figure 19: Block diagram of Quasi-Newton minimization of the error function
`
`After each minimization, error function value (evaluated in non-smoothed image domain)
`is compared to the lowest value obtained so far. If the value has decreased the motion
`parameters are stored and the next minimization begins at this point in the parameter
`space. Otherwise the motion parameters of the last minimization are discarded and the
`next minimization is initialized by the parameters yielding the smallest value of the error
`function. In the first three minimizations bilinear interpolation of the subpixel values is
`applied to speed up the minimization. Cubic interpolation is then used only in the last
`minimization to guarantee maximum performance.
`
`After the termination of the Quasi-Newton minimizations error function value MSEQN is
`compared to the one after the minimizations of the Taylor expanded error functions
`MSETaylor and the motion parameters yielding smaller value are chosen for the motion
`parameters for this block.
`
`4.3 Merging
`
`Splitting and motion estimation results in a large number of blocks (usually) with good
`predictions. However many of these blocks could have been predicted with the same set
`of motion parameters without affecting the quality of the prediction too much_ To remove
`this redundancy, blocks with similar motion characteristics are merged together to form
`
`Motion Estimation and Representation for Video Coding Applications
`
`34
`
`Vedanti Systems Limited - Ex. 2007
`Page 41
`
`
`
`larger segments. Following subsections describe how to decide which blocks are to be
`merged and how to obtain motion parameters for the merged segment knowing the
`parameters for the original ones.
`
`4.3.1 Finding the final segmentation
`
`To find the optimal segmentation, Lagrangian cost
`
`L=D+AA
`p
`
`(50)
`
`of the image is to be minimized [Kar96]. In (50), the square error between the original and
`the decoded frame is denoted by D, the number of bits needed to code the frame is
`denoted by R and the number of pixels in the frame is denoted by P. A. is the Lagrangian
`multiplier specifying the cost of one bit with respect to the distortion measure. In order to
`find the minimum in L, all the possible combinations of segments should be checked. To
`avoid the computational complexity an approximative method is presented in this section.
`
`Let M...kl be the reduction in Lagrangian cost when two segments sk and sl are merged to
`form Sid. M...ld can be calculated by
`
`(51)
`
`The segment Sk can be coded in three different modes (i.e. m INTER, INTRA or
`UNCHANGED mode). The cost of the segment is assumed to be the minimum of the
`costs of these three modes, since it minimizes (50). Thus the Lagrangian costs for all the
`modes have to be evaluated
`
`In calculation of the INTER mode cost the number of bits for the motion information is
`needed. However this information will be available only after the merging and motion
`model adaptation. To overcome the problem a constant number of bits per motion
`parameter is assumed. By simulations a value of three bits per parameter is found to be a
`good approximation. Prediction error of the segment is coded with 8x8 DCTs similarly as
`in ITU's H.263 recommendation [ITU95]. Bits for the prediction error coding are added
`to the bits for motion information to yield R(Sk). D(Sk) is the squared error between the
`coded segment and the corresponding segment in the current image. These values are then
`placed to the equation (46) and LINlER(Sk) is evaluated.
`
`Calculations of the Lagrangian costs for INTRA and UNCHANGED mode are simpler.
`INTRA mode cost LINTRA(Sk) is obtained by coding the segment with the codec's intra
`coding method
`(in
`this
`implementation
`that
`is
`the one specified
`in H.263
`recommendation) and the cost for the UNCHANGED mode LuNCHANGEo(SJ is equal to
`
`Motion Estimation and Representation for Video Coding Applications
`
`35
`
`Vedanti Systems Limited - Ex. 2007
`Page 42
`
`
`
`I
`
`!
`
`i·
`
`I!I!IC="---------------~~~~------------~-~-~
`
`the· mean squared difference between the reference and the current image in the segment
`area. Lagrangian cost for the segment Sk is then
`
`(52)
`
`INTRA and UNCHANGED mode costs for combined segment Skl are calculated simply
`by adding costs for the same mode of the segments Sk and S1• To obtain Lagrangian cost
`for the INTER mode, joint motion parameters must be found, motion estimated prediction
`must be build and the prediction error must be coded.
`
`Merging procedure is initialized by building an adjacency graph of the segments. Each
`node in the graph is associated with the Lagrangian cost of the segment and each arc with
`the reduction in cost when adjacent segments are merged. Figure 20 shows an example of
`such a graph before and after segments Sk and S1 have been merged together.
`
`Adjacency graphs
`
`Segmentations
`
`S;
`
`I
`
`Figure 20: Adjacency graphs and segmentations before (up) and after (down) merging Sk and S1
`
`After building the adjacency graph and calculation of the initial L(Sk)s and M.kls, arc with
`the largest reduction in Lagrangian cost is chosen and the segments associated with it are
`merged. Adjacency graph is updated and the algorithm proceeds by choosing again the arc
`which yields the largest reduction in cost. This is iterated until there exists no longer a
`pair of segments which yields reduction in cost when merged. Algorithm is summarized
`in figure 21 and in the figure 22 there is shown resulting segmentation for the first frames
`of both Akiyo and earphone test sequences.
`
`Motion Estimation and Representation for Video Coding Applications
`
`36
`
`Vedanti Systems Limited - Ex. 2007
`Page 43
`
`
`
`r--------·---------.. -------···--·-
`
`--·' ~·
`
`f
`
`I,.r, Icum ak for each segment
`
`Building of the adjacency
`graph and calculation of the
`L(SJs and 8Lkls
`
`,.......-~ Find the arc with largest reduction
`in cost and merge segments
`
`Update the adjacency graph and
`find motion parameters and costs
`for the new segment pairs
`
`'-Y-e-s--1
`
`Check if there is segments which
`yield reduction in cost
`
`No
`
`Finished merging
`
`Final segmentation and ak for each segment
`
`Figure 21: Block diagram ofthe merging
`
`Aklyo
`
`Oarph<=~ne
`
`Figure 22: Resulting segmentations after merging for the first frames of Akiyo and Carp hone
`sequences
`
`The main complexity of the algorithm presented here comes from finding the motion
`parameters for segment pairs which are candidates for merging and from the calculation
`of the lNTER mode Lagrangian costs for these segment pairs. In the next section a low
`complexity method for finding the joint motion parameters for two segments is shown. To
`decrease the burden caused by the calculation of INTER mode costs, following
`assumptions were made. First it is assumed that if adjacent segments Sk and S1 are both in
`INTRA or in UNCHANCED mode (i.e. min{LINTER(S), LINTRA(S), LuNCHANGEn(S)} is
`either L!NTRA(S) or LuNCHANGEo(S)) the resulting segment Ski will be in the same mode and
`
`Motion Estimation and Representation for Video Coding Applications
`
`37
`
`Vedanti Systems Limited - Ex. 2007
`Page 44
`
`
`
`--------------------------·-e--·-•·•·--···-•·•--
`
`!·
`
`thus there is no need to calculate LINrnR(S~ci). It is also assumed that the lower the increase
`in prediction error due to merging the higher the reduction in Lagrangian cost will be. An
`estimate for the increase in prediction error is obtained from the calculation of the joint
`motion parameters as will be seen in the following section. This information is used to
`choose whether or not the Lagrangian cost for a segment pair Ski is to be calculated.
`
`Assume that S~ci is a segment which was formed by merging two segments S~c and S1. To
`update the adjacency graph, joint motion parameters for Ski and all its neighbours have to
`be calculated. Also the approximated increases in prediction error have to be obtained for
`all the segment pairs. Letting .6.Gmin be the smallest increase in prediction error, the
`INTER costs are calculated only for those segment pairs for which .ilG < T-AGmin·
`Otherwise the INTER mode Lagrangian cost is set to infinity. In the implementation a
`value of 3.0 is used for the multiplier T. With this value, on average less than a half of the
`segment pairs need to be examined and the best candidate for merging is found in over
`95% of the cases [Kar96].
`
`4.3.2 Finding the joint set of motion parameters for two segments
`
`Given two segments Sk and Sh the problem is to find a set of motion parameters
`minimizing the error function for the merged segment Ski = Sk u St. The approach is very
`similar to the one in Taylor expanded minimization of the error function. Again Taylor
`expansion of the reference image Iref is employed. The expansion is done around the
`points
`
`x' = x+dx(ak,x,y), y' = y+d/ak,x,y), (x,y)ESk
`x'=x+dx(a1,x,y), y'=y+dy(a,,x,y), (x,y)ES1
`
`{
`
`in image coordinate space. Using (35) and (36), the problem can be written as
`
`(53)
`
`(54)
`
`where ak1 is a set of joint motion parameters and the matrices Ek and E1 and vectors Yk and
`y1 are given by (38) and (39), respectively. Substituting both Ek and Et by their QR
`decompositions (54) becomes
`
`(55)
`
`Multiplying both sides of the equations above by the transposes (inverses) of the
`orthogonal Q matrices yields
`
`Motion Estimation and Representation for Video Coding Applications
`
`38
`
`.
`
`.
`
`'
`
`:1
`'
`
`,..,--.. -__ ·
`'ft
`·~ -~
`
`i I
`
`~
`
`Vedanti Systems Limited - Ex. 2007
`Page 45
`
`
`
`(56)
`
`Rk and R1 are both upper triangular matrices of size Pi*l2 (Pi being the number of pixels
`in segment Si) and thus having non-zero elements only in the first twelve rows. Letting
`Rkl, Rn, Zkt and Z11 be the first twelve rows of~' R1. Qk TYk and QtTYI respectively, (56)
`can be rewritten as a set of 24 equations and 12 unknowns (ald)
`
`or combining Rkl with R11 and ZkJ with z11
`
`(57)
`
`(58)
`
`Givens rotations can be applied to both [Rkt Rn]T and [Zkt zn]T to triangularize the first
`one [Pre92]. Solution of (58) becomes then a solution of a triangular system:
`
`(59)
`
`from where the joint motion parameters akl are obtained by simple backsubstitution. It
`should be noted that the only information needed from the original segments Sk and St is
`the matrices Rkl and Rn of size 12x12 and the vectors zkl and zn of length 12. For the
`initial segments of quadtree segmentation these matrices and vectors have to be build by
`the QR decomposition of the corresponding E matrices. For a merged segment these are
`obtained as a 'side product' (Rkn and zkll in equation 59) when finding the joint set of
`motion parameters. Conveniently Zk12 in (59) offers an estimate of the increase in
`prediction error when segments Sk and S1 are merged. Increase in squared error can be
`calculated as follows
`
`11G kl o= liz mll2
`
`4.3.3 Implementation issues
`
`ll
`
`o= L ( z kl2 ):
`
`i=O
`
`(60)
`
`Finding the joint motion parameters and Rktl and Zkll for each segment pair is an
`approximative operation. Every time segments are merged certain amount of interference
`is introduced. Thus it is important not to allow this intetference accumulate. To avoid the
`accumulation, after merging of two segments the prediction errors of the original
`
`Motion Estimation and Representation for Video Coding Applications
`
`39
`
`.
`
`. ~.'
`
`.,
`
`u
`
`Vedanti Systems Limited - Ex. 2007
`Page 46
`
`
`
`I I
`
`i j
`l ·
`
`!
`
`,;
`
`segments are compared to the one of the combined segment. If the increase in MSE is
`larger than a threshold (experimentally set to 7.0) addition motion estimation takes place.
`
`Additional motion estimation is similar to the one presented in section 4.2.2 with some
`simplifications. Initial motion parameters are set to the motion parameters obtained by
`solving (59). In Minimizations of Taylor expanded error functions -stage only the steps
`with non-smoothed images are performed. Quasi-Newton minimization of the error
`function takes place if the MSE of the segment is larger than 100.0 and only the last
`minimization described in section 4.2.2.4 is carried out. These simplifications are due to
`the fact that ' a fairly good estimate of motion parameters is already available. After
`additional motion estimation Rkn and ZJdt are no longer valid for the segment and they are
`recalculated by QR decomposition.
`
`Like the validity of the motion parameters also that of the QR decomposition is checked
`after merging two segments. If the actual increase in prediction MSE differs too much
`from the one estimated by the QR decomposition as given in (60), the QR decomposition
`is rebuilt. As a threshold a MSE difference of 5.0 is used in the implementation.
`
`4.4 Motion model adaptation and mode selection
`
`In the last stage of motion vector field coding the motion model is adapted to match the
`motion in each segment separately. Motion parameters are removed from the model one
`by one and the parameters yielding the minimum in the INTER mode Lagrangian cost are
`set
`to
`the motion parameters of
`the
`segment. Finally
`the coding mode
`(INTER/INTRA/UNCHANGED) of the segment is chosen according to the Lagrangian
`costs of the modes.
`
`4.4.1 Orthonormalization
`
`Before the actual motion model adaptation, motion parameters ak of each segment are
`transformed from quadratic motion model to the orthonormalized ones iik. Not only the
`motion parameters but also the Rk1 and zkl matrices obtained from the merging procedure
`can be transformed as follows
`
`Ru =RkiT
`zkl = zkl
`
`(61)
`
`where - is used to denote matrices and vectors in the orthonormal basis. Transform matrix
`T is calculated by
`
`Motion Estimation and Representation for Video Coding Applications
`
`40
`
`.
`
`~
`
`Vedanti Systems Limited - Ex. 2007
`Page 47
`
`
`
`T=[~ ~J
`
`(62)
`
`and
`
`Tl == •
`
`(63)
`
`aJ,ol3o.o
`ao.oP1.o
`CXo,oPo,o
`ao.ofiz.o
`CXl,Ofii.O
`az.of3o.o
`ai.JJo,o
`iiufi1.o az.J3o.o
`0
`0
`0
`ao,oPJ.l
`0
`0
`iio,o/32,1
`0
`al.ofil.l
`0
`0
`0
`0
`0
`al.lfil.l
`0
`0
`0
`0
`0
`0:2,2/30,0
`0
`0
`0
`0
`ao.oPz.z
`0
`where a .. 's and jJ .. 's are given by ( 17). The characteristic equation of segment Sk with
`p=I2 basis functions becomes
`
`l,)
`
`1,)
`
`(64)
`
`4.4.2 Removal of a basis function from the motion model
`
`Removing a basis function from the model is equivalent to removing a column from Rk1
`and an element from at. Let there be initially p basis functions, out of which the i'th one
`is to be removed. ( 64) can be rewritten for p~ 1 basis functions as follows
`
`where Rf;1 is equal to Rf1 with i'th column removed and at1 is the new set of p-1
`motion parameters (unknown at this point). System in (65) can be triangularized by e.g.
`Givens rotations to yield
`
`(65)
`
`(66)
`
`which can be solved by backsubstitution (Rr1 being a triangular matrix). The increase in
`square prediction error when i'th basis function is removed is available without even
`solving (66) by just calculating i.
`
`4.4.3 Finding the best motion model and selecting the coding mode
`
`It is assumed that the basis function which yields the least increase in prediction error
`when removed, is the least significant one and thus the first one to be removed from the
`model. This increase can be easily calculated for all the basis functions as seen in the
`
`Motion Estimation and Representation for Video Coding Applications
`
`41
`
`Vedanti Systems Limited - Ex. 2007
`Page 48
`
`
`
`r
`
`I ,,
`
`previous section. Starting from the least important one, basis functions are removed one
`by one from the model. After each removal Lagrangian cost for the INTER mode is
`evaluated similarly as in the case of merging (section 4.3.1). Only difference being that
`now the number of bits for motion parameters does not need to be approximated, but they
`can be calculated by coding the parameters. After every basis function has been removed
`the set of basis functions yielding the minimum in Lagrangian cost forms the motion
`model of the s~gment. Finally Lagrangian costs for INTER, INTRA and UNCHANGED
`modes are compared and the coding mode is chosen to be the one of smaJlest cost. The
`algorithm is shown in figure 23.
`
`1 Iref> lcom Rkt, Zkl
`Transformation to orthogonalized I
`motion model 1 jill -12
`
`.tl, zkt
`
`Calculating the INTER Lagrangian
`cost for the full motion model
`
`1 jil2
`
`-12
`J:l' Z.c;J
`
`Removing the least significant
`remaining basis function
`
`1 jip-1
`
`kl
`
`-p-1
`., Z.u
`
`Iterate until p = 0
`
`Evaluating the INTER mode cost
`and saving the result if it is the
`smallest so far
`_ _L minimum LINTER· afm;D
`
`Selection of the coding mode
`
`1 selection information, ak
`
`Finished
`
`Ftgure 23: Block dtagram of fmdmg the mot10n model for a segment
`
`4.4.4 Coding of the motion information
`
`In the last stage of coding the motion parameters of the segments in INTER mode are
`quantized. A uniform scalar quantization is performed for all the remaining motion
`parameters. Before the quantization motion parameters are normalized with respect to the
`ratio of pixels in the segment and in the bounding box of the segment. This is done to
`prevent a small segment with a large bounding box to take as many bits as a large segment
`with similar motion and the same size of the bounding box. The quantized parameters are
`then obtained from the non-quantized ones by
`
`Motion Estimation and Representation for Video Coding Applications
`
`42
`
`1
`~
`
`• •
`•
`
`-.-
`•
`
`, •• 4 .
`'
`
`I
`
`Vedanti Systems Limited - Ex. 2007
`Page 49
`
`
`
`(67)
`
`where the number of pixels in the segment Sk is denoted by Pk. number of pixels in the
`bounding box, by pkbox and the uniform quantizer step by Qstep· In the implementation a
`step size of 3.0 is used.
`
`To inform the decoder which basis functions have been removed from the motion model,
`two six bit headers are constructed. Each bit in the first header indicates whether or not a
`basis function j(x,y), i = 0, ... , 5 is removed and the bits in the second header is used to
`indicate whether or not ~(x,y), j = 6, ... , 11 is removed. These headers are separately
`VLC coded with a Huffman code. Finally also the quantized motion parameters. having
`nonzero values are Huffman coded and put to the bitstream.
`
`4.5 Coding results
`
`The motion estimation scheme is combined with a Multi-Transform Prediction Error
`Coding technique as described in [Kar96J. and the coding results are compared against
`those of the state-of-the-art low bitrate coding standard H.263 with 'Advanced Prediction
`Mode' and 'Unrestricted Motion Vectors'. The results for both schemes are shown in the
`table 1 for a number of test sequences. Both QCIF (176x144 pixels) and CIF (352x288
`pixels) sequences are included in the test material.
`
`Table 1; Comparison of Nokia codec and H.263 codec with equal bitrates
`
`Picture
`size
`
`Nokia codec
`PSNR(Y/C)
`Bitrate
`[dB]
`[kbps]
`
`H.263
`Bitrate
`PSNR(Y/C)
`[dB]
`[kbps]
`
`Improvement in
`PSNR(Y/C)
`[dB]
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`CIF
`
`QCIF
`
`QCIF
`
`CIF
`
`CIF
`
`10.17
`9.98
`
`9.77
`
`9.93
`
`21.36
`
`23.97
`
`23.39
`
`24.39
`
`47.91
`46.49
`
`49.76
`
`111.96
`
`107.10
`
`33.9/40.3
`31.7 I 38.1
`
`32.0/37.6
`
`37.4/40.8
`
`36.5/42.2
`
`32.7/36.9
`
`35.3/40.5
`
`32.6/37.1
`
`33.0/37.4
`
`33.9/39.0
`
`30.2/41.3
`
`35.9/39.2
`
`31.5/37.5
`
`10.31
`10.27
`
`10.12
`
`10.31
`
`21.28
`
`24.32
`
`23.66
`
`24.84
`
`48.08
`
`47.16
`
`49.70
`
`108.35
`
`110.18
`
`33.3/39.2
`31.2/38.0
`
`30.3/36.8
`
`35.6/39.8
`
`35.4140.9
`
`32.1136.8
`
`33.6/39.2
`
`31.2136.1
`
`31.9/36.9
`
`32.3/38.0
`
`29.4/40.5
`
`34.1/38A
`
`30.5/36.8
`
`0.6/1.1
`0.5 I 0.1
`
`1.7/0.9
`
`1.8/1.0
`
`1.1/1.3
`
`0.6/0.1
`
`1.7/1.3
`
`1.4/1.0
`
`1.1/0.5
`
`1.6/1.0
`
`0.8/0.8
`
`1.8/0.8
`
`1.0/0.7
`
`43
`
`Sequence
`
`Mother&Dtr
`
`Hall Objects
`
`Container
`
`Akiyo
`
`Mother&Dtr
`
`Silent Voice
`
`Container
`
`News
`
`News
`
`Foreman
`
`Coastguard
`
`News
`
`Foreman
`
`Motion Estimation and Representation for Video Coding Applications
`
`Vedanti Systems Limited - Ex. 2007
`Page 50
`
`
`
`----~--------------~---------~--···--
`
`To compare the results, bitrates are made equal by assigning the quantizer parameter for
`the error coding schemes. The quality of the coded video signal is measured with the
`average peak signal to noise ratio (PSNR) calculated as
`
`2552
`PSNR=lOlog-(cid:173)
`MSE
`
`(68)
`
`where MSE is the mean square difference between the original and the coded frame. The
`chrominance PSNR values in the table 1 are averages of the PSNRs . of both the
`chrominance components. The frame rates in simulations with bitrates from 10 kbps to 48
`kpbs vary from 7.5 fps to 10 fps and the simulations with the bitrate of 112 kbps were run
`on 15 fps.
`
`As seen from the table 1, the Nokia codec reaches PSNRs from 0.5 to 1.8 dB higher than
`the H.263 codec. Most of the improvements are due to the motion modelling and
`estimation scheme presented in this thesis, since with the same DCT based prediction
`error coding scheme used in the H.263 the Nokia codec achieves improvements between
`0.3 and 1.5 dB over the H.263. To obtain the corresponding reduction in bitrate another
`set of results are shown in the table 2 below.
`Table 2: Comparison between Nokia codec and H.263 with nearly equal PSNRs
`
`Sequ~nce Picture
`size
`
`Noki~ codec
`Bitrate
`PSNR(Y/C)
`[dB]
`[kbps]
`
`H.263
`PSNR(Y/C)
`[dB]
`
`Bitrate
`[kbps]
`
`Bitrate
`reduction
`[%]
`
`PSNR (Y/C)
`difference
`[dB]
`
`Akiyo
`
`Container
`
`Mthr&Dtr
`
`Mthr&Dtr
`
`Foreman
`
`News
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`QCIF
`
`CIF
`
`9.93
`9.77
`
`10.17
`
`21.36
`
`46.49
`
`37.4/40.8
`32.0/37.6
`
`33.9/40.3
`
`36.5/42.2
`
`33.9/39.0
`
`18.06
`17.46
`
`11.96
`
`26.96
`
`61.76
`
`111.96
`
`35.9/39.2
`
`156.66
`
`37.4 I 40.4
`32.0/37.6
`
`33.8/39.5
`
`36.1/41.3
`
`33.5/38.8
`
`35.5/39.1
`
`45
`44
`IS
`21
`
`25
`
`29
`
`0.0/0.4
`0.0/0.0
`
`0.1/0.8
`
`0.4/0.9
`
`0.4/0.2
`
`0.410.1
`
`Now the quantizer parameters are chosen in a way that Nokia codec meets certain bitrate
`requirements (first three results are with a bitrate of approximately 10 kbps, then 24 kbps,
`48 kbps and finally 112 kbps). H.263 codec is set to yield PSNRs which does not exceed
`the PSNRs of the Nokia codec. Comparing the bitrates it can be noticed that with the
`motion estimation algorithm presented in this chapter and a novel prediction error coding
`scheme reductions of 15-45% in bitrate can be obtained while keeping the objective
`quality the same or better. In appendix A there can be found examples of images coded
`with both schemes together with prediction images resulted from motion estimations with
`both translational and quadratic polynomial motion models.
`
`Motion Estimation and Representation for Video Coding Applications
`
`44
`
`Vedanti Systems Limited - Ex. 2007
`Page 51
`
`
`
`---- - - - - - - - -
`
`..
`(;
`
`5. Conclusions
`
`In this thesis a very sophisticated algorithm for motion estimation was introduced. The
`full quadratic motion model was chosen as the basis of the motion model. This allows the
`system to compensate a number of different types of movements in the image plane. To
`make the motion parameters more robust to the quantization the basis functions of the
`motion model are orthonormalized with respect to the bounding boxes of the segments.
`The problem with the larger number of motion parameters compared to the state-of-the-art
`solutions is handled by keeping only the parameters yielding the optimal performance in
`rate-distortion sense.
`
`Not only the motion model but also the segmentation of the images differs from the ones
`used in the current video coding standards. The image to be coded is not segmented in
`square blocks of constant size, but the segments consists of smaller blocks which are
`merged together to form larger segments with different motion characteristics. Also the
`segmentation is done in the rate-distortion sense to achieve the maximum in performance.
`
`The motion estimation algorithm developed for this scheme is very robust to local minima
`in the error function . It can also adapt its behaviour to the complexity of the function to be
`minimized. It observes the convergence rate and according to this makes decisions how to
`proceed with the minimization.
`
`Motion Estimation and Representation for Video Coding Applications
`
`45
`
`Vedanti Systems Limited - Ex. 2007
`Page 52
`
`
`
`The experiments confirm that the motion modelling and estimation scheme described in
`this thesis yields significant improvements in the image quality or corresponding
`reductions in the bandwidth compared to the current standards. However the complexity
`of the encoding process limits the possible applications of this scheme with today' s
`technology. This is evident especially in high bitrate applications where the images are
`large and the frame rate is high. As such the algorithm is most. suitable for low bitrate
`applications. They are characterized by smaller picture size and lower frame rate. Smaller
`picture size makes the algorithm faster and in the lower frame rate full potential of the
`more complex motion model can be utilized (since movements need to be modelled over
`longer periods of time).
`
`Unlike the encoder, the decoder can be implemented with low cost making the scheme
`very attractive when it comes to asymmetric applications with not so tight real-time
`requirements. Such applications
`include electronic mail (video as attachments),
`broadcasting, different retrieval applications (e.g. world wide web) and video archives.
`'
`.
`
`Motion Estimation and Representation for Video Coding Applications
`
`46
`
`--
`
`~-
`
`. .,..._
`
`-
`
`Vedanti Systems Limited - Ex. 2007
`Page 53
`
`
`
`r---------
`
`,
`t _ Bibliography
`
`[IS093] ISOIIEC 11172-2, "Coding of Moving Pictures and Associated Audio for Digital
`Storage Media up to about 1.5 Mbits/s - Part 2: Video", International Organization for
`Standardization (ISO), 1993.
`
`[Sun96] World Wide Web site of the Sun Microsystems Computer Corporation, URL:
`http://www.sun.com/sparc/whitepapers/UltraSPARCtechnology/, October 1996
`
`[ITU95] ITU-T Draft Recommendation H.263, "Video Coding for Low Bitrate
`Communication", December 1995.
`
`[IS096] ISOIIEC 13818-2, "Generic Coding of Moving Pictures Associated Audio
`Information:'Video", International Organization for Standardization (ISO), 1996.
`
`[Dug95] Jean-Luc Dugelay, Henri Sanson, "Differential methods for the identification of
`2D and 3D motion models
`in
`image sequences", Signal Processing: hnage
`Communication, Vol.7, No. 1, March 1995, pp 105-127.
`
`[Wol90] G. Wolberg, "Digital Image Warping", New York, IEEE Computer Society,
`1990, pp. 67-70.
`
`[Ger95] A. Gersho, R. Gray, "Vector Quantization and Signal Compression", Kluwer
`Academic Publishers, Norwell (MA), 1995, pp 235-240.
`
`[Abr72] M. Abramowitz, I. A. Stegun, "Handbood of Mathematical Functions with
`Formulas, Graphs and Mathematical Tables", New York, NY, Dover 1972
`
`[Cic94] P. Cicconi, H. Nicolas, "Efficient Region-based Motion Estimation and
`Symmetry Oriented Segmentation for Image Sequence Coding", IEEE Transactions on
`Circuits and Systems for Video Technology, Vol. 4, No.3, June 1994, pp. 357-364.
`
`[Kar95] M. Karczewicz, J. Nieweglowski, P. Haavisto, "Motion Estimation and
`Representation of Arbitrarily Shaped Image Regions", The Proceedings of IEEE
`International Conference ofimage Processing, Vol. 2, October 1995, pp. 197-201.
`
`Motion Estimation and R~presentation for Video Coding Applications
`
`47
`
`·..
`
`.
`
`J
`
`Vedanti Systems Limited - Ex. 2007
`Page 54
`
`
`
`[Die91] N. Diehl, "Object-oriented Motion Estimation and Segmentation of Image
`Sequences", Signal Processing: Image Comminication, Vol. 3, No 1, February 1991, pp.
`23-56.
`
`[Pre92] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, "Numerical
`Recipes in C", Second Edition, Cambridge University Press, Cambridge UK, 1992
`
`[Fie95] R. Fletcher, "Practical Methods of Optimization", Second Edition, John Wiley &
`Sons, Chichester UK, 1995
`
`[Wu90] S. F. Wu, J. Kittler, "A Differential Method for Simultaneous Estimation of
`Rotation, Change of Scale and Translation", Signal Processing: Image Communication,
`Vol. 2, No. 1, May 1990, pp. 69-80.
`
`[Kar96] M. Karczewicz, J. Nieweglowski, J. Lainema, 0. Kaleva, "Video Coding Using
`Motion Compensation with Polynomial Motion Vector Fields", The Proceedings of First
`International Workshop on Wireless ImageNideo Communications, Loughborough
`University, Loughborough, UK, September 1996, pp. 26-31.
`
`Motion Estimation and Representation for Video Coding Applications
`
`48
`
`Vedanti Systems Limited - Ex. 2007
`Page 55
`
`
`
`--------------------------------~_.... -··
`
`l I,
`
`<
`
`(
`I
`
`'
`
`Appendix A
`
`Examples of coded images
`
`Motion Estimation and Representation for Video Coding Applications
`
`I
`
`I.
`i
`
`. . tl
`
`Vedanti Systems Limited - Ex. 2007
`Page 56
`
`
`
`p---------------------------~...._:am=F""""'"'"·------- -~...___..,"""'-'~
`
`1 :·
`
`I I I I
`
`a)
`
`c)
`
`d)
`
`e)
`
`f)
`
`FigureA1:
`a) Reference image (coded with H.263 intra coder)
`b) Current image (not coded)
`c) Prediction image with translational motion model
`d) Coded image with H.263 (2752 bits Ri 22.93 kbps)
`c) Prediction image with quadratic polynomial motion model
`d) Coded image with Nokia codec (2214 bits R~ 18.44 kbps)
`
`Motion Estimation and Representation for Video Coding Applications
`
`II
`
`Vedanti Systems Limited - Ex. 2007
`Page 57
`
`
`
`~------------------... -..... -Iii ........ "".,..,, •. "".-.... ·'""---···-----=.--· -~- --·
`
`a)
`
`c)
`
`e)
`
`l
`I
`
`b)
`
`d)
`
`f)
`
`Figure A2:
`a) Reference image (coded with H.263 intra coder)
`b) Current image (not coded)
`c) Prediction image with translational motion model
`d) Coded image with H.263 (2384 bits jO:i 23.84 kbps)
`c) Prediction image with quadratic polynomial motion model
`d) Coded image with Nokia codec (1764 bits jO:i 17.64 kbps)
`
`Motion Estimation and Representation for Video Coding Applications
`
`ill
`
`-
`
`-
`
`_;j~_. '
`
`.: ..... . .
`
`. ... •
`
`...
`
`• -
`
`.
`
`Vedanti Systems Limited - Ex. 2007
`Page 58
`
`
`
`.~
`
`a)
`
`c)
`
`e)
`
`b)
`
`d)
`
`t)
`
`Figure A3 :
`a) Reference image (coded with H.263 intra coder)
`b) Current image (not coded)
`c) Prediction image with translational motion model
`d) Coded image with H.263 (3032 bits~ 25.27 kbps)
`c) Prediction image with quadratic polynomial motion model
`d) Coded image with Nokia codec (25J9 bits~ 20.98 kbps)
`
`Motion Estimation and Representation for Video Coding Applications
`
`IV
`
`·
`
`-
`
`·
`
`-~
`
`·
`
`It
`
`Vedanti Systems Limited - Ex. 2007
`Page 59