`Study Group 15, Working Party 15/1
`Expert's Group on Very Low Bitrate Video Telephony
`
`LBC-97-029
February 1997
`
Title: Proposal for Advanced Video Coding
Source: Nokia Research Center¹
Purpose: Proposal
`
`
`Introduction
`This document presents a low bit rate video coding scheme developed at Nokia Research Center. The simulation results
`reported in this document indicate that the scheme consistently achieves higher coding efficiency than the H.263 video
`coder. This contribution is intended as a continuation of the proposal presented in the document LBC-96-91 at the July
`1996 London ITU-T meeting.
`The Appendix of this document contains a detailed description of the proposed video coder. It gives a brief description of
`novel elements of the coder as well as a description of the bitstream syntax.
`
`Video coder
`
`The video coder presented in this contribution contains three major elements distinguishing it from the VM codec which
`prove to be the source of its improved coding efficiency. These elements are:
1. Rough segmentation of the video frame into arbitrarily shaped regions composed of 4-connected 8x8 pixel blocks. The segmentation allows compact encoding of motion vector fields and can be described with relatively few bits.
2. A motion compensation scheme utilizing the above-mentioned segmentation and a quadratic motion field model, which enables very accurate prediction. Motion fields are compactly encoded using 2-D separable orthogonal polynomials.
3. A powerful VQ and Multi-Shape DCT based scheme utilizing spatial properties of the prediction frame for efficient coding of the residual error.
`
`Coding Results
`The performance of the proposed coder has been compared to the H.263 video coder. The Telenor implementation of
`H.263 (TMN 1.6c) with Advanced Prediction Mode and Unrestricted Motion Vectors was used in all the comparisons.
`MPEG-4 test sequences of Class A and Class B in QCIF resolution were used in the simulations.
Both schemes operated at a fixed frame rate and with a fixed value of the quantiser parameter. In all simulations the
Nokia coder used the same first INTRA frame as H.263.
`
In the first experiment the objective quality of reconstructed sequences at approximately equal bit rates was compared.
Average² Peak Signal-to-Noise Ratio (PSNR) was used as the measure of quality. Simulation results are collected in
`Table 1.
`
¹ Contact person: M. Karczewicz, email: marta.karczewicz@research.nokia.fi

² Obtained by averaging the PSNRs of particular frames.
`
Description of the Nokia Coder
`
`Vedanti Systems Limited - Ex. 2008
`Page 1
`
`
`
Sequence      Picture  Intra  NOKIA                           H.263                           PSNR improvement
              size     QP     QP   Bit rate   PSNR (Y/C)      QP   Bit rate   PSNR (Y/C)     (Y/C) [dB]
                                   [kbps]     [dB]                 [kbps]     [dB]

Mthr&Dtr      qcif     8      15   10.14      34.5/40.5       12   10.26      33.5/39.4      1.0/1.1
Hall Objects  qcif     8      13   10.22      33.7/39.1       15   10.21      32.4/38.5      1.3/0.6
Container     qcif     8      13    9.88      33.3/38.8       15   10.00      31.0/37.7      2.3/1.1
Akiyo         qcif     4       7    9.77      38.6/42.3        9   10.28      35.9/40.9      2.7/1.4
Mthr&Dtr      qcif     4       7   20.68      37.2/42.5        8   21.10      35.7/41.2      1.5/1.3
Silent Voice  qcif     8      10   24.19      33.5/37.8       10   24.30      32.6/37.2      0.9/0.6
Container     qcif     4       7   24.07      36.4/41.0        9   23.85      33.9/39.7      2.5/1.3
News          qcif     8      12   24.91      33.8/37.8       13   24.86      31.7/36.6      2.1/1.2
Foreman       qcif     8      10   47.03      34.3/38.8       10   47.43      32.3/38.0      2.0/0.8
Coastguard    qcif     8      13   49.12      30.4/41.3       13   49.51      29.5/40.6      0.9/0.7

Table 1: Comparison of reconstruction PSNRs at equal bit rates
`
The results in Table 1 show that the Nokia coder achieves 0.9 - 2.7 dB higher reconstruction PSNR than H.263.
The purpose of the second experiment was to find out the bit rate reductions achieved by the Nokia coder when compared
to H.263 at approximately equal reconstruction PSNRs. As before, all the simulations were obtained using fixed values of
the quantiser parameters. The results of the simulations are collected in Table 2.
As can be seen in Table 2, the Nokia coder enables bit rate reductions between 33 and 53%. Experiments not shown in
this document confirm similar improvements over H.263 when PB-frames are used in both schemes. It was also noted
that the improvement over H.263 tends to be higher when the quality of the first frame improves.
`
Sequence    Intra  NOKIA                           H.263                           Bit rate    PSNR (Y/C)
            QP     QP   Bit rate   PSNR (Y/C)      QP   Bit rate   PSNR (Y/C)     reduction   diff.
                        [kbps]     [dB]                 [kbps]     [dB]           [%]         [dB]

Akiyo       4       7   11.11      39.0/42.3        5   23.55      39.0/42.2      53           0.0/0.1
Container   8       9   14.14      34.4/39.3        7   28.48      34.5/39.4      50          -0.1/-0.1
Mthr&Dtr    8      11   12.19      35.1/40.8        8   18.23      35.2/40.4      33          -0.1/0.4
Mthr&Dtr    4       7   26.66      38.1/42.6        5   44.67      38.1/42.7      40           0.0/-0.1
Foreman     8       8   55.70      35.2/39.6        6   87.64      35.1/40.0      36           0.1/-0.4

Table 2: Bit rate reductions achieved by the Nokia coder.
`
`Conclusions
`
The video coder presented in this proposal is shown to achieve higher coding efficiency than the H.263 video coder. The
objective improvements achieved by the Nokia coder range from 0.9 to 2.7 dB, which translates to bit rate savings
between 33 and 53%. In light of the above facts we believe that the codec has good potential to be a basis for the
development of the ITU-T long-term videotelephony standard H.263L.
`
`
`
`
NOKIA RESEARCH CENTER

APPENDIX

DESCRIPTION OF NOKIA VERY LOW BIT RATE VIDEO CODER

version 3.0
`
`
`
`
1. INTRODUCTION

2. CODER DESCRIPTION

2.1 Video Source Coding Algorithm
2.1.1 Definitions
2.1.2 Coding modes

2.2 Frame Segmentation
2.2.1 Splitting
2.2.2 Merging

2.3 Motion Field Coding
2.3.1 Motion model
2.3.2 Motion coefficient scaling and quantization
2.3.2.1 Motion coefficient scaling
2.3.2.2 Motion coefficient quantization
2.3.3 Motion coefficient coding
2.3.4 Image interpolation

2.4 Multi-Transform Prediction Error Coding
2.4.1 Overview
2.4.2 8x8 Classification
2.4.3 4x4 Classification
2.4.3.1 Variance Classification
2.4.3.2 Rate Classification
2.4.3.3 Directionality Classifier
2.4.4 Coding Methods
2.4.4.1 8x8 Coding Methods
2.4.4.2 4x4 Coding Methods
2.4.5 Method Selection
2.4.6 Decoding process

2.5 Chrominance coding

3. PB-FRAME MODE

3.1 Introduction

3.2 Region Layer

3.3 PB-frames and INTRA regions

3.4 Calculation of motion vector field for the B-picture

3.5 Prediction of a B-region in PB-frame

3.6 Reconstruction of a B-region in PB-frame

4. SYNTAX AND SEMANTICS

4.1 Picture Layer
4.1.1 Picture Start Code (PSC) (22 bits)
4.1.2 Picture Type Information (PTYPE) (3 bits)
4.1.3 Split Information (SPLIT) (Variable Length)
4.1.4 Merge Information (MERGE) (Variable Length)
4.1.5 Quantizer Information (PQUANT) (5 bits)
4.1.6 Quantization Information for B-pictures (PBQUANT) (2 bits)
4.1.7 Coded/NotCoded Information for Chrominance (CNCC) (Variable Length) and Coded/NotCoded Information for Chrominance of B Frames (CNCCB) (Variable Length)

4.2 Region Layer
4.2.1 Region Type Information (RTYPE) (Variable Length)
4.2.2 Region Type Information for B Frame Region (RTYPEB) (Variable Length)
4.2.3 Quantization Parameter Offset for Inter and Intra Regions (RDQUANT) (Variable Length)
4.2.4 Motion Information (MINFO), Motion Information for B-region (MINFOB) (Variable Length)
4.2.5 Coded/NotCoded Information for B Frame Region Luminance (CNCYB) (Variable Length) and Coded/NotCoded Information for P Region Luminance (CNCY) (Variable Length)

4.3 Block Layer, Block Layer B and Block Layer C
4.3.1 Block layer
4.3.2 8x8 Coding Method Type (MTYPE8) and 4x4 Coding Method Type (MTYPE4) (Variable Length)
4.3.3 INTRADC (Fixed Length) and VLC for Coding Methods (MVLC) (Variable Length)
4.3.4 Block layer B
4.3.5 Block layer C

5. REFERENCES
`
`
`
`1. Introduction
`
This document describes a scheme for compression of video at very low bit rates developed at Nokia Research Center.
The primary objective of the scheme is to achieve higher coding efficiency when compared to the H.263 video coder and
the MPEG-4 Verification Model. The video coder described in this document includes a number of novel elements which
contribute to its improved performance. These elements are:

• segmentation of coded frames into regions obtained by quadtree splitting and then merging,

• very accurate motion compensated prediction utilizing 2-D orthogonal polynomials for encoding of motion vector
fields,

• a multiple choice coding scheme applied to the motion compensated prediction error.

In order to facilitate fair comparison with H.263, the coder described in this document has been designed taking into
account algorithmic delay constraints similar to those imposed on the H.263 codec. Also the bitstream syntax was kept,
whenever possible, similar to H.263. The major syntactic difference is the adoption of a Region Layer in place of the H.263
Macroblock Layer. Frame segmentation into regions is defined by segmentation information contained in the Picture
Layer of the bitstream. The syntax of the coder allows straightforward extension to support arbitrarily-shaped frame
segmentation.

Unless otherwise specified in this document, the video source is assumed to be in the QCIF format (176 x 144 pixels for
luminance, 88 x 72 pixels for chrominance). The coder also supports other source formats.
`
`2. Coder Description
`
`2.1 Video Source Coding Algorithm
`
The schematic diagram of the encoder is shown in Figure 2-1. The main novel elements of the coder are the motion field
encoding scheme and the prediction error coding scheme, shown in the figure as Inter Coding, which utilizes the motion
compensated prediction frame (as shown in Figure 2-1) to improve coding efficiency.
`
`
`
`
[Figure 2-1 appears here: the encoder block diagram with coder control, motion estimation, Inter/Intra coding and multiplexing; the multiplexed output and motion information are sent to the decoder. Legend: E: Prediction Error Frame; Î: Prediction Frame; Ẽ: Reconstructed Prediction Error; I: Original Current Frame; Ĩ: Reconstructed Current Frame.]

Figure 2-1.  Schematic diagram of the encoder.
`
2.1.1 Definitions
Reference frame: The most recently reconstructed frame. Denoted I_(n-1).
Current frame: The frame that is being coded at the current moment. Denoted I_n.
Prediction frame: Motion compensated prediction of the current frame. Denoted Î_n.
I type frame: A frame which is coded independently of the reference frame.
P type frame: A frame coded by motion compensated prediction.
PB type frame: A PB-frame consists of two frames being coded as one unit. The name PB comes from the names of
picture types in Recommendation H.263.
Region: A semi-arbitrarily shaped segment in the current frame. The term 'semi-arbitrarily shaped' refers to the convention
that the building blocks of the regions are n x n pel blocks. In the current implementation, n = 8.
UNCHANGED type region: A region which is found to have very little temporal change. These regions are coded
by copying the corresponding area from the reference frame.
INTRA type region: A region which is decoded independently of the reference frame.
INTER type region: A region coded by motion compensated prediction using the reference frame. Information for such
regions includes motion information and coded prediction error.
NO MOTION INTER type region: A region for which the difference between the current frame and the reference frame is coded.
Information for such regions includes only coded prediction error.
INTER-B type region: (This type of region can occur only in B type frames of a PB unit.) A region which is coded by
bidirectional motion compensated prediction using both the reference frame and the reconstructed P-frame.

Motion vector: A pair of numbers [Δx(x,y), Δy(x,y)] where Δx(x,y) and Δy(x,y) are the values of the horizontal and
vertical displacements of a pel at location (x, y), respectively.
Motion vector field: The set of motion vectors of all pels in an INTER type region.
Motion model: A parametric formula describing the values of motion vectors. The motion model used in the Nokia coder is a
second order polynomial model. (See Section 2.3.1 for details.)
`
`
`
`
Motion coefficients: Coefficients needed to reconstruct motion for a region, as defined by the underlying motion model.
The Nokia coder uses a motion model with 12 coefficients.
`
2.1.2 Coding modes

The coding mode defines the region type in the current frame. Each frame type has its own set of coding modes. The
coding mode decision is sent to the decoder.

If the current frame is P type, its regions can be of type UNCHANGED, INTER or INTRA. In UNCHANGED mode, no
further information is sent. In INTER mode, the motion coefficients and the prediction error for the region are sent. In
INTRA mode, the contents of the region are coded as such.

For PB-type frames, the region types are determined independently for the P- and B-frames. For the P-component, the
possible region types are UNCHANGED, INTER-B or INTRA. In UNCHANGED mode, no further information is sent.
In INTER-B mode, the motion coefficients and the prediction error for the region are sent. In INTRA mode both motion
coefficients and the contents of the region are sent. For the B-component, there are 8 possible region types. These types are
determined by independently indicating the usage of the following three coding elements:

• the differential correction to the motion parameters inherited from the region in the P-component (similar to
delta motion vectors in PB-frames of H.263),
• the prediction error information,
• backward prediction.

The coding of INTRA-type frames is adopted from Recommendation H.263 [3].
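For illustration only, the mode structure described above can be summarized in a short sketch (Python; the names and the dictionary layout are ours, not the bitstream syntax of Section 4):

```python
# Illustrative restatement of the coding modes above. Mode names and
# "what is transmitted" lists follow the text; the data layout itself
# is an assumption of this sketch, not the RTYPE signalling.
from itertools import product

# P-frame region modes and the information sent for each.
P_REGION_MODES = {
    "UNCHANGED": [],                                   # copy area from the reference frame
    "INTER":     ["motion_coefficients", "prediction_error"],
    "INTRA":     ["region_contents"],
}

# The 8 B-component region types arise from independently switching
# three coding elements on or off:
# (differential motion correction, prediction error, backward prediction)
B_REGION_TYPES = list(product((False, True), repeat=3))
```

The cross product of the three on/off elements is exactly what yields the 8 B-region types mentioned in the text.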
`
`2.2 Frame Segmentation
`
Throughout this section the current frame is assumed to consist of the luminance component only. The division of the
current frame into segments is described in two steps: splitting (SPLIT bits) and merging (MERGE bits). The SPLIT and
MERGE bitstream can describe any segmentation consisting of combinations of 8 x 8 pel blocks.

The decoder starts with a fixed partitioning of the current frame into 32 x 32 pel segments (initial segmentation) shown
by the solid lines in Figure 2-2³. The received SPLIT bits indicate which of those segments should be further divided into
smaller segments. The received MERGE bits indicate how the resulting split segments should be recombined to form the
final frame segmentation.
`
The encoder starts the segmentation from the 32 x 32 pel regions. Motion of these regions is estimated and each region
not satisfying a normalized prediction error criterion is split into smaller blocks. Splitting and motion estimation for
regions proceed recursively until the measure of prediction error for the region falls below a given threshold, or until the
minimum size of the region (8x8 pixels) is reached. The next step in the encoder is the merging, during which some of the
neighboring blocks are combined using a rate-distortion criterion. This two-stage process can result in a
segmentation consisting of regions which are combinations of 4-connected 8 x 8 pel blocks (Figure 2-5).
[Figure 2-2 appears here: the 176 x 144 frame divided by solid lines into 32 x 32 pel segments, with dashed lines showing the 16 x 16 subdivisions.]

Figure 2-2.  Initial segmentation of the coded frame.
`
`3 QCIF resolution frames cannot be fully divided into 32 x 32 blocks which is why the initial partitioning of the QCIF
`frame includes 16 x 32, 32 x 16 and 16 x 16 blocks on the frame boundaries.
`
`
`
`
2.2.1 Splitting

Split information is sent from the encoder to the decoder as two sequences of bits (SPLIT bits). Bits of the first sequence
indicate splitting of all the regions in the initial segmentation which are greater than 16x16 pel (i.e. 32x32, 32x16, and
16x32) into four or two 16x16 regions as shown by the dashed lines in Figure 2-2. We refer to this operation as the first split
level. An example of the regions resulting from the first split is shown in Figure 2-3.

The bits of the second sequence indicate which of the 16x16 regions present in the frame segmentation after performing
the first split level should be split into four 8x8 regions. We refer to this operation as the second split level. An example of
the allowed partitioning for the second split is shown in Figure 2-3. Figure 2-4 shows an example frame segmentation
after two levels of split.

Each of the resulting two SPLIT bit sequences is coded using an entropy coded run-length technique described in Section
4.1.3.
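The two split levels, as seen by the decoder, can be sketched as follows (Python; the region representation (x, y, width, height) and the raster-scan bit order are assumptions of this sketch, and the run-length entropy coding of the bits is omitted):

```python
# Sketch of the decoder-side splitting. One bit is consumed per region
# larger than the child size; bit 1 splits that region into child-sized
# blocks, bit 0 leaves it intact.

def initial_segmentation(width, height):
    """32x32 pel segments, with 16x32 / 32x16 / 16x16 segments on the
    right and bottom boundaries of a QCIF-sized frame (see footnote 3)."""
    return [(x, y, min(32, width - x), min(32, height - y))
            for y in range(0, height, 32)
            for x in range(0, width, 32)]

def apply_split_bits(regions, split_bits, child=16):
    """One split level over a raster-ordered region list."""
    out, i = [], 0
    for (x, y, w, h) in regions:
        if max(w, h) > child:
            bit, i = split_bits[i], i + 1
            if bit:
                out.extend((x + dx, y + dy, child, child)
                           for dy in range(0, h, child)
                           for dx in range(0, w, child))
                continue
        out.append((x, y, w, h))
    return out
```

For a 176 x 144 frame the initial segmentation has 30 regions; splitting everything at both levels yields the full 16x16 grid (99 regions) and then the full 8x8 grid (396 regions).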
`
2.2.2 Merging
The next step in building the segmentation is merging of regions produced by the above splitting procedure. In this step,
the encoder merges those neighboring regions which satisfy a prediction error criterion for the hypothetical region that
would result from merging those two neighboring regions. We refer to this merging algorithm as Motion Assisted
Merging. The algorithm is described in [2].
The encoder informs the decoder about the merge/not-merge decisions using MERGE bits which are inserted in the
bitstream right after the SPLIT bits. MERGE bits refer to an adjacency graph, which is initialized at the beginning of the
transmission and updated along the way according to the following rules:

1  Initialize the adjacency graph:
1.1  Assign a unique label to each of the split segments: scan the segmentation image from left to right,
top to bottom with a step of one 8x8 pel block; every time a new segment is encountered, assign it a new label,
incremented by one compared to the previous label.
1.2  Associate with each segment an array of the labels of neighboring segments using the 4-connectivity rule⁴. These
arrays are initially sorted according to increasing index of segment labels.
2  Construct the MERGE bit sequence by parsing the adjacency graph from the segment with the lowest index to the
segment with the highest index, and parsing the array of neighbors from start to end. Generate bits for merge/not-merge
decisions only for those neighbors having a higher segment index than the index of the segment being
processed. 0 indicates that the two segments remain intact, 1 indicates that the two segments are merged.
3  Every time two segments are merged, update the adjacency graph before proceeding with encoding/decoding
subsequent MERGE bits.
3.1  The whole area of the merged segment should be labeled with the lower label. The higher label should
be removed from the adjacency graph completely.
3.2  Adjacency relations must be updated. Specifically, if the segment labeled i is merged with the segment
labeled j, where i < j, do the following for each of the indices k in the array of neighbors of segment j,
parsing the array from start to end:
    If k is a common neighbor of i and j,
        proceed to the next k
    else
        concatenate the index k to the end of the neighbor array of segment i.

Figure 2-5 shows a possible segment layout after merging.

⁴ In this context, the 4-connectivity rule means that two segments are neighbors if they have at least a line segment as their
common border.
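The adjacency-graph bookkeeping of rules 1-3 can be sketched as follows (Python; the graph representation {label: ordered list of neighbor labels} and the function name are illustrative, and the raster-scan label assignment of rule 1.1 is assumed to have been done already):

```python
# Runnable sketch of MERGE-bit decoding. Bits are consumed only for
# higher-labelled neighbors (rule 2); on a merge, the higher label is
# removed and its neighbors are inherited (rules 3.1 and 3.2).

def decode_merge(neighbors, merge_bits):
    """Apply merge/not-merge decisions; returns the updated graph,
    keyed by the surviving (lower) labels."""
    bits = iter(merge_bits)
    for i in sorted(list(neighbors)):
        if i not in neighbors:            # label removed by an earlier merge
            continue
        k = 0
        while k < len(neighbors[i]):
            j = neighbors[i][k]
            if j > i and next(bits) == 1:
                # rule 3.1: j's area takes label i; drop label j
                inherited = neighbors.pop(j)
                neighbors[i].remove(j)
                # rule 3.2: inherit j's neighbors not already adjacent to i
                for n in inherited:
                    if n != i and n not in neighbors[i]:
                        neighbors[i].append(n)
                # remaining segments now border i instead of j
                for lab, arr in neighbors.items():
                    if j in arr:
                        arr.remove(j)
                        if lab != i and i not in arr:
                            arr.append(i)
                continue                  # re-examine position k
            k += 1
    return neighbors
```

For three segments in a row (1-2-3), the bit sequence 1, 0 merges 2 into 1 and leaves 3 separate, exactly as the rules prescribe.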
`
`
`
`
[Figures 2-3 to 2-5 appear here, each showing an example segmentation of a 176 x 144 frame.]

Figure 2-3.  Example region boundaries after the first level split. Dashed lines indicate candidates for the second
level split.

Figure 2-4.  Example region boundaries after the second level split.

Figure 2-5.  Example regions after merging.
`
2.3 Motion Field Coding
In this document only decoder-specific features of motion vector field coding are described. A detailed description of
motion estimation and encoding (Motion Assisted Merging and Coefficient Removal) can be found in [1,2].
Motion compensated prediction of a region resulting from the segmentation process described above is performed
according to the following equation:
`
`
`
`
    Î_n(x,y) = I_(n-1)[x + Δx(x,y), y + Δy(x,y)]     ( 2-1 )

where Î_n is the prediction frame, I_(n-1) is the previous reconstructed frame (reference frame), and the pair of numbers
[Δx(x,y), Δy(x,y)] is the motion vector of the pel at location (x, y).

The chrominance motion vectors are calculated by evaluating the luminance motion vector field at the half-pel position
corresponding to the location of the chrominance pel and dividing the resulting motion vector by two.
`
2.3.1 Motion model
The motion field of each INTER region is represented by a set of 12 motion coefficients. These coefficients are produced
in the encoder by the motion estimation and motion field encoding blocks in Figure 2-1. The relation between the motion
coefficients and the actual values of motion vectors at a given point of a region is defined by the following parametric
model:

    Δx(x,y) = c1 f1(x,y) + c2 f2(x,y) + c3 f3(x,y) + c4 f4(x,y) + c5 f5(x,y) + c6 f6(x,y)
    Δy(x,y) = c7 f1(x,y) + c8 f2(x,y) + c9 f3(x,y) + c10 f4(x,y) + c11 f5(x,y) + c12 f6(x,y)     ( 2-2 )

where the functions f_j (j = 1, 2, ..., 6) are called motion field basis functions and (x, y) are integer pixel coordinates in a
system with the origin in the upper-left corner of the frame.
The motion model is based on 6 basis functions and the same model is used for horizontal and vertical displacements.
The basis functions are obtained by orthonormalizing the basis function set {1, x, y, xy, x^2, y^2} with respect to the bounding box of the
region (example shown in Figure 2-6).

Hence the form of the motion field basis functions is uniquely determined by the size of the bounding box of the region
and can be constructed in the decoder after the SPLIT and MERGE bits are received. The two-dimensional (2-D) basis
functions f_j are built as a tensor product of two sequences of one-dimensional (1-D) discrete orthonormal
polynomials:

    g_r(x) = sum_{j=0..r} a_{r,j} x^j,   r = 0, 1, 2,   orthonormal on the interval [x_min, x_max]
    h_r(y) = sum_{j=0..r} b_{r,j} y^j,   r = 0, 1, 2,   orthonormal on the interval [y_min, y_max]     ( 2-3 )

The functions f_j are built as follows:

    f1(x,y) = g0(x) h0(y)     f2(x,y) = g0(x) h1(y)     f3(x,y) = g1(x) h0(y)
    f4(x,y) = g1(x) h1(y)     f5(x,y) = g0(x) h2(y)     f6(x,y) = g2(x) h0(y)     ( 2-4 )

The coefficients of the polynomials g_r(x), with L = x_max - x_min, are given by:

    a_{0,0} = sqrt(1/(L+1))

    a_{1,0} = sqrt(3L/((L+1)(L+2))) + 2 x_min sqrt(3/(L(L+1)(L+2)))
    a_{1,1} = -2 sqrt(3/(L(L+1)(L+2)))

    a_{2,0} = sqrt(5(L-1)L/((L+1)(L+2)(L+3))) + 6 x_min sqrt(5L/((L-1)(L+1)(L+2)(L+3)))
              + 6 x_min^2 sqrt(5/((L-1)L(L+1)(L+2)(L+3)))
    a_{2,1} = -6 sqrt(5L/((L-1)(L+1)(L+2)(L+3))) - 12 x_min sqrt(5/((L-1)L(L+1)(L+2)(L+3)))
    a_{2,2} = 6 sqrt(5/((L-1)L(L+1)(L+2)(L+3)))     ( 2-5 )
`
`
`
The respective coefficients b_{r,j} are calculated as the coefficients a_{r,j} using the above formulas, replacing x_min by y_min
and setting L = y_max - y_min.
The choice of orthogonal polynomials for the basis functions was motivated by the observation that coefficients
corresponding to such polynomials are less sensitive to quantization than coefficients corresponding to the basis functions
{1, x, y, xy, x^2, y^2}. Two options were considered initially:
• polynomials orthonormalized with respect to the shape of the region (using some orthogonalization method), or
• separable polynomials orthonormalized with respect to the bounding box of the region, calculated according to
formulas ( 2-4 ).
It was found that the latter motion model provides equally good performance⁵ as the former one. The separable model
was chosen since its basis functions are given by analytic formulas and can be computed with relatively low
computational complexity.
`
[Figure 2-6 appears here: a region R_k inside its bounding box, with the corners (x_min, y_min) and (x_max, y_min) marked.]

Figure 2-6.  Bounding box for the region R_k.
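The 1-D construction of ( 2-3 ) and ( 2-5 ) can be exercised numerically with the following sketch (Python; the function names are ours, the sign conventions follow the formulas as reconstructed above, and L = x_max - x_min must be at least 2 for g2 to exist):

```python
import math

def poly_coeffs(x_min, x_max):
    """Coefficient lists a[r] of g_r(x) = sum_j a[r][j] * x**j per ( 2-5 ).
    Requires L = x_max - x_min >= 2."""
    L = x_max - x_min
    D = (L - 1) * L * (L + 1) * (L + 2) * (L + 3)
    a0 = [1.0 / math.sqrt(L + 1)]
    a1 = [math.sqrt(3.0 * L / ((L + 1) * (L + 2)))
          + 2 * x_min * math.sqrt(3.0 / (L * (L + 1) * (L + 2))),
          -2 * math.sqrt(3.0 / (L * (L + 1) * (L + 2)))]
    a2 = [math.sqrt(5.0 * (L - 1) * L / ((L + 1) * (L + 2) * (L + 3)))
          + 6 * x_min * math.sqrt(5.0 * L / ((L - 1) * (L + 1) * (L + 2) * (L + 3)))
          + 6 * x_min ** 2 * math.sqrt(5.0 / D),
          -6 * math.sqrt(5.0 * L / ((L - 1) * (L + 1) * (L + 2) * (L + 3)))
          - 12 * x_min * math.sqrt(5.0 / D),
          6 * math.sqrt(5.0 / D)]
    return [a0, a1, a2]

def g(coeffs, x):
    """Evaluate a polynomial given its coefficient list."""
    return sum(c * x ** j for j, c in enumerate(coeffs))
```

Summing g_r(x) g_s(x) over the integer positions x_min..x_max gives 1 for r = s and 0 otherwise, which is the orthonormality property the decoder relies on when rebuilding the basis functions from the bounding box alone.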
`
`2.3.2 Motion coefficient scaling and quantization
`
`2.3.2.1 Motion coefficient scaling
`
Scaling of motion coefficients makes it possible to vary the bit allocation between segments having different sizes but the same
size of bounding box. The scaling operation itself affects neither the bit allocation nor the precision of motion
estimation. If there were no quantization of motion coefficients, scaling would not affect the codec performance at all. It is
the quantization of scaled motion coefficients which dedicates more bits (assuming that the numbers of motion
coefficients are equal) to larger segments among those having the same size of bounding box.
Motion coefficients of a segment are scaled according to the ratio between the size of the bounding box and the size of
the segment. Let us denote the number of pixels in the segment by P, and the number of pixels in the bounding box by
P_box. The scaling factor scale equals:

    scale = P_box / P     ( 2-6 )

Each of the motion coefficients of the segment is divided by the scaling factor scale. In order to keep the validity of ( 2-2 )
it is necessary that the value of each of the basis functions ( 2-4 ) at each pixel location in the segment is multiplied by
the same scaling factor scale.
`
`2.3.2.2 Motion coefficient quantization
`
`A uniform scalar quantizer is applied to each of the non-zero motion coefficients. The quantizer step size is predefined as
`STEP=3. The quantization of a coefficient cj corresponding to orthonormal basis functions is done according to the
`formula:
`
⁵ Measured in bits needed to achieve a given prediction error.
`
`
`
`
    LEVEL = c_j // STEP,     ( 2-7 )

where // denotes division followed by a rounding operation.
Each of the non-zero motion coefficients c_j is reconstructed from the transmitted LEVEL. The following formula
describes the inverse quantization process of a given coefficient, resulting in a reconstructed coefficient value ĉ_j:

    ĉ_j = sign(LEVEL) * |LEVEL| * STEP     ( 2-8 )

The maximum encodable value of |LEVEL| is 1130. Whenever |LEVEL| exceeds the maximum encodable value it is
clipped.
The encoder transmits scaled motion coefficients, so before proceeding to build the prediction frame in the decoder it is
necessary to scale the basis functions as described in the previous section.
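The quantizer of ( 2-7 )/( 2-8 ) can be sketched as follows, assuming that "division followed by rounding" means rounding to the nearest integer (that reading is an assumption of this sketch):

```python
# Minimal sketch of the uniform scalar quantizer for motion
# coefficients, with STEP = 3 and |LEVEL| clipped to 1130.
STEP = 3
MAX_LEVEL = 1130   # maximum encodable |LEVEL|; larger values are clipped

def quantize(c):
    """LEVEL = c // STEP in the notation of ( 2-7 )."""
    level = int(round(c / STEP))
    return max(-MAX_LEVEL, min(MAX_LEVEL, level))

def dequantize(level):
    """Reconstruction ( 2-8 ): sign(LEVEL) * |LEVEL| * STEP."""
    return level * STEP
```

Note that a coefficient of 7.4 quantizes to LEVEL = 2 and reconstructs to 6, so the reconstruction error is bounded by STEP/2 except where clipping occurs.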
`
`2.3.3 Motion coefficient coding
`
`The motion field of an INTER type region is described by 12 motion coefficients, 6 of which relate to horizontal and 6 to
`vertical components of motion vectors ( 2-2 ). Since the amount and type of motion in the sequences tends to vary, the
`method for coding motion coefficients was designed to enable adaptation of the complexity of the motion model to the
`motion in the sequence.
`The motion information for a region consists oftwo elements: selection information and quantized coefficients. Selection
`information contains two 6-tuples of bits: MCP _x and MCP _y. Each bit of the 6-tuple MCP _x corresponds to one
`coefficient of the horizontal displacement function Ax(x, y) whereas bits of MCP _y correspond to coefficients of the
`vertical displacement function ~y(x,y)of the region. The role of these bits is to indicate whether the corresponding
`coefficient has a nonzero value. Thus selection information identifies which motion coefficients are transmitted to the
`decoder.
`Note that this structure of information allows varying the complexity of the motion model between regions, frames, and
`sequences. This enables the usage in the encoder of Coefficient Removal Algorithm to determine the importance of a
`given coefficient for the result of the prediction and to determine which coefficients need to be transmitted for a
`particular region. The algorithm is described in [2]. However, other methods for estimating and selecting the motion
`coefficients can be used in the encoder.
`The 6-tuples in the selection information as well as the values of the quantized non-zero coefficients are Huffman coded
`as described in Section 4.2.4.
`The coder uses a PB-frame mode adapted to operate on arbitrary shaped regions and to utilize a polynomial motion
`model. If the PB-frame mode is used, then, as in H.263, additional differential motion coefficients can be sent for a
`region to improve prediction of the region in the B-frame. The implementation ofPB-frame mode is described in detail in
`Section 3.
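The structure of the selection information can be illustrated with a small sketch (Python; the list layout and function names are assumptions of this illustration, and the Huffman coding of Section 4.2.4 is not reproduced):

```python
# Illustrative packing of the motion information for one region: the
# 6-tuples MCP_x and MCP_y flag the non-zero coefficients, and only
# the flagged coefficients are transmitted.

def pack_motion_info(levels):
    """levels: 12 quantized coefficients, c1..c6 for the horizontal
    displacement, c7..c12 for the vertical displacement."""
    assert len(levels) == 12
    mcp_x = [1 if v else 0 for v in levels[:6]]
    mcp_y = [1 if v else 0 for v in levels[6:]]
    sent = [v for v in levels if v]
    return mcp_x, mcp_y, sent

def unpack_motion_info(mcp_x, mcp_y, sent):
    """Reconstruct all 12 coefficients; flagged-off ones are zero."""
    it = iter(sent)
    return [next(it) if flag else 0 for flag in mcp_x + mcp_y]
```

Because zero coefficients cost only their selection bit, this is what lets the encoder's Coefficient Removal Algorithm trade model complexity against rate per region.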
`
`2.3.4 Image interpolation
`
Since motion vectors can have non-integer values, motion compensated prediction requires evaluating the
luminance and chrominance values at non-integer locations (x, y) in the reference frame I_(n-1). The interpolation of
luminance and chrominance values is done by cubic convolution interpolation using the pixel values in the 4x4
neighborhood [4].
For a frame of size M x N, the pixels have coordinates x_j = 0, 1, ..., M-1 and y_k = 0, 1, ..., N-1. Let (x_j, y_k) be such
that x_j <= x < x_(j+1) and y_k <= y <= y_(k+1). The cubic convolution interpolation at the point (x, y) is defined as:

    I_(n-1)(x,y) = sum_{l=-1..2} sum_{m=-1..2} c_(j+l,k+m) u(x - x_(j+l)) u(y - y_(k+m))     ( 2-9 )

where the 1-D interpolation function u is defined as:
`
`
`
`
    u(s) = (3/2)|s|^3 - (5/2)|s|^2 + 1              for 0 <= |s| < 1
    u(s) = -(1/2)|s|^3 + (5/2)|s|^2 - 4|s| + 2      for 1 <= |s| < 2
    u(s) = 0                                        for 2 <= |s|     ( 2-10 )
`
For pixels within the frame boundaries, the c_jk are given by c_jk = I_(n-1)(x_j, y_k). Luminance values c_jk residing outside
the frame which are needed for interpolation are obtained by duplicating the luminance values of pixels on the boundary
of the frame.
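As an illustration, ( 2-9 )/( 2-10 ) can be sketched as follows (Python; the kernel is the standard cubic convolution kernel of [4], the frame is modeled as a 2-D list, and the function names are ours):

```python
import math

def u(s):
    """1-D cubic convolution kernel ( 2-10 )."""
    s = abs(s)
    if s < 1:
        return 1.5 * s ** 3 - 2.5 * s ** 2 + 1.0
    if s < 2:
        return -0.5 * s ** 3 + 2.5 * s ** 2 - 4.0 * s + 2.0
    return 0.0

def interpolate(frame, x, y):
    """Evaluate frame[row][col] at a possibly non-integer (x, y)."""
    h, w = len(frame), len(frame[0])
    j, k = math.floor(x), math.floor(y)
    val = 0.0
    for l in range(-1, 3):
        for m in range(-1, 3):
            # pixels outside the frame duplicate the boundary pixels
            px = min(max(j + l, 0), w - 1)
            py = min(max(k + m, 0), h - 1)
            val += frame[py][px] * u(x - (j + l)) * u(y - (k + m))
    return val
```

Since u(0) = 1 and u vanishes at the other integers, the interpolant passes exactly through the pixel values, and it reproduces linear ramps exactly at half-pel positions.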
`
`2.4 Multi-Transform Prediction Error Coding
`
`2.4.1 Overview
The Multi-Transform Prediction Error Coding technique used in this coder is based on the observation that in typical
video sequences the prediction error (residual error after motion compensation) is concentrated near the contours of the
moving objects. Knowledge of the localization of the prediction error can be used to improve coding efficiency by using
transforms with better localization properties.
The exact location of the contours of moving objects is not known in general. However, the locations can be approximately
determined by finding edges and other discontinuities in a video frame. For this purpose, the coder utilizes the prediction
frame, which is known both to the encoder and the decoder (after receiving the motion coefficients). The improved coding
efficiency of the system is due to the fact that properties (location, directionality, etc.) of the prediction error signal at a
given location can be inferred from the properties of the prediction frame Î_n at the same location. Thus the decoder can
anticipate what coding technique(s) the encoder will choose for coding the prediction error pattern in this block.
In the proposed coder both the encoder and the decoder include a classifier which analyses spatial properties (location of
discontinuities and their directionality) of the prediction frame Î_n. The above information is used to switch between a
multitude of coding methods. The decision on the best methods is made by an optimization procedure based on rate-
distortion performance. The selection information is transmitted to the decoder as a variable length codeword. Among the
possible methods there are Multi-Shape DCTs, extrapolation and Entropy-Constrained Vector Quantization (ECVQ).
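The idea can be illustrated with a deliberately simplified classifier (Python; the variance feature, the threshold and the method names are assumptions of this sketch, not the actual classifier of Sections 2.4.2-2.4.3):

```python
# Both encoder and decoder compute the same cheap spatial feature of a
# block of the prediction frame and use it to narrow the candidate set
# of residual coding methods, so the method index can be signalled
# with a short codeword.

def block_variance(block):
    """Pixel variance of a rectangular block given as a 2-D list."""
    n = len(block) * len(block[0])
    mean = sum(sum(row) for row in block) / n
    return sum((p - mean) ** 2 for row in block for p in row) / n

def candidate_methods(block, threshold=100.0):
    """High-variance blocks (near moving contours) are offered the
    richer set of residual coding methods."""
    if block_variance(block) > threshold:
        return ["multi_shape_dct", "ecvq", "extrapolation"]
    return ["extrapolation"]
```

Because the classifier runs on the prediction frame, which both sides possess, no extra side information is needed to shrink the method set.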
`
2.4.2 8x8 Classification

The criterion for classification of an 8x8 block, using the prediction frame Î_n, is the location of the areas with high
variance of pixel values. Each 8x8 block is