(12) United States Patent                    (10) Patent No.: US 7,532,808 B2
Lainema                                      (45) Date of Patent: May 12, 2009
(54) METHOD FOR CODING MOTION IN A VIDEO SEQUENCE

(75) Inventor: Jani Lainema, Irving, TX (US)

(73) Assignee: Nokia Corporation, Espoo (FI)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1003 days.
OTHER PUBLICATIONS

"Global Motion Vector Coding (GMVC)"; Shijun Sun et al.; ITU-Telecommunications Standardization Sector, Video Coding Experts Group (VCEG); Meeting: Pattaya, Thailand, Dec. 4-7, 2001; pp. 1-6.
"Joint Model Number 1 (JM-1)"; Doc. JVT-A003; Joint Video Team of ISO/IEC and ITU-T VCEG; Jan. 2002; pp. 1-79.
Acta of Zhongshan University, vol. 40, No. 2; L. Hongmei et al.; "An Improved Multiresolution Motion Estimation Algorithm"; pp. 34-37; Mar. 2001.
ITU Telecommunications Standardization Sector, Doc. VCEG-N77; S. Sun et al.; "Motion Vector Coding with Global Motion Parameters"; pp. 1-11; Fourteenth Meeting: Santa Barbara, CA, USA, Sep. 24-28, 2001.

(Continued)
Primary Examiner—Huy T Nguyen
(74) Attorney, Agent, or Firm—Ware, Fressola, Van Der Sluys & Adolphson, LLP
(57) ABSTRACT

A method of motion-compensated video encoding that enables a video sequence with a global motion component to be encoded in an efficient manner. A video encoder is arranged to assign macroblocks to be coded to specific coding modes including a skip mode, which is used to indicate one of two possible types of macroblock motion: a) zero motion, or b) global or regional motion. As each macroblock is encoded, a previously encoded region surrounding the macroblock is examined and the characteristics of motion in that region determined. The skip mode is assigned to the macroblock to be coded, and a motion vector describing the global motion or regional motion is associated with the macroblock, if the motion in the region is characteristic of global motion or regional motion. If the region exhibits an insignificant level of motion, a zero-valued motion vector is associated with the macroblock.

65 Claims, 10 Drawing Sheets
`
`
`
(21) Appl. No.: 10/390,549

(22) Filed: Mar. 14, 2003

(65) Prior Publication Data
     US 2003/0202594 A1    Oct. 30, 2003

Related U.S. Application Data

(60) Provisional application No. 60/365,072, filed on Mar. 15, 2002.
`
(51) Int. Cl.
     H04N 5/91    (2006.01)
(52) U.S. Cl. ............................ 386/111; 386/112
(58) Field of Classification Search ............ 386/68, 386/111, 112, 95; 348/466, 699; 375/240.15
     See application file for complete search history.
`.
(56) References Cited

U.S. PATENT DOCUMENTS

5,148,272 A     9/1992  Acampora et al. ........ 358/133
5,191,436 A     3/1993  Yonemitsu .............. 358/335
5,442,400 A     8/1995  Sun et al. ............. 348/402
5,701,164 A    12/1997  Kato ................... 348/699
6,683,987 B1*   1/2004  Sugahara ............... 382/235
7,200,275 B2*   4/2007  Srinivasan et al. ...... 382/239
`
[Representative drawing (from FIG. 8): flowchart with blocks "Analyze Surrounding Motion" (630), "Generate Active Motion Parameters" and "Motion Compensation" (640, 650), and "Generate Zero-Motion Parameters", with input 128.]
`
`
`
OTHER PUBLICATIONS (Continued)

ITU Telecommunications Standardization Sector, Doc. VCEG-N16; S. Sun et al.; "Core Experiment Description: Motion Vector Coding with Global Motion Parameters"; pp. 1-6; Fourteenth Meeting: Santa Barbara, CA, USA, Sep. 24-28, 2001.

Joint Photography Expert Group Conference, Crowborough JPEG Forum Ltd, GB, Specialists Group on Coding for Visual Telephony, Joint Photographic Expert Group; "Description of Ref. Model 8 (RM8)"; pp. 1-72; Jun. 9, 1989.
`
`* cited by examiner
`
`
`
[Sheet 1 of 10: FIG. 1 (PRIOR ART) — block diagram of a video encoder 100, showing the transform, quantizer, inverse quantizer, inverse transform, motion estimation, motion field coding, motion-compensated prediction, frame store and coding control blocks.]
`
[Sheet 2 of 10: FIG. 2 (PRIOR ART) — block diagram of a video decoder 200, showing the inverse transform, inverse quantizer, motion-compensated prediction and coding control blocks, with VIDEO OUT at 280.]
`
[Sheet 3 of 10: FIG. 3 (PRIOR ART) — formation of a macroblock: a 16x16 region of image pixels represented as luminance blocks and chrominance components sub-sampled by a factor of 2.]
`
[Sheet 4 of 10: FIG. 4 — block sizes used for motion compensation, including 16x16, 16x8, 8x8, 4x8 and 4x4.]
`
[Sheet 5 of 10: FIG. 5]
`
[Sheet 6 of 10: FIG. 6 — block diagram of a video encoder 600 according to the invention, with transform, quantizer, inverse quantizer, inverse transform, motion estimation, motion field coding, motion-compensated prediction and coding control blocks.]
`
[Sheet 7 of 10: FIG. 7 — block diagram of a video decoder 700 according to the invention, with inverse transform, frame store, motion-compensated prediction and coding control blocks, and VIDEO OUT.]
`
[Sheet 8 of 10: FIG. 8 — flowchart of steps 640, 650: "Analyze Surrounding Motion" feeding either "Generate Active Motion Parameters" (with motion compensation) or "Generate Zero-Motion (Non-Active) Parameters", with a Motion Information Memory.]
`
[Sheet 9 of 10: FIG. 9]
`
[Sheet 10 of 10: FIG. 10 — block diagram of a multimedia terminal, including audio equipment, an audio codec, telematic equipment, system control and a connection to the PSTN.]
`
`METHOD FOR CODING MOTION IN A
`VIDEO SEQUENCE
`
This application claims the benefit of U.S. Provisional Application No. 60/365,072, filed Mar. 15, 2002.
`
`FIELD OF THE INVENTION
`
`The invention relates generally to communication systems
`and more particularly to motion compensation in video cod-
`ing.
`
BACKGROUND OF THE INVENTION
`
A digital video sequence, like an ordinary motion picture recorded on film, comprises a sequence of still images, the illusion of motion being created by displaying consecutive images of the sequence one after the other at a relatively fast rate, typically 15 to 30 frames per second. Because of the relatively fast frame display rate, images in consecutive frames tend to be quite similar and thus contain a considerable amount of redundant information. For example, a typical scene may comprise some stationary elements, such as background scenery, and some moving areas, which may take many different forms, for example the face of a newsreader, moving traffic and so on. Alternatively, or additionally, so-called "global motion" may be present in the video sequence, for example due to translation, panning or zooming of the camera recording the scene. However, in many cases, the overall change between one video frame and the next is rather small.
`
Each frame of an uncompressed digital video sequence comprises an array of image pixels. For example, in a commonly used digital video format, known as the Quarter Common Intermediate Format (QCIF), a frame comprises an array of 176x144 pixels, in which case each frame has 25,344 pixels. In turn, each pixel is represented by a certain number of bits, which carry information about the luminance and/or color content of the region of the image corresponding to the pixel. Commonly, a so-called YUV color model is used to represent the luminance and chrominance content of the image. The luminance, or Y, component represents the intensity (brightness) of the image, while the color content of the image is represented by two chrominance or color difference components, labelled U and V.

Color models based on a luminance/chrominance representation of image content provide certain advantages compared with color models that are based on a representation involving primary colors (that is Red, Green and Blue, RGB). The human visual system is more sensitive to intensity variations than it is to color variations, and YUV color models exploit this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way, the amount of information needed to code the color information in an image can be reduced with an acceptable reduction in image quality.

The lower spatial resolution of the chrominance components is usually attained by spatial sub-sampling. Typically, each frame of a video sequence is divided into so-called "macroblocks", which comprise luminance (Y) information and associated (spatially sub-sampled) chrominance (U, V) information. FIG. 3 illustrates one way in which macroblocks can be formed. FIG. 3a shows a frame of a video sequence represented using a YUV color model, each component having the same spatial resolution. Macroblocks are formed by representing a region of 16x16 image pixels in the original image (FIG. 3b) as four blocks of luminance information,
`
each luminance block comprising an 8x8 array of luminance (Y) values and two spatially corresponding chrominance components (U and V) which are sub-sampled by a factor of two in the horizontal and vertical directions to yield corresponding arrays of 8x8 chrominance (U, V) values (see FIG. 3c).

A QCIF image comprises 11x9 macroblocks. If the luminance blocks and chrominance blocks are represented with 8-bit resolution (that is, by numbers in the range 0 to 255), the total number of bits required per macroblock is (16x16x8)+2x(8x8x8)=3072 bits. The number of bits needed to represent a video frame in QCIF format is thus 99x3072=304,128 bits. This means that the amount of data required to transmit/record/display an uncompressed video sequence in QCIF format, represented using a YUV color model, at a rate of 30 frames per second, is more than 9 Mbps (million bits per second). This is an extremely high data rate and is impractical for use in video recording, transmission and display applications because of the very large storage capacity, transmission channel capacity and hardware performance required.
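By way of a worked example, the bit-rate arithmetic above can be restated in a few lines of Python (a minimal sketch; the constants simply restate the figures in the preceding paragraph):

```python
# Uncompressed QCIF bit rate, restating the arithmetic above.
LUMA_BITS = 16 * 16 * 8            # 16x16 luminance samples at 8 bits each
CHROMA_BITS = 2 * (8 * 8 * 8)      # two sub-sampled 8x8 chrominance blocks
BITS_PER_MACROBLOCK = LUMA_BITS + CHROMA_BITS                   # = 3072 bits

MACROBLOCKS_PER_FRAME = 11 * 9     # a 176x144 QCIF frame holds 99 macroblocks
bits_per_frame = MACROBLOCKS_PER_FRAME * BITS_PER_MACROBLOCK    # = 304,128 bits

bitrate_bps = bits_per_frame * 30  # 30 frames per second
print(f"{bitrate_bps / 1_000_000:.2f} Mbps")  # ~9.12 Mbps
```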
If video data is to be transmitted in real-time over a fixed-line network such as an ISDN (Integrated Services Digital Network) or a conventional PSTN (Public Switched Telephone Network), the available data transmission bandwidth is typically of the order of 64 kbits/s. In mobile videotelephony, where transmission takes place at least in part over a radio communications link, the available bandwidth can be as low as 20 kbits/s. This means that a significant reduction in the amount of information used to represent video data must be achieved in order to enable transmission of digital video sequences over low bandwidth communication networks. For this reason, video compression techniques have been developed which reduce the amount of information transmitted while retaining an acceptable image quality.
Video compression methods are based on reducing the redundant and perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spatial, temporal and spectral redundancy. "Spatial redundancy" is the term used to describe the correlation (similarity) between neighbouring pixels within a frame. The term "temporal redundancy" expresses the fact that objects appearing in one frame of a sequence are likely to appear in subsequent frames, while "spectral redundancy" refers to the correlation between different color components of the same image.

Sufficiently efficient compression cannot usually be achieved by simply reducing the various forms of redundancy in a given sequence of images. Thus, most current video encoders also reduce the quality of those parts of the video sequence which are subjectively the least important. In addition, the redundancy of the compressed video bit-stream itself is reduced by means of efficient loss-less encoding. Generally, this is achieved using a technique known as entropy coding.
There is often a significant amount of spatial redundancy between the pixels that make up each frame of a digital video sequence. In other words, the value of any pixel within a frame of the sequence is substantially the same as the value of other pixels in its immediate vicinity. Typically, video coding systems reduce spatial redundancy using a technique known as "block-based transform coding", in which a mathematical transformation, such as a two-dimensional Discrete Cosine Transform (DCT), is applied to blocks of image pixels. This transforms the image data from a representation comprising pixel values to a form comprising a set of coefficient values representative of spatial frequency components, significantly
`
`
`reducing spatial redundancy and thereby producing a more
`compact representation of the image data.
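As an illustration of block-based transform coding, the following is a minimal sketch using SciPy's general-purpose DCT rather than any particular codec's fixed-point transform:

```python
import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 block of pixel values (random here, purely for illustration).
block = np.random.randint(0, 256, size=(8, 8)).astype(np.float64)

# Forward 2-D DCT: pixel values -> spatial-frequency coefficients.
# Energy concentrates in the low-frequency (top-left) coefficients,
# which is what makes the representation more compact.
coeffs = dctn(block, norm="ortho")

# The transform is invertible: the inverse DCT recovers the block.
assert np.allclose(idctn(coeffs, norm="ortho"), block)
```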
`Frames of a video sequence which are compressed using
`block-based transform coding, without reference to any other
`frame within the sequence, are referred to as INTRA-coded or
`I-frames. Additionally, and where possible, blocks of
`INTRA-coded frames are predicted from previously coded
`blocks within the same frame. This technique, known as
`INTRA-prediction, has the effect of further reducing the
amount of data required to represent an INTRA-coded frame.
Generally, video coding systems not only reduce the spatial redundancy within individual frames of a video sequence, but also make use of a technique known as "motion-compensated prediction" to reduce the temporal redundancy in the sequence. Using motion-compensated prediction, the image content of some (often many) frames in a digital video sequence is "predicted" from one or more other frames in the sequence, known as "reference" frames. Prediction of image content is achieved by tracking the motion of objects or regions of an image between a frame to be coded (compressed) and the reference frame(s) using "motion vectors". In general, the reference frame(s) may precede the frame to be coded or may follow it in the video sequence. As in the case of INTRA-coding, motion-compensated prediction of a video frame is typically performed macroblock-by-macroblock.
Frames of a video sequence which are compressed using motion-compensated prediction are generally referred to as INTER-coded or P-frames. Motion-compensated prediction alone rarely provides a sufficiently precise representation of the image content of a video frame and therefore it is typically necessary to provide a so-called "prediction error" (PE) frame with each INTER-coded frame. The prediction error frame represents the difference between a decoded version of the INTER-coded frame and the image content of the frame to be coded. More specifically, the prediction error frame comprises values that represent the difference between pixel values in the frame to be coded and corresponding reconstructed pixel values formed on the basis of a predicted version of the frame in question. Consequently, the prediction error frame has characteristics similar to a still image and block-based transform coding can be applied in order to reduce its spatial redundancy and hence the amount of data (number of bits) required to represent it.
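In terms of pixel arrays, the prediction error described above is a simple per-pixel difference. A minimal sketch (function name illustrative):

```python
import numpy as np

def prediction_error(current: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Difference between the block to be coded and its motion-compensated
    prediction; a signed type is used since differences can be negative."""
    return current.astype(np.int16) - predicted.astype(np.int16)

# A decoder reverses this by adding the decoded prediction error values
# back onto the same prediction to reconstruct the block.
```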
In order to illustrate the operation of a generic video coding system in greater detail, reference will now be made to the exemplary video encoder and video decoder illustrated in FIGS. 1 and 2 of the accompanying drawings. The video encoder 100 of FIG. 1 employs a combination of INTRA- and INTER-coding to produce a compressed (encoded) video bit-stream, and decoder 200 of FIG. 2 is arranged to receive and decode the video bit-stream produced by encoder 100 in order to produce a reconstructed video sequence. Throughout the following description it will be assumed that the luminance component of a macroblock comprises 16x16 pixels arranged as an array of four 8x8 blocks, and that the associated chrominance components are spatially sub-sampled by a factor of two in the horizontal and vertical directions to form 8x8 blocks, as depicted in FIG. 3. Extension of the description to other block sizes and other sub-sampling schemes will be apparent to those of ordinary skill in the art.
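The assumed macroblock layout can be summarised as follows (a minimal sketch; the class is purely illustrative and not part of any standard):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Macroblock:
    y: np.ndarray  # 16x16 luminance samples (four 8x8 blocks)
    u: np.ndarray  # 8x8 chrominance samples, sub-sampled by 2 in each direction
    v: np.ndarray  # 8x8 chrominance samples

    def luma_blocks(self):
        """Yield the four 8x8 luminance blocks in raster order."""
        for row in (0, 8):
            for col in (0, 8):
                yield self.y[row:row + 8, col:col + 8]
```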
The video encoder 100 comprises an input 101 for receiving a digital video signal from a camera or other video source (not shown). It also comprises a transformation unit 104 which is arranged to perform a block-based discrete cosine transform (DCT), a quantizer 106, an inverse quantizer 108, an inverse transformation unit 110, arranged to perform an inverse block-based discrete cosine transform (IDCT), combiners 112 and 116, and a frame store 120. The encoder further comprises a motion estimator 130, a motion field coder 140 and a motion-compensated predictor 150. Switches 102 and 114 are operated co-operatively by control manager 160 to switch the encoder between an INTRA-mode of video encoding and an INTER-mode of video encoding. The encoder 100 also comprises a video multiplex coder 170 which forms a single bit-stream from the various types of information produced by the encoder 100 for further transmission to a remote receiving terminal or, for example, for storage on a mass storage medium, such as a computer hard drive (not shown).
Encoder 100 operates as follows. Each frame of uncompressed video provided from the video source to input 101 is received and processed macroblock by macroblock, preferably in raster-scan order. When the encoding of a new video sequence starts, the first frame to be encoded is encoded as an INTRA-coded frame. Subsequently, the encoder is programmed to code each frame in INTER-coded format, unless one of the following conditions is met: 1) it is judged that the current macroblock of the frame being coded is so dissimilar from the pixel values in the reference frame used in its prediction that excessive prediction error information is produced, in which case the current macroblock is coded in INTRA-coded format; 2) a predefined INTRA frame repetition interval has expired; or 3) feedback is received from a receiving terminal indicating a request for a frame to be provided in INTRA-coded format.
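These three conditions can be summarised in pseudocode form (a minimal sketch; all names and the difference measure are illustrative):

```python
def should_code_intra(difference: int, threshold: int,
                      frames_since_intra: int, intra_interval: int,
                      refresh_requested: bool) -> bool:
    # 1) the current macroblock is too dissimilar from its prediction:
    #    a difference measure (e.g. a sum of absolute differences)
    #    exceeds a predetermined threshold
    if difference > threshold:
        return True
    # 2) the predefined INTRA frame repetition interval has expired
    if frames_since_intra >= intra_interval:
        return True
    # 3) the receiving terminal has requested an INTRA frame refresh
    return refresh_requested
```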
The occurrence of condition 1) is detected by monitoring the output of the combiner 116. The combiner 116 forms a difference between the current macroblock of the frame being coded and its prediction, produced in the motion-compensated prediction block 150. If a measure of this difference (for example a sum of absolute differences of pixel values) exceeds a predetermined threshold, the combiner 116 informs the control manager 160 via a control line 119 and the control manager 160 operates the switches 102 and 114 via control line 113 so as to switch the encoder 100 into INTRA-coding mode. In this way, a frame which is otherwise encoded in INTER-coded format may comprise INTRA-coded macroblocks. Occurrence of condition 2) is monitored by means of a timer or frame counter implemented in the control manager 160, in such a way that if the timer expires, or the frame counter reaches a predetermined number of frames, the control manager 160 operates the switches 102 and 114 via control line 113 to switch the encoder into INTRA-coding mode. Condition 3) is triggered if the control manager 160 receives a feedback signal from, for example, a receiving terminal, via control line 121 indicating that an INTRA frame refresh is required by the receiving terminal. Such a condition may arise, for example, if a previously transmitted frame is badly corrupted by interference during its transmission, rendering it impossible to decode at the receiver. In this situation, the receiving decoder issues a request for the next frame to be encoded in INTRA-coded format, thus re-initialising the coding sequence.
Operation of the encoder 100 in INTRA-coding mode will now be described. In INTRA-coding mode, the control manager 160 operates the switch 102 to accept video input from input line 118. The video signal input is received macroblock by macroblock from input 101 via the input line 118. As they are received, the blocks of luminance and chrominance values which make up the macroblock are passed to the DCT transformation block 104, which performs a 2-dimensional discrete cosine transform on each block of values, producing a 2-dimensional array of DCT coefficients for each block. DCT transformation block 104 produces an array of coefficient
`
`
`
values for each block, the number of coefficient values corresponding to the dimensions of the blocks which make up the macroblock (in this case 8x8). The DCT coefficients for each block are passed to the quantizer 106, where they are quantized using a quantization parameter QP. Selection of the quantization parameter QP is controlled by the control manager 160 via control line 115.
The array of quantized DCT coefficients for each block is then passed from the quantizer 106 to the video multiplex coder 170, as indicated by line 125 in FIG. 1. The video multiplex coder 170 orders the quantized transform coefficients for each block using a zigzag scanning procedure, thereby converting the two-dimensional array of quantized transform coefficients into a one-dimensional array. Each non-zero valued quantized coefficient in the one-dimensional array is then represented as a pair of values, referred to as level and run, where level is the value of the quantized coefficient and run is the number of consecutive zero-valued coefficients preceding the coefficient in question. The run and level values are further compressed in the video multiplex coder 170 using entropy coding, for example, variable length coding (VLC), or arithmetic coding.
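The zigzag scan and run-level representation can be sketched as follows (a minimal illustration for an 8x8 block; actual codecs use fixed scan tables):

```python
import numpy as np

def zigzag_order(n: int = 8):
    """Scan positions of an n x n block along anti-diagonals, alternating
    direction, starting from the top-left (DC) position."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_pairs(quantized: np.ndarray):
    """Represent each non-zero coefficient as a (run, level) pair, where
    run counts the zero-valued coefficients preceding it in scan order."""
    pairs, run = [], 0
    for r, c in zigzag_order(quantized.shape[0]):
        level = int(quantized[r, c])
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```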
`Once the run and level values have been entropy coded
`using an appropriate method, the video multiplex coder 170
`further combines them with control information, also entropy
`coded using a method appropriate for the kind of information
`in question, to form a single compressed bit-stream of coded
`image information 135. It should be noted that while entropy
`coding has been described in connection with operations
`performed by the video multiplex coder 170, in alternative
implementations a separate entropy coding unit may be provided.
A locally decoded version of the macroblock is also formed in the encoder 100. This is done by passing the quantized transform coefficients for each block, output by quantizer 106, through inverse quantizer 108 and applying an inverse DCT transform in inverse transformation block 110. In this way a reconstructed array of pixel values is constructed for each block of the macroblock. The resulting decoded image data is input to combiner 112. In INTRA-coding mode, switch 114 is set so that the input to the combiner 112 via switch 114 is zero. In this way, the operation performed by combiner 112 is equivalent to passing the decoded image data unaltered.
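The local decoding loop can be illustrated with a simple uniform quantizer standing in for quantizer 106 (a minimal sketch; real codecs use more elaborate quantization):

```python
import numpy as np
from scipy.fft import dctn, idctn

def locally_decode(block: np.ndarray, qp: float) -> np.ndarray:
    """Transform, quantize, then invert both steps, mirroring blocks
    104 -> 106 -> 108 -> 110, so that the encoder's reference data
    matches what a decoder will reconstruct."""
    coeffs = dctn(block, norm="ortho")       # DCT transformation (104)
    quantized = np.round(coeffs / qp)        # quantizer (106)
    dequantized = quantized * qp             # inverse quantizer (108)
    return idctn(dequantized, norm="ortho")  # inverse DCT (110)
```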
As subsequent macroblocks of the current frame are received and undergo the previously described encoding and local decoding steps in blocks 104, 106, 108, 110 and 112, a decoded version of the INTRA-coded frame is built up in frame store 120. When the last macroblock of the current frame has been INTRA-coded and subsequently decoded, the frame store 120 contains a completely decoded frame, available for use as a motion prediction reference frame in coding a subsequently received video frame in INTER-coded format.
Operation of the encoder 100 in INTER-coding mode will now be described. In INTER-coding mode, the control manager 160 operates switch 102 to receive its input from line 117, which comprises the output of combiner 116. The combiner 116 receives the video input signal macroblock by macroblock from input 101. As combiner 116 receives the blocks of luminance and chrominance values which make up the macroblock, it forms corresponding blocks of prediction error information. The prediction error information represents the difference between the block in question and its prediction, produced in motion-compensated prediction block 150. More specifically, the prediction error information for each block of the macroblock comprises a two-dimensional array of values, each of which represents the difference
`
between a pixel value in the block of luminance or chrominance information being coded and a decoded pixel value obtained by forming a motion-compensated prediction for the block, according to the procedure to be described below. Thus, in the exemplary video coding system considered here where each macroblock comprises, for example, an assembly of 8x8 blocks comprising luminance and chrominance values, the prediction error information for each block of the macroblock similarly comprises an 8x8 array of prediction error values.
`
The prediction error information for each block of the macroblock is passed to DCT transformation block 104, which performs a two-dimensional discrete cosine transform on each block of prediction error values to produce a two-dimensional array of DCT transform coefficients for each block. DCT transformation block 104 produces an array of coefficient values for each prediction error block, the number of coefficient values corresponding to the dimensions of the blocks which make up the macroblock (in this case 8x8). The transform coefficients derived from each prediction error block are passed to quantizer 106 where they are quantized using a quantization parameter QP, in a manner analogous to that described above in connection with operation of the encoder in INTRA-coding mode. As before, selection of the quantization parameter QP is controlled by the control manager 160 via control line 115.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are passed from quantizer 106 to video multiplex coder 170, as indicated by line 125 in FIG. 1. As in INTRA-coding mode, the video multiplex coder 170 orders the transform coefficients for each prediction error block using a certain zigzag scanning procedure and then represents each non-zero valued quantized coefficient as a run-level pair. It further compresses the run-level pairs using entropy coding, in a manner analogous to that described above in connection with INTRA-coding mode. Video multiplex coder 170 also receives motion vector information (described in the following) from motion field coding block 140 via line 126 and control information from control manager 160. It entropy codes the motion vector information and control information and forms a single bit-stream of coded image information 135, comprising the entropy coded motion vector, prediction error and control information.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are also passed from quantizer 106 to inverse quantizer 108. Here they are inverse quantized and the resulting blocks of inverse quantized DCT coefficients are applied to inverse DCT transform block 110, where they undergo inverse DCT transformation to produce locally decoded blocks of prediction error values. The locally decoded blocks of prediction error values are then input to combiner 112. In INTER-coding mode, switch 114 is set so that the combiner 112 also receives predicted pixel values for each block of the macroblock, generated by motion-compensated prediction block 150. The combiner 112 combines each of the locally decoded blocks of prediction error values with a corresponding block of predicted pixel values to produce reconstructed image blocks and stores them in frame store 120.
`
`As subsequent macroblocks of the video signal are
`received from the video source and undergo the previously
`described encoding and decoding steps in blocks 104, 106,
`108, 110, 112, a decoded version of the frame is built up in
`frame store 120. When the last macroblock of the frame has
`been processed, the frame store 120 contains a completely
`decoded frame, available for use as a motion prediction ref-
`
`
`
`erence frame in encoding a subsequently received video
`frame in INTER-coded format.
`
`The details of the motion-compensated prediction per-
`formed by video encoder 100 will now be considered.
`Any frame encoded in INTER-coded format requires a
`reference frame for motion-compensated prediction. This
`means, necessarily, that when encoding a video sequence, the
first frame to be encoded, whether it is the first frame in the sequence, or some other frame, must be encoded in INTRA-
`coded format. This, in turn, means that when the video
`encoder 100 is switched into INTER-coding mode by control
`manager 160, a complete reference frame, formed by locally
`decoding a previously encoded frame, is already available in
`the frame store 120 of the encoder. In general, the reference
`frame is formed by locally decoding either an INTRA-coded
`frame or an INTER-coded frame.
`
In the following description it will be assumed that the encoder performs motion-compensated prediction on a macroblock basis, i.e. a macroblock is the smallest element of a video frame that can be associated with motion information. It will further be assumed that a prediction for a given macroblock is formed by identifying a region of 16x16 values in the luminance component of the reference frame that shows best correspondence with the 16x16 luminance values of the macroblock in question. Motion-compensated prediction in a video coding system where motion information may be associated with elements smaller than a macroblock will be considered later in the text.
`
The first step in forming a prediction for a macroblock of the current frame is performed by motion estimation block 130. The motion estimation block 130 receives the blocks of luminance and chrominance values which make up the current macroblock of the frame to be coded via line 128. It then performs a block matching operation in order to identify a region in the reference frame that corresponds best with the current macroblock. In order to perform the block matching operation, motion estimation block 130 accesses reference frame data stored in frame store 120 via line 127. More specifically, motion estimation block 130 performs block-matching by calculating difference values (e.g. sums of absolute differences) representing the difference in pixel values between the macroblock under examination and candidate best-matching regions of pixels from a reference frame stored in the frame store 120. A difference value is produced for candidate regions at all possible offsets within a predefined search region of the reference frame and motion estimation block 130 determines the smallest calculated difference value. The candidate region that yields the smallest difference value is selected as the best-matching region. The offset between the current macroblock and the best-matching region identified in the reference frame defines a "motion vector" for the macroblock in question. The motion vector typically comprises a pair of numbers, one describing the horizontal displacement (Δx) between the current macroblock and the best-matching region of the reference frame, the other representing the vertical displacement (Δy).
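The block-matching search can be sketched as an exhaustive search over a square search region (a minimal illustration; practical encoders use faster search strategies):

```python
import numpy as np

def find_motion_vector(current: np.ndarray, reference: np.ndarray,
                       row: int, col: int, size: int = 16, search: int = 7):
    """Return the (dx, dy) offset within +/-search pixels whose candidate
    region minimises the sum of absolute differences (SAD)."""
    target = current[row:row + size, col:col + size].astype(np.int32)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            if r < 0 or c < 0 or r + size > reference.shape[0] \
                    or c + size > reference.shape[1]:
                continue  # candidate falls outside the reference frame
            candidate = reference[r:r + size, c:c + size].astype(np.int32)
            sad = int(np.abs(target - candidate).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv
```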
Once the motion estimation block 130 has produced a motion vector for the macroblock, it outputs the motion vector to the motion field coding block 140. The motion field coding block 140 approximates the motion vector received from motion estimation block 130 using a motion model comprising a set of basis functions and motion coefficients. More specifically, the motion field coding block 140 represents the motion vector as a set of motion coefficient values which, when multiplied by the basis functions, form an approximation of the motion vector. Typically, a translational
`
`motion model having only two motion coefficients and basis
`functions is used, but motion models of greater complexity
may also