In re Google / VSL, Reply of 12 November 2015, BPatG, Case No. 2 Ni 5/15, Quinn Emanuel LLP
Rate-Distortion Optimization for Video Compression

Gary J. Sullivan and Thomas Wiegand
The rate-distortion efficiency of today's video compression schemes is based on a sophisticated interaction between various motion representation possibilities, waveform coding of differences, and waveform coding of various refreshed regions. Hence, a key problem in high-compression video coding is the operational control of the encoder. This problem is compounded by the widely varying content and motion found in typical video sequences, necessitating the selection between different representation possibilities with varying rate-distortion efficiency. This article addresses the problem of video encoder optimization and discusses its consequences on the compression architecture of the overall coding system. Based on the well-known hybrid video coding structure, Lagrangian optimization techniques are presented that try to answer the question: "What part of the video signal should be coded using what method and parameter settings?"
Video Compression Basics

Motion video data consists essentially of a time-ordered sequence of pictures, and cameras typically generate approximately 24, 25, or 30 pictures (or frames) per second. This results in a large amount of data that demands the use of compression. For example, assume that each picture has a relatively low "QCIF" (quarter-common-intermediate-format) resolution (i.e., 176 x 144 samples) for which each sample is digitally represented with 8 bits, and assume that we skip two out of every three pictures in order to cut down the bit rate. For color pictures, three color component samples are necessary to represent a sufficient color space for each pixel. In order to transmit even this relatively low-fidelity sequence of pictures, the raw source data rate is still more than 6 Mbit/s (176 x 144 samples x 3 components x 8 bits x 10 pictures per second is about 6.08 Mbit/s). However, today's low-cost transmission channels often operate at much lower data rates so that the data rate of the video signal needs to be further com-
Lagrangian methods for making good decisions in high-compression video coding
74 | IEEE SIGNAL PROCESSING MAGAZINE | 1053-5888/98/$10.00 (c) 1998 IEEE | NOVEMBER 1998
Vedanti Systems Limited - Ex. 2009, Page 1
A History of Existing Visual Coding Standards

H.120: The first international digital video coding standard [3]. It may have even been the first international digital compression standard for natural continuous-tone visual content of any kind (whether video or still picture). H.120 was developed by the ITU-T organization (the International Telecommunications Union - Telecommunications Standardization Sector, then called the CCITT), and received final approval in 1984. It originally was a conditional replenishment (CR) coder with differential pulse-code modulation (DPCM), scalar quantization, and variable-length coding, and it had an ability to switch to quincunx sub-sampling for bit-rate control. In 1988, a second version of H.120 added motion compensation and background prediction. (None of the later completed standards have yet included background prediction again, although a form of it is in the draft of the future MPEG-4 standard.) Its operational bit rates were 1544 and 2048 Kbit/s. H.120 is essentially no longer in use today, although a few H.120 systems are rumored to still be in operational condition.

H.261: The first widespread practical success: a video codec capable of operation at affordable telecom bit rates (with 80-320 Kbit/s devoted to video) [4, 5]. It was the first standard to use the basic typical structure we find still predominant today (16 x 16 macroblock motion compensation, 8 x 8 block DCT, scalar quantization, and two-dimensional run-level variable-length entropy coding). H.261 was approved by the ITU-T in early 1991 (with technical content completed in late 1990). It was later revised in 1993 to include a backward-compatible high-resolution graphics transfer mode. Its target bit-rate range was 64-2048 Kbit/s.

JPEG: A highly successful continuous-tone, still-picture coding standard named after the Joint Photographic Experts Group that developed it [1, 2]. Anyone who has browsed the world-wide web has experienced JPEG. JPEG (IS 10918-1/ITU-T T.81) was originally approved in 1992 and was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations. In its typical use, it is essentially H.261 INTRA coding with prediction of average values and an ability to customize the quantizer reconstruction scaling and the entropy coding to the specific picture content. However, there is much more in the JPEG standard than what is typically described or used. In particular, this includes progressive coding, lossless coding, and arithmetic coding.

MPEG-1: A widely successful video codec capable of approximately VHS videotape quality or better at about 1.5 Mbit/s and covering a bit-rate range of about 1-2 Mbit/s [6, 7]. MPEG-1 gets its acronym from the Moving Pictures Experts Group that developed it [6, 7]. MPEG-1 video (IS 11172-2) was a project of the ISO/IEC JTC1 organization and was approved in 1993. In terms of technical features, it added bi-directionally predicted frames (known as B-frames) and half-pixel motion. (Half-pixel motion had been proposed during the development of H.261, but was apparently thought to be too complex at the time.) It provided superior quality to H.261 when operated at higher bit rates. (At bit rates below, perhaps, 1 Mbit/s, H.261 performs better, as MPEG-1 was not designed to be capable of operation in this range.)

MPEG-2: A step higher in bit rate, picture quality, and popularity. MPEG-2 forms the heart of broadcast-quality digital television for both standard-definition and high-definition television (SDTV and HDTV) [7-9]. MPEG-2 video (IS 13818-2/ITU-T H.262) was designed to encompass MPEG-1 and to also provide high quality with interlaced video sources at much higher bit rates. Although usually thought of as an ISO standard, MPEG-2 video was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations, and was completed in late 1994. Its primary new technical features were efficient handling of interlaced-scan pictures and hierarchical bit-usage scalability. Its target bit-rate range was approximately 4-30 Mbit/s.

H.263: The first codec designed specifically to handle very low-bit-rate video, and its performance in that arena is still state-of-the-art [10, 11]. H.263 is the current best standard for practical video telecommunication. Its original target bit-rate range was about 10-30 Kbit/s, but this was broadened during development to perhaps at least 10-2048 Kbit/s as it became apparent that it could be superior to H.261 at any bit rate. H.263 (version 1) was a project of the ITU-T and was approved in early 1996 (with technical content completed in 1995). The key new technical features of H.263 were variable block-size motion compensation, overlapped-block motion compensation (OBMC), picture-extrapolating motion vectors, three-dimensional run-level-last variable-length coding, median MV prediction, and more efficient header information signaling (and, relative to H.261, arithmetic coding, half-pixel motion, and bi-directional prediction, but the first of these three features was also found in JPEG and some form of the other two were in MPEG-1). At very low bit rates (e.g., below 30 Kbit/s), H.263 can code with the same quality as H.261 using half or less than half the bit rate [12]. At greater bit rates (e.g., above 80 Kbit/s) it can provide a more moderate degree of performance superiority over H.261. (See also H.263+ below.)

H.263+: Technically a second version of H.263 [10, 13]. The H.263+ project added a number of new optional features to H.263. One notable technical advance over prior standards is that H.263 version 2 was the first video coding standard to offer a high degree of error resilience for wireless or packet-based transport networks. H.263+ also added a number of improvements in compression efficiency, custom and flexible video formats, scalability, and backward-compatible supplemental enhancement information. It was approved in January of 1998 by the ITU-T (with technical content completed in September 1997). It extends the effective bit-rate range of H.263 to essentially any bit rate and any progressive-scan (noninterlaced) picture formats and frame rates, and H.263+ is capable of superior performance relative to any existing standard over this entire range. The first author was the editor of H.263 during the H.263+ project and is the Rapporteur (chairman) of the ITU-T Advanced Video Coding Experts Group (SG16/Q15), which developed it.
pressed. For instance, using V.34 modems that transmit at most 33.4 Kbit/s over dial-up analog phone lines, we still need to compress the video bit rate further by a factor of about 200 (more if audio is consuming 6 Kbit/s of that same channel or if the phone line is too noisy for achieving the full bit rate of V.34).

One way of compressing video content is simply to compress each picture, using an image-coding syntax such as JPEG [1, 2]. The most common "baseline" JPEG scheme consists of breaking up the image into equal-size blocks. These blocks are transformed by a discrete cosine transform (DCT), and the DCT coefficients are then quantized and transmitted using variable-length codes. We will refer to this kind of coding scheme as INTRA-frame coding, since the picture is coded without referring to other pictures in the video sequence. In fact, such INTRA coding alone (often called "motion JPEG") is in common use as a video coding method today in production-quality editing systems that demand rapid access to any frame of video content.

However, improved compression performance can be attained by taking advantage of the large amount of temporal redundancy in video content. We will refer to such techniques as INTER-frame coding. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change. It should be obvious then that the video can be represented more efficiently by coding only the changes in the video content, rather than coding each entire picture repeatedly. This ability to use the temporal-domain redundancy to improve coding efficiency is what fundamentally distinguishes video compression from still-image compression.

A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR), and it was the only temporal redundancy reduction method used in the first digital video coding standard, ITU-T Rec. H.120 [3]. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new coded information to replace the changed areas. CR thus allows a choice between one of two modes of representation for each area, which are called the SKIP mode and the INTRA mode. However, CR coding has a significant shortcoming, which is its inability to refine an approximation. Often the content of an area of a prior picture can be a good approximation of the new picture, needing only a minor alteration to become a better representation. But CR coding allows only exact repetition or complete replacement of each picture area. Adding a third type of "prediction mode," in which a refining frame difference approximation can be sent, results in a further improvement of compression performance.

[Figure 1. Typical motion-compensated DCT video coder: the input frame passes through the DCT, quantization, and entropy coding to form the encoded residual (to channel); a dotted box showing the decoder performs entropy decoding, inverse quantization, and inverse DCT, and adds the motion-compensated prediction formed from the prior coded approximated frame held in a frame buffer (delay), yielding the approximated input frame (to display); motion estimation and mode decision produce the motion vector and prediction mode data (to channel).]

The concept of frame difference refinement can also be taken a step further, by adding motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in a large difference in the values of the pixels in a picture area (especially near the edges of an object). Often, displacing an area of the prior picture by a few pixels in spatial location can result in a significant reduction in the amount of information that needs to be sent as a frame difference approximation. This use of spatial displacement to form an approximation is known as motion com-
pensation and the encoder's search for the best spatial displacement approximation to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP signal is known as displaced frame difference (DFD) coding.

Hence, the most successful class of video compression designs are called hybrid codecs. The naming of this coder is due to its construction as a hybrid of motion-handling and picture-coding techniques, and the term codec is used to refer to both the coder and decoder of a video compression system. Figure 1 shows such a hybrid coder. Its design and operation involve the optimization of a number of decisions, including
1. How to segment each picture into areas,
2. Whether or not to replace each area of the picture with completely new INTRA-picture content,
3. If not replacing an area with new INTRA content,
(a) How to do motion estimation; i.e., how to select the spatial shifting displacement to use for INTER-picture predictive coding (with a zero-valued displacement being an important special case),
(b) How to do DFD coding; i.e., how to select the approximation to use as a refinement of the INTER prediction (with a zero-valued approximation being an important special case), and
4. If replacing an area with new INTRA content, what approximation to send as the replacement content.

At this point, we have introduced a problem for the engineer who designs such a video coding system, which is: What part of the image should be coded using what method? If the possible modes of operation are restricted to INTRA coding and SKIP, the choice is relatively simple. However, hybrid video codecs achieve their compression performance by employing several modes of operation that are adaptively assigned to parts of the encoded picture, and there is a dependency between the effects of the motion estimation and DFD coding stages of INTER coding. The modes of operation are generally associated with signal-dependent rate-distortion characteristics, and rate-distortion trade-offs are inherent in the design of each of these aspects. The second and third items above in particular are unique to motion video coding. The optimization of these decisions in the design and operation of a video coder is the primary topic of this article. Some further techniques that go somewhat beyond this model will also be discussed.
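The rate-distortion trade-off behind these mode decisions can be sketched in a few lines. The following is a minimal illustration (with made-up candidate numbers, not taken from the article) of a Lagrangian mode decision: each candidate mode is scored by the cost J = D + lambda * R, and the cheapest mode wins.

```python
# Hypothetical sketch of a Lagrangian mode decision for one macroblock.
# Each candidate mode has a distortion D (e.g., SSD against the original
# block) and a rate R in bits; the encoder keeps the mode minimizing
# the Lagrangian cost J = D + lambda * R.

def choose_mode(candidates, lambda_):
    """candidates: list of (mode_name, distortion, rate_bits) tuples."""
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate in candidates:
        cost = distortion + lambda_ * rate  # J = D + lambda * R
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Toy numbers (illustrative only): SKIP costs no bits but has high
# distortion; INTER needs a few bits; INTRA is accurate but expensive.
modes = [("SKIP", 900.0, 0), ("INTER", 200.0, 40), ("INTRA", 50.0, 300)]
print(choose_mode(modes, lambda_=5.0))    # a small lambda favors low distortion
print(choose_mode(modes, lambda_=100.0))  # a large lambda favors low rate
```

Sweeping lambda from small to large traces out the encoder's operational rate-distortion curve: low lambda emphasizes fidelity, high lambda emphasizes bit savings.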
Motion-Compensated Video Coding Analysis

Consider the nth coded picture of size $W \times H$ in a video sequence, consisting of an array $I_n(s)$ of color component values (e.g., $Y_n(s)$, $Cb_n(s)$, and $Cr_n(s)$) for each pixel location $s = (x, y)$, in which $x$ and $y$ are integers such that $0 \le x < W$ and $0 \le y < H$. The decoded approximation of this picture will be denoted as $\hat{I}_n(s)$.

The typical video decoder (see Fig. 1) receives a representation of the picture that is segmented into some number $K$ of distinct regional areas $\{A_{i,n}\}_{i=1}^{K}$. For each area, a prediction-mode signal $p_{i,n} \in \{0,1\}$ is received, indicating whether or not the area is predicted from the prior picture. For the areas that are predicted from the prior picture, a motion vector (MV), denoted $v_{i,n}$, is received. The MV specifies a spatial displacement for motion compensation of that region. Using the prediction mode and
An Overview of Future Visual Coding Standardization Projects

MPEG-4: A future visual coding standard for both still and moving visual content. The ISO/IEC SC29 WG11 organization is currently developing two drafts, called version 1 and version 2 of MPEG-4 visual. Final approval of version 1 is planned in January 1999 (with technical content completed in October 1998), and approval of version 2 is currently planned for approximately one year later. MPEG-4 visual (which will become IS 14496-2) will include most technical features of the prior video and still-picture coding standards, and will also include a number of new features such as zero-tree wavelet coding of still pictures, segmented shape coding of objects, and coding of hybrids of synthetic and natural video content. It will cover essentially all bit rates, picture formats, and frame rates, including both interlaced and progressive-scan video pictures. Its efficiency for predictive coding of normal camera-view video content will be similar to that of H.263 for noninterlaced video sources and similar to that of MPEG-2 for interlaced sources. For some special purpose and artificially generated scenes, it will provide significantly superior compression performance and new object-oriented capabilities. It will also contain a still-picture coder that has improved compression quality relative to JPEG at low bit rates.

H.263++: Future enhancements of H.263. The H.263++ project is considering adding more optional enhancements to H.263 and is currently scheduled for completion late in the year 2000. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).

JPEG-2000: A future new still-picture coding standard. JPEG-2000 is a joint project of the ITU-T SG8 and ISO/IEC JTC1 SC29 WG1 organizations. It is scheduled for completion late in the year 2000.

H.26L: A future new generation of video coding standard with improved efficiency, error resilience, and streaming support. H.26L is currently scheduled for approval in 2002. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).
MV, an MCP $\tilde{I}_n(s)$ is formed for each pixel location $s \in A_{i,n}$:

$$\tilde{I}_n(s) = p_{i,n} \cdot \hat{I}_{n-1}(s - v_{i,n}), \quad s \in A_{i,n}. \tag{1}$$

(Note: The MV $v_{i,n}$ has no effect if $p_{i,n} = 0$ and so the MV is therefore normally not sent in that case.)

In addition to the prediction mode and MV information, the decoder receives an approximation $\hat{R}_{i,n}(s)$ of the DFD residual error $R_{i,n}(s)$ between the true image value $I_n(s)$ and its MCP $\tilde{I}_n(s)$. It then adds the residual signal to the prediction to form the final coded representation

$$\hat{I}_n(s) = \tilde{I}_n(s) + \hat{R}_{i,n}(s), \quad s \in A_{i,n}. \tag{2}$$

Since there is often no movement in large parts of the picture, and since the representation of such regions in the previous picture may be adequate, video coders often provide special provisions for a SKIP mode of area treatment, which is efficiently transmitted using very short code words ($p_{i,n} = 1$, $v_{i,n} = 0$, $\hat{R}_{i,n}(s) = 0$).
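The decoding rule of Eqs. (1) and (2) can be sketched numerically as follows, assuming an integer-valued MV and a rectangular region lying fully inside the picture (the function name and toy arrays are illustrative, not from any standard):

```python
import numpy as np

# Sketch of Eqs. (1)-(2): reconstruct one rectangular region A of the
# current picture from the prior decoded picture, an integer MV, a
# prediction-mode flag p, and a decoded residual approximation R_hat.

def reconstruct_region(prior, top, left, h, w, p, mv, residual):
    """prior: prior decoded picture (2-D array); (top, left, h, w): region A;
    p: 1 if predicted from the prior picture, 0 if not; mv = (dy, dx) is an
    integer displacement; residual: decoded DFD approximation R_hat."""
    dy, dx = mv
    if p:  # Eq. (1): the MCP is the displaced prior-picture region at s - v
        mcp = prior[top - dy : top - dy + h, left - dx : left - dx + w]
    else:
        mcp = np.zeros((h, w))
    return mcp + residual  # Eq. (2): add the residual to the prediction

prior = np.arange(16.0).reshape(4, 4)  # toy 4x4 prior decoded picture
residual = np.ones((2, 2))             # toy residual refinement
out = reconstruct_region(prior, 2, 2, 2, 2, p=1, mv=(1, 1), residual=residual)
# SKIP mode corresponds to p = 1, mv = (0, 0), residual = 0: a pure copy.
```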
In video coders designed primarily for natural camera-view scene content, often little real freedom is given to the encoder for choosing the segmentation of the picture into region areas. Instead, the segmentation is typically either fixed to always consist of a particular two-dimensional block size (typically 16 x 16 pixels for prediction-mode signals and 8 x 8 for DFD residual content) or in some cases it is allowed to switch adaptively between block sizes (such as allowing the segmentation used for motion compensation to have either a 16 x 16 or 8 x 8 block size). This is because providing the encoder more freedom to specify a precise segmentation has generally not yet resulted in a significant improvement of compression performance for natural camera-view scene content (due to the number of bits needed to specify the segmentation), and also because determining the best possible segmentation in an encoder can be very complex. However, in special applications (especially those including artificially constructed picture content rather than camera-view scenes), segmented object-based coding can be justified. Rate-distortion optimization of segmentations for variable block-size video coding was first discussed in [30, 31], which was later enhanced to include dynamic programming to account for sequential dependencies in [37]-[39]. The optimization of coders that use object segmentation is discussed in an accompanying article [15].
Distortion Measures

Rate-distortion optimization requires an ability to measure distortion. However, the perceived distortion in visual content is a very difficult quantity to measure, as the characteristics of the human visual system are complex and not well understood. This problem is aggravated in video coding, because the addition of the temporal domain relative to still-picture coding further complicates
Standard Hybrid Video Codec Terminology

The following terms are useful for understanding the various international standards for video coding:

prediction mode: A basic representation model that is selected for use in approximating a picture region (INTRA, INTER, etc.).

mode decision: An encoding process that selects the prediction mode for each region to be encoded.

block: A rectangular region (normally of size 8 x 8) in a picture. The discrete cosine transform (DCT) in standard video coders operates on 8 x 8 block regions.

macroblock: A region of size 16 x 16 in the luminance picture and the corresponding region of chrominance information (often an 8 x 8 region), which is associated with a prediction mode.

motion vector (MV): A spatial displacement offset for use in the prediction of an image region. In the INTER prediction mode an MV affects a macroblock region, while in the INTER+4V prediction mode, an individual MV is sent for each of the four 8 x 8 luminance blocks in a macroblock.

motion compensation: A decoding process that represents motion in each region of a picture by application of the transmitted MVs to the prior decoded picture.

motion estimation: An encoding process that selects the MVs to be used for motion compensation.

half-pixel motion: A representation of motion in which an MV may specify prediction from pixel locations that are halfway between the pixel grid locations in the prior picture, thus requiring interpolation to construct the prediction of an image region.

picture-extrapolating MVs: A representation of motion in which an MV may specify prediction from pixel locations that lie partly or entirely outside the boundaries of the prior picture, thus requiring extrapolation of the edges of the picture to construct the prediction of an image region.

overlapped-block motion compensation (OBMC): A representation of motion in which the MVs that represent the motion in a picture have overlapping areas of influence.

INTRA mode: A prediction mode in which the picture content of a macroblock region is represented without reference to a region in any previously decoded picture.

SKIP mode: A prediction mode in which the picture content of a macroblock region is represented as a copy of the macroblock in the same location in a previously decoded picture.

INTER mode: A prediction mode in which the picture content of a macroblock region is represented as the sum of a motion-compensated prediction using a motion vector, plus (optionally) a decoded residual difference signal representation.

INTER+4V mode: A prediction mode in which the picture content of a macroblock region is represented as in the INTER mode, but using four motion vectors (one for each 8 x 8 block in the macroblock).

INTER+Q mode: A prediction mode in which the picture content of a macroblock is represented as in the INTER mode, and a change is indicated for the inverse quantization scaling of the decoded residual signal representation.
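The half-pixel motion entry above implies an interpolation step in the decoder. A minimal sketch follows, using simple bilinear averaging as the interpolation filter (an illustrative choice; the function name and coordinate convention are assumptions, not quoted from any standard):

```python
import numpy as np

# Sketch of half-pixel motion compensation: when a displacement has a
# half-integer component, the prediction block is interpolated by
# averaging adjacent pixels of the prior decoded picture.

def half_pel_block(prior, y0, x0, h, w, half_y, half_x):
    """Fetch an h x w prediction block whose top-left corner sits at
    (y0 + half_y/2, x0 + half_x/2) in the prior picture, half_y/half_x in {0, 1}."""
    # Fetch one extra row and column so the averaging has a neighbor to use.
    block = prior[y0 : y0 + h + 1, x0 : x0 + w + 1].astype(float)
    if half_y:
        block = 0.5 * (block[:-1, :] + block[1:, :])  # vertical half-sample
    if half_x:
        block = 0.5 * (block[:, :-1] + block[:, 1:])  # horizontal half-sample
    return block[:h, :w]

prior = np.arange(16.0).reshape(4, 4)  # toy 4x4 prior picture
pred = half_pel_block(prior, 1, 1, 2, 2, half_y=1, half_x=0)
# Each predicted row is the average of two vertically adjacent prior rows.
```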
the issue. In practice, highly imperfect distortion models such as the sum of squared differences (SSD) or its equivalents, known as mean squared error (MSE) or peak signal-to-noise ratio (PSNR), are used in most actual comparisons. They are defined by

$$\mathrm{SSD}_A(F, G) = \sum_{s \in A} |F(s) - G(s)|^2 \tag{3}$$

$$\mathrm{MSE}_A(F, G) = \frac{1}{|A|} \mathrm{SSD}_A(F, G) \tag{4}$$

$$\mathrm{PSNR}_A(F, G) = 10 \log_{10} \frac{(255)^2}{\mathrm{MSE}_A(F, G)} \text{ decibels.} \tag{5}$$

Another distortion measure in common use (since it is often easier to compute) is the sum of absolute differences (SAD)

$$\mathrm{SAD}_A(F, G) = \sum_{s \in A} |F(s) - G(s)| \tag{6}$$

where $F$ and $G$ are two array arguments (such as luminance arrays of the actual and approximated pictures). These measures are often applied to only the luminance field of the picture during optimization processes, but better performance can be obtained by including all three color components. (The chrominance components are often treated as something of a minor nuisance in video coding; since they need only about 10% of the bit rate of the luminance, they provide a limited opportunity for optimization gain.)
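Eqs. (3)-(6) translate directly into a few lines of array code. The sketch below assumes 8-bit samples (hence the 255 peak value in the PSNR formula):

```python
import numpy as np

# Distortion measures of Eqs. (3)-(6), computed over a whole array region A.

def ssd(f, g):
    d = f.astype(float) - g.astype(float)
    return np.sum(d * d)             # Eq. (3): sum of squared differences

def mse(f, g):
    return ssd(f, g) / f.size        # Eq. (4): SSD normalized by |A|

def psnr(f, g):
    return 10.0 * np.log10(255.0**2 / mse(f, g))  # Eq. (5), in decibels

def sad(f, g):
    return np.sum(np.abs(f.astype(float) - g.astype(float)))  # Eq. (6)

f = np.array([[10, 20], [30, 40]])   # toy "actual" block
g = np.array([[11, 22], [33, 44]])   # toy "approximated" block
# ssd = 1 + 4 + 9 + 16 = 30; mse = 30/4 = 7.5; sad = 1 + 2 + 3 + 4 = 10
```

SAD avoids the multiplication in SSD, which is why motion estimation loops often prefer it, as the text notes.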
Effectiveness of Basic Technical Features

In the previous sections we described the various technical features of a basic modern video coder. The effectiveness of these features and the dependence of this effectiveness on video content is shown in Fig. 2. The upper plot of Fig. 2 shows performance for a videophone sequence known as Mother & Daughter, with moderate object motion and a stationary background. The lower plot of Fig. 2 shows performance for a more demanding scene known as Foreman, with heavy object motion and an unstable hand-held moving camera. Each sequence was encoded in QCIF resolution at 10 frames per second using the framework of a well-optimized H.263 [10] video encoder (using optimization methods described later in this article). (H.263 has 16 x 16 prediction-mode regions called macroblocks and 8 x 8 DCT-based DFD coding.)
Complicating Factors in Video Coding Optimization

The video coder model described in this article is useful for illustration purposes, but in practice actual video coder designs often differ from it in various ways that complicate design and analysis. Some of the important differences are described in the following few paragraphs.

Color chrominance components (e.g., $Cb_n(s)$ and $Cr_n(s)$) are often represented with lower resolution (e.g., $W/2 \times H/2$) than the luminance component of the image $Y_n(s)$. This is because the human psycho-visual system is much more sensitive to brightness than to chrominance, allowing bit-rate savings by coding the chrominance at lower resolution. In such a system, the method of operation must be adjusted to account for the difference in resolution (for example, by dividing the MV values by two for chrominance components).

Since image values $I_n(s)$ are defined only for integer pixel locations $s = (x, y)$ within the rectangular picture area, the above model will work properly in the strict sense only if every motion vector $v_{i,n}$ is restricted to have an integer value and only a value that causes access to locations in the prior picture that are within the picture's rectangular boundary. These restrictions, which are maintained in some early video-coding methods such as ITU-T Rec. H.261 [4], are detrimental to performance. More recent designs such as ITU-T Rec. H.263 [10] support the removal of these restrictions by using interpolation of the prior picture for any fractional-valued MVs (normally half-integer values, resulting in what is called half-pixel motion) and MVs that access locations outside the boundary of the picture (resulting in what we call picture-extrapolating MVs). The prediction of an image area may also be filtered to avoid high-frequency artifacts (as in Rec. H.261 [4]).

Often there are interactions between the coding of different regions in a video coder. The number of bits needed to specify an MV value may depend on the values of the MVs in neighboring regions. The areas of influence of different MVs can be overlapping due to overlapped-block motion compensation (OBMC) [16]-[19], and the areas of influence of coded transform blocks can also overlap due to the application of deblocking filters. While these cross-dependencies can improve coding performance, they can also complicate the task of optimizing the decisions made in an encoder. For this reason these cross-dependencies are often neglected (or only partially accounted for) during encoder optimization.

One important and often-neglected interaction between the coding of video regions is the temporal propagation of error. The fidelity of each area of a particular picture will affect the ability to use that picture area for the prediction of subsequent pictures. Real-time encoders must neglect to account for this aspect to a large extent, since they cannot tolerate the delay necessary for optimizing a long temporal sequence of decisions with accounting for the temporal effects on many pictures. However, even non-real-time encoders also often neglect to account for this propagation in any significant way, due to the sheer complexity of adding this extra dimension to the analysis. An example of the exploitation of temporal dependencies in video coding can be found in [20]. The work of Ramchandran, Ortega, and Vetterli in [20] was extended by Lee and Dickinson in [21].
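The picture-extrapolating MVs mentioned above can be sketched by clamping out-of-bounds coordinates to the nearest boundary pixel, i.e., extrapolating the picture edges outward. (Edge clamping is one common extrapolation choice; this sketch is an assumption for illustration, not the exact rule of any particular standard.)

```python
import numpy as np

# Sketch of motion compensation with picture-extrapolating MVs: pixel
# coordinates that fall outside the prior picture are clamped to the
# nearest edge, so the picture edges are effectively extrapolated outward.

def mcp_extrapolating(prior, top, left, h, w, mv):
    dy, dx = mv
    H, W = prior.shape
    ys = np.clip(np.arange(top - dy, top - dy + h), 0, H - 1)  # rows of s - v
    xs = np.clip(np.arange(left - dx, left - dx + w), 0, W - 1)  # cols of s - v
    return prior[np.ix_(ys, xs)]

prior = np.arange(9).reshape(3, 3)  # toy 3x3 prior picture
# An MV of (2, 0) displaces the 2x2 region at (0, 0) above the picture top;
# the out-of-bounds rows are filled with copies of the top row.
pred = mcp_extrapolating(prior, 0, 0, 2, 2, mv=(2, 0))
```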
A gain in performance is shown for forming a CR coder by adding the SKIP coding mode to the encoder. Further gains in performance are shown when adding the various INTER coding modes to the encoder that were discussed in the previous sections:
▲ INTER (MV = (0,0) only): frame-difference coding with only zero-valued MV displacements
▲ INTER (Full-pixel motion compensation): integer-pixel (full-pixel) precision motion compensation with DFD coding
▲ INTER (Half-pixel motion compensation): half-pixel precision motion compensation with DFD coding
▲ INTER & INTER+4V: half-pixel precision motion compensation with DFD coding and the addition of an "advanced prediction" mode (H.263 Annex F), which includes a segmentation switch allowing a choice of either one or four MVs per 16 x 16 area and also includes overlapped-block motion compensation (OBMC) and picture-extrapolating MVs [10]. (The use of four