Rate-Distortion Optimization for
Video Compression

Gary J. Sullivan and Thomas Wiegand
`
The rate-distortion efficiency of today's video compression schemes is based on a sophisticated interaction between various motion representation possibilities, waveform coding of differences, and waveform coding of various refreshed regions. Hence, a key problem in high-compression video coding is the operational control of the encoder. This problem is compounded by the widely varying content and motion found in typical video sequences, necessitating the selection between different representation possibilities with varying rate-distortion efficiency. This article addresses the problem of video encoder optimization and discusses its consequences on the compression architecture of the overall coding system. Based on the well-known hybrid video coding structure, Lagrangian optimization techniques are presented that try to answer the question: "What part of the video signal should be coded using what method and parameter settings?"
`
`Video Compression Basics
Motion video data consists essentially of a time-ordered sequence of pictures, and cameras typically generate approximately 24, 25, or 30 pictures (or frames) per second. This results in a large amount of data that demands the use of compression. For example, assume that each picture has a relatively low "QCIF" (quarter-common-intermediate-format) resolution (i.e., 176 × 144 samples) for which each sample is digitally represented with 8 bits, and assume that we skip two out of every three pictures in order to cut down the bit rate. For color pictures, three color component samples are necessary to represent a sufficient color space for each pixel. In order to transmit even this relatively low-fidelity sequence of pictures, the raw source data rate is still more than 6 Mbit/s. However, today's low-cost transmission channels often operate at much lower data rates so that the data rate of the video signal needs to be further com-
`
`Lagrangian methods for
`making good decisions in
`high-compression
`video coding
`
`
`IEEE SIGNAL PROCESSING MAGAZINE
1053-5888/98/$10.00 © 1998 IEEE
`
`NOVEMBER 1998
`
`Vedanti Systems Limited - Ex. 2009
`Page 1
`
`

`
`A History of Existing
`Visual Coding Standards
H.120: The first international digital video coding standard [3]. It may have even been the first international digital compression standard for natural continuous-tone visual content of any kind (whether video or still picture). H.120 was developed by the ITU-T organization (the International Telecommunications Union - Telecommunications Standardization Sector, then called the CCITT), and received final approval in 1984. It originally was a conditional replenishment (CR) coder with differential pulse-code modulation (DPCM), scalar quantization, and variable-length coding, and it had an ability to switch to quincunx sub-sampling for bit-rate control. In 1988, a second version of H.120 added motion compensation and background prediction. (None of the later completed standards have yet included background prediction again, although a form of it is in the draft of the future MPEG-4 standard.) Its operational bit rates were 1544 and 2048 Kbit/s. H.120 is essentially no longer in use today, although a few H.120 systems are rumored to still be in operational condition.

H.261: The first widespread practical success: a video codec capable of operation at affordable telecom bit rates (with 80-320 Kbit/s devoted to video) [4, 5]. It was the first standard to use the basic typical structure we find still predominant today (16 × 16 macroblock motion compensation, 8 × 8 block DCT, scalar quantization, and two-dimensional run-level variable-length entropy coding). H.261 was approved by the ITU-T in early 1991 (with technical content completed in late 1990). It was later revised in 1993 to include a backward-compatible high-resolution graphics transfer mode. Its target bit-rate range was 64-2048 Kbit/s.

JPEG: A highly successful continuous-tone, still-picture coding standard named after the Joint Photographic Experts Group that developed it [1, 2]. Anyone who has browsed the world-wide web has experienced JPEG. JPEG (IS 10918-1/ITU-T T.81) was originally approved in 1992 and was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations. In its typical use, it is essentially H.261 INTRA coding with prediction of average values and an ability to customize the quantizer reconstruction scaling and the entropy coding to the specific picture content. However, there is much more in the JPEG standard than what is typically described or used. In particular, this includes progressive coding, lossless coding, and arithmetic coding.

MPEG-1: A widely successful video codec capable of approximately VHS videotape quality or better at about 1.5 Mbit/s and covering a bit rate range of about 1-2 Mbit/s [6, 7]. MPEG-1 gets its acronym from the Moving Pictures Experts Group that developed it [6, 7]. MPEG-1 video (IS 11172-2) was a project of the ISO/IEC JTC1 organization and was approved in 1993. In terms of technical features, it added bi-directionally predicted frames (known as B-frames) and half-pixel motion. (Half-pixel motion had been proposed during the development of H.261, but was apparently thought to be too complex at the time.) It provided superior quality than H.261 when operated at higher bit rates. (At bit rates below, perhaps, 1 Mbit/s, H.261 performs better, as MPEG-1 was not designed to be capable of operation in this range.)

MPEG-2: A step higher in bit rate, picture quality, and popularity. MPEG-2 forms the heart of broadcast-quality digital television for both standard-definition and high-definition television (SDTV and HDTV) [7-9]. MPEG-2 video (IS 13818-2/ITU-T H.262) was designed to encompass MPEG-1 and to also provide high quality with interlaced video sources at much higher bit rates. Although usually thought of as an ISO standard, MPEG-2 video was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations, and was completed in late 1994. Its primary new technical features were efficient handling of interlaced-scan pictures and hierarchical bit-usage scalability. Its target bit-rate range was approximately 4-30 Mbit/s.

H.263: The first codec designed specifically to handle very low-bit-rate video, and its performance in that arena is still state-of-the-art [10, 11]. H.263 is the current best standard for practical video telecommunication. Its original target bit-rate range was about 10-30 Kbit/s, but this was broadened during development to perhaps at least 10-2048 Kbit/s as it became apparent that it could be superior to H.261 at any bit rate. H.263 (version 1) was a project of the ITU-T and was approved in early 1996 (with technical content completed in 1995). The key new technical features of H.263 were variable block-size motion compensation, overlapped-block motion compensation (OBMC), picture-extrapolating motion vectors, three-dimensional run-level-last variable-length coding, median MV prediction, and more efficient header information signaling (and, relative to H.261, arithmetic coding, half-pixel motion, and bi-directional prediction, but the first of these three features was also found in JPEG and some form of the other two were in MPEG-1). At very low bit rates (e.g., below 30 Kbit/s), H.263 can code with the same quality as H.261 using half or less than half the bit rate [12]. At greater bit rates (e.g., above 80 Kbit/s) it can provide a more moderate degree of performance superiority over H.261. (See also H.263+ below.)

H.263+: Technically a second version of H.263 [10, 13]. The H.263+ project added a number of new optional features to H.263. One notable technical advance over prior standards is that H.263 version 2 was the first video coding standard to offer a high degree of error resilience for wireless or packet-based transport networks. H.263+ also added a number of improvements in compression efficiency, custom and flexible video formats, scalability, and backward-compatible supplemental enhancement information. It was approved in January of 1998 by the ITU-T (with technical content completed in September 1997). It extends the effective bit-rate range of H.263 to essentially any bit rate and any progressive-scan (noninterlace) picture formats and frame rates, and H.263+ is capable of superior performance relative to any existing standard over this entire range. The first author was the editor of H.263 during the H.263+ project and is the Rapporteur (chairman) of the ITU-T Advanced Video Coding Experts Group (SG16/Q15), which developed it.
`
`
`

`
`The most successful class of
`video compression designs are
`called hybrid codecs.
`
redundancy reduction method used in the first digital video coding standard, ITU-T Rec. H.120 [3]. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new coded information to replace the changed areas. CR thus allows a choice between one of two modes of representation for each area, which are called the SKIP mode and the INTRA mode. However, CR coding has a significant shortcoming, which is its inability to refine an approximation. Often the content of an area of a prior picture can be a good approximation of the new picture, needing only a minor alteration to become a better representation. But CR coding allows only exact repetition or complete replacement of each picture area. Adding a third type of "prediction mode," in which a refining frame difference approximation can be sent, results in a further improvement of compression performance.
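The two-mode choice of conditional replenishment can be illustrated with a small sketch. The code below is not taken from H.120; the function name `conditional_replenishment`, the block size, and the change threshold are all assumptions made for illustration, and the INTRA mode is simplified to copying the new samples exactly rather than transform coding them.

```python
def conditional_replenishment(prev, curr, block=8, threshold=0):
    """Toy CR coder: per block, either repeat the prior picture (SKIP)
    or replace the block with new content (INTRA).

    prev, curr: 2-D lists of sample values with identical dimensions.
    threshold:  hypothetical change measure (sum of absolute differences)
                up to which a block is considered unchanged.
    Returns (modes, recon), where modes maps the block origin (x, y) to
    "SKIP" or "INTRA" and recon is the decoder's reconstructed picture.
    """
    height, width = len(curr), len(curr[0])
    recon = [row[:] for row in prev]      # the decoder starts from the prior picture
    modes = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            ys = range(by, min(by + block, height))
            xs = range(bx, min(bx + block, width))
            change = sum(abs(curr[y][x] - prev[y][x]) for y in ys for x in xs)
            if change <= threshold:
                modes[(bx, by)] = "SKIP"      # area is just repeated
            else:
                modes[(bx, by)] = "INTRA"     # area is completely replaced
                for y in ys:
                    for x in xs:
                        recon[y][x] = curr[y][x]
    return modes, recon
```

With a threshold of zero, every block is either an exact repetition or a complete replacement, which is exactly the limitation described above: there is no way to send a small correction to an almost-correct block.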
The concept of frame difference refinement can also be taken a step further, by adding motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in a large difference in the values of the pixels in a picture area (especially near the edges of an object). Often, displacing an area of the prior picture by a few pixels in spatial location can result in a significant reduction in the amount of information that needs to be sent as a frame difference approximation. This use of spatial displacement to form an approximation is known as motion com-
`
[Figure 1, encoder-side labels: Input Frame; DCT, Quantization, Entropy Code; Motion Compensated Prediction. The dotted box shows the decoder.]
`
`pressed. For instance, using V.34 modems that transmit
`at most 33.4 Kbit/s over dial-up analog phone lines, we
still need to compress the video bit rate further by a factor
of about 200 (more if audio is consuming 6 Kbit/s of that
same channel or if the phone line is too noisy for achieving
the full bit rate of V.34).
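The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the function names are hypothetical, and the numbers are the ones given in the text (QCIF at 176 × 144 samples, 8 bits per sample, three color components, every third picture of a 30 picture/s source, and a 33.4 Kbit/s channel).

```python
def raw_bit_rate(width, height, bits_per_sample, components, pictures_per_second):
    """Uncompressed source bit rate in bit/s."""
    return width * height * bits_per_sample * components * pictures_per_second


def compression_factor(source_bit_rate, channel_bit_rate):
    """How many times the source rate exceeds the available channel rate."""
    return source_bit_rate / channel_bit_rate


# QCIF, 8 bits per sample, 3 color components, 30 pictures/s with 2 of every 3 skipped
qcif_rate = raw_bit_rate(176, 144, 8, 3, 30 // 3)

# Whole modem channel available for video
video_only = compression_factor(qcif_rate, 33_400)

# 6 Kbit/s of the same channel reserved for audio
with_audio = compression_factor(qcif_rate, 33_400 - 6_000)
```

Here `qcif_rate` comes to 6,082,560 bit/s (just over 6 Mbit/s), `video_only` to roughly 182, and `with_audio` to roughly 222, consistent with the "factor of about 200" stated above.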
One way of compressing video content is simply to compress each picture, using an image-coding syntax such as JPEG [1, 2]. The most common "baseline" JPEG scheme consists of breaking up the image into equal-size blocks. These blocks are transformed by a discrete cosine transform (DCT), and the DCT coefficients are then quantized and transmitted using variable-length codes. We will refer to this kind of coding scheme as INTRA-frame coding, since the picture is coded without referring to other pictures in the video sequence. In fact, such INTRA coding alone (often called "motion JPEG") is in common use as a video coding method today in production-quality editing systems that demand rapid access to any frame of video content.
However, improved compression performance can be attained by taking advantage of the large amount of temporal redundancy in video content. We will refer to such techniques as INTER-frame coding. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change. It should be obvious then that the video can be represented more efficiently by coding only the changes in the video content, rather than coding each entire picture repeatedly. This ability to use the temporal-domain redundancy to improve coding efficiency is what fundamentally distinguishes video compression from still-image compression.
A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR),
and it was the only temporal

[Figure 1. Typical motion-compensated DCT video coder. Blocks: Entropy Decode, Inverse Quantize, Inverse DCT; Motion Compensated Prediction; Frame Buffer (Delay); Motion Estimation and Mode Decision. Signals: Encoded Residual (To Channel); Approximated Input Frame (To Display); Prior Coded Approximated Frame; Motion Vector and Prediction Mode Data (To Channel).]
`
`
`

`
`In practice, a number of
`interactions between coding
`decisions must be neglected in
`video coding optimization.
`
pensation and the encoder's search for the best spatial displacement approximation to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP signal is known as displaced frame difference (DFD) coding.
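As an illustration of motion estimation, the sketch below performs an exhaustive (full-search) block match over a small window. It is one common way to do the search, not the method prescribed by any standard; the function name `full_search`, the matching criterion (sum of absolute differences), the search range, and the tie-breaking rule are assumptions. The displacement convention is that the prediction for location s is taken from location s - v in the prior picture.

```python
def full_search(cur, ref, bx, by, size=8, search=2):
    """Find the integer displacement v = (vx, vy) that best predicts the
    size x size block of `cur` at origin (bx, by) from `ref`.

    The prediction for sample (x, y) is ref[y - vy][x - vx]. Candidates
    whose reference block falls outside the picture are skipped, so no
    picture extrapolation is attempted here.
    Returns (v, cost), where cost is the sum of absolute differences.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            rx, ry = bx - vx, by - vy          # origin of the reference block
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = 0
            for y in range(size):
                for x in range(size):
                    cost += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
            # On equal cost, prefer the shorter displacement (an arbitrary choice)
            candidate = (cost, abs(vx) + abs(vy), vx, vy)
            if best is None or candidate < best:
                best = candidate
    cost, _, vx, vy = best
    return (vx, vy), cost
```

The residual left after subtracting the selected prediction from the block is what DFD coding would then approximate. Note that this sketch minimizes distortion only and ignores the bits needed to send the vector.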
Hence, the most successful class of video compression designs are called hybrid codecs. The naming of this coder is due to its construction as a hybrid of motion-handling and picture-coding techniques, and the term codec is used to refer to both the coder and decoder of a video compression system. Figure 1 shows such a hybrid coder. Its design and operation involve the optimization of a number of decisions, including
1. How to segment each picture into areas,
2. Whether or not to replace each area of the picture with completely new INTRA-picture content,
3. If not replacing an area with new INTRA content
(a) How to do motion estimation; i.e., how to select the spatial shifting displacement to use for INTER-picture predictive coding (with a zero-valued displacement being an important special case),
(b) How to do DFD coding; i.e., how to select the approximation to use as a refinement of the INTER prediction (with a zero-valued approximation being an important special case), and
4. If replacing an area with new INTRA content, what approximation to send as the replacement content.
At this point, we have introduced a problem for the engineer who designs such a video coding system, which is: What part of the image should be coded using what method? If the possible modes of operation are restricted to INTRA coding and SKIP, the choice is relatively simple. However, hybrid video codecs achieve their compression performance by employing several modes of operation that are adaptively assigned to parts of the encoded picture, and there is a dependency between the effects of the motion estimation and DFD coding stages of INTER coding. The modes of operation are generally associated with signal-dependent rate-distortion characteristics, and rate-distortion trade-offs are inherent in the design of each of these aspects. The second and third items above in particular are unique to motion video coding. The optimization of these decisions in the design and operation of a video coder is the primary topic of this article. Some further techniques that go somewhat beyond this model will also be discussed.
`
`Motion-Compensated
`Video Coding Analysis
Consider the nth coded picture of size $W \times H$ in a video sequence, consisting of an array $I_n(s)$ of color component values (e.g., $Y_n(s)$, $Cb_n(s)$, and $Cr_n(s)$) for each pixel location $s = (x, y)$, in which $x$ and $y$ are integers such that $0 \le x < W$ and $0 \le y < H$. The decoded approximation of this picture will be denoted as $\tilde{I}_n(s)$.
The typical video decoder (see Fig. 1) receives a representation of the picture that is segmented into some number $K$ of distinct regional areas $\{\mathcal{A}_{k,n}\}_{k=1}^{K}$. For each area, a prediction-mode signal $P_{k,n} \in \{0, 1\}$ is received indicating whether or not the area is predicted from the prior picture. For the areas that are predicted from the prior picture, a motion vector (MV), denoted $v_{k,n}$, is received. The MV specifies a spatial displacement for motion compensation of that region. Using the prediction mode and
`
`An Overview of Future Visual
`Coding Standardization Projects
MPEG-4: A future visual coding standard for both still and moving visual content. The ISO/IEC SC29 WG11 organization is currently developing two drafts, called version 1 and version 2 of MPEG-4 visual. Final approval of version 1 is planned in January 1999 (with technical content completed in October 1998), and approval of version 2 is currently planned for approximately one year later. MPEG-4 visual (which will become IS 14496-2) will include most technical features of the prior video and still-picture coding standards, and will also include a number of new features such as zero-tree wavelet coding of still pictures, segmented shape coding of objects, and coding of hybrids of synthetic and natural video content. It will cover essentially all bit rates, picture formats, and frame rates, including both interlaced and progressive-scan video pictures. Its efficiency for predictive coding of normal camera-view video content will be similar to that of H.263 for noninterlaced video sources and similar to that of MPEG-2 for interlaced sources. For some special purpose and artificially generated scenes, it will provide significantly superior compression performance and new object-oriented capabilities. It will also contain a still-picture coder that has improved compression quality relative to JPEG at low bit rates.

H.263++: Future enhancements of H.263. The H.263++ project is considering adding more optional enhancements to H.263 and is currently scheduled for completion late in the year 2000. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).

JPEG-2000: A future new still-picture coding standard. JPEG-2000 is a joint project of the ITU-T SG8 and ISO/IEC JTC1 SC29 WG1 organizations. It is scheduled for completion late in the year 2000.

H.26L: A future new generation of video coding standard with improved efficiency, error resilience, and streaming support. H.26L is currently scheduled for approval in 2002. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).
`
`
`

`
MV, an MCP $\hat{I}_n(s)$ is formed for each pixel location $s \in \mathcal{A}_{k,n}$:

$$\hat{I}_n(s) = P_{k,n} \cdot \tilde{I}_{n-1}(s - v_{k,n}), \quad s \in \mathcal{A}_{k,n}. \tag{1}$$

(Note: The MV $v_{k,n}$ has no effect if $P_{k,n} = 0$ and so the MV is therefore normally not sent in that case.)
In addition to the prediction mode and MV information, the decoder receives an approximation $\tilde{R}_{k,n}(s)$ of the DFD residual error $R_{k,n}(s)$ between the true image value $I_n(s)$ and its MCP $\hat{I}_n(s)$. It then adds the residual signal to the prediction to form the final coded representation

$$\tilde{I}_n(s) = \hat{I}_n(s) + \tilde{R}_{k,n}(s), \quad s \in \mathcal{A}_{k,n}. \tag{2}$$
`
Since there is often no movement in large parts of the picture, and since the representation of such regions in the previous picture may be adequate, video coders often provide special provisions for a SKIP mode of area treatment, which is efficiently transmitted using very short code words ($P_{k,n} = 1$, $v_{k,n} = 0$, $\tilde{R}_{k,n}(s) = 0$).
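The decoder model of Eqs. (1) and (2) can be written out directly. The sketch below is a minimal illustration under the strict form of the model: MVs are integer-valued and must point inside the prior picture. The function name `decode_region` and the data layout (a region as a list of (x, y) locations, the residual as a dictionary) are assumptions chosen for brevity.

```python
def decode_region(prev_decoded, region, p, v, residual):
    """Reconstruct one region following Eqs. (1) and (2).

    prev_decoded: prior decoded picture as a 2-D list indexed [y][x].
    region:       list of pixel locations (x, y) belonging to the area.
    p:            prediction-mode signal, 1 if predicted from the prior picture, else 0.
    v:            motion vector (vx, vy); ignored when p == 0.
    residual:     dict mapping (x, y) to the decoded residual; missing entries count as 0.
    Returns a dict mapping (x, y) to the reconstructed sample value.
    """
    vx, vy = v
    decoded = {}
    for (x, y) in region:
        # Eq. (1): motion-compensated prediction, zero when the area is not predicted
        prediction = prev_decoded[y - vy][x - vx] if p == 1 else 0
        # Eq. (2): add the decoded residual to the prediction
        decoded[(x, y)] = prediction + residual.get((x, y), 0)
    return decoded


prior = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
area = [(1, 1), (2, 1)]

skip = decode_region(prior, area, p=1, v=(0, 0), residual={})
inter = decode_region(prior, area, p=1, v=(1, 0), residual={(1, 1): 2, (2, 1): -1})
intra = decode_region(prior, area, p=0, v=(0, 0), residual={(1, 1): 7, (2, 1): 8})
```

In this layout the SKIP mode is simply the case p = 1 with a zero vector and an empty residual, and the INTRA mode is the case p = 0, where the "residual" carries the whole picture content.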
In video coders designed primarily for natural camera-view scene content, often little real freedom is given to the encoder for choosing the segmentation of the picture into region areas. Instead, the segmentation is typically either fixed to always consist of a particular two-dimensional block size (typically 16 × 16 pixels for prediction-mode signals and 8 × 8 for DFD residual content) or in some cases it is allowed to switch adaptively between block sizes (such as allowing the segmentation used for motion compensation to have either a 16 × 16 or 8 × 8 block size). This is because providing the encoder more freedom to specify a precise segmentation has generally not yet resulted in a significant improvement of compression performance for natural camera-view scene content (due to the number of bits needed to specify the segmentation), and also because determining the best possible segmentation in an encoder can be very complex. However, in special applications (especially those including artificially constructed picture content rather than camera-view scenes), segmented object-based coding can be justified. Rate-distortion optimization of segmentations for variable block-size video coding was first discussed in [30, 31], which was later enhanced to include dynamic programming to account for sequential dependencies in [37]-[39]. The optimization of coders that use object segmentation is discussed in an accompanying article [15].
`
Distortion Measures
Rate-distortion optimization requires an ability to measure distortion. However, the perceived distortion in visual content is a very difficult quantity to measure, as the characteristics of the human visual system are complex and not well understood. This problem is aggravated in video coding, because the addition of the temporal domain relative to still-picture coding further complicates
`Standard Hybrid Video Codec Terminology
`
The following terms are useful for understanding the various international standards for video coding:

prediction mode: A basic representation model that is selected for use in approximating a picture region (INTRA, INTER, etc.).

mode decision: An encoding process that selects the prediction mode for each region to be encoded.

block: A rectangular region (normally of size 8 × 8) in a picture. The discrete cosine transform (DCT) in standard video coders operates on 8 × 8 block regions.

macroblock: A region of size 16 × 16 in the luminance picture and the corresponding region of chrominance information (often an 8 × 8 region), which is associated with a prediction mode.

motion vector (MV): A spatial displacement offset for use in the prediction of an image region. In the INTER prediction mode an MV affects a macroblock region, while in the INTER+4V prediction mode, an individual MV is sent for each of the four 8 × 8 luminance blocks in a macroblock.

motion compensation: A decoding process that represents motion in each region of a picture by application of the transmitted MVs to the prior decoded picture.

motion estimation: An encoding process that selects the MVs to be used for motion compensation.

half-pixel motion: A representation of motion in which an MV may specify prediction from pixel locations that are halfway between the pixel grid locations in the prior picture, thus requiring interpolation to construct the prediction of an image region.

picture-extrapolating MVs: A representation of motion in which an MV may specify prediction from pixel locations that lie partly or entirely outside the boundaries of the prior picture, thus requiring extrapolation of the edges of the picture to construct the prediction of an image region.

overlapped-block motion compensation (OBMC): A representation of motion in which the MVs that represent the motion in a picture have overlapping areas of influence.

INTRA mode: A prediction mode in which the picture content of a macroblock region is represented without reference to a region in any previously decoded picture.

SKIP mode: A prediction mode in which the picture content of a macroblock region is represented as a copy of the macroblock in the same location in a previously decoded picture.

INTER mode: A prediction mode in which the picture content of a macroblock region is represented as the sum of a motion-compensated prediction using a motion vector, plus (optionally) a decoded residual difference signal representation.

INTER+4V mode: A prediction mode in which the picture content of a macroblock region is represented as in the INTER mode, but using four motion vectors (one for each 8 × 8 block in the macroblock).

INTER+Q mode: A prediction mode in which the picture content of a macroblock is represented as in the INTER mode, and a change is indicated for the inverse quantization scaling of the decoded residual signal representation.
`
`
`

`
the issue. In practice, highly imperfect distortion models such as the sum of squared differences (SSD) or its equivalents, known as mean squared error (MSE) or peak signal-to-noise ratio (PSNR), are used in most actual comparisons. They are defined by

$$\mathrm{SSD}_{\mathcal{A}}(F, G) = \sum_{s \in \mathcal{A}} |F(s) - G(s)|^2 \tag{3}$$

$$\mathrm{MSE}_{\mathcal{A}}(F, G) = \frac{1}{|\mathcal{A}|}\,\mathrm{SSD}_{\mathcal{A}}(F, G) \tag{4}$$

$$\mathrm{PSNR}_{\mathcal{A}}(F, G) = 10 \log_{10} \frac{(255)^2}{\mathrm{MSE}_{\mathcal{A}}(F, G)} \ \text{decibels.} \tag{5}$$

Another distortion measure in common use (since it is often easier to compute) is the sum of absolute differences (SAD)

$$\mathrm{SAD}_{\mathcal{A}}(F, G) = \sum_{s \in \mathcal{A}} |F(s) - G(s)| \tag{6}$$

where $F$ and $G$ are two array arguments (such as luminance arrays of the actual and approximated pictures).
These measures are often applied to only the luminance field of the picture during optimization processes, but better performance can be obtained by including all three color components. (The chrominance components are often treated as something of a minor nuisance in video coding; since they need only about 10% of the bit rate of the luminance they provide a limited opportunity for optimization gain.)
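The four measures in Eqs. (3) through (6) translate almost directly into code. The sketch below assumes the area is a whole rectangular array given as a 2-D list and that samples have an 8-bit range (peak value 255, as in Eq. (5)); the function names are illustrative, and returning infinity for identical arrays is a convention chosen here, not something the definitions specify.

```python
import math


def ssd(f, g):
    """Sum of squared differences, Eq. (3)."""
    return sum((a - b) ** 2 for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g))


def mse(f, g):
    """Mean squared error, Eq. (4): SSD divided by the number of samples."""
    count = sum(len(row) for row in f)
    return ssd(f, g) / count


def psnr(f, g, peak=255):
    """Peak signal-to-noise ratio in decibels, Eq. (5)."""
    error = mse(f, g)
    if error == 0:
        return math.inf      # identical arrays: no distortion to measure
    return 10 * math.log10(peak ** 2 / error)


def sad(f, g):
    """Sum of absolute differences, Eq. (6)."""
    return sum(abs(a - b) for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g))
```

SAD avoids the multiplications needed by SSD, which is why it is often the cheaper choice inside a motion search, while PSNR is the form usually reported when comparing coders.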
`
Effectiveness of Basic Technical Features
In the previous sections we described the various technical features of a basic modern video coder. The effectiveness of these features and the dependence of this effectiveness on video content is shown in Fig. 2. The upper plot of Fig. 2 shows performance for a videophone sequence known as Mother & Daughter, with moderate object motion and a stationary background. The lower plot of Fig. 2 shows performance for a more demanding scene known as Foreman, with heavy object motion and an unstable hand-held moving camera. Each sequence was encoded in QCIF resolution at 10 frames per second using the framework of a well-optimized H.263 [10] video encoder (using optimization methods described later in this article). (H.263 has 16 × 16 prediction-mode regions called macroblocks and 8 × 8 DCT-based DFD coding.)
`
`Complicating Factors in
`Video Coding Optimization
The video coder model described in this article is useful for illustration purposes, but in practice actual video coder designs often differ from it in various ways that complicate design and analysis. Some of the important differences are described in the following few paragraphs.

Color chrominance components (e.g., $Cb_n(s)$ and $Cr_n(s)$) are often represented with lower resolution (e.g., $W/2 \times H/2$) than the luminance component of the image $Y_n(s)$. This is because the human psycho-visual system is much more sensitive to brightness than to chrominance, allowing bit-rate savings by coding the chrominance at lower resolution. In such a system, the method of operation must be adjusted to account for the difference in resolution (for example, by dividing the MV values by two for chrominance components).

Since image values $I_n(s)$ are defined only for integer pixel locations $s = (x, y)$ within the rectangular picture area, the above model will work properly in the strict sense only if every motion vector $v_{k,n}$ is restricted to have an integer value and only a value that causes access to locations in the prior picture that are within the picture's rectangular boundary. These restrictions, which are maintained in some early video-coding methods such as ITU-T Rec. H.261 [4], are detrimental to performance. More recent designs such as ITU-T Rec. H.263 [10] support the removal of these restrictions by using interpolation of the prior picture for any fractional-valued MVs (normally half-integer values, resulting in what is called half-pixel motion) and MVs that access locations outside the boundary of the picture (resulting in what we call picture-extrapolating MVs). The prediction of an image area may also be filtered to avoid high-frequency artifacts (as in Rec. H.261 [4]).

Often there are interactions between the coding of different regions in a video coder. The number of bits needed to specify an MV value may depend on the values of the MVs in neighboring regions. The areas of influence of different MVs can be overlapping due to overlapped-block motion compensation (OBMC) [16]-[19], and the areas of influence of coded transform blocks can also overlap due to the application of deblocking filters. While these cross-dependencies can improve coding performance, they can also complicate the task of optimizing the decisions made in an encoder. For this reason these cross-dependencies are often neglected (or only partially accounted for) during encoder optimization.

One important and often-neglected interaction between the coding of video regions is the temporal propagation of error. The fidelity of each area of a particular picture will affect the ability to use that picture area for the prediction of subsequent pictures. Real-time encoders must neglect to account for this aspect to a large extent, since they cannot tolerate the delay necessary for optimizing a long temporal sequence of decisions with accounting for the temporal effects on many pictures. However, even nonreal-time encoders also often neglect to account for this propagation in any significant way, due to the sheer complexity of adding this extra dimension to the analysis. An example for the exploitation of temporal dependencies in video coding can be found in [20]. The work of Ramchandran, Ortega, and Vetterli in [20] was extended by Lee and Dickinson in [21].
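To make the removal of the two MV restrictions concrete, the sketch below fetches one prediction sample at half-pixel precision and allows the location to fall outside the picture. It assumes bilinear averaging with rounding for the half-pixel positions and edge-sample repetition for extrapolation; these are common choices used here for illustration, and the function name `sample_half_pel` and the half-pixel coordinate convention are hypothetical rather than taken from any standard's text.

```python
def sample_half_pel(ref, x2, y2):
    """Return the prediction sample at position (x2 / 2, y2 / 2) of `ref`.

    ref:    prior picture as a 2-D list of non-negative integers, indexed [y][x].
    x2, y2: coordinates in half-pixel units (even values lie on the pixel grid).
    Locations outside the picture are extrapolated by repeating edge samples.
    """
    height, width = len(ref), len(ref[0])

    def at(x, y):
        # Clamp to the picture boundary (picture extrapolation)
        return ref[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    x0, y0 = x2 // 2, y2 // 2      # integer part (floor, also for negative values)
    fx, fy = x2 % 2, y2 % 2        # 1 if the position is halfway between grid points
    a = at(x0, y0)
    b = at(x0 + fx, y0)
    c = at(x0, y0 + fy)
    d = at(x0 + fx, y0 + fy)
    # Average of the four neighbours with rounding; on the grid this returns `a`
    return (a + b + c + d + 2) // 4
```

For chrominance arrays stored at half resolution, the same routine could be called with the luminance MV divided by two, as the sidebar describes, although how the division is rounded is a further design detail not covered here.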
`
`
`

`
A gain in performance is shown for forming a CR coder by adding the SKIP coding mode to the encoder. Further gains in performance are shown when adding the various INTER coding modes to the encoder that were discussed in the previous sections:
▲ INTER (MV = (0,0) only): frame-difference coding with only zero-valued MV displacements
▲ INTER (Full-pixel motion compensation): integer-pixel (full-pixel) precision motion compensation with DFD coding
▲ INTER (Half-pixel motion compensation): half-pixel precision motion compensation with DFD coding
▲ INTER & INTER+4V: half-pixel precision motion compensation with DFD coding and the addition of an "advanced prediction" mode (H.263 Annex F), which includes a segmentation switch allowing a choice of either one or four MVs per 16 × 16 area and also includes overlapped-block motion compensation (OBMC) and picture-extrapolating MVs [10]. (The use of four
