US006907073B2

(12) United States Patent
Sawhney et al.

(10) Patent No.: US 6,907,073 B2
(45) Date of Patent: Jun. 14, 2005
(54) TWEENING-BASED CODEC FOR SCALEABLE ENCODERS AND DECODERS WITH VARYING MOTION COMPUTATION CAPABILITY

(75) Inventors: Harpreet Singh Sawhney, West Windsor, NJ (US); Rakesh Kumar, Monmouth Junction, NJ (US); Keith Hanna, Princeton, NJ (US); Peter Burt, Princeton, NJ (US); Norman Winarsky, Princeton, NJ (US)

(73) Assignee: Sarnoff Corporation, Princeton, NJ (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 596 days.
(21) Appl. No.: 09/731,194

(22) Filed: Dec. 6, 2000

(65) Prior Publication Data
US 2001/0031003 A1, Oct. 18, 2001

Related U.S. Application Data
(60) Provisional application No. 60/172,841, filed on Dec. 20, 1999.

(51) Int. Cl.7 .............................. H04B 1/66
(52) U.S. Cl. .............................. 375/240.14
(58) Field of Search .............. 375/240.01, 240.03, 240.15, 240.16, 240.12, 240.11, 240.14, 240.09, 240.22, 240.23, 240.25, 240.13; 382/234, 236, 238, 250; H04B 1/66

(56) References Cited

U.S. PATENT DOCUMENTS
5,121,202 A * 6/1992 Tanoi .................. 375/240.16
5,677,735 A 10/1997 Ueno et al.
(Continued)

FOREIGN PATENT DOCUMENTS
EP 0 305 127 A2 3/1989
OTHER PUBLICATIONS

"The Motion Transform: A New Motion Compensation Technique", by Armitano, R. M. et al.; IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, vol. CONF. 21, May 7, 1996, pp. 2295-2298.

Patent Abstracts of Japan, vol. 1998, No. 14, Dec. 31, 1998, and JP 10 257502 A (Matsushita Electric Ind Co Ltd), Sep. 25, 1998, abstract.

Primary Examiner—Tung Vo
(74) Attorney, Agent, or Firm—William J. Burke
(57) ABSTRACT

A scaleable video encoder has one or more encoding modes in which at least some, and possibly all, of the motion information used during motion-based predictive encoding of a video stream is excluded from the resulting encoded video bitstream, where a corresponding video decoder is capable of performing its own motion computation to generate its own version of the motion information used to perform motion-based predictive decoding in order to decode the bitstream to generate a decoded video stream. All motion computation, whether at the encoder or the decoder, is preferably performed on decoded data. For example, frames may be encoded as either H, L, or B frames, where H frames are intra-coded at full resolution and L frames are intra-coded at low resolution. The motion information is generated by applying motion computation to decoded L and H frames and used to generate synthesized L frames. L-frame residual errors are generated by performing inter-frame differencing between the synthesized and original L frames and are encoded into the bitstream. In addition, synthesized B frames are generated by tweening between the decoded H and L frames, and B-frame residual errors are generated by performing inter-frame differencing between the synthesized B frames and, depending on the implementation, either the original B frames or sub-sampled B frames. These B-frame residual errors are also encoded into the bitstream. The ability of the decoder to perform motion computation enables motion-based predictive encoding to be used to generate an encoded bitstream without having to expend bits for explicitly encoding any motion information.

35 Claims, 5 Drawing Sheets

U.S. PATENT DOCUMENTS

5,686,962 A * 11/1997 Chung et al. ......... 375/240.16
5,703,649 A * 12/1997 Kondo ................ 375/240.18
5,764,805 A * 6/1998 Martucci et al. ...... 382/238
5,852,469 A * 12/1998 Nagai et al. ......... 375/240.23
6,097,842 A * 8/2000 Suzuki et al. ........ 382/232
6,427,027 B1 * 7/2002 Suzuki et al. ........ 382/236
6,490,705 B1 * 12/2002 Boyce ................ 714/776
6,535,558 B1 * 3/2003 Suzuki et al. ........ 375/240.12
6,563,549 B1 * 5/2003 Sethuraman .......... 348/700

FOREIGN PATENT DOCUMENTS

EP 0 753 970 A2 1/1997
EP 0 920 214 A2 6/1999
WO WO 93/02526 2/1993
WO WO 99/57906 11/1999

* cited by examiner

[Sheet 1 of 5 — FIG. 1: block diagram of scaleable video encoder 100, showing the frame/region type selection, intra-frame encoding and decoding, L-frame synthesis, inter-frame differencing, and residual error encoding paths that feed the output bitstream]
[Sheet 2 of 5 — FIG. 2: example frame-type sequence H B L B L B L B L B H B L B L B L B L B H. FIG. 3 (H-frame encoding): (302) intra-encode full-res H frame; (304) include intra-frame coded H-frame data into bitstream; (306) generate decoded full-res H frame for use as reference data for L and B frames. FIG. 5 (B-frame encoding): (502) sub-sample full-res B frame to generate low-res B frame; (504) using motion info from H/L frame encoding, generate synthesized low-res B frame; (506) generate B-frame residuals between the low-res B frame and the synthesized low-res B frame; (508) encode B-frame residuals into bitstream]
[Sheet 3 of 5 — FIG. 4 (L-frame encoding): (402) sub-sample full-res L frame to generate low-res L frame; (404) intra-encode low-res L frame; (406) include intra-coded L frame into bitstream; (408) generate decoded low-res L frame; (410) perform motion computation for decoded low-res L frame to generate motion info; (412) optionally encode some/all motion info into bitstream; (414) use motion info to synthesize full-res L frame from decoded H frame(s); (416) generate residual errors between synthesized full-res L frame and original full-res L frame; (418) threshold residual errors to generate binary mask; (420) based on binary mask, augment bitstream with encoded residual errors]
[Sheet 4 of 5 — FIG. 6: block diagram of a video decoder, according to one embodiment of the present invention]
[Sheet 5 of 5 — FIG. 7 (L-frame decoding, concluding step): add L-frame residuals to the synthesized full-res L frame to generate the decoded full-res L frame. FIG. 8 (B-frame decoding): decode bitstream to generate decoded B-frame residuals; synthesize low-res B frame; add B-frame residuals to the synthesized low-res B frame to generate the decoded low-res B frame; generate the decoded full-res B frame from the decoded low-res B frame. FIG. 9: a basketball event covered by a ring of cameras. FIG. 10: the space-time continuum of views along the ring of cameras, with axes of time and viewpoint ("views frozen in time")]
TWEENING-BASED CODEC FOR SCALEABLE ENCODERS AND DECODERS WITH VARYING MOTION COMPUTATION CAPABILITY

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of U.S. provisional application No. 60/172,841, filed on Dec. 20, 1999.
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to video compression/decompression (codec) processing.

2. Description of the Related Art

Traditional video compression/decompression processing relies on asymmetric computation between the encoder and decoder. The encoder is used to do all the analysis of the video stream in terms of inter- and intra-frame components, including block-based motion computation, and also in terms of object-based components. The analysis is used to compress static and dynamic information in the video stream. The decoder simply decodes the encoded video bitstream by decompressing the intra- and block-based inter-frame information. No significant analysis is performed at the decoder end. Examples of such codecs include MPEG1, MPEG2, MPEG4, H.263, and related standards. The quality of "codeced" video using the traditional asymmetric approach is reasonably good for data rates above about 1.2 megabits/second (Mbps). However, the typical quality of output video is significantly degraded at modem speeds of 56 kilobits/second (Kbps) and even at speeds as high as a few hundred Kbps.
SUMMARY OF THE INVENTION

The present invention is related to video compression/decompression processing that involves analysis of the video stream (e.g., motion computation) at both the encoder end and the decoder end. With the rapid increase in processing power of commonly available platforms, and with the potential for dedicated video processing sub-systems becoming viable, the techniques of the present invention may significantly influence video delivery on the Internet and other media at low and medium bit-rate channels.

In traditional video compression, any and all motion computation is performed by the encoder, and none by the decoder. For example, in a conventional MPEG-type video compression algorithm, for predictive frames, the encoder performs block-based motion estimation to identify motion vectors that relate blocks of data in a current frame to closely matching blocks of reference data for use in generating motion-compensated inter-frame differences. These inter-frame differences (also referred to as residual errors) along with the motion vectors themselves are explicitly encoded into the resulting encoded video bitstream. Under this codec paradigm, without having to perform any motion computation itself, a decoder recovers the motion vectors and inter-frame differences from the bitstream and uses them to generate the corresponding frames of a decoded video stream. As used in this specification, the term "motion computation" refers to motion estimation and other types of analysis in which motion information for video streams is generated, as opposed to motion compensation, where already existing motion information is merely applied to video data.
According to certain embodiments of the present invention, a video decoder is capable of performing at least some motion computation. As such, the video encoder can omit some or all of the motion information (e.g., motion vectors) from the encoded video bitstream, relying on the decoder to perform its own motion computation analysis to generate the equivalent motion information required to generate the decoded video stream. In this way, more of the available transmission and/or storage capacity (i.e., bit rate) can be allocated for encoding the residual errors (e.g., inter-frame differences) rather than having to expend bits to encode motion information.

According to one embodiment, the present invention is a method for encoding a video stream to generate an encoded video bitstream, comprising the steps of (a) encoding, into the encoded video bitstream, a first original frame/region in the video stream using intra-frame coding to generate an encoded first frame/region; and (b) encoding, into the encoded video bitstream, a second original frame/region in the video stream using motion-based predictive coding, wherein at least some motion information used during the motion-based predictive coding is excluded from the encoded video bitstream.
According to another embodiment, the present invention is a video encoder for encoding a video stream to generate an encoded video bitstream, comprising (a) a frame/region type selector configured for selecting different processing paths for encoding different frames/regions into the encoded video bitstream; (b) a first processing path configured for encoding, into the encoded video bitstream, a first original frame/region in the video stream using intra-frame coding to generate an encoded first frame/region; and (c) a second processing path configured for encoding, into the encoded video bitstream, a second original frame/region in the video stream using motion-based predictive coding, wherein the video encoder has an encoding mode in which at least some motion information used during the motion-based predictive coding is excluded from the encoded video bitstream.

According to yet another embodiment, the present invention is a method for decoding an encoded video bitstream to generate a decoded video stream, comprising the steps of (a) decoding, from the encoded video bitstream, an encoded first frame/region using intra-frame decoding to generate a decoded first frame/region; and (b) decoding, from the encoded video bitstream, an encoded second frame/region using motion-based predictive decoding, wherein at least some motion information used during the motion-based predictive decoding is generated by performing motion computation as part of the decoding method.

According to yet another embodiment, the present invention is a video decoder for decoding an encoded video bitstream to generate a decoded video stream, comprising (a) a frame/region type selector configured for selecting different processing paths for decoding different encoded frames/regions from the encoded video bitstream; (b) a first processing path configured for decoding, from the encoded video bitstream, an encoded first frame/region in the video stream using intra-frame decoding to generate a decoded first frame/region; and (c) a second processing path configured for decoding, from the encoded video bitstream, an encoded second frame/region in the video stream using motion-based predictive decoding, wherein the video decoder has a decoding mode in which at least some motion information used during the motion-based predictive decoding is generated by the video decoder performing motion computation.

According to yet another embodiment, the present invention is a method for decoding an encoded video bitstream to
generate a decoded video stream, comprising the steps of (a) decoding, from the encoded video bitstream, a plurality of encoded frames/regions to generate a plurality of decoded frames/regions using motion information; and (b) performing tweening based on the motion information to insert one or more additional frames/regions into the decoded video stream.

According to yet another embodiment, the present invention is a decoder for decoding an encoded video bitstream to generate a decoded video stream, comprising (a) one or more processing paths configured for decoding, from the encoded video bitstream, a plurality of encoded frames/regions to generate a plurality of decoded frames/regions using motion information; and (b) an additional processing path configured for performing tweening based on the motion information to insert one or more additional frames/regions into the decoded video stream.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which:

FIG. 1 shows a block diagram of a scaleable video encoder, according to one embodiment of the present invention;

FIG. 2 shows a representation of the encoding of an input video stream by the video encoder of FIG. 1;

FIG. 3 shows a flow diagram of the processing of each H frame by the video encoder of FIG. 1;

FIG. 4 shows a flow diagram of the processing of each L frame by the video encoder of FIG. 1;

FIG. 5 shows a flow diagram of the processing of each B frame by the video encoder of FIG. 1;

FIG. 6 shows a block diagram of a video decoder, according to one embodiment of the present invention;

FIG. 7 shows a flow diagram of the processing of each L frame by the video decoder of FIG. 6;

FIG. 8 shows a flow diagram of the processing of each B frame by the video decoder of FIG. 6;

FIG. 9 represents a basketball event being covered by a ring of cameras; and

FIG. 10 represents a space-time continuum of views along the ring of cameras of FIG. 9.
DETAILED DESCRIPTION

In current state-of-the-art motion video encoding algorithms, like those of the MPEGx family, a large part of the bit budget and hence the bandwidth is consumed by the encoding of motion vectors and error images for the non-intra-coded frames. In a typical MPEG2 coded stream, approximately 5% of the bit budget is used for overhead, 10-15% is for intra-coded frames (i.e., frames that are coded as stills), 20-30% is for motion vectors, and 50-65% of the budget is for error encoding. The relatively large budget for error encoding can be attributed to two main reasons. First, motion vectors are computed only as a translation vector for (8×8) blocks or (16×16) macroblocks, and, second, the resulting errors tend to be highly uncorrelated and non-smooth.
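By way of illustration only (this calculation is not part of the patent; it simply applies the percentages above to the 1.2 Mbps figure from the Background section):

```python
# Illustrative bit-budget breakdown for a typical MPEG2 stream at 1.2 Mbps,
# using the midpoints of the percentage ranges quoted above.
BITRATE_BPS = 1_200_000

budget = {
    "overhead":           0.05,   # ~5%
    "intra-coded frames": 0.125,  # 10-15%
    "motion vectors":     0.25,   # 20-30%
    "error encoding":     0.575,  # 50-65%
}

for item, share in budget.items():
    print(f"{item:20s} ~{share * BITRATE_BPS / 1000:5.0f} kbps")

# Motion vectors alone consume roughly 300 kbps here; a decoder that can
# recompute motion frees that share for residual-error encoding instead.
```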
According to certain embodiments of the present invention, motion computation is performed at both the encoder end and the decoder end. As such, motion information (e.g., motion vectors) need not be transmitted. Since motion computation is performed at the decoder end, instead of limiting the representation of motion to block-based translations, motion fields can be computed with greater accuracy using a combination of parametric and non-parametric representations.
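One way such a combined representation can be realized (the affine model and the code below are illustrative assumptions, not prescribed by the patent) is a global parametric motion field refined by a dense non-parametric correction:

```python
import numpy as np

def affine_flow(params, h, w):
    """Evaluate a six-parameter affine motion field:
    u = a1 + a2*x + a3*y,  v = a4 + a5*x + a6*y."""
    a1, a2, a3, a4, a5, a6 = params
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

# Parametric part: one affine model for the whole frame (or per region).
h, w = 240, 320
u_par, v_par = affine_flow((1.5, 0.01, 0.0, -0.5, 0.0, 0.01), h, w)

# Non-parametric part: a dense per-pixel correction, e.g., from an
# optical-flow refinement stage (zeros here as a placeholder).
u_res = np.zeros((h, w), np.float32)
v_res = np.zeros((h, w), np.float32)

u, v = u_par + u_res, v_par + v_res  # final motion field
```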
Embodiments of the present invention enable the video stream to be sub-sampled both temporally and spatially at the encoder. The video stream can be sub-sampled in time so that not all of the frames are transmitted. In addition, some of the frames that are transmitted may be coded at a lower spatial resolution. Using dense and accurate motion computation at the decoder end, the decoded full-resolution and low-resolution frames are used to recreate a full-resolution decoded video stream with missing frames filled in using motion-compensated spatio-temporal interpolation (also referred to as "tweening"). This could result in large savings in compression while maintaining quality of service for a range of different bandwidth pipes.
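A crude sketch of such motion-compensated temporal interpolation between two decoded frames follows (nearest-neighbor sampling and the scaling of the frame-to-frame flow to the intermediate instant are simplifications assumed here, not details taken from the patent):

```python
import numpy as np

def warp(frame, u, v):
    """Sample `frame` at positions displaced by the flow (u, v),
    with nearest-neighbor rounding for brevity."""
    h, w = frame.shape
    y, x = np.mgrid[0:h, 0:w]
    xs = np.clip(np.round(x + u).astype(int), 0, w - 1)
    ys = np.clip(np.round(y + v).astype(int), 0, h - 1)
    return frame[ys, xs]

def tween(frame_a, frame_b, u, v, t):
    """Synthesize the frame at fractional time t in (0, 1) between
    frame_a and frame_b, where (u, v) is the dense flow from a to b.
    A pixel moving along the flow is at p at time t if it started at
    p - t*flow in frame_a and ends at p + (1-t)*flow in frame_b."""
    from_a = warp(frame_a, -t * u, -t * v)            # pull pixels from a
    from_b = warp(frame_b, (1 - t) * u, (1 - t) * v)  # pull pixels from b
    return (1 - t) * from_a + t * from_b
```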
In one embodiment of the present invention, a scaleable encoder is capable of encoding input video streams at a number of different encoding modes corresponding to different types of decoders, e.g., having different levels of processing capacity.
At one extreme class of encoding modes, the encoder generates an encoded video bitstream for a decoder that is capable of performing all of the motion computation performed by the encoder. In that case, the encoder encodes the video stream using an encoding mode in which motion-based predictive encoding is used to encode at least some of the frames in the video stream, but none of the motion information used during the video compression processing is explicitly included in the resulting encoded video bitstream. The corresponding decoder performs its own motion computation during video decompression processing to generate its own version of the motion information for use in generating a decoded video stream from the encoded video bitstream, without having to rely on the bitstream explicitly carrying any motion information.

At the other extreme class of encoding modes, the encoder encodes the video stream for a decoder that is incapable of performing any motion computation (as in conventional video codecs). In that case, if the encoder uses any motion information during encoding (e.g., for motion-compensated inter-frame differencing), then all of that motion information is explicitly encoded into the encoded video bitstream. The corresponding decoder recovers the encoded motion information from the encoded video bitstream to generate a decoded video stream without having to perform any motion computation on its own.

In between these two extremes are a number of different encoding modes that are geared towards decoders that perform some, but not all, of the motion computation performed by the encoder. In these situations, the encoder explicitly encodes some, but not all, of the motion information used during its motion-based predictive encoding into the resulting video bitstream. The corresponding decoder recovers the encoded motion information from the bitstream and performs its own version of motion computation to generate the rest of the motion information used to generate a decoded video stream.

Independent of how much motion information is to be encoded into the bitstream, a scaleable encoder of the present invention is also capable of skipping frames with the expectation that the decoder will be able to insert frames into the decoded video stream during playback. Depending on the implementation, frame skipping may involve providing at least some header information for skipped frames in the encoded video bitstream or even no explicit information at all.
Encoding

FIG. 1 shows a block diagram of a scaleable video encoder 100, according to one embodiment of the present invention. Scaleable video encoder 100 will first be described in the context of an extreme encoding mode in which none of the motion information used during motion-based predictive encoding is explicitly encoded into the resulting encoded video bitstream. Other encoding modes will then be described.

According to this extreme encoding mode, each frame in an input video stream is encoded as either an H frame, an L frame, or a B frame. Each H frame is intra-encoded as a high spatial resolution (e.g., full-resolution) key frame, each L frame is intra-encoded as a low spatial resolution (e.g., ¼×¼ resolution) key frame augmented by residual error encoding, and each B frame is inter-encoded as a low spatial resolution frame based on motion estimates between sets of H and/or L frames. Video encoder 100 encodes an input video stream as a sequence of H, L, and B frames to form a corresponding output encoded video bitstream.
FIG. 2 shows a representation of a particular example of the encoding of an input video stream by video encoder 100 of FIG. 1. In the example of FIG. 2, the input video stream is encoded using a repeating 10-frame sequence of (HBLBLBLBLB). In general, however, other fixed or even adaptive frame sequences are possible. For example, in one preferred fixed frame sequence, a 30 frame/second (fps) video stream is encoded using the fixed 30-frame sequence of:

(HBBBBBLBBBBBLBBBBBLBBBBBLBBBBB).

The generation of the frame-type sequence may also be performed adaptively, e.g., based on the amount of motion present across frames, with fewer B frames between consecutive H/L key frames and/or fewer L frames between consecutive H frames when motion is greater and/or less uniform, and vice versa.
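A minimal sketch of fixed frame-type scheduling (the function is illustrative; the patterns are the two sequences given above):

```python
def frame_type_sequence(num_frames, pattern="HBLBLBLBLB"):
    """Assign H/L/B frame types by cycling a fixed pattern.

    The default pattern gives the repeating 10-frame sequence of FIG. 2;
    "HBBBBBLBBBBBLBBBBBLBBBBBLBBBBB" gives the 30-frame example above.
    An adaptive encoder would instead choose types from measured motion.
    """
    return [pattern[i % len(pattern)] for i in range(num_frames)]

print("".join(frame_type_sequence(21)))  # HBLBLBLBLBHBLBLBLBLBH
```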
Referring again to FIG. 1, type selection 102 is applied to the input video stream to determine which frames in the video stream are to be encoded as H, L, and B frames. (Although this type selection 102 will be described in the context of entire frames, this process may also be implemented based on regions within a frame, such as square blocks, rectangular regions, or even arbitrarily shaped regions, with the corresponding estimation and encoding applied to each.) As mentioned above, depending on the particular implementation, the frame-type selection may be based on a fixed frame sequence or an appropriate adaptive selection algorithm, e.g., based on motion magnitude, special effects, scene cuts, and the like. Each of the different types of frames is then processed along a corresponding processing path represented in FIG. 1. As shown in FIG. 1, an option exists to drop one or more frames from the input video stream. This optional frame dropping may be incorporated into a fixed frame sequence or adaptively selected, e.g., based on the amount of motion present or bit-rate considerations.
FIG. 3 shows a flow diagram of the processing of each H frame by video encoder 100 of FIG. 1. Referring to the blocks in FIG. 1 and the steps in FIG. 3, the current H frame is intra-encoded at full resolution, e.g., using wavelet encoding (block 104 of FIG. 1 and step 302 of FIG. 3). As is known in the art, wavelet encoding typically involves the application of wavelet transforms to different sets of pixel data corresponding to regions within a current frame, followed by quantization, run-length encoding, and variable-length (Huffman-type) encoding to generate the current frame's contribution to the encoded video bitstream. Typically, the sizes of the regions of pixel data (and therefore the sizes of the wavelet transforms) vary according to the pixel data itself. In general, the more uniform the pixel data, the larger the size of a region that is encoded with a single wavelet transform. Note that even though this encoding is referred to as "full resolution," it may still involve sub-sampling of the color components (e.g., 4:1:1 YUV sub-sampling).

The resulting intra-encoded full-resolution H-frame data is incorporated into the encoded video bitstream (step 304). The same intra-encoded H-frame data is also decoded (block 106 and step 306), e.g., using wavelet decoding, to generate a full-resolution decoded H frame for use as reference data for encoding L and B frames.
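A minimal sketch of the wavelet-transform-plus-quantization front end of such an intra coder, using the PyWavelets library (the library choice, the Haar basis, and the uniform quantizer step are assumptions of this sketch; the run-length and Huffman stages mentioned above are omitted):

```python
import numpy as np
import pywt  # PyWavelets

def intra_encode(frame, step=8.0, levels=3):
    """Wavelet-transform a grayscale frame and uniformly quantize
    the coefficients."""
    coeffs = pywt.wavedec2(frame.astype(np.float32), "haar", level=levels)
    arr, slices = pywt.coeffs_to_array(coeffs)
    return np.round(arr / step).astype(np.int32), slices

def intra_decode(quantized, slices, step=8.0):
    """Dequantize and invert the transform, yielding the decoded frame
    that the encoder itself uses as reference data (block 106, step 306)."""
    coeffs = pywt.array_to_coeffs(quantized.astype(np.float32) * step,
                                  slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, "haar")
```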
FIG. 4 shows a flow diagram of the processing of each L frame by video encoder 100 of FIG. 1. Referring to the blocks in FIG. 1 and the steps in FIG. 4, the current full-resolution L frame is spatially sub-sampled (e.g., by a factor of 4 in each direction) to generate a corresponding low-resolution L frame (block 108 and step 402). Depending on the particular implementation, this spatial sub-sampling may be based on any suitable technique, such as simple decimation or more complicated averaging.
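The two techniques just named, sketched for a grayscale frame whose dimensions are multiples of 4 (an assumption of this sketch only):

```python
import numpy as np

def subsample_decimate(frame, k=4):
    """Simple decimation: keep every k-th pixel in each direction."""
    return frame[::k, ::k]

def subsample_average(frame, k=4):
    """Block averaging: replace each k-by-k block by its mean, which
    low-pass filters the frame and so reduces aliasing."""
    h, w = frame.shape
    return frame.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
```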
The low-resolution L frame is then intra-encoded (block 110 and step 404), e.g., using wavelet encoding, and the resulting intra-encoded low-resolution L-frame data is incorporated into the encoded video bitstream (step 406). The same intra-encoded L-frame data is also decoded to generate a decoded low-resolution L frame (block 112 and step 408).

Motion computation analysis is then performed comparing the decoded low-resolution L-frame data to one or more other sets of decoded data (e.g., decoded full-resolution data corresponding to the previous and/or subsequent H frames and/or decoded low-resolution data corresponding to the previous and/or subsequent L frames) to generate motion information for the current L frame (block 114 and step 410). In this particular "extreme" encoding mode, none of this L-frame motion information is explicitly encoded into the encoded video bitstream. In other encoding modes (including the opposite "extreme" encoding mode), some or all of the motion information is encoded into the bitstream (step 412).
The exact type of motion computation analysis performed depends on the particular implementation of video encoder 100. For example, motion may be computed for each L frame based on either the previous H frame, the closest H frame, or the previous key (H or L) frame. Moreover, this motion computation may range from conventional MPEG-like block-based or macroblock-based algorithms to any of a combination of optical flow, layered motion, and/or multi-frame parametric/non-parametric algorithms.

For example, in one implementation, video encoder 100 may perform conventional forward, backward, and/or bi-directional block-based motion estimation in which a motion vector is generated for each (8×8) block or (16×16) macroblock of pixels in the current frame. In alternative embodiments, other types of motion computation analysis may be performed, including optical flow analysis in which a different motion vector is generated for each pixel in the current frame. (For those encoding modes in which some or all of the motion information is encoded into the bitstream, the optical flow can be compactly represented using either wavelet encoding or region-based parametric plus residual
flow encoding.) Still other implementations may rely on hierarchical or layered motion analysis in which a number of different motion vectors are generated at different resolutions, where finer motion information (e.g., corresponding to smaller sets of pixels) provides corrections to coarser motion information (e.g., corresponding to larger sets of pixels). In any case, the resulting motion information characterizes the motion between the current L frame and corresponding H/L frames.
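A bare-bones version of the block-based option (exhaustive full search with a sum-of-absolute-differences cost; block size and search range are conventional choices, not requirements of the patent):

```python
import numpy as np

def block_motion_estimation(cur, ref, block=16, search=7):
    """For each block-sized macroblock of `cur`, find the integer motion
    vector into `ref` that minimizes the sum of absolute differences."""
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block].astype(np.int32)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    cand = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(target - cand).sum())
                    if best is None or sad < best:
                        best, best_mv = sad, (dx, dy)
            vectors[(bx, by)] = best_mv
    return vectors
```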
No matter what type of analysis is performed, the motion information generated during the motion computation is then used to synthesize a full-resolution L frame (block 116 and step 414). In particular, the motion information is used to warp (i.e., motion compensate) the corresponding decoded full-resolution H frame to generate a synthesized full-resolution frame corresponding to the current L frame. Note that the synthesized full-resolution L frame may be generated using forward, backward, or even bi-directional warping based on more than one decoded full-resolution H frame. This would require computation of motion information relative to two different decoded full-resolution H frames, but will typically reduce even further the corresponding residuals that need to be compressed.
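A sketch of the warping step for a dense per-pixel motion field, using bilinear resampling (the interpolation choice is an assumption of this sketch; any resampling scheme could be substituted):

```python
import numpy as np

def warp_frame(ref, u, v):
    """Backward-warp `ref` (float grayscale) by the per-pixel flow (u, v):
    output pixel (x, y) is sampled from ref at (x + u, y + v)."""
    h, w = ref.shape
    y, x = np.mgrid[0:h, 0:w]
    sx = np.clip(x + u, 0.0, w - 1.001)
    sy = np.clip(y + v, 0.0, h - 1.001)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    fx, fy = sx - x0, sy - y0
    top = ref[y0, x0] * (1 - fx) + ref[y0, x0 + 1] * fx
    bot = ref[y0 + 1, x0] * (1 - fx) + ref[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```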
In general, the synthesized full-resolution L frame may have artifacts due to various errors in motion computation due to occlusions, mismatches, and the like. As such, a quality of alignment metric (e.g., based on pixel-to-pixel absolute differences) is generated between the synthesized full-resolution L frame and the original full-resolution L frame (block 118 and step 416). The quality of alignment metrics form an image of residual errors that represent the quality of alignment at each pixel.

The residual errors are then encoded for inclusion into the encoded video bitstream (block 120). In one implementation, the image of residual errors is thresholded at an appropriate level to form a binary mask (step 418) that identifies those regions of pixels for which the residual error should be encoded, e.g., using a wavelet transform, for inclusion into the encoded video bitstream (step 420). For typical video processing, the residual errors for only about 10% of the pixels will be encoded.

FIG. 5 shows a flow diagram of the processing of each B frame by video encoder 100 of FIG. 1. Referring to the blocks in FIG. 1 and the steps in FIG. 5, the current full-resolution B frame is spatially sub-sampled to generate a corresponding low-resolution B frame (block 122 and step 502). Using the motion information from the H/L-frame encoding, a synthesized low-resolution B frame is generated (step 504), and inter-frame differencing is performed between the low-resolution B frame and the synthesized low-resolution B frame to generate B-frame residuals (step 506). The resulting residual errors are then encoded, e.g., using wavelet encoding, to generate encoded B-frame residual data for inclusion in the encoded video bitstream (block 128 and step 508). Depending on the particular implementation, the residual error encoding of block 128 may rely on a thresholding of B-frame inter-frame differences to determine which residuals to encode, similar to that described previously with regard to block 120 for the L-frame residual errors. Note that, since B frames are never used to generate reference data for encoding other frames, video encoder 100 does not have to decode the encoded B-frame residual data.

In an alternative implementation of video encoder 100, instead of synthesizing low-resolution B frames, full-resolution B frames can be synthesized by tweening between pairs of decoded full-resolution H frames generated by block 106 and synthesized full-resolution L frames generated by block 116. Inter-frame differencing can then be applied between the original full-resolution B frames and the synthesized full-resolution B frames to generate residual errors that can be encoded, e.g., using wavelet encoding, into the encoded video bitstream. In that case, the spatial sub-sampling of block 122 can be omitted.

As mentioned earlier, the processing in FIGS. 3-5 corresponds to the extreme encoding mode in which video encoder 100 performs motion-based predictive encoding, but none of the corresponding motion information is explicitly encoded into the resulting encoded video bitstream, where the decoder performs its own motion computation to generate its own version of the motion information for use in generating the corresponding decoded video stream. As mentioned earlier, video encoder 100 is preferably a scaleable video encoder that can encode video streams at a variety of different encoding modes. Some of the encoding options available in video encoder 100 include:
