(12) United States Patent
Girod et al.

(10) Patent No.: US 6,480,541 B1
(45) Date of Patent: Nov. 12, 2002
`
(54) METHOD AND APPARATUS FOR
PROVIDING SCALABLE PRE-COMPRESSED
DIGITAL VIDEO WITH REDUCED
QUANTIZATION BASED ARTIFACTS

(75) Inventors: Bernd Girod, Spardorf (DE); Staffan
Ericsson, Brookline, MA (US); Yuriy
A. Reznik, Seattle, WA (US); Nikolaus
Farber, Erlangen (DE)
`
(73) Assignee: RealNetworks, Inc., Seattle, WA (US)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/177,406
`
(22) Filed: Oct. 23, 1998
`
Related U.S. Application Data
`
(63) Continuation-in-part of application No. 08/753,618, filed on
Nov. 27, 1996.
(51) Int. Cl.7 .................................................. H04B 1/66
(52) U.S. Cl. .................................. 375/240.12; 382/232
(58) Field of Search ......................... 375/240.25, 240.12,
        375/240.21, 240.23, 240.08, 240.1, 240.01;
        348/425.1, 432.1, 441, 390.1, 415.1, 416.1,
        412.1, 423.1, 424.2; 382/240, 302, 249,
        232; 370/232; 725/93, 116; 358/133, 135,
        136, 433, 467
`
(56) References Cited

U.S. PATENT DOCUMENTS

4,868,653 A          9/1989   Golin et al.
5,065,447 A         11/1991   Barnsley et al.
5,122,873 A          6/1992   Golin
5,225,904 A          7/1993   Golin et al.
5,253,058 A         10/1993   Gharavi
5,262,855 A         11/1993   Alattar et al.
5,325,124 A          6/1994   Keith
5,347,600 A          9/1994   Barnsley et al.
5,363,139 A         11/1994   Keith
5,384,598 A          1/1995   Rodriguez et al.
5,384,867 A          1/1995   Barnsley et al.
5,392,396 A          2/1995   MacInnis
(number illegible) A  3/1995   (name illegible)
(number illegible) A    1995   Keith
5,430,812 A          7/1995   Barnsley et al.
5,485,211 A          1/1996   Kuzma
5,491,513 A          2/1996   Wickstrom et al.
5,508,732 A          4/1996   Bottomley et al.
5,604,731 A *        2/1997   Grossglauser et al. ...... 370/232
5,706,290 A *        1/1998   Shaw et al. ............ 375/240.23
`
* cited by examiner
`Primary Examiner—Nhon Diep
`Assistant Examiner—Gims Philippe
`(74) Attorney, Agent, or Firm—Kudirka & Jobse, LLP
`
`(57)
`
`ABSTRACT
`
A method for generating a digital motion video sequence at
a plurality of bit rates uses a transitional coding source when
switching between bitstreams having different bit rates dur-
ing transmission of a video sequence. The transitional data
may be frames coded using reconstructed frames recon-
structed for a first bitstream using the characteristics of the
second bitstream. These "low bit rate insert frames," or
LBIFs, contain the image characteristics of a signal coded at
the lower bit rate. With a bitstream having a higher bit rate
being periodically coded using an LBIF, a point of image
continuity between the two bitstreams is provided. Thus,
switching from one bitstream to the other at this point in the
video sequence minimizes the production of artifacts caused
by differences in bit rate. In another embodiment of the
invention, a separate set of transitional data is created, taking
the form of "switch" frames, or S-frames. The S-frames are
typically the difference between a frame of a first bitstream
and a frame of a second bitstream. These frames are inserted
into the decoded bitstream during the transition from one
bitstream to the other, and compensate for any visual arti-
facts that might otherwise occur due to the difference in bit
rate of the two bitstreams.
`
`78 Claims, 17 Drawing Sheets
`
[Front-page drawing: coding apparatus 100, with VIDEO IN feeding forward error correction (FEC) blocks 156a through 156c.]
`
[Drawing sheets 1 through 17, containing FIGS. 1 through 16, are omitted; only scan residue survived. FIG. 1 is labeled "PRIOR ART". The content of each figure is summarized in the Brief Description of the Drawings.]
`
`METHOD AND APPARATUS FOR
`PROVIDING SCALABLE PRE-COMPRESSED
`DIGITAL VIDEO WITH REDUCED
`QUANTIZATION BASED ARTIFACTS
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
This is a continuation-in-part of U.S. patent application
`Ser. No. 08/753,618, filed Nov. 27, 1996.
`
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to the field of compressed motion
`video and, more specifically,
`to pre-compressed, stored
`video for video-on-demand applications.
`2. Description of the Related Art
`Digital video signals are typically compressed for trans-
`mission from a source to a destination. One common type of
`compression is “interframe” coding, such as is described in
`the International Telecommunications Union-
`
`Telecommunications (ITU-T) Recommendations H.261 and
`H.262, or the Recommendation H.263. Interframe coding
`exploits the spatial similarities of successive video frames
`by using previous coded and reconstructed video frames to
`predict the current video signal. By employing a differential
`pulse code modulation (DPCM) loop, only the difference
`between the prediction signal and the actual video signal
`amplitude (i.e. the “prediction error”) is coded and trans-
`mitted.
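As an editorial illustration (not part of the patent text), the DPCM principle just described can be sketched on a scalar "signal"; the function names and values are invented for the sketch:

```python
# Minimal DPCM sketch: only the difference between the current sample
# and the prediction (here, the previous sample) is transmitted.

def dpcm_encode(samples):
    prediction = 0
    residuals = []
    for s in samples:
        residuals.append(s - prediction)   # the "prediction error"
        prediction = s                     # predictor tracks the signal
    return residuals

def dpcm_decode(residuals):
    prediction = 0
    out = []
    for r in residuals:
        prediction += r                    # add the error to the prediction
        out.append(prediction)
    return out

signal = [10, 12, 13, 13, 9]
residuals = dpcm_encode(signal)
assert residuals == [10, 2, 1, 0, -4]      # small numbers to transmit
assert dpcm_decode(residuals) == signal    # receiver recovers the signal
```

The residuals are typically much smaller than the samples themselves, which is what makes the subsequent quantization and entropy coding effective.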
`
`In interframe coding, the same prediction is formed at the
`transmitter and the receiver, and is updated frame-by-frame
`at both locations using the prediction error. If a transmission
`error causes a discrepancy to arise between the prediction
`signal at the transmitter and the prediction signal at
`the
`receiver,
`the error propagates temporally over several
`frames. Only when the affected region of the image is
`updated by an intraframe coded portion of the transmission
`(i.e. a frame coded without reference to a previous frame),
`will the error propagation be terminated. In practice, this
`error propagation may result in an annoying artifact which
`may be visible for several seconds in the decoded, recon-
`structed signal.
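The temporal error propagation described above, and its termination by an intraframe update, can be demonstrated with the same scalar model (an editorial sketch; `intra_at` and the sample values are illustrative):

```python
# A transmission error corrupts one residual; the decoder's prediction
# then disagrees with the encoder's on every later frame, until an
# intraframe (absolute) value resets the prediction.

def decode(residuals, intra_at=None, intra_value=0):
    prediction = 0
    out = []
    for i, r in enumerate(residuals):
        if intra_at is not None and i == intra_at:
            prediction = intra_value    # intraframe refresh: absolute value
        else:
            prediction += r             # interframe: prediction + error
        out.append(prediction)
    return out

corrupted = [10, 5, 1, 0, 0]            # second residual should be 2, not 5

drifted = decode(corrupted)
assert drifted == [10, 15, 16, 16, 16]  # error persists in every frame

refreshed = decode(corrupted, intra_at=3, intra_value=13)
assert refreshed == [10, 15, 16, 13, 13]  # intra update ends propagation
```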
`Shown in FIG. 1 is a schematic representation of a
`conventional hybrid interframe coder 10. Only the funda-
`mental elements of the coder are shown in FIG. 1. However,
`this type of hybrid coder is known in the art, and the omitted
`elements are not germane to understanding its operation.
`The coder of FIG. 1 receives an input video signal at
`summing node 12. The output of summing node 12 is a
`subtraction from a current frame of the input signal, of a
`motion-compensated version of a previous frame of the
`input signal (discussed in more detail hereinafter). The
`output of summing node 12 is received by discrete cosine
`transform block 14 (hereinafter DCT 14). The DCT 14 is a
`hardware, software, or hybrid hardware/software component
`that performs a discrete cosine transform on the data
`received from the summing node 12, in a manner well-
`known in the art. The result is the transform of the incoming
`video signal (one block of elements at a time) to a set of
`coefficients which are then input
`to quantizer 16. The
`quantizer 16 assigns one of a plurality of discrete values to
`each of the received coefficients, resulting in an amount of
`compression provided by the quantizer which depends on
`the number of quantization levels used by the quantizer (i.e.
`the “coarseness” of the quantization). Since the quantizer
`maps each coefficient to one of a finite number of quanti-
`zation levels, there is an error introduced by the quantizer,
`the magnitude of which increases with a decreasing number
`of quantization levels.
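The relationship between coarseness and error can be made concrete with a uniform quantizer (an editorial sketch; the coefficient value and step sizes are arbitrary):

```python
import math

# Uniform quantizer sketch: a coarser step size means fewer quantization
# levels, more compression, and a larger error between the original
# coefficient and its representative level.

def quantize(coefficient, step):
    # round-half-up to the nearest quantizer index (the value transmitted)
    return math.floor(coefficient / step + 0.5)

def dequantize(index, step):
    return index * step             # quantizer representative level

coeff = 37.0
errors = []
for step in (2, 8, 32):             # increasingly coarse quantization
    recon = dequantize(quantize(coeff, step), step)
    errors.append(abs(coeff - recon))

# the reconstruction error grows with the step size
assert errors == [1.0, 3.0, 5.0]
```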
`In order to perform the desired interframe coding, the
`output of quantizer 16 is received by an inverse quantizer 17
`and an inverse discrete cosine transform element
`
`(hereinafter “inverse DCT”) 18. Inverse quantizer 17 maps
`the quantizer index into a quantizer representative level. The
`inverse DCT 18 is a hardware, software, or hybrid hardware/
`software component that performs an inverse discrete cosine
`transform on the data received from inverse quantizer 17, in
`a manner well-known in the art. This inverse transform
`decodes the coded data to create a reconstruction of the
`
`prediction error. The error introduced into the signal by
`quantizer 16 reduces the quality of the image which is later
`decoded, the reduced quality being a side effect of the data
`compression achieved through quantization.
`The decoded version of the video signal is output by
`summing node 19, and is used by the coder 10 to determine
`variations in the video signal from frame to frame for
`generating the interframe coded signal. However,
`in the
`coder of FIG. 1, the decoded signal from summing node 19
`is first processed using some form of motion compensation
`means (hereinafter “motion compensator”) 20, which works
`together with motion estimator 21. Motion estimator 21
`makes motion estimations based on the original input video
`signal, and passes the estimated motion vectors to both
`motion compensator 20 and entropy coder 23. These vectors
`are used by motion compensator 20 to build a prediction of
`the image by representing changes in groups of pixels using
`the obtained motion vectors. The motion compensator 20
`may also include various filtering functions known in the art.
`At summing node 12, a frame-by-frame difference is
`calculated, such that the output of summing node 12 is only
`pixel changes from one frame to the next. Thus, the data
`which is compressed by DCT 14 and quantizer 16 is only the
`interframe prediction error representing changes in the
`image from frame to frame. This compressed signal may
`then be transmitted over a network or other transmission
`
`media, or stored in its compressed form for later recall and
`decompression. Prior to transmission or storage, the inter-
`frame coded signal is also typically coded using entropy
`coder 22. The entropy coder provides still further compres-
`sion of the video data by mapping the symbols output by the
`quantizer to variable length codes based on the probability
`of their occurrence. After entropy coding, the signal output
`from entropy coder 22 is transmitted along with the com-
`pressed motion vectors output from entropy coder 23.
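The essential point of the loop described above, namely that the encoder predicts from its own quantized reconstruction (via inverse quantizer 17, inverse DCT 18, and summing node 19) rather than from the original input, so that encoder and decoder stay in lockstep, can be sketched as follows. This is an editorial illustration on scalar "frames"; the DCT, motion compensation, and entropy coding stages are omitted:

```python
# Closed-loop interframe coder sketch: prediction is built from the
# RECONSTRUCTED (quantized) signal, exactly as the decoder will build it.

STEP = 4                              # quantizer step size

def code_sequence(frames):
    prediction = 0
    indices = []                      # what would be entropy-coded and sent
    for f in frames:
        residual = f - prediction     # summing node 12
        q = round(residual / STEP)    # quantizer 16
        indices.append(q)
        prediction += q * STEP        # inverse quantizer + summing node 19
    return indices

def decode_sequence(indices):
    prediction = 0
    out = []
    for q in indices:
        prediction += q * STEP        # same update as in the encoder
        out.append(prediction)
    return out

frames = [50, 57, 61, 70]
sent = code_sequence(frames)
recon = decode_sequence(sent)
assert sent == [12, 2, 1, 2]
assert recon == [48, 56, 60, 68]
# reconstruction differs from the input only by quantization error
assert all(abs(a - b) <= STEP / 2 for a, b in zip(frames, recon))
```

Because both sides apply the identical `prediction += q * STEP` update, quantization error never accumulates from frame to frame; only a transmission error, as discussed above, causes drift.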
`In practice, if a compressed video signal such as the one
`output from the coder of FIG. 1 is transmitted over unreli-
`able channels (e.g. the internet, local area networks without
`quality of service (QoS) guarantees, or mobile radio
`channels), it is particularly vulnerable to transmission errors.
`Certain transmission errors have the characteristic of low-
`
`ering the possible maximum throughput (i.e. lowering the
`channel capacity or “bandwidth”) of the transmission
`medium for a relatively long period of time. Such situations
`might arise due to a high traffic volume on a store-and-
`forward network such as the internet, or due to an increasing
`distance between a transmitter and receiver of a mobile radio
`channel.
`In order to maintain a real-time transmission of the video
`
`information in the presence of a reduced bandwidth, the
`transmitter must reduce the bit rate of the compressed video.
`Networks without QoS guarantees often provide messaging
`
channels that allow the receiver or the network to request a
lower transmission bit rate from the transmitter. For
example, real-time protocol (RTP), designed by the Internet
Engineering Task Force and now part of the ITU-T Draft
International Standard H.225.0 "Media Stream Packetiza-
tion and Synchronization on Non-Guaranteed Quality of
Service LANs", can be used to "throttle" the transmitter bit
rate. For a point-to-point transmission with real-time coding,
the video source coder can usually accommodate the request
for a reduced bit rate by using a coarser quantization, by
reducing the spatial resolution of the frames of the video, or
by periodically dropping video frames altogether. However,
if the video has been coded and stored previously, the bit rate
is chosen in advance, making such a request difficult to
satisfy.

To accommodate the desire for a variable bit rate in the
transmission of stored video, a "scalable" video representa-
tion is used. The term "scalable" is used herein to refer to the
ability of a particular bitstream to be decoded at different bit
rates. With scalable video, a suitable part of the bitstream
can be extracted and decoded to yield a reconstructed video
sequence with a quality lower than what could be obtained
by decoding a larger portion of the bitstream. Thus, scalable
video supports "graceful degradation" of the picture quality
with decreasing bit rate.

In a video-on-demand server, the same original motion
video sequence can be coded and stored at a variety of bit
rates. When a request for the sequence is made to the server,
the appropriate bit rate would be selected, taking into
account the current capacity of the network. A problem
arises, however, if it becomes necessary to change the bit
rate during the transmission. The server may switch from a
first bitstream having a first bit rate to a second bitstream
having a second bit rate due to a different coarseness of
quantization or different spatial resolution. However, if the
sequences are interframe coded, the switchover produces
annoying artifacts due to the difference in the image quality
of the two bitstreams. These can be avoided by the regular
use of intraframe coded frames (generally referred to as
"I-frames"), in which the entire image is coded, rather than
just the differences from the previous frame. The Moving
Picture Experts Group (MPEG) standard (i.e. ITU-T H.262)
calls for the regular inclusion of I-frames, typically every
few hundred milliseconds. However, the use of I-frames,
requiring a significant amount of data, dramatically
increases the overall bit rate. For example, an I-frame might
require six times as much data as an interframe coded frame.
In such a case, coding every fifth frame as an I-frame would
double the bit rate.

U.S. Pat. No. 5,253,058, to Gharavi, discloses a scalable
video architecture which uses a base layer and an enhance-
ment layer (called a contribution layer) which must be
encoded by a separate encoder. The method does not support
different frame rates for the video at different quality levels
but, rather, for different spatial resolutions. More
importantly, in this method, the enhancement layer cannot
be transmitted and decoded independently; it always
requires the transmission and decompression of the base
layer first. This makes bandwidth-adaptive serving a com-
plicated task, leads to inefficient compression, and ulti-
mately affects the performance of the whole system.

It is therefore an object of this invention to allow the
coding of video sequences for storage and retrieval over
networks without QoS guarantees, such that the bit rate
provided by the server can be changed during the transmis-
sion of the sequence without resorting to the use of I-frames,
but while minimizing artifacts produced by the different
degrees of quantization used in coding different bitstreams at
different bit rates.
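The doubling claimed in the I-frame example above can be verified with the figures given in the text (an editorial check; the relative costs are taken directly from the example):

```python
# With every fifth frame an I-frame costing six times an interframe frame:
P = 1                           # relative cost of an interframe coded frame
I = 6 * P                       # relative cost of an I-frame (per the text)

all_inter = 5 * P               # five interframe coded frames
with_iframe = 4 * P + I         # four interframe frames plus one I-frame

assert with_iframe / all_inter == 2.0   # the bit rate doubles
```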
`
`SUMMARY OF THE INVENTION
`
`The present invention avoids the aforementioned artifacts
`by providing a set of transition data that can be interframe
`decoded between decoding of a first bitstream (at a first bit
`rate) and a second bitstream (at a second bit rate). The
`transition data compensates for visual discrepancies
`between a decoded version of the first bitstream and a
`decoded version of the second bitstream. Thus, after a first
`bitstream has been decoded, the transition data is decoded,
`and then the second bitstream. The second bitstream pro-
`vides a continuation of the video sequence that was begun
`with the first bitstream, and the transition data compensates
`for visual artifacts that would otherwise be present due to the
`difference in the bit rates of the first and second bitstreams.
`
`In one embodiment of the invention, the transition data is
`created by periodically imputing the characteristics of a first
`(typically lower bit rate) bitstream to a second (typically
`next higher bit rate) bitstream. During interframe coding of
`the first bitstream, coded data is decoded and employed by
`the first bitstream coder for use in comparing to data in a
`subsequent frame, thus allowing the differences between the
`frames to be determined. The decoded (i.e., reconstructed)
`video signal has image characteristics due to the relatively
`coarse quantization used during coding of the first bitstream,
`or due to a different spatial resolution. This embodiment
`therefore uses the reconstructed signal as a source from
`which to periodically code a frame of the second bitstream.
`That
`is, while the second bitstream is normally coded
`directly from the analog video signal, frames of the signal
`are periodically coded using the signal reconstructed from
`the first bitstream.
`In effect, a lower bit rate frame is
`“inserted” into the higher bit rate data stream. These frames
`are therefore referred to herein as “lower bit rate insert
`
`frames” (LBIFs).
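As an editorial sketch of the LBIF mechanism, the scalar closed-loop coder model used earlier can be extended to two embedded coders; every `PERIOD`-th frame of the higher-rate stream is coded from the lower-rate stream's reconstruction instead of from the input (the `Coder` class, `PERIOD`, and all values are illustrative, not from the patent):

```python
PERIOD = 4

class Coder:
    """Scalar stand-in for an embedded interframe coder."""
    def __init__(self, step):
        self.step = step
        self.prediction = 0              # last reconstructed value
    def code(self, source):
        q = round((source - self.prediction) / self.step)
        self.prediction += q * self.step
        return self.prediction           # reconstructed value

low, high = Coder(step=8), Coder(step=2)
frames = [40, 44, 47, 52, 55, 61, 64, 70, 75]
lbif_points = []
for i, f in enumerate(frames):
    low_recon = low.code(f)              # lower-rate stream codes the input
    is_lbif = (i % PERIOD == PERIOD - 1)
    # LBIF: the higher-rate stream codes the lower-rate reconstruction
    high_recon = high.code(low_recon if is_lbif else f)
    if is_lbif:
        lbif_points.append(i)
        # at an LBIF the two streams share image characteristics
        assert abs(high_recon - low_recon) <= high.step / 2

assert lbif_points == [3, 7]
```

Between LBIFs the higher-rate stream tracks the input closely; at each LBIF its reconstruction is pulled to the lower-rate reconstruction, creating the point of correspondence exploited for switching.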
`The LBIFs inserted into the second bitstream provide
`points of correspondence between the image data of the two
`bitstreams in that the effects of the coarser quantization (or
`different spatial resolution) of the first bitstream are peri-
`odically introduced to the second bitstream. These LBIFs
`therefore provide points in the temporal progression of the
`video sequence at which a change from one bitstream to the
`other may be made, without the introduction of any signifi-
`cant visual artifacts into the decoded video. Thus, when
`switching from the first bitstream to the second bitstream, it
`is most desirable to have the first frame received from the
`
`second bitstream be a frame that follows an LBIF. Similarly,
`when switching from the second bitstream to the first
`bitstream, it is desirable to have the last frame received from
`the second bitstream be an LBIF. In this way, the two frames
`will be as closely related as possible.
`This embodiment of the invention preferably makes use
`of LBIFs in a video-on-demand server. Multiple bitstreams
`are stored to be decoded using different relative bit rates. For
`all but the bitstream having the lowest bit rate, LBIFs are
`periodically inserted into the bitstreams from the bitstream
`having the next lower bit rate. Thus, the server has the same
`video sequence at different bit rates, with LBIFs to enable
`switching between the bitstreams. As the server is streaming
`the video data at one bit rate, a request for a different bit rate
`(higher or lower) is satisfied by switching to another stored
`bitstream at the temporal point in the video sequence cor-
`responding to an LBIF in the bitstream having the higher bit
`rate. Effectively seamless bit rate “throttling” is therefore
`accomplished with a minimization of artifacts.
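The server-side rule that a rate-change request takes effect only at an LBIF boundary can be sketched as follows (an editorial illustration; the frame-indexing convention, with LBIFs at indices `PERIOD-1`, `2*PERIOD-1`, and so on, is an assumption of the sketch):

```python
PERIOD = 4   # one LBIF every PERIOD frames in the higher bit rate stream

def next_switch_frame(request_frame, period=PERIOD):
    # LBIFs sit at indices period-1, 2*period-1, ...; a switch takes
    # effect at the frame immediately after an LBIF, i.e. at the next
    # multiple of `period` (ceiling division).
    return -(-request_frame // period) * period

# a request arriving mid-group is deferred to the next LBIF boundary
assert next_switch_frame(5) == 8
assert next_switch_frame(8) == 8
assert next_switch_frame(9) == 12
assert next_switch_frame(0) == 0
```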
`
`
`
`
`In an alternative embodiment, the multiple bitstreams are
`transmitted simultaneously over a transmission medium,
`such as the airwaves. The bitstreams are multiplexed
`together, and demultiplexed at the site of a decoder. With all
`of the bitstreams being available at the decoder location, the
`switching from one bitstream to the next is accomplished in
`the manner described above, only by switching between the
`received, demultiplexed bitstreams. Preferably, each frame
`of each bitstream is accompanied by coded data regarding
`the nature of the frame (i.e. whether it is a frame after which
`one may switch to a next higher bit rate bitstream, a next
`lower bit rate bitstream, or not at all).
`In another alternative embodiment, the input video signal
`is periodically coded in intraframe mode, such that frames of
`data are generated which correspond to interframe coded
`frames of the lowest rate bitstream, but which include all of
`the data necessary to independently recreate that frame of
`the video sequence. This embodiment does not have the high
`level of data compression of the preferred embodiment, but
`allows for random access. LBIFs are used in the higher rate
`bitstreams as points at which one may switch between the
`bitstreams with a minimum of quantization-based artifacts.
`However, the intraframe coded frames allow a user to begin
`the video sequence at any of the temporal points correspond-
`ing to the location of the intraframe coded frames. If a higher
`bit rate is thereafter desired, the rate may be increased at the
`appropriate LBIF locations, as described above. This
`embodiment is also useful in that it allows for fast forward
`
`and fast rewind of the video sequence by displaying the
`intraframe coded frames only, thus allowing a user to search
`quickly through the video sequence.
`In yet another embodiment of the invention, LBIFs are not
`inserted into the existing bitstreams. Instead, at least one
`(and typically a plurality of) “switch” frames are created.
`That is, transition data is stored on the server separate from
`the bitstreams containing the video data, and is used to
`provide an interframe decoding system with data that com-
`pensates for the difference in reconstructed frames of the two
`bitstreams. This compensation is typically for a given frame
`of video data at any point in time, each switch frame (or
`“S-frame”)
`therefore providing a point of continuity
`between the bitstreams only for that frame. The S-frame is
`preferably the difference between the two bitstreams for
`similar frames. Since a given frame represents a “time
`index” (a specific temporal point in the video sequence), any
`difference between frames that are reconstructed for a given
`time index from the first and second bitstream comes from
`
`the different bit rates (e.g., a difference in quantization levels
`or spatial resolution). Thus, taking the difference between
`reconstructed frames of the same time index (or consecutive
`time indexes) for the two bitstreams provides the informa-
`tion necessary to compensate the decoder for bitstream
`transition related artifacts.
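On the scalar model used in the earlier sketches, the S-frame idea reduces to a difference of reconstructions at a common time index (an editorial illustration; the reconstruction values are invented):

```python
# S-frame sketch: the S-frame is the difference between the two streams'
# reconstructions for the same time index, so adding it (or its inverse)
# to the decoder's state bridges the quality gap in either direction.

recon_low  = [40, 44, 48, 56]    # reconstructed frames, lower bit rate
recon_high = [41, 45, 47, 55]    # reconstructed frames, higher bit rate

t = 2                            # time index at which the switch occurs
s_frame = recon_high[t] - recon_low[t]

# switching low -> high: add the S-frame to the low-rate reconstruction
assert recon_low[t] + s_frame == recon_high[t]
# switching high -> low: the decoder first inverts the S-frame
assert recon_high[t] + (-s_frame) == recon_low[t]
```

The same single S-frame serves both transition directions, which corresponds to the two-directional embodiment described below.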
`
`
`In one version of the S-frame embodiment, the S-frames
`do not have a common time index with a frame from each
`
of the higher and lower bitstreams, and the coding of the
difference between reconstructed frames is enhanced by
motion compensation. Thus, the direction of transition (e.g.,
from the higher bit rate bitstream to the lower bit rate
bitstream) determines which difference must be taken. That
is, since the lower bit rate and upper bit rate frames used to
construct the S-frame are from consecutive (not
simultaneous) time indexes, it is necessary to subtract the
motion compensated frame having the earlier time index
from the frame having the later time index to generate the
right S-frame. Therefore, if the S-frame is intended to create
a point at which the decoding may change from the lower bit
rate bitstream to the higher bit rate bitstream, the S-frame is
generated by subtracting a motion compensated lower bit
rate frame (having an earlier time index) from a higher bit
rate frame (having a later time index).

If the S-frame is generated using frames from the lower
bit rate bitstream and the higher bit rate bitstream that have
the identical time index, a two-directional point of continu-
ity is created between the bitstreams by the S-frame. In that
case, motion compensation is omitted, and a single S-frame
can be used to transition from either the lower bit rate
bitstream to the higher bit rate bitstream, or vice versa. In
such an embodiment, the transmitted S-frame has the same
time index as the last frame of the first bitstream, and is
transmitted before a first frame of the second bitstream,
which typically has a subsequent time index. If the S-frame
was created by subtracting a frame of the first bitstream from
a frame of the second bitstream, it may be processed directly
by the decoder. However, for an S-frame used to switch from
the second bitstream to the first bitstream, the frame is first
inverted by the decoder before being added. This ensures
that the correct compensation is being provided by the
S-frame.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of a prior art interframe coding
apparatus.

FIG. 2 is a schematic view of a scalable interframe coding
apparatus according to the present invention.

FIG. 3 is a schematic view of a video-on-demand server
apparatus according to the present invention.

FIG. 4 is a diagrammatic view of three bitstreams that are
decoded using a lower bit rate insert frame architecture
according to the present invention.

FIG. 5 is a diagrammatic view of three bitstreams that are
decoded using a lower bit rate insert frame architecture of
the present invention and showing transition paths for
switching between the bitstreams.

FIG. 6A is a schematic view of the coding portion of an
alternative embodiment of the present invention in which
multiple bitstreams are transmitted over a broadband trans-
mission channel.

FIG. 6B is a schematic view of the decoding portion of the
alternative embodiment shown in FIG. 6A.

FIG. 7 is a diagrammatic view of three bitstreams gen-
erated using an alternative embodiment of the invention in
which intraframe coded frames of the input video signal are
periodically generated.

FIG. 8 is a diagrammatic view of three bitstreams gen-
erated using an alternative embodiment of the invention in
which multiple bitstreams are transmitted over a broadband
transmission channel, and in which intraframe coded frames
of the input video signal are generated and used to periodi-
cally replace an interframe coded frame of a low bit rate
bitstream.

FIG. 9 is a schematic view of a scalable interframe coding
apparatus similar to that of FIG. 2, but which makes use of
reference picture resampling elements.

FIG. 10 is a schematic view of a scalable interframe
coding apparatus according to an alternative embodiment of
the invention in which S-frames are separately generated
and stored.

FIG. 11 is a schematic view of a video-on-demand server
apparatus according to an alternative embodiment of the
invention in which S-frames are transmitted and used for bit
rate transitions.
`
`
`FIG. 12 is a diagrammatic view of bitstreams that are
`decoded using an S-frame architecture according to the
`present invention, the diagram showing transition paths for
`switching between the bitstreams.
`FIG. 13 is a schematic view of a scalable interframe
`
`coding apparatus according to the present invention that is
`similar to that of FIG. 10 and that makes use of reference
`
`picture resampling elements.
`FIG. 14 is a diagrammatic view of bitstreams that are
`decoded using an S-frame architecture according to the
`present
`invention in an embodiment
`that allows two-
`directional transitioning between the bitstreams.
`FIG. 15 is a schematic view of a video-on-demand server
`
`apparatus used according to an alternative embodiment of
`the invention in which two-directional transitioning between
`bitstreams is allowed.
`FIG. 16 is a schematic view of a scalable interframe
`
`coding apparatus according to another alternative embodi-
`ment of the invention in which two sets of S-frames are
`
`encoded, each allowing transitioning in one direction
`between two bitstreams.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
An interframe coding apparatus 100 according to the
present invention is shown schematically in FIG. 2. The
coding apparatus 100 contains three embedded coders 100a,
100b and 100c, each of which supplies data at a different bit
rate. It will be recognized by those skilled in the art that the
coding apparatus 100 may consist of any plural number of
embedded coders, so as to allow the support of any desired
number of different bit rates. It will also be understood that
the coders 100a, 100b, 100c may be any type of interframe
coder, and still make use of the inventive principles
described herein. The preferred embodiment of each of the
embedded coders 100a, 100b, 100c, however, is of essen-
tially the same structure as the prior art coder of FIG. 1.
Some of the more important details of the coders are
discussed briefly above in conjunction with FIG. 1.
`The coding apparatus 100 of FIG. 2 is arranged to allow
`the coding and storage of the same video signal at a variety
`of different bit rates. In particular, the video signal is coded
`using different resolutions of quantization in each of coders
100a, 100b, 100c, respectively. As shown, the output of
coder 100a is stored in memory unit 140a, the output of
coder 100b is stored in memory unit 140b, and the output of
coder 100c is stored in memory unit 140c. Once the video
`signal is coded and stored, the stored signals may be used as
`part of a video-on-demand server to provide the same video
`signal at any of a number of different bit rates. The manner
`in which the data is coded and stored allows for the bit rate
`
`to be changed during a transmission of the video signal by
`switching the output from, for example, memory 140a to
`memory 140b.
Multiple coders 100a, 100b, 100c are each designed for
`coding data with a different level of compression, so that
`each provides video data for transmission at a different bit
`rate. In general, the greater the number of quantization levels
`used by the coder, the higher the quality of the transmitted
`image, and the higher the bit rate. Thus,
`in the tradeoff
`between image quality and transmission bandwidth,
`the
`quality of a transmission channel often determines the
`bandwidth which will allow real time decoding and display
`at the receiving end of the transmission. If a variety of bit
`rates are available, handshaking commands between the
`destination and the source can be used to select the highest
`bit rate tolerable by the transmission channel (for real time
`decoding), thereby providing the best possible image qual-
`ity.
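The selection of the highest tolerable bit rate can be sketched as follows (an editorial illustration; the stored rates, in kbit/s, and the fallback-to-lowest-rate policy are assumptions of the sketch, not from the patent):

```python
# Handshaking sketch: pick the highest stored bit rate that the channel
# can carry for real-time decoding, falling back to the lowest stored
# rate when even that exceeds the channel capacity.

STORED_RATES = [56, 128, 384]     # e.g. outputs of coders 100a, 100b, 100c

def select_rate(channel_capacity):
    candidates = [r for r in STORED_RATES if r <= channel_capacity]
    return max(candidates) if candidates else min(STORED_RATES)

assert select_rate(200) == 128    # best rate the channel tolerates
assert select_rate(1000) == 384
assert select_rate(40) == 56      # fall back to the lowest stored rate
```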
`In the FIG. 2 embodiment, coder 100a codes the video
`signal with a coarseness of quantization which results in its
`output having the lowest bit rate of the signals provided by
`the coders. Similarly, the signal output by coder 100b has a
`less coarse quantization which produces the next higher bit
rate, and the signal output by coder 100c has an even less
`coarse quantization than coder 100b, which results in the
`highest bit rate. Thus, if a transmission channel b