(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2002/0071485 A1
Caglar et al.    (43) Pub. Date: Jun. 13, 2002

(54) VIDEO CODING

(76) Inventors: Kerem Caglar, Tampere (FI); Miska Hannuksela, Tampere (FI)

Correspondence Address:
PERMAN & GREEN
425 POST ROAD
FAIRFIELD, CT 06430 (US)

(21) Appl. No.: 09/935,119

(22) Filed: Aug. 21, 2001

(30) Foreign Application Priority Data

Aug. 21, 2000 (FI) ............................................. 20001847

Publication Classification

(51) Int. Cl.7 ....................................................... H04B 1/66
(52) U.S. Cl. ................................................... 375/240.01

(57) ABSTRACT

A method for encoding a video signal comprises the steps of: encoding a first complete frame by forming a bit-stream containing information for its subsequent full reconstruction (150), the information being prioritized (148) into high and low priority information; defining (160) at least one virtual frame on the basis of a version of the first complete frame constructed using the high priority information of the first complete frame in the absence of at least some of the low priority information of the first complete frame; and encoding (146) a second complete frame by forming a bit-stream containing information for its subsequent full reconstruction, the information being prioritized into high and low priority information enabling the second complete frame to be fully reconstructed on the basis of the virtual frame rather than on the basis of the first complete frame. A corresponding decoding method is also described.

[Front-page figure: bit-stream with displayed pictures and virtual pictures, not displayed (see FIG. 21).]
[Drawings, 16 sheets. Recoverable figure content:]

FIG. 1: video transmission system — Input Video → Source Coder (Waveform Coder, Entropy Coder) → Transport Coder → transmission channel → Transport Decoder → Source Decoder (Entropy Decoder, Waveform Decoder) → Output Video.
FIG. 2: picture sequence showing anchor/reference pictures.
FIG. 3: scalable bit-stream in IP multi-casting, with branches stripped to different bit rates (120 kbit/s and 60 kbit/s branches shown).
FIG. 4: SNR scalability — base layer and enhancement layer.
FIG. 5: spatial scalability — base layer and enhancement layer.
FIG. 6: prediction relationships in fine granularity scalable coding — INTRA frame and predicted frames; base layer plus 1st, 2nd and 3rd enhancement layers; quality from low to high.
FIG. 7: conventional prediction relationships in scalable coding — base layer plus 1st, 2nd and 3rd enhancement layers; quality from low to high.
FIGS. 8-10: further layered prediction arrangements (base layer with one or two enhancement layers).
FIGS. 11-12: prediction relationships over time.
FIGS. 13-15: prediction arrangements around scene cuts.
FIG. 16: video transmission system with uncompressed pictures passing between input/output video and VCL encoder/decoder, coded parameters between VCL layer and NAL packetizer/depacketizer, and transport coder SDUs between NAL layer and transport coder/decoder.
FIG. 17: H.26L syntax elements grouped into priority classes with dependency relations.
FIG. 18a: encoder flowchart — initialise frame counter (110), complete frame buffer (112) and virtual reference frame buffer (114); receive raw uncoded video data for a frame, e.g. from a video camera (116); select coding mode (INTER vs. INTRA) by predetermined scheme, scene cut detection or feedback from the remote decoder (118-142); encode current frame in INTRA format, or retrieve a reference from the complete frame buffer (144) and encode the current frame in INTER format using the raw video data and the selected reference frame.
FIG. 18b: encoder flowchart, continued — prioritise information (148); form bit-stream to be transmitted (150); transmit bit-stream to remote decoder (152); decode frame using complete coded data to form a complete reconstruction and store it in the complete frame buffer; decode frame using high priority information to form a virtual frame (160); store virtual frame in the virtual frame buffer (162).
FIG. 19: decoder flowchart — initialise virtual reference frame buffer (210), normal reference frame buffer and frame counter (212); receive bit-stream for a compressed frame (214); decode INTRA frames from the complete bit-stream (218); for INTER frames, identify and retrieve the indicated (normal or virtual) reference frame and decode using it as prediction reference; store newly decoded frames in the complete reference buffer; decode frames using high priority information to form the next virtual reference frame and store it in the virtual reference frame buffer (220-242).
FIG. 20: receiver feedback flowchart — bit-stream correctly received? (310); if not, and high priority information cannot be decoded, issue an INTRA update request to the sending terminal (312); if high priority information can be decoded (316) but low priority information cannot (318), instruct the sending terminal to encode the next frame with reference to the high priority information of the current frame.
FIG. 21: bit-stream with displayed pictures and virtual pictures that are not displayed.
FIG. 22: further picture prediction arrangement.
FIG. 23: communications system with elements 400, 402 and 404.
FIG. 24: video transmission system 500 — input source supplying uncompressed video to an encoder; ZPE-picture(s) and compressed picture(s) to a packetizer with TX-ZPE-decoder producing a ZPE-bit-stream; video packets via transmitter, transmission channel and receiver (522) to a depacketizer (524); ZPE-bit-stream and compressed picture(s) to a decoder (526) with RX-ZPE-decoder (528) yielding reconstructed video.

VIDEO CODING

FIELD OF THE INVENTION

[0001] The invention relates to data transmission and is particularly, but not exclusively, related to transmission of data representative of picture sequences, such as video. It is particularly suited to transmission over links susceptible to errors and loss of data, such as over the air interface of a cellular telecommunications system.
BACKGROUND OF THE INVENTION

[0002] During the past few years, the amount of multi-media content available through the Internet has increased considerably. Since data delivery rates to mobile terminals are becoming high enough to enable such terminals to retrieve multi-media content, it is becoming desirable to provide such retrieval from the Internet. An example of a high-speed data delivery system is the General Packet Radio Service (GPRS) of the planned GSM phase 2+.
[0003] The term multi-media as used herein includes both sound and pictures, sound only and pictures only. Sound includes speech and music.
[0004] In the Internet, transmission of multi-media content is packet-based. Network traffic through the Internet is based on a transport protocol called the Internet Protocol (IP). IP is concerned with transporting data packets from one location to another. It facilitates the routing of packets through intermediate gateways, that is, it allows data to be sent to machines (e.g. routers) that are not directly connected in the same physical network. The unit of data transported by the IP layer is called an IP datagram. The delivery service offered by IP is connectionless, that is IP datagrams are routed around the Internet independently of each other. Since no resources are permanently committed within the gateways to any particular connection, the gateways may occasionally have to discard datagrams because of lack of buffer space or other resources. Thus, the delivery service offered by IP is a best effort service rather than a guaranteed service.
[0005] Internet multi-media is typically streamed using the User Datagram Protocol (UDP), the Transmission Control Protocol (TCP) or the Hypertext Transfer Protocol (HTTP). UDP does not check that the datagrams have been received, does not retransmit missing datagrams, nor does it guarantee that the datagrams are received in the same order as they were transmitted. UDP is connectionless. TCP checks that the datagrams have been received and retransmits missing datagrams. It also guarantees that the datagrams are received in the same order as they were transmitted. TCP is connection orientated.
[0006] In order to ensure multi-media content of a sufficient quality is delivered, it can be provided over a reliable network connection, such as TCP, to ensure that received data are error-free and in the correct order. Lost or corrupted protocol data units are retransmitted.
[0007] Sometimes re-transmission of lost data is not handled by the transport protocol but rather by some higher-level protocol. Such a protocol can select the most vital lost parts of a multi-media stream and request the re-transmission of those. The most vital parts can be used for prediction of other parts of the stream, for example.
[0008] Multi-media content typically includes video. In order to be transmitted efficiently, video is often compressed. Therefore, compression efficiency is an important parameter in video transmission systems. Another important parameter is tolerance to transmission errors. Improvement in either one of these parameters tends to adversely affect the other and so a video transmission system should have a suitable balance between the two.
[0009] FIG. 1 shows a video transmission system. The system comprises a source coder which compresses an uncompressed video signal to a desired bit rate, thereby producing an encoded and compressed video signal, and a source decoder which decodes the encoded and compressed video signal to reconstruct the uncompressed video signal. The source coder comprises a waveform coder and an entropy coder. The waveform coder performs lossy video signal compression and the entropy coder losslessly converts the output of the waveform coder into a binary sequence. The binary sequence is conveyed from the source coder to a transport coder which encapsulates the compressed video according to a suitable transport protocol and then transmits it to a receiver comprising a transport decoder and a source decoder. The data is transmitted by the transport coder to the transport decoder over a transmission channel. The transport coder may also manipulate the compressed video in other ways. For example, it may interleave and modulate the data. After being received by the transport decoder, the data is passed on to the source decoder. The source decoder comprises a waveform decoder and an entropy decoder. The transport decoder and the source decoder perform inverse operations to obtain a reconstructed video signal for display. The receiver may also provide feedback to the transmitter. For example, the receiver may signal the rate of successfully received transmission data units.
[0010] A video sequence consists of a series of still images. A video sequence is compressed by reducing its redundant and perceptually irrelevant parts. The redundancy in a video sequence can be categorised as spatial, temporal and spectral redundancy. Spatial redundancy refers to the correlation between neighbouring pixels within the same image. Temporal redundancy refers to the fact that objects appearing in a previous image are likely to appear in a current image. Spectral redundancy refers to the correlation between the different colour components of an image.
[0011] Temporal redundancy can be reduced by generating motion compensation data, which describes relative motion between the current image and a previous image (referred to as a reference or anchor picture). Effectively the current image is formed as a prediction from a previous one and the technique by which this is achieved is commonly referred to as motion compensated prediction or motion compensation. In addition to predicting one picture from another, parts or areas of a single picture may be predicted from other parts or areas of that picture.
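Purely as an illustration (the patent does not prescribe any particular motion estimation algorithm), the following Python sketch shows full-search block matching, a simple form of motion compensated prediction. The block size, search range, SAD matching criterion and the assumption that frame dimensions are multiples of the block size are all choices made for the example; the residual it returns corresponds to the prediction error image discussed below in connection with INTER frames.

```python
import numpy as np

def motion_compensate(ref, cur, block=16, search=8):
    """Full-search block matching: predict each block of the current
    picture from the best-matching (minimum-SAD) block of the reference
    picture. Assumes frame dimensions are multiples of the block size."""
    h, w = cur.shape
    pred = np.zeros_like(cur)
    vectors = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            target = cur[y:y + block, x:x + block].astype(int)
            best_vec, best_sad = (0, 0), None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = y + dy, x + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue  # candidate block falls outside the frame
                    cand = ref[ry:ry + block, rx:rx + block].astype(int)
                    sad = np.abs(target - cand).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_vec = sad, (dy, dx)
            dy, dx = best_vec
            pred[y:y + block, x:x + block] = ref[y + dy:y + dy + block,
                                                 x + dx:x + dx + block]
            vectors[(y, x)] = best_vec
    # The prediction error (residual) image is what an INTER frame
    # additionally carries, in spatially compressed form.
    residual = cur.astype(int) - pred.astype(int)
    return vectors, pred, residual
```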
[0012] A sufficient level of compression cannot usually be reached just by reducing the redundancy of a video sequence. Therefore, video encoders also try to reduce the quality of those parts of the video sequence which are subjectively less important. In addition, the redundancy of the encoded bit-stream is reduced by means of efficient lossless coding of compression parameters and coefficients. The main technique is to use variable length codes.
[0013] Video compression methods typically differentiate images on the basis of whether they do or do not utilise temporal redundancy reduction (that is, whether they are predicted or not). Referring to FIG. 2, compressed images which do not utilise temporal redundancy reduction methods are usually called INTRA or I-frames. INTRA frames are frequently introduced to prevent the effects of packet losses from propagating spatially and temporally. In broadcast situations, INTRA frames enable new receivers to start decoding the stream, that is they provide "access points". Video coding systems typically enable insertion of INTRA frames periodically every n seconds or every n frames. It is also advantageous to utilise INTRA frames at natural scene cuts where the image content changes so much that temporal prediction from the previous image is unlikely to be successful or desirable in terms of compression efficiency.
[0014] Compressed images which do utilise temporal redundancy reduction methods are usually called INTER or P-frames. INTER frames employing motion-compensation are rarely precise enough to allow sufficiently accurate image reconstruction and so a spatially compressed prediction error image is also associated with each INTER frame. This represents the difference between the current frame and its prediction.
[0015] Many video compression schemes also introduce temporally bi-directionally-predicted frames, which are commonly referred to as B-pictures or B-frames. B-frames are inserted between anchor (I or P) frame pairs and are predicted from either one or both of the anchor frames, as shown in FIG. 2. B-frames are not themselves used as anchor frames, that is other frames are never predicted from them, and they are simply used to enhance perceived image quality by increasing the picture display rate. As they are never used themselves as anchor frames, they can be dropped without affecting the decoding of subsequent frames. This enables a video sequence to be decoded at different rates according to bandwidth constraints of the transmission network, or different decoder capabilities.
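As a minimal illustration of this property (the frame pattern and the tuple representation are assumptions for the example, not taken from the patent):

```python
# Each coded picture is (type, index); the types follow an IBBPBBP pattern.
gop = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4), ("B", 5), ("P", 6)]

def drop_b_frames(gop):
    """B-pictures are never used as prediction references, so they can be
    discarded to lower bit rate and frame rate without affecting the
    decodability of the remaining I- and P-pictures."""
    return [pic for pic in gop if pic[0] != "B"]

print(drop_b_frames(gop))  # [('I', 0), ('P', 3), ('P', 6)]
```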
[0016] The term group of pictures (GOP) is used to describe an INTRA frame followed by a sequence of temporally predicted (P or B) pictures predicted from it.
[0017] Various international video coding standards have been developed. Generally, these standards define the bit-stream syntax used to represent a compressed video sequence and the way in which the bit-stream is decoded. One such standard, H.263, is a recommendation developed by the International Telecommunications Union (ITU). Currently, there are two versions of H.263. Version 1 consists of a core algorithm and four optional coding modes. H.263 version 2 is an extension of version 1 which provides twelve negotiable coding modes. H.263 version 3, which is presently under development, is intended to contain two new coding modes and a set of additional supplemental enhancement information code-points.
[0018] According to H.263, pictures are coded as a luminance component (Y) and two colour difference (chrominance) components (CB and CR). The chrominance components are sampled at half spatial resolution along both co-ordinate axes compared to the luminance component. The luminance data and spatially sub-sampled chrominance data is assembled into macroblocks (MBs). Typically a macroblock comprises 16x16 pixels of luminance data and the spatially corresponding 8x8 pixels of chrominance data.
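A minimal sketch of this macroblock structure (only the array shapes follow the text; the zero-filled example pictures are placeholders):

```python
import numpy as np

def to_macroblocks(y, cb, cr, mb=16):
    """Split a picture into macroblocks: 16x16 luminance pixels plus the
    spatially corresponding 8x8 pixels of each chrominance component,
    chrominance being at half resolution along both axes."""
    blocks = []
    for r in range(0, y.shape[0], mb):
        for c in range(0, y.shape[1], mb):
            blocks.append({
                "Y":  y[r:r + mb, c:c + mb],                    # four 8x8 blocks
                "CB": cb[r // 2:r // 2 + mb // 2, c // 2:c // 2 + mb // 2],
                "CR": cr[r // 2:r // 2 + mb // 2, c // 2:c // 2 + mb // 2],
            })
    return blocks

# A QCIF picture (176x144 luminance) yields (176/16) * (144/16) = 99 MBs.
y  = np.zeros((144, 176), dtype=np.uint8)
cb = np.zeros((72, 88), dtype=np.uint8)
cr = np.zeros((72, 88), dtype=np.uint8)
assert len(to_macroblocks(y, cb, cr)) == 99
```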
[0019] Each coded picture, as well as the corresponding coded bit-stream, is arranged in a hierarchical structure with four layers which are, from top to bottom, a picture layer, a picture segment layer, a macroblock (MB) layer and a block layer. The picture segment layer can be either a group of blocks layer or a slice layer.
[0020] The picture layer data contains parameters affecting the whole picture area and the decoding of the picture data. The picture layer data is arranged in a so-called picture header.
[0021] By default, each picture is divided into groups of blocks. A group of blocks (GOB) typically comprises 16 sequential pixel lines. Data for each GOB comprises an optional GOB header followed by data for macroblocks.
[0022] If an optional slice structured mode is used, each picture is divided into slices instead of GOBs. Data for each slice comprises a slice header followed by data for macroblocks.
[0023] A slice defines a region within a coded picture. Typically, the region is a number of macroblocks in normal scanning order. There are no prediction dependencies across slice boundaries within the same coded picture. However, temporal prediction can generally cross slice boundaries unless H.263 Annex R (Independent Segment Decoding) is used. Slices can be decoded independently from the rest of the image data (except for the picture header). Consequently, the use of slice structured mode improves error resilience in packet-based networks that are prone to packet loss, so-called packet-lossy networks.
[0024] Picture, GOB and slice headers begin with a synchronisation code. No other code word or valid combination of code words can form the same bit pattern as the synchronisation codes. Thus, the synchronisation codes can be used for bit-stream error detection and re-synchronisation after bit errors. The more synchronisation codes that are added to the bit-stream the more error-robust coding becomes.
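A sketch of such re-synchronisation, assuming the bit-stream has already been unpacked into a '0'/'1' string. The default pattern here, sixteen zero bits followed by a one, is the prefix shared by H.263 start codes, but it is used only for illustration; the exact code words are defined in the recommendation.

```python
def resync_positions(bits, sync="00000000000000001"):
    """Scan a bit string for synchronisation codes. Because no valid
    combination of other code words can reproduce the sync pattern, a
    decoder that detects a bit error can skip ahead to the next such
    position and resume decoding from there."""
    positions, i = [], bits.find(sync)
    while i != -1:
        positions.append(i)
        i = bits.find(sync, i + 1)
    return positions
```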
[0025] Each GOB or slice is divided into macroblocks. As explained above, a macroblock comprises 16x16 pixels of luminance data and the spatially corresponding 8x8 pixels of chrominance data. In other words, an MB comprises four 8x8 blocks of luminance data and the two spatially corresponding 8x8 blocks of chrominance data.
[0026] A block comprises 8x8 pixels of luminance or chrominance data. Block layer data consists of uniformly quantised discrete cosine transform coefficients, which are scanned in zig-zag order, processed with a run-length encoder and coded with variable length codes, as explained in detail in ITU-T recommendation H.263.
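A sketch of the scan and run-length stage (the final variable length coding step is omitted; H.263 defines the actual code tables):

```python
def zigzag_indices(n=8):
    """(row, col) pairs in zig-zag scanning order for an n x n block:
    diagonals of constant row+col, alternating traversal direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_length_encode(block):
    """Zig-zag scan a quantised 8x8 DCT block and emit (run, level) pairs,
    where `run` counts the zero coefficients preceding each non-zero
    `level`. Trailing zeros are implied by an end-of-block code."""
    pairs, run = [], 0
    for r, c in zigzag_indices(len(block)):
        v = int(block[r][c])
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs
```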
[0027] One useful property of coded bit-streams is scalability. In the following, bit-rate scalability is described. The term bit-rate scalability refers to the ability of a compressed sequence to be decoded at different data rates. A compressed sequence encoded so as to have bit-rate scalability can be streamed over channels with different bandwidths and can be decoded and played back in real-time at different receiving terminals.
[0028] Scalable multi-media is typically ordered into hierarchical layers of data. A base layer contains an individual representation of multi-media data, such as a video sequence, and enhancement layers contain refinement data which can be used in addition to the base layer. The quality of the multi-media clip improves progressively as enhancement layers are added to the base layer. Scalability may take many different forms including, but not limited to, temporal, signal-to-noise-ratio (SNR) and spatial scalability, all of which are described in further detail below.
[0029] Scalability is a desirable property for heterogeneous and error prone environments such as the Internet and wireless channels in cellular communications networks. This property is desirable in order to counter limitations such as constraints on bit rate, display resolution, network throughput and decoder complexity.
[0030] In multi-point and broadcast multi-media applications, constraints on network throughput may not be foreseen at the time of encoding. Thus, it is advantageous to encode multi-media content to form a scalable bit-stream. An example of a scalable bit-stream being used in IP multi-casting is shown in FIG. 3. Each router (R1-R3) can strip the bit-stream according to its capabilities. In this example, the server S has a multi-media clip which can be scaled to at least three bit rates, 120 kbit/s, 60 kbit/s and 28 kbit/s. In the case of a multi-cast transmission, where the same bit-stream is delivered to multiple clients at the same time with as few copies of the bit-stream being generated in the network as possible, it is beneficial from the point of view of network bandwidth to transmit a single, bit-rate-scalable bit-stream.
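A sketch of the stripping decision, using the bit rates of the FIG. 3 example; the split of the 120 kbit/s clip into cumulative layers of 28 + 32 + 60 kbit/s is an assumption made for the illustration:

```python
# Hypothetical layered clip: (layer name, added bit rate in kbit/s).
# The base layer alone costs 28 kbit/s; each enhancement layer adds to
# the total, giving the 28/60/120 kbit/s operating points of FIG. 3.
LAYERS = [("base", 28), ("enh1", 32), ("enh2", 60)]

def strip_for_bandwidth(layers, downstream_kbits):
    """Keep the longest prefix of layers whose cumulative rate fits the
    downstream link; a router forwards only these layers."""
    kept, total = [], 0
    for name, rate in layers:
        if total + rate > downstream_kbits:
            break
        kept.append(name)
        total += rate
    return kept

print(strip_for_bandwidth(LAYERS, 64))   # ['base', 'enh1'] -> 60 kbit/s
print(strip_for_bandwidth(LAYERS, 128))  # all three layers -> 120 kbit/s
```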
[0031] If a sequence is downloaded and played back in different devices each having different processing powers, bit-rate scalability can be used in devices having lower processing power to provide a lower quality representation of the video sequence by decoding only a part of the bit-stream. Devices having higher processing power can decode and play the sequence with full quality. Additionally, bit-rate scalability means that the processing power needed for decoding a lower quality representation of the video sequence is lower than when decoding the full quality sequence. This can be viewed as a form of computational scalability.
[0032] If a video sequence is pre-stored in a streaming server, and the server has to temporarily reduce the bit-rate at which it is being transmitted as a bit-stream, for example in order to avoid congestion in the network, it is advantageous if the server can reduce the bit-rate of the bit-stream whilst still transmitting a useable bit-stream. This is typically achieved using bit-rate scalable coding.
[0033] Scalability can also be used to improve error resilience in a transport system where layered coding is combined with transport prioritisation. The term transport prioritisation is used to describe mechanisms that provide different qualities of service in transport. These include unequal error protection, which provides different channel error/loss rates, and assigning different priorities to support different delay/loss requirements. For example, the base layer of a scalably encoded bit-stream may be delivered through a transmission channel with a high degree of error protection, whereas the enhancement layers may be transmitted in more error-prone channels.
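A toy policy sketch of this layer-to-channel mapping (the channel names and protection attributes are invented for the illustration):

```python
# Hypothetical transport prioritisation policy, not taken from the patent.
CHANNELS = {
    "protected":   {"error_protection": "high", "loss_rate": "low"},
    "best_effort": {"error_protection": "none", "loss_rate": "higher"},
}

def assign_channel(layer_index):
    """Send the base layer (index 0) over the well-protected channel and
    every enhancement layer over the cheaper, more error-prone one."""
    return "protected" if layer_index == 0 else "best_effort"
```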
[0034] One problem with scalable multi-media coding is that it often suffers from a worse compression efficiency than non-scalable coding. A high-quality scalable video sequence generally requires more bandwidth than a non-scalable, single-layer video sequence of a corresponding quality. However, exceptions to this general rule do exist. For example, because B-frames can be dropped from a compressed video sequence without adversely affecting the quality of subsequently coded pictures, they can be regarded as providing a form of temporal scalability. In other words, the bit-rate of a video sequence compressed to form a sequence of temporally predicted pictures including e.g. alternating P and B frames can be reduced by removing the B-frames. This has the effect of reducing the frame-rate of the compressed sequence, hence the term temporal scalability. In many cases, the use of B-frames may actually improve coding efficiency, especially at high frame rates, and thus a compressed video sequence comprising B-frames in addition to P-frames may exhibit a higher compression efficiency than a sequence having equivalent quality encoded using only P-frames. However, the improvement in compression performance provided by B-frames is achieved at the expense of increased computational complexity and memory requirements. Additional delays are also introduced.
[0035] Signal-to-Noise Ratio (SNR) scalability is illustrated in FIG. 4. SNR scalability involves the creation of a multi-rate bit-stream. It allows for the recovery of coding errors, or differences, between an original picture and its reconstruction. This is achieved by using a finer quantiser to encode a difference picture in an enhancement layer. This additional information increases the SNR of the overall reproduced picture.
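A pixel-domain sketch of the two-quantiser idea (real codecs apply the quantisers to transform coefficients, and the step sizes here are arbitrary):

```python
import numpy as np

def encode_snr_layers(original, q_base=16, q_enh=4):
    """Two-layer SNR-scalable sketch: quantise the picture coarsely for
    the base layer, then quantise the remaining coding error with a
    finer step for the enhancement layer."""
    base = np.round(original / q_base) * q_base       # coarse base layer
    residual = original - base                        # coding error
    enhancement = np.round(residual / q_enh) * q_enh  # finer quantiser
    return base, enhancement

original = np.array([[100.0, 37.0], [58.0, 201.0]])
base, enh = encode_snr_layers(original)
print(base)        # base-only reconstruction: coarse approximation
print(base + enh)  # adding the enhancement layer raises the SNR
```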
[0036] Spatial scalability allows for the creation of multi-resolution bit-streams to meet varying display requirements/constraints. A spatially scalable structure is shown in FIG. 5. It is similar to that used in SNR scalability. In spatial scalability, a spatial enhancement layer is used to recover the coding loss between an up-sampled version of the reconstructed layer used as a reference by the enhancement layer, that is the reference layer, and a higher resolution version of the original picture. For example, if the reference layer has a Quarter Common Intermediate Format (QCIF) resolution, 176x144 pixels, and the enhancement layer has a Common Intermediate Format (CIF) resolution, 352x288 pixels, the reference layer picture must be scaled accordingly such that the enhancement layer picture can be appropriately predicted from it. According to H.263 the resolution is increased by a factor of two in the vertical direction only, horizontal direction only, or both the vertical and horizontal directions for a single enhancement layer. There can be multiple enhancement layers, each increasing picture resolution over that of the previous layer. Interpolation filters used to up-sample the reference layer picture are explicitly defined in H.263. Apart from the up-sampling process from the reference to the enhancement layer, the processing and syntax of a spatially scaled picture are identical to those of an SNR scaled picture. Spatial scalability provides increased spatial resolution over SNR scalability.
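A sketch of the prediction structure for the QCIF-to-CIF case (H.263 defines the actual interpolation filters; pixel replication is used here only as a stand-in):

```python
import numpy as np

def upsample2x(ref):
    """Double resolution in both directions by pixel replication.
    (H.263 specifies particular interpolation filters for this step;
    replication is used here only to show the prediction structure.)"""
    return np.repeat(np.repeat(ref, 2, axis=0), 2, axis=1)

def spatial_enhancement_residual(ref_qcif, original_cif):
    """The spatial enhancement layer codes the difference between the
    up-sampled reconstructed reference layer and the higher resolution
    original picture."""
    return original_cif - upsample2x(ref_qcif)

ref = np.zeros((144, 176))  # QCIF reference layer reconstruction
cif = np.zeros((288, 352))  # CIF original
assert spatial_enhancement_residual(ref, cif).shape == (288, 352)
```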
[0037] In either SNR or spatial scalability, the enhancement layer pictures are referred to as EI- or EP-pictures. If the enhancement layer picture is upwardly predicted from an INTRA picture in the reference layer, then the enhancement layer picture is referred to as an Enhancement-I (EI) picture. In some cases, when reference layer pictures are poorly predicted, over-coding of static parts of the picture can occur in the enhancement layer, requiring an excessive bit rate. To avoid this problem, forward prediction is permitted in the enhancement layer. A picture that is forwardly predicted from a previous enhancement layer picture or upwardly predicted from a predicted picture in the reference layer is referred to as an Enhancement-P (EP) picture. Computing the average of both upwardly and forwardly predicted pictures can provide a bi-directional prediction option for EP-pictures. Upward prediction of EI- and EP-pictures from a reference layer picture implies that no motion vectors are required. In the case of forward prediction for EP-pictures, motion vectors are required.
[0038] The scalability mode (Annex O) of H.263 specifies syntax to support temporal, SNR, and spatial scalability capabilities.
[0039] One problem with conventional SNR scalability coding is termed drifting. Drifting refers to the impact of a transmission error. A visual artefact caused by an error drifts temporally from the picture in which the error occurs. Due to the use of motion compensation, the area of the visual artefact may increase from picture to picture. In the case of scalable coding, the visual artefact also drifts from lower enhancement layers to higher layers. The effect of drifting can be explained with reference to FIG. 7, which shows conventional prediction relationships used in scalable coding. Once an error or packet loss has occurred in an enhancement layer, it propagates to the end of a group of pictures (GOP), since the pictures are predicted from each other in sequence. In addition, since the enhancement layers are based on the base layer, an error in the base layer causes errors in the enhancement layers. Because prediction also occurs between the enhancement layers, a serious drifting problem can occur in the higher layers of subsequent predicted frames. Even though there may subsequently be sufficient bandwidth to send data to correct an error, the decoder is not able to eliminate the error until the prediction chain is re-initialised by another INTRA picture representing the start of a new GOP.
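The propagation can be made concrete with a toy simulation; the frame pattern and loss position are invented for the illustration:

```python
def simulate_drift(frame_types, lost_frame):
    """Mark which frames show an artefact when `lost_frame` is corrupted:
    each P-frame is predicted from its predecessor, so damage persists
    until the next INTRA frame re-initialises the prediction chain."""
    corrupted, damaged = [], False
    for i, kind in enumerate(frame_types):
        if kind == "I":
            damaged = False  # INTRA frame: no temporal prediction
        if i == lost_frame:
            damaged = True   # transmission error hits this frame
        corrupted.append(damaged)
    return corrupted

gop = list("IPPPPIPPPP")
print(simulate_drift(gop, lost_frame=2))
# [False, False, True, True, True, False, False, False, False, False]
```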
[0040] To deal with this problem, a form of scalability referred to as Fine Granularity Scalability (FGS) has been developed. In FGS a low-quality base layer is coded using a hybrid predictive loop and an (additional) enhancement layer delivers the progressively encoded residue between the reconstructed base layer and the original frame. FGS has been proposed, for example, in MPEG-4 visual standardisation.
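Bit-plane coding is one common way to produce such a progressively decodable residue (it is used, for example, in MPEG-4 FGS); the sketch below ignores signs and entropy coding:

```python
import numpy as np

def to_bitplanes(residual, planes=8):
    """Encode a non-negative residual as bit-planes, most significant
    first. A decoder can stop after any number of planes and still
    refine the base layer, which is what makes the enhancement layer
    finely (bit-)granular."""
    return [(residual >> p) & 1 for p in range(planes - 1, -1, -1)]

def from_bitplanes(planes, total=8):
    """Reconstruct from however many planes were actually received."""
    value = np.zeros_like(planes[0])
    for i, plane in enumerate(planes):
        value |= plane << (total - 1 - i)
    return value

residual = np.array([[200, 13], [90, 7]])
planes = to_bitplanes(residual)
print(from_bitplanes(planes[:4]))  # coarse: top four bit-planes only
print(from_bitplanes(planes))      # exact residual
```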
[0041] An example of prediction relationships in fine granularity scalable coding is shown in FIG. 6. In a fine granularity scalable video coding scheme, the base-layer video is transmitted in a well-controlled channel (e.g. one with a high degree of error protection) to minimise error or packet-loss, in such a way that the base layer is encoded to fit into the minimum channel bandwidth. This minimum is the lowest bandwidth that may occur or may be encountered during operation. All enhancement layers in the prediction frames are coded based on the base layer in the reference frames. Thus, errors in the enhancement layer of one frame do not cause a drifting problem in the enhancement layers of subsequently predicted frames and the coding scheme can adapt to channel conditions. However, since prediction is always based on a low quality base-layer, the coding efficiency of FGS coding is not as good as, and is sometimes much worse than, conventional SNR scalability schemes such as those provided for in H.263 Annex O.
[0042] In order to combine the advantages of both FGS coding and conventional layered scalability coding, a hybrid coding scheme shown in FIG. 8 has been proposed which is called Progressive FGS (PFGS). There are two points to note. Firstly, in PFGS as many predictions as possible from the same layer are used to maintain coding efficiency. Secondly, a prediction path always uses prediction from a lower layer in the reference frame to enable error recovery and channel adaptation. The first point makes sure that, for a given video layer, motion prediction is as accurate as possible, thus maintaining coding efficiency. The second point makes sure that drifting is reduced in the case of
