IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 9, SEPTEMBER 2007

Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Heiko Schwarz, Detlev Marpe, Member, IEEE, and Thomas Wiegand, Member, IEEE

(Invited Paper)
Abstract—With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

Index Terms—H.264/AVC, MPEG-4, Scalable Video Coding (SVC), standards, video.
I. INTRODUCTION

ADVANCES in video coding technology and standardization [1]-[6], along with the rapid developments and improvements of network infrastructures, storage capacity, and computing power, are enabling an increasing number of video applications. Application areas today range from multimedia messaging, video telephony, and video conferencing over mobile TV, wireless and wired Internet video streaming, and standard- and high-definition TV broadcasting to DVD, Blu-ray Disc, and HD DVD optical storage media. For these applications, a variety of video transmission and storage systems may be employed.
Traditional digital video transmission and storage systems are based on H.222.0 | MPEG-2 Systems [7] for broadcasting services over satellite, cable, and terrestrial transmission channels, and for DVD storage, or on H.320 [8] for conversational video conferencing services. These channels are typically characterized by a fixed spatio-temporal format of the video signal (SDTV or HDTV or CIF for H.320 video telephony). Their application behavior in such systems typically falls into one of two categories: it works or it does not work.

Manuscript received October 6, 2006; revised July 15, 2007. This paper was recommended by Guest Editor T. Wiegand.
The authors are with the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: hschwarz@hhi.fhg.de; marpe@hhi.fhg.de; wiegand@hhi.fhg.de).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2007.905532
Modern video transmission and storage systems using the Internet and mobile networks are typically based on RTP/IP [9] for real-time services (conversational and streaming) and on computer file formats like mp4 or 3gp. Most RTP/IP access networks are typically characterized by a wide range of connection qualities and receiving devices. The varying connection quality results from adaptive resource sharing mechanisms of these networks addressing the time-varying data throughput requirements of a varying number of users. The variety of devices with different capabilities, ranging from cell phones with small screens and restricted processing power to high-end PCs with high-definition displays, results from the continuous evolution of these endpoints.

Scalable Video Coding (SVC) is a highly attractive solution to the problems posed by the characteristics of modern video transmission systems. The term "scalability" in this paper refers to the removal of parts of the video bit stream in order to adapt it to the various needs or preferences of end users as well as to varying terminal capabilities or network conditions. The term SVC is used interchangeably in this paper for both the concept of SVC in general and for the particular new design that has been standardized as an extension of the H.264/AVC standard.
The objective of the SVC standardization has been to enable the encoding of a high-quality video bit stream that contains one or more subset bit streams that can themselves be decoded with a complexity and reconstruction quality similar to that achieved using the existing H.264/AVC design with the same quantity of data as in the subset bit stream.
SVC has been an active research and standardization area for at least 20 years. The prior international video coding standards H.262 | MPEG-2 Video [3], H.263 [4], and MPEG-4 Visual [5] already include several tools by which the most important scalability modes can be supported. However, the scalable profiles of those standards have rarely been used. Reasons for that include the characteristics of traditional video transmission systems as well as the fact that the spatial and quality scalability features came along with a significant loss in coding efficiency as well as a large increase in decoder complexity as compared to the corresponding nonscalable profiles. It should be noted that two or more single-layer streams, i.e., nonscalable streams, can always be transmitted by the method of simulcast, which in principle provides similar functionalities as a scalable bit stream,
1051-8215/$25.00 © 2007 IEEE
although typically at the cost of a significant increase in bit rate. Moreover, the adaptation of a single stream can be achieved through transcoding, which is currently used in multipoint control units in video conferencing systems or for streaming services in 3G systems. Hence, a scalable video codec has to compete against these alternatives.

This paper describes the SVC extension of H.264/AVC and is organized as follows. Section II explains the fundamental scalability types and discusses some representative applications of SVC as well as their implications in terms of essential requirements. Section III gives the history of SVC. Section IV briefly reviews basic design concepts of H.264/AVC. In Section V, the concepts for extending H.264/AVC toward an SVC standard are described in detail and analyzed regarding effectiveness and complexity. The SVC high-level design is summarized in Section VI. For more detailed information about SVC, the reader is referred to the draft standard [10].
II. TYPES OF SCALABILITY, APPLICATIONS, AND REQUIREMENTS

In general, a video bit stream is called scalable when parts of the stream can be removed in a way that the resulting substream forms another valid bit stream for some target decoder, and the substream represents the source content with a reconstruction quality that is less than that of the complete original bit stream but is high when considering the lower quantity of remaining data. Bit streams that do not provide this property are referred to as single-layer bit streams. The usual modes of scalability are temporal, spatial, and quality scalability. Spatial scalability and temporal scalability describe cases in which subsets of the bit stream represent the source content with a reduced picture size (spatial resolution) or frame rate (temporal resolution), respectively. With quality scalability, the substream provides the same spatio-temporal resolution as the complete bit stream, but with a lower fidelity—where fidelity is often informally referred to as signal-to-noise ratio (SNR). Quality scalability is also commonly referred to as fidelity or SNR scalability. More rarely required scalability modes are region-of-interest (ROI) and object-based scalability, in which the substreams typically represent spatially contiguous regions of the original picture area. The different types of scalability can also be combined, so that a multitude of representations with different spatio-temporal resolutions and bit rates can be supported within a single scalable bit stream.
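The adaptation process described above can be sketched in a few lines. The packet structure and field names below are illustrative assumptions for this paper's notion of temporal, spatial, and quality layers; they are not taken from the standard's actual syntax.

```python
# Minimal sketch of substream extraction: each coded packet is assumed to
# carry temporal (T), spatial (D), and quality (Q) layer identifiers, and a
# substream is formed by discarding packets above chosen thresholds.
# The Packet structure and field names are illustrative, not from the standard.
from dataclasses import dataclass

@dataclass
class Packet:
    temporal_id: int  # T: 0 = temporal base layer
    spatial_id: int   # D: 0 = lowest spatial resolution
    quality_id: int   # Q: 0 = base quality

def extract_substream(packets, max_t, max_d, max_q):
    """Keep only packets at or below the requested operation point."""
    return [p for p in packets
            if p.temporal_id <= max_t
            and p.spatial_id <= max_d
            and p.quality_id <= max_q]

stream = [Packet(t, d, q) for t in range(3) for d in range(2) for q in range(2)]
base_only = extract_substream(stream, max_t=0, max_d=0, max_q=0)
print(len(stream), len(base_only))  # 12 1
```

The point of the sketch is that adaptation is pure packet discarding: no re-encoding or transcoding is needed to reach a lower operation point.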
Efficient SVC provides a number of benefits in terms of applications [11]-[13]—a few of which will be briefly discussed in the following. Consider, for instance, the scenario of a video transmission service with heterogeneous clients, where multiple bit streams of the same source content differing in coded picture size, frame rate, and bit rate should be provided simultaneously. With the application of a properly configured SVC scheme, the source content has to be encoded only once—for the highest required resolution and bit rate, resulting in a scalable bit stream from which representations with lower resolution and/or quality can be obtained by discarding selected data. For instance, a client with restricted resources (display resolution, processing power, or battery power) needs to decode only a part of the delivered bit stream. Similarly, in a multicast scenario, terminals with different capabilities can be served by a single scalable bit stream. In an alternative scenario, an existing video format (like QVGA) can be extended in a backward compatible way by an enhancement video format (like VGA).
Another benefit of SVC is that a scalable bit stream usually contains parts with different importance in terms of decoded video quality. This property in conjunction with unequal error protection is especially useful in any transmission scenario with unpredictable throughput variations and/or relatively high packet loss rates. By using a stronger protection of the more important information, error resilience with graceful degradation can be achieved up to a certain degree of transmission errors. Media-Aware Network Elements (MANEs), which receive feedback messages about the terminal capabilities and/or channel conditions, can remove the nonrequired parts from a scalable bit stream before forwarding it. Thus, the loss of important transmission units due to congestion can be avoided and the overall error robustness of the video transmission service can be substantially improved.

SVC is also highly desirable for surveillance applications, in which video sources not only need to be viewed on multiple devices ranging from high-definition monitors to videophones or PDAs, but also need to be stored and archived. With SVC, for instance, high-resolution/high-quality parts of a bit stream can ordinarily be deleted after some expiration time, so that only low-quality copies of the video are kept for long-term archival. The latter approach may also become an interesting feature in personal video recorders and home networking.
Even though SVC schemes offer such a variety of valuable functionalities, the scalable profiles of existing standards have rarely been used in the past, mainly because spatial and quality scalability have historically come at the price of increased decoder complexity and significantly decreased coding efficiency. In contrast to that, temporal scalability is often supported, e.g., in H.264/AVC-based applications, but mainly because it comes along with a substantial coding efficiency improvement (cf. Section V-A.2).

H.264/AVC is the most recent international video coding standard. It provides significantly improved coding efficiency in comparison to all prior standards [14]. H.264/AVC has attracted a lot of attention from industry, has been adopted by various application standards, and is increasingly used in a broad variety of applications. It is expected that in the near-term future H.264/AVC will be commonly used in most video applications. Given this high degree of adoption and deployment of the new standard and taking into account the large investments that have already taken place for preparing and developing H.264/AVC-based products, it is quite natural to now build an SVC scheme as an extension of H.264/AVC and to reuse its key features.

Considering the needs of today's and future video applications as well as the experiences with scalable profiles in the past, the success of any future SVC standard critically depends on the following essential requirements.
• Similar coding efficiency compared to single-layer coding—for each subset of the scalable bit stream.
• Little increase in decoding complexity compared to single-layer decoding that scales with the decoded spatio-temporal resolution and bit rate.
• Support of temporal, spatial, and quality scalability.
• Support of a backward compatible base layer (H.264/AVC in this case).
• Support of simple bit stream adaptations after encoding.
In any case, the coding efficiency of scalable coding should be clearly superior to that of "simulcasting" the supported spatio-temporal resolutions and bit rates in separate bit streams. In comparison to single-layer coding, bit rate increases of 10% to 50% for the same fidelity might be tolerable depending on the specific needs of an application and the supported degree of scalability.
This paper provides an overview of how these requirements have been addressed in the design of the SVC extension of H.264/AVC.
III. HISTORY OF SVC
Hybrid video coding, as found in H.264/AVC [6] and all past video coding designs that are in widespread application use, is based on motion-compensated temporal differential pulse code modulation (DPCM) together with spatial decorrelating transformations [15]. DPCM is characterized by the use of synchronous prediction loops at the encoder and decoder. Differences between these prediction loops lead to a "drift" that can accumulate over time and produce annoying artifacts. However, the scalability bit stream adaptation operation, i.e., the removal of parts of the video bit stream, can produce such differences.
Subband or transform coding does not have the drift property of DPCM. Therefore, video coding techniques based on motion-compensated 3-D wavelet transforms have been studied extensively for use in SVC [16]-[19]. The progress in wavelet-based video coding caused MPEG to start an activity on exploring this technology. As a result, MPEG issued a call for proposals for efficient SVC technology in October 2003 with the intention to develop a new SVC standard. Twelve of the 14 submitted proposals in response to this call [20] represented scalable video codecs based on 3-D wavelet transforms, while the remaining two proposals were extensions of H.264/AVC [6]. After a six-month evaluation phase, in which several subjective tests for a variety of conditions were carried out and the proposals were carefully analyzed regarding their potential for a successful future standard, the scalable extension of H.264/AVC as proposed in [21] was chosen as the starting point [22] of MPEG's SVC project in October 2004. In January 2005, MPEG and VCEG agreed to jointly finalize the SVC project as an Amendment of H.264/AVC within the Joint Video Team.
Although the initial design [21] included a wavelet-like decomposition structure in the temporal direction, it was later removed from the SVC specification [10]. Reasons for that removal included drastically reduced encoder and decoder complexity and improvements in coding efficiency. It was shown that an adjustment of the DPCM prediction structure can lead to significantly improved drift control, as will be shown in this paper. Despite this change, most components of the proposal in [21] remained unchanged from the first model [22] to the latest draft [10], being augmented by methods for nondyadic scalability and interlaced processing, which were not included in the initial design.
IV. H.264/AVC BASICS

SVC was standardized as an extension of H.264/AVC. In order to keep the paper self-contained, the following brief description of H.264/AVC is limited to those key features that are relevant for understanding the concepts of extending H.264/AVC towards SVC. For more detailed information about H.264/AVC, the reader is referred to the standard [6] or corresponding overview papers [23]-[26].

Conceptually, the design of H.264/AVC covers a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). While the VCL creates a coded representation of the source content, the NAL formats these data and provides header information in a way that enables simple and effective customization of the use of VCL data for a broad variety of systems.
A. Network Abstraction Layer (NAL)

The coded video data are organized into NAL units, which are packets that each contain an integer number of bytes. A NAL unit starts with a one-byte header, which signals the type of the contained data. The remaining bytes represent payload data. NAL units are classified into VCL NAL units, which contain coded slices or coded slice data partitions, and non-VCL NAL units, which contain associated additional information. The most important non-VCL NAL units are parameter sets and Supplemental Enhancement Information (SEI). The sequence and picture parameter sets contain infrequently changing information for a video sequence. SEI messages are not required for decoding the samples of a video sequence. They provide additional information which can assist the decoding process or related processes like bit stream manipulation or display. A set of consecutive NAL units with specific properties is referred to as an access unit. The decoding of an access unit results in exactly one decoded picture. A set of consecutive access units with certain properties is referred to as a coded video sequence. A coded video sequence represents an independently decodable part of a NAL unit bit stream. It always starts with an instantaneous decoding refresh (IDR) access unit, which signals that the IDR access unit and all following access units can be decoded without decoding any previous pictures of the bit stream.
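As a concrete illustration of the one-byte NAL unit header, the following sketch parses its three fixed fields (forbidden_zero_bit, nal_ref_idc, nal_unit_type); the small name table lists only a few common unit types, not the full set defined by the standard.

```python
# Sketch of parsing the one-byte H.264/AVC NAL unit header described above:
# forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}

def parse_nal_header(first_byte: int):
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    if forbidden_zero_bit != 0:
        raise ValueError("forbidden_zero_bit must be 0")
    return nal_ref_idc, nal_unit_type

# 0x67 = 0b0_11_00111: nal_ref_idc = 3, nal_unit_type = 7 (SPS)
ref_idc, nal_type = parse_nal_header(0x67)
print(ref_idc, NAL_TYPE_NAMES[nal_type])  # 3 sequence parameter set
```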
B. Video Coding Layer (VCL)

The VCL of H.264/AVC follows the so-called block-based hybrid video coding approach. Although its basic design is very similar to that of prior video coding standards such as H.261, MPEG-1 Video, H.262 | MPEG-2 Video, H.263, or MPEG-4 Visual, H.264/AVC includes new features that enable it to achieve a significant improvement in compression efficiency relative to any prior video coding standard [14]. The main difference to previous standards is the largely increased flexibility and adaptability of H.264/AVC.

The way pictures are partitioned into smaller coding units in H.264/AVC, however, follows the rather traditional concept of subdivision into macroblocks and slices. Each picture is partitioned into macroblocks that each cover a rectangular picture area of 16 x 16 luma samples and, in the case of video in 4:2:0 chroma sampling format, 8 x 8 samples of each of the two chroma components. The samples of a macroblock are either spatially or temporally predicted, and the resulting prediction residual signal is represented using transform coding. The macroblocks of a picture are organized in slices, each of which can be parsed independently of other slices in a picture. Depending on the degree of freedom for generating the prediction signal, H.264/AVC supports three basic slice coding types.
1) I-slice: intra-picture predictive coding using spatial prediction from neighboring regions;
2) P-slice: intra-picture predictive coding and inter-picture predictive coding with one prediction signal for each predicted region;
3) B-slice: intra-picture predictive coding, inter-picture predictive coding, and inter-picture bipredictive coding with two prediction signals that are combined with a weighted average to form the region prediction.
For I-slices, H.264/AVC provides several directional spatial intra-prediction modes, in which the prediction signal is generated by using neighboring samples of blocks that precede the block to be predicted in coding order. For the luma component, the intra-prediction is either applied to 4 x 4, 8 x 8, or 16 x 16 blocks, whereas for the chroma components, it is always applied on a macroblock basis.¹
For P- and B-slices, H.264/AVC additionally permits variable block size motion-compensated prediction with multiple reference pictures [27]. The macroblock type signals the partitioning of a macroblock into blocks of 16 x 16, 16 x 8, 8 x 16, or 8 x 8 luma samples. When a macroblock type specifies partitioning into four 8 x 8 blocks, each of these so-called submacroblocks can be further split into 8 x 4, 4 x 8, or 4 x 4 blocks, which is indicated through the submacroblock type. For P-slices, one motion vector is transmitted for each block. In addition, the used reference picture can be independently chosen for each 16 x 16, 16 x 8, or 8 x 16 macroblock partition or 8 x 8 submacroblock. It is signaled via a reference index parameter, which is an index into a list of reference pictures that is replicated at the decoder.

In B-slices, two distinct reference picture lists are utilized, and for each 16 x 16, 16 x 8, or 8 x 16 macroblock partition or 8 x 8 submacroblock, the prediction method can be selected between list 0, list 1, or biprediction. While list 0 and list 1 prediction refer to unidirectional prediction using a reference picture of reference picture list 0 or 1, respectively, in the bipredictive mode the prediction signal is formed by a weighted sum of a list 0 and a list 1 prediction signal. In addition, special modes such as the so-called direct modes in B-slices and skip modes in P- and B-slices are provided, in which such data as motion vectors and reference indexes are derived from previously transmitted information.
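The weighted combination in the bipredictive mode can be illustrated for the default case of equal weights, where the two prediction signals are averaged with rounding; the explicit and implicit weighted prediction modes that H.264/AVC also offers are omitted in this sketch.

```python
def bipredict(pred_list0, pred_list1):
    """Default (equal-weight) biprediction: rounded average of the two
    motion-compensated prediction signals, computed per sample."""
    return [(p0 + p1 + 1) >> 1 for p0, p1 in zip(pred_list0, pred_list1)]

print(bipredict([100, 101], [103, 104]))  # [102, 103]
```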
For transform coding, H.264/AVC specifies a set of integer transforms of different block sizes. While for intra-macroblocks the transform size is directly coupled to the intra-prediction block size, the luma signal of motion-compensated macroblocks that do not contain blocks smaller than 8 x 8 can be coded by using either a 4 x 4 or 8 x 8 transform. For the chroma components, a two-stage transform, consisting of 4 x 4 transforms and a Hadamard transform of the resulting DC coefficients, is employed.¹ A similar hierarchical transform is also used for the luma component of macroblocks coded in intra 16 x 16 mode. All inverse transforms are specified by exact integer operations, so that inverse-transform mismatches are avoided. H.264/AVC uses uniform reconstruction quantizers. One of 52 quantization step sizes¹ can be selected for each macroblock by the quantization parameter QP. The scaling operations for the quantization step sizes are arranged with logarithmic step size increments, such that an increment of the QP by 6 corresponds to a doubling of quantization step size.

¹Some details of the profiles of H.264/AVC that were designed primarily to serve the needs of professional application environments are neglected in this description, particularly in relation to chroma processing and range of step sizes.
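The logarithmic QP scale described above can be made explicit by modeling the quantization step size as proportional to 2**(QP/6); the absolute scale factor is omitted here, so the function below only captures relative step sizes.

```python
# Every increment of QP by 6 doubles the quantization step size, so the
# step size grows as 2**(QP/6) up to a constant factor (not modeled here).
def relative_qstep(qp: int) -> float:
    return 2.0 ** (qp / 6.0)

assert abs(relative_qstep(30) / relative_qstep(24) - 2.0) < 1e-12
print(relative_qstep(30) / relative_qstep(18))  # 4.0
```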
For reducing blocking artifacts, which are typically the most disturbing artifacts in block-based coding, H.264/AVC specifies an adaptive deblocking filter, which operates within the motion-compensated prediction loop.

H.264/AVC supports two methods of entropy coding, which both use context-based adaptivity to improve performance relative to prior standards. While context-based adaptive variable-length coding (CAVLC) uses variable-length codes and its adaptivity is restricted to the coding of transform coefficient levels, context-based adaptive binary arithmetic coding (CABAC) utilizes arithmetic coding and a more sophisticated mechanism for employing statistical dependencies, which leads to typical bit rate savings of 10%-15% relative to CAVLC.
In addition to the increased flexibility on the macroblock level, H.264/AVC also allows much more flexibility on a picture and sequence level compared to prior video coding standards. Here we mainly refer to reference picture memory control. In H.264/AVC, the coding and display order of pictures is completely decoupled. Furthermore, any picture can be marked as a reference picture for use in motion-compensated prediction of following pictures, independent of the slice coding types. The behavior of the decoded picture buffer (DPB), which can hold up to 16 frames (depending on the used conformance point and picture size), can be adaptively controlled by memory management control operation (MMCO) commands, and the reference picture lists that are used for coding of P- or B-slices can be arbitrarily constructed from the pictures available in the DPB via reference picture list reordering (RPLR) commands.

In order to enable a flexible partitioning of a picture into slices, the concept of slice groups was introduced in H.264/AVC. The macroblocks of a picture can be arbitrarily partitioned into slice groups via a slice group map. The slice group map, which is specified by the content of the picture parameter set and some slice header information, assigns a unique slice group identifier to each macroblock of a picture. Each slice is obtained by scanning the macroblocks of a picture that have the same slice group identifier as the first macroblock of the slice in raster-scan order. Similar to prior video coding standards, a picture comprises the set of slices representing a complete frame or one field of a frame (such that, e.g., an interlaced-scan picture can be either coded as a single frame picture or two separate field pictures). Additionally, H.264/AVC supports a macroblock-adaptive switching between frame and field coding. For that, a pair of vertically adjacent macroblocks is considered as a single coding unit,
which can be either transmitted as two spatially neighboring frame macroblocks or as interleaved top and bottom field macroblocks.
V. BASIC CONCEPTS FOR EXTENDING H.264/AVC TOWARDS AN SVC STANDARD

Apart from the required support of all common types of scalability, the most important design criteria for a successful SVC standard are coding efficiency and complexity, as was noted in Section II. Since SVC was developed as an extension of H.264/AVC with all of its well-designed core coding tools being inherited, one of the design principles of SVC was that new tools should only be added if necessary for efficiently supporting the required types of scalability.
A. Temporal Scalability

A bit stream provides temporal scalability when the set of corresponding access units can be partitioned into a temporal base layer and one or more temporal enhancement layers with the following property. Let the temporal layers be identified by a temporal layer identifier T, which starts from 0 for the base layer and is increased by 1 from one temporal layer to the next. Then, for each natural number k, the bit stream that is obtained by removing all access units of all temporal layers with a temporal layer identifier T greater than k forms another valid bit stream for the given decoder.
For hybrid video codecs, temporal scalability can generally be enabled by restricting motion-compensated prediction to reference pictures with a temporal layer identifier that is less than or equal to the temporal layer identifier of the picture to be predicted. The prior video coding standards MPEG-1 [2], H.262 | MPEG-2 Video [3], H.263 [4], and MPEG-4 Visual [5] all support temporal scalability to some degree. H.264/AVC [6] provides significantly increased flexibility for temporal scalability because of its reference picture memory control. It allows the coding of picture sequences with arbitrary temporal dependencies, which are only restricted by the maximum usable DPB size. Hence, for supporting temporal scalability with a reasonable number of temporal layers, no changes to the design of H.264/AVC were required. The only related change in SVC refers to the signaling of temporal layers, which is described in Section VI.
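The prediction restriction stated above can be checked mechanically. The representation below (a list of pictures, each with a temporal identifier and the display indices it references) is an illustrative assumption, not a bit stream format.

```python
# Sketch of the restriction that makes temporal scalability work: a picture
# may only reference pictures whose temporal layer identifier does not exceed
# its own, so dropping all layers above k can never remove a needed reference.
def check_temporal_restriction(pictures):
    """pictures: list of (temporal_id, [indices of referenced pictures])."""
    for tid, refs in pictures:
        for r in refs:
            ref_tid = pictures[r][0]
            if ref_tid > tid:
                return False
    return True

# Dyadic 3-layer example: pictures 0 and 4 in T0, picture 2 in T1,
# pictures 1 and 3 in T2 (indices are display positions).
gop = [(0, []), (2, [0, 2]), (1, [0, 4]), (2, [2, 4]), (0, [0])]
print(check_temporal_restriction(gop))  # True
```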
1) Hierarchical Prediction Structures: Temporal scalability with dyadic temporal enhancement layers can be very efficiently provided with the concept of hierarchical B-pictures [28], [29], as illustrated in Fig. 1(a).² The enhancement layer pictures are typically coded as B-pictures, where the reference picture lists 0 and 1 are restricted to the temporally preceding and succeeding picture, respectively, with a temporal layer identifier less than the temporal layer identifier of the predicted picture. Each set of temporal layers {T0, ..., Tk} can be decoded independently of all layers with a temporal layer identifier T > k. In the following, the set of pictures between two successive pictures of

²As described above, neither P- nor B-slices are directly coupled with the management of reference pictures in H.264/AVC. Hence, backward prediction is not necessarily coupled with the use of B-slices, and the temporal coding structure of Fig. 1(a) can also be realized using P-slices, resulting in a structure that is often called hierarchical P-pictures.
[Figure] Fig. 1. Hierarchical prediction structures for enabling temporal scalability. (a) Coding with hierarchical B-pictures. (b) Nondyadic hierarchical prediction structure. (c) Hierarchical prediction structure with a structural encoding/decoding delay of zero. The numbers directly below the pictures specify the coding order; the symbols Tk specify the temporal layers, with k representing the corresponding temporal layer identifier.
the temporal base layer together with the succeeding base layer picture is referred to as a group of pictures (GOP).
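The dyadic hierarchy of Fig. 1(a) can be sketched by deriving each picture's temporal layer from its display position within a GOP; the function below is an illustrative reconstruction of that layout, not part of the standard.

```python
# Sketch of the dyadic hierarchy: with a GOP size of 2**n, the temporal layer
# of the picture at display position i falls out of how often i is divisible
# by two (key pictures at GOP boundaries form the temporal base layer T0).
def temporal_id(i: int, gop_size: int) -> int:
    n = gop_size.bit_length() - 1   # gop_size = 2**n
    if i % gop_size == 0:
        return 0                    # temporal base layer (key pictures)
    tz = (i & -i).bit_length() - 1  # number of trailing zero bits of i
    return n - tz

print([temporal_id(i, 8) for i in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

Removing all pictures with temporal_id above k then halves the frame rate once per removed layer, which is exactly the dyadic temporal scalability described above.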
Although the described prediction structure with hierarchical B-pictures provides temporal scalability and also shows excellent coding efficiency, as will be demonstrated later, it represents a special case. In general, hierarchical prediction structures for enabling temporal scalability can always be combined with the multiple reference picture concept of H.264/AVC. This means that the reference picture lists can be constructed using more than one reference picture, and they can also include pictures with the same temporal level as the picture to be predicted. Furthermore, hierarchical prediction structures are not restricted to the dyadic case. As an example, Fig. 1(b) illustrates a nondyadic hierarchical prediction structure, which provides two independently decodable subsequences with 1/9th and 1/3rd of the full frame rate. It should further be noted that it is possible to arbitrarily modify the prediction structure of the temporal base layer, e.g., in order to increase the coding efficiency. The chosen temporal prediction structure does not need to be constant over time.
Note that it is possible to arbitrarily adjust the structural delay between encoding and decoding a picture by restricting motion-compensated prediction from pictures that follow the picture to be predicted in display order. As an example, Fig. 1(c) shows a hierarchical prediction structure which does not employ motion-compensated prediction from pictures in the future. Although this structure provides the same degree of temporal scalability as the prediction structure of Fig. 1(a), its structural delay is equal to zero, compared to 7 pictures for the prediction structure in Fig. 1(a). However, such low-delay structures typically decrease coding efficiency.
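The structural delay figures quoted above can be computed from the prediction structure alone: a picture cannot be encoded before every picture it transitively references has been captured. The dependency representation below is an illustrative assumption; the GOP-size-4 example is a scaled-down analogue of Fig. 1(a), where GOP size 8 gives a delay of 7.

```python
# Structural delay = largest gap between a picture's display position and the
# latest-captured picture it (transitively) depends on for prediction.
def structural_delay(refs):
    """refs: dict mapping display index -> list of referenced display indices."""
    def needed(i, seen=None):
        seen = set() if seen is None else seen
        latest = i
        for r in refs.get(i, []):
            if r not in seen:
                seen.add(r)
                latest = max(latest, needed(r, seen))
        return latest
    return max(needed(i) - i for i in refs)

# Hierarchical B-pictures, GOP size 4 (cf. Fig. 1(a), which uses GOP size 8):
hier_b = {0: [], 4: [0], 2: [0, 4], 1: [0, 2], 3: [2, 4]}
# Low-delay variant (cf. Fig. 1(c)): only past pictures are referenced.
low_delay = {0: [], 1: [0], 2: [0, 1], 3: [2], 4: [0, 2]}
print(structural_delay(hier_b), structural_delay(low_delay))  # 3 0
```

In the hierarchical B case, picture 1 cannot be encoded until picture 4 has been captured (via its reference to picture 2), giving a delay of GOP size minus one; the low-delay structure never waits for a future picture.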