Overview of the High Efficiency Video Coding (HEVC) Standard

Gary J. Sullivan, Fellow, IEEE, Jens-Rainer Ohm, Member, IEEE, Woo-Jin Han, Member, IEEE, and Thomas Wiegand, Fellow, IEEE

Abstract—High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards—in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.

Index Terms—Advanced video coding (AVC), H.264, High Efficiency Video Coding (HEVC), Joint Collaborative Team on Video Coding (JCT-VC), Moving Picture Experts Group (MPEG), MPEG-4, standards, Video Coding Experts Group (VCEG), video compression.

I. Introduction

THE High Efficiency Video Coding (HEVC) standard is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC) [1]. The first edition of the HEVC standard is expected to be finalized in January 2013, resulting in an aligned text that will be published by both ITU-T and ISO/IEC. Additional work is planned to extend the standard to support several additional application scenarios, including extended-range uses with enhanced precision and color format support, scalable video coding, and 3-D/stereo/multiview video coding. In ISO/IEC, the HEVC standard will become MPEG-H Part 2 (ISO/IEC 23008-2) and in ITU-T it is likely to become ITU-T Recommendation H.265.

Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 [2] and H.263 [3], ISO/IEC produced MPEG-1 [4] and MPEG-4 Visual [5], and the two organizations jointly produced the H.262/MPEG-2 Video [6] and H.264/MPEG-4 Advanced Video Coding (AVC) [7] standards. The two standards that were jointly produced have had a particularly strong impact and have found their way into a wide variety of products that are increasingly prevalent in our daily lives. Throughout this evolution, continued efforts have been made to maximize compression capability and improve other characteristics such as data loss robustness, while considering the computational resources that were practical for use in products at the time of anticipated deployment of each standard.
The major video coding standard directly preceding the HEVC project was H.264/MPEG-4 AVC, which was initially developed in the period between 1999 and 2003, and then was extended in several important ways from 2003 to 2009. H.264/MPEG-4 AVC has been an enabling technology for digital video in almost every area that was not previously covered by H.262/MPEG-2 Video and has substantially displaced the older standard within its existing application domains. It is widely used for many applications, including broadcast of high definition (HD) TV signals over satellite, cable, and terrestrial transmission systems, video content acquisition and editing systems, camcorders, security applications, Internet and mobile network video, Blu-ray Discs, and real-time conversational applications such as video chat, video conferencing, and telepresence systems.
However, an increasing diversity of services, the growing popularity of HD video, and the emergence of beyond-HD formats (e.g., 4k×2k or 8k×4k resolution) are creating even stronger needs for coding efficiency superior to H.264/MPEG-4 AVC's capabilities. The need is even stronger when higher resolution is accompanied by stereo or multiview capture and display. Moreover, the traffic caused by video applications targeting mobile devices and tablet PCs, as well as the transmission needs for video-on-demand services, is imposing severe challenges on today's networks. An increased desire for higher quality and resolutions is also arising in mobile applications.
HEVC has been designed to address essentially all existing applications of H.264/MPEG-4 AVC and to particularly focus on two key issues: increased video resolution and increased use of parallel processing architectures. The syntax of HEVC is generic and should also be generally suited for other applications that are not specifically mentioned above.
Manuscript received May 25, 2012; revised August 22, 2012; accepted August 24, 2012. Date of publication October 2, 2012; date of current version January 8, 2013. This paper was recommended by Associate Editor H. Gharavi. (Corresponding author: W.-J. Han.)

G. J. Sullivan is with Microsoft Corporation, Redmond, WA 98052 USA (e-mail: garysull@microsoft.com).

J.-R. Ohm is with the Institute of Communication Engineering, RWTH Aachen University, Aachen 52056, Germany (e-mail: ohm@ient.rwth-aachen.de).

W.-J. Han is with the Department of Software Design and Management, Gachon University, Seongnam 461-701, Korea (e-mail: hurumi@gmail.com).

T. Wiegand is with the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin 10587, Germany, and also with the Berlin Institute of Technology, Berlin 10587, Germany (e-mail: twiegand@ieee.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2012.2221191

As has been the case for all past ITU-T and ISO/IEC video coding standards, in HEVC only the bitstream structure and syntax is standardized, as well as constraints on the bitstream and its mapping for the generation of decoded pictures. The mapping is given by defining the semantic meaning of syntax elements and a decoding process such that every decoder conforming to the standard will produce the same output when given a bitstream that conforms to the constraints of the standard. This limitation of the scope of the standard permits maximal freedom to optimize implementations in a manner appropriate to specific applications (balancing compression quality, implementation cost, time to market, and other considerations). However, it provides no guarantees of end-to-end reproduction quality, as it allows even crude encoding techniques to be considered conforming.
To assist the industry community in learning how to use the standard, the standardization effort not only includes the development of a text specification document, but also reference software source code as an example of how HEVC video can be encoded and decoded. The draft reference software has been used as a research tool for the internal work of the committee during the design of the standard, and can also be used as a general research tool and as the basis of products. A standard test data suite is also being developed for testing conformance to the standard.
This paper is organized as follows. Section II highlights some key features of the HEVC coding design. Section III explains the high-level syntax and the overall structure of HEVC coded data. The HEVC coding technology is then described in greater detail in Section IV. Section V explains the profile, tier, and level design of HEVC. Section VI discusses the history of the HEVC standardization effort. Since writing an overview of a technology as substantial as HEVC involves a significant amount of summarization, the reader is referred to [1] for any omitted details.

II. HEVC Coding Design and Feature Highlights

The HEVC standard is designed to achieve multiple goals, including coding efficiency, ease of transport system integration and data loss resilience, as well as implementability using parallel processing architectures. The following subsections briefly describe the key elements of the design by which these goals are achieved, and the typical encoder operation that would generate a valid bitstream. More details about the associated syntax and the decoding process of the different elements are provided in Sections III and IV.

A. Video Coding Layer

The video coding layer of HEVC employs the same hybrid approach (inter-/intrapicture prediction and 2-D transform coding) used in all video compression standards since H.261. Fig. 1 depicts the block diagram of a hybrid video encoder, which could create a bitstream conforming to the HEVC standard.

Fig. 1. Typical HEVC video encoder (with decoder modeling elements shaded in light gray).

An encoding algorithm producing an HEVC-compliant bitstream would typically proceed as follows. Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence (and the first picture at each clean random access point into a video sequence) is coded using only intrapicture prediction (that uses some prediction of data spatially from region to region within the same picture, but has no dependence on other pictures). For all remaining pictures of a sequence or between random access points, interpicture temporally predictive coding modes are typically used for most blocks. The encoding process for interpicture prediction consists of choosing motion data comprising the selected reference picture and motion vector (MV) to be applied for predicting the samples of each block. The encoder and decoder generate identical interpicture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information.
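The standard itself does not constrain how an encoder chooses this motion data. As a minimal illustration of the search problem only, the following sketch (all names hypothetical, not part of HEVC) performs a toy full search over integer displacements with a sum-of-absolute-differences cost; production encoders instead use fast search patterns plus quarter-sample refinement.

```python
import numpy as np

def full_search_mv(cur_block, ref_pic, bx, by, search_range=8):
    """Toy full-search block matching over integer positions.

    cur_block: NxN array from the current picture at (bx, by).
    ref_pic:   reconstructed reference picture (2-D array).
    Returns the (dx, dy) displacement minimizing the SAD cost.
    """
    n = cur_block.shape[0]
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > ref_pic.shape[1] or y + n > ref_pic.shape[0]:
                continue  # candidate block falls outside the reference picture
            cand = ref_pic[y:y + n, x:x + n]
            cost = np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum()
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```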
The residual signal of the intra- or interpicture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information.
The encoder duplicates the decoder processing loop (see the gray-shaded boxes in Fig. 1) such that both will generate identical predictions for subsequent data. Therefore, the quantized transform coefficients are constructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may then be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization. The final picture representation (that is a duplicate of the output of the decoder) is stored in a decoded picture buffer to be used for the prediction of subsequent pictures. In general, the order of encoding or decoding processing of pictures often differs from the order in which they arrive from the source, necessitating a distinction between the decoding order (i.e., bitstream order) and the output order (i.e., display order) for a decoder.
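A minimal sketch of this encoder-side duplication of the decoder is given below. The floating-point DCT and the plain scalar quantizer are stand-ins, assumed for readability, for HEVC's integer transforms and actual quantizer mapping, and loop filtering is omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn  # floating-point DCT as a stand-in for the integer transform

def encode_block(orig, prediction, qstep):
    """Sketch of the hybrid coding loop for one block (simplified, non-normative)."""
    residual = orig.astype(np.float64) - prediction
    coeffs = dctn(residual, norm="ortho")                 # spatial transform of the residual
    levels = np.round(coeffs / qstep)                     # quantized levels: these get entropy coded
    recon_residual = idctn(levels * qstep, norm="ortho")  # inverse scaling + inverse transform
    recon = np.clip(prediction + np.round(recon_residual), 0, 255)
    # `recon` would next pass through the loop filters and enter the decoded
    # picture buffer, exactly as in the decoder.
    return levels, recon
```

The key design point is that `recon`, not `orig`, is what the encoder uses for subsequent prediction, which keeps encoder and decoder predictions identical and prevents drift.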
Video material to be encoded by HEVC is generally expected to be input as progressive scan imagery (either due to the source video originating in that format or resulting from deinterlacing prior to encoding). No explicit coding features are present in the HEVC design to support the use of interlaced scanning, as interlaced scanning is no longer used for displays and is becoming substantially less common for distribution. However, a metadata syntax has been provided in HEVC to allow an encoder to indicate that interlace-scanned video has been sent by coding each field (i.e., the even or odd numbered lines of each video frame) of interlaced video as a separate picture or that it has been sent by coding each interlaced frame as an HEVC coded picture. This provides an efficient method of coding interlaced video without burdening decoders with a need to support a special decoding process for it.
The various features involved in hybrid video coding using HEVC are highlighted as follows.
1) Coding tree units and coding tree block (CTB) structure: The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples; whereas the analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signaling [8] (see the parsing sketch following this list).
2) Coding units (CUs) and coding blocks (CBs): The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU). A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).
3) Prediction units and prediction blocks (PBs): The decision whether to code a picture area using interpicture or intrapicture prediction is made at the CU level. A PU partitioning structure has its root at the CU level. Depending on the basic prediction-type decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma prediction blocks (PBs). HEVC supports variable PB sizes from 64×64 down to 4×4 samples.
4) TUs and transform blocks: The prediction residual is coded using block transforms. A TU tree structure has its root at the CU level. The luma CB residual may be identical to the luma transform block (TB) or may be further split into smaller luma TBs. The same applies to the chroma TBs. Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of luma intrapicture prediction residuals, an integer transform derived from a form of discrete sine transform (DST) is alternatively specified.
5) Motion vector signaling: Advanced motion vector prediction (AMVP) is used, including derivation of several most probable candidates based on data from adjacent PBs and the reference picture. A merge mode for MV coding can also be used, allowing the inheritance of MVs from temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC, improved skipped and direct motion inference are also specified.
6) Motion compensation: Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for interpolation of fractional-sample positions (compared to six-tap filtering of half-sample positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC). Similar to H.264/MPEG-4 AVC, multiple reference pictures are used. For each PB, either one or two motion vectors can be transmitted, resulting either in unipredictive or bipredictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset operation may be applied to the prediction signal(s) in a manner known as weighted prediction.
7) Intrapicture prediction: The decoded boundary samples of adjacent blocks are used as reference data for spatial prediction in regions where interpicture prediction is not performed. Intrapicture prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC), plus planar (surface fitting) and DC (flat) prediction modes. The selected intrapicture prediction modes are encoded by deriving most probable modes (e.g., prediction directions) based on those of previously decoded neighboring PBs.
8) Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes.
9) Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements that increase its throughput speed (especially for parallel-processing architectures) and its compression performance, and reduce its context memory requirements.
10) In-loop deblocking filtering: A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the interpicture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes, and is made more friendly to parallel processing.
11) Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the interpicture prediction loop after the deblocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.
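As a concrete reading of the quadtree signaling of items 1 and 2 above, the following sketch parses split flags recursively into coding blocks. Here `read_flag()` is a hypothetical stand-in for decoding one split_cu_flag from the bitstream; the real syntax adds picture-boundary handling and SPS-signaled size limits that are omitted in this sketch.

```python
def parse_cu_tree(read_flag, x, y, size, min_cb_size=8, out=None):
    """Recursively parse a CTB's quadtree into coding blocks (simplified).

    Returns a list of (x, y, size) luma coding blocks in z-scan order.
    """
    if out is None:
        out = []
    if size > min_cb_size and read_flag():
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                parse_cu_tree(read_flag, x + dx, y + dy, half, min_cb_size, out)
    else:
        out.append((x, y, size))  # a leaf of the tree: one coding block
    return out

# Example: a 64x64 CTB whose first split flag is 1 and all deeper flags are 0
flags = iter([1, 0, 0, 0, 0])
print(parse_cu_tree(lambda: next(flags), 0, 0, 64))
# [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```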
B. High-Level Syntax Architecture

A number of design aspects new to the HEVC standard improve flexibility for operation over a variety of applications and network environments and improve robustness to data losses. However, the high-level syntax architecture used in the H.264/MPEG-4 AVC standard has generally been retained, including the following features.
1) Parameter set structure: Parameter sets contain information that can be shared for the decoding of several regions of the decoded video. The parameter set structure provides a robust mechanism for conveying data that are essential to the decoding process. The concepts of sequence and picture parameter sets from H.264/MPEG-4 AVC are augmented by a new video parameter set (VPS) structure.
2) NAL unit syntax structure: Each syntax structure is placed into a logical data packet called a network abstraction layer (NAL) unit. Using the content of a two-byte NAL unit header, it is possible to readily identify the purpose of the associated payload data (see the parsing sketch following this list).
3) Slices: A slice is a data structure that can be decoded independently from other slices of the same picture, in terms of entropy coding, signal prediction, and residual signal reconstruction. A slice can either be an entire picture or a region of a picture. One of the main purposes of slices is resynchronization in the event of data losses. In the case of packetized transmission, the maximum number of payload bits within a slice is typically restricted, and the number of CTUs in the slice is often varied to minimize the packetization overhead while keeping the size of each packet within this bound.
4) Supplemental enhancement information (SEI) and video usability information (VUI) metadata: The syntax includes support for various types of metadata known as SEI and VUI. Such data provide information about the timing of the video pictures, the proper interpretation of the color space used in the video signal, 3-D stereoscopic frame packing information, other display hint information, and so on.
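As referenced in item 2 of the list above, the fixed two-byte header makes payload identification cheap. The sketch below assumes the field layout of the completed specification (a forbidden zero bit, a 6-b NAL unit type, a 6-b layer identifier that is reserved as zero in the first version of the standard, and a 3-b temporal identifier plus 1); it is an illustration, not normative parsing code.

```python
def parse_nal_header(b0: int, b1: int):
    """Decode the two-byte HEVC NAL unit header (assumed final-spec layout).

    Bit layout: forbidden_zero_bit (1) | nal_unit_type (6) |
                nuh_layer_id (6) | nuh_temporal_id_plus1 (3)
    """
    assert (b0 >> 7) == 0, "forbidden_zero_bit must be 0"
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id = (b1 & 0x07) - 1  # header carries temporal_id + 1
    return nal_unit_type, nuh_layer_id, temporal_id

# Example: a VPS NAL unit (type 32, layer 0, temporal_id 0) -> bytes 0x40 0x01
print(parse_nal_header(0x40, 0x01))  # (32, 0, 0)
```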
C. Parallel Decoding Syntax and Modified Slice Structuring

Finally, three new features are introduced in the HEVC standard to enhance the parallel processing capability or modify the structuring of slice data for packetization purposes. Each of them may have benefits in particular application contexts, and it is generally up to the implementer of an encoder or decoder to determine whether and how to take advantage of these features.
1) Tiles: The option to partition a picture into rectangular regions called tiles has been specified. The main purpose of tiles is to increase the capability for parallel processing rather than provide error resilience. Tiles are independently decodable regions of a picture that are encoded with some shared header information. Tiles can additionally be used for the purpose of spatial random access to local regions of video pictures. A typical tile configuration of a picture consists of segmenting the picture into rectangular regions with approximately equal numbers of CTUs in each tile. Tiles provide parallelism at a more coarse level of granularity (picture/subpicture), and no sophisticated synchronization of threads is necessary for their use.
2) Wavefront parallel processing: When wavefront parallel processing (WPP) is enabled, a slice is divided into rows of CTUs. The first row is processed in an ordinary way, the second row can begin to be processed after only two CTUs have been processed in the first row, the third row can begin to be processed after only two CTUs have been processed in the second row, and so on. The context models of the entropy coder in each row are inferred from those in the preceding row with a two-CTU processing lag. WPP provides a form of processing parallelism at a rather fine level of granularity, i.e., within a slice (see the scheduling sketch following this list). WPP may often provide better compression performance than tiles (and avoid some visual artifacts that may be induced by using tiles).
3) Dependent slice segments: A structure called a dependent slice segment allows data associated with a particular wavefront entry point or tile to be carried in a separate NAL unit, and thus potentially makes that data available to a system for fragmented packetization with lower latency than if it were all coded together in one slice. A dependent slice segment for a wavefront entry point can only be decoded after at least part of the decoding process of another slice segment has been performed. Dependent slice segments are mainly useful in low-delay encoding, where other parallel tools might penalize compression performance.
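The two-CTU lag in WPP can be modeled as a dependency of each CTU on its left neighbor in the same row and the above-right CTU in the row above. A small sketch of the resulting wavefront schedule, assuming one CTU per time step and unlimited threads (an idealization, not part of the standard), is shown below.

```python
def wavefront_schedule(rows, cols):
    """Earliest time step at which each CTU can be processed under WPP."""
    t = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(t[r][c - 1])                     # left neighbor
            if r > 0:
                deps.append(t[r - 1][min(c + 1, cols - 1)])  # above-right neighbor
            t[r][c] = (max(deps) + 1) if deps else 1
    return t

# With 4 rows x 6 columns, each row starts two CTUs behind the row above it:
for row in wavefront_schedule(4, 6):
    print(row)
# [1, 2, 3, 4, 5, 6]
# [3, 4, 5, 6, 7, 8]
# [5, 6, 7, 8, 9, 10]
# [7, 8, 9, 10, 11, 12]
```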
In the following two sections, a more detailed description of the key features is given.

III. High-Level Syntax

The high-level syntax of HEVC contains numerous elements that have been inherited from the NAL of H.264/MPEG-4 AVC. The NAL provides the ability to map the video coding layer (VCL) data that represent the content of the pictures onto various transport layers, including RTP/IP, ISO MP4, and H.222.0/MPEG-2 Systems, and provides a framework for packet loss resilience. For general concepts of the NAL design such as NAL units, parameter sets, access units, the byte stream format, and packetized formatting, please refer to [9]–[11].
NAL units are classified into VCL and non-VCL NAL units according to whether they contain coded pictures or other associated data, respectively. In the HEVC standard, several VCL NAL unit types identifying categories of pictures for decoder initialization and random-access purposes are included. Table I lists the NAL unit types and their associated meanings and type classes in the HEVC standard.

TABLE I
NAL Unit Types, Meanings, and Type Classes

Type    Meaning                                         Class
0, 1    Slice segment of ordinary trailing picture      VCL
2, 3    Slice segment of TSA picture                    VCL
4, 5    Slice segment of STSA picture                   VCL
6, 7    Slice segment of RADL picture                   VCL
8, 9    Slice segment of RASL picture                   VCL
10–15   Reserved for future use                         VCL
16–18   Slice segment of BLA picture                    VCL
19, 20  Slice segment of IDR picture                    VCL
21      Slice segment of CRA picture                    VCL
22–31   Reserved for future use                         VCL
32      Video parameter set (VPS)                       non-VCL
33      Sequence parameter set (SPS)                    non-VCL
34      Picture parameter set (PPS)                     non-VCL
35      Access unit delimiter                           non-VCL
36      End of sequence                                 non-VCL
37      End of bitstream                                non-VCL
38      Filler data                                     non-VCL
39, 40  SEI messages                                    non-VCL
41–47   Reserved for future use                         non-VCL
48–63   Unspecified (available for system use)          non-VCL

The following subsections present a description of the new capabilities supported by the high-level syntax.
A. Random Access and Bitstream Splicing Features

The new design supports special features to enable random access and bitstream splicing. In H.264/MPEG-4 AVC, a bitstream must always start with an IDR access unit. An IDR access unit contains an independently coded picture—i.e., a coded picture that can be decoded without decoding any previous pictures in the NAL unit stream. The presence of an IDR access unit indicates that no subsequent picture in the bitstream will require reference to pictures prior to the picture that it contains in order to be decoded. The IDR picture is used within a coding structure known as a closed GOP (in which GOP stands for group of pictures).
The new clean random access (CRA) picture syntax specifies the use of an independently coded picture at the location of a random access point (RAP), i.e., a location in a bitstream at which a decoder can begin successfully decoding pictures without needing to decode any pictures that appeared earlier in the bitstream, which supports an efficient temporal coding order known as open GOP operation. Good support of random access is critical for enabling channel switching, seek operations, and dynamic streaming services.
Some pictures that follow a CRA picture in decoding order and precede it in display order may contain interpicture prediction references to pictures that are not available at the decoder. These nondecodable pictures must therefore be discarded by a decoder that starts its decoding process at a CRA point. For this purpose, such nondecodable pictures are identified as random access skipped leading (RASL) pictures. The location of splice points from different original coded bitstreams can be indicated by broken link access (BLA) pictures. A bitstream splicing operation can be performed by simply changing the NAL unit type of a CRA picture in one bitstream to the value that indicates a BLA picture and concatenating the new bitstream at the position of a RAP picture in the other bitstream. A RAP picture may be an IDR, CRA, or BLA picture, and both CRA and BLA pictures may be followed by RASL pictures in the bitstream (depending on the particular value of the NAL unit type used for a BLA picture). Any RASL pictures associated with a BLA picture must always be discarded by the decoder, as they may contain references to pictures that are not actually present in the bitstream due to a splicing operation. The other type of picture that can follow a RAP picture in decoding order and precede it in output order is the random access decodable leading (RADL) picture, which cannot contain references to any pictures that precede the RAP picture in decoding order. RASL and RADL pictures are collectively referred to as leading pictures (LPs). Pictures that follow a RAP picture in both decoding order and output order, which are known as trailing pictures, cannot contain references to LPs for interpicture prediction.
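The following sketch illustrates this splicing rule using the NAL unit type values of Table I, under the simplifying assumption that each picture occupies a single NAL unit; the helper names are hypothetical.

```python
# NAL unit type values taken from Table I
RASL_TYPES = {8, 9}        # slice segments of RASL pictures
BLA_TYPES = {16, 17, 18}   # slice segments of BLA pictures
IDR_TYPES = {19, 20}       # slice segments of IDR pictures
CRA_TYPE = 21              # slice segment of a CRA picture

def splice(head, tail):
    """Concatenate two bitstreams at a RAP, relabeling a CRA as a BLA.

    `head` and `tail` are lists of (nal_unit_type, payload) pairs; the tail
    must start with a RAP picture. Only the NAL unit type of a leading CRA
    picture is rewritten; no re-encoding is required.
    """
    first_type = tail[0][0]
    assert first_type == CRA_TYPE or first_type in IDR_TYPES | BLA_TYPES
    if first_type == CRA_TYPE:
        tail = [(16, tail[0][1])] + tail[1:]  # 16: one of the BLA type values
    return head + tail

def drop_rasl_after_bla(units):
    """Decoder-side handling: discard RASL pictures following a BLA picture,
    since their reference pictures are missing after a splice. (Simplified:
    the real association of RASL pictures with their RAP is more precise.)"""
    out, after_bla = [], False
    for t, payload in units:
        after_bla = after_bla or t in BLA_TYPES
        if after_bla and t in RASL_TYPES:
            continue  # nondecodable leading picture: skip it
        out.append((t, payload))
    return out
```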
B. Temporal Sublayering Support

Similar to the temporal scalability feature in the H.264/MPEG-4 AVC scalable video coding (SVC) extension [12], HEVC specifies a temporal identifier in the NAL unit header, which indicates a level in a hierarchical temporal prediction structure. This was introduced to achieve temporal scalability without the need to parse parts of the bitstream other than the NAL unit header.
Under certain circumstances, the number of decoded temporal sublayers can be adjusted during the decoding process of one coded video sequence. The location of a point in the bitstream at which sublayer switching is possible to begin decoding some higher temporal layers can be indicated by the presence of temporal sublayer access (TSA) pictures and stepwise TSA (STSA) pictures. At the location of a TSA picture, it is possible to switch from decoding a lower temporal sublayer to decoding any higher temporal sublayer, and at the location of an STSA picture, it is possible to switch from decoding a lower temporal sublayer to decoding only one particular higher temporal sublayer (but not the further layers above that, unless they also contain STSA or TSA pictures).
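A minimal sketch of sublayer extraction follows: because a picture may only reference pictures at the same or a lower temporal sublayer, dropping every NAL unit whose temporal identifier exceeds a target yields a valid reduced-frame-rate bitstream. The list-of-pairs representation is an assumption made for illustration.

```python
def extract_sublayers(units, max_tid):
    """Keep only NAL units whose temporal_id (from the NAL unit header,
    as described above) does not exceed max_tid."""
    return [(tid, p) for tid, p in units if tid <= max_tid]

# Example: a dyadic hierarchy with temporal_ids 0, 2, 1, 2 per group of pictures
units = [(0, "P0"), (2, "P1"), (1, "P2"), (2, "P3")]
print(extract_sublayers(units, 1))  # [(0, 'P0'), (1, 'P2')] -> half frame rate
```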
C. Additional Parameter Sets

The VPS has been added as metadata to describe the overall characteristics of coded video sequences, including the dependences between temporal sublayers. The primary purpose of this is to enable the compatible extensibility of the standard in terms of signaling at the systems layer, e.g., when the base layer of a future extended scalable or multiview bitstream would need to be decodable by a legacy decoder, but for which additional information about the bitstream structure that is only relevant for the advanced decoder would be ignored.
D. Reference Picture Sets and Reference Picture Lists

For multiple-reference picture management, a particular set of previously decoded pictures needs to be present in the decoded picture buffer (DPB) for the decoding of the remainder of the pictures in the bitstream.

Fig. 2. Example of a temporal prediction structure and the POC values, decoding order, and RPS content for each picture.

For interpicture prediction, two lists of reference pictures are constructed: reference picture list 0 and list 1. An index called a reference picture index is used to identify a particular picture in one of these lists. For uniprediction, a picture can be selected from either of these lists. For biprediction, two pictures are selected—one from each list. When a list contains only one picture, the reference picture index implicitly has the value 0 and does not need to be transmitted in the bitstream.
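The sketch below illustrates uni- and biprediction from the two lists, restricted to integer MVs and plain averaging; HEVC's quarter-sample interpolation and weighted prediction (Section II) are omitted, and all helper names are hypothetical.

```python
import numpy as np

def motion_compensate(ref_pic, mv, x, y, n):
    """Fetch an NxN block from `ref_pic` displaced by an integer MV (toy)."""
    return ref_pic[y + mv[1]:y + mv[1] + n, x + mv[0]:x + mv[0] + n].astype(np.int32)

def predict_block(list0, list1, params, x, y, n):
    """Form the prediction for one block from reference picture lists 0/1.

    `params` holds (ref_idx, mv) under key "l0" and/or "l1".
    """
    preds = []
    for ref_list, key in ((list0, "l0"), (list1, "l1")):
        if key in params:
            ref_idx, mv = params[key]              # reference picture index + MV
            preds.append(motion_compensate(ref_list[ref_idx], mv, x, y, n))
    if len(preds) == 2:
        return (preds[0] + preds[1] + 1) >> 1      # biprediction: rounded average
    return preds[0]                                # uniprediction from one list

# Usage: one reference picture in each list, biprediction of a 4x4 block
ref = np.zeros((64, 64), dtype=np.uint8)
p = predict_block([ref], [ref], {"l0": (0, (2, 1)), "l1": (0, (-1, 0))}, 8, 8, 4)
print(p.shape)  # (4, 4)
```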

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

A. Sampled Representation of Pictures

For representing color video signals, HEVC typically uses a tristimulus YCbCr color space with 4:2:0 sampling (although extension to other sampling formats is straightforward, and is planned to be defined in a subsequent version). This separates a color representation into three components called Y, Cb, and Cr. The Y component is also called luma, and represents brightness. The two chroma components Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively. Because the human visual system is more sensitive to luma than chroma, the 4:2:0 sampling structure is typically used, in which each chroma component has one fourth of the number of samples of the luma component (half the number of samples in both the horizontal and vertical dimensions). Each sample for each component is typically represented with 8 or 10 b of precision, and the 8-b case is the more typical one. In the remainder of this paper, we focus our attention on the typical use: YCbCr components with 4:2:0 sampling and 8 b per sample for the representation of the encoded input and decoded output video signal.

The video pictures are typically progressively sampled with rectangular picture sizes W×H, where W is the width and H is the height of the picture in terms of luma samples.
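The 4:2:0 arithmetic is easy to check directly; the helper below (illustrative only, not from the standard) computes the per-plane sample counts and the raw size of one picture.

```python
def yuv420_plane_sizes(width, height, bit_depth=8):
    """Sample counts and raw byte size of one 4:2:0 picture (even W and H).

    Each chroma plane is subsampled by 2 both horizontally and vertically,
    so it holds one quarter as many samples as the luma plane.
    """
    bytes_per_sample = (bit_depth + 7) // 8
    luma = width * height
    chroma = (width // 2) * (height // 2)  # per chroma component (Cb or Cr)
    total_bytes = (luma + 2 * chroma) * bytes_per_sample
    return luma, chroma, total_bytes

# A 1920x1080 8-b picture: 2073600 luma samples, 518400 per chroma plane,
# and 3110400 bytes in total (1.5 bytes per pixel).
print(yuv420_plane_sizes(1920, 1080))
```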