IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 9, SEPTEMBER 2007
Spatial Scalability Within the H.264/AVC Scalable Video Coding Extension

C. Andrew Segall, Member, IEEE, and Gary J. Sullivan, Fellow, IEEE

(Invited Paper)
Abstract—A scalable extension to the H.264/AVC video coding standard has been developed within the Joint Video Team (JVT), a joint organization of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The extension allows multiple resolutions of an image sequence to be contained in a single bit stream. In this paper, we introduce the spatially scalable extension within the resulting Scalable Video Coding standard. The high-level design is described and individual coding tools are explained. Additionally, encoder issues are identified. Finally, the performance of the design is reported.

Index Terms—H.264/AVC, Scalable Video Coding (SVC), spatial scalability.
I. INTRODUCTION
WITH the expectation that future applications will support a diverse range of display resolutions and transmission channel capacities, the Joint Video Team (JVT) has developed a scalable extension [1], [2] to the state-of-the-art H.264/AVC video coding standard [3]–[6]. This extension is commonly known as Scalable Video Coding (SVC) and it provides support for multiple display resolutions within a single compressed bit stream (or in hierarchically related bit streams), which is referred to here as spatial scalability. Additionally, the SVC extension supports combinations of temporal scalability (frame rate enhancement) and quality scalability (fidelity enhancement for pictures of the same resolution) with the spatial scalability feature [2]. This is achieved while balancing both decoder complexity and coding efficiency.

The resolution diversity of current display devices motivates the need for spatial scalability. Specifically, larger format, high definition displays are becoming common in consumer applications, with displays containing over two million pixels readily available. By contrast, lower resolution displays with between ten thousand and one hundred thousand pixels are also popular in applications constrained by size, power and weight. Unfortunately, transmitting a single representation of a video sequence to the range of display resolutions available in the market is impractical. For example, it is rarely justifiable to design a device with low display resolution with the capacity for decoding and down-sampling high-resolution video material. Such a requirement could increase the cost and power of the device to the point of exceeding the very constraints that determined its display resolution. In addition, sending the high-resolution details that are ultimately not shown on the display for such a device is a waste of its receiving channel bit rate.

Diverse, limited, and time-varying channel capacity provides a second motivation for spatial scalability. Here, the concern is that channel capacity may preclude the reliable transmission of high-resolution video to specific devices or at specific time instances. Spatial scalability allows for the rapid bit rate adaptation that can be a necessity in such scenarios. This bit rate adaptation is achieved without transcoding operations or feedback to a complex real-time encoding process, both of which can introduce unacceptable complexity and delay.

The purpose of this paper is to discuss key concepts of spatial scalability within the SVC extension. This project is the fourth in a historical series of efforts to standardize spatially scalable video coding schemes (after prior efforts in MPEG-2 [7], [8], H.263 Annex O [9], and MPEG-4 part 2 [10]), although the prior designs were basically not successful in terms of industry adoption. This paper points out several ways in which the new design addresses the problems of those prior approaches.

The rest of this paper is organized as follows. Section II provides an overview of H.264/AVC spatially scalable coding and compares it to alternative scalable approaches. In Section III, the specific coding tools within the spatial SVC design are described. In Section IV, encoder issues related to spatial SVC are considered. In Section V, the performance of the spatial SVC extension is presented. Finally, conclusions are provided in Section VI.

Manuscript received January 7, 2007; revised July 24, 2007. This paper was recommended by Guest Editor T. Wiegand.
C. A. Segall is with Sharp Laboratories of America, Camas, WA 98607 USA (e-mail: asegall@sharplabs.com).
G. J. Sullivan is with Microsoft Corporation, Redmond, WA 98052 USA (e-mail: garysull@microsoft.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2007.906824
II. OVERVIEW
The SVC extension of H.264/AVC provides a mechanism for reusing an encoded lower resolution version of an image sequence for the coding of a corresponding higher resolution sequence. This is shown in Fig. 1, where a diagram of a hypothetical SVC encoder is provided. Subsequent sections discuss the specific tools introduced in the SVC extension. However, to better aid in the understanding of the SVC design, this section focuses on higher level concepts. We begin by identifying basic concepts and definitions necessary for discussion of the SVC design. Then, we consider the high level spatial relationship between resolutions in a bit stream. Finally, we summarize

1051-8215/$25.00 © 2007 IEEE
[Fig. 1: block diagram. A base layer encoder (separate frame into macroblocks, intra-prediction, motion compensation, transform and quantize, scaling and inverse transform, deblocking operation, control) codes the low resolution sequence; an enhancement layer encoder with the same stages codes the high resolution sequence, with an inter-layer prediction path connecting the two.]

Fig. 1. High-level diagram of spatial scalability in the SVC design. The "base layer" encoder takes a lower resolution video sequence as input and encodes it with the H.264/AVC video coding standard while conforming to a legacy profile. The enhancement layer encoder takes a higher resolution sequence as input. The higher resolution sequence can be encoded with ordinary H.264/AVC technologies. Moreover, inter-layer prediction can be used to provide additional coding choices. For the case of intra-picture coded blocks in the base layer, reconstructed intensities provide a prediction for the enhancement layer. For the case of inter-picture coded blocks in the base layer, enhancement layer motion vectors and residual difference information can be predicted from the base layer. Further resolution layers can be added in an analogous fashion and can utilize either the base layer or previously transmitted enhancement layers for inter-layer prediction. Moreover, other forms of SVC (temporal or quality) enhancement may also be present.
two key design concepts in the SVC extension—image pyramids and single-loop decoding.

A. Basic Concepts

The basic mission of a scalable design is two-fold: 1) to minimize the coding efficiency loss relative to single-layer coding; and 2) to minimize the complexity increase (especially for decoders) relative to single-layer coding. By single-layer coding, we refer to the coding of a video sequence without providing the scalability functionality. Unless a result with coding efficiency significantly superior to a simulcast solution can be obtained, a scalable solution with any complexity penalty is useless. By simulcast, we refer to the coding of both source video sequences of a scalable scenario as entirely separate single-layer bit streams and transmitting them using the sum of the two bit rates. The challenge here is considerable—among the three typical basic forms of bit stream scalability, i.e., spatial, temporal, and quality, the spatial form seems to be the most difficult in which to achieve significant superiority to a simulcast solution. One dominant reason for this is the focus of the JVT on supporting lower resolution versions of image sequences with high visual quality, as opposed to lower resolution representations that provide high coding efficiency.

The lowest resolution video data in a spatially scalable system is sometimes referred to as the base layer (especially when it is decodable by an ordinary nonscalable single-layer decoder), and the higher resolution video data is often referred to as the enhancement layer. Processes that determine or predict the value
of enhancement layer data from previously reconstructed data of a lower resolution layer at the same time instance are referred to as inter-layer prediction processes, and the source for the prediction is referred to as the reference layer. Other forms of prediction include inter-picture prediction, involving prediction operating temporally between different pictures of the same resolution layer, and intra-picture prediction, involving prediction operating spatially within the same picture of one particular resolution layer.

From a video coding specification perspective, the set of data comprising a SVC representation is treated as a single bit stream. However, from a systems multiplex or file storage perspective, the data might often be handled differently—as distinct hierarchically related streams of content that are coordinated using decoding timestamps or other such mechanisms. In this fashion, a system can ease the handling of the data, such as enabling channel bit rate adaptation or ensuring that legacy decoders that do not support scalability are presented with only the base layer for decoding.

Some degree of familiarity with the concepts of the original H.264/AVC standard is assumed in the presentation provided herein, such as the concepts of macroblocks, motion partitions, biprediction, inter-picture prediction using multiple reference pictures, and reference picture lists. Readers unfamiliar with this background information may benefit from referring to [3]–[6]. Moreover, one topic that is somewhat neglected in this presentation is that of interlaced-scan video content. Herein the principles of the spatial SVC design are explained under the assumption of frame-structured progressive-scan pictures, so that the concepts can be described without the need to consider the details of the handling of interlaced fields and frames. The application of these SVC concepts to interlaced video is straightforward for those familiar with interlaced video coding using H.264/AVC. For further information about interlaced video support in the SVC context, the reader is referred to [11]. The overview of SVC in general that is found in [2] will also be of interest to many readers.

An additional simplification used in much of the discussion for this overview paper is to primarily consider a bit stream containing only two layers—a lower resolution base layer and a higher resolution spatial scalability enhancement layer. In fact, the SVC design fully supports multilayer scenarios including multiple spatial scalability layers and the mixing of spatial scalability layers with other layers that provide temporal or quality scalability. Considerable flexibility is also provided in regard to the selection of the reference layer for each enhancement layer, such that a bit stream can contain branching dependency structures.

B. Inter-Layer Spatial Relationships and Profile Constraints

An important feature of the SVC design, from a high-level functionality perspective, is the ability for the lower resolution and higher resolution pictures in a spatially scalable bit stream to represent different regions of a video scene. For example, a system may transmit standard definition television content, with a picture aspect ratio of 4:3, as a base layer and high definition television content, with a picture aspect ratio of 16:9, in a higher resolution enhancement layer. Such a use case requires cropping and offsetting the origins of the picture regions in addition to scaling, as the lower resolution layer signal may not represent the entire extent of the higher resolution sequence (and vice versa). The common "pan and scan" technique used on standard-definition DVDs for converting wide screen data for display on a 4:3 display is an example of a more limited form of such display adaptation. The SVC extension supports such capability in a flexible but straightforward manner. Relative positioning and windowing parameters are provided in picture-level syntax structures, so that flexible cropping, scaling, and alignment relationships can not only be supported but may be varied on a picture-by-picture basis.

However, such flexibility can be constrained to simplify the use cases for particular applications. In particular, the SVC extension includes the definition of three profiles of the design. These are the "Scalable Baseline" profile, the "Scalable High" profile, and the "Scalable High Intra" profile. While the latter two profiles support full spatial SVC flexibility, the Scalable Baseline profile imposes the following constraints to enable simplified application scenarios.
• The width and height of the scaled regions of lower resolution and higher resolution pictures must have the same scaling ratio, and this ratio can only have the value 1.5 or 2.
• The spatial offsets specifying the relative location of the upper left corner of the lower and higher resolution picture regions must be multiples of 16 both horizontally and vertically (i.e., they must be in units of macroblocks).
The case using a scaling ratio of 2 with spatial offset constraints as noted above is often referred to as dyadic spatial scalability, whereas the more general case is known as extended spatial scalability [12].
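The Scalable Baseline constraints listed above can be captured in a small check. The function below is purely illustrative (its name and interface are not part of the standard):

```python
# Illustrative check of the Scalable Baseline profile constraints described
# above: equal horizontal/vertical scaling ratio restricted to 1.5 or 2, and
# macroblock-aligned (multiple-of-16) spatial offsets.
def scalable_baseline_ok(base_w, base_h, scaled_w, scaled_h, off_x, off_y):
    ratio_w = scaled_w / base_w
    ratio_h = scaled_h / base_h
    return (ratio_w == ratio_h and ratio_w in (1.5, 2.0)
            and off_x % 16 == 0 and off_y % 16 == 0)

print(scalable_baseline_ok(352, 288, 704, 576, 0, 0))    # dyadic (2x): True
print(scalable_baseline_ok(352, 288, 528, 432, 16, 32))  # 1.5x, MB-aligned: True
print(scalable_baseline_ok(352, 288, 704, 576, 8, 0))    # offset not MB-aligned: False
```

Configurations outside these constraints (e.g., a scaling ratio of 3, or non-macroblock-aligned offsets) fall under extended spatial scalability and require one of the other two profiles.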
C. Image Pyramids and Related Coarse-to-Fine Hierarchies

Image pyramids describe a relationship between lower resolution and higher resolution versions of an image.¹ This relationship is found in a variety of image and video processing scenarios, and image pyramids have been incorporated into a variety of applications, e.g., [13]–[19], as well as previous scalable efforts in the video coding standards [7]–[10]. In the SVC extension, a coarse-to-fine hierarchy of images is also used for spatial scalability. The original high-resolution image sequence is converted to lower resolutions by filtering and decimating. Then, the sequence of pictures at the lowest of these resolutions is coded in a manner such that it can be decoded independently. Each higher resolution video sequence is coded relative to a decoded lower resolution sequence.

As with prior international standards for video coding (scalable and nonscalable), the scope of the standard is limited to specifying the decoding process and the format of the syntax. Encoder designers are free to use any encoding algorithms they wish, so long as the bit stream they produce conforms to the format specification. Any kind of preprocessing is also allowed prior to encoding, and decoding devices are allowed to contain any sort of post-processing, error and loss concealment techniques, and display-related customization.

¹The terms picture and image are used interchangeably herein.
The use of an image pyramid for video coding does not come without penalties. Specifically, an image pyramid is an overcomplete decomposition. In other words, the number of image samples in the entire pyramid structure is larger than the number of samples in an original high-resolution image. This is in contrast to embedded representations that use critically sampled decompositions. For example, wavelet decompositions are well known to provide inherent scalability and viable image coding designs [20]–[24]. In the development of the SVC extension, such critically sampled decompositions were also considered [25]–[28]. However, the aliasing introduced by these decompositions, while suitable for still image coding, was deemed problematic for video. Specifically, the aliasing can make effective motion compensated inter-picture prediction more difficult, as well as lead to objectionable temporal artifacts. Additionally, the wavelet design may be likely to require more computational resources than the traditional block-based coding approach.

The decision to use an image pyramid in the SVC project provides flexibility for encoder and application designers. The down-sampling operation is not defined in the standard, so that encoder designers are free to employ the down-sampler that they consider most suitable. For example, applications that are sensitive to encoder hardware costs would select a down-sampler with minimum complexity for the specific implementation architecture. Alternatively, in other applications, additional computational complexity may be acceptable in order to achieve a higher quality result. These applications would choose a more sophisticated, and likely more complex, down-sampling method.
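The filtering-and-decimation step that builds the pyramid can be sketched as follows. The 2 × 2 box filter used here is only an assumed choice for illustration, since, as noted above, the standard deliberately leaves the down-sampler to the encoder designer:

```python
# Minimal image-pyramid sketch: derive a lower resolution layer by filtering
# and decimating. The 2x2 box filter is an illustrative assumption, not a
# choice made by the standard.
def downsample_2x(img):
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]

hi = [[16 * y + x for x in range(4)] for y in range(4)]
lo = downsample_2x(hi)   # 2x2 base layer derived from the 4x4 original
print(lo)                # [[8, 10], [40, 42]]
# Note the overcompleteness mentioned above: the two-level pyramid holds
# 4*4 + 2*2 = 20 samples, versus 16 in the original image alone.
```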
D. Single-Loop Decoding Concept

The concept of image pyramids describes the relationship between images of different resolution. However, image pyramids do not capture the evolution of that relationship between compressed images through time in sequences of such images. To understand this relationship, we need to consider the concepts of multiloop and single-loop decoding. Single-loop decoding, also called constrained inter-layer prediction [29]–[32], is a fundamental property of the new SVC design, and it is described in the remainder of the section.

In the family of ITU-T and ISO/IEC video coding standards, which includes H.264/AVC, block-wise motion compensated inter-picture prediction plays a critical role in improving coding efficiency. This is accomplished by transmitting (or having the decoder infer) one or more motion vectors to predict a block in the current picture from the content of previously decoded reference pictures. Then, additional information about the residual difference between the prediction and the actual image data may be sent. For natural image sequences, which often contain slowly evolving features, the motion compensation process exploits the inherent characteristics of the image sequence.

In designing the SVC extension of the H.264/AVC standard, a fundamental question was how to use the motion compensation process within the context of spatial scalability. One potential approach (used in all previous standardized designs) would be to perform multiloop decoding. In this scenario, each low-resolution picture is completely decoded, including low-resolution motion-compensation prediction operations in particular. Then, the coarse-to-fine relationship of the image pyramid is used to predict the lower frequency components of a higher resolution enhancement-layer picture using up-sampling of the decoded lower resolution picture. Additionally, motion compensated inter-picture prediction is performed again at the enhancement layer. This predicts the high frequency components of the enhancement layer.

Using a decoder with multiple motion compensation loops does improve the coding efficiency of a scalable video codec, but the benefit in coding efficiency turns out to be minimal when all available coded data is used effectively in other ways [29]–[32]. Moreover, the multiloop decoding scheme increases decoding complexity. Motion compensation is performed at each resolution and the reconstructed pictures of all levels of the pyramid are stored for each time instant. This becomes problematic in practice, as motion compensation requires high memory bandwidths for many processing architectures [33], and the extra decoding processes involved in multiloop decoding add undesirable sequential dependencies to the decoding process as well as require extra encoder and decoder implementation and debugging efforts.

In the SVC design, a lower complexity approach is adopted. Motion compensation is performed only at the target decoded resolution (e.g., the displayed resolution). Thus, the decoding structure of the SVC design is referred to as a single-loop design, which simply means that only the operation of a single motion compensation loop is necessary to reconstruct the image sequence for any resolution layer. This provides an important feature, as it reduces the complexity of motion compensation to that of a single-layer decoder—eliminating the major source of complexity penalty in prior SVC designs. As will be seen in the next section, good coding efficiency can still be achieved without requiring multiloop decoding, by effectively propagating the information found in the coded motion vectors, mode information and residual difference data from each lower resolution layer to each next higher resolution layer. This propagation employs the previously described image pyramid concept.

To further ease implementation, the syntax of the SVC extension has been designed in a way that it allows the separate parsing of each layer of the syntax (without parsing other layers and without operating the decoding processes of lower layers) [34], [35]. Completing the full decoding process, of course, requires further processing of some parsed data of each layer up to the target decoded layer (but not full multilayer decoding, due to the single-loop nature of the design).

III. CODING TOOLS

SVC introduces several design features to enable spatial scalability. These tools include the calculation of corresponding positions in different resolution layers, methods for inter-layer prediction of various data such as macroblock prediction modes and motion vectors, an "I_BL" macroblock type that uses inter-layer up-sampled image prediction, and a residual difference signal prediction technique that uses inter-layer up-sampled residual difference prediction. These tools are provided in addition to the original single-layer coding tools, such as (spatial) intra-picture and (temporal) inter-picture coding techniques, and an encoder must determine when each tool is most appropriate. Describing the new tools and how they are combined effectively with the original H.264/AVC single-layer design features is the focus of this section.
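Before turning to the individual tools, the single-loop complexity argument of Section II-D can be made concrete with a toy count. The cost model of one motion compensation operation per sample per decoded layer is an assumption for illustration only, not a measurement of any real decoder:

```python
# Toy illustration of the single-loop vs. multiloop complexity argument:
# a multiloop decoder runs motion compensation at every layer up to the
# target, while a single-loop decoder runs it only at the target layer.
def mc_operations(layer_widths, layer_heights, target, single_loop):
    # Assumed cost model: one MC operation per sample per decoded layer.
    layers = [target] if single_loop else range(target + 1)
    return sum(layer_widths[i] * layer_heights[i] for i in layers)

w, h = [352, 704], [288, 576]   # CIF base layer, 4CIF enhancement layer
print(mc_operations(w, h, 1, single_loop=False))  # 506880 (both layers)
print(mc_operations(w, h, 1, single_loop=True))   # 405504 (target layer only)
```

The multiloop decoder also has to store the reconstructed pictures of every pyramid level per time instant, which is the memory-bandwidth concern cited above.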
A. Calculation of Corresponding Spatial Positions

The first design feature that we will discuss in detail is the calculation of corresponding positions in adjacent levels of the pyramid hierarchy. This concept is used in several ways in the spatial SVC extension.

Identifying sample locations in a lower resolution layer that correspond to sample locations in the enhancement layer is performed at fractional-sample accuracy. Specifically, sample positions are calculated to 1/16th sample position increments and derived using fixed-point operations as

    B_x = Round((E_x * D_x + R_x) / 2^(S-4))
    B_y = Round((E_y * D_y + R_y) / 2^(S-4))        (1)

where B_x and B_y are, respectively, the horizontal and vertical sample coordinates in the lower resolution (e.g., base layer) picture array, E_x and E_y are horizontal and vertical sample coordinates in the high-resolution (enhancement-layer) picture array, R_x and R_y are higher precision (1/2^S sample position) reference offset locations for grid reference position alignment, and D_x and D_y are scaled inverses of the horizontal and vertical resampling ratios. D_x and D_y are specified as

    D_x = Round((2^S * BaseWidth) / ScaledBaseWidth)
    D_y = Round((2^S * BaseHeight) / ScaledBaseHeight)        (2)

where BaseWidth and BaseHeight denote the width and height of the rectangular region of the lower resolution picture array to be up-sampled, respectively, and ScaledBaseWidth and ScaledBaseHeight denote the width and height of the corresponding region of the up-sampled lower resolution picture array, respectively. The precision control parameter S has been chosen to trade off between precision and ease of computation; S is specified to be 16 for most uses to enable the use of 16-bit word-length arithmetic, and to be a somewhat larger number optimized for 32-bit arithmetic for enhanced-capability decoders that support very large picture sizes. The basic design of these formulas was proposed in [36], and some later refinements were subsequently applied. The formulas are designed for computational simplicity as follows.
• The above formulas are specified for implementation using two's complement integer operations, most of which require at most 16 bits of dynamic range (for example, noting that BaseWidth is always less than ScaledBaseWidth, D_x requires no more than S bits).
• Multiplication and division scale factors that are powers of two are specified to be performed using left and right binary arithmetic shifts.
• Rounding of a ratio is accomplished by adding half of the value of the denominator prior to right shifting.
• The D_x and D_y computations only need to be performed once, with the results reused repeatedly for computations of B_x and B_y for the entire image (or sequence of video images).
• When moving from position to position from left to right or top to bottom in computing B_x and B_y for a series of values of E_x and E_y, a multiplication operation can be converted to an addition so that computation of each B_x and B_y can be performed incrementally, requiring only one addition and one right shift operation to obtain the result of each formula.

This design supports essentially arbitrary resizing ratios (except in constrained applications using the Scalable Baseline profile), and the position calculation equations have low complexity regardless of the ratio, in contrast to some prior standardized designs in which only relatively simple rational ratios were practical due to the way the position calculations were specified.

B. Coarse-to-Fine Projection of Macroblock Modes, Motion Partitioning, Reference Picture Indices, and Motion Vectors

In the enhancement layer syntax for areas of the enhancement layer that correspond to areas within the lower resolution picture, a flag, called the base mode flag, can be sent for each nonskipped macroblock² to determine whether the macroblock mode, motion segmentation, reference picture indices, and motion vectors are to be inferred from the data at corresponding positions in the lower resolution layer. The basic concepts of this inference process were proposed for use with dyadic spatial scalability in [37] and were extended to arbitrary spatial scalability relationships in [38]–[40]. In some sense the projection consists of first projecting the sample grid of the finer level to the coarser level of the pyramid and then using this projection to propagate data from the coarser level to the finer level.

When the base mode flag is equal to 0, the macroblock prediction mode is sent within the enhancement layer macroblock-level syntax. Then, within each motion partition, a flag can be sent for each reference picture list, called the motion prediction flag,³ to determine whether reference picture indexes will be sent in the enhancement layer or not and whether the motion vectors are to be predicted within the enhancement layer or using inter-layer prediction from the lower resolution layer motion data.

²To save the need to repeatedly send the base mode flag in cases when an encoder will not vary its value in applicable macroblocks, a default value for the flag can alternatively be sent at the slice header level.
³To save the need to repeatedly send the motion prediction flag in cases when an encoder will not vary its value in applicable macroblocks, a default value for the flag can alternatively be sent at the slice header level.

When the base mode flag is equal to 1, since the finest granularity of H.264/AVC coding decisions is at the 4 x 4 level, the inference process is performed based on 4 x 4 luma block structures. For each 4 x 4 luma block, the process begins by identifying a corresponding block in the lower resolution layer. Numbering the samples of the luma block from 0 to 3 both horizontally and vertically, the luma sample at position (1,1) is used to determine the block's associated data. A corresponding sample in the lower resolution layer for this sample is identified in a similar manner as described in Section III-A, but
with nearest sample precision instead of 1/16th sample precision. The prediction type (intra-picture, inter-picture predictive, or inter-picture bipredictive), reference picture indices, and motion vectors associated with the prediction block containing the corresponding lower resolution layer position are then assigned to the 4 x 4 enhancement layer block. Motion vectors are scaled by the resampling ratio and offset by any relative picture grid spatial offset so that they become relevant to the enhancement layer picture coordinates. Then a merging process takes place to determine the final mode and motion segmentation in the enhancement layer macroblock.
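The position mapping used by this inference can be sketched in fixed point, following the description in Section III-A. This is a sketch, not the normative derivation: S = 16, the round-by-adding-half-the-denominator rule, and the 1/16th-sample versus nearest-sample variants are taken from the text, while the exact shift amounts are an assumption of this illustration:

```python
# Fixed-point corresponding-position sketch (names follow the paper; S = 16).
S = 16

def round_ratio(numer, denom):
    # Rounding of a ratio: add half of the denominator, then divide
    # (a right shift when the denominator is a power of two).
    return (numer + (denom >> 1)) // denom

def scaled_inverse_ratio(base_dim, scaled_dim):
    # Eq. (2): D = Round(2^S * BaseWidth / ScaledBaseWidth)
    return round_ratio(base_dim << S, scaled_dim)

def base_pos_sixteenth(e, d, r=0):
    # Eq. (1): B = Round((E*D + R) / 2^(S-4)), in 1/16th-sample units
    return (e * d + r + (1 << (S - 5))) >> (S - 4)

def base_pos_nearest(e, d, r=0):
    # Same mapping at nearest-sample precision, as used for the
    # base-mode-flag inference described above.
    return (e * d + r + (1 << (S - 1))) >> S

d = scaled_inverse_ratio(352, 704)   # dyadic ratio: d == 2^15
print(base_pos_sixteenth(101, d))    # 808, i.e., position 50.5 in 1/16th units
print(base_pos_nearest(101, d))      # 51, the nearest whole sample
```

Note how D only depends on the picture dimensions, so it is computed once and reused for every sample position, matching the bullet points in Section III-A.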
If all 4 x 4 luma blocks of the enhancement macroblock correspond to intra-picture coded lower resolution layer blocks, the inferred macroblock type is considered to be "I_BL," a macroblock type that is described in the following section; otherwise, motion segmentation, reference picture indices, and motion vectors then need to be inferred. (It should be noted that because the prediction mode is determined from only one position in each 4 x 4 block, it is possible that a few samples in an enhancement layer I_BL macroblock may have corresponding locations in the lower resolution layer picture that lie in inter-picture predicted regions of the lower resolution layer.)

In H.264/AVC, reference picture indexes have an 8 x 8 luma granularity. To achieve this granularity, for each 8 x 8 luma region of the enhancement layer, the reference picture index is set to the minimum of the reference picture indexes inferred from the corresponding constituent 4 x 4 blocks when performing inter-layer motion prediction [38]–[40]. When some lower resolution layer blocks are in a B-slice, the minimum is computed separately for each of the two reference picture lists and biprediction is inferred if both lists were used in the set of 4 x 4 blocks. For 4 x 4 regions that did not use a selected reference picture index (or indices, in the case of biprediction), the motion vector is set to that of a neighboring block that did (so that some motion vector value is assigned that is relevant to the selected reference picture index).

Then the values of motion vectors are inspected to determine the final motion partitioning of the enhancement layer macroblock (4 x 4, 4 x 8, 8 x 8, 8 x 16, 16 x 8, or 16 x 16). Partitions with identical reference picture indexes and similar or identical motion vectors are merged to make the final predicted motion more coherent and reduce the complexity of the associated inter-picture prediction processing [41].

The result is predicted mode and motion data that fits with the same basic structure of ordinary single-layer H.264/AVC prediction.
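The 8 x 8 reference index inference just described can be illustrated with a simplified sketch. It is not the normative process: the text says a neighboring block supplies the motion vector, and here the first matching 4 x 4 block simply serves as the donor:

```python
# Simplified illustration of the 8x8 reference index inference: the region
# takes the minimum of the indices inferred for its four 4x4 blocks, and
# 4x4 blocks whose index does not match borrow a motion vector from a
# block that does (here, simply the first such block).
def infer_8x8(ref_idx, mv):
    """ref_idx: four inferred reference indices (raster order);
    mv: their motion vectors as (x, y) tuples."""
    chosen = min(ref_idx)
    donor = next(m for r, m in zip(ref_idx, mv) if r == chosen)
    out = [m if r == chosen else donor for r, m in zip(ref_idx, mv)]
    return chosen, out

idx, mvs = infer_8x8([2, 0, 0, 1], [(4, 4), (1, 0), (2, 0), (8, 8)])
print(idx, mvs)   # 0 [(1, 0), (1, 0), (2, 0), (1, 0)]
```

Blocks that already used the chosen index keep their own motion vectors; only the mismatched blocks are overwritten, so every 4 x 4 block ends up with a vector that is meaningful for the selected reference picture.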
C. I_BL Macroblock Type and Inter-Layer Texture Prediction

the prediction. Finally, a deblocking filter is applied to the resulting picture.

It is important to understand that I_BL macroblocks cannot occur at arbitrary locations in the enhancement layer. Instead, the I_BL macroblock type is only available whe
