IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 8, NO. 7, NOVEMBER 1998
849

H.263+: Video Coding at Low Bit Rates

Guy Côté, Student Member, IEEE, Berna Erol, Michael Gallant, Student Member, IEEE,
and Faouzi Kossentini, Member, IEEE
`
Abstract—In this tutorial paper, we discuss the ITU-T H.263+ (or H.263 Version 2) low-bit-rate video coding standard. We first describe, briefly, the H.263 standard including its optional modes. We then address the 12 new negotiable modes of H.263+. Next, we present experimental results for these modes, based on our public-domain implementation (see our Web site at http://spmg.ece.ubc.ca). Tradeoffs among compression performance, complexity, and memory requirements for the H.263+ optional modes are discussed. Finally, results for mode combinations are presented.

Index Terms—H.263, H.263+, video compression standards, video compression and coding, video conferencing, video telephony.
`
I. INTRODUCTION

IN the past few years, there has been significant interest in digital video applications. Consequently, academia and industry have worked toward developing video compression techniques [1]–[5], and several successful standards have emerged, e.g., ITU-T H.261, H.263, ISO/IEC MPEG-1, and MPEG-2. These standards address a wide range of applications having different requirements in terms of bit rate, picture quality, complexity, error resilience, and delay.

While the demand for digital video communication applications such as videoconferencing, video e-mailing, and video telephony has increased considerably, transmission rates over public switched telephone networks (PSTN) and wireless networks are still very limited. This requires compression performance and channel error robustness levels that cannot be achieved by previous block-based video coding standards such as H.261. Version 1 of the international standard ITU-T H.263, entitled "Video Coding for Low Bit Rate Communications" [6], addresses the above requirements and, as a result, became the new low-bit-rate video coding standard.

Although its coding structure is based on that of H.261, H.263 provides better picture quality at low bit rates with little additional complexity. It also includes four optional modes aimed at improving compression performance. H.263 has been adopted in several videophone terminal standards, notably ITU-T H.324 (PSTN), H.320 (ISDN), and H.310 (B-ISDN).
`
Manuscript received October 26, 1997; revised April 24, 1998. This work was supported by the Natural Sciences and Engineering Research Council of Canada and by AVT Audio Visual Telecommunications Corporation. This paper was recommended by Associate Editor M.-T. Sun.

The authors are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, B.C., V6T 1Z4, Canada.

Publisher Item Identifier S 1051-8215(98)06325-3.
`
H.263 Version 2, also known as H.263+ in the standards community, was officially approved as a standard in January 1998 [7]. H.263+ is an extension of H.263, providing 12 new negotiable modes and additional features. These modes and features improve compression performance, allow the use of scalable bit streams, enhance performance over packet-switched networks, support custom picture size and clock frequency, and provide supplemental display and external usage capabilities.
`
II. THE ITU-T H.263 STANDARD

The H.263 video standard is based on techniques common to many current video coding standards. In this section, we describe the source coding framework of H.263.

A. Baseline H.263 Video Coding

Fig. 1 shows a block diagram of an H.263 baseline encoder. Motion-compensated prediction first reduces temporal redundancies. Discrete cosine transform (DCT)-based algorithms are then used for encoding the motion-compensated prediction difference frames. The quantized DCT coefficients, motion vectors, and side information are entropy coded using variable-length codes (VLC's).
1) Video Frame Structure: H.263 supports five standardized picture formats: sub-QCIF, QCIF, CIF, 4CIF, and 16CIF. The luminance component of the picture is sampled at these resolutions, while the chrominance components, Cb and Cr, are downsampled by two in both the horizontal and vertical directions. The picture structure is shown in Fig. 2 for the QCIF resolution. Each picture in the input video sequence is divided into macroblocks, consisting of four luminance blocks of 8 pixels × 8 lines followed by one Cb block and one Cr block, each consisting of 8 pixels × 8 lines. A group of blocks (GOB) is defined as an integer number of macroblock rows, a number that is dependent on picture resolution. For example, a GOB consists of a single macroblock row at QCIF resolution.
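To make the macroblock arithmetic above concrete, the following sketch derives the macroblock grid for each standardized format from its luminance dimensions (the pixel dimensions and the helper names are supplied here for illustration; they are not part of the paper's text):

```python
# Illustrative sketch: macroblock grids for the five standardized
# H.263 picture formats, from their luminance dimensions in pixels.
FORMATS = {
    "sub-QCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}

def macroblock_grid(width, height, mb_size=16):
    """Return (macroblocks per row, macroblock rows) for a luma picture.
    A macroblock covers 16 x 16 luminance pixels (four 8 x 8 blocks)."""
    return width // mb_size, height // mb_size

for name, (w, h) in FORMATS.items():
    per_row, rows = macroblock_grid(w, h)
    print(f"{name}: {per_row} x {rows} = {per_row * rows} macroblocks")
```

At QCIF, this gives 11 × 9 = 99 macroblocks, and a GOB (one macroblock row at this resolution) contains 11 macroblocks.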
2) Video Coding Tools: H.263 supports interpicture prediction that is based on motion estimation and compensation. The coding mode where temporal prediction is used is called an inter mode. In this mode, only the prediction error frames—the difference between original frames and motion-compensated predicted frames—need be encoded. If temporal prediction is not employed, the corresponding coding mode is called an intra mode.
1051–8215/98 $10.00 © 1998 IEEE

Realtime Adaptive Streaming LLC
Exhibit 2011
IPR2019-01035
Page 1

Fig. 1. H.263 video encoder block diagram.

Fig. 2. H.263 picture structure at QCIF resolution.

Fig. 3. H.263 source coding algorithm: motion compensation.

a) Motion estimation and compensation: Motion-compensated prediction assumes that the pixels within the current picture can be modeled as a translation of those within a previous picture, as shown in Fig. 3. In baseline H.263, each macroblock is predicted from the previous frame. This implies an assumption that each pixel within the macroblock undergoes the same amount of translational motion. This motion information is represented by two-dimensional displacement vectors, or motion vectors. Due to the block-based picture representation, many motion estimation algorithms employ block-matching techniques, where the motion vector is obtained by minimizing a cost function measuring the mismatch between a candidate macroblock and the current macroblock. Although several cost measures have been introduced, the most widely used one is the sum of absolute differences (SAD), defined by

SAD(u, v) = \sum_{i=1}^{16} \sum_{j=1}^{16} | A(i, j) - B(i + u, j + v) |

where A(i, j) represents the (i, j)th pixel of a 16 × 16 macroblock from the current picture at the spatial location (x, y), and B(i + u, j + v) represents the (i, j)th pixel of a candidate macroblock from a reference picture at the spatial location (x, y) displaced by the vector (u, v). To find the macroblock producing the minimum mismatch error, we need to calculate the SAD at several locations within a search window. The simplest, but the most compute-intensive search
`
`
method, known as the full search or exhaustive search method, evaluates the SAD at every possible pixel location in the search area. To lower the computational complexity, several algorithms that restrict the search to a few points have been proposed [8]. In baseline H.263, one motion vector per macroblock is allowed for motion compensation. Both horizontal and vertical components of the motion vectors may be of half-pixel accuracy, but their values may lie only in the [−16, 15.5] range, limiting the search window used in motion estimation. A positive value of the horizontal or vertical component of the motion vector represents a macroblock spatially to the right of or below the macroblock being predicted, respectively.
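The full-search procedure above can be sketched directly from the SAD definition. The following is a minimal illustration (function names are assumptions, integer-pixel accuracy only for simplicity; H.263 also allows half-pixel vectors):

```python
# Full-search block matching: evaluate SAD at every displacement in a
# search window and keep the vector with the minimum cost.
def sad(cur, ref, bx, by, u, v, n=8):
    """Sum of absolute differences between the n x n block of `cur`
    anchored at (bx, by) and the block of `ref` displaced by (u, v)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[by + i][bx + j] - ref[by + v + i][bx + u + j])
    return total

def full_search(cur, ref, bx, by, search=2, n=8):
    """Exhaustive search: return the (u, v) minimizing SAD, trying only
    displacements that keep the candidate block inside the picture."""
    h, w = len(ref), len(ref[0])
    best = None
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            if 0 <= bx + u and bx + u + n <= w and 0 <= by + v and by + v + n <= h:
                cost = sad(cur, ref, bx, by, u, v, n)
                if best is None or cost < best[0]:
                    best = (cost, (u, v))
    return best[1]
```

The fast algorithms of [8] keep the same cost function but evaluate it at far fewer candidate displacements.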
b) Transform: The purpose of the 8 × 8 DCT specified by H.263 is to decorrelate the 8 × 8 blocks of original pixels or motion-compensated difference pixels, and to compact their energy into as few coefficients as possible. Besides its relatively high decorrelation and energy compaction capabilities, the 8 × 8 DCT is simple, efficient, and amenable to software and hardware implementations [9]. The most common algorithm for implementing the 8 × 8 DCT consists of eight-point DCT transformations of the rows and the columns, respectively. The 8 × 8 DCT is defined by

F(u, v) = \frac{C(u) C(v)}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} f(i, j) \cos\frac{(2i + 1) u \pi}{16} \cos\frac{(2j + 1) v \pi}{16}

where C(u) = 1/\sqrt{2} for u = 0 and C(u) = 1 otherwise, and similarly for C(v). Here, f(i, j) denotes the (i, j)th pixel of the 8 × 8 original block, and F(u, v) denotes the (u, v)th coefficient of the 8 × 8 DCT transformed block. The original 8 × 8 block of pixels can be recovered using an 8 × 8 inverse DCT (IDCT) given by

f(i, j) = \sum_{u=0}^{7} \sum_{v=0}^{7} \frac{C(u) C(v)}{4} F(u, v) \cos\frac{(2i + 1) u \pi}{16} \cos\frac{(2j + 1) v \pi}{16}.

Although exact reconstruction can be theoretically achieved, it is often not possible using finite-precision arithmetic. While forward DCT errors can be tolerated, inverse DCT errors must meet the accuracy requirements of the H.263 standard if compliance is to be achieved.
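The transform pair above can be implemented directly from its definition. The sketch below is a deliberately naive O(N⁴) transcription for clarity (real codecs use the fast row-column factorization mentioned in the text); the function names are ours:

```python
import math

# Direct implementation of the 8 x 8 DCT / IDCT pair as defined above.
N = 8

def C(k):
    """Normalization factor: 1/sqrt(2) for index 0, else 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct2(block):
    """Forward 8 x 8 DCT of a block given as a list of 8 rows."""
    return [[0.25 * C(u) * C(v) * sum(
                block[i][j]
                * math.cos((2 * i + 1) * u * math.pi / 16)
                * math.cos((2 * j + 1) * v * math.pi / 16)
                for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    """Inverse 8 x 8 DCT, recovering the pixel block."""
    return [[sum(0.25 * C(u) * C(v) * coef[u][v]
                 * math.cos((2 * i + 1) * u * math.pi / 16)
                 * math.cos((2 * j + 1) * v * math.pi / 16)
                 for u in range(N) for v in range(N))
             for j in range(N)] for i in range(N)]
```

In floating point the round trip is essentially exact; the compliance issue raised above arises only once the IDCT is realized in finite-precision integer arithmetic.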
c) Quantization: The human viewer is more sensitive to reconstruction errors related to low spatial frequencies than to those related to high frequencies [10]. Slow linear changes in intensity or color (low-frequency information) are important to the eye. Quick, high-frequency changes can often not be seen, and may be discarded. For every element position in the DCT output matrix, a corresponding quantized value is computed using the equation

L(u, v) = \frac{F(u, v)}{Q(u, v)}

where F(u, v) is the (u, v)th DCT coefficient and Q(u, v) is the (u, v)th quantization value. The resulting real numbers are then rounded to their nearest integer values. The net effect is usually a reduced variance between quantized coefficients as compared to the variance between the original DCT coefficients, as well as a reduction of the number of nonzero coefficients.

Fig. 4. Zigzag scan pattern to reorder DCT coefficients from low to high frequencies.

In H.263, quantization is performed using the same step size within a macroblock (i.e., using a uniform quantization matrix). Even quantization levels in the range from 2 to 62 are allowed, except for the first coefficient (DC coefficient) of an intra block, which is uniformly quantized using a step size of eight. The quantizers consist of equally spaced reconstruction levels with a dead zone centered at zero. After the quantization process, the reconstructed picture is stored so that it can be later used for prediction of the future picture.
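A uniform quantizer with a dead zone centered at zero, in the spirit of the description above, can be sketched as follows (this is an illustrative simplification with assumed function names; the exact rounding and clipping rules in H.263 differ between intra and inter coefficients):

```python
# Hedged sketch: uniform quantization with a dead zone at zero.
def quantize(coef, step):
    """Map a coefficient to an integer level. Magnitudes smaller than
    `step` fall in the dead zone and collapse to level 0."""
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) // step)

def dequantize(level, step):
    """Reconstruct at the midpoint of the selected interval, so that
    reconstruction levels are equally spaced away from the dead zone."""
    if level == 0:
        return 0
    sign = -1 if level < 0 else 1
    return sign * (abs(level) * step + step // 2)
```

Note how small coefficients (often noise) vanish entirely while larger ones are represented coarsely, which is the variance reduction described above.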
d) Entropy coding: Entropy coding is performed by means of variable-length codes (VLC's). Motion vectors are first predicted by setting their components' values to the median values of those of neighboring motion vectors already transmitted: the motion vectors of the macroblocks to the left, above, and above right of the current macroblock. The difference motion vectors are then VLC coded.

Prior to entropy coding, the quantized DCT coefficients are arranged into a one-dimensional array by scanning them in zigzag order. This rearrangement places the DC coefficient first in the array, and the remaining AC coefficients are ordered from low to high frequency. This scan pattern is illustrated in Fig. 4. The rearranged array is coded using a three-dimensional run-length VLC table, representing the triple (LAST, RUN, LEVEL). The symbol RUN is defined as the distance between two nonzero coefficients in the array. The symbol LEVEL is the nonzero value immediately following a sequence of zeros. The symbol LAST replaces the H.261 end-of-block flag, where "LAST = 1" means that the current code corresponds to the last coefficient in the coded block. This coding method produces a compact representation of the 8 × 8 DCT coefficients, as a large number of the coefficients are normally quantized to zero and the reordering results (ideally) in the grouping of long runs of consecutive zero values. Other information such as prediction types and quantizer indication is also entropy coded by means of VLC's.
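The zigzag scan and the (LAST, RUN, LEVEL) event generation described above can be sketched as follows (the helper names are ours, and the assignment of actual VLC codewords to the triples is omitted):

```python
# Zigzag scan of an 8 x 8 block and conversion of the scanned
# coefficients into (LAST, RUN, LEVEL) triples.
N = 8

def zigzag_order(n=N):
    """Return (row, col) pairs in zigzag order, DC coefficient first.
    Anti-diagonals alternate direction, as in Fig. 4."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_events(block):
    """Turn a quantized 8 x 8 block into (LAST, RUN, LEVEL) triples:
    RUN zeros precede each nonzero LEVEL; LAST = 1 marks the final one."""
    scanned = [block[i][j] for i, j in zigzag_order()]
    events, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            events.append([0, run, value])
            run = 0
    if events:
        events[-1][0] = 1  # mark the last coded coefficient
    return [tuple(e) for e in events]
```

A block with only a few nonzero low-frequency coefficients thus collapses to a handful of triples, which is exactly why the reordering pays off.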
`
`
Fig. 5. Improved PB frames. (a) Structure. (b) Forward prediction. (c) Backward prediction. (d) Bidirectional prediction.
`
3) Coding Control: The two switches in Fig. 1 represent the intra/inter mode selection, which is not specified in the standard. Such a selection is made at the macroblock level. The performance of the motion estimation process, usually measured in terms of the associated SAD values, can be used to select the coding mode (intra or inter). If a macroblock does not change significantly with respect to the reference picture, an encoder can also choose not to encode it, and the decoder will simply repeat the macroblock located at the subject macroblock's spatial location in the reference picture.
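Since the standard leaves this decision to the encoder, any SAD-based heuristic is permissible. One possible rule, sketched below with entirely arbitrary threshold values (the function name and thresholds are assumptions, not taken from the paper or the standard), is:

```python
# Illustrative macroblock coding-control heuristic based on SAD values.
def select_mode(best_sad, zero_mv_sad, skip_threshold=100, inter_bias=500):
    """best_sad: minimum SAD found by motion search.
    zero_mv_sad: SAD at zero displacement (no motion).
    Thresholds are arbitrary illustration values."""
    if zero_mv_sad < skip_threshold:
        return "skip"   # not coded: decoder repeats the co-located macroblock
    if best_sad + inter_bias < zero_mv_sad:
        return "inter"  # motion-compensated prediction clearly pays off
    return "intra"      # prediction is poor; code the block without it
```

Practical encoders refine this with measures such as the macroblock variance, but the structure of the decision is the same: cheap "skip" when nothing moved, "inter" when prediction is good, "intra" otherwise.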
`
B. Optional Modes

In addition to the core encoding and decoding algorithms described above, H.263 includes four negotiable advanced coding modes: unrestricted motion vectors, advanced prediction, PB frames, and syntax-based arithmetic coding. The first two modes are used to improve inter picture prediction. The PB-frames mode improves temporal resolution with little bit rate increase. When the syntax-based arithmetic coding mode is enabled, arithmetic coding replaces the default VLC coding. These optional modes allow developers to trade off between compression performance and complexity. We next provide a brief description of each of these modes. A more detailed description of such modes can be found in [11] and [12].
1) Unrestricted Motion Vector Mode (Annex D): In baseline H.263, motion vectors can only reference pixels that are within the picture area. Because of this, macroblocks at the border of a picture may not be well predicted. When the unrestricted motion vector mode is used, motion vectors can take on values in the range [−31.5, 31.5] instead of [−16, 15.5], and are allowed to point outside the picture boundaries. The longer motion vectors improve coding efficiency for larger picture formats, i.e., 4CIF or 16CIF. Moreover, by allowing motion vectors to point outside the picture, a significant gain is achieved if there is movement along picture edges. This is especially useful in the case of camera movement or background movement.
2) Syntax-Based Arithmetic Coding Mode (Annex E): Baseline H.263 employs variable-length coding as a means of entropy coding. In this mode, syntax-based arithmetic coding is used instead. Since VLC and arithmetic coding are both lossless coding schemes, the resulting picture quality is not affected, yet the bit rate can be reduced by approximately 5% due to the more efficient arithmetic codes. It is worth noting that use of this annex is not widespread.
3) Advanced Prediction Mode (Annex F): This mode allows the use of four motion vectors per macroblock, one for each of the four 8 × 8 luminance blocks. Furthermore, overlapped block motion compensation is used for the luminance macroblocks, and motion vectors are allowed to point outside the picture as in the unrestricted motion vector mode. Use of this mode improves inter picture prediction, and yields a significant improvement in subjective picture quality for the same bit rate by reducing blocking artifacts.
4) PB-Frames Mode (Annex G): In this mode, the frame structure consists of a P picture and a B picture, as illustrated in Fig. 5(a). The quantized DCT coefficients of the B and P pictures are interleaved at the macroblock layer such that a P-picture macroblock is immediately followed by a B-picture macroblock. Therefore, the maximum number of blocks transmitted at the macroblock layer is 12 rather than 6. The P picture is forward predicted from the previously decoded P
`
`
picture. The B picture is bidirectionally predicted from the previously decoded P picture and the P picture currently being decoded. The forward and backward motion vectors for a B macroblock are calculated by scaling the motion vector from the current P-picture macroblock using the temporal resolution of the P and B pictures with respect to the previous P picture. If this motion vector does not yield a good prediction, it can be enhanced by a delta vector. The delta vector is obtained by performing motion estimation, within a small search window, around the calculated motion vectors.

When decoding a PB-frame macroblock, the P macroblock is reconstructed first, followed by the B macroblock, since the information from the P macroblock is needed for B-macroblock prediction. When using the PB-frames mode, the picture rate can be doubled without a significant increase in bit rate.
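The temporal scaling of the P-picture motion vector described above can be sketched as follows. This is a hedged illustration with assumed names: TRB is the temporal distance from the previous P picture to the B picture, TRD the distance between the two P pictures, and the exact integer rounding rules of Annex G (including how the delta vector modifies the backward vector) are glossed over here:

```python
# Hedged sketch: derive forward/backward B-macroblock vectors by
# scaling the co-located P-macroblock vector MV by temporal position.
def pb_vectors(mv, trb, trd, delta=(0, 0)):
    """mv: (x, y) motion vector of the current P macroblock.
    trb/trd: temporal positions as described above.
    delta: optional small refinement vector for the forward direction."""
    forward = tuple(c * trb // trd + d for c, d in zip(mv, delta))
    backward = tuple(c * (trb - trd) // trd for c in mv)
    return forward, backward
```

With the B picture halfway between the two P pictures (trb = 1, trd = 2), the forward vector is half the P vector and the backward vector is its negated half, which matches the intuition of linear motion.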
`
III. THE ITU-T H.263+ STANDARD

The objective of H.263+ is to broaden the range of applications and to improve compression efficiency. H.263+, or H.263 Version 2, is backward compatible with H.263. Not only is this critical due to the large number of video applications currently using the H.263 standard, but it is also required by ITU-T rules.

H.263+ offers many improvements over H.263. It allows the use of a wide range of custom source formats, as opposed to H.263, wherein only five video source formats defining picture size, picture shape, and clock frequency can be used. This added flexibility opens H.263+ to a broader range of video scenes and applications, such as wide-format pictures, resizeable computer windows, and higher refresh rates. Moreover, picture size, aspect ratio, and clock frequency can be specified as part of the H.263+ bit stream. Another major improvement of H.263+ over H.263 is scalability, which can improve the delivery of video information in error-prone, packet-lossy, or heterogeneous environments by allowing multiple display rates, bit rates, and resolutions to be available at the decoder. Furthermore, picture segment¹ dependencies may be limited, likely reducing error propagation.
`
A. H.263+ Optional Modes

Next, we describe each of the 12 new optional coding modes of the H.263+ video coding standard, including the modification of H.263's unrestricted motion vector mode when used within an H.263+ framework.

1) Unrestricted Motion Vector Mode (Annex D): The definition of the unrestricted motion vector mode in H.263+ is different from that of H.263. When this mode is employed within an H.263+ framework, new reversible VLC's (RVLC's) are used for encoding the difference motion vectors. These codes are single valued, as opposed to the earlier H.263 VLC's, which were double valued. The double-valued codes were not popular due to limitations in their extendibility, and also due to their high implementation cost. Reversible VLC's are easy to implement, as a simple state machine can be used to generate and decode them.

¹ A picture segment is defined as a slice or any number of GOB's preceded by a GOB header.

Fig. 6. Neighboring blocks used for intra prediction in the advanced intra coding mode.
More importantly, reversible VLC's can be used to increase resilience to channel errors. The idea behind RVLC's is that decoding can be performed by processing the received motion vector part of the bit stream in the forward and reverse directions. If an error is detected while decoding in the forward direction, motion vector data are not completely lost, as the decoder can proceed in the reverse direction; this improves error resilience of the bit stream [13].² Furthermore, the motion vector range is extended to up to [−256, 255.5] depending on the picture size, as depicted in Table I. This is very useful given the wide range of new picture formats available in H.263+.
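The bidirectional-decoding property of reversible codes can be demonstrated with a toy codebook. The codewords below are an invented illustration, not the Annex D RVLC table: each codeword is both prefix-free and suffix-free, so the same bitstream parses unambiguously from either end.

```python
# Toy reversible VLC: an assumed 3-symbol codebook whose codewords are
# prefix-free and suffix-free, allowing forward and backward parsing.
CODE = {"a": "0", "b": "101", "c": "111"}
FWD = {v: k for k, v in CODE.items()}            # forward decode table
REV = {v[::-1]: k for k, v in CODE.items()}      # reversed-codeword table

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits, table):
    """Greedy parse: accumulate bits until a codeword is matched."""
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in table:
            out.append(table[word])
            word = ""
    return out

def decode_backward(bits):
    """Parse the stream from its last bit toward its first."""
    return decode(bits[::-1], REV)[::-1]
```

If a channel error corrupts the middle of such a stream, a decoder can still recover leading symbols by forward parsing and trailing symbols by backward parsing, which is exactly the resilience argument made above.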
2) Advanced Intra Coding Mode (Annex I): This mode improves compression performance when coding intra macroblocks. In this mode, inter block prediction from neighboring intra coded blocks, a modified inverse quantization of intra DCT coefficients, and a separate VLC table for intra coded coefficients are employed. Block prediction is performed using data from the same luminance or chrominance component (Y, Cb, or Cr). As illustrated in Fig. 6, one of three different prediction options can be signaled: DC only, vertical DC and AC, or horizontal DC and AC. In the DC only option, only the DC coefficient is predicted, usually from both the block above and the block to the left, unless one of these blocks is not in the same picture segment or is not an intra block. In the vertical DC and AC option, the DC and first row of AC coefficients are vertically predicted from those of the block above. Finally, in the horizontal DC and AC option, the DC and first column of AC coefficients are horizontally predicted from those of the

² To exploit the full error resilience potential of RVLC's, the motion vector bits should be blocked into one stream for each video frame, concatenating a large number of RVLC's. This can be performed by data partitioning, which is currently being proposed in H.263++.
`
`
TABLE I
MOTION VECTOR RANGE IN H.263+'S UNRESTRICTED MOTION VECTOR MODE
`
block to the left. The option that yields the best prediction is applied to all blocks of the subject intra macroblock.

The difference coefficients, obtained by subtracting the predicted DCT coefficients from the original ones, are then quantized and scanned differently, depending on the selected prediction option. Three scanning patterns are used: the basic zigzag scan for DC only prediction, the alternate-vertical scan (as in MPEG-2) for horizontally predicted blocks, or the alternate-horizontal scan for vertically predicted blocks. The main part of the standard employs the same VLC table for coding all quantized coefficients. However, this table is designed for inter macroblocks and is not very effective for coding intra macroblocks. In intra macroblocks, larger coefficients with smaller runs of zeros are more common. Thus, the advanced intra coding mode employs a new VLC table for encoding the quantized coefficients, a table that is optimized to the global statistics of intra macroblocks.
3) Deblocking Filter Mode (Annex J): This mode introduces a deblocking filter inside the coding loop. Unlike in postfiltering, predicted pictures are computed based on filtered versions of the previous ones. A filter is applied to the edge boundaries of the four luminance and two chrominance 8 × 8 blocks. The filter is applied to a window of four edge pixels in the horizontal direction, and it is then similarly applied in the vertical direction. The weights of the filter's coefficients depend on the quantizer step size for a given macroblock, where stronger coefficients are used for a coarser quantizer. This mode also allows the use of four motion vectors per macroblock, as specified in the advanced prediction mode of H.263, and also allows motion vectors to point outside picture boundaries, as in the unrestricted motion vector mode. The above techniques, as well as filtering, result in better prediction and a reduction in blocking artifacts. The computationally expensive overlapped motion compensation operation of the advanced prediction mode is not used here in order to keep the additional complexity of this mode minimal.
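The idea of quantizer-dependent edge smoothing can be illustrated with a deliberately simplified filter. This is not the Annex J filter (whose tap values and clipping rules are specified in the standard); it only shows how a strength parameter tied to the quantizer step would soften a block boundary:

```python
# Illustrative (non-normative) edge smoothing across a block boundary.
def smooth_edge(left_pair, right_pair, strength):
    """left_pair/right_pair: the two pixels on each side of the edge,
    with the second of left_pair and first of right_pair touching it.
    strength in [0, 1] would grow with the quantizer step size."""
    a, b = left_pair   # b touches the boundary
    c, d = right_pair  # c touches the boundary
    delta = (c - b) * strength / 2.0
    # Pull the two boundary pixels toward each other; leave a and d as-is.
    return (a, b + delta), (c - delta, d)
```

With strength 1 the boundary discontinuity is halved from both sides; with strength 0 the pixels pass through unchanged, mirroring the text's point that coarser quantizers warrant stronger filtering.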
4) Slice Structured Mode (Annex K): A slice structure, instead of a GOB structure, is employed in this mode. This allows the subdivision of the picture into segments containing variable numbers of macroblocks. The slice structure consists of a slice header followed by consecutive complete macroblocks. Two additional submodes can be signaled to reflect the order of transmission, sequential or arbitrary, and the shape of the slices, rectangular or not. These add flexibility to the slice structure so that it can be designed for different environments and applications. For example, rectangular slices can be used to subdivide a picture into rectangular regions of interest for region-based coding. The slice header locations within the bit stream act as resynchronization points, which help the decoder recover from bit errors and packet losses. They also allow slices to be decoded in an arbitrary order.
5) Supplemental Enhancement Information Mode (Annex L): In this mode, supplemental information is included in the bit stream in order to offer display capabilities within the coding framework. This supplemental information includes support for picture freeze, picture snapshot, video segmentation, progressive refinement, and chroma keying. These added functionalities are externally negotiated at the system layer (using H.245, for example) to ensure picture synchronization.

The picture freeze option allows the encoder to signal a complete or partial freeze of a picture. Rectangular areas of a picture can be frozen while the rest of the picture is still being updated. A picture freeze release code is explicitly sent to the decoder. The picture snapshot option allows part of or the full picture to be used as a still image snapshot by an external application. When video subsequences can be used by an external application, this can be signaled by the video segmentation option of this mode. The progressive refinement option signals to the decoder that the following pictures represent a refinement in quality of the subject picture, as opposed to pictures at different times. The chroma keying option indicates that transparent or semitransparent pixels can be employed during the video decoding process. When set on, transparent pixels are not displayed. Instead, a background picture that is externally controlled is displayed.

All of the above options are aimed at providing decoder supporting features and functionalities within the video bit stream. For example, such options will facilitate interoperability between different applications within the context of windows-based environments.
6) Improved PB-Frames Mode (Annex M): This mode is an enhanced version of the H.263 PB-frames mode. The main difference is that the H.263 PB-frames mode allows only bidirectional prediction to predict B pictures in a PB frame, whereas the improved PB-frames mode permits forward, backward, and bidirectional prediction, as illustrated in Fig. 5. Bidirectional prediction methods, as illustrated in Fig. 5(d), are the same in both modes, except that, in the improved PB-frames mode, no delta vector is transmitted. In forward prediction, as shown in Fig. 5(b), the B macroblock is predicted from the previous P macroblock, and a separate motion vector is then transmitted. In backward prediction, as illustrated in Fig. 5(c), the predicted macroblock is equal to the future P macroblock, and therefore no motion vector is transmitted. Use of the additional forward and backward predictors makes the improved PB frames less susceptible to significant changes that may occur between pictures.
7) Reference Picture Selection Mode (Annex N): In H.263, a picture is predicted from the previous picture. If a part of the subject picture is lost due to channel errors or packet loss, the quality of future pictures can be severely degraded. Using this mode, it is possible to select the reference picture for prediction in order to suppress temporal error propagation due to inter coding. Multiple pictures must be stored at the decoder, and the encoder should signal the necessary amount of additional picture memory by external means. The information which
`
`
specifies the selected picture for prediction is included in the encoded bit stream.

If a back channel is employed, two back-channel mode switches define four messaging methods (NEITHER, ACK, NACK, and ACK+NACK) that the encoder and decoder employ to determine which picture segment will be used for prediction. For example, a NACK sent to the encoder from the decoder signals that a given picture has been degraded by errors. Thus, the encoder may choose not to use this picture for future prediction, and instead employ a different, unaffected reference picture. This mode reduces error propagation, thus maintaining good picture reproduction quality in error-prone environments.
8) Temporal, SNR, and Spatial Scalability Mode (Annex O): This mode specifies syntax to support temporal, SNR, and spatial scalability capabilities. Scalability is a desirable property for error-prone and heterogeneous environments. It implies that the encoder's output bit stream can be manipulated at any time after it has been generated. This property is desirable in order to counter limitations such as constraints on bit rate, display resolution, network throughput, and decoder complexity. In multipoint and broadcast video applications, such constraints cannot be foreseen at the time of encoding.

Temporal scalability provides a mechanism for enhancing perceptual quality by increasing the picture display rate. This is achieved via bidirectionally predicted pictures, inserted between anchor picture pairs and predicted from either one or both of these anchor pictures, as illustrated in Fig. 7(a). Thus, for the same quantization level, B pictures yield increased compression as compared to forward-predicted P pictures. B pictures are not used as anchor pictures, i.e., other pictures are never predicted from them. Therefore, they can be discarded without impacting the picture quality of future pictures; hence the name temporal scalability. Note that, while B pictures improve compression performance as compared to P pictures, they increase encoder complexity and memory requirements and introduce additional delays.
Spatial scalability and SNR scalability are closely related, the only difference being the increased spatial resolution provided by spatial scalability. An example of SNR scalable pictures is shown in Fig. 7(b). SNR scalability implies the creation of multirate bit streams. It allows for the recovery of the coding error, or the difference, between an original picture and its reconstruction. This is achieved by using a finer quantizer to encode the difference picture in an enhancement layer. This additional information increases the SNR of the overall reproduced picture; hence the name SNR scalability.

Spatial scalability allows for the creation of multiresolution bit streams to meet varying display requirements/constraints for a wide range of clients. A spatially scalable structure is illustrated in Fig. 7(c). It is essentially the same as in SNR scalability, except that a spatial enhancement layer here attempts to recover the coding loss between an upsampled version of the reconstructed reference layer picture and a higher resolution version of the original picture. For example, if the reference layer has a QCIF resolution, and the enhancement layer has a CIF resolution, the reference layer picture must be scaled accordingly such that the enhancement layer picture
`
Fig. 7. Illustration of scalability features. (a) Temporal. (b) SNR. (c) Spatial.
`
can be appropriately predicted from it. The standard allows the resolution to be increased by a factor of 2 in the vertical only, horizontal only, or both the vertical and horizontal directions for a single enhancement layer. There can be multiple enhancement layers, each increasing picture resolution over that of the previous layer. The interpolation filters used to upsample the reference layer picture are explicitly defined in the standard. Aside from the upsampling process from the reference to the enhancement layer, the processing and syntax of a spatially scaled picture are identical to those of an SNR scaled picture.

In either SNR or spatial scalability, the enhancement layer pictures are referred to as EI or EP pictures. If the enhancement layer picture is upward predicted from a picture in the reference layer, then the enhancement layer picture is referred to as an enhancement-I (EI) picture. In some cases, when reference layer pictures are coarsely represented, overcoding of static parts of the picture can occur in the enhancement
`
`
layer, requiring an unnecessarily excessive bit rate. To avoid this problem, forward prediction is permitted in the enhancement layer. A picture that can be forward predicted from a previous enhancement layer picture or upward predicted from the reference layer picture is referred to as an enhancement-P (EP) picture. Note that computing the average of the upward and forward predicted pictures can provide a bidirectional prediction option for EP pictures. For both EI and EP pictures, upward prediction from the reference layer picture implies that no motion vectors are required. In the case of forward prediction for EP pictures, motion vectors are required.

9) Reference Picture Resampling Mode (Annex P): Th
