IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 3, NO. 1, MARCH 2001
`
`
`The MPEG-4 Fine-Grained Scalable Video Coding
`Method for Multimedia Streaming Over IP
`
`Hayder M. Radha, Member, IEEE, Mihaela van der Schaar, and Yingwei Chen
`
`Abstract—Real-time streaming of audiovisual content over the
`Internet is emerging as an important technology area in multi-
`media communications. Due to the wide variation of available
`bandwidth over Internet sessions, there is a need for scalable
`video coding methods and (corresponding) flexible streaming
`approaches that are capable of adapting to changing network
`conditions in real time. In this paper, we describe a new scalable
`video-coding framework that has been adopted recently by the
`MPEG-4 video standard. This new MPEG-4 video approach,
`which is known as Fine-Granular-Scalability (FGS), consists of
`a rich set of video coding tools that support quality (i.e., SNR),
`temporal, and hybrid temporal-SNR scalabilities. Moreover, one
`of the desired features of the MPEG-4 FGS method is its simplicity
`and flexibility in supporting unicast and multicast streaming
`applications over IP.
`
`
`I. INTRODUCTION
`
THE transmission of multimedia content over the World
Wide Web (WWW) has been growing steadily over the
`past few years. This is evident from the large number of popular
`web sites that include multimedia content specifically designed
for streaming applications. The streaming of audiovisual
information over the web has been increasing rather dramatically
without any evidence of the previously feared collapse of
the Internet or its global backbone. Consequently, multimedia
`streaming and the set of applications that rely on streaming are
`expected to continue growing. Meanwhile, the current quality
`of streamed multimedia content, in general, and video in par-
`ticular still needs a great deal of improvement before Internet
`video can be accepted by the masses as an alternative to tele-
`vision viewing. A primary objective of most researchers in the
`field, however, is to mature Internet video solutions to the level
`when viewing of good-quality video of major broadcast televi-
`sion events (e.g., the Super Bowl, Olympics, World Cup, etc.)
`over the WWW becomes a reality [10]–[15].
Manuscript received May 31, 2000; revised December 1, 2000. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. K. J. R. Liu.
H. M. Radha is with the Video Communications Research Department, Philips Research Laboratories, Briarcliff Manor, NY 10510 USA, and also with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 USA (e-mail: radha@egr.msu.edu).
M. van der Schaar and Y. Chen are with the Video Communications Research Department, Philips Research Laboratories, Briarcliff Manor, NY 10510 USA.
Publisher Item Identifier S 1520-9210(01)01863-6.

To achieve this level of acceptability and proliferation of Internet
video, there are many technical challenges that have to be
addressed in the two areas of video-coding and networking. One
generic framework that addresses both the video-coding and
`networking challenges associated with Internet video is scal-
`ability. From a video-coding point-of-view, scalability plays a
`crucial role in delivering the best possible video quality over un-
`predictable “best-effort” networks. Bandwidth variation is one
`of the primary characteristics of “best-effort” networks, and the
`Internet is a prime example of such networks [38]. Therefore,
`video scalability enables an application to adapt the streamed-
`video quality to changing network conditions (and specifically
`to bandwidth variation). From a networking point-of-view, scal-
`ability is needed to enable a large number of users to view any
`desired video stream, at anytime, and from anywhere. This leads
`to the requirement that servers and the underlying transport pro-
`tocols should be able to handle the delivery of a very large
`number (hundreds, thousands, or possibly millions) of video
`streams simultaneously.
`Consequently, any scalable Internet video-coding solution
`has to enable a very simple and flexible streaming framework,
`and hence, it must meet the following requirements [3].
`1) The solution must enable a streaming server to perform
`minimal real-time processing and rate control when out-
`putting a very large number of simultaneous unicast (on-
`demand) streams.
`2) The scalable Internet video-coding approach has to be
`highly adaptable to unpredictable bandwidth variations
`due to heterogeneous access-technologies of the receivers
(e.g., analog modem, cable modem, xDSL, etc.) or due to
`dynamic changes in network conditions (e.g., congestion
`events).
`3) The video-coding solution must enable low-complexity
`decoding and low-memory requirements to provide
`common receivers (e.g., set-top-boxes and digital televi-
`sions), in addition to powerful computers, the opportunity
`to stream and decode any desired Internet video content.
4) The streaming framework and related scalable
`video-coding approach should be able to support
`both multicast and unicast applications. This, in gen-
`eral, eliminates the need for coding content in different
`formats to serve different types of applications.
`5) The scalable bitstream must be resilient to packet loss
`events, which are quite common over the Internet.
The above requirements were the primary drivers behind
the design of the fine-granular-scalability (FGS) video-coding
`scheme introduced originally in [1]. Although there are other
`promising video-coding schemes that are capable of supporting
`different degrees of scalability, they, in general, do not meet all
of the above requirements. For example, the three-dimensional
`(3-D) wavelet/sub-band-based coding schemes require large
`memory at the receiver, and consequently they are undesirable
`for low-complexity devices [19]–[21]. In addition, some of
`these methods rely on motion-compensation to improve the
`coding efficiency at the expense of sacrificing scalability and
resilience to packet losses [19], [20]. Other video-coding
`techniques totally avoid any motion-compensation and conse-
`quently sacrifice a great deal of coding efficiency [21], [37].
`The FGS framework, as explained further in the docu-
`ment, strikes a good balance between coding efficiency and
`scalability while maintaining a very flexible and simple
`video-coding structure. When compared with other packet-loss
`resilient streaming solutions (e.g., [40], [41]), FGS has also
`demonstrated good resilience attributes under packet losses
`[42]. Moreover, and after new extensions and improvements
`to its original framework,1 FGS has been recently adopted
`by the ISO MPEG-4 video standard as the core video-coding
`method for MPEG-4 streaming applications [4]. Since the first
`version of the MPEG-4 FGS draft standard [5], there have
`been several improvements introduced to the FGS framework.
`In particular, we highlight three aspects of the improved FGS
`method. First, a very simple residual-computation approach
`was proposed in [6]. Despite its simplicity, this approach pro-
`vides the same or better performance than the performance of
`more elaborate residual-computation methods. (As explained
`later, yet another alternative approach for computing the FGS
`residual has been proposed very recently [46]). Second, an
`“adaptive quantization” approach was proposed in [7], and it
`resulted in two FGS-based video-coding tools. Third, a hybrid
`all-FGS scalability structure was also proposed recently [8],
`[9]. This novel FGS scalability structure enables quality [i.e.,
`signal-to-noise-ratio (SNR)], temporal, or both temporal-SNR
`scalable video coding and streaming. All of these improvements
`to FGS (i.e., simplified residual computation, “adaptive-quan-
`tization,” and the new all-FGS hybrid scalability) have already
`been adopted by the MPEG-4 video standard [4].
`In this paper, we describe the MPEG-4 FGS framework
`and its new video coding tools which have not been presented
`outside the MPEG-4 community. The remainder of the paper
`is organized as follows. Section II describes the SNR FGS
framework, its ability to support unicast and multicast Internet
video applications, and its basic coding tools. Section III
`presents the “adaptive quantization” approach for the FGS
`enhancement-layer signal and the related video-coding tools
`adopted by MPEG-4. Section IV describes the FGS-based
`hybrid temporal-SNR scalability method. Simulation results
`will be shown in each section to demonstrate the performance
`of the corresponding video coding tool. Section V concludes
`the paper with a summary.
`
1The original FGS framework was introduced and described in [1] and [3]. Meanwhile, FGS was first introduced to MPEG-4 in [2].

II. SNR FGS VIDEO CODING METHOD

In order to meet the requirements outlined in the previous section, FGS encoding is designed to cover any desired bandwidth range while maintaining a very simple scalability structure. As shown in Fig. 1, the FGS structure consists of only two layers: a base-layer coded at a bitrate $R_{BL}$ and a single enhancement-layer coded using a fine-granular (or embedded) scheme to a maximum bitrate of $R_{max}$. This structure provides a very efficient, yet simple, level of abstraction between the encoding and streaming processes. The encoder only needs to know the range of bandwidth $[R_{min} = R_{BL}, R_{max}]$ over which it has to code the content, and it does not need to be aware of the particular
bitrate the content will be streamed at. The streaming server, on
the other hand, has total flexibility in sending any desired portion
of any enhancement layer frame (in parallel with the corresponding
base layer picture), without the need for performing
`complicated real-time rate control algorithms. This enables the
`server to handle a very large number of unicast streaming ses-
`sions and to adapt to their bandwidth variations in real-time.
`On the receiver side, the FGS framework adds a small amount
`of complexity and memory requirements to any standard mo-
`tion-compensation based video decoder. These advantages of
`the FGS framework are achieved while maintaining rather sur-
`prisingly good coding-efficiency results (as will be illustrated at
`the end of this section).
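
The server-side behavior described above amounts to cutting each embedded enhancement-layer frame at whatever byte count the current bandwidth allows. The following minimal sketch illustrates that idea under stated assumptions: the function name, the per-frame byte lists, and the even per-frame split of the bandwidth budget are all hypothetical and are not taken from the paper or from the MPEG-4 standard.

```python
def select_enhancement_bytes(el_frame_sizes, bl_frame_sizes, target_bps, fps):
    """Return how many enhancement-layer bytes to send for each frame.

    Assumption: the available bandwidth is split evenly across frames.
    """
    budget_per_frame = target_bps / 8.0 / fps  # bytes available per frame interval
    sent = []
    for el_size, bl_size in zip(el_frame_sizes, bl_frame_sizes):
        # The base layer is always sent in full; only the remainder goes to FGS.
        spare = max(0, int(budget_per_frame - bl_size))
        # An embedded stream can be cut at any point, so truncation is the only step.
        sent.append(min(el_size, spare))
    return sent


# Two frames, 1250-byte base-layer pictures (100 kbit/s at 10 frame/s).
print(select_enhancement_bytes([10000, 10000], [1250, 1250], 500_000, 10))
```

No re-encoding or per-session rate-control loop appears in the sketch, which is the property the text attributes to FGS streaming servers.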
`For multicast applications, FGS also provides a flex-
`ible framework for the encoding, streaming, and decoding
processes. As in the unicast case, the encoder compresses
the content using any desired range of bandwidth
$[R_{min} = R_{BL}, R_{max}]$. Therefore, the same compressed
`streams can be used for both unicast and multicast applications.
`At time of transmission, the multicast server partitions the FGS
`enhancement layer into any preferred number of “multicast
`channels” each of which can occupy any desired portion of the
`total bandwidth (see Fig. 2). At the decoder side, the receiver
`can “subscribe” to the “base-layer channel” and to any number
`of FGS enhancement-layer channels that the receiver is capable
`of accessing (depending for example on the receiver access
`bandwidth). It is important to note that regardless of the number
`of FGS enhancement-layer channels that the receiver subscribes
`to, the decoder has to decode only a single enhancement-layer
`as shown in Fig. 2.
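
The multicast partitioning described above can be pictured as slicing each embedded enhancement-layer frame into consecutive byte ranges, one per channel. The sketch below is only an illustration: the function names and the proportional "shares" are hypothetical, and it assumes a receiver joins channels in order starting from the first, so that what it receives is always a prefix of the embedded frame.

```python
def partition_into_channels(el_frame, channel_shares):
    """Split one embedded enhancement-layer frame into consecutive byte ranges."""
    total = sum(channel_shares)
    channels, start = [], 0
    for i, share in enumerate(channel_shares):
        if i == len(channel_shares) - 1:
            end = len(el_frame)  # last channel takes the remainder
        else:
            end = start + len(el_frame) * share // total
        channels.append(el_frame[start:end])
        start = end
    return channels


def receive(channels, subscribed):
    """Concatenate the first `subscribed` channels into one decodable prefix."""
    return b"".join(channels[:subscribed])


frame = bytes(range(100))
channels = partition_into_channels(frame, [1, 1, 2])
print([len(c) for c in channels], len(receive(channels, 2)))
```

Whatever number of channels is joined, the receiver ends up with a single truncated enhancement-layer stream, which matches the statement that only one enhancement layer has to be decoded.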
`The above approach for multicasting FGS [12] is based on the
`receiver-driven layered multicast framework that is supported
`by the IP Multicast backBONE (i.e., the MBONE) [36], [37].
`Therefore, the IP multicast control and routing protocols needed
`for multicasting FGS-based streams have already been defined
`and supported by IP-multicast enabled routers. Moreover, new
`multicast routing protocols and architectures can further en-
`hance the delivery of FGS-based multicast applications [10],
`[11], [15]–[17].
`After this overview of the FGS framework, below we describe
`the basic SNR-based FGS encoder and decoder.
`
`A. Basic FGS Encoder and Decoder
`As shown in Fig. 1, the FGS framework requires two en-
`coders, one for the base-layer and the other for the enhancement
`layer. The base-layer can be compressed using any motion-com-
`pensation video encoding method. Naturally, the DCT-based
`MPEG-4 video standard is a good candidate for the base-layer
`encoder due to its coding efficiency especially at low bitrates.
`Prior to introducing FGS, MPEG-4 included a very rich set of
`video coding tools most of which are applicable for the FGS
`
base-layer. For a complete description of these tools, the reader is referred to [4], [39].

Fig. 1. Examples of the FGS scalability structure at the encoder (left), streaming server (center), and decoder (right) for a typical unicast Internet streaming application. The top and bottom rows of the figure represent base-layers without and with bidirectional (B) frames, respectively.

Fig. 2. Example of an FGS-based multicast scenario. (The distribution of the base-layer is implicit and therefore is not shown in the figure.)

In principle, the FGS enhancement-layer encoder can be based on any fine-granular coding method. When FGS was first introduced to MPEG-4, three approaches were proposed for coding the FGS enhancement layer: wavelet, DCT, and matching-pursuit based methods [2]. This led to several proposals and extensive evaluation of these and related approaches (see, for example, [22]–[33]). In particular, the performance of different variations of bitplane DCT-based coding [26], [27] and wavelet compression methods was studied, compared, and presented recently in [43]. Based on a thorough analysis of the FGS enhancement-layer (SNR) signal, the study in [43] concluded that both bitplane DCT coding and embedded zero-tree wavelet (EZW) based compression provide very similar results. The same conclusion was reached by the MPEG-4 FGS effort. Consequently, and due to the fact that the FGS base-layer is coded using MPEG-4 compliant DCT coding, employing an embedded DCT method for compressing the enhancement layer is a sensible option [43]. Therefore, the basic SNR MPEG-4 FGS coding scheme is built upon: a) the original FGS scalability structure proposed in [1], [2], and b) embedded DCT coding of the enhancement layer as proposed in [26] and [27].

Using the DCT transform at both the base and enhancement layers enables the encoder to perform a simple residual computation2 of the FGS enhancement-layer as shown in Fig. 3 [6]. Each DCT FGS-residual frame consists of $N_{BP}$ bitplanes:

$$N_{BP} = \lfloor \log_2(|C|_{max}) \rfloor + 1$$

where $|C|_{max}$ is the maximum DCT (magnitude) value of the residual frame under consideration.3 After identifying $|C|_{max}$ and the corresponding $N_{BP}$, the FGS enhancement-layer encoder scans the residual signal using the traditional zig-zag scanning method starting from the most significant bitplane and ending at the least significant bitplane4 as shown in Fig. 3(b). Every bitplane consists of nonoverlapping $16 \times 16$ macroblocks (MB's), and each MB includes four $8 \times 8$ luminance ($Y$) blocks and two chroma blocks ($U$ and $V$). Run-length codes are used for (lossless) entropy-coding of the zeros and ones in each $8 \times 8$ bitplane block [4]. This process generates variable length codes that constitute the FGS compressed bitstream. A special “all-zero blocks” code is used when all six bitplane-blocks (within a given bitplane-macroblock) do not have any bits with a value of one.

Fig. 3. (a) Basic (SNR) FGS encoders for the base and enhancement layers. It is clear that the added complexity of the FGS enhancement-layer encoder is relatively small. (b) The scanning order of the FGS enhancement-layer residual DCT coefficients. Scanning starts from the most-significant-bitplane (MSB) toward the least-significant-bitplane. Each $8 \times 8$ bitplane-block is scanned using the traditional zig-zag pattern.

At the receiver side, the FGS bitstream is first decoded by a variable length decoder (VLD) as shown in Fig. 4. Due to the embedded nature of the FGS stream, the VLD re-generates the DCT residual bitplanes starting from the most significant bitplane toward the least significant one. Moreover, due to the type of scanning used by the FGS encoder [Fig. 3(b)], it is possible that the decoder does not receive all of the bitplane-blocks that belong to a particular bitplane. Any bitplane block not received by the decoder can be filled with zero values.5 The resulting DCT residual is then inverse-transformed to generate the SNR residual pixels. These residual pixels are then added to the base-layer decoder output to generate the final enhanced scalable video.

Fig. 4. Basic structure of the FGS SNR decoder. The FGS decoder includes bitplane de-shifting to compensate for the two “adaptive quantization” encoding tools: selective enhancement and frequency weighting.

2It is important to note that there is an alternative approach for computing the FGS residual [46]. This alternative approach is based on computing the residual after clipping the base-layer reference picture in the pixel domain. Therefore, this approach, which is known as the “post clipping” method, computes the FGS residual in the pixel domain, and consequently it requires an additional DCT computation of the FGS residual prior to performing the bitplane coding. As reported in [46], there is no noticeable difference in the performance of the two methods. Throughout this document we describe FGS based on the “pre clipping” residual computation approach, which eliminates the need for performing DCT computation within the FGS enhancement-layer encoder.
3In the FGS MPEG-4 standard, three parameters are used for the number-of-bitplanes variable $N_{BP}$: $N_{BP}(Y)$, $N_{BP}(U)$, and $N_{BP}(V)$ for the luminance and chroma components of the video signal [4].
4Alternatively, the encoder may stop encoding the residual signal if the desired maximum bitrate is reached.
5For an “optimal” reconstruction (in a mean-square-error sense) of the DCT coefficients, one-fourth (1/4) of the received quantization step-size is added. For example, if the decoder receives only the MSB of a coefficient (with a value x, where x = 0 or 1), then this coefficient is reconstructed using the value x01000 (i.e., instead of x0000).

In summary, the basic SNR FGS codec employs embedded DCT variable length encoding and decoding operations that resemble the ones used in typical DCT based standards. In previous standards (including MPEG-4 base-layer), the DCT coefficients are coded with (run-length, amplitude) type of codes,
`whereas with FGS the bitplane ones-and-zeros are coded with
`(run-length) codes since the “amplitude” is always one. For
`more information about the VLC codes used by FGS, the reader
`is referred to [4].
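
The bitplane scan and its embedded property can be sketched in a few lines of Python. This is a simplified illustration and not the MPEG-4 syntax: it assumes the coefficient magnitudes of one block are already in zig-zag order, it ignores sign bits, and it emits bare zero-run lengths in place of the variable-length code tables defined in [4]. All function names are hypothetical.

```python
def num_bitplanes(coeffs):
    """Number of bitplanes: floor(log2(max |c|)) + 1, or 0 if all are zero."""
    peak = max(abs(c) for c in coeffs)
    return peak.bit_length()


def bitplane_runs(coeffs, plane, n_bp):
    """Zero-run lengths preceding each one-bit of a bitplane (plane 0 = MSB)."""
    shift = n_bp - 1 - plane
    runs, zeros = [], 0
    for c in coeffs:
        if (abs(c) >> shift) & 1:
            runs.append(zeros)
            zeros = 0
        else:
            zeros += 1
    return runs  # trailing zeros are implied by the block length


def encode(coeffs):
    n_bp = num_bitplanes(coeffs)
    return n_bp, [bitplane_runs(coeffs, p, n_bp) for p in range(n_bp)]


def decode(n_bp, planes, length):
    """Rebuild magnitudes from the bitplanes received; missing planes stay zero."""
    mags = [0] * length
    for p, runs in enumerate(planes):
        pos = -1
        for run in runs:
            pos += run + 1
            mags[pos] |= 1 << (n_bp - 1 - p)
    return mags


coeffs = [10, 0, 6, 0, 0, 3, 2, 0]
n_bp, planes = encode(coeffs)
print(n_bp, planes)
print(decode(n_bp, planes[:2], len(coeffs)))  # only the two top bitplanes arrived
```

Decoding a prefix of the bitplane list yields a coarser version of the same residual, which is the behavior the text relies on when the stream is truncated.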
`
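The reconstruction rule given in the decoder footnote (adding one-fourth of the quantization step implied by the bits received so far) can be checked with a small worked example. The sketch below assumes magnitudes only and a hypothetical helper name; it reproduces the footnote's case in which only the MSB of a six-bit coefficient is received.

```python
def reconstruct(received_bits, n_bp):
    """Estimate a magnitude from the top len(received_bits) bits of an n_bp-bit value."""
    k = len(received_bits)
    value = 0
    for bit in received_bits:
        value = (value << 1) | bit
    value <<= n_bp - k       # low-order bits that never arrived are zero-filled
    step = 1 << (n_bp - k)   # quantization step implied by the received bits
    return value + step // 4  # footnote rule: add one-fourth of the step size


# MSB only, six bitplanes: x = 1 gives binary 101000, x = 0 gives binary 001000.
print(bin(reconstruct([1], 6)), bin(reconstruct([0], 6)))
```

When every bitplane has been received the step is one, the added term is zero, and the value is reproduced exactly.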
`B. Performance Evaluation of the FGS SNR Coding Method
`
The performance of FGS has been compared thoroughly with the performance of traditional SNR scalability video coding.6 In particular, rate-distortion (RD) results of multilayer (discrete) SNR scalable video coding were compared with FGS results over wide ranges of bitrates (e.g., [28]). These results have clearly shown that FGS coding provides coding efficiency that is the same as or better than that of traditional SNR scalability methods. Fig. 5 shows an example of these results for one of the MPEG-4 video sequences. As illustrated in the figure, FGS outperforms multilayer SNR scalable coding over a wide range of bitrates and for both QCIF and CIF resolution video. (For more extensive data on the comparison between FGS and multilayer SNR coding, the reader is referred to [28].)

6Traditional SNR scalability coding methods include the ones supported by MPEG-2 and MPEG-4 (i.e., prior to FGS).
`
`
`Fig. 5. Performance of FGS coding and “traditional” MPEG-4 SNR coding with multiple layers. It is clear from these plots that FGS outperforms multilayer
`(discrete) SNR coding over a wide range of bitrates. For more data on the comparison between the performance of FGS and multilayer SNR compression, the
`reader is referred to [28].
`
Fig. 6. Three frames from the “Stefan” sequence. This is an example of a sequence that exhibits a high degree of temporal correlation among successive frames.
The pictures shown here are 25 frames apart, yet most of the background is very similar from one picture to another.
`
Here, we focus on two important (yet related) aspects of FGS rate-distortion performance. Before discussing these two aspects, it is important to highlight that one of the key advantages of FGS is its simplicity and flexibility in supporting adaptive streaming applications. Naturally, this flexibility comes, in general, at the expense of video quality. Hence, one important question is: how much penalty is being paid in quality when comparing FGS with a nonscalable stream that is coded at a particular bitrate $R$? Therefore, one aspect of FGS performance that we would like to address here is how FGS compares with a set of nonscalable streams coded at discrete bitrates (e.g., $R_1, R_2, \ldots, R_N$) covering the same bandwidth range $[R_{min}, R_{max}]$. This type of comparison may seem to be unfair to FGS, since the (multiple) nonscalable streams are optimized for particular bitrates whereas FGS covers the same range of bandwidth with a single enhancement-layer. Nevertheless, this comparison provides an insight into the theoretical (upper) limits of FGS’s rate-distortion performance. Moreover, since the (ideal) nonscalable multiple-streams’ scenario represents an extreme case of inflexibility, this comparison provides an insight into the level of quality-penalty being paid for FGS’s flexibility.
A related aspect to the above question is the impact of the base-layer (coded at a given bitrate $R_{BL}$) on the overall performance of FGS over the range of bandwidth $[R_{BL}, R_{max}]$. In this section, we will try to shed some light on these two aspects in a joint manner. To achieve that, we have conducted a very comprehensive evaluation of a large number of sequences with different motion and texture characteristics. Each sequence was coded at multiple (discrete) bitrates $R_n$, $n = 1, 2, \ldots, N$, to generate the nonscalable streams at these rates. Then, we used the nonscalable streams (coded with a bitrate $R_n$, $n = 1, 2, \ldots$) as base-layers to generate corresponding FGS streams that cover the bandwidth range $[R_n, R_{max}]$.
`To illustrate some of the key conclusions of our simulation
`study, we show here the results of two video sequences coded
`in the range 100 kbit/s to 1 Mbit/s, at 10 frame/s, and with a
CIF resolution.7 The two selected sequences are “Stefan” (shown
in Fig. 6) and “Flying” (Fig. 7). These sequences, “Stefan” and
`“Flying,” represent two types of content: one type with rela-
`tively high-temporal correlation and the other content without
`significant correlation among frames, respectively.
`
`7This bandwidth range was selected since it represents the type of band-
`width variation one may encounter over “broadband” Internet access (e.g.,
`cable-modem access technologies [18]) which are suitable for video streaming
`applications. FGS performance results over lower-bitrate bandwidth ranges
`were similar to the ones presented here when using lower resolution pictures
`(e.g., using QCIF resolution for 10 kbit/s to 100 kbit/s bitrates). Moreover, it
`is important to highlight here that the selected frame-rate (i.e., 10 frames/s)
`was chosen since it represents the adequate frame rate for a base-layer coded
`at around 100 kbit/s. By employing the SNR (only) scalability of FGS, the
`enhancement layer is “locked” to the base-layer frame rate regardless of the
`bitrate. This issue, which represents an intrinsic limitation of all SNR-based
`scalability coding methods, is resolved when using the hybrid temporal-SNR
`FGS scheme discussed later in this paper. In general, using higher frame rates
`(e.g., 15 frames/s) will lower the PSNR values for both FGS and nonscalable
`streams. However, and depending on the video sequence, the difference in the
`performance between FGS and the nonscalable streams may increase with
`increasing the frame rate of the base-layer stream.
`
`
Fig. 7. Three frames from the “Flying” sequence. This is an example of a sequence that exhibits a high degree of motion and scene changes. The pictures shown
here are only 5 frames apart, yet most of the visual content changes from one picture to another.
`
Fig. 8. FGS performance in comparison with the (ideal) case of multiple nonscalable streams. The left figure shows the results for a sequence that exhibits a high degree
of temporal correlation among successive frames. For this type of sequence, FGS pays a penalty in performance due to the absence of motion compensation within
the enhancement-layer. The right figure shows the performance for a sequence with very high motion and a large number of scene cuts. For this type of sequence,
FGS performance is either similar to or even slightly better than the ideal nonscalable case.
`
The Peak SNR performance numbers are shown in Fig. 8. The
left figure shows the results for the sequence “Stefan” (which is
characterized by a relatively high-degree of temporal correlation
among successive frames). It is clear that for these sequences, FGS
pays some penalty in performance when compared with the ideal
nonscalable case, due to the absence of motion compensation
within the FGS enhancement-layer. This penalty manifests itself
in more “blocky” video for FGS coded sequences when compared
with the nonscalable streams, in particular, at low bitrates (e.g.,
around the 300–500 kbit/s bitrate-range for the Stefan sequence).
At higher bitrates, the difference in quality is usually less visible.
`It is also clear that selecting a higher bitrate base-layer could
`provide rather significant improvement in quality at the expense
`of decreasing the bandwidth range that FGS covers.
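
The text does not spell out how the Peak SNR values in Fig. 8 were computed. Assuming the conventional definition for 8-bit video (peak value 255, mean squared error taken over the luminance samples of a frame), a minimal sketch looks as follows; the function name and the flat-list input format are illustrative choices, not the authors' evaluation code.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# A mean squared error of 1 corresponds to roughly 48.13 dB for 8-bit samples.
print(round(psnr([100, 100], [101, 99]), 2))
```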
`The right plots in Fig. 8 show the performance for the se-
`quence “Flying” which includes very-high motion scenes and
`a large number of scene cuts. For these sequences, FGS per-
`formance is either similar to or even slightly better than the
`ideal nonscalable case. It is also clear that, here, the impact
`of selecting a higher bitrate base-layer does not provide sig-
`nificant improvement, and therefore one can still cover the de-
`sired (wider) range of bandwidth without paying much penalty
`in quality. Consequently, based on our study, the following key
`conclusions can be made:
`1) When compared with the (ideal) nonscalable coding case,
FGS suffers the most for sequences with high temporal
correlation between successive frames.8 This result is
somewhat intuitive since FGS exploits temporal redundancy
only at the base layer, and therefore FGS loses
some coding efficiency due to the lack of motion compensation
at the enhancement-layer. (An example of this type
of sequence is shown in Fig. 6.) Other very common
`examples of such sequences include simple “head-and-
`shoulder” scenes with static background (e.g., scenes of
`news anchors, talk shows, etc.).
`2) On the other hand, for sequences with a high degree of
`motion (e.g., with a large number of scene cuts and/or
`very fast motion), FGS’s rate-distortion performance is
`very good. In these cases, FGS usually (and rather surpris-
`ingly) provides similar (sometimes slightly better) coding
`efficiency when compared with the nonscalable (ideal)
streams. (An example of this type of sequence is shown
in Fig. 7.) Although this type of video content is not
`as common as the type of sequences mentioned above
`(i.e., in 1), the presence of high-motion video content is
`growing in support of many IP streaming applications.
`Examples of this type of high-motion sequences include
`“movie trailers” (which usually contain a large number of
`scene changes), certain commercials, and news clips with
`high-action content.
`
`8Here, “temporal correlation” is based on subjective observations rather than
`an objective measure.
`
`
`3) As expected, the base-layer (and its corresponding bi-
`trate) could have a major impact on the overall perfor-
`mance of FGS. In particular, this observation is prevalent
`for the sequences with high-level of temporal correlation
`(i.e., case 1 above).
4) There is an inherent trade-off between the overall performance and the amount of bandwidth range $[R_n, R_{max}]$, $n = 1, 2, \ldots$, one needs/desires to cover. For example, the average performance of FGS over a bandwidth range $[R_2, R_{max}]$ could be significantly better than the average performance over the wider range $[R_1, R_{max}]$ when the nonscalable streams coded at $R_2$ and $R_1$ are used as base-layers, respectively. This is usually due, in part, to the fact that the nonscalable (base-layer) stream coded at $R_2$ has a better quality than the lower bitrate stream $R_1$, and therefore, starting from a higher-quality base-layer naturally improves the overall quality. Again, this observation was only clear for sequences with high-level of temporal correlation.
`In summary, FGS provides fairly acceptable to very good re-
`sults even when compared with the multiple (ideal) nonscalable
`streams scenario. In addition, the high-level of flexibility and
`simplicity that FGS provides makes it an attractive solution for
IP streaming applications. Moreover, FGS is further enhanced
`by two important video coding tools and features as described
`in the following two sections.
`
`III. FGS CODING WITH ADAPTIVE QUANTIZATION
`
`Adaptive quantization is a very useful coding tool for im-
`proving the visual quality of transform-coded video. It is nor-
`mally achieved through a quantization matrix that defines dif-
`ferent quantization step sizes for the different transform coeffi-
`cients within a block (prior to performing entropy coding on
`these coefficients). For example, the dc coefficient and other
`“low frequency” coefficients normally contribute more to the vi-
`sual quality and consequently small step sizes are used for quan-
`tizing them. Adaptive quantization can also be controlled from
`one macroblock to another through a quantization factor whose
`value varies on a macroblock-by-macroblock basis. These adap-
`tive quantization tools have been employed successfully in the
`MPEG-2 and MPEG-4 (base-layer) standards.
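
The two conventional adaptive-quantization controls mentioned above (a per-coefficient quantization matrix and a per-macroblock factor) can be illustrated with a generic sketch. It is not the exact MPEG-2 or MPEG-4 quantizer arithmetic: the step size is simply assumed to be the product of a hypothetical matrix weight and a hypothetical macroblock scale, and the function names are made up for the example.

```python
def quantize_block(coeffs, weights, mb_scale):
    """Quantize coefficients with step = per-coefficient weight * macroblock scale."""
    return [round(c / (w * mb_scale)) for c, w in zip(coeffs, weights)]


def dequantize_block(levels, weights, mb_scale):
    """Map quantized levels back to coefficient values."""
    return [q * w * mb_scale for q, w in zip(levels, weights)]


coeffs = [400, 100, 30]   # low-frequency coefficient first
weights = [8, 16, 32]     # smaller weight means finer step for low frequencies
levels = quantize_block(coeffs, weights, mb_scale=2)
print(levels, dequantize_block(levels, weights, mb_scale=2))
```

Lowering `mb_scale` for one macroblock shrinks every step size in it, which is the macroblock-by-macroblock control the paragraph refers to.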

9Throughout the remainder of this section we will use the words “shifted,” “multiplied,” and “up-shifted” interchangeably.

Fig. 9. Example illustrating the use of the Selective Enhancement AQ tool. In this case, a “selected” macroblock is emphasized (relative to the surrounding macroblocks) by up-shifting all coefficients within that macroblock. This generates a new bitplane when compared to the original number of bitplanes.

Performing “adaptive quantization” (AQ) on bitplane signals consisting of only ones and zeros has to be achieved through a different (yet conceptually similar) set of techniques. We first introduced the notion of adaptive quantization for the FGS bitplane signal in [7]. FGS-based AQ is achieved through bitplane shifting of a) selected macroblocks within an FGS enhancement layer frame, and/or b) selected coefficients within the $8 \times 8$ blocks. Bitplane shifting is equivalent to multiplying a particular set of coefficients by a power-of-two integer. For example, let us assume that the FGS encoder wishes to “emphasize” a particular macroblock $k$ within an FGS frame. All blocks within this selected macroblock $k$ can be multiplied9 by a factor $2^{N_{se}(k)}$; therefore, the new value $c'(i, j, k)$ of a coefficient $i$ of block $j$ (within macroblock $k$) is

$$c'(i, j, k) = 2^{N_{se}(k)} \cdot c(i, j, k)$$

where $c(i, j, k)$ is the original value of the coefficient. This is equivalent to up-shifting the set of coefficients $c(i, j, k)$, $i = 1, \ldots, 64$, $j = 1, \ldots, 6$, by $N_{se}(k)$ bitplanes relative to other coefficients that belong to other macroblocks. An example of this is illustrated in Fig. 9. This type of adaptive-quantization tool is referred to as Selective Enhancement since through this approach selected
`macroblo
