`
`53
`
`The MPEG-4 Fine-Grained Scalable Video Coding
`Method for Multimedia Streaming Over IP
`
`Hayder M. Radha, Member, IEEE, Mihaela van der Schaar, and Yingwei Chen
`
`Abstract—Real-time streaming of audiovisual content over the
`Internet is emerging as an important technology area in multi-
`media communications. Due to the wide variation of available
`bandwidth over Internet sessions, there is a need for scalable
`video coding methods and (corresponding) flexible streaming
`approaches that are capable of adapting to changing network
`conditions in real time. In this paper, we describe a new scalable
`video-coding framework that has been adopted recently by the
`MPEG-4 video standard. This new MPEG-4 video approach,
`which is known as Fine-Granular-Scalability (FGS), consists of
`a rich set of video coding tools that support quality (i.e., SNR),
`temporal, and hybrid temporal-SNR scalabilities. Moreover, one
`of the desired features of the MPEG-4 FGS method is its simplicity
`and flexibility in supporting unicast and multicast streaming
`applications over IP.
`
`
`I. INTRODUCTION
`
THE transmission of multimedia content over the World Wide Web (WWW) has been growing steadily over the
`past few years. This is evident from the large number of popular
`web sites that include multimedia content specifically designed
`for streaming applications. The growth in streaming audiovisual
information over the web has been increasing rather dramatically
without any evidence of the previously feared collapse of
`the Internet or its global backbone. Consequently, multimedia
`streaming and the set of applications that rely on streaming are
`expected to continue growing. Meanwhile, the current quality
`of streamed multimedia content, in general, and video in par-
`ticular still needs a great deal of improvement before Internet
`video can be accepted by the masses as an alternative to tele-
`vision viewing. A primary objective of most researchers in the
`field, however, is to mature Internet video solutions to the level
where viewing of good-quality video of major broadcast televi-
`sion events (e.g., the Super Bowl, Olympics, World Cup, etc.)
`over the WWW becomes a reality [10]–[15].
`To achieve this level of acceptability and proliferation of In-
`ternet video, there are many technical challenges that have to be
addressed in the two areas of video-coding and networking.
`
`Manuscript received May 31, 2000; revised December 1, 2000. The associate
`editor coordinating the review of this manuscript and approving it for publica-
`tion was Dr. K. J. R. Liu.
`H. M. Radha is with the Video Communications Research Department,
`Philips Research Laboratories, Briarcliff Manor, NY 10510 USA, and also
`with the Department of Electrical and Computer Engineering, Michigan State
`University, East Lansing, MI 48823 USA (e-mail: radha@egr.msu.edu).
`M. van der Schaar and Y. Chen are with the Video Communications Research
`Department, Philips Research Laboratories, Briarcliff Manor, NY 10510 USA.
`Publisher Item Identifier S 1520-9210(01)01863-6.
`
One generic framework that addresses both the video-coding and
`networking challenges associated with Internet video is scal-
`ability. From a video-coding point-of-view, scalability plays a
`crucial role in delivering the best possible video quality over un-
`predictable “best-effort” networks. Bandwidth variation is one
`of the primary characteristics of “best-effort” networks, and the
`Internet is a prime example of such networks [38]. Therefore,
`video scalability enables an application to adapt the streamed-
`video quality to changing network conditions (and specifically
`to bandwidth variation). From a networking point-of-view, scal-
`ability is needed to enable a large number of users to view any
`desired video stream, at anytime, and from anywhere. This leads
`to the requirement that servers and the underlying transport pro-
`tocols should be able to handle the delivery of a very large
`number (hundreds, thousands, or possibly millions) of video
`streams simultaneously.
`Consequently, any scalable Internet video-coding solution
`has to enable a very simple and flexible streaming framework,
`and hence, it must meet the following requirements [3].
`1) The solution must enable a streaming server to perform
`minimal real-time processing and rate control when out-
`putting a very large number of simultaneous unicast (on-
`demand) streams.
`2) The scalable Internet video-coding approach has to be
`highly adaptable to unpredictable bandwidth variations
`due to heterogeneous access-technologies of the receivers
(e.g., analog modem, cable modem, xDSL, etc.) or due to
`dynamic changes in network conditions (e.g., congestion
`events).
`3) The video-coding solution must enable low-complexity
`decoding and low-memory requirements to provide
`common receivers (e.g., set-top-boxes and digital televi-
`sions), in addition to powerful computers, the opportunity
`to stream and decode any desired Internet video content.
4) The streaming framework and related scalable
video-coding approach should be able to support
`both multicast and unicast applications. This, in gen-
`eral, eliminates the need for coding content in different
`formats to serve different types of applications.
`5) The scalable bitstream must be resilient to packet loss
`events, which are quite common over the Internet.
The above requirements were the primary drivers behind
`the design of the fine-granular-scalability (FGS) video-coding
`scheme introduced originally in [1]. Although there are other
`promising video-coding schemes that are capable of supporting
`different degrees of scalability, they, in general, do not meet all
of the above requirements. For example, the three-dimensional
(3-D) wavelet/sub-band-based coding schemes require large
`memory at the receiver, and consequently they are undesirable
`for low-complexity devices [19]–[21]. In addition, some of
`these methods rely on motion-compensation to improve the
`coding efficiency at the expense of sacrificing scalability and
`resilience to packet
`losses [19], [20]. Other video-coding
`techniques totally avoid any motion-compensation and conse-
`quently sacrifice a great deal of coding efficiency [21], [37].
`The FGS framework, as explained further in the docu-
`ment, strikes a good balance between coding efficiency and
`scalability while maintaining a very flexible and simple
`video-coding structure. When compared with other packet-loss
`resilient streaming solutions (e.g., [40], [41]), FGS has also
`demonstrated good resilience attributes under packet losses
`[42]. Moreover, and after new extensions and improvements
`to its original framework,1 FGS has been recently adopted
`by the ISO MPEG-4 video standard as the core video-coding
`method for MPEG-4 streaming applications [4]. Since the first
`version of the MPEG-4 FGS draft standard [5], there have
`been several improvements introduced to the FGS framework.
`In particular, we highlight three aspects of the improved FGS
`method. First, a very simple residual-computation approach
`was proposed in [6]. Despite its simplicity, this approach pro-
`vides the same or better performance than the performance of
`more elaborate residual-computation methods. (As explained
`later, yet another alternative approach for computing the FGS
`residual has been proposed very recently [46]). Second, an
`“adaptive quantization” approach was proposed in [7], and it
`resulted in two FGS-based video-coding tools. Third, a hybrid
`all-FGS scalability structure was also proposed recently [8],
`[9]. This novel FGS scalability structure enables quality [i.e.,
`signal-to-noise-ratio (SNR)], temporal, or both temporal-SNR
`scalable video coding and streaming. All of these improvements
`to FGS (i.e., simplified residual computation, “adaptive-quan-
`tization,” and the new all-FGS hybrid scalability) have already
`been adopted by the MPEG-4 video standard [4].
`In this paper, we describe the MPEG-4 FGS framework
`and its new video coding tools which have not been presented
`outside the MPEG-4 community. The remainder of the paper
`is organized as follows. Section II describes the SNR FGS
framework, its ability to support unicast and multicast In-
`ternet video applications, and its basic coding tools. Section III
`presents the “adaptive quantization” approach for the FGS
`enhancement-layer signal and the related video-coding tools
`adopted by MPEG-4. Section IV describes the FGS-based
`hybrid temporal-SNR scalability method. Simulation results
`will be shown in each section to demonstrate the performance
`of the corresponding video coding tool. Section V concludes
`the paper with a summary.
`
`II. SNR FGS VIDEO CODING METHOD
`
`In order to meet the requirements outlined in the previous sec-
`tion, FGS encoding is designed to cover any desired bandwidth
range while maintaining a very simple scalability structure.

1The original FGS framework was introduced and described in [1] and [3].
Meanwhile, FGS was first introduced to MPEG-4 in [2].

As shown in Fig. 1, the FGS structure consists of only two layers: a
base-layer coded at a bitrate R_BL and a single enhancement-layer
coded using a fine-granular (or embedded) scheme to a maximum
bitrate of R_max. This structure provides a very efficient, yet
simple, level of abstraction between the encoding and streaming
processes. The encoder only needs to know the range of bandwidth
[R_BL, R_max] over which it has to code
`the content, and it does not need to be aware of the particular
`bitrate the content will be streamed at. The streaming server on
`the other hand has a total flexibility in sending any desired por-
`tion of any enhancement layer frame (in parallel with the cor-
`responding base layer picture), without the need for performing
`complicated real-time rate control algorithms. This enables the
`server to handle a very large number of unicast streaming ses-
`sions and to adapt to their bandwidth variations in real-time.
`On the receiver side, the FGS framework adds a small amount
`of complexity and memory requirements to any standard mo-
`tion-compensation based video decoder. These advantages of
`the FGS framework are achieved while maintaining rather sur-
`prisingly good coding-efficiency results (as will be illustrated at
`the end of this section).
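As a purely illustrative sketch of this encoder/server abstraction (not MPEG-4 syntax; the class and function names below are hypothetical), the Python fragment sends the base layer of each frame and then truncates the frame's embedded enhancement-layer bits to whatever the current bandwidth estimate allows, with no rate-control computation at the server.

# Hypothetical sketch: server-side adaptation of an FGS stream by truncation.
from dataclasses import dataclass

@dataclass
class FgsFrame:
    base: bytes         # base-layer bits for this frame (coded at R_BL)
    enhancement: bytes  # embedded enhancement-layer bits (coded up to R_max)

def stream_frame(frame, available_bps, base_bps, frame_rate, send_packet):
    """Send the base layer plus as much of the embedded enhancement layer
    as the current bandwidth estimate allows; no re-encoding is needed."""
    # Byte budget left for this frame's enhancement layer.
    enh_budget = max(0.0, (available_bps - base_bps) / frame_rate / 8.0)
    send_packet(frame.base)
    # Any prefix of the embedded enhancement layer is decodable, so the
    # server simply truncates it to the budget.
    send_packet(frame.enhancement[: int(enh_budget)])

# Example: 10 frame/s, 100 kbit/s base layer, 400 kbit/s currently available.
stream_frame(FgsFrame(b"\x00" * 500, b"\x01" * 4000), 400_000, 100_000, 10, print)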
`For multicast applications, FGS also provides a flex-
`ible framework for the encoding, streaming, and decoding
`processes. Identical to the unicast case, the encoder com-
presses the content using any desired range of bandwidth
[R_BL, R_max]. Therefore, the same compressed
`streams can be used for both unicast and multicast applications.
`At time of transmission, the multicast server partitions the FGS
`enhancement layer into any preferred number of “multicast
`channels” each of which can occupy any desired portion of the
`total bandwidth (see Fig. 2). At the decoder side, the receiver
`can “subscribe” to the “base-layer channel” and to any number
`of FGS enhancement-layer channels that the receiver is capable
`of accessing (depending for example on the receiver access
`bandwidth). It is important to note that regardless of the number
`of FGS enhancement-layer channels that the receiver subscribes
`to, the decoder has to decode only a single enhancement-layer
`as shown in Fig. 2.
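A minimal sketch of this partitioning is given below, under the assumption that each frame's embedded enhancement-layer bits are simply cut into consecutive byte ranges, one per multicast channel; the function names and the byte-range channelization are illustrative only, not the standard's transport syntax.

# Hypothetical sketch of receiver-driven layered multicast over one FGS
# enhancement layer: the server cuts each embedded enhancement frame into
# byte ranges ("multicast channels"); a receiver concatenates the channels
# it subscribes to back into a single decodable enhancement-layer prefix.
def partition_enhancement(enh_bits, channel_bytes):
    """channel_bytes[i] is the byte budget of multicast channel i for this frame."""
    chunks, pos = [], 0
    for budget in channel_bytes:
        chunks.append(enh_bits[pos:pos + budget])
        pos += budget
    return chunks

def reassemble(received_channels):
    """A receiver subscribed to the first k channels rebuilds one prefix."""
    return b"".join(received_channels)

channels = partition_enhancement(b"\x01" * 6000, [1000, 2000, 3000])
prefix = reassemble(channels[:2])   # subscribed to the first two channels only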
`The above approach for multicasting FGS [12] is based on the
`receiver-driven layered multicast framework that is supported
by the IP Multicast Backbone (i.e., the MBONE) [36], [37].
`Therefore, the IP multicast control and routing protocols needed
`for multicasting FGS-based streams have already been defined
`and supported by IP-multicast enabled routers. Moreover, new
`multicast routing protocols and architectures can further en-
`hance the delivery of FGS-based multicast applications [10],
`[11], [15]–[17].
`After this overview of the FGS framework, below we describe
`the basic SNR-based FGS encoder and decoder.
`
`A. Basic FGS Encoder and Decoder
`As shown in Fig. 1, the FGS framework requires two en-
`coders, one for the base-layer and the other for the enhancement
`layer. The base-layer can be compressed using any motion-com-
`pensation video encoding method. Naturally, the DCT-based
`MPEG-4 video standard is a good candidate for the base-layer
`encoder due to its coding efficiency especially at low bitrates.
`Prior to introducing FGS, MPEG-4 included a very rich set of
video coding tools, most of which are applicable to the FGS
`
`
`the enhancement layer is a sensible option [43]. Therefore, the
`basic SNR MPEG-4 FGS coding scheme is built upon: a) the
`original FGS scalability structure proposed in [1], [2], and b)
`embedded DCT coding of the enhancement layer as proposed
`in [26] and [27].
`Using the DCT transform at both the base and enhancement
`layers enables the encoder to perform a simple residual compu-
`tation2 of the FGS enhancement-layer as shown in Fig. 3 [6].
Each DCT FGS-residual frame consists of N bitplanes:

N = ⌈log2(|DCT|_max + 1)⌉

where |DCT|_max is the maximum DCT (magnitude) value of the
residual frame under consideration.3 After identifying |DCT|_max
and the corresponding N, the FGS enhancement-layer
encoder scans the residual signal using the traditional zig-zag
scanning method starting from the most significant bitplane
and ending at the least significant bitplane4
as shown in Fig. 3(b). Every bitplane consists of nonoverlap-
ping 16 × 16 macroblocks (MB's), and each MB includes four
luminance (Y) blocks and two chroma blocks (U and V).
Run-length codes are used for (lossless) entropy-coding
of the zeros and ones in each 8 × 8 bitplane block [4]. This
`process generates variable length codes that constitute the FGS
`compressed bitstream. A special “all-zero blocks” code is used
`when all six bitplane-blocks (within a given bitplane-mac-
`roblock) do not have any bits with a value of one.
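The following sketch illustrates, in a simplified and purely illustrative way, how the bitplanes of a single 8 × 8 residual block could be formed and converted into run-length symbols; it ignores sign coding and the actual MPEG-4 VLC tables [4], and all names are assumptions of this sketch.

# Illustrative sketch (not the normative MPEG-4 syntax): bitplanes and
# run-length symbols for one 8x8 FGS residual block. Sign coding and the
# real VLC tables are omitted.
import numpy as np

# Zig-zag scan order for an 8x8 block (standard diagonal traversal).
ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def num_bitplanes(residual):
    """N such that every coefficient magnitude fits in N bits."""
    return int(np.abs(residual).max()).bit_length()

def bitplane_runs(residual, plane):
    """(run-of-zeros, 1) symbols for one bitplane; an all-zero plane gives []."""
    runs, run = [], 0
    for r, c in ZIGZAG:
        if (abs(int(residual[r, c])) >> plane) & 1:
            runs.append((run, 1))   # a '1' preceded by `run` zeros
            run = 0
        else:
            run += 1
    return runs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1] = 12, -3
N = num_bitplanes(block)                                            # here N = 4
symbols = [bitplane_runs(block, p) for p in range(N - 1, -1, -1)]   # MSB first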
`At the receiver side, the FGS bitstream is first decoded by
`a variable length decoder (VLD) as shown in Fig. 4. Due to
`the embedded nature of the FGS stream, the VLD re-generates
`the DCT residual bitplanes starting from the most significant
`bitplane toward the least significant one. Moreover, due to the
`type of scanning used by the FGS encoder [Fig. 3(b)], it is pos-
`sible that the decoder does not receive all of the bitplane-blocks
`that belong to a particular bitplane. Any bitplane block not re-
`ceived by the decoder can be filled with zero values.5 The re-
`sulting DCT residual is then inverse-transformed to generate the
`SNR residual pixels. These residual pixels are then added to the
`base-layer decoder output to generate the final enhanced scal-
`able video.
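A small, purely illustrative sketch of this reconstruction for a single coefficient magnitude is given below; missing bitplanes are treated as zeros, and the quarter-step offset follows the rule described in footnote 5 (the guard for insignificant coefficients is an assumption of the sketch).

# Illustrative sketch: reconstruct one DCT coefficient magnitude when only the
# top len(received_bits) of total_planes bitplanes were received.
def reconstruct_magnitude(received_bits, total_planes):
    """received_bits: list of 0/1 values, most significant bitplane first."""
    value = 0
    for bit in received_bits:
        value = (value << 1) | bit
    missing = total_planes - len(received_bits)
    value <<= missing                        # unreceived planes filled with zeros
    if missing > 0 and value > 0:
        value += 1 << max(missing - 2, 0)    # add ~1/4 of the remaining step
    return value

# Footnote 5 style example: only the MSB (value 1) of a 5-bitplane coefficient
# arrives; instead of 10000b = 16 the decoder uses 10100b = 20.
print(reconstruct_magnitude([1], 5))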
`In summary, the basic SNR FGS codec employs embedded
`DCT variable length encoding and decoding operations that re-
`
`2It is important to note that there is an alternative approach for computing the
`FGS residual [46]. This alternative approach is based on computing the residual
`after clipping the base-layer reference picture in the pixel domain. Therefore,
`this approach, which is known as the “post clipping” method, computes the
`FGS residual in the pixel domain, and consequently it requires an additional
`DCT computation of the FGS residual prior to performing the bitplane coding.
`As reported in [46], there is no noticeable difference in the performance of both
`methods. Throughout this document we describe FGS based on the “pre clip-
`ping” residual computation approach which eliminates the need for performing
`DCT computation within the FGS enhancement-layer encoder.
`3In the FGS MPEG-4 standard, three parameters are used for the number-of-
bitplanes variable N: N(Y), N(U), and N(V) for the luminance
`and chroma components of the video signal [4].
`4Alternatively, the encoder may stop encoding the residual signal if the de-
`sired maximum bitrate is reached.
5For an "optimal" reconstruction (in a mean-square-error sense) of the DCT
coefficients, one-fourth (1/4) of the received quantization step-size is added.
For example, if the decoder receives only the MSB of a coefficient (with a value
x, where x = 0 or 1), then this coefficient is reconstructed using the value
x0100…0 (i.e., instead of x0000…0).
`
`Fig. 1. Examples of the FGS scalability structure at the encoder (left),
streaming server (center), and decoder (right) for a typical unicast Internet
`streaming application. The top and bottom rows of the figure represent
`base-layers without and with bidirectional (B) frames, respectively.
`
`Fig. 2. Example of an FGS-based multicast scenario. (The distribution of the
`base-layer is implicit and therefore is not shown in the figure.)
`
`base-layer. For a complete description of these tools, the reader
`is referred to [4], [39].
`In principle, the FGS enhancement-layer encoder can be
`based on any fine-granular coding method. When FGS was
`first introduced to MPEG-4, three approaches were proposed
`for coding the FGS enhancement layer: wavelet, DCT, and
`matching-pursuit based methods [2]. This led to several pro-
`posals and extensive evaluation of these and related approaches
`(see, for example, [22]–[33]). In particular, the performance of
`different variations of bitplane DCT-based coding [26], [27]
`and wavelet compression methods were studied, compared,
`and presented recently in [43]. Based on a thorough analysis
`of the FGS enhancement-layer (SNR) signal, the study in
`[43] concluded that both bitplane DCT coding and embedded
`zero-tree wavelet (EZW) based compression provide very
`similar results. The same conclusion was reached by the
`MPEG-4 FGS effort. Consequently, and due to the fact that
`the FGS base-layer is coded using MPEG-4 compliant DCT
coding, employing an embedded DCT method for compressing
`
`
Fig. 3. (a) Basic (SNR) FGS encoders for the base and enhancement layers. It is clear that the added complexity of the FGS enhancement-layer encoder is
relatively small. (b) The scanning order of the FGS enhancement-layer residual DCT coefficients. Scanning starts from the most-significant-bitplane (MSB) toward
the least-significant-bitplane. Each 8 × 8 bitplane-block is scanned using the traditional zig-zag pattern.
`
`Fig. 4. Basic structure of the FGS SNR decoder. The FGS decoder includes bitplane de-shifting to compensate for the two “adaptive quantization” encoding
`tools: selective enhancement and frequency weighting.
`
`semble the ones used in typical DCT based standards. In pre-
`vious standards (including MPEG-4 base-layer), the DCT co-
`efficients are coded with (run-length, amplitude) type of codes,
`whereas with FGS the bitplane ones-and-zeros are coded with
`(run-length) codes since the “amplitude” is always one. For
`more information about the VLC codes used by FGS, the reader
`is referred to [4].
`
`B. Performance Evaluation of the FGS SNR Coding Method
`
The performance of FGS has been compared thoroughly with
the performance of traditional SNR scalability video coding.6
In particular, rate-distortion (RD) results of multilayer (dis-
crete) SNR scalable video coding were compared with FGS
results over wide ranges of bitrates (e.g., [28]). These results
have clearly shown that FGS coding provides the same or better
coding efficiency as traditional SNR scalability methods. Fig. 5
shows an example of these results for one of the MPEG-4 video
sequences. As illustrated in the figure, FGS outperforms multi-
layer SNR scalable coding over a wide range of bitrates and for
both QCIF and CIF resolution video. (For more extensive data
on the comparison between FGS and multilayer SNR coding,
the reader is referred to [28].)

6Traditional SNR scalability coding methods include the ones supported by
MPEG-2 and MPEG-4 (i.e., prior to FGS).
`
`Authorized licensed use limited to: Cliff Reader. Downloaded on December 21,2023 at 05:18:10 UTC from IEEE Xplore. Restrictions apply.
`
`4
`
`
`
`RAHDA et al.: MPEG-4 FINE-GRAINED SCALABLE VIDEO CODING METHOD
`
`57
`
`Fig. 5. Performance of FGS coding and “traditional” MPEG-4 SNR coding with multiple layers. It is clear from these plots that FGS outperforms multilayer
`(discrete) SNR coding over a wide range of bitrates. For more data on the comparison between the performance of FGS and multilayer SNR compression, the
`reader is referred to [28].
`
`Fig. 6. Three frames from the “Stefan” sequence. This is an example of a sequence that exhibits a high-degree of temporal correlation among successive frames.
The pictures shown here are 25 frames apart, yet most of the background is very similar from one picture to another.
`
`Here, we focus on two important (yet related) aspects of
`FGS rate-distortion performance. Before discussing these
`two aspects, it is important to highlight that one of the key
`advantages of FGS is its simplicity and flexibility in supporting
adaptive streaming applications. Naturally, this flexibility
comes, in general, at some expense in video quality. Hence,
`one important question is: how much penalty is being paid in
`quality when comparing FGS with a nonscalable stream that
is coded at a particular bitrate R? Therefore, one aspect of
FGS performance that we would like to address here is how
does FGS compare with a set of nonscalable streams coded
at discrete bitrates (e.g., R_1, R_2, ..., R_n)
covering the same bandwidth range [R_BL, R_max]? Although
this type of comparison may seem to be unfair to FGS (since
the multiple nonscalable streams are optimized for particular
bitrates whereas FGS covers the same range of bandwidth
with a single enhancement-layer), this comparison provides an
`insight into the theoretical (upper) limits of FGS’s rate-dis-
`tortion performance. Moreover, since the (ideal) nonscalable
`multiple-streams’ scenario represents an extreme case of
`inflexibility, this comparison provides an insight into the level
`of quality-penalty being paid for FGS’s flexibility.
`A related aspect to the above question is the impact of the
base-layer (coded at a given bitrate R_BL) on the overall perfor-
mance of FGS over the range of bandwidth [R_BL, R_max]. In this
`section, we will try to shed some light on these two aspects in a
`joint manner. To achieve that, we have conducted a very compre-
`hensive evaluation of a large number of sequences with different
`
`motion and texture characteristics. Each sequence was coded at
multiple (discrete) bitrates R_1, R_2, ..., R_n to generate
the nonscalable streams at these rates. Then, we used the non-
scalable streams (coded with a bitrate R_i, i = 1, 2, ..., n) to
generate corresponding FGS streams that cover the bandwidth
range [R_i, R_max].
`To illustrate some of the key conclusions of our simulation
`study, we show here the results of two video sequences coded
`in the range 100 kbit/s to 1 Mbit/s, at 10 frame/s, and with a
CIF resolution.7 The two selected sequences are "Stefan" (shown
`in Fig. 6) and “Flying” (Fig. 7). These sequences, “Stefan” and
`“Flying,” represent two types of content: one type with rela-
tively high temporal correlation and the other without
significant correlation among frames, respectively.
`
`7This bandwidth range was selected since it represents the type of band-
`width variation one may encounter over “broadband” Internet access (e.g.,
`cable-modem access technologies [18]) which are suitable for video streaming
`applications. FGS performance results over lower-bitrate bandwidth ranges
`were similar to the ones presented here when using lower resolution pictures
`(e.g., using QCIF resolution for 10 kbit/s to 100 kbit/s bitrates). Moreover, it
`is important to highlight here that the selected frame-rate (i.e., 10 frames/s)
`was chosen since it represents the adequate frame rate for a base-layer coded
`at around 100 kbit/s. By employing the SNR (only) scalability of FGS, the
`enhancement layer is “locked” to the base-layer frame rate regardless of the
`bitrate. This issue, which represents an intrinsic limitation of all SNR-based
`scalability coding methods, is resolved when using the hybrid temporal-SNR
`FGS scheme discussed later in this paper. In general, using higher frame rates
`(e.g., 15 frames/s) will lower the PSNR values for both FGS and nonscalable
`streams. However, and depending on the video sequence, the difference in the
`performance between FGS and the nonscalable streams may increase with
`increasing the frame rate of the base-layer stream.
`
`
`Fig. 7. Three frames from the “Flying” sequence. This is an example of a sequence that exhibits a high-degree of motion and scene changes. The pictures shown
here are only 5 frames apart, yet most of the visual content is changing from one picture to another.
`
`Fig. 8. FGS performance in comparison with multiple nonscalable streams’ (ideal) case. The left figure shows the results for a sequence that exhibits a high-degree
of temporal correlation among successive frames. For this type of sequences, FGS pays a penalty in performance due to the absence of motion compensation within
`the enhancement-layer. The right figure shows the performance for a sequence with very-high motion and a large number of scene cuts. For this type of sequences,
`FGS performance is either similar to or even slightly better than the ideal nonscalable case.
`
`The Peak SNR performance numbers are shown in Fig. 8. The
`left figure shows the results for the sequence “Stefan” (which is
`characterized by a relatively high-degree of temporal correlation
among successive frames). It is clear that for these sequences, FGS
`pays some penalty in performance when compared with the ideal
`nonscalable case, due to the absence of motion compensation
`within the FGS enhancement-layer. This penalty manifests itself
in more "blocky" video for FGS coded sequences when compared
`with the nonscalable streams, in particular, at low bitrates (e.g.,
`around the 300–500 kbit/s bitrate-range for the Stefan sequence).
`At higher bitrates, the difference in quality is usually less visible.
`It is also clear that selecting a higher bitrate base-layer could
`provide rather significant improvement in quality at the expense
`of decreasing the bandwidth range that FGS covers.
`The right plots in Fig. 8 show the performance for the se-
`quence “Flying” which includes very-high motion scenes and
`a large number of scene cuts. For these sequences, FGS per-
`formance is either similar to or even slightly better than the
`ideal nonscalable case. It is also clear that, here, the impact
`of selecting a higher bitrate base-layer does not provide sig-
`nificant improvement, and therefore one can still cover the de-
`sired (wider) range of bandwidth without paying much penalty
`in quality. Consequently, based on our study, the following key
`conclusions can be made:
`1) When compared with the (ideal) nonscalable coding case,
`FGS suffers the most for sequences with high temporal
`
`correlation between successive frames.8 This result is
`somewhat intuitive since FGS exploits temporal redun-
`dancy only at the base layer, and therefore FGS suffers
some loss in coding efficiency due to the lack of motion compensa-
`tion at the enhancement-layer. (An example of this type
`of sequences is shown in Fig. 6.) Other very common
`examples of such sequences include simple “head-and-
`shoulder” scenes with static background (e.g., scenes of
`news anchors, talk shows, etc.).
`2) On the other hand, for sequences with a high degree of
`motion (e.g., with a large number of scene cuts and/or
`very fast motion), FGS’s rate-distortion performance is
`very good. In these cases, FGS usually (and rather surpris-
`ingly) provides similar (sometimes slightly better) coding
`efficiency when compared with the nonscalable (ideal)
streams. (An example of this type of sequences is shown
`in Fig. 7.) Although this type of video content is not
`as common as the type of sequences mentioned above
`(i.e., in 1), the presence of high-motion video content is
`growing in support of many IP streaming applications.
`Examples of this type of high-motion sequences include
`“movie trailers” (which usually contain a large number of
`scene changes), certain commercials, and news clips with
`high-action content.
`
`8Here, “temporal correlation” is based on subjective observations rather than
`an objective measure.
`
`
`3) As expected, the base-layer (and its corresponding bi-
`trate) could have a major impact on the overall perfor-
`mance of FGS. In particular, this observation is prevalent
for the sequences with a high level of temporal correlation
`(i.e., case 1 above).
4) There is an inherent trade-off between the overall perfor-
mance and the amount of bandwidth range [R_i, R_max],
i = 1, 2, ..., n, one needs/desires to cover. For example,
the average performance of FGS over a bandwidth range
[R_2, R_max] could be significantly better than the average
performance over the wider range [R_1, R_max] when the
nonscalable streams coded at R_2 and R_1 are used as base-
layers, respectively. This is usually due, in part, to the fact
that the nonscalable (base-layer) stream coded at R_2 has a
better quality than the lower bitrate stream R_1, and there-
fore, starting from a higher-quality base-layer naturally
`improves the overall quality. Again, this observation was
only clear for sequences with a high level of temporal cor-
`relation.
In summary, FGS provides results ranging from fairly acceptable
to very good even when compared with the multiple (ideal) nonscalable
streams scenario. In addition, the high level of flexibility and
`simplicity that FGS provides makes it an attractive solution for
IP streaming applications. Moreover, FGS is further enhanced
`by two important video coding tools and features as described
`in the following two sections.
`
`III. FGS CODING WITH ADAPTIVE QUANTIZATION
`
`Adaptive quantization is a very useful coding tool for im-
`proving the visual quality of transform-coded video. It is nor-
`mally achieved through a quantization matrix that defines dif-
`ferent quantization step sizes for the different transform coeffi-
`cients within a block (prior to performing entropy coding on
`these coefficients). For example, the dc coefficient and other
`“low frequency” coefficients normally contribute more to the vi-
`sual quality and consequently small step sizes are used for quan-
`tizing them. Adaptive quantization can also be controlled from
`one macroblock to another through a quantization factor whose
`value varies on a macroblock-by-macroblock basis. These adap-
`tive quantization tools have been employed successfully in the
`MPEG-2 and MPEG-4 (base-layer) standards.
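As a minimal, self-contained illustration of the idea (the matrix values and scaling rule below are hypothetical, not those defined by MPEG-2 or MPEG-4), the sketch applies a frequency-dependent quantization matrix, scaled by a per-macroblock quantization factor, to an 8 × 8 block of DCT coefficients.

# Hypothetical sketch of matrix-based adaptive quantization for an 8x8 block.
import numpy as np

def quantize(dct_block, quant_matrix, mb_qp):
    """Quantize each coefficient with its own step size, scaled by the
    macroblock-level quantization factor mb_qp."""
    return np.round(dct_block / (quant_matrix * mb_qp)).astype(int)

# Illustrative matrix: small steps for the dc/low-frequency coefficients,
# larger steps for the high frequencies that matter less visually.
idx = np.arange(8)
quant_matrix = 8 + 2 * (idx[:, None] + idx[None, :])   # 8 at dc, up to 36
levels = quantize(np.random.randn(8, 8) * 100, quant_matrix, mb_qp=2)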
`Performing “adaptive quantization”—AQ—on bitplane sig-
`nals consisting of only ones and zeros has to be achieved through
`a different (yet conceptually similar) set of techniques. We first
`introduced the notion of adaptive quantization for the FGS bit-
`plane signal in [7]. FGS-based AQ is achieved through bitplane
`shifting of a) selected macroblocks within an FGS enhancement
layer frame, and/or b) selected coefficients within the 8 × 8
blocks. Bitplane shifting is equivalent to multiplying a partic-
`ular set of coefficients by a power-of-two integer. For example,
let us assume that the FGS encoder wishes to "emphasize" a partic-
ular macroblock k within an FGS frame. All blocks within this
selected macroblock k can be multiplied9 by a factor 2^(n_k).
`
9Throughout the remainder of this section we will use the words "shifted,"
`“multiplied,” and “up-shifted” interchangeably.
`
`Fig. 9. Example illustrating the use of the Selective Enhancement AQ tool. In
`this case, a “selected” macroblock is emphasized (relative to the surrounding
`macroblocks) by up-shifting all coefficients within that macroblock. This
`generates a new bitplane when compared to the original number of bitplanes.
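To make the up-shifting operation concrete, the sketch below (with hypothetical names and frame layout) multiplies all DCT coefficients of a selected macroblock by 2^(n_k), i.e., up-shifts them by n_k bitplanes relative to the rest of the frame before bitplane coding.

# Illustrative sketch of Selective Enhancement style bitplane shifting: all DCT
# coefficients of a "selected" macroblock are multiplied by 2**n_k, i.e.,
# up-shifted by n_k bitplanes relative to the other macroblocks.
import numpy as np

def selective_enhancement(residual, selected_mbs):
    """residual: DCT residual frame (H x W array of 8x8 coefficient blocks).
    selected_mbs maps a macroblock index (mb_row, mb_col) to its shift n_k."""
    shifted = residual.copy()
    for (mb_r, mb_c), n_k in selected_mbs.items():
        r, c = 16 * mb_r, 16 * mb_c
        shifted[r:r + 16, c:c + 16] *= (1 << n_k)   # multiply by 2**n_k
    return shifted

# Emphasize the macroblock at row 2, column 3 by one bitplane (factor of 2).
frame = np.random.randint(-32, 33, size=(144, 176))    # QCIF-sized residual
enhanced = selective_enhancement(frame, {(2, 3): 1})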
`
Therefore, the new value C'_j of a coefficient of block j
(within macroblock k) is

C'_j = 2^(n_k) · C_j

where C_j is the original value of the coefficient. This is
equivalent to up-shifting the set of coefficients C_j,
j = 1, ..., 6, by n_k bitplanes relative to other coefficients that
belong to other macroblocks. An example of this is illustrated
`in Fig. 9. This type of adaptive-quantization tool is referred to
`as Selective Enhancement since through this approach selected
`macroblo