`
`Signal Processing for Internet Video Streaming: A Review
`
Jian Lu*
Apple Computer, Inc.
* Correspondence: One Infinite Loop, MS: 302-3MT, Cupertino, CA 95014, USA; Email: jian@computer.org.
`
`
`ABSTRACT
`
Despite its commercial success, video streaming remains a black art owing to its roots in proprietary commercial development. As a result, many challenging technological issues that need to be addressed are not even well understood. The
`purpose of this paper is to review several important signal processing issues related to video streaming, and put them in the
`context of a client-server based media streaming architecture on the Internet. Such a context is critical, as we shall see that a
`number of solutions proposed by signal processing researchers are simply unrealistic for real-world video streaming on the
`Internet. We identify a family of viable solutions and evaluate their pros and cons. We further identify areas of research that
`have received less attention and point to the problems to which a better solution is eagerly sought by the industry.
`
`Keywords: video streaming, video compression, Internet video, rate adaptation, packet loss, error control, error recovery.
`
`1. INTRODUCTION
`
Video streaming is one of the most exciting applications on the Internet. It has already created a new business known as Internet broadcast, or Intercast/Webcast. Although Internet broadcast is still in its infancy, it gives us a glimpse of the future of
`broadcasting where people can enjoy not only scheduled live or pre-recorded broadcast but also on-demand and personalized
`programs. Stirring more excitement is the idea that anyone will be able to set up an Internet TV station and broadcast his or
`her video programs on the Internet just like we publish our “homepage” on the Web today. While video streaming as a core
technology behind a new business is a success story, it remains very much a black art owing to its roots in proprietary
`commercial development. Indeed, to the author’s knowledge, there has been no published research literature to date that
`offers a comprehensive review of video streaming architecture and system, and related design issues.
`
`Even the definition of streaming is sketchy; it is often described as a process. Generally speaking, streaming involves
`sending media (e.g., audio and video) from a server to a client over a packet-based network such as the Internet. The server
`breaks the media into packets that are routed and delivered over the network. At the receiving end, the packets are
`reassembled by the client and the media is played as it comes in. A series of time-stamped packets is called a stream. From
`a user’s perspective, streaming differs from simple file transfer in that the client plays the media as it comes in over the
`network, rather than waiting for the entire media to be downloaded before it can be played. In fact, a client may never
`actually download a streaming video; it may simply reassemble the video’s packets and play the video as the packets come
`in, and then discard them. To a user the essence of streaming video may lie in an expectation that the video stream the user
`has requested should come in continuously without interruption. This is indeed one of the most important goals for which a
`video streaming system is designed.
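
To make the process concrete, here is a minimal sketch in Python of the receive-reorder-play-discard loop described above; receive_packet, decode_frame, and render are hypothetical stand-ins for the real transport and codec components, not part of any actual streaming system.

```python
# A minimal sketch (hypothetical receive_packet, decode_frame, and render
# functions) of a streaming client's receive-reorder-play-discard loop.
import heapq
from itertools import count

def stream_playback(receive_packet, decode_frame, render, preroll_packets=20):
    buffer, order = [], count()
    for _ in range(preroll_packets):             # pre-roll: partially fill the buffer
        timestamp, payload = receive_packet()
        heapq.heappush(buffer, (timestamp, next(order), payload))
    while buffer:
        _, _, payload = heapq.heappop(buffer)     # earliest timestamp first
        render(decode_frame(payload))             # play the media as it comes in
        packet = receive_packet()                 # keep refilling until the stream ends
        if packet is not None:
            timestamp, payload = packet
            heapq.heappush(buffer, (timestamp, next(order), payload))
        # popped packets are simply discarded; the video is never stored on disk
```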
`
`Over the last decade, IP multicast and particularly the MBone have received considerable attention from the research
`community. IP multicast provides a simple and elegant architecture and service model that are particularly suitable for real-
`time multimedia communications. The emergence of the MBone, an experimental virtual multicast backbone, has offered a
`testbed for researchers around the world to develop and test a suite of applications known as MBone tools for real-time
`multimedia communications and collaborations. See, for example, a recent article by McCanne1 for the evolution of the
`MBone and its applications. Many of these applications have been designed for audio and video conferencing and
`broadcasting using IP multicast. On November 18, 1994, the MBone reached a festive milestone amid great fanfare when
the Rolling Stones broadcast the first major rock band concert on the MBone.
`
Despite the promise demonstrated by the MBone and the logical reasoning that IP multicast would serve as a foundation for
`video streaming on the Internet, commercial deployment of multicast applications and services on the public Internet has
been painfully slow by Internet standards. The bulk of the streaming media traffic generated by the three major commercial streaming systems (i.e., QuickTime, RealSystem, and Windows Media) on today’s Internet consists largely of unicast
streams. It is beyond the scope of this paper to discuss the unresolved issues that have slowed the deployment of multicast
`applications and services; interested readers are referred to a recent white paper published by the IP Multicast Initiatives.2
However, we want to note that even when multicast is fully deployed, a very large volume of streaming media traffic will remain unicast. This is because a large portion of streaming media content is archived and played on demand by users on the Internet. Multicast is of little help to on-demand streaming unless a large number of users in the same neighborhood of the Internet request the same content simultaneously.
`
`The purpose of this paper is to review several important issues in Internet video streaming from a signal processing
`perspective. For the reasons we have just mentioned above, our review will focus on unicast streaming while considering
`multicast as a possible extension. Drawing from the author’s experience in developing one of the major commercial media
`streaming systems, we attempt to identify the most challenging problems in video streaming and evaluate a number of signal
`processing solutions that have been proposed; some of these solutions have already been implemented in experimental and/or
commercial video streaming systems. To make the review useful to both researchers and application developers, we would like to identify clearly, among the proposed solutions, what works in the real world, what doesn’t, and why. We want to note that
`although to a Web surfer streaming video commonly implies synchronized audio as well, video and audio are actually
`delivered in independent streams in a streaming session. We will not discuss streaming audio in this paper.
`
`The rest of this paper is organized as follows. Section 2 introduces streaming system and architecture and describes what we
`believe to be the most challenging problems in video streaming. We also examine some important characteristics of the
`streaming system and the constraints that are imposed on any prospective signal processing solutions. Section 3 reviews a
`number of signal processing solutions to the video streaming problems in great detail. We first comment on some proposed
`solutions that are simply unrealistic for real-world video streaming on the Internet. Then, we evaluate a family of viable
`solutions, identify areas of research that have received less attention, and point to the problems to which a better solution is
`eagerly sought by the industry. Section 4 concludes the paper.
`
`2. VIDEO STREAMING SYSTEM AND ARCHITECTURE
`
`2.1. Video Streaming Model and Architecture
`
`A schematic system diagram of Internet video streaming is shown in Figure 1 for both unicast and multicast scenarios.
`Figure 1(a) represents the unicast model for two types of video streaming services, on-demand or live. In on-demand
`streaming, pre-compressed, streamable video content is archived on a storage system such as a hard disk drive and served to a
`client on demand by the streaming server. In live streaming, a video stream coming out of a live source such as a video
`camera is compressed in real-time by the encoder. The compressed bitstream is packetized by the packetizer, and sent over
`the Internet by the streaming server. On the client side, the video bitstream is reassembled from the received packets by the
reassembler, then decoded by the decoder, and finally rendered on a display. Because each client connecting to the server receives an individual stream, the disadvantages of the unicast streaming model are apparent. The server load
`will increase proportionally to the number of clients connected to and served by the server. When a large number of clients
`try to connect to a server at the same time, the server gets overloaded, creating a “hot spot” on the Internet. Additionally,
`sending copies of the same data in individual streams to a large number of clients is obviously inefficient; it causes
congestion on the network and degrades quality of service.
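
A back-of-the-envelope sketch (the 300 kbps per-stream rate is only an assumed figure) makes the scaling problem explicit: outbound server bandwidth grows linearly with the number of unicast clients.

```python
# Back-of-the-envelope sketch: unicast server bandwidth grows linearly with
# the number of clients, since every client receives its own copy of the stream.
def unicast_server_load_mbps(stream_kbps, num_clients):
    return stream_kbps * num_clients / 1000.0

for clients in (10, 100, 1_000, 10_000):
    print(f"{clients:6d} clients -> {unicast_server_load_mbps(300, clients):8.1f} Mbps")
# 10,000 viewers of a single 300 kbps stream already demand about 3 Gbps upstream.
```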
`
`The multicast model of Figure 1(b) has many advantages over unicast in live video streaming (broadcasting) where a large
`audience could tune in for the broadcast at the same time. Instead of sending a stream to each client individually, the server
`sends out only one stream that is routed to one or more “group addresses”. A client tunes in to a multicast group in its
`neighborhood to receive the stream. Since the server is decoupled from connecting to each client directly, server load does
`not increase with the number of receiving clients. In the case of delivering a stream to multiple groups, the stream on its
`route gets replicated and branched only at the fan-out points within the network. When combined with layered coding that
`will be discussed in Section 3, multicast provides a decentralized and highly efficient video streaming architecture.
`
`Despite the clear advantages of multicast-based video streaming, it is employed in only a small fraction of today’s Internet,
`primarily in Intranets within an institution or corporation. There are complex reasons for the slow deployment of multicast-
based services in general.2 Beyond deployment hurdles, we note that a large volume of streaming video content is archived
`and streamed to users on demand. Multicast does not help very much in on-demand streaming where the service requests
from users are random and highly desynchronized in terms of requested content and timing of these requests. Interestingly, recent developments in so-called “content delivery networks” appear to have achieved some improvements in unicast-
`based on-demand streaming that are reminiscent of those we have seen with multicast-based live streaming. A content
`delivery network such as the one developed by Akamai Technologies3 is essentially a load-balanced network of servers. The
`servers are deployed on the “edge” of the Internet, making them close to the end users. These “edge servers” communicate
`with a “source server” to update their content. A client requesting a streaming video is redirected dynamically to the closest
`edge server that is not overloaded. Like multicast-based video streaming, this decentralized service model can reduce “hot
`spots”. In principle, a content delivery network can also help unicast-based live streaming with a possibly increased latency.
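
As a rough sketch of the redirection policy just described (the data structures are hypothetical, and the real mechanism used by a commercial content delivery network is far more elaborate), a client could be directed to the closest edge server that is not overloaded as follows.

```python
# Sketch of CDN-style client redirection: send the client to the closest
# edge server that is not overloaded. The data structures are hypothetical.
def redirect_client(client_location, edge_servers, max_load=0.9):
    # edge_servers: list of dicts, e.g.
    #   {"name": "edge-sjc", "load": 0.4, "distance": <function of client_location>}
    by_distance = sorted(edge_servers, key=lambda s: s["distance"](client_location))
    for server in by_distance:
        if server["load"] < max_load:
            return server["name"]
    return by_distance[0]["name"]   # all overloaded: fall back to the closest one
```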
`
`••
`
`Encoder
`
`3
`
`
`
`2.2. The Internet Challenge to Video Streaming
`
The Internet poses a great challenge to video streaming. From a communication point of view, the Internet is a shared
`channel with a massive number of heterogeneous components. This channel is extremely volatile over space and time in
terms of communication capacity and quality. To see what this means for video streaming, let us look at some data that we have
`collected from a real-life video streaming session.
`
`4
`
`
`
Figure 2 shows a fairly typical video streaming session in which everything runs normally. There are also extreme cases. Many of us have had the experience of watching streaming video that stalled completely several times during a session. In such cases, the bitrate of the incoming video drops to nil and the packet loss rate reaches 100%, making the video worthless to watch. Generally speaking, the experience of streaming video isn’t always as bad (or as good) as that in our memory or in Figure 2. A person watching the same streaming video from another location might have had a totally different experience. The point is that the Internet channel is so volatile over space and time that every client on the network may observe different conditions from one second to the next. The data shown in Figure 2 and our experience with streaming video challenge us with two fundamental signal processing problems whose solutions can improve the
performance of Internet video streaming dramatically. The first problem concerns how to adapt to the constantly changing bandwidth and congestion conditions on the Internet. The second calls for solutions for coping with packet loss due
`to the best-effort service model of the current Internet. We will, for short, refer to these two problems as “rate adaptation”,
`and “error control and recovery”, respectively. Before we attempt to address these problems and evaluate a number of
`proposed solutions, we need to examine the constraints that have been imposed on any prospective solutions. We go back to
`the video streaming model and architecture of Section 2.1 and make a few important observations.
`
`2.3. Video Streaming System Characteristics and Constraints
`
Many signal processing researchers are familiar with the video conferencing model thanks to the popularity of the ITU H.261 and H.263 standards. In the following, we review the characteristics of a video streaming system and point out its differences from video conferencing. For comparison purposes, we provide in Figure 3 a functional diagram of an IP-based peer-to-
`peer video conferencing system. As we have mentioned, we will focus on analyzing the unicast video streaming system.
`
`(a) Latency. One difference between streaming and conferencing is that the former does not require two-way user
interaction. As such, the requirement on latency is more relaxed in streaming than in conferencing. The latency requirement commonly translates into a decoder/client buffer of a certain size, which is often simulated by the encoder/server for rate
`control. When a video streaming session is initiated, the client waits for a few seconds for its buffer to be filled by the
`server. This process is called pre-roll. After the pre-roll the client starts playing the video. Obviously, the client’s
`buffer has to be kept from underflow in order for the video to continue to play without interruption. Generally speaking,
streaming can afford to use a much larger buffer than can be considered in conferencing to absorb network jitter.
`However, latency and smooth playback must be balanced to create a good user experience, which becomes more
important in streaming live video. It is common in commercial video streaming systems to pre-roll a few seconds of data into the decoder/client buffer; see the sketch following these observations for how such a buffer absorbs bandwidth variations.
`
(b) Client-Server Model. This is a major distinction between the video streaming system of Figure 1(a) and the peer-to-
`peer conferencing system of Figure 3. In the video conferencing system of Figure 3, there is no server involved.
`Additionally, most components including encoder, decoder, packetizer, and reassembler at either end often reside on one
`piece of hardware, which is a desktop computer in the so-called desktop conferencing environment. The video streaming
`system in Figure 1(a), however, operates in a client-server environment. As such, a streaming server may be required to
`serve thousands of clients simultaneously. It is therefore necessary for the encoder to be separate from the server
`hardware so that the server hardware can be dedicated to serving the clients. For archived, on-demand video streaming,
`encoder and server are separate by default. In live streaming, encoder/server separation enables a more efficient “encode
`once, serve multiple times” service model. This model is almost always adopted in practice. A common implementation
`for live video streaming is to combine the encoder and packetizer to reside on one piece of hardware that broadcasts a
`stream to the server. The server simply “reflects” the stream to its clients. The client-server model and the nature of
encoder/server separation make many signal processing techniques that were developed for video conferencing unsuitable for video streaming. We will come back to this observation later.
`
(c) Server Load Restrictions. As we have pointed out, a streaming video server may be required to serve thousands of
`clients simultaneously. When the number of clients connected to a server increases, the server experiences an elevated
`load due to increased processing and I/O demands. Eventually the server can get overloaded and the quality of service
`drops. While it is relatively well understood that a good signal processing solution should not require the server to do
`much processing, the requirement for I/O is often overlooked. In archived, on-demand video streaming, when the
`number of clients increases, heavy I/O activity, particularly disk access, often becomes the bottleneck in the streaming
`system. This ought to be kept in mind in designing a signal processing solution for video streaming.
`
`5
`
`
`
`Internet
`
`6
`
`
`
`3. SIGNAL PROCESSING SOLUTIONS FOR VIDEO STREAMING
`
`3.1. Solutions That Do Not Fit
`
Having reviewed the architecture and characteristics of a video streaming system in the previous Section, we can see that some of the methods developed for video conferencing and other applications such as wireless video are not quite applicable to video streaming. We comment on these “unfit” solutions before moving on to evaluate a family of viable solutions.
`
`3.1.1. Feedback-Based Encoder Adaptation
`
`Many signal processing researchers have explored the idea of having a video encoder adapt to the changing conditions in
`bandwidth and packet loss. It is assumed that a back-channel exists and will feed the necessary information back to the
`encoder which in turn adjusts its behavior in response to the bandwidth and packet loss conditions. This idea has motivated
`the design of the RTCP channel along with RTP.4-6 To date, many ways of encoder adaptation have been proposed and
`studied. Bolot, Turletti, and Wakeman5 proposed an adaptive rate control mechanism by which a video encoder adjusts its
`output bitrate based on the feedback information from the receivers. There has been continued work along this direction.7-10
`In the same spirit as feedback-based rate control, Steinbach, Färber, and Girod11 proposed to adapt the coding mode selection
at the encoder to the error conditions at the receiver for wireless video. The idea is that error-damaged macroblocks (MBs) may be tracked by the sender through a receiver’s feedback, and rapid error recovery can be achieved by INTRA-refreshing the corresponding MBs in the next frame. More recently, a scheme of adaptive coding mode selection was proposed for Internet video based on a packet loss probability model that must be estimated at the receiver and fed back to the sender.10
`Similar to adaptive INTRA MB refreshing and placement, it was proposed that a damaged reference picture could be
dynamically replaced at the sender according to back-channel signaling.12 This mechanism is now supported in the ITU H.263+ standard.13 There are still more ways of encoder adaptation. For example, unequal error protection can be applied
`according to packet loss rate,14 provided that a current estimate of the packet loss rate at a specific receiver is available.
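
For illustration only, the following AIMD-style sketch captures the spirit of feedback-based rate control, though it is not the algorithm of any of the cited papers: the encoder's target bitrate is cut multiplicatively when receiver reports indicate loss above a threshold, and is probed upward additively otherwise.

```python
# AIMD-style illustration (not the cited algorithms) of feedback-based encoder
# rate control driven by RTCP-like receiver reports carrying a loss fraction.
def adapt_target_bitrate(current_kbps, reported_loss_fraction,
                         loss_threshold=0.05, floor_kbps=32, ceiling_kbps=1500):
    if reported_loss_fraction > loss_threshold:
        current_kbps *= 0.75       # multiplicative decrease: back off under congestion
    else:
        current_kbps += 10         # additive increase: probe for spare bandwidth
    return max(floor_kbps, min(ceiling_kbps, current_kbps))

rate = 300
for loss in [0.0, 0.0, 0.12, 0.08, 0.01, 0.0]:
    rate = adapt_target_bitrate(rate, loss)
    print(f"loss={loss:.2f} -> target {rate:.0f} kbps")
```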
`
`Although the feedback-based encoder adaptation techniques that we have just reviewed can be effective in peer-to-peer video
`conferencing and wireless video, they don’t fit in a real-world video streaming system. This is due to the encoder/server
separation that we discussed in the previous Section. As we mentioned there (see Figure 1(a)), in live video streaming, for efficiency the encoder encodes once, producing one bitstream for the server. Of course, the system can be
`configured to enable the encoder to generate a bitstream adapted to each client. However, such a design will not scale up to
`an industry-strength video streaming system. It is well-known that real-time video compression is computationally
`expensive; this is one reason why the encoder is separate from the server. Considering the server may be serving thousands
`of clients simultaneously, it is simply unrealistic to expect the encoder to compress and output thousands of different
bitstreams that are adapted to each client. In on-demand video streaming, it is obvious that encoder adaptation cannot be
`applied because the video content is compressed off-line.
`
`3.1.2. Complex Server Involvement
`
We now know that the encoder in a video streaming system cannot afford to adapt to each client’s condition. One may wonder whether the server can adapt instead. Generally speaking, the server does have some flexibility and capability to manipulate the
`video bitstream to adapt to a client’s needs. In fact, all viable solutions to rate adaptation to be reviewed in Section 3.2 rely
on the server to do some kind of processing. The exact requirements for server involvement depend on the specific adaptation technique, but for obvious reasons the server should not be tasked with complex processing on a per-client basis.
`
`An example of tasks that are too complex is for the server to convert the bitrate of a pre-encoded video bitstream dynamically
`by transcoding. Although a transcoder is often much less complex than a full encoder, a considerable amount of computing
`such as bit re-allocation, and motion vector search or refinement may be required.15-17 A server is unlikely to scale up to
`serving a large number of clients if it performs transcoding for each client. A simplified version of transcoding, called
dynamic rate shaping (DRS),18,19 requires even less processing because it achieves bitrate reduction by selectively dropping
`quantized transform coefficients or blocks, or by re-quantization of the blocks.20 Despite the significant reduction in
`computational complexity at the expense of introducing a quality problem known as “drift”, we still feel that DRS is too
complex for a server to do on a per-client basis. This is because DRS requires the server to have knowledge of a specific
`codec to process the bitstream.
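
As a toy illustration of the re-quantization flavor of rate shaping (a single block of quantized coefficients; a real rate shaper must also rewrite the entropy-coded bitstream, and the coarser reconstruction is what introduces drift in predicted frames), consider the following.

```python
import numpy as np

# Toy illustration of requantization-based rate shaping on one block of
# quantized transform coefficients. Coarser levels have smaller magnitudes
# and more zeros, so they entropy-code into fewer bits.
def requantize_block(levels, old_step, new_step):
    coefficients = levels * old_step                        # dequantize
    return np.rint(coefficients / new_step).astype(int)     # requantize coarsely

levels = np.array([[12, -3,  1,  0],
                   [ 4,  1,  0,  0],
                   [ 1,  0,  0,  0],
                   [ 0,  0,  0,  0]])
print(requantize_block(levels, old_step=8, new_step=16))
```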
`
`7
`
`
`
`3.1.3. Intraframe Error Control and Concealment at Low Bitrate
`
There has been much work on designing error-resilient bitstreams for error control. For example, the emerging MPEG-4 video standard supports such mechanisms as resynchronization markers, data partitioning, and reversible VLCs for intraframe
`error control. When these features are implemented in a compressed bitstream, the decoder can regain synchronization in
`case of bit errors, and continue to decode a video frame even if a portion of the frame is damaged or lost. Furthermore, error
concealment techniques can be used to recover a damaged or lost region within a video frame to a certain degree.
`
`Obviously, any intraframe error control and concealment technique would become useless if a whole video frame is lost
completely. Unfortunately, this is often the case at low bitrate when we attempt to packetize a video bitstream for the MTU of a LAN. As we mentioned in Section 2.3, the MTU for Ethernet is 12000 bits (1500 bytes). Consider that we compress video for a
`target bitrate of 36kbps at a frame rate of 4fps. This would be suitable for users with a 56kbps modem connection because
`we need to leave some room for audio and bandwidth fluctuation. At this target bitrate and frame rate, the average size of a
`video frame is 9000 bits, which is less than the MTU of 12000 bits. In practice, a motion compensated frame is often much
`smaller in size than 9000 bits in low bitrate video streaming. This implies that when a packet is lost, it is most likely that a
`whole video frame is lost. Therefore, no matter how well intraframe error control and concealment techniques are
`implemented in the video bitstream and at the decoder, they are of little help in low bitrate video streaming.
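
The arithmetic above, written out as a quick check under the assumption of a 1500-byte Ethernet MTU:

```python
# Quick check of the arithmetic above: at 36 kbps and 4 fps, an average coded
# frame fits in a single Ethernet-MTU packet, so one lost packet usually
# means one whole lost frame.
mtu_bits = 1500 * 8                      # Ethernet MTU: 1500 bytes = 12000 bits
target_kbps, fps = 36, 4
average_frame_bits = target_kbps * 1000 / fps
print(average_frame_bits)                # 9000.0 bits per frame on average
print(average_frame_bits <= mtu_bits)    # True: the whole frame fits in one packet
```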
`
`3.2. Solutions for Rate Adaptation
`
`As we discussed in Section 3.1, the video encoder cannot afford to adapt to individual clients, and the streaming server is
constrained to doing only light processing on a per-client basis. The solutions to be reviewed below for rate adaptation fit
`these requirements. Additionally, these solutions do not require the server to understand a compressed video bitstream.
`From a server design point of view, it is advantageous to make the server codec-independent, or even media-independent. A
good example of a media-independent RTP streaming server is Apple’s Darwin QuickTime Streaming Server (QTSS). The
`Darwin QTSS is open-sourced and available from Apple.21
`
`3.2.1. Dynamic Stream Switching
`
`Video streaming using multiple bitstreams encoded at different bitrates has been supported in commercial streaming systems
`such as Apple’s QuickTime, RealNetworks’ RealSystem G2, and Microsoft Windows Media. RealSystem G2 was the first to
`support dynamic stream switching under the name of SureStream.22,23 The idea is that the server can switch among multiple
`streams to serve a client with one that best matches the client’s available bandwidth. The switching is dynamic, depending
`on the estimated available bandwidth reported by a client and the server’s decision on whether it will respond. We want to
`note that in some research literature, video streaming using dynamic stream switching is called simulcast.24 We prefer not to
use this term because what we are describing here is different from the meaning the same word carries in the broadcasting industry, where it originated.
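
The server-side selection step might look like the following sketch; the actual SureStream logic is unpublished, and this only illustrates choosing the highest-rate stream that fits the client's reported bandwidth, with the switch itself deferred to a synchronization point such as a shared keyframe.

```python
# Sketch of the selection step in dynamic stream switching: serve the
# highest-bitrate pre-encoded stream that fits the bandwidth the client
# reports. The switch itself would be deferred to a synchronization point.
def select_stream(available_kbps, reported_bandwidth_kbps):
    fitting = [rate for rate in available_kbps if rate <= reported_bandwidth_kbps]
    return max(fitting) if fitting else min(available_kbps)

print(select_stream([20, 34, 45], reported_bandwidth_kbps=38))   # -> 34
print(select_stream([20, 34, 45], reported_bandwidth_kbps=15))   # -> 20 (best effort)
```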
`
`In the SureStream implementation, three streams are prepared for a single target bitrate. It is worth noting that at the low end
is a keyframe-only stream encoded at approximately 50% of the target bitrate.22 The bitrate of this keyframe-only stream can be further reduced by dropping additional keyframes, since there is no dependency among the keyframes. Furthermore, the keyframe-only stream is a good fallback choice when the packet loss rate is high and there is a need to resynchronize.
`
Although effective and requiring only a little server involvement, dynamic stream switching makes the encoding
`process more complex. The mechanism of making a switchable SureStream is not published, but one can think of some
`feasible approaches. For example, we can build some synchronization points to facilitate switching among the multiple
`streams. A keyframe shared by multiple streams would provide a convenient synchronization point, provided that the buffers
of these streams can be synchronized. Besides complicating the encoding process, we note that dynamic stream
`switching requires redundant data storage but still has limited rate adaptability because the server can switch among only a
`small number of available streams.
`
`3.2.2. Scalable Video Coding
`
Research in scalable video coding dates back almost a decade.25 A similar term, layered coding, is also used frequently to refer to scalability, and has been popular in the multicast community.26,27 To date, support for scalable video coding can be found in modern video compression standards such as H.263+ and MPEG-4. There has also been continued research
`in this direction. See, for example, a few representative papers.28-30 There are three types of scalability that can be built into
`a compressed video bitstream. They are temporal scalability, spatial (resolution) scalability, and (spatial) quality scalability.
`The last one is also called SNR scalability because the quality of a compressed picture is often measured in signal-to-noise
`ratio (SNR). Let’s now look at what these different types of scalability can do for video streaming.
`
`Temporal scalability is commonly achieved by including B-frames in a compressed video bitstream. B-frames are
`bidirectionally predicted from the forward and backward I- or P-frames, and not used as reference pictures for motion
`compensation in reconstructing other B- or P-frames. For this reason, a B-frame can be dropped without affecting the picture
`quality of other frames. However, dropping a B-frame will lower the frame rate or temporal quality of a video playback,
giving rise to the term temporal scalability. Temporal scalability through B-frame placement is supported in all MPEG-family
`video coding standards and H.263+. It should be noted that P-frames can also be dropped for rate reduction, but doing so
will cause decoder drift unless all P-frames are dropped. Use of temporal scalability for rate adaptation by dropping B- and P-frames at the server has been experimented with in video streaming.23,31 Although effective to some degree, it is found that
`temporal scalability alone is often inadequate to achieve desired rate reduction.
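
A small sketch of server-side temporal scaling, assuming the frame type of each access unit is known from the packetization: B-frames can be filtered out freely, whereas dropping a P-frame would break the prediction chain of every frame that references it.

```python
# Sketch of temporal scaling by B-frame dropping. B-frames are never used as
# references, so removing them only lowers the frame rate; dropping a P-frame
# would cause drift in every later P- or B-frame of the GOP.
def drop_b_frames(coded_frames):
    # coded_frames: sequence of (frame_type, payload) tuples in coded order
    return [frame for frame in coded_frames if frame[0] != "B"]

gop = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4), ("B", 5), ("P", 6)]
print([frame_type for frame_type, _ in drop_b_frames(gop)])   # ['I', 'P', 'P']
```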
`
`Spatial scalability is obtained by creating a multiresolution representation of each frame in a video bitstream. In spatial
`scalable coding, a base layer at a reduced resolution is first encoded, and an enhancement layer representing the difference
`between the original picture and the base layer is also encoded. It is worth noting that an enhancement layer can be split into
`multiple enhancement layers so that each one is built upon the previous ones at a lower resolution. Spatial scalability is
supported in MPEG-4 and H.263+. There is limited application of spatial scalability in video streaming. This is because, for a good user experience, the displayed frame resolution should not change during video playback. So, even if the
`server could drop an enhancement layer and send only the base layer to a client in rate adaptation, the client would have to
`interpolate the resolution-reduced base layer to the full frame size for display. This makes spatial scalability a special case of
`SNR scalability that we are going to discuss below.
`
`SNR scalability is similar to spatial scalability in that a base layer and one or more enhancement layers are encoded. The
`difference is that in SNR scalable coding the base layer is not subsampled to a lower resolution. Instead, the base layer is
`coarsely quantized and the difference between the base layer and the original picture is encoded into one or more
`enhancement layers. Spatial scalability can be viewed as a special case of SNR scalability if we consider an interpolated base
`layer in spatial scalable coding as the base layer in SNR scalable coding. The SNR or spatial scalability that we have just
`described assumes that an enhancement layer is either available and decoded as a whole or not available at all. It is not very
`effective for rate adaptation in video streaming for several reasons. First, if only one or two enhancement layers are used, we
`have very limited rate adaptability that is similar to the situation in dynamic stream switching. Second, if more enhancement
layers are used, the overhead goes up and coding efficiency drops significantly. In fact, a recent study32 on H.263+
`scalability performance suggested that even with only two enhancement layers, the loss in coding efficiency of SNR scalable
`coding in comparison to multiple bitstream coding becomes unacceptable (about 1-1.5dB loss per layer) at low bitrate. This
`explains why the industry chose to use dynamic stream switching in commercial video streaming systems instead of scalable
`video coding options that are available in the modern video compression standards.
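
The base/enhancement split can be pictured with a toy example (scalar quantization of a few coefficients; real SNR-scalable coders operate on transform blocks and entropy-code each layer): a client that receives only the base layer still reconstructs the picture, just more coarsely.

```python
import numpy as np

# Toy illustration of SNR-scalable layering: the base layer is a coarsely
# quantized version of the signal; the enhancement layer refines the residual.
def snr_layers(coefficients, base_step=16, enhancement_step=4):
    base = np.rint(coefficients / base_step) * base_step
    enhancement = np.rint((coefficients - base) / enhancement_step) * enhancement_step
    return base, enhancement

coefficients = np.array([103.0, -41.0, 7.0, 2.0])
base, enhancement = snr_layers(coefficients)
print(base)                  # base-layer-only reconstruction (coarse)
print(base + enhancement)    # reconstruction with the enhancement layer applied
```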
`
`In recent years, research in scalable coding has centered on developing coders that can generate an “embedded bitstream”.
`Much of the work was inspired by the design and excellent coding efficiency of the EZW33 and SPIHT34 coders for still
`images. An embedded bitstream is such that bits are packed in the order of “significance”, a measure that is often tied to the
`magnitude of image coefficients in the transform domain. An intuitive way of embedding would be to order image
`coefficients by their magnitude and pack them in a bitstream in descending order of bit-planes. An embedded bitstream can
`be truncated from the end to match a reduced target bitrate. In EZW33 and SPIHT,34 the embedding is so fine-granular that
one can scale the size of a bitstream with byte accuracy. For this reason, an embedded bitstream is said to have fine-granularity scalability.
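
The bit-plane ordering mentioned above can be illustrated with a toy coder; this shows only the ordering principle, not EZW or SPIHT. Magnitudes are sent one bit-plane at a time, most significant plane first, so truncating the stream anywhere still yields the best approximation obtainable from the bits received.

```python
import numpy as np

# Toy illustration of the embedding principle (not EZW or SPIHT): transmit
# coefficient magnitudes bit-plane by bit-plane, most significant plane first.
# Truncating the stream keeps the most significant information received so far.
def bitplane_embed(magnitudes, num_planes=8):
    magnitudes = np.asarray(magnitudes, dtype=int)
    stream = []
    for plane in range(num_planes - 1, -1, -1):             # MSB plane first
        stream.extend(((magnitudes >> plane) & 1).tolist())
    return stream

def bitplane_decode(stream, num_coefficients, num_planes=8):
    magnitudes = np.zeros(num_coefficients, dtype=int)
    for i, bit in enumerate(stream):                        # stops where truncated
        plane = num_planes - 1 - i // num_coefficients
        magnitudes[i % num_coefficients] |= bit << plane
    return magnitudes

magnitudes = [200, 96, 33, 7]
stream = bitplane_embed(magnitudes)
print(bitplane_decode(stream[:16], 4))   # truncated stream: coarse magnitudes
print(bitplane_decode(stream, 4))        # full stream: exact magnitudes
```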