`
`Signal Processing for Internet Video Streaming: A Review
`
`Jian Lu*
`Apple Computer, Inc.
`
`
`ABSTRACT
`
Despite its commercial success, video streaming remains a black art owing to its roots in proprietary commercial
`development. As such, many challenging technological issues that need to be addressed are not even well understood. The
`purpose of this paper is to review several important signal processing issues related to video streaming, and put them in the
`context of a client-server based media streaming architecture on the Internet. Such a context is critical, as we shall see that a
`number of solutions proposed by signal processing researchers are simply unrealistic for real-world video streaming on the
`Internet. We identify a family of viable solutions and evaluate their pros and cons. We further identify areas of research that
`have received less attention and point to the problems to which a better solution is eagerly sought by the industry.
`
`Keywords: video streaming, video compression, Internet video, rate adaptation, packet loss, error control, error recovery.
`
`1. INTRODUCTION
`
Video streaming is one of the most exciting applications on the Internet. It has already created a new business known as Internet
`broadcast, or Intercast/Webcast. Although Internet broadcast is still in its infancy, we get a glimpse of the future of
`broadcasting where people can enjoy not only scheduled live or pre-recorded broadcast but also on-demand and personalized
`programs. Stirring more excitement is the idea that anyone will be able to set up an Internet TV station and broadcast his or
`her video programs on the Internet just like we publish our “homepage” on the Web today. While video streaming as a core
technology behind a new business is a success story, it remains very much a black art owing to its roots in proprietary
`commercial development. Indeed, to the author’s knowledge, there has been no published research literature to date that
`offers a comprehensive review of video streaming architecture and system, and related design issues.
`
`Even the definition of streaming is sketchy; it is often described as a process. Generally speaking, streaming involves
`sending media (e.g., audio and video) from a server to a client over a packet-based network such as the Internet. The server
`breaks the media into packets that are routed and delivered over the network. At the receiving end, the packets are
`reassembled by the client and the media is played as it comes in. A series of time-stamped packets is called a stream. From
`a user’s perspective, streaming differs from simple file transfer in that the client plays the media as it comes in over the
`network, rather than waiting for the entire media to be downloaded before it can be played. In fact, a client may never
`actually download a streaming video; it may simply reassemble the video’s packets and play the video as the packets come
`in, and then discard them. To a user the essence of streaming video may lie in an expectation that the video stream the user
`has requested should come in continuously without interruption. This is indeed one of the most important goals for which a
`video streaming system is designed.
`
`Over the last decade, IP multicast and particularly the MBone have received considerable attention from the research
`community. IP multicast provides a simple and elegant architecture and service model that are particularly suitable for real-
`time multimedia communications. The emergence of the MBone, an experimental virtual multicast backbone, has offered a
`testbed for researchers around the world to develop and test a suite of applications known as MBone tools for real-time
`multimedia communications and collaborations. See, for example, a recent article by McCanne1 for the evolution of the
`MBone and its applications. Many of these applications have been designed for audio and video conferencing and
`broadcasting using IP multicast. On November 18, 1994, the MBone reached a festive milestone amid great fanfare when
the Rolling Stones broadcast the first major rock band concert on the MBone.
`
`Despite the promise demonstrated by the MBone and a logical reasoning that IP multicast would serve as a foundation for
`video streaming on the Internet, commercial deployment of multicast applications and services on the public Internet has
been painfully slow by Internet standards. The bulk of the streaming media traffic generated by three major commercial
`
`
`* Correspondence: One Infinite Loop, MS: 302-3MT, Cupertino, CA 95014, USA; Email: jian@computer.org.
`
`Petitioners' Exhibit 1012
`Page 0001
`
`
`
`streaming systems (i.e., QuickTime, RealSystem, and Windows Media) on today’s Internet consists of largely unicast
`streams. It is out of this paper’s scope to discuss the unresolved issues that have slowed down the deployment of multicast
`applications and services; interested readers are referred to a recent white paper published by the IP Multicast Initiatives.2
However, we want to note that even when multicast is fully deployed, a very large volume of streaming media traffic will
remain unicast. This is because a large portion of streaming media content is archived and played on demand
by users on the Internet. Multicast is of little help to on-demand streaming unless a large number of users in the
same neighborhood of the Internet request the same content simultaneously.
`
`The purpose of this paper is to review several important issues in Internet video streaming from a signal processing
perspective. For the reasons just mentioned, our review will focus on unicast streaming while considering
`multicast as a possible extension. Drawing from the author’s experience in developing one of the major commercial media
`streaming systems, we attempt to identify the most challenging problems in video streaming and evaluate a number of signal
`processing solutions that have been proposed; some of these solutions have already been implemented in experimental and/or
`commercial video streaming systems. To make the review useful to both researchers and application developers, we would
like to clearly identify among the proposed solutions what works in the real world, what doesn’t, and why. We want to note that
`although to a Web surfer streaming video commonly implies synchronized audio as well, video and audio are actually
`delivered in independent streams in a streaming session. We will not discuss streaming audio in this paper.
`
The rest of this paper is organized as follows. Section 2 introduces the streaming system and architecture and describes what we
`believe to be the most challenging problems in video streaming. We also examine some important characteristics of the
`streaming system and the constraints that are imposed on any prospective signal processing solutions. Section 3 reviews a
`number of signal processing solutions to the video streaming problems in great detail. We first comment on some proposed
`solutions that are simply unrealistic for real-world video streaming on the Internet. Then, we evaluate a family of viable
`solutions, identify areas of research that have received less attention, and point to the problems to which a better solution is
`eagerly sought by the industry. Section 4 concludes the paper.
`
`2. VIDEO STREAMING SYSTEM AND ARCHITECTURE
`
`2.1. Video Streaming Model and Architecture
`
`A schematic system diagram of Internet video streaming is shown in Figure 1 for both unicast and multicast scenarios.
`Figure 1(a) represents the unicast model for two types of video streaming services, on-demand or live. In on-demand
`streaming, pre-compressed, streamable video content is archived on a storage system such as a hard disk drive and served to a
`client on demand by the streaming server. In live streaming, a video stream coming out of a live source such as a video
`camera is compressed in real-time by the encoder. The compressed bitstream is packetized by the packetizer, and sent over
`the Internet by the streaming server. On the client side, the video bitstream is reassembled from the received packets by the
reassembler, then decoded by the decoder, and finally rendered on a display. Because each client connecting to the server
receives an individual stream, the disadvantages of the unicast streaming model are easy to see. The server load
increases in proportion to the number of clients connected to and served by the server. When a large number of clients
try to connect to a server at the same time, the server gets overloaded, creating a “hot spot” on the Internet. Additionally,
sending copies of the same data in individual streams to a large number of clients is inefficient; it causes
congestion on the network and degrades quality of service.
`
`The multicast model of Figure 1(b) has many advantages over unicast in live video streaming (broadcasting) where a large
`audience could tune in for the broadcast at the same time. Instead of sending a stream to each client individually, the server
`sends out only one stream that is routed to one or more “group addresses”. A client tunes in to a multicast group in its
`neighborhood to receive the stream. Since the server is decoupled from connecting to each client directly, server load does
`not increase with the number of receiving clients. In the case of delivering a stream to multiple groups, the stream on its
`route gets replicated and branched only at the fan-out points within the network. When combined with layered coding that
`will be discussed in Section 3, multicast provides a decentralized and highly efficient video streaming architecture.
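
To make the scaling argument concrete, the following minimal Python sketch compares the bitrate a server must send under the two models. The function name server_egress_bps and the 80 kbps stream rate are illustrative assumptions, not part of any system described in this paper; control traffic and the case of several multicast groups are ignored.

```python
def server_egress_bps(stream_bps: int, num_clients: int, multicast: bool) -> int:
    """Bits per second the server must send for one live program."""
    if num_clients == 0:
        return 0
    # Unicast: the server sends one copy of the stream per client.
    # Multicast: the server sends one copy in total; the network
    # replicates it at the fan-out points.
    return stream_bps if multicast else stream_bps * num_clients


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        unicast = server_egress_bps(80_000, n, multicast=False)
        multicast = server_egress_bps(80_000, n, multicast=True)
        print(f"{n:>6} clients: unicast {unicast} bps, multicast {multicast} bps")
```

Under these assumptions the unicast load grows linearly with the audience while the multicast load stays constant, which is the point made above.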
`
`Despite the clear advantages of multicast-based video streaming, it is employed in only a small fraction of today’s Internet,
`primarily in Intranets within an institution or corporation. There are complex reasons for the slow deployment of multicast-
`based services in general.2 Beyond deployment hurdles we note that a large volume of streaming video content is archived
`and streamed to users on demand. Multicast does not help very much in on-demand streaming where the service requests
`from users are random and highly desynchronized in terms of requested content and timing of these requests. Interestingly,
recent developments in so-called “content delivery networks” appear to have achieved some improvements in unicast-based
on-demand streaming that are reminiscent of those we have seen with multicast-based live streaming. A content
`delivery network such as the one developed by Akamai Technologies3 is essentially a load-balanced network of servers. The
`servers are deployed on the “edge” of the Internet, making them close to the end users. These “edge servers” communicate
`with a “source server” to update their content. A client requesting a streaming video is redirected dynamically to the closest
`edge server that is not overloaded. Like multicast-based video streaming, this decentralized service model can reduce “hot
spots”. In principle, a content delivery network can also help unicast-based live streaming, possibly at the cost of increased latency.
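
The redirection rule described above (closest edge server that is not overloaded) can be sketched as follows. This is only an illustration: the EdgeServer fields, the distance metric, and the capacity threshold are hypothetical, and we make no claim about how any commercial content delivery network actually selects a server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeServer:
    name: str
    distance: float  # hypothetical closeness metric, e.g. round-trip time in ms
    load: int        # number of clients currently served
    capacity: int    # hypothetical limit beyond which the server is overloaded


def pick_edge_server(servers: List[EdgeServer]) -> Optional[EdgeServer]:
    """Return the closest edge server that is not overloaded, or None."""
    available = [s for s in servers if s.load < s.capacity]
    return min(available, key=lambda s: s.distance, default=None)


if __name__ == "__main__":
    servers = [
        EdgeServer("edge-a", distance=12.0, load=1000, capacity=1000),  # full
        EdgeServer("edge-b", distance=30.0, load=200, capacity=1000),
        EdgeServer("edge-c", distance=55.0, load=10, capacity=1000),
    ]
    print(pick_edge_server(servers).name)
```

In this example the nearest server is full, so the client is redirected to the next closest one.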
`
[Figure 1 diagram not reproduced. Labeled components in both panels: storage device, video camera, encoder (E), packetizer (P), streaming server, Internet, and streaming clients; each streaming client comprises a reassembler (R), a decoder (D), and a display.]

Figure 1. Video streaming system models. (a) Unicast video streaming, and (b) Multicast video streaming.
`
`2.2. The Internet Challenge to Video Streaming
`
The Internet poses a great challenge to video streaming. From a communication point of view the Internet is a shared
channel with a massive number of heterogeneous components. This channel is extremely volatile over space and time in
terms of communication capacity and quality. To see what this means for video streaming, let us look at some data that we
collected from a real-life video streaming session.
`
[Figure 2 plots not reproduced. Top panel: RTP video bitrate (bits/sec), vertical axis 0 to 120000. Bottom panel: RTP video packet loss (%), vertical axis 0 to 90. Horizontal axis of both panels: time (sec), 0 to 600.]

Figure 2. RTP video bitrate (top) and packet loss rate (bottom). The data was recorded at 10:08-10:18am
on December 23, 1999, while watching BBC World live broadcast on QuickTimeTV from Sunnyvale, CA.
`
`Figure 2 shows a 10-minute data log by a streaming client in Sunnyvale, California, during a session watching the BBC
`World channel on QuickTimeTV on December 23, 1999. The data shows the variations in bitrate and percentage of packet
loss of the incoming RTP (Real-time Transport Protocol)4 video stream. The client had signaled to the server that it had a dual
`ISDN connection with up to 112kbps in bandwidth. Knowing that the client’s bandwidth was also shared by streaming
`audio, we observed that the bitrate of the incoming video fluctuated between 60 and 80kbps in general and the packet loss
`ranged from 0 to 10%. There are, however, a few exceptional instances where the bitrate dropped to as low as 2kbps and
`packet loss rate shot up as high as 85%. It is also interesting to observe from Figure 2 that there is a strong correlation
between a sudden drop in bitrate and a burst of packet loss. This agrees with the common-sense
relationship between congestion and packet loss.
`
Figure 2 shows no more than a typical video streaming session in which everything runs normally. There are also extreme cases.
Many of us have watched streaming video that stalled completely several times during a streaming
session. In such cases, the bitrate of the incoming video is nil and the packet loss rate is 100%, making it totally worthless to
watch. Generally speaking, the experience of streaming video isn’t always as bad (or good) as that in our memory or in
Figure 2. A person watching the same streaming video from another location might have had a totally different
experience. The point is that the Internet channel is so volatile and so space- and time-dependent that
observations can differ for every client on the network at every second. The data shown in Figure 2 and our experience
with video streaming challenge us with two fundamental signal processing problems whose solutions can improve the
performance of Internet video streaming dramatically. The first problem is how to adapt to the constantly
changing bandwidth and congestion conditions on the Internet. The second is how to cope with packet loss due
to the best-effort service model of the current Internet. We will, for short, refer to these two problems as “rate adaptation”
`and “error control and recovery”, respectively. Before we attempt to address these problems and evaluate a number of
`proposed solutions, we need to examine the constraints that have been imposed on any prospective solutions. We go back to
`the video streaming model and architecture of Section 2.1 and make a few important observations.
`
`2.3. Video Streaming System Characteristics and Constraints
`
`Many signal processing researchers are familiar with the video conferencing model thanks to the popularity of ITU H.261
and H.263 standards. In the following we review the characteristics of a video streaming system and point out how it differs
from video conferencing. For comparison, we provide in Figure 3 a functional diagram of an IP-based peer-to-peer
video conferencing system. As we have mentioned, we will focus on analyzing the unicast video streaming system.
`
(a) Latency. One difference between streaming and conferencing is that the former does not require two-way user
interaction. As such, the requirement on latency is more relaxed in streaming than in conferencing. The latency requirement
commonly translates into a decoder/client buffer of a certain size, which is often simulated by the encoder/server for rate
control. When a video streaming session is initiated, the client waits for a few seconds for its buffer to be filled by the
server. This process is called pre-roll. After the pre-roll the client starts playing the video. Obviously, the client’s
buffer has to be kept from underflowing in order for the video to continue to play without interruption. Generally speaking,
streaming can afford a much larger buffer than conferencing to absorb network jitter.
However, latency and smooth playback must be balanced to create a good user experience, which becomes more
important in streaming live video. It is common in commercial video streaming systems to pre-roll a few seconds of data
into the decoder/client buffer.
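
The trade-off between pre-roll time and uninterrupted playback can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions: time advances in one-second ticks, the video plays at a constant bitrate, and the arrival trace is invented for illustration (it is not the data of Figure 2). The function name simulate_playback is ours.

```python
def simulate_playback(arrivals_bits, playback_bps, preroll_seconds):
    """Return the number of one-second ticks in which playback stalled.

    arrivals_bits[t] is the number of bits received during second t.
    Playback starts on the tick after the buffer first holds
    preroll_seconds worth of video.
    """
    buffered = 0
    playing = False
    stalls = 0
    for received in arrivals_bits:
        buffered += received
        if not playing:
            # Pre-roll: keep filling the buffer without playing.
            if buffered >= preroll_seconds * playback_bps:
                playing = True
            continue
        if buffered >= playback_bps:
            buffered -= playback_bps  # one second of video is consumed
        else:
            stalls += 1               # buffer underflow: playback pauses
    return stalls


if __name__ == "__main__":
    # Hypothetical trace: 80 kbps arrivals with a two-second dip to 2 kbps.
    trace = [80_000] * 3 + [2_000] * 2 + [80_000] * 3
    for preroll in (1, 3):
        print(preroll, simulate_playback(trace, 70_000, preroll))
```

With this invented trace, a one-second pre-roll leads to a stall during the dip, whereas a three-second pre-roll absorbs it, at the price of a longer start-up delay.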
`
(b) Client-Server Model. This is a major distinction between the video streaming system of Figure 1(a) and the peer-to-peer
conferencing system of Figure 3. In the video conferencing system of Figure 3, there is no server involved.
`Additionally, most components including encoder, decoder, packetizer, and reassembler at either end often reside on one
`piece of hardware, which is a desktop computer in the so-called desktop conferencing environment. The video streaming
`system in Figure 1(a), however, operates in a client-server environment. As such, a streaming server may be required to
`serve thousands of clients simultaneously. It is therefore necessary for the encoder to be separate from the server
`hardware so that the server hardware can be dedicated to serving the clients. For archived, on-demand video streaming,
`encoder and server are separate by default. In live streaming, encoder/server separation enables a more efficient “encode
`once, serve multiple times” service model. This model is almost always adopted in practice. A common implementation
`for live video streaming is to combine the encoder and packetizer to reside on one piece of hardware that broadcasts a
stream to the server. The server simply “reflects” the stream to its clients. The client-server model and the
encoder/server separation make many signal processing techniques that have been developed for video
conferencing unsuitable for video streaming. We will come back to this observation later.
`
(c) Server Load Restrictions. As we have pointed out, a streaming video server may be required to serve thousands of
`clients simultaneously. When the number of clients connected to a server increases, the server experiences an elevated
`load due to increased processing and I/O demands. Eventually the server can get overloaded and the quality of service
`drops. While it is relatively well understood that a good signal processing solution should not require the server to do
`much processing, the requirement for I/O is often overlooked. In archived, on-demand video streaming, when the
`number of clients increases, heavy I/O activity, particularly disk access, often becomes the bottleneck in the streaming
`system. This ought to be kept in mind in designing a signal processing solution for video streaming.
`
[Figure 3 diagram not reproduced. Labeled components: video camera, encoder (E), packetizer (P), reassembler (R), decoder (D), and display, with the two ends connected through the Internet.]

Figure 3. An IP-based peer-to-peer video conferencing system.
`
(d) The Best-Effort Service Model. The Internet is a best-effort service network. As such, each request to send a packet is
honored by the Internet as best it can. Congestion on the Internet causes packet delay and loss. Normally, a packet is
routed through multiple hops on its way to the destination. At each switching point linking the hops, if there are other
packets that have arrived earlier, the incoming packet is queued before it can be processed by the router, resulting in a
queuing delay. If the router’s queue is full, the incoming packet will be dropped, resulting in a packet loss. Statistics have
shown that packet delay and loss often occur in bursts because an overloaded router and the affected route are likely to
stay congested for some time. In video streaming, each packet is scheduled to be played at a certain time; a packet that
arrives later than its scheduled time is useless and will be discarded. For this reason, we don’t distinguish between packets that
are dropped by a router due to congestion and packets that are discarded by the streaming client because they arrive too late to be used. More
`generally, taking a signal processing approach, we can view the Internet as a “black box”. A streaming client from
`outside the black box can observe an effective bandwidth and packet loss rate without having to know what is going on
`inside the black box.
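
The “black box” view can be sketched in a few lines: from the client’s side, a packet dropped by a router and a packet that misses its playback time are counted the same way. The representation of a packet as a (send_time, arrival_time) pair and the fixed playout delay are assumptions made for this illustration only.

```python
def effective_loss_rate(packets, playout_delay):
    """Fraction of packets that the client cannot use.

    Each packet is a (send_time, arrival_time) pair in seconds, with
    arrival_time set to None when the network dropped the packet.
    A packet is scheduled for playback at send_time + playout_delay;
    arriving after that time is treated exactly like a loss.
    """
    if not packets:
        return 0.0
    unusable = sum(
        1
        for sent, arrived in packets
        if arrived is None or arrived > sent + playout_delay
    )
    return unusable / len(packets)


if __name__ == "__main__":
    # Hypothetical trace: one dropped packet and one late packet out of four.
    trace = [(0.0, 0.1), (1.0, None), (2.0, 5.5), (3.0, 3.2)]
    print(effective_loss_rate(trace, playout_delay=3.0))
```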
`
`(e) Packetization. A packetizer takes an encoded video bitstream and packetizes it for transmission over a transport protocol
`such as RTP. One of the most important aspects in designing a packetizer is to determine how to fragment the video
bitstream. A well-designed fragmentation scheme with associated header information can help contain errors in case of a
packet loss. To date many video payload formats have been specified or proposed for RTP including those for the
popular H.261, H.263 and the MPEG family. These payload formats offer fragmentation schemes of different
granularities, such as Group-of-Blocks (GOB) based or Macro-Block (MB) based. Besides the fragmentation scheme, it is
also important to choose the size of a packet. On the one hand, small packets limit the adverse effect of a packet
loss; on the other hand, it is desirable to send large packets to reduce packet header overhead. When streaming video over
RTP, it is common to packetize the data for the maximum transmission unit (MTU) of a local area network (LAN). For
Ethernet, the MTU is about 1500 bytes, or 12000 bits. However, other restrictions may apply. For example, an RTP
video packet is required to contain at most a single video frame. In other words, a new frame cannot start in the middle
`of a packet, though a frame can be split into multiple packets.
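
The frame-boundary restriction can be sketched as a simple size-based packetizer. This is a deliberately simplified illustration: it splits purely by size, ignores packet header overhead, and does not model the GOB- or MB-aligned fragmentation that real RTP payload formats specify. The function name packetize is ours.

```python
MTU_BITS = 12_000  # Ethernet MTU of about 1500 bytes, as stated in the text


def packetize(frame_sizes_bits, mtu_bits=MTU_BITS):
    """Split frames into packets carrying at most mtu_bits each.

    Returns a list of (frame_index, payload_bits) pairs. A frame larger
    than the MTU is split across several packets, but a packet never
    carries data from two frames.
    """
    packets = []
    for index, size in enumerate(frame_sizes_bits):
        remaining = size
        while remaining > 0:
            payload = min(remaining, mtu_bits)
            packets.append((index, payload))
            remaining -= payload
    return packets


if __name__ == "__main__":
    # Hypothetical frame sizes: a large keyframe followed by a small frame.
    for packet in packetize([30_000, 9_000]):
        print(packet)
```

Note that the small second frame occupies a packet of its own even though the last packet of the first frame has room to spare.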
`
`(f) Back-channel. A back-channel in a video streaming system is a communication channel established for a streaming
`client to report its state to and/or request services from the server. In RTP streaming, implementation of back-channels is
facilitated by the Real-time Transport Control Protocol (RTCP) that is part of the RTP specifications.4 RTCP
defines a type of packet for receiver reports (RR). An RR packet includes such information as the fraction of packets lost since
the previous RR packet, the cumulative number of packets lost, and the interarrival jitter. It also supports profile- or
application-specific extensions for including more information from the receiver. In addition to using the RR packets, it
is also possible to define customized RTCP packets to carry feedback information from a streaming client. While
information about effective bandwidth and the state of packet loss can be fed back and is useful to the streaming server, it
is a common mistake to assume that the video encoder can always take advantage of the back-channel. We will discuss
this further in the next section.
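
As an illustration of the receiver-report statistics mentioned above, the sketch below computes the fraction-lost value for one reporting interval, assuming the 8-bit fixed-point convention of the RTP specification (lost packets divided by expected packets, scaled by 256). The function names are ours, and clamping the result to 255 is our own simplification to keep the value within eight bits.

```python
def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """8-bit fixed-point fraction of packets lost since the previous report.

    expected_interval: packets the receiver expected in the interval,
        derived from the sequence numbers it has seen.
    received_interval: packets actually received in the interval.
    """
    lost_interval = expected_interval - received_interval
    # Duplicates can make received exceed expected; report zero loss then.
    if expected_interval == 0 or lost_interval <= 0:
        return 0
    return min(255, (lost_interval << 8) // expected_interval)


def as_percent(fraction: int) -> float:
    """Convert the fixed-point field back to a percentage."""
    return 100.0 * fraction / 256


if __name__ == "__main__":
    field = fraction_lost(expected_interval=100, received_interval=90)
    print(field, round(as_percent(field), 2))
```

A server receiving such reports can track the loss experienced by each client without knowing anything about the route in between.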
`
`3. SIGNAL PROCESSING SOLUTIONS FOR VIDEO STREAMING
`
`3.1. Solutions That Do Not Fit
`
`After reviewing the architecture and characteristics of a video streaming system in the previous Section, it becomes clear that
`some of the methods that have been developed for video conferencing and other applications such as wireless video are not
quite applicable to video streaming. We comment on these “unfit” solutions before moving on to evaluating a family of
`viable solutions.
`
`3.1.1. Feedback-Based Encoder Adaptation
`
`Many signal processing researchers have explored the idea of having a video encoder adapt to the changing conditions in
`bandwidth and packet loss. It is assumed that a back-channel exists and will feed the necessary information back to the
`encoder which in turn adjusts its behavior in response to the bandwidth and packet loss conditions. This idea has motivated
`the design of the RTCP channel along with RTP.4-6 To date, many ways of encoder adaptation have been proposed and
`studied. Bolot, Turletti, and Wakeman5 proposed an adaptive rate control mechanism by which a video encoder adjusts its
`output bitrate based on the feedback information from the receivers. There has been continued work along this direction.7-10
`In the same spirit as feedback-based rate control, Steinbach, Färber, and Girod11 proposed to adapt the coding mode selection
at the encoder to the error conditions at the receiver for wireless video. The idea is that the error-damaged MBs may be
tracked by the sender through a receiver’s feedback, and a rapid error recovery can be achieved by INTRA refreshing the
corresponding MBs in the next frame. More recently, a scheme of adaptive coding mode selection was proposed for
Internet video based on a packet loss probability model that must be estimated at the receiver and fed back to the sender.10
Similar to adaptive INTRA MB refreshing and placement, it was proposed that a damaged reference picture could be
dynamically replaced at the sender according to the back-channel signaling.12 This mechanism is now supported in the ITU
H.263+ standard.13 There are still more ways of encoder adaptation. For example, unequal error protection can be applied
`according to packet loss rate,14 provided that a current estimate of the packet loss rate at a specific receiver is available.
`
`Although the feedback-based encoder adaptation techniques that we have just reviewed can be effective in peer-to-peer video
`conferencing and wireless video, they don’t fit in a real-world video streaming system. This is due to the encoder/server
separation that we discussed in the previous Section. As we mentioned there (see Figure 1(a)), in live video streaming, for
efficiency the encoder will encode once, producing one bitstream for the server. Of course, the system can be
configured to enable the encoder to generate a bitstream adapted to each client. However, such a design will not scale up to
an industrial-strength video streaming system. It is well known that real-time video compression is computationally
expensive; this is one reason why the encoder is separate from the server. Considering that the server may be serving thousands
of clients simultaneously, it is simply unrealistic to expect the encoder to compress and output thousands of different
bitstreams, each adapted to one client. In on-demand video streaming, it is obvious that encoder adaptation cannot be
applied because the video content is compressed off-line.
`
`3.1.2. Complex Server Involvement
`
Now that we know the encoder in a video streaming system cannot afford to adapt to each client’s condition, one may
wonder if the server can adapt. Generally speaking, the server does have some flexibility and capability to manipulate the
video bitstream to adapt to a client’s needs. In fact, all viable solutions to rate adaptation to be reviewed in Section 3.2 rely
on the server to do some kind of processing. The exact requirements for server involvement depend on the specific
adaptation technique, but for obvious reasons the server should not be tasked with complex processing on a per-client
basis.
`
An example of a task that is too complex is for the server to convert the bitrate of a pre-encoded video bitstream dynamically
by transcoding. Although a transcoder is often much less complex than a full encoder, a considerable amount of computing
such as bit re-allocation, and motion vector search or refinement may be required.15-17 A server is unlikely to scale up to
serving a large number of clients if it performs transcoding for each client. A simplified version of transcoding, called
dynamic rate shaping (DRS),18,19 requires even less processing because it achieves bitrate reduction by selectively dropping
quantized transform coefficients or blocks, or by re-quantization of the blocks.20 Despite the significant reduction in
computational complexity, which comes at the expense of introducing a quality problem known as “drift”, we still feel that DRS is too
complex for a server to do on a per-client basis. This is because DRS requires the server to have knowledge about a specific
codec to process the bitstream.
`
`3.1.3. Intraframe Error Control and Concealment at Low Bitrate
`
There has been much work on designing error-resilient bitstreams for error control. For example, the emerging MPEG-4
video standard supports such mechanisms as resynchronization markers, data partitioning, and reversible VLC for intraframe
error control. When these features are implemented in a compressed bitstream, the decoder can regain synchronization in
case of bit errors, and continue to decode a video frame even if a portion of the frame is damaged or lost. Furthermore, error
concealment techniques can be used to recover a damaged or lost region within a video frame to a certain degree.
`
Obviously, any intraframe error control and concealment technique becomes useless if a whole video frame is lost.
Unfortunately, this is often the case at low bitrates when we packetize a video bitstream for the MTU
of a LAN. As we mentioned in Section 2.3, the MTU for Ethernet is about 12000 bits. Consider that we compress video for a
target bitrate of 36kbps at a frame rate of 4fps. This would be suitable for users with a 56kbps modem connection because
we need to leave some room for audio and bandwidth fluctuation. At this target bitrate and frame rate, the average size of a
video frame is 9000 bits, which is less than the MTU of 12000 bits. In practice, a motion compensated frame is often much
smaller than 9000 bits in low bitrate video streaming. This implies that when a packet is lost, it is most likely that a
whole video frame is lost. Therefore, no matter how well intraframe error control and concealment techniques are
implemented in the video bitstream and at the decoder, they are of little help in low bitrate video streaming.
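
The arithmetic behind this argument is easy to check. The short sketch below uses the figures quoted in the text (36kbps, 4fps, and an MTU of 12000 bits); the function name is ours.

```python
MTU_BITS = 12_000  # Ethernet MTU in bits, as quoted in the text


def average_frame_bits(bitrate_bps: int, frames_per_second: int) -> float:
    """Average number of bits available per video frame."""
    return bitrate_bps / frames_per_second


if __name__ == "__main__":
    frame_bits = average_frame_bits(36_000, 4)
    # An average frame fits in a single packet, so losing that packet
    # loses the whole frame.
    print(frame_bits, frame_bits <= MTU_BITS)
```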
`
`3.2. Solutions for Rate Adaptation
`
`As we discussed in Section 3.1, the video encoder cannot afford to adapt to individual clients, and the streaming server is
constrained to doing only light processing on a per-client basis. The solutions to be reviewed below for rate adaptation fit
these requirements. Additionally, these solutions do not require the server to understand a compressed video bitstream.
From a server design point of view, it is advantageous to make the server codec-independent, or even media-independent. A
good example of a media-independent RTP streaming server is Apple’s Darwin QuickTime Streaming Server (QTSS). The
`Darwin QTSS is open-sourced and available from Apple.21
`
`3.2.1. Dynamic Stream Switching
`
`Video streaming using multiple bitstreams encoded at different bitrates has been supported in commercial streaming systems
`such as Apple’s QuickTime, RealNetworks’ RealSystem G2, and Microsoft Windows Media. RealSystem G2 was the first to
`support dynamic stream switching under the name of SureStream.22,23 The idea is that the server can switch among multiple
`streams to serve a client with one that best matches the client’s available bandwidth. The switching is dynamic, depending
`on the estimated available bandwidth reported by a client and the server’s decision on whether it will respond. We want to
note that in some research literature, video streaming using dynamic stream switching is called simulcast.24 We prefer not to
use this term because what we describe here differs from what the same word means in the broadcasting
industry, where it originated.
`
In the SureStream implementation, three streams are prepared for a single target bitrate. It is worth noting that at the low end
is a keyframe-only stream encoded at approximately 50% of the target bitrate.22 The bitrate of this keyframe-only stream can
be further reduced by dropping additional keyframes since there is no dependency among the keyframes. Furthermore, the
keyframe-only stream is a good fallback choice when the packet loss rate is high and there is a need to resynchronize.
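
The server-side decision in dynamic stream switching can be sketched as a simple selection rule. This is not the SureStream algorithm, whose details are not given here; the function name choose_stream, the 20% loss threshold, and the 28kbps intermediate stream are hypothetical values chosen for illustration. Only the keyframe-only stream at roughly 50% of the target bitrate follows the description above.

```python
def choose_stream(stream_bitrates_bps, available_bps, loss_rate,
                  keyframe_only_bps, loss_threshold=0.2):
    """Return the bitrate of the stream the server should send.

    stream_bitrates_bps: bitrates of the regular (non-keyframe-only) streams.
    available_bps: bandwidth estimate reported by the client.
    loss_rate: packet loss rate reported by the client, from 0.0 to 1.0.
    keyframe_only_bps: bitrate of the keyframe-only fallback stream.
    loss_threshold: hypothetical loss rate above which we fall back.
    """
    if loss_rate > loss_threshold:
        return keyframe_only_bps
    # Pick the highest-rate regular stream that fits the reported bandwidth.
    candidates = [b for b in stream_bitrates_bps if b <= available_bps]
    return max(candidates, default=keyframe_only_bps)


if __name__ == "__main__":
    regular = [36_000, 28_000]   # target stream plus a hypothetical lower one
    fallback = 18_000            # keyframe-only, about 50% of the target
    print(choose_stream(regular, 40_000, 0.02, fallback))
    print(choose_stream(regular, 30_000, 0.02, fallback))
    print(choose_stream(regular, 40_000, 0.50, fallback))
```

Because the rule only compares a few numbers per client report, it keeps the per-client work on the server light and does not require the server to parse the compressed bitstream.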
`
Although effective and requiring only a little server involvement, dynamic stream switching makes the encoding
`process more complex. The mechanism of making