(12) Patent Application Publication (10) Pub. No.: US 2010/0235528 A1
`
`(43) Pub. Date: Sep. 16, 2010
Bocharov et al.
`
`US 20100235528A1
`
`(54) DELIVERING CACHEABLE STREAMING
`MEDIA PRESENTATIONS
`
(75) Inventors: John A. Bocharov, Seattle, WA (US);
Geqiang (Sam) Zhang, Redmond, WA (US);
Krishna Prakash Duggaraju, Renton, WA (US);
Sudheer Sirivara, Redmond, WA (US);
Lin Liu, Sammamish, WA (US);
Anirban Roy, Kirkland, WA (US);
Jimin Gao, Seattle, WA (US);
Jack E. Freelander, Monroe, WA (US);
Christopher G. Knowlton, Redmond, WA (US);
Vishal Sood, Bothell, WA (US)
`
`Correspondence Address:
`MICROSOFT CORPORATION
`ONE MICROSOFT WAY
REDMOND, WA 98052 (US)
`
(73) Assignee: MICROSOFT CORPORATION, Redmond, WA (US)

(21) Appl. No.: 12/405,220

(22) Filed: Mar. 16, 2009

Publication Classification
`
(51) Int. Cl.
G06F 15/16 (2006.01)
G06F 7/00 (2006.01)
(52) U.S. Cl. ... 709/231; 709/248; 707/E17.009; 709/203; 707/E17.032
`
(57) ABSTRACT

A smooth streaming system provides a stateless protocol
between a client and server in which the server embeds
incremental control information in media fragments. The server
provides uniform media fragment responses to media fragment
requests that are cacheable by existing Internet cache
infrastructure. The smooth streaming system receives media
data in fragments from one or more encoders, creates an index
of each fragment, and stores the fragments. The server
provides fragments to clients that contain metadata information
describing the encodings available on the server and the
encoding of the fragment. The server may also provide
information within each fragment that allows the client to
determine whether the client is requesting data too fast or too slow,
so that the client can adapt its request rate to a cadence in tune
with the rate at which the server is receiving encoder data.
`
`
`
[FIG. 1 (Sheet 1 of 5): block diagram of the Smooth Streaming System 100. Labeled blocks: Register Event 110, Encoder Interface, Index Fragment, Fragment Data Store, Client Interface, Build Client Manifest, Clock Synchronization.]
`
`
`
[FIG. 2 (Sheet 2 of 5): block diagram. The drawing text was extracted rotated and is only partly legible. Recoverable labels: Internet, Source, Media 220, Encoder(s) 230, Server(s) 250, Server(s) 260, External Network, Network, Client, and reference numerals 280 and 290.]
`
`
`
[FIG. 3 (Sheet 3 of 5): flow diagram titled "Encoder Push/Pull". Legible labels, in extraction order: 310; Receive Event Registration; Request Stream/Server; 330; Receive Stream/Server Manifest; More Fragments?; 380; N; Done.]
`
`
`
[FIG. 4 (Sheet 4 of 5): flow diagram. Legible labels: Request Received?; Request; 480.]
`
`
`
[FIG. 5 (Sheet 5 of 5): data flow diagram among Encoder / Ingestion Server, Origin Server, and Client. Legible labels: f-MP4 Stream; Chunk's Information; Generate Client Manifest with Latest Chunk Information; Archive Chunks; Client Manifest Request; Client Manifest Response; Chunk Requests; Chunk Responses with Following; reference numerals 505, 510, 515, 520, 525, 530, 540, 545, 550.]
`
`
`
`DELIVERING CACHEABLE STREAMING
`MEDIA PRESENTATIONS
`
`BACKGROUND
`
[0001] Streaming media is multimedia that is constantly
received by, and normally presented to, an end-user (using a
client) while it is being delivered by a streaming provider
(using a server). Several protocols exist for streaming media,
including the Real-time Streaming Protocol (RTSP), Real-time
Transport Protocol (RTP), and the Real-time Transport
Control Protocol (RTCP), which are often used together. The
Real Time Streaming Protocol (RTSP), developed by the
Internet Engineering Task Force (IETF) and created in 1998
as Request For Comments (RFC) 2326, is a protocol for use
in streaming media systems, which allows a client to remotely
control a streaming media server, issuing VCR-like commands
such as “play” and “pause”, and allowing time-based
access to files on a server.
`
[0002] The sending of streaming data itself is not part of the
RTSP protocol. Most RTSP servers use the standards-based
RTP as the transport protocol for the actual audio/video data,
acting somewhat as a metadata channel. RTP defines a
standardized packet format for delivering audio and video over
the Internet. RTP was developed by the Audio-Video Transport
Working Group of the IETF and first published in 1996 as
RFC 1889, and superseded by RFC 3550 in 2003. The protocol
is similar in syntax and operation to Hypertext Transfer
Protocol (HTTP), but RTSP adds new requests. While HTTP
is stateless, RTSP is a stateful protocol. A session ID is used
to keep track of sessions when needed. RTSP messages are
sent from client to server, although some exceptions exist
where the server will send messages to the client.
[0003] RTP is usually used in conjunction with RTCP.
While RTP carries the media streams (e.g., audio and video)
or out-of-band signaling (dual-tone multi-frequency
(DTMF)), RTCP is used to monitor transmission statistics
and quality of service (QoS) information. RTP allows only
one type of message, one that carries data from the source to
the destination. In many cases, there is a use for other
messages in a session. These messages control the flow and
quality of data and allow the recipient to send feedback to the
source or sources. RTCP is a protocol designed for this
purpose. RTCP has five types of messages: sender report,
receiver report, source description message, bye message,
and application-specific message. RTCP provides out-of-band
control information for an RTP flow. RTCP partners
with RTP in the delivery and packaging of multimedia data,
but does not transport any data itself. It is used periodically to
transmit control packets to participants in a streaming
multimedia session. One function of RTCP is to provide feedback
on the quality of service being provided by RTP. RTCP gathers
statistics on a media connection and information such as
bytes sent, packets sent, lost packets, jitter, feedback, and
round trip delay. An application may use this information to
increase the quality of service, perhaps by limiting flow or
using a different codec or bit rate.
[0004] One problem with existing media streaming
architectures is the tight coupling between server and client. The
stateful connection between client and server creates
additional server overhead, because the server tracks the current
state of each client. This also limits the scalability of the
server. In addition, the client cannot quickly react to changing
conditions, such as increased packet loss, reduced bandwidth,
user requests for different content or to modify the existing
content (e.g., speed up or rewind), and so forth, without first
communicating with the server and waiting for the server to
adapt and respond. Often, when a client reports a lower
available bandwidth (e.g., through RTCP), the server does not
adapt quickly enough, and the user notices breaks in the media
because packets that exceed the available bandwidth are not
received and new lower bit rate packets are not sent from the
server in time. To avoid these
problems, clients often buffer data, but buffering introduces
latency, which for live events may be unacceptable.
[0005] In addition, the Internet contains many types of
downloadable media content items, including audio, video,
documents, and so forth. These content items are often very
large, such as video in the hundreds of megabytes. Users often
retrieve documents over the Internet using HTTP through a
web browser. The Internet has built up a large infrastructure of
routers and proxies that are effective at caching data for
HTTP. Servers can provide cached data to clients with less
delay and by using fewer resources than re-requesting the
content from the original source. For example, a user in New
York may download a content item served from a host in
Japan, and receive the content item through a router in
California. If a user in New Jersey requests the same file, the
router in California may be able to provide the content item
without again requesting the data from the host in Japan. This
reduces the network traffic over possibly strained routes, and
allows the user in New Jersey to receive the content item with
less latency.
[0006] Unfortunately, live media often cannot be cached
using existing protocols, and each client requests the media
from the same server or set of servers. In addition, when
streaming media can be cached, it is often done by specialized
cache hardware, not existing and readily available HTTP-based
Internet caching infrastructure. The lack of caching
limits the number of parallel viewers and requests that the
servers can handle, and limits the attendance of a live event.
The world is increasingly using the Internet to consume
up-to-the-minute live information, as shown by the record number of
users that watched live events such as the opening of the 2008
Olympics via the Internet. The limitations of current
technology are slowing adoption of the Internet as a medium for
consuming this type of media content.
`
`SUMMARY
`
[0007] A smooth streaming system is described herein that
provides a stateless protocol between the client and server in
which the server embeds incremental information in media
fragments that eliminates the usage of a typical control
channel. In addition, the server provides uniform media fragment
responses to media fragment requests, thereby allowing existing
Internet cache infrastructure to cache streaming media
data. The smooth streaming system receives media data in
fragments from one or more encoders, creates an index of
each fragment, and stores the fragments. As the event
progresses, the server provides fragments requested by clients
until the end of the event. Each fragment contains metadata
information that describes the encodings available on the
server and the encoding of the fragment in addition to the
media content of the fragment for playback by the client. The
server may provide fragments in multiple encodings so that
the client can, for example, switch quickly to fragments of a
different bit rate or playback speed based on network
conditions. The server may also provide information within each
fragment that allows the client to determine whether the client
is requesting data too fast or too slow, so that the client can
adapt its request rate to a cadence in tune with the rate at
which the server is receiving encoder data. Thus, the smooth
streaming system provides a more scalable streaming media
server without tracking client state and with an increased
likelihood that clients will receive media with lower latency
from a cache server local to the client.
`
[0008] This Summary is provided to introduce a selection
of concepts in a simplified form that are further described
below in the Detailed Description. This Summary is not
intended to identify key features or essential features of the
claimed subject matter, nor is it intended to be used to limit
the scope of the claimed subject matter.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0009] FIG. 1 is a block diagram that illustrates components
of the smooth streaming system, in one embodiment.
[0010] FIG. 2 is a block diagram that illustrates an operating
environment of the smooth streaming system using
Microsoft Windows and Microsoft Internet Information
Server (IIS), in one embodiment.
[0011] FIG. 3 is a flow diagram that illustrates the processing
of the system to receive media data from encoders, in one
embodiment.
[0012] FIG. 4 is a flow diagram that illustrates the processing
of the system to handle a client connection for streaming
media, in one embodiment.
[0013] FIG. 5 is a data flow diagram that illustrates the flow
of media fragments from an encoder to an origin server to a
client, in one embodiment.
`
`DETAILED DESCRIPTION
`
[0014] A smooth streaming system is described herein that
provides a stateless protocol between the client and server in
which the server embeds incremental information in media
fragments (i.e., chunks) that eliminates the usage of a typical
control channel. In addition, the server provides uniform
media fragment responses to media fragment requests (i.e.,
clients requesting the same fragment get the same response),
thereby allowing existing Internet cache infrastructure to
cache streaming media data. Each fragment has a distinguished
Uniform Resource Locator (URL) that allows the
fragment to be identified and cached by both Internet cache
servers and the client's browser cache. Caching reduces the
load on the server and allows more clients to view the same
content at the same time. The smooth streaming system
receives media data in fragments from one or more encoders,
creates an index of each fragment, and stores the fragments.
As the event progresses, the server provides fragments
requested by clients until the end of the event. Each fragment
contains metadata information that describes the encodings
available on the server and the encoding of the fragment in
addition to the media content of the fragment for playback by
the client. The server may provide fragments in multiple
encodings so that the client can, for example, switch quickly
to fragments of a different bit rate or playback speed based on
network conditions. The server may also provide information
within each fragment that allows the client to determine
whether the client is requesting data too fast or too slow, so
that the client can adapt its request rate to a cadence in tune
with the rate at which the server is receiving encoder data.
Thus, the smooth streaming system provides a more scalable
streaming media server without tracking client state and with
an increased likelihood that clients will receive media with
lower latency from a cache server local to the client.
[0015] In some embodiments, the smooth streaming system
uses a particular data transmission format between the
server and client. The client requests fragments of media from
a server that include a portion of the media. For example, for
a 10-minute file, the client may request 2-second fragments.
Note that unlike typical streaming where the server pushes
data to the client, in this case the client pulls media fragments
from the server. In the case of a live stream, the server may be
creating the media on the fly and producing fragments to
respond to client requests. Thus, the client may only be several
fragments behind the server in terms of how fast the
server creates fragments and how fast the client requests
fragments.
[0016] Each fragment contains metadata and media content.
The metadata may describe useful information about the
media content, such as the bit rate at which the media content
was encoded, where the media content fits into a larger media
element (e.g., this fragment represents offset 1:10 in a
10-minute video clip), the codec used to encode the media
content, and so forth. The client uses this information to place the
fragment into a storyboard of the larger media element and to
properly decode and play back the media content.
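The arithmetic behind the 10-minute, 2-second example can be sketched as follows. This is only an illustration: it assumes fixed-duration fragments, and the function names are hypothetical rather than taken from the described system.

```python
def fragment_count(total_seconds: int, fragment_seconds: int) -> int:
    """Number of fragments needed to cover a clip (ceiling division)."""
    return -(-total_seconds // fragment_seconds)


def fragment_index_for_offset(offset_seconds: int, fragment_seconds: int) -> int:
    """Zero-based index of the fragment that contains a given offset."""
    return offset_seconds // fragment_seconds


# A 10-minute clip cut into 2-second fragments.
total = fragment_count(10 * 60, 2)
# Offset 1:10 (70 seconds) into the clip.
index = fragment_index_for_offset(70, 2)
```

Under these assumptions the clip is covered by 300 fragments, and offset 1:10 falls in the fragment with zero-based index 35.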
[0017] FIG. 1 is a block diagram that illustrates components
of the smooth streaming system, in one embodiment.
The smooth streaming system 100 includes a register event
component 110, an encoder interface component 120, an
index fragment component 130, a fragment data store 140, a
client interface component 150, a build client manifest
component 160, and a clock synchronization component 170.
Each of these components is described in further detail
herein.
`
[0018] The register event component 110 receives information
about a live or other media event for which the system
will receive encoded media data. The information may
include network address information or other identifiers for
each of the encoders that will supply encoded media data to
the server. The information also includes a URL to which
encoders will supply encoded media data and at which clients
can access the media data.
`
[0019] The encoder interface component 120 provides an
interface between the system and one or more encoders that
provide the encoded media data. The encoders may push data
to the system using common network protocols. For example,
the encoders may use an HTTP POST request to provide
encoded media data to the system. The encoders may each use
a distinguished URL that specifies the encoder that is the
source of the encoded media data, which the server may
match to the information received by the register event
component 110 when the media event was registered.
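An encoder-side push of this kind might be prepared as in the sketch below. The URL layout (`Streams(<encoder id>)` appended to the event URL), the host name, and the `Content-Type` value are all hypothetical; the text only says that each encoder uses a distinguished URL and may use HTTP POST. The request is built but not sent.

```python
import urllib.request


def build_push_request(event_url: str, encoder_id: str,
                       fragment: bytes) -> urllib.request.Request:
    """Build (without sending) an HTTP POST carrying one encoded fragment.

    The 'Streams(...)' suffix is an assumed way of making the URL
    distinguish the encoder; the source does not specify a layout.
    """
    url = "%s/Streams(%s)" % (event_url.rstrip("/"), encoder_id)
    return urllib.request.Request(
        url,
        data=fragment,
        method="POST",
        headers={"Content-Type": "video/mp4"},  # assumed media type
    )


req = build_push_request("http://ingest.example.invalid/event.isml",
                         "encoder1", b"fragment-bytes")
```

Passing `req` to `urllib.request.urlopen` would perform the actual push; the server could then match `encoder1` against the encoders named when the event was registered.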
[0020] The encoder interface component 120 may specify a
particular format for received encoded media data, such as an
MP4 or other media container (e.g., MKV). The MP4 container
format allows multiple types of data to be associated in
a single file. The individual data that makes up an MP4
container is called a box, and each box typically has a label
that identifies the type of data stored in the box. Encoders may
place metadata information in the boxes such as the type of
encoding used to encode the encoded media data, as well as
the encoded media data itself.
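The box structure can be illustrated with a minimal reader for top-level boxes. It relies on the general MP4 (ISO base media file format) header layout: a 32-bit big-endian size, a four-character type, and an optional 64-bit size when the 32-bit field is 1. The `moof` and `mdat` box names in the sample are the usual fragmented-MP4 box types and are used here only as illustrative input, not as something the text above specifies.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level box in a byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Helper for the example: wrap a payload in a simple 32-bit-size box."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


sample = make_box("moof", b"\x00" * 4) + make_box("mdat", b"media-bytes")
boxes = list(iter_boxes(sample))
```

Here `boxes` holds one entry per box, pairing the label with the raw bytes it carries, which is the level at which a server could separate metadata boxes from media data.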
`
[0021] The index fragment component 130 creates and
maintains an index table of fragments received from various
encoders. Because the system 100 is receiving media fragments
on an on-going basis during an event from potentially
many encoders, the system 100 uses the index table to keep
track of what media fragments have been received and from
which encoders (or in which formats). Each encoder may use
a common method for identifying media fragments (e.g., a
time stamp using a synchronized clock) so that the index
fragment component 130 can correlate fragments from different
encoders that represent the same period in a live event.
In this way, the system 100 can detect when media fragments
are missing and can provide clients with manifest information
about available media fragments.
[0022] The fragment data store 140 stores received media
fragments and the created index table of fragments to provide
to clients based on received client requests. The fragment data
store may include a database, disk drive, or other form of data
storage (e.g., a Storage Area Network (SAN) or even a
cloud-based storage service).
[0023] The client interface component 150 receives client
requests for media fragments and provides manifest data and
media fragments to clients. When a client initially connects to
the system 100, the client may send a request for a client
manifest. The client interface component 150 invokes the
build client manifest component 160 to create a manifest that
includes information about the encodings available from the
system 100, and fragments stored by the system 100 up to the
current time based on the index table. The client can use this
information either to begin requesting ongoing live fragments,
or to skip backwards in time to earlier portions of a
presentation. This can be used, for example, if the client joins
a live event that is already in progress and wants to catch up
with the previous portions of the event.
[0024] The build client manifest component 160 builds a
manifest to satisfy a client request that includes information
about each of the encodings available from the system 100
and fragments stored by the system up to the current time. The
build client manifest component 160 also provides a manifest
to include with each media fragment that provides information
to the client about the current media fragment as well as
potentially subsequent fragments. By combining the initially
received manifest with subsequent manifests provided with
each media fragment, the client can build an up-to-date manifest
that includes complete information about the media event
from the start up until the current time. When the media event
completes, the client has a complete storyboard of the media
event that the client can use for on-demand viewing of the
media event.
[0025] In some embodiments, the client interface component
150 responds to client requests in a way that encourages
clients to make requests a certain amount of time after media
fragments are available. For example, the system 100 may not
respond with a particular media fragment until the system
100 has received one or more subsequent fragments from the
encoders. This allows the system 100 to include manifest
information about the subsequent fragments in the current
fragment response. The system 100 may also provide the
client with a count of subsequent fragments that the client can
expect with each media fragment. This becomes a timing hint
for the client. If the client receives a media fragment with
information about fewer subsequent fragments than the provided
count, then the client can assume that the client is
requesting data from the server too quickly. On the other
hand, if the client receives a media fragment with information
about more subsequent fragments than the provided count,
then the client can assume that the client is requesting data
from the server too slowly. Thus, in response to any particular
fragment request, the build client manifest component 160 provides
manifest information about as many subsequent fragments as
the system 100 has received up to that point.
[0026] The clock synchronization component 170 synchronizes
the clocks of the system 100, clients, and encoders.
Although absolute time is not relevant to the system 100,
being able to identify a particular fragment across multiple
encoders and providing clients with the rate (i.e., cadence) at
which to request fragments is relevant to the system 100. For
example, if the client requests data too quickly, the server will
not yet have the data and will respond with error responses
(e.g., an HTTP 404 not found error response), creating many
spurious requests that unnecessarily consume bandwidth. On
the other hand, if the client requests data too slowly, then the
client may not have data in time for playback, creating noticeable
breaks in the media played back to the user. In addition,
encoders produce media fragments in encodings that may
differ dramatically and provide no meaningful way of correlating
two fragments that represent the same period of time in
different encodings as well as where the fragments fit into an
overall timeline of the media event. The clock synchronization
component 170 provides this information by allowing the
server, encoders, and clients to have a similar clock value at a
particular time. The encoders may also mark each media
fragment with the time at which the encoder created the
fragment. In this way, if a client requests a particular fragment,
the client will get a fragment representing the same
period regardless of the encoding that the client selects.
[0027] The computing device on which the smooth streaming
system is implemented may include a central processing
unit, memory, input devices (e.g., keyboard and pointing
devices), output devices (e.g., display devices), and storage
devices (e.g., disk drives or other non-volatile storage media).
The memory and storage devices are computer-readable storage
media that may be encoded with computer-executable
instructions (e.g., software) that implement or enable the
system. In addition, the data structures and message structures
may be stored or transmitted via a data transmission
medium, such as a signal on a communication link. Various
communication links may be used, such as the Internet, a
local area network, a wide area network, a point-to-point
dial-up connection, a cell phone network, and so on.
[0028] Embodiments of the system may be implemented in
various operating environments that include personal computers,
server computers, handheld or laptop devices, multiprocessor
systems, microprocessor-based systems, programmable
consumer electronics, digital cameras, network PCs,
minicomputers, mainframe computers, distributed computing
environments that include any of the above systems or
devices, and so on. The computer systems may be cell phones,
personal digital assistants, smart phones, personal computers,
programmable consumer electronics, digital cameras, and so
on.
`
[0029] The system may be described in the general context
of computer-executable instructions, such as program modules,
executed by one or more computers or other devices.
Generally, program modules include routines, programs,
objects, components, data structures, and so on that perform
particular tasks or implement particular abstract data types.
Typically, the functionality of the program modules may be
combined or distributed as desired in various embodiments.
`
`
`
[0030] As discussed above, the build client manifest component
creates a client manifest. Following is an example of a
typical client manifest.

<?xml version="1.0" encoding="utf-8"?>
<!--Created with Expression Encoder version 2.1.1205.0-->
<SmoothStreamingMedia MajorVersion="1" MinorVersion="0"
    Duration="6537916781"
    LookAheadFragmentCount="3" IsLive="TRUE">
  <StreamIndex Type="video" Subtype="WVC1" Chunks="327"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Bitrate="1450000" FourCC="WVC1" Width="848"
        Height="480" CodecPrivateData="..." />
    <QualityLevel Bitrate="1050000" FourCC="WVC1" Width="592"
        Height="336" CodecPrivateData="..." />
    <c n="0" t="12345678" d="20000000" />
    <c n="1" t="32345678" d="20000000" />
    <c n="2" t="52345678" d="20000000" />
    <c n="3" t="72345678" d="20000000" />
  </StreamIndex>
</SmoothStreamingMedia>
`
[0031] The client manifest lists the decoding information as
well as information for all the fragments that the server has
archived so far. The total media fragment number and duration
is only for the media fragments that the server has
archived up until when the client makes the request (this
allows the client to quickly build the seek bar). For each media
fragment, “t” means the absolute timestamp. The client uses
this value to compose the fragment URL (e.g.,
“Fragments(video={start time})”). LookAheadFragmentCount indicates
the targeted number of subsequent fragments that
“TrackFragmentReferenceBox” is going to reference as described
further herein. “IsLive” indicates whether the live broadcast
is still going on.
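A client-side reading of such a manifest, including the URL composition from each “t” value, might look like the sketch below. The base URL is a placeholder, the manifest is a shortened copy of the example with the elided CodecPrivateData attributes left out, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example manifest (CodecPrivateData omitted).
MANIFEST = """<SmoothStreamingMedia MajorVersion="1" MinorVersion="0"
    Duration="6537916781" LookAheadFragmentCount="3" IsLive="TRUE">
  <StreamIndex Type="video" Subtype="WVC1" Chunks="327"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Bitrate="1450000" FourCC="WVC1" Width="848" Height="480" />
    <QualityLevel Bitrate="1050000" FourCC="WVC1" Width="592" Height="336" />
    <c n="0" t="12345678" d="20000000" />
    <c n="1" t="32345678" d="20000000" />
  </StreamIndex>
</SmoothStreamingMedia>"""


def fragment_urls(manifest_xml: str, base_url: str, bitrate: int) -> list:
    """Fill the StreamIndex Url template once per listed fragment."""
    root = ET.fromstring(manifest_xml)
    stream = root.find("StreamIndex")
    template = stream.get("Url")
    urls = []
    for chunk in stream.findall("c"):
        path = (template
                .replace("{bitrate}", str(bitrate))
                .replace("{start time}", chunk.get("t")))
        urls.append(base_url.rstrip("/") + "/" + path)
    return urls


# "http://example.invalid/event.isml" is a placeholder base URL.
urls = fragment_urls(MANIFEST, "http://example.invalid/event.isml", 1450000)
lookahead = int(ET.fromstring(MANIFEST).get("LookAheadFragmentCount"))
```

Because every client that picks the same bit rate and timestamp composes the same URL, the responses are natural candidates for shared HTTP caches.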
[0032] In some embodiments, when a client requests a particular
media fragment the smooth streaming system provides
information about subsequent media fragments. For example,
the server may hold a particular fragment that is ready until
some number of additional fragments (e.g., two fragments) is
available. Then, the server may send the fragment along with
manifest information about the next few fragments. The client
can use this information to know what is coming and adapt
appropriately. This allows the client to intelligently adjust the
request rate. For example, if a client requests a fragment and
does not have any information about later fragments, then the
client knows it is requesting data too fast. If the client requests
a fragment and receives information about too many later
fragments, then the client may be requesting information too
slowly. Thus, the client can adapt using the advance metadata as
a hint.
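One way a client could turn this hint into a pacing rule is sketched below. The fixed step size, the delay bounds, and the function name are assumptions made for illustration; the text does not prescribe a particular adjustment policy.

```python
def next_request_delay(current_delay: float, lookahead_seen: int,
                       lookahead_target: int, step: float = 0.25,
                       min_delay: float = 0.0,
                       max_delay: float = 10.0) -> float:
    """Adjust the wait before the next fragment request.

    lookahead_seen   -- subsequent fragments described in the last response
    lookahead_target -- count the client was told to expect
                        (e.g., LookAheadFragmentCount)
    """
    if lookahead_seen < lookahead_target:
        # Fewer look-ahead entries than expected: requesting too fast.
        return min(max_delay, current_delay + step)
    if lookahead_seen > lookahead_target:
        # More look-ahead entries than expected: requesting too slowly.
        return max(min_delay, current_delay - step)
    return current_delay


slower = next_request_delay(2.0, lookahead_seen=1, lookahead_target=3)
faster = next_request_delay(2.0, lookahead_seen=5, lookahead_target=3)
steady = next_request_delay(2.0, lookahead_seen=3, lookahead_target=3)
```

A real client would probably smooth these adjustments over several responses, since a single short response may simply reflect a momentary encoder delay.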
`
[0033] The information about subsequent media fragments
may be stored in an MP4 container using a custom box. For
example, the server may insert a “TrackFragmentReferenceBox”
into the ‘traf’ box shown above with the definition
below:
`
Box Type: ‘uuid’, {d4807ef2-ca39-4695-8e54-26cb9e46a79f}
Container: ‘traf’
Mandatory: Yes
Quantity: Exactly one
aligned(8) class TrackFragmentReferenceBox
    extends Box(‘uuid’, {d4807ef2-ca39-4695-8e54-26cb9e46a79f})
{
    unsigned int(8) version;
    bit(24) flags = 0;
    unsigned int(8) fragment_count;
    for(i=1; i <= fragment_count; i++){
        if(version==1) {
            unsigned int(64) fragment_absolute_time;
            unsigned int(64) fragment_duration;
        } else {
            unsigned int(32) fragment_absolute_time;
            unsigned int(32) fragment_duration;
        }
    }
}
`
[0034] The fragment_count specifies the number of immediate
subsequent fragments of the same track that this box is
referencing. The fragments are listed in the same order as they
appear in the MP4 stream. This number is equal to or greater
than 1. The fragment_absolute_time specifies a 32- or 64-bit
integer that indicates the absolute timestamp of the first
sample in the subsequent fragment. The fragment_duration
specifies a 32- or 64-bit integer that indicates the duration of
the subsequent fragment. The number of subsequent fragments
in the “TrackFragmentReferenceBox” box (as in
‘fragment_count’) is a configurable setting on the server. When the
server receives a fragment request, if the server has enough
subsequent fragments as the configured value to fill the
“TrackFragmentReferenceBox”, the server can follow the
normal response handling code path with default cache control
settings.
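Reading the fields in the order given by the box definition can be sketched as follows. The function assumes it is handed only the bytes that follow the ‘uuid’ box header and its 16-byte extended type, and that multi-byte integers are big-endian as is usual in MP4; both are assumptions of this sketch, and the function name is hypothetical.

```python
import struct


def parse_track_fragment_reference(payload: bytes) -> list:
    """Return [(fragment_absolute_time, fragment_duration), ...].

    payload starts at the 'version' field of the box body.
    """
    version = payload[0]
    # payload[1:4] is the 24-bit flags field (0 in the definition).
    fragment_count = payload[4]
    # Version 1 uses 64-bit fields; other versions use 32-bit fields.
    fmt = ">QQ" if version == 1 else ">II"
    entry_size = struct.calcsize(fmt)
    entries = []
    offset = 5
    for _ in range(fragment_count):
        absolute_time, duration = struct.unpack_from(fmt, payload, offset)
        entries.append((absolute_time, duration))
        offset += entry_size
    return entries


# Build a version-1 body that references two subsequent fragments.
body = (bytes([1, 0, 0, 0, 2])
        + struct.pack(">QQ", 32345678, 20000000)
        + struct.pack(">QQ", 52345678, 20000000))
refs = parse_track_fragment_reference(body)
```

The number of entries returned is what a client would compare against the targeted look-ahead count when judging its request cadence.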
[0035] If instead the server has at least one but not enough
subsequent fragments to fill the “TrackFragmentReferenceBox”,
the server may still return the fragment response right
away with the limited subsequent fragment's information.
The server may set a small cache timeout value (depending on
the fragment duration) and expect to update the response with
full “TrackFragmentReferenceBox” for future requests. The
low amount of subsequent fragment information is a hint to
the client that the client is requesting data too quickly. If the
server does not have any subsequent fragment for this track, it
can fail the request with a particular error code indicating
“fragment temporarily out of range”. The error response can
be cacheable for a small time window. Clients detect this error
and retry the same request after a small delay. One exception
is the case when a live session has stopped and the server is
about to serve out the very last fragment, in which case there
will not be any subsequent fragment information, and the
server responds to the request with the final stream fragments.
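The server-side cases described in the two preceding paragraphs can be summarized in a small decision function. The status strings, the choice of half a fragment duration as the "small" cache lifetime, and all names are assumptions of this sketch; the text leaves the error code and timeout values unspecified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FragmentResponse:
    status: str                     # "ok" or "temporarily_out_of_range"
    lookahead: int                  # subsequent fragments referenced
    cache_seconds: Optional[float]  # None means default cache settings


def plan_response(available_lookahead: int, configured_lookahead: int,
                  fragment_seconds: float,
                  stream_ended: bool) -> FragmentResponse:
    """Choose how to answer a request for a fragment the server holds."""
    if available_lookahead >= configured_lookahead:
        # Enough subsequent fragments: normal path, default caching.
        return FragmentResponse("ok", configured_lookahead, None)
    if available_lookahead > 0:
        # Partial look-ahead: answer now, but keep the cache lifetime short.
        return FragmentResponse("ok", available_lookahead, fragment_seconds / 2)
    if stream_ended:
        # Last fragment of a stopped session: nothing follows it.
        return FragmentResponse("ok", 0, None)
    # No subsequent fragment yet: briefly cacheable error, client retries.
    return FragmentResponse("temporarily_out_of_range", 0, fragment_seconds / 2)


full = plan_response(3, 3, 2.0, stream_ended=False)
partial = plan_response(1, 3, 2.0, stream_ended=False)
early = plan_response(0, 3, 2.0, stream_ended=False)
final = plan_response(0, 3, 2.0, stream_ended=True)
```

Keeping the partial and error responses cacheable only briefly lets shared caches absorb bursts of early requests while still picking up the fuller response soon afterward.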
[0036] FIG. 2 is a block diagram that illustrates an operating
environment of the smooth streaming system using
Microsoft Windows and Microsoft Internet Information
Server (IIS), in one embodiment. The environment typically
includes a source client 210, a content delivery network 240,
and an external network 270. The source client is the source of
the media or live event. The source client includes a media
source 220 and one or more encoders 230. The media source
220 may include cameras each providing multiple camera
angles, microphones capturing audio, slide presentations, text
(such as from a closed captioning service), images, and other
types of media. The encoders 230 encode the data from the
media source 220 in one or more encoding formats in parallel.
For example, the encoders 230 may produce encoded media
in a variety of bit rates.
`
[0037] The content delivery network 240, where the
smooth streaming system operates, includes one or more
ingest servers 250 and one or more origin servers 260. The
ingest servers 250 receive encoded media in each of the
encoding formats from the encoders 230 and create a manifest
describing the encoded media. The ingest servers 250
may create and store the media fragments described herein or
may create the fragments on the fly as they are requested. The
ingest servers 250 can receive pushed data, such as via an
HTTP POST, from the encoders 230, or via pull by requesting
data from the encoders 230. The encoders 230 and ingest
servers 250 may be connected in a variety of redundant
configurations. For example, each encoder may send encoded
media data to each of the ingest servers 250, or only to one
ingest server until a failure occurs. The origin servers 260 are
the servers that respond to client requests for media fragments.
The origin servers 260 may also be configured in a
variety of redundant configurations.
[0038] In some embodiments, the ingest servers 250 comprise
one or more servers dedicated to ingesting encoder
media streams. An administrator or content author may create
a publishing point that defines a URL at which clients of the
ingest servers 250 can find a particular media element (e.g., a
live event). For example, using IIS, the administrator may
publish a URL “http://ingserver/pubpoint.isml.” The publishing
point is used by the encoders 230 to provide new media
data to the ingest servers 250 and by the origin servers 260 to
`request media data from the ing