US 20100235528A1

(12) Patent Application Publication        (10) Pub. No.: US 2010/0235528 A1
     Bocharov et al.                        (43) Pub. Date: Sep. 16, 2010
`
`(54) DELIVERING CACHEABLE STREAMING
`MEDIA PRESENTATIONS
`
`(75)
`
`Inventors:
`
`John A. Bocharov,Seattle, WA
`(US); Geqiang (Sam) Zhang,
`Redmond, WA (US); Krishna
`Prakash Duggaraju, Renton, WA
`(US); Sudheer Sirivara, Redmond,
`WA (US); Lin Liu, Sammamish,
`WA (US); Anirban Roy, Kirkland,
`WA (US); Jimin Gao,Seattle, WA
`(US); Jack E. Freelander, Monroe,
`WA (US); Christopher G.
`Knowlton, Redmond, WA (US);
`Vishal Sood, Bothell, WA (US)
`
Correspondence Address:
MICROSOFT CORPORATION
ONE MICROSOFT WAY
REDMOND, WA 98052 (US)
`
(73) Assignee: MICROSOFT CORPORATION, Redmond, WA (US)
`
(21) Appl. No.: 12/405,220

(22) Filed: Mar. 16, 2009

`Publication Classification
`
`(51)
`
`Int. Cl.
`(2006.01)
`G06F 15/16
`(2006.01)
`G06F 7/00
`(52) US. CL we 709/231; 709/248; 707/E17.009,
`709/203; 707/E17.032
`
`(57)
`
`ABSTRACT
`
A smooth streaming system provides a stateless protocol between a client and server in which the server embeds incremental control information in media fragments. The server provides uniform media fragment responses to media fragment requests that are cacheable by existing Internet cache infrastructure. The smooth streaming system receives media data in fragments from one or more encoders, creates an index of each fragment, and stores the fragments. The server provides fragments to clients that contain metadata information describing the encodings available on the server and the encoding of the fragment. The server may also provide information within each fragment that allows the client to determine whether the client is requesting data too fast or too slow, so that the client can adapt its request rate to a cadence in tune with the rate at which the server is receiving encoder data.
`
`100
`
`Smooth Streaming System
`
`110
`
`Register
`Event
`
`Encoder
`Interface
`
`Index
`Fragment
`
`Fragment
`Data Store
`
`Manifest
`
`Client
`Interface
`
`Build Client
`
`Clock
`Synchroniz-
`ation
`
`APPLE 1005
`
`APPLE 1005
`
`1
`
`
`
[FIG. 1 — Block diagram of the smooth streaming system 100: register event component 110, encoder interface component 120, index fragment component 130, fragment data store 140, client interface component 150, build client manifest component 160, and clock synchronization component 170.]
`
`
`
[FIG. 2 — Block diagram of the operating environment: a source client (media source, encoder(s)), a content delivery network (ingest server(s), origin server(s)), internal networks, an external network, and clients.]
`
`
`
[FIG. 3 — Flow diagram of receiving media data from encoders; the loop repeats while more fragments remain (decision 380: More Fragments?).]
`
`
`
[FIG. 4 — Flow diagram of handling a client connection for streaming media; the server waits for the next fragment request (decision 480: Request Received?).]
`
`
`
[FIG. 5 — Data flow diagram: an encoder/ingestion server sends an f-MP4 stream to an origin server, which archives chunks and generates a client manifest with the latest chunk information; the client sends a client manifest request, receives the client manifest response, then sends chunk requests and receives chunk responses with the following chunk's information (reference numbers 505-550).]
`
`
`
`
`DELIVERING CACHEABLE STREAMING
`MEDIA PRESENTATIONS
`
`BACKGROUND
`
[0001] Streaming media is multimedia that is constantly received by, and normally presented to, an end-user (using a client) while it is being delivered by a streaming provider (using a server). Several protocols exist for streaming media, including the Real-time Streaming Protocol (RTSP), Real-time Transport Protocol (RTP), and the Real-time Transport Control Protocol (RTCP), which are often used together. The Real Time Streaming Protocol (RTSP), developed by the Internet Engineering Task Force (IETF) and created in 1998 as Request For Comments (RFC) 2326, is a protocol for use in streaming media systems, which allows a client to remotely control a streaming media server, issuing VCR-like commands such as "play" and "pause", and allowing time-based access to files on a server.
`
[0002] The sending of streaming data itself is not part of the RTSP protocol. Most RTSP servers use the standards-based RTP as the transport protocol for the actual audio/video data, with RTSP acting somewhat as a metadata channel. RTP defines a standardized packet format for delivering audio and video over the Internet. RTP was developed by the Audio-Video Transport Working Group of the IETF and first published in 1996 as RFC 1889, and superseded by RFC 3550 in 2003. RTSP is similar in syntax and operation to the Hypertext Transfer Protocol (HTTP), but RTSP adds new requests. While HTTP is stateless, RTSP is a stateful protocol. A session ID is used to keep track of sessions when needed. RTSP messages are sent from client to server, although some exceptions exist where the server will send messages to the client.
[0003] RTP is usually used in conjunction with RTCP. While RTP carries the media streams (e.g., audio and video) or out-of-band signaling (dual-tone multi-frequency (DTMF)), RTCP is used to monitor transmission statistics and quality of service (QoS) information. RTP allows only one type of message, one that carries data from the source to the destination. In many cases, there is a use for other messages in a session. These messages control the flow and quality of data and allow the recipient to send feedback to the source or sources. RTCP is a protocol designed for this purpose. RTCP has five types of messages: sender report, receiver report, source description message, bye message, and application-specific message. RTCP provides out-of-band control information for an RTP flow. RTCP partners with RTP in the delivery and packaging of multimedia data, but does not transport any data itself. It is used periodically to transmit control packets to participants in a streaming multimedia session. One function of RTCP is to provide feedback on the quality of service being provided by RTP. RTCP gathers statistics on a media connection and information such as bytes sent, packets sent, lost packets, jitter, feedback, and round trip delay. An application may use this information to increase the quality of service, perhaps by limiting flow or using a different codec or bit rate.
[0004] One problem with existing media streaming architectures is the tight coupling between server and client. The stateful connection between client and server creates additional server overhead, because the server tracks the current state of each client. This also limits the scalability of the server. In addition, the client cannot quickly react to changing conditions, such as increased packet loss, reduced bandwidth, user requests for different content or to modify the existing content (e.g., speed up or rewind), and so forth, without first communicating with the server and waiting for the server to adapt and respond. Often, when a client reports a lower available bandwidth (e.g., through RTCP), the server does not adapt quickly enough, causing breaks in the media that are noticed by the user on the client as packets that exceed the available bandwidth are not received and new lower bit rate packets are not sent from the server in time. To avoid these problems, clients often buffer data, but buffering introduces latency, which for live events may be unacceptable.
`[0005]
`In addition, the Internet contains many types of
`downloadable media content items, including audio, video,
`documents, and so forth. These content items are often very
`large, such as video in the hundreds ofmegabytes. Users often
`retrieve documents over the Internet using HTTP through a
`web browser. The Internethas built up a large infrastructure of
`routers and proxies that are effective at caching data for
`HTTP. Servers can provide cached data to clients with less
`delay and by using fewer resources than re-requesting the
`content from the original source. For example, a user in New
`York may download a content item served from a host in
`Japan, and receive the content item through a router in Cali-
`fornia. If a user in New Jersey requests the samefile, the
`router in California may be able to provide the content item
`without again requesting the data from the host in Japan. This
`reduces the networktraffic over possibly strained routes, and
`allowsthe user in New Jersey to receive the content item with
`less latency.
[0006] Unfortunately, live media often cannot be cached using existing protocols, and each client requests the media from the same server or set of servers. In addition, when streaming media can be cached, it is often done by specialized cache hardware, not existing and readily available HTTP-based Internet caching infrastructure. The lack of caching limits the number of parallel viewers and requests that the servers can handle, and limits the attendance of a live event. The world is increasingly using the Internet to consume up-to-the-minute live information, as shown by the record number of users that watched live events such as the opening of the 2008 Olympics via the Internet. The limitations of current technology are slowing adoption of the Internet as a medium for consuming this type of media content.
`
`SUMMARY
`
[0007] A smooth streaming system is described herein that provides a stateless protocol between the client and server in which the server embeds incremental information in media fragments that eliminates the usage of a typical control channel. In addition, the server provides uniform media fragment responses to media fragment requests, thereby allowing existing Internet cache infrastructure to cache streaming media data. The smooth streaming system receives media data in fragments from one or more encoders, creates an index of each fragment, and stores the fragments. As the event progresses, the server provides fragments requested by clients until the end of the event. Each fragment contains metadata information that describes the encodings available on the server and the encoding of the fragment in addition to the media content of the fragment for playback by the client. The server may provide fragments in multiple encodings so that the client can, for example, switch quickly to fragments of a different bit rate or playback speed based on network conditions. The server may also provide information within each fragment that allows the client to determine whether the client is requesting data too fast or too slow, so that the client can adapt its request rate to a cadence in tune with the rate at which the server is receiving encoder data. Thus, the smooth streaming system provides a more scalable streaming media server without tracking client state and with an increased likelihood that clients will receive media with lower latency from a cache server local to the client.
`
`7
`
`
`
`US 2010/0235528 Al
`
`Sep. 16, 2010
`
[0008] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a block diagram that illustrates components of the smooth streaming system, in one embodiment.
[0010] FIG. 2 is a block diagram that illustrates an operating environment of the smooth streaming system using Microsoft Windows and Microsoft Internet Information Server (IIS), in one embodiment.
[0011] FIG. 3 is a flow diagram that illustrates the processing of the system to receive media data from encoders, in one embodiment.
[0012] FIG. 4 is a flow diagram that illustrates the processing of the system to handle a client connection for streaming media, in one embodiment.
[0013] FIG. 5 is a data flow diagram that illustrates the flow of media fragments from an encoder to an origin server to a client, in one embodiment.

DETAILED DESCRIPTION

[0014] A smooth streaming system is described herein that provides a stateless protocol between the client and server in which the server embeds incremental information in media fragments (i.e., chunks) that eliminates the usage of a typical control channel. In addition, the server provides uniform media fragment responses to media fragment requests (i.e., clients requesting the same fragment get the same response), thereby allowing existing Internet cache infrastructure to cache streaming media data. Each fragment has a distinguished Uniform Resource Locator (URL) that allows the fragment to be identified and cached by both Internet cache servers and the client's browser cache. Caching reduces the load on the server and allows more clients to view the same content at the same time. The smooth streaming system receives media data in fragments from one or more encoders, creates an index of each fragment, and stores the fragments. As the event progresses, the server provides fragments requested by clients until the end of the event. Each fragment contains metadata information that describes the encodings available on the server and the encoding of the fragment in addition to the media content of the fragment for playback by the client. The server may provide fragments in multiple encodings so that the client can, for example, switch quickly to fragments of a different bit rate or playback speed based on network conditions. The server may also provide information within each fragment that allows the client to determine whether the client is requesting data too fast or too slow, so that the client can adapt its request rate to a cadence in tune with the rate at which the server is receiving encoder data. Thus, the smooth streaming system provides a more scalable streaming media server without tracking client state and with an increased likelihood that clients will receive media with lower latency from a cache server local to the client.
`[0015]
`In some embodiments, the smooth streaming sys-
`tem uses a particular data transmission format between the
`server and client. The client requests fragments ofmedia from
`a server that include a portion of the media. For example, for
`a 10-minute file, the client may request 2-second fragments.
`Note that unlike typical streaming where the server pushes
`data to the client, in this casethe client pulls media fragments
`from the server. In the case ofa live stream, the server may be
`creating the media on the fly and producing fragments to
`respondto client requests. Thus, the client may only be sev-
`eral fragments behind the server in terms of how fast the
`server creates fragments and how fast the client requests
`fragments.
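For illustration, the pull-based cadence described above can be sketched as a short client-side loop in Python. The host name, bit rate, and pacing values below are hypothetical; the URL template follows the client manifest example shown later in this document, and the duration value assumes the manifest's "d" values are 100-nanosecond ticks.

import time
import urllib.request

# Hypothetical origin host and publishing point; the URL template follows the
# client manifest example shown later in this document.
BASE_URL = "http://origin.example.com/pubpoint.isml"
BITRATE = 1450000
FRAGMENT_DURATION = 20000000   # 2 seconds, assuming 100-ns ticks as in the manifest "d" values

def fragment_url(start_time):
    # Compose a per-fragment URL from the quality level and absolute start time.
    return f"{BASE_URL}/QualityLevels({BITRATE})/Fragments(video={start_time})"

def feed_to_decoder(data):
    # Placeholder for the client's decode/playback path.
    pass

def pull_fragments(first_start_time, count):
    """Pull `count` consecutive 2-second fragments, pacing requests to the encoder cadence."""
    start_time = first_start_time
    for _ in range(count):
        with urllib.request.urlopen(fragment_url(start_time)) as response:
            feed_to_decoder(response.read())   # fragment = metadata boxes + media data
        start_time += FRAGMENT_DURATION
        time.sleep(2.0)                        # stay only a few fragments behind the live edge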
[0016] Each fragment contains metadata and media content. The metadata may describe useful information about the media content, such as the bit rate at which the media content was encoded, where the media content fits into a larger media element (e.g., this fragment represents offset 1:10 in a 10-minute video clip), the codec used to encode the media content, and so forth. The client uses this information to place the fragment into a storyboard of the larger media element and to properly decode and play back the media content.
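For illustration, the per-fragment metadata described above can be modeled as a small record on the client; the field names below are illustrative assumptions rather than the actual on-wire MP4 box layout.

from dataclasses import dataclass

@dataclass
class FragmentMetadata:
    bitrate: int       # bit rate at which the media content was encoded
    start_time: int    # where the fragment fits into the larger media element
    duration: int      # fragment duration, in the same time units as start_time
    codec: str         # codec used to encode the media content (e.g., "WVC1")

@dataclass
class Fragment:
    metadata: FragmentMetadata
    media: bytes       # the encoded media content itself

def place_in_storyboard(storyboard, fragment):
    """Key the client's storyboard of the larger media element by fragment start time."""
    storyboard[fragment.metadata.start_time] = fragment
    return storyboard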
`[0017]
`FIG. 1 is a block diagram that illustrates compo-
`nents of the smooth streaming system, in one embodiment.
`The smooth streaming system 100 includes a register event
`component 110, an encoder interface component 120, an
`index fragment component 130, a fragment data store 140, a
`client interface component 150, a build client manifest com-
`ponent 160, and a clock synchronization component 170.
`Each of these components is described in further detail
`herein.
`
[0018] The register event component 110 receives information about a live or other media event for which the system will receive encoded media data. The information may include network address information or other identifiers for each of the encoders that will supply encoded media data to the server. The information also includes a URL to which encoders will supply encoded media data and at which clients can access the media data.
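For illustration, the registration record the register event component 110 might keep can be sketched as follows; the dictionary layout, field names, and addresses are assumptions, not an actual server API.

# Hypothetical registration record for a live event.
event = {
    "publishing_url": "http://ingserver/pubpoint.isml",   # URL encoders post to and clients read from
    "encoders": [
        {"id": "encoder-1", "address": "10.0.0.11"},
        {"id": "encoder-2", "address": "10.0.0.12"},
    ],
}

def register_event(registry, event):
    """Store the event so later encoder posts can be matched back to it."""
    registry[event["publishing_url"]] = event
    return registry

registry = register_event({}, event)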
`
`
[0019] The encoder interface component 120 provides an interface between the system and one or more encoders that provide the encoded media data. The encoders may push data to the system using common network protocols. For example, the encoders may use an HTTP POST request to provide encoded media data to the system. The encoders may each use a distinguished URL that specifies the encoder that is the source of the encoded media data, which the server may match to the information received by the register event component 110 when the media event was registered.

[0020] The encoder interface component 120 may specify a particular format for received encoded media data, such as an MP4 or other media container (e.g., MKV). The MP4 container format allows multiple types of data to be associated in a single file. The individual data that makes up an MP4 container is called a box, and each box typically has a label that identifies the type of data stored in the box. Encoders may place metadata information in the boxes such as the type of encoding used to encode the encoded media data, as well as the encoded media data itself.
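For illustration, the following Python sketch walks the top-level boxes of such a container, assuming only the standard ISO base media box layout (a 4-byte big-endian size followed by a 4-byte type label); 64-bit and to-end box sizes are omitted for brevity, and this is not the server's actual ingest code.

import struct

def iter_boxes(buf, offset=0, end=None):
    """Yield (box_type, payload) pairs from an ISO base media (MP4) byte buffer."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, offset)
        if size < 8:                # size values 0 and 1 (to-end / 64-bit) not handled here
            break
        yield box_type.decode("ascii", "replace"), buf[offset + 8:offset + size]
        offset += size

# Example usage: list the top-level box labels of a fragment received from an
# encoder, where fragment_bytes is the body of the encoder's HTTP POST.
# for label, payload in iter_boxes(fragment_bytes):
#     print(label, len(payload))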
`
[0021] The index fragment component 130 creates and maintains an index table of fragments received from various encoders. Because the system 100 is receiving media fragments on an ongoing basis during an event from potentially many encoders, the system 100 uses the index table to keep track of what media fragments have been received and from which encoders (or in which formats). Each encoder may use a common method for identifying media fragments (e.g., a time stamp using a synchronized clock) so that the index fragment component 130 can correlate fragments from different encoders that represent the same period in a live event. In this way, the system 100 can detect when media fragments are missing and can provide clients with manifest information about available media fragments.
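For illustration, the bookkeeping performed by the index fragment component 130 can be sketched as a table keyed by synchronized timestamp with one entry per encoding; the class and method names below are illustrative assumptions.

from collections import defaultdict

class FragmentIndex:
    """Track which fragments have arrived, per synchronized timestamp and per encoding."""

    def __init__(self):
        # timestamp -> {encoding_id: storage_key}
        self.table = defaultdict(dict)

    def record(self, timestamp, encoding_id, storage_key):
        self.table[timestamp][encoding_id] = storage_key

    def missing_encodings(self, timestamp, expected_encodings):
        """Report encodings that have not yet delivered the fragment for this timestamp."""
        present = self.table.get(timestamp, {})
        return [e for e in expected_encodings if e not in present]

    def available_timestamps(self):
        """Timestamps usable for building a client manifest, in presentation order."""
        return sorted(self.table)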
[0022] The fragment data store 140 stores received media fragments and the created index table of fragments to provide to clients based on received client requests. The fragment data store may include a database, disk drive, or other form of data storage (e.g., a Storage Area Network (SAN) or even a cloud-based storage service).
[0023] The client interface component 150 receives client requests for media fragments and provides manifest data and media fragments to clients. When a client initially connects to the system 100, the client may send a request for a client manifest. The client interface component 150 invokes the build client manifest component 160 to create a manifest that includes information about the encodings available from the system 100, and fragments stored by the system 100 up to the current time based on the index table. The client can use this information either to begin requesting ongoing live fragments, or to skip backwards in time to earlier portions of a presentation. This can be used, for example, if the client joins a live event that is already in progress and wants to catch up with the previous portions of the event.
[0024] The build client manifest component 160 builds a manifest to satisfy a client request that includes information about each of the encodings available from the system 100 and fragments stored by the system up to the current time. The build client manifest component 160 also provides a manifest to include with each media fragment that provides information to the client about the current media fragment as well as potentially subsequent fragments. By combining the initially received manifest with subsequent manifests provided with each media fragment, the client can build an up-to-date manifest that includes complete information about the media event from the start up until the current time. When the media event completes, the client has a complete storyboard of the media event that the client can use for on-demand viewing of the media event.
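For illustration, this incremental manifest behavior can be sketched as a simple merge of look-ahead entries into the manifest received at connect time; the (timestamp, duration) pairs mirror the "t" and "d" attributes of the client manifest example shown later, and everything else is an assumption.

def merge_manifest(known_fragments, lookahead_entries):
    """Fold look-ahead entries delivered with a fragment into the client's manifest.

    known_fragments: dict mapping absolute timestamp -> duration
    lookahead_entries: iterable of (timestamp, duration) pairs for subsequent fragments
    """
    for timestamp, duration in lookahead_entries:
        known_fragments.setdefault(timestamp, duration)
    return known_fragments

# Usage: start from the manifest received at connect time, then update it with
# each fragment response; when the event completes, the client holds a complete
# storyboard usable for on-demand viewing.
manifest = {12345678: 20000000, 32345678: 20000000}
manifest = merge_manifest(manifest, [(52345678, 20000000), (72345678, 20000000)])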
`
[0025] In some embodiments, the client interface component 150 responds to client requests in a way that encourages clients to make requests a certain amount of time after media fragments are available. For example, the system 100 may not respond with a particular media fragment until the system 100 has received one or more subsequent fragments from the encoders. This allows the system 100 to include manifest information about the subsequent fragments in the current fragment response. The system 100 may also provide the client with a count of subsequent fragments that the client can expect with each media fragment. This becomes a timing hint for the client. If the client receives a media fragment with information about fewer subsequent fragments than the provided count, then the client can assume that the client is requesting data from the server too quickly. On the other hand, if the client receives a media fragment with information about more subsequent fragments than the provided count, then the client can assume that the client is requesting data from the server too slowly. Thus, in response to any particular fragment request, the build client manifest component 160 provides manifest information about as many subsequent fragments as the system 100 has received up to that point.
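For illustration, the timing hint reduces to a small client-side rule that compares the look-ahead entries actually received against the expected count; the step sizes and bounds below are hypothetical, not prescribed by the system.

def adjust_request_interval(interval_s, received_lookahead, expected_lookahead,
                            step_s=0.25, min_s=0.5, max_s=10.0):
    """Nudge the client's request interval using the server's look-ahead hint.

    Fewer look-ahead entries than expected -> requesting too quickly -> slow down.
    More entries than expected -> requesting too slowly -> speed up.
    """
    if received_lookahead < expected_lookahead:
        interval_s = min(max_s, interval_s + step_s)
    elif received_lookahead > expected_lookahead:
        interval_s = max(min_s, interval_s - step_s)
    return interval_s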
[0026] The clock synchronization component 170 synchronizes the clocks of the system 100, clients, and encoders. Although absolute time is not relevant to the system 100, being able to identify a particular fragment across multiple encoders and providing clients with the rate (i.e., cadence) at which to request fragments is relevant to the system 100. For example, if the client requests data too quickly, the server will not yet have the data and will respond with error responses (e.g., an HTTP 404 not found error response), creating many spurious requests that unnecessarily consume bandwidth. On the other hand, if the client requests data too slowly, then the client may not have data in time for playback, creating noticeable breaks in the media played back to the user. In addition, encoders produce media fragments in encodings that may differ dramatically and provide no meaningful way of correlating two fragments that represent the same period of time in different encodings, as well as where the fragments fit into an overall timeline of the media event. The clock synchronization component 170 provides this information by allowing the server, encoders, and clients to have a similar clock value at a particular time. The encoders may also mark each media fragment with the time at which the encoder created the fragment. In this way, if a client requests a particular fragment, the client will get a fragment representing the same period regardless of the encoding that the client selects.
`[0027] The computing device on which the smooth stream-
`ing system is implemented may include a central processing
`unit, memory, input devices (e.g., keyboard and pointing
`devices), output devices (e.g., display devices), and storage
`devices (e.g., disk drives or other non-volatile storage media).
`The memory and storage devices are computer-readable stor-
`age media that may be encoded with computer-executable
`instructions (e.g., software) that implement or enable the
`system. In addition, the data structures and message struc-
`tures may be stored or transmitted via a data transmission
`medium, such as a signal on a communication link. Various
`communication links may be used, such as the Internet, a
`local area network, a wide area network, a point-to-point
`dial-up connection, a cell phone network, and so on.
[0028] Embodiments of the system may be implemented in various operating environments that include personal computers, server computers, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, digital cameras, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on. The computer systems may be cell phones, personal digital assistants, smart phones, personal computers, programmable consumer electronics, digital cameras, and so on.
`
[0029] The system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
`
`9
`
`
`
`US 2010/0235528 Al
`
`Sep. 16, 2010
`
[0030] As discussed above, the build client manifest component creates a client manifest. Following is an example of a typical client manifest.
`
<?xml version="1.0" encoding="utf-8"?>
<!--Created with Expression Encoder version 2.1.1205.0-->
<SmoothStreamingMedia MajorVersion="1" MinorVersion="0"
    Duration="6537916781"
    LookAheadFragmentCount="3" IsLive="TRUE">
  <StreamIndex Type="video" Subtype="WVC1" Chunks="327"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Bitrate="1450000" FourCC="WVC1" Width="848"
        Height="480" CodecPrivateData="..." />
    <QualityLevel Bitrate="1050000" FourCC="WVC1" Width="592"
        Height="336" CodecPrivateData="..." />
    <c n="0" t="12345678" d="20000000" />
    <c n="1" t="32345678" d="20000000" />
    <c n="2" t="52345678" d="20000000" />
    <c n="3" t="72345678" d="20000000" />
  </StreamIndex>
</SmoothStreamingMedia>
`
[0031] The client manifest lists the decoding information as well as information for all the fragments that the server has archived so far. The total media fragment number and duration is only for the media fragments that the server has archived up until when the client makes the request (this allows the client to quickly build the seek bar). For each media fragment, "t" means the absolute timestamp. The client uses this value to compose the fragment URL (e.g., "Fragments(video={start time})"). LookAheadFragmentCount indicates the targeted number of subsequent fragments that "TrackFragmentReferenceBox" is going to reference, as described further herein. "IsLive" indicates whether the live broadcast is still going on.
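For illustration, the example manifest above can be parsed and a fragment URL composed from a "t" value using the Python standard library; the origin host and publishing point name are hypothetical.

import xml.etree.ElementTree as ET

def parse_client_manifest(xml_bytes):
    """Extract the pieces of the client manifest a client needs to start requesting.

    xml_bytes: the manifest response body, as bytes.
    """
    root = ET.fromstring(xml_bytes)
    stream = root.find("StreamIndex")
    return {
        "is_live": root.get("IsLive", "").upper() == "TRUE",
        "lookahead_count": int(root.get("LookAheadFragmentCount", "0")),
        "url_template": stream.get("Url"),
        "bitrates": [int(q.get("Bitrate")) for q in stream.findall("QualityLevel")],
        "start_times": [int(c.get("t")) for c in stream.findall("c")],
    }

def compose_fragment_url(base_url, info, bitrate, start_time):
    # Substitute the quality level and absolute timestamp into the manifest's URL template.
    path = info["url_template"].replace("{bitrate}", str(bitrate)) \
                               .replace("{start time}", str(start_time))
    return f"{base_url}/{path}"

# e.g. compose_fragment_url("http://origin.example.com/pubpoint.isml",
#                           info, 1450000, 12345678)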
`[0032]
`Insome embodiments, when a client requests a par-
`ticular media fragment the smooth streaming system provides
`information about subsequent media fragments. For example,
`the server may hold a particular fragmentthat is ready until
`some numberof additional fragments(e.g., two fragments)is
`available. Then, the server may send the fragment along with
`manifest information aboutthe next few fragments. The client
`can use this information to know what is coming and adapt
`appropriately. This allowstheclientto intelligently adjust the
`request rate. For example, if a client requests a fragment and
`does not have any information about later fragments, then the
`client knowsit is requesting data too fast. Ifthe client requests
`a fragment and receives information about too many later
`fragments, then the client may be requesting information too
`slow. Thus, the client can adapt using the advance metadata as
`a hint.
`
[0033] The information about subsequent media fragments may be stored in an MP4 container using a custom box. For example, the server may insert a "TrackFragmentReferenceBox" into the 'traf' box shown above with the definition below:
`
Box Type: 'uuid', {d4807ef2-ca39-4695-8e54-26cb9e46a79f}
Container: 'traf'
Mandatory: Yes
Quantity: Exactly one

aligned(8) class TrackFragmentReferenceBox
    extends Box('uuid', {d4807ef2-ca39-4695-8e54-26cb9e46a79f})
{
    unsigned int(8) version;
    bit(24) flags = 0;
    unsigned int(8) fragment_count;
    for (i=1; i <= fragment_count; i++) {
        if (version==1) {
            unsigned int(64) fragment_absolute_time;
            unsigned int(64) fragment_duration;
        } else {
            unsigned int(32) fragment_absolute_time;
            unsigned int(32) fragment_duration;
        }
    }
}
`
[0034] The fragment_count specifies the number of immediate subsequent fragments of the same track that this box is referencing. The fragments are listed in the same order as they appear in the MP4 stream. This number is equal to or greater than 1. The fragment_absolute_time specifies a 32- or 64-bit integer that indicates the absolute timestamp of the first sample in the subsequent fragment. The fragment_duration specifies a 32- or 64-bit integer that indicates the duration of the subsequent fragment. The number of subsequent fragments in the "TrackFragmentReferenceBox" (as in 'fragment_count') is a configurable setting on the server. When the server receives a fragment request, if the server has enough subsequent fragments as the configured value to fill the "TrackFragmentReferenceBox", the server can follow the normal response handling code path with default cache control settings.
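For illustration, the box defined above can be serialized with a short builder, assuming the standard 'uuid' box wire format (a 32-bit size, the type 'uuid', the 16-byte extended type, then the fields listed in the definition); this is a sketch, not the server's implementation.

import struct
import uuid

TFRB_UUID = uuid.UUID("d4807ef2-ca39-4695-8e54-26cb9e46a79f").bytes

def build_track_fragment_reference_box(entries, version=1):
    """Serialize a TrackFragmentReferenceBox for the given subsequent fragments.

    entries: list of (fragment_absolute_time, fragment_duration) tuples, in the
             order the fragments appear in the MP4 stream.
    """
    entry_fmt = ">QQ" if version == 1 else ">II"     # 64-bit fields for version 1, else 32-bit
    body = struct.pack(">B3sB", version, b"\x00\x00\x00", len(entries))  # version, flags=0, fragment_count
    for absolute_time, duration in entries:
        body += struct.pack(entry_fmt, absolute_time, duration)
    payload = TFRB_UUID + body
    size = 8 + len(payload)                          # 4-byte size + 4-byte 'uuid' type + payload
    return struct.pack(">I4s", size, b"uuid") + payload

# Example: reference the next three fragments with 64-bit times and durations.
box = build_track_fragment_reference_box(
    [(52345678, 20000000), (72345678, 20000000), (92345678, 20000000)], version=1)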
`[0035]
`Ifinstead the server hasat least one but not enough
`subsequent fragments tofill the “TrackFragmentReference-
`Box”, the server maystill return the fragment responseright
`away with the limited subsequent fragment’s information.
`The server may set a small cache timeout value (depending on
`the fragment duration) and expect to update the response with
`full “TrackFragmentReferenceBox”for future requests. The
`low amount of subsequent fragment information is a hint to
`the client that the client is requesting data too quickly. If the
`server does not have any subsequent fragmentforthis track,it
`can fail the request with a particular error code indicating
`“fragment temporarily out of range”. The error response can
`be cacheable for a small time window.Clientsdetect this error
`
`and retry the same request after a small delay. One exception
`is the case whena live session has stopped andtheserveris
`aboutto serve out the very last fragment, in which case there
`will not be any subsequent fragment information, and the
`server responds to the request with the final stream fragments.
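For illustration, the three response behaviors described above reduce to a small decision function; the cache lifetimes and return values below are illustrative assumptions rather than mandated settings.

def plan_fragment_response(available_lookahead, configured_count,
                           fragment_duration_s, live_stopped=False):
    """Decide how to answer a fragment request given the look-ahead available.

    Returns (status, cache_max_age_seconds, lookahead_to_include);
    cache_max_age_seconds of None means default cache control.
    """
    if live_stopped:
        # Final fragments of a stopped live session: no subsequent fragment
        # information exists, so respond normally without look-ahead.
        return ("ok", None, 0)
    if available_lookahead >= configured_count:
        # Normal path: a full TrackFragmentReferenceBox with default cache control.
        return ("ok", None, configured_count)
    if available_lookahead > 0:
        # Partial look-ahead: answer now, but keep the cache lifetime short so the
        # response can be refreshed with a full reference box later.
        return ("ok", max(1, int(fragment_duration_s)), available_lookahead)
    # Nothing subsequent yet: fail with a briefly cacheable
    # "fragment temporarily out of range" error so the client retries shortly.
    return ("fragment temporarily out of range", 2, 0)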
`[0036]
`FIG. 2 is a block diagram thatillustrates an operat-
`ing environment of the smooth streaming system using
`Microsoft Windows and Microsoft Internet Information
`Server (IIS), in one embodiment. The environmenttypically
`includesa source client 210, a content delivery network 240,
`and an external network 270. The sourceclientis the source of
`the media orlive event. The source client includes a media
`source 220 and one or more encoders 230. The media source
`220 may include cameras each providing multiple camera
`angles, microphones capture audio, slide presentations, text
`(such as from a closed captioning service), images, and other
`types of media. The encoders 230 encode the data from the
`media source 220 in one or more encoding formats in parallel.
`For example, the encoders 230 may produce encoded media
`in a variety of bit rates.
`
`10
`
`10
`
`
`
`US 2010/0235528 Al
`
`Sep. 16, 2010
`
[0037] The content delivery network 240, where the smooth streaming system operates, includes one or more ingest servers 250 and one or more origin servers 260. The ingest servers 250 receive encoded media in each of the encoding formats from the encoders 230 and create a manifest describing the encoded media. The ingest servers 250 may create and store the media fragments described herein or may create the fragments on the fly as they are requested. The ingest servers 250 can receive pushed data, such as via an HTTP POST, from the encoders 230, or via pull by requesting data from the encoders 230. The encoders 230 and ingest servers 250 may be connected in a variety of redundant configurations. For example, each encoder may send encoded media data to each of the ingest servers 250, or only to one ingest server until a failure occurs. The origin servers 260 are the servers that respond to client requests for media fragments. The origin servers 260 may also be configured in a variety of redundant configurations.
`[0038]
`In some embodiments, the ingest servers 250 com-
`prise one or more servers dedicated to ingesting encoder
`media streams. An administrator or content author may create
`a publishing point that defines a URL at whichclients of the
`ingest servers 250 canfind a particular media element(e.g., a
`live event). For example, using IIS, the administrator may
`publish a URL“http://ingserver/pubpoint.ism].” The publish-
`ing point is used by the encoders 230 to provide new media
`data to the ingest servers 250 andby the origin servers 260 to
`request media data from the ingest servers 250. Each encoder
`mayuse a distinguished URLto connectto the ingest servers
`250 so that