(12) Patent Application Publication    (10) Pub. No.: US 2011/0099594 A1
     CHEN et al.                        (43) Pub. Date: Apr. 28, 2011

US 20110099594A1

(54) STREAMING ENCODED VIDEO DATA

(52) U.S. Cl. .......................... 725/105; 709/231

(75) Inventors: YING CHEN, San Diego, CA (US);
                Marta Karczewicz, San Diego, CA (US)

(73) Assignee: QUALCOMM Incorporated, San Diego, CA (US)

(21) Appl. No.: 12/785,770

(22) Filed: May 24, 2010

Related U.S. Application Data

(60) Provisional application No. 61/255,767, filed on Oct. 28, 2009.

Publication Classification

(51) Int. Cl.
     H04N 7/173    (2006.01)
     G06F 15/16    (2006.01)

(57) ABSTRACT

A source device may signal characteristics of a media presentation description (MPD) file such that a destination device may select one of a number of presentations corresponding to the MPD file and retrieve one or more video files of the selected presentation. In one example, an apparatus for transporting encoded video data includes a management unit configured to receive encoded video data comprising a number of video segments and form a presentation comprising a number of video files, each of the video files corresponding to a respective one of the video segments, and a network interface configured to, in response to a request specifying a temporal section of the video data, output at least one of the video files corresponding to the number of video segments of the requested temporal section. A client may request temporally sequential fragments from different ones of the presentations.

[Front-page drawing: FIG. 1 — system in which A/V source device 20 (audio source 22, video source 24, audio encoder 26, video encoder 28, MPD management unit 30, network interface 32) transports data to A/V destination device 40 (network interface 36, web browser 38, audio decoder 46, video decoder 48, audio output 42, video output 44).]

[Sheet 1 of 7: FIG. 1]

[Sheet 2 of 7: FIG. 2 — MPD management unit 30, including a video input interface and an audio input interface receiving encoded video and audio data, an MPD creation unit with a parameter signaling unit, MPD file storage 84, and an MPD output unit with a retrieval unit and an HTTP server unit servicing streaming HTTP requests.]

[Sheet 3 of 7: FIG. 3 — MPD file storage 84 holding MPD file 90, which includes uniform resource locator 92, duration value 94, base uniform resource name 96, and presentations such as presentation 100A; presentation 100A includes presentation identifier 102A, incremental URN 104A, profile IDC value 106A, level IDC value 108A, frame rate value 110A, picture width value 112A, picture height value 114A, timing information attributes 116A, and 3GPP file identifiers 118A.]

[Sheet 4 of 7: FIG. 4 — video segments 130A–130N aligned with 3GPP files 138A–138N of presentation 136 (header data 134) and with 3GPP files 144A–144N of presentation 140 (header data 142).]

[Sheet 5 of 7: FIG. 5 — flowchart between source device and destination device: encapsulate encoded video data in MPD file (180); signal characteristics of MPD file (182); request MPD file characteristics (184); send characteristics (186); determine potential presentations (188); determine available bandwidth (190); select presentation (192); request next 3GPP file from presentation (194); retrieve and send requested 3GPP file (196); decode and display received 3GPP file (198).]

[Sheet 6 of 7: FIG. 6 — elements of an example 3GPP file: a MOOV box with header data and first, second, and third fragments at signaled start positions with fragment size signaling values, followed by an MFRA box and an MFRO box indicating the MFRA size.]

[Sheet 7 of 7: FIG. 7 — flowchart: receive request to seek to time in video (250); determine 3GPP file including seek time (252); retrieve MFRO data from 3GPP file (254); retrieve MFRA data based on MFRO data (256); retrieve MOOV header data based on MFRA data (258); determine fragment(s) to retrieve from MOOV data (260); issue partial GET request(s) to retrieve determined fragments (262).]

STREAMING ENCODED VIDEO DATA

[0001] This application claims the benefit of U.S. Provisional Application No. 61/255,767, filed Oct. 28, 2009, the entire contents of which are hereby expressly incorporated by reference.

TECHNICAL FIELD

[0002] This disclosure relates to transport of encoded video data.

BACKGROUND

[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.

[0004] Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice or temporal prediction with respect to other reference frames.

[0005] After video data has been encoded, the video data may be packetized by a multiplexer for transmission or storage. The MPEG-2 standard, for example, includes a "Systems" section that defines a transport level for many video encoding standards. MPEG-2 transport level systems may be used by MPEG-2 video encoders, or other video encoders conforming to different video encoding standards. For example, the MPEG-4 standard prescribes different encoding and decoding methodologies than those of MPEG-2, but video encoders implementing the techniques of the MPEG-4 standard may still utilize the MPEG-2 transport level methodologies. Third Generation Partnership Project (3GPP) also provides techniques for transporting encoded video data using a particular multimedia container format for the encoded video data.

SUMMARY

[0006] In general, this disclosure describes techniques for supporting streaming transport of encoded video data via a network protocol such as, for example, hypertext transfer protocol (HTTP). A source device may form a media presentation description (MPD) file that lists multiple presentations of encoded media data. Each presentation corresponds to a different encoding for a common video. For example, each presentation may have different expectations of a destination device in terms of encoding and/or rendering capabilities, as well as various average bit rates.

[0007] The source device may signal the characteristics of each presentation, allowing a destination device to select one of the presentations based on the decoding and rendering capabilities of the destination device, and to switch between different presentations based on variation in the network environment and the bandwidths of the presentations. The presentations may be pre-encoded or real-time encoded and stored in a server as file(s) or file fragments compliant to, e.g., the ISO base media file format and its extensions. The destination device may retrieve data from one or more of the presentations at various times over, for example, HTTP. The source device may further signal fragments of each presentation, such as byte ranges and corresponding temporal locations of video fragments within each presentation, such that destination devices may retrieve individual video fragments from various presentations based on, e.g., HTTP requests.

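By way of illustration only, the kind of information such an MPD file may carry can be sketched as a plain data structure (a minimal sketch in Python; the field names and values are hypothetical and are not prescribed by this disclosure):

    # Illustrative presentation description data for one MPD file.
    # All field names are assumed for this example, not defined here.
    mpd = {
        "url": "http://www.example.com/program",  # where the MPD file is stored
        "duration_seconds": 600.0,                # duration of the video program
        "presentations": [
            {
                "identifier": "pre1",
                "avg_bitrate_bps": 500_000,       # average bit rate, signaled per presentation
                "max_bitrate_bps": 750_000,
                "expected_decoding": {"profile_idc": 66, "level_idc": 30},
                "expected_rendering": {"width": 320, "height": 240, "frame_rate": 15},
                # byte ranges and temporal locations of video fragments
                "fragments": [{"start_byte": 0, "end_byte": 99_999,
                               "start_time": 0.0, "end_time": 10.0}],
            },
            # ... further presentations, each a different encoding of the same video
        ],
    }
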
[0008] In one example, a method for transporting encoded video data includes receiving, by a source video device, encoded video data comprising a number of video segments, forming a presentation comprising a number of video files, each of the video files corresponding to a respective one of the video segments, and, in response to a request specifying a temporal section of the video data, outputting at least one of the video files corresponding to the number of video segments of the requested temporal section.

[0009] In another example, an apparatus for transporting encoded video data includes a management unit configured to receive encoded video data comprising a number of video segments and form a presentation comprising a number of video files, each of the video files corresponding to a respective one of the video segments, and a network interface configured to, in response to a request specifying a temporal section of the video data, output at least one of the video files corresponding to the number of video segments of the requested temporal section.

[0010] In another example, an apparatus for transporting encoded video data includes means for receiving encoded video data comprising a number of video segments, means for forming a presentation comprising a number of video files, each of the video files corresponding to a respective one of the video segments, and means for outputting, in response to a request specifying a temporal section of the video data, at least one of the video files corresponding to the number of video segments of the requested temporal section.

[0011] In another example, a computer-readable storage medium comprises instructions that, when executed, cause a processor of a source device for transporting encoded video data to receive encoded video data comprising a number of video segments, form a presentation comprising a number of video files, each of the video files corresponding to a respective one of the video segments, and, in response to a request specifying a temporal section of the video data, output at least one of the video files corresponding to the number of video segments of the requested temporal section.

[0012] In still another example, a method for retrieving encoded video data includes retrieving, by a client device, presentation description data that describes characteristics of a presentation of video data, wherein the video data comprises a number of video segments, and wherein the presentation comprises a number of video files, each of the video files corresponding to a respective one of the video segments, submitting a request specifying a temporal section of the video data to a source device, receiving, in response to the request, at least one of the video files corresponding to the number of video segments of the requested temporal section from the source device, and decoding and displaying the at least one of the video files.

[0013] In another example, an apparatus for retrieving encoded video data includes a network interface, a control unit configured to retrieve, via the network interface, presentation description data that describes characteristics of a presentation of video data, wherein the video data comprises a number of video segments, and wherein the presentation comprises a number of video files, each of the video files corresponding to a respective one of the video segments, to submit a request specifying a temporal section of the video data to a source device, and to receive, in response to the request, at least one of the video files corresponding to the number of video segments of the requested temporal section from the source device, a video decoder configured to decode the at least one of the video files, and a user interface comprising a display configured to display the decoded at least one of the video files.

[0014] In another example, an apparatus for retrieving encoded video data includes means for retrieving presentation description data that describes characteristics of a presentation of video data, wherein the video data comprises a number of video segments, and wherein the presentation comprises a number of video files, each of the video files corresponding to a respective one of the video segments, means for submitting a request specifying a temporal section of the video data to a source device, means for receiving, in response to the request, at least one of the video files corresponding to the number of video segments of the requested temporal section from the source device, and means for decoding and displaying the at least one of the video files.

[0015] In another example, a computer-readable storage medium comprises instructions that, when executed, cause a processor of a device for retrieving encoded video data to retrieve presentation description data that describes characteristics of a presentation of video data, wherein the video data comprises a number of video segments, and wherein the presentation comprises a number of video files, each of the video files corresponding to a respective one of the video segments, submit a request specifying a temporal section of the video data to a source device, receive, in response to the request, at least one of the video files corresponding to the number of video segments of the requested temporal section from the source device, cause a video decoder of the client device to decode the at least one of the video files, and cause a user interface of the client device to display the at least one of the decoded video files.

[0016] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

[0017] FIG. 1 is a block diagram illustrating an example system in which an audio/video (A/V) source device transports audio and video data to an A/V destination device.

[0018] FIG. 2 is a block diagram illustrating an example arrangement of components of a multiplexer.

[0019] FIG. 3 is a block diagram illustrating an example set of program specific information tables.

[0020] FIG. 4 is a conceptual diagram illustrating alignment between Third Generation Partnership Project (3GPP) files of various presentations and corresponding video segments.

[0021] FIG. 5 is a flowchart illustrating an example method for transporting encoded video data from a source device to a destination device.

[0022] FIG. 6 is a block diagram illustrating elements of an example 3GPP file.

[0023] FIG. 7 is a flowchart illustrating an example method for requesting a fragment of a 3GPP file in response to a seek request for a temporal location within the 3GPP file.

DETAILED DESCRIPTION

[0024] The techniques of this disclosure are generally directed to supporting streaming transport of video data using a protocol, such as, for example, hypertext transfer protocol (HTTP) and the HTTP streaming application of HTTP. In general, references to HTTP may include references to HTTP streaming in this disclosure. This disclosure provides a media presentation description (MPD) file that signals characteristic elements of a number of presentations of video data, such as, for example, where fragments of video data are stored within the presentations. Each presentation may include a number of individual files, e.g., Third Generation Partnership Project (3GPP) files. In general, each presentation may include a set of individual characteristics, such as, for example, a bit rate, frame rate, resolution, interlaced or progressive scan type, encoding type (e.g., MPEG-1, MPEG-2, H.263, MPEG-4/H.264, H.265, etc.), or other characteristics.

[0025] Each of the 3GPP files can be individually stored by a server and individually retrieved by a client, e.g., using HTTP GET and partial GET requests. HTTP GET and partial GET requests are described in R. Fielding et al., "Hypertext Transfer Protocol—HTTP/1.1," Network Working Group, RFC 2616, June 1999, available at http://tools.ietf.org/html/rfc2616. In accordance with the techniques of this disclosure, 3GPP files of each presentation may be aligned such that they correspond to the same section of video, that is, the same set of one or more scenes. Moreover, a server may name corresponding 3GPP files of each presentation using a similar naming scheme. In this manner, an HTTP client may easily change presentations as network conditions change. For example, when a high amount of bandwidth is available, the client may retrieve 3GPP files of a relatively higher quality presentation, whereas when a lower amount of bandwidth is available, the client may retrieve 3GPP files of a relatively lower quality presentation.

[0026] This disclosure also provides techniques for signaling characteristics of presentations and corresponding 3GPP files summarized in an MPD file. As an example, this disclosure provides techniques by which a server may signal characteristics such as, for example, an expected rendering capability and decoding capability of a client device for each presentation. In this manner, a client device can select between the various presentations based on decoding and rendering capabilities of the client device. As another example, this disclosure provides techniques for signaling an average bit rate and maximum bit rate for each presentation. In this manner, a client device can determine bandwidth availability and select between the various presentations based on the determined bandwidth.

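Purely as an illustration of the selection logic just described, a client-side sketch might look as follows (it reuses the hypothetical presentation fields from the earlier snippet; the policy itself is only an example, not a requirement of this disclosure):

    def select_presentation(presentations, decoder_caps, bandwidth_bps):
        # Keep only presentations whose signaled expected decoding
        # capability the client's decoder satisfies.
        supported = [
            p for p in presentations
            if p["expected_decoding"]["profile_idc"] in decoder_caps["profiles"]
            and p["expected_decoding"]["level_idc"] <= decoder_caps["max_level"]
        ]
        # Prefer the highest signaled average bit rate that the measured
        # bandwidth can sustain; fall back to the lowest-rate presentation.
        affordable = [p for p in supported if p["avg_bitrate_bps"] <= bandwidth_bps]
        if affordable:
            return max(affordable, key=lambda p: p["avg_bitrate_bps"])
        return min(supported, key=lambda p: p["avg_bitrate_bps"])  # assumes non-empty
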
[0027] In accordance with the techniques of this disclosure, a server may use a naming convention that indicates 3GPP files of each presentation that correspond to the same scene. This disclosure provides techniques for aligning the 3GPP files of each presentation such that each scene corresponds to one of the 3GPP files in each presentation. For example, a server may name 3GPP files of each presentation corresponding to a scene lasting from time T to time T+N using a naming convention similar to "[program]_preX_T_T+N", where T and T+N in the naming convention correspond to values for time T and time T+N, "[program]" corresponds to the name of the video, and "_preX" corresponds to an identifier of the presentation (e.g., "pre2" for presentation 2). Accordingly, the 3GPP files of each presentation may be aligned such that the file sizes of the 3GPP files in the same time period can be used to derive the instantaneous bit rate for each presentation.

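To make the quoted "[program]_preX_T_T+N" convention concrete, such a name can be generated mechanically (a minimal sketch; the separators and the use of integer seconds are assumptions, since the disclosure does not fix an exact syntax):

    def gpp_file_name(program, presentation_number, start_sec, end_sec):
        # "[program]_preX_T_T+N": e.g., presentation 2 of "program",
        # covering the scene from T = 100 s to T+N = 110 s.
        return "%s_pre%d_%d_%d" % (program, presentation_number, start_sec, end_sec)

    assert gpp_file_name("program", 2, 100, 110) == "program_pre2_100_110"
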
[0028] In addition, the server may signal the starting time as well as the ending time and/or duration of each of the 3GPP files for each presentation. In this manner, a client can retrieve a particular 3GPP file using an HTTP GET based on the name of the file, by retrieving the starting time and ending time of the 3GPP file as signaled by the server and automatically generating the file name based on the starting time and ending time. In addition, the server may also signal byte ranges for each of the 3GPP files of each presentation. Accordingly, the client may retrieve all or a portion of a 3GPP file using a partial GET based on the automatically generated name and a byte range of the 3GPP file to be retrieved. The client may use the HEAD method of HTTP to retrieve the file size of a particular 3GPP file. In general, a HEAD request retrieves header data, without corresponding body data, for a URN or URL to which the HEAD request is directed.

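The three request types just described can be sketched with Python's standard http.client (the host name is hypothetical, and the byte range is assumed to have been signaled by the server):

    import http.client

    conn = http.client.HTTPConnection("www.example.com")   # hypothetical host
    name = gpp_file_name("program", 2, 100, 110)           # automatically generated

    # HEAD: header data only, e.g., to learn the 3GPP file size.
    conn.request("HEAD", "/" + name)
    resp = conn.getresponse()
    resp.read()                                            # HEAD carries no body
    file_size = int(resp.getheader("Content-Length"))

    # HTTP GET of the entire 3GPP file by its generated name.
    conn.request("GET", "/" + name)
    whole_file = conn.getresponse().read()

    # Partial GET: a Range header retrieves only the signaled byte range;
    # the server answers 206 Partial Content.
    conn.request("GET", "/" + name, headers={"Range": "bytes=0-99999"})
    fragment = conn.getresponse().read()
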
[0029] FIG. 1 is a block diagram illustrating an example system 10 in which audio/video (A/V) source device 20 transports audio and video data to A/V destination device 40. System 10 of FIG. 1 may correspond to a video teleconference system, a server/client system, a broadcaster/receiver system, gaming system, or any other system in which video data is sent from a source device, such as A/V source device 20, to a destination device, such as A/V destination device 40. In some examples, audio encoder 26 may comprise a voice encoder, also referred to as a vocoder.

[0030] A/V source device 20, in the example of FIG. 1, includes audio source 22, video source 24, audio encoder 26, video encoder 28, media presentation description (MPD) management unit 30, and network interface 32. Audio source 22 may comprise, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may comprise a storage medium storing previously recorded audio data, an audio data generator such as a computerized synthesizer, or any other source of audio data. Video source 24 may comprise a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generation unit for computer graphics, or any other source of video data. Raw audio and video data may comprise analog or digital data. Analog data may be digitized before being encoded by audio encoder 26 and/or video encoder 28.

[0031] Audio source 22 may obtain audio data from a speaking participant while the speaking participant is speaking, and video source 24 may simultaneously obtain video data of the speaking participant. In other examples, audio source 22 may comprise a computer-readable storage medium comprising stored audio data, and video source 24 may comprise a computer-readable storage medium comprising stored video data. In this manner, the techniques described in this disclosure may be applied to live, streaming, real-time audio and video data and/or to archived, pre-recorded audio and video data.

[0032] Audio frames that correspond to video frames are generally audio frames containing audio data that was captured by audio source 22 contemporaneously with video data captured by video source 24 that is contained within the video frames. For example, while a speaking participant generally produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time, that is, while audio source 22 is capturing the audio data. Hence, an audio frame may temporally correspond to one or more particular video frames. Accordingly, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time and for which an audio frame and a video frame comprise, respectively, the audio data and the video data that was captured at the same time. Audio data may also be added separately, e.g., soundtrack information, added sounds, music, sound effects, and the like.

[0033] Audio encoder 26 may encode a timestamp in each encoded audio frame that represents a time at which the audio data for the encoded audio frame was recorded, and similarly, video encoder 28 may encode a timestamp in each encoded video frame that represents a time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may comprise an audio frame comprising a timestamp and a video frame comprising the same timestamp. A/V source device 20 may include an internal clock from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or that audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.

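The correspondence rule described here can be expressed compactly (a sketch; frame objects exposing a timestamp attribute are assumed, and the pairing is simplified to one video frame per timestamp):

    def corresponding_frames(audio_frames, video_frames):
        # Pair frames that carry the same capture timestamp.
        video_by_ts = {v.timestamp: v for v in video_frames}
        return [(a, video_by_ts[a.timestamp])
                for a in audio_frames if a.timestamp in video_by_ts]
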
[0034] Audio source 22 may send data to audio encoder 26 corresponding to a time at which audio data was recorded, and video source 24 may send data to video encoder 28 corresponding to a time at which video data was recorded. In some examples, audio encoder 26 may encode a sequence identifier in encoded audio data to indicate a relative temporal ordering of encoded audio data but without necessarily indicating an absolute time at which the audio data was recorded, and similarly, video encoder 28 may also use sequence identifiers to indicate a relative temporal ordering of encoded video data. Similarly, in some examples, a sequence identifier may be mapped or otherwise correlated with a timestamp.

[0035] Audio encoder 26 and video encoder 28 provide encoded data to MPD management unit 30. In general, MPD management unit 30 stores summarization regarding encoded audio and video data in the form of MPD files corresponding to the encoded audio and video data, in accordance with the techniques of this disclosure. As discussed in greater detail below, an MPD file describes a number of presentations, each presentation having a number of video files, e.g., formed as Third Generation Partnership Project (3GPP) files. MPD management unit 30 may create the same number of 3GPP files in each presentation and may align the 3GPP files of each presentation such that similarly positioned 3GPP files correspond to the same video segment. That is, similarly positioned 3GPP files may correspond to the same temporal video fragment. MPD management unit 30 may also store data that describes characteristics of the 3GPP files of each presentation, such as, for example, the durations of the 3GPP files.

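One way to picture the alignment that MPD management unit 30 maintains (illustrative only; the segment boundaries and two-presentation layout are assumed, and gpp_file_name is the hypothetical helper sketched above):

    # One 3GPP file per video segment, in the same order in every
    # presentation, so index i always covers the same temporal fragment.
    segments = [(0, 10), (10, 20), (20, 30)]          # (start_sec, end_sec)
    presentations = {
        "pre1": [gpp_file_name("program", 1, s, e) for s, e in segments],
        "pre2": [gpp_file_name("program", 2, s, e) for s, e in segments],
    }
    # presentations["pre1"][1] and presentations["pre2"][1] both cover
    # 10-20 s, so a client may switch presentations at any file boundary.
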
[0036] MPD management unit 30 may interact with network interface 32 to provide video data to a client, such as A/V destination device 40. Network interface 32 may implement HTTP (or other network protocols) to allow destination device 40 to request individual 3GPP files listed in an MPD file which is stored by MPD management unit 30. Network interface 32 may therefore respond to HTTP GET requests for 3GPP files, partial GET requests for individual byte ranges of 3GPP files, HEAD requests to provide header information for the MPD and/or 3GPP files, and other such requests. Accordingly, network interface 32 may deliver data to destination device 40 that is indicative of characteristics of an MPD file, such as, for example, a base name for the MPD file, characteristics of presentations of the MPD file, and/or characteristics of 3GPP files stored in each presentation. Data that describe characteristics of a presentation of an MPD file, the MPD file itself, and/or 3GPP files corresponding to the MPD file may be referred to as "presentation description data." In some examples, network interface 32 may instead comprise a network interface card (NIC) that extracts application-layer data from received packets and then passes the application-layer packets to MPD management unit 30. In some examples, MPD management unit 30 and network interface 32 may be functionally integrated.

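For illustration, a server answering the request types that network interface 32 handles could be sketched on Python's standard http.server (a minimal sketch: error handling, the Content-Range header, and open-ended ranges are omitted):

    import http.server

    class GppRequestHandler(http.server.SimpleHTTPRequestHandler):
        # The base class already answers GET and HEAD for files in the
        # serving directory; this subclass adds partial GET (Range) support.
        def do_GET(self):
            range_header = self.headers.get("Range")
            if range_header is None:
                return super().do_GET()              # ordinary full-file GET
            start, end = range_header.removeprefix("bytes=").split("-")
            with open(self.translate_path(self.path), "rb") as f:
                f.seek(int(start))
                body = f.read(int(end) - int(start) + 1)
            self.send_response(206)                  # Partial Content
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    # http.server.HTTPServer(("", 80), GppRequestHandler).serve_forever()
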
[0037] In this manner, a user may interact with destination device 40 via a web browser 38 application executed on destination device 40, in the example of FIG. 1, to retrieve video data. Web browser 38 may initially retrieve a first video file or its fragments from one of the presentations stored by MPD management unit 30, then retrieve subsequent video files or fragments as the first video file is being decoded and displayed by video decoder 48 and video output 44, respectively. Destination device 40 may include a user interface that includes video output 44, e.g., in the form of a display, and audio output 42, as well as other input and/or output devices such as, for example, a keyboard, a mouse, a joystick, a microphone, a touchscreen display, a stylus, a light pen, or other input and/or output devices. When the video files include audio data, audio decoder 46 and audio output 42 may decode and present the audio data, respectively. Moreover, a user may "seek" to a particular temporal location of a video presentation. For example, the user may seek in the sense that the user requests a particular temporal location within the video data, rather than watching the video file in its entirety from start to finish. The web browser may cause a processor or other processing unit of destination device 40 to determine one of the video files that includes the temporal location of the seek, then request that video file from source device 20.

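The file-lookup step of such a seek reduces to simple arithmetic when files have a fixed signaled duration (a sketch under that assumption; the disclosure also permits per-file durations):

    def file_index_for_seek(seek_time_sec, file_duration_sec):
        # With temporally aligned files of equal duration, the file that
        # includes the seek time follows from integer division.
        return int(seek_time_sec // file_duration_sec)

    # E.g., seeking to 95 s with 10-second files yields index 9 (90-100 s);
    # the client then generates that file's name and requests it.
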
[0038] In some examples, a control unit within destination device 40 may perform the functionality of web browser 38. That is, the control unit may execute instructions for web browser 38 to submit requests to source device 20 via network interface 36, to select between presentations of an MPD file, and to determine available bandwidth of network connection 34. The instructions for web browser 38 may be stored in a computer-readable storage medium. The control unit may further form requests, e.g., HTTP GET and partial GET requests, for individual 3GPP files from source device 20, as described in this disclosure. The control unit may comprise a general purpose processor and/or one or more dedicated hardware units such as, for example, ASICs, FPGAs, or other hardware or processing units or circuitry. The control unit may further, in some examples, perform the functionality of any of audio decoder 46, video decoder 48, and/or any other functionality described with respect to destination device 40.

[0039] In general, presentations of an MPD file differ by characteristics such as, for example, expected rendering capabilities of a destination device, expected decoding capabilities of a destination device, and average bit rate for video files of the presentations. MPD management unit 30 may signal the expected rendering capabilities, expected decoding capabilities, and average bit rates for the presentations in presentation headers of the MPD file. In this manner, destination device 40 may determine which of the presentations from which to retrieve video files, for example, based on rendering capabilities of video output 44 and/or decoding capabilities of video decoder 48.

[0040] Destination device 40 may further determine current bandwidth availability, e.g., of network connection 34, and select a presentation based on the average bit rate for the presentation. That is, when video output 44 and video decoder 48 have the capability to render and decode, respectively, video files of more than one of the presentations of an MPD file, destination device 40 may select one of the presentations based on current bandwidth availability. Likewise, when the bandwidth availability changes, destination device 40 may dynamically change between supported presentations. For example, when bandwidth becomes restricted, destination device 40 may retrieve a next video file from a presentation having relatively lower bit rate video files, whereas when bandwidth expands, destination device 40 may retrieve a next video file from a presentation having relatively higher bit rate video files.

[0041] By temporally aligning the video files of each presentation, the dynamic switch between presentations may be simplified for destination devices such as destination device 40. That is, destination device 40 may, upon determining that bandwidth conditions have changed, determine a time period for which video data has already been retrieved, and then retrieve the next video file from one of the presentations based on the bandwidth conditions. For example, if the last video file retrieved by destination device 40 ends at time T, and the next file is of duration N, destination device 40 may retrieve the video file from time T to time T+N from any of the presentations, based on the bandwidth conditions, because the video files of each of the presentations are temporally aligned.

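A retrieval loop of the kind this implies could look as follows (illustrative only; it reuses the hypothetical helpers sketched above, and "number" is an assumed per-presentation field):

    def stream(presentations, decoder_caps, measure_bandwidth, segments):
        # Because the presentations are temporally aligned, the file for
        # T..T+N may come from whichever presentation the bandwidth
        # measured at each file boundary favors.
        for start_sec, end_sec in segments:
            chosen = select_presentation(presentations, decoder_caps,
                                         measure_bandwidth())
            yield gpp_file_name("program", chosen["number"], start_sec, end_sec)
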
[0042] Moreover, MPD management unit 30 and web browser 38 may be configured with a common naming convention for video files. In general, each video file (e.g., each 3GPP file) may comprise a name based on a uniform resource locator (URL) at which the MPD file is stored, a uniform resource name (URN) of the MPD file at the URL, a name of a presentation, a start time, and an end time. Thus both MPD management unit 30 and web browser 38 may be configured to use a naming