`Goetz et al.
`
`[54] SYSTEM, DEVICE, AND METHOD FOR
`STREAMING A MULTIMEDIA FILE
`
[75] Inventors: Tom Goetz, Attleboro; Manickam R. Sridhar,
     Holliston; Mukesh Prasad, Chestnut Hill, all of Mass.

[73] Assignee: Motorola, Inc., Schaumburg, Ill.

[21] Appl. No.: 08/711,701

[22] Filed: Sep. 6, 1996

[51] Int. Cl.6: G06F 15/16
[52] U.S. Cl.: 709/231; 709/232
[58] Field of Search: 395/200.6, 200.61, 200.66, 200.68,
     200.33, 200.62; 707/501, 515, 10, 104

[56] References Cited

U.S. PATENT DOCUMENTS
`
5,197,083   3/1993   Gandini et al. ........... 375/10
5,231,492   7/1993   Dangi et al. ............. 358/143
5,333,299   7/1994   Koval et al. ............. 395/550
5,339,413   8/1994   Koval et al. ............. 395/650
5,463,620  10/1995   Sriram ................... 370/60
5,485,147   1/1996   Jaffe et al. ............. 340/825.5
5,487,167   1/1996   Dinallo et al. ........... 395/650
5,581,703  12/1996   Baugher et al. ........... 395/200.06
5,602,582   2/1997   Wanderscheid et al. ...... 348/12
5,603,029   2/1997   Aman et al. .............. 395/675
5,652,749   7/1997   Davenport et al. ......... 370/466
5,659,539   8/1997   Porter et al. ............ 395/200.61
5,664,226   9/1997   Czako et al. ............. 395/872
5,691,713  11/1997   Ishida ................... 340/870.01
5,727,002   3/1998   Miller et al. ............ 371/32
5,732,216   3/1998   Logan et al. ............. 395/200.33
5,737,495   4/1998   Adams et al. ............. 395/615
5,751,883   5/1998   Ottensen et al. .......... 386/27
`
US005928330A
[11] Patent Number: 5,928,330
[45] Date of Patent: Jul. 27, 1999
`
5,867,230   2/1999   Wang et al. .............. 348/845
`
`OTHER PUBLICATIONS
`
E-mail message containing search report having several
abstracts (e-mail dated Apr. 29, 1996; performed Apr. 29,
1996).
RealMedia Overview, "The First Open Multimedia Streaming
Platform," http://www.realaudio.com/ or e-mail
radinfo@prognet.com; copyright © Progressive Networks,
1996.
RealMedia, "Technical White Paper," http://www.realaudio.com/
or e-mail radinfo@prognet.com; copyright © Progressive
Networks, 1996.
`
Primary Examiner-Mehmet B. Geckil
Attorney, Agent, or Firm-Hugh Dunlop; Joanne N. Pappas

[57] ABSTRACT
`
A system and device for, and method of, presenting multimedia
information. In a client-server context, the invention
includes a client that receives units of the multimedia
information and presents the information on a presentation
device. Each unit of information has an importance value
assigned to it, which in an exemplary embodiment is indicative
of the unit's importance in relation to the quality of the
presentation. The invention includes a mechanism for
characterizing the performance capabilities of the system. For
example, several conventional statistics may be gathered
and analyzed concurrently with the streaming operation and
before it begins. The invention includes a mechanism for
inferring network conditions from the characterized
performance. The server may then stream the units of multimedia
information to the client at a streaming rate and adapt the
streaming rate of the streaming in response to the importance
information and in response to the inferred network
conditions.
`
`51 Claims, 8 Drawing Sheets
`
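[Representative drawing: FIG. 13, flow diagram of the server's
retransmit logic. Legible box labels: THROTTLE DOWN (DROP
PACKET IF IMPORTANCE = 5); THROTTLE DOWN (DROP PACKET IF
IMPORTANCE > 4); THROTTLE UP (RETRANSMIT IF IMPORTANCE < 3);
DON'T THROTTLE; SEND PACKETS. Decision branches are marked
YES and NO; reference numerals 1320, 1340, 1360, 1370.]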
`
`CISCO Exhibit 1010, pg. 1
`
`
`
[Sheet 1 of 8]
[FIG. 1: multimedia file 100, consisting of a file header 110
and a file body 120.]
[FIG. 2A: file header 110, consisting of a file header
preamble 210 followed by media instance descriptors 220.]
[FIG. 2B: file header preamble 210, drawn in rows spanning
bit positions 0 to 31. Fields: file signature 211; header
size 212; version major 213; version minor 214; number of
media instances 215; reserved 216; reserved 217.]
[FIG. 2C: media instance descriptor 220, drawn in rows
spanning bit positions 0 to 31. Fields: media block offset
221; media type 222; encoding type 223; subtype 224;
rate 225.]
`
`
`
[Sheet 2 of 8]
[FIG. 3: file body 120, consisting of media blocks 310; each
media block has a media block directory 320 and a media
block body 330.]
[FIG. 4A: generic media block directory 400, consisting of a
preamble 410 followed by packet descriptors 420.]
[FIG. 4B: directory preamble 410. Fields: packet count 411;
reserved 412; presentation length 413; reserved 414.]
[FIG. 4C: packet descriptor 420. Fields: packet offset 421;
time stamp 422; importance 423; reserved; packet length 425.]
[FIG. 4D: generic media block body 430.]
`
`
`
[Sheet 3 of 8]
[FIG. 4E: generic packet 440. Legible fields: channel;
importance; packet length; sequence; time stamp; segment #;
total segments; reserved; data. Reference numerals 441-448
and 460.]
[FIG. 5: H.263 media block format. Legible fields: max frame
size; packet count; presentation length; I-frame count.
Reference numerals 511-514.]
[FIG. 6: MIDI media block preamble. Legible fields: packet
count; MIDI format; presentation length; track count;
divisor. Reference numerals 611-616.]
[FIG. 7: MIDI packet 700. Legible fields: channel; importance;
packet length; sequence; time stamp; start time; end time;
data. Reference numerals 741-750.]
`
`
`
[Sheet 4 of 8]
[FIG. 8: system for creating a multimedia file. Blocks: VCR
805; camera 810; microphone 815; video and audio capture 820;
video encoder 825; audio encoder 830; video packetizer 835;
audio packetizer 840; QOS manager 845; file 850.]
[FIG. 9: client 910 and server 920 connected by a channel 930
carrying control 931 and data 932.]
`
`
`
[Sheet 5 of 8]
[FIG. 10: client 910 and server 920 components. Client blocks:
web browser application; multimedia client application.
Server blocks: web server; web pages; multimedia server
application; multimedia file. Other reference numerals are
not legible.]
`
`
`
[Sheet 6 of 8]
[FIG. 11: flow diagram of the initial interaction between
client and server, from START 1100 to END 1199. Legible
client-side boxes: SEND MEDIA REQUEST MESSAGE; SEND MESSAGE
SPECIFYING RATE; SEND "GO" MESSAGE. Legible server-side
boxes: SEND MEDIA RESPONSE MESSAGE; SEND MESSAGE FOR
CALCULATING ROUND-TRIP DELAY; SEND CONFIG MESSAGE; SEND
RESPONSE MESSAGE; SEND "GO" MESSAGE. Legible reference
numerals include 1150 and 1180.]
`
`
`
[Sheet 7 of 8]
[FIG. 12: flow diagram of the server's streaming logic, from
START 1200 to 1299. Legible boxes: STREAM A TIME SLICE OF
PACKETS; GO TO SLEEP; RETRANSMIT; GO TO SLEEP. Reference
numerals 1220, 1230, 1235, 1240, 1260, 1265; decision box
labels are not legible.]
`
`
`
[Sheet 8 of 8]
[FIG. 13: flow diagram of the server's retransmit logic, from
1300 to 1399. Legible boxes: THROTTLE DOWN (DROP PACKET IF
IMPORTANCE = 5); THROTTLE DOWN (DROP PACKET IF IMPORTANCE
> 4); THROTTLE UP (RETRANSMIT IF IMPORTANCE < 3); DON'T
THROTTLE; SEND PACKETS. Decision branches are marked YES and
NO; reference numerals 1320, 1340, 1360, 1370, 1380.]
`
`
`
SYSTEM, DEVICE, AND METHOD FOR
STREAMING A MULTIMEDIA FILE

CROSS-REFERENCE TO RELATED
APPLICATIONS

This application is related to the following U.S. patent
applications, all of which are assigned to the assignee of this
application and all of which are incorporated by reference
herein:
Device, System And Method Of Real-Time Multimedia
Streaming (Ser. No. 08/636,417, to Qin-Fan Zhu,
Manickam R. Sridhar, and M. Vedat Eyuboglu, filed
Apr. 23, 1996); and
Improved Video Encoding System And Method (Ser. No.
08/711,702, to Manickam R. Sridhar and Feng Chi
Wang, filed on even date herewith).

BACKGROUND

1. Field of the Invention
The invention generally relates to real-time multimedia
applications and, more particularly, to the streaming of
real-time multimedia information over a communication
network.
2. Discussion of Related Art
Generally speaking, multimedia applications present
related media information, such as video, audio, music, etc.,
on a presentation device, such as a computer having a
display and sound system. Some multimedia applications
are highly interactive, whereas other applications are far less
interactive. For example, a game is a highly interactive
application in which the application must respond to many
user inputs such as keyboard commands and joystick
movements, whereas viewing a video clip is less interactive
and may only involve start and stop commands. Moreover,
multimedia applications may be directed to standalone
single computer contexts, or they may be directed to
distributed, network-based contexts.
At a very high level of abstraction, following the
producer-consumer paradigm, any multimedia application
involves producing and consuming the related multimedia
information. The above examples of highly interactive and
less interactive and standalone and network-based
applications differ in the manner in which the information is
produced and consumed and the complexity in controlling
the production and consumption.
For example, in PCs and other standalone contexts, the
information need only be read off a local CD-ROM or the
like, and thus, producing the information to the consumer
involves relatively predictable characteristics and requires
relatively simple control logic. In network-based contexts,
on the other hand, the information must be produced over
the network and is thus subject to the unpredictable
characteristics and intricacies of the network: data may be lost,
performance may vary over time, and so on. Consequently,
the control logic may need to be relatively complicated.
In either the standalone or network-based context,
consuming the information involves presenting the related
information to the corresponding presentation components
in a controlled manner and in real time. For example, to
provide intelligible audio-video clips, the video data must be
provided to a video driver and audio data must be provided
to a sound card driver within specified timing tolerances to
maintain intra- and inter-stream synchronism. Intra-stream
synchronism means that a given stream, such as audio, is
presented in synchronism within specified time
relationships, in short, that the stream itself is coherent.
Inter-stream synchronism means that multiple related
streams are presented in synchronism with respect to each
other. Concerning intra-stream synchronism, all streams, for
the most part, should present data in order. However, users
are more forgiving if some streams, such as video, leave out
certain portions of the data than they are of other streams,
such as audio, doing the same. A video stream with missing
data may appear a little choppy, but an audio stream with
missing data may be completely unintelligible. Concerning
inter-stream synchronism, poor control will likely result in
poor "lip synch," making the presentation appear and sound
like a poorly dubbed movie.
In the network-based context, one simple model of
producing the information involves the consuming entity
requesting the downloading of the multimedia information for
an entire presentation from a server, storing the multimedia
information. Once downloaded, the client may then
consume, or present, the information. Although relatively
simple to implement, this model has the disadvantage of
requiring the user to wait for the downloading to complete
before the presentation can begin. This delay can be
considerable and is especially annoying when a user finds that
he or she is only interested in a small portion of the overall
presentation.
A more sophisticated model of producing information
involves a server at one network site "streaming" the
multimedia information over the network to a client at another
site. The client begins to present the information as it arrives,
rather than waiting for the entire data set to arrive before
beginning presentation. This benefit of reduced delay is at
the expense of increased complexity. Without the proper
control, data overflow and underflow may occur, seriously
degrading the quality of the presentation.
Many modern multimedia applications involve the
transfer of a large amount of information, placing a considerable
load on the resources of the network, server, and client. The
use of network-based multimedia applications appears to be
growing. As computers become more powerful and more
people access network-based multimedia applications, there
will be an increased demand for longer, more complicated,
more flexible multimedia applications, thereby placing even
larger loads and demands on the network, server, and client.
The demand placed on servers by these ever-growing
multimedia applications is particularly high, as individual
servers are called upon to support larger numbers of
simultaneous uses: it is not uncommon even today for an Internet
server to handle thousands of simultaneous channels.
Consequently, there is a need in the art for a device, system,
and method that, among other things,
can handle longer, more complicated presentations;
utilize a network's resources more efficiently; and
utilize a server's and client's resources more efficiently.

SUMMARY
In short, the invention involves a new file format for
organizing related multimedia information and a system and
device for, and method of, using the new file format. The
invention eases the management and control of multimedia
presentations, having various media streams, each of a
specific type, each specific type further classified by
encoding type, subtype, and encoding rate. Thus, with the
invention, an application may support several instances of a
particular media type, with each instance having different
characteristics. For example, the application may support
multiple audio streams, with each stream in a different
`
`
`
language, from which a user may select. Analogously, with
the invention, the application may choose a given instance
of a media stream based on the network's characteristics; for
example, the application may choose an audio subtype that
is encoded for a transmission rate that matches the network's
characteristics. The invention reduces a server's memory
and processing requirements, thus allowing a server to
simultaneously service more requests and support more
channels. And, the invention dynamically adapts the media's
streaming rate to use the network's resources more
efficiently while minimizing the effects of the adaptation on the
quality of the presentation.
The invention includes a system and device for, and
method of, presenting multimedia information. In a
client-server context, the invention includes a client that receives
units of the multimedia information and presents the
information on a presentation device. Each unit of information
has an importance value assigned to it, which in an
exemplary embodiment is indicative of the unit's importance in
relation to the quality of the presentation. The invention
includes a mechanism for characterizing the performance
capabilities of the system. For example, several
conventional statistics may be gathered and analyzed concurrently
with the streaming operation and before it begins. The
invention includes a mechanism for inferring network
conditions from the characterized performance. The server may
then stream the units of multimedia information to the client
at a streaming rate and adapt the streaming rate of the
streaming in response to the importance information and in
response to the inferred network conditions.
According to one aspect of the invention, the importance
information that is pre-assigned is stored with each unit of
information in the file. This importance information is
indicative of the unit's importance to the perceived quality
of the presentation.
According to another aspect of the invention, the
adaptation of the streaming rate includes throttling up the
streaming rate if the inferred network condition indicates that there
is sufficient bandwidth to receive multiple copies of a given
unit of information.
According to yet another aspect of the invention, the
adaptation of the streaming rate includes throttling down the
streaming rate if the inferred network condition indicates
that there is insufficient bandwidth to receive a given unit of
information. Similarly, throttling down may occur if the
inferred network condition indicates that there is network
congestion.
One of the performance characteristics gathered, believed
to be novel, is the characterizing of a bit rate throughput
value. This may be done, for example, by determining the
data rate of a communications device, such as a modem,
utilized by the client computer to communicate with the
server computer.
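
The importance-driven adaptation summarized above can be pictured with a minimal Python sketch. The thresholds (drop at importance 5, send extra copies below importance 3) are taken from the box labels legible in FIG. 13; the `NetworkCondition` names, the `adapt` function, its return strings, and the reading that smaller importance values mark more important units are illustrative assumptions, not details stated in the text.

```python
from enum import Enum


class NetworkCondition(Enum):
    """Hypothetical labels for the inferred network condition."""
    CONGESTED = "congested"        # insufficient bandwidth or congestion
    NORMAL = "normal"              # bandwidth roughly matches the stream
    SPARE_BANDWIDTH = "spare"      # room to receive multiple copies


def adapt(importance: int, condition: NetworkCondition,
          drop_at: int = 5, retransmit_below: int = 3) -> str:
    """Return 'drop', 'send', or 'send_twice' for one unit of information.

    Assumes smaller importance values mark more important units.
    """
    if condition is NetworkCondition.CONGESTED and importance >= drop_at:
        return "drop"          # throttle down: shed the least important units
    if (condition is NetworkCondition.SPARE_BANDWIDTH
            and importance < retransmit_below):
        return "send_twice"    # throttle up: duplicate the most important units
    return "send"              # otherwise stream the unit unchanged
```

Keeping the thresholds as parameters reflects that the drawing shows more than one throttle-down level, so a server could tighten `drop_at` as conditions worsen.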
`

BRIEF DESCRIPTION OF THE DRAWINGS
In the Drawing,
FIG. 1 is a block diagram showing the novel multimedia
file at a high level of abstraction;
FIG. 2A is a block diagram showing the file header of the
multimedia file;
FIG. 2B is a block diagram showing the format of a file
header preamble;
FIG. 2C is a block diagram showing the format of a media
instance descriptor;
FIG. 3 is a block diagram showing the format of a media
block;
FIG. 4A is a block diagram showing a generic media
block directory format;
FIG. 4B is a block diagram showing the format of a
generic directory preamble;
FIG. 4C is a block diagram showing the format of a
generic packet descriptor;
FIG. 4D is a block diagram showing the format of a
generic media block body;
FIG. 4E is a block diagram showing the format of a
generic packet;
FIG. 5 is a block diagram showing the format of an H.263
media block;
FIG. 6 is a block diagram showing the format of a MIDI
media block preamble;
FIG. 7 is a block diagram showing the format of a MIDI
packet;
FIG. 8 is a block diagram showing a system for creating
a multimedia file;
FIG. 9 is a block diagram showing a system embodying
the invention in a client-server context;
FIG. 10 is a block diagram showing the client and server
components of the system;
FIG. 11 is a flow diagram showing an initial interaction
between the client and the server;
FIG. 12 is a flow diagram showing the streaming logic of
the server; and
FIG. 13 is a flow diagram showing the retransmit logic of
the server.

DETAILED DESCRIPTION

In short, the invention involves a new file format for
organizing related multimedia information and a system and
device for, and method of, using files organized according to
the new format. The invention eases the management and
control of multimedia presentations, having various media
streams, each of a specific type, each specific type further
classified by encoding type, subtype, and encoding rate.
Thus, with the invention, an application may support several
related audio streams, such as English and French, from
which a user may select. Analogously, with the invention,
the application may choose a particular instance of a media
stream based on the network's characteristics; for example,
the application may choose an audio stream that is encoded
for a transmission rate that matches the network's
characteristics. The invention reduces a server's memory and
processing requirements, thus allowing a server to
simultaneously service more requests and support more channels.
And, the invention dynamically adapts the media's
streaming rate to use the network's resources more efficiently while
minimizing the effects of the adaptation on the quality of the
presentation.

I. The File Format

The new file format, among other things, allows various
types and subtypes of multimedia information to be
organized, maintained, and used as a single file. The file
format simplifies the server by not requiring the server to
know, for example, which video files are related to which
audio files for a given application, or to know how to locate
and use related files, each with their own internal
organization, corresponding method of access and
processing.
The file format allows multiple instances of a single
media type to be stored in the file. Multiple instances of a
`
`
`
`single media type may be desirable for supporting alternate
`encodings of the same media type, for example, an audio
`segment in multiple languages. This flexibility allows a
`single file to contain, in effect, multiple versions of the same
`presentation. Each instance corresponds to a "presentation's
`worth" of information for that media type. For example,
`with the audio media type, an instance may involve the
`entire soundtrack in French or the entire soundtrack encoded
`at a particular rate.
The file format allows media instances to be added to and
deleted from a file. This feature allows the file to be updated as
new media types and new media segments are developed,
without requiring modification of the server's or client's
logic to support the newly-added instances. This flexibility
makes it easier to modify, create, and maintain large,
complicated multimedia presentations.
The file format allows the server to implement more
flexible and powerful presentations. For example, the server
could support multiple languages as various subtypes of an
audio stream. In addition, the server could support multiple,
expected transfer rates. For example, a video media type
may be implemented as a subtyped instance having
pre-packetized video data encoded for a target transfer rate of
28.8 kb/s or encoded for a target transfer rate of 14.4 kb/s.
Moreover, when properly used by a server or other
application, files organized according to the new format will
reduce the amount of memory and processor resources
required to stream the file's contents to a client or the like.
These advantages are further discussed below.
The new file format 100 is shown at a high level of
abstraction in FIG. 1. The file format 100 includes a file
header 110 and a file body 120. In short, the file header 110
describes the file itself and the contents of the file body 120
and includes information used to locate data in the file body.
The file body 120 includes more information used to locate
data in the file body as well as including the actual data used
during a presentation.
More specifically, the file header 110 includes a file
header preamble 210 and a number of media instance
descriptors 220, shown in FIG. 2A. The file header preamble
210, shown in more detail in FIG. 2B, includes a field 211
containing a file signature, a field 212 containing the size of
the header, a field 213 containing the major version number,
a field 214 containing the minor version number, and a field
215 containing the number of media instances in the file.
(The major and minor version numbers are used by the client
and the server to determine if they are compatible, for
example, to determine whether they are communicating with
an expectation of the same format of the file.) The file header
preamble 210 also includes reserved fields 216 and 217 to
allow for future expansion of the preamble.
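
As an illustration of how a client or server might read these fields, the following Python sketch unpacks a preamble with the standard `struct` module. The text names the fields but not their widths or byte order, so the layout below (big-endian, 4-byte signature, 32-bit header size, 8-bit version numbers, 16-bit instance count, two 32-bit reserved words) is an assumption, as are the names `PREAMBLE`, `parse_preamble`, and `compatible` and the rule that only the major version must match.

```python
import struct
from typing import NamedTuple

# Assumed layout (not specified in the text): big-endian, 4-byte signature,
# 32-bit header size, 8-bit major and minor versions, 16-bit instance count,
# and two 32-bit reserved words.
PREAMBLE = struct.Struct(">4sIBBHII")


class FileHeaderPreamble(NamedTuple):
    signature: bytes       # field 211
    header_size: int       # field 212
    version_major: int     # field 213
    version_minor: int     # field 214
    media_instances: int   # field 215
    reserved_1: int        # field 216
    reserved_2: int        # field 217


def parse_preamble(data: bytes) -> FileHeaderPreamble:
    """Unpack the preamble from the first PREAMBLE.size bytes of a file."""
    return FileHeaderPreamble(*PREAMBLE.unpack_from(data, 0))


def compatible(preamble: FileHeaderPreamble, major: int) -> bool:
    """Hypothetical compatibility rule: major versions must match."""
    return preamble.version_major == major
```

Because the instance count sits in the preamble, a reader under these assumptions knows how many media instance descriptors follow before touching the file body.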
As shown in more detail in FIG. 2C, each media instance
descriptor 220 includes a variety of fields 221-225 which
are used to describe and identify a media instance. Some of
the fields are used to describe various characteristics or
attributes about the instance's data that will be presented,
whereas other fields are used to locate and select the data.
Field 221 indicates the offset, e.g., number of bytes, from
the beginning of the body to the corresponding media block.
The media instance descriptor 220 also includes a field 222
indicating the media type of the corresponding media block,
for example, video, audio, MIDI, and other existing and
future media types. Field 223 indicates the encoding type of
the corresponding media block, for example, H.263, H.261,
MPEG, G.723, MIDI, and other standard or proprietary
encoding types. Field 224 indicates a corresponding
subtype, for example, English audio, French audio, QCIF
video, CIF video, etc. Field 225 indicates an encoding rate
of the corresponding media block; for example, for video
information, the encoding rate indicates the target data rate
for which the video information was encoded by a video
encoder, such as the video encoder described in the related
U.S. patent application "Improved Video Encoding System
and Method", identified and incorporated above, whereas for
audio information, the encoding rate might indicate one of
a number of audio sampling rates.
As will be explained below, the number contained in field
215 is not necessarily the same as the number of media
streams that will eventually be involved in a presentation.
The file contains a number of potentially related media
streams, or instances, organized according to media type,
encoding type, subtype, and rate. A presentation, on the
other hand, will likely involve only a subset of the available
media streams, typically one instance of each of the plurality
of media types. For example, a given presentation will likely
involve only one of the multiple audio (language) subtyped
instances that may be provided by a file organized according
to the format. The same can be said for data encoded at
different rates, and of course, a user may not be interested in
a full complement of media streams, e.g., the user may not
be interested in receiving audio, even if it is supported by a
file. Depending upon the services supported by the server
(more below), the actual media types and particular
instances of those media types involved in a presentation
may be controlled by an end user, e.g., which language, and
may also be controlled by the system, e.g., which encoded
rate of audio.
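
A minimal Python sketch of such a selection follows. The descriptor fields mirror fields 221-225; representing media type, encoding type, and subtype as strings, expressing the rate in bits per second, and the policy of taking the highest rate that fits a bandwidth limit are all illustrative assumptions, as are the names `MediaInstanceDescriptor` and `select_instance`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MediaInstanceDescriptor:
    block_offset: int    # field 221: byte offset of the media block in the body
    media_type: str      # field 222, e.g. "audio" or "video"
    encoding_type: str   # field 223, e.g. "G.723" or "H.263"
    subtype: str         # field 224, e.g. "English" or "QCIF"
    rate: int            # field 225: encoding rate (bits per second here)


def select_instance(descriptors: List[MediaInstanceDescriptor],
                    media_type: str, subtype: str,
                    max_rate: int) -> Optional[MediaInstanceDescriptor]:
    """Pick the highest-rate instance of the requested type and subtype
    whose encoding rate does not exceed max_rate; None if nothing fits."""
    candidates = [d for d in descriptors
                  if d.media_type == media_type
                  and d.subtype == subtype
                  and d.rate <= max_rate]
    return max(candidates, key=lambda d: d.rate, default=None)
```

In this sketch the subtype argument stands for the user's choice (which language) and `max_rate` for the system's choice (which encoded rate), matching the split of control described above.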
Once the media types and particular instances of those
media types have been determined, for example, by being
selected by the user or the server, the server will construct
data structures using information from the file header 110,
described above, so that the server can index into and iterate
over the data packets contained in the file body 120.
(Indexing and iteration logic are known.) The server can also
use the header's information to perform revision control and
other known maintenance operations, discussed below when
describing an exemplary server.
The data contained in the file body 120 is organized as
contiguous media blocks 310, one media block for each
instance of a media type, as shown in FIG. 3. Each media
block 310 includes a media block directory 320 and a media
block body 330. The media block directory 320 includes
information that may be used to locate information in the
media block body 330, and the media block body 330
includes the actual data that will eventually be presented.
This data is stored in the media block body 330 in
pre-packetized form. "Pre-packetized" means that the data
stored in media block body 330 is organized as discrete
packets of information that can be transported without
requiring any processing by the server to build the packets.
An exemplary embodiment, discussed below, pre-packetizes
the data so that it can be applied directly to the User
Datagram Protocol (UDP) layer, which is part of the TCP/IP
protocol suite. (UDP and TCP/IP are known.)
The pre-packetization process is media instance specific.
For example, the G.723 audio encoding standard encodes an
audio stream into a stream of blocks, in which each block
represents 30 milliseconds of audio information. One
method of pre-packetizing the audio would be to form an
audio packet for each G.723 block. A potentially more
efficient method, however, merges many G.723 blocks into a
packet, for example, 32 G.723 blocks to form an audio
packet representing 960 milliseconds' worth of audio
information.
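
The merging arithmetic (32 blocks of 30 milliseconds each give 960 milliseconds per packet) can be sketched in Python as follows. The function names are hypothetical, and the sketch simply concatenates block payloads; any packet header the format adds around the merged blocks is omitted.

```python
from typing import List

G723_BLOCK_MS = 30  # each G.723 block represents 30 ms of audio (from the text)


def packetize_audio(blocks: List[bytes],
                    blocks_per_packet: int = 32) -> List[bytes]:
    """Concatenate consecutive encoded blocks into packets.

    The final packet may hold fewer blocks if the stream length is not a
    multiple of blocks_per_packet.
    """
    if blocks_per_packet < 1:
        raise ValueError("blocks_per_packet must be at least 1")
    return [b"".join(blocks[i:i + blocks_per_packet])
            for i in range(0, len(blocks), blocks_per_packet)]


def packet_duration_ms(blocks_per_packet: int = 32) -> int:
    """Playback time covered by one full packet."""
    return blocks_per_packet * G723_BLOCK_MS
```

Setting `blocks_per_packet=1` reproduces the simpler one-packet-per-block method mentioned first.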
`
Pre-packetizing video information, on the other hand,
may benefit from dividing, rather than merging, presentation
units. For example, under the H.263 video encoding
standard, video information is encoded as a sequence of
video frames (i.e., a frame being a presentation unit).
Although a video packet may be formed to correspond to a
single video frame, or presentation unit, advantages may be
attained by dividing the presentation unit into several
packets. In this fashion, a large video frame may be divided into
several packets so that each video packet is limited to a
predetermined, maximum packet size.
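
Dividing a frame can be sketched in Python as below. The segment number and total segment count echo the "segment #" and "total segments" labels legible in FIG. 4E; the 1-based numbering, the tuple representation, and the name `split_frame` are assumptions made for illustration.

```python
from typing import List, Tuple


def split_frame(frame: bytes, max_payload: int) -> List[Tuple[int, int, bytes]]:
    """Divide one encoded frame into (segment_number, total_segments, payload)
    tuples, each payload at most max_payload bytes long."""
    if max_payload < 1:
        raise ValueError("max_payload must be at least 1")
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)]
    total = len(chunks)
    return [(number, total, chunk)
            for number, chunk in enumerate(chunks, start=1)]
```

Carrying both numbers with every segment lets a receiver tell when it holds all the pieces of a frame, regardless of arrival order.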
By pre-packetizing the data, an appropriately designed
server's processor load may be reduced by relieving the
processor of having to perform certain tasks such as
constructing packets on-the-fly from the media information.
With the invention, the server can simply read a packet from
the file and pass it to a UDP layer of the protocol stack via
a standard interface.
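
A minimal Python sketch of this read-and-forward path is given below, using the standard `socket` module as the "standard interface" to UDP. The function names are hypothetical, and the sketch assumes the packet's offset and length are already known, for example from a directory entry.

```python
import socket
from typing import BinaryIO, Tuple


def read_packet(media_file: BinaryIO, offset: int, length: int) -> bytes:
    """Read one pre-packetized packet stored at a known offset."""
    media_file.seek(offset)
    packet = media_file.read(length)
    if len(packet) != length:
        raise EOFError("packet extends past the end of the file")
    return packet


def send_packet(sock: socket.socket, client: Tuple[str, int],
                media_file: BinaryIO, offset: int, length: int) -> int:
    """Pass a stored packet to UDP unchanged; returns the bytes sent."""
    return sock.sendto(read_packet(media_file, offset, length), client)
```

A caller would create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and invoke `send_packet` once per packet; no per-packet assembly happens on the server.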
In addition, by pre-packetizing the data, an appropriately
designed server's memory requirements are reduced by relieving
the server of having to keep recently transmitted packets
available in a "packet window" in memory. Packet windows
are conventionally used to hold recently-transmitted
network packets in case they need to be retransmitted because
they were lost in the network. The protocol being used
dictates the required size of a packet window, but it is not
uncommon in modern systems to have windows that require
on the order of 100 kb of memory (RAM). Given that each
network channel requires a corresponding packet window
and that it is not uncommon for current high-demand
Internet servers to support upwards of 5,000 simultaneous
channels (with foreseeable demand growing to over 20,000
simultaneous channels in the near future), 512 Megabytes of
expensive high-speed memory are needed just to support
packet windowing. This requirement precludes many
modern personal computers, and other small systems, from
operating as a server. In contrast, the invention obviates the
need for the packet windows and thus allows smaller,
lower-cost systems to potentially operate as servers.
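
The memory figure can be checked with a short Python calculation. Reading "100 kb" as 100 kilobytes of 1,024 bytes each is an assumption; under that reading 5,000 channels require 512,000,000 bytes, which matches the 512 Megabytes quoted above.

```python
def packet_window_memory_bytes(channels: int, window_bytes: int) -> int:
    """Total RAM needed when every channel keeps its own packet window."""
    return channels * window_bytes


# "on the order of 100 kb", read here as 100 * 1,024 bytes (an assumption)
WINDOW_BYTES = 100 * 1024

for channels in (5_000, 20_000):
    total = packet_window_memory_bytes(channels, WINDOW_BYTES)
    print(f"{channels} channels: {total / 1e6:.0f} million bytes")
```

At the foreseen 20,000 channels the same arithmetic gives four times as much memory, which is the cost that storing pre-packetized data in the file is meant to avoid.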
The organization of a generic media block 310 is shown
in FIGS. 4A-E and defines the basic template for a media
block. In short, the generic media block format describes
certain features common to all media types and instances. As
will be described below, specific media types, such as video,
audio, and MIDI may need to "supplement" the generic
template.
A generic media block directory 400, shown in FIG. 4A,
includes a directory preamble 410 and a number of packet
descriptors 420. The directory preamble 410, shown in more
detail in FIG. 4B, includes a packet count field 411,
indicating the number of packets contained in the media block
body and a presentation length field 413 indicating the total
time duration of the presentation. A number of reserved
fields 412 and 414 provide additional storage for media
instance specific information (more below).
`A packet descriptor 420, shown in FIG. 4C, describes a
`single packet in the media block body. There is a one-to-one
`correspondence between packet descriptors 420 in the media
`block directory and p