US006637031B1

(10) Patent No.: US 6,637,031 B1
Chou                                   (45) Date of Patent: Oct. 21, 2003

(54) MULTIMEDIA PRESENTATION LATENCY MINIMIZATION

(75) Inventor: Philip A. Chou, Menlo Park, CA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/205,875

(22) Filed: Dec. 4, 1998

(51) Int. Cl.7 ................ H04N 7/173

(52) U.S. Cl. ................ 725/87; 725/94; 725/98; 725/118; 709/219; 709/247; 348/384.1; 348/395.1; 348/438.1; 375/240.1; 375/240.11

(58) Field of Search ................ 725/94, 98, 118, 87; 370/468, 236, 230-232, 235; 709/219, 247; 348/394.1, 395.1, 409.1, 410.1, 438.1; 375/240.1, 240.11, 240.19, 240.08, 240.26

(56) References Cited

U.S. PATENT DOCUMENTS

5,262,875 A *  11/1993  Mincer et al. .............. 358/335
5,659,539 A     8/1997  Porter et al. ........... 395/200.61
5,742,343 A     4/1998  Haskell et al. ............ 348/415
5,886,733 A *   3/1999  Zdepski et al. ............. 348/13
5,982,436 A *  11/1999  Balakrishnan et al. ....... 348/409
6,014,694 A *   1/2000  Aharoni et al. ............ 709/219
6,185,625 B1 *  2/2001  Tso et al. ................ 709/247
6,282,206 B1 *  8/2001  Hindus et al. ............. 370/468

FOREIGN PATENT DOCUMENTS

EP  0695094  1/1996  ............ H04N/7/26

OTHER PUBLICATIONS

Chiang, T., et al., "Hierarchical Coding of Digital Television", IEEE Communications Magazine, Vol. 32, No. 5, 38-45, (May 1, 1994).

Zheng, B., et al., "Multimedia Over High Speed Networks: Reducing Network Requirements with Fast Buffer Fillup", IEEE Global Telecommunications Conference, XP000825861, 779-784, (1998).

* cited by examiner

Primary Examiner—Vivek Srivastava
Assistant Examiner—Ngoc Vu
(74) Attorney, Agent, or Firm—Lee & Hayes, PLLC
(57) ABSTRACT

To obtain real-time responses with interactive multimedia servers, the server provides at least two different audio/visual data streams. A first data stream has fewer bits per frame and provides a video image much more quickly than a second data stream with a higher number of bits and hence a higher quality video image. The first data stream becomes available to a client much faster and may be more quickly displayed on demand, while the second data stream is sent to improve the quality as soon as the playback buffer can handle it. In one embodiment, an entire video signal is layered, with a base layer providing the first signal and further enhancement layers comprising the second. The base layer may be actual image frames or just the audio portion of a video stream. The first and second streams are gradually combined in a manner such that the playback buffer does not overflow or underflow.

18 Claims, 6 Drawing Sheets
`
[Representative drawing: example network architecture 200, showing video capturing tools 202, a video server, a network, and video clients.]
`
`
[Drawing Sheets 1-6: FIG. 1 (exemplary computing environment); FIG. 2 (network architecture 200 with a video server and video clients); FIG. 3 (streaming media system data flow); FIGS. 4A-4E (bits-versus-time schedules, annotated with transmission delay, start-up delay, and initial encoder buffer emptiness); FIGS. 5-7 (bits-versus-time encoding and delivery schedules); FIGS. 8-10 (bits-versus-time delivery schedules).]
`
`
`
`MULTIMEDIA PRESENTATION LATENCY
`MINIMIZATION
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to multimedia
communications and more specifically to latency minimiza-
`tion for on-demand interactive multimedia applications.
`
`COPYRIGHT NOTICE/PERMISSION
`
`
`A portion of the disclosure of this patent document
`contains material which is subject to copyright protection.
`The copyright owner has no objection to the facsimile
`reproduction by anyone of the patent document or the patent
`disclosure as it appears in the Patent and Trademark Office
`patent file or records, but otherwise reserves all copyright
`rights whatsoever. The following notice applies to the soft-
`ware and data as described below and in the drawing hereto:
Copyright © 1998, Microsoft Corporation, All Rights
`Reserved.
`
`
`BACKGROUND
`
`Information presentation over the Internet is changing
`dramatically. New time-varying multimedia content is now
`being brought to the Internet, and in particular to the World
`Wide Web (the web), in addition to textual HTML pages and
`still graphics. Here, time-varying multimedia content refers
`to sound, video, animated graphics, or any other medium
`that evolves as a function of elapsed time, alone or in
`combination. In many situations, instant delivery and pre-
`sentation of such multimedia content, on demand, is desired.
`“On-demand” is a term for a wide set of technologies that
`enable individuals to select multimedia content from a
`central server for instant delivery and presentation on a
client (computer or television). For example, video-on-
`demand can be used for entertainment (ordering movies
`transmitted digitally), education (viewing training videos)
`and browsing (viewing informative audiovisual material on
`a web page) to name a few examples.
Users are generally connected to the Internet by a com-
munications link of limited bandwidth, such as a 56 kilobits
per second (Kbps) modem or an integrated services digital
network (ISDN) connection. Even corporate users are usu-
ally limited to a fraction of the 1.544 megabits per second
(Mbps) T-1 carrier rates. This bandwidth limitation pro-
`vides a challenge to on-demand systems: it may be impos-
`sible to transmit a large amount of image or video data over
`a limited bandwidth in the short amount of time required for
`“instant delivery and presentation.” Downloading a large
`image or video may take hours before presentation can
`begin. As a consequence, special
`techniques have been
`developed for on-demand processing of large images and
`video.
`
`A technique for providing large images on demand over
`a communications link with limited bandwidth is progres-
`sive image transmission. In progressive image transmission,
`each image is encoded, or compressed, in layers, like an
`onion. The first (core) layer, or base layer, represents a
`low-resolution version of the image. Successive layers rep-
`resent successively higher resolution versions of the image.
`The server transmits the layers in order, starting from the
`base layer. The client receives the base layer, and instantly
`presents to the user a low-resolution version of the image.
`The client presents higher resolution versions of the image
`as the successive layers are received. Progressive image
`
`transmission enables the user to interact with the server
`instantly, with low delay, or low latency. For example,
`progressive image transmission enables a user to browse
`through a large database of images, quickly aborting the
`transmission of the unwanted images before they are com-
`pletely downloaded to the client.
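To make the layering concrete, the following short Python sketch (illustrative only; the layer contents, the "quality" measure, and the transmission model are stand-ins, not anything specified in this document) shows a server sending layers base-first and a client refining its display, and possibly aborting, as each layer arrives.

# Illustrative sketch of progressive (layered) image transmission.
# Layer 0 is the low-resolution base layer; later layers refine it.

def server_send(layers):
    """Return layers in transmission order: base layer first."""
    return list(layers)

def client_receive(transmitted_layers):
    """Present a picture as soon as the base layer arrives, then refine it."""
    received = []
    for layer in transmitted_layers:
        received.append(layer)
        quality = len(received)        # stand-in for reconstructed resolution
        print(f"displaying image at quality level {quality}")
        # A real browsing client could abort here, long before all
        # layers have been downloaded.

client_receive(server_send(["base", "enhancement 1", "enhancement 2"]))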
`Similarly, streaming is a technique that provides time-
`varying content, such as video and audio, on demand over a
`communications link with limited bandwidth. In streaming,
`audiovisual data is packetized, delivered over a network, and
`played as the packets are being received at the receiving end,
`as opposed to being played only after all packets have been
`downloaded. Streaming technologies are becoming increas-
`ingly important with the growth of the Internet because most
`users do not have fast enough access to download large
`multimedia files quickly. With streaming, the client browser
`or application can start displaying the data before the entire
`file has been transmitted.
`
In a video on-demand delivery system that uses streaming,
the audiovisual data is often compressed and stored on a disk
on a media server for later transmission to a client system.
For streaming to work, the client side receiving the data
must be able to collect the data and send
`it as a steady stream to a decoder or an application that is
`processing the data and converting it to sound or pictures. If
`the client receives the data more quickly than required, it
`needs to save the excess data in a buffer. Conversely, if the
`client receives the data more slowly than required, it needs
`to play out some of the data from the buffer. Storing part of
`a multimedia file in this manner before playing the file is
`referred to as buffering. Buffering can provide smooth
`playback even if the client temporarily receives the data
`more quickly or more slowly than required for real-time
`playback.
`There are two reasons that a client can temporarily receive
`data more quickly or more slowly than required for real-time
`playback. First, in a variable-rate transmission system such
`as a packet network, the data arrives at uneven rates. Not
`only does packetized data inherently arrive in bursts, but
`even packets of data that are transmitted from the sender at
`an even rate may not arrive at the receiver at an even rate.
`This is due to the fact that individual packets may follow
`different routes, and the delay through any individual router
`may vary depending on the amount of traffic waiting to go
`through the router. The variability in the rate at which data
`is transmitted through a network is called network jitter.
`A second reason that a client can temporarily receive data
`more quickly or more slowly than required for real-time
playback is that the media content is encoded at a variable
bit rate. For example, high-motion scenes in a video may be
encoded with more bits than low-motion scenes. When the
encoded video is transmitted at a relatively constant bit
rate, the high-motion frames arrive at a slower rate than
the low-motion frames. For both of these reasons (variable-rate
source encoding and variable-rate transmission channels),
buffering is required at the client to allow a smooth presen-
tation.
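As a rough illustration of the second effect (all numbers below are made up, not taken from this document), pushing variable-size frames through a constant-rate channel makes the large, high-motion frames arrive later than their display deadlines unless the client has buffered data to bridge the gap:

# Made-up frame sizes (bits) and channel rate, illustrating uneven arrival
# of variable-bit-rate frames over a constant-bit-rate channel.

frame_bits = [20_000, 20_000, 90_000, 90_000, 20_000]   # low, low, high, high, low motion
channel_rate = 1_000_000                                 # bits per second
frame_period = 1 / 30                                    # seconds of content per frame

arrival = 0.0
for i, bits in enumerate(frame_bits):
    arrival += bits / channel_rate       # transmission time for this frame
    deadline = (i + 1) * frame_period    # when the decoder needs the frame
    print(f"frame {i}: arrives at {arrival:.3f}s, needed by {deadline:.3f}s")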
`
`
`Unfortunately, buffering implies delay, or latency. Start-
`up delay refers to the latency the user experiences after he
`signals the server to start transmitting data from the begin-
`ning of the content (such as when a pointer to the content is
`selected by the user) before the data can be decoded by the
client system and presented to the user. Seek delay refers to
`the latency the user experiences after he signals to the server
`to start transmitting data from an arbitrary place in the
`middle of the content (such as when a seek bar is dragged to
`
`a particular point in time) before the data can be decoded and
`presented. Both start-up and seek delays occur because even
`after the client begins to receive new data, it must wait until
`its buffer is sufficiently full to begin playing out of the buffer.
`It does this in order to guard against future buffer underflow
`due to network jitter and variable-bit rate compression. For
`typical audiovisual coding on the Internet, start-up and seek
`delays between two and ten seconds are common.
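A back-of-the-envelope version of this delay, with purely illustrative numbers: if the client waits for roughly five seconds' worth of data to accumulate, and the data arrive at the transmission rate, the wait is on the order of those same few seconds.

# Rough illustration of conventional start-up (or seek) delay: playback
# does not begin until the buffer reaches a safety target. Numbers are
# illustrative only.

transmission_rate = 56_000          # bits per second arriving at the client
target_buffer_bits = 5 * 56_000     # wait for about five seconds' worth of data

startup_delay = target_buffer_bits / transmission_rate
print(f"start-up delay of about {startup_delay:.1f} seconds before playback begins")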
`Large start-up and seek delays are particularly annoying
`when the user is trying to browse through a large amount of
`audiovisual content trying to find a particular video or a
`particular location in a video. As in the image browsing
`scenario using progressive transmission, most of the time
`the user will want to abort the transmission long before all
`the data are downloaded and presented. In such a scenario,
`delays of two to ten seconds between aborts seem intoler-
`able. What is needed is a method for reducing the start-up
`and seek delays for such “on demand” interactive multime-
`dia applications.
`SUMMARY OF THE INVENTION
`
`The above-identified problems, shortcomings and disad-
`vantages with the prior art, as well as other problems,
shortcomings and disadvantages, are solved by the present
invention, which will be understood by reading and studying
the specification and the drawings. The present invention
minimizes the start-up and seek delays for on-demand
`interactive multimedia applications, when the transmission
`bit rate is constrained.
`
`In one embodiment, a server provides at least two differ-
`ent data streams. A first data stream is a low resolution
`stream encoded at a bit rate below the transmission bit rate.
`A second data stream is a normal resolution stream encoded
`at a bit rate equal to the transmission bit rate. The server
`initially transmits the low resolution stream faster than real
`time, at a bit rate equal to the transmission bit rate. The client
`receives the low resolution stream faster than real time, but
`decodes and presents the low resolution stream in real time.
`Unlike previous systems, the client does not need to wait
`for its buffer to become safely full before beginning to
`decode and present. The reason is that even at the beginning
`of the transmission, when the client buffer is nearly empty,
`the buffer will not underflow, because it is being filled at a
`rate faster than real time, but is being played out at a rate
`equal to real time. Thus, the client can safely begin playing
`out of its buffer as soon as data are received. In this way, the
delay due to buffering is reduced to nearly zero.
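A small simulation makes this concrete. Assuming, purely for illustration, that the low resolution stream is encoded at half the transmission bit rate, the client buffer grows every second even though playback begins immediately from an empty buffer:

# Illustrative simulation: because the low-resolution stream is encoded
# below the transmission rate, filling outpaces draining and the buffer
# never underflows, even when playback starts from an empty buffer.

transmission_rate = 56_000    # bits/s arriving at the client
low_res_rate = 28_000         # bits/s consumed while playing the low-res stream
buffer_bits = 0

for second in range(1, 6):
    buffer_bits += transmission_rate    # bits received this second
    buffer_bits -= low_res_rate         # bits played out this second
    assert buffer_bits >= 0, "underflow would stall playback"
    print(f"t={second}s  buffer occupancy = {buffer_bits} bits")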
`When the client buffer has grown sufficiently large to
`guard against future underflow by the normal resolution
`stream, the server stops transmission of the low resolution
`stream and begins transmission of the normal resolution
`stream. The system of the present invention reduces the
`start-up or seek delay for interactive multimedia applications
`such as video on-demand, at the expense of initially lower
`quality. The invention includes systems, methods,
`computers, and computer-readable media of varying scope.
`Besides the embodiments, advantages and aspects of the
`invention described here, the invention also includes other
`embodiments, advantages and aspects, as will become
`apparent by reading and studying the drawings and the
`following description.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a diagram of an exemplary computer system in
`which the invention may be implemented.
`FIG. 2 is a diagram of an example network architecture in
`which embodiments of the present invention are incorpo-
`rated.
`
`FIG. 3 is a block diagram representing the data flow for
`a streaming media system for use with the computer network
`of FIG. 2.
`
`FIGS. 4A, 4B, 4C, 4D, and 4E are schedules illustrating
`data flow for example embodiments of the streaming media
`system of FIG. 3.
`FIG. 5 is a decoding schedule for multimedia content
pre-encoded at a full bit rate.
`FIG. 6 is a schedule showing the full bit rate encoding of
`FIG. 5 advanced by T seconds.
`FIG. 7 is a schedule showing a low bit rate encoding of
`the content shown in FIG. 5.
`
`
`FIG. 8 is a schedule showing the low bit rate encoding
`schedule of FIG. 7 advanced by T seconds and superimposed
`on the advanced schedule of FIG. 6.
`
`FIG. 9 is a schedule showing the transition from the
`delivery of the low bit rate encoded stream of FIG. 7 to the
`data stream of FIG. 6, with a gap to indicate optional bit
stuffing.
`FIG. 10 is a schedule showing the advanced schedule of
`FIG. 6 with a total of RT bits removed from the initial
`frames.
`
`DESCRIPTION OF THE EMBODIMENTS
`
`In the following detailed description of the embodiments,
`reference is made to the accompanying drawings which
`form a part hereof, and in which is shown by way of
`illustration specific embodiments in which the invention
`may be practiced. These embodiments are described in
`sufficient detail to enable those skilled in the art to practice
the invention, and it is to be understood that other embodi-
`ments may be utilized and that structural, logical and elec-
`trical changes may be made without departing from the
`scope of the present
`inventions. The following detailed
`description is, therefore, not to be taken in a limiting sense,
`and the scope of the present inventions is defined only by the
`appended claims.
`The present invention is a system for achieving low
`latency responses from interactive multimedia servers, when
`the transmission bit rate is constrained. A server provides at
`least two different data streams. A first data stream is a low
`resolution stream encoded at a bit rate below the transmis-
`sion bit rate. A second data stream is a normal resolution
`stream encoded at a bit rate equal to the transmission bit rate.
`The server initially transmits the low resolution stream faster
`than real time, at a bit rate equal to the transmission bit rate.
`The client receives the low resolution stream faster than real
`time, but decodes and presents the low resolution stream in
`real time. When the client buffer has grown sufficiently large
`to guard against future underflow by the normal resolution
`stream, the server stops transmission of the low resolution
`stream and begins transmission of the normal resolution
`stream. The system of the present invention reduces the
`start-up or seek delay for interactive multimedia applications
`such as video on-demand, at the expense of initially lower
`quality.
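The sketch below paraphrases this server-side behavior in Python. It is only a sketch under stated assumptions: the chunk sizes, rates, safety margin, and the send callback are hypothetical, and the document itself describes the switchover in terms of bit schedules rather than any particular implementation.

# Hypothetical sketch of the two-stream switchover described above.
# Chunks are given as bit counts; send() is whatever delivers them.

def stream_on_demand(low_res_chunks, normal_res_chunks, send,
                     transmission_rate, low_res_rate, safety_seconds):
    """Send the low-res stream first; switch once the client buffer is safe."""
    buffered_seconds = 0.0    # estimated seconds of content in the client buffer
    for chunk_bits in low_res_chunks:
        send(chunk_bits)
        # Each low-res chunk holds chunk_bits / low_res_rate seconds of content
        # but takes only chunk_bits / transmission_rate seconds to deliver,
        # so the client buffer grows while playback proceeds in real time.
        buffered_seconds += chunk_bits / low_res_rate - chunk_bits / transmission_rate
        if buffered_seconds >= safety_seconds:
            break             # buffer can now absorb jitter and bit-rate peaks
    for chunk_bits in normal_res_chunks:
        send(chunk_bits)      # continue with the normal-resolution stream

sent = []
stream_on_demand(low_res_chunks=[28_000] * 20,     # one-second low-res chunks
                 normal_res_chunks=[56_000] * 60,  # one-second normal-res chunks
                 send=sent.append,
                 transmission_rate=56_000, low_res_rate=28_000,
                 safety_seconds=5.0)
print(f"{len(sent)} chunks sent")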
`The detailed description of this invention is divided into
`four sections. The first section provides a general description
`of a suitable computing environment in which the invention
`may be implemented including an overview of a network
`architecture for generating, storing and transmitting audio/
`visual data using the present invention. The second section
`illustrates the data flow for a streaming media system for use
`with the network architecture described in the first section.
`The third section describes the methods of exemplary
`embodiments of the invention. The fourth section is a
`conclusion which includes a summary of the advantages of
`the present invention.
`Computing Environment. FIG. 1 provides a brief, general
`description of a suitable computing environment in which
`the invention may be implemented. The invention will
`hereinafter be described in the general context of computer-
`executable program modules containing instructions
`executed by a personal computer (PC). Program modules
`include routines, programs, objects, components, data
`structures, etc. that perform particular tasks or implement
`particular abstract data types. Those skilled in the art will
`appreciate that the invention may be practiced with other
`computer-system configurations,
`including hand-held
`devices, multiprocessor systems, microprocessor-based pro-
`grammable consumer electronics, network PCs,
minicomputers, mainframe computers, and the like. The
`invention may also be practiced in distributed computing
`environments where tasks are performed by remote process-
ing devices linked through a communications network. In a
`distributed computing environment, program modules may
`be located in both local and remote memory storage devices.
`FIG. 1 employs a general-purpose computing device in
`the form of a conventional personal computer 20, which
`includes processing unit 21, system memory 22, and system
`bus 23 that couples the system memory and other system
`components to processing unit 21. System bus 23 may be
`any of several types, including a memory bus or memory
`controller, a peripheral bus, and a local bus, and may use any
`of a variety of bus structures. System memory 22 includes
`read-only memory (ROM) 24 and random-access memory
(RAM) 25. A basic input/output system (BIOS) 26, stored in
`ROM 24, contains the basic routines that transfer informa-
tion between components of personal computer 20. BIOS 26
`also contains start-up routines for the system. Personal
`computer 20 further includes hard disk drive 27 for reading
`from and writing to a hard disk (not shown), magnetic disk
`drive 28 for reading from and writing to a removable
`magnetic disk 29, and optical disk drive 30 for reading from
`and writing to a removable optical disk 31 such as a
`CD-ROM or other optical medium. Hard disk drive 27,
`magnetic disk drive 28, and optical disk drive 30 are
`connected to system bus 23 by a hard-disk drive interface
`32, a magnetic-disk drive interface 33, and an optical-drive
`interface 34, respectively. The drives and their associated
`computer-readable media provide nonvolatile storage of
`computer-readable instructions, data structures, program
`modules and other data for personal computer 20. Although
`the exemplary environment described herein employs a hard
`disk, a removable magnetic disk 29 and a removable optical
`disk 31, those skilled in the art will appreciate that other
`types of computer-readable media which can store data
`accessible by a computer may also be used in the exemplary
`operating environment. Such media may include magnetic
cassettes, flash memory cards, digital versatile disks, Ber-
`noulli cartridges, RAMs, ROMs, and the like.
`Program modules may be stored on the hard disk, mag-
`netic disk 29, optical disk 31, ROM 24 and RAM 25.
`Program modules may include operating system 35, one or
`more application programs 36, other program modules 37,
`and program data 38. A user may enter commands and
`information into personal computer 20 through input devices
`such as a keyboard 40 and a pointing device 42. Other input
`devices (not shown) may include a microphone, joystick,
`game pad, satellite dish, scanner, or the like. These and other
`input devices are often connected to the processing unit 21
`through a serial-port interface 46 coupled to system bus 23;
`
`
`but they may be connected through other interfaces not
`shown in FIG. 1, such as a parallel port, a game port, or a
`universal serial bus (USB). A monitor 47 or other display
`device also connects to system bus 23 via an interface such
`as a video adapter 48. In addition to the monitor, personal
`computers typically include other peripheral output devices
`(not shown) such as speakers and printers.
`Personal computer 20 may operate in a networked envi-
`ronment using logical connections to one or more remote
`computers such as remote computer 49. Remote computer
`49 may be another personal computer, a server, a router, a
`network PC, a peer device, or other common network node.
`It
`typically includes many or all of the components
`described above in connection with personal computer 20;
`however, only a storage device 50 is illustrated in FIG. 1.
`The logical connections depicted in FIG. 1 include local-
`area network (LAN) 51 and a wide-area network (WAN) 52.
`Such networking environments are commonplace in offices,
`enterprise-wide computer networks, intranets and the Inter-
`net.
`
`When placed in a LAN networking environment, PC 20
`connects to local network 51 through a network interface or
`adapter 53. When used in a WAN networking environment
`such as the Internet, PC 20 typically includes modem 54 or
`other means for establishing communications over network
52. Modem 54 may be internal or external to PC 20, and
connects to system bus 23 via serial-port interface 46. In a
`networked environment, program modules depicted as resid-
`ing within 20 or portions thereof may be stored in remote
`storage device 50. Of course,
`the network connections
`shown are illustrative, and other means of establishing a
`communications link between the computers may be sub-
`stituted.
`
`FIG. 2 is a diagram of an example network architecture
`200 in which embodiments of the present invention are
implemented. The example network architecture 200 com-
`prises video capturing tools 202, a video server 204, a
`network 206 and one or more video clients 208.
`
`The video capturing tools 202 comprise any commonly
`available devices for capturing video and audio data, encod-
`ing the data and transferring the encoded data to a computer
via a standard interface. The example video capturing tools
`202 of FIG. 2 comprise a camera 210 and a computer 212
`having a video capture card, compression software and a
`mass storage device. The video capturing tools 202 are
`coupled to a video server 204 having streaming software and
`optionally having software tools enabling a user to manage
`the delivery of the data.
`The video server 204 comprises any commonly available
`computing environment such as the exemplary computing
`environment of FIG. 1, as well as a media server environ-
`ment that supports on-demand distribution of multimedia
`content. The media server environment of video server 204
comprises streaming software, one or more data storage
`units for storing compressed files containing multimedia
`data, and a communications control unit for controlling
`information transmission between video server 204 and
`video clients 208. The video server 204 is coupled to a
`network 206 such as a local-area network or a wide-area
`network. Audio, video, illustrated audio, animations, and
`other multimedia data types are stored on video server 204
`and delivered by an application on-demand over network
`206 to one or more video clients 208.
`
`The video clients 208 comprise any commonly available
`computing environments such as the exemplary computing
`environment of FIG. 1. The video clients 208 also comprise
`
`any commonly available application for viewing streamed
multimedia file types, including QuickTime (a format for
`video and animation), RealAudio (a format for audio data),
`RealVideo (a format
`for video data), ASF (Advanced
`Streaming Format) and MP4 (the MPEG-4 file format). Two
`video clients 208 are shown in FIG. 2. However, those of
`ordinary skill in the art can appreciate that video server 204
`may communicate with a plurality of video clients.
`In operation, for example, a user clicks on a link to a video
`clip or other video source, such as camera 210 used for video
`conferencing or other purposes, and an application program
`for viewing streamed multimedia files launches from a hard
`disk of the video client 208. The application begins loading
`in a file for the video which is being transmitted across the
`network 206 from the video server 204. Rather than waiting
`for the entire video to download, the video starts playing
`after an initial portion of the video has come across the
`network 206 and continues downloading the rest of the
`video while it plays. The user does not have to wait for the
`entire video to download before the user can start viewing.
`However, in existing systems there is a delay for such “on
`demand” interactive applications before the user can start
`viewing the initial portion of the video. The delay, referred
`to herein as a start-up delay or a seek delay, is experienced
by the user between the time when the user signals the video
`server 204 to start transmitting data and the time when the
`data can be decoded by the video client 208 and presented
`to the user. However, the present invention, as described
`below, achieves low latency responses from video server
`204 and thus reduces the start-up delay and the seek delay.
`An example computing environment in which the present
`invention may be implemented has been described in this
`section of the detailed description. In one embodiment, a
`network architecture for on-demand distribution of multi-
`media content comprises video capture tools, a video server,
`a network and one or more video clients.
`
`
`Data Flow for a Streaming Media System. The data flow
`for an example embodiment of a streaming media system is
`described by reference to FIGS. 3, 4A, 4B, 4C, 4D and 4E.
`FIG. 3 is a block diagram representing the data flow for a
`streaming media system 300 for use with the network
`architecture of FIG. 2. The streaming media system 300
`comprises an encoder 302 which may be coupled to camera
`210 or other real time or uncompressed video sources, an
`encoder buffer 304, a network 306, a decoder buffer 308 and
`a decoder 310.
`
`
`The encoder 302 is a hardware or software component
`that encodes and/or compresses the data for insertion into
`the encoder buffer 304. The encoder buffer 304 is one or
`more hardware or software components that stores the
`encoded data until such time as it can be released into the
network 306. For live transmission such as video
conferencing, the encoder buffer 304 may be as simple as a
first-in first-out (FIFO) queue. For video on-demand from a
video server 204, the encoder buffer 304 may be a combi-
nation of a FIFO queue and a disk file on the capture tools
202, transmission buffers between the capture tools 202 and
the video server 204, and a disk file and output FIFO queue
on the video server 204. The decoder buffer 308 is a
`hardware or software component that receives encoded data
`from the network 306, and stores the encoded data until such
`time as it can be decoded by decoder 310. The decoder 310
`is a hardware or software component that decodes and/or
`decompresses the data for display.
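As a schematic aid (the class below is purely illustrative and not part of this document), the pipeline can be pictured as two FIFO buffers joined by a network, with points A through D at the component boundaries:

# Purely illustrative rendering of the FIG. 3 pipeline. Point A is where
# bits leave the encoder, B where they leave the encoder buffer for the
# network, C where they arrive at the decoder buffer, and D where they are
# handed to the decoder.

from collections import deque

class FifoBuffer:
    """Simple FIFO of bit counts, standing in for the encoder/decoder buffers."""
    def __init__(self):
        self._chunks = deque()

    def push(self, bits):     # bits entering the buffer (point A or point C)
        self._chunks.append(bits)

    def pop(self):            # bits leaving the buffer (point B or point D)
        return self._chunks.popleft() if self._chunks else 0

encoder_buffer = FifoBuffer()   # holds bits between points A and B
decoder_buffer = FifoBuffer()   # holds bits between points C and D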
`In operation, each bit produced by the encoder 302 passes
`point A 312, point B 314, point C 316, and point D 318 at
`
`a particular instant in time. A graph of times at which bits
`cross a given point is referred to herein as a schedule. The
schedules at which bits pass point A 312, point B 314, point
`C 316, and point D 318 can be illustrated in a diagram such
`as shown in the FIGS. 4A, 4B, 4C, 4D and 4E.
`FIGS. 4A, 4B, 4C, 4D and 4E are schedules illustrating
`data flow for example embodiments of the streaming media
`system of FIG. 3. As shown in FIGS. 4A, 4B, 4C, 4D and
`4E, the y-axis corresponds to the total number of bits that
`have crossed the respective points (i.e. point A, point B,
`point C, and point D in FIG. 3) and the x-axis corresponds
`to elapsed time. In the example shown in FIG. 4A, schedule
`A corresponds to the number of bits transferred from the
`encoder 302 to the encoder buffer 304. Schedule B corre-
`sponds to the number of bits that have left the encoder buffer
`304 and entered the network 306. Schedule C corresponds to
`the number of bits received from the network 306 by the
`decoder buffer 308. Schedule D corresponds to the number
`of bits transferred from the decoder buffer 308 to the decoder
`310.
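Numerically, a schedule can be recorded as samples of cumulative bits versus time (the numbers below are made up). Subtracting one schedule from another then gives the buffer occupancies discussed later in this section; for instance, schedule C minus schedule D is the number of bits sitting in the decoder buffer.

# Illustrative representation of schedules as (time, cumulative_bits)
# samples. The vertical distance between two schedules at a given time is
# the occupancy of the buffer that sits between the corresponding points.

def occupancy(upstream, downstream):
    """Bits held in the buffer between two schedules, sample by sample."""
    return [(t, in_bits - out_bits)
            for (t, in_bits), (_, out_bits) in zip(upstream, downstream)]

schedule_c = [(0, 0), (1, 56_000), (2, 112_000), (3, 168_000)]   # bits received
schedule_d = [(0, 0), (1, 28_000), (2, 84_000), (3, 140_000)]    # bits decoded
for t, bits in occupancy(schedule_c, schedule_d):
    print(f"t={t}s  decoder buffer holds {bits} bits")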
`
`In the example shown in FIG. 4B, the network 306 has a
`constant bit rate and a constant delay. As a result, schedules
`B and C are linear and are separated temporally by a
`constant transmission delay.
`In the example shown in FIG. 4C, the network 306 is a
`packet network. As a result, schedules B and C have a
`staircase form. The transmission delay is generally not
constant. Nevertheless, there exist linear schedules B' and C'
`that provide lower and upper bounds for schedules B and C
`respectively. Schedule B' is the latest possible linear sched-
`ule at which encoded bits are guaranteed to be available for
`transmission. Schedule C'
`is the earliest possible linear
`schedule at which received bits are guaranteed to be avail-
able for decoding. The gap between schedules B' and C' is
`the maximum reasonable transmission delay (including jitter
`and any retransmission time) plus an allowance for the
`packetization itself. In this way, a packet network can be
`reduced, essentially,
`to a constant bit rate, constant delay
`channel.
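One way to compute such a bounding schedule is sketched below. The construction is illustrative only; the document states that the linear bounds exist but does not prescribe how to obtain them. Given a staircase arrival schedule and a target constant rate, the sketch finds the smallest delay for which a linear schedule at that rate never claims more bits than have actually arrived, so bits on the linear schedule are always guaranteed to be available.

# Illustrative construction of a guaranteed-available linear schedule from
# a staircase schedule: find the smallest delay d such that the line
# rate * (t - d) never exceeds the bits delivered by time t.

def bounding_delay(events, rate, end_time):
    """events: sorted (arrival_time, bits) pairs; returns the required delay."""
    delay = 0.0
    delivered = 0
    for t, bits in events:
        # Just before this arrival, only `delivered` bits are available.
        delay = max(delay, t - delivered / rate)
        delivered += bits
    return max(delay, end_time - delivered / rate)

arrivals = [(0.2, 8_000), (0.5, 8_000), (0.7, 8_000), (1.3, 8_000)]
d = bounding_delay(arrivals, rate=24_000, end_time=1.5)
print(f"a 24 kbps linear schedule delayed by {d:.2f} s never runs ahead of the arrivals")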
`
`Referring now to the example shown in FIG. 4D, for
`real-time applications the end-to-end delay (from capture to
`presentation) must be constant; otherwise there would be
`temporal warping of the presentation. Thus, if the encoder
`and decoder have a constant delay, schedules A and D are
`separated temporally by a constant delay, as illustrated in
`FIG. 4D.
`
`At any given instant in time, the vertical distance between
schedules A and B is the number of bits in the encoder buffer,
`and the vertical distance between schedules C and D is the
`number of bits in the decoder buffer. If the decoder attempts
`to remove more bits from the decoder buffer than exist in the
`buffer (i.e., schedule D tries to occur ahead of schedule C),
then the decoder buffer underflows and an error oc