(12) United States Patent
Chou

(10) Patent No.: US 6,637,031 B1
(45) Date of Patent: Oct. 21, 2003

(54) MULTIMEDIA PRESENTATION LATENCY MINIMIZATION

(75) Inventor: Philip A. Chou, Menlo Park, CA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/205,875

(22) Filed: Dec. 4, 1998

(51) Int. Cl.7 ........................ H04N 7/173

(52) U.S. Cl. ........................ 725/87; 725/94; 725/98; 725/118; 709/219; 709/247; 348/384.1; 348/395.1; 348/438.1; 375/240.1; 375/240.11

(58) Field of Search ........................ 725/94, 98, 118, 725/240.26, 87; 370/468, 236, 230-232, 235; 709/219, 247; 348/394.1, 395.1, 409.1, 410.1, 425.1, 438.1; 375/240.1, 240.11, 240.19, 240.08

(56) References Cited

U.S. PATENT DOCUMENTS

5,262,875 A *  11/1993  Mincer et al. .............. 358/335
5,650,539 A     8/1997  Porter et al. ............ 395/200.61
5,742,343 A     4/1998  Haskell et al. ............ 348/415
5,886,733 A *   3/1999  Zdepski et al. ............. 348/13
5,982,436 A *  11/1999  Balakrishnan et al. ....... 348/409
6,014,694 A *   1/2000  Aharoni et al. ............ 709/219
6,185,625 B1 *  2/2001  Tso et al. ................ 709/247
6,282,206 B1 *  8/2001  Hindus et al. ............. 370/468

FOREIGN PATENT DOCUMENTS

EP  0695094  1/1996 ............ H04N/7/26

OTHER PUBLICATIONS

Chiang, T., et al., "Hierarchical Coding of Digital Television", IEEE Communications Magazine, vol. 32, No. 5, 38-45, (May 1, 1994).

Zheng, B., et al., "Multimedia Over High Speed Networks: Reducing Network Requirements with Fast Buffer Fillup", IEEE Global Telecommunications Conference, NY, XP000825861, 779-784, (1998).

* cited by examiner

Primary Examiner—Vivek Srivastava
Assistant Examiner—Ngoc Vu
(74) Attorney, Agent, or Firm—Lee & Hayes, PLLC

(57) ABSTRACT

To obtain real-time responses with interactive multimedia servers, the server provides at least two different audio/visual data streams. A first data stream has fewer bits per frame and provides a video image much more quickly than a second data stream with a higher number of bits and hence a higher quality video image. The first data stream becomes available to a client much faster and may be more quickly displayed on demand while the second data stream is sent to improve the quality as soon as the playback buffer can handle it. In one embodiment, an entire video signal is layered, with a base layer providing the first signal and further enhancement layers comprising the second. The base layer may be actual image frames or just the audio portion of a video stream. The first and second streams are gradually combined in a manner such that the playback buffer does not overflow or underflow.

18 Claims, 6 Drawing Sheets

[Cover drawing (200): video client and video capturing tools]

Page 1 of 15
VIMEO/IAC EXHIBIT 1011
VIMEO ET AL., IPR2019-00833
U.S. Patent    Oct. 21, 2003    Sheet 1 of 6    US 6,637,031 B1

[FIG. 1 (drawing): exemplary computer system, including a processing unit, system bus, operating system and program modules, hard disk, magnetic disk, and optical drives with their interfaces, serial port, adapter, monitor, network and port interfaces, local-area network, and remote computer]
U.S. Patent    Oct. 21, 2003    Sheet 2 of 6    US 6,637,031 B1

[FIG. 2 (drawing): example network architecture 200, including a video server 204 and video clients 208]
U.S. Patent    Oct. 21, 2003    Sheet 3 of 6    US 6,637,031 B1

[FIG. 3 (drawing): data flow for a streaming media system]
U.S. Patent    Oct. 21, 2003    Sheet 4 of 6    US 6,637,031 B1

[FIGS. 4A-4E (drawings): schedules of bits versus time at points A, B, C, and D, with annotations for the transmission delay, the initial encoder buffer emptiness, and the start-up delay]
U.S. Patent    Oct. 21, 2003    Sheet 5 of 6    US 6,637,031 B1

[FIGS. 5, 6, and 7 (drawings): schedules of bits versus time]
U.S. Patent    Oct. 21, 2003    Sheet 6 of 6    US 6,637,031 B1

[FIGS. 8, 9, and 10 (drawings): schedules of bits versus time]
MULTIMEDIA PRESENTATION LATENCY MINIMIZATION

FIELD OF THE INVENTION

The present invention relates generally to multimedia communications and more specifically to latency minimization for on-demand interactive multimedia applications.

COPYRIGHT NOTICE/PERMISSION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.

BACKGROUND
Information presentation over the Internet is changing dramatically. New time-varying multimedia content is now being brought to the Internet, and in particular to the World Wide Web (the web), in addition to textual HTML pages and still graphics. Here, time-varying multimedia content refers to sound, video, animated graphics, or any other medium that evolves as a function of elapsed time, alone or in combination. In many situations, instant delivery and presentation of such multimedia content, on demand, is desired. "On-demand" is a term for a wide set of technologies that enable individuals to select multimedia content from a central server for instant delivery and presentation on a client (computer or television). For example, video-on-demand can be used for entertainment (ordering movies transmitted digitally), education (viewing training videos) and browsing (viewing informative audiovisual material on a web page), to name a few examples.

Users are generally connected to the Internet by a communications link of limited bandwidth, such as a 56 kilobits per second (Kbps) modem or an integrated services digital network (ISDN) connection. Even corporate users are usually limited to a fraction of the 1.544 megabits per second (Mbps) T-1 carrier rates. This bandwidth limitation provides a challenge to on-demand systems: it may be impossible to transmit a large amount of image or video data over a limited bandwidth in the short amount of time required for "instant delivery and presentation." Downloading a large image or video may take hours before presentation can begin. As a consequence, special techniques have been developed for on-demand processing of large images and video.
A technique for providing large images on demand over a communications link with limited bandwidth is progressive image transmission. In progressive image transmission, each image is encoded, or compressed, in layers, like an onion. The first (core) layer, or base layer, represents a low-resolution version of the image. Successive layers represent successively higher resolution versions of the image. The server transmits the layers in order, starting from the base layer. The client receives the base layer, and instantly presents to the user a low-resolution version of the image. The client presents higher resolution versions of the image as the successive layers are received. Progressive image transmission enables the user to interact with the server instantly, with low delay, or low latency. For example, progressive image transmission enables a user to browse through a large database of images, quickly aborting the transmission of the unwanted images before they are completely downloaded to the client.

Similarly, streaming is a technique that provides time-varying content, such as video and audio, on demand over a communications link with limited bandwidth. In streaming, audiovisual data is packetized, delivered over a network, and played as the packets are being received at the receiving end, as opposed to being played only after all packets have been downloaded. Streaming technologies are becoming increasingly important with the growth of the Internet because most users do not have fast enough access to download large multimedia files quickly. With streaming, the client browser or application can start displaying the data before the entire file has been transmitted.

In a video on-demand delivery system that uses streaming, the audiovisual data is often compressed and stored on a disk on a media server for later transmission to a client system. For streaming to work, the client side receiving the data must be able to collect the data and send it as a steady stream to a decoder or an application that is processing the data and converting it to sound or pictures. If the client receives the data more quickly than required, it needs to save the excess data in a buffer. Conversely, if the client receives the data more slowly than required, it needs to play out some of the data from the buffer. Storing part of a multimedia file in this manner before playing the file is referred to as buffering. Buffering can provide smooth playback even if the client temporarily receives the data more quickly or more slowly than required for real-time playback.

There are two reasons that a client can temporarily receive data more quickly or more slowly than required for real-time playback. First, in a variable-rate transmission system such as a packet network, the data arrives at uneven rates. Not only does packetized data inherently arrive in bursts, but even packets of data that are transmitted from the sender at an even rate may not arrive at the receiver at an even rate. This is due to the fact that individual packets may follow different routes, and the delay through any individual router may vary depending on the amount of traffic waiting to go through the router. The variability in the rate at which data is transmitted through a network is called network jitter.

A second reason that a client can temporarily receive data more quickly or more slowly than required for real-time playback is that the media content is encoded at a variable bit rate. For example, high-motion scenes in a video may be encoded with more bits than low-motion scenes. When the encoded video is transmitted at a relatively constant bit rate, the high-motion frames arrive at a slower rate than the low-motion frames. For both these reasons (variable-rate source encoding and variable-rate transmission channels), buffering is required at the client to allow a smooth presentation.

Unfortunately, buffering implies delay, or latency. Start-up delay refers to the latency the user experiences after he signals the server to start transmitting data from the beginning of the content (such as when a pointer to the content is selected by the user) before the data can be decoded by the client system and presented to the user. Seek delay refers to the latency the user experiences after he signals the server to start transmitting data from an arbitrary place in the middle of the content (such as when a seek bar is dragged to
a particular point in time) before the data can be decoded and presented. Both start-up and seek delays occur because even after the client begins to receive new data, it must wait until its buffer is sufficiently full to begin playing out of the buffer. It does this in order to guard against future buffer underflow due to network jitter and variable-bit-rate compression. For typical audiovisual coding on the Internet, start-up and seek delays between two and ten seconds are common.
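The arithmetic behind this delay is simply the time needed to push the safety buffer's worth of bits across the link. A minimal sketch (the pre-roll length, media bit rate, and link rate below are illustrative assumptions, not figures from this patent):

```python
# Illustrative start-up (or seek) delay: the client fills a "pre-roll"
# buffer before playback begins, to guard against network jitter and
# variable-bit-rate peaks.

def startup_delay(preroll_seconds: float,
                  media_bitrate_bps: float,
                  link_rate_bps: float) -> float:
    """Seconds the user waits before playback can begin."""
    preroll_bits = preroll_seconds * media_bitrate_bps
    return preroll_bits / link_rate_bps

# Media encoded at the full rate of a 56 Kbps link: filling a
# 5-second pre-roll takes 5 seconds, inside the 2-10 second range
# described above for typical audiovisual coding on the Internet.
print(startup_delay(5.0, 56_000, 56_000))  # 5.0
```

Note that when the media bit rate equals the link rate, the delay equals the pre-roll duration itself; only media encoded below the link rate can fill the pre-roll faster than real time.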
Large start-up and seek delays are particularly annoying when the user is trying to browse through a large amount of audiovisual content trying to find a particular video or a particular location in a video. As in the image browsing scenario using progressive transmission, most of the time the user will want to abort the transmission long before all the data are downloaded and presented. In such a scenario, delays of two to ten seconds between aborts seem intolerable. What is needed is a method for reducing the start-up and seek delays for such "on demand" interactive multimedia applications.
SUMMARY OF THE INVENTION

The above-identified problems, shortcomings and disadvantages with the prior art, as well as other problems, shortcomings and disadvantages, are solved by the present invention, which will be understood by reading and studying the specification and the drawings. The present invention minimizes the start-up and seek delays for on-demand interactive multimedia applications, when the transmission bit rate is constrained.
In one embodiment, a server provides at least two different data streams. A first data stream is a low resolution stream encoded at a bit rate below the transmission bit rate. A second data stream is a normal resolution stream encoded at a bit rate equal to the transmission bit rate. The server initially transmits the low resolution stream faster than real time, at a bit rate equal to the transmission bit rate. The client receives the low resolution stream faster than real time, but decodes and presents the low resolution stream in real time.

Unlike previous systems, the client does not need to wait for its buffer to become safely full before beginning to decode and present. The reason is that even at the beginning of the transmission, when the client buffer is nearly empty, the buffer will not underflow, because it is being filled at a rate faster than real time, but is being played out at a rate equal to real time. Thus, the client can safely begin playing out of its buffer as soon as data are received. In this way, the delay due to buffering is reduced to nearly zero.
When the client buffer has grown sufficiently large to guard against future underflow by the normal resolution stream, the server stops transmission of the low resolution stream and begins transmission of the normal resolution stream. The system of the present invention reduces the start-up or seek delay for interactive multimedia applications such as video on-demand, at the expense of initially lower quality. The invention includes systems, methods, computers, and computer-readable media of varying scope. Besides the embodiments, advantages and aspects of the invention described here, the invention also includes other embodiments, advantages and aspects, as will become apparent by reading and studying the drawings and the following description.
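The two-stream scheme can be sketched numerically. In this toy simulation (the rates, the one-second time step, and the five-second safety target are assumptions for illustration, not parameters from the patent), the low resolution stream drains the client buffer at half the rate the channel fills it, so playback starts immediately and the buffer grows until the server can safely switch:

```python
# Toy simulation of the two-stream scheme: a low resolution stream
# encoded below the channel rate is sent first, so the client buffer
# grows even while the client plays in real time; once the buffer
# reaches a safety target, the server switches to the normal
# resolution stream, whose rate equals the channel rate.

LINK_RATE = 56_000            # channel bits per second (assumed)
LOW_RATE = 28_000             # low resolution encoding rate (assumed)
FULL_RATE = 56_000            # normal resolution encoding rate (assumed)
TARGET = 5 * FULL_RATE        # safety margin: 5 s of the full stream

buffer_bits = 0
sending_low = True
switch_time = None

for second in range(1, 61):   # one minute of playback, 1 s steps
    buffer_bits += LINK_RATE  # arrivals at the channel rate
    buffer_bits -= LOW_RATE if sending_low else FULL_RATE  # playback drain
    assert buffer_bits >= 0, "playback buffer underflow"
    if sending_low and buffer_bits >= TARGET:
        sending_low = False   # server switches to the full stream
        switch_time = second

print(switch_time)  # 10: ten seconds of low resolution before the switch
```

With these assumed rates the buffer never underflows even though playback begins at once, and after the switch the buffer level stays at the safety target because arrivals and drain are equal.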
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an exemplary computer system in which the invention may be implemented.

FIG. 2 is a diagram of an example network architecture in which embodiments of the present invention are incorporated.
FIG. 3 is a block diagram representing the data flow for a streaming media system for use with the computer network of FIG. 2.

FIGS. 4A, 4B, 4C, 4D, and 4E are schedules illustrating data flow for example embodiments of the streaming media system of FIG. 3.

FIG. 5 is a decoding schedule for multimedia content pre-encoded at a full bit rate.

FIG. 6 is a schedule showing the full bit rate encoding of FIG. 5 advanced by T seconds.

FIG. 7 is a schedule showing a low bit rate encoding of the content shown in FIG. 5.

FIG. 8 is a schedule showing the low bit rate encoding schedule of FIG. 7 advanced by T seconds and superimposed on the advanced schedule of FIG. 6.

FIG. 9 is a schedule showing the transition from the delivery of the low bit rate encoded stream of FIG. 7 to the data stream of FIG. 6, with a gap to indicate optional bit stuffing.

FIG. 10 is a schedule showing the advanced schedule of FIG. 6 with a total of RT bits removed from the initial frames.
DESCRIPTION OF THE EMBODIMENTS

In the following detailed description of the embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present inventions. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present inventions is defined only by the appended claims.
The present invention is a system for achieving low latency responses from interactive multimedia servers, when the transmission bit rate is constrained. A server provides at least two different data streams. A first data stream is a low resolution stream encoded at a bit rate below the transmission bit rate. A second data stream is a normal resolution stream encoded at a bit rate equal to the transmission bit rate. The server initially transmits the low resolution stream faster than real time, at a bit rate equal to the transmission bit rate. The client receives the low resolution stream faster than real time, but decodes and presents the low resolution stream in real time. When the client buffer has grown sufficiently large to guard against future underflow by the normal resolution stream, the server stops transmission of the low resolution stream and begins transmission of the normal resolution stream. The system of the present invention reduces the start-up or seek delay for interactive multimedia applications such as video on-demand, at the expense of initially lower quality.
The detailed description of this invention is divided into four sections. The first section provides a general description of a suitable computing environment in which the invention may be implemented, including an overview of a network architecture for generating, storing and transmitting audio/visual data using the present invention. The second section illustrates the data flow for a streaming media system for use with the network architecture described in the first section. The third section describes the methods of exemplary embodiments of the invention. The fourth section is a conclusion which includes a summary of the advantages of the present invention.
Computing Environment. FIG. 1 provides a brief, general description of a suitable computing environment in which the invention may be implemented. The invention will hereinafter be described in the general context of computer-executable program modules containing instructions executed by a personal computer (PC). Program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention may be practiced with other computer-system configurations, including hand-held devices, multiprocessor systems, microprocessor-based programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
FIG. 1 employs a general-purpose computing device in the form of a conventional personal computer 20, which includes processing unit 21, system memory 22, and system bus 23 that couples the system memory and other system components to processing unit 21. System bus 23 may be any of several types, including a memory bus or memory controller, a peripheral bus, and a local bus, and may use any of a variety of bus structures. System memory 22 includes read-only memory (ROM) 24 and random-access memory (RAM) 25. A basic input/output system (BIOS) 26, stored in ROM 24, contains the basic routines that transfer information between components of personal computer 20. BIOS 26 also contains start-up routines for the system. Personal computer 20 further includes hard disk drive 27 for reading from and writing to a hard disk (not shown), magnetic disk drive 28 for reading from and writing to a removable magnetic disk 29, and optical disk drive 30 for reading from and writing to a removable optical disk 31 such as a CD-ROM or other optical medium. Hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to system bus 23 by a hard-disk drive interface 32, a magnetic-disk drive interface 33, and an optical-drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, those skilled in the art will appreciate that other types of computer-readable media which can store data accessible by a computer may also be used in the exemplary operating environment. Such media may include magnetic cassettes, flash memory cards, digital versatile disks, Bernoulli cartridges, RAMs, ROMs, and the like.
Program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 and RAM 25. Program modules may include operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into personal computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial-port interface 46 coupled to system bus 23;
but they may be connected through other interfaces not shown in FIG. 1, such as a parallel port, a game port, or a universal serial bus (USB). A monitor 47 or other display device also connects to system bus 23 via an interface such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown) such as speakers and printers.

Personal computer 20 may operate in a networked environment using logical connections to one or more remote computers such as remote computer 49. Remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device, or other common network node. It typically includes many or all of the components described above in connection with personal computer 20; however, only a storage device 50 is illustrated in FIG. 1. The logical connections depicted in FIG. 1 include local-area network (LAN) 51 and a wide-area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When placed in a LAN networking environment, PC 20 connects to local network 51 through a network interface or adapter 53. When used in a WAN networking environment such as the Internet, PC 20 typically includes modem 54 or other means for establishing communications over network 52. Modem 54 may be internal or external to PC 20, and connects to system bus 23 via serial-port interface 46. In a networked environment, program modules depicted as residing within PC 20 or portions thereof may be stored in remote storage device 50. Of course, the network connections shown are illustrative, and other means of establishing a communications link between the computers may be substituted.
FIG. 2 is a diagram of an example network architecture 200 in which embodiments of the present invention are implemented. The example network architecture 200 comprises video capturing tools 202, a video server 204, a network 206 and one or more video clients 208.
The video capturing tools 202 comprise any commonly available devices for capturing video and audio data, encoding the data and transferring the encoded data to a computer via a standard interface. The example video capturing tools 202 of FIG. 2 comprise a camera 210 and a computer 212 having a video capture card, compression software and a mass storage device. The video capturing tools 202 are coupled to a video server 204 having streaming software and optionally having software tools enabling a user to manage the delivery of the data.
The video server 204 comprises any commonly available computing environment such as the exemplary computing environment of FIG. 1, as well as a media server environment that supports on-demand distribution of multimedia content. The media server environment of video server 204 comprises streaming software, one or more data storage units for storing compressed files containing multimedia data, and a communications control unit for controlling information transmission between video server 204 and video clients 208. The video server 204 is coupled to a network 206 such as a local-area network or a wide-area network. Audio, video, illustrated audio, animations, and other multimedia data types are stored on video server 204 and delivered by an application on-demand over network 206 to one or more video clients 208.
The video clients 208 comprise any commonly available computing environments such as the exemplary computing environment of FIG. 1. The video clients 208 also comprise
any commonly available application for viewing streamed multimedia file types, including QuickTime (a format for video and animation), RealAudio (a format for audio data), RealVideo (a format for video data), ASF (Advanced Streaming Format) and MP4 (the MPEG-4 file format). Two video clients 208 are shown in FIG. 2. However, those of ordinary skill in the art can appreciate that video server 204 may communicate with a plurality of video clients.

In operation, for example, a user clicks on a link to a video clip or other video source, such as camera 210 used for video conferencing or other purposes, and an application program for viewing streamed multimedia files launches from a hard disk of the video client 208. The application begins loading in a file for the video which is being transmitted across the network 206 from the video server 204. Rather than waiting for the entire video to download, the video starts playing after an initial portion of the video has come across the network 206 and continues downloading the rest of the video while it plays. The user does not have to wait for the entire video to download before the user can start viewing.
However, in existing systems there is a delay for such "on demand" interactive applications before the user can start viewing the initial portion of the video. The delay, referred to herein as a start-up delay or a seek delay, is experienced by the user between the time when the user signals the video server 204 to start transmitting data and the time when the data can be decoded by the video client 208 and presented to the user. However, the present invention, as described below, achieves low latency responses from video server 204 and thus reduces the start-up delay and the seek delay.

An example computing environment in which the present invention may be implemented has been described in this section of the detailed description. In one embodiment, a network architecture for on-demand distribution of multimedia content comprises video capture tools, a video server, a network and one or more video clients.
Data Flow for a Streaming Media System. The data flow for an example embodiment of a streaming media system is described by reference to FIGS. 3, 4A, 4B, 4C, 4D and 4E. FIG. 3 is a block diagram representing the data flow for a streaming media system 300 for use with the network architecture of FIG. 2. The streaming media system 300 comprises an encoder 302 which may be coupled to camera 210 or other real-time or uncompressed video sources, an encoder buffer 304, a network 306, a decoder buffer 308 and a decoder 310.
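The behavior of these components is easiest to see in terms of cumulative-bit schedules of the kind plotted in FIGS. 4A-4E: the number of bits in a buffer at any instant is the vertical distance between the schedule entering it and the schedule leaving it. A minimal sketch of the encoder side (the frame sizes and channel rate below are invented for illustration, not taken from the patent):

```python
# Cumulative-bit schedules on the encoder side of system 300.
# Schedule A: bits the encoder 302 has produced (bursty, per frame).
# Schedule B: bits released into the network 306 at the channel rate.
# Encoder buffer 304 occupancy is the vertical distance A - B.

frame_bits = [8000, 2000, 2000, 8000, 2000]  # variable-rate frames (assumed)
channel_per_tick = 4400   # channel bits per frame interval (assumed)

schedule_a, schedule_b = [], []
produced = released = 0
for bits in frame_bits:
    produced += bits
    schedule_a.append(produced)
    # The buffer cannot release more bits than have been produced.
    released = min(produced, released + channel_per_tick)
    schedule_b.append(released)

occupancy = [a - b for a, b in zip(schedule_a, schedule_b)]
print(occupancy)  # [3600, 1200, 0, 3600, 1200]
```

The decoder side is analogous, with schedules C and D bounding the decoder buffer 308. Note the occupancy dipping to zero at the third interval: that is an encoder buffer on the edge of underflow, the situation rate control must avoid.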
The encoder 302 is a hardware or software component that encodes and/or compresses the data for insertion into the encoder buffer 304. The encoder buffer 304 is one or more hardware or software components that store the encoded data until such time as it can be released into the
network 306. For live transmission such as video conferencing, the encoder buffer 304 may be as simple as a first-in first-out (FIFO) queue. For video on-demand from a video server 204, the encoder buffer 304 may be a combination of a FIFO queue and a disk file on the capture tools 202, transmission buffers between the capture tools 202 and the video server 204, and a disk file and output FIFO queue on the video server 204. The decoder buffer 308 is a hardware or software component that receives encoded data from the network 306, and stores the encoded data until such time as it can be decoded by decoder 310. The decoder 310 [...] a particular instant in time. A graph of times at which bits cross a given point is referred to herein as a schedule. The schedules at which bits pass point A 312, point B 314, point C 316, and point D 318 can be illustrated in a diagram such as shown in FIGS. 4A, 4B, 4C, 4D and 4E.

FIGS. 4A, 4B, 4C, 4D and 4E are schedules illustrating data flow for example embodiments of the streaming media system of FIG. 3. As shown in FIGS. 4A, 4B, 4C, 4D and 4E, the y-axis corresponds to the total number of bits that have crossed the respective points (i.e., point A, point B, point C, and point D in FIG. 3) and the x-axis corresponds to elapsed time. In the example shown in FIG. 4A, schedule A corresponds to the number of bits transferred from the encoder 302 to the encoder buffer 304. Schedule B corresponds to the number of bits that have left the encoder buffer 304 and entered the network 306. Schedule C corresponds to the number of bits received from the network 306 by the decoder buffer 308. Schedule D corresponds to the number of bits transferred from the decoder buffer 308 to the decoder 310.

In the example shown in FIG. 4B, the network 306 has a constant bit rate and a constant delay. As a result, schedules B and C are linear and are separated temporally by a constant transmission delay.

In the example shown in FIG. 4C, the network 306 is a packet network. As a result, schedules B and C have a staircase form. The transmission delay is generally not constant. Nevertheless, there exist linear schedules B' and C' that provide lower and upper bounds for schedules B and C respectively. Schedule B' is the latest possible linear schedule at which encoded bits are guaranteed to be available for transmission. Schedule C' is the earliest possible linear schedule at which received bits are guaranteed to be available for decoding. The gap between schedules B' and C' is the maximum reasonable transmission delay (including jitter and any retransmission time) plus an allowance for the packetization itself. In this way, a packet network can be reduced, essentially, to a constant bit rate, constant delay channel.

Referring now to the example shown in FIG. 4D, for real-time applications the end-to-end delay (from capture to presentation) must be constant; otherwise there would be temporal warping of the presentation. Thus, if the encoder and decoder have a constant delay, schedules A and D are separated temporally by a constant delay, as illustrated in FIG. 4D.

At any given instant in time, the vertical distance between schedules A and B is the number of bits in the encoder buffer, and the vertical distance between schedules C and D is the number of bits in the decoder buffer. If the decoder attempts to remove more bits from the decoder buffer than exist in the buffer (i.e., schedule D tries to occur ahead of schedule C), then the decoder buffer underflows and an error occurs. To prevent this from happening, schedule A must not precede schedule E, as illustrated in FIG. 4D. In FIG. 4D, schedules E and A are congruent to schedules C and D.

Likewise, the encoder buffer should never underflow; otherwise the channel is under-utilized and quality suffers. An encoder rate control mechanism therefore keeps schedule A between the bounds of schedules E and B. This implies that schedule D lies between the bounds of schedules C and F, where sch