(12) Patent Application Publication (10) Pub. No.: US 2004/0049793 A1
(43) Pub. Date: Mar. 11, 2004
Chou

(54) MULTIMEDIA PRESENTATION LATENCY MINIMIZATION

(76) Inventor: Philip A. Chou, Bellevue, WA (US)

Correspondence Address:
LEE & HAYES PLLC
421 W RIVERSIDE AVENUE, SUITE 500
SPOKANE, WA 99201

(21) Appl. No.: 10/658,898

(22) Filed: Sep. 10, 2003

Related U.S. Application Data

(63) Continuation of application No. 09/205,875, filed on Dec. 4, 1998, now Pat. No. 6,637,031.
Publication Classification

(51) Int. Cl.7 ..................................................... H04N 7/173

(52) U.S. Cl. ................................................................ 725/87

(57) ABSTRACT

Systems and methods for presenting time-varying multimedia content are described. In one aspect, a lower quality data stream for an initial portion of the multimedia content is received. The lower quality data stream is received at a rate faster than a real-time playback rate for the multimedia content. The lower quality data stream was encoded at a bit rate below a transmission rate. A higher quality data stream of a subsequent portion of the multimedia content is received. The higher quality data stream was encoded at a bit rate that equals the transmission rate. The initial portion and the subsequent portion of the multimedia content are presented at the real-time playback rate. Receiving the initial portion faster than the real-time playback rate provides for a reduction of latency due to buffering by a desired amount.
[FIG. 1 (front-page figure): block diagram of the exemplary computer system, showing processing unit 21, system bus 23, application programs 36, other program modules 37, program data 38, hard disk, magnetic disk, and optical drive interfaces, serial port interface 46, video adapter 48, monitor 47, and network connections.]
Patent Application Publication Mar. 11, 2004 Sheet 1 of 6    US 2004/0049793 A1

[FIG. 1]

Patent Application Publication Mar. 11, 2004 Sheet 2 of 6    US 2004/0049793 A1

[FIG. 2: example network architecture 200, with video server 204, network 206, and video clients 208.]

Patent Application Publication Mar. 11, 2004 Sheet 3 of 6    US 2004/0049793 A1

[FIG. 3]

Patent Application Publication Mar. 11, 2004 Sheet 4 of 6    US 2004/0049793 A1

[FIGS. 4A-4E: schedules of cumulative bits versus time at points A, B, C, and D, annotated with transmission delay, initial encoder buffer emptiness, and start-up delay.]

Patent Application Publication Mar. 11, 2004 Sheet 5 of 6    US 2004/0049793 A1

[FIGS. 5-7: bits-versus-time encoding and decoding schedules.]

Patent Application Publication Mar. 11, 2004 Sheet 6 of 6    US 2004/0049793 A1

[FIGS. 8-10: bits-versus-time schedules.]
US 2004/0049793 A1    Mar. 11, 2004

MULTIMEDIA PRESENTATION LATENCY MINIMIZATION
RELATED APPLICATIONS

[0001] This application is a continuation under 37 CFR 1.53(b) of U.S. patent application Ser. No. 09/205,875, titled "Multimedia Presentation Latency Minimization", filed on Dec. 4, 1998, commonly assigned hereto, and hereby incorporated by reference.
TECHNICAL FIELD

[0002] The present invention relates generally to multimedia communications and more specifically to latency minimization for on-demand interactive multimedia applications.
COPYRIGHT NOTICE/PERMISSION

[0003] A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
BACKGROUND

[0004] Information presentation over the Internet is changing dramatically. New time-varying multimedia content is now being brought to the Internet, and in particular to the World Wide Web (the web), in addition to textual HTML pages and still graphics. Here, time-varying multimedia content refers to sound, video, animated graphics, or any other medium that evolves as a function of elapsed time, alone or in combination. In many situations, instant delivery and presentation of such multimedia content, on demand, is desired.
[0005] "On-demand" is a term for a wide set of technologies that enable individuals to select multimedia content from a central server for instant delivery and presentation on a client (computer or television). For example, video-on-demand can be used for entertainment (ordering movies transmitted digitally), education (viewing training videos) and browsing (viewing informative audiovisual material on a web page), to name a few examples.
[0006] Users are generally connected to the Internet by a communications link of limited bandwidth, such as a 56 kilobits per second (Kbps) modem or an integrated services digital network (ISDN) connection. Even corporate users are usually limited to a fraction of the 1.544 megabits per second (Mbps) T-1 carrier rate. This bandwidth limitation poses a challenge to on-demand systems: it may be impossible to transmit a large amount of image or video data over a limited bandwidth in the short amount of time required for "instant delivery and presentation." Downloading a large image or video may take hours before presentation can begin. As a consequence, special techniques have been developed for on-demand processing of large images and video.
[0007] A technique for providing large images on demand over a communications link with limited bandwidth is progressive image transmission. In progressive image transmission, each image is encoded, or compressed, in layers, like an onion. The first (core) layer, or base layer, represents a low-resolution version of the image. Successive layers represent successively higher resolution versions of the image. The server transmits the layers in order, starting from the base layer. The client receives the base layer, and instantly presents to the user a low-resolution version of the image. The client presents higher resolution versions of the image as the successive layers are received. Progressive image transmission enables the user to interact with the server instantly, with low delay, or low latency. For example, progressive image transmission enables a user to browse through a large database of images, quickly aborting the transmission of unwanted images before they are completely downloaded to the client.
[0008] Similarly, streaming is a technique that provides time-varying content, such as video and audio, on demand over a communications link with limited bandwidth. In streaming, audiovisual data is packetized, delivered over a network, and played as the packets are being received at the receiving end, as opposed to being played only after all packets have been downloaded. Streaming technologies are becoming increasingly important with the growth of the Internet because most users do not have fast enough access to download large multimedia files quickly. With streaming, the client browser or application can start displaying the data before the entire file has been transmitted.
[0009] In a video-on-demand delivery system that uses streaming, the audiovisual data is often compressed and stored on a disk on a media server for later transmission to a client system. For streaming to work, the client side receiving the data must be able to collect the data and send it as a steady stream to a decoder or an application that is processing the data and converting it to sound or pictures. If the client receives the data more quickly than required, it needs to save the excess data in a buffer. Conversely, if the client receives the data more slowly than required, it needs to play out some of the data from the buffer. Storing part of a multimedia file in this manner before playing the file is referred to as buffering. Buffering can provide smooth playback even if the client temporarily receives the data more quickly or more slowly than required for real-time playback.
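The collect-and-buffer behavior described above can be sketched as a simple playout buffer. This is a minimal illustration only; the class and method names are hypothetical and not from this document:

```python
from collections import deque

class PlayoutBuffer:
    """Client-side buffer that absorbs uneven arrival rates (sketch)."""

    def __init__(self):
        self.queue = deque()
        self.bits_buffered = 0

    def receive(self, chunk_bits):
        # Excess data arriving faster than playback needs is saved here.
        self.queue.append(chunk_bits)
        self.bits_buffered += chunk_bits

    def play(self, bits_needed):
        # When data arrives too slowly, playback drains the buffer.
        played = 0
        while played < bits_needed and self.queue:
            chunk = self.queue.popleft()
            take = min(chunk, bits_needed - played)
            played += take
            if take < chunk:
                self.queue.appendleft(chunk - take)  # keep the remainder
        self.bits_buffered -= played
        return played  # may be less than bits_needed: buffer underflow

buf = PlayoutBuffer()
buf.receive(800)           # a burst arrives faster than real time
buf.receive(400)
print(buf.play(1000))      # steady playout draws 1000 bits -> 1000
print(buf.bits_buffered)   # 200 bits remain buffered
```

A real decoder buffer would track presentation timestamps rather than raw bit counts; the bit-count model matches the schedules discussed later in this document.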
[0010] There are two reasons that a client can temporarily receive data more quickly or more slowly than required for real-time playback. First, in a variable-rate transmission system such as a packet network, the data arrives at uneven rates. Not only does packetized data inherently arrive in bursts, but even packets of data that are transmitted from the sender at an even rate may not arrive at the receiver at an even rate. This is because individual packets may follow different routes, and the delay through any individual router may vary depending on the amount of traffic waiting to go through the router. The variability in the rate at which data is transmitted through a network is called network jitter.
[0011] A second reason that a client can temporarily receive data more quickly or more slowly than required for real-time playback is that the media content is encoded at a variable bit rate. For example, high-motion scenes in a video may be encoded with more bits than low-motion scenes. When the encoded video is transmitted at a relatively constant bit rate, the high-motion frames arrive at a slower rate than the low-motion frames. For both these reasons (variable-rate source encoding and variable-rate transmission channels), buffering is required at the client to allow a smooth presentation.
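The effect described in this paragraph can be illustrated numerically: at a constant channel rate, a frame arrives once its cumulative bits have crossed the channel, so larger high-motion frames are spaced further apart in time. The rates and frame sizes below are illustrative only:

```python
# At a constant channel rate, frame i arrives at (cumulative bits)/rate,
# so high-motion frames (more bits) arrive at a slower rate.
def arrival_times(frame_bits, rate_bps):
    times, cum = [], 0
    for bits in frame_bits:
        cum += bits
        times.append(cum / rate_bps)
    return times

rate = 100_000                                  # channel rate, bits/s
frames = [2_000, 2_000, 10_000, 10_000, 2_000]  # bits per encoded frame
print(arrival_times(frames, rate))
# low-motion frames arrive 0.02 s apart; high-motion frames 0.10 s apart
```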
[0012] Unfortunately, buffering implies delay, or latency. Start-up delay refers to the latency the user experiences after he signals the server to start transmitting data from the beginning of the content (such as when a pointer to the content is selected by the user) before the data can be decoded by the client system and presented to the user. Seek delay refers to the latency the user experiences after he signals the server to start transmitting data from an arbitrary place in the middle of the content (such as when a seek bar is dragged to a particular point in time) before the data can be decoded and presented. Both start-up and seek delays occur because even after the client begins to receive new data, it must wait until its buffer is sufficiently full to begin playing out of the buffer. It does this in order to guard against future buffer underflow due to network jitter and variable-bit-rate compression. For typical audiovisual coding on the Internet, start-up and seek delays between two and ten seconds are common.
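Under a simple model (not from this document), the delay is the time needed to fill the buffer to its target level at the arrival rate, which is why delays of several seconds are typical at modem speeds:

```python
# Illustrative model: the client waits until its buffer holds a target
# number of bits, so start-up delay = target bits / arrival rate.
def startup_delay_seconds(target_buffer_bits, arrival_rate_bps):
    return target_buffer_bits / arrival_rate_bps

# Guarding 5 seconds of content encoded at the 56 Kbps channel rate:
target_bits = 5 * 56_000
print(startup_delay_seconds(target_bits, 56_000))  # 5.0 seconds
```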
[0013] Large start-up and seek delays are particularly annoying when the user is trying to browse through a large amount of audiovisual content trying to find a particular video or a particular location in a video. As in the image browsing scenario using progressive transmission, most of the time the user will want to abort the transmission long before all the data are downloaded and presented. In such a scenario, delays of two to ten seconds between aborts seem intolerable. What is needed is a method for reducing the start-up and seek delays for such "on demand" interactive multimedia applications.
SUMMARY

[0014] Systems and methods for presenting time-varying multimedia content are described. In one aspect, a lower quality data stream for an initial portion of the multimedia content is received. The lower quality data stream is received at a rate faster than a real-time playback rate for the multimedia content. The lower quality data stream was encoded at a bit rate below a transmission rate. A higher quality data stream of a subsequent portion of the multimedia content is received. The higher quality data stream was encoded at a bit rate that equals the transmission rate. The initial portion and the subsequent portion of the multimedia content are presented at the real-time playback rate. Receiving the initial portion faster than the real-time playback rate provides for a reduction of latency due to buffering by a desired amount.
BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is a diagram of an exemplary computer system in which the invention may be implemented.

[0016] FIG. 2 is a diagram of an example network architecture in which embodiments of the present invention are incorporated.

[0017] FIG. 3 is a block diagram representing the data flow for a streaming media system for use with the computer network of FIG. 2.

[0018] FIGS. 4A, 4B, 4C, 4D, and 4E are schedules illustrating data flow for example embodiments of the streaming media system of FIG. 3.

[0019] FIG. 5 is a decoding schedule for multimedia content pre-encoded at a full bit rate.

[0020] FIG. 6 is a schedule showing the full bit rate encoding of FIG. 5 advanced by T seconds.

[0021] FIG. 7 is a schedule showing a low bit rate encoding of the content shown in FIG. 5.

[0022] FIG. 8 is a schedule showing the low bit rate encoding schedule of FIG. 7 advanced by T seconds and superimposed on the advanced schedule of FIG. 6.

[0023] FIG. 9 is a schedule showing the transition from the delivery of the low bit rate encoded stream of FIG. 7 to the data stream of FIG. 6, with a gap to indicate optional bit stuffing.

[0024] FIG. 10 is a schedule showing the advanced schedule of FIG. 6 with a total of RT bits removed from the initial frames.
DETAILED DESCRIPTION

[0025] In the following detailed description of the embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present inventions. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present inventions is defined only by the appended claims.
[0026] The present invention is a system for achieving low latency responses from interactive multimedia servers when the transmission bit rate is constrained. A server provides at least two different data streams. A first data stream is a low resolution stream encoded at a bit rate below the transmission bit rate. A second data stream is a normal resolution stream encoded at a bit rate equal to the transmission bit rate. The server initially transmits the low resolution stream faster than real time, at a bit rate equal to the transmission bit rate. The client receives the low resolution stream faster than real time, but decodes and presents the low resolution stream in real time. When the client buffer has grown sufficiently large to guard against future underflow by the normal resolution stream, the server stops transmission of the low resolution stream and begins transmission of the normal resolution stream. The system of the present invention reduces the start-up or seek delay for interactive multimedia applications such as video on-demand, at the expense of initially lower quality.
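The timing of the switch can be sketched arithmetically. This is an illustrative model, with rates and buffer targets chosen for the example only, not taken from this document: because the low resolution stream is encoded at r bits/s but transmitted at the channel rate R bits/s, each second of transmission delivers R/r seconds of playable content while only one second is consumed, so the buffer gains (R/r - 1) content-seconds per second.

```python
# Illustrative: how long the low resolution stream must be sent before
# the client buffer holds enough content to protect the normal
# resolution stream from underflow.
def seconds_until_switch(R, r, guard_seconds):
    assert r < R, "low resolution rate must be below the channel rate"
    gain_per_second = R / r - 1.0   # content-seconds gained per second
    return guard_seconds / gain_per_second

# A 14 Kbps low resolution stream over a 56 Kbps channel builds a
# 6-second guard against underflow in:
print(seconds_until_switch(56_000, 14_000, 6.0))  # 2.0 seconds
```

The lower the low resolution bit rate relative to the channel, the faster the buffer grows, trading initial picture quality for a shorter switch time.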
[0027] The detailed description of this invention is divided into four sections. The first section provides a general description of a suitable computing environment in which the invention may be implemented, including an overview of a network architecture for generating, storing and transmitting audio/visual data using the present invention. The second section illustrates the data flow for a streaming media system for use with the network architecture described in the first section. The third section describes the methods of exemplary embodiments of the invention. The fourth section is a conclusion which includes a summary of the advantages of the present invention.
[0028] An Exemplary Computing Environment.
[0029] FIG. 1 provides a brief, general description of a suitable computing environment in which the invention may be implemented. The invention will hereinafter be described in the general context of computer-executable program modules containing instructions executed by a personal computer (PC). Program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention may be practiced with other computer-system configurations, including hand-held devices, multiprocessor systems, microprocessor-based programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
[0030] FIG. 1 employs a general-purpose computing device in the form of a conventional personal computer 20, which includes processing unit 21, system memory 22, and system bus 23 that couples the system memory and other system components to processing unit 21. System bus 23 may be any of several types, including a memory bus or memory controller, a peripheral bus, and a local bus, and may use any of a variety of bus structures. System memory 22 includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system (BIOS) 26, stored in ROM 24, contains the basic routines that transfer information between components of personal computer 20. BIOS 26 also contains start-up routines for the system. Personal computer 20 further includes hard disk drive 27 for reading from and writing to a hard disk (not shown), magnetic disk drive 28 for reading from and writing to a removable magnetic disk 29, and optical disk drive 30 for reading from and writing to a removable optical disk 31 such as a CD-ROM or other optical medium. Hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to system bus 23 by a hard-disk drive interface 32, a magnetic-disk drive interface 33, and an optical-drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, those skilled in the art will appreciate that other types of computer-readable media which can store data accessible by a computer may also be used in the exemplary operating environment. Such media may include magnetic cassettes, flash-memory cards, digital versatile disks, Bernoulli cartridges, RAMs, ROMs, and the like.
[0031] Program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 and RAM 25. Program modules may include operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into personal computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial-port interface 46 coupled to system bus 23; but they may be connected through other interfaces not shown in FIG. 1, such as a parallel port, a game port, or a universal serial bus (USB). A monitor 47 or other display device also connects to system bus 23 via an interface such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown) such as speakers and printers.
[0032] Personal computer 20 may operate in a networked environment using logical connections to one or more remote computers such as remote computer 49. Remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device, or other common network node. It typically includes many or all of the components described above in connection with personal computer 20; however, only a storage device 50 is illustrated in FIG. 1. The logical connections depicted in FIG. 1 include local-area network (LAN) 51 and a wide-area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
[0033] When placed in a LAN networking environment, PC 20 connects to local network 51 through a network interface or adapter 53. When used in a WAN networking environment such as the Internet, PC 20 typically includes modem 54 or other means for establishing communications over network 52. Modem 54 may be internal or external to PC 20, and connects to system bus 23 via serial-port interface 46. In a networked environment, program modules depicted as residing within PC 20 or portions thereof may be stored in remote storage device 50. Of course, the network connections shown are illustrative, and other means of establishing a communications link between the computers may be substituted.
[0034] FIG. 2 is a diagram of an example network architecture 200 in which embodiments of the present invention are implemented. The example network architecture 200 comprises video capturing tools 202, a video server 204, a network 206 and one or more video clients 208.
[0035] The video capturing tools 202 comprise any commonly available devices for capturing video and audio data, encoding the data and transferring the encoded data to a computer via a standard interface. The example video capturing tools 202 of FIG. 2 comprise a camera 210 and a computer 212 having a video capture card, compression software and a mass storage device. The video capturing tools 202 are coupled to a video server 204 having streaming software and optionally having software tools enabling a user to manage the delivery of the data.
[0036] The video server 204 comprises any commonly available computing environment such as the exemplary computing environment of FIG. 1, as well as a media server environment that supports on-demand distribution of multimedia content. The media server environment of video server 204 comprises streaming software, one or more data storage units for storing compressed files containing multimedia data, and a communications control unit for controlling information transmission between video server 204 and video clients 208. The video server 204 is coupled to a network 206 such as a local-area network or a wide-area network. Audio, video, illustrated audio, animations, and other multimedia data types are stored on video server 204 and delivered by an application on-demand over network 206 to one or more video clients 208.
[0037] The video clients 208 comprise any commonly available computing environments such as the exemplary computing environment of FIG. 1. The video clients 208 also comprise any commonly available application for viewing streamed multimedia file types, including QuickTime (a format for video and animation), RealAudio (a format for audio data), RealVideo (a format for video data), ASF (Advanced Streaming Format) and MP4 (the MPEG-4 file format). Two video clients 208 are shown in FIG. 2. However, those of ordinary skill in the art can appreciate that video server 204 may communicate with a plurality of video clients.
[0038] In operation, for example, a user clicks on a link to a video clip or other video source, such as camera 210 used for video conferencing or other purposes, and an application program for viewing streamed multimedia files launches from a hard disk of the video client 208. The application begins loading in a file for the video which is being transmitted across the network 206 from the video server 204. Rather than waiting for the entire video to download, the video starts playing after an initial portion of the video has come across the network 206 and continues downloading the rest of the video while it plays. The user does not have to wait for the entire video to download before the user can start viewing. However, in existing systems there is a delay for such "on demand" interactive applications before the user can start viewing the initial portion of the video. The delay, referred to herein as a start-up delay or a seek delay, is experienced by the user between the time when the user signals the video server 204 to start transmitting data and the time when the data can be decoded by the video client 208 and presented to the user. However, the present invention, as described below, achieves low latency responses from video server 204 and thus reduces the start-up delay and the seek delay.
[0039] An example computing environment in which the present invention may be implemented has been described in this section of the detailed description. In one embodiment, a network architecture for on-demand distribution of multimedia content comprises video capture tools, a video server, a network and one or more video clients.
[0040] Data Flow for a Streaming Media System.
[0041] The data flow for an example embodiment of a streaming media system is described by reference to FIGS. 3, 4A, 4B, 4C, 4D and 4E. FIG. 3 is a block diagram representing the data flow for a streaming media system 300 for use with the network architecture of FIG. 2. The streaming media system 300 comprises an encoder 302 which may be coupled to camera 210 or other real time or uncompressed video sources, an encoder buffer 304, a network 306, a decoder buffer 308 and a decoder 310.
[0042] The encoder 302 is a hardware or software component that encodes and/or compresses the data for insertion into the encoder buffer 304. The encoder buffer 304 is one or more hardware or software components that stores the encoded data until such time as it can be released into the network 306. For live transmission such as video conferencing, the encoder buffer 304 may be as simple as a first-in first-out (FIFO) queue. For video on-demand from a video server 204, the encoder buffer 304 may be a combination of a FIFO queue and a disk file on the capture tools 202, transmission buffers between the capture tools 202 and the video server 204, and a disk file and output FIFO queue on the video server 204. The decoder buffer 308 is a hardware or software component that receives encoded data from the network 306, and stores the encoded data until such time as it can be decoded by decoder 310. The decoder 310 is a hardware or software component that decodes and/or decompresses the data for display.
[0043] In operation, each bit produced by the encoder 302 passes point A 312, point B 314, point C 316, and point D 318 at a particular instant in time. A graph of the times at which bits cross a given point is referred to herein as a schedule. The schedules at which bits pass point A 312, point B 314, point C 316, and point D 318 can be illustrated in a diagram such as shown in FIGS. 4A, 4B, 4C, 4D and 4E.
[0044] FIGS. 4A, 4B, 4C, 4D and 4E are schedules illustrating data flow for example embodiments of the streaming media system of FIG. 3. As shown in FIGS. 4A, 4B, 4C, 4D and 4E, the y-axis corresponds to the total number of bits that have crossed the respective points (i.e., point A, point B, point C, and point D in FIG. 3) and the x-axis corresponds to elapsed time. In the example shown in FIG. 4A, schedule A corresponds to the number of bits transferred from the encoder 302 to the encoder buffer 304. Schedule B corresponds to the number of bits that have left the encoder buffer 304 and entered the network 306. Schedule C corresponds to the number of bits received from the network 306 by the decoder buffer 308. Schedule D corresponds to the number of bits transferred from the decoder buffer 308 to the decoder 310.
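A schedule is just a function from elapsed time to cumulative bits, and the occupancy of a buffer is the vertical distance between the schedules on either side of it. A minimal sketch with illustrative linear schedules (the rates below are invented for the example):

```python
# Buffer occupancy at time t is the gap between the schedule feeding
# the buffer and the schedule draining it.
def occupancy(inflow_schedule, outflow_schedule, t):
    return inflow_schedule(t) - outflow_schedule(t)

# Encoder emits 100 bits/s (schedule A); the channel drains the encoder
# buffer at 80 bits/s (schedule B):
schedule_A = lambda t: 100 * t   # cumulative bits past point A
schedule_B = lambda t: 80 * t    # cumulative bits past point B
print(occupancy(schedule_A, schedule_B, 5))  # encoder buffer holds 100 bits
```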
[0045] In the example shown in FIG. 4B, the network 306 has a constant bit rate and a constant delay. As a result, schedules B and C are linear and are separated temporally by a constant transmission delay.
[0046] In the example shown in FIG. 4C, the network 306 is a packet network. As a result, schedules B and C have a staircase form. The transmission delay is generally not constant. Nevertheless, there exist linear schedules B' and C' that provide lower and upper bounds for schedules B and C respectively. Schedule B' is the latest possible linear schedule at which encoded bits are guaranteed to be available for transmission. Schedule C' is the earliest possible linear schedule at which received bits are guaranteed to be available for decoding. The gap between schedules B' and C' is the maximum reasonable transmission delay (including jitter and any retransmission time) plus an allowance for the packetization itself. In this way, a packet network can be reduced, essentially, to a constant bit rate, constant delay channel.
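One way to picture the bounding construction: given the (time, cumulative bits) points of a staircase schedule and a slope equal to the channel rate, the two parallel lines that sandwich the staircase differ only in their offsets. The helper and data below are illustrative, not from this document:

```python
# Sandwich a staircase schedule between two parallel lines of slope
# `rate`; the vertical gap between the lines is the delay allowance.
def bounding_offsets(points, rate):
    """Offsets (lo, hi) so that rate*t + lo <= bits <= rate*t + hi
    holds for every (t, bits) point on the staircase."""
    residuals = [bits - rate * t for t, bits in points]
    return min(residuals), max(residuals)

# (time in ms, cumulative bits) of a bursty arrival; channel rate
# 5 bits/ms:
staircase = [(0, 0), (100, 500), (150, 1500), (400, 2000)]
print(bounding_offsets(staircase, 5))  # (0, 750): lines 750 bits apart
```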
[0047] Referring now to the example shown in FIG. 4D, for real-time applications the end-to-end delay (from capture to presentation) must be constant; otherwise there would be temporal warping of the presentation. Thus, if the encoder and decoder have a constant delay, schedules A and D are separated temporally by a constant delay, as illustrated in FIG. 4D.
[0048] At any given instant in time, the vertical distance between schedules A and B is the number of bits in the encoder buffer, and the vertical distance between schedules C and D is the number of bits in the decoder buffer. If the decoder attempts to remove more bits from the decoder buffer than exist in the buffer (i.e., schedule D tries to occur ahead of schedule C), then the decoder buffer underflows and an error occurs. To prevent this from happening, schedule A must not precede schedule E, as illustrated in FIG. 4D. In FIG. 4D, schedules E and A are congruent to schedules C and D.
[0049] Likewise, the encoder buffer should never underflow; otherwise the channel is under-utilized and quality suffers. An encoder rate control mechanism therefore keeps schedule A between the bounds of schedules E and B. This implies that schedule D lies between the bounds of schedules C and F, where schedules E, A, and B are congruent to schedules C, D, and F, as shown in FIG. 4D. The decoder buffer must be at least as big as the encoder buffer (otherwise it would overflow), but it need not be any bigger. For the purpose of this description, it is assumed that the encoder and decoder buffers are the same size. (In practice the encoder buffer may be combined with a disk and a network transmitter buffer, and the decoder buffer may be combined with a network receiver buffer, so the overall buffer sizes at the transmitter and receiver may differ.) The end-to-end delay is the sum of the transmission delay and the decoder buffer delay (or equivalently the encoder buffer delay).
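The sum at the end of this paragraph is easy to make concrete. In the constant-rate model, the decoder buffer delay is the time to drain a full buffer at the channel rate; the numbers below are illustrative only:

```python
# End-to-end delay = transmission delay + decoder buffer delay,
# where the buffer delay is buffer size / channel rate (constant-rate
# model, illustrative numbers).
def end_to_end_delay(transmission_delay_s, decoder_buffer_bits, rate_bps):
    return transmission_delay_s + decoder_buffer_bits / rate_bps

# 0.2 s of network delay plus a 224 Kbit buffer on a 56 Kbps channel:
print(end_to_end_delay(0.2, 224_000, 56_000))  # about 4.2 seconds
```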
[0050] Referring now to FIG. 4E, in an on-demand system, the media content is pre-encoded and stored on a disk on a media server for later transmission to a client. In this case, an actual transmission schedule G may come an arbitrarily long time after the original transmission schedule B, as illustrated in FIG. 4E. Although schedule B is no longer the transmission schedule, it continues to guide the encoder's rate control mechanism, so that the decoder buffer size can be bounded.
[0051] In an on-demand system, a user experiences a delay between when the user signals the