Baumgartner et al.

[54] METHOD AND APPARATUS FOR SYNCHRONIZING AUDIO AND VIDEO DATA STREAMS IN A MULTIMEDIA SYSTEM

[75] Inventors: Donn M. Baumgartner; Thomas A. Dye, both of Austin, Tex.

[73] Assignee: Dell USA, L.P., Round Rock, Tex.

[21] Appl. No.: 255,604

[22] Filed: Jun. 8, 1994

[51] Int. Cl.: H04N 5/04

[52] U.S. Cl.: 348/515; 395/806

[58] Field of Search: 348/515; 395/154, 162-164; 345/122; 375/355; H04N 5/04, 5/12
`
`56
`
`References Cited
`U.S. PATENT DOCUMENTS
`Cooper.
`Re. 33,535
`2/1991
`Haji et al..
`4538,176
`8/1985
`Kouyama et al. .
`4,618,890
`10/1986
`Kouyama et al. .
`4,644,400
`2/1987
`Chapelle et al. .
`4,679,085
`7/1987
`Cooper.
`4,703,355
`10/1987
`4,750,034
`6/1988
`Lem.
`4,851,909
`7/1989
`Noske et al. .
`5,170.252
`12/1992
`Gear et al. .
`5,420,801
`5/1995
`Dockter et al. ..................... 364,514 R
`5,430,485 7/1995 Lankford et al.
`... 348/515
`5,471,576 11/1995 Yee ......................................... 395/154
`FOREIGN PATENT DOCUMENTS
`2305278 12/1990 Japan .................................... 34.8/515
`
`
US005642171A

[11] Patent Number: 5,642,171
[45] Date of Patent: Jun. 24, 1997
`
OTHER PUBLICATIONS

Nicolaou, Cosmos, "An Architecture for Real-Time Multimedia Communication Systems," IEEE Journal on Selected Areas in Communications, vol. 8, No. 3, Apr. 1990.

Little, Thomas D. C. and Arif Ghafoor, "Synchronization and Storage Models for Multimedia Objects," IEEE Journal on Selected Areas in Communications, vol. 8, No. 3, Apr. 1990.
`Primary Examiner-Sherrie Hsia
`57
`ABSTRACT
`Amethod and apparatus for synchronizing audio and video
`data streams in a computer system during a multimedia
`presentation to produce a correctly synchronized presenta
`tion. The preferred embodiment of the invention utilizes a
`nonlinear feedback method for data synchronization. The
`method of the present invention periodically queries each
`driver for the current audio and video position (or frame
`number) and calculates the synchronization error. The syn
`chronization error is used to determine a tempo value
`adjustment to one of the data stream designed to place the
`video and audio back in sync. The method then adjusts the
`audio or video tempo to maintain the audio and video data
`streams in synchrony. In the preferred embodiment of the
`invention, the video tempo is changed nonlinearly over time
`to achieve a match between the video position and the
`equivalent audio position. The method applies a Smoothing
`function to the determined tempo value to prevent overcom
`pensation. The method of the present invention can operate
`in any hardware system and in any software environment
`and can be adapted to existing systems with only minor
`modifications.
`
`40 Claims, 7 Drawing Sheets
`
`(
`
`
`
`Cetermine tempo
`
`525
`
`530
`
`so wolve
`
`Adjust tempo using
`a smoothing function
`
`532 --
`
`538
`
`Yes
`
`Audio
`ohead of audio
`status reporled
`OS bod
`
`error=3 and
`lost empg
`nct t cinq
`ge
`
`548
`Set tempo to nominal role
`
`tempt
`550 - M.
`Set last temp3 to notinal ote
`
`552
`
`temp3
`lost-tempo
`
`Store used tempo ic
`nexl Sync coll
`
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 1 of 7          5,642,171

[FIG. 1: block diagram of the multimedia computer system]
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 2 of 7          5,642,171

[FIG. 2]
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 3 of 7          5,642,171

[FIG. 3]
`
`
`
`
`
`
`
`
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 4 of 7          5,642,171

[FIG. 4]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 5 of 7          5,642,171

FIG. 5A: Synchronization module (flowchart, first part)

    Synchronization module (entry)
    Call audio driver to obtain wave rate status (504)
    Calculate and store audio frame rate
    Initialize variables (506)
    Determine current video frame number (508)
    Determine current audio position (510)
    Calculate equivalent audio frame number (512)
    Calculate synchronization error quantity (516)
    Audio too far ahead and audio playing? (518)  Yes: Stop audio (520)
    Audio paused and video caught up? (522)  Yes: Restart audio (524)
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 6 of 7          5,642,171

FIG. 5B: Synchronization module (flowchart, continued)

    Determine tempo (526)
    Video started? (528)  Yes: Set tempo to slow value (530)
    Adjust tempo using a smoothing function (532)
    Audio ahead, or audio status reported as bad?  Yes: Set tempo to nominal rate (536)
    Audio data available?  No: exit
    Sync error = 0 and last tempo not = nominal rate?  Yes: Set tempo to nominal rate (548), set last tempo to nominal rate (550)
    Synchronization error > tolerance?  Yes: Adjust tempo (552)
    Store used tempo for next sync call (554)
`
`
`
`
`
U.S. Patent          Jun. 24, 1997          Sheet 7 of 7          5,642,171

FIG. 6: Common starting point (602) for the audio data stream (604) and the video data stream (606)
`
`
`
`
`
`1.
`METHOD AND APPARATUS FOR
`SYNCHRONZING AUDIO AND WIDEO DATA
`STREAMS IN A MULTIMEDIA SYSTEM
`
`5,642,171
`
`2
bytes of storage, if the data is not compressed. Vector-based images are created by defining the end points,
`thickness, color, pattern and curvature of lines and solid
`objects comprised within the image. Thus, a vector-based
`image includes a definition which consists of a numerical
`representation of the coordinates of the object, referenced to
`a corner of the image.
`Bit-mapped images are the most prevalent type of image
`storage format, and the most common bit-mapped-image file
`formats are as follows. A file format referred to as BMP is
`used for Windows bit-map files in 1-, 2-, 4-, 8-, and 24-bit
color depths. BMP files contain a bit-map header that defines the size of the image, the number of color planes, the type
`of compression used (if any), and the palette used. The
`Windows DIB (device-independent bit-map) format is a
`variant of the BMP format that includes a color table
`defining the RGB (red green blue) values of the colors used.
`Other types of bit-map formats include the TIF (tagged
`image format file), the PCX (Zsoft Personal Computer
Paintbrush Bitmap) file format, the GIF (graphics interchange file) format, and the TGA (Texas Instruments
`Graphic Architecture) file format.
`The standard Windows format for bit-mapped images is a
`256-color device-independent bit map (DIB) with a BMP
`(the Windows bit-mapped file format) or sometimes a DIB
`extension. The standard Windows format for vector-based
`images is referred to as WMF (Windows metafile).
`Compression
`Full-motion video implies that video images shown on the
`computer's screen simulate those of a television set with
`identical (30 frames-per-second) frame rates, and that these
`images are accompanied by high-quality stereo sound. A
`large amount of storage is required for high-resolution color
`images, not to mention a full-motion video sequence. For
`example, a single frame of NTSC video at 640-by-400-pixel
resolution with 16-bit color requires 512K of data per frame. At 30 frames per second, over 15 Megabytes of data storage
`are required for each second of full motion video. Due to the
`large amount of storage required for full motion video,
`various types of video compression algorithms are used to
`reduce the amount of necessary storage. Video compression
`can be performed either in real-time, i.e., on the fly during
`video capture, or on the stored video file after the video data
`has been captured and stored on the media. In addition,
`different video compression methods exist for still graphic
`images and for full-motion video.
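The figures above follow directly from the frame geometry. A minimal arithmetic check in C, assuming exactly 2 bytes per pixel for 16-bit color and a 30 frame-per-second rate:

    #include <stdio.h>

    /* Storage estimate quoted above: one 640-by-400 frame at 16-bit
     * (2-byte) color, played back at 30 frames per second. */
    int main(void)
    {
        const long width = 640;
        const long height = 400;
        const long bytes_per_pixel = 2;        /* 16-bit color */
        const long frames_per_second = 30;

        long frame_bytes = width * height * bytes_per_pixel;    /* 512,000 bytes, about 512K */
        long second_bytes = frame_bytes * frames_per_second;    /* 15,360,000 bytes, about 15 MB */

        printf("bytes per frame:  %ld\n", frame_bytes);
        printf("bytes per second: %ld\n", second_bytes);
        return 0;
    }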
`Examples of video data compression for still graphic
`images are RLE (run-length encoding) and JPEG (Joint
Photographic Experts Group) compression. RLE is the standard compression method for Windows BMP and DIB files. The RLE compression method operates by testing for duplicated pixels in a single line of the bit map and stores the
`number of consecutive duplicate pixels rather than the data
`for the pixel itself. JPEG compression is a group of related
`standards that provide either lossless (no image quality
`degradation) or lossy (imperceptible to severe degradation)
`compression types. Although JPEG compression was
`designed for the compression of still images rather than
`video, several manufacturers supply JPEG compression
`adapter cards for motion video applications.
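The single-line duplicate-counting idea described above can be sketched as follows; this is an illustrative run-length encoder for one scan line of 8-bit pixels, not the exact BMP/DIB RLE8 byte layout.

    #include <stddef.h>

    /* Illustrative run-length encoder for one scan line of 8-bit pixels.
     * Consecutive duplicate pixels are replaced by a (count, value) pair.
     * Returns the number of bytes written to 'out' (two bytes per run). */
    size_t rle_encode_line(const unsigned char *line, size_t width,
                           unsigned char *out)
    {
        size_t in = 0, used = 0;
        while (in < width) {
            unsigned char value = line[in];
            size_t run = 1;
            /* Count consecutive duplicates, capped at 255 so the count fits in a byte. */
            while (in + run < width && line[in + run] == value && run < 255)
                run++;
            out[used++] = (unsigned char)run;   /* run length  */
            out[used++] = value;                /* pixel value */
            in += run;
        }
        return used;
    }

A scan line of mostly unchanged background therefore collapses to a handful of (count, value) pairs, which is why RLE works well for flat-colored bit maps.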
`In contrast to compression algorithms for still images,
most video compression algorithms are designed to compress full motion video. Video compression algorithms for motion video generally use a concept referred to as interframe compression, which involves storing only the differences
`10
`
`15
`
`20
`
between successive frames in the data file. Interframe
`compression begins by digitizing the entire image of a key
`frame. Successive frames are compared with the key frame,
`and only the differences between the digitized data from the
`key frame and from the successive frames are stored.
`Periodically, such as when new scenes are displayed, new
`key frames are digitized and stored, and subsequent com
`parisons begin from this new reference point. It is noted that
`interframe compression ratios are content-dependent, i.e., if
the video clip being compressed includes many abrupt scene transitions from one image to another, the compression is
`less efficient. Examples of video compression which use an
`interframe compression technique are MPEG, DVI and
`Indeo, among others.
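A minimal sketch of the key-frame-plus-differences idea, assuming uncompressed 8-bit frames of a fixed size; actual codecs such as MPEG, DVI and Indeo encode the differences far more compactly.

    #include <string.h>

    #define FRAME_PIXELS (320 * 240)   /* example frame size (an assumption) */

    /* One changed pixel relative to the current key frame. */
    struct pixel_delta {
        long index;             /* pixel position within the frame */
        unsigned char value;    /* new value at that position      */
    };

    /* Interframe step: record only the pixels that differ from the key frame.
     * Returns the number of deltas written; a frame nearly identical to the
     * key frame produces very few. */
    long diff_against_key(const unsigned char *key, const unsigned char *frame,
                          struct pixel_delta *deltas)
    {
        long n = 0;
        for (long i = 0; i < FRAME_PIXELS; i++) {
            if (frame[i] != key[i]) {
                deltas[n].index = i;
                deltas[n].value = frame[i];
                n++;
            }
        }
        return n;
    }

    /* At a scene change, the encoder stores the full frame as the new key frame. */
    void store_new_key(unsigned char *key, const unsigned char *frame)
    {
        memcpy(key, frame, FRAME_PIXELS);
    }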
`MPEG (Moving Pictures Experts Group) compression is
`a set of methods for compression and decompression of full
`motion video images that uses the interframe compression
`technique described above. The MPEG standard requires
`that sound be recorded simultaneously with the video data,
`and the video and audio data are interleaved in a single file
`to attempt to maintain the video and audio synchronized
`during playback. The audio data is typically compressed as
`well, and the MPEG standard specifies an audio compres
`sion method referred to as ADPCM (Adaptive Differential
`Pulse Code Modulation) for audio data.
`A standard referred to as Digital Video Interactive (DVI)
`format developed by Intel Corporation is a compression and
`storage format for full-motion video and high-fidelity audio
`data. The DVI standard uses interframe compression tech
`niques similar to that of the MPEG standard and uses
`ADPCM compression for audio data. The compression
`method used in DVI is referred to as RTV 2.0 (real time
`video), and this compression method is incorporated into
`Intel's AVK (audio/video kernel) software for its DVI prod
`uct line. IBM has adopted DVI as the standard for displaying
`video for its Ultimedia product line. The DVI file format is
`based on the Intel i750 chipset and is supported through the
`Media Control Interface (MCI) for Windows. Microsoft and
`Intel jointly announced the creation of the DVMCI (digital
`video media control interface) command set for Windows
`3.1 in 1992.
The Microsoft Audio Video Interleaved (AVI) format is a
`special compressed file structure format designed to enable
`video images and synchronized sound stored on CD-ROMs
to be played on PCs with standard VGA displays and audio adapter cards. The AVI compression method uses an interframe method, i.e., the differences between successive
`frames are stored in a manner similar to the compression
`methods used in DVI and MPEG. The AVI format uses
`symmetrical software compression-decompression
`techniques, i.e., both compression and decompression are
`performed in real time. Thus AVI files can be created by
`recording video images and sound in AVI format from a
VCR or television broadcast in real time, if enough free hard
`disk space is available.
`In the AVI format, data is organized so that coded frame
`numbers are located in the middle of an encoded data file
containing the compressed audio and compressed video. The
`digitized audio and video data are organized into a series of
`frames, each having header information. Each frame of the
`audio and video data streams is tagged with a frame number
`that typically depends upon the frame rate. For example, at
`every 33 milliseconds (ms) or a 30th of a second, a frame
`number is embedded in the header of the video frame and at
`every 30th of a second, or 33 ms, the same frame number is
`embedded in the header of the audio track. The number
assigned to the frames is, therefore, coordinated so that the
`corresponding audio and video frames are originally tagged
`with the same number. Therefore, since the frames are
`initially received simultaneously, the frames can actually be
`preprocessed so that tag codes are placed into the header
`files of the audio and the video for tracking the frame
`number and position of the audio and video tracks.
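The tagging described above can be pictured with the following illustrative headers. The real AVI RIFF chunk layout differs; the point is only that corresponding audio and video frames carry the same frame number, derived from a common 30 frame-per-second clock.

    /* Illustrative frame headers only; not the actual AVI chunk format. */
    struct video_frame_header { unsigned long frame_number; };
    struct audio_frame_header { unsigned long frame_number; };

    /* Frame number for a stream position given in milliseconds, at 30
     * frames per second (one frame every 33 1/3 ms). */
    unsigned long frame_number_at(unsigned long ms)
    {
        return (ms * 30UL) / 1000UL;
    }

    /* Tag a corresponding audio/video frame pair with the same number. */
    void tag_pair(struct video_frame_header *v, struct audio_frame_header *a,
                  unsigned long ms)
    {
        unsigned long n = frame_number_at(ms);
        v->frame_number = n;
        a->frame_number = n;
    }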
`In the AVI format, the audio and video information are
`interleaved (alternated in blocks) in the CD-ROM to mini
`mize delays that would result from using separate tracks for
`video and audio information. Also, the audio and video data
`are interleaved to synchronize the data as it is stored on the
`system. This is done in an attempt to synchronize the audio
`and video data during playback.
`The Apple QuickTime format was developed by Apple for
`displaying animation and video on Macintosh computers,
`and has become a de facto multimedia standard. Apple's
`QuickTime and Microsoft's AVI take a parallel approach to
`the presentation of video stored on CD-ROMs, and the
`performance of the two systems is similar. The QuickTime
`format, like AVI, uses software compression and decom
`pression techniques but also can employ hardware devices,
`similar to those employed by DVI, to speed processing. The
`Apple QuickTimeformat became available for the PC under
`Microsoft Windows in late 1992.
`As mentioned above, the audio and video data streams in
`a multimedia presentation are processed by separate hard
`ware subsystems under the control of separate device driv
`ers. The audio and video data are separated into separate data
`streams that are then transmitted to separate audio and video
`subsystems. The video data is transmitted to the video
`subsystem for display, and the audio data is transmitted to
`the sound subsystem for broadcast. These two subsystems
`are addressed by separate drivers, and each driver is loaded
`dynamically by the operating system during a multimedia
`presentation. In an operating system that is multi-tasking,
`has multiple drivers, or has multiple windows, the time
`period between the servicing of drivers is indeterminate. If
`a driver is not serviced by the operating system in time for
the next frame, a portion of the multimedia system may
`stall, resulting in the audio not being synchronized with the
`video. When the audio and video portions of a multimedia
`presentation become unsynchronized, many times this lack
`of synchronization is noticeable to the viewer, resulting in a
`less pleasing display. One result of audio and video data
`being out of sync is that the viewer may hear words that do
`not match the lips of the speaker, a situation commonly
`called "out of lip sync."
`Therefore, many times the corresponding audio and video
`frames of a multimedia presentation are not played synchro
`nously together. The reasons for the audio and video data
`streams falling out of sync during a presentation include the
`inherent decoupling of the audio and video data streams in
`separate subsystems in conjunction with system bottlenecks
`and performance issues associated with the large amounts of
`data that are required to be manipulated during a multimedia
`presentation. As mentioned above, full motion video clips
`with corresponding audio require massive amounts of sys
`tem resources to process. However, a considerably greater
`amount of processing is required to display the video data
`than is required for the audio data. First the video data must
`be decompressed either in software or in a codec
`(compression-decompression) device. If the color depth of
`the video is higher than that of the display, such as when an
AVI file with 16-bit video is played on an 8-bit display, the
`computer must dither colors to fit within the display's color
`restrictions. Also, if the selected playback window size is
`inconsistent with the resolution at which the video was
`captured, the computer is required to scale each frame.
`
`
`In addition to the greater amount of processing required
`for video data, the amount of video processing can vary
`considerably, thus further adversely affecting synchroniza
`tion. For example, one variable that affects the speed of
video playback is the decompression performed on the video
`data. The performance of software decompression algo
`rithms can vary for a number of reasons. For example, due
`to the interframe method of compressing data, the number of
`bytes that comprise each video frame is variable, depending
`on how similar the prior video frame is to the current video
`frame. Thus, more time is required to process a series of
`frames in which background is moving than is required to
`process a series of frames containing only minor changes in
`the foreground. Other variables include whether the color
`depth of the video equals that of the display and whether the
`selected playback window size is consistent with the reso
`lution at which the video was captured, as mentioned above.
`In addition, a slow CPU adversely affects every stage in
`the processing of a video file for playback. A sluggish hard
disk or CD-ROM controller can also adversely affect performance as can the performance of the display controller or
`video card. Also, other demands can be made on the system
`as a result of something as simple as a mouse movement.
`While the above processing is being performed on the video
`and audio data, and while other demands are made on
`system resources, it becomes very difficult to ensure that the
`audio and video data remain in synchronization.
`Video for Windows includes a method which presumably
`attempts to maintain the audio and video portions of a
`multimedia display in sync, i.e., attempts to adapt when the
`computer system cannot keep up with either the video or
`audio portions of the display. Video for Windows bench
`marks the video hardware when it first begins execution as
`well as every time thereafter that the default display is
changed. The results of these tests are used to determine a particular system's baseline display performance at various
`resolutions and color depths. Video for Windows then uses
`this information regarding the capabilities of the video
`system to adjust the video frame rate to match the bench
`marked performance for the default display. Video for
`Windows maintains the continuity of the audio at all costs
`because a halting audio track is deemed more distracting.
`When the burden of the video playback is such that the
`system cannot keep up, Video for Windows skips frames
`during playback or adjusts the frame rate continuously as the
`system's resource usage patterns change.
However, the method used by Video for Windows in adjusting the video rate to match the benchmarked performance of the default display results in an average frame rate
`suitable for the benchmark determined at the time the default
`was last changed. Attempts to display video frames contain
`ing an unusually heavy amount of non-repetitive data will
`slow processing down to the point where the benchmarked
`frame rate is no longer useful. When this happens, video
`frames are skipped because the burden of processing the
`video data becomes too great to preserve lip-sync in the
`display. The result can be "jerky” movement of the images
`of persons speaking as noted in Discover Windows 3.1
Multimedia, by Roger Jennings (Que Corp. 1992), pp. 105-106. Thus, the method used by Video for Windows has
`proven to be inadequate, i.e., the video and audio portions
`still fall out of sync or exhibit "jerky" movement during a
`presentation.
`Shortcomings inherent in decoupled audio multimedia
`systems have been a problem for some time, and various
`efforts have been made to synchronize the audio and video
portions of a presentation. There has been a recognized need
`in the industry for a solution to this problem. However, no
`satisfactory solution has been found, prior to the present
`invention.
`Therefore, a method and apparatus is desired which
`provides improved synchronization between digital audio
`and digital video data streams in a multimedia computer
`system, i.e., a method is needed to assure that corresponding
`video and audio frames are played back together. A syn
`chronization method is also desired that does not require the
`use of an encoding procedure prior to the processing of
`audio and video digital signals. It is also desirable to provide
`a multimedia synchronization system that is capable of
`functioning consistently whether video and audio data are
`delivered to the system in separate files or interleaved in one
`file.
`
SUMMARY OF THE INVENTION
`The present invention comprises a method and apparatus
`for synchronizing separate audio and video data streams in
`a multimedia system. The preferred embodiment of the
`invention utilizes a nonlinear feedback method for data
synchronization that is independent of hardware, the operating system and the video and audio drivers used. The
`system and method of the present invention does not require
`that incoming data be time stamped, or that any timing
`information exist in the video data stream relative to audio
`and video data correspondence. Further, the data is not
`required to be modified in any way prior to the transfer of
`data to the video and audio drivers, and no synchronization
`information need be present in the separated audio and video
`data streams that are being synchronized by the system and
`method of the present invention. The preferred embodiment
`of the present invention requires that there be a common
starting point for the audio and video data, i.e., that there be a time index of zero where the audio and video are both in
`synchrony, such that the first byte of audio and video digital
`data are generated simultaneously.
`The synchronization method of the present invention is
`called periodically during a multimedia display to synchro
`nize the video and audio data streams. In the preferred
embodiment, a periodic timer is set to interrupt the multimedia operating system at uniform intervals during a multimedia display and direct the operating system to invoke the
`synchronization method of the present invention. When the
`synchronization method is invoked, the method first queries
`the video driver to determine the current video frame
`position and then queries the audio driver to determine the
`current audio position. The current audio position is then
`used to compute the equivalent audio frame number. The
`synchronization method compares the video and audio
`frame positions and computes a synchronization error value,
`which is essentially the number of frames by which the
`video frame position is in front of or behind the current
`audio frame position.
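A minimal sketch of this periodic query and error computation, in C, follows. The driver-query functions and the use of milliseconds for the audio position are illustrative assumptions; the actual query mechanism belongs to whatever video and audio drivers are in use.

    /* Driver queries (assumed interfaces for illustration). */
    extern long query_video_frame(void);        /* current video frame number */
    extern long query_audio_position_ms(void);  /* current audio position, ms */

    /* Audio frame rate, computed once from the wave format (frames per second). */
    static double audio_frames_per_second = 30.0;

    /* Equivalent audio frame number for the current audio position. */
    static long equivalent_audio_frame(long audio_ms)
    {
        return (long)((audio_ms / 1000.0) * audio_frames_per_second);
    }

    /* Synchronization error: frames by which video leads (+) or trails (-) audio. */
    long synchronization_error(void)
    {
        long video_frame = query_video_frame();
        long audio_frame = equivalent_audio_frame(query_audio_position_ms());
        return video_frame - audio_frame;
    }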
`The synchronization error is used to assign a tempo value
`meaningful to either the video driver or the audio driver. In
`the preferred embodiment, the method adjusts the video
`tempo to maintain video synchronization, but in an alternate
`embodiment the method adjusts the audio tempo to maintain
`synchronization. Once a video tempo value has been
`determined, the preferred method adjusts this video tempo
`value by applying a smoothing function, i.e., a weighted
`average of prior tempo values, to the determined tempo
`value. If the synchronization error is determined to be
`greater than a defined tolerance, i.e., if the audio and video
data streams are more than a certain number of frames out of
`sync, and if the tempo value is not equal to the last tempo
`value previously sent to the video driver, then the method
`adjusts the video frame speed by passing the tempo value to
`the video driver.
`If the synchronization error is approximately 0, i.e., the
`audio and video data streams are substantially in sync, and
`the prior determined tempo value passed to the video driver
`was not the nominal rate, i.e., the rate intended to exactly
match the audio rate, the method passes a video tempo value
`at the nominal rate to the video driver. In other words, if the
`audio and video data streams are in sync, a tempo value at
`the nominal rate is passed to the video driver. This removes
any effects of the smoothing function, which otherwise
`would change the tempo value to other than the nominal
`rate.
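A sketch of the tempo update described in the preceding paragraphs follows. The mapping from synchronization error to a raw tempo, the weights of the smoothing function, the tolerance, and set_video_tempo() are illustrative assumptions; the text specifies only that the tempo is smoothed with a weighted average of prior values, passed to the video driver when it changes, and reset to the nominal rate once the streams are back in sync.

    #define NOMINAL_TEMPO   100L    /* tempo value meaning "play at the nominal rate" */
    #define SYNC_TOLERANCE    2L    /* frames of error tolerated before adjusting     */

    extern void set_video_tempo(long tempo);    /* pass a tempo value to the video driver */

    static long last_tempo = NOMINAL_TEMPO;     /* tempo sent on the previous call */

    void update_tempo(long sync_error)          /* sync_error = video frame - audio frame */
    {
        if (sync_error > SYNC_TOLERANCE || sync_error < -SYNC_TOLERANCE) {
            /* Raw tempo: slow the video when it leads the audio, speed it up when it trails. */
            long raw_tempo = NOMINAL_TEMPO - sync_error;

            /* Smoothing: weighted average of the prior tempo and the raw tempo,
             * so a single bad sample does not cause overcompensation. */
            long tempo = (3L * last_tempo + raw_tempo) / 4L;

            if (tempo != last_tempo) {
                set_video_tempo(tempo);
                last_tempo = tempo;             /* remembered for the next call */
            }
        } else if (sync_error == 0 && last_tempo != NOMINAL_TEMPO) {
            /* Back in sync: remove any effect of the smoothing function. */
            set_video_tempo(NOMINAL_TEMPO);
            last_tempo = NOMINAL_TEMPO;
        }
    }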
`The method also determines if the audio is too far ahead
`of the video and if the audio is playing. If so, the audio is
`paused to allow the video to catch up. If the method
`determines that the audio is paused and that the video has
`caught up, the method restarts the audio. The method saves
`the video tempo value for comparison during the next call by
`the periodic timer and surrenders control to the operating
`system until called again.
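The pause and restart step can be sketched in the same style; the lead threshold and the audio-driver calls are illustrative assumptions.

    #define AUDIO_LEAD_LIMIT 10L    /* frames of audio lead that trigger a pause (assumed) */

    extern void pause_audio(void);      /* assumed audio-driver calls */
    extern void restart_audio(void);

    static int audio_paused = 0;

    void check_audio_pause(long sync_error)   /* sync_error = video frame - audio frame */
    {
        if (!audio_paused && sync_error < -AUDIO_LEAD_LIMIT) {
            /* Audio is playing and too far ahead of the video: pause it. */
            pause_audio();
            audio_paused = 1;
        } else if (audio_paused && sync_error >= 0) {
            /* Audio is paused and the video has caught up: restart it. */
            restart_audio();
            audio_paused = 0;
        }
    }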
`Therefore, the present invention provides an improved
`method of synchronizing the audio and video data streams
`during a multimedia presentation to provide a correctly
`synchronized presentation. The present invention permits
`the use of existing software drivers and multimedia operat
`ing systems. Further, the method of the present invention
`operates independently of where the audio and video data
`are stored as well as the type of operating system or drivers
being used. Thus the present invention operates regardless of
`whether the audio and video data are interleaved in one file,
`stored on different media, or stored in separate computer
`systems. Also, the present invention does not require any
`type of time stamping or tagging of data, and thus does not
`require any modification of the video or audio data. Further,
`the present invention operates regardless of the type of
`compression/decompression algorithm used on the video
`data.
`
Referring now to FIG. 1, a block diagram of a multimedia computer system according to one embodiment of the present invention is shown. It is noted that FIG. 1
`illustrates only portions of a functioning computer system,
`and those elements not necessary to the understanding of the
`operation of the present invention have been omitted for
`simplicity. As shown, the multimedia computer system
`includes a CPU 102 coupled to a host bus 106. Main
memory 104 is also coupled to the host bus 106. The host
`bus 106 is coupled to an expansion bus 112 by means of a
`bus controller 110. The expansion bus may be any of various
types including the AT (advanced technology) bus or industry standard architecture (ISA) bus, the EISA (extended
`industry standard architecture) bus, a microchannel (MCA)
`bus, etc. A video card or video adapter such as a VGA (video
`graphics array) card 120 is coupled to the expansion bus 112
`and is adapted to interface to a video monitor 122, as shown.
`The computer system may also include a video accelerator
`card 124 for performing compression/decompression
`(codec) functions. However, in the preferred embodiment
`the computer system does not include a video accelerator
`card. An audio card or sound card 130 is also coupled to the
`expansion bus 112 and interfaces to a speaker 132. The audio
`board 130 is preferentially a Sound Blaster II brand card
`made by Creative Labs, Inc. of Milpitas, Calif.
`Various mass storage devices are also coupled to the
`expansion bus 112, preferably including a CD-ROM 140,
`and a hard drive 142, as well as others. One or more of these
`mass storage devices store video and audio data which is
`used during presentation of a multimedia display. The audio
`and video data may