`Bheda et al.
`
`[54] AUDIO/VIDEO SUBPROCESSOR METHOD
`AND STRUCTURE
`
`[75] Inventors: Hemant Bheda, Cupertino; Ygal Arbel,
`E?l?ogta’hganha Snmvasan’ Fremont’
`
`
US006002441A
[11] Patent Number: 6,002,441
[45] Date of Patent: Dec. 14, 1999
`
5,617,502   4/1997  Ort et al. .................... 386/97
5,638,531   6/1997  Crump et al. .................. 711/123
5,664,218   9/1997  Kim et al. .................... 348/15
5,742,361   4/1998  Nakase et al. ................. 348/423
`
[73] Assignee: National Semiconductor Corporation, Santa Clara
`
Primary Examiner: Vu Le
Attorney, Agent, or Firm: Steven F. Caserza; Flehr Hohbach Test Albritton & Herbert
`
[21] Appl. No.: 08/742,583
[22] Filed: Oct. 28, 1996
[51] Int. Cl.6 ............ H04N 7/32
[52] U.S. Cl. ............. 348/423; 348/384; 348/390
[58] Field of Search ...... 348/423, 384, 390, 14, 512, 515, 416; 382/232, 234, 307; 395/800.32, 800.33, 800.34, 800.35; 386/98; 364/131-134; 711/1, 100, 147; 375/240; 712/32-35; H04N 7/32
`
[56] References Cited

U.S. PATENT DOCUMENTS
4,772,956   9/1988  Roche et al. .................. 358/433
5,212,742   5/1993  Normile et al. ................ 382/234
5,253,078  10/1993  Balkanski et al. .............. 358/426
5,270,832  12/1993  Balkanski et al. .............. 358/432
5,379,356   1/1995  Purcell et al. ................ 348/416
`
`' ' ' '
`
`li;
`
`fuzknflac'i ' ' ' ' ' ' ' ' '
`an or """ "
`’
`’
`348/402
`9/1996 R tt
`t
`l.
`..
`5,557,538
`348/584
`9/1996 Fgrsférej ~~~~~ n
`575597562
`395/114
`5566 089 10/1996 Hoogenboom
`348/510
`5,594,660
`1/1997 Sung 6161. ...... ..
`348/423
`5,598,352
`1/1997 Rosenau et al.
`5,606,428
`2/1997 Hanselman ............................ .. 358/404
`
[57] ABSTRACT
A novel method and apparatus for decoding a compressed audio/video signal to produce decoded audio and decoded video signals. The decoding tasks are partitioned into “pre-processing tasks” and “post-processing tasks.” Pre-processing tasks involve one or more non-signal processing oriented operations which do not require extensive computing resources. Pre-processing tasks are assigned to be executed by the host processor, which can perform these tasks without straining its computational resources. Pre-processing tasks include demultiplexing the compressed audio/video stream into compressed audio and compressed video streams, performing audio pre-processing on the compressed audio stream and performing video pre-processing on the compressed video stream. Post-processing tasks involve one or more signal processing oriented operations which require extensive computing resources. Post-processing tasks are assigned to be executed by a dedicated subprocessor. Post-processing tasks include audio post-processing and video post-processing. In an embodiment, video frame tasks are also performed as part of video post-processing. Post-processing performed by the dedicated subprocessor outputs a decoded audio signal and a decoded video signal.
`
`5 Claims, 5 Drawing Sheets
`
[Representative drawing: compressed audio/video signal 102 is demultiplexed 104 into compressed video data stream 106 and compressed audio data stream 108. PRE-PROCESSING TASKS performed by Host CPU: video pre-processing (VLD, dequantization 110) and audio pre-processing (VLD, dequantization, denormalization 118). POST-PROCESSING TASKS performed by dedicated subprocessor: video post-processing (dequantization, inverse transformation 112, motion compensation 114), video frame output task 124, and audio post-processing (inverse transformation: filtering, windowing, reconstruction 120). Outputs: decoded video data stream 116 and decoded audio data stream 122.]
`
`AMAMZON 1007
`Page 1 of 12
`
`
`
[Drawing sheet 1 of 5: figure content not recoverable from the scanned text.]
`
`
`
`
`
`
`
[Drawing sheet 2 of 5: figure content not recoverable from the scanned text.]
`
`
`
`
`
[Drawing sheet 3 of 5: figure content not recoverable from the scanned text.]
`
`
`
`
`
[Drawing sheet 4 of 5: figure content not recoverable from the scanned text.]
`
`
`
[Drawing sheet 5 of 5: figure content not recoverable from the scanned text.]
`
`
`
`AUDIO/VIDEO SUBPROCESSOR METHOD
`AND STRUCTURE
`
`TECHNICAL FIELD
`
This invention pertains to the storage of video information in digital format, and more specifically to a novel apparatus and method for decompressing or “decoding” compressed digital audio/video signals in a highly efficient manner.
`
`BACKGROUND
`
The CCITT/ISO committee has standardized a set of compression algorithms for still and full motion digital video compression and decompression. These compression schemes are popularly known as JPEG, MPEG and H.261 (p×64). These standards are commonly applied in video conferencing, CD-ROM based interactive videos for education and entertainment, digital video transmission for entertainment, still image catalogs, etc. All of the standards mentioned above, as well as emerging HDTV standards, utilize transform code compressed domain formats (referred to herein as “transform domain” formats), which include the Discrete Cosine Transform (DCT) format, the interframe predictive code format, such as the Motion Compensation (MC) algorithm which may be used in conjunction with the DCT format, and hybrid compressed formats. The DCT format is used in the compression standard for still images, JPEG (Standard Draft, JPEG-9-R7, February 1991). The combination of Motion Compensation and Discrete Cosine Transform compression algorithms (MC/DCT) is used in a number of standards including: the compression standard for motion pictures (Generic coding of moving pictures and associated audio information, ISO/IEC 13818, ISO/IEC 11172), the standard for video conferencing (ITU-T Recommendation H.261, CODEC for Audiovisual Services at p×64 kbits/s), and some High Definition Television proposals.
FIG. 1 depicts the steps involved in decoding or decompressing a compressed audio/video signal. As shown, the steps involved in decoding compressed audio/video signal 102 include demultiplexing 104 the compressed audio/video signal into compressed audio 108 and compressed video 106 data streams, performing video decoding on the compressed video data stream and performing audio decoding on the compressed audio data stream.
As stated above, the first step 104 involves demultiplexing the compressed audio/video signal into compressed audio 108 and compressed video 106 streams. Audio decoding is then performed on the compressed audio data stream 108. During audio decoding, compressed audio stream 108 is unpacked and its symbols are decoded using a table lookup (also called Variable Length Decoding (VLD)). The decoded quantized audio samples then undergo dequantization and denormalization 118. The denormalized audio samples are then subjected to inverse transformation 120, which includes filtering, windowing and reconstruction. Decoded audio stream 122 is then fed to audio renderer 127 before being forwarded to audio output device 128.
Compressed video data stream 106 is subjected to video decoding. This includes variable length decoding (VLD), during which the compressed video data stream is parsed into symbols 110, dequantization 110, inverse discrete cosine transformation 112 and motion compensation 114. Decoded video data stream 116 is then fed to video renderer 124 before being forwarded to a graphics output device 126 such as a monitor. U.S. patent application Ser. No. 08/525,357, assigned to the assignee of the present application, describes in more detail some of the prior art and other novel techniques for decoding digitally compressed audio/video signals.
There are several prior art techniques used to perform the decompression steps depicted in FIG. 1. One such prior art technique uses advanced microprocessors, for example the 586 microprocessors, to perform real-time video decoding using software. Although this technique achieves real-time audio/video decoding, a major drawback is that considerable CPU resources are consumed during the decoding process. This results in the video delivery being jerky and the audio output lacking full fidelity. As a result, the quality of the output video and audio signals is below the expected level. The inefficient use of limited CPU resources also degrades the performance of other concurrently executing applications, which are left short of CPU resources.
In an effort to solve the inefficient CPU usage problem, prior art techniques perform audio/video decoding using dedicated hardware decoders which off-load the audio/video decoding tasks from the host CPU. These dedicated hardware decoders serve as slave processors to the host CPU, with the slave decoder performing the task of audio/video decoding. An example of such a dedicated hardware decoder is described in U.S. Pat. Nos. 5,253,078 and 5,270,832, assigned to C-Cube Microsystems, Milpitas, Calif. An example of a dedicated audio/video decoder is the CL480 device available from C-Cube Corporation.
One such dedicated hardware audio/video decoder system 130 is depicted in FIG. 2. As shown in FIG. 2, a dedicated hardware decoder 132 communicates with host CPU 134 over a host (PCI or ISA) bus interface 138. Dedicated hardware decoder 132 is also coupled to dynamic random access memory (DRAM) 133, video mixer 142 and audio digital to analog converter (DAC) 146. Video mixer 142 is also coupled to graphics subsystem 144, and DAC 146 is coupled to audio subsystem 148 such as a SoundBlaster™ sound card.
Dedicated hardware decoder 132 accepts a multiplexed compressed audio/video signal as input. The dedicated hardware decoder then demultiplexes the compressed audio/video signal into compressed audio and compressed video data streams (corresponding to step 104 in FIG. 1). Dedicated hardware decoder 132 then performs variable length decoding, dequantization, inverse discrete cosine transformation and motion compensation on the compressed video data stream. The resultant decoded video data stream is then fed to mixer 142, which also receives a graphics input from graphics subsystem 144. The video output from mixer 142 is then forwarded to a video output device such as a monitor for display.
The compressed audio data stream is subjected to variable length decoding, denormalization, dequantization and inverse transformation, including filtering and windowing functions, before being passed through DAC 146 and then forwarded to audio subsystem 148. The audio output from audio subsystem 148 can then be fed to any prior art audio output device.
As described above, the entire task of audio/video decoding is performed by dedicated hardware decoder 132. Host CPU 134 is utilized only for monitoring the audio/video processing tasks to be performed by dedicated hardware decoder 132. While this technique frees up CPU resources which would otherwise be dedicated to the audio/video processing tasks, it also has many disadvantages.
One major disadvantage is that dedicated hardware decoders are very expensive. This is because of the
`
increased logic complexity needed to perform the entire audio/video decoding. Increased complexity also increases the size of the decoder, making it more expensive. There is thus a need for a decoder which is cheaper and more compact than existing hardware decoders.
Another disadvantage of dedicated hardware decoders is that the audio/video processing performed by the decoders does not make efficient use of existing system resources. In the system shown in FIG. 2, redundant hardware such as the audio/video DAC/mixer is needed for audio/video decoding even though CPU 134 is capable of handling these tasks efficiently. Thus, there is a need for a system which can make efficient use of available system resources.
Prior art systems like the one depicted in FIG. 2 also do not provide the ability to store decoded audio/video data streams in system memory. This makes prior art decoders incompatible with applications which use audio/video data streams stored in system memory as their input. As a result, these applications, which generally perform post-processing on the decoded audio/video data streams such as 3D effects and video resampling (scaling), cannot take advantage of the decoded audio/video outputs. Thus, there is a need for an audio/video decoding system which is compatible with other system applications.
Another disadvantage of prior art dedicated hardware decoders is that they can process only a single compressed audio/video signal at a time. This is due to the fact that the dedicated hardware decoder acts as a “black box,” taking in a compressed audio/video signal as input and outputting decoded audio and video streams. It is not possible to perform concurrent processing of audio/video streams. Thus, there is a need for a decoding system which can perform concurrent processing of audio/video streams.
`SUMMARY
In accordance with the teachings of this invention, a novel audio/video subprocessor is taught in which hardware acceleration provides improved performance and enhanced features, and frees the host processor to handle other tasks. Unlike prior art audio/video decoders, in accordance with the teachings of this invention, the audio/video decode tasks are partitioned in a novel manner into pre-processing tasks and post-processing tasks.
The pre-processing tasks typically involve non-signal processing oriented operations, such as bit manipulations, table lookup, and control, and thus in accordance with the teachings of this invention, these pre-processing tasks are assigned for execution by the system host processor. The host processor is typically a high-end RISC or CISC engine, such as a 586 microprocessor, and is capable of handling such tasks efficiently. In addition, the host processor is responsible for other tasks such as data I/O, demultiplexing audio/video streams, audio/video task scheduling and audio/video synchronization, which do not require intensive computational resources.
Post-processing tasks, on the other hand, typically involve signal processing oriented operations such as multiply-accumulate and require extensive CPU resources. In accordance with the teachings of this invention, the post-processing tasks are executed by dedicated audio and/or video processing hardware, thereby reducing the burden imposed on the host CPU. In one embodiment, the novel audio/video processing hardware of the invention also performs video frame reformatting and output to the graphics subsystem of the host.
The present invention satisfies the needs inherent in prior art audio/video decoding techniques. In particular, by performing only a subset of the audio/video decode tasks, the architectural complexity of the present invention is greatly reduced. This translates to savings in cost, power consumption and size. The present invention makes efficient use of system resources, thus reducing the need for redundant hardware. Elimination of redundant hardware reduces the number of external audio/video cables needed for the decoding process, which translates to reduced cost and easier installation. By storing the decoded audio/video outputs in system memory, the present invention, unlike prior art techniques, enables other post-processing applications to utilize the decoded outputs; this enhances compatibility with other system applications.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
Additional features of the invention will be readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:
FIG. 1 is a block diagram depicting the various steps involved in decoding or decompressing a multiplexed compressed audio/video signal;
FIG. 2 is a block diagram of a prior art computer system incorporating a dedicated hardware decoder;
FIG. 3 is a block diagram depicting an exemplary computer system incorporating the invention;
FIG. 4 is a block diagram depicting the steps involved in decoding a compressed audio/video signal in accordance with the teachings of the present invention; and
FIG. 5 is a block diagram depicting the internal architecture of the present invention.
`
`
`DETAILED DESCRIPTION OF EXEMPLARY
`EMBODIMENTS
`
In accordance with the teachings of this invention, FIG. 3 depicts an exemplary computer system 150 incorporating a novel dedicated subprocessor 152 for performing audio/video signal decoding. As depicted in FIG. 3, computer system 150 includes dedicated subprocessor 152, which interfaces with host CPU 134 over host bus 136 using host bus interface 138. Host CPU 134 and dedicated subprocessor 152 have a master-slave relationship, with host CPU 134 acting as master and scheduling the tasks to be executed by dedicated subprocessor 152. In one embodiment, host bus 136 is a generic PCI bus. Well known prior art techniques such as “Scatter-Gather” support for virtual memory organization may also be implemented in a given embodiment.
Dedicated subprocessor 152 is also coupled to an external dynamic random access memory (DRAM) 133. Other components of computer system 150 include system memory 140, graphics subsystem 144 and audio subsystem 148. Dedicated subprocessor 152 uses host bus 136 to interface with system memory 140, graphics subsystem 144 and audio subsystem 148.
Dedicated subprocessor 152 differs from prior art decoders in that it performs only a subset of the tasks involved in audio/video decoding of a compressed audio/video signal. This is unlike prior art systems in which either all of the audio/video decoding is performed by the host CPU, seriously depleting CPU resources available for other tasks, or a dedicated hardware decoder performs all the audio/video decoding tasks at increased complexity and cost.
In accordance with the present invention, audio/video decoding tasks are divided in a novel manner into “pre-processing tasks” and “post-processing tasks,” as shown in FIG. 4. Pre-processing tasks involve one or more non-signal processing oriented operations such as bit manipulations, table lookup, audio/video demultiplexing, audio/video task scheduling and audio/video synchronization. Since pre-processing tasks require minimal computational resources, in accordance with the present invention these pre-processing tasks are assigned to be executed by the host CPU, which can perform these tasks efficiently without straining its computational resources.
Post-processing tasks, on the other hand, involve one or more signal processing oriented operations, such as multiply-accumulate, inverse quantization, inverse discrete cosine transform, motion compensation, block reconstruction, window filtering and video frame formatting. Post-processing tasks require considerable CPU and memory resources and are thus assigned to be executed by dedicated subprocessor 152 in accordance with the present invention.
In accordance with the division of the audio/video decoding tasks into pre-processing and post-processing tasks, in one embodiment of this invention, host CPU 134 is responsible for demultiplexing the compressed audio/video signal into separate compressed video and compressed audio data streams. Host CPU 134 then performs video and audio pre-processing tasks. Video pre-processing includes performing variable length decoding (VLD), which involves parsing the compressed video signal into symbols 110. In one embodiment of the invention, video pre-processing tasks also include performing dequantization 110 on the symbol stream. However, in an alternate embodiment, the dequantization task is included in video post-processing tasks.
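The table-lookup parsing performed during video pre-processing can be illustrated with a minimal sketch. The code table below is hypothetical and chosen only to be prefix-free; the actual variable length code tables are defined by the applicable compression standard and are much larger. The names `VLC_TABLE` and `vld_parse` are assumptions introduced for illustration.

```python
# Hypothetical prefix-free code table (illustration only, not an MPEG table).
# Each code word maps to a (run, level) symbol or an end-of-block marker.
VLC_TABLE = {
    "1": "EOB",
    "011": (0, 1),
    "010": (1, 1),
    "0011": (0, 2),
    "0010": (2, 1),
}


def vld_parse(bits: str) -> list:
    """Parse a string of '0'/'1' characters into symbols by table lookup."""
    symbols = []
    code = ""
    for bit in bits:
        code += bit
        if code in VLC_TABLE:       # a complete code word has been read
            symbols.append(VLC_TABLE[code])
            code = ""
    if code:
        raise ValueError(f"trailing bits do not form a code word: {code!r}")
    return symbols


print(vld_parse("011" + "0010" + "1"))
```

Work of this kind is dominated by bit manipulation and lookups rather than arithmetic, which is consistent with assigning it to the host CPU.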
Audio pre-processing includes unpacking the audio symbols using table lookup and then performing denormalization and dequantization 118 on the decoded quantized audio data stream. In one embodiment, host CPU 134, acting as the master, is also responsible for scheduling the tasks to be performed by dedicated subprocessor 152. In addition, host CPU 134 is responsible for audio/video synchronization, which involves coordinating the audio and video decoding tasks such that if the video is ahead of the audio, host CPU 134 delays issuing the video task, and if the video lags the audio, host CPU 134 skips decoding of the video frame to gain time.
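The synchronization rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the use of presentation times in seconds, and the tolerance value are hypothetical and do not appear in the specification.

```python
def schedule_video_task(video_time: float, audio_time: float,
                        tolerance: float = 0.02) -> str:
    """Decide what the host does with the next video task.

    video_time and audio_time are presentation times in seconds
    (hypothetical units); tolerance is an assumed dead band.
    """
    drift = video_time - audio_time
    if drift > tolerance:
        return "delay"   # video is ahead of audio: hold back the video task
    if drift < -tolerance:
        return "skip"    # video lags audio: drop this frame to gain time
    return "issue"       # streams are in step: issue the task normally


print(schedule_video_task(1.10, 1.00))
```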
It should be apparent to those skilled in the art that the boundary between pre-processing and post-processing tasks is not rigid; that is, in alternate embodiments of the present invention a particular pre-processing task may be classified as a post-processing task and vice versa. Factors which affect the division of decoding tasks into pre-processing and post-processing tasks include the computational power of the host CPU and system bus scheduling constraints.
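The movable boundary can be pictured as a configurable task table. The sketch below is a hypothetical illustration: the function `partition_tasks`, its flag, and the task labels are assumptions, and only the placement of video dequantization is shown as configurable, mirroring the two embodiments described above.

```python
HOST = "host CPU"
SUBPROCESSOR = "dedicated subprocessor"


def partition_tasks(dequantize_on_host: bool = True) -> dict:
    """Return an illustrative mapping of decode task to executing unit."""
    return {
        "demultiplexing": HOST,
        "video VLD": HOST,
        "audio VLD, dequantization, denormalization": HOST,
        # Video dequantization may sit on either side of the boundary.
        "video dequantization": HOST if dequantize_on_host else SUBPROCESSOR,
        "video IDCT": SUBPROCESSOR,
        "motion compensation": SUBPROCESSOR,
        "audio inverse transformation": SUBPROCESSOR,
        "video frame output": SUBPROCESSOR,  # in one embodiment
    }


print(partition_tasks(dequantize_on_host=False)["video dequantization"])
```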
Dedicated subprocessor 152 is responsible for post-processing the pre-processed audio and video data streams. This includes audio post-processing and video post-processing. In one embodiment, dedicated subprocessor 152 also performs video frame output tasks.
Dedicated subprocessor 152 receives pre-processed audio/video data either directly from CPU 134 via host bus 136, or via system memory 140. The capability to read pre-processed data from system memory 140 allows CPU 134 and dedicated subprocessor 152 to perform audio/video decoding concurrently. For example, while dedicated subprocessor 152 is post-processing the current frame, host CPU 134 pre-processes the next frame. In one embodiment, partial frame processing is also possible.
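The overlap between host pre-processing and subprocessor post-processing resembles a two-stage pipeline. The following sketch models it with a worker thread and a one-slot queue standing in for a buffer in system memory. It is an analogy only: `run_pipeline` and its callback parameters are hypothetical names, and a software thread is not the hardware mechanism the specification describes.

```python
import queue
import threading


def run_pipeline(frames, preprocess, postprocess):
    """Overlap pre-processing of frame n+1 with post-processing of frame n."""
    handoff = queue.Queue(maxsize=1)  # stands in for a system-memory buffer
    results = []

    def subprocessor():
        while True:
            item = handoff.get()
            if item is None:          # sentinel: no more frames
                break
            results.append(postprocess(item))

    worker = threading.Thread(target=subprocessor)
    worker.start()
    for frame in frames:
        # The "host" pre-processes the next frame while the worker is
        # still busy with the previous one.
        handoff.put(preprocess(frame))
    handoff.put(None)
    worker.join()
    return results


print(run_pipeline([1, 2, 3], lambda f: f * 10, lambda s: s + 1))
```

A single worker consuming from a FIFO queue keeps frames in order, so the results list follows the input order.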
`
As mentioned above, dedicated subprocessor 152 performs audio post-processing, which extracts the decoded audio data stream from the pre-processed audio data. In accordance with one embodiment of this invention, dedicated subprocessor 152 reads pre-processed audio symbols from system memory 140 and converts them to pulse code modulation (PCM) samples by performing inverse transform functions 120 including filtering, windowing, and reconstruction functions. The PCM samples are then transferred back to system memory 140, where they are accessible by host CPU 134 or any typical prior art audio subsystem 148, such as a SoundBlaster™ sound card used in IBM-type personal computers, to generate audible sound. As the audio samples are directly stored in memory 140, the need for redundant hardware, for example the DAC 146 shown in FIG. 2, is eliminated. Furthermore, the PCM audio samples stored in system memory 140 are accessible to post-processing applications for further processing. The present invention thus makes efficient use of existing system resources and allows different applications to share the decoded audio data.
Video post-processing performed by dedicated subprocessor 152 extracts the decoded video data stream from the pre-processed compressed video data. In accordance with one embodiment of this invention, dedicated subprocessor 152 reads pre-processed video symbols from system memory 140 and converts them into a video frame in native MPEG YCbCr 4:2:0 format. This involves performing inverse quantization and inverse discrete cosine transformation (IDCT) 112, and motion compensation 114 on the pre-processed video data. The output frame is then written to DRAM 133 associated with dedicated subprocessor 152.
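To make the arithmetic load of video post-processing concrete, the sketch below computes a direct 8x8 inverse DCT and then adds the result to a motion-compensated prediction block with clamping to the 8-bit range. It uses the textbook IDCT definition and is not the subprocessor's implementation; the function names are hypothetical, and a practical decoder would use a fast factorization rather than this four-level loop.

```python
import math

N = 8  # block size used by the MC/DCT standards discussed above


def idct_2d(coeffs):
    """Direct 8x8 inverse DCT of a list-of-lists coefficient block."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    # One multiply-accumulate per coefficient per output sample.
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = total / 4.0
    return out


def reconstruct(prediction, residual):
    """Add the IDCT residual to the prediction and clamp to 0..255."""
    return [[max(0, min(255, round(p + r))) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0                      # DC coefficient only
residual = idct_2d(dc_only)              # flat block, every sample about 1.0
block = reconstruct([[250] * N for _ in range(N)], residual)
print(block[0][0])
```

Each output sample in this direct form needs 64 multiply-accumulate steps, which gives a sense of why these operations are classified as signal processing oriented.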
In one embodiment of the invention, dedicated subprocessor 152 is also responsible for performing video frame output tasks 124. Dedicated subprocessor 152 reads a frame from local DRAM 133 associated with dedicated subprocessor 152. Dedicated subprocessor 152 converts the read frame to the video format required by the particular graphics subsystem 144 utilized by computer system 150. The output frame is written to the appropriate location in system memory 140, which can be accessed by host CPU 134, or to a dedicated graphics controller memory, which can be accessed by graphics subsystem 144. Graphics subsystem 144 is a typical prior art graphics subsystem commonly employed in computer systems. In the case of a Unified Memory Architecture (UMA), where graphics memory and dedicated subprocessor memory are physically combined, the video frame output task reformats decoded video data and sends the reformatted video data to a display device or to system memory 140.
As with audio decoding, video decoding performed in accordance with this invention reduces the need for redundant hardware; for example, mixer 142 depicted in FIG. 2 is eliminated. Furthermore, video samples stored in system memory 140 can be utilized for further processing by other applications. The present invention thus allows other applications to take advantage of the video decoding output.
In addition to system-level optimization achieved by partitioning the decoding tasks between pre-processing operations to be performed by CPU 134 and post-processing operations to be performed by dedicated subprocessor 152, in accordance with the teachings of this invention the novel subprocessor architecture minimizes chip size and cost by utilizing a unique partitioning and sharing of the hardware resources of subprocessor 152 between audio and video post-processing tasks. As dedicated subprocessor 152 performs only a subset of the tasks involved in compressed audio/video decoding, namely only the post-processing tasks, the complexity of dedicated subprocessor 152 is greatly reduced. This translates to savings in cost, power utilized by the subprocessor, and size of dedicated subprocessor 152.
Referring to FIG. 5, there is shown a block diagram depicting the internal structure of an embodiment of dedicated subprocessor 152. As shown in FIG. 5, dedicated subprocessor 152 comprises PCI Bus Interface (PCIF) 162, PCI Memory Management Unit (PCI MMU) 164, DRAM Controller 174, DRAM Memory Management Unit (DRAM MMU) 172, Audio/Video Data Signal Processor unit (AVDSP) 166, Video Signal Processor (VSP) 168 and Frame Packer (FP) 170.
PCI interface 162 enables dedicated subprocessor 152 to interface with the host system via PCI host bus 136. PCI MMU 164 contains FIFO (first in, first out) queues for receiving and transferring data to and from other parts of the host computer system. In particular, during video post-processing, pre-processed video symbols are received from main memory 140 using bus mastering mode or directly from host CPU 134 using slave mode. During audio post-processing, pre-processed audio symbols are received from main memory 140 or directly from host CPU 134. PCI MMU FIFO queues are also used to transfer post-processed audio and video data to system memory 140 or to the audio/video graphics subsystems.
Dedicated subprocessor 152 interfaces with DRAM 133 via DRAM controller 174, which is a typical DRAM controller known in the prior art. In one embodiment of the invention, DRAM controller 174 includes refresh circuitry. In another embodiment, DRAM Controller 174 supports fast-page mode, two clocks per CAS cycle, and has a memory architecture of 16 bits wide by at least 256K bytes. DRAM MMU 172 interfaces with DRAM Controller 174 and includes arbitration logic and address generation logic.
Audio/Video Data Signal Processor (AVDSP) 166 is used for all audio post-processing and a subset of video post-processing. All data coefficients required for audio filtering and windowing reside in the on-chip ROM contained within AVDSP 166. During audio post-processing, AVDSP 166 receives pre-processed 16-bit audio samples from host CPU 134 using slave mode or from system memory 140 using bus mastering mode. AVDSP 166 performs inverse quantization functions on the pre-processed audio samples including filtering, windowing and reconstruction of audio samples. These functions are well known in the prior art. Audio decoded output samples (frames/pictures), for example 16-bit samples in one embodiment, are written back to a graphics output buffer using slave mode or to system memory 140 using bus mastering mode via PCI bus interface 162. In cases where the audio has stereo characteristics, the left and right output audio samples are interleaved.
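Interleaving of stereo output samples can be sketched as follows. The specification states only that left and right 16-bit samples are interleaved; the left-first ordering, the little-endian byte order, and the function name `interleave_stereo` are assumptions made for this illustration.

```python
import struct


def interleave_stereo(left, right) -> bytes:
    """Pack equal-length left/right 16-bit PCM sequences as L R L R ..."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have equal length")
    out = bytearray()
    for l_sample, r_sample in zip(left, right):
        # "<hh": two signed 16-bit integers, little-endian (assumed order).
        out += struct.pack("<hh", l_sample, r_sample)
    return bytes(out)


print(interleave_stereo([1, 2], [-1, -2]).hex())
```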
During video decoding, AVDSP 166 receives pre-processed video symbols from main memory 140 or from host CPU 134 via PCI bus interface 162. In one embodiment, AVDSP 166 performs inverse quantization and source and destination block address calculation, and controls Video Signal Processor (VSP) 168 on a block-by-block basis. Decoded video output samples are written back to a graphics buffer or system memory 140 using PCI bus interface 162.
Video Signal Processor (VSP) 168 is responsible for performing inverse discrete cosine transformation (IDCT) and motion compensation on the video data stream. In one embodiment, VSP 168 is an engine for video IDCT, motion compensation and block reconstruction.
`
Frame Packer (FP) 170 is responsible for executing video frame output tasks. In one embodiment, FP 170 is capable of transferring a frame from private DRAM 133 to graphics subsystem 144 or system memory 140, with on-the-fly format conversion. In another embodiment, color-space conversion or stretching, which is generally performed by graphics subsystem 144, is also performed by FP 170. FP 170 is also capable of supporting a plurality of different graphics formats, such as well known variations of the 4:2:2 packed format.
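One possible format conversion of the kind the Frame Packer performs is repacking a planar YCbCr 4:2:0 frame into a packed 4:2:2 layout. The sketch below assumes the common YUYV byte order (Y0 Cb Y1 Cr) and simple replication of each chroma row onto two luma rows; both choices, and the function name, are assumptions rather than details taken from the specification.

```python
def pack_420_to_yuyv(y_plane, cb_plane, cr_plane, width, height) -> bytes:
    """Convert planar 4:2:0 to packed 4:2:2 (assumed YUYV byte order).

    y_plane has `height` rows of `width` samples; cb_plane and cr_plane
    have height // 2 rows of width // 2 samples. width and height are
    assumed to be even.
    """
    out = bytearray()
    for row in range(height):
        chroma_row = row // 2            # each chroma row serves two luma rows
        for col in range(0, width, 2):
            chroma_col = col // 2
            out += bytes((y_plane[row][col], cb_plane[chroma_row][chroma_col],
                          y_plane[row][col + 1], cr_plane[chroma_row][chroma_col]))
    return bytes(out)


frame = pack_420_to_yuyv([[1, 2], [3, 4]], [[100]], [[200]], width=2, height=2)
print(list(frame))
```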
`Audio Decode Task Performed by the Invention
The audio decode task consists of transforming the frequency-domain, pre-processed samples available from the host to time-domain samples for use by a sound system. The pre-processed audio symbol data is independent of layer, sample-rate, and bitrate parameters, the only relevant parameter being the number of channels (mono or stereo). In addition, the concept of audio frame is no longer directly involved, a fact that allows the host to more easily match audio task size to the video frame rate, for easier synchronization and control.
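Matching audio task size to the video frame rate comes down to a simple ratio. The sketch below computes how many audio samples per channel correspond to one video frame; the sample rates and frame rates in the usage lines are illustrative values chosen for the example, not figures from the specification.

```python
from fractions import Fraction


def samples_per_video_frame(sample_rate_hz: int, frame_rate: Fraction) -> Fraction:
    """Audio samples per channel that span exactly one video frame."""
    return Fraction(sample_rate_hz) / frame_rate


# Illustrative values only: 44.1 kHz audio with 30 and 30000/1001 frames/s.
print(samples_per_video_frame(44100, Fraction(30)))
print(samples_per_video_frame(44100, Fraction(30000, 1001)))
```

When the ratio is not an integer, a host scheduler would presumably vary the task size slightly from frame to frame so that the long-run average matches; that policy is an assumption and is not described in the text above.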
The audio decoding process requires maintaining a vector containing a filtered version of the past 512 samples. Two such vectors (one for each stereo channel) are maintained in private DRAM 133. Each such vector is exactly 1024 bytes in size