Nguyen

(10) Patent No.: US 6,425,054 B1
(45) Date of Patent: Jul. 23, 2002

(54) MULTIPROCESSOR OPERATION IN A
MULTIMEDIA SIGNAL PROCESSOR

(75) Inventor: Le Trong Nguyen, Monte Sereno, CA (US)

(73) Assignee: Samsung Electronics Co., Ltd.,
Kyungki-do (KR)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/685,982
(22) Filed: Oct. 10, 2000

Related U.S. Application Data
(63) Continuation of application No. 08/697,102, filed on
Aug. 19, 1996, now abandoned.

(58) Field of Search: 711/117, 118, 130, 140, 147, 149;
712/1-9, 24, 29, 32-35; 345/501-506, 514-522; 710

Primary Examiner: Do Hyun Yoo
Assistant Examiner: Yamir Encarnacion
(74) Attorney, Agent, or Firm: Skjerven Morrill LLP

(57) ABSTRACT

To achieve high performance at low cost, an integrated
digital signal processor uses an architecture which includes
both a general purpose processor and a vector processor. The
integrated digital signal processor also includes a cache
subsystem, a first bus and a second bus. The cache
subsystem provides caching and data routing for the processors
and buses. Multiple simultaneous communication paths can
be used in the cache subsystem for the processors and buses.
Furthermore, simultaneous reads and writes are supported to
a cache memory in the cache subsystem.

10 Claims, 8 Drawing Sheets

`Unified Patents, LLC v. Elects. & Telecomm. Res. Inst., et al.
`
`Ex. 1024, p. 1
`
`
`
[FIG. 1 (Sheet 1 of 8): block diagram of the multimedia card.
Recoverable labels: LOCAL BUS 105, BUFFER MEMORY, VIDEO IN,
VIDEO OUT, AUDIO IN, AUDIO OUT, PHONE IN, PHONE OUT,
MONITOR OUT.]

`
[FIG. 2 (Sheet 2 of 8): block diagram of the multimedia signal
processor. Recoverable labels: PROCESSING CORE 200, GENERAL
PURPOSE PROCESSOR 210, VECTOR PROCESSOR, CACHE SUBSYSTEM 230
(DATA and INSTRUCTION caches, ROM 270, CACHE CONTROL 280),
FULL-DUPLEX UART, BIT STREAM PROCESSOR, INTERRUPT CONTROLLER,
DEVICE INTERFACE 252 (VIDEO, AUDIO, PHONE), DMA CONTROLLER 257,
LOCAL BUS INTERFACE 255 (32-BIT PCI BUS), MEMORY CONTROLLER 258
(32/64-BIT MEMORY BUS), 256-BIT.]

`
[FIG. 3 (Sheet 3 of 8): relations between processors and
software or firmware. Recoverable labels: 300, MAIN MEMORY 320,
APPLICATION PROGRAM 330, OPERATING SYSTEM 340, MULTIMEDIA
PROCESSOR, GENERAL PURPOSE PROCESSOR 210, REAL-TIME OPERATING
SYSTEM 360, GENERAL TASKS 370, VECTOR-DATA TASKS.]

`
[FIG. 4 (Sheet 4 of 8): block diagram of cache subsystem 230.
Recoverable labels: SRAM, TAG, CACHE ROM, 16 KB, 270, CACHE
CONTROL 280, DATA PIPELINE, ADDRESS PIPELINE, VECTOR
(INSTRUCTION), VECTOR (DATA), GENERAL PURPOSE PROCESSOR, FBUS,
IOBUS.]

`
[FIG. 5 (Sheet 5 of 8): memory map, with regions numbered 510
through 580. Recoverable region labels, in order: INTERNAL UROM;
INTERNAL SRAM; LOCAL DRAM MEMORY; INTERNAL FBUS DEVICES (DRAM
CONTROLLER, VFB CONTROLLER, DMA CONTROLLER, KS0122 CODEC SERIAL
INTERFACE, KS0119 CODEC SERIAL INTERFACE, AD1843 CODEC SERIAL
INTERFACE); RESERVED; INTERNAL I/O BUS DEVICES (BIT STREAM
PROCESSOR, 8259 INTERRUPT CONTROLLER, 8254 TIMERS, 16450 UART
SERIAL LINE); MSP CONTROL REGISTER; RESERVED; OTHER HOST DEVICES
(2 GB), MAPPED TO HOST ADDRESSES FROM 0 TO 2 GB. Size and
address marks appearing on the sheet: 4 MB, 4 MB, 5 MB, 47 MB,
4 MB, 128 MB, 2 GB, 4 GB.]

`
MULTIPROCESSOR OPERATION IN A
MULTIMEDIA SIGNAL PROCESSOR

CROSS-REFERENCE TO RELATED
APPLICATION
This application is a continuation of U.S. patent application
Ser. No. 08/697,102, filed Aug. 19, 1996, entitled
“Multiprocessor Operation in a Multi-Media Signal
Processor”, now abandoned.
This patent document is related to, and incorporates by
reference in their entirety, the following concurrently filed
patent applications:
U.S. patent application Ser. No. 08/699,579, entitled
“Single-Instruction-Multiple-Data Processing in a
Multimedia Signal Processor”, now U.S. Pat. No.
6,058,465;
U.S. patent application Ser. No. 08/699,294, entitled
“Efficient Context Saving and Restoring in Multi-Tasking
Computing System Environment”, now U.S. Pat. No.
6,061,711;
U.S. patent application Ser. No. 08/699,295, entitled
“System and Method for Handling Software Interrupts
with Argument Passing”, now U.S. Pat. No. 5,996,058;
U.S. patent application Ser. No. 08/699,294, entitled
“System and Method for Handling Interrupts and
Exception Events in an Asymmetric Multiprocessor
Architecture”, now U.S. Pat. No. 6,003,129;
U.S. patent application Ser. No. 08/699,303, entitled
“Methods and Apparatus for Processing Video Data”,
now abandoned;
U.S. patent application Ser. No. 08/697,086, entitled
“Single-Instruction-Multiple-Data Processing Using
Multiple Banks of Vector Registers”, now U.S. Pat. No.
5,838,984; and
U.S. patent application Ser. No. 08/699,585, entitled
“Single-Instruction-Multiple-Data Processing with
Combined Scalar/Vector Operations”, now abandoned.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to digital signal processors and
particularly to dual-threaded, asymmetric parallel processing
systems which include a general purpose processor and
a vector processor for manipulation of vector data.
2. Description of Related Art
A variety of digital signal processors (DSPs) are used in
multimedia applications such as coding and decoding of
video, audio, and communications data. One type of DSP
has dedicated hardware to address a specific problem such
as MPEG video decoding or encoding. Dedicated hardware
DSPs generally provide high performance per cost but are
only usable for specific problems and are unable to adapt
to other problems or changes in standards.
Programmable DSPs execute programs which solve multimedia
problems and provide greater flexibility than dedicated
hardware DSPs because changing software for a
programmable DSP can change the problem solved. A
disadvantage of programmable DSPs is their lower performance
per cost. A programmable DSP typically has an
architecture similar to that of a general purpose processor
and a relatively low processing power. The low processing
power generally results from an attempt to minimize cost.
Thus, such a DSP is not wholly satisfactory because its low
processing power hampers its ability to address the more
complex multimedia problems, such as real-time video
encoding and decoding.
Since a goal for a programmable DSP is to provide high
processing power to address multimedia problems at a
minimum cost, one could incorporate into such a DSP
parallel processing, which is one known way to increase
processing power. One architecture for parallel processing is
a “very long instruction word” (VLIW) DSP, which is
characterized by a large number of functional units, most of
which perform different, but relatively simple, tasks. A single
instruction for a VLIW DSP may be 128 bytes or longer and
has separate parts. The parts can be executed by separate
functional units in parallel. VLIW DSPs have high computing
power because a large number of functional units can
operate in parallel. VLIW DSPs also have relatively low cost
because each functional unit is relatively small and simple.
A problem for VLIW DSPs, however, is inefficiency in
handling input/output control, communication with a host
computer, and other functions that do not lend themselves to
parallel execution in the functional units of the VLIW DSP.
Additionally, programs for VLIW DSPs differ from conventional
computer programs and can be difficult to develop because
of a lack of programming tools and of programmers familiar
with VLIW software architectures.

SUMMARY OF THE INVENTION
In accordance with the invention, an integrated digital
signal processor is disclosed. The digital signal processor
combines a general purpose processor with a vector
processor, which is capable of operating in parallel with the
general purpose processor. The integrated digital signal
processor is able to achieve high performance at low cost
since each of the two processors performs only the tasks to
which it is ideally suited. For example, the general purpose
processor runs a real-time operating system and performs
overall system management while the vector processor is
used to perform parallel calculations using data structures
called “vectors”. A vector is a collection of data elements,
typically of the same type.
In one embodiment, the digital signal processor also
includes a cache subsystem, a first bus, and a second bus.
The first bus is used for high speed devices such as a local
bus interface, a DMA controller, a device controller, and a
memory controller. The second bus is used for slow speed
devices such as a system timer, a UART, a bit stream
processor, and an interrupt controller.
The cache subsystem combines caching functions with
switchboarding, or data routing, functions. The switchboard
functions allow multiple communication paths between the
processors and buses to operate simultaneously.
Furthermore, the cache portion of the cache subsystem
allows simultaneous reads and writes into the cache
memory.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a block diagram of a multimedia card in
accordance with an embodiment of the invention.
FIG. 2 shows a block diagram of a multimedia signal
processor in accordance with an embodiment of the invention.
FIG. 3 illustrates relations between processors and software
or firmware in a system including a multimedia processor
in accordance with an embodiment of the invention.
FIG. 4 shows a block diagram of a cache subsystem in
accordance with an embodiment of the invention.
FIG. 5 shows a memory map in accordance with an
embodiment of the invention.
FIG. 6 shows a block diagram of a data pipeline used in
a cache subsystem in accordance with an embodiment of the
invention.
FIG. 7 shows a block diagram of a second data pipeline
used in a cache subsystem in accordance with an embodiment
of the invention.
FIG. 8 shows a block diagram of an address pipeline used
in a cache subsystem in accordance with an embodiment of
the invention.
Use of the same reference symbols in different figures
indicates similar or identical items.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS
In accordance with an aspect of the invention, a multimedia
processor includes a general purpose processor and a
vector processor which operate in parallel according to
separate program threads. The general purpose processor,
like most conventional general purpose processors, executes
instructions which typically manipulate scalar data. Such
processors are suited for execution of input/output (I/O) and
control functions. In some embodiments, the general purpose
processor has a limited vector processing capability of
several byte-size data elements packed into one data word.
For example, if the general purpose processor is a 32-bit
processor, some embodiments of the general purpose processor
can process four one-byte data elements simultaneously.
However, multimedia computing such as audio and
video data compression and decompression requires many
repetitive calculations on pixel arrays and strings of audio
data. To perform real-time multimedia operations, a general
purpose processor which manipulates scalar data (e.g. one
pixel value or sound amplitude per operand) or only small
vectors must operate at a high clock frequency. In contrast,
the vector processor executes instructions where each operand
is a vector containing multiple data elements (e.g.
multiple pixel values or sound amplitudes). Therefore, the
vector processor can perform real-time multimedia operations
at a fraction of the clock frequency required for a
general purpose processor to perform the same function.
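The limited packed-byte capability described above, in which a 32-bit general purpose processor handles four one-byte elements at once, can be illustrated with a short Python sketch. The function name `packed_add_u8` and the masking technique are assumptions chosen for illustration only; the source does not specify how the packed operations are implemented.

```python
LANE_LOW_BITS = 0x7F7F7F7F   # low 7 bits of each of the four byte lanes
LANE_HIGH_BIT = 0x80808080   # top bit of each byte lane

def packed_add_u8(a, b):
    """Add four unsigned bytes packed in 32-bit words a and b, lane by lane.

    Each lane wraps modulo 256 and no carry crosses into the next lane.
    """
    # Add the low 7 bits of every lane; the sum fits inside the lane.
    low_sum = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    # Fold in the top bits with XOR so the carry out of each lane is dropped.
    return (low_sum ^ ((a ^ b) & LANE_HIGH_BIT)) & 0xFFFFFFFF

# Four byte additions performed by a single word-wide operation sequence.
result = packed_add_u8(0x01020304, 0x10203040)
```

A vector processor applies the same idea to a much wider operand, which is why it can meet the same real-time deadline at a lower clock frequency.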
Thus, by allowing an efficient division of the tasks
required for a multimedia application, the combination of
programmable general purpose and vector processors provides
high performance per cost. In one embodiment of the
invention, the general purpose processor executes a real-time
operating system designed for a media circuit board
(“card”) communicating with a host computer system. The
real-time operating system communicates with a primary
processor of the computer system, services I/O devices on or
coupled to the card, and selects tasks which the vector
processor executes. In that embodiment, the vector processor
is designed to perform the computationally intensive
tasks requiring manipulation of large data blocks, while the
general purpose processor acts as the master processor to the
vector processor. Program threads for each processor are
written using a conventional instruction set, which makes the
multimedia processor “programmer-friendly”. Programmability
permits the multimedia processor to perform a variety
of different multimedia tasks. The multimedia processor can,
for example, be adapted to a new protocol simply by
changing either its application programs or its firmware. In
one embodiment, the instruction set is similar to that of a
conventional reduced instruction set computer (RISC).

In accordance with another aspect of the invention, the
general purpose processor and the vector processor share a
variety of on-chip and off-chip resources which are accessible
through a single address space. A cache subsystem
which implements separate data and instruction caches for
each processor also provides a switchboard type connection
between local memory and resources such as a bitstream
processor, a universal asynchronous receiver-transmitter
(“UART”), a direct memory access (“DMA”) controller, a
local bus interface, and a coder-decoder (“CODEC”) interface,
which are memory mapped devices. The cache subsystem
can use a transaction-oriented protocol which implements
a switchboard for data access among the processors
and memory mapped resources.
FIG. 1 shows a multimedia card 100 in accordance with
an embodiment of the invention. Multimedia card 100
includes a printed circuit board, a multimedia processor 110,
and a connector which attaches to a local bus 105 of a host
computer system. In an exemplary embodiment, local bus
105 is a PCI bus; but in other embodiments, local bus 105
could be a proprietary bus or a bus which conforms to any
desired protocol such as the ISA or VESA bus protocols.
Multimedia processor 110 uses a local memory 120, also
located on multimedia card 100, for storage of data and
program instructions. Local memory 120 may also act as a
frame buffer for video coding and decoding applications. In
the exemplary embodiment, local memory 120 can be
implemented by a 512K by 32-bit synchronous dynamic
random access memory (DRAM). Portions of the local
memory space can also be implemented by on-chip static
random access memory (“SRAM”) and read-only memory
(“ROM”) in multimedia processor 110. In fact, if on-chip
memory is sufficient to hold the data and instructions of
multimedia card 100, local memory 120 need not be
implemented.
In addition to multimedia processor 110 and local
memory 120, multimedia card 100 includes a video
analog-to-digital converter (ADC) 132, a video digital-to-analog
converter (DAC) 134, an audio ADC 142, an audio DAC
144, a communications ADC 146, and a communications
DAC 148. Each of converters 132, 134, 142, 144, 146, and
148 can be implemented by one or more separate integrated
circuits. Alternatively, two or more of converters 132, 134,
142, 144, 146, and 148 can be integrated on a single
integrated circuit. A single integrated circuit 140, for
example, the AD1843 available from Analog Devices, Inc.,
can implement the functions of converters 142, 144, 146,
and 148.
Video ADC 132, which may be implemented by, for
example, a KS0122 integrated circuit available from Samsung
Semiconductor, Inc., connects to a video camera or
other source of a video signal and digitizes the video signal
into a series of pixel values. Multimedia card 100 compresses
or encodes the pixel values according to a video
encoding standard such as MPEG, JPEG, or H.324 implemented
in the firmware executed by multimedia processor
110. The encoded video data can then be transmitted to the
host computer via local bus 105, transmitted to a device such
as an Ethernet card coupled to local bus 105, or further
encoded for transmission on a telephone line coupled to
communications DAC 148.
Video DAC 134 converts a series of digital samples from
multimedia processor 110 into an analog video signal for a
video monitor or television. Video DAC 134 may be
implemented, for example, by a KS0119 integrated circuit
available from Samsung Semiconductor, Inc., according to
an NTSC or PAL video standard. Multimedia processor 110
can generate the series of digital samples for video DAC 134
based on data received from the host computer, another
device coupled to local bus 105, a video camera coupled to
video ADC 132, or a telephone line coupled to communications
ADC 146.
An optional component of multimedia card 100 is a
graphics controller 150 which shares local memory 120 with
multimedia processor 110 and provides a video signal to a
video monitor for the host system. Graphics controller 150
may be implemented, for example, by a Super VGA graphics
controller available from various vendors, such as Cirrus
Logic, S3, or Trident Microsystems. Multimedia processor
110 generates and stores pixel maps in local memory 120,
from which graphics controller 150 generates a video signal
for the video monitor.
Audio ADC 142 and audio DAC 144 are for input and
output of analog audio signals. In accordance with one
aspect of the invention, multimedia card 100 emulates the
functions of a sound card, such as the popular
“SoundBlaster”, and implements sound synthesis functions
such as wavetable synthesis and FM synthesis. For sound
card emulations, an application program executed by the
host computer provides data representing a sound, and
multimedia processor 110 generates sound amplitudes in
accordance with that data. Audio DAC 144 converts the
sound amplitudes to an analog audio signal for a speaker or
amplifier. Multimedia processor 110 similarly handles input
audio signals from audio ADC 142.
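FM synthesis, one of the sound synthesis functions named above, is commonly described as a carrier sinusoid whose phase is modulated by a second sinusoid. The Python sketch below shows that textbook formula as one way sound amplitudes might be generated from data describing a sound. The names `fm_sample` and `fm_block` and all parameters are hypothetical; nothing here is taken from the firmware of the exemplary embodiment.

```python
import math

def fm_sample(t, carrier_hz, modulator_hz, index):
    """Return one FM amplitude in [-1, 1] at time t (seconds).

    index is the modulation index: 0 gives a pure carrier tone, larger
    values add more sidebands (a brighter sound).
    """
    modulator = math.sin(2.0 * math.pi * modulator_hz * t)
    return math.sin(2.0 * math.pi * carrier_hz * t + index * modulator)

def fm_block(n_samples, sample_rate_hz, carrier_hz, modulator_hz, index):
    """Generate a block of consecutive amplitudes for a DAC-style output."""
    return [
        fm_sample(n / sample_rate_hz, carrier_hz, modulator_hz, index)
        for n in range(n_samples)
    ]

# Example: 8 samples of a 440 Hz carrier modulated at 220 Hz.
block = fm_block(8, 8000.0, 440.0, 220.0, 2.0)
```

Each sample is computed independently of the others, which is the kind of repetitive calculation over a string of audio data that suits vector operands.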
Communications ADC 146 samples an analog signal from
a telephone line and provides digitized samples to multimedia
processor 110. How multimedia processor 110 processes
the digitized samples depends on the function implemented
in firmware. For example, multimedia processor 110 can
implement modem functions by executing programs in
firmware that perform V.34 demodulation of the samples and
V.42bis error correction and decompression. Multimedia
processor 110 can also compress data received from the host
computer and generate digital samples representing a correctly
modulated analog signal for transmission by communications
DAC 148. Similarly, multimedia processor 110 can
implement answering machine, facsimile, or videophone
functions using the same hardware (ADC 146 and DAC
148) as the interface with telephone lines if suitable firmware
or software is available.
FIG. 2 shows a block diagram of an embodiment of
multimedia processor 110. Multimedia processor 110
includes a processing core 200 which contains a general
purpose processor 210 and a vector processor 220. As used
here, the term vector processor refers to a processor which
executes instructions having vector operands, i.e., operands
each containing multiple data elements of the same type.
Each of general purpose processor 210 and vector processor
220 executes a separate program thread and may be a scalar
or superscalar processor.
In the exemplary embodiment, general purpose processor
210 is a 32-bit RISC processor which operates at 40 MHz
and conforms to the standard ARM7 instruction set. The
architecture for an ARM7 RISC processor and the ARM7
instruction set are described in the ARM7DM Data Sheet
available from Advanced RISC Machines Ltd. General
purpose processor 210 also implements an extension of the
ARM7 instruction set which includes instructions for an
interface with vector processor 220. The copending patent
application entitled “System and Method for Handling
Software Interrupts with Argument Passing”, which was
incorporated by reference above, describes the extension to
the ARM7 instruction set for the exemplary embodiment of
the invention. General purpose processor 210 is connected
to vector processor 220 by control bus 212 to carry out the
extension of the ARM7 instruction set. Furthermore, interrupt
line 222 is used by vector processor 220 to request an
interrupt on general purpose processor 210.
Vector processor 220 has a SIMD (single-instruction
multiple-data) architecture and manipulates both scalar and
vector quantities. In the exemplary embodiment, vector
processor 220 consists of a pipelined RISC central processing
unit that operates at 80 MHz and has a vector register file
that is 288 bits wide. Each vector register in the vector
register file can contain up to 32 data elements. Table 1
shows the data types supported for data elements within a
vector.

TABLE 1

Data Type  Data Size           Interpretation
int8       8 bits (Byte)       8-bit 2's complement integer
                               between -128 and 127.
int9       9 bits (Byte9)      9-bit 2's complement integer
                               between -256 and 255.
int16      16 bits (Halfword)  16-bit 2's complement integer
                               between -32,768 and 32,767.
int32      32 bits (Word)      32-bit 2's complement integer
                               between -2,147,483,648
                               and 2,147,483,647.
float      32 bits (Word)      32-bit floating point number
                               conforming to the IEEE 754
                               single-precision format.

Thus, a vector register can hold thirty-two 8-bit or 9-bit
integer data elements, sixteen 16-bit integer data elements,
or eight 32-bit integer or floating point elements.
Additionally, the exemplary embodiment can also operate
on a 576-bit vector operand spanning two vector registers.
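The register capacities and integer ranges above can be checked with a short Python sketch. The element counts and the 288-bit width come from the text; the dictionary and function names are hypothetical, and the sketch says nothing about how elements are physically laid out inside a register.

```python
REGISTER_BITS = 288  # width of one vector register, per the text

# Element size in bits for each data type in Table 1.
ELEMENT_BITS = {"int8": 8, "int9": 9, "int16": 16, "int32": 32, "float": 32}

# Elements held by one vector register, as stated in the text.
ELEMENTS_PER_REGISTER = {
    "int8": 32, "int9": 32, "int16": 16, "int32": 8, "float": 8,
}

def bits_used(data_type):
    """Bits of a register occupied by a full vector of data_type."""
    return ELEMENTS_PER_REGISTER[data_type] * ELEMENT_BITS[data_type]

def value_range(bits):
    """Inclusive (min, max) of a 2's complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

# Only int9 fills all 288 bits; the other types occupy 256 of them.
usage = {name: bits_used(name) for name in ELEMENT_BITS}
```

The 576-bit operand mentioned above corresponds to two such registers (2 x 288 bits).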
The instruction set for vector processor 220 includes
instructions for manipulating vectors and for manipulating
scalars. The patent application entitled “Single-Instruction-
Multiple-Data Processing in a Multimedia Signal
Processor”, which was incorporated by reference above,
describes the instruction set for the exemplary embodiment
of the invention and an architecture for implementing the
instruction set.
Cache subsystem 230 contains SRAM block 260 (shown
graphically as two blocks), ROM 270, and a cache
control 280. Cache subsystem 230 can configure SRAM
block 260 into (i) an instruction cache 262 and a data cache
264 for general purpose processor 210, and (ii) an instruction
cache 292 and a data cache 294 for vector processor 220.
On-chip ROM 270, which contains data and instructions for
general purpose processor 210 and vector processor 220, can
also be configured as a cache. In the exemplary embodiment,
ROM 270 contains: reset and initialization procedures;
self-test diagnostics procedures; interrupt and exception
handlers; subroutines for SoundBlaster emulation; subroutines
for V.34 modem signal processing; general telephony
functions; 2-dimensional and 3-dimensional graphics
subroutine libraries; and subroutine libraries for audio and
video standards such as MPEG-1, MPEG-2, H.261, H.263,
G.728, and G.723.
FIG. 3 illustrates the relationships between hardware and
software or firmware in an application of multimedia card
100 in a host computer system 300. Host computer system
300 has a primary processor 310 which executes programs
stored in a main memory 320. In the exemplary
embodiment, host computer system 300 is an IBM compatible
personal computer including an x86 type
microprocessor, and the programs executed by host computer
system 300 include an application program 330 running
under an operating system 340 such as Windows™ 95
or NT. Application program 330 can communicate with
multimedia card 100 via device drivers 342. Device drivers
342 conform to the device driver API of the operating
system.
The device drivers are typically provided with each
multimedia card 100 since different embodiments of multimedia
card 100 can have different hardware implementations,
such as differing register maps and interrupt levels. The
device drivers must properly transform the control signals
needed by the particular embodiment of multimedia card
100 into the control signals as defined by the device driver
API of the operating system. Typically the operating system
will expect a different device driver for each device, such as
a modem driver, a graphics driver, and an audio driver. Thus,
if an embodiment of multimedia card 100 combines the
functionality of an audio card, a modem, and a graphics
card, three separate device drivers are typically required by
the operating system.
General purpose processor 210 in multimedia processor
110 executes a real-time operating system 360 which controls
communications with device drivers 342. General
purpose processor 210 also performs general tasks 370.
Vector processor 220 performs vector tasks 380.
Cache subsystem 230 (FIG. 2) also couples general purpose
processor 210 and vector processor 220 to two system buses:
IOBUS 240 and FBUS 250. IOBUS 240 typically operates
at a slower frequency than FBUS 250. Slower speed devices
are coupled to IOBUS 240, while higher speed devices are
coupled to FBUS 250. By separating the slower speed
devices from the higher speed devices, the slower speed
devices are prevented from unduly impacting the performance
of the higher speed devices.
Cache subsystem 230 also serves as a switchboard for
communication between IOBUS 240, FBUS 250, general
purpose processor 210, and vector processor 220. In most
embodiments of cache subsystem 230, multiple simultaneous
accesses between the buses and processors are possible. For
example, vector processor 220 is able to communicate with
FBUS 250 at the same time that general purpose processor
210 is communicating with IOBUS 240. In one embodiment
of the invention, the combination of the switchboard and
caching function is accomplished by using direct mapping
techniques for FBUS 250 and IOBUS 240. Specifically, the
devices on FBUS 250 and IOBUS 240 can be accessed by
general purpose processor 210 and vector processor 220 by
standard memory reads and writes at appropriate addresses.
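The switchboard behavior described above, in which independent transfers proceed at the same time as long as they do not contend for the same processor or bus, can be sketched in Python. The first-come, first-served grant policy and the names `grant_transfers`, `gpp`, and `vector` are assumptions made for illustration; the source does not describe the arbitration used by cache control 280.

```python
def grant_transfers(requests):
    """Return the requests that can proceed together in one cycle.

    requests is a list of (source, destination) endpoint names. A request
    is granted only if neither endpoint is already in use by an earlier
    granted request (hypothetical first-come, first-served policy).
    """
    busy = set()
    granted = []
    for source, destination in requests:
        if source not in busy and destination not in busy:
            granted.append((source, destination))
            busy.update((source, destination))
    return granted

# The example from the text: the vector processor talks to FBUS while the
# general purpose processor talks to IOBUS. A third request that needs the
# already-busy FBUS has to wait.
cycle = grant_transfers([
    ("vector", "FBUS"),
    ("gpp", "IOBUS"),
    ("gpp", "FBUS"),
])
```

Here the first two requests share no endpoint and are granted together, while the third is deferred.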
FIG. 5 shows the memory map of one embodiment of the
invention. Memory block 510, i.e. the address space from
byte-address zero to address 4M-1, is occupied by ROM 270.
The units M and G, which are used here for memory
addresses, respectively stand for the numbers 1,048,576 (i.e.,
1,024*1,024) and 1,073,741,824 (i.e., 1,024*1,024*1,024).
Memory block 520, i.e. the
address space from byte-address 4M to 8M-1, is occupied by
SRAM block 260. Memory block 530, i.e. the address space
from byte-address 8M to address 72M-1, is occupied by
local memory 120. The devices on FBUS 250 are mapped to
memory block 540, which starts after byte-address 72M and
extends to byte-address 77M. Memory block 550 is reserved
for future expansion. The devices on IOBUS 240 are
mapped to memory block 560, which starts after
byte-address 125M and extends to byte-address 128M-1.
Memory block 570 is also reserved for future expansion.
Memory block 580, i.e. the address space from byte-address
2G to address 4G-1, is occupied by other host computer
devices and is typically accessed through local bus interface
255.
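Because the devices are memory mapped, selecting a resource reduces to comparing an address against the block boundaries listed above. The Python sketch below decodes a byte address into a block of FIG. 5. Several points are assumptions, not statements of the source: "starts after byte-address 72M" and "starts after byte-address 125M" are read as starting at those addresses, the end of block 540 (77M) is treated as exclusive, and the bounds of reserved blocks 550 and 570 are inferred from their neighbors. The names `MEMORY_MAP` and `decode` are hypothetical.

```python
M = 1024 * 1024          # 1,048,576
G = 1024 * 1024 * 1024   # 1,073,741,824

# (first address, one past last address, description); bounds of blocks
# 540, 550, 560, and 570 rest on the assumptions stated in the lead-in.
MEMORY_MAP = [
    (0,       4 * M,   "ROM 270 (block 510)"),
    (4 * M,   8 * M,   "SRAM block 260 (block 520)"),
    (8 * M,   72 * M,  "local memory 120 (block 530)"),
    (72 * M,  77 * M,  "FBUS 250 devices (block 540)"),
    (77 * M,  125 * M, "reserved (block 550)"),
    (125 * M, 128 * M, "IOBUS 240 devices (block 560)"),
    (128 * M, 2 * G,   "reserved (block 570)"),
    (2 * G,   4 * G,   "host devices via local bus interface 255 (block 580)"),
]

def decode(address):
    """Return the description of the block that contains a byte address."""
    for start, end, description in MEMORY_MAP:
        if start <= address < end:
            return description
    raise ValueError("address outside the 4G address space")

# A read at this address would be routed to a device on IOBUS 240.
target = decode(126 * M)
```

A processor issuing an ordinary load or store therefore needs no special I/O instruction; the address alone determines whether the access goes to a cache, local memory, or a bus device.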
FBUS 250 (FIG. 2) is connected to a memory controller
258, a DMA controller 257, a local bus interface 255, and a
device interface 252, which provide interfaces
for local memory 120, local bus 105, and converters 132,
134, 142, 144, 146, and 148 and graphics controller 150
shown in FIG. 1.
Memory controller 258 controls reads and writes to local
memory