[19] United States Patent
Whittaker et al.

[11] Patent Number: 5,968,167
[45] Date of Patent: Oct. 19, 1999

[54] MULTI-THREADED DATA PROCESSING MANAGEMENT SYSTEM

[75] Inventors: James Robert Whittaker; Paul Rowland, both of Herts, United Kingdom

[73] Assignee: Videologic Limited, Hertfordshire, United Kingdom

[21] Appl. No.: 08/834,808

[22] Filed: Apr. 3, 1997

[51] Int. Cl.6 ........ G06F 13/00
[52] U.S. Cl. ........ 712/225
[58] Field of Search ........ 710/131, 240, 1, 200, 100, 5; 712/225

[56] References Cited

U.S. PATENT DOCUMENTS

4,542,455    9/1985   Demeure ........ 395/674
5,307,496    4/1994   Ichinose et al. ........ 395/974
5,487,153    1/1996   Hammerstrom et al.
5,511,002    4/1996   Milne et al.
5,684,987   11/1997   Mamiya et al. ........ 395/614
5,689,674   11/1997   Griffith et al. ........ 395/393
5,699,537   12/1997   Sharangpani et al. ........ 395/393
5,748,921    5/1998   Lambrecht et al. ........ 395/308

FOREIGN PATENT DOCUMENTS

0 020 202     12/1980   European Pat. Off.
0 367 639      5/1990   European Pat. Off.
0 397 180     11/1990   European Pat. Off.
WO 94/15287    7/1994   WIPO.

Primary Examiner: David Y. Eng
Attorney, Agent, or Firm: Flynn, Thiel, Boutell & Tanis, P.C.

[57] ABSTRACT

A data processing management system for controlling the execution of multiple threads of processing instructions, such as the instructions that are employed to process multimedia data. The management system includes a media control core, a number of data processing units and a multi-banked cache. For the processing instruction of each thread, the multimedia core identifies the data processing operation to be executed as well as the resources needed to execute that operation. The multimedia core then determines for each instruction if all the resources are available to execute the operation. For the operations for which all the resources are available, the multimedia core then determines which operation has the highest priority. The operation having the highest priority is then passed to one of the data processing units for execution. The data and addresses upon which the data processing units act are temporarily stored in the multi-banked cache. Data are written into the cache from multiple input ports. Data are read from the cache out through multiple output ports.

30 Claims, 7 Drawing Sheets
`
[Front-page drawing (FIG. 1): REAL TIME DATA inputs and outputs, MEDIA CONTROL CORE, DATA PROCESSING UNITS (each a DATA CORE n with a PROCESSING PIPELINE), MULTI-BANK CACHE.]
`
`SAMSUNG-1004
`
`Page 1 of 16
`
`
`
`
U.S. Patent    Oct. 19, 1999    Sheet 1 of 7    5,968,167

[Sheet 1 of 7: drawing; its rotated labels are not recoverable from the scan.]
`
`
`
`
`
`
[Sheet 2 of 7: drawing; its rotated labels are not recoverable from the scan.]
`
`
`
`
`
[Sheet 3 of 7: drawing; its rotated labels are not recoverable from the scan.]
`
`
`
`
`
`
`
`
`
`
`
`
`
[Sheet 4 of 7: drawing; its rotated labels are not recoverable from the scan.]
`
`
`
`
[Sheet 5 of 7, FIG. 5: REGISTER FILE with control bits WE, W, R1 and R2 from microcode; input from the MCC data bus; output to the MCC data bus; status bits H, S, Z to the MCC status bus.]
`
`
`
`
[Sheet 6 of 7: RESOURCE CHECK blocks (81) feeding block 82; inputs labelled DATA PROCESSOR STATUS, PIPELINE/BANK STATUS, IO PORT STATUS, EXECUTION DEPENDENCE, CORE CONTROL, ROUTING CONTROL, DATA BANK 1 CTRL, DATA BANK 2 CTRL, ADDR BANK 1 CTRL, ADDR BANK 2 CTRL; INSTRUCTION BUFFERS for THREAD 1 to THREAD n.]
`
`
`
`
[Sheet 7 of 7: drawing; its rotated labels are not recoverable from the scan.]
`
`
`
`
`
`
`
`
MULTI-THREADED DATA PROCESSING MANAGEMENT SYSTEM
`SUMMARY OF THE INVENTION
`
`This invention relates to a data processing management
`system of the type which can be used with real
`time
`multimedia inputs and processing.
`BACKGROUND TO THE INVENTION
`
The user interface to computers has continually evolved from teletypes to keyboard and character terminals to the graphical user interface (GUI), which is currently the standard interface for the majority of computer users. This evolution is continuing, with sound and 3D graphics increasingly common and 3D sound and virtual reality emerging. Its common thread is an increase in the complexity of the human-computer interface, achieved by an accompanying increase in the types of data presented to the user. Personal computer (PC) applications are taking advantage of this shift and are increasingly relying on the availability of sound and 3D graphics in order to achieve their full potential.
This has resulted in chip and board suppliers offering products with combined functionality designed to handle more than one data type, e.g. 2D graphics and sound, or 2D graphics and Moving Picture Experts Group (MPEG) playback. It is important to note that these products to date use separate functional units for each data type.
More recently, programmable SIMD (Single Instruction Multiple Data) architectures (e.g. Chromatics MPACT) have emerged. These architectures use identical processing elements executing the same instruction to perform the same processing on a number of blocks of data in parallel. This approach works well for data which can be easily partitioned to allow a common function to be performed, e.g. block processing in data compression such as MPEG, but it is not flexible enough to execute a complete general algorithm, which often requires conditional flow control within the data processing.
DSP (digital signal processor) vendors have also sought to address this market with MIMD (Multiple Instruction Multiple Data) devices (e.g. Texas Instruments' TI320C80), which offer the required flexibility to process the varied data types. However, since the architecture replicates general purpose DSP cores which retain a far greater degree of flexibility than the application requires, the resulting chip is a high-cost device, too expensive for general PC and consumer use.
`
`
`
`
`
CPU (central processing unit) vendors promoting fast RISC CPUs for both general purpose programs and multimedia processing are unable (and do not wish) to compromise their architecture in order to support more than a few multimedia-specific instructions, and therefore do not achieve the required performance levels at a reasonable cost. As the CPU is also typically being used to run a non-real-time operating system, it is also unable to provide low latency processing.
Dedicated multimedia CPUs (e.g. Philips' Trimedia) using VLIW (very long instruction word) instructions controlling multiple processing units are unable to make efficient use of their processing power, because each instruction is dedicated to a single task (and data type) and is therefore unable to make optimal use of all the processing units available. For example, a VLIW instruction dedicated to a 3D graphics operation is unable to take advantage of hardware designed for MPEG motion estimation. The number of processing units, and therefore scalability, is also limited by the VLIW word length.
`
`Preferred embodiments of the present invention address
`the requirement for a device which processes all multimedia
`data types in a manner that minimises system costs and
`provides for future developments in multimedia and the
`related industry standards. They provide an architecture
`which is scalable in processing power, real-time I/O support
`and in the number of concurrent activities which can be
`undertaken.
`
`All multimedia data types may be viewed as streams of
`data which lend themselves to a vector processing approach.
`Some of these streams will be real time (e.g. from an audio
`or video input) and as such either require dedicated buffering
`or low latency processing to avoid data loss. Each data
`stream also requires some hardware resource so that it may
`be processed.
A preferred embodiment of the invention includes a low latency real-time processing core responsible for data IO and task scheduling only. This avoids the need for unnecessary and costly buffering. It also includes a method of dynamic resource checking to ensure that only tasks with the required resources available are run.
`
`The balance between host processing power, memory
`costs and silicon costs is also continually changing. This
`means that the optimal division of work between a host
`processor and multimedia coprocessor also changes over
`time. This device is programmable to allow the division of
`work to be altered as required.
Scalability of parallel processing devices is a problem for both hardware design and supporting software. As more processing units are added to a device, the distribution of tasks between the processing units becomes more difficult, resulting in either diminishing returns or an exponential growth in the number of interconnects between functional units. Such changes also typically alter the programming model for the device, requiring wholesale changes to the supporting software. Preferred embodiments of the invention address these issues with a consistent, scalable architecture in which all the elements may be scaled without creating an explosion of interconnects between functional units and without changing the programming model presented to software interfacing to the device.
`FIG. 1 shows the base architecture of the device.
`
The device has been conceived as a re-configurable engine able to match all the current and future algorithms required to process multimedia data. Its work is split into two categories: real time scheduling and IO processing are both performed by a Media Control Core, whilst computationally intensive data processing is performed by one or more additional data processing units. This division of work is one of the architecture's fundamental characteristics.
`
`Data processing consists of a number of steps:
`Parameter fetching and setup
`Data fetching and processing
`Data storage
In order to achieve high data processing throughput efficiently, a processor needs to perform the above operations on a reasonably large set of data. If the data set is too small, the processor spends too high a proportion of its power on context switching between tasks and on the resulting need to save and restore a thread's state.
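The trade-off can be made concrete with a simple cost model, which is an illustrative assumption rather than anything specified in the patent: suppose a task processes N data items at c cycles per item, and each switch to that task costs S cycles to save and restore thread state. The fraction of cycles spent on useful work is then:

```latex
\eta(N) = \frac{N c}{N c + S}
\qquad\text{e.g. with } c = 1,\ S = 100:\quad
\eta(100) = \frac{100}{200} = 0.5,\qquad
\eta(10\,000) = \frac{10\,000}{10\,100} \approx 0.99
```

With S fixed, efficiency only approaches 1 as N grows, which is the reason given here for handing the data processing units reasonably large blocks of work.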
`Because the Media Control Core is required only to
`service requests to move data between IO ports and memory
`(to allow data processing to be performed) it can context
`
switch every clock cycle, which removes the need for large data buffers to support real time IO. Data processing
`units are able to process data efficiently by performing a key
`part of an algorithm on data without interruption.
`These processing elements are supported by a scalable
`multibank cache which supports efficient data movement
`and processing by caching sets of data required for the active
`algorithms being run.
`The invention is defined in its various aspects with more
`precision in the appended claims to which reference should
`now be made.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
A preferred embodiment of the invention will now be described in detail, by way of example, with reference to the figures in which:
FIG. 1 shows a block diagram of an embodiment of the invention;
FIG. 2 shows a block diagram of the Media Control Core of FIG. 1;
FIG. 3 is a block diagram of a second embodiment of the invention;
FIG. 4 is a block diagram of the control unit instruction pipeline of the Media Control Core;
FIG. 5 is a block diagram of the internal architecture of one of the data banks of FIG. 4;
FIG. 6 shows in block form how resource checking, and thus process selection, is performed by the Media Control Core; and
FIG. 7 is a block diagram showing how access is made to the banked cache memory of FIG. 1.
`DETAILED DESCRIPTION
`
`The base architecture of the embodiment of the invention
`is shown in FIG. 1. The centre of the system is a media
`control core (MCC) 2. This is a fine grained multithreading
`processor. This has a plurality of inputs and outputs which
`can be coupled to real time data input and output devices 4.
`These can be, for example, video sources, audio sources,
`video outputs, audio outputs, data sources, storage devices
`etc. In a simple example only one input and one output
`would be provided.
Also coupled to the media control core 2 are a plurality of data processing units 6. Each of these comprises a data processing core 8 which controls the processing of data via a data pipeline 10. The core 8 decodes and sequences microinstructions for the pipeline 10.
Also coupled to the media control core 2 is a multibanked cache memory 12, from which data may be retrieved by the media control core 2 and the data processing units 6, and into which data may be written by the media control core and the data processing units 6.
The media control core is a fine grained multithreading processing unit which directs data from inputs to data processing cores or to storage and provides data to outputs. It is arranged so that it can switch tasks on every clock cycle. This is achieved by checking, on every clock cycle, which of the possible operations it could perform have all the resources available for those tasks to be executed and, of those, which has the highest priority. It could be arranged to commence more than one operation on each clock cycle if sufficient processing power were provided.
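The per-cycle selection described above can be sketched as a small Python simulation. The thread names, resource names and the "larger number means higher priority" convention are hypothetical; the point is only that the choice is recomputed on every clock cycle from the resources available at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    priority: int                            # larger value = higher priority (assumed)
    needs: set = field(default_factory=set)  # resources its next instruction needs


def select_thread(threads, available):
    """Return the highest-priority thread whose needs are all available, else None."""
    runnable = [t for t in threads if t.needs <= available]
    if not runnable:
        return None
    return max(runnable, key=lambda t: t.priority)


def run(threads, availability_per_cycle):
    """Re-run the selection on every clock cycle and record which thread issues."""
    trace = []
    for available in availability_per_cycle:
        chosen = select_thread(threads, available)
        trace.append(chosen.name if chosen else None)
    return trace


threads = [
    Thread("video_in", 3, {"video_port_data", "cache_bank_0"}),
    Thread("audio_mix", 2, {"dpu_1"}),
    Thread("background", 1),
]
cycles = [
    {"video_port_data", "cache_bank_0", "dpu_1"},  # video data waiting
    {"dpu_1"},                                     # no video data this cycle
    set(),                                         # nothing external available
]
print(run(threads, cycles))  # ['video_in', 'audio_mix', 'background']
```

A different thread issues on each of the three cycles without any save/restore step, because selection only reads status and never moves thread state.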
`This resource checking ensures that everything required
`to perform a particular task is in place. This includes external
`
resources such as whether or not data is available at an input port (e.g. video data) or whether a data storage device or output is available. It also includes internal resources such as data banks for temporary storage, available processing cores which are not currently working on other data, or previously processed data required for a particular new processing operation. The media control core operates to direct data from an input to an appropriate data processing unit 6 for processing to take place, and routes data to an output when required, making use of the cache as necessary. Once execution of a set of instructions has commenced on a data processing unit 6, the MCC can look again at the various threads it can run and the resources available for these whilst the program continues to run on the data processing unit.
The resource and priority checking of the media control core means that tasks which serve real time data, such as video input, can be performed without the large memory buffers usually required for real time inputs. For an operation such as video input, the media control core will look to see whether data is available at the IO port and, if it is, will receive that data and send it either to a portion of the multibanked cache or to data storage registers in preparation for processing by one of the data processing units 6.
The data processing units 6 are all under the control and scheduling of the media control core 2. In the example shown in FIG. 1, each unit consists of a processing pipeline (data pipeline 10), made up of a number of processing elements such as multipliers, adders and shifters, under the control of an associated data processing core 8 which runs a sequence of instructions to perform a data processing algorithm. Each of these data processing cores will have its own microinstruction ROM and/or RAM storing sequences of instructions to perform a particular data process. The media control core invokes the data processing unit 6 to perform its particular operation sequence by, for example, passing an address offset into the unit's microinstruction ROM and instructing the data processing unit to commence execution. The data processing unit 6 will then perform a particular process, on either data from the multibanked cache or data passed to it from one of the inputs to the media control core, until it has completed, at which point it signals to the media control core that its processing is complete.
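A minimal sketch of this handshake follows. It assumes the microcode store is modelled as a Python list of functions, with a `None` entry marking the end of a routine, and that a single accumulator stands in for the pipeline's data; these are illustrative choices, not details from the patent. The MCC only starts the unit at an offset and then watches for the completion signal.

```python
class DataProcessingUnit:
    """Toy data processing unit: a microcode store plus a sequencer (illustrative)."""

    def __init__(self, microcode):
        self.microcode = microcode  # list of callables; None marks end of a routine
        self.pc = 0
        self.busy = False
        self.acc = 0                # stands in for the data pipeline's working value

    def start(self, offset, value=0):
        """What the MCC does: pass an offset into the microcode store and say 'go'."""
        self.pc, self.acc, self.busy = offset, value, True

    def step(self):
        """Run one micro-instruction; return True on the cycle the unit signals 'done'."""
        if not self.busy:
            return False
        op = self.microcode[self.pc]
        if op is None:
            self.busy = False
            return True
        self.acc = op(self.acc)
        self.pc += 1
        return False


microcode = [
    lambda a: a + 5, lambda a: a * 2, None,  # routine at offset 0
    lambda a: a - 1, None,                   # routine at offset 3
]
dpu = DataProcessingUnit(microcode)
dpu.start(0)
cycles_while_busy = 0
while not dpu.step():
    cycles_while_busy += 1  # the MCC is free to schedule other threads meanwhile
print(dpu.acc, cycles_while_busy)  # 10 2
```

Selecting a routine by offset means the same unit can run different algorithms without the MCC knowing anything about their internals.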
The multibanked cache 12 of FIG. 1 is used for memory accesses, all of which are cached through it. The cache is divided into a plurality of banks 14, each of which can be programmed to match the requirements of one of the data processing tasks being undertaken. For example, a cache bank might be dedicated to caching texture maps from main memory for use in 3D graphics rendering. This programmability of the cache banks allows the best possible use to be made of on-chip memory and allows dynamic cache allocation to be performed, thereby achieving the best performance under any particular conditions.
Furthermore, the use of multiple cache banks allows the cache to be non-blocking. That is to say, if one of the cache banks is dealing with a request which it is currently unable to satisfy, such as a read instruction where the data is not currently available, then another processing thread which uses a separate cache bank may be run.
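The effect of banking on blocking can be illustrated with a toy model. It assumes a hypothetical one-to-one assignment of threads to cache banks and a bank that records at most one outstanding miss; real bank allocation is programmable, as described above. A miss stalls only the threads that use that bank.

```python
class CacheBank:
    """One cache bank that can have a single outstanding miss (simplified)."""

    def __init__(self):
        self.lines = {}       # address -> data currently held in the bank
        self.pending = None   # address of an unsatisfied request, if any

    def read(self, addr):
        """Return data on a hit; on a miss, remember the request and return None."""
        if addr in self.lines:
            return self.lines[addr]
        self.pending = addr
        return None

    def fill(self, addr, data):
        """Data arrives from main memory; clears the outstanding miss if it matches."""
        self.lines[addr] = data
        if self.pending == addr:
            self.pending = None

    @property
    def blocked(self):
        return self.pending is not None


def runnable_threads(thread_to_bank, banks):
    """Threads whose own bank is not waiting on memory may still be scheduled."""
    return [t for t, b in thread_to_bank.items() if not banks[b].blocked]


banks = [CacheBank(), CacheBank()]
banks[1].fill(0x40, 7)
thread_to_bank = {"texture_fetch": 0, "audio": 1}  # hypothetical assignment
banks[0].read(0x100)                               # miss: bank 0 now waits on memory
print(runnable_threads(thread_to_bank, banks))     # ['audio']
print(banks[1].read(0x40))                         # 7
```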
The entire device as shown in FIG. 1 is scalable and may be constructed on a single piece of silicon as an integrated chip. The media control core 2 is scalable in a manner which will be described below with reference to FIG. 2. As the size of the media control core is increased, it is able to support further data processing units 6 whilst using the same programming model for the media control core. More cache banks may also be added to support the further data processing units, thereby increasing the data throughput to the media control core and the data processing units. Because the programming model of the device is not changed, a high degree of backwards compatibility can be attained.
The media control core is shown in more detail in FIG. 2. It is composed of a control unit 16, a set of read/write units 18, a set of program counter banks 20, a set of address banks 22, a set of data banks 24, and a set of input/output banks 26. These banks are all coupled together by a media control core status bus 28, a media control core control bus 29 and a media control core data interconnect bus 30. The data interconnect bus 30 is used for sending data between the various banks, and the status bus provides data such as the input/output port status and the status of the data processing units to which the media control core can send instructions and data.
`
In addition, a memory block 32 storing microcode instructions in ROM and RAM is coupled to the control unit 16 and to the units 18 to 26 listed above.
`
All the core components 18 to 26 (that is, all except the control unit 16) have the same basic interface model, which allows data to be read from them, written to them and operations to be performed between data stored in them. Each bank consists of a closely coupled local storage register file with a processing unit or arithmetic logic unit (ALU).
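The shared read / write / operate interface might be modelled as below. The class name `Bank`, the register count and the 32-bit word width are assumptions for illustration; the structural point is that each register file is paired with its own ALU, so an operation never has to be routed through a shared one.

```python
import operator
from typing import Callable


class Bank:
    """A register file closely coupled to its own ALU, exposing read/write/operate."""

    def __init__(self, size: int, width: int = 32):
        self.regs = [0] * size
        self.mask = (1 << width) - 1

    def read(self, r: int) -> int:
        return self.regs[r]

    def write(self, r: int, value: int) -> None:
        self.regs[r] = value & self.mask  # wrap to the register width

    def operate(self, op: Callable[[int, int], int], w: int, r1: int, r2: int) -> None:
        """regs[w] = op(regs[r1], regs[r2]), computed by this bank's local ALU."""
        self.write(w, op(self.regs[r1], self.regs[r2]))


addr_bank = Bank(size=4)
addr_bank.write(0, 0x1000)   # base address
addr_bank.write(1, 0x20)     # offset
addr_bank.operate(operator.add, w=2, r1=0, r2=1)
print(hex(addr_bank.read(2)))  # 0x1020
```

Because every bank type exposes the same three operations, adding another bank adds capacity without changing how the control unit talks to it.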
The control unit 16 is used to control the execution of the media control core. On each clock cycle, control unit 16 checks the availability of all resources (e.g. input/output port status, data processing unit status, etc.), using status information provided over the media control core status bus 28, against the resources required to run each program under its control. It then starts execution of the instruction for the highest priority program thread which has all its resources available.
The program counter bank 20 is used to store program counters for each processing thread which is supported by the media control core. It consists of a register for each of the processing threads which the media control core is capable of supporting and an ALU which performs all operations upon the program counters for program progression, looping, branching, etc. The data banks 24 are used for general purpose operations on data to control program flow within the media control core. They are a general resource which can be used as required by any processing thread which is running on the MCC.
The address banks 22 are used to store and manipulate addresses for both instructions and data and are also a general MCC resource in a similar manner to the data banks 24.
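A sketch of the program counter bank follows, assuming a 16-bit unsigned program counter and a hypothetical decrement-and-branch loop operation; the text names progression, looping and branching but does not define how they are encoded. Supporting more threads amounts to lengthening the `pc` list.

```python
class ProgramCounterBank:
    """One program counter per supported thread plus a shared unsigned ALU."""

    def __init__(self, num_threads: int, width: int = 16):
        self.pc = [0] * num_threads       # adding threads = adding registers
        self.mask = (1 << width) - 1      # unsigned, wraps at 2**width

    def advance(self, t: int, step: int = 1) -> None:
        """Program progression for thread t."""
        self.pc[t] = (self.pc[t] + step) & self.mask

    def branch(self, t: int, target: int) -> None:
        """Absolute branch for thread t."""
        self.pc[t] = target & self.mask

    def loop(self, t: int, start: int, remaining: int) -> int:
        """Hypothetical decrement-and-branch: jump back to start while iterations remain."""
        remaining -= 1
        if remaining > 0:
            self.branch(t, start)
        else:
            self.advance(t)
        return remaining


pcs = ProgramCounterBank(num_threads=3)
pcs.branch(1, 0x10)          # thread 1 jumps to its loop body at 0x10
remaining, trace = 3, []
while True:
    trace.append(pcs.pc[1])  # execute the body instruction at 0x10
    pcs.advance(1)           # step to the loop instruction at 0x11
    trace.append(pcs.pc[1])
    remaining = pcs.loop(1, start=0x10, remaining=remaining)
    if remaining == 0:
        break
print(pcs.pc, trace.count(0x10))  # [0, 18, 0] 3
```

Threads 0 and 2 are untouched throughout, which is what lets the control unit resume any of them on any cycle.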
The input/output banks 26 provide an interface between the media control core and the real time data streams for input/output which are supported by the MCC. Their status indicates the availability of data at a port, e.g. video input, or the ability of a port to take data for output. They can, as an option, include the ability to transform data as it is transferred in or out, for example bit stuffing of a data stream.
`
`
`
`The read/write banks 18 provide an interface between the
`media control core and memory (via the multibank cache).
`As more than one processing thread can be run at any one
`time more than one read/write unit is required to avoid the
`blocking of memory requests.
The media control core is scalable in all important respects. Because it is constructed from banks which localise storage (register files) and processing (ALU), additional banks can be added without creating any unmanageable routing and interconnection problems. The number of processing threads which can be supported can be increased by adding registers to the program counter bank and modifying the control unit accordingly. The number of input/output streams which can be supported by the MCC can be increased by adding further IO banks.
The data throughput can be increased by adding further read/write units 18, and the MCC processing power overall can be increased by adding further data and address banks, 24 and 22 respectively.
A block diagram of a specific implementation of the data processing management system is shown in FIG. 3. In this implementation, the MCC serves a plurality of real time data input/output ports and controls data processing units to process data received from them and output to them.
The figure shows a video input 34 and an audio input 36 coupled to the media control core via associated preprocessors 38 and 40. A corresponding video output 42 and audio output 44 are coupled to the media control core 2 via respective post processors 46 and 48. The video and audio inputs and outputs may be digital inputs and outputs.
As in FIG. 1, the media control core 2 is coupled to a multibanked cache 12, in this case referred to as the main cache bank. A data processing unit 6, comprising a secondary core 8 and a data (media) pipeline 10, is coupled directly to the media control core and is used for processing the data supplied to it.
Also coupled to the media control core 2 is a processing unit 50 comprising a digital to analog converter feed core (DAC feed core) 52 and a DAC feed pipeline 54, which supplies data to a digital to analog converter 56. The purpose of this is to provide a graphics output. To this end, the processing unit 50 fetches data via the frame buffer interface 58 and the system bus 60 of the host computer; the video graphics adaptor (VGA) 62 is retained for compatibility only. Thus, real time data is supplied on the video and audio inputs and can be sent out on the video and audio outputs, whilst graphics output can be sent by the DAC 56.
Data for graphics output can be generated by processing non-real time data from a source such as a graphics frame buffer (a connection to which is shown in FIG. 3 via the frame buffer interface 58), 3D data, or real time video.
The secondary data processing core 8 and media pipeline 10 are an example of a data processing unit which is able to process audio, 3D, 2D, video scaling, video decoding etc. This could be formed from any type of general processor.
The DAC feed core and DAC feed pipeline are dedicated to processing data from a number of frame buffers for the generation of RGB data for a DAC. They can switch between source buffers on a pixel by pixel basis, converting data taken from a number of video formats, including YUV, and combining source data from multiple frame buffers by blending or by colour or chroma keying.
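The per-pixel work of the DAC feed unit might look like the following. The full-range ITU-R BT.601 (JFIF) YUV-to-RGB coefficients, the magenta colour key and the alpha value are assumptions chosen for illustration, and only colour keying on the graphics pixel is shown, not chroma keying on the video pixel.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]


def clamp8(x: float) -> int:
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y: int, u: int, v: int) -> RGB:
    """Full-range BT.601 (JFIF) conversion, assumed here for illustration."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def compose_pixel(graphics: RGB, video_yuv: Tuple[int, int, int],
                  key: RGB, alpha: Optional[float] = None) -> RGB:
    """Choose or mix two source buffers for one output pixel.

    A graphics pixel equal to the colour key lets the video show through;
    otherwise, if alpha (0..1) is given, graphics and video are blended.
    """
    video = yuv_to_rgb(*video_yuv)
    if graphics == key:
        return video
    if alpha is None:
        return graphics
    return tuple(clamp8(alpha * g + (1 - alpha) * v) for g, v in zip(graphics, video))


KEY = (255, 0, 255)  # hypothetical magenta colour key
print(yuv_to_rgb(128, 128, 128))                             # (128, 128, 128)
print(compose_pixel(KEY, (128, 128, 128), KEY))              # (128, 128, 128)
print(compose_pixel((0, 0, 0), (255, 128, 128), KEY, 0.25))  # (191, 191, 191)
```

Since the choice is made inside `compose_pixel`, the source can change from one pixel to the next, which is the pixel-by-pixel switching the text describes.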
Each core will have an associated microcode store formed from ROM and RAM, which for the purposes of clarity is not shown here, but which stores instructions to be executed by the processor. The cache banks 12 interface to the media control core and the data processing units 6 and 50. They also interface to the system bus via an address translation unit 64. They are also linked to the frame buffer interface 58 for writing data to and reading data from one or more frame buffers.
`
`
`
`
`
`
`
A data bank 24 is illustrated in FIG. 5. It comprises a register file 72, an ALU 74, and a multiplexed input 76. The operation of the data bank is controlled by a number of bits in a microinstruction, labelled WE, W, R1 and R2, which are input to the register file. The result of the microinstruction performed by the ALU is made available as status bits H, S, Z, which are routed to the control unit of the media control core to implement branches and conditional instructions. The register file is constructed to allow two operands to be fetched and one operand to be written on each clock cycle. The data input port 78 and the data output port 80 allow communication with other units via the media control core data bus 30, to which they are connected. Thus, the data flow in FIG. 5 is vertically down through the diagram, whilst the flow of control information is from left to right, being formed of control bits from the control unit and status bits sent back to the control unit reflecting the status of the data bank.
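One clock cycle of such a data bank can be sketched as follows. The 16-bit width, the small set of ALU operations and the meaning of H (taken here as carry or borrow out of the word) are assumptions; S and Z are treated as the sign and zero flags of the result. The returned status is what the control unit would use for a conditional branch.

```python
from dataclasses import dataclass


@dataclass
class Status:
    h: bool  # assumed meaning: carry/borrow out of the word
    s: bool  # sign bit of the result
    z: bool  # result is zero


class DataBank:
    """Register file (two reads via R1/R2, one write via W gated by WE) plus an ALU."""

    WIDTH = 16                      # assumed word width
    MASK = (1 << WIDTH) - 1

    def __init__(self, nregs: int = 8):
        self.regs = [0] * nregs

    def cycle(self, we, w, r1, r2, alu_op, bus_in=0):
        a, b = self.regs[r1], self.regs[r2]     # two operands read this cycle
        if alu_op == "add":
            raw = a + b
        elif alu_op == "sub":
            raw = a - b
        elif alu_op == "load":                  # value from the MCC data bus
            raw = bus_in
        else:
            raise ValueError(f"unknown ALU op {alu_op!r}")
        result = raw & self.MASK
        status = Status(h=(raw != result),      # value did not fit in the word
                        s=bool(result >> (self.WIDTH - 1)),
                        z=(result == 0))
        if we:
            self.regs[w] = result               # at most one write this cycle
        return result, status                   # result to data bus, status to control unit


bank = DataBank()
bank.cycle(we=True, w=1, r1=0, r2=0, alu_op="load", bus_in=5)
bank.cycle(we=True, w=2, r1=0, r2=0, alu_op="load", bus_in=5)
result, st = bank.cycle(we=False, w=0, r1=1, r2=2, alu_op="sub")
print(result, st)  # 0 Status(h=False, s=False, z=True)
```

A "branch if equal" in the MCC would then be the control unit testing `st.z` on the cycle after the compare.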
A plurality of these data banks are used, and each is of the same form; that is to say, each has its own register file closely coupled to an ALU as shown in FIG. 5. This arrangement, using a plurality of closely coupled registers and ALUs, preferably in a one to one relationship, differs from prior art embodiments of multiple ALUs, where complex multiplexing between register banks and multiple ALUs was required.
Generally, these data banks perform general purpose operations on data, thereby controlling program flow within the MCC, and can be used by any processing thread which is running on the MCC.
The address banks 22, the program counter banks 20, the IO banks 26 and the read/write units 18 are all constructed and operate in a similar manner, but are provided as separate units so that their implementation can be optimised to reflect the way in which they are used.
The address banks 22 store and manipulate addresses for data accesses into memory (not illustrated). They are slightly simpler than the data banks in that they use unsigned accumulators and do not generate any condition codes to send back to the control unit 16 via the status bus.
`
`
The program counter bank 20 is used to store the program counter for each processing thread supported by the media control core. Thus, the number of registers in the bank of the type shown in FIG. 5 will be equal to the number of processing threads which the MCC can support. As with the address banks, the ALU is used for program counter operations and is unsigned. It does not generate condition codes to send back to the control unit 16.
`
The IO banks 26 are used to interface to IO ports and contain no registers or ALUs. They interface with real time data streams supported by the MCC. A status signal indicates the availability of data at a port, or the ability of a port to take data. They can optionally include the ability to transform the data as it is transferred.
`The read/write units 18 interface to the cache bank 12.
`They have no registers or ALU’s. A read unit accepts an
`address and, when the data is returned, sets a data valid
`status bit. A write unit accepts addresses and data. Multiple
`read and write units are used to ensure that if one cache
`access blocks then another thread can be continued running
`through another read/write unit.
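The data-valid bit is what lets a waiting thread appear as "not runnable" rather than stalling the whole core. A minimal sketch, with hypothetical class and thread names and no modelling of the cache itself:

```python
from collections import deque


class ReadUnit:
    """Accepts an address; raises a data-valid status bit when the cache returns data."""

    def __init__(self):
        self.addr = None
        self.data = None
        self.valid = False

    def issue(self, addr):
        self.addr, self.data, self.valid = addr, None, False

    def complete(self, data):
        self.data, self.valid = data, True


class WriteUnit:
    """Accepts an address and data and queues them for the cache."""

    def __init__(self):
        self.queue = deque()

    def issue(self, addr, data):
        self.queue.append((addr, data))


read_units = {"video": ReadUnit(), "audio": ReadUnit()}  # hypothetical threads
read_units["video"].issue(0x8000)   # still waiting on the cache
read_units["audio"].issue(0x0100)
read_units["audio"].complete(42)    # this request has been satisfied
ready = [name for name, ru in read_units.items() if ru.valid]
print(ready)  # ['audio']

writer = WriteUnit()
writer.issue(0x0200, 7)
```

Giving each thread its own read unit keeps the video thread's outstanding request from occupying the unit the audio thread needs.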
An instruction buffer within the control unit (not illustrated) for each data processing thread stores that thread's next microinstruction and instruction operands. The instruction and operands include bits which describe the resources required to execute that instruction. These resource requirements are fed into the control unit's resource checking logic along with status bits describing the current status of the Media Control Core 2, the external IO ports 20 and the data processing units 6, 50. Simple combinatorial logic, such as an array of logic gates, determines whether an instruction can run or not, and a fixed priority selector in the control unit 16 then launches the highest priority runnable thread into the data path control pipeline (shown in FIG. 4) to start execution of that program thread. A thread's task could be 'receive video data', 'process stored audio data', etc.
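The combinatorial check and fixed priority selector can be expressed with bit masks, roughly as gates would implement them. The bit assignments below are hypothetical, and the convention that thread 0 has the highest priority is an assumption about how a fixed priority encoder might be wired.

```python
# Hypothetical status bits, one per resource, gathered from banks, IO ports and DPUs.
VIDEO_IN_READY = 1 << 0   # IO bank: data present at the video input port
AUDIO_OUT_FREE = 1 << 1   # IO bank: audio output port can accept data
DPU0_IDLE = 1 << 2        # data processing unit not busy
READ0_VALID = 1 << 3      # read unit 0 has returned its data


def runnable_mask(required, status):
    """Bit t is set if every resource bit required by thread t is present in status."""
    mask = 0
    for t, req in enumerate(required):
        if (req & ~status) == 0:   # no required resource is missing
            mask |= 1 << t
    return mask


def fixed_priority_select(mask):
    """Fixed priority encoder: the lowest-numbered runnable thread wins."""
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1  # index of the lowest set bit


required = [
    VIDEO_IN_READY,            # thread 0: receive video data (highest priority)
    DPU0_IDLE | READ0_VALID,   # thread 1: hand fetched data to a DPU
    AUDIO_OUT_FREE,            # thread 2: output audio
]
status = DPU0_IDLE | READ0_VALID | AUDIO_OUT_FREE  # no video data waiting
m = runnable_mask(required, status)
print(bin(m), fixed_priority_select(m))  # 0b110 1
```

In hardware the loop becomes one AND/NOR term per thread evaluated in parallel, so the whole decision fits in a single cycle.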
Normally an instruction will request its thread's next instruction to be read from memory when it is run. The instruction, which contains an instruction opcode and operands, is read from the memory location pointed to by the program counter. The opcode field of the instruction is used to index into the microcode ROM to retrieve the next instruction, and the resultant microinstruction is stored into the thread's instruction buffer together with the instruction operand fields.
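The fetch path into a thread's instruction buffer is sketched below. It assumes a 16-bit instruction word with a 4-bit opcode and a 12-bit operand field, and uses a dictionary to stand in for the microcode ROM; none of these formats are given in the text. Storing a resource field with each microinstruction is what makes it immediately available to the resource checker.

```python
from dataclasses import dataclass


@dataclass
class BufferedInstruction:
    micro: dict       # microinstruction fields, including its resource requirements
    operands: tuple   # operand fields carried over from the fetched instruction


def fetch_into_buffer(memory, pc, microcode_rom):
    """Read the word at pc, index the microcode ROM with its opcode, and return
    the entry for the thread's instruction buffer (16-bit word layout assumed:
    4-bit opcode in bits 15..12, 12-bit operand in bits 11..0)."""
    word = memory[pc]
    opcode = (word >> 12) & 0xF
    operand = word & 0xFFF
    return BufferedInstruction(microcode_rom[opcode], (operand,))


MICROCODE_ROM = {  # hypothetical contents
    0x1: {"unit": "read", "needs": {"read_unit_free"}},
    0x2: {"unit": "io_out", "needs": {"audio_port_free"}},
}
memory = [0x1040, 0x2003]  # hypothetical program for one thread
buffers = {}
buffers["thread0"] = fetch_into_buffer(memory, 0, MICROCODE_ROM)
print(buffers["thread0"].micro["needs"], buffers["thread0"].operands)  # {'read_unit_free'} (64,)
```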
The resource checking and priority selection are illustrated fully in FIG. 6. For the three threads illustrated, global status information is received from the necessary data banks and the necessary address banks, together with routing control data from the control unit, control status information from control unit 16, and execution dependency data from other processes on which a particular thread is dependent. All this information is sent to a resource checker 81, which combines it with data from the IO ports, the various pipeline data bank status, and the status of the various data processing units. This happens for each possible thread. If it is possible to run that data processing thread, then an output is generated to a priority selector 82. This has information about the priority of each of the data processing threads supported and, as a result, can select for execution the thread with the highest priority. For example, a real time data input such as video would be given a high priority, and this would take precedence over a background processing operation.
Because the next instruction for a thread is already provided in an instruction buffer, that instruction is always available for resource checking and priority selection. Thus, no execution time is lost by checking the status on every clock cycle.
The data path control pipeline shown in FIG. 4 operates by allowing fields of a microinstruction word to be placed into a pipeline at different depths. T