United States Patent [19]
Whittaker et al.

[11] Patent Number: 5,968,167
[45] Date of Patent: Oct. 19, 1999

US005968167A

[54] MULTI-THREADED DATA PROCESSING MANAGEMENT SYSTEM

[75] Inventors: James Robert Whittaker; Paul Rowland, both of Herts, United Kingdom

[73] Assignee: Videologic Limited, Hertfordshire, United Kingdom

[21] Appl. No.: 08/834,808
[22] Filed: Apr. 3, 1997

[51] Int. Cl.6: G06F 13/00
[52] U.S. Cl.: 712/225
[58] Field of Search: 710/131, 240, 710/1, 200, 100, 5; 712/225
`
Primary Examiner: David Y. Eng
Attorney, Agent, or Firm: Flynn, Thiel, Boutell & Tanis, P.C.
`
[57] ABSTRACT

A data processing management system for controlling the
execution of multiple threads of processing instructions such
as the instructions that are employed to process multimedia
data. The management system includes a media control core,
a number of data processing units and a multi-banked cache.
For the processing instruction for each thread, the multimedia
core identifies the data processing operation to be executed
as well as the resources needed to execute that operation.
The multimedia core then determines for each instruction if
all the resources are available to execute the operation. For
the operations for which all the resources are available, the
multimedia core then determines which operation has the
highest priority. The operation having the highest priority is
then passed to one of the data processing units for execution.
The data and addresses upon which the data processing units
act are temporarily stored in the multi-banked cache. Data
are written into the cache from multiple input ports. Data are
read from the cache out through multiple output ports.

30 Claims, 7 Drawing Sheets
`
[Front-page drawing: MEDIA CONTROL CORE 2 coupled to REAL TIME DATA inputs and outputs 4, DATA PROCESSING UNITS 6 (each a DATA PROCESSING CORE and a DATA PIPELINE 10) and a MULTI-BANK CACHE]
`
`SAMSUNG-1004
`
`Page 1 of 16
`
`
`

`

U.S. Patent    Oct. 19, 1999    Sheet 1 of 7    5,968,167

[Sheet 1 of 7: FIG. 1, block diagram of an embodiment of the invention]
`
`
`
`

`

[Sheet 2 of 7: FIG. 2, block diagram of the Media Control Core of FIG. 1]
`
`
`

`

[Sheet 3 of 7: FIG. 3, block diagram of a second embodiment of the invention]
`
`
`
`
`
`
`
`
`
`
`

`

[Sheet 4 of 7: FIG. 4, block diagram of the control unit instruction pipeline of the Media Control Core]
`
`

`

[Sheet 5 of 7: FIG. 5, internal architecture of a data bank. Labels: FROM MCC DATA BUS; CONTROL BITS FROM MICROCODE (WE, W, R1, R2); REGISTER FILE; TO MCC DATA BUS; STATUS BITS TO MCC STATUS BUS (H, S, Z)]
`
`

`

[Sheet 6 of 7: FIG. 6, resource checking and thread selection. Labels: RESOURCE CHECK 81; DATA PROCESSOR STATUS; PIPELINE / BANK STATUS; IO PORT STATUS; EXECUTION DEPENDENCE; CORE CONTROL; ROUTING CONTROL; DATA BANK 1 CTRL; DATA BANK 2 CTRL; ADDR BANK 1 CTRL; ADDR BANK 2 CTRL; INSTRUCTION BUFFERS (THREAD 1 ... THREAD n); 82]
`
`

`

[Sheet 7 of 7: FIG. 7, block diagram showing how access is made to the banked cache memory of FIG. 1]
`
`
`
`
`
`

`

MULTI-THREADED DATA PROCESSING MANAGEMENT SYSTEM

This invention relates to a data processing management
system of the type which can be used with real time
multimedia inputs and processing.

BACKGROUND TO THE INVENTION

The user interface to computers has continually evolved
from teletypes to keyboard and character terminals to the
graphical user interface (GUI) which is currently the standard
interface for the majority of computer users. This evolution
is continuing with sound and 3D graphics increasingly
common and 3D sound and virtual reality emerging. Its
common thread is an increase in the complexity of the
human computer interface achieved by an accompanying
increase in the types of data presented to the user. Personal
computer (PC) applications are taking advantage of this shift
and are increasingly relying on the availability of sound and
3D graphics in order to achieve their full potential.
This has resulted in chip and board suppliers offering
products with combined functionality designed to handle
more than one data type, e.g. 2D graphics and sound or 2D
and MPEG (Motion Picture Experts Group) playback. It is
important to note that these products to date use separate
functional units for each data type.
More recently, programmable SIMD (Single Instruction
Multiple Data) architectures (e.g. Chromatics MPACT) have
emerged. These architectures use identical processing
elements executing the same instruction to perform the same
processing on a number of blocks of data in parallel. This
approach works well for data which can be easily partitioned
to allow a common function to be performed, e.g. block
processing in data compression such as MPEG, but is not
flexible enough to execute a complete general algorithm,
which often requires conditional flow control within the data
processing.
DSP (digital signal processor) vendors have also sought to
address this market with MIMD (Multiple Instruction
Multiple Data) devices (e.g. Texas Instruments' TI320C80)
which offer the required flexibility to process the varied data
types. However, since the architecture replicates general
purpose DSP cores which retain a far greater degree of
flexibility than required for the application, the resulting
chip is a high cost device, too high for general PC and
consumer use.
CPU (central processing unit) vendors promoting fast
RISC CPUs for both general purpose programs and
multimedia processing are unable (and do not wish) to
compromise their architecture in order to support more than
a few multimedia specific instructions and therefore do not
achieve the required performance levels at a reasonable cost.
As the CPU is also typically being used to run a
non-real-time operating system, it is also unable to provide
low latency processing.
Dedicated multimedia CPUs (e.g. Philips' Trimedia)
using VLIW (very long instruction word) instructions
controlling multiple processing units are unable to make
efficient use of their processing power because each
instruction is dedicated to a single task (and data type) and is
therefore unable to make optimal use of all the processing
units available. For example, a VLIW instruction dedicated
to a 3D graphics operation is unable to take advantage of
hardware designed for MPEG motion estimation. The
number of processing units, and therefore scalability, is also
limited by the VLIW word length.

SUMMARY OF THE INVENTION

`Preferred embodiments of the present invention address
`the requirement for a device which processes all multimedia
`data types in a manner that minimises system costs and
`provides for future developments in multimedia and the
`related industry standards. They provide an architecture
`which is scalable in processing power, real-time I/O support
`and in the number of concurrent activities which can be
`undertaken.
`
`All multimedia data types may be viewed as streams of
`data which lend themselves to a vector processing approach.
`Some of these streams will be real time (e.g. from an audio
`or video input) and as such either require dedicated buffering
`or low latency processing to avoid data loss. Each data
`stream also requires some hardware resource so that it may
`be processed.
`A preferred embodiment of the invention includes a low
latency real-time processing core responsible for data IO and
`task scheduling only. This avoids the need for unnecessary
`and costly buffering. It also includes a method of dynamic
`resource checking to ensure that only tasks with the required
`resources available are run.
`
`The balance between host processing power, memory
`costs and silicon costs is also continually changing. This
`means that the optimal division of work between a host
`processor and multimedia coprocessor also changes over
`time. This device is programmable to allow the division of
`work to be altered as required.
Scalability of parallel processing devices is a problem
`for both hardware design and supporting software. As more
`processing units are added to a device the distribution of
`tasks between the processing units becomes more difficult
`resulting in either a diminishing return or an exponential
`growth in the number of inter-connects between functional
`units. Such changes also typically result in alterations to the
`programming model for the device requiring wholesale
`changes to the supporting software. Preferred embodiments
`of the invention address these issues by a consistent scalable
`architecture, where all the elements may be scaled without
`creating an explosion of inter-connects between functional
`units and without changing the programming model pre-
`sented to software interfacing to the device.
`FIG. 1 shows the base architecture of the device.
`
`The device has been conceived as a re-configurable
`engine able to match all the current and future algorithms
`required to process multimedia data. The work done by it is
`split into two categories. Both real time scheduling and IO
`processing are performed by a Media Control Core whilst
`computationally intensive data processing is performed by
`one or more additional data processing units.
This division of work is one of the architecture's
fundamental characteristics.
`
`Data processing consists of a number of steps:
`Parameter fetching and setup
`Data fetching and processing
`Data storage
In order to efficiently achieve high data processing
throughput a processor needs to perform the above operations
on a reasonably large set of data. If the data set is too
small the processor spends too high a proportion of its
power on context switching between tasks and the resulting
need to save and restore a thread's state.
Because the Media Control Core is required only to
service requests to move data between IO ports and memory
(to allow data processing to be performed) it can context
switch every clock cycle. This removes the need for
large data buffers to support real time IO. Data processing
units are able to process data efficiently by performing a key
part of an algorithm on data without interruption.
`These processing elements are supported by a scalable
`multibank cache which supports efficient data movement
`and processing by caching sets of data required for the active
`algorithms being run.
`The invention is defined in its various aspects with more
`precision in the appended claims to which reference should
`now be made.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
A preferred embodiment of the invention will now be
described in detail, by way of example, with reference to
the figures in which:
FIG. 1 shows a block diagram of an embodiment of the
invention;
FIG. 2 shows a block diagram of the Media Control Core
of FIG. 1;
FIG. 3 is a block diagram of a second embodiment of the
invention;
FIG. 4 is a block diagram of the control unit instruction
pipeline of the Media Control Core;
FIG. 5 is a block diagram of the internal architecture of
one of the data banks of FIG. 4;
FIG. 6 shows in block form how resource checking and
thus process selection is performed by the Media Control
Core; and
FIG. 7 is a block diagram showing how access is made to
the banked cache memory of FIG. 1.

`DETAILED DESCRIPTION
`
`The base architecture of the embodiment of the invention
`is shown in FIG. 1. The centre of the system is a media
`control core (MCC) 2. This is a fine grained multithreading
`processor. This has a plurality of inputs and outputs which
`can be coupled to real time data input and output devices 4.
`These can be, for example, video sources, audio sources,
`video outputs, audio outputs, data sources, storage devices
`etc. In a simple example only one input and one output
`would be provided.
`Also coupled to the media control core 2 are a plurality of
`data processing units 6. Each of these comprises a data
`processing core 8 which controls the processing of data via
data pipeline 10. The core 8 decodes and sequences
`microinstructions for the pipeline 10.
`Also coupled to the media control core 2 is a multibanked
`cache memory 12 from which data may be retrieved by the
`media control core 2 and data processing units 6 and into
which data may be written by the media control core and the
data processing units 6.
`The media control core is a fine grained multithreading
`processing unit which directs data from inputs to data
`processing cores or to storage and provides data to outputs.
`It is arranged so that it can switch tasks on every clock cycle.
This is achieved by checking, on every clock cycle, which of
the possible operations it could perform have all the
resources available for those tasks to be executed and, of
those, which has the highest priority. It could be arranged to
commence operation of more than one operation on each
clock cycle if sufficient processing power were provided.
This resource checking ensures that everything required
to perform a particular task is in place. This includes external
resources such as whether or not data is available at an input
port (e.g. video data) or whether a data storage device or
output is available. It also includes internal resources such as
data banks for temporary storage, available processing cores
which are not currently working on other data or previously
processed data required for a particular new processing
operation. The media control core operates to direct data
from an input to an appropriate data processing unit 6 for
processing to take place and routes data to an output when
required, making use of the cache as necessary. Once
execution of a set of instructions has commenced on a data
processing unit 6, the MCC can look again at the various
threads it can run and the resources available for these whilst
the program continues to run on the data processing unit.
The resource and priority checking of the media control
core means that tasks which service real time data such as
video input are able to be performed without the large
memory buffers which are usually required in current real
time inputs. In an operation such as video input the media
control core will look to see whether data is available at the
IO port and, if it is, will receive that data and send it either
to a portion of the multibanked cache or to data storage
registers in preparation for processing by one of the data
processing units 6.
`The data processing units 6 are all under the control and
`scheduling of the media control core 2. In the example
`shown in FIG. 1 the units consist of a processing pipeline
`(data pipeline 10) which will be made up of a number of
`processing elements such as multipliers, adders, shifters etc
`under the control of an associated data processing core 8
`which runs a sequence of instructions to perform a data
processing algorithm. Each of these data processing cores
will have its own microinstruction ROM and/or RAM
storing sequences of instructions to perform a particular data
process. The media control core invokes the data process-
`ing unit 6 to perform its particular operation sequence by, for
`example, passing an address offset into the unit’s microin-
`struction ROM and instructing the data processing unit to
`commence execution. The data processing unit 6 will then
`perform a particular process on either data from the multi—
`banked cache or data passed to it from one of the inputs to
`the media control core until completed when it will signal to
`the media control core that its processing is complete.
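The invocation described above can be sketched in Python as a toy model. This is only an illustration under stated assumptions: the class name `DataProcessingUnit`, the use of `None` as an end-of-sequence marker in the ROM, and the `completions` counter standing in for the completion signal are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A micro-operation mutates a list of samples in place.
# None marks the end of a sequence (a hypothetical convention).
MicroOp = Optional[Callable[[List[int]], None]]


@dataclass
class DataProcessingUnit:
    """Toy data processing unit with its own microinstruction ROM."""
    rom: List[MicroOp]
    busy: bool = False
    completions: int = 0  # stands in for the "processing complete" signal

    def start(self, offset: int, data: List[int]) -> None:
        """Invoked by the control core with an address offset into the ROM."""
        self.busy = True
        pc = offset
        while self.rom[pc] is not None:  # run the sequence to its end
            self.rom[pc](data)
            pc += 1
        self.busy = False
        self.completions += 1  # signal completion back to the control core


def double(d: List[int]) -> None:
    d[:] = [2 * x for x in d]


def add_one(d: List[int]) -> None:
    d[:] = [x + 1 for x in d]


# Two sequences share one ROM: offset 0 doubles, offset 2 adds one then doubles.
rom: List[MicroOp] = [double, None, add_one, double, None]
unit = DataProcessingUnit(rom)
samples = [1, 2, 3]
unit.start(2, samples)
print(samples, unit.completions)
```

In real hardware the control core would keep scheduling other threads while the unit runs; the synchronous loop here only shows the offset-based entry point and the completion signal.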
`The multibanked cache 12 of FIG. 1 is used for memory
`accesses and these are all cached through this bank. The
`cache is divided into a plurality of banks 14 each of which
`can be programmed to match the requirements of one of the
`data processing tasks being undertaken. For example, a
`cache bank might be dedicated to caching texture maps from
`main memory for use in 3D graphics rendering. Using this
programmability of the cache banks allows the best possible
`use of on chip memory to be made and allows dynamic
`cache allocation to be performed thereby achieving the best
`performance under any particular conditions.
`Furthermore, the use of multiple cache banks allows the
`cache to be non—blocking. That is to say, if one of the cache
`banks is dealing with a request which it is currently unable
to satisfy, such as a read instruction where that data is not
`currently available, then another processing thread which
`uses a separate cache bank may be run.
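A minimal Python sketch of this non-blocking behaviour follows. The `Thread` record, the `bank_stalled` list and the rule that a larger `priority` value is more urgent are assumptions made for illustration; the patent does not specify such a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    name: str
    bank: int      # index of the cache bank this thread reads from
    priority: int  # larger value = more urgent (an assumption)


def pick_runnable(threads: List[Thread],
                  bank_stalled: List[bool]) -> Optional[Thread]:
    """Return the most urgent thread whose cache bank is not stalled."""
    ready = [t for t in threads if not bank_stalled[t.bank]]
    return max(ready, key=lambda t: t.priority) if ready else None


threads = [
    Thread("texture_fetch", bank=0, priority=2),
    Thread("audio_mix", bank=1, priority=1),
]
bank_stalled = [True, False]  # bank 0 is waiting for data that is not yet available
chosen = pick_runnable(threads, bank_stalled)
print(chosen.name if chosen else "no thread can run")
```

Although `texture_fetch` is the more urgent thread, its bank is stalled, so the thread that uses the other bank runs instead of the whole device waiting.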
The entire device as shown in FIG. 1 is scalable and may
be constructed on a single piece of silicon as an integrated
chip. The media control core 2 is scalable in a manner which
will be described below with reference to FIG. 2. As the size
of the media control core is increased it is able to support
further data processing units 6 whilst using the same
programming model for the media control. More cache banks
may also be added to support the further data processing
units thereby increasing the effectiveness of the data
throughput to the media control core and the data processing
units. Because the programming model of the device is not
changed this enables a high degree of backwards
compatibility to be attained.
The media control core is shown in more detail with
reference to FIG. 2. It is composed of a control unit 16, a set
of read/write units 18, a set of program counter banks 20, a
set of address banks 22, a set of data banks 24, and a set of
input/output banks 26. These banks are all coupled together
by a media control core status bus 28, a media control core
control bus 29 and a media control core data interconnect
bus 30. The media control core data interconnect bus 30 is
used for sending data between the various different banks
and the status bus provides data such as the input/output port
status and the status of data processing units to which the
media control core can send instructions and data.
In addition, a memory block 32 storing microcode
instructions in ROM and RAM is coupled to the control unit
16 and the units 18 to 26 listed above.
`
`All the core components, 18 to 26, with the exception of
`the control unit 16, have the same basic interface model
`which allows data to be read from them, written to them and
`operations performed between data stored in them. Each
`bank consists of a closely coupled local storage register file
with a processing unit or arithmetic logic unit (ALU).
The control unit 16 is used to control the execution of the
media control core. On each clock cycle, control unit 16
checks the availability of all resources (e.g. input/output port
status, data processing units status, etc.) using status
information provided over the media control status bus 28
against the resources required to run each program under its
control. It then starts execution of the instruction for the
highest priority program thread which has all its resources
available.
The program counter bank 20 is used to store program
counters for each processing thread which is supported by
the media control core. It consists of a register for each of
the processing threads which the media control core is
capable of supporting and an ALU which performs all
operations upon the program counters for program
progression, looping, branching, etc. The data banks 24 are
used for general purpose operations on data to control
program flow within the media control core. They are a
general resource which can be used as required by any
processing thread which is running on the MCC.
`The address banks 22 are used to store and manipulate
`addresses for both instructions and data and are also a
`general MCC resource in a similar manner to the data banks
`24.
The input/output banks 26 provide an interface between
the media control core and real time data streams for
input/output which are supported by the MCC. Their status
indicates the availability of data at a port, e.g. video input, or
the ability of a port to take the data for output. They can, as
an option, include the ability to transform data as it is
transferred in or out, for example bit stuffing of a data
stream.
`
`
`
`The read/write banks 18 provide an interface between the
`media control core and memory (via the multibank cache).
`As more than one processing thread can be run at any one
`time more than one read/write unit is required to avoid the
`blocking of memory requests.
The media control core is scalable in all important
respects. Because it is constructed from banks which localise
storage (register files) and processing (ALU), additional
`banks can be added without creating any unmanageable
`routing and interconnection problems. The number of pro-
`cessing threads which could be supported can be increased
`by adding registers to the program counter bank and modi-
`fying the control unit accordingly. The number of input/
`output streams which can be supported by the MCC can be
`increased by adding further IO banks.
`The data throughput can be increased by adding further
`read/write units 18 and the MCC processing power overall
`can be increased by adding further data and address banks,
`24 and 22, respectively.
`A block diagram of a specific implementation of the data
`processing management system is shown in FIG. 3. The
MCC in this serves a plurality of real time data input/
output ports and controls data processing units to process
data received from them and output to them.
In the figure is shown a video input 34 and audio input 36
coupled to the media control core via associated
preprocessors 38 and 40. A corresponding video output 42 and
audio output 44 are coupled to the media control core 2 via
respective post processors 46 and 48. The video and audio
inputs and outputs may be digital inputs and outputs.
As in FIG. 1 the media control core 2 is coupled to a
multibanked cache 12, in this case referred to as the main
cache bank. A data processing unit 6 comprising a secondary
core 8 and a data (media) pipeline 10 is coupled directly to
the media control core and is used for processing of data
supplied to it.
Also coupled to the media core 2 is a processing unit 50
comprising a digital to analog converter feed core (DAC
feed core) 52 and a DAC feed pipeline 54 which supplies
data to a digital to analog converter 56. The purpose of this
is to provide a graphics output. To this end, the processing
unit 50 fetches data via the frame buffer interface 58 and
system bus 60. The host computer video graphics adaptor
(VGA) 62 is retained for compatibility only. Thus, real time
data is supplied on the video and audio inputs and can be
sent out on the video and audio outputs whilst graphics
output can be sent by the DAC 56.
Data for graphics output can be generated by processing
non-real time data from a source such as a graphics frame
buffer, a connection to which is shown in FIG. 3 via the
frame buffer interface 58, 3D data, or real time video.
`The secondary data processing core 8 and media pipeline
`10 is an example of a data processing unit which is able to
`process audio, 3D, 2D, video scaling, video decoding etc.
`This could be formed from any type of general processor.
`The DAC feed core and DAC feed pipeline is dedicated
`to processing data from a number of frame buffers for the
`generation of RGB data for a DAC. It can switch between
`source buffers on a pixel by pixel basis, thus converting data
`taken from a number of video formats including YUV and
`combining source data from multiple frame buffers by
`blending or by colour or chroma keying.
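The per-pixel operations named above (conversion from YUV, keying and blending) can be illustrated with a short Python sketch. It is not the patent's implementation: the function names are hypothetical, pixels are assumed to be 8-bit tuples, and the YUV conversion assumes the full-range BT.601 (JFIF) coefficients, which the patent does not specify.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), 8 bits per channel (an assumption)


def clamp8(x: float) -> int:
    """Round and clamp a value to the 0..255 range."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y: int, u: int, v: int) -> Pixel:
    """Full-range BT.601 (JFIF) conversion; the patent names no coefficients."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return (clamp8(r), clamp8(g), clamp8(b))


def chroma_key(fg: List[Pixel], bg: List[Pixel], key: Pixel) -> List[Pixel]:
    """Switch source per pixel: show bg wherever fg equals the key colour."""
    return [b if f == key else f for f, b in zip(fg, bg)]


def blend(a: List[Pixel], b: List[Pixel], alpha: float) -> List[Pixel]:
    """Mix two sources per pixel; alpha is the weight given to source a."""
    return [
        (clamp8(alpha * pa[0] + (1 - alpha) * pb[0]),
         clamp8(alpha * pa[1] + (1 - alpha) * pb[1]),
         clamp8(alpha * pa[2] + (1 - alpha) * pb[2]))
        for pa, pb in zip(a, b)
    ]


KEY: Pixel = (255, 0, 255)
graphics = [(10, 20, 30), KEY]            # second pixel is "transparent"
video = [yuv_to_rgb(128, 128, 128)] * 2   # mid-grey video frame
scanline = chroma_key(graphics, video, KEY)
mixed = blend([(0, 0, 0)], [(200, 100, 50)], 0.5)
print(scanline, mixed)
```

Each output pixel is chosen or computed independently, which is what allows the source buffer to change on a pixel by pixel basis.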
Each core will have an associated microcode store formed
from ROM and RAM which for the purposes of clarity are
not shown here, but which stores instructions to be executed
by the processor. The cache banks 12 interface to the media
control core and the data processing units 6 and 50. They
also interface to the system bus via an address translation
unit 64. They are also linked to the frame buffer interface 58
for writing data to and reading data from one or more frame
buffers.
`
`
`
`
`
`
`
A data bank 24 is illustrated in FIG. 5. It comprises a
register file 72, an ALU 74, and a multiplexed input 76. The
operation of the data bank is controlled by a number of bits
in a microinstruction which are labelled WE, W, R1, and R2
and which are input to the register file. The result of the
microinstruction which is performed by the ALU is made
available as status bits H S Z which are routed to the control
unit of the media control core to implement branches and
conditional instructions. The register file is constructed to
allow two operands to be fetched from the input and one
operand to be written to the output on each clock cycle. The
data input port 78 and the data output port 80 allow
communication with other data via the media control core
data bus 30 to which they are connected. Thus, the data flow
in FIG. 5 is vertically down through the diagram whilst the
flow of control information is from left to right, being formed
of control bits from the control unit and status bits sent back
to the control unit reflecting the status of the data bank.
A plurality of these data banks are used and each is in the
same form, that is to say each has its own register file
closely coupled to an ALU as shown in FIG. 5. This
arrangement, using a plurality of closely coupled registers
and ALU's, preferably in a one to one relationship, differs
from prior art embodiments of multiple ALU's where complex
multiplexing between register banks and multiple
ALU's was required.
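One way to picture a single data bank is the Python sketch below. Several details are assumptions rather than facts from the patent: a 32-bit word, eight registers, WE read as a write enable, W, R1 and R2 read as register select fields, and the multiplexed input modelled as a choice between the ALU result and a value arriving from the data bus. Only the S (sign) and Z (zero) status bits are modelled because the text does not say what H denotes.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Optional

WORD_BITS = 32                   # register width: an assumption
WORD_MASK = (1 << WORD_BITS) - 1


@dataclass
class DataBank:
    """Register file closely coupled to one ALU (toy model)."""
    regs: List[int] = field(default_factory=lambda: [0] * 8)  # size assumed
    sign: bool = False  # S status bit
    zero: bool = False  # Z status bit

    def step(self, op: Callable[[int, int], int], r1: int, r2: int,
             w: int, we: bool, bus_in: Optional[int] = None) -> int:
        """One clock cycle: read two operands, run the ALU, write one result."""
        result = op(self.regs[r1], self.regs[r2]) & WORD_MASK
        self.zero = result == 0
        self.sign = bool(result >> (WORD_BITS - 1))
        if we:
            # Multiplexed input: take the bus value if present, else the ALU result.
            self.regs[w] = result if bus_in is None else bus_in & WORD_MASK
        return result  # value presented on the data output port


bank = DataBank()
bank.step(operator.add, r1=0, r2=0, w=1, we=True, bus_in=5)  # load 5 into r1
diff = bank.step(operator.sub, r1=0, r2=1, w=2, we=True)     # 0 - 5 wraps around
print(hex(diff), bank.sign, bank.zero)
```

Because each bank owns its registers and its ALU, adding a bank adds no cross-connections between register files and other ALUs, which is the scaling property the text contrasts with the prior art.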
Generally, these data banks perform general purpose
operations on data thereby controlling program flow within
the MCC and can be used by any processing thread which
is running on the MCC.
The address banks 22, the program counter banks 20, and
the IO banks 26, and the read/write units 18 are all
constructed and operate in a similar manner but are provided in
`separate units to allow their implementation to be optimised,
`thereby reflecting the way in which they are used.
The address banks 22 store and manipulate addresses for
data accesses into memory (not illustrated). They are slightly
simpler than the data banks in that they use unsigned
accumulators and do not generate any condition codes to
send back to the control unit 16 via the status bus.
The program counter bank 20 is used to store the program
counter for each processing thread supported by the media
control core. Thus, the number of registers in the bank of the
type shown in FIG. 5 will be equivalent to the number of
processing threads which the MCC can support. As with the
address banks the ALU is used for program counter
operations and is unsigned. It does not generate condition codes
to send back to the control unit 16.
`
The IO banks 26 are used to interface to IO ports, and
contain no registers or ALU's. They interface with real time
data streams supported by the MCC. A status signal indicates
the availability of data at a port, or the ability of a port to take
data. They can optionally include the ability to transform the
data as it is transferred.
`The read/write units 18 interface to the cache bank 12.
`They have no registers or ALU’s. A read unit accepts an
`address and, when the data is returned, sets a data valid
status bit. A write unit accepts addresses and data. Multiple
read and write units are used to ensure that if one cache
access blocks then another thread can continue running
through another read/write unit.
An instruction buffer within the control unit (not illustrated)
for each data processing thread stores that thread's next
microinstruction and instruction operands. The instruction
and operands include bits which describe the resources
required to execute that instruction. These resource
requirements are fed into the control unit's resource checking logic
along with status bits describing the current status of the
Media Control Core 2, external IO ports 20 and data
processing units 6, 50. Simple combinatorial logic such as an
array of logic gates determines whether an instruction can
run or not and a fixed priority selector in the control unit 16
then launches the highest priority runnable thread into the
data path control pipeline (shown in FIG. 4) to start
execution of that program thread. The thread's task could be
'receive video data', 'process stored audio data', etc.
`Normally an instruction will request its thread’s next
`instruction to be read from memory when it is run. The
`instruction, which contains an instruction opcode and operands,
`is read from the memory location pointed to by the program counter.
`The opcode field of the instruction is used to index into the
`microcode ROM to retrieve the next instruction and the
`resultant microinstruction is stored into the thread’s instruction
`buffer together with the instruction operand fields.
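The fetch step just described can be sketched as follows. The instruction word layout (an 8-bit opcode in the top byte followed by three 8-bit operand fields), the list standing in for the microcode ROM, and the names `decode`, `fetch_next` and `BufferEntry` are assumptions made for illustration; the source does not give field widths.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed layout: 8-bit opcode in bits 31..24, three 8-bit operands below it.
OPCODE_SHIFT = 24
FIELD_MASK = 0xFF


@dataclass
class BufferEntry:
    microinstruction: int
    operands: Tuple[int, int, int]


def decode(word: int) -> Tuple[int, Tuple[int, int, int]]:
    """Split an instruction word into its opcode and operand fields."""
    opcode = (word >> OPCODE_SHIFT) & FIELD_MASK
    operands = ((word >> 16) & FIELD_MASK, (word >> 8) & FIELD_MASK, word & FIELD_MASK)
    return opcode, operands


def fetch_next(memory: List[int], program_counter: int, microcode_rom: List[int],
               instruction_buffers: Dict[int, BufferEntry], thread_id: int) -> None:
    """Read the word at the program counter, index the ROM by opcode,
    and store the microinstruction plus operands in the thread's buffer."""
    opcode, operands = decode(memory[program_counter])
    instruction_buffers[thread_id] = BufferEntry(microcode_rom[opcode], operands)


memory = [0x02010203]             # opcode 0x02, operands 1, 2, 3
microcode_rom = [0xA0, 0xA1, 0xA2]
buffers: Dict[int, BufferEntry] = {}
fetch_next(memory, 0, microcode_rom, buffers, thread_id=1)
```

After the call, the buffer for thread 1 holds the microinstruction selected by opcode 0x02 together with the operand fields, ready for resource checking.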
`The resource checking and priority selection are illustrated fully in
`FIG. 6. For the three threads illustrated, global status information
`is received from the necessary data banks, the
`necessary address banks, routing control data from the
`control unit, control status information from control unit 16,
`and execution dependency data from other processes on
`which a particular thread is dependent. All this information
`is sent to a resource checker 81 which combines it with data
`from IO ports, the various pipeline data bank status, and the
`status of the various data processing units. This happens for
`each possible thread. If it is possible to run that data
`processing thread then an output is generated to a priority
`selector 82. This has information about the priority of each
`of the data processing threads supported and, as a result, can
`select for execution the thread with highest priority. For
`example, a real time data input such as video would be given
`a high priority and this would take precedence over a
`background processing operation.
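The selection stage of FIG. 6 can be approximated in software as a filter followed by a maximum. In this sketch the `ThreadStatus` record, its field names, and the convention that a larger number means higher priority are all assumptions; the source states only that the selector knows each thread's priority and picks the highest priority runnable thread.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ThreadStatus:
    name: str
    priority: int           # assumption: larger value means higher priority
    resources_ready: bool   # result of the resource checker for this thread
    dependencies_met: bool  # execution dependencies on other threads satisfied


def priority_select(threads: Sequence[ThreadStatus]) -> Optional[ThreadStatus]:
    """Return the highest priority runnable thread, or None if none can run."""
    runnable = [t for t in threads if t.resources_ready and t.dependencies_met]
    if not runnable:
        return None
    return max(runnable, key=lambda t: t.priority)


video = ThreadStatus("video input", priority=3, resources_ready=True, dependencies_met=True)
audio = ThreadStatus("stored audio", priority=2, resources_ready=False, dependencies_met=True)
background = ThreadStatus("background", priority=1, resources_ready=True, dependencies_met=True)
chosen = priority_select([background, audio, video])
```

With these inputs the real time video thread is chosen over the background thread, matching the example in the text; if the video thread's resources were unavailable, the background thread would run instead.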
`Because the next instruction for a thread is already
`provided in an instruction buffer, that instruction is always
`available for resource checking and priority selection. Thus,
`there is no loss of execution time by checking the status on
`every clock cycle.
`The data path control pipeline shown in FIG. 4 operates
`by allowing fields of a microinstruction word to be placed
`into a pipeline at different depths. T
