Case 3:14-cv-00757-REP-DJN Document 81-1 Filed 04/10/15 Page 1 of 33 PageID# 10453
`
`
`
`
`EXHIBIT A
`
`
US005860158A

United States Patent [19]
Pai et al.

[11] Patent Number: 5,860,158
[45] Date of Patent: Jan. 12, 1999

[54] CACHE CONTROL UNIT WITH A CACHE REQUEST TRANSACTION-ORIENTED PROTOCOL

[75] Inventors: [names illegible in scan] ... Nguyen [remainder illegible]

[73] Assignee: Samsung Electronics Company, Ltd., Seoul, Rep. of Korea

[21] Appl. No.: 751,149

[22] Filed: Nov. 15, 1996

[51] Int. Cl.6: G06F 13/00
[52] U.S. Cl.: 711/118; 711/130
[58] Field of Search: 711/119, 120, 130, 140, 118

[56] References Cited

U.S. PATENT DOCUMENTS

4,695,943    9/1987   Keeley et al.      711/140
4,701,844   10/1987   Thompson et al.    711/119
4,707,734   11/1987   Ryan et al.        711/140
4,890,275    2/1990   Sachs et al.       711/3
5,377,345   12/1994   Chang et al.       395/425
5,418,973    5/1995   Ellis et al.       395/800
5,524,265    6/1996   Balmer et al.      [class illegible]
5,574,849   11/1996   Sonnier et al.     395/182.1
5,659,782    8/1997   Senter et al.      [class illegible]

Primary Examiner: Tod R. Swann
Assistant Examiner: Felix B. Lee
Attorney, Agent, or Firm: Skjerven, Morrill, MacPherson,
Franklin & Friel, L.L.P.; Stephen A. Terrile
[57] ABSTRACT

A cache control unit and a method of controlling a cache.
The cache is coupled to a cache accessing device. A first
cache request is received from the device. A request
identification information is assigned to the first cache request
and provided to the requesting device. The first cache
request may begin to be processed. A second cache request
is received from the cache accessing device. The second
cache request is assigned to the first cache request and
provided to the requesting device. The first and second cache
requests are finally fully serviced.

37 Claims, 8 Drawing Sheets

[Representative drawing: ARM_CCU INTERFACE STATE MACHINE (A_SM)]
`
`
`
`
`
[Sheet 1 of 8. FIG. 1: block diagram of the multimedia signal processor. Legible labels: VECTOR PROCESSOR; GENERAL PURPOSE PROCESSOR; CACHE SYSTEM 130; ROM; CACHE CONTROL UNIT; SYSTEM TIMER (182); UART; BITSTREAM PROCESSOR; INTERRUPT CONTROLLER; DEVICE INTERFACE; LOCAL BUS INTERFACE; MEMORY.]
[Sheet 2 of 8. FIG. 2: block diagram of cache system 130. Legible labels: SRAM 210; TAG 212; ROM 150; GPP ICACHE; ROM CACHE; VECTOR ICACHE; VECTOR DCACHE; TAG (several); CACHE CONTROL UNIT; ADDRESS PIPELINE; DATA PIPELINE; GPP; VECTOR (INSTR); VECTOR (DATA); IOBUS.]
[Sheet 3 of 8. FIG. 3: block diagram of a data pipeline. Legible labels: SRAM; GP READ MUX; FBUS MUX; CACHE READ MUX (fragmented in scan; numeral 340 nearby); GENERAL PURPOSE PROCESSOR; VECTOR PROCESSOR.]
[Sheet 4 of 8. FIG. 4: block diagram of a data pipeline. Legible labels: IOBUS; FBUS 190; CCU_DOUT; 421. Other signal names are illegible in the scan.]
[Sheet 5 of 8. FIG. 5: block diagram of an address pipeline. Legible labels: CACHE_RAM; CACHE_ROM; WR_TAG; RD_TAG; MCP_BASE; ADR_MUXLAT; WR_ADR_Q; RD_ADR_Q; RETURN_ID; RETURN_ADR; ADR_Q0; ADR_Q2; ADR_Q3; WB_LAT_TMP; WB_LAT; MEM ADR LAT; VECTOR PROCESSOR; GPP; FBUS 190; IOBUS 180.]
[Sheet 6 of 8. FIGS. 6 and 7: state diagrams. Legible titles: ARM_CCU INTERFACE STATE MACHINE (A_SM); CCU_FBUS INTERFACE STATE MACHINE (F_SM). State names and figure labels are not legible in the scan.]
[Sheet 7 of 8. FIG. 8: state diagram. Legible title: DATA RECEIVER STATE MACHINE (D_SM).]
[Sheet 8 of 8. FIG. 9: state diagrams. Legible titles: READ STATE MACHINE (RD_SM); WRITE STATE MACHINE (WR_SM).]
CACHE CONTROL UNIT WITH A CACHE REQUEST TRANSACTION-ORIENTED PROTOCOL
`
`COPYRIGHT NOTICE
`
`
A portion of the disclosure of this patent document contains
material which is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
`
`
`BACKGROUND OF THE INVENTION
`
`
`
`1. Field of the Invention
`
`
`
`
`
This invention relates to providing processors with fast memory
access and, more particularly, to providing control of cache memory
systems.
`
`
`
`
`2. Description of the Related Art
`
`
`
`
`
`
Processors often employ memories which are relatively slow when
compared to the clock speeds of the processors. To speed up memory
access for such processors, a relatively small amount of fast memory
can be used in a data cache. A cache can mediate memory accesses and
lessen the average memory access time for all or a large portion of
the address space of a processor even though the cache is small
relative to the address space. Caches do not occupy a specific
portion of the address space of the processor but instead include
tag information which identifies addresses for information in lines
of the cache.
`
`
`
`
`
`
`
Typically, a cache compares an address received from a processor to
tag information stored in the cache to determine whether the cache
contains a valid entry for the memory address being accessed. If
such a cache entry exists (i.e. if there is a cache hit), the
processor accesses (reads from or writes to) the faster cache memory
instead of the slower memory. In addition to tag information, a
cache entry typically contains a “validity” bit and a “dirty” bit
which respectively indicate whether the associated information in
the entry is valid and whether the associated information contains
changes to be written back to the slower memory. If there is no
cache entry for the address being accessed (i.e. there is a cache
miss), access to the slower memory is required for the cache to
create a new entry for the just accessed memory address.
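The tag comparison and the roles of the validity and dirty bits described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization, a 64-byte line, and 256 lines; none of these parameters come from the patent, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

LINE_BYTES = 64   # assumed line size, not taken from the patent
NUM_LINES = 256   # assumed number of cache lines, not taken from the patent


@dataclass
class CacheLine:
    valid: bool = False   # entry holds usable information
    dirty: bool = False   # entry has changes not yet written back
    tag: int = 0          # identifies which address the line holds


class DirectMappedCache:
    """Hypothetical direct-mapped cache that tracks only hit/miss state."""

    def __init__(self):
        self.lines = [CacheLine() for _ in range(NUM_LINES)]
        self.writebacks = 0   # dirty lines written back to the slower memory

    def _split(self, address):
        line_number = address // LINE_BYTES
        return line_number % NUM_LINES, line_number // NUM_LINES  # index, tag

    def access(self, address, is_write=False):
        """Return True on a cache hit, False on a miss (the line is refilled)."""
        index, tag = self._split(address)
        line = self.lines[index]
        hit = line.valid and line.tag == tag
        if not hit:
            if line.valid and line.dirty:
                self.writebacks += 1   # old contents go back to slow memory
            line.valid, line.dirty, line.tag = True, False, tag
        if is_write:
            line.dirty = True
        return hit
```

In this sketch a write marks the line dirty, and replacing a dirty line counts as one write-back to the slower memory.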
`
`
`
Caches use cache policies such as “least recently used” or “not last
used” replacement techniques to determine which existing entries are
replaced with new entries. Typically, computer programs access the
same memory addresses repeatedly. Therefore, the most recently
accessed data is likely to be accessed again soon after the initial
access. Because recently accessed data is available in the cache for
subsequent accesses, caches can improve access time across the
address space of the processor.
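As an illustration of the “least recently used” policy named above, the following sketch evicts the entry that has gone longest without an access. It is a generic software model rather than the patent's hardware; the class name, the fully associative organization, and the `load_from_memory` callback are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # address -> data, least recent first

    def access(self, address, load_from_memory):
        """Return (data, hit) for the address, refilling on a miss."""
        if address in self.entries:
            self.entries.move_to_end(address)    # now the most recently used
            return self.entries[address], True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used
        data = load_from_memory(address)         # slow path on a miss
        self.entries[address] = data
        return data, False
```

Repeated accesses to the same addresses hit in the cache, which is the locality argument the paragraph makes.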
`
`
`
`
`
`
A different method for increasing processor speed is the use of
parallel processing techniques. For example, by providing a number
of functional units which perform different tasks, a “very long
instruction word” (VLIW) processor can perform multiple functions
through a single instruction. Also, a general purpose processor and
a vector processor may be integrated to operate in parallel. An
integrated multiprocessor is able to achieve high performance with
low cost since the two processors perform only tasks ideally suited
for each processor. For example, the general purpose processor runs
a real time operating system and performs overall system management
while the vector processor is used to perform parallel calculations
using data structures called “vectors”. (A vector is a collection of
data elements typically of the same type.) Multiprocessor
configurations are especially advantageous for operations involving
digital signal processing such as coding and decoding video, audio,
and communications data.
`
`
`
`
`
`
`SUMMARY OF THE INVENTION
`
`
`
`
It has been discovered that accesses to a cache by multiple devices
may be managed by a cache control unit that includes transaction
identification logic to identify cache accesses. Such an apparatus
provides the advantage of improving performance by increasing the
speed of memory accesses by one or more devices. Specifically, such
an apparatus allows the cache to service later arriving requests
before earlier arriving requests.
`
`
`
`
In one embodiment of the present invention, a cache is coupled to a
cache accessing device. A first cache request is received from the
device. A request identification information is assigned to the
first cache request and provided to the requesting device. The first
cache request may begin to be processed. A second cache request is
received from the cache accessing device. The second cache request
is assigned to the first cache request and provided to the
requesting device. The first and second cache requests are finally
fully serviced.
`
`
`
In another embodiment, a cache system includes a cache for
temporarily storing information and a cache control unit. The cache
control unit includes access control logic, identification logic,
and result logic. The access control logic receives and executes
cache accesses by a cache accessing device. The identification logic
assigns request identification information to each of the cache
accesses, and provides the request identification information to the
cache accessing device. The identification logic is capable of
providing the request identification information prior to the
execution of the cache accesses by the access control logic. The
result logic provides the request identification information and the
data requested by the cache accessing device to the cache accessing
device if the cache access was a read.
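The division of labor among identification logic, access control logic, and result logic can be pictured with a small software model. This is only a sketch under stated assumptions: the class, its fields, the sequential ID scheme, and the fixed miss latency are all hypothetical and are not taken from the patent. It shows an ID being returned before the access executes, and a later hit completing before an earlier miss.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class CacheControlUnit:
    """Hypothetical model: IDs are issued up front, results return tagged."""
    cache: dict                 # address -> data currently held in the cache
    memory: dict                # slower backing memory
    miss_latency: int = 3       # assumed refill delay, in ticks
    _ids: count = field(default_factory=count)
    _pending: list = field(default_factory=list)   # [ticks_left, id, address]

    def request_read(self, address):
        """Identification logic: return an ID before the access executes."""
        request_id = next(self._ids)
        delay = 0 if address in self.cache else self.miss_latency
        self._pending.append([delay, request_id, address])
        return request_id

    def tick(self):
        """Access control and result logic: return (id, data) pairs done now."""
        done, still_pending = [], []
        for entry in self._pending:
            ticks_left, request_id, address = entry
            if ticks_left == 0:
                if address not in self.cache:
                    self.cache[address] = self.memory[address]   # refill
                done.append((request_id, self.cache[address]))
            else:
                entry[0] -= 1
                still_pending.append(entry)
        self._pending = still_pending
        return done
```

Because each result carries its request ID, the requesting device can match data to requests even when the results arrive out of order.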
`
`
`
`
`
`
`
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`
`
`
`
The present invention may be better understood, and its numerous
objects, features, and advantages made apparent to those skilled in
the art by referencing the accompanying drawings.
`
FIG. 1 shows a block diagram of a multimedia signal processor in
accordance with an embodiment of the invention.

FIG. 2 shows a block diagram of a cache system in accordance with an
embodiment of the invention.

FIG. 3 shows a block diagram of a data pipeline used in a cache
system in accordance with an embodiment of the invention.

FIG. 4 shows a block diagram of a data pipeline used in a cache
system in accordance with an embodiment of the invention.

FIG. 5 shows a block diagram of an address pipeline used in a cache
system in accordance with an embodiment of the invention.

FIG. 6 shows a state diagram of a cache control unit and processor
interface in accordance with an embodiment of the invention.

FIG. 7 shows a state diagram of a cache control unit and bus
interface in accordance with an embodiment of the invention.

FIG. 8 shows a state diagram of a data receiver state machine in
accordance with an embodiment of the invention.

FIG. 9 shows a state diagram of a read/write state machine in
accordance with an embodiment of the invention.
`
`
`
`
`
`
`
`
The use of the same reference symbols in different drawings
indicates similar or identical items.
`
`
`
`
`
`
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
`
The following sets forth a detailed description of the preferred
embodiments. The description is intended to be illustrative of the
invention and should not be taken to be limiting. Many variations,
modifications, additions and improvements may fall within the scope
of the invention as defined in the claims that follow.
`
`
`
`
`
`
Referring to FIG. 1, processor 100 includes a general purpose
processor 110 coupled to a vector processor 120. General purpose
processor 110 and vector processor 120 are coupled via control bus
112 and interrupt line 114. General purpose processor 110 and vector
processor 120 are coupled to cache system 130 via bus 116 and bus
118, respectively. Cache system 130 is coupled to input/output bus
(IOBUS) 180 and fast bus (FBUS) 190. IOBUS 180 is coupled to system
timer 182, universal asynchronous receiver-transmitter (UART) 184,
bitstream processor 186 and interrupt controller 188. FBUS 190 is
coupled to device interface 192, direct memory access (DMA)
controller 194, local bus interface 196 and memory controller 198.
`
`
`
`
`
General purpose processor 110 and vector processor 120 execute
separate program threads in parallel. General purpose processor 110
typically executes instructions which manipulate scalar data. Vector
processor 120 typically executes instructions having vector
operands, i.e., operands each containing multiple data elements of
the same type. In some embodiments, general purpose processor 110
has a limited vector processing capability. However, applications
that require multiple computations on large arrays of data are not
suited for scalar processing or even limited vector processing. For
example, multimedia applications such as audio and video data
compression and decompression require many repetitive calculations
on pixel arrays and strings of audio data. To perform real-time
multimedia operations, a general purpose processor which manipulates
scalar data (e.g. one pixel value or sound amplitude per operand) or
only small vectors must operate at a high clock frequency. In
contrast, a vector processor executes instructions where each
operand is a vector containing multiple data elements (e.g. multiple
pixel values or sound amplitudes). Therefore, vector processor 120
can perform real-time multimedia operations at a fraction of the
clock frequency required for general purpose processor 110 to
perform the same function. Thus, by allowing an efficient division
of the tasks required for, e.g., multimedia applications, the
combination of general purpose processor 110 and vector processor
120 provides high performance per cost. Although in the preferred
embodiment, processor 100 is for multimedia applications, processor
100 may be any type of processor.
`
`
`
In one embodiment, general purpose processor 110 executes a
real-time operating system designed for a media circuit board
communicating with a host computer system. The real-time operating
system communicates with a primary processor of the host computer
system, services input/output (I/O) devices on or coupled to the
media circuit board, and selects tasks which vector processor 120
executes. In that embodiment, vector processor 120 is designed to
perform computationally intensive tasks requiring the manipulation
of large data blocks, while general purpose processor 110 acts as
the master processor to vector processor 120.
`
`
In the exemplary embodiment, general purpose processor 110 is a
32-bit RISC processor which operates at 40 MHz and conforms to the
standard ARM7 instruction set. The architecture for an ARM7 reduced
instruction set computer (RISC) processor and the ARM7 instruction
set is described in the ARM7DM Data Sheet available from Advanced
RISC Machines Ltd. General purpose processor 110 also implements an
extension of the ARM7 instruction set which includes instructions
for an interface with vector processor 120. The extension to the
ARM7 instruction set for the exemplary embodiment of the invention
is described in copending U.S. patent application Ser. No.
08/699,295, attorney docket No. M-4366 U.S., filed on Aug. 19, 1996,
entitled “System and Method for Handling Software Interrupts with
Argument Passing,” naming Seungyoon Peter Song, Moataz A. Mohamed,
Heon-Chul Park and Le Nguyen as inventors, which is incorporated
herein by reference in its entirety. General purpose processor 110
is coupled to vector processor 120 by control bus 112 to carry out
the extension of the ARM7 instruction set. Furthermore, interrupt
line 114 is used by vector processor 120 to request an interrupt on
general purpose processor 110.
`
`
`
`
`
`
`
In the exemplary embodiment, vector processor 120 has a
single-instruction-multiple-data (SIMD) architecture and manipulates
both scalar and vector quantities. In the exemplary embodiment,
vector processor 120 consists of a pipelined reduced instruction set
computer (RISC) central processing unit (CPU) that operates at 80
MHz and has a 288-bit vector register file. Each vector register in
the vector register file can contain up to 32 data elements. A
vector register can hold thirty-two 8-bit or 9-bit integer data
elements, sixteen 16-bit integer data elements, or eight 32-bit
integer or floating point elements. Additionally, the exemplary
embodiment can also operate on a 576-bit vector operand spanning two
vector registers.
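The element counts above can be checked with a few lines of arithmetic. The sketch assumes that each vector register is 288 bits wide, which is consistent with the 576-bit operand spanning two registers; the names `LAYOUTS`, `used_bits`, and `report` are hypothetical.

```python
REGISTER_BITS = 288   # assumed width of one vector register

# (element width in bits, elements per register), as listed in the text
LAYOUTS = {
    "8-bit integer": (8, 32),
    "9-bit integer": (9, 32),
    "16-bit integer": (16, 16),
    "32-bit integer or float": (32, 8),
}


def used_bits(layout):
    """Bits occupied by one register's worth of elements in this layout."""
    width, elements = LAYOUTS[layout]
    return width * elements


def report():
    """Map each layout to (bits used, bits left over in a 288-bit register)."""
    return {
        name: (used_bits(name), REGISTER_BITS - used_bits(name))
        for name in LAYOUTS
    }
```

Only the 9-bit layout fills all 288 bits (32 x 9 = 288). The 8-bit, 16-bit, and 32-bit layouts each use 256 bits, and two registers together give the 576-bit operand mentioned in the text.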
`
`
`
`
The instruction set for vector processor 120 includes instructions
for manipulating vectors and for manipulating scalars. The
instruction set for the exemplary embodiment of the invention and an
architecture for implementing the instruction set is described in
the pending U.S. patent application Ser. No. 08/699,597, attorney
docket No. M-4355 U.S., filed on Aug. 19, 1996, entitled
“Single-Instruction-Multiple-Data Processing in a Multimedia Signal
Processor,” naming Le Trong Nguyen as inventor, which is
incorporated herein by reference in its entirety.
`
`
`
`
`
`
`
`
General purpose processor 110 performs general tasks and executes a
real-time operating system which controls communications with device
drivers. Vector processor 120 performs vector tasks. General purpose
processor 110 and vector processor 120 may be scalar or superscalar
processors. The multiprocessor operation of the exemplary embodiment
of the invention is more fully described in pending U.S. patent
application Ser. No. 08/697,102, attorney docket No. M-4354 U.S.,
filed on Aug. 19, 1996, entitled “Multiprocessor Operation in a
Multimedia Signal Processor,” naming Le Trong Nguyen as inventor,
which is incorporated herein by reference in its entirety.
`
`
`
`
`
`
`
Referring again to FIG. 1, cache system 130 contains a fast random
access memory (RAM) block (shown graphically as blocks 140 and 170),
read only memory (ROM) 150 and a cache control unit 160. Cache
system 130 can configure the RAM block into (i) an instruction cache
142 and a data cache 144 for general purpose processor 110, and (ii)
an instruction cache 172 and data cache 174 for vector processor
120. In the preferred embodiment, RAM block 140, 170 includes static
RAM (SRAM).
`
`
`
`
`
`
In an embodiment of a computer system according to the invention,
general purpose processor 110 and vector processor 120 share a
variety of on-chip and off-chip resources which are accessible
through a single address space. Cache system 130 couples a memory to
any of several memory mapped devices such as bitstream processor
186, UART 184, DMA controller 194, local bus interface 196, and a
coder-decoder (CODEC) device interfaced through device interface
192. Cache system 130 can use a transaction-oriented protocol to
implement a switchboard for data access among the processors and
memory mapped resources. For example, the transaction-oriented
protocol provides that if completion of an initial cache transaction
is delayed (e.g., due to a cache miss), other cache access
transactions may proceed prior to completion of the initial
transaction. Thus, “step-aside-and-wait” capability is provided in
this embodiment of a cache management system according to the
invention. A similar transaction-oriented protocol is further
described in pending U.S. patent application Ser. No. 08/731,393,
attorney docket No. M-4398 U.S., filed on Oct. 18, 1996, entitled
“Shared Bus System with Transaction and Destination ID,” naming
Amjad Z. Qureshi and Le Trong Nguyen as inventors, which is
incorporated herein by reference in its entirety.
`
`
`
`
Cache system 130 couples general purpose processor 110 and vector
processor 120 to two system busses: IOBUS 180 and FBUS 190. IOBUS
180 typically operates at a slower frequency than FBUS 190. Slower
speed devices are coupled to IOBUS 180, while higher speed devices
are coupled to FBUS 190. By separating the slower speed devices from
the higher speed devices, the slower speed devices are prevented
from unduly impacting the performance of the higher speed devices.
`
`
`
`
`
`
Cache system 130 also serves as a switchboard for communication
between IOBUS 180, FBUS 190, general purpose processor 110, and
vector processor 120. In most embodiments of cache system 130,
multiple simultaneous accesses between the busses and processors are
possible. For example, vector processor 120 is able to communicate
with FBUS 190 at the same time that general purpose processor 110 is
communicating with IOBUS 180. In one embodiment of the invention,
the combination of the switchboard and caching function is
accomplished by using direct mapping techniques for FBUS 190 and
IOBUS 180. Specifically, the devices on FBUS 190 and IOBUS 180 can
be accessed by general purpose processor 110 and vector processor
120 by standard memory reads and writes at appropriate addresses.
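Accessing bus devices through ordinary reads and writes implies some form of address decoding. The sketch below shows one way such routing could look. The address ranges, the `Region` type, and the `route` function are hypothetical, since the text gives no actual address map.

```python
from typing import NamedTuple


class Region(NamedTuple):
    base: int      # first address in the region
    size: int      # region length in bytes
    target: str    # where accesses in this region are routed


# Hypothetical address map; the patent text gives no actual addresses.
ADDRESS_MAP = [
    Region(0x0000_0000, 0x0400_0000, "memory"),
    Region(0x0400_0000, 0x0010_0000, "FBUS device"),
    Region(0x0410_0000, 0x0001_0000, "IOBUS device"),
]


def route(address):
    """Return the target whose region contains the address."""
    for region in ADDRESS_MAP:
        if region.base <= address < region.base + region.size:
            return region.target
    raise ValueError(f"unmapped address {address:#x}")
```

With a map like this, a processor needs no special I/O instructions: the same load or store reaches memory, an FBUS device, or an IOBUS device depending only on the address.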
`
`
`
`
`
`
`
`
FBUS 190 provides an interface to the main memory. The interface
unit to the memory is composed of a four-entry address queue and a
one-entry write-back latch. The interface can support one pending
refill (read) request from general purpose processor instruction
cache 142, one pending refill (read) request from vector processor
instruction cache 172, one write request from vector processor data
cache 174, and one write-back request from vector processor data
cache due to a dirty cache line.
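A four-entry address queue paired with a one-entry write-back latch can be modeled as follows. This is a sketch only: the first-in-first-out ordering, the retry-on-full behavior, and all names are assumptions, since the text specifies only the queue depth, the latch depth, and the kinds of request supported.

```python
from collections import deque


class MemoryInterface:
    """Hypothetical model of the memory interface unit on the fast bus."""

    QUEUE_DEPTH = 4   # four-entry address queue, per the text

    def __init__(self):
        self.address_queue = deque()    # pending (kind, source, address)
        self.write_back_latch = None    # one-entry latch for a dirty line

    def enqueue(self, kind, source, address):
        """Accept a request, or return False so the requester must retry."""
        if len(self.address_queue) >= self.QUEUE_DEPTH:
            return False
        self.address_queue.append((kind, source, address))
        return True

    def latch_write_back(self, address, line_data):
        """Hold one dirty line until its write-back request is serviced."""
        if self.write_back_latch is not None:
            return False                # latch already holds a dirty line
        if not self.enqueue("write-back", "vector dcache", address):
            return False
        self.write_back_latch = line_data
        return True

    def service_next(self):
        """Issue the oldest request; a write-back also empties the latch."""
        kind, source, address = self.address_queue.popleft()
        data = None
        if kind == "write-back":
            data, self.write_back_latch = self.write_back_latch, None
        return kind, source, address, data
```

The four queue slots match the four kinds of pending request listed above, and the single latch means a second dirty line has to wait until the first write-back has been issued.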
`
`
`
`
`
`
`
`
FBUS 190 is coupled to various high speed devices such as a memory
controller 198 and a DMA controller 194, a local bus interface 196,
and a device interface 192. Memory controller 198 and DMA controller
194 provide memory interfaces. Local bus interface 196 provides an
interface to a local bus coupled to a processor. Device interface
192 provides interfaces to various digital-to-analog and
analog-to-digital converters (DACs and ADCs, respectively) that may
be coupled to processor 100 for video, audio or communications
applications.
`
`
Memory controller 198 provides an interface for a local memory if a
local memory is provided for processor 100. Memory controller 198
controls reads and writes to the local memory. In the exemplary
embodiment, memory controller 198 is coupled to and controls one
bank of synchronous dynamic RAMs (two 1M×16 SDRAM chips) configured
to use 24 to 26 address bits and 32 data bits and having the
features of: (i) a “CAS-before-RAS” refresh protocol, performed at a
programmable refresh rate, (ii) partial writes that initiate
Read-Modify-Write operations, and (iii) internal bank interleave.
Memory controller 198 also provides a 1:1 frequency match between
the local memory and FBUS 190, manual “both bank precharge”, and
address and data queuing to better utilize FBUS 190. Synchronous
DRAMs are known to effectively operate at such frequencies (80 MHz),
and standard fast page DRAMs and extended data out (EDO) DRAMs could
also be used. DRAM controllers with capabilities similar to memory
controller 198 in the exemplary embodiment are known in the art.
`
`
`
`
`
`
`
DMA controller 194 controls direct memory accesses between the main
memory of a host computer and the local memory of processor 100.
Such DMA controllers are well known in the art. In some embodiments
of the invention, a memory data mover is included. The memory data
mover performs DMA from one block of memory to another block of
memory.
`
`
Local bus interface 196 implements the required protocol for
communications with a host computer via a local bus. In the
exemplary embodiment, local bus interface 196 provides an interface
to a 33-MHz, 32-bit PCI bus. Such interfaces are well known in the
art.
`
`
`
`
`
Device interface 192 provides a hardware interface for devices such
as audio, video and communications DACs and ADCs which would
typically be on a printed circuit board with a