Case 3:14-cv-00757-REP-DJN Document 47-1 Filed 01/12/15 Page 1 of 22 PageID# 1078

EXHIBIT A
`
US005778434A
`
United States Patent [19]          [11] Patent Number: 5,778,434
Nguyen et al.                      [45] Date of Patent: Jul. 7, 1998
`
`[54] SYSTEM AND METHOD FOR PROCESSING
`MULTIPLE REQUESTS AND OUT OF
`ORDER RETURNS
`
`[75]
`
`Inventors: Le Trong Nguyen. Monte Sereno;
`Yasuaki Hagiwara. Santa Clara. both
`of Calif.
`
`[73] Assignee:
`
`JSeiko Epson Corporation. Tokyo.
`apan
`
[21] Appl. No.: 479,035

[22] Filed: Jun. 7, 1995

[51] Int. Cl.6 ..........
[52] U.S. Cl. ..........
[58] Field of Search .......... 395/464, 473; 711/117, 146
`.
[56] References Cited

U.S. PATENT DOCUMENTS

[earlier entries illegible in this copy]
4,797,814   1/1989  Brenza ........................ 395/403
5,420,991   5/1995  Konigsfeld et al. ............. 395/477
5,487,156   1/1996  Popescu et al. ................ 395/393
5,535,340   7/1996  Bell et al. ................... 395/292
5,535,345   7/1996  Fisch et al. .................. 395/309
5,539,911   7/1996  Nguyen et al. ................. 395/300
5,557,763   9/1996  Senter et al. ................. 395/375
5,560,032   9/1996  Nguyen et al. ................. 395/300
5,561,780  10/1996  Glew et al. ................... 395/453

FOREIGN PATENT DOCUMENTS

292188  11/1988  European Pat. Off.
568231  11/1993  European Pat. Off.
`
OTHER PUBLICATIONS

J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach (Morgan Kaufmann Publ.: San Mateo, CA 1990), pp. 404-495.
`
Primary Examiner—David L. Robertson
Attorney, Agent, or Firm—Sterne, Kessler, Goldstein & Fox, P.L.L.C.
`
[57] ABSTRACT

A system and method for processing a sequence of requests for data by one or more central processing units (CPUs) after cache misses. Each CPU request includes a CPU-ID tag identifying the CPU issuing the request for data and an address identifying a location in lower-level memory where the data is stored. Cache-control ID tags are assigned to identify the locations in the request queue of the respective CPU-ID tags associated with each CPU request. Cache-control requests consisting of the cache-control ID tags and the respective address information are sent from the request queue to the lower-level memory or storage devices. Data is then returned along with the corresponding CCU-ID tags in the order in which it is returned by the storage devices. Finally, the sequence of CPU requests for data is fulfilled by returning the data and CPU-ID tag in the order in which the data was returned from lower-level memory. By issuing multiple requests for data and allowing out-of-order data return, data is retrieved from lower-level memory after cache misses more quickly and efficiently than processing data requests in sequence. By checking the request queue, pending CPU requests for the same data, including requests for the same long word of data, can be identified. Cache hits for multiple requests are determined by simultaneously checking sets in cache memory. Multiple instructions are then issued for multiple superset cache hits.

19 Claims, 11 Drawing Sheets
`
`
`r m-"eq-"u~----------—--—rM“*---——“*“'—-'——"“',
`- DAM
`? [TAG RAM
`302
`304
`g
`i
`:/L/ CACHE
`“—‘” “‘> READ
`208
`— ~—‘L>J READ/WRITE
`i
`|
`.._
`|
`
`i
`
`
`
`_,
`
`, ._
`
`,
`
`l
`
`.
`
`y
`4
`
`
`
`
`‘r--“‘”‘ __“"__“_‘—I _________________________________ 'l
`REQUESTK
`HIT on Request
`g
`:1
`;
`D .3 g
`a
`2
`{/V/ RETRIEVAL
`g
`3 EL r:
`[I 3
`E
`I
`PROCESSOR
`.
`8 5 e : ES
`.5
`i
`206
`f >
`.
`"‘ “ V
`U)
`1
`— 777
`l
`W,
`i
`77‘ T“
`5
`1
`t
`‘
`f l k/ 308
`E
`
`:j
`;
`l
`l
`;
`a
`;
`;
`l
`i
`g
`i
`;
`‘
`I
`l
`:
`“~“”i
`z
`' CCU-ID. DATA‘ MCU I
`:
`108 i
`. CCLHD, ADDRJ
`j
`l
`___ ‘ 7
`
`T
`
`V
`
`7 WT;
`
`
`-.
`
`CACHE CONTROL PROCESSOR 306
`
`
`l.
`
`CPU-ID.
`DATA
`CPU-ID: J
`ADDR
`1,1
`l
`L
`
`i
`;
`i —P_>
`
`
`
[FIG. 1 (Sheet 1 of 11): block diagram of the cache control system. CPU 102, containing IFU (Instruction Fetch Unit) 116 and IEU (Instruction Execution Unit) 120, is coupled to CCU (Cache Control Unit) 104, MEM 112, and VMU 122.]
`
[FIG. 2 (Sheet 2 of 11): CCU 104 detail. IFU 116 accesses the INSTRUCTION CACHE 204 (snoop and read only, no write); IEU 120 accesses the DATA CACHE 208 (read/write); the REQUEST/RETRIEVAL PROCESSOR 206 connects both caches to MCU 108.]
[FIG. 3 (Sheet 3 of 11): schematic of the D cache 208 (TAG RAM 302, S-RAM 304) and Request/Retrieval Processor 206 (cache control processor 306, RQ queue 308, RTN queue 310) coupled to MCU 108; the drawing text is illegible in this copy.]
`
[FIG. 4 (Sheet 4 of 11): cache hit/miss flow (steps 404, 408, 415). Receive request from CPU 114; check TAG RAM 302 to determine whether requested data are already in cache 208; if so, return requested data to CPU 114; if not, send the CPU request to Request/Retrieval Processor 206 for processing multiple data requests and out-of-order data returns.]
`
[FIG. 5 (Sheet 5 of 11): example data request address, divided into TAG, INDEX, SUBBLOCK SELECT, and LONG WORD SELECT portions (items 514, 524); much of the drawing is illegible in this copy.]
`
[FIG. 6 (Sheet 6 of 11): routine 600 for checking pending CPU requests (steps 610-670). The RQ queue is checked (610) and compared in turn against the index, tag, subblock select, and long word select portions of ADDR. Requests matching no pending entry go to cache miss subroutine 800, waiting until pending requests are resolved where required; a subblock match with a different long word goes to HTR routine 700; a match on the same long word means the same request and result are pending, so the routine waits.]
`
[FIG. 7 (Sheet 7 of 11): HTR routine 700 (steps 710-750). Check the HTR flag and read HTR ADDR; if the flag is already "1", wait until the pending request resolves; otherwise set HTR in the RQ queue to "1", store the long word select portion of ADDR into the HTR ADDR field of the RQ queue, and go to START for the next request.]
`
`
[FIG. 8 (Sheet 8 of 11): cache miss subroutine 800. Store the CPU request (CPU-ID, ADDR) in RQ queue 308 and set validity bits (810); identify the CCU-ID corresponding to the stored CPU-ID (820); send ADDR and CCU-ID in a data request to the MCU (830).]
`
[FIG. 9 (Sheet 9 of 11): data return routine 900 (steps 910-980). Receive DATA and CCU-ID from the MCU; using the CCU-ID, get CPU-ID, ADDR, HTR ADDR, HTR-CPU-ID, and the HTR flag from RQ queue 308; set counter i=0 and read DATA and START ADDR from the RTN queue. For each long word, compare START ADDR + i with the long word select portion of ADDR and, on a match, send the long word of data and CPU-ID; also check whether the HTR flag is set and HTR-ADDR = START ADDR + i and, if so, send the long word of data and HTR-CPU-ID; then increment the counter (i=i+1).]
`
`
[FIG. 10 (Sheet 10 of 11), prior art: conventional superset cache. A physical address from VMU 122 feeds TAG RAMs TRAM1 and TRAM2 and storage memories SRAM1 and SRAM2; a select signal drives a MUX to choose a single output (items 1000-1030).]
`
[FIG. 11 (Sheet 11 of 11): superset cache according to the present invention. Two requests (P0, P1) from VMU 122 pass through a switch (1102, 1104) to TAG RAMs TRAM1 1101 and TRAM2 1121 and on to storage memories SRAM1 1105 and SRAM2 1125 (via 1110, 1122, 1124), whose outputs feed the IEU/IFU, so both requests are checked simultaneously.]
`
`SYSTEM AND METHOD FOR PROCESSING
`MULTIPLE REQUESTS AND OUT OF
`ORDER RETURNS
`
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a data processor having a hierarchical memory arrangement including a cache memory to speed data retrieval. More specifically, the present invention relates to an improved system and method for cache control and management which processes out-of-order return data and instructions, and/or multiple fetch requests. According to another aspect of the present invention, sets of cache memory can be checked for multiple requests. Multiple instructions and/or data can then be returned to fulfill more than one request at a time.

2. Related Art

Increases in processor speed have led to the development of memory devices having very fast access times. The cost of such memory devices, however, is often proportionally related to the speed at which the devices can be accessed. Thus, to store all of a processor's data and program instructions in very fast memory can lead to very high memory costs.

To minimize the cost associated with high-speed memory while still reaping the benefits of fast access times, system designers have implemented cache memories. In a cache memory system, the majority of instructions and program data are stored in standard memory such as a disc, hard drive, or low-speed random access memory (RAM). A relatively small amount of high-speed memory, called a cache, is provided to store a subset of the program data and/or instructions. In this patent document, the term data, when used in reference with storage in the cache, refers generally to either instruction execution data or to program instructions.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
Typically, those data that are most frequently accessed by the processor are stored in the cache. As a result, these data can be accessed by the processor at a much faster rate. Additionally, some systems implement an instruction prefetch, wherein instructions are fetched from low-speed storage in advance and stored in the cache. As a result, the instructions are already in cache when needed by the processor and can be accessed quickly.

System designers frequently implement cache storage to speed access times for data and instructions. In such systems, a cache control unit is often implemented to manage the storage of data in the cache and provide the data to the processor. In these systems, the instruction fetch unit and instruction execution unit go to the cache control unit to request the required data. Upon receipt of a request, the cache control unit first searches the cache for the data requested. If the requested data exist in the cache, the data are provided to the requesting unit from the cache. This condition is known as a cache hit. If the data are not present in the cache, the cache control unit retrieves the data from storage and stores the data in a known location in the cache.

Instruction requests are often handled in sequence according to their order in a particular application program. Bottlenecks and delays occur as outstanding requests, which might otherwise be executed more quickly, wait for preceding slower instructions to be processed.
For example, in multi-processor systems sharing two or more buses, a central processor unit (CPU) stall condition arises when one of the buses is occupied in servicing a particular processor. When data requests or instructions are executed in sequence, the other processors depending on the occupied bus wait for the bus to become available before proceeding to process other data requests and instructions, regardless of the availability of other buses.
Delays can further arise when sequential data requests and instructions are made to wait for data return from storage devices which have a different rate of data return. For instance, data return from faster lower-level storage devices such as dynamic random access memory (DRAM) still must wait for preceding data requests made to slower lower-level storage devices such as an input/output (I/O) device.

To improve efficiency and speed in processing program instructions, many contemporary processors are capable of dynamic scheduling of instructions. Dynamic scheduling means that instructions can be retrieved from memory, scheduled, and executed, all in an order that is different from the program order. At a general level, dynamic scheduling allows a pipelined processor to maximize the utilization of its resources by prioritizing the use of the resources between the multiple processors. Such dynamic scheduling, however, does not consider the potential disparity between the storage devices or the resources themselves in executing instructions or data requests. The CPU stalls encountered when a bus is servicing a particular processor are also not overcome.

The inventors have discovered that there is a need for optimizing instruction execution and data requests after a cache miss. In particular, there is a need for accommodating out-of-order data returns and servicing multiple requests to increase the speed and efficiency of data retrieval. In multiple-processor systems sharing multiple buses it is especially desirable to avoid CPU stall. Further, it is desirable to avoid bottleneck and to quickly and efficiently accommodate data requests made to a variety of lower-level storage devices with different response times.
`SUMMARY OF THE INVENTION
`
The present invention optimizes the processing of data requests and instructions after cache misses. By converting CPU requests into cache control requests, a sequence of multiple requests for data can be sent to various lower-level storage devices or shared buses. These cache control requests include cache control unit identification tags (CCU-ID tags) for identifying individual requests. Returning data is then received and forwarded to the requesting CPU in the order of return provided by the lower-level storage devices or shared buses.

In this manner, data is retrieved after cache misses more quickly and efficiently than processing data requests in sequence. Because the cache control ID tags allow individual requests to be distinguished, multiple requests for data can be pending at the same time. Such requests can also be received "out of order" from lower-level memory, that is, in the order in which a storage device returns the data being sought.

Out-of-order data return is especially advantageous in a system having numerous memory or storage devices. For example, such a system may include DRAM, read-only memory (ROM), electrically programmable read-only memory (EPROM), disk storage, and I/O devices. Typically, it is expected that data would be returned from DRAM quicker than they would be returned from disk, or from I/O. Thus, a request from the CPU for data from I/O would not create a bottleneck as in prior systems, as subsequent requests to devices such as a DRAM with a faster rate of return can still be filled.
`
`
`
`
Conventional systems were limited in their ability to return data out of order due to their lack of a technique for tracking the data returned. Retaining the CPU-ID identifying tag with the data request greatly increased the number of bits transmitted for each request. Such overhead costs were prohibitive. Without the ability to track the data being returned, the conventional cache control unit would not know which data correspond to which request.
According to one embodiment of the invention, a sequence of CPU requests for data not previously found in cache memory is stored in a request queue. Each CPU request includes a CPU-ID tag identifying the CPU issuing the request for data and an address identifying a location in lower-level memory where the data is stored. Cache-control ID tags are then assigned to identify the locations in the request queue of the respective CPU-ID tags associated with each CPU request.

Cache-control requests consisting of the cache-control ID tags and the respective address information are then sent from the request queue to the lower-level memory or storage devices. For example, the cache-control requests can be sent by a cache control processor to a memory control unit for transmittal to various storage devices.

Data is then returned along with the corresponding CCU-ID tags in the order in which it is returned by the storage devices. A cache control processor then retrieves the associated CPU-ID tags identified by the CCU-ID tags. The sequence of CPU requests for data is fulfilled by returning the data and CPU-ID tag in the order in which the data is returned.
`
According to a further feature of the present invention, the cache control processing maintains Hit on Request (HTR) information to service multiple requests for the different long words in a single returned subblock.

In another embodiment of the present invention, cache hits for multiple requests are determined by checking sets of Tag RAMs simultaneously in each cache. Multiple instructions can then be issued in response to cache hits.
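As a rough illustration of this simultaneous checking, the C sketch below probes two Tag RAM banks for two outstanding requests in the same cycle, in the spirit of FIG. 11; the function tram_lookup and the two-bank arrangement are assumptions of the sketch, not details taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bank tag probe: returns true on a tag match. */
extern bool tram_lookup(int bank, uint32_t phys_addr);

/* Check two requests against two Tag RAM banks in the same cycle,
 * so an instruction can issue for each request that hits.        */
void superset_check(uint32_t addr0, uint32_t addr1, bool hit[2])
{
    hit[0] = tram_lookup(0, addr0);  /* e.g. TRAM1 serves request 0 */
    hit[1] = tram_lookup(1, addr1);  /* e.g. TRAM2 serves request 1 */
}
```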
The cache control processing of the present invention can be implemented using numerous different architectures in a number of different environments. For ease of description, the cache control unit will be described in terms of an example architecture implemented in an example environment. Based on the above and following description, it will become apparent to a person skilled in the relevant art how to implement the cache control unit using different architectures in this or alternative environments.

Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
`
BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a cache control system for processing requests for data according to the present invention.

FIG. 2 is a block diagram showing an example of an instruction cache and data cache in a cache control unit according to the present invention.

FIG. 3 is a schematic diagram of a cache and cache control processor according to a first embodiment of the present invention.

FIG. 4 is a high-level operational flow diagram showing a cache hit/miss and data return according to the present invention.

FIG. 5 is a diagram showing an example of a data request address as used in the present invention.

FIG. 6 is a flow diagram showing a routine for processing CPU requests in a cache controller processor according to the present invention.

FIG. 7 is a flow diagram showing a HTR subroutine for servicing multiple requests made for data in the same subblock according to the present invention.

FIG. 8 is a flow diagram showing a subroutine for processing selected CPU requests further to accommodate out-of-order data return and multiple requests according to the present invention.

FIG. 9 is a flow diagram showing a subroutine for returning data according to the present invention.

FIG. 10 is a schematic diagram showing a conventional superset cache.

FIG. 11 is a schematic diagram showing a superset cache according to the present invention.

The present invention is described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
`
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Overview and Discussion of the Invention

The present invention optimizes the processing of data requests and instructions after cache misses. By converting CPU requests into cache control requests, a sequence of multiple requests for data can be sent to various lower-level storage devices. These cache control requests include cache control ID tags for identifying individual requests. Returning data is then received and forwarded to the requesting CPU in the order of return provided by the lower-level storage devices.

In this manner, data is retrieved from lower-level memory after cache misses more quickly and efficiently than processing data requests in sequence. Because the cache control ID tags allow individual requests to be distinguished, multiple requests for data can be pending at the same time. Such requests can also be received "out of order" from lower-level memory, that is, in the order in which a storage device returns the data being sought.

Out-of-order data return is especially advantageous in a system having numerous memory or storage devices. For example, such a system may include DRAM, ROM, EPROM, disk storage, and I/O devices. Typically, it is expected that data would be returned from DRAM quicker than they would be returned from disk, or from I/O. Thus, a request from the CPU for data from I/O would not create a bottleneck as in prior systems, as subsequent requests to devices such as a DRAM with a faster rate of return can still be filled.
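By way of illustration only, the C sketch below models the tagging scheme just described: a miss parks the CPU-ID and address in a request-queue slot, the slot number serves as the CCU-ID sent along to memory, and returns are forwarded in whatever order they arrive. The queue depth and the helper routines mcu_send and cpu_deliver are assumptions made for the sketch, not elements of the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define RQ_DEPTH 8                 /* illustrative request-queue depth */

/* Hypothetical helpers standing in for the MCU and CPU interfaces. */
extern void mcu_send(int ccu_id, uint32_t addr);
extern void cpu_deliver(uint32_t cpu_id, const uint8_t *data);

/* One pending CPU request; its slot number doubles as the CCU-ID. */
typedef struct {
    bool     valid;                /* validity bit V                  */
    uint32_t cpu_id;               /* tag naming the requesting unit  */
    uint32_t addr;                 /* address of the missed data      */
} rq_entry;

static rq_entry rq[RQ_DEPTH];

/* Cache miss: park the CPU request, then forward only the short
 * CCU-ID plus address to the memory control unit.                  */
int ccu_issue(uint32_t cpu_id, uint32_t addr)
{
    for (int ccu_id = 0; ccu_id < RQ_DEPTH; ccu_id++) {
        if (!rq[ccu_id].valid) {
            rq[ccu_id] = (rq_entry){ true, cpu_id, addr };
            mcu_send(ccu_id, addr);
            return ccu_id;
        }
    }
    return -1;                     /* queue full: caller must stall   */
}

/* Data return, in whatever order the storage devices answer: the
 * CCU-ID recovers the original CPU-ID, so the data is forwarded at
 * once without waiting for older outstanding requests.             */
void ccu_return(int ccu_id, const uint8_t *data)
{
    cpu_deliver(rq[ccu_id].cpu_id, data);
    rq[ccu_id].valid = false;      /* free the slot for reuse         */
}
```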
`
2. Example Environment

FIG. 1 is a diagram illustrating an example environment for the cache control unit. This environment includes a processor 102 (referred to as CPU 102), cache control unit 104, a memory control unit 108, memory 112, and a virtual memory unit 122. The terms "CPU," "processor," and "central processing unit" are used throughout to refer in general to any computer processing device such as a microprocessor. CPU 102 includes an instruction fetch unit (IFU) 116 and an instruction execution unit (IEU) 120. The principles of memory-hierarchy design, and in particular, the
`10
`
`15
`
`20
`
`25
`
`3O
`
`35
`
`45
`
`SO
`
`55
`
`65
`
`
`
`
accessing of cache memory, main memory, and/or virtual memory by computer processors are generally well-known and described, for example, in the book by John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., San Mateo, Calif., 1990 (incorporated by reference herein).

Memory 112 is cost-effective storage that generally has a slower access time than cache memory. Memory 112 can include, but is not limited to, any of a number of storage devices such as I/O devices, EPROM, disk or tape storage, ROM, DRAM, and so forth. Memory control unit (MCU) 108 controls access to memory 112. Memory control unit 108 can be, for example, an I/O controller or a memory controller. In this document, memory control unit 108 is sometimes referred to generically as a resource control unit 108.
Instruction fetch unit 116 fetches instructions for the processor. That is, instruction fetch unit 116 requests desired instructions from cache control unit (CCU) 104. The instructions can be fetched in advance, or when needed. Instruction execution unit 120 fetches program data required to execute an instruction. Similarly, IEU 120 requests the data from CCU 104. An example of execution data is a source operand for an instruction.
`
The virtual memory unit (VMU) 122 is further included for translating or mapping virtual addresses in instruction or data requests from the CPU 102 into physical addresses. For instance, in a paged virtual memory type of scheme, the virtual memory unit 122 translates virtual addresses received from the CPU 102 through instruction fetch unit IFU 116 into physical addresses. In one example of a virtual addressing scheme, a virtual page number can be passed through a page table and added to a page offset value to obtain the physical address.
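A minimal C sketch of that translation step, assuming 4 KB pages and a flat, single-level page table (neither of which is specified here):

```c
#include <stdint.h>

#define PAGE_BITS 12                     /* assume 4 KB pages          */
#define PAGE_SIZE (1u << PAGE_BITS)

extern uint32_t page_table[];            /* hypothetical: VPN -> frame */

/* Look the virtual page number up in the page table, then add the
 * page offset to the frame's base address, as the text describes.   */
uint32_t vmu_translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & (PAGE_SIZE - 1);
    return (page_table[vpn] << PAGE_BITS) + offset;
}
```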
FIG. 2 is a block diagram illustrating an example implementation of the CCU 104. The cache memory is divided into an instruction cache (I cache) 204 and a data cache (D cache) 208. I cache 204 is often implemented as a read-only memory with respect to CPU 102. That is, CPU 102 can only read instructions from I cache 204. There may be environments, however, where it is desirable to implement I cache 204 as a read/write memory. One such environment is where CPU 102 may modify the code.

In contrast, D cache 208 is typically implemented as a read/write memory. This is because program data are frequently changed as a result of the executing code. Therefore, it is desirable for CPU 102 to have the ability to write data directly to the data cache 208.

A Request/Retrieval Processor 206 processes data requests between IFU 116 or IEU 120 and MCU 108. The Request/Retrieval Processor 206 tracks requests for data from either the IFU 116 or the IEU 120 during cache misses. Data returned from the MCU 108 is then matched to the appropriate IFU or IEU data request, even for out-of-order data returns.
`
As shown in FIG. 2, the Request/Retrieval Processor 206 can also be connected to each cache memory, I cache 204 and D cache 208. In this way, information can be passed directly between each cache memory and the Request/Retrieval Processor 206. Although a single, common Request/Retrieval Processor 206 is shown in FIG. 2, it would be apparent to one skilled in the art that separate Request/Retrieval Processors dedicated to each cache memory 204 and 208, respectively, could be provided.
3. Cache Control Processor Embodiment
`
FIG. 3 is a block diagram illustrating an example implementation of a cache control processor according to one embodiment of the present invention. For the purpose of description, the implementation is described with respect to D cache 208. It would be readily apparent to one skilled in the art, however, that this example implementation can also be equally applied to I cache 204.

In particular, FIG. 3 depicts in greater detail the elements of the Request/Retrieval Processor 206 and data cache 208 according to one embodiment of the present invention.
The D cache 208 includes a typical dual-port TAG RAM 302 and an S-RAM 304. Static random access memory (S-RAM) 304 preferably is a high-speed memory device suitable for cache memory, such as a static RAM. The dual-port TAG RAM 302 stores block-frame addresses identifying the corresponding blocks of data which are stored in S-RAM 304. To ascertain whether the S-RAM 304 contains requested data, only the addresses of the TAG RAM 302 need to be searched. The placement of blocks of data in S-RAM 304 can be either a directly mapped, fully associative, or N-way set associative type of placement. Preferably, the D cache 208 is two-way set associative.
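A two-way lookup of this kind can be sketched in C as follows; the set count and the address field boundaries are illustrative assumptions, since the patent does not fix a cache geometry at this point.

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS     2         /* the D cache is two-way set associative  */
#define NUM_SETS 256       /* assumed geometry, for illustration      */

typedef struct {
    bool     valid;
    uint32_t tag;          /* block-frame address kept in TAG RAM 302 */
} tag_entry;

static tag_entry tag_ram[NUM_SETS][WAYS];

/* Hit test: only the TAG RAM is searched. The index selects a set
 * and both ways are compared against the tag portion of the address.
 * The bit positions (index at bit 5, tag at bit 13) are assumptions. */
bool cache_hit(uint32_t addr, int *way_out)
{
    uint32_t index = (addr >> 5) & (NUM_SETS - 1);
    uint32_t tag   = addr >> 13;

    for (int way = 0; way < WAYS; way++) {
        if (tag_ram[index][way].valid && tag_ram[index][way].tag == tag) {
            *way_out = way;   /* data can be read from S-RAM 304      */
            return true;
        }
    }
    return false;             /* miss: go to Request/Retrieval path   */
}
```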
`
As shown in FIG. 3, the dual-port TAG RAM 302 has two ports, a READ/WRITE port and a READ port. The READ/WRITE port allows reading and writing of appropriate block-frame addresses in TAG RAM 302 and the corresponding data in S-RAM 304, while the READ port allows snooping for synchronizing multiple-processor operations and bus coherency.
FIG. 3 further depicts a two-level memory hierarchy. As is well-known, the upper cache level allows quicker retrieval of data stored in the cache S-RAM 304 compared to the greater storage capacity but slower retrieval time of data stored in the main memory level in memory 112. Such a two-level memory hierarchy is illustrative only. For instance, it would be obvious to one of ordinary skill in the art to configure additional levels of memory as desired.

In general, such dual-port TAG RAMs and their operation in a CPU having a hierarchical memory arrangement are well-known, as described for instance in the Hennessy et al. book cited above. According to the present invention, however, a Request/Retrieval Processor 206 is further provided to enable processing of multiple requests and out-of-order data returns from memory during cache misses.

As shown in the embodiment of FIG. 3, the Request/Retrieval Processor 206 consists of a cache control processor 306, a request queue (RQ) 308, and a return queue (RTN) 310.
`
In general, the cache controller processor 306 processes CPU requests for data after TAG RAM 302 has been searched and the data being requested has not been found. After such cache misses, cache control processor 306 screens and classifies each CPU request. This classifying, which will be described in even further detail with respect to FIG. 8, basically checks whether similar CPU requests are pending already for the same subblock, and in particular for the same long words within a subblock. In this way, the Hit on Request fields (HTR flag, HTR-Addr, and HTR-CPU-ID) can be set to allow multiple requests for different long words in the same subblock to be serviced by one subblock data return. Also, complexities arising from two CPU requests for the exact long word of data which directly map to the same cache location can be avoided.
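A rough C rendering of these Hit on Request fields, with names following the text and all widths assumed for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hit-on-Request fields carried by each RQ queue entry. */
typedef struct {
    bool     htr_flag;     /* a second request is riding this fetch   */
    uint32_t htr_addr;     /* its long word select bits               */
    uint32_t htr_cpu_id;   /* the second requester                    */
} htr_fields;

/* A later miss to a subblock already being fetched does not issue a
 * second memory request; it is recorded as a Hit on Request.         */
void note_hit_on_request(htr_fields *h, uint32_t lw_select,
                         uint32_t cpu_id)
{
    h->htr_flag   = true;
    h->htr_addr   = lw_select;
    h->htr_cpu_id = cpu_id;
}

/* When the subblock returns, each long word i is also checked against
 * the HTR fields, so one return services both pending requests.      */
bool htr_matches(const htr_fields *h, uint32_t long_word_index)
{
    return h->htr_flag && h->htr_addr == long_word_index;
}
```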
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
CPU-ID and ADDR information corresponding to a particular data request are then pushed onto the respective CPU-ID and RD ADDRESS fields in the request queue 308 of cache controller processor 306. A validity bit V is then set to indicate the presence of valid information in the CPU-ID and ADDR fields for that stored CPU request.
`
`
`
`
The CPU-ID can be an IFU tag which allows the IFU 116 to identify the requesting CPU. Similarly, the CPU-ID can consist of an IEU tag issued from IEU 120. The ADDR corresponds to a physical address identifying the data being requested. For example, as will be described in further detail below with respect to FIG. 5, data is returned in subblocks; in other words, the cache fill granularity is one subblock. The ADDR information pushed on RQ queue 308, then, identifies the particular subblock sought from storage and further identifies a particular long word of data within the subblock.
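Assuming, purely for illustration, 4-byte long words, eight long words per 32-byte subblock, two subblocks per block, and 256 sets (the patent does not fix these widths here), the FIG. 5 address fields could be extracted as:

```c
#include <stdint.h>

/* Illustrative FIG. 5 address split; all field widths are assumed. */
#define LW_SELECT(a)  (((a) >> 2) & 0x7u)   /* long word select      */
#define SUBBLOCK(a)   (((a) >> 5) & 0x1u)   /* subblock select       */
#define SET_INDEX(a)  (((a) >> 6) & 0xFFu)  /* index portion         */
#define TAG_BITS(a)   ((a) >> 14)           /* tag portion           */
```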
`
Each RQ queue 308 entry, i.e., the CPU-ID, RD Address, and associated Hit on Request fields, is addressed by a CCU-ID to identify that entry on the RQ queue 308. For example, the CCU-ID can be a pointer, not necessarily stored on the RQ queue, for identifying each CPU request entry stored on the RQ queue 308. The bit-length of the CCU-ID then depends upon the depth of the RQ queue 308 and is preferably shorter in length than the CPU-ID. In one example, the RQ queue 308 can accommodate up to 8 pending data requests. A 3-bit CCU-ID is sufficient to uniquely identify up to 8 data requests stored in the RQ queue. These CCU-IDs can be assigned or recognized by the controller as new CPU requests are pushed on the RQ queue 308.
`
Alternatively, a scoreboard type of sy