Case 3:14-cv-00757-REP-DJN Document 47-1 Filed 01/12/15 Page 1 of 22 PageID# 1078

EXHIBIT A
`
US005778434A
`
United States Patent [19]          [11] Patent Number: 5,778,434
Nguyen et al.                      [45] Date of Patent: Jul. 7, 1998
`
`[54] SYSTEM AND METHOD FOR PROCESSING
`MULTIPLE REQUESTS AND OUT OF
`ORDER RETURNS
`
`[75]
`
`Inventors: Le Trong Nguyen. Monte Sereno;
`Yasuaki Hagiwara. Santa Clara. both
`of Calif.
`
`[73] Assignee:
`
`JSeiko Epson Corporation. Tokyo.
`apan
`
[21] Appl. No.: 479,035

[22] Filed: Jun. 7, 1995

[51] Int. Cl.6 ..........
[52] U.S. Cl. ..........
[58] Field of Search .......... 395/464, 473; 711/117, 146
`.
[56] References Cited

U.S. PATENT DOCUMENTS

[earlier entries illegible in this copy]
4,797,814   1/1989  Brenza ........................ 395/403
5,420,991   5/1995  Konigsfeld et al. ............. 395/477
5,487,156   1/1996  Popescu et al. ................ 395/393
5,535,340   7/1996  Bell et al. ................... 395/292
5,535,345   7/1996  Fisch et al. .................. 395/309
5,539,911   7/1996  Nguyen et al. ................. 395/300
5,557,763   9/1996  Senter et al. ................. 395/375
5,560,032   9/1996  Nguyen et al. ................. 395/300
5,561,780  10/1996  Glew et al. ................... 395/453

FOREIGN PATENT DOCUMENTS

292188  11/1988  European Pat. Off.
568231  11/1993  European Pat. Off.
`
OTHER PUBLICATIONS

J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach (Morgan Kaufmann Publ.: San Mateo, CA 1990), pp. 404-495.
`
Primary Examiner—David L. Robertson
Attorney, Agent, or Firm—Sterne, Kessler, Goldstein & Fox, P.L.L.C.
`
[57] ABSTRACT

A system and method for processing a sequence of requests for data by one or more central processing units (CPUs) after cache misses. Each CPU request includes a CPU-ID tag identifying the CPU issuing the request for data and an address identifying a location in lower-level memory where the data is stored. Cache-control ID tags are assigned to identify the locations in the request queue of the respective CPU-ID tags associated with each CPU request. Cache-control requests consisting of the cache-control ID tags and the respective address information are sent from the request queue to the lower-level memory or storage devices. Data is then returned along with the corresponding CCU-ID tags in the order in which it is returned by the storage devices. Finally, the sequence of CPU requests for data is fulfilled by returning the data and CPU-ID tag in the order in which the data was returned from lower-level memory. By issuing multiple requests for data and allowing out-of-order data return, data is retrieved from lower-level memory after cache misses more quickly and efficiently than processing data requests in sequence. By checking the request queue, pending CPU requests for the same data, including requests for the same long word of data, can be identified. Cache hits for multiple requests are determined by simultaneously checking sets in cache memory. Multiple instructions are then issued for multiple superset cache hits.

19 Claims, 11 Drawing Sheets
`
`
`r m-"eq-"u~----------—--—rM“*---——“*“'—-'——"“',
`- DAM
`? [TAG RAM
`302
`304
`g
`i
`:/L/ CACHE
`“—‘” “‘> READ
`208
`— ~—‘L>J READ/WRITE
`i
`|
`.._
`|
`
`i
`
`
`
`_,
`
`, ._
`
`,
`
`l
`
`.
`
`y
`4
`
`
`
`
`‘r--“‘”‘ __“"__“_‘—I _________________________________ 'l
`REQUESTK
`HIT on Request
`g
`:1
`;
`D .3 g
`a
`2
`{/V/ RETRIEVAL
`g
`3 EL r:
`[I 3
`E
`I
`PROCESSOR
`.
`8 5 e : ES
`.5
`i
`206
`f >
`.
`"‘ “ V
`U)
`1
`— 777
`l
`W,
`i
`77‘ T“
`5
`1
`t
`‘
`f l k/ 308
`E
`
`:j
`;
`l
`l
`;
`a
`;
`;
`l
`i
`g
`i
`;
`‘
`I
`l
`:
`“~“”i
`z
`' CCU-ID. DATA‘ MCU I
`:
`108 i
`. CCLHD, ADDRJ
`j
`l
`___ ‘ 7
`
`T
`
`V
`
`7 WT;
`
`
`-.
`
`CACHE CONTROL PROCESSOR 306
`
`
`l.
`
`CPU-ID.
`DATA
`CPU-ID: J
`ADDR
`1,1
`l
`L
`
`i
`;
`i —P_>
`
`
`
[FIG. 1 (Sheet 1 of 11): block diagram of the cache control system. CPU 102, containing IFU (Instruction Fetch Unit) 116 and IEU (Instruction Execution Unit) 120, is coupled to CCU (Cache Control Unit) 104, MEM 112, and VMU 122.]
`
[FIG. 2 (Sheet 2 of 11): CCU 104 detail. IFU 116 accesses the INSTRUCTION CACHE 204 (snoop and read only, no write); IEU 120 accesses the DATA CACHE 208 (read/write); the REQUEST/RETRIEVAL PROCESSOR 206 connects both caches to MCU 108.]
[FIG. 3 (Sheet 3 of 11): schematic of the D cache 208 (TAG RAM 302, S-RAM 304) and Request/Retrieval Processor 206 (cache control processor 306, RQ queue 308, RTN queue 310) coupled to MCU 108; the drawing text is illegible in this copy.]
`
[FIG. 4 (Sheet 4 of 11): cache hit/miss flow (steps 404, 408, 415). Receive request from CPU 114; check TAG RAM 302 to determine whether requested data are already in cache 208; if so, return requested data to CPU 114; if not, send the CPU request to Request/Retrieval Processor 206 for processing multiple data requests and out-of-order data returns.]
`
[FIG. 5 (Sheet 5 of 11): example data request address, divided into TAG, INDEX, SUBBLOCK SELECT, and LONG WORD SELECT portions (items 514, 524); much of the drawing is illegible in this copy.]
`
[FIG. 6 (Sheet 6 of 11): routine 600 for checking pending CPU requests (steps 610-670). The RQ queue is checked (610) and compared in turn against the index, tag, subblock select, and long word select portions of ADDR. Requests matching no pending entry go to cache miss subroutine 800, waiting until pending requests are resolved where required; a subblock match with a different long word goes to HTR routine 700; a match on the same long word means the same request and result are pending, so the routine waits.]
`
[FIG. 7 (Sheet 7 of 11): HTR routine 700 (steps 710-750). Check the HTR flag and read HTR ADDR; if the flag is already "1", wait until the pending request resolves; otherwise set HTR in the RQ queue to "1", store the long word select portion of ADDR into the HTR ADDR field of the RQ queue, and go to START for the next request.]
`
`
[FIG. 8 (Sheet 8 of 11): cache miss subroutine 800. Store the CPU request (CPU-ID, ADDR) in RQ queue 308 and set validity bits (810); identify the CCU-ID corresponding to the stored CPU-ID (820); send ADDR and CCU-ID in a data request to the MCU (830).]
`
[FIG. 9 (Sheet 9 of 11): data return routine 900 (steps 910-980). Receive DATA and CCU-ID from the MCU; using the CCU-ID, get CPU-ID, ADDR, HTR ADDR, HTR-CPU-ID, and the HTR flag from RQ queue 308; set counter i=0 and read DATA and START ADDR from the RTN queue. For each long word, compare START ADDR + i with the long word select portion of ADDR and, on a match, send the long word of data and CPU-ID; also check whether the HTR flag is set and HTR-ADDR = START ADDR + i and, if so, send the long word of data and HTR-CPU-ID; then increment the counter (i=i+1).]
`
`
[FIG. 10 (Sheet 10 of 11), prior art: conventional superset cache. A physical address from VMU 122 feeds TAG RAMs TRAM1 and TRAM2 and storage memories SRAM1 and SRAM2; a select signal drives a MUX to choose a single output (items 1000-1030).]
`
[FIG. 11 (Sheet 11 of 11): superset cache according to the present invention. Two requests (P0, P1) from VMU 122 pass through a switch (1102, 1104) to TAG RAMs TRAM1 1101 and TRAM2 1121 and on to storage memories SRAM1 1105 and SRAM2 1125 (via 1110, 1122, 1124), whose outputs feed the IEU/IFU, so both requests are checked simultaneously.]
`
`SYSTEM AND METHOD FOR PROCESSING
`MULTIPLE REQUESTS AND OUT OF
`ORDER RETURNS
`
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a data processor having a hierarchical memory arrangement including a cache memory to speed data retrieval. More specifically, the present invention relates to an improved system and method for cache control and management which processes out-of-order return data and instructions, and/or multiple fetch requests. According to another aspect of the present invention, sets of cache memory can be checked for multiple requests. Multiple instructions and/or data can then be returned to fulfill more than one request at a time.

2. Related Art

Increases in processor speed have led to the development of memory devices having very fast access times. The cost of such memory devices, however, is often proportionally related to the speed at which the devices can be accessed. Thus, to store all of a processor's data and program instructions in very fast memory can lead to very high memory costs.

To minimize the cost associated with high-speed memory while still reaping the benefits of fast access times, system designers have implemented cache memories. In a cache memory system, the majority of instructions and program data are stored in standard memory such as a disc, hard drive, or low-speed random access memory (RAM). A relatively small amount of high-speed memory, called a cache, is provided to store a subset of the program data and/or instructions. In this patent document, the term data, when used in reference with storage in the cache, refers generally to either instruction execution data or to program instructions.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
Typically, those data that are most frequently accessed by the processor are stored in the cache. As a result, these data can be accessed by the processor at a much faster rate. Additionally, some systems implement an instruction prefetch, wherein instructions are fetched from low-speed storage in advance and stored in the cache. As a result, the instructions are already in cache when needed by the processor and can be accessed quickly.

System designers frequently implement cache storage to speed access times for data and instructions. In such systems, a cache control unit is often implemented to manage the storage of data in the cache and provide the data to the processor. In these systems, the instruction fetch unit and instruction execution unit go to the cache control unit to request the required data. Upon receipt of a request, the cache control unit first searches the cache for the data requested. If the requested data exist in the cache, the data are provided to the requesting unit from the cache. This condition is known as a cache hit. If the data are not present in the cache, the cache control unit retrieves the data from storage and stores the data in a known location in the cache.

Instruction requests are often handled in sequence according to their order in a particular application program. Bottlenecks and delays occur as outstanding requests, which might otherwise be executed more quickly, wait for preceding slower instructions to be processed.
For example, in multi-processor systems sharing two or more buses, a central processor unit (CPU) stall condition arises when one of the buses is occupied in servicing a particular processor. When data requests or instructions are executed in sequence, the other processors depending on the occupied bus wait for the bus to become available before proceeding to process other data requests and instructions, regardless of the availability of other buses.
Delays can further arise when sequential data requests and instructions are made to wait for data return from storage devices which have a different rate of data return. For instance, data return from faster lower-level storage devices such as dynamic random access memory (DRAM) still must wait for preceding data requests made to slower lower-level storage devices such as an input/output (I/O) device.

To improve efficiency and speed in processing program instructions, many contemporary processors are capable of dynamic scheduling of instructions. Dynamic scheduling means that instructions can be retrieved from memory, scheduled, and executed, all in an order that is different from the program order. At a general level, dynamic scheduling allows a pipelined processor to maximize the utilization of its resources by prioritizing the use of the resources between the multiple processors. Such dynamic scheduling, however, does not consider the potential disparity between the storage devices or the resources themselves in executing instructions or data requests. The CPU stalls encountered when a bus is servicing a particular processor are also not overcome.

The inventors have discovered that there is a need for optimizing instruction execution and data requests after a cache miss. In particular, there is a need for accommodating out-of-order data returns and servicing multiple requests to increase the speed and efficiency of data retrieval. In multiple-processor systems sharing multiple buses it is especially desirable to avoid CPU stall. Further, it is desirable to avoid bottleneck and to quickly and efficiently accommodate data requests made to a variety of lower-level storage devices with different response times.
`SUMMARY OF THE INVENTION
`
The present invention optimizes the processing of data requests and instructions after cache misses. By converting CPU requests into cache control requests, a sequence of multiple requests for data can be sent to various lower-level storage devices or shared buses. These cache control requests include cache control unit identification tags (CCU-ID tags) for identifying individual requests. Returning data is then received and forwarded to the requesting CPU in the order of return provided by the lower-level storage devices or shared buses.

In this manner, data is retrieved after cache misses more quickly and efficiently than processing data requests in sequence. Because the cache control ID tags allow individual requests to be distinguished, multiple requests for data can be pending at the same time. Such requests can also be received "out of order" from lower-level memory, that is, in the order in which a storage device returns the data being sought.

Out-of-order data return is especially advantageous in a system having numerous memory or storage devices. For example, such a system may include DRAM, read-only memory (ROM), electrically programmable read-only memory (EPROM), disk storage, and I/O devices. Typically, it is expected that data would be returned from DRAM quicker than they would be returned from disk, or from I/O. Thus, a request from the CPU for data from I/O would not create a bottleneck as in prior systems, as subsequent requests to devices such as a DRAM with a faster rate of return can still be filled.
`
`
`
`
Conventional systems were limited in their ability to return data out of order due to their lack of a technique for tracking the data returned. Retaining the CPU-ID identifying tag with the data request greatly increased the number of bits transmitted for each request. Such overhead costs were prohibitive. Without the ability to track the data being returned, the conventional cache control unit would not know which data correspond to which request.
According to one embodiment of the invention, a sequence of CPU requests for data not previously found in cache memory is stored in a request queue. Each CPU request includes a CPU-ID tag identifying the CPU issuing the request for data and an address identifying a location in lower-level memory where the data is stored. Cache-control ID tags are then assigned to identify the locations in the request queue of the respective CPU-ID tags associated with each CPU request.

Cache-control requests consisting of the cache-control ID tags and the respective address information are then sent from the request queue to the lower-level memory or storage devices. For example, the cache-control requests can be sent by a cache control processor to a memory control unit for transmittal to various storage devices.

Data is then returned along with the corresponding CCU-ID tags in the order in which it is returned by the storage devices. A cache control processor then retrieves the associated CPU-ID tags identified by the CCU-ID tags. The sequence of CPU requests for data is fulfilled by returning the data and CPU-ID tag in the order in which the data is returned.
`
According to a further feature of the present invention, the cache control processing maintains Hit on Request (HTR) information to service multiple requests for the different long words in a single returned subblock.

In another embodiment of the present invention, cache hits for multiple requests are determined by checking sets of Tag RAMs simultaneously in each cache. Multiple instructions can then be issued in response to cache hits.
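As a rough illustration of this simultaneous checking, the C sketch below probes two Tag RAM banks for two outstanding requests in the same cycle, in the spirit of FIG. 11; the function tram_lookup and the two-bank arrangement are assumptions of the sketch, not details taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bank tag probe: returns true on a tag match. */
extern bool tram_lookup(int bank, uint32_t phys_addr);

/* Check two requests against two Tag RAM banks in the same cycle,
 * so an instruction can issue for each request that hits.        */
void superset_check(uint32_t addr0, uint32_t addr1, bool hit[2])
{
    hit[0] = tram_lookup(0, addr0);  /* e.g. TRAM1 serves request 0 */
    hit[1] = tram_lookup(1, addr1);  /* e.g. TRAM2 serves request 1 */
}
```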
The cache control processing of the present invention can be implemented using numerous different architectures in a number of different environments. For ease of description, the cache control unit will be described in terms of an example architecture implemented in an example environment. Based on the above and following description, it will become apparent to a person skilled in the relevant art how to implement the cache control unit using different architectures in this or alternative environments.

Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
`
BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a cache control system for processing requests for data according to the present invention.

FIG. 2 is a block diagram showing an example of an instruction cache and data cache in a cache control unit according to the present invention.

FIG. 3 is a schematic diagram of a cache and cache control processor according to a first embodiment of the present invention.

FIG. 4 is a high-level operational flow diagram showing a cache hit/miss and data return according to the present invention.

FIG. 5 is a diagram showing an example of a data request address as used in the present invention.

FIG. 6 is a flow diagram showing a routine for processing CPU requests in a cache controller processor according to the present invention.

FIG. 7 is a flow diagram showing a HTR subroutine for servicing multiple requests made for data in the same subblock according to the present invention.

FIG. 8 is a flow diagram showing a subroutine for processing selected CPU requests further to accommodate out-of-order data return and multiple requests according to the present invention.

FIG. 9 is a flow diagram showing a subroutine for returning data according to the present invention.

FIG. 10 is a schematic diagram showing a conventional superset cache.

FIG. 11 is a schematic diagram showing a superset cache according to the present invention.

The present invention is described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
`
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Overview and Discussion of the Invention

The present invention optimizes the processing of data requests and instructions after cache misses. By converting CPU requests into cache control requests, a sequence of multiple requests for data can be sent to various lower-level storage devices. These cache control requests include cache control ID tags for identifying individual requests. Returning data is then received and forwarded to the requesting CPU in the order of return provided by the lower-level storage devices.

In this manner, data is retrieved from lower-level memory after cache misses more quickly and efficiently than processing data requests in sequence. Because the cache control ID tags allow individual requests to be distinguished, multiple requests for data can be pending at the same time. Such requests can also be received "out of order" from lower-level memory, that is, in the order in which a storage device returns the data being sought.

Out-of-order data return is especially advantageous in a system having numerous memory or storage devices. For example, such a system may include DRAM, ROM, EPROM, disk storage, and I/O devices. Typically, it is expected that data would be returned from DRAM quicker than they would be returned from disk, or from I/O. Thus, a request from the CPU for data from I/O would not create a bottleneck as in prior systems, as subsequent requests to devices such as a DRAM with a faster rate of return can still be filled.
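By way of illustration only, the C sketch below models the tagging scheme just described: a miss parks the CPU-ID and address in a request-queue slot, the slot number serves as the CCU-ID sent along to memory, and returns are forwarded in whatever order they arrive. The queue depth and the helper routines mcu_send and cpu_deliver are assumptions made for the sketch, not elements of the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define RQ_DEPTH 8                 /* illustrative request-queue depth */

/* Hypothetical helpers standing in for the MCU and CPU interfaces. */
extern void mcu_send(int ccu_id, uint32_t addr);
extern void cpu_deliver(uint32_t cpu_id, const uint8_t *data);

/* One pending CPU request; its slot number doubles as the CCU-ID. */
typedef struct {
    bool     valid;                /* validity bit V                  */
    uint32_t cpu_id;               /* tag naming the requesting unit  */
    uint32_t addr;                 /* address of the missed data      */
} rq_entry;

static rq_entry rq[RQ_DEPTH];

/* Cache miss: park the CPU request, then forward only the short
 * CCU-ID plus address to the memory control unit.                  */
int ccu_issue(uint32_t cpu_id, uint32_t addr)
{
    for (int ccu_id = 0; ccu_id < RQ_DEPTH; ccu_id++) {
        if (!rq[ccu_id].valid) {
            rq[ccu_id] = (rq_entry){ true, cpu_id, addr };
            mcu_send(ccu_id, addr);
            return ccu_id;
        }
    }
    return -1;                     /* queue full: caller must stall   */
}

/* Data return, in whatever order the storage devices answer: the
 * CCU-ID recovers the original CPU-ID, so the data is forwarded at
 * once without waiting for older outstanding requests.             */
void ccu_return(int ccu_id, const uint8_t *data)
{
    cpu_deliver(rq[ccu_id].cpu_id, data);
    rq[ccu_id].valid = false;      /* free the slot for reuse         */
}
```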
`
2. Example Environment

FIG. 1 is a diagram illustrating an example environment for the cache control unit. This environment includes a processor 102 (referred to as CPU 102), cache control unit 104, a memory control unit 108, memory 112, and a virtual memory unit 122. The terms "CPU," "processor," and "central processing unit" are used throughout to refer in general to any computer processing device such as a microprocessor. CPU 102 includes an instruction fetch unit (IFU) 116 and an instruction execution unit (IEU) 120. The principles of memory-hierarchy design, and in particular, the
`10
`
`15
`
`20
`
`25
`
`3O
`
`35
`
`45
`
`SO
`
`55
`
`65
`
`
`
`
accessing of cache memory, main memory, and/or virtual memory by computer processors are generally well-known and described, for example, in the book by John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., San Mateo, Calif., 1990 (incorporated by reference herein).

Memory 112 is cost-effective storage that generally has a slower access time than cache memory. Memory 112 can include, but is not limited to, any of a number of storage devices such as I/O devices, EPROM, disk or tape storage, ROM, DRAM, and so forth. Memory control unit (MCU) 108 controls access to memory 112. Memory control unit 108 can be, for example, an I/O controller or a memory controller. In this document, memory control unit 108 is sometimes referred to generically as a resource control unit 108.
Instruction fetch unit 116 fetches instructions for the processor. That is, instruction fetch unit 116 requests desired instructions from cache control unit (CCU) 104. The instructions can be fetched in advance, or when needed. Instruction execution unit 120 fetches program data required to execute an instruction. Similarly, IEU 120 requests the data from CCU 104. An example of execution data is a source operand for an instruction.
`
The virtual memory unit (VMU) 122 is further included for translating or mapping virtual addresses in instruction or data requests from the CPU 102 into physical addresses. For instance, in a paged virtual memory type of scheme, the virtual memory unit 122 translates virtual addresses received from the CPU 102 through instruction fetch unit IFU 116 into physical addresses. In one example of a virtual addressing scheme, a virtual page number can be passed through a page table and added to a page offset value to obtain the physical address.
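A minimal C sketch of that translation step, assuming 4 KB pages and a flat, single-level page table (neither of which is specified here):

```c
#include <stdint.h>

#define PAGE_BITS 12                     /* assume 4 KB pages          */
#define PAGE_SIZE (1u << PAGE_BITS)

extern uint32_t page_table[];            /* hypothetical: VPN -> frame */

/* Look the virtual page number up in the page table, then add the
 * page offset to the frame's base address, as the text describes.   */
uint32_t vmu_translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & (PAGE_SIZE - 1);
    return (page_table[vpn] << PAGE_BITS) + offset;
}
```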
FIG. 2 is a block diagram illustrating an example implementation of the CCU 104. The cache memory is divided into an instruction cache (I cache) 204 and a data cache (D cache) 208. I cache 204 is often implemented as a read-only memory with respect to CPU 102. That is, CPU 102 can only read instructions from I cache 204. There may be environments, however, where it is desirable to implement I cache 204 as a read/write memory. One such environment is where CPU 102 may modify the code.

In contrast, D cache 208 is typically implemented as a read/write memory. This is because program data are frequently changed as a result of the executing code. Therefore, it is desirable for CPU 102 to have the ability to write data directly to the data cache 208.

A Request/Retrieval Processor 206 processes data requests between IFU 116 or IEU 120 and MCU 108. The Request/Retrieval Processor 206 tracks requests for data from either the IFU 116 or the IEU 120 during cache misses. Data returned from the MCU 108 is then matched to the appropriate IFU or IEU data request, even for out-of-order data returns.
`
As shown in FIG. 2, the Request/Retrieval Processor 206 can also be connected to each cache memory, I cache 204 and D cache 208. In this way, information can be passed directly between each cache memory and the Request/Retrieval Processor 206. Although a single, common Request/Retrieval Processor 206 is shown in FIG. 2, it would be apparent to one skilled in the art that separate Request/Retrieval Processors dedicated to each cache memory 204 and 208, respectively, could be provided.
3. Cache Control Processor Embodiment
`
FIG. 3 is a block diagram illustrating an example implementation of a cache control processor according to one embodiment of the present invention. For the purpose of description, the implementation is described with respect to D cache 208. It would be readily apparent to one skilled in the art, however, that this example implementation can also be equally applied to I cache 204.

In particular, FIG. 3 depicts in greater detail the elements of the Request/Retrieval Processor 206 and data cache 208 according to one embodiment of the present invention.
The D cache 208 includes a typical dual-port TAG RAM 302 and an S-RAM 304. Static random access memory (S-RAM) 304 preferably is a high-speed memory device suitable for cache memory, such as a static RAM. The dual-port TAG RAM 302 stores block-frame addresses identifying the corresponding blocks of data which are stored in S-RAM 304. To ascertain whether the S-RAM 304 contains requested data, only the addresses of the TAG RAM 302 need to be searched. The placement of blocks of data in S-RAM 304 can be either a directly mapped, fully associative, or N-way set associative type of placement. Preferably, the D cache 208 is two-way set associative.
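A two-way lookup of this kind can be sketched in C as follows; the set count and the address field boundaries are illustrative assumptions, since the patent does not fix a cache geometry at this point.

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS     2         /* the D cache is two-way set associative  */
#define NUM_SETS 256       /* assumed geometry, for illustration      */

typedef struct {
    bool     valid;
    uint32_t tag;          /* block-frame address kept in TAG RAM 302 */
} tag_entry;

static tag_entry tag_ram[NUM_SETS][WAYS];

/* Hit test: only the TAG RAM is searched. The index selects a set
 * and both ways are compared against the tag portion of the address.
 * The bit positions (index at bit 5, tag at bit 13) are assumptions. */
bool cache_hit(uint32_t addr, int *way_out)
{
    uint32_t index = (addr >> 5) & (NUM_SETS - 1);
    uint32_t tag   = addr >> 13;

    for (int way = 0; way < WAYS; way++) {
        if (tag_ram[index][way].valid && tag_ram[index][way].tag == tag) {
            *way_out = way;   /* data can be read from S-RAM 304      */
            return true;
        }
    }
    return false;             /* miss: go to Request/Retrieval path   */
}
```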
`
As shown in FIG. 3, the dual-port TAG RAM 302 has two ports, a READ/WRITE port and a READ port. The READ/WRITE port allows reading and writing of appropriate block-frame addresses in TAG RAM 302 and the corresponding data in S-RAM 304, while the READ port allows snooping for synchronizing multiple-processor operations and bus coherency.
FIG. 3 further depicts a two-level memory hierarchy. As is well-known, the upper cache level allows quicker retrieval of data stored in the cache S-RAM 304 compared to the greater storage capacity but slower retrieval time of data stored in the main memory level in memory 112. Such a two-level memory hierarchy is illustrative only. For instance, it would be obvious to one of ordinary skill in the art to configure additional levels of memory as desired.

In general, such dual-port TAG RAMs and their operation in a CPU having a hierarchical memory arrangement are well-known, as described for instance in the Hennessy et al. book cited above. According to the present invention, however, a Request/Retrieval Processor 206 is further provided to enable processing of multiple requests and out-of-order data returns from memory during cache misses.

As shown in the embodiment of FIG. 3, the Request/Retrieval Processor 206 consists of a cache control processor 306, a request queue (RQ) 308, and a return queue (RTN) 310.
`
In general, the cache controller processor 306 processes CPU requests for data after TAG RAM 302 has been searched and the data being requested has not been found. After such cache misses, cache control processor 306 screens and classifies each CPU request. This classifying, which will be described in even further detail with respect to FIG. 8, basically checks whether similar CPU requests are pending already for the same subblock, and in particular for the same long words within a subblock. In this way, the Hit on Request fields (HTR flag, HTR-Addr, and HTR-CPU-ID) can be set to allow multiple requests for different long words in the same subblock to be serviced by one subblock data return. Also, complexities arising from two CPU requests for the exact long word of data which directly map to the same cache location can be avoided.
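A rough C rendering of these Hit on Request fields, with names following the text and all widths assumed for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hit-on-Request fields carried by each RQ queue entry. */
typedef struct {
    bool     htr_flag;     /* a second request is riding this fetch   */
    uint32_t htr_addr;     /* its long word select bits               */
    uint32_t htr_cpu_id;   /* the second requester                    */
} htr_fields;

/* A later miss to a subblock already being fetched does not issue a
 * second memory request; it is recorded as a Hit on Request.         */
void note_hit_on_request(htr_fields *h, uint32_t lw_select,
                         uint32_t cpu_id)
{
    h->htr_flag   = true;
    h->htr_addr   = lw_select;
    h->htr_cpu_id = cpu_id;
}

/* When the subblock returns, each long word i is also checked against
 * the HTR fields, so one return services both pending requests.      */
bool htr_matches(const htr_fields *h, uint32_t long_word_index)
{
    return h->htr_flag && h->htr_addr == long_word_index;
}
```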
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
CPU-ID and ADDR information corresponding to a particular data request are then pushed onto the respective CPU-ID and RD ADDRESS fields in the request queue 308 of cache controller processor 306. A validity bit V is then set to indicate the presence of valid information in the CPU-ID and ADDR fields for that stored CPU request.
`
`
`
`
The CPU-ID can be an IFU tag which allows the IFU 116 to identify the requesting CPU. Similarly, the CPU-ID can consist of an IEU tag issued from IEU 120. The ADDR corresponds to a physical address identifying the data being requested. For example, as will be described in further detail below with respect to FIG. 5, data is returned in subblocks; in other words, the cache fill granularity is one subblock. The ADDR information pushed on RQ queue 308, then, identifies the particular subblock sought from storage and further identifies a particular long word of data within the subblock.
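Assuming, purely for illustration, 4-byte long words, eight long words per 32-byte subblock, two subblocks per block, and 256 sets (the patent does not fix these widths here), the FIG. 5 address fields could be extracted as:

```c
#include <stdint.h>

/* Illustrative FIG. 5 address split; all field widths are assumed. */
#define LW_SELECT(a)  (((a) >> 2) & 0x7u)   /* long word select      */
#define SUBBLOCK(a)   (((a) >> 5) & 0x1u)   /* subblock select       */
#define SET_INDEX(a)  (((a) >> 6) & 0xFFu)  /* index portion         */
#define TAG_BITS(a)   ((a) >> 14)           /* tag portion           */
```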
`
Each RQ queue 308 entry, i.e., the CPU-ID, RD Address, and associated Hit on Request fields, is addressed by a CCU-ID to identify that entry on the RQ queue 308. For example, the CCU-ID can be a pointer, not necessarily stored on the RQ queue, for identifying each CPU request entry stored on the RQ queue 308. The bit-length of the CCU-ID then depends upon the depth of the RQ queue 308 and is preferably shorter in length than the CPU-ID. In one example, the RQ queue 308 can accommodate up to 8 pending data requests. A 3-bit CCU-ID is sufficient to uniquely identify up to 8 data requests stored in the RQ queue. These CCU-IDs can be assigned or recognized by the controller as new CPU requests are pushed on the RQ queue 308.
`
Alternatively, a scoreboard type of sy