US005745913A

United States Patent [19]
Pattin et al.

[11] Patent Number: 5,745,913
[45] Date of Patent: Apr. 28, 1998
5,479,640 12/1995  Cartman et al. .......... 395/438
5,481,591  1/1996  Day, III et al. ......... 395/425
5,481,707  1/1996  Murphy, Jr. et al. ...... 395/650
5,495,339  2/1996  Stegbauer et al. ........ 358/296
5,604,884  2/1997  Thome et al. ............ 395/494
OTHER PUBLICATIONS

IBM 16 Mbit Synchronous DRAM Databook, IBM03164-09C, pp. 1-100, Jan. 1996.

Primary Examiner—Tod R. Swann
Assistant Examiner—David Langjahr
Attorney, Agent, or Firm—Stuart T. Auvinen

[57] ABSTRACT
Memory requests from multiple processors are re-ordered to maximize DRAM row hits and minimize row misses. Requests are loaded into a request queue and simultaneously decoded to determine the DRAM bank of the request. The last row address of the decoded DRAM bank is compared to the row address of the new request, and a row-hit bit is set in the request queue if the row addresses match. The bank's state machine is consulted to determine if RAS is low or high, and a RAS-low bit in the request queue is set if RAS is low and the row still open. A row counter is reset for every new access but is incremented with a slow clock while the row is open but not being accessed. After a predetermined count, the row is considered "stale". A stale-row bit in the request queue is set if the decoded bank has a stale row. A request prioritizer reviews requests in the request queue and processes row-hit requests first, then row misses which are to a stale row. Lower in priority are row misses to non-stale rows which have been more recently accessed. Requests loaded into the request queue before the cache has determined if a cache hit has occurred are speculative requests and can open a new row when the old row is stale or closed.

19 Claims, 8 Drawing Sheets
[54] MULTI-PROCESSOR DRAM CONTROLLER THAT PRIORITIZES ROW-MISS REQUESTS TO STALE BANKS

[75] Inventors: Jay C. Pattin, Redwood City; James S. Blomgren, San Jose, both of Calif.

[73] Assignee: Exponential Technology, Inc., San Jose, Calif.

[21] Appl. No.: 691,005

[22] Filed: Aug. 5, 1996

[51] Int. Cl.6 ............................... G06F 12/00

[52] U.S. Cl. .............. 711/105; 711/111; 395/413; 395/405; 364/DIG. 1

[58] Field of Search ....................... 365/230.01; 395/405, 413, 438, 432

[56]
References Cited

U.S. PATENT DOCUMENTS

5,051,889  9/1991  Fung et al. ............. 364/200
5,265,236 11/1993  Mehring et al. .......... 395/413
5,269,010 12/1993  MacDonald ............... 395/405
5,280,571  1/1994  Keith et al. ............ 395/143
5,289,584  2/1994  Thome et al. ............ 395/325
5,301,287  4/1994  Herrell et al. .......... 395/400
5,301,2%   4/1994  Hilton et al. ........... 395/425
5,307,320  4/1994  Fairer et al. ........... 365/230.01
5,353,416 10/1994  Olson ................... 395/325
5,357,606 10/1994  Adams ................... 395/164
5,392,239  2/1995  Margulis et al. ......... 365/189.01
5,392,436  2/1995  Jansen et al. ........... 395/725
5,412,788  5/1995  Collins et al. .......... 395/425
5,438,666  8/1995  Craft et al. ............ 395/842
5,440,752  8/1995  Lentz et al. ............ 395/800
5,448,702  9/1995  Garcia, Jr. et al. ...... 395/325
5,450,564  9/1995  Hassler et al. .......... 395/495
U.S. Patent    Apr. 28, 1998    Sheet 1 of 8    5,745,913

[Drawing sheet 1: FIG. 1, the multi-processor chip with CPU cores, level-2 cache, internal bus 52, and PCI interface]
[Drawing sheet 2: FIG. 3, burst/row counter 20 with NEXT_STATE, CLR, TIME_VAL, BURST_CNT, CLK, CLK_DIV10, ROW_ACTIVE, ROW_IDLE$, and ROW_OLD signals]
[Drawing sheet 3: request queue, active request processing, ROW_HIT/LOW/OLD bits, and per-bank row-address registers 14]
[Drawing sheet 8: FIG. 9, average latency in cycles (20 to 50) versus miss rate per CPU (2% to 7%) and bus utilization (32% to 96%)]
MULTI-PROCESSOR DRAM CONTROLLER THAT PRIORITIZES ROW-MISS REQUESTS TO STALE BANKS

BACKGROUND OF THE INVENTION—FIELD OF THE INVENTION

This invention relates to multi-processor systems, and more particularly to DRAM controllers which re-order requests from multiple sources.

BACKGROUND OF THE INVENTION—DESCRIPTION OF THE RELATED ART
Multi-processor systems are constructed from one or more processing elements. Each processing element has one or more processors, and a shared cache and/or memory. These processing elements are connected to other processing elements using a scaleable coherent interface (SCI). SCI provides a communication protocol for transferring memory data between processing elements. A single processing element which includes SCI can later be expanded or upgraded.

An individual processing element typically contains two to four central processing unit (CPU) cores, with a shared cache. The shared cache is connected to an external memory for the processing element. The external memory is constructed from dynamic RAM (DRAM) modules such as single-inline memory modules (SIMMs). The bandwidth between the shared cache and the external memory is critical and can limit system performance.
Standard DRAM controllers for uni-processors have been available commercially and are well-known. However, these controllers, when used for multi-processor systems, do not take advantage of the fact that multiple processors generate the requests to the external memory. Often requests from different processors can be responded to in any order, not just the order received by the memory controller. Unfortunately, DRAM controllers for uni-processor systems do not typically have the ability to re-order requests. Thus standard DRAM controllers are not optimal for multi-processor systems.
Synchronous DRAMs are becoming available which provide extended features to optimize performance. The row can be left open, allowing data to be accessed by pulsing CAS without pulsing RAS again. Many CAS-only cycles can be performed once the row address has been strobed into the DRAM chips and the row left active. Burst cycles can also be performed where CAS is strobed once while data that sequentially follows the column address is bursted out in successive clock cycles.

What is desired is a DRAM controller which is optimized for a multi-processor system. It is desired to re-order requests from different CPU cores in an optimal fashion to increase bandwidth to the external DRAM memory. It is also desired to use burst features of newer synchronous DRAMs to further increase bandwidth.
SUMMARY OF THE INVENTION

A memory controller accesses an external memory in response to requests from a plurality of general-purpose processors. The memory controller has a plurality of bank controllers. Each bank controller accesses a bank of the external memory. Each bank controller has a state machine for sequencing control signals for timing access of the external memory. The state machine outputs a row-address-strobe RAS-active indication of a logical state of a RAS signal line coupled to the bank of the external memory. A row address register stores a last row address of a last-accessed row of the bank of the external memory.

A counter includes reset means for resetting the counter upon completion of a data-transfer access to the bank of the external memory. The counter periodically increments when the row is active and no data-transfer access occurs. The counter outputs a stale-row indication when a count from the counter exceeds a predetermined count.

A request queue stores requests from the plurality of general-purpose processors for accessing the external memory. The request queue stores row-status bits including:

a) a row-hit indication when the last row address matches a row address of a request,

b) the row-active indication from the state machine, and

c) the stale-row indication from the counter.

A request prioritizer is coupled to the request queue. It determines a next request from the request queue to generate a data-transfer access to the external memory. The request prioritizer re-orders the requests into an order other than the order the requests are loaded into the request queue. Thus requests from the plurality of processors are re-ordered for accessing the external memory.

In further aspects of the invention, the counter is incremented by a slow clock having a frequency divided down from a memory clock when the row line is active and no data-transfer access occurs, but the counter is incremented by the memory clock during the data-transfer access. Thus the counter increments more slowly when no data-transfer occurs than when a data-transfer access is occurring.

In further aspects the predetermined count for determining the stale-row indication is programmable. The request prioritizer re-orders requests having the stale-row indication before requests not having the stale-row indication when no requests have the row-hit indication. Thus miss requests to stale rows are processed before miss requests to more-recently-accessed rows.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a multi-processor chip which connects to an external memory bus.

FIG. 2 is a diagram of a state machine for accessing a dynamic random-access memory (DRAM) bank using page-mode access.

FIG. 3 is a diagram of a burst counter which is also used to indicate how long the current row has been open without being accessed.

FIG. 4 is a diagram of a DRAM controller for separately accessing multiple banks of DRAM.

FIG. 5 is a diagram of the fields for a memory request in the request queue.

FIG. 6 is a diagram showing that the status of the DRAM row is determined while the cache is determining if a cache hit or miss has occurred.

FIG. 7 is a waveform showing a row being opened on one bank while a different bank is bursting data.

FIG. 8 is a timing diagram of SDRAM accesses using normal and auto-precharge.

FIG. 9 highlights a decrease in average latency when using request re-ordering.
DETAILED DESCRIPTION

The present invention relates to an improvement in embedded DRAM controllers. The following description is presented to enable one of ordinary skill in the art to make and use the invention as provided in the context of a particular application and its requirements. Various modifications to the preferred embodiment will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.
Multi-Processor Die—FIG. 1

FIG. 1 is a diagram of a multi-processor chip for connecting to an external memory bus. Four CPU cores 42, 44, 46, 48 each contain one or more execution pipelines for fetching, decoding, and executing general-purpose instructions. Each CPU core has a level-one primary cache. Operand or instruction fetches which miss in the primary cache are requested from the larger second-level cache 60 by sending a request on internal bus 52.

CPU cores 42, 44, 46, 48 may communicate with each other by writing and reading data in second-level cache 60 without creating traffic outside of chip 30 on the external memory bus. Thus inter-processor communication is sometimes accomplished without external bus requests. However, communication to other CPUs on other dies does require external cycles on the memory bus.

Requests from CPU cores 42, 44, 46, 48 which cannot be satisfied by second-level cache 60 include external DRAM requests and PCI bus requests such as input-output and peripheral accesses. PCI requests are transferred from second-level cache 60 to PCI interface 58, which arbitrates for control of PCI bus 28 using PCI-specific arbitration signals. Bus 54 connects second-level cache 60 to PCI interface 58 and BIU 56.

Memory requests which are mapped into the address space of the DRAMs rather than the PCI bus are sent from second-level cache 60 to bus-interface unit (BIU) 56. BIU 56 contains a DRAM controller so that DRAM signals such as chip-select (CS), RAS, CAS, and the multiplexed row/column address are generated directly rather than by an external DRAM controller chip connected to a local bus. Communication to the other processors is accomplished through scaleable-coherent interface SCI 57, which transfers data to other processing elements (not shown).
DRAM Access Requirements

The cost of a DRAM chip is typically reduced by multiplexing the row and column addresses to the same input pins on a DRAM package. A row-address strobe (RAS) is asserted to latch the multiplexed address as the row address, which selects one of the rows in the memory array within the DRAM chip. A short period of time later a column-address strobe (CAS) is strobed to latch in the column address, which is the other half of the full address. Accessing DRAM thus requires that the full address be divided into a row address and a column address. The row address and the column address are strobed into the DRAM chip at different times using the same multiplexed-address pins on the DRAM chip.

DRAM manufacturers require that many detailed timing specifications be met for the RAS and CAS signals and the multiplexed address. At the beginning of an access, the RAS signal must remain inactive for a period of time known as the row or RAS precharge period. This often requires several clock periods of a processor's clock. After RAS precharge has occurred, the row address is driven out onto the multiplexed address bus and RAS is asserted. Typically the row address must be driven one or more clock periods before RAS is asserted to meet the address set-up requirement. Then the column address is driven to the multiplexed address bus and CAS is asserted.
For synchronous DRAMs, a burst mode is used where CAS is pulsed before the data is read out of the DRAM chip. On successive clock periods, a burst of data from successive memory locations is read out. CAS is not reasserted for each datum in the burst. The internal row is left asserted during this time although the external RAS signal may have become inactive. CAS may then be asserted again with a different column address when additional bursts of data have the same row address but different column addresses. Indeed, a row address may be strobed in once followed by dozens or hundreds of CAS-only bursts. The internal row is left asserted as long as possible, often until the next refresh occurs. This is sometimes referred to as page-mode. Since a typical multiplexed address has 11 or more address bits, each row address or "page" has at least 2^11, or 2048, unique column addresses.

When the arbitration logic for DRAM access determines that the current access is to a row that is not requested by any other CPUs, then a row precharge is performed upon completion of requests to the row when other requests are outstanding to other rows in the bank. This reduces latency to these other rows.
The row address is typically the higher-order address bits while the column address is the lowest-order address bits. This address partition allows a single row to contain 2K or more contiguous bytes which can be sequentially accessed without the delay to strobe in a new row address and precharge the row. Since many computer programs exhibit locality, where memory references tend to be closer rather than farther away from the last memory reference, the DRAM row acts as a small cache.
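The address partition above can be sketched in a few lines of Python. The field widths used here (11 column bits, 11 row bits, 2 bank bits) are illustrative assumptions; the text only fixes the ordering, with the column bits lowest and the row bits above them.

```python
# Hypothetical address layout; widths are examples, not values from the patent.
COL_BITS, ROW_BITS, BANK_BITS = 11, 11, 2

def split_address(addr: int):
    """Split a flat physical address into (bank, row, column) fields."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col
```

Because the column field occupies the lowest-order bits, two addresses that differ only in those bits fall in the same bank and row, which is exactly the row-hit case the controller exploits.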
The DRAM is arranged into multiple banks, with the highest-order address bits being used to determine which bank is accessed. Sometimes lower-order address bits, especially address bits between the row and column address bits, are used as bank-select bits. See "Page-interleaved memory access", U.S. Pat. No. 5,051,889 by Fung et al., assigned to Chips and Technologies, Inc. of San Jose, Calif. Each DRAM bank acts as a separate row-sized cache, reducing access time by avoiding the RAS precharge and strobing delay for references which match a row address of an earlier access.

A synchronous DRAM adds a chip-select (CS) and a clock signal to the standard set of DRAM pins. The multiplexed address, and the RAS, CAS, and WE control signals, are latched into the DRAM chip when CS is active on a rising edge of the clock. Thus RAS, CAS, and WE are ignored when the chip-select signal is inactive. The internal row may be activated by having RAS and CS low on a rising clock edge. The row remains activated when RAS becomes inactive as long as CS remains inactive to that DRAM chip. Other rows in other DRAM chips may then be activated by asserting a separate CS to these other DRAM chips. Normally the multiplexed address and the RAS, CAS, WE control signals are shared among all banks of DRAMs, but each bank gets a separate chip-select. Only the bank having the active chip-select signal latches the shared RAS, CAS, WE signals; the other banks with CS inactive simply ignore the RAS, CAS, WE signals.
Page-Mode DRAM State Machine—FIG. 2

FIG. 2 is a diagram of a state machine for accessing a dynamic random-access memory (DRAM) bank using page-mode access. A copy of DRAM state machine 10 is provided for each bank of DRAM which is to be separately accessed. Some DRAM banks may be disabled, indicated by disabled state 91 which cannot be exited without re-configuring the memory. Idle state 90 is first entered; the row is not yet activated. Row precharge state 92 is then entered; the row is still inactive. After a few clock periods have elapsed, the row or RAS precharge requirement has been met and RAS is asserted. Row active state 96 is then entered. CAS is then asserted for reads in state 98 and for writes in state 99. After the CAS cycle and any burst is completed, row active state 96 is re-entered and remains active until another access is requested.
Additional accesses that have the same row address are known as row hits, and simply assert CAS using states 98 or 99. When the row address does not match the row address of the last access, then a row miss occurs. Row precharge state 92 is entered, and the new row address is strobed in as RAS becomes active (falls) when entering state 96. Then the column address is strobed in states 98 or 99.

A timer is used to signal when a refresh is needed, and refresh state 93 is entered once any pending access is completed. Refresh is typically triggered by asserting CAS before the RAS signal, and a counter inside the DRAM chip increments for each refresh so that all rows are eventually refreshed.
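The row-hit versus row-miss behavior of the state machine can be sketched as a small software model. This is an illustrative analogue of state machine 10, not the hardware: precharge counts, burst timing, and refresh are collapsed into single steps, and the state names map to the numbered states in the text.

```python
# Simplified model of the per-bank page-mode state machine of FIG. 2.
IDLE, PRECHARGE, ROW_ACTIVE, CAS_READ, CAS_WRITE, REFRESH, DISABLED = range(7)

class BankStateMachine:
    def __init__(self):
        self.state = IDLE
        self.open_row = None          # last row strobed in while RAS fell

    def access(self, row: int, write: bool) -> str:
        """Step through the states for one access; return its type."""
        if self.state == DISABLED:
            raise RuntimeError("bank disabled until memory is re-configured")
        if self.state in (CAS_READ, CAS_WRITE):
            self.state = ROW_ACTIVE   # burst finished, row stays open
        if self.state == ROW_ACTIVE and row == self.open_row:
            kind = "row-hit"          # CAS-only cycle, no RAS precharge
        else:
            self.state = PRECHARGE    # close the old row (miss or idle bank)
            self.open_row = row       # strobe new row address, assert RAS
            self.state = ROW_ACTIVE
            kind = "row-miss"
        self.state = CAS_WRITE if write else CAS_READ
        return kind
```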
These states can be encoded into a four-bit field known as the BANK_STATE, which can be read by other logic elements. Table 1 shows a simple encoding of these states into a 4-bit field.

TABLE 1

DRAM State Encoding

Name             BANK_STATE
Idle, RAS High   0000
Row Precharge    0001
Row Active       0011
CAS Read         0100
CAS Write        0101
Refresh          0111
Not Installed    1111
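Table 1's encoding can be captured directly as a lookup. The helper below follows the later statement that RAS is effectively low, with the row active, in states 96, 98, and 99 (Row Active, CAS Read, CAS Write); treating refresh and precharge as RAS-high for this purpose is this sketch's reading of the text.

```python
# Table 1's 4-bit BANK_STATE encoding, taken verbatim from the patent.
BANK_STATE = {
    "Idle, RAS High": 0b0000,
    "Row Precharge":  0b0001,
    "Row Active":     0b0011,
    "CAS Read":       0b0100,
    "CAS Write":      0b0101,
    "Refresh":        0b0111,
    "Not Installed":  0b1111,
}

def ras_is_low(code: int) -> bool:
    """Row open (RAS effectively low) in the Row Active and CAS states."""
    return code in (0b0011, 0b0100, 0b0101)
```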
Burst and Row Open Timer—FIG. 3

FIG. 3 is a diagram of a burst counter which is also used to indicate how long the current row has been open without being accessed. Row counter 12 is provided for each bank of DRAMs which is separately accessible. Counter 20 is a 4-bit binary upward-counter which increments for each rising edge of the increment clock input INCR. Once the maximum count of 1111 (15 decimal) is reached, the counter holds the terminal count rather than rolling over to zero.

Each time a new state is entered in DRAM state machine 10 of FIG. 2, NEXT_STATE is pulsed and counter 20 is cleared to 0000. Thus counter 20 counts the time that DRAM state machine 10 remains in any state.

The processor's memory clock CLK is selected by mux 19 for all states except row active state 96. When any state requires multiple clock cycles, counter 20 is used to keep count of the number of clock cycles in that state. For example, precharge may require 3 clock cycles. Row precharge state 92 clears counter 20 when first entered, and remains active until the count from counter 20 reaches two, which occurs during the third clock period of CLK. Once the count TIME_VAL output from counter 20 reaches two, row precharge state 92 may be exited and row active state 96 entered.
When bursting of data is enabled, counter 20 is also used as a burst counter. When read CAS state 98 is entered, counter 20 is incremented each clock period as the data is burst until the burst count is reached. Finally read CAS state 98 is exited once the desired burst count is reached. For a burst of four data items, four clock periods are required, while a burst of eight data items requires eight clock cycles.

CAS may be pipelined so that a new column address is strobed in while the last data item(s) in the previous burst are being read or written. A simple pipeline is used to delay the data transfer relative to the state machine. For a CAS latency of three cycles, the burst of data actually occurs three clock cycles after it is indicated by the burst counter. In that case, read CAS state 98 is exited three cycles before the burst completes.
Counter 20 is used in a second way for row active state 96. The second use of counter 20 is to keep track of the idle time since the last CAS cycle when RAS is low. Mux 19 selects the clock divided by ten, CLK_DIV10, during row active state 96 (ROW_ACTIVE). This slower clock increments counter 20 at a slower rate. The slower rate is needed because row active state 96 may be operative for many cycles. In a sense, the DRAM bank is idle, but RAS has been left on for much of the time that row active state 96 is operative. Counter 20 can only count up to 15, so a fast clock would quickly reach the maximum count.

Counter 20 is slowly incremented by the divided-down clock CLK_DIV10. A programmable parameter ROW_IDLE$ is programmed into a system configuration register and may be adjusted by the system designer for optimum performance. Comparator 18 signals ROW_OLD when the count from counter 20 exceeds ROW_IDLE$, indicating that row active state 96 has been active for longer than ROW_IDLE$ periods of CLK_DIV10.
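The dual-use counter can be modeled in a few lines. This is a software sketch, not the circuit: the divide-by-10 is modeled with a phase counter, and a single `clk` call stands in for one rising CLK edge, with mux 19's selection passed as a flag.

```python
# Sketch of row counter 12 (FIG. 3): a saturating 4-bit counter cleared by
# NEXT_STATE, clocked by CLK in multi-cycle states but by CLK_DIV10 while
# the row sits open and idle.
class RowCounter:
    MAX = 15                                 # 4-bit counter saturates at 1111

    def __init__(self, row_idle_threshold: int, divider: int = 10):
        self.row_idle = row_idle_threshold   # the programmable ROW_IDLE$ value
        self.divider = divider               # CLK_DIV10 ratio
        self.count = 0
        self._phase = 0

    def next_state(self):
        """NEXT_STATE pulse: clear the counter on every state change."""
        self.count = 0
        self._phase = 0

    def clk(self, row_active_idle: bool):
        """One CLK edge; mux 19 selects CLK or the divided-down clock."""
        if row_active_idle:
            self._phase = (self._phase + 1) % self.divider
            if self._phase != 0:
                return                       # divided clock has not ticked yet
        self.count = min(self.count + 1, self.MAX)

    @property
    def row_old(self) -> bool:
        """Comparator 18: the open row is stale past ROW_IDLE$ slow ticks."""
        return self.count > self.row_idle
```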
Row Open Counter Indicates Stale Rows

ROW_OLD is thus signaled when row active state 96 has been active for a relatively long period of time. No data transfers from this bank have occurred in that time, since another state such as read CAS state 98 would have been entered, clearing counter 20. ROW_OLD is an indication of how stale the open row is. Such information is useful when determining whether another reference is likely to occur to that open row. Once a long period of time has elapsed without any accesses to that row, it is not likely that future accesses will occur to this row. Rather, another row is likely to be accessed.

Another row is more likely to be accessed when ROW_OLD is active than when a shorter period of time has elapsed since the last access and ROW_OLD is not asserted. The priority of requests may be altered to lower the priority to a stale bank. Scheduling logic in the DRAM controller may examine ROW_OLD to determine when it is useful to close a row and begin RAS precharge for another access, even before another access is requested. This is known as a speculative precharge: the stale row is closed and the bank is pre-charged before an actual demand request arrives.
Multi-bank DRAM Controller—FIG. 4

FIG. 4 is a diagram of DRAM controller 56 for separately accessing multiple banks of DRAM. Four DRAM banks 16 are shown, although typically 8 or 16 banks may be supported by a simple extension of the logic shown. Each bank has its own DRAM state machine 10, as detailed in FIG. 2, and burst and row counter 12. Each bank also has its own row-address register 14 which contains a copy of the row address bits strobed into the DRAMs during the last row open, when RAS was asserted.

Bus 54 transmits memory requests from any of the four CPU cores 42, 44, 46, 48 which missed in second-level cache 60 of FIG. 1, or from PCI interface 58 or SCI 57. These requests are loaded into request queue 26 and are prioritized by request prioritizer 22 before being processed by request processor 24, which activates one of the DRAM state machines 10 for accessing one of the DRAM banks 16.
Refresh timer 28 is a free-running counter/timer which signals when a refresh is required, typically once every 0.1 msec. The different banks are refreshed in a staggered fashion to reduce the power surge when the DRAM is refreshed. Four refresh request signals REF0:3 are sent to request prioritizer 22, which begins the refresh as soon as any active requests which have already started have finished. Thus refresh is given the highest priority by request prioritizer 22.

As each new request is being loaded into request queue 26, the bank address is decoded and the row address is sent to row-address register 14 of the decoded bank to determine if the new request is to the same row address as the last access of that bank. The DRAM state machine 10 of that bank is also consulted to determine if the requested row is active or inactive. The row counter is also checked to see if ROW_OLD has been asserted yet, indicating a stale row. The results of the early query of the bank (the row-match indicator ROW_HIT, the DRAM state machine's current state of the row, ROW_ACTIVE, and the ROW_OLD indication from the row counter) are sent to request queue 26 and stored with the request.
Request Queue Stores Row Hit, Count Status—FIG. 5

FIG. 5 is a diagram of the fields for a memory request in the request queue. Enable bit 31 is set when a new request is loaded into request queue 26, but cleared when a request has been processed by a DRAM state machine which accesses the external memory. The full address of the request, including the bank, row, column, and byte address, is stored in address field 32. An alternative is to store just the decoded bank number rather than the bank address. Status bits such as read or write indications are stored in field 34. An identifier for the source of the request is stored in source field 36.

Other fields are loaded after the bank's state machine, row counter, and row address are consulted. If the new row address of the request being loaded into request queue 26 matches the row address of the last access, which is stored in row-address register 14, then ROW_HIT bit 38 is set; otherwise it is cleared to indicate that the old row must be closed and a new row address strobed in.

The DRAM state machine 10 of the decoded bank is consulted to determine if the row is active (open) or inactive (closed). RAS is effectively "low" and the row active for states 96, 98, 99. When RAS is "low", ROW_ACTIVE bit 39 is set. The row counter 12 is also consulted for the decoded bank, and ROW_OLD is copied to ROW_OLD bit 35 in request queue 26.
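The FIG. 5 entry translates naturally into a record type. The field names mirror the patent's fields (enable bit 31, address field 32, status field 34, source field 36, ROW_HIT bit 38, ROW_ACTIVE bit 39, ROW_OLD bit 35); the Python types are illustrative stand-ins for single bits and bit fields.

```python
from dataclasses import dataclass

@dataclass
class QueueEntry:
    """One request-queue slot, after FIG. 5 of the patent."""
    enable: bool              # set on load, cleared when the access completes
    address: int              # bank, row, column, and byte address
    is_write: bool            # read/write status bits
    source: int               # requesting CPU core, PCI, or SCI identifier
    row_hit: bool = False     # new row matches row-address register 14
    row_active: bool = False  # bank's state machine holds RAS low
    row_old: bool = False     # row counter says the open row is stale
```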
Row Status Looked-Up During Cache Access—FIG. 6

FIG. 6 is a diagram showing that the status of the DRAM row is obtained while the cache is determining if a cache hit or miss has occurred. Requests from CPU cores 42, 44, 46, 48, and requests from SCI 57 and PCI interface 58, are sent to second-level cache 60 to determine if the data requested is present in the on-chip cache. At the same time as the cache lookup, before the cache hit or miss has been determined, the request's address is decoded by bank decoder 62 to determine which DRAM bank contains the data. The decoded bank's state machine 10 and row-address register 14 are then selected. Row-address comparator 64 signals ROW_HIT if the new row address of the request matches the row address of the last access, stored in register 14. ROW_ACTIVE is signaled by DRAM state machine 10 if the current state has the RAS signal active (low).

Row counter 12 of the decoded bank is also consulted to determine if the row has been open for a long period of time without a recent access. ROW_OLD is signaled for such a stale row. These row status signals, ROW_HIT, ROW_ACTIVE, and ROW_OLD, are loaded into request queue 26, possibly before cache 60 has determined if a hit or a miss has occurred. Should a cache hit occur, the row status information is simply discarded since cache 60 can supply the data without a DRAM access.

The bank status information in request queue 26 is updated when the status of a bank changes. Alternately, the bank's status may be read during each prioritization cycle, or as each request is processed.

Request prioritizer 22 then examines the pending requests in request queue 26 to determine which request should be processed next by request processor 24. Request prioritizer 22 often processes requests in a different order than the requests are received in order to maximize memory performance.

In the preferred embodiment, request prioritizer 22 attempts to group requests to the same row together. Requests to the row with the most requests are processed before requests to other rows with fewer requests.
Prioritizer Re-Orders Requests

Request prioritizer 22 re-orders the requests to maximize memory bandwidth. Bandwidth can be increased by reducing the number of row misses: requests to a row other than the open row. The prioritizer searches pending requests in request queue 26 and groups the requests together by bank and row. Any requests to a row that is already open are processed first. These requests to open rows are rapidly filled since the RAS precharge and opening delays are avoided.

Row hits are requests to rows which are already open. Row-hit requests are re-ordered and processed before other row-miss requests. The row-miss request may be an older request which normally is processed before the younger row-hit request. However, the older row-miss request requires closing the row that is already open. If this row were closed, then the row-hit request would require spending additional delay to precharge RAS and open the row again.

Once all of the row-hit requests have been processed, only row-miss requests remain. Any requests to banks with RAS inactive (ROW_ACTIVE=0) are processed first, since no row is open. Then row-miss requests to banks with open rows are processed. The row counter is used to determine priority of these requests: any row-miss requests that have ROW_OLD active are processed before requests with ROW_OLD inactive, so that stale rows are closed before more-recently accessed rows are closed. The lowest priority request is a row-miss to a bank with a recently-accessed row open. Recently-accessed banks are likely to be accessed again, and thus a delay in closing the bank may allow a new request to that open row to be received and processed before closing the row to process the row-miss request.
The pending requests are processed in this order:
DRAM Refresh Request
Row-Hit Request
Row-Miss Request to Bank with Row Closed
Row-Miss Request to Bank with Row Open, ROW_OLD=1
Row-Miss Request to Bank with Row Open, ROW_OLD=0
Thus a recently-accessed bank with ROW_OLD=0 is left open as long as possible because any row-miss request to this bank is given the lowest priority. When multiple requests have the same priority, then requests in a group
having a larger number of individual requests to a row are processed before smaller groups of requests.
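The five-level ordering above amounts to assigning each pending request a rank and sorting on it. The following is a minimal sketch of that ranking, assuming a simple per-bank state record; the `bank_state` dict layout and the `refresh` flag are illustrative, while the signal names ROW_OLD and the open-row status are taken from the text.

```python
def priority(req, bank_state):
    """Return a rank for a pending request; lower rank is processed first.
    Implements the five-level ordering: refresh, row hit, miss to a
    closed bank, miss to a stale open row (ROW_OLD=1), then miss to a
    recently-accessed open row (ROW_OLD=0)."""
    if req.get("refresh"):
        return 0                         # DRAM refresh: highest priority
    state = bank_state[req["bank"]]
    if state["open_row"] is None:
        return 2                         # row miss, but no row to close
    if req["row"] == state["open_row"]:
        return 1                         # row hit: fast CAS-only access
    return 3 if state["row_old"] else 4  # stale rows close before fresh

banks = {0: {"open_row": 5, "row_old": False},
         1: {"open_row": None, "row_old": False},
         2: {"open_row": 7, "row_old": True}}
pending = [{"bank": 0, "row": 9},   # miss to recently used open row
           {"bank": 2, "row": 3},   # miss to stale open row
           {"bank": 1, "row": 4},   # miss to closed bank
           {"bank": 0, "row": 5}]   # row hit
ordered = sorted(pending, key=lambda r: priority(r, banks))
```

Python's sort is stable, so requests of equal rank retain their arrival order; a fuller model would break such ties by group size, as the text specifies.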
When multiple requests are pending to the current row, then these requests are processed before other requests to other rows, using efficient CAS-only cycles. When there is only one request to the current row, then an auto-precharge CAS cycle is performed to precharge the row as soon as possible. Auto-precharge is a special type of CAS cycle whereby a row precharge is requested immediately after the data has been internally read from the RAM array, even before the data has been bursted out of the DRAM chip. Auto-precharge is supported by some synchronous DRAMs, allowing row precharge to begin before the data has been completely transferred.
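The choice between the two CAS variants described above reduces to counting the remaining hits to the current row. A minimal sketch, with the function name and string labels chosen for illustration only:

```python
def cas_cycle_type(requests_to_current_row):
    """Pick the CAS cycle variant for the row being accessed: with
    several pending hits, plain CAS-only cycles keep the row open for
    the remaining accesses; with a single request, an auto-precharge
    CAS closes the row as early as the DRAM allows."""
    if len(requests_to_current_row) > 1:
        return "CAS-only"        # keep row open for the remaining hits
    return "CAS-auto-precharge"  # precharge starts right after the read
```

This mirrors the trade-off in the text: keeping the row open benefits further hits, while auto-precharge hides the precharge delay when no further hits are pending.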