Efficient (Stack) Algorithms for Analysis of Write-Back and Sector Memories

JAMES G. THOMPSON
U.S. Air Force
and
ALAN JAY SMITH
University of California, Berkeley
For the class of replacement algorithms known as stack algorithms, existing analysis techniques permit the computation of memory miss ratios for all memory sizes simultaneously in one pass over a memory reference string. We extend the class of computations possible by this methodology in two ways. First, we show how to compute the effects of copy-backs in write-back caches. The key observation here is that a given block is clean for all memory sizes less than or equal to C blocks and is dirty for all larger memory sizes. Our technique permits efficient computations for algorithms or systems using periodic write-back and/or block deletion. The second extension permits stack analysis simulation for sector (or subblock) caches, in which a sector (associated with an address tag) consists of subsectors (or subblocks) that can be loaded independently. The key observation here is that a subsector is present only in caches of size C or greater. Load forward prefetching in a sector cache is shown to be a stack algorithm and is easily simulated using our technique. Running times for our methods are only slightly higher than for a simulation of a single memory size using nonstack techniques.
Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles; B.3.3 [Memory Structures]: Performance Analysis and Design Aids; B.4.4 [Input/Output and Data Communications]: Performance Analysis and Design Aids; C.4 [Computer Systems Organization]: Performance of Systems

General Terms: Design, Theory

Additional Key Words and Phrases: Cache, copyback, disk cache, memory hierarchy analysis, sector cache, simulation, stack analysis
1. INTRODUCTION

Analysis of memory system performance is one of the most important aspects of computer system design. Frequently this analysis is done using the technique of trace-driven simulation, in which a trace of memory references from a similar system is used as input to a simulation of the system under study. This technique has been applied to all levels of the memory hierarchy, from microprocessor caches to file system design.
The material presented here is based on research supported in part by the National Science Foundation under grants CCR-8202591 and MIP-8713274, by NASA under consortium agreement NCA2-128, by the State of California under the MICRO program, and by the International Business Machines Corporation, Digital Equipment Corporation, Hewlett-Packard, and Signetics.
Authors' current addresses: Lt. Col. J. G. Thompson, Joint Data Systems Support Center (JDSSC/CB10), The Pentagon, Washington, DC 20301-7010; A. J. Smith, University of California, Dept. of Electrical Engineering and Computer Sciences, Berkeley, CA 94720.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1989 ACM 0734-2071/89/0200-0078 $01.50
ACM Transactions on Computer Systems, Vol. 7, No. 1, February 1989, Pages 78-117.
Trace-driven simulation became much more practical with the discovery by Mattson, Gecsei, Slutz, and Traiger [22] that, for certain replacement policies, the performance of all memory sizes could be determined with a single pass through the trace file. Their stack analysis technique relies on the inclusion property of these policies, such that the contents of any size memory is a subset of the contents of any larger memory. Thus, the contents of all memories can be represented by a stack, where the top k items in the stack are the contents of a memory of size k. Policies that obey this property are known as stack algorithms. An equivalent characteristic of stack algorithms is that each possesses a total priority ordering of all blocks at any instant in time that is independent of memory size.
Until now, stack analysis has not been applied to some important situations, forcing the designer to fall back on the one-size-at-a-time method. One example of this is the write-back policy, also called copy-back, where a write to a block causes the block to be marked "dirty" in the cache, but the write to secondary storage is delayed until some later time. Write-back is particularly desirable where memory bandwidth may be a limiting resource, such as in a shared-bus multiprocessor or network file system. The alternative policy, write-through or store-through, in which all modifications go directly to secondary storage, severely restricts the performance improvement due to caching. The consideration of writes is important even with write-back since, in many cases, a write can cause twice the memory accesses of a read: one to fetch the block prior to modification and another to rewrite the block (copy-back). Discussions of stack analysis, however, have either ignored writes altogether or considered only write-through.
The problem with stack analysis of write-back is that it appears to violate the inclusion property. For example, suppose that a dirty block at level k in the stack is read. It must come to the top of the stack, but it is now clean for some sizes and dirty for others. We show that by maintaining a "dirty level" for each block, the stack analysis technique can be extended to analyze write-back. This dirty level is the smallest cache in which the block is dirty, or equivalently, the lowest level in the stack to which the block has been pushed since its last write, or infinity if the block has never been written. The dirty level is used to count the number of write-backs for each cache size by assuming that each write results in a write-back, then waiting to see where the write is avoided. If a dirty block is rewritten, then the previous write has been avoided in all dirty sizes, since both the previous and current write can be written back with one memory access.
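To make the bookkeeping concrete, the sketch below applies the dirty-level idea to a simple LRU stack simulation. This is our own code, not the paper's algorithm: the names (dirty_level, avoided, write_backs) are ours, and the complete method, including end-of-trace flushes and the general priority-function case, is developed in Section 2.

```python
# A minimal sketch (ours, assuming LRU) of dirty-level bookkeeping.
# dirty_level[b] is the smallest cache size in which block b is currently
# dirty: 1 immediately after a write, growing as the block is pushed down
# the stack, and infinity while the block is clean.
import math
from collections import defaultdict

def lru_write_back_counts(trace):
    """trace: iterable of (block, is_write) pairs.
    Returns (writes, avoided): `writes` assumes every write eventually causes
    a write-back; avoided[d] counts write-backs later avoided for all cache
    sizes >= d because the still-dirty block was written again."""
    stack = []                                    # stack[0] is level 1 (the top)
    dirty_level = defaultdict(lambda: math.inf)
    avoided = defaultdict(int)
    writes = 0

    for block, is_write in trace:
        depth = stack.index(block) + 1 if block in stack else None
        if depth is not None:
            stack.remove(block)
        stack.insert(0, block)
        # every block above the old position moves down one level; a dirty
        # block's dirty level can only grow to the deepest level it reaches
        pushed_down = stack[1:depth] if depth is not None else stack[1:]
        for level, b in enumerate(pushed_down, start=2):
            if dirty_level[b] != math.inf:
                dirty_level[b] = max(dirty_level[b], level)

        if is_write:
            writes += 1                           # assume a write-back will happen
            if dirty_level[block] != math.inf:
                # the earlier write-back is avoided wherever the block is
                # still dirty, i.e., in all cache sizes >= its dirty level
                avoided[dirty_level[block]] += 1
            dirty_level[block] = 1                # dirty in every size after a write

    return writes, avoided

def write_backs(writes, avoided, C):
    """Write-backs seen by a cache of size C, assuming that blocks still
    dirty at the end of the trace are eventually flushed."""
    return writes - sum(n for d, n in avoided.items() if d <= C)
```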
Stack analysis can be similarly extended to analyze sector or subblock caches [14, 21]. In a microprocessor cache, access time, on-chip data path widths, and pin counts favor small blocks, whereas the size of associative lookup circuitry and tags favors fewer, larger blocks. A possible compromise is to break each block into independent subblocks, any or all of which may be present. Again, we show that stack analysis can be applied by maintaining additional data with each block.
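As a preview of the kind of per-block data involved (the actual algorithm appears in Section 4), it can be as little as one presence level per subsector; the class below is only our own illustration, and its names are not the paper's.

```python
# Illustration only: a subsector is present exactly in caches of size
# >= present_level[i], mirroring the observation quoted in the abstract.
import math

class SectorEntry:
    def __init__(self, tag, n_subsectors):
        self.tag = tag                                    # one address tag per sector
        self.present_level = [math.inf] * n_subsectors    # inf = never loaded
```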
The remainder of this section is a review of previous stack analysis techniques and a definition of terms. Section 2 presents the stack algorithm for write-back,
followed in Section 3 by several simple extensions to handle deletions, periodic write-back, and cache flush. Section 4 similarly presents the algorithm for sector caches, including an extension for a useful form of prefetch. Section 5 presents a comparison of the time required to perform analysis using the stack technique. Finally, we show how stack analysis makes possible other useful measurements, exemplified by a computation of the probability of a write-back as a function of memory size.
1.1 Cache Memories

Analysis of memory systems has always been an important part of the design of computer systems. The early concentration on virtual program memory [3] has recently been replaced by emphasis on high-speed processor caches [32] and by disk or file system buffering or caching [34]. In reality, these can all be viewed as parts of a hierarchy of memory types, where the upper levels of the hierarchy are faster but generally more costly. Although many of the analysis techniques discussed in this paper generalize to multilevel hierarchies [23], we restrict our discussions to a two-level hierarchy and refer to the top level as a cache.
The purpose of caching is to improve the average access time to items in memory by keeping the most frequently used items in a small, fast cache memory and by leaving the remainder in a larger, slower memory. The contents of the cache are checked on each reference; if the referenced item is present in the cache, then the item is available at the speed of the cache. If not, then the item is read into the cache from memory, replacing something already cached. The speed of the combined memory system is a function of the two memory speeds and the probability that the referenced item is in the cache.
Caches are effective because of the principle of locality [7]. This principle says that the items most likely to be referenced next are those "near" the items that have been recently referenced. Two aspects of locality are temporal locality and spatial locality. Temporal locality implies that an item that has been recently referenced has a good chance of being referenced again in a short time. Spatial locality implies that items close to a referenced item are also likely to be referenced. This is particularly evident in the sequential reference behavior observed in instructions and within files.
There are a large number of design parameters for any cache, most of which must be considered in any analysis of that design. We briefly present definitions of a number of these. For more detail see [31].
Blocking. The cache may be divided into fixed-size blocks or variable-size segments. Blocks are also referred to as pages in the context of virtual memory, and lines or sectors in the context of a processor cache. The cache block or line size may be equal to the amount of data retrievable in one memory cycle, or it may require several memory cycles to fetch a block. A larger block size reduces per-block overhead and provides a form of prefetch, discussed below. This paper discusses only block caches, although most of the algorithms presented can be generalized to analyze segment caches [23], for example, a virtual memory that manages entire programs as a unit or file system caches that manage whole files.
Replacement Policy. The replacement policy determines which block to remove when the cache is full and a new block must be fetched. Commonly suggested
policies include the Least Recently Used (LRU) policy, First-In First-Out (FIFO), Least Frequently Used (LFU), and Random (RAND). An optimal policy, MIN, exists, but is unrealizable in practice because it requires knowledge of the future [22]. The MIN policy does not consider writes or deletes and is known to be nonoptimal if writes are considered [38].
Write Policy. The write policy determines when a modification is presented to secondary storage. Writes may always go directly to secondary storage using the write-through or store-through policy. Alternatively, the write may go to the cache, to be written at some later time, usually when the block is about to be replaced, using the write-back or copy-back policy. Write-back is motivated by the expectation that the block will be modified several times before it has to be written. Clearly, write-back can never cause more accesses than write-through, and usually far fewer. On the other hand, since it deals in blocks rather than words, write-back may increase the number of bytes written. In addition, dirty blocks may remain in the cache for a long time, leading to reliability issues in large volatile caches such as file system caches in main memory. The decrease in memory traffic from write-back makes it very valuable in systems with limited memory bandwidth such as shared-bus multiprocessor systems. Write-back is also desirable in file system caches because many files are temporary and may never have to be written.
Write Allocate. When a written block is not present in a write-through cache, the block may be inserted in the cache (write allocate) or the cache may be bypassed altogether. Write allocate is again motivated by locality: the expectation that the written block will soon be referenced again. A write-back cache always allocates a cache block to the written block.
Write-Fetch. If write allocate is used by a cache where partial-block modification is allowed, and the block to be written is not in the cache (a write miss), then it is usually necessary to fetch the block prior to modifying it. This write-fetch is needed, for example, if one word of a multiword block is being written. The alternative is to keep track of the portion(s) of the cache block that are "valid," which becomes costly when several disjoint portions of a large block are written. However, there are situations in which write-fetch can be avoided, such as when the entire block is being overwritten, or when the contents of the rest of the block are predictable (e.g., when the block is a "new" block in a file system).
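A short sketch may help fix the trade-off; the code below is our own illustration, and the word-granularity valid bits are an assumption rather than something the paper prescribes.

```python
# Our illustration of avoiding a write-fetch with per-word valid bits.
WORDS_PER_BLOCK = 8

class CacheLine:
    def __init__(self):
        self.valid = [False] * WORDS_PER_BLOCK     # which words hold real data
        self.dirty = False

    def write(self, first_word, n_words):
        """Apply a partial write; return True if a write-fetch is required."""
        covers_block = (first_word == 0 and n_words == WORDS_PER_BLOCK)
        need_fetch = not covers_block and not all(self.valid)
        if need_fetch:
            self.valid = [True] * WORDS_PER_BLOCK  # fetching fills in the rest
        for w in range(first_word, first_word + n_words):
            self.valid[w] = True
        self.dirty = True
        return need_fetch
```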
Prefetch. Because of spatial locality, a reference to a block often implies that the next block will soon be referenced. It is possible to take advantage of this anticipated reference and to prefetch the next block in advance. This reduces the delay when the next block is actually referenced. Prefetch is advantageous when it can be overlapped with processing of other references or when two or more blocks can be fetched in much less time than all of them individually, as is the case with disk secondary storage. Although it reduces the delay, prefetch increases memory traffic unless all prefetched blocks are referenced before they are replaced. It may also result in memory pollution, in which a soon-to-be-referenced block is displaced to make room to prefetch an unnecessary block [30]. If a prefetch is only permitted in conjunction with a fetch, then the policy is a demand prefetch policy. Demand prefetch is desirable when the overhead of a fault is
large; demand prefetch amortizes this over two (or more) blocks. With modern memory systems and file system caches, it is simple and inexpensive to initiate a prefetch even if the referenced block is present.
1.2 Metrics

The performance of a memory system can be measured in several ways. Perhaps the most widely used is the miss ratio, which is the fraction of references that were not satisfied by the cache. Conversely, the hit ratio is the fraction that were satisfied by the cache. The miss ratio is a latency metric since it determines the apparent access time of the memory system. The effective access time for any multilevel hierarchy is given by Σ t_i h_i, where t_i is the access time to the ith level, and h_i is the fraction of references satisfied by the ith-level cache. Sometimes overlooked is the fact that the access time to each level should include any queuing delays. These are usually negligible in a single-processor system, but may become important when several processors compete for access to a single secondary store [1].
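As a quick illustration of the Σ t_i h_i expression (our own example, with made-up numbers):

```python
def effective_access_time(levels):
    """levels: list of (t_i, h_i) pairs, where h_i is the fraction of
    references satisfied by level i (the h_i should sum to 1)."""
    return sum(t * h for t, h in levels)

# e.g., a 100 ns cache satisfying 95% of references over a 1000 ns memory:
# effective_access_time([(100, 0.95), (1000, 0.05)]) == 145.0
```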
The actual computation of miss ratios during simulation varies with the parameters of the cache. Let N be the total number of references, and m(C) be the number of misses to a cache of size C. If all references are assumed to be reads, then the miss ratio for a cache of size C is given by

    MR_R(C) = m(C)/N,   (1.1)

hence the name.
With write-through, where every write is a "miss" (i.e., causes an access to secondary storage), the miss ratio is

    MR_WT(C) = (m_r(C) + W)/N,   (1.2)

where m_r(C) is the number of reads that "miss," and W is the number of write references.
When write-back is used, a write could result in two accesses to secondary storage, one to fetch the block and another later to write it. The miss ratio is now given by

    MR_WB(C) = (m_r(C) + m_w(C) + dp(C))/N,   (1.3)

where m_w(C) is the number of write misses (i.e., write fetches), and dp(C) is the number of dirty blocks "pushed" from a cache of size C. This becomes

    MR_WB(C) = (m(C) + dp(C))/N   (1.4)

by using the fact that a write-fetch is actually just a read reference and occurs if the block reference "misses."
All of the expressions so far assume that the processor must wait for the write to secondary storage to complete before continuing. It is often reasonable to buffer the writes so that the processor can continue almost immediately. In this case delay occurs only if there are enough accesses to create contention. In [31] it is observed that when memory bandwidth is adequate, four store-through
buffers are sufficient to largely eliminate queuing for writes. Under this assumption, the write-back miss ratio with write-fetch is again simply

    MR_WF(C) = m(C)/N.   (1.5)
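For reference, the miss-ratio expressions (1.1) through (1.5) translate directly into code; the sketch below is ours, with counter names chosen to mirror the text.

```python
def mr_reads(m, N):
    """(1.1) all references assumed to be reads."""
    return m / N

def mr_write_through(m_r, W, N):
    """(1.2) every write goes to secondary storage."""
    return (m_r + W) / N

def mr_write_back(m_r, m_w, dp, N):
    """(1.3) write-back with write-fetch; dp is the count of dirty pushes."""
    return (m_r + m_w + dp) / N

def mr_write_back_simple(m, dp, N):
    """(1.4) same as (1.3), using m = m_r + m_w."""
    return (m + dp) / N

def mr_buffered(m, N):
    """(1.5) write-back with write-fetch and sufficient write buffering."""
    return m / N
```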
A related metric is the traffic ratio, which is the ratio of traffic between cache and secondary storage, measured in bytes, compared to the traffic that would be present without a cache [14]. The traffic ratio is increasingly important for analyzing shared-bus systems such as multiprocessor architectures or a network file system. Although buffering may eliminate write-back from consideration in the miss ratio, the write traffic is not eliminated, so writes must be considered in the traffic ratio. Also, prefetch may result in increased traffic since some prefetched blocks may not actually be referenced.

The traffic ratio is dependent on the same factors as the miss ratio and, in addition, depends on the size of the data blocks transferred. Suppose that the processor accesses B_p bytes per average memory reference. The traffic without a cache is then B_p times the number of references. Frequently, the cache block size, B_c, is larger than B_p. We assume that each cache miss causes B_c bytes to be transferred. Then a large cache block size may act as a form of prefetch and reduce the miss ratio, but it may also increase the amount of traffic.
The general form of the traffic ratio computation is

    TR(C, B_c) = [m_r(C) + m_w(C) + f(C) + dp(C)] * B_c / (N * B_p),   (1.6)

where m_w(C) is again the number of write misses, dp(C) is again the number of write-backs, and f(C) is the number of prefetched blocks. This expression assumes that write-fetch is used. Notice that the traffic ratio is identical to the miss ratio when there is no prefetching, no write buffering, and the cache block size is the same as B_p.
A third metric is the transfer ratio, which is the ratio of secondary storage accesses with and without cache [30]. This metric has also been called the transaction ratio [G. Gibson, personal communication, 1986], the I/O ratio [18], and the swapping ratio [19]. The transfer ratio is similar to the traffic ratio but is more appropriate when performance is dominated by the cost of a memory access, relatively independent of the number of bytes transferred. Thus it is appropriate for disk caches and often for networks using small (1K or less) messages. For example, the transfer ratio decreases if two blocks are read from disk in a single I/O, whereas the traffic ratio is the same regardless of the number of I/Os used to transfer the data. The transfer ratio also has an indirect effect on the access time if there are enough transfers to create contention, particularly in multiple-processor systems with shared memory. Assuming that prefetches occur only when the referenced block is not in cache (demand prefetch), then they do not affect the transfer ratio. A general expression for the transfer ratio is

    TFR(C) = [m_r(C) + m_w(C) + dp(C)]/N,   (1.7)

which is almost proportional to the traffic ratio using constant block sizes.
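The traffic and transfer ratios of Equations (1.6) and (1.7) can be written out the same way; again, this is only our transcription of the formulas.

```python
def traffic_ratio(m_r, m_w, f, dp, B_c, B_p, N):
    """(1.6) bytes moved with the cache relative to bytes moved without it."""
    return (m_r + m_w + f + dp) * B_c / (N * B_p)

def transfer_ratio(m_r, m_w, dp, N):
    """(1.7) secondary-storage accesses with the cache relative to without it;
    demand prefetches add no extra accesses and so do not appear."""
    return (m_r + m_w + dp) / N
```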
1.3 Trace-Driven Simulation

One common method for calculating these metrics is to use trace-driven simulation. Memory references are gathered from a system believed to be similar to the system being modeled. These references are then used to drive a simulation of the system under study with varying design parameters. To the extent that the traces apply to the modeled system, simulation is a relatively simple way to observe the effect of changes to the memory hierarchy. Unfortunately, it could take a large number of simulations if only a single combination of memory sizes could be simulated at a time.
In a classic paper, Mattson et al. showed that for certain replacement policies the miss ratios for all cache sizes could be calculated in a single pass over the reference trace [22]. These policies are collectively known as stack algorithms. The technique depends on the inclusion property of these policies; the contents of any size cache includes (i.e., is a superset of) the contents of any smaller cache. Thus the cache at any time can be represented as a stack, with the upper k elements of the stack representing the blocks present in a cache of size k. The current stack level of any block is therefore the minimum cache size for which the block is resident. If a block is referenced while at level k, it is a "hit," and therefore resident, for all sizes k and larger. The level at which the block is found is referred to as its stack distance; see Figure 1. Using stack analysis, it is possible to compute the miss ratio of Equation (1.1) for all sizes by recording the hits to each level. The miss ratio for a cache of size C is

    MR_R(C) = (N - Σ_{i=1}^{C} hits(i)) / N,   (1.8)

where N is again the total number of references. Notice that, since hits(i) is never negative, this is a nonincreasing function of cache size. All stack algorithms possess this characteristic, whereas nonstack algorithms may show points at which performance declines with increased cache size [4].
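Equation (1.8) amounts to a running sum over the per-level hit counts; a minimal sketch (ours):

```python
from itertools import accumulate

def miss_ratios(hits, N):
    """hits[i] = hits recorded at stack level i + 1.
    Returns MR(C) for C = 1 .. len(hits), per Equation (1.8)."""
    return [(N - cum) / N for cum in accumulate(hits)]
```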
The simplest example of a stack algorithm is the Least Recently Used (LRU) policy. The stack always contains the blocks in order of last reference, with the most recently referenced block on top. For any cache size C, the LRU block for that cache size is the block at level C in the stack. When a block at level k is referenced, it is not in any cache smaller than k, and therefore it must be fetched. The block that must be removed from any cache of size j, j smaller than k, is the block at level j. The stack is updated by simply "pulling" the referenced block out of the stack and placing it on top. All blocks down to level k are effectively "pushed" down one level. Since the referenced block was in all caches of size k or larger, all blocks below level k remain unchanged. Figure 1 illustrates these operations for the case where the referenced block is at stack level 4 and the case in which the block is not currently in the stack.
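The LRU update just described is easy to express directly. The sketch below is our own code: it yields the stack distance of each reference and folds the distances into per-level hit counts, giving the miss ratio for every cache size up to max_size in one pass.

```python
# Minimal LRU stack-distance simulation; a linear scan of a Python list keeps
# the sketch short (a real simulator would use a faster search structure).
from itertools import accumulate

def lru_stack_distances(trace):
    """Yield each reference's stack distance (None on a first reference)."""
    stack = []                         # stack[0] is the most recently used block
    for block in trace:
        if block in stack:
            depth = stack.index(block) + 1
            stack.remove(block)        # "pull" the referenced block ...
            yield depth
        else:
            yield None                 # not resident: infinite stack distance
        stack.insert(0, block)         # ... and place it on top

def lru_miss_ratios(trace, max_size):
    hits = [0] * max_size
    n = 0
    for dist in lru_stack_distances(trace):
        n += 1
        if dist is not None and dist <= max_size:
            hits[dist - 1] += 1
    return [(n - c) / n for c in accumulate(hits)]   # Equation (1.8)
```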
More generally, Mattson et al. [22] showed that any stack algorithm possesses a "priority function" which imposes a total ordering on all blocks at any given time, independent of cache size. Notice that LRU imposes such an ordering based on the time of last reference. However, in the more general case the relative priority of two blocks may change without either of them being referenced. (See, for example, Figure 4, where the relative positions of blocks A and B reverse between times 5 and 6.)
Fig. 1. Examples of stack maintenance using LRU replacement (two cases: x_t in the stack at level 4, and x_t not in the stack). The referenced block is always "pulled" to the top of the cache stack. All blocks with smaller stack distance are pushed down one level.

Fig. 2. Examples of stack maintenance using a stack replacement algorithm (same two cases: x_t in the stack at level 4, and x_t not in the stack). For each level C a single comparison (indicated by a circled cross) between the prior block at the level, s_{t-1}(C), and the block pushed from above, y_t(C - 1), determines the new block at the level and the block pushed from the level. Update continues down to the current level of the referenced block or to the bottom of the stack if x_t is not in the stack.
It is no longer the case that the block at level j is necessarily the one to be pushed from that size cache. This complicates the stack update procedure, but only slightly. The stack can still be updated in a single pass that is similar to one pass of a bubble sort. A single comparison at each level determines the new block at the level and the block pushed from the level. First, the referenced block is still pulled to the top of the stack since it must become resident in all cache sizes. Using the terminology from [22], let y_t(C) be the block pushed ("yanked") from a cache of size C. To make room for the referenced block, the top block in the stack, s_{t-1}(1), must be pushed from a one-block cache, becoming y_t(1). Some block must also be pushed from a two-block cache: the one with the lowest priority. A single comparison between y_t(1) = s_{t-1}(1) and s_{t-1}(2) determines which becomes y_t(2). (Ties are broken by some arbitrary rule.)
Algorithm 1. General Stack Analysis Algorithm

1. FOR 1 ≤ t ≤ N DO                                        For all events.
2.   IF x_t ∉ S_{t-1} THEN Δ = ∞                           If not referenced before.
3.   ELSE DO
4.     FIND Δ SUCH THAT s_{t-1}(Δ) = x_t                   Find the stack distance.
5.     rh(Δ) = rh(Δ) + 1                                   Update the read hits. (1)
6.   IF Δ ≠ 1 THEN DO                                      If stack needs updating.
7.     y_t(1) = s_{t-1}(1)
       FOR 2 ≤ i ≤ Δ DO y_t(i) = pmin[y_t(i - 1), s_{t-1}(i)]
                                                           Calculate the push from each level down to the
                                                           referenced block: the push is the minimum of the
                                                           block at the level and the push from above. (2)
       FOR i > Δ DO y_t(i) = φ                             No pushes below the reference.
8.     s_t(1) = x_t                                        Establish the new stack. (3)
       FOR i > 1 DO s_t(i) = s_{t-1}(i) + y_t(i - 1) - y_t(i)

Notes:
(1) In step 5, all counts that are not incremented are assumed to remain unchanged at the next time interval; thus the subscript t is not used.
(2) In step 7, pmin returns the block with the lowest priority, as defined by the replacement algorithm. Pmin is the comparison function in the circles of Figure 2.
(3) In step 8, plus and minus have the intuitive meaning of adding a member to a set or removing a member. In this context, adding a member that is already present, or subtracting a member that is not present, has no effect. The same is true of adding or subtracting φ. Thus, the block kept at level i, s_t(i), is either s_{t-1}(i) or the block pushed from above, y_t(i - 1), whichever is not pushed from level i.

Figure 3.
Similarly, the block pushed from a three-block cache should be the lowest-priority of the three blocks previously present. However, the lowest priority block can be determined in a single comparison of y_t(2) and s_{t-1}(3), since the third block, now s_t(2), has already "won" a comparison against y_t(2), and thus cannot have the lowest priority. Similar logic applies for all levels down to level k, the original level of the referenced block; only the block currently at the level and the one pushed from above need be compared to find the block to be pushed. The contents of all sizes larger than k are again unchanged.

The stack analysis algorithm is formally presented in Figure 3. This algorithm is used as the basis for the extensions in Section 2. Let:
X = x_1, x_2, ..., x_N be a trace, where x_t is the reference at time t.
S_t = the cache stack just after reference to x_t, with s_t(C) = the block at stack level C; s_0(C) = φ for all C.
Δ = the stack distance of x_t; that is, s_{t-1}(Δ) = x_t.
y_t(C) = the block pushed ("yanked") from a cache of size C by reference x_t.
rh(C) = a count of the number of hits to level C by time t.
Note that, in practice, it is possible to search the stack for the referenced block and update the stack simultaneously, since the priority function cannot depend on where (or even if) the referenced block is in the stack. The update stops when the referenced block is found. The block being pushed takes the place of the referenced block, which is inserted on top of the stack.
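Rendered as code, Algorithm 1 becomes a single downward pass once the priority function is supplied as a callable. The sketch below is ours: it combines the search and the update as just described, records hits by level, and breaks priority ties arbitrarily (in favor of the block already being pushed).

```python
def stack_update(stack, x, priority, hits):
    """Process one reference x against `stack` (index 0 = level 1).  Updates
    per-level hit counts in `hits` (keyed by 1-based level) and returns the
    stack distance of x, or None if x was not resident."""
    if stack and stack[0] == x:                 # hit at level 1: nothing moves
        hits[1] = hits.get(1, 0) + 1
        return 1
    if not stack:
        stack.append(x)
        return None
    pushed = stack[0]                           # y(1) = the previous top block
    stack[0] = x                                # the referenced block goes on top
    for i in range(1, len(stack)):
        current = stack[i]
        if current == x:                        # found: the push takes its place
            stack[i] = pushed
            hits[i + 1] = hits.get(i + 1, 0) + 1
            return i + 1
        # pmin: whichever of the two has the lower priority keeps moving down
        if priority(current) < priority(pushed):
            stack[i], pushed = pushed, current
    stack.append(pushed)                        # x was not resident; the stack grows
    return None
```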
Time              1    2    3    4    5    6    7    8    9
Reference String  A    A    A    B    B    C    C    D    B
------------------------------------------------------------
Cache Stack       A1   A2   A3   B1   B2   C1   C2   D1   B3
                                 A3   A3   A3   A3   A3   A3
                                           B2   B2   B2   D1
                                                     C2   C2

Fig. 4. Cache contents using the Least Frequently Used policy. The number beside the block is the priority, that is, the number of references.
As an example, consider the application of a Least Frequently Used policy to the reference string {AAABBCCDB}. Using this policy, the block pushed from any cache is the one that has been used the fewest total times. Figure 4 shows the contents of the stack after each reference, where the number beside each block is the priority (i.e., the number of uses of the block). Notice that a block may be pushed several levels because of a reference, as seen at time 8. Note too that blocks below the level where the referenced block is found are unchanged, even though they may have higher priorities, as seen after the last reference.
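Plugging a Least Frequently Used priority (the running reference count) into the stack_update sketch given earlier reproduces Figure 4; the snippet below is ours and assumes that sketch is in scope.

```python
from collections import defaultdict

counts = defaultdict(int)            # LFU priority: number of uses so far
stack, hits = [], {}
for x in "AAABBCCDB":
    counts[x] += 1
    stack_update(stack, x, priority=lambda b: counts[b], hits=hits)
print(stack)                         # ['B', 'A', 'D', 'C'], matching Figure 4's last column
```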
1.4 Nonstack Algorithms

The prohibition against a priority function that depends on cache size prevents some otherwise simple policies from being stack algorithms, such as the First-In First-Out (FIFO) rule [22]. Another common technique that is not a stack algorithm is the use of demand prefetch or prefetch-on-miss [30]. Suppose that the prefetch policy is to fetch the following block along with any fetched block, but not to prefetch if the referenced block is already present. Assume an arbitrary stack algorithm for replacement. It is easy to construct counterexamples that violate inclusion because the priority of a prefetched block depends on when it is fetched, which varies with cache size. For example, consider the examples of Figure 5, where the contents of a larger cache are clearly not a superset of a smaller cache after the final reference.
It is possible to construct prefetch policies that are stack algorithms. For example, the nondemand policy that always prefetches the next block, regardless of whether the referenced block is resident, is a stack algorithm. This policy is a form of One Block Lookahead or OBL [2]. From the point of view of the stack, this is equivalent to the insertion of a reference to the next block after each reference. Nondemand prefetch is not practical if the cost of a fault is high, as it is in virtual memory systems, for example, because the penalty for faulting to prefetch a block that may not be needed is greater than the potential gain. Nondemand prefetch is practical when it is possible to look for the next block in the cache and prefetch it if necessary without significantly slowing down processing of the current reference. This is the case for many
