A LOW-OVERHEAD COHERENCE SOLUTION FOR MULTIPROCESSORS
WITH PRIVATE CACHE MEMORIES

Mark S. Papamarcos and Janak H. Patel
Coordinated Science Laboratory
University of Illinois
1101 W. Springfield, Urbana, IL 61801

ABSTRACT

This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time. This in turn increases the overall processor utilization. Unlike most traditional high-performance coherence solutions, this solution does not use any global tables. Furthermore, this coherence scheme is modular and easily extensible, requiring no modification of cache modules to add more processors to a system. The performance of this scheme is evaluated by using an approximate analysis method. It is shown that the performance of this scheme is closely tied to the miss ratio and the amount of sharing between processors.

I. INTRODUCTION

The use of cache memory has long been recognized as a cost-effective means of increasing the performance of uniprocessor systems [Conti69, Meade70, Kaplan73, Strecker76, Rao78, Smith82]. In this paper, we will consider the application of cache memory in a tightly-coupled multiprocessor system organized around a timeshared bus. Many computer systems, particularly the ones which use microprocessors, are heavily bus-limited. Without some type of local memory, it is physically impossible to gain a significant performance advantage through multiple microprocessors on a single bus.

Generally, there are two different implementations of multiprocessor cache systems. One involves a single shared cache for all processors [Yeh83]. This organization has some distinct advantages, in particular efficient cache utilization. However, this organization requires a crossbar between the processors and the shared cache. It is impractical to provide communication between each processor and the shared cache using a shared bus. The other alternative is a private cache for each processor, as shown in Fig. 1. However, this organization suffers from the well-known data consistency or cache coherence problem. Should the same writeable data block exist in more than one cache, it is possible for one processor to modify its local copy independently of the rest of the system.

[Fig. 1 System Organization: processors with private caches connected to main memory over a timeshared bus]

The simplest way to solve the coherence problem is to require that the address of the block being written in cache be transmitted throughout the system. Each cache must then check its own directory and purge the block if present. This scheme is most frequently referred to as broadcast-invalidate. Obviously, the invalidate traffic grows very quickly and, assuming that writes constitute 25% of the memory references, the system becomes saturated with less than four processors. In [Bean79], a bias filter is proposed to reduce the cache directory interference that results from this scheme. The filter consists of a small associative memory between the bus and each cache. The associative memory keeps a record of the most recently invalidated blocks, inhibiting some subsequent wasteful invalidations. However, this only serves to reduce the amount of cache directory interference without actually reducing the bus traffic.
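As a rough plausibility check on that saturation figure, consider the bus load from broadcasting every write. The arithmetic below is our own illustration, not a calculation from the paper; it borrows the access rate, miss ratio, and transfer time the paper assumes later in Section IV.

```python
# Every write is broadcast, so each processor occupies the bus for roughly
# a*w cycles per processor cycle from invalidates alone; adding miss traffic
# of about m*a*T cycles, N processors saturate a single bus when
#     N * (a*w + m*a*T) >= 1.
# Values: a = 0.9, m = 0.05, T = 2 (Section IV defaults), w = 0.25 as here.
a, w, m, T = 0.9, 0.25, 0.05, 2
print(1 / (a * w))              # invalidates alone: ~4.4 processors
print(1 / (a * w + m * a * T))  # with miss traffic: ~3.2, i.e. under four
```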
Another class of coherence solutions are of the global-directory type. Status bits are associated with each block in main memory. Upon a cache miss or the first write to a block in cache, the block's global status is checked. An invalidate signal is sent only if another cache has a
copy. Requests for transfers due to misses are also screened by the global table to eliminate unnecessary cache directory interference. The performance associated with these solutions is very high if one ignores the interference in the global directory. The hardware required to implement a global directory for low access interference is extensive, requiring a distributed directory with a full crossbar. These schemes and their variations have been analyzed by several authors [Tang76, Censier78, Dubois82, Yen82, Archibald83].

A solution more appropriate for bus-organized multiprocessors has been proposed by Goodman [Goodman83]. In this scheme, an invalidate request is broadcast only when a block is written in cache for the first time. The updated block is simultaneously written through to main memory. Only if a block in cache is written more than once is it necessary to write it back before replacing it. This particular write strategy, a combination of write-through and write-back, is called write-once. A dual cache directory system is employed in order to reduce cache interference.

We seek to integrate the high performance of global-directory solutions, associated with the inhibition of all ineffective invalidations, with the modularity and easy adaptability to microprocessors of Goodman's scheme. In a bus-organized system with dual directories for interrogation, it is possible to determine at miss time whether a block is resident in another cache. Therefore a status may be kept for each block in cache indicating whether it is Exclusive or Shared. All unnecessary invalidate requests can be cut off at the point of origin. Bus traffic is therefore reduced to cache misses, actual invalidations, and writes to main memory. Of these, the traffic generated by cache misses and actual invalidations represents the minimum unavoidable traffic. The number of writes to main memory is determined by the particular policy of write-through or write-back. Therefore, for a multiprocessor on a timeshared bus, performance should approach the maximum possible for a cache-coherent system under the given write policy.

The cache coherence solution to be presented is applicable to both write-through and write-back policies. However, it has been shown that write-back generates less bus traffic than write-through [Norton82]. This has been verified by our performance studies. Therefore, we have chosen a write-back policy in the rest of this paper. Under a write-back policy, coherence is not maintained between a cache and main memory as can be done with a write-through policy. This in turn implies that I/O processors must follow the same protocol as a cache for data transfers to and from memory.

II. PROPOSED COHERENCE SOLUTION

In this section we present a low-overhead cache coherence algorithm. To implement this algorithm, it is necessary to associate two status bits with each block in cache. No status bits are associated with the main memory. The first bit indicates either Shared or Exclusive ownership of a block, while the second bit is set if the block has been locally modified. Because the state Shared-Modified is not allowed in our scheme, this status is used instead to denote a block containing invalid data. A write-back policy is assumed. The four possible statuses of a block in cache at any given time are then:

1. Invalid: Block does not contain valid data.

2. Exclusive-Unmodified (Excl-Unmod): No other cache has this block. Data in block is consistent with main memory.

3. Shared-Unmodified (Shared-Unmod): Some other caches may have this block. Data in block is consistent with main memory.

4. Exclusive-Modified (Excl-Mod): No other cache has this block. Data in block has been locally modified and is therefore inconsistent with main memory.

A block is written back to main memory when evicted only if its status is Excl-Mod. If a write-through cache were desired, one would not need to differentiate between Excl-Mod and Excl-Unmod. Writes to an Exclusive block result only in modification of the cached block and the setting of the Modified status. The status Shared-Unmod says that some other caches may have this block. Initially, when a block is declared Shared-Unmod, at least two caches must have it. However, at a later time, when all but one cache have evicted the block, it is no longer truly Shared. But the status is not altered, in favor of simplicity of implementation.

Detailed flow charts of the proposed coherence algorithm are given in Figs. 2 and 3. Fig. 2 gives the required operations during a read cycle and Fig. 3 describes the write cycle. The following is a summary of the algorithm and some implementation details which are not present in the flow charts.

Upon a cache miss, a read request is broadcast to all caches and the main memory. If the miss was caused by a write operation, an invalidate signal accompanies the request. If a cache directory matches the requested address, it inhibits the main memory from putting data on the bus. Assuming cache operations are asynchronous with each other and the bus, possible multiple cache responses can be resolved with a simple priority network, such as a daisy chain. The highest-priority cache among the responding caches will then put the data on the bus. If no cache has the block, the memory provides it. A unique response is thus guaranteed. On a read operation, all caches which match the requested address set the status of the corresponding block to Shared-Unmod. In addition, the block is written back to main memory concurrently with the transfer if its status was Excl-Mod. On a write, matching caches set the block status to Invalid. The requesting cache sets the status of the block to Shared-Unmod if the block came from another cache and to Excl-Unmod if it came from main memory. Upon a subsequent cache write, an invalidation signal is broadcast with the block address only if the status is Shared-Unmod, thus minimizing unnecessary invalidation traffic.
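The transitions just summarized can be encoded compactly. The following is a minimal sketch of the four statuses and their read, write, and bus-snoop transitions; the function names and event encoding are our own assumptions, not the authors' implementation.

```python
# A block's status is one of:
#   "Invalid", "Excl-Unmod", "Shared-Unmod", "Excl-Mod"

def processor_read(status, in_another_cache):
    """New status after a processor read; misses are served over the bus."""
    if status == "Invalid":                      # read miss
        # Another cache supplies the block (writing back if Excl-Mod) and
        # all copies become Shared; otherwise main memory supplies it.
        return "Shared-Unmod" if in_another_cache else "Excl-Unmod"
    return status                                # read hit: unchanged

def processor_write(status):
    """New status and bus action after a processor write."""
    if status == "Invalid":                      # write miss
        return "Excl-Mod", "read-with-invalidate"
    if status == "Shared-Unmod":                 # first write to a Shared block
        return "Excl-Mod", "invalidate"
    return "Excl-Mod", None                      # Exclusive: purely local write

def snoop(status, bus_request):
    """Reaction of a matching cache to another cache's bus request."""
    if bus_request == "read":
        # Supply the block (concurrent write-back if Excl-Mod); go Shared.
        return "Shared-Unmod"
    return "Invalid"                             # any invalidating request

# Example: two caches touch the same block.
s1, s2 = "Invalid", "Invalid"
s1 = processor_read(s1, in_another_cache=False)  # cache 1: Excl-Unmod
s2 = processor_read(s2, in_another_cache=True)   # cache 2 misses; both Shared
s1 = snoop(s1, "read")                           # cache 1 sees the read
s2, action = processor_write(s2)                 # cache 2 writes: invalidate
s1 = snoop(s1, action)                           # cache 1: Invalid
print(s1, s2)                                    # -> Invalid Excl-Mod
```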
[Fig. 2 Cache Read Operation (flow chart)]

[Fig. 3 Cache Write Operation (flow chart)]

As will be seen in the following sections, the performance of the proposed coherence algorithm is directly dependent on the miss ratio and the degree of sharing, while in algorithms not utilizing global tables the performance is tied closely to the write frequency. Since cache misses are far less frequent than writes, it is intuitively clear that the proposed algorithm should perform better than other modular algorithms.

Most multiprocessing systems require the use of synchronization and mutual exclusion primitives. These primitives can be implemented with indivisible read-modify-write operations (e.g., test-and-set) to memory. Indivisible read-modify-write operations are a challenge to most cache coherence solutions. However, in our system the bus provides a convenient "lock" operation with which to solve the read-modify-write problem. In our scheme, if the block is either Excl-Unmod or Excl-Mod, no special action is required to perform an indivisible read-modify-write operation on that block. However, if the block is declared Shared-Unmod, we must account for the contingency in which two processors are simultaneously accessing a Shared block. If the operation being performed is designated as indivisible, then the cache controllers must first capture the bus before proceeding to execute the instruction. Through the normal bus arbitration mechanism, only one cache controller will get the bus. This controller can then complete the indivisible operation. In the process, of course, the other block is invalidated, and the other processor treats the access as a cache miss and proceeds on that basis. An implicit assumption in this scheme is that the controller must know, before it starts executing the instruction, that the operation is indivisible. Some current microprocessors are capable of locking the bus for the duration of an instruction. Unfortunately, with some others it is not possible to recognize a read-modify-write before the read is complete; it is then too late to backtrack. For specific processors we have devised elaborate methods using interrupts and system calls to handle such situations. We will not present the specifics here, but it suffices to say that the schemes involve either the aborting and retrying of instructions or decoding instructions in the cache controller.
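To make the bus-lock idea concrete, here is a small illustrative sketch of an indivisible test-and-set under this protocol. It is our own model, not code from the paper: the cache is a plain dictionary, and a mutex stands in for winning bus arbitration.

```python
import threading

bus_lock = threading.Lock()   # capturing it models winning bus arbitration

def test_and_set(cache, addr):
    """Indivisible test-and-set; `cache` maps addr -> (status, value)."""
    status, value = cache.get(addr, ("Invalid", 0))
    if status in ("Excl-Unmod", "Excl-Mod"):
        # No other cache has the block: complete locally, no special action.
        cache[addr] = ("Excl-Mod", 1)
        return value
    # Shared-Unmod (or a miss): capture the bus before executing the
    # instruction.  The arbitration winner invalidates the other copies and
    # completes the operation; a loser sees a cache miss and retries.
    with bus_lock:
        # (bus) broadcast invalidate / read-with-invalidate for addr here
        status, value = cache.get(addr, ("Invalid", 0))  # re-read after winning
        cache[addr] = ("Excl-Mod", 1)
        return value

# usage: c = {0x40: ("Shared-Unmod", 0)}; print(test_and_set(c, 0x40)); print(c)
```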
III. PERFORMANCE ANALYSIS

The analysis of this coherence solution stems from an approximate method proposed by Patel [Patel82]. In this method, a request for a block transfer is broken up into several unit requests for service. The waiting time is also treated as a series of unit requests. Furthermore, these unit requests are treated as independent and random requests to the bus. It was shown in that paper that this rather non-obvious transformation of the problem results in a much simpler but fairly accurate analysis. The errors introduced by the approximation are less than 5% for a low miss ratio.

First, let us define the system parameters:
N  number of processors
a  processor memory reference rate
m  miss ratio
w  fraction of memory references that are writes
d  probability that a block in cache has been locally modified before eviction, i.e., the block is "dirty"
u  fraction of write requests that reference Unmodified blocks in cache
s  fraction of write requests that reference Shared blocks, equivalent to the fraction of Shared blocks in cache if references are assumed to be equally distributed throughout the cache
A  number of cycles required for bus arbitration logic
T  number of cycles required for a block transfer
I  number of cycles required for a block invalidate

To analyze our cache system, consider an interval of time comprising k units of useful processor activity. In that time, kb bus requests will be issued, where

    b = ma + (1-m)awsu

The term ma in the above expression represents the bus accesses due to cache misses, and the term (1-m)awsu accounts for the invalidate requests resulting from writes to Shared-Unmod blocks. The actual execution time for 1 useful unit of work, disregarding cache interference, will be

    1 + bA + maT + madT + (1-m)awsuI + bW

where W is the average waiting time per bus request. The cpu idle times per useful cpu cycle are the factors bA for bus arbitration, maT for fetching blocks on misses, madT for write-back of Modified blocks, (1-m)awsuI for invalidate cycles, and bW for waiting time to acquire the bus.

Now we account for cache interference from other processors. If no dual cache directory is assumed, the performance degradation due to cache interference can be extremely severe. Therefore, we have assumed dual directories in cache. In this case, cache interference will occur only in the following situations:

1. A given processor receives invalidate requests from the (N-1) other processors at the rate of (N-1)(1-m)awsu. We assume that all invalidates are effective and that, on the average, one cache is invalidated. The penalty for an invalidate is assumed to be one cache cycle.

2. Transfer requests occur at the rate (N-1)ma, of which (N-1)mas are for Shared blocks. We again assume that, on the average, one cache responds to the request. The penalty for a transfer is T cycles.

We define Q to be the sum of these two effects, namely

    Q = (1-m)awsu + masT

Cache interference is assumed to be distributed over the processor execution time, yielding

    Z = 1 + bA + maT + madT + (1-m)awsuI + bW + Q/Z^2        (1)

where Z is the real execution time for 1 useful unit of work. The unit request rate for each of the N processors, as seen by the bus, is

    (Z - 1 - bA - Q/Z^2) / Z

The probability that no processor is requesting the bus is given by

    (1 - (Z - 1 - bA - Q/Z^2)/Z)^N

Therefore, the probability that at least one processor is requesting the bus, that is, the average bus utilization B, is

    B = 1 - (1 - (Z - 1 - bA - Q/Z^2)/Z)^N                   (2)

To solve for B, W and Z, we need one more expression for the bus utilization. That can be obtained by multiplying N by the actual bus time used, averaged over the execution period, giving

    B = N (Z - 1 - bA - bW - Q/Z^2) / Z                      (3)

Now we can solve for B, W and Z using equations (1), (2) and (3). Similar derivations exist for the cases of no coherence, no coherence and no bus contention (infinite crossbar), and Goodman's scheme. The processor utilization U is simply 1/Z.
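Equations (1) through (3) can be solved numerically by simple fixed-point iteration. The sketch below is our own illustration of one way to do it (the paper gives no program): given a guess for Z, it computes B from equation (2), solves equation (3) for W, and updates Z from equation (1) until convergence. Default arguments follow the parameter values of Section IV.

```python
def solve(N, a=0.90, m=0.05, w=0.20, d=0.50, u=0.30, s=0.05, A=1, T=2, I=2,
          tol=1e-9, max_iter=100_000):
    """Solve equations (1)-(3) for Z, W and B; returns (Z, W, B, U)."""
    b = m*a + (1 - m)*a*w*s*u            # bus request rate per useful cycle
    Q = (1 - m)*a*w*s*u + m*a*s*T        # cache-interference term
    Z = 2.0                              # initial guess
    for _ in range(max_iter):
        q = Q / Z**2
        r = (Z - 1 - b*A - q) / Z        # per-processor unit request rate
        B = 1 - (1 - r)**N               # bus utilization, eq. (2)
        W = (Z - 1 - b*A - q - B*Z/N) / b                   # eq. (3) for W
        Z_new = 1 + b*A + m*a*T + m*a*d*T + (1 - m)*a*w*s*u*I + b*W + q  # (1)
        if abs(Z_new - Z) < tol:
            Z = Z_new
            break
        Z = 0.5 * (Z + Z_new)            # damped update for stable convergence
    U = 1.0 / Z                          # processor utilization
    return Z, W, B, U
```

For example, solve(8) returns the predicted Z, W, B and U for an eight-processor system under the default parameters; the exact numbers depend on our reconstruction of the equations above.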
In same cases we have chosen pessimistic values to emphasize the fact that our cache coherence solu- tion still gives good performance. The following values were used as default cache parameters: m = 5% Miss ratio: It may actually be lower for reasonable cache sizes, so this is a pes- simistic assumption. Lower miss ratios would be appropriate for slngle-tasking processors, while the 7.5% figure may be appropriate for multl-tasklng environ- ments involving many context switches. a = 90% Processor to memory access rate: Here we assume 90% of cpu cycles result in a cache request, although a smaller frac- tion is more likely in processors with a large register set. 351
d = 50%  Write-back probability: Assume that approximately half of all blocks are locally modified before eviction, although 20% and 80% are tried in order to see the effect of this parameter.

w = 20%  Write frequency: Assumed to be about 20% of all memory references. This is a fairly standard number. Since it only appears as a factor in the generation of invalidate requests with u and s, its actual value is not critical.

u = 30%  Fraction of writes to unmodified blocks: Assume that roughly one third of all write hits are first-time writes to a given unmodified block and the remainder are subsequent writes to the modified block.

s = 5%  Degree of sharing: In most cases we have assumed that 5% of writes are to a block which is declared Shared-Unmod. This should be a pessimistic assumption except for programs which pass large amounts of data between processors, in which case s = 15% is more reasonable. In systems where most sharing occurs only on semaphores, the 1% figure is more likely.

A = 1  Bus arbitration time: Assume that the logic for determining the next bus master settles within one cache cycle.

T = 2  Block transfer time: In a microprocessor environment blocks are likely to be small. Therefore, in most cases we have assumed that it takes approximately two cache cycles to transfer a block to a cache. We have also considered the effect of varying block transfer times due to differing technologies or larger cache blocks.

I = 2  Block invalidate time: We have assumed that the time taken for an invalidate cycle should be only slightly longer than a normal cache cycle, since the invalidate operation consists only of transmitting an address and modifying the affected cache directories.

The analytical method was verified using a time-driven simulator of the performance model. In all cases tested, the predicted performance differed by no more than 5% from the simulated performance. This error tended to approach 0 with heavier bus loading. Because of the comparative ease of generating data using the analytical solution, all results shown have been derived analytically. On each graph, all parameters assume their default values except the one being varied.

Figs. 4 through 6 illustrate the effects of different miss ratios on bus utilization, system performance, and processor utilization as functions of the number of processors. System performance is expressed as NU, where N is the number of processors and U is the single-processor utilization. The system performance is limited primarily by the bus. From Fig. 4 we see that for a 7.5% miss ratio the bus saturates with about 8 processors. As the miss ratio decreases to 2.5%, the bus saturates with about 18 processors. The effect of bus saturation on system performance can be seen in Fig. 5. Note that, in general, bus utilization and system performance increase almost linearly with N until the bus reaches saturation. At this point, processor utilization begins to approach a curve proportional to 1/N, as seen in Fig. 6. If a 1% miss ratio could be achieved, performance would top out with NU = 29.

[Fig. 4 Effect of Miss Ratio m: Bus Utilization vs. Number of Processors]

[Fig. 5 Effect of Miss Ratio m: System Performance vs. Number of Processors]

[Fig. 6 Effect of Miss Ratio m: Processor Utilization vs. Number of Processors]
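Curves of the kind shown in Figs. 4 through 6 can be regenerated from the solver sketched after the equations of Section III. The loop below is our own illustration (and inherits the assumptions of that sketch); it sweeps the processor count for several miss ratios and prints bus utilization B and system performance NU.

```python
# Sweep N for several miss ratios; `solve` is the sketch from Section III.
for m in (0.025, 0.05, 0.075):
    for N in range(2, 21, 2):
        Z, W, B, U = solve(N, m=m)
        print(f"m={m:.1%}  N={N:2d}  B={B:.2f}  NU={N * U:.2f}")
```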
Figs. 7 and 8 illustrate the effect of differing degrees of interprocessor sharing. The effect of the sharing factor, s, on system performance is relatively small compared with the effect of the miss ratio. It is the factor (1-m)awsu that is responsible for the generation of invalidation traffic, which is generally smaller than the miss traffic. These graphs are also demonstrative of the effects of variations in the write frequency (w) and the percentage of first writes (u). The value of w is relatively fixed between 20% and 30%, and u should be fairly constant as well, except when moving large quantities of data. The s = 100% case corresponds to a standard write-back coherence scheme in which any block is potentially sharable. With a write-back frequency of 30% instead of 50% to compensate for initial write-throughs, the curves for Goodman's scheme are almost identical to those for s = 100%.

Fig. 9 illustrates the effect of different write-back frequencies. The results here are fairly predictable. Write-back is yet another factor which contributes to the bus traffic. A write-through policy would contribute much more traffic than this.

Fig. 10 illustrates the degradation due to increasing block transfer times. System performance is so limited by transfer times of 4 cycles or more that it is absolutely necessary to be able to bring a block into cache in one or two cycles.

Finally, Fig. 11 shows that the proposed coherence solution is very close to the ideal achievable system performance for a timeshared bus. The top curve represents a system not constrained by a bus, while the second corresponds to a system with no coherence overhead. The bottom curve, representing the proposed solution, is very close to the middle curve, clearly showing that little system performance is lost in maintaining cache consistency using our algorithm.

[Fig. 7 Effect of Degree of Sharing s: Bus Utilization vs. Number of Processors]

[Fig. 8 Effect of Degree of Sharing s: System Performance vs. Number of Processors]

[Fig. 9 Effect of Write-Back Probability d: System Performance vs. Number of Processors]

[Fig. 10 Effect of Block Transfer Time T: System Performance vs. Number of Processors]
[Fig. 11 Overhead of Coherence Solution: System Performance vs. Number of Processors]

V. CONCLUDING REMARKS

In this paper we have introduced a new coherence algorithm for multiprocessors on a timeshared bus. It takes advantage of the relatively small amount of data shared between processors without the need for a global table. In addition, it is easily extensible to an arbitrary number of processors and relatively uncomplicated. The applications of a system of this type are many. Processing modules could be added as needed, and the system need not be redesigned for each new application. For example, an interesting application would be to allocate one processing module to each user, with one possibly dedicated to operating system functions. Again, the primary advantage is easy expandability and very little performance degradation as a result of it. For any multiprocessor system on a timeshared bus, this coherence solution is as easy to implement as any other, save broadcast-invalidate, and offers a significant performance improvement if the amount of shared data memory is reasonably small.

VI. ACKNOWLEDGEMENTS

We would like to thank Professor Faye Briggs of Rice University and Professor Jean-Loup Baer of the University of Washington for helpful discussions concerning this paper. This research was supported by the Naval Electronics Systems Command under VHSIC contract N00039-80-C-0556 and by the Joint Services Electronics Program under contract N00014-84-C-0149.

REFERENCES

Archibald83
J. Archibald and J. L. Baer, "An Economical Solution to the Cache Coherence Problem," University of Washington Technical Report 83-10-07, October 1983.

Bean79
B. M. Bean, K. Langston, R. Partridge, and K. K. Sy, "Bias Filter Memory for Filtering out Unnecessary Interrogations of Cache Directories in a Multiprocessor System," United States Patent 4,142,234, February 27, 1979.

Censier78
L. M. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Trans. Comput., vol. C-27, December 1978, pp. 1112-1118.

Conti69
C. J. Conti, "Concepts for Buffer Storage," IEEE Comput. Group News, vol. 2, March 1969, pp. 9-13.

Dubois82
M. Dubois and F. A. Briggs, "Effects of Cache Coherency in Multiprocessors," IEEE Trans. Comput., vol. C-31, November 1982, pp. 1083-1099.

Goodman83
J. R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic," Proc. 10th Annual Symp. on Computer Architecture, June 1983, pp. 124-131.

Kaplan73
K. R. Kaplan and R. O. Winder, "Cache-Based Computer Systems," Computer, March 1973, pp. 30-36.

Meade70
R. M. Meade, "On Memory System Design," AFIPS Proc. FJCC, vol. 37, 1970, pp. 33-43.

Norton82
R. L. Norton and J. A. Abraham, "Using Write Back Cache to Improve Performance of Multiuser Multiprocessors," Proc. 1982 Int. Conf. on Parallel Processing, August 1982, pp. 326-331.

Patel82
J. H. Patel, "Analysis of Multiprocessors with Private Cache Memories," IEEE Trans. Comput., vol. C-31, April 1982, pp. 296-304.

Rao78
G. S. Rao, "Performance Analysis of Cache Memories," J. ACM, vol. 25, no. 3, July 1978, pp. 378-395.

Smith82
A. J. Smith, "Cache Memories," Computing Surveys, vol. 14, no. 3, September 1982, pp. 473-530.

Strecker76
W. D. Strecker, "Cache Memories for PDP-11 Family Computers," Proc. 3rd Annual Symp. on Computer Architecture, January 1976, pp. 155-158.

Tang76
C. K. Tang, "Cache System Design in the Tightly Coupled Multiprocessor System," AFIPS Proc. NCC, vol. 45, 1976, pp. 749-753.

Yeh83
P. C. C. Yeh, J. H. Patel, and E. S. Davidson, "Shared Cache for Multiple-Stream Computer Systems," IEEE Trans. Comput., vol. C-32, January 1983, pp. 38-47.

Yen82
W. C. Yen and K. S. Fu, "Coherence Problem in a Multicache System," Proc. 1982 Int. Conf. on Parallel Processing, 1982, pp. 332-339.
