`
`h HAHDWAREKSGFTWAHE APPROACH
`
`G~l[u-L
`\'itt:Lt •
`
`App a, Inc. at 3.1!.
`MEI‘I‘IDW Integrity, LLC
`
`
`
`IPR2fi15-00159. N151,
`N153, M1TZ
`EXHIBIT
`
`Memam lnteHrit - 2002
`
`1
`
`
`
`Senior Editor
`Dlrtctor of Produclion and
`Manufacturing
`Senior Production Editor
`EdJtort.al Coordinator
`Cover Design
`Cover Photo
`Tel'(.t Design
`Co.pyedilor
`Proorreaders
`
`C0<mpositor
`tllustr.ltors
`Indexer
`
`Denise E.M. Penrose
`Yon.le Ovenon
`
`Elisabeth Beller
`Meghan Keeffe
`Martin Heinkuji Graphic l>esign
`lmag;e copyright C> 1998 PhotoDlse, Inc.
`Mark Ong, Side by Side Studios
`Jennifer McCJ1in
`Jenni( er McClain, Jeff Van Beurcn,
`Ken Della.Penta, Christine Sabooni
`Nancy Logan
`Nancy Logan. Danmouth Publishing. lnc .• Cherie. Plumlee
`Steve Ralh
`
`Designarlons used by comp.antes to dlstlngutsh their products arc often claimed as trademarks or regis(cid:173)
`lettd trademarks, ln all i~nces when Morgan Kaufmann Publishers, Inc. is aware of a claim, the prod·
`uct names appear ln lnltJal caphal or ant caphal 1etten. Readecs, however, should contact the appropriale
`companies for more complcle infonnation ngarding ~emarks and registration.
`Morgan Kaufnwtn Publishers. Inc.
`Edilorial and Sales Office
`340 Pinc Strttl, Sixth Floor
`San Francis<>O, CA 94104·3205
`USA
`4151392-2665
`Telephone
`1151982-2665
`Facsimile
`mkp@mkp.com
`htcp:/twww.mkp.com
`WWW
`Order toll free 800/7i:;-7J23
`Advice, Praise, and Errors: Any correspondence related to this publication or intended for lht authors
`should be addre:ssed lO the fdltor1al and Sales Office or Morgan Kaufmann Publishers, Inc., Dept. PCA
`APE or st:nt electronically to pca@m,.p.com. Please repon e.nor$ by email to pcabugs@mkp.com. Plea~
`cheek the errata page at hup://www,m•p .. comlpc:a to see if the bug has already 'been reponed and fixed.
`Cl 1999 Morgan Kaufm.tnn Publishers. Dnc.
`All! rig)>ts rcsaved
`Pri:nted ln the United States of America
`·rr.-ns!CJY~"Xi 10 Oigi1id Pl'inting, 20 11
`No pan or this publica1ion may be reproduced, stored in a reuieval system, or transmitted in any form or
`by any means--elect.r0n.ic, mecltanlcal, photocopying, recording, or otherwise-without the prior written
`pennission of the publisher.
`~rtni$.~inni. .. nay hi.: sous.ht directly fmn1 Elsevief'.s Scicoce iirnl Tci;.lu1C)logy Righti Dc1)r1nn'lent in
`Oxrl'Olfd, UK. Phone: f44) 186S 8.43830, Nx: ( 44) 1865 853333, e-m~il: ptnniss:ion~@el~..:v icr.co.uk .
`Y0ti 11\il)' nlso rump!«~ y<lVr NQ\!~~~ ()!!•lir~ \lia th~ Etscvi4.'r homepage: http://wtA•w.elsevier.com b)'
`scloc1i11s "CustomCf Support" ;,nd t11<n ~01'1:1.inlnJ ~1ni1'Si()ns•.
`Library of Congress Cataloglng.-i:n·Pub:licatton Data
`Culler, David E.
`Parallel computer archilec:lure: a bardwarc/softWf.rt. approach I
`David E. CuUtr, Jaswlnder Pal Singh-, with Anoop Gupta.
`p. cm.
`Includes bibliographical references and index. • •
`ISSN· IJ; 978· 1·5586-0·343-I
`lSBN-10: 1-55860-343-3
`l. Parallel computers. 2. ComputCT architecture. l. Singh,Jaswinder Pat 11. Gupta, Anoop. Ill. Tide.
`Q.'.76.58.C85 1999
`004'.3,......c21
`
`98-28034
`CIP
`
`2
`
`
`
`5.1 Cach< Coh<l'l:nct 279
`
`The check to determine iF a bus lronsac1ion is relevanl 10 a cache is essentially the
`$amt tag match that is performed ror a request from the processor. The action taken
`may involve invalidating or updating the contenlS or state or that cache block and/or
`supplying the latest voluc for that block from the cache to the bus.
`A snoopy cac.hc cohcrcnoc pt0Locol 1ies 1oge1her two basic. facets of co111pulcr
`archilecture th.al are also round in uniprocessors: bus 1ranssctions and Lhc state trnn(cid:173)
`silion diagram associated witlt a cacl1e bloc.k. Recall that the first component-l hc
`bus 1ransac1ion--consists of Lhret phnm: a1·blrrndon, cornmand/address, ond data.
`In the arbitration phase. devices thal desire 10 i·nitlate a transaction asser1 their b us
`request, and the bus arbiter S<lects one of these and responds by aSS<rting its grant
`signal. Upon grant. the selected device pince.~ th e command, for example, read or
`write, and the associtlted address on the bus command and address lines. All <levicts
`observe the address and, in a uniprocessor, one or them recognizes that iL ls rcspon 4
`slble [o r lhe parllcular address. For I read tmnsac1ion, the address phase is [allowed
`by data transfer. \Vrite transactions vary from bus to bus Pccording to whether Lhe
`data ls lransferltd during or after the address phase. For most bu.scs, • m;pondin.g
`device can assen a wait signal to hold off the data transfer until it Ls ready. Thls w.alt
`sigiul is different from the 01her bus signals because it is a wired-OR acrQ$$ all the
`processors; that is, it is a logical l lf any device m.sens iL The initiator docs not nc.ed
`to know which responding devie< is panlcipating in the mmsfer, only that there Is
`one and whether ii is ready.
`The st:rond basic fac<:t of compultt archhect.ure leveraged by a cache cohere.nee
`protocol is that each block In a uniprocessor cache has a state associated with It,
`along with the tag and chta, which lndlco1cs the disposhion or the block, (e.g.,
`invalid, valid, dirty). The cache policy is defined by Lhe cache block state transition
`dltigram, \Vhich is a finite St.ale n1achlne Specifying hO\V Lhe dispoSilion or a blotk
`changes. Tra1isitions ror a cache block occur upon access to that block or to an
`odd rtSS Lhat maps to the san1e C'Jch e lint as 1h a1 block . (\Ve refer to a cache block n:.
`1he actual d>l.ta, and a line us the fixed s1or11ge 111 the hard,vare cache, in exact ar1al·
`ogy whh a page and a page fran1e In main n1cn1ory.) While only blocks that are actu·
`ally in cache lines have hard\vare srace lnrormat.ion, logically, all blocks 1hat are not
`residen1 in the cache can be viewed as being in ehher a special '"not presenf' state or
`in the "'invalid" St.ate. In Q uniprocessor system, ror a Vlrite·through. wrlte-no-(cid:173)
`allocate cache (Hennessy and Patterson l996). only rwo states Pre rtquirtd; valid
`and invalid. Initially, all the blocks are Invalid . When a processor read opuauon
`misses. a bus cransaaion 1$ generated to load the block from memory and the block
`is mark..! valid. Writes gen<rate a bus traruactlon to updote memory, and they d so
`update the cache block if it is pl't:Knt in the valid state. \Vrit.cs do not change the
`S<ate of the block. If a block Is replac.d. It may be marked invalid until the memory
`provides the new block, whereupon It becomes valid. A write-back cache requires an
`additionaJ state per cache Hne, lndicnting a "din y• or modifie.d block.
`In a multiprocessor syste:rn, a block has 11 St ale in each cache, and these cache
`states change according to the s 1ote 1ranshlon diagram. Thus, \VC can think of ll
`block's cache state as being a vector or p s101cs Instead of a single state, \Vht:rc p is t'he
`number or cach es. T'hc cache stntc is 1nnnipulalcd by a set of p distributed finite state
`
`3
`
`
`
`280 CHAPTER s Shared Men1ory l'o.i1.1hiprOce$$0rS
`
`machines, implemented by the cache controllers. The state machine or state transi(cid:173)
`tion diagram that governs the state changes is the same for all b1ocks and all caches,
`but the current scate of a block in different caches Is different. As before, if a block i.s
`not present in a cache we can assume it to be in a special "'not presenl" state or t \'en
`in the invalid state.
`ln a s nooping cache coherence scheme, each cache controller receives two sets of
`inputs: the processor issues memory requests, and the bus snooper informs about
`bus transactions from other caches. In response 10 ci1hcr, Lhe controlJer may update
`the state of the appropriate block in the cache according co Lhe current s tate and the
`state transition diagram. It may also take an action. For example, it responds 10 the
`processor with the requested dal'a, potentially generating new bus crnnsactions to
`obtain the data. le responds to bus 1ransactions by updating its state and sometimes
`intervenes in completin.g the transaction. Thus, a snooping prorocol is a distributed
`algorithm represented by a col1ection 0£ cooperating finite state machines. It is spec(cid:173)
`ified by the following components:
`
`•
`•
`
`•
`
`the set or states associated with memory blocks in the local caches
`the state cransition diagram, which cakes as inputs the current State and the
`processor request or observed bus transaction and p·roduces as output the nexl
`state for the cache block
`the actions associated with each state Lransltion, 'vhich are determined in p3rt
`by the sci of feasible octions defined by 1he bus, the cache, and the processor
`design
`
`Tl1e d iJferent state machines for a block are coordinated b·y bus transactions.
`A simple invalidation-based protocol for a coherent write-through. writc-no(cid:173)
`allocate cache is described by the state transition diagra1n in Figure 5.5. As in the
`uniprocessor case, coch cache block has only 1wo SIJUCS: invalid (I) and valid (V)
`(the .. not present" s tate is assumed to be the same as invalid). The transitions are
`marked wlth the input that causes the tr.l.nsition and the output that is generated
`with the transition. For example, when a controller sees a read from its processor
`miss in the cache. a BusRd transaction is generated, and upon completion or this
`tr11nsac1ion the block rmnsirions up to the valid state. Whenever the controller sees a
`processor wri1e to a location. a bus transaction ls gcneraled thal updaces ch.at loca(cid:173)
`tion in main memory with no change or State. The key enhancement to the unipro(cid:173)
`cessor state diagram is that when the bus snooper sees a v.'lite transaction on the bus
`for a memory block that is cached locally, the controller s.ets the cache state for tha1
`block 10 invalid, thereby effectively d.iscardlng its copy. (Figure 5.5 shows this bus(cid:173)
`i_nduced transition with. a dashed arc.) By extension, if .any processor generates a
`write ror a block that is cached by any or the Others, all of the others \"\'ill invalicla.tt
`their copies. Thus, multiple simultaneous readers or a block may coexist without
`generating bus transactions or invalidations. but a write will eliminate all other
`cached copies.
`To see how this simple write-through invalidation protocol provides coherence,
`we need to show that for any execution under the protocol a total order on the mem-
`
`4
`
`