`
`Reference 30
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 1
`
`
`
`Harvey G. Cragon
`
`
`
`( • r_
`
`•·;
`
`,
`
`' ,, I T ·
`
`-•~ I, I
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 2
`
`
`
`Memory Systems
`and Pipelined Processors
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 3
`
`
`
`Jones and Bartlett Books in Computer Science
`
`Arthur J. Bernstein and Philip M. Lewis
`Concurrency in Programming and Database Syste:rns
`
`Robert L. Causey
`Logic, Sets, and Recursion
`
`K. Mani Chandy and Stephen Taylor
`An Introduction to Parallel Programming
`
`Harvey G. Cragan
`Memory Systems and Pipelined Processors
`
`Michael J. Flynn
`Computer Architecture: Pipelined and Parallel Processor Design
`
`John Gregory and Don Redmond
`Introduction to Numerical Analysis
`
`James Hein
`Discrete Structures, Logic, and Computability
`
`E. Stewart Lee
`Algorithms and Data Structures in Computer Engineering
`
`Christopher H. Nevison, Daniel C. Hyde, G. Michael Schneider, and Paul
`T. 'lymann, Editors
`Laboratories for Parallel Computing
`
`Charles Van Loan
`An Introduction to Computational Science and Mathematics
`
`Henry M. Walker
`The Limits of Computing
`
`I
`
`rd
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 4
`
`
`
`'
`
`I Memory Sy te
`I a d p·pel ed P
`
`
`
`I
`
`I
`
`Ii
`
`I I
`
`11
`
`I I
`
`•
`
`:
`
`I I I I
`
`1·
`
`I I
`
`I
`I I
`
`11 I I I
`
`I
`I
`
`I
`
`I I I I
`
`I I!
`
`'
`
`II ,I: I 1
`
`I I I
`I
`
`I
`
`11,
`
`i
`
`I
`
`1
`
`11
`
`'1
`
`I
`
`I'
`
`I
`
`'
`
`._.._-- ............... ............... _ . , . ._ . - - -- - - -
`
`
`H•~·~ G. Cragon
`
`I 'r•, 'IJt,IVINSity of Texas at Austin
`
`11
`
`'
`
`Joan and :Bartlett
`Sudbury, Mau.achusetts
`
`London
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 5
`
`
`
`Editorial, Soln, and Cu~r &rvice Offitt•
`Jonet a,nd Ba.rtlet.t Pu.bliahen
`One Exeter Plua.
`&.ton, MA 02116
`1-800-832-0034
`617-859-3900
`
`Jonee and Bartlett Publiaben lntematlonal
`7 Melr-o9e Terrace
`Landon W8 7RL
`England
`
`Copyright© 1996 by Jonea and Bartlett Publiabc:ra, lnc.
`
`All right.a reNrnd. No part of the material protected by this copyright. notice
`may be reproduced or utilized in any form, electronic OT mechanical includiq
`photocopying, recording, or by any information atora,p and retrievaJ •yatem,
`without written penniuion from the copyriiht owner.
`
`Tr.demarked producu mendoned in thi:a book a.re listed on pqe 576 .
`
`Ubrar,- ol Coa.,re. CataloSlnl-ln•PubUcaUon Data
`
`Crapn, Harvey G.
`Memory aywtema and pipelined proceeaon / Harvey G. Cragon.
`cm.
`p.
`lncludee bibliographi.cal refe:re-.ncea and index.
`ISBN 0-86720~74-5
`1. Compaten, Pipeline. 2. Computer atorage devices.
`QA76.6.C68 1996
`004.25'6-dc:20
`
`l Title.
`
`95-22165
`CJP
`
`kt/uulioM Editor: Carl HealeT
`~ t iM Editt,r: Anne 8. Noonan
`Man.ufacturifl6 Bu~,: Dana L Cerrito
`EdiJ,orlal Produdion &rvice: Supe:ncript EdJtorlaJ Production &rvi.cee
`' /y~.- &anw•y Oraphica lntem.atiolUll
`Printi,w and Bwlin4: Braun-Brumfteld
`Cover D.•iln: &th Santo.
`Cawr Printing: John P. PO'fl' Company
`
`Printed ln the Urut.d 8t.t. oJ America
`99 ~ fY1 G6 g6
`10 g 8 7 6 &
`
`, 9 2 1
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 6
`
`
`
`Contents
`
`J Memory Systems.
`IA Ow>!YJtY
`I
`1.J M'twory Ultlnty
`4
`1.1.1 HianlN:bieAI. Memory Systema
`l.t.2 Mamory Efficicny
`7
`L& Al£mo!r Banchridth
`10
`J.S QUON
`13
`IA 8umman
`It
`
`6
`
`15
`M Ottcxlt:w
`15
`S.J Ou.be ?1@niratioa IUld P'lm!ment PoUdet
`&J 1 Sert todu 61Jooetioo
`%7
`t.U !£t,l!a,}y s.i.ct Cach..
`29
`1.13 Control Bilal Tllp, Vdiii, J?irt!, Shared anci
`I.RH
`31
`S2
`.1.1.t Nonblocking Cathee
`Beed &1idn and Pnfmmeore MMeb
`1.1.1 Trwpart; Time.
`S7
`I IO Ctc:be: Min Chor:adtri«lkt:
`t.tA Sourtteof'Vin end Htt Dnh
`1.2 r4 &kb Poliriea
`42
`
`19
`
`y
`
`U
`
`ftf
`
`i31l
`41
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 7
`
`
`
`45
`*Jl,6 Write Pollriell
`2..U ag,i...m.n, Polley
`54
`P: 17 Alotk S\:tc O.ntidrmtione
`
`58
`57
`
`60
`
`81
`2,81 Jo«trw:t:ktll Rnff't:re
`tut.ruction 9u!µf!S.
`2.3.2
`62
`UA Ioatzuctioo Corbet
`AA
`2 8 :f PIII.Nkbed [bfioK!tiM C&:bee Hh Omre
`
`f)R
`
`M
`
`...
`
`3.7
`u
`
`.LU Cont.mpon.ry Del& Cacbee-
`1.U D■ta Que.ua
`78
`13
`Unifitri C.d>et
`
`91
`Molt.ilnd C«bee
`Z.7 I Publiehed Pedt1nnanttt ::Ptrt
`
`73
`
`tu
`82
`
`9◄
`
`W
`Compilg Optimization
`107
`&&I Pipelined ()acl,..
`
`107
`
`3 Vinual Memory
`100
`AA Ouaie:«
`1Qfl
`8.1 Virtual Memory Caoc:epta
`8.2 ~ M,p';'\t)g
`117
`a.1.1 Cre•tul( t..np. Virtual Adclreau.u
`
`113
`
`-3.S.3
`lfgjetar Exbmaioos
`UA Summary
`120
`S.8 Vi:rtual Mrtmory Orga.niaa&iona
`U.l Pap Allocation Systema
`SJU S&gmwted e,.1ema:
`1.8.3 Paged Segme.n.taticm
`84 Trwn&Jation IookuldeBum,,...
`M.l TLB O!'J!"isatioCI
`165
`174
`341 TI,RM1eBot10D11a
`IA.S TLB lmpo,d oo Vlrtual Memotl Perf"onnanm
`lnat-ruetion Support for n.B■
`MA
`l77
`
`120
`
`llil
`128
`
`1&3
`166
`162
`
`us
`115
`
`""
`
`177
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 8
`
`
`
`Contents
`
`vii
`
`178
`
`I.I iir:i!alll Mtm.2a ~m;111io1 8~!1
`128
`34] Beed Accea11ea
`8.G.2 Allaitiim
`UlO
`aaa !idle Atrz111Ci
`UM
`aa~ I~Hao 0CXablt11
`184
`U.4 'hl,Je W•lld1:11 M.eL.hoda
`1116
`LU lnsttucticm Set Sunwrt for V'a.rtuaJ M•1D01T
`8.11.7 Me, ., , v Acoeu Omttol
`169
`u.a Choice or Paae 81:re
`t9S
`U.9 -~lh<DI>~
`196
`8JI V!ttual Me,non, P - - la,u,. ...t MMm
`199
`IA.I PubUllhed P!@ Fa-oh Ratio O.i.it
`
`18'i
`
`18'1
`
`'-JA Rcu1&.d1:ted ~Clutl Cattb.ea
`
`<.1.4 - 213
`
`4.U TLB AddnJ11in1
`
`:IOL
`
`llll3
`
`Zl6
`
`216
`
`227
`229
`
`.. Memotv: Addressing and 1/0 Coherenc~
`,o Oi!taMO«
`20:J
`,U Cadat Addn=alDS:1 Vlrt.uAJ oi" Real
`~ ... ~rtnAl 6ddai:aa C•tba
`f,.1) Beel Addmn Cecbe,
`?A4
`41.l Pi2tlined Real C..ehu
`206
`20ll
`212
`... l/0 S~m Addnesing Dall.ii!! Juun
`4.2.l Cache CobUilll~
`217
`42.2 8b002! (:onuolJE!fat
`220
`u.s DMA I/0 Contli!!r&tltma
`222
`,,, Otbtc Owtldtttlk>nt
`4M ~ r Control (Mr Snooe:r: Coain>lkn
`227
`U.l Memw v MeD!Md IJO m
`42.2 Cache Coheren~ Prot.:,cola
`•.a.a Snooe!!f: on Write Q\1,;0N
`4.8..C Swn?tUU"Y
`230
`
`'281
`
`II ln1:erieaved Mamo~ and Disk S1stems
`11,Q nv .. ..,1m
`23J
`Lt 1nwi..ve<1 aa.m..,,
`231
`5JJ &d'1m•oc• M11d,el1
`1.U Reduda1 the EtTec:t 6" StriW,i, l
`237
`241
`us
`5.1..C Tnblrlb!!!!I fur Miultii!?.Q As,cea
`2.49
`5.1.4 Catb.-Mcmo!)! Tranafcrt
`5.1.1 Nibble1 Pan I and Sutic Cc.lu.m.n Mod• l>rama
`
`..... Su(!!rinterldvin1
`
`'""'
`
`226
`
`1IOO
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 9
`
`
`
`U
`
`1..1.L Widt·Word lnt.vleaw.d ModnlN
`.Oist htems
`262
`JS:1
`l .l.1 Diak Requlremis,,l,a
`6.ll llilbmF"el/Oand.,.,...,,..._
`6.2.8 Diak Arrtyt
`272
`6.lt.◄ Diak Cacbos
`283
`
`151
`
`-
`
`6 Ploollned F'rooesaors
`287
`Qv:eryle:,1
`PiptU.,,. Porl'orm.lUICe Modakl
`
`281
`
`290
`
`298
`
`u.J Plpe.li.aa Emcie.ocy
`Pfpt,llo.e PartlUo-rdhq
`901
`
`300
`
`.... Ot.be-r :Pipeline Pert'Ol"mallff Mod.eta
`•.s ...
`
`U.J ~ 'n.at AddrwMemorr
`I endf8tor, Pm•,onn
`311
`&A2
`6.U Synehroni,;at:ion Summary
`
`3l5
`
`900
`309
`
`1 Sr-enctllflg
`317
`10 PYtr!k!)f
`312
`1.J Moocliog Tedu:uguo
`S:18
`?.I Pipenne PrteP Sl.tittp
`7 .. '1 Ptw.i~n StrJt(!ll(lf!I$
`32)
`7.3.J Static: Pradiction 9trat.egil!:il
`324
`V .3.J- Semi&t•tie Predict.On St;ra'-'@ie:t
`:3,26
`1.3.3 OJuamie Ptedieiion 81:tat.@
`32'1
`ln.tlruc:don Sequence Alter.a.lion Stn.tgiell
`"IA.I Delayed Bruch BtN.WO:
`3S9
`341
`'7.4.2 Pela~ Dra.nch '#1th Ahandt1n
`7.4..:S Dol11td .Bruch wilh F&Att CocnJ)tlrt)
`l5ra.nch Spreading with Coaditioo Cod.ea
`1 AA
`'7.6 Vet.ch M.llltipl&.Pa.th■ Strategi&a
`344
`741 Fekh Two fatba
`346
`7.5.2 Petch All Pat.bs
`-3,f,7
`1.15 .. 1 Ba.rtlwet8.lu!gilU"l!tDe.ou Mi
`
`319
`
`339
`
`.'i:.t3
`
`343
`
`7.•
`
`, PeoendftOGiel
`
`351
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 10
`
`
`
`Ca111anu
`
`ix
`
`II.I
`
`...
`
`••• ••• H.1
`
`358
`U.1 Trui 0.f'!Cdr-oc:iu
`l.s.l Rwutot O.pandeid.•
`;176
`8 9 :C Pthtr Malbodt
`..380
`3&
`Menwry Oepmdanciea
`U .1 Anti• and Output Ot>l)t!NlenciH
`UJI Tr1.1• O.ptnde11Nt!
`3~
`8.3.3 Lo■d Uypatl
`381
`i!..Y !l.ll-Modityio1 Code: f,,.. b.i,eoden.cl ..
`Bli•I e.::.._., JJ<r.:nd•nclff
`r'18t1
`115 MJl&NMmcS!unt
`IIIIO
`
`3M
`
`Ptrfwrnenre Yodtlt
`Nil
`401
`Sw:nm111 Camm.toil
`•tri
`Blr1.1c.~••tt• Dltpnnd1!.nd11•
`&U Sinclo-Pw,.UOC Plpelln•
`.1.03
`a.7.1- Mu.lll,Punttkln Dyn.v.,lci J>&ptllntt
`
`406
`
`• E,ca1ptlon1 and Interrupt•
`,'2&
`W10 Ovll"Ylt'l'I
`0, J Prt1tiM1 lnuimapt,
`tlflt
`P.l,l
`lntamrot.td Jn,tructlon qnd 8.yed Promm
`Coun1..tr
`i-38
`8.t .a
`fl11ntllln1 Prt.'t\!dlAJC lMl.l'Uc\k>o•
`e.1.1 Ha..llc1llo• re110_.i.n, 1,u.tt\lcti,,n,
`MormaIJ,e ind Coll of Preclao (nterrup1..11
`Pr.idM! fnl.c!ff'\111\· ~ 11\M\o::n tt
`tl6~
`
`48'1
`
`,n
`
`ID Enhancad lrnplomantatlon1
`JOI) Ovealtv
`◄61
`10..I 8upill'!ll'flliMtl Pr;w-p10P1
`,U
`10.l,1 E.Jement.uy p.,.ror:m11nce t.fode:•
`10:1.1 A Beflntd Model
`171
`10.1.1 Sup&rt1:lpelln11 O..itn luu~
`475
`10.1.4 CUffl!nt. Examplaot&perpipelininc
`415
`JO.S Supen1calar PN>celil80rs
`l Ul '&14!men1Ary' Pe:rformaMa Mod.I
`
`469
`
`418
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 11
`
`
`
`1~ Su.per11C1llar Duaip laau•
`'61
`J.Q.U: Supv,wia,r- ltrtnchlna A1.ra111:ljin
`LO.S.4 ~n e y Deiectitnl and .Reeolution
`10.I P,rti'-rmt@ Compyblop
`◄9:4
`J.0.. 9UP!U!£!I..,- 9cU1'!:n,lpelinocl ~ , 4S°7
`= PanUetism
`JD :4-1 P
`◄97
`I0A-2
`lnMtrlim:l,mil P1talleli1JP
`r Performance
`IQ ::t- 8 Pl
`f,M
`10.-6 -V~ ~ l ru11.ru¢oo Word (VLtw) P~.,.,.
`
`◄!Kt
`
`491
`
`492-
`
`50'2
`
`007
`
`61◄
`
`t 7 Yodm PmcnSora
`I I O Chrvxirw
`607
`fi09
`l U
`V11c.tm- Operatioos
`510
`11.2 Memoey•Prooi'!S80r Lnte·rfs.ce
`ll.2..1 Me,nory-c..Memory ~ 5 U
`lJ t 2 I owt/Soce A.n·Nwrutt Ma
`11:3 Ytdm P«rfcmnnDA
`51:t
`lLS.1
`.ANl!\Ument bj Vector lmirurtion Latency
`lit.reaming Rate
`LI.JU.
`616
`5-18
`lLlU .Asjeasni.ent. by MFLOPS
`11,M ~n i by Half Jwl'.:itui.4llef!c
`l1 3 & v ... :::::c 8 eler O.le v-.., Problem
`~noa &aa
`U.5 Matt Operadoos
`5S2.
`l I .8 Dllpl!ndi!.nrie$
`~'>3♦
`lntrtl.9t.\4mlc,n~ ~nd«=ticie.
`11.8.1
`11.6.2
`lo~tatemunl Oepende.mdei1
`lUU Clwnuy!
`636
`111 S..orbrneda Ptdirouanu W
`
`-520
`fil?l
`
`5.$4
`5:,S
`
`B.efACences Ml
`
`Subject lodel<
`
`U7
`
`Pror-euocs Index
`
`Tmdemaru
`
`516
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 12
`
`
`
`Preface
`
`Every difficulty slurred over will be a ghost to disturb your
`·
`sleep.
`
`- Chopin
`
`The greatest difficulties lie where we are not looking for
`them.
`-
`Johann Wolfgang von Goethe
`
`The widespread demand for high-performance personal computers and
`workstations has stimulated a renaissance in computer design. VLSI
`permits many millions of transistors to be placed on a chip that will sell
`for a few hundred of dollars. The two forces of demand and low cost
`combine to stimulate innovative designers to revisit old schemes and
`devise new ones to satisfy the ever-increasing demands for more perfor(cid:173)
`mance. The performance of microprocessors has increased at a compound
`rate of about 30 percent per year over the last decade. Positive feedback
`is supporting the demand for more performance; new software systems
`require more processor performance, while more processor performance
`permits more complex software systems. Many of the techniques dis(cid:173)
`cussed in this book had their birth in mainframe and supercomputer
`designs. However, with the success of the microprocessor, the architec(cid:173)
`ture of large mainframe class computers is being brought into question.
`This book describes and discusses the design details of high-speed
`memory systems and pipeline processors that underpin the modern mic(cid:173)
`roprocessor. I undertook the project of writing this book for several
`closely related reasons. First, I believed that there was a need for a
`graduate level course on the nitty-gritty details of computer system
`implementation and I could find no suitable text. A compendium of the
`implementation techniques used today by many designers has not been
`collected into one book. In addition, I wanted the experience, over a
`xi
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 13
`
`
`
`xii
`
`Preface
`
`of collecting reading and interpreting the litera-
`· d f
`,:,,
`,
`,
`per10 o a 1ew years,
`.
`.
`ture in this important field.
`While reading the early literature on processor 1m~lementation, I
`found that the implementation techniques used by the pioneers are con.
`tinually recycled into new designs. For e~ample, q~eues, first used in
`some of the very early machines, are bemg us~d m pr~cessors such
`as the Pentium and PowerPC 601. Although this book is focused on
`uniprocessors it is impossible to ignore the problems of coherency and
`synchronizati~n, problems usually assoc~ated wit~ parallel pro~essors.
`The material presented in this book 1s found m the open ht~rature
`and in manufacturers' processor manuals. I have not used mat~nal that
`is unavailable to any student or researcher; howev~r, the ope~ literature
`contains much duplication and extraneous mate~1al that this_ book at(cid:173)
`tempts to eliminate. On a discouraging note, much 1mplementatI~n detail
`of contemporary processors is not revealed by manufacturers without a
`nondisclosure agreement. Such an agreement is not possible when writ(cid:173)
`ing a textbook and thus there are unfortunate gaps. Because of competi(cid:173)
`tive pressures, a manufacturer will disclose an overview of a processor
`in a conference followed later by various product manuals. The details
`of a processor unfold over a period of time after its introduction. For
`these reasons, some of the relevant details of a processor may be missing
`or incorrectly described in this book.
`Because of the close tie between the memory and the processor and
`the commonality of a number of concepts, I believe that these two sub(cid:173)
`jects should be treated together in one book. This text is intended for a
`one-semester graduate course for students who have either experience
`in processor and/or memory architecture or who have taken an under(cid:173)
`graduate course in computer architecture or organization. The student
`should be conversant with the general ideas behind memory systems
`and pipelining. This book assembles the relevant information on memory
`systems and pipelined processors, and the references will lead a reader
`to the historical and contemporary literature on the topics discussed.
`This book examines the broad sweep of design issues and their historical
`precedence along with extensive references to specific topics. Care has
`students can
`been taken to reference the first mention of a topic so
`gain an appreciation of the contributions made by the early researchers
`and understand the development patterns that lead to the systems of
`todar Unfortunately the search for a first reference to a given topic has
`not, m all cases, been successful.
`Many advanced computer texts used in graduate courses cover a wide
`spectrum of topics in only medium depth. I have attempted to dig a
`· d stry
`·
`l
`·
`small deep hole rather than a wide shallow one Pro,:,,
`1ess1ona s m m u
`.
`.
`h
`s ould find this text helpful as it presents an in-depth look at the many
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 14
`
`
`
`Preface
`
`xiii
`
`details of memory and pipelined processors that are not found in a single
`text.
`Although the two topics of memory systems and pipelined processors
`are closely related, this text starts with memory systems based on the
`assumption that students have sufficient• background in pipelined pro(cid:173)
`cessors to appreciate the need for low latency and high bandwidth. For(cid:173)
`ward and backward references point to more detailed discussions of a
`topic that is being treated lightly.
`This book addresses only a few of the design issues of multiprocessors.
`The student needs a firm understanding of uniprocessor design before
`multiprocessor design problems can be effectively addressed. Neverthe(cid:173)
`less, this book will, in several places, note the similarities between the
`problems of overlap in memory systems and pipeline processors and the
`problems with multiprocessors. Thus, the student, having been exposed
`to these problems in the uniprocessor domain, should be better able to
`understand and pursue research in multiprocessors.
`Designers strive both to create processors that can execute programs
`in the shortest possible time and to achieve a balance between the speed
`of the processor and the available bandwidth and latency of the memory.
`This goal was important with the von Neumann architecture and is
`important today with the most recently announced RISC processors.
`Interleaved memory, hierarchical memory, and pipeline processors were
`first employed in high-performance supercomputers and mainframes.
`With the continued increase in circuit density available in VLSI devices,
`these memory systems and piplines have been used in microprocessors
`and are commonly integrated on the same chip. Thus, it is appropriate
`to combine in one book the salient design issues of memory and pipelines.
`A recurring idea in this book concerns the abundant use of tags and
`control bits in the internal structure of the processor that are not a part
`of the instruction set architecture. In nonoverlapped processors most of
`the processor resources are explicitly controlled by the executing instruc(cid:173)
`tions. With overlap, additional resources are added to the processor that
`are not visible to the programmer. These resources, and the data they
`contain, are controlled by tags of various kinds [MALI89]. Tags perform
`a number of functions : They identify and route data, indicate validity,
`provide protection, and perform some control functions. These tags and
`control bits are usually not program accessible and contribute to the rich
`context of a process that must be managed, preserved, and, in some
`cases, carefully destroyed.
`One of the goals of this book is to explain the techniques for managing
`the resources that are not visible to the program. Another goal of this
`book is to provide not only narrative descriptions of subjects but also
`analytical models and examples that can be used to make design trade-
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 15
`
`
`
`XIV
`
`Preface
`
`offs. Published peformance results are compared to the models' results
`when it is possible.
`
`The first section of this book covers memory systems. Chapter 1 ad(cid:173)
`dresses the two design goals of a me~o~ system: low laten~y and high
`bandwidth. The approaches to providmg these characteristics, hier(cid:173)
`archical and interleaved memories, are briefly addressed. Chapter 2
`starts the description of hierarchical memory by describing cache tech(cid:173)
`niques and specific cache designs. Cache design issues, such as organiza(cid:173)
`tion and performance, are described with illustrations of specific im(cid:173)
`plemented cache. Chapter 3 addresses virtual memory, giving a
`rationale for virtual memory, organizations, and performance tradeoff
`issues. A number of references are made to operating systems; however
`'
`detailed discussions of operating systems are outside the scope of this
`book. Chapter 4 discusses the complementary issues of cache addressing
`with either real or virtual addresses and coherency maintenance in all
`memory spaces of a computer when input/output is involved. Chapter
`5 completes the treatment of memory with a discussion of interleaved
`memory and disk systems. These two topics are important in the total
`memory hierarchy and virtual memory.
`The second section of this book covers pipelined processors. Chapter
`6 describes the general notions of pipelining with pipeline performance
`models. Partitioning, clock distribution, and atomic instructions are some
`of the issues discussed. Chapter 7 describes the issues involved in
`branching. A major detriment to pipeline performance is control transfers
`or branches. Branch strategies for pipelined processors and their perfor(cid:173)
`mance models are discussed. Chapter 8 addresses the issues of dependen(cid:173)
`cies, conflict detection, and resolution. Pipelined processors operate with
`a high degree of concurrency that leads to potential dependencies and
`conflicts. These problems require special design considerations in order
`to preserve the sequential model of computation. The chapter describes
`all of the techniques known to me for solving these problems. Chapter 9
`describes the issues of exceptions and interrupts in pipelined processors.
`Solutions to the precise interrupt problem can hurt the performance of
`dependency resolution solutions. Thus, the design of these two facilities
`is closely integrated in real machines. However, for expository reasons,
`these two subjects are treated separately with references between the
`chapters.
`Chapter 10 addresses the topics of superpipelined, superscalar, and
`very long instruction word processors. Performance models are pre(cid:173)
`sented to assist in the evaluation of design alternatives. Chapter 11
`assembles many of the topics of pipelining found in vector processors.
`The rational for vector processors is discussed, along with performance
`models and benchmark performance of real machines.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 16
`
`
`
`Preface
`
`xv
`
`As noted previously,
`text combines both the historical and contem-
`porary ideas on memory systems and pipelined processor implemen(cid:173)
`Undoubtedly, I have missed some relevant information and some
`of the material has been misinterpreted. For those errors of omission
`and commission, I take full responsibility.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 17
`
`
`
`1
`
`Memory Systems
`
`1.0 Overview
`Th.ere are tnRJlY models t.hnt iu-e u~ed to evnltrotc lhe pcrformanoo of a
`pr'OC(?saor. Ono of tho most i.nsightiul. the model for the time required to
`execute a task, is
`
`wk Lime .. nwnoor of instnl.ctioM oxecuted >< clocks per instruc,;ion
`(CPI) x clock period.
`
`'fhe n1;1moor ur in~Lruct.iutlli executed la usuallj ll f\JJittlion of the to•
`.struction set design and is outside the scope of thil!, book, but the second
`and tbl-rd tenns are the subject of this book. Toe design of the me,cnc>ry
`8}'Stom and tho pipeline, proccsaor have a direct. bearing on tbe CPI tl1ot
`ai.n be raali:red f~rn a proce:s!ior. A$ will be pointed out, there are CPI
`desi.gn considerations that affect the clock penod; the CPI can be redui.-cd
`.at tbe CJtpense of a lotlg(!.l" doclc and otherwise. Tbu importanl. metric is
`the pl"Oduet of theae two ,terms. not tlwir individual value3. The illterec:·
`tion11 between t.he memory 11y1:1wrn and (ho pn,celllkor will lw tNident us
`eac;b of these mujor aubfuoc:t.irms ts de-scribed.
`Tho pe:rfonnanou of a cornput.or depends in large m eit$UN? an the
`lnterfaoe between the processor nnd memory, lf tlus inwrface is oot
`oorrod., n ltigni!ieant inarettsi;, in OPI cA.n result.. Tho p~!lor mnku
`demands upon the memory for irlStnu:itiomi and daia, artd th16 denuuul
`muat be aatisficd in Ol'der to realli,e the futl potential of ~he prooe850r.
`'11w follow-ing two qu.oteti from ll!'LYN66J dC!uill two piU"amcU)l'S of the
`memory thaL are lmportant.
`
`Latency or latent period is the tctn.l timr associn.te.d wilh l,ho procmi$•
`ing (from excitation t.o response) of o particular dnta tmit at " phase
`i:n the oomputing pl'OOlss.
`
`l
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 18
`
`
`
`2
`
`Memory Systems
`
`SPARC SPARC MO>S MlPS
`lnt.egw FbWlll In~ FloittL'\&
`Pol.nl
`Poutt
`
`IB:M 81360 VAX AVERAGE
`
`tn.lnletlon
`.T.»ta mad
`Oata write
`
`0.79
`0.16
`0.06
`
`0.80
`(>.17
`0.03
`
`0,76
`0. 16
`0.09
`
`0.77
`0. 19
`0 04
`
`0..60
`o.3!S
`016
`
`o.~ 0,71
`0..22
`0,28
`0.0'7 O.CYT
`
`TABL£ 1.1 Momory reference dJS1ributlon.
`
`Bandwidth is an expression of the tune rate of oocun~ce. In par'b(cid:173)
`cu lllr, computational or exeeution bandwidth is the nu.moor of
`instruct.ions proc.css-Od per second, and 11torag\! bandwidth is the
`retrieval rate of openmd and ope.ration memory words (words/
`second).
`
`The t.echtllqueli use<! to provide low-lat.enoy and high-bandwidth
`memory for a processor depend hea"ilY upon the nature of the roferenoea
`made to the memory by the proces&0r. Table 1.1 shows the fraction of
`memory reCerenccs claaai6ed dB Instruction .Fetch, Data Read, and Dat.a
`Write for a number of procet1eors. Notice that instruction fetches repJ'(l•
`sent betM•c,en 50% and 80% of all refc.renoos. Dato. reads Nlptesent l6%
`to 35~ of all reforon.cea, while data write& rapl'C.$cnt less than 10% for
`all proeesaors other than th.e IBM $/360..
`The implication of these reference statistics is that the priorities of
`a memory 11ystem design are: first the inst.ruction fetches, second t.be
`data reads, and hist. the dste writes. The obvious 60lution to the problem
`of providing both low late.ncy and htgb bandwidth ill to w;e higher ~(cid:173)
`fonnanc:e memory devices. However, the bistory of increMing tho
`bandwidth of memory devices Cei.reulta) over a ton-year period is not
`enoourtlging. P\tbliahed data from the International Solid State Circuits
`Conferenoo provide normalit.ed informatioo on the reduetion of access
`t.ime.s of dynamic RAMS and tJt.l\tfo RAMS 11hown i n Fig\lre l. 1.
`Note thst while clock periods have decreased by a factor of approx:!•
`matcly 23 ( ll reduction of 27~ per year over the laIJt dee.ti.de), SRM.i
`cyclo t.imt>. have decreased by onl,y a factor o.r 18 (s reducUon of 25%
`per year) 1md DRAMS by a factor of 13 (a reduction of 23~ per )'C:Or).
`'fbese da&.a show that mttmory performance incTI?ases ba,1e not kept pace
`with reductions in processor clock periods.
`One conclusion from tha d4ta of Figure l .l is that c.he required
`decrease in latency and the increase in memory bandv.'idt.h ha,,e come,
`and will oontJnuo t-0 ootne, not from improved memory technology but
`from improved memory system de-sign. Memory pe.rforman0e will prob(cid:173)
`ably oontinu.~ to lag processor clock cycle-time improvementa. and mo,-e
`p:roce1380r c.-oncurrency will further increase the demand on memory. Thus
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 19
`
`
`
`FlouM 1.1 Mctmory and pro00$sor cyclo,llmc lmprovomoni:s.
`
`innovat.ive memory en,hifJ!ctureY 1tre mJ1.ntlat.ory if proce890r perfor(cid:173)
`mance is to continue to increase.
`'1110 n~ed for increased memory performance baa occurred in a period
`of significant memory cost decreases. Memory coets have been decreasing
`d.--arnatically since th.., early eoT!lmarcial compournJ. A 1-M byte of add(cid:173)
`on mrmory for IBM S/360 computers C05L $1M in 1965I ln 199-4, a 1-M
`byte r>f memory for t1 penlona1 computer coi;ts ap1u•o.x:imately $50; a
`dra.mat.ic cost roouction of approx.imatcly 20,000-fo.lil.
`'The relative cost reh1tion&hip between fast. and Blow memory
`wropon ents has n,rr13.ined relatively constont over this period 8.Jld
`ean ~ upre~ed by Oroi!cl,'11 Law (GR0853l. Thia law is an empjricaJ
`observation that states "giving added economy only as the square root.
`of the increase in speed-that ii;, to do a calc:ulatlon ten times as ch~ply
`yau must do it one hundred times as foai.." The converse bas often been
`st.at~d /1& if you double the price of a oomputer or computer component,
`the performance will increase by a factor .of four. Grosch's law am be
`confirmed by noting tho relative cost and performance of SRAM and
`ORAM chips. Note thst. Oroseb's law, aa posited, applies over time and
`does not. eoneid1.1r learning-curve cffect.4. Huwcve,r, the la w is fl'\!queot.ly
`applied to one t-ime perlod u when looking a t mf!mbel"t or n 00roputeT
`Ct.imlly.
`The ideal memory Cor a computer would be one that. re.ruoV'e!! the
`memory bandwidth 8.lld latency coul:it.rainL~. Such ~ memory would have
`.zero aoress and cycle time (zero latency and infinite bandwidth), be of
`i.nfiuite sit,c, and have u.,ro QOst. Obvioual,y, we will never have such an
`Wen.I memQry. Over three d«JU!es, desig:oers hAve attempted to ap(cid:173)
`proach the zero latency of this Ideal memory by Lh~ archi!A!ctural path
`of n memory hientrchy.
`The following Lwo sections int:roduoo tho l88ucs of supplying low•
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 20
`
`
`
`4
`
`M emO'ry System3
`
`Latency
`Clocks per
`umtructioWJ
`
`Normalized
`Performance
`
`0.0
`0.2
`0.4
`0.6
`0,8
`1.0
`
`LOO
`0.83
`0.71
`0.62
`0.55
`0.50
`
`Note: Basic CPI .. 1.
`
`TABLE. 1.2 The effac1 of memory latency on processor performel\oe.
`
`laU!Dcy and high-bandwidt.b memory. Chapters 2, S, and 4 give details
`of low-latency memory in the forms of caches and virtual memory. Chap(cid:173)
`Ui_r 5 det1cril>M the t.eclu\iques for providing high-bandwidth memory
`using mte.rleaved low-bandwid th memory component&.
`
`1.1 Memory Latency
`The i. sue of memory latency ts quite important. with regard to pipelined
`processors. Con.sider the effect of memo.ry latency on VA.~ 1 1nao perror.
`mn.ncc. l[this proooS$0r had a. pcrfecl mwnory, with a band.width of one
`clock ))el' reference end no additional lo.ooncy. au instruction execution
`would take approximately 9.6 ctocb. Actual memory !atency adds ap•
`proxim&t.ely 0.6 docks, giving o. CPI of appronmat.cly 10 clock.$, wbieb
`is an inc:reasa in the CPl of appt"Oximately !'ill;. However, with a pipelined
`processor that may have a CPI of 1.5 with a one cycle m.runory, 0.5
`additional clocks (1 clock i8 o.coountcd for itl the basic CPI flgure) repre(cid:173)
`eent. o 33~ in,crease in CPl The res-son letency is important is t.hat when
`n memory request is made, the proct?8SOr Jllay be icJle Wllil t.hnt requet.t
`is sat[sfied. Table 1.2 bows the impact of latency on the perfo:rmanoo of
`a processor.
`As e:lQ)eeted. performance will de.crease by~ when the memory
`reference latency is equlll to the proees11or perf'Onnlltlci:! without latency.
`C\trrenl memory t.ech.nology ls capable of satisfying mem.ory bandwidth
`requiromcmts for oven the most. powerfol computers. Bot Lho satisfaction
`of lat.oncy is a mlljor design problem. With increasingly faster processors,
`the memory pc.rformnnoc bottleneck bccom«:$ even moT<e J}l'Onounc:ed.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 21
`
`
`
`..__P_.ik _
`
`1. 1 Mem-ory LlltBncy
`
`u r-· · ·
`
`•
`
`•
`
`1
`
`5
`
`1
`
`__.r ►i LI
`·•·-·
`
`• Sm.Ill
`
`• £.q,a,,t,o
`
`fioUII£ 1.2 Hlersrchlc:al memory.
`
`1.1.1 Hierarchical Memory Systems
`Hjqarchical memory U&e6 a biefflrclly nf mem-oey unita, a.s shown in
`Figure l .2, t.o satia{y the low.Jatency re,qt1h:ement. of a memory 6)'!1te:m.
`Starting with the regiBt.crs in the processor, tho s llorage capacity in each
`level tnCJU8es and llhe apeed and the wet per b it dccreall@i. For most
`references, the referenced item is found in a memory with low latency.
`Frequently ref~renced itenu, ~ $Wred in high-i:,peed. costly memory,
`and infrequently referenced items are stored in a low-1>peed, low-coat.
`level. The weighted average latency is decreased by IJtOring itU1tntctJona
`and data l.n the level that ta oonsiBtent ·with the frequency of the ref~
`cnced i~m.
`l nt.erelltingly, von Neumann (BURK46l not..id in 1946 the require(cid:173)
`ment for a three-level memory hierarcl\y. The three levels ere the pro(cid:173)
`ces.sor internal regis~rll, the main memory, and bulk stA>ra.ge on magnetic
`wire recorders.. Hierarchical memory t\)'Btems permit the dQigJter to
`trade latency or &cce88 lime ag-ain!l.t cost.. The deaign goal ia that the
`p:roce:8sor will see a memory thot ill only slight ly slower than the fastest
`memory io tho b.iortu'Chy.
`This book does not directly address the prooes&or registers, although
`they are a member of the hierarch,». The r\!gi8ter-lc:'1el design conaadcr(cid:173)
`ations are basically the concern of the illstrucdon set designer. However.
`indirect. effect.!! on t!te perlorma.nco or the memory hienlrChy arc found
`in auch item& a.s t.he memory bandwidth requirred t.o support cont.at
`switching a very lnrgo register rue.
`Three terms that apply to all levels of the memory hierarchy are
`defined " follows.
`The h~ran:hical kueJ le usuR.lly clenot.ed by an index befPnning with
`1 fol' the level ,1.earest the pl'OCeasor and progrc:asing upward. But, in
`rolative t-0,rms, the '"highe5t level" is the level cloa<!St to the prooossor
`and the "lowen level" is the sloweat memory of the system. ln the
`wscussions IA> follow, the convention of the index 1 iA U8(!d for Uie highest
`level, with indices 2 · · h denoting the lower levels.
`Multileucl inclll$iort is the property that Lhe cont.eats of eru:b level
`of the hiera.reh.v h is alway& found at level h + l [LAM79J, CBAER87l.
`Main rMmt>ry is the lw-gest r11.ndom acce_~ memory in tlw system.
`Early compatel'il had foll1' levels of memory: registers. main memory,
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 22
`
`
`
`6
`
`Memory Sy11tem1
`
`di.ab or drums, and archival storage such as magnetic Lape. For t.helM!:
`comput.en, the main memory waa m8.KJ1etic core •ith