throbber
Homayoun
`
`Reference 30
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 1
`
`

`

`Harvey G. Cragon
`
`
`
`( • r_
`
`•·;
`
`,
`
`' ,, I T ·
`
`-•~ I, I
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 2
`
`

`

`Memory Systems
`and Pipelined Processors
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 3
`
`

`

`Jones and Bartlett Books in Computer Science
`
`Arthur J. Bernstein and Philip M. Lewis
`Concurrency in Programming and Database Syste:rns
`
`Robert L. Causey
`Logic, Sets, and Recursion
`
`K. Mani Chandy and Stephen Taylor
`An Introduction to Parallel Programming
`
`Harvey G. Cragan
`Memory Systems and Pipelined Processors
`
`Michael J. Flynn
`Computer Architecture: Pipelined and Parallel Processor Design
`
`John Gregory and Don Redmond
`Introduction to Numerical Analysis
`
`James Hein
`Discrete Structures, Logic, and Computability
`
`E. Stewart Lee
`Algorithms and Data Structures in Computer Engineering
`
`Christopher H. Nevison, Daniel C. Hyde, G. Michael Schneider, and Paul
`T. 'lymann, Editors
`Laboratories for Parallel Computing
`
`Charles Van Loan
`An Introduction to Computational Science and Mathematics
`
`Henry M. Walker
`The Limits of Computing
`
`I
`
`rd
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 4
`
`

`

`'
`
`I Memory Sy te
`I a d p·pel ed P
`
`
`
`I
`
`I
`
`Ii
`
`I I
`
`11
`
`I I
`
`•
`
`:
`
`I I I I
`
`1·
`
`I I
`
`I
`I I
`
`11 I I I
`
`I
`I
`
`I
`
`I I I I
`
`I I!
`
`'
`
`II ,I: I 1
`
`I I I
`I
`
`I
`
`11,
`
`i
`
`I
`
`1
`
`11
`
`'1
`
`I
`
`I'
`
`I
`
`'
`
`._.._-- ............... ............... _ . , . ._ . - - -- - - -
`
`
`H•~·~ G. Cragon
`
`I 'r•, 'IJt,IVINSity of Texas at Austin
`
`11
`
`'
`
`Joan and :Bartlett
`Sudbury, Mau.achusetts
`
`London
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 5
`
`

`

`Editorial, Soln, and Cu~r &rvice Offitt•
`Jonet a,nd Ba.rtlet.t Pu.bliahen
`One Exeter Plua.
`&.ton, MA 02116
`1-800-832-0034
`617-859-3900
`
`Jonee and Bartlett Publiaben lntematlonal
`7 Melr-o9e Terrace
`Landon W8 7RL
`England
`
`Copyright© 1996 by Jonea and Bartlett Publiabc:ra, lnc.
`
`All right.a reNrnd. No part of the material protected by this copyright. notice
`may be reproduced or utilized in any form, electronic OT mechanical includiq
`photocopying, recording, or by any information atora,p and retrievaJ •yatem,
`without written penniuion from the copyriiht owner.
`
`Tr.demarked producu mendoned in thi:a book a.re listed on pqe 576 .
`
`Ubrar,- ol Coa.,re. CataloSlnl-ln•PubUcaUon Data
`
`Crapn, Harvey G.
`Memory aywtema and pipelined proceeaon / Harvey G. Cragon.
`cm.
`p.
`lncludee bibliographi.cal refe:re-.ncea and index.
`ISBN 0-86720~74-5
`1. Compaten, Pipeline. 2. Computer atorage devices.
`QA76.6.C68 1996
`004.25'6-dc:20
`
`l Title.
`
`95-22165
`CJP
`
`kt/uulioM Editor: Carl HealeT
`~ t iM Editt,r: Anne 8. Noonan
`Man.ufacturifl6 Bu~,: Dana L Cerrito
`EdiJ,orlal Produdion &rvice: Supe:ncript EdJtorlaJ Production &rvi.cee
`' /y~.- &anw•y Oraphica lntem.atiolUll
`Printi,w and Bwlin4: Braun-Brumfteld
`Cover D.•iln: &th Santo.
`Cawr Printing: John P. PO'fl' Company
`
`Printed ln the Urut.d 8t.t. oJ America
`99 ~ fY1 G6 g6
`10 g 8 7 6 &
`
`, 9 2 1
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 6
`
`

`

`Contents
`
`J Memory Systems.
`IA Ow>!YJtY
`I
`1.J M'twory Ultlnty
`4
`1.1.1 HianlN:bieAI. Memory Systema
`l.t.2 Mamory Efficicny
`7
`L& Al£mo!r Banchridth
`10
`J.S QUON
`13
`IA 8umman
`It
`
`6
`
`15
`M Ottcxlt:w
`15
`S.J Ou.be ?1@niratioa IUld P'lm!ment PoUdet
`&J 1 Sert todu 61Jooetioo
`%7
`t.U !£t,l!a,}y s.i.ct Cach..
`29
`1.13 Control Bilal Tllp, Vdiii, J?irt!, Shared anci
`I.RH
`31
`S2
`.1.1.t Nonblocking Cathee
`Beed &1idn and Pnfmmeore MMeb
`1.1.1 Trwpart; Time.
`S7
`I IO Ctc:be: Min Chor:adtri«lkt:
`t.tA Sourtteof'Vin end Htt Dnh
`1.2 r4 &kb Poliriea
`42
`
`19
`
`y
`
`U
`
`ftf
`
`i31l
`41
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 7
`
`

`

`45
`*Jl,6 Write Pollriell
`2..U ag,i...m.n, Polley
`54
`P: 17 Alotk S\:tc O.ntidrmtione
`
`58
`57
`
`60
`
`81
`2,81 Jo«trw:t:ktll Rnff't:re
`tut.ruction 9u!µf!S.
`2.3.2
`62
`UA Ioatzuctioo Corbet
`AA
`2 8 :f PIII.Nkbed [bfioK!tiM C&:bee Hh Omre
`
`f)R
`
`M
`
`...
`
`3.7
`u
`
`.LU Cont.mpon.ry Del& Cacbee-
`1.U D■ta Que.ua
`78
`13
`Unifitri C.d>et
`
`91
`Molt.ilnd C«bee
`Z.7 I Publiehed Pedt1nnanttt ::Ptrt
`
`73
`
`tu
`82
`
`9◄
`
`W
`Compilg Optimization
`107
`&&I Pipelined ()acl,..
`
`107
`
`3 Vinual Memory
`100
`AA Ouaie:«
`1Qfl
`8.1 Virtual Memory Caoc:epta
`8.2 ~ M,p';'\t)g
`117
`a.1.1 Cre•tul( t..np. Virtual Adclreau.u
`
`113
`
`-3.S.3
`lfgjetar Exbmaioos
`UA Summary
`120
`S.8 Vi:rtual Mrtmory Orga.niaa&iona
`U.l Pap Allocation Systema
`SJU S&gmwted e,.1ema:
`1.8.3 Paged Segme.n.taticm
`84 Trwn&Jation IookuldeBum,,...
`M.l TLB O!'J!"isatioCI
`165
`174
`341 TI,RM1eBot10D11a
`IA.S TLB lmpo,d oo Vlrtual Memotl Perf"onnanm
`lnat-ruetion Support for n.B■
`MA
`l77
`
`120
`
`llil
`128
`
`1&3
`166
`162
`
`us
`115
`
`""
`
`177
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 8
`
`

`

`Contents
`
`vii
`
`178
`
`I.I iir:i!alll Mtm.2a ~m;111io1 8~!1
`128
`34] Beed Accea11ea
`8.G.2 Allaitiim
`UlO
`aaa !idle Atrz111Ci
`UM
`aa~ I~Hao 0CXablt11
`184
`U.4 'hl,Je W•lld1:11 M.eL.hoda
`1116
`LU lnsttucticm Set Sunwrt for V'a.rtuaJ M•1D01T
`8.11.7 Me, ., , v Acoeu Omttol
`169
`u.a Choice or Paae 81:re
`t9S
`U.9 -~lh<DI>~
`196
`8JI V!ttual Me,non, P - - la,u,. ...t MMm
`199
`IA.I PubUllhed P!@ Fa-oh Ratio O.i.it
`
`18'i
`
`18'1
`
`'-JA Rcu1&.d1:ted ~Clutl Cattb.ea
`
`<.1.4 - 213
`
`4.U TLB AddnJ11in1
`
`:IOL
`
`llll3
`
`Zl6
`
`216
`
`227
`229
`
`.. Memotv: Addressing and 1/0 Coherenc~
`,o Oi!taMO«
`20:J
`,U Cadat Addn=alDS:1 Vlrt.uAJ oi" Real
`~ ... ~rtnAl 6ddai:aa C•tba
`f,.1) Beel Addmn Cecbe,
`?A4
`41.l Pi2tlined Real C..ehu
`206
`20ll
`212
`... l/0 S~m Addnesing Dall.ii!! Juun
`4.2.l Cache CobUilll~
`217
`42.2 8b002! (:onuolJE!fat
`220
`u.s DMA I/0 Contli!!r&tltma
`222
`,,, Otbtc Owtldtttlk>nt
`4M ~ r Control (Mr Snooe:r: Coain>lkn
`227
`U.l Memw v MeD!Md IJO m
`42.2 Cache Coheren~ Prot.:,cola
`•.a.a Snooe!!f: on Write Q\1,;0N
`4.8..C Swn?tUU"Y
`230
`
`'281
`
`II ln1:erieaved Mamo~ and Disk S1stems
`11,Q nv .. ..,1m
`23J
`Lt 1nwi..ve<1 aa.m..,,
`231
`5JJ &d'1m•oc• M11d,el1
`1.U Reduda1 the EtTec:t 6" StriW,i, l
`237
`241
`us
`5.1..C Tnblrlb!!!!I fur Miultii!?.Q As,cea
`2.49
`5.1.4 Catb.-Mcmo!)! Tranafcrt
`5.1.1 Nibble1 Pan I and Sutic Cc.lu.m.n Mod• l>rama
`
`..... Su(!!rinterldvin1
`
`'""'
`
`226
`
`1IOO
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 9
`
`

`

`U
`
`1..1.L Widt·Word lnt.vleaw.d ModnlN
`.Oist htems
`262
`JS:1
`l .l.1 Diak Requlremis,,l,a
`6.ll llilbmF"el/Oand.,.,...,,..._
`6.2.8 Diak Arrtyt
`272
`6.lt.◄ Diak Cacbos
`283
`
`151
`
`-
`
`6 Ploollned F'rooesaors
`287
`Qv:eryle:,1
`PiptU.,,. Porl'orm.lUICe Modakl
`
`281
`
`290
`
`298
`
`u.J Plpe.li.aa Emcie.ocy
`Pfpt,llo.e PartlUo-rdhq
`901
`
`300
`
`.... Ot.be-r :Pipeline Pert'Ol"mallff Mod.eta
`•.s ...
`
`U.J ~ 'n.at AddrwMemorr
`I endf8tor, Pm•,onn
`311
`&A2
`6.U Synehroni,;at:ion Summary
`
`3l5
`
`900
`309
`
`1 Sr-enctllflg
`317
`10 PYtr!k!)f
`312
`1.J Moocliog Tedu:uguo
`S:18
`?.I Pipenne PrteP Sl.tittp
`7 .. '1 Ptw.i~n StrJt(!ll(lf!I$
`32)
`7.3.J Static: Pradiction 9trat.egil!:il
`324
`V .3.J- Semi&t•tie Predict.On St;ra'-'@ie:t
`:3,26
`1.3.3 OJuamie Ptedieiion 81:tat.@
`32'1
`ln.tlruc:don Sequence Alter.a.lion Stn.tgiell
`"IA.I Delayed Bruch BtN.WO:
`3S9
`341
`'7.4.2 Pela~ Dra.nch '#1th Ahandt1n
`7.4..:S Dol11td .Bruch wilh F&Att CocnJ)tlrt)
`l5ra.nch Spreading with Coaditioo Cod.ea
`1 AA
`'7.6 Vet.ch M.llltipl&.Pa.th■ Strategi&a
`344
`741 Fekh Two fatba
`346
`7.5.2 Petch All Pat.bs
`-3,f,7
`1.15 .. 1 Ba.rtlwet8.lu!gilU"l!tDe.ou Mi
`
`319
`
`339
`
`.'i:.t3
`
`343
`
`7.•
`
`, PeoendftOGiel
`
`351
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 10
`
`

`

`Ca111anu
`
`ix
`
`II.I
`
`...
`
`••• ••• H.1
`
`358
`U.1 Trui 0.f'!Cdr-oc:iu
`l.s.l Rwutot O.pandeid.•
`;176
`8 9 :C Pthtr Malbodt
`..380
`3&
`Menwry Oepmdanciea
`U .1 Anti• and Output Ot>l)t!NlenciH
`UJI Tr1.1• O.ptnde11Nt!
`3~
`8.3.3 Lo■d Uypatl
`381
`i!..Y !l.ll-Modityio1 Code: f,,.. b.i,eoden.cl ..
`Bli•I e.::.._., JJ<r.:nd•nclff
`r'18t1
`115 MJl&NMmcS!unt
`IIIIO
`
`3M
`
`Ptrfwrnenre Yodtlt
`Nil
`401
`Sw:nm111 Camm.toil
`•tri
`Blr1.1c.~••tt• Dltpnnd1!.nd11•
`&U Sinclo-Pw,.UOC Plpelln•
`.1.03
`a.7.1- Mu.lll,Punttkln Dyn.v.,lci J>&ptllntt
`
`406
`
`• E,ca1ptlon1 and Interrupt•
`,'2&
`W10 Ovll"Ylt'l'I
`0, J Prt1tiM1 lnuimapt,
`tlflt
`P.l,l
`lntamrot.td Jn,tructlon qnd 8.yed Promm
`Coun1..tr
`i-38
`8.t .a
`fl11ntllln1 Prt.'t\!dlAJC lMl.l'Uc\k>o•
`e.1.1 Ha..llc1llo• re110_.i.n, 1,u.tt\lcti,,n,
`MormaIJ,e ind Coll of Preclao (nterrup1..11
`Pr.idM! fnl.c!ff'\111\· ~ 11\M\o::n tt
`tl6~
`
`48'1
`
`,n
`
`ID Enhancad lrnplomantatlon1
`JOI) Ovealtv
`◄61
`10..I 8upill'!ll'flliMtl Pr;w-p10P1
`,U
`10.l,1 E.Jement.uy p.,.ror:m11nce t.fode:•
`10:1.1 A Beflntd Model
`171
`10.1.1 Sup&rt1:lpelln11 O..itn luu~
`475
`10.1.4 CUffl!nt. Examplaot&perpipelininc
`415
`JO.S Supen1calar PN>celil80rs
`l Ul '&14!men1Ary' Pe:rformaMa Mod.I
`
`469
`
`418
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 11
`
`

`

`1~ Su.per11C1llar Duaip laau•
`'61
`J.Q.U: Supv,wia,r- ltrtnchlna A1.ra111:ljin
`LO.S.4 ~n e y Deiectitnl and .Reeolution
`10.I P,rti'-rmt@ Compyblop
`◄9:4
`J.0.. 9UP!U!£!I..,- 9cU1'!:n,lpelinocl ~ , 4S°7
`= PanUetism
`JD :4-1 P
`◄97
`I0A-2
`lnMtrlim:l,mil P1talleli1JP
`r Performance
`IQ ::t- 8 Pl
`f,M
`10.-6 -V~ ~ l ru11.ru¢oo Word (VLtw) P~.,.,.
`
`◄!Kt
`
`491
`
`492-
`
`50'2
`
`007
`
`61◄
`
`t 7 Yodm PmcnSora
`I I O Chrvxirw
`607
`fi09
`l U
`V11c.tm- Operatioos
`510
`11.2 Memoey•Prooi'!S80r Lnte·rfs.ce
`ll.2..1 Me,nory-c..Memory ~ 5 U
`lJ t 2 I owt/Soce A.n·Nwrutt Ma
`11:3 Ytdm P«rfcmnnDA
`51:t
`lLS.1
`.ANl!\Ument bj Vector lmirurtion Latency
`lit.reaming Rate
`LI.JU.
`616
`5-18
`lLlU .Asjeasni.ent. by MFLOPS
`11,M ~n i by Half Jwl'.:itui.4llef!c
`l1 3 & v ... :::::c 8 eler O.le v-.., Problem
`~noa &aa
`U.5 Matt Operadoos
`5S2.
`l I .8 Dllpl!ndi!.nrie$
`~'>3♦
`lntrtl.9t.\4mlc,n~ ~nd«=ticie.
`11.8.1
`11.6.2
`lo~tatemunl Oepende.mdei1
`lUU Clwnuy!
`636
`111 S..orbrneda Ptdirouanu W
`
`-520
`fil?l
`
`5.$4
`5:,S
`
`B.efACences Ml
`
`Subject lodel<
`
`U7
`
`Pror-euocs Index
`
`Tmdemaru
`
`516
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 12
`
`

`

`Preface
`
`Every difficulty slurred over will be a ghost to disturb your

`sleep.
`
`- Chopin
`
`The greatest difficulties lie where we are not looking for
`them.
`-
`Johann Wolfgang von Goethe
`
`The widespread demand for high-performance personal computers and
`workstations has stimulated a renaissance in computer design. VLSI
`permits many millions of transistors to be placed on a chip that will sell
`for a few hundred of dollars. The two forces of demand and low cost
`combine to stimulate innovative designers to revisit old schemes and
`devise new ones to satisfy the ever-increasing demands for more perfor(cid:173)
`mance. The performance of microprocessors has increased at a compound
`rate of about 30 percent per year over the last decade. Positive feedback
`is supporting the demand for more performance; new software systems
`require more processor performance, while more processor performance
`permits more complex software systems. Many of the techniques dis(cid:173)
`cussed in this book had their birth in mainframe and supercomputer
`designs. However, with the success of the microprocessor, the architec(cid:173)
`ture of large mainframe class computers is being brought into question.
`This book describes and discusses the design details of high-speed
`memory systems and pipeline processors that underpin the modern mic(cid:173)
`roprocessor. I undertook the project of writing this book for several
`closely related reasons. First, I believed that there was a need for a
`graduate level course on the nitty-gritty details of computer system
`implementation and I could find no suitable text. A compendium of the
`implementation techniques used today by many designers has not been
`collected into one book. In addition, I wanted the experience, over a
`xi
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 13
`
`

`

`xii
`
`Preface
`
`of collecting reading and interpreting the litera-
`· d f
`,:,,
`,
`,
`per10 o a 1ew years,
`.
`.
`ture in this important field.
`While reading the early literature on processor 1m~lementation, I
`found that the implementation techniques used by the pioneers are con.
`tinually recycled into new designs. For e~ample, q~eues, first used in
`some of the very early machines, are bemg us~d m pr~cessors such
`as the Pentium and PowerPC 601. Although this book is focused on
`uniprocessors it is impossible to ignore the problems of coherency and
`synchronizati~n, problems usually assoc~ated wit~ parallel pro~essors.
`The material presented in this book 1s found m the open ht~rature
`and in manufacturers' processor manuals. I have not used mat~nal that
`is unavailable to any student or researcher; howev~r, the ope~ literature
`contains much duplication and extraneous mate~1al that this_ book at(cid:173)
`tempts to eliminate. On a discouraging note, much 1mplementatI~n detail
`of contemporary processors is not revealed by manufacturers without a
`nondisclosure agreement. Such an agreement is not possible when writ(cid:173)
`ing a textbook and thus there are unfortunate gaps. Because of competi(cid:173)
`tive pressures, a manufacturer will disclose an overview of a processor
`in a conference followed later by various product manuals. The details
`of a processor unfold over a period of time after its introduction. For
`these reasons, some of the relevant details of a processor may be missing
`or incorrectly described in this book.
`Because of the close tie between the memory and the processor and
`the commonality of a number of concepts, I believe that these two sub(cid:173)
`jects should be treated together in one book. This text is intended for a
`one-semester graduate course for students who have either experience
`in processor and/or memory architecture or who have taken an under(cid:173)
`graduate course in computer architecture or organization. The student
`should be conversant with the general ideas behind memory systems
`and pipelining. This book assembles the relevant information on memory
`systems and pipelined processors, and the references will lead a reader
`to the historical and contemporary literature on the topics discussed.
`This book examines the broad sweep of design issues and their historical
`precedence along with extensive references to specific topics. Care has
`students can
`been taken to reference the first mention of a topic so
`gain an appreciation of the contributions made by the early researchers
`and understand the development patterns that lead to the systems of
`todar Unfortunately the search for a first reference to a given topic has
`not, m all cases, been successful.
`Many advanced computer texts used in graduate courses cover a wide
`spectrum of topics in only medium depth. I have attempted to dig a
`· d stry

`l

`small deep hole rather than a wide shallow one Pro,:,,
`1ess1ona s m m u
`.
`.
`h
`s ould find this text helpful as it presents an in-depth look at the many
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 14
`
`

`

`Preface
`
`xiii
`
`details of memory and pipelined processors that are not found in a single
`text.
`Although the two topics of memory systems and pipelined processors
`are closely related, this text starts with memory systems based on the
`assumption that students have sufficient• background in pipelined pro(cid:173)
`cessors to appreciate the need for low latency and high bandwidth. For(cid:173)
`ward and backward references point to more detailed discussions of a
`topic that is being treated lightly.
`This book addresses only a few of the design issues of multiprocessors.
`The student needs a firm understanding of uniprocessor design before
`multiprocessor design problems can be effectively addressed. Neverthe(cid:173)
`less, this book will, in several places, note the similarities between the
`problems of overlap in memory systems and pipeline processors and the
`problems with multiprocessors. Thus, the student, having been exposed
`to these problems in the uniprocessor domain, should be better able to
`understand and pursue research in multiprocessors.
`Designers strive both to create processors that can execute programs
`in the shortest possible time and to achieve a balance between the speed
`of the processor and the available bandwidth and latency of the memory.
`This goal was important with the von Neumann architecture and is
`important today with the most recently announced RISC processors.
`Interleaved memory, hierarchical memory, and pipeline processors were
`first employed in high-performance supercomputers and mainframes.
`With the continued increase in circuit density available in VLSI devices,
`these memory systems and piplines have been used in microprocessors
`and are commonly integrated on the same chip. Thus, it is appropriate
`to combine in one book the salient design issues of memory and pipelines.
`A recurring idea in this book concerns the abundant use of tags and
`control bits in the internal structure of the processor that are not a part
`of the instruction set architecture. In nonoverlapped processors most of
`the processor resources are explicitly controlled by the executing instruc(cid:173)
`tions. With overlap, additional resources are added to the processor that
`are not visible to the programmer. These resources, and the data they
`contain, are controlled by tags of various kinds [MALI89]. Tags perform
`a number of functions : They identify and route data, indicate validity,
`provide protection, and perform some control functions. These tags and
`control bits are usually not program accessible and contribute to the rich
`context of a process that must be managed, preserved, and, in some
`cases, carefully destroyed.
`One of the goals of this book is to explain the techniques for managing
`the resources that are not visible to the program. Another goal of this
`book is to provide not only narrative descriptions of subjects but also
`analytical models and examples that can be used to make design trade-
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 15
`
`

`

`XIV
`
`Preface
`
`offs. Published peformance results are compared to the models' results
`when it is possible.
`
`The first section of this book covers memory systems. Chapter 1 ad(cid:173)
`dresses the two design goals of a me~o~ system: low laten~y and high
`bandwidth. The approaches to providmg these characteristics, hier(cid:173)
`archical and interleaved memories, are briefly addressed. Chapter 2
`starts the description of hierarchical memory by describing cache tech(cid:173)
`niques and specific cache designs. Cache design issues, such as organiza(cid:173)
`tion and performance, are described with illustrations of specific im(cid:173)
`plemented cache. Chapter 3 addresses virtual memory, giving a
`rationale for virtual memory, organizations, and performance tradeoff
`issues. A number of references are made to operating systems; however
`'
`detailed discussions of operating systems are outside the scope of this
`book. Chapter 4 discusses the complementary issues of cache addressing
`with either real or virtual addresses and coherency maintenance in all
`memory spaces of a computer when input/output is involved. Chapter
`5 completes the treatment of memory with a discussion of interleaved
`memory and disk systems. These two topics are important in the total
`memory hierarchy and virtual memory.
`The second section of this book covers pipelined processors. Chapter
`6 describes the general notions of pipelining with pipeline performance
`models. Partitioning, clock distribution, and atomic instructions are some
`of the issues discussed. Chapter 7 describes the issues involved in
`branching. A major detriment to pipeline performance is control transfers
`or branches. Branch strategies for pipelined processors and their perfor(cid:173)
`mance models are discussed. Chapter 8 addresses the issues of dependen(cid:173)
`cies, conflict detection, and resolution. Pipelined processors operate with
`a high degree of concurrency that leads to potential dependencies and
`conflicts. These problems require special design considerations in order
`to preserve the sequential model of computation. The chapter describes
`all of the techniques known to me for solving these problems. Chapter 9
`describes the issues of exceptions and interrupts in pipelined processors.
`Solutions to the precise interrupt problem can hurt the performance of
`dependency resolution solutions. Thus, the design of these two facilities
`is closely integrated in real machines. However, for expository reasons,
`these two subjects are treated separately with references between the
`chapters.
`Chapter 10 addresses the topics of superpipelined, superscalar, and
`very long instruction word processors. Performance models are pre(cid:173)
`sented to assist in the evaluation of design alternatives. Chapter 11
`assembles many of the topics of pipelining found in vector processors.
`The rational for vector processors is discussed, along with performance
`models and benchmark performance of real machines.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 16
`
`

`

`Preface
`
`xv
`
`As noted previously,
`text combines both the historical and contem-
`porary ideas on memory systems and pipelined processor implemen(cid:173)
`Undoubtedly, I have missed some relevant information and some
`of the material has been misinterpreted. For those errors of omission
`and commission, I take full responsibility.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 17
`
`

`

`1
`
`Memory Systems
`
`1.0 Overview
`Th.ere are tnRJlY models t.hnt iu-e u~ed to evnltrotc lhe pcrformanoo of a
`pr'OC(?saor. Ono of tho most i.nsightiul. the model for the time required to
`execute a task, is
`
`wk Lime .. nwnoor of instnl.ctioM oxecuted >< clocks per instruc,;ion
`(CPI) x clock period.
`
`'fhe n1;1moor ur in~Lruct.iutlli executed la usuallj ll f\JJittlion of the to•
`.struction set design and is outside the scope of thil!, book, but the second
`and tbl-rd tenns are the subject of this book. Toe design of the me,cnc>ry
`8}'Stom and tho pipeline, proccsaor have a direct. bearing on tbe CPI tl1ot
`ai.n be raali:red f~rn a proce:s!ior. A$ will be pointed out, there are CPI
`desi.gn considerations that affect the clock penod; the CPI can be redui.-cd
`.at tbe CJtpense of a lotlg(!.l" doclc and otherwise. Tbu importanl. metric is
`the pl"Oduet of theae two ,terms. not tlwir individual value3. The illterec:·
`tion11 between t.he memory 11y1:1wrn and (ho pn,celllkor will lw tNident us
`eac;b of these mujor aubfuoc:t.irms ts de-scribed.
`Tho pe:rfonnanou of a cornput.or depends in large m eit$UN? an the
`lnterfaoe between the processor nnd memory, lf tlus inwrface is oot
`oorrod., n ltigni!ieant inarettsi;, in OPI cA.n result.. Tho p~!lor mnku
`demands upon the memory for irlStnu:itiomi and daia, artd th16 denuuul
`muat be aatisficd in Ol'der to realli,e the futl potential of ~he prooe850r.
`'11w follow-ing two qu.oteti from ll!'LYN66J dC!uill two piU"amcU)l'S of the
`memory thaL are lmportant.
`
`Latency or latent period is the tctn.l timr associn.te.d wilh l,ho procmi$•
`ing (from excitation t.o response) of o particular dnta tmit at " phase
`i:n the oomputing pl'OOlss.
`
`l
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 18
`
`

`

`2
`
`Memory Systems
`
`SPARC SPARC MO>S MlPS
`lnt.egw FbWlll In~ FloittL'\&
`Pol.nl
`Poutt
`
`IB:M 81360 VAX AVERAGE
`
`tn.lnletlon
`.T.»ta mad
`Oata write
`
`0.79
`0.16
`0.06
`
`0.80
`(>.17
`0.03
`
`0,76
`0. 16
`0.09
`
`0.77
`0. 19
`0 04
`
`0..60
`o.3!S
`016
`
`o.~ 0,71
`0..22
`0,28
`0.0'7 O.CYT
`
`TABL£ 1.1 Momory reference dJS1ributlon.
`
`Bandwidth is an expression of the tune rate of oocun~ce. In par'b(cid:173)
`cu lllr, computational or exeeution bandwidth is the nu.moor of
`instruct.ions proc.css-Od per second, and 11torag\! bandwidth is the
`retrieval rate of openmd and ope.ration memory words (words/
`second).
`
`The t.echtllqueli use<! to provide low-lat.enoy and high-bandwidth
`memory for a processor depend hea"ilY upon the nature of the roferenoea
`made to the memory by the proces&0r. Table 1.1 shows the fraction of
`memory reCerenccs claaai6ed dB Instruction .Fetch, Data Read, and Dat.a
`Write for a number of procet1eors. Notice that instruction fetches repJ'(l•
`sent betM•c,en 50% and 80% of all refc.renoos. Dato. reads Nlptesent l6%
`to 35~ of all reforon.cea, while data write& rapl'C.$cnt less than 10% for
`all proeesaors other than th.e IBM $/360..
`The implication of these reference statistics is that the priorities of
`a memory 11ystem design are: first the inst.ruction fetches, second t.be
`data reads, and hist. the dste writes. The obvious 60lution to the problem
`of providing both low late.ncy and htgb bandwidth ill to w;e higher ~(cid:173)
`fonnanc:e memory devices. However, the bistory of increMing tho
`bandwidth of memory devices Cei.reulta) over a ton-year period is not
`enoourtlging. P\tbliahed data from the International Solid State Circuits
`Conferenoo provide normalit.ed informatioo on the reduetion of access
`t.ime.s of dynamic RAMS and tJt.l\tfo RAMS 11hown i n Fig\lre l. 1.
`Note thst while clock periods have decreased by a factor of approx:!•
`matcly 23 ( ll reduction of 27~ per year over the laIJt dee.ti.de), SRM.i
`cyclo t.imt>. have decreased by onl,y a factor o.r 18 (s reducUon of 25%
`per year) 1md DRAMS by a factor of 13 (a reduction of 23~ per )'C:Or).
`'fbese da&.a show that mttmory performance incTI?ases ba,1e not kept pace
`with reductions in processor clock periods.
`One conclusion from tha d4ta of Figure l .l is that c.he required
`decrease in latency and the increase in memory bandv.'idt.h ha,,e come,
`and will oontJnuo t-0 ootne, not from improved memory technology but
`from improved memory system de-sign. Memory pe.rforman0e will prob(cid:173)
`ably oontinu.~ to lag processor clock cycle-time improvementa. and mo,-e
`p:roce1380r c.-oncurrency will further increase the demand on memory. Thus
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 19
`
`

`

`FlouM 1.1 Mctmory and pro00$sor cyclo,llmc lmprovomoni:s.
`
`innovat.ive memory en,hifJ!ctureY 1tre mJ1.ntlat.ory if proce890r perfor(cid:173)
`mance is to continue to increase.
`'1110 n~ed for increased memory performance baa occurred in a period
`of significant memory cost decreases. Memory coets have been decreasing
`d.--arnatically since th.., early eoT!lmarcial compournJ. A 1-M byte of add(cid:173)
`on mrmory for IBM S/360 computers C05L $1M in 1965I ln 199-4, a 1-M
`byte r>f memory for t1 penlona1 computer coi;ts ap1u•o.x:imately $50; a
`dra.mat.ic cost roouction of approx.imatcly 20,000-fo.lil.
`'The relative cost reh1tion&hip between fast. and Blow memory
`wropon ents has n,rr13.ined relatively constont over this period 8.Jld
`ean ~ upre~ed by Oroi!cl,'11 Law (GR0853l. Thia law is an empjricaJ
`observation that states "giving added economy only as the square root.
`of the increase in speed-that ii;, to do a calc:ulatlon ten times as ch~ply
`yau must do it one hundred times as foai.." The converse bas often been
`st.at~d /1& if you double the price of a oomputer or computer component,
`the performance will increase by a factor .of four. Grosch's law am be
`confirmed by noting tho relative cost and performance of SRAM and
`ORAM chips. Note thst. Oroseb's law, aa posited, applies over time and
`does not. eoneid1.1r learning-curve cffect.4. Huwcve,r, the la w is fl'\!queot.ly
`applied to one t-ime perlod u when looking a t mf!mbel"t or n 00roputeT
`Ct.imlly.
`The ideal memory Cor a computer would be one that. re.ruoV'e!! the
`memory bandwidth 8.lld latency coul:it.rainL~. Such ~ memory would have
`.zero aoress and cycle time (zero latency and infinite bandwidth), be of
`i.nfiuite sit,c, and have u.,ro QOst. Obvioual,y, we will never have such an
`Wen.I memQry. Over three d«JU!es, desig:oers hAve attempted to ap(cid:173)
`proach the zero latency of this Ideal memory by Lh~ archi!A!ctural path
`of n memory hientrchy.
`The following Lwo sections int:roduoo tho l88ucs of supplying low•
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 20
`
`

`

`4
`
`M emO'ry System3
`
`Latency
`Clocks per
`umtructioWJ
`
`Normalized
`Performance
`
`0.0
`0.2
`0.4
`0.6
`0,8
`1.0
`
`LOO
`0.83
`0.71
`0.62
`0.55
`0.50
`
`Note: Basic CPI .. 1.
`
`TABLE. 1.2 The effac1 of memory latency on processor performel\oe.
`
`laU!Dcy and high-bandwidt.b memory. Chapters 2, S, and 4 give details
`of low-latency memory in the forms of caches and virtual memory. Chap(cid:173)
`Ui_r 5 det1cril>M the t.eclu\iques for providing high-bandwidth memory
`using mte.rleaved low-bandwid th memory component&.
`
`1.1 Memory Latency
`The i. sue of memory latency ts quite important. with regard to pipelined
`processors. Con.sider the effect of memo.ry latency on VA.~ 1 1nao perror.
`mn.ncc. l[this proooS$0r had a. pcrfecl mwnory, with a band.width of one
`clock ))el' reference end no additional lo.ooncy. au instruction execution
`would take approximately 9.6 ctocb. Actual memory !atency adds ap•
`proxim&t.ely 0.6 docks, giving o. CPI of appronmat.cly 10 clock.$, wbieb
`is an inc:reasa in the CPl of appt"Oximately !'ill;. However, with a pipelined
`processor that may have a CPI of 1.5 with a one cycle m.runory, 0.5
`additional clocks (1 clock i8 o.coountcd for itl the basic CPI flgure) repre(cid:173)
`eent. o 33~ in,crease in CPl The res-son letency is important is t.hat when
`n memory request is made, the proct?8SOr Jllay be icJle Wllil t.hnt requet.t
`is sat[sfied. Table 1.2 bows the impact of latency on the perfo:rmanoo of
`a processor.
`As e:lQ)eeted. performance will de.crease by~ when the memory
`reference latency is equlll to the proees11or perf'Onnlltlci:! without latency.
`C\trrenl memory t.ech.nology ls capable of satisfying mem.ory bandwidth
`requiromcmts for oven the most. powerfol computers. Bot Lho satisfaction
`of lat.oncy is a mlljor design problem. With increasingly faster processors,
`the memory pc.rformnnoc bottleneck bccom«:$ even moT<e J}l'Onounc:ed.
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 21
`
`

`

`..__P_.ik _
`
`1. 1 Mem-ory LlltBncy
`
`u r-· · ·
`
`•
`
`•
`
`1
`
`5
`
`1
`
`__.r ►i LI
`·•·-·
`
`• Sm.Ill
`
`• £.q,a,,t,o
`
`fioUII£ 1.2 Hlersrchlc:al memory.
`
`1.1.1 Hierarchical Memory Systems
`Hjqarchical memory U&e6 a biefflrclly nf mem-oey unita, a.s shown in
`Figure l .2, t.o satia{y the low.Jatency re,qt1h:ement. of a memory 6)'!1te:m.
`Starting with the regiBt.crs in the processor, tho s llorage capacity in each
`level tnCJU8es and llhe apeed and the wet per b it dccreall@i. For most
`references, the referenced item is found in a memory with low latency.
`Frequently ref~renced itenu, ~ $Wred in high-i:,peed. costly memory,
`and infrequently referenced items are stored in a low-1>peed, low-coat.
`level. The weighted average latency is decreased by IJtOring itU1tntctJona
`and data l.n the level that ta oonsiBtent ·with the frequency of the ref~
`cnced i~m.
`l nt.erelltingly, von Neumann (BURK46l not..id in 1946 the require(cid:173)
`ment for a three-level memory hierarcl\y. The three levels ere the pro(cid:173)
`ces.sor internal regis~rll, the main memory, and bulk stA>ra.ge on magnetic
`wire recorders.. Hierarchical memory t\)'Btems permit the dQigJter to
`trade latency or &cce88 lime ag-ain!l.t cost.. The deaign goal ia that the
`p:roce:8sor will see a memory thot ill only slight ly slower than the fastest
`memory io tho b.iortu'Chy.
`This book does not directly address the prooes&or registers, although
`they are a member of the hierarch,». The r\!gi8ter-lc:'1el design conaadcr(cid:173)
`ations are basically the concern of the illstrucdon set designer. However.
`indirect. effect.!! on t!te perlorma.nco or the memory hienlrChy arc found
`in auch item& a.s t.he memory bandwidth requirred t.o support cont.at
`switching a very lnrgo register rue.
`Three terms that apply to all levels of the memory hierarchy are
`defined " follows.
`The h~ran:hical kueJ le usuR.lly clenot.ed by an index befPnning with
`1 fol' the level ,1.earest the pl'OCeasor and progrc:asing upward. But, in
`rolative t-0,rms, the '"highe5t level" is the level cloa<!St to the prooossor
`and the "lowen level" is the sloweat memory of the system. ln the
`wscussions IA> follow, the convention of the index 1 iA U8(!d for Uie highest
`level, with indices 2 · · h denoting the lower levels.
`Multileucl inclll$iort is the property that Lhe cont.eats of eru:b level
`of the hiera.reh.v h is alway& found at level h + l [LAM79J, CBAER87l.
`Main rMmt>ry is the lw-gest r11.ndom acce_~ memory in tlw system.
`Early compatel'il had foll1' levels of memory: registers. main memory,
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2142, p. 22
`
`

`

`6
`
`Memory Sy11tem1
`
`di.ab or drums, and archival storage such as magnetic Lape. For t.helM!:
`comput.en, the main memory waa m8.KJ1etic core •ith

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket