ARM_VPT_IPR_00000212
`
`ARM Ex. 1007
`IPR Petition - USP 5,463,750
`
`

`

`U.S. Patent    Apr. 24, 1990    Sheet 1 of 18    4,920,477
`
`[Drawing sheets 1 through 18, containing FIGS. 1 through 18 and FIG. 7A. The drawings are not recoverable from the scanned text.]
`
`
`VIRTUAL ADDRESS TABLE LOOK ASIDE
`BUFFER MISS RECOVERY METHOD AND
`APPARATUS
`
`BACKGROUND OF THE INVENTION
`
`The invention relates generally to pipelined computer apparatus and methods and in particular to a method and apparatus for handling data table look-aside buffer misses in data processing equipment using virtual address data.
`Substantially all multi-user computers employ virtual memory systems. These systems provide substantially unlimited memory addressing space. Typically, however, the processors operate on the on-board high speed physical memory available to them. The on-board memory can, for example, be dedicated to a user and each time a user changes, the entire on-board memory is swapped, storing the data associated with one user in, for example, disk memory, and reading and storing data for the next user in physical memory.
`In a Trace computer, such as that described hereinafter and based upon methods developed in part at Yale University, the data processor has a pipelined CPU and a pipelined memory. Further, the CPU generates virtual addresses, not physical addresses, and employs a data translation lookaside buffer (TLB) to effect a virtual address to physical address translation. It is important in such a system, which also provides for parallel processing using a very long instruction word having a length of, for example, 1,000 or more bits, to provide the address translation without a major sacrifice of either available pipeline depth or time.
`As noted above, when multiple users are present, memory is typically swapped between fast physical memory and slower storage such as disk, so that for each change of user there is a change of memory. This results in an undesirable decrease of system performance. Furthermore, when a pipelined memory system is employed, a determination that the required memory data is not available in high speed physical memory can cause a yet larger degradation in system performance since the memory pipeline must be drained and the entire system reset to the instruction having a data miss.
`It is therefore a primary object of the invention to provide a data processing method and apparatus for addressing a pipelined memory which provides high speed data TLB recovery when a miss occurs during a virtual address to physical address translation. Another primary object of the invention is a data TLB which minimizes user thrashing. Other objects of the invention are a method and apparatus which enable reliable and efficient system recovery of a pipeline memory after a data TLB miss. Further objects of the invention are a computing method and apparatus which are reliable, fast, and capable of operating in a parallel processing environment.
`
`
`SUMMARY OF THE INVENTION
`
`The invention relates to a virtual memory addressed table lookaside buffer miss recovery method and apparatus. The apparatus is, in a preferred embodiment, associated with a parallel processor having a central processing unit and at least one pipelined memory controller circuitry, the central processing unit addressing data using a virtual address memory table lookaside buffer. The data miss recovery circuitry features a first in-first out buffer register for storing virtual address data from the central processor during at least each memory access instruction, a first in-first out buffer register for storing instruction status data during at least each memory access instruction, circuitry for detecting an instruction initiated memory access error condition, and circuitry responsive to detection of the memory access error condition for at least correcting the memory access error condition and replaying, in sequence, the instruction causing the error condition and those instructions entering, and in, the memory pipeline after the instruction causing the error condition. The replay circuitry is responsive to the first in-first out buffer registers for replaying those instructions.
`In a specific aspect of the invention, the instruction status data includes at least operation code data, status data identifying the type of error, and data representing the destination of the memory access.
`In another aspect of the invention, an apparatus for reducing data memory thrashing has a multi-user data processor employing virtual memory addressing at the central processor level and at least one data table lookaside buffer for translating a processor supplied virtual address to a physical memory address. The apparatus features circuitry for assigning to each processor user a system identification number, and storage circuitry for providing a virtual address to physical address translation at a buffer address derived by logically mixing a selected portion of the virtual address with the user system identification number.
`The data table lookaside buffer miss recovery method, according to the invention, features the steps of storing sequentially generated virtual address data from the central processor in a first in-first out buffer register, storing sequentially generated instruction status data in a first in-first out buffer register, detecting a memory access error condition, correcting the memory access error causing that error condition, and replaying, in sequence, the instruction causing the error condition, and those instructions entering and in the memory pipeline after the entry into the pipeline of the instruction causing the error condition, the replaying step being responsive to data stored in the virtual address and status data first in-first out buffer registers for advantageously replaying the instructions.
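The recovery steps summarized above can be illustrated with a minimal software sketch. It assumes two parallel first in-first out queues of equal depth, one holding virtual addresses and one holding status words; the class name `MissRecoveryQueues`, the field names, and the queue depth are hypothetical and are not taken from the patent, which describes hardware circuitry rather than software.

```python
from collections import deque


class MissRecoveryQueues:
    """Hypothetical model of the two history FIFOs (addresses and status)."""

    def __init__(self, depth):
        # Old entries fall off automatically once the pipeline depth is exceeded.
        self.addresses = deque(maxlen=depth)
        self.status = deque(maxlen=depth)

    def record(self, virtual_address, opcode, destination):
        """Store one memory access instruction in both FIFOs."""
        self.addresses.append(virtual_address)
        self.status.append({"opcode": opcode, "destination": destination})

    def replay_from(self, position):
        """Return (address, status) pairs for the faulting access at
        `position` (0 is the oldest entry) and every access recorded
        after it, in original order."""
        addresses = list(self.addresses)[position:]
        status = list(self.status)[position:]
        return list(zip(addresses, status))


queues = MissRecoveryQueues(depth=4)
queues.record(0x1000, "load", "r1")
queues.record(0x2000, "store", "m3")   # suppose this access misses in the TLB
queues.record(0x3000, "load", "r2")
replayed = queues.replay_from(1)
```

In this sketch the miss is assumed to have been corrected before `replay_from` is called; the returned list is simply the order in which the accesses would be reissued.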
`In another aspect, a method according to the invention for reducing memory thrashing in a virtual memory addressing system having a data table lookaside buffer for translating a processor supplied virtual address to a physical memory address, features the steps of assigning to each user a system identification number and storing data providing a virtual address to physical address translation at a buffer address derived by logically mixing a selected portion of the virtual address with the user system identification number. In a particular aspect, the invention features exclusive OR'ing, on a bit-by-bit basis, the system identification number with the selected portion of the virtual address.
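The bit-by-bit exclusive OR described above can be sketched as follows. The page size (`page_bits`), the width of the buffer address (`index_bits`), and the function name `tlb_index` are assumptions chosen for illustration; the text here does not specify which portion of the virtual address is selected.

```python
def tlb_index(virtual_address, user_id, page_bits=13, index_bits=12):
    """Derive a buffer address by XOR'ing the user's system identification
    number with a selected portion of the virtual address.

    page_bits and index_bits are illustrative assumptions only.
    """
    mask = (1 << index_bits) - 1
    # Selected portion: the low bits of the virtual page number.
    selected = (virtual_address >> page_bits) & mask
    return selected ^ (user_id & mask)


# The same virtual address lands in different buffer slots for different users.
index_user0 = tlb_index(0x2000, user_id=0)
index_user5 = tlb_index(0x2000, user_id=5)
```

Because the user number perturbs the index, two users that touch the same virtual pages tend to occupy different buffer entries, which is the thrashing reduction the summary describes.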
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`
`Other objects, features, and advantages of the invention will appear from the following description taken together with the drawings in which:
`FIG. 1 is an electrical block diagram of the overall structure of a computer system in accordance with a preferred embodiment of the invention;
`
`

`

`FIG. 2 is an electrical block diagram of a memory system in accordance with a preferred embodiment of the invention;
`FIG. 3 is a block diagram of the integer processor in accordance with a preferred embodiment of the invention;
`FIG. 4 is an electrical block diagram of a floating point processor in accordance with a preferred embodiment of the invention;
`FIG. 5 is a representation of the method for storing mask word data in a four-wide system configuration;
`FIG. 6 is a representation of the storage of mask word and data fields in a one-wide system configuration;
`FIG. 7 is an electrical block diagram illustrating cache miss detection and addressing, and calculation and storage of the next program counter value according to a preferred embodiment of the invention;
`FIG. 7A is an electrical block diagram showing the instruction table lookup operation and address generation according to a preferred embodiment of the invention;
`FIG. 8 is an electrical block diagram illustrating elements of the cache miss engine in accordance with a preferred embodiment of the invention;
`FIG. 9 is an electrical block diagram of a first section of a cache miss engine;
`FIG. 10 is an electrical block diagram illustrating the beginning of tag generation in the cache miss engine according to a preferred embodiment of the invention;
`FIG. 11 is an electrical block diagram showing the completion of tag generation in the cache miss engine according to a preferred embodiment of the invention;
`FIG. 12 is an electrical block diagram illustrating the virtual to physical address translation according to a preferred embodiment of the invention;
`FIG. 13 is an electrical block diagram illustrating the operating elements for implementing the history queue according to a preferred embodiment of the invention;
`FIG. 14 is an electrical block diagram detailing the elements of the integer unit history queues according to a preferred embodiment of the invention;
`FIG. 15 is a representation illustrating the elements of the status queue data word in accordance with a preferred embodiment of the invention;
`FIG. 16 is an electrical block diagram of the integer unit branch logic and program counter address generation circuitry according to a preferred embodiment of the invention;
`FIG. 17 is a pictorial representation of the data in the instruction unit early beat immediate packet according to a preferred embodiment of the invention; and
`FIG. 18 is an electrical block diagram illustrating the interconnections of the integer processing units and the global controller for generating the next program counter address according to a preferred embodiment of the invention.
`
`DESCRIPTION OF A PREFERRED EMBODIMENT
`
`General Structure and Operation
`
`Referring to FIG. 1, a computer system or data processor 10 has a central processing unit (CPU) 11 having a plurality of clusters 12, 14, 16, 18, each cluster having an integer or I-unit processor 20, 22, 24, 26, and a floating point or F-unit processor 28, 30, 32, and 34, respectively. The central processing unit interconnects with input/output processors 36 and 38, a global controller 40, and a plurality of memory systems 42, 44, 46, 48, 50, 52, 54, and 56. In other embodiments of the invention, more or fewer clusters, input/output processors, and memory systems can be employed.
`Referring to FIG. 2, each memory system has a memory controller 58 for accepting memory reference requests from, for example, the central processing unit and for generating the necessary control signals over lines 60a, 60b to access dynamic random access memory chips. The memory chips are organized into blocks of memory 62 and each controller 58 can control up to eight memory blocks, called "banks." Each word of memory is thus addressed by its controller number, its bank number, and the word number of the particular bank (the "word-in-bank"). The number of controllers, as well as the number of banks associated with each controller, can vary with the configuration of the system. Referring to FIG. 1, a preferred memory configuration has eight memory controllers 58, each of which can receive data from the central processing units and provides output data to the various units of the system. Each memory controller provides access to each memory bank 62 over the lines 60a and 60b and receives the result of the addressing inquiry over lines 64 and provides data for storage to its banks over lines 65. In the illustrated embodiment of the invention, each memory bank 62 stores two million bytes of data; in accordance with the preferred embodiment of the invention, the memory is advantageously interleaved.
`In accordance with the illustrated embodiment of the invention, each memory controller 58 provides a multistage pipeline which generates the necessary control signals to access the proper dynamic RAM of memory banks 62. The memory write operation is a pipelined write procedure which provides for storing data in four beats of the equipment. The cycle time for storing a word is about 240 nanoseconds for the components used in the illustrated embodiment. Because the DRAM's are busy throughout this period, only one write request can be processed during the interval.
`Referring again to FIG. 1, the input/output processors 36 and 38, in the illustrated embodiment, act as the interface between the CPU and memory on one hand, and an external device such as an external computer on the other. The external device can be a computer which communicates with various other input/output peripheral equipment such as tape drives and terminals. The input/output units also provide for direct-memory access (DMA) transfers of data between memory and the input/output device. The input/output processor uses a so-called "DMA engine" to control data flow and operate a protocol sequence as is well known in the art. The input/output processor can contain, and preferably does contain, its own microprocessor which controls the timing of program interrupts and schedules the transfer of data using internal buffers.
`A primary function of the global controller is to provide the program counter which generates the next instruction address. The global controller also "orchestrates" the process of filling the instruction cache from main memory during an instruction cache miss. Thus, if a required instruction is not found in the instruction cache during program execution, that instruction must be obtained from memory and the global controller asserts control over the various buses to quickly transfer instruction data from main memory to the instruction cache. The global controller, in the illustrated embodiment, further has an instruction table lookup buffer (ITLB) for storing a record of which "pages" of instructions are currently in memory and the locations in slower, for example disk, memory from which they were obtained.
`Each cluster, according to the invention, has, as noted above, an integer processor and a floating point processor. Referring to FIG. 3, each integer processor handles integer computation as well as other logic functions. The integer processor, in the illustrated embodiment, includes two independent arithmetic logic units 70, 72 (designated ALU0 and ALU1, respectively), a 64 x 32-bit register file 74, a virtual to physical address data translation lookaside buffer 76, a branch unit 78, and a first and a second branch bank 80, 82, respectively. (Each branch bank of the illustrated embodiment is an 8 x 1-bit register for storing branch condition data from the arithmetic logic units 70, 72 respectively.) The integer processor further includes a section £76 of a distributed instruction cache memory.
`Functionally, the translation lookaside buffer translates virtual memory addresses from the ALU's to physical memory addresses using a table lookup mechanism well known to those practiced in the art, and the instruction cache memory provides the ALU's with faster access to instructions than would be possible if the instructions had to be read from memories 42, ..., 56 for every cycle of the processor. The register file 74 is, according to the illustrated embodiment of the invention, divided into two sub-banks. One sub-bank of thirty-two 32-bit registers is associated solely with arithmetic logic unit 70 and the other sub-bank is associated solely with arithmetic logic unit 72. The branch bank circuitry 80, 82, and the branch unit 78 are employed during multiway branch operations also described in more detail hereinafter.
`Referring to FIG. 4, the floating point processor has a floating point multiplier and arithmetic logic unit 90, and a floating point adder and arithmetic logic unit 92. Each floating point processor further includes a register file of sixty-four 32-bit registers that is divided in half in the same manner as the integer processor register file 74. The floating point adder and arithmetic logic unit 92 has access to source operands in one half of the register file 98 and the floating point multiplier and integer arithmetic logic unit 90 has access to the source operands in the other half of the register file. There are in addition a first and second branch bank units 100, 102, respectively, and a memory store register file 104 which, in the illustrated embodiment, consists of thirty-two 32-bit registers. The memory store register file is used by the integer and floating point processors of a cluster and is the path by which data can be stored in memory 42, ..., 56. The branch banks 100, 102, like the corresponding branch banks 80, 82 of the integer processor, comprise a set of eight one-bit registers that store condition codes resulting from arithmetic logic unit operations. These codes can be used in branch determination.
`Referring to FIG. 1, in the illustrated embodiment, the CPU preferably has four clusters. This is referred to, in the illustrated embodiment, as a four-wide system. In other embodiments according to the invention, the number of clusters, and their architecture, can vary. In particular, there can be for example one or two clusters, designated a one-wide or a two-wide system, respectively. The number of memory controllers and the number of banks per controller depend upon the number of clusters. For a "one-wide" processor, one might select two memory controllers, each having four banks of memory. Other configurations are within the skill of one practiced in the art.
`In accordance with the invention, the hardware architecture described in connection with FIGS. 1-4 is known to the compiler which generates program code for the system. In the illustrated embodiment, the program code is in the form of a sequence of 1,024 bit instruction words for the preferred four-wide system. If fewer than four clusters are used, the width of the instruction word can be accordingly reduced. (Thus, a two-wide system employs a 512-bit instruction word and a one-wide system employs a 256-bit instruction word.) Each instruction word has a plurality of operation fields (generally ALU instructions) and the goal of the compiler is to fill as many fields of the instruction word as possible so that each of the ALU's is occupied, executing an instruction for each beat of the equipment. The compiler stores resource information such as resource restrictions, including access times, number of buses, and the number of available registers. The compiler produces an execution code that optimizes resource allocation.
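The instruction word widths quoted above (1,024, 512, and 256 bits for four, two, and one clusters) all correspond to 256 bits per cluster. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical, and extending the rule to other cluster counts is an assumption not stated in the text.

```python
BITS_PER_CLUSTER = 256  # inferred from the three widths quoted in the text


def instruction_word_bits(clusters):
    """Width of the instruction word for a system with `clusters` clusters."""
    if clusters < 1:
        raise ValueError("a system needs at least one cluster")
    return BITS_PER_CLUSTER * clusters


widths = {n: instruction_word_bits(n) for n in (1, 2, 4)}
```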
`In operation, the compiler uses the Trace Scheduling method to analyze the flow of a program and to predict which paths the program will take. These predictions include statistical guesses about conditional branches. The compiler develops plots or traces of program flow and, where necessary, multiple traces, each with a calculated probability of being correct, are generated to describe the expected program sequence. The compiler uses various methods to select the best of the multiple projected traces and calls upon a "disambiguator" to assist in creating code that has parallel structure. The disambiguator method decides whether or not implied memory references result in a program conflict, that is, whether or not memory references can be executed in parallel.
`For example, if the program refers to variables "I" and "J," the compiler must know, if possible, whether these variables will refer to the same memory location. If they do not, the operations to which they relate can most likely be executed in parallel (unless they depend on each other's results). Thus, operations such as "write I" and "read J" can generally be performed concurrently if "I" and "J" are independent of each other at that execution step in the program. If, however, "I" and "J" translate to the same location in physical memory (and in the illustrated embodiment, to the same memory controller), the two operations must be executed sequentially. Accordingly, the more situations the disambiguator can disambiguate, the more the code can be made to run in parallel. The Trace Scheduling method is described in detail in Ellis, John, Bulldog: A Compiler for VLIW Architectures, MIT Press, Cambridge, Mass., 1986, attached hereto as Appendix I.
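The question the disambiguator answers can be illustrated with a deliberately simplified sketch. It is not the Bulldog algorithm: it assumes each reference is a hypothetical (array name, index) pair, that distinct arrays never overlap, and that an index is either a known integer or unknown (`None`). When the answer is uncertain, the sketch conservatively reports a possible conflict, which forces sequential execution.

```python
def may_alias(ref_a, ref_b):
    """Conservatively decide whether two references can touch the same location.

    Each reference is an (array_name, index) pair; index is an int when it is
    known at compile time and None when it is not.
    """
    array_a, index_a = ref_a
    array_b, index_b = ref_b
    if array_a != array_b:
        return False  # assumption: distinct arrays never overlap
    if index_a is None or index_b is None:
        return True   # unknown index: a conflict must be assumed
    return index_a == index_b


def can_run_in_parallel(ref_a, ref_b):
    """Two memory operations may share a beat only if they cannot conflict."""
    return not may_alias(ref_a, ref_b)


write_i = ("x", 1)
read_j = ("x", 2)
parallel_ok = can_run_in_parallel(write_i, read_j)
```

The more index values the compiler can resolve, the fewer `None` cases remain and the more pairs of operations can be scheduled together.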
`In the illustrated embodiment, the compiler further permits the programmer to make "assertions" about the variables used in the program. The programmer can assert, for example, that two variables are never equal or are not equal at some point in his program and thereafter. These assertions increase the ability of the compiler to generate parallel code because they reduce the uncertainty about the memory references that ultimately force code to be made sequential.
`Also, as in the case of memory reference disambiguation, programmer assertions can assist the compiler in the case of memory bank disambiguation. Since the memory has an interleaved structure for providing a higher memory bandwidth, and since multiple banks can be accessed simultaneously by the various ALU's, the assertion that the difference between two variables will never be zero modulo N, where N is the number of banks in the system, guarantees that the same memory bank will not be accessed twice in the same beat.
`A further, more severe restriction exists, however, as noted above, that a memory controller cannot be referenced more than once in a single cycle. This poses a "problem" for the compiler, since it cannot schedule in parallel two operations that reference the same memory controller. Therefore, the compiler can make parallel only those memory operations in which memory locations, if accessed, are accessed through different memory controllers. Thus, for example, writing code that accesses word N and word N+M in the same beat, of a system which is configured with a total of M banks, would cause a bank conflict as well as a memory controller conflict.
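The bank and controller conflicts described above can be checked with a small sketch. The interleaving scheme is an assumption: consecutive word addresses are spread first across controllers and then across the banks of each controller, using the eight controllers and eight banks per controller of the preferred configuration. The text does not state the actual address mapping, and the function names are hypothetical.

```python
CONTROLLERS = 8
BANKS_PER_CONTROLLER = 8
TOTAL_BANKS = CONTROLLERS * BANKS_PER_CONTROLLER  # M in the text's example


def locate(word_address):
    """Split a word address into (controller, bank, word-in-bank).

    Assumed interleaving: low-order address bits pick the controller,
    the next bits pick the bank within that controller.
    """
    controller = word_address % CONTROLLERS
    bank = (word_address // CONTROLLERS) % BANKS_PER_CONTROLLER
    word_in_bank = word_address // TOTAL_BANKS
    return controller, bank, word_in_bank


def same_beat_conflicts(address_a, address_b):
    """Return (controller_conflict, bank_conflict) for two accesses in one beat."""
    controller_a, bank_a, _ = locate(address_a)
    controller_b, bank_b, _ = locate(address_b)
    controller_conflict = controller_a == controller_b
    bank_conflict = controller_conflict and bank_a == bank_b
    return controller_conflict, bank_conflict


# Word N and word N+M hit the same bank and the same controller.
n = 5
result = same_beat_conflicts(n, n + TOTAL_BANKS)
```

Under this mapping two addresses share a bank exactly when their difference is zero modulo the total number of banks, which matches the programmer assertion discussed earlier; sharing a controller is the weaker condition and is therefore the more restrictive scheduling constraint.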
`There also exists a stall condition that results from
`two or more references to the same memory bank
`within four beats. During a so-called “bank stall,” the
`CPU is set to an idle state due to the latency in the
`memory pipelines. The compiler, to the extent possible,
`avoids scheduling operations that cause bank stalls, but
`the occurrence of such an event is not fatal to program
`execution as are concurrent calls to the same memory
`controller. The bank stall mechanism is discussed in
`more detail below.
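The kind of check a scheduler might apply to avoid bank stalls can be sketched as follows. This is not the compiler's actual algorithm. It assumes word interleaving (bank = address mod number of banks) and reads “within four beats” as a beat difference of less than four; both are interpretations, and the function name and the data layout are hypothetical.

```python
def find_bank_stalls(references, num_banks, window=4):
    """Report pairs of references that hit the same bank fewer than `window` beats apart.

    `references` is an iterable of (beat, word_address) pairs.
    Returns a list of (earlier_beat, later_beat, bank) tuples.
    """
    last_beat_for_bank = {}
    stalls = []
    for beat, address in sorted(references):
        bank = address % num_banks  # assumed interleaving
        previous = last_beat_for_bank.get(bank)
        if previous is not None and beat - previous < window:
            stalls.append((previous, beat, bank))
        last_beat_for_bank[bank] = beat
    return stalls
```

With a hypothetical 8 banks, references to words 3 and 11 at beats 0 and 2 would be flagged, since both map to bank 3 and lie two beats apart, whereas a further reference to word 19 at beat 7 would not.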
`
`PA1, PA2, and PA3, receive physical address data generated
`using the data table lookaside buffer 76 of the
`integer processor for addressing the memory system.
`The outputs of memories 42 and 50, 44 and 52, 46 and
`54, and 48 and 56, connect respectively to integer load
`buses IL0, IL1, IL2, and IL3. This provides for the
`simultaneous loading of the integer load buses with up
`to four 32-bit words or fields from the interleaved memory.
`In addition, however, memories 52 and 56 also
`connect respectively to bus lines IL0 and IL2 to provide
`the low order thirty-two bit data for a double
`precision sixty-four bit quantity. That data is transferred
`through the integer processors, along the HF buses, to
`the floating point processor register file for processing.
`In addition, each input/output processor 36, 38 connects
`to each of the integer load buses for making direct
`memory access (DMA) transfers as discussed in more
`detail below.
`As noted above, the floating point load buses provide
`a path from memory to the floating point processors.
`Only four of the eight memory controllers, however,
`need connect to the floating point buses, because the
`two transmissions from the memories to the floating
`point processors always use the same four memory
`controllers. In one case, the floating point load, a sixty-four
`bit data word load, one memory of a pair loads the
`most significant half of the sixty-four bit quantity
`through the floating point bus while its neighboring
`memory simultaneously loads the least significant portion
`of the sixty-four bit quantity onto the integer load
`bus for transmission through the integer processor and
`HF bus to the floating point processor. (The sole exception
`to this process for loading a sixty-four bit wide
`word provides for the integer load buses to carry the
`full sixty-four bit number, as noted above. For example,
`memory units 54 and 56 provide a sixty-four bit load
`using the integer load buses IL2 and IL3 over lines 130
`and 132.) In the second case, during operation of the
`cache miss engine (described in detail below) the same
`four memories provide mask wor
