`
`ARM Ex. 1007
`IPR Petition - USP 5,463,750
`
`
`
U.S. Patent    Apr. 24, 1990    Sheet 1 of 18    4,920,477

[Sheet 1 of 18: drawing not recoverable from the scan.]
`
`ARM_VPT_IPR_00000213
`
`
`
[Sheet 2 of 18: drawing not recoverable from the scan.]
`
`
`
[Sheet 3 of 18: drawing not recoverable from the scan.]
`
`
`
[Sheet 4 of 18: FIG. 4; drawing not recoverable from the scan.]
`
`
`
[Sheet 5 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
[Sheet 6 of 18: drawing not recoverable from the scan.]
`
[Sheet 7 of 18: drawing not recoverable from the scan.]
`
`
`
`
[Sheet 8 of 18: drawing not recoverable from the scan.]
`
`
`
[Sheet 9 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
[Sheet 10 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
`
`
`
[Sheet 11 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
[Sheet 12 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
`
`
[Sheet 13 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
[Sheet 14 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
`
`
`
`
`
`
[Sheet 15 of 18: drawing not recoverable from the scan.]
`
`
`
`
[Sheet 16 of 18: drawing not recoverable from the scan.]
`
[Sheet 17 of 18: drawing not recoverable from the scan.]
`
`
`
`
[Sheet 18 of 18: drawing not recoverable from the scan.]
`
`
`
`
`
`
`
`
`
`
`
`
`VIRTUAL ADDRESS TABLE LOOK ASIDE
`BUFFER MISS RECOVERY METHOD AND
`APPARATUS
`
`BACKGROUND OF THE INVENTION
`
The invention relates generally to pipelined computer apparatus and methods and in particular to a method and apparatus for handling data table look-aside buffer misses in data processing equipment using virtual address data.
Substantially all multi-user computers employ virtual memory systems. These systems provide substantially unlimited memory addressing space. Typically, however, the processors operate on the on-board high speed physical memory available to them. The on-board memory can, for example, be dedicated to a user, and each time a user changes, the entire on-board memory is swapped, storing the data associated with one user in, for example, disk memory, and reading and storing data for the next user in physical memory.
In a Trace computer, such as that described hereinafter and based upon methods developed in part at Yale University, the data processor has a pipelined CPU and a pipelined memory. Further, the CPU generates virtual addresses, not physical addresses, and employs a data translation lookaside buffer (TLB) to effect a virtual address to physical address translation. It is important in such a system, which also provides for parallel processing using a very long instruction word having a length of, for example, 1,000 or more bits, to provide the address translation without a major sacrifice of either available pipeline depth or time.
As noted above, when multiple users are present, memory is typically swapped between fast physical memory and slower storage such as disk, so that for each change of user there is a change of memory. This results in an undesirable decrease of system performance. Furthermore, when a pipelined memory system is employed, a determination that the required memory data is not available in high speed physical memory can cause a yet larger degradation in system performance since the memory pipeline must be drained and the entire system reset to the instruction having a data miss.
It is therefore a primary object of the invention to provide a data processing method and apparatus for addressing a pipelined memory which provides high speed data TLB recovery when a miss occurs during a virtual address to physical address translation. Another primary object of the invention is a data TLB which minimizes user thrashing. Other objects of the invention are a method and apparatus which enable reliable and efficient system recovery of a pipelined memory after a data TLB miss. Further objects of the invention are a computing method and apparatus which are reliable, fast, and capable of operating in a parallel processing environment.
`
`
`SUMMARY OF THE INVENTION
`
The invention relates to a virtual memory addressed table lookaside buffer miss recovery method and apparatus. The apparatus is, in a preferred embodiment, associated with a parallel processor having a central processing unit and at least one pipelined memory controller circuitry, the central processing unit addressing data using a virtual address memory table lookaside buffer. The data miss recovery circuitry features a first in-first out buffer register for storing virtual address data from the central processor during at least each memory access instruction, a first in-first out buffer register for storing instruction status data during at least each memory access instruction, circuitry for detecting an instruction initiated memory access error condition, and circuitry responsive to detection of the memory access error condition for at least correcting the memory access error condition and replaying, in sequence, the instruction causing the error condition and those instructions entering, and in, the memory pipeline after the instruction causing the error condition. The replay circuitry is responsive to the first in-first out buffer registers for replaying those instructions.
In a specific aspect of the invention, the instruction status data includes at least operation code data, status data identifying the type of error, and data representing the destination of the memory access.
In another aspect of the invention, an apparatus for reducing data memory thrashing has a multi-user data processor employing virtual memory addressing at the central processor level and at least one data table lookaside buffer for translating a processor supplied virtual address to a physical memory address. The apparatus features circuitry for assigning to each processor user a system identification number, and storage circuitry for providing a virtual address to physical address translation at a buffer address derived by logically mixing a selected portion of the virtual address with the user system identification number.
The data table lookaside buffer miss recovery method, according to the invention, features the steps of storing sequentially generated virtual address data from the central processor in a first in-first out buffer register, storing sequentially generated instruction status data in a first in-first out buffer register, detecting a memory access error condition, correcting the memory access error causing that error condition, and replaying, in sequence, the instruction causing the error condition, and those instructions entering and in the memory pipeline after the entry into the pipeline of the instruction causing the error condition, the replaying step being responsive to data stored in the virtual address and status data first in-first out buffer registers for advantageously replaying the instructions.
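The recovery steps just described (record each access in two first in-first out queues, detect a miss, correct it, replay in sequence) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Access`, `run`, `PAGE_BITS`, and `depth`, the dictionary standing in for the lookaside buffer, and the page-table refill are all hypothetical and do not come from the patent, which describes circuitry rather than software. The error-type field of the status data is omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass

PAGE_BITS = 12  # assumed page-offset width; the text does not give one


@dataclass
class Access:
    vaddr: int    # virtual address supplied by the CPU
    opcode: str   # operation code, e.g. "load" or "store"
    dest: int     # destination of the memory access (a register number here)


def run(accesses, tlb, page_table, depth=4):
    """Issue accesses through a toy memory pipeline holding `depth` accesses.

    Returns (completed, replays): completed holds (opcode, dest, paddr)
    tuples in completion order; replays counts recovery events.
    """
    addr_fifo = deque()     # virtual address history queue
    status_fifo = deque()   # instruction status history queue
    pending = deque(accesses)
    completed, replays = [], 0

    while pending or addr_fifo:
        # Record each access entering the memory pipeline in both FIFOs.
        while pending and len(addr_fifo) < depth:
            acc = pending.popleft()
            addr_fifo.append(acc.vaddr)
            status_fifo.append((acc.opcode, acc.dest))

        page = addr_fifo[0] >> PAGE_BITS
        if page not in tlb:
            # Miss on the oldest access: correct it, then replay that access
            # and everything that entered the pipeline after it, in order.
            tlb[page] = page_table[page]
            replayed = [Access(v, op, d)
                        for v, (op, d) in zip(addr_fifo, status_fifo)]
            addr_fifo.clear()
            status_fifo.clear()
            pending.extendleft(reversed(replayed))
            replays += 1
            continue

        vaddr = addr_fifo.popleft()
        opcode, dest = status_fifo.popleft()
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        completed.append((opcode, dest, (tlb[page] << PAGE_BITS) | offset))

    return completed, replays
```

In this model the two queues hold everything needed to reissue the in-flight accesses, so a miss never reorders completions; it only costs a replay.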
In another aspect, a method according to the invention for reducing memory thrashing in a virtual memory addressing system having a data table lookaside buffer for translating a processor supplied virtual address to a physical memory address, features the steps of assigning to each user a system identification number and storing data providing a virtual address to physical address translation at a buffer address derived by logically mixing a selected portion of the virtual address with the user system identification number. In a particular aspect, the invention features exclusive OR'ing, on a bit-by-bit basis, the system identification number with the selected portion of the virtual address.
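The exclusive-OR mixing of the system identification number with part of the virtual address can be sketched as a buffer index computation. The buffer size, the page-offset width, and which address bits are "selected" are not given in this passage, so `TLB_ENTRIES`, `PAGE_BITS`, and `tlb_index` below are assumptions chosen only for illustration.

```python
TLB_ENTRIES = 1024   # assumed buffer size (a power of two)
PAGE_BITS = 13       # assumed page-offset width


def tlb_index(vaddr: int, system_id: int) -> int:
    """Buffer address = selected virtual address bits XOR system ID."""
    mask = TLB_ENTRIES - 1
    selected = (vaddr >> PAGE_BITS) & mask   # selected portion of the address
    return selected ^ (system_id & mask)     # bit-by-bit exclusive OR
```

With these assumed widths, the same virtual address issued by two users lands in different buffer entries, for example `tlb_index(0x4000, 1)` and `tlb_index(0x4000, 2)`, which is the effect the text relies on to reduce thrashing between users.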
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Other objects, features, and advantages of the invention will appear from the following description taken together with the drawings in which:
FIG. 1 is an electrical block diagram of the overall structure of a computer system in accordance with a preferred embodiment of the invention;
`
`
`
FIG. 2 is an electrical block diagram of a memory system in accordance with a preferred embodiment of the invention;
FIG. 3 is a block diagram of the integer processor in accordance with a preferred embodiment of the invention;
FIG. 4 is an electrical block diagram of a floating point processor in accordance with a preferred embodiment of the invention;
FIG. 5 is a representation of the method for storing mask word data in a four-wide system configuration;
FIG. 6 is a representation of the storage of mask word and data fields in a one-wide system configuration;
FIG. 7 is an electrical block diagram illustrating cache miss detection and addressing, and calculation and storage of the next program counter value according to a preferred embodiment of the invention;
FIG. 7A is an electrical block diagram showing the instruction table lookup operation and address generation according to a preferred embodiment of the invention;
FIG. 8 is an electrical block diagram illustrating elements of the cache miss engine in accordance with a preferred embodiment of the invention;
FIG. 9 is an electrical block diagram of a first section of a cache miss engine;
FIG. 10 is an electrical block diagram illustrating the beginning of tag generation in the cache miss engine according to a preferred embodiment of the invention;
FIG. 11 is an electrical block diagram showing the completion of tag generation in the cache miss engine according to a preferred embodiment of the invention;
FIG. 12 is an electrical block diagram illustrating the virtual to physical address translation according to a preferred embodiment of the invention;
FIG. 13 is an electrical block diagram illustrating the operating elements for implementing the history queue according to a preferred embodiment of the invention;
FIG. 14 is an electrical block diagram detailing the elements of the integer unit history queues according to a preferred embodiment of the invention;
FIG. 15 is a representation illustrating the elements of the status queue data word in accordance with a preferred embodiment of the invention;
FIG. 16 is an electrical block diagram of the integer unit branch logic and program counter address generation circuitry according to a preferred embodiment of the invention;
FIG. 17 is a pictorial representation of the data in the instruction unit early beat immediate packet according to a preferred embodiment of the invention; and
FIG. 18 is an electrical block diagram illustrating the interconnections of the integer processing units and the global controller for generating the next program counter address according to a preferred embodiment of the invention.

DESCRIPTION OF A PREFERRED EMBODIMENT

General Structure and Operation

Referring to FIG. 1, a computer system or data processor 10 has a central processing unit (CPU) 11 having a plurality of clusters 12, 14, 16, 18, each cluster having an integer or I-unit processor 20, 22, 24, 26, and a floating point or F-unit processor 28, 30, 32, and 34, respectively. The central processing unit interconnects with input/output processors 36 and 38, a global controller 40, and a plurality of memory systems 42, 44, 46, 48, 50, 52, 54, and 56. In other embodiments of the invention, more or fewer clusters, input/output processors, and memory systems can be employed.
Referring to FIG. 2, each memory system has a memory controller 58 for accepting memory reference requests from, for example, the central processing unit and for generating the necessary control signals over lines 60a, 60b to access dynamic random access memory chips. The memory chips are organized into blocks of memory 62 and each controller 58 can control up to eight memory blocks, called "banks." Each word of memory is thus addressed by its controller number, its bank number, and the word number of the particular bank (the "word-in-bank"). The number of controllers, as well as the number of banks associated with each controller, can vary with the configuration of the system. Referring to FIG. 1, a preferred memory configuration has eight memory controllers 58, each of which can receive data from the central processing units and provides output data to the various units of the system. Each memory controller provides access to each memory bank 62 over the lines 60a and 60b and receives the result of the addressing inquiry over lines 64 and provides data for storage to its banks over lines 65. In the illustrated embodiment of the invention, each memory bank 62 stores two million bytes of data; in accordance with the preferred embodiment of the invention, the memory is advantageously interleaved.
In accordance with the illustrated embodiment of the invention, each memory controller 58 provides a multi-stage pipeline which generates the necessary control signals to access the proper dynamic RAM of memory banks 62. The memory write operation is a pipelined write procedure which provides for storing data in four beats of the equipment. The cycle time for storing a word is about 240 nanoseconds for the components used in the illustrated embodiment. Because the DRAM's are busy throughout this period, only one write request can be processed during the interval.
Referring again to FIG. 1, the input/output processors 36 and 38, in the illustrated embodiment, act as the interface between the CPU and memory on one hand, and an external device such as an external computer on the other. The external device can be a computer which communicates with various other input/output peripheral equipment such as tape drives and terminals. The input/output units also provide for direct-memory access (DMA) transfers of data between memory and the input/output device. The input/output processor uses a so-called "DMA engine" to control data flow and operate a protocol sequence as is well known in the art. The input/output processor can contain, and preferably does contain, its own microprocessor which controls the timing of program interrupts and schedules the transfer of data using internal buffers.
A primary function of the global controller is to provide the program counter which generates the next instruction address. The global controller also "orchestrates" the process of filling the instruction cache from main memory during an instruction cache miss. Thus, if a required instruction is not found in the instruction cache during program execution, that instruction must be obtained from memory and the global controller asserts control over the various buses to quickly transfer instruction data from main memory to the instruction cache. The global controller, in the illustrated embodiment, further has an instruction table lookup buffer (ITLB) for storing a record of which "pages" of instructions are currently in memory and the locations in slower, for example disk, memory from which they were obtained.
Each cluster, according to the invention, has, as noted above, an integer processor and a floating point processor. Referring to FIG. 3, each integer processor handles integer computation as well as other logic functions. The integer processor, in the illustrated embodiment, includes two independent arithmetic logic units 70, 72 (designated ALU0 and ALU1 respectively), a 64 × 32-bit register file 74, a virtual to physical address data translation lookaside buffer 76, a branch unit 78, and a first and a second branch bank 80, 82, respectively. (Each branch bank of the illustrated embodiment is an 8 × 1-bit register for storing branch condition data from the arithmetic logic units 70, 72 respectively.) The integer processor further includes a section [reference numeral illegible] of a distributed instruction cache memory.
Functionally, the translation lookaside buffer translates virtual memory addresses from the ALU's to physical memory addresses using a table lookup mechanism well known to those practiced in the art, and the instruction cache memory provides the ALU's with faster access to instructions than would be possible if the instructions had to be read from memories 42, ..., 56 for every cycle of the processor. The register file 74 is, according to the illustrated embodiment of the invention, divided into two sub-banks. One sub-bank of thirty-two 32-bit registers is associated solely with arithmetic logic unit 70 and the other sub-bank is associated solely with arithmetic logic unit 72. The branch bank circuitry 80, 82, and the branch unit 78 are employed during multiway branch operations also described in more detail hereinafter.
Referring to FIG. 4, the floating point processor has a floating point multiplier and arithmetic logic unit 90, and a floating point adder and arithmetic logic unit 92. Each floating point processor further includes a register file of sixty-four 32-bit registers that is divided in half in the same manner as the integer processor register file 74. The floating point adder and arithmetic logic unit 92 has access to source operands in one half of the register file 98 and the floating point multiplier and integer arithmetic logic unit 90 has access to the source operands in the other half of the register file. There are in addition first and second branch bank units 100, 102, respectively, and a memory store register file 104 which, in the illustrated embodiment, consists of thirty-two 32-bit registers. The memory store register file is used by the integer and floating point processors of a cluster and is the path by which data can be stored in memory 42, ..., 56. The branch banks 100, 102, like the corresponding branch banks 80, 82 of the integer processor, comprise a set of eight one-bit registers that store condition codes resulting from arithmetic logic unit operations. These codes can be used in branch determination.
Referring to FIG. 1, in the illustrated embodiment, the CPU preferably has four clusters. This is referred to, in the illustrated embodiment, as a four-wide system. In other embodiments according to the invention, the number of clusters, and their architecture, can vary. In particular, there can be, for example, one or two clusters, designated a one-wide or a two-wide system, respectively. The number of memory controllers and the number of banks per controller depend upon the number of clusters. For a "one-wide" processor, one might select two memory controllers, each having four banks of memory. Other configurations are within the skill of one practiced in the art.
In accordance with the invention, the hardware architecture described in connection with FIGS. 1-4 is known to the compiler which generates program code for the system. In the illustrated embodiment, the program code is in the form of a sequence of 1,024-bit instruction words for the preferred four-wide system. If fewer than four clusters are used, the width of the instruction word can be accordingly reduced. (Thus, a two-wide system employs a 512-bit instruction word and a one-wide system employs a 256-bit instruction word.) Each instruction word has a plurality of operation fields (generally ALU instructions) and the goal of the compiler is to fill as many fields of the instruction word as possible so that each of the ALU's is occupied, executing an instruction for each beat of the equipment. The compiler stores resource information such as resource restrictions, including access times, number of buses, and the number of available registers. The compiler produces an execution code that optimizes resource allocation.
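The instruction word widths quoted above (256, 512, and 1,024 bits for one-, two-, and four-wide systems) scale linearly with the number of clusters. A minimal sketch of that arithmetic follows; the constant `BITS_PER_CLUSTER` and the function name are hypothetical, and the per-cluster figure is inferred from the three stated widths rather than stated directly in the text.

```python
BITS_PER_CLUSTER = 256  # inferred: 1,024 bits / 4 clusters


def instruction_word_bits(clusters: int) -> int:
    """Width of one instruction word for a system with `clusters` clusters."""
    if clusters < 1:
        raise ValueError("a system needs at least one cluster")
    return BITS_PER_CLUSTER * clusters
```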
In operation, the compiler uses the Trace Scheduling method to analyze the flow of a program and to predict which paths the program will take. These predictions include statistical guesses about conditional branches. The compiler develops plots or traces of program flow and, where necessary, multiple traces, each with a calculated probability of being correct, are generated to describe the expected program sequence. The compiler uses various methods to select the best of the multiple projected traces and calls upon a "disambiguator" to assist in creating code that has parallel structure. The disambiguator method decides whether or not implied memory references result in a program conflict, that is, whether or not memory references can be executed in parallel.
For example, if the program refers to variables "I" and "J," the compiler must know, if possible, whether these variables will refer to the same memory location. If they do not, the operations to which they relate can most likely be executed in parallel (unless they depend on each other's results). Thus, operations such as "write I" and "read J" can generally be performed concurrently if "I" and "J" are independent of each other at that execution step in the program. If, however, "I" and "J" translate to the same location in physical memory (and in the illustrated embodiment, to the same memory controller), the two operations must be executed sequentially. Accordingly, the more situations the disambiguator can disambiguate, the more the code can be made to run in parallel. The Trace Scheduling method is described in detail in Ellis, John, Bulldog: A Compiler for VLIW Architectures, MIT Press, Cambridge, Mass., 1986, attached hereto as Appendix I.
In the illustrated embodiment, the compiler further permits the programmer to make "assertions" about the variables used in the program. The programmer can assert, for example, that two variables are never equal or are not equal at some point in his program and thereafter. These assertions increase the ability of the compiler to generate parallel code because they reduce the uncertainty about the memory references that ultimately force code to be made sequential.
Also, as in the case of memory reference disambiguation, programmer assertions can assist the compiler in the case of memory bank disambiguation. Since the memory has an interleaved structure for providing a higher memory bandwidth, and since multiple banks can be accessed simultaneously by the various ALU's, the assertion that the difference between two variables will never be zero modulo N, where N is the number of banks in the system, guarantees that the same memory bank will not be accessed twice in the same beat.
A further, more severe restriction exists, however, as noted above, that a memory controller cannot be referenced more than once in a single cycle. This poses a "problem" for the compiler, since it cannot schedule in parallel two operations that reference the same memory controller. Therefore, the compiler can make parallel only those memory operations in which memory locations, if accessed, are accessed through different memory controllers. Thus, for example, writing code that accesses word N and word N+M in the same beat, of a system which is configured with a total of M banks, would cause a bank conflict as well as a memory controller conflict.
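The bank and controller restrictions described above can be checked mechanically for a group of accesses scheduled in the same beat. The sketch below is illustrative only. It assumes a simple interleaving in which the bank is the word address modulo the total number of banks, and it assumes banks are dealt out to controllers round-robin; the text says only that memory is interleaved, so both mappings, and every function name here, are hypothetical.

```python
def bank_of(word_addr: int, total_banks: int) -> int:
    # Assumed interleaving: consecutive words fall in consecutive banks.
    return word_addr % total_banks


def controller_of(word_addr: int, total_banks: int, controllers: int) -> int:
    # Assumed mapping of banks to controllers (round-robin).
    return bank_of(word_addr, total_banks) % controllers


def same_beat_conflicts(word_addrs, total_banks, controllers):
    """Return (bank_conflict, controller_conflict) for one beat's accesses."""
    banks = [bank_of(a, total_banks) for a in word_addrs]
    ctrls = [controller_of(a, total_banks, controllers) for a in word_addrs]
    return len(set(banks)) < len(banks), len(set(ctrls)) < len(ctrls)
```

Under these assumptions, words N and N+M with M total banks always share a bank and therefore a controller, matching the example in the text, while two accesses can also collide on a controller without sharing a bank.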
There also exists a stall condition that results from two or more references to the same memory bank within four beats. During a so-called "bank stall," the CPU is set to an idle state due to the latency in the memory pipelines. The compiler, to the extent possible, avoids scheduling operations that cause bank stalls, but the occurrence of such an event is not fatal to program execution as are concurrent calls to the same memory controller. The bank stall mechanism is discussed in more detail below.
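A compiler-side check for the bank stall condition might look like the following sketch. It counts references that hit a bank fewer than four beats after the previous reference to that bank. The reading of "within four beats" as a beat difference of less than four, the modulo interleaving, and the names `STALL_WINDOW` and `count_bank_stalls` are assumptions made for illustration; the sketch also ignores the extra delay that a stall itself would add to later beats.

```python
STALL_WINDOW = 4  # beats; taken from the "within four beats" wording


def count_bank_stalls(schedule, total_banks):
    """Count references that would trigger a bank stall.

    schedule: iterable of (beat, word_addr) pairs, ordered by beat.
    """
    last_use = {}   # bank number -> beat of its most recent reference
    stalls = 0
    for beat, word_addr in schedule:
        bank = word_addr % total_banks   # assumed interleaving
        if bank in last_use and beat - last_use[bank] < STALL_WINDOW:
            stalls += 1
        last_use[bank] = beat
    return stalls
```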
`
PA1, PA2, and PA3, receive physical address data generated using the data table lookaside buffer 76 of the integer processor for addressing the memory system. The outputs of memories 42 and 50, 44 and 52, 46 and 54, and 48 and 56, connect respectively to integer load buses IL0, IL1, IL2, and IL3. This provides for the simultaneous loading of the integer load buses with up to four 32-bit words or fields from the interleaved memory. In addition, however, memories 52 and 56 also connect respectively to bus lines [illegible] and IL2 to provide the low order thirty-two bit data for a double precision sixty-four bit quantity. That data is transferred through the integer processors, along the HF buses, to the floating point processor register file for processing. In addition, each input/output processor 36, 38 connects to each of the integer load buses for making direct memory access (DMA) transfers as discussed in more detail below.
As noted above, the floating point load buses provide a path from memory to the floating point processors. Only four of the eight memory controllers, however, need connect to the floating point buses, because the two transmissions from the memories to the floating point processors always use the same four memory controllers. In one case, the floating point load, a sixty-four bit data word load, one memory of a pair loads the most significant half of the sixty-four bit quantity through the floating point bus while its neighboring memory simultaneously loads the least significant portion of the sixty-four bit quantity onto the integer load bus for transmission through the integer processor and HF bus to the floating point processor. (The sole exception to this process for loading a sixty-four bit wide word provides for the integer load buses to carry the full sixty-four bit number, as noted above. For example, memory units 54 and 56 provide a sixty-four bit load using the integer load buses IL2 and IL3 over lines 130 and 132.) In the second case, during operation of the cache miss engine (described in detail below) the same four memories provide mask wor