US006633945B1

(12) United States Patent
Fu et al.

(10) Patent No.: US 6,633,945 B1
(45) Date of Patent: Oct. 14, 2003
(54) FULLY CONNECTED CACHE COHERENT MULTIPROCESSING SYSTEMS

(75) Inventors: Daniel Fu, Sunnyvale, CA (US); Carlton T. Amdahl, Alameda County, CA (US); Walstein Bennett Smith, III, Palo Alto, CA (US)

(73) Assignee: Conexant Systems, Inc., Newport Beach, CA (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/349,641

(22) Filed: Jul. 8, 1999
Related U.S. Application Data

(63) Continuation-in-part of application No. 09/281,749, filed on Mar. 30, 1999, now Pat. No. 6,516,442, which is a continuation-in-part of application No. 09/163,294, filed on Sep. 29, 1998, now Pat. No. 6,292,705, which is a continuation-in-part of application No. 08/986,430, filed on Dec. 7, 1997, now Pat. No. 6,065,077.
(51) Int. Cl.7 ............... G06F 13/00
(52) U.S. Cl. ............... 710/316; 710/317; 710/29; 709/213; 711/130
(58) Field of Search ....... 710/100, 305, 313, 315, 316, 317, 29; 711/144, 143, 130, 147; 709/213
(56) References Cited

U.S. PATENT DOCUMENTS
4,480,307 A   10/1984  Budde et al.
5,161,156 A   11/1992  Baum et al.
5,271,000 A   12/1993  Engbersen et al.
5,313,609 A    5/1994  Baylor et al.
5,335,335 A    8/1994  Jackson et al.
5,440,698 A    8/1995  Sindhu et al.
5,505,686 A    4/1996  Willis et al.
5,511,226 A    4/1996  Zilka
5,513,335 A    4/1996  McClure
5,524,234 A    6/1996  Martinez, Jr. et al.
5,526,380 A    6/1996  Izzard
5,537,575 A    7/1996  Foley
5,553,310 A    9/1996  Taylor et al.
5,561,779 A   10/1996  Jackson
5,568,620 A   10/1996  Sarangdhar et al.
5,574,868 A   11/1996  Marisetty
5,577,204 A   11/1996  Brewer et al.
5,581,729 A   12/1996  Nishtala et al.
5,588,131 A   12/1996  Borrill
5,594,886 A    1/1997  Smith et al.
5,602,814 A    2/1997  Jaquette et al.
5,606,686 A    2/1997  Tarui et al.
5,634,043 A    5/1997  Self et al.
5,634,068 A    5/1997  Nishtala et al.
5,644,754 A    7/1997  Weber
5,655,100 A    8/1997  Ebrahim et al.
5,657,472 A    8/1997  Van Loo et al.
(List continued on next page.)
OTHER PUBLICATIONS

Technical White Paper, Sun™ Enterprise™ 10000 Server, Sun Microsystems, Sep. 1998.
Alan Charlesworth, Starfire: Extending the SMP Envelope, IEEE Micro, Jan./Feb. 1998, pp. 39-49.
(List continued on next page.)

Primary Examiner—Sumati Lefkowitz
Assistant Examiner—X. Chung-Trans
(74) Attorney, Agent, or Firm—Keith Kind; Kelly H. Hale
(57) ABSTRACT

Fully connected multiple FCU-based architectures reduce requirements for Tag SRAM size and memory read latencies. A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A high-speed point-to-point Channel couples command initiators and memory with the switch matrix and with I/O subsystems.

10 Claims, 15 Drawing Sheets
[Representative drawing: four FCU-MCU nodes (0-3), each coupled to DDR-SDRAM and an MP CPU bus, fully interconnected by PT-to-PT channels, with further PT-to-PT channels to I/O bridge chips fanning out to PCI buses.]
U.S. PATENT DOCUMENTS

5,682,516 A   10/1997  Sarangdhar et al.
5,684,977 A   11/1997  Van Loo et al.
5,696,910 A   12/1997  Pawlowski
5,796,605 A    8/1998  Hagersten
5,829,034 A   10/1998  Hagersten et al.
5,895,495 A    4/1999  Arimilli et al.
5,897,656 A    4/1999  Vogt et al.
5,940,856 A    8/1999  Arimilli et al.
5,946,709 A    8/1999  Arimilli et al.
5,978,411 A   11/1999  Kitade et al.
6,044,122 A    3/2000  Ellersick et al.
6,065,077 A *  5/2000  Fu ................. 710/100
6,125,429 A *  9/2000  Goodwin et al. .... 711/143
6,145,007 A   11/2000  Dokic et al.
6,279,084 B1   8/2001  VanDoren et al.
6,289,420 B1 * 9/2001  Cypher ............. 711/144
6,292,705 B1   9/2001  Wang et al.
6,295,581 B1 * 9/2001  DeRoo ............. 711/135
OTHER PUBLICATIONS

Joseph Heinrich, Origin™ and Onyx2™ Theory of Operations Manual, Document No. 007-3439-002, Silicon Graphics, Inc., 1997.
White Paper, Sequent's NUMA-Q SMP Architecture, Sequent, 1997.
White Paper, Eight-way Multiprocessing, Hewlett-Packard, Nov. 1997.
George White & Pete Vogt, Profusion, a Buffered, Cache-Coherent Crossbar Switch, presented at Hot Interconnects Symposium V, Aug. 1997.
Alan Charlesworth, et al., Gigaplane-XB: Extending the Ultra Enterprise Family, presented at Hot Interconnects Symposium V, Aug. 1997.
James Laudon & Daniel Lenoski, The SGI Origin: A ccNUMA Highly Scalable Server, Silicon Graphics, Inc., presented at the Proc. of the 24th Int'l Symp. on Computer Architecture, Jun. 1997.
Mike Galles, Spider: A High-Speed Network Interconnect, IEEE Micro, Jan./Feb. 1997, pp. 34-39.
T. D. Lovett, R. M. Clapp and R. J. Safranek, NUMA-Q: an SCI-based Enterprise Server, Sequent, 1996.
Daniel E. Lenoski & Wolf-Dietrich Weber, Scalable Shared-Memory Multiprocessing, Morgan Kaufmann Publishers, 1995, pp. 143-159.
David B. Gustavson, The Scalable Coherent Interface and Related Standards Projects (as reprinted in Advanced Multimicroprocessor Bus Architectures, Janusz Zalewski, IEEE Computer Society Press, 1995, pp. 195-207).
Kevin Normoyle, et al., UltraSPARC™ Port Architecture, Sun Microsystems, Inc., presented at Hot Interconnects III, Aug. 1995.
Kevin Normoyle, et al., UltraSPARC™ Port Architecture, Sun Microsystems, Inc., presented at Hot Interconnects III, Aug. 1995, UltraSparc Interfaces.
Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993, pp. 355-357.
Jim Handy, The Cache Memory Book, Academic Press, 1993, pp. 161-169.
Angel L. Decegama, Parallel Processing Architectures and VLSI Hardware, vol. 1, Prentice-Hall, 1989, pp. 341-344.

* cited by examiner
[Drawing sheets 1 through 15, Oct. 14, 2003: figure images not reproduced in this text extraction. Per the Brief Description of Drawings:
Sheet 1: FIG. 1, a prior-art generic shared-bus symmetric shared-memory multiprocessor system.
Sheet 2: FIG. 2, a preferred embodiment SMP system using a switched fabric data path architecture centered on a Flow-Control Unit (FCU), with CPUs, CCUs/DCIUs, MCUs, and BBUs.
Sheet 3: FIG. 3, internal detail of the FCU: Transaction Controller (TC), Transaction Bus (TB), Transaction Status Bus (TSB), Data Switch, IIFs, MIF, and CIBs.
Sheet 4: FIG. 4, a variation of FIG. 2 in which each CPU has its own CCU, with the channel interfaces shown as PHY link and transport layers.
Sheet 5: FIG. 5, a timing diagram comparing memory transaction performance of an FCU-based system and a prior-art shared-bus system.
Sheet 6: FIG. 6, another view of the embodiment of FIG. 4.
Sheet 7: FIGS. 7a-7d, minimal, 4-way, 8-way high-performance, and I/O-intensive configurations.
Sheet 8: FIG. 8, a CPU having an integral CCU.
Sheet 9: FIG. 9, a variation of FIG. 6 using the integrated CPU/CCU of FIG. 8.
Sheet 10: FIGS. 10a-10d, variations of FIG. 7 using the integrated CPU/CCU of FIG. 8.
Sheet 11: FIG. 11, a 4-way embodiment coupled to an industry-standard switching fabric, with I/O bridge chips and PCI buses.
Sheet 12: FIG. 12, an FCU-based architecture according to a first embodiment.
Sheet 13: FIG. 13, an FCU-based architecture according to a second embodiment, with serial links to the I/O bridge chips.
Sheet 14: FIG. 14, the cache line characteristics of the systems of FIGS. 12 and 13.
Sheet 15: FIG. 15, the cache line definition.]
FULLY CONNECTED CACHE COHERENT MULTIPROCESSING SYSTEMS
CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is a continuation-in-part of the following commonly-owned U.S. patent applications: U.S. application Ser. No. 08/986,430, now U.S. Pat. No. 6,065,077, AN APPARATUS AND METHOD FOR A CACHE COHERENT SHARED MEMORY MULTIPROCESSING SYSTEM, filed Dec. 7, 1997; U.S. application Ser. No. 09/163,294, now U.S. Pat. No. 6,292,705, METHOD AND APPARATUS FOR ADDRESS TRANSFERS, SYSTEM SERIALIZATION, AND CENTRALIZED CACHE AND TRANSACTION CONTROL, IN A SYMMETRIC MULTIPROCESSOR SYSTEM, filed Sep. 29, 1998; and U.S. application Ser. No. 09/281,749, now U.S. Pat. No. 6,516,442, CACHE INTERFACE AND PROTOCOLS FOR CACHE COHERENCY IN A SCALABLE SYMMETRIC MULTIPROCESSOR SYSTEM, filed Mar. 30, 1999; all of which are incorporated by reference herein.
BACKGROUND

FIGS. 2-11 show point-to-point cache coherent switch solutions for multiprocessor systems that are the subject of copending and coassigned applications. Depending on the implementation specifics, these designs may be problematic in two respects:

1. Tag SRAM size is expensive.
2. Latency is greater than desired.

First, the SRAM size issue. To support an L2 cache size of 4 MB, 64 GB of total memory, and a 64-byte line size:

the TAG array will have 4 MB/64 bytes = 64K entries;
the TAG size will be 14 bits;
the total TAG array size will be 14 bits * 64K = 917,504 bits per CPU.

To support an 8-way system, the duplicated TAG array size will be 8 * 14 bits * 64K, about 8 Mbit of SRAM. 8 Mbit of SRAM is too large for single-silicon integration, even with a 0.25 micron CMOS process.
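For concreteness, the sizing arithmetic above can be restated as a short calculation. The following sketch is not part of the original patent text; it simply reproduces the stated figures (4 MB L2, 64 GB memory, 64-byte lines, 8 CPUs):

```python
# Worked example of the duplicate-tag SRAM sizing described above.

L2_BYTES = 4 * 2**20          # 4 MB L2 cache per CPU
LINE_BYTES = 64               # cache line size
MEMORY_BYTES = 64 * 2**30     # 64 GB of addressable memory

entries = L2_BYTES // LINE_BYTES                         # 65,536 (64K) entries
# Each tag must disambiguate all memory lines mapping to one entry:
# log2(64 GB / 4 MB) = log2(16,384) = 14 bits.
tag_bits = (MEMORY_BYTES // L2_BYTES).bit_length() - 1   # 14

per_cpu_bits = entries * tag_bits     # 917,504 bits per CPU
eight_way_bits = 8 * per_cpu_bits     # 7,340,032 bits (~7.3 Mbit,
                                      # which the text rounds to ~8 Mbit)
print(entries, tag_bits, per_cpu_bits, eight_way_bits)
```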
Second, the latency issue. Although the switch fabric solutions of FIGS. 2-11 provide scalability in memory throughput, maximum transaction parallelism, and easy PCB board routing, the latency for memory read transactions is greater than desired.

Example of a memory read transaction: a CPU read transaction is first latched by the CCU; the CCU formats the transaction into a channel command and sends it through the channel; the FCU's IIF unit de-serializes the channel command or data and performs the cache coherency operation; the FCU then sends the memory read transaction to the MCU. The MCU de-serializes the channel command, sends the read command to the DRAM address bus, reads from the DRAM data bus, and sends the data to the FCU via the channel; the FCU sends the data to the CCU via the channel. Finally, the data is presented on the CPU bus. A read transaction thus crosses the channel four times, and each crossing introduces additional latency. What is needed is an SMP architecture with the benefits of the present FCU architecture, but with reduced Tag SRAM size requirements per chip and with reduced latencies.
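The four crossings are easier to see laid out explicitly. The sketch below is an illustrative latency model, not part of the patent; the per-crossing and DRAM costs are invented placeholders, and only the hop structure comes from the description above:

```python
# Illustrative model of the read path described above: four channel
# crossings plus one DRAM access.  All costs are hypothetical.

CHANNEL_CROSSING_NS = 20   # assumed serialize/de-serialize cost per crossing
DRAM_ACCESS_NS = 60        # assumed DRAM access time

def read_latency_ns() -> int:
    hops = [
        "CCU -> FCU",   # crossing 1: command to the FCU's IIF
        "FCU -> MCU",   # crossing 2: read forwarded to the memory controller
        "MCU -> FCU",   # crossing 3: reply data back through the switch
        "FCU -> CCU",   # crossing 4: data returned to the CPU's interface
    ]
    return len(hops) * CHANNEL_CROSSING_NS + DRAM_ACCESS_NS

print(read_latency_ns())   # 140 ns under these assumed costs
```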
SUMMARY

Fully connected multiple FCU-based architectures reduce requirements for Tag SRAM size and memory read latencies. A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A high-speed point-to-point Channel couples command initiators and memory with the switch matrix and with I/O subsystems.
BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a drawing of a prior-art generic symmetric shared-memory multiprocessor system using a shared bus.
FIG. 2 is a drawing of a preferred embodiment symmetric shared-memory multiprocessor system using a switched fabric data path architecture centered on a Flow-Control Unit (FCU).
FIG. 3 is a drawing of the switched fabric data path architecture of FIG. 2, further showing internal detail of an FCU having a Transaction Controller (TC), Transaction Bus (TB), and Transaction Status Bus (TSB) according to the present invention.
FIG. 4 is a drawing of a variation of the embodiment of FIG. 2, in which each CPU has its own CCU, and in which the channel interface and control is abstractly represented as being composed of a physical (PHY) link layer and a transport layer.
FIG. 5 is a timing diagram comparing the memory transaction performance of a system based on a flow control unit according to the present invention and a prior-art shared-bus system.
FIG. 6 is another view of the embodiment of FIG. 4.
FIG. 7 is a drawing of a number of system embodiments according to the present invention. FIG. 7a illustrates a minimal configuration, 7b illustrates a 4-way configuration, 7c illustrates an 8-way high-performance configuration, and 7d illustrates a configuration for I/O intensive applications.
FIG. 8 is a drawing of a CPU having an integral CCU.
FIG. 9 illustrates a variation of the embodiment of FIG. 6 using the integrated CPU/CCU of FIG. 8.
FIGS. 10a-d illustrate variations of the embodiments of FIG. 7 using the integrated CPU/CCU of FIG. 8.
FIG. 11 is a drawing of a 4-way embodiment of the present invention that includes coupling to an industry-standard switching fabric for coupling CPU/Memory complexes with I/O devices.
FIG. 12 is a drawing of an FCU-based architecture according to a first embodiment.
FIG. 13 is a drawing of an FCU-based architecture according to a second embodiment.
FIG. 14 defines the cache line characteristics of the systems of FIGS. 12 and 13.
FIG. 15 gives the cache line definition.
DETAILED DESCRIPTION

System Overview

FIG. 2 is a drawing of a preferred embodiment symmetric shared-memory multiprocessor system using a switched fabric data path architecture centered on a Flow-Control Unit (FCU) 220. In the illustrated embodiment, eight processors 120 are used and the configuration is referred to herein as an "8P" system.
The FCU (Flow Control Unit) 220 chip is the central core of the 8P system. The FCU internally implements a switched-fabric data path architecture. Point-to-Point (PP)
interconnect 112, 113, and 114 and an associated protocol define dedicated communication channels for all FCU I/O. The terms Channels and PP-Channel are references to the FCU's PP I/O. The FCU provides Point-to-Point Channel interfaces to up to ten Bus Bridge Units (BBUs) 240 and/or CPU Channel Units (CCUs, also known as Channel Interface Units or CIUs) and one to four Memory Control Units (MCUs) 230. Two of the ten Channels are fixed to connect to BBUs. The other eight Channels can connect to either BBUs or CCUs. In an illustrative embodiment the number of CCUs is eight. In one embodiment the CCUs are packaged as a pair, referred to herein as a Dual CPU Interface Unit (DCIU) 210. In the 8P system shown, the Dual CPU Interface Unit (DCIU) 210 interfaces two CPUs with the FCU. Throughout this description, a reference to a "CCU" is understood to describe the logical operation of each half of a DCIU 210, and references to "CCUs" are understood to apply equally to an implementation that uses either single CCUs or DCIUs 210. CCUs act as a protocol converter between the CPU bus protocol and the PP-Channel protocol.
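The channel topology just described (ten BBU/CCU channels, two of them fixed to BBUs, plus one to four MCU channels) can be expressed as a small validity check. The sketch below is illustrative only; in particular, which two channel slots are the "fixed" ones is an assumption, as the text does not say:

```python
# Hypothetical checker for the FCU channel topology described above.

def valid_fcu_config(channel_units: list[str], num_mcus: int) -> bool:
    if len(channel_units) != 10 or not 1 <= num_mcus <= 4:
        return False
    # Two of the ten Channels are fixed to connect to BBUs
    # (assumed here to be slots 0 and 1).
    if channel_units[0] != "BBU" or channel_units[1] != "BBU":
        return False
    # The other eight Channels can connect to either BBUs or CCUs.
    return all(u in ("BBU", "CCU") for u in channel_units[2:])

# The illustrative 8P system: two BBUs plus eight CCUs (four DCIUs).
print(valid_fcu_config(["BBU"] * 2 + ["CCU"] * 8, num_mcus=4))  # True
```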
The FCU 220 provides a high-bandwidth and low-latency connection among these components via a Data Switch, also referred to herein as a Simultaneous Switched Matrix (SSM), or switched fabric data path. In addition to connecting all of these components, the FCU provides the cache coherency support for the connected BBUs and CCUs via a Transaction Controller and a set of cache-tags duplicating those of the attached CPUs' L2 caches. FIG. 5 is a timing diagram comparing the memory transaction performance of a system based on a flow control unit according to the present invention and a prior-art shared-bus system.
In a preferred embodiment, the FCU provides support for two dedicated BBU channels, four dedicated MCU channels, up to eight additional CCU or BBU channels, and PCI peer-to-peer bridging. The FCU contains a Transaction Controller (TC) with reflected L2 states. The TC supports up to 200M cache-coherent transactions/second, MOESI and MESI protocols, and up to 39-bit addressing. The FCU contains the Simultaneous Switch Matrix (SSM) Dataflow Switch, which supports non-blocking data transfers.
In a preferred embodiment, the MCU supports flexible memory configurations, including one or two channels per MCU, up to 4 Gbytes per MCU (maximum of 16 Gbytes per system), with one or two memory banks per MCU, with one to four DIMMs per bank, of SDRAM, DDR-SDRAM, or RDRAM, and with non-interleaved or interleaved operation.
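That configuration space can be captured in a few fields. The following sketch is not from the patent; the field names are invented, while the limits come from the paragraph above (the 16 Gbyte system total follows from four MCUs at 4 Gbytes each):

```python
# Hypothetical representation of the per-MCU configuration space.

from dataclasses import dataclass

@dataclass
class McuConfig:
    channels: int        # one or two channels per MCU
    banks: int           # one or two memory banks per MCU
    dimms_per_bank: int  # one to four DIMMs per bank
    gbytes: int          # up to 4 GB per MCU (16 GB per system)
    dram_type: str       # "SDRAM", "DDR-SDRAM", or "RDRAM"
    interleaved: bool    # non-interleaved or interleaved operation

    def valid(self) -> bool:
        return (self.channels in (1, 2) and self.banks in (1, 2)
                and 1 <= self.dimms_per_bank <= 4 and self.gbytes <= 4
                and self.dram_type in ("SDRAM", "DDR-SDRAM", "RDRAM"))

print(McuConfig(2, 2, 4, 4, "DDR-SDRAM", True).valid())  # True
```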
In a preferred embodiment, the BBU supports both 32 and 64 bit PCI bus configurations, including 32 bit/33 MHz, 32 bit/66 MHz, and 64 bit/66 MHz. The BBU is also 5V tolerant and supports AGP.
All connections between components occur as a series of "transactions." A transaction is a Channel Protocol request command and a corresponding Channel Protocol reply. For example, a processor, via a CCU, can perform a Read request that will be forwarded, via the FCU, to the MCU; the MCU will return a Read reply, via the FCU, back to the same processor. A Transaction Protocol Table (TPT) defines the system-wide behavior of every type of transaction, and a Point-to-Point Channel Protocol defines the command format for transactions.
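The request/reply pairing can be pictured as follows. This is an illustrative sketch only; the command names and fields are invented rather than taken from the Channel Protocol or the TPT:

```python
# Illustrative model of a transaction: a Channel Protocol request
# command plus its corresponding Channel Protocol reply.

from dataclasses import dataclass

@dataclass
class Command:
    kind: str        # e.g. "ReadRequest" or "ReadReply" (hypothetical names)
    initiator: str   # which CCU/BBU issued the transaction
    target: str      # e.g. an MCU selected by address mapping
    address: int

def memory_read(ccu: str, mcu: str, address: int) -> tuple[Command, Command]:
    request = Command("ReadRequest", ccu, mcu, address)   # CCU -> FCU -> MCU
    reply = Command("ReadReply", mcu, ccu, address)       # MCU -> FCU -> CCU
    return request, reply

req, rep = memory_read("CCU0", "MCU1", 0x1000)
```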
The FCU assumes that initiators have converted addresses from other formats to conform with the PP-Channel definitions. The FCU does perform target detection. Specifically, the FCU determines the correspondence between addresses and specific targets via address mapping tables. Note that this mapping hardware (contained in the CFGIF and the TC)
maps from Channel Protocol addresses to targets. The mapping generally does not change or permute addresses.
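A mapping table of the kind described (target selection without address permutation) might look like the following sketch; the ranges and target names are invented examples, not values from the patent:

```python
# Hypothetical address-to-target map.  As in the text, the lookup only
# selects a target; it does not change or permute the address itself.

ADDRESS_MAP = [  # (start, end, target) -- invented example ranges
    (0x0_0000_0000, 0x1_0000_0000, "MCU0"),
    (0x1_0000_0000, 0x2_0000_0000, "MCU1"),
    (0xF_E000_0000, 0xF_F000_0000, "BBU0"),  # e.g. memory-mapped I/O
]

def target_for(address: int) -> str:
    for start, end, target in ADDRESS_MAP:
        if start <= address < end:
            return target
    raise ValueError(f"no target for address {address:#x}")

print(target_for(0x1_2345_6780))  # MCU1 -- the address itself is unchanged
```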
Summary of Key Components

Transaction Controller (TC) 400. The most critical coherency principle obeyed by the FCU is the concept of a single system-serialization point. The system-serialization point is the "funnel" through which all transactions must pass. By guaranteeing that all transactions pass through the system-serialization point, a precise order of transactions can be defined. (And this in turn implies a precise order of tag state changes.) In the FCU, the system-serialization point is the Transaction Controller (TC). Coherency state is maintained by the duplicate set of processor L2 cache-tags stored in the TC.
The Transaction Controller (TC) acts as the central system-serialization and cache coherence point, ensuring that all transactions in the system happen in a defined order, obeying defined rules. All requests, cacheable or not, pass through the Transaction Controller. The TC handles the cache coherency protocol using a duplicate set of L2 cache-tags for each CPU. It also controls address mapping inside the FCU, dispatching each transaction request to the appropriate target interface.
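One way to picture the system-serialization point is as a single queue through which every request passes before any tag state changes. A minimal sketch of that ordering property, with invented names; the real TC is hardware, not a software queue:

```python
# Minimal sketch of a system-serialization point: all requests,
# cacheable or not, pass through one funnel, which fixes a precise
# global order of transactions (and of duplicate-tag state changes).

from collections import deque

class TransactionController:
    def __init__(self):
        self._funnel = deque()   # the single serialization queue
        self._order = 0

    def submit(self, request) -> None:
        self._funnel.append(request)      # arrival via the Transaction Bus

    def serialize_next(self):
        request = self._funnel.popleft()  # exactly one request at a time
        self._order += 1
        # The duplicate L2 tag lookup/update would happen here, at a
        # point where the global transaction order is already decided.
        return self._order, request
```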
Transaction Bus (TB) 3104 and Transaction Status Bus (TSB) 3106. All request commands flow through the Transaction Bus. The Transaction Bus is designed to provide fair arbitration between all transaction sources (initiators) and the TC; it provides an inbound path to the TC, and distributes outbound status from the TC (via a Transaction Status Bus). The Transaction Bus (TB) is the address/control "highway" in the FCU. It includes an arbiter and the Transaction Bus itself. The TB pipelines the address over two cycles. The extent of pipelining is intended to support operation of the FCU at 200 MHz using contemporary fabrication technology at the time of filing of this disclosure.
Whereas the TB provides inputs to the Transaction Controller, the Transaction Status Bus delivers outputs from the Transaction Controller to each interface and/or target. The TSB outputs provide transaction confirmation, coherency state update information, etc. Note that while many signals on the TSB are common, the TC does drive unique status information (such as cache-state) to each interface. The Transaction Bus and Transaction Status Bus are discussed in detail later in this application.
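The fair arbitration the text calls for could, for example, be round-robin; the sketch below is an assumption, since the patent does not specify the arbiter's algorithm:

```python
# Hypothetical round-robin arbiter for the Transaction Bus: each cycle,
# grant the bus to the next initiator interface with a pending request,
# starting the search just after the previous winner.

class RoundRobinArbiter:
    def __init__(self, num_sources: int):
        self.num_sources = num_sources
        self.last = num_sources - 1

    def grant(self, pending: list[bool]) -> int | None:
        for offset in range(1, self.num_sources + 1):
            source = (self.last + offset) % self.num_sources
            if pending[source]:
                self.last = source
                return source
        return None   # no requests this cycle

arb = RoundRobinArbiter(4)
print(arb.grant([True, False, True, False]))  # 0 now; 2 on the next call
```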
Switched Fabric Data Path (Data Switch). The Data Switch is an implementation of a Simultaneous Switched Matrix (SSM) or switched fabric data path architecture. It provides for parallel routing of transaction data between multiple initiators and multiple targets. The Data Switch is designed to let multiple, simultaneous data transfers take place to/from initiators and from/to targets (destinations of transactions). Note that the Data Switch is packet based. Every transfer over the Data Switch starts with a Channel Protocol command (playing the role of a packet header) and is followed by zero or more data cycles (the packet payload). All reply commands (some with data) flow through the Data Switch. Both write requests and read replies will have data cycles. Other replies also use the Data Switch and will only send a command header (no payload).
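The packet framing described, a command header followed by zero or more data cycles, can be sketched as follows; the cycle width and command names are invented for illustration:

```python
# Illustrative packet framing for the Data Switch: every transfer is a
# Channel Protocol command (header) followed by zero or more data
# cycles (payload).  Write requests and read replies carry payload;
# other replies are header-only.

def frame_packet(header: bytes, payload: bytes = b"",
                 cycle_bytes: int = 8) -> list[bytes]:
    cycles = [header]                      # the command plays packet header
    for i in range(0, len(payload), cycle_bytes):
        cycles.append(payload[i:i + cycle_bytes])
    return cycles

read_reply = frame_packet(b"RD_REPLY", payload=bytes(64))  # 1 + 8 cycles
csr_ack = frame_packet(b"ACK")                             # header only
```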
IIF (Initiator InterFace) 3102. The IIF is the interface between the FCU and an initiator (a BBU or a CCU). The IIF transfers Channel Protocol commands to and from the initiator. The IIF must understand the cache coherency protocol and must be able to track all outstanding transactions. Note that the BBU/CCU can be both an initiator of
commands and a target of commands (for CSR read/write if nothing else). Address and control buffering happen in the IIF; bulk data buffering is preferably done in the BBU/CCU (in order to save space in the FCU, which has ten copies of the IIF). The IIF needs configuration for CPU and I/O modes, and to handle differences between multiple types of processors that may be used in different system configurations.
Memory Interface (MIF) 3108. The Memory Interface (MIF) is the portal to the memory system, acting as the interface between the rest of the chipset and the MCU(s). The MIF is the interpreter/filter/parser that receives transaction status from the TB and TC, issues requests to the MCU, receives replies from the MCU, and forwards the replies to the initiator of the transaction via the Data Switch. It is a "slave" device in that it can never be an initiator on the TB. (The MIF is an initiator in another sense, in that it sources data to the Data Switch.) For higher performance, the MIF supports speculative reads. Speculative reads start the read process early, using the data from the TB rather than waiting for the data on the TSB. There is one MIF (regardless of how many memory interfaces there are). The MIF contains the memory mapping logic that determines the relationship between addresses and MCUs (and memory ports). The memory mapping logic includes means to configure the MIF for various memory banking/interleaving schemes. The MIF also contains the GART (Graphics Address Remap Table). Addresses that hit in the GART region of memory will be mapped by the GART to the proper physical address.
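A speculative read, as described, launches the DRAM access from the TB request and then confirms or cancels it when the TC's status arrives on the TSB. The following behavioral sketch uses invented interfaces (the `mcu` object and status fields are hypothetical):

```python
# Behavioral sketch of the MIF speculative read described above: the
# memory access starts from Transaction Bus data, before the
# Transaction Controller's verdict arrives on the Transaction Status Bus.

def handle_read(tb_request, tc_status_future, mcu):
    speculative = mcu.start_read(tb_request.address)  # launched early
    status = tc_status_future.result()   # later: TC status via the TSB
    if status.target_is_memory and status.confirmed:
        return speculative.data()        # DRAM latency already underway
    speculative.cancel()                 # e.g. satisfied from a CPU cache
    return None
```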
Configuration Register Interface (CFGIF) 410. This is where all the FCU's Control and Status Registers (CSRs) logically reside. CFGIF is responsible for the reading and writing of all the CSRs in the FCU, as well as all of the diagnostic reads/writes (e.g., diagnostic accesses to the duplicate tag RAM).
Channel Interface Block (CIB). The CIBs are the transmit and receive interface for the Channel connections to and from the FCU. The FCU has 14 copies of the CIB, 10 for BBUs/CCUs, and 4 for MCUs. (The CIB is generic, but the logic on the core side of the Channel is an IIF or the MIF.)
Embodiments overview. FIG. 3 is a drawing showing internal detail of the switched fabric data path architecture within the FCU of FIG. 2. A first key component of the FCU is the Transaction Controller (TC) 400. A second key component of the FCU is an address and control bus 3100, which is actually an abstraction representing a Transaction Bus (TB) 3104 and Transaction Status Bus (TSB) 3106. A third key component of the FCU is the Data Path Switch (also referred to herein as the Data Switch, or the switched fabric data path). The Data Switch is composed of vertical buses 320, horizontal buses 340, and node switches 380. The node switches selectively couple the vertical and horizontal buses under control of the Data Path Switch Controller 360 and control signals 370. Additional key components of the FCU include one or more Initiator Interfaces (IIFs) 3102; a Memory Interface (MIF) 3108; and Channel Interface Blocks (CIBs) 305 at the periphery of the various interfaces.
A number of alternate embodiments exist. FIG. 4 is a drawing of a variation on the embodiment of FIG. 2, in which each CPU has its own CCU. In this view the channel interface and control that make up the IIFs and CCUs are abstractly represented as being composed of a physical (PHY) link layer and a transport layer. FIG. 6 is another view of the embodiment of FIG. 4. FIG. 7 is a drawing of a number of application-specific variations on the embodiment of FIG. 4. FIG. 7a illustrates a minimal configuration, 7b illustrates a 4-way configuration, 7c illustrates an 8-way high-performance configuration, and 7d illustrates a configuration for I/O intensive applications.
FIG. 8 is a drawing of a CPU having an integral CCU. FIG. 8 makes explicit a "backside" bus interface to an external cache (an L2 cache in the case illustrated). An IIF replaces the conventional CPU interface, such that the Channel is the frontside bus of the CPU of FIG. 8.
The embodiments of FIGS. 9 and 10 are respective variations of the embodiments of FIGS. 6 and 7, with adaptation for the use of the integrated CPU/CCU of FIG. 8. The embodiments of FIGS. 9 and 10 offer system solutions with lower CPU pin counts, higher throughput, lower latency, hot-pluggable CPUs (if an OS supports it), and reduced PCB board layout complexity compared with non-integrated solutions.
FIG. 11 is a drawing of a 4-way embodiment of the present invention that includes coupling to an industry-standard switching fabric for coupling CPU/Memory complexes with I/O devices.
FIG. 12 is a drawing of an FCU-based architecture according to a first embodiment.
FIG. 13 is a drawing of an FCU-based architecture according to a second embodiment.
FIG. 14 defines the cache line characteristics of the systems of FIGS. 12 and 13.
Additional Descriptions

U.S. application Ser. No. 08/986,430, AN APPARATUS AND METHOD FOR A CACHE COHERENT SHARED MEMORY MULTIPROCESSING SYSTEM, filed Dec. 7, 1997, incorporated by reference above, provides additional detail of the overall operation of the systems of FIGS. 2 and 3. U.S. application Ser. No. 09/163,294, METHOD AND APPARATUS FOR ADDRESS TRANSFERS, SYSTEM SERIALIZATION, AND CENTRALIZED CACHE AND TRANSACTION CONTROL, IN A SYMMETRIC MULTIPROCESSOR SYSTEM, filed Sep. 29, 1998, provides additional detail of particular transaction address bus embodiments, and was incorporated by reference previously herein. U.S. application Ser. No. 09/168,311, METHOD AND APPARATUS FOR EXTRACTING RECEIVED DIGITAL DATA FROM A FULL-DUPLEX POINT-TO-POINT SIGNALING CHANNEL USING SAMPLED DATA TECHNIQUES, filed Oct. 7, 1998, provides additional detail of particular transceiver embodiments, and was incorporated by reference previously herein. U.S. application Ser. No. 09/281,749, CHANNEL INTERFACE AND PROTOCOLS FOR CACHE COHERENCY IN A SCALABLE SYMMETRIC MULTIPROCESSOR SYSTEM, filed Mar. 30, 1999, provides additional detail of the channel interface blocks and the transport protocol, and was incorporated by reference previously herein. To the extent to which any discrepancies exist between the description in the above referenced applications and the instant application, the instant appli
