throbber
United States Patent
`
`(12)
`(10) Patent No.:
`US 7,296,121 B2
`
`Morton et al.
`(45) Date of Patent:
`Nov. 13, 2007
`
`US007296121B2
`
`(54) REDUCING PROBE TRAFFIC IN
`MULTIPROCESSOR SYSTEMS
`
`(75)
`
`Inventors: Eric Morton, Austin, TX (US); Rajesh
`’
`.
`E3“: ‘Al‘uiln’t.TXT§J%)j§d3an.d B
`a ee’
`115.11“
`’ a“
`G135”: Ausms TX (Us)
`_
`.
`.
`(73) ASSlgneeI NeWISyss Inca Austm, TX (US)
`
`'
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 250 days.
`
`5,524,212 A
`5,692,123 A
`5,751,995 A
`27:52,??? 2
`,
`,
`6,018,791 A
`6,038,652 A
`6,052,769 A
`6,067,603 A
`6,073,210 A
`6,085,295 A
`6,108,737 A
`6,122,715 A
`
`6/1996 Somani et al.
`11/1997 Logghe
`5/1998 Sarangdhar
`12/133;
`fiomllllro tet 31'
`erc an
`1/2000 Arimilli et al.
`3/2000 Phillips et al.
`4/2000 Hulf et al.
`5/2000 Carpenter et al.
`6/2000 Palanca et 31.
`7/2000 Ekanadham et al.
`8/2000 sharma et 31.
`9/2000 Palanca et al.
`.
`(Cont1nued)
`
`(21) APP1~ N05 10/966,161
`
`(22)
`
`Filed:
`
`Oct. 15, 2004
`(Under 37 CFR 1.47)
`
`FOREIGN PATENT DOCUMENTS
`
`W0
`
`W0 0239242
`
`5/2002
`
`OTHER PUBLICATIONS
`
`(65)
`
`Prior Publication Data
`US 2007/0055826 A1
`Mar. 8, 2007
`
`Guo, et al., “A Probe-Based Server Selection Protocol for Differ-
`entiated Service Networks”, © 2002 IEEE, p. 2353-2357.*
`
`Related US. Application Data
`(63) Continuation-in-part of application No. 10/288,347,
`filed on Nov. 4, 2002. now Pat. No. 7,003,633.
`
`(Cont1nued)
`Primary ExammeriBr‘lan.R. Peugh
`(74) A’wmey’ Age”! 0r F’rmiBeyer weaVer LLP
`
`(51)
`
`Int. Cl.
`(200601)
`G06F 12/00
`(52) US. Cl.
`..................; ..................... 711/148; 711/141
`(58) Field of ClaSSIficatlon Search .................
`7 11/141,
`711/148’ 131’ 144’ 145’ 146’ 709/206’ 213’
`.
`709/2165 217} 218’ 219
`.
`See appllcatlon file for complete search hlstory.
`References Cited
`U.S. PATENT DOCUMENTS
`
`(56)
`
`5,195,089 A
`5,394,555 A
`
`3/1993 Sindhu et al.
`2/1995 Hunter et 31.
`
`ABSTRACT
`(57)
`A computer system having a plurality of processing nodes
`interconnected by a first point-to-point architecture is
`described. Each processing node has a cache memory asso-
`ciated therewith. Aprobe filtering unit is operable to receive
`probes corresponding to memory lines from the processing
`nodes and to transmit the probes only to selected ones of the
`processing nodes with reference to probe filtering informa-
`tion. The probe filtering information is representative of
`states associated with selected ones of the cache memories.
`
`25 Claims, 25 Drawing Sheets
`
`Local Probe
`Fllleling
`
`memory cummllar
`
`geneyales pmhe 12002)
`
`
`
`v
`consult mullng
`table (2004)
`
`
`
`send Dmbe auly
`w PFU 12006)
`
` l
` v
`
`
`respond in
`
`requesmrlzmzl "“"°
`
`
`
`Requesled
`lms sloreo 1n cache'l
`201
`.5
`PFU sends pronelsl only lo nudes
`malmay have llneln some (2014) 1
`
`
`
`Pmbed nodes look
`
`up Ilne (2016)
`
`
`
`Pmbed nodes mum
`
`response: to PFU(ZD1B)
`
`
`FFU upoales
`anomaly (20207
`
`
`
`
`FFU accepts Probe and
`Mars lo acne mnemnm
`
`fllrednly (20051
`
`
` All
`responses (2024)
` FFU mllate: Ihe
`
`moans
`pmhad
`responded?
`(2022)
`inv
`
`PFU mivnnds In
`requesting nooe [2025)
`
`
`
`1
`
`APPLE 1001
`
`1
`
`APPLE 1001
`
`

`

`US 7,296,121 B2
`
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`2004/0088494 A1
`2004/0117559 A1
`2004/0255002 A1
`
`5/2004 Giasco
`6/2004 Glasco
`l2/2004 Kota et al.
`
`OTHER PUBLICATIONS
`
`
`
`\Io.
`
`\Io.
`
`\Io.
`
`
`
`
`
`* cited by examiner
`
`
`
`Bordaz et al.
`Keller et al.
`Palanca et al.
`Bauman et al.
`Arimilli et al.
`Palanca et al.
`Durham et al.
`Wang et al.
`Fu et al.
`Arimilli et al.
`Arimilli et al.
`Baumgartner et al.
`Arimilli et al.
`Keller et al.
`Arimilli et al.
`Miller et al.
`Armstrong et al.
`Keller et al.
`Zalewski et al.
`Khare et al.
`Morioka et al.
`Fu et al.
`Kessler et al.
`Sharma et al.
`Gharachorloo et al.
`Nguyen et al.
`Comisky et al.
`Janakiraman et al.
`Van Huben et al.
`Van Huben et al.
`Deneroff et al.
`Webb et al.
`Arimilli et al.
`Arimilli et al.
`Dhong et al.
`Mudgett et al.
`Bauman
`Glasco
`Hass et al.
`Glasco
`Weber
`Razdan et al.
`Van Doren
`Fung
`Gharachorloo et al.
`Zalewski et al.
`Van Huben et al.
`Van Huben
`Gharachorloo et al.
`Khare et al.
`Arimilli et al.
`Glasco
`Glasco
`Glasco
`Edirisooriya et al.
`Kessler et al.
`Giasco
`Giasco
`Giasco et al.
`Keller et al.
`Webb et al.
`Glasco
`Glasco
`
`............ 711/146
`
`.............. 7ll/l30
`
`................ 709/213
`
`11/2000
`12/2000
`l/2001
`2/2001
`2/2001
`3/2001
`3/2001
`9/2001
`9/2001
`l2/2001
`l2/2001
`l/2002
`l/2002
`5/2002
`6/2002
`10/2002
`10/2002
`l2/2002
`4/2003
`9/2003
`10/2003
`10/2003
`10/2003
`10/2003
`10/2003
`l2/2003
`l2/2003
`3/2004
`5/2004
`5/2004
`6/2004
`6/2004
`6/2004
`7/2004
`7/2004
`8/2004
`9/2004
`3/2005
`5/2005
`2/2006
`8/2001
`* 10/2001
`ll/2001
`l/2002
`4/2002
`5/2002
`6/2002
`6/2002
`7/2002
`7/2002
`l/2003
`9/2003
`9/2003
`9/2003
`10/2003
`10/2003
`ll/2003
`ll/2003
`l2/2003
`2/2004
`4/2004
`5/2004
`5/2004
`
`*
`
`*
`
`AAB
`
`1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B1
`B2
`B2
`B1
`B1
`B1
`B1
`B2
`B2
`B1
`B1
`B2
`B2
`B1
`B1
`B2
`B2
`B2
`B1
`B1
`B2
`B2
`B2
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`Al
`A1
`
`6,148,378
`6,167,492
`6,173,393
`6,189,078
`6,192,451
`6,205,520
`6,209,065
`6,292,705
`6,292,906
`6,330,643
`6,334,172
`6,338,122
`6,343,347
`6,385,705
`6,405,289
`6,463,529
`6,467,007
`6,490,661
`6,542,926
`6,615,319
`6,631,447
`6,633,945
`6,633,960
`6,636,906
`6,640,287
`6,658,526
`6,665,767
`6,704,842
`6,738,870
`6,738,871
`6,751,698
`6,751,721
`6,754,782
`6,760,809
`6,760,819
`6,775,749
`6,799,252
`6,865,595
`6,892,282
`7,003,633
`2001/0013089
`2001/0029574
`2001/0037435
`2002/0007463
`2002/0046327
`2002/0052914
`2002/0083149
`2002/0083243
`2002/0087807
`2002/00878ll
`2003/0009623
`2003/0182508
`2003/0182509
`2003/0182514
`2003/0195939
`2003/0196047
`2003/0210655
`2003/02l274l
`2003/0233388
`2004/0024836
`2004/0073755
`2004/0088492
`2004/0088493
`
`1.03,
`Specification Revision
`HyperTransportTM l/O Link
`HyperTransportTM Consortium, Oct. 10, 2001, Copyright © 2001
`HyperTransport Technology Consortium.
`PCT Search Report PCT/USO3/34756, Int’l filing date Oct. 30,
`2003, Search report Mailed Dec. 16, 2004.
`Bilir et al., “Multicast Snooping: A New Coherence Method Using
`a Multicast Address Networ ”, Computer Architecture, 1999. Pro-
`ceedings of the 26th International Symposium on, May 2-4, 1999.
`Martin et al., “Bandwidth Adaptive Snooping”, Proceedings of the
`Eighth International Symposium on High-Performance Computer
`Architecture on Feb. 2-6, 2002; pp. 251-262.
`Sorin et al., “Specifying and Verifying a Broadcast and a Multicast
`Snooping Cache Coherence Protocol”, IEEE Transactions on Par-
`allel and Distributed Systems, vol. 13, No. 6, Jun. 2002.
`US. Appl. No.: 10/288,347 (Now US. Pat. No. 7,003,633), Notice
`of Allowance, dated Sep. 12, 2005.
`US. Appl. No. 10/288,347 (Now US. Pat. No. 7,003,633), First
`Office Action, dated Nov. 18, 2004.
`Kim et al., “Power-aware Partitioned Cache Architectures”, 2001
`ACM p. 6467.
`Powell et al., “Reducing Set-Associative Cache Energy via Way-
`Prediction and Selective Direct-Mapping” 2001 IEEE, p. 54-65.
`Culler, D. E., J. P. Singh, A. Gupta, “Parallel Computer Architec-
`ture”,
`1999 Morgan Kaufmann, San Francisco, CA USA
`XP002277658.
`Tanenbaum, Andrew, “Computer Networks”, Computer Networks,
`London: Prentice Hall
`International, GB, 1996, pp. 345-403,
`XP002155220.
`L.S. Appl. No.: 10/288,347 (Now US. Pat. No. 7,003,633). Final
`Office Action, dated May 12, 2005.
`LS Office Action mailed Sep. 22, 2004, from L.S. Appl. \Io.
`10/106,426 [\TWISP002].
`L.S. Office Action mailed Mar. 7, 2005, form L.S. Appl. \Io.
`10/106,426 [\TWISP002].
`L.S. Office Action mailed Jul. 21, 2005, from L.S. Appl. \Io.
`10/106,426 [\TWISP002].
`L.S. Office Action mailed Sep. 23, 2004, from L.S. Appl. \Io.
`10/106,430 [\TWISP003].
`L.S. Ofiice Action mailed Mar. 10, 2005, form L.S. Appl. \Io.
`10/106,430 [\TWISP003].
`L.S. Office Action mailed Jul. 21, 2005, from L.S. Appl. \Io.
`10/106,430 [\TWISP003].
`L.S. Office Action mailed Sep. 22, 2004, from L.S. Appl. \Io.
`10/106,299 [\TWISP004].
`L.S. Ofiice Action mailed Mar. 10, 2005, from L.S. Appl. \Io.
`10/106,299 [\TWISP004].
`L.S. Office Action mailed Jul. 21, 2005, from L.S. Appl. \Io.
`10/106,299 [\TWISP004].
`L.S. Office Action mailed Jul. 20, 2005, from L.S. Appl. \Io.
`10/608,846 [\TWISP030].
`L.S. Office Action mailed Sep. 9, 2005, from L.S. Appl.
`10/462,015 [\TWISP040].
`L.S. Office Action mailed Sep. 9, 2005, from L.S. Appl.
`10/426,084 [\TWISP033].
`L.S. Office Action mailed Nov. 2, 2005, from L.S. Appl. \Io.
`10/106/430 [\TWISP003].
`L.S. Office Action mailed Oct. 5, 2005, from L.S. Appl.
`10/635,703 [\TWISP036].
`
`2
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 1 of 25
`
`US 7,296,121 B2
`
`Figure 1A
`
`Processing
`Cluster 101
`
`Processing
`Cluster 103
`
`Cluster 107
`
`Processing
`Cluster 105
`
`Processing
`
`Figure 1B
`
`Processing
`Cluster 121
`
`Processing
`Cluster 123
`
`Cluster 127
`
`Processing
`Cluster 125
`
`Processing
`
`3
`
`

`

`U.S. Patent
`
`ehS
`
`M2
`
`S
`
`2Bn
`
`70023a1
`
`
`
`
`V.to.0Omm5:05:00Nflow538on8:20:00mmomEmmooem
`aE
`3umow8880iomomEmmoooi
`
`
`30m
`
`Uomm0:III:InENOh
`
`2.,I85257-
`
`1,3505wowI%mOHm-
`
`
`
`
`
`EN:BEmO:ocom
`
`
`
`
`
`moEwE
`
`
`
`$0320895%
`
`
`
`Doom“wow
`
`
`
`2
`
`
`
`
`
`835m
`
`EN5&8on
`
`
`
`
`
`4
`
`
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 3 of 25
`
`US 7,296,121 B2
`
`
`
`
`wfiwcom
`
`oomHobsm
`
`
`
`
`
`
`
`momofiwnm3080.5
`
`
`
`HflmUOMMHOHGHHQDHUSOOEOZ
`
`
`
`Smooflgofi:
`
`“55:00 moSwE
`
`
`
`5
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 4 of 25
`
`US 7,296,121 B2
`
`v8:me
`
`
`
`6
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 5 of 25
`
`US 7,296,121 B2
`
`gDEE
`
`
`
`
`
` H3$602“$04.52
`
`7
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 6 of 25
`
`US 7,296,121 B2
`
`mmoSwE
`
`
`
`mufiozHmofinfioz
`
`Hmm
`
`8
`
`
`
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 7 of 25
`
`US 7,296,121 B2
`
`Figure5C
`
`In
`D
`'U
`
`0 Z
`
`m
`8111cm
`*71I:
`
`0 ZH
`
`9
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 8 of 25
`
`US 7,296,121 B2
`
`02
`
`N-momm-Nomi;
`
`
`
`
`
`
`
`N-.ow
`
`mmmama
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`:m
`
`
`
`
`
`
`
`mouoZBoo-H-282
`
`mum
`
`10
`
`10
`
`
`

`

`U.S. Patent
`
`N
`
`1V.
`
`70
`
`f09teehS
`
`US 7,296,121 B2
`
`
`
`Duo0
`
`méoom-moo
`
`
`
`
`
`0<
`E0:51
`3a“833%
`mcoo.5320
`
`
`
`
`
`
`
`
`
`
`
`
`
`0N05330
`
`11
`
`
`
`8050M
`
`
`
`oweSums—O
`
`coSwE
`
`11
`
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 10 0f 25
`
`US 7,296,121 B2
`
`Figure 7
`
`
`
`Coherence Directory 701
`
`Memory Line 711
`State 713
`gig;r1373}?
`OCCUPaI/Bfg’ VCCW
`
`1 Address 721
`Invalid
`N/A
`N/A
`
`l Address 731
`Invalid
`N/A
`N/A
`Address 741
`Shared
`N/A
`Clusters 1,3
`Address 751
`Shared
`N/A
`Clusters 1, 2, 3, 4
`[ Address 761
`Owned
`Cluster 4
`Cluster 2, 3, 4
`M Address 771
`“Owned
`Cluster 2
`Cluster 2, 4
`‘
`Address 781
`Modified
`Cluster 2
`N/A
`‘
`Address 791
`Modified
`Cluster 3
`l
`
`
`
`'
`
`-
`
`I
`
`
`
`l_____l
`
`12
`
`
`
`
`
`“I
`
`i
`
`
`
`
`
`N/A
`
`
`12
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 11 0125
`
`US 7,296,121 B2
`
`Figure 8
`
`
`
`Probe Filter
`Information 821
`
`Read Block Modify
`Read Block (Read) 823 (ReaWrite) 825
`
`
`
`Invalid 831
`
`Can use completion bit.
`Probe home cluster. (801)
`
`Can use completion bit.
`Pro“ figfigfmer-
`
`
`
`Shared 833
`
`Can use completion bit.
`Probe home cluster. (803)
`
`
`N/A (811)
`
`i Can use completion bit.
`Probe remote cluster with
`N/A (813)
`line cached in owned
`Owned 835
`state. (805)
`
`
`
`
`
`Can use completion blt'
`Probe remote cluster with
`line cached in modified
`state. (807)
`l.
`
`
`~
`
`Modified 837
`
`Can use completion bit.
`Probe regglpge): cluster.
`
`13
`
`13
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 12 of 25
`
`US 7,296,121 B2
`
`U:
`
`N-mmm
`
`3m2m553800
`
`
`
`i2%W402o /Rm4
`
`Ra28.
`
`14
`
`
`
`
`
`0mmSEED
`
`O
`
`N-Mmm2:0:
`
`a9.5mm
`
`28I280PG
`
`“weaved
`
`coo$330
`
`805cm
`
`ova$320
`
`14
`
`
`
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 13 of 25
`
`US 7,296,121 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`02
`
`N-mmo_
`
`
`
`
`6m:m28303800
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Duo
`
`v-52
`
`: v
`
`-83U
`
`
`
`S2&5
`
`amuzvum
`
`02:$320
`
`bum-:0mEoI
`
`ONE
`
`295m
`
`
`
`o3:Ems—U
`
`15
`
`15
`
`
`
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 14 of 25
`
`US 7,296,121 B2
`
`Figure 11
`
`
`
`
`equest Handling 1
`Home Coherence
`
`Controller
`
`
`R
`.
`'
`eceive Request
`Associated With A
`
`Memory Line
`
`i
`
`
`
`1135 X Forward Probe With
`
`Forward Request To
`—————> Completion Indicator To
`
`
`Memory Controller Remote Cluster
`
`1101
`
`1105
`
`
`V
`1 1" 9
`
`
`
`l109 F\..
`
`Receive Probe From 1
`
`Memory Controller
`
`V
`
`1113
`
`Access Coherence
`
`
`
`Directory And Probe
`Filter Information
`
`
`
`
`I se Filtering An
`pletion Indicat
`.
`
`Probe Responses And Do
`Not Forward Response To
`Request Cluster
`
`
`3 x RCCCIVC Home Cluster
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Probe Remote
`Cluster?
`
`1 131
`
`1 141
`
`
`
`
`
`
`
`
`Receive Source Done
`Receive Probe Responses
`
`
`From Request Cluster
`And Forward Response
`And Forward Source
`
`To Request Cluster With
`Done To Memory
`
`Completion Indicator
`Controller
`
`
`
`
`Proceed Using Memory
`Controller Serialization
`With No Filtering And
`
`Completion Bit
`
`16
`
`16
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 15 of 25
`
`US 7,296,121 B2
`
`Figure 12
`
`
`
`[
`1
`
`
`
`Memory Controller Filter Information 1221
`ReadBlock [Read] 1223
`lffiaegdfiggfigfgfsy
`
`
`
`Invalid 1231
`
`Send rec(1111§(s)tlt)o target.
`
`Send recglllggtggo target.
`
`
`
`Shared 1233
`
`Send recengzgtf target.
`
`Send re((1ll§ltlt)o target.
`
`
`
`l
`
`”213330W
`
`Owned
`owiiggaétuisgtitgog
`
`
`MOdified 1237
`
`Mogggliarcigtglé 5307) Moclli(iir:dagllli)sr’tcelrj.e(1l%15)
`
`
`
`17
`
`17
`
`

`

`U.S. Patent
`
`V.0N
`
`700
`
`61a
`
`5
`
`US 7,296,121 B2
`
`
`
`
`
`
`
`
`
`
`98m—v-—ogv-32T89T52
`0Eu7u0Eu
`
`29.3me
`
`
`
`2co?Bums—UB“moscom
`
`2ON?
`
`rmBEEU2:2.—
`
`mEN:5mgm8:235752
`OU
`
`18
`
`
`
`
`
`
`
`
`32vA
`
`N43:312ATE»:U1H0
`
`
`
`A
`
`$28083*
`
`912Sums—O
`
`18
`
`

`

`
`
`equest Handling
`Home Coherence
`
`
`Receive Request
`1403
`Identify Characteristics
`
`Associated With A
`Associated With Re uest
`Particular Memory Line
`q
`
`
`
`
`‘
`
`qu
`
`
`1411
`
`'5
`
`
`
`ass Memo v
`yp
`Controller?
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 17 of 25
`
`US 7,296,121 B2
`
`Figure 14
`
`Controller
`
`1401
`
`1405
`
`1405
`
`Forward Re uest To
`q
`Serialization Point
`1
`.
`.
`Proceed with Or Without
`Probe Filtering/31nd
`Completlon Indlcator
`
`(Figure 6 Or Figure 11)
`
`
`
`Yes
`'
`Block Requests And
`Probe Remote Cluster
`(Figure 12)
`
`
`
`l
`
`.
`
`1415
`
`1417
`
`Unblock Memory Line
`After Receiving Source
`Done From Request
`Cluster
`
`
`
`19
`
`19
`
`

`

`U.S. Patent
`
`Nov 13, 2007
`
`Sheet 18 of 25
`
`US 7,296,121 B2
`
`
`«.82.302$82$82$82I#82T82
`02mMHQ02m00002m0%HQ
`QmQBwomfi
`
`A3E5vows
`
`
`$82$2:02g80
`
`onH
`
`5520325%
`
`SEED060m
`
`20
`
`20
`
`

`

`U.S. Patent
`
`7m
`
`M91te
`
`US 7,296,121 B2
`
`
`
`
`
`3,AA$83”-83.352NASH.T82:03w8%HQ080%035anNQm
`
`
`
`<03#85A33:?
`
`SESBaaI
`
`
`
`EAmVHEmEU208mm
`
`2mow“
`
`mM“Bum:050S7826m
`
`21
`
`000
`
`m
`
`$2a
`
`21
`
`
`

`

`U.S. Patent
`
`%2
`
`m
`
`5
`
`US 7,296,121 B2
`
`mcomo
`
`BaVON._‘.No:
`2.M:2.SwtflgEwacooéoz
`
`
`momtflc.vP6695M3:928memcmEcozommcmt.
`
`Lnlw.mnot35cm686$
`7.5:momtmEE96500
`
`
`
`
`
`
`cozoSmwugmgmog
`
`22&5
`
`22
`
`22
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 21 of 25
`
`US 7,296,121 B2
`
`
`
`
`
`
`
`
`
`\.om:
`
`
`
`£2:5&8onIII,mummma
`
`S2:me
`
`38m
`
`
`
`«mom:8&8on
`
`«com:
`
`
`
`
`
`392.833onone”:Emmuoem
`
`
`
`-
`
`-I“2:05”,—
`2%“mm,I
`
`OS:5thO:
`
`cam:O:Ivlllllon:O:
`
`
`
`owe“:
`
`
`
`23
`
`23
`
`
`
`
`

`

`U.S. Patent
`
`NOV.13,2007
`
`Sheet22 of25
`
`US 7,296,121 B2
`
`nvoa_
`
`ovomfi
`
`Noafi
`
`Eu
`
`#0
`
`559.0
`
`v50
`
`db
`
`$596
`
`mczsom
`
`03$.
`
`H8885
`
`83%??—
`
`Eoiwum
`
`0.3:
`
`wESom
`
`manna:
`
`03$.
`
`amama
`
`avomfi
`
`24
`
`24
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 23 of 25
`
`US 7,296,121 B2
`
`Local Probe
`
`Filtering
`
`
`
`
`
`
`memory controller
`generates probe (2002)
`
`
`consult routing
`
`table (2004)
`
`
`
`
`
`
`send probe only
`
`to PFU (2006)
`
`
`
`
`
`
`
`
`PFU accepts probe and
`refers to cache coherence
`directory (2008)
`
`respond to
`requester (2012)
`
`
`
`Requested
`
`line stored in cache?
`
`
`(2010)
`
`
`
`yes
`__Y__
`
`PFU sends probe(s) only to nodes
`that may have line in cache (2014)
`
`
`
`
`
`Probed nodes look
`up line (2016)
`
`Probed nodes return
`responses to PFU (2018)
`
`Figure 20
`
`no
`
`PFU updates
`directory (2020)
`
`
`
` All
`
`probed nodes
`
`responded?
`(2022)
`
`
`
`yes
`
`PFU collates the
`responses (2024)
`
`
`PFU responds to
`
`
`requesting node (2026)
`
`
`
`25
`
`25
`
`

`

`U.S. Patent
`
`NOV 13, 2007
`
`Sheet 24 of 25
`
`US 7,296,121 B2
`
`E9:5
`
`mZ
`
`
`
`Ho:2mwctflE“5tho“2;”.move:wonoa:02#2
`
`.ESomeS9:3023;
`
`26
`
`Q(
`
`/3
`
`.F0.2Eoc:c:mciotc3meowEcoEmunoi
`
`
`
`.uwkonoiamm“so:32vanFOEBu:flow
`
`Z :
`
`o-
`a)
`m
`
`\ a
`
`
`
`2AI|DAGAIDEEDEAII2All:mo
`
`.
`
`26
`
`

`

`U.S. Patent
`
`mm
`
`m
`
`E
`
`US 7,296,121 B2
`
`3,”ESm.2[PETPETali:;I|P5
`
`Om
`
`mm
`
`com
`
`
`
`
`
`Ho:02mctozcone;826mono:wonopm:0202
`.5803:th05:083:HZ
`
`
`
`mmzNz
`
`27
`
`mm2:9“.
`
`27
`
`

`

`1
`REDUCING PROBE TRAFFIC IN
`MULTIPROCESSOR SYSTEMS
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`The present application is a continuation-in-part of and
`claims priority under 35 U.S.C. 120 to US. patent applica-
`tion Ser. No. 10/288,347 now US. Pat. No. 7,003,633 for
`METHODS AND APPARATUS FOR MANAGING
`
`PROBE REQUESTS filed on NOV. 4, 2002 the entire
`disclosure of which is incorporated herein by reference for
`all purposes. The subject matter described in the present
`application is also related to US. patent application Ser. No.
`10/288,399 now US. Pat. No. 7,103,726 for METHODS
`AND APPARATUS
`FOR MANAGING PROBE
`
`REQUESTS filed on Nov. 4, 2002, the entire disclosure of
`which is incorporated herein by reference for all purposes.
`
`BACKGROUND OF THE INVENTION
`
`The present invention generally relates to accessing data
`in a multiple processor system. More specifically,
`the
`present invention provides techniques for reducing memory
`transaction traffic in a multiple processor system.
`Data access in multiple processor systems can raise issues
`relating to cache coherency. Conventional multiple proces-
`sor computer systems have processors coupled to a system
`memory through a shared bus. In order to optimize access to
`data in the system memory, individual processors are typi-
`cally designed to work with cache memory. In one example,
`each processor has a cache that is loaded with data that the
`processor frequently accesses. The cache is read or written
`by a processor. However, cache coherency problems arise
`because multiple copies of the same data can co-exist in
`systems having multiple processors and multiple cache
`memories. For example, a frequently accessed data block
`corresponding to a memory line may be loaded into the
`cache of two different processors. In one example, if both
`processors attempt to write new values into the data block at
`the same time, different data values may result. One value
`may be written into the first cache while a different value is
`written into the second cache. A system might then be unable
`to determine what value to write through to system memory.
`A variety of cache coherency mechanisms have been
`developed to address such problems in multiprocessor sys-
`tems. One solution is to simply force all processor writes to
`go through to memory immediately and bypass the associ-
`ated cache. The write requests can then be serialized before
`overwriting a system memory line. However, bypassing the
`cache significantly decreases efficiency gained by using a
`cache. Other cache coherency mechanisms have been devel-
`oped for specific architectures. In a shared bus architecture,
`each processor checks or snoops on the bus to determine
`whether it can read or write a shared cache block. In one
`
`example, a processor only writes an object when it owns or
`has exclusive access to the object. Each corresponding cache
`object is then updated to allow processors access to the most
`recent version of the object.
`Bus arbitration is used when both processors attempt to
`write the same shared data block in the same clock cycle.
`Bus arbitration logic decides which processor gets the bus
`first. Although, cache coherency mechanisms such as bus
`arbitration are effective, using a shared bus limits the num-
`ber of processors that can be implemented in a single system
`with a single memory space.
`
`US 7,296,121 B2
`
`2
`
`Other multiprocessor schemes involve individual proces-
`sor, cache, and memory systems connected to other proces-
`sors, cache, and memory systems using a network backbone
`such as Ethernet or Token Ring. Multiprocessor schemes
`involving separate computer systems each with its own
`address space can avoid many cache coherency problems
`because each processor has its own associated memory and
`cache. When one processor wishes to access data on a
`remote computing system, communication is explicit. Mes-
`sages are sent
`to move data to another processor and
`messages are received to accept data from another processor
`using standard network protocols such as TCP/IP. Multipro-
`cessor systems using explicit communication including
`transactions such as sends and receives are referred to as
`
`systems using multiple private memories. By contrast, mul-
`tiprocessor system using implicit communication including
`transactions such as loads and stores are referred to herein as
`
`using a single address space.
`Multiprocessor schemes using separate computer systems
`allow more processors to be interconnected while minimiz-
`ing cache coherency problems. However,
`it would take
`substantially more time to access data held by a remote
`processor using a network infrastructure than it would take
`to access data held by a processor coupled to a system bus.
`Furthermore, valuable network bandwidth would be con-
`sumed moving data to the proper processors. This can
`negatively impact both processor and network performance.
`Performance limitations have led to the development of a
`point-to-point architecture for connecting processors in a
`system with a single memory space. In one example, indi-
`vidual processors can be directly connected to each other
`through a plurality of point-to-point links to form a cluster
`of processors. Separate clusters of processors can also be
`connected. The point-to-point links significantly increase the
`bandwidth for coprocessing and multiprocessing functions.
`However, using a point-to-point architecture to connect
`multiple processors in a multiple cluster system sharing a
`single memory space presents its own problems.
`Consequently,
`it is desirable to provide techniques for
`improving data access and cache coherency in systems
`having multiple processors connected using point-to-point
`links.
`
`SUMMARY OF THE INVENTION
`
`According to the present invention, various techniques are
`provided for reducing traffic relating to memory transactions
`in multi-processor systems. According to various specific
`embodiments, a computer system having a plurality of
`processing nodes interconnected by a first point-to-point
`architecture is provided. Each processing node has a cache
`memory associated therewith. A probe filtering unit is oper-
`able to receive probes corresponding to memory lines from
`the processing nodes and to transmit the probes only to
`selected ones of the processing nodes with reference to
`probe filtering information. The probe filtering information
`is representative of states associated with selected ones of
`the cache memories.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`According to other embodiments, methods and apparatus
`are provided for reducing probe traffic in a computer system
`comprising a plurality of processing nodes interconnected
`by a first point-to-point architecture. A probe corresponding
`to a memory line is transmitted from a first one of the
`processing nodes only to a probe filtering unit. The probe is
`evaluated with the probe filtering unit to determine whether
`a valid copy of the memory line is in any of the cache
`memories. The evaluation is done with reference to probe
`28
`
`65
`
`28
`
`

`

`US 7,296,121 B2
`
`3
`filtering information associated with the probe filtering unit
`and representative of states associated with selected ones of
`the cache memories. The probe is transmitted from the probe
`filtering unit only to selected ones of the processing nodes
`identified by the evaluating. Probe responses from the
`selected processing nodes are accumulated by the probe
`filtering unit. Only the probe filtering unit responds to the
`first processing node.
`A further understanding of the nature and advantages of
`the present invention may be realized by reference to the
`remaining portions of the specification and the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention may best be understood by reference to the
`following description taken in conjunction with the accom-
`panying drawings, which are illustrative of specific embodi-
`ments of the present invention.
`FIGS. 1A and 1B are diagrammatic representation depict-
`ing a system having multiple clusters.
`FIG. 2 is a diagrammatic representation of a cluster
`having a plurality of processors.
`FIG. 3 is a diagrammatic representation of a cache coher-
`ence controller.
`
`FIG. 4 is a diagrammatic representation showing a trans-
`action flow for a data access request from a processor in a
`single cluster.
`FIG. 5A-5D are diagrammatic representations showing
`cache coherence controller functionality.
`FIG. 6 is a diagrammatic representation depicting a trans-
`action flow for a request with multiple probe responses.
`FIG. 7 is a diagrammatic representation showing a cache
`coherence directory.
`FIG. 8 is a diagrammatic representation showing probe
`filter information that can be used to reduce the number of
`
`probes transmitted to various clusters.
`FIG. 9 is a diagrammatic representation showing a trans-
`action flow for probing of a home cluster without probing of
`other clusters.
`
`FIG. 10 is a diagrammatic representation showing a
`transaction flow for probing of a single remote cluster.
`FIG. 11 is a flow process diagram showing the handling
`of a request with probe filter information.
`FIG. 12 is a diagrammatic representation showing
`memory controller filter information.
`FIG. 13 is a diagrammatic representation showing a
`transaction flow for probing a single remote cluster without
`probing a home cluster.
`FIG. 14 is a flow process diagram showing the handling
`of a request at a home cluster cache coherence controller
`using memory controller filter information.
`FIG. 15 is a diagrammatic representation showing a
`transaction flow for a cache coherence directory eviction of
`an entry corresponding to a dirty memory line.
`FIG. 16 is a diagrammatic representation showing a
`transaction flow for a cache coherence directory eviction of
`an entry corresponding to a clean memory line.
`FIG. 17 is a diagrammatic representation of a cache
`coherence controller according to a specific embodiment of
`the invention.
`
`FIG. 18 is a diagrammatic representation of a cluster
`having a plurality of processing nodes and a probe filtering
`unit.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`FIG. 19 is an exemplary representation of a processing
`node.
`
`65
`
`FIG. 20 is a flowchart illustrating local probe filtering
`according to a specific embodiment of the invention.
`
`4
`
`FIG. 21 is a diagrammatic representation of a transaction
`flow in which local probe filtering is facilitated according to
`a specific embodiment of the invention.
`FIG. 22 is a diagrammatic representation of another
`transaction flow in which local probe filtering is facilitated
`according to a specific embodiment of the invention.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
`Reference will now be made in detail to some specific
`embodiments of the invention including the best modes
`contemplated by the inventors for carrying out the invention.
`Examples of these specific embodiments are illustrated in
`the accompanying drawings. While the invention is
`described in conjunction with these specific embodiments, it
`will be understood that
`it
`is not
`intended to limit
`the
`
`invention to the described embodiments. On the contrary, it
`is intended to cover alternatives, modifications, and equiva-
`lents as may be included within the spirit and scope of the
`invention as defined by the appended claims. Multi-proces-
`sor architectures having point-to-point communication
`among their processors are suitable for implementing spe-
`cific embodiments of the present invention. In the following
`description, numerous specific details are set forth in order
`to provide a thorough understanding of the present inven-
`tion. The present invention may be practiced without some
`or all of these specific details. Well-known process opera-
`tions have not been described in detail
`in order not to
`
`unnecessarily obscure the present invention. Furthermore,
`the present application’s reference to a particular singular
`entity includes that possibility that the methods and appa-
`ratus of the present invention can be implemented using
`more than one entity, unless the context clearly dictates
`otherwise.
`
`According to various embodiments, techniques are pro-
`vided for increasing data access efficiency in a multiple
`processor system. In a point-to-point architecture, a cluster
`of processors includes multiple processors directly con-
`nected to each other through point-to-point links. By using
`point-to-point links instead of a conventional shared bus or
`external network, multiple processors are used efficiently in
`a system sharing the same memory space. Processing and
`network efficiency are also improved by avoiding many of
`the bandwidth and latency limitations of conventional bus
`and external network based multiprocessor architectures.
`According to various embodiments, however,
`linearly
`increasing the number of processors in a point-to-point
`architecture leads to an exponential increase in the number
`of links used to connect the multiple processors. In order to
`reduce the number of links used and to further modularize a
`
`multiprocessor system using a point-to-point architecture,
`multiple clusters may be used.
`According to some embodiments, multiple processor
`clusters are interconnected using a point-to-point architec-
`ture. Each cluster of processors includes a cache coherence
`controller used to handle communications between clusters.
`
`In one embodiment, the point-to-point architecture used to
`connect processors are used to connect clusters as well.
`By using a cache coherence controller, multiple cluster
`systems can be built using processors that may not neces-
`sarily support multiple clusters. Such a multiple cluster
`system can be built by using a cache coherence controller to
`represent non-local nodes in local transactions so that local
`nodes do not need to be aware of the existence of nodes
`outside of the local cluster. More detail on the cache
`
`coherence controller will be provided below.
`29
`
`29
`
`

`

`US 7,296,121 B2
`
`5
`In a single cluster system, cache coherency can be main-
`tained by sending all data access requests through a serial-
`ization point. Any mechanism for ordering data access
`requests (also referred to herein as requests and memory
`requests) is referred to herein as a serialization point. One
`example of a serialization point is a memory controller.
`Various processors in the single cluster system send data
`access requests to one or more memory controllers. In one
`example, each memory controller is configured to serialize
`or lock the data access requests so that only one data access
`request for a given memory line is allowed at any particular
`time. If another processor attempts to access the same
`memory line, the data access attempt is blocked until the
`memory line is unlocked. The memory controller allows
`cache coherency to be maintained in a multiple processor,
`single cluster system.
`A serialization point can also be used in a multiple
`processor, multiple cluster system where the processors in
`the various clusters share a single address space. By using a
`single address space, internal point-to-point links can be
`used to significantly improve intercluster communication
`over traditional external network based multiple cluster
`systems. Various processors in various clusters send data
`access requests to a memory controller associated with a
`particular cluster such as a home cluster. The memory
`controller can similarly serialize all data requests from the
`different clusters. However, a serialization point in a mul-
`tiple processor, multiple cluster system may not be as
`efficient as a serialization point in a multiple processor,
`single cluster system. That is, delay resulting from factors
`such as latency from transmitting between clusters can
`adversely affect the response times for various data access
`requests. It should be noted that delay also results from the
`use of probes in a multiple processor environment.
`Although delay in intercluster transactions in an architec-
`ture using a shared memory space is significantly less than
`the delay in conventional message passing environments
`using external networks such as Ethernet or Token Ring,
`even minimal delay is a significant factor. In some applica-
`tions, there may be millions of data access requests from a
`processor in a fraction of a second. Any delay can adversely
`impact processor performance.
`According to various embodiments, probe management is
`used to increase the efficiency of accessing data in a multiple
`processor, multiple cluster system. A mechanism for elicit-
`ing a response from a node to maintain cache coherency in
`a system is referred to herein as a probe. In one example, a
`mechanism for snooping a cache is referred to as a probe. A
`response to a probe can be directed to the source or target of
`the initiating request. Any mechanism for filtering or reduc-
`ing the number of probes and requests transmitted to various
`nodes is referred to herein as managing

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket