throbber
111111
`
`1111111111111111111111111111111111111111111111111111111111111
`US007296121B2
`
`c12) United States Patent
`Morton et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,296,121 B2
`Nov. 13, 2007
`
`(54) REDUCING PROBE TRAFFIC IN
`MULTIPROCESSOR SYSTEMS
`
`(75)
`
`Inventors: Eric Morton, Austin, TX (US); Rajesh
`Kota, Austin, TX (US); Adnan
`Khaleel, Austin, TX (US); David B.
`Glasco, Austin, TX (US)
`
`(73) Assignee: Newisys, Inc., Austin, TX (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 250 days.
`
`(21) Appl. No.: 10/966,161
`
`(22) Filed:
`
`Oct. 15, 2004
`(Under 37 CFR 1.47)
`
`(65)
`
`Prior Publication Data
`
`US 2007/0055826 Al Mar. 8, 2007
`
`Related U.S. Application Data
`
`(63) Continuation-in-part of application No. 10/288,347,
`filed on Nov. 4, 2002, now Pat. No. 7,003,633.
`
`(51)
`
`Int. Cl.
`G06F 12100
`(2006.01)
`(52) U.S. Cl. ....................................... 7111148; 711/141
`(58) Field of Classification Search ................ 711/141,
`711/148, 131, 144, 145, 146; 709/206, 213,
`709/216,217,218,219
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,195,089 A
`5,394,555 A
`
`3/1993 Sindhu et al.
`2/1995 Hunter et al.
`
`5,524,212 A
`5,692,123 A
`5,751,995 A
`5,829,032 A
`5,893,151 A
`6,018,791 A
`6,038,652 A
`6,052,769 A
`6,067,603 A
`6,073,210 A
`6,085,295 A
`6,108,737 A
`6,122,715 A
`
`6/1996 Somani et a!.
`1111997 Logghe
`511998 Sarangdhar
`10/1998 Komuro et a!.
`4/1999 Merchant
`1/2000 Arimilli et a!.
`3/2000 Phillips et a!.
`4/2000 Huff et a!.
`5/2000 Carpenter et a!.
`6/2000 Palanca et al.
`7/2000 Ekanadham eta!.
`8/2000 Sharma et al.
`9/2000 Palanca et al.
`
`(Continued)
`
`wo
`
`FOREIGN PATENT DOCUMENTS
`wo 0239242
`
`5/2002
`
`OTHER PUBLICATIONS
`
`Guo, et a!., "A Probe-Based Server Selection Protocol for Differ(cid:173)
`entiated Service Networks", © 2002 IEEE, p. 2353-2357.*
`
`(Continued)
`
`Primary Examiner-Brian R. Peugh
`(74) Attorney, Agent, or Firm-Beyer Weaver LLP
`
`(57)
`
`ABSTRACT
`
`A computer system having a plurality of processing nodes
`interconnected by a first point-to-point architecture is
`described. Each processing node has a cache memory asso(cid:173)
`ciated therewith. A probe filtering unit is operable to receive
`probes corresponding to memory lines from the processing
`nodes and to transmit the probes only to selected ones of the
`processing nodes with reference to probe filtering informa(cid:173)
`tion. The probe filtering information is representative of
`states associated with selected ones of the cache memories.
`
`25 Claims, 25 Drawing Sheets
`
`Petition for Inter Partes Review of 
`U.S. Pat. No. 7,296,121
`IPR2015‐00158
`EXHIBIT
`Sony‐
`
`1
`
`

`

`US 7,296,121 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`6,148,378 A
`6,167,492 A
`6,173,393 B1
`6,189,078 B1
`6,192,451 B1
`6,205,520 B1
`6,209,065 B1
`6,292,705 B1
`6,292,906 B1
`6,330,643 B1
`6,334,172 B1
`6,338,122 B1
`6,343,347 B1
`6,385,705 B1
`6,405,289 B1
`6,463,529 B1
`6,467,007 B1
`6,490,661 B1
`6,542,926 B2
`6,615,319 B2
`6,631,447 B1
`6,633,945 B1
`6,633,960 B1
`6,636,906 B1
`6,640,287 B2
`6,658,526 B2
`6,665,767 B1
`6,704,842 B1
`6,738,870 B2
`6,738,871 B2
`6,751,698 B1
`6,751,721 B1
`6,754,782 B2
`6,760,809 B2
`6,760,819 B2
`6,775,749 B1 *
`6,799,252 B1
`6,865,595 B2
`6,892,282 B2
`7,003,633 B2
`200110013089 A1
`200110029574 A1 *
`200110037435 A1
`2002/0007463 A1
`2002/0046327 A1
`2002/0052914 A1
`2002/0083149 A1
`2002/0083243 A1
`2002/0087807 A1
`2002/0087811 A1
`2003/0009623 A1
`2003/0182508 A1
`2003/0182509 A1
`2003/0182514 A1
`2003/0195939 A1
`2003/0196047 A1
`2003/0210655 A1
`2003/0212741 A1
`2003/0233388 A1
`2004/0024836 A1 *
`2004/00737 55 A1
`2004/0088492 A1
`2004/0088493 A1
`
`1112000 Bordaz et a!.
`12/2000 Keller et a!.
`112001 Palanca et al.
`212001 Bauman et a!.
`212001 Arimilli et a!.
`3/2001 Palanca et al.
`3/2001 Durham et a!.
`9/2001 Wang et a!.
`9/2001 Fu et a!.
`12/2001 Arimilli et a!.
`12/2001 Arimilli et a!.
`112002 Baumgartner et a!.
`112002 Arimilli et a!.
`5/2002 Keller et a!.
`6/2002 Arimilli et a!.
`10/2002 Miller et a!.
`10/2002 Armstrong et a!.
`12/2002 Keller et a!.
`4/2003 Zalewski et a!.
`9/2003 Khare et al.
`10/2003 Morioka et a!.
`10/2003 Fu et a!.
`10/2003 Kessler et a!.
`10/2003 Sharma et a!.
`10/2003 Gharachorloo et a!.
`12/2003 Nguyen et al.
`12/2003 Comisky et a!.
`3/2004 Janakiraman et a!.
`5/2004 Van Ruben et al.
`5/2004 Van Ruben et al.
`6/2004 Deneroff et a!.
`6/2004 Webb et a!.
`6/2004 Arimilli et a!.
`7/2004 Arimilli eta!.
`7/2004 Dhong eta!.
`8/2004 Mudgett eta!. ............ 7111146
`9/2004 Bauman
`3/2005 Glasco
`5/2005 Hass et a!.
`2/2006 Glasco
`8/2001 Weber
`10/2001 Razdan eta!. .............. 7111130
`11/2001 Van Doren
`112002 Fung
`4/2002 Gharachorloo et a!.
`5/2002 Zalewski et a!.
`6/2002 Van Ruben et al.
`6/2002 Van Ruben
`7/2002 Gharachorloo et a!.
`7/2002 Khare et al.
`112003 Arimilli et a!.
`9/2003 Glasco
`9/2003 Glasco
`9/2003 Glasco
`10/2003 Edirisooriya et al.
`10/2003 Kessler et a!.
`11/2003 Giasco
`11/2003 Giasco
`12/2003 Giasco et al.
`2/2004 Keller eta!. ................ 709/213
`4/2004 Webb et a!.
`5/2004 Glasco
`5/2004 Glasco
`
`2004/0088494 A1
`2004/0117559 A1
`2004/0255002 A1
`
`5/2004 Giasco
`6/2004 Glasco
`12/2004 Kota et a!.
`
`OTHER PUBLICATIONS
`
`l/0 Link Specification Revision
`HyperTransport™
`1.03,
`HyperTransport™ Consortium, Oct. 10, 2001, Copynght © 2001
`HyperTransport Technology Consortium.
`PCT Search Report PCT/US03/34756, Int'l filing date Oct. 30,
`2003, Search report Mailed Dec. 16, 2004.
`.
`Bilir eta!., "Multicast Snooping: A New Coherence Method Usmg
`a Multicast Address Network", Computer Architecture, 1999. Pro(cid:173)
`ceedings of the 26th International Symposium on, May 2-4, 1999.
`Martinet al., "Bandwidth Adaptive Snooping", Proceedmgs of the
`Eighth International Symposium on High-Performance Computer
`Architecture on Feb. 2-6, 2002; pp. 251-262.
`Sorin et al., "SpecifYing and VerifYing a Broadcast and a Multicast
`Snooping Cache Coherence Protocol", IEEE Transactions on Par(cid:173)
`allel and Distributed Systems, vol. 13, No. 6, Jun. 2002.
`U.S. Appl. No.: 10/288,347 (Now U.S. Pat. No. 7,003,633), Notice
`of Allowance, dated Sep. 12, 2005.
`U.S. Appl. No. 10/288,347 (Now U.S. Pat. No. 7,003,633), First
`Office Action, dated Nov. 18, 2004.
`Kim eta!., "Power-aware Partitioned Cache Architectures", 2001
`ACM p. 6467.
`.
`Powell et a!., "Reducing Set-Associative Cache Energy v1a Way(cid:173)
`Prediction and Selective Direct-Mapping" 2001 IEEE, p. 54-65.
`Culler, D. E., J. P. Singh, A. Gupta, "Parallel Computer Architec(cid:173)
`ture", 1999 Morgan Kaufmann, San Francisco, CA USA
`XP002277658.
`Tanenbaum, Andrew, "Computer Networks", Computer Networks,
`London: Prentice Hall International, GB, 1996, pp. 345-403,
`XP002155220.
`U.S. Appl. No.: 10/288,347 (Now U.S. Pat. No. 7,003,633). Final
`Office Action, dated May 12, 2005.
`U.S. Office Action mailed Sep. 22, 2004, from U.S. Appl. No.
`10/106,426 [NWISP002].
`U.S. Office Action mailed Mar. 7, 2005, form U.S. Appl. No.
`10/106,426 [NWISP002].
`U.S. Office Action mailed Jul. 21, 2005, from U.S. Appl. No.
`10/106,426 [NWISP002].
`U.S. Office Action mailed Sep. 23, 2004, from U.S. Appl. No.
`10/106,430 [NWISP003].
`U.S. Office Action mailed Mar. 10, 2005, form U.S. Appl. No.
`10/106,430 [NWISP003].
`U.S. Office Action mailed Jul. 21, 2005, from U.S. Appl. No.
`10/106,430 [NWISP003].
`U.S. Office Action mailed Sep. 22, 2004, from U.S. Appl. No.
`10/106,299 [NWISP004].
`U.S. Office Action mailed Mar. 10, 2005, from U.S. Appl. No.
`10/106,299 [NWISP004].
`U.S. Office Action mailed Jul. 21, 2005, from U.S. Appl. No.
`10/106,299 [NWISP004].
`U.S. Office Action mailed Jul. 20, 2005, from U.S. Appl. No.
`10/608,846 [NWISP030].
`U.S. Office Action mailed Sep. 9, 2005, from U.S. Appl. No.
`10/462,015 [NWISP040].
`U.S. Office Action mailed Sep. 9, 2005, from U.S. Appl. No.
`10/426,084 [NWISP033].
`U.S. Office Action mailed Nov. 2, 2005, from U.S. Appl. No.
`10/106/430 [NWISP003].
`U.S. Office Action mailed Oct. 5, 2005, from U.S. Appl. No.
`10/635,703 [NWISP036].
`* cited by examiner
`
`2
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 1 of 25
`
`US 7,296,121 B2
`
`Figure lA
`
`Processing
`Cluster 101
`
`'
`'
`'
`'
`'
`'
`'
`'
`'
`'
`,_,
`
`Processing
`Cluster 103
`
`Ill a~- ______ :·
`
`: _______ : f lllc
`
`Processing
`Cluster 105
`
`Processing
`Cluster 107
`
`Figure lB
`
`Processing
`Cluster 121
`
`Processing
`Cluster 123
`
`-
`
`Switch 131
`141b~: :::
`,:: ~~~141c
`
`Processing
`Cluster 125
`
`Processing
`Cluster 127
`
`3
`
`

`

`06a
`
`11
`
`Figure 2
`
`Remote Clusters
`
`(\/208a
`'
`'
`'
`'
`\)
`
`,r-200
`
`206
`
`~I
`
`-
`
`Processor 202a
`
`(\/232b
`: :
`\ . ./
`
`Cache
`Coherence
`Controller 230
`
`(\/232a
`:
`:
`'
`'
`\,'
`
`Processor 202b r------
`
`-..-
`
`:,- ~- ~-... ,
`·- -- --·
`1"--208e
`
`214a
`
`214b_/
`
`r------ Processor 202c
`
`1;214f
`
`Service
`Processor 212
`
`~
`
`i'-'
`
`v-214e
`
`< ~~ ~)
`232c-1/
`
`:·· -- ··~,
`·- -----
`1"--232d
`
`' ' '
`'
`'
`'
`'
`'
`1
`
`208d_;
`
`~
`
`W6c
`
`;-:: 110 216
`
`1/0 Switch 210
`
`i
`I BIOS I
`
`204
`
`;214c
`
`"--214d
`
`'
`'
`'
`'
`'
`'
`.,_,(_208c
`
`208b-,
`---- ---
`'~. -- ---
`
`Processor 202d 1 -
`
`206d -
`
`1/0220 ~
`
`~
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`....:J
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`N
`0 .....
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`... o-..
`""""' N
`
`""""' = N
`
`4
`
`

`

`Figure 3
`
`230
`
`Coherent Interface 307
`
`Protocol Engine 305
`
`Pending
`Buffer 309
`
`Noncoherent Interface 311
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`'"
`N
`0
`0
`-....l
`
`rFJ =(cid:173)
`.....
`
`('D
`('D
`
`(.H
`
`0 .....
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`5
`
`

`

`U.S. Patent
`U.S. Patent
`
`Nov. 13, 2007
`Nov. 13, 2007
`
`Sheet 4 of 25
`Sheet 4 of 25
`
`US 7,296,121 B2
`US 7,296,121 B2
`
`v8:me
`
`
`
`6
`
`

`

`U.S. Patent
`U.S. Patent
`
`Nov. 13, 2007
`NOV. 13, 2007
`
`Sheet 5 of 25
`Sheet 5 of 25
`
`US 7,296,121 B2
`US 7,296,121 B2
`
`gmama
`
`
`
`
`
` H3$602“$04.52
`
`7
`
`

`

`U.S. Patent
`U.S. Patent
`
`Nov. 13, 2007
`Nov. 13, 2007
`
`Sheet 6 of 25
`Sheet 6 of 25
`
`US 7,296,121 B2
`US 7,296,121 B2
`
`mmoSwE
`
`$qu384.52
`
`Hmm
`
`8
`
`
`
`
`

`

`U.S. Patent
`U.S. Patent
`
`Nov. 13,2007
`Nov. 13, 2007
`
`Sheet 7 of 25
`Sheet 7 of 25
`
`US 7,296,121 B2
`US 7,296,121 B2
`
`0m8&3
`
`
`
`mmmmoveZ€03.20Z
`
`9
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 8 of 25
`
`US 7,296,121 B2
`
`u ~ ~--------------------,
`
`M
`
`\0
`lrl
`
`01
`
`I u(cid:173)\0
`
`lrl
`
`-u ~ ~----------------7
`
`\0
`V)
`
`r.rJ
`
`~
`0 z
`
`10
`
`

`

`Figure 6
`
`601-2
`
`L
`607
`
`·{Ll/
`LbJ
`
`I
`
`L
`
`Gog~ 601-3
`
`.........___~~~
`-~
`.,.
`
`c
`621-5
`
`MC
`623-2
`
`L
`645
`
`c
`641-1 f
`
`CPU H c
`
`601-l
`
`603-l
`
`Request
`Cluster 600
`
`Home
`Cluster 620
`
`Remote
`Cluster 640
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~
`.....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`
`rFJ =-('D
`.....
`\0
`0 .....
`N
`Ul
`
`d
`rJl
`--..1
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`11
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 10 of 25
`
`US 7,296,121 B2
`
`Figure 7
`
`Coherence Directory 701
`
`Memory Line 711
`
`State 713
`
`Address 721
`
`Address 731
`
`Address 741
`
`Address 751
`
`Invalid
`
`Invalid
`
`Shared
`
`Shared
`
`Dirty Data
`Owner 715
`NIA
`
`NIA
`
`N/A
`
`NIA
`
`Occupancy Vector
`717
`
`N/A
`
`N/A
`
`Clusters 1,3
`
`Clusters 1, 2, 3, 4
`
`Address 761
`Owned
`! - - - - - - - - - - - - - - r--------------
`Address 771
`Owned
`
`Cluster 4
`
`Cluster 2, 3, 4
`
`Cluster 2
`
`Cluster 2, 4
`
`Address 781
`
`Modified
`
`Cluster 2
`
`Address 791
`
`Modified
`
`Cluster 3
`
`...
`
`...
`
`...
`
`N/A
`
`N/A
`
`...
`
`12
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 11 of 25
`
`US 7,296,121 B2
`
`Figure 8
`
`Probe Filter
`Information 821
`
`Read Block (Read) 823
`
`Read Block Modify
`(Read/Write) 825
`
`Invalid 831
`
`Can use completion bit.
`Probe home cluster. (801)
`
`Can use completion bit.
`Probe home cluster.
`(809)
`
`Shared 833
`
`Can use completion bit.
`Probe home cluster. (803)
`
`N/A (811)
`
`Owned 835
`
`Can use completion bit.
`Probe remote cluster with
`line cached in owned
`state. (805)
`
`N/A (813)
`
`Modified 83 7
`
`Can use completion bit.
`Probe remote cluster with
`line cached in modified
`state. (807)
`
`Can use completion bit.
`Probe remote cluster.
`(815)
`
`13
`
`

`

`Figure 9
`
`c
`903-4
`
`CPU
`901-4
`
`c
`903-5
`
`lcPUl
`c
`~
`903-1
`
`Request
`Cluster 900
`
`L
`925
`
`192 ~ l ~ '--,---J
`
`19~7]
`fC I
`
`----------
`
`-, c
`
`1nt-3
`
`Completion Bit Set
`
`c
`921-4
`
`Home
`Cluster 920
`
`~
`
`Remote
`Cluster 940
`
`c ~~
`
`923-2
`
`921-5
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`
`rFJ =-('D
`.....
`....
`N
`0 .....
`N
`Ul
`
`d
`rJl
`"'--...1
`N
`\C
`"'0'1
`""""' N
`
`""""' = N
`
`14
`
`

`

`Figure 10
`
`c
`
`f---+ CPU
`1001-4
`
`I 003-4 I
`
`CPU
`1001-1
`
`c
`~
`1003-1
`
`Request
`Cluster 1000
`
`L
`
`I 1025 ;~~~-~
`
`L
`MC
`C
`1021-1 f---+ 1023-1- 1027
`
`C
`1021-3
`
`Home Cluster
`1020
`
`\ \ 10;1-2'\
`
`~ ,~4s f
`
`c
`1041-1
`
`L
`1047
`
`c
`1041-2
`
`1!~9 f
`
`Cluster I 040
`
`I
`
`Completion Bit Set
`
`I
`
`~
`
`c
`~
`1003-5
`I
`
`t
`c
`f---+ MC
`1021-5
`1023-2
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`~(H
`N
`0
`0
`-....l
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`....
`0 .....
`N
`Ul
`
`(H
`
`d
`rJl
`"'--...1
`N
`\C
`"'0'1
`""""' N
`
`""""' = N
`
`15
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 14 of 25
`
`US 7,296,121 B2
`
`Figure 11
`
`1135
`
`1139
`
`Forward Probe With
`Completion Indicator To
`Remote Cluster
`
`Receive Home Cluster
`Probe Responses And Do
`Not Forward Response To
`Request Cluster
`
`equest Handling
`Home Coherence
`Controller
`
`Receive Request
`Associated With A
`Memory Line
`
`Forward Request To
`Memory Controller
`
`Receive Probe From
`Memory Controller
`
`Access Coherence
`Directory And Probe
`Filter Information
`
`Yes
`
`Yes
`
`Probe Remote
`Cluster?
`
`1101
`
`1105
`
`1109
`
`1113
`
`1121
`
`1131
`
`No
`
`1141
`
`No
`
`1149
`
`Receive Probe Responses
`And Forward Response
`To Request Cluster With
`Completion Indicator
`
`Receive Source Done
`From Request Cluster
`And Forward Source
`Done To Memory
`Controller
`
`1145
`
`Proceed Using Memory
`Controller Serialization
`With No Filtering And
`Completion Bit
`
`16
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 15 of 25
`
`US 7,296,121 B2
`
`Figure 12
`
`Memory Controller Filter Information 1221
`
`Read Block [Read] 1223
`
`Read Block Modify
`[Read/Write] 1225
`
`Invalid 1231
`
`Send request to target.
`(1201)
`
`Send request to target.
`(1209)
`
`Shared 1233
`
`Send request to target.
`(1203)
`
`Send request to target.
`(1211)
`
`Owned 1235
`
`Forward Probe To
`Owning Cluster. (1205)
`
`Send request to target.
`(1213)
`
`Modified 123 7
`
`Forward Probe To
`Forward Probe To
`Modified Cluster. (1207) Modified Cluster. ( 1215)
`
`17
`
`

`

`Figure 13
`
`CPU H c
`
`1301-1
`
`1303-1
`
`c H CPU H c
`
`1303-4
`
`11301-4
`
`I
`
`1303-5
`~
`
`Completion Bit Set
`
`j_,l
`
`Request
`Cluster 1300
`
`c
`1321-1
`
`Home Cluster
`1320
`
`Remote
`Cluster 1340
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`~(H
`N
`0
`0
`-....l
`
`('D
`
`rFJ =-('D
`.....
`....
`0\
`0 .....
`N
`Ul
`
`d
`rJl
`"'--...1
`N
`\C
`"'0'1
`""""' N
`
`""""' = N
`
`18
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 17 of 25
`
`US 7,296,121 B2
`
`Figure 14
`
`equest Handling
`Home Coherence
`Controller
`
`1401
`
`1405
`
`1405
`
`Receive Request
`Associated With A
`Particular Memory Line
`
`1403
`
`Identify Characteristics
`Associated With Request
`
`Forward Request To
`Serialization Point
`
`1411
`~-No--
`
`Proceed With Or Without
`Probe Filtering And
`Completion Indicator
`(Figure 6 Or Figure 11)
`
`1415
`
`1417
`
`Yes
`
`Block Requests And
`Probe Remote Cluster
`(Figure 12)
`
`Unblock Memory Line
`After Receiving Source
`Done From Request
`Cluster
`
`19
`
`

`

`sized write
`
`DIR
`1501-1
`
`1
`
`..,1 HMC
`
`HMC
`1502-2
`
`Home Cluster
`
`Remote Cluster
`
`CCC
`1506-2
`
`R
`
`1502-3
`
`HMC I TD~I DIR \ SD ,.I HMC
`
`1501-2
`
`1502-4
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`....
`0 .....
`N
`Ul
`
`QO
`
`1
`
`• CCC
`1507-2
`
`Figure 15
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`20
`
`

`

`validate
`block
`
`I HMC
`
`.1602-1
`
`DIR I
`
`1601-1
`
`L
`
`HMC
`1602-2
`
`CCC
`1'606-2
`
`DIR.
`1601-2
`
`SD. HMC
`1602-3
`
`Home Cluster
`
`Remote Cluster(s)
`
`R
`
`CCC
`lt-----·1607-2
`
`Figure 16
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`....
`\0
`0 .....
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`"""' N
`
`"""' = N
`
`21
`
`

`

`1700
`
`Ql g.:g
`8~ ...10:::
`
`LocaVRerrote
`Transaction
`Interface
`
`Eviction
`Manager
`1702
`
`Cache
`Coherence
`Directory
`1701
`
`y
`
`I
`
`y
`Coherent hterface 1707
`I
`
`A"otocol Engine 1705
`
`Pending
`Buffer
`1709
`
`Gl
`1V
`~~
`
`I ! ~
`
`_§
`
`l---N;~-~e~~lnterfaoo 111-1--~
`
`Figure 17
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`rFJ =(cid:173)
`('D a
`
`N
`0
`0 .....
`N
`Ul
`
`d r.r;_
`
`-....l
`'N
`\C
`0'1
`"'
`""""' N
`
`""""' = N
`
`22
`
`

`

`Figure 18
`
`_lSUOa ll
`
`'
`'
`
`'
`'
`
`' . ~, ... •
`
`f - - - Processor 1802a
`
`/'.,r1832b
`: :
`\~.'
`
`Probe
`Filtering
`Unit
`1830
`
`(·.,r1832a
`I
`:
`'
`'
`'-~'
`
`(.- -- -.. ~,
`.. -- __ ..
`'-1808e
`
`:"- -- --.. ,
`.. _ -- --·
`1832c- /
`
`:"'" -- -.... ,
`.... -- --·
`"-1832d
`
`1808a
`
`r18oo
`
`16UO
`
`~I
`
`Processor 1802b -
`
`1808b- \ . -· --
`....
`'•- ... __ )
`
`f - - - Processor 1802c
`
`,·,.\
`: :
`
`t"\
`: :
`
`1808d_) ..
`
`~
`
`~
`
`.,_\__1808c
`
`Processor 1802d -
`
`806c
`
`~ I/0 1816
`
`1101820 Q
`
`1806d -
`
`-F1
`
`I/0 Switch 1810
`
`T
`I BIOS I
`1804
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`--.l
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`N ....
`0 .....
`N
`Ul
`
`d
`rJl
`---1
`'N
`\C
`... o-.
`""""' N
`
`""""' = N
`
`23
`
`

`

`1802
`
`JTAG
`Handshake
`Registers
`
`[\_;908
`
`Processor
`
`CLK
`CTL
`CAD (n:O)
`
`....-
`
`I"
`
`""'["
`
`CLK
`CTL
`CAD (n:O)
`
`v
`1906b~
`
`Routing
`Table
`
`v--1902
`
`....-
`
`....-
`
`~
`""""~""
`
`1904a
`
`CLK
`CTL
`CAD (n:O)
`
`......
`
`CLK
`CTL
`.... CAD (n:O)
`,
`
`N906a
`
`Routing
`Table
`
`Routing
`Table
`
`1906~
`
`/
`
`CLK
`CTL
`CAD (n:O)
`
`~
`
`~
`
`~
`""'["
`
`CLK
`CTL
`CAD (n:O)
`
`Figure 19
`
`'
`
`1904b
`
`~
`
`~
`
`_..
`
`~
`
`)
`
`\
`
`)
`
`..
`__..
`...
`
`1904c
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`('D
`
`rFJ =(cid:173)
`.....
`N
`N
`0 .....
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`24
`
`

`

`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 23 of 25
`
`US 7,296,121 B2
`
`Local Probe
`Filtering
`
`PFU accepts probe and
`refers to cache coherence
`directory (2008)
`
`respond to
`requester (2012)
`
`no
`
`yes
`
`PFU sends probe(s) only to nodes
`that may have line in cache (2014)
`
`Figure 20
`
`yes
`
`end
`
`25
`
`

`

`ProbeSrc only goes to filtering unit from M CT,
`gets new TOT and resent out as a ProbeTgt.
`
`RR
`
`Req
`
`..
`
`CPU
`
`M
`
`/~ NO-_---~
`~
`p ~ N
`
`P PFU /
`
`PR
`
`PFU---+l:PU
`2 x P R
`_____________..
`
`SD
`
`..
`
`M
`
`Non probed nodes due to probe filtering are not
`aware ofthe transactoin.
`
`Nl
`
`N3
`
`Figure 21
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`('D
`
`rFJ =(cid:173)
`......
`N
`.j;o.
`
`0 ......
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`26
`
`

`

`RR
`
`Req
`
`CPU
`
`.,. M
`
`SD __ __...~ M
`~ PFU -·~CPU
`2xPR
`
`NO
`
`Nl
`
`Non probed nodes due to probe filtering are not
`aware ofthe transactoin.
`
`N2
`
`N3
`
`Figure 22
`
`e •
`
`00
`•
`~
`~
`~
`
`~ = ~
`
`z 0
`~ ....
`
`(.H
`
`~
`
`N
`0
`0
`-....l
`
`('D
`
`rFJ =-('D
`.....
`N
`Ul
`0 .....
`N
`Ul
`
`d
`rJl
`-....l
`'N
`\C
`'"0'1
`""""' N
`
`""""' = N
`
`27
`
`

`

`US 7,296,121 B2
`
`1
`REDUCING PROBE TRAFFIC IN
`MULTIPROCESSOR SYSTEMS
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`The present application is a continuation-in-part of and
`claims priority under 35 U.S.C. 120 to U.S. patent applica(cid:173)
`tion Ser. No. 10/288,347 now U.S. Pat. No. 7,003,633 for
`METHODS AND APPARATUS FOR MANAGING
`PROBE REQUESTS filed on Nov. 4, 2002 the entire
`disclosure of which is incorporated herein by reference for
`all purposes. The subject matter described in the present
`application is also related to U.S. patent application Ser. No.
`10/288,399 now U.S. Pat. No. 7,103,726 for METHODS 15
`AND APPARATUS
`FOR MANAGING
`PROBE
`REQUESTS filed on Nov. 4, 2002, the entire disclosure of
`which is incorporated herein by reference for all purposes.
`
`BACKGROUND OF THE INVENTION
`
`The present invention generally relates to accessing data
`in a multiple processor system. More specifically, the
`present invention provides techniques for reducing memory
`transaction traffic in a multiple processor system.
`Data access in multiple processor systems can raise issues
`relating to cache coherency. Conventional multiple proces(cid:173)
`sor computer systems have processors coupled to a system
`memory through a shared bus. In order to optimize access to
`data in the system memory, individual processors are typi(cid:173)
`cally designed to work with cache memory. In one example,
`each processor has a cache that is loaded with data that the
`processor frequently accesses. The cache is read or written
`by a processor. However, cache coherency problems arise
`because multiple copies of the same data can co-exist in
`systems having multiple processors and multiple cache
`memories. For example, a frequently accessed data block
`corresponding to a memory line may be loaded into the
`cache of two different processors. In one example, if both
`processors attempt to write new values into the data block at
`the same time, different data values may result. One value
`may be written into the first cache while a different value is
`written into the second cache. A system might then be unable
`to determine what value to write through to system memory.
`A variety of cache coherency mechanisms have been
`developed to address such problems in multiprocessor sys(cid:173)
`tems. One solution is to simply force all processor writes to
`go through to memory immediately and bypass the associ(cid:173)
`ated cache. The write requests can then be serialized before 50
`overwriting a system memory line. However, bypassing the
`cache significantly decreases efficiency gained by using a
`cache. Other cache coherency mechanisms have been devel(cid:173)
`oped for specific architectures. In a shared bus architecture,
`each processor checks or snoops on the bus to determine 55
`whether it can read or write a shared cache block. In one
`example, a processor only writes an object when it owns or
`has exclusive access to the object. Each corresponding cache
`object is then updated to allow processors access to the most
`recent version of the object.
`Bus arbitration is used when both processors attempt to
`write the same shared data block in the same clock cycle.
`Bus arbitration logic decides which processor gets the bus
`first. Although, cache coherency mechanisms such as bus
`arbitration are effective, using a shared bus limits the num(cid:173)
`ber of processors that can be implemented in a single system
`with a single memory space.
`
`45
`
`2
`Other multiprocessor schemes involve individual proces(cid:173)
`sor, cache, and memory systems connected to other proces(cid:173)
`sors, cache, and memory systems using a network backbone
`such as Ethernet or Token Ring. Multiprocessor schemes
`involving separate computer systems each with its own
`address space can avoid many cache coherency problems
`because each processor has its own associated memory and
`cache. When one processor wishes to access data on a
`remote computing system, communication is explicit. Mes-
`10 sages are sent to move data to another processor and
`messages are received to accept data from another processor
`using standard network protocols such as TCP/IP. Multipro(cid:173)
`cessor systems using explicit communication including
`transactions such as sends and receives are referred to as
`systems using multiple private memories. By contrast, mul(cid:173)
`tiprocessor system using implicit communication including
`transactions such as loads and stores are referred to herein as
`using a single address space.
`Multiprocessor schemes using separate computer systems
`20 allow more processors to be interconnected while minimiz(cid:173)
`ing cache coherency problems. However, it would take
`substantially more time to access data held by a remote
`processor using a network infrastructure than it would take
`to access data held by a processor coupled to a system bus.
`25 Furthermore, valuable network bandwidth would be con(cid:173)
`sumed moving data to the proper processors. This can
`negatively impact both processor and network performance.
`Performance limitations have led to the development of a
`point-to-point architecture for connecting processors in a
`30 system with a single memory space. In one example, indi(cid:173)
`vidual processors can be directly connected to each other
`through a plurality of point-to-point links to form a cluster
`of processors. Separate clusters of processors can also be
`connected. The point-to-point links significantly increase the
`35 bandwidth for coprocessing and multiprocessing functions.
`However, using a point-to-point architecture to connect
`multiple processors in a multiple cluster system sharing a
`single memory space presents its own problems.
`Consequently, it is desirable to provide techniques for
`40 improving data access and cache coherency in systems
`having multiple processors connected using point-to-point
`links.
`
`SUMMARY OF THE INVENTION
`
`According to the present invention, various techniques are
`provided for reducing traffic relating to memory transactions
`in multi-processor systems. According to various specific
`embodiments, a computer system having a plurality of
`processing nodes interconnected by a first point-to-point
`architecture is provided. Each processing node has a cache
`memory associated therewith. A probe filtering unit is oper(cid:173)
`able to receive probes corresponding to memory lines from
`the processing nodes and to transmit the probes only to
`selected ones of the processing nodes with reference to
`probe filtering information. The probe filtering information
`is representative of states associated with selected ones of
`the cache memories.
`According to other embodiments, methods and apparatus
`60 are provided for reducing probe traffic in a computer system
`comprising a plurality of processing nodes interconnected
`by a first point-to-point architecture. A probe corresponding
`to a memory line is transmitted from a first one of the
`processing nodes only to a probe filtering unit. The probe is
`65 evaluated with the probe filtering unit to determine whether
`a valid copy of the memory line is in any of the cache
`memories. The evaluation is done with reference to probe
`
`28
`
`

`

`3
`filtering information associated with the probe filtering unit
`and representative of states associated with selected ones of
`the cache memories. The probe is transmitted from the probe
`filtering unit only to selected ones of the processing nodes
`identified by the evaluating. Probe responses from the
`selected processing nodes are accumulated by the probe
`filtering unit. Only the probe filtering unit responds to the
`first processing node.
`A further understanding of the nature and advantages of
`the present invention may be realized by reference to the 10
`remaining portions of the specification and the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`60
`
`US 7,296,121 B2
`
`4
`FIG. 21 is a diagrammatic representation of a transaction
`flow in which local probe filtering is facilitated according to
`a specific embodiment of the invention.
`FIG. 22 is a diagrammatic representation of another
`transaction flow in which local probe filtering is facilitated
`according to a specific embodiment of the invention.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
`Reference will now be made in detail to some specific
`embodiments of the invention including the best modes
`contemplated by the inventors for carrying out the invention.
`Examples of these specific embodiments are illustrated in
`The invention may best be understood by reference to the
`15 the accompanying drawings. While
`the
`invention
`is
`following description taken in conjunction with the accom(cid:173)
`described in conjunction with these specific embodiments, it
`panying drawings, which are illustrative of specific embodi(cid:173)
`will be understood that it is not intended to limit the
`ments of the present invention.
`invention to the described embodiments. On the contrary, it
`FIGS. 1A and 1B are diagrammatic representation depict(cid:173)
`is intended to cover alternatives, modifications, and equiva-
`20 Ients as may be included within the spirit and scope of the
`ing a system having multiple clusters.
`FIG. 2 is a diagrammatic representation of a cluster
`invention as defined by the appended claims. Multi-proces(cid:173)
`having a plurality of processors.
`sor architectures having point-to-point communication
`FIG. 3 is a diagrammatic representation of a cache coher(cid:173)
`among their processors are suitable for implementing spe-
`ence controller.
`cific embodiments of the present invention. In the following
`FIG. 4 is a diagrmatic representation showing a trans- 25
`description, numerous specific details are set forth in order
`action flow for a data access request from a processor in a
`to provide a thorough understanding of the present inven-
`single cluster.
`tion. The present invention may be practiced without some
`FIG. SA-SD are diagrammatic representations showing
`or all of these specific details. Well-known process opera(cid:173)
`tions have not been described in detail in order not to
`cache coherence controller functionality.
`FIG. 6 is a diagrammatic representation depicting a trans- 30
`unnecessarily obscure the present invention. Furthermore,
`action flow for a request with multiple probe responses.
`the present application's reference to a particular singular
`FIG. 7 is a diagrmatic representation showing a cache
`entity includes that possibility that the methods and appa(cid:173)
`ratus of the present invention can be implemented using
`coherence directory.
`FIG. 8 is a diagrammatic representation showing probe
`more than one entity, unless the context clearly dictates
`filter information that can be used to reduce the number of 35 otherwise.
`According to various embodiments, techniques are pro(cid:173)
`probes transmitted to various clusters.
`FIG. 9 is a diagrmatic representation showing a trans(cid:173)
`vided for increasing data access efficiency in a multiple
`action flow for probing of a home cluster without probing of
`processor system. In a point-to-point architecture, a cluster
`other clusters.
`of processors includes multiple processors directly con-
`FIG. 10 is a diagrammatic representation showing a
`40 nected to each other through point-to-point links. By using
`transaction flow for probing of a single remote cluster.
`point-to-point links instead of a conventional shared bus or
`FIG. 11 is a flow process diagram showing the handling
`external network, multiple processors are used efficiently in
`of a request with probe filter information.
`a system sharing the same memory space. Processing and
`FIG. 12 is a diagrmatic representation showing
`network efficiency are also improved by avoiding many of
`45 the bandwidth and latency limitations of conventional bus
`memory controller filter information.
`FIG. 13 is a diagrammatic representation showing a
`and external network based multiprocessor architectures.
`transaction flow for probing a single remote cluster without
`According to various embodiments, however,
`linearly
`increasing the number of processors in a point-to-point
`probing a home cluster.
`FIG. 14 is a flow process diagram showing the handling
`architecture leads to an exponential increase in the number
`of a request at a home cluster cache coherence controller
`50 of links used to connect the multiple processors. In order to
`reduce the number of links used and to further modularize a
`using memory controller filter information.
`FIG. 15 is a diagrammatic representation showing a
`multiprocessor system using a point-to-point architecture,
`transaction flow for a cache coherence directory eviction of
`multiple clusters may be used.
`an entry corresponding to a dirty memory line.
`According to some embodiments, multiple processor
`FIG. 16 is a diagrammatic representation showing a 55
`clusters are interconnected using a point-to-point architec(cid:173)
`transaction flow for a cache coherence directory eviction of
`ture. Each cluster of processors includes a cache coherence
`an entry corresponding to a clean memory line.
`controller used to handle communications between clusters.
`FIG. 17 is a diagrmatic representation of a cache
`In one embodiment, the point-to-point architecture used to
`coherence controller according to a specific embodiment of
`connect processors are used

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket