US007395379B2

(12) United States Patent
     Glasco

(10) Patent No.: US 7,395,379 B2
(45) Date of Patent: *Jul. 1, 2008

(54) METHODS AND APPARATUS FOR RESPONDING TO A REQUEST CLUSTER

(75) Inventor: David B. Glasco, Austin, TX (US)

(73) Assignee: Newisys, Inc., Austin, TX (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 862 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/145,439

(22) Filed: May 13, 2002

(65) Prior Publication Data
     US 2003/0212741 A1    Nov. 13, 2003

(51) Int. Cl.
     G06F 12/00 (2006.01)

(52) U.S. Cl. 711/146; 711/141; 711/144; 711/147; 711/148

(58) Field of Classification Search: 711/141, 711/146-148, 144
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,195,089 A     3/1993   Sindhu et al.
5,659,710 A *   8/1997   Sherman et al.       711/146
5,893,151 A     4/1999   Merchant
5,958,019 A *   9/1999   Hagersten et al.     713/375
5,966,729 A    10/1999   Phelps
6,038,644 A     3/2000   Irie et al.
6,067,603 A     5/2000   Carpenter et al.
6,141,692 A    10/2000   Loewenstein et al.
6,167,492 A    12/2000   Keller et al.        711/154
6,292,705 B1    9/2001   Wang et al.
6,295,583 B1    9/2001   Razdan et al.
6,336,169 B1*   1/2002   Arimilli et al.      711/144
6,338,122 B1*   1/2002   Baumgartner et al.   711/141
6,351,791 B1*   2/2002   Freerksen et al.     711/146
6,374,331 B1*   4/2002   Janakiraman et al.   711/141
6,385,705 B1    5/2002   Keller et al.        711/154
6,490,661 B1   12/2002   Keller et al.        711/154
6,615,319 B2*   9/2003   Khare et al.         711/141
6,631,401 B1*  10/2003   Keller et al.        709/213
6,631,448 B2*  10/2003   Weber                711/141
6,633,945 B1*  10/2003   Fu et al.            710/316

(Continued)

OTHER PUBLICATIONS

Alan Charlesworth, "Starfire: extending the SMP envelope", published in Feb. 1998 by IEEE, pp. 39-49.*

(Continued)

Primary Examiner: Sanjiv Shah
Assistant Examiner: Zhuo H Li
(74) Attorney, Agent, or Firm: Weaver Austin Villeneuve & Sampson

(57) ABSTRACT

According to the present invention, methods and apparatus are provided for increasing the efficiency of data access in a multiple processor, multiple cluster system. A home cluster of processors receives a cache access request from a request cluster. The home cluster includes mechanisms for instructing probed remote clusters to respond to the request cluster instead of to the home cluster. The home cluster can also include mechanisms for reducing the number of probes sent to remote clusters. Techniques are also included for providing the requesting cluster with information to determine the number of responses to be transmitted to the requesting cluster as a result of the reduction in the number of probes sent at the home cluster.

35 Claims, 15 Drawing Sheets

[Front-page drawing: transaction flow among CPU 1001-1 in Request Cluster 1000, Home Cluster 1020, Remote Cluster 1040, and Remote Cluster 1060.]

U.S. PATENT DOCUMENTS (continued)

6,728,843 B1*   4/2004   Pong et al.      711/150
6,754,782 B2    6/2004   Arimilli et al.
6,760,819 B2    7/2004   Dhong et al.
6,799,252 B1    9/2004   Bauman
6,839,808 B2*   1/2005   Gruner et al.    711/130
6,973,543 B1   12/2005   Hughes
2002/0053004 A1*   5/2002   Pong          711/119
2003/0095557 A1    5/2003   Keller et al.

OTHER PUBLICATIONS (continued)

HyperTransport(TM) I/O Link Specification Revision 1.03, HyperTransport Technology Consortium, Oct. 10, 2001, Copyright 2001 HyperTransport Technology Consortium.
U.S. Appl. No. 10/106,426, Office Action dated Sep. 22, 2004.
U.S. Appl. No. 10/106,426, Office Action dated Mar. 7, 2005.
U.S. Appl. No. 10/406,426, Office Action dated Jul. 21, 2005.
U.S. Appl. No. 10/106,426, Office Action dated Nov. 21, 2005.
U.S. Appl. No. 10/106,430, Office Action dated Sep. 23, 2004.
U.S. Appl. No. 10/106,430, Office Action dated Mar. 10, 2005.
U.S. Appl. No. 10/106,430, Office Action dated Jul. 21, 2005.
U.S. Appl. No. 10/106,430, Office Action dated Nov. 2, 2005.
U.S. Appl. No. 10/106,299, Office Action dated Sep. 22, 2004.
U.S. Appl. No. 10/106,299, Office Action dated Mar. 10, 2005.
U.S. Appl. No. 10/106,299, Office Action dated Jul. 21, 2005.
U.S. Appl. No. 10/106,299, Office Action dated Nov. 21, 2005.
U.S. Appl. No. 10/145,438, Office Action dated Nov. 21, 2005.
U.S. Appl. No. 10/106,426, filed Mar. 22, 2002, Notice of Allowance mailed Apr. 21, 2006.
U.S. Appl. No. 10/106,426, filed Mar. 22, 2002, Allowed claims.
U.S. Appl. No. 10/106,430, filed Mar. 22, 2002, Notice of Allowance mailed Apr. 21, 2006.
U.S. Appl. No. 10/106,430, filed Mar. 22, 2002, Allowed claims.
U.S. Appl. No. 10/106,299, filed Mar. 22, 2002, Notice of Allowance mailed Apr. 28, 2006.
U.S. Appl. No. 10/106,299, filed Mar. 22, 2002, Allowed claims.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed Jun. 20, 2007.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed Mar. 9, 2007.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed Aug. 22, 2006.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed May 4, 2006.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed Nov. 21, 2005.
Alan Charlesworth, "Starfire: extending the SMP envelope", published in Feb. 1998 by IEEE, pp. 39-49.
U.S. Appl. No. 10/145,438, Final Office Action mailed Nov. 28, 2007.

* cited by examiner
[Drawing Sheets 1-10 of 15]

Sheet 1: FIGS. 1A and 1B, systems having multiple processing clusters. FIG. 1A shows Processing Clusters 101, 103, 105, and 107 connected by point-to-point links 111a-f. FIG. 1B shows Processing Clusters 121, 123, 125, and 127 coupled to a switch through point-to-point links 141a-d.
Sheet 2: FIG. 2, a cluster having a plurality of processors.
Sheet 3: FIG. 3, a cache coherence controller.
Sheet 4: FIG. 4, transaction flow for a data access request from a processor in a single cluster.
Sheets 5-8: FIGS. 5A-5D, cache coherence controller functionality.
Sheet 9: FIG. 6, transaction flow for a remote cluster sending a probe response to a home cluster.
Sheet 10: FIG. 7, transaction flow for a remote cluster sending a probe response to a requesting cluster.

[Sheet 11 of 15, FIG. 8: Tag Handling At A Home Cache Coherence Controller]
Receive Cache Access Request
803: Generate New Tag
805: Maintain Tag In Pending Buffer
Forward Request To Serialization Point
Receive Probe From Serialization Point
813: Is The Resulting Probe For A Locally Generated Request?
  No: Use Tag Corresponding To Tag From Request Cluster
  Yes: Use Newly Generated Tag From Home Cluster Tag Space
Broadcast Probes To Nonlocal Clusters With Selected Tag Information
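The tag-handling flow of FIG. 8 can be read as the rough sketch below. The class and variable names are assumptions made for illustration, not the patented implementation; only the decision structure (a pending buffer of request-cluster tags, a home-cluster tag space, and the locally-generated test) follows the figure.

```python
# Rough sketch of the FIG. 8 flow with assumed names: the home cache coherence
# controller keeps the request cluster's tag in a pending buffer and, when the
# serialization point returns a probe, selects which tag to broadcast with.
from itertools import count

class HomeTagHandler:
    def __init__(self):
        self._home_tags = count(1)     # home-cluster tag space
        self.pending = {}              # home tag -> original request-cluster tag (or None)

    def receive_request(self, request_cluster_tag=None) -> int:
        home_tag = next(self._home_tags)               # generate new tag
        self.pending[home_tag] = request_cluster_tag   # maintain tag in pending buffer
        # ...forward the request to the serialization point under home_tag...
        return home_tag

    def receive_probe(self, home_tag: int, locally_generated: bool) -> int:
        if locally_generated:
            selected = home_tag                        # newly generated home-cluster tag
        else:
            selected = self.pending[home_tag]          # tag corresponding to request cluster
        # ...broadcast probes to non-local clusters with the selected tag...
        return selected

handler = HomeTagHandler()
home_tag = handler.receive_request(request_cluster_tag=42)
print(handler.receive_probe(home_tag, locally_generated=False))   # 42
```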

[Sheet 12 of 15, FIG. 9: Tag Handling At A Request Cache Coherence Controller]
901: Send Cache Access Request To Home Cluster
903: Receive Probe From Home Cluster
905: Probe Local Nodes
907: Receive A Plurality Of Probe Responses
909: Signal Processor Associated With The Request After Expected Probe Responses Are Received

[Sheet 13 of 15, FIG. 10: transaction flow for a remote cluster sending a probe response to a requesting cluster (Request Cluster 1000, Home Cluster 1020, Remote Clusters 1040 and 1060)]
[Sheet 14 of 15, FIG. 11: Tag Management Before Probe Transmission]
1101: Receive Cache Access Request
1105: Maintain Tag In Pending Buffer
1107: Forward Request To Serialization Point
1111: Receive Probe From Serialization Point
1113: Is The Resulting Probe For A Locally Generated Request?
  No: 1115: Use Tag Corresponding To Tag From Request Cluster
  Yes: 1121: Use Newly Generated Tag From Home Cluster Tag Space
1123: Select Clusters To Send Probes To Based On Directory
1131: Send Probe To Home Cluster With Coherence Information
1133: Forward Probes To Selected Clusters With Tag Information
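The directory-based selection step in FIG. 11 can be sketched as a simple lookup. The directory layout below is an assumption for illustration; the figure only requires that probe targets be chosen from directory state rather than broadcast to every remote cluster.

```python
# Sketch of directory-based probe filtering (assumed directory layout): the
# coherence directory records which clusters may cache a memory line, so the
# home controller forwards probes only to those clusters.
from typing import Dict, List, Set

def select_probe_targets(directory: Dict[int, Set[int]], address: int,
                         home_cluster: int, request_cluster: int) -> List[int]:
    sharers = directory.get(address, set())
    # The home and request clusters are already involved in the transaction,
    # so only the other potential sharers need remote probes.
    return sorted(sharers - {home_cluster, request_cluster})

directory = {0x40: {1, 2, 3}}   # clusters 1, 2, and 3 may cache line 0x40
print(select_probe_targets(directory, 0x40, home_cluster=1, request_cluster=0))   # [2, 3]
```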

[Sheet 15 of 15, FIG. 12: Tag Handling Upon Receiving Probe Responses]
1201: Send Cache Access Request To Home Cluster
1203: Receive Probe From Home Cluster
1205: Extract Information From Probe To Determine Number Of Expected Probe Responses
1207: Probe Local Nodes
1209: Receive A Plurality Of Probe Responses
1211: Signal Processor Associated With The Request After Expected Probe Responses Are Received
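The completion test of FIG. 12 at the request cluster can be sketched as a small counter. The names below are illustrative assumptions; the essential behavior taken from the figure is that the expected-response count is extracted from the home cluster's probe and the processor is signalled only after that many probe responses have arrived.

```python
# Sketch of the FIG. 12 completion test at the request cluster (assumed names).
class RequestClusterTransaction:
    def __init__(self, tag: int, expected_responses: int):
        self.tag = tag
        self.expected = expected_responses   # extracted from the home cluster's probe
        self.received = 0

    def on_probe_response(self) -> bool:
        """Count one response; return True when the processor can be signalled."""
        self.received += 1
        return self.received >= self.expected

txn = RequestClusterTransaction(tag=7, expected_responses=3)
print([txn.on_probe_response() for _ in range(3)])   # [False, False, True]
```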

METHODS AND APPARATUS FOR RESPONDING TO A REQUEST CLUSTER

CROSS-REFERENCE TO RELATED APPLICATIONS
`
The present application is related to filed U.S. application Ser. No. 10/106,426 titled Methods And Apparatus For Speculative Probing At A Request Cluster, U.S. application Ser. No. 10/106,430 titled Methods And Apparatus For Speculative Probing With Early Completion And Delayed Request, and U.S. application Ser. No. 10/106,299 titled Methods And Apparatus For Speculative Probing With Early Completion And Early Request, the entireties of which are incorporated by reference herein for all purposes. The present application is also related to concurrently filed U.S. application Ser. No. 10/145,438 titled Methods And Apparatus For Responding To A Request Cluster by David B. Glasco, the entirety of which is incorporated by reference for all purposes.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to accessing data in a multiple processor system. More specifically, the present invention provides techniques for improving data access efficiency while maintaining cache coherency in a multiple processor system having a multiple cluster architecture.

2. Description of Related Art

Data access in multiple processor systems can raise issues relating to cache coherency. Conventional multiple processor computer systems have processors coupled to a system memory through a shared bus. In order to optimize access to data in the system memory, individual processors are typically designed to work with cache memory. In one example, each processor has a cache that is loaded with data that the processor frequently accesses. The cache is read or written by a processor. However, cache coherency problems arise because multiple copies of the same data can co-exist in systems having multiple processors and multiple cache memories. For example, a frequently accessed data block corresponding to a memory line may be loaded into the cache of two different processors. In one example, if both processors attempt to write new values into the data block at the same time, different data values may result. One value may be written into the first cache while a different value is written into the second cache. A system might then be unable to determine what value to write through to system memory.

A variety of cache coherency mechanisms have been developed to address such problems in multiprocessor systems. One solution is to simply force all processor writes to go through to memory immediately and bypass the associated cache. The write requests can then be serialized before overwriting a system memory line. However, bypassing the cache significantly decreases the efficiency gained by using a cache. Other cache coherency mechanisms have been developed for specific architectures. In a shared bus architecture, each processor checks or snoops on the bus to determine whether it can read or write a shared cache block. In one example, a processor only writes an object when it owns or has exclusive access to the object. Each corresponding cache object is then updated to allow processors access to the most recent version of the object.

Bus arbitration is used when both processors attempt to write the same shared data block in the same clock cycle. Bus arbitration logic decides which processor gets the bus first.
`
Although cache coherency mechanisms such as bus arbitration are effective, using a shared bus limits the number of processors that can be implemented in a single system with a single memory space.

Other multiprocessor schemes involve individual processor, cache, and memory systems connected to other processor, cache, and memory systems using a network backbone such as Ethernet or Token Ring. Multiprocessor schemes involving separate computer systems each with its own address space can avoid many cache coherency problems because each processor has its own associated memory and cache. When one processor wishes to access data on a remote computing system, communication is explicit. Messages are sent to move data to another processor and messages are received to accept data from another processor using standard network protocols such as TCP/IP. Multiprocessor systems using explicit communication, including transactions such as sends and receives, are referred to as systems using multiple private memories. By contrast, multiprocessor systems using implicit communication, including transactions such as loads and stores, are referred to herein as using a single address space.

Multiprocessor schemes using separate computer systems allow more processors to be interconnected while minimizing cache coherency problems. However, it would take substantially more time to access data held by a remote processor using a network infrastructure than it would take to access data held by a processor coupled to a system bus. Furthermore, valuable network bandwidth would be consumed moving data to the proper processors. This can negatively impact both processor and network performance.

Performance limitations have led to the development of a point-to-point architecture for connecting processors in a system with a single memory space. In one example, individual processors can be directly connected to each other through a plurality of point-to-point links to form a cluster of processors. Separate clusters of processors can also be connected. The point-to-point links significantly increase the bandwidth for coprocessing and multiprocessing functions. However, using a point-to-point architecture to connect multiple processors in a multiple cluster system sharing a single memory space presents its own problems.

Consequently, it is desirable to provide techniques for improving data access and cache coherency in systems having multiple clusters of multiple processors connected using point-to-point links.
`
SUMMARY OF THE INVENTION

According to the present invention, methods and apparatus are provided for increasing the efficiency of data access in a multiple processor, multiple cluster system. A home cluster of processors receives a cache access request from a request cluster. The home cluster includes mechanisms for instructing probed remote clusters to respond to the request cluster instead of to the home cluster. The home cluster can also include mechanisms for reducing the number of probes sent to remote clusters. Techniques are also included for providing the requesting cluster with information to determine the number of responses to be transmitted to the requesting cluster as a result of the reduction in the number of probes sent from the home cluster.

According to various embodiments, a computer system is provided. A home cluster includes a first plurality of processors and a home cache coherence controller. The first plurality of processors and the home cache coherence controller are interconnected in a point-to-point architecture. The home cache coherence controller is configured to send a probe to a remote cluster upon receiving a cache access request from a request cluster. The probe includes information directing the remote cluster to send a probe response corresponding to the request to the request cluster.

According to other embodiments, another computer system is provided. The computer system includes a first cluster and a second cluster. The first cluster includes a first plurality of processors and a first cache coherence controller. The first plurality of processors and the first cache coherence controller are interconnected in a point-to-point architecture. The second cluster includes a second plurality of processors and a second cache coherence controller. The second plurality of processors and the second cache coherence controller are interconnected in a point-to-point architecture. The first cache coherence controller is coupled to the second cache coherence controller and configured to send a request to the second cluster. The first cache coherence controller is configured to receive a plurality of probe responses corresponding to the request.

According to still other embodiments, a method for a cache coherence controller to manage data access in a multiprocessor system is provided. A cache access request originating from a first cluster of processors is sent to a second cluster of processors. A plurality of probe responses corresponding to the cache access request is received from a plurality of clusters.

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention may best be understood by reference to the
`folloWing description taken in conjunction With the accom
`panying draWings, Which are illustrative of speci?c embodi
`ments of the present invention.
`FIGS. 1A and 1B are diagrammatic representation depict
`ing a system having multiple clusters.
`FIG. 2 is a diagrammatic representation of a cluster having
`a plurality of processors.
`FIG. 3 is a diagrammatic representation of a cache coher
`ence controller.
`FIG. 4 is a diagrammatic representation shoWing a trans
`action ?oW for a data access request from a processor in a
`single cluster.
`FIGS. 5A-5D are diagrammatic representations shoWing
`cache coherence controller functionality.
`FIG. 6 is a diagrammatic representation depicting a trans
`action ?oW for a remote cluster sending a probe response to a
`home cluster.
`FIG. 7 is a diagrammatic representation shoWing a trans
`action ?oW for a remote cluster sending a probe response to a
`requesting cluster.
`FIG. 8 is a How process diagram shoWing tag management
`before probe transmission to remote nodes.
`FIG. 9 is a process How diagram shoWing a technique for
`receiving probe responses.
`FIG. 10 is a diagrammatic representation shoWing a trans
`action ?oW for a remote cluster sending a probe response to a
`requesting cluster.
`FIG. 11 is a How process diagram shoWing tag manage
`ment before probe transmission to remote nodes in a system
`With a coherence directory.
`FIG. 12 is a process How diagram shoWing a technique for
`receiving probe responses in a system With a coherence direc
`tory.
`
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Reference will now be made in detail to some specific embodiments of the invention, including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Multi-processor architectures having point-to-point communication among their processors are suitable for implementing specific embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. Well known process operations have not been described in detail in order not to unnecessarily obscure the present invention. Furthermore, the present application's reference to a particular singular entity includes the possibility that the methods and apparatus of the present invention can be implemented using more than one entity, unless the context clearly dictates otherwise.
Techniques are provided for increasing data access efficiency in a multiple processor, multiple cluster system. In a point-to-point architecture, a cluster of processors includes multiple processors directly connected to each other through point-to-point links. By using point-to-point links instead of a conventional shared bus or external network, multiple processors are used efficiently in a system sharing the same memory space. Processing and network efficiency are also improved by avoiding many of the bandwidth and latency limitations of conventional bus and external network based multiprocessor architectures. According to various embodiments, however, linearly increasing the number of processors in a point-to-point architecture leads to an exponential increase in the number of links used to connect the multiple processors. In order to reduce the number of links used and to further modularize a multiprocessor system using a point-to-point architecture, multiple clusters are used.
According to various embodiments, the multiple processor clusters are interconnected using a point-to-point architecture. Each cluster of processors includes a cache coherence controller used to handle communications between clusters. In one embodiment, the point-to-point architecture used to connect processors is also used to connect clusters.

By using a cache coherence controller, multiple cluster systems can be built using processors that may not necessarily support multiple clusters. Such a multiple cluster system can be built by using a cache coherence controller to represent non-local nodes in local transactions so that local nodes do not need to be aware of the existence of nodes outside of the local cluster. More detail on the cache coherence controller will be provided below.
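The proxy behavior described above can be pictured with a short sketch. It is a minimal illustration under assumed names (the CoherenceController and Message classes, their fields, and the fan-out logic are not taken from the patent): each cluster's controller presents itself to local nodes as the only gateway to the rest of the system, and presents itself to remote clusters as if it were the local requester.

```python
# Minimal sketch with assumed names: a per-cluster cache coherence controller
# stands in for all non-local nodes, so local nodes never need to know that
# other clusters exist.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Message:
    kind: str                 # "request", "probe", or "probe_response"
    address: int
    source_cluster: int
    payload: dict = field(default_factory=dict)

class CoherenceController:
    def __init__(self, cluster_id: int):
        self.cluster_id = cluster_id
        self.local_nodes: List[Callable[[Message], Message]] = []
        self.remote_links: Dict[int, "CoherenceController"] = {}

    def deliver_local(self, msg: Message) -> List[Message]:
        # Replay a message inside the local cluster as if the controller
        # itself were an ordinary local node issuing it.
        return [node(msg) for node in self.local_nodes]

    def forward_remote(self, msg: Message) -> List[Message]:
        # Local nodes address "everything outside the cluster" through the
        # controller; the controller fans the message out to remote clusters.
        responses: List[Message] = []
        for remote in self.remote_links.values():
            responses.extend(remote.deliver_local(msg))
        return responses

# Usage: a node in cluster 0 probes address 0x40; cluster 1's controller
# answers on behalf of its nodes, transparently to cluster 0.
cc0, cc1 = CoherenceController(0), CoherenceController(1)
cc0.remote_links[1] = cc1
cc1.local_nodes.append(
    lambda m: Message("probe_response", m.address, 1, {"cached": m.address == 0x40}))
print(cc0.forward_remote(Message("probe", 0x40, 0)))
```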
In a single cluster system, cache coherency can be maintained by sending all data access requests through a serialization point. Any mechanism for ordering data access requests is referred to herein as a serialization point. One example of a serialization point is a memory controller. Various processors in the single cluster system send data access requests to the memory controller. In one example, the memory controller is configured to serialize or lock the data access requests so that only one data access request for a given memory line is allowed at any particular time. If another processor attempts to access the same memory line, the data access attempt is blocked until the memory line is unlocked. The memory controller allows cache coherency to be maintained in a multiple processor, single cluster system.
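To make the serialization point concrete, the following sketch models a memory controller that locks one memory line per outstanding access. It is only an illustration under assumed names (the class, its methods, and the request identifiers are not taken from the patent); the point is that accesses to the same line are granted strictly one at a time.

```python
# Sketch of a serialization point (assumed API): a memory controller that
# allows only one outstanding data access request per memory line.
from collections import deque
from typing import Deque, Dict, Optional

class MemoryControllerSerializationPoint:
    def __init__(self):
        self.locked: Dict[int, str] = {}            # memory line -> holder of the lock
        self.waiting: Dict[int, Deque[str]] = {}    # memory line -> blocked requests

    def request_access(self, line: int, request_id: str) -> bool:
        """Return True if the request may proceed, False if the line is locked."""
        if line not in self.locked:
            self.locked[line] = request_id
            return True
        self.waiting.setdefault(line, deque()).append(request_id)
        return False

    def release(self, line: int) -> Optional[str]:
        """Unlock the line and grant it to the next blocked request, if any."""
        self.locked.pop(line, None)
        queue = self.waiting.get(line)
        if queue:
            granted = queue.popleft()
            self.locked[line] = granted
            return granted
        return None

# Two processors race for the same memory line; the second is blocked until the
# first releases the line, which serializes accesses to that line.
mc = MemoryControllerSerializationPoint()
assert mc.request_access(0x1000, "cpu0-write") is True
assert mc.request_access(0x1000, "cpu1-write") is False
assert mc.release(0x1000) == "cpu1-write"
```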
A serialization point can also be used in a multiple processor, multiple cluster system where the processors in the various clusters share a single address space. By using a single address space, internal point-to-point links can be used to significantly improve intercluster communication over traditional external network based multiple cluster systems. Various processors in various clusters send data access requests to a memory controller associated with a particular cluster such as a home cluster. The memory controller can similarly serialize all data requests from the different clusters. However, a serialization point in a multiple processor, multiple cluster system may not be as efficient as a serialization point in a multiple processor, single cluster system. That is, delay resulting from factors such as latency from transmitting between clusters can adversely affect the response times for various data access requests. It should be noted that delay also results from the use of probes in a multiple processor environment.

Although delay in intercluster transactions in an architecture using a shared memory space is significantly less than the delay in conventional message passing environments using external networks such as Ethernet or Token Ring, even minimal delay is a significant factor. In some applications, there may be millions of data access requests from a processor in a fraction of a second. Any delay can adversely impact processor performance.
According to various embodiments, speculative probing is used to increase the efficiency of accessing data in a multiple processor, multiple cluster system. A mechanism for eliciting a response from a node to maintain cache coherency in a system is referred to herein as a probe. In one example, a mechanism for snooping a cache is referred to as a probe. A response to a probe can be directed to the source or target of the initiating request. Any mechanism for sending probes to nodes associated with cache blocks before a request associated with the probes is received at a serialization point is referred to herein as speculative probing.
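A rough model of probes and speculative probing follows, under assumed message formats (the field names and cache-state strings are illustrative, not the protocol's actual encoding): probes are issued to caching nodes before the corresponding request has been ordered at the serialization point, and each response can be steered toward either the source or the target of the initiating request.

```python
# Sketch with assumed message formats: a probe elicits a coherence response
# from a node, and its reply_to field steers the response toward either the
# source or the target of the initiating request.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Probe:
    address: int
    initiating_request: str
    reply_to: str            # "source" (requesting node) or "target" (e.g. home node)

@dataclass
class ProbeResponse:
    address: int
    responder: str
    cached_state: str        # e.g. "invalid", "shared", "modified"

def snoop(node_id: str, cache: Dict[int, str], probe: Probe) -> ProbeResponse:
    """A node snoops its cache and answers the probe."""
    return ProbeResponse(probe.address, node_id, cache.get(probe.address, "invalid"))

def speculative_probe(nodes: Dict[str, Dict[int, str]], probe: Probe) -> List[ProbeResponse]:
    # Speculative probing: probes go out to the caching nodes *before* the
    # corresponding request has been ordered at the serialization point.
    return [snoop(node_id, cache, probe) for node_id, cache in nodes.items()]

nodes = {"cpu0": {0x40: "shared"}, "cpu1": {}}
early = speculative_probe(nodes, Probe(0x40, "read-0x40", reply_to="source"))
print([r.cached_state for r in early])   # ['shared', 'invalid']
```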
According to various embodiments, the reordering or elimination of certain data access requests does not adversely affect cache coherency. That is, the end value in the cache is the same whether or not snooping occurs. For example, a local processor attempting to read the cache data block can be allowed to access the data block without sending the requests through a serialization point in certain circumstances. In one example, read access can be permitted when the cache block is valid and the associated memory line is not locked. Techniques for performing speculative probing generally are described in U.S. application Ser. No. 10/106,426 titled Methods And Apparatus For Speculative Probing At A Request Cluster, U.S. application Ser. No. 10/106,430 titled Methods And Apparatus For Speculative Probing With Early Completion And Delayed Request, and U.S. application Ser. No. 10/106,299 titled Methods And Apparatus For Speculative Probing With Early Completion And Early Request, the entireties of which are incorporated by reference herein for all purposes. By completing a data access transaction within a local cluster, the delay associated with transactions in a multiple cluster system can be reduced or eliminated.
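The local-completion condition mentioned above can be expressed as a one-line test. The cache-state names below are generic MOESI-style labels chosen for illustration; the patent does not prescribe a particular state encoding.

```python
# Sketch of the local-completion test: a read can be satisfied inside the local
# cluster when the cached block is valid and the memory line is not locked at
# the serialization point.
def can_complete_read_locally(cache_state: str, line_locked: bool) -> bool:
    return cache_state in ("shared", "exclusive", "owned", "modified") and not line_locked

assert can_complete_read_locally("shared", line_locked=False) is True
assert can_complete_read_locally("invalid", line_locked=False) is False   # block not valid
assert can_complete_read_locally("shared", line_locked=True) is False     # line is locked
```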
The techniques of the present invention recognize that other efficiencies can be achieved, particularly when speculative probing cannot be completed at a local cluster.
In one example, a cache access request is forwarded from a local cluster to a home cluster. A home cluster then proceeds to send probes to remote clusters in the system. In typical implementations, the home cluster gathers the probe responses corresponding to the probe before sending an aggregated response to the request cluster. The aggregated response typically includes the results of the home cluster probes and the results of the remote cluster probes. The techniques of the present invention allow responses to be aggregated more efficiently at the request cluster instead of at a home cluster. According to various embodiments, remote clusters send probe responses directly to the request cluster instead of sending the probe responses to the request cluster through a home cluster. In one embodiment, techniques are provided for enabling a home cluster to send a reduced number of probes to remote clusters. Mechanisms are provided for allowing a home cluster to inform the request cluster that a reduced number of probes are being transmitted. The mechanisms can be implemented in a manner entirely transparent to remote clusters.
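The following sketch illustrates this flow under assumed names and message fields, none of which are taken from the patent's actual packet formats: the home cluster points each remote probe's response at the request cluster and separately tells the request cluster how many responses to expect, so a reduction in the number of probes remains transparent to the remote clusters.

```python
# Sketch under assumed names and fields: the home cluster directs remote probe
# responses to the request cluster and reports the expected response count.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RemoteProbe:
    address: int
    tag: int
    respond_to: int          # id of the *request* cluster, not the home cluster

@dataclass
class HomeReply:
    address: int
    tag: int
    expected_responses: int  # reflects any reduction in the number of probes sent

def home_cluster_handle(address: int, tag: int, request_cluster: int,
                        probe_targets: List[int]) -> Tuple[List[RemoteProbe], HomeReply]:
    # Probe only the clusters in probe_targets (a coherence directory may have
    # already shrunk this set); each probe directs its response at the requester.
    probes = [RemoteProbe(address, tag, respond_to=request_cluster) for _ in probe_targets]
    # +1 accounts for the home cluster's own aggregated local response.
    return probes, HomeReply(address, tag, expected_responses=len(probes) + 1)

probes, reply = home_cluster_handle(0x40, tag=7, request_cluster=0, probe_targets=[2, 3])
print(len(probes), reply.expected_responses)   # 2 3
```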
FIG. 1A is a diagrammatic representation of one example of a multiple cluster, multiple processor system that can use the techniques of the present invention. Each processing cluster 101, 103, 105, and 107 can include a plurality of processors. The processing clusters 101, 103, 105, and 107 are connected to each other through point-to-point links 111a-f. In one embodiment, the multiple processors in the multiple cluster architecture shown in FIG. 1A share the same memory space. In this example, the point-to-point links 111a-f are internal system connections that are used in place of a traditional front-side bus to connect the multiple processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may support any point-to-point coherence protocol.

FIG. 1B is a diagrammatic representation of another example of a multiple cluster, multiple processor system that can use the techniques of the present invention. Each processing cluster 121, 123, 125, and 127 can be coupled to a switch 131 through point-to-point links 141a-d. It should be noted that using a switch and point-to-point links allows implementation with fewer point-to-point links when connecting multiple clusters in the system. A switch 131 can include a processor with a coherence protocol interface. According to various implementations, a multicluster system shown in FIG. 1A is expanded using a switch 131 as shown in FIG. 1B.
FIG. 2 is a diagrammatic representation of a multiple processor cluster, such as the cluster 101 shown in FIG. 1A. Cluster 200 includes processors 202a-202d, one or more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-206d, point-to-point communication links 208a-208e, and a service processor 212. The point-to-point communication links are configured to allow interconnections between processors 202a-202d, I/O switch 210, and cache coherence controller 230. The service processor 212 is configured to allow communications with processors 202a-202d, I/O switch 210, and cache coherence controller 230 via a JTAG interface represented in FIG. 2 by links 214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects the rest of the system to I/O adapters 216 and 220.

According to specific embodiments, the service processor of the present invention has the intelligence to partition system resources according to a previously specified partitioning schema. The partitioning can be achieved through direct manipulation of routing tables associated with the system processors by the service processor, which is made possible by the point-to-point communication infrastructure. The routing tables are used to control and isolate various system resources, the connections b
