US007107408B2

(12) United States Patent
Glasco

(10) Patent No.: US 7,107,408 B2
(45) Date of Patent: Sep. 12, 2006
(54) METHODS AND APPARATUS FOR SPECULATIVE PROBING WITH EARLY COMPLETION AND EARLY REQUEST

(75) Inventor: David B. Glasco, Austin, TX (US)

(73) Assignee: Newisys, Inc., Austin, TX (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 454 days.
(21) Appl. No.: 10/106,299

(22) Filed: Mar. 22, 2002

(65) Prior Publication Data
     US 2003/0182508 A1    Sep. 25, 2003
(51) Int. Cl.
     G06F 12/08 (2006.01)
     G06F 12/16 (2006.01)
(52) U.S. Cl. .................. 711/141; 711/118; 711/128; 711/130; 711/146
(58) Field of Classification Search ........ 711/141-146, 130, 147-149, 135-136, 119-122, 118, 117
     See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS
5,195,089 A *   3/1993  Sindhu et al. .............. 370/235
5,958,019 A     9/1999  Hagersten et al.
6,067,603 A *   5/2000  Carpenter et al. ........... 711/141
6,167,492 A    12/2000  Keller et al. .............. 711/154
6,292,705 B1 *  9/2001  Wang et al. ................ 700/5
6,338,122 B1 *  1/2002  Baumgartner et al. ......... 711/141
6,374,331 B1 *  4/2002  Janakiraman et al. ......... 711/141
6,385,705 B1    5/2002  Keller et al. .............. 711/154
6,490,661 B1   12/2002  Keller et al. .............. 711/150
6,615,319 B1 *  9/2003  Khare et al. ............... 711/141
6,633,945 B1 * 10/2003  Fu et al. .................. 710/316
6,754,782 B1 *  6/2004  Arimilli et al. ............ 711/144
6,760,819 B1 *  7/2004  Dhong et al. ............... 711/146
6,799,252 B1 *  9/2004  Bauman ..................... 711/149
6,839,808 B1 *  1/2005  Gruner et al. .............. 711/130
OTHER PUBLICATIONS

HyperTransport™ I/O Link Specification, Revision 1.03, HyperTransport™ Consortium, Oct. 10, 2001. Copyright © 2001 HyperTransport Technology Consortium.
U.S. Appl. No. 10/106,426, filed Mar. 22, 2002, Office Action mailed Nov. 21, 2005.
U.S. Appl. No. 10/145,438, filed May 13, 2002, Office Action mailed Nov. 21, 2005.
U.S. Appl. No. 10/145,439, filed May 13, 2002, Office Action mailed Nov. 21, 2005.

* cited by examiner
Primary Examiner—Matthew Kim
Assistant Examiner—Zhuo H. Li
(74) Attorney, Agent, or Firm—Beyer Weaver & Thomas, LLP
(57) ABSTRACT

According to the present invention, methods and apparatus are provided for increasing the efficiency of data access in multiple processor, multiple cluster systems. A cache coherence controller associated with a first cluster of processors can determine whether speculative probing can be performed before forwarding a data access request to a second cluster. The cache coherence controller can also forward the data access request to the second cluster before receiving a probe response.

53 Claims, 15 Drawing Sheets
(Representative drawing on the front page: a transaction-flow diagram involving a home cluster 920 and a remote cluster 940.)
(Drawing sheets 1 through 15. The figures are not reproduced in this text extraction; only the figure numbers and the labels that survived OCR are noted below.)

Sheet 1, FIGS. 1A and 1B: systems having multiple clusters. FIG. 1A shows processing clusters 101, 103, 105, and 107 joined by point-to-point links; FIG. 1B shows processing clusters 121, 123, 125, and 127 coupled to a switch 131 through point-to-point links 141a-141d.
Sheet 2, FIG. 2: a cluster having a plurality of processors (component labels not recoverable from the OCR).
Sheet 3, FIG. 3: a cache coherence controller (component labels not recoverable).
Sheets 4 through 10, FIGS. 4, 5A-5D, 6, and 7: transaction-flow diagrams among CPUs, memory controllers, and cache coherence controllers; FIG. 7 involves a home cluster 720 and a remote cluster 740.
Sheet 11, FIG. 8 (process flow): 801 identify memory line associated with a request from a request cluster processor; 803 can speculative probing be performed?; 805 proceed with speculative probing, otherwise proceed without; on a received probe associated with the memory line, provide probe information to the intervening processor (823) or to the request cluster processor (811); 815 wait for responses; 813 end.
Sheet 12, FIG. 9: transaction flow for speculative probing with delayed request (labels largely not recoverable).
Sheet 13, FIG. 10 (process flow): 1001 identify cache state in controller; 1003 cache state = shared?, with the type of access requested checked at 1015; 1005 cache state = owned?; 1007 cache state = exclusive?; 1009 cache state = modified?; 1017 can complete transaction locally; 1011 can not complete transaction locally; end.
Sheet 14, FIG. 11: transaction flow for speculative probing with early request (labels largely not recoverable).
Sheet 15, FIG. 12 (process flow): 1201 allocate transaction identifier; 1203 probe local and remote clusters; 1205 local transaction completes; 1207 maintain transaction identifier; 1209 all remote probe responses received?; 1211 clear transaction identifier.
METHODS AND APPARATUS FOR SPECULATIVE PROBING WITH EARLY COMPLETION AND EARLY REQUEST

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to concurrently filed U.S. application Ser. No. 10/106,426, entitled METHODS AND APPARATUS FOR SPECULATIVE PROBING AT A REQUEST CLUSTER, and to concurrently filed U.S. application Ser. No. 10/106,430, entitled METHODS AND APPARATUS FOR SPECULATIVE PROBING WITH EARLY COMPLETION AND DELAYED REQUEST, the disclosures of which are incorporated by reference herein for all purposes.
BACKGROUND OF THE INVENTION
1. Field of the Invention

The present invention generally relates to accessing data in a multiple processor system. More specifically, the present invention provides techniques for improving data access efficiency while maintaining cache coherency in a multiple processor system having a multiple cluster architecture.
2. Description of Related Art

Data access in multiple processor systems can raise issues relating to cache coherency. Conventional multiple processor computer systems have processors coupled to a system memory through a shared bus. In order to optimize access to data in the system memory, individual processors are typically designed to work with cache memory. In one example, each processor has a cache that is loaded with data that the processor frequently accesses. The cache can be onchip or offchip. Each cache block can be read or written by the processor. However, cache coherency problems can arise because multiple copies of the same data can co-exist in systems having multiple processors and multiple cache memories. For example, a frequently accessed data block corresponding to a memory line may be loaded into the cache of two different processors. In one example, if both processors attempt to write new values into the data block at the same time, different data values may result. One value may be written into the first cache while a different value is written into the second cache. A system might then be unable to determine what value to write through to system memory.

A variety of cache coherency mechanisms have been developed to address such problems in multiprocessor systems. One solution is to simply force all processor writes to go through to memory immediately and bypass the associated cache. The write requests can then be serialized before overwriting a system memory line. However, bypassing the cache significantly decreases efficiency gained by using a cache. Other cache coherency mechanisms have been developed for specific architectures. In a shared bus architecture, each processor checks or snoops on the bus to determine whether it can read or write a shared cache block. In one example, a processor only writes an object when it owns or has exclusive access to the object. Each corresponding cache object is then updated to allow processors access to the most recent version of the object.
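The ownership rule just described can be illustrated with a brief sketch. The following Python fragment is not taken from the patent; the state names and the bus-invalidation hook are assumptions chosen to mirror a conventional invalidation-based snooping scheme, in which a processor broadcasts an invalidate and takes exclusive ownership before writing a shared block.

from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3

class Bus:
    """Shared bus that forwards snoop traffic to every attached cache."""
    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, addr, owner):
        for cache in self.caches:
            if cache is not owner:
                cache.snoop_invalidate(addr)

class SnoopingCache:
    """Minimal invalidation-based snooping cache (illustrative only)."""
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}              # address -> (state, value)
        bus.caches.append(self)

    def write(self, addr, value):
        state, _ = self.lines.get(addr, (State.INVALID, None))
        if state not in (State.EXCLUSIVE, State.MODIFIED):
            # Writes require ownership: invalidate other cached copies first.
            self.bus.broadcast_invalidate(addr, owner=self)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop_invalidate(self, addr):
        # Another cache is claiming ownership; drop our copy.
        self.lines.pop(addr, None)

With two such caches attached to one Bus, a write by either cache removes the stale copy from the other, so system memory is never updated from two conflicting cached values.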
Bus arbitration can be used when both processors attempt to write the same shared data block in the same clock cycle. Bus arbitration logic can decide which processor gets the bus first. Although cache coherency mechanisms such as bus arbitration are effective, using a shared bus limits the number of processors that can be implemented in a single system with a single memory space.
Other multiprocessor schemes involve individual processor, cache, and memory systems connected to other processors, cache, and memory systems using a network backbone such as Ethernet or Token Ring. Multiprocessor schemes involving separate computer systems each with its own address space can avoid many cache coherency problems because each processor has its own associated memory and cache. When one processor wishes to access data on a remote computing system, communication is explicit. Messages are sent to move data to another processor and messages are received to accept data from another processor using standard network protocols such as TCP/IP. Multiprocessor systems using explicit communication including transactions such as sends and receives are referred to as systems using multiple private memories. By contrast, multiprocessor systems using implicit communication including transactions such as loads and stores are referred to herein as using a single address space.

Multiprocessor schemes using separate computer systems allow more processors to be interconnected while minimizing cache coherency problems. However, it would take substantially more time to access data held by a remote processor using a network infrastructure than it would take to access data held by a processor coupled to a system bus. Furthermore, valuable network bandwidth would be consumed moving data to the proper processors. This can negatively impact both processor and network performance.

Performance limitations have led to the development of a point-to-point architecture for connecting processors in a system with a single memory space. In one example, individual processors can be directly connected to each other through a plurality of point-to-point links to form a cluster of processors. Separate clusters of processors can also be connected. The point-to-point links significantly increase the bandwidth for coprocessing and multiprocessing functions. However, using a point-to-point architecture to connect multiple processors in a multiple cluster system sharing a single memory space presents its own problems.

Consequently, it is desirable to provide techniques for improving data access and cache coherency in systems having multiple clusters of multiple processors connected using point-to-point links.
SUMMARY OF THE INVENTION
According to the present invention, methods and apparatus are provided for increasing the efficiency of data access in a multiple processor, multiple cluster system. A cache coherence controller associated with a first cluster of processors can determine whether speculative probing can be performed before forwarding a data access request to a second cluster. The cache coherence controller can also forward the data access request to the second cluster before receiving a probe response.
According to specific embodiments, a computer system is provided. A first cluster includes a first plurality of processors and a first cache coherence controller. The first plurality of processors and the first cache coherence controller are interconnected in a point-to-point architecture. A second cluster includes a second plurality of processors and a second cache coherence controller. The second plurality of processors and the second cache coherence controller are interconnected in a point-to-point architecture. The first cache coherence controller is coupled to the second cache coherence controller. The first cache coherence controller is configured to receive a cache access request originating from the first plurality of processors and send a probe to the first plurality of processors in the first cluster before the cache access request is received by a serialization point in the second cluster. The first cache coherence controller can be further configured to forward the cache access request before determining if the cache access request can be completed locally.
In one embodiment, the serialization point is a memory controller in the second cluster. The probe can be associated with the memory line corresponding to the cache access request. The first cache coherence controller can be further configured to respond to the probe originating from the second cluster using information obtained from the probe of the first plurality of processors. The first cache coherence controller can also be associated with a pending buffer.
According to another embodiment, a cache coherence controller is provided. The cache coherence controller includes interface circuitry coupled to a plurality of local processors in a local cluster and a non-local cache coherence controller in a non-local cluster. The plurality of local processors are arranged in a point-to-point architecture. The cache coherence controller can also include a protocol engine coupled to the interface circuitry. The protocol engine can be configured to receive a cache access request from a first processor in the local cluster and speculatively probe a local node. The protocol engine can also forward the cache access request before receiving a probe response from the local node associated with the cache.
According to another embodiment, a method for a cache coherence controller to manage data access in a multiprocessor system is provided. A cache access request is received from a local processor associated with a local cluster of processors connected through a point-to-point architecture. It is determined if speculative probing of a local node associated with a cache can be performed before forwarding the cache request to a non-local cache coherence controller. The non-local cache coherence controller is associated with a remote cluster of processors connected through a point-to-point architecture. The remote cluster of processors shares an address space with the local cluster of processors. A cache access request can be sent before receiving a probe response from the local node associated with the cache.
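As a rough illustration only, the control flow of the method described above might look like the Python sketch below: the controller probes its local nodes speculatively and forwards the request to the non-local controller without waiting for the local probe responses. The class and method names are hypothetical and are not prescribed by the patent.

class CoherenceControllerMethod:
    """Sketch of the receive/probe/forward ordering described above."""

    def __init__(self, local_nodes, non_local_controller):
        self.local_nodes = local_nodes                  # nodes in the local cluster
        self.non_local_controller = non_local_controller

    def handle_cache_access_request(self, request):
        if self.speculative_probing_allowed(request):
            # Probe the local node(s) before the request reaches the
            # remote cluster's serialization point.
            for node in self.local_nodes:
                node.probe(request["address"])
        # Early request: forward before any probe response has arrived.
        self.non_local_controller.forward(request)

    def speculative_probing_allowed(self, request):
        # Placeholder policy; the patent leaves the exact test open.
        return True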
A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which are illustrative of specific embodiments of the present invention.

FIGS. 1A and 1B are diagrammatic representations depicting a system having multiple clusters.

FIG. 2 is a diagrammatic representation of a cluster having a plurality of processors.

FIG. 3 is a diagrammatic representation of a cache coherence controller.

FIG. 4 is a diagrammatic representation showing a transaction flow for a data access request.

FIGS. 5A-5D are diagrammatic representations showing cache coherence controller functionality.

FIG. 6 is a diagrammatic representation depicting a transaction flow for a data access request from a processor transmitted to a home cache coherency controller.

FIG. 7 is a diagrammatic representation showing a transaction flow for speculative probing at a request cluster.

FIG. 8 is a process flow diagram depicting the handling of intervening requests.

FIG. 9 is a diagrammatic representation showing a transaction flow for speculative probing with delayed request.

FIG. 10 is a process flow diagram depicting the determination of whether a data access request can complete locally.

FIG. 11 is a diagrammatic representation showing a transaction flow for speculative probing with early request.

FIG. 12 is a process flow diagram depicting the maintenance of transaction information.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Multi-processor architectures having point-to-point communication among their processors are suitable for implementing specific embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. Well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
Techniques are provided for increasing data access efficiency in a multiple processor, multiple cluster system. In a point-to-point architecture, a cluster of processors includes multiple processors directly connected to each other through point-to-point links. By using point-to-point links instead of a conventional shared bus or external network, multiple processors are used efficiently in a system sharing the same memory space. Processing and network efficiency are also improved by avoiding many of the bandwidth and latency limitations of conventional bus and external network based multiprocessor architectures. According to various embodiments, however, linearly increasing the number of processors in a point-to-point architecture leads to an exponential increase in the number of links used to connect the multiple processors. In order to reduce the number of links used and to further modularize a multiprocessor system using a point-to-point architecture, multiple clusters are used.
According to various embodiments, the multiple processor clusters are interconnected using a point-to-point architecture. Each cluster of processors includes a cache coherence controller used to handle communications between clusters. In one embodiment, the point-to-point architecture used to connect processors is used to connect clusters as well.
By using a cache coherence controller, multiple cluster systems can be built using processors that may not necessarily support multiple clusters. Such a multiple cluster system can be built by using a cache coherence controller to represent non-local nodes in local transactions so that local nodes do not need to be aware of the existence of nodes outside of the local cluster. More detail on the cache coherence controller will be provided below.
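One way to picture the controller standing in for remote nodes is the sketch below. This is an illustration of the passage above rather than the patent's implementation; the message shapes and the aggregation rule are invented for the example.

class ProxyCoherenceController:
    """Appears to local nodes as a single node that answers on behalf of
    every node outside the local cluster (illustrative sketch)."""

    def __init__(self, remote_controllers):
        self.remote_controllers = remote_controllers  # controllers in other clusters

    def on_local_probe(self, probe):
        # A local node probes what it believes is just another local node.
        # The controller fans the probe out to the remote clusters and folds
        # their answers into a single response, so local nodes never need to
        # know that remote nodes exist.
        responses = [ctrl.probe(probe) for ctrl in self.remote_controllers]
        return self.aggregate(responses)

    @staticmethod
    def aggregate(responses):
        # Collapse remote answers into one response, as if from one node.
        return {"dirty_copy_found": any(r.get("dirty_copy_found") for r in responses)}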
In a single cluster system, cache coherency can be maintained by sending all data access requests through a serialization point. Any mechanism for ordering data access requests is referred to herein as a serialization point. One example of a serialization point is a memory controller. Various processors in the single cluster system send data access requests to the memory controller. The memory controller can be configured to serialize the data access requests so that only one data access request for a given memory line is allowed at any particular time. If another processor attempts to access the same memory line, the data access attempt is blocked until the memory line is unlocked. The memory controller allows cache coherency to be maintained in a multiple processor, single cluster system.
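A minimal sketch of such a serialization point follows. It models a memory controller that admits one request per memory line and queues the rest until the line is unlocked; the data structures and method names are assumptions made for the example, not details from the patent.

from collections import defaultdict, deque

class MemoryController:
    """Serialization point: one active request per memory line (sketch)."""

    def __init__(self):
        self.locked = set()                  # memory lines with a request in flight
        self.waiting = defaultdict(deque)    # memory line -> queued requests

    def submit(self, request):
        line = request["line"]
        if line in self.locked:
            # Another access to this line is outstanding; block this one.
            self.waiting[line].append(request)
            return False
        self.locked.add(line)
        return True                          # caller may proceed

    def complete(self, line):
        # Unlock the line, or hand it directly to the next queued request.
        if self.waiting[line]:
            return self.waiting[line].popleft()
        self.locked.discard(line)
        return None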
A serialization point can also be used in a multiple processor, multiple cluster system where the processors in the various clusters share a single address space. By using a single address space, internal point-to-point links can be used to significantly improve intercluster communication over traditional external network based multiple cluster systems. Various processors in various clusters send data access requests to a memory controller associated with a particular cluster such as a home cluster. The memory controller can similarly serialize all data requests from the different clusters. However, a serialization point in a multiple processor, multiple cluster system may not be as efficient as a serialization point in a multiple processor, single cluster system. That is, delay resulting from factors such as latency from transmitting between clusters can adversely affect the response times for various data access requests. It should be noted that delay also results from the use of probes in a multiple processor environment.

Although delay in intercluster transactions in an architecture using a shared memory space is significantly less than the delay in conventional message passing environments using external networks such as Ethernet or token ring, even minimal delay is a significant factor. In some applications, there may be millions of data access requests from a processor in a single second. Any delay can adversely impact processor performance.
According to various embodiments, speculative probing is used to increase the efficiency of accessing data in a multiple processor, multiple cluster system. A mechanism for eliciting a response from a node to maintain cache coherency in a system is referred to herein as a probe. In one example, a mechanism for snooping a cache is referred to as a probe. A response to a probe can be directed to the source or target of the initiating request. Any mechanism for sending probes to nodes associated with cache blocks before a request associated with the probes is received at a serialization point is referred to herein as speculative probing.
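The distinction is purely one of ordering, which the schematic Python below tries to make explicit; the function and parameter names are invented for the illustration.

def conventional_access(request, serialization_point, local_nodes):
    # Conventional ordering: the request is serialized first, and probes
    # are sent only after the serialization point has seen it.
    serialization_point.submit(request)
    return [node.probe(request["line"]) for node in local_nodes]

def speculative_access(request, serialization_point, local_nodes):
    # Speculative probing: probes are sent to the nodes associated with the
    # cache block before the request reaches the serialization point.
    early_responses = [node.probe(request["line"]) for node in local_nodes]
    serialization_point.submit(request)
    return early_responses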
Techniques of the present invention recognize that the reordering or elimination of certain data access requests does not adversely affect cache coherency. That is, the end value in the cache is the same whether or not snooping occurs. For example, a local processor attempting to read the cached data block can be allowed to access the data block without sending the request through a serialization point in certain circumstances. In one example, read access can be permitted when the cache block is valid and the associated memory line is not locked. The techniques of the present invention provide mechanisms for determining when speculative probing can be performed and also provide mechanisms for determining when speculative probing can be completed without sending a request through a serialization point. Speculative probing will be described in greater detail below. By completing a data access transaction within a local cluster, the delay associated with transactions in a multiple cluster system can be reduced or eliminated.
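A sketch of the local-completion test implied by this passage is given below. It combines the stated rule (a read may complete locally when the cached block is valid and the memory line is not locked) with the cache-state checks suggested by the flow chart of FIG. 10; the exact policy in the patent may differ, and the state names simply follow the usual MOESI convention.

VALID_STATES = {"shared", "owned", "exclusive", "modified"}

def can_complete_locally(access_type, cache_state, line_locked):
    """Return True if the request can be satisfied inside the local cluster."""
    if line_locked or cache_state not in VALID_STATES:
        return False
    if access_type == "read":
        # Any valid copy can service a read without leaving the cluster.
        return True
    # Assumed here: a write needs an exclusive or modified copy, since a
    # shared or owned copy would require invalidating other clusters first.
    return cache_state in {"exclusive", "modified"}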
To allow even more efficient speculative probing, the techniques of the present invention also provide mechanisms for handling transactions that may result from speculatively probing a local node before locking a particular memory line. In one example, a cache coherence protocol used in a point-to-point architecture may not allow for speculative probing. Nonetheless, mechanisms are provided to allow various nodes such as processors and memory controllers to continue operations within the cache coherence protocol without knowing that any protocol variations have occurred.
FIG. 1A is a diagrammatic representation of one example of a multiple cluster, multiple processor system that can use the techniques of the present invention. Each processing cluster 101, 103, 105, and 107 can include a plurality of processors. The processing clusters 101, 103, 105, and 107 are connected to each other through point-to-point links 111a-f. In one embodiment, the multiple processors in the multiple cluster architecture shown in FIG. 1A share the same memory space. In this example, the point-to-point links 111a-f are internal system connections that are used in place of a traditional front-side bus to connect the multiple processors in the multiple clusters 101, 103, 105, and 107. The point-to-point links may support any point-to-point coherence protocol.
FIG. 1B is a diagrammatic representation of another example of a multiple cluster, multiple processor system that can use the techniques of the present invention. Each processing cluster 121, 123, 125, and 127 can be coupled to a switch 131 through point-to-point links 141a-d. It should be noted that using a switch and point-to-point links allows implementation with fewer point-to-point links when connecting multiple clusters in the system. A switch 131 can include a processor with a coherence protocol interface. According to various implementations, a multicluster system shown in FIG. 1A is expanded using a switch 131 as shown in FIG. 1B.
FIG. 2 is a diagrammatic representation of a multiple processor cluster, such as the cluster 101 shown in FIG. 1A. Cluster 200 includes processors 202a-202d, one or more Basic I/O systems (BIOS) 204, a memory subsystem comprising memory banks 206a-206d, point-to-point communication links 208a-208e, and a service processor 212. The point-to-point communication links are configured to allow interconnections between processors 202a-202d, I/O switch 210, and cache coherence controller 230. The service processor 212 is configured to allow communications with processors 202a-202d, I/O switch 210, and cache coherence controller 230 via a JTAG interface represented in FIG. 2 by links 214a-214f. It should be noted that other interfaces are supported. I/O switch 210 connects the rest of the system to I/O adapters 216 and 220.
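Purely as a reading aid, the components recited above can be collected into a small configuration structure; the field names below are invented and only mirror the reference numerals of FIG. 2.

CLUSTER_200 = {
    "processors": ["202a", "202b", "202c", "202d"],
    "bios": "204",
    "memory_banks": ["206a", "206b", "206c", "206d"],
    "point_to_point_links": ["208a", "208b", "208c", "208d", "208e"],
    "service_processor": "212",
    "jtag_links": ["214a", "214b", "214c", "214d", "214e", "214f"],
    "io_switch": "210",
    "io_adapters": ["216", "220"],
    "cache_coherence_controller": "230",
}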
According to specific embodiments, the service processor of the present invention has the intelligence to partition system resources according to a previously specified partitioning schema. The partitioning can be achieved through direct manipulation of routing tables associated with the system processors by the service processor, which is made possible by the point-to-point communication infrastructure. The routing tables are used to control and isolate various system resources, the connections between which are defined therein. The service processor and computer system partitioning are described in patent application Ser. No. 09/932,456 titled Computer System Partitioning Using Data Transfer Routing Mechanism, filed on Aug. 16, 2001, the entirety of which is incorporated by reference for all purposes.
The processors 202a-d are also coupled to a cache coherence controller 230 through point-to-point links 232a-d. Any mechanism or apparatus that can be used to provide communication between multiple processor clusters while maintaining cache coherence is referred to herein as a cache coherence controller. The cache coherence controller 230 can be coupled to cache coherence controllers associated with other multiprocessor clusters. It should be noted that there can be more than one cache coherence controller in one cluster. The cache coherence controller 230 communicates with both processors 202a-d as well as remote clusters using a point-to-point protocol.
More generally, it should be understood that the specific architecture shown in FIG. 2 is merely exemplary and that embodiments of the present invention are contemplated having different configurations and resource interconnections, and a variety of alternatives for each of the system resources shown. However, for purpose of illustration, specific details of server 200 will be assumed. For example, most of the resources shown in FIG. 2 are assumed to reside on a single electronic assembly. In addition, memory banks 206a-206d may comprise double data rate (DDR) memory which is physically provided as dual in-line memory modules (DIMMs). I/O adapter 216 may be, for example, an ultra direct memory access (UDMA) controller or a small computer system interface (SCSI) controller which provides access to a permanent storage device. I/O adapter 220 may be an Ethernet card adapted to provide communications with a network such as, for example, a local area network (LAN) or the Internet.
According to a specific embodiment and as shown in FIG. 2, both of I/O adapters 216 and 220 provide symmetric I/O access. That is, each provides access to equivalent sets of I/O. As will be understood, such a configuration would facilitate a partitioning scheme in which multiple partitions have access to the same types of I/O. However, it should also be understood that embodiments are envisioned in which partitions without I/O are created. For example, a partition including one or more processors and associated memory resources, i.e., a memory complex, could be created for the purpose of testing the memory complex.
According to one embodiment, service processor 212 is a Motorola MPC855T microprocessor which includes integrated chipset functions. The cache coherence controller 230 can be an Application Specific Integrated Circuit (ASIC) supporting the local point-to-point coherence protocol. The cache coherence controller 230 can also be configured to handle a non-coherent protocol to allow communication with I/O devices. In one embodiment, the cache coherence controller 230 is a specially configured programmable chip such as a programmable logic device or a field programmable gate array.
FIG. 3 is a diagrammatic representation of one example of a cache coherence controller 230. The cache coherence controller can include a protocol engine 305 configured to handle packets such as probes and requests received from processors in various clusters of a multiprocessor system. The functionality of the protocol engine 305 can be partitioned across several engines to improve performance. In one example, partitioning can be done based on individual transaction flows, packet type (request, probe, and response), direction (incoming and outgoing), or transaction flow (request flows, probe flows, etc.).
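The partitioning options listed above can be pictured as a dispatch stage in front of several protocol engines. The sketch below partitions by packet type; partitioning by direction or by transaction flow would only change the dispatch key. The names are illustrative, not the patent's.

class PartitionedProtocolEngine:
    """Dispatches packets to per-packet-type engines (illustrative sketch)."""

    def __init__(self, request_engine, probe_engine, response_engine):
        self.engines = {
            "request": request_engine,
            "probe": probe_engine,
            "response": response_engine,
        }

    def handle(self, packet):
        # The key could instead be packet["direction"] or a transaction-flow id.
        return self.engines[packet["type"]].handle(packet)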
The protocol engine 305 has access to a pending buffer 309 that allows the cache coherence controller to track transactions such as recent requests and probes and associate the transactions with specific processors. Transaction information maintained in the pending buffer 309 can include transaction destination nodes, the addresses of req
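The available text is cut off at this point, but the role assigned to the pending buffer (tracking recent requests and probes and associating them with specific processors and destination nodes) can be sketched as follows. The record fields are guesses based only on the sentences above.

class PendingBuffer:
    """Tracks outstanding transactions for the coherence controller (sketch)."""

    def __init__(self):
        self.entries = {}                        # transaction id -> record

    def track(self, txn_id, source_processor, destination_node, address):
        self.entries[txn_id] = {
            "source": source_processor,          # processor that issued the request
            "destination": destination_node,     # node the transaction targets
            "address": address,                  # memory line being accessed
            "responses": [],                     # probe responses gathered so far
        }

    def add_response(self, txn_id, response):
        self.entries[txn_id]["responses"].append(response)

    def retire(self, txn_id):
        # Remove the transaction once all expected responses have arrived.
        return self.entries.pop(txn_id, None)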
