United States Patent [19]
Hagersten et al.

US005754877A

[11] Patent Number: 5,754,877
[45] Date of Patent: May 19, 1998

[54] EXTENDED SYMMETRICAL MULTIPROCESSOR ARCHITECTURE

[75] Inventors: Erik E. Hagersten, Palo Alto, Calif.; Mark D. Hill, Madison, Wis.

[73] Assignee: Sun Microsystems, Inc., Palo Alto, Calif.

[21] Appl. No.: 675,363

[22] Filed: Jul. 2, 1996

[51] Int. Cl.6 .................. G06F 15/163
[52] U.S. Cl. ................... 395/800.29; 395/200.81; 395/200.73
[58] Field of Search ........... 395/200.68, 200.73, 200.81, 800.29

`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4/1989 Kn'ngs ....................................... .. 371/9
`4,819,232
`7/1996 Peavey et a1.
`.
`5,533,103
`5.579.512 11/1996 Goodrum et a1. .................... .. 395/500
`5,590,335 12/1996 Dubourreau et a].
`395/704
`5,608,893
`3/1997 Slingwine etal.
`395/468
`5,655,103
`8/1997 Cheng et a1. ........................ .. 395/479
`
OTHER PUBLICATIONS

Cox et al., "Adaptive Cache Coherency for Detecting Migratory Shared Data," Proc. 20th Annual Symposium on Computer Architecture, May 1993, pp. 98-108.
Stenstrom et al., "An Adaptive Cache Coherence Protocol Optimized for Migratory Sharing," Proc. 20th Annual Symposium on Computer Architecture, May 1993, IEEE, pp. 109-118.
Wolf-Dietrich Weber et al., "Analysis of Cache Invalidation Patterns in Multiprocessors," Computer Systems Laboratory, Stanford University, CA, pp. 243-256.
Kourosh et al., "Two Techniques to Enhance the Performance of Memory Consistency Models," 1991 International Conference on Parallel Processing, pp. 1-10.
Li et al., "Memory Coherence in Shared Virtual Memory Systems," 1986 ACM, pp. 229-239.
D. Lenosky, PhD, "The Description and Analysis of DASH: A Scalable Directory-Based Multiprocessor," DASH Prototype System, Dec. 1991, pp. 36-56.
Hagersten et al., "Simple COMA Node Implementations," Ashley Saulsbury and Anders Landin, Swedish Institute of Computer Science, 12 pages.
Saulsbury et al., "An Argument for Simple COMA," Swedish Institute of Computer Science, 10 pages.
Hagersten et al., "Simple COMA," Ashley Saulsbury and Anders Landin, Swedish Institute of Computer Science, Jul. 1993, pp. 233-259.

Primary Examiner-William M. Treat
Attorney, Agent, or Firm-Conley, Rose & Tayon; B. Noel Kivlin

[57] ABSTRACT
An architecture for an extended multiprocessor (XMP) computer system is provided. The XMP computer system includes multiple SMP nodes. Each SMP node includes an XMP interface and a repeater structure coupled to the XMP interface. The SMP nodes are connected to each other by unidirectional point-to-point links. The repeater structure in each SMP node includes an upper level bus and one or more transaction repeaters coupled to the upper level bus. Each transaction repeater broadcasts transactions to bus devices attached to a lower level bus, wherein each transaction repeater is coupled to a separate lower level bus. Each transaction repeater includes a queue and a bypass path. Transactions originating in a particular SMP node are stored in the queue, whereas transactions originating in other SMP nodes bypass the incoming queue to the bus device. Multiple transactions may be simultaneously broadcast across the point-to-point link connections between the SMP nodes. However, transactions are broadcast to the SMP nodes in a defined, uniform order. A control signal is asserted by the XMP interface so that a transaction is received by bus devices in the originating node from the incoming queues at the same time and in the same order it is received by bus devices in non-originating nodes. Thus a hierarchical bus structure is provided that overcomes the physical and electrical limitations of a single bus architecture while maximizing bus bandwidth utilization.

16 Claims, 10 Drawing Sheets

[Title page drawing: block diagram of the extended multiprocessor system, showing the XMP interface of an SMP node coupled through repeaters to lower level buses, with processor/memory bus devices (each with an MTAG) and an I/O device.]

[Sheet 1 of 10, FIG. 1: symmetric multiprocessor computer system employing a conventional hierarchical bus structure.]
[Sheet 2 of 10, FIG. 2: timing diagram illustrating the operation of the computer system of FIG. 1.]
[Sheet 3 of 10, FIG. 3: symmetric multiprocessor computer system employing a hierarchical bus structure according to one embodiment of the invention (repeaters 34A, 34B; L1 buses 32A, 32B; processor/memory devices).]
[Sheet 4 of 10, FIG. 4: timing diagram illustrating the operation of the computer system of FIG. 3.]
[Sheet 5 of 10, FIG. 5: block diagram of a processor/memory bus device.]
[Sheet 6 of 10, FIG. 6: block diagram of an I/O bridge bus device.]
[Sheet 7 of 10, FIG. 7: block diagram of an extended symmetric multiprocessor computer system.]
[Sheet 8 of 10, FIG. 8: block diagram of an SMP node of the extended symmetric multiprocessor computer system of FIG. 7.]
[Sheet 9 of 10, FIG. 9: diagram of the different addressing modes (replicate, migrate, normal).]
[Sheet 10 of 10, FIG. 10: timing diagram illustrating the operation of the extended symmetric multiprocessor computer system of FIG. 7.]

EXTENDED SYMMETRICAL MULTIPROCESSOR ARCHITECTURE

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the field of multiprocessor computer systems and, more particularly, to the architectural connection of multiple processors within a multiprocessor computer system.

2. Description of the Relevant Art

Multiprocessing computer systems include two or more processors which may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operation is performed in response to an instruction executed by the processor.

A popular architecture in commercial multiprocessing computer systems is the symmetric multiprocessor (SMP) architecture. Typically, an SMP computer system comprises multiple processors connected through a cache hierarchy to a shared bus. Additionally connected to the bus is a memory, which is shared among the processors in the system. Access to any particular memory location within the memory occurs in a similar amount of time as access to any other particular memory location. Since each location in the memory may be accessed in a uniform manner, this structure is often referred to as a uniform memory architecture (UMA).

Processors are often configured with internal caches, and one or more caches are typically included in the cache hierarchy between the processors and the shared bus in an SMP computer system. Multiple copies of data residing at a particular main memory address may be stored in these caches. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared bus computer systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches which are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory. For shared bus systems, a snoop bus protocol is typically employed. Each coherent transaction performed upon the shared bus is examined (or "snooped") against data in the caches. If a copy of the affected data is found, the state of the cache line containing the data may be updated in response to the coherent transaction.
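
The invalidate-based snooping just described can be illustrated with a short Python sketch. This is a loose model for exposition only, not anything taken from the patent: the class names, the M/S/I state encoding, and the write-through simplification are all assumptions.

    class SnoopyCache:
        """Toy model of one cache on a shared snooping bus (illustrative only)."""

        def __init__(self, name, bus):
            self.name = name
            self.lines = {}   # address -> (state, value); 'M' modified, 'S' shared, 'I' invalid
            self.bus = bus
            bus.attach(self)

        def read(self, addr):
            state, value = self.lines.get(addr, ('I', None))
            if state == 'I':                     # miss: fetch the current copy over the bus
                value = self.bus.memory[addr]
                self.lines[addr] = ('S', value)
            return value

        def write(self, addr, value):
            self.bus.broadcast('invalidate', addr, origin=self)  # coherent transaction, snooped by all
            self.lines[addr] = ('M', value)
            self.bus.memory[addr] = value        # write-through, for simplicity

        def snoop(self, command, addr):
            if command == 'invalidate' and addr in self.lines:
                _, value = self.lines[addr]
                self.lines[addr] = ('I', value)  # drop the stale copy

    class Bus:
        def __init__(self, memory):
            self.memory = memory
            self.caches = []

        def attach(self, cache):
            self.caches.append(cache)

        def broadcast(self, command, addr, origin):
            for cache in self.caches:
                if cache is not origin:          # every other cache snoops the transaction
                    cache.snoop(command, addr)

    bus = Bus(memory={0x40: 1})
    a, b = SnoopyCache('A', bus), SnoopyCache('B', bus)
    assert a.read(0x40) == 1 and b.read(0x40) == 1   # both hold shared copies
    a.write(0x40, 2)                                 # snooped invalidate removes B's copy
    assert b.read(0x40) == 2                         # B re-fetches the updated value

When cache A writes, the snooped invalidate removes B's stale copy, so B's next read re-fetches the updated value, which corresponds to the invalidation alternative described above.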
Unfortunately, shared bus architectures suffer from several drawbacks which limit their usefulness in multiprocessing computer systems. A bus is capable of a peak bandwidth (e.g. a number of bytes/second which may be transferred across the bus). As additional processors are attached to the bus, the bandwidth required to supply the processors with data and instructions may exceed the peak bus bandwidth. Since some processors are forced to wait for available bus bandwidth, performance of the computer system suffers when the bandwidth requirements of the processors exceed available bus bandwidth.

Additionally, adding more processors to a shared bus increases the capacitive loading on the bus and may even cause the physical length of the bus to be increased. The increased capacitive loading and extended bus length increase the delay in propagating a signal across the bus. Due to the increased propagation delay, transactions may take longer to perform. Therefore, the peak bandwidth of the bus may decrease as more processors are added.

These problems are further magnified by the continued increase in operating frequency and performance of processors. The increased performance enabled by the higher frequencies and more advanced processor microarchitectures results in higher bandwidth requirements than previous processor generations, even for the same number of processors. Therefore, buses which previously provided sufficient bandwidth for a multiprocessing computer system may be insufficient for a similar computer system employing the higher performance processors.
A common way to address the problems incurred as more processors and devices are added to a shared bus system is to have a hierarchy of buses. In a hierarchical shared bus system, the processors and other bus devices are divided among several low level buses. These low level buses are connected by one or more high level buses. Transactions are originated on a low level bus, transmitted to the high level bus, and then driven back down to all the low level buses by repeaters. Thus, all the bus devices see the transaction at the same time and transactions remain ordered. The hierarchical shared bus logically appears as one large shared bus to all the devices. Additionally, the hierarchical structure overcomes the electrical constraints of a single large shared bus.

However, one problem with the above hierarchical shared bus structure is that transactions are always broadcast twice on the originating low level bus. This inefficiency can severely limit the available bandwidth on the low level buses. A possible solution would be to have separate unidirectional buses for transactions on the way up to higher levels of the bus hierarchy and for transactions on the way down from higher levels of the bus hierarchy. But this solution requires double the amount of bus signals and double the amount of pins on bus device packages. Obviously the solution imposes serious physical problems.
An example of an SMP computer system employing a traditional hierarchical bus structure is illustrated in FIG. 1. A two-level bus structure is shown. Bus devices 8A-B are connected to lower level L1.1 bus 4A and bus devices 8C-D are connected to lower level L1.2 bus 4B. The bus devices may be any local bus type devices found in modern computer systems, such as a processor/memory device or an I/O bridge device. Each separate L1 bus 4A-B is coupled to an upper level L2 bus 2 by a repeater 6A-B. Together, each repeater, L1 bus, and bus device group form a repeater node 5. For example, repeater 6A, L1 bus 4A, and bus devices 8A-B comprise repeater node 5A.

When a bus transaction (such as a memory read) is initiated by a bus device, the transaction is transmitted from the originating L1 bus (4A or 4B) to the L2 bus 2. The transaction is then simultaneously broadcast back to both L1 buses 4A-B by their respective repeaters 6A-B. In this manner the transaction is seen by all bus devices 8 at the same time. Furthermore, the hierarchical structure of FIG. 1 ensures that bus transactions appear to all bus devices 8 in the same order. Thus, the hierarchical bus structure logically appears to the bus devices 8A-D as a single shared bus.

The operation of the computer system of FIG. 1 may be illustrated by timing diagram 12 as shown in FIG. 2. Each column of timing diagram 12 corresponds to a particular bus cycle. Eleven bus cycles increasing in time from left to right are represented by the eleven columns. The state of the L2 bus 2, L1.1 bus 4A, and L1.2 bus 4B is shown for each bus cycle according to rows 14-16 respectively.

During bus cycle 1, an outgoing packet (address and command) is driven by one of the bus devices 8 on the L1 bus 4 in each repeater node 5. In timing diagram 12, these outgoing packets are shown as P1(o) on the L1.1 bus 4A and P2(o) on the L1.2 bus 4B. Since two different bus transactions were issued during the same cycle, the order in which they appear on the L2 bus 2 depends upon the arbitration scheme chosen. For the embodiment illustrated in timing diagram 12, the transaction issued on the L1.1 bus 4A is transmitted to the L2 bus 2 first, as represented by P1 on the L2 bus in bus cycle 2. Transaction P2(o) is queued in its respective repeater 6B. Also during bus cycle 2, two new transactions are issued on the lower level buses 4, represented by outgoing bus transactions P3(o) and P4(o) on the L1.1 bus 4A and L1.2 bus 4B respectively.

During bus cycle 3, transaction P1 is broadcast as an incoming transaction on the L1 buses 4 of both repeater nodes 5, as represented by P1(i) on rows 15 and 16. Also during bus cycle 3, the second outgoing transaction P2(o) from bus cycle 1 broadcasts on the L2 bus 2, as shown in row 14 of timing diagram 12.

During bus cycle 4, transaction P2 is broadcast as an incoming transaction on the L1 buses 4, as represented by P2(i) on rows 15 and 16. Also during bus cycle 4, outgoing transaction P3(o) broadcasts on the L2 bus 2 as transaction P3, as shown in row 14 of timing diagram 12. Similarly, bus transactions P3 and P4 are broadcast to the L1 buses during bus cycles 5 and 6. Because the L1 bus bandwidth is consumed with repeater broadcasts of incoming transactions, new outgoing transactions cannot be issued until bus cycle 7. As a result, the full bandwidth of the L2 bus 2 is not utilized, as illustrated by the gap on row 14 during bus cycles 6 and 7.
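
The bandwidth loss just described can be reproduced with a small cycle-by-cycle simulation. The Python sketch below is an illustration only: it assumes the same fixed arbitration as timing diagram 12 (the L1.1 repeater wins ties) and models the prior-art rule that every transaction is driven back onto its originating L1 bus, so a lower level bus can accept a new outgoing packet only in cycles when no incoming broadcast occupies it.

    from collections import deque

    # Outgoing packets waiting on each lower level bus, as in timing diagram 12:
    pending = {"L1.1": deque(["P1", "P3"]), "L1.2": deque(["P2", "P4"])}
    l2_queue = deque()   # outgoing packets queued in the repeaters for the L2 bus
    rebroadcast = {}     # bus cycle -> packet the repeaters drive down to BOTH L1 buses

    for cycle in range(1, 12):
        l2_state = "-"
        if l2_queue:                          # one packet per cycle wins L2 arbitration
            pkt = l2_queue.popleft()
            l2_state = pkt
            rebroadcast[cycle + 1] = pkt      # repeaters drive it back down next cycle

        down = rebroadcast.get(cycle)
        l1_state = {}
        for bus in ("L1.1", "L1.2"):
            if down is not None:
                l1_state[bus] = down + "(i)"  # incoming broadcast occupies the bus
            elif pending[bus]:
                pkt = pending[bus].popleft()  # the bus is free: issue a new outgoing packet
                l1_state[bus] = pkt + "(o)"
                l2_queue.append(pkt)
            else:
                l1_state[bus] = "-"
        print(f"cycle {cycle:2}: L2={l2_state:3} L1.1={l1_state['L1.1']:6} L1.2={l1_state['L1.2']:6}")

The printed trace matches timing diagram 12: P1 through P4 occupy the L2 bus in cycles 2 through 5, incoming broadcasts occupy both L1 buses in cycles 3 through 6, and the L2 bus sits idle from cycle 6 onward even though the L1 buses could have supplied more traffic.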
For systems requiring a large number of processors, the above hierarchical bus structure may require many levels of hierarchy. The delay associated with broadcasting each transaction to the top of the hierarchy and back down, and the delay associated with bus arbitration, may severely limit the throughput of large hierarchical structures.

Another structure for multiprocessing computer systems is a distributed shared memory architecture. A distributed shared memory architecture includes multiple nodes within which processors and memory reside. The multiple nodes communicate via a network coupled therebetween. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address. Coherency activities may be generated via examination of the directories.

However, distributed shared memory architectures also have drawbacks. Directory lookups, address translations, and coherency maintenance all add latency to transactions between nodes. Also, distributed shared memory architecture systems normally require more complicated hardware than shared bus architectures.

It is apparent from the above discussion that a more efficient architecture for connecting a large number of devices in a multiprocessor system is desirable. The present invention addresses this need.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a computer system in accordance with the present invention.
Broadly speaking, the present invention contemplates a multiprocessor computer system including multiple repeater nodes interconnected by an upper level bus. Each repeater node includes multiple bus devices, a lower level bus, and an address repeater. The bus devices are interconnected on the lower level bus. The repeater couples the upper level bus to the lower level bus. The bus devices may be processor/memory devices, and each bus device includes an incoming queue. Processor/memory bus devices include a high performance processor such as a SPARC processor, DRAM memory, and a high speed second level cache memory. The physical DRAM memory located on each bus device collectively comprises the system memory for the multiprocessor computer system. Also, bus devices may be input/output bus devices. I/O devices also include an incoming queue. Furthermore, input/output bus devices may include an I/O bus bridge that supports a peripheral I/O bus such as the PCI bus. This peripheral I/O bus allows communication with I/O devices, such as graphics controllers, serial and parallel ports, and disk drives.

The bus devices communicate with each other by sending and receiving bus transactions. A bus transaction initiated by one bus device is broadcast as an outgoing transaction on the lower level bus to which the initiating bus device is attached. Each other bus device attached to the same lower level bus stores this outgoing transaction in its respective incoming queue. Also, the repeater attached to this lower level bus broadcasts the outgoing transaction to the upper level bus. The repeaters in each of the other repeater nodes receive this outgoing transaction and repeat it as an incoming transaction on their respective lower level buses. The repeater in the originating repeater node does not repeat the outgoing bus transaction as an incoming bus transaction on its lower level bus. Instead, when the other repeaters drive the outgoing transaction as incoming transactions on their respective lower level buses, the repeater in the originating repeater node asserts a control signal that alerts each bus device in the originating repeater node to treat the packet stored at the head of its incoming queue as the current incoming transaction. The repeaters in the nonoriginating repeater nodes assert control signals to the bus devices on their respective lower level buses indicating that those bus devices should bypass their incoming queues and receive the incoming transaction broadcast on their lower level buses. Storing the outgoing transaction in the incoming bus device queues in the originating repeater node frees up the lower level bus in the originating repeater node to broadcast another outgoing transaction while the first transaction is being broadcast on the lower level buses in the nonoriginating repeater nodes. Therefore, maximum utilization of the lower level bus bandwidth is achieved.
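
As a sketch of the queue-versus-bypass rule described above (an illustration under simplifying assumptions, with hypothetical names, not the patent's implementation), each bus device can be modeled as a FIFO incoming queue plus a delivery function steered by the repeater's control signal:

    from collections import deque

    class BusDevice:
        def __init__(self, name):
            self.name = name
            self.incoming_queue = deque()   # outgoing transactions seen on the local lower level bus
            self.received = []              # transactions delivered to this device, in order

        def observe_outgoing(self, txn):
            self.incoming_queue.append(txn)    # every device queues local outgoing packets

        def deliver(self, queue_signal, bus_txn=None):
            if queue_signal:                   # originating node: take the head of the queue
                self.received.append(self.incoming_queue.popleft())
            else:                              # non-originating node: bypass the queue
                self.received.append(bus_txn)

    class RepeaterNode:
        def __init__(self, name, n_devices):
            self.devices = [BusDevice(f"{name}.{i}") for i in range(n_devices)]

        def issue(self, txn):
            for d in self.devices:             # stored locally, not yet treated as incoming
                d.observe_outgoing(txn)
            return txn                         # forwarded up to the upper level bus

    def broadcast(nodes, origin, txn):
        """Upper level bus drives txn everywhere in the same cycle."""
        for node in nodes:
            for d in node.devices:
                d.deliver(queue_signal=(node is origin), bus_txn=txn)

    a, b = RepeaterNode("A", 2), RepeaterNode("B", 2)
    t1 = a.issue("T1")        # T1 is queued in A's devices and sent to the upper level bus
    broadcast([a, b], a, t1)  # B's devices bypass their queues; A's devices pop theirs
    assert all(d.received == ["T1"] for n in (a, b) for d in n.devices)

The final assertion captures the key property: every device, in the originating node and elsewhere, receives T1 exactly once and in the same position in its delivery order.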
Generally speaking, every bus device on a given lower level bus stores all outgoing transactions that appear on that lower level bus in its incoming queue. Outgoing transactions are broadcast by the repeater to the upper level bus in the same order that they appear on the lower level bus. The repeater for each repeater node drives transactions appearing on the upper level bus as incoming packets on the lower level bus only when those transactions are incoming transactions from another repeater node. In this manner, all bus devices in the computer system see each particular transaction at the same time and in the same order. Also, each bus transaction appears only once on each bus. Thus, the hierarchical bus structure of the present invention appears as a single large, logically shared bus to all the bus devices in the multiprocessor computer system.

Another embodiment of the present invention contemplates an extended multiprocessor computer architecture.

`
Several multiprocessor nodes are interconnected with unidirectional point-to-point link connections. Each multiprocessor node includes a top level interface device for interfacing to these point-to-point link connections. Each node also includes an upper level bus which couples the top level interface to one or more repeaters. Each repeater is also coupled to a separate lower level bus in a fashion similar to that described for the embodiment above. One or more bus devices are attached to each lower level bus.

Each repeater in a given multiprocessor node includes an internal queue and a bypass path. Each repeater also receives control signals from the top level interface. The control signals are used to select either the bypass path or the queue for transmitting transactions from the upper level bus to the lower level bus. Transactions originating within a given repeater node are stored in the queue, whereas transactions incoming from another multiprocessor node are transmitted to the lower level bus via the bypass path. The point-to-point linking structure between top level interfaces of the multiprocessor nodes allows transactions to be communicated simultaneously between each multiprocessor node. Therefore, no arbitration delay is associated with these top level communications. Transaction ordering is maintained on this top level interface by following a strict defined transaction order. Any order may be chosen, but a specific defined order must be consistently used. For example, in a system comprising three nodes, node A, node B, and node C, one such ordering may be that transactions originating from node A take priority over transactions originating from node B, and transactions originating from node B take priority over transactions originating from node C. This defined order indicates the order in which transactions communicated on the top level point-to-point link structure will be transmitted to the repeaters in each multiprocessor node. Transactions broadcast on the upper level bus of nonoriginating repeater nodes are further transmitted by the bypass path to the lower level buses in those nodes. However, the same transaction is not broadcast to the upper level bus in the originating repeater node. Instead, the control signal is asserted to the repeaters indicating that the transaction is to be broadcast to the lower level buses from the repeater queues. This allows the upper level bus in the originating node to remain free for broadcasting of new transactions.

From the operation described above for the extended multiprocessor computer system, it can be seen that bus transactions broadcast between multiprocessor nodes appear only once on each upper level bus and lower level bus of each multiprocessor node. This allows maximum bus bandwidth to be utilized. Furthermore, the strict defined ordering for the top level point-to-point connections ensures that an ordered transaction broadcast will always occur and that each bus device in the system will see each transaction at the same time and in the same order.
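
One way to read the fixed ordering rule is as a deterministic merge applied identically by every node's top level interface, so that simultaneously arriving transactions are handed to the repeaters in one global order. The Python fragment below is an illustrative interpretation; the priority table and field names are assumptions, not the patent's:

    NODE_PRIORITY = {"A": 0, "B": 1, "C": 2}   # assumed fixed order: A before B before C

    def merge_order(simultaneous):
        """Order transactions that arrived over the point-to-point links in the same
        cycle. Every interface applies the same rule, so every node hands the same
        sequence to its repeaters."""
        return sorted(simultaneous, key=lambda txn: NODE_PRIORITY[txn["origin"]])

    # Three transactions sent simultaneously, one from each node:
    arrivals = [{"origin": "C", "addr": 0x100},
                {"origin": "A", "addr": 0x200},
                {"origin": "B", "addr": 0x300}]
    for txn in merge_order(arrivals):
        print(f"broadcast to repeaters: node {txn['origin']}, addr {txn['addr']:#x}")
    # -> A, then B, then C at every node, preserving a single global order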
Each bus device may contain memory. The memory located on each bus device collectively forms the system memory for the extended multiprocessor computer system. The memory is split into different regions such that each multiprocessor node is assigned one portion of the total address space. The size of each address space portion is inversely proportional to the number of multiprocessor nodes comprising the extended multiprocessor computer system. For example, if there are three nodes, each node is assigned one-third of the address space.

In order to maintain memory coherency between each node, each cache line in the system memory is tagged with a coherency state for that node. These coherency state tags are referred to as an MTAG. When a bus device in a particular node initiates a transaction, the MTAG in that node is examined to determine if that node has valid access rights for that transaction address. If the retrieved MTAG indicates proper access rights, then the completed transaction is valid. Otherwise, the transaction must be reissued globally to the other nodes.

In another embodiment of the extended multiprocessor computer system of the present invention, different regions of the system memory address space may be assigned to operate in one of three modes. The three modes are the replicate mode, the migrate mode, and the normal mode. For memory regions operating in the normal mode, all memory transactions are attempted in the originating multiprocessor node without sending global transactions. Transactions are only sent globally if the MTAG indicates improper access rights or if the address corresponds to a memory region mapped to another multiprocessor node.

In the replicate mode, the replicate memory region is mapped to memory located in each multiprocessor node, such that a duplicate copy of the memory region is stored in each node. Therefore, replicate mode transactions are always attempted locally in the originating multiprocessor node. Transactions are only sent globally in replicate mode if the MTAG indicates improper access rights. In migrate mode, transactions are always sent globally the first time. Therefore, there is no need to maintain the MTAG coherency states.
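
The MTAG check, the three region modes, and the address-space split can be summarized in a short dispatch sketch. The Python below is a loose illustration under stated assumptions (a 64-byte coherency unit, an M/S/I encoding in which writes require 'M' and reads accept 'M' or 'S', and an even address-range split); none of these specifics come from the patent:

    N_NODES = 3                                # each node owns 1/3 of the address space

    def home_node(addr, span=1 << 30):
        """Map an address to the node whose assigned memory region contains it."""
        return addr // (span // N_NODES)

    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.mtag = {}                     # coherency unit -> 'M', 'S', or 'I'

        def access_ok(self, line, is_write):
            state = self.mtag.get(line, 'I')
            return state == 'M' if is_write else state in ('M', 'S')

        def issue(self, addr, is_write, mode):
            line = addr >> 6                   # assumed 64-byte coherency unit
            if mode == 'migrate':
                return 'global'                # always sent globally first; no MTAG kept
            if mode == 'normal' and home_node(addr) != self.node_id:
                return 'global'                # address mapped to another node's region
            # replicate regions are mirrored in every node, so always try locally first
            if self.access_ok(line, is_write):
                return 'local'                 # the completed local transaction is valid
            return 'global'                    # MTAG showed improper access rights: reissue

    node0 = Node(0)
    node0.mtag[0x1000 >> 6] = 'S'
    print(node0.issue(0x1000, is_write=False, mode='normal'))   # local  ('S' permits reads)
    print(node0.issue(0x1000, is_write=True,  mode='normal'))   # global ('S' forbids writes)
    print(node0.issue(0x1000, is_write=False, mode='migrate'))  # global (always, first time)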
BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a symmetric multiprocessor computer system employing a hierarchical bus structure.

FIG. 2 is a timing diagram illustrating the operation of the computer system of FIG. 1.

FIG. 3 is a block diagram of a symmetric multiprocessor computer system employing a hierarchical bus structure according to one embodiment of the present invention.

FIG. 4 is a timing diagram illustrating the operation of the computer system of FIG. 3.

FIG. 5 is a block diagram of a processor/memory bus device for one embodiment of the present invention.

FIG. 6 is a block diagram of an I/O bridge bus device according to one embodiment of the present invention.

FIG. 7 is a block diagram of an extended symmetric multiprocessor computer system according to one embodiment of the present invention.

FIG. 8 is a block diagram of an SMP node of the extended symmetric multiprocessor computer system of FIG. 7.

FIG. 9 is a diagram of different addressing modes employed in one embodiment of the present invention.

FIG. 10 is a timing diagram illustrating the operation of the extended symmetric multiprocessor computer system of FIG. 7.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the

`
spirit and scope of the present invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE INVENTION

computer system 20 shown in FIG. 3, a memory operation may include one or more transactions upon the L1 buses 32 and L2 bus 22. Bus transactions are broadcast as bit-encoded packets comprising an address, command, and source id. Other information may also be encoded in each packet, such as addressing modes or mask information.

Generally speaking, I/O operations are similar to memory operations except the destination is an I/O bus device. I/O devices are used to communicate with peripheral devices, such as serial ports or a floppy disk drive. For example, an I/O read operation may cause a transfer of data from I/O element 50 to a processor in processor/memory bus device 38D. Similarly, an I/O write operation may cause a transfer of data from a processor in bus device 38D to the I/O element 50 in bus device 38B. In the computer system 20 shown in FIG. 3, an I/O operation may include one or more transactions upon the L1 buses 32 and L2 bus 22.
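
Since transactions travel as bit-encoded packets carrying an address, a command, and a source id, the packet format can be illustrated with a toy encoder. The field widths below are assumptions chosen for the example; the patent does not specify an encoding:

    # Assumed field widths, illustrative only: 40-bit address, 6-bit command, 10-bit source id.
    ADDR_BITS, CMD_BITS, SRC_BITS = 40, 6, 10

    def encode_packet(addr, cmd, src_id):
        assert addr < 1 << ADDR_BITS and cmd < 1 << CMD_BITS and src_id < 1 << SRC_BITS
        return (addr << (CMD_BITS + SRC_BITS)) | (cmd << SRC_BITS) | src_id

    def decode_packet(packet):
        src_id = packet & ((1 << SRC_BITS) - 1)
        cmd = (packet >> SRC_BITS) & ((1 << CMD_BITS) - 1)
        addr = packet >> (CMD_BITS + SRC_BITS)
        return addr, cmd, src_id

    pkt = encode_packet(addr=0x12345678, cmd=0x2, src_id=38)   # e.g. a read issued by device 38
    assert decode_packet(pkt) == (0x12345678, 0x2, 38)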
The architecture of the computer system 20 in FIG. 3 may be better understood by tracing the flow of typical bus transactions. For example, a bus transaction initiated by processor/memory element 48 of bus device 38A is issued on outgoing interconnect path 44A. The transaction is seen as outgoing packet P1(o) on L1.1 bus 32A. Each bus device connected to L1.1 bus 32A, including the initiating bus device (38A in this example), stores the outgoing packet P1(o) in its incoming queue 40. Also, repeater 34A broadcasts the packet P1(o) onto the L2 bus 22, where it appears as packet P1. The repeaters in each of the non-originating repeater nodes 30 receive the packet P1 and drive it as an incoming packet P1(i) on their respective L1 buses 32. Since the embodiment illustrated in FIG. 3 only shows two repeater nodes 30, repeater 34B would receive packet P1 on the L2 bus 22 and drive it as incoming packet P1(i) on L1.2 bus 32B in the above example. It is important to note that repeater 34A on the node 30A from which the packet P1 originated as outgoing packet P1(o) does not drive packet P1 back down to L1.1 bus 32A as an incoming packet. Instead, when the other repeaters, such as repeater 34B, drive packet P1 on their respective L1 buses, repeater 34A asserts incoming signal 36A. Incoming signal 36A alerts each bus device in the originating node to treat the packet stored in its incoming queue 40 as the current incoming packet. The repeater 34B in non-originating node 30B does not assert its incoming signal 36B. Thus devices 38C and 38D bypass their incoming queues 40 and receive the incoming packet P1(i) from L1.2 bus 32B. Multiplexors 42 are responsive to the incoming signal and allow each device to see either the packet on the L1 bus 32 or the packet at the head of incoming queue 40 as the current transaction packet.

In the above example, storing the outgoing packet P1(o) in the incoming queues 40A-B of all bus devices 38A-B in the originating node 30A frees up the L1.1 bus 32A to broadcast another outgoing packet while the incoming packet P1(i) is broadcast on the L1.2 bus 32B in the non-originating node 30B.
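
The role of multiplexors 42 can likewise be sketched as a simple select function; this is an illustrative reading with hypothetical names, not the patent's circuit:

    from collections import deque

    def current_packet(incoming_signal, l1_bus_packet, incoming_queue):
        """Model of multiplexor 42: select what the bus device treats as the
        current incoming transaction packet."""
        if incoming_signal:                  # originating node: repeater asserted its signal
            return incoming_queue.popleft()  # replay the locally queued outgoing packet
        return l1_bus_packet                 # non-originating node: take the L1 bus packet

    # Node 30A originated P1, so its devices queued P1(o) and see signal 36A asserted:
    assert current_packet(True, None, deque(["P1"])) == "P1"
    # Devices 38C and 38D in node 30B bypass their queues and take P1(i) off the bus:
    assert current_packet(False, "P1", deque()) == "P1"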
