`
`1
`
`APPLE 1007
`
`
`
`Interconnection Networks
`
`An Engineering Approach
`
`José Duato
`
`Sudhakar Yalamanchili
`
`Lionel Ni
`
IEEE Computer Society

Los Alamitos, California • Washington • Brussels • Tokyo
`
`2
`
`
`
Library of Congress Cataloging-in-Publication Data

Duato, José
Interconnection networks: an engineering approach / José Duato, Sudhakar Yalamanchili, Lionel Ni.
p. cm.
Includes bibliographical references.
ISBN 0-8186-7800-3
1. Computer networks. 2. Multiprocessors. I. Yalamanchili, Sudhakar. II. Ni, Lionel M.
TK5105.5.D88 1997
004.6-dc21

97-20502
CIP
`
Copyright © 1997 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy isolated pages beyond the limits of US copyright law, for private use of their patrons. Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
IEEE Computer Society Press Order Number BP07800
`Library of Congress Number 97-20502
`ISBN 0-8186-7800-3
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Press
`Customer Service Center
`10662 Los Vaqueros Circle
P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`Email: cs.books @ computer. org
`
`IEEE Service Center
`445 Hoes Lane
P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: +1-908-981-1393
`Fax: +1-908-981-9667
mis.custserv@computer.org
`
`IEEE Computer Society
13, Avenue de l'Aquilon
B-1200 Brussels
`BELGIUM
`Tel: +32-2-770-2198
Fax: +32-2-770-8505
`euro.ofc@computer.org
`
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minnmi—Aoyama
`Minato-ku, Tokyo 107
`JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
tokyo.ofc@computer.org
`
`Editor-in-Chief: Mohamed Fayad
`Publisher: Matt Loeb
`Acquisitions Editor: Bill Sanders
Developmental Editor: Cheryl Baltes
Advertising/Promotions: Tom Fink
Production Editor: Lisa O'Conner
`Cover Artist: John Marshall
`
About the cover: Interconnection Networks by John Marshall, oil on canvas, 1997, is based on a concept by José Duato. The painting includes symbols representing important elements in the field. To find out more about the symbols in the painting, visit the book's website at http://computer.org/books.

Printed in the United States of America
`
`
`
`
`Chapter 1
`
Introduction
`
Interconnection networks are currently being used for many different applications, ranging from internal buses in very large-scale integration (VLSI) circuits to wide area computer networks. Among others, these applications include backplane buses and system area networks, telephone switches, internal networks for asynchronous transfer mode (ATM) switches, processor/memory interconnects for vector supercomputers, interconnection networks for multicomputers and distributed shared-memory multiprocessors, clusters of workstations, local area networks, metropolitan area networks, wide area computer networks, and networks for industrial applications. Additionally, the number of applications requiring interconnection networks is continuously growing. For example, an integral control system for a car requires a network connecting several microprocessors and devices.
The characteristics and cost of these networks depend considerably on the application; there are no general solutions. For some applications, interconnection networks have been studied in depth for decades. This is the case for telephone networks, computer networks, and backplane buses, which are covered in many books. However, some other applications have not been fully covered in the existing literature. This is the case for the interconnection networks used in multicomputers and distributed shared-memory multiprocessors.
The lack of standards and the need for very high performance and reliability pushed the development of interconnection networks for multicomputers. This technology was transferred to distributed shared-memory multiprocessors, improving the scalability of those machines. However, distributed shared-memory multiprocessors require an even higher network performance than multicomputers, pushing the development of interconnection networks even further. More recently, this network technology began to be transferred to local area networks (LANs). It has also been proposed as a replacement for backplane buses, creating the concept of the system area network. Hence, the advances in interconnection networks for multicomputers are the basis for the development of interconnection networks for other architectures and environments. Therefore, there is a need for structuring the concepts and solutions for this kind of interconnection network. Obviously, when this technology is transferred to another environment, new issues arise that have to be addressed.
Moreover, several of these networks are evolving very quickly, and the solutions proposed for different kinds of networks are overlapping. Thus, there is a need for formally stating the basic concepts, the alternative design choices, and the design trade-offs for most of those networks. In this book, we take up that challenge and present in a structured way the basic underlying concepts of most interconnection networks, as well as the most interesting solutions currently implemented or proposed in the literature. As indicated above, the network technology developed for multicomputers has been transferred to other environments. Therefore, in this book we will mainly describe techniques
`developed for multicomputer networks. Most of these techniques can also be applied to distributed
`shared-memory multiprocessors, and to local and system area networks. However, we will also
`describe techniques specifically developed for these environments.
`
`1.1 Parallel Computing and Networks
`
The demand for ever more computing power has never stopped. Although the performance of processors doubled approximately every three years from 1980 to 1996, the complexity of the software as well as the scale and solution quality of applications have continuously driven the development of even faster processors. A number of important problems have been identified in the areas of defense, aerospace, automotive applications, and science, whose solution requires a tremendous amount of computational power. In order to solve these grand challenge problems, the goal has been to obtain computer systems capable of computing at the teraflops (10^12 floating-point operations per second) level. Even the smallest of these problems requires gigaflops (10^9 floating-point operations per second) of performance for hours at a time. The largest problems require teraflops performance for more than a thousand hours at a time.
Parallel computers with multiple processors are opening the door to teraflops computing performance to meet the increasing demand for computational power. The demand includes more computing power, higher network and input/output (I/O) bandwidth, and more memory and storage capacity. Even for applications requiring less computing power, parallel computers can be a cost-effective solution. Processors are becoming very complex. As a consequence, processor design cost is growing so fast that only a few companies in the world can afford to design a new processor. Moreover, design cost must be amortized by selling a very high number of units. Currently, personal computers and workstations dominate the computing market. Therefore, designing custom processors that boost performance by one order of magnitude is not cost-effective. Similarly, designing and manufacturing high-speed memories and disks is not cost-effective. The alternative choice consists of designing parallel computers from commodity components (processors, memories, disks, interconnects, etc.). In these parallel computers, several processors cooperate to solve a large problem. Memory bandwidth can be scaled with processor computing power by physically distributing memory components among processors. Also, redundant arrays of inexpensive disks (RAID) allow the implementation of high-capacity, reliable parallel file systems that meet the performance requirements of parallel computers.
However, a parallel computer requires some kind of communication subsystem to interconnect processors, memories, disks, and other peripherals. The specific requirements of this communication subsystem depend on the architecture of the parallel computer. The simplest solution consists of connecting processors to memories and disks as if there were a single processor, using system buses and I/O buses. Then, processors can be interconnected using their interfaces to local area networks. Unfortunately, commodity communication subsystems have been designed to meet a different set of requirements, i.e., those arising in computer networks. Although networks of workstations have been proposed as an inexpensive approach to build parallel computers, the communication subsystem becomes the bottleneck in most applications.
Therefore, designing high-performance interconnection networks becomes a critical issue in exploiting the performance of parallel computers. Moreover, as the interconnection network is the only subsystem that cannot be efficiently implemented by using commodity components, its design becomes very critical. This issue motivated the writing of this book. Up to now, most manufacturers have designed custom interconnection networks (nCUBE-2, nCUBE-3, Intel Paragon, Cray T3D, Cray T3E, Thinking Machines Corp. CM-5, NEC Cenju-3, IBM SP2). More recently, several
`
`5
`
`
`
`1.2. PARALLEL COIWPUTER ARCHITECTURES
`
Figure 1.1. Schematic representation of parallel computers: (a) a multicomputer; (b) a UMA shared-memory multiprocessor. (M = memory; P = processor.)
`
high-performance switches have been developed (Autonet, Myrinet, ServerNet) and are being marketed. These switches are targeted at workstations and personal computers, offering the customer the possibility of building an inexpensive parallel computer by connecting cost-effective computers through high-performance switches. The main issues arising in the design of networks for both approaches are covered in this book.
`
`1.2 Parallel Computer Architectures
`
In this section, we briefly introduce the most popular parallel computer architectures. This description will focus on the role of the interconnection network. A more detailed description is beyond the scope of this book.
The idea of using commodity components for the design of parallel computers led to the development of distributed-memory multiprocessors, or multicomputers, in the early 1980s. These parallel computers consist of a set of processors, each one connected to its own local memory. Processors communicate with each other by passing messages through an interconnection network. Figure 1.1a shows a simple scheme for this architecture. The first commercial multicomputers utilized commodity components, including Ethernet controllers, to implement communication between processors. Unfortunately, commodity communication subsystems were too slow, and the interconnection network became the bottleneck of those parallel computers. Several research efforts led to the development of interconnection networks that are several orders of magnitude faster than Ethernet networks. Most of the performance gain is due to architectural rather than technological improvements.
Programming multicomputers is not an easy task. The programmer has to take care of distributing code and data among the processors in an efficient way, invoking message-passing calls whenever some data are needed by other processors. On the other hand, shared-memory multiprocessors provide a single memory space to all the processors, simplifying the task of exchanging data among processors. Access to shared memory has traditionally been implemented by using an interconnection network between processors and memory (Figure 1.1b). This architecture is referred to as the uniform memory access (UMA) architecture. It is not scalable because memory access time includes the latency of the interconnection network, and this latency increases with system size.
More recently, shared-memory multiprocessors have followed some trends previously established for multicomputers. In particular, memory has been physically distributed among processors, thereby reducing the memory access time for local accesses and increasing scalability. These parallel computers are referred to as distributed shared-memory multiprocessors (DSMs). Accesses to remote memory
are performed through an interconnection network, very much like in multicomputers. The main difference between DSMs and multicomputers is that messages are initiated by memory accesses rather than by calling a system function. In order to reduce memory latency, each processor has several levels of cache memory, thus matching the speed of processors and memories. This architecture provides nonuniform memory access (NUMA) time. Indeed, most of the nonuniformity is due to the different access time between caches and main memory, rather than the different access time between local and remote memories. The main problem arising in DSMs is cache coherence. Several hardware and software cache coherence protocols have been proposed. These protocols produce additional traffic through the interconnection network.
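To see why cache behavior dominates the nonuniformity, a back-of-the-envelope average access time model helps. The Python sketch below is our illustration, not a measurement from any machine; all latencies and rates are assumed values.

```python
# Illustrative average-access-time model for a DSM node. All latencies
# and rates are assumed values chosen only to show why cache hit rate,
# not the local/remote gap, dominates observed memory latency.

CACHE_NS, LOCAL_NS, REMOTE_NS = 2, 60, 200   # assumed access latencies
REMOTE_FRACTION = 0.2                        # assumed share of misses going remote

def avg_access_ns(hit_rate):
    # Misses split between local and remote memory accesses.
    miss_ns = (1 - REMOTE_FRACTION) * LOCAL_NS + REMOTE_FRACTION * REMOTE_NS
    return hit_rate * CACHE_NS + (1 - hit_rate) * miss_ns

for hr in (0.90, 0.95, 0.99):
    print(f"hit rate {hr:.0%}: {avg_access_ns(hr):.1f} ns average")
```

With a 99 percent hit rate the average access time is within a few nanoseconds of the cache latency, so the cache/main-memory gap sets the observed latency far more than the local/remote gap does.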
The use of custom interconnects makes multicomputers and DSMs quite expensive. So, networks of workstations (NOWs) have been proposed as an inexpensive approach to building parallel computers. NOWs take advantage of recent developments in LANs. In particular, the use of ATM switches has been proposed to implement NOWs. However, ATM switches are still expensive, which has motivated the development of high-performance switches specifically designed to provide a cost-effective interconnect for workstations and personal computers.
Although there are many similarities between interconnection networks for multicomputers and DSMs, it is important to keep in mind that performance requirements may be very different. Messages are usually very short when DSMs are used. Additionally, network latency is important because memory access time depends on that latency. However, messages are typically longer and less frequent in multicomputers. Usually the programmer is able to adjust the granularity of message communication in a multicomputer. On the other hand, interconnection networks for multicomputers and NOWs are mainly used for message passing. However, the geographical distribution of workstations usually imposes constraints on the way processors are connected. Also, individual processors may be connected to or disconnected from the network at any time, thus imposing additional design constraints.
`
`1.3 Network Design Considerations
`
`Interconnection networks play a major role in the performance of modern parallel computers. There
`are many factors that may affect the choice of an appropriate interconnection network for the
`underlying parallel computing platform. These factors include:
`
1. Performance requirements. Processes executing on different processors synchronize and communicate through the interconnection network. These operations are usually performed by explicit message passing or by accessing shared variables. Message latency is the time elapsed between the time a message is generated at its source node and the time the message is delivered at its destination node. Message latency directly affects processor idle time and access time to remote memory locations. Also, the network may saturate: it may be unable to deliver the flow of messages injected by the nodes, limiting the effective computing power of a parallel computer. The maximum amount of information delivered by the network per time unit defines the throughput of that network. (A small sketch computing both metrics appears after this list.)
`
2. Scalability. A scalable architecture implies that as more processors are added, their memory bandwidth, I/O bandwidth, and network bandwidth should increase proportionally. Otherwise, the components whose bandwidth does not scale may become a bottleneck for the rest of the system, decreasing the overall efficiency accordingly.
`
3. Incremental expandability. Customers are unlikely to purchase a parallel computer with a full set of processors and memories. As the budget permits, more processors and memories may be added until a system's maximum configuration is reached. In some interconnection networks, the number of processors must be a power of 2, which makes them difficult to expand. In other cases, expandability is provided at the cost of wasting resources. For example, a network designed for a maximum size of 1,024 nodes may contain many unused communication links when the network is implemented with a smaller size. Interconnection networks should provide incremental expandability, allowing the addition of a small number of nodes while minimizing resource wasting.
`
4. Partitionability. Parallel computers are usually shared by several users at a time. In this case, it is desirable that the network traffic produced by each user does not affect the performance of other applications. This can be ensured if the network can be partitioned into smaller functional subsystems. Partitionability may also be required for security reasons.
`
5. Simplicity. Simple designs often lead to higher clock frequencies and may achieve higher performance. Additionally, customers appreciate networks that are easy to understand because it is easier to exploit their performance.
`
6. Distance span. This factor may lead to very different implementations. In multicomputers and DSMs, the network is assembled inside a few cabinets. The maximum distance between nodes is small. As a consequence, signals are usually transmitted using copper wires. These wires can be arranged regularly, reducing the computer size and wire length. In NOWs, links have very different lengths and some links may be very long, producing problems such as coupling, electromagnetic noise, and heavy link cables. The use of optical links solves these problems, equalizing the bandwidth of short and long links up to a much greater distance than when copper wire is used. Also, geographical constraints may impose the use of irregular connection patterns between nodes, making distributed control more difficult to implement.
`
7. Physical constraints. An interconnection network connects processors, memories, and/or I/O devices. It is desirable for a network to accommodate a large number of components while maintaining a low communication latency. As the number of components increases, the number of wires needed to interconnect them also increases. Packaging these components together usually requires meeting certain physical constraints, such as operating temperature control, wiring length limitation, and space limitation. Two major implementation problems in large networks are the arrangement of wires in a limited area and the number of pins per chip (or board) dedicated to communication channels. In other words, the complexity of the connection is limited by the maximum possible wire density and by the maximum pin count. The speed at which a machine can run is limited by the wire lengths, and the majority of the power consumed by the system is used to drive the wires. This is an important and challenging issue to be considered. Different engineering technologies for packaging, wiring, and maintenance should be considered.
`
8. Reliability and repairability. An interconnection network should be able to deliver information reliably. Interconnection networks can be designed for continuous operation in the presence of a limited number of faults. These networks are able to send messages through alternative paths when some faults are detected. In addition to reliability, interconnection networks should have a modular design, allowing hot upgrades and repairs. Nodes can also fail or be removed from the network. In particular, a node can be powered off in a network of workstations. Thus, NOWs usually require some reconfiguration algorithm for the automatic reconfiguration of the network when a node is powered on or off.
`
`8
`
`
`
`CHAPTER 1.
`
`INTROD UCTION
`
9. Expected workloads. Users of a general-purpose machine may have very different requirements. If the kind of applications that will be executed in the parallel computer is known in advance, it may be possible to extract some information on usual communication patterns, message sizes, network load, etc. That information can be used for the optimization of some design parameters. When it is not possible to get information on expected workloads, the network design should be robust, i.e., design parameters should be selected in such a way that performance is good over a wide range of traffic conditions.
`
10. Cost constraints. Finally, it is obvious that the "best" network may be too expensive. Design decisions very often are trade-offs between cost and other design factors. Fortunately, cost is not always directly proportional to performance. Using commodity components whenever possible may considerably reduce the overall cost.
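To make the latency and throughput definitions in factor 1 concrete, here is a small Python sketch (ours, not the authors') that computes both metrics from a hypothetical message trace. The record format and all numbers are illustrative assumptions.

```python
# Sketch: computing message latency and network throughput from a
# hypothetical message trace. Record format and values are illustrative.

# Each record: (time_generated_s, time_delivered_s, size_in_bits)
trace = [
    (0.0e-6, 1.2e-6, 1024),
    (0.5e-6, 2.0e-6, 2048),
    (1.0e-6, 2.1e-6,  512),
]

# Message latency: time elapsed between generation at the source node
# and delivery at the destination node.
latencies = [delivered - generated for generated, delivered, _ in trace]
avg_latency = sum(latencies) / len(latencies)

# Throughput: information delivered per unit time over the
# observation interval.
interval = max(d for _, d, _ in trace) - min(g for g, _, _ in trace)
throughput = sum(size for _, _, size in trace) / interval  # bits per second

print(f"average latency: {avg_latency * 1e6:.2f} us")
print(f"throughput: {throughput / 1e6:.0f} Mbit/s")
```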
`
`1.4 Classification of Interconnection Networks
`
Among other criteria, interconnection networks have traditionally been classified according to operating mode (synchronous or asynchronous) and network control (centralized, decentralized, or distributed). Nowadays, multicomputers, multiprocessors, and NOWs dominate the parallel computing market. All of these architectures implement asynchronous networks with distributed control. Therefore, we will focus on other criteria that are currently more significant.
A classification scheme is shown in Figure 1.2, which categorizes the known interconnection networks into four major classes based primarily on network topology: shared-medium networks, direct networks, indirect networks, and hybrid networks. For each class, the figure shows a hierarchy of subclasses, also indicating some real implementations for most of them. This classification scheme is based on the classification proposed in [252], and it mainly focuses on networks that have been implemented. It is by no means complete, as other new and innovative interconnection networks may emerge as technology further advances, such as mobile communication and optical interconnections.
In shared-medium networks, the transmission medium is shared by all communicating devices. An alternative to this approach consists of having point-to-point links directly connecting each communicating device to a (usually small) subset of other communicating devices in the network. In this case, any communication between nonneighboring devices requires transmitting the information through several intermediate devices. These networks are known as direct networks. Instead of directly connecting the communicating devices to each other, indirect networks connect those devices by means of one or more switches. If several switches exist, they are interconnected using point-to-point links. In this case, any communication between communicating devices requires transmitting the information through one or more switches. Finally, hybrid approaches are possible. These network classes and the corresponding subclasses will be described in the following sections.
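For readers who find a concrete restatement helpful, the top levels of Figure 1.2 can be summarized in a small data structure. The Python sketch below is purely illustrative; the class names and examples are those appearing in the figure.

```python
# Illustrative restatement of the top level of Figure 1.2 as a data
# structure; class names and examples are taken from the figure.
taxonomy = {
    "shared-medium networks": ["contention bus (Ethernet)",
                               "token bus (Arcnet)",
                               "token ring (FDDI, IBM Token Ring)",
                               "backplane buses"],
    "direct networks": ["2-D/3-D meshes", "tori (k-ary n-cubes)",
                        "hypercubes"],
    "indirect networks": ["crossbars", "multistage networks (MINs)",
                          "irregular switch-based networks"],
    "hybrid networks": ["multiple-backplane buses",
                        "hierarchical and cluster-based networks"],
}

for family, examples in taxonomy.items():
    print(f"{family}: {', '.join(examples)}")
```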
`
`1.5 Shared-Medium Networks
`
The least complex interconnect structure is one in which the transmission medium is shared by all communicating devices. In such shared-medium networks, only one device is allowed to use the network at a time. Every device attached to the network has requester, driver, and receiver circuits to handle the passing of address and data. The network itself is usually passive, since it does not generate messages.
`
`9
`
`
`
`1.5. SHARED-MEDIUM NETWORKS
`
Interconnection Networks

- Shared-Medium Networks
  - Local Area Networks
    - Contention Bus (Ethernet)
    - Token Bus (Arcnet)
    - Token Ring (FDDI Ring, IBM Token Ring)
  - Backplane Bus (Sun Gigaplane, DEC AlphaServer 8X00, SGI PowerPath-2)
- Direct Networks (Router-Based Networks)
  - Strictly Orthogonal Topologies
    - Mesh
      - 2-D Mesh (Intel Paragon)
      - 3-D Mesh (MIT J-Machine)
    - Torus (k-ary n-cube)
      - 1-D Unidirectional Torus or Ring (KSR first-level ring)
      - 2-D Bidirectional Torus (Intel/CMU iWarp)
      - 3-D Bidirectional Torus (Cray T3D, Cray T3E)
    - Hypercube (Intel iPSC, nCUBE)
  - Other Topologies: Trees, Cube-Connected Cycles, de Bruijn Network, Star Graphs, etc.
- Indirect Networks (Switch-Based Networks)
  - Regular Topologies
    - Crossbar (Cray X/Y-MP, DEC GIGAswitch, Myrinet)
    - Multistage Interconnection Networks
      - Blocking Networks
        - Unidirectional MIN (NEC Cenju-3, IBM RP3)
        - Bidirectional MIN (IBM SP, TMC CM-5, Meiko CS-2)
      - Nonblocking Networks: Clos Network
  - Irregular Topologies (DEC Autonet, Myrinet, ServerNet)
- Hybrid Networks
  - Multiple-Backplane Buses (Sun XDBus)
  - Hierarchical Networks (Bridged LANs, KSR)
  - Cluster-Based Networks (Stanford DASH, HP/Convex Exemplar)
  - Other Hypergraph Topologies: Hyperbuses, Hypermeshes, etc.

Figure 1.2. Classification of interconnection networks. (1-D = one-dimensional; 2-D = two-dimensional; 3-D = three-dimensional; CMU = Carnegie Mellon University; DASH = Directory Architecture for Shared-Memory; DEC = Digital Equipment Corp.; FDDI = Fiber Distributed Data Interface; HP = Hewlett-Packard; KSR = Kendall Square Research; MIN = Multistage Interconnection Network; MIT = Massachusetts Institute of Technology; SGI = Silicon Graphics Inc.; TMC = Thinking Machines Corp.)
`
`10
`
`
`
`CHAPTER 1.
`
`INTRODUCTION
`
An important issue here is the arbitration strategy that determines the mastership of the shared-medium network to resolve network access conflicts. A unique characteristic of a shared medium is its ability to support atomic broadcast, in which all devices on the medium can monitor network activities and receive the information transmitted on the shared medium. This property is important for efficiently supporting many applications requiring one-to-all or one-to-many communication services, such as barrier synchronization and snoopy cache coherence protocols. Due to limited network bandwidth, a single shared medium can support only a limited number of devices before the medium becomes a bottleneck.
Shared-medium networks constitute a well-established technology. Additionally, their limited bandwidth restricts their use in multiprocessors. So, these networks will not be covered in depth in this book; we present only a short introduction in the following sections. There are two major classes of shared-medium networks: local area networks, mainly used to construct computer networks that span physical distances no longer than a few kilometers, and backplane buses, mainly used for internal communication in uniprocessors and multiprocessors.
`
1.5.1 Shared-Medium Local Area Networks

High-speed LANs can be used as a networking backbone to interconnect computers and provide an integrated parallel and distributed computing environment. Physically, a shared-medium LAN uses copper wires or fiber optics in a bit-serial fashion as the transmission medium. The network topology is either a bus or a ring. Depending on the arbitration mechanism used, different LANs have become commercially available. For performance and implementation reasons, it is impractical to have centralized control or a fixed access assignment to determine the bus master that can access the bus. Three major classes of LANs based on distributed control are described below.
`
Contention Bus

The most popular bus arbitration mechanism is to have all devices compete for the exclusive access right to the bus. Due to the broadcast nature of the bus, all devices can monitor the state of the bus: idle, busy, or collision. Here the term "collision" means that two or more devices are using the bus at the same time and their data collide. When a collision is detected, the competing devices quit transmission and try again later. The best-known contention-based LAN is Ethernet, which adopts the carrier-sense multiple access with collision detection (CSMA/CD) protocol. The bandwidth of Ethernet is 10 Mbps, and the distance span is 250 meters (coaxial cable). As processors get faster, the number of devices that can be connected to Ethernet must be limited to avoid making the network a bottleneck. In order to break the 10 Mbps bandwidth barrier, Fast Ethernet provides 100 Mbps bandwidth.
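As an illustration of the contention behavior described above, the following Python sketch simulates a slotted contention bus with binary exponential backoff in the spirit of CSMA/CD. The slotted timing model and all parameters are simplifying assumptions of ours, not the Ethernet specification.

```python
import random

# Minimal slot-based sketch of a contention bus with binary exponential
# backoff, in the spirit of CSMA/CD. The slotted model and parameters
# are simplifying assumptions, not the Ethernet standard.

def backoff(collisions):
    # After the k-th collision, wait a uniform number of slots in
    # [0, 2^k - 1]; the doubling is capped at 10 doublings.
    k = min(collisions, 10)
    return random.randint(0, 2 ** k - 1)

def simulate(num_stations, slots=10_000):
    ready = {s: 0 for s in range(num_stations)}       # next slot each station may try
    collisions = {s: 0 for s in range(num_stations)}
    delivered = 0
    for t in range(slots):
        contenders = [s for s, r in ready.items() if r <= t]
        if len(contenders) == 1:          # exactly one sender: success
            s = contenders[0]
            delivered += 1
            collisions[s] = 0
            ready[s] = t + 1              # station is saturated with traffic
        elif len(contenders) > 1:         # collision: everyone backs off
            for s in contenders:
                collisions[s] += 1
                ready[s] = t + 1 + backoff(collisions[s])
    return delivered / slots              # fraction of useful slots

for n in (2, 8, 32):
    print(n, "stations -> utilization", round(simulate(n), 2))
```

Running the sketch shows useful utilization degrading as more saturated stations contend, consistent with the observation above that the number of devices on the medium must be limited.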
`
Token Bus

One drawback of the contention bus is its nondeterministic nature, as there is no guarantee of how much waiting time is required to gain bus access. Thus, the contention bus is not suitable for supporting real-time applications. To remove the nondeterministic behavior, an alternative approach involves passing a token among the network devices. The owner of the token has the right to access the bus. Upon completion of the transmission, the token is passed to the next device based on some scheduling discipline. By restricting the maximum token holding time, an upper bound on the time a device has to wait for the token can be guaranteed. Arcnet supports a token bus with a bandwidth of 2.5 Mbps.
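The determinism argument can be made concrete: with N devices, a maximum token holding time, and a fixed token passing overhead, a device waits at most (N - 1) times the sum of the two between opportunities. A toy Python sketch under assumed values:

```python
# Toy sketch of token-bus scheduling. Device count, maximum holding
# time HOLD, and passing overhead PASS are all assumed values.

N, HOLD, PASS = 8, 5.0, 0.1   # devices, ms per holder, ms to pass token

# Worst case: every other device holds the token for the full limit.
worst_case_wait = (N - 1) * (HOLD + PASS)
print(f"worst-case token wait: {worst_case_wait:.1f} ms")

# Round-robin token rotation: device i transmits for at most HOLD,
# then passes the token to device (i + 1) mod N.
token, clock = 0, 0.0
for _ in range(2 * N):        # two full rotations
    clock += HOLD             # holder transmits (worst case)
    clock += PASS             # token is passed on
    token = (token + 1) % N
print(f"time for two full rotations: {clock:.1f} ms")
```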
`
`11
`
`
`
`1.5. SHARED-MEDIUM NETWORKS
`
`Figure 1.3. A single-bus network. (M = memory; P = processor.)
`
Token Ring

The idea of the token ring is a natural extension of the token bus, as the passing of the token forms a ring structure. The IBM Token Ring supports bandwidths of both 4 and 16 Mbps based on coaxial cable. Fiber Distributed Data Interface (FDDI) provides a bandwidth of 100 Mbps using fiber optics.
`
1.5.2 Shared-Medium Backplane Bus

A backplane bus is the simplest interconnection structure for bus-based parallel computers. It is commonly used to interconnect processor(s) and memory modules to provide UMA architecture. Figure 1.3 shows a single-bus network. A typical backplane bus usually has 50 to 300 wires and is physically realized by printed lines on a circuit board or by discrete (backplane) wiring. Additional costs are incurred by interface electronics, such as line drivers, receivers, and connectors.
There are three kinds of information on the backplane bus: data, address, and control signals. Control signals include the bus request signal and request grant signal, among many others. In addition to the width of the data lines, the maximum bus bandwidth that can be provided depends on the technology. The number of processors that can be put on a bus depends on many factors, such as processor speed, bus bandwidth, cache architecture, and program behavior.
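A rough sizing exercise illustrates how these factors interact. All values in the Python sketch below are illustrative assumptions of ours, not figures from the text.

```python
# Rough estimate of how many processors a single bus can support before
# saturating. All parameter values are illustrative assumptions.

BUS_BW = 800e6          # bus bandwidth, bytes/s
MISSES_PER_SEC = 0.5e6  # cache misses per processor per second
BYTES_PER_MISS = 64     # cache line size fetched per miss

traffic_per_cpu = MISSES_PER_SEC * BYTES_PER_MISS   # bytes/s per processor
max_processors = BUS_BW // traffic_per_cpu
print(f"each CPU needs {traffic_per_cpu / 1e6:.0f} MB/s of bus traffic; "
      f"the bus supports about {int(max_processors)} CPUs")
```

Under these assumptions the bus saturates at a few dozen processors; better caches (fewer misses) raise the ceiling, faster processors (more misses per second) lower it.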
`
Methods of Information Transfer

Both data and address information must be carried on the bus. In order to increase the bus bandwidth and provide a large address space, both the data width and the number of address bits have to be increased. Such an increase implies a further increase in bus complexity and cost. Some designs try to share address and data lines. In a multiplexed transfer, address and data are sent alternately. Hence, they can share the same physical lines and require less power and fewer chips. In a nonmultiplexed transfer, address and data lines are separate. Thus, data transfers can be done faster.
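The bandwidth cost of multiplexing can be seen with a back-of-the-envelope comparison. The clock rate, bus width, and cycle counts below are illustrative assumptions.

```python
# Back-of-the-envelope comparison of multiplexed vs. nonmultiplexed
# transfer. Clock rate, bus width, and cycle counts are illustrative
# assumptions, not figures from the text.

CLOCK_HZ = 50e6          # assumed 50 MHz bus clock
DATA_WIDTH_BITS = 64     # width of the (shared or dedicated) data lines

# Multiplexed: address and data alternate on the same lines, so each
# transfer costs one address cycle plus one data cycle.
multiplexed_bw = DATA_WIDTH_BITS * CLOCK_HZ / 2

# Nonmultiplexed: dedicated address lines let a data word move every
# cycle, at the cost of more wires.
nonmultiplexed_bw = DATA_WIDTH_BITS * CLOCK_HZ

print(f"multiplexed:    {multiplexed_bw / 1e6:.0f} Mbit/s")
print(f"nonmultiplexed: {nonmultiplexed_bw / 1e6:.0f} Mbit/s")
```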
In a synchronous bus design, all devices are synchronized with a common clock. It requires less complicated logic and has been used in most existing buses. However, a synchronous bus is not easily upgradable: new, faster processors are difficult to fit onto a slow bus.
In asynchronous buses, the devices connected to the bus may have different speeds and their own clocks. They use a handshaking protocol to synchronize with each other. This provides independence from particular technologies and allows slower and faster devices with different clock rates to operate together. It also implies that buffering is needed, since slower devices cannot handle messages as fast as faster devices can.
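As an illustration of such handshaking, the following Python sketch models a generic four-phase request/acknowledge exchange between a fast sender and a slower receiver. The protocol shown is the generic textbook sequence, not any specific bus standard, and the thread-and-event model is only a software analogy for the bus signals.

```python
# Sketch of a generic four-phase handshake between a fast sender and a
# slow receiver on an asynchronous bus. Software analogy only.

import queue
import threading
import time

req = threading.Event()         # sender asserts "data valid"
ack = threading.Event()         # receiver asserts "data taken"
data = queue.Queue(maxsize=1)   # buffering between different-speed devices

def sender(words):
    for w in words:
        data.put(w)             # drive the data lines
        req.set()               # 1. assert REQ
        ack.wait()              # 2. wait for ACK
        req.clear()             # 3. drop REQ
        while ack.is_set():     # 4. wait for ACK to drop before next word
            time.sleep(0)

def receiver(n):
    for _ in range(n):
        req.wait()              # wait for "data valid"
        print("received:", data.get())
        time.sleep(0.01)        # slow device works at its own rate
        ack.set()               # signal "data taken"
        while req.is_set():     # wait for REQ to drop
            time.sleep(0)
        ack.clear()             # complete the four-phase cycle

t = threading.Thread(target=receiver, args=(3,))
t.start()
sender(["w0", "w1", "w2"])
t.join()
```

The explicit buffering and the "wait for the other side" steps are exactly why devices with very different clock rates can coexist on an asynchronous bus.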
`
`Bus Arbitration
`
`In a single-bus network, several processors may attempt to use the bus simultaneously. To deal with
`this, a policy must be implemented that allocates the bus to the processors making such requests.
`For performance reasons, bus allocation must be carried out by hardware arbiters. Thus, in order to
`perform a memory access request, the processor has to exclusively own the bus and become the bus
`
`12
`
`
`
`10
`
`CHAPTER 1.
`
`INTRODUCTION
`
`master. To become the bus master, each processor implements a bus requester, which is a collection
`of logic to req