throbber
TUTORIAL:
`
`INTE 111111111111111111~11ij'~ij~ 11\l~iiii lllllillf i[II iii! 1111111111 11111111 TIDN NElWDRKS
`
`~) .. :r parallel and distributed processing
`Chuan-lin Wu and Tse-yun Feng
`
`"(::'.ifl:1l
`
`.'_°;, rf\~ -
`
`IEEE CATALOG NUMBER EHO217-0
`LIBRARY OF CONGRESS NUMBER 84-81319
`IEEE COMPUTER SOCIETY ORDER NUMBER 574
`ISBN 0-8188-0574-X
`
`1ffil-.98'i THE INSTITUTE OF ELECTRICAL AND ELE~TRONICS ENGINEERS INC
`
`· -- - - -
`
`1
`
`•
`
`•
`
`IEEE COMPUTER SOCIETY
`
`I
`
`COMPUTER
`SOCIETY~
`PRESS
`~
`4'
`
`INTEL - 1008
`
`

`

`QA
`70,9
`. b5
`wg
`e.., I
`
`TU10RIAL:
`
`INTERCONNECTION NETWORKS
`for parallel and distributed processing
`Chuan-lin Wu and Tse-yun Feng
`
`IEEE CATALOG NUMBER EHO217-0
`LIBRARY OF CONGRESS NUMBER 84-81319
`IEEE COMPUTER SOCIETY ORDER NUMBER 574
`ISBN 0-8186-0574-X
`
`fffil;.~BY THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.
`
`COMPUTER
`SOCIETY~
`PRESS ~®
`
`ffi, IEEE COMPUTER SOCIETY
`
`INFORMATION CENTER
`LOGICON S&ISO SAN PEDRO
`
`INTEL - 1008
`
`

`

`Published by IEEE Computer Society Press
`1109 Spring Street
`Suite 300
`Silver Spring, MD 20910
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source.
`Libraries are permitted to photocopy beyond the limits of U.S. copyright law for
`private use of patrons those articles in this volume that carry a code at the bottom
`of the first page, provided the per-copy fee indicated in the code is paid through the
`Copyright Clearance Center, 29 Congress Street, Salem, MA 01970. Instructors are
`permitted to photocopy isolated articles for noncommercial classroom use without
`fee. For other copying, reprint or republication permission, write to Director, Publish(cid:173)
`ing Services, IEEE, 345 E. 47 St., New York, NY 10017. All rights reserved. Copy(cid:173)
`right © 1984 by The Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Catalog Number EHO217-0
`Library of Congress Number 84-81319
`IEEE Computer Society Order Number 574
`ISBN 0-8186-0574-X (Paper)
`ISBN 0-8186-4574-1 (Microfiche)
`
`Order from: IEEE Computer Society
`Post Office Box 80452
`Worldway Postal Center
`Los Angeles, CA 90080
`
`IEEE Service Center
`445 Hoes Lane
`Piscataway, NJ 08854
`
`•-W---
`1ffif-.Bl.f
`
`The Institute of Electrical and Electronics Engineers, Inc.
`
`ii
`
`INTEL - 1008
`
`

`

`2 1 SEP 1987
`
`Preface
`
`This tutorial presents fundamentals in interconnection networks, a crucial topic in the field of parallel/distributed
`processing. The interconnectio~ network consists of software and hardware entities that are designed to facilitate efficient
`interprocess and interprocessor communication in a parallel processing system. Although conventional computer systems
`using the von Neumann model do not have many system units to be connected and the processes are sequentially executed,
`an interconnection network such as the time-shared bus is still a critical system component. Today, system units such as
`processors and memory modules have increased rapidly in new computer systems, and processes are executed in parallel
`to meet real-time requirements. The design of interconnection networks is emerging as the most basic issue in exploiting
`parallelism. The interconnection network not only has a profound impact on algorithm design, but also greatly affects
`system level control.
`In spite of the importance of the interconnection network in future computer system design, the issues of designing
`interconnection networks have never been fairly and completely explored in a single publication. Even worse, researchers
`often ignore its impact and stubbornly think that the interconn~ction network is nothing but a communication idiot. It is a
`fallacy for people to think that there is enough work in interconnection networks . In view of this, the authors believe that a
`tutorial is urgently needed to promote more fruitful research efforts in the field of parallel/distributed processing.
`The text of this tutorial is designed for system designers, programmers, educators, and those who are involved in the
`research , development, and application of various special-purpose and general-purpose computer systems . It is tutorial in
`nature and provides a state-of-the-art survey. It is expected that the tutorial will serve as a guide for beginners and as a
`major reference for all computer professionals. It is hoped that, after going through the text, readers will be able to design
`interconnection networks that fit their computer architecture needs, design better algorithms by taking advantage of what
`the interconnection network can offer, write better programs by knowing the limitations of the interconnection network,
`and trigger a revolution on the system control concept.
`The text is organized into 12 chapters. A tutorial guide, provided at the beginning of each chapter, explains the idea and
`concept and also annotates an up-to-date reference list. Each chapter includes some reprint articles which explain
`underlying theory and developments.
`Chapter 1 provides a comprehensive overview of interconnection networks, with a discussion of design trends and
`design issues.
`Chapter 2 surveys underlying network topologies used in parallel/distributed system architecture. The topologies are
`divided into two categories: static and dynamic .
`Chapter 3 focuses on control strategies for routing data through interconnection networks . Various routing algorithms
`are covered.
`Chapter 4 discusses how to realize data permutations in an array processing mode, with an exploration of permutation
`capability of various kinds of interconnection networks.
`Chapter 5 explores the performance evaluation of interconnection networks . Evaluation parameters considered include
`bandwidth, message delay, diameter, and effectiveness in simulating other networks .
`Chapter 6 is concerned with fault diagnosis for detecting and locating faults in interconnection networks. Fault models
`and test procedures are considered.
`Chapter 7 investigates fault-tolerance characteristics of interconnection networks, as well as ways to achieve fault
`tolerance.
`Chapter 8 discusses issues in using very-large-scale-integrated (VLSI) circuits to design interconnection networks.
`Area and delay models of layouts are also formulated .
`Chapter 9 covers reconfiguration techniques that are concerned with allocating network resources to achieve variable
`topology capability. Different approaches are discussed .
`Chapter 10, on mapping, is concerned with how to partition and assign program modules to resources such as processors
`and memories, subject to minimizing the total number of routing steps. The mapping is done under the assumption that
`either (or both) the algorithm or interconnection network is fixed.
`Chapter 11 explores network synthesis, which derives the optimum network topology for a transformed algorithm. Both
`algorithms and networks are treated as variables in the procedure.
`Finally, Chapter 12 provides some examples on the construction of multistage networks. It should be noted that every
`system needs an interconnection network. The examples provided are used to point out the trends.
`
`111
`
`INTEL - 1008
`
`

`

`We would like to take this opportunity to thank all the authors who have contributed to this tutorial text. We apologize to
`those authors whose valuable papers could not be included due to page limitations. We tried to be as complete as possible
`in compiling the reference lists, which were based on contribution and accessibility of the work. We deeply regret any
`omissions and hope that they will be made known to us so that they can be included in later updates. Finally, we would like
`to thank the reviewers for their helpful criticisms and suggestions, and Margaret Brown and her staff of the IEEE Computer
`Society Press for editing the text.
`
`Chuan-lin Wu, University of Texas at Austin
`Tse-yun Feng, Pennsylvania State University
`
`IV
`
`INTEL - 1008
`
`

`

`Published by IEEE Computer Society Press
`1109 Spring Street
`Suite 300
`Silver Spring, MD 20910
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source.
`Libraries are permitted to photocopy beyond the limits of U.S. copyright law for
`private use of patrons those articles in this volume that carry a code at the bottom
`of the first page, provided the per-copy fee indicated in the code is paid through the
`Copyright Clearance Center, 29 Congress Street, Salem, MA 01970. Instructors are
`permitted to photocopy isolated articles for noncommercial classroom use without
`fee. For other copying, reprint or republication permission, write to Director, Publish(cid:173)
`ing Services, IEEE, 345 E. 47 St., New York, NY 10017. All rights reserved. Copy(cid:173)
`right © 1984 by The Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Catalog Number EHO217-0
`Library of Congress Number 84-81319
`IEEE Computer Society Order Number 574
`ISBN 0-8186-057 4-X (Paper)
`ISBN 0-8186-457 4-i (Microfiche)
`
`Order from: IEEE Computer Society
`Post Office Box 80452
`Worldway Postal Center
`Los Angeles, CA 90080
`
`IEEE Service Center
`445 Hoes Lane
`Piscataway, NJ 088~4
`
`1$~8'-I
`·--,...,__ The Institute of Electrical and Electronics Engineers, Inc.
`
`ll
`
`INTEL - 1008
`
`

`

`Operation mode. Two types of communication can
`be identified: synchronous and asynchronous. Synchro(cid:173)
`nous communication is needed for processing in which
`communication paths are established synchronously for
`either a data manipulating function 11 or a data/instruc(cid:173)
`tion broadcast. Asynchronous communication is needed
`for multiprocessing in which connection requests are
`issued dynamically. A system may also be designed to
`facilitate both synchronous and asynchronous process(cid:173)
`ing. Therefore, typical operation modes of interconnec(cid:173)
`tion networks can be classified into three categories: syn(cid:173)
`chronous, asnychronous, and combined.
`
`two processors are passive and dedicated buses cannot be
`reconfigured for direct connections to other processors.
`On the other hand, links in the dynamic category can be
`reconfigured by setting the network's active switching
`elements.
`The cross product of the set of categories in each de(cid:173)
`sign decision-[operation mode) x [control strategy] x
`[switching methodology] x [network topology ]-repre(cid:173)
`sents a space of interconnection networks. Obviously, the
`cross product contains some uninteresting cases, but a
`network designer can obtain a meaningful subspace by
`exercising a practical view of engineering technology.
`
`Control strategy. A typical interconnection network
`consists of a number of switching elements and intercon(cid:173)
`necting links. Interconnection functions are realized by
`properly setting control of the switching elements. The
`control-setting function can be managed by a centralized
`controller or by the individual switching element. The lat(cid:173)
`ter strategy is called distributed control; the first strategy
`is called centralized control..
`
`Switching methodology. The two major switching
`methodologies are circuit switching and packet switching.
`In circuit switching, a physical path is actually established
`between a source and a destination. In packet switching,
`data is put in a packet and routed through the intercon(cid:173)
`nection network without establishing a physical connec(cid:173)
`tion path. In general, circuit switching is much more suit(cid:173)
`able for bulk data transmission, and packet switching is
`more efficient for short data messages. Another option,
`integrated switching, includes capabilities of both circuit
`switching and packet switching. Therefore, three switch(cid:173)
`ing methodologies can be identified: circuit switching,
`packet switching, and integrated switching.
`
`Network topology. A network can be depicted by a
`graph in which nodes represent switching points and
`edges represent communication links. The topologies
`tend to be regular and can be grouped into two categories:
`static and dynamic. In a static topology, links between
`
`Topologies
`
`Network topology is a key factor in determining a
`suitable architectural structure, and many topologies
`have been. considered for telephone switching connec(cid:173)
`tions. 12 Here, we review those proposed or used for con(cid:173)
`nections in tightly coupled multiple-processor systems
`(see Figure 3).
`
`Figure 2. Hardware model of com::urrent processing
`systems.
`
`Figure 3. Topologies of interconnection networks.
`
`December 1981
`
`6
`
`INTEL - 1008
`
`

`

`Figure 4. Examples of static network toplogies: (a) one dimensional; (b-f) two dimensional; and (g•D three dimensional.
`
`7
`
`COMPUTER
`
`INTEL - 1008
`
`

`

`iEEE TRANSACTIONS ON COMPUTERS. VOL. C-30. NO. 2. FEBRUARY 1981
`
`I/0
`~
`
`I
`
`UNIT
`
`'.1E.'10RY
`
`' '
`·- CONTROL - PROGRAM
`,n ELEMENT
`·- ELEMENT
`• • •
`
`I IPROCESSING
`
`0
`
`,..__
`
`DATA
`MEMORY
`
`PROCESSING
`
`l ~
`DATA
`}!EMORY
`
`-
`'.___ ELEMENT ----
`
`PROCESSING
`
`N-1
`
`DATA
`MEMORY
`
`I
`I
`
`INTER-
`CONNECTION
`NETWORK
`
`-
`
`plexity of Thompson's algorithm is determined by the com(cid:173)
`plexity of the GCN set-up algorithm. The best known parallel
`GCN set-up algorithms (Nassimi and Sahni [5]) have a
`complexity of 0(n) on an n X n MCC and 0(log4 N) on both
`CCC's and PSC's with N PE's.
`In this paper we present an algorithm for the RAR problem,
`which runs in 0(q2n) time on a q-dimensional n X n X · · · X
`n MCC and in 0(log2 N) time on an N PE PSC or CCC. Thus,
`the algorithm of this paper is asymptotically faster than the
`Thompson-Nassimi-Sahni algorithm for CCC's and PSC's.
`For MCC's we expect our algorithm to be significantly faster
`than the Thompson-Nassimi-Sahni algorithm, as the algo(cid:173)
`rithm of this paper is significantly simpler and has much less
`overhead.
`The RAW problem can be solved using the subalgorithms
`developed for the RAR problem. The complexity of the RAW
`algorithm is the same as that of the RAR algorithm when each
`PE is to receive at most one piece of data. A more general form
`of a RAW occurs when, at most, d data items are written into
`any one PE. In this case the complexity of the RAW becomes
`O(q 2n + dqn) on a q-dimensional n X n X · · · X n MCC and
`0(Iog2 N + d log N) on an N PE CCC or PSC.
`Section II describes how RAR's and RA W's are to be per(cid:173)
`formed. In Section III we present the necessary subalgo(cid:173)
`rithms.
`
`II. ALGORITHMS FOR RAR's AND RA W's
`
`I/
`0
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`' I
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`Fig. l. Block diagram of an SIMD computer.
`
`RAR's and RA W's are performed using certain well-de(cid:173)
`fined steps. These are described below.
`l) SORT: In a sort records are rearranged so as to be in
`nondecreasing order of a specified key. Let G(i) denote the
`record in PE(i), 0 ~ i < N. Let H(i) be the key field of record
`G(i). H(i) is also in PE(i). Following a sort, the records will
`have been rearranged such that H(i) :s; H(i + 1), 0 .:Si< N
`!oxes represent ?Es
`-- l.
`(b)
`(c)
`(a)
`2) RANK: The rank of a selected record is the number of
`Fig. 2. The SIMD models. (a) 4 X 4 MCC. (b) 8 PE CCC. (c) 8 PE
`selected records in PE's with a smaller index. For example,
`PSC.
`of records. The initial configuration is record G(i) in ·PE(i),
`assume we have eight PE's each containing one record. Let the
`0 5 i ~ j < N. Each record has a field H (high). The H values
`key values for these eight records be (6, 4, 2, 2*, 6, 6*, 3*, 4*),
`are such that O .:S H(O) < H(l) <···<HU) ~ N - 1, and
`where an asterisk over a key value denotes a flag or selected
`H(i) = cc for j < i < N. Generalize copies record G(i) into
`record. The ranks of the flagged records are ( - , - , - , 0, - , 1,
`l) + l through H(i), 0 5 i ~ j ( we assume, for
`PE's H(i -
`2, 3).
`convenience, H(- 1) = 0). Let G(0:7) = (A, B, C, -, -, -,
`3) CONCENTRATE: Let G(i,) 0 ::£ r ::£ j < N be a set of
`-, -,) and let H(0:7) = (1, 5, 6, cc, cc, cc, cc, cc). Following a
`records with GUr) initially in PE(i, ). Assume that the records
`have been ranked so that HUr) = r. A concentrate results in
`generalize, G(0:7) = (A, A, B, B, B, B, C, - ).
`record GUr) being moved to PE(r), 0 ~ r ~}.Assume that
`Our RAR algorithm is best described by considering an
`G(0:7) =(A,-.-, B, -, C, -, D) and i0 = 0, i1 = 3, i2 = 5,
`example (Fig. 3). In this example we have N = 8 PE's and
`S(0:7) = (2, 6, 2, oo, 5, 6, 00 , 6). (Recall that S(i) specifies the
`and i 3 = 7. Following a concentrate, G(0:7) = (A, B, C, D, -,
`-. -. -).
`PE from which the data for PE(i) is to be fetched, and S(i) =
`4) DISTRIBUTE: Let G(i), 0 ~ i ~ j < N be a set of
`00 iff PE(i) is to receive no data.) Let T(i) = i and FLAG(i)
`= 1, 0 :s; i < N. Our RAR algorithm begins by sorting the
`records with G(i) initially in PE(i). Let H(i), 0 :5 i 5 j be a
`set of destinations such that H(i) < H(i + 1 ), 0 5 i <}.The
`records G(i) = (S(i), T(i), FLAG (i)). Records are sorted
`on S; T is used to resolve ties (i.e., records with the same S
`purpose of a distribute is to route G(i) to PE(H(i)), 0 ~ i :5
`value are ordered by their T value). The sorting algorithm we
`j. It is easy to see that a distribute is the inverse of a concen(cid:173)
`trate. As an example, suppose that G(0:7) = (A, B, C, -, -,
`shall use ( see Section III) will be a comparison sort. We require
`-, -, -) and that H(O) = I, H(l) = 5, and H(2) = 6. Fol(cid:173)
`that during the sort whenever a comparison between G(i) and
`GU) is made, if S(i) = SU) and T(i) < TU), thell FLAG(i)
`lowing a distribute, G(0:7) =(-,A,-,-, -, B, C, -).
`5) GENERALIZE: A generalize makes multiple copies
`is set to zero. As a result of this, following the sort, FLAG(i)
`283
`
`INTEL - 1008
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket