Lecture Notes in Computer Science
`Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
`
`1734
`
`

`
`Berlin
`Heidelberg
`New York
`Barcelona
`Hong Kong
`London
`Milan
`Paris
`Singapore
`Tokyo
`
`

`
`Hermann Hellwagner
`Alexander Reinefeld (Eds.)
`
`SCI: Scalable
`Coherent Interface
`
`Architecture and Software
`for High-Performance Compute Clusters
`
`

`
`Series Editors
`
`Gerhard Goos, Karlsruhe University, Germany
`Juris Hartmanis, Cornell University, NY, USA
`Jan van Leeuwen, Utrecht University, The Netherlands
`
`Volume Editors
`
`Hermann Hellwagner
`University of Klagenfurt, Institute of Information Technology
`A-9020 Klagenfurt, Austria
`E-mail: hermann.hellwagner@uni-klu.ac.at
`
`Alexander Reinefeld
`Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)
`Takustr. 7, D-14195 Berlin-Dahlem, Germany
`E-mail: ar@zib.de
`
`Cataloging-in-Publication data applied for
`
`Die Deutsche Bibliothek - CIP-Einheitsaufnahme
`
`SCI - Scalable coherent interface : architecture and software for
`high-performance compute clusters / Hermann Hellwagner ; Alexander Reinefeld
`(ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ;
`Milan ; Paris ; Singapore ; Tokyo : Springer, 1999
`(Lecture notes in computer science ; Vol. 1734)
`ISBN 3-540-66696-6
`
`CR Subject Classification (1998): C.2, D.1-4, B.2-8
`
`ISSN 0302-9743
`ISBN 3-540-66696-6 Springer-Verlag Berlin Heidelberg New York
`
`This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
`concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
`reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
`or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
`in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
`liable for prosecution under the German Copyright Law.
`© Springer-Verlag Berlin Heidelberg 1999
`Printed in Germany
`
`Typesetting: Camera-ready by author
`SPIN: 10704208
`06/3142 – 5 4 3 2 1 0
`
`Printed on acid-free paper
`
`

`
`Preface
`
`Background
`
`System interconnection networks have become a critical component of the
`computing technology of the late 1990s, and they are likely to have a great
`impact on the design, architecture, and use of future high-performance
`computers. Indeed, what distinguishes high-performance computers from
`desktop systems today is not only sheer computational speed but also the
`efficient integration of the computing nodes into tightly coupled
`multiprocessor systems. Network adapters, switches, and device driver software
`are increasingly becoming performance-critical components in modern supercomputers.
`Due to the recent availability of fast commodity network adapter cards
`and switches, tightly integrated clusters of PCs or workstations have emerged
`on the market, now filling the gap between desktop systems and
`supercomputers. The use of commercial off-the-shelf (COTS) technology for both
`computing and networking enables scalable computing at relatively low cost.
`Some may disagree, but even the world champion in high-performance
`computing, Sandia Lab's ASCI Red machine, may be seen as a COTS system.
`With just one hardware upgrade (pertaining to the Intel processors, not the
`network), this system has consistently been number one in the TOP500 list of
`the world's fastest supercomputers since its installation in 1997. Clearly,
`the system area network plays a decisive role in overall performance.
`The Scalable Coherent Interface (SCI, ANSI/IEEE Standard 1596-1992)
`specifies one such fast system interconnect, emphasizing the flexibility,
`scalability, and high performance of the network. In recent years, SCI has
`become an innovative and widely discussed approach to interconnecting multiple
`processing nodes in various ways. SCI's flexibility stems mainly from its
`communication protocols: in contrast to many other interconnects, SCI is not
`restricted to either message-based or shared-memory communication models.
`Instead, it combines both, exploiting properties similar to those
`investigated in hybrid machines such as Stanford's FLASH and MIT's
`Alewife. Since SCI also defines a distributed directory-based
`cache coherence protocol, the computer architect can choose from
`a broad range of communication and execution models, including efficient
`message-passing architectures as well as shared-memory models in either
`the NUMA or the CC-NUMA variant.
`European industry and research institutions have played a key role in the
`SCI standardization process. Based on SCI adapter cards, switches, and fully
`integrated cluster systems manufactured by European companies, the SCI
`community in Europe has made, and continues to make, significant
`contributions to the development of, and state-of-the-art research on, this
`important interconnect.
`
`Purpose of the Book
`
`From many discussions with friends, colleagues, and potential users, we found
`that one significant barrier to the widespread deployment and use of SCI is
`the lack of a clear understanding of how SCI works, how it is being used in
`building clusters, and how obstacles in its deployment can be avoided. Our goal
`in compiling this book is to remove this barrier by providing in-depth
`information on the technology and applications of SCI from various perspectives.
`The book focuses on SCI clusters built from commodity PCs or workstations
`and SCI adapters, since they represent the mainstream and most
`cost-effective application of SCI to date.
`In addition, some challenging research issues, mostly pertaining to
`shared-memory programming on SCI clusters, are discussed, and potential
`improvements for SCI cluster equipment are highlighted.
`Who is the intended audience? The relevance of the book for computer
`architects is obvious, given the importance of system area networks for
`modern high-performance computers. But the book is also intended for system
`administrators and compute center managers who plan to invest in cluster
`technology with COTS components. Furthermore, researchers and students
`wanting to contribute to this interesting technology with their own hardware
`or software developments might find this book helpful.
`
`Organization of the Book
`
`The book consists of nine parts, each subdivided into chapters covering
`individual topics. Taken together, the contributions cover the complete
`hardware/software spectrum of SCI clusters, ranging from the major concepts of
`SCI, through SCI hardware, networking, low-level software issues, and
`various programming models and environments, up to tools and application
`experiences.
`Part I introduces the SCI standard and its application in practical
`computer systems. SCI is put into context by comparing its concepts, architecture,
`and performance with those of its strongest competitor, Myrinet, and with the
`proprietary Cray T3D interconnection network, which set the standards back
`in 1993.
`Part II looks at the hardware. It describes two implementations of SCI
`adapters: the commercial, widely used Dolphin SCI cards for the PCI and
`SBus I/O buses, and the prototype adapter developed at TU München, which
`can be extended by special hardware for monitoring the SCI packet flow.
`Building on the hardware, Part III explores how to build SCI
`interconnection networks and analyzes various critical aspects of SCI networks,
`among them ringlet scalability and potential performance degradation by
`hardware-generated retry traffic.
`Part IV moves on to software, describing the functionality and concrete
`implementations of SCI device drivers and introducing a low-level API that
`abstracts away SCI's distributed shared memory (DSM) implementation
`details from higher-level software.
`The first class of parallel and distributed programming models, namely
`message-passing libraries on top of SCI, is covered in Part V. The chapters
`report on projects that implemented sockets, TCP/IP, PVM, and MPI with
`high efficiency on top of SCI by making judicious use of the SCI DSM and
`related features.
`As pointed out by the contributions in Part VI, developing shared-memory
`programming environments on SCI clusters with current SCI hardware and
`driver software is more challenging than implementing message-passing
`libraries. Partly due to the lack of well-established shared-memory standards,
`the approaches described are widely diverse. They range from specific shared
`virtual memory systems on top of SCI to a fully transparent, distributed thread
`system and to shared, parallel objects extending a CORBA middleware
`implementation. The chapters discuss some of the limitations of current SCI
`cluster equipment and present potential routes for future developments.
`Real-world experiences with SCI clusters are reported in Part VII. As
`a reference, benchmark and application performance results from the very
`large SCI clusters operated at PC² Paderborn are given first. The
`parallelization approaches and performance results of two projects, a
`complex molecular dynamics code and a real-time data acquisition and filtering
`application prototype for high-energy physics, are then described as examples
`of real-world uses of SCI clusters.
`Part VIII deals with tools for SCI clusters, which apparently are still in
`their infancy. Therefore, only two basic SCI monitors, one implemented in
`hardware, the other in software, and their potential applications are presented
`here. In addition, a powerful system management tool, developed to operate
`the large Paderborn clusters as general-purpose, multi-user compute servers,
`is introduced.
`Both the SCI standard and SCI interconnects are still evolving in terms of
`standardization, product development, research findings, and applications.
`Therefore, in the final part, Part IX, one of the designers of SCI, David
`Gustavson, describes his view of the prospects for SCI.
`
`Acknowledgements
`
`With great pleasure, we acknowledge the efforts of the many individuals who
`have contributed to the development of this book. First and foremost, we
`thank the authors for their enthusiasm, time, and expertise, which made this
`book possible. We are also grateful to the people who helped organize the
`book, especially Oliver Heinz (PC² Paderborn), Hans-Hermann Frese (ZIB
`Berlin), and Angelika Rossak (University of Klagenfurt). The European
`Commission provided financial support through the ESPRIT IV Programme's SCI
`Working Group (EP 22582). Finally, we acknowledge the help of Alfred
`Hofmann and Antje Endemann of Springer-Verlag, who were always competent,
`professional, and efficient partners to work with.
`
`September 1999
`
`Hermann Hellwagner
`Alexander Reinefeld
`
`

`
`Table of Contents
`
`Part I. SCI and Competitive Interconnects for Cluster Computing
`
`1. The SCI Standard and Applications of SCI
`Hermann Hellwagner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
`1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
`1.2 SCI Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
`1.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
`1.2.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
`1.2.3 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
`1.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
`1.3 The SCI Standard and Some Extensions . . . . . . . . . . . . . . . . . . . 11
`1.3.1 Logical Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
`1.3.2 Cache Coherence Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
`1.3.3 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
`1.4 Applications of SCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
`1.4.1 System Area Network for Clusters . . . . . . . . . . . . . . . . . . 23
`1.4.2 Memory Interconnect for Cache-Coherent
`Multiprocessors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
`1.4.3 I/O Subsystem Interconnect . . . . . . . . . . . . . . . . . . . . . . . 30
`1.4.4 Large-Scale Data Acquisition System . . . . . . . . . . . . . . . 31
`1.5 Related Communication Networks and Concepts . . . . . . . . . . . 31
`1.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
`
`2. A Comparison of Three Gigabit Technologies:
`SCI, Myrinet and SGI/Cray T3D
`Christian Kurmann, Thomas Stricker . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
`2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
`2.2 Levels of Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
`2.2.1 Direct Deposit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
`2.2.2 Message Passing (MPI/PVM) . . . . . . . . . . . . . . . . . . . . . . 42
`2.2.3 Protocol Emulation (TCP/IP) . . . . . . . . . . . . . . . . . . . . . 44
`2.3 Gigabit Network Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
`2.3.1 The Intel 80686 Hardware Platform . . . . . . . . . . . . . . . . . 46
`2.3.2 Myricom Myrinet Technology . . . . . . . . . . . . . . . . . . . . . . 47
`2.3.3 Dolphin PCI-SCI Technology . . . . . . . . . . . . . . . . . . . . . . 48
`2.3.4 The SGI/Cray T3D – A Reference Point . . . . . . . . . . . . 48
`2.3.5 ATM: QoS – But Still Short of a Gigabit/s . . . . . . . . . . 50
`2.3.6 Gigabit Ethernet – An Outlook . . . . . . . . . . . . . . . . . . . . 50
`2.4 Transfer Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
`2.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
`2.4.2 "Native" and "Alternate" Transfer Modes in the Three
`Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
`2.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
`2.5.1 Performance of Local Memory Copy . . . . . . . . . . . . . . . . 58
`2.5.2 Performance of Direct Transfers to Remote Memory . . 58
`2.5.3 Performance of MPI/PVM Transfers . . . . . . . . . . . . . . . . 61
`2.5.4 Performance of TCP/IP Transfers . . . . . . . . . . . . . . . . . . 64
`2.5.5 Discussion and Comparison . . . . . . . . . . . . . . . . . . . . . . . . 65
`2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
`
`Part II. SCI Hardware
`
`3. Dolphin SCI Adapter Cards
`Marius Christian Liaaen, Hugo Kohmann . . . . . . . . . . . . . . . . . . . . . . . 71
`3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
`3.2 Overview of the Adapter Cards . . . . . . . . . . . . . . . . . . . . . . . . . . 71
`3.3 Operating Modes of the SCI Cards . . . . . . . . . . . . . . . . . . . . . . . 73
`3.4 SCI Requester . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
`3.4.1 Address Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
`3.4.2 SCI Transaction Handling . . . . . . . . . . . . . . . . . . . . . . . . . 75
`3.4.3 SCI Packet Requester . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
`3.5 SCI Responder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
`3.5.1 Mailbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
`3.5.2 Access Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
`3.5.3 Atomic Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
`3.5.4 Host Bridge Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . 80
`3.6 DMA Transfers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
`3.6.1 DMA Transfers on the SBus Card . . . . . . . . . . . . . . . . . . 80
`3.6.2 DMA Transfers on the PCI Card . . . . . . . . . . . . . . . . . . . 80
`3.7 Interrupter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
`3.8 Concurrency Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
`3.8.1 Write Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
`3.8.2 Efficient Store Barrier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
`3.9 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
`3.10 Applications and Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
`3.10.1 SAN Interface Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
`3.10.2 Remote I/O Connection and Data Acquisition . . . . . . . 83
`3.10.3 Switches and Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
`3.11 Cluster Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
`
`4. The TUM PCI/SCI Adapter
`Georg Acher, Wolfgang Karl, Markus Leberecht . . . . . . . . . . . . . . . . . . 89
`4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
`4.2 The PCI/SCI Adapter Architecture . . . . . . . . . . . . . . . . . . . . . . . 90
`4.3 SCI Packet Encoding and Decoding . . . . . . . . . . . . . . . . . . . . . . . 92
`4.3.1 Overview of Packet Processing . . . . . . . . . . . . . . . . . . . . . 92
`4.3.2 Choosing the Technology . . . . . . . . . . . . . . . . . . . . . . . . . . 92
`4.3.3 Internal Structure of the FPGA . . . . . . . . . . . . . . . . . . . . 93
`4.3.4 Structure of the Packet Manager as a Microcode
`Sequencer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
`4.3.5 Microcode Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
`4.3.6 Benefits of the Micro Sequencer . . . . . . . . . . . . . . . . . . . . 98
`4.4 The SCI Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
`4.5 Preliminary Results for the PCI/SCI Adapter . . . . . . . . . . . . . . 99
`4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
`4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
`
`Part III. Interconnection Networks with SCI
`
`5. Low-Level SCI Protocols and Their Application to
`Flexible Switches
`Andreas C. Döring, Wolfgang Obelöer, Gunther Lustig, Erik Maehle 105
`5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
`5.2 Data Format of SCI Packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
`5.3 Flow Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
`5.3.1 Flow Control in Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
`5.3.2 Packet Sequence in SCI . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
`5.3.3 Determination of State Transitions . . . . . . . . . . . . . . . . . 109
`5.4 Bandwidth Multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
`5.4.1 Bandwidth Management in One Ring . . . . . . . . . . . . . . . 110
`5.4.2 Idle Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
`5.4.3 Time-Out Determination . . . . . . . . . . . . . . . . . . . . . . . . . . 113
`5.5 Network Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
`5.5.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
`5.5.2 Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
`5.6 Routers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
`5.6.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
`5.6.2 Products and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 116
`5.6.3 Flexible Router . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
`5.6.4 Strip-off Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
`5.6.5 Routing Decision and Topology . . . . . . . . . . . . . . . . . . . . 119
`5.7 Rule-Based Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
`5.8 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
`
`6. SCI Rings, Switches, and Networks for Data Acquisition
`Systems
`Harald Richter, Richard Kleber, Matthias Ohlenroth . . . . . . . . . . . . . 125
`6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
`6.2 SCI-based Data Acquisition Systems . . . . . . . . . . . . . . . . . . . . . . 126
`6.3 SCINET Test Beds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
`6.4 Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
`6.5 SCI Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
`6.6 Efficient Use of SCI Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
`6.7 Multistage SCI Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
`6.8 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
`6.9 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
`
`7. Scalability of SCI Ringlets
`Geir Horn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
`7.1 Do SCI Ringlets Scale in Number of Nodes? . . . . . . . . . . . . . . 151
`7.2 Ringlet Bandwidth Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
`7.2.1 Transaction Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
`7.2.2 Packet Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
`7.2.3 Address Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
`7.2.4 Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
`7.2.5 Bypass Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
`7.2.6 Echo Packet Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
`7.2.7 Output Link Utilization Factor . . . . . . . . . . . . . . . . . . . . . 160
`7.3 Scalability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
`7.3.1 Common Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
`7.3.2 Uniform Ringlet Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . 162
`7.3.3 Non-uniform Ringlet Traffic . . . . . . . . . . . . . . . . . . . . . . 162
`7.3.4 Changing Packet Lengths . . . . . . . . . . . . . . . . . . . . . . . . 163
`7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
`7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
`
`8. Affordable Scalability Using Multi-Cubes
`Håkon Bugge, Knut Omang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
`8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
`8.2 Interconnect Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
`8.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
`8.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
`8.4.1 "Hot-Link" Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
`8.4.2 "Hot-B-Link" Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
`8.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
`8.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
`
`Part IV. Device Driver Software and Low-Level APIs
`
`9. Interfacing SCI Device Drivers to Linux
`Roger Butenuth, Hans-Ulrich Heiss . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
`9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
`9.2 Layers of Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
`9.2.1 Address Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
`9.2.2 Levels of Hardware Abstraction . . . . . . . . . . . . . . . . . . . . 180
`9.2.3 Resource Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
`9.2.4 Virtual Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
`9.2.5 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
`9.3 Why Linux? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
`9.4 Interfaces of the Driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
`9.4.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
`9.4.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
`9.4.3 User Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
`9.4.4 SCI Drivers on Other Nodes . . . . . . . . . . . . . . . . . . . . . . . 188
`9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
`
`10. SCI Physical Layer API
`Volker Lindenstruth, David B. Gustavson . . . . . . . . . . . . . . . . . . . . . . 191
`10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
`10.1.1 Scope of the Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
`10.2 SCI Physical Layer API Architecture and Features. . . . . . . . . . 193
`10.2.1 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
`10.2.2 Endianness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
`10.3 Supported Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
`10.4 Miscellaneous Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
`10.5 Address Translation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
`10.5.1 Global Object Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
`10.5.2 SCI Global Address Resolution . . . . . . . . . . . . . . . . . . . . . 200
`10.6 Shared Memory Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
`10.7 Packet Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
`10.8 Block Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
`10.9 Message Passing Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
`10.10 Cache Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
`10.11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
`
`Part V. Message Passing Libraries
`
`11. SCI Sockets Library
`Hermann Hellwagner, Josef Weidendorfer . . . . . . . . . . . . . . . . . . . . . . 209
`11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
`11.1.1 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
`11.1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
`11.2 Features and Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
`11.2.1 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
`11.2.2 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
`11.2.3 Communication via the SSLib . . . . . . . . . . . . . . . . . . . . . . 212
`11.2.4 Connection Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
`11.2.5 Handling Special System Calls . . . . . . . . . . . . . . . . . . . . . 216
`11.2.6 Other Calls Intercepted and Handled by the SSLib . . . 218
`11.2.7 Out-of-Band Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
`11.3 Implementation Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
`11.3.1 Communication Among Components . . . . . . . . . . . . . . . . 218
`11.3.2 SSLib Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
`11.3.3 Choice of Most Efficient Communication Mechanism . . 220
`11.3.4 SSLib Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
`11.3.5 Control Transfers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
`11.4 Functional Tests and Performance . . . . . . . . . . . . . . . . . . . . . . . . 222
`11.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
`11.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
`
`12. TCP/IP over SCI under Linux
`Hüseyin Taskin, Roger Butenuth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
`12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
`12.2 SCIP Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
`12.2.1 Packet Driver Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
`12.2.2 Hardware Address Resolution . . . . . . . . . . . . . . . . . . . . . . 232
`12.2.3 Other Implementation Issues . . . . . . . . . . . . . . . . . . . . . . . 233
`12.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
`12.3.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
`12.3.2 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
`12.3.3 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
`12.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
`
`13. PVM for SCI Clusters
`Markus Fischer, Alexander Reinefeld . . . . . . . . . . . . . . . . . . . . . . . . . . 239
`13.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
`13.2 Parallel Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
`13.2.1 PVM Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
`13.2.2 Models for Zero-Memory-Copy Data Transfer . . . . . . . . 241
`13.3 SCI Communication Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
`13.4 PVM-SCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
`13.4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
`13.4.2 Supporting Multiple Interconnects . . . . . . . . . . . . . . . . . . 245
`13.4.3 Reducing Memory Copies . . . . . . . . . . . . . . . . . . . . . . . . . 245
`13.4.4 Ring Buffer Management . . . . . . . . . . . . . . . . . . . . . . . . . . 246
`13.4.5 Performance Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
`13.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
`
`14. ScaMPI – Design and Implementation
`L.P. Huse, K. Omang, H. Bugge, H. Ry, A.T. Haugsdal, E. Rustad 249
`14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
`14.2 Scali Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
`14.3 The SCI Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
`14.3.1 Coordinating Use of Shared Locations. . . . . . . . . . . . . . . 251
`14.3.2 Ensuring Safe Data Transport in SCI – Checkpointing 252
`14.3.3 Shared Address Space Programming without the
`Drawbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
