`
`Santa Barbara, California
`April 25-28, 1995
`
`Sponsored by
`IEEE Computer Society Technical Committee
`On Parallel Processing
`
`IEEE Computer Society Press + The Institute of Electrical and Electronics Engineers, Inc.
`
`-
`
`BUNGIE - EXHIBIT 1027
`
`
`
`IEEE Computer Society Press
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`
`Copyright © 1995 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved.
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that
`carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They
`reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and
`without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society Press, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Press Order Number PR07074
`IEEE Catalog Number 95TH8052
`IEEE Computer Society ISBN 0-8186-7074-6
`IEEE ISBN 0-7803-2530-3 (microfice)
`ISSN 1063-7133
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Press
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`Email: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: + 1-908-981-1393
`Fax: + 1-908-981-9667
`
`IEEE Computer Society
`13, A venue de I' Aquil on
`B-1200 Brussels
`BELGIUM
`Tel: +32-2-770-2198
`Fax: +32-2-770-8505
`
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minami-Aoyama
`Minato-ku, Tokyo 107
`JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`
`Editorial production by Bob Werner
`Cover art production by Michael Nomura
`Printed in the United States of America by Braun-Brumfield, Inc.
`
`•
`
`The Institute of Electrical and Electronics Engineers, Inc.
`
`
`
`Keynote Address: Modeling Parallel Communication
`Richard Karp, University of California, Berkeley ______________________ 2 1 1 ) C
`/C/Ct j
`
`Session 1: Networks
`Chair: Allan Gottlieb, New York University
`
`EMS
`LIB
`Contents fA
`
`'·: \
`·1/.
`-fi(',-
`
`The Partitioned Optical Passive Stars (POPS) Topology ___________________ 4
`G. Gravenstreter, Rami Melhem, D.M. Chiarulli,
`S.P. Levitan, and J.P. Teza
`
`The Mcube: A Symmetrical Cube Based Network with Twisted Links ______________ l l
`Nitin K. Singhvi and Kanad Ghose
`
`Multi-Mesh-An Efficient Topology for Parallel Processing _________________ 17
`Debasish Das and Bhabani P. Sinha
`
`Accuracy vs. Performance in Parallel Simulation of Interconnection Networks ___________ 22
`Douglas C. Burger and David A. Wood
`
`Fat-tree for Local Area Multiprocessors _________________________ 32
`Qiang Li and David Gustavson
`
`On Generalized Fat Trees ______________________________ 37
`Sabine R. Ohring, Maximilian /be[,
`Sajal K. Das, and Mohan J. Kumar
`
`Session 2: Scientific Computing I
`Chair: John Gustafson, Ames Laboratory
`
`Parallel Monte Carlo Simulation of MBE Growth _____________________ 46
`Isabel Beichl, Y. Ansel Teng, and Jim Blue
`
`Monte Carlo and Molecular Dynamics Simulations Using p4 _________________ 53
`K.J. Runge, P. Lee, J. Correa, R. T. Scalettar, and V. Oklobdzija
`
`Performance Evaluation of a Seismic Data Analysis Kernel on the KSR Multiprocessors _______ 60
`Weiming Gu
`
`Performance Evaluation of a New Parallel Preconditioner __________________ 65
`Keith Gremban, Gary Miller, and Marco 'Zagha
`
`A General Purpose Sparse Matrix Parallel Solvers Package __________________ 70
`Hong Q. Ding and Robert D. Ferraro
`
`Parallel Algorithms for Space-Time Adaptive Processing ___________________ 77
`Serge J. Olszanskyj, James M. Lebak, and Adam W. Bojanczyk
`
`Session 3: Graph Algorithms
`Chair: Oscar Ibarra, University of California, Santa Barbara
`
`Parallel Algorithms for Maximum Matching in Interval Graphs _________________ 84
`M.G. Andrews, M.J. Atallah, D.Z. Chen, and D.T. Lee
`
`An EREW PRAM Fully-Dynamic Algorithm for MST ___________________ 93
`Paolo Ferragina
`
`V
`
`
`
`~. · Recognizing Depth-First-Search Trees in Parallel _____________________ 101
`C.H. Peng, B.F. Wang, and J.S. Wang
`
`Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD
`Computer with Virtual Processing ___________________________ 106
`Tsan-sheng Hsu, Vijaya Ramachandran, and Nathaniel Dean
`
`A Highly Parallel Algorithm to Approximate MaxCut on Distributed Memory Architectures ______ 113
`Steven Homer and Marcus Peinado
`
`A Distributed Algorithm for the Detection of Local Cycles and Knots ______________ 118
`Azzedine Boukerche and Carl Trapper
`
`Session 4: Communication and 1/0
`Chair: C.S. Raghavendra, Washington State University
`
`PCODE: An Efficient and Reliable Collective Communication Protocol for
`Unreliable Broadcast Domains ____________________________ 130
`Jehoshua Bruck, Danny Dolev, Ching-Tien Ho,
`Rimon Orni, and Ray Strong
`
`Experience with Active Messages on the Meiko CS-2 ____________________ 140
`Klaus E. Schauser and Chris J. Scheiman
`
`Performance of the Vesta Parallel File System ______________________ 150
`Dror G. Feitelson, Peter F. Corbett, and Jean-Pierre Prost
`
`VIP-FS: A Virtual, Parallel File System for High Performance Parallel and
`DistributedComputing _______________________________ 159
`Juan Miguel del Rosario, Michael Harry, and Alok Choudhary
`
`Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor ___________ 165
`Apratim Purakayastha, Carla Schlatter Ellis, David Kotz,
`Nils Nieuwejaar, and Michael Best
`
`Parallel Algorithms for Database Operations and a Database Operation for Parallel Algorithms _____ 173
`Uzi Vishkin and Rajeev Raman
`
`Session 5: Non-Numeric Algorithms and Applications I
`Chair: Sanjay Ranka, Syracuse University
`
`Solving the Traveling Salesman Problem with a Distributed
`Branch-and-Bound Algorithm on a 1024 Processor Network _________________ 182
`S. Tschoke, R. Luling, and B. Monien
`
`J. Sequence Comparison on a Cluster of Workstations Using the PVM System ___________ 190
`
`X. Guan, R.J. Mural, and E. C. Uberbacher
`B-Trees with Relaxed Balance ____________________________ 196
`Kim S. Larsen and Rolf Fagerberg
`
`Fast Parallel Algorithms for Minimum and Related Problems with Small Integer Inputs _ __ _ ___ 203
`Omer Berkman and Yossi Matias
`
`A Note on Reducing Parallel Model Simulations to Integer Sorting _______________ .208
`Yossi Matias and Uzi Vishkin
`
`The Parameterized Round-Robin Partitioned Algorithm for Parallel External Sort __________ .213
`Honesty C. Young and Arun N. Swami
`
`vi
`
`
`
`Session 6: Partitioning and Data Distribution
`Chair: Prith Banerjee, University of Illinois, Urbana-Champaign
`
`Partitioning Regular Grid Applications with Irregular Boundaries for
`Cache-Coherent Multiprocessors ___________________________ 222
`Yang Zeng and Santosh G. Abraham
`
`Statement-Level Independent Partitioning of Uniform Recurrences _______________ 229
`J. Ramanujam and S. Vasanthakumar
`
`A Synthesis Method of LSGP Partitioning for Given-Shape Regular Arrays ____________ 234
`G.M. Megson and Xian Chen
`
`Hierarchical Tiling for Improved Superscalar Performance __________________ 239
`La.rry Carter, Jeanne Ferrante, and Susan Flynn Hummel
`
`Replication of Uniformly Accessed Shared Data for Large-Scale Data-Parallel Algorithms _ __ _ __ 246
`Chung-Ming Chen and Sao-Young Lee
`
`The Emulation Problem on Trees ___________________________ 25I
`Daw-Jong Shyu, Biing-Feng Wang, and Chuan-Yi Tang
`
`Session 7: Synchronization and Scheduling
`Chair: D.K. Panda, Ohio State University
`
`A Performance Comparison of Fast Distributed Mutual Exclusion Algorithms ___________ .258
`Theodore Johnson
`
`ALLNODE Barrier Synchronization Network ______________________ .265
`Howard T. Olnowich
`
`Efficient Implementation of Mutual Exclusion Locks in Large Multiprocessors ___________ .270
`Nian-Feng Tzeng and Shiwa S. Fu
`
`Bicriterion Scheduling of Identical Processing Time Jobs by Uniform Processors __________ 276
`A. Tuzikov, M. Makhaniok, and R. Manner
`
`Fast Scheduling of Periodic Tasks on Multiple Resources __________________ 280
`Sanjoy K. Baruah, Johannes Gehrke, and C. Greg Plaxton
`
`A Parallel Approach for Multiprocessor Scheduling ____________________ 289
`lshfaq Ahmad and Yu-Kwong Kwok
`
`Session 8: Parallel Algorithms on Networks
`Chair: David Nassimi, New Jersey Institute of Technology
`
`A Simple Voronoi Diagram Algorithm for a Reconfigurable Mesh _______________ 296
`Hossam ElGindy and La.chlan Wetherall
`
`Robust Shearsort on Incomplete Bypass Meshes _____________________ 304
`Behrooz Parhami and Ching Yu Hung
`
`Fault-Tolerant Sorting in SIMD Hypercubes ______________________ 3.12
`A. Mishra, Y. Chang, L. Bhuyan, and F. Lombardi
`
`A Faster Sorting Algorithm in the Broadcast Communication Model ______________ 319
`Stephan Olariu and James L. Schwing
`
`Efficient Algorithms for Global Data Communication on the Multidimensional Torus Network _____ 324
`Paraskevi Fragopoulou and Selim Aki
`
`Vil
`
`
`
`A Near-Optimal Algorithm for Gossiping in ad-Dimensional Mesh Bus Interconnection Network ____ 331
`Arun Jagota
`
`Session 9: Compiler Techniques
`Chair: Alan Sussman, University of Maryland
`
`Combining Dependence and Data Flow Anaylses to Optimize Communication ___________ 340
`Ken Kennedy and Nenad Nedeljkovic
`
`Parallelizing While Loops for Multiprocessor Systems ___________________ 347
`Lawrence Rauchwerger and David Padua
`Symbolic Range Propagation ____________________________ 357
`William Blume and Rudolf Eigenmann
`
`Construction DO Loops for Non-Convex Iteration Spaces in Compiling for Parallel Machines _____ .364
`Jingling Xue
`
`Compiler Techniques for Increasing CU/PE Overlap in SIMD Machines _____________ 3,69
`Gene Saghi and Howard Jay Siegel
`
`Keynote Address: The New Era of MPPs: Moderately Parallel Processors
`Forest Baskett, Silicon Graphics, Inc.
`
`Session 10: Parallel Architectures
`Chair: Jack Dennis, Massachusetts Institute of Technology
`
`The Impact of Pipelining on SIMD Architectures _____________________ 380
`James D. Allen and David E. Schimmel
`
`An Assessment of COMA Multiprocessors _______________________ 388
`Gyungho Lee
`
`A Workload Characterization for Coarse-Grain Multiprocessors ________________ 393
`Christopher Connelly and Carla Schlatt{?r Ellis
`Unified vs. Split TLBs and Caches in Shared-Memory MP Systems _______________ 398
`Qidong Xu and Patricia J. Teller
`
`Access Order to Avoid Inter-Vector-Conflicts in Complex Memory Systems ___________ 404
`A.M. de[ Corral and J.M. Llaberia
`
`A Unified Theory for a Traffic Analysis in Product Networks _________________ 41 l
`Abdelghani Bellaachia and Abdou Youssef
`
`Session 11: Scientific Computing II
`Chair: Vipin Kumar, University of Minnesota
`
`Geometric Mesh Partitioning: Implementation and Experiments ________________ 418
`John R. Gilbert, Gary L. Miller, and Shang-Hua Teng
`
`Non-Uniform 2-D Grid Partitioning for Heterogeneous Parallel Architectures ___________ 428
`Phyllis Crandall and Michael J. Quinn
`
`Toward Data Distribution Independent Parallel Matrix Multiplication ______________ 436
`Hyuk J. Lee and Jose A.B. Fortes
`
`viii
`
`
`
`Multi-Phase Array Redistribution: Modeling and Evaluation _________________ 441
`S.D. Kaushik, C.-H Huang, J. Ramanujam, and P. Sadayappan
`
`Minimizing Communication Overhead for Matrix Inversion Algorithms on Hypercubes _______ 446
`Xiadong Wang and Vwani P. Roychowdhury
`
`Folding Spatial Image Filters on the CM-5 ______________________ 451
`Sandra G. Dykes and Xiaodong Zhang
`
`Session 12: Resource Management
`Chair: Sartaj Sahni, University of Florida
`
`Robust Parallel Resource Management in Shared Memory Multiprocessor Systems _________ 458
`I-Ling Yen and Farokh B. Bastani
`
`Efficient Processor Allocation for 30 Tori _______________________ 466
`Wenjian Qiao and Lionel M. Ni
`
`An Analytical Comparison of Nearest Neighbor Algorithms for Load
`Balancing in Parallel Computers ___________________________ 472
`Chengzhong Xu, Burkhard Monien, Reinhard Luling, and Francis Lau
`
`Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in
`Coherent Shared-Memory Systems __________________________ 480
`Michael Marchetti, Leonidas Kontothanassis,
`Ricardo Bianchini, and Michael Scott
`
`Operating System Support for Concurrent Remote Task Creation ________________ 486
`Dejan S. Milojicic, David Black, and Steve Sears
`
`Industrial Track: Invited Vendor Presentations
`Industrial Track Chair: John K. Antonio, Purdue University
`
`Industrial Track: Session-I
`Architectures and Instrumentation
`Chair: Richard C. Metzger, Rome Laboratory
`
`SPI: An Instrumentation Development Environment for Parallel/Distributed Systems ________ 494
`Devesh Bhatt, Rakesh Jha, Todd Steeves,
`Rashmi Bhatt, and David Wills (Honeywell Technology Center)
`
`A Rugged Scalable Parallel System ___ __ __________ __ ___ __ _ _ .502
`Alan Smeyne and John R. Nickolls (Litton Guidance and Control Systems, Inc. and
`MasPar Computer Corp.)
`
`The RACE Network Architecture __________________________ ~508
`Bradley C. Kuszmaul (Mercury Computer Systems, Inc.)
`
`Industrial Track: Session-II
`Applications and Programming
`Chair: Sajal K. Das, University of North Texas
`
`Parallel Innovation for the Real World ______________ (paper not received at presstime)
`Peter Madams (nCUBE)
`
`IX
`
`
`
`Performance Results of Several High Performance Fortran Benchmarks _____________ 516
`Douglas Miles, Larry Meadows, and Mark Young (The Portland Group, Inc.)
`
`Transformable Computers & Hardware Object Technology _________________ 518
`John Schewe[, Michael Thornburg, and Steve Casselman (Virtual Computer Corporation)
`
`Session 13: Routing
`Chair: Assaf Schuster, Technion
`
`A Novel Deadlock-Free Routing Technique for a Class of de Bruijn Graph Based Networks ______ 524
`Hyunmin Park and Dharma P. Agrawal
`
`An Efficient Scheme for Complete Exchange in 20 Tori __________________ 532
`Yu-Chee Tseng, Sandeep KS. Gupta, and Dhabaleswar K. Panda
`
`DISHA: A Deadlock Recovery Scheme for Fully Adaptive Routing _______________ 537
`Anjan K. V. and Timothy Mark Pinkston
`
`Efficient Communication Using Total-Exchange _____________________ .544
`Satish B. Rao, Torsten Suet, Thanasis Tsantilas, and Mark Goudreau
`
`A Fast Distributed Optimal Routing Algorithm for Multicommodity Large Data Networks __ __ __ 551
`Garng M. Huang and Shan Zhu
`
`Efficient Routing and Message Bounds for Optimal Parallel Algorithms _____________ .556
`Xiaotie Deng and Patrick Dymond
`
`Session 14: Non-Numeric Algorithms and Applications II
`Chair: Charles Weems, University of Massachusetts
`
`&ACE: A High-Performance Parallel Prolog System ___________________ .564
`Enrico Pontelli, Gopal Gupta, and Manuel Hermenegildo
`
`Process Combination to Increase Event Granularity in Parallel Logic Simulation __________ 572
`Timothy J. McBrayer and Philip Wilsey
`
`Parallel Algorithms for Logic Synthesis Using the MIS Approach - - - - - - - - - - - - - -~579
`Kaushik De, John Chandy, Sumit Roy,
`Steven Parkes, and Prithviraj Banerjee
`
`Load Sharing in the Training Set Partition Algorithm for Parallel Neural Learning ------------'586
`Bernard Girau and Helene Paugam-Moisy
`
`Reconfiguration and Experiments on a Faulty Hypercube - - - - - - - - - - - - - - - - -~5 9 2
`Guanghua Lin and Nian-Feng Tzeng
`
`Time Scale Combining of Conservative Parallel Discrete Event Simulations ___________ 599
`Hatem Sellami and Sudhakar Yalamanchili
`
`Session 15: Tracing and Performance Tools
`Chair: Andrew Chien, University of Illinois, Urbana-Champaign
`
`Capturing and Automating Performance Diagnosis: The Poirot Approach ____________ 606
`B. Robert Helm, Allen D. Malony, and Stephen F. Fickas
`
`Monitoring and Controlling Remote Parallel Computations Using Schooner ___________ 614
`Zhanliang Chen and Richard D. Schlichting
`
`X
`
`
`
`Automated Instrumentation and Monitoring of Data Movement in Parallel Programs ________ 621
`Sekhar R. Sarukkai, Jerry Yan, and Melisa Schmidt
`
`Performance Prediction for Portable Parallel Execution on MIMD Architectures __________ 630
`Gopal Chillariga and Balkrishna Ramkumar
`
`Symbolic Performance Prediction of Scalable Parallel Programs ________________ 635
`Mark J. Clement and Michael J. Quinn
`
`A Visualization-Based Environment for Top-Down Debugging of Parallel Programs ________ 640
`Joseph L. Sharnowski and Betty H.C. Cheng
`
`Panel 1: Networked Parallel Processing:
`Is It the Future of Scalable Computing?
`Moderator: Kai Hwang, University of Southern California
`
`Panelists:
`
`Tilak Agerwala, IBM Kingston, New York; James Cownie, Meiko Corp.; David Du, University of Minnesota; Kai
`Li, Princeton University; Doug Pase, Cray Research, Inc.; Michael Quinn, Oregon State University; John Silvester,
`University of Southern California; Richard Stellwagen, NCR/AT&T, San Diego; Steve Wallach, Convex Computer
`Corp.; David Wood, University of Wisconsin, Madison
`
`Keynote Address: Parallel Computing
`Sartaj Sahni, University of Florida
`
`Session 16: Global Operations and Clocks
`Chair: James Goodman, University of Wisconsin, Madison
`
`Global Reduction in Wormhole k-ary n-cube Networks with Multidestination Exchange Worms ____ 652
`Dhabaleswar K. Panda
`
`Broadcasting on the Star and Pancake Interconnection Networks ________________ 660
`Ke Qiu
`
`Time Synchronization on SP! and SP2 Parallel Systems ___________________ 666
`Biilent Abati and Craig B. Stunkel
`
`Hardware for Fast Global Operations on Multicomputers __________________ 673
`Douglas V. Hall and Michael A. Driscoll
`
`Timestamp Consistency and Trace-Driven Analysis for Distributed Parallel Systems ________ 680
`C. Eric Wu, Yew-Huey Liu, and Yarsun Hsu
`
`Session 17: Visualization and Image Processing
`Chair: Charles Weems, University of Massachusetts
`
`Parallel Implementation of Ray-Tracing Algorithm on the Intel Delta Parallel Computer _______ 688
`Tong-Yee Lee, C.S. Raghavendra, and John B. Nicholas
`
`Parallel Volume Rendering _____________________________ 693
`Ruediger Westermann
`
`Parallel Implementation of Volume Rendering on Denali Graphics Systems ___________ 700
`Sheng-Yih Guan, Avi Bleiweiss, and Richard Lipes
`
`An Optimal Parallel Algorithm for Volume Ray Casting ___________________ 707
`Vineet Goel and Amar Mukherjee
`
`XI
`
`
`
`Parallel Motion Computing on the MasPar MP-2 Machine __________________ 712
`A. Branca, A. Distante, and H.R.P. Ellingworth
`
`Scalable Parallel List Ranking of Image Edges on Fine-Grained Machines ____________ 7 l 7
`Jamshed N. Patel, Ashfaq A. Khokhar, and Leah H. Jamieson
`
`Session 18: Parallel Programming
`Chair: Joel Saltz, University of Maryland
`J Integrating Task and Data Parallelism with the Group Communication Archetype __________ 724
`K. Mani Chandy, Rajit Manohar, Berna L. Massingill, and Daniel l.Meiron
`
`Divide-and-Conquer Programming on MIMD Computers __________________ 734
`Santhosh Kumaran and Michael J. Quinn
`
`New Data-Parallel Language Features for Sparse Matrix Computations _____________ 742
`Manuel Ujaldon, Emilio Zapata, Barbara M. Chapman, and Hans P. Zima
`
`Visualizing the Execution of High Performance Fortran (HPF) Programs _____________ 750
`Doug Kime/man, Pradeep Mittal, Edith Schonberg, Peter F. Sweeney,
`Ko-Yang Wang, and Dror Zernik
`
`Parallelization of a Wave Propagation Application using a Data Parallel Compiler _________ 760
`Francoise Andre, Marc Le Fur, Yves Maheo, and Jean-Louis Pazat
`
`On Deriving Data Parallel Code from a Functional Program _________________ 766
`Patrice Quinton, Sanjay Rajopadhye, and Doran Wilde
`
`Session 19: Special Purpose Architectures
`Chair: N. Ranganathan, University of South Florida
`
`Synapse-I: A High-Speed General Purpose Parallel Neurocomputer System ___________ 774
`U. Ramacher, W. Raab, J. Anlauf, U. Hachmann, J. Beichter, N. Bruis,
`M. Webeling, E. Sicheneder, J. Gliij3, A. Wurz, and R. Manner
`
`NERY: A Parallel Processor for Standard·Genetic Algorithms ________________ 782
`R. Hauser, R. Manner, and M. Makhaniok
`
`Performance Measurements of a Concurrent Production System Architecture without
`Global Synchronization ______________________________ 790
`Jose Nelson Amaral and Joydeep Ghosh
`
`Parallel Processing Algorithms and Architecture for Multimedia On-Demand Servers ________ 798
`Raja Neogi and Meghanad D. Wagh
`
`Evaluating Bayes Nets with Concurrent Process Networks __________________ 805
`E. Tick and Bruce D'Ambrosia
`
`Session 20: Run -Time Support for Irregular Parallelism
`Chair: Leah Jamieson, Purdue University
`
`Index Translation Schemes for Adaptive Computations on Distributed Memory Multicomputers ____ 812
`Bongki Moon, Mustafa Uysal, and Joel Saltz
`
`Exploiting Spatial Regularity in Irregular Iterative Applications ________________ 820
`Antonio Lain and Prithviraj Banerjee
`
`Data Parallel Programming in an Adaptive Environment ___________________ 827
`Guy Edjlali, Gagan Agrawal, Alan Sussman, and Joel Saltz
`
`XU
`
`
`
`Run-Time Support for Asynchronous Parallel Computations _________________ 833
`Guillermo A. Alvarez, Marcelo 0. Fernandez, Rogelio Alvez,
`Sylvia Rodriguez, Julio A. Sanchez Avalos, and Jorge L.C. Sanz
`
`An Adaptive Synchronization Method for Unpredictable Communication
`Patterns in Dataparallel Programs __________________________ 838
`Sundeep Prakash and Rajive Bagrodia
`
`Panel 2: Different Approaches to Parallel Computing Education
`Moderator: Russ Miller, Department of Computer Science, State University of New York at Buffalo ___ 846
`
`Panelists:
`
`Daniel C. Hyde, Computer Science Department, Bucknell University; David Kotz, Department of Computer Science,
`Dartmouth College; Gordon Makinson, Senior Lecturer Computer Science, Computing Laboratory, University of
`Kent, UK; P. Takis Metaxas, Computer Science Department, Wellesley College; Chris Nevison, Computer Science
`Department, Colgate University; Nan C. Schaller, Computer Science Department, Rochester Institute of Technology;
`Greg V. Wilson, Computer Science Department, University of Toronto.
`
`Index of Authors __________________ .:....._ _______ .849
`
`xiii
`
`
`
`Efficient Algorithms for Global Data Communication
`on the Multidimensional Torus Network
`
`Paraskevi Fragopoulou and Selim G. Aki
`( fragopou, akl )@qucis.queensu.ca
`Department of Computing & Information Science, Queen's University
`Kingston, Ontario, Canada, K7L 3N6
`
`Abstract
`
`Efficient interproces.rnr communication is crucial to
`increasing the performanci: of parallel multiproce.ssors.
`In this paper, a special framework is developed on /he
`multidimensional torus, a network /hat is c1irrently re(cid:173)
`ceiving considerable attention. Using this framnvork
`as the basic tool, a spannmg graph with special prop(cid:173)
`erties, to fit uarious communicatio11 need.s,
`i.s con(cid:173)
`structed on the network. The importance of tl,is !Jrnph
`is demonstrated with th c development of optimal algo(cid:173)
`rithms for thret: fundamental commm11cat1on problon.s,
`namely the m1tltinode broadcasting and the. single-node
`a11d multinode scattering under the store-and-forw,i.rd,
`all-port communication model.
`
`1
`
`Introduction
`
`In this paper we concentrate on three fnndanwn(cid:173)
`tal communication problems, namely the multinode
`broadcasting, and the single-node and multinode srnt(cid:173)
`tcring, on the popular multidimensional turu.~ network.
`Broadcasting is the distribution of the same group of
`messages from a source processor to all other pro(cid:173)
`ce8sors, and scattering is the distiibution of distinct.
`groups of messages from a source processor to all other
`processors. We consider scatt.eriug from a single pro(cid:173)
`ce,;sor of the network ( single-node scatt.eriug). Further(cid:173)
`more, we consider broadcasting and scattering simulta(cid:173)
`neously from all processors of the network ( mnltinode
`broadcasting and scattering). Algorithms for the same
`communication problems using a similar approach have
`been previously construd.ed on the binary hypercnhe
`[l, 7], the generalized hypercube (4], and the star net.(cid:173)
`works [3].
`All of the communication problems are studied
`under the store-and-forward, all-port conununicatiou
`model, meaning that in one time step a processor can
`ex,:hange messages of fixed length with all of its neigh-
`
`hors simultaneously. The communication is bidirw(cid:173)
`tional, meaning that an edge can be used for message
`transmission in both dired.ious at each time step. Each
`message requires unit. tiuw to he transn1it.ted on an
`edge, i.e.
`the unit cost. model is used. Finally, each
`processor wishes to transrnit a muubn of M messages
`to each one of its destination processors.
`All of t.he algorithm,; pre;;ent.Pd in this paper are
`based on the const.rud.ion of a spanning graph with
`special properties on the nmltidimensional torus net(cid:173)
`work. A special framework is developed to fadlit.ate
`t.he construction of the spanning graph and t.hP devel(cid:173)
`opment of the comm1mirntio11 algorithms. The mnlt.in(cid:173)
`ode hroacka.<;ting and scat.teriug prohlems ,ue of special
`interest. A special tedmiqne is developed on the multi(cid:173)
`dimensional t.orus so that 111,•ssagt>s originating at indi(cid:173)
`vidual nodes are inter!t,aved in surh a manner that. no
`two messages contend for the ,;au11• edge at. any give11
`t.ime during the execut.iou of the algorithm. For t.hP
`single-node scatt.,·ring prohlPIJl, wllt're th,· edges inci(cid:173)
`dent. to the source nodt> ronst.itut..• a hot.t.li·nel'k for t.lw
`transmission of the 1nessagPs, the spanning graph offers
`the capability to transrnit. au ,•qmil m1rnlwr of messages
`ovPr earh edge inriclent. to tlw somre nod,·.
`
`2 Notations and Definitions
`
`An 11-dimensional, k-ary, multidimensional toms
`net.work, denoted hy JHT,.,,., is an undirected graph
`of kn nodes, each one labeled by fill 11 digit. numlwr in
`radix k arithmetic [2, 8]. Each node ,, is ro1mect.ed t.o
`211 other nodes with which it diffrrs iu ouly one digit
`by j mod k, j E { -1, 1}, i.e ..
`
`Vn-1 ···tli+l Vi1'i-1 ···Vo, Vn-1 · .. Vt+I V~Vi-) ... Vo
`is an edge for all O s; i s; 11 - 1, and v: = ( v, + j) mod k,
`j E {-1, l}, Fig, 1. The nt>t.work is ecl~e and node
`symnwt.rir with degre(' 211 and diitmPtN 11 Li J. MT,. k
`
`10ti3-7133/95 $4.00 rQ 1995 IEE~;
`
`324
`
`
`
`of the multidimensional t.orus has properties similar to
`those of the right cyclic shift operation on nodes of the
`binary hypercuhe [7, l].
`
`Definition 2 Given function r(i) == (k -
`i) mod k
`from the set {O, 1, ... , k - 1} lo it.self, the rotation of
`a node v = Vn-l···Vi+1v;v;-1, .. Vo, denoted by Ro(v),
`is defined to be node r(vo)vn-\Vn-2, .. V;+,v;v;-1 ... V\.
`This can be viewed as a right cyclic .~hift of the digits
`of v with the wraparound digit being ma71ped through
`function r.
`
`The rotation operation has the following properties:
`1. If ( v, u) is an edge of dimension rl;, 0 ::; i S n - 1,
`j E {-1, 1 }, then edge ( Ro(v), Ro( 11 )) is of dimen(cid:173)
`sion gf,' so that:
`
`i' = (i - 1) mod n,
`
`j' = {
`
`-j,
`j,
`
`if i = 0,
`otherwise.
`
`2. As a result of prop,.rty 1, the 2n edges derived
`from edge (v, u) by consecutive applications of the
`rotation operation art' all of different clinwnsions.
`
`3. The rotation oprrat.iou pn·sf'rves tht> dist.au, .. of
`each node from nod,· O".
`
`The nodes of AlTn,k an• grou1wd into ,•quivalt>uce
`classPs under the act.iou of tht' rot.at.ion oprration as
`follows:
`
`Definition 3 A II ordered gro1171 of 1rndes, Meli one de(cid:173)
`rwed from 1/s preceding one c,yclirall1J, hy the applica(cid:173)
`tion of a rotation 1s cal/rd a necklace.
`
`Necklaces have t.he following prop..rtit's (5]:
`
`1. A necklace contains al 1110.9/ 211 nodes.
`
`2. The size of a nf'rklilre always divides 2u (6].
`
`3. All nodes of a llt>rklart' HrP at. t.hf' salllt' distance
`form node 0" .
`
`The nodes of A1T,,,k at. earh distauc.e from node O"
`are collections of necklaces. Below, the nt'cklaces of
`MTJ,3 at. each dist.au re d, 0 S d S 11 l # J, from node O"
`are given en dosed in pan·ut.heses (5]. -
`
`d = 0:
`cl= l :
`d = 2:
`d = 3:
`
`( 000)
`( 200, 020, 002, 100, 010, 001 )
`( 220, 022. 102, 110, 011, 201 )
`( 210, 021, 202, 120, 012, 101)
`( 222, 122, 112, 111. 211, 221)
`( 21i, 121 )
`
`Figure I: The MT2,4 network.
`
`belongs to the class of Cayley graphs [8]. The 2n gen(cid:173)
`erators that define the edges of MTn,k are denoted
`by g{, 0 S
`i S n - 1,
`j E {-1, 1}. Generator
`gf connects node v = Vn-l···v;+1v,u;-1 ... vo to node
`v' = Vn-1 .. ,vi+1((t,, + j) mod k)v;.-1 ... vo, whkh re(cid:173)
`sults by adding j mod k t,o the ith digit of v. In this
`case we say that edge (v, v') is of dimension f7i. or
`dim(v, v') = gf. In what follows, node 00 ... 0 of MT,..k
`is referred to as node on. In order to avoid tedious
`details, in what follows, we consider only multidimen(cid:173)
`sional torus networks with k odd.
`We now define two automorphisms on the multidi(cid:173)
`mensional torus network,namely, the tra11slatio11 and
`the rotation operations, that will be of primary impor(cid:173)
`tance for the const.ruc:tion of the spanning graph and
`the development. of the communication algorithrns.
`
`Definition 1 The translation of a node v with resp1·ct
`to node s, denoted by Tr,(v), is defined to be 11ode
`t = Tr,(v), so that t; = (v; +s;) mod k, 0 :S: i :S: 11- 1.
`The translation operation is an automorphism on
`the multidimensional torus that preserves the dimen(cid:173)
`sion of each edge [5]. This automorphism allows the
`network to be viewed idf'ntically from all nodes. The
`translation operation on 1\t!Tn,k is analogous to tl1f'
`exclusive-OR operation on nodes of thf' binary hyper(cid:173)
`cuhe [7, 1].
`As emphasized in the introduction, the multinode
`broadcasting and scattering algorithms are designed so
`that. messages originating at individual nodes are intrr(cid:173)
`leaved in such a manner that no two messages cont.end
`for the same edge at any given time. The propi"rties
`of the rotation operation, as explained below, will help
`achieve this attribute. The rotation operation on nod,•s
`
`325
`
`
`
`Definition 4 The penod of a node v, denoted by P(11),
`i.s defined to be the number of nodes contained in the
`necklace to which it belongs.
`
`Definition 5 An unfolded necklace
`is a11 ordered
`grnup of exactly 2n nodes, not neces.sarily disti11ct, each
`one obtained from its preceding one, cyclically, by the
`application of a rotation.
`
`Each necklace has a corresponding unfolded nt>ck(cid:173)
`lace. For necklaces that contain 2n nodes, the <"or(cid:173)
`responding unfolded necklace is I.he necklace itself.
`For necklaces that contain P < 211 nodPs, the corre(cid:173)
`sponding unfolded necklace is the ne<"klacl" rep.,atl"d
`~' times. This is possible since the size of a ni·rklare
`is always a divisor of2n. Below, the unfolded w•rklaces
`of MT3 ,3 at each distanre from node 0°, are given (5).
`
`d= 0:
`d= 1:
`d= 2:
`
`d= 3:
`
`( 000, 000, 000, 000, 000, 000 )
`( 200, 020, 002, 100, 010, 001 I
`( 220, 022, 102, 110, 011, 201 )
`( 210, 021, 202, 120, 012, 101 )
`( 222. 122, 112, 111, 211, 221
`( 212, 121, 212, 121, 212, 121 )
`
`The following definit.ion aims to distinguish 01w par(cid:173)
`ticular node of each necklace.
`
`Definition 6 The binary correspondrnt of a 11odr " of
`M1'n,k is the binar·y number obtainl'rl if we substilufr
`rnch nonzero digit in II with 1. One parlzcular nodl' of
`each necklace is now dl:lting-ui.~hed as follows:
`
`I. Select the nodes of the nerklaee that ht111f'
`largest bi11a1·y corHspondenl.
`
`/ht
`
`:?. Choo.se the /argfst ,w1011g the nodt's sdfflrd i11 s/q1
`1, if the k digits I/tat are used to labd tllf node.s
`of MTn,k are ordered as follows: 0 < 1 < I.: -- 1 <
`i < ... < f !1- This ordering of 1hr
`... i < k -
`digit.'/ is adopted in order io re,flerl how each digit
`co11tribute,'i to the distance of a node from nod( O".
`
`The node selected 111 step 2 is defined to be thr lr11df1'
`of the necklace.
`
`The property of the rotation operation that 211 edges
`each of which is obtained as a rotation of its pr.,n•,ling
`011e are all of different dimensions, along with th,• prop(cid:173)
`rrty of edge d