throbber
Proceedings
`
`Santa Barbara, California
`April 25-28, 1995
`
`Sponsored by
`IEEE Computer Society Technical Committee
`On Parallel Processing
`
`IEEE Computer Society Press + The Institute of Electrical and Electronics Engineers, Inc.
`
`-
`
`BUNGIE - EXHIBIT 1027
`
`

`

`IEEE Computer Society Press
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`
`Copyright © 1995 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved.
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that
`carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They
`reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and
`without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society Press, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Press Order Number PR07074
`IEEE Catalog Number 95TH8052
`IEEE Computer Society ISBN 0-8186-7074-6
`IEEE ISBN 0-7803-2530-3 (microfice)
`ISSN 1063-7133
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Press
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`Email: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: + 1-908-981-1393
`Fax: + 1-908-981-9667
`
`IEEE Computer Society
`13, A venue de I' Aquil on
`B-1200 Brussels
`BELGIUM
`Tel: +32-2-770-2198
`Fax: +32-2-770-8505
`
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minami-Aoyama
`Minato-ku, Tokyo 107
`JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`
`Editorial production by Bob Werner
`Cover art production by Michael Nomura
`Printed in the United States of America by Braun-Brumfield, Inc.
`
`•
`
`The Institute of Electrical and Electronics Engineers, Inc.
`
`

`

`Keynote Address: Modeling Parallel Communication
`Richard Karp, University of California, Berkeley ______________________ 2 1 1 ) C
`/C/Ct j
`
`Session 1: Networks
`Chair: Allan Gottlieb, New York University
`
`EMS
`LIB
`Contents fA
`
`'·: \
`·1/.
`-fi(',-
`
`The Partitioned Optical Passive Stars (POPS) Topology ___________________ 4
`G. Gravenstreter, Rami Melhem, D.M. Chiarulli,
`S.P. Levitan, and J.P. Teza
`
`The Mcube: A Symmetrical Cube Based Network with Twisted Links ______________ l l
`Nitin K. Singhvi and Kanad Ghose
`
`Multi-Mesh-An Efficient Topology for Parallel Processing _________________ 17
`Debasish Das and Bhabani P. Sinha
`
`Accuracy vs. Performance in Parallel Simulation of Interconnection Networks ___________ 22
`Douglas C. Burger and David A. Wood
`
`Fat-tree for Local Area Multiprocessors _________________________ 32
`Qiang Li and David Gustavson
`
`On Generalized Fat Trees ______________________________ 37
`Sabine R. Ohring, Maximilian /be[,
`Sajal K. Das, and Mohan J. Kumar
`
`Session 2: Scientific Computing I
`Chair: John Gustafson, Ames Laboratory
`
`Parallel Monte Carlo Simulation of MBE Growth _____________________ 46
`Isabel Beichl, Y. Ansel Teng, and Jim Blue
`
`Monte Carlo and Molecular Dynamics Simulations Using p4 _________________ 53
`K.J. Runge, P. Lee, J. Correa, R. T. Scalettar, and V. Oklobdzija
`
`Performance Evaluation of a Seismic Data Analysis Kernel on the KSR Multiprocessors _______ 60
`Weiming Gu
`
`Performance Evaluation of a New Parallel Preconditioner __________________ 65
`Keith Gremban, Gary Miller, and Marco 'Zagha
`
`A General Purpose Sparse Matrix Parallel Solvers Package __________________ 70
`Hong Q. Ding and Robert D. Ferraro
`
`Parallel Algorithms for Space-Time Adaptive Processing ___________________ 77
`Serge J. Olszanskyj, James M. Lebak, and Adam W. Bojanczyk
`
`Session 3: Graph Algorithms
`Chair: Oscar Ibarra, University of California, Santa Barbara
`
`Parallel Algorithms for Maximum Matching in Interval Graphs _________________ 84
`M.G. Andrews, M.J. Atallah, D.Z. Chen, and D.T. Lee
`
`An EREW PRAM Fully-Dynamic Algorithm for MST ___________________ 93
`Paolo Ferragina
`
`V
`
`

`

`~. · Recognizing Depth-First-Search Trees in Parallel _____________________ 101
`C.H. Peng, B.F. Wang, and J.S. Wang
`
`Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD
`Computer with Virtual Processing ___________________________ 106
`Tsan-sheng Hsu, Vijaya Ramachandran, and Nathaniel Dean
`
`A Highly Parallel Algorithm to Approximate MaxCut on Distributed Memory Architectures ______ 113
`Steven Homer and Marcus Peinado
`
`A Distributed Algorithm for the Detection of Local Cycles and Knots ______________ 118
`Azzedine Boukerche and Carl Trapper
`
`Session 4: Communication and 1/0
`Chair: C.S. Raghavendra, Washington State University
`
`PCODE: An Efficient and Reliable Collective Communication Protocol for
`Unreliable Broadcast Domains ____________________________ 130
`Jehoshua Bruck, Danny Dolev, Ching-Tien Ho,
`Rimon Orni, and Ray Strong
`
`Experience with Active Messages on the Meiko CS-2 ____________________ 140
`Klaus E. Schauser and Chris J. Scheiman
`
`Performance of the Vesta Parallel File System ______________________ 150
`Dror G. Feitelson, Peter F. Corbett, and Jean-Pierre Prost
`
`VIP-FS: A Virtual, Parallel File System for High Performance Parallel and
`DistributedComputing _______________________________ 159
`Juan Miguel del Rosario, Michael Harry, and Alok Choudhary
`
`Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor ___________ 165
`Apratim Purakayastha, Carla Schlatter Ellis, David Kotz,
`Nils Nieuwejaar, and Michael Best
`
`Parallel Algorithms for Database Operations and a Database Operation for Parallel Algorithms _____ 173
`Uzi Vishkin and Rajeev Raman
`
`Session 5: Non-Numeric Algorithms and Applications I
`Chair: Sanjay Ranka, Syracuse University
`
`Solving the Traveling Salesman Problem with a Distributed
`Branch-and-Bound Algorithm on a 1024 Processor Network _________________ 182
`S. Tschoke, R. Luling, and B. Monien
`
`J. Sequence Comparison on a Cluster of Workstations Using the PVM System ___________ 190
`
`X. Guan, R.J. Mural, and E. C. Uberbacher
`B-Trees with Relaxed Balance ____________________________ 196
`Kim S. Larsen and Rolf Fagerberg
`
`Fast Parallel Algorithms for Minimum and Related Problems with Small Integer Inputs _ __ _ ___ 203
`Omer Berkman and Yossi Matias
`
`A Note on Reducing Parallel Model Simulations to Integer Sorting _______________ .208
`Yossi Matias and Uzi Vishkin
`
`The Parameterized Round-Robin Partitioned Algorithm for Parallel External Sort __________ .213
`Honesty C. Young and Arun N. Swami
`
`vi
`
`

`

`Session 6: Partitioning and Data Distribution
`Chair: Prith Banerjee, University of Illinois, Urbana-Champaign
`
`Partitioning Regular Grid Applications with Irregular Boundaries for
`Cache-Coherent Multiprocessors ___________________________ 222
`Yang Zeng and Santosh G. Abraham
`
`Statement-Level Independent Partitioning of Uniform Recurrences _______________ 229
`J. Ramanujam and S. Vasanthakumar
`
`A Synthesis Method of LSGP Partitioning for Given-Shape Regular Arrays ____________ 234
`G.M. Megson and Xian Chen
`
`Hierarchical Tiling for Improved Superscalar Performance __________________ 239
`La.rry Carter, Jeanne Ferrante, and Susan Flynn Hummel
`
`Replication of Uniformly Accessed Shared Data for Large-Scale Data-Parallel Algorithms _ __ _ __ 246
`Chung-Ming Chen and Sao-Young Lee
`
`The Emulation Problem on Trees ___________________________ 25I
`Daw-Jong Shyu, Biing-Feng Wang, and Chuan-Yi Tang
`
`Session 7: Synchronization and Scheduling
`Chair: D.K. Panda, Ohio State University
`
`A Performance Comparison of Fast Distributed Mutual Exclusion Algorithms ___________ .258
`Theodore Johnson
`
`ALLNODE Barrier Synchronization Network ______________________ .265
`Howard T. Olnowich
`
`Efficient Implementation of Mutual Exclusion Locks in Large Multiprocessors ___________ .270
`Nian-Feng Tzeng and Shiwa S. Fu
`
`Bicriterion Scheduling of Identical Processing Time Jobs by Uniform Processors __________ 276
`A. Tuzikov, M. Makhaniok, and R. Manner
`
`Fast Scheduling of Periodic Tasks on Multiple Resources __________________ 280
`Sanjoy K. Baruah, Johannes Gehrke, and C. Greg Plaxton
`
`A Parallel Approach for Multiprocessor Scheduling ____________________ 289
`lshfaq Ahmad and Yu-Kwong Kwok
`
`Session 8: Parallel Algorithms on Networks
`Chair: David Nassimi, New Jersey Institute of Technology
`
`A Simple Voronoi Diagram Algorithm for a Reconfigurable Mesh _______________ 296
`Hossam ElGindy and La.chlan Wetherall
`
`Robust Shearsort on Incomplete Bypass Meshes _____________________ 304
`Behrooz Parhami and Ching Yu Hung
`
`Fault-Tolerant Sorting in SIMD Hypercubes ______________________ 3.12
`A. Mishra, Y. Chang, L. Bhuyan, and F. Lombardi
`
`A Faster Sorting Algorithm in the Broadcast Communication Model ______________ 319
`Stephan Olariu and James L. Schwing
`
`Efficient Algorithms for Global Data Communication on the Multidimensional Torus Network _____ 324
`Paraskevi Fragopoulou and Selim Aki
`
`Vil
`
`

`

`A Near-Optimal Algorithm for Gossiping in ad-Dimensional Mesh Bus Interconnection Network ____ 331
`Arun Jagota
`
`Session 9: Compiler Techniques
`Chair: Alan Sussman, University of Maryland
`
`Combining Dependence and Data Flow Anaylses to Optimize Communication ___________ 340
`Ken Kennedy and Nenad Nedeljkovic
`
`Parallelizing While Loops for Multiprocessor Systems ___________________ 347
`Lawrence Rauchwerger and David Padua
`Symbolic Range Propagation ____________________________ 357
`William Blume and Rudolf Eigenmann
`
`Construction DO Loops for Non-Convex Iteration Spaces in Compiling for Parallel Machines _____ .364
`Jingling Xue
`
`Compiler Techniques for Increasing CU/PE Overlap in SIMD Machines _____________ 3,69
`Gene Saghi and Howard Jay Siegel
`
`Keynote Address: The New Era of MPPs: Moderately Parallel Processors
`Forest Baskett, Silicon Graphics, Inc.
`
`Session 10: Parallel Architectures
`Chair: Jack Dennis, Massachusetts Institute of Technology
`
`The Impact of Pipelining on SIMD Architectures _____________________ 380
`James D. Allen and David E. Schimmel
`
`An Assessment of COMA Multiprocessors _______________________ 388
`Gyungho Lee
`
`A Workload Characterization for Coarse-Grain Multiprocessors ________________ 393
`Christopher Connelly and Carla Schlatt{?r Ellis
`Unified vs. Split TLBs and Caches in Shared-Memory MP Systems _______________ 398
`Qidong Xu and Patricia J. Teller
`
`Access Order to Avoid Inter-Vector-Conflicts in Complex Memory Systems ___________ 404
`A.M. de[ Corral and J.M. Llaberia
`
`A Unified Theory for a Traffic Analysis in Product Networks _________________ 41 l
`Abdelghani Bellaachia and Abdou Youssef
`
`Session 11: Scientific Computing II
`Chair: Vipin Kumar, University of Minnesota
`
`Geometric Mesh Partitioning: Implementation and Experiments ________________ 418
`John R. Gilbert, Gary L. Miller, and Shang-Hua Teng
`
`Non-Uniform 2-D Grid Partitioning for Heterogeneous Parallel Architectures ___________ 428
`Phyllis Crandall and Michael J. Quinn
`
`Toward Data Distribution Independent Parallel Matrix Multiplication ______________ 436
`Hyuk J. Lee and Jose A.B. Fortes
`
`viii
`
`

`

`Multi-Phase Array Redistribution: Modeling and Evaluation _________________ 441
`S.D. Kaushik, C.-H Huang, J. Ramanujam, and P. Sadayappan
`
`Minimizing Communication Overhead for Matrix Inversion Algorithms on Hypercubes _______ 446
`Xiadong Wang and Vwani P. Roychowdhury
`
`Folding Spatial Image Filters on the CM-5 ______________________ 451
`Sandra G. Dykes and Xiaodong Zhang
`
`Session 12: Resource Management
`Chair: Sartaj Sahni, University of Florida
`
`Robust Parallel Resource Management in Shared Memory Multiprocessor Systems _________ 458
`I-Ling Yen and Farokh B. Bastani
`
`Efficient Processor Allocation for 30 Tori _______________________ 466
`Wenjian Qiao and Lionel M. Ni
`
`An Analytical Comparison of Nearest Neighbor Algorithms for Load
`Balancing in Parallel Computers ___________________________ 472
`Chengzhong Xu, Burkhard Monien, Reinhard Luling, and Francis Lau
`
`Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in
`Coherent Shared-Memory Systems __________________________ 480
`Michael Marchetti, Leonidas Kontothanassis,
`Ricardo Bianchini, and Michael Scott
`
`Operating System Support for Concurrent Remote Task Creation ________________ 486
`Dejan S. Milojicic, David Black, and Steve Sears
`
`Industrial Track: Invited Vendor Presentations
`Industrial Track Chair: John K. Antonio, Purdue University
`
`Industrial Track: Session-I
`Architectures and Instrumentation
`Chair: Richard C. Metzger, Rome Laboratory
`
`SPI: An Instrumentation Development Environment for Parallel/Distributed Systems ________ 494
`Devesh Bhatt, Rakesh Jha, Todd Steeves,
`Rashmi Bhatt, and David Wills (Honeywell Technology Center)
`
`A Rugged Scalable Parallel System ___ __ __________ __ ___ __ _ _ .502
`Alan Smeyne and John R. Nickolls (Litton Guidance and Control Systems, Inc. and
`MasPar Computer Corp.)
`
`The RACE Network Architecture __________________________ ~508
`Bradley C. Kuszmaul (Mercury Computer Systems, Inc.)
`
`Industrial Track: Session-II
`Applications and Programming
`Chair: Sajal K. Das, University of North Texas
`
`Parallel Innovation for the Real World ______________ (paper not received at presstime)
`Peter Madams (nCUBE)
`
`IX
`
`

`

`Performance Results of Several High Performance Fortran Benchmarks _____________ 516
`Douglas Miles, Larry Meadows, and Mark Young (The Portland Group, Inc.)
`
`Transformable Computers & Hardware Object Technology _________________ 518
`John Schewe[, Michael Thornburg, and Steve Casselman (Virtual Computer Corporation)
`
`Session 13: Routing
`Chair: Assaf Schuster, Technion
`
`A Novel Deadlock-Free Routing Technique for a Class of de Bruijn Graph Based Networks ______ 524
`Hyunmin Park and Dharma P. Agrawal
`
`An Efficient Scheme for Complete Exchange in 20 Tori __________________ 532
`Yu-Chee Tseng, Sandeep KS. Gupta, and Dhabaleswar K. Panda
`
`DISHA: A Deadlock Recovery Scheme for Fully Adaptive Routing _______________ 537
`Anjan K. V. and Timothy Mark Pinkston
`
`Efficient Communication Using Total-Exchange _____________________ .544
`Satish B. Rao, Torsten Suet, Thanasis Tsantilas, and Mark Goudreau
`
`A Fast Distributed Optimal Routing Algorithm for Multicommodity Large Data Networks __ __ __ 551
`Garng M. Huang and Shan Zhu
`
`Efficient Routing and Message Bounds for Optimal Parallel Algorithms _____________ .556
`Xiaotie Deng and Patrick Dymond
`
`Session 14: Non-Numeric Algorithms and Applications II
`Chair: Charles Weems, University of Massachusetts
`
`&ACE: A High-Performance Parallel Prolog System ___________________ .564
`Enrico Pontelli, Gopal Gupta, and Manuel Hermenegildo
`
`Process Combination to Increase Event Granularity in Parallel Logic Simulation __________ 572
`Timothy J. McBrayer and Philip Wilsey
`
`Parallel Algorithms for Logic Synthesis Using the MIS Approach - - - - - - - - - - - - - -~579
`Kaushik De, John Chandy, Sumit Roy,
`Steven Parkes, and Prithviraj Banerjee
`
`Load Sharing in the Training Set Partition Algorithm for Parallel Neural Learning ------------'586
`Bernard Girau and Helene Paugam-Moisy
`
`Reconfiguration and Experiments on a Faulty Hypercube - - - - - - - - - - - - - - - - -~5 9 2
`Guanghua Lin and Nian-Feng Tzeng
`
`Time Scale Combining of Conservative Parallel Discrete Event Simulations ___________ 599
`Hatem Sellami and Sudhakar Yalamanchili
`
`Session 15: Tracing and Performance Tools
`Chair: Andrew Chien, University of Illinois, Urbana-Champaign
`
`Capturing and Automating Performance Diagnosis: The Poirot Approach ____________ 606
`B. Robert Helm, Allen D. Malony, and Stephen F. Fickas
`
`Monitoring and Controlling Remote Parallel Computations Using Schooner ___________ 614
`Zhanliang Chen and Richard D. Schlichting
`
`X
`
`

`

`Automated Instrumentation and Monitoring of Data Movement in Parallel Programs ________ 621
`Sekhar R. Sarukkai, Jerry Yan, and Melisa Schmidt
`
`Performance Prediction for Portable Parallel Execution on MIMD Architectures __________ 630
`Gopal Chillariga and Balkrishna Ramkumar
`
`Symbolic Performance Prediction of Scalable Parallel Programs ________________ 635
`Mark J. Clement and Michael J. Quinn
`
`A Visualization-Based Environment for Top-Down Debugging of Parallel Programs ________ 640
`Joseph L. Sharnowski and Betty H.C. Cheng
`
`Panel 1: Networked Parallel Processing:
`Is It the Future of Scalable Computing?
`Moderator: Kai Hwang, University of Southern California
`
`Panelists:
`
`Tilak Agerwala, IBM Kingston, New York; James Cownie, Meiko Corp.; David Du, University of Minnesota; Kai
`Li, Princeton University; Doug Pase, Cray Research, Inc.; Michael Quinn, Oregon State University; John Silvester,
`University of Southern California; Richard Stellwagen, NCR/AT&T, San Diego; Steve Wallach, Convex Computer
`Corp.; David Wood, University of Wisconsin, Madison
`
`Keynote Address: Parallel Computing
`Sartaj Sahni, University of Florida
`
`Session 16: Global Operations and Clocks
`Chair: James Goodman, University of Wisconsin, Madison
`
`Global Reduction in Wormhole k-ary n-cube Networks with Multidestination Exchange Worms ____ 652
`Dhabaleswar K. Panda
`
`Broadcasting on the Star and Pancake Interconnection Networks ________________ 660
`Ke Qiu
`
`Time Synchronization on SP! and SP2 Parallel Systems ___________________ 666
`Biilent Abati and Craig B. Stunkel
`
`Hardware for Fast Global Operations on Multicomputers __________________ 673
`Douglas V. Hall and Michael A. Driscoll
`
`Timestamp Consistency and Trace-Driven Analysis for Distributed Parallel Systems ________ 680
`C. Eric Wu, Yew-Huey Liu, and Yarsun Hsu
`
`Session 17: Visualization and Image Processing
`Chair: Charles Weems, University of Massachusetts
`
`Parallel Implementation of Ray-Tracing Algorithm on the Intel Delta Parallel Computer _______ 688
`Tong-Yee Lee, C.S. Raghavendra, and John B. Nicholas
`
`Parallel Volume Rendering _____________________________ 693
`Ruediger Westermann
`
`Parallel Implementation of Volume Rendering on Denali Graphics Systems ___________ 700
`Sheng-Yih Guan, Avi Bleiweiss, and Richard Lipes
`
`An Optimal Parallel Algorithm for Volume Ray Casting ___________________ 707
`Vineet Goel and Amar Mukherjee
`
`XI
`
`

`

`Parallel Motion Computing on the MasPar MP-2 Machine __________________ 712
`A. Branca, A. Distante, and H.R.P. Ellingworth
`
`Scalable Parallel List Ranking of Image Edges on Fine-Grained Machines ____________ 7 l 7
`Jamshed N. Patel, Ashfaq A. Khokhar, and Leah H. Jamieson
`
`Session 18: Parallel Programming
`Chair: Joel Saltz, University of Maryland
`J Integrating Task and Data Parallelism with the Group Communication Archetype __________ 724
`K. Mani Chandy, Rajit Manohar, Berna L. Massingill, and Daniel l.Meiron
`
`Divide-and-Conquer Programming on MIMD Computers __________________ 734
`Santhosh Kumaran and Michael J. Quinn
`
`New Data-Parallel Language Features for Sparse Matrix Computations _____________ 742
`Manuel Ujaldon, Emilio Zapata, Barbara M. Chapman, and Hans P. Zima
`
`Visualizing the Execution of High Performance Fortran (HPF) Programs _____________ 750
`Doug Kime/man, Pradeep Mittal, Edith Schonberg, Peter F. Sweeney,
`Ko-Yang Wang, and Dror Zernik
`
`Parallelization of a Wave Propagation Application using a Data Parallel Compiler _________ 760
`Francoise Andre, Marc Le Fur, Yves Maheo, and Jean-Louis Pazat
`
`On Deriving Data Parallel Code from a Functional Program _________________ 766
`Patrice Quinton, Sanjay Rajopadhye, and Doran Wilde
`
`Session 19: Special Purpose Architectures
`Chair: N. Ranganathan, University of South Florida
`
`Synapse-I: A High-Speed General Purpose Parallel Neurocomputer System ___________ 774
`U. Ramacher, W. Raab, J. Anlauf, U. Hachmann, J. Beichter, N. Bruis,
`M. Webeling, E. Sicheneder, J. Gliij3, A. Wurz, and R. Manner
`
`NERY: A Parallel Processor for Standard·Genetic Algorithms ________________ 782
`R. Hauser, R. Manner, and M. Makhaniok
`
`Performance Measurements of a Concurrent Production System Architecture without
`Global Synchronization ______________________________ 790
`Jose Nelson Amaral and Joydeep Ghosh
`
`Parallel Processing Algorithms and Architecture for Multimedia On-Demand Servers ________ 798
`Raja Neogi and Meghanad D. Wagh
`
`Evaluating Bayes Nets with Concurrent Process Networks __________________ 805
`E. Tick and Bruce D'Ambrosia
`
`Session 20: Run -Time Support for Irregular Parallelism
`Chair: Leah Jamieson, Purdue University
`
`Index Translation Schemes for Adaptive Computations on Distributed Memory Multicomputers ____ 812
`Bongki Moon, Mustafa Uysal, and Joel Saltz
`
`Exploiting Spatial Regularity in Irregular Iterative Applications ________________ 820
`Antonio Lain and Prithviraj Banerjee
`
`Data Parallel Programming in an Adaptive Environment ___________________ 827
`Guy Edjlali, Gagan Agrawal, Alan Sussman, and Joel Saltz
`
`XU
`
`

`

`Run-Time Support for Asynchronous Parallel Computations _________________ 833
`Guillermo A. Alvarez, Marcelo 0. Fernandez, Rogelio Alvez,
`Sylvia Rodriguez, Julio A. Sanchez Avalos, and Jorge L.C. Sanz
`
`An Adaptive Synchronization Method for Unpredictable Communication
`Patterns in Dataparallel Programs __________________________ 838
`Sundeep Prakash and Rajive Bagrodia
`
`Panel 2: Different Approaches to Parallel Computing Education
`Moderator: Russ Miller, Department of Computer Science, State University of New York at Buffalo ___ 846
`
`Panelists:
`
`Daniel C. Hyde, Computer Science Department, Bucknell University; David Kotz, Department of Computer Science,
`Dartmouth College; Gordon Makinson, Senior Lecturer Computer Science, Computing Laboratory, University of
`Kent, UK; P. Takis Metaxas, Computer Science Department, Wellesley College; Chris Nevison, Computer Science
`Department, Colgate University; Nan C. Schaller, Computer Science Department, Rochester Institute of Technology;
`Greg V. Wilson, Computer Science Department, University of Toronto.
`
`Index of Authors __________________ .:....._ _______ .849
`
`xiii
`
`

`

`Efficient Algorithms for Global Data Communication
`on the Multidimensional Torus Network
`
`Paraskevi Fragopoulou and Selim G. Aki
`( fragopou, akl )@qucis.queensu.ca
`Department of Computing & Information Science, Queen's University
`Kingston, Ontario, Canada, K7L 3N6
`
`Abstract
`
`Efficient interproces.rnr communication is crucial to
`increasing the performanci: of parallel multiproce.ssors.
`In this paper, a special framework is developed on /he
`multidimensional torus, a network /hat is c1irrently re(cid:173)
`ceiving considerable attention. Using this framnvork
`as the basic tool, a spannmg graph with special prop(cid:173)
`erties, to fit uarious communicatio11 need.s,
`i.s con(cid:173)
`structed on the network. The importance of tl,is !Jrnph
`is demonstrated with th c development of optimal algo(cid:173)
`rithms for thret: fundamental commm11cat1on problon.s,
`namely the m1tltinode broadcasting and the. single-node
`a11d multinode scattering under the store-and-forw,i.rd,
`all-port communication model.
`
`1
`
`Introduction
`
`In this paper we concentrate on three fnndanwn(cid:173)
`tal communication problems, namely the multinode
`broadcasting, and the single-node and multinode srnt(cid:173)
`tcring, on the popular multidimensional turu.~ network.
`Broadcasting is the distribution of the same group of
`messages from a source processor to all other pro(cid:173)
`ce8sors, and scattering is the distiibution of distinct.
`groups of messages from a source processor to all other
`processors. We consider scatt.eriug from a single pro(cid:173)
`ce,;sor of the network ( single-node scatt.eriug). Further(cid:173)
`more, we consider broadcasting and scattering simulta(cid:173)
`neously from all processors of the network ( mnltinode
`broadcasting and scattering). Algorithms for the same
`communication problems using a similar approach have
`been previously construd.ed on the binary hypercnhe
`[l, 7], the generalized hypercube (4], and the star net.(cid:173)
`works [3].
`All of the communication problems are studied
`under the store-and-forward, all-port conununicatiou
`model, meaning that in one time step a processor can
`ex,:hange messages of fixed length with all of its neigh-
`
`hors simultaneously. The communication is bidirw(cid:173)
`tional, meaning that an edge can be used for message
`transmission in both dired.ious at each time step. Each
`message requires unit. tiuw to he transn1it.ted on an
`edge, i.e.
`the unit cost. model is used. Finally, each
`processor wishes to transrnit a muubn of M messages
`to each one of its destination processors.
`All of t.he algorithm,; pre;;ent.Pd in this paper are
`based on the const.rud.ion of a spanning graph with
`special properties on the nmltidimensional torus net(cid:173)
`work. A special framework is developed to fadlit.ate
`t.he construction of the spanning graph and t.hP devel(cid:173)
`opment of the comm1mirntio11 algorithms. The mnlt.in(cid:173)
`ode hroacka.<;ting and scat.teriug prohlems ,ue of special
`interest. A special tedmiqne is developed on the multi(cid:173)
`dimensional t.orus so that 111,•ssagt>s originating at indi(cid:173)
`vidual nodes are inter!t,aved in surh a manner that. no
`two messages contend for the ,;au11• edge at. any give11
`t.ime during the execut.iou of the algorithm. For t.hP
`single-node scatt.,·ring prohlPIJl, wllt're th,· edges inci(cid:173)
`dent. to the source nodt> ronst.itut..• a hot.t.li·nel'k for t.lw
`transmission of the 1nessagPs, the spanning graph offers
`the capability to transrnit. au ,•qmil m1rnlwr of messages
`ovPr earh edge inriclent. to tlw somre nod,·.
`
`2 Notations and Definitions
`
`An 11-dimensional, k-ary, multidimensional toms
`net.work, denoted hy JHT,.,,., is an undirected graph
`of kn nodes, each one labeled by fill 11 digit. numlwr in
`radix k arithmetic [2, 8]. Each node ,, is ro1mect.ed t.o
`211 other nodes with which it diffrrs iu ouly one digit
`by j mod k, j E { -1, 1}, i.e ..
`
`Vn-1 ···tli+l Vi1'i-1 ···Vo, Vn-1 · .. Vt+I V~Vi-) ... Vo
`is an edge for all O s; i s; 11 - 1, and v: = ( v, + j) mod k,
`j E {-1, l}, Fig, 1. The nt>t.work is ecl~e and node
`symnwt.rir with degre(' 211 and diitmPtN 11 Li J. MT,. k
`
`10ti3-7133/95 $4.00 rQ 1995 IEE~;
`
`324
`
`

`

`of the multidimensional t.orus has properties similar to
`those of the right cyclic shift operation on nodes of the
`binary hypercuhe [7, l].
`
`Definition 2 Given function r(i) == (k -
`i) mod k
`from the set {O, 1, ... , k - 1} lo it.self, the rotation of
`a node v = Vn-l···Vi+1v;v;-1, .. Vo, denoted by Ro(v),
`is defined to be node r(vo)vn-\Vn-2, .. V;+,v;v;-1 ... V\.
`This can be viewed as a right cyclic .~hift of the digits
`of v with the wraparound digit being ma71ped through
`function r.
`
`The rotation operation has the following properties:
`1. If ( v, u) is an edge of dimension rl;, 0 ::; i S n - 1,
`j E {-1, 1 }, then edge ( Ro(v), Ro( 11 )) is of dimen(cid:173)
`sion gf,' so that:
`
`i' = (i - 1) mod n,
`
`j' = {
`
`-j,
`j,
`
`if i = 0,
`otherwise.
`
`2. As a result of prop,.rty 1, the 2n edges derived
`from edge (v, u) by consecutive applications of the
`rotation operation art' all of different clinwnsions.
`
`3. The rotation oprrat.iou pn·sf'rves tht> dist.au, .. of
`each node from nod,· O".
`
`The nodes of AlTn,k an• grou1wd into ,•quivalt>uce
`classPs under the act.iou of tht' rot.at.ion oprration as
`follows:
`
`Definition 3 A II ordered gro1171 of 1rndes, Meli one de(cid:173)
`rwed from 1/s preceding one c,yclirall1J, hy the applica(cid:173)
`tion of a rotation 1s cal/rd a necklace.
`
`Necklaces have t.he following prop..rtit's (5]:
`
`1. A necklace contains al 1110.9/ 211 nodes.
`
`2. The size of a nf'rklilre always divides 2u (6].
`
`3. All nodes of a llt>rklart' HrP at. t.hf' salllt' distance
`form node 0" .
`
`The nodes of A1T,,,k at. earh distauc.e from node O"
`are collections of necklaces. Below, the nt'cklaces of
`MTJ,3 at. each dist.au re d, 0 S d S 11 l # J, from node O"
`are given en dosed in pan·ut.heses (5]. -
`
`d = 0:
`cl= l :
`d = 2:
`d = 3:
`
`( 000)
`( 200, 020, 002, 100, 010, 001 )
`( 220, 022. 102, 110, 011, 201 )
`( 210, 021, 202, 120, 012, 101)
`( 222, 122, 112, 111. 211, 221)
`( 21i, 121 )
`
`Figure I: The MT2,4 network.
`
`belongs to the class of Cayley graphs [8]. The 2n gen(cid:173)
`erators that define the edges of MTn,k are denoted
`by g{, 0 S
`i S n - 1,
`j E {-1, 1}. Generator
`gf connects node v = Vn-l···v;+1v,u;-1 ... vo to node
`v' = Vn-1 .. ,vi+1((t,, + j) mod k)v;.-1 ... vo, whkh re(cid:173)
`sults by adding j mod k t,o the ith digit of v. In this
`case we say that edge (v, v') is of dimension f7i. or
`dim(v, v') = gf. In what follows, node 00 ... 0 of MT,..k
`is referred to as node on. In order to avoid tedious
`details, in what follows, we consider only multidimen(cid:173)
`sional torus networks with k odd.
`We now define two automorphisms on the multidi(cid:173)
`mensional torus network,namely, the tra11slatio11 and
`the rotation operations, that will be of primary impor(cid:173)
`tance for the const.ruc:tion of the spanning graph and
`the development. of the communication algorithrns.
`
`Definition 1 The translation of a node v with resp1·ct
`to node s, denoted by Tr,(v), is defined to be 11ode
`t = Tr,(v), so that t; = (v; +s;) mod k, 0 :S: i :S: 11- 1.
`The translation operation is an automorphism on
`the multidimensional torus that preserves the dimen(cid:173)
`sion of each edge [5]. This automorphism allows the
`network to be viewed idf'ntically from all nodes. The
`translation operation on 1\t!Tn,k is analogous to tl1f'
`exclusive-OR operation on nodes of thf' binary hyper(cid:173)
`cuhe [7, 1].
`As emphasized in the introduction, the multinode
`broadcasting and scattering algorithms are designed so
`that. messages originating at individual nodes are intrr(cid:173)
`leaved in such a manner that no two messages cont.end
`for the same edge at any given time. The propi"rties
`of the rotation operation, as explained below, will help
`achieve this attribute. The rotation operation on nod,•s
`
`325
`
`

`

`Definition 4 The penod of a node v, denoted by P(11),
`i.s defined to be the number of nodes contained in the
`necklace to which it belongs.
`
`Definition 5 An unfolded necklace
`is a11 ordered
`grnup of exactly 2n nodes, not neces.sarily disti11ct, each
`one obtained from its preceding one, cyclically, by the
`application of a rotation.
`
`Each necklace has a corresponding unfolded nt>ck(cid:173)
`lace. For necklaces that contain 2n nodes, the <"or(cid:173)
`responding unfolded necklace is I.he necklace itself.
`For necklaces that contain P < 211 nodPs, the corre(cid:173)
`sponding unfolded necklace is the ne<"klacl" rep.,atl"d
`~' times. This is possible since the size of a ni·rklare
`is always a divisor of2n. Below, the unfolded w•rklaces
`of MT3 ,3 at each distanre from node 0°, are given (5).
`
`d= 0:
`d= 1:
`d= 2:
`
`d= 3:
`
`( 000, 000, 000, 000, 000, 000 )
`( 200, 020, 002, 100, 010, 001 I
`( 220, 022, 102, 110, 011, 201 )
`( 210, 021, 202, 120, 012, 101 )
`( 222. 122, 112, 111, 211, 221
`( 212, 121, 212, 121, 212, 121 )
`
`The following definit.ion aims to distinguish 01w par(cid:173)
`ticular node of each necklace.
`
`Definition 6 The binary correspondrnt of a 11odr " of
`M1'n,k is the binar·y number obtainl'rl if we substilufr
`rnch nonzero digit in II with 1. One parlzcular nodl' of
`each necklace is now dl:lting-ui.~hed as follows:
`
`I. Select the nodes of the nerklaee that ht111f'
`largest bi11a1·y corHspondenl.
`
`/ht
`
`:?. Choo.se the /argfst ,w1011g the nodt's sdfflrd i11 s/q1
`1, if the k digits I/tat are used to labd tllf node.s
`of MTn,k are ordered as follows: 0 < 1 < I.: -- 1 <
`i < ... < f !1- This ordering of 1hr
`... i < k -
`digit.'/ is adopted in order io re,flerl how each digit
`co11tribute,'i to the distance of a node from nod( O".
`
`The node selected 111 step 2 is defined to be thr lr11df1'
`of the necklace.
`
`The property of the rotation operation that 211 edges
`each of which is obtained as a rotation of its pr.,n•,ling
`011e are all of different dimensions, along with th,• prop(cid:173)
`rrty of edge d

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket