High Performance Computing
Modern Systems and Practices

Thomas Sterling
Matthew Anderson
Maciej Brodowicz
School of Informatics, Computing, and Engineering
Indiana University, Bloomington
`
Foreword by C. Gordon Bell
`
MORGAN KAUFMANN PUBLISHERS
An Imprint of Elsevier
`
IPR2018-01600
EXHIBIT 2069
PATENT OWNER DIRECTSTREAM, LLC
EX. 2083, p. 1
`
`
`
`Morgan Kaufmann is an imprint of Elsevier
`50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
`
Copyright © 2018 Elsevier Inc. All rights reserved.
`
`No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
`mechanical, including photocopying, recording, or any information storage and retrieval system, without
`permission in writing from the publisher. Details on how to seek permission, further information about the
`Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance
`Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
`
This book and the individual contributions contained in it are protected under copyright by the Publisher (other
than as may be noted herein).
`
`Notices
`Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
`understanding, changes in research methods, professional practices, or medical treatment may become
`necessary.
`
`Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
`any information, methods, compounds, or experiments described herein. In using such information or methods
`they should be mindful of their own safety and the safety of others, including parties for whom they have a
`professional responsibility.
`
`To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any
`liability for any injury and/or damage to persons or property as a matter of products liability, negligence or
`otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the
`material herein.
`
`Library of Congress Cataloging-in-Publication Data
`A catalog record for this book is available from the Library of Congress
`
`British Library Cataloguing-in-Publication Data
`A catalogue record for this book is available from the British Library
`
`ISBN: 978-0-12-420158-3
`
`For information on all Morgan Kaufmann publications visit
our website at https://www.elsevier.com/books-and-journals
`
Book Aid International
Working together to grow libraries in developing countries
www.elsevier.com • www.bookaid.org
`
`Publisher: Katey Birtcher
`Acquisition Editor: Steve Merken
`Developmental Editor: Nate McFadden
`Production Project Manager: Punithavathy Govindaradjane
`Designer: Mark Rogers
`
`Typeset by TNQ Books and Journals
`
`
`
`
CHAPTER 1 INTRODUCTION
`
CHAPTER OUTLINE
1.1 High Performance Computing Disciplines ........................................................................... 3
      1.1.1 Definition ........................................................................... 3
      1.1.2 Application Programs ........................................................................... 4
      1.1.3 Performance and Metrics ........................................................................... 4
      1.1.4 High Performance Computing Systems ........................................................................... 5
      1.1.5 Supercomputing Problems ........................................................................... 7
      1.1.6 Application Programming ........................................................................... 8
1.2 Impact of Supercomputing on Science, Society, and Security ........................................................................... 10
      1.2.1 Catalyzing Fraud Detection and Market Data Analytics ........................................................................... 10
      1.2.2 Discovering, Managing, and Distributing Oil and Gas ........................................................................... 10
      1.2.3 Accelerating Innovation in Manufacturing ........................................................................... 10
      1.2.4 Personalized Medicine and Drug Discovery ........................................................................... 11
      1.2.5 Predicting Natural Disasters and Understanding Climate Change ........................................................................... 12
1.3 Anatomy of a Supercomputer ........................................................................... 14
1.4 Computer Performance ........................................................................... 16
      1.4.1 Performance ........................................................................... 16
      1.4.2 Peak Performance ........................................................................... 17
      1.4.3 Sustained Performance ........................................................................... 18
      1.4.4 Scaling ........................................................................... 18
      1.4.5 Performance Degradation ........................................................................... 19
      1.4.6 Performance Improvement ........................................................................... 20
1.5 A Brief History of Supercomputing ........................................................................... 21
      1.5.1 Epoch I-Automated Calculators Through Mechanical Technologies ........................................................................... 22
      1.5.2 Epoch II-von Neumann Architecture in Vacuum Tubes ........................................................................... 24
      1.5.3 Epoch III-Instruction-Level Parallelism ........................................................................... 29
      1.5.4 Epoch IV-Vector Processing and Integration ........................................................................... 30
      1.5.5 Epoch V-Single-Instruction Multiple Data Array ........................................................................... 33
      1.5.6 Epoch VI-Communicating Sequential Processors and Very Large Scale Integration ........................................................................... 34
      1.5.7 Epoch VII-Multicore Petaflops ........................................................................... 37
      1.5.8 Neodigital Age and Beyond Moore's Law ........................................................................... 37
`
High Performance Computing. https://doi.org/10.1016/B978-0-12-420158-3.00001-0
`Copyright© 2018 Elsevier Inc. All rights reserved.
`
`
`
`
`
1.6 This Textbook as a Guide and Tool for the Student ........................................................................... 38
1.7 Summary and Outcomes of Chapter 1 ........................................................................... 39
1.8 Questions and Problems ........................................................................... 40
References ........................................................................... 41
`
Supercomputing, which means supercomputers and their application, is among the most important
developments of the modern age, with unequaled impact across a vast diversity of fields of inquiry
and practical effect. From the extremes of arcane sciences to the most immediate practical concerns,
supercomputers play an essential role in the progress and advancement of human capabilities,
environments, and understanding. No other single technology in the history of humanity has experienced a
similar rate of growth, even in its relatively short existence. Within the span of a single human lifetime,
supercomputers have expanded their ability to perform calculations by a factor of 10 trillion, or 13 orders
of magnitude, and this is a conservative estimate. From fewer than 1000 basic operations per second in
the late 1940s to today's performance in excess of 100 quadrillion floating-point operations per second
(over 100 petaflops), supercomputer speed has steadily improved by about a factor of 200 every
decade through a series of advances in technology, architecture, programming methods, algorithms, and
system software (Fig. 1.1). High performance computing (HPC), synonymous with supercomputing, is a
principal means of exploration, complementing the empirical methods used for more than 2 millennia and
the theory practiced since the age of enlightenment of the last 4 centuries. As the "third pillar" of investigation,
supercomputing enables new paths of inquiry, new techniques of design, and new methods of operating
processes. Even discoveries correctly credited to other classes of tools and instrumentation, such as giant
telescopes or particle accelerators, also require supercomputers to produce their final results
through data analysis (sometimes referred to as "big data"). It can be asserted that supercomputing
allows us to understand the past, to control the present, and in limited cases to predict the future.
The skills required to employ HPC are multiple and complex, and acquiring such
skills to a sufficient degree can take years of study and experience, at least in normal practice.
`
FIGURE 1.1
The Titan petaflops machine fully deployed at Oak Ridge National Laboratory in 2013. It takes up more than
4000 sq ft and consumes approximately 8 MW of electrical power. It has a theoretical peak performance of over
27 petaflops and delivers 17.6 petaflops Rmax sustained performance for the highly parallel Linpack (HPL)
benchmark. This architecture includes Nvidia graphics processing unit accelerators.
Photo courtesy of Oak Ridge National Laboratory, US Dept. of Energy
`
`
`
`
`
This often means lengthy apprenticeships in research facilities in academia, industry, or national
laboratories. There are many books written to teach particular programming languages; others describe
in detail the structures and instruction sets of computer architectures; and still others discuss system
software such as operating systems. But missing has been a single textbook that serves as an entry-level
presentation of all these elements and their interrelationships in one place, combined with
guided hands-on experience. This work, High Performance Computing, is developed as a carefully
crafted synthesis of relevant elements of related disciplines, all of which contribute in critical ways to
supercomputing and its use. This book presents the foundational concepts, in-depth relevant knowledge,
and detailed skills that together will give you a meaningful understanding of HPC and an initial set of
techniques to make you an effective, albeit incipient, practitioner in its use. Throughout this text the
best practices employed by the community are presented alongside training, so that you learn the how
by doing, even as you are gaining understanding of the what and the why.
`This textbook provides a comprehensive introduction to the field of HPC. It is presented in a form
`that will be both intellectually rewarding and practical in teaching useful basic skills. It combines
`perspectives about supercomputing concepts, knowledge about supercomputers, and techniques for
`using and programming supercomputers. But teaching a complex subject like HPC is challenging, in
`that just about everything is defined in terms of and relates to everything else. Yet by the nature of
`pedagogy, material must be presented in some sequential order. This first chapter is a brief introductory
`presentation of the essential elements of HPC to provide an overview of everything; a first pass that
`will allow successive in-depth chapters to be related to this broad context.
The chapter looks at the many facets that constitute HPC. The importance of the material is that it
provides a complete, albeit simplified, perspective of HPC so that more detailed discussion of specifics
`can be understood within the full context. Because no piece makes sense without the others, almost all
`areas are briefly introduced in this chapter. To reinforce the interrelated broad-brush presentation of
`issues, this chapter concludes with a history of the field and its rapid evolution.
`
`1.1 HIGH PERFORMANCE COMPUTING DISCIPLINES
As previously noted, HPC is really a collection of multiple interrelated disciplines, each providing an
important aspect of the total field. To master HPC as a useful tool is to develop an understanding and
associated skills in each of these corresponding areas. These broad areas are described here: a formal
definition of "high performance computing" that applies throughout the treatment of the field; the
end-user application problems that are the intended purpose of HPC across a wide range of science,
engineering, societal, and security domains; the core concept of performance, which is the
distinguishing characteristic of HPC compared to other forms of computing; the hardware and
software components that make up an HPC system; and the environments, tools, application
programming methods, and interfaces used. Each of these is presented in some detail in the following
sections, and together they form a major portion of the concepts, knowledge content, and skills
comprising this textbook.
`
1.1.1 DEFINITION
`HPC is a field of endeavor that relates to all facets of technology, methodology, and application
`associated with achieving the greatest computing capability possible at any point in time and technology.
`It engages a class of electronic digital machines referred to as "supercomputers" to perform a wide array
`
`
`
`
`
`of computational problems or "applications" (alternatively "workloads") as fast as is possible. The
`action of performing an application on a supercomputer is widely termed "supercomputing" and is
`synonymous with HPC.
`
`1.1.2 APPLICATION PROGRAMS
The purpose of HPC is to derive answers to questions that cannot be adequately addressed through
empiricism or theory alone, or even by widely available or accessible commercial computers
(e.g., enterprise servers). Historically supercomputers have been applied to science and engineering,
and the methodology has been described as the "third pillar of science", alongside and complementing
both experimentation (empiricism) and mathematics (theory). But the range of problems that
supercomputers can tackle extends far beyond classical scientific and engineering studies to include
challenges in socioeconomics, big-data management and learning, process control, and national
security. An application, then, is both the problem to be solved and the body of "code", or collection of
ordered computing instructions, that represents the means of solving the problem. The code is the means
by which the user conveys to the supercomputer how it is to perform the necessary computations to
achieve the objectives of the problem. The full set of code used is a "computer program" or just
"program", and the person developing the application code is the "programmer".
`
`1.1.3 PERFORMANCE AND METRICS
`While the notion of performance may be intuitive, it is not simple. There is no single measure of
`performance that fully reflects all aspects of the quality of computer operation. A "metric" is a
`quantifiable observable operational parameter of a supercomputer. Multiple perspectives and related
`metrics are routinely applied to characterize the behavioral properties and capabilities of an HPC
`system. Two basic measures are employed individually or in combination and in differing contexts to
`formulate the values used to represent the quality of a supercomputer. These two fundamental
`measures are "time" and "number of operations" performed, both under prescribed conditions.
`For HPC the most widely used metric is "floating-point operations per second" or "flops". A
`floating-point operation is an addition or multiplication of two real (or floating-point) numbers
`represented in some machine-readable and manipulatable form. Because supercomputers are so
`"powerful", to describe their capability would require phrases like "a trillion or quadrillion operations
`per second". The field adopts the same system of notation as science and engineering, using the
`Greek prefixes kilo, mega, giga, tera, and peta to represent 1000, 1 million, 1 billion, 1 trillion, and 1
quadrillion, respectively. The first supercomputers barely achieved 1 kiloflops (Kflops). Today's fastest
supercomputer exhibits a peak performance on the order of 125 petaflops. The laptop computer upon
which this textbook was written has a peak performance of a few gigaflops. By this metric a
supercomputer is millions of times more powerful than a laptop.
`The true capability of a supercomputer is its ability to perform real work, to achieve useful results
`toward an end goal such as simulating a particular physical phenomenon (e.g., colliding neutron stars
`to determine resulting electromagnetic burst signatures). A better measure than flops is how long a
`given problem takes to complete. But because there are literally thousands (millions?) of such
`problems, this measure is not particularly useful broadly. Thus the HPC community selects specific
`problems around which to standardize. Such standardized application programs are "benchmarks".
`
`
`
`
`
[Performance Development plot: Rmax on a logarithmic scale from 100 Mflop/s to 1 Eflop/s, plotted against year from 1994 to 2016.]
`
`FIGURE 1.2
`The evolution of the Rmax from the HPL benchmark for supercomputing systems in the Top 500 list since the list
`began in 1993. The top line indicates the cumulative performance of all the computers in the list. The middle line
`shows the performance of the number one computer in the list. The bottom line shows the performance of the last
`computer in the list (number 500).
`
`Image courtesy Erich Strohmaier
`
One particularly widely used supercomputer benchmark is "Linpack", or more precisely the "highly
parallel Linpack" (HPL), which solves a set of linear equations in dense matrix form [1]. A benchmark
gives a means of comparative evaluation between two independent systems by measuring their
respective times to perform the same calculation. Thus a second way to measure performance is time
to completion of a fixed problem. The HPC community has selected HPL as a means of ranking
supercomputers, as represented by the "Top 500 list" begun in 1993 (Fig. 1.2). But other benchmarks
are also employed to stress certain aspects of a supercomputer or to represent a certain class of programs.
`
`1.1.4 HIGH PERFORMANCE COMPUTING SYSTEMS
The most visible aspect of the field of HPC is the high performance computers, or simply
supercomputers, themselves. Today these machines appear as rows upon rows of many racks taking up
thousands of square feet and consuming potentially multiple megawatts of electrical power. To be in
the presence of one (which often means to be literally inside it) offers a whole other experience in
terms of noise, rapidly shifting temperature gradients, and many blinking lights. Even the most staid
observer cannot help but be awestruck by the impressive massiveness of such systems, the engineering
by which they are achieved, the commitment they represent to the edges of computing capability, and
the problems only they can solve. And beyond what is visible even to the not-so-casual observer is the
infrastructure that supports the operation of these systems, much of which is below floors, in adjacent
rooms, and outside the building that houses the machine. The deployment of a state-of-the-art
supercomputer is truly a major engineering undertaking involving time, expense, and expertise, as
well as responsible management and maintenance throughout the lifetime of the system. And yet the
`
`
`
`
`
`visible, audible, and other sensory experiences barely reflect the true nature of the accomplishment
`embodied by these machines. At the heart of the HPC system is the structure and organization of its
`myriad components and the semantics or rules by which they operate and perform the user applications
`offered to them. Even more than the hardware, the HPC system is a vast array of software components
`that control the hierarchy of the physical components and manage the user workloads. If the physical
`hardware, racks and all, is what the visitor experiences, the system software, its interfaces, and
`functionality are what the user experiences (usually at a location far from the physical machine) when
`developing and running the applications and analyzing the results.
In one important sense, the high performance computer system has basic functionality and
subsystems in common with the laptop personal computer upon which this textbook was written. These
principal capabilities, shared by both extremes, include the following.
`
`• The operational functions that transform input data values to output results.
`• The internal memory that stores the data upon which the system operates.
`• The communication channels through which intermediate data is transferred between different
`components and subsystems during application execution.
`• The control hardware that coordinates the interoperability among the constituent components and
`subsystems.
`• The mass storage that organizes and holds the persistent data, system software, and application
`programs.
• The input/output (I/O) channels and interfaces (like the keyboard I am typing on and the screen I
am looking at) that connect users to the system.
`
Similarly, the software of an HPC system has much in common with the desk-side workstation or
departmental enterprise server. Like these more pervasive albeit more modest computers, the
supercomputer has a software structure that serves many of the same purposes of interface, control, and
functionality, including but not limited to the following.

• The operating system that manages all aspects of the machine and its operation.
• The compilers that translate application programs written in human-readable syntactic languages
(and other interfaces) to machine-readable binary code.
• File systems that present a logical abstraction of mass storage and organize the data on
mass-storage devices (like hard-disk drives).
• The myriad software drivers of the I/O devices by which the computer communicates with the
external world and users.
• The many tools that make up much of the expected user environments.
`
What distinguishes an HPC system from a conventional computer is the organization,
interconnectivity, and scale of the component resources and the ability of the supporting software to
manage the operation of the system at that scale (Fig. 1.3). By scale is meant the degree of physical
and logical parallelism, i.e., the replication of key physical components such as processors and
memory banks and the delineation of a number of tasks to be performed simultaneously. While even a
single-socket laptop incorporates some parallelism, an HPC system is structured in far more levels,
each of which is usually much more substantial (though there are exceptions). It is this parallel
organization, the methods by which the constituent subsystems are coordinated to solve a shared
problem, and the additional functionality of the system software and programming models providing
`
`
`
`
`
[Diagram: nodes 1 through N, each containing an accelerator and a core array, connected by a global interconnection network.]
`
`FIGURE 1.3
`HPC systems are distinguished from a conventional computer by the organization, interconnectivity, and scale of
`the many component resources illustrated here. A "node" incorporates all the functional elements required for
`computation, and is highly replicated to achieve large scale.
`
`such management that differentiate the supercomputer from its smaller counterparts. But from the
`viewpoint of the programmer, it is the need to think in parallel (many things happening at the same
`time) and distributed (things happening in different places separated by distance) that differentiates the
`supercomputer from the day-to-day computer [2]. This requires knowledge and skill in employing
`programming interfaces that expose and exploit application parallelism and algorithms that permit
`simultaneous operation of many parts of the computation contributing to the final answer.
`
`1.1.5 SUPERCOMPUTING PROBLEMS
The field of supercomputing was born in the midst of revolutionary advances in experimental nuclear
research, and has since grown to affect nearly all research fields driven by experiment. Because the
genesis of supercomputing lies in simulating problems driven by nuclear physics, many supercomputing
problems are framed in the context of tracking large systems of particles consisting of different
species that may interact with one another and are not in equilibrium. Such nonequilibrium problems
are generally difficult to compute analytically and can be very costly to explore experimentally.
Consequently, these types of problems frequently appear on supercomputers, because of both the
high-resolution probing ability of the simulation and the substantially reduced cost at which the
computational experiment can be conducted [3].
Another class of supercomputing problem that overlaps with tracking large systems of particles
with pair-wise interactions is the class that solves some set of partial-differential equations. For
instance, a large fraction of supercomputing time is spent solving the Navier-Stokes equations for
fluid flow because of their relevance to many engineering problems. As a second example, the direct
detection of an astrophysical source of gravitational radiation in 2015 by the LIGO Scientific
Collaboration was supported by millions of hours of supercomputing resources solving the Einstein
field equations to simulate the merger of binary black holes.
`
`
`
`
`
Table 1.1 Supercomputing Problem Representatives and How They Are Used in Academia,
Industry, and Government

Solution of partial-differential equations
  Academia:   Navier-Stokes equations, Einstein equations, Maxwell equations
  Industry:   Black-Scholes equation, Navier-Stokes equations for compressible flow, oil reservoir modeling
  Government: Weather prediction, hurricane modeling, storm-surge modeling, sea-ice modeling

Large systems with pair-wise force interactions
  Academia:   Cosmology, molecular dynamics simulations
  Industry:   Medicine development, biomolecular dynamics
  Government: Plasma modeling

Linear algebra
  Academia:   Supporting solution of partial-differential equations, fundamental benchmarks of HPL and high performance conjugate gradients
  Industry:   Search engine PageRank, finite-element simulations
  Government: HPC machine evaluation, climate modeling

Graph problems
  Academia:   Systems research, machine learning
  Industry:   Fraud detection
  Government: Security services, data analytics

Stochastic systems
  Academia:   Radiation transport, particle physics
  Industry:   Risk analysis in finance, nuclear reactor design, process control
  Government: Public health, modeling spread of disease
`
Many classes of HPC problems are designed around the supercomputer's ability to solve problems
in linear algebra. In science and engineering, discretizing partial-differential equations
frequently yields a system of linear equations. This has led to the development of both direct and
iterative solution techniques for supercomputers. The main benchmark currently used to measure a
supercomputer's peak performance is a dense linear algebra problem.
While many HPC problems arise from mathematical models, some of the most important
supercomputing problems today are graph problems. These often arise in knowledge management,
machine intelligence, linguistics, networks, biology, dynamical systems, and collections of pair-wise
systems.
HPC problem representatives and examples of their usage in academia, industry, and government
are presented in Table 1.1.
The variety and novelty of supercomputing problems continue to expand far beyond the field's nuclear
physics roots (see the example in Fig. 1.4). As supercomputing skill-sets and resources become
increasingly commonplace, it is difficult to imagine an analytical field that will not be impacted by
HPC in the future.
`
`1.1.6 APPLICATION PROGRAMMING
The principal view the user has of an HPC system is through one or more programming interfaces,
which take the form of programming languages, libraries, or other services. These are expanded by
`
`
`
`
`
FIGURE 1.4

A particle-in-cell simulation from the Gyrokinetic Toroidal code (Princeton Plasma
Physics Laboratory) that simulates a plasma within a Tokamak fusion device. A sampling
of some particles within the toroid is shown here, colored according to their velocity, with
different supercomputing processor boundaries delineated by the toroidal subdivisions.
`
additional sets of tools that assist in crafting, optimizing, and debugging application codes. Ironically,
a major means of programming is the use of existing programs, either directly or as templates to modify
for specific purposes. There are hundreds of computer programming languages, from the very low level,
including assemblers, to the very high level, reaching into the declarative regime. But for HPC the number of
conventionally adopted programming interfaces is relatively few, on the order of dozens, although there
are many more experimental or research models. At the risk of oversimplification, a programming
language defines a set of named objects that can be manipulated, the basic operations that can be
performed on these objects, the flow-control mechanisms for establishing the conditions and order of
operation execution, the means of encapsulation for modularity, and I/O including mass storage.
Programming in the regime of supercomputing has additional requirements and characteristics.
Performance is the driving requirement that differentiates HPC programming from other domains; it is
second only to correctness and repeatability, which remain of serious concern. The need for performance
is most significantly represented by the representation and exploitation of computational parallelism:
the ability to perform multiple tasks simultaneously. Parallel processing involves the definition of
parallel tasks, establishing the criteria that determine when a task is performed, synchronization among
tasks in part to coordinate sharing, and allocation to computing resources. A second aspect of
programming for HPC is control of the relationship of allocations of data and tasks to the physical
resources of the parallel and distributed systems. The nature of the parallelism may vary significantly
depending on the form of computer system architecture targeted by the application program. Also of
concern are issues of determinism, correctness, performance debugging, and performance portability.
`Depending on the nature of the class of parallel system architecture, different programming
`models are employed. One dimension of differentiation is granularity of the parallel workflow. Very
`coarse-grained workloads with no interactivity, sometimes referred to as "embarrassingly parallel" or
`"job-stream" workflow, suggest one class of workflow manag