`
`Reference 24
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2136, p. 1
`
`
`
`MODERN SYSTEMS AND PRACTICES
`
`THOMAS STERLING, MATTHEW ANDERSON, MACIEJ BRODOWICZ
`
`FOREWORD BY C. GORDON BELL
`
`
`
`
`High Performance
`Computing
`
`
`
`
`High Performance
`Computing
`Modern Systems and Practices
`
`Thomas Sterling
`Matthew Anderson
`Maciej Brodowicz
`School of Informatics, Computing, and Engineering
`Indiana University, Bloomington
`
`Foreword by C. Gordon Bell
`
`MORGAN KAUFMANN PUBLISHERS
`
`ELSEVIER
`
`AN IMPRINT OF ELSEVIER
`
`
`
`
`Morgan Kaufmann is an imprint of Elsevier
`50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
`
`Copyright © 2018 Elsevier Inc. All rights reserved.
`
`No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
`mechanical, including photocopying, recording, or any information storage and retrieval system, without
`permission in writing from the publisher. Details on how to seek permission, further information about the
`Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance
`Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
`
`This book and the individual contributions contained in it are protected under copyright by the Publisher (other
`than as may be noted herein).
`
`Notices
`Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
`understanding, changes in research methods, professional practices, or medical treatment may become
`necessary.
`
`Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
`any information, methods, compounds, or experiments described herein. In using such information or methods
`they should be mindful of their own safety and the safety of others, including parties for whom they have a
`professional responsibility.
`
`To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any
`liability for any injury and/or damage to persons or property as a matter of products liability, negligence or
`otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the
`material herein.
`
`Library of Congress Cataloging-in-Publication Data
`A catalog record for this book is available from the Library of Congress
`
`British Library Cataloguing-in-Publication Data
`A catalogue record for this book is available from the British Library
`
`ISBN: 978-0-12-420158-3
`
`For information on all Morgan Kaufmann publications visit
`our website at https://www.elsevier.com/books-and-journals
`
`Working together to grow libraries in developing countries
`Book Aid International
`www.elsevier.com • www.bookaid.org
`
`Publisher: Katey Birtcher
`Acquisition Editor: Steve Merken
`Developmental Editor: Nate McFadden
`Production Project Manager: Punithavathy Govindaradjane
`Designer: Mark Rogers
`
`Typeset by TNQ Books and Journals
`
`
`
`
`Dedicated to
`
`Dr. Paul C. Messina
`
`Leader, colleague, collaborator, mentor, friend
`
`
`
`
`Contents
`
`Foreword ............................................................................................................................................. xix
`Preface ................................................................................................................................................ xxi
`Acknowledgments ............................................................................................................................ xxvii
`
`CHAPTER 1 Introduction ........................................................................................... 1
`1.1 High Performance Computing Disciplines ................................................................. 3
`1.1.1 Definition .......................................................................................................... 3
`1.1.2 Application Programs ....................................................................................... 4
`1.1.3 Performance and Metrics ................................................................................. 4
`1.1.4 High Performance Computing Systems ........................................................... 5
`1.1.5 Supercomputing Problems ............................................................................... 7
`1.1.6 Application Programming ................................................................................ 8
`1.2 Impact of Supercomputing on Science, Society, and Security ................................ 10
`1.2.1 Catalyzing Fraud Detection and Market Data Analytics .............................. 10
`1.2.2 Discovering, Managing, and Distributing Oil and Gas ................................. 10
`1.2.3 Accelerating Innovation in Manufacturing .................................................... 10
`1.2.4 Personalized Medicine and Drug Discovery ................................................. 11
`1.2.5 Predicting Natural Disasters and Understanding Climate Change ................ 12
`1.3 Anatomy of a Supercomputer ................................................................................... 14
`1.4 Computer Performance ............................................................................................. 16
`1.4.1 Performance .................................................................................................... 16
`1.4.2 Peak Performance ........................................................................................... 17
`1.4.3 Sustained Performance ................................................................................... 18
`1.4.4 Scaling ............................................................................................................ 18
`1.4.5 Performance Degradation ............................................................................... 19
`1.4.6 Performance Improvement ............................................................................. 20
`1.5 A Brief History of Supercomputing ......................................................................... 21
`1.5.1 Epoch I-Automated Calculators Through Mechanical Technologies ......... 22
`1.5.2 Epoch II-von Neumann Architecture in Vacuum Tubes ............................. 24
`1.5.3 Epoch III-Instruction-Level Parallelism ...................................................... 29
`1.5.4 Epoch IV-Vector Processing and Integration .............................................. 30
`1.5.5 Epoch V-Single-Instruction Multiple Data Array ....................................... 33
`1.5.6 Epoch VI-Communicating Sequential Processors and Very Large
`Scale Integration ............................................................................................. 34
`1.5.7 Epoch VII-Multicore Petaflops .................................................................... 37
`1.5.8 Neodigital Age and Beyond Moore's Law .................................................... 37
`1.6 This Textbook as a Guide and Tool for the Student ............................................. 38
`1.7 Summary and Outcomes of Chapter 1 ..................................................................... 39
`1.8 Questions and Problems ........................................................................................... 40
`References ......................................................................................................................... 41
`CHAPTER 2 HPC Architecture 1: Systems and Technologies ............................... 43
`2.1 Introduction ............................................................................................................. 44
`2.2 Key Properties of HPC Architecture ...................................................................... 44
`2.2.1 Speed ............................................................................................................ 45
`2.2.2 Parallelism .................................................................................................... 45
`2.2.3 Efficiency ...................................................................................................... 46
`2.2.4 Power ............................................................................................................ 46
`2.2.5 Reliability ..................................................................................................... 47
`2.2.6 Programmability ........................................................................................... 48
`2.3 Parallel Architecture Families-Flynn's Taxonomy .............................................. 48
`2.4 Enabling Technology .............................................................................................. 51
`2.4.1 Technology Epochs ...................................................................................... 51
`2.4.2 Roles of Technologies .................................................................................. 55
`2.4.3 Digital Logic ................................................................................................. 55
`2.4.4 Memory Technologies .................................................................................. 58
`2.5 von Neumann Sequential Processors ..................................................................... 62
`2.6 Vector and Pipelining ............................................................................................. 64
`2.6.1 Pipeline Parallelism ...................................................................................... 65
`2.6.2 Vector Processing ......................................................................................... 68
`2.7 Single-Instruction, Multiple Data Array ................................................................ 69
`2.7.1 Single-Instruction, Multiple Data Architecture ........................................... 69
`2.7.2 Amdahl's Law .............................................................................................. 70
`2.8 Multiprocessors ....................................................................................................... 73
`2.8.1 Shared-Memory Multiprocessors ................................................................. 74
`2.8.2 Massively Parallel Processors ...................................................................... 76
`2.8.3 Commodity Clusters ..................................................................................... 77
`2.9 Heterogeneous Computer Structures ...................................................................... 78
`2.10 Summary and Outcomes of Chapter 2 ................................................................... 78
`2.11 Questions and Problems ......................................................................................... 80
`References ......................................................................................................................... 82
`CHAPTER 3 Commodity Clusters ............................................................................ 83
`3.1 Introduction ............................................................................................................... 84
`3.1.1 Definition of "Commodity Cluster" ............................................................... 84
`3.1.2 Motivation and Justification for Clusters ....................................................... 84
`3.1.3 Cluster Elements ............................................................................................. 85
`3.1.4 Impact on Top 500 List .................................................................................. 86
`3.1.5 Brief History ................................................................................................... 88
`3.1.6 Chapter Guide ................................................................................................. 90
`3.2 Beowulf Cluster Project ............................................................................................ 91
`3.3 Hardware Architecture .............................................................................................. 93
`3.3.1 The Node ....................................................................................................... 93
`3.3.2 System Area Networks ................................................................................... 94
`3.3.3 Secondary Storage .......................................................................................... 95
`3.3.4 Commercial Systems Summary ..................................................................... 95
`3.4 Programming Interfaces ............................................................................................ 97
`3.4.1 High Performance Computing Programming Languages .............................. 97
`3.4.2 Parallel Programming Modalities .................................................................. 97
`3.5 Software Environment .............................................................................................. 98
`3.5.1 Operating Systems .......................................................................................... 98
`3.5.2 Resource Management ................................................................................... 99
`3.5.3 Debugger ....................................................................................................... 101
`3.5.4 Performance Profiling ................................................................................... 101
`3.5.5 Visualization ................................................................................................. 101
`3.6 Basic Methods of Use ............................................................................................. 104
`3.6.1 Logging On ................................................................................................... 104
`3.6.2 User Space and Directory System ............................................................... 105
`3.6.3 Package Configuration and Building ........................................................... 110
`3.6.4 Compilers and Compiling ............................................................................ 112
`3.6.5 Running Applications ................................................................................... 113
`3.7 Summary and Outcomes of Chapter 3 ................................................................... 113
`3.8 Questions and Exercises ......................................................................................... 114
`References ....................................................................................................................... 114
`CHAPTER 4 Benchmarking ................................................................................... 115
`4.1 Introduction ........................................................................................................... 115
`4.2 Key Properties of an HPC Benchmark ................................................................ 117
`4.3 Standard HPC Community Benchmarks .............................................................. 120
`4.4 Highly Parallel Computing Linpack .................................................................... 120
`4.5 HPC Challenge Benchmark Suite ........................................................................ 123
`4.6 High Performance Conjugate Gradients .............................................................. 126
`4.7 NAS Parallel Benchmarks .................................................................................... 130
`4.8 Graph500 ............................................................................................................... 132
`4.9 Miniapplications as Benchmarks .......................................................................... 135
`4.10 Summary and Outcomes of Chapter 4 ................................................................. 138
`4.11 Exercises ............................................................................................................... 139
`References ....................................................................................................................... 139
`CHAPTER 5 The Essential Resource Management .............................................. 141
`5.1 Managing Resources ............................................................................................... 142
`5.2 The Essential SLURM ............................................................................................ 146
`5.2.1 Architecture Overview ................................................................................. 147
`5.2.2 Workload Organization ................................................................................ 148
`5.2.3 SLURM Scheduling ..................................................................................... 149
`5.2.4 Summary of Commands ............................................................................... 151
`5.2.5 SLURM Job Scripting .................................................................................. 166
`5.2.6 SLURM Cheat Sheet .................................................................................... 171
`5.3 The Essential Portable Batch System ..................................................................... 172
`5.3.1 Portable Batch System Overview ................................................................ 172
`5.3.2 Portable Batch System Architecture ............................................................ 173
`5.3.3 Summary of PBS Commands ...................................................................... 174
`5.3.4 PBS Job Scripting ......................................................................................... 184
`5.3.5 PBS Cheat Sheet .......................................................................................... 186
`5.4 Summary and Outcomes of Chapter 5 ................................................................... 187
`5.5 Questions and Problems ......................................................................................... 189
`References ....................................................................................................................... 190
`CHAPTER 6 Symmetric Multiprocessor Architecture .......................................... 191
`6.1 Introduction ............................................................................................................. 191
`6.2 Architecture Overview ............................................................................................ 192
`6.3 Amdahl's Law Plus ................................................................................................. 196
`6.4 Processor Core Architecture ................................................................................... 199
`6.4.1 Execution Pipeline ........................................................................................ 200
`6.4.2 Instruction-Level Parallelism ....................................................................... 201
`6.4.3 Branch Prediction ......................................................................................... 201
`6.4.4 Forwarding .................................................................................................... 202
`6.4.5 Reservation Stations ..................................................................................... 202
`6.4.6 Multithreading .............................................................................................. 203
`6.5 Memory Hierarchy .................................................................................................. 204
`6.5.1 Data Reuse and Locality .............................................................................. 204
`6.5.2 Memory Hierarchy ....................................................................................... 205
`6.5.3 Memory System Performance ...................................................................... 207
`6.6 PCI Bus ................................................................................................................... 209
`6.7 External I/O Interfaces ............................................................................................ 213
`6.7.1 Network Interface Controllers ...................................................................... 213
`6.7.2 Serial Advanced Technology Attachment ................................................... 215
`6.7.3 JTAG ............................................................................................................. 218
`6.7.4 Universal Serial Bus ..................................................................................... 220
`6.8 Summary and Outcomes of Chapter 6 ................................................................... 222
`6.9 Questions and Exercises ......................................................................................... 223
`References ....................................................................................................................... 224
`CHAPTER 7 The Essential OpenMP ...................................................................... 225
`7.1 Introduction ............................................................................................................. 225
`7.2 Overview of OpenMP Programming Model .......................................................... 226
`7.2.1 Thread Parallelism ........................................................................................ 226
`7.2.2 Thread Variables ........................................................................................... 228
`7.2.3 Runtime Library and Environment Variables .............................................. 228
`7.3 Parallel Threads and Loops .................................................................................... 231
`7.3.1 Parallel Threads ............................................................................................ 231
`7.3.2 Private ........................................................................................................... 232
`7.3.3 Parallel "For" ................................................................................................ 233
`7.3.4 Sections ......................................................................................................... 239
`7.4 Synchronization ...................................................................................................... 241
`7.4.1 Critical Synchronization Directive ............................................................... 242
`7.4.2 The Master Directive .................................................................................... 242
`7.4.3 The Barrier Directive ................................................................................... 243
`7.4.4 The Single Directive ..................................................................................... 243
`7.5 Reduction ................................................................................................................ 244
`7.6 Summary and Outcomes of Chapter 7 ................................................................... 245
`7.7 Questions and Problems ......................................................................................... 246
`Reference ........................................................................................................................ 247
`CHAPTER 8 The Essential MPI ............................................................................. 249
`8.1 Introduction ........................................................................................................... 250
`8.2 Message-Passing Interface Standards ................................................................... 251
`8.3 Message-Passing Interface Basics ........................................................................ 253
`8.3.1 mpi.h ........................................................................................................... 253
`8.3.2 MPI_Init ...................................................................................................... 253
`8.3.3 MPI_Finalize .............................................................................................. 254
`8.3.4 Message-Passing Interface Example-Hello World .................................. 254
`8.4 Communicators ..................................................................................................... 255
`8.4.1 Size ............................................................................................................. 256
`8.4.2 Rank ............................................................................................................ 256
`8.4.3 Example ...................................................................................................... 257
`8.5 Point-to-Point Messages ....................................................................................... 258
`8.5.1 MPI_Send .................................................................................................... 259
`8.5.2 Message-Passing Interface Data Types ...................................................... 259
`8.5.3 MPI_Recv .................................................................................................... 259
`8.5.4 Example ...................................................................................................... 260
`8.6 Synchronization Collectives ................................................................................. 262
`8.6.1 Overview of Collective Calls ..................................................................... 262
`8.6.2 Barrier Synchronization ............................................................................. 263
`8.6.3 Example ...................................................................................................... 264
`8.7 Communication Collectives ................................................................................. 265
`8.7.1 Collective Data Movement ......................................................................... 265
`8.7.2 Broadcast .................................................................................................... 268
`8.7.3 Scatter ......................................................................................................... 269
`8.7.4 Gather ......................................................................................................... 271
`8.7.5 Allgather ..................................................................................................... 272
`8.7.6 Reduction Operations ................................................................................. 274
`8.7.7 Alltoall ........................................................................................................ 277
`8.8 Nonblocking Point-to-Point Communication ....................................................... 279
`8.9 User-Defined Data Types ...................................................................................... 281
`8.10 Summary and Outcomes of Chapter 8 ................................................................. 283
`8.11 Exercises ............................................................................................................... 283
`References ....................................................................................................................... 284
`CHAPTER 9 Parallel Algorithms ........................................................................... 285
`9.1 Introduction ........................................................................................................... 285
`9.2 Fork-Join ............................................................................................................. 286
`9.3 Divide and Conquer .............................................................................................. 287
`9.4 Manager-Worker ................................................................................................. 291
`9.5 Embarrassingly Parallel ........................................................................................ 292
`9.6 Halo Exchange ...................................................................................................... 294
`9.6.1 The Advection Equation Using Finite Difference ..................................... 295
`9.6.2 Sparse Matrix Vector Multiplication .......................................................... 297
`9.7 Permutation: Cannon's Algorithm ........................................................................ 301
`9.8 Task Dataflow: Breadth First Search .................................................................... 306
`9.9 Summary and Outcomes of Chapter 9 ................................................................. 310
`9.10 Exercises ............................................................................................................... 311
`References ....................................................................................................................... 311
`CHAPTER 10 Libraries ........................................................................................... 313
`10.1 Introduction ........................................................................................................ 313
`10.2 Linear Algebra .................................................................................................... 315
`10.2.1 Basic Linear Algebra Subprograms ..................................................... 317
`10.2.2 Linear Algebra Package ....................................................................... 324
`10.2.3 Scalable Linear Algebra Package ........................................................ 326
`10.2.4 GNU Scientific Library ........................................................................ 326
`10.2.5 Supernodal LU ..................................................................................... 326
`10.2.6 Portable Extensible Toolkit for Scientific Computation ...................... 327
`10.2.7 Scalable Library for Eigenvalue Problem Computations .................... 328
`10.2.8 Eigenvalue SoLvers for Petaflop-Applications .................................... 328
`10.2.9 Hypre: Scalable Linear Solvers and Multigrid Methods ..................... 328
`10.2.10 Domain-Specific Languages for Linear Algebra ................................. 329
`10.3 Partial Differential Equations ............................................................................. 329
`10.4 Graph Algorithms ............................................................................................... 329
`10.5 Parallel Input/Output .......................................................................................... 330
`10.6 Mesh Decomposition .......................................................................................... 333
`10.7 Visualization ....................................................................................................... 334
`10.8 Parallelization ..................................................................................................... 334
`10.9 Signal Processing ................................................................................................ 334
`10.10 Performance Monitoring .................................................................................... 341
`10.11 Summary and Outcomes of Chapter 10 ............................................................. 342
`10.12 Exercises ............................................................................................................. 343
`References ..............................................................................