`In Praise of Computer Architecture: A Quantitative Approach
`Fifth Edition
`“The 5th edition of Computer Architecture: A Quantitative Approach continues
`the legacy, providing students of computer architecture with the most up-to-date
`information on current computing platforms, and architectural insights to help
`them design future systems. A highlight of the new edition is the significantly
`revised chapter on data-level parallelism, which demystifies GPU architectures
`with clear explanations using traditional computer architecture terminology.”
`—Krste Asanović, University of California, Berkeley
`“Computer Architecture: A Quantitative Approach is a classic that, like fine
`wine, just keeps getting better. I bought my first copy as I finished up my under-
`graduate degree and it remains one of my most frequently referenced texts today.
`When the fourth edition came out, there was so much new material that I needed
`to get it to stay current in the field. And, as I review the fifth edition, I realize that
`Hennessy and Patterson have done it again. The entire text is heavily updated and
`Chapter 6 alone makes this new edition required reading for those wanting to
`really understand cloud and warehouse-scale computing. Only Hennessy and
`Patterson have access to the insiders at Google, Amazon, Microsoft, and other
`cloud computing and internet-scale application providers and there is no better
`coverage of this important area anywhere in the industry.”
`—James Hamilton, Amazon Web Services
`“Hennessy and Patterson wrote the first edition of this book when graduate stu-
`dents built computers with 50,000 transistors. Today, warehouse-size computers
`contain that many servers, each consisting of dozens of independent processors
`and billions of transistors. The evolution of computer architecture has been rapid
`and relentless, but Computer Architecture: A Quantitative Approach has kept
`pace, with each edition accurately explaining and analyzing the important emerg-
`ing ideas that make this field so exciting.”
`—James Larus, Microsoft Research
`“This new edition adds a superb new chapter on data-level parallelism in vector,
`SIMD, and GPU architectures. It explains key architecture concepts inside mass-
`market GPUs, maps them to traditional terms, and compares them with vector
`and SIMD architectures. It’s timely and relevant with the widespread shift to
`GPU parallel computing. Computer Architecture: A Quantitative Approach fur-
`thers its string of firsts in presenting comprehensive architecture coverage of sig-
`nificant new developments!”
`—John Nickolls, NVIDIA
`“The new edition of this now classic textbook highlights the ascendance of
`explicit parallelism (data, thread, request) by devoting a whole chapter to each
`type. The chapter on data parallelism is particularly illuminating: the comparison
`and contrast between Vector SIMD, instruction level SIMD, and GPU cuts
`through the jargon associated with each architecture and exposes the similarities
`and differences between these architectures.”
`—Kunle Olukotun, Stanford University
`“The fifth edition of Computer Architecture: A Quantitative Approach explores
`the various parallel concepts and their respective tradeoffs. As with the previous
`editions, this new edition covers the latest technology trends. Two highlighted are
`the explosive growth of Personal Mobile Devices (PMD) and Warehouse Scale
`Computing (WSC)—where the focus has shifted towards a more sophisticated
`balance of performance and energy efficiency as compared with raw perfor-
`mance. These trends are fueling our demand for ever more processing capability
`which in turn is moving us further down the parallel path.”
`—Andrew N. Sloss, Consultant Engineer, ARM
`Author of ARM System Developer’s Guide
`Computer Architecture
`A Quantitative Approach
`Fifth Edition
`John L. Hennessy is the tenth president of Stanford University, where he has been a member
`of the faculty since 1977 in the departments of electrical engineering and computer science.
`Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering,
`the National Academy of Sciences, and the American Philosophical Society; and a Fellow of
`the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-
`Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer
`Engineering Award, and the 2000 John von Neumann Award, which he shared with David
`Patterson. He has also received seven honorary doctorates.
`In 1981, he started the MIPS project at Stanford with a handful of graduate students. After
`completing the project in 1984, he took a leave from the university to cofound MIPS Computer
`Systems (now MIPS Technologies), which developed one of the first commercial RISC
`microprocessors. As of 2006, over 2 billion MIPS microprocessors have been shipped in devices
`ranging from video games and palmtop computers to laser printers and network switches.
`Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which
`prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been
`adopted in modern multiprocessors. In addition to his technical activities and university
`responsibilities, he has continued to work with numerous start-ups both as an early-stage
`advisor and an investor.
`David A. Patterson has been teaching computer architecture at the University of California,
`Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer
`Science. His teaching has been honored by the Distinguished Teaching Award from the
`University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and
`Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement
`Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE
`Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von
`Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a
`Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM,
`and IEEE, and he was elected to the National Academy of Engineering, the National Academy
`of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information
`Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley
`EECS department, as chair of the Computing Research Association, and as President of ACM.
`This record led to Distinguished Service Awards from ACM and CRA.
`At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced
`instruction set computer, and the foundation of the commercial SPARC architecture. He was a
`leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable
`storage systems from many companies. He was also involved in the Network of Workstations
`(NOW) project, which led to cluster technology used by Internet companies and later to cloud
`computing. These projects earned three dissertation awards from ACM. His current research
`projects are the Algorithm-Machine-People Laboratory and the Parallel Computing Laboratory,
`where he is director. The goal of the AMP Lab is to develop scalable machine learning algorithms,
`warehouse-scale-computer-friendly programming models, and crowd-sourcing tools to gain
`valuable insights quickly from big data in the cloud. The goal of the Par Lab is to develop
`technologies to deliver scalable, portable, efficient, and productive software for parallel personal
`mobile devices.
`Computer Architecture
`A Quantitative Approach
`Fifth Edition
`John L. Hennessy
`Stanford University
`David A. Patterson
`University of California, Berkeley
`With Contributions by
`Krste Asanović
`University of California, Berkeley
`Jason D. Bakos
`University of South Carolina
`Robert P. Colwell
`R&E Colwell & Assoc. Inc.
`Thomas M. Conte
`North Carolina State University
`José Duato
`Universitat Politècnica de València and Simula
`Diana Franklin
`University of California, Santa Barbara
`David Goldberg
`The Scripps Research Institute
`Norman P. Jouppi
`HP Labs
`Sheng Li
`HP Labs
`Naveen Muralimanohar
`HP Labs
`Gregory D. Peterson
`University of Tennessee
`Timothy M. Pinkston
`University of Southern California
`Parthasarathy Ranganathan
`HP Labs
`David A. Wood
`University of Wisconsin–Madison
`Amr Zaky
`University of Santa Clara
`Amsterdam • Boston • Heidelberg • London
`New York • Oxford • Paris • San Diego
`San Francisco • Singapore • Sydney • Tokyo
`To Andrea, Linda, and our four sons
`Foreword
`by Luiz André Barroso, Google Inc.
`The first edition of Hennessy and Patterson’s Computer Architecture: A Quanti-
`tative Approach was released during my first year in graduate school. I belong,
`therefore, to that first wave of professionals who learned about our discipline
`using this book as a compass. Perspective being a fundamental ingredient to a
`useful Foreword, I find myself at a disadvantage given how much of my own
`views have been colored by the previous four editions of this book. Another
`obstacle to clear perspective is that the student-grade reverence for these two
`superstars of Computer Science has not yet left me, despite (or perhaps because
`of) having had the chance to get to know them in the years since. These disadvan-
`tages are mitigated by my having practiced this trade continuously since this
`book’s first edition, which has given me a chance to enjoy its evolution and
`enduring relevance.
`The last edition arrived just two years after the rampant industrial race for
`higher CPU clock frequency had come to its official end, with Intel cancelling its
`4 GHz single-core developments and embracing multicore CPUs. Two years was
`plenty of time for John and Dave to present this story not as a random product
`line update, but as a defining computing technology inflection point of the last
`decade. That fourth edition had a reduced emphasis on instruction-level parallel-
`ism (ILP) in favor of added material on thread-level parallelism, something the
`current edition takes even further by devoting two chapters to thread- and data-
`level parallelism while limiting ILP discussion to a single chapter. Readers who
`are being introduced to new graphics processing engines will benefit especially
`from the new Chapter 4 which focuses on data parallelism, explaining the
`different but slowly converging solutions offered by multimedia extensions in
`general-purpose processors and increasingly programmable graphics processing
`units. Of notable practical relevance: If you have ever struggled with CUDA
`terminology, check out Figure 4.24 (teaser: “Shared Memory” is really local,
`while “Global Memory” is closer to what you’d consider shared memory).
`Even though we are still in the middle of that multicore technology shift, this
`edition embraces what appears to be the next major one: cloud computing. In this
`case, the ubiquity of Internet connectivity and the evolution of compelling Web
`services are bringing to the spotlight very small devices (smart phones, tablets)
`and very large ones (warehouse-scale computing systems). The ARM Cortex A8,
`a popular CPU for smart phones, appears in Chapter 3’s “Putting It All Together”
`section, and a whole new Chapter 6 is devoted to request- and data-level parallel-
`ism in the context of warehouse-scale computing systems. In this new chapter,
`John and Dave present these new massive clusters as a distinctively new class of
`computers—an open invitation for computer architects to help shape this emerg-
`ing field. Readers will appreciate how this area has evolved in the last decade by
`comparing the Google cluster architecture described in the third edition with the
`more modern incarnation presented in this version’s Chapter 6.
`Return customers of this book will appreciate once again the work of two outstanding
`computer scientists who over their careers have perfected the art of combining an
`academic’s principled treatment of ideas with a deep understanding of leading-edge
`industrial products and technologies. The authors’ success in industrial interactions
`won’t be a surprise to those who have witnessed how Dave conducts his biannual proj-
`ect retreats, forums meticulously crafted to extract the most out of academic–industrial
`collaborations. Those who recall John’s entrepreneurial success with MIPS or bump into
`him in a Google hallway (as I occasionally do) won’t be surprised by it either.
`Perhaps most importantly, return and new readers alike will get their money’s
`worth. What has made this book an enduring classic is that each edition is not an
`update but an extensive revision that presents the most current information and
`unparalleled insight into this fascinating and quickly changing field. For me, after
`over twenty years in this profession, it is also another opportunity to experience
`that student-grade admiration for two remarkable teachers.
`Contents
`Chapter 1
`Fundamentals of Quantitative Design and Analysis
`1.2 Classes of Computers
`1.3 Defining Computer Architecture
`1.4 Trends in Technology
`1.5 Trends in Power and Energy in Integrated Circuits
`1.6 Trends in Cost
`1.7 Dependability
`1.8 Measuring, Reporting, and Summarizing Performance
`1.9 Quantitative Principles of Computer Design
`1.10 Putting It All Together: Performance, Price, and Power
`1.11 Fallacies and Pitfalls
`1.12 Concluding Remarks
`1.13 Historical Perspectives and References
`Case Studies and Exercises by Diana Franklin
`Chapter 2 Memory Hierarchy Design
`2.2 Ten Advanced Optimizations of Cache Performance
`2.3 Memory Technology and Optimizations
`2.4 Protection: Virtual Memory and Virtual Machines
`2.5 Crosscutting Issues: The Design of Memory Hierarchies
`2.6 Putting It All Together: Memory Hierarchies in the
`ARM Cortex-A8 and Intel Core i7
`2.7 Fallacies and Pitfalls
`2.8 Concluding Remarks: Looking Ahead
`2.9 Historical Perspective and References
`Case Studies and Exercises by Norman P. Jouppi,
`Naveen Muralimanohar, and Sheng Li
`Chapter 3
`Instruction-Level Parallelism and Its Exploitation
`3.1 Instruction-Level Parallelism: Concepts and Challenges
`3.2 Basic Compiler Techniques for Exposing ILP
`3.3 Reducing Branch Costs with Advanced Branch Prediction
`3.4 Overcoming Data Hazards with Dynamic Scheduling
`3.5 Dynamic Scheduling: Examples and the Algorithm
`3.6 Hardware-Based Speculation
`3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
`3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
`3.9 Advanced Techniques for Instruction Delivery and Speculation
`3.10 Studies of the Limitations of ILP
`3.11 Cross-Cutting Issues: ILP Approaches and the Memory System
`3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve
`Uniprocessor Throughput
`3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8
`3.14 Fallacies and Pitfalls
`3.15 Concluding Remarks: What’s Ahead?
`3.16 Historical Perspective and References
`Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
`Chapter 4
`Data-Level Parallelism in Vector, SIMD, and GPU Architectures
`4.2 Vector Architecture
`4.3 SIMD Instruction Set Extensions for Multimedia
`4.4 Graphics Processing Units
`4.5 Detecting and Enhancing Loop-Level Parallelism
`4.6 Crosscutting Issues
`4.7 Putting It All Together: Mobile versus Server GPUs
`and Tesla versus Core i7
`4.8 Fallacies and Pitfalls
`4.9 Concluding Remarks
`4.10 Historical Perspective and References
`Case Study and Exercises by Jason D. Bakos
`Chapter 5
`Thread-Level Parallelism
`5.2 Centralized Shared-Memory Architectures
`5.3 Performance of Symmetric Shared-Memory Multiprocessors
`5.4 Distributed Shared-Memory and Directory-Based Coherence
`5.5 Synchronization: The Basics
`5.6 Models of Memory Consistency: An Introduction
`5.7 Crosscutting Issues
`5.8 Putting It All Together: Multicore Processors and Their Performance
`5.9 Fallacies and Pitfalls
`5.10 Concluding Remarks
`5.11 Historical Perspectives and References
`Case Studies and Exercises by Amr Zaky and David A. Wood
`Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and
`Data-Level Parallelism
`6.2 Programming Models and Workloads for Warehouse-Scale Computers
`6.3 Computer Architecture of Warehouse-Scale Computers
`6.4 Physical Infrastructure and Costs of Warehouse-Scale Computers
`6.5 Cloud Computing: The Return of Utility Computing
`6.6 Crosscutting Issues
`6.7 Putting It All Together: A Google Warehouse-Scale Computer
`6.8 Fallacies and Pitfalls
`6.9 Concluding Remarks
`6.10 Historical Perspectives and References
`Case Studies and Exercises by Parthasarathy Ranganathan
`Appendix A
`Instruction Set Principles
`A.1 Introduction
`A.2 Classifying Instruction Set Architectures
`A.3 Memory Addressing
`A.4 Type and Size of Operands
`A.5 Operations in the Instruction Set
`A.6 Instructions for Control Flow
`A.7 Encoding an Instruction Set
`A.8 Crosscutting Issues: The Role of Compilers
`A.9 Putting It All Together: The MIPS Architecture
`A.10 Fallacies and Pitfalls
`A.11 Concluding Remarks
`A.12 Historical Perspective and References
` Exercises by Gregory D. Peterson
`Appendix B
`Review of Memory Hierarchy
`B.1 Introduction
`B.2 Cache Performance
`B.3 Six Basic Cache Optimizations
`B.4 Virtual Memory
`B.5 Protection and Examples of Virtual Memory
`B.6 Fallacies and Pitfalls
`B.7 Concluding Remarks
`B.8 Historical Perspective and References
`Exercises by Amr Zaky
`Appendix C
`Pipelining: Basic and Intermediate Concepts
`C.1 Introduction
`C.2 The Major Hurdle of Pipelining—Pipeline Hazards
`C.3 How Is Pipelining Implemented?
`C.4 What Makes Pipelining Hard to Implement?
`C.5 Extending the MIPS Pipeline to Handle Multicycle Operations
`C.6 Putting It All Together: The MIPS R4000 Pipeline
`C.7 Crosscutting Issues
`C.8 Fallacies and Pitfalls
`C.9 Concluding Remarks
`C.10 Historical Perspective and References
` Updated Exercises by Diana Franklin
`Online Appendices
`Appendix D Storage Systems
`Appendix E Embedded Systems
`By Thomas M. Conte
`Appendix F Interconnection Networks
`Revised by Timothy M. Pinkston and José Duato
`Appendix G Vector Processors in More Depth
`Revised by Krste Asanović
`Appendix H Hardware and Software for VLIW and EPIC
`Appendix I Large-Scale Multiprocessors and Scientific Applications
`Appendix J Computer Arithmetic
`by David Goldberg
`Appendix K Survey of Instruction Set Architectures
`Appendix L Historical Perspectives and References
`Preface
`Why We Wrote This Book
`Through five editions of this book, our goal has been to describe the basic princi-
`ples underlying what will be tomorrow’s technological developments. Our excite-
`ment about the opportunities in computer architecture has not abated, and we
`echo what we said about the field in the first edition: “It is not a dreary science of
`paper machines that will never work. No! It’s a discipline of keen intellectual
`interest, requiring the balance of marketplace forces to cost-performance-power,
`leading to glorious failures and some notable successes.”
`Our primary objective in writing our first book was to change the way people
`learn and think about computer architecture. We feel this goal is still valid and
`important. The field is changing daily and must be studied with real examples
`and measurements on real computers, rather than simply as a collection of defini-
`tions and designs that will never need to be realized. We offer an enthusiastic
`welcome to anyone who came along with us in the past, as well as to those who
`are joining us now. Either way, we can promise the same quantitative approach
`to, and analysis of, real systems.
`As with earlier versions, we have strived to produce a new edition that will
`continue to be as relevant for professional engineers and architects as it is for
`those involved in advanced computer architecture and design courses. Like the
`first edition, this edition has a sharp focus on new platforms—personal mobile
`devices and warehouse-scale computers—and new architectures—multicore and
`GPUs. As much as its predecessors, this edition aims to demystify computer
`architecture through an emphasis on cost-performance-energy trade-offs and
`good engineering design. We believe that the field has continued to mature and
`move toward the rigorous quantitative foundation of long-established scientific
`and engineering disciplines.
`This Edition
`We said the fourth edition of Computer Architecture: A Quantitative Approach
`may have been the most significant since the first edition due to the switch to
`multicore chips. The feedback we received this time was that the book had lost
`the sharp focus of the first edition, covering everything equally but without emphasis
`and context. We’re pretty sure that won’t be said about the fifth edition.
`We believe most of the excitement is at the extremes in size of computing,
`with personal mobile devices (PMDs) such as cell phones and tablets as the clients
`and warehouse-scale computers offering cloud computing as the server.
`(Observant readers may have seen the hint for cloud computing on the cover.) We are
`struck by the common theme of these two extremes in cost, performance, and
`energy efficiency despite their difference in size. As a result, the running context
`through each chapter is computing for PMDs and for warehouse-scale computers,
`and Chapter 6 is a brand-new chapter on the latter topic.
`The other theme is parallelism in all its forms. We first identify the two types of
`application-level parallelism in Chapter 1: data-level parallelism (DLP), which
`arises because there are many data items that can be operated on at the same time,
`and task-level parallelism (TLP), which arises because tasks of work are created
`that can operate independently and largely in parallel. We then explain the four
`architectural styles that exploit DLP and TLP: instruction-level parallelism (ILP)
`in Chapter 3; vector architectures and graphics processing units (GPUs) in Chapter
`4, which is a brand-new chapter for this edition; thread-level parallelism in
`Chapter 5; and request-level parallelism (RLP) via warehouse-scale computers in
`Chapter 6, which is also a brand-new chapter for this edition. We moved memory
`hierarchy earlier in the book to Chapter 2, and we moved the storage systems
`chapter to Appendix D. We are particularly proud of Chapter 4, which contains
`the most detailed and clearest explanation of GPUs yet, and Chapter 6,
`which is the first publication of the most recent details of a Google
`warehouse-scale computer.
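As an informal illustration of the two kinds of application-level parallelism named above, the following Python sketch contrasts them. It is not taken from the book: the function names (`scale_pixels`, `count_words`, `checksum`, `run_independent_tasks`) are hypothetical, and the use of a thread pool is only one assumed way to express independent tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_pixels(pixels, factor):
    """Data-level parallelism: the same operation applies to every item,
    so all items could in principle be operated on at the same time
    (for example by vector or SIMD hardware). Written here as a plain loop."""
    return [p * factor for p in pixels]


def count_words(text):
    """One independent task (hypothetical example)."""
    return len(text.split())


def checksum(values):
    """A second task that shares nothing with count_words."""
    return sum(values) % 256


def run_independent_tasks(text, values):
    """Task-level parallelism: two different tasks with no dependence on
    each other are submitted so that they may run largely in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        words = pool.submit(count_words, text)
        check = pool.submit(checksum, values)
        return words.result(), check.result()


if __name__ == "__main__":
    print(scale_pixels([1, 2, 3, 4], 2))
    print(run_independent_tasks("a quantitative approach", [100, 200, 300]))
```

The first function has one operation and many data items; the second group has several distinct operations that happen not to depend on one another. The architectural styles listed above are different hardware ways of exploiting one or both situations.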
`As before, the first three appendices in the book give basics on the MIPS
`instruction set, memory hierarchy, and pipelining for readers who have not read a
`book like Computer Organization and Design. To keep costs down but still supply
`supplemental material that is of interest to some readers, nine more appendices
`are available online. There are more pages in these appendices than there are
`in this book!
`This edition continues the tradition of using real-world examples to demon-
`strate the ideas, and the “Putting It All Together” sections are brand new. The
`“Putting It All Together” sections of this edition include the pipeline organizations
`and memory hierarchies of the ARM Cortex A8 processor, the Intel Core i7
`processor, the NVIDIA GTX-280 and GTX-480 GPUs, and one of the Google
`warehouse-scale computers.
`Topic Selection and Organization
`As before, we have taken a conservative approach to topic selection, for there are
`many more interesting ideas in the field than can reasonably be covered in a treat-
`ment of basic principles. We have steered away from a comprehensive survey of
`every architecture a reader might encounter. Instead, our presentation focuses on
`core concepts likely to be found in any new machine. The key criterion remains
`that of selecting ideas that have been examined and utilized successfully enough
`to permit their discussion in quantitative terms.
`Our intent has always been to focus on material that is not available in equiva-
`lent form from other sources, so we continue to emphasize advanced content
`wherever possible. Indeed, there are several systems here whose descriptions
`cannot be found in the literature. (Readers interested strictly in a more basic
`introduction to computer architecture should read Computer Organization and
`Design: The Hardware/Software Interface.)
`An Overview of the Content
`Chapter 1 has been beefed up in this edition. It includes formulas for energy,
`static power, dynamic power, integrated circuit costs, reliability, and availability.
`(These formulas are also found on the front inside cover.) Our hope is that these
`topics can be used through the rest of the book. In addition to the classic quantitative
`principles of computer design and performance measurement, the “Putting It All
`Together” (PIAT) section has been upgraded to use the new SPECPower benchmark.
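The preface names these formulas without stating them. As a rough sketch, the Python functions below use the standard first-order CMOS relations for dynamic energy, dynamic power, and static power, plus the usual definition of availability from mean time to failure (MTTF) and mean time to repair (MTTR). The function and parameter names are assumptions made for this example, and the relations are proportionalities, so the results are only meaningful as ratios between design points.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of a single logic transition, first-order model: 1/2 * C * V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power: energy per transition times the switching frequency."""
    return dynamic_energy(capacitive_load, voltage) * frequency


def static_power(static_current, voltage):
    """Static (leakage) power: static current times supply voltage."""
    return static_current * voltage


def availability(mttf, mttr):
    """Fraction of time a module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    # Hypothetical design change: lower both voltage and frequency by 15%.
    baseline = dynamic_power(1.0, 1.0, 1.0)
    scaled = dynamic_power(1.0, 0.85, 0.85)
    print("dynamic power ratio:", scaled / baseline)
    print("availability:", availability(mttf=99.0, mttr=1.0))
```

Because voltage enters squared, lowering voltage and frequency together reduces dynamic power roughly with the cube of the scaling factor, which is why voltage is such an effective lever in these trade-offs.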
`Our view is that the instruction set architecture is playing less of a role today
`than in 1990, so we moved this material to Appendix A. It still uses the MIPS64
`architecture. (For quick review, a summary of the MIPS ISA can be found on the
`back inside cover.) For fans of ISAs, Appendix K covers 10 RISC architectures,
`the 80x86, the DEC VAX, and the IBM 360/370.
`We then move on to memory hierarchy in Chapter 2, since it is easy to apply
`the cost-performance-energy principles to this material and memory is a critical
`resource for the rest of the chapters. As in the past edition, Appendix B contains
`an introductory review of cache principles, which is available in case you need it.
`Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes
`virtual machines, which offer advantages in protection, software management,
`and hardware management and play an important role in cloud computing. In
`addition to covering SRAM and DRAM technologies, the chapter includes new
`material on Flash memory. The PIAT examples are the ARM Cortex A8, which is
`used in PMDs, and the Intel Core i7, which is used in servers.
`Chapter 3 covers the exploitation of instruction-level parallelism in high-
`performance processors, including superscalar execution, branch prediction,
`speculation, dynamic scheduling, and multithreading. As mentioned earlier,
`Appendix C is a review of pipelining in case you need it. Chapter 3 also surveys
`the limits of ILP. As in Chapter 2, the PIAT examples are again the ARM
`Cortex A8 and the Intel Core i7. While the third edition contained a great deal
`on Itanium and VLIW, this material is now in Appendix H, indicating our view
`that this architecture did not live up to the earlier claims.
`The increasing importance of multimedia applications such as games and video
`processing has also increased the importance of architectures that can exploit
`data-level parallelism. In particular, there is a rising interest in computing using
`graphics processing units (GPUs), yet few architects understand how GPUs really work.
`We decided to write a new chapter in large part to unveil this new style of computer
`architecture. Chapter 4 starts with an introduction to vector architectures,
`which acts as a foundation on which to build explanations of multimedia SIMD
`instruction set extensions and GPUs. (Appendix G goes into even more depth on
`vector architectures.) The section on GPUs was the most difficult to write in this
`book, in that it took many iterations to get an accurate description that was also
`easy to understand. A significant challenge was the terminology. We decided to go
`with our own terms and then provide a translation between our terms and the offi-
`cial NVIDIA terms. (A copy of that table can be found in the back inside cover
`pages.) This chapter introduces the Roofline performance model and then uses it
`to compare the Intel Core i7 and the NVIDIA GTX 280 and GTX 480 GPUs. The
`chapter also describes the Tegra 2 GPU for PMDs.
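For readers unfamiliar with the Roofline model mentioned above, its core idea fits in a few lines: attainable floating-point throughput is the smaller of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte of memory traffic). The sketch below assumes that common formulation; the function names and the machine numbers are hypothetical and are not the Core i7 or GTX figures from the chapter.

```python
def attainable_gflops(peak_gflops, peak_bandwidth_gb_per_s, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    memory_bound = peak_bandwidth_gb_per_s * flops_per_byte
    return min(peak_gflops, memory_bound)


def ridge_point(peak_gflops, peak_bandwidth_gb_per_s):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gb_per_s


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak compute, 20 GB/s peak bandwidth.
    PEAK_GFLOPS = 100.0
    PEAK_BANDWIDTH = 20.0
    print("ridge point:", ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH))
    for intensity in (0.5, 5.0, 8.0):
        print(intensity, attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH, intensity))
```

Kernels to the left of the ridge point are limited by memory bandwidth and those to the right by compute, which is what makes the model a convenient way to compare machines with very different balances.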
`Chapter 5 describes multicore processors. It explores symmetric and
`distributed-memory architectures, examining both organizational principles and
`performance. Topics in synchronization and memory consistency models are
`next. The example is the Intel Core i7. Readers interested in interconnection net-
`works on a chip should read Appendix F, and those interested in larger scale mul-
`tiprocessors and scientific applications should read Appendix I.
`As mentioned earlier, Chapter 6 describes the newest topic in computer archi-
`tecture, warehouse-scale computers (WSCs). Based on help from engineers at
`Amazon Web Services and Google, this chapter integrates details on design, cost,

