`
`Reference 41
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2153, p. 1
`
`
`
`"2
`
`A HENNESSY
`
`COMPUTER
`
`;
`
`PATTERSON
`
`ARCHITECTURE
`
`A anntitalive A10/) mac/z
`
`PATENT OWNER DIRECTSTREAM, LLC
`-
`-
`.
`-
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2153, p. 2
`
`
`
`In Praise of Computer Architecture: A Quantitative Approach
`Fifth Edition
`
`“The 5th edition of Computer Architecture: A Quantitative Approach continues
`the legacy, providing students of computer architecture with the most up-to-date
`information on current computing platforms, and architectural insights to help
`them design future systems. A highlight of the new edition is the significantly
`revised chapter on data-level parallelism, which demystifies GPU architectures
`with clear explanations using traditional computer architecture terminology.”
`
—Krste Asanović, University of California, Berkeley
`
`“Computer Architecture: A Quantitative Approach is a classic that, like fine
`wine, just keeps getting better. I bought my first copy as I finished up my under-
`graduate degree and it remains one of my most frequently referenced texts today.
`When the fourth edition came out, there was so much new material that I needed
`to get it to stay current in the field. And, as I review the fifth edition, I realize that
`Hennessy and Patterson have done it again. The entire text is heavily updated and
`Chapter 6 alone makes this new edition required reading for those wanting to
really understand cloud and warehouse-scale computing. Only Hennessy and
`Patterson have access to the insiders at Google, Amazon, Microsoft, and other
`cloud computing and internet-scale application providers and there is no better
`coverage of this important area anywhere in the industry.”
`
`—James Hamilton, Amazon Web Services
`
`“Hennessy and Patterson wrote the first edition of this book when graduate stu-
`dents built computers with 50,000 transistors. Today, warehouse-size computers
`contain that many servers, each consisting of dozens of independent processors
`and billions of transistors. The evolution of computer architecture has been rapid
`and relentless, but Computer Architecture: A Quantitative Approach has kept
`pace, with each edition accurately explaining and analyzing the important emerg-
`ing ideas that make this field so exciting.”
`
`—James Larus, Microsoft Research
`
`“This new edition adds a superb new chapter on data-level parallelism in vector,
`SIMD, and GPU architectures. It explains key architecture concepts inside mass-
`market GPUs, maps them to traditional terms, and compares them with vector
`and SIMD architectures. It’s timely and relevant with the widespread shift to
`GPU parallel computing. Computer Architecture: A Quantitative Approach fur-
`thers its string of firsts in presenting comprehensive architecture coverage of sig-
`nificant new developments!”
`
`—John Nickolls, NVIDIA
`
`
`
`
`“The new edition of this now classic textbook highlights the ascendance of
`explicit parallelism (data, thread, request) by devoting a whole chapter to each
`type. The chapter on data parallelism is particularly illuminating: the comparison
`and contrast between Vector SIMD, instruction level SIMD, and GPU cuts
`through the jargon associated with each architecture and exposes the similarities
`and differences between these architectures.”
`
`—Kunle Olukotun, Stanford University
`
`“The fifth edition of Computer Architecture: A Quantitative Approach explores
`the various parallel concepts and their respective tradeoffs. As with the previous
`editions, this new edition covers the latest technology trends. Two highlighted are
`the explosive growth of Personal Mobile Devices (PMD) and Warehouse Scale
`Computing (WSC)—where the focus has shifted towards a more sophisticated
`balance of performance and energy efficiency as compared with raw perfor-
`mance. These trends are fueling our demand for ever more processing capability
`which in turn is moving us further down the parallel path.”
`
`—Andrew N. Sloss, Consultant Engineer, ARM
`Author of ARM System Developer’s Guide
`
`
`
`
`Computer Architecture
`A Quantitative Approach
`
`Fifth Edition
`
`
`
`
`John L. Hennessy is the tenth president of Stanford University, where he has been a member
`of the faculty since 1977 in the departments of electrical engineering and computer science.
`Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering,
`the National Academy of Science, and the American Philosophical Society; and a Fellow of
`the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-
`Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer
`Engineering Award, and the 2000 John von Neumann Award, which he shared with David
`Patterson. He has also received seven honorary doctorates.
`
`In 1981, he started the MIPS project at Stanford with a handful of graduate students. After
`completing the project in 1984, he took a leave from the university to cofound MIPS Computer
`Systems (now MIPS Technologies), which developed one of the first commercial RISC
`microprocessors. As of 2006, over 2 billion MIPS microprocessors have been shipped in devices
`ranging from video games and palmtop computers to laser printers and network switches.
Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which
`prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been
`adopted in modern multiprocessors. In addition to his technical activities and university
`responsibilities, he has continued to work with numerous start-ups both as an early-stage
`advisor and an investor.
`
`David A. Patterson has been teaching computer architecture at the University of California,
`Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer
`Science. His teaching has been honored by the Distinguished Teaching Award from the
`University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and
`Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement
`Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE
`Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von
`Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a
`Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM,
`and IEEE, and he was elected to the National Academy of Engineering, the National Academy
`of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information
`Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley
`EECS department, as chair of the Computing Research Association, and as President of ACM.
`This record led to Distinguished Service Awards from ACM and CRA.
`
`At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced
`instruction set computer, and the foundation of the commercial SPARC architecture. He was a
`leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable
`storage systems from many companies. He was also involved in the Network of Workstations
`(NOW) project, which led to cluster technology used by Internet companies and later to cloud
`computing. These projects earned three dissertation awards from ACM. His current research
`projects are Algorithm-Machine-People Laboratory and the Parallel Computing Laboratory,
where he is director. The goal of the AMP Lab is to develop scalable machine learning algorithms,
warehouse-scale-computer-friendly programming models, and crowd-sourcing tools to gain
valuable insights quickly from big data in the cloud. The goal of the Par Lab is to develop
technologies to deliver scalable, portable, efficient, and productive software for parallel personal
mobile devices.
`
`
`
`
`Computer Architecture
`A Quantitative Approach
`
`Fifth Edition
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With Contributions by
Krste Asanović
`University of California, Berkeley
`Jason D. Bakos
`University of South Carolina
`Robert P. Colwell
`R&E Colwell & Assoc. Inc.
`Thomas M. Conte
`North Carolina State University
`José Duato
`Universitat Politècnica de València and Simula
`Diana Franklin
`University of California, Santa Barbara
`David Goldberg
`The Scripps Research Institute
`
`Norman P. Jouppi
`HP Labs
`Sheng Li
`HP Labs
`Naveen Muralimanohar
`HP Labs
`Gregory D. Peterson
`University of Tennessee
`Timothy M. Pinkston
`University of Southern California
`Parthasarathy Ranganathan
`HP Labs
`David A. Wood
`University of Wisconsin–Madison
`Amr Zaky
`University of Santa Clara
`
`Amsterdam • Boston • Heidelberg • London
`New York • Oxford • Paris • San Diego
`San Francisco • Singapore • Sydney • Tokyo
`
`
`
`
`Acquiring Editor: Todd Green
`Development Editor: Nate McFadden
`Project Manager: Paul Gottehrer
`Designer: Joanne Blank
`
`Morgan Kaufmann is an imprint of Elsevier
`225 Wyman Street, Waltham, MA 02451, USA
`
`© 2012 Elsevier, Inc. All rights reserved.
`
`No part of this publication may be reproduced or transmitted in any form or by any means, electronic
`or mechanical, including photocopying, recording, or any information storage and retrieval system,
`without permission in writing from the publisher. Details on how to seek permission, further informa-
`tion about the Publisher’s permissions policies and our arrangements with organizations such as the
`Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website:
`www.elsevier.com/permissions.
`
`This book and the individual contributions contained in it are protected under copyright by the
`Publisher (other than as may be noted herein).
`
`Notices
`Knowledge and best practice in this field are constantly changing. As new research and experience
`broaden our understanding, changes in research methods or professional practices, may become
`necessary. Practitioners and researchers must always rely on their own experience and knowledge in
`evaluating and using any information or methods described herein. In using such information or
`methods they should be mindful of their own safety and the safety of others, including parties for
`whom they have a professional responsibility.
`
`To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume
`any liability for any injury and/or damage to persons or property as a matter of products liability, neg-
`ligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas
`contained in the material herein.
`
`Library of Congress Cataloging-in-Publication Data
`Application submitted
`
`British Library Cataloguing-in-Publication Data
`A catalogue record for this book is available from the British Library.
`
`ISBN: 978-0-12-383872-8
`
`For information on all MK publications
`visit our website at www.mkp.com
`
`Printed in the United States of America
`11 12 13 14 15 10 9 8 7 6 5 4 3 2 1
`
`Typeset by: diacriTech, Chennai, India
`
`
`
`
`To Andrea, Linda, and our four sons
`
`
`
`
`Foreword
`
`by Luiz André Barroso, Google Inc.
`
`1
`
`The first edition of Hennessy and Patterson’s Computer Architecture: A Quanti-
`tative Approach was released during my first year in graduate school. I belong,
`therefore, to that first wave of professionals who learned about our discipline
`using this book as a compass. Perspective being a fundamental ingredient to a
`useful Foreword, I find myself at a disadvantage given how much of my own
`views have been colored by the previous four editions of this book. Another
`obstacle to clear perspective is that the student-grade reverence for these two
`superstars of Computer Science has not yet left me, despite (or perhaps because
`of) having had the chance to get to know them in the years since. These disadvan-
`tages are mitigated by my having practiced this trade continuously since this
`book’s first edition, which has given me a chance to enjoy its evolution and
`enduring relevance.
`The last edition arrived just two years after the rampant industrial race for
`higher CPU clock frequency had come to its official end, with Intel cancelling its
`4 GHz single-core developments and embracing multicore CPUs. Two years was
`plenty of time for John and Dave to present this story not as a random product
`line update, but as a defining computing technology inflection point of the last
`decade. That fourth edition had a reduced emphasis on instruction-level parallel-
`ism (ILP) in favor of added material on thread-level parallelism, something the
`current edition takes even further by devoting two chapters to thread- and data-
`level parallelism while limiting ILP discussion to a single chapter. Readers who
`are being introduced to new graphics processing engines will benefit especially
`from the new Chapter 4 which focuses on data parallelism, explaining the
`different but slowly converging solutions offered by multimedia extensions in
`general-purpose processors and increasingly programmable graphics processing
`units. Of notable practical relevance: If you have ever struggled with CUDA
terminology, check out Figure 4.24 (teaser: “Shared Memory” is really local,
`while “Global Memory” is closer to what you’d consider shared memory).
`Even though we are still in the middle of that multicore technology shift, this
`edition embraces what appears to be the next major one: cloud computing. In this
`case, the ubiquity of Internet connectivity and the evolution of compelling Web
`services are bringing to the spotlight very small devices (smart phones, tablets)
`
`
`
`
`
`and very large ones (warehouse-scale computing systems). The ARM Cortex A8,
`a popular CPU for smart phones, appears in Chapter 3’s “Putting It All Together”
`section, and a whole new Chapter 6 is devoted to request- and data-level parallel-
`ism in the context of warehouse-scale computing systems. In this new chapter,
`John and Dave present these new massive clusters as a distinctively new class of
`computers—an open invitation for computer architects to help shape this emerg-
`ing field. Readers will appreciate how this area has evolved in the last decade by
`comparing the Google cluster architecture described in the third edition with the
`more modern incarnation presented in this version’s Chapter 6.
`Return customers of this book will appreciate once again the work of two outstanding
`computer scientists who over their careers have perfected the art of combining an
`academic’s principled treatment of ideas with a deep understanding of leading-edge
`industrial products and technologies. The authors’ success in industrial interactions
`won’t be a surprise to those who have witnessed how Dave conducts his biannual proj-
`ect retreats, forums meticulously crafted to extract the most out of academic–industrial
`collaborations. Those who recall John’s entrepreneurial success with MIPS or bump into
`him in a Google hallway (as I occasionally do) won’t be surprised by it either.
`Perhaps most importantly, return and new readers alike will get their money’s
`worth. What has made this book an enduring classic is that each edition is not an
`update but an extensive revision that presents the most current information and
`unparalleled insight into this fascinating and quickly changing field. For me, after
`over twenty years in this profession, it is also another opportunity to experience
`that student-grade admiration for two remarkable teachers.
`
`
`
`
Contents

Foreword  ix
Preface  xv
Acknowledgments  xxiii

Chapter 1  Fundamentals of Quantitative Design and Analysis
1.1  Introduction  2
1.2  Classes of Computers  5
1.3  Defining Computer Architecture  11
1.4  Trends in Technology  17
1.5  Trends in Power and Energy in Integrated Circuits  21
1.6  Trends in Cost  27
1.7  Dependability  33
1.8  Measuring, Reporting, and Summarizing Performance  36
1.9  Quantitative Principles of Computer Design  44
1.10  Putting It All Together: Performance, Price, and Power  52
1.11  Fallacies and Pitfalls  55
1.12  Concluding Remarks  59
1.13  Historical Perspectives and References  61
Case Studies and Exercises by Diana Franklin  61

Chapter 2  Memory Hierarchy Design
2.1  Introduction  72
2.2  Ten Advanced Optimizations of Cache Performance  78
2.3  Memory Technology and Optimizations  96
2.4  Protection: Virtual Memory and Virtual Machines  105
2.5  Crosscutting Issues: The Design of Memory Hierarchies  112
2.6  Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7  113
2.7  Fallacies and Pitfalls  125
2.8  Concluding Remarks: Looking Ahead  129
2.9  Historical Perspective and References  131
Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li  131

Chapter 3  Instruction-Level Parallelism and Its Exploitation
3.1  Instruction-Level Parallelism: Concepts and Challenges  148
3.2  Basic Compiler Techniques for Exposing ILP  156
3.3  Reducing Branch Costs with Advanced Branch Prediction  162
3.4  Overcoming Data Hazards with Dynamic Scheduling  167
3.5  Dynamic Scheduling: Examples and the Algorithm  176
3.6  Hardware-Based Speculation  183
3.7  Exploiting ILP Using Multiple Issue and Static Scheduling  192
3.8  Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation  197
3.9  Advanced Techniques for Instruction Delivery and Speculation  202
3.10  Studies of the Limitations of ILP  213
3.11  Cross-Cutting Issues: ILP Approaches and the Memory System  221
3.12  Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput  223
3.13  Putting It All Together: The Intel Core i7 and ARM Cortex-A8  233
3.14  Fallacies and Pitfalls  241
3.15  Concluding Remarks: What’s Ahead?  245
3.16  Historical Perspective and References  247
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell  247

Chapter 4  Data-Level Parallelism in Vector, SIMD, and GPU Architectures
4.1  Introduction  262
4.2  Vector Architecture  264
4.3  SIMD Instruction Set Extensions for Multimedia  282
4.4  Graphics Processing Units  288
4.5  Detecting and Enhancing Loop-Level Parallelism  315
4.6  Crosscutting Issues  322
4.7  Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7  323
4.8  Fallacies and Pitfalls  330
4.9  Concluding Remarks  332
4.10  Historical Perspective and References  334
Case Study and Exercises by Jason D. Bakos  334

Chapter 5  Thread-Level Parallelism
5.1  Introduction  344
5.2  Centralized Shared-Memory Architectures  351
5.3  Performance of Symmetric Shared-Memory Multiprocessors  366
5.4  Distributed Shared-Memory and Directory-Based Coherence  378
5.5  Synchronization: The Basics  386
5.6  Models of Memory Consistency: An Introduction  392
5.7  Crosscutting Issues  395
5.8  Putting It All Together: Multicore Processors and Their Performance  400
5.9  Fallacies and Pitfalls  405
5.10  Concluding Remarks  409
5.11  Historical Perspectives and References  412
Case Studies and Exercises by Amr Zaky and David A. Wood  412

Chapter 6  Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
6.1  Introduction  432
6.2  Programming Models and Workloads for Warehouse-Scale Computers  436
6.3  Computer Architecture of Warehouse-Scale Computers  441
6.4  Physical Infrastructure and Costs of Warehouse-Scale Computers  446
6.5  Cloud Computing: The Return of Utility Computing  455
6.6  Crosscutting Issues  461
6.7  Putting It All Together: A Google Warehouse-Scale Computer  464
6.8  Fallacies and Pitfalls  471
6.9  Concluding Remarks  475
6.10  Historical Perspectives and References  476
Case Studies and Exercises by Parthasarathy Ranganathan  476

Appendix A  Instruction Set Principles
A.1  Introduction  A-2
A.2  Classifying Instruction Set Architectures  A-3
A.3  Memory Addressing  A-7
A.4  Type and Size of Operands  A-13
A.5  Operations in the Instruction Set  A-14
A.6  Instructions for Control Flow  A-16
A.7  Encoding an Instruction Set  A-21
A.8  Crosscutting Issues: The Role of Compilers  A-24
A.9  Putting It All Together: The MIPS Architecture  A-32
A.10  Fallacies and Pitfalls  A-39
A.11  Concluding Remarks  A-45
A.12  Historical Perspective and References  A-47
Exercises by Gregory D. Peterson  A-47

Appendix B  Review of Memory Hierarchy
B.1  Introduction  B-2
B.2  Cache Performance  B-16
B.3  Six Basic Cache Optimizations  B-22
B.4  Virtual Memory  B-40
B.5  Protection and Examples of Virtual Memory  B-49
B.6  Fallacies and Pitfalls  B-57
B.7  Concluding Remarks  B-59
B.8  Historical Perspective and References  B-59
Exercises by Amr Zaky  B-60

Appendix C  Pipelining: Basic and Intermediate Concepts
C.1  Introduction  C-2
C.2  The Major Hurdle of Pipelining—Pipeline Hazards  C-11
C.3  How Is Pipelining Implemented?  C-30
C.4  What Makes Pipelining Hard to Implement?  C-43
C.5  Extending the MIPS Pipeline to Handle Multicycle Operations  C-51
C.6  Putting It All Together: The MIPS R4000 Pipeline  C-61
C.7  Crosscutting Issues  C-70
C.8  Fallacies and Pitfalls  C-80
C.9  Concluding Remarks  C-81
C.10  Historical Perspective and References  C-81
Updated Exercises by Diana Franklin  C-82

Online Appendices
Appendix D  Storage Systems
Appendix E  Embedded Systems (by Thomas M. Conte)
Appendix F  Interconnection Networks (revised by Timothy M. Pinkston and José Duato)
Appendix G  Vector Processors in More Depth (revised by Krste Asanović)
Appendix H  Hardware and Software for VLIW and EPIC
Appendix I  Large-Scale Multiprocessors and Scientific Applications
Appendix J  Computer Arithmetic (by David Goldberg)
Appendix K  Survey of Instruction Set Architectures
Appendix L  Historical Perspectives and References

References  R-1
Index  I-1
`
`
`
`
`Preface
`
`1
`
`Why We Wrote This Book
`
`Through five editions of this book, our goal has been to describe the basic princi-
`ples underlying what will be tomorrow’s technological developments. Our excite-
`ment about the opportunities in computer architecture has not abated, and we
`echo what we said about the field in the first edition: “It is not a dreary science of
`paper machines that will never work. No! It’s a discipline of keen intellectual
`interest, requiring the balance of marketplace forces to cost-performance-power,
`leading to glorious failures and some notable successes.”
`Our primary objective in writing our first book was to change the way people
`learn and think about computer architecture. We feel this goal is still valid and
`important. The field is changing daily and must be studied with real examples
`and measurements on real computers, rather than simply as a collection of defini-
`tions and designs that will never need to be realized. We offer an enthusiastic
`welcome to anyone who came along with us in the past, as well as to those who
`are joining us now. Either way, we can promise the same quantitative approach
`to, and analysis of, real systems.
`As with earlier versions, we have strived to produce a new edition that will
`continue to be as relevant for professional engineers and architects as it is for
`those involved in advanced computer architecture and design courses. Like the
`first edition, this edition has a sharp focus on new platforms—personal mobile
`devices and warehouse-scale computers—and new architectures—multicore and
`GPUs. As much as its predecessors, this edition aims to demystify computer
`architecture through an emphasis on cost-performance-energy trade-offs and
`good engineering design. We believe that the field has continued to mature and
`move toward the rigorous quantitative foundation of long-established scientific
`and engineering disciplines.
`
`
`
`
`
`This Edition
`
`We said the fourth edition of Computer Architecture: A Quantitative Approach
`may have been the most significant since the first edition due to the switch to
`multicore chips. The feedback we received this time was that the book had lost
the sharp focus of the first edition, covering everything equally but without emphasis
and context. We’re pretty sure that won’t be said about the fifth edition.
`We believe most of the excitement is at the extremes in size of computing,
`with personal mobile devices (PMDs) such as cell phones and tablets as the cli-
`ents and warehouse-scale computers offering cloud computing as the server.
(Observant readers may have seen the hint for cloud computing on the cover.) We are
`struck by the common theme of these two extremes in cost, performance, and
`energy efficiency despite their difference in size. As a result, the running context
through each chapter is computing for PMDs and for warehouse-scale computers,
`and Chapter 6 is a brand-new chapter on the latter topic.
The other theme is parallelism in all its forms. We first identify the two types of
`application-level parallelism in Chapter 1: data-level parallelism (DLP), which
`arises because there are many data items that can be operated on at the same time,
`and task-level parallelism (TLP), which arises because tasks of work are created
`that can operate independently and largely in parallel. We then explain the four
`architectural styles that exploit DLP and TLP: instruction-level parallelism (ILP)
`in Chapter 3; vector architectures and graphic processor units (GPUs) in Chapter
`4, which is a brand-new chapter for this edition; thread-level parallelism in
`Chapter 5; and request-level parallelism (RLP) via warehouse-scale computers in
`Chapter 6, which is also a brand-new chapter for this edition. We moved memory
`hierarchy earlier in the book to Chapter 2, and we moved the storage systems
chapter to Appendix D. We are particularly proud of Chapter 4, which contains
the most detailed and clearest explanation of GPUs yet, and Chapter 6, which is
the first publication of the most recent details of a Google warehouse-scale
computer.
`As before, the first three appendices in the book give basics on the MIPS
instruction set, memory hierarchy, and pipelining for readers who have not read a
book like Computer Organization and Design. To keep costs down but still supply
supplemental material that is of interest to some readers, available online at
`http://booksite.mkp.com/9780123838728/ are nine more appendices. There are
`more pages in these appendices than there are in this book!
`This edition continues the tradition of using real-world examples to demon-
`strate the ideas, and the “Putting It All Together” sections are brand new. The
`“Putting It All Together” sections of this edition include the pipeline organiza-
tions and memory hierarchies of the ARM Cortex A8 processor, the Intel Core i7
`processor, the NVIDIA GTX-280 and GTX-480 GPUs, and one of the Google
`warehouse-scale computers.
`
`
`
`
`
`Topic Selection and Organization
`
`As before, we have taken a conservative approach to topic selection, for there are
`many more interesting ideas in the field than can reasonably be covered in a treat-
`ment of basic principles. We have steered away from a comprehensive survey of
`every architecture a reader might encounter. Instead, our presentation focuses on
`core concepts likely to be found in any new machine. The key criterion remains
`that of selecting ideas that have been examined and utilized successfully enough
`to permit their discussion in quantitative terms.
`Our intent has always been to focus on material that is not available in equiva-
`lent form from other sources, so we continue to emphasize advanced content
`wherever possible. Indeed, there are several systems here whose descriptions
`cannot be found in the literature. (Readers interested strictly in a more basic
`introduction to computer architecture should read Computer Organization and
`Design: The Hardware/Software Interface.)
`
`An Overview of the Content
`
`Chapter 1 has been beefed up in this edition. It includes formulas for energy,
`static power, dynamic power, integrated circuit costs, reliability, and availability.
`(These formulas are also found on the front inside cover.) Our hope is that these
`topics can be used through the rest of the book. In addition to the classic quantita-
`tive principles of computer design and performance measurement, the PIAT sec-
`tion has been upgraded to use the new SPECPower benchmark.
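As a taste of the kind of formula the chapter presents, the familiar first-order CMOS relations for dynamic energy and power take this shape (a standard textbook formulation; the chapter's exact notation may differ slightly):

```latex
E_{\text{dynamic}} \propto \text{Capacitive load} \times \text{Voltage}^2
\qquad
P_{\text{dynamic}} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}
```

Because voltage enters quadratically, lowering supply voltage is the most effective lever on both energy and power, which is why the trends chapters emphasize energy efficiency over raw clock rate.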
`Our view is that the instruction set architecture is playing less of a role today
`than in 1990, so we moved this material to Appendix A. It still uses the MIPS64
`architecture. (For quick review, a summary of the MIPS ISA can be found on the
`back inside cover.) For fans of ISAs, Appendix K covers 10 RISC architectures,
`the 80x86, the DEC VAX, and the IBM 360/370.
`We then move onto memory hierarchy in Chapter 2, since it is easy to apply
`the cost-performance-energy principles to this material and memory is a critical
`resource for the rest of the chapters. As in the past edition, Appendix B contains
`an introductory review of cache principles, which is available in case you need it.
`Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes
virtual machines, which offer advantages in protection, software management,
`and hardware management and play an important role in cloud computing. In
`addition to covering SRAM and DRAM technologies, the chapter includes new
`material on Flash memory. The PIAT examples are the ARM Cortex A8, which is
`used in PMDs, and the Intel Core i7, which is used in servers.
`Chapter 3 covers the exploitation of instruction-level parallelism in high-
`performance processors, including superscalar execution, branch prediction,
`speculation, dynamic scheduling, and multithreading. As mentioned earlier,
`Appendix C is a review of pipelining in case you need it. Chapter 3 also sur-
`veys the limits of ILP. Like Chapter 2, the PIAT examples are again the ARM
`Cortex A8 and the Intel Core i7. While the third edition contained a great deal
`
`
`
`
`
`on Itanium and VLIW, this material is now in Appendix H, indicating our view
`that this architecture did not live up to the earlier claims.
`The increasing importance of multimedia applications such as games and video
processing has also increased the importance of architectures that can exploit
data-level parallelism. In particular, there is a rising interest in computing using
graphics processing units (GPUs), yet few architects understand how GPUs really work.
`We decided to write a new chapter in large part to unveil this new style of com-
`puter architecture. Chapter 4 starts with an introduction to vector architectures,
`which acts as a foundation on which to build explanations of multimedia SIMD
instruction set extensions and GPUs. (Appendix G goes into even more depth on
`vector architectures.) The section on GPUs was the most difficult to write in this
`book, in that it took many iterations to get an accurate description that was also
`easy to understand. A significant challenge was the terminology. We decided to go
`with our own terms and then provide a translation between our terms and the offi-
`cial NVIDIA terms. (A copy of that table can be found in the back inside cover
`pages.) This chapter introduces the Roofline performance model and then uses it
`to compare the Intel Core i7 and the NVIDIA GTX 280 and GTX 480 GPUs. The
`chapter also describes the Tegra 2 GPU for PMDs.
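For readers unfamiliar with it, the Roofline model bounds attainable throughput by the lesser of peak compute rate and the rate the memory system can sustain at a given arithmetic intensity (a standard formulation of the model; the chapter's own figures and symbols may differ):

```latex
\text{Attainable GFLOP/s} = \min\bigl(\text{Peak Floating-Point Performance},\;
\text{Peak Memory Bandwidth} \times \text{Arithmetic Intensity}\bigr)
```

A kernel whose arithmetic intensity falls left of the "ridge point" is memory-bound, which is what makes the model useful for comparing machines as different as a multicore CPU and a GPU.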
`Chapter 5 describes multicore processors. It explores symmetric and
`distributed-memory architectures, examining both organizational principles and
`performance. Topics in synchronization and memory consistency models are
`next. The example is the Intel Core i7. Readers interested in interconnection net-
`works on a chip should read Appendix F, and those interested in larger scale mul-
`tiprocessors and scientific applications should read Appendix I.
`As mentioned earlier, Chapter 6 describes the newest topic in computer archi-
`tecture, warehouse-scale computers (WSCs). Based on help from engineers at
`Amazon Web Services and Google, this chapter integrates details on design, cost,
`an