`
`Reference 26
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2138, p. 1
`
`
`
John L. Hennessy | David A. Patterson
`
`COMPUTER
`ARCHITECTURE
`
`
`
`
Computer Architecture Formulas

1. CPU time = Instruction count x Clock cycles per instruction x Clock cycle time

2. X is n times faster than Y: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y

3. Amdahl's Law:
   Speedup_overall = Execution time_old / Execution time_new
                   = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

4. Energy_dynamic ∝ 1/2 x Capacitive load x Voltage^2

5. Power_dynamic ∝ 1/2 x Capacitive load x Voltage^2 x Frequency switched

6. Power_static ∝ Current_static x Voltage

7. Availability = Mean time to fail / (Mean time to fail + Mean time to repair)

8. Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N

   where Wafer yield accounts for wafers that are so bad they need not be tested, and N is a parameter called the process-complexity factor, a measure of manufacturing difficulty. N ranged from 11.5 to 15.5 in 2011.

9. Means: arithmetic (AM), weighted arithmetic (WAM), and geometric (GM):

   AM = (1/n) x Σ_{i=1..n} Time_i
   WAM = Σ_{i=1..n} Weight_i x Time_i
   GM = (Π_{i=1..n} Time_i)^(1/n)

   where Time_i is the execution time for the ith program of a total of n in the workload, and Weight_i is the weighting of the ith program in the workload.

10. Average memory-access time = Hit time + Miss rate x Miss penalty

11. Misses per instruction = Miss rate x Memory accesses per instruction

12. Cache index size: 2^index = Cache size / (Block size x Set associativity)

13. Power Utilization Effectiveness (PUE) of a Warehouse Scale Computer = Total Facility Power / IT Equipment Power
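As a quick sanity check, the formulas above can be evaluated directly. The sketch below (plain Python, with illustrative numbers that are not from the book) computes the Amdahl's Law speedup of formula 3 and the average memory-access time of formula 10.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Formula 3: overall speedup when a fraction of execution time
    is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

def avg_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Formula 10: average memory-access time, all times in the
    same units (here, clock cycles)."""
    return hit_time + miss_rate * miss_penalty

# Enhancing 80% of the workload by 10x yields well under 10x overall:
print(amdahl_speedup(0.8, 10))               # ≈ 3.57
# 1-cycle hit, 2% miss rate, 100-cycle miss penalty:
print(avg_memory_access_time(1, 0.02, 100))  # 3.0 cycles
```

Note how Amdahl's Law caps the overall speedup by the unenhanced fraction: even an infinite speedup of 80% of the time cannot exceed 5x overall.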
`
`Rules of Thumb
`
1. Amdahl/Case Rule: A balanced computer system needs about 1 MB of main memory capacity and 1 megabit per second of I/O bandwidth per MIPS of CPU performance.
2. 90/10 Locality Rule: A program executes about 90% of its instructions in 10% of its code.
3. Bandwidth Rule: Bandwidth grows by at least the square of the improvement in latency.
4. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size N is about the same as a two-way set-associative cache of size N/2.
5. Dependability Rule: Design with no single point of failure.
6. Watt-Year Rule: The fully burdened cost of a Watt per year in a Warehouse Scale Computer in North America in 2011, including the cost of amortizing the power and cooling infrastructure, is about $2.
`
`
`
`
`In Praise of Computer Architecture: A Quantitative Approach
`Sixth Edition
`
`"Although important concepts of architecture are timeless, this edition has been
`thoroughly updated with the latest technology developments, costs, examples,
and references. Keeping pace with recent developments in open-sourced architecture, the instruction set architecture used in the book has been updated to use the
`RISC-V ISA."
`
`-from the foreword by Norman P. Jouppi, Google
`
`"Computer Architecture: A Quantitative Approach is a classic that, like fine wine,
`just keeps getting better. I bought my first copy as I finished up my undergraduate
`degree and it remains one of my most frequently referenced texts today."
`
-James Hamilton, Amazon Web Services
`
"Hennessy and Patterson wrote the first edition of this book when graduate students built computers with 50,000 transistors. Today, warehouse-size computers
`contain that many servers, each consisting of dozens of independent processors
`and billions of transistors. The evolution of computer architecture has been rapid
`and relentless, but Computer Architecture: A Quantitative Approach has kept pace,
`with each edition accurately explaining and analyzing the important emerging
`ideas that make this field so exciting."
`
`-James Larus, Microsoft Research
`
"Another timely and relevant update to a classic, once again also serving as a window into the relentless and exciting evolution of computer architecture! The new
`discussions in this edition on the slowing of Moore's law and implications for
`future systems are must-reads for both computer architects and practitioners
`working on broader systems."
`
`-Parthasarathy (Partha) Ranganathan, Google
`
`"I love the 'Quantitative Approach' books because they are written by engineers,
`for engineers. John Hennessy and Dave Patterson show the limits imposed by
`mathematics and the possibilities enabled by materials science. Then they teach
`through real-world examples how architects analyze, measure, and compromise
`to build working systems. This sixth edition comes at a critical time: Moore's
`Law is fading just as deep learning demands unprecedented compute cycles.
The new chapter on domain-specific architectures documents a number of promising approaches and prophesies a rebirth in computer architecture. Like the
`scholars of the European Renaissance, computer architects must understand our
`own history, and then combine the lessons of that history with new techniques
`to remake the world."
`
`-Cliff Young, Google
`
`
`
`
`This page intentionally left blank
`
`
`
`
`
`Computer Architecture
`A Quantitative Approach
`
`Sixth Edition
`
`
`
`
`John L. Hennessy is a Professor of Electrical Engineering and Computer Science at Stanford
`University, where he has been a member of the faculty since 1977 and was, from 2000 to
2016, its 10th President. He currently serves as the Director of the Knight-Hennessy Fellowship, which provides graduate fellowships to potential future leaders. Hennessy is a Fellow of
the IEEE and ACM, a member of the National Academy of Engineering, the National Academy of Sciences, and the American Philosophical Society, and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for
`his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award,
`and the 2000 John von Neumann Award, which he shared with David Patterson. He has also
`received 10 honorary doctorates.
`
`In 1981, he started the MIPS project at Stanford with a handful of graduate students. After
completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems, which developed one of the first commercial RISC microprocessors. As of
`2017, over 5 billion MIPS microprocessors have been shipped in devices ranging from video
games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which prototyped
`the first scalable cache coherent multiprocessor; many of the key ideas have been adopted
in modern multiprocessors. In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups, both as an early-stage advisor and
`an investor.
`David A. Patterson became a Distinguished Engineer at Google in 2016 after 40 years as a
`UC Berkeley professor. He joined UC Berkeley immediately after graduating from UCLA. He
`still spends a day a week in Berkeley as an Emeritus Professor of Computer Science. His
`teaching has been honored by the Distinguished Teaching Award from the University of
California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement
`Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE
`Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John
`von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is
`a Fellow of the American Academy of Arts and Sciences, the Computer History Museum,
`ACM, and IEEE, and he was elected to the National Academy of Engineering, the National
`Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the
`Information Technology Advisory Committee to the President of the United States, as chair
`of the CS division in the Berkeley EECS department, as chair of the Computing Research
`Association, and as President of ACM. This record led to Distinguished Service Awards from
`ACM, CRA, and SIGARCH. He is currently Vice-Chair of the Board of Directors of the RISC-V
`Foundation.
`
`At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI
reduced instruction set computer, and the foundation of the commercial SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led
`to dependable storage systems from many companies. He was also involved in the Network
`of Workstations (NOW) project, which led to cluster technology used by Internet companies
and later to cloud computing. His current interests are in designing domain-specific architectures for machine learning, spreading the word on the open RISC-V instruction set architecture, and in helping the UC Berkeley RISELab (Real-time Intelligent Secure Execution).
`
`
`
`
`Computer Architecture
`A Quantitative Approach
`
`Sixth Edition
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With Contributions by
`
`Krste Asanovic
University of California, Berkeley
`Jason D. Bakos
`University of South Carolina
`Robert P. Colwell
`R&E Colwell & Assoc. Inc.
`Abhishek Bhattacharjee
`Rutgers University
`Thomas M. Conte
`Georgia Tech
`Jose Duato
`Proemisa
`Diana Franklin
`University of Chicago
`David Goldberg
`eBay
`
`Norman P. Jouppi
`Sheng Li
Intel Labs
`Naveen Muralimanohar
`HP Labs
`Gregory D. Peterson
`University of Tennessee
`Timothy M. Pinkston
`University of Southern California
`Parthasarathy Ranganathan
`David A. Wood
`University of Wisconsin- Madison
`Cliff Young
`Amr Zaky
`University of Santa Clara
`
Morgan Kaufmann Publishers
An Imprint of Elsevier
`
`
`
Morgan Kaufmann is an imprint of Elsevier
`50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
`
`© 2019 Elsevier Inc. All rights reserved.
`
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
`
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
`
`Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
`
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
`
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
`
`Library of Congress Cataloging-in-Publication Data
`A catalog record for this book is available from the Library of Congress
`
British Library Cataloguing-in-Publication Data
`A catalogue record for this book is available from the British Library
`
ISBN: 978-0-12-811905-1
`
For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals
`
Book Aid International: Working together to grow libraries in developing countries
www.elsevier.com • www.bookaid.org
`
Publisher: Katey Birtcher
Acquisition Editor: Stephen Merken
Developmental Editor: Nate McFadden
Production Project Manager: Stalin Viswanathan
Cover Designer: Christian J. Bilbow

Typeset by SPi Global, India
`
`
`
`
`To Andrea, Linda, and our four sons
`
`
`
`
`This page intentionally left blank
`
`
`
`
`Foreword
`
`by Norman P. Jouppi, Google
`
Much of the improvement in computer performance over the last 40 years has been
`provided by computer architecture advancements that have leveraged Moore's
`Law and Dennard scaling to build larger and more parallel systems. Moore's
`Law is the observation that the maximum number of transistors in an integrated
circuit doubles approximately every two years. Dennard scaling refers to the reduction of MOS supply voltage in concert with the scaling of feature sizes, so that as
`transistors get smaller, their power density stays roughly constant. With the end of
`Dennard scaling a decade ago, and the recent slowdown of Moore's Law due to a
`combination of physical limitations and economic factors, the sixth edition of the
`preeminent textbook for our field couldn't be more timely. Here are some reasons.
First, because domain-specific architectures can provide equivalent performance and power benefits of three or more historical generations of Moore's
`Law and Dennard scaling, they now can provide better implementations than
`may ever be possible with future scaling of general-purpose architectures. And
`with the diverse application space of computers today, there are many potential
`areas for architectural innovation with domain-specific architectures. Second,
high-quality implementations of open-source architectures now have a much longer lifetime due to the slowdown in Moore's Law. This gives them more opportunities for continued optimization and refinement, and hence makes them more
attractive. Third, with the slowing of Moore's Law, different technology components have been scaling heterogeneously. Furthermore, new technologies such as
`2.5D stacking, new nonvolatile memories, and optical interconnects have been
`developed to provide more than Moore's Law can supply alone. To use these
`new technologies and nonhomogeneous scaling effectively, fundamental design
`decisions need to be reexamined from first principles. Hence it is important for
`students, professors, and practitioners in the industry to be skilled in a wide range
`of both old and new architectural techniques. All told, I believe this is the most
`exciting time in computer architecture since the industrial exploitation of
`instruction-level parallelism in microprocessors 25 years ago.
The largest change in this edition is the addition of a new chapter on domain-specific architectures. It's long been known that customized domain-specific architectures can have higher performance, lower power, and require less silicon area than general-purpose processor implementations. However, when general-purpose
`
`
`
`
`
`processors were increasing in single-threaded performance by 40% per year (see
Fig. 1.11), the extra time to market required to develop a custom architecture vs.
`using a leading-edge standard microprocessor could cause the custom architecture
`to lose much of its advantage. In contrast, today single-core performance is
`improving very slowly, meaning that the benefits of custom architectures will
`not be made obsolete by general-purpose processors for a very long time, if ever.
`Chapter 7 covers several domain-specific architectures. Deep neural networks
`have very high computation requirements but lower data precision requirements -
`this combination can benefit significantly from custom architectures. Two example
`architectures and implementations for deep neural networks are presented: one
`optimized for inference and a second optimized for training. Image processing
`is another example domain; it also has high computation demands and benefits
`from lower-precision data types. Furthermore, since it is often found in mobile
`devices, the power savings from custom architectures are also very valuable.
Finally, by nature of their reprogrammability, FPGA-based accelerators can be used to implement a variety of different domain-specific architectures on a single device. They also can benefit more irregular applications that are frequently updated, like accelerating internet search.
`Although important concepts of architecture are timeless, this edition has been
`thoroughly updated with the latest technology developments, costs, examples, and
`references. Keeping pace with recent developments in open-sourced architecture,
`the instruction set architecture used in the book has been updated to use the
RISC-V ISA."
`On a personal note, after enjoying the privilege of working with John as a grad(cid:173)
`uate student, I am now enjoying the privilege of working with Dave at Google.
`What an amazing duo!
`
`
`
`
`Contents
`
Foreword ix
Preface xvii
Acknowledgments xxv

Chapter 1 Fundamentals of Quantitative Design and Analysis
1.1 Introduction 2
1.2 Classes of Computers 6
1.3 Defining Computer Architecture 11
1.4 Trends in Technology 18
1.5 Trends in Power and Energy in Integrated Circuits 23
1.6 Trends in Cost 29
1.7 Dependability 36
1.8 Measuring, Reporting, and Summarizing Performance 39
1.9 Quantitative Principles of Computer Design 48
1.10 Putting It All Together: Performance, Price, and Power 55
1.11 Fallacies and Pitfalls 58
1.12 Concluding Remarks 64
1.13 Historical Perspectives and References 67
Case Studies and Exercises by Diana Franklin 67

Chapter 2 Memory Hierarchy Design
2.1 Introduction 78
2.2 Memory Technology and Optimizations 84
2.3 Ten Advanced Optimizations of Cache Performance 94
2.4 Virtual Memory and Virtual Machines 118
2.5 Cross-Cutting Issues: The Design of Memory Hierarchies 126
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700 129
2.7 Fallacies and Pitfalls 142
2.8 Concluding Remarks: Looking Ahead 146
2.9 Historical Perspectives and References 148
Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li 148

Chapter 3 Instruction-Level Parallelism and Its Exploitation
3.1 Instruction-Level Parallelism: Concepts and Challenges 168
3.2 Basic Compiler Techniques for Exposing ILP 176
3.3 Reducing Branch Costs With Advanced Branch Prediction 182
3.4 Overcoming Data Hazards With Dynamic Scheduling 191
3.5 Dynamic Scheduling: Examples and the Algorithm 201
3.6 Hardware-Based Speculation 208
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling 218
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation 222
3.9 Advanced Techniques for Instruction Delivery and Speculation 228
3.10 Cross-Cutting Issues 240
3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput 242
3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53 247
3.13 Fallacies and Pitfalls 258
3.14 Concluding Remarks: What's Ahead? 264
3.15 Historical Perspective and References 266
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell 266

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures
4.1 Introduction 282
4.2 Vector Architecture 283
4.3 SIMD Instruction Set Extensions for Multimedia 304
4.4 Graphics Processing Units 310
4.5 Detecting and Enhancing Loop-Level Parallelism 336
4.6 Cross-Cutting Issues 345
4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7 346
4.8 Fallacies and Pitfalls 353
4.9 Concluding Remarks 357
4.10 Historical Perspective and References 357
Case Study and Exercises by Jason D. Bakos 357

Chapter 5 Thread-Level Parallelism
5.1 Introduction 368
5.2 Centralized Shared-Memory Architectures 377
5.3 Performance of Symmetric Shared-Memory Multiprocessors 393
5.4 Distributed Shared-Memory and Directory-Based Coherence 404
5.5 Synchronization: The Basics 412
5.6 Models of Memory Consistency: An Introduction 417
5.7 Cross-Cutting Issues 422
5.8 Putting It All Together: Multicore Processors and Their Performance 426
5.9 Fallacies and Pitfalls 438
5.10 The Future of Multicore Scaling 442
5.11 Concluding Remarks 444
5.12 Historical Perspectives and References 445
Case Studies and Exercises by Amr Zaky and David A. Wood 446

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
6.1 Introduction 466
6.2 Programming Models and Workloads for Warehouse-Scale Computers 471
6.3 Computer Architecture of Warehouse-Scale Computers 477
6.4 The Efficiency and Cost of Warehouse-Scale Computers 482
6.5 Cloud Computing: The Return of Utility Computing 490
6.6 Cross-Cutting Issues 501
6.7 Putting It All Together: A Google Warehouse-Scale Computer 503
6.8 Fallacies and Pitfalls 514
6.9 Concluding Remarks 518
6.10 Historical Perspectives and References 519
Case Studies and Exercises by Parthasarathy Ranganathan 519

Chapter 7 Domain-Specific Architectures
7.1 Introduction 540
7.2 Guidelines for DSAs 543
7.3 Example Domain: Deep Neural Networks 544
7.4 Google's Tensor Processing Unit, an Inference Data Center Accelerator 557
7.5 Microsoft Catapult, a Flexible Data Center Accelerator 567
7.6 Intel Crest, a Data Center Accelerator for Training 579
7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit 579
7.8 Cross-Cutting Issues 592
7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators 595
7.10 Fallacies and Pitfalls 602
7.11 Concluding Remarks 604
7.12 Historical Perspectives and References 606
Case Studies and Exercises by Cliff Young 606

Appendix A Instruction Set Principles
A.1 Introduction A-2
A.2 Classifying Instruction Set Architectures A-3
A.3 Memory Addressing A-7
A.4 Type and Size of Operands A-13
A.5 Operations in the Instruction Set A-15
A.6 Instructions for Control Flow A-16
A.7 Encoding an Instruction Set A-21
A.8 Cross-Cutting Issues: The Role of Compilers A-24
A.9 Putting It All Together: The RISC-V Architecture A-33
A.10 Fallacies and Pitfalls A-42
A.11 Concluding Remarks A-46
A.12 Historical Perspective and References A-47
Exercises by Gregory D. Peterson A-47

Appendix B Review of Memory Hierarchy
B.1 Introduction B-2
B.2 Cache Performance B-15
B.3 Six Basic Cache Optimizations B-22
B.4 Virtual Memory B-40
B.5 Protection and Examples of Virtual Memory B-49
B.6 Fallacies and Pitfalls B-57
B.7 Concluding Remarks B-59
B.8 Historical Perspective and References B-59
Exercises by Amr Zaky B-60

Appendix C Pipelining: Basic and Intermediate Concepts
C.1 Introduction C-2
C.2 The Major Hurdle of Pipelining-Pipeline Hazards C-10
C.3 How Is Pipelining Implemented? C-26
C.4 What Makes Pipelining Hard to Implement? C-37
C.5 Extending the RISC V Integer Pipeline to Handle Multicycle Operations C-45
C.6 Putting It All Together: The MIPS R4000 Pipeline C-55
C.7 Cross-Cutting Issues C-65
C.8 Fallacies and Pitfalls C-70
C.9 Concluding Remarks C-71
C.10 Historical Perspective and References C-71
Updated Exercises by Diana Franklin C-71

Online Appendices
Appendix D Storage Systems
Appendix E Embedded Systems by Thomas M. Conte
Appendix F Interconnection Networks by Timothy M. Pinkston and Jose Duato
Appendix G Vector Processors in More Depth by Krste Asanovic
Appendix H Hardware and Software for VLIW and EPIC
Appendix I Large-Scale Multiprocessors and Scientific Applications
Appendix J Computer Arithmetic by David Goldberg
Appendix K Survey of Instruction Set Architectures
Appendix L Advanced Concepts on Address Translation by Abhishek Bhattacharjee
Appendix M Historical Perspectives and References

References R-1

Index I-1
`
`
`
`
`This page intentionally left blank
`
`
`
`
`
`Preface
`
`Why We Wrote This Book
`
`Through six editions of this book, our goal has been to describe the basic principles
`underlying what will be tomorrow's technological developments. Our excitement
`about the opportunities in computer architecture has not abated, and we echo what
`we said about the field in the first edition: "It is not a dreary science of paper
`machines that will never work. No! It's a discipline of keen intellectual interest,
requiring the balance of marketplace forces to cost-performance-power, leading
`to glorious failures and some notable successes."
`Our primary objective in writing our first book was to change the way people
`learn and think about computer architecture. We feel this goal is still valid and
`important. The field is changing daily and must be studied with real examples
and measurements on real computers, rather than simply as a collection of definitions and designs that will never need to be realized. We offer an enthusiastic welcome to anyone who came along with us in the past, as well as to those who are
`joining us now. Either way, we can promise the same quantitative approach to, and
`analysis of, real systems.
`As with earlier versions, we have strived to produce a new edition that will
`continue to be as relevant for professional engineers and architects as it is for those
involved in advanced computer architecture and design courses. Like the first edition, this edition has a sharp focus on new platforms-personal mobile devices and warehouse-scale computers-and new architectures-specifically, domain-specific architectures. As much as its predecessors, this edition aims to demystify
`computer architecture through an emphasis on cost-performance-energy trade-offs
`and good engineering design. We believe that the field has continued to mature and
`move toward the rigorous quantitative foundation of long-established scientific
`and engineering disciplines.
`
`
`
`
`
`This Edition
`
The ending of Moore's Law and Dennard scaling is having as profound an effect on computer architecture as did the switch to multicore. We retain the focus on the extremes in size of computing, with personal mobile devices (PMDs) such as cell phones and tablets as the clients and warehouse-scale computers offering cloud computing as the server. We also maintain the other theme of parallelism in all its forms: data-level parallelism (DLP) in Chapters 1 and 4, instruction-level parallelism (ILP) in Chapter 3, thread-level parallelism in Chapter 5, and request-level parallelism (RLP) in Chapter 6.
The most pervasive change in this edition is switching from MIPS to the RISC-V instruction set. We suspect this modern, modular, open instruction set may become a significant force in the information technology industry. It may become as important in computer architecture as Linux is for operating systems.
`The newcomer in this edition is Chapter 7, which introduces domain-specific
`architectures with several concrete examples from industry.
As before, the first three appendices in the book give basics on the RISC-V instruction set, memory hierarchy, and pipelining for readers who have not read a book like Computer Organization and Design. To keep costs down but still supply supplemental material that is of interest to some readers, nine more appendices are available online at https://www.elsevier.com/books-and-journals/book-companion/9780128119051. There are more pages in these appendices than there are in this book!
This edition continues the tradition of using real-world examples to demonstrate the ideas, and the "Putting It All Together" sections are brand new. The "Putting It All Together" sections of this edition include the pipeline organizations and memory hierarchies of the ARM Cortex A8 processor, the Intel Core i7 processor, the NVIDIA GTX-280 and GTX-480 GPUs, and one of the Google warehouse-scale computers.
`
`Topic Selection and Organization
`
As before, we have taken a conservative approach to topic selection, for there are many more interesting ideas in the field than can reasonably be covered in a treatment of basic principles. We have steered away from a comprehensive survey of every architecture a reader might encounter. Instead, our presentation focuses on core concepts likely to be found in any new machine. The key criterion remains that of selecting ideas that have been examined and utilized successfully enough to permit their discussion in quantitative terms.
Our intent has always been to focus on material that is not available in equivalent form from other sources, so we continue to emphasize advanced content wherever possible. Indeed, there are several systems here whose descriptions cannot be found in the literature. (Readers interested strictly in a more basic introduction to computer architecture should read Computer Organization and Design: The Hardware/Software Interface.)
`
`
`
`
`
`An Overview of the Content
`
Chapter 1 includes formulas for energy, static power, dynamic power, integrated circuit costs, reliability, and availability. (These formulas are also found on the front inside cover.) Our hope is that these topics can be used through the rest of the book. In addition to the classic quantitative principles of computer design and performance measurement, it shows the slowing of performance improvement of general-purpose microprocessors, which is one inspiration for domain-specific architectures.
Our view is that the instruction set architecture is playing less of a role today than in 1990, so we moved this material to Appendix A. It now uses the RISC-V architecture. (For quick review, a summary of the RISC-V ISA can be found on the back inside cover.) For fans of ISAs, Appendix K was revised for this edition and covers 8 RISC architectures (5 for desktop and server use and 3 for embedded use), the 80x86, the DEC VAX, and the IBM 360/370.
We then move on to memory hierarchy in Chapter 2, since it is easy to apply the cost-performance-energy principles to this material, and memory is a critical resource for the rest of the chapters. As in the past edition, Appendix B contains an introductory review of cache principles, which is available in case you need it. Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes virtual machines, which offer advantages in protection, software management, and hardware management, and play an important role in cloud computing. In addition to covering SRAM and DRAM technologies, the chapter includes new material both on Flash memory and on the use of stacked die packaging for extending the memory hierarchy. The "Putting It All Together" examples are the ARM Cortex A8, which is used in PMDs, and the Intel Core i7, which is used in servers.
Chapter 3 covers the exploitation of instruction-level parallelism in high-