Reference 26

PATENT OWNER DIRECTSTREAM, LLC
EX. 2138, p. 1
John L. Hennessy | David A. Patterson

COMPUTER
ARCHITECTURE
Computer Architecture Formulas

1. CPU time = Instruction count x Clock cycles per instruction x Clock cycle time

2. X is n times faster than Y: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y

3. Amdahl's Law: Speedup_overall = Execution time_old / Execution time_new
                                 = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

4. Energy_dynamic ∝ 1/2 x Capacitive load x Voltage^2

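Amdahl's Law (formula 3) is easy to get wrong in the denominator, so here is a minimal Python sketch of it; the workload numbers are illustrative only, not taken from the text:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = 1 / ((1 - F) + F / S), per formula 3."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Accelerating 80% of a workload by 10x yields well under 10x overall:
print(amdahl_speedup(0.8, 10.0))   # ~3.57
# Even an unbounded speedup of that 80% is capped by the untouched 20%:
print(amdahl_speedup(0.8, 1e12))   # ~5.0, i.e. 1 / (1 - 0.8)
```

The second call shows the asymptote: the unenhanced fraction alone bounds the overall speedup.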
5. Power_dynamic ∝ 1/2 x Capacitive load x Voltage^2 x Frequency switched

6. Power_static ∝ Current_static x Voltage

7. Availability = Mean time to fail / (Mean time to fail + Mean time to repair)

8. Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N

   where Wafer yield accounts for wafers that are so bad they need not be tested, and N is a parameter called
   the process-complexity factor, a measure of manufacturing difficulty. N ranges from 11.5 to 15.5 in 2011.

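The die-yield model in formula 8 can be sketched directly. The defect density and die area below are illustrative assumptions, and N = 13.5 is simply the midpoint of the 11.5-15.5 range quoted above:

```python
def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, n: float = 13.5) -> float:
    """Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n

# Perfect wafers, 0.03 defects/cm^2, and a 1.0 cm^2 die:
print(die_yield(1.0, 0.03, 1.0))   # roughly 0.67
```

Note how the exponent N makes yield fall off sharply with die area, which is why large dies are expensive.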
9. Means - arithmetic (AM), weighted arithmetic (WAM), and geometric (GM):

   AM = (1/n) x Sum_{i=1..n} Time_i     WAM = Sum_{i=1..n} Weight_i x Time_i     GM = (Product_{i=1..n} Time_i)^(1/n)

   where Time_i is the execution time for the ith program of a total of n in the workload, and Weight_i is the
   weighting of the ith program in the workload.

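The three means of formula 9 can be checked on a toy two-program workload; the times and weights below are made up for illustration:

```python
from math import prod

def arithmetic_mean(times: list[float]) -> float:
    return sum(times) / len(times)

def weighted_arithmetic_mean(times: list[float], weights: list[float]) -> float:
    # Weights are assumed to sum to 1.
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(times: list[float]) -> float:
    return prod(times) ** (1.0 / len(times))

times = [2.0, 8.0]
print(arithmetic_mean(times))                         # 5.0
print(weighted_arithmetic_mean(times, [0.75, 0.25]))  # 3.5
print(geometric_mean(times))                          # 4.0
```

The geometric mean sits below the arithmetic mean whenever the times differ, which is one reason it is preferred for summarizing normalized performance ratios.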
10. Average memory-access time = Hit time + Miss rate x Miss penalty

11. Misses per instruction = Miss rate x Memory accesses per instruction

12. Cache index size: 2^index = Cache size / (Block size x Set associativity)

13. Power Utilization Effectiveness (PUE) of a Warehouse-Scale Computer = Total Facility Power / IT Equipment Power
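Formulas 10-12 compose naturally into a small sketch; the cache parameters below are illustrative assumptions, not figures from the text:

```python
import math

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory-access time = Hit time + Miss rate x Miss penalty (formula 10)."""
    return hit_time + miss_rate * miss_penalty

def misses_per_instruction(miss_rate: float, accesses_per_instruction: float) -> float:
    """Misses per instruction = Miss rate x Memory accesses per instruction (formula 11)."""
    return miss_rate * accesses_per_instruction

def cache_index_bits(cache_size: int, block_size: int, set_associativity: int) -> int:
    """Index bits, from 2^index = Cache size / (Block size x Set associativity) (formula 12)."""
    return int(math.log2(cache_size // (block_size * set_associativity)))

# 1-cycle hits, 2% miss rate, 100-cycle miss penalty:
print(amat(1.0, 0.02, 100.0))              # 3.0 cycles
# 32 KiB cache, 64-byte blocks, 4-way set associative -> 128 sets = 7 index bits:
print(cache_index_bits(32 * 1024, 64, 4))  # 7
```

The AMAT example shows why a 2% miss rate is far from harmless: a 100-cycle penalty triples the average access time of a 1-cycle cache.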

Rules of Thumb

1. Amdahl/Case Rule: A balanced computer system needs about 1 MB of main memory capacity and 1
   megabit per second of I/O bandwidth per MIPS of CPU performance.
2. 90/10 Locality Rule: A program executes about 90% of its instructions in 10% of its code.
3. Bandwidth Rule: Bandwidth grows by at least the square of the improvement in latency.
4. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size N is about the same as a two-way
   set-associative cache of size N/2.
5. Dependability Rule: Design with no single point of failure.
6. Watt-Year Rule: The fully burdened cost of a Watt per year in a Warehouse-Scale Computer in North
   America in 2011, including the cost of amortizing the power and cooling infrastructure, is about $2.
In Praise of Computer Architecture: A Quantitative Approach
Sixth Edition
"Although important concepts of architecture are timeless, this edition has been
thoroughly updated with the latest technology developments, costs, examples,
and references. Keeping pace with recent developments in open-sourced architecture,
the instruction set architecture used in the book has been updated to use the
RISC-V ISA."

-from the foreword by Norman P. Jouppi, Google
"Computer Architecture: A Quantitative Approach is a classic that, like fine wine,
just keeps getting better. I bought my first copy as I finished up my undergraduate
degree and it remains one of my most frequently referenced texts today."

-James Hamilton, Amazon Web Services
"Hennessy and Patterson wrote the first edition of this book when graduate students
built computers with 50,000 transistors. Today, warehouse-size computers
contain that many servers, each consisting of dozens of independent processors
and billions of transistors. The evolution of computer architecture has been rapid
and relentless, but Computer Architecture: A Quantitative Approach has kept pace,
with each edition accurately explaining and analyzing the important emerging
ideas that make this field so exciting."

-James Larus, Microsoft Research
"Another timely and relevant update to a classic, once again also serving as a window
into the relentless and exciting evolution of computer architecture! The new
discussions in this edition on the slowing of Moore's law and implications for
future systems are must-reads for both computer architects and practitioners
working on broader systems."

-Parthasarathy (Partha) Ranganathan, Google
"I love the 'Quantitative Approach' books because they are written by engineers,
for engineers. John Hennessy and Dave Patterson show the limits imposed by
mathematics and the possibilities enabled by materials science. Then they teach
through real-world examples how architects analyze, measure, and compromise
to build working systems. This sixth edition comes at a critical time: Moore's
Law is fading just as deep learning demands unprecedented compute cycles.
The new chapter on domain-specific architectures documents a number of promising
approaches and prophesies a rebirth in computer architecture. Like the
scholars of the European Renaissance, computer architects must understand our
own history, and then combine the lessons of that history with new techniques
to remake the world."

-Cliff Young, Google
Computer Architecture
A Quantitative Approach

Sixth Edition
John L. Hennessy is a Professor of Electrical Engineering and Computer Science at Stanford
University, where he has been a member of the faculty since 1977 and was, from 2000 to
2016, its 10th President. He currently serves as the Director of the Knight-Hennessy Fellowship,
which provides graduate fellowships to potential future leaders. Hennessy is a Fellow of
the IEEE and ACM, a member of the National Academy of Engineering, the National Academy
of Science, and the American Philosophical Society, and a Fellow of the American Academy
of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for
his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award,
and the 2000 John von Neumann Award, which he shared with David Patterson. He has also
received 10 honorary doctorates.

In 1981, he started the MIPS project at Stanford with a handful of graduate students. After
completing the project in 1984, he took a leave from the university to cofound MIPS Computer
Systems, which developed one of the first commercial RISC microprocessors. As of
2017, over 5 billion MIPS microprocessors have been shipped in devices ranging from video
games and palmtop computers to laser printers and network switches. Hennessy subsequently
led the DASH (Directory Architecture for Shared Memory) project, which prototyped
the first scalable cache coherent multiprocessor; many of the key ideas have been adopted
in modern multiprocessors. In addition to his technical activities and university responsibilities,
he has continued to work with numerous start-ups, both as an early-stage advisor and
an investor.
David A. Patterson became a Distinguished Engineer at Google in 2016 after 40 years as a
UC Berkeley professor. He joined UC Berkeley immediately after graduating from UCLA. He
still spends a day a week in Berkeley as an Emeritus Professor of Computer Science. His
teaching has been honored by the Distinguished Teaching Award from the University of
California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate
Teaching Award from IEEE. Patterson received the IEEE Technical Achievement
Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE
Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John
von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is
a Fellow of the American Academy of Arts and Sciences, the Computer History Museum,
ACM, and IEEE, and he was elected to the National Academy of Engineering, the National
Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the
Information Technology Advisory Committee to the President of the United States, as chair
of the CS division in the Berkeley EECS department, as chair of the Computing Research
Association, and as President of ACM. This record led to Distinguished Service Awards from
ACM, CRA, and SIGARCH. He is currently Vice-Chair of the Board of Directors of the RISC-V
Foundation.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI
reduced instruction set computer, and the foundation of the commercial SPARC architecture.
He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led
to dependable storage systems from many companies. He was also involved in the Network
of Workstations (NOW) project, which led to cluster technology used by Internet companies
and later to cloud computing. His current interests are in designing domain-specific architectures
for machine learning, spreading the word on the open RISC-V instruction set architecture,
and in helping the UC Berkeley RISELab (Real-time Intelligent Secure Execution).
Computer Architecture
A Quantitative Approach

Sixth Edition

John L. Hennessy
Stanford University

David A. Patterson
University of California, Berkeley
With Contributions by

Krste Asanovic
University of California, Berkeley
Jason D. Bakos
University of South Carolina
Robert P. Colwell
R&E Colwell & Assoc. Inc.
Abhishek Bhattacharjee
Rutgers University
Thomas M. Conte
Georgia Tech
Jose Duato
Proemisa
Diana Franklin
University of Chicago
David Goldberg
eBay

Norman P. Jouppi
Google
Sheng Li
Intel Labs
Naveen Muralimanohar
HP Labs
Gregory D. Peterson
University of Tennessee
Timothy M. Pinkston
University of Southern California
Parthasarathy Ranganathan
Google
David A. Wood
University of Wisconsin-Madison
Cliff Young
Google
Amr Zaky
University of Santa Clara
MORGAN KAUFMANN PUBLISHERS

AN IMPRINT OF ELSEVIER
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

© 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing
from the publisher. Details on how to seek permission, further information about the Publisher's permissions
policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright
Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than
as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they
should be mindful of their own safety and the safety of others, including parties for whom they have a professional
responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for
any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from
any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-811905-1

For information on all Morgan Kaufmann publications
visit our website at https://www.elsevier.com/books-and-journals
Working together to grow libraries in developing countries

www.elsevier.com • www.bookaid.org
Publisher: Katey Birtcher
Acquisition Editor: Stephen Merken
Developmental Editor: Nate McFadden
Production Project Manager: Stalin Viswanathan
Cover Designer: Christian J. Bilbow

Typeset by SPi Global, India
To Andrea, Linda, and our four sons
Foreword

by Norman P. Jouppi, Google
Much of the improvement in computer performance over the last 40 years has been
provided by computer architecture advancements that have leveraged Moore's
Law and Dennard scaling to build larger and more parallel systems. Moore's
Law is the observation that the maximum number of transistors in an integrated
circuit doubles approximately every two years. Dennard scaling refers to the reduction
of MOS supply voltage in concert with the scaling of feature sizes, so that as
transistors get smaller, their power density stays roughly constant. With the end of
Dennard scaling a decade ago, and the recent slowdown of Moore's Law due to a
combination of physical limitations and economic factors, the sixth edition of the
preeminent textbook for our field couldn't be more timely. Here are some reasons.

First, because domain-specific architectures can provide equivalent performance
and power benefits of three or more historical generations of Moore's
Law and Dennard scaling, they now can provide better implementations than
may ever be possible with future scaling of general-purpose architectures. And
with the diverse application space of computers today, there are many potential
areas for architectural innovation with domain-specific architectures. Second,
high-quality implementations of open-source architectures now have a much longer
lifetime due to the slowdown in Moore's Law. This gives them more opportunities
for continued optimization and refinement, and hence makes them more
attractive. Third, with the slowing of Moore's Law, different technology components
have been scaling heterogeneously. Furthermore, new technologies such as
2.5D stacking, new nonvolatile memories, and optical interconnects have been
developed to provide more than Moore's Law can supply alone. To use these
new technologies and nonhomogeneous scaling effectively, fundamental design
decisions need to be reexamined from first principles. Hence it is important for
students, professors, and practitioners in the industry to be skilled in a wide range
of both old and new architectural techniques. All told, I believe this is the most
exciting time in computer architecture since the industrial exploitation of
instruction-level parallelism in microprocessors 25 years ago.

The largest change in this edition is the addition of a new chapter on domain-specific
architectures. It's long been known that customized domain-specific architectures
can have higher performance, lower power, and require less silicon area
than general-purpose processor implementations. However when general-purpose
processors were increasing in single-threaded performance by 40% per year (see
Fig. 1.11), the extra time to market required to develop a custom architecture vs.
using a leading-edge standard microprocessor could cause the custom architecture
to lose much of its advantage. In contrast, today single-core performance is
improving very slowly, meaning that the benefits of custom architectures will
not be made obsolete by general-purpose processors for a very long time, if ever.

Chapter 7 covers several domain-specific architectures. Deep neural networks
have very high computation requirements but lower data precision requirements;
this combination can benefit significantly from custom architectures. Two example
architectures and implementations for deep neural networks are presented: one
optimized for inference and a second optimized for training. Image processing
is another example domain; it also has high computation demands and benefits
from lower-precision data types. Furthermore, since it is often found in mobile
devices, the power savings from custom architectures are also very valuable.
Finally, by nature of their reprogrammability, FPGA-based accelerators can be
used to implement a variety of different domain-specific architectures on a single
device. They also can benefit more irregular applications that are frequently
updated, like accelerating internet search.

Although important concepts of architecture are timeless, this edition has been
thoroughly updated with the latest technology developments, costs, examples, and
references. Keeping pace with recent developments in open-sourced architecture,
the instruction set architecture used in the book has been updated to use the
RISC-V ISA.

On a personal note, after enjoying the privilege of working with John as a graduate
student, I am now enjoying the privilege of working with Dave at Google.
What an amazing duo!
Contents

Foreword                                                                  ix
Preface                                                                 xvii
Acknowledgments                                                          xxv

Chapter 1  Fundamentals of Quantitative Design and Analysis
     1.1   Introduction                                                    2
     1.2   Classes of Computers                                            6
     1.3   Defining Computer Architecture                                 11
     1.4   Trends in Technology                                           18
     1.5   Trends in Power and Energy in Integrated Circuits              23
     1.6   Trends in Cost                                                 29
     1.7   Dependability                                                  36
     1.8   Measuring, Reporting, and Summarizing Performance              39
     1.9   Quantitative Principles of Computer Design                     48
     1.10  Putting It All Together: Performance, Price, and Power         55
     1.11  Fallacies and Pitfalls                                         58
     1.12  Concluding Remarks                                             64
     1.13  Historical Perspectives and References                         67
           Case Studies and Exercises by Diana Franklin                   67

Chapter 2  Memory Hierarchy Design
     2.1   Introduction                                                   78
     2.2   Memory Technology and Optimizations                            84
     2.3   Ten Advanced Optimizations of Cache Performance                94
     2.4   Virtual Memory and Virtual Machines                           118
     2.5   Cross-Cutting Issues: The Design of Memory Hierarchies        126
     2.6   Putting It All Together: Memory Hierarchies in the
           ARM Cortex-A53 and Intel Core i7 6700                         129
     2.7   Fallacies and Pitfalls                                        142
     2.8   Concluding Remarks: Looking Ahead                             146
     2.9   Historical Perspectives and References                        148
           Case Studies and Exercises by Norman P. Jouppi, Rajeev
           Balasubramonian, Naveen Muralimanohar, and Sheng Li           148

Chapter 3  Instruction-Level Parallelism and Its Exploitation
     3.1   Instruction-Level Parallelism: Concepts and Challenges        168
     3.2   Basic Compiler Techniques for Exposing ILP                    176
     3.3   Reducing Branch Costs With Advanced Branch Prediction         182
     3.4   Overcoming Data Hazards With Dynamic Scheduling               191
     3.5   Dynamic Scheduling: Examples and the Algorithm                201
     3.6   Hardware-Based Speculation                                    208
     3.7   Exploiting ILP Using Multiple Issue and Static Scheduling     218
     3.8   Exploiting ILP Using Dynamic Scheduling, Multiple Issue,
           and Speculation                                               222
     3.9   Advanced Techniques for Instruction Delivery and Speculation  228
     3.10  Cross-Cutting Issues                                          240
     3.11  Multithreading: Exploiting Thread-Level Parallelism to
           Improve Uniprocessor Throughput                               242
     3.12  Putting It All Together: The Intel Core i7 6700 and
           ARM Cortex-A53                                                247
     3.13  Fallacies and Pitfalls                                        258
     3.14  Concluding Remarks: What's Ahead?                             264
     3.15  Historical Perspective and References                         266
           Case Studies and Exercises by Jason D. Bakos and
           Robert P. Colwell                                             266
Chapter 4  Data-Level Parallelism in Vector, SIMD, and GPU Architectures
     4.1   Introduction                                                  282
     4.2   Vector Architecture                                           283
     4.3   SIMD Instruction Set Extensions for Multimedia                304
     4.4   Graphics Processing Units                                     310
     4.5   Detecting and Enhancing Loop-Level Parallelism                336
     4.6   Cross-Cutting Issues                                          345
     4.7   Putting It All Together: Embedded Versus Server GPUs and
           Tesla Versus Core i7                                          346
     4.8   Fallacies and Pitfalls                                        353
     4.9   Concluding Remarks                                            357
     4.10  Historical Perspective and References                         357
           Case Study and Exercises by Jason D. Bakos                    357
Chapter 5  Thread-Level Parallelism
     5.1   Introduction                                                  368
     5.2   Centralized Shared-Memory Architectures                       377
     5.3   Performance of Symmetric Shared-Memory Multiprocessors        393
     5.4   Distributed Shared-Memory and Directory-Based Coherence       404
     5.5   Synchronization: The Basics                                   412
     5.6   Models of Memory Consistency: An Introduction                 417
     5.7   Cross-Cutting Issues                                          422
     5.8   Putting It All Together: Multicore Processors and Their
           Performance                                                   426
     5.9   Fallacies and Pitfalls                                        438
     5.10  The Future of Multicore Scaling                               442
     5.11  Concluding Remarks                                            444
     5.12  Historical Perspectives and References                        445
           Case Studies and Exercises by Amr Zaky and David A. Wood      446

Chapter 6  Warehouse-Scale Computers to Exploit Request-Level
           and Data-Level Parallelism
     6.1   Introduction                                                  466
     6.2   Programming Models and Workloads for Warehouse-Scale
           Computers                                                     471
     6.3   Computer Architecture of Warehouse-Scale Computers            477
     6.4   The Efficiency and Cost of Warehouse-Scale Computers          482
     6.5   Cloud Computing: The Return of Utility Computing              490
     6.6   Cross-Cutting Issues                                          501
     6.7   Putting It All Together: A Google Warehouse-Scale Computer    503
     6.8   Fallacies and Pitfalls                                        514
     6.9   Concluding Remarks                                            518
     6.10  Historical Perspectives and References                        519
           Case Studies and Exercises by Parthasarathy Ranganathan       519

Chapter 7  Domain-Specific Architectures
     7.1   Introduction                                                  540
     7.2   Guidelines for DSAs                                           543
     7.3   Example Domain: Deep Neural Networks                          544
     7.4   Google's Tensor Processing Unit, an Inference Data
           Center Accelerator                                            557
     7.5   Microsoft Catapult, a Flexible Data Center Accelerator        567
     7.6   Intel Crest, a Data Center Accelerator for Training           579
     7.7   Pixel Visual Core, a Personal Mobile Device Image
           Processing Unit                                               579
     7.8   Cross-Cutting Issues                                          592
     7.9   Putting It All Together: CPUs Versus GPUs Versus DNN
           Accelerators                                                  595
     7.10  Fallacies and Pitfalls                                        602
     7.11  Concluding Remarks                                            604
     7.12  Historical Perspectives and References                        606
           Case Studies and Exercises by Cliff Young                     606
Appendix A  Instruction Set Principles
     A.1   Introduction                                                  A-2
     A.2   Classifying Instruction Set Architectures                     A-3
     A.3   Memory Addressing                                             A-7
     A.4   Type and Size of Operands                                     A-13
     A.5   Operations in the Instruction Set                             A-15
     A.6   Instructions for Control Flow                                 A-16
     A.7   Encoding an Instruction Set                                   A-21
     A.8   Cross-Cutting Issues: The Role of Compilers                   A-24
     A.9   Putting It All Together: The RISC-V Architecture              A-33
     A.10  Fallacies and Pitfalls                                        A-42
     A.11  Concluding Remarks                                            A-46
     A.12  Historical Perspective and References                         A-47
           Exercises by Gregory D. Peterson                              A-47

Appendix B  Review of Memory Hierarchy
     B.1   Introduction                                                  B-2
     B.2   Cache Performance                                             B-15
     B.3   Six Basic Cache Optimizations                                 B-22
     B.4   Virtual Memory                                                B-40
     B.5   Protection and Examples of Virtual Memory                     B-49
     B.6   Fallacies and Pitfalls                                        B-57
     B.7   Concluding Remarks                                            B-59
     B.8   Historical Perspective and References                         B-59
           Exercises by Amr Zaky                                         B-60

Appendix C  Pipelining: Basic and Intermediate Concepts
     C.1   Introduction                                                  C-2
     C.2   The Major Hurdle of Pipelining-Pipeline Hazards               C-10
     C.3   How Is Pipelining Implemented?                                C-26
     C.4   What Makes Pipelining Hard to Implement?                      C-37
     C.5   Extending the RISC V Integer Pipeline to Handle
           Multicycle Operations                                         C-45
     C.6   Putting It All Together: The MIPS R4000 Pipeline              C-55
     C.7   Cross-Cutting Issues                                          C-65
     C.8   Fallacies and Pitfalls                                        C-70
     C.9   Concluding Remarks                                            C-71
     C.10  Historical Perspective and References                         C-71
           Updated Exercises by Diana Franklin                           C-71
Online Appendices

Appendix D  Storage Systems
Appendix E  Embedded Systems
            by Thomas M. Conte
Appendix F  Interconnection Networks
            by Timothy M. Pinkston and Jose Duato
Appendix G  Vector Processors in More Depth
            by Krste Asanovic
Appendix H  Hardware and Software for VLIW and EPIC
Appendix I  Large-Scale Multiprocessors and Scientific Applications
Appendix J  Computer Arithmetic
            by David Goldberg
Appendix K  Survey of Instruction Set Architectures
Appendix L  Advanced Concepts on Address Translation
            by Abhishek Bhattacharjee
Appendix M  Historical Perspectives and References

References                                                               R-1

Index                                                                    I-1
Preface

Why We Wrote This Book
Through six editions of this book, our goal has been to describe the basic principles
underlying what will be tomorrow's technological developments. Our excitement
about the opportunities in computer architecture has not abated, and we echo what
we said about the field in the first edition: "It is not a dreary science of paper
machines that will never work. No! It's a discipline of keen intellectual interest,
requiring the balance of marketplace forces to cost-performance-power, leading
to glorious failures and some notable successes."

Our primary objective in writing our first book was to change the way people
learn and think about computer architecture. We feel this goal is still valid and
important. The field is changing daily and must be studied with real examples
and measurements on real computers, rather than simply as a collection of definitions
and designs that will never need to be realized. We offer an enthusiastic welcome
to anyone who came along with us in the past, as well as to those who are
joining us now. Either way, we can promise the same quantitative approach to, and
analysis of, real systems.

As with earlier versions, we have strived to produce a new edition that will
continue to be as relevant for professional engineers and architects as it is for those
involved in advanced computer architecture and design courses. Like the first edition,
this edition has a sharp focus on new platforms-personal mobile devices and
warehouse-scale computers-and new architectures-specifically, domain-specific
architectures. As much as its predecessors, this edition aims to demystify
computer architecture through an emphasis on cost-performance-energy trade-offs
and good engineering design. We believe that the field has continued to mature and
move toward the rigorous quantitative foundation of long-established scientific
and engineering disciplines.
This Edition

The ending of Moore's Law and Dennard scaling is having as profound an effect on
computer architecture as did the switch to multicore. We retain the focus on the
extremes in size of computing, with personal mobile devices (PMDs) such as cell
phones and tablets as the clients and warehouse-scale computers offering cloud
computing as the server. We also maintain the other theme of parallelism in all
its forms: data-level parallelism (DLP) in Chapters 1 and 4, instruction-level parallelism
(ILP) in Chapter 3, thread-level parallelism in Chapter 5, and request-level
parallelism (RLP) in Chapter 6.

The most pervasive change in this edition is switching from MIPS to the RISC-V
instruction set. We suspect this modern, modular, open instruction set may
become a significant force in the information technology industry. It may become
as important in computer architecture as Linux is for operating systems.

The newcomer in this edition is Chapter 7, which introduces domain-specific
architectures with several concrete examples from industry.

As before, the first three appendices in the book give basics on the RISC-V
instruction set, memory hierarchy, and pipelining for readers who have not read
a book like Computer Organization and Design. To keep costs down but still supply
supplemental material that is of interest to some readers, available online at
https://www.elsevier.com/books-and-journals/book-companion/9780128119051
are nine more appendices. There are more pages in these appendices than there are
in this book!

This edition continues the tradition of using real-world examples to demonstrate
the ideas, and the "Putting It All Together" sections are brand new. The "Putting It All
Together" sections of this edition include the pipeline organizations and memory hierarchies
of the ARM Cortex A8 processor, the Intel Core i7 processor, the NVIDIA
GTX-280 and GTX-480 GPUs, and one of the Google warehouse-scale computers.
Topic Selection and Organization

As before, we have taken a conservative approach to topic selection, for there are
many more interesting ideas in the field than can reasonably be covered in a treatment
of basic principles. We have steered away from a comprehensive survey of
every architecture a reader might encounter. Instead, our presentation focuses on
core concepts likely to be found in any new machine. The key criterion remains
that of selecting ideas that have been examined and utilized successfully enough
to permit their discussion in quantitative terms.

Our intent has always been to focus on material that is not available in equivalent
form from other sources, so we continue to emphasize advanced content
wherever possible. Indeed, there are several systems here whose descriptions cannot
be found in the literature. (Readers interested strictly in a more basic introduction
to computer architecture should read Computer Organization and Design: The
Hardware/Software Interface.)
An Overview of the Content

Chapter 1 includes formulas for energy, static power, dynamic power, integrated circuit
costs, reliability, and availability. (These formulas are also found on the front
inside cover.) Our hope is that these topics can be used through the rest of the book.
In addition to the classic quantitative principles of computer design and performance
measurement, it shows the slowing of performance improvement of general-purpose
microprocessors, which is one inspiration for domain-specific architectures.

Our view is that the instruction set architecture is playing less of a role today
than in 1990, so we moved this material to Appendix A. It now uses the RISC-V
architecture. (For quick review, a summary of the RISC-V ISA can be found on the
back inside cover.) For fans of ISAs, Appendix K was revised for this edition and
covers 8 RISC architectures (5 for desktop and server use and 3 for embedded use),
the 80x86, the DEC VAX, and the IBM 360/370.

We then move onto memory hierarchy in Chapter 2, since it is easy to apply the
cost-performance-energy principles to this material, and memory is a critical
resource for the rest of the chapters. As in the past edition, Appendix B contains
an introductory review of cache principles, which is available in case you need it.
Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes
virtual machines, which offer advantages in protection, software management,
and hardware management, and play an important role in cloud computing. In
addition to covering SRAM and DRAM technologies, the chapter includes new
material both on Flash memory and on the use of stacked die packaging for extending
the memory hierarchy. The PIAT examples are the ARM Cortex-A53, which is
used in PMDs, and the Intel Core i7, which is used in servers.

Chapter 3 covers the exploitation of instruction-level parallelism in high-