Reference 41

PATENT OWNER DIRECTSTREAM, LLC
EX. 2153, p. 1

[Cover: HENNESSY • PATTERSON — Computer Architecture: A Quantitative Approach]

In Praise of Computer Architecture: A Quantitative Approach
Fifth Edition

"The 5th edition of Computer Architecture: A Quantitative Approach continues the legacy, providing students of computer architecture with the most up-to-date information on current computing platforms, and architectural insights to help them design future systems. A highlight of the new edition is the significantly revised chapter on data-level parallelism, which demystifies GPU architectures with clear explanations using traditional computer architecture terminology."

—Krste Asanović, University of California, Berkeley

"Computer Architecture: A Quantitative Approach is a classic that, like fine wine, just keeps getting better. I bought my first copy as I finished up my undergraduate degree and it remains one of my most frequently referenced texts today. When the fourth edition came out, there was so much new material that I needed to get it to stay current in the field. And, as I review the fifth edition, I realize that Hennessy and Patterson have done it again. The entire text is heavily updated and Chapter 6 alone makes this new edition required reading for those wanting to really understand cloud and warehouse-scale computing. Only Hennessy and Patterson have access to the insiders at Google, Amazon, Microsoft, and other cloud computing and internet-scale application providers, and there is no better coverage of this important area anywhere in the industry."

—James Hamilton, Amazon Web Services

"Hennessy and Patterson wrote the first edition of this book when graduate students built computers with 50,000 transistors. Today, warehouse-size computers contain that many servers, each consisting of dozens of independent processors and billions of transistors. The evolution of computer architecture has been rapid and relentless, but Computer Architecture: A Quantitative Approach has kept pace, with each edition accurately explaining and analyzing the important emerging ideas that make this field so exciting."

—James Larus, Microsoft Research

"This new edition adds a superb new chapter on data-level parallelism in vector, SIMD, and GPU architectures. It explains key architecture concepts inside mass-market GPUs, maps them to traditional terms, and compares them with vector and SIMD architectures. It's timely and relevant with the widespread shift to GPU parallel computing. Computer Architecture: A Quantitative Approach furthers its string of firsts in presenting comprehensive architecture coverage of significant new developments!"

—John Nickolls, NVIDIA

"The new edition of this now classic textbook highlights the ascendance of explicit parallelism (data, thread, request) by devoting a whole chapter to each type. The chapter on data parallelism is particularly illuminating: the comparison and contrast between vector SIMD, instruction-level SIMD, and GPU cuts through the jargon associated with each architecture and exposes the similarities and differences between these architectures."

—Kunle Olukotun, Stanford University

"The fifth edition of Computer Architecture: A Quantitative Approach explores the various parallel concepts and their respective tradeoffs. As with the previous editions, this new edition covers the latest technology trends. Two highlights are the explosive growth of Personal Mobile Devices (PMD) and Warehouse Scale Computing (WSC)—where the focus has shifted towards a more sophisticated balance of performance and energy efficiency as compared with raw performance. These trends are fueling our demand for ever more processing capability, which in turn is moving us further down the parallel path."

—Andrew N. Sloss, Consultant Engineer, ARM
Author of ARM System Developer's Guide

Computer Architecture
A Quantitative Approach

Fifth Edition

John L. Hennessy is the tenth president of Stanford University, where he has been a member of the faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering, the National Academy of Science, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received seven honorary doctorates.

In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the first commercial RISC microprocessors. As of 2006, over 2 billion MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been adopted in modern multiprocessors. In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups, both as an early-stage advisor and an investor.

David A. Patterson has been teaching computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer Science. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM and CRA.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer, and the foundation of the commercial SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies and later to cloud computing. These projects earned three dissertation awards from ACM. His current research projects are the Algorithm-Machine-People Laboratory and the Parallel Computing Laboratory, where he is director. The goal of the AMP Lab is to develop scalable machine learning algorithms, warehouse-scale-computer-friendly programming models, and crowd-sourcing tools to gain valuable insights quickly from big data in the cloud. The goal of the Par Lab is to develop technologies to deliver scalable, portable, efficient, and productive software for parallel personal mobile devices.

Computer Architecture
A Quantitative Approach

Fifth Edition

John L. Hennessy
Stanford University

David A. Patterson
University of California, Berkeley

With Contributions by

Krste Asanović
University of California, Berkeley
Jason D. Bakos
University of South Carolina
Robert P. Colwell
R&E Colwell & Assoc. Inc.
Thomas M. Conte
North Carolina State University
José Duato
Universitat Politècnica de València and Simula
Diana Franklin
University of California, Santa Barbara
David Goldberg
The Scripps Research Institute
Norman P. Jouppi
HP Labs
Sheng Li
HP Labs
Naveen Muralimanohar
HP Labs
Gregory D. Peterson
University of Tennessee
Timothy M. Pinkston
University of Southern California
Parthasarathy Ranganathan
HP Labs
David A. Wood
University of Wisconsin–Madison
Amr Zaky
University of Santa Clara

Amsterdam • Boston • Heidelberg • London
New York • Oxford • Paris • San Diego
San Francisco • Singapore • Sydney • Tokyo

Acquiring Editor: Todd Green
Development Editor: Nate McFadden
Project Manager: Paul Gottehrer
Designer: Joanne Blank

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Elsevier, Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-383872-8

For information on all MK publications visit our website at www.mkp.com

Printed in the United States of America
11 12 13 14 15   10 9 8 7 6 5 4 3 2 1

Typeset by: diacriTech, Chennai, India

To Andrea, Linda, and our four sons

Foreword

by Luiz André Barroso, Google Inc.

The first edition of Hennessy and Patterson's Computer Architecture: A Quantitative Approach was released during my first year in graduate school. I belong, therefore, to that first wave of professionals who learned about our discipline using this book as a compass. Perspective being a fundamental ingredient to a useful Foreword, I find myself at a disadvantage given how much of my own views have been colored by the previous four editions of this book. Another obstacle to clear perspective is that the student-grade reverence for these two superstars of Computer Science has not yet left me, despite (or perhaps because of) having had the chance to get to know them in the years since. These disadvantages are mitigated by my having practiced this trade continuously since this book's first edition, which has given me a chance to enjoy its evolution and enduring relevance.

The last edition arrived just two years after the rampant industrial race for higher CPU clock frequency had come to its official end, with Intel cancelling its 4 GHz single-core developments and embracing multicore CPUs. Two years was plenty of time for John and Dave to present this story not as a random product line update, but as a defining computing technology inflection point of the last decade. That fourth edition had a reduced emphasis on instruction-level parallelism (ILP) in favor of added material on thread-level parallelism, something the current edition takes even further by devoting two chapters to thread- and data-level parallelism while limiting ILP discussion to a single chapter. Readers who are being introduced to new graphics processing engines will benefit especially from the new Chapter 4, which focuses on data parallelism, explaining the different but slowly converging solutions offered by multimedia extensions in general-purpose processors and increasingly programmable graphics processing units. Of notable practical relevance: if you have ever struggled with CUDA terminology, check out Figure 4.24 (teaser: "Shared Memory" is really local, while "Global Memory" is closer to what you'd consider shared memory).

Even though we are still in the middle of that multicore technology shift, this edition embraces what appears to be the next major one: cloud computing. In this case, the ubiquity of Internet connectivity and the evolution of compelling Web services are bringing to the spotlight very small devices (smart phones, tablets) and very large ones (warehouse-scale computing systems). The ARM Cortex A8, a popular CPU for smart phones, appears in Chapter 3's "Putting It All Together" section, and a whole new Chapter 6 is devoted to request- and data-level parallelism in the context of warehouse-scale computing systems. In this new chapter, John and Dave present these new massive clusters as a distinctively new class of computers—an open invitation for computer architects to help shape this emerging field. Readers will appreciate how this area has evolved in the last decade by comparing the Google cluster architecture described in the third edition with the more modern incarnation presented in this version's Chapter 6.

Return customers of this book will appreciate once again the work of two outstanding computer scientists who over their careers have perfected the art of combining an academic's principled treatment of ideas with a deep understanding of leading-edge industrial products and technologies. The authors' success in industrial interactions won't be a surprise to those who have witnessed how Dave conducts his biannual project retreats, forums meticulously crafted to extract the most out of academic–industrial collaborations. Those who recall John's entrepreneurial success with MIPS or bump into him in a Google hallway (as I occasionally do) won't be surprised by it either.

Perhaps most importantly, return and new readers alike will get their money's worth. What has made this book an enduring classic is that each edition is not an update but an extensive revision that presents the most current information and unparalleled insight into this fascinating and quickly changing field. For me, after over twenty years in this profession, it is also another opportunity to experience that student-grade admiration for two remarkable teachers.

Contents

Foreword
Preface
Acknowledgments

Chapter 1  Fundamentals of Quantitative Design and Analysis
  1.1  Introduction
  1.2  Classes of Computers
  1.3  Defining Computer Architecture
  1.4  Trends in Technology
  1.5  Trends in Power and Energy in Integrated Circuits
  1.6  Trends in Cost
  1.7  Dependability
  1.8  Measuring, Reporting, and Summarizing Performance
  1.9  Quantitative Principles of Computer Design
  1.10 Putting It All Together: Performance, Price, and Power
  1.11 Fallacies and Pitfalls
  1.12 Concluding Remarks
  1.13 Historical Perspectives and References
       Case Studies and Exercises by Diana Franklin

Chapter 2  Memory Hierarchy Design
  2.1  Introduction
  2.2  Ten Advanced Optimizations of Cache Performance
  2.3  Memory Technology and Optimizations
  2.4  Protection: Virtual Memory and Virtual Machines
  2.5  Crosscutting Issues: The Design of Memory Hierarchies
  2.6  Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7
  2.7  Fallacies and Pitfalls
  2.8  Concluding Remarks: Looking Ahead
  2.9  Historical Perspective and References
       Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li

Chapter 3  Instruction-Level Parallelism and Its Exploitation
  3.1  Instruction-Level Parallelism: Concepts and Challenges
  3.2  Basic Compiler Techniques for Exposing ILP
  3.3  Reducing Branch Costs with Advanced Branch Prediction
  3.4  Overcoming Data Hazards with Dynamic Scheduling
  3.5  Dynamic Scheduling: Examples and the Algorithm
  3.6  Hardware-Based Speculation
  3.7  Exploiting ILP Using Multiple Issue and Static Scheduling
  3.8  Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
  3.9  Advanced Techniques for Instruction Delivery and Speculation
  3.10 Studies of the Limitations of ILP
  3.11 Cross-Cutting Issues: ILP Approaches and the Memory System
  3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
  3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8
  3.14 Fallacies and Pitfalls
  3.15 Concluding Remarks: What's Ahead?
  3.16 Historical Perspective and References
       Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell

Chapter 4  Data-Level Parallelism in Vector, SIMD, and GPU Architectures
  4.1  Introduction
  4.2  Vector Architecture
  4.3  SIMD Instruction Set Extensions for Multimedia
  4.4  Graphics Processing Units
  4.5  Detecting and Enhancing Loop-Level Parallelism
  4.6  Crosscutting Issues
  4.7  Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7
  4.8  Fallacies and Pitfalls
  4.9  Concluding Remarks
  4.10 Historical Perspective and References
       Case Study and Exercises by Jason D. Bakos

Chapter 5  Thread-Level Parallelism
  5.1  Introduction
  5.2  Centralized Shared-Memory Architectures
  5.3  Performance of Symmetric Shared-Memory Multiprocessors
  5.4  Distributed Shared-Memory and Directory-Based Coherence
  5.5  Synchronization: The Basics
  5.6  Models of Memory Consistency: An Introduction
  5.7  Crosscutting Issues
  5.8  Putting It All Together: Multicore Processors and Their Performance
  5.9  Fallacies and Pitfalls
  5.10 Concluding Remarks
  5.11 Historical Perspectives and References
       Case Studies and Exercises by Amr Zaky and David A. Wood

Chapter 6  Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
  6.1  Introduction
  6.2  Programming Models and Workloads for Warehouse-Scale Computers
  6.3  Computer Architecture of Warehouse-Scale Computers
  6.4  Physical Infrastructure and Costs of Warehouse-Scale Computers
  6.5  Cloud Computing: The Return of Utility Computing
  6.6  Crosscutting Issues
  6.7  Putting It All Together: A Google Warehouse-Scale Computer
  6.8  Fallacies and Pitfalls
  6.9  Concluding Remarks
  6.10 Historical Perspectives and References
       Case Studies and Exercises by Parthasarathy Ranganathan

Appendix A  Instruction Set Principles
  A.1  Introduction
  A.2  Classifying Instruction Set Architectures
  A.3  Memory Addressing
  A.4  Type and Size of Operands
  A.5  Operations in the Instruction Set
  A.6  Instructions for Control Flow
  A.7  Encoding an Instruction Set
  A.8  Crosscutting Issues: The Role of Compilers
  A.9  Putting It All Together: The MIPS Architecture
  A.10 Fallacies and Pitfalls
  A.11 Concluding Remarks
  A.12 Historical Perspective and References
       Exercises by Gregory D. Peterson

Appendix B  Review of Memory Hierarchy
  B.1  Introduction
  B.2  Cache Performance
  B.3  Six Basic Cache Optimizations
  B.4  Virtual Memory
  B.5  Protection and Examples of Virtual Memory
  B.6  Fallacies and Pitfalls
  B.7  Concluding Remarks
  B.8  Historical Perspective and References
       Exercises by Amr Zaky

Appendix C  Pipelining: Basic and Intermediate Concepts
  C.1  Introduction
  C.2  The Major Hurdle of Pipelining—Pipeline Hazards
  C.3  How Is Pipelining Implemented?
  C.4  What Makes Pipelining Hard to Implement?
  C.5  Extending the MIPS Pipeline to Handle Multicycle Operations
  C.6  Putting It All Together: The MIPS R4000 Pipeline
  C.7  Crosscutting Issues
  C.8  Fallacies and Pitfalls
  C.9  Concluding Remarks
  C.10 Historical Perspective and References
       Updated Exercises by Diana Franklin

Online Appendices
Appendix D  Storage Systems
Appendix E  Embedded Systems, by Thomas M. Conte
Appendix F  Interconnection Networks, revised by Timothy M. Pinkston and José Duato
Appendix G  Vector Processors in More Depth, revised by Krste Asanović
Appendix H  Hardware and Software for VLIW and EPIC
Appendix I  Large-Scale Multiprocessors and Scientific Applications
Appendix J  Computer Arithmetic, by David Goldberg
Appendix K  Survey of Instruction Set Architectures
Appendix L  Historical Perspectives and References

References
Index

Preface

Why We Wrote This Book

Through five editions of this book, our goal has been to describe the basic principles underlying what will be tomorrow's technological developments. Our excitement about the opportunities in computer architecture has not abated, and we echo what we said about the field in the first edition: "It is not a dreary science of paper machines that will never work. No! It's a discipline of keen intellectual interest, requiring the balance of marketplace forces to cost-performance-power, leading to glorious failures and some notable successes."

Our primary objective in writing our first book was to change the way people learn and think about computer architecture. We feel this goal is still valid and important. The field is changing daily and must be studied with real examples and measurements on real computers, rather than simply as a collection of definitions and designs that will never need to be realized. We offer an enthusiastic welcome to anyone who came along with us in the past, as well as to those who are joining us now. Either way, we can promise the same quantitative approach to, and analysis of, real systems.

As with earlier versions, we have strived to produce a new edition that will continue to be as relevant for professional engineers and architects as it is for those involved in advanced computer architecture and design courses. Like the first edition, this edition has a sharp focus on new platforms—personal mobile devices and warehouse-scale computers—and new architectures—multicore and GPUs. As much as its predecessors, this edition aims to demystify computer architecture through an emphasis on cost-performance-energy trade-offs and good engineering design. We believe that the field has continued to mature and move toward the rigorous quantitative foundation of long-established scientific and engineering disciplines.

This Edition

We said the fourth edition of Computer Architecture: A Quantitative Approach may have been the most significant since the first edition due to the switch to multicore chips. The feedback we received this time was that the book had lost the sharp focus of the first edition, covering everything equally but without emphasis and context. We're pretty sure that won't be said about the fifth edition.

We believe most of the excitement is at the extremes in size of computing, with personal mobile devices (PMDs) such as cell phones and tablets as the clients and warehouse-scale computers offering cloud computing as the server. (Observant readers may have seen the hint for cloud computing on the cover.) We are struck by the common theme of these two extremes in cost, performance, and energy efficiency despite their difference in size. As a result, the running context through each chapter is computing for PMDs and for warehouse-scale computers, and Chapter 6 is a brand-new chapter on the latter topic.

The other theme is parallelism in all its forms. We first identify the two types of application-level parallelism in Chapter 1: data-level parallelism (DLP), which arises because there are many data items that can be operated on at the same time, and task-level parallelism (TLP), which arises because tasks of work are created that can operate independently and largely in parallel. We then explain the four architectural styles that exploit DLP and TLP: instruction-level parallelism (ILP) in Chapter 3; vector architectures and graphics processor units (GPUs) in Chapter 4, which is a brand-new chapter for this edition; thread-level parallelism in Chapter 5; and request-level parallelism (RLP) via warehouse-scale computers in Chapter 6, which is also a brand-new chapter for this edition. We moved memory hierarchy earlier in the book to Chapter 2, and we moved the storage systems chapter to Appendix D. We are particularly proud about Chapter 4, which contains the most detailed and clearest explanation of GPUs yet, and Chapter 6, which is the first publication of the most recent details of a Google warehouse-scale computer.

As before, the first three appendices in the book give basics on the MIPS instruction set, memory hierarchy, and pipelining for readers who have not read a book like Computer Organization and Design. To keep costs down but still supply supplemental material that is of interest to some readers, nine more appendices are available online at http://booksite.mkp.com/9780123838728/. There are more pages in these appendices than there are in this book!

This edition continues the tradition of using real-world examples to demonstrate the ideas, and the "Putting It All Together" sections are brand new. The "Putting It All Together" sections of this edition include the pipeline organizations and memory hierarchies of the ARM Cortex A8 processor, the Intel Core i7 processor, the NVIDIA GTX-280 and GTX-480 GPUs, and one of the Google warehouse-scale computers.

Topic Selection and Organization

As before, we have taken a conservative approach to topic selection, for there are many more interesting ideas in the field than can reasonably be covered in a treatment of basic principles. We have steered away from a comprehensive survey of every architecture a reader might encounter. Instead, our presentation focuses on core concepts likely to be found in any new machine. The key criterion remains that of selecting ideas that have been examined and utilized successfully enough to permit their discussion in quantitative terms.

Our intent has always been to focus on material that is not available in equivalent form from other sources, so we continue to emphasize advanced content wherever possible. Indeed, there are several systems here whose descriptions cannot be found in the literature. (Readers interested strictly in a more basic introduction to computer architecture should read Computer Organization and Design: The Hardware/Software Interface.)

An Overview of the Content

Chapter 1 has been beefed up in this edition. It includes formulas for energy, static power, dynamic power, integrated circuit costs, reliability, and availability. (These formulas are also found on the front inside cover.) Our hope is that these topics can be used through the rest of the book. In addition to the classic quantitative principles of computer design and performance measurement, the PIAT section has been upgraded to use the new SPECPower benchmark.

Our view is that the instruction set architecture is playing less of a role today than in 1990, so we moved this material to Appendix A. It still uses the MIPS64 architecture. (For quick review, a summary of the MIPS ISA can be found on the back inside cover.) For fans of ISAs, Appendix K covers 10 RISC architectures, the 80x86, the DEC VAX, and the IBM 360/370.

We then move on to memory hierarchy in Chapter 2, since it is easy to apply the cost-performance-energy principles to this material and memory is a critical resource for the rest of the chapters. As in the past edition, Appendix B contains an introductory review of cache principles, which is available in case you need it. Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes virtual machines, which offer advantages in protection, software management, and hardware management and play an important role in cloud computing. In addition to covering SRAM and DRAM technologies, the chapter includes new material on Flash memory. The PIAT examples are the ARM Cortex A8, which is used in PMDs, and the Intel Core i7, which is used in servers.

Chapter 3 covers the exploitation of instruction-level parallelism in high-performance processors, including superscalar execution, branch prediction, speculation, dynamic scheduling, and multithreading. As mentioned earlier, Appendix C is a review of pipelining in case you need it. Chapter 3 also surveys the limits of ILP. Like Chapter 2, the PIAT examples are again the ARM Cortex A8 and the Intel Core i7. While the third edition contained a great deal on Itanium and VLIW, this material is now in Appendix H, indicating our view that this architecture did not live up to the earlier claims.

The increasing importance of multimedia applications such as games and video processing has also increased the importance of architectures that can exploit data-level parallelism. In particular, there is a rising interest in computing using graphical processing units (GPUs), yet few architects understand how GPUs really work. We decided to write a new chapter in large part to unveil this new style of computer architecture. Chapter 4 starts with an introduction to vector architectures, which acts as a foundation on which to build explanations of multimedia SIMD instruction set extensions and GPUs. (Appendix G goes into even more depth on vector architectures.) The section on GPUs was the most difficult to write in this book, in that it took many iterations to get an accurate description that was also easy to understand. A significant challenge was the terminology. We decided to go with our own terms and then provide a translation between our terms and the official NVIDIA terms. (A copy of that table can be found in the back inside cover pages.) This chapter introduces the Roofline performance model and then uses it to compare the Intel Core i7 and the NVIDIA GTX 280 and GTX 480 GPUs. The chapter also describes the Tegra 2 GPU for PMDs.

Chapter 5 describes multicore processors. It explores symmetric and distributed-memory architectures, examining both organizational principles and performance. Topics in synchronization and memory consistency models are next. The example is the Intel Core i7. Readers interested in interconnection networks on a chip should read Appendix F, and those interested in larger scale multiprocessors and scientific applications should read Appendix I.

As mentioned earlier, Chapter 6 describes the newest topic in computer architecture, warehouse-scale computers (WSCs). Based on help from engineers at Amazon Web Services and Google, this chapter integrates details on design, cost, an
