`
`Reference 32
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2144, p. 1
`
`
`
`COMPUTER
`ORGANIZATION
`AND DESIGN
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`FIFTH EDITION
`
`DAVID A. PATTERSON
`JOHN L. HENNESSY
`
MORGAN KAUFMANN
`
`
`
`
In Praise of Computer Organization and Design: The Hardware/Software Interface, Fifth Edition
`
“Textbook selection is often a frustrating act of compromise—pedagogy, content
`coverage, quality of exposition, level of rigor, cost. Computer Organization and
`Design is the rare book that hits all the right notes across the board, without
`compromise. It is not only the premier computer organization textbook, it is a
`shining example of what all computer science textbooks could and should be.”
` —Michael Goldweber, Xavier University
`
“I have been using Computer Organization and Design for years, from the very
first edition. The new Fifth Edition is yet another outstanding improvement on an
already classic text. The evolution from desktop computing to mobile computing
to Big Data brings new coverage of embedded processors such as the ARM, new
material on how software and hardware interact to increase performance, and
cloud computing. All this without sacrificing the fundamentals.”
` —Ed Harcourt, St. Lawrence University
`
` “To Millennials: Computer Organization and Design is the computer architecture
book you should keep on your (virtual) bookshelf. The book is both old and new,
`because it develops venerable principles—Moore's Law, abstraction, common case
`fast, redundancy, memory hierarchies, parallelism, and pipelining—but illustrates
`them with contemporary designs, e.g., ARM Cortex A8 and Intel Core i7.”
` —Mark D. Hill, University of Wisconsin-Madison

“The new edition of Computer Organization and Design keeps pace with advances
in emerging embedded and many-core (GPU) systems, where tablets and
smartphones are quickly becoming our new desktops. This text acknowledges
these changes, but continues to provide a rich foundation of the fundamentals
in computer organization and design which will be needed for the designers of
hardware and software that power this new class of devices and systems.”
` —Dave Kaeli, Northeastern University

“The Fifth Edition of Computer Organization and Design provides more than an
introduction to computer architecture. It prepares the reader for the changes necessary
to meet the ever-increasing performance needs of mobile systems and big data
processing at a time when difficulties in semiconductor scaling are making all systems
power constrained. In this new era for computing, hardware and software must be
co-designed and system-level architecture is as critical as component-level optimizations.”
` —Christos Kozyrakis, Stanford University
`
“Patterson and Hennessy brilliantly address the issues in ever-changing computer
hardware architectures, emphasizing interactions among hardware and software
components at various abstraction levels. By interspersing I/O and parallelism concepts
with a variety of mechanisms in hardware and software throughout the book, the new
edition achieves an excellent holistic presentation of computer architecture for the
PostPC era. This book is an essential guide for hardware and software professionals
facing energy efficiency and parallelization challenges, from Tablet PCs to cloud computing.”
` —Jae C. Oh, Syracuse University
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2144, p. 3
`
`
`
`
`
`
`
FIFTH EDITION

Computer Organization and Design

THE HARDWARE/SOFTWARE INTERFACE
`
`
`
`
David A. Patterson has been teaching computer architecture at the University of
California, Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair
of Computer Science. His teaching has been honored by the Distinguished Teaching
Award from the University of California, the Karlstrom Award from ACM, and the
Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson
received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award
for contributions to RISC, and he shared the IEEE Johnson Information Storage Award
for contributions to RAID. He also shared the IEEE John von Neumann Medal and
the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the
American Academy of Arts and Sciences, the Computer History Museum, ACM,
and IEEE, and he was elected to the National Academy of Engineering, the National
Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on
the Information Technology Advisory Committee to the U.S. President, as chair of the
CS division in the Berkeley EECS department, as chair of the Computing Research
Association, and as President of ACM. This record led to Distinguished Service Awards
from ACM and CRA.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first
VLSI reduced instruction set computer, and the foundation of the commercial
SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks
(RAID) project, which led to dependable storage systems from many companies.
He was also involved in the Network of Workstations (NOW) project, which led to
cluster technology used by Internet companies and later to cloud computing. These
projects earned three dissertation awards from ACM. His current research projects
are Algorithms, Machines, and People and Algorithms and Specializers for Provably
Optimal Implementations with Resilience and Efficiency. The AMP Lab is developing
scalable machine learning algorithms, warehouse-scale-computer-friendly programming
models, and crowd-sourcing tools to gain valuable insights quickly from big data in
the cloud. The ASPIRE Lab uses deep hardware and software co-tuning to achieve the
highest possible performance and energy efficiency for mobile and rack computing
systems.
John L. Hennessy is the tenth president of Stanford University, where he has been
a member of the faculty since 1977 in the departments of electrical engineering and
computer science. Hennessy is a Fellow of the IEEE and ACM; a member of the
National Academy of Engineering, the National Academy of Sciences, and the American
Philosophical Society; and a Fellow of the American Academy of Arts and Sciences.
Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to
RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000
John von Neumann Award, which he shared with David Patterson. He has also received
seven honorary doctorates.
In 1981, he started the MIPS project at Stanford with a handful of graduate students.
After completing the project in 1984, he took a leave from the university to cofound
MIPS Computer Systems (now MIPS Technologies), which developed one of the first
commercial RISC microprocessors. As of 2006, over 2 billion MIPS microprocessors had
been shipped in devices ranging from video games and palmtop computers to laser printers
and network switches. Hennessy subsequently led the DASH (Directory Architecture
for Shared Memory) project, which prototyped the first scalable cache coherent
multiprocessor; many of the key ideas have been adopted in modern multiprocessors.
In addition to his technical activities and university responsibilities, he has continued to
work with numerous start-ups, both as an early-stage advisor and an investor.
`
`
`
`
FIFTH EDITION

Computer Organization and Design

THE HARDWARE/SOFTWARE INTERFACE
`
` David A. Patterson
` University of California, Berkeley
`
` John L. Hennessy
` Stanford University
`
` With contributions by
`Perry Alexander
The University of Kansas
`Peter J. Ashenden
`Ashenden Designs Pty Ltd
`Jason D. Bakos
`University of South Carolina
`Javier Bruguera
`Universidade de Santiago de Compostela
`Jichuan Chang
`Hewlett-Packard
`Matthew Farrens
`University of California, Davis
`
`David Kaeli
`Northeastern University
`Nicole Kaiyan
`University of Adelaide
`David Kirk
`NVIDIA
`James R. Larus
`School of Computer and
`Communications Science at EPFL
`Jacob Leverich
`Hewlett-Packard
`
`Kevin Lim
`Hewlett-Packard
`John Nickolls
`NVIDIA
`John Oliver
`Cal Poly, San Luis Obispo
`Milos Prvulovic
`Georgia Tech
`Partha Ranganathan
`Hewlett-Packard
`
` AMSTERDAM • BOSTON • HEIDELBERG • LONDON
`NEW YORK • OXFORD • PARIS • SAN DIEGO
`SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
` Morgan Kaufmann is an imprint of Elsevier
`
`
`
`
` Acquiring Editor: Todd Green
` Development Editor: Nate McFadden
` Project Manager: Lisa Jones
` Designer: Russell Purdy
` Morgan Kaufmann is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB
` 225 Wyman Street, Waltham, MA 02451, USA
` Copyright © 2014 Elsevier Inc. All rights reserved
` No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
`photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how
`to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the
`Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted
herein).
`
` Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in
research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience
and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the publisher nor the authors, contributors, or editors assume any liability for any injury and/or
damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods,
products, instructions, or ideas contained in the material herein.
` Library of Congress Cataloging-in-Publication Data
` Patterson, David A.
Computer organization and design: the hardware/software interface / David A. Patterson, John L. Hennessy. — 5th ed.
p. cm. — (The Morgan Kaufmann series in computer architecture and design)
`
`
` Rev. ed. of: Computer organization and design/John L. Hennessy, David A. Patterson. 1998.
` Summary: “Presents the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies
`and I/O”— Provided by publisher.
` ISBN 978-0-12-407726-3 (pbk.)
` 1. Computer organization. 2. Computer engineering. 3. Computer interfaces. I. Hennessy, John L. II. Hennessy, John L. Computer
`organization and design. III. Title.
` British Library Cataloguing-in-Publication Data
` A catalogue record for this book is available from the British Library
` ISBN: 978-0-12-407726-3
`
` For information on all MK publications visit our
`website at www.mkp.com
`
` Printed and bound in the United States of America
` 13 14 15 16 10 9 8 7 6 5 4 3 2 1
`
`
`
`
` To Linda,
`who has been, is, and always will be the love of my life
`
`
`
`
` A C K N O W L E D G M E N T S
`
Figures 1.7, 1.8 Courtesy of iFixit (www.ifixit.com).
Figure 1.9 Courtesy of Chipworks (www.chipworks.com).
` Figure 1.13 Courtesy of Intel.
` Figures 1.10.1, 1.10.2, 4.15.2 Courtesy of the Charles Babbage
`Institute, University of Minnesota Libraries, Minneapolis.
` Figures 1.10.3, 4.15.1, 4.15.3, 5.12.3, 6.14.2 Courtesy of IBM.
`
` Figure 1.10.4 Courtesy of Cray Inc.
` Figure 1.10.5 Courtesy of Apple Computer, Inc.
` Figure 1.10.6 Courtesy of the Computer History Museum.
` Figures 5.17.1, 5.17.2 Courtesy of Museum of Science, Boston.
` Figure 5.17.4 Courtesy of MIPS Technologies, Inc.
` Figure 6.15.1 Courtesy of NASA Ames Research Center.
`
`
`
`
`Contents
`
`Preface xv
`
`C H A P T E R S
`
1  Computer Abstractions and Technology 2

1.1  Introduction 3
1.2  Eight Great Ideas in Computer Architecture 11
1.3  Below Your Program 13
1.4  Under the Covers 16
1.5  Technologies for Building Processors and Memory 24
1.6  Performance 28
1.7  The Power Wall 40
1.8  The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
1.9  Real Stuff: Benchmarking the Intel Core i7 46
1.10  Fallacies and Pitfalls 49
1.11  Concluding Remarks 52
1.12  Historical Perspective and Further Reading 54
1.13  Exercises 54
`
2  Instructions: Language of the Computer 60

2.1  Introduction 62
2.2  Operations of the Computer Hardware 63
2.3  Operands of the Computer Hardware 66
2.4  Signed and Unsigned Numbers 73
2.5  Representing Instructions in the Computer 80
2.6  Logical Operations 87
2.7  Instructions for Making Decisions 90
2.8  Supporting Procedures in Computer Hardware 96
2.9  Communicating with People 106
2.10  MIPS Addressing for 32-Bit Immediates and Addresses 111
2.11  Parallelism and Instructions: Synchronization 121
2.12  Translating and Starting a Program 123
2.13  A C Sort Example to Put It All Together 132
2.14  Arrays versus Pointers 141
`
`
`
`
`
2.15  Advanced Material: Compiling C and Interpreting Java 145
2.16  Real Stuff: ARMv7 (32-bit) Instructions 145
2.17  Real Stuff: x86 Instructions 149
2.18  Real Stuff: ARMv8 (64-bit) Instructions 158
2.19  Fallacies and Pitfalls 159
2.20  Concluding Remarks 161
2.21  Historical Perspective and Further Reading 163
2.22  Exercises 164
`
3  Arithmetic for Computers 176

3.1  Introduction 178
3.2  Addition and Subtraction 178
3.3  Multiplication 183
3.4  Division 189
3.5  Floating Point 196
3.6  Parallelism and Computer Arithmetic: Subword Parallelism 222
3.7  Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 224
3.8  Going Faster: Subword Parallelism and Matrix Multiply 225
3.9  Fallacies and Pitfalls 229
3.10  Concluding Remarks 232
3.11  Historical Perspective and Further Reading 236
3.12  Exercises 237
`
4  The Processor 242

4.1  Introduction 244
4.2  Logic Design Conventions 248
4.3  Building a Datapath 251
4.4  A Simple Implementation Scheme 259
4.5  An Overview of Pipelining 272
4.6  Pipelined Datapath and Control 286
4.7  Data Hazards: Forwarding versus Stalling 303
4.8  Control Hazards 316
4.9  Exceptions 325
4.10  Parallelism via Instructions 332
4.11  Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines 344
4.12  Going Faster: Instruction-Level Parallelism and Matrix Multiply 351
4.13  Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 354
`
`
`
`
`
`
`
`4.14 Fallacies and Pitfalls 355
`4.15 Concluding Remarks 356
`4.16 Historical Perspective and Further Reading 357
`4.17 Exercises 357
`
5  Large and Fast: Exploiting Memory Hierarchy 372

5.1  Introduction 374
5.2  Memory Technologies 378
5.3  The Basics of Caches 383
5.4  Measuring and Improving Cache Performance 398
5.5  Dependable Memory Hierarchy 418
5.6  Virtual Machines 424
5.7  Virtual Memory 427
5.8  A Common Framework for Memory Hierarchy 454
5.9  Using a Finite-State Machine to Control a Simple Cache 461
5.10  Parallelism and Memory Hierarchies: Cache Coherence 466
5.11  Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 470
5.12  Advanced Material: Implementing Cache Controllers 470
5.13  Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies 471
5.14  Going Faster: Cache Blocking and Matrix Multiply 475
5.15  Fallacies and Pitfalls 478
5.16  Concluding Remarks 482
5.17  Historical Perspective and Further Reading 483
5.18  Exercises 483
`
6  Parallel Processors from Client to Cloud 500

6.1  Introduction 502
6.2  The Difficulty of Creating Parallel Processing Programs 504
6.3  SISD, MIMD, SIMD, SPMD, and Vector 509
6.4  Hardware Multithreading 516
6.5  Multicore and Other Shared Memory Multiprocessors 519
6.6  Introduction to Graphics Processing Units 524
6.7  Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors 531
6.8  Introduction to Multiprocessor Network Topologies 536
6.9  Communicating to the Outside World: Cluster Networking 539
6.10  Multiprocessor Benchmarks and Performance Models 540
6.11  Real Stuff: Benchmarking Intel Core i7 versus NVIDIA Tesla GPU 550
`
`
`
`
`
`6.12 Going Faster: Multiple Processors and Matrix Multiply 555
`6.13 Fallacies and Pitfalls 558
`6.14 Concluding Remarks 560
`6.15 Historical Perspective and Further Reading 563
`6.16 Exercises 563
`
`A P P E N D I C E S
`
A  Assemblers, Linkers, and the SPIM Simulator A-2

A.1  Introduction A-3
A.2  Assemblers A-10
A.3  Linkers A-18
A.4  Loading A-19
A.5  Memory Usage A-20
A.6  Procedure Call Convention A-22
A.7  Exceptions and Interrupts A-33
A.8  Input and Output A-38
A.9  SPIM A-40
A.10  MIPS R2000 Assembly Language A-45
A.11  Concluding Remarks A-81
A.12  Exercises A-82
`
B  The Basics of Logic Design B-2

B.1  Introduction B-3
B.2  Gates, Truth Tables, and Logic Equations B-4
B.3  Combinational Logic B-9
B.4  Using a Hardware Description Language B-20
B.5  Constructing a Basic Arithmetic Logic Unit B-26
B.6  Faster Addition: Carry Lookahead B-38
B.7  Clocks B-48
B.8  Memory Elements: Flip-Flops, Latches, and Registers B-50
B.9  Memory Elements: SRAMs and DRAMs B-58
B.10  Finite-State Machines B-67
B.11  Timing Methodologies B-72
B.12  Field Programmable Devices B-78
B.13  Concluding Remarks B-79
B.14  Exercises B-80

Index I-1
`
`
`
`
`
`
`
O N L I N E  C O N T E N T

C  Graphics and Computing GPUs C-2

C.1  Introduction C-3
C.2  GPU System Architectures C-7
C.3  Programming GPUs C-12
C.4  Multithreaded Multiprocessor Architecture C-25
C.5  Parallel Memory System C-36
C.6  Floating Point Arithmetic C-41
C.7  Real Stuff: The NVIDIA GeForce 8800 C-46
C.8  Real Stuff: Mapping Applications to GPUs C-55
C.9  Fallacies and Pitfalls C-72
C.10  Concluding Remarks C-76
C.11  Historical Perspective and Further Reading C-77
`
`
D  Mapping Control to Hardware D-2

D.1  Introduction D-3
D.2  Implementing Combinational Control Units D-4
D.3  Implementing Finite-State Machine Control D-8
D.4  Implementing the Next-State Function with a Sequencer D-22
D.5  Translating a Microprogram to Hardware D-28
D.6  Concluding Remarks D-32
D.7  Exercises D-33
`
E  A Survey of RISC Architectures for Desktop, Server, and Embedded Computers E-2

E.1  Introduction E-3
E.2  Addressing Modes and Instruction Formats E-5
E.3  Instructions: The MIPS Core Subset E-9
E.4  Instructions: Multimedia Extensions of the Desktop/Server RISCs E-16
E.5  Instructions: Digital Signal-Processing Extensions of the Embedded RISCs E-19
E.6  Instructions: Common Extensions to MIPS Core E-20
E.7  Instructions Unique to MIPS-64 E-25
E.8  Instructions Unique to Alpha E-27
E.9  Instructions Unique to SPARC v9 E-29
E.10  Instructions Unique to PowerPC E-32
E.11  Instructions Unique to PA-RISC 2.0 E-34
E.12  Instructions Unique to ARM E-36
E.13  Instructions Unique to Thumb E-38
E.14  Instructions Unique to SuperH E-39
E.15  Instructions Unique to M32R E-40
E.16  Instructions Unique to MIPS-16 E-40
E.17  Concluding Remarks E-43

Glossary G-1
Further Reading FR-1
`
`
`
`
` Preface
`
The most beautiful thing we can experience is the mysterious. It is the
source of all true art and science.
`
` Albert Einstein, What I Believe, 1930
`
` About This Book
We believe that learning in computer science and engineering should reflect
the current state of the field, as well as introduce the principles that are shaping
computing. We also feel that readers in every specialty of computing need
to appreciate the organizational paradigms that determine the capabilities,
performance, energy, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing
specialty to understand both hardware and software. The interaction between
hardware and software at a variety of levels also offers a framework for understanding
the fundamentals of computing. Whether your primary interest is hardware or
software, computer science or electrical engineering, the central ideas in computer
organization and design are the same. Thus, our emphasis in this book is to show
the relationship between hardware and software and to focus on the concepts that
are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed
the soundness of this perspective, given since the first edition. While programmers
could ignore the advice and rely on computer architects, compiler writers, and silicon
engineers to make their programs run faster or be more energy-efficient without
change, that era is over. For programs to run faster, they must become parallel.
While the goal of many researchers is to make it possible for programmers to be
unaware of the underlying parallel nature of the hardware they are programming,
it will take many years to realize this vision. Our view is that for at least the next
decade, most programmers are going to have to understand the hardware/software
interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly
language or logic design who need to understand basic computer organization as
well as readers with backgrounds in assembly language and/or logic design who
want to learn how to design a computer or understand how a system works and
why it performs as it does.
`
`
`
`
`
` About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative
Approach, popularly known as Hennessy and Patterson. (This book in turn is
often called Patterson and Hennessy.) Our motivation in writing the earlier book
was to describe the principles of computer architecture using solid engineering
fundamentals and quantitative cost/performance tradeoffs. We used an approach
that combined examples and measurements, based on commercial systems, to
create realistic design experiences. Our goal was to demonstrate that computer
architecture could be learned using quantitative methodologies instead of a
descriptive approach. It was intended for the serious computing professional who
wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer
architects. The performance and energy efficiency of future software systems will
be dramatically affected, however, by how well software designers understand the
basic hardware techniques at work in a system. Thus, compiler writers, operating
system designers, database programmers, and most other software engineers need
a firm grounding in the principles presented in this book. Similarly, hardware
designers must understand clearly the effects of their work on software applications.
Thus, we knew that this book had to be much more than a subset of the material
in Computer Architecture, and the material was extensively revised to match the
different audience. We were so happy with the result that the subsequent editions of
Computer Architecture were revised to remove most of the introductory material;
hence, there is much less overlap today than with the first editions of both books.
`
` Changes for the Fifth Edition
We had six major goals for the fifth edition of Computer Organization and Design:
demonstrate the importance of understanding hardware with a running example;
highlight major themes across the topics using margin icons that are introduced
early; update examples to reflect the changeover from the PC era to the PostPC era; spread the
material on I/O throughout the book rather than isolating it in a single chapter;
update the technical content to reflect changes in the industry since the publication
of the fourth edition in 2009; and put appendices and optional sections online
instead of including a CD, to lower costs and to make this edition viable as an
electronic book.
Before discussing the goals in detail, let’s look at the table on the next page. It
shows the hardware and software paths through the material. Chapters 1, 4, 5, and
6 are found on both paths, no matter what the experience or the focus. Chapter 1
discusses the importance of energy and how it motivates the switch from single-
core to multicore microprocessors and introduces the eight great ideas in computer
architecture. Chapter 2 is likely to be review material for the hardware-oriented,
but it is essential reading for the software-oriented, especially for those readers
interested in learning more about compilers and object-oriented programming
languages. Chapter 3 is for readers interested in constructing a datapath or in
`
`
`
`
`
`
`
[Reading-path table. The original table marks each group of sections on both a software-focus path and a hardware-focus path with one of five reading levels: Read carefully, Review or read, Read if have time, Read for culture, or Reference. The chapter and appendix rows, with their section groupings, are:]

1. Computer Abstractions and Technology: 1.1 to 1.11; 1.12 (History)
2. Instructions: Language of the Computer: 2.1 to 2.14; 2.15 (Compilers & Java); 2.16 to 2.20; 2.21 (History)
E. RISC Instruction-Set Architectures: E.1 to E.17
3. Arithmetic for Computers: 3.1 to 3.5; 3.6 to 3.8 (Subword Parallelism); 3.9 to 3.10 (Fallacies); 3.11 (History)
B. The Basics of Logic Design: B.1 to B.13
4. The Processor: 4.1 (Overview); 4.2 (Logic Conventions); 4.3 to 4.4 (Simple Implementation); 4.5 (Pipelining Overview); 4.6 (Pipelined Datapath); 4.7 to 4.9 (Hazards, Exceptions); 4.10 to 4.12 (Parallel, Real Stuff); 4.13 (Verilog Pipeline Control); 4.14 to 4.15 (Fallacies); 4.16 (History)
D. Mapping Control to Hardware: D.1 to D.6
5. Large and Fast: Exploiting Memory Hierarchy: 5.1 to 5.10; 5.11 (Redundant Arrays of Inexpensive Disks); 5.12 (Verilog Cache Controller); 5.13 to 5.16; 5.17 (History)
6. Parallel Processors from Client to Cloud: 6.1 to 6.8; 6.9 (Networks); 6.10 to 6.14; 6.15 (History)
A. Assemblers, Linkers, and the SPIM Simulator: A.1 to A.11
C. Graphics Processor Units: C.1 to C.13
`
`
`
`
`
learning more about floating-point arithmetic. Some will skip parts of Chapter 3,
either because they don’t need them or because they offer a review. However, we
introduce the running example of matrix multiply in this chapter, showing how
subword parallelism offers a fourfold improvement, so don’t skip Sections 3.6 to 3.8.
Chapter 4 explains pipelined processors. Sections 4.1, 4.5, and 4.10 give overviews,
and Section 4.12 gives the next performance boost for matrix multiply for those with
a software focus. Those with a hardware focus, however, will find that this chapter
presents core material; they may also, depending on their background, want to read
Appendix B on logic design first. The last chapter, on multicores, multiprocessors,
and clusters, is mostly new content and should be read by everyone. It was
significantly reorganized in this edition to make the flow of ideas more natural
and to include much more depth on GPUs, warehouse scale computers, and the
hardware-software interface of network interface cards that are key to clusters.
The first of the six goals for this fifth edition was to demonstrate the importance
of understanding modern hardware to get good performance and energy efficiency
with a concrete example. As mentioned above, we start with subword parallelism
in Chapter 3 to improve matrix multiply by a factor of 4. We double performance
in Chapter 4 by unrolling the loop to demonstrate the value of instruction-level
parallelism. Chapter 5 doubles performance again by optimizing for caches using
blocking. Finally, Chapter 6 demonstrates a speedup of 14 from 16 processors by
using thread-level parallelism. All four optimizations in total add just 24 lines of C
code to our initial matrix multiply example.
The second goal was to help readers separate the forest from the trees by
identifying the eight great ideas of computer architecture early and then pointing out
all the places they occur throughout the rest of the book. We use (hopefully) easy-
to-remember margin icons and highlight the corresponding word in the text to
remind readers of these eight themes. There are nearly 100 citations in the book.
No chapter has fewer than seven examples of great ideas, and no idea is cited fewer
than five times. Performance via parallelism, pipelining, and prediction are the three
most popular great ideas, followed closely by Moore’s Law. The processor chapter
(4) is the one with the most examples, which is not a surprise since it probably
received the most attention from computer architects. The one great idea found in
every chapter is performance via parallelism, which is a pleasant observation given
the recent emphasis on parallelism in the field and in editions of this book.
The third goal was to recognize the generation change in computing from the
PC era to the PostPC era by this edition with our examples and material. Thus,
Chapter 1 dives into the guts of a tablet computer rather than a PC, and Chapter 6
describes the computing infrastructure of the cloud. We also feature the ARM,
which is the instruction set of choice in the personal mobile devices of the PostPC
era, as well as the x86 instruction set that dominated the PC era and (so far)
dominates cloud computing.
The fourth goal was to spread the I/O material throughout the book rather
than have it in its own chapter, much as we spread parallelism throughout all the
chapters in the fourth edition. Hence, I/O material in this edition can be found in
`
`
`
`
`
`
`
Sections 1.4, 4.9, 5.2, 5.5, 5.11, and 6.9. The thought is that readers (and instructors)
are more likely to cover I/O if it’s not segregated to its own chapter.
This is a fast-moving field, and, as is always the case for our new editions, an
important goal is to update the technical content. The running example is the ARM
Cortex A8 and the Intel Core i7, reflecting our PostPC era. Other highlights include
an overview of the new 64-bit instruction set of ARMv8, a tutorial on GPUs that
explains their unique terminology, more depth on the warehouse scale computers
that make up the cloud, and a deep dive into 10 Gigabit Ethernet cards.
To keep the main book short and compatible with electronic books, we placed
the optional material as online appendices instead of on a companion CD as in
prior editions.
Finally, we updated all the exercises in the book.
While some elements changed, we have preserved useful book elements from
prior editions. To make the book work better as a reference, we still place definitions
of new terms in the margins at their first occurrence. The book element called
“Understanding Program Performance” helps readers understand the
performance of their programs and how to improve it, just as the “Hardware/Software
Interface” book element helps readers understand the tradeoffs at this interface.
“The Big Picture” section remains so that the reader sees the forest despite all the
trees. “Check Yourself” sec