`Ex. 1017, Cover
`
`
`
`SECOND EDITION
`
`Computer Organization and Design
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`Petitioners SK hynix Inc., SK hynix America Inc. and SK hynix memory solutions Inc.
`Ex. 1017, Cover-2
`
`
`
`TRADEMARKS
`
`The following trademarks are the property of the following organizations:
`
`TeX is a trademark of American Mathematical Society.
`
`Apple II and Macintosh are trademarks of Apple Computers, Inc.
`
`CDC 6600, CDC 7600, CDC STAR-100, CYBER-180, CYBER-
`180/990, and CYBER-205 are trademarks of Control Data Corpora-
`tion.
`
`The Cosmic Cube is a trademark of California Institute of Technol-
`ogy.
`
`CP3100 is a trademark of Conner Peripherals.
`
`Cray, CRAY-1, CRAY J90, CRAY T90, CRAY X-MP/416, and
`CRAY Y-MP are trademarks of Cray Research.
`
`Alpha, AlphaServer, AlphaStation, DEC, DECsystem, DECsystem
`3100, DECstation, PDP-8, PDP-11, Unibus, VAX, VAX 8700, and
`VAX11/780 are trademarks of Digital Equipment Corporation.
`
`MP2361A, Super Eagle, VP100, VP200, and VPP300 are trademarks
`of Fujitsu Corporation.
`
`Gnu C Compiler is a trademark of Free Software Foundation.
`
`Goodyear MPP is a trademark of Goodyear Tire and Rubber Co.,
`Inc.
`
`Apollo DN 300, Apollo DN 10000, Convex, HP, HP Precision
`Architecture, HPPA, HP850, HP 3000, HP 300/70, PA-RISC, and
`Precision are registered trademarks of Hewlett-Packard Company.
`
`432, 960 CA, 4004, 8008, 8080, 8086, 8087, 8088, 80186, 80286, 80386,
`80486, Delta, iAPX 432, i860, Intel, Intel486, Intel Hypercube, iPSC/2,
`MMX, Multibus, Multibus II, Paragon, and Pentium are trademarks of
`Intel Corporation. Intel Inside is a registered trademark of Intel
`Corporation.
`
`360, 360/30, 360/40, 360/50, 360/65, 360/85, 360/91, 370, 370/158,
`370/165, 370/168, 370-XA, ESA/370, 701, 704, 709, 801, 3033, 3080,
`3080 series, 3080 VF, 3081, 3090, 3090/100, 3090/200, 3090/400,
`3090/600, 3090/600S, 3090 VF, 3330, 3380, 3380D, 3380 Disk Model
`AK4, 3380J, 3390, 3880-23, 3990, 7090, 7094, IBM, IBM PC, IBM PC-
`AT, IBM SVS, ISAM, MVS, PL.8, PowerPC, POWERstation, RT-PC,
`RAMAC, RS/6000, Sage, Stretch, System/360, Vector Facility, and
`VM are trademarks of International Business Machines Corpora-
`tion. POWERserver, RISC System/6000, and SP2 are registered
`trademarks of International Business Machines Corporation.
`
`ICL DAP is a trademark of International Computers Limited.
`
`Inmos and Transputer are trademarks of Inmos.
`
`FutureBus is a trademark of the Institute of Electrical and Electron-
`ic Engineers.
`
`KSR-1 is a trademark of Kendall Square Research.
`
`MASPAR MP-1 and MASPAR MP-2 are trademarks of MasPar
`Corporation.
`
`MIPS, R2000, R3000, and R10000 are registered trademarks of
`MIPS Technology, Inc.
`
`Windows is a trademark of Microsoft Corporation.
`
`NuBus is a trademark of Massachusetts Institute of Technology.
`
`Delta Series 8608, System V/88 R32V1, VME bus, 6809, 68000,
`68010, 68020, 68030, 68881, 68882, 88000, 88000 1.8.4m14, 88100,
`and 88200 are trademarks of Motorola Corporation.
`
`Ncube and nCube/ten are trademarks of Ncube Corporation.
`
`NEC is a registered trademark of NEC Corporation.
`
`Network Computer is a trademark of Oracle Corporation.
`
`Parsytec GC is a trademark of Parsytec, Inc.
`
`Imprimis, IPI-2, Sabre, Sabre 97209, Seagate, and Wren IV are
`trademarks of Seagate Technology, Inc.
`
`NUMA-Q, Sequent, and Symmetry are trademarks of Sequent
`Computers.
`
`Power Challenge, Silicon Graphics, Silicon Graphics 43/240,
`Silicon Graphics 4D/60, Silicon Graphics 4D/240, and Silicon
`Graphics 4D Series are trademarks of Silicon Graphics. Origin2000
`is a registered trademark of Silicon Graphics.
`
`SPEC is a registered trademark of the Standard Performance Eval-
`uation Corporation.
`
`Spice is a trademark of University of California at Berkeley.
`
`Enterprise, Java, Sun, Sun Ultra, Sun Microsystems, and Ultra are
`trademarks of Sun Microsystems, Inc. SPARC and UltraSPARC
`are registered trademarks of SPARC International, Inc., licensed to
`Sun Microsystems, Inc.
`
`Connection Machine, CM-2, and CM-5 are trademarks of Thinking
`Machines.
`
`Burroughs 6500, B5000, B5500, D-machine, UNIVAC, UNIVAC I,
`and UNIVAC 1103 are trademarks of UNISYS.
`
`Alto, PARC, Palo Alto Research Center, and Xerox are trademarks
`of Xerox Corporation.
`
`The UNIX trademark is licensed exclusively through X/Open
`Company Ltd.
`
`All other product names are trademarks or registered trademarks
`of their respective companies. Where trademarks appear in this
`book and Morgan Kaufmann Publishers was aware of a trademark
`claim, the trademarks have been printed in initial caps or all caps.
`
`SECOND EDITION
`
`Computer Organization and Design
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California, Berkeley
`
`With a contribution by
`James R. Larus
`University of Wisconsin
`
`Morgan Kaufmann Publishers, Inc.
`
`San Francisco, California
`
`Sponsoring Editor Denise Penrose
`Production Manager Yonie Overton
`Production Editor Julie Pabst
`Editorial Coordinator Jane Elliott
`Text and Cover Design Ross Carron Design
`Illustration Alexander Teshin Associates, with second edition modifications by Dartmouth
`Publishing, Inc.
`Chapter Opener Illustrations Canary Studios
`Copyeditor Ken DellaPenta
`Composition Nancy Logan
`Proofreader Jennifer McClain
`Indexer Steve Rath
`Printer Courier Corporation
`
`Morgan Kaufmann Publishers, Inc.
`Editorial and Sales Office:
`340 Pine Street, Sixth Floor
`San Francisco, CA 94104-3205
`USA
`
`Telephone 415/392-2665
`Facsimile 415/982-2665
`Email mkp@mkp.com
`WWW http://www.mkp.com
`Order toll free 800/745-7323
`
`© 1998 by Morgan Kaufmann Publishers, Inc.
`All rights reserved
`Printed in the United States of America
`
`04 03
`
`10 9
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
`or by any means--electronic, mechanical, photocopying, recording, or otherwise--without the prior
`written permission of the publisher.
`
`Advice, Praise, and Errors: Any correspondence related to this publication or intended for the authors
`should be sent electronically to cod2bugs@mkp.com. Information regarding error sightings is encouraged.
`Any error sightings that are accepted for correction in subsequent printings will be rewarded by the
`authors with a payment of $1.00 (U.S.) per correction at the time of their implementation in a reprint.
`
`Library of Congress Cataloging-in-Publication Data
`Patterson, David A.
`Computer organization and design : the hardware/software interface
`/ David A. Patterson, John L. Hennessy.--2nd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 1-55860-428-6 (cloth).--ISBN 1-55860-491-X (paper)
`1. Computer organization. 2. Computers--Design and construction.
`3. Computer interfaces. I. Hennessy, John L. II. Title
`QA76.9.C643H46 1997
`004.2’2--dc21 97-16050
`
`TO LINDA AND ANDREA
`
`Foreword
`
`by John H. Crawford
`Intel Fellow, Director of Microprocessor Architecture
`Intel Corporation, Santa Clara, California
`
`Computer design is an exciting and competitive discipline. The microproces-
`sor industry is on a treadmill where we double microprocessor performance
`every 18 months and double microprocessor complexity--measured by the
`number of transistors per chip--every 24 months. This unprecedented rate of
`change has been evident for the entire 25-year history of the microprocessor,
`and it promises to continue for many years to come as the creativity and
`energy of many people are harnessed to drive innovation ahead in spite of the
`challenge of ever-smaller dimensions. This book trains the student with the
`concepts needed to lay a solid foundation for joining this exciting field. More
`importantly, this book provides a framework for thinking about computer
`organization and design that will enable the reader to continue the lifetime of
`learning necessary for staying at the forefront of this competitive discipline.
`The text focuses on the boundary between hardware and software and ex-
`plores the levels of hardware in the vicinity of this boundary. This boundary
`is captured in a computer’s architecture specification. It is a critical boundary
`for a successful computer product: an architect must define an interface that
`can be efficiently implemented by hardware and efficiently targeted by com-
`pilers. The interface must be able to retain these efficiencies for many genera-
`tions of hardware and compiler technology, much of which will be unknown
`at the time the architecture is specified. This boundary is central to the disci-
`pline of computer design: it is where compilation (in software) ends and inter-
`pretation (in hardware) begins.
`This book builds on introductory programming skills to introduce the con-
`cepts of assembly language programming and the tools needed for this task:
`the assembler, linker, and loader. Once these prerequisites are completed, the
`remainder of the book explores the first few levels of hardware below the ar-
`chitectural interface. The basic concepts are motivated and introduced with
`clear and intuitive examples, then elaborated into the "real stuff" used in to-
`day’s modern microprocessors. For example, doing the laundry is used as an
`analogy in Chapter 6 to explain the basic concepts of pipelining, a key tech-
`nique used in all modern computers. In Chapter 4, algorithms for the basic
`floating-point arithmetic operators such as addition, multiplication, and divi-
`sion are first explained in decimal, then in binary, and finally they are elabo-
`rated into the best-known methods used for high-speed arithmetic in today’s
`computers.
`New to this edition are sections in each chapter entitled "Real Stuff." These
`sections describe how the concepts from the chapter are implemented in com-
`mercially successful products. These provide relevant, tangible examples of
`the concepts and reinforce their importance. As an example, the Real Stuff in
`Chapter 6, Enhancing Performance with Pipelining, provides an overview of a
`dynamically scheduled pipeline as implemented in both the IBM/Motorola
`PowerPC 604 and Intel’s Pentium Pro microprocessor.
`The history of computing is woven as a thread throughout the book to re-
`ward the reader with a glimpse of key successes from the brief history of this
`young discipline. The other side of history is reported in the Fallacies and Pit-
`falls section of each chapter. Since we can learn more from failure than from
`success, these sections provide a wealth of learning!
`The authors are two of the most admired teachers, researchers, and practi-
`tioners of the art of computer design today. John Hennessy has straddled both
`sides of the hardware/software boundary, providing technical leadership for
`the legendary MIPS compiler as well as the MIPS hardware products through
`many generations. David Patterson was one of the original RISC proponents:
`he coined the acronym RISC, evangelized the case for RISC, and served as a
`key consultant on Sun Microsystems’ SPARC line of processors. Continuing
`his talent for marketable acronyms, his next breakthrough was RAID (Redun-
`dant Arrays of Inexpensive Disks), which revolutionized the disk storage in-
`dustry for large data servers, and then NOW (Networks of Workstations).
`Like other great "software" products, this second edition went through an
`extensive beta testing program: 13 beta sites tested the draft manuscript in
`classes to "debug" the text. Changes from this testing have been incorporated
`into the "production" version.
`Patterson and Hennessy have succeeded in taking the first edition of their
`excellent introductory textbook on computer design and making it even better.
`This edition retains all of the good points of the original, yet adds significant
`new content and some minor enhancements. What results is an outstanding in-
`troduction to the exciting field of computer design.
`
`Contents
`
`Foreword vi
`by John H. Crawford
`
`Worked Examples xiii
`
`Computer Organization and Design Online xvi
`
`Preface xix
`
`CHAPTERS
`
`Computer Abstractions and Technology
`
`1.1 Introduction 3
`1.2 Below Your Program 5
`1.3 Under the Covers 10
`1.4 Integrated Circuits: Fueling Innovation 21
`1.5 Real Stuff: Manufacturing Pentium Chips 24
`1.6 Fallacies and Pitfalls 29
`1.7 Concluding Remarks 30
`1.8 Historical Perspective and Further Reading 32
`1.9 Key Terms 44
`1.10 Exercises 45
`
`The Role of Performance 52
`
`2.1 Introduction 54
`2.2 Measuring Performance 58
`2.3 Relating the Metrics 60
`2.4 Choosing Programs to Evaluate Performance 66
`2.5 Comparing and Summarizing Performance 69
`2.6 Real Stuff: The SPEC95 Benchmarks and Performance of Recent Processors 71
`2.7 Fallacies and Pitfalls 75
`2.8 Concluding Remarks 82
`2.9 Historical Perspective and Further Reading 83
`2.10 Key Terms 89
`2.11 Exercises 90
`
`Instructions: Language of the Machine 104
`
`3.1 Introduction 106
`3.2 Operations of the Computer Hardware 107
`3.3 Operands of the Computer Hardware 109
`3.4 Representing Instructions in the Computer 116
`3.5 Instructions for Making Decisions 122
`3.6 Supporting Procedures in Computer Hardware 132
`3.7 Beyond Numbers 142
`3.8 Other Styles of MIPS Addressing 145
`3.9 Starting a Program 156
`3.10 An Example to Put It All Together 163
`3.11 Arrays versus Pointers 171
`3.12 Real Stuff: PowerPC and 80x86 Instructions 175
`3.13 Fallacies and Pitfalls 185
`3.14 Concluding Remarks 187
`3.15 Historical Perspective and Further Reading 189
`3.16 Key Terms 196
`3.17 Exercises 196
`
`Arithmetic for Computers 208
`
`4.1 Introduction 210
`4.2 Signed and Unsigned Numbers 210
`4.3 Addition and Subtraction 220
`4.4 Logical Operations 225
`4.5 Constructing an Arithmetic Logic Unit 230
`4.6 Multiplication 250
`4.7 Division 265
`4.8 Floating Point 275
`4.9 Real Stuff: Floating Point in the PowerPC and 80x86 301
`4.10 Fallacies and Pitfalls 304
`4.11 Concluding Remarks 308
`4.12 Historical Perspective and Further Reading 312
`4.13 Key Terms 322
`4.14 Exercises 322
`
`The Processor: Datapath and Control 336
`
`5.1 Introduction 338
`5.2 Building a Datapath 343
`5.3 A Simple Implementation Scheme 351
`5.4 A Multicycle Implementation 377
`5.5 Microprogramming: Simplifying Control Design 399
`5.6 Exceptions 410
`5.7 Real Stuff: The Pentium Pro Implementation 416
`5.8 Fallacies and Pitfalls 419
`5.9 Concluding Remarks 421
`5.10 Historical Perspective and Further Reading 423
`5.11 Key Terms 426
`5.12 Exercises 427
`
`Enhancing Performance with Pipelining 434
`
`6.1 An Overview of Pipelining 436
`6.2 A Pipelined Datapath 449
`6.3 Pipelined Control 466
`6.4 Data Hazards and Forwarding 476
`6.5 Data Hazards and Stalls 489
`6.6 Branch Hazards 496
`6.7 Exceptions 505
`6.8 Superscalar and Dynamic Pipelining 510
`6.9 Real Stuff: PowerPC 604 and Pentium Pro Pipelines 517
`6.10 Fallacies and Pitfalls 520
`6.11 Concluding Remarks 521
`6.12 Historical Perspective and Further Reading 525
`6.13 Key Terms 529
`6.14 Exercises 529
`
`Large and Fast: Exploiting Memory Hierarchy 538
`
`7.1 Introduction 540
`7.2 The Basics of Caches 545
`7.3 Measuring and Improving Cache Performance 564
`7.4 Virtual Memory 579
`7.5 A Common Framework for Memory Hierarchies 603
`7.6 Real Stuff: The Pentium Pro and PowerPC 604 Memory Hierarchies 611
`7.7 Fallacies and Pitfalls 615
`7.8 Concluding Remarks 618
`7.9 Historical Perspective and Further Reading 621
`7.10 Key Terms 627
`7.11 Exercises 628
`
`Interfacing Processors and Peripherals 636
`
`8.1 Introduction 638
`8.2 I/O Performance Measures: Some Examples from Disk and File Systems 641
`8.3 Types and Characteristics of I/O Devices 644
`8.4 Buses: Connecting I/O Devices to Processor and Memory 655
`8.5 Interfacing I/O Devices to the Memory, Processor, and Operating System 673
`8.6 Designing an I/O System 684
`8.7 Real Stuff: A Typical Desktop I/O System 687
`8.8 Fallacies and Pitfalls 688
`8.9 Concluding Remarks 690
`8.10 Historical Perspective and Further Reading 694
`8.11 Key Terms 700
`8.12 Exercises 700
`
`Multiprocessors 710
`
`9.1 Introduction 712
`9.2 Programming Multiprocessors 714
`9.3 Multiprocessors Connected by a Single Bus 717
`9.4 Multiprocessors Connected by a Network 727
`9.5 Clusters 734
`9.6 Network Topologies 736
`9.7 Real Stuff: Future Directions for Multiprocessors 740
`9.8 Fallacies and Pitfalls 743
`9.9 Concluding Remarks--Evolution versus Revolution in Computer
`Architecture 746
`9.10 Historical Perspective and Further Reading 748
`9.11 Key Terms 756
`9.12 Exercises 756
`
`APPENDICES
`
`Assemblers, Linkers, and the SPIM Simulator A-2
`by James R. Larus, University of Wisconsin
`
`A.1 Introduction A-3
`A.2 Assemblers A-10
`A.3 Linkers A-17
`A.4 Loading A-19
`A.5 Memory Usage A-20
`A.6 Procedure Call Convention A-22
`A.7 Exceptions and Interrupts A-32
`A.8 Input and Output A-36
`A.9 SPIM A-38
`A.10 MIPS R2000 Assembly Language A-49
`A.11 Concluding Remarks A-75
`A.12 Key Terms A-76
`A.13 Exercises A-76
`
`The Basics of Logic Design B-2
`
`B.1 Introduction B-3
`B.2 Gates, Truth Tables, and Logic Equations B-4
`B.3 Combinational Logic B-8
`B.4 Clocks B-18
`B.5 Memory Elements B-21
`B.6 Finite State Machines B-35
`B.7 Timing Methodologies B-39
`B.8 Concluding Remarks B-44
`B.9 Key Terms B-45
`B.10 Exercises B-45
`
`Mapping Control to Hardware C-2
`
`C.1 Introduction C-3
`C.2 Implementing Combinational Control Units C-4
`C.3 Implementing Finite State Machine Control C-8
`C.4 Implementing the Next-State Function with a Sequencer C-21
`C.5 Translating a Microprogram to Hardware C-28
`C.6 Concluding Remarks C-31
`C.7 Key Terms C-32
`C.8 Exercises C-32
`
`Glossary G-1
`
`Index I-1
`
`
`Worked Examples
`
`Chapter 2: The Role of Performance
`
`Throughput and Response Time 56
`Relative Performance 57
`Improving Performance 60
`Using the Performance Equation 62
`Comparing Code Segments 64
`MIPS as a Performance Measure 78
`
`Chapter 3: Instructions: Language of the Machine
`
`Compiling Two C Assignment Statements into MIPS 108
`Compiling a Complex C Assignment into MIPS 109
`Compiling a C Assignment Using Registers 110
`Compiling an Assignment When an Operand Is in Memory 112
`Compiling Using Load and Store 113
`Compiling Using a Variable Array Index 114
`Translating a MIPS Assembly Instruction into a Machine Instruction 117
`Translating MIPS Assembly Language into Machine Language 119
`Compiling an If Statement into a Conditional Branch 123
`Compiling if-then-else into Conditional Branches 124
`Compiling a Loop with Variable Array Index 126
`Compiling a while Loop 127
`Compiling a Less Than Test 128
`Compiling a switch Statement by Using a Jump Address Table 129
`Compiling a Procedure that Doesn’t Call Another Procedure 134
`Compiling a Recursive Procedure, Showing Nested Procedure Linking 136
`Compiling a String Copy Procedure, Showing How to Use C Strings 143
`Translating Assembly Constants into Machine Language 145
`Loading a 32-Bit Constant 147
`Showing Branch Offset in Machine Language 149
`Branching Far Away 150
`Decoding Machine Code 154
`Linking Object Files 160
`Compiling an Assignment Statement into Accumulator Instructions 190
`Compiling an Assignment Statement into Memory-Memory Instructions 192
`Compiling an Assignment Statement into Stack Instructions 193
`
`Chapter 4: Arithmetic for Computers
`ASCII versus Binary Numbers 212
`Binary to Decimal Conversion 214
`Signed versus Unsigned Comparison 215
`Negation Shortcut 216
`Sign Extension Shortcut 217
`Binary-to-Hexadecimal Shortcut 218
`Binary Addition and Subtraction 220
`C Bit Fields 229
`Both Levels of the Propagate and Generate 247
`Speed of Ripple Carry versus Carry Lookahead 248
`First Multiply Algorithm 253
`Second Multiply Algorithm 256
`Third Multiply Algorithm 257
`Booth’s Algorithm 261
`Multiply by 2^i via Shift 262
`First Divide Algorithm 268
`Third Divide Algorithm 271
`Floating-Point Representation 279
`Converting Binary to Decimal Floating Point 280
`Decimal Floating-Point Addition 282
`Decimal Floating-Point Multiplication 287
`Compiling a Floating-Point C Program into MIPS Assembly Code 293
`Compiling Floating-Point C Procedure with Two-Dimensional Matrices into MIPS 294
`Rounding with Guard Digits 297
`
`Chapter 5: The Processor: Datapath and Control
`Composing Datapaths 351
`Implementing Jumps 370
`Performance of Single-Cycle Machines 373
`Performance of a Single-Cycle CPU with Floating-Point Instructions 375
`CPI in a Multicycle CPU 397
`
`Chapter 6: Enhancing Performance with Pipelining
`Single-Cycle versus Pipelined Performance 438
`Stall on Branch Performance 442
`Forwarding with Two Instructions 446
`Reordering Code to Avoid Pipeline Stalls 447
`Labeled Pipeline Execution, Including Control 471
`Dependency Detection 479
`Forwarding 485
`Pipelined Branch 498
`Loops and Prediction 501
`Comparing Performance of Several Control Schemes 504
`Exception in a Pipelined Computer 507
`Simple Superscalar Code Scheduling 513
`Loop Unrolling for Superscalar Pipelines 513
`
`Chapter 7: Large and Fast: Exploiting Memory Hierarchy
`
`Bits in a Cache 550
`Mapping an Address to a Multiword Cache Block 556
`Calculating Cache Performance 565
`Cache Performance with Increased Clock Rate 567
`Associativity in Caches 571
`Size of Tags versus Set Associativity 575
`Performance of Multilevel Caches 576
`Overall Operation of a Memory Hierarchy 595
`
`Chapter 8: Interfacing Processors and Peripherals
`Impact of I/O on System Performance 639
`Disk Read Time 648
`Performance of Two Networks 654
`FSM Control for I/O 662
`Performance Analysis of Synchronous versus Asynchronous Buses 662
`Performance Analysis of Two Bus Schemes 665
`Overhead of Polling in an I/O System 676
`Overhead of Interrupt-Driven I/O 679
`Overhead of I/O Using DMA 681
`I/O System Design 685
`
`Chapter 9: Multiprocessors
`Speedup Challenge 715
`Speedup Challenge, Bigger Problem 716
`Parallel Program (Single Bus) 718
`Parallel Program (Message Passing) 729
`
`Appendix A: Assemblers, Linkers, and the SPIM Simulator
`Local and Global Labels A-11
`String Directive A-15
`Macros A-15
`Stack in Recursive Procedure A-28
`Interrupt Handler A-34
`
`Appendix B: The Basics of Logic Design
`
`Truth Tables B-5
`Logic Equations B-6
`Sum of Products B-11
`PLAs B-13
`Don’t Cares B-16
`
`Appendix C: Mapping Control to Hardware
`Logic Equations for Next-State Outputs C-12
`Control ROM Entries C-17
`
`As we will see in Chapter 9, memory systems are also a central design issue
`for parallel processors. The growing importance of the memory hierarchy in
`determining system performance in both uniprocessor and multiprocessor
`systems means that this important area will continue to be a focus of both
`designers and researchers for some years to come.
`
`7.9 Historical Perspective and Further Reading
`
`... the one single development that put computers on their feet was the invention
`of a reliable form of memory, namely, the core memory .... Its cost was reasonable,
`it was reliable and, because it was reliable, it could in due course be made large.
`
`Maurice Wilkes,
`Memoirs of a Computer Pioneer, 1985
`
`The developments of most of the concepts in this chapter have been driven by
`revolutionary advances in the technology we use for memory. Before we dis-
`cuss how memory hierarchies were developed, let’s take a brief tour of the
`development of memory technology. In this section, we focus on the technolo-
`gies for building main memory and caches; Chapter 8 will provide some of
`the history of developments in disk technology.
`The ENIAC had only a small number of registers (about 20) for its storage
`and implemented these with the same basic vacuum tube technology that it
`used for building logic circuitry. However, the vacuum tube technology was
`far too expensive to be used to build a larger memory capacity. Eckert came up
`with the idea of developing a new technology based on mercury delay lines. In
`this technology, electrical signals were converted into vibrations that were sent
`down a tube of mercury, reaching the other end, where they were read out and
`recirculated. One mercury delay line could store about 0.5 Kbits. Although
`these bits were accessed serially, the mercury delay line was about a hundred
`times more cost-effective than vacuum tube memory. The first known working
`mercury delay lines were developed at Cambridge for the EDSAC. Figure 7.36
`shows the mercury delay lines of the EDSAC, which had 32 tanks and a total
`of 512 36-bit words.
`Despite the tremendous advance offered by the mercury delay lines, they
`were terribly unreliable and still rather expensive. The breakthrough came
`with the invention of core memory by J. Forrester at MIT as part of the Whirl-
`wind project, in the early 1950s (see Figure 7.37). Core memory uses a ferrite
`core, which can be magnetized, and once magnetized, acts as a store (just as a
`magnetic recording tape stores information). A set of wires running through
`the center of the core, which had a dimension of 0.1-1.0 millimeters, make it
`possible to read the value stored on any ferrite core. The Whirlwind eventually
`included a core memory with 2048 16-bit words, or a total of 32 Kbits. Core
`memory was a tremendous advance: It was cheaper, faster, much more
`reliable, and had higher density. Core memory was so much better than the
`alternatives that it became the dominant memory technology only a few years after
`its invention and remained so for nearly 20 years.
`
`FIGURE 7.36 The mercury delay lines in the EDSAC. This technology made it possible to
`build the first stored-program computer. The young engineer in this photograph is none other
`than Maurice Wilkes, the lead architect of the EDSAC. Photo courtesy of the Computer Museum,
`Boston.
`
`The technology that replaced core memory was the same one that we now
`use both for logic and memory: the integrated circuit. While registers were
`built out of transistorized memory in the 1960s, and IBM machines used tran-
`sistorized memory for microcode store and caches in 1970, building main
`memory out of transistors remained prohibitive until the development of the
`integrated circuit. With the integrated circuit, it became possible to build a
`DRAM (dynamic random access memory--see Appendix B for a description).
`The first DRAMs were built at Intel in 1970, and the machines using DRAM
`memories (as a high-speed option to core) came shortly thereafter; they used
`1-Kbit DRAMs. In fact, computer folklore says that Intel developed the
`microprocessor partly to help sell more DRAM. Figure 7.38 shows an early DRAM
`board. By the late 1970s, core memory became a historical curiosity. Just as core
`memory technology had allowed a tremendous expansion in memory size,
`DRAM technology allowed a comparable expansion. In the 1990s, many
`personal computers have as much memory as the largest machines using core
`memory ever had.
`
`FIGURE 7.37 A core memory plane from the Whirlwind containing 256 cores arranged in
`a 16 x 16 array. Core memory was invented for the Whirlwind, which was used for air defense
`problems, and is now on display at the Smithsonian. (Incidentally, Ken Olsen, the founder and
`president of Digital for 20 years, built the machine that tested these core memories; it was his first
`computer.) Photo courtesy of the Computer Museum, Boston.
`
`Nowadays, DRAMs are typically packaged with multiple chips on a little
`board called SIMM (single inline memory module) or DIMM (dual inline
`memory module). The SIMM shown in Figure 7.39 contains a total of 1 MB and
`sells for about $5 in 1997. In 1997, SIMMs and DIMMs are available with up to
`64 MB. While DRAMs will remain the dominant memory technology for some
`time to come, dramatic innovations in the packaging of DRAMs to provide
`both higher bandwidth and greater density are ongoing.
`
`FIGURE 7.38 An early DRAM board. This board uses 18-Kbit chips. Photo courtesy of IBM.
`
`FIGURE 7.39 A 1-MB SIMM, built in 1986, using 1-Mbit chips. This SIMM, used in a Mac-
`intosh, sells for about $5/MB in 1997. In 1997, most main memory is packed in either SIMMs or
`DIMMs similar to this, though using much higher-density memory chips (16-Mbit or 64-Mbit).
`Photo courtesy of MIPS Technology, Inc.
`
`The Development of Memory Hierarchies
`
`Although the pioneers of computing foresaw the need for a memory hier-
`archy and coined the term, the automatic management of two levels was first
`proposed by Kilburn and his colleagues and demonstrated at the University
`of Manchester with the Atlas computer, which implemented virtual memory.
`This was the year before the IBM 360 was announced. IBM planned to include
`
`virtual memory with the next generation (System/370), but the OS/360 oper-
`ating system wasn’t up to the challenge in 1970. Virtual memory was
`announced for the 370 family in 1972, and it was for this machine that the
`term translation-lookaside buffer was coined. The only computers today without
`virtual memory are a few supercomputers, and even they may add this fea-
`ture in the near future.
`The problems of inadequate address space have plagued designers repeat-
`edly. The architects of the PDP-11 identified a small address space as the only
`architectural mistake that is difficult to recover from. When the PDP-11 was de-
`signed, core memory densities were increasing at a very slow rate, and the
`competition from 100 other minicomputer companies meant that DEC might
`not have a cost-competitive product if every address had to go through the
`16-bit datapath twice. Hence the decision to add just 4 more address bits than
`the predecessor of the PDP-11. The architects of the IBM 360 were aware of the
`importance of address size and planned for the architecture to extend to 32 bits
`of address. Only 24 bits were used in the IBM 360, however, because the low-
`end 360 models would have been even slower with the larger addresses. Un-
`fortunately, the expansion effort was greatly complicated by programmers
`who stored extra information in the upper 8 "unused" address bits.
`Running out of a