`
`
`
`
`
`
`
`
`
`
`
DAVID A. PATTERSON
JOHN L. HENNESSY

MORGAN KAUFMANN
`
`
`
`
`
`
`
`
`
THIRD EDITION
`
`
`
`Computer Organization and Design
`
`THE HARDWARE/SOFTWARE INTERFACE
`
`David A. Patterson
`
`University of California, Berkeley
`
`John L. Hennessy
`
`Stanford University
`
`With a contribution by
Peter J. Ashenden
`Ashenden Designs Pty Ltd
`
`James R. Larus
`Microsoft Research
`
Daniel J. Sorin
`Duke University
`
`
`
`ELSEVIER
`
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

MORGAN KAUFMANN PUBLISHERS
`
`
`
`
`
`
`
`2
`
`
`
Senior Editor                    Denise E. M. Penrose
Publishing Services Manager      Simon Crump
Editorial Assistant              Summer Block
Cover Design                     Ross Caron Design
Cover and Chapter Illustration   Chris Asimoudis
Text Design                      GGS Book Services
Composition                      Nancy Logan and Dartmouth Publishing, Inc.
Technical Illustration           Dartmouth Publishing, Inc.
Copyeditor                       Ken DellaPenta
Proofreader                      Jacqui Brownstein
Indexer                          Linda Buskus
Interior printer                 Courier
Cover printer                    Courier
`
`Morgan Kaufmann Publishers is an imprint of Elsevier.
`500 Sansome Street, Suite 400, San Francisco, CA 94111
`
This book is printed on acid-free paper.
`
`© 2005 by Elsevier Inc. All rights reserved.
`
`
`Designations used by companies to distinguish their products are often claimed as trademarks or registered
`trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names
`appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies
`for more complete information regarding trademarks and registration.
`
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means (electronic, mechanical, photocopying, scanning, or otherwise) without prior written
permission of the publisher.
`
Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford,
UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com.uk. You may
also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting "Customer
Support" and then "Obtaining Permissions."
`
`
`
`
`
`
`
Library of Congress Cataloging-in-Publication Data
Application submitted
ISBN: 1-55860-604-1
`
`For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com.
`
`Printed in the United States of America
`0405060708
`54321
`
`
`
`
`3
`
`
`
`
`
`473
`7.2 The Basics of Caches
`
`
`
`
`
`
`
`
[Figure 7.3: a pyramid of levels in the memory hierarchy, with the CPU and level 1 at the top and level n at the bottom; distance from the CPU in access time increases downward, and the size of the memory at each level grows downward.]

FIGURE 7.3 This diagram shows the structure of a memory hierarchy: as the distance
from the processor increases, so does the size. This structure with the appropriate operating
mechanisms allows the processor to have an access time that is determined primarily by level 1 of the
hierarchy and yet have a memory as large as level n. Maintaining this illusion is the subject of this chapter.
Although the local disk is normally the bottom of the hierarchy, some systems use tape or a file server over a
local area network as the next levels of the hierarchy.
`
`The Basics of Caches
`
In our library example, the desk acted as a cache: a safe place to store things
(books) that we needed to examine. Cache was the name chosen to represent the
level of the memory hierarchy between the processor and main memory in the
first commercial computer to have this extra level. Today, although this remains
the dominant use of the word cache, the term is also used to refer to any storage
managed to take advantage of locality of access. Caches first appeared in research
computers in the early 1960s and in production computers later in that same
decade; every general-purpose computer built today, from servers to low-power
embedded processors, includes caches.
In this section, we begin by looking at a very simple cache in which the processor
requests are each one word and the blocks also consist of a single word. (Readers
already familiar with cache basics may want to skip to Section 7.3 on page 492.)
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Cache: a safe place for hiding or storing things.
Webster's New World Dictionary of the American Language,
Third College Edition (1988)
`
`
`
`4
`
`
`
`
`
`474
`
`Chapter 7 Large and Fast: Exploiting Memory Hierarchy
`
`
`
`
`
`
`
[Figure 7.4: two snapshots of the cache contents, (a) before and (b) after the reference to Xn; entries X1 through Xn-1 are present in both snapshots, and Xn appears only in the second.]

FIGURE 7.4 The cache just before and just after a reference to a word Xn that is not
initially in the cache. This reference causes a miss that forces the cache to fetch Xn from memory and
insert it into the cache.
`
Figure 7.4 shows such a simple cache, before and after requesting a data item that is
not initially in the cache. Before the request, the cache contains a collection of recent
references X1, X2, ..., Xn-1, and the processor requests a word Xn that is not in the
cache. This request results in a miss, and the word Xn is brought from memory into
the cache.
`‘
`there are two questions to
`In looking at
`the scenario in Figure 7.4,
`answer: How do we know if a data item is in the cache? Moreover, if it is, how do
`we find it? The answers to these two questions are related. If each word can go in
`exactly one place in the cache, then it is straightforward to find the word if it is in
`the cache. The simplest way to assign a location in the cache for each word in
`memory is to assign the cache location based on the address of the word in mem-
`ory. This cache structure is called direct mapped, since each memory location is
`mapped directly to exactly one location in the cache. The typical mapping
`between addresses and cache locations for a direct—mapped cache is usually sim—
`ple. For example, almost all direct—mapped caches use the mapping
`
`(Block address) modulo (Number of cache blocks in the cache)
`
This mapping is attractive because if the number of entries in the cache is a power
of two, then modulo can be computed simply by using the low-order log2 (cache
size in blocks) bits of the address; hence the cache may be accessed directly with
the low-order bits. For example, Figure 7.5 shows how the memory addresses
`
`direct-mapped cache A cache
`structure in which each memory
`location is mapped to exactly
`one location in the cache.
`
`
`
`
`
`
`
`
`
`5
`
`
`
`
`
`
[Figure 7.5: an eight-entry cache with index values 000 through 111, above a 32-word memory in which addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 are highlighted to show their mapping into the cache.]

FIGURE 7.5 A direct-mapped cache with eight entries showing the addresses of memory
words between 0 and 31 that map to the same cache locations. Because there are eight words in
the cache, an address X maps to the cache word X modulo 8. That is, the low-order log2(8) = 3 bits are used as
the cache index. Thus, addresses 00001two, 01001two, 10001two, and 11001two all map to entry 001two of the
cache, while addresses 00101two, 01101two, 10101two, and 11101two all map to entry 101two of the cache.
`
between 1ten (00001two) and 29ten (11101two) map to locations 1ten (001two) and
5ten (101two) in a direct-mapped cache of eight words.
Because each cache location can contain the contents of a number of different
`
`memory locations, how do we know whether the data in the cache corresponds to
`a requested word? That is, how do we know whether a requested word is in the
`cache or not? We answer this question by adding a set of tags to the cache. The
`tags contain the address information required to identify whether a word in the
`cache corresponds to the requested word. The tag needs only to contain the upper
`portion of the address, corresponding to the bits that are not used as an index into
`the cache. For example, in Figure 7.5 we need only have the upper 2 of the 5
address bits in the tag, since the lower 3-bit index field of the address selects the
`block. We exclude the index bits because they are redundant, since by definition
`the index field of every address must have the same value.
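The index/tag split described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the text; the eight-block, one-word-block cache and 5-bit addresses follow the example of Figure 7.5.

```python
# Address breakdown for a direct-mapped cache (sketch, assuming the
# eight-block, one-word-block cache of Figure 7.5).
NUM_BLOCKS = 8                              # must be a power of two
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # log2(8) = 3

def split_address(addr):
    """Return (tag, index): the upper bits and the low-order index bits."""
    index = addr % NUM_BLOCKS               # low-order log2(blocks) bits
    tag = addr >> INDEX_BITS                # remaining upper address bits
    return tag, index

# Address 1 (00001two) maps to index 001two with tag 00two;
# address 29 (11101two) maps to index 101two with tag 11two.
print(split_address(1))     # -> (0, 1)
print(split_address(29))    # -> (3, 5)
```

Because the block count is a power of two, the modulo operation is just a selection of the low-order bits, which is why hardware can form the index with wires rather than a divider.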
We also need a way to recognize that a cache block does not have valid
information. For instance, when a processor starts up, the cache does not have good
`
tag A field in a table used for a memory hierarchy that contains the address
information required to identify whether the associated block in the hierarchy
corresponds to a requested word.
`
`
`
`
`
`6
`
`
`
`
`
`476
`Chapter 7 Large and Fast: Exploiting Memory Hierarchy
`
valid bit A field in the tables of a memory hierarchy that indicates that the
associated block in the hierarchy contains valid data.
`
data, and the tag fields will be meaningless. Even after executing many
instructions, some of the cache entries may still be empty, as in Figure 7.4. Thus, we need
`to know that the tag should be ignored for such entries. The most common
`method is to add a valid bit to indicate whether an entry contains a valid address.
`If the bit is not set, there cannot be a match for this block.
`For the rest of this section, we will focus on explaining how reads work in a
`cache and how the cache control works for reads. In general, handling reads is a
`little simpler than handling writes, since reads do not have to change the contents
`of the cache. After seeing the basics of how reads work and how cache misses can
`be handled, we’ll examine the cache designs for real computers and detail how
`these caches handle writes.
`
`Accessing a Cache
`
Figure 7.6 shows the contents of an eight-word direct-mapped cache as it
responds to a series of requests from the processor. Since there are eight blocks in
the cache, the low-order 3 bits of an address give the block number. Here is the
`action for each reference:
`
Decimal address   Binary address   Hit or miss   Assigned cache block
of reference      of reference     in cache      (where found or placed)
----------------------------------------------------------------------
22                10110two         miss (7.6b)   (10110two mod 8) = 110two
26                11010two         miss (7.6c)   (11010two mod 8) = 010two
22                10110two         hit           (10110two mod 8) = 110two
26                11010two         hit           (11010two mod 8) = 010two
16                10000two         miss (7.6d)   (10000two mod 8) = 000two
3                 00011two         miss (7.6e)   (00011two mod 8) = 011two
16                10000two         hit           (10000two mod 8) = 000two
18                10010two         miss (7.6f)   (10010two mod 8) = 010two
`
`
`
`
`
`
When the word at address 18 (10010two) is brought into cache block 2
(010two), the word at address 26 (11010two), which was in cache block 2
(010two), must be replaced by the newly requested data. This behavior allows a
cache to take advantage of temporal locality: recently accessed words replace
less recently referenced words. This situation is directly analogous to needing a
book from the shelves and having no more space on your desk: some book
already on your desk must be returned to the shelves. In a direct-mapped cache,
there is only one place to put the newly requested item and hence only one
choice of what to replace.
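The reference stream in the table above can be replayed with a minimal direct-mapped cache simulation. This is a sketch rather than code from the text: a valid bit and a tag per block, eight one-word blocks, and replacement on every miss.

```python
# Minimal direct-mapped cache simulation (sketch): 8 one-word blocks,
# each with a valid bit and a tag; every miss fetches and replaces.
NUM_BLOCKS = 8

def simulate(addresses):
    valid = [False] * NUM_BLOCKS
    tag = [0] * NUM_BLOCKS
    results = []
    for addr in addresses:
        index = addr % NUM_BLOCKS      # low-order 3 bits pick the block
        t = addr // NUM_BLOCKS         # upper address bits form the tag
        if valid[index] and tag[index] == t:
            results.append("hit")
        else:                          # miss: bring the word into the cache
            valid[index] = True
            tag[index] = t
            results.append("miss")
    return results

# The reference stream from the table: 22, 26, 22, 26, 16, 3, 16, 18.
print(simulate([22, 26, 22, 26, 16, 3, 16, 18]))
# -> ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The final miss on address 18 is the replacement discussed above: 18 and 26 share index 010two but have different tags, so 18 evicts 26.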
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`7
`
`
`
9.6 Network Topologies    9-27

9.6  Network Topologies
`
Chapter 8 reviewed off-the-shelf, switched, local area networks that are the
foundation of clusters. In this section we describe proprietary networks used in
multiprocessors.
Network costs include the number of switches, the number of links on a switch
to connect to the network, the width (number of bits) per link, and length of the
links when the network is mapped into a physical machine. For example, on a
machine that scales between tens and hundreds of processors, some links may be
metal rectangles within a chip that are a few millimeters long, and others may be
cables that must stretch several meters from one cabinet to another. Network
performance is multifaceted as well. It includes the latency on an unloaded network
to send and receive a message, the throughput in terms of the maximum number
of messages that can be transmitted in a given time period, delays caused by
contention for a portion of the network, and variable performance depending on the
pattern of communication. Another obligation of the network may be fault
tolerance, since very large systems may be required to operate in the presence of
broken components.
Networks are normally drawn as graphs, with each arc of the graph
representing a link of the communication network. The processor-memory node is shown
as a black square, and the switch is shown as a colored circle. In this section, all
links are bidirectional; that is, information can flow in either direction. All
networks consist of switches whose links go to processor-memory nodes and to other
switches. The first improvement over a bus is a network that connects a sequence
of nodes together:

[Figure: a sequence of processor-memory nodes, each attached to a switch, with the switches linked in a closed loop.]

This topology is called a ring. Since some nodes are not directly connected, some
messages will have to hop along intermediate nodes until they arrive at the final
destination.
Unlike a bus, a ring is capable of many simultaneous transfers. Because there
are numerous topologies to choose from, performance metrics are needed to
distinguish these designs. Two are popular. The first is total network bandwidth,
which is the bandwidth of each link multiplied by the number of links. This
represents the very best case. For the ring network above with P processors, the total
network bandwidth would be P times the bandwidth of one link; the total
network bandwidth of a bus is just the bandwidth of that bus, or two times the
bandwidth of that link.
`
network bandwidth Informally, the peak transfer rate of a network; can refer to
the speed of a single link or the collective transfer rate of all links in the network.
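The total-bandwidth arithmetic above can be written out as a small sketch. The per-link bandwidth figure below is hypothetical, chosen only to make the ratio visible; the formulas follow the definition in the text (bandwidth of each link times the number of links).

```python
# Sketch of the total network bandwidth metric: per-link bandwidth
# multiplied by the number of links. LINK_BW_GBPS is a made-up figure.
LINK_BW_GBPS = 1.0                   # hypothetical per-link bandwidth

def ring_total_bandwidth(p, link_bw=LINK_BW_GBPS):
    """A P-node ring has P links, so its best-case total is P * link_bw."""
    return p * link_bw

def bus_total_bandwidth(link_bw=LINK_BW_GBPS):
    """A bus is a single shared link, so its total is just that link."""
    return link_bw

print(ring_total_bandwidth(8))   # -> 8.0
print(bus_total_bandwidth())     # -> 1.0
```

As the text notes, this metric is a best case: it assumes every link can transfer simultaneously, which contention in a real workload rarely allows.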
`
`8
`
`