`A Quantitative Approach
`
`Third Edition
`
`John L. Hennessy
`Stanford University
`
`David A. Patterson
`University of California at Berkeley
`
`With Contributions by
`
`David Goldberg
`Xerox Palo Alto Research Center
`
`Krste Asanovic
`Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
`
MORGAN KAUFMANN PUBLISHERS
AN IMPRINT OF ELSEVIER

AMSTERDAM, BOSTON, LONDON, NEW YORK, OXFORD, PARIS, SAN DIEGO, SAN FRANCISCO, SINGAPORE, SYDNEY, TOKYO
`
`Oracle-1042 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`Senior Editor Denise E. M. Penrose
`Assistant Publishing Services Manager Edward Wade
`Senior Production Editor Cheri Palmer
`Editorial Coordinator Alyson Day
`Cover Design Ross Carron Design
`Cover Image Greg Pease/ gettyimages
`Text Design Rebecca Evans & Associates
`Technical Illustration Lineworks, Inc.
`Composition Nancy Logan
`Copyeditor Ken DellaPenta
`Proofreader Jennifer McClain
`Indexer Ty Koontz
`Printer Courier Corporation
`
Designations used by companies to distinguish their products are often claimed as trademarks or
registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the
`product names appear in initial capital or all capital letters. Readers, however, should contact the
`appropriate companies for more complete information regarding trademarks and registration.
`
`Morgan Kaufmann Publishers
`An Imprint of Elsevier
`340 Pine Street, Sixth Floor, San Francisco, CA 94104-3205, USA
`www.mkp.com
`
`© 1990, 1996, 2003 by Elsevier.
`All rights reserved
`
`Published 1990. Third edition 2003
`Printed in the United States of America
07 06 05
10 9 8 7 6 5 4
`
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means (electronic, mechanical, photocopying, or otherwise) without the prior written
permission of the publisher.
`Permissions may be sought directly from Elsevier's Science and Technology Rights Department in
`Oxford, UK. Phone: (44) 1865 843830, Fax: (44) 1865 853333, e-mail: permissions@elsevier.co.uk.
You may also complete your request online via the Elsevier homepage (http://www.elsevier.com) by
selecting "Customer Support" and then "Obtaining Permissions".
`
`ADVICE, PRAISE, & ERRORS: Any correspondence related to this publication or intended for the
`authors should be addressed to ca3comments@mkp.com. Information regarding error sightings is also
`encouraged. Any error sightings that are accepted for correction in subsequent printings will be
`rewarded by the authors with a payment of $1.00 (U.S.) per correction upon availability of the new
printing. Bugs can be sent to cabugs@mkp.com. (Please include your full name and permanent mailing
address.)
`
`Library of Congress Control Number: 2001099789
`
`ISBN-13: 978-1-55860-596-1
`ISBN-10: 1-55860-596-7 (cloth)
`
`ISBN-13: 978-1-55860-724-8
ISBN-10: 1-55860-724-2 (paper)
`
`This book is printed on acid-free paper.
`
`2.16 Historical Perspective and References
`
151
`
`translate directly to cost-performance, and stack computers faded out shortly
`after this work.
`Strecker's article [1978] discusses how he and the other architects at DEC
`responded to this by designing the VAX architecture. The VAX was designed to
`simplify compilation of high-level languages. Compiler writers had complained
`about the lack of complete orthogonality in the PDP-11. The VAX architecture
`was designed to be highly orthogonal and to allow the mapping of a high-level
language statement into a single VAX instruction. Additionally, the VAX
designers tried to optimize code size because compiled programs were often too large
`for available memories. Appendix E summarizes this instruction set.
`The VAX-11/780 was the first computer announced in the VAX series. It is
one of the most successful (and most heavily studied) computers ever built. The
`cornerstone of DEC's strategy was a single architecture, VAX, running a single
`operating system, VMS. This strategy worked well for over 10 years. The large
`number of papers reporting instruction mixes, implementation measurements,
`and analysis of the VAX makes it an ideal case study [Wiecek 1982; Clark and
Levy 1982]. Bhandarkar and Clark [1991] give a quantitative analysis of the
disadvantages of the VAX versus a RISC computer, essentially a technical
explanation for the demise of the VAX.
While the VAX was being designed, a more radical approach, called high-level
language computer architecture (HLLCA), was being advocated in the
research community. This movement aimed to eliminate the gap between high-level
languages and computer hardware (what Gagliardi [1973] called the
"semantic gap") by bringing the hardware "up to" the level of the programming
language. Meyers [1982] provides a good summary of the arguments and a
history of high-level language computer architecture projects.
HLLCA never had a significant commercial impact. The increase in memory
size on computers eliminated the code size problems arising from high-level
languages and enabled operating systems to be written in high-level languages. The
combination of simpler architectures together with software offered greater
performance and more flexibility at lower cost and lower complexity.
`
`Reduced Instruction Set Computers
`
`In the early 1980s, the direction of computer architecture began to swing away
`from providing high-level hardware support for languages. Ditzel and Patterson
[1980] analyzed the difficulties encountered by the high-level language
architectures and argued that the answer lay in simpler architectures. In another paper
`[Patterson and Ditzel 1980], these authors first discussed the idea of reduced
`instruction set computers (RISC) and presented the argument for simpler
`architectures. Clark and Strecker [1980], who were VAX architects, rebutted their
`proposal.
`The simple load-store computers such as MIPS are commonly called RISC
`architectures. The roots of RISC architectures go back to computers like the
`6600, where Thornton, Cray, and others recognized the importance of instruction
`
Chapter Two Instruction Set Principles and Examples
`
`set simplicity in building a fast computer. Cray continued his tradition of keeping
`computers simple in the CRAY-1. Commercial RISCs are built primarily on the
`work of three research projects: the Berkeley RISC processor, the IBM 801, and
the Stanford MIPS processor. These architectures have attracted enormous
industrial interest because of claims of a performance advantage of anywhere from two
`to five times over other computers using the same technology.
`Begun in 1975, the IBM project was the first to start but was the last to
`become public. The IBM computer was designed as a 24-bit ECL minicomputer,
`while the university projects were both MOS-based, 32-bit microprocessors. John
Cocke is considered the father of the 801 design. He received both the
Eckert-Mauchly and Turing awards in recognition of his contribution. Radin [1982]
`describes the highlights of the 801 architecture. The 801 was an experimental
`project that was never designed to be a product. In fact, to keep down cost and
`complexity, the computer was built with only 24-bit registers.
`In 1980, Patterson and his colleagues at Berkeley began the project that was
`to give this architectural approach its name (see Patterson and Ditzel [1980]).
`They built two computers called RISC-I and RISC-II. Because the IBM project
was not widely known or discussed, the role played by the Berkeley group in
promoting the RISC approach was critical to the acceptance of the technology. They
also built one of the first instruction caches to support hybrid-format RISCs (see
Patterson et al. [1983]). It supported 16-bit and 32-bit instructions in memory but
held all instructions as 32 bits in the cache. The Berkeley group went on to build RISC
computers targeted toward Smalltalk, described by Ungar et al. [1984], and LISP, described by
`Taylor et al. [1986].
`In 1981, Hennessy and his colleagues at Stanford published a description of
the Stanford MIPS computer. Efficient pipelining and compiler-assisted
scheduling of the pipeline were both important aspects of the original MIPS design. MIPS
`stood for Microprocessor without Interlocked Pipeline Stages, reflecting the lack
`of hardware to stall the pipeline, as the compiler would handle dependencies.
These early RISC computers (the 801, RISC-II, and MIPS) had much in
common. Both university projects were interested in designing a simple computer
that could be built in VLSI within the university environment. All three
computers used a simple load-store architecture, fixed-format 32-bit instructions, and
emphasized efficient pipelining. Patterson [1985] describes the three computers
and the basic design principles that have come to characterize what a RISC
computer is. Hennessy [1984] provides another view of the same ideas, as well as
other issues in VLSI processor design.
In 1985, Hennessy published an explanation of the RISC performance
advantage and traced its roots to a substantially lower CPI: under 2 for a RISC
processor and over 10 for a VAX-11/780 (though not with identical workloads). A paper
by Emer and Clark [1984] characterizing VAX-11/780 performance was
instrumental in helping the RISC researchers understand the source of the performance
advantage seen by their computers.
Since the university projects finished up, in the 1983-84 time frame, the
technology has been widely embraced by industry. Many manufacturers of the early
computers (those made before 1986) claimed that their products were RISC
computers. These claims, however, were often born more of marketing ambition than
of engineering reality.
`In 1986, the computer industry began to announce processors based on the
`technology explored by the three RISC research projects. Moussouris et al.
`[1986] describe the MIPS R2000 integer processor, while Kane's book [1986] is
a complete description of the architecture. Hewlett-Packard converted their
existing minicomputer line to RISC architectures; Lee [1989] describes the HP
Precision Architecture. IBM never directly turned the 801 into a product. Instead, the
ideas were adopted for a new, low-end architecture that was incorporated in the
IBM RT-PC and described in a collection of papers [Waters 1986]. In 1990, IBM
announced a new RISC architecture (the RS/6000), which was the first superscalar
RISC processor (see Chapter 4). In 1987, Sun Microsystems began delivering
computers based on the SPARC architecture, a derivative of the Berkeley RISC-II
processor; SPARC is described in Garner et al. [1988]. The PowerPC joined the
`forces of Apple, IBM, and Motorola. Appendix C summarizes several RISC
`architectures.
To help resolve the RISC versus traditional design debate, designers of VAX
processors later performed a quantitative comparison of VAX and a RISC
processor for implementations with comparable organizations. Their choices were the
VAX 8700 and the MIPS M2000. The differing goals for VAX and MIPS have led
to very different architectures. The VAX goals, simple compilers and code
density, led to powerful addressing modes, powerful instructions, efficient instruction
encoding, and few registers. The MIPS goals were high performance via
pipelining, ease of hardware implementation, and compatibility with highly optimizing
compilers. These goals led to simple instructions, simple addressing modes,
fixed-length instruction formats, and a large number of registers.
Figure 2.41 shows the ratio of the number of instructions executed, the ratio of
CPIs, and the ratio of performance measured in clock cycles. Since the organizations
were similar, clock cycle times were assumed to be the same. MIPS executes about
twice as many instructions as the VAX, while the CPI for the VAX is about six times
larger than that for the MIPS. Hence, the MIPS M2000 has almost three times the
performance of the VAX 8700. Furthermore, much less hardware is needed to build
the MIPS processor than the VAX processor. This cost-performance gap is the
reason the company that used to make the VAX has dropped it and is now making the
Alpha, which is quite similar to MIPS. Bell and Strecker [1998] summarize the
debate inside the company.
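As a rough check of the arithmetic in the paragraph above: with equal clock cycle times, execution time in clock cycles is instruction count times CPI, so the MIPS performance advantage equals the VAX-to-MIPS CPI ratio divided by the MIPS-to-VAX instruction-count ratio. The Python sketch below assumes the rounded values quoted in the text (2 and 6); the function and constant names are illustrative only. Because the inputs are rounded, it yields exactly 3 rather than the "almost three" of the measured averages.

```python
# Rounded ratios taken from the text (MIPS M2000 relative to VAX 8700).
# Illustrative approximations only, not the per-benchmark data of Figure 2.41.
MIPS_OVER_VAX_INSTRUCTIONS = 2.0  # MIPS executes about twice as many instructions
VAX_OVER_MIPS_CPI = 6.0           # VAX CPI is about six times the MIPS CPI


def mips_speedup(ic_ratio_mips_over_vax: float, cpi_ratio_vax_over_mips: float) -> float:
    """Return VAX time / MIPS time, assuming equal clock cycle times.

    Clock cycles = instruction count * CPI, so
      cycles_vax / cycles_mips = (IC_vax * CPI_vax) / (IC_mips * CPI_mips)
                               = cpi_ratio_vax_over_mips / ic_ratio_mips_over_vax
    """
    return cpi_ratio_vax_over_mips / ic_ratio_mips_over_vax


if __name__ == "__main__":
    print(mips_speedup(MIPS_OVER_VAX_INSTRUCTIONS, VAX_OVER_MIPS_CPI))  # 3.0
```

The same division could be applied to each SPEC89 program's own instruction and CPI ratios to reproduce the individual performance bars, rather than to the rounded averages used here.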
`Looking back, only one complex instruction set computer (CISC) instruction
`set survived the RISC/CISC debate, and that one had binary compatibility with
`PC software. The volume of chips is so high in the PC industry that there is a
sufficient revenue stream to pay the extra design costs (and sufficient resources,
due to Moore's Law, to build microprocessors that translate from CISC to RISC
internally). Any loss in efficiency, due to longer pipeline stages and bigger
die size to accommodate translation on the chip, was offset by having a
semiconductor fabrication line dedicated to producing just these microprocessors. The
high volumes justify the economics of a fab line tailored to these chips.
`
[Figure 2.41 chart: MIPS/VAX ratio (y-axis, 0.0 to 4.0) for each of the SPEC89 benchmarks (x-axis), with bars for performance ratio, instructions executed ratio, and CPI ratio.]
`
`Figure 2.41 Ratio of MIPS M2000 to VAX 8700 in instructions executed and performance in clock cycles using
`SPEC89 programs. On average, MIPS executes a little over twice as many instructions as the VAX, but the CPI for
`the VAX is almost six times the MIPS CPI, yielding almost a threefold performance advantage. (Based on data from
Bhandarkar and Clark [1991].)
`
Thus, in the desktop/server market, RISC computers use compilers to
translate into RISC instructions and the remaining CISC computer uses hardware to
`translate into RISC instructions. One recent novel variation for the laptop market
`is the Transmeta Crusoe (see Section 4.8), which interprets 80x86 instructions
`and compiles on the fly into internal instructions.
`The embedded market, which competes in cost and power, cannot afford the
`luxury of hardware translation and thus uses compilers and RISC architectures.
In 2000, more than twice as many 32-bit embedded microprocessors were shipped
as PC microprocessors, with RISC processors responsible for over 90% of that
embedded market.
`
`A Brief History of Digital Signal Processors
`
`(Jeff Bier prepared this DSP history.)
`
In the late 1990s, digital signal-processing (DSP) applications, such as digital
cellular telephones, emerged as one of the largest consumers of embedded
computing power. Today, microprocessors specialized for DSP applications
(sometimes called digital signal processors, DSPs, or DSP processors) are used in