COMPUTER ARCHITECTURE TECHNIQUES FOR POWER-EFFICIENCY
`
`Petitioner Mercedes Ex-1029, 0001
`
`
`
Synthesis Lectures on Computer Architecture
`
`Editor
`Mark D. Hill, University of Wisconsin, Madison
`
Synthesis Lectures on Computer Architecture publishes 50- to 150-page publications on topics
`pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
`components to create computers that meet functional, performance and cost goals.
`
`Computer Architecture Techniques for Power-Efficiency
`Stefanos Kaxiras and Margaret Martonosi
`2008
`
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
`Kunle Olukotun, Lance Hammond, James Laudon
`2007
`
`Transactional Memory
`James R. Larus, Ravi Rajwar
`2007
`
`Quantum Computing for Computer Architects
`Tzvetan S. Metodi, Frederic T. Chong
`2006
`
`
`
`
`Copyright © 2008 by Morgan & Claypool
`
`All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
`any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
`in printed reviews, without the prior permission of the publisher.
`
`Computer Architecture Techniques for Power-Efficiency
`Stefanos Kaxiras and Margaret Martonosi
`www.morganclaypool.com
`
ISBN: 9781598292084 paper
ISBN: 9781598292091 ebook
`
`DOI: 10.2200/S00119ED1V01Y200805CAC004
`
`A Publication in the Morgan & Claypool Publishers series
`SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #4
`
`Lecture #4
`Series Editor: Mark D. Hill, University of Wisconsin, Madison
`
`Library of Congress Cataloging-in-Publication Data
`
Series ISSN: 1935-3235 print
Series ISSN: 1935-3243 electronic
`
`
`ABSTRACT
`In the last few years, power dissipation has become an important design constraint, on par with
`performance, in the design of new computer systems. Whereas in the past, the primary job
`of the computer architect was to translate improvements in operating frequency and transistor
`count into performance, now power efficiency must be taken into account at every step of the
`design process.
While for some time architects were successful in delivering 40% to 50% annual
improvement in processor performance, costs that were previously brushed aside eventually
caught up. The most critical of these costs is the inexorable increase in power dissipation and
power density in processors. Power dissipation issues have catalyzed new topic areas in computer
architecture, resulting in a substantial body of work on more power-efficient architectures.
Power dissipation, coupled with diminishing performance gains, was also the main cause of
the switch from single-core to multi-core architectures and of a slowdown in frequency increases.
This book aims to document some of the most important architectural techniques that
were invented, proposed, and applied to reduce both dynamic and static power dissipation
in processors and memory hierarchies. A significant number of techniques have been proposed
for a wide range of situations, and this book synthesizes those techniques by focusing on their
common characteristics.
`
`KEYWORDS
`Computer power consumption, computer energy consumption, low power computer design,
`computer power efficiency, dynamic power, static power, leakage power, dynamic voltage/
`frequency scaling, computer architecture, computer hardware.
`
Contents

Acknowledgements

1. Introduction
1.1 Brief history of the “power problem”
1.2 CMOS Power Consumption: A Quick Primer
1.2.1 Dynamic Power
1.2.2 Leakage
1.2.3 Other Forms of CMOS Power Dissipation
1.3 Power-Aware Computing Today
1.4 This Book

2. Modeling, Simulation, and Measurement
2.1 Metrics
2.2 Modeling basics
2.2.1 Dynamic-power Models
2.2.2 Leakage Models
2.2.3 Thermal models
2.3 Power Simulation
2.4 Measurement
2.4.1 Performance-Counter-based Power and Thermal Estimates
2.4.2 Imaging and Other Techniques
2.5 Summary

3. Using Voltage and Frequency Adjustments to Manage Dynamic Power
3.1 Dynamic Voltage and Frequency Scaling: Motivation and Overview
3.1.1 Design Issues and Overview
3.2 System-Level DVFS
3.2.1 Eliminating Idle Time
3.2.2 Discovering and Exploiting Deadlines
3.3 Program-Level DVFS
3.3.1 Offline Compiler Analysis
3.3.2 Online Dynamic Compiler analysis
3.3.3 Coarse-Grained Analysis Based on Power Phases
3.4 Program-Level DVFS for Multiple-Clock Domains
3.4.1 DVFS for MCD Processors
3.4.2 Dynamic Work-Steering for MCD Processors
3.4.3 DVFS for Multi-Core Processors
3.5 Hardware-Level DVFS

4. Optimizing Capacitance and Switching Activity to Reduce Dynamic Power
4.1 A Road Map for Effective Switched Capacitance
4.1.1 Excess Switching Activity
4.1.2 Capacitance
4.2 Idle-Unit Switching Activity: Clock gating
4.2.1 Circuit-Level Basics
4.2.2 Precomputation and Guarded Evaluation
4.2.3 Deterministic Clock Gating
4.2.4 Clock gating examples
4.3 Idle-Width Switching Activity: Core
4.3.1 Narrow-Width Operands
4.3.2 Significance Compression
4.3.3 Further Reading on Narrow Width Operands
4.4 Idle-Width Switching Activity: Caches
4.4.1 Dynamic Zero Compression: Accessing Only Significant Bits
4.4.2 Value Compression and the Frequent Value Cache
4.4.3 Packing Compressed Cache Lines: Compression Cache and Significance-Compression Cache
4.4.4 Instruction Compression
4.5 Idle-Capacity Switching Activity
4.5.1 The Power-inefficiency of Out-of-order Processors
4.5.2 Resource Partitioning
4.6 Idle-Capacity Switching Activity: Instruction Queue
4.6.1 Physical Resizing
4.6.2 Readiness Feedback Control
4.6.3 Occupancy Feedback Control
4.6.4 Logical Resizing Without Partitioning
4.6.5 Other Power Optimizations for the Instruction Queue
4.6.6 Related Work on Instruction Windows
4.7 Idle-Capacity Switching Activity: Core
4.8 Idle-Capacity Switching Activity: Caches
4.8.1 Trading Memory Between Cache Levels
4.8.2 Selective Cache Ways
4.8.3 Accounting Cache
4.8.4 CAM-Tag Cache Resizing
4.8.5 Further Reading on Cache Reconfiguration
4.9 Parallel Switching-Activity in Set-Associative Caches
4.9.1 Phased Cache
4.9.2 Sequentially Accessed Set-Associative Cache
4.9.3 Way Prediction
4.9.4 Advanced Way-Prediction Mechanisms
4.9.5 Way Selection
4.9.6 Coherence Protocols
4.10 Cacheable Switching Activity
4.10.1 Work Reuse
4.10.2 Filter Cache
4.10.3 Loop Cache
4.10.4 Trace Cache
4.11 Speculative Activity
4.12 Value-dependent Switching Activity: Bus encodings
4.12.1 Address Buses
4.12.2 Address and Data Buses
4.12.3 Further Reading on Data Encoding
4.13 Dynamic Work Steering

5. Managing Static (Leakage) Power
5.1 A Quick Primer on Leakage Power
5.1.1 Subthreshold Leakage
5.1.2 Gate Leakage
5.2 Architectural Techniques Using the Stacking Effect
5.2.1 Dynamically Resized (DRI) Cache
5.2.2 Cache Decay
5.2.3 Adaptive Cache Decay and Adaptive Mode Control
5.2.4 Decay in the L2
5.2.5 Four-Transistor Memory Cell Decay
5.2.6 Gated Vdd Approaches for Function Units
5.3 Architectural Techniques Using the Drowsy Effect
5.3.1 Drowsy Data Caches
5.3.2 Drowsy Instruction Caches
5.3.3 State Preserving versus No-state Preserving
5.3.4 Temperature
5.3.5 Reliability
5.3.6 Compiler Approaches for Decay and Drowsy Mode
5.4 Architectural Techniques Based on VT
5.4.1 Dynamic Approaches
5.4.2 Static Approaches
5.4.3 Dual-VT in Function Units
5.4.4 Asymmetric Memory Cells

6. Conclusions
6.1 Dynamic power management via Voltage and Frequency Adjustment: Status and Future Trends
6.2 Dynamic Power Reductions based on Effective Capacitance and Activity Factor: Status and Future Trends
6.3 Leakage Power Reductions: Status and Future Trends
6.4 Final Summary

Glossary

Bibliography
`
`
CHAPTER 5
`Managing Static (Leakage) Power
`
Static power consumption has grown into a significant portion of total power consumption in
recent years. In CMOS technology, static power consumption is due to the imperfect nature
of transistors, which “leak” current (thereby constantly consuming power) even when they are
not switching. The advent of this form of static power, called leakage power, was forecast
early on [32, 136], giving architects the opportunity to propose techniques to address it. Such
techniques are the focus of this chapter.
Considerable work to reduce leakage power consumption is taking place at the process
level [31]. In fact, process solutions, such as the high-k dielectric materials in Intel’s 45 nm
process technology, are already employed. Addressing the problem at the architectural level is,
however, indispensable because architectural techniques can be used orthogonally to process
technology solutions. The importance of architectural techniques is magnified by the exponential
dependence of leakage power on various operating parameters, such as supply voltage (Vdd),
temperature (T), and threshold voltage (VT). Exponential dependence implies that a
leakage-reduction solution that works well under specific operating conditions may not be enough:
the problem is bound to reappear with the same intensity as before, but at higher temperatures
or lower voltages.
`Undeniably, the most fruitful ground for developing leakage-reduction techniques at the
`architectural level has been the cache hierarchy. The large number of transistors in the on-chip
`memory largely justifies the effort (or obsession) even though these transistors are not the
`most “leaky”—that distinction goes to the high-speed logic transistors [41]. In addition, the
`regularity of design and the access properties of the memory system have made it an excellent
`target for developing high-level policies to fight leakage. Most of the architectural techniques
`presented in this chapter, therefore, target caches or memory structures.
Chapter structure: The presentation of techniques in this chapter is structured according
to the type of low-level leakage-reduction mechanism employed (Table 5.1). Architectural
techniques inherit similar characteristics according to the physical quantity that is manipulated
by their low-level leakage-reduction mechanism. Here, we concentrate on three major low-level
mechanisms (shown in Table 5.1). The first two, the stacking effect and the drowsy mode,
manipulate the voltage across transistor terminals (source and drain). This affects the magnitude
of leakage reduction, the latency in switching leakage modes, and the ability to retain state in
the low-leakage mode. The third class of low-level mechanisms manipulates the transistor
threshold voltage (VT), which can dramatically decrease leakage but at the cost of reduced
device speed.

TABLE 5.1: Structure of the Leakage-Reduction Techniques in this Chapter.

Section 5.2
  Low-level mechanism: Stacking effect and gated Vdd (a sleep transistor cuts off power)
  Characteristics: Non-state-preserving (state-destroying); significant leakage reduction; power-up latency of tens of cycles
  High-level techniques: Dynamically resized cache (DRI) [239], cache decay [127], adaptive mode control (AMC) [250], functional unit decay [105]

Section 5.3
  Low-level mechanism: Drowsy effect (scales supply voltage to reduce leakage)
  Characteristics: State-preserving; medium leakage reduction; power-up latency of fewer than 10 cycles
  High-level techniques: Drowsy caches [77, 137], drowsy instruction caches [138, 139], hybrid approaches (decay + drowsy) [164], temperature-adaptive approaches [129], compiler approaches and hybrids [246]

Section 5.4
  Low-level mechanism: Threshold voltage (VT) manipulation
  Characteristics: Significant leakage reduction
  High-level techniques (dynamic): Combined Vdd (e.g., DVFS) and VT (e.g., Adaptive Body Biasing, ABB) scaling [163, 231, 70]
  High-level techniques (static): MTCMOS functional units [69], asymmetric memory cells [17, 18]
It is important to note that the techniques presented in this chapter address a specific
type of leakage, called subthreshold leakage. Another type of leakage, called gate-oxide leakage, is
addressed not architecturally but at the process level. To gain a better understanding of
the structure of this chapter, as well as of the difference between the two types of leakage, the
following section (Section 5.1) delves into the underlying mechanics of leakage.
`
5.1 A QUICK PRIMER ON LEAKAGE POWER
Static power is so called because it is consumed by every transistor even when no active switching
is taking place. In older technologies (e.g., NMOS, TTL, and ECL), static power is an inherent
problem because a conducting path from Vdd to ground exists even when transistors are not
switching. With the advent of CMOS, static power became less of a concern because the
complementary gate design prevents conducting paths from Vdd to ground when the gate is not switching.
`Unfortunately, static power resurfaced in CMOS in the form of leakage power. In the latest
`process generations leakage power increases exponentially, principally because of reductions in
`the threshold voltage. Leakage power increased to levels never seen before in CMOS—levels
`comparable to the dynamic (switching) power consumption—when technology scaling entered
`the deep-submicron territory in feature size (<180 nm). Currently, 20–40% of the total power
`consumption is attributed to leakage power.
CMOS static power arises from leakage currents. The total leakage current (Ileak) times
the supply voltage gives the static power consumption, Pleak:

$$P_{leak} = V_{dd} \times I_{leak}$$
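As a quick illustration of the formula above, the sketch below computes Pleak and the resulting leakage share of total power. The chip-level numbers (a 1.1 V supply, 20 A of aggregate leakage current, and 50 W of dynamic power) are hypothetical values chosen only to land inside the 20–40% range quoted above, and the function names are illustrative.

```python
def leakage_power(vdd: float, i_leak: float) -> float:
    """Static power P_leak = Vdd * I_leak (volts x amperes = watts)."""
    return vdd * i_leak


def leakage_share(p_leak: float, p_dynamic: float) -> float:
    """Fraction of total power (leakage + dynamic) attributable to leakage."""
    return p_leak / (p_leak + p_dynamic)


if __name__ == "__main__":
    # Hypothetical chip-level numbers, for illustration only.
    p_leak = leakage_power(vdd=1.1, i_leak=20.0)   # about 22 W
    share = leakage_share(p_leak, p_dynamic=50.0)  # about 0.31
    print(f"P_leak = {p_leak:.1f} W, leakage share = {share:.0%}")
```

Because Pleak scales linearly with Vdd but Ileak itself depends exponentially on device parameters, most of the leverage discussed in this chapter comes from reducing Ileak rather than the voltage term.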
Leakage currents are a manifestation of the true analog nature of transistors, as opposed
to our idealized view of them as perfect digital switches. The state of a transistor (on or off)
is controlled by the voltage on its gate terminal. If this voltage is above the threshold voltage
(VT), the channel beneath the gate conducts, allowing the on-state current (Ion) to flow between
the source and the drain. In the opposite case (gate voltage below VT), we would like to think
that the transistor is off (a perfect insulator). In reality, however, transistors leak: leakage
currents flow even in the off state. This is evident in the I–V curve (Figure 5.1), where current
flows even below the threshold voltage, where the device is supposed to be “off.”
The current that flows between source and drain when the transistor is off is called subthreshold
leakage. But that is not all. There are five more types of leakage: reverse-biased-junction
leakage, gate-induced-drain leakage, gate-oxide leakage, gate-current leakage, and punch-through
leakage. Subthreshold leakage and gate-oxide leakage dominate the total leakage current in
devices. Both increase exponentially with each new technology generation, with gate-oxide
leakage significantly outpacing subthreshold leakage.

[Figure 5.1: I–V curve with the threshold voltage (V threshold) and the supply voltage (V supply) marked on the voltage axis.]

FIGURE 5.1: Example of an “I–V” curve for a semiconductor diode (introduced in Chapter 1).
Although we informally treat semiconductors as switches, their non-ideal analog behavior leads to
leakage currents and other effects.
In sub-micron technologies, subthreshold and gate leakage are the cost we have to pay for
the increased speed afforded by scaling. Supply voltage is scaled to curb the increase in
dynamic power; unfortunately, the accompanying scaling of the threshold voltage and of the
gate-oxide thickness leads to an enormous increase in subthreshold and gate leakage, respectively.
This explains why static power has been gaining on dynamic power as a percentage of total power
consumption with every process generation.
`
`5.1.1 Subthreshold Leakage
Subthreshold leakage increases with technology scaling because of Vdd scaling. The supply voltage
(Vdd) is scaled along with other physical quantities to reduce dynamic power consumption.
Scaling only the supply voltage, however, increases the delay of the transistor (i.e., reduces its
switching speed). This is because the delay is proportional to the supply voltage divided by the
current that flows in the on state, the Ion current (as in the I–V curve of Figure 5.1):

$$\text{Delay} \propto \frac{V_{dd}}{I_{on}} \propto \frac{V_{dd}}{(V_{dd} - V_T)^{\alpha}}$$

The current Ion depends on the difference between the supply voltage and the threshold
voltage (VT), raised to the power α. The factor α is technology-dependent and takes values
greater than 1 (between 1.2 and 1.6 for recent technologies) [195]. Since Vdd is lowered, the only
way to maintain the speed increase from scaling is to also lower the threshold voltage. Herein
lies the problem: subthreshold leakage increases exponentially as the threshold voltage is lowered.
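The trade-off just described can be made concrete with the alpha-power relation. The sketch below assumes α = 1.3 (inside the 1.2–1.6 range cited above) and hypothetical operating points of Vdd = 1.2 V and VT = 0.35 V. It computes delay only up to a constant factor, then bisects for the threshold voltage that restores the original delay after Vdd is lowered to 1.0 V. The function names are illustrative, not taken from any tool.

```python
def relative_delay(vdd: float, v_thresh: float, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor: Vdd / (Vdd - VT)**alpha."""
    if vdd <= v_thresh:
        raise ValueError("Vdd must exceed VT for the transistor to turn on")
    return vdd / (vdd - v_thresh) ** alpha


def threshold_for_delay(vdd: float, target: float, alpha: float = 1.3) -> float:
    """Bisect for the VT at which relative_delay(vdd, VT) meets `target`.

    Delay grows monotonically with VT, so a too-slow midpoint means VT must drop.
    """
    lo, hi = 0.0, vdd - 1e-9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if relative_delay(vdd, mid, alpha) > target:
            hi = mid  # too slow: lower the threshold voltage
        else:
            lo = mid  # fast enough: VT can be a bit higher
    return lo


if __name__ == "__main__":
    # Hypothetical operating points, for illustration only.
    target = relative_delay(1.2, 0.35)         # delay before scaling Vdd
    slower = relative_delay(1.0, 0.35)         # Vdd lowered, VT unchanged
    new_vt = threshold_for_delay(1.0, target)  # VT needed to recover speed
    print(f"delay ratio with VT fixed: {slower / target:.2f}")
    print(f"VT needed at Vdd = 1.0 V: {new_vt:.3f} V")
```

With these assumed numbers, VT has to drop from 0.35 V to roughly 0.26 V to keep the original speed, which is exactly the kind of threshold reduction that drives subthreshold leakage up.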
`
`To understand the basic mechanisms for leakage reduction we have to take a closer look
`at the formulas describing leakage current. We base our discussion on the Berkeley Predictive
`Model (BSIM3V3.2) formula for subthreshold leakage [143] (which is also the starting point
`for the simplified Butts and Sohi models [41] discussed in Chapter 2). The formula describing
`the subthreshold leakage current, IDsub, is:
$$I_{Dsub} = I_{s0}\left(1 - e^{-V_{ds}/v_t}\right) e^{\frac{V_{gs} - V_T - V_{off}}{n \cdot v_t}}$$
`
Here, Vds is the voltage bias across the drain and the source, and Vgs is the voltage bias
across the gate and source terminals. Voff is an empirically determined BSIM model parameter,
and vt (vt = kT/q) is a physical parameter called the thermal voltage,¹ which is proportional to the
temperature, T. The term n encapsulates various device constants, while the term Is0 depends
on the transistor geometry (in particular, the aspect ratio of the transistor, W/L).
This equation immediately shows the dependence of leakage on W/L and its exponential
dependence on Vds, Vgs, VT, and T.
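To make these dependences concrete, the following sketch evaluates the BSIM3-style subthreshold formula above for an off device (Vgs = 0). The device parameters (Is0 = 1 µA, n = 1.5, Voff = −0.08 V) are hypothetical, and Is0 is held constant for simplicity even though a full device model is more elaborate. The sketch only shows the direction and rough magnitude of each effect; it does not model a real process.

```python
import math

BOLTZMANN_K = 1.380649e-23            # J/K (exact SI value)
ELECTRON_CHARGE_Q = 1.602176634e-19   # C (exact SI value)


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage v_t = kT/q in volts (about 26 mV at 300 K)."""
    return BOLTZMANN_K * temp_k / ELECTRON_CHARGE_Q


def subthreshold_current(vds: float, vgs: float, v_thresh: float, temp_k: float,
                         i_s0: float, n: float, v_off: float) -> float:
    """I_Dsub = Is0 * (1 - exp(-Vds/v_t)) * exp((Vgs - VT - Voff) / (n * v_t))."""
    v_t = thermal_voltage(temp_k)
    drain_factor = 1.0 - math.exp(-vds / v_t)                     # ~1 once Vds >> v_t
    gate_factor = math.exp((vgs - v_thresh - v_off) / (n * v_t))  # smaller VT -> more leakage
    return i_s0 * drain_factor * gate_factor


# Hypothetical device parameters; Is0 is held constant for simplicity.
PARAMS = {"i_s0": 1e-6, "n": 1.5, "v_off": -0.08}

if __name__ == "__main__":
    base = subthreshold_current(1.0, 0.0, 0.30, 300.0, **PARAMS)  # off device, Vds = Vdd
    print("raise VT by 100 mV:", base / subthreshold_current(1.0, 0.0, 0.40, 300.0, **PARAMS))
    print("heat to 360 K:     ", subthreshold_current(1.0, 0.0, 0.30, 360.0, **PARAMS) / base)
    print("Vds cut to 20 mV:  ", subthreshold_current(0.02, 0.0, 0.30, 300.0, **PARAMS) / base)
```

With these assumed parameters, raising VT by 100 mV cuts leakage by a factor of about 13, heating the device from 300 K to 360 K raises it by a factor of about 2.6, and reducing Vds to 20 mV roughly halves it.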
`
• W/L, transistor geometry: Leakage grows with the aspect ratio of a transistor and with
its size. Butts and Sohi use simplified models that encapsulate transistor geometry in
the kdesign parameter. They point out that very small transistors, such as those found in
SRAMs, can leak much less than sized-for-performance logic gate transistors. Transistor sizing
is primarily a circuit-level concern, and it will not preoccupy us at the architecture level.
• Vds, voltage differential between the drain and the source: This is probably the most
important parameter for the architectural techniques developed for leakage.
Two important leakage-control techniques that are based on reducing Vds are the
transistor stacking technique² and the drowsy technique, also known as dynamic voltage scaling
(DVS) for leakage [77]. Both techniques rely on the (1 − e^(−Vds/vt)) factor of
the subthreshold leakage equation. This factor is approximately 1 for a large Vds
(i.e., Vds = Vdd and Vdd ≫ vt) but falls off rapidly as Vds is reduced. Architectural
techniques based on transistor stacking (in particular, a stacking technique called
gated Vdd [184]) and on the drowsy technique form the bulk of the work described in
this chapter. The former are presented in Section 5.2 and the latter in Section 5.3.
`
¹ For the thermal voltage equation, k is Boltzmann’s constant and q is the magnitude of the electron’s charge. At
room temperature (T = 300 K), the thermal voltage is about 26 mV.
² The stacking effect itself is also partially due to a change in VT. This change is dynamic and is caused by a
slight reverse bias induced by the top (off) transistor on the bottom (off) transistor.
`
• Vgs, voltage differential between the gate and source: For subthreshold leakage in devices
in their normal “off” state, Vgs can be taken to be zero, so it is not a concern. Butts and Sohi
use this assumption to arrive at their simplified leakage model [41]. However, Vgs plays a
significant role in the gate-oxide leakage discussed in Section 5.1.2.
• VT, threshold voltage: The threshold voltage, the voltage level that switches on the
transistor, significantly affects the magnitude of the leakage current in the off state.
The exponential dependence of subthreshold leakage on VT is evident in the last factor of
the BSIM3 formula, whose exponent contains −VT: the smaller the VT, the higher the leakage.
Raising the threshold voltage reduces subthreshold leakage but compromises switching speed.
Many circuit-level techniques, e.g., MTCMOS, reverse body bias (RBB), and larger-than-Vdd
forward body bias (FBB) [13, 174, 14, 222], have been developed to provide a choice of
threshold voltages. These techniques either provide multiple threshold voltages at the
process level (for example, MTCMOS offers high-VT and low-VT devices) or vary the
threshold voltage dynamically by applying bias voltages to the semiconductor body
(e.g., RBB and larger-than-Vdd FBB). Architectural techniques based on manipulating
the threshold voltage are presented in Section 5.4.
• T, temperature: Last but not least, subthreshold leakage depends exponentially on
temperature, T, via the thermal voltage term vt. This is a dangerous dependence
because it can set off a phenomenon called thermal runaway. If leakage power (or, for that
matter, any other source of power consumption) causes an increase in temperature,
the thermal voltage vt also increases linearly with temperature. This leads, in turn, to
an exponential increase in leakage, which further increases temperature. This vicious
circle of rising temperature and leakage can be severe enough to seriously damage the
semiconductor. The solution is to keep the temperature below a critical threshold
so that thermal runaway cannot occur. Cooling techniques, combined with accurate
thermal monitoring, are used for this purpose.³
  Architecturally, the dependence of leakage on temperature is quite interesting. At low
temperatures, it may not be worthwhile to engage architectural techniques that could hurt
performance for little payoff. As temperature rises and leakage power becomes the dominant
component of power consumption (and hence of heat generation), architectural techniques that
curb leakage become much more appealing. One such example is presented in Section 5.3.4.
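The vicious circle described in the temperature item above can be illustrated with a simple fixed-point iteration: temperature equals ambient plus thermal resistance times total power, while leakage grows exponentially with temperature. This is a deliberately simplified sketch rather than the BSIM3 dependence: it assumes an empirical exponential with a hypothetical e-folding temperature, and every parameter value (thermal resistance, power levels, and the 150 °C cutoff used to declare runaway) is invented for illustration.

```python
import math
from typing import Optional


def steady_state_temperature(
    t_amb: float,            # ambient temperature, deg C
    r_th: float,             # junction-to-ambient thermal resistance, K/W
    p_dyn: float,            # dynamic power, W (held constant here)
    p_leak_ref: float,       # leakage power at t_ref, W
    t_ref: float,            # reference temperature for p_leak_ref, deg C
    t_fold: float,           # hypothetical e-folding temperature of leakage, K
    t_limit: float = 150.0,  # hypothetical cutoff treated as runaway, deg C
    max_iter: int = 1000,
    tol: float = 1e-6,
) -> Optional[float]:
    """Iterate T = T_amb + R_th * (P_dyn + P_leak(T)) toward a fixed point.

    Returns the settled temperature, or None if the iteration crosses
    t_limit (thermal runaway) or fails to settle within max_iter steps.
    """
    temp = t_amb
    for _ in range(max_iter):
        # Simplified empirical model: leakage grows exponentially with temperature.
        p_leak = p_leak_ref * math.exp((temp - t_ref) / t_fold)
        new_temp = t_amb + r_th * (p_dyn + p_leak)
        if new_temp > t_limit:
            return None  # temperature and leakage feed each other without bound
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None


if __name__ == "__main__":
    common = dict(t_amb=45.0, p_dyn=60.0, p_leak_ref=10.0, t_ref=60.0, t_fold=30.0)
    print("r_th = 0.3 K/W:", steady_state_temperature(r_th=0.3, **common))
    print("r_th = 0.6 K/W:", steady_state_temperature(r_th=0.6, **common))
```

With the hypothetical thermal resistance of 0.3 K/W the loop settles near 67 °C; doubling it to 0.6 K/W leaves no stable operating point, and the iteration climbs past the cutoff. Returning None for runaway keeps the sketch small; a real thermal model would track transient behavior, not just the steady state.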
`
³ Unfortunately, the subject of thermal management, despite its importance, is too extensive to receive more
than superficial coverage in the space of this book. Here, it is mentioned only briefly with respect to leakage
(Section 5.3.4).
`
`
`5.1.2 Gate Leakage
`Gate leakage (also known as gate-oxide leakage) is a major concern because of its tremendous
`rate of increase. It grew 100-fold from the 130 nm technology (2001) to the 90 nm technology
`(2003) [31]. Major semiconductor companies are switching to “high-k” dielectrics in their
`process technologies to alleviate this problem [31].
`Gate leakage occurs due to direct tunneling of electrons through the gate insulator—
`commonly silicon dioxide, SiO2—that separates the gate terminal from the transistor channel.
`The thickness, Tox, of the gate SiO2 insulator must also be scaled along with other dimensions
`of the transistor to allow the gate’s electric field to effectively control the conductance of the
channel. The problem is that when the gate insulator becomes very thin, quantum mechanics
allows electrons to tunnel across. When the insulating layer is thick, the probability of tunneling
`a