`
`VASANTH VENKATACHALAM AND MICHAEL FRANZ
`
`University of California, Irvine
`
`Power consumption is a major factor that limits the performance of computers. We
`survey the “state of the art” in techniques that reduce the total power consumed by a
`microprocessor system over time. These techniques are applied at various levels
`ranging from circuits to architectures, architectures to system software, and system
`software to applications. They also include holistic approaches that will become more
`important over the next decade. We conclude that power management is a multifaceted
`discipline that is continually expanding with new techniques being developed at every
`level. These techniques may eventually allow computers to break through the “power
`wall” and achieve unprecedented levels of performance, versatility, and reliability. Yet it
`remains too early to tell which techniques will ultimately solve the power problem.
`
`Categories and Subject Descriptors: C.5.3 [Computer System Implementation]:
`Microcomputers—Microprocessors; D.2.10 [Software Engineering]: Design—
`Methodologies; I.m [Computing Methodologies]: Miscellaneous
`General Terms: Algorithms, Design, Experimentation, Management, Measurement,
`Performance
`Additional Key Words and Phrases: Energy dissipation, power reduction
`
`1. INTRODUCTION
`
`Computer scientists have always tried to
`improve the performance of computers.
`But although today’s computers are much
`faster and far more versatile than their
`predecessors, they also consume a lot
`
`of power; so much power, in fact, that
`their power densities and concomitant
`heat generation are rapidly approaching
`levels comparable to nuclear reactors
`(Figure 1). These high power densities
impair chip reliability and life expectancy, increase cooling costs, and, for large data centers, even raise environmental concerns.

Parts of this effort have been sponsored by the National Science Foundation under ITR grant CCR-0205712
and by the Office of Naval Research under grant N00014-01-1-0854.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
authors and should not be interpreted as necessarily representing the official views, policies or endorsements,
either expressed or implied, of the National Science Foundation (NSF), the Office of Naval Research (ONR),
or any other agency of the U.S. Government.
The authors also gratefully acknowledge gifts from Intel, Microsoft Research, and Sun Microsystems that
partially supported this work.
Authors’ addresses: Vasanth Venkatachalam, School of Information and Computer Science, University of
California at Irvine, Irvine, CA 92697-3425; email: vvenkata@uci.edu; Michael Franz, School of Information
and Computer Science, University of California at Irvine, Irvine, CA 92697-3425; email: franz@uci.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or direct commercial advantage and
that copies show this notice on the first page or initial screen of a display along with the full citation.
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212)
869-0481, or permissions@acm.org.
©2005 ACM 0360-0300/05/0900-0195 $5.00

ACM Computing Surveys, Vol. 37, No. 3, September 2005, pp. 195–237.

[Figure 1: power density by technology generation (1.5µ down to 0.07µ), with the i386, i486, and Pentium processors plotted and the rocket nozzle and Sun's surface marked as reference levels.]
Fig. 1. Power densities rising. Figure adapted from Pollack 1999.
`At the other end of the performance
`spectrum, power issues also pose prob-
`lems for smaller mobile devices with lim-
`ited battery capacities. Although one could
`give these devices faster processors and
`larger memories, this would diminish
`their battery life even further.
Without cost-effective solutions to the power problem, improvements in microprocessor technology will eventually reach a standstill. Power management is a multidisciplinary field that involves many aspects (e.g., energy, temperature, reliability), each of which is complex enough to merit a survey of its own. The focus of our survey will be on techniques that reduce the total power consumed by typical microprocessor systems.
`We will follow the high-level taxonomy
`illustrated in Figure 2. First, we will de-
`fine power and energy and explain the
`complex parameters that dynamic and
`static power depend on (Section 2). Next,
`we will introduce techniques that reduce
`power and energy (Section 3), starting
`with circuit (Section 3.1) and architec-
`tural techniques (Section 3.2, Section 3.3,
`and Section 3.4), and then moving on to
`two techniques that are widely applied in
`hardware and software, dynamic voltage
`scaling (Section 3.5) and resource hiberna-
`tion (Section 3.6). Third, we will examine
`what compilers can do to manage power
`(Section 3.7). We will then discuss recent
work in application-level power management (Section 3.8), and recent efforts (Section 3.9) to develop a holistic solution to the power problem. Finally, we will discuss some commercial power management systems (Section 3.10) and provide a glimpse into some more radical technologies that are emerging (Section 3.11).
`
`(1)
`
`2. DEFINING POWER
`Power and energy are commonly defined
`in terms of the work that a system per-
`forms. Energy is the total amount of work
`a system performs over a period of time,
`while power is the rate at which the sys-
`tem performs that work. In formal terms,
`P = W /T
`E = P ∗ T,
`(2)
`where P is power, E is energy, T is a spe-
`cific time interval, and W is the total work
`performed in that interval. Energy is mea-
`sured in joules, while power is measured
`in watts.
`These concepts of work, power, and en-
`ergy are used differently in different con-
`texts. In the context of computers, work
`involves activities associated with run-
`ning programs (e.g., addition, subtraction,
`memory operations), power is the rate at
`which the computer consumes electrical
`energy (or dissipates it in the form of heat)
`while performing these activities, and en-
`ergy is the total electrical energy the com-
`puter consumes (or dissipates as heat)
`over time.
`This distinction between power and en-
`ergy is important because techniques that
`reduce power do not necessarily reduce en-
`ergy. For example, the power consumed
`by a computer can be reduced by halv-
`ing the clock frequency, but if the com-
`puter then takes twice as long to run
`the same programs, the total energy con-
`sumed will be similar. Whether one should
`reduce power or energy depends on the
`context. In mobile applications, reducing
`energy is often more important because
`it increases the battery lifetime. How-
`ever, for other systems (e.g., servers), tem-
`perature is a larger issue. To keep the
`temperature within acceptable limits, one
`would need to reduce instantaneous power
`regardless of the impact on total energy.
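The halving-frequency example can be checked with Equation (2). The sketch below is illustrative only: it assumes, for simplicity, that the supply voltage is held fixed, that power scales linearly with clock frequency, and that run time scales inversely with it; the 20 W and 10 s baseline figures are hypothetical.

```python
def energy_joules(power_watts: float, time_seconds: float) -> float:
    """Equation (2): E = P * T."""
    return power_watts * time_seconds


# Hypothetical baseline: a program that runs for 10 s at 20 W.
base_power, base_time = 20.0, 10.0

# Halving the clock frequency at fixed voltage (assumption: P scales with f)
# halves power but doubles the run time of the same program.
half_f_power, half_f_time = base_power / 2, base_time * 2

base_energy = energy_joules(base_power, base_time)
half_f_energy = energy_joules(half_f_power, half_f_time)

print(f"baseline:       {base_power:.0f} W, {base_energy:.0f} J")
print(f"half frequency: {half_f_power:.0f} W, {half_f_energy:.0f} J")
```

Both runs consume 200 J: power drops, but energy does not, which is why a mobile system and a thermally limited server may prefer different techniques.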
`
`
`Power Reduction Techniques for Microprocessor Systems
`
`197
`
[Figure 2: High Level Organization Of Survey. Defining Power (Section 2); Reducing Power (Section 3), with branches for Interconnect Optimizations (3.2), Memory Optimizations (3.3), Hardware Adaptations (3.4), Dynamic Voltage Scaling (3.5), Resource Hibernation (3.6), Compiler Techniques (3.7), Application Level Techniques (3.8), Crosslayer Adaptations (3.9), Commercial Systems (3.10), and Emerging Technologies (3.11).]
Fig. 2. Organization of this survey.
`
`2.1. Dynamic Power Consumption
`There are two forms of power consump-
`tion, dynamic power consumption and
`static power consumption. Dynamic power
`consumption arises from circuit activity
`such as the changes of
`inputs in an
`adder or values in a register. It has two
`sources, switched capacitance and short-
`circuit current.
`Switched capacitance is the primary
`source of dynamic power consumption and
`arises from the charging and discharging
`of capacitors at the outputs of circuits.
Short-circuit current is a secondary
source of dynamic power consumption and
accounts for only 10-15% of the total power
`consumption. It arises because circuits are
`composed of transistors having opposite
`polarity, negative or NMOS and positive
`or PMOS. When these two types of tran-
`sistors switch current, there is an instant
`when they are simultaneously on, creat-
`ing a short circuit. We will not deal fur-
`ther with the power dissipation caused by
`this short circuit because it is a smaller
`percentage of total power, and researchers
`have not found a way to reduce it without
`sacrificing performance.
As the following equation shows, the more dominant component of dynamic power, switched capacitance (Pdynamic), depends on four parameters, namely, supply voltage (V), clock frequency (f), physical capacitance (C), and an activity factor (a) that relates to how many 0 → 1 or 1 → 0 transitions occur in a chip:
$$P_{dynamic} \sim aCV^2 f. \qquad (3)$$
`
`Accordingly, there are four ways to re-
`duce dynamic power consumption, though
`they each have different tradeoffs and not
`all of them reduce the total energy con-
`sumed. The first way is to reduce the phys-
`ical capacitance or stored electrical charge
`of a circuit. The physical capacitance de-
`pends on low level design parameters
`such as transistor sizes and wire lengths.
`One can reduce the capacitance by re-
`ducing transistor sizes, but this worsens
`performance.
[Figure 3: ITRS trend for leakage power dissipation, plotted as a percentage (20% to 100%) against year (1999 to 2009).]
Fig. 3. ITRS trends for leakage power dissipation. Figure adapted from Meng et al., 2005.

The second way to lower dynamic power is to reduce the switching activity. As computer chips get packed with increasingly complex functionalities, their switching activity increases [De and Borkar 1999], making it more important to develop techniques that fall into this category. One popular technique, clock gating, prevents the clock signal from reaching idle functional units. Because the clock network accounts for a large fraction of a chip’s total energy consumption, this is a very effective way of reducing power and energy throughout a processor and is implemented in numerous commercial systems including the Pentium 4, Pentium M, Intel XScale and Tensilica Xtensa, to mention but a few.
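The effect of clock gating can be illustrated by counting the clock edges that actually reach a functional unit. This sketch rests on two simplifying assumptions of ours: every delivered clock edge costs one unit of clock energy, and the gating logic itself is free. The busy/idle trace is hypothetical.

```python
def clock_edges_delivered(busy: list[bool], gated: bool) -> int:
    """Count clock edges reaching a unit over a run.

    Without gating the unit is clocked every cycle; with gating the
    clock is blocked in cycles where the unit is idle.
    """
    if not gated:
        return len(busy)
    return sum(1 for is_busy in busy if is_busy)


# Hypothetical 10-cycle trace: the unit does useful work in 3 cycles.
busy_trace = [True, False, False, True, False, False, False, True, False, False]

ungated = clock_edges_delivered(busy_trace, gated=False)
gated = clock_edges_delivered(busy_trace, gated=True)
saving = 1 - gated / ungated  # fraction of clock energy avoided (idealized)
print(f"clock energy saved by gating: {saving:.0%}")
```

The saving equals the idle fraction of the unit; in practice it is reduced by the energy of the enable logic, which this model ignores.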
`The third way to reduce dynamic power
`consumption is to reduce the clock fre-
`quency. But as we have just mentioned,
`this worsens performance and does not al-
`ways reduce the total energy consumed.
`One would use this technique only if
`the target system does not support volt-
`age scaling and if the goal
`is to re-
`duce the peak or average power dissi-
`pation and indirectly reduce the chip’s
`temperature.
`The fourth way to reduce dynamic power
`consumption is to reduce the supply volt-
`age. Because reducing the supply voltage
`increases gate delays, it also requires re-
`ducing the clock frequency to allow the cir-
`cuit to work properly.
`The combination of scaling the supply
`voltage and clock frequency in tandem is
`called dynamic voltage scaling (DVS). This
`technique should ideally reduce dynamic
`power dissipation cubically because dy-
`namic power is quadratic in voltage and
`linear in clock frequency. This is the most
`widely adopted technique. A growing num-
`ber of processors, including the Pentium
`M, mobile Pentium 4, AMD’s Athlon, and
Transmeta’s Crusoe and Efficeon processors allow software to adjust clock frequencies and voltage settings in tandem.
`However, DVS has limitations and cannot
`always be applied, and even when it can
`be applied, it is nontrivial to apply as we
`will see in Section 3.5.
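Equation (3) makes the difference between frequency scaling alone and DVS easy to quantify. The sketch assumes, for illustration only, that supply voltage can be lowered in exact proportion to frequency (real processors expose a discrete table of voltage/frequency pairs with a minimum voltage) and that a task's run time scales as 1/f; a and C are held fixed.

```python
def relative_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to baseline, from P ~ a*C*V^2*f with a and C fixed."""
    return v_scale ** 2 * f_scale


def relative_energy(v_scale: float, f_scale: float) -> float:
    """Energy for a fixed task: power times run time, assuming run time ~ 1/f."""
    return relative_power(v_scale, f_scale) / f_scale


s = 0.5  # run at half the baseline frequency

# Frequency scaling alone: voltage unchanged.
freq_only = (relative_power(1.0, s), relative_energy(1.0, s))

# DVS: voltage lowered in proportion to frequency (idealized assumption).
dvs = (relative_power(s, s), relative_energy(s, s))

print(f"frequency only: power x{freq_only[0]}, energy x{freq_only[1]}")
print(f"DVS:            power x{dvs[0]}, energy x{dvs[1]}")
```

Under these assumptions DVS cuts power to 1/8 (the cubic reduction) and energy to 1/4, whereas frequency scaling alone halves power but leaves energy unchanged.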
`
2.2. Understanding Leakage Power Consumption
In addition to consuming dynamic power, computer components consume static power, also known as idle power or leakage. According to the most recently published industrial roadmaps [ITRSRoadMap], leakage power is rapidly becoming the dominant source of power consumption in circuits (Figure 3) and persists whether a computer is active or idle. Because its causes are different from those of dynamic power, dynamic power reduction techniques do not necessarily reduce the leakage power.
As the equation that follows illustrates, leakage power consumption is the product of the supply voltage (V) and leakage current (Ileak), or parasitic current, that flows through transistors even when the transistors are turned off.
$$P_{leak} = V I_{leak}. \qquad (4)$$
`
`To understand how leakage current
`arises, one must understand how transis-
`tors work. A transistor regulates the flow
`of current between two terminals called
`the source and the drain. Between these
`two terminals is an insulator, called the
channel, that resists current. As the voltage
at a third terminal, the gate, is increased,
electrical charge accumulates in
`the channel, reducing the channel’s resis-
`tance and creating a path along which
`electricity can flow. Once the gate volt-
`age is high enough, the channel’s polar-
`ity changes, allowing the normal flow of
`current between the source and the drain.
`The threshold at which the gate’s voltage
`is high enough for the path to open is called
`the threshold voltage.
`According to this model, a transistor is
`similar to a water dam. It is supposed
`to allow current to flow when the gate
`voltage exceeds the threshold voltage but
`should otherwise prevent current from
`flowing. However, transistors are imper-
`fect. They leak current even when the gate
`voltage is below the threshold voltage. In
`fact, there are six different types of cur-
`rent that leak through a transistor. These
`
`
include reverse-biased-junction leakage, gate-induced-drain leakage, subthreshold leakage, gate-oxide leakage, gate-current leakage, and punch-through leakage. Of these six, subthreshold leakage and gate-oxide leakage dominate the total leakage current.
Gate-oxide leakage flows from the gate of a transistor into the substrate. This type of leakage current depends on the thickness of the oxide material that insulates the gate:
$$I_{ox} = K_2 W \left(\frac{V}{T_{ox}}\right)^2 e^{-\alpha \frac{T_{ox}}{V}}. \qquad (5)$$
According to this equation, the gate-oxide leakage Iox increases exponentially as the thickness Tox of the gate’s oxide material decreases. This is a problem because future chip designs will require the thickness to be reduced along with other scaled parameters such as transistor length and supply voltage. One way of solving this problem would be to insulate the gate using a high-k dielectric material instead of the oxide materials that are currently used. This solution is likely to emerge over the next decade.
Subthreshold leakage current flows between the drain and source of a transistor. It is the dominant source of leakage and depends on a number of parameters that are related through the following equation:
$$I_{sub} = K_1 W e^{\frac{-V_{th}}{nT}} \left(1 - e^{\frac{-V}{T}}\right). \qquad (6)$$
In this equation, W is the gate width and K1 and n are constants. The important parameters are the supply voltage V, the threshold voltage Vth, and the temperature T. The subthreshold leakage current Isub increases exponentially as the threshold voltage Vth decreases. This again raises a problem for future chip designs, because as technology scales, threshold voltages will have to scale along with supply voltages.
The increase in subthreshold leakage current causes another problem. When the leakage current increases, the temperature increases. But as Equation (6) shows, this increases leakage further, causing yet higher temperatures. This vicious cycle is known as thermal runaway. It is the chip designer’s worst nightmare.
How can these problems be solved? Equations (4) and (6) indicate four ways to reduce leakage power. The first way is to reduce the supply voltage. As we will see, supply voltage reduction is a very common technique that has been applied to components throughout a system (e.g., processor, buses, cache memories).
The second way to reduce leakage power is to reduce the size of a circuit because the total leakage is proportional to the leakage dissipated in all of a circuit’s transistors. One way of doing this is to design a circuit with fewer transistors by omitting redundant hardware and using smaller caches, but this may limit performance and versatility. Another idea is to reduce the effective transistor count dynamically by cutting the power supplies to idle components. Here, too, there are challenges such as how to predict when different components will be idle and how to minimize the overhead of switching them on or off. This is also a common approach, of which we will see examples in the sections to follow.
The third way to reduce leakage power is to cool the computer. Several cooling techniques have been developed since the 1960s. Some blow cold air into the circuit, while others refrigerate the processor [Schmidt and Notohardjono 2002], sometimes even by costly means such as circulating cryogenic fluids like liquid nitrogen [Krane et al. 1988]. These techniques have three advantages. First, they significantly reduce subthreshold leakage. In fact, a recent study [Schmidt and Notohardjono 2002] showed that cooling a memory cell by 50 degrees Celsius reduces the leakage energy by a factor of five. Second, these techniques allow a circuit to work faster because electricity encounters less resistance at lower temperatures. Third, cooling eliminates some negative effects of high temperatures, namely the degradation of a chip’s reliability and life expectancy. Despite these advantages, there are issues to consider such as the costs of the hardware used to cool the circuit. Moreover, cooling techniques are insufficient if they result in wide temperature variations in different parts of a circuit. Rather, one needs to prevent hotspots by distributing heat evenly throughout a chip.

[Figure 4: (a) three off transistors A, B, and C stacked between Vdd and Ground, leaking Ileak1; (b) a single off transistor B between Vdd and Ground, leaking Ileak2.]
Fig. 4. The transistor stacking effect. The circuits in both (a) and (b) are leaking current between Vdd and Ground. However, Ileak1, the leakage in (a), is less than Ileak2, the leakage in (b). Figure adapted from Butts and Sohi 2000.

[Figure 5: a circuit connected to Virtual Vdd and Virtual Ground, which are linked to Vdd and Ground through sleep transistors.]
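The exponential sensitivities in Equations (5) and (6) can be explored numerically. In this sketch we assume, as a common textbook reading rather than the survey's own statement, that the temperature terms in Equation (6) stand for the thermal voltage kT/q; the constants K1, K2, W, n, and α, and the oxide-thickness units, are hypothetical placeholders that cancel or only set the scale in the ratios printed.

```python
import math

BOLTZMANN_OVER_Q = 8.617333262e-5  # k/q in volts per kelvin


def thermal_voltage(temp_kelvin: float) -> float:
    return BOLTZMANN_OVER_Q * temp_kelvin


def i_sub(v: float, vth: float, temp_kelvin: float,
          k1: float = 1.0, w: float = 1.0, n: float = 1.5) -> float:
    """Equation (6), with T in the exponents read as kT/q (assumption)."""
    vt = thermal_voltage(temp_kelvin)
    return k1 * w * math.exp(-vth / (n * vt)) * (1 - math.exp(-v / vt))


def i_ox(v: float, tox: float, k2: float = 1.0, w: float = 1.0,
         alpha: float = 10.0) -> float:
    """Equation (5); alpha and the units of tox are hypothetical."""
    return k2 * w * (v / tox) ** 2 * math.exp(-alpha * tox / v)


# Lowering Vth by 100 mV at 300 K.
vth_ratio = i_sub(1.0, 0.3, 300.0) / i_sub(1.0, 0.4, 300.0)
# Heating from 300 K to 350 K at fixed Vth (the thermal-runaway feedback).
temp_ratio = i_sub(1.0, 0.4, 350.0) / i_sub(1.0, 0.4, 300.0)
# Thinning the gate oxide from 2.0 to 1.5 (arbitrary units).
tox_ratio = i_ox(1.0, 1.5) / i_ox(1.0, 2.0)

print(f"Isub grows x{vth_ratio:.1f} when Vth drops by 0.1 V")
print(f"Isub grows x{temp_ratio:.1f} when T rises by 50 K")
print(f"Iox grows x{tox_ratio:.0f} when Tox shrinks by 25%")
```

With these placeholder constants, a 100 mV drop in Vth raises subthreshold leakage more than tenfold, and a 50 K temperature rise multiplies it several times over, which is the feedback behind thermal runaway.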
`
2.2.1. Increasing Threshold Voltage. The fourth way of reducing leakage power is to increase the threshold voltage. As Equation (6) shows, this reduces the subthreshold leakage exponentially. However, it also reduces the circuit’s performance, as is apparent in the following equation, which relates frequency (f), supply voltage (V), and threshold voltage (Vth), and where α is a constant:
$$f \propto \frac{(V - V_{th})^{\alpha}}{V}. \qquad (7)$$
`
One of the less intuitive ways of increasing the threshold voltage is to exploit what is called the stacking effect (refer to Figure 4). When two or more transistors that are switched off are stacked on top of each other (a), then they dissipate less leakage than a single transistor that is turned off (b). This is because each transistor in the stack induces a slight reverse bias between the gate and source of the transistor right below it, and this increases the threshold voltage of the bottom transistor, making it more resistant to leakage. As a result, in Figure 4(a), in which all transistors are in the Off position, transistor B leaks less current than transistor A, and transistor C leaks less current than transistor B. Hence, the total leakage current is attenuated as it flows from Vdd to the ground through transistors A, B, and C. This is not the case in the circuit shown in Figure 4(b), which contains only a single off transistor.

Fig. 5. Multiple threshold circuits with sleep transistors.

Another way to increase the threshold voltage is to use Multiple Threshold Circuits With Sleep Transistors (MTCMOS) [Calhoun et al. 2003; Won et al. 2003]. This involves isolating a leaky circuit element by connecting it to a pair
`of virtual power supplies that are linked
`to its actual power supplies through sleep
`transistors (Figure 5). When the circuit is
`active, the sleep transistors are activated,
`connecting the circuit to its power sup-
`plies. But when the circuit is inactive, the
`sleep transistors are deactivated, thus dis-
`connecting the circuit from its power sup-
`plies. In this inactive state, almost no leak-
`age passes through the circuit because the
`sleep transistors have high threshold volt-
`ages. (Recall that subthreshold leakage
`drops exponentially with a rise in thresh-
`old voltage, according to Equation (6).)
`
`
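[Figure 6: a transistor with gate voltage Vg < Vthreshold shown in two panels, (a) and (b); in panel (b) a body bias voltage is applied to the transistor’s body.]
Fig. 6. Adaptive body biasing.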
`
`This technique effectively confines the
`leakage to one part of the circuit, but is
`tricky to implement for several reasons.
`The sleep transistors must be sized prop-
`erly to minimize the overhead of acti-
`vating them. They cannot be turned on
`and off too frequently. Moreover, this tech-
`nique does not readily apply to memories
`because memories lose data when their
`power supplies are cut.
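The warning that sleep transistors cannot be toggled too frequently can be framed as a break-even time: cutting the supply pays off only if the leakage energy saved while asleep exceeds the energy spent switching the sleep transistors off and on again. This framing and every number below are illustrative assumptions of ours, not figures from the cited work, and the model ignores the wake-up delay.

```python
def break_even_seconds(transition_energy_j: float, leak_power_w: float,
                       sleep_leak_power_w: float) -> float:
    """Minimum idle time for which entering sleep saves energy (simplified model)."""
    saved_per_second = leak_power_w - sleep_leak_power_w
    if saved_per_second <= 0:
        return float("inf")
    return transition_energy_j / saved_per_second


def should_sleep(predicted_idle_s: float, transition_energy_j: float,
                 leak_power_w: float, sleep_leak_power_w: float) -> bool:
    """Sleep only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_seconds(
        transition_energy_j, leak_power_w, sleep_leak_power_w)


# Hypothetical block: 10 mW leakage when idle but powered, 0.5 mW behind
# high-Vth sleep transistors, and 2 uJ per sleep/wake transition.
t_be = break_even_seconds(2e-6, 10e-3, 0.5e-3)
print(f"break-even idle time: {t_be * 1e3:.3f} ms")
print(should_sleep(0.1e-3, 2e-6, 10e-3, 0.5e-3))  # short idle period
print(should_sleep(1e-3, 2e-6, 10e-3, 0.5e-3))    # long idle period
```

The same calculation reappears whenever components are powered down, which is why predicting idle periods accurately matters.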
A third way to increase the threshold voltage
is to employ dual threshold circuits. Dual
`threshold circuits [Liu et al. 2004; Wei
`et al. 1998; Ho and Hwang 2004] reduce
`leakage by using high threshold (low leak-
`age) transistors on noncritical paths and
`low threshold transistors on critical paths,
`the idea being that noncritical paths can
`execute instructions more slowly without
`impairing performance. This is a diffi-
`cult technique to implement because it re-
`quires choosing the right combination of
`transistors for high-threshold voltages. If
`too many transistors are assigned high
`threshold voltages, the noncritical paths
`in the circuit can slow down too much.
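One simple way to pick which transistors get high thresholds is a greedy pass: every gate starts with a low threshold, and gates are moved to a high threshold one at a time, largest leakage saving first, as long as every path still meets the timing target. This is a sketch under hypothetical delays and savings, not the algorithm of any cited paper; published methods use more careful slack analysis.

```python
def path_delay(path, high_vth, delay_low, delay_high):
    """Delay of a path given which gates use a high threshold voltage."""
    return sum(delay_high[g] if g in high_vth else delay_low[g] for g in path)


def assign_dual_vth(paths, delay_low, delay_high, leak_saving, deadline):
    """Greedily move gates to high Vth while all paths meet the deadline."""
    high_vth = set()
    # Consider first the gates whose high-Vth version saves the most leakage.
    for gate in sorted(leak_saving, key=leak_saving.get, reverse=True):
        trial = high_vth | {gate}
        if all(path_delay(p, trial, delay_low, delay_high) <= deadline for p in paths):
            high_vth = trial
    return high_vth


# Hypothetical circuit: g1-g2-g3 form the critical path, g4-g5 a noncritical one.
paths = [["g1", "g2", "g3"], ["g4", "g5"]]
delay_low = {g: 1.0 for g in ["g1", "g2", "g3", "g4", "g5"]}
delay_high = {g: 1.5 for g in delay_low}  # assumed: high-Vth gates are 50% slower
leak_saving = {"g1": 5, "g2": 4, "g3": 3, "g4": 2, "g5": 1}

high = assign_dual_vth(paths, delay_low, delay_high, leak_saving, deadline=3.0)
print(sorted(high))  # ['g4', 'g5']
```

The critical-path gates keep their low thresholds even though they offer the largest savings, because any of them would push the critical path past the deadline.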
`A fourth way to increase the thresh-
`old voltage is to apply a technique known
`as adaptive body biasing [Seta et al.
`1995; Kobayashi and Sakurai 1994; Kim
`and Roy 2002]. Adaptive body biasing is
`a runtime technique that reduces leak-
`age power by dynamically adjusting the
`threshold voltages of circuits depending
`on whether the circuits are active. When
`a circuit is not active, the technique in-
`creases its threshold voltage, thus saving
`leakage power exponentially, although at
`the expense of a delay in circuit operation.
`When the circuit is active, the technique
`decreases the threshold voltage to avoid
`slowing it down.
`
`
`To adjust the threshold voltage, adap-
`tive body biasing applies a voltage to the
`transistor’s body known as a body bias
`voltage (Figure 6). This voltage changes
`the polarity of a transistor’s channel, de-
`creasing or increasing its resistance to
`current flow. When the body bias voltage
`is chosen to fill the transistor’s channel
`with positive ions (b), the threshold volt-
`age increases and reduces leakage cur-
`rents. However, when the voltage is cho-
`sen to fill the channel with negative ions,
`the threshold voltage decreases, allowing
`higher performance, though at the cost of
`more leakage.
`
`3. REDUCING POWER
`
`3.1. Circuit And Logic Level Techniques
`3.1.1. Transistor Sizing. Transistor siz-
`ing [Penzes et al. 2002; Ebergen et al.
`2004] reduces the width of transistors
`to reduce their dynamic power consump-
`tion, using low-level models that relate the
`power consumption to the width. Accord-
`ing to these models, reducing the width
`also increases the transistor’s delay and
`thus the transistors that lie away from
`the critical paths of a circuit are usually
`the best candidates for this technique. Al-
`gorithms for applying this technique usu-
`ally associate with each transistor a toler-
`able delay which varies depending on how
`close that transistor is to the critical path.
`These algorithms then try to scale each
`transistor to be as small as possible with-
`out violating its tolerable delay.
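The sizing procedure above can be sketched with a deliberately simple model that we assume for illustration: a transistor's delay is inversely proportional to its width (delay = k / W), and its capacitance, and hence its dynamic power, grows with W. Each transistor receives the smallest width from a hypothetical discrete set that keeps its delay within its tolerable delay.

```python
def size_transistors(tolerable_delay, widths, k=1.0):
    """Pick the smallest allowed width that meets each transistor's delay budget.

    Delay model (assumption): delay = k / width.
    """
    sizes = {}
    for name, budget in tolerable_delay.items():
        feasible = [w for w in sorted(widths) if k / w <= budget]
        if not feasible:
            raise ValueError(f"{name}: no width meets a delay budget of {budget}")
        sizes[name] = feasible[0]
    return sizes


# Hypothetical budgets: t1 lies on the critical path, t3 far from it.
budgets = {"t1": 0.25, "t2": 0.5, "t3": 1.0}
allowed_widths = [1.0, 2.0, 4.0, 8.0]

sizes = size_transistors(budgets, allowed_widths)
print(sizes)  # {'t1': 4.0, 't2': 2.0, 't3': 1.0}
```

Transistors with more slack end up narrower, which is where the dynamic power savings come from.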
`
3.1.2. Transistor Reordering. The arrangement of transistors in a circuit affects energy consumption. Figure 7 shows two possible implementations of the same circuit that differ only in their placement of the transistors marked A and B. Suppose that the input to transistor A is 1, the input to transistor B is 1, and the input to transistor C is 0. Then transistors A and B will be on, allowing current from Vdd to flow through them and charge the capacitors C1 and C2. Now suppose that the inputs change and that A’s input becomes 0, and C’s input becomes 1. Then A will be off while B and C will be on. Now the implementations in (a) and (b) will differ in the amounts of switching activity. In (a), current from ground will flow past transistors B and C, discharging both the capacitors C1 and C2. However, in (b), the current from ground will only flow past transistor C; it will not get past transistor A since A is turned off. Thus it will only discharge the capacitor C2, rather than both C1 and C2 as in part (a). Therefore the implementation in (b) will consume less power than that in (a).

[Figure 7: two implementations, (a) and (b), of the same transistor circuit between Vdd and ground, differing in the placement of transistors A and B, with the resulting switching activity annotated.]
Fig. 7. Transistor Reordering. Figure adapted from Hossain et al. 1996.

Transistor reordering [Kursun et al. 2004; Sultania et al. 2004] rearranges transistors to minimize their switching activity. One of its guiding principles is to place transistors closer to the circuit’s outputs if they switch frequently in order to prevent a domino effect where the switching activity from one transistor trickles into many other transistors causing widespread power dissipation. This requires profiling techniques to determine how frequently different transistors are likely to switch.

3.1.3. Half Frequency and Half Swing Clocks. Half-frequency and half-swing clocks reduce frequency and voltage, respectively. Traditionally, hardware events such as register file writes occur on a rising clock edge. Half-frequency clocks synchronize events using both edges, and they tick at half the speed of regular clocks, thus cutting clock switching power in half. Half-swing (reduced-swing) clocks use a lower voltage signal and thus reduce clock power quadratically.

3.1.4. Logic Gate Restructuring. There are many ways to build a circuit out of logic gates. One decision that affects power consumption is how to arrange the gates and their input signals.
For example, consider two implementations of a four-input AND gate (Figure 8), a chain implementation (a), and a tree implementation (b). Knowing the signal probabilities (1 or 0) at each of the primary inputs (A, B, C, D), one can easily calculate the transition probabilities (0→1) for each output (W, X, F, Y, Z). If each input has an equal probability of being a 1 or a 0, then the calculation shows that the chain implementation (a) is likely to switch less than the tree implementation (b). This is because each gate in a chain has a lower probability of having a 0→1 transition than its predecessor; its transition probability depends on those of all its predecessors. In the tree implementation, on the other hand, some gates may share a parent (in the tree topology) instead of being directly connected together. These gates could have the same transition probabilities.

[Figure 8: (a) chain implementation with inputs A, B, C, D (each 1 with probability 0.5) and outputs W, X, F, annotated X = 7/64 and F = 15/256; (b) tree implementation with the same inputs and outputs Y, Z, F, annotated Y = Z = 3/16.]
Fig. 8. Gate restructuring. Figure adapted from the Pennsylvania State University Microsystems Design Laboratory’s tutorial on low power design.

Nevertheless, chain implementations do not necessarily save more energy than tree implementations. There are other issues to consider when choosing a topology. One is the issue of glitches or spurious transitions that occur when a gate does not receive all of its inputs at the same time. These glitches are more common in chain implementations where signals can travel along different paths having widely varying delays. One solution to reduce glitches is to change the topology so that the different paths in the circuit have similar delays. This solution, known as path balancing, often transforms chain implementations into tree implementations. Another solution, called retiming, involves inserting flip-flops or registers to slow down and thereby synchronize the signals that pass along different paths but reconverge to the same gate. Because flip-flops and registers are in sync with the processor clock, they sample their inputs less frequently than logic gates and are thus more immune to glitches.
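The transition probabilities in the four-input AND example can be reproduced with exact fractions. The sketch assumes, as such back-of-the-envelope estimates usually do, that the primary inputs are independent, that each cycle's value is independent of the previous cycle's (so P(0→1) = P(0)·P(1)), and that glitches are ignored. The helper names are our own.

```python
from fractions import Fraction


def and_prob_one(*inputs: Fraction) -> Fraction:
    """P(output = 1) for an AND gate with independent inputs."""
    p = Fraction(1)
    for q in inputs:
        p *= q
    return p


def rise_prob(p_one: Fraction) -> Fraction:
    """P(0 -> 1) across two independent cycles: P(0) * P(1)."""
    return (1 - p_one) * p_one


half = Fraction(1, 2)  # each primary input is 1 with probability 0.5

# (a) Chain: W = A.B, X = W.C, F = X.D
w = and_prob_one(half, half)
x = and_prob_one(w, half)
f = and_prob_one(x, half)
chain = [rise_prob(w), rise_prob(x), rise_prob(f)]

# (b) Tree: Y = A.B, Z = C.D, F = Y.Z
y = and_prob_one(half, half)
z = and_prob_one(half, half)
f_tree = and_prob_one(y, z)
tree = [rise_prob(y), rise_prob(z), rise_prob(f_tree)]

print("chain:", [str(p) for p in chain], "total", sum(chain))
print("tree: ", [str(p) for p in tree], "total", sum(tree))
```

The chain yields 3/16, 7/64, and 15/256 (total 91/256 expected rising transitions per cycle) against 3/16, 3/16, and 15/256 for the tree (total 111/256), matching the values annotated in Figure 8. The model says nothing about the glitches that favor the tree.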
`
`3.1.5. Technology Mapping. Because of
`the huge number of possibilities and
`tradeoffs at the gate level, designers rely
`on tools to determine the most energy-
`optimal way of arranging gates and sig-
`nals. Technology mapping [Chen et al.
`2004; Li et al. 2004; Rutenbar et al. 2001]
`is the automated process of constructing a
`gate-level representation of a circuit sub-
`ject to constraints such as area, delay, and
`power. Technology mapping for power re-
`lies on gate-level power models and a li-
`brary that describes the available gates,
`and their design constraints. Before a cir-
`cuit can be described in terms of gates, it
`is initially represented at the logic level.
`The problem is to design the circuit out of
logic gates in a way that will minimize the
`total power consumption under delay and
`cost constraints. This is an NP-hard Di-
`rected Acyclic Graph (DAG) covering prob-
`lem, and a common heuristic to solve it is
`to break the DAG representation of a cir-
`cuit into a set of trees and find the optimal
`mapping for each subtree using standard
`tree-covering algorithms.
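The tree-covering step can be illustrated with a toy dynamic program. The sketch assumes the DAG has already been split into trees of 2-input AND nodes and uses a hypothetical three-gate library with invented power costs; real technology mappers use far richer pattern sets, delay constraints, and gate-level power models. For each node, the program tries every library pattern rooted there and adds the best costs of the subtrees that the pattern leaves uncovered.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Subject-tree node: a primary input (no children) or a 2-input AND."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_input(self) -> bool:
        return self.left is None


# Hypothetical library: gate name -> power cost (arbitrary units).
LIBRARY = {"AND2": 1.0, "AND3": 1.4, "AND4": 1.7}


def matches(node: Node) -> Iterator[Tuple[str, List[Node]]]:
    """Yield (gate, uncovered subtrees) for each library pattern rooted at node."""
    l, r = node.left, node.right
    yield "AND2", [l, r]
    if not l.is_input:
        yield "AND3", [l.left, l.right, r]
    if not r.is_input:
        yield "AND3", [l, r.left, r.right]
    if not l.is_input and not r.is_input:
        yield "AND4", [l.left, l.right, r.left, r.right]


@lru_cache(maxsize=None)
def best_cover(node: Node) -> Tuple[float, Tuple[str, ...]]:
    """Minimum-cost cover of the subtree rooted at node, by dynamic programming."""
    if node.is_input:
        return 0.0, ()
    best: Optional[Tuple[float, Tuple[str, ...]]] = None
    for gate, subtrees in matches(node):
        cost = LIBRARY[gate]
        gates = [gate]
        for sub in subtrees:
            sub_cost, sub_gates = best_cover(sub)
            cost += sub_cost
            gates.extend(sub_gates)
        if best is None or cost < best[0]:
            best = (cost, tuple(gates))
    return best


# Tree-shaped 4-input AND: F = (A.B).(C.D)
a, b, c, d = (Node(n) for n in "ABCD")
f = Node("F", Node("Y", a, b), Node("Z", c, d))
print(best_cover(f))  # (1.7, ('AND4',))
```

With these invented costs a single AND4 beats three AND2s (3.0) or an AND3 plus an AND2 (2.4); a different library would yield a different cover.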
`
`3.1.6. Low Power Flip-Flops. Flip-flops
are the building blocks of small memories
such as register files. A typical master-slave
flip-flop consists of two latches, a
`master latch, and a slave latch. The inputs
`of the master latch are the clock signal
`and data. The inputs of the slave latch
`are the inverse of the clock signal and the
`data output by the master latch. Thus
`when the clock signal is high, the master
`latch is turned on, and the slave latch
`is turned off. In this phase, the master
`samples whatever inputs it receives and
`outputs them. The slave, however, does
`not sample its inputs but merely outputs
`whatever it has most recently stored. On
`the falling edge of the clock, the master
`turns off and the slave turns on. Thus the
`master saves its most recent input and
`stops sampling any further inputs. The
slave samples the new inputs it receives
from the master and outputs them.
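The master-slave behavior just described can be captured in a small behavioral model (not a transistor-level one). Clock and data are treated as 0/1 samples per time step, a simplification we assume here; following the text, the master is transparent while the clock is high and the slave passes the master's stored value while the clock is low.

```python
class MasterSlaveFlipFlop:
    """Behavioral model of the master-slave flip-flop described in the text."""

    def __init__(self) -> None:
        self.master = 0
        self.slave = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            # Clock high: master samples its input; slave holds its stored value.
            self.master = d
        else:
            # Clock low: master holds; slave samples and outputs the master's value.
            self.slave = self.master
        return self.slave


ff = MasterSlaveFlipFlop()
clk_seq = [1, 1, 0, 0, 1, 0]
d_seq = [1, 0, 1, 1, 1, 0]
q = [ff.step(c, d) for c, d in zip(clk_seq, d_seq)]
print(q)  # [0, 0, 0, 0, 0, 1]
```

The data changes during the low phase (steps 3 and 4) never reach the output, and the output only takes on the value the master held when the clock fell. Every clock transition still toggles internal nodes, however, which is the clock-related power the low-power designs below try to cut.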
`Besides this master-slave design are
`other common designs such as the pulse-
`triggered flip-flop and sense-amplifier flip-
`flop. All these designs share some common
`sources of power consumption, namely
`power dissipated from the clock signal,
`power dissipated in internal switching ac-
`tivity (caused by the clock signal and by
`changes in data), and power dissipated
`when the outputs change.
`Researchers have proposed several al-
`ternative low power designs for flip-flops.
`Most of these approaches reduce the
`switching activity or the power dissipated
`by the clock signal. One alternative is the
`self-gating flip-flop. This design inhibits
`the clock signal to the flip-flop when the in-
`puts will produce no change in the outputs.
`Strollo et al. [2000] have proposed two ver-
`sions of this design. In the double-gated
`flip-flop, the master and slave latches
`each have their own clock-gating ci