Power Reduction Techniques For Microprocessor Systems

VASANTH VENKATACHALAM AND MICHAEL FRANZ

University of California, Irvine

Power consumption is a major factor that limits the performance of computers. We survey the “state of the art” in techniques that reduce the total power consumed by a microprocessor system over time. These techniques are applied at various levels ranging from circuits to architectures, architectures to system software, and system software to applications. They also include holistic approaches that will become more important over the next decade. We conclude that power management is a multifaceted discipline that is continually expanding with new techniques being developed at every level. These techniques may eventually allow computers to break through the “power wall” and achieve unprecedented levels of performance, versatility, and reliability. Yet it remains too early to tell which techniques will ultimately solve the power problem.

Categories and Subject Descriptors: C.5.3 [Computer System Implementation]: Microcomputers—Microprocessors; D.2.10 [Software Engineering]: Design—Methodologies; I.m [Computing Methodologies]: Miscellaneous
General Terms: Algorithms, Design, Experimentation, Management, Measurement, Performance
Additional Key Words and Phrases: Energy dissipation, power reduction

1. INTRODUCTION

Computer scientists have always tried to improve the performance of computers. But although today’s computers are much faster and far more versatile than their predecessors, they also consume a lot of power; so much power, in fact, that their power densities and concomitant heat generation are rapidly approaching levels comparable to nuclear reactors (Figure 1). These high power densities impair chip reliability and life expectancy, increase cooling costs, and, for large data centers, even raise environmental concerns.

Parts of this effort have been sponsored by the National Science Foundation under ITR grant CCR-0205712 and by the Office of Naval Research under grant N00014-01-1-0854.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and should not be interpreted as necessarily representing the official views, policies or endorsements, either expressed or implied, of the National Science Foundation (NSF), the Office of Naval Research (ONR), or any other agency of the U.S. Government.
The authors also gratefully acknowledge gifts from Intel, Microsoft Research, and Sun Microsystems that partially supported this work.
Authors’ addresses: Vasanth Venkatachalam, School of Information and Computer Science, University of California at Irvine, Irvine, CA 92697-3425; email: vvenkata@uci.edu; Michael Franz, School of Information and Computer Science, University of California at Irvine, Irvine, CA 92697-3425; email: franz@uci.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.
©2005 ACM 0360-0300/05/0900-0195 $5.00

ACM Computing Surveys, Vol. 37, No. 3, September 2005, pp. 195–237.

Petitioner Samsung Ex-1030, 0001

Fig. 1. Power densities rising: power density by technology generation (1.5µ through 0.07µ) for the i386, i486, and Pentium processors, with labeled reference levels for a rocket nozzle and the Sun's surface. Figure adapted from Pollack 1999.

At the other end of the performance spectrum, power issues also pose problems for smaller mobile devices with limited battery capacities. Although one could give these devices faster processors and larger memories, this would diminish their battery life even further.

Without cost-effective solutions to the power problem, improvements in microprocessor technology will eventually reach a standstill. Power management is a multidisciplinary field that involves many aspects (e.g., energy, temperature, reliability), each of which is complex enough to merit a survey of its own. The focus of our survey will be on techniques that reduce the total power consumed by typical microprocessor systems.

We will follow the high-level taxonomy illustrated in Figure 2. First, we will define power and energy and explain the complex parameters that dynamic and static power depend on (Section 2). Second, we will introduce techniques that reduce power and energy (Section 3), starting with circuit techniques (Section 3.1) and architectural techniques (Sections 3.2, 3.3, and 3.4), and then moving on to two techniques that are widely applied in hardware and software: dynamic voltage scaling (Section 3.5) and resource hibernation (Section 3.6). Third, we will examine what compilers can do to manage power (Section 3.7). We will then discuss recent work in application-level power management (Section 3.8) and recent efforts (Section 3.9) to develop a holistic solution to the power problem. Finally, we will discuss some commercial power management systems (Section 3.10) and provide a glimpse into some more radical technologies that are emerging (Section 3.11).

2. DEFINING POWER
Power and energy are commonly defined in terms of the work that a system performs. Energy is the total amount of work a system performs over a period of time, while power is the rate at which the system performs that work. In formal terms,

$$P = \frac{W}{T} \tag{1}$$

$$E = P \cdot T, \tag{2}$$

where P is power, E is energy, T is a specific time interval, and W is the total work performed in that interval. Energy is measured in joules, while power is measured in watts.

These concepts of work, power, and energy are used differently in different contexts. In the context of computers, work involves activities associated with running programs (e.g., addition, subtraction, memory operations), power is the rate at which the computer consumes electrical energy (or dissipates it in the form of heat) while performing these activities, and energy is the total electrical energy the computer consumes (or dissipates as heat) over time.

This distinction between power and energy is important because techniques that reduce power do not necessarily reduce energy. For example, the power consumed by a computer can be reduced by halving the clock frequency, but if the computer then takes twice as long to run the same programs, the total energy consumed will be similar. Whether one should reduce power or energy depends on the context. In mobile applications, reducing energy is often more important because it increases the battery lifetime. However, for other systems (e.g., servers), temperature is a larger issue. To keep the temperature within acceptable limits, one would need to reduce instantaneous power regardless of the impact on total energy.

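A minimal sketch of the halved-clock example, using made-up numbers: it assumes a purely CPU-bound program whose run time doubles exactly when the clock is halved, and it treats all power as dynamic power that scales linearly with frequency (leakage is ignored).

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Equation (2): E = P * T."""
    return power_watts * seconds


# Hypothetical baseline: a program runs for 10 s while drawing 20 W.
base_power, base_time = 20.0, 10.0

# Halving the clock frequency halves (dynamic) power but, for a CPU-bound
# program, doubles the time needed to do the same work.
slow_power, slow_time = base_power / 2, base_time * 2

base_energy = energy_joules(base_power, base_time)
slow_energy = energy_joules(slow_power, slow_time)
print(f"full clock: {base_power} W for {base_time} s = {base_energy} J")
print(f"half clock: {slow_power} W for {slow_time} s = {slow_energy} J")
# Power drops by half, yet both runs consume 200.0 J.
```

Under these assumptions, the slower clock only helps where instantaneous power, and hence temperature, is the concern, as in the server case above.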
Fig. 2. Organization of this survey ("High Level Organization Of Survey"): Defining Power (Section 2) and Reducing Power (Section 3), the latter comprising Interconnect Optimizations (3.2), Memory Optimizations (3.3), Hardware Adaptations (3.4), Dynamic Voltage Scaling (3.5), Resource Hibernation (3.6), Compiler Techniques (3.7), Application Level Techniques (3.8), Crosslayer Adaptations (3.9), Commercial Systems (3.10), and Emerging Technologies (3.11).

2.1. Dynamic Power Consumption
There are two forms of power consumption: dynamic power consumption and static power consumption. Dynamic power consumption arises from circuit activity such as changes in the inputs of an adder or in the values held in a register. It has two sources: switched capacitance and short-circuit current.

Switched capacitance is the primary source of dynamic power consumption and arises from the charging and discharging of capacitors at the outputs of circuits.

Short-circuit current is a secondary source of dynamic power consumption and accounts for only 10-15% of the total power consumption. It arises because circuits are composed of transistors of opposite polarity: negative, or NMOS, and positive, or PMOS. When these two types of transistors switch, there is a brief instant when both are on simultaneously, creating a short circuit. We will not deal further with the power dissipation caused by this short circuit because it is a smaller percentage of total power, and researchers have not found a way to reduce it without sacrificing performance.

As the following equation shows, the more dominant component of dynamic power, switched capacitance (P_dynamic), depends on four parameters, namely supply voltage (V), clock frequency (f), physical capacitance (C), and an activity factor (a) that relates to how many 0 → 1 or 1 → 0 transitions occur in a chip:

$$P_{dynamic} \sim a C V^2 f. \tag{3}$$

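Because Equation (3) is a proportionality, only ratios between operating points are meaningful. The sketch below plugs in arbitrary, hypothetical values for a, C, V, and f and halves each parameter in turn: halving any of the three linear terms halves the power, while halving V cuts it to a quarter.

```python
def dynamic_power(a: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    """Equation (3): P_dynamic ~ a * C * V^2 * f (a proportionality, not a full model)."""
    return a * c_farads * v_volts ** 2 * f_hz


# Hypothetical operating point; the values only set a baseline for ratios.
base = {"a": 0.2, "c_farads": 1e-9, "v_volts": 1.2, "f_hz": 2e9}
p0 = dynamic_power(**base)

for knob in ("a", "c_farads", "f_hz", "v_volts"):
    halved = dict(base, **{knob: base[knob] / 2})
    print(f"halving {knob}: {dynamic_power(**halved) / p0:.2f} x baseline power")
```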
Accordingly, there are four ways to reduce dynamic power consumption, though they each have different tradeoffs and not all of them reduce the total energy consumed. The first way is to reduce the physical capacitance, or stored electrical charge, of a circuit. The physical capacitance depends on low-level design parameters such as transistor sizes and wire lengths. One can reduce the capacitance by reducing transistor sizes, but this worsens performance.

The second way to lower dynamic power is to reduce the switching activity. As computer chips get packed with increasingly complex functionalities, their switching activity increases [De and Borkar 1999], making it more important to develop techniques that fall into this category. One popular technique, clock gating, prevents the clock signal from reaching idle functional units. Because the clock network accounts for a large fraction of a chip’s total energy consumption, this is a very effective way of reducing power and energy throughout a processor and is implemented in numerous commercial systems, including the Pentium 4, Pentium M, Intel XScale, and Tensilica Xtensa, to mention but a few.

Fig. 3. ITRS trends for leakage power dissipation (vertical axis in percent, 0% to 100%; horizontal axis: year, 1999 to 2009). Figure adapted from Meng et al., 2005.

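To make the clock-gating mechanism concrete, here is a toy cycle-level model, not any of the commercial implementations named above. It assumes each clock edge delivered to a unit stands for one unit of clock-switching energy; the unit names and the busy trace are invented, and the overhead of the gating logic itself is ignored.

```python
from typing import Dict, List, Set


def clock_edges_delivered(units: List[str], busy_trace: List[Set[str]],
                          gated: bool) -> Dict[str, int]:
    """Count the clock edges each functional unit receives over a trace.

    busy_trace[i] names the units doing useful work in cycle i. Without
    gating, every unit is clocked every cycle; with gating, an idle unit's
    clock is blocked, so its clock input does not toggle that cycle.
    """
    edges = {u: 0 for u in units}
    for busy in busy_trace:
        for u in units:
            if not gated or u in busy:
                edges[u] += 1
    return edges


units = ["alu", "fpu", "lsu"]
trace = [{"alu"}, {"alu", "lsu"}, set(), {"alu"}]  # made-up 4-cycle trace
ungated = clock_edges_delivered(units, trace, gated=False)
gated = clock_edges_delivered(units, trace, gated=True)
print(ungated)  # {'alu': 4, 'fpu': 4, 'lsu': 4}
print(gated)    # {'alu': 3, 'fpu': 0, 'lsu': 1}
saved = 1 - sum(gated.values()) / sum(ungated.values())
print(f"clock edges suppressed: {saved:.0%}")  # 67%
```

The savings in this model track how often units sit idle, which is why gating pays off most for units such as an FPU that are unused for long stretches.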
The third way to reduce dynamic power consumption is to reduce the clock frequency. But as we noted in Section 2, this worsens performance and does not always reduce the total energy consumed. One would use this technique only if the target system does not support voltage scaling and if the goal is to reduce the peak or average power dissipation and indirectly reduce the chip’s temperature.

The fourth way to reduce dynamic power consumption is to reduce the supply voltage. Because reducing the supply voltage increases gate delays, it also requires reducing the clock frequency to allow the circuit to work properly.

The combination of scaling the supply voltage and clock frequency in tandem is called dynamic voltage scaling (DVS). This technique should ideally reduce dynamic power dissipation cubically because dynamic power is quadratic in voltage and linear in clock frequency. This is the most widely adopted technique. A growing number of processors, including the Pentium M, mobile Pentium 4, AMD’s Athlon, and Transmeta’s Crusoe and Efficeon processors, allow software to adjust clock frequencies and voltage settings in tandem. However, DVS has limitations and cannot always be applied, and even when it can be applied, it is nontrivial to apply, as we will see in Section 3.5.

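The cubic claim can be checked with a short calculation. This sketch assumes an idealized DVS in which V and f are scaled by the same factor k, the task is CPU-bound so its run time grows as 1/k, and leakage and the minimum operable voltage are ignored.

```python
from typing import Tuple


def dvs_ratios(k: float) -> Tuple[float, float, float]:
    """Scale supply voltage and clock frequency together by k (0 < k <= 1).

    Returns (power, run time, energy), each relative to running at k = 1,
    using P ~ V^2 * f from Equation (3) and E = P * T from Equation (2).
    """
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    power = k ** 2 * k         # quadratic in V, linear in f -> k^3
    run_time = 1 / k           # the same work at a lower clock takes longer
    energy = power * run_time  # -> k^2
    return power, run_time, energy


for k in (1.0, 0.8, 0.5):
    p, t, e = dvs_ratios(k)
    print(f"k = {k:.1f}: power x{p:.3f}, time x{t:.2f}, energy x{e:.3f}")
```

Unlike lowering the frequency alone, which in this model leaves the energy for a fixed task unchanged, halving both V and f cuts power to 1/8 and energy to 1/4.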
2.2. Understanding Leakage Power Consumption
In addition to consuming dynamic power, computer components consume static power, also known as idle power or leakage. According to the most recently published industrial roadmaps [ITRSRoadMap], leakage power is rapidly becoming the dominant source of power consumption in circuits (Figure 3) and persists whether a computer is active or idle. Because its causes are different from those of dynamic power, dynamic power reduction techniques do not necessarily reduce the leakage power.

As the equation that follows illustrates, leakage power consumption is the product of the supply voltage (V) and the leakage current (I_leak), or parasitic current, that flows through transistors even when the transistors are turned off.

$$P_{leak} = V I_{leak}. \tag{4}$$

To understand how leakage current arises, one must understand how transistors work. A transistor regulates the flow of current between two terminals called the source and the drain. Between these two terminals is a region, called the channel, that normally resists current. As the voltage at a third terminal, the gate, is increased, electrical charge accumulates in the channel, reducing the channel’s resistance and creating a path along which electricity can flow. Once the gate voltage is high enough, the channel’s polarity changes, allowing the normal flow of current between the source and the drain. The gate voltage at which this path opens is called the threshold voltage.

According to this model, a transistor is similar to a water dam. It is supposed to allow current to flow when the gate voltage exceeds the threshold voltage but should otherwise prevent current from flowing. However, transistors are imperfect. They leak current even when the gate voltage is below the threshold voltage. In fact, there are six different types of current that leak through a transistor. These include reverse-biased-junction leakage, gate-induced-drain leakage, subthreshold leakage, gate-oxide leakage, gate-current leakage, and punch-through leakage. Of these six, subthreshold leakage and gate-oxide leakage dominate the total leakage current.

Gate-oxide leakage flows from the gate of a transistor into the substrate. This type of leakage current depends on the thickness of the oxide material that insulates the gate:

$$I_{ox} = K_2 W \left(\frac{V}{T_{ox}}\right)^2 e^{-\alpha T_{ox}/V}. \tag{5}$$

According to this equation, the gate-oxide leakage I_ox increases exponentially as the thickness T_ox of the gate’s oxide material decreases. This is a problem because future chip designs will require the thickness to be reduced along with other scaled parameters such as transistor length and supply voltage. One way of solving this problem would be to insulate the gate using a high-k dielectric material instead of the oxide materials that are currently used. This solution is likely to emerge over the next decade.

Subthreshold leakage current flows between the drain and source of a transistor. It is the dominant source of leakage and depends on a number of parameters that are related through the following equation:

$$I_{sub} = K_1 W e^{-V_{th}/(nT)} \left(1 - e^{-V/T}\right). \tag{6}$$

In this equation, W is the gate width and K_1 and n are constants. The important parameters are the supply voltage V, the threshold voltage V_th, and the temperature T. The subthreshold leakage current I_sub increases exponentially as the threshold voltage V_th decreases. This again raises a problem for future chip designs, because as technology scales, threshold voltages will have to scale along with supply voltages.

The increase in subthreshold leakage current causes another problem. When the leakage current increases, the temperature increases. But as Equation (6) shows, this increases leakage further, causing yet higher temperatures. This vicious cycle is known as thermal runaway. It is the chip designer’s worst nightmare.

How can these problems be solved? Equations (4) and (6) indicate four ways to reduce leakage power. The first way is to reduce the supply voltage. As we will see, supply voltage reduction is a very common technique that has been applied to components throughout a system (e.g., processor, buses, cache memories).

The second way to reduce leakage power is to reduce the size of a circuit, because the total leakage is the sum of the leakage dissipated in all of a circuit’s transistors. One way of doing this is to design a circuit with fewer transistors by omitting redundant hardware and using smaller caches, but this may limit performance and versatility. Another idea is to reduce the effective transistor count dynamically by cutting the power supplies to idle components. Here, too, there are challenges, such as how to predict when different components will be idle and how to minimize the overhead of switching them on or off. This is also a common approach, of which we will see examples in the sections to follow.

The third way to reduce leakage power is to cool the computer. Several cooling techniques have been developed since the 1960s. Some blow cold air into the circuit, while others refrigerate the processor [Schmidt and Notohardjono 2002], sometimes even by costly means such as circulating cryogenic fluids like liquid nitrogen [Krane et al. 1988]. These techniques have three advantages. First, they significantly reduce subthreshold leakage. In fact, a recent study [Schmidt and Notohardjono 2002] showed that cooling a memory cell by 50 degrees Celsius reduces the leakage energy by a factor of five. Second, these techniques allow a circuit to work faster because electricity encounters less resistance at lower temperatures. Third, cooling eliminates some negative effects of high temperatures, namely the degradation of a chip’s reliability and life expectancy. Despite these advantages, there are issues to consider, such as the cost of the hardware used to cool the circuit. Moreover, cooling techniques are insufficient if they result in wide temperature variations in different parts of a circuit. Rather, one needs to prevent hotspots by distributing heat evenly throughout a chip.

Fig. 4. The transistor stacking effect. Panel (a) shows three stacked transistors, A, B, and C, all switched off, between Vdd and Ground; panel (b) shows a single off transistor between Vdd and Ground. The circuits in both (a) and (b) are leaking current between Vdd and Ground. However, I_leak1, the leakage in (a), is less than I_leak2, the leakage in (b). Figure adapted from Butts and Sohi 2000.

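The exponential sensitivities in Equations (5) and (6) can be illustrated numerically. In this sketch all constants (K_1, K_2, W, n, and α) are set to 1, and the inputs are dimensionless numbers chosen only to expose the trends, with T treated as a normalized quantity in the same units as V; none of the values is calibrated to a real process.

```python
import math


def gate_oxide_leakage(v, t_ox, k2=1.0, w=1.0, alpha=1.0):
    """Equation (5): I_ox = K2 * W * (V / T_ox)^2 * exp(-alpha * T_ox / V)."""
    return k2 * w * (v / t_ox) ** 2 * math.exp(-alpha * t_ox / v)


def subthreshold_leakage(v, v_th, temp, k1=1.0, w=1.0, n=1.0):
    """Equation (6): I_sub = K1 * W * exp(-V_th / (n * T)) * (1 - exp(-V / T))."""
    return k1 * w * math.exp(-v_th / (n * temp)) * (1 - math.exp(-v / temp))


# Lower V_th from 0.3 to 0.2 at fixed V = 1.0 and T = 0.05 (normalized units).
ratio_vth = subthreshold_leakage(1.0, 0.2, 0.05) / subthreshold_leakage(1.0, 0.3, 0.05)
# Halve the oxide thickness from 2.0 to 1.0 at fixed V = 1.0.
ratio_ox = gate_oxide_leakage(1.0, 1.0) / gate_oxide_leakage(1.0, 2.0)

print(f"I_sub grows by {ratio_vth:.2f}x")  # e^2, about 7.39
print(f"I_ox grows by {ratio_ox:.2f}x")    # 4e, about 10.87
```

A small step in either parameter multiplies the leakage several times over, which is why scaling V_th and T_ox down with each technology generation is so costly.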
2.2.1. Increasing Threshold Voltage. The fourth way of reducing leakage power is to increase the threshold voltage. As Equation (6) shows, this reduces the subthreshold leakage exponentially. However, it also reduces the circuit’s performance, as is apparent in the following equation, which relates frequency (f), supply voltage (V), and threshold voltage (V_th), and where α is a constant:

$$f \propto \frac{(V - V_{th})^{\alpha}}{V}. \tag{7}$$

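Equation (7) only fixes f up to a constant, so the sketch below compares two threshold voltages at the same supply voltage. The value α = 1.3 and the voltages are assumptions chosen for illustration (α is commonly taken to lie between 1 and 2), not figures from the survey.

```python
def relative_max_frequency(v: float, v_th: float, alpha: float = 1.3) -> float:
    """Equation (7): f is proportional to (V - V_th)^alpha / V.

    The result is meaningful only as a ratio between operating points;
    alpha = 1.3 is an assumed, illustrative value.
    """
    if v <= v_th:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return (v - v_th) ** alpha / v


supply = 1.0  # hypothetical supply voltage, in volts
low_vth = relative_max_frequency(supply, 0.3)
high_vth = relative_max_frequency(supply, 0.4)
print(f"raising V_th from 0.3 V to 0.4 V keeps {high_vth / low_vth:.1%} of the clock rate")
```

Read together with Equation (6), this is the tradeoff: raising V_th buys an exponential cut in subthreshold leakage at the price of a polynomial loss in attainable frequency.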
One of the less intuitive ways of increasing the threshold voltage is to exploit what is called the stacking effect (refer to Figure 4). When two or more transistors that are switched off are stacked on top of each other (a), they dissipate less leakage than a single transistor that is turned off (b). This is because each transistor in the stack induces a slight reverse bias between the gate and source of the transistor right below it, and this increases the threshold voltage of the bottom transistor, making it more resistant to leakage. As a result, in Figure 4(a), in which all transistors are in the Off position, transistor B leaks less current than transistor A, and transistor C leaks less current than transistor B. Hence, the total leakage current is attenuated as it flows from Vdd to the ground through transistors A, B, and C. This is not the case in the circuit shown in Figure 4(b), which contains only a single off transistor.

Another way to increase the threshold voltage is to use Multiple Threshold Circuits With Sleep Transistors (MTCMOS) [Calhoun et al. 2003; Won et al. 2003]. This involves isolating a leaky circuit element by connecting it to a pair of virtual power supplies that are linked to its actual power supplies through sleep transistors (Figure 5). When the circuit is active, the sleep transistors are activated, connecting the circuit to its power supplies. But when the circuit is inactive, the sleep transistors are deactivated, thus disconnecting the circuit from its power supplies. In this inactive state, almost no leakage passes through the circuit because the sleep transistors have high threshold voltages. (Recall that subthreshold leakage drops exponentially with a rise in threshold voltage, according to Equation (6).)

Fig. 5. Multiple threshold circuits with sleep transistors (the circuit sits between a virtual Vdd and a virtual ground, which connect to Vdd and Ground through the sleep transistors).

Fig. 6. Adaptive body biasing. Both panels show a transistor whose gate voltage is below the threshold voltage (V_g < V_threshold); in (b), a body bias voltage is applied.

This technique effectively confines the leakage to one part of the circuit, but is tricky to implement for several reasons. The sleep transistors must be sized properly to minimize the overhead of activating them. They cannot be turned on and off too frequently. Moreover, this technique does not readily apply to memories because memories lose data when their power supplies are cut.

A third way to increase the threshold voltage is to employ dual threshold circuits. Dual threshold circuits [Liu et al. 2004; Wei et al. 1998; Ho and Hwang 2004] reduce leakage by using high threshold (low leakage) transistors on noncritical paths and low threshold transistors on critical paths, the idea being that noncritical paths can execute instructions more slowly without impairing performance. This is a difficult technique to implement because it requires choosing the right combination of transistors to receive high threshold voltages. If too many transistors are assigned high threshold voltages, the noncritical paths in the circuit can slow down so much that they become critical themselves.

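The assignment problem can be pictured with a simple greedy heuristic. This is a hypothetical sketch, not the algorithm of any of the cited papers: the gate delays, paths, clock period, and the 1.3x slowdown for high-threshold gates are invented, and each gate is moved to high V_th (most slack first) only if every path still meets the clock period.

```python
from typing import Dict, List


def assign_dual_vth(gate_delay: Dict[str, float], paths: List[List[str]],
                    clock_period: float, slowdown: float = 1.3) -> Dict[str, bool]:
    """Greedy dual-threshold assignment (hypothetical sketch).

    Returns {gate: True} for gates switched to high V_th (low leakage).
    A high-V_th gate's delay is multiplied by `slowdown`.
    """
    high = {g: False for g in gate_delay}

    def path_delay(path: List[str]) -> float:
        return sum(gate_delay[g] * (slowdown if high[g] else 1.0) for g in path)

    def slack(g: str) -> float:
        # Slack of the tightest path through g, evaluated before any reassignment.
        return clock_period - max((path_delay(p) for p in paths if g in p), default=0.0)

    # Visit gates with the most slack (farthest from the critical path) first.
    for g in sorted(gate_delay, key=slack, reverse=True):
        high[g] = True
        if any(path_delay(p) > clock_period for p in paths):
            high[g] = False  # would violate timing, so keep this gate fast
    return high


delays = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0}
paths = [["a", "b", "c"], ["d", "e"]]  # a-b-c is critical, d-e has slack
print(assign_dual_vth(delays, paths, clock_period=3.0))
# {'a': False, 'b': False, 'c': False, 'd': True, 'e': True}
```

Re-checking every path after each move is what keeps a noncritical path from being slowed until it becomes critical.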
A fourth way to increase the threshold voltage is to apply a technique known as adaptive body biasing [Seta et al. 1995; Kobayashi and Sakurai 1994; Kim and Roy 2002]. Adaptive body biasing is a runtime technique that reduces leakage power by dynamically adjusting the threshold voltages of circuits depending on whether the circuits are active. When a circuit is not active, the technique increases its threshold voltage, thus saving leakage power exponentially, although at the expense of a delay in circuit operation. When the circuit is active, the technique decreases the threshold voltage to avoid slowing it down.

To adjust the threshold voltage, adaptive body biasing applies a voltage to the transistor’s body known as a body bias voltage (Figure 6). This voltage changes the polarity of a transistor’s channel, decreasing or increasing its resistance to current flow. When the body bias voltage is chosen to fill the transistor’s channel with positive ions (b), the threshold voltage increases and reduces leakage currents. However, when the voltage is chosen to fill the channel with negative ions, the threshold voltage decreases, allowing higher performance, though at the cost of more leakage.

3. REDUCING POWER

3.1. Circuit And Logic Level Techniques
3.1.1. Transistor Sizing. Transistor sizing [Penzes et al. 2002; Ebergen et al. 2004] reduces the width of transistors to reduce their dynamic power consumption, using low-level models that relate the power consumption to the width. According to these models, reducing the width also increases the transistor’s delay, and thus the transistors that lie away from the critical paths of a circuit are usually the best candidates for this technique. Algorithms for applying this technique usually associate with each transistor a tolerable delay, which varies depending on how close that transistor is to the critical path. These algorithms then try to scale each transistor to be as small as possible without violating its tolerable delay.

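A toy version of that last step is sketched below. It is not one of the cited algorithms: it assumes a hypothetical delay model in which delay is inversely proportional to width, a small set of discrete widths, and tolerable delays that have already been assigned (tight for transistors on the critical path, looser elsewhere).

```python
from typing import Dict, List


def size_transistors(tolerable_delay: Dict[str, float], widths: List[float],
                     delay_coeff: float = 1.0) -> Dict[str, float]:
    """Give each transistor the smallest width whose delay fits its budget.

    Hypothetical delay model: delay = delay_coeff / width, so a narrower
    transistor (less capacitance, less dynamic power) is slower.
    """
    chosen = {}
    for name, budget in tolerable_delay.items():
        feasible = [w for w in sorted(widths) if delay_coeff / w <= budget]
        if not feasible:
            raise ValueError(f"{name}: no available width meets its delay budget")
        chosen[name] = feasible[0]  # smallest feasible width
    return chosen


# Tight budget on the critical path, looser budgets farther away from it.
budgets = {"t_critical": 0.25, "t_near": 0.5, "t_far": 1.0}
print(size_transistors(budgets, widths=[1.0, 2.0, 4.0]))
# {'t_critical': 4.0, 't_near': 2.0, 't_far': 1.0}
```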
3.1.2. Transistor Reordering. The arrangement of transistors in a circuit affects energy consumption. Figure 7 shows two possible implementations of the same circuit that differ only in their placement of the transistors marked A and B. Suppose that the input to transistor A is 1, the input to transistor B is 1, and the input to transistor C is 0. Then transistors A and B will be on, allowing current from Vdd to flow through them and charge the capacitors C1 and C2. Now suppose that the inputs change so that A’s input becomes 0 and C’s input becomes 1. Then A will be off while B and C will be on, and the implementations in (a) and (b) will differ in the amount of switching activity. In (a), current from ground will flow past transistors B and C, discharging both capacitors C1 and C2. In (b), however, the current from ground will flow only past transistor C; it will not get past transistor A, since A is turned off. It will therefore discharge only capacitor C2, rather than both C1 and C2 as in (a), so the implementation in (b) will consume less power than that in (a).

Transistor reordering [Kursun et al. 2004; Sultania et al. 2004] rearranges transistors to minimize their switching activity. One of its guiding principles is to place transistors that switch frequently closer to the circuit’s outputs, in order to prevent a domino effect in which the switching activity of one transistor trickles into many other transistors, causing widespread power dissipation. This requires profiling techniques to determine how frequently different transistors are likely to switch.

Fig. 7. Transistor Reordering. Panels (a) and (b) show two implementations of the same circuit that differ in the placement of transistors A and B. Figure adapted from Hossain et al. 1996.

3.1.3. Half Frequency and Half Swing Clocks. Half-frequency and half-swing clocks reduce frequency and voltage, respectively. Traditionally, hardware events such as register file writes occur on a rising clock edge. Half-frequency clocks synchronize events using both edges, and they tick at half the speed of regular clocks, thus cutting clock switching power in half. Reduced-swing clocks, such as half-swing clocks, use a lower-voltage clock signal and thus reduce power quadratically.

3.1.4. Logic Gate Restructuring. There are many ways to build a circuit out of logic gates. One decision that affects power consumption is how to arrange the gates and their input signals.

For example, consider two implementations of a four-input AND gate (Figure 8): a chain implementation (a) and a tree implementation (b). Knowing the signal probabilities (of being 1 or 0) at each of the primary inputs (A, B, C, D), one can easily calculate the transition probabilities (0→1) for each output (W, X, F, Y, Z). If each input has an equal probability of being a 1 or a 0, then the calculation shows that the chain implementation (a) is likely to switch less than the tree implementation (b). This is because each gate in a chain has a lower probability of having a 0→1 transition than its predecessor; its transition probability depends on those of all its predecessors. In the tree implementation, on the other hand, some gates may share a parent (in the tree topology) instead of being directly connected together. These gates could have the same transition probabilities.

Nevertheless, chain implementations do not necessarily save more energy than tree implementations. There are other issues to consider when choosing a topology. One is the issue of glitches, or spurious transitions that occur when a gate does not receive all of its inputs at the same time. These glitches are more common in chain implementations, where signals can travel along different paths having widely varying delays. One solution to reduce glitches is to change the topology so that the different paths in the circuit have similar delays. This solution, known as path balancing, often transforms chain implementations into tree implementations. Another solution, called retiming, involves inserting flip-flops or registers to slow down and thereby synchronize the signals that pass along different paths but reconverge to the same gate. Because flip-flops and registers are in sync with the processor clock, they sample their inputs less frequently than logic gates and are thus more immune to glitches.

Fig. 8. Gate restructuring: two implementations of a four-input AND gate with inputs A, B, C, and D, each 1 with probability 0.5. (a) Chain implementation with intermediate outputs W and X and final output F (0→1 transition probabilities: X, 7/64; F, 15/256). (b) Tree implementation with intermediate outputs Y and Z (3/16 each) and final output F. Figure adapted from the Pennsylvania State University Microsystems Design Laboratory’s tutorial on low power design.

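The probability calculation behind Figure 8 can be reproduced exactly with rational arithmetic. The sketch assumes, as the example does, that the primary inputs are independent and equally likely to be 0 or 1, and additionally that successive cycles are independent, so that P(0→1) = P(0)·P(1) at each gate output; glitches are ignored.

```python
from fractions import Fraction


def and_prob(*ps: Fraction) -> Fraction:
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    result = Fraction(1)
    for p in ps:
        result *= p
    return result


def rise_prob(p_one: Fraction) -> Fraction:
    """P(0->1) between consecutive cycles = P(0) * P(1) (temporally independent)."""
    return (1 - p_one) * p_one


half = Fraction(1, 2)  # each primary input A, B, C, D is 1 with probability 1/2

# (a) Chain: W = A.B, X = W.C, F = X.D
w = and_prob(half, half)
x = and_prob(w, half)
f_chain = and_prob(x, half)
chain = [rise_prob(p) for p in (w, x, f_chain)]

# (b) Tree: Y = A.B, Z = C.D, F = Y.Z
y = and_prob(half, half)
z = and_prob(half, half)
f_tree = and_prob(y, z)
tree = [rise_prob(p) for p in (y, z, f_tree)]

print("chain:", [str(t) for t in chain], "total", sum(chain))  # total 91/256
print("tree: ", [str(t) for t in tree], "total", sum(tree))    # total 111/256
```

The per-gate values match the figure (X: 7/64, F: 15/256, Y and Z: 3/16), and the totals confirm that the chain switches less under these assumptions.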
3.1.5. Technology Mapping. Because of the huge number of possibilities and tradeoffs at the gate level, designers rely on tools to determine the most energy-optimal way of arranging gates and signals. Technology mapping [Chen et al. 2004; Li et al. 2004; Rutenbar et al. 2001] is the automated process of constructing a gate-level representation of a circuit subject to constraints such as area, delay, and power. Technology mapping for power relies on gate-level power models and a library that describes the available gates and their design constraints. Before a circuit can be described in terms of gates, it is initially represented at the logic level. The problem is to design the circuit out of logic gates in a way that will minimize the total power consumption under delay and cost constraints. This is an NP-hard directed acyclic graph (DAG) covering problem, and a common heuristic to solve it is to break the DAG representation of a circuit into a set of trees and find the optimal mapping for each subtree using standard tree-covering algorithms.

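The tree-covering step can be sketched as a memoized dynamic program. Everything here is hypothetical: the subject tree is built from NAND2 and INV base gates, the three-cell library and its power costs are invented, delay constraints are ignored, and the DAG-to-tree partitioning step is not shown. At each node the program tries every library pattern that matches there and keeps the cheapest total.

```python
from functools import lru_cache

# Hypothetical per-cell power costs in arbitrary units (not a real cell library).
CELL_POWER = {"INV": 1.0, "NAND2": 2.0, "AND2": 2.5}


@lru_cache(maxsize=None)
def best_cover(node):
    """Return (power, cells) for a minimum-power cover of a subject tree.

    A node is ("in", name), ("inv", child), or ("nand", left, right).
    """
    kind = node[0]
    if kind == "in":
        return 0.0, node[1]
    candidates = []
    if kind == "inv":
        child = node[1]
        cost, cells = best_cover(child)
        candidates.append((CELL_POWER["INV"] + cost, f"INV({cells})"))
        if child[0] == "nand":
            # The pattern INV(NAND2(x, y)) can also be covered by one AND2 cell.
            left_cost, left_cells = best_cover(child[1])
            right_cost, right_cells = best_cover(child[2])
            candidates.append((CELL_POWER["AND2"] + left_cost + right_cost,
                               f"AND2({left_cells}, {right_cells})"))
    elif kind == "nand":
        left_cost, left_cells = best_cover(node[1])
        right_cost, right_cells = best_cover(node[2])
        candidates.append((CELL_POWER["NAND2"] + left_cost + right_cost,
                           f"NAND2({left_cells}, {right_cells})"))
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return min(candidates)


def and2(x, y):
    """Write AND2 in the NAND2/INV base as INV(NAND2(x, y))."""
    return ("inv", ("nand", x, y))


A, B, C, D = (("in", name) for name in "ABCD")
cost, cells = best_cover(and2(and2(A, B), and2(C, D)))  # four-input AND tree
print(cost, cells)  # 7.5 AND2(AND2(A, B), AND2(C, D))
```

Because each subtree is solved once and reused, the cost is linear in the number of nodes times the number of patterns, which is what makes tree covering attractive once the DAG has been split into trees.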
`
ACM Computing Surveys, Vol. 37, No. 3, September 2005.

3.1.6. Low Power Flip-Flops. Flip-flops are the building blocks of small memories such as register files. A typical master-slave flip-flop consists of two latches: a master latch and a slave latch. The inputs of the master latch are the clock signal and the data. The inputs of the slave latch are the inverse of the clock signal and the data output by the master latch. Thus, when the clock signal is high, the master latch is turned on and the slave latch is turned off. In this phase, the master samples whatever inputs it receives and passes them to its output. The slave, however, does not sample its inputs but merely outputs whatever it most recently stored. On the falling edge of the clock, the master turns off and the slave turns on. The master thus saves its most recent input and stops sampling any further inputs, while the slave samples the new value it receives from the master and outputs it.

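The master-slave behavior just described can be sketched at the behavioral level as two level-sensitive latches: the master is transparent while the clock is high, and the slave is transparent while it is low. The class names and the convention of one `step` call per clock phase are assumptions for illustration; the model ignores all timing and is not a circuit-level description.

```python
class Latch:
    """Level-sensitive D latch: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d  # transparent: follow the input
        return self.q   # opaque: keep the stored value


class MasterSlaveFF:
    """Hypothetical model: master enabled by clk, slave enabled by the inverted clk."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def step(self, clk, d):
        m = self.master.update(clk, d)        # master samples d while clk is high
        return self.slave.update(not clk, m)  # slave copies the master while clk is low


ff = MasterSlaveFF()
trace = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]  # (clk, d) per step
print([ff.step(clk, d) for clk, d in trace])  # [0, 0, 0, 1, 1, 1, 0]
```

The output changes only at the falling edges (steps 4 and 7), each time taking the last value the master held while the clock was high; the data that arrives together with a falling edge is ignored.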
Besides the master-slave design, other common designs include the pulse-triggered flip-flop and the sense-amplifier flip-flop. All these designs share common sources of power consumption, namely power dissipated by the clock signal, power dissipated in internal switching activity (caused by the clock signal and by changes in data), and power dissipated when the outputs change.

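As a first-order illustration, assuming the standard dynamic power model P = ACV^2 f (activity factor A, switched capacitance C, supply voltage V, clock frequency f), the three sources just listed can be written as separate switched-capacitance terms. The subscripted symbols are introduced here for illustration only, and leakage is ignored.

```latex
P_{\mathrm{FF}} \approx \left( A_{\mathrm{clk}}\, C_{\mathrm{clk}}
  + A_{\mathrm{int}}\, C_{\mathrm{int}}
  + A_{\mathrm{out}}\, C_{\mathrm{out}} \right) V^{2} f
```

In a conventional flip-flop the clock term is incurred on every cycle even when the stored data does not change, which is why low power designs either gate the clock (reducing the effective A_clk) or reduce switching activity (A_int and A_out).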
Researchers have proposed several alternative low-power designs for flip-flops. Most of these approaches reduce either the switching activity or the power dissipated by the clock signal. One alternative is the self-gating flip-flop, which inhibits the clock signal to the flip-flop whenever the inputs would produce no change in the outputs. Strollo et al. [2000] have proposed two versions of this design. In the double-gated flip-flop, the master and slave latches each have their own clock-gating ci
