`
`IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2006
`
`Sleepy Stack Leakage Reduction
`
`Jun Cheol Park and Vincent J. Mooney III, Senior Member, IEEE
`
`Abstract—Leakage power consumption of current CMOS
technology is already a great challenge. The International Technology
Roadmap for Semiconductors projects that leakage power consumption
may come to dominate total chip power consumption as
`the technology feature size shrinks. Leakage is a serious problem
`particularly for CMOS circuits in nanoscale technology. We pro-
`pose a novel ultra-low leakage CMOS circuit structure which we
`call “sleepy stack.” Unlike many other previous approaches, sleepy
`stack can retain logic state during sleep mode while achieving
`ultra-low leakage power consumption. We apply the sleepy stack
`to generic logic circuits. Although the sleepy stack incurs some
`delay and area overhead, the sleepy stack technique achieves the
`lowest leakage power consumption among known state-saving
`leakage reduction techniques, thus, providing circuit designers
`with new choices to handle the leakage power problem.
Index Terms—Dual-$V_{th}$, low-leakage power dissipation, transistor stacking.
`
`I. INTRODUCTION
`
POWER consumption is one of the top concerns of VLSI
circuit design, for which CMOS is the primary technology.
Today’s focus on low power is not only because of the recent
growing demands of mobile applications. Even before the mobile
era, power consumption was a fundamental problem.
`To solve the power dissipation problem, many researchers have
`proposed different ideas from the device level to the architec-
`tural level and above. However, there is no universal way to
`avoid tradeoffs between power, delay, and area, and thus, de-
`signers are required to choose appropriate techniques that sat-
`isfy application and product needs.
`Power consumption of CMOS consists of dynamic and static
`components. Dynamic power is consumed when transistors are
`switching and static power is consumed regardless of transistor
switching. Dynamic power consumption was previously (at
0.18-μm technology and above) the single largest concern
`for low-power chip designers since dynamic power accounted
`for 90% or more of the total chip power. Therefore, many
`previously proposed techniques, such as voltage and frequency
`scaling, focused on dynamic power reduction. However, as the
feature size shrinks, e.g., to 0.09 and 0.065 μm, static power
`has become a great challenge for current and future technolo-
`gies. Based on the International Technology Roadmap for
`Semiconductors (ITRS) [1], Kim et al. report that subthreshold
`leakage power dissipation of a chip may exceed dynamic power
`dissipation at the 65-nm feature size [2].
`
One of the main reasons for the increase in leakage power
is the growth of subthreshold leakage. When the technology
feature size scales down, supply voltage and threshold voltage
also scale down, and subthreshold leakage power increases
exponentially as threshold voltage decreases. Furthermore, the
structure of short-channel devices lowers the threshold voltage
even further. In addition to subthreshold leakage, another
contributor to leakage power is gate-oxide leakage due to the
tunneling current through the gate-oxide insulator. Since
gate-oxide thickness may shrink as the channel length decreases, in
sub-0.1-μm technology, gate-oxide leakage power may be comparable
to subthreshold leakage power if not handled properly.
However, we assume other techniques will address gate-oxide
leakage; for example, high-κ dielectric gate insulators may provide
a solution to reduce gate leakage [2]. Therefore, this paper
focuses on reducing subthreshold leakage power consumption.
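To make the exponential dependence on threshold voltage concrete, the following is a minimal sketch. It assumes the common first-order textbook approximation that subthreshold current is proportional to exp(-Vth / (n·vT)); this expression, the slope factor `n = 1.5`, and the 300 K temperature are illustrative assumptions and are not taken from this paper.

```python
import math


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage kT/q in volts."""
    k_boltzmann = 1.380649e-23    # J/K
    q_electron = 1.602176634e-19  # C
    return k_boltzmann * temp_kelvin / q_electron


def leakage_ratio(delta_vth: float, n: float = 1.5,
                  temp_kelvin: float = 300.0) -> float:
    """Factor by which subthreshold leakage changes when Vth shifts by delta_vth volts.

    Assumes (not from the paper) I_sub proportional to exp(-Vth / (n * vT)),
    where n is an assumed subthreshold slope factor.
    """
    return math.exp(-delta_vth / (n * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    # Negative shifts model threshold-voltage scaling; positive shifts model high-Vth devices.
    for dv in (-0.10, -0.05, 0.05, 0.10):
        print(f"dVth = {dv:+.2f} V -> leakage x {leakage_ratio(dv):.3g}")
```

Under these assumed parameters, a 100-mV reduction in threshold voltage raises leakage by more than an order of magnitude, which is why threshold scaling makes static power grow so quickly.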
`In this paper, we provide a new circuit structure named
`“sleepy stack” as a remedy for static power consumption. The
`sleepy stack has a novel structure that uniquely combines the
`advantages of two major prior approaches, the sleep transistor
`technique and the forced stack technique. However, unlike the
`sleep transistor technique, the sleepy stack technique retains the
`original state; furthermore, unlike the forced stack technique,
the sleepy stack technique can utilize high-$V_{th}$ transistors to achieve up
to two orders of magnitude leakage power reduction compared
to the forced stack. Unfortunately, the sleepy stack technique
`comes with delay and area overheads. Therefore, the sleepy
`stack technique provides new Pareto points [3] to designers
`who require ultra-low leakage power consumption and are
`willing to pay some area and delay cost.
The main contributions of this paper are as follows: 1) introduction
of a sleepy stack structure that can reduce leakage power by
up to two orders of magnitude for circuits that require extremely
low leakage power consumption and 2) analysis of example
sleepy stack logic circuits across the design choices (transistor
scaling, threshold voltage, and transistor width) that circuit design
engineers can use to adopt the sleepy stack technique as necessary.
`This paper is organized as follows. In Section II, prior work
`about low-leakage logic design is discussed. In Section III, the
`sleepy stack structure is explained and an analytical delay model
`is discussed. In Section IV, an empirical methodology applying
`the sleepy stack to generic logic is explained. In Section V, the
experimental results of the sleepy stack for generic logic are presented.
In Section VI, conclusions are given.
`
`Manuscript received August 5, 2005; revised July 7, 2006.
`J. C. Park is with the Mobility Group, Intel Corporation, Folsom, CA 95630
`USA (e-mail: juncheol.park@intel.com).
`V. J. Mooney III is with the School of Electrical and Computer Engi-
`neering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail:
`mooney@ece.gatech.edu).
`Digital Object Identifier 10.1109/TVLSI.2006.886398
`
`II. PREVIOUS WORK
`
`In this section, we discuss previous low-power techniques
`that primarily target reducing leakage power consumption of
`CMOS circuits. Techniques for leakage power reduction can
`
`1063-8210/$20.00 © 2006 IEEE
`
`PARK AND MOONEY III: SLEEPY STACK LEAKAGE REDUCTION
`
`be grouped into the following two categories: 1) state-saving
`techniques where circuit state (present value) is retained and
`2) state-destructive techniques where the current Boolean output
`value of the circuit might be lost [2]. A state-saving technique
`has an advantage over a state-destructive technique in that with a
`state-saving technique the circuitry can immediately resume op-
`eration at a point much later in time without having to somehow
`regenerate state. We characterize each low-leakage technique
`according to this criterion.
State-destructive techniques cut off transistor (pull-up or pull-down
or both) networks from supply voltage or ground using
sleep transistors [4]. These types of techniques are also called
gated-$V_{dd}$ and gated-GND (note that a gated clock is generally
used for dynamic power reduction). Mutoh et al. propose a
technique they call multithreshold-voltage CMOS (MTCMOS)
[4], which adds high-$V_{th}$ sleep transistors between pull-up networks
and $V_{dd}$ and between pull-down networks and ground
while logic circuits use low-$V_{th}$ transistors in order to maintain
fast logic switching speeds. The sleep transistors are turned off
`when the logic circuits are not in use. By isolating the logic net-
`works using sleep transistors, the sleep transistor technique dra-
`matically reduces leakage power during sleep mode. However,
`the additional sleep transistors increase area and delay. Further-
`more, during sleep mode, the pull-up and pull-down networks
`will have floating values and, thus, will lose state. These floating
`values significantly impact the wake-up time and energy of the
`sleep technique due to the requirement to recharge transistors
`which lost state during sleep (this issue is nontrivial, especially
`for registers and flip-flops).
`To reduce the wake-up cost of the sleep transistor technique,
`the zigzag technique is introduced [5]. The zigzag technique
`reduces the wake-up overhead by choosing a particular circuit
`state (e.g., corresponding to a “reset”) and then, for the exact
`circuit state chosen, turning off the pull-down network for each
`gate whose output is high while conversely turning off the
`pull-up network for each gate whose output is low.
By applying the input pattern chosen prior to chip fabrication
before going to sleep, the zigzag technique can
prevent floating nodes. Although the zigzag technique retains the
particular state chosen prior to chip fabrication, any other arbitrary
state during regular operation is lost in power-down mode.
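The placement rule described above (cut off the pull-down network of a gate whose output is high, and the pull-up network of a gate whose output is low) can be sketched as follows. The function name and the dictionary-based gate description are hypothetical, chosen only for illustration.

```python
from typing import Dict


def zigzag_sleep_placement(gate_outputs: Dict[str, int]) -> Dict[str, str]:
    """Choose, per gate, which network receives the zigzag sleep transistor.

    gate_outputs maps a gate name to its output value (0 or 1) under the
    input vector that was fixed before fabrication.
    """
    placement = {}
    for gate, value in gate_outputs.items():
        if value not in (0, 1):
            raise ValueError(f"gate {gate!r}: output must be 0 or 1, got {value!r}")
        # Output high: the pull-up network holds the value, so the idle
        # pull-down network is cut off. Output low is the mirror case.
        placement[gate] = "pull-down" if value == 1 else "pull-up"
    return placement


if __name__ == "__main__":
    # Two cascaded inverters with a low input: outputs alternate 1, 0.
    print(zigzag_sleep_placement({"stage1": 1, "stage2": 0}))
```

Because the placement is fixed for one chosen input vector, only that state is held during sleep; any other state drives a gate through a network that has been cut off.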
`Another technique to reduce leakage power is transistor
`stacking. Transistor stacking exploits the stack effect;
`the
`stack effect results in substantial subthreshold leakage current
`reduction when two or more stacked transistors are turned off
`together. Narendra et al. study the effectiveness of the stack
`effect including effects from increasing the channel length [6].
`Since forced stacking of what previously was a single tran-
`sistor increases delay, Johnson et al. propose an algorithm that
`finds circuit input vectors that maximize stacked transistors of
`existing complex logic [7]. As a variation of the stacking tran-
`sistors, Hanchate and Ranganathan introduce self-controlled
`stacked transistors which are inserted between pull-up and
`pull-down networks and reduce leakage power by increasing
`internal resistance [8].
Fig. 1. (a) Forced stack technique applied to an inverter. (b) Sleep transistor
technique applied to an inverter.

Our sleepy stack structure can achieve more power savings
than the forced stack technique and the self-controlled stacked
transistors (e.g., 100× compared with 10× for the forced
stack transistor or the self-controlled stacked transistors).
Furthermore, the sleepy stack can save exact logic state unlike
gated-$V_{dd}$ and gated-GND techniques (conventional sleep
transistor technique) and the zigzag technique.
`In Section III, we will discuss the sleepy stack structure and
`sleepy stack operation.
`
`III. SLEEPY STACK STRUCTURE
We introduce a new leakage power reduction technique that we
name “sleepy stack.” The sleepy stack technique has a combined
structure of the forced stack technique and the sleep transistor
technique. However, unlike the sleep transistor technique, the
sleepy stack technique retains exact logic state when in sleep
mode; furthermore, unlike the forced stack technique, the sleepy
stack technique can utilize high-$V_{th}$ transistors without 5× (or
greater) delay penalties. Therefore, far better than any prior approach
known to the authors of this paper, the sleepy stack technique
can achieve ultra-low leakage power consumption while
saving state.
We first explain the structure of the sleepy stack technique
`using an inverter. Then, we describe the details of sleepy stack
`operation in active mode and sleep mode. The advantages of
`the sleepy stack technique over the forced stack technique and
`the sleep transistor technique are explored. Finally, we derive a
`first-order delay model that compares the sleepy stack technique
`to the forced stack technique analytically.
`
`A. Sleepy Stack Approach
In this section, we explain our sleepy stack structure in comparison
to the forced stack technique and the sleep transistor technique.
The details of the sleepy stack inverter are described as
`an example. Two operation modes, active mode and sleep mode,
`of the sleepy stack technique are explored.
`1) Sleepy Stack Structure: The sleepy stack structure has
`a combined structure of the forced stack and the sleep tran-
`sistor techniques. Although we mentioned these two techniques
`in Section II, we focus on explaining forced stack and sleep
`transistor inverters here for the purposes of comparison with a
`sleepy stack inverter. Fig. 1(a) depicts a forced stack inverter and
`Fig. 1(b) depicts a sleep transistor inverter. The forced stack in-
`verter breaks existing transistors into two transistors and forces a
`stack structure to take advantage of the stack effect; this is shown
in Fig. 1(a). Meanwhile, the sleep transistor inverter shown in
Fig. 1(b) isolates existing logic networks using sleep transistors.
The stack structure in Fig. 1(b) saves leakage power consumption
during sleep mode. This sleep transistor technique
frequently uses high-$V_{th}$ sleep transistors (the transistors
controlled by $S$ and $S'$) to achieve larger leakage power reduction.

Fig. 2. (a) Sleepy stack inverter with W/L of each transistor and active mode
$S$, $S'$ assertion. (b) Sleep mode $S$, $S'$ assertion.

The sleepy stack technique has a structure merging the forced
stack technique and the sleep transistor technique. Fig. 2 shows
a sleepy stack inverter. The sleepy stack technique divides existing
transistors into two transistors, each typically with the
same width $W_1$, half the size of the original single transistor’s
width $W_0$ (i.e., $W_1 = W_0/2$), thus maintaining equivalent
input capacitance. The sleepy stack inverter in Fig. 2(a) uses
$W/L$ = … for the pull-up transistors and $W/L$ = … for the
pull-down transistors, while a conventional inverter with the
same input capacitance would use $W/L$ = … for the pull-up
transistor and $W/L$ = … for the pull-down transistor (assuming
…). Then sleep transistors are added in parallel to
one of the transistors in each set of two stacked transistors.
We use a transistor sized as half the width of the original transistor
(i.e., we use $W_0/2$) for the sleep transistor width of the
sleepy stack. Although we exclusively use $W_0/2$ for the width
of the sleep transistor, changing the sleep transistor width in
various ways may provide additional tradeoffs between delay,
power, and area. However, in this paper, we mainly focus on
applying the sleepy stack structure with $W_0/2$ sleep transistor
widths to generic logic circuits while varying technology feature
size, threshold voltage, and temperature. Please note that
halving transistor width is not possible for a circuit that uses
minimum size transistors. However, many circuits use nonminimum
size to gain driving strength. In any case, if we cannot
halve transistor width, then we simply use minimum width.
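The sizing rule just described (two stacked transistors of half the original width, a sleep transistor of the same half width, and a fallback to minimum width when halving is not possible) can be sketched as below. The class and function names, and the example widths, are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepyStackWidths:
    stacked: float  # width of each of the two stacked (series) transistors
    sleep: float    # width of the sleep transistor placed in parallel


def sleepy_stack_widths(original_width: float, min_width: float) -> SleepyStackWidths:
    """Widths used when one original transistor is replaced by a sleepy stack.

    Both values are in the same (arbitrary) length unit. If halving would go
    below the technology minimum, the minimum width is used instead.
    """
    if original_width <= 0 or min_width <= 0:
        raise ValueError("widths must be positive")
    half = max(original_width / 2.0, min_width)
    return SleepyStackWidths(stacked=half, sleep=half)


if __name__ == "__main__":
    print(sleepy_stack_widths(original_width=1.0, min_width=0.3))  # halving is possible
    print(sleepy_stack_widths(original_width=0.4, min_width=0.3))  # clamped to minimum
```

Keeping the stacked width at half the original is what preserves the gate's input capacitance; the clamped case gives up that equivalence, as the text notes.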
`2) Sleepy Stack Operation: Now we explain how the sleepy
`stack works during active mode and during sleep mode. Also,
`we explain leakage power savings using the sleepy stack struc-
`ture.
The sleep transistors of the sleepy stack operate similarly to the
sleep transistors used in the sleep transistor technique, in which
sleep transistors are turned on during active mode and turned
off during sleep mode. Fig. 2 depicts the sleepy stack operation
using a sleepy stack inverter. During active mode [Fig. 2(a)],
the sleep signals $S$ and $S'$ are asserted so that all sleep transistors
are turned on. This sleepy stack structure can potentially reduce
circuit delay in two ways. First, since the sleep transistors are always
on during active mode, the sleepy stack structure achieves
faster switching time than the forced stack structure; specifically,
in Fig. 2(a), at each sleep transistor drain, the voltage
value connected to the sleep transistor source is always ready
and available at the sleep transistor drain, and thus, current flow
is immediately available to the low-$V_{th}$ transistors connected to
the gate output regardless of the status of each transistor in parallel
to the sleep transistors. Furthermore, we can use high-$V_{th}$
transistors (which are slow but 1000× or so less leaky) for the
sleep transistors and the transistors parallel to the sleep transistors
(see Fig. 2) without incurring a large (e.g., 2× or more) delay
increase.

Fig. 3. (a) Inverter circuit schematic. (b) RC equivalent circuit.

During sleep mode [Fig. 2(b)], $S$ and $S'$ are asserted to the
opposite values, and so both of the sleep transistors are turned off.
Although the sleep transistors are turned off, the sleepy stack
structure maintains exact logic state. The leakage reduction of
the sleepy stack structure occurs in two ways. First, leakage
power is suppressed by high-$V_{th}$ transistors, which are applied
to the sleep transistors and the transistors parallel to the sleep
transistors. Second, stacked and turned off transistors induce
the stack effect [11], which also suppresses leakage power
consumption. By combining these two effects, the sleepy stack
structure achieves ultra-low leakage power consumption during
sleep mode while retaining exact logic state. The price for this,
however, is increased area.
`We will derive an analytical delay model of the sleepy stack
`inverter and compare the sleepy stack technique to the forced
`stack inverter in the next section. This analytical comparison
`of the next section, Section III-B, can be skipped if desired.
`The detailed experimental methodology and the results will be
`presented in Section IV.
`
`B. Analytical Comparison of Sleepy Stack Inverter Versus
`Forced Stack Inverter
`In this section, an analytical delay model of a sleepy stack
`inverter is explained and compared to a forced stack inverter,
`the best prior state-saving leakage reduction technique we could
`find.
Generally, the transistor delay of a conventional inverter
shown in Fig. 3 driving a load of $C_L$ can be expressed using
the following equation:

$$T_{d0} = C_L R_t \tag{1}$$

where $C_L$ is the load capacitance and $R_t$ is the transistor resistance.
$C_{in}$ in Fig. 3(b) indicates input capacitance. Although the
nonsaturation mode equation is complicated, we can predict the
adequate first-order gate delay from (1) [14].

Fig. 4. (a) Forced stack technique inverter circuit schematic. (b) RC equivalent
circuit.

Fig. 5. (a) Sleepy stack technique inverter schematic. (b) RC equivalent circuit.

Now we derive the delay of the inverter with the forced
stack technique shown in Fig. 4. Since we assume that we
break each existing transistor into two half sized transistors
(see Section III-A1), the resistance of each transistor of the
forced stack technique is doubled, i.e., $2R_t$, compared to the
standard inverter; furthermore, in this way, we can maintain
input capacitance equal to Fig. 3(b). In Fig. 4, $C_{x1}$ is internal
node capacitance between the two pull-down transistors. Using
the Elmore equation [10], we can express the delay of the
forced stack inverter as follows:

$$T_{d1} = (2R_t + 2R_t)\,C_L + 2R_t C_{x1} \tag{2}$$
$$\phantom{T_{d1}} = 4R_t C_L + 2R_t C_{x1}. \tag{3}$$

Similarly, we can depict the sleepy stack inverter and its
resistance-capacitance (RC) equivalent circuit as shown in Fig. 5.
Two extra sleep transistors are added and each sleep transistor
has a resistance of $2R_t$ (as discussed in Section III-A1, please
note that increasing sleep transistor width reduces the sleep transistor
resistance further; however, let us continue with the approach
of Section III-A). The internal node capacitance is $C_{x2}$.
Using the Elmore equation, we can derive the transistor delay
of the sleepy stack inverter as follows:

$$T_{d2} = (R_t + 2R_t)\,C_L + R_t C_{x2} \tag{4}$$
$$\phantom{T_{d2}} = 3R_t C_L + R_t C_{x2}. \tag{5}$$

We assume that the internal node capacitance $C_{x2}$ is 50%
larger than $C_{x1}$ because $C_{x2}$ is the capacitance from three
transistors connected, while $C_{x1}$ is the capacitance from two
transistors connected. Then

$$T_{d2} = 3R_t C_L + 1.5 R_t C_{x1} \tag{6}$$
$$\phantom{T_{d2}} = 0.75\,T_{d1}. \tag{7}$$

Therefore, $T_{d2}$ is 25% faster than $T_{d1}$ if we use the same $V_{th}$
for the forced stack inverter and the sleepy stack inverter.
Alternatively, we may increase $V_{th}$ of the sleepy stack inverter
and make the delay of the sleepy stack inverter and the delay of
the forced stack inverter the same.
Let us take an example. The gate delay of a CMOS circuit can
be expressed as shown in the following approximated equation:

$$T_d \propto \frac{C_L V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \tag{8}$$

where $T_d$, $V_{th}$, and $\alpha$ denote the gate delay in a CMOS circuit,
the threshold voltage, and velocity saturation index of a
transistor, respectively. Using (8), the delay of the forced stack
$T_{d1}$ and the delay of the sleepy stack $T_{d2}$ can be expressed as
follows:

$$T_{d1} = K_1 \frac{C_L V_{dd}}{(V_{dd} - V_{th1})^{\alpha}} \tag{9}$$

and

$$T_{d2} = K_2 \frac{C_L V_{dd}}{(V_{dd} - V_{th2})^{\alpha}} \tag{10}$$

where $K_1$ and $K_2$ are delay coefficients of the forced stack
inverter and the sleepy stack inverter, respectively. When the
threshold voltage of the forced stack $V_{th1}$ is the same as the
threshold voltage of the sleepy stack $V_{th2}$, we calculate
$K_2 = 0.75 K_1$ from (7). If we assume that $\alpha$ = …, $V_{dd}$ = … V,
and $V_{th1}$ = … V, by applying $V_{th2}$ = … V, we can make $T_{d2}$
equal to $T_{d1}$ of the forced stack inverter. This higher $V_{th2}$,
which is 69% higher than the $V_{th1}$ of the forced stack inverter,
can potentially result in large leakage power reduction (e.g., 10×).
In this section, we introduced the sleepy stack technique for
leakage power reduction. By combining the forced stack technique
and the sleep transistor technique, the sleepy stack can
achieve smaller transistor delay than the forced stack technique
while retaining state unlike the sleep transistor technique. The
main advantage of the sleepy stack approach is the ability to use
high-$V_{th}$ for both the sleep transistors and the transistors in parallel
with the sleep transistors. The increased threshold voltage
transistors of the sleepy stack technique potentially bring much
larger (10×) leakage power reduction than the forced stack
technique while achieving the same transistor delay. From the
analytical model of the sleepy stack inverter, we observe that
the sleepy stack inverter can reduce delay by 25%, which alternatively
can be used to increase $V_{th}$ by 69%. Using this increased
threshold voltage, the sleepy stack inverter can potentially
achieve a large (e.g., 10×) leakage power reduction compared
to the forced stack inverter.

Fig. 6. Chain of four inverters with W/L of each transistor.

In this section, we explained the sleepy stack structure and
sleepy stack operation. We also described a first-order delay
model of the sleepy stack (please note that all power and
delay results reported in Section V are based, however, on
HSPICE; see Section IV-C). In the next sections, we apply the
sleepy stack structure to generic logic circuits, explaining in
detail our methodology.
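The first-order delay comparison of Section III-B can be checked numerically with the sketch below. It rests on stated assumptions rather than the paper's own code: Elmore delays of the form 4·Rt·CL + 2·Rt·Cx1 for the forced stack and 3·Rt·CL + Rt·Cx2 for the sleepy stack (each half-width transistor has resistance 2·Rt, and the sleep transistor is taken to sit in parallel with the rail-side transistor), an alpha-power-law delay K·CL·Vdd/(Vdd − Vth)^α, and purely hypothetical numeric values for Rt, CL, Cx1, Vdd, Vth1, and α.

```python
def delay_forced_stack(r_t: float, c_l: float, c_x1: float) -> float:
    """Elmore delay: two series transistors of 2*R_t each, C_x1 between them."""
    return (2 * r_t + 2 * r_t) * c_l + 2 * r_t * c_x1


def delay_sleepy_stack(r_t: float, c_l: float, c_x2: float) -> float:
    """Elmore delay: 2*R_t in series with (2*R_t || 2*R_t) = R_t.

    Assumes the parallel pair is on the rail side of the internal node C_x2.
    """
    return (2 * r_t + r_t) * c_l + r_t * c_x2


def alpha_power_delay(k: float, c_l: float, vdd: float, vth: float, alpha: float) -> float:
    """First-order gate delay K * C_L * Vdd / (Vdd - Vth)^alpha."""
    return k * c_l * vdd / (vdd - vth) ** alpha


def matching_vth(vdd: float, vth1: float, alpha: float, k_ratio: float) -> float:
    """Vth2 that equalizes the two delays when K2 = k_ratio * K1."""
    return vdd - k_ratio ** (1.0 / alpha) * (vdd - vth1)


if __name__ == "__main__":
    r_t, c_l, c_x1 = 1.0, 1.0, 0.2      # arbitrary illustrative units
    t_fs = delay_forced_stack(r_t, c_l, c_x1)
    t_ss = delay_sleepy_stack(r_t, c_l, 1.5 * c_x1)  # C_x2 assumed 50% larger
    k_ratio = t_ss / t_fs
    print(f"sleepy / forced delay ratio: {k_ratio:.2f}")

    vdd, vth1, alpha = 1.0, 0.25, 1.4   # hypothetical values, not the paper's
    vth2 = matching_vth(vdd, vth1, alpha, k_ratio)
    print(f"Vth2 for equal delay: {vth2:.3f} V ({100 * (vth2 / vth1 - 1):.0f}% higher)")
```

The delay ratio is independent of the particular Rt, CL, and Cx1 chosen, whereas the threshold-voltage headroom depends strongly on the assumed Vdd, Vth1, and α.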
`
`IV. APPLYING SLEEPY STACK TO LOGIC CIRCUITS
`In this section, we first explain target benchmark circuits we
`use focusing on generic logic to evaluate our sleepy stack tech-
`nique [11]. Then we explain low-leakage techniques we con-
`sider for purposes of comparison; although the basic ideas of the
`compared techniques have been covered in Section II, this sec-
`tion will give detailed structure with transistor sizing for each
`prior technique to be compared to our sleepy stack approach.
`Finally, we explain experimental methodology that we use to
`compare our technique to the previous techniques we consider.
`
`A. Benchmark Circuits
`To show that the sleepy stack technique is applicable to gen-
`eral logic design, we choose three benchmark circuits, which
`are as follows: 1) a chain of 4 inverters; 2) a 4:1 multiplexer;
`and 3) a 4-bit adder.
`1) Chain of Four Inverters: A chain of four inverters shown
`in Fig. 6 is chosen because an inverter is one of the most basic
CMOS circuits and is typically used to study circuit characteristics.
We size each transistor of the inverter to have equal rise and
fall times in each stage. Instead of using the minimum possible
size of the transistor in a given technology, we use $W/L$ = …
for pMOS and $W/L$ = … for nMOS transistors. Please refer
to [12] for a layout of the chain of four inverters in TSMC
0.18-μm technology using the widths shown in Fig. 6; note that
in Fig. 6, for 0.18-μm technology, all pMOS transistors have
$W$ = … μm and $L$ = … μm while all nMOS transistors
have $W$ = … μm and $L$ = … μm.
2) 4:1 Multiplexer: A possible implementation of a 4:1 multiplexer
is shown in Fig. 7, which has four data input signals,
two selection signals, and an enable signal. The
multiplexer consists of an inverter, two-input NAND gates, and
two-input NOR gates. All gates are sized to have rise and fall
times equal to an inverter with pMOS $W/L$ = … and nMOS
$W/L$ = …. Although the 4:1 multiplexer shown in Fig. 7 is not
the most efficient way to implement a 4:1 multiplexer, we use
the design of Fig. 7 to show that the sleepy stack is applicable
to a combination (a logic network) of typical CMOS
gates. Please refer to [12] for NAND and NOR layouts used in this
4:1 multiplexer.
3) 4-Bit Adder: By use of the 1-bit full adder shown in Fig. 8,
we implement a 4-bit adder. A full adder is an example of a
typical complex CMOS gate. In Fig. 8, the full adder has two
operand inputs and a carry input, and it produces two outputs.
The transistor sizing of the full adder is noted in Fig. 8.
Please refer to [12] for the full adder layout we use.

Fig. 7. 4:1 multiplexer with delay critical path along the dashed line.

These three benchmark circuits (chain of 4 inverters, 4:1 multiplexer,
and 4-bit adder) designed in a conventional CMOS
structure are used as our base case. In the next section, we explain
the low-leakage techniques to which we compare our
sleepy stack technique. These three benchmark circuits are also
implemented using the low-leakage techniques explained in the
next section, Section IV-B.
`
`B. Prior Low-Leakage Techniques Considered for
`Comparison Purposes
`
`The sleepy stack technique is compared to a conventional
`CMOS approach, which is our base case, and three other well-
`known previous approaches, i.e., the forced stack, sleep, and
`zigzag techniques explained in Section II. We also explore the
impact of $V_{th}$ and transistor width on the sleepy stack technique.
`1) Base Case: In this paper, we use the phrase “base case”
`to refer to the conventional CMOS technique shown in Fig. 9
`and described in a classic textbook by Weste and Eshraghian
`[13]. Fig. 9 shows a pull-up network and a pull-down network
`using as few transistors as possible to implement the Boolean
`logic function desired. The base case of a chain of four inverters
`is sized as explained in Section IV-A1. The base case of a 4:1
`multiplexer is sized as explained in Section IV-A2. The base
`case of a 4-bit adder is sized as explained in Section IV-A3.
`2) Sleepy Stack Technique: Fig. 10 shows the sleepy stack
`technique applied to a conventional CMOS design. When we
`apply the sleepy stack technique, we replace each existing tran-
`sistor with two half sized transistors and add one extra sleep
transistor as shown in Fig. 10. If dual-$V_{th}$ values are available,
high-$V_{th}$ transistors are used for sleep transistors and transistors
that are parallel to the sleep transistors.
3) Forced Stack Technique: Fig. 11 shows the forced stack
technique, which forces a stack structure by breaking down an
existing transistor into two half size transistors. When we apply
the forced stack technique, we replace each existing transistor
with two half sized transistors as shown in Fig. 11.

Fig. 8. 1-bit full adder with W/L of each transistor.

Fig. 9. Base case (conventional CMOS) circuit structure.

4) Sleep Transistor Technique: The sleep transistor technique
shown in Fig. 12 uses sleep transistors between both
$V_{dd}$ and the pull-up network as well as between ground and the
pull-down network. Generally, the width/length ($W/L$) ratio is sized
based on a tradeoff between area, leakage reduction, and delay.
For simplicity, we size the sleep transistor to the size of the
largest transistor in the network (pull-up or pull-down) connected
to the sleep transistor. The size noted in Fig. 12 shows
an example when the sleep transistors are applied to one of the
inverters from Fig. 6. The pMOS and nMOS sleep transistors in
Fig. 12 have $W/L$ = … and $W/L$ = …, respectively, because
the sizes of the pull-up and pull-down transistors in Fig. 6 are
$W/L$ = … and $W/L$ = …, respectively. If dual-$V_{th}$ values are
available, high-$V_{th}$ transistors are used for sleep transistors.

Fig. 10. Sleepy stack technique circuit structure.

5) Zigzag Technique: The zigzag technique in Fig. 13 uses
one sleep transistor in each logic stage either in the pull-up or
pull-down network according to a particular input pattern. In this
paper, we use an input vector that can achieve the lowest measured
(simulated) leakage power consumption. Then, we either
assign a sleep transistor to the pull-down network if the output
is “1” or else assign a sleep transistor to the pull-up network
if the output is “0.” For Fig. 13, we assume that the output of
the first stage is “1” and the output of the second stage is “0”
when minimum leakage inputs are asserted. Therefore, we apply
a pull-down sleep transistor for the first stage and a pull-up sleep
transistor for the second stage. Similar to the sleep transistor
technique, we size the sleep transistors to the size of the largest
transistor in the network (pull-up or pull-down) connected to
the sleep transistor. The transistor sizing in Fig. 13 shows an
example where the zigzag technique is applied to two inverters
from Fig. 6. If dual-$V_{th}$ values are available, high-$V_{th}$ transistors
are used for the sleep transistors.

Fig. 11. Forced stack technique circuit structure.

Fig. 12. Sleep transistor technique circuit structure.

Fig. 13. Zigzag technique circuit structure.

`The low-leakage techniques explained in this section,
`Section IV-B, are implemented using the three benchmark
`circuits described in Section IV-A. In the next section, we
`explain our experimental methodology.
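As a rough way to compare the structural overhead of the techniques in Section IV-B, the sketch below counts transistors per technique. The multipliers follow the descriptions in the text (two half-size transistors per original transistor for the forced stack, plus one sleep transistor for the sleepy stack); giving every logic stage its own sleep transistors for the sleep and zigzag techniques is an assumption made here for illustration. Transistor count is not area, since widths differ between techniques.

```python
def transistor_count(technique: str, base_transistors: int, stages: int = 1) -> int:
    """Transistors needed when a technique is applied to a conventional CMOS design.

    base_transistors: transistor count of the base case.
    stages: number of logic stages (gates); each stage is assumed to get its
            own sleep transistor(s) for the sleep and zigzag techniques.
    """
    if base_transistors < 0 or stages < 1:
        raise ValueError("need base_transistors >= 0 and stages >= 1")
    if technique == "base":
        return base_transistors
    if technique == "forced_stack":
        return 2 * base_transistors           # each transistor split in two
    if technique == "sleepy_stack":
        return 3 * base_transistors           # split in two, plus one sleep transistor
    if technique == "sleep":
        return base_transistors + 2 * stages  # one pMOS and one nMOS sleep transistor
    if technique == "zigzag":
        return base_transistors + stages      # one sleep transistor per stage
    raise ValueError(f"unknown technique: {technique!r}")


if __name__ == "__main__":
    # Chain of four inverters: 8 transistors in 4 stages.
    for name in ("base", "forced_stack", "sleepy_stack", "sleep", "zigzag"):
        print(name, transistor_count(name, base_transistors=8, stages=4))
```

For the chain of four inverters this gives 8, 16, 24, 16, and 12 transistors, respectively, which illustrates why the sleepy stack carries the largest transistor-count overhead of the five.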
`
C. Experimental Methodology

The implemented circuits are simulated to measure delay,
power, and area. For power measurement, we consider both
dynamic power and static power. We first explain experimental
infrastructure, and then we explain detailed measurement
methodology.

Fig. 14. Experimental flow with $V_{dd}$ of each process technology.

1) Simulation Setup: We use an empirical methodology to
evaluate the five techniques, which are the base case, zigzag,
sleep, stack, and sleepy stack techniques. Each benchmark circuit
implemented using each of the five techniques is evaluated
in terms of delay, dynamic power, static power, and area. Our
experimental procedure, which is shown in Fig. 14, is as follows.
We first design each target benchmark circuit with each specific
technique using Cadence Virtuoso, a custom layout tool,1 and
the North Carolina State University (NCSU) Cadence design kit
targeting TSMC 0.18-μm technology.2 When we design a circuit
using Cadence Virtuoso, we implement schematics as well
as layouts. Then, we extract schematics from layout to obtain
transistor circuit netlists. The extracted netlists are fed into the
HSPICE simulation to estimate delay and power of the target
benchmark designed with a specific technique; we use Synopsys
HSPICE.3
We use TSMC 0.18-μm parameters obtained from MOSIS,4
and we also use the Predictive Technology Model (PTM) parameters
for the technologies below 0.18 μm in order to estimate
the changes in power and delay as technology shrinks,5 [14].
The chosen technologies, i.e., 0.07, 0.10, 0.13, and 0.18 μm,
use supply voltages of 0.8, 1.0, 1.3, and 1.8 V, respectively. We
assume that only a single supply voltage is used in the chip designs
we target. We do consider both single- and dual-$V_{th}$ technology
for the sleep, zigzag, and sleepy stack techniques. For the
forced stack technique, we apply high-$V_{th}$ to one of the stacked
transistors while fixing the technology to 0.07 μm to observe
delay and leakage variations (we find that high-$V_{th}$ causes a
dramatic, greater than 5×, delay increase with the forced stack
technique; see Section V-B). For the logic circuits, we set all
high-$V_{th}$ transistors to have a $V_{th}$ 2.0× higher than the $V_{th}$ of a
normal transistor (low-$V_{th}$).

1Cadence Design Systems. [Online]. Available: http://www.cadence.com
2NC State Univ. Cadence Tool Information. [Online]. Available: http://www.
cadence.ncsu.edu
3Synopsys Incorporated. [Online]. Available: http://www.synopsys.com
4The MOSIS Service. [Online]. Available: http://www.mosis.org

Fig. 15. Inputs and the critical path (dashed line) for 4-bit adder delay
measurement.

Fig. 16. Waveforms of 1-bit adder for dynamic power measurement.

`2) Delay: We measure the worst case propagation delay of
`each benchmark. Input vectors and input and output triggers are
`chosen to measure delay across a given circuit’s critical path.
`The propagation delay is measured between the trigger input
`edge reaching 50% of the supply voltage value and the circuit
`output edge reaching 50% of the supply voltage value. Input
`waveforms have a 4-ns period (i.e., a 250-MHz rate) and rise
and fall times of 100 ps. We use … as the output load capacitance.
`For the chain of four inverters, we measure two different prop-
`agation delay values: one when an input goes high and another
`when an input goes low. We take the larger value as the worst
`case propagation delay of the chain of four inverters.
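The 50%-to-50% propagation delay measurement described above can be sketched on sampled waveforms as follows. This is an illustrative post-processing routine with hypothetical function names and made-up sample data; it is not the HSPICE measurement setup used in the paper.

```python
from typing import Sequence


def crossing_time(times: Sequence[float], volts: Sequence[float], level: float) -> float:
    """First time the sampled waveform reaches `level`, using linear interpolation."""
    if len(times) != len(volts):
        raise ValueError("times and volts must have the same length")
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 == level:
            return times[i]
        if (v0 < level < v1) or (v0 > level > v1):
            frac = (level - v0) / (v1 - v0)
            return times[i] + frac * (times[i + 1] - times[i])
    if volts and volts[-1] == level:
        return times[-1]
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times: Sequence[float], v_in: Sequence[float],
                      v_out: Sequence[float], vdd: float) -> float:
    """Delay between the input and output edges at 50% of the supply voltage."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def worst_case_delay(delay_input_rising: float, delay_input_falling: float) -> float:
    """Worst case is the larger of the two measured propagation delays."""
    return max(delay_input_rising, delay_input_falling)


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]        # arbitrary time units
    vin = [0.0, 0.0, 1.8, 1.8, 1.8]      # input rises between t = 1 and t = 2
    vout = [1.8, 1.8, 1.8, 0.0, 0.0]     # output falls between t = 2 and t = 3
    print(propagation_delay(t, vin, vout, vdd=1.8))
```

`crossing_time` returns only the first crossing, so a waveform containing several edges would need to be windowed around the edge of interest before calling it.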
For the 4:1 multiplexer, we measure the worst case propagation
delay of the path …-…-NAND-NOR-NOR-NAND-output
shown in Fig. 7 (note that several other paths exist with equal
`delay). We measure this critical path delay when the output
`changes f