`IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2006
`
`Sleepy Stack Leakage Reduction
`
`Jun Cheol Park and Vincent J. Mooney III, Senior Member, IEEE
`
`Abstract—Leakage power consumption of current CMOS
`technology is already a great challenge. International Technology
`Roadmap for Semiconductors projects that leakage power con-
`sumption may come to dominate total chip power consumption as
`the technology feature size shrinks. Leakage is a serious problem
`particularly for CMOS circuits in nanoscale technology. We pro-
`pose a novel ultra-low leakage CMOS circuit structure which we
`call “sleepy stack.” Unlike many other previous approaches, sleepy
`stack can retain logic state during sleep mode while achieving
`ultra-low leakage power consumption. We apply the sleepy stack
`to generic logic circuits. Although the sleepy stack incurs some
`delay and area overhead, the sleepy stack technique achieves the
`lowest leakage power consumption among known state-saving
`leakage reduction techniques, thus, providing circuit designers
`with new choices to handle the leakage power problem.
Index Terms—Dual-$V_{th}$, low-leakage power dissipation, transistor stacking.
`
`I. INTRODUCTION
`
POWER consumption is one of the top concerns of VLSI circuit design, for which CMOS is the primary technology.
Today's focus on low power is not only because of the recent growing demands of mobile applications. Even before the mobile era, power consumption was already a fundamental problem.
`To solve the power dissipation problem, many researchers have
`proposed different ideas from the device level to the architec-
`tural level and above. However, there is no universal way to
`avoid tradeoffs between power, delay, and area, and thus, de-
`signers are required to choose appropriate techniques that sat-
`isfy application and product needs.
`Power consumption of CMOS consists of dynamic and static
`components. Dynamic power is consumed when transistors are
`switching and static power is consumed regardless of transistor
switching. Dynamic power consumption was previously (at 0.18-µm technology and above) the single largest concern for low-power chip designers since dynamic power accounted for 90% or more of the total chip power. Therefore, many previously proposed techniques, such as voltage and frequency scaling, focused on dynamic power reduction. However, as the feature size shrinks, e.g., to 0.09 and 0.065 µm, static power
`has become a great challenge for current and future technolo-
`gies. Based on the International Technology Roadmap for
`Semiconductors (ITRS) [1], Kim et al. report that subthreshold
`leakage power dissipation of a chip may exceed dynamic power
`dissipation at the 65-nm feature size [2].
`
One of the main reasons for the leakage power increase is the increase of subthreshold leakage power. When technology feature size scales down, supply voltage and threshold voltage also scale down. Subthreshold leakage power increases exponentially as threshold voltage decreases. Furthermore, the structure of the short channel device lowers the threshold voltage even further. In addition to subthreshold leakage, another contributor to leakage power is gate-oxide leakage power due to the tunneling current through the gate-oxide insulator. Since gate-oxide thickness may reduce as the channel length decreases, in sub-0.1-µm technology, gate-oxide leakage power may be comparable to subthreshold leakage power if not handled properly. However, we assume other techniques will address gate-oxide leakage; for example, high-k dielectric gate insulators may provide a solution to reduce gate leakage [2]. Therefore, this paper
`focuses on reducing subthreshold leakage power consumption.
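The exponential dependence of subthreshold leakage on threshold voltage can be illustrated with the textbook first-order model $I_{sub} \propto \exp(-V_{th}/(n V_T))$ at zero gate bias. The sketch below is illustrative only: the slope factor `n = 1.5`, the 300 K temperature, and the example threshold voltages are assumptions chosen for the illustration, not values taken from this paper.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(vth_low, vth_high, n=1.5, temp_kelvin=300.0):
    """Ratio I_sub(vth_low) / I_sub(vth_high) under the first-order model
    I_sub ~ exp(-V_th / (n * V_T)); n is an assumed slope factor."""
    return math.exp((vth_high - vth_low) / (n * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    # Hypothetical example: lowering V_th from 0.4 V to 0.2 V.
    print(f"leakage grows by about {leakage_ratio(0.2, 0.4):.0f}x")
```

Under these assumed parameters, a 0.2-V reduction in threshold voltage raises leakage by roughly two orders of magnitude, which is why threshold-voltage scaling makes subthreshold leakage a dominant concern.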
`In this paper, we provide a new circuit structure named
`“sleepy stack” as a remedy for static power consumption. The
`sleepy stack has a novel structure that uniquely combines the
`advantages of two major prior approaches, the sleep transistor
technique and the forced stack technique. However, unlike the sleep transistor technique, the sleepy stack technique retains the original state; furthermore, unlike the forced stack technique, the sleepy stack technique can utilize high-$V_{th}$ transistors to achieve up to two orders of magnitude leakage power reduction compared to the forced stack. Unfortunately, the sleepy stack technique
`comes with delay and area overheads. Therefore, the sleepy
`stack technique provides new Pareto points [3] to designers
`who require ultra-low leakage power consumption and are
`willing to pay some area and delay cost.
The main contributions of this paper are as follows: 1) introduction of a sleepy stack structure that can reduce leakage power by up to two orders of magnitude for circuits that require extremely low leakage power consumption and 2) analysis of example sleepy stack logic circuits in terms of the design parameters (transistor scaling, threshold voltage, and transistor width) that circuit design engineers can vary to adopt the sleepy stack technique as necessary.
`This paper is organized as follows. In Section II, prior work
`about low-leakage logic design is discussed. In Section III, the
`sleepy stack structure is explained and an analytical delay model
`is discussed. In Section IV, an empirical methodology applying
the sleepy stack to generic logic is explained. In Section V, the experimental results of the sleepy stack for generic logic are presented. In Section VI, conclusions are given.
`
`Manuscript received August 5, 2005; revised July 7, 2006.
`J. C. Park is with the Mobility Group, Intel Corporation, Folsom, CA 95630
`USA (e-mail: juncheol.park@intel.com).
`V. J. Mooney III is with the School of Electrical and Computer Engi-
`neering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail:
`mooney@ece.gatech.edu).
`Digital Object Identifier 10.1109/TVLSI.2006.886398
`
`II. PREVIOUS WORK
`
`In this section, we discuss previous low-power techniques
`that primarily target reducing leakage power consumption of
`CMOS circuits. Techniques for leakage power reduction can
`
`1063-8210/$20.00 © 2006 IEEE
`PARK AND MOONEY III: SLEEPY STACK LEAKAGE REDUCTION
`be grouped into the following two categories: 1) state-saving
`techniques where circuit state (present value) is retained and
`2) state-destructive techniques where the current Boolean output
`value of the circuit might be lost [2]. A state-saving technique
`has an advantage over a state-destructive technique in that with a
`state-saving technique the circuitry can immediately resume op-
`eration at a point much later in time without having to somehow
`regenerate state. We characterize each low-leakage technique
`according to this criterion.
`State-destructive techniques cut off transistor (pull-up or pull-
`down or both) networks from supply voltage or ground using
sleep transistors [4]. These types of techniques are also called gated-$V_{dd}$ and gated-GND (note that a gated clock is generally used for dynamic power reduction). Mutoh et al. propose a technique they call multithreshold-voltage CMOS (MTCMOS) [4], which adds high-$V_{th}$ sleep transistors between pull-up networks and $V_{dd}$ and between pull-down networks and ground while logic circuits use low-$V_{th}$ transistors in order to maintain
`fast logic switching speeds. The sleep transistors are turned off
`when the logic circuits are not in use. By isolating the logic net-
`works using sleep transistors, the sleep transistor technique dra-
`matically reduces leakage power during sleep mode. However,
`the additional sleep transistors increase area and delay. Further-
`more, during sleep mode, the pull-up and pull-down networks
`will have floating values and, thus, will lose state. These floating
`values significantly impact the wake-up time and energy of the
`sleep technique due to the requirement to recharge transistors
`which lost state during sleep (this issue is nontrivial, especially
`for registers and flip-flops).
`To reduce the wake-up cost of the sleep transistor technique,
`the zigzag technique is introduced [5]. The zigzag technique
`reduces the wake-up overhead by choosing a particular circuit
`state (e.g., corresponding to a “reset”) and then, for the exact
`circuit state chosen, turning off the pull-down network for each
`gate whose output is high while conversely turning off the
`pull-up network for each gate whose output is low.
`By applying, prior to going to sleep, the particular input pat-
`tern chosen prior to chip fabrication, the zigzag technique can
`prevent floating. Although the zigzag technique retains the par-
`ticular state chosen prior to chip fabrication, any other arbitrary
`state during regular operation is lost in power-down mode.
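The zigzag placement rule described above can be sketched as a small function: for the one circuit state chosen before fabrication, each gate whose output is high gets its pull-down network gated, and each gate whose output is low gets its pull-up network gated. The function and gate names below are hypothetical and serve only to illustrate the rule.

```python
def zigzag_sleep_placement(gate_outputs):
    """Map each gate to the network that receives the sleep transistor.

    gate_outputs: dict of gate name -> output value (0 or 1) under the
    input pattern chosen for sleep mode.
    """
    placement = {}
    for gate, value in gate_outputs.items():
        if value not in (0, 1):
            raise ValueError(f"output of {gate} must be 0 or 1, got {value!r}")
        # Output high: the pull-up network conducts, so the idle pull-down
        # network is the one cut off. Output low: the reverse.
        placement[gate] = "pull-down" if value == 1 else "pull-up"
    return placement


def inverter_chain_outputs(chain_input, stages):
    """Output of each stage of an inverter chain for a given input bit."""
    outputs = {}
    value = chain_input
    for index in range(1, stages + 1):
        value = 1 - value
        outputs[f"inv{index}"] = value
    return outputs


if __name__ == "__main__":
    print(zigzag_sleep_placement(inverter_chain_outputs(0, 4)))
```

Because the placement is fixed in hardware for one chosen state, only that state survives sleep mode; any other state would leave some gate output undriven.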
Another technique to reduce leakage power is transistor stacking. Transistor stacking exploits the stack effect; the stack effect results in substantial subthreshold leakage current
`reduction when two or more stacked transistors are turned off
`together. Narendra et al. study the effectiveness of the stack
`effect including effects from increasing the channel length [6].
`Since forced stacking of what previously was a single tran-
`sistor increases delay, Johnson et al. propose an algorithm that
`finds circuit input vectors that maximize stacked transistors of
`existing complex logic [7]. As a variation of the stacking tran-
`sistors, Hanchate and Ranganathan introduce self-controlled
`stacked transistors which are inserted between pull-up and
`pull-down networks and reduce leakage power by increasing
`internal resistance [8].
`Our sleepy stack structure can achieve more power savings
`than the forced stack technique and the self-controlled stacked
transistors (e.g., 100× compared with 10× for the forced stack transistor or the self-controlled stacked transistors). Furthermore, the sleepy stack can save exact logic state unlike gated-$V_{dd}$ and gated-GND techniques (conventional sleep transistor technique) and the zigzag technique.
In Section III, we will discuss the sleepy stack structure and sleepy stack operation.

Fig. 1. (a) Forced stack technique applied to an inverter. (b) Sleep transistor technique applied to an inverter.
`
`III. SLEEPY STACK STRUCTURE
`We introduce our new leakage power reduction technique we
`name “sleepy stack.” The sleepy stack technique has a combined
`structure of the forced stack technique and the sleep transistor
`technique. However, unlike the sleep transistor technique, the
`sleepy stack technique retains exact logic state when in sleep
`mode; furthermore, unlike the forced stack technique, the sleepy
stack technique can utilize high-$V_{th}$ transistors without 5× (or greater) delay penalties. Therefore, far better than any prior approach known to the authors of this paper, the sleepy stack technique can achieve ultra-low leakage power consumption while
`saving state.
We first explain the structure of the sleepy stack technique
`using an inverter. Then, we describe the details of sleepy stack
`operation in active mode and sleep mode. The advantages of
`the sleepy stack technique over the forced stack technique and
`the sleep transistor technique are explored. Finally, we derive a
`first-order delay model that compares the sleepy stack technique
`to the forced stack technique analytically.
`
`A. Sleepy Stack Approach
In this section, we explain our sleepy stack structure in comparison to the forced stack technique and the sleep transistor technique. The details of the sleepy stack inverter are described as
`an example. Two operation modes, active mode and sleep mode,
`of the sleepy stack technique are explored.
`1) Sleepy Stack Structure: The sleepy stack structure has
`a combined structure of the forced stack and the sleep tran-
`sistor techniques. Although we mentioned these two techniques
`in Section II, we focus on explaining forced stack and sleep
`transistor inverters here for the purposes of comparison with a
`sleepy stack inverter. Fig. 1(a) depicts a forced stack inverter and
`Fig. 1(b) depicts a sleep transistor inverter. The forced stack in-
`verter breaks existing transistors into two transistors and forces a
`stack structure to take advantage of the stack effect; this is shown
`in Fig. 1(a). Meanwhile, the sleep transistor inverter shown in
Fig. 1(b) isolates existing logic networks using sleep transistors. The stack structure in Fig. 1(b) saves leakage power consumption during sleep mode. This sleep transistor technique frequently uses high-$V_{th}$ sleep transistors (the transistors controlled by $S$ and $S'$) to achieve larger leakage power reduction.

Fig. 2. (a) Sleepy stack inverter with W/L of each transistor and active mode $S$, $S'$ assertion. (b) Sleep mode $S$, $S'$ assertion.
`The sleepy stack technique has a structure merging the forced
`stack technique and the sleep transistor technique. Fig. 2 shows
a sleepy stack inverter. The sleepy stack technique divides existing transistors into two transistors, each typically with the same width, half the size of the original single transistor's width $W_0$ (i.e., $W_0/2$), thus maintaining equivalent input capacitance. The sleepy stack inverter in Fig. 2(a) uses the $W/L$ values annotated in the figure for the pull-up and pull-down transistors, each half of what a conventional inverter with the same input capacitance would use for its single pull-up and pull-down transistor. Then sleep transistors are added in parallel to one of the transistors in each set of two stacked transistors. We use a transistor sized as half the width of the original transistor (i.e., we use $W_0/2$) for the sleep transistor width of the sleepy stack. Although we exclusively use $W_0/2$ for the width of the sleep transistor, changing the sleep transistor width in various ways may provide additional tradeoffs between delay, power, and area. However, in this paper, we mainly focus on applying the sleepy stack structure with $W_0/2$ sleep transistor widths to generic logic circuits while varying technology feature size, threshold voltage, and temperature. Please note that
`halving transistor width is not possible for a circuit that uses
`minimum size transistors. However, many circuits use nonmin-
`imum size to gain driving strength. In any case, if we cannot
`halve transistor width, then we simply use minimum width.
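The sizing rule of this subsection can be sketched as follows: each original transistor of width $W_0$ becomes two series transistors of width $W_0/2$ plus one sleep transistor of width $W_0/2$, falling back to the minimum width when halving is not possible. The class name, function name, and the numeric widths in the usage example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepyStackSizing:
    stacked_width: float  # width of each of the two series transistors
    sleep_width: float    # width of the sleep transistor in parallel with one of them


def sleepy_stack_widths(original_width, min_width):
    """Apply the half-width rule, clamping to the technology minimum width."""
    if min_width <= 0:
        raise ValueError("min_width must be positive")
    if original_width < min_width:
        raise ValueError("original width is below the minimum width")
    half = max(original_width / 2.0, min_width)
    return SleepyStackSizing(stacked_width=half, sleep_width=half)


if __name__ == "__main__":
    # Hypothetical widths in micrometers.
    print(sleepy_stack_widths(1.0, 0.25))   # halving is possible
    print(sleepy_stack_widths(0.3, 0.25))   # clamped to the minimum width
```

When the clamp applies, the two series transistors together present more gate capacitance than the original transistor did, so the equal-input-capacitance property holds only when the width can actually be halved.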
`2) Sleepy Stack Operation: Now we explain how the sleepy
`stack works during active mode and during sleep mode. Also,
`we explain leakage power savings using the sleepy stack struc-
`ture.
The sleep transistors of the sleepy stack operate similarly to the sleep transistors used in the sleep transistor technique in which sleep transistors are turned on during active mode and turned off during sleep mode. Fig. 2 depicts the sleepy stack operation using a sleepy stack inverter. During active mode [Fig. 2(a)], the sleep signals $S$ and $S'$ are asserted as shown, and, thus, all sleep transistors are turned on. This sleepy stack structure can potentially reduce circuit delay in two ways. First, since the sleep transistors are always on during active mode, the sleepy stack structure achieves faster switching time than the forced stack structure; specifically, in Fig. 2(a), at each sleep transistor drain, the voltage value connected to the sleep transistor source is always ready and available at the sleep transistor drain, and thus, current flow is immediately available to the low-$V_{th}$ transistors connected to the gate output regardless of the status of each transistor in parallel to the sleep transistors. Furthermore, we can use high-$V_{th}$ transistors (which are slow but 1000× or so less leaky) for the sleep transistors and the transistors parallel to the sleep transistors (see Fig. 2) without incurring large (e.g., 2× or more) delay increase.

Fig. 3. (a) Inverter circuit schematic. (b) RC equivalent circuit.
During sleep mode [Fig. 2(b)], $S$ and $S'$ take the opposite values from active mode, and so both of the sleep transistors are turned off. Although the sleep transistors are turned off, the sleepy stack structure maintains exact logic state. The leakage reduction of the sleepy stack structure occurs in two ways. First, leakage power is suppressed by high-$V_{th}$ transistors, which are applied
`to the sleep transistors and the transistors parallel to the sleep
`transistors. Second, stacked and turned off transistors induce
`the stack effect [11], which also suppresses leakage power
`consumption. By combining these two effects, the sleepy stack
`structure achieves ultra-low leakage power consumption during
`sleep mode while retaining exact logic state. The price for this,
`however, is increased area.
`We will derive an analytical delay model of the sleepy stack
`inverter and compare the sleepy stack technique to the forced
`stack inverter in the next section. This analytical comparison
`of the next section, Section III-B, can be skipped if desired.
`The detailed experimental methodology and the results will be
`presented in Section IV.
`
`B. Analytical Comparison of Sleepy Stack Inverter Versus
`Forced Stack Inverter
`In this section, an analytical delay model of a sleepy stack
`inverter is explained and compared to a forced stack inverter,
`the best prior state-saving leakage reduction technique we could
`find.
Generally, the transistor delay of a conventional inverter shown in Fig. 3 driving a load of $C_L$ can be expressed using the following equation:

$$T_{d0} = C_L R_t \tag{1}$$

where $C_L$ is the load capacitance and $R_t$ is the transistor resistance. $C_{in}$ in Fig. 3(b) indicates input capacitance. Although the nonsaturation mode equation is complicated, we can predict the adequate first-order gate delay from (1) [14].

Now we derive the delay of the inverter with the forced stack technique shown in Fig. 4. Since we assume that we break each existing transistor into two half sized transistors (see Section III-A1), the resistance of each transistor of the forced stack technique is doubled, i.e., $2R_t$, compared to the standard inverter; furthermore, in this way, we can maintain input capacitance equal to Fig. 3(b). In Fig. 4, $C_{x1}$ is internal node capacitance between the two pull-down transistors. Using the Elmore equation [10], we can express the delay of the forced stack inverter as follows:

$$T_{d1} = (2R_t)\,C_{x1} + (2R_t + 2R_t)\,C_L \tag{2}$$
$$\phantom{T_{d1}} = 2R_t C_{x1} + 4R_t C_L. \tag{3}$$

Fig. 4. (a) Forced stack technique inverter circuit schematic. (b) RC equivalent circuit.

Similarly, we can depict the sleepy stack inverter and its resistance-capacitance (RC) equivalent circuit as shown in Fig. 5. Two extra sleep transistors are added and each sleep transistor has a resistance of $2R_t$ (as discussed in Section III-A1, please note that increasing sleep transistor width reduces the sleep transistor resistance further; however, let us continue with the approach of Section III-A). The internal node capacitance is $C_{x2}$. Using the Elmore equation, we can derive the transistor delay of the sleepy stack inverter as follows:

$$T_{d2} = (2R_t \parallel 2R_t)\,C_{x2} + \big((2R_t \parallel 2R_t) + 2R_t\big)\,C_L \tag{4}$$
$$\phantom{T_{d2}} = R_t C_{x2} + 3R_t C_L. \tag{5}$$

Fig. 5. (a) Sleepy stack technique inverter schematic. (b) RC equivalent circuit.

We assume that the internal node capacitance $C_{x2}$ is 50% larger than $C_{x1}$ because $C_{x2}$ is the capacitance from three transistors connected, while $C_{x1}$ is the capacitance from two transistors connected. Then

$$T_{d2} = 1.5R_t C_{x1} + 3R_t C_L \tag{6}$$
$$\phantom{T_{d2}} = 0.75\,T_{d1}. \tag{7}$$

Therefore, $T_{d2}$ is 25% faster than $T_{d1}$ if we use the same $V_{dd}$ and $V_{th}$ for the forced stack inverter and the sleepy stack inverter. Alternatively, we may increase $V_{th}$ of the sleepy stack inverter and make the delay of the sleepy stack inverter and the delay of the forced stack inverter the same.

Let us take an example. The gate delay of a CMOS circuit can be expressed as shown in the following approximated equation:

$$T_d \propto \frac{C_L V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \tag{8}$$

where $T_d$, $V_{th}$, and $\alpha$ denote the gate delay in a CMOS circuit, the threshold voltage, and velocity saturation index of a transistor, respectively. Using (8), the delay of the forced stack $T_{d1}$ and the delay of the sleepy stack $T_{d2}$ can be expressed as follows:

$$T_{d1} = \frac{K_1}{(V_{dd} - V_{th1})^{\alpha}} \tag{9}$$

and

$$T_{d2} = \frac{K_2}{(V_{dd} - V_{th2})^{\alpha}} \tag{10}$$

where $K_1$ and $K_2$ are delay coefficients of the forced stack inverter and the sleepy stack inverter, respectively. When the threshold voltage of the forced stack $V_{th1}$ is the same as the threshold voltage of the sleepy stack $V_{th2}$, we calculate $K_2 = 0.75K_1$ from (7). If we assume specific values for $V_{dd}$, $V_{th1}$, and $\alpha$, then by applying a suitably higher $V_{th2}$, we can make $T_{d2}$ equal to $T_{d1}$ of the forced stack inverter. This higher $V_{th2}$, which is 69% higher than $V_{th1}$, can potentially result in large leakage power reduction (e.g., 10×).

In this section, we introduced the sleepy stack technique for leakage power reduction. By combining the forced stack technique and the sleep transistor technique, the sleepy stack can achieve smaller transistor delay than the forced stack technique while retaining state unlike the sleep transistor technique. The main advantage of the sleepy stack approach is the ability to use high-$V_{th}$ for both the sleep transistors and the transistors in parallel with the sleep transistors. The increased threshold voltage transistors of the sleepy stack technique potentially bring much larger (e.g., 10×) leakage power reduction than the forced stack technique while achieving the same transistor delay. From the analytical model of the sleepy stack inverter, we observe that the sleepy stack inverter can reduce delay by 25%, which alternatively can be used to increase $V_{th}$ by 69%. Using this increased threshold voltage, the sleepy stack inverter can potentially achieve a large (e.g., 10×) leakage power reduction compared to the forced stack inverter.

In this section, we explained the sleepy stack structure and sleepy stack operation. We also described a first-order delay model of the sleepy stack (please note that all power and delay results reported in Section V are based, however, on HSPICE; see Section IV-C). In the next sections, we apply the sleepy stack structure to generic logic circuits, explaining in detail our methodology.

Fig. 6. Chain of four inverters with W/L of each transistor.
`
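The first-order delay comparison of Section III-B can be checked numerically. The sketch below assumes the RC model described there: each half-width transistor has resistance $2R_t$, the sleep transistor sits in parallel with the rail-side stacked transistor, and the sleepy stack internal node capacitance is 1.5 times that of the forced stack. It then solves the alpha-power delay expression for the sleepy stack threshold voltage that equalizes the two delays. The function names and all numeric inputs ($R_t$, capacitances, $V_{dd}$ = 1.8 V, $V_{th1}$ = 0.4 V, $\alpha$ = 1.3) are illustrative assumptions, not values confirmed by the text.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return (r_a * r_b) / (r_a + r_b)


def forced_stack_delay(r_t, c_x1, c_l):
    """Elmore delay of two series half-width transistors (2*R_t each)."""
    r_half = 2.0 * r_t
    return r_half * c_x1 + (r_half + r_half) * c_l


def sleepy_stack_delay(r_t, c_x1, c_l, cx_scale=1.5):
    """Elmore delay with a 2*R_t sleep transistor in parallel with the
    rail-side transistor; internal node capacitance is cx_scale * c_x1."""
    r_half = 2.0 * r_t
    r_rail = parallel(r_half, r_half)
    c_x2 = cx_scale * c_x1
    return r_rail * c_x2 + (r_rail + r_half) * c_l


def matched_vth2(vdd, vth1, alpha, k_ratio=0.75):
    """Solve K2/(vdd - vth2)**alpha == K1/(vdd - vth1)**alpha for vth2,
    where K2 = k_ratio * K1."""
    return vdd - (k_ratio ** (1.0 / alpha)) * (vdd - vth1)


if __name__ == "__main__":
    ratio = sleepy_stack_delay(1e3, 1e-15, 5e-15) / forced_stack_delay(1e3, 1e-15, 5e-15)
    vth2 = matched_vth2(vdd=1.8, vth1=0.4, alpha=1.3)
    print(f"delay ratio = {ratio:.2f}, matched V_th2 = {vth2:.3f} V")
```

The delay ratio is 0.75 regardless of the particular $R_t$, $C_{x1}$, and $C_L$ chosen, because both terms of the sleepy stack delay scale by the same factor. With the assumed voltages, the matched threshold comes out a little under 70% above $V_{th1}$, close to the 69% figure quoted in the text.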
`IV. APPLYING SLEEPY STACK TO LOGIC CIRCUITS
`In this section, we first explain target benchmark circuits we
`use focusing on generic logic to evaluate our sleepy stack tech-
`nique [11]. Then we explain low-leakage techniques we con-
`sider for purposes of comparison; although the basic ideas of the
`compared techniques have been covered in Section II, this sec-
`tion will give detailed structure with transistor sizing for each
`prior technique to be compared to our sleepy stack approach.
`Finally, we explain experimental methodology that we use to
`compare our technique to the previous techniques we consider.
`
`A. Benchmark Circuits
`To show that the sleepy stack technique is applicable to gen-
`eral logic design, we choose three benchmark circuits, which
`are as follows: 1) a chain of 4 inverters; 2) a 4:1 multiplexer;
`and 3) a 4-bit adder.
`1) Chain of Four Inverters: A chain of four inverters shown
`in Fig. 6 is chosen because an inverter is one of the most basic
CMOS circuits and is typically used to study circuit characteristics. We size each transistor of the inverter to have equal rise and fall times in each stage. Instead of using the minimum possible size of the transistor in a given technology, we use the $W/L$ values annotated in Fig. 6 for the pMOS and nMOS transistors. Please refer to [12] for a layout of the chain of four inverters in TSMC 0.18-µm technology using the widths shown in Fig. 6; note that in Fig. 6, for 0.18-µm technology, all pMOS transistors share one width $W$ and length $L$ while all nMOS transistors share another.
2) 4:1 Multiplexer: A possible implementation of a 4:1 multiplexer is shown in Fig. 7, which has four input signals, two selection signals, and an enable signal. The multiplexer consists of an inverter, two-input NAND gates, and two-input NOR gates. All gates are sized to have rise and fall times equal to those of an inverter with specified pMOS and nMOS $W/L$ ratios. Although the 4:1 multiplexer shown in Fig. 7 is not
`the most efficient way to implement a 4:1 multiplexer, we use
`the design of Fig. 7 to show that the sleepy stack can be ap-
`plicable to a combination of (a logic network of) typical CMOS
`gates. Please refer to [12] for NAND and NOR layouts used in this
`4:1 multiplexer.

Fig. 7. 4:1 multiplexer with delay critical path along the dashed line.

3) 4-Bit Adder: By use of the 1-bit full adder shown in Fig. 8, we implement a 4-bit adder. A full adder is an example of a typical complex CMOS gate. In Fig. 8, the full adder takes two operand inputs and a carry input and produces sum and carry outputs. The transistor sizing of the full adder is noted in Fig. 8. Please refer to [12] for the full adder layout we use.
`These three benchmark circuits (chain of 4 inverters, 4:1 mul-
`tiplexer, and 4-bit adder) designed in a conventional CMOS
structure are used as our base case. In the next section, we explain the low-leakage techniques to which we compare our sleepy stack technique. These three benchmark circuits are also
`implemented using the low-leakage techniques explained in the
`next section, Section IV-B.
`
`B. Prior Low-Leakage Techniques Considered for
`Comparison Purposes
`
`The sleepy stack technique is compared to a conventional
`CMOS approach, which is our base case, and three other well-
`known previous approaches, i.e., the forced stack, sleep, and
`zigzag techniques explained in Section II. We also explore the
impact of $V_{th}$ and transistor width on the sleepy stack technique.
`1) Base Case: In this paper, we use the phrase “base case”
`to refer to the conventional CMOS technique shown in Fig. 9
`and described in a classic textbook by Weste and Eshraghian
`[13]. Fig. 9 shows a pull-up network and a pull-down network
`using as few transistors as possible to implement the Boolean
`logic function desired. The base case of a chain of four inverters
`is sized as explained in Section IV-A1. The base case of a 4:1
`multiplexer is sized as explained in Section IV-A2. The base
`case of a 4-bit adder is sized as explained in Section IV-A3.
`2) Sleepy Stack Technique: Fig. 10 shows the sleepy stack
`technique applied to a conventional CMOS design. When we
`apply the sleepy stack technique, we replace each existing tran-
`sistor with two half sized transistors and add one extra sleep
transistor as shown in Fig. 10. If dual-$V_{th}$ values are available, high-$V_{th}$ transistors are used for sleep transistors and transistors that are parallel to the sleep transistors.
3) Forced Stack Technique: Fig. 11 shows the forced stack technique, which forces a stack structure by breaking down an existing transistor into two half size transistors. When we apply the forced stack technique, we replace each existing transistor with two half sized transistors as shown in Fig. 11.

Fig. 8. 1-bit full adder with W/L of each transistor.

Fig. 9. Base case (conventional CMOS) circuit structure.
4) Sleep Transistor Technique: The sleep transistor technique shown in Fig. 12 uses sleep transistors between both $V_{dd}$ and the pull-up network as well as between Gnd and the pull-down network. Generally, the width/length ($W/L$) ratio is sized based on a tradeoff between area, leakage reduction, and delay. For simplicity, we size the sleep transistor to the size of the largest transistor in the network (pull-up or pull-down) connected to the sleep transistor. The size noted in Fig. 12 shows an example when the sleep transistors are applied to one of the inverters from Fig. 6. The pMOS and nMOS sleep transistors in Fig. 12 have the same $W/L$ as the pull-up and pull-down transistors in Fig. 6, respectively. If dual-$V_{th}$ values are available, high-$V_{th}$ transistors are used for sleep transistors.

Fig. 10. Sleepy stack technique circuit structure.
5) Zigzag Technique: The zigzag technique in Fig. 13 uses one sleep transistor in each logic stage either in the pull-up or pull-down network according to a particular input pattern. In this paper, we use an input vector that can achieve the lowest measured (simulated) leakage power consumption. Then, we either assign a sleep transistor to the pull-down network if the output is “1” or else assign a sleep transistor to the pull-up network if the output is “0.” For Fig. 13, we assume that the output of the first stage is “1” and the output of the second stage is “0” when minimum leakage inputs are asserted. Therefore, we apply a pull-down sleep transistor for the first stage and a pull-up sleep transistor for the second stage. Similar to the sleep transistor technique, we size the sleep transistors to the size of the largest transistor in the network (pull-up or pull-down) connected to the sleep transistor. The transistor sizing in Fig. 13 shows an example where the zigzag technique is applied to two inverters from Fig. 6. If dual-$V_{th}$ values are available, high-$V_{th}$ transistors are used for the sleep transistors.

Fig. 11. Forced stack technique circuit structure.

Fig. 12. Sleep transistor technique circuit structure.

Fig. 13. Zigzag technique circuit structure.
`The low-leakage techniques explained in this section,
`Section IV-B, are implemented using the three benchmark
`circuits described in Section IV-A. In the next section, we
`explain our experimental methodology.
`
C. Experimental Methodology

The implemented circuits are simulated to measure delay, power, and area. For power measurement, we consider both dynamic power and static power. We first explain experimental infrastructure, and then we explain detailed measurement methodology.

1) Simulation Setup: We use an empirical methodology to evaluate the five techniques, which are the base case, zigzag, sleep, stack, and sleepy stack techniques. Each benchmark circuit implemented using each of the five techniques is evaluated in terms of delay, dynamic power, static power, and area. Our experimental procedure, which is shown in Fig. 14, is as follows. We first design each target benchmark circuit with each specific technique using Cadence Virtuoso, a custom layout tool,1 and the North Carolina State University (NCSU) Cadence design kit targeting TSMC 0.18-µm technology.2 When we design a circuit using Cadence Virtuoso, we implement schematics as well as layouts. Then, we extract schematics from layout to obtain transistor circuit netlists. The extracted netlists are fed into the HSPICE simulation to estimate delay and power of the target benchmark designed with a specific technique; we use Synopsys HSPICE.3

We use TSMC 0.18-µm parameters obtained from MOSIS,4 and we also use the Predictive Technology Model (PTM) parameters for the technologies below 0.18 µm in order to estimate the changes in power and delay as technology shrinks,5 [14]. The chosen technologies, i.e., 0.07, 0.10, 0.13, and 0.18 µm, use supply voltages of 0.8, 1.0, 1.3, and 1.8 V, respectively. We assume that only a single supply voltage is used in the chip designs we target. We do consider both single- and dual-$V_{th}$ technology for the sleep, zigzag, and sleepy stack techniques. For the forced stack technique, we apply high-$V_{th}$ to one of the stacked transistors while fixing the technology to 0.07 µm to observe delay and leakage variations (we find that high-$V_{th}$ causes a dramatic, greater than 5×, delay increase with the forced stack technique; see Section V-B). For the logic circuits, we set all high-$V_{th}$ transistors to have 2.0× higher $V_{th}$ than the $V_{th}$ of a normal transistor (low-$V_{th}$).

Fig. 14. Experimental flow with $V_{dd}$ of each process technology.

Fig. 15. Inputs and the critical path (dashed line) for 4-bit adder delay measurement.

Fig. 16. Waveforms of 1-bit adder for dynamic power measurement.

1Cadence Design Systems. [Online]. Available: http://www.cadence.com
2NC State Univ. Cadence Tool Information. [Online]. Available: http://www.cadence.ncsu.edu
3Synopsys Incorporated. [Online]. Available: http://www.synopsys.com
4The MOSIS Service. [Online]. Available: http://www.mosis.org
`2) Delay: We measure the worst case propagation delay of
`each benchmark. Input vectors and input and output triggers are
`chosen to measure delay across a given circuit’s critical path.
`The propagation delay is measured between the trigger input
`edge reaching 50% of the supply voltage value and the circuit
`output edge reaching 50% of the supply voltage value. Input
`waveforms have a 4-ns period (i.e., a 250-MHz rate) and rise
`and fall times of 100 ps. We use
`as the output load capac-
`itance.
`For the chain of four inverters, we measure two different prop-
`agation delay values: one when an input goes high and another
`when an input goes low. We take the larger value as the worst
`case propagation delay of the chain of four inverters.
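The 50%-to-50% propagation delay measurement described above can be sketched for sampled waveforms as follows. This is a simplified post-processing illustration that uses linear interpolation between samples; it is not the HSPICE measurement setup used in the paper, and the function names and the sample waveform in the usage example are hypothetical.

```python
def crossing_time(times, volts, level, rising, after=0.0):
    """First time at or after `after` where the waveform crosses `level`
    in the requested direction, using linear interpolation."""
    for i in range(1, len(times)):
        if times[i] < after:
            continue
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd, in_rising, out_rising):
    """Delay from the input edge reaching 50% of vdd to the output edge
    reaching 50% of vdd."""
    half = 0.5 * vdd
    t_in = crossing_time(times, v_in, half, in_rising)
    t_out = crossing_time(times, v_out, half, out_rising, after=t_in)
    return t_out - t_in


def worst_case_delay(delay_input_rising, delay_input_falling):
    """Worst case is the larger of the two measured delays."""
    return max(delay_input_rising, delay_input_falling)


if __name__ == "__main__":
    # Hypothetical waveform with time in ns and a 1.8-V supply.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    vin = [0.0, 0.0, 1.8, 1.8, 1.8]    # rising input edge
    vout = [1.8, 1.8, 1.8, 0.0, 0.0]   # falling output edge
    print(propagation_delay(t, vin, vout, 1.8, in_rising=True, out_rising=False))
```

For an even-length inverter chain the output edge has the same direction as the input edge, so `out_rising` would equal `in_rising`; the two delays (input rising and input falling) are then combined with `worst_case_delay`.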
`For the 4:1 multiplexer, we measure the worst case propa-
`gation delay of the path
`-
`-NAND-NOR-NOR-NAND-output
`shown in Fig. 7 (note that several other paths exist with equal
`delay). We measure this critical path delay when the output
`changes f
