`
Thomas Burd, Trevor Pering¹, Anthony Stratakos², Robert Brodersen

Berkeley Wireless Research Center, Univ. of California, Berkeley, CA
¹Intel, Hillsboro, OR
²Volterra, Fremont, CA
`
The microprocessor system in portable electronic devices often has
a time-varying computational load that comprises: 1) compute-intensive,
low-latency processes, 2) background, high-latency processes, and
3) system idle. The key design objectives for the processor system
in these applications are to provide the highest possible peak
performance for compute-intensive code (e.g., handwriting
recognition, image decompression) while maximizing battery life
during the remaining low-performance periods.
`
A common power-saving technique is to reduce the clock frequency
during non-compute-intensive activity. This reduces power, but does
not affect the total energy consumed per process, because the energy
of each switching event depends on the supply voltage rather than on
the clock frequency. On the other hand, reducing the supply voltage
of the processor improves its energy efficiency, but compromises its
peak performance, because gate delay increases as the voltage drops.
If, however, clock frequency (fCLK) and supply voltage (VDD) are
dynamically varied in response to computational load demands, the
energy consumed per process can be reduced during low-computation
periods while peak performance is retained when required. This
strategy, which achieves the highest possible energy efficiency for
time-varying computational loads, is called dynamic voltage scaling
(DVS).
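The argument above follows from the first-order CMOS switching model, in which dynamic power is roughly C·VDD²·fCLK, so the energy for a fixed number of cycles (power times run time) is roughly N·C·VDD², with fCLK cancelling out. The Python sketch below illustrates this under stated assumptions: `C_EFF` is a hypothetical effective switched capacitance, leakage is ignored, and the 3.8V/80MHz and 1.2V/5MHz operating points are borrowed from the prototype's range purely for illustration.

```python
# First-order CMOS dynamic-energy model (leakage and short-circuit current ignored).
C_EFF = 0.4e-9  # hypothetical effective switched capacitance per cycle, in farads


def power_w(vdd: float, f_clk_hz: float, c_eff: float = C_EFF) -> float:
    """Dynamic power: P = C_eff * VDD^2 * fCLK."""
    return c_eff * vdd ** 2 * f_clk_hz


def task_energy_j(cycles: int, vdd: float, f_clk_hz: float, c_eff: float = C_EFF) -> float:
    """Energy = power * run time = (C*V^2*f) * (N/f), so fCLK cancels out."""
    run_time_s = cycles / f_clk_hz
    return power_w(vdd, f_clk_hz, c_eff) * run_time_s


cycles = 10_000_000
e_80mhz = task_energy_j(cycles, vdd=3.8, f_clk_hz=80e6)  # full speed
e_40mhz = task_energy_j(cycles, vdd=3.8, f_clk_hz=40e6)  # frequency scaling only
e_dvs = task_energy_j(cycles, vdd=1.2, f_clk_hz=5e6)     # frequency and voltage scaled

print(f"power 80MHz: {power_w(3.8, 80e6):.3f} W, 40MHz: {power_w(3.8, 40e6):.3f} W")
print(f"energy ratio, 40MHz vs 80MHz at 3.8V: {e_40mhz / e_80mhz:.3f}")
print(f"energy ratio, DVS (1.2V) vs 3.8V: {e_dvs / e_80mhz:.3f}")
```

Halving fCLK at a fixed VDD halves the power but leaves the task energy unchanged; lowering VDD along with fCLK cuts the energy by (1.2/3.8)², roughly 10x, in this idealized model.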
`
A prototype DVS-enabled chip set in 0.6µm 3-metal CMOS (VT≈1V)
contains a switching regulator powered from the battery (3.3-6.0V),
a microprocessor, SRAM memory chips, and an interface chip for
connecting to commercial I/O peripherals. The microprocessor
operates from 1.2-3.8V and 5-80MHz with a minimum energy
consumption of 0.54mW/MIP. Compared with previous work [1], which
optimized energy for a fixed frequency set at power-on, this is a 4x
improvement in both frequency range and minimum energy
consumption. This work also demonstrates DVS on a microprocessor
under direct operating-system control and across a complete chip
set.
`
The operating system of a DVS system requires a voltage scheduler.
The scheduler controls fCLK (and hence VDD) by writing the desired
frequency, in MHz, to a coprocessor register. Individual
applications supply a completion deadline, and the voltage scheduler
uses each application's previous execution history to estimate the
number of processor cycles required and sets fCLK accordingly. By
adjusting fCLK in this way, the CPU system operates at the minimum
performance level required by the currently active processes and
thereby consumes the minimum amount of energy [2].
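A minimal Python sketch of such a scheduler is shown below. The predictor (an average of the last few recorded cycle counts), the class and method names, and the policy of running at full speed when no history exists are all hypothetical; it is not claimed to be the algorithm used in the paper or in [2]. The sketch relies on the fact that cycles divided by microseconds gives MHz, so the result can be written directly as the desired-frequency value, rounded up to 1MHz steps and clamped to the prototype's 5-80MHz range.

```python
import math
from collections import deque
from typing import Deque

F_MIN_MHZ = 5   # prototype's lowest operating frequency
F_MAX_MHZ = 80  # prototype's highest operating frequency


class VoltageScheduler:
    """Hypothetical deadline-driven frequency selector (illustrative only)."""

    def __init__(self, history_len: int = 4) -> None:
        # Cycle counts of the most recent completed runs of a task.
        self.history: Deque[int] = deque(maxlen=history_len)

    def record_run(self, cycles: int) -> None:
        """Log how many cycles the last run of the task actually took."""
        self.history.append(cycles)

    def desired_mhz(self, deadline_us: float) -> int:
        """Frequency (MHz, 1MHz steps) predicted to just meet the deadline."""
        if not self.history:
            return F_MAX_MHZ  # no history yet: run at full speed to be safe
        predicted_cycles = sum(self.history) / len(self.history)
        # cycles per microsecond = MHz; round up so the deadline is not missed.
        mhz = math.ceil(predicted_cycles / deadline_us)
        return max(F_MIN_MHZ, min(F_MAX_MHZ, mhz))


sched = VoltageScheduler()
sched.record_run(20_000)
sched.record_run(30_000)
print(sched.desired_mhz(deadline_us=1_000))  # average of 25,000 cycles in 1ms -> 25
```

An averaging predictor reacts smoothly to bursty workloads but can under-predict a sudden long run, so a real scheduler would need some way to recover from a missed deadline.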
`
Figure 17.4.1 shows two seconds of a user-interface process, a
workload that is generally bursty and high-latency. The top trace
shows typical microprocessor operation, which alternates between
running at full speed and idling. The lower trace shows the voltage
scheduler in operation: much of the computation is done at low VDD,
dramatically improving energy efficiency.
`
Figure 17.4.2 shows the regulation feedback loop that sets the
variable VDD and fCLK. A ring oscillator, which tracks the critical
paths of the microprocessor over voltage, outputs fCLK as a function
of VDD. The fCLK signal is digitally quantized in 1MHz steps and
compared with the desired frequency to generate a frequency error,
FERR. The loop filter implements a hybrid pulse-width/pulse-frequency
modulation algorithm that generates an enable for either MP or MN.
The regulated VDD is generated across the capacitor and fed back to
the CPU chip to close the loop.
`
The converter operates in either tracking mode or regulation mode,
as indicated by the track status signal. A new frequency request
initiates tracking mode, in which the converter either delivers
charge to or removes charge from the capacitor, depending on the
sign of FERR. When the error magnitude falls below 4MHz, the
converter switches to regulation mode, in which MN is disabled and
only the processor circuits can remove charge.
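The tracking/regulation behavior can be illustrated with a coarse behavioral simulation in Python. Several pieces are assumptions, not the chip's design: the ring oscillator is modeled as a straight line through the reported 1.2V/5MHz and 3.8V/80MHz end points (a real ring oscillator's curve is nonlinear), each MP or MN pulse and the processor load are modeled as fixed voltage steps per iteration, and a simple bang-bang pulse rule stands in for the hybrid PWM/PFM loop filter, which the text does not detail. Only the 1MHz quantization, the sign-of-FERR decision, the 4MHz mode threshold, and the disabling of MN in regulation mode come from the text; `regulate` and its parameters are hypothetical.

```python
import math

# Hypothetical linear ring-oscillator model through the reported end points.
V_LO, F_LO = 1.2, 5.0    # volts, MHz
V_HI, F_HI = 3.8, 80.0   # volts, MHz


def ring_osc_mhz(vdd: float) -> float:
    """Assumed oscillator frequency as a linear function of VDD."""
    return F_LO + (vdd - V_LO) * (F_HI - F_LO) / (V_HI - V_LO)


def measured_mhz(vdd: float) -> int:
    """Ring-oscillator frequency, digitally quantized to 1MHz steps."""
    return math.floor(ring_osc_mhz(vdd))


def regulate(f_des_mhz: int, vdd: float, steps: int = 2000,
             dv_pulse: float = 0.01, dv_load: float = 0.002):
    """Return (final VDD, still_tracking) after `steps` loop iterations.

    dv_pulse: hypothetical VDD change per MP or MN pulse.
    dv_load:  hypothetical VDD droop per iteration from processor current.
    """
    tracking = True  # a new frequency request starts in tracking mode
    for _ in range(steps):
        f_err = f_des_mhz - measured_mhz(vdd)
        if tracking and abs(f_err) < 4:
            tracking = False  # error under 4MHz: switch to regulation mode
        if f_err > 0:
            vdd += dv_pulse   # MP enabled: deliver charge to the capacitor
        elif tracking and f_err < 0:
            vdd -= dv_pulse   # MN enabled (tracking mode only): remove charge
        vdd -= dv_load        # the processor always draws some charge
    return vdd, tracking


v_final, still_tracking = regulate(40, vdd=1.2)
print(f"VDD={v_final:.2f} V, fCLK={measured_mhz(v_final)} MHz, tracking={still_tracking}")
```

In regulation mode the loop can only push VDD up, so the processor load alone pulls it back down; with these step sizes the quantized frequency settles within 1MHz of the request.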
`
The efficiency of the dc-dc converter ranges from 90% at high VDD to
80% at the lowest VDD. The transition time is at most 70µs, as shown
by the full-range 5-80MHz transition in Figure 17.4.3. Conversion
losses create an energy penalty whenever the voltage on the external
capacitance is changed. This penalty is at most 4µJ, which is
equivalent to 712 full-load cycles at 80MHz.
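The penalty figures can be turned into a rough break-even estimate for when a voltage change pays off. The Python sketch below derives the energy per full-load cycle from the two numbers in the text (4µJ over 712 cycles, about 5.6nJ per cycle) and then assumes, as a simplification, that this full-load figure applies at 3.8V and that energy per cycle scales with VDD². The function name and these assumptions are illustrative, not the paper's method.

```python
import math

E_PENALTY_J = 4e-6     # worst-case transition penalty (from the text)
PENALTY_CYCLES = 712   # equivalent full-load cycles at 80MHz (from the text)
F_MAX_HZ = 80e6
V_MAX = 3.8            # assumed supply voltage at full load

# Energy per full-load cycle implied by the two figures above (about 5.6nJ).
E_CYCLE_FULL_J = E_PENALTY_J / PENALTY_CYCLES


def break_even_cycles(v_low: float, v_high: float = V_MAX) -> int:
    """Cycles that must run at v_low instead of v_high to repay one
    worst-case transition, assuming energy per cycle scales as VDD^2."""
    if not 0 < v_low < v_high:
        raise ValueError("need 0 < v_low < v_high")
    e_high = E_CYCLE_FULL_J * (v_high / V_MAX) ** 2
    e_low = E_CYCLE_FULL_J * (v_low / V_MAX) ** 2
    return math.ceil(E_PENALTY_J / (e_high - e_low))


print(f"{E_CYCLE_FULL_J * 1e9:.2f} nJ per full-load cycle")
print(f"{PENALTY_CYCLES / F_MAX_HZ * 1e6:.1f} us of full-speed run time")
print(break_even_cycles(1.2))
```

Under these assumptions, dropping from 3.8V to 1.2V pays for a worst-case transition after roughly 800 cycles at the low voltage, which is short compared with the bursty workloads described above.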
`
Figure 17.4.4 shows the complete microprocessor system. The CPU chip
contains a custom implementation of the ARM8 processor core [3]; a
16kB, 32-way set-associative unified cache; a 12-element write
buffer; a bus interface with a simple memory controller; and a
system coprocessor that holds the desired-frequency register, the
regulator interface, a real-time counter, performance counters, and
other system control state. The SRAM chip contains 64kB of memory
and supports burst-mode accesses. The I/O chip level-converts
signals to 3.3V and performs flow control. A custom system bus,
powered from the variable VDD, connects these three chips.
`
Several constraints ensure that the digital circuits operate properly
with a varying VDD. Capacitance cannot be relied on to hold state for
more than half a clock cycle, as it is in DRAMs and tri-state busses;
otherwise, a stored high can be read as a false logic low when VDD
increases. To prevent this, all tri-state busses have weak pMOS
feedback that holds high signals at the current VDD level.
Sense-amp circuits must be precharged all the way to VDD. Also, nMOS
pass gates are not allowed because they fail for VDD < VtP+VtN.
Figure 17.4.5 shows the cache-tag CAM cell, which is modified for
DVS. The match devices (M1, M2) are typically nMOS pass gates.
Rather than switching to CMOS pass gates, however, the design uses
pMOS match devices together with pre-discharge devices (M3, M4), so
that the bitlines are precharged high between match operations, the
same polarity used for reads and writes to the cell.
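The two voltage hazards above can be checked with simple arithmetic. The Python sketch below assumes an idealized receiving gate that switches at VDD/2 and takes VtN = VtP = 1V from the process's VT≈1V; both are simplifications, and the function names are hypothetical.

```python
VT_N = 1.0  # assumed nMOS threshold (process VT is about 1V)
VT_P = 1.0  # assumed pMOS threshold magnitude


def reads_high(v_node: float, vdd_now: float) -> bool:
    """Idealized receiver: an input counts as high if it is above VDD/2."""
    return v_node > vdd_now / 2


def nmos_pass_gate_ok(vdd: float) -> bool:
    """nMOS pass gates are unusable when VDD < VtP + VtN."""
    return vdd >= VT_P + VT_N


v_bus = 1.2       # bus driven high while VDD = 1.2V, then left floating
vdd_after = 3.8   # VDD then ramps up to 3.8V

print(reads_high(v_bus, vdd_after))      # floating node: read as a false low
print(reads_high(vdd_after, vdd_after))  # weak pMOS keeper tracks VDD: still high
print(nmos_pass_gate_ok(1.2), nmos_pass_gate_ok(3.8))
```

With these assumed thresholds the pass-gate limit is 2V, which lies inside the prototype's 1.2-3.8V range, so such gates would fail over part of the operating range.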
`
As shown in Figure 17.4.6, system performance ranges from 6-85
Dhrystone 2.1 MIPS, and total system energy consumption ranges from
0.54-5.6mW/MIP. In the optimum case, when only a small fraction of
the computation requires peak performance, the microprocessor system
can effectively deliver 85 MIPS while consuming on average
0.54mW/MIP. A halt command puts the processor into a sleep mode in
which the system dissipates only 800µW, with a one-cycle start-up.
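Because 1mW/MIP is 1mJ/s divided by 10⁶ instructions/s, the figures above read directly as nanojoules per instruction (0.54-5.6nJ per instruction). The Python sketch below applies that conversion. Pairing 0.54mW/MIP with the 6 MIPS point and 5.6mW/MIP with the 85 MIPS point is an assumption, as is the simple instruction-weighted average used for a mixed workload; the function names are hypothetical.

```python
def power_mw(mips: float, mw_per_mip: float) -> float:
    """Average power: (10^6 instructions/s) * (nJ/instruction) = mW."""
    return mips * mw_per_mip


def energy_j(instructions: float, mw_per_mip: float) -> float:
    """Energy for a run, using 1 mW/MIP = 1 nJ per instruction."""
    return instructions * mw_per_mip * 1e-9


def mixed_mw_per_mip(peak_fraction: float, low: float = 0.54, peak: float = 5.6) -> float:
    """Average energy per instruction when `peak_fraction` of the instructions
    run at the peak-performance point and the rest at the low-energy point."""
    return peak_fraction * peak + (1 - peak_fraction) * low


print(f"{power_mw(85, 5.6):.1f} mW at 85 MIPS, {power_mw(6, 0.54):.2f} mW at 6 MIPS")
print(f"{energy_j(1e6, 0.54) * 1e3:.2f} mJ per million instructions at 0.54mW/MIP")
print(f"{mixed_mw_per_mip(0.05):.3f} mW/MIP with 5% of instructions at peak")
```

As the peak fraction shrinks, the weighted average approaches the 0.54mW/MIP floor, which is the optimum case described above.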
`
To evaluate DVS, three benchmarks are executed on the system: video
decompression (mpeg), audio processing (audio), and a PDA-like
user-interface application (ui). They are first run at constant
maximum performance to measure baseline energy consumption, and are
then run with the voltage scheduler (shown in Figure 17.4.1 for the
ui benchmark) while their energy consumption is measured again. The
highly compute-intensive mpeg benchmark achieves only an 11% energy
reduction from DVS, while the audio and ui benchmarks achieve 4.5x
and 3.5x energy reductions, respectively.
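Energy reductions quoted as a percentage and as a factor are related by factor = 1/(1 - fraction saved). The short Python sketch below converts between the two forms so the three benchmark results can be compared on one scale; it is plain arithmetic on the reported numbers.

```python
def factor_from_percent(percent_saved: float) -> float:
    """An x% energy reduction corresponds to a 1/(1 - x/100) reduction factor."""
    return 1.0 / (1.0 - percent_saved / 100.0)


def percent_from_factor(factor: float) -> float:
    """A k-times energy reduction saves (1 - 1/k) of the baseline energy."""
    return (1.0 - 1.0 / factor) * 100.0


for name, factor in [("mpeg", factor_from_percent(11)), ("audio", 4.5), ("ui", 3.5)]:
    print(f"{name}: {factor:.2f}x, {percent_from_factor(factor):.0f}% saved")
```

On a common scale, mpeg's 11% saving is about 1.12x, while audio and ui save roughly 78% and 71% of their baseline energy.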
`
`Acknowledgments:
This work was funded by DARPA and made possible with cooperation
from ARM Ltd. The authors thank P. Laramie, O. Rowhani, C. Chang and
R. Davis for their contributions.
`
`References:
[1] Kuroda, T., et al., "Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design," IEEE J. Solid-State Circuits, vol. 33, no. 3, pp. 454-462, Mar. 1998.
[2] Pering, T., et al., "The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms," Proc. ISLPED, pp. 76-81, Aug. 1998.
[3] ARM8 Data-Sheet, Document Number ARM-DDI-0080C, ARM Ltd., July 1996.
`
`• 2000 IEEE International Solid-State Circuits Conference
`
`07803-5853-8/00
`
`©2000 IEEE
`
`MICROCHIP TECHNOLOGY INC. EXHIBIT 1006
`Page 1 of 3
`
`
`
`Figure 17.4.1: DVS improvement for UI process.
`
`Figure 17.4.2: Frequency to voltage feedback loop.
`
`Figure 17.4.3: Transient response of regulation loop.
`
`Figure 17.4.4: System architecture - 4 custom chips.
`
`Figure 17.4.5: DVS-compatible CAM cell.
`
`Figure 17.4.6: Measured performance vs. energy.
`
`
`
`
Figure 17.4.7: CPU micrograph (7.5x9.0mm², 1.3M transistors).
`
Figure 17.4.8: Regulator micrograph (1.7x3.4mm²).
`
`
`