`
Energy Management for Commercial Servers

Wes Felter

IEEE Computer
`
marily offer cost and performance benefits, they also present a power advantage: Machines optimized for particular tasks and amenable to workload-specific tuning for higher energy efficiencies serve each distinct part of the workload.

PROCESSORS
Processor power and energy management relies primarily on microarchitectural enhancements, but realizing their potential often requires some form of systems software support.

Frequency and voltage scaling
CMOS circuits in modern microprocessors consume power in proportion to their frequency and to the square of their operating voltage. However, voltage and frequency are not independent of each other: A CPU can safely operate at a lower voltage only when it runs at a low enough frequency. Thus, reducing frequency and voltage together reduces energy per operation quadratically, but only decreases performance linearly.

Early work on dynamic frequency and voltage scaling (DVS), the ability to dynamically adjust processor frequency and voltage, proposed that instead of running at full speed and sitting idle part of the time, the CPU should dynamically change its frequency to accommodate the current load and eliminate slack.2 To select the proper CPU frequency and voltage, the system must predict CPU use over a future time interval.

Much of the recent work in DVS has sought to develop prediction heuristics,3 but further research on predicting processor use for server workloads is needed. Other barriers to implementing DVS on SMP servers remain. For example, many cache-coherence protocols assume that all processors run at the same frequency. Modifying these protocols to support DVS is nontrivial.

Standard DVS techniques attempt to minimize the time the processor spends running the operating system idle loop. Other researchers have explored the use of DVS to reduce or eliminate the stall cycles that poor memory access latency causes. Because memory access time often limits the performance of data-intensive applications, running the applications at reduced CPU frequency has a limited impact on performance.

Offline profiling or compiler analysis can determine the optimal CPU frequency for an application or application phase. A programmer can insert operations to change frequency directly into the application code, or the operating system can perform the operations during process scheduling.
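
As a concrete illustration, the following Python sketch implements an interval-based policy in the spirit of the early DVS work:2 predict the next interval's CPU use from recent history and run at the slowest available speed that absorbs the predicted load. The frequency table, smoothing weight, and predictor are illustrative assumptions, not values from the literature.

# Interval-based DVS policy sketch. Each scheduling interval, the
# governor predicts utilization with exponential smoothing and picks
# the lowest frequency that still covers the predicted demand, since
# lowering frequency lets voltage drop and cuts energy per operation
# roughly quadratically.

FREQS_MHZ = [600, 800, 1000, 1200]   # assumed available speeds
ALPHA = 0.5                          # assumed smoothing weight

def next_frequency(observed_util, predicted_util, cur_freq_mhz):
    # Blend the last interval's observed utilization into the prediction.
    predicted = ALPHA * observed_util + (1 - ALPHA) * predicted_util
    # Work demanded, expressed in MHz at the current speed.
    demand_mhz = predicted * cur_freq_mhz
    for f in FREQS_MHZ:
        if f >= demand_mhz:          # slowest speed that eliminates slack
            return f, predicted
    return FREQS_MHZ[-1], predicted
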
`
Simultaneous multithreading
Unlike DVS, which reduces the number of idle processor cycles by lowering frequency, simultaneous multithreading (SMT) increases processor use at a fixed frequency. Single programs exhibit idle cycles because of stalls on memory accesses or an application's inherent lack of instruction-level parallelism. Therefore, SMT maps multiple program contexts (threads) onto the processor concurrently to provide instructions that use the otherwise idle resources. This increases performance by increasing processor use. Because the threads share resources, single threads are less likely to issue highly speculative instructions.

Both effects improve energy efficiency because functional units stay busy executing useful, nonspeculative instructions.5 Support for SMT chiefly involves duplicating the registers describing the thread state, which requires much less power overhead than adding a processor.

Processor packing
Whereas DVS and SMT effectively reduce idle cycles on single processors, processor packing operates across multiple processors. Research at IBM shows that average processor use of real Web servers is 11 to 50 percent of their peak capacity.4 In SMP servers, this presents an opportunity to reduce power consumption.

Rather than balancing the load across all processors, leaving them all lightly loaded, processor packing concentrates the load onto the smallest number of processors possible and turns the remaining processors off. The major challenge to effectively implementing processor packing is the current lack of hardware support for turning processors in SMP servers on and off.
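
A minimal sketch of the packing decision appears below. It assumes an operating-system interface for taking processors offline, such as the per-CPU online files Linux later exposed under /sys/devices/system/cpu; as noted above, the SMP hardware of the day largely lacked such support, so treat this as illustrative only.

import math

# Processor-packing sketch: compute the fewest CPUs that keep per-CPU
# load below a target, then offline the rest. target_util is an
# assumed policy parameter.

def processors_needed(total_util, target_util=0.7):
    # total_util sums per-CPU utilizations (e.g., 3.2 = 320% of one CPU).
    return max(1, math.ceil(total_util / target_util))

def pack(cpu_ids, total_util):
    n = processors_needed(total_util)
    for i, cpu in enumerate(sorted(cpu_ids)):
        if cpu == 0:
            continue                 # CPU 0 usually cannot be offlined
        state = "1" if i < n else "0"
        with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
            f.write(state)
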
Throughput computing
Certain workload types might offer additional opportunities to reduce processor power. Servers typically process many requests, transactions, or jobs concurrently. If these tasks have internal parallelism, developers can replace high-performance processors with a larger number of slower but more energy-efficient processors providing the same throughput. Piranha6 and our Super-Dense Server (SDS)7 prototype are two such systems. The SDS prototype, operating as a cluster, used half of the energy that a conventional server used on a transaction-processing workload. However, throughput computing is not suitable for all applications, so server makers must develop separate systems to provide improved single-thread performance.
`
MEMORY POWER
Server memory is typically double-data-rate synchronous dynamic random access memory (DDR SDRAM), which has two low-power modes: power-down and self-refresh. In a large server, the number of memory modules typically surpasses the number of outstanding memory requests, so memory modules are often idle. The memory controllers can put this idle memory into low-power mode until a processor accesses it.

Switching to power-down mode or back to active mode takes only one memory cycle and can reduce idle DRAM power consumption by more than 80 percent. This savings may increase with newer technology.

Self-refresh mode might achieve even greater power savings, but it currently requires several hundred cycles to return to active mode. Thus, realizing this mode's potential benefits requires more sophisticated techniques.
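
The controller policy described here can be modeled with a simple idle timer per memory rank, as in the Python sketch below. The one-cycle exit latency and the better-than-80-percent idle-power saving come from the text; the threshold value and the normalized power numbers are assumptions.

ACTIVE_IDLE_POWER = 1.0   # normalized idle power in active mode
POWERDOWN_POWER = 0.2     # assumed: >80 percent below active idle
EXIT_LATENCY = 1          # cycles to return to active mode (per the text)

class Rank:
    # One memory rank with an idle-timer power-down policy.
    def __init__(self, threshold=16):   # threshold is an assumed tunable
        self.threshold = threshold
        self.idle_cycles = 0
        self.powered_down = False

    def tick(self, accessed):
        # Advance one memory cycle; return (idle_power, extra_latency).
        if accessed:
            extra = EXIT_LATENCY if self.powered_down else 0
            self.powered_down = False
            self.idle_cycles = 0
            return 0.0, extra           # busy cycle: no idle power charged
        self.idle_cycles += 1
        if self.idle_cycles >= self.threshold:
            self.powered_down = True
        power = POWERDOWN_POWER if self.powered_down else ACTIVE_IDLE_POWER
        return power, 0
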
`
Data placement
Data distribution in physical memory determines which memory devices a particular memory access uses and, consequently, the devices' active and idle periods. Memory devices with no active data can operate in a low-power mode.

When a processor accesses memory, memory controllers can bring powered-down memory into an active state before proceeding with the access. To optimize performance, the operating system can activate the memory that a newly scheduled process uses during the context-switch period, thus largely hiding the latency of exiting the low-power mode.8,9

Better page allocation policies can also save energy. Allocating new pages to memory devices already in use helps reduce the number of active memory devices.9,10 Intelligent page migration, moving data from one memory device to another to reduce the number of active memory devices, can further reduce energy consumption.9,11

However, minimizing the number of active memory devices also reduces the memory bandwidth available to the application. Some accesses previously made in parallel to several memory devices must be made serially to the same memory device.
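
A power-aware allocator in the spirit of this work9,10 might look like the sketch below: place a new page on a device that already holds active pages, and fall back to waking an idle device only when necessary. The device bookkeeping structures are hypothetical.

def allocate_page(devices):
    # devices: list of dicts such as {"free": 42, "active_pages": 7}.
    candidates = [d for d in devices if d["free"] > 0]
    if not candidates:
        raise MemoryError("no free page frames")
    # Prefer devices that are already active so idle ones can stay
    # in a low-power mode.
    active = [d for d in candidates if d["active_pages"] > 0]
    target = max(active, key=lambda d: d["active_pages"]) if active else candidates[0]
    target["free"] -= 1
    target["active_pages"] += 1
    return target
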
Data placement strategies originally targeted low-end systems; hence, developers must carefully evaluate them in the context of commercial server environments, where reduced memory bandwidth can substantially impact performance.
`
Memory address mapping
Server memory systems provide configurable address mappings that can be used for energy efficiency. These systems partition memory among memory controllers, with each controller responsible for multiple banks of physical memory.

Configurable parameters in the system memory organization determine the interleaving: the mapping from a memory address to its location in a physical memory device. This mapping often occurs at the granularity of the higher-level cache line size. One possible interleaving allocates consecutive cache lines to DRAMs that different memory controllers manage, striping the physical memory across all controllers. Another approach maps consecutive cache lines to the same controller, using all the physical memory under one controller before moving to the next. Under each controller, memory address interleaving can similarly occur across the multiple physical memory banks.

Spreading consecutive accesses, or even a single access, across multiple devices and controllers can improve memory bandwidth, but it may consume more power because the server must activate more memory devices and controllers.

The interactions of interleaving schemes with configurable DRAM parameters such as page-mode policies and burst length have an additional impact on memory latencies and power consumption. Contrary to current practice, in which systems come with a set interleaving scheme, a developer can tune an interleaving scheme to obtain the desired power and performance tradeoff for each workload.
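
The two mappings contrasted above can be written as small address-translation functions, as in the sketch below. The cache-line size, controller count, and per-controller capacity are assumed example parameters.

LINE = 128                  # assumed cache-line size in bytes
CONTROLLERS = 4             # assumed controller count
PER_CTRL_BYTES = 1 << 30    # assumed memory per controller (1 Gbyte)

def striped(addr):
    # Consecutive cache lines rotate across all controllers,
    # maximizing bandwidth but keeping more controllers active.
    line = addr // LINE
    ctrl = line % CONTROLLERS
    offset = (line // CONTROLLERS) * LINE + addr % LINE
    return ctrl, offset

def sequential(addr):
    # Fill all memory under one controller before using the next,
    # so a small footprint touches few controllers and devices.
    ctrl = addr // PER_CTRL_BYTES
    return ctrl, addr % PER_CTRL_BYTES
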
`
Memory compression
An orthogonal approach to reducing the active physical memory is to use a system with less memory. IBM researchers have demonstrated that they can compress the data in a server's memory to half its size.12 A modified memory controller compresses and decompresses data when it accesses the memory. However, compression adds an extra step to the memory access, which can be time-consuming. Adding the 32-Mbyte L3 cache used in the IBM compression implementation significantly reduces the traffic to memory, thereby reducing the performance loss that compression causes. In fact, a compressed-memory server can perform at nearly the same level as a server with no compression and double the memory.
`
`
common terminology and a standard set of interfaces for software power and energy management mechanisms. ACPI defines up to 16 active or performance states, P0 through P15, and three idle or power states, C1 through C3 (or D1 through D3 for devices). The specification defines a set of tables that describe the power, performance, and latency characteristics of these states, which are both processor- and device-dependent.

A simple but powerful approach to defining power-management policies is to specify a set of system operating states, compositions of system component states and modes, and then write a policy as a mapping from current system state and application activity to a system operating state in accordance with desired power and performance characteristics. Defining policies based on existing system states such as idle, processing-application, and processing-interrupt is an efficient and transparent way to introduce power-management policies without overhauling the entire operating system.
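
This policy-as-mapping idea reduces to a lookup table, as in the Python sketch below. The operating-state compositions and table entries are invented for illustration; only the ACPI state names follow the specification.

# Map (system state, activity level) to a system operating state
# composed of ACPI-style component states. All entries are
# illustrative assumptions.

OPERATING_STATES = {
    "full":    {"cpu": "P0", "devices": "D0"},
    "reduced": {"cpu": "P3", "devices": "D1"},
    "standby": {"cpu": "C3", "devices": "D3"},
}

POLICY = {
    ("processing-application", "high"): "full",
    ("processing-application", "low"):  "reduced",
    ("processing-interrupt",   "low"):  "reduced",
    ("idle",                   "low"):  "standby",
}

def select_operating_state(system_state, activity):
    # Fail safe: unknown combinations run at full power.
    name = POLICY.get((system_state, activity), "full")
    return OPERATING_STATES[name]
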
IBM and MontaVista Software took this approach in the design of Dynamic Power Management (DPM), a software architecture that supports dynamic voltage and frequency scaling for system-on-chip environments.15 Developers can use DPM to manipulate processor cores and related bus frequencies according to high-level policies they specify with optional policy managers.

Although the initial work focuses on dynamic voltage and frequency scaling in system-on-chip designs, developing a similar architecture for managing other techniques in high-end server systems is quite possible.

Early work on software-managed power consumption focused on embedded and laptop systems, trading performance for energy conservation and thus extending battery life. Recently, researchers have argued that operating systems should implement power-conservation policies and manage power like other system resources such as CPU time, memory allocation, and disk access.16

Subsequently, researchers developed prototype systems that treat energy as a first-class resource that the operating system manages.17 These prototypes use an energy abstraction to unify management of the power states and system components, including the CPU, disk, and network interface.

Extending such operating systems to manage significantly more complex high-end machines and developing policies suitable for server system requirements are nontrivial tasks. Key questions include the frequency, overhead, and complexity of power measurement and accounting and the latency and overhead associated with managing power in large-scale systems.

Although it increases overall complexity, a hypervisor, a software layer that lets a single server run multiple operating systems concurrently, may facilitate solving the energy-management problems of high-end systems. The hypervisor, by necessity, encapsulates the policies for sharing system resources among execution images. Incorporating energy-efficiency rules into these policies gives the hypervisor control of system-wide energy management. It also allows developers to introduce energy-management mechanisms and policies that are specific to the server without requiring changes in the operating systems running on it.

Because many servers are deployed in clusters, and because it's relatively easy to turn entire systems on and off, several researchers have considered energy management at the cluster level. For example, the Muse prototype uses an economic model that weighs the performance value of adding a server against its energy cost to determine the number of active servers the system needs.18 Power-aware request distribution (PARD) attempts to minimize the number of servers needed by concentrating load on a few servers and turning the rest off.19 As server load changes, PARD turns machines on and off as needed.
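
A PARD-style policy reduces to sizing the active set from the offered load, as in the sketch below. The capacity figure, the headroom margin, and the power_on/power_off server controls are hypothetical.

import math

def active_servers(request_rate, per_server_capacity, margin=0.8):
    # Enough servers that each runs at no more than `margin` of capacity.
    return max(1, math.ceil(request_rate / (per_server_capacity * margin)))

def rebalance(cluster, request_rate, per_server_capacity):
    n = active_servers(request_rate, per_server_capacity)
    # Concentrate load on the first n servers; switch off the rest.
    for i, server in enumerate(cluster):
        if i < n:
            server.power_on()    # hypothetical control interface
        else:
            server.power_off()
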
Ricardo Bianchini and Ram Rajamony discuss energy management for clusters further in the "Energy Conservation in Clustered Servers" sidebar.
`
FUTURE DIRECTIONS
Three research areas will profoundly impact server system energy management:

• power-efficient architectures,
• system-wide management of power-management techniques and policies, and
• evaluation of power and performance tradeoffs.

Heterogeneous systems have significant potential as a power-efficient architecture. These systems consist of multiple processing elements, each designed for power- and performance-efficient processing of particular workload types. For example, a heterogeneous system for Internet applications combines network processors with a set of energy-efficient processors to provide both efficient network-protocol processing and application-level computation.
`
With the increase in transistor densities, heterogeneity also extends into the microarchitectural arena, where a single chip combines several cores, each suited for efficient processing of a specific workload. Early work in this area combines multiple generations of the Alpha processor on a single chip, using only the core requiring the lowest power while still offering sufficient performance for the current workload.20 As the workload changes, system software shifts the executing program from core to core, trying to match the application's required performance while minimizing power consumption.
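
The core-switching decision itself is simple, as the sketch below shows: choose the lowest-power core whose performance still meets the application's requirement. The core table loosely mirrors the Alpha generations used in that work,20 but the performance and power numbers are invented.

CORES = [
    # (name, relative performance, relative power), lowest power first
    ("EV4-class", 0.2, 0.05),
    ("EV5-class", 0.4, 0.15),
    ("EV6-class", 0.7, 0.45),
    ("EV8-class", 1.0, 1.00),
]

def choose_core(required_perf):
    for name, perf, power in CORES:
        if perf >= required_perf:
            return name              # first match is the cheapest in power
    return CORES[-1][0]              # nothing suffices: use the fastest core
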
Server systems can employ several energy-management techniques concurrently. A coordinated high-level management approach is critical to handling the complexity of multiple techniques and applications and achieving the desired power and performance. Specifying high-level power-management policies and mapping them to available mechanisms is a key issue that researchers must address.

Another question concerns where to implement the various policies and mechanisms: in dedicated power-management applications, other applications, middleware, operating systems, hypervisors, or individual hardware components. Clearly, none of these alternatives can achieve comprehensive energy management in isolation. At the very least, arriving at the right decision requires efficiently communicating information between the system layers.

The real challenge is to make such directed autonomy work correctly and at reasonable overhead. Heng Zeng and colleagues suggest using models based on economic principles.17 Another option is to apply formal feedback-control and prediction techniques to energy management. For example, Sivakumar Velusamy and colleagues used control theory to formalize cache power management.21 The Clockwork project22 applies predictive methods to systems-level performance management, but these methods have seen less use in energy management.

Designers and implementers need techniques to correctly and conveniently evaluate energy-management solutions. This requires developing benchmarks that focus not just on peak system performance but also on delivered performance with associated power and energy costs. Closely tied to this is the use of metrics that incorporate power considerations along with performance. Another key component is the ability to closely monitor system activity and correlate it to energy consumption.
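
One widely used example of such a metric is the energy-delay product (EDP), which multiplies the energy a run consumes by its execution time so that neither raw speed nor raw power dominates. The sample figures below are invented for illustration.

def energy_delay_product(avg_power_watts, runtime_s):
    energy_j = avg_power_watts * runtime_s
    return energy_j * runtime_s      # units: joule-seconds

fast = energy_delay_product(250, 100)   # 2.5e6 J*s
slow = energy_delay_product(120, 180)   # about 3.9e6 J*s
# The slower system draws less power yet scores worse on EDP here.
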
`
Performance monitoring has come a long way in the past few years, with high-end processors supporting an array of programmable event-monitoring counters. But system-level power management requires information about bus-level transactions and activity in the memory hierarchy. Much of this information is currently missing. Equally important is the ability to relate the values of these counters to both energy consumption and the actual application-level performance. The ultimate goal, after all, is to achieve a good balance between application performance and system energy consumption. ■
`
`Acknowledgment
`This work was supported, in part, by the US
`Defense Advanced Research Projects Agency
`(DARPA) under contract F33615-01-C-1892.
`
References
1. T. Mudge, "Power: A First-Class Architectural Design Constraint," Computer, Apr. 2001, pp. 52-57.
2. M. Weiser et al., "Scheduling for Reduced CPU Energy," Proc. 1st Symp. Operating Systems Design and Implementation, Usenix Assoc., 1994, pp. 13-23.
3. K. Flautner and T. Mudge, "Vertigo: Automatic Performance-Setting for Linux," Proc. 5th Symp. Operating Systems Design and Implementation (OSDI), Usenix Assoc., 2002, pp. 105-116.
4. P. Bohrer et al., "The Case for Power Management in Web Servers," Power-Aware Computing, R. Graybill and R. Melhem, eds., Series in Computer Science, Kluwer/Plenum, 2002.
5. J. Seng, D. Tullsen, and G. Cai, "Power-Sensitive Multithreaded Architecture," Proc. 2000 Int'l Conf. Computer Design, IEEE CS Press, 2000, pp. 199-208.
6. L.A. Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proc. 27th ACM Int'l Symp. Computer Architecture, ACM Press, 2000, pp. 282-293.
7. W. Felter et al., "On the Performance and Use of Dense Servers," IBM J. Research and Development, vol. 47, no. 5/6, 2003, pp. 671-688.
8. V. Delaluz et al., "Scheduler-Based DRAM Energy Management," Proc. 39th Design Automation Conf., ACM Press, 2002, pp. 697-702.
9. H. Huang, P. Pillai, and K.G. Shin, "Design and Implementation of Power-Aware Virtual Memory," Proc. Usenix 2003 Ann. Technical Conf., Usenix Assoc., 2003, pp. 57-70.
10. A.R. Lebeck et al., "Power-Aware Page Allocation," Proc. Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), ACM Press, 2000, pp. 105-116.
11. V. Delaluz, M. Kandemir, and I. Kolcu, "Automatic Data Migration for Reducing Energy Consumption in Multi-Bank Memory Systems," Proc. 39th Design Automation Conf., ACM Press, 2002, pp. 213-218.
12. R.B. Tremaine et al., "IBM Memory Expansion Technology (MXT)," IBM J. Research and Development, vol. 45, no. 2, 2001, pp. 271-286.
13. A. Moshovos et al., "Jetty: Filtering Snoops for Reduced Energy Consumption in SMP Servers," Proc. 7th Int'l Symp. High-Performance Computer Architecture, IEEE CS Press, 2001, pp. 85-96.
14. C. Saldanha and M. Lipasti, "Power Efficient Cache Coherence," High-Performance Memory Systems, H. Hadimiouglu et al., eds., Springer-Verlag, 2003.
15. B. Brock and K. Rajamani, "Dynamic Power Management for Embedded Systems," Proc. IEEE Int'l SOC Conf., IEEE Press, 2003, pp. 416-419.
16. A. Vahdat, A. Lebeck, and C. Ellis, "Every Joule Is Precious: The Case for Revisiting Operating System Design for Energy Efficiency," Proc. 9th ACM SIGOPS European Workshop, ACM Press, 2000, pp. 31-36.
17. H. Zeng et al., "Currentcy: A Unifying Abstraction for Expressing Energy Management Policies," Proc. General Track: 2003 Usenix Ann. Technical Conf., Usenix Assoc., 2003, pp. 43-56.
18. J. Chase et al., "Managing Energy and Server Resources in Hosting Centers," Proc. 18th Symp. Operating Systems Principles (SOSP), ACM Press, 2001, pp. 103-116.
19. K. Rajamani and C. Lefurgy, "On Evaluating Request-Distribution Schemes for Saving Energy in Server Clusters," Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software, IEEE Press, 2003, pp. 111-122.
20. R. Kumar et al., "A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors," Proc. Workshop on Complexity-Effective Design (WCED 2003), 2003; www.ece.rochester.edu/~albonesi/wced03.
21. S. Velusamy et al., "Adaptive Cache Decay Using Formal Feedback Control," Proc. Workshop on Memory Performance Issues (held in conjunction with the 29th Int'l Symp. Computer Architecture), ACM Press, 2002; www.cs.virginia.edu/~skadron/Papers/wmpi_decay.pdf.
22. L. Russell, S. Morgan, and E. Chron, "Clockwork: A New Movement in Autonomic Systems," IBM Systems J., vol. 42, no. 1, 2003, pp. 77-84.
`
Charles Lefurgy is a research staff member at the IBM Austin Research Lab. His research interests include computer architecture and operating systems. Lefurgy received a PhD in computer science and engineering from the University of Michigan. He is a member of the ACM and the IEEE. Contact him at lefurgy@us.ibm.com.

Karthick Rajamani is a research staff member in the Power-Aware Systems Department at the IBM Austin Research Lab. His research interests include the design of computer systems and applications with the focus on power and performance optimizations. Rajamani received a PhD in electrical and computer engineering from Rice University. Contact him at karthick@us.ibm.com.

Freeman Rawson is a senior technical staff member at the IBM Austin Research Lab. His research interests include operating systems, middleware, and systems management. Rawson received a PhD in philosophy from Stanford University. He is a member of the IEEE Computer Society, the ACM, and AAAI. Contact him at frawson@us.ibm.com.

Wes Felter is a researcher at the IBM Austin Research Lab. His interests include operating systems, networking, peer-to-peer, and the sociopolitical aspects of computing. Felter received a BS in computer sciences from the University of Texas at Austin. He is a member of the ACM. Contact him at wmf@us.ibm.com.

Michael Kistler is a senior software engineer at the IBM Austin Research Lab and a PhD candidate in computer science at the University of Texas at Austin. His research interests include operating systems and fault-tolerant computing. Contact him at mkistler@us.ibm.com.

Tom W. Keller manages the Power-Aware Systems Department at the IBM Austin Research Lab. Keller received a PhD in computer sciences from the University of Texas at Austin. Contact him at tkeller@us.ibm.com.
`
`48
`
`Computer
`
`Petitioner Mercedes Ex-1027, 0012
`
`