US007032119B2

(12) United States Patent
     Fung

(10) Patent No.:     US 7,032,119 B2
(45) Date of Patent: Apr. 18, 2006

(54) DYNAMIC POWER AND WORKLOAD MANAGEMENT FOR MULTI-SERVER SYSTEM

(75) Inventor: Henry T. Fung, San Jose, CA (US)

(73) Assignee: Amphus, Inc., San Jose, CA (US)

(*)  Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 678 days.

(21) Appl. No.: 09/860,373

(22) Filed: May 18, 2001

(65) Prior Publication Data

     US 2002/0062454 A1    May 23, 2002

     Related U.S. Application Data

(60) Provisional application No. 60/283,375, filed on Apr. 11, 2001,
     provisional application No. 60/236,043, filed on Sep. 27, 2000,
     provisional application No. 60/236,062, filed on Sep. 27, 2000.

(51) Int. Cl.
     G06F 1/32 (2006.01)

(52) U.S. Cl. 713/320; 713/322; 713/323

(58) Field of Classification Search 713/300, 713/320, 322-324, 600, 601
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     3,275,868 A      9/1966   Malmer, Jr.         395/375
     4,279,020 A      7/1981   Christian et al.    364/900
     4,316,247 A      2/1982   Iwamoto             364/200
     4,317,180 A      2/1982   Lies                364/707
     4,365,290 A     12/1982   Nelms et al.        364/200
     4,381,552 A      4/1983   Nocilini et al.     364/900
     4,398,192 A      8/1983   Moore et al.        340/825.44
     4,463,440 A      7/1984   Nishiura et al.     364/900
     4,479,191 A     10/1984   Nojima et al.       364/707
     4,509,148 A      4/1985   Asano et al.        365/230
     4,510,398 A      4/1985   Culp et al.         307/35
     4,538,231 A      8/1985   Abe et al.          364/483
     4,545,030 A     10/1985   Kitchin             364/900
     4,570,219 A      2/1986   Shibukawa et al.    395/775
     4,667,289 A      5/1987   Yoshida et al.      364/200
     4,677,566 A *    6/1987   Whittaker et al.    700/295
     4,698,748 A     10/1987   Juzswik et al.      364/200
     4,766,567 A      8/1988   Kato                364/900
     4,780,843 A     10/1988   Tietjen             364/900
     4,809,163 A      2/1989   Hirosawa et al.     364/200
     4,823,292 A      4/1989   Hillion             364/707
     4,835,681 A      5/1989   Culley              364/200
     4,841,440 A      6/1989   Yonezu et al.       364/200
     4,881,205 A     11/1989   Aihara              365/222

     (Continued)

Primary Examiner - Dennis M. Butler
(74) Attorney, Agent, or Firm - Dorsey & Whitney LLP

(57) ABSTRACT

Network architecture, computer system and/or server, circuit, device,
apparatus, method, and computer program and control mechanism for
managing power consumption and workload in computer system and data
and information servers. Further provides power and energy consumption
and workload management and control systems and architectures for
high-density and modular multi-server computer systems that maintain
performance while conserving energy and method for power management
and workload management. Dynamic server power management and optional
dynamic workload management for multi-server environments is provided
by aspects of the invention. Modular network devices and integrated
server system, including modular servers, management units, switches
and switching fabrics, modular power supplies and modular fans and a
special backplane architecture are provided as well as dynamically
reconfigurable multi-purpose modules and servers. Backplane
architecture, structure, and method that has no active components and
separate power supply lines and protection to provide high reliability
in server environment.

26 Claims, 19 Drawing Sheets
`
`
`
U.S. PATENT DOCUMENTS

4,907,183 A          3/1990   Tanaka               364/707
4,922,450 A          5/1990   Rose et al.          364/900
4,963,769 A         10/1990   Hiltpold et al.      307/465
4,968,900 A         11/1990   Harvey et al.        307/296.3
4,974,180 A         11/1990   Patton et al.        364/550
4,980,836 A         12/1990   Carter et al.        364/483
4,991,129 A          2/1991   Swartz               364/707
4,996,706 A          2/1991   Cho                  379/93
5,021,679 A          6/1991   Fairbanks et al.     307/66
5,025,387 A          6/1991   Frane                364/493
5,041,964 A          8/1991   Cole et al.          364/200
5,083,266 A          1/1992   Watanabe             395/275
5,119,377 A          6/1992   Cobb et al.          371/550
5,123,107 A          6/1992   Mensch, Jr.          395/800
5,129,091 A          7/1992   Yorimoto et al.      395/750
5,151,992 A          9/1992   Nagae                395/750
5,167,024 A         11/1992   Smith et al.         395/375
5,175,845 A         12/1992   Little               395/550
5,201,059 A          4/1993   Nguyen               395/800
5,218,704 A          6/1993   Watts, Jr. et al.    395/750
5,222,239 A          6/1993   Rosch                395/750
5,247,164 A          9/1993   Takahashi            235/492
5,247,213 A          9/1993   Trinh et al.         307/465
5,247,655 A          9/1993   Khan et al.          395/550
5,249,298 A          9/1993   Bolan et al.         395/750
5,251,320 A         10/1993   Kuzawinski et al.    395/750
5,396,635 A          3/1995   Fung                 395/800
5,666,538 A *        9/1997   DeNicola             713/320
5,742,833 A *        4/1998   Dea et al.           713/323
6,115,823 A *        9/2000   Velasco et al.       713/322
6,324,651 B1 *      11/2001   Kubik et al.         713/323
6,397,340 B1 *       5/2002   Watts et al.         713/322
6,587,950 B1 *       7/2003   Shah et al.          713/300
2002/0007463 A1 *    1/2002   Fung                 713/320

* cited by examiner
`
`
`
[Drawing Sheet 1 of 19, FIG. 1. Legible labels: 52; 53; 54; EXTEND TO
MULTIPLE RACKS; ELECTRIC SERVICE 66; MANAGEMENT NODE(S) (LOCAL AND
REMOTE) 65; NAS 64.]
`
`
`
[Drawing Sheet 2 of 19, FIG. 2. Legible labels: 55; FAN BANK #1 and
FAN BANK #2 (57), each with FAN 0, FAN 1, FAN 2; POWER SUPPLY #1 and
POWER SUPPLY #2 (56); MM; SWITCH; BACKPLANE; ELECTRICAL POWER AND DATA
CONNECTORS; SERVER MODULE (CPU, RAM, HDD 0, HDD 1); SERVER GROUP.]
`
`
`
[Drawing Sheet 3 of 19, FIG. 3. Legible labels: WEB SERVER; DATA
SERVER; APP1; APP2; APP3; CACHE SERVER; STORAGE (E.G. DISK(S)).]
`
`
`
[Drawing Sheet 4 of 19, FIG. 4. Legible labels: IN BAND; DOWN0; UP0;
SW0; SIO PORT; RJ11 PORT; RJ45 PORT; MON0; NODE 1, NODE 2, NODE 3,
NODE n (each CN OR SM); SW1; MON1; RJ45 PORT; DOWN1; UP1.]
`
`
`
[Drawing Sheet 5 of 19, FIG. 5. Legible labels: EMBODIMENT OF FIRST
IMPLEMENTOR; EMBODIMENT OF SECOND IMPLEMENTOR (both partly garbled in
extraction); DATA; WEB SERVER; CACHE SERVER ($).]
`
`
`
[Drawing Sheet 6 of 19 (figure number not legible in the extracted
text; presumably FIG. 6). Legible labels: INTERNET CONNECTION 136;
ROUTER 132; 130; BALANCER 128; GIGABIT UPLINK (WITH REDUNDANCY) 124
and 126; PRIMARY SWITCHING FABRIC (SWITCH MODULE); REMOTE INTERNET
MANAGEMENT NODE; LOCAL MANAGEMENT NODE 138; 142; SERIAL 146; POTS 148;
REMOTE DIAL-IN.]
`
`
`
[Drawing Sheet 7 of 19 (figure number not legible in the extracted
text; presumably FIG. 7). The drawing is rotated on the sheet and no
labels are recoverable.]
`
`
`
[Drawing Sheet 8 of 19, FIG. 8. Legible labels: INTERNET (OR OTHER
NETWORK); ROUTER; GIGABIT SWITCH (GS1); GIGABIT SWITCH (GS2).]
`
`
`
[Drawing Sheet 9 of 19, FIG. 9. Legible labels: SERVER MODULE 1
through SERVER MODULE N, each with CPU, MEMORY, NIC, ACTIVITY
INDICATOR GENERATOR, CPU VOLTAGE GENERATOR/CONTROL UNIT, CPU CLOCK
FREQUENCY GENERATOR/CONTROL UNIT, and SUSPEND/RESUME; AMPC BUS;
MANAGEMENT MODULE 1 (430) with CPU, NIC, and SM CONTROL ALGORITHM AND
UNIT (432); ETHERNET. Reference numerals in the 401 to 442 range
appear but cannot be reliably matched to labels.]
`
`
`
[Drawing Sheet 10 of 19, FIG. 10. Legible labels: SM 302; SIGNAL 338;
SUSPEND/RESUME FROM RTC OR NIC (remainder of label garbled);
SUSPEND/RESUME; SWITCH; AMPC; reference numerals 302-2, 302-N, 304,
308, 314, 320, 332, 342, 374.]
`
`
`
[Drawing Sheet 11 of 19, FIG. 11. Legible labels: MANAGEMENT
INFORMATION BASE; x16 SDR SDRAM; x16 DDR SDRAM; PMU; 32 KHZ; SERIAL
PORT; SYSTEM FLASH ROM; ETHERNET; BUS; reference numerals 102, 208,
221, 222, 229, 256, 257, 265.]
`
`
`
[Drawing Sheet 12 of 19, FIG. 12. Legible labels: POWER, RESET, SERIAL
PORT, & WATCH DOG; MANAGEMENT MODULE 108a-1; MANAGEMENT MODULE 108a-2;
ETHERNET SWITCH MODULE 104a-1; SWITCH/CPU MANAGEMENT; SERVER MODULE
112a-1 through SERVER MODULE 112a-16; MM (250). Note on drawing: ALL
LINES REPRESENT DIFFERENTIAL PAIR SIGNALS.]
`
`
`
[Drawing Sheet 13 of 19, FIG. 13. Legend: R/L = ROUTER/LOAD BALANCER;
W = WEB SERVER; $ = CACHE SERVER; S = STREAMING MEDIA SERVER;
M1 = TYPE 1 MASTER; M2 = TYPE 2 MASTER; solid line = IN BAND; dashed
line = OUT OF BAND (AMPC). Reference numerals: 501, 502, 504, 510-1,
510-2, 516-1, 518-1, 520-1, 522, 524, 530.]
`
`
`
[Drawing Sheet 14 of 19 (figure number not legible in the extracted
text; presumably FIG. 14). The drawing is rotated on the sheet and no
labels are recoverable.]
`
`
`
[Drawing Sheet 15 of 19, FIG. 15. No labels are recoverable from the
extracted text.]
`
`
`
[Drawing Sheet 16 of 19, FIGS. 16 through 24. Legible labels: CONTROL
SIGNALS (leading words of the label garbled in extraction); SERVER
MOD 1; SERVER MOD 2; SERVER MOD 3; SERVER MOD N.]
`
`
`
[Drawing Sheet 17 of 19, FIG. 25. Legible labels: HOST COMPUTER; HOST
PROCESSOR; MEMORY (RAM); BUS; CONTROLLER with PROCESSOR and MEMORY
(PROCEDURES, DATA); STORAGE SUB-SYSTEM (E.G. RAID 1 OR RAID 0+1);
reference numerals 100 through 114.]
`
`
`
[Drawing Sheet 18 of 19, FIG. 26. Legible labels: HOST COMPUTER; HOST
PROCESSOR; BUS; MEMORY (E.G. RAM) with PROCEDURES and DATA; RAID
STORAGE SUB-SYSTEM (E.G. RAID 1 OR RAID 0+1); reference numerals 100
through 114.]
`
`
`
[Drawing Sheet 19 of 19, FIGS. 27 and 28. FIG. 27 (RAID 1): a RAID SET
of two disks, each labeled ABC. FIG. 28 (RAID 0+1): a RAID SET of four
disks, labeled ABC, DEF, ABC, DEF.]
`
`
`
DYNAMIC POWER AND WORKLOAD MANAGEMENT FOR MULTI-SERVER SYSTEM

RELATED APPLICATIONS

This application is a continuing application under 35 U.S.C. 119(e)
and 120, wherein applicant and inventor claim the benefit of priority
to U.S. Provisional Application Ser. No. 60/283,375 entitled System,
Method And Architecture For Dynamic Server Power Management And
Dynamic Workload Management for Multi-Server Environment filed 11 Apr.
2001; U.S. Provisional Application Ser. No. 60/236,043 entitled
System, Apparatus, and Method for Power-Conserving Multi-Node Server
Architecture filed 27 Sep. 2000; and U.S. Provisional Application Ser.
No. 60/236,062 entitled System, Apparatus, and Method for Power
Conserving and Disc-Drive Life Prolonging RAID Configuration filed 27
Sep. 2000; each of which applications is hereby incorporated by
reference.
`
FIELD OF THE INVENTION

This invention pertains generally to architecture, apparatus, systems,
methods, and computer programs and control mechanisms for managing
power consumption and workload in data and information servers; more
particularly to power consumption and workload management and control
systems for high-density multi-server computer system architectures
that maintain performance while conserving energy and to the method
for power management and workload management used therein; and most
particularly to system, method, architectures, and computer programs
for dynamic server power management and dynamic workload management
for multi-server environments.
`
BACKGROUND

Heretofore, servers generally, and multi-node network servers in
particular, have paid little if any attention to power or energy
conservation. Such servers were designed and constructed to run at or
near maximum levels so as to serve data or other content as fast as
possible, or, where service demands were less than capacity, to remain
ever vigilant to provide fast response to service requests. Increasing
processor and memory speeds have typically been accompanied by higher
processor core voltages to support the faster device switching times,
and faster hard disk drives have typically led to faster and more
energy-hungry disk drive motors. Larger memories and caches have also
led to increased power consumption even for small single-node servers.
Power conservation efforts have historically focused on the portable
battery-powered notebook market, where battery life is an important
marketing and use characteristic. However, in the server area, little
attention has been given to saving power, such servers usually not
adopting or utilizing even the power conserving suspend, sleep, or
hibernation states that are available with some Microsoft Windows
95/98/2000, Linux, Unix, or other operating system based computers,
personal computers, PDAs, or information appliances.

Multi-node servers present a particular energy consumption problem as
they have conventionally been architected as a collection of large
power-hungry boxes interconnected by external interconnect cables.
Little attention has been placed on the size or form factor of such
network architectures, the expandability of such networks, or on the
problems associated with large network configurations. Such
conventional networks have also by and large paid little attention to
the large amounts of electrical power consumed by such configurations
or to the savings possible. This has been due in part to the rapid and
unexpected expansion of the Internet and of servers connected with and
serving Internet clients. Internet service companies and entrepreneurs
have been more interested in a short time to market and profit than in
the effect on electrical power consumption and electrical power
utilities; however, continuing design and operation without due regard
to power consumption in this manner is problematic.

Network servers have also by and large neglected to factor into the
economics of running a network server system the physical plant cost
associated with large rack-mounted equipment carrying perhaps one
network node per chassis. These physical plant and real estate costs
also contribute to large operating costs.

In the past, more attention was given to the purchase price of
equipment and little attention to the operating costs. It would be
apparent to those making the calculation that operating costs may far
exceed initial equipment purchase price, yet little attention has been
paid to this fact. More recently, the power available in the
California electrical market has been at crisis levels, with available
power reserves dropping below a few percent and rolling blackouts
occurring as electrical power requirements exceed available electrical
power generation capacity. High technology companies in the heart of
Silicon Valley cannot get enough electrical power to make or operate
product, and server farms, which consume vast quantities of electrical
energy for the servers and for cooling equipment and the facilities in
which they are housed, have stated that they may relocate to areas
with stable supplies of low-cost electricity.

Even were server manufacturers motivated to adopt available power
management techniques, such techniques represent only a partial
solution. Conventional computer system power management tends to focus
on power managing a single CPU, such as by monitoring certain
restricted aspects of the single CPU operation and making a decision
that the CPU should be run faster to provide greater performance or
more slowly to reduce power consumption.
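
By way of illustration only, the conventional single-CPU decision
described above may be sketched as follows. The sketch assumes a small
set of discrete clock levels and two utilization thresholds; the
function name, the levels, and the thresholds are hypothetical and do
not come from this specification. Note that the decision uses only the
one CPU's own utilization, which is the limitation the paragraph above
points out.

```python
def next_frequency_mhz(utilization, current_mhz,
                       levels_mhz=(300, 400, 500, 600),
                       raise_above=0.80, lower_below=0.30):
    """Pick the next clock level for ONE CPU from its own utilization.

    utilization is a fraction between 0.0 and 1.0. All levels and
    thresholds are illustrative assumptions, not values from the patent.
    """
    levels = sorted(levels_mhz)
    i = levels.index(current_mhz)  # raises ValueError for an unknown level
    if utilization > raise_above and i < len(levels) - 1:
        return levels[i + 1]       # run faster for greater performance
    if utilization < lower_below and i > 0:
        return levels[i - 1]       # run more slowly to reduce power
    return current_mhz             # otherwise leave the clock unchanged
```

Because nothing in this decision looks beyond the single CPU, it cannot
notice that a neighboring server is idle or that work could be moved.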

Heretofore, computer systems generally, and server systems having a
plurality of servers where each server includes at least one processor
or central processing unit (CPU) in particular, have not been power
managed to maintain performance and reduce power consumption. Even
where a server system having more than one server component and CPU
may possibly have utilized a conventional personal computer
architecture that provided some measure of localized power management
separately within each CPU, no global power management architecture or
methods have conventionally been applied to power manage the set of
servers and CPUs as a single entity.

The common practice of over-provisioning a server system so as to be
able to meet peak demands has meant that during long periods of time,
individual servers are consuming power and yet doing no useful work,
or several servers are performing some tasks that could be performed
by a single server at a fraction of the power consumption.
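
By way of illustration only, the consolidation arithmetic implied by
the preceding paragraph may be sketched as follows. The sketch assumes
that each server's load is expressed as a fraction of one server's
capacity, that all servers are identical, and that some headroom is
reserved on every active server; the function names and the headroom
figure are hypothetical.

```python
import math
from typing import Sequence


def servers_needed(loads: Sequence[float], headroom: float = 0.2) -> int:
    """Smallest number of identical servers able to carry the total load.

    loads holds each server's current load as a fraction of one server's
    capacity. headroom is the (assumed) fraction of capacity kept free
    on every active server.
    """
    if not loads:
        return 0
    usable = 1.0 - headroom
    return max(1, math.ceil(sum(loads) / usable))


def servers_freed(loads: Sequence[float], headroom: float = 0.2) -> int:
    """Number of servers that could be idled after consolidation."""
    return max(0, len(loads) - servers_needed(loads, headroom))
```

For example, eight servers each running at 5 percent of capacity carry
a combined load of 0.4 of one server, so under these assumptions one
server suffices and seven could be placed in a lower power state.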

Operating a plurality of servers, including their CPU, hard disk
drive, power supply, cooling fans, and any other circuits or
peripherals that are associated with the server, at such minimal
loading also unnecessarily shortens their service life. However,
conventional server systems do not consider the longevity of their
components. To the extent that certain of the CPUs, hard disk drives,
power supplies, and cooling fans, and particularly the mechanical
systems (hard disk drives and cooling fans), may be operated at lower
power levels, their effective service life may be extended.

Therefore there remains a need for a network architecture and network
operating method that provides large capacity and multiple network
nodes or servers in a small physical footprint and that is power
conservative relative to server performance and power consumed by the
server, as well as power conservative from the standpoint of power for
server facility air conditioning. These and other problems are solved
by the inventive system, apparatus and method. There also remains a
need for server farms that are power managed in an organized global
manner so that performance is maintained while reducing power
consumption. There also remains a need to extend the effective
lifetime of computer system components and servers so that the total
cost of ownership is reduced.
`
SUMMARY

Aspects of the invention provide network architecture, computer system
and/or server, circuit, device, apparatus, method, and computer
program and control mechanism for managing power consumption and
workload in computer system and data and information servers. Aspects
of the invention further provide power and energy consumption and
workload management and control systems and architectures for
high-density and modular multi-server computer systems that maintain
performance while conserving energy, and a method for power management
and workload management. Dynamic server power management and optional
dynamic workload management for multi-server environments are provided
by aspects of the invention. Modular network devices and an integrated
server system, including modular servers, management units, switches
and switching fabrics, modular power supplies and modular fans, and a
special backplane architecture are provided, as well as dynamically
reconfigurable multi-purpose modules and servers.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic illustration showing an exemplary embodiment
of an inventive power conserving high-density server system.
FIG. 2 is a diagrammatic illustration showing an exemplary embodiment
of a single 2U high rack mountable Integrated Server System Unit
having a plurality of modular server units.
FIG. 3 is a diagrammatic illustration showing a standard server farm
architecture in which multiple nodes are individually connected by
cables to each other to form the desired network.
FIG. 4 is a diagrammatic illustration showing an embodiment of the
inventive Integrated Appliance Server (IAS) standard architecture,
also or alternatively referred to as an Integrated Server System (ISS)
architecture, in which multiple nodes selected from at least a
computer node (CN) such as a server module (SM), network node (NN)
also referred to as a switch module, and monitor or management node
(MN) also referred to as a Management Module (MM) are provided within
a common enclosure and coupled together via an internal backplane bus.
FIG. 5 is a diagrammatic illustration showing another embodiment of
the invention in which multiple modular IAS (or ISS) clusters each
containing multiple nodes are cascaded to define a specialized system.
FIG. 6 is a diagrammatic illustration showing an embodiment of an
Integrated Server System Architecture having two interconnected
integrated server system units (ISSUs) and their connectivity with the
external world.
FIG. 7 is a diagrammatic illustration showing an exemplary embodiment
of an AMPC bus and the connectivity of Server Modules and Management
Modules to the bus to support serial data, video, keyboard, mouse, and
other communication among and between the modules.
FIG. 8 is a diagrammatic illustration showing an exemplary embodiment
of ISSU connectivity to gigabit switches, routers, load balancers, and
a network.
FIG. 9 is a diagrammatic illustration showing an embodiment of the
inventive power conserving power management between two servers and a
manager.
FIG. 10 is a diagrammatic illustration showing an alternative
embodiment of a server system showing detail as to how activity may be
detected and operating mode and power consumption controlled in
response.
FIG. 11 is a diagrammatic illustration showing another alternative
embodiment of a server system particularly adapted for a Transmeta
Crusoe™ type processor having LongRun™ features, showing detail as to
how activity may be detected and operating mode and power consumption
controlled in response.
FIG. 12 is a diagrammatic illustration showing aspects of the
connectivity of two management modules to a plurality of server
modules and two Ethernet switch modules.
FIG. 13 is a diagrammatic illustration showing an exemplary
internetwork and the manner in which two different types of master may
be deployed to power manage such system.
FIG. 14 is a diagrammatic illustration showing a graph of the CPU
utilization (processor activity) as a function of time, wherein the
CPU utilization is altered by entering different operating modes.
FIG. 15 is a diagrammatic illustration showing an exemplary state
engine state diagram graphically illustrating the relationships
amongst the modes and identifying some of the transitions between
states or modes for operation of an embodiment of the inventive system
and method.
FIGS. 16-23 are diagrammatic illustrations showing exemplary state
diagrams for operating mode transitions.
FIG. 24 is a diagrammatic illustration showing the manner in which a
plurality of servers may operate in different modes based on local
detection and control of selected mode transitions and local detection
but global control of other selected mode transitions.
FIG. 25 is a diagrammatic illustration showing an embodiment of a
computer system having a plurality of hard disc drives configured in a
RAID configuration and using a separate RAID hardware controller.
FIG. 26 is a diagrammatic illustration showing an alternative
embodiment of a computer system having a plurality of hard disc drives
configured in a RAID configuration and using software RAID control in
the host processor.
FIG. 27 is a diagrammatic illustration showing an exemplary RAID 1
configuration.
FIG. 28 is a diagrammatic illustration showing an exemplary RAID 0+1
(RAID 10) configuration.
`
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The present invention pertains to computer system architectures and
structures and methods for operating such computer system
architectures in a compact high-performance low-power consumption
manner. Computers, information appliances, data processing systems,
and all manner of electronic systems and devices may utilize and
benefit from the innovations described herein. Aspects of the
invention also contribute to reliability, ease of maintenance, and
longevity of the system as a whole and operational components thereof.
In an application that is of particular importance and which benefits
greatly from the innovations described here, the computer system is or
includes a server system having at least one and more typically a
plurality of servers. Each server will include at least one processor
or CPU but may include multiple CPUs. In multiple server
configurations significant power consumption reduction is achieved by
applying the inventive power management scheme. These and other
aspects of the invention are described in the sections that follow.

The physical form factors of the server modules and management modules
provide significant advantages; however, it will be appreciated that
the invention need not be limited to such modular servers or modular
management elements, and that the invention extends to discrete
servers and management elements. It is also to be appreciated that
although the exemplary embodiments focus attention toward servers,
server systems, and power saving features for server systems, aspects
of the invention transcend such servers and server environments. For
example, distributed computer systems of all types may benefit from
the form of coordinated management and control to determine CPU
loading and coordinate computational processing over a multiplicity of
processors.

Section headers, where provided, are merely for the convenience of the
reader and are not to be taken as limiting the scope of the invention
in any way, as it will be understood that certain elements and
features of the invention have more than one function and that aspects
of the invention and particular elements are described throughout the
specification.

With respect to FIG. 1 there is shown an exemplary rack mounted server
system 50. The rack carries a plurality of 2U high integrated server
system units 52 each having one or more management modules (MM) 53 and
one or more server modules (SM) 54, each server module providing a
fully independent server. Each server includes a processor or CPU and
memory, a mass storage device such as a hard disk drive, and
input/output ports. In the embodiment illustrated each 2U high chassis
55 has 16 slots, each of which may contain a PC-board mounted server
module 54 or management module 53. The chassis 55 also provides one or
more power supplies 56 and one or more cooling fan banks 57. These
elements are coupled for communication by switches 59 and a backplane
58.
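
By way of illustration only, the slot arrangement of the chassis 55
described above may be modeled as follows. The 16-slot count and the
two module kinds come from the text; the class names, method names,
and zero-based slot numbering are hypothetical, and power supplies,
fans, switches, and the backplane are omitted from this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict

SLOTS_PER_CHASSIS = 16  # per the illustrated embodiment of chassis 55


class ModuleKind(Enum):
    SERVER = "SM"        # server module 54
    MANAGEMENT = "MM"    # management module 53


@dataclass
class Chassis:
    """One 2U chassis; maps an occupied slot index to the module kind."""
    slots: Dict[int, ModuleKind] = field(default_factory=dict)

    def insert(self, slot: int, kind: ModuleKind) -> None:
        # Slot numbering from 0 is an assumption made for this sketch.
        if not 0 <= slot < SLOTS_PER_CHASSIS:
            raise ValueError(f"slot {slot} is outside 0..{SLOTS_PER_CHASSIS - 1}")
        if slot in self.slots:
            raise ValueError(f"slot {slot} is already occupied")
        self.slots[slot] = kind

    def count(self, kind: ModuleKind) -> int:
        return sum(1 for k in self.slots.values() if k is kind)

    def is_populated_as_described(self) -> bool:
        # The text calls for one or more MMs and one or more SMs per unit.
        return (self.count(ModuleKind.MANAGEMENT) >= 1
                and self.count(ModuleKind.SERVER) >= 1)
```

A chassis populated with one management module and fifteen server
modules fills all 16 slots and satisfies the check above.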

The different ISS chassis units 55 may be coupled together to form a
larger system, and these server units share a gigabit uplink 60, a
load balancer 61, and a router 62 to connect to a network such as the
Internet 63. Network Attached Storage (NAS) 64 may desirably be
provided to increase storage capacity over that provided in individual
server modules. Local and/or remote management nodes or workstations
65 may be provided to permit access to the system 50. As power
management is an important feature of aspects of the invention, the
provision of electric service 66 to the system 50 as well as electric
service 68 to building or facilities air conditioning or cooling 69 is
also illustrated. Content or data may readily be served to remote
clients 70 over the Internet 63.

The illustration in FIG. 1 shows how the form factor of the server and
management modules increases server density and reduces the footprint
of the server system. Of course multiple racks may be added to
increase system capacity. The inventive power management feature
extends to individual server modules, to groups of server modules, and
to the entire set of server modules in the system 50 as desired. Power
management may also be applied to the management modules, power supply
modules, switches, cooling fan modules, and other components of the
ISS.
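
By way of illustration only, the three scopes named above (an
individual server module, a group of server modules, or the entire
set) may be sketched as a target-selection step that precedes any
power management action. The function name, the scope strings, and the
module and group identifiers are hypothetical, and the power action
itself is not modeled.

```python
from typing import Dict, List, Optional


def select_targets(groups: Dict[str, str], scope: str,
                   module_id: Optional[str] = None,
                   group: Optional[str] = None) -> List[str]:
    """Return the server modules that one power-management action covers.

    groups maps a (hypothetical) module id to a (hypothetical) group name.
    scope is "module", "group", or "system".
    """
    if scope == "module":
        # A single server module, if it exists in the system.
        return [module_id] if module_id in groups else []
    if scope == "group":
        # Every server module belonging to the named group.
        return sorted(m for m, g in groups.items() if g == group)
    if scope == "system":
        # The entire set of server modules.
        return sorted(groups)
    raise ValueError(f"unknown scope: {scope!r}")
```

Keeping selection separate from the action lets the same control logic
be applied at any of the three levels.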

An exemplary embodiment of an ISS unit is illustrated in FIG. 2, which
shows the manner in which PC board based server modules and management
modules plug into a backplane along with power supplies, cooling fan
units, switches, and other components to provide the high-density
system. These and other features are described in greater detail in
the remainder of this specification.

With respect to FIG. 3, there is shown in diagrammatic form a standard
server farm architecture in which multiple nodes are individually
connected by cables to each other to form the desired network. Server
farms such as this are typically power hungry, operate continuously
with little or no regard for actual usage, have a large footprint, and
generate large amounts of heat that require considerable air
conditioning to dissipate or remove.

FIG. 4 is a diagrammatic illustration showing an embodiment of the
inventive Integrated Server System (ISS) standard architecture in
which multiple nodes selected from at least a computer node (CN) or
Server Module (SM), network node (NN) or Switch Module (SWM), and
monitor node (MN) or Management Module (MM) are provided within a
common enclosure and coupled together via an internal backplane bus
and internal switch. Two separate switching fabrics sw1 and sw0 are
provided and described hereinafter. Up-link (up0 and up1) and down