(12) United States Patent
     McCarthy et al.

(10) Patent No.:     US 7,140,020 B2
(45) Date of Patent: Nov. 21, 2006

(54) DYNAMIC MANAGEMENT OF VIRTUAL PARTITION COMPUTER
     WORKLOADS THROUGH SERVICE LEVEL OPTIMIZATION

(75) Inventors: Clifford A. McCarthy, Richardson, TX (US);
                Thomas E. Turicchi, Dallas, TX (US);
                Steven R. Landherr, Baton Rouge, LA (US)

(73) Assignee: Hewlett-Packard Development Company, L.P.,
               Houston, TX (US)

(*) Notice: Subject to any disclaimer, the term of this patent is
            extended or adjusted under 35 U.S.C. 154(b) by 669 days.

(21) Appl. No.: 10/206,594

(22) Filed: Jul. 26, 2002

(65) Prior Publication Data

     US 2003/0037092 A1    Feb. 20, 2003

     Related U.S. Application Data

(63) Continuation-in-part of application No. 09/493,753, filed on
     Jan. 28, 2000.

(51) Int. Cl.
     G06F 9/46   (2006.01)
     G06F 15/00  (2006.01)

(52) U.S. Cl. ............................ 718/104; 712/13

(58) Field of Classification Search ....... 709/105; 718/104; 712/13
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     5,473,773 A *  12/1995  Aman et al. ............. 718/104
     5,487,170 A *   1/1996  Bass et al. ............. 710/244
     5,675,739 A    10/1997  Eilert et al.
     6,587,938 B1*   7/2003  Eilert et al. ........... 712/29
     6,721,568 B1*   4/2004  Gustavsson et al. ....... 455/450

     FOREIGN PATENT DOCUMENTS

     EP   1 091 296   4/2001

     OTHER PUBLICATIONS

     U.S. Appl. No. 09/493,753, McCarthy et al.
     U.S. Appl. No. 09/562,590, Bouchier et al.
     Govil, K. et al., "Cellular Disco: Resource Management Using
     Virtual Clusters on Shared-Memory Multiprocessors," Operating
     Systems Review (SIGOPS), ACM, New York, US, vol. 33, no. 5,
     Dec. 1999, pp. 154-169.
     "Partitioning for the IBM eServer pSeries 690 System," IBM White
     Paper, 2001, 12 pages.
     European Search Report issued for EP 03 25 3723, dated Jul. 19, 2005.

     * cited by examiner

Primary Examiner—Meng-Al T. An
Assistant Examiner—Camquy Truong

(57) ABSTRACT

The present invention is directed to a system and method for managing allocation of a computer resource to at least one partition of a plurality of partitions of a multiple partition computer system, the system comprising: a plurality of work load managers, with one work load manager associated with each partition of the plurality of partitions, wherein each work load manager determines a resource request value for the computer resource based on at least one priority assigned to its partition associated with the computer resource; and a partition load manager that is operative to form an allocation value for each respective partition based on a respective resource request value; wherein the system apportions the computer resource among the plurality of partitions based on the allocation values.

20 Claims, 5 Drawing Sheets

[Representative figure: partition resource allocator with Partition 1 (FIG. 2A).]
`
`
`
[FIG. 1A (prior art): Process Resource Manager (PRM) 10 allocating the CPU resource among applications.]

[FIG. 1B (prior art): the resource divided among Application 1 (12), Application 2 (13), and Application 3 (14), with boundaries at 0%, 30%, 60%, and 100%.]

[FIG. 2A: partition resource allocator 201 with Partition 1.]
`
`
`
[FIG. 2B: components of a partition of FIG. 2A (205), including Application 1 (12).]
`
`
`
[FIG. 3: flow chart 300 of the PLM's operation. After starting (301), the PLM receives requests from the WLMs (302) and determines whether reallocation is necessary; if so, it assigns one resource to each partition in all priority groups (316) and examines the requests from the partitions in the highest unexamined priority group. If each partition in the group can be allocated its requested amount, it is; otherwise a current target is designated as the lowest of (1) a previously allocated amount and (2) a requested amount of one partition of the group, neither having been used as a previous target, ignoring partitions that have reached their requested amounts. If each partition can be allocated an amount equal to (or exceeding) the current target, that amount is allocated; otherwise any remainder is divided equally among partitions of the group whose allocated amount is less than the current target. The loop repeats while more priority groups and unallocated amounts remain (303-313).]
`
`
`
[FIG. 4A: table of partitions, priorities, requests, and cumulative allocations of 19 resources (elements 401-406).]

[FIG. 4B: table of requests and cumulative allocations of 24 resources (elements 408-415).]

[FIG. 5A: stages of the rounder (51, 52, 53): summing the arbitrated values into cumulative values, then cumulative subtraction to form rounded values.]
`
`
`
[FIG. 5B: rounding example: R1=3.5, R2=3.5, R3=3.0; cumulative sums S1=R1+0=3.5 (rounded to 4), S2=R1+R2=7.0 (rounded to 7), S3=R1+R2+R3=10.0 (rounded to 10); rounded values R1'=S1-0=4, R2'=S2-S1=3, R3'=S3-S2=3.]

[FIG. 5C: rounding example: R1=10.1, R2=20.2, R3=30.3, R4=39.4; cumulative sums S1=10.1 (rounded to 10), S2=30.3 (rounded to 30), S3=60.6 (rounded to 61), S4=100.0 (rounded to 100); rounded values R1'=S1-0=10, R2'=S2-S1=20, R3'=S3-S2=31, R4'=S4-S3=39.]
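The arithmetic in these figures implies a simple procedure: sum the arbitrated values cumulatively, round each cumulative sum to the nearest integer, and recover per-partition integers by successive subtraction, so that the integer allocations always total the rounded grand sum. For illustration only (this Python sketch, including the function name and the half-up rounding rule, is an inference from the figures, not code from the patent):

    # Sketch of the rounder of FIGS. 5A-5C: round the running totals,
    # then subtract successive rounded totals to obtain integer
    # allocations whose sum equals the rounded grand total.
    def round_allocations(requests):
        rounded = []
        cumulative = 0.0
        previous = 0
        for r in requests:
            cumulative += r                  # summing (51)
            s = int(cumulative + 0.5)        # round the cumulative value
            rounded.append(s - previous)     # cumulative subtraction
            previous = s
        return rounded

    print(round_allocations([3.5, 3.5, 3.0]))           # [4, 3, 3] as in FIG. 5B
    print(round_allocations([10.1, 20.2, 30.3, 39.4]))  # [10, 20, 31, 39] as in FIG. 5C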
`
[FIG. 6: block diagram of a computer system 600 adapted to use the invention, including elements 601-609 and 614 (CPU, I/O adapter, communications adapter, network, and user interface adapter).]
`
`
`
DYNAMIC MANAGEMENT OF VIRTUAL
`PARTITION COMPUTER WORKLOADS
`THROUGH SERVICE LEVEL
`OPTIMIZATION
`
`CROSS-REFERENCE TO RELATED APPLICATIONS
`
`The present application is a continuation-in-part of U.S.
`application Ser. No. 09/493,753 entitled “DYNAMIC
`MANAGEMENT OF COMPUTER WORKLOADS
`THROUGH SERVICE LEVEL OPTIMIZATION,” filed
`Jan. 28, 2000 and is related to U.S. Pat. No. 6,725,317
`entitled “RECONFIGURATION SUPPORT FOR A MULTI
`PARTITION COMPUTER SYSTEM,” the disclosures of
`which are hereby incorporated herein by reference.
`
`TECHNICAL FIELD
`
This application relates in general to computer systems, and in particular to dynamic allocation of computer resources among applications.
`
`BACKGROUND OF THE INVENTION
`
Computer systems inherently have limited resources, particularly CPU resources. These limited resources must be allocated among the different applications operating within the system. A prior allocation mechanism for allocating system resources to applications is a system known as Process Resource Manager (PRM) 10, as shown in FIG. 1A. It is used to partition the CPU resource 11 and various other resources among the different applications 12, 13, 14. The PRM partitions the resources into fractions of the whole, which are expressed as percentages in the PRM configuration, as shown in FIG. 1B. The fractions or pieces are then assigned to groups of processes, which comprise applications. Each application then receives some portion of the available resources.
`
The PRM is a static mechanism, meaning that the allocation configuration is fixed by an administrator and can only be changed by an administrator. In other words, the administrator specifies where the partitions should lie, i.e., what percent of the machine goes to application 12, application 13, and application 14. This information is fixed, so it cannot respond to changes in the needs of the different applications. For example, one application may be mostly idle but occasionally have a large amount of work to do. Under the static mechanism with fixed entitlements, this application would be allocated a smaller fraction of the CPU resources, as a larger fraction cannot be justified because of the large amount of idle time. Consequently, when the large amount of work arrives, the application's performance will suffer because of its low entitlement, and its transactions will take longer to process. Another example is where an application requires large amounts of resources for extended periods of time, but also has periods of idle time. Under the static mechanism with fixed entitlements, this application would be allocated a larger fraction of the CPU resources. Consequently, when this application is idle, other applications' performance will suffer, as this application is assigned a large amount of resources that are not being used and thus are not available for other applications; the other transactions will therefore take longer to process. Thus, this mechanism cannot handle changes in the requirements of the different applications.
Another problem is the partitioning of the resources by the administrator. The administrator has to think in terms of the actual machine resources and the requirements of the different applications. This is problematic because the resources and applications are operating at a lower level than what a person typically views. Moreover, the administrator has to have a great deal of knowledge of each application's characteristics and its resource requirements in order to determine where to set the partitions; lack of knowledge is typically made up for with guesswork. For example, an administrator may choose to set application 13 at 20% of the CPU resources. If the users of the system complain, the administrator may change the value later on.
An alternative mechanism is taught in U.S. Pat. No. 5,675,739 by IBM, which is hereby incorporated by reference. The IBM mechanism uses a priority-based model to process applications. In other words, high priority applications are serviced from a queue before lower priority applications. This mechanism can change the priorities to adjust processing performance.
Such prior art mechanisms are also ineffective for multiple partition systems. Large computer systems, e.g. those with multiple processors, multiple I/O resources, multiple storage resources, etc., can be separated into partitions or protected domains. These partitions are hardware separations that place resources into separate functional blocks. Resources in one block do not have direct access to resources in another block. This prevents one application from using the entire system's resources, and also contains faults and errors. However, the partitions, once defined, are static in nature and cannot be readily changed without operator intervention. Thus, resources cannot be readily moved from one partition to another to satisfy workload balancing.
`
`BRIEF SUMMARY OF THE INVENTION
`
The present invention is directed to a system and method for managing allocation of a computer resource to at least one partition of a plurality of partitions of a multiple partition computer system, the system comprising: a plurality of work load managers, with one work load manager associated with each partition of the plurality of partitions, wherein each work load manager determines a resource request value for the computer resource based on at least one priority assigned to its partition associated with the computer resource; and a partition load manager that is operative to form an allocation value for each respective partition based on a respective resource request value; wherein the system apportions the computer resource among the plurality of partitions based on the allocation values.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1A depicts a prior art resource manager;
FIG. 1B depicts the portioning of the applications of FIG. 1A;
FIG. 2A depicts the inventive partition load manager (PLM) operating with a plurality of partitions;
FIG. 2B depicts a partition of FIG. 2A;
FIG. 3 depicts a flow chart of the operations of the PLM of FIG. 2A;
FIGS. 4A and 4B depict examples of allocation of resources by the PLM of FIG. 2A;
FIGS. 5A, 5B, and 5C depict the operation of the rounder of the PLM of FIG. 2A; and
FIG. 6 depicts a block diagram of a computer system which is adapted to use the present invention.
`
`
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
The invention dynamically responds to changes in workload characteristics in a computer system. The computer system may comprise a single small computer, e.g. a personal computer, a single large computer (e.g. an enterprise server), or a network of large and/or small computers. The computers, particularly the large computers, or the network may be divided into protection domains or partitions. Each partition may be running its own operating system. In any event, the inventive mechanism preferably allows the administrator to think in terms of performance goals rather than computer system resources and requirements. Consequently, the administrator preferably defines a variety of performance goals with different priorities between them, and the inventive mechanism will preferably make any necessary adjustments of the resources. The goals can preferably be set without regard to partitions. For example, a goal for a database portion of the computer system could be that a retrieval transaction should take no more than 10 milliseconds. The inventive mechanism would then manipulate the resources to achieve this goal. For multiple partition computer systems, the resources may be manipulated within a partition, e.g. processor time being allocated among applications, or between partitions, e.g. reassigning a processor from one partition to another (effectively resizing the partitions), or a combination of both. Note that the resources may be located on one physical computer and allocated to an application or partition located on another physical computer.
The inventive mechanism preferably includes a partition load manager (PLM) that receives resource request information from the partitions of the system. The PLM preferably examines the resource request information and compares the request information with the available resources. Based on the comparison, the PLM may increase, decrease, or leave unchanged a particular partition's resources. If the performance of a partition is lagging, e.g., if transactions are taking longer than the goals, then the partition may request an increase in its resource entitlement from the PLM. If a partition is over-achieving, then the partition may inform the PLM that it has excess resources, and the PLM may decrease its entitlement and allocate the excess to another partition or partitions.
`
Each partition preferably includes a work load manager (WLM) which operates similarly to the PLM, but within a particular partition. The WLM is more fully explained in U.S. application Ser. No. 09/493,753 entitled "DYNAMIC MANAGEMENT OF COMPUTER WORKLOADS THROUGH SERVICE LEVEL OPTIMIZATION," filed Jan. 28, 2000, which is hereby incorporated herein by reference. Each WLM also receives goal information and priority information from a user or administrator. Note that such goal and priority information may be the same for all partitions, or the information may be specific to each partition or group of partitions. The WLM also receives performance information from performance monitors, which are processes that monitor the performance of the applications and devices within the partition. The WLM examines the information from the performance monitors and compares the information with the goals. Based on the comparison, the WLM may increase, decrease, or leave unchanged an application's entitlement. If the performance of an application is lagging, e.g., if transactions are taking longer than the goal, then the WLM increases the entitlement. If an application is over-achieving, then the WLM will decrease its entitlement and allocate the excess to another application.
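As a rough illustration of this feedback rule (the function, names, and fixed step size below are assumptions for the sketch, not taken from the disclosure, which leaves the adjustment policy to the WLM):

    # Illustrative WLM-style adjustment (hypothetical names and step):
    # a lagging application gains entitlement, an over-achieving one
    # gives some back; a floor of 1 share is assumed.
    def adjust_entitlement(entitlement, measured_time, goal_time, step=1):
        if measured_time > goal_time:      # transactions slower than goal
            return entitlement + step
        if measured_time < goal_time:      # over-achieving; release some
            return max(1, entitlement - step)
        return entitlement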
The WLMs also interact with the PLM. Each WLM, initially and then periodically after determining its resource needs, sends resource request information to the PLM. The PLM, after receiving such requests, allocates system resources between the partitions. Each WLM, after receiving information about its partition's resources, then allocates its allotted resources among the applications on its partition.

In multiple partition systems, the PLM may reside in one partition and have access to the other partitions. Alternatively, the PLM may reside in a service module that manages all of the partitions. Alternatively, an instance of the PLM may reside in each partition, with the instances cooperatively allocating resources amongst themselves.
A partition arbiter or partition resource allocator allocates the resources between the different partitions, based on the priorities of the partitions and the resource requests. This movement of resources is referred to as re-sizing partitions. A partition, preferably through its WLM, maintains a list of prioritized application goals with an indication of the quantity of each required resource; application goals of equal priority are treated equally. (Note that an application may have more than one goal.) The requests of higher priority application goals are satisfied before lower priority application goals. Unallocated resources may be held in reserve or assigned to a default partition. Note that applications of the default partition may always be exceeding their goals, which requires a rule that such a condition is not an event that causes reallocation of resources or resizing of partitions.
Note that the partition resource entitlements are no longer a fixed configuration. As a partition's needs change, the invention will automatically adjust partition entitlements based on resource availability and priority. Thus, the invention is dynamic. Also note that the administrator no longer has to estimate the initial entitlements, as the invention will determine the correct resource allocation to achieve the stated goals, and the computer system using the invention will converge on partition entitlement values that achieve the stated performance goals. Further note that priorities can be assigned to the different goals. Consequently, different goals can be met based on system resources; e.g., with a high amount of resources, all goals can be met, while with a lesser amount of resources the higher priority goals will be met before the lower priority goals. Further note that changes to the system can be made as soon as the PLM receives resource requests, and action by the system administrator is not required. Note that in multiple partition systems, the administrator may define and prioritize goals that apply across all of the partitions and the different operating system instances operating in the partitions, instead of only being applied within a single partition.
FIG. 2A depicts the various components of the invention in a multiple partition system having multiple partitions 203-1, 203-2, 203-3, ..., 203-N. Each partition may have one or more processors and other system resources, e.g. storage devices, I/O devices, etc. Each partition is preferably running its own operating system 26-1, ..., 26-N, which provides segregation and survivability between the partitions. Note that the different partitions may have different amounts of resources, e.g. different numbers of processors. Also note that the partitions may be virtual, as the multiple partitions may reside in one or more physical computers. Note that in an initial state the system may have the resources evenly divided among the partitions. Alternatively, the initial state of the system may provide only minimal resources to each partition, with the extra resources being held in reserve, for example, either unassigned or all placed into one or more partitions. The operations of the PLM and the WLMs will cause the system resources to be quickly allocated in the manner that is most efficient for handling the defined goals and priorities for the applications of each of the partitions.
The resources of the computer system are managed by PLM 201. The PLM 201 receives resource requests from the different partitions. The requests can involve multiple priorities and multiple types of resources. For example, a request may state that the partition requires two processors and one storage device to handle all high priority applications, four processors and two storage devices to handle all high and medium priority applications, and seven processors and five storage devices to handle all high, medium, and low priority applications. The requests originate from the WLMs 20-1, ..., 20-N. The WLMs preferably produce the requests after totaling the resources necessary to achieve their respective goals. After receiving one or more requests, the PLM preferably reviews system resources and determines if reallocation is necessary based on existing resources, current requests, and the priorities of the requests. Thus, if a particular partition has a change in resource requirements, the PLM will examine the existing requirements of the other partitions along with the new requirements of the particular partition, as well as the current resources, to determine if reallocation is necessary. The PLM may also initiate reallocation after a change in system resources, e.g. a processor fails, or additional memory is added, etc.
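For illustration, the example request above could be encoded as a per-priority table; the field names and structure below are hypothetical, not a format defined by the patent:

    # Hypothetical encoding of a WLM resource request: for each priority
    # level, the resources needed to handle all applications at that
    # level and above, per the example in the text.
    request = {
        "partition": "partition-1",
        "needs": [
            {"priority": "high",   "processors": 2, "storage_devices": 1},
            {"priority": "medium", "processors": 4, "storage_devices": 2},
            {"priority": "low",    "processors": 7, "storage_devices": 5},
        ],
    }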
The PLM preferably determines whether reallocation is necessary by examining the priorities of the resource requests. A change in a high level request will typically cause reallocation. For example, if all device resources are consumed in handling high priority operations of the partitions, then a change in a low priority request would be ignored. On the other hand, a change in a high priority request, e.g. fewer resources needed, will cause reallocation of the resources; e.g. the excess resources from the oversupplied partition would be re-allocated among the other partitions based on the goals and priorities of their applications. The PLM then calculates a revised distribution of resources based on the goals and priorities of the applications of the different partitions. The revised distribution is then delivered to partition resource allocator 202. Allocator 202 preferably operates to resize the partitions, which is to move resources from one or more partitions to one or more other partitions based on the instructions provided by the PLM 201. An example of such an allocator and of partition resizing is described in U.S. Pat. No. 6,725,317 entitled "RECONFIGURATION SUPPORT FOR A MULTI PARTITION COMPUTER SYSTEM," the disclosure of which is hereby incorporated herein by reference.
`
Note that resizing may cause considerable overhead to be incurred by the system. In such a case, moving resources from one partition to another reduces the available computing time. Thus, the determination by the PLM may include a threshold that must be reached before the PLM begins reallocation. The threshold may include multiple components, e.g. time, percent under/over capacity, etc. For example, a small over/under capacity may have to exist for a longer period of time before reallocation occurs, while a large over/under capacity may cause an immediate reallocation. This would prevent small, transient changes in resource need from causing reallocations in the system.
FIG. 2B depicts the various components of a partition of the inventive system, which includes performance goals and priorities 21. Goals 21 preferably comprise a configuration file, defined by a user or system administrator, that describes the user's preferences with regard to which characteristic(s) of the application are of interest and being measured, the desired level of performance of the application in terms of each characteristic, and the priority of achieving each goal. A user can also specify time periods for a particular goal to be in effect. For example, a first application may be a first database, and the user may specify in the configuration file that the characteristic is for a particular type of transaction to be completed within two seconds, with a high priority. The application may also have a second goal for the same characteristic, e.g. the same type of transactions are to be completed within one half of a second, with a low priority. A second application may be a second database which has a goal similar to that of the first database, namely for a particular type of transaction to be completed within two seconds, with the same priority as the first database. Thus, resources would be allocated between the two applications so that the high priority goals will be met, and any excess resources would be given to the first application so that it can meet the lower priority "stretch" goal.
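Purely as an illustration of such a configuration file (the schema and field names are assumptions; the patent does not specify a format), the two database goals above might be captured as:

    # Hypothetical goal configuration for the example above: both
    # databases share a two-second transaction goal at high priority,
    # and the first database adds a half-second low-priority goal.
    goals = [
        {"application": "database-1", "metric": "transaction_time",
         "target_seconds": 2.0, "priority": "high"},
        {"application": "database-1", "metric": "transaction_time",
         "target_seconds": 0.5, "priority": "low"},   # "stretch" goal
        {"application": "database-2", "metric": "transaction_time",
         "target_seconds": 2.0, "priority": "high"},
    ]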
The WLM 20 preferably receives performance information which describes the status of a particular characteristic or characteristics of each application 12, 13, 14 that is being monitored. The WLM 20 also receives performance information which describes the status and/or other characteristics of the processors 11 and other devices 25 (e.g. I/O, storage, etc.) contained within partition 208.

The performance information is preferably supplied by performance monitor 23. As shown in FIG. 2B, a single monitor is capable of handling multiple applications and devices; however, a different embodiment of the present invention may have multiple monitors, each monitoring one or more applications and devices. Performance monitor 23 is a small program that gathers specific information about the application and/or device. For example, if the application is a database, then a performance monitor measures access times for the database. As another example, if a device is a hard drive, then the performance monitor may measure data capacity. The information need not be strictly application performance; it can be any measurable characteristic of the workload (e.g. CPU usage). This information is gathered continuously while the system is operating. The workload manager samples the information at some interval specified by the administrator.
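Conceptually, the WLM's side of this reduces to a periodic poll of continuously updated monitor data; a minimal sketch, assuming hypothetical monitor objects with name and read() attributes and an administrator-chosen interval:

    import time

    # Minimal sketch of the WLM sampling loop: monitors gather data
    # continuously; the WLM reads the latest values at a fixed interval.
    def sample_monitors(monitors, interval_seconds=10.0):
        while True:
            yield {m.name: m.read() for m in monitors}  # latest readings
            time.sleep(interval_seconds)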
The output of the workload manager, derived from the ongoing performance reported by the monitors and given the goals set by the user, is preferably periodically applied to the PRM 10. The output of WLM 20 is the share or entitlement allocation of the different resources that is assigned to each application. For example, each share may equate to approximately 1/100 of a CPU operating second. Thus, within a second, an application having an entitlement of 10 will receive 1/10 of the second, provided that the application has at least one runnable process. Note that the time received may not be consecutive, but rather may be distributed across the one second interval. Note that a share may also equate to other parameters based on the resource being allocated, e.g. a percent of disk storage space or an actual number of bytes of disk storage space.
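On that definition, an entitlement converts directly into CPU time per one-second interval; a one-line check (using the 1/100-second share from the text):

    # One share = 1/100 of a CPU second per second, so 10 shares yield
    # 1/10 of each one-second interval (not necessarily contiguous).
    SHARE = 1.0 / 100.0
    entitlement = 10
    print(entitlement * SHARE)   # 0.1 CPU-seconds per second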
The partition may have multiples of a resource, e.g. multiple CPUs and/or multiple storage devices. Thus, the allocation can be placed all on one device or spread among the devices. For example, a ten percent processor allocation in a four processor system could result in forty percent of one processor, ten percent of each processor, twenty percent of two processors, or some other allocation. The allocation among the different devices is determined by the PRM 10. The PRM will move the application around to various devices, as needed, to attempt to ensure that it achieves ten percent. Therefore, if the application has only one runnable thread, so that it can only execute on one CPU, then the PRM will attempt to give it 20% of one CPU (on a two CPU system), which is 10% of the total available CPU capacity. Multi-threaded applications can be assigned to more than one CPU. The allocation allows the application to perform its programmed tasks. How quickly and efficiently it performs its tasks is a reflection of how much CPU time it was allocated: the less CPU it is allocated, the less it will accomplish in a time period; the more CPU it is allocated, the more it will accomplish. The performance monitor will measure its performance, which will be sampled by the WLM, thus completing the feedback loop of the system.
The WLM 20 also preferably sends resource requests to the PLM 201. These requests may take the form of a list that describes the resources required for partition 208 to meet its goals for its different priorities. The PLM may then decide to reallocate resources based on a request. The PLM may store the different requests, which would permit the PLM to view the changes in the requested resources over time. This would allow the PLM to anticipate changes in resource needs. For example, over a period of time, the PLM may recognize that a particular partition always needs more resources at a particular time (or following a particular event), e.g. at four p.m., and thus the PLM may reallocate resources to that particular partition before the partition sends a request. The storing of requests would also allow for the setting of reallocation triggering criteria. A simple trigger could be used that compares a single message with the current resource allocation, e.g. a requested increase/decrease of 5% or more of the currently allocated resources would trigger reallocation. More complex triggers could refer to the stored messages; for example, requests from a particular partition for an increase/decrease of 2% to less than 5% of the currently allocated resources that continue for more than one hour would cause reallocation.
PLM 201 operates according to the flow chart 300 of FIG. 3. The PLM starts 301 by receiving 302 the resource requests from the WLMs. The PLM then optionally determines whether to initiate reallocation 315. The PLM may compare the resource requests with the current allocations. If a particular partition has a request for more or fewer resources that exceeds a predetermined threshold, as compared with its current allocation, then the PLM may initiate reallocation. Also, the PLM may compare a plurality of such requests from each partition, accumulated over time, to determine whether there is a chronic overage/underage of resources. For example, suppose a difference of 10% between requested resources (either overage or underage) and current resources causes an immediate reallocation to occur, while a 9% difference causes reallocation if the difference (9% or higher) occurs in two consecutive requests (or for 10 minutes), and an 8% difference (8% or higher) causes reallocation if the difference occurs in three consecutive requests (or for 15 minutes), etc. If the PLM determines that reallocation should occur, then the PLM proceeds with box 316, and if not, then the PLM returns to box 302.
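A sketch of such an escalating trigger, using the thresholds from the example above (the request-history representation and function name are assumptions, and a positive current allocation is assumed):

    # Escalating reallocation trigger: a 10% difference reallocates at
    # once; a 9%+ difference must persist for two consecutive requests,
    # and an 8%+ difference for three.
    def should_reallocate(history, current):
        # history: a partition's requested amounts, most recent last
        def diff(requested):
            return abs(requested - current) / current
        if diff(history[-1]) >= 0.10:
            return True
        if len(history) >= 2 and all(diff(r) >= 0.09 for r in history[-2:]):
            return True
        if len(history) >= 3 and all(diff(r) >= 0.08 for r in history[-3:]):
            return True
        return False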
`
In box 316, the PLM preferably assigns all partitions the value 1 (hereinafter meaning a minimal allotment of devices, e.g. one CPU, one I/O, one block of memory, etc.). The extra resources would be assigned to a default partition or held in reserve as unassigned. Alternatively, the PLM may evenly divide up the resources between the partitions.

In box 303, the PLM then preferably examines the requests for resources needed to handle the highest application priority group of the partitions. It determines 304 whether the requested amount for each partition within the priority group can be satisfied. If so, then the PLM facilitates allocation 305 of the requested entitlement by sending the allocation information to the partition resource allocator 202. Note that several messages may be sent, with one or more for each application priority level and/or partition. Alternatively, one message may be sent at the end 309 which lays out the complete allocation of the resources for all partitions. If not, then the PLM preferably arbitrates between the different partitions in a fair manner, as discussed in step 310. After satisfying each partition within the application priority group in step 305, the PLM then determines 306 whether there are any more application priority groups. If so, then the PLM returns to step 303 and repeats. If not, then the PLM determines 307 whether any unallocated resources remain. If not, then the PLM is finished 309. The allocated resource information is sent to the partition resource allocator, and the PLM is finished for this iteration. After receiving new requests, the PLM will begin again in step 301. If step 307 determines that resources are available, then the PLM may assign the remaining resources to a default partition, des