`(12) Patent Application Publication (10) Pub. No.: US 2011/0145615 A1
`Rychlik et al.
`(43) Pub. Date:
`Jun. 16, 2011
`
`US 2011 0145615A1
`
(54) SYSTEM AND METHOD FOR
`CONTROLLING CENTRAL PROCESSING
`UNIT POWER BASED ON INFERRED
`WORKLOAD PARALLELISM
`
`(76) Inventors:
`
`Bohuslav Rychlik, San Diego, CA
`(US); Robert A. Glenn, Boulder,
`CO (US); Ali Iranli, San Diego, CA
`(US); Brian J. Salsbery, Boulder,
`CO (US); Sumit Sur, Boulder, CO
`(US); Steven S. Thomson, San
`Diego, CA (US)
`
`(21) Appl. No.:
`
`12/944,140
`
`(22) Filed:
`
`Nov. 11, 2010
`
`Related U.S. Application Data
`(60) Provisional application No. 61/286,953, filed on Dec.
`16, 2009.
`
`Publication Classification
`
(51) Int. Cl.
G06F 1/32 (2006.01)
`(52) U.S. Cl. ........................................................ 713/323
`
(57) ABSTRACT
A method of dynamically controlling power within a multicore
CPU is disclosed and may include receiving a degree of
parallelism in a workload of a zeroth core and determining
whether the degree of parallelism in the workload of the
zeroth core is equal to a first wake condition. Further, the
method may include determining a time duration for which
the first wake condition is met when the degree of parallelism
in the workload of the zeroth core is equal to the first wake
condition and determining whether the time duration is equal
to a first confirm wake condition. The method may also
include invoking an operating system to power up a first core
when the time duration is equal to the first confirm wake
condition.
`
[Representative drawing (method 500): during operation, dynamically
infer a degree of workload parallelism by monitoring an operating
system state; based on the degree of workload parallelism, power
core(s) up or down; on "Yes" at the final decision, end.]
`Petiitoner Mercedes Ex-1009, 0001
`
`
`
`Patent Application Publication
`
`Jun. 16, 2011 Sheet 1 of 14
`
US 2011/0145615 A1
`
`
`
`
`
`
`
`
`
[FIG. 3 (Sheet 3 of 14): block diagram of the PCD. Labels legible in
the extracted drawing text include the touch screen display (332),
display controller (328), power supply (380), USB controller (340),
memory (344), stereo speakers (354, 356), microphone, FM antenna
(364), stereo headphones (366), SIM card (346), camera (348), network
card (388), RF antenna (372), RF transceiver (368), RF switch (370),
keypad (374), mono headset with microphone (376), and reference
numerals 336, 338, 358, and 378.]
`
`
`
[Sheet 4 of 14: block diagram. Legible labels: 1st core, 1st DCVS,
Nth core, parallelism monitor, operating system, MP controller, and
reference numerals 430, 432, 434.]
`
`
`
[FIG. 5 (Sheet 5 of 14): flowchart of method 500. Legible steps, in
order: during operation, do; dynamically infer a degree of workload
parallelism by monitoring an operating system state; based on the
degree of workload parallelism, power core(s) up or down; decision
with a "Yes" branch to End. Legible block numbers: 502, 504, 506.]
`
`
`
[FIG. 6 (Sheet 6 of 14): flowchart of method 600. Legible steps, in
order: during operation, do; monitor the length of all OS scheduler
ready-to-run queues in order to determine a degree of workload
parallelism; based on the degree of workload parallelism, power
core(s) up or down. Legible block numbers: 602, 604, 606.]
`
`
`
[FIG. 7 (Sheet 7 of 14): flowchart of method 700. Legible steps, in
order: during operation, do; periodically sample a ready-to-run queue
length; determine a running average of the degree of parallelism in
the workload; based on the degree of workload parallelism, power
core(s) up or down; decision with a "No" branch; End. Legible block
numbers: 702, 704, 706, 708, 710.]
`
`
`
[FIG. 8 (Sheet 8 of 14): flowchart of method 800. Legible steps, in
order: during operation, do; receive a callback from the OS whenever
an entry is added or removed from the run queue; determine a running
average of the degree of parallelism in the workload; based on the
degree of workload parallelism, power core(s) up or down; decision
with a "No" branch; End. Legible block numbers: 802, 804, 806, 808,
810.]
`
`
`
[FIG. 9 (Sheet 9 of 14): flowchart, first portion of method 900.
Legible steps, in order: when the device is powered on, do; decision
with "Yes" to End and "No" to continue; receive a running average of
the degree of parallelism in the workload on the 0th core; decision:
degree of parallelism equal to a first wake condition?; if yes,
determine a time duration for which the first wake condition is met;
decision: time duration equal to a first confirm wake condition?; if
yes, invoke the OS to power up a 1st core; invoke the OS to add the
1st core to a set of schedulable resources; execute a 1st DCVS
algorithm on the 1st core; go to FIG. 10. Legible block numbers: 902,
904, 905, 906, 910, 914, 916, 920, 922.]
`
`
`
[FIG. 10 (Sheet 10 of 14): flowchart, continued from FIG. 9 and
FIG. 11. Legible steps, in order: execute tasks/threads at the 0th
core and the 1st core; decision: device off? (yes: End); if no,
receive a running average of the degree of parallelism in the
workload on the 0th core and the 1st core; decision: degree of
parallelism equal to a first sleep condition?; if yes, determine a
time duration for which the first sleep condition is met; decision:
time duration equal to a first confirm sleep condition?; if yes,
invoke the OS to save a current state of the 1st core; invoke the OS
to remove the 1st core from the set of schedulable resources. Legible
block numbers: 1014, 1016, 1018.]
`
`
`
[FIG. 11 (Sheet 11 of 14): flowchart, continued from FIG. 10. Legible
steps, in order: receive a running average of the degree of
parallelism in the workload on the 0th core and the 1st core;
decision: degree of parallelism equal to an Nth wake condition?;
determine a time duration for which the Nth wake condition is met;
decision beginning "Time" (remainder illegible); if yes, invoke the
OS to add the Nth core to a set of schedulable resources; go to
FIG. 12.]
`
`
`
[FIG. 12 (Sheet 12 of 14): flowchart, continued from FIG. 11. Legible
steps, in order: execute tasks/threads at the 0th core, the 1st core,
and the Nth core; End; receive a running average of the degree of
parallelism in the workload on the 0th core, the 1st core, and the
Nth core; decision: degree of parallelism equal to an Nth sleep
condition?; determine a time duration for which the Nth sleep
condition is met; decision beginning "Time" (remainder illegible); if
yes, invoke the OS to remove the Nth core from the set of schedulable
resources; return to FIG. 10.]
`
`
`
[FIG. 13 (Sheet 13 of 14): flowchart. Legible steps, in order: create
a test program having a steady state workload with varying degrees of
parallelism; execute the test program on the wireless device. Legible
block numbers: 1302, 1308.]
`
`
`
[FIG. 14 (Sheet 14 of 14): flowchart of method 1400. Legible steps,
in order: determine a run queue value for a 0th core; determine an
operating frequency for the 0th core; determine a utilization
percentage for the 0th core; determine an idle percentage for the 0th
core; determine a run queue value for an Nth core; determine an
operating frequency for the Nth core; determine a utilization
percentage for the Nth core; determine an idle percentage for the Nth
core; determine a load value for the system; based on the load value
for the system, turn one or more cores on or off; End. Legible block
numbers: 1402 through 1420, even numbers.]
`
`
`
US 2011/0145615 A1

Jun. 16, 2011

SYSTEM AND METHOD FOR
`CONTROLLING CENTRAL PROCESSING
`UNIT POWER BASED ON INFERRED
`WORKLOAD PARALLELISM
`
`RELATED APPLICATIONS
[0001] The present application claims priority to U.S. Provisional
Patent Application Ser. No. 61/286,953, entitled SYSTEM AND METHOD
OF DYNAMICALLY CONTROLLING A PLURALITY OF CORES IN A MULTICORE
CENTRAL PROCESSING UNIT, filed on Dec. 16, 2009, the contents of
which are fully incorporated by reference.
`
`CROSS-REFERENCED APPLICATIONS
[0002] The present application is related to, and incorporates by
reference, U.S. patent application Ser. No. ______, entitled SYSTEM
AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER IN A
VIRTUALIZED SYSTEM, by Rychlik et al., filed concurrently (Attorney
Docket Number 100329U1). The present application is related to, and
incorporates by reference, U.S. patent application Ser. No. ______,
entitled SYSTEM AND METHOD FOR ASYNCHRONOUSLY AND INDEPENDENTLY
CONTROLLING CORE CLOCKS IN A MULTICORE CENTRAL PROCESSING UNIT, by
Rychlik et al., filed concurrently (Attorney Docket Number
100330U1). The present application is related to, and incorporates
by reference, U.S. patent application Ser. No. ______, entitled
SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING UNIT POWER WITH
REDUCED FREQUENCY OSCILLATIONS, by Thomson et al., filed concurrently
(Attorney Docket Number 100339U1). The present application is related
to, and incorporates by reference, U.S. patent application Ser. No.
______, entitled SYSTEM AND METHOD FOR CONTROLLING CENTRAL PROCESSING
UNIT POWER WITH GUARANTEED TRANSIENT DEADLINES, by Thomson et al.,
filed concurrently (Attorney Docket Number 100340U1). The present
application is related to, and incorporates by reference, U.S. patent
application Ser. No. ______, entitled SYSTEM AND METHOD FOR
CONTROLLING CENTRAL PROCESSING UNIT POWER WITH GUARANTEED STEADY
STATE DEADLINES, by Thomson et al., filed concurrently (Attorney
Docket Number 100341U1). The present application is related to, and
incorporates by reference, U.S. patent application Ser. No. ______,
entitled SYSTEM AND METHOD FOR DYNAMICALLY CONTROLLING A PLURALITY
OF CORES IN A MULTICORE CENTRAL PROCESSING UNIT BASED ON
TEMPERATURE, by Sur et al., filed concurrently (Attorney Docket
Number 100344U1).
`
`DESCRIPTION OF THE RELATED ART
[0003] Portable computing devices (PCDs) are ubiquitous. These
devices may include cellular telephones, portable digital assistants
(PDAs), portable game consoles, palmtop computers, and other portable
electronic devices. In addition to the primary function of these
devices, many include peripheral functions. For example, a cellular
telephone may include the primary function of making cellular
telephone calls and the peripheral functions of a still camera, a
video camera, global positioning system (GPS) navigation, web
browsing, sending and receiving emails, sending and receiving text
messages, push-to-talk capabilities, etc. As the functionality of
such a device increases, the processing power required to support
such functionality also increases. Further, as the computing power
increases, there exists a greater need to effectively manage the
processor, or processors, that provide the computing power.
[0004] Accordingly, what is needed is an improved method of
controlling power within a multicore CPU.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
[0005] In the figures, like reference numerals refer to like parts
throughout the various views unless otherwise indicated.
[0006] FIG. 1 is a front plan view of a first aspect of a portable
computing device (PCD) in a closed position;
[0007] FIG. 2 is a front plan view of the first aspect of a PCD in an
open position;
[0008] FIG. 3 is a block diagram of a second aspect of a PCD;
[0009] FIG. 4 is a block diagram of a processing system;
[0010] FIG. 5 is a flowchart illustrating a first aspect of a method
of dynamically controlling power within a multicore CPU;
[0011] FIG. 6 is a flowchart illustrating a second aspect of a method
of dynamically controlling power within a multicore CPU;
[0012] FIG. 7 is a flowchart illustrating a third aspect of a method
of dynamically controlling power within a multicore CPU;
[0013] FIG. 8 is a flowchart illustrating a fourth aspect of a method
of dynamically controlling power within a multicore CPU;
[0014] FIG. 9 is a flowchart illustrating a first portion of a fifth
aspect of a method of dynamically controlling power within a
multicore CPU;
[0015] FIG. 10 is a flowchart illustrating a second portion of a
fifth aspect of a method of dynamically controlling power within a
multicore CPU;
[0016] FIG. 11 is a flowchart illustrating a third portion of a fifth
aspect of a method of dynamically controlling power within a
multicore CPU;
[0017] FIG. 12 is a flowchart illustrating a fourth portion of a
fifth aspect of a method of dynamically controlling power within a
multicore CPU;
[0018] FIG. 13 is a flowchart illustrating a method of testing a
multicore CPU; and
[0019] FIG. 14 is a flowchart illustrating a sixth aspect of a method
of dynamically controlling power within a multicore CPU.
`
`DETAILED DESCRIPTION
`
[0020] The word "exemplary" is used herein to mean "serving as an
example, instance, or illustration." Any aspect described herein as
"exemplary" is not necessarily to be construed as preferred or
advantageous over other aspects.
[0021] In this description, the term "application" may also include
files having executable content, such as: object code, scripts, byte
code, markup language files, and patches. In addition, an
"application" referred to herein may also include files that are not
executable in nature, such as documents that may need to be opened or
other data files that need to be accessed.
[0022] The term "content" may also include files having executable
content, such as: object code, scripts, byte code, markup language
files, and patches. In addition, "content" referred to herein may
also include files that are not executable in nature, such as
documents that may need to be opened or other data files that need to
be accessed.
[0023] As used in this description, the terms "component,"
"database," "module," "system," and the like are intended to refer to
a computer-related entity, either hardware, firmware, a combination
of hardware and software, software, or software in execution. For
example, a component may be, but is not limited to being, a process
running on a processor, a processor, an object, an executable, a
thread of execution, a program, and/or a computer. By way of
illustration, both an application running on a computing device and
the computing device may be a component. One or more components may
reside within a process and/or thread of execution, and a component
may be localized on one computer and/or distributed between two or
more computers. In addition, these components may execute from
various computer readable media having various data structures stored
thereon. The components may communicate by way of local and/or remote
processes such as in accordance with a signal having one or more data
packets (e.g., data from one component interacting with another
component in a local system, distributed system, and/or across a
network such as the Internet with other systems by way of the
signal).
[0024] Referring initially to FIG. 1 and FIG. 2, an exemplary
portable computing device (PCD) is shown and is generally designated
100. As shown, the PCD 100 may include a housing 102. The housing 102
may include an upper housing portion 104 and a lower housing portion
106. FIG. 1 shows that the upper housing portion 104 may include a
display 108. In a particular aspect, the display 108 may be a touch
screen display. The upper housing portion 104 may also include a
trackball input device 110. Further, as shown in FIG. 1, the upper
housing portion 104 may include a power on button 112 and a power off
button 114. As shown in FIG. 1, the upper housing portion 104 of the
PCD 100 may include a plurality of indicator lights 116 and a speaker
118. Each indicator light 116 may be a light emitting diode (LED).
[0025] In a particular aspect, as depicted in FIG. 2, the upper
housing portion 104 is movable relative to the lower housing portion
106. Specifically, the upper housing portion 104 may be slidable
relative to the lower housing portion 106. As shown in FIG. 2, the
lower housing portion 106 may include a multi-button keyboard 120. In
a particular aspect, the multi-button keyboard 120 may be a standard
QWERTY keyboard. The multi-button keyboard 120 may be revealed when
the upper housing portion 104 is moved relative to the lower housing
portion 106. FIG. 2 further illustrates that the PCD 100 may include
a reset button 122 on the lower housing portion 106.
[0026] Referring to FIG. 3, an exemplary, non-limiting aspect of a
portable computing device (PCD) is shown and is generally designated
320. As shown, the PCD 320 includes an on-chip system 322 that
includes a multicore CPU 324. The multicore CPU 324 may include a
zeroth core 325, a first core 326, and an Nth core 327.
[0027] As illustrated in FIG. 3, a display controller 328 and a touch
screen controller 330 are coupled to the multicore CPU 324. In turn,
a touch screen display 332 external to the on-chip system 322 is
coupled to the display controller 328 and the touch screen controller
330.
[0028] FIG. 3 further indicates that a video encoder 334, e.g., a
phase alternating line (PAL) encoder, a sequential couleur a memoire
(SECAM) encoder, or a national television system(s) committee (NTSC)
encoder, is coupled to the multicore CPU 324. Further, a video
amplifier 336 is coupled to the video encoder 334 and the touch
screen display 332. Also, a video port 338 is coupled to the video
amplifier 336. As depicted in FIG. 3, a universal serial bus (USB)
controller 340 is coupled to the multicore CPU 324. Also, a USB port
342 is coupled to the USB controller 340. A memory 344 and a
subscriber identity module (SIM) card 346 may also be coupled to the
multicore CPU 324. Further, as shown in FIG. 3, a digital camera 348
may be coupled to the multicore CPU 324. In an exemplary aspect, the
digital camera 348 is a charge-coupled device (CCD) camera or a
complementary metal-oxide semiconductor (CMOS) camera.
[0029] As further illustrated in FIG. 3, a stereo audio CODEC 350 may
be coupled to the multicore CPU 324. Moreover, an audio amplifier 352
may be coupled to the stereo audio CODEC 350. In an exemplary aspect,
a first stereo speaker 354 and a second stereo speaker 356 are
coupled to the audio amplifier 352. FIG. 3 shows that a microphone
amplifier 358 may also be coupled to the stereo audio CODEC 350.
Additionally, a microphone 360 may be coupled to the microphone
amplifier 358. In a particular aspect, a frequency modulation (FM)
radio tuner 362 may be coupled to the stereo audio CODEC 350. Also,
an FM antenna 364 is coupled to the FM radio tuner 362. Further,
stereo headphones 366 may be coupled to the stereo audio CODEC 350.
[0030] FIG. 3 further indicates that a radio frequency (RF)
transceiver 368 may be coupled to the multicore CPU 324. An RF switch
370 may be coupled to the RF transceiver 368 and an RF antenna 372.
As shown in FIG. 3, a keypad 374 may be coupled to the multicore CPU
324. Also, a mono headset with a microphone 376 may be coupled to the
multicore CPU 324. Further, a vibrator device 378 may be coupled to
the multicore CPU 324. FIG. 3 also shows that a power supply 380 may
be coupled to the on-chip system 322. In a particular aspect, the
power supply 380 is a direct current (DC) power supply that provides
power to the various components of the PCD 320 that require power.
Further, in a particular aspect, the power supply is a rechargeable
DC battery or a DC power supply that is derived from an alternating
current (AC) to DC transformer that is connected to an AC power
source.
[0031] FIG. 3 further indicates that the PCD 320 may also include a
network card 388 that may be used to access a data network, e.g., a
local area network, a personal area network, or any other network.
The network card 388 may be a Bluetooth network card, a WiFi network
card, a personal area network (PAN) card, a personal area network
ultra-low-power technology (PeANUT) network card, or any other
network card well known in the art. Further, the network card 388 may
be incorporated into a chip, i.e., the network card 388 may be a full
solution in a chip, and may not be a separate network card 388.
[0032] As depicted in FIG. 3, the touch screen display 332, the video
port 338, the USB port 342, the camera 348, the first stereo speaker
354, the second stereo speaker 356, the microphone 360, the FM
antenna 364, the stereo headphones 366, the RF switch 370, the RF
antenna 372, the keypad 374, the mono headset 376, the vibrator 378,
and the power supply 380 are external to the on-chip system 322.
[0033] In a particular aspect, one or more of the method steps
described herein may be stored in the memory 344 as computer program
instructions. These instructions may be executed by the multicore CPU
324 in order to perform the methods described herein. Further, the
multicore CPU 324, the memory 344, or a combination thereof may serve
as a means for executing one or more of the method steps described
herein in order to control power to each CPU, or core, within the
multicore CPU 324.
[0034] Referring to FIG. 4, a processing system is shown and is
generally designated 400. In a particular aspect, the processing
system 400 may be incorporated into the PCD 320 described above in
conjunction with FIG. 3. As shown, the processing system 400 may
include a multicore central processing unit (CPU) 402 and a memory
404 connected to the multicore CPU 402. The multicore CPU 402 may
include a zeroth core 410, a first core 412, and an Nth core 414. The
zeroth core 410 may include a zeroth dynamic clock and voltage
scaling (DCVS) algorithm 416 executing thereon. The first core 412
may include a first DCVS algorithm 417 executing thereon. Further,
the Nth core 414 may include an Nth DCVS algorithm 418 executing
thereon. In a particular aspect, each DCVS algorithm 416, 417, 418
may be independently executed on a respective core 410, 412, 414.
[0035] Moreover, as illustrated, the memory 404 may include an
operating system 420 stored thereon. The operating system 420 may
include a scheduler 422 and the scheduler 422 may include a first run
queue 424, a second run queue 426, and an Nth run queue 428. The
memory 404 may also include a first application 430, a second
application 432, and an Nth application 434 stored thereon.
[0036] In a particular aspect, the applications 430, 432, 434 may
send one or more tasks 436 to the operating system 420 to be
processed at the cores 410, 412, 414 within the multicore CPU 402.
The tasks 436 may be processed, or executed, as single tasks,
threads, or a combination thereof. Further, the scheduler 422 may
schedule the tasks, threads, or a combination thereof for execution
within the multicore CPU 402. Additionally, the scheduler 422 may
place the tasks, threads, or a combination thereof in the run queues
424, 426, 428. The cores 410, 412, 414 may retrieve the tasks,
threads, or a combination thereof from the run queues 424, 426, 428
as instructed, e.g., by the operating system 420, for processing, or
execution, of those tasks and threads at the cores 410, 412, 414.
[0037] FIG. 4 also shows that the memory 404 may include a
parallelism monitor 440 and a multicore processor (MP) controller 442
stored thereon. The parallelism monitor 440 may be connected to the
operating system 420 and the MP controller 442. Specifically, the
parallelism monitor 440 may be connected to the scheduler 422 within
the operating system 420. As described herein, the parallelism
monitor 440 may monitor the workload on the cores 410, 412, 414 and
the MP controller 442 may control the power to the cores 410, 412,
414 as described below. In a particular aspect, by executing one or
more of the method steps, e.g., as computer program instructions,
described herein, the parallelism monitor 440, the MP controller 442,
or a combination thereof may serve as a means for dynamically
controlling the power to the cores 410, 412, 414 within the multicore
CPU 402.
[0038] In a particular dual-core aspect, during operation, the MP
controller 442 may receive an input from the parallelism monitor 440.
The input may be a total system load. Moreover, the input may be a
running average of the degree of parallelism in the workload. Based
on the input, the MP controller 442 may determine whether a single
core or two cores should be powered on. Further, the MP controller
442 may output a control signal to the multicore CPU 402. The control
signal may indicate whether to turn additional cores on or off. In
the dual-core example, the MP controller 442 may include four
threshold values for controlling the decision to power the cores on
and off. The four threshold values may include a number of
ready-to-run threads in the OS scheduler queue to trigger a core
wake, Nw; a time duration for which Nw has been exceeded to confirm a
core wake, Tw; a number of ready-to-run threads in the OS scheduler
to trigger a core sleep, Ns; and a time duration for which Ns has
been exceeded to confirm a core sleep, Ts.
[0039] Beginning with a single core active, e.g., the zeroth core
410, when the running average of the degree of parallelism in the
workload on the zeroth core 410 meets or exceeds Nw for a duration of
at least Tw, the MP controller 442 may wake up a second core, e.g.,
the first core 412. Conversely, when both cores, e.g., the zeroth
core 410 and the first core 412, are active and when the degree of
parallelism in the workload falls below Ns for at least a duration of
Ts, the MP controller 442 may decide to put the second core, e.g.,
the first core 412, to sleep.
[0040] In a particular aspect, sustained threshold parallelism over
the time Tw implies that the single core is saturated. Further, the
cores may be started at the most power efficient voltage-frequency
(VF) operating point. In a particular aspect, two cores operating at
an optimal VF offer more Dhrystone million instructions per second
(DMIPS) than a single core operating at a maximum VF. In a dual-core
aspect, dual, independent DCVS algorithms may adapt to asymmetric
workloads and, in some cases, heterogeneous cores. Further, in a
dual-core aspect, the two cores should remain active during
multi-tasking workloads in order to avoid a performance penalty that
is doubled. Also, when the parallelism falls below Ns for the
prescribed time Ts, the second core should be powered off and not
placed in standby. In a particular aspect, placing the second core in
standby may increase power leakage and also may reduce performance.
[0041] The optimal values of the parameters Nw, Tw, Ns, and Ts may
depend on the exact power consumption characteristics of the system
400. However, in one aspect, the values may be as follows:
[0042] Nw = 1.2,
[0043] Tw = 40 milliseconds (ms),
[0044] Ns = 0.8, and
[0045] Ts = 80 ms.
[0046] In this particular aspect, Nw = 1.2 may ensure a sustained
parallelism before the second core is awakened. Ns = 0.8 may ensure a
sustained absence of parallelism before the second core is put
asleep. Ts = 80 ms is based on a power collapse threshold of the
system, 400 ms. Tw = 40 ms is half of Ts to improve multicore
responsiveness.
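The dual-core wake and sleep rule described above can be sketched as a small state machine. This is only an illustrative sketch, not the disclosed implementation: the class and field names (`Thresholds`, `DualCoreController`, `update`) are hypothetical, the controller is assumed to be fed one running-average sample per fixed interval, and a condition is assumed to reset its timer as soon as it stops holding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Example values from the text; real values are platform-specific."""
    n_wake: float = 1.2       # Nw: parallelism that triggers a core wake
    t_wake_ms: float = 40.0   # Tw: how long Nw must hold to confirm the wake
    n_sleep: float = 0.8      # Ns: parallelism that triggers a core sleep
    t_sleep_ms: float = 80.0  # Ts: how long Ns must hold to confirm the sleep


class DualCoreController:
    """Hypothetical MP controller deciding whether the second core is on."""

    def __init__(self, thresholds: Thresholds = Thresholds()):
        self.th = thresholds
        self.second_core_on = False
        self._held_ms = 0.0  # time the pending wake/sleep condition has held

    def update(self, avg_parallelism: float, dt_ms: float) -> bool:
        """Feed one running-average sample covering dt_ms; return core state."""
        if not self.second_core_on:
            met = avg_parallelism >= self.th.n_wake   # meets or exceeds Nw
            confirm_ms = self.th.t_wake_ms
        else:
            met = avg_parallelism < self.th.n_sleep   # falls below Ns
            confirm_ms = self.th.t_sleep_ms
        # Assumption: the condition must hold continuously; a miss resets it.
        self._held_ms = self._held_ms + dt_ms if met else 0.0
        if self._held_ms >= confirm_ms:
            self.second_core_on = not self.second_core_on
            self._held_ms = 0.0
        return self.second_core_on
```

Because Nw is above Ns, averages between 0.8 and 1.2 leave the current state unchanged, which gives the controller hysteresis and avoids rapid toggling of the second core.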
[0047] FIG. 5 illustrates a first aspect of a method of controlling
power within a multi-core processor. The method is generally
designated 500. The method 500 commences at block 502 with a do loop
in which, during operation of a device having a multi-core processor,
the succeeding steps may be performed. At block 504, a power
controller may dynamically infer a degree of workload parallelism
within the CPUs, or cores, e.g., by monitoring an operating system
state. Moving to block 506, at least partially based on the degree of
workload parallelism, the power controller may power core(s) up or
down. In other words, the power controller may turn the cores on or
off based on the workload.
[0048] At decision 508, the power controller may determine whether
the device is powered off. If the device is powered off, the method
may end. Otherwise, if the device remains powered on, the method 500
may return to block 504 and the method 500 may continue as described.
[0049] Referring now to FIG. 6, a second aspect of a method of
controlling power within a multi-core processor is shown and is
generally designated 600. The method 600 commences at block 602 with
a do loop in which, during operation of a device having a multi-core
processor, the succeeding steps may be performed. At block 604, a
controller, e.g., a parallelism monitor, may monitor the length of
all operating system (OS) scheduler ready-to-run queues in order to
determine a degree of workload parallelism within the CPUs, or cores.
In a particular aspect, the parallelism monitor may be a software
program residing in a memory of the device. Further, in a particular
aspect, the scheduler ready-to-run queue is a list of current tasks
or threads that are available for scheduling on one or more CPUs.
Some multicore systems may only have a single ready-to-run queue.
Other multicore systems may have multiple ready-to-run queues.
Regardless of the number of ready-to-run queues, at any instant in
time, the total number of tasks, threads, or a combination thereof
waiting on these queues, plus a number of tasks, threads, or a
combination thereof actually running, may be an approximation for the
degree of parallelism in the workload.
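The approximation stated in the preceding paragraph reduces to a sum over the ready-to-run queues plus the count of running tasks. The following minimal sketch assumes the caller can already read those counts from the scheduler; the function name and parameters are hypothetical and are not an operating system API.

```python
from typing import Sequence


def degree_of_parallelism(ready_queue_lengths: Sequence[int], running: int) -> int:
    """Approximate instantaneous parallelism: waiting plus running tasks/threads.

    ready_queue_lengths holds one entry per ready-to-run queue, so the same
    function covers systems with a single queue and systems with several.
    """
    if running < 0 or any(n < 0 for n in ready_queue_lengths):
        raise ValueError("counts must be non-negative")
    return sum(ready_queue_lengths) + running
```

For example, three queues holding 2, 0, and 1 waiting threads with 2 threads currently running give an instantaneous estimate of 5.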
[0050] Moving to block 606, at least partially based on the degree of
workload parallelism, the parallelism monitor may power core(s) up or
down. In other words, the parallelism monitor may turn the cores on
or off based on the workload.
[0051] At decision 608, the parallelism monitor may determine whether
the device is powered off. If the device is powered off, the method
may end. Otherwise, if the device remains powered on, the method 600
may return to block 604 and the method 600 may continue as described.
[0052] Referring to FIG. 7, a third aspect of a method of controlling
power within a multi-core processor is shown and is generally
designated 700. The method 700 commences at block 702 with a do loop
in which, during operation of a device having a multi-core processor,
the succeeding steps may be performed. At block 704, a parallelism
monitor may periodically sample a ready-to-run queue length. For
example, the parallelism monitor may sample the ready-to-run queue
length every millisecond (1 ms). At block 706, the parallelism
monitor may determine a running average of the degree of parallelism
in the workload. Moving to block 708, at least partially based on the
degree of workload parallelism, the parallelism monitor may power
core(s) up or down. In other words, the parallelism monitor may turn
the cores on or off based on the workload.
[0053] At decision 710, the parallelism monitor may determine whether
the device is powered off. If the device is powered off, the method
may end. Otherwise, if the device remains powered on, the method 700
may return to block 704 and the method 700 may continue as described.
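The periodic sampling of blocks 704 and 706 can be illustrated with a sliding-window mean. The text does not say how the running average is computed, so the fixed-length window used here is an assumption, and the class name `SampledParallelismMonitor` and its `window` parameter are hypothetical.

```python
from collections import deque


class SampledParallelismMonitor:
    """Running average over the most recent `window` periodic samples."""

    def __init__(self, window: int = 40):
        if window <= 0:
            raise ValueError("window must be positive")
        self._samples = deque(maxlen=window)
        self._total = 0

    def sample(self, queue_length: int) -> float:
        """Record one periodic sample (e.g. every 1 ms); return the average."""
        if len(self._samples) == self._samples.maxlen:
            # The oldest sample is about to be evicted by append().
            self._total -= self._samples[0]
        self._samples.append(queue_length)
        self._total += queue_length
        return self._total / len(self._samples)
```

With a 1 ms sampling period, a window of 40 samples averages over roughly the last 40 ms; a shorter window reacts faster to bursts but produces a noisier estimate.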
[0054] FIG. 8 depicts a fourth aspect of a method of controlling
power within a multi-core processor. The method is generally
designated 800 and the method 800 may commence at block 802 with a do
loop in which, during operation of a device having a multi-core
processor, the succeeding steps may be performed. At block 804, a
parallelism monitor may receive a callback from the operating system
(OS) whenever an entry is added or removed from the OS scheduler run
queue. Further, at block 806, the parallelism monitor may determine a
running average of the degree of parallelism in the workload of the
CPUs, or cores.
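When updates arrive as callbacks rather than at fixed intervals, the average has to account for how long each queue length persisted. The sketch below assumes an exponentially decayed, time-weighted average with a hypothetical time constant `tau_ms`; the text does not specify the averaging method, and the callback names `on_enqueue` and `on_dequeue` are illustrative rather than a real OS interface.

```python
import math


class CallbackParallelismMonitor:
    """Time-weighted running average driven by run queue add/remove events."""

    def __init__(self, tau_ms: float = 40.0):
        if tau_ms <= 0:
            raise ValueError("tau_ms must be positive")
        self.tau_ms = tau_ms
        self.length = 0       # current run queue length
        self.average = 0.0    # decayed average of the queue length
        self._last_ms = 0.0   # timestamp of the previous update

    def _advance(self, now_ms: float) -> None:
        """Fold the interval since the last update into the average."""
        dt = now_ms - self._last_ms
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        decay = math.exp(-dt / self.tau_ms)
        # The queue length was constant over dt, so it gets weight 1 - decay.
        self.average = self.average * decay + self.length * (1.0 - decay)
        self._last_ms = now_ms

    def on_enqueue(self, now_ms: float) -> None:
        self._advance(now_ms)
        self.length += 1

    def on_dequeue(self, now_ms: float) -> None:
        self._advance(now_ms)
        self.length = max(0, self.length - 1)

    def read(self, now_ms: float) -> float:
        self._advance(now_ms)
        return self.average
```

Compared with periodic sampling, this variant does no work while the queue is unchanged, at the cost of requiring the scheduler to invoke the monitor on every queue change.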
[0055] Moving to block 808, at least partially based on the
`degree of workload