Hum et al.

(10) Patent No.: US 8,615,647 B2
(45) Date of Patent: Dec. 24, 2013
`
(54) MIGRATING EXECUTION OF THREAD
BETWEEN CORES OF DIFFERENT
INSTRUCTION SET ARCHITECTURE IN
MULTI-CORE PROCESSOR AND
TRANSITIONING EACH CORE TO
RESPECTIVE ON / OFF POWER STATE
`
`(75) Inventors: Herbert Hum, Portland, OR (US); Eric
`Sprangle, Austin, TX (US); Doug
`Carmean, Beaverton, OR (US); Rajesh
`Kumar, Portland, OR (US)
`
(73) Assignee: Intel Corporation, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 538 days.

`(21) Appl. No.: 12/220,092
`
(22) Filed: Jul. 22, 2008

(65) Prior Publication Data
US 2009/0222654 A1    Sep. 3, 2009
`Related U.S. Application Data
`(60) Provisional application No. 61/067,737, filed on Feb.
`29, 2008.
(51) Int. Cl.
G06F 1/32 (2006.01)
(52) U.S. Cl.
USPC 713/100; 712/32; 713/323
(58) Field of Classification Search
None
See application file for complete search history.
`
`
Primary Examiner: Kenneth Kim
(74) Attorney, Agent, or Firm: Mnemoglyphics, LLC
`
(57) ABSTRACT

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.
`
`20 Claims, 6 Drawing Sheets
`
[Representative drawing: block diagram of a computer system showing two processors with processor cores, memory, high-performance graphics, a bus bridge, I/O devices, audio I/O, keyboard/mouse, communication devices, and data storage]
`
`
`
`
`
`
`
`
`
`Petitioner Mercedes Ex-1033, 0001
`
`
`
`
`
`
[Drawing Sheet 1 of 6: FIG. 1 (block diagram of a microprocessor) and FIG. 2 (block diagram of a shared bus computer system)]
`
`
`
[Drawing Sheet 2 of 6]
`
`
`
[Drawing Sheet 3 of 6: FIG. 4, "Power vs. Performance" plot; horizontal axis: Relative Performance, 0 to 1; reference numeral 401]
`
`
`
[Drawing Sheet 4 of 6: FIG. 5, flow diagram with the following blocks]

501: It is determined that a process/thread/task running on a main processor core of a multi-core processor may be run on a lower power/performance core while maintaining an acceptable performance level.
505: An event occurs in the main core to cause state from the core to be saved and copied to a lower power/performance core.
510: The transferred thread/process/task is restarted on the lower power/performance core.
515: The main core may be placed in a lower power state.
Decision: Transferred task activity level exceeds a threshold, or new process started on main core?
525: Transfer task back to main core.
`
`
`
[Drawing Sheet 5 of 6: labels 600-1, SC-1, PC-1, PC-N]
`
`
`
[Drawing Sheet 6 of 6: FIG. 8, showing a main core and a ULPC]
`
`
`
MIGRATING EXECUTION OF THREAD
BETWEEN CORES OF DIFFERENT
INSTRUCTION SET ARCHITECTURE IN
MULTI-CORE PROCESSOR AND
TRANSITIONING EACH CORE TO
RESPECTIVE ON / OFF POWER STATE
`
`FIELD OF THE INVENTION
`
Embodiments of the invention relate generally to the field of information processing and more specifically, to the field of distributing program tasks among various processing elements.
`
`BACKGROUND
`
As more processing throughput is required from modern microprocessors, it is often at the expense of power consumption. Some applications, such as mobile internet devices (MIDs), ultra-mobile personal computers (UMPCs), cellular phones, personal digital assistants (PDAs), and even laptop/notebook computers, may benefit from processors that consume relatively little power. However, achieving relatively high processing throughput at relatively low power is a challenge, involving various design trade-offs, depending on the usage models of the computing platform.
One approach to reducing power in a computing platform when there is relatively little activity is to place the processor in a low-power state. However, placing a processor in a low-power state or returning a processor from a low-power state may require a non-trivial amount of time. Therefore, it may or may not be worth the time required to place a processor in a low-power state or to return the processor from a low-power state. Furthermore, not all processes and tasks that are run on a processor require the full processing throughput of the processor.
`
BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 illustrates a block diagram of a microprocessor, in which at least one embodiment of the invention may be used;
FIG. 2 illustrates a block diagram of a shared bus computer system, in which at least one embodiment of the invention may be used;
FIG. 3 illustrates a block diagram of a point-to-point interconnect computer system, in which at least one embodiment of the invention may be used;
FIG. 4 is a curve showing the relationship between power and performance using at least one embodiment of the invention;
FIG. 5 is a flow diagram of operations that may be used for performing at least one embodiment of the invention;
FIG. 6 illustrates a number of processing units and an activity level, thermal, or power detection/monitoring unit that may be used in at least one embodiment;
FIG. 7 illustrates a power management logic according to one embodiment;
FIG. 8 illustrates a technique to transition between at least two asymmetric processing cores, according to one embodiment.

DETAILED DESCRIPTION

Embodiments of the invention include a microprocessor or processing system having a number of asymmetric processing elements. In some embodiments, each processing element is a processor core, having one or more execution resources, such as arithmetic logic units (ALUs), instruction decoder, and instruction retirement unit, among other things. In some embodiments, the number of asymmetric processing elements has at least two different processing throughput or performance capabilities, power consumption characteristics or limits, voltage supply requirements, clock frequency characteristics, number of transistors, and/or instruction set architectures (ISAs). In one embodiment, an asymmetric microprocessor includes at least one main processor core having larger power consumption characteristics and/or processing throughput/performance characteristics than at least one other processing core within or otherwise associated with the microprocessor.
In one embodiment, a process or task running or intended to run on a main higher power/performance processing core may be transferred to one of the other lower power/performance processing cores for various reasons, including that the process or task does not require the processing throughput of one of the main cores, the processor or the system in which it is used is placed into or otherwise requires a lower-power consumption condition (such as when running on battery power), and for increasing the processing throughput of the asymmetric microprocessor or system in which the higher power/performance cores and lower power/performance cores are used. For example, in one embodiment, the asymmetric processing elements may be used concurrently or otherwise in parallel to perform multiple tasks or processes, thereby improving the overall throughput of the processor and processing system.
In one embodiment, the at least one main processing core has a different ISA than at least one of the at least one processor cores having a lower power consumption characteristic and/or processing performance capability. In one embodiment, instruction translation logic in the form of hardware, software, or some combination thereof, may be used to translate instructions for the at least one main processor core into instructions for the at least one other lower-power/performance processing core. For example, in one embodiment, one or more of the main higher power/performance cores may have a complex instruction set computing (CISC) architecture, such as the "x86" computing architecture, and therefore perform instructions that are intended for x86 processor cores. One or more of the lower power/performance cores may have a different ISA than the main core, including a reduced instruction set computing (RISC) architecture, such as an Advanced RISC Machine (ARM) core. In other embodiments, the main processing element(s) and the lower power/performance processing element(s) may include other architectures, such as the MIPS ISA. In other embodiments the main processing element(s) may have the same ISA as the lower power/performance element(s) (e.g., x86).
In one embodiment, a number of different threads, processes, or tasks associated with one or more software programs may be intelligently moved among and run on a number of different processing elements, having a number of different processing capabilities (e.g., operating voltage, performance, power consumption, clock frequency, pipeline depth, transistor leakage, ISA), according to the dynamic performance and power consumption needs of the processor or computer system. For example, if one process, such as that associated with a spreadsheet application, does not require the full processing capabilities of a main, higher performance processor core, but may instead be run with acceptable performance on a lower-power core, the process may be transferred to or otherwise run on the lower power core and the main, higher power processor core may be placed in a low power state or may just remain idle. By running threads/processes/tasks on a processor core that better matches the performance needs of the thread/process/task, power consumption may be optimized, according to some embodiments.
FIG. 1 illustrates a microprocessor in which at least one embodiment of the invention may be used. In particular, FIG. 1 illustrates microprocessor 100 having one or more main processor cores 105 and 110, each being able to operate at a higher performance level (e.g., instruction throughput) or otherwise consume more power than one or more low-power cores 115, 120. In one embodiment, the low-power cores may be operated at the same or different operating voltage as the main cores. Furthermore, in some embodiments, the low-power cores may operate at a different clock speed or have fewer execution resources, such that they operate at a lower performance level than the main cores.
In other embodiments, the low-power cores may be of a different ISA than the main cores. For example, the low-power cores may have an ARM ISA and the main cores may have an x86 ISA, such that a program using x86 instructions may need to have these instructions translated into ARM instructions if a process/task/thread is transferred to one of the ARM cores. Because the process/thread/task being transferred may be one that does not require the performance of one of the main cores, a certain amount of latency associated with the instruction translation may be tolerated without noticeable or significant loss of performance.
Also illustrated in FIG. 1 is at least one other non-CPU functional unit 117, 118, and 119 which may perform other non-CPU related operations. In one embodiment, the functional units 117, 118, and 119 may include functions such as graphics processing, memory control and I/O or peripheral control, such as audio, video, disk control, digital signal processing, etc. The multi-core processor of FIG. 1 also illustrates a cache 123 that each core can access for data or instructions corresponding to any of the cores.
In one embodiment, logic 129 may be used to monitor performance or power of any of the cores illustrated in FIG. 1 in order to determine whether a process/task/thread should be migrated from one core to another to optimize power and performance. In one embodiment, logic 129 is associated with the main cores 105 and 110 to monitor an activity level of the cores to determine whether the processes/threads/tasks running on those cores could be run on a lower-power core 115, 120 at an acceptable performance level, thereby reducing the overall power consumption of the processor. In other embodiments, logic 129 may respond to a power state of the system, such as when the system goes from being plugged into an A/C outlet to battery power. In this case, the OS or some other power state monitoring logic may inform logic 129 of the new power conditions and the logic 129 may cause a currently running process (or processes yet to be scheduled to run) to either be transferred (or scheduled) to a lower-power core (in the case of going from A/C to battery, for example) or from a lower-power core to a main core (in the case of going from battery to A/C, for example). In some embodiments, an operating system (OS) may be responsible for monitoring or otherwise controlling the power states of the processor and/or system, such that the logic 129 simply reacts to the OS's commands to reduce power by migrating tasks/threads/processes to a core that better matches the performance needs of the tasks/threads/processes while accomplishing the power requirements dictated or indicated by the OS.
In some embodiments, the logic 129 may be hardware logic or software, which may or may not determine a core(s) on which a process/task/thread should be run independently of the OS. In one embodiment, for example, logic 129 is implemented in software to monitor the activity level of the cores, such as the main cores, to see if it drops below a threshold level, and in response thereto, causes one or more processes running on the monitored core(s) to be transferred to a lower-power core, such as cores 115 and 120. Conversely, logic 129 may monitor the activity level of a process running on a lower-power core 115 and 120 in order to determine whether it is rising above a threshold level, thereby indicating the process should be transferred to one of the main cores 105, 110. In other embodiments, logic 129 may independently monitor other performance or power indicators within the processor or system and cause processes/threads/tasks to be migrated to cores that more closely fit the performance needs of the tasks/processes/threads while meeting the power requirements of the processor of the system at a given time. In this way, the power and performance of processor 100 can be controlled without the programmer or OS being concerned or even aware of the underlying power state of the processor.
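The threshold-based behavior described for logic 129 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Task` type, the `choose_core` function, the activity scale of 0 to 1, and the two threshold values are assumptions made for the example and do not appear in the specification, which speaks only of "a threshold level".

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    activity: float  # hypothetical activity metric in the range [0, 1]
    core: str        # "main" or "low_power"


def choose_core(task, low_threshold=0.3, high_threshold=0.7):
    """Return the core type the task should run on next.

    A task on a main core whose activity drops below low_threshold moves to
    a low-power core; a task on a low-power core whose activity rises above
    high_threshold moves to a main core. Using two separate thresholds is an
    assumption of this sketch, chosen to avoid migrating back and forth.
    """
    if task.core == "main" and task.activity < low_threshold:
        return "low_power"
    if task.core == "low_power" and task.activity > high_threshold:
        return "main"
    return task.core
```

With these assumed thresholds, a lightly loaded task on a main core is directed to a low-power core, while a task in the middle band stays where it is.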
In other embodiments, each core in FIG. 1 may be concurrently running different tasks/threads/processes to get the most performance benefit possible from the processor. For example, in one embodiment, a process/thread/task that requires high performance may be run on a main core 105, 110 while a process/thread/task that does not require as much performance as the main cores are able to deliver runs concurrently on lower-power cores 115, 120. In one embodiment, the programmer determines where to schedule these tasks/threads/processes, whereas in other embodiments, these threads/tasks/processes may be scheduled by an intelligent thread scheduler (not shown) that is aware of the performance capabilities of each core and can schedule the threads to the appropriate core accordingly. In other embodiments, the threads are simply scheduled without regard to the performance capabilities of the underlying cores and the threads/processes/tasks are migrated to a more appropriate core after the activity levels of the cores in response to the threads/processes/tasks are determined. In this manner, neither an OS nor a programmer need be concerned about where the threads/processes/tasks are scheduled, because the threads/processes/tasks are scheduled on the appropriate core(s) that best suits the performance requirement of each thread while maintaining the power requirements of the system or processor.
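One way to picture a scheduler that is aware of each core's performance capabilities is the following Python sketch. The `schedule` function, the tuple layout, and the numeric performance and power figures are hypothetical and chosen purely for illustration; the specification does not define a particular scheduling policy.

```python
def schedule(required_perf, cores):
    """Pick a core name for a task with the given performance requirement.

    cores is a list of (name, relative_performance, relative_power) tuples.
    Among the cores that meet the requirement, the one with the lowest power
    is chosen. If no core meets it, the highest-performance core is used
    (a fallback policy assumed for this sketch).
    """
    candidates = [c for c in cores if c[1] >= required_perf]
    if not candidates:
        return max(cores, key=lambda c: c[1])[0]
    return min(candidates, key=lambda c: c[2])[0]


# Hypothetical asymmetric processor: one main core and one low-power core.
CORES = [("main0", 1.0, 10.0), ("lp0", 0.3, 1.0)]
```

Under these assumed figures, `schedule(0.2, CORES)` selects the low-power core and `schedule(0.8, CORES)` selects the main core.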
In one embodiment, logic 129 may be hardware, software, or some combination thereof. Furthermore, logic 129 may be distributed within one or more cores or exist outside the cores while maintaining electronic connection to the one or more cores to monitor activity/power and cause threads/tasks/processes to be transferred to appropriate cores.
FIG. 2, for example, illustrates a front-side-bus (FSB) computer system in which one embodiment of the invention may be used. Any processor 201, 205, 210, or 215 may include asymmetric cores (differing in performance, power, operating voltage, clock speed, or ISA), which may access information from any local level one (L1) cache memory 220, 225, 230, 235, 240, 245, 250, 255 within or otherwise associated with one of the processor cores 223, 227, 233, 237, 243, 247, 253, 257. Furthermore, any processor 201, 205, 210, or 215 may access information from any one of the shared level two (L2) caches 203, 207, 213, 217 or from system memory 260 via chipset 265. One or more of the processors in FIG. 2 may include or otherwise be associated with logic 219 to monitor and/or control the scheduling or migration of processes/threads/tasks between each of the asymmetric cores of each processor. In one embodiment, logic 219 may be used to schedule or migrate threads/tasks/processes to or from one asymmetric core in one processor to another asymmetric core in another processor.
In addition to the FSB computer system illustrated in FIG. 2, other system configurations may be used in conjunction with various embodiments of the invention, including point-to-point (P2P) interconnect systems and ring interconnect systems. The P2P system of FIG. 3, for example, may include several processors, of which only two, processors 370, 380, are shown by example. Processors 370, 380 may each include a local memory controller hub (MCH) 372, 382 to connect with memory 32, 34. Processors 370, 380 may exchange data via a P2P interface 350 using P2P interface circuits 378, 388. Processors 370, 380 may each exchange data with a chipset 390 via individual P2P interfaces 352, 354 using point-to-point interface circuits 376, 394, 386, 398. Chipset 390 may also exchange data with a high-performance graphics circuit 338 via a high-performance graphics interface 339. Embodiments of the invention may be located within any processor having any number of processing cores, or within each of the P2P bus agents of FIG. 3. In one embodiment, any processor core may include or otherwise be associated with a local cache memory (not shown). Furthermore, a shared cache (not shown) may be included in either processor or outside of both processors, yet connected with the processors via P2P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode. One or more of the processors or cores in FIG. 3 may include or otherwise be associated with logic to monitor and/or control the scheduling or migration of processes/threads/tasks between each of the asymmetric cores of each processor.
FIG. 4 is a graph illustrating the performance and power characteristics associated with a processor when scaling voltage and frequency including techniques according to at least one embodiment of the invention. Reducing voltage is an efficient way of reducing power since the frequency scales linearly with the voltage, while the power scales as the cube of the voltage (power = CV²F). Unfortunately, this efficient voltage scaling approach only works within a range of voltages; at some point, "Vmin", the transistor switching frequency does not scale linearly with voltage. At this point (401), to further reduce power, the frequency is reduced without dropping the voltage. In this range, the power scales linearly with the frequency, which is not nearly as attractive as when in the range where voltage scaling is possible. In one embodiment, power consumption of the system may be reduced below the minimum point 401 of a typical multi-core processor having symmetric processing elements by scheduling or migrating processes/threads/tasks from higher-performance/power cores to lower-performance/power cores if appropriate. In FIG. 4, the power/performance curve segment 405 indicates where the overall non-linear power/performance curve could be extended to enable more power savings, in one embodiment.
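The two scaling regimes described for FIG. 4 can be illustrated numerically. The sketch below is a simplified model under stated assumptions: relative performance is taken to be proportional to frequency, capacitance is constant, and the knee of the curve (the relative performance at which the voltage reaches its minimum) is placed at 0.6 purely for illustration. The function name `relative_power` and the `knee` parameter are hypothetical.

```python
def relative_power(perf, knee=0.6):
    """Relative power at a given relative performance (both normalized to 1).

    Above the knee, voltage and frequency scale together, so power, which is
    proportional to C * V**2 * F, falls with the cube of performance.
    Below the knee, voltage is held at its minimum and only frequency drops,
    so power falls linearly with performance. The linear segment is scaled
    by knee**2 so that the two segments meet at the knee.
    """
    if not 0.0 <= perf <= 1.0:
        raise ValueError("perf must be between 0 and 1")
    if perf >= knee:
        return perf ** 3
    return knee ** 2 * perf
```

In this model, dropping from full performance to 0.8 reduces power to roughly half, whereas below the knee each further reduction in performance buys only a proportional reduction in power, which is the motivation given in the text for moving work to a lower-power core instead.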
FIG. 5 illustrates a flow diagram of operations that may be used in conjunction with at least one embodiment of the invention. At operation 501, it is determined that a process/thread/task running on a main processor core of a multi-core processor may be run on a lower power/performance core while maintaining an acceptable performance level. In one embodiment, the determination could be made by monitoring the activity level of the main core in response to running the thread/process/task and comparing it to a threshold value, corresponding to an acceptable performance metric of the lower power/performance core. In other embodiments, the determination could be made based on system power requirements, such as when the system is running on A/C power versus battery power. In yet other embodiments, a thread/process/task may be designated to require only a certain amount of processor performance, for example, by a programmer, the OS, etc. In other embodiments, other techniques may be used for determining whether a task/thread/process could be transferred to a lower power/performance core, thereby reducing power consumption.
At operation 505, an event (e.g., yield, exception, etc.) occurs in the main core to cause state from the core to be saved and copied to a lower power/performance core. In one embodiment, a handler program is invoked in response to the event to cause the main core state to be transferred from the main core to a lower power/performance core. At operation 510, the transferred thread/process/task is restarted or resumed on the lower power/performance core. At operation 515, the main core may be placed in a lower power state (e.g., paused, halted, etc.) until, at operation 520, either the transferred process/task/thread requires above a threshold level of performance, in which case the thread/process/task may be transferred back to the main core at operation 525 in a similar manner as it was transferred to the lower power/performance core, or another task/process/thread is scheduled for execution on the main core.
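The sequence of operations 505 through 525 can be modeled as a small state-transfer sketch in Python. Everything named here (`Core`, `transfer_state`, `offload`, `on_check`, the dictionary used for core state, and the 0.7 threshold) is an assumption made for illustration; the specification does not prescribe a data structure or a handler interface.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    state: dict = field(default_factory=dict)  # stand-in for saved thread state
    power_state: str = "active"


def transfer_state(src, dst):
    """Save the source core's state and copy it to the destination core."""
    dst.state = dict(src.state)
    src.state.clear()
    dst.power_state = "active"  # the thread is restarted or resumed on dst


def offload(main, low):
    """Operations 505 to 515: move the thread, then lower main-core power."""
    transfer_state(main, low)
    main.power_state = "low_power"


def on_check(main, low, activity, threshold=0.7, new_task_for_main=False):
    """Operation 520: decide whether the main core is needed again."""
    if activity > threshold:
        transfer_state(low, main)  # operation 525: transfer back
        return "transferred_back"
    if new_task_for_main:
        main.power_state = "active"
        return "main_resumed"
    return "unchanged"
```

In this sketch the same `transfer_state` helper is used in both directions, mirroring the statement that the thread is transferred back "in a similar manner" to the way it was transferred out.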
In one embodiment, the thread/process/task transferred from the main core to the lower power/performance core is first translated from the ISA of the main core to the ISA of the lower power/performance core, if the two have different architectures. For example, in one embodiment, the main core is an x86 architecture core and the lower power/performance core is an ARM architecture core, in which case instructions of the transferred thread/process/task may be translated (for example, by a software binary translation shell) from x86 instructions to ARM instructions. Because the thread/process/task being transferred is by definition one that does not require so much performance that it must be run on the main core, a certain amount of latency may be tolerated in translating the process/task/thread from the x86 architecture to ARM architecture.
FIG. 6 illustrates a processing apparatus having a number of individual processing units between which processes/threads/tasks may be swapped under control of an activity level monitor, or thermal or power monitor, according to one embodiment. In the embodiment of FIG. 6, N processing units, processing unit 600-1, 600-2 through 600-N, are coupled to a monitor or detection (generically referred to as "monitor") logic 610. In one embodiment, the monitor 610 includes an activity, thermal and/or power monitoring unit that monitors the activity/performance, power consumption, and/or temperature of the processing units 600-1 through 600-N. In one embodiment, performance counters may be used to monitor the activity level of processing units 600-1 through 600-N. In one embodiment, the monitor 610 orchestrates process shifting between processing units in order to manage power consumption and/or particularly thermal concerns, while maintaining an acceptable level of performance.
In one embodiment, each processing unit provides a monitor value that typically reflects activity level, power consumption and/or temperature information to the monitor 610 via signals such as processor communication (PC) lines PC-1 through PC-N. The monitor value may take a variety of forms and may be a variety of different types of information. For example, the monitor value may simply be an analog or digital reading of the temperature of each processing unit. Alternatively, the monitor value may be a simple or complex activity factor that reflects the operational activity level of a particular processing unit. In some embodiments, power consumption information reflected by the monitor value may include a measured current level or other indication of how much power is being consumed by the processing unit. Additionally, some embodiments may convey power consumption information to the monitor 610 that is a composite of several of these or other types of known or otherwise available means of measuring or estimating power consumption. Accordingly, some power consumption metric which reflects one or more of these or other power consumption indicators may be derived. The transmitted monitor value may reflect a temperature or a power consumption metric, which itself may factor in a temperature. Serial, parallel, and/or various known or otherwise available protocols may be used to transmit this information to the power monitor.
In one embodiment, the monitor 610 receives the power consumption information from the various processing units and analyzes whether the power consumption or activity level of one processing unit is at a level to justify the overhead of re-allocating processes to different processing units. For example, the monitor may be triggered to rearrange processes when a particular processing unit falls below a threshold level of activity, or when power consumption is above an acceptable level. In one embodiment, the monitor 610 may develop a total power consumption metric to indicate the total power consumption, total activity level metric, or total thermal state of all processing units to effectuate the various power control strategies. In one embodiment, the monitor 610 may be a hardware component, a software component, routine, or module, or a combination of hardware and software that works either dependently or independently of the operating system.
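The monitor's two triggers, a unit falling below an activity threshold and total power exceeding an acceptable level, can be sketched as follows. The function names, the dictionary-based readings keyed by unit label, and the numeric limits are hypothetical values chosen for the example; the specification leaves the form of the monitor value open.

```python
def total_power(power_readings):
    """Total power consumption metric: the sum of per-unit power readings."""
    return sum(power_readings.values())


def rebalance_triggers(activity, power, min_activity=0.2, max_total_power=15.0):
    """Return (idle_units, over_budget) for one monitoring interval.

    idle_units lists the processing units whose activity is below
    min_activity, sorted by label. over_budget is True when the total power
    exceeds max_total_power. Either condition could prompt the monitor to
    rearrange processes; what it does next is outside this sketch.
    """
    idle_units = sorted(unit for unit, level in activity.items()
                        if level < min_activity)
    over_budget = total_power(power) > max_total_power
    return idle_units, over_budget
```

For example, with activity readings of 0.9 for unit 600-1 and 0.1 for unit 600-2, the second unit is reported as idle under the assumed 0.2 threshold.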
`In one embodiment, the monitor communicates to the pro
`cessing units via thread or process swap control (S