(12) United States Patent
Hum et al.

(10) Patent No.: US 8,615,647 B2
(45) Date of Patent: Dec. 24, 2013

(54) MIGRATING EXECUTION OF THREAD BETWEEN CORES OF DIFFERENT INSTRUCTION SET ARCHITECTURE IN MULTI-CORE PROCESSOR AND TRANSITIONING EACH CORE TO RESPECTIVE ON / OFF POWER STATE

(75) Inventors: Herbert Hum, Portland, OR (US); Eric Sprangle, Austin, TX (US); Doug Carmean, Beaverton, OR (US); Rajesh Kumar, Portland, OR (US)

(73) Assignee: Intel Corporation, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 538 days.

(21) Appl. No.: 12/220,092

(22) Filed: Jul. 22, 2008

(65) Prior Publication Data
US 2009/0222654 A1    Sep. 3, 2009

Related U.S. Application Data
(60) Provisional application No. 61/067,737, filed on Feb. 29, 2008.

(51) Int. Cl.
G06F 1/32 (2006.01)
(52) U.S. Cl.
USPC 713/100; 712/32; 713/323
(58) Field of Classification Search
None
See application file for complete search history.
`Primary Examiner — Kenneth Kim
(74) Attorney, Agent, or Firm — Mnemoglyphics, LLC;
`aWCIV. V1
`
(57) ABSTRACT

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.

20 Claims, 6 Drawing Sheets
[Front-page drawing not reproduced; legible block labels: processor, processor core, memory, high-performance graphics, bus bridge, I/O devices, audio I/O, keyboard/mouse, communication devices, data storage]

`Petitioner Mercedes Ex-1033, 0001
U.S. Patent    Dec. 24, 2013    Sheet 1 of 6    US 8,615,647 B2

[FIG. 1 and FIG. 2: drawings not reproduced]

[Sheet 2 of 6: drawing not reproduced]

[Sheet 3 of 6, FIG. 4: plot titled "Power vs. Performance"; horizontal axis "Relative Performance" from 0 to 1; reference numeral 401]

[Sheet 4 of 6, FIG. 5: flow diagram with the following operations]
501: It is determined that a process/thread/task running on a main processor core of a multi-core processor may be run on a lower power/performance core while maintaining an acceptable performance level.
505: An event occurs in the main core to cause state from the core to be saved and copied to a lower power/performance core.
510: The transferred thread/process/task is restarted on the lower power/performance core.
515: The main core may be placed in a lower power state.
520 (decision): Transferred task activity level exceeds a threshold, or new process started on main core?
525: Transfer task back to main core.

[Sheet 5 of 6: drawings not reproduced; surviving labels: 600-1, SC-1, PC-1, PC-N]

[Sheet 6 of 6, FIG. 8: drawing not reproduced; surviving labels: Main Core, ULPC]

MIGRATING EXECUTION OF THREAD BETWEEN CORES OF DIFFERENT INSTRUCTION SET ARCHITECTURE IN MULTI-CORE PROCESSOR AND TRANSITIONING EACH CORE TO RESPECTIVE ON / OFF POWER STATE

FIELD OF THE INVENTION

Embodiments of the invention relate generally to the field of information processing and, more specifically, to the field of distributing program tasks among various processing elements.

BACKGROUND

As more processing throughput is required from modern microprocessors, it is often at the expense of power consumption. Some applications, such as mobile internet devices (MIDs), ultra-mobile personal computers (UMPCs), cellular phones, personal digital assistants (PDAs), and even laptop/notebook computers, may benefit from processors that consume relatively little power. However, achieving relatively high processing throughput at relatively low power is a challenge, involving various design trade-offs, depending on the usage models of the computing platform.
One approach to reducing power in a computing platform when there is relatively little activity is to place the processor in a low-power state. However, placing a processor in a low-power state or returning a processor from a low-power state may require a non-trivial amount of time. Therefore, it may or may not be worth the time required to place a processor in a low-power state or to return the processor from a low-power state. Furthermore, not all processes and tasks that are run on a processor require the full processing throughput of the processor.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 illustrates a block diagram of a microprocessor, in which at least one embodiment of the invention may be used;
FIG. 2 illustrates a block diagram of a shared bus computer system, in which at least one embodiment of the invention may be used;
FIG. 3 illustrates a block diagram of a point-to-point interconnect computer system, in which at least one embodiment of the invention may be used;
FIG. 4 is a curve showing the relationship between power and performance using at least one embodiment of the invention;
FIG. 5 is a flow diagram of operations that may be used for performing at least one embodiment of the invention;
FIG. 6 illustrates a number of processing units and an activity level, thermal, or power detection/monitoring unit that may be used in at least one embodiment;
FIG. 7 illustrates a power management logic according to one embodiment;
FIG. 8 illustrates a technique to transition between at least two asymmetric processing cores, according to one embodiment.

DETAILED DESCRIPTION

Embodiments of the invention include a microprocessor or processing system having a number of asymmetric processing elements. In some embodiments, each processing element is a processor core, having one or more execution resources, such as arithmetic logic units (ALUs), instruction decoder, and instruction retirement unit, among other things. In some embodiments, the number of asymmetric processing elements has at least two different processing throughput or performance capabilities, power consumption characteristics or limits, voltage supply requirements, clock frequency characteristics, number of transistors, and/or instruction set architectures (ISAs). In one embodiment, an asymmetric microprocessor includes at least one main processor core having larger power consumption characteristics and/or processing throughput/performance characteristics than at least one other processing core within or otherwise associated with the microprocessor.
In one embodiment, a process or task running or intended to run on a main higher power/performance processing core may be transferred to one of the other lower power/performance processing cores for various reasons, including that the process or task does not require the processing throughput of one of the main cores, the processor or the system in which it is used is placed into or otherwise requires a lower-power consumption condition (such as when running on battery power), and for increasing the processing throughput of the asymmetric microprocessor or system in which the higher power/performance cores and lower power/performance cores are used. For example, in one embodiment, the asymmetric processing elements may be used concurrently or otherwise in parallel to perform multiple tasks or processes, thereby improving the overall throughput of the processor and processing system.
In one embodiment, the at least one main processing core has a different ISA than at least one of the at least one processor cores having a lower power consumption characteristic and/or processing performance capability. In one embodiment, instruction translation logic in the form of hardware, software, or some combination thereof, may be used to translate instructions for the at least one main processor core into instructions for the at least one other lower-power/performance processing core. For example, in one embodiment, one or more of the main higher power/performance cores may have a complex instruction set computing (CISC) architecture, such as the "x86" computing architecture, and therefore performs instructions that are intended for x86 processor cores. One or more of the lower power/performance cores may have a different ISA than the main core, including a reduced instruction set computing (RISC) architecture, such as an Advanced RISC Machine (ARM) core. In other embodiments, the main processing element(s) and the lower power/performance processing element(s) may include other architectures, such as the MIPS ISA. In other embodiments the main processing element(s) may have the same ISA as the lower power/performance element(s) (e.g., x86).
In one embodiment, a number of different threads, processes, or tasks associated with one or more software programs may be intelligently moved among and run on a number of different processing elements, having a number of different processing capabilities (e.g., operating voltage, performance, power consumption, clock frequency, pipeline depth, transistor leakage, ISA), according to the dynamic performance and power consumption needs of the processor or computer system. For example, if one process, such as that associated with a spreadsheet application, does not require the full processing capabilities of a main, higher performance processor core, but may instead be run with acceptable performance on a lower-power core, the process may be transferred to or otherwise run on the lower power core and the main, higher power processor core may be placed in a low power state or may just remain idle. By running threads/processes/tasks on a processor core that better matches the performance needs of the thread/process/task, power consumption may be optimized, according to some embodiments.
FIG. 1 illustrates a microprocessor in which at least one embodiment of the invention may be used. In particular, FIG. 1 illustrates microprocessor 100 having one or more main processor cores 105 and 110, each being able to operate at a higher performance level (e.g., instruction throughput) or otherwise consume more power than one or more low-power cores 115, 120. In one embodiment, the low-power cores may be operated at the same or different operating voltage as the main cores. Furthermore, in some embodiments, the low-power cores may operate at a different clock speed or have fewer execution resources, such that they operate at a lower performance level than the main cores.
In other embodiments, the low-power cores may be of a different ISA than the main cores. For example, the low-power cores may have an ARM ISA and the main cores may have an x86 ISA, such that a program using x86 instructions may need to have these instructions translated into ARM instructions if a process/task/thread is transferred to one of the ARM cores. Because the process/thread/task being transferred may be one that does not require the performance of one of the main cores, a certain amount of latency associated with the instruction translation may be tolerated without noticeable or significant loss of performance.
Also illustrated in FIG. 1 is at least one other non-CPU functional unit 117, 118, and 119, which may perform other non-CPU related operations. In one embodiment, the functional units 117, 118, and 119 may include functions such as graphics processing, memory control and I/O or peripheral control, such as audio, video, disk control, digital signal processing, etc. The multi-core processor of FIG. 1 also illustrates a cache 123 that each core can access for data or instructions corresponding to any of the cores.
In one embodiment, logic 129 may be used to monitor performance or power of any of the cores illustrated in FIG. 1 in order to determine whether a process/task/thread should be migrated from one core to another to optimize power and performance. In one embodiment, logic 129 is associated with the main cores 105 and 110 to monitor an activity level of the cores to determine whether the processes/threads/tasks running on those cores could be run on a lower-power core 115, 120 at an acceptable performance level, thereby reducing the overall power consumption of the processor. In other embodiments, logic 129 may respond to a power state of the system, such as when the system goes from being plugged into an A/C outlet to battery power. In this case, the OS or some other power state monitoring logic may inform logic 129 of the new power conditions and the logic 129 may cause a currently running process (or processes yet to be scheduled to run) to either be transferred (or scheduled) to a lower-power core (in the case of going from A/C to battery, for example) or from a lower-power core to a main core (in the case of going from battery to A/C, for example). In some embodiments, an operating system (OS) may be responsible for monitoring or otherwise controlling the power states of the processor and/or system, such that the logic 129 simply reacts to the OS's commands to reduce power by migrating tasks/threads/processes to a core that better matches the performance needs of the tasks/threads/processes while accomplishing the power requirements dictated or indicated by the OS.
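The reaction to a change of power source described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PowerSource`, `Task`, and `on_power_source_change`, and the core labels "main" and "low_power", are hypothetical and do not come from the patent, and the sketch moves every task, whereas the text leaves open which tasks are moved.

```python
from dataclasses import dataclass
from enum import Enum


class PowerSource(Enum):
    AC = "ac"
    BATTERY = "battery"


@dataclass
class Task:
    name: str
    core: str  # "main" or "low_power" (hypothetical labels)


def on_power_source_change(tasks, new_source):
    """Move tasks to the core class that suits the new power source.

    Assumption: battery power favors the low-power cores and A/C power
    favors the main cores. Returns the names of the tasks that moved.
    """
    target = "low_power" if new_source is PowerSource.BATTERY else "main"
    moved = []
    for task in tasks:
        if task.core != target:
            task.core = target
            moved.append(task.name)
    return moved


tasks = [Task("spreadsheet", "main"), Task("audio", "low_power")]
moved_on_battery = on_power_source_change(tasks, PowerSource.BATTERY)
```

A real implementation would also weigh each task's performance needs before moving it, as the surrounding text notes.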
In some embodiments, the logic 129 may be hardware logic or software, which may or may not determine a core(s) on which a process/task/thread should be run independently of the OS. In one embodiment, for example, logic 129 is implemented in software to monitor the activity level of the cores, such as the main cores, to see if it drops below a threshold level, and in response thereto, causes one or more processes running on the monitored core(s) to be transferred to a lower-power core, such as cores 115 and 120. Conversely, logic 129 may monitor the activity level of a process running on a lower-power core 115 or 120 in order to determine whether it is rising above a threshold level, thereby indicating the process should be transferred to one of the main cores 105, 110. In other embodiments, logic 129 may independently monitor other performance or power indicators within the processor or system and cause processes/threads/tasks to be migrated to cores that more closely fit the performance needs of the tasks/processes/threads while meeting the power requirements of the processor of the system at a given time. In this way, the power and performance of processor 100 can be controlled without the programmer or OS being concerned or even aware of the underlying power state of the processor.
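The threshold test described above can be sketched as a small decision function. The following is a minimal illustration, not the patented logic: the function name `choose_core`, the core labels, and the numeric thresholds are all hypothetical, and the use of two separate thresholds (to avoid bouncing a task back and forth) is an assumption, since the text speaks only of "a threshold level".

```python
def choose_core(current_core, activity, low_threshold=0.3, high_threshold=0.7):
    """Return the core class a task should run on next.

    activity is a normalized activity level in [0, 1].
    A task on a main core moves down when activity drops below
    low_threshold; a task on a low-power core moves up when activity
    rises above high_threshold. Otherwise it stays where it is.
    """
    if current_core == "main" and activity < low_threshold:
        return "low_power"
    if current_core == "low_power" and activity > high_threshold:
        return "main"
    return current_core


core = "main"
trace = []
for activity in [0.9, 0.5, 0.2, 0.5, 0.8]:
    core = choose_core(core, activity)
    trace.append(core)
```

In this run the task stays on the main core until its activity falls to 0.2, remains on the low-power core at 0.5, and returns to the main core at 0.8.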
In other embodiments, each core in FIG. 1 may be concurrently running different tasks/threads/processes to get the most performance benefit possible from the processor. For example, in one embodiment, a process/thread/task that requires high performance may be run on a main core 105, 110 concurrently with a process/thread/task that doesn't require as high performance as what the main cores are able to deliver on lower-power cores 115, 120. In one embodiment, the programmer determines where to schedule these tasks/threads/processes, whereas in other embodiments, these threads/tasks/processes may be scheduled by an intelligent thread scheduler (not shown) that is aware of the performance capabilities of each core and can schedule the threads to the appropriate core accordingly. In other embodiments, the threads are simply scheduled without regard to the performance capabilities of the underlying cores and the threads/processes/tasks are migrated to a more appropriate core after the activity levels of the cores in response to the threads/processes/tasks are determined. In this manner, neither an OS nor a programmer need be concerned about where the threads/processes/tasks are scheduled, because the threads/processes/tasks are scheduled on the appropriate core(s) that best suits the performance requirement of each thread while maintaining the power requirements of the system or processor.
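A capability-aware scheduler of the kind mentioned above might, as one possibility, place each thread on the least capable core that still meets its performance requirement. The sketch below assumes that policy; the function `schedule`, the normalized capability numbers, and the thread names are hypothetical, and the sketch ignores how many threads a core already holds.

```python
def schedule(threads, cores):
    """Map each thread to the least capable core that meets its requirement.

    threads: {thread_name: required_performance}
    cores:   {core_name: performance_capability}
    Both use the same normalized scale (an assumption of this sketch).
    """
    by_capability = sorted(cores.items(), key=lambda item: item[1])
    placement = {}
    for thread, required in threads.items():
        for core, capability in by_capability:
            if capability >= required:
                placement[thread] = core
                break
        else:
            # No core is fast enough: fall back to the most capable one.
            placement[thread] = by_capability[-1][0]
    return placement


cores = {"main_105": 1.0, "low_power_115": 0.3}
threads = {"video_encode": 0.9, "spreadsheet": 0.2}
placement = schedule(threads, cores)
```

Preferring the smallest sufficient core is what lets the demanding thread and the light thread run concurrently on different core types.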
In one embodiment, logic 129 may be hardware, software, or some combination thereof. Furthermore, logic 129 may be distributed within one or more cores or exist outside the cores while maintaining electronic connection to the one or more cores to monitor activity/power and cause threads/tasks/processes to be transferred to appropriate cores.
FIG. 2, for example, illustrates a front-side-bus (FSB) computer system in which one embodiment of the invention may be used. Any processor 201, 205, 210, or 215 may include asymmetric cores (differing in performance, power, operating voltage, clock speed, or ISA), which may access information from any local level one (L1) cache memory 220, 225, 230, 235, 240, 245, 250, 255 within or otherwise associated with one of the processor cores 223, 227, 233, 237, 243, 247, 253, 257. Furthermore, any processor 201, 205, 210, or 215 may access information from any one of the shared level two (L2) caches 203, 207, 213, 217 or from system memory 260 via chipset 265. One or more of the processors in FIG. 2 may include or otherwise be associated with logic 219 to monitor and/or control the scheduling or migration of processes/threads/tasks between each of the asymmetric cores of each processor. In one embodiment, logic 219 may be used to schedule or migrate threads/tasks/processes to or from one asymmetric core in one processor to another asymmetric core in another processor.
In addition to the FSB computer system illustrated in FIG. 2, other system configurations may be used in conjunction with various embodiments of the invention, including point-to-point (P2P) interconnect systems and ring interconnect systems. The P2P system of FIG. 3, for example, may include several processors, of which only two, processors 370, 380, are shown by example. Processors 370, 380 may each include a local memory controller hub (MCH) 372, 382 to connect with memory 32, 34. Processors 370, 380 may exchange data via a P2P interface 350 using P2P interface circuits 378, 388. Processors 370, 380 may each exchange data with a chipset 390 via individual P2P interfaces 352, 354 using point-to-point interface circuits 376, 394, 386, 398. Chipset 390 may also exchange data with a high-performance graphics circuit 338 via a high-performance graphics interface 339. Embodiments of the invention may be located within any processor having any number of processing cores, or within each of the P2P bus agents of FIG. 3. In one embodiment, any processor core may include or otherwise be associated with a local cache memory (not shown). Furthermore, a shared cache (not shown) may be included in either processor or outside of both processors, yet connected with the processors via P2P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode. One or more of the processors or cores in FIG. 3 may include or otherwise be associated with logic to monitor and/or control the scheduling or migration of processes/threads/tasks between each of the asymmetric cores of each processor.
FIG. 4 is a graph illustrating the performance and power characteristics associated with a processor when scaling voltage and frequency including techniques according to at least one embodiment of the invention. Reducing voltage is an efficient way of reducing power since the frequency scales linearly with the voltage, while the power scales as the cube of the voltage (power = CV²F). Unfortunately, this efficient voltage scaling approach only works within a range of voltages; at some point, "Vmin", the transistor switching frequency does not scale linearly with voltage. At this point (401), to further reduce power, the frequency is reduced without dropping the voltage. In this range, the power scales linearly with the frequency, which is not nearly as attractive as when in the range where voltage scaling is possible. In one embodiment, power consumption of the system may be reduced below the minimum point 401 of a typical multi-core processor having symmetric processing elements by scheduling or migrating processes/threads/tasks from higher-performance/power cores to lower-performance/power cores if appropriate. In FIG. 4, the power/performance curve segment 405 indicates where the overall non-linear power/performance curve could be extended to enable more power savings, in one embodiment.
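The two scaling regimes can be made concrete with a small numeric model of power = CV²F. This is a sketch under assumptions that are not in the patent: capacitance is normalized to 1, voltage is taken to track frequency exactly above the knee, and the knee (the relative frequency reached at Vmin) is arbitrarily placed at 0.5 through the hypothetical parameter `f_at_vmin`.

```python
def relative_power(relative_frequency, f_at_vmin=0.5):
    """Relative dynamic power under P = C * V**2 * F, with C normalized to 1.

    Above the knee, voltage scales with frequency, so power falls as the
    cube of frequency. Below the knee, voltage is pinned at Vmin and power
    falls only linearly with frequency.
    """
    if not 0.0 <= relative_frequency <= 1.0:
        raise ValueError("relative_frequency must be in [0, 1]")
    if relative_frequency >= f_at_vmin:
        voltage = relative_frequency  # voltage tracks frequency
    else:
        voltage = f_at_vmin           # voltage cannot drop below Vmin
    return voltage ** 2 * relative_frequency


full_speed = relative_power(1.0)
at_knee = relative_power(0.5)
below_knee = relative_power(0.25)
```

With the assumed knee, halving frequency from 1.0 to 0.5 cuts modeled power to one eighth, while halving it again to 0.25 only halves the power, which is why moving work to a lower-power core becomes the more attractive option below point 401.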
FIG. 5 illustrates a flow diagram of operations that may be used in conjunction with at least one embodiment of the invention. At operation 501, it is determined that a process/thread/task running on a main processor core of a multi-core processor may be run on a lower power/performance core while maintaining an acceptable performance level. In one embodiment, the determination could be made by monitoring the activity level of the main core in response to running the thread/process/task and comparing it to a threshold value, corresponding to an acceptable performance metric of the lower power/performance core. In other embodiments, the determination could be made based on system power requirements, such as when the system is running on A/C power versus battery power. In yet other embodiments, a thread/process/task may be designated to require only a certain amount of processor performance, for example, by a programmer, the OS, etc. In other embodiments, other techniques may be used for determining whether a task/thread/process could be transferred to a lower power/performance core, thereby reducing power consumption.
At operation 505, an event (e.g., yield, exception, etc.) occurs in the main core to cause state from the core to be saved and copied to a lower power/performance core. In one embodiment, a handler program is invoked in response to the event to cause the main core state to be transferred from the main core to a lower power/performance core. At operation 510, the transferred thread/process/task is restarted or resumed on the lower power/performance core. At operation 515, the main core may be placed in a lower power state (e.g., paused, halted, etc.) until (520) either the transferred process/task/thread requires above a threshold level of performance, in which case the thread/process/task may be transferred back to the main core (525) in a similar manner as it was transferred to the lower power/performance core, or another task/process/thread is scheduled for execution on the main core.
In one embodiment, the thread/process/task transferred from the main core to the lower power/performance core is first translated from the ISA of the main core to the ISA of the lower power/performance core, if the two have different architectures. For example, in one embodiment, the main core is an x86 architecture core and the lower power/performance core is an ARM architecture core, in which case instructions of the transferred thread/process/task may be translated (for example, by a software binary translation shell) from x86 instructions to ARM instructions. Because the thread/process/task being transferred is by definition one that does not require as much performance as to require it to be run on the main core, a certain amount of latency may be tolerated in translating the process/task/thread from the x86 architecture to ARM architecture.
FIG. 6 illustrates a processing apparatus having a number of individual processing units between which processes/threads/tasks may be swapped under control of an activity level monitor, or thermal or power monitor, according to one embodiment. In the embodiment of FIG. 6, N processing units, processing unit 600-1, 600-2 through 600-N, are coupled to a monitor or detection (generically referred to as "monitor") logic 610. In one embodiment, the monitor 610 includes an activity, thermal and/or power monitoring unit that monitors the activity/performance, power consumption, and/or temperature of the processing units 600-1 through 600-N. In one embodiment, performance counters may be used to monitor the activity level of processing units 600-1 through 600-N. In one embodiment, the monitor 610 orchestrates process shifting between processing units in order to manage power consumption and/or particularly thermal concerns, while maintaining an acceptable level of performance.
In one embodiment, each processing unit provides a monitor value that typically reflects activity level, power consumption and/or temperature information to the monitor 610 via signals such as processor communication (PC) lines PC-1 through PC-N. The monitor value may take a variety of forms and may be a variety of different types of information. For example, the monitor value may simply be an analog or digital reading of the temperature of each processing unit. Alternatively, the monitor value may be a simple or complex activity factor that reflects the operational activity level of a particular processing unit. In some embodiments, power consumption information reflected by the monitor value may include a measured current level or other indication of how much power is being consumed by the processing unit. Additionally, some embodiments may convey power consumption information to the monitor 610 that is a composite of several of these or other types of known or otherwise available means of measuring or estimating power consumption. Accordingly, some power consumption metric which reflects one or more of these or other power consumption indicators may be derived. The transmitted monitor value may reflect a temperature or a power consumption metric, which itself may factor in a temperature. Serial, parallel, and/or various known or otherwise available protocols may be used to transmit this information to the power monitor.
In one embodiment, the monitor 610 receives the power consumption information from the various processing units and analyzes whether the power consumption or activity level of one processing unit is at a level to justify the overhead of re-allocating processes to different processing units. For example, the monitor may be triggered to rearrange processes when a particular processing unit falls below a threshold level of activity, or when power consumption is above an acceptable level. In one embodiment, the monitor 610 may develop a total power consumption metric to indicate the total power consumption, total activity level metric, or total thermal state of all processing units to effectuate the various power control strategies. In one embodiment, the monitor 610 may be a hardware component, a software component, routine, or module, or a combination of hardware and software that works either dependently or independently of the operating system.
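The monitor's trigger condition can be sketched as a function over the per-unit readings. This is a minimal illustration with assumed details: the function `should_rearrange`, the reading format (activity level and watts per unit), and the numeric limits `activity_floor` and `power_ceiling` are hypothetical and are not specified in the patent.

```python
def should_rearrange(readings, activity_floor=0.2, power_ceiling=10.0):
    """Decide whether the monitor should re-allocate processes.

    readings: {unit_name: (activity_level, power_in_watts)}
    Returns (decision, total_power, idle_units), where decision is True
    if any unit's activity is below activity_floor or the summed power
    of all units exceeds power_ceiling.
    """
    total_power = sum(watts for _, watts in readings.values())
    idle_units = [unit for unit, (activity, _) in readings.items()
                  if activity < activity_floor]
    decision = bool(idle_units) or total_power > power_ceiling
    return decision, total_power, idle_units


readings = {"600-1": (0.9, 6.0), "600-2": (0.1, 3.0)}
decision, total_power, idle_units = should_rearrange(readings)
```

Returning the total alongside the decision reflects the text's point that the monitor may develop a total power consumption metric across all processing units.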
In one embodiment, the monitor communicates to the processing units via thread or process swap control (S
