(19) United States
(12) Patent Application Publication
Mathieson et al.

(10) Pub. No.: US 2011/0213950 A1
(43) Pub. Date: Sep. 1, 2011

(54) SYSTEM AND METHOD FOR POWER OPTIMIZATION

(76) Inventors: John George Mathieson, San Jose, CA (US); Phil Carmack, Santa Clara, CA (US); Brian Smith, Mountain View, CA (US)

(21) Appl. No.: 12/787,359

(22) Filed: May 25, 2010

Related U.S. Application Data

(63) Continuation-in-part of application No. 12/137,053, filed on Jun. 11, 2008.

Publication Classification

(51) Int. Cl.
G06F 15/76 (2006.01)
G06F 9/02 (2006.01)
(52) U.S. Cl. 712/30; 712/E09.002
(57) ABSTRACT

A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
`
`
`Petitioner Samsung Ex-1005, 0001
`
`
`
[FIG. 1 (Sheet 1 of 7): block diagram of computer system 100, showing CPU 102 with fast cores 130 and slow cores 140; system memory 104 holding device driver 103; memory bridge 105; communications path 106; I/O bridge 107; input devices 108; display device 110; parallel processing subsystem 112; system disk 114; switch 116; network adapter 118; and add-in cards 120 and 121.]
`
[FIG. 2 (Sheet 2 of 7): CPU 102 including a first set of cores 210 (cores 212 and data 214), a second set of cores 220 (cores 222 and data 224), a controller 240, and a resource unit 230 shared by both sets of cores.]
`
[FIG. 3 (Sheet 3 of 7): CPU 102 with the first and second sets of cores 210 and 220, L2 cache controllers 310 and 320, controller 240, and shared resource 230 organized as an L2 cache with TAG and DATA RAM arrays (further reference numerals 332, 334, 336, 338, and 340).]
`
[FIG. 4A (Sheet 4 of 7): flow diagram of method 400A. START; step 402: process one or more operations on first set of cores; step 404: evaluate processing parameter associated with processing the one or more operations; decision: compare processing parameter with threshold value; if YES, step 408: process the one or more operations on second set of cores.]
`
[FIG. 4B (Sheet 5 of 7): flow diagram of method 400B. START; step 452: evaluate the workload associated with executing one or more processing operations, performance data and/or power data associated with a first set of cores, and performance data and/or power data associated with a second set of cores; step 454: optionally, evaluate operating conditions of the processing complex; step 456: execute the processing operations with the first set of cores; step 458: evaluate the workload, performance data, and power data again, as in step 452; step 460: optionally, evaluate operating conditions of the processing complex; step 462: execute the processing operations with the second set of cores; END.]
`
[FIG. 5 (Sheet 6 of 7): flow diagram of method 500. START; step 502: execute processing operations with one or more cores having a first type and access to a resource; step 504: determine that at least a workload associated with the processing operations has changed; step 506: execute the processing operations with one or more cores having a second type and access to the resource; END.]
`
[FIG. 6 (Sheet 7 of 7): graph 600 of power versus frequency (MHz) for different types of processing cores; reference numerals 602, 604, and 610.]
`
`
SYSTEM AND METHOD FOR POWER OPTIMIZATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 12/137,053, filed on Jun. 11, 2008 (Attorney Docket No. NVDA/P003709), which is hereby incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`[0002]
`1. Field of the Invention
`[0003] The present invention relates generally to computer
`hardware and, more specifically, to a system and method for
`power optimization.
`[0004] 2. Description of the Related Art
`[0005] Low power design has become increasingly impor(cid:173)
`tant in recent years. With the proliferation of battery-powered
`mobile devices, efficient power management is quite impor(cid:173)
`tant to the success of a product or system.
`[0006] A number of techniques have been developed to
`increase performance and/or reduce power consumption in
`conventional integrated circuits (I Cs). For example, sleep and
`standby modes, multi-threading techniques, multi-core tech(cid:173)
`niques, and other techniques are currently implemented to
`increase performance and/or decrease power consumption.
`However, these techniques do not reduce power consumption
`enough to meet the requirements of certain emerging tech(cid:173)
`nologies and products.
`[0007] As the foregoing illustrates, what is needed in the art
`is an improved technique for power optimization that over(cid:173)
`comes
`the drawbacks associated with conventional
`approaches.
`
`SUMMARY
`
[0008] One embodiment of the invention sets forth a computer-implemented method for processing one or more operations within a processing complex. The method includes causing the one or more operations to be processed by a first set of cores within the processing complex; evaluating at least a workload associated with processing the one or more operations to determine that the one or more operations should be processed by a second set of cores included within the processing complex; and causing the one or more operations to be processed by the second set of cores.
[0009] Another embodiment of the invention provides a computer-implemented method for processing one or more operations within a processing complex. The method includes causing the one or more operations to be processed by a first set of cores within the processing complex; evaluating at least a workload associated with processing the one or more operations, performance data and power data associated with the first set of cores, and performance data and power data associated with a second set of cores included within the processing complex to determine whether the one or more operations should continue to be processed by the first set of cores or should be processed by the second set of cores; and causing the one or more operations to continue to be processed by the first set of cores or to be processed by the second set of cores.
[0010] Yet another embodiment of the invention provides a computer-implemented method for processing one or more operations within a processing complex. The method includes causing the one or more operations to be processed by a first set of cores included within the processing complex, where the first set of cores is configured to utilize a resource unit when processing the one or more operations; evaluating at least a workload associated with processing the one or more operations to determine that the one or more operations should be processed by a second set of cores included within the processing complex; and causing the one or more operations to be processed by the second set of cores included within the processing complex, where the second set of cores is configured to utilize the resource unit when processing the one or more operations.
[0011] Advantageously, embodiments of the invention provide techniques to decrease the total power consumption of a processor.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0012] So that the manner in which the above recited features of the invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
[0013] FIG. 1 is a block diagram illustrating a computer system configured to implement one or more aspects of the invention.
[0014] FIG. 2 is a conceptual diagram illustrating a processing complex that includes heterogeneous cores, according to one embodiment of the invention.
[0015] FIG. 3 is a conceptual diagram illustrating a processing complex that includes a shared resource, according to one embodiment of the invention.
[0016] FIGS. 4A-4B are flow diagrams of method steps for switching between modes of operation of a processing complex, according to various embodiments of the invention.
[0017] FIG. 5 is a flow diagram of method steps for switching between modes of operation of a processing complex having a shared resource, according to one embodiment of the invention.
[0018] FIG. 6 is a conceptual diagram illustrating power consumption as a function of operating frequency for different types of processing cores, according to one embodiment of the invention.
`
`DETAILED DESCRIPTION
`
`[0019]
`In the following description, numerous specific
`details are set forth to provide a more thorough understanding
`of the invention. However, it will be apparent to one of skill in
`the art that the invention may be practiced without one or
`more of these specific details. In other instances, well-known
`features have not been described in order to avoid obscuring
`embodiments of the invention.
`
`System Overview
`
[0020] FIG. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 communicating via a bus path through a memory bridge 105. The CPU 102 includes one or more "fast" cores 130 and one or more "shadow" or slow cores 140, as described in greater detail herein. In some embodiments, the cores 130 are associated with higher performance and higher leakage power than the cores 140. Memory bridge 105 may be integrated into CPU 102 as shown in FIG. 1. Alternatively, memory bridge 105 may be a conventional device, e.g., a Northbridge chip, that is coupled to CPU 102 via a bus. Memory bridge 105 is also coupled to an I/O (input/output) bridge 107 via communication path 106 (e.g., a HyperTransport link).
[0021] I/O bridge 107, which may be, e.g., a Southbridge chip, receives user input from one or more user input devices 108 (e.g., keyboard, mouse) and forwards the input to CPU 102 via path 106 and memory bridge 105. A parallel processing subsystem 112 is coupled to memory bridge 105 via a bus or other communication path 113 (e.g., a PCI Express, Accelerated Graphics Port, or HyperTransport link); in one embodiment, parallel processing subsystem 112 is a graphics subsystem that delivers pixels to a display device 110 (e.g., a conventional CRT or LCD based monitor). A system disk 114 is also connected to I/O bridge 107. A switch 116 provides connections between I/O bridge 107 and other components such as a network adapter 118 and various add-in cards 120 and 121. Other components (not explicitly shown), including USB or other port connections, CD drives, DVD drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in FIG. 1 may be implemented using any suitable protocols, such as PCI (Peripheral Component Interconnect), PCI Express (PCI-E), AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol(s), and connections between different devices may use different protocols as is known in the art.
[0022] In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU). In another embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for general purpose processing, while preserving the underlying computational architecture. In yet another embodiment, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107 to form a system on chip (SoC).
[0023] It will be appreciated that the system shown in FIG. 1 is illustrative and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, may be modified as desired. For instance, in some embodiments, system memory 104 is directly connected to CPU 102 rather than connected through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, parallel processing subsystem 112 is connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other embodiments, one or more of CPU 102, I/O bridge 107, parallel processing subsystem 112, and memory bridge 105 may be integrated into one or more chips. The particular components shown herein are optional; for instance, any number of add-in cards or peripheral devices might be supported. In some embodiments, switch 116 is eliminated, and network adapter 118 and add-in cards 120, 121 connect directly to I/O bridge 107.
`Power Optimization Implementation
[0024] FIG. 2 is a conceptual diagram illustrating a processing complex that includes heterogeneous cores, according to one embodiment of the invention. As shown, the processing complex comprises the CPU 102 shown in FIG. 1. In other embodiments, the processing complex may be any other type of processing unit, such as a graphics processing unit (GPU).
[0025] The CPU 102 includes a first set of cores 210, a second set of cores 220, a shared resource 230, and a controller 240. Other components included within the CPU 102 are omitted to avoid obscuring embodiments of the invention. In some embodiments, the first set of cores 210 includes one or more cores 212 and data 214, and the second set of cores 220 includes one or more cores 222 and data 224. In some embodiments, the first set of cores 210 and the second set of cores 220 are included on the same chip. In other embodiments, the first set of cores 210 and the second set of cores 220 are included on separate chips that comprise the CPU 102.
[0026] As shown, the CPU 102, also referred to herein as the "processing complex," includes the first set of cores 210 and the second set of cores 220. In one embodiment, the cores included in the first set of cores 210 may implement substantially the same functionality as the cores included in the second set of cores 220. In alternative embodiments, each given set of cores 210, 220 may implement a particular functional block of the CPU 102, such as an arithmetic and logic unit, a fetch unit, a graphics pipeline, a rasterizer, or the like. In still further embodiments, the cores included in the second set of cores 220 may provide a subset of the functionality of the cores included in the first set of cores 210. Various designs are within the scope of embodiments of the invention and may be based on trade-offs in usage for providing the shared functionality.
[0027] According to various embodiments, the power consumption associated with the CPU 102 is derived from "dynamic" switching power and "static" leakage power. The switching power loss is based on the charging and discharging of each transistor and its associated capacitance, and increases with operating frequency and number of gates. The leakage power loss is based on gate and channel leakage in each transistor, and increases as process geometry decreases.
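The two power components can be illustrated with a minimal sketch, assuming the textbook CMOS estimate P_switch = α · C · V² · f, in which the total switched capacitance C grows with the number of gates, and treating leakage as a frequency-independent constant. The activity factor, per-gate capacitance, supply voltage, and leakage values below are illustrative assumptions, not data from this application.

```python
def switching_power_w(gates: int, cap_per_gate_f: float, activity: float,
                      vdd_v: float, freq_hz: float) -> float:
    """Dynamic power: alpha * C_total * Vdd^2 * f, with C_total scaling with gate count."""
    c_total_f = gates * cap_per_gate_f
    return activity * c_total_f * vdd_v ** 2 * freq_hz


def total_power_w(gates: int, cap_per_gate_f: float, activity: float,
                  vdd_v: float, freq_hz: float, leakage_w: float) -> float:
    """Total power = dynamic switching power + static leakage power.

    Leakage is modeled as a constant here (an assumption); in silicon it also
    depends on voltage, temperature, and process geometry.
    """
    return switching_power_w(gates, cap_per_gate_f, activity, vdd_v, freq_hz) + leakage_w


def leakage_share(freq_hz: float, leakage_w: float = 0.05) -> float:
    """Fraction of total power due to leakage for an illustrative 1M-gate core."""
    total = total_power_w(1_000_000, 1e-15, 0.1, 1.0, freq_hz, leakage_w)
    return leakage_w / total


if __name__ == "__main__":
    for f in (1e9, 1e8):
        print(f"{f / 1e6:.0f} MHz: leakage share = {leakage_share(f):.2f}")
```

With these numbers, leakage is about a third of the total at 1 GHz but more than 80% at 100 MHz: lowering the clock shrinks only the switching term.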
[0028] According to various embodiments, the cores 212 included in the first set of cores 210 comprise "fast" cores and the cores 222 included in the second set of cores 220 comprise "slow" cores. For example, the cores 212 may be manufactured using faster transistors that have significant static leakage. In some embodiments, when the computing needs and/or workload of the first set of cores 210 are lowered, the clock speed is lowered to reduce power. The static leakage is not a significant issue at the high clock speeds required for peak performance. However, at slower clock speeds, the static leakage of the fast transistors can dominate the overall power consumption, because the switching power falls with the clock speed while the leakage power does not. According to various embodiments, the first set of cores includes N cores and the second set of cores includes M cores. In one embodiment, N is not equal to M. In other embodiments, N is equal to M. In some embodiments, the first set of cores 210 may include multiple cores, e.g., four cores, and the second set of cores 220 may include a single core 222. In other embodiments, the first set of cores 210 may include a single core and/or the second set of cores 220 may include multiple cores.
[0029] Thus, according to various embodiments, the second set of cores 220, also referred to as "shadow" cores, is also included within the CPU 102. The second set of cores 220 includes one or more "slow" cores 222 constructed from slower transistors that are not capable of operating as quickly as the transistors included in the cores 212 of the first set of cores 210. In some embodiments, the second set of cores 220 has a much lower leakage power loss than the first set of cores 210, but is not capable of achieving the same performance levels as the first set of cores 210.
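To see why the slow, low-leakage cores can win at low clock speeds, the following sketch models each set of cores with a straight-line power curve (constant leakage plus switching power proportional to frequency at a fixed voltage) and computes the frequency at which the two lines cross, in the spirit of the power-versus-frequency comparison of FIG. 6. The `CoreSetModel` class, the linear model, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSetModel:
    """Hypothetical linear power model for one set of cores."""
    name: str
    leakage_w: float      # static leakage, assumed independent of frequency
    dyn_w_per_hz: float   # switching power per Hz at a fixed supply voltage
    max_freq_hz: float    # highest frequency the set can sustain

    def power_w(self, freq_hz: float) -> float:
        if freq_hz > self.max_freq_hz:
            raise ValueError(f"{self.name} cores cannot run at {freq_hz:.0f} Hz")
        return self.leakage_w + self.dyn_w_per_hz * freq_hz


def crossover_hz(fast: CoreSetModel, slow: CoreSetModel) -> float:
    """Frequency where both power lines meet (assumes slow.leakage_w < fast.leakage_w).

    Below this frequency the low-leakage set draws less power; if its switching
    slope is not steeper, it is cheaper at every frequency in this model.
    """
    slope_diff = slow.dyn_w_per_hz - fast.dyn_w_per_hz
    if slope_diff <= 0:
        return float("inf")
    return (fast.leakage_w - slow.leakage_w) / slope_diff


def slow_is_cheaper(fast: CoreSetModel, slow: CoreSetModel, freq_hz: float) -> bool:
    """True if the slow set can reach freq_hz and uses less power there."""
    return freq_hz <= slow.max_freq_hz and slow.power_w(freq_hz) < fast.power_w(freq_hz)


FAST = CoreSetModel("fast", leakage_w=0.5, dyn_w_per_hz=1.0e-9, max_freq_hz=2.0e9)
SLOW = CoreSetModel("slow", leakage_w=0.05, dyn_w_per_hz=2.0e-9, max_freq_hz=6.0e8)

if __name__ == "__main__":
    print(f"crossover: {crossover_hz(FAST, SLOW) / 1e6:.0f} MHz")  # 450 MHz
```

With these values the slow set is cheaper below roughly 450 MHz; above that, and in any case above its 600 MHz limit, the fast set is the better or only choice.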
[0030] In some embodiments, a controller 240 included within the CPU 102 is configured to evaluate at least a workload associated with one or more operations to be executed by the CPU 102. In some embodiments, the controller is implemented in software and is executed by the CPU 102. Based on the evaluated workload, the controller 240 is able to configure the CPU 102 to operate in a first mode of operation or a second mode of operation. In the first mode of operation, the first set of cores 210 is enabled and operable and the second set of cores 220 is disabled. In the second mode of operation, the second set of cores 220 is enabled and operable and the first set of cores 210 is disabled. In addition, in various embodiments, the controller 240 is able to increase and/or decrease the operating frequency of the first set of cores 210 and/or the second set of cores 220 when operating the CPU 102 in each of the first and second modes. In one embodiment, the first set of cores 210 is disabled and powered off when the one or more operations are processed by the second set of cores 220. In alternative embodiments, the first set of cores 210 is clock gated and/or power gated when the one or more operations are processed by the second set of cores 220.
[0031] For example, if the CPU 102 is operating in the first mode at high frequency, and the controller 240 detects that the workload has decreased to a point where operating in the first mode at lower frequency would save power, then the controller 240 may decrease the operating frequency of the first set of cores 210. If the controller 240 later detects that the workload has further decreased to a point where the CPU 102 would use less power to operate in the second mode, then the controller 240 causes the CPU 102 to operate in the second mode. In some embodiments, the CPU 102 may operate in both the first mode and the second mode simultaneously, although doing so may result in lower overall power efficiency. For example, the CPU 102 may operate in both the first mode and the second mode simultaneously during a transition period when transitioning from the first mode to the second mode, or vice versa.
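A minimal sketch of the policy in the example above, assuming the controller has a power estimate for each set of cores as a function of frequency (the linear `fast_power` and `slow_power` models are hypothetical) and that the workload can be expressed as a required frequency. The transition period, during which both modes may be active, is not modeled.

```python
from enum import Enum
from typing import Callable, Tuple

PowerModel = Callable[[float], float]  # frequency (Hz) -> estimated watts


class Mode(Enum):
    FIRST = "first set of cores enabled"
    SECOND = "second set of cores enabled"


def select_operating_point(required_hz: float,
                           fast_power: PowerModel, fast_max_hz: float,
                           slow_power: PowerModel, slow_max_hz: float) -> Tuple[Mode, float]:
    """Pick the mode that meets required_hz at the lowest estimated power.

    The chosen set runs at exactly the required frequency, i.e. the clock is
    lowered as far as the workload allows before a mode switch is considered.
    """
    candidates = []
    if required_hz <= fast_max_hz:
        candidates.append((fast_power(required_hz), Mode.FIRST))
    if required_hz <= slow_max_hz:
        candidates.append((slow_power(required_hz), Mode.SECOND))
    if not candidates:
        raise ValueError("workload exceeds the capability of both sets of cores")
    _, mode = min(candidates, key=lambda c: c[0])  # ties keep the first mode
    return mode, required_hz


def fast_power(f: float) -> float:
    return 0.5 + 1.0e-9 * f   # high leakage, cheaper per Hz (illustrative)


def slow_power(f: float) -> float:
    return 0.05 + 2.0e-9 * f  # low leakage, costlier per Hz (illustrative)


if __name__ == "__main__":
    # A workload that falls over time, as in paragraph [0031].
    for demand_hz in (1.5e9, 8.0e8, 2.0e8):
        mode, freq = select_operating_point(demand_hz, fast_power, 2.0e9, slow_power, 6.0e8)
        print(f"{demand_hz / 1e6:.0f} MHz -> {mode.name} at {freq / 1e6:.0f} MHz")
```

The falling workload first lowers the fast cores' clock (1500 MHz, then 800 MHz, both in the first mode) and only switches to the second mode once the slow cores become cheaper (200 MHz).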
`[0032]
`In one embodiment, evaluating the workload
`includes determining whether a processing parameter asso(cid:173)
`ciated with processing the one or more operations is greater
`than or less than a threshold value. For example, the process(cid:173)
`ing parameter may be a processing frequency, and the evalu(cid:173)
`ating at least the workload comprises determining that the one
`or more operations should be processed at a processing fre(cid:173)
`quency that is greater than or less than a threshold frequency.
`In another example, the processing parameter may be instruc(cid:173)
`tion throughput, and the evaluating at least the workload
`comprises determining that the instruction throughput when
`processing the workload should be greater than or less than a
`threshold throughput.
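The threshold test (steps 404 to 408 of FIG. 4A) can be sketched as a single decision function. This assumes the parameter is either a required frequency or an instruction throughput, with values below the threshold favoring the second (slow) set of cores. The optional hysteresis band is an addition not described in the text, included only to show how toggling on small fluctuations could be avoided.

```python
def next_core_set(current: str, parameter: float, threshold: float,
                  hysteresis: float = 0.0) -> str:
    """Return "first" or "second" after comparing a processing parameter to a threshold.

    `parameter` may be a required processing frequency or an instruction
    throughput. Below the threshold the slow (second) set suffices; above it
    the fast (first) set is needed. `hysteresis` is an assumed extension.
    """
    if current not in ("first", "second"):
        raise ValueError(f"unknown core set: {current!r}")
    if current == "first" and parameter < threshold - hysteresis:
        return "second"
    if current == "second" and parameter > threshold + hysteresis:
        return "first"
    return current


if __name__ == "__main__":
    # Required frequency of 200 MHz against a hypothetical 450 MHz threshold.
    print(next_core_set("first", 200e6, 450e6))  # second
```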
`[0033]
`In some embodiments, determining that processing
`operations should switch from being executed by the first set
`of cores 210 to being executed by the second set of cores 220,
`and vice versa, is based on evaluating at least the workload, as
`described above, and performance data and/or power data
`associated with the first and/or second sets of cores. As also
`shown in FIG. 2, each of the first and second sets of cores 210
`and 220 includes data 214 and 224, respectively.
`
`[0034] According to various embodiments, the data 214,
`224 includes performance data and/or power data. The per(cid:173)
`formance data associated with the first set of cores and the
`second set of cores includes at least one of an operating
`frequency range of the first set of cores and an operating
`frequency range of the second set of cores, the number of
`cores in the first set of cores and the number of cores in the
`second set of cores, and an amount of parallelism between the
`cores in the first set of cores and an amount of parallelism
`between the cores in the second set of cores. The power data
`associated with the first set of cores and the second set of
`cores includes at least one of a maximum voltage at which the
`cores in the first set of cores can operate and a maximum
`voltage at which the cores in the second set of cores can
`operate, a maximum current that the cores in the first set of
`cores can tolerate and a maximum current that the cores in the
`second set of cores can tolerate, and an amount of power
`dissipation as a function of at least an operating frequency for
`the cores in the first set of cores and an amount of power
`dissipation as a function of at least an operating frequency for
`the cores in the second set of cores.
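One way to hold the quantities listed above is a small record per set of cores. The sketch below assumes that power dissipation versus frequency is stored as sampled points and interpolated linearly; the field names, the encoding of "parallelism" as a fraction, and the interpolation scheme are all assumptions for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoreSetData:
    """Hypothetical container for the performance and power data of one set of cores."""
    # Performance data
    min_freq_mhz: float
    max_freq_mhz: float
    num_cores: int
    parallelism: float          # assumed encoding: fraction of work that runs in parallel
    # Power data
    max_voltage_v: float
    max_current_a: float
    power_curve: Tuple[Tuple[float, float], ...]  # (frequency MHz, dissipation W), ascending

    def __post_init__(self) -> None:
        freqs = [f for f, _ in self.power_curve]
        if (not freqs or freqs != sorted(freqs)
                or freqs[0] > self.min_freq_mhz or freqs[-1] < self.max_freq_mhz):
            raise ValueError("power_curve must be sorted and span the operating range")

    def power_at(self, freq_mhz: float) -> float:
        """Linearly interpolate power dissipation at freq_mhz from the sampled curve."""
        if not self.min_freq_mhz <= freq_mhz <= self.max_freq_mhz:
            raise ValueError("frequency outside the operating range")
        freqs = [f for f, _ in self.power_curve]
        i = bisect_left(freqs, freq_mhz)
        if freqs[i] == freq_mhz:
            return self.power_curve[i][1]
        (f0, p0), (f1, p1) = self.power_curve[i - 1], self.power_curve[i]
        return p0 + (p1 - p0) * (freq_mhz - f0) / (f1 - f0)
```

For example, a four-core set with samples (100 MHz, 0.6 W), (500 MHz, 1.0 W), and (1500 MHz, 3.0 W) yields an estimated 0.8 W at 300 MHz.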
[0035] According to various embodiments, the controller 240 is configured to evaluate the data 214, 224 and determine which set of cores should execute the processing operations based, at least in part, on the data 214, 224. In one embodiment, the data 214, 224 is included within fuses associated with the processing complex and the controller 240 is configured to read the data 214, 224 from the fuses. In alternative embodiments, the data 214, 224 is determined dynamically by the controller 240 during operation of the processing complex.
[0036] In one embodiment, the particular silicon composition, process technology, and/or logical implementations used to manufacture each of the first and second sets of cores 210, 220 are known at the time of manufacture. In some embodiments, the silicon composition and/or process technology associated with the first set of cores 210 is different from the silicon composition and/or process technology associated with the second set of cores 220. However, no two manufactured integrated circuits are identical: minor variations exist between ICs, even ICs on the same wafer. Therefore, the characteristics associated with an IC may vary from chip to chip. According to various embodiments of the invention, at the time of manufacturing, each chip may be measured with a testing device to determine the performance data and/or the power data associated with the first set of cores 210 and the performance data and/or the power data associated with the second set of cores 220. The dynamic power, in some embodiments, is approximately equal between chips and can be estimated as a function of the number of gates and operating frequency. In other embodiments, the silicon composition and/or process technology could be mixed between chips and/or cores, thereby providing different dynamic power between chips and/or cores.
[0037] Based on the measured and/or estimated characteristics, one or more fuses may be set on the CPU 102 to characterize the performance data and/or the power data of the CPU 102 based on various characteristics, such as operating frequency, voltage, temperature, throughput, and the like. In some embodiments, the one or more fuses may comprise the data 214 and 224 shown in FIG. 2. Accordingly, the controller 240 may be configured to read the data 214, 224 and determine which mode of operation is optimal based on the particular operating characteristics at a particular time.
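The application does not specify how characterization values are laid out in the fuses, so the following sketch uses an invented 24-bit layout purely to show the idea of packing per-chip test measurements into fuse bits and decoding them in the controller. How the raw fuse word is read from hardware is platform specific and is not shown; the sketch starts from the raw integer.

```python
from typing import NamedTuple


class FuseData(NamedTuple):
    max_freq_mhz: int
    leakage_mw: int
    max_voltage_v: float


# Hypothetical 24-bit layout (not from the application):
#   bits  0-7 : maximum frequency, in 10 MHz steps
#   bits  8-15: measured leakage at a reference temperature, in 5 mW steps
#   bits 16-23: maximum supply voltage, in 10 mV steps
def decode_fuses(word: int) -> FuseData:
    """Unpack per-chip characterization values burned into fuses at test time."""
    if not 0 <= word < 1 << 24:
        raise ValueError("fuse word must fit in 24 bits")
    freq_code = word & 0xFF
    leak_code = (word >> 8) & 0xFF
    volt_code = (word >> 16) & 0xFF
    return FuseData(max_freq_mhz=freq_code * 10,
                    leakage_mw=leak_code * 5,
                    max_voltage_v=volt_code / 100)


def encode_fuses(max_freq_mhz: int, leakage_mw: int, max_voltage_v: float) -> int:
    """Inverse of decode_fuses, as a tester might compute the word to burn."""
    codes = (max_freq_mhz // 10, leakage_mw // 5, round(max_voltage_v * 100))
    if any(not 0 <= c <= 0xFF for c in codes):
        raise ValueError("value out of range for an 8-bit fuse field")
    return codes[0] | (codes[1] << 8) | (codes[2] << 16)
```

For instance, a chip measured at 1500 MHz maximum, 500 mW leakage, and 1.20 V maximum would be burned as 0x786496 under this layout.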
`
`
[0038] In some embodiments, the data 214, 224 changes dynamically during operation of the first and/or second sets of cores 210, 220. For example, temperature changes associated with the CPU 102 may cause one or more values in the data 214, 224 to change. Accordingly, the controller 240 may determine that a certain mode of operation is more power efficient, based on the dynamic operating temperature information. In some embodiments, the controller 240 may determine the current operating characteristics and perform a table look-up to determine which mode of operation is most power efficient. The table may be organized based on ranges of the different operating characteristics of the CPU 102. In alternative embodiments, the controller 240 may determine which mode of operation is more power efficient by evaluating a function whose inputs are associated with the different operating characteristics. For example, the function may be a discrete or continuous function.
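A minimal sketch of the table look-up, assuming the table is indexed by die temperature and by the frequency the workload requires. The bin edges and table entries are invented for illustration; they encode the idea that leakage grows with temperature, so the low-leakage second set of cores covers a wider frequency range on a hot die.

```python
from bisect import bisect_right

# Hypothetical bin edges; a value equal to an edge falls into the higher bin.
TEMP_EDGES_C = (45.0, 85.0)               # rows: <45, 45-85, >=85 degrees C
FREQ_EDGES_MHZ = (200.0, 400.0, 600.0)    # columns: <200, 200-400, 400-600, >=600 MHz

MODE_TABLE = (
    ("second", "first", "first", "first"),     # cool die: leakage small, fast cores win early
    ("second", "second", "first", "first"),    # warm die
    ("second", "second", "second", "first"),   # hot die: leakage dominates, stay on slow cores
)


def lookup_mode(temp_c: float, required_mhz: float) -> str:
    """Return the more power-efficient set of cores for the current conditions."""
    row = bisect_right(TEMP_EDGES_C, temp_c)
    col = bisect_right(FREQ_EDGES_MHZ, required_mhz)
    return MODE_TABLE[row][col]


if __name__ == "__main__":
    print(lookup_mode(30.0, 300.0))  # first
    print(lookup_mode(70.0, 300.0))  # second
```

A continuous-function alternative would replace the table with, for example, a comparison of modeled power curves whose leakage term depends on temperature.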
`[0039]
`In some embodiments, determining which set of
`cores should execute the processing operations is based on
`evaluating one or more operating conditions of the processing
`complex. The one or more operating conditions may include
`at least one of a supply voltage, a temperature of each chip
`included in the processing complex, and an average leakage
`current over a period of time of each chip included in the
`processing complex. The one or more operating conditions
`may be determined dynamically during operation of the pro(cid:173)
`cessing complex.
`[0040]
`In some embodiments, determining whether the one
`or more operations should continue to be processed by the
`first set of cores or should be processed by the second set of
`cores is based on at least one of the thermal constraint, the
`performance requirement, the latency requirement, and the
`current requirement.
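The constraints named above can be combined with power estimates in a filter-then-minimize step. This sketch assumes the operating conditions (supply voltage, temperature, average leakage) have already been turned into per-set estimates of power, temperature, and current; that estimation is not shown. The `Candidate` and `Constraints` types and the modeling of latency as the cost of switching to a set are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Estimated behavior of one set of cores under the current operating conditions."""
    name: str
    est_power_w: float
    est_temp_c: float
    est_current_a: float
    max_freq_mhz: float
    switch_latency_us: float   # 0 for the set that is already running


@dataclass(frozen=True)
class Constraints:
    temp_limit_c: float          # thermal constraint
    required_freq_mhz: float     # performance requirement
    latency_budget_us: float     # latency requirement
    current_limit_a: float       # current requirement


def choose_core_set(candidates: Sequence[Candidate], c: Constraints) -> Optional[Candidate]:
    """Keep only candidates that satisfy every constraint, then take the lowest power."""
    feasible = [
        x for x in candidates
        if x.est_temp_c <= c.temp_limit_c
        and x.max_freq_mhz >= c.required_freq_mhz
        and x.switch_latency_us <= c.latency_budget_us
        and x.est_current_a <= c.current_limit_a
    ]
    return min(feasible, key=lambda x: x.est_power_w, default=None)
```

Returning `None` when no set satisfies every constraint leaves the fallback policy (for example, relaxing the performance requirement) to the caller.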
`[0041]
`In some embodiments, the first set of cores 210 and
`the second set of cores 220 are configured to use a shared
`resource 230 when executing processing operations. The
`shared resource 230, may be any resource including a fixed
`function processing block, a memory unit, such as a cache
`unit, or any other type of computing resource.
`[0042] According to various embodiments, the process of
`analyzing the parameters and choosing the most appropriate
`set of cores to use is described in greater detail in FIGS. 4-6.
`[0043] When execution of the processing operations
`switches from the first set of cores to the second set of cores,
`in some embodiments, the controller 240 is configured to
`transfer the processor state from the first set of cores to the
`second set of cores. In one embodiment, the controller saves
`the processor state to the shared resource 230, triggers a
`hardware mechanism that stops and powers off the first set of
`cores 210, and boots the second set of cores 220. The second
`set of cores 220 then restores the processor state from the
`shared resource 230 and continues operation at the lower
`speed associated with the second set of cores 220. In other
`embodiments, the processing state may be stored in any
`memory unit when transferring execution of the operations
`between the two sets of cores. In still further embodiments,
`the processing state may be directly transferred to the other
`set of cores via a dedicated bus, where the processing state is
`not stored in any memory unit with switching between the
`two sets of cores. The transition from the first mode to the
`second mode, and vice versa, can be done transparently to
`high level software, such as the operating system.
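The save, power-off, boot, and restore sequence of the embodiment above can be modeled as a toy simulation. The `CoreSet` class, the register names, and the use of a dictionary to stand in for the shared resource 230 are hypothetical; real hardware would also drain pipelines and handle caches and interrupts, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CoreSet:
    """Toy model of a set of cores: power state plus architectural registers."""
    name: str
    powered: bool = False
    registers: Dict[str, int] = field(default_factory=dict)


def switch_core_sets(active: CoreSet, standby: CoreSet,
                     shared: Dict[str, Dict[str, int]]) -> CoreSet:
    """Move execution from `active` to `standby` through a shared storage area."""
    if not active.powered or standby.powered:
        raise RuntimeError("expected exactly one powered set of cores")
    # 1. Save the processor state to the shared resource.
    shared["saved_state"] = dict(active.registers)
    # 2. Stop and power off the source set; its register contents are lost.
    active.registers.clear()
    active.powered = False
    # 3. Boot the target set.
    standby.powered = True
    # 4. Restore the saved state so execution continues where the source stopped.
    standby.registers = shared.pop("saved_state")
    return standby


if __name__ == "__main__":
    fast = CoreSet("first", powered=True, registers={"pc": 0x1000, "sp": 0x8000})
    slow = CoreSet("second")
    now_running = switch_core_sets(fast, slow, shared={})
    print(now_running.name, hex(now_running.registers["pc"]))  # second 0x1000
```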
[0044] According to some embodiments, the shared resource 230 is an L2 cache RAM, and the first and second sets of cores 210, 220 share the same L2 cache RAM. In one embodiment, each of the first set of cores 210 and the second set of cores 220 includes an L2 cache controller. The L2 cache may include a single set of tag and data RAM. The control signals and buses between the first and second sets of cores 210, 220 and the L2 cache are multiplexed so that either the first set of cores 210 or the second set of cores 220 can control the L2 cache. In some embodiments, only one of the first and second sets of cores 210, 220 can control the L2 cache at a particular time. Also, in some embodiments, the read data bus from the RAM goes to both the first and second sets of cores 210, 220 and is used by whichever set of cores is active at the time.
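The multiplexed arrangement can be sketched as a toy cache with one tag array, one data array, and a select that decides which set of cores drives the control signals. The point the sketch illustrates is that a mode switch changes only the select, not the cache contents, so no copy or flush is needed. The `SharedL2` class, its direct-mapped one-word-per-line organization, and the string core-set names are simplifying assumptions; the broadcast read bus is reduced to returning the value to the active requester.

```python
from typing import List, Optional, Tuple


class SharedL2:
    """Toy direct-mapped L2 with a single tag RAM and data RAM shared by two core sets."""

    def __init__(self, num_lines: int) -> None:
        self.tags: List[Optional[int]] = [None] * num_lines
        self.data: List[Optional[str]] = [None] * num_lines
        self.owner: Optional[str] = None  # multiplexer select: which set drives control signals

    def select(self, core_set: str) -> None:
        """Switch the multiplexers so that core_set controls the cache."""
        self.owner = core_set

    def _index_tag(self, requester: str, addr: int) -> Tuple[int, int]:
        if requester != self.owner:
            raise PermissionError(f"{requester!r} does not control the L2 cache")
        return addr % len(self.tags), addr // len(self.tags)

    def write(self, requester: str, addr: int, value: str) -> None:
        idx, tag = self._index_tag(requester, addr)
        self.tags[idx], self.data[idx] = tag, value

    def read(self, requester: str, addr: int) -> Optional[str]:
        """Return the cached value, or None on a miss."""
        idx, tag = self._index_tag(requester, addr)
        return self.data[idx] if self.tags[idx] == tag else None
```

For example, a line written while the first set of cores owns the cache is still a hit after `select("second")`, whereas the first set can no longer issue requests.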
`[0045]
`In a processing complex that implements a common
`L2 cache, both sets of cores can have the performance advan(cid:173)
`tages associated with implementing an L2 cache, without the
`additional area required for separate L2 caches. Additionally,
`two separate L2 caches would add significant delay to the
`processor mode switch. For example, on a switch from oper(cid:173)
`ating in the first mode to operating in the second mode, the
`data in the first L2 cache associated with the first set of cores
`would need to be copied to the second L2 cache associated
`with the second set of cores, thereby causing inefficiencies.
`Then, the first L2 cache would need to be flushed or zeroed(cid:173)
`out to remove old data, thereby causing additional inefficien(cid:173)