US 20110213950A1

(19) United States
(12) Patent Application Publication
     Mathieson et al.

(10) Pub. No.: US 2011/0213950 A1
(43) Pub. Date: Sep. 1, 2011

(54) SYSTEM AND METHOD FOR POWER OPTIMIZATION

(76) Inventors: John George Mathieson, San Jose, CA (US);
     Phil Carmack, Santa Clara, CA (US);
     Brian Smith, Mountain View, CA (US)

(21) Appl. No.: 12/787,359

(22) Filed: May 25, 2010

Related U.S. Application Data

(63) Continuation-in-part of application No. 12/137,053,
     filed on Jun. 11, 2008.

Publication Classification

(51) Int. Cl.
     G06F 15/76 (2006.01)
     G06F 9/02 (2006.01)
(52) U.S. Cl. 712/30; 712/E09.002
(57) ABSTRACT

A technique for reducing the power consumption required to
execute processing operations. A processing complex, such
as a CPU or a GPU, includes a first set of cores comprising
one or more fast cores and a second set of cores comprising
one or more slow cores. A processing mode of the processing
complex can switch between a first mode of operation and a
second mode of operation based on one or more of the workload
characteristics, performance characteristics of the first
and second sets of cores, power characteristics of the first and
second sets of cores, and operating conditions of the processing
complex. A controller causes the processing operations to
be executed by either the first set of cores or the second set of
cores to achieve the lowest total power consumption.
`
[Representative drawing: flow diagram 400A. START; step 402,
process one or more operations on first set of cores; step 404,
evaluate processing parameter associated with processing the one
or more operations; on YES, step 408, process the one or more
operations on second set of cores.]

`Petitioner Samsung Ex-1005, 0001

[FIG. 1 (Sheet 1 of 7): block diagram of computer system 100.
Labeled elements: CPU 102 with fast cores 130 and slow cores 140,
device driver 103, system memory 104, memory bridge 105,
communications path 106, I/O bridge 107, input devices 108,
display device 110, parallel processing subsystem 112, system
disk 114, switch 116, network adapter 118, and add-in cards 120
and 121.]

[FIG. 2 (Sheet 2 of 7): conceptual diagram of CPU 102. Labeled
elements: first set of cores 210 with cores 212 and data 214,
second set of cores 220 with core 222 and data 224, resource
unit 230, and controller 240.]

[FIG. 3 (Sheet 3 of 7): conceptual diagram of CPU 102 with a
shared resource. Labeled elements: controller 240, resource 230
with blocks labeled TAG 336 and DATA, L2 cache controller 310,
L2 cache controller 320, sets of cores 210 and 220, and
reference numerals 332, 334, 338, and 340.]

[FIG. 4A (Sheet 4 of 7): flow diagram 400A. START; step 402,
process one or more operations on first set of cores; step 404,
evaluate processing parameter associated with processing the one
or more operations; decision comparing the processing parameter
with a threshold value (label partly illegible); on YES, step
408, process the one or more operations on second set of cores.]

[FIG. 4B (Sheet 5 of 7): flow diagram 400B. START; step 452,
evaluate the workload associated with executing one or more
processing operations, performance data and/or power data
associated with a first set of cores, and performance data
and/or power data associated with a second set of cores; step
454, optionally, evaluate operating conditions of the processing
complex; step 456, execute the processing operations with the
first set of cores; step 458, evaluate the workload and the
performance data and/or power data as in step 452; step 460,
optionally, evaluate operating conditions of the processing
complex; step 462, execute the processing operations with the
second set of cores; END.]

[FIG. 5 (Sheet 6 of 7): flow diagram 500. START; step 502,
execute processing operations with one or more cores having a
first type and access to a resource; step 504, determine that at
least a workload associated with the processing operations has
changed; step 506, execute the processing operations with one or
more cores having a second type and access to the resource; END.]

[FIG. 6 (Sheet 7 of 7): graph 600 of power (vertical axis)
versus frequency in MHz (horizontal axis), with reference
numerals 602, 604, and 610.]

SYSTEM AND METHOD FOR POWER OPTIMIZATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S.
patent application Ser. No. 12/137,053, filed on Jun. 11, 2008
(Attorney Docket No. NVDA/P003709), which is hereby
incorporated herein by reference.
`
BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention
[0003] The present invention relates generally to computer
hardware and, more specifically, to a system and method for
power optimization.
[0004] 2. Description of the Related Art
[0005] Low power design has become increasingly important
in recent years. With the proliferation of battery-powered
mobile devices, efficient power management is quite important
to the success of a product or system.
[0006] A number of techniques have been developed to
increase performance and/or reduce power consumption in
conventional integrated circuits (ICs). For example, sleep and
standby modes, multi-threading techniques, multi-core techniques,
and other techniques are currently implemented to
increase performance and/or decrease power consumption.
However, these techniques do not reduce power consumption
enough to meet the requirements of certain emerging technologies
and products.
[0007] As the foregoing illustrates, what is needed in the art
is an improved technique for power optimization that overcomes
the drawbacks associated with conventional approaches.
`
SUMMARY

[0008] One embodiment of the invention sets forth a
computer-implemented method for processing one or more operations
within a processing complex. The method includes
causing the one or more operations to be processed by a first
set of cores within the processing complex; evaluating at least
a workload associated with processing the one or more operations
to determine that the one or more operations should be
processed by a second set of cores included within the processing
complex; and causing the one or more operations to
be processed by the second set of cores.
[0009] Another embodiment of the invention provides a
computer-implemented method for processing one or more
operations within a processing complex. The method
includes causing the one or more operations to be processed
by a first set of cores within the processing complex; evaluating
at least a workload associated with processing the one or
more operations, performance data and power data associated
with the first set of cores, and performance data and power
data associated with a second set of cores included within the
processing complex to determine whether the one or more
operations should continue to be processed by the first set of
cores or should be processed by the second set of cores; and
causing the one or more operations to continue to be processed
by the first set of cores or to be processed by the second
set of cores.
[0010] Yet another embodiment of the invention provides a
computer-implemented method for processing one or more
operations within a processing complex. The method
includes causing the one or more operations to be processed
by a first set of cores included within the processing complex,
where the first set of cores is configured to utilize a resource
unit when processing the one or more operations; evaluating
at least a workload associated with processing the one or more
operations to determine that the one or more operations
should be processed by a second set of cores included within
the processing complex; and causing the one or more operations
to be processed by the second set of cores included
within the processing complex, where the second set of cores
is configured to utilize the resource unit when processing the
one or more operations.
[0011] Advantageously, embodiments of the invention provide
techniques to decrease the total power consumption of a
processor.
`
BRIEF DESCRIPTION OF THE DRAWINGS

[0012] So that the manner in which the above recited features
of the invention can be understood in detail, a more
particular description of the invention, briefly summarized
above, may be had by reference to embodiments, some of
which are illustrated in the appended drawings. It is to be
noted, however, that the appended drawings illustrate only
typical embodiments of this invention and are therefore not to
be considered limiting of its scope, for the invention may
admit to other equally effective embodiments.
[0013] FIG. 1 is a block diagram illustrating a computer
system configured to implement one or more aspects of the
invention.
[0014] FIG. 2 is a conceptual diagram illustrating a processing
complex that includes heterogeneous cores, according
to one embodiment of the invention.
[0015] FIG. 3 is a conceptual diagram illustrating a processing
complex that includes a shared resource, according to
one embodiment of the invention.
[0016] FIGS. 4A-4B are flow diagrams of method steps for
switching between modes of operation of a processing complex,
according to various embodiments of the invention.
[0017] FIG. 5 is a flow diagram of method steps for switching
between modes of operation of a processing complex
having a shared resource, according to one embodiment of the
invention.
[0018] FIG. 6 is a conceptual diagram illustrating power
consumption as a function of operating frequency for different
types of processing cores, according to one embodiment
of the invention.
`
DETAILED DESCRIPTION

[0019] In the following description, numerous specific
details are set forth to provide a more thorough understanding
of the invention. However, it will be apparent to one of skill in
the art that the invention may be practiced without one or
more of these specific details. In other instances, well-known
features have not been described in order to avoid obscuring
embodiments of the invention.
`
System Overview

[0020] FIG. 1 is a block diagram illustrating a computer
system 100 configured to implement one or more aspects of
the invention. Computer system 100 includes a central processing
unit (CPU) 102 and a system memory 104 communicating
via a bus path through a memory bridge 105. The
CPU 102 includes one or more "fast" cores 130 and one or
more "shadow" or slow cores 140, as described in greater
detail herein. In some embodiments, the cores 130 are associated
with higher performance and higher leakage power
than the cores 140. Memory bridge 105 may be integrated
into CPU 102 as shown in FIG. 1. Alternatively, memory
bridge 105 may be a conventional device, e.g., a Northbridge
chip, that is coupled to CPU 102 via a bus. Memory bridge
105 is also coupled to an I/O (input/output) bridge 107 via
communication path 106 (e.g., a HyperTransport link).
[0021] I/O bridge 107, which may be, e.g., a Southbridge
chip, receives user input from one or more user input devices
108 (e.g., keyboard, mouse) and forwards the input to CPU
102 via path 106 and memory bridge 105. A parallel processing
subsystem 112 is coupled to memory bridge 105 via a bus
or other communication path 113 (e.g., a PCI Express,
Accelerated Graphics Port, or HyperTransport link); in one
embodiment parallel processing subsystem 112 is a graphics
subsystem that delivers pixels to a display device 110 (e.g., a
conventional CRT or LCD based monitor). A system disk 114
is also connected to I/O bridge 107. A switch 116 provides
connections between I/O bridge 107 and other components
such as a network adapter 118 and various add-in cards 120
and 121. Other components (not explicitly shown), including
USB or other port connections, CD drives, DVD drives, film
recording devices, and the like, may also be connected to I/O
bridge 107. Communication paths interconnecting the various
components in FIG. 1 may be implemented using any
suitable protocols, such as PCI (Peripheral Component
Interconnect), PCI Express (PCI-E), AGP (Accelerated Graphics
Port), HyperTransport, or any other bus or point-to-point
communication protocol(s), and connections between different
devices may use different protocols as is known in the art.
[0022] In one embodiment, the parallel processing subsystem
112 incorporates circuitry optimized for graphics and
video processing, including, for example, video output circuitry,
and constitutes a graphics processing unit (GPU). In
another embodiment, the parallel processing subsystem 112
incorporates circuitry optimized for general purpose processing,
while preserving the underlying computational architecture.
In yet another embodiment, the parallel processing subsystem
112 may be integrated with one or more other system
elements, such as the memory bridge 105, CPU 102, and I/O
bridge 107 to form a system on chip (SoC).
[0023] It will be appreciated that the system shown in FIG.
1 is illustrative and that variations and modifications are possible.
The connection topology, including the number and
arrangement of bridges, may be modified as desired. For
instance, in some embodiments, system memory 104 is
directly connected to CPU 102 rather than connected through
a bridge, and other devices communicate with system
memory 104 via memory bridge 105 and CPU 102. In other
alternative topologies, parallel processing subsystem 112 is
connected to I/O bridge 107 or directly to CPU 102, rather
than to memory bridge 105. In still other embodiments, one or
more of CPU 102, I/O bridge 107, parallel processing subsystem
112, and memory bridge 105 may be integrated into
one or more chips. The particular components shown herein
are optional; for instance, any number of add-in cards or
peripheral devices might be supported. In some embodiments,
switch 116 is eliminated, and network adapter 118 and
add-in cards 120, 121 connect directly to I/O bridge 107.

Power Optimization Implementation

[0024] FIG. 2 is a conceptual diagram illustrating a processing
complex that includes heterogeneous cores, according
to one embodiment of the invention. As shown, the processing
complex comprises the CPU 102 shown in FIG. 1. In
other embodiments, the processing complex may be any other
type of processing unit, such as a graphics processing unit
(GPU).
[0025] The CPU 102 includes a first set of cores 210, a
second set of cores 220, a shared resource 230, and a controller
240. Other components included within the CPU 102 are
omitted to avoid obscuring embodiments of the invention. In
some embodiments, the first set of cores 210 includes one or
more cores 212 and data 214, and the second set of cores 220
includes one or more cores 222 and data 224. In some
embodiments, the first set of cores 210 and the second set of
cores 220 are included on the same chip. In other embodiments,
the first set of cores 210 and the second set of cores 220
are included on separate chips that comprise the CPU 102.
[0026] As shown, the CPU 102, also referred to herein as
the "processing complex," includes the first set of cores 210
and the second set of cores 220. In one embodiment, the cores
included in the first set of cores 210 may implement substantially
the same functionality as the cores included in the
second set of cores 220. In alternative embodiments, each
given set of cores 210, 220 may implement a particular functional
block of the CPU 102, such as an arithmetic and logic
unit, a fetch unit, a graphics pipeline, a rasterizer, or the like.
In still further embodiments, the cores included in the second
set of cores 220 may be capable of a subset of the functionality
of the cores included in the first set of cores 210. Various
designs are within the scope of embodiments of the invention
and may be based on trade-offs in usage for providing the
shared functionality.
[0027] According to various embodiments, the power consumption
associated with the CPU 102 is derived from
"dynamic" switching power and "static" leakage power. The
switching power loss is based on the charging and discharging
of each transistor and its associated capacitance, and
increases with operating frequency and number of gates. The
leakage power loss is based on gate and channel leakage in
each transistor, and increases as process geometry decreases.
[0028] According to various embodiments, the cores 212
included in the first set of cores 210 comprise "fast" cores and
the cores 222 included in the second set of cores 220 comprise
"slow" cores. For example, the cores 212 may be manufactured
using faster transistors that have significant static leakage.
In some embodiments, when the computing needs and/or
workload of the first set of cores 210 are lowered, then the
clock speed is lowered to reduce power. The static leakage is
not a significant issue at the high clock speeds required for
peak performance. However, at slower clock speeds, the static
leakage of the fast transistors can dominate the overall power
consumption. According to various embodiments, the first set
of cores includes N cores and the second set of cores includes
M cores. In one embodiment, N is not equal to M. In other
embodiments, N is equal to M. In some embodiments, the first
set of cores 210 may include multiple cores, e.g., four cores,
and the second set of cores 220 may include a single core 222.
In other embodiments, the first set of cores 210 may include
a single core and/or the second set of cores 220 may include
multiple cores.
[0029] Thus, according to various embodiments, the second
set of cores 220, also referred to as "shadow" cores, are
also included within the CPU 102. The second set of cores
220 includes one or more "slow" cores 222 constructed from
slower transistors that are not capable of operating as quickly
as the transistors included in the cores 212 of the first set of
cores 210. In some embodiments, the second set of cores 220
has a much lower leakage power loss than the first set of cores
210, but is not capable of achieving the same performance
levels as the first set of cores 210.
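By way of illustration only, the trade-off between switching power and leakage power described above may be sketched as follows. The linear dependence of switching power on frequency, the class name CoreSet, the helper crossover_mhz, and every numeric value are assumptions chosen for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSet:
    """Illustrative power parameters for one set of cores (assumed values)."""
    name: str
    switching_mw_per_mhz: float  # dynamic power grows with operating frequency
    leakage_mw: float            # static power, drawn whenever the set is powered
    max_mhz: float               # highest frequency the set can sustain

    def power_mw(self, mhz: float) -> float:
        """Total power = dynamic switching power + static leakage power."""
        if not 0 <= mhz <= self.max_mhz:
            raise ValueError(f"{self.name} cores cannot run at {mhz} MHz")
        return self.switching_mw_per_mhz * mhz + self.leakage_mw


# Hypothetical numbers: the fast set leaks far more than the slow set.
FAST = CoreSet("fast", switching_mw_per_mhz=1.0, leakage_mw=200.0, max_mhz=1000.0)
SLOW = CoreSet("slow", switching_mw_per_mhz=1.5, leakage_mw=20.0, max_mhz=400.0)


def crossover_mhz(fast: CoreSet, slow: CoreSet) -> float:
    """Frequency at which both sets draw equal power under the linear model."""
    return (fast.leakage_mw - slow.leakage_mw) / (
        slow.switching_mw_per_mhz - fast.switching_mw_per_mhz
    )
```

Under these assumed values, leakage dominates the fast set at low clock speeds, so the slow set draws less total power below the crossover frequency, while only the fast set can reach the frequencies needed for peak performance.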
[0030] In some embodiments, a controller 240 included
within the CPU 102 is configured to evaluate at least a workload
associated with one or more operations to be executed by
the CPU 102. In some embodiments, the controller is implemented
in software and is executed by the CPU 102. Based on
the evaluated workload, the controller 240 is able to configure
the CPU 102 to operate in a first mode of operation or a second
mode of operation. In the first mode of operation, the first set
of cores 210 is enabled and operable and the second set of
cores 220 is disabled. In the second mode of operation, the
second set of cores 220 is enabled and operable and the first
set of cores 210 is disabled. In addition, in various embodiments,
the controller 240 is able to increase and/or decrease
the operating frequency of the first processor and/or the second
processor when operating the CPU 102 in each of the first
and second modes. In one embodiment, the first set of cores
210 is disabled and powered off when the one or more operations
are processed by the second set of cores 220. In alternative
embodiments, the first set of cores 210 is clock gated
and/or power gated when the one or more operations are
processed by the second set of cores 220.
[0031] For example, if the CPU 102 is operating in the first
mode at high frequency, and the controller 240 detects that the
workload has decreased to a point where operating in the first
mode at lower frequency would save power, then the controller
240 may decrease the operating frequency of the first set of
cores 210. If the controller 240 later detects that the workload
has further decreased to a point where the CPU 102 would use
less power to operate in the second mode, then the controller
240 causes the CPU 102 to operate in the second mode. In
some embodiments, the CPU 102 may operate in both the first
mode and the second mode simultaneously. In some embodiments,
operating in both the first and second modes simultaneously
may result in lower overall power efficiency. For
example, the CPU 102 may operate in both the first mode and
the second mode simultaneously during a transition period
when transitioning between the first mode and second mode,
or vice versa.
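The decision sequence in the preceding example may be sketched, under stated assumptions, as a selection of the mode with the lowest estimated power among the sets of cores able to sustain the frequency the workload requires. The names Mode, MODELS, and select_mode, the linear power estimate, and all numeric values are hypothetical and serve only to illustrate the idea.

```python
from enum import Enum


class Mode(Enum):
    FIRST = "first set of cores (fast)"
    SECOND = "second set of cores (slow)"


# Hypothetical per-set models: (mW per MHz, leakage in mW, maximum MHz).
MODELS = {
    Mode.FIRST: (1.0, 200.0, 1000.0),
    Mode.SECOND: (1.5, 20.0, 400.0),
}


def estimated_power_mw(mode: Mode, mhz: float) -> float:
    """Assumed linear estimate: switching power plus leakage power."""
    per_mhz, leakage, _ = MODELS[mode]
    return per_mhz * mhz + leakage


def select_mode(required_mhz: float) -> Mode:
    """Pick the mode with the lowest estimated power among the sets of
    cores that can run at the frequency the workload requires."""
    candidates = [
        (estimated_power_mw(mode, required_mhz), mode)
        for mode, (_, _, max_mhz) in MODELS.items()
        if required_mhz <= max_mhz
    ]
    if not candidates:
        raise ValueError("workload exceeds the capability of both sets of cores")
    return min(candidates, key=lambda c: c[0])[1]


# A falling workload: stay in the first mode at a lower frequency,
# then move to the second mode once it is estimated to use less power.
trace = [select_mode(mhz) for mhz in (900.0, 500.0, 300.0)]
```

With these assumed numbers the trace is first mode, first mode, second mode, which mirrors the progression described above. The sketch does not model the transition period during which both modes may be active.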
[0032] In one embodiment, evaluating the workload
includes determining whether a processing parameter associated
with processing the one or more operations is greater
than or less than a threshold value. For example, the processing
parameter may be a processing frequency, and the evaluating
at least the workload comprises determining that the one
or more operations should be processed at a processing frequency
that is greater than or less than a threshold frequency.
In another example, the processing parameter may be instruction
throughput, and the evaluating at least the workload
comprises determining that the instruction throughput when
processing the workload should be greater than or less than a
threshold throughput.
[0033] In some embodiments, determining that processing
operations should switch from being executed by the first set
of cores 210 to being executed by the second set of cores 220,
and vice versa, is based on evaluating at least the workload, as
described above, and performance data and/or power data
associated with the first and/or second sets of cores. As also
shown in FIG. 2, each of the first and second sets of cores 210
and 220 includes data 214 and 224, respectively.
`
[0034] According to various embodiments, the data 214,
224 includes performance data and/or power data. The performance
data associated with the first set of cores and the
second set of cores includes at least one of an operating
frequency range of the first set of cores and an operating
frequency range of the second set of cores, the number of
cores in the first set of cores and the number of cores in the
second set of cores, and an amount of parallelism between the
cores in the first set of cores and an amount of parallelism
between the cores in the second set of cores. The power data
associated with the first set of cores and the second set of
cores includes at least one of a maximum voltage at which the
cores in the first set of cores can operate and a maximum
voltage at which the cores in the second set of cores can
operate, a maximum current that the cores in the first set of
cores can tolerate and a maximum current that the cores in the
second set of cores can tolerate, and an amount of power
dissipation as a function of at least an operating frequency for
the cores in the first set of cores and an amount of power
dissipation as a function of at least an operating frequency for
the cores in the second set of cores.
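One possible in-memory layout for the performance data and power data enumerated above is sketched below. The class and field names, the scalar representation of parallelism, and all values are assumptions made for illustration; the disclosure requires only at least one of the listed items, so an actual implementation might omit fields.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PerformanceData:
    min_mhz: float      # low end of the operating frequency range
    max_mhz: float      # high end of the operating frequency range
    core_count: int     # number of cores in the set
    parallelism: float  # assumed scalar measure of parallelism between cores


@dataclass(frozen=True)
class PowerData:
    max_voltage_v: float                      # maximum operating voltage
    max_current_a: float                      # maximum tolerable current
    dissipation_mw: Callable[[float], float]  # power as a function of MHz


@dataclass(frozen=True)
class CoreSetData:
    performance: PerformanceData
    power: PowerData

    def supports(self, mhz: float) -> bool:
        """True if the frequency lies within the set's operating range."""
        return self.performance.min_mhz <= mhz <= self.performance.max_mhz


# Placeholder values for illustration only.
data_214 = CoreSetData(
    PerformanceData(min_mhz=200.0, max_mhz=1000.0, core_count=4, parallelism=4.0),
    PowerData(1.2, 3.0, lambda mhz: 1.0 * mhz + 200.0),
)
data_224 = CoreSetData(
    PerformanceData(min_mhz=50.0, max_mhz=400.0, core_count=1, parallelism=1.0),
    PowerData(1.0, 0.5, lambda mhz: 1.5 * mhz + 20.0),
)
```

A controller could, for instance, call data_224.supports(300.0) before comparing the two dissipation functions at that frequency.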
[0035] According to various embodiments, the controller
240 is configured to evaluate the data 214, 224 and determine
which set of cores should execute the processing operations
based, at least in part, on the data 214. In one embodiment, the
data 214, 224 is included within fuses associated with the
processing complex and the controller 240 is configured to
read the data 214, 224 from the fuses. In alternative embodiments,
the data 214, 224 is determined dynamically during
operation of the processing complex by the controller 240.
[0036] In one embodiment, the particular silicon composition,
process technology, and/or logical implementations
used to manufacture each of the first and second processors
210, 220 is known at the time of manufacture. In some
embodiments, the silicon composition and/or process technology
associated with the first processor 210 is different than
the silicon composition and/or process technology associated
with the second processor 220. However, each integrated
circuit manufactured is not identical. Minor variations exist
between ICs, even ICs on the same wafer. Therefore, the
characteristics associated with an IC may vary from chip to
chip. According to various embodiments of the invention, at
the time of manufacturing, each chip may be measured with a
testing device to measure the performance data and/or the
power data associated with the first set of cores 210 and the
performance data and/or the power data associated with the
second set of cores 220. The dynamic power, in some embodiments,
is approximately equal between chips and can be
estimated as a function of the number of gates and operating
frequency. In other embodiments, the silicon composition
and/or process technology could be mixed between chips
and/or cores, thereby providing different dynamic power
between chips and/or cores.
[0037] Based on the measured and/or estimated characteristics,
one or more fuses may be set on the CPU 102 to
characterize the performance data and/or the power data of
the CPU 102 based on various characteristics, such as operating
frequency, voltage, temperature, throughput, and the
like. In some embodiments, the one or more fuses may comprise
the data 214 and 224 shown in FIG. 2. Accordingly, the
controller 240 may be configured to read the data 214, 224
and determine which mode of operation is optimal
based on the particular operating characteristics at a particular
time.
[0038] In some embodiments, the data 214, 224 changes
dynamically during operation of the first and/or second sets of
cores 210, 220. For example, temperature changes associated
with the CPU 102 may cause one or more of the performance
data 214, 224 to change. Accordingly, the controller 240 may
determine that a certain mode of operation is more power
efficient, based on the dynamic operating temperature information.
In some embodiments, the controller 240 may determine
the current operating characteristics and perform a table
look-up to determine which mode of operation is most power
efficient. The table may be organized based on ranges of the
different operating characteristics of the CPU 102. In alternative
embodiments, the controller 240 may determine which
mode of operation is more power efficient based on evaluating
a function having inputs associated with the different
operating characteristics. For example, the function may be a
discrete or continuous function.
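A table look-up organized by ranges of operating characteristics might be sketched as follows. The choice of frequency and temperature as the two characteristics, the range boundaries, the table contents, and the names FREQ_EDGES_MHZ, TEMP_EDGES_C, MODE_TABLE, and lookup_mode are all assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical range boundaries for two operating characteristics.
FREQ_EDGES_MHZ = [200.0, 400.0]  # ranges: below 200, 200 to 400, 400 and above
TEMP_EDGES_C = [50.0, 85.0]      # ranges: below 50, 50 to 85, 85 and above

# MODE_TABLE[temperature range][frequency range] gives the mode assumed
# to be more power efficient; the entries are placeholders.
MODE_TABLE = [
    ["second", "first", "first"],   # cool
    ["second", "second", "first"],  # warm
    ["second", "second", "first"],  # hot
]


def lookup_mode(freq_mhz: float, temp_c: float) -> str:
    """Map the current operating characteristics to a table entry."""
    row = bisect_right(TEMP_EDGES_C, temp_c)
    col = bisect_right(FREQ_EDGES_MHZ, freq_mhz)
    return MODE_TABLE[row][col]
```

For example, lookup_mode(300.0, 60.0) selects the warm row and the middle frequency range. An implementation that evaluates a function instead of a table would replace MODE_TABLE with a computation over the same inputs.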
[0039] In some embodiments, determining which set of
cores should execute the processing operations is based on
evaluating one or more operating conditions of the processing
complex. The one or more operating conditions may include
at least one of a supply voltage, a temperature of each chip
included in the processing complex, and an average leakage
current over a period of time of each chip included in the
processing complex. The one or more operating conditions
may be determined dynamically during operation of the processing
complex.
[0040] In some embodiments, determining whether the one
or more operations should continue to be processed by the
first set of cores or should be processed by the second set of
cores is based on at least one of a thermal constraint, a
performance requirement, a latency requirement, and a
current requirement.
[0041] In some embodiments, the first set of cores 210 and
the second set of cores 220 are configured to use a shared
resource 230 when executing processing operations. The
shared resource 230 may be any resource including a fixed
function processing block, a memory unit, such as a cache
unit, or any other type of computing resource.
[0042] According to various embodiments, the process of
analyzing the parameters and choosing the most appropriate
set of cores to use is described in greater detail in FIGS. 4A-6.
[0043] When execution of the processing operations
switches from the first set of cores to the second set of cores,
in some embodiments, the controller 240 is configured to
transfer the processor state from the first set of cores to the
second set of cores. In one embodiment, the controller saves
the processor state to the shared resource 230, triggers a
hardware mechanism that stops and powers off the first set of
cores 210, and boots the second set of cores 220. The second
set of cores 220 then restores the processor state from the
shared resource 230 and continues operation at the lower
speed associated with the second set of cores 220. In other
embodiments, the processing state may be stored in any
memory unit when transferring execution of the operations
between the two sets of cores. In still further embodiments,
the processing state may be directly transferred to the other
set of cores via a dedicated bus, where the processing state is
not stored in any memory unit when switching between the
two sets of cores. The transition from the first mode to the
second mode, and vice versa, can be done transparently to
high level software, such as the operating system.
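The save, power-off, boot, and restore sequence of the embodiment that uses the shared resource 230 may be modeled in software as shown below. This is a simplified simulation, not the hardware mechanism itself; the CoreSet class, the switch_cores function, the dictionary used to stand in for the shared resource, and the example state contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CoreSet:
    name: str
    powered: bool = False
    state: dict = field(default_factory=dict)  # stand-in for processor state


def switch_cores(active: CoreSet, target: CoreSet, shared_resource: dict) -> CoreSet:
    """Move execution from `active` to `target` through the shared resource."""
    shared_resource["processor_state"] = dict(active.state)  # save the state
    active.powered = False                                   # stop and power off
    active.state.clear()
    target.powered = True                                    # boot the other set
    target.state = dict(shared_resource["processor_state"])  # restore the state
    return target


# Example: hand execution from the first set of cores to the second set.
first = CoreSet("first set 210", powered=True, state={"pc": 0x1000, "sp": 0x8000})
second = CoreSet("second set 220")
shared = {}
running = switch_cores(first, second, shared)
```

After the call, the second set holds the state previously held by the first set and the first set is powered off. The embodiment that transfers state over a dedicated bus would skip the intermediate copy into the shared resource.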
[0044] According to some embodiments, the shared
resource 230 is an L2 cache RAM, and the first and second
sets of cores 210, 220 share the same L2 cache RAM. In one
embodiment, each of the first set of cores 210 and the second
set of cores 220 includes an L2 cache controller. The L2 cache
may include a single set of tag and data RAM. The control
signals and buses between the first and second sets of cores
210, 220 and the L2 cache are multiplexed so that either the
first set of cores 210 or the second set of cores 220 can control
the L2 cache. In some embodiments, only one of the first and
second sets of cores 210, 220 can control the L2 cache at a
particular time. Also, in some embodiments, the read data bus
from the RAM goes to both the first and second sets of cores
210, 220 and is used by whichever set of cores is active at the
time.
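As a rough software analogy for the multiplexed control described above, the following sketch models a single cache store with a select value that determines which set of cores may drive it. The class SharedL2Cache, its select attribute, and the requester strings are assumptions for illustration and do not represent actual signal-level behavior.

```python
from typing import Dict, Optional


class SharedL2Cache:
    """Single tag/data store whose control is multiplexed between two sets."""

    def __init__(self) -> None:
        self._lines: Dict[int, int] = {}
        self.select = "first"  # mux select: the set currently in control

    def _check(self, requester: str) -> None:
        # Only the selected set of cores may control the cache.
        if requester != self.select:
            raise PermissionError(f"{requester} set does not control the L2 cache")

    def write(self, requester: str, address: int, value: int) -> None:
        self._check(requester)
        self._lines[address] = value

    def read(self, requester: str, address: int) -> Optional[int]:
        self._check(requester)
        return self._lines.get(address)


# The first set fills the cache; after the mux switches, the second set
# sees the same contents without any copy between separate caches.
l2 = SharedL2Cache()
l2.write("first", 0x40, 7)
l2.select = "second"
value_seen_by_second = l2.read("second", 0x40)
```

In this model the data written while the first set was in control remains available to the second set after the switch, which is the property that makes a common L2 cache attractive for mode switching.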
[0045] In a processing complex that implements a common
L2 cache, both sets of cores can have the performance advantages
associated with implementing an L2 cache, without the
additional area required for separate L2 caches. Additionally,
two separate L2 caches would add significant delay to the
processor mode switch. For example, on a switch from operating
in the first mode to operating in the second mode, the
data in the first L2 cache associated with the first set of cores
would need to be copied to the second L2 cache associated
with the second set of cores, thereby causing inefficiencies.
Then, the first L2 cache would need to be flushed or zeroed
out to remove old data, thereby causing additional inefficiencies.
