`
US010049080B2

(12) United States Patent
     George et al.

(10) Patent No.:     US 10,049,080 B2
(45) Date of Patent: Aug. 14, 2018

(54) ASYMMETRIC PERFORMANCE MULTICORE ARCHITECTURE WITH SAME
     INSTRUCTION SET ARCHITECTURE

(71) Applicant: Intel Corporation, Santa Clara, CA (US)

(72) Inventors: Varghese George, Folsom, CA (US);
     Sanjeev S. Jahagirdar, Folsom, CA (US);
     Deborah T. Marr, Portland, OR (US)

(73) Assignee: INTEL CORPORATION, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 15/431,527

(22) Filed: Feb. 13, 2017

(65) Prior Publication Data

     US 2017/0154012 A1    Jun. 1, 2017

Related U.S. Application Data

(63) Continuation of application No. 13/335,257, filed on
     Dec. 22, 2011, now Pat. No. 9,569,278.

(51) Int. Cl.
     G06F 15/80    (2006.01)
     G06F 13/40    (2006.01)
     G06F 1/32     (2006.01)

(52) U.S. Cl.
     CPC ............ G06F 15/80 (2013.01); G06F 1/3206 (2013.01);
     G06F 1/3293 (2013.01); G06F 1/3296 (2013.01);
     G06F 13/4022 (2013.01)

(58) Field of Classification Search
     None
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

7,992,020 B1       8/2011   Tuan et al.
2006/0095807 A1*   5/2006   Grochowski ........... G06F 1/206, 713/324
2006/0279152 A1*   12/2006  Ha ................... G06F 1/3203, 310/114
2006/0282692 A1    12/2006  Oh
2008/0127192 A1    5/2008   Capps et al.
2008/0263324 A1*   10/2008  Sutardja ............. G06F 1/3203, 712/43
2008/0288748 A1    11/2008  Sutardja et al.
2009/0055826 A1    2/2009   Bernstein et al.
(Continued)

FOREIGN PATENT DOCUMENTS

CN  101076770 A    11/2007

OTHER PUBLICATIONS

Final Office Action from U.S. Appl. No. 13/335,257, dated May 5,
2015, 13 pages.

(Continued)

Primary Examiner - Eric Coleman
(74) Attorney, Agent, or Firm - Nicholas De Vos
     Webster & Elliott LLP
`
`(57)
`
`ABSTRACT
`
A method is described that entails operating enabled cores of
a multi-core processor such that both cores support respective
software routines with a same instruction set, a first core
being higher performance and consuming more power than
a second core under a same set of applied supply voltage and
operating frequency.
`
`24 Claims, 8 Drawing Sheets
`
[Representative drawing: FIG. 6, a power management flow (reference
numerals 601 to 615) in which high power cores and then low power
cores are disabled as demand falls below successive thresholds.]
`
`Petitioner Samsung Ex-1001, 0001
`
`
`
(56) References Cited

U.S. PATENT DOCUMENTS

2009/0271646 A1*   10/2009  Talwar ............... G06F 1/3203, 713/322
2009/0307512 A1    12/2009  Munjal et al.
2009/0328055 A1    12/2009  Bose et al.
2010/0058086 A1    3/2010   Lee
2010/0083011 A1    4/2010   Onouchi et al.
2010/0131781 A1*   5/2010   Memon ................ G06F 1/3209, 713/310
2010/0153954 A1    6/2010   Morrow et al.
2011/0093733 A1*   4/2011   Kruglick ............. G06F 1/3203, 713/340
2011/0239015 A1    9/2011   Boyd et al.
2011/0252260 A1*   10/2011  Flachs ............... G06F 1/3287, 713/324
2012/0117403 A1    5/2012   Bieswanger et al.
2012/0260258 A1*   10/2012  Regini ............... G06F 9/5094, 718/104
`
`OTHER PUBLICATIONS
`
First Office Action from foreign counterpart China Patent
Application No. 201280063860, dated Dec. 21, 2015, 19 pages.

Non-Final Office Action from U.S. Appl. No. 13/335,257, dated
Jan. 12, 2015, 15 pages.

Non-Final Office Action from U.S. Appl. No. 13/335,257, dated
May 26, 2016, 10 pages.

Notice of Allowance from U.S. Appl. No. 13/335,257, dated
Sep. 27, 2016, 6 pages.

Second Office Action from foreign counterpart China Patent
Application No. 201280063860, dated Jul. 21, 2016, 12 pages.

Third Office Action from foreign counterpart China Patent
Application No. 201280063860, dated Dec. 15, 2016, 31 pages.

Notice of Allowance from TW counterpart Application No.
101147200, dated Sep. 29, 2014, 1 page.

Aruj, Ori. "Evolution: 20 years of switching Fabric", Sep. 2008.
EE Times. Retrieved from
http://www.eetimes.com/document.asp?doc_id=1272140.

PCT International Search Report for PCT Counterpart Application
No. PCT/US2012/068274, 5 pgs., (dated Feb. 22, 2013).

PCT Written Opinion of the International Searching Authority for
PCT Counterpart Application No. PCT/US2012/068274, 6 pgs.,
(dated Feb. 22, 2013).

PCT Notification Concerning Transmittal of International
Preliminary Report on Patentability (Chapter I of the Patent
Cooperation Treaty) for PCT Counterpart Application No.
PCT/US2012/068274, 8 pgs., (dated Jul. 3, 2014).

Fourth Office Action from foreign counterpart China Patent
Application No. 201280063860.9, dated Oct. 9, 2017, 10 pages.

Notice on Grant of Patent Right for Invention from foreign
counterpart Chinese Patent Application No. 201280063860.9, dated
Jan. 24, 2018, 4 pages.
`
`* cited by examiner
`
`
`
`
[Sheet 1 of 8: FIG. 1. Labels recoverable from the drawing:
multi-core processors 100_1, 100_2, 100_3, ...; processor cores
101_1 to 101_N; caching layers 103_1 to 103_N; system memories
105_1, 105_2, 105_3, ... 105_X; system network interface 106;
system I/O components 108_X, 108_Y.]
`
`
`
[Sheet 2 of 8: FIG. 2. State 201: all cores enabled with maximum
supply voltage and operating frequency (highest performance and
power consumption). State 202: one core enabled with minimum
supply voltage and operating frequency (lowest performance and
power consumption).]
`
`
`
[Sheet 3 of 8: FIG. 3. Labels recoverable from the drawing:
VCC, transistor 302_1, line 304, driver circuit 310, logic gate
311, load logic gate(s) 312.]
`
`
`
[Sheet 4 of 8: FIG. 4. Labels recoverable from the drawing:
high power cores 402; low power core(s) 401.]
`
`
`
[Sheet 5 of 8: FIG. 5. Plot of power consumption (vertical axis)
against supply voltage and/or operating frequency (horizontal
axis); operating point 505 is labeled.]
`
`
`
[Sheet 6 of 8: FIG. 6. Flow chart boxes:
601: Multiple high power cores operational and at least one low
power core operational.
602: In the face of continued drop off in demand, each time
demand falls below a next threshold, disable a next high power
core.
615: 0 H.P. cores enabled, all L.P. cores enabled.
604, 606: In the face of continued drop off in demand, each time
demand falls below a next threshold, disable next low power core
until one low power core is operating.
Inset 610 (plotted against demand): 7 H.P. cores enabled, 6 H.P.
cores enabled, ..., 1 H.P. core enabled (threshold 614), 1 L.P.
core enabled (606).]
`
`
`
[Sheet 7 of 8: FIG. 7. Flow chart boxes:
701: Single low power core is operational.
702: In the face of continued increase in demand, each time
demand rises above a next threshold, enable a next low power
core until all low power cores are enabled.
703: In the face of continued increase in demand, each time
demand rises above a next threshold, enable a next high power
core until all high power cores are enabled.
Inset 710: 1 L.P. core enabled, 2 L.P. cores enabled, ...,
1 H.P. core enabled, 2 H.P. cores enabled; demand increments
711 and 712.]
`
`
`
[Sheet 8 of 8: FIG. 8. Flow chart boxes:
800: Create high level behavior descriptions for each of the
processor's cores.
801: Synthesize into RTL level netlist.
802: Synthesize into gate level netlist (high power design
library); in parallel, synthesize into gate level netlist (low
power design library).
806: Place, route and timing analysis (for each branch), each
producing a transistor level netlist.
807: Design layout.
808: Verification and ground rule check.]
`
`
`
ASYMMETRIC PERFORMANCE
MULTICORE ARCHITECTURE WITH SAME
INSTRUCTION SET ARCHITECTURE

CROSS-REFERENCE TO RELATED
APPLICATIONS

The present patent application is a continuation application
claiming priority from U.S. patent application Ser. No.
13/335,257, filed Dec. 22, 2011, and titled: "Asymmetric
Performance Multicore Architecture with Same Instruction
Set Architecture", which is incorporated herein by reference
in its entirety.

BACKGROUND

Field of Invention
The field of invention relates generally to computing
system architecture, and, more specifically, to an asymmetric
performance multicore architecture with same instruction
set architecture (ISA).
Background
FIG. 1 shows a typical multi-core processor 100_1. As
observed in FIG. 1, the multi-core processor 100_1 includes
a plurality of processor cores 101_1 to 101_N on a same
semiconductor die 100_1. Each of the processor cores
typically contains at least one caching layer for caching data
and/or instructions. A switch fabric 102 interconnects the
processor cores 101_1 to 101_N to one another and to one
or more additional caching layers 103_1 to 103_N. According
to one approach, the processor cores 101_1 to 101_N and the
one or more caching layers have internal coherency logic to,
for example, prevent two different cores from concurrently
modifying the same item of data.
A system memory interface 104 (which may also include
additional coherency logic) is also included. Here, if a
core requests a specific cache line having a needed instruction
or item of data, and the cache line is not found in any
of the caching layers, the request is presented to the system
memory interface 104. If the looked for cache line is not in
the system memory 105_1 that is directly coupled to interface
104, the request is forwarded through system network
interface 106 to another multi-core processor to fetch the
desired data/instruction from its local system memory (e.g.,
system memory 105_X of multi-core processor 100_X). A
packet switched network 107 exists between the multi-core
processors 100_1 to 100_X to support these kinds of
system memory requests.
Interfaces to system I/O components 108_1 to 108_Y
(e.g., deep non-volatile storage such as a hard disk drive,
printers, external network interfaces, etc.) are also included
on the multi-core processor. These interfaces may take the
form of high speed link interfaces such as high speed
Ethernet interfaces and/or high speed PCIe interfaces.
Some multi-core processors may also have a port 105 to
the switch fabric 102 to scale upwards the number of
processor cores associated with a same (also scaled upward)
caching structure. For example, as observed in FIG. 1,
multi-core processors 100_1 and 100_2 are coupled through the
switch fabric port 105 to effectively form a platform of 2N
cores that share a common caching structure (processor
100_2 is coupled to processor 100_1 through a similar port
to its switch fabric).

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example
and not limitation in the figures of the accompanying
drawings, in which like references indicate similar elements
and in which:
FIG. 1 shows a multicore processor and surrounding
computer system (prior art);
FIG. 2 shows a power management strategy (prior art);
FIG. 3 shows a logic gate drive circuit;
FIG. 4 shows a multi-core processor having high power and
low power cores that support the same instruction set;
FIG. 5 compares power consumption of a high power core
and low power core;
FIG. 6 shows a first power management method;
FIG. 7 shows a second power management method;
FIG. 8 shows a design method.

DETAILED DESCRIPTION

Overview

Detailed Description
Computing system power consumption is becoming more
and more of a concern. As such, a number of different power
management schemes are incorporated into modern day
computing systems. Typically, the power management component
of the system will scale up the processing performance
of the system as the system's workload increases,
and scale down the processing performance of the system as
the system's workload decreases. Decreasing the processing
performance of the system corresponds to power savings as
the power consumption of the system is strongly correlated
with its performance capabilities.
A typical way to scale processing performance and power
consumption with workload is to enable/disable entire cores
and raise/lower their supply voltages and operating frequencies
in response to system workload. For example, as
observed in FIG. 2, under a maximum performance and
power consumption state 201 all cores are enabled and each
core is provided with a maximum supply voltage and
maximum clock frequency. By contrast, under a minimum
performance and power consumption state 202 (at which
program code can still be executed), only one core is
enabled. The single core is provided with a minimum supply
voltage and minimum operating frequency.
Some basic concepts of electronic circuit power consumption
are observed in FIG. 3. Here, the driver circuit 310
portion of a logic gate 311 is observed driving a next one or
more logic gate(s) 312. Specifically, the speed of operation
of interconnected logic gates 311, 312 rises as the widths of
the driving transistors 302_1, 302_2 (measured, for each
transistor, along the semiconductor surface perpendicular to
the direction of current flow) increase and as the capacitance
303 of the line 304 being driven (and the input capacitance of
the load logic gate(s) 312) decreases. Here, in order to raise the
voltage on the line from a logic low level to a logic high
level, a sufficiently strong current 305 needs to be driven by
the source transistor 302_1 through the line to rapidly apply
charge to the capacitance 303 (and thereby raise the voltage
on the line). Similarly, in order to lower the voltage on the
line from a logic high level to a logic low level, a sufficiently
strong current 306 needs to be "sunk" by the sink transistor
302_2 through the line to rapidly draw charge off the
capacitance (and thereby lower the voltage on the line).
Essentially, the amount of current the transistors 302_1,
302_2 will source/sink is a function of their respective
widths. That is, the wider the transistors are, the more
current they will source/sink. Moreover, the amount of
current the transistors 302_1, 302_2 will source/sink is also
a function of the supply voltage VCC that is applied to the
driver circuit 310 observed in FIG. 3. Essentially, the higher
the supply voltage, the stronger the source/sink currents will
be.
Further still, the rate at which the transistors will be able
to apply/draw charge to/from the capacitor is a function of
the size of the capacitance 303 of the line 304 being driven.
Specifically, the transistors will apply/draw charge slower as
the capacitance 303 increases and apply/draw charge faster
as the capacitance 303 decreases. The capacitance 303 of the
line is based on its physical dimensions. That is, the
capacitance 303 increases the longer and wider the line is and,
by contrast, decreases the shorter and narrower the line is.
The line itself is of fixed dimensions
once the circuit is manufactured. Nevertheless, line width
and line length are design parameters that designers must
account for. The width of the line cannot be narrowed too
much or else the line's resistance will increase, which will
also slow down the rate at which charge is
applied/drawn to/from the capacitor.
A final speed factor is the frequency of the signal itself on
the line. Essentially, circuits driven with a faster clock signal
will more rapidly switch between applying and drawing
charge to/from the line capacitance 303 than circuits with a
slower clock signal. Here, more rapid switching corresponds
to a circuit that is sending binary information faster.
All of the factors described above for increasing the rate
at which the charge on the capacitor is applied/drawn also
lead to a circuit that consumes more power. That is, a circuit
that is designed to have relatively wide source/sink transistors,
a high supply voltage, short load lines and receive a
higher frequency clock signal will operate faster and therefore
consume more power than circuits oppositely oriented
as to these same parameters.
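
As a first-order illustration of the factors above, the time needed to
swing the line voltage and the resulting dynamic power can be written
with the textbook CMOS approximations below. These formulas are not
given in this description; the symbols I (source/sink current), C (line
plus load capacitance), \Delta V (voltage swing), \alpha (switching
activity) and f (clock frequency) are notation assumed here.

```latex
t_{\text{switch}} \approx \frac{C\,\Delta V}{I},
\qquad
P_{\text{dyn}} \approx \alpha\, C\, V_{CC}^{2}\, f
```

Under this sketch, wider transistors and a higher VCC raise I and
shorten t_switch, which permits a higher f, and the dynamic power then
grows with both the square of VCC and f.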
Recalling the discussion of FIGS. 1 and 2, note that prior
art multi-core processor power management schemes have
been implemented on processors whose constituent cores are
identical. That is, referring to FIG. 1, all of cores 101_1 to
101_N are identical in design. In other approaches, the cores
are not identical but are radically different. Specifically, one
of the cores is a low power core but the lower power
characteristic is achieved by stripping out sizable chunks of
logic circuitry as compared to the other cores. More specifically,
the sizable chunks that are stripped out correspond
to the logic that executes the program code instructions. Said
another way, the low power core supports a reduced instruction
set as compared to the higher performance cores. A
problem with this approach, however, is that it is difficult for
system software to adjust when operation switches between
processor cores having different instruction sets.
FIG. 4 depicts a new approach in which at least one of the
cores 401 is designed to be lower performance and therefore
consume less power than other cores 402 in the processor.
However, the lower power core(s) 401 has a same logic
design as the higher power core(s) 402 and therefore supports
the same instruction set 403 as the high power core(s)
402. The low power core(s) 401 achieve a lower power
design point by having narrower drive transistor widths than
the higher power core(s) and/or having other power consumption
related design features, such as any of those
discussed above with respect to FIG. 3, that are oppositely
oriented than the same design features in the higher power
cores.
According to one approach, discussed in more detail
below, when the multi-core processor is being designed, the
same high level description (e.g., the same VHDL or Verilog
description) is used for both the higher performance/power
core(s) and the lower performance/power core(s). When the
higher level descriptions are synthesized into RTL netlists,
however, for the subsequent synthesis from an RTL netlist
into a transistor level netlist, different technology libraries
are used for the low power core(s) than the high power
core(s). As alluded to above, the drive transistors of logic
gates associated with the libraries used for the low power
core(s) have narrower respective widths than the "same"
transistors of the "same" logic gates associated with the
libraries used for the high power cores.
By design of the multi-processor, referring to FIG. 5, the
lower power core(s) exhibit inherently lower power consumption
(and processing performance) than the higher
power core(s). That is, for a same applied clock or operating
frequency, because of its narrower drive transistor widths,
for example, a lower power core will consume less power
than a higher power core. Because of the narrower drive
transistor widths, however, the lower power core has a
maximum operating frequency that is less than the maximum
operating frequency of the higher power core.
The import of the lower power core, however, is that the
multi-processor is able to entertain a power management
strategy that is the same as or similar to already existing
power management strategies, yet still achieve an even lower
power consumption in the lower/lowest performance/power
states. Specifically, recall briefly power state 202 of FIG. 2
in which only one core is left operable (the remaining cores
are disabled). Here, if the one remaining operable core is the
low power core, the processor will exhibit even lower power
consumption than the prior art low power state 202.
The amount of power savings 503 is directly
observable in FIG. 5. Here, recall that all the cores
were identical in the multi-processor that was discussed with
respect to the prior art low power state 202 of FIG. 2. As
such, even if the supply voltage and operating frequency were
reduced to a minimum, the power consumption would be
that of a higher power core (e.g., having wider drive
transistor widths). This operating point is represented by
point 504 of FIG. 5. By contrast, in the lowest power
operating state of the improved multi-processor, if the
operable core is a low power core it will consume power
represented by point 505 of FIG. 5. As such, the improved
processor exhibits comparatively lower power consumption
at the lower/lowest performance operating states than the
prior art multi-processor, while, at the same time, fully
supporting the instruction set architecture the software is
designed to operate on.
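
The comparison of points 504 and 505 can be sketched numerically. The
following is a minimal illustration only: it assumes a first-order
model in which power scales as C * V^2 * f, and every name and number
in it (CoreType, switched_capacitance, the voltage and frequency
values) is hypothetical rather than taken from FIG. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    """Hypothetical core model; all values are in arbitrary units."""
    name: str
    switched_capacitance: float  # assumed to grow with drive transistor width
    max_frequency: float         # narrower drive transistors imply a lower maximum

    def power(self, vcc: float, freq: float) -> float:
        """First-order dynamic power, P ~ C * V^2 * f (illustrative model)."""
        if freq > self.max_frequency:
            raise ValueError(f"{self.name} cannot run at frequency {freq}")
        return self.switched_capacitance * vcc ** 2 * freq


# Same instruction set, different power/performance design points.
HIGH_POWER = CoreType("high power core", switched_capacitance=2.0, max_frequency=4.0)
LOW_POWER = CoreType("low power core", switched_capacitance=1.0, max_frequency=2.5)

# Minimum supply voltage and operating frequency (assumed values).
V_MIN, F_MIN = 0.5, 1.0

point_504 = HIGH_POWER.power(V_MIN, F_MIN)  # sole remaining core is high power
point_505 = LOW_POWER.power(V_MIN, F_MIN)   # sole remaining core is low power
savings_503 = point_504 - point_505
```

With these assumed figures the low power core draws half the power of
the high power core at the same minimum voltage and frequency, which is
the kind of gap the savings 503 represents.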
FIG. 6 shows a power management process flow that can
be executed, for example, with power management software
that is running on the multi-processor (or another
multi-processor or separate controller, etc.). Alternatively,
the power management process flow of FIG. 6 can be executed
entirely in hardware on the multi-processor or by some
combination of such hardware and software.
According to the process flow of FIG. 6, from an initial
state 601 where at least some high power processor cores
and the low power core(s) are operating, in response to a
continued drop in demand on the multi-processor, another
high power core is disabled each time demand falls below
some next lower threshold. For
example, in a multi-core processor having sixteen cores
where fourteen cores are high power cores and two cores are
low power cores, the initial state 601 may correspond to a
state where seven of the high power cores and both of the
low power cores are operational.
In response to continued lower demand placed on the
multi-processor, the seven high power cores will be disabled
one by one with each new lower demand threshold 602. For
instance, as observed at inset 610, demand level 611 justifies
enablement of the seven high power cores and both low
power cores. As the demand continually drops to a next
lower threshold 612, one of the high power cores is disabled
613 leaving six operable high power cores and two low
power cores.
Before the high power core is disabled, as a matter of
designer choice, the core's individual operating frequency,
or the operating frequency of all (or some of) the enabled
high power cores, or the operating frequency of all (or some
of) the enabled high power cores and the low power cores
may be lowered to one or more lower operating frequency
levels.
A similar designer choice exists with respect to the supply
voltages applied to the cores. That is, before the high power
core is disabled, as a matter of designer choice, the core's
individual supply voltage, or the supply voltage of all (or
some of) the enabled high power cores, or the supply voltage
of all (or some of) the enabled high power cores and the low
power cores may be lowered to one or more lower supply
voltages. Supply voltages may be lowered in conjunction
with the lowering of operating frequency, or, just one or
none of these parameters may be lowered as described
above.
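
The designer choice described above (lower the frequency, the supply
voltage, both, or neither before a core is disabled) can be sketched as
follows. This is an illustration under assumptions: the function name
wind_down, the list of operating points and their values are all
hypothetical, and a real design would also have to confirm that each
voltage/frequency pair meets timing.

```python
# Hypothetical (supply voltage, operating frequency) points, highest first.
OPERATING_POINTS = [(1.0, 3.0), (0.9, 2.4), (0.8, 1.8)]


def wind_down(points, lower_voltage=True, lower_frequency=True):
    """Return the settings a core passes through before it is disabled.

    Each flag selects whether that parameter is stepped down; with both
    flags False the core goes straight from its current point to disabled.
    """
    voltage, frequency = points[0]
    settings = [(voltage, frequency)]
    for next_voltage, next_frequency in points[1:]:
        if lower_voltage:
            voltage = next_voltage
        if lower_frequency:
            frequency = next_frequency
        # Record a step only when something actually changed.
        if (voltage, frequency) != settings[-1]:
            settings.append((voltage, frequency))
    return settings + ["disabled"]


both = wind_down(OPERATING_POINTS)
frequency_only = wind_down(OPERATING_POINTS, lower_voltage=False)
neither = wind_down(OPERATING_POINTS, lower_voltage=False, lower_frequency=False)
```

The same helper could be applied to one core, to some of the enabled
cores, or to all of them, matching the alternatives listed in the text.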
Eventually, with the continued drop in demand, the last
remaining high power core will be disabled 615 after
demand falls below some lower threshold 614. This leaves
only the low power cores in operation. Operating frequency
and/or supply voltage of the low power core(s) may likewise
be lowered as demand continues to drop beneath level 614.
With continued drop in demand a similar process of disabling
cores as demand falls below each next lower demand
threshold 604 continues until the multi-core processor is left
with only one low power core remaining as its sole operating
core 606.
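
The threshold-driven scale-down of FIG. 6 can be sketched as a lookup
from demand to the number of enabled cores. This is a minimal sketch,
not the patented implementation: the function name enabled_cores, the
demand scale and the threshold values are assumptions chosen to mirror
the example of seven high power cores and two low power cores.

```python
from bisect import bisect_right


def enabled_cores(demand, hp_thresholds, lp_thresholds):
    """Return (high_power_enabled, low_power_enabled) for a demand level.

    hp_thresholds holds one demand threshold per high power core; a core
    is disabled once demand falls below its threshold. lp_thresholds does
    the same for every low power core except the last one, which always
    stays enabled (state 606).
    """
    hp = sorted(hp_thresholds)
    lp = sorted(lp_thresholds)
    if hp and lp and lp[-1] > hp[0]:
        raise ValueError("low power thresholds must not exceed high power thresholds")
    # bisect_right counts the thresholds that demand is at or above.
    high_enabled = bisect_right(hp, demand)
    low_enabled = 1 + bisect_right(lp, demand)
    return high_enabled, low_enabled


# Hypothetical thresholds: 20 plays the role of threshold 614, 10 of 604.
HP_THRESHOLDS = [80, 70, 60, 50, 40, 30, 20]
LP_THRESHOLDS = [10]

initial_state = enabled_cores(85, HP_THRESHOLDS, LP_THRESHOLDS)   # like state 601
one_hp_disabled = enabled_cores(75, HP_THRESHOLDS, LP_THRESHOLDS)
only_low_power = enabled_cores(15, HP_THRESHOLDS, LP_THRESHOLDS)  # like state 615
sole_low_power = enabled_cores(5, HP_THRESHOLDS, LP_THRESHOLDS)   # like state 606
```

A stateless lookup is used here for brevity; a controller that reacts
to threshold crossings one at a time would reach the same core counts.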
State 606 is reached of course with the disablement of the
last high power core in implementations where the processor
only has one lower power core. Again, supply voltage and/or
operating frequency of the sole remaining low power core
may be lowered as demand continues to fall. Importantly, in
state 606, as discussed above, the multi-processor will
exhibit lower power consumption than other multi-core
processors having an identical power management scheme but
whose constituent cores are all high power cores. Even
lower power consumption can be provided for in state 606
if the sole operating low power core is provided with a lower
supply voltage and/or lower operating frequency than the
lowest operating supply voltage and/or operating frequency
applied to the high power cores.
No special adjustment needs to be made by or for application
software, virtual machine or virtual machine monitor
when the system is running only on the low power core(s)
after all the high power cores are disabled. Again, the
preservation of the same instruction set across all cores in
the system corresponds to transparency from the software's
perspective as to the underlying cores. Lower performance
may be recognized with the low power cores but no special
adjustments as to the content of the instruction streams should be
necessary. In various alternate implementations: 1) the
hardware/machine readable firmware can monitor and control
the core mix; or, 2) the hardware can relinquish control to
the operating system and let it monitor the demand and
control the core mix.
FIG. 7 shows essentially a reverse of the processes
described above. As observed in FIG. 7, starting from a state
in which only a single low power core is operating 701,
additional low power cores are enabled (if any more) 702 as
demand on the multi-processor continually increases.
Eventually, high power cores are enabled 703. Notably, the
demand threshold needed to enable a next processor from an
operating low power processor may correspond to a lower
demand increment than the demand threshold needed to
enable a next processor from an operating high power
processor.
That is, inset 710 shows the increase in demand 711
needed after a low power processor is first enabled to trigger
the enablement of a next processor in the face of increased
demand. The increase in demand 712 needed after a high
power processor is first enabled to trigger enablement of a
next high power processor in the face of increased demand
is greater than the aforementioned demand 711. This is so
because a high power processor is able to handle more total
demand than a low power processor and therefore does not
need to have additional "help" as soon as a low power
processor does.
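
The unequal demand increments 711 and 712 can be illustrated by
computing the demand level at which each further core would be enabled.
The sketch below rests on assumptions: the function name
enable_thresholds, the core counts and the increment values are
hypothetical, and the increment is chosen from the kind of core that
was enabled most recently.

```python
def enable_thresholds(num_low, num_high, lp_increment, hp_increment, base=0):
    """Demand levels at which each additional core is enabled.

    Starts from a single operating low power core (state 701). After a
    low power core is enabled, the next core needs lp_increment more
    demand (like 711); after a high power core, hp_increment (like 712).
    """
    if num_low < 1:
        raise ValueError("at least one low power core must be operating")
    if lp_increment <= 0 or hp_increment <= 0:
        raise ValueError("increments must be positive")
    thresholds = []
    level = base
    last_kind = "low"
    for kind in ["low"] * (num_low - 1) + ["high"] * num_high:
        level += lp_increment if last_kind == "low" else hp_increment
        thresholds.append((kind, level))
        last_kind = kind
    return thresholds


# Two low power and three high power cores, with assumed increments.
schedule = enable_thresholds(num_low=2, num_high=3, lp_increment=10, hp_increment=25)
```

In this example the thresholds are spaced 10 units apart while only low
power cores are running and 25 units apart once a high power core is
running, reflecting that a high power core absorbs more demand before
it needs help.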
Operating frequency and/or supply voltage may also be
increased in conjunction with the enablement of cores in the
face of increased demand in a logically inverse manner to
that discussed above with respect to the disablement of
cores.
FIG. 8 shows a design process for designing a multi-core
processor consistent with the principles discussed above. As
part of the design process, high level behavioral descriptions
800 (e.g., VHDL or Verilog descriptions) for each of the
processor's cores are synthesized into a Register Transfer
Level (RTL) netlist 801. The RTL netlist is synthesized 802
into corresponding higher power core gate level netlist(s)
(one for each high power core) with libraries corresponding
to a higher power/performance design (such as logic circuits
having wider drive transistors). The RTL netlist is also
synthesized 803 into corresponding lower power core gate
level netlist(s) (one for each low power core) with libraries
corresponding to a lower power/performance design (such
as logic circuits having narrower drive transistors). Here, the
logic designs for the high power and low power cores are the
same but the designs of their corresponding logic circuits
have different performance/power design points.
The transistor level netlists for the respective cores are
then used as a basis for performing a respective place, route
and timing analysis 806 and design layout 807. Here, the
lower power/performance cores may have more relaxed
placement and timing guidelines owing to the larger permissible
propagation delay through and between logic circuits.
Said another way, recalling from the discussion of
FIG. 3 that longer load lines result in slower rise and fall
times, the lower performance cores may permit longer load
lines between transistors and gates because these cores are
designed to have slower operation (of course, if load lines
are increased too much along with the inclusion of narrower
drive transistors, the drop in performance may be more than
desired).
Upon completion of the layout and timing analysis, the
cores are cleared for manufacture upon a clean manufacturing
ground rule check 808.
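
The central idea of the FIG. 8 flow, one RTL description mapped through
two different technology libraries, can be sketched as a toy data
model. Nothing here models a real synthesis tool: the classes, the
synthesize function, the file name core.rtl and the width figures are
all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    """Hypothetical technology library."""
    name: str
    drive_width: float  # relative drive transistor width (assumed figure)


@dataclass(frozen=True)
class GateNetlist:
    core: str
    rtl: str
    library: Library


def synthesize(core: str, rtl: str, library: Library) -> GateNetlist:
    """Stand-in for gate level synthesis (like steps 802 and 803)."""
    return GateNetlist(core=core, rtl=rtl, library=library)


def design_flow(rtl, high_cores, low_cores, high_lib, low_lib):
    """Map the same RTL to every core, varying only the library."""
    netlists = [synthesize(core, rtl, high_lib) for core in high_cores]
    netlists += [synthesize(core, rtl, low_lib) for core in low_cores]
    return netlists


HIGH_LIB = Library("high power design library", drive_width=2.0)
LOW_LIB = Library("low power design library", drive_width=1.0)

netlists = design_flow("core.rtl", ["hp0", "hp1"], ["lp0"], HIGH_LIB, LOW_LIB)
```

Every resulting netlist refers to the same RTL, so the logic design and
instruction set are shared, while the library alone determines the
power/performance design point.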
Processes taught by the discussion above may be performed
with program code such as machine-executable
instructions that cause a machine that executes these
instructions to perform certain functions. In this context, a
"machine" may be a machine that converts intermediate
form (or "abstract") instructions into processor specific
instructions (e.g., an abstract execution environment such as
a "virtual machine" (e.g., a Java Virtual Machine), an
interpreter, a Common Language Runtime, a high-level
language virtual machine, etc.), and/or, electronic circuitry
disposed on a semiconductor chip (e.g., "logic circuitry"
implemented with transistors) designed to execute instructions
such as a general-purpose processor and/or a special-purpose
processor. Processes taught by the discussion above
may also be performed by (in the alternative to a machine or
in combination with a machine) electronic circuitry designed
to perform the processes (or a portion thereof) without the
execution of program code.
It is believed that processes taught by the discussion
above may also be described in source level program code
in various object-orientated or non-object-orientated computer
programming languages (e.g., Java, C#, VB, Python,
C, C++, J#, APL, Cobol, Fortran, Pascal, Perl, etc.) supported
by various software development frameworks (e.g.,
Microsoft Corporation's .NET, Mono, Java, Oracle
Corporation's Fusion, etc.). The source level program code may be
converted into an intermediate form of program code (such
as Java byte code, Microsoft Intermediate Language, etc.)
that is understandable to an abstract execution environment
(e.g., a Java Virtual Machine, a Common Language Runtime,
a high-level language virtual machine, an interpreter,
etc.) or may be compiled directly into object code.
`According to various approaches the abstract execution
`environment may convert the intermediate form program
`code into processor specific code by, 1) compiling the
`inte