`
`US010049080B2
`
`(12) United States Patent
`George et al.
`
`(10) Patent No.: US 10,049,080 B2
`(45) Date of Patent: Aug. 14, 2018
`
`(54) ASYMMETRIC PERFORMANCE
`MULTICORE ARCHITECTURE WITH SAME
`INSTRUCTION SET ARCHITECTURE
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`(71) Applicant: Intel Corporation, Santa Clara, CA
`(US)
`
`(72) Inventors: Varghese George, Folsom, CA (US);
`Sanjeev S. Jahagirdar, Folsom, CA
`(US); Deborah T. Marr, Portland, OR
`(US)
`
`(73) Assignee: INTEL CORPORATION, Santa Clara,
`CA (US)
`
`(*) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 15/431,527
`
`(22) Filed: Feb. 13, 2017
`
`(65) Prior Publication Data
`
`US 2017/0154012 A1    Jun. 1, 2017
`
`Related U.S. Application Data
`
`(63) Continuation of application No. 13/335,257, filed on
`Dec. 22, 2011, now Pat. No. 9,569,278.
`
`(51) Int. Cl.
`G06F 15/80 (2006.01)
`G06F 13/40 (2006.01)
`G06F 1/32 (2006.01)
`
`(52) U.S. Cl.
`CPC ............ G06F 15/80 (2013.01); G06F 1/3206
`(2013.01); G06F 1/3293 (2013.01); G06F
`1/3296 (2013.01); G06F 13/4022 (2013.01)
`
`(58) Field of Classification Search
`None
`See application file for complete search history.
`
`7,992,020 B1       8/2011  Tuan et al.
`2006/0095807 A1*   5/2006  Grochowski ........... G06F 1/206, 713/324
`2006/0279152 A1*  12/2006  Ha ........................ G06F 1/3203, 310/114
`2006/0282692 A1   12/2006  Oh
`2008/0127192 A1    5/2008  Capps et al.
`2008/0263324 A1*  10/2008  Sutardja ................ G06F 1/3203, 712/43
`2008/0288748 A1   11/2008  Sutardja et al.
`2009/0055826 A1    2/2009  Bernstein et al.
`(Continued)
`
`FOREIGN PATENT DOCUMENTS
`
`CN  101076770 A  11/2007
`
`OTHER PUBLICATIONS
`
`Final Office Action from U.S. Appl. No. 13/335,257, dated May 5,
`2015, 13 pages.
`
`(Continued)
`
`Primary Examiner - Eric Coleman
`(74) Attorney, Agent, or Firm - Nicholas De Vos
`Webster & Elliott LLP
`
`(57)
`
`ABSTRACT
`
`A method is described that entails operating enabled cores of
`a multi-core processor such that both cores support respective
`software routines with a same instruction set, a first core
`being higher performance and consuming more power than
`a second core under a same set of applied supply voltage and
`operating frequency.
`
`24 Claims, 8 Drawing Sheets
`
`[Representative drawing: FIG. 6, a first power management method
`(blocks 601, 602, 604, 606 and demand inset 610); see Sheet 6 of 8.]
`
`Petitioner Samsung Ex-1001, 0001
`
`

`

`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`2009/0271646 A1*  10/2009  Talwar .................. G06F 1/3203, 713/322
`2009/0307512 A1   12/2009  Munjal et al.
`2009/0328055 A1   12/2009  Bose et al.
`2010/0058086 A1    3/2010  Lee
`2010/0083011 A1    4/2010  Onouchi et al.
`2010/0131781 A1*   5/2010  Memon ................. G06F 1/3209, 713/310
`2010/0153954 A1    6/2010  Morrow et al.
`2011/0093733 A1*   4/2011  Kruglick ............... G06F 1/3203, 713/340
`2011/0239015 A1    9/2011  Boyd et al.
`2011/0252260 A1*  10/2011  Flachs ................... G06F 1/3287, 713/324
`2012/0117403 A1    5/2012  Bieswanger et al.
`2012/0260258 A1*  10/2012  Regini .................. G06F 9/5094, 718/104
`
`OTHER PUBLICATIONS
`
`First Office Action from foreign counterpart China Patent Application
`No. 201280063860, dated Dec. 21, 2015, 19 pages.
`Non-Final Office Action from U.S. Appl. No. 13/335,257, dated Jan.
`12, 2015, 15 pages.
`Non-Final Office Action from U.S. Appl. No. 13/335,257 dated May
`26, 2016, 10 pages.
`Notice of Allowance from U.S. Appl. No. 13/335,257 dated Sep. 27,
`2016, 6 pages.
`Second Office Action from foreign counterpart China Patent Application
`No. 201280063860, dated Jul. 21, 2016, 12 pages.
`Third Office Action from foreign counterpart China Patent Application
`No. 201280063860, dated Dec. 15, 2016, 31 pages.
`Notice of Allowance from TW counterpart Application No.
`101147200, dated Sep. 29, 2014, 1 page.
`Aruj, Ori. "Evolution: 20 years of switching Fabric", Sep. 2008. EE
`Times. Retrieved from
`http://www.eetimes.com/document.asp?doc_id=1272140.
`PCT International Search Report for PCT Counterpart Application
`No. PCT/US2012/068274, 5 pgs., (dated Feb. 22, 2013).
`PCT Written Opinion of the International Searching Authority for
`PCT Counterpart Application No. PCT/US2012/068274, 6 pgs.,
`(dated Feb. 22, 2013).
`PCT Notification Concerning Transmittal of International Preliminary
`Report on Patentability (Chapter I of the Patent Cooperation
`Treaty) for PCT Counterpart Application No. PCT/US2012/068274,
`8 pgs., (dated Jul. 3, 2014).
`Fourth Office Action from foreign counterpart China Patent Application
`No. 201280063860.9, dated Oct. 9, 2017, 10 pages.
`Notice on Grant of Patent Right for Invention from foreign counterpart
`Chinese Patent Application No. 201280063860.9, dated Jan.
`24, 2018, 4 pages.
`
`* cited by examiner
`
`
`

`

`[Sheet 1 of 8: FIG. 1, multicore processor and surrounding computer
`system (prior art). Reference numerals shown: 100_1, 100_2, 100_3,
`101_1 to 101_N, 103_1 to 103_N, 105_1 to 105_X, 106, 108_X, 108_Y.]
`
`

`

`[Sheet 2 of 8: FIG. 2, power management strategy (prior art).
`State 201: all cores enabled with max supply voltage and op. freq.
`(highest performance and power consumption).
`State 202: one core enabled with min supply voltage and op. freq.
`(lowest performance and power consumption).]
`
`

`

`[Sheet 3 of 8: FIG. 3, logic gate drive circuit. Reference numerals
`shown: 302_1, 304, 310, 311, 312, VCC.]
`
`

`

`[Sheet 4 of 8: FIG. 4, multi-core processor with high power cores 402
`and low power core(s) 401.]
`
`

`

`[Sheet 5 of 8: FIG. 5, power consumption plotted against supply
`voltage and/or operating frequency; operating point 505 is labeled.]
`
`

`

`[Sheet 6 of 8: FIG. 6, first power management method.
`601: Multiple high power cores operational and at least one low
`power core operational.
`602: In the face of continued drop off in demand, each time demand
`falls below a next threshold, disable a next high power core.
`604, 606: In the face of continued drop off in demand, each time
`demand falls below a next threshold, disable next low power core
`until one low power core is operating.
`Inset 610 (demand axis): 7 H.P. cores enabled; 6 H.P. cores enabled;
`...; 1 H.P. core enabled (threshold 614); 0 H.P. cores enabled, all
`L.P. cores enabled (615); 1 L.P. core enabled (606).]
`
`

`

`[Sheet 7 of 8: FIG. 7, second power management method.
`701: Single low power core is operational.
`702: In the face of continued increase in demand, each time demand
`rises above a next threshold, enable a next low power core until all
`low power cores are enabled.
`703: In the face of continued increase in demand, each time demand
`rises above a next threshold, enable a next high power core until all
`high power cores are enabled.
`Inset 710: 1 L.P. core enabled; 2 L.P. cores enabled; 1 H.P. core
`enabled; 2 H.P. cores enabled; demand increments 711 and 712.]
`
`

`

`[Sheet 8 of 8: FIG. 8, design method.
`800: Create high level behavior descriptions for each of the
`processor's cores.
`801: Synthesize into RTL level netlist.
`802: Synthesize into gate level netlist (high power design library).
`Synthesize into gate level netlist (low power design library).
`806: Place, route and timing analysis (one per branch), each
`producing a transistor level netlist.
`807: Design layout.
`808: Verification and ground rule check.]
`
`

`

`FIG. 1 shows a multicore processor and surrounding
`computer system (prior art);
`FIG. 2 shows a power management strategy (prior art);
`FIG. 3 shows a logic gate drive circuit;
`FIG. 4 shows a multi-core processor having high power and
`low power cores that support the same instruction set;
`FIG. 5 compares power consumption of a high power core
`and low power core;
`FIG. 6 shows a first power management method;
`FIG. 7 shows a second power management method;
`FIG. 8 shows a design method.
`
`DETAILED DESCRIPTION
`
`Overview
`
`ASYMMETRIC PERFORMANCE
`MULTICORE ARCHITECTURE WITH SAME
`INSTRUCTION SET ARCHITECTURE
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`The present patent application is a continuation application
`claiming priority from U.S. patent application Ser. No.
`13/335,257, filed Dec. 22, 2011, and titled: "Asymmetric
`Performance Multicore Architecture with Same Instruction
`Set Architecture", which is incorporated herein by reference
`in its entirety.
`
`BACKGROUND
`
`
`Field of Invention
`The field of invention relates generally to computing
`system architecture, and, more specifically, to an asymmetric
`performance multicore architecture with same instruction
`set architecture (ISA).
`Background
`FIG. 1 shows a typical multi-core processor 100_1. As
`observed in FIG. 1, the multi-core processor 100_1 includes
`a plurality of processor cores 101_1 to 101_N on a same
`semiconductor die 100_1. Each of the processor cores
`typically contains at least one caching layer for caching data
`and/or instructions. A switch fabric 102 interconnects the
`processor cores 101_1 to 101_N to one another and to one
`or more additional caching layers 103_1 to 103_N. According
`to one approach, the processors 101_1 to 101_N and the
`one or more caching layers have internal coherency logic to,
`for example, prevent two different cores from concurrently
`modifying the same item of data.
`A system memory interface (which may also include
`additional coherency logic) 104 is also included. Here, if a
`core requests a specific cache line having a needed instruction
`or item of data, and the cache line is not found in any
`of the caching layers, the request is presented to the system
`memory interface 104. If the looked for cache line is not in
`the system memory 105_1 that is directly coupled to interface
`104, the request is forwarded through system network
`interface 106 to another multi-core processor to fetch the
`desired data/instruction from its local system memory (e.g.,
`system memory 105_X of multi-core processor 100_X). A
`packet switched network 107 exists between the multi-processor
`cores 100_1 to 100_X to support these kinds of
`system memory requests.
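The lookup order described above (caching layers, then the directly coupled system memory, then the system memory of another processor) can be sketched as follows. This is a minimal illustration only: the function name `fetch_line` and the use of dictionaries to stand in for caches and memories are assumptions made for the example, and coherency logic is not modeled.

```python
def fetch_line(address, cache_layers, local_memory, remote_memories):
    """Return (value, where_found) following the lookup order in the text.

    cache_layers, local_memory and remote_memories are hypothetical
    stand-ins (plain dicts) for the caching layers, system memory 105_1
    and the system memories of other multi-core processors.
    """
    # 1) Search every caching layer first.
    for layer in cache_layers:
        if address in layer:
            return layer[address], "cache"
    # 2) On a miss, the request goes to the system memory interface.
    if address in local_memory:
        return local_memory[address], "local memory"
    # 3) Otherwise the request is forwarded over the network to other processors.
    for memory in remote_memories:
        if address in memory:
            return memory[address], "remote memory"
    raise KeyError(address)
```

For instance, a line that is present only in another processor's memory is found on the third step and reported as coming from remote memory.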
`Interfaces to system I/O components 108_1 to 108_Y
`(e.g., deep non volatile storage such as a hard disk drive,
`printers, external network interfaces, etc.) are also included
`on the multi-processor core. These interfaces may take the
`form of high speed link interfaces such as high speed
`Ethernet interfaces and/or high speed PCIe interfaces.
`Some multi core processors may also have a port 105 to
`the switch fabric 102 to scale upwards the number of
`processor cores associated with a same (also scaled upward)
`caching structure. For example, as observed in FIG. 1,
`multi-processor cores 101_1 and 101_2 are coupled through the
`switch fabric port 105 to effectively form a platform of 2N
`cores that share a common caching structure (processor
`100_2 is coupled to processor 100_1 through a similar port
`to its switch fabric).
`
`Detailed Description
`Computing system power consumption is becoming more
`and more of a concern. As such, a number of different power
`management schemes are incorporated into modern day
`computing systems. Typically, the power management component
`of the system will scale up the processing performance
`of the system as the system's workload increases,
`and scale down the processing performance of the system as
`the system's workload decreases. Decreasing the processing
`performance of the system corresponds to power savings as
`the power consumption of the system is strongly correlated
`with its performance capabilities.
`A typical way to scale processing performance and power
`consumption with workload is to enable/disable entire cores
`and raise/lower their supply voltages and operating frequencies
`in response to system workload. For example, as
`observed in FIG. 2, under a maximum performance and
`power consumption state 201 all cores are enabled and each
`core is provided with a maximum supply voltage and
`maximum clock frequency. By contrast, under a minimum
`performance and power consumption state 202 (at which
`program code can still be executed), only one core is
`enabled. The single core is provided with a minimum supply
`voltage and minimum operating frequency.
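As a rough sketch of the scaling just described, the function below maps a normalized workload onto a state between 202 (one core, minimum voltage and frequency) and 201 (all cores, maximum voltage and frequency). The names `PowerState` and `prior_art_state`, the normalized `load` input, and the linear interpolation between the two end states are all assumptions made for illustration; the text only defines the two end states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    enabled_cores: int
    supply_voltage_v: float
    frequency_hz: float


def prior_art_state(load, n_cores, v_min, v_max, f_min, f_max):
    """Pick a state for a workload in [0, 1] (hypothetical linear policy)."""
    load = min(max(load, 0.0), 1.0)        # clamp to the valid range
    cores = max(1, round(load * n_cores))  # at least one core stays enabled
    return PowerState(
        enabled_cores=cores,
        supply_voltage_v=v_min + load * (v_max - v_min),
        frequency_hz=f_min + load * (f_max - f_min),
    )
```

With `load=0.0` the sketch returns the equivalent of state 202, and with `load=1.0` the equivalent of state 201.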
`Some basic concepts of electronic circuit power consumption
`are observed in FIG. 3. Here, the driver circuit 310
`portion of a logic gate 311 is observed driving a next one or
`more logic gate(s) 312. Specifically, the speed of operation
`of interconnected logic gates 311, 312 rises as the width of
`its driving transistors 302_1, 302_2 (measured, for each
`transistor, along the semiconductor surface perpendicular to
`the direction of current flow) increases and the capacitance
`303 of the line 304 (and input capacitance of the load logic
`gate(s) 312) it is driving decreases. Here, in order to raise the
`voltage on the line from a logic low level to a logic high
`level, a sufficiently strong current 305 needs to be driven by
`the source transistor 302_1 through the line to rapidly apply
`charge to the capacitance 303 (and thereby raise the voltage
`on the line). Similarly, in order to lower the voltage on the
`line from a logic high level to a logic low level, a sufficiently
`strong current 306 needs to be "sunk" by the sink transistor
`302_2 through the line to rapidly draw charge off the
`capacitance (and thereby lower the voltage on the line).
`Essentially, the amount of current the transistors 302_1,
`302_2 will source/sink is a function of their respective
`widths. That is, the wider the transistors are, the more
`current they will source/sink. Moreover, the amount of
`current the transistors 302_1, 302_2 will source/sink is also
`a function of the supply voltage VCC that is applied to the
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The present invention is illustrated by way of example
`and not limitation in the figures of the accompanying
`drawings, in which like references indicate similar elements
`and in which:
`
`driver circuit 310 observed in FIG. 3. Essentially, the higher
`the supply voltage, the stronger the source/sink currents will
`be.
`Further still, the rate at which the transistors will be able
`to apply/draw charge to/from the capacitor is a function of
`the size of the capacitance 303 of the line 304 being driven.
`Specifically, the transistors will apply/draw charge slower as
`the capacitance 304 increases and apply/draw charge faster
`as the capacitance 304 decreases. The capacitance 304 of the
`line is based on its physical dimensions. That is, the capacitance
`304 increases the longer and wider the line is, and by
`contrast, the capacitance 304 decreases the shorter and
`narrower the line is. The line itself is of fixed dimensions
`once the circuit is manufactured. Nevertheless, line width
`and line length are design parameters that designers must
`account for. The width of the line cannot be narrowed too
`much or else it will have the effect of increasing the line's
`resistance which will also slow down the rate of charge
`applied/drawn to/from the capacitor.
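The dependence of switching speed on drive current and line capacitance can be illustrated with the idealized relation t = C * dV / I for a constant drive current. This relation is a textbook simplification assumed here for illustration, not a formula given in the text, and it ignores the line resistance mentioned above. The function and parameter names are hypothetical.

```python
def charge_time_s(capacitance_f, delta_v, drive_current_a):
    """Time to change a line's voltage by delta_v with a constant current.

    Idealized model (assumption): t = C * dV / I, with no line resistance.
    """
    if drive_current_a <= 0:
        raise ValueError("drive current must be positive")
    return capacitance_f * delta_v / drive_current_a
```

Under this model, doubling the capacitance doubles the transition time, while doubling the drive current (for example with wider transistors or a higher supply voltage) halves it.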
`A final speed factor is the frequency of the signal itself on
`the line. Essentially, circuits driven with a faster clock signal
`will more rapidly switch between applying and drawing
`charge to/from the line capacitance 304 than circuits with a
`slower clock signal. Here, more rapid switching corresponds
`to a circuit that is sending binary information faster.
`All of the factors described above for increasing the rate
`at which the charge on the capacitor is applied/drawn also
`lead to a circuit that consumes more power. That is, a circuit
`that is designed to have relatively wide source/sink transistors,
`a high supply voltage, short load lines and receive a
`higher frequency clock signal will operate faster and therefore
`consume more power than circuits oppositely oriented
`as to these same parameters.
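A common first-order way to quantify this trade-off is the CMOS switching power approximation P = a * C * V^2 * f, where a is the fraction of the capacitance switched per cycle. The text does not state this formula; it is assumed here as a standard approximation, and the function and parameter names are hypothetical. In this model wider transistors show up as a larger switched capacitance C, while a shorter load line lowers C and raises power only indirectly, by allowing a higher clock frequency.

```python
def dynamic_power_w(activity, switched_capacitance_f, supply_voltage_v, frequency_hz):
    """First-order switching power estimate: P = a * C * V^2 * f (assumed model)."""
    return activity * switched_capacitance_f * supply_voltage_v ** 2 * frequency_hz


def power_ratio(v_high, f_high, v_low, f_low):
    """Ratio of switching power between two operating points of one circuit."""
    return (v_high ** 2 * f_high) / (v_low ** 2 * f_low)
```

Because voltage enters squared, lowering the supply voltage and the clock frequency together reduces power much faster than lowering frequency alone.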
`Recalling the discussion of FIGS. 1 and 2, note that prior
`art multi core processor power management schemes have
`been implemented on processors whose constituent cores are
`identical. That is, referring to FIG. 1, all of cores 101_1 to
`101_N are identical in design. In other approaches, the cores
`are not identical but are radically different. Specifically, one
`of the cores is a low power core but the lower power
`characteristic is achieved by stripping out sizable chunks of
`logic circuitry as compared to the other cores. More specifically,
`the sizable chunks that are stripped out correspond
`to the logic that executes the program code instructions. Said
`another way, the low power core supports a reduced instruction
`set as compared to the higher performance cores. A
`problem with this approach, however, is that it is difficult for
`system software to switch operation between processor
`cores having different instruction sets.
`FIG. 4 depicts a new approach in which at least one of the
`cores 401 is designed to be lower performance and therefore
`consume less power than other cores 402 in the processor.
`However, the lower power core(s) 401 has a same logic
`design as the higher power core(s) 402 and therefore supports
`the same instruction set 403 as the high power core(s)
`402. The low power core(s) 401 achieve a lower power
`design point by having narrower drive transistor widths than
`the higher power core(s) and/or having other power consumption
`related design features, such as any of those
`discussed above with respect to FIG. 3, that are oppositely
`oriented than the same design features in the higher power
`cores.
`According to one approach, discussed in more detail
`below, when the multi-processor core is being designed, the
`same high level description (e.g., the same VHDL or Verilog
`description) is used for both the higher performance/power
`core(s) and the lower performance/power core(s). When the
`higher level descriptions are synthesized into RTL netlists,
`however, for the subsequent synthesis from an RTL netlist
`into a transistor level netlist, different technology libraries
`are used for the low power core(s) than the high power
`core(s). As alluded to above, the drive transistors of logic
`gates associated with the libraries used for the low power
`core(s) have narrower respective widths than the "same"
`transistors of the "same" logic gates associated with the
`libraries used for the high power cores.
`By design of the multiprocessor, referring to FIG. 5, the
`lower power core(s) exhibit inherently lower power consumption
`(and processing performance) than the higher
`power core(s). That is, for a same applied clock or operating
`frequency, because of its narrower drive transistor widths,
`for example, a lower power core will consume less power
`than a higher power core. Because of the narrower drive
`transistor widths, however, the lower power core has a
`maximum operating frequency that is less than the maximum
`operating frequency of the higher power core.
`The import of the lower power core, however, is that the
`multi-processor is able to entertain a power management
`strategy that is the same/similar to already existing power
`management strategies, yet still achieve an even lower
`power consumption in the lower/lowest performance/power
`states. Specifically, recall briefly power state 202 of FIG. 2
`in which only one core is left operable (the remaining cores
`are disabled). Here, if the one remaining operable core is the
`low power core, the processor will exhibit even lower power
`consumption than the prior art low power state 202.
`The amount of power savings 503 is directly
`observable in FIG. 5. Here, recall that all the processors
`were identical in the multi-processor that was discussed with
`respect to the prior art low power state 202 of FIG. 2. As
`such, even if the supply voltage and operating frequency were
`reduced to a minimum, the power consumption would be
`that of a higher power processor (e.g., having wider drive
`transistor widths). This operating point is represented by
`point 504 of FIG. 5. By contrast, in the lowest power
`operating state of the improved multi-processor, if the
`operable core is a low power core it will consume power
`represented by point 505 of FIG. 5. As such, the improved
`processor exhibits comparatively lower power consumption
`at the lower/lowest performance operating states than the
`prior art multi-processor, while, at the same time, fully
`supporting the instruction set architecture the software is
`designed to operate on.
`FIG. 6 shows a power management process flow that can
`be executed, for example, with power management software
`that is running on the multi-processor (or another multi-processor
`or separate controller, etc.). Conversely, the power
`management process flow of FIG. 6 can be executed entirely
`in hardware on the multi-processor or by some combination
`of such hardware and software.
`According to the process flow of FIG. 6, from an initial
`state 601 where at least some high power processor cores
`and the low power core(s) are operating, in response to a
`continued drop in demand on the multi-processor, another
`high power core is disabled each time the continued drop in
`demand falls below some next lower threshold. For
`example, in a multi-processor core having sixteen cores
`where fourteen cores are high power cores and two cores are
`low power cores, the initial state 601 may correspond to a
`state where seven of the high power cores and both of the
`low power cores are operational.
`In response to continued lower demand placed on the
`multi-processor, the seven high power cores will be disabled
`one by one with each new lower demand threshold 602. For
`instance, as observed at inset 610, demand level 611 justifies
`enablement of the seven high power cores and both low
`power cores. As the demand continually drops to a next
`lower threshold 612, one of the high power cores is disabled
`613 leaving six operable high power cores and two low
`power cores.
`Before the high power core is disabled, as a matter of
`designer choice, the core's individual operating frequency,
`or the operating frequency of all (or some of) the enabled
`high power cores, or the operating frequency of all (or some
`of) the enabled high power cores and the low power cores
`may be lowered to one or more lower operating frequency
`levels.
`A similar designer choice exists with respect to the supply
`voltages applied to the cores. That is, before the high power
`core is disabled, as a matter of designer choice, the core's
`individual supply voltage, or the supply voltage of all (or
`some of) the enabled high power cores, or the supply voltage
`of all (or some of) the enabled high power cores and the low
`power cores may be lowered to one or more lower supply
`voltages. Supply voltages may be lowered in conjunction
`with the lowering of operating frequency, or, just one or
`none of these parameters may be lowered as described
`above.
`Eventually, with the continued drop in demand, the last
`remaining high power core will be disabled 615 after
`demand falls below some lower threshold 614. This leaves
`only the low power cores in operation. Operating frequency
`and/or supply voltage of the low power core(s) may likewise
`be lowered as demand continues to drop beneath level 614.
`With continued drop in demand a similar process of disabling
`cores as demand falls below each next lower demand
`threshold 604 continues until the multi-processor core is left
`with only one low power core remaining as its sole operating
`core 606.
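The threshold-driven disabling of FIG. 6 can be sketched as a function from a demand level to the number of enabled cores of each type. The function name `cores_for_demand`, the representation of thresholds as ascending lists, and the numeric demand values in the usage note are assumptions made for illustration; the text does not specify how thresholds are stored or what their values are, and frequency/voltage lowering is not modeled.

```python
def cores_for_demand(demand, hp_thresholds, lp_thresholds):
    """Return (high_power_enabled, low_power_enabled) for a demand level.

    hp_thresholds: one demand threshold per high power core; a high power
        core is disabled once demand falls below its threshold.
    lp_thresholds: one threshold per low power core beyond the first; the
        last low power core always stays enabled (state 606).
    All low power thresholds are assumed to lie below the lowest high
    power threshold, so high power cores are disabled first.
    """
    high_power = sum(1 for t in hp_thresholds if demand >= t)
    low_power = 1 + sum(1 for t in lp_thresholds if demand >= t)
    return high_power, low_power
```

With seven high power thresholds `[30, 40, 50, 60, 70, 80, 90]` and one low power threshold `[15]`, a demand of 95 keeps seven high power and two low power cores enabled, a demand of 85 drops to six high power cores, a demand of 25 leaves only the two low power cores, and a demand of 10 leaves a single low power core.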
`State 606 is reached of course with the disablement of the
`last high power core in implementations where the processor
`only has one lower power core. Again supply voltage and/or
`operating frequency of the sole remaining low power core
`may be lowered as demand continues to fall. Importantly, in
`state 606, as discussed above, the multi-processor will
`exhibit lower power consumption than other multi-processor
`cores having an identical power management scheme but
`whose constituent cores are all high power cores. Even
`lower power consumption can be provided for in state 606
`if the sole operating low power core is provided with a lower
`supply voltage and/or lower operating frequency than the
`lowest operating supply voltage and/or operating frequency
`applied to the high power cores.
`No special adjustment needs to be made by or for application
`software, virtual machine or virtual machine monitor
`when the system is running only on the low power core(s)
`after all the high power cores are disabled. Again, the
`preservation of the same instruction set across all cores in
`the system corresponds to transparency from the software's
`perspective as to the underlying cores. Lower performance
`may be recognized with the lower power cores but no special
`adjustments as to the content of the instruction streams should be
`necessary. In various alternate implementations: 1) the
`hardware/machine readable firmware can monitor and control
`the core mix; or, 2) the hardware can relinquish control to
`the operating system and let it monitor the demand and
`control the core mix.
`FIG. 7 shows essentially a reverse of the processes
`described above. As observed in FIG. 7, starting from a state
`in which only a single low power core is operating 701,
`additional low power cores are enabled (if any more) 702 as
`demand on the multi-processor continually increases.
`Eventually, high power cores are enabled 703. Notably, the
`demand threshold needed to enable a next processor from an
`operating low power processor may correspond to a lower
`demand increment than the demand threshold needed to
`enable a next processor from an operating high power
`processor.
`That is, inset 710 shows the increase in demand 711
`needed after a low power processor is first enabled to trigger
`the enablement of a next processor in the face of increased
`demand. The increase in demand 712 needed after a high
`power processor is first enabled to trigger enablement of a
`next high power processor in the face of increased demand
`is greater than the aforementioned demand 711. This is so
`because a high power processor is able to handle more total
`demand than a low power processor and therefore does not
`need to have additional "help" as soon as a low power
`processor does.
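The asymmetric demand increments of inset 710 can be sketched by generating the demand levels at which each additional core is enabled. The function name `enable_thresholds`, the use of a single fixed increment per core type (`lp_step` for 711, `hp_step` for 712), and the example numbers are assumptions made for illustration.

```python
def enable_thresholds(n_low, n_high, lp_step, hp_step):
    """List (core_type, demand_level) pairs for enabling each further core.

    Starts from state 701 (one low power core already enabled). After a
    low power core is enabled, lp_step more demand triggers the next
    core; after a high power core is enabled, hp_step more demand is
    needed. Both increments are hypothetical parameters.
    """
    if lp_step >= hp_step:
        raise ValueError("low power increment should be the smaller one")
    thresholds = []
    level = 0
    # Remaining low power cores come first (702).
    for _ in range(1, n_low):
        level += lp_step
        thresholds.append(("low", level))
    # The first high power core follows the last low power core (703).
    if n_high > 0:
        level += lp_step
        thresholds.append(("high", level))
    # Each further high power core needs the larger increment.
    for _ in range(1, n_high):
        level += hp_step
        thresholds.append(("high", level))
    return thresholds
```

For two low power cores, three high power cores, and increments of 10 and 25, the second low power core is enabled at demand 10 and the high power cores at 20, 45 and 70.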
`Operating frequency and/or supply voltage may also be
`increased in conjunction with the enablement of cores in the
`face of increased demand in a logically inverse manner to
`that discussed above with respect to the disablement of
`cores.
`FIG. 8 shows a design process for designing a multi-core
`processor consistent with the principles discussed above. As
`part of the design process, high level behavioral descriptions
`800 (e.g., VHDL or Verilog descriptions) for each of the
`processor's cores are synthesized into a Register Transfer
`Level (RTL) netlist 801. The RTL netlist is synthesized 802
`into corresponding higher power core gate level netlist(s)
`(one for each high power core) with libraries corresponding
`to a higher power/performance design (such as logic circuits
`having wider drive transistors). The RTL netlist is also
`synthesized 803 into corresponding lower power core gate
`level netlist(s) (one for each low power core) with libraries
`corresponding to a lower power/performance design (such
`as logic circuits having narrower drive transistors). Here, the
`logic designs for the high power and low power cores are the
`same but the designs of their corresponding logic circuits
`have different performance/power design points.
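The key idea of this flow, one RTL description mapped through two different libraries, can be sketched as a small data model. This is not an EDA tool flow: the class names, the `synthesize` helper, the file name `core.v`, and the drive width values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellLibrary:
    name: str
    drive_width_um: float  # hypothetical drive transistor width


@dataclass(frozen=True)
class GateNetlist:
    core: str
    rtl_source: str
    library: CellLibrary


def synthesize(rtl_source, core_names, library):
    """Produce one gate level netlist per core from the same RTL source."""
    return [GateNetlist(core, rtl_source, library) for core in core_names]


# Same logic design for every core; only the target library differs.
HIGH_POWER_LIB = CellLibrary("high_power", drive_width_um=0.40)
LOW_POWER_LIB = CellLibrary("low_power", drive_width_um=0.20)

high_netlists = synthesize("core.v", ["hp0", "hp1"], HIGH_POWER_LIB)
low_netlists = synthesize("core.v", ["lp0"], LOW_POWER_LIB)
```

Because every netlist records the same `rtl_source`, the sketch makes explicit that the instruction set is shared while the performance/power design point comes only from the library.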
`The transistor level netlists for the respective cores are
`then used as a basis for performing a respective place, route
`and timing analysis 806 and design layout 807. Here, the
`lower power/performance cores may have more relaxed
`placement and timing guidelines owing to the larger permissible
`propagation delay through and between logic circuits.
`Said another way, recalling from the discussion of
`FIG. 3 that longer load lines result in slower rise and fall
`times, the lower performance cores may permit longer load
`lines between transistors and gates because these cores are
`designed to have slower operation (of course, if load lines
`are increased too much along with the inclusion of narrower
`drive transistors, the drop in performance may be more than
`desired).
`Upon completion of the layout and timing analysis, the
`cores are cleared for manufacture upon a clean manufacturing
`ground rule check 808.
`Processes taught by the discussion above may be performed
`with program code such as machine-executable
`instructions that cause a machine that executes these instructions
`to perform certain functions. In this context, a
`"machine" may be a machine that converts intermediate
`form (or "abstract") instructions into processor specific
`instructions (e.g., an abstract execution environment such as
`a "virtual machine" (e.g., a Java Virtual Machine), an
`interpreter, a Common Language Runtime, a high-level
`language virtual machine, etc.), and/or electronic circuitry
`disposed on a semiconductor chip (e.g., "logic circuitry"
`implemented with transistors) designed to execute instructions
`such as a general-purpose processor and/or a special-purpose
`processor. Processes taught by the discussion above
`may also be performed by (in the alternative to a machine or
`in combination with a machine) electronic circuitry designed
`to perform the processes (or a portion thereof) without the
`execution of program code.
`It is believed that processes taught by the discussion
`above may also be described in source level program code
`in various object-orientated or non-object-orientated computer
`programming languages (e.g., Java, C#, VB, Python,
`C, C++, J#, APL, Cobol, Fortran, Pascal, Perl, etc.) supported
`by various software development frameworks (e.g.,
`Microsoft Corporation's .NET, Mono, Java, Oracle Corporation's
`Fusion, etc.). The source level program code may be
`converted into an intermediate form of program code (such
`as Java byte code, Microsoft Intermediate Language, etc.)
`that is understandable to an abstract execution environment
`(e.g., a Java Virtual Machine, a Common Language Runtime,
`a high-level language virtual machine, an interpreter,
`etc.) or may be compiled directly into object code.
`According to various approaches the abstract execution
`environment may convert the intermediate form program
`code into processor specific code by, 1) compiling the
`inte
