Vajda

(10) Patent No.:     US 8,984,523 B2
(45) Date of Patent: Mar. 17, 2015

US008984523B2

(54) METHOD FOR EXECUTING SEQUENTIAL CODE ON THE SCALABLE PROCESSOR AT
     INCREASED FREQUENCY WHILE SWITCHING OFF THE NON-SCALABLE PROCESSOR
     CORE OF A MULTICORE CHIP

(75) Inventor: Andras Vajda, Sundsberg (FI)

(73) Assignee: Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)

(*) Notice: Subject to any disclaimer, the term of this patent is extended
     or adjusted under 35 U.S.C. 154(b) by 276 days.

(21) Appl. No.: 13/321,228

(22) PCT Filed: May 26, 2009

(86) PCT No.: PCT/SE2009/050594
     § 371 (c)(1), (2), (4) Date: Nov. 18, 2011

(87) PCT Pub. No.: WO 2010/138031
     PCT Pub. Date: Dec. 2, 2010

(65) Prior Publication Data
     US 2012/0060170 A1    Mar. 8, 2012

(51) Int. Cl.
     G06F 9/50  (2006.01)
     G06F 9/48  (2006.01)
     G06F 1/32  (2006.01)

(52) U.S. Cl.
     USPC .......... 718/104; 713/320; 713/323; 713/324

(58) Field of Classification Search
     CPC .... G06F 9/4893; G06F 1/3287; G06F 1/3203; Y02B 60/144; Y02B 60/1217
     USPC .......... 718/104; 713/320, 324
     See application file for complete search history.

(56) References Cited
`
U.S. PATENT DOCUMENTS

6,986,068 B2*   1/2006   Togawa ..................... 713/320
7,093,147 B2*   8/2006   Farkas et al. .............. 713/320
(Continued)

FOREIGN PATENT DOCUMENTS

WO   WO 2005/010737 A2 *   2/2005
WO   WO 2007/019003 A      2/2007

OTHER PUBLICATIONS

Annavaram et al., "Mitigating Amdahl's Law through EPI Throttling," 2005, In Proceedings of the International Symposium on Computer Architecture, pp. 298-309.*
(Continued)

Primary Examiner — Meng An
Assistant Examiner — Abu Ghaffari
`
(57) ABSTRACT

Method and scheduler in an operating system, for scheduling processing resources on a multi-core chip. The multi-core chip comprises a plurality of processor cores. The operating system is configured to schedule processing resources to an application to be executed on the multi-core chip. The method comprises allocating a plurality of processor cores to the application. Also, the method comprises switching off another processor core allocated to the application, not executing the sequential portion of the application, when a sequential portion of the application is executing on only one processor core. In addition, the method comprises increasing the frequency of the one processor core executing the application to a second frequency, such that the processing speed is increased more than predicted by Amdahl's law.

12 Claims, 6 Drawing Sheets
`
(Representative drawing - flow chart:)
401: Allocate a plurality of processor cores to the application.
Sequential code portion executing? (No: wait / Yes:)
402: Switch off other processor cores allocated to the application, not executing the sequential portion of the application.
Available power budget sufficient? (No:)
403: Decrease frequency of some scalable core of other low-priority application.
(Yes:)
404: Increase frequency of the one processor core executing the application code, such that the processing speed is increased more than predicted by Amdahl's law.
405: Restore the frequency of the processor cores allocated to the application, to the respective earlier operating frequency.
END
`
`Patent Owner Daedalus Prime LLC
`Exhibit 2003 - Page 1 of 15
`
`
`
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`7,937,568 B2*
`2003/0O888OO A1*
`2005/0223249 A1*
`2007/022O294 A1*
`2007/0288769 A1*
`2008.0005592 A1
`
`5, 2011
`5/2003
`10, 2005
`9, 2007
`12, 2007
`1, 2008
`
`
`
`Correale et al. .............. T12/221
`Cai .............
`713,320
`Samson ........................ T13,320
`Lippett ......................... T13,320
`Chang et al. .................. T13,300
`Allarey et al.
`
`OTHER PUBLICATIONS
`
`Kumar et al., Hetrogeneous Chip Multiprocessors, 2005, IEEE, pp.
`32-38.*
`Kumar, R., et al., “A multi-core approach to addressing the energy
`complexity problem in microprocessors.” In Workshop On Complex
`ity-Effective Design, Jun. 2003.
`
`* cited by examiner
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 1 of 6          US 8,984,523 B2

Fig. 1 - Speed-up according to Amdahl's law (speed-up versus number of cores: 8, 16, 32, 64, 256, 1024, 2048)
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 2 of 6          US 8,984,523 B2

Fig. 2 - Computer system: User Layer 215 with Application software 210; Operating System 225 with Scheduler 220; Hardware Layer 235 with Multi-core chip 230, comprising processor core 231 and processor cores 232-1 to 232-7.
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 3 of 6          US 8,984,523 B2

Fig. 3 - Multi-core chip 230, comprising scalable processor cores 231-1 to 231-4 and non-scalable processor cores 232-1 to 232-27.
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 4 of 6          US 8,984,523 B2

Fig. 4 - flow chart:
START
401: Allocate a plurality of processor cores to the application.
Sequential code portion executing? (No: wait / Yes:)
402: Switch off other processor cores allocated to the application, not executing the sequential portion of the application.
Available power budget sufficient? (No:)
403: Decrease frequency of some scalable core of other low-priority application.
(Yes:)
404: Increase frequency of the one processor core executing the application code, such that the processing speed is increased more than predicted by Amdahl's law.
405: Restore the frequency of the processor cores allocated to the application, to the respective earlier operating frequency.
END
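The steps of the flow chart above can be restated as a short scheduling sketch. The following Python is purely illustrative: the `Scheduler` class, its fields and the `boost` factor are assumptions for the sake of the example, not part of the patent.

```python
# Illustrative sketch of the Fig. 4 scheduling steps (401-405).
# All class and attribute names are hypothetical; the patent defines no code.

class Scheduler:
    def __init__(self, chip):
        self.chip = chip  # dict: core_id -> {"on": bool, "freq": float}

    def run_application(self, app, core_ids, boost=2.0):
        # 401: allocate a plurality of processor cores to the application
        allocated = {c: self.chip[c] for c in core_ids}
        for core in allocated.values():
            core["on"], core["freq"] = True, 1.0

        if app.sequential_portion_executing:
            # 402: switch off the other allocated cores, which are not
            # executing the sequential portion of the application
            seq_core = core_ids[0]
            for c in core_ids[1:]:
                allocated[c]["on"] = False

            # 403: if the power budget were insufficient, a scalable core of
            # a lower-priority application would be slowed down here (omitted)

            # 404: increase the frequency of the one remaining core
            allocated[seq_core]["freq"] = boost

        # 405: restore every allocated core to its earlier operating
        # frequency (assumed here to have been the normal 1x speed)
        for c in core_ids:
            allocated[c]["on"], allocated[c]["freq"] = True, 1.0
        return allocated
```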
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 5 of 6          US 8,984,523 B2

Fig. 5 - Scheduler 220, comprising an Allocation unit, a Switching unit 520, an Increasing unit 530 and a Restoring unit 540, with processing resources as input and output.
`
`
`
`
U.S. Patent          Mar. 17, 2015          Sheet 6 of 6          US 8,984,523 B2

Fig. 6 - Speedup according to some embodiments of the present method (speedup versus number of cores)
`
`
`
`
METHOD FOR EXECUTING SEQUENTIAL CODE ON THE SCALABLE PROCESSOR AT INCREASED FREQUENCY WHILE SWITCHING OFF THE NON-SCALABLE PROCESSOR CORE OF A MULTICORE CHIP

TECHNICAL FIELD

The present invention relates to a method and a scheduler in an operating system and, more in particular, to a mechanism for scheduling processing resources on a multi-core chip.

BACKGROUND

The computing industry's development over the last decades has been largely defined by a dramatic increase of the available computing power, driven primarily by the self-fulfilment of Moore's law. However, recently this development has been heavily impacted by the inability of scaling performance further through increased frequency and complexity of processor devices, due primarily to power and cooling issues; as an answer, the industry turned to Chip-MultiProcessors (CMP) or multi-core devices. It is predicted that the advancement of manufacturing technology, in fact, the continued development according to Moore's law, will allow for chips comprising 100s or even 1000s of processor cores.

Currently, operating system schedulers work on the principle of assigning tasks to available hardware resources such as processor cores or hardware threads and sharing a core among multiple applications.

Current approaches have a number of limitations. They do not scale well to a high number of cores within the same chip. Micro-managing each core does not pay off well and may not be economical for simple cores. Further, current solutions do not consider dynamic variations in the processing requirements of applications. An application may at different points in time be executing in single-threaded mode, while at other moments it may require multiple threads of execution.

Chip multiprocessors, however, pose new requirements on operating systems and applications. It appears that current symmetric multi-processing, time-shared operating systems will not scale to several tens, less to hundreds or thousands of processor cores. On the application side, Amdahl's law, even its version for chip multi-processors, will put a limit to the amount of performance increase that can be achieved even for highly parallelized applications, assuming static symmetric or static asymmetric multi-core chips, as illustrated in FIG. 1. For example, an application with only 1% sequential code can achieve, according to Amdahl's law, a theoretical speed-up factor of 7.4 on 8 cores, 14 on 16 cores, but only 72 on 256 cores and 91 on 1024 cores, compared with running on a single core.

Amdahl's law for CMP is expressed as:

Speedup(f, n, r) = 1 / ((1 - f)/perf(r) + f/n)

where f is the fraction of the code that can execute in parallel, n is the number of processor cores, r is the number of processor cores that can be used jointly to execute sequential portions and perf(r) is the resulting performance, assumed to be less than r.

Better scalability, approaching linear for highly parallelized applications, could theoretically be obtained with chips where computing resources could be utilized either as several cores executing in parallel, for the execution of the parallel sections of the applications, or as a single, fast, powerful core for running the single-threaded, sequential portions of the applications. However, so far there is no solution that would scale with the number of cores.

Thus there is a problem to increase the processing speed of application code in a computer.
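As a numeric check of the figures quoted in the background above, the CMP formula can be evaluated directly with f = 0.99 and perf(r) = 1, i.e. sequential code running on a single 1x core (a sketch; the function name is ours):

```python
def speedup_cmp(f, n, perf_r=1.0):
    """Amdahl's law for CMP: Speedup(f, n, r) = 1 / ((1 - f)/perf(r) + f/n),
    where f is the parallel fraction and n is the number of cores."""
    return 1.0 / ((1.0 - f) / perf_r + f / n)

# Application with only 1% sequential code (f = 0.99), perf(r) = 1:
figures = {n: speedup_cmp(0.99, n) for n in (8, 16, 256, 1024)}
```

Evaluating `figures` reproduces, to rounding, the speed-up factors of roughly 7.4, 14, 72 and 91 cited above.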
SUMMARY

It is the object to obviate at least some of the above disadvantages and provide an improved performance of applications executing on a multi-core system.

According to a first aspect, the object is achieved by a method in an operating system for scheduling processing resources on a multi-core chip. The multi-core chip comprises a plurality of processor cores. The operating system is configured to schedule processing resources to an application to be executed on the multi-core chip. The method comprises allocating a plurality of processor cores to the application. Also, the method comprises switching off processor cores allocated to the application, not executing the sequential portion of the application, when a sequential portion of the application is executing on only one processor core. In addition, the method comprises increasing the frequency of the one processor core executing the application, such that the processing speed is increased more than predicted by Amdahl's law.

According to a second aspect, the object is also achieved by a scheduler in an operating system for scheduling processing resources on a multi-core chip. The processing resources are allocated to an application to be executed on the multi-core chip. The multi-core chip comprises a plurality of processor cores. The scheduler comprises an allocation unit. The allocation unit is adapted to allocate a plurality of processor cores to the application. Also, the scheduler comprises a switching unit. The switching unit is adapted to switch off a processor core, allocated to the application, not executing a sequential portion of the application. Further, the scheduler comprises an increasing unit. The increasing unit is adapted to increase the frequency of the one processor core executing the application, such that the processing speed is increased more than predicted by Amdahl's law.

The present solution is based on a space-shared approach for scheduling multiple processes on a many-core chip. Also, the present solution relies on adapting the performance of individual cores, and of the chip as a whole, by controlled and scheduled variation of the frequency at which different cores execute, in contrast with current thread migration strategies. Instead of trying to guess the best thread assignment policy, the operating system relies on application-generated requests for computing resources, similarly to how memory is currently managed. Thereby, a near-linear scaling of speed with the number of processing cores is achieved. Further, the present solution addresses the issue of dynamically adapting processing resources to the requirements of applications. In addition, the present solution does not rely on thread migration, hence improving cache usage characteristics. Thereby, an improved performance within a wireless communication system is achieved.

Other objects, advantages and novel features of the invention will become apparent from the following detailed description of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described in more detail in relation to the enclosed drawings, in which:
`
`
`
`
FIG. 1 is a diagram illustrating speed-up according to Amdahl's law.
FIG. 2 is a schematic block diagram illustrating a system according to some embodiments.
FIG. 3 is a schematic block diagram illustrating a multi-core chip according to some embodiments.
FIG. 4 is a flow chart illustrating embodiments of method steps in an operating system.
FIG. 5 is a block diagram illustrating embodiments of a scheduler in an operating system.
FIG. 6 is a diagram illustrating speed-up according to some embodiments of the present method.
`
DETAILED DESCRIPTION

The invention is defined as a method and a scheduler in an operating system, which may be put into practice in the embodiments described below. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. It should be understood that there is no intent to limit the present method and/or scheduler to any of the particular forms disclosed, but on the contrary, the present methods and schedulers are to cover all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the claims.

The present invention may, of course, be carried out in other ways than those specifically set forth herein without departing from the essential characteristics of the invention. The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

FIG. 2 is a schematic illustration of a computer system 200. The computer system 200 may be situated within a computing device.

The computer system 200 comprises a user layer 215. An application software 210, within the user layer 215, may run in the computer system 200.

The computer system 200 further comprises an operating system 225 and a hardware layer 235.

The operating system 225 comprises a scheduler 220. The scheduler 220 is configured to schedule processing resources.

The hardware layer 235 may comprise a multi-core chip 230. The multi-core chip 230, which may be referred to as a multi-core processor, comprises a plurality of processor cores 231 and 232, although the cores 231, 232 in the multi-core chip 230 may vary in number in other embodiments. The processor cores 231 and 232 may comprise further resources, e.g. a cache memory or an instruction processing engine, just to mention some arbitrary examples of core resources.

The table below summarizes the expected speed-up as well as the suggested levels according to some embodiments. Even for 2048-core systems 200, five frequency levels (0, 1x, 2x, 4x, 8x) may suffice, with an acceptable impact on the theoretical performance. The performance differences are in the range of 1%-25% for applications with degrees of parallelism between 0.5 and 0.999, and higher for applications with a lower degree of parallelism. However, all cores 231, 232, scalable or not, shall support at least levels 0 and 1x, i.e. it may be possible to switch off any core 231, 232.

Number of cores    Theoretical speed-up    Suggested factor
8                  2                       2
16                 2.6                     2
32                 3.2                     2
64                 4                       4
256                6.4                     4
1024               10.1                    8
2048               12.7                    8

Embodiments of the present invention are thus based on the concept of a space-shared operating system 225, whereby the operating system 225 executes as a service on some of the cores 231, while other cores 232 execute just a microkernel without threading support. Cores 231, 232 may further be allocated to applications 210 upon request. The total power the many-core chip may use to execute applications 210 is managed by the operating system 225. This global resource is allocated to applications 210 through two resources: cores 231, 232 and the frequencies at which these cores 231, 232 execute. That is, each application may be allocated a number of processor cores 231, 232, but the total computing power of these allocated processor cores 231, 232, i.e. the frequency at which they will execute, may be adjusted based on application needs and competing resource requests, within the limits set by the amount of scalable cores allocated to an application 210, the maximum frequency scaling factor that may be used for those cores 231, 232 and the total power budget of the chip 230.

In order to allow the scheduler 220 to allocate computing resources, each application 210 may provide the operating system 225 with three parallelism parameters. The first parameter is the degree of parallelism. The degree of parallelism may be expressed as a fraction of the computation that may execute in parallel on multiple cores 231, 232. The second parameter is the minimum amount of cores 231, 232 that the application 210 needs in order to execute. The third parameter is the priority level of the application 210, which may appear quite similar to "process priority", as previously known.

Thus each application 210 may request two types of computing resources from the operating system 225. The first type of resource is the processor core resource, which may be of type non-scalable, i.e. fixed frequency, or scalable, i.e. scalable frequency. In the herein described non-limiting exemplary embodiments, the cores 231 are scalable and the cores 232 non-scalable. The allocation of this type of resource can occur at any time, e.g. at application start-up based on the parallelism parameters and/or during execution based on workload changes.

The second type of resource is execution speed. There are two scenarios for an application 210 to request allocation/de-allocation of such resources: temporary frequency boost/restore to normal for the scalable cores 231 allocated to it, or shut off/restore to normal for any core 231, 232 allocated to the application 210.

From the application perspective, the present method may assume a behaviour similar to handling memory today. Thus the application 210 may specify its parallelism parameters at start-up according to some embodiments. Also, the application 210 may request shutting off cores 231, 232 immediately when these are not needed, e.g. once a parallel computation has finished and a single-threaded sequence begins. Further, according to some embodiments, the application 210 may request frequency boosting as a resource allocation request. The application 210 may according to some embodiments be responsible for requesting restoring to the normal operating
`
`
`
`
`5
`quency of any core 231, 232 that was previously shut down.
`Further yet, the application 210 may be responsible for
`requesting allocation/release of processor cores 231, 232 of
`various types i.e. scalable/non-scalable.
`According to some embodiments, the application 210 may
`be responsible to decide on which processor core 231, 232 a
`certain computation is to be executed, which requires a shift
`from today’s approach where the application 210 just pro
`vides a set of threads that the operating system 225 will
`deploy on available resources
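Returning to the table of suggested frequency levels given earlier: the suggested factors coincide with the theoretical speed-up column rounded down to the nearest available power-of-two level, and the theoretical column itself happens to match the cube root of the core count. The following sketch regenerates the table under those two observed assumptions; neither is stated by the patent.

```python
def suggested_level(n_cores, levels=(8, 4, 2, 1)):
    # Theoretical boost factor: observed to equal the cube root of the
    # core count in the table above (an assumption, not a patent claim).
    theoretical = n_cores ** (1.0 / 3.0)
    # Quantize down to the largest available power-of-two level
    # (epsilon guards against floating-point rounding, e.g. 8 ** (1/3)).
    return next(lvl for lvl in levels if lvl <= theoretical + 1e-9)

# The table's (number of cores -> suggested factor) pairs:
table = {8: 2, 16: 2, 32: 2, 64: 4, 256: 4, 1024: 8, 2048: 8}
assert all(suggested_level(n) == lvl for n, lvl in table.items())
```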
The present method is based on the following principal idea of having the processor cores 231, 232 on the chip 230 space-shared rather than time-shared. Thus each processor core 231, 232 may be allocated to one and only one application 210. Also, the allocation of processor cores 231, 232 may be geometry aware, in the sense that the operating system 225 may group resources allocated to the same application 210 close to each other. Further, the operating system 225 may only shut off or free up allocated processor cores 231, 232 at the explicit request of the application 210 or when the application 210 exits. According to yet some embodiments, the operating system 225 may adjust the speed of individual scalable cores 231 if needed to accommodate new requests. All non-allocated cores 232 may be kept in shut-off state. Further yet, the processing resource requests may have a wait time attached, such that a request with a zero wait time may return immediately, either with allocated resources or a negative response. A request with a positive wait time may either return immediately with successfully allocated resources or be queued by the operating system 225 until it can be fulfilled, or a timer expires.
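The wait-time semantics just described amount to a small decision rule. The sketch below is illustrative only; the `ResourceManager` class, its fields and the string results are assumptions, not the patent's interfaces.

```python
from collections import deque

class ResourceManager:
    """Sketch of the wait-time semantics for processing-resource requests:
    a zero wait time returns immediately (resources or a negative answer),
    a positive wait time may be queued until fulfilled or timed out."""

    def __init__(self, free_cores):
        self.free_cores = free_cores
        self.pending = deque()  # queued requests with a positive wait time

    def request(self, n_cores, wait_time):
        if self.free_cores >= n_cores:
            self.free_cores -= n_cores
            return "allocated"
        if wait_time == 0:
            return "rejected"   # zero wait time: immediate negative response
        self.pending.append((n_cores, wait_time))
        return "queued"         # served later, or rejected when the timer expires
```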
The present method according to some embodiments may begin with shutting off all cores 231, 232 that do not execute operating system code. When a new application 210 is started, it gets allocated, if possible, the amount of processor cores 231, 232 that it has specified in its parallelism parameters. Initially, all cores 231, 232 may be started at normal, 1x speed. Allocation of processor cores 231, 232 may be possible if the current power usage and the power required by the cores 231, 232 according to the application parallelism parameters is less than, or equal to, the total power budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.

When an application 210 requests the allocation of more processor cores 231, 232, the request may be fulfilled if the resulting power budget does not exceed the total budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.
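The allocation rule above (fit within the power budget outright, or free enough budget through priority-triggered re-scheduling) can be sketched as a predicate. The function name and the `(priority, reclaimable_power)` representation of lower-priority applications are illustrative assumptions:

```python
def can_allocate(current_power, required_power, total_budget,
                 app_priority, running_apps):
    """Sketch of the power-budget allocation rule.

    running_apps: list of (priority, reclaimable_power) pairs, where
    reclaimable_power is what scaling back that application's scalable
    cores could free up. Hypothetical helper, not the patent's wording."""
    # Case 1: the request fits within the total power budget as is.
    if current_power + required_power <= total_budget:
        return True
    # Case 2: the requester's priority triggers re-scheduling of power
    # budgets; only strictly lower-priority applications may be scaled back.
    freed = sum(p for prio, p in running_apps if prio < app_priority)
    return current_power + required_power - freed <= total_budget
```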
The operating system 225 actively re-calculates power budgets every time an application 210 requests shut-off or restore of individual cores 231, 232 and may process pending requests if possible. Restoring may only succeed if the resulting power budget does not exceed the total budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.

Restoring of operation of already allocated cores may, according to some embodiments, have a higher priority for the operating system 225 than allocation of new resources.

When an application 210 requests frequency boosting or restoring for a core 231, 232, the operating system 225 may fulfil the request if and to the extent that the result does not mean exceeding the overall power budget of the processor chip. If the application 210 has a priority level that may trigger re-scheduling of power budgets, such a re-scheduling
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 8,984,523 B2
`
`10
`
`15
`
`6
may be performed. If the request was to reduce frequency, the operating system 225 may process pending resource requests.

The operating system 225 may keep track of the period for which a scalable core 231 was running at boosted frequency; in order to avoid over-heating of the chip 230, the frequency may be scaled back after the pre-set maximum amount of time the core 231, 232 is allowed to run at such high frequencies. The scale-back may also trigger re-scheduling and servicing of outstanding requests. The scalable core 231 may be kept running at low frequency for a certain amount of time so as to allow a decrease in the temperature of the chip, in order to avoid physical damage to the hardware, according to some embodiments. Thus a timer 550 may be used in order to prevent the scalable core 231 from increasing the frequency before a certain predetermined time interval has passed. According to some embodiments, the timer 550 may be activated at the moment when the scalable core 231 decreases the frequency from an increased frequency to a working frequency. A comparison between the measured timer value and a predetermined time limit may then be performed, such that the scalable core 231 is prevented from increasing the frequency before a certain predetermined cooling period has passed, according to some embodiments.
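The boost-deadline and cooling-period bookkeeping described above can be sketched as a small state machine. The class and its parameters are hypothetical; time is passed in explicitly rather than read from a clock, to keep the sketch self-contained:

```python
class BoostGovernor:
    """Sketch of the timer 550 logic: a scalable core may run boosted for at
    most max_boost time units, and must then stay at working frequency for
    a cooling period before it may be boosted again."""

    def __init__(self, max_boost, cooling):
        self.max_boost = max_boost
        self.cooling = cooling
        self.boost_started = None           # None -> not currently boosted
        self.cool_until = -float("inf")     # earliest time a new boost is allowed

    def may_boost(self, now):
        return self.boost_started is None and now >= self.cool_until

    def start_boost(self, now):
        assert self.may_boost(now)
        self.boost_started = now

    def tick(self, now):
        # Scale back after the pre-set maximum boost time, and start the
        # cooling timer that prevents re-boosting until the chip has cooled.
        if self.boost_started is not None and now - self.boost_started >= self.max_boost:
            self.boost_started = None
            self.cool_until = now + self.cooling
```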
Re-scheduling of resource allocations may be regarded as an advantageous component of the present method. It may occur every time a request cannot be fulfilled without over-stepping the power budget and the requesting application has a higher priority than at least some of the other executing applications 210.

During re-scheduling, the operating system 225 may basically modify the clocking of frequency-boosted cores 231, 232. As stated before, an already allocated core 231, 232 may not be taken away forcefully by the operating system 225, neither may it be shut down. This may be a pre-requisite to allow applications 210 to perform correctly. However, modifying the frequency at which single-threaded portions of lower-prioritized applications execute is a sensible way of prioritizing applications, without starving any. Hence, the operating system 225 may calculate by how much the frequency scaling factor of some of the scalable cores executing lower-prioritized applications is to be reduced in order to fit the requirements of higher-prioritized applications into the power budget. The applications 210 may be notified about the change, and the application 210 with higher priority gets its resources allocated.

There may be several different policies used by the scheduler 220, such as spreading the scale-back as evenly as possible over several applications 210, reducing the impact on each individual application 210 but impacting several applications 210. Also, the scheduler 220 may focus on just enough applications 210, the ones with the lowest priority, to free up a sufficient amount of power budget to fulfil the request.

Requests with a non-zero wait time that cannot be fulfilled may be queued and may be serviced based on the priority of the application 210 making the request. If a request cannot be fulfilled before its timer expires, the request may be rejected and the application 210 notified.
In the space-shared operating system 225, the scheduling method outlined above may be implemented as two components, placed in the core-local micro-kernel and in a central resource management service.

The core-local micro-kernel has the primary task of managing the hardware and interfacing the application 210 and the central resource management service. It provides e.g. the following services: change the frequency of the core 231,
`
`
`
`
`7
`232, shut down the core 231, 232, receive and forward to the
`central resource management service requests from the appli
`cation 210.
`In order to increase performance, replies from the central
`resource management service may be sent directly to the
`application 210, bypassing the local micro-kernel, according
`to Some embodiments.
`The central resource management service may be respon
`sible for keeping track of all the processing resources i.e.
`scalable cores 231 and non-scalable cores 232 and the overall
`power budget. It has the function of allocating processing
`resources to newly started applications 210, re-calculating
`core frequencies and request adjustment from local micro
`kernels, wake-up/shut-down cores 231, 232 and managing
`and tracking pending resource requests.
`The central resource management service itself may be
`implemented as an internally distributed service, i.e. execut
`ing on multiple cores 231, 232. “Central' in this context only
`means that there may be one Such service running in the
`system, visible to the distributed micro-kernels.
`The present scheduling method thus may handle two types
`of resources: non-scalable cores 232 and scalable-cores 231.
`Both of these resources scale, quantity-wise, linearly with the
`total number of cores 231, 232. In fact, the maximum number
`of scalable cores 231 equals n/8, while the maximum number
`of non-scalable cores 232 may be, obviously, less than n,
`where n is the total number of processor cores 231, 232 in the
`chip 230.
`The present scheduling method may perform two tasks:
`keeping track of allocated cores 231, 232 i.e. which core 231,
`232 to which application 210, and performing re-scheduling
`calculations.
`The first task may complexity-wise be quite similar to
`memory management and there are algorithms with a con
`stant complexity, not dependent upon the actual number of
`resources or users, i.e. number of cores 231, 232 and amount
`of applications 210.
`The task of re-scheduling may depend on the number of
`scalable cores 231, 232, limited to n/8 according to some
`embodiments. In fact, when a re-scheduling is to be per
`formed, the scheduler may perform the following two steps:
`Step 1
`Identify the number of scalable cores 231 that are candi
`date for scaling back. These are cores 231 allocated to appli
`cations 210 with priority lower than the application 210 issu
`ing the request that triggered the re-scheduling. As this
`information may be stored for each active scalable core 231,
`which may be limited to max n/8, according to Some embodi
`ments, this information may be obtained by traversing a list
`with n/8 elements, yielding a complexity of O(n/8).
`Step 2
`Out of the candidate scalable cores 231, the method may
`select which core 231 have their frequency reduced and by
`how much. The gain with each frequency-step can be pre
`calculated, hence the total number of frequency changes may
`be calculated as a simple calculation, which may be indepen
`dent of the total number of cores 231, 232, but dependent on
`the chosen policy; in any case it will be limited to the number
`of scalable cores, hence O(n/8).
In conclusion, the complexity of the re-scheduling algorithm may be O(n/8) in the best case, with a worst-case complexity of O(n/4). The key point is, though, that it scales linearly with the potential number of scalable cores and involves, even for a 1024-core chip, working with tables of at most 128 elements.
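The two re-scheduling steps above can be sketched directly. The per-core dictionaries and the lowest-priority-first selection are illustrative assumptions; the patent explicitly leaves the scale-back policy open:

```python
def rescheduling_candidates(scalable_cores, requester_priority):
    """Step 1: scan the (at most n/8) active scalable cores and keep those
    allocated to applications with lower priority than the requester; one
    pass over a list of n/8 elements, i.e. O(n/8)."""
    return [c for c in scalable_cores if c["priority"] < requester_priority]

def select_scale_backs(candidates, needed_power, gain_per_step):
    """Step 2: pick candidate cores to scale back until the pre-calculated
    per-step power gain covers the request. Scaling back the lowest-priority
    applications first is just one of several possible policies."""
    chosen, freed = [], 0.0
    for core in sorted(candidates, key=lambda c: c["priority"]):
        if freed >= needed_power:
            break
        chosen.append(core["id"])
        freed += gain_per_step
    return chosen, freed
```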
`Concerning memory, the scheduler may have to store only
`four pieces of information for each scalable core 231, to
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 8,984,523 B2
`
`10
`
`15
`
`8
which application 210 it is allocated, the current frequency scaling factor, the deadline until which it may execute at this frequency, and the application priority. For performance reasons, the amount of memory needed for even a 1024-core system may be just a few kilobytes. For each non-scalable core, information such as its current state, i.e. on/o