US008984523B2

(12) United States Patent
     Vajda

(10) Patent No.: US 8,984,523 B2
(45) Date of Patent: Mar. 17, 2015

(54) METHOD FOR EXECUTING SEQUENTIAL CODE ON THE SCALABLE
     PROCESSOR AT INCREASED FREQUENCY WHILE SWITCHING OFF THE
     NON-SCALABLE PROCESSOR CORE OF A MULTICORE CHIP

(75) Inventor: Andras Vajda, Sundsberg (FI)

(73) Assignee: Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)

(*) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 276 days.

(21) Appl. No.: 13/321,228

(22) PCT Filed: May 26, 2009

(86) PCT No.: PCT/SE2009/050594
     § 371 (c)(1), (2), (4) Date: Nov. 18, 2011

(87) PCT Pub. No.: WO 2010/138031
     PCT Pub. Date: Dec. 2, 2010

(65) Prior Publication Data
     US 2012/0060170 A1    Mar. 8, 2012

(51) Int. Cl.
     G06F 9/50 (2006.01)
     G06F 9/48 (2006.01)
     G06F 1/32 (2006.01)

(52) U.S. Cl.
     USPC ............ 718/104; 713/320; 713/323; 713/324

(58) Field of Classification Search
     CPC .... G06F 9/4893; G06F 1/3287; G06F 1/3203; Y02B 60/144; Y02B 60/1217
     USPC ............ 718/104; 713/320, 324
     See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

6,986,068 B2*   1/2006  Togawa ................. 713/320
7,093,147 B2*   8/2006  Farkas et al. .......... 713/320
(Continued)

FOREIGN PATENT DOCUMENTS

WO  WO 2005/010737 A2 *  2/2005
WO  WO 2007/019003 A     2/2007

OTHER PUBLICATIONS

Annavaram et al., "Mitigating Amdahl's Law Through EPI Throttling," 2005, In Proceedings of the International Symposium on Computer Architecture, pp. 298-309.*
(Continued)

Primary Examiner — Meng An
Assistant Examiner — Abu Ghaffari
(57) ABSTRACT

Method and scheduler in an operating system, for scheduling processing resources on a multi-core chip. The multi-core chip comprises a plurality of processor cores. The operating system is configured to schedule processing resources to an application to be executed on the multi-core chip. The method comprises allocating a plurality of processor cores to the application. Also, the method comprises switching off another processor core allocated to the application, not executing the sequential portion of the application, when a sequential portion of the application is executing on only one processor core. In addition, the method comprises increasing the frequency of the one processor core executing the application to the second frequency, such that the processing speed is increased more than predicted by Amdahl's law.

12 Claims, 6 Drawing Sheets
[Representative drawing: flow chart in which a plurality of processor cores is allocated to the application (401); when a sequential code portion is executing, other processor cores allocated to the application, not executing the sequential portion of the application, are switched off (402); if the available power budget is not sufficient, the frequency of some scalable core of another low-priority application is decreased (403); the frequency of the one processor core executing the application code is increased, such that the processing speed is increased more than predicted by Amdahl's law (404); and the frequency of the processor cores allocated to the application is restored to the respective earlier operating frequency (405).]
Patent Owner Daedalus Prime LLC, Exhibit 2003 - Page 1 of 15
(56) References Cited

U.S. PATENT DOCUMENTS

7,937,568 B2*     5/2011   Correale et al. ........ 712/221
2003/0088800 A1*  5/2003   Cai .................... 713/320
2005/0223249 A1* 10/2005   Samson ................ 713/320
2007/0220294 A1*  9/2007   Lippett ............... 713/320
2007/0288769 A1* 12/2007   Chang et al. .......... 713/300
2008/0005592 A1   1/2008   Allarey et al.

OTHER PUBLICATIONS

Kumar et al., "Heterogeneous Chip Multiprocessors," 2005, IEEE, pp. 32-38.*
Kumar, R., et al., "A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors," In Workshop on Complexity-Effective Design, Jun. 2003.

* cited by examiner
[Sheet 1 of 6 — Fig. 1: diagram illustrating speed-up according to Amdahl's law versus number of cores (8, 16, 32, 64, 256, 1024, 2048).]
[Sheet 2 of 6 — Fig. 2: block diagram of a computer system. A user layer 215 contains application software 210. An operating system 225 contains a scheduler 220. A hardware layer 235 contains a multi-core chip 230 comprising a scalable processor core 231 and non-scalable processor cores 232-1 through 232-7.]
[Sheet 3 of 6 — Fig. 3: block diagram of a multi-core chip 230 comprising scalable processor cores 231-1 through 231-4 and non-scalable processor cores 232-1 through 232-27, arranged in four groups, each group containing one scalable core.]
[Sheet 4 of 6 — Fig. 4: flow chart of embodiments of the method.]

START

401: Allocate a plurality of processor cores to the application.

Sequential code portion executing? If No, keep checking; if Yes:

402: Switch off other processor cores allocated to the application, not executing the sequential portion of the application.

Available power budget sufficient? If No:

403: Decrease frequency of some scalable core of other low-priority application.

If Yes:

404: Increase frequency of the one processor core executing the application code, such that the processing speed is increased more than predicted by Amdahl's law.

405: Restore the frequency of the processor cores allocated to the application, to the respective earlier operating frequency.
[Sheet 5 of 6 — Fig. 5: block diagram of the scheduler 220, with processing resources as input and output, comprising an allocation unit, a switching unit 520, an increasing unit 530 and a restoring unit 540.]
[Sheet 6 of 6 — Fig. 6: diagram illustrating speedup versus number of cores according to some embodiments of the present method.]
METHOD FOR EXECUTING SEQUENTIAL CODE ON THE SCALABLE PROCESSOR AT INCREASED FREQUENCY WHILE SWITCHING OFF THE NON-SCALABLE PROCESSOR CORE OF A MULTICORE CHIP

TECHNICAL FIELD

The present invention relates to a method and a scheduler in an operating system and, more in particular, to a mechanism for scheduling processing resources on a multi-core chip.

BACKGROUND

The computing industry's development over the last decades has been largely defined by a dramatic increase of the available computing power, driven primarily by the self-fulfilment of Moore's law. However, recently this development has been heavily impacted by the inability to scale performance further through increased frequency and complexity of processor devices, due primarily to power and cooling issues; as an answer, the industry turned to Chip MultiProcessors (CMP), or multi-core devices. It is predicted that the advancement of manufacturing technology, in fact the continued development according to Moore's law, will allow for chips comprising hundreds or even thousands of processor cores.

Currently, operating system schedulers work on the principle of assigning tasks to available hardware resources, such as processor cores or hardware threads, and sharing a core among multiple applications.

Current approaches have a number of limitations. They do not scale well to a high number of cores within the same chip. Micro-managing each core does not pay off well and may not be economical for simple cores. Further, current solutions do not consider dynamic variations in the processing requirements of applications. An application may at different points in time be executing in single-threaded mode, while at other moments it may require multiple threads of execution.

Chip multiprocessors, however, pose new requirements on operating systems and applications. It appears that current symmetric multi-processing, time-shared operating systems will not scale to several tens, let alone hundreds or thousands, of processor cores. On the application side, Amdahl's law, even in its version for chip multi-processors, will put a limit on the amount of performance increase that can be achieved even for highly parallelized applications, assuming static symmetric or static asymmetric multi-core chips, as illustrated in FIG. 1. For example, an application with only 1% sequential code can achieve, according to Amdahl's law, a theoretical speed-up factor of 7.4 on 8 cores and 14 on 16 cores, but only 72 on 256 cores and 91 on 1024 cores, compared with running on a single core.

Amdahl's law for CMP is expressed as:

    Speedup_cmp(f, n, r) = 1 / ( (1 - f) / perf(r) + f / n )

where f is the fraction of the code that can execute in parallel, n is the number of processor cores, r is the number of processor cores that can be used jointly to execute sequential portions, and perf(r) is the resulting performance, assumed to be less than r.

Better scalability, approaching linear for highly parallelized applications, could theoretically be obtained with chips where computing resources could be utilized either as several cores executing in parallel, for the execution of the parallel sections of the applications, or as a single, fast, powerful core for running the single-threaded, sequential portions of the applications. However, so far there is no solution that would scale with the number of cores.

Thus there is a problem to increase the processing speed of application code in a computer.
SUMMARY

It is the object to obviate at least some of the above disadvantages and provide an improved performance of applications executing on a multi-core system.

According to a first aspect, the object is achieved by a method in an operating system for scheduling processing resources on a multi-core chip. The multi-core chip comprises a plurality of processor cores. The operating system is configured to schedule processing resources to an application to be executed on the multi-core chip. The method comprises allocating a plurality of processor cores to the application. Also, the method comprises switching off processor cores allocated to the application, not executing the sequential portion of the application, when a sequential portion of the application is executing on only one processor core. In addition, the method comprises increasing the frequency of the one processor core executing the application, such that the processing speed is increased more than predicted by Amdahl's law.

According to a second aspect, the object is also achieved by a scheduler in an operating system for scheduling processing resources on a multi-core chip. The processing resources are allocated to an application to be executed on the multi-core chip. The multi-core chip comprises a plurality of processor cores. The scheduler comprises an allocation unit. The allocation unit is adapted to allocate a plurality of processor cores to the application. Also, the scheduler comprises a switching unit. The switching unit is adapted to switch off a processor core, allocated to the application, not executing a sequential portion of the application. Further, the scheduler comprises an increasing unit. The increasing unit is adapted to increase the frequency of the one processor core executing the application, such that the processing speed is increased more than predicted by Amdahl's law.

The present solution is based on a space-shared approach for scheduling multiple processes on a many-core chip. Also, the present solution relies on adapting the performance of individual cores, and of the chip as a whole, by controlled and scheduled variation of the frequency at which different cores execute, in contrast with current thread-migration strategies. Instead of trying to guess the best thread-assignment policy, the operating system relies on application-generated requests for computing resources, similarly to how memory is currently managed. Thereby, a near-linear scaling of speed with the number of processing cores is achieved. Further, the present solution addresses the issue of dynamically adapting processing resources to the requirements of applications. In addition, the present solution does not rely on thread migration, hence improving cache-usage characteristics. Thereby, an improved performance within a wireless communication system is achieved.

Other objects, advantages and novel features of the invention will become apparent from the following detailed description of the invention.
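The method of the first aspect (allocate, switch off, boost, restore) can be sketched in a few lines. This is an illustrative model only, with hypothetical names; the power-budget check of the scheduler is omitted for brevity:

```python
from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    scalable: bool      # scalable cores support boosted frequencies
    on: bool = True
    freq: float = 1.0   # 1.0 = normal (1x) operating frequency

def run_sequential_portion(cores, boost):
    """When a sequential portion executes on only one core, switch off
    the other allocated cores and boost the remaining (scalable) core."""
    seq_core = next(c for c in cores if c.scalable)  # core running the sequential code
    for c in cores:
        if c is not seq_core:
            c.on = False          # switch off the other allocated cores
    seq_core.freq = boost         # increase the frequency of the one core
    return seq_core

def restore(cores):
    """Restore all allocated cores to the earlier operating state."""
    for c in cores:
        c.on, c.freq = True, 1.0
```

Because the switched-off cores free up power budget, the remaining core can run above its nominal frequency, which is what lets the sequential portion run faster than plain Amdahl's law predicts.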
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described more in detail in relation to the enclosed drawings, in which:
FIG. 1 is a diagram illustrating speed-up according to Amdahl's law.

FIG. 2 is a schematic block diagram illustrating a system according to some embodiments.

FIG. 3 is a schematic block diagram illustrating a multi-core chip according to some embodiments.

FIG. 4 is a flow chart illustrating embodiments of method steps in an operating system.

FIG. 5 is a block diagram illustrating embodiments of a scheduler in an operating system.

FIG. 6 is a diagram illustrating speed-up according to some embodiments of the present method.

DETAILED DESCRIPTION

The invention is defined as a method and a scheduler in an operating system, which may be put into practice in the embodiments described below. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. It should be understood that there is no intent to limit the present method and/or scheduler to any of the particular forms disclosed, but on the contrary, the present methods and schedulers are to cover all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the claims.

The present invention may, of course, be carried out in other ways than those specifically set forth herein without departing from the essential characteristics of the invention. The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

FIG. 2 is a schematic illustration over a computer system 200. The computer system 200 may be situated within a computing device.

The computer system 200 comprises a user layer 215. An application software 210, within the user layer 215, may run in the computer system 200.

The computer system 200 further comprises an operating system 225 and a hardware layer 235.

The operating system 225 comprises a scheduler 220. The scheduler 220 is configured to schedule processing resources.

The hardware layer 235 may comprise a multi-core chip 230. The multi-core chip 230, which may be referred to as a multi-core processor, comprises a plurality of processor cores 231 and 232, although the cores 231, 232 in the multi-core chip 230 may vary in number in other embodiments. The processor cores 231 and 232 may further comprise further resources, e.g. a cache memory or an instruction processing engine, just to mention some arbitrary examples of core resources.

The table below summarizes the expected speed-up as well as the suggested levels according to some embodiments. Even for 2048-core systems 200, five frequency levels (0, 1x, 2x, 4x, 8x) may suffice, with an acceptable impact on the theoretical performance. The performance differences are in the range of 1%-25% for applications with degrees of parallelism between 0.5 and 0.999, higher for applications with a lesser degree of parallelism. However, all cores 231, 232, scalable or not, shall support at least levels 0 and 1x, i.e. it may be possible to switch off any core 231, 232.

    Number of cores    Theoretical speed-up    Suggested factor
          8                    2                      2
         16                    2.6                    2
         32                    3.2                    2
         64                    4                      4
        256                    6.4                    4
       1024                   10.1                    8
       2048                   12.7                    8

Embodiments of the present invention are thus based on the concept of a space-shared operating system 225, wherein the operating system 225 executes as a service on some of the cores 231, while other cores 232 execute just a micro-kernel without threading support. Cores 231, 232 may further be allocated to applications 210 upon request. The total power the many-core chip may use to execute applications 210 is managed by the operating system 225. This global resource is allocated to applications 210 through two resources: cores 231, 232, and the frequencies at which these cores 231, 232 execute. That is, each application may be allocated a number of processor cores 231, 232, but the total computing power of the allocated processor cores 231, 232, i.e. the frequency at which they will execute, may be adjusted based on application needs and competing resource requests, within the limits set by the amount of scalable cores allocated to an application 210, the maximum frequency scaling factor that may be used for those cores 231, 232, and the total power budget of the chip 230.

In order to allow the scheduler 220 to allocate computing resources, each application 210 may provide the operating system 225 with three parallelism parameters. The first parameter is the degree of parallelism. The degree of parallelism may be expressed as a fraction of the computation that may execute in parallel on multiple cores 231, 232. The second parameter is the minimum amount of cores 231, 232 that the application 210 needs in order to execute. The third parameter is the priority level of the application 210, which may appear quite similar to "process priority", as previously known.

Thus, each application 210 may request two types of computing resources from the operating system 225. The first type of resource is the processor core resource, which may be of type non-scalable, i.e. fixed frequency, or scalable, i.e. scalable frequency. In the herein described non-limiting exemplary embodiments, the cores 231 are scalable and the cores 232 non-scalable. The allocation of this type of resource can occur at any time, e.g. at application start-up based on the parallelism parameters, and/or during execution based on workload changes.

The second type of resource is execution speed. There are two scenarios for an application 210 to request allocation/de-allocation of such resources: temporary frequency boost/restore to normal for the scalable cores 231 allocated to it, or shut off/restore to normal for any core 231, 232 allocated to the application 210.

From an application perspective, the present method may assume a behaviour similar to handling memory today. Thus, the application 210 may specify its parallelism parameters at start-up, according to some embodiments. Also, the application 210 may request shutting off cores 231, 232 immediately when these are not needed, e.g. once a parallel computation has finished and a single-threaded sequence begins. Further, according to some embodiments, the application 210 may request frequency boosting as a resource allocation request. The application 210 may, according to some embodiments, be responsible for requesting restoring to the normal operating frequency of any core 231, 232 that was previously shut down.
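The three parallelism parameters and the two processor-core types described above could, for illustration, be represented as follows; all names are hypothetical, not from the patent:

```python
from dataclasses import dataclass
from enum import Enum

class CoreType(Enum):
    NON_SCALABLE = "fixed frequency"     # cores 232
    SCALABLE = "scalable frequency"      # cores 231

@dataclass
class ParallelismParameters:
    degree_of_parallelism: float  # fraction of computation that may run in parallel
    min_cores: int                # minimum amount of cores the application needs
    priority: int                 # priority level, similar to classic process priority

# An application declaring 99% parallel code and needing at least 4 cores:
params = ParallelismParameters(degree_of_parallelism=0.99, min_cores=4, priority=5)
```

The operating system would receive such a record at application start-up and use it both for the initial core allocation and for later re-scheduling decisions.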
`Further yet, the application 210 may be responsible for
`requesting allocation/release of processor cores 231, 232 of
`various types i.e. scalable/non-scalable.
`According to some embodiments, the application 210 may
`be responsible to decide on which processor core 231, 232 a
`certain computation is to be executed, which requires a shift
`from today’s approach where the application 210 just pro
`vides a set of threads that the operating system 225 will
`deploy on available resources
The present method is based on the principal idea of having the processor cores 231, 232 on the chip 230 space-shared rather than time-shared. Thus, each processor core 231, 232 may be allocated to one and only one application 210. Also, the allocation of processor cores 231, 232 may be geometry-aware, in the sense that the operating system 225 may group resources allocated to the same application 210 close to each other. Further, the operating system 225 may only shut off or free up allocated processor cores 231, 232 at the explicit request of the application 210 or when the application 210 exits. According to yet some embodiments, the operating system 225 may adjust the speed of individual scalable cores 231 if needed to accommodate new requests. All non-allocated cores 232 may be kept in a shut-off state. Further yet, the processing resource requests may have a wait time attached, such that a request with a zero wait time may return immediately, either with allocated resources or a negative response. A request with a positive wait time may either return immediately with successfully allocated resources, or be queued by the operating system 225 until it can be fulfilled or a timer expires.
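The wait-time semantics just described (zero wait time returns immediately; positive wait time may be queued) can be sketched as follows; the class and method names are hypothetical illustrations, not from the patent:

```python
import heapq
import time

class ResourceManager:
    """Sketch of wait-time-aware processing resource requests."""

    def __init__(self, free_cores: int):
        self.free_cores = free_cores
        self.pending = []   # heap of (-priority, deadline, n_cores)

    def request_cores(self, n: int, wait_time: float, priority: int = 0) -> str:
        if n <= self.free_cores:
            # request can be fulfilled immediately with allocated resources
            self.free_cores -= n
            return "allocated"
        if wait_time == 0:
            # zero wait time: immediate negative response
            return "rejected"
        # positive wait time: queue until fulfillable or the timer expires
        heapq.heappush(self.pending, (-priority, time.monotonic() + wait_time, n))
        return "queued"
```

Queued requests would later be serviced in priority order whenever cores are freed, and rejected once their deadline passes, mirroring the behaviour described in the text.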
The present method according to some embodiments may begin with shutting off all cores 231, 232 that do not execute operating system code. When a new application 210 is started, it gets allocated, if possible, the amount of processor cores 231, 232 that it has specified in its parallelism parameters. Initially, all cores 231, 232 may be started at normal, 1x speed. Allocation of processor cores 231, 232 may be possible if the current power usage and the power required by the cores 231, 232 according to the application parallelism parameters is less than, or equal to, the total power budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.

When an application 210 requests the allocation of more processor cores 231, 232, the request may be fulfilled if the resulting power budget does not exceed the total budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.
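The allocation rule just described (fit within the total power budget, or free budget by re-scheduling lower-priority applications) can be expressed as a small predicate; all names are illustrative assumptions:

```python
def may_allocate(required_power: float, current_usage: float,
                 total_budget: float, priority: int,
                 reschedulable_power: float) -> bool:
    """Allocation succeeds if it fits in the total power budget, or if the
    requester's priority allows re-scheduling to free enough budget.
    reschedulable_power is the budget that scaling back lower-priority
    applications could free up."""
    if current_usage + required_power <= total_budget:
        return True
    # re-scheduling lower-priority applications may free up budget,
    # but only for a requester whose priority triggers re-scheduling
    freed = reschedulable_power if priority > 0 else 0.0
    return current_usage + required_power - freed <= total_budget
```

The same predicate applies, per the text, both at application start-up and when an already running application requests additional cores.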
The operating system 225 actively re-calculates power budgets every time an application 210 requests shut-off or restore of individual cores 231, 232, and may process pending requests if possible. Restoring may only succeed if the resulting power budget does not exceed the total budget, or the application 210 has a priority level that triggers re-scheduling of power budgets and the re-scheduling results in having enough power budget freed up.

Restoring of operation of already allocated cores may, according to some embodiments, have higher priority in the operating system 225 than allocation of new resources.

When an application 210 requests frequency boosting or restoring for a core 231, 232, the operating system 225 may fulfil the request if, and to the extent that, the result does not mean exceeding the overall power budget of the processor chip. If the application 210 has a priority level that may trigger re-scheduling of power budgets, such a re-scheduling
may be performed. If the request was to reduce frequency, the operating system 225 may process pending resource requests.

The operating system 225 may keep track of the period for which a scalable core 231 was running at boosted frequency; in order to avoid over-heating of the chip 230, the frequency may be scaled back after the pre-set maximum amount of time the core 231, 232 is allowed to run at such high frequencies. The scale-back may also trigger re-scheduling and servicing of outstanding requests. The scalable core 231 may be kept running at low frequency for a certain amount of time, so as to allow a decrease in the temperature of the chip in order to avoid physical damage to the hardware, according to some embodiments. Thus, a timer 550 may be used in order to prevent the scalable core 231 from increasing the frequency before a certain predetermined time interval has passed. According to some embodiments, the timer 550 may be activated at the moment when the scalable core 231 decreases the frequency from an increased frequency to a working frequency. A comparison between the measured timer value and a predetermined time limit may then be performed, such that the scalable core 231 is prevented from increasing the frequency before a certain predetermined cooling period has passed, according to some embodiments.
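The cooling-period timer described above can be sketched as a small guard object; the class is a hypothetical illustration of the described behaviour, not an API from the patent:

```python
import time

class BoostGuard:
    """Prevents a scalable core from boosting again before a cooling
    period has elapsed since it last scaled back to working frequency."""

    def __init__(self, cooling_period: float):
        self.cooling_period = cooling_period
        self.scaled_back_at = None   # set when the core returns to working frequency

    def on_scale_back(self):
        # the timer is activated at the moment the frequency is decreased
        self.scaled_back_at = time.monotonic()

    def may_boost(self) -> bool:
        # boosting is allowed only after the predetermined cooling period
        if self.scaled_back_at is None:
            return True
        return time.monotonic() - self.scaled_back_at >= self.cooling_period
```

Comparing the elapsed time against the predetermined limit is exactly the comparison between measured timer value and time limit that the text describes.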
Re-scheduling of resource allocations may be regarded as an advantageous component of the present method. It may occur every time a request cannot be fulfilled without over-stepping the power budget, and the requesting application has a higher priority than at least some of the other executing applications 210.

During re-scheduling, the operating system 225 may basically modify the clocking of frequency-boosted cores 231, 232. As stated before, an already allocated core 231, 232 may not be taken away forcefully by the operating system 225, neither may it be shut down. This may be a pre-requisite to allow applications 210 to perform correctly. However, modifying the frequency at which single-threaded portions of lower-prioritized applications execute is a sensible way of prioritizing applications, without starving any. Hence, the operating system 225 may calculate how much the frequency scaling factor of some of the scalable cores executing lower-prioritized applications is to be reduced in order to fit the requirements of higher-prioritized applications into the power budget. The applications 210 may be notified about the change, and the application 210 with higher priority gets its resources allocated.

There may be several different policies used by the scheduler 220, such as spreading the scale-back as evenly as possible over several applications 210, reducing the impact on each individual application 210, but impacting several applications 210. Also, the scheduler 220 may focus on just enough applications 210, the ones with the lowest priority, to free up a sufficient amount of power budget to fulfil the request.

Requests with non-zero wait time that cannot be fulfilled may be queued, and may be serviced based on the priority of the application 210 making the request. If a request cannot be fulfilled before its timer expires, the request may be rejected and the application 210 notified.

In the space-shared operating system 225, the scheduling method outlined above may be implemented as two components, placed in the core-local micro-kernel and in a central resource management service.

The core-local micro-kernel has the primary task of managing the hardware and interfacing the application 210 and the central resource management service. It provides e.g. the following services: change the frequency of the core 231,
232, shut down the core 231, 232, and receive and forward to the central resource management service requests from the application 210.

In order to increase performance, replies from the central resource management service may be sent directly to the application 210, bypassing the local micro-kernel, according to some embodiments.

The central resource management service may be responsible for keeping track of all the processing resources, i.e. scalable cores 231 and non-scalable cores 232, and the overall power budget. It has the function of allocating processing resources to newly started applications 210, re-calculating core frequencies and requesting adjustment from local micro-kernels, waking up/shutting down cores 231, 232, and managing and tracking pending resource requests.

The central resource management service itself may be implemented as an internally distributed service, i.e. executing on multiple cores 231, 232. "Central" in this context only means that there may be one such service running in the system, visible to the distributed micro-kernels.

The present scheduling method thus may handle two types of resources: non-scalable cores 232 and scalable cores 231. Both of these resources scale, quantity-wise, linearly with the total number of cores 231, 232. In fact, the maximum number of scalable cores 231 equals n/8, while the maximum number of non-scalable cores 232 may be, obviously, less than n, where n is the total number of processor cores 231, 232 in the chip 230.

The present scheduling method may perform two tasks: keeping track of allocated cores 231, 232, i.e. which core 231, 232 belongs to which application 210, and performing re-scheduling calculations.

The first task may, complexity-wise, be quite similar to memory management, and there are algorithms with a constant complexity, not dependent upon the actual number of resources or users, i.e. the number of cores 231, 232 and the amount of applications 210.

The task of re-scheduling may depend on the number of scalable cores 231, limited to n/8 according to some embodiments. In fact, when a re-scheduling is to be performed, the scheduler may perform the following two steps:

Step 1: Identify the number of scalable cores 231 that are candidates for scaling back. These are cores 231 allocated to applications 210 with priority lower than the application 210 issuing the request that triggered the re-scheduling. As this information may be stored for each active scalable core 231, which may be limited to max n/8, according to some embodiments, this information may be obtained by traversing a list with n/8 elements, yielding a complexity of O(n/8).

Step 2: Out of the candidate scalable cores 231, the method may select which cores 231 have their frequency reduced, and by how much. The gain with each frequency step can be pre-calculated; hence the total number of frequency changes may be obtained by a simple calculation, which may be independent of the total number of cores 231, 232, but dependent on the chosen policy; in any case it will be limited to the number of scalable cores, hence O(n/8).

In conclusion, the complexity of the re-scheduling algorithm may be O(n/8) in the best case, with a worst-case complexity of O(n/4). The key point is, though, that it scales linearly with the potential number of scalable cores and involves, even for a 1024-core chip, working with tables of at most 128 elements.

Concerning memory, the scheduler may have to store only four pieces of information for each scalable core 231, to
which application 210 it is allocated, the current frequency scaling factor, the dead-line until it can execute at this frequency, and the application priority (for performance reasons); the amount of memory needed for even a 1024-core system may be just a few kilobytes. For each non-scalable core, information such as the current state, i.e. on/o
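The two re-scheduling steps described earlier (identify candidate scalable cores of lower-priority applications, then reduce their frequencies until enough power budget is freed) can be sketched as follows; all data structures and the lowest-priority-first policy are hypothetical illustrations:

```python
def reschedule(cores, requester_priority, needed_power):
    """cores: list of dicts for the active scalable cores, each with
    'id', 'priority', 'freq' (scaling factor) and 'gain_per_step'
    (pre-calculated power freed by one frequency step down).
    Returns the ids of the cores stepped down (with repetition)."""
    # Step 1: candidates are scalable cores of lower-priority applications;
    # the list has at most n/8 elements, so this pass is O(n/8).
    candidates = [c for c in cores if c['priority'] < requester_priority]

    # Step 2: reduce candidate frequencies, lowest priority first, until
    # enough power budget has been freed.
    candidates.sort(key=lambda c: c['priority'])
    freed, changed = 0.0, []
    for c in candidates:
        while freed < needed_power and c['freq'] > 1.0:
            c['freq'] -= 1.0                 # one frequency step down
            freed += c['gain_per_step']
            changed.append(c['id'])
        if freed >= needed_power:
            break
    return changed
```

Both passes touch at most the n/8 scalable cores, matching the O(n/8) complexity argued in the text; the cores of higher- or equal-priority applications are never modified.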
