United States Patent [19]
Nicol et al.

[11] Patent Number: 6,141,762
[45] Date of Patent: Oct. 31, 2000

US006141762A

[54] POWER REDUCTION IN A MULTIPROCESSOR DIGITAL SIGNAL PROCESSOR BASED ON PROCESSOR LOAD

[76] Inventors: Christopher J. Nicol, 61 Hubbard Ave., Red Bank, N.J. 07701; Kanwar Jit Singh, 23 Kerry Dr., Hazlet, N.J. 07730

[21] Appl. No.: 09/128,030

[22] Filed: Aug. 3, 1998

[51] Int. Cl.7: G06F 1/32; G06F 9/44
[52] U.S. Cl.: 713/300; 713/320; 713/501; 709/100
[58] Field of Search: 713/300, 320, 321, 322, 323, 340, 501, 600; 709/100, 202; 327/291, 540; 365/227; 712/10

[56] References Cited

U.S. PATENT DOCUMENTS

5,142,684   8/1992  Perry et al.
5,727,193   3/1998  Takeuchi
5,778,237   7/1998  Yamamoto et al.
5,787,294   7/1998  Evoy
5,812,860   9/1998  Horden et al.
5,974,556  10/1999  Jackson et al.

Primary Examiner: Gopal C. Ray
Attorney, Agent, or Firm: Henry T. Brendzel

[57] ABSTRACT

Improved operation of multi-processor chips is achieved by dynamically controlling processing load of chips and controlling, with significantly greater than on/off granularity, the operating voltages of those chips so as to minimize overall power consumption. A controller in a multi-processor chip allocates tasks to the individual processors to equalize processing load among the chips, then the controller lowers the clock frequency on the chip to as low a level as possible while assuring proper operation, and finally reduces the supply voltage. Further improvement is possible by controlling the supply voltage of individual processing elements within the multi-processor chip, as well as controlling the supply voltage of other elements in the system within which the multi-processor chip operates.

46 Claims, 2 Drawing Sheets

[Representative drawing, corresponding to FIG. 2: integrated circuit (IC) with REFERENCE CLOCK input 101, block 110 producing CLK-L, CALIBRATION block 120, DC-DC CONVERSION block 130, block 140 producing Vdd-LOCAL (line 102), an (OS) PE 100 with FREQ REQ and TASKS signals, further PEs including 103 and 104, and LEVEL SHIFTER 150.]

INTEL - 1008

U.S. Patent    Oct. 31, 2000    Sheet 1 of 2    6,141,762

FIG. 1

[FIG. 1: plot of FREQUENCY (MHz), axis from 0 to 500, versus Vdd (VOLTS), axis from 1 to 4. The curves are not recoverable from the extracted text.]

`
FIG. 2

[FIG. 2: block diagram with REFERENCE CLOCK input 101, block 110 producing CLK-L, CALIBRATION block 120, DC-DC CONVERSION block 130, the Vdd-LOCAL supply (line 102), an (OS) PE 100 with FREQ REQ and TASKS signals, further PEs including 103 and 104, and LEVEL SHIFTER 150.]
`
U.S. Patent    Oct. 31, 2000    Sheet 2 of 2    6,141,762

FIG. 3

[FIG. 3: timing diagram plotted against time t, with the events NEW TASK CREATED, NEW TASK STARTED, and TASK ENDED marked. The waveforms are not recoverable from the extracted text.]
`
FIG. 4

[FIG. 4: block diagram in which each processing element has its own DC-DC CONVERSION block and PLL, with FREQ and TASK signals from a CHIP CONTROLLER, a LEVEL CONVERTER 150, and an ASYNCH COMMUNICATION NETWORK 160.]
`
POWER REDUCTION IN A MULTIPROCESSOR DIGITAL SIGNAL PROCESSOR BASED ON PROCESSOR LOAD

BACKGROUND

This invention relates to electronic circuits and, more particularly, to power consumption within electronic circuits.
Integrated circuits are designed to meet speed requirements under worst-case operating conditions. In Lucent Technology's 0.35 µm 3.3V CMOS technology, the "worst-case-slow" condition is specified for a temperature of 125° C. and a chip supply voltage, Vdd, of 2.7V. The worst-case power consumption of the chip is quoted at the maximum supply voltage of 3.6V. The difference in chip performance at the "worst-case-slow", nominal, and "worst-case-fast" conditions is shown in FIG. 1, where the frequency of a 25-stage ring oscillator is shown at different supply voltages and process corners. At the nominal operating voltage of 3.3V, the speed difference between "worst-case-slow" (WCS) and "worst-case-fast" (WCF) is a factor of 2.2. From the graph it can be seen that if a chip is designed to operate at 140 MHz and at 2.7V supply even when it is "worst-case-slow", a manufactured chip whose characteristics happen to be nominal will continue to operate at 140 MHz even when the chip supply is reduced to 2.1V.
The power consumption of a CMOS circuit increases linearly with operating frequency and quadratically with supply voltage. Therefore, a reduction in supply voltage can significantly reduce power consumption. For example, by reducing the nominal operating voltage from 3.3V to 2.1V, the nominal power consumption of a 140 MHz chip is reduced by 60% without altering the circuit. This, of course, presumes an ability to identify and measure a chip's variation from nominal characteristics, and an ability to modify the supply voltage based on this measurement.
To achieve variable power supply voltage scaling, a programmable dc-dc converter may be used. Probably the most efficient approach in use today is the buck converter circuit. Such circuits are well known in the art.
Voltage scaling as a function of temperature has been incorporated into the Intel Pentium product family as a technique to achieve high performance at varying operating temperatures and process corners. It is described in U.S. Pat. No. 5,440,520. The approach uses an on-chip temperature sensor and associated processing circuitry which issues a code to the off-chip power supply to provide a particular supply voltage. The process variation information is hard-coded into each device as a final step of manufacturing. This approach has the disadvantage of costly testing of each chip to determine its variance from nominal processing. Several manufacturers make Pentium-compatible dc-dc converter circuits, which are highlighted in "Powering the Big Microprocessors", by B. Travis, EDN, Aug. 15, 1997, pp. 31-44.
Recently, there has been considerable interest in integrating much of the buck controller circuit onto the chip. The only off-chip components are the inductor (typically about 10 µH) and capacitor (typically about 30 µF) used in the buck converter. Efficiencies in excess of 80% are typical for a range of voltages and load currents. See, for example, "A High-Efficiency Variable Voltage CMOS Dynamic dc-dc Switching Regulator," by W. Namgoong, M. Yu, and T. Meng, Proceedings ISSCC97, pp. 380-381, February, 1997.
Researchers have also been experimenting with on-chip voltage scaling techniques to counter process and temperature variations. See "Variable Supply-Voltage Scheme for Low Power High-Speed CMOS Digital Design," by T. Kuroda et al., CICC97 Conference Proceedings, and JSSC Issue of CICC97, May, 1998. The Kuroda et al. paper demonstrates that the speed of the circuit can be maintained (or at least the speed degradation can be minimized) by tuning the threshold voltages even as the supply voltage is lowered. The tuning is achieved on-chip by varying the substrate-bias voltage. These techniques are needed to ensure that the leakage current, which increases as the threshold voltage is reduced, does not become too large.
Thus, it is known that varying supply voltage to a chip can improve performance by eliminating unexpected variability in the supply voltage, and by accounting for process and operating temperature variations.

SUMMARY OF THE INVENTION

Improved performance of multi-processor chips is achieved by dynamically controlling the processing load of chips and controlling, with significantly greater than on/off granularity, the operating voltages of those chips so as to minimize overall power consumption. A controller in a multi-processor chip allocates tasks to the individual processors to equalize processing load among the chips, then the controller lowers the clock frequency on the chip to as low a level as possible while assuring proper operation, and finally reduces the supply voltage. Further improvement is possible by controlling the supply voltage of individual processing elements within the multi-processor chip, as well as controlling the supply voltage of other elements in the system within which the multi-processor chip operates.

`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the maximum operating frequency that is achievable with a 0.35 µm technology CMOS chip as a function of supply voltage;
FIG. 2 presents a block diagram of a multi-processor chip with supply voltage control in accordance with the principles disclosed herein;
FIG. 3 shows the relationship between the voltage control clock, Clk, of FIG. 2, the clock applied to the processing elements of FIG. 2, Clk-L, and the supply voltage applied to the processing elements, Vdd-local; and
FIG. 4 depicts the block diagram of a multi-processor chip with supply voltage control that is individual to each of the processing elements.
`
DETAILED DESCRIPTION

FIG. 2 depicts a block diagram of a multi-processor chip. It contains processing elements (PEs) 100, 101, 102, 103, ... 104, and each PE contains a central processing unit (CPU) and a local cache memory (not shown). A real-time operating system resides in PE 100 and allocates tasks to the other PEs from a mix of many digital signal processing applications. The load of the FIG. 2 system is time varying and is dependent on the applications that are being executed at any given time. For example, a set-top box for a multimedia broadband access system might need to receive an HDTV signal. It could also be transmitting data from a computer to the Internet, and responding to button requests from a remote control handset. Over time, this dynamic mix of applications places different load requirements on the system.
For a maximally utilized system, all of the available processors ought to be operating at full speed when satisfying the maximum load encountered by the system. At such a time, the power consumption of the multiprocessor chip is at its maximum level. However, as the load requirements are lowered, the system should, advantageously, reduce its power consumption. It may be noted that, typically, computers spend 99% of their time waiting for a user to press a key. This presents a great opportunity to drastically reduce the average power consumption. The specific approach by which the system "scales back" its performance can greatly impact the realizable power savings.
In the FIG. 2 arrangement, in accordance with the principles disclosed herein, the applications that need to be processed are mapped to the N PEs under control of a real-time operating system (RTOS) executed on PE 100. If the number of instructions that need to be executed for each task is known and made available to the operating system, a scheduler within the operating system can use this information to determine the best way to allocate the tasks to the available processors in order to balance the computation. The intermediate goal, of course, is to maximize the parallelism and to evenly distribute the load presented to the FIG. 2 system among all of the PEs.
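
By way of illustration only, the balancing step just described can be sketched in Python. The sketch assumes that the instruction count of each task is known, and it uses a greedy longest-task-first heuristic, which is a well-known technique substituted here for illustration and is not stated in this description. The names `balance_tasks`, `tasks`, and `num_pes` are hypothetical.

```python
import heapq


def balance_tasks(tasks, num_pes):
    """Greedy longest-first assignment of tasks to PEs (illustrative only).

    tasks: dict mapping a task name to its instruction count (assumed known).
    Returns (assignment, loads), where assignment[i] lists the task names
    given to PE i and loads[i] is the total instruction count on PE i.
    """
    if num_pes < 1:
        raise ValueError("need at least one PE")
    # Min-heap of (current load, PE index): the least-loaded PE pops first.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]
    # Place the largest tasks first; ties are broken by task name.
    for name, count in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        load, pe = heapq.heappop(heap)
        assignment[pe].append(name)
        heapq.heappush(heap, (load + count, pe))
    loads = [0] * num_pes
    for load, pe in heap:
        loads[pe] = load
    return assignment, loads
```

The largest entry of `loads` identifies the most heavily loaded PE, which is the one that would bound any reduction in clock frequency.
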
When an application that is running on the FIG. 2 system is subdivided into N concurrent task streams, as suggested above, each of the PEs becomes lightly loaded. This allows the clock frequency of the PEs to be reduced, and if the task division can be carried out perfectly, then the clock frequency of the FIG. 2 system can be reduced by a factor of N. Reducing the frequency, as indicated above, allows reducing the necessary supply voltage, and reducing the supply voltage reduces the system's power consumption (quadratically). To illustrate, if a given application that is executed on 1 PE requires operating the PE at 140 MHz, it is known from FIG. 1 that the PE can be operated at approximately a 2.7V supply. When the application is divided into two concurrent tasks and assigned to two PEs that are designed to operate at 140 MHz from a 2.7V supply, then the PEs can be operated at 70 MHz and at a supply voltage of 1.8V. This reduction in operating voltage represents a power saving of 55%. Of course, it is unlikely that an application can be perfectly divided into two equal load task streams and, therefore, the 55% power saving is the maximum achievable power saving for two PEs.
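
The arithmetic behind the quoted savings can be checked with a short Python sketch. It assumes only the relationship stated in the Background, namely that CMOS power scales linearly with frequency and quadratically with supply voltage; the function names are hypothetical.

```python
def relative_power(freq_ratio, v_new, v_old, num_pes=1):
    """Power relative to one PE at the original clock and supply.

    Assumes power proportional to f * V**2, as stated in the Background.
    freq_ratio is the new clock divided by the original clock.
    """
    return num_pes * freq_ratio * (v_new / v_old) ** 2


def saving_percent(rel_power):
    """Percentage saved relative to the original operating point."""
    return 100.0 * (1.0 - rel_power)


# Background example: same 140 MHz clock, supply lowered from 3.3V to 2.1V.
single = relative_power(1.0, 2.1, 3.3)

# Two-PE example: each PE at half the clock, supply lowered from 2.7V to 1.8V.
dual = relative_power(0.5, 1.8, 2.7, num_pes=2)
```

Under this model `saving_percent(single)` is about 59.5 and `saving_percent(dual)` is about 55.6, consistent with the 60% and 55% figures quoted in the text.
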
It should be understood that in the above example, when two PEs are employed and their operating frequency can be reduced to 70 MHz, the indicated reduction presumes that it is desired to perform the given tasks as if there was a single PE that operates at 140 MHz. That is, the presumption is that there is a certain time when the tasks assigned to the chip must be finished. In fact, there might not be any particular requirement for when the tasks are to be finished. Alternatively, a requirement for when the tasks are to be finished might not be related to the highest operating frequency of the chip.
For example, the above-illustrated chip (where each of the PEs is designed to operate at 140 MHz) might be employed in a system whose basic frequency is related to 160 MHz. In such an arrangement, dividing tasks between the two PEs of the chip and operating each of the PEs at 80 MHz would be preferable because it would be easier to synchronize the chip's input and output functions to the other elements in the system. Thus, in a sense it is the expected completion time for the collection of assigned tasks that is controlling, and the reduction of frequency from the maximum that the chip can support may be controlled by the division of tasks that may be accomplished.
Hence, the operating system of PE 100 needs to ascertain the required completion time, divide the collection of tasks as evenly as possible (in terms of needed processing time), consider the PE with the tasks that require the most time to carry out, and adjust the clock frequency to insure that the most heavily loaded PE carries out its assigned tasks within the required completion time. Once the frequency is thus determined, a minimum supply voltage can be determined. The supply voltage determination can be made by reference to a plot like the one shown in FIG. 1 or, advantageously, by evaluating the actual performance of the multiprocessor at hand.
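
The determination just described may be sketched as follows, under stated assumptions. The sketch assumes one instruction per clock cycle, a completion time given in microseconds, and a clock that is rounded up to a fixed increment (5 MHz here, chosen for illustration). The table `VOLTAGE_TABLE` is hypothetical: its three points are merely the frequency and voltage pairs quoted elsewhere in this description, and a real table would come from a plot like FIG. 1 or from calibration.

```python
# Hypothetical operating points as (max clock in MHz, supply in volts),
# using the frequency/voltage pairs quoted in this description.
VOLTAGE_TABLE = [(70, 1.8), (100, 2.1), (140, 2.7)]


def required_frequency_mhz(pe_cycles, deadline_us, step_mhz=5):
    """Lowest clock, a multiple of step_mhz, that lets the busiest PE finish.

    pe_cycles: cycles each PE must execute (one instruction per cycle assumed).
    deadline_us: allotted completion time in microseconds.
    Cycles per microsecond equals MHz, so integer arithmetic suffices.
    """
    if deadline_us <= 0 or step_mhz <= 0:
        raise ValueError("deadline and step must be positive")
    worst = max(pe_cycles)
    # Ceiling division: number of frequency steps needed.
    steps = -(-worst // (deadline_us * step_mhz))
    return steps * step_mhz


def minimum_voltage(freq_mhz, table=VOLTAGE_TABLE):
    """Lowest tabulated supply that supports freq_mhz."""
    for max_mhz, volts in table:
        if freq_mhz <= max_mhz:
            return volts
    raise ValueError("frequency exceeds the tabulated range")
```

For instance, if the most heavily loaded PE must execute 1,300,000 cycles within 20,000 microseconds, the sketch selects 65 MHz and, from the hypothetical table, a 1.8V supply.
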
As indicated above, the operating system can reduce the supply voltage even further by tracking temperature and process variations. For example, when the chip is nominal in its characteristics, then it can be operated along line 20 of FIG. 1, which calls for only 1.5V supply when operating at 70 MHz.
Returning the discussion to FIG. 2, the programmable-frequency clock is generated using an appropriately multiplied input reference clock (line 101) via a phase lock loop frequency synthesizer circuit 110 which has a high resolution, e.g., can be altered in increments of 5 MHz. Advantageously, two clocks are generated by PLL 110 (requiring two synthesizer circuits), a Clk clock, and a Clk-L which is 1 frequency step lower than Clk when Clk is being increased. For example, in a PLL 110 unit that provides 5 MHz resolution, when Clk is being increased from 75 MHz to 80 MHz, the value of Clk-L is set to 75 MHz.
Clk-L is applied to the PEs, while Clk is applied to calibration circuit 120, which generates a supply voltage command. The supply voltage command is applied to dc-dc converter 130 followed by L-C circuit 140 to cause the combination of converter 130 and L-C circuit 140 to create the supply voltage Vdd-local, which is fed back to calibration circuit 120 via line 102. The Vdd-local supply voltage is also applied to all of the PEs (excluding perhaps the operating system PE 100).
The reason for having the frequency Clk-L lag behind the frequency Clk is that the clock frequency applied to the PEs should not be increased prior to the supply voltage being increased to accommodate the higher frequency. Otherwise, the PEs might fail to perform properly. Circuit 120 observes the level on line 102 to determine whether it corresponds to the voltage necessary to make PEs 100-104 operate properly (described below), and it also waits till the signal on line 102 is stable (following whatever ringing occurs at the output of L-C circuit 140). The signal on line 121 provides information to PE 100 (yes/no) to inform the operating system of when the supply voltage is stable. When the voltage is stable and Clk has reached the required frequency, the operating system sets Clk-L to Clk and then changes the task allocation on the PEs to correspond to that which the PEs were set up to accommodate.
FIG. 3 demonstrates the timing associated with increasing Clk, Clk-L and Vdd-local when a new task is created and the load on the multiprocessor is thus increased, and the timing associated with decreasing Clk, Clk-L and Vdd-local when the load on the multiprocessor is decreased. Specifically, it shows the system operating at 70 MHz from a 1.8V supply when the load is increased in three steps to 140 MHz. When the 2.7V supply is stable, as shown by the supply voltage plot, the new task is enabled for execution. Some time thereafter according to FIG. 3, a task completes, which reduces the load on the multiprocessor. The reduced load permits lowering the clock frequency to 100 MHz and lowering the supply voltage to 2.1V. This, too, is accommodated in steps (two steps, this time), with Clk-L preceding Clk to insure, again, that the PEs continue to operate properly while the supply voltage is decreased.
Calibration block 120 can use one of several techniques to determine the voltage required to operate the circuit at a given clock frequency. One technique is given in the Kuroda et al. article. Recognizing that each of the PEs (101-104) has a critical path which controls the ultimate speed of the PE, block 120 uses two copies of that portion of the PE circuit that contains the critical path of the PE circuit, with one of the copies being purposely designed to be just slightly slower. Both of the copies are operated from clock signal Clk and from the Vdd-local supply voltage of line 102, and that voltage is adjusted within block 120 so that, while operating at frequency Clk, the slightly slower copy fails to operate properly while the other copy does operate properly. This guarantees that the PEs are operating from a supply voltage that is "just above" the point at which they are likely to fail. Since the two critical path copies within element 120 experience the same variations in temperature as do PEs 101-104, the Vdd-local supply voltage appropriately tracks the temperature variations as well as the different operating frequency specifications.
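
The two-copy calibration idea can be illustrated with a toy Python model. Everything numeric here is an assumption made for illustration: the linear speed model in `make_replica`, the 5% slowdown of the second copy, and the 10 mV search step are not taken from this description, and a real calibration block would observe actual circuits rather than a formula.

```python
def make_replica(mhz_per_volt, threshold_mv, slowdown=1.0):
    """Return a pass/fail check for one critical-path copy.

    Toy model (assumed): the maximum clock grows linearly with the supply
    above a threshold, and `slowdown` > 1 makes the copy slightly slower.
    """
    def passes(supply_mv, clk_mhz):
        fmax_mhz = mhz_per_volt * (supply_mv - threshold_mv) / 1000.0 / slowdown
        return fmax_mhz >= clk_mhz
    return passes


def calibrate(clk_mhz, nominal, slower, v_min_mv=1000, v_max_mv=3600, step_mv=10):
    """Lowest supply (mV) at which the nominal copy works and the slower fails.

    Returns None when no such supply exists within the searched range.
    """
    for supply_mv in range(v_min_mv, v_max_mv + 1, step_mv):
        if nominal(supply_mv, clk_mhz) and not slower(supply_mv, clk_mhz):
            return supply_mv
    return None


# Hypothetical copies: the second is 5% slower than the first.
nominal_copy = make_replica(mhz_per_volt=100, threshold_mv=705)
slower_copy = make_replica(mhz_per_volt=100, threshold_mv=705, slowdown=1.05)
```

Because both copies share the same model parameters, a change that slows both equally (for example, a higher threshold standing in for a temperature shift) moves the calibrated supply accordingly, which mirrors the tracking behavior described above.
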
The FIG. 2 system uses the operating system to react to variations in the system load. As more tasks are entered into the "to-do" list, the operating system of PE 100 computes the correct way to balance the additional computational requirements and allocates the tasks to the processors. It then computes the required operating frequency.
It is noted that the frequency is gradually programmed into the system (as shown by the stepped changes in FIG. 3). This prevents excessive noise on the Vdd-local supply voltage and possible circuit failure. For example, if the system is operating at 50 MHz and it needs to operate at 75 MHz, the clock frequency is increased slowly, perhaps even as slowly as in 5 MHz increments. In addition, as indicated above, the Vdd-local supply voltage is increased ahead of increasing the frequency of the clock that operates the PEs, when increased processing capability is desired, and the clock is reduced ahead of reducing the supply voltage when reduced processing capability will suffice.
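
The ordering rule stated above (supply first when speeding up, clock first when slowing down) can be sketched as a simple planner. This is an illustrative sketch only: the function `transition_plan` and the caller-supplied `voltage_for` mapping are hypothetical, and the sketch omits the wait for a stable supply that the calibration circuit performs between steps.

```python
def transition_plan(f_now_mhz, f_target_mhz, voltage_for, step_mhz=5):
    """Return an ordered list of ("voltage", value) and ("clock", MHz) actions.

    voltage_for: hypothetical callable giving the supply needed at a clock.
    When speeding up, each step raises the supply before the PE clock.
    When slowing down, each step lowers the PE clock before the supply.
    """
    if step_mhz <= 0:
        raise ValueError("step must be positive")
    actions = []
    f = f_now_mhz
    if f_target_mhz > f_now_mhz:
        while f < f_target_mhz:
            f = min(f + step_mhz, f_target_mhz)
            actions.append(("voltage", voltage_for(f)))  # supply goes up first
            actions.append(("clock", f))                 # then the PE clock
    else:
        while f > f_target_mhz:
            f = max(f - step_mhz, f_target_mhz)
            actions.append(("clock", f))                 # PE clock comes down first
            actions.append(("voltage", voltage_for(f)))  # then the supply
    return actions
```

At every point in the returned sequence the supply is at least what the currently applied PE clock requires, which is the property the two-clock arrangement of FIG. 2 is meant to preserve.
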
Of course, Vdd-local can only be reduced so far before the circuits start to fail, at which point the operating system employs gated clocking techniques to "shut down" PEs that are not needed. Of course, the fact that supply voltage Vdd-local varies as a function of load should be accounted for in the interface between the PEs 101-104 and PE 100 (as well as in the interface between the multiprocessor chip and the "outside world"). This is accomplished with level converter 150, which is quite conventional. It basically converts between the voltage level of PEs 101-104 and the voltage level of PE 100.
The notion of adjusting operating frequency to load and adjusting supply voltage to track the operating frequency can be extended to allow each PE to have its own supply voltage. The benefit of this approach for some applications becomes apparent when it is realized that the chip-wise voltage scaling is most effective when the load of the computation can be evenly distributed across all of the PEs. In some applications, however, one may encounter tasks that cannot be partitioned into concurrent evenly-loaded threads and, therefore, some PE within the multiprocessor would require a higher operating frequency and a higher operating voltage. This would require raising the frequency and voltage of the entire multiprocessor chip.
A separate power supply for each PE in a chip overcomes this limitation by allowing the operating system to independently program the lowest operating frequency and corresponding lowest supply voltage for each PE. The architecture of such an arrangement is shown in FIG. 4. Each PE in FIG. 4 needs an independent controller that performs the functions of PE 100 (except it does not divide tasks among PEs). As shown in FIG. 4, all of the controllers are embodied in a single controller 200, which may be just another processing element of the integrated circuit that contains the other processing elements. Each processing element also requires a calibration circuit like circuit 120, and a voltage converter circuit like circuits 130 and 140. It also has a PE 200 that assigns the tasks given to the multi-processor chip of FIG. 4 among the PEs.
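
The benefit of per-PE supplies for unevenly loaded PEs can be illustrated with a rough comparison in Python. The sketch rests on several assumptions that are not taken from this description: power is modeled as f * V**2 in arbitrary units, a PE that finishes early under a shared fast clock is assumed to be clock-gated so that its average power scales with the clock rate it actually needs, and `OPERATING_POINTS` is a hypothetical table built from the frequency and voltage pairs quoted in the text.

```python
# Hypothetical operating points as (max clock in MHz, supply in volts).
OPERATING_POINTS = [(70, 1.8), (100, 2.1), (140, 2.7)]


def supply_for(freq_mhz):
    """Lowest tabulated supply that supports freq_mhz."""
    for max_mhz, volts in OPERATING_POINTS:
        if freq_mhz <= max_mhz:
            return volts
    raise ValueError("frequency exceeds the tabulated range")


def chip_wide_power(needed_mhz):
    """All PEs share the supply required by the most demanding PE."""
    volts = supply_for(max(needed_mhz))
    return sum(needed_mhz) * volts ** 2


def per_pe_power(needed_mhz):
    """Each PE runs from the lowest supply that its own clock allows."""
    return sum(f * supply_for(f) ** 2 for f in needed_mhz)


# One PE needs 140 MHz while three others need only 70 MHz each.
uneven = [140, 70, 70, 70]
ratio = per_pe_power(uneven) / chip_wide_power(uneven)
```

Under these assumptions `ratio` is two thirds, that is, individual supplies use about a third less power for this uneven load, while for an evenly balanced load the two functions return the same value.
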
It may be noted that if the frequencies at which the individual PEs operate differ from one another and from other elements within the system where the multiprocessor chip is employed, there is an issue of synchronization that must be addressed. That is, a synchronization schema must be implemented when there is a need to communicate data between PEs (or with other system elements) that operate at different frequencies. It is possible to arrange the frequencies so that the collection of tasks that are assigned to the multiprocessor is completed at a predetermined time. In such a case, the synchronization problem of the multiprocessor vis-a-vis other elements within the system where the multiprocessor is employed is minimized. However, that leaves the issue of synchronizing the exchange of data among the PEs of a multiprocessor chip.
To effect such synchronization, each PE within the FIG. 4 arrangement is connected to an arrangement comprising elements 150 and 160. Level converter 150 converts the variable voltage swings of the PEs to a fixed level swing, and network 160 resolves the issue of different clock domains.
The principles disclosed above for a multiprocessor are extendible to other system arrangements. This includes systems with a plurality of separate processor elements that operate at different frequencies and operating voltages, as well as components that are not typically thought of as processor elements. For example, there is a current often-used practice to maintain program code and data for different applications of a personal computer in a fast memory. As each new application is called, more information is stored in the fast memory, until that memory is filled. Thereafter, when a new application is called, some of the information in the fast memory is discarded, some other information is placed in the slower hard drive, and the released memory is populated with the new application. It is possible to anticipate that information stored in the fast memory is so old as to be unlikely to be accessed before a new application is called. When so anticipated, some of the fast memory can be released (storing some of the data that needs to be remembered) at a leisurely pace. That is, a lower clock frequency can be employed in connection with the fast memory and the hard drive, with a correspondingly lower supply voltage, resulting in an overall power saving in both the memory's operation and in the operation of the hard drive.
The above description illustrated the principles of this invention, but it should be realized that a skilled artisan may easily make various modifications and improvements that are within the scope of this invention as defined by the appended claims. For example, in one of the embodiments disclosed above, all of the PEs in a multi-processor chip are subjected to a single controlled supply voltage. In another embodiment disclosed above, each of the PEs in a multi-processor chip is subjected to its own, individually controlled, supply voltage. It should be realized, however, that a middle ground is also possible; i.e., the PEs of a multi-processor chip can be divided into groups, and each group of PEs can be arranged to operate from its own controlled supply voltage. To cite another example, the FIG. 2 embodiment employs two almost identical critical path circuits to establish the minimum supply voltage. Alternatively, the voltage may be set in accordance with a preset frequency-voltage relationship that is not unlike the one depicted in FIG. 1.
It should also be noted that level converter 150 is interposed in FIG. 2 between PE 100 and the other PEs because PE 100 is operating off Vdd. PE 100 can also be operated off Vdd-local, in which case the level converter is interposed between PE 100 and the input/output port of the FIG. 2 circuit that interacts with PE 100.
It should further be noted that the power supply circuit need not have any elements outside the circuit itself (as depicted in FIG. 2). A skilled artisan would be aware that circuit designs exist that can be manufactured wholly within an integrated circuit.
Yet another modification may be implemented by discarding the two-step application of voltages and frequencies of FIG. 3 when appropriate timing conditions are met.
We claim:
1. A method executed within a system for controlling power consumption of a sub-circuit of said system comprising the steps of:
    ascertaining time allotted for carrying out an assigned task;
    determining a lowest frequency at which or above which the sub-circuit must operate in order to complete execution of the assigned task within the allotted time; and
    based on characteristics of the sub-circuit, setting a supply voltage that is applied to the sub-circuit to a lowest level that insures proper operation of the sub-circuit at the determined frequency.
2. The method of claim 1 further comprising the step of executing said task after said voltage is set, and a frequency at which or above which the sub-circuit operates is set to said determined frequency.
3. The method of claim 1, carried out in a multiprocessor sub-circuit, wherein said assigned task comprises a plurality of sub-tasks, the method further comprising the step of
    apportioning said sub-tasks among processors of said multiprocessor sub-circuit, resulting in one of said processors carrying the largest load of sub-tasks processing, compared to the sub-tasks processing load of others of said processors, where
        said step of apportioning is executed prior to said step of determining, and
        said step of determining ascertains the lowest frequency that, when employed at the processor carrying the largest load of sub-tasks processing, is sufficient to complete its assigned sub-tasks processing within the allotted time.
4. The method of claim 3 further comprising the step of assigning the subtasks to said processors in accordance with said apportioning.
5. The method of claim 1, wherein said steps are executed in a multiprocessor integrated circuit.
6. The method of claim 5 where said steps are executed in said processor under control of a real-time operating system.
7. The method of claim 6 wherein assigned task comprises a plurality of sub-tasks and said method further comprising a step, executed in said processor under control of said real-time operating system, of apportioning said sub-tasks among processors of said multiprocessor sub-circuit.
8. The method of claim 1 wherein said steps are executed in a circuit that comprises distinct, cooperating, processing units.
9. The method of claim 1 wherein the determined frequency assumes values that are multiples of a preset frequency increment.
10. The method of claim 1 wherein the step of setting a supply voltage is sensitive to an operational state of two circuits that are identical except that one is slower than the other.
11. The method of claim 10, carried out in an integrated circuit, wherein said two circuits are within said integrated circuit.
12. The method of claim 10 wherein the step of setting a supply voltage adjusts the supply voltage to cause the slower of the two circuits to be at a failed operational state, and the other of the two circuits to be at a working operational state.
13. The method of claim 2 further comprising the steps of:
    determining a new lowest frequency, when a new task is assigned, at which or above which the sub-circuit must operate in order to complete execution of the assigned task within the allotted time;
    comparing the lowest frequency to the new lowest frequency to determine whether a new operating frequency should be set for said sub-circuit;
    when said step of comparing determines that the new lowest frequency is lower than said lowest frequency, reducing the frequency at which said sub-circuit is set to operate and, thereafter, reducing the supply voltage that is applied to the sub-circuit; and
    when said step of comparing determines that the new lowest frequency must be higher than said lowest frequency, increasing the supply voltage that is applied to the sub-circuit and, thereafter, increasing the frequency at which said sub-circuit is set to operate to said new lowest frequency.
14. The method of claim 1 wherein the characteristics of the sub-circuit are expressed as a relationship between frequency and supply voltage.
15. The method of claim 1 further comprising the step of converting output levels of said sub-circuit to standardized levels.
16. The method of claim 1 further comprising the step of synchronizing output signals of said sub-circuit to a timing signal of said system.
17. A circuit that includes a processor, comprising:
    a controller, responsive to an applied task and to a specification of a time interval, where duration of at most said time interval is necessary for execution of said applied task, for developing a frequency of operation for said processor that is the lowest frequency of operation that allows completion of said applied task within said time interval;
    a calibration circuit responsive to said controller for directin
