2017 IEEE 35th International Conference on Computer Design

1063-6404/17 $31.00 © 2017 IEEE
DOI 10.1109/ICCD.2017.28

I. INTRODUCTION

… voltage/frequency [2,3], or core type [4,8,11,17,18,19,22], and they have ignored the interplay among all these parameters. This study indicates that these parameters individually, while important, do not make a truly optimum configuration to achieve the best energy-efficiency on a CCA. The best configuration for a multithreaded application can be found effectively only when these parameters are jointly optimized. Fig. 1 illustrates the tuning parameters influencing the scheduling decision in a CCA; recent prior work, as well as the contribution of our work, is also shown in this figure.

In this paper, through a methodical investigation of power and performance, and a comprehensive system- and microarchitectural-level analysis, we first characterize multithreaded applications on a CCA to understand the power and performance trade-offs offered by the various configuration parameters and to find how the interplay of these parameters affects the energy-efficiency. Our study is focused on a CCA where many little cores (base) can be configured into a few big cores (composed) and vice versa. The experimental results support that there is no unique solution for the best configuration for different applications. Given the dispersed pattern of optimum configurations, we develop various machine learning models to predict the energy-efficiency of parallel regions and guide scheduling and the fine-tuning of parameters to maximize the energy-efficiency. As the behavior of applications changes at run-time, we apply our prediction and tuning method at the fine-grained level of individual parallel regions within an application.

We use five well-known machine learning models to build predictors based on the knowledge extracted from an extensive set of hardware performance data, which is a good representative of application behavior for each parallel region, within the training phase. The models are then used at run-time to predict the optimal processor configuration for each parallel region of a multithreaded workload to maximize the energy-efficiency. We analyze the proposed machine learning-based models in terms of their prediction accuracy, power overhead and implementation cost to understand their cost effectiveness.

II. MOTIVATION AND OVERVIEW OF OUR APPROACH

Multithreaded applications are comprised of a number of parallel regions which are separated by serial regions. In this work, we refer to these parallel regions as Regions of Interest (ROIs). In a homogeneous multi-core architecture and under conventional scheduling, all of these regions are processed on the same core type, at the same voltage and frequency, and with the same number of threads, though not all regions may have the same preferences. Some of these ROIs may benefit from different configurations than the others to obtain the maximum energy-efficiency. As mentioned earlier, the main challenge for scheduling is to effectively tune system, architecture and application level parameters in a CCA for the entire application as well as the intermediate parallel regions to achieve maximum energy-efficiency. In this work, similar to [1,2], we assume that big cores (composed) are constructed by composing multiple little (base) cores.

Fig. 2 illustrates an overview of the optimal configurations for an application selected from the SPLASH-2 multithreaded benchmark suite with three parallel regions. In each ROI, the best possible configuration for core type, operating frequency and thread count that results in maximum energy-efficiency is specified. As can be seen, ROI1 needs to be run on the base core with 2.4 GHz frequency and 8 threads, whereas for ROI2, in order to achieve the best energy-efficiency, we need to compose two small cores to make a big core. Moreover, the optimal operating frequency increases to 2.8 GHz. The ability to accurately predict the optimal application, system and even microarchitecture parameters, and to adapt them to achieve the maximum energy-efficiency within the different parallel regions of an application, is the main motivation for this work. We develop several machine learning-based approaches which are able to predict the optimal setting of the tuning parameters and change them to suit the specific requirement of each parallel region within the multithreaded application.

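To make the per-region tuning concrete, here is a minimal Python sketch (ours, not the authors' code; the class and field names are hypothetical) that encodes the Fig. 2 example, where each ROI carries its own core type, operating frequency and thread count:

```python
from dataclasses import dataclass

@dataclass
class ROIConfig:
    """Tuning knobs chosen per parallel region (ROI) on a composite cores architecture."""
    core_type: str   # "base" (little) or "composed" (big)
    freq_ghz: float  # operating frequency in GHz
    threads: int     # number of running threads

# Optimal per-ROI configurations from the Fig. 2 example.
optimal = {
    "ROI1": ROIConfig("base", 2.4, 8),
    "ROI2": ROIConfig("composed", 2.8, 8),
    "ROI3": ROIConfig("composed", 2.4, 8),
}

for roi, cfg in optimal.items():
    print(f"{roi}: {cfg.core_type} core @ {cfg.freq_ghz} GHz, {cfg.threads} threads")
```
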
Fig. 3 depicts our three-stage approach for predicting the right core type and application configuration when running a multithreaded application on a composite cores architecture. Our machine learning-based approach begins by extracting microarchitectural data (referred to as feature extraction) from the different parallel regions of the application to characterize the multithreaded workload. These data (or features) include hardware performance counter data, which are representative of application behavior at run-time. Next, a machine learning-based predictor (that is built off-line) takes in these features and predicts the best configuration settings for a given parallel region. For this purpose, we have implemented five well-known machine learning algorithms and compare them in terms of accuracy, power and area overhead to find the most effective learning model, which yields optimized energy-efficiency. Finally, we configure the processor and schedule the application to run on the predicted configuration.

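The following sketch illustrates the three stages in code. It is our own illustration rather than the paper's implementation: the counter names, the choice of a scikit-learn decision tree as the off-line model, and the configuration-label format are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stands in for any of the five models

# Hypothetical hardware performance counters used as features.
FEATURES = ["ipc", "l2_miss_rate", "branch_mispred_rate", "mem_bandwidth"]

def extract_features(counters: dict) -> np.ndarray:
    """Stage 1: turn raw per-ROI counter readings into a feature vector."""
    return np.array([[counters[name] for name in FEATURES]])

def predict_config(model: DecisionTreeClassifier, counters: dict) -> str:
    """Stage 2: the off-line-trained predictor picks a configuration label."""
    return model.predict(extract_features(counters))[0]

def schedule_roi(roi_id: str, label: str) -> None:
    """Stage 3: reconfigure and run the ROI (placeholder for the real scheduler)."""
    core_type, freq, threads = label.split("/")   # e.g., "composed/2.8GHz/8"
    print(f"{roi_id}: schedule on {core_type} core at {freq} with {threads} threads")
```
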
In this work, the metric that we use to characterize energy-efficiency is the Energy Delay Product (EDP), which aims to balance performance and power consumption. We construct and compare five predictors using the machine learning algorithms described in Section V to guide the scheduling in a CCA.

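As a reminder of the arithmetic, EDP is energy multiplied by delay, i.e., average power times the square of execution time. A tiny worked example of ours, with made-up numbers rather than measurements from the paper:

```python
def edp(avg_power_w: float, exec_time_s: float) -> float:
    """Energy Delay Product: (P * t) * t = P * t^2, in joule-seconds."""
    return avg_power_w * exec_time_s ** 2

# A slower, lower-power setting vs. a faster, higher-power one.
print(edp(2.0, 1.0))  # 2.00 J*s
print(edp(3.0, 0.7))  # 1.47 J*s -> lower EDP despite drawing more power
```
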
[Fig. 3 block diagram: Multithreaded Applications -> Feature Extraction -> Feature Values -> Predictor -> Scheduling (Core Type, Freq, #Threads) -> Processor Config.]
Fig. 3. An overview of our approach for predicting the optimal configuration and scheduling the multithreaded application.

[Fig. 2 timeline (Optimal Configurations): ROI1: Base Core, 2.4 GHz, 8 threads; ROI2: Composed Core, 2.8 GHz, 8 threads; ROI3: Composed Core, 2.4 GHz, 8 threads.]
Fig. 2. Optimal configurations in different parallel regions of an application.

[Fig. 1 content: Energy Efficiency as influenced by the tuning parameters Core Type, Voltage/Frequency, Thread Count, and Application Behavior (I/O, CPU, ...).
Prior Work:
o Thread Count: [5, 9, 21, 22]
o Core Type: [4, 8, 11, 17, 18, 19, 22]
o Thread Count & Core Type: [9, 22]
o Voltage/Frequency & Core Type: [2, 3]
o Application Behavior: [2, 3, 4, 5, 9, 11, ...]
This Work:
o Thread Count & Core Type & Voltage/Frequency & Application Behavior]
Fig. 1. Tuning parameters influencing energy-efficiency in the composite cores architecture, and prior work on scheduling using these tuning parameters.

TABLE I. ARCHITECTURAL SPECIFICATION

Microarch. Parameter     Base               Composed
Issue-Commit width       2                  4
INT instruction queue    16 entries         32 entries
FP instruction queue     16 entries         32 entries
Reorder buffer           32 entries         64 entries
Branch penalty           7 cyc              14 cyc
iL1-dL1 Cache            16KB/4-way/2cyc    32KB/4-way/2cyc
L2 Cache                 4MB/8-way/32cyc    4MB/8-way/32cyc

[Fig. 4 diagram: Little (Base) cores grouped into a Big (Composed) core.]
Fig. 4. Conceptual structure of a four-core composite architecture.

III. EXPERIMENTAL SETUP AND METHODOLOGY

This section provides the details of our experimental setup. We use Sniper [13] version 6.1, a parallel, high-speed and cycle-accurate x86 simulator for multi-core systems. McPAT [15] is integrated with Sniper and is used to obtain power consumption results. We study the SPLASH-2 [14] and PARSEC [26] multithreaded benchmark suites for simulation. For architectural simulation, we modeled a heterogeneous composite cores architecture based on the recently proposed work in [17, 18]. Our study focuses on a CCA where two little cores (base) can be configured into one big core (composed) and vice versa. Fig. 4 provides a conceptual overview of a four-core CCA. In this paper, we investigate two baseline heterogeneous CCAs which consist of multiple base and composed cores: 1) 8base/4comp, and 2) 4base/2comp. It is also important to note that for benchmark simulation we applied the binding (one-thread-per-core) model with #threads = #cores to maximize the performance of multithreaded applications [4, 5].

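To give a sense of the resulting search space, the sketch below (ours, not the authors' tooling) enumerates candidate configurations per parallel region, assuming the two core types, the four voltage/frequency pairs swept in Section IV, and power-of-two thread counts under the one-thread-per-core binding model:

```python
from itertools import product

core_types = ["base", "composed"]
vf_settings = [(1.6, 0.7), (2.0, 0.8), (2.4, 0.9), (2.8, 1.0)]  # (GHz, V) pairs from Section IV.A
thread_counts = [1, 2, 4, 8]  # assumption: powers of two up to 8 threads

# One-thread-per-core binding: running t threads occupies t cores.
configs = [
    {"core": c, "freq_ghz": f, "vdd_v": v, "threads": t, "cores_used": t}
    for c, (f, v), t in product(core_types, vf_settings, thread_counts)
]
print(len(configs), "candidate configurations per parallel region")  # 2 * 4 * 4 = 32
```
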
Table I shows the microarchitectural configuration of the base (little) and composed (big) cores of the CCA in our experiment. We collect performance counter data on each architecture for characterization and to drive the scheduling and mapping algorithms. We use these data to extract and evaluate the actual behavior of applications (I/O, CPU or memory intensive) for predicting energy-efficiency and assisting the scheduling decision.

IV. CHARACTERIZATION RESULTS

In this section, we evaluate the application performance and energy-efficiency sensitivity to the tuning parameters of operating frequency, number of running threads, and the choice of microarchitecture (base vs. composed) in the heterogeneous composite cores architecture. Note that the entire set of benchmark analysis results is quite extensive. Therefore, due to space limitations, we only present the results for a limited number of representative benchmarks, shown in Fig. 5. This figure depicts the overall performance in terms of execution time (represented as a bar graph) and the EDP results (represented as a line graph) for four different multithreaded benchmarks across different core types, frequencies and numbers of threads.

A. Frequency Sensitivity

For this analysis, all benchmarks were simulated using a baseline composed core running with only a single thread. The operating frequency is swept from 1.6 GHz to 2.8 GHz with a step of 400 MHz, and the voltage is changed between 0.7, 0.8, 0.9, and 1 V, respectively. As can be seen in Fig. 5, some benchmarks are very sensitive to changing the frequency. For instance, in fmm and cholesky, reducing the frequency almost linearly reduces the overall performance. Overall, as expected, as the frequency increases, the performance increases accordingly. The EDP results show that a higher frequency results in higher EDP. Interestingly, increasing the number of threads reduces the sensitivity to frequency. In other words, increasing the number of running threads increases the performance gain due to parallelization. Consequently, as the number of threads increases, the overall performance is influenced more by the speedup gained from parallelism than by operating at a higher frequency. Moreover, the results show that the base core is more sensitive to frequency scaling than the composed cores. This is also an interesting observation, as the composed core has a large pipeline, allowing it to tolerate the performance cost due to alterations in the access latency to the cache subsystem as a result of frequency scaling. Note that changing the clock frequency changes the number of cycles it takes for the processor to communicate with the cache.

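This frequency/EDP trend can be sanity-checked with a deliberately simplified model of our own: assume a fully compute-bound region (execution time proportional to 1/f) and dynamic power proportional to V^2*f, ignoring static power and memory effects. Under those assumptions EDP scales as V^2/f, which rises across the swept voltage/frequency pairs:

```python
# Relative EDP under the stated assumptions: t ~ 1/f, P ~ V^2 * f  =>  EDP = P * t^2 ~ V^2 / f.
vf_points = [(1.6, 0.7), (2.0, 0.8), (2.4, 0.9), (2.8, 1.0)]  # (GHz, V)

for f, v in vf_points:
    rel_edp = (v * v * f) * (1.0 / f) ** 2   # = V^2 / f
    print(f"{f:.1f} GHz @ {v:.1f} V: relative EDP = {rel_edp:.3f}")
# Prints 0.306, 0.320, 0.338, 0.357 -- monotonically increasing, consistent with
# the observation that higher frequency yields higher EDP.
```
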
B. Core Type Sensitivity

In this section, the results are reported for a baseline configuration with a core running a single thread at the highest frequency of 2.8 GHz and an operating voltage of 1 V. The changing parameter is the core type, which varies between a base (little) core and a composed (big) core architecture. Core type demonstrates constant behavior with regard to EDP. As shown in Fig. 5, there is a clear gap between the big composed and little base cores (at Thread1 and F2.8, i.e., a single thread at 2.8 GHz), with the composed core having lower EDP. In these cases, the performance benefits of the composed core outweigh the energy savings of the base core.

C. Thread Count Sensitivity

Finally, each benchmark is simulated while varying the number of threads. In this step, each simulation was performed at the same frequency of 2.8 GHz and an operating voltage of 1 V, while changing the number of threads from 1 to 8. As shown in Fig. 5, increasing the thread count leads to better performance at the cost of higher power. Moreover, there is a large gap between the EDP values of the base and composed cores at lower numbers of threads. In particular, we observe that by increasing the application thread count, the corresponding gap between the different core types diminishes, making the base core competitive with the composed core in terms of EDP.

D. Joint Analysis (Core Type, Frequency, Thread Count)

To understand the interplay among the various tuning parameters and find the optimum configuration for maximizing the energy-efficiency, in this section all permutations of the parameters were simulated. We test four voltage/frequency settings on two core types and execute each multithreaded …