Homayoun
`
`Reference 9
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2121, p. 1
`
`

`

`Amdahl’s Law in the Multicore Era
`
`Mark D. Hill and Michael R. Marty
`
`Everyone knows Amdahl's Law, but quickly forgets it.
`-Dr. Thomas Puzak, IBM, 2007
`
TABLE 1. Upper Bound on Speedup, f=0.99

# Base Core Equivalents   Base Amdahl   Symmetric   Asymmetric   Dynamic
16                        14            14          14           < 16
64                        39            39          49           < 60
256                       72            80          166          < 223
1024                      91            161         531          < 782
`
execution. The right-most three columns show the larger upper
bounds on speedup enabled using richer cores deployed in
symmetric, asymmetric, and dynamic multicore designs.
`
`Implications of our work include:
• Not surprisingly, researchers and architects should aggressively attack serial bottlenecks in the multicore era.
• Increasing core performance, even if it appears locally inefficient, can be globally efficient by reducing the idle time of the rest of the chip's resources.
• Chips with more resources tend to encourage designs with richer cores.
• Asymmetric multicore designs offer greater potential speedup than symmetric designs, provided challenges (e.g., scheduling) can be addressed.
• Dynamic designs, which temporarily harness cores together to speed sequential execution, have the potential to achieve the best of both worlds.
`Overall, we show that Amdahl’s Law beckons multicore
`designers to view performance of the entire chip rather than
`zeroing in on core efficiencies. In the following sections, we
`first review Amdahl’s Law. We then present simple hardware
`models for symmetric, asymmetric, and dynamic multicore
`chips.
`
`Amdahl's Law Background
Most computer scientists learned Amdahl's Law in school
`[5]. Let speedup be the original execution time divided by an
`enhanced execution time. The modern version of Amdahl's
`
`Abstract
We apply Amdahl's Law to multicore chips using symmetric cores, asymmetric cores, and dynamic techniques that allow cores to work together on sequential execution. To Amdahl's simple software model, we add a simple hardware model based on fixed chip resources.
`A key result we find is that, even as we enter the multicore
`era, researchers should still seek methods of speeding sequen-
`tial execution. Moreover, methods that appear locally ineffi-
`cient (e.g., tripling sequential performance with a 9x resource
`cost) can still be globally efficient as they reduce the sequen-
`tial phase when the rest of the chip’s resources are idle.
`To reviewers: This paper’s accessible form is between a
`research contribution and a perspective. It seeks to stimulate
`discussion, controversy, and future work. In addition, it seeks
`to temper the current pendulum swing from the past’s under-
`emphasis on parallel research to a future with too little
`sequential research.
`
`Today we are at an inflection point in the computing land-
`scape as we enter the multicore era. All computing vendors
`have announced chips with multiple processor cores. More-
`over, vendor roadmaps promise to repeatedly double the num-
`ber of cores per chip. These future chips are variously called
`chip multiprocessors, multicore chips, and many-core chips.
`Designers of multicore chips must subdue more degrees of
`freedom than single-core designs. Questions include: How
`many cores? Should cores use simple pipelines or powerful
`multi-issue ones? Should cores use the same or different
`micro-architectures? In addition, designers must concurrently
`manage power from both dynamic and static sources.
While answers to these questions are challenges for today's multicore chips with 2-8 cores, they will get much more challenging in the future. Sources as varied as Intel and Berkeley predict a hundred [6] if not a thousand cores [2].
It is our thesis that Amdahl's Law has important consequences for the future of our multicore era. Since most of us learned Amdahl's Law in school, all of our points are “known” at some level. Our goal is to ensure we remember their implications and avoid the pitfalls that Puzak fears.
Table 1 foreshadows the results we develop for applications that are 99% parallelizable. For varying numbers of base cores, the second column gives the upper bounds on speedup as predicted by Amdahl's Law. In this paper, we develop a
`simple hardware model that reflects potential tradeoffs in
`devoting chip resources towards either parallel or sequential
`

`Architects should always increase core resources when
`perf(r) > r, because doing so speeds up both sequential and par-
`allel execution. When perf(r) < r, however, the tradeoff begins:
`increasing core performance aids sequential execution, but hurts
`parallel execution.
Our equations allow perf(r) to be an arbitrary function, but all the graphs below assume $\mathrm{perf}(r) = \sqrt{r}$. In other words, we assume efforts that devote r BCE resources will result in performance $\sqrt{r}$. Thus, architectures can double performance at a cost of 4 BCEs, triple it for 9 BCEs, etc. We tried other similar functions, e.g., $\sqrt[1.5]{r}$, but found no important changes to our results.
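As a quick check of this cost assumption, the sketch below evaluates perf(r) at a few resource budgets. The names `perf_sqrt` and `perf_root` are illustrative only, and `perf_root` is one reading of the alternative 1.5-th-root function mentioned in the text, not a definition the text spells out.

```python
import math

def perf_sqrt(r: float) -> float:
    """Sequential performance of a core built from r BCEs, assuming perf(r) = sqrt(r)."""
    return math.sqrt(r)

def perf_root(r: float, k: float = 1.5) -> float:
    """Assumed alternative model: the k-th root of r (k = 1.5 by default)."""
    return r ** (1.0 / k)

for r in (1, 4, 9, 16):
    # Columns: BCEs spent, sqrt model, 1.5-th-root model
    print(r, perf_sqrt(r), round(perf_root(r), 2))
```

Under the square-root model, performance always stays at or below the resources spent (perf(r) <= r for r >= 1), which is the regime where the sequential/parallel tradeoff applies.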
`
`Symmetric Multicore Chips
`A symmetric multicore chip requires that all its cores have
`the same cost. A symmetric multicore chip with a resource bud-
`get of n = 64 BCEs, for example, can support 64 cores of 1 BCE
`each, 16 cores of 4 BCEs each, or, in general, n/r cores of r
`BCEs each (rounded down to an integer number of cores). Fig-
`ures 1 and 2 show cartoons of two possible symmetric multicore
`chips for n = 16. The figures illustrate area, not power, as the
`chip’s limiting resource and omit important structures such as
`memory interfaces, shared caches, and interconnects.
`Under Amdahl's Law, the speedup of a symmetric multicore
`chip (relative to using one single-BCE core) depends on the soft-
`ware fraction that is parallelizable (f), total chip resources in
`BCEs (n), and the BCE resources (r) devoted to increase the per-
`formance of each core. The chip uses one core to execute
`sequentially at performance perf(r). It uses all n/r cores to exe-
`cute in parallel at performance perf(r)*n/r. Overall, we get:
`
\[
\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f \cdot r}{\mathrm{perf}(r) \cdot n}}
\]
To understand this equation, let's begin with the upper-left graph of Figure 4. It assumes a symmetric multicore chip of n = 16 BCEs and $\mathrm{perf}(r) = \sqrt{r}$. The x-axis gives resources used to increase performance of each core: a value of 1 says the chip has 16 base cores, while 16 uses all resources for a single core. Lines assume different values for the fraction parallel (f=0.5, 0.9, ..., 0.999). The y-axis gives the speedup of the symmetric multicore chip (relative to running on one single-BCE base core). The maximum speedup for f=0.9, for example, is 6.7 using 8 cores of cost 2 BCEs each. The remaining left-hand graphs give speedups for symmetric multicore chips with chip resources of n = 64, 256, and 1024 BCEs.
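The symmetric speedup can be evaluated directly to see where the quoted 6.7 comes from. This is a minimal sketch that assumes perf(r) = sqrt(r) as in the graphs; the function names are illustrative, and the sweep over powers of two is a choice made here, not part of the model.

```python
import math

def perf(r: float) -> float:
    # Assumption used for the graphs: performance grows as sqrt(r).
    return math.sqrt(r)

def speedup_symmetric(f: float, n: float, r: float) -> float:
    """Symmetric-chip speedup with n/r cores of r BCEs each."""
    sequential = (1.0 - f) / perf(r)      # one core runs the serial fraction
    parallel = (f * r) / (perf(r) * n)    # all n/r cores run the parallel fraction
    return 1.0 / (sequential + parallel)

# Sweep r for a chip of n = 16 BCEs and f = 0.9
for r in (1, 2, 4, 8, 16):
    print(r, round(speedup_symmetric(0.9, 16, r), 2))
```

In this sweep the largest value occurs at r = 2 (eight 2-BCE cores), and it rounds to 6.7.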
`Result 1: Amdahl’s Law applies to multicore chips, as achieving
`good speedups requires f’s that are very near 1. Thus, finding
`parallelism is critical.
`Implication 1: Researchers should target increasing f via archi-
`tectural support, compiler techniques, programming model
`improvements, etc.
Implication 1 is both most obvious and most important. Recall, however, that speedups much less than n can still be cost-effective (see footnote 1).
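For the square-root assumption used in the graphs, the best per-core budget of a symmetric chip can be worked out in closed form. The derivation below is a sketch that treats r as a continuous variable; the symbols D(r) and r* are introduced here for convenience and do not appear in the original text.

```latex
% Assumption: perf(r) = \sqrt{r}, with r treated as continuous.
\begin{align*}
\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r)
  &= \frac{1}{D(r)}, \qquad
  D(r) = \frac{1 - f}{\sqrt{r}} + \frac{f \sqrt{r}}{n} \\
\frac{dD}{dr}
  &= -\frac{1 - f}{2\, r^{3/2}} + \frac{f}{2 n \sqrt{r}} = 0
  \;\Longrightarrow\; r^{*} = \frac{n (1 - f)}{f} \\
\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r^{*})
  &= \frac{1}{2} \sqrt{\frac{n}{f (1 - f)}}
\end{align*}
```

For n = 64 and f = 0.9 this gives r* of about 7.1 and a speedup of about 13.3. The closed form applies only when 1 <= r* <= n, that is, for 1/2 <= f <= n/(n+1); outside that range the best choice sits at an endpoint (one big core or n base cores).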
Result 2: Using more BCEs per core, r > 1, can be optimal, even when performance grows by only $\sqrt{r}$. For a given f, the maxi-
`
`Law states that if one enhances a fraction f of a computation by a
`speedup S, then the overall speedup is:
\[
\mathrm{Speedup}_{\mathrm{enhanced}}(f, S) = \frac{1}{(1 - f) + \dfrac{f}{S}}
\]
`
`Amdahl's Law applies broadly and has important corollaries
`such as:
• Attack the common case: If f is small, your optimizations will have little effect.
• But the aspects you ignore also limit speedup: As S goes to infinity, Speedup goes to 1/(1-f).
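Both corollaries can be checked numerically. The following is a minimal sketch; the function names are illustrative and the sample values of f and S are arbitrary.

```python
def speedup_enhanced(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)

def speedup_limit(f: float) -> float:
    """Limit of speedup_enhanced(f, s) as s grows without bound: 1 / (1 - f)."""
    return 1.0 / (1.0 - f)

# Small f: even a large S barely helps. Large f: the untouched 1-f still caps the gain.
for f in (0.1, 0.5, 0.9):
    print(f, round(speedup_enhanced(f, 10), 3), round(speedup_limit(f), 3))
```

With f = 0.1, no value of S can push the overall speedup past roughly 1.11, while with f = 0.9 the ceiling is 10 regardless of how large S becomes.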
`Four decades ago, Amdahl originally defined his law for the
`special case of using n processors (cores today) in parallel when
`he argued for the Validity of the Single Processor Approach to
`Achieving Large Scale Computing Capabilities [1]. He simplisti-
`cally assumed that a fraction f of a program's execution time was
`infinitely parallelizable with no overhead, while the remaining
`fraction, 1-f, was totally sequential. Without presenting an equa-
`tion, he noted that the speedup on n processors is governed by:
\[
\mathrm{Speedup}_{\mathrm{parallel}}(f, n) = \frac{1}{(1 - f) + \dfrac{f}{n}}
\]
`
`Finally, he argued that typical values of 1-f were large
`enough to favor single processors.
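To see how quickly the sequential fraction dominates, the n-processor form can be evaluated for a program that is 99% parallelizable. This sketch uses an illustrative function name, and the core counts are the ones listed in Table 1.

```python
def speedup_parallel(f: float, n: int) -> float:
    """Amdahl speedup on n processors when a fraction f is perfectly parallel."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with only 1% sequential work, speedup falls far short of n as n grows.
for n in (16, 64, 256, 1024):
    print(n, round(speedup_parallel(0.99, n)))
```

The rounded results correspond to the "Base Amdahl" column of Table 1.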
While Amdahl's arguments were simple, they held, and mainframes with one or a few processors dominated the computing landscape. They also largely held in the minicomputer and personal computer eras that followed. As recent technology trends usher us into the multicore era, we investigate whether Amdahl's Law is still relevant.
`
`A Simple Cost Model for Multicore Chips
`To apply Amdahl's Law to a multicore chip, we need a cost
`model for the number and performance of cores that the chip can
`support. Herein we develop a simple hardware model in the
`spirit of Amdahl's simple software model.
`We first assume that a multicore chip of given size and tech-
`nology generation can contain at most n base core equivalents
`(BCEs), where a single BCE implements the baseline core. This
`limit comes from the resources a chip designer is willing to
`devote to processor cores (with L1 caches). It does not include
`chip resources expended on shared caches, interconnections,
`memory controllers, etc. Rather we simplistically assume that
`these non-processor resources are roughly constant in the multi-
`core variations we consider.
`We are agnostic on what limits a chip to n BCEs. It may be
`power; it may be area; it may be some combination of power,
`area, and other factors.
`We second assume that (micro-) architects have techniques
`for using the resources of multiple BCEs to create a richer core
`with greater sequential performance. Let the performance of a
`single-BCE core be 1. We specifically assume that architects can
`expend the resources of r BCEs to create a rich core with
`sequential performance perf(r).
`

`Note: These cartoons omit important structures and assume area, not power, is a chip’s limiting resource.
`
Figure 1. Symmetric: Sixteen 1-BCE cores
Figure 2. Symmetric: Four 4-BCE cores
Figure 3. Asymmetric: One 4-BCE core, 12 1-BCE cores

mum speedup can occur at 1 big core, n base cores, or with an intermediate number of middle-sized cores. Consider n=16. With f=0.5, one core (of cost 16 BCEs) gives the best speedup of 4. With f=0.975, 16 single-BCE cores provide a speedup of 11.6. With n=64 and f=0.9, 9 cores of 7.1 BCEs each provide an overall speedup of 13.3.
Implication 2: Researchers should seek methods of increasing core performance even at a high cost.
Result 3: Moving to denser chips increases the likelihood that cores should be non-minimal. Even at f=0.99, minimal base cores are optimal at chip sizes n=16 and 64, but more powerful cores help at n=256 and 1024.
Implication 3: Even as Moore's Law allows larger chips, researchers should look for ways to design more powerful cores.

Asymmetric Multicore Chips
An alternative to a symmetric multicore chip is an asymmetric multicore chip where one or more cores are more powerful than the others [3, 8, 9, 12]. With the simplistic assumptions of Amdahl's Law, it makes most sense to devote extra resources to increase the capability of only one core, as shown in Figure 3. With a resource budget of n=64 BCEs, for example, an asymmetric multicore chip can have one 4-BCE core and 60 1-BCE cores, one 9-BCE core and 55 1-BCE cores, etc. In general, the chip can have 1+n-r cores since the single larger core uses r resources and leaves n-r resources for the 1-BCE cores.
Amdahl's Law has a different effect on an asymmetric multicore chip. This chip uses the one core with more resources to execute sequentially at performance perf(r). In the parallel fraction, however, it gets performance perf(r) from the large core and performance 1 from each of the n-r base cores. Overall, we get:

\[
\mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}
\]

The asymmetric speedup curves are shown in Figure 4. These curves are markedly different from the corresponding symmetric speedups. The symmetric curves typically show either immediate performance improvement or performance loss as the chip uses more powerful cores, depending on the level of parallelism. In contrast, asymmetric chips reach a maximum speedup in the middle of the extremes.
Result 4: Asymmetric multicore chips can offer maximum speedups that are much greater than symmetric multicore chips (and never worse). For f=0.975 and n=256, for example, the best asymmetric speedup is 125.0 whereas the best symmetric speedup is 51.2. For n=1024 and the same f, the difference increases to 364.5 versus 102.5. This result follows from Amdahl's idealized software assumptions, wherein software is either completely sequential or completely parallel.
Implication 4: Researchers should continue to investigate asymmetric multicore chips. However, real chips must deal with many challenges, such as scheduling different phases of parallelism with real overheads. Furthermore, chips may need multiple larger cores for multiprogramming and workloads that exhibit overheads not captured by Amdahl's model.
Result 5: Denser multicore chips increase both the speedup benefit of going asymmetric (see above) and the optimal performance of the single large core. For f=0.975 and n=1024, for example, best speedup is obtained with one core of 345 BCEs and 679 single-BCE cores.
Implication 5: Researchers should investigate methods of speeding sequential performance even if they appear locally inefficient, e.g., $\mathrm{perf}(r) = \sqrt{r}$. This is because these methods can be globally efficient as they reduce the sequential phase when the chip's other n-r cores are idle.
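The asymmetric formula can be evaluated directly to reproduce the figures quoted for f=0.975. This is a minimal sketch assuming perf(r) = sqrt(r); the function names and the search over integer r are choices made here, not part of the original model.

```python
import math

def perf(r: float) -> float:
    # Assumption used for the graphs: performance grows as sqrt(r).
    return math.sqrt(r)

def speedup_asymmetric(f: float, n: int, r: float) -> float:
    """One r-BCE core plus n-r single-BCE cores."""
    sequential = (1.0 - f) / perf(r)       # the large core runs the serial fraction
    parallel = f / (perf(r) + n - r)       # large core and base cores run in parallel
    return 1.0 / (sequential + parallel)

def best_asymmetric(f: float, n: int):
    """Return (speedup, r) for the best integer r in 1..n."""
    return max((speedup_asymmetric(f, n, r), r) for r in range(1, n + 1))

# One 345-BCE core plus 679 base cores on a 1024-BCE chip
print(round(speedup_asymmetric(0.975, 1024, 345), 1))
# Best integer split for a 256-BCE chip
print(best_asymmetric(0.975, 256))
```

The optimum is fairly flat: nearby values of r give almost the same speedup, so the exact best r is less important than landing in the right region.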
`
`1. A system is cost-effective if its speedup exceeds its costup
`[13]. Multicore costup is the multicore system cost divided by
`the single-core system cost. Since this costup is often much less
`than n, speedups less than n can be cost effective.
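The cost-effectiveness test in this footnote is a one-line comparison. In the sketch below, the function names and the system costs are hypothetical and chosen only for illustration; the speedup of 6.7 is the symmetric n=16, f=0.9 value quoted in the text.

```python
def costup(multicore_cost: float, single_core_cost: float) -> float:
    """Multicore system cost divided by single-core system cost."""
    return multicore_cost / single_core_cost

def is_cost_effective(speedup: float, multicore_cost: float, single_core_cost: float) -> bool:
    """A system is cost-effective if its speedup exceeds its costup."""
    return speedup > costup(multicore_cost, single_core_cost)

# Hypothetical costs: the multicore system costs 3x the single-core system.
print(is_cost_effective(6.7, 3.0, 1.0))
```

Because shared components such as memory and packaging are not replicated per core, costup can sit well below the core count, which is why a speedup far below n may still pass this test.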
`

[Figure 4 panels: Symmetric (left) and Asymmetric (right) for n = 16, 64, 256, and 1024. x-axis: r BCEs; y-axis: Speedup; one curve each for f = 0.5, 0.9, 0.975, 0.99, and 0.999.]

Figure 4. Speedup of Symmetric and Asymmetric Multicore chips.
`

`locally inefficient, as with asymmetric chips, the methods can be
`globally efficient. While these methods may be difficult to apply
`under Amdahl’s extreme assumptions, they could flourish for
`software with substantial phases of intermediate-level parallel-
`ism.
`
`Conclusions
`To Amdahl’s simple software model, we add a simple hard-
`ware model and compute speedups for symmetric, asymmetric,
`and dynamic multicore chips:
`
\[
\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f \cdot r}{\mathrm{perf}(r) \cdot n}}
\]

\[
\mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}
\]

\[
\mathrm{Speedup}_{\mathrm{dynamic}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f}{n}}
\]
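Taken together, the three formulas are enough to recompute the upper bounds in Table 1. The sketch below assumes perf(r) = sqrt(r) and searches integer values of r from 1 to n; the function names and the integer search grid are assumptions of this example.

```python
import math

def perf(r: float) -> float:
    return math.sqrt(r)  # assumed performance model

def symmetric(f: float, n: int, r: float) -> float:
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric(f: float, n: int, r: float) -> float:
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def dynamic(f: float, n: int, r: float) -> float:
    return 1.0 / ((1 - f) / perf(r) + f / n)

def best(model, f: float, n: int) -> float:
    """Largest speedup over integer r = 1..n."""
    return max(model(f, n, r) for r in range(1, n + 1))

f = 0.99
for n in (16, 64, 256, 1024):
    base = symmetric(f, n, 1)  # r = 1 reduces to classic Amdahl on n base cores
    print(n, round(base), round(best(symmetric, f, n)),
          round(best(asymmetric, f, n)), round(best(dynamic, f, n), 1))
```

The dynamic bound is reached at r = n, which is why Table 1 reports it as a strict "less than" value: using every BCE in sequential mode is the limiting case.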
`
`Results first reaffirm that seeking greater parallelism is criti-
`cal to obtaining good speedups. We then show how core designs
`that are locally inefficient can be globally efficient.
`Of course, our model seeks insight by making many simpli-
`fying assumptions. The real world is much more complex.
`Amdahl’s simple software model may not hold reasonably for
`future software. Future mainstream parallel software may also
`behave differently from today’s highly tuned parallel scientific
`and database codes. Our simple hardware model does not
`account for the complex tradeoffs between cores, cache capacity,
`interconnect resources, and off-chip bandwidth. Nevertheless,
`we find value in the controversy and discussion that drafts of this
`paper have already stimulated.
`We thank Shailender Chaudhry, Robert Cypher, Anders Lan-
`din, José F. Martínez, Kevin Moore, Andy Phelps, Thomas
`Puzak, Partha Ranganathan, Karu Sankaralingam, Mike Swift,
`Marc Tremblay, Sam Williams, David Wood, and the Wisconsin
`Multifacet group for their comments and/or proofreading. This
`work is supported in part by the National Science Foundation
`(NSF), with grants EIA/CNS-0205286, CCR-0324878, and
`CNS-0551401, as well as donations from Intel and Sun Micro-
`systems. Hill has significant financial interest in Sun Microsys-
`tems. The views expressed herein are not necessarily those of the
`NSF, Intel, or Sun Microsystems.
`
Figure 5. Dynamic: Sixteen 1-BCE cores (shown in sequential mode and in parallel mode)
`
`Dynamic Multicore Chips
`What if architects could have their cake and eat it too? Con-
`sider dynamically combining up to r cores together to boost per-
`formance of only the sequential component, as shown in
`Figure 5. This could be possible with thread-level speculation,
`helper threads, etc. [4, 7, 10, 11]. In sequential mode, this
`dynamic multicore chip can execute with performance perf(r)
`when the dynamic techniques can use r BCEs. In parallel mode,
`a dynamic multicore gets performance n using all base cores in
`parallel. Overall, we get:
`
\[
\mathrm{Speedup}_{\mathrm{dynamic}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f}{n}}
\]
`
Of course, dynamic techniques may have some additional resource overhead (e.g., area) not reflected in the equation, as well as additional runtime overhead when combining and splitting cores. Figure 6 displays dynamic speedups when using r cores in sequential mode for $\mathrm{perf}(r) = \sqrt{r}$. Light grey lines give the corresponding values for asymmetric speedup. The graphs show that performance always gets better as more BCE resources can be exploited to improve the sequential component. Practical considerations, however, may keep r much smaller than its maximum of n.
Result 6: Dynamic multicore chips can offer speedups that can be greater, and are never worse, than asymmetric chips. With Amdahl's sequential-parallel assumption, however, achieving much greater speedup than asymmetric chips requires that dynamic techniques harness large numbers of BCEs in sequential mode. For f=0.999 and n=1024, for example, the dynamic speedup attained when 256 BCEs can be utilized in sequential mode is 963, whereas the comparable asymmetric speedup is 748. This result follows because we assume that dynamic chips can both gang together all resources for sequential execution and free them for parallel execution.
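The comparison in Result 6 can be checked by evaluating both formulas at the same r. This sketch assumes perf(r) = sqrt(r), and the function names are illustrative. Under these formulas a chip's speedup cannot exceed n, and the quoted values of 963 and 748 are reproduced with a budget of n = 1024 BCEs and r = 256.

```python
import math

def perf(r: float) -> float:
    return math.sqrt(r)  # assumed performance model

def speedup_dynamic(f: float, n: int, r: float) -> float:
    # Sequential mode gangs r BCEs together; parallel mode uses all n base cores.
    return 1.0 / ((1.0 - f) / perf(r) + f / n)

def speedup_asymmetric(f: float, n: int, r: float) -> float:
    # One fixed r-BCE core plus n-r base cores.
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))

f, n, r = 0.999, 1024, 256
print(round(speedup_dynamic(f, n, r)), round(speedup_asymmetric(f, n, r)))
```

The dynamic chip is never worse because its parallel-mode throughput is n, while the asymmetric chip gets perf(r) + n - r, which is at most n whenever perf(r) <= r.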
Implication 6: Researchers should continue to investigate methods that approximate a dynamic multicore chip, such as thread-level speculation and helper threads. Even if the methods appear

References
[1] G. M. Amdahl. Validity of the Single-Processor Approach to Achieving Large Scale Computing Capabilities. In AFIPS Conference Proceedings, pages 483–485, 1967.
[2] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report No. UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.
[3] Saisanthosh Balakrishnan, Ravi Rajwar, Michael Upton, and Konrad Lai. The Impact of Performance Asymmetry in Emerging Multicore Architectures. In ISCA 32, June 2005.
[4] Lance Hammond, Mark Willey, and Kunle Olukotun. Data Speculation Support for a Chip Multiprocessor. In ASPLOS 8, pages 58–69, October 1998.
[5] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, third edition, 2003.
[6] From a Few Cores to Many: A Tera-scale Computing Research Overview. ftp://download.intel.com/research/platform/terascale/terascale_overview_paper.pdf, 2006.
[7] Engin Ipek, Meyrem Kirman, Nevin Kirman, and Jose F. Martinez. Core Fusion: Accommodating Software Diversity in Chip Multiprocessors. In ISCA 34, June 2007.
[8] J.A. Kahl, M.N. Day, H.P. Hofstee, C.R. Johns, T.R. Maeurer, and D. Shippy. Introduction to the Cell Multiprocessor. IBM Journal of Research and Development, 49(4), 2005.
[9] Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In MICRO 36, December 2003.
[10] Jose Renau, Karin Strauss, Luis Ceze, Wei Liu, Smruti Sarangi, James Tuck, and Josep Torrellas. Energy-Efficient Thread-Level Speculation on a CMP. IEEE Micro, 26(1), Jan/Feb 2006.
[11] G.S. Sohi, S. Breach, and T.N. Vijaykumar. Multiscalar Processors. In ISCA 22, pages 414–425, June 1995.
[12] M. Aater Suleman, Yale N. Patt, Eric A. Sprangle, Anwar Rohillah, Anwar Ghuloum, and Doug Carmean. ACMP: Balancing Hardware Efficiency and Programmer Efficiency. HPS Technical Report TR-HPS-2007-001, University of Texas, Austin, February 2007.
[13] David A. Wood and Mark D. Hill. Cost-Effective Parallel Computing. IEEE Computer, pages 69–72, February 1995.
`
[Figure 6 panels: Dynamic for n = 16, 64, 256, and 1024. x-axis: r BCEs; y-axis: Speedup; one curve each for f = 0.5, 0.9, 0.975, 0.99, and 0.999.]

Figure 6. Speedup of Dynamic Multicore Chips (light lines show asymmetric speedup)
`