`
`Chapter 7 • Designing Sequential Logic Circuits
`
such as channel length modulation and DIBL. Figure 7-22b plots the transient response
for different device sizes and confirms that an individual W/L ratio of greater than 3 is
required to overpower the feedback and switch the state of the latch.
`
`7.3 Dynamic Latches and Registers
Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair
produces a bistable element and can thus be used to memorize binary values. This approach has the
useful property that a stored value remains valid as long as the supply voltage is applied to the
circuit; hence the name static. The major disadvantage of the static gate, however, is its
complexity. When registers are used in computational structures that are constantly clocked (such as
a pipelined datapath), the requirement that the memory should hold state for extended periods of
time can be significantly relaxed.
This results in a class of circuits based on temporary storage of charge on parasitic
capacitors. The principle is exactly identical to the one used in dynamic logic: charge stored on a
capacitor can be used to represent a logic signal. The absence of charge denotes a 0, while its
presence stands for a stored 1. No capacitor is ideal, unfortunately, and some charge leakage is
always present. A stored value can thus only be kept for a limited amount of time, typically in
the range of milliseconds. If one wants to preserve signal integrity, a periodic refresh of the value
is necessary; hence, the name dynamic storage. Reading the value of the stored signal from a
capacitor without disrupting the charge requires the availability of a device with a high-input
impedance.
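
The limited hold time can be estimated to first order from the stored charge and the leakage current. The sketch below assumes a constant leakage current, so that t = C * dV / I; the function names and the numerical values (10 fF node, 0.5 V tolerable droop, 1 pA leakage) are illustrative assumptions, not data from the text.

```python
def retention_time(c_storage, delta_v, i_leak):
    """Time (s) for a constant leakage current i_leak (A) to change the
    voltage on c_storage (F) by delta_v (V): t = C * dV / I."""
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_storage * delta_v / i_leak


def min_refresh_frequency(c_storage, delta_v, i_leak):
    """Lowest refresh (or clock) rate that rewrites the node before it
    droops by delta_v, under the same constant-leakage assumption."""
    return 1.0 / retention_time(c_storage, delta_v, i_leak)


# Hypothetical values, chosen only to illustrate the order of magnitude.
t_hold = retention_time(10e-15, 0.5, 1e-12)
print(f"retention time ~ {t_hold * 1e3:.1f} ms")
print(f"minimum refresh rate ~ {min_refresh_frequency(10e-15, 0.5, 1e-12):.0f} Hz")
```

With these assumed numbers the hold time comes out in the millisecond range, consistent with the order of magnitude quoted above; real leakage is voltage and temperature dependent, so this is only a rough bound.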
`
`7.3.1 Dynamic Transmission-Gate Edge-Triggered Registers
A fully dynamic positive edge-triggered register based on the master-slave concept is shown in
Figure 7-23. When CLK = 0, the input data is sampled on storage node 1, which has an equivalent
capacitance of C1, consisting of the gate capacitance of I1, the junction capacitance of T1,
and the overlap gate capacitance of T1. During this period, the slave stage is in a hold mode, with
node 2 in a high-impedance (floating) state. On the rising edge of clock, the transmission gate T2
turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q
(note that node 1 is stable during the high phase of the clock, since the first transmission gate is
`
Figure 7-23 Dynamic edge-triggered register.
`Dell Ex. 1025
`Page 233
`
`
`Problem 7.5 Dual-Edge Registers
`
Determine how the adoption of dual-edge registers influences the power dissipation in the
clock-distribution network.
`
`7.3.3 True Single-Phase Clocked Register (TSPCR)
In the two-phase clocking schemes described earlier, care must be taken in routing the two clock
signals to ensure that overlap is minimized. While the C2MOS provides a skew-tolerant solution,
it is possible to design registers that only use a single phase clock. The True Single-Phase
Clocked Register (TSPCR), proposed by Yuan and Svensson, uses a single clock [Yuan89]. The
basic single-phase positive and negative latches are shown in Figure 7-30. For the positive latch,
when CLK is high, the latch is in the transparent mode and corresponds to two cascaded
inverters; the latch is noninverting, and propagates the input to the output. On the other hand, when
CLK = 0, both inverters are disabled, and the latch is in hold mode. Only the pull-up networks
are still active, while the pull-down circuits are deactivated. As a result of the dual-stage
approach, no signal can ever propagate from the input of the latch to the output in this mode. A
register can be constructed by cascading positive and negative latches. The clock load is similar
to a conventional transmission gate register, or C2MOS register. The main advantage is the use
of a single clock phase. The disadvantage is the slight increase in the number of transistors: 12
transistors are now required.
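
The latch and register behavior described above can be sketched at the logic level. The model below is purely behavioral: it captures only the transparent and hold modes, not the transistor-level TSPC circuit, floating nodes, or timing. The class names are hypothetical, and the ordering used for the register (a negative latch followed by a positive latch, giving a positive edge-triggered element) is an assumption of this sketch.

```python
class Latch:
    """Level-sensitive, noninverting latch: transparent while the clock
    equals transparent_level, holding its last value otherwise."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0

    def step(self, clk, d):
        if clk == self.transparent_level:
            self.q = d          # transparent mode: input propagates
        return self.q           # hold mode: stored value is returned


class EdgeTriggeredRegister:
    """Negative latch (master) cascaded with a positive latch (slave).
    The output changes only when CLK goes from 0 to 1."""

    def __init__(self):
        self.master = Latch(transparent_level=0)
        self.slave = Latch(transparent_level=1)

    def step(self, clk, d):
        return self.slave.step(clk, self.master.step(clk, d))


reg = EdgeTriggeredRegister()
stimulus = [(0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]   # (CLK, D) pairs
trace = [reg.step(clk, d) for clk, d in stimulus]
print(trace)
```

In the trace, D changes while CLK is high do not reach the output; only the value present just before each rising edge is captured.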
`As a reminder, note that a dynamic circuit in the style of Figure 7-30 must be used with
`caution. When the clock is low (for the positive latch), the output node may be floating, and it is
`exposed to coupling from other signals. Also, charge sharing can occur if the output node drives
`transmission gates. Dynamic nodes should be isolated with the aid of static inverters, or made
`pseudostatic for improved noise immunity.
`As with many other latch families, TSPC offers an additional advantage that we have not
`explored so far: The possibility of embedding logic functionality into the latches. This reduces
`
Figure 7-30 True Single-Phase Latches.
`
Figure 7-43 Potential race condition during (0-0) overlap in C2MOS-based design.
`
`from stage to stage under this condition is when the logic function F is inverting, as illustrated
`in Figure 7-43, where F is replaced by a single, static CMOS inverter. Similar considerations
`are valid for the (1-1) overlap.
Based on this concept, a logic circuit style called NORA-CMOS was conceived
[Gonçalves83]. It combines C2MOS pipeline registers and NORA dynamic logic function
blocks. Each module consists of a block of combinational logic that can be a mixture of static
and dynamic logic, followed by a C2MOS latch. Logic and latch are clocked in such a way that
both are simultaneously in either evaluation, or hold (precharge) mode. A block that is in
evaluation during CLK = 1 is called a CLK module, while the inverse is called a CLK-bar module.
Examples of both classes are shown in Figure 7-44a and 7-44b, respectively. The operation modes of
the modules are summarized in Table 7-2.
A NORA datapath consists of a chain of alternating CLK and CLK-bar modules. While one
class of modules is precharging with its output latch in hold mode, preserving the previous
output value, the other class is evaluating. Data is passed in a pipelined fashion from module
to module. NORA offers designers a wide range of design choices. Dynamic and static logic
`
Table 7-2 Operation modes for NORA logic modules.

            CLK block               CLK-bar block
            Logic      Latch        Logic      Latch
CLK = 0     Precharge  Hold         Evaluate   Evaluate
CLK = 1     Evaluate   Evaluate     Precharge  Hold
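
The operation modes of Table 7-2 can be encoded as a small lookup, which makes it easy to check the property the text relies on: in every clock phase exactly one class of module is evaluating while the other precharges and holds. The names `NORA_MODES`, `nora_mode`, and `evaluating_blocks` are hypothetical, and "CLK_BAR" stands for the module class clocked on the complemented clock.

```python
# (block class, clock value) -> (logic mode, latch mode)
NORA_MODES = {
    ("CLK", 0): ("precharge", "hold"),
    ("CLK", 1): ("evaluate", "evaluate"),
    ("CLK_BAR", 0): ("evaluate", "evaluate"),
    ("CLK_BAR", 1): ("precharge", "hold"),
}


def nora_mode(block, clk):
    """Return (logic mode, latch mode) for a module class and clock value."""
    return NORA_MODES[(block, clk)]


def evaluating_blocks(clk):
    """Module classes whose logic is in evaluation for this clock value."""
    return [b for b in ("CLK", "CLK_BAR") if nora_mode(b, clk)[0] == "evaluate"]


for clk in (0, 1):
    print(f"CLK = {clk}: evaluating -> {evaluating_blocks(clk)}")
```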
`
`
Figure 7-53 tpHL of current-starved inverter as a function of the control voltage.
`
This results in large variations of the propagation delay, as the drive current is
exponentially dependent on the drive voltage. When operating in this region, the delay is very
sensitive to variations in the control voltage and hence to noise.
`
Another approach to implement the delay cell is to use a differential element as shown in
Figure 7-54a. Since the delay cell provides both inverting and noninverting outputs, an oscillator
with an even number of stages can be implemented. Figure 7-54b shows a two-stage differential
VCO, where the feedback loop provides 180° phase shift through two gate delays, one
noninverting and the other inverting, therefore forming an oscillation. The simulated waveforms of
this two-stage VCO are shown in Figure 7-54c. The in-phase and quadrature phase outputs are
available simultaneously. The differential-type VCO has better immunity to common mode noise
(for example, supply noise) compared with the common ring oscillator. However, it consumes
more power due to its increased complexity and its static current.
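
To first order, the oscillation frequency of a ring of delay cells follows from the per-stage delay: a transition must travel around the loop twice to complete one period, so T = 2 * N * tp. The sketch below applies that relation under the assumption that all stages have the same delay; the 1 ns and 2 ns delays are hypothetical values, not read from the simulated waveforms.

```python
def ring_oscillator_frequency(n_stages, t_p):
    """First-order oscillation frequency (Hz) of an n_stages ring whose
    stages each have propagation delay t_p (s): f = 1 / (2 * N * t_p)."""
    if n_stages < 1 or t_p <= 0:
        raise ValueError("need at least one stage and a positive delay")
    return 1.0 / (2 * n_stages * t_p)


# Two-stage differential VCO with an assumed 1 ns delay per stage.
f_fast = ring_oscillator_frequency(2, 1e-9)
# Raising the per-stage delay (e.g., by starving the cell) lowers f.
f_slow = ring_oscillator_frequency(2, 2e-9)
print(f"{f_fast / 1e6:.0f} MHz -> {f_slow / 1e6:.0f} MHz")
```

Because f is inversely proportional to tp, any relative variation in the stage delay caused by noise on the control voltage appears as an equal relative variation in frequency.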
`
`7.7 Perspective: Choosing a Clocking Strategy
A crucial decision that must be made in the earliest phases of chip design is to select the
appropriate clocking methodology. The reliable synchronization of the various operations occurring in
a complex circuit is one of the most intriguing challenges facing the digital designer of the next
decade. Choosing the right clocking scheme affects the functionality, speed, and power of a
circuit.
A number of widely used clocking schemes were introduced in this chapter. The most
robust and conceptually simple scheme is the two-phase master-slave design. The predominant
approach is to use the multiplexer-based register, and to generate the two clock phases locally by
simply inverting the clock. More exotic schemes such as the glitch register are also used in
practice. However, these schemes require significant fine-tuning and must only be used in specific
situations. An example of such is the need for a negative setup time to cope with clock skew.
`The general trend in high-performance CMOS VLSI design is therefore to use simple
`clocking schemes, even at the expense of performance. Most automated design methodologies
`
`
`propagation delay. These parameters must be carefully optimized, because they may
`account for a significant portion of the clock period.
• Registers can be static or dynamic. A static register holds state as long as the power supply
is turned on. It is ideal for memory that is accessed infrequently (e.g., reconfiguration
registers or control information). Static registers use either multiplexers or overpowering to
enable the writing of data.
• Dynamic memory is based on temporary charge storage on capacitors. The primary
advantage is reduced complexity, higher performance, and lower power consumption. However,
charge on a dynamic node leaks away with time, and dynamic circuits thus have a
minimum clock frequency. Pure dynamic memory is hardly used anymore. Register circuits are
made pseudostatic to provide immunity against capacitive coupling and other sources of
circuit-induced noise.
• Registers can also be constructed by using the pulse or glitch concept. An intentional pulse
(using a one-shot circuit) is used to sample the input around an edge.
Sense-amplifier-based schemes also are used to construct registers; they should be used as well when
high-performance or low-signal-swing signalling is required.
• Choice of clocking style is an important consideration. Two-phase design can result in race
problems. Circuit techniques such as C2MOS can be used to eliminate race conditions in
two-phase clocking. Another option is to use true single-phase clocking. However, the rise
time of clocks must be carefully optimized to eliminate races.
• The combination of dynamic logic with dynamic latches can produce extremely fast
computational structures. An example of such an approach, the NORA logic style, is very
effective in pipelined datapaths.
• Monostable structures have only one stable state; thus, they are useful as pulse generators.
• Astable multivibrators, or oscillators, possess no stable state. The ring oscillator is the
best-known example of a circuit of this class.
• Schmitt triggers display hysteresis in their dc characteristic and fast transitions in their
transient response. They are mainly used to suppress noise.
`
`7.9 To Probe Further
The basic concepts of sequential gates can be found in many logic design textbooks (e.g.,
[Mano82] and [Hill74]). The design of sequential circuits is amply documented in most of the
traditional digital circuit handbooks. [Partovi01] and [Bernstein98] provide in-depth overviews
of the issues and solutions in the design of high-performance sequential elements.
`
`References
[Bernstein98] K. Bernstein et al., High-Speed CMOS Design Styles, Kluwer Academic Publishers, 1998.
[Dopperpuhl92] D. Dopperpuhl et al., "A 200 MHz 64-b Dual Issue CMOS Microprocessor," IEEE Journal of
Solid-State Circuits, vol. 27, no. 11, Nov. 1992, pp. 1555-1567.
[Gieseke97] B. Gieseke et al., "A 600 MHz Superscalar RISC Microprocessor with Out-of-Order Execution," IEEE
International Solid-State Circuits Conference, pp. 176-177, Feb. 1997.
[Gonçalves83] N. Gonçalves and H. De Man, "NORA: a racefree dynamic CMOS technique for pipelined logic
structures," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 3, June 1983, pp. 261-266.
[Hill74] F. Hill and G. Peterson, Introduction to Switching Theory and Logical Design, Wiley, 1974.
[Jeong87] D. Jeong et al., "Design of PLL-based clock generation circuits," IEEE Journal of Solid-State Circuits, vol.
SC-22, no. 2, April 1987, pp. 255-261.
[Kozu96] S. Kozu et al., "A 100 MHz 0.4 W RISC Processor with 200 MHz Multiply-Adder, using Pulse-Register
Technique," IEEE ISSCC, pp. 140-141, February 1996.
[Mano82] M. Mano, Computer System Architecture, Prentice Hall, 1982.
[Montanaro96] J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE Journal of
Solid-State Circuits, pp. 1703-1714, November 1996.
[Mutoh95] S. Mutoh et al., "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage
CMOS," IEEE Journal of Solid-State Circuits, pp. 847-854, August 1995.
[Partovi96] H. Partovi, "Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements," IEEE ISSCC, pp. 138-139,
February 1996.
[Partovi01] H. Partovi, "Clocked Storage Elements," in Design of High-Performance Microprocessor Circuits,
Chandrakasan et al., ed., Chapter 11, pp. 207-233, 2001.
[Schmitt38] O. H. Schmitt, "A Thermionic Trigger," Journal of Scientific Instruments, vol. 15, January 1938, pp. 24-26.
[Suzuki73] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE Journal of Solid-State
Circuits, vol. SC-8, December 1973, pp. 462-469.
[Yuan89] J. Yuan and C. Svensson, "High-Speed CMOS Circuit Technique," IEEE JSSC, vol. 24, no. 1, February 1989,
pp. 62-70.
`
PART 3

A System Perspective
`
`"Art, it seems to me, should simplify. That, indeed, is very nearly the whole of the higher artistic
process; finding what conventions of form and what of detail one can do without and yet
preserve the spirit of the whole."
`
`Willa Sibert Cather,
On the Art of Fiction (1920).
`
`"Simplicity and repose are the qualities that measure the true value of any work of art"
`
`Frank Lloyd Wright.
`
CHAPTER 8

Implementation Strategies for Digital ICs
`
`Semicustom and structured design methodologies
`
ASIC and system-on-a-chip design flows

Configurable hardware
`
8.1 Introduction
8.2 From Custom to Semicustom and Structured-Array Design Approaches
8.3 Custom Circuit Design
8.4 Cell-Based Design Methodology
8.4.1 Standard Cell
8.4.2 Compiled Cells
8.4.3 Macrocells, Megacells, and Intellectual Property
8.4.4 Semicustom Design Flow
8.5 Array-Based Implementation Approaches
8.5.1 Prediffused (or Mask-Programmable) Arrays
8.5.2 Prewired Arrays
8.6 Perspective-The Implementation Platform of the Future
8.7 Summary
8.8 To Probe Further
`
`8.1
`
`Introduction
`
`379
`
`-<
`
`/
`
`I
`j-<
`
`f-
`;:,
`f:::
`;:,
`0
`f'O ;:,
`"' :"i
`
`MEMORY
`
`i
`
`DATAPATH
`
`'
`
`I
`
`1
`' 1-<-~ CONTROL
`J
`
`Figure 8-2 Composition of a generic digital processor. The arrows
`represent the possible interconnections.
`
`could be the brain of a personal computer (PC), or the heart of a compact-disc player or cellular
`phone. It is composed of a number of building blocks that occur in one form or another in almost
`every digital processor:
`
• The datapath is the core of the processor; it is where all computations are performed. The
other blocks in the processor are support units that either store the results produced by the
datapath or help to determine what will happen in the next cycle. A typical datapath
consists of an interconnection of basic combinational functions, such as logic (AND, OR,
EXOR) or arithmetic operators (addition, multiplication, comparison, shift). Intermediate
results are stored in registers. Different strategies exist for the implementation of
datapaths: structured custom cells versus automated standard cells, or fixed hard-wired versus
flexible field-programmable fabric. The choice of the implementation platform is mostly
influenced by the trade-off between different design metrics such as area, speed, energy,
design time, and reusability.
• The control module determines what actions happen in the processor at any given point
in time. A controller can be viewed as a finite state machine (FSM). It consists of registers
and logic, and thus is a sequential circuit. The logic can be implemented in different
ways: either as an interconnection of basic logic gates (standard cells), or in a more
structured fashion using programmable logic arrays (PLAs) and instruction memories.
• The memory module serves as the centralized data storage area. A broad range of
different memory classes exist. The main difference between those classes is in the way data can
be accessed, such as "read-only" versus "read-write," sequential versus random access, or
single-ported versus multiported access. Another way of differentiating between memories
is related to their data-retention capabilities. Dynamic memory structures must be
refreshed periodically to keep their data, while static memories keep their data as long as
the power source is turned on. Finally, nonvolatile memories such as flash memories
conserve the stored data even when the supply voltage is removed. A single processor might
combine different memory classes. For example, random access memory can be used to
store data, and read-only memory may store instructions.
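
The view of a controller as a finite state machine, built from a state register plus combinational logic, can be illustrated with a small behavioral sketch. The states (FETCH, DECODE, EXECUTE), the `stall` input, and the control-signal names below are hypothetical; they are chosen only to show the split between the register that holds the state and the logic that computes the next state and the outputs.

```python
# Combinational next-state logic, written as a lookup table.
NEXT_STATE = {"FETCH": "DECODE", "DECODE": "EXECUTE", "EXECUTE": "FETCH"}

# Combinational output logic: control signals asserted in each state.
CONTROL_OUTPUTS = {
    "FETCH": {"mem_read": 1, "reg_write": 0},
    "DECODE": {"mem_read": 0, "reg_write": 0},
    "EXECUTE": {"mem_read": 0, "reg_write": 1},
}


class Controller:
    """FSM controller: a state register updated once per clock cycle."""

    def __init__(self):
        self.state = "FETCH"            # contents of the state register

    def outputs(self):
        return CONTROL_OUTPUTS[self.state]

    def clock(self, stall=False):
        """Advance one cycle; a stall keeps the register unchanged."""
        if not stall:
            self.state = NEXT_STATE[self.state]
        return self.state


ctrl = Controller()
for _ in range(4):
    print(ctrl.state, ctrl.outputs())
    ctrl.clock()
```

The two dictionaries play the role of the logic (which could be mapped onto standard cells or a PLA), while `self.state` plays the role of the registers.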
`
`Chapter 8 • Implementation Strategies for Digital ICS
`
efficiency (the number of operations that can be performed for a given amount of
energy) of various implementation styles versus their flexibility, that is, the range of
applications that can be mapped onto them. A staggering three orders of magnitude in
variation can be observed. This clearly demonstrates that hard-wired or implementation
styles with limited flexibility (such as configurable or parameterizable modules) are
preferable when energy efficiency is a must.
`
In this and the following three chapters, we discuss, respectively, implementation
techniques for random logic and controllers (this chapter), interconnect (Chapter 9), datapaths
(Chapter 11), and memories (Chapter 12). Observe that the choice of the implementation
approach can have a tremendous effect on the quality of the final product. Important aspects in
the design of complex systems consisting of multiple blocks and thus deserving special attention
are synchronization and timing (Chapter 10) and the power distribution network (Chapter 9).
The distribution of clock signals and supply current has become one of the dominant problems
in the design of state-of-the-art processors. A number of Design Methodology Inserts,
interspersed between the chapters, address the design challenge posed by these complex components,
and introduce the advanced design automation tools that are available to the designer. Inserts F,
G, and H discuss design synthesis, verification, and test, respectively.
`
`8.2 From Custom to Semicustom and Structured-Array
`Design Approaches
The viability of a microelectronics design depends on a number of (often) conflicting factors,
such as performance in terms of speed or power consumption, cost, and production volume. For
example, to be competitive in the market, a microprocessor has to excel in performance at a low
cost to the customer. Achieving both goals simultaneously is only possible through large sales
volumes. The high development cost associated with high-performance design is then amortized
over many parts. Applications such as supercomputing and some defense applications present
another scenario. With ultimate performance as the primary design goal, high-performance
custom design techniques often are desirable. The production volume is small, but the cost of
electronic parts is only a fraction of the overall system costs and thus not much of an issue. Finally,
reducing the system size through integration, not performance, is the major objective in most
consumer applications. Under these circumstances, the design cost can be reduced substantially
by using advanced design-automation techniques, which compromise performance, but
minimize design time. As noted in Chapter 1, the cost of a semiconductor device is the sum of two
components:
`
• The nonrecurring expense (NRE), which is incurred only once for a design and includes
the cost of designing the part.
• The production cost per part, which is a function of the process complexity, design area,
and process yield.
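
The trade-off between the two cost components can be made concrete with a short calculation. The sketch below assumes the usual amortization model, in which the NRE is spread evenly over the production volume and added to the per-part production cost; the function names and all dollar figures are hypothetical and serve only to illustrate how volume decides between a high-NRE and a low-NRE implementation style.

```python
def cost_per_part(nre, production_cost, volume):
    """Total cost of one part when the one-time NRE is amortized over
    `volume` parts: NRE / volume + production cost per part."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + production_cost


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which style A (higher NRE, cheaper parts) and style B
    (lower NRE, costlier parts) have the same total cost per part."""
    if unit_b <= unit_a or nre_a <= nre_b:
        raise ValueError("expects A: higher NRE and lower unit cost than B")
    return (nre_a - nre_b) / (unit_b - unit_a)


# Hypothetical numbers: a custom design versus an array-based design.
custom = dict(nre=2_000_000, unit=5.0)
array = dict(nre=100_000, unit=20.0)

v_star = break_even_volume(custom["nre"], custom["unit"], array["nre"], array["unit"])
print(f"break-even volume ~ {v_star:,.0f} parts")
for volume in (10_000, 1_000_000):
    print(volume,
          round(cost_per_part(custom["nre"], custom["unit"], volume), 2),
          round(cost_per_part(array["nre"], array["unit"], volume), 2))
```

Below the break-even volume the low-NRE style is cheaper per part; above it, the higher design cost of the custom style is recovered by its lower production cost.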
`
`
Figure 8-5 Overview of implementation approaches for digital
integrated circuits (after [DeMicheli94]). The figure divides digital circuit
implementation approaches into custom and semicustom; semicustom splits into
cell-based (standard cells, compiled cells, macro cells) and array-based
(prediffused gate arrays, prewired FPGAs).
`
These economic considerations have spurred the development of a number of distinct
implementation approaches ranging from high-performance, handcrafted design to fully programmable,
medium-to-low performance designs. Figure 8-5 provides an overview of the different
methodologies. In the sections that follow, we discuss first the custom design methodology, followed by
the semicustom and array-based approaches.
`
`8.3 Custom Circuit Design
`When performance or design density is of primary importance, handcrafting the circuit topology
`and physical design seems to be the only option. Indeed, this approach was the only option in the
`early days of digital microelectronics, as is adequately demonstrated in the design of the Intel
`4004 microprocessor (see Figure 8-Sa). The labor-intensive nature of custom design translates
`into a high cost and a long time to market. Therefore, it can only be justified economically under
`the following conditions:
`
• The custom block can be reused many times (for example, as a library cell).
• The cost can be amortized over a large volume. Microprocessors and semiconductor
memories are examples of applications in this class.
• Cost is not the prime design criterion, as it is in supercomputers or hypersupercomputers.
`
With continuous progress in the design-automation arena, the share of custom design reduces
from year to year. Even in the most advanced high-performance microprocessors, such as the
Intel Pentium® 4 processor (see Figure 8-6), virtually all portions are designed automatically
using semicustom design approaches. Only the most performance-critical modules such as the
phase-locked loops and the clock buffers are designed manually. In fact, library cell design is the
only area where custom design still thrives today.
The amount of design automation in the custom-design process is minimal, yet some
design tools have proven indispensable. In concert with a wide range of verification, simulation,
extraction and modeling tools, layout editors, design-rule and electrical-rule checkers-as
`
`
`8.4.3 Macrocells, Megacells and Intellectual Property
Standardizing at the logic-gate level is attractive for random logic functions, but it turns out to be
inefficient for more complex structures such as multipliers, data paths, memories, and embedded
microprocessors and DSPs. By capturing the specific nature of these blocks, implementations
can be obtained that outperform the results of the standard ASIC design process by a wide
margin. Cells that contain a complexity that surpasses what is found in a typical standard-cell library
are called macrocells (or, sometimes, megacells). Two types of macrocells can be identified:
`
The Hard Macro represents a module with a given functionality and a predetermined physical
design. The relative location of the transistors and the wiring within the module is fixed. In
essence, a hard macro represents a custom design of the requested function. In some cases, the
macro is parameterized, which means that versions with slightly different properties are
available or can be generated. Multipliers and memories are examples: A hard multiplier macro may
not only generate a 32 x 16 multiplier, but also an 8 x 8 one.
The advantage of the hard macro is that it brings with it all the good properties of custom
design: dense layout, and optimized and predictable performance and power dissipation. By
encapsulating the function into a macromodule, it can be reused over and over in different
designs. This reuse helps to offset the initial design cost. The disadvantage of the hard macro is
that it is hard to port the design to other technologies or to other manufacturers. For every new
technology, a major redesign of the block is necessary. For this reason, hard macros are used less
and less, and are employed mainly when the automated generation approach is far inferior or
even impossible. Embedded memories and microprocessors are good examples of hard macros.
They typically are provided by the IC manufacturer (who also provides the standard cell library),
or the semiconductor vendor who has a particularly desirable function to offer (such as a
standard microprocessor or DSP).
In the case of a macro that can be parameterized, a generator called the module compiler is
used to create the actual physical layout. Regular structures such as PLAs, memories, and
multipliers are easily constructed by abutting predesigned leaf cells in a two-dimensional array
topology. All interconnections are made by abutment, and no or little extra routing is needed if the
cells are designed correctly, which minimizes the parasitic capacitance. The PLA of Figure 8-11
is an example of such a configuration. The whole array can be constructed with a minimal
number of cells. The generator itself is a simple software program that determines the relative
positioning of the various leaf cells in the array.
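
The core of such a generator, computing the relative positions of abutting leaf cells, can be sketched in a few lines. The example below assumes a single leaf-cell type with a fixed width and height, placed on a regular grid with no gaps; `Placement` and `tile_array` are hypothetical names, and a real module compiler would additionally handle several cell types, mirroring, and peripheral cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    cell: str      # name of the leaf cell being instantiated
    x: int         # lower-left corner, in layout units
    y: int


def tile_array(rows, cols, cell_w, cell_h, cell="leaf"):
    """Place rows x cols copies of one leaf cell so that neighbors abut:
    cell (r, c) sits at (c * cell_w, r * cell_h)."""
    return [
        Placement(cell, c * cell_w, r * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


# A 2 x 3 array of a hypothetical 10 x 20 leaf cell.
for p in tile_array(2, 3, cell_w=10, cell_h=20, cell="bitcell"):
    print(p)
```

Because each cell starts exactly where its neighbor ends, connections are made by abutment and the generator never has to route wires between cells.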
`
`Example 8.4 A Memory Macromodule
Figure 8-13 shows an example of a "hard" memory macrocell. The 256 x 32 SRAM block
is generated by a parameterizable module generator. Besides creating the layout, the
generator also provides accurate timing and power information. Modern memory generators
also include an amount of redundancy to deal with defects.
`
`
`tailored to the application, and that the processor itself can easily be ported to different
`technologies and fabrication processes.
Implementing the computation-intensive parts of the protocol (MAC/PHY) on the
microprocessor would require very high clock speeds and would unnecessarily increase the
power dissipation of the chip. Fortunately, these functions are fixed and typically do not
require a flexible implementation. Hence, they are implemented as an accelerator module in
standard cells. The hard-wired implementation accomplishes the task of implementing a
huge number of computations at a relatively low power level and clock frequency. The
designer of a system on a chip is continuously faced with the challenge of deciding what is
more desirable: after-the-fabrication flexibility versus higher performance at lower power
levels. Fortunately, tools are emerging that help the designer to explore the overall design
space and analyze the trade-offs in an informed fashion [Silva01]. Observe also that the chip
contains a set of I/O interfaces, as well as an embedded network module, which helps to
orchestrate the traffic between processor and the various accelerator and I/O modules.
`
The generation process of a macro module depends on the hard or soft nature of the block,
as well as the level of design entry. In the following sections, we briefly discuss some
commonly used approaches.
`
`8.4.4 Semicustom Design Flow
So far, we have defined the components that make up the cell-based design methodology. In this
section, we discuss how it all comes together. Figure 8-16 details the traditional sequence of
steps to design a semicustom circuit. The steps of what we call the design flow are enumerated
in the figure, with a brief description of each:
`
1. Design Capture enters the design into the ASIC design system. A variety of methods can
be used to do so, including schematics and block diagrams; hardware description
languages (HDLs) such as VHDL, Verilog, and, more recently, C-derivatives such as
SystemC; behavioral description languages followed by high-level synthesis; and imported
intellectual property modules.
2. Logic Synthesis tools translate modules described using an HDL language into a netlist.
Netlists of reused or generated macros can then be inserted to form the complete netlist of
the design.
3. Prelayout Simulation and Verification. The design is checked for correctness.
Performance analysis is performed based on estimated parasitics and layout parameters. If the
design is found to be nonfunctional, extra iterations over the design capture or the logic
synthesis are necessary.
4. Floor Planning. Based on estimated module sizes, the overall outlay of the chip is
created. The global-power and clock-distribution networks also are conceived at that time.
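
The iteration implied by steps 1 through 3 (capture, synthesize, verify, and repeat on failure) can be outlined as a simple loop. This is a schematic sketch only: `run_front_end` and the three callables passed to it are hypothetical placeholders for real design tools, and the stub functions at the bottom exist purely to make the example run.

```python
def run_front_end(capture, synthesize, verify, max_iterations=5):
    """Repeat design capture and logic synthesis until prelayout
    verification passes; return the netlist and the iteration count."""
    for iteration in range(1, max_iterations + 1):
        design = capture(iteration)        # step 1: design capture
        netlist = synthesize(design)       # step 2: logic synthesis
        if verify(netlist):                # step 3: prelayout verification
            return netlist, iteration
    raise RuntimeError("design did not verify within the iteration budget")


# Toy stand-ins for the tools: the second revision is the first to pass.
def capture_stub(revision):
    return {"revision": revision}


def synthesize_stub(design):
    return f"netlist-rev{design['revision']}"


def verify_stub(netlist):
    return netlist != "netlist-rev1"


print(run_front_end(capture_stub, synthesize_stub, verify_stub))
```

Only a verified netlist would be handed on to floor planning and the later steps of the flow.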
`
`
ations, it throws quite a challenge at the design-tool developers. With the number of parasitic
effects increasing with every round of technology scaling, the design optimization process that
must take all this into account becomes exponentially complex as well. As a result, other
approaches might be required as well. In the coming chapters, we will highlight "design
solutions" that can help to alleviate some of these problems. An example is the use of regular and
predictable structures, both at the logical and the physical level.
`
`8.5 Array-Based Implementation Approaches
While design automation can help reduce the design time, it does not address the time spent in
the manufacturing process. All of the design methodologies discussed thus far require a
complete run through the fabrication process. This can take from three weeks to several months, and
it can substantially delay the introduction of a product. Additionally, with ever-increasing mask
costs, a dedicated process run is expensive, and product economics must determine if this is a
viable route.
Consequently, a number of alternative implementation approaches have been devised that
do not require a complete run through the manufacturing process, or they avoid dedicated
processing completely. These approaches have the advantage of having a lower NRE (nonrecurring
expense) and are, therefore, more attractive for small series. This comes at the expense of lower
performance, lower integration density, or higher power dissipation.
`
8.5.1 Prediffused (or Mask-Programmable) Arrays
In this approach, batches of wafers containing arrays of primitive cells or transistors are
manufactured by the vendors and stored. All the fabrication steps needed to make transistors are
standardized and executed without regard to the final application.
To transform these uncommitted wafers into an actual design, only the desired
interconnections have to be added, determining the overall function of the chip with only a few
metallization steps. These layers
