`
`Chapter 7 • Designing Sequential Logic Circuits
`
`such as channel length modulation and DIBL. Figure 7-22b plots the transient response
`for different device sizes and confirms that an individual W/L ratio of greater than 3 is
`required to overpower the feedback and switch the state of the latch.
`
`7.3 Dynamic Latches and Registers
`Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair
`produces a bistable element and can thus be used to memorize binary values. This approach has the
`useful property that a stored value remains valid as long as the supply voltage is applied to the
`circuit; hence the name static. The major disadvantage of the static gate, however, is its
`complexity. When registers are used in computational structures that are constantly clocked (such as
`a pipelined datapath), the requirement that the memory should hold state for extended periods of
`time can be significantly relaxed.
`This results in a class of circuits based on temporary storage of charge on parasitic
`capacitors. The principle is identical to the one used in dynamic logic: charge stored on a
`capacitor can be used to represent a logic signal. The absence of charge denotes a 0, while its
`presence stands for a stored 1. No capacitor is ideal, unfortunately, and some charge leakage is
`always present. A stored value can thus only be kept for a limited amount of time, typically in
`the range of milliseconds. If one wants to preserve signal integrity, a periodic refresh of the value
`is necessary; hence the name dynamic storage. Reading the value of the stored signal from a
`capacitor without disrupting the charge requires the availability of a device with a high input
`impedance.
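The limited retention time of a dynamic node can be estimated to first order with a short calculation. The sketch below assumes a constant leakage current, so that the hold time is t = C * dV / I. The function name and all component values are illustrative assumptions, not figures from the text.

```python
def retention_time(c_store, delta_v, i_leak):
    """Seconds until a constant leakage current i_leak (A) changes the
    voltage on a storage capacitance c_store (F) by delta_v (V).

    First-order model only: real leakage depends on voltage and temperature.
    """
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_store * delta_v / i_leak


# Assumed, illustrative values (not taken from the text)
C_STORE = 10e-15   # 10 fF of parasitic storage capacitance
DELTA_V = 1.0      # tolerable voltage droop before the value is lost, in V
I_LEAK = 10e-12    # 10 pA of total leakage current

t_hold = retention_time(C_STORE, DELTA_V, I_LEAK)
f_min = 1.0 / t_hold  # the node must be rewritten at least this often

print(f"hold time: {t_hold * 1e3:.2f} ms, minimum refresh rate: {f_min:.0f} Hz")
```

With these assumed numbers the hold time comes out at about one millisecond, which is consistent with the range quoted above.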
`
`7.3.1 Dynamic Transmission-Gate Edge-Triggered Registers
`A fully dynamic positive edge-triggered register based on the master-slave concept is shown in
`Figure 7-23. When CLK = 0, the input data is sampled on storage node 1, which has an
`equivalent capacitance of C1, consisting of the gate capacitance of I1, the junction capacitance of T1,
`and the overlap gate capacitance of T1. During this period, the slave stage is in a hold mode, with
`node 2 in a high-impedance (floating) state. On the rising edge of the clock, the transmission gate T2
`turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q
`(note that node 1 is stable during the high phase of the clock, since the first transmission gate is
`
`Figure 7-23 Dynamic edge-triggered register.
`Dell Ex. 1025
`Page 233
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Problem 7.5 Dual-Edge Registers
`
`Determine how the adoption of dual-edge registers influences the power dissipation in the
`clock-distribution network.
`
`7.3.3 True Single-Phase Clocked Register (TSPCR)
`In the two-phase clocking schemes described earlier, care must be taken in routing the two clock
`signals to ensure that overlap is minimized. While the C2MOS provides a skew-tolerant solution,
`it is possible to design registers that use only a single clock phase. The True Single-Phase
`Clocked Register (TSPCR), proposed by Yuan and Svensson, uses a single clock [Yuan89]. The
`basic single-phase positive and negative latches are shown in Figure 7-30. For the positive latch,
`when CLK is high, the latch is in the transparent mode and corresponds to two cascaded
`inverters; the latch is noninverting, and propagates the input to the output. On the other hand, when
`CLK = 0, both inverters are disabled, and the latch is in hold mode. Only the pull-up networks
`are still active, while the pull-down circuits are deactivated. As a result of the dual-stage
`approach, no signal can ever propagate from the input of the latch to the output in this mode. A
`register can be constructed by cascading positive and negative latches. The clock load is similar
`to that of a conventional transmission-gate register or C2MOS register. The main advantage is the use
`of a single clock phase. The disadvantage is the slight increase in the number of transistors:
`12 transistors are now required.
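The latch and register behavior described above can be illustrated with a small behavioral model. This is only a logic-level sketch: it ignores the transistor-level structure, charge leakage, and timing of the real TSPC circuits. The class names are hypothetical, and the master-slave ordering (negative latch followed by positive latch to obtain a positive edge-triggered register) is an assumption of this example.

```python
class Latch:
    """Level-sensitive latch: transparent when clk equals transparent_level."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0  # stored value (held on a dynamic node in the real circuit)

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d          # transparent mode: input propagates to output
        return self.q           # hold mode: output keeps the stored value


class PositiveEdgeRegister:
    """Negative latch (master) followed by a positive latch (slave)."""

    def __init__(self):
        self.master = Latch(transparent_level=0)
        self.slave = Latch(transparent_level=1)

    def update(self, clk, d):
        return self.slave.update(clk, self.master.update(clk, d))


reg = PositiveEdgeRegister()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (clk, d) pairs
outputs = [reg.update(clk, d) for clk, d in stimulus]
print(outputs)  # the output changes only when clk goes from 0 to 1
```

Note that a change on d while the clock is high (third stimulus pair) does not reach the output, because the master latch is in hold mode during that phase.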
`As a reminder, note that a dynamic circuit in the style of Figure 7-30 must be used with
`caution. When the clock is low (for the positive latch), the output node may be floating, and it is
`exposed to coupling from other signals. Also, charge sharing can occur if the output node drives
`transmission gates. Dynamic nodes should be isolated with the aid of static inverters, or made
`pseudostatic for improved noise immunity.
`As with many other latch families, TSPC offers an additional advantage that we have not
`explored so far: The possibility of embedding logic functionality into the latches. This reduces
`
`Figure 7-30 True Single-Phase Latches.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Figure 7-43 Potential race condition during (0-0) overlap in C2MOS-based design.
`
`from stage to stage under this condition is when the logic function F is inverting, as illustrated
`in Figure 7-43, where F is replaced by a single, static CMOS inverter. Similar considerations
`are valid for the (1-1) overlap.
`Based on this concept, a logic circuit style called NORA-CMOS was conceived
`[Gonçalves83]. It combines C2MOS pipeline registers and NORA dynamic logic function
`blocks. Each module consists of a block of combinational logic that can be a mixture of static
`and dynamic logic, followed by a C2MOS latch. Logic and latch are clocked in such a way that
`both are simultaneously in either evaluation or hold (precharge) mode. A block that is in
`evaluation during CLK = 1 is called a CLK module, while the inverse is called a CLK-bar module.
`Examples of both classes are shown in Figure 7-44a and 7-44b, respectively. The operation modes of
`the modules are summarized in Table 7-2.
`A NORA datapath consists of a chain of alternating CLK and CLK-bar modules. While one
`class of modules is precharging with its output latch in hold mode, preserving the previous
`output value, the other class is evaluating. Data is passed in a pipelined fashion from module
`to module. NORA offers designers a wide range of design choices. Dynamic and static logic
`
`Table 7-2 Operation modes for NORA logic modules.
`
`            CLK block               CLK-bar block
`            Logic       Latch       Logic       Latch
`CLK = 0     Precharge   Hold        Evaluate    Evaluate
`CLK = 1     Evaluate    Evaluate    Precharge   Hold
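The operation modes of Table 7-2 can be written down as a simple lookup, which makes the alternating behavior of a NORA pipeline easy to check. The sketch below is illustrative only. The names `NORA_MODES`, `nora_mode`, and `pipeline_modes`, and the label `CLK_BAR` for the inverted-clock module, are assumptions of this example.

```python
# (module type, clock value) -> (logic mode, latch mode), following Table 7-2
NORA_MODES = {
    ("CLK", 0): ("precharge", "hold"),
    ("CLK", 1): ("evaluate", "evaluate"),
    ("CLK_BAR", 0): ("evaluate", "evaluate"),
    ("CLK_BAR", 1): ("precharge", "hold"),
}


def nora_mode(module, clk):
    """Return (logic mode, latch mode) for one module at the given clock value."""
    return NORA_MODES[(module, clk)]


def pipeline_modes(n_stages, clk):
    """Modes of a chain of alternating CLK and CLK_BAR modules."""
    kinds = ["CLK" if i % 2 == 0 else "CLK_BAR" for i in range(n_stages)]
    return [nora_mode(kind, clk) for kind in kinds]


for clk in (0, 1):
    print(clk, pipeline_modes(4, clk))
```

For either clock value, neighboring modules are always in opposite modes: one class precharges and holds while the other evaluates.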
`
`
`Figure 7-53 tpHL of current-starved inverter as a function of the control voltage.
`
`This results in large variations of the propagation delay, as the drive current is
`exponentially dependent on the drive voltage. When operating in this region, the delay is very
`sensitive to variations in the control voltage and hence to noise.
`
`Another approach to implement the delay cell is to use a differential element as shown in
`Figure 7-54a. Since the delay cell provides both inverting and noninverting outputs, an oscillator
`with an even number of stages can be implemented. Figure 7-54b shows a two-stage differential
`VCO, where the feedback loop provides 180° phase shift through two gate delays, one
`noninverting and the other inverting, therefore forming an oscillation. The simulated waveforms of
`this two-stage VCO are shown in Figure 7-54c. The in-phase and quadrature phase outputs are
`available simultaneously. The differential-type VCO has better immunity to common mode noise
`(for example, supply noise) compared with the common ring oscillator. However, it consumes
`more power due to its increased complexity and its static current.
`
`7.7 Perspective: Choosing a Clocking Strategy
`A crucial decision that must be made in the earliest phases of chip design is to select the
`appropriate clocking methodology. The reliable synchronization of the various operations occurring in
`a complex circuit is one of the most intriguing challenges facing the digital designer of the next
`decade. Choosing the right clocking scheme affects the functionality, speed, and power of a
`circuit.
`A number of widely used clocking schemes were introduced in this chapter. The most
`robust and conceptually simple scheme is the two-phase master-slave design. The predominant
`approach is to use the multiplexer-based register, and to generate the two clock phases locally by
`simply inverting the clock. More exotic schemes such as the glitch register are also used in
`practice. However, these schemes require significant fine-tuning and must only be used in specific
`situations. An example of such is the need for a negative setup time to cope with clock skew.
`The general trend in high-performance CMOS VLSI design is therefore to use simple
`clocking schemes, even at the expense of performance. Most automated design methodologies
`
`
`propagation delay. These parameters must be carefully optimized, because they may
`account for a significant portion of the clock period.
`• Registers can be static or dynamic. A static register holds state as long as the power supply
`is turned on. It is ideal for memory that is accessed infrequently (e.g., reconfiguration
`registers or control information). Static registers use either multiplexers or overpowering to
`enable the writing of data.
`• Dynamic memory is based on temporary charge storage on capacitors. The primary
`advantages are reduced complexity, higher performance, and lower power consumption. However,
`charge on a dynamic node leaks away with time, and dynamic circuits thus have a
`minimum clock frequency. Pure dynamic memory is hardly used anymore. Register circuits are
`made pseudostatic to provide immunity against capacitive coupling and other sources of
`circuit-induced noise.
`• Registers can also be constructed by using the pulse or glitch concept. An intentional pulse
`(using a one-shot circuit) is used to sample the input around an edge.
`Sense-amplifier-based schemes also are used to construct registers; they should be used as well when
`high-performance or low-signal-swing signalling is required.
`• Choice of clocking style is an important consideration. Two-phase design can result in race
`problems. Circuit techniques such as C2MOS can be used to eliminate race conditions in
`two-phase clocking. Another option is to use true single-phase clocking. However, the rise
`time of clocks must be carefully optimized to eliminate races.
`• The combination of dynamic logic with dynamic latches can produce extremely fast
`computational structures. An example of such an approach, the NORA logic style, is very
`effective in pipelined datapaths.
`• Monostable structures have only one stable state; thus, they are useful as pulse generators.
`• Astable multivibrators, or oscillators, possess no stable state. The ring oscillator is the
`best-known example of a circuit of this class.
`• Schmitt triggers display hysteresis in their dc characteristic and fast transitions in their
`transient response. They are mainly used to suppress noise.
`
`7.9 To Probe Further
`The basic concepts of sequential gates can be found in many logic design textbooks (e.g.,
`[Mano82] and [Hill74]). The design of sequential circuits is amply documented in most of the
`traditional digital circuit handbooks. [Partovi01] and [Bernstein98] provide in-depth overviews
`of the issues and solutions in the design of high-performance sequential elements.
`
`References
`[Bernstein98] K. Bernstein et al., High-Speed CMOS Design Styles, Kluwer Academic Publishers, 1998.
`[Dopperpuhl92] D. Dopperpuhl et al., "A 200 MHz 64-b Dual Issue CMOS Microprocessor," IEEE Journal of
`Solid-State Circuits, vol. 27, no. 11, Nov. 1992, pp. 1555-1567.
`[Gieseke97] B. Gieseke et al., "A 600 MHz Superscalar RISC Microprocessor with Out-of-Order Execution," IEEE
`International Solid-State Circuits Conference, pp. 176-177, Feb. 1997.
`[Gonçalves83] N. Gonçalves and H. De Man, "NORA: a racefree dynamic CMOS technique for pipelined logic
`structures," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 3, June 1983, pp. 261-266.
`[Hill74] F. Hill and G. Peterson, Introduction to Switching Theory and Logical Design, Wiley, 1974.
`[Jeong87] D. Jeong et al., "Design of PLL-based clock generation circuits," IEEE Journal of Solid-State Circuits, vol.
`SC-22, no. 2, April 1987, pp. 255-261.
`[Kozu96] S. Kozu et al., "A 100 MHz 0.4 W RISC Processor with 200 MHz Multiply-Adder, using Pulse-Register
`Technique," IEEE ISSCC, pp. 140-141, February 1996.
`[Mano82] M. Mano, Computer System Architecture, Prentice-Hall, 1982.
`[Montanaro96] J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE Journal of
`Solid-State Circuits, pp. 1703-1714, November 1996.
`[Mutoh95] S. Mutoh et al., "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage
`CMOS," IEEE Journal of Solid-State Circuits, pp. 847-854, August 1995.
`[Partovi96] H. Partovi, "Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements," IEEE ISSCC,
`pp. 138-139, February 1996.
`[Partovi01] H. Partovi, "Clocked Storage Elements," in Design of High-Performance Microprocessor Circuits,
`Chandrakasan et al., ed., Chapter 11, pp. 207-233, 2001.
`[Schmitt38] O. H. Schmitt, "A Thermionic Trigger," Journal of Scientific Instruments, vol. 15, January 1938, pp. 24-26.
`[Suzuki73] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE Journal of Solid-State
`Circuits, vol. SC-8, December 1973, pp. 462-469.
`[Yuan89] J. Yuan and C. Svensson, "High-Speed CMOS Circuit Technique," IEEE JSSC, vol. 24, no. 1, February 1989,
`pp. 62-70.
`
`
`
`
`PART
`
`3
`
`A System
`Perspective
`
`"Art, it seems to me, should simplify. That, indeed, is very nearly the whole of the higher artistic
`process; finding what conventions of form and what of detail one can do without and yet
`preserve the spirit of the whole."
`
`Willa Sibert Cather,
`On the Art of Fiction (1920).
`
`"Simplicity and repose are the qualities that measure the true value of any work of art"
`
`Frank Lloyd Wright.
`
`
`
`
`CHAPTER
`
`8
`
`Implementation Strategies
`for Digital ICs
`
`Semicustom and structured design methodologies
`
`ASIC and system-on-a-chip design flows
`
`Configurable hardware
`
`8.1 Introduction
`8.2 From Custom to Semicustom and Structured-Array Design Approaches
`8.3 Custom Circuit Design
`8.4 Cell-Based Design Methodology
`8.4.1 Standard Cell
`8.4.2 Compiled Cells
`8.4.3 Macrocells, Megacells, and Intellectual Property
`8.4.4 Semicustom Design Flow
`8.5 Array-Based Implementation Approaches
`8.5.1 Prediffused (or Mask-Programmable) Arrays
`8.5.2 Prewired Arrays
`8.6 Perspective: The Implementation Platform of the Future
`8.7 Summary
`8.8 To Probe Further
`
`
`
`
`
`
`8.1 Introduction
`
`Figure 8-2 Composition of a generic digital processor. The arrows
`represent the possible interconnections.
`
`could be the brain of a personal computer (PC), or the heart of a compact-disc player or cellular
`phone. It is composed of a number of building blocks that occur in one form or another in almost
`every digital processor:
`
`• The datapath is the core of the processor; it is where all computations are performed. The
`other blocks in the processor are support units that either store the results produced by the
`datapath or help to determine what will happen in the next cycle. A typical datapath
`consists of an interconnection of basic combinational functions, such as logic (AND, OR,
`EXOR) or arithmetic operators (addition, multiplication, comparison, shift). Intermediate
`results are stored in registers. Different strategies exist for the implementation of
`datapaths: structured custom cells versus automated standard cells, or fixed hard-wired versus
`flexible field-programmable fabric. The choice of the implementation platform is mostly
`influenced by the trade-off between different design metrics such as area, speed, energy,
`design time, and reusability.
`• The control module determines what actions happen in the processor at any given point
`in time. A controller can be viewed as a finite state machine (FSM). It consists of registers
`and logic, and thus is a sequential circuit. The logic can be implemented in different
`ways: either as an interconnection of basic logic gates (standard cells), or in a more
`structured fashion using programmable logic arrays (PLAs) and instruction memories.
`• The memory module serves as the centralized data storage area. A broad range of
`different memory classes exist. The main difference between those classes is in the way data can
`be accessed, such as "read only" versus "read-write," sequential versus random access, or
`single-ported versus multiported access. Another way of differentiating between memories
`is related to their data-retention capabilities. Dynamic memory structures must be
`refreshed periodically to keep their data, while static memories keep their data as long as
`the power source is turned on. Finally, nonvolatile memories such as flash memories
`conserve the stored data even when the supply voltage is removed. A single processor might
`combine different memory classes. For example, random access memory can be used to
`store data, and read-only memory may store instructions.
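The view of the control module as a finite state machine, consisting of a state register plus next-state logic, can be made concrete with a few lines of code. The following is a minimal sketch under assumed names: the three states, the `stall` input, and the control outputs are hypothetical and chosen only for illustration. The lookup table plays the role that a PLA or instruction memory would play in hardware.

```python
# Next-state logic as a table: current state -> (next state, control output).
# States and outputs are hypothetical examples, not taken from the text.
NEXT_STATE_LOGIC = {
    "FETCH": ("DECODE", "load_instruction"),
    "DECODE": ("EXECUTE", "read_operands"),
    "EXECUTE": ("FETCH", "write_result"),
}


class Controller:
    """Finite state machine: a state register plus combinational logic."""

    def __init__(self, reset_state="FETCH"):
        self.state = reset_state  # the state register

    def clock(self, stall):
        """Advance one clock cycle and return the control output."""
        if stall:
            return "nop"  # hold the current state when stalled
        next_state, output = NEXT_STATE_LOGIC[self.state]
        self.state = next_state  # register update on the clock edge
        return output


ctrl = Controller()
trace = [ctrl.clock(stall) for stall in (0, 0, 1, 0)]
print(trace, ctrl.state)
```

Because the output depends on the `stall` input as well as on the state, this sketch behaves as a Mealy machine.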
`
`
`Chapter 8 • Implementation Strategies for Digital ICS
`
`efficiency (the number of operations that can be performed for a given amount of
`energy) of various implementation styles versus their flexibility (that is, the range of
`applications that can be mapped onto them). A staggering three orders of magnitude in
`variation can be observed. This clearly demonstrates that hard-wired implementation styles, or
`styles with limited flexibility (such as configurable or parameterizable modules), are
`preferable when energy efficiency is a must.
`
`In this and the following three chapters, we discuss, respectively, implementation
`techniques for random logic and controllers (this chapter), interconnect (Chapter 9), datapaths
`(Chapter 11), and memories (Chapter 12). Observe that the choice of the implementation
`approach can have a tremendous effect on the quality of the final product. Important aspects in
`the design of complex systems consisting of multiple blocks and thus deserving special attention
`are synchronization and timing (Chapter 10) and the power distribution network (Chapter 9).
`The distribution of clock signals and supply current has become one of the dominant problems
`in the design of state-of-the-art processors. A number of Design Methodology Inserts,
`interspersed between the chapters, address the design challenge posed by these complex components,
`and introduce the advanced design automation tools that are available to the designer. Inserts F,
`G, and H discuss design synthesis, verification, and test, respectively.
`
`8.2 From Custom to Semicustom and Structured-Array
`Design Approaches
`The viability of a microelectronics design depends on a number of (often) conflicting factors,
`such as performance in terms of speed or power consumption, cost, and production volume. For
`example, to be competitive in the market, a microprocessor has to excel in performance at a low
`cost to the customer. Achieving both goals simultaneously is only possible through large sales
`volumes. The high development cost associated with high-performance design is then amortized
`over many parts. Applications such as supercomputing and some defense applications present
`another scenario. With ultimate performance as the primary design goal, high-performance
`custom design techniques often are desirable. The production volume is small, but the cost of
`electronic parts is only a fraction of the overall system costs and thus not much of an issue. Finally,
`reducing the system size through integration, not performance, is the major objective in most
`consumer applications. Under these circumstances, the design cost can be reduced substantially
`by using advanced design-automation techniques, which compromise performance, but
`minimize design time. As noted in Chapter 1, the cost of a semiconductor device is the sum of two
`components:
`
`• The nonrecurring expense (NRE), which is incurred only once for a design and includes
`the cost of designing the part.
`• The production cost per part, which is a function of the process complexity, design area,
`and process yield.
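The two cost components listed above combine into a per-part cost once the NRE is amortized over the production volume. The sketch below shows that arithmetic and the resulting break-even volume between two implementation styles. All dollar figures and function names are illustrative assumptions, not data from the text.

```python
def cost_per_part(nre, volume, unit_cost):
    """Total cost per part: NRE amortized over the volume, plus production cost."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + unit_cost


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which styles A and B cost the same per part.

    Assumes A has the higher NRE and the lower production cost per part.
    """
    if unit_b <= unit_a:
        raise ValueError("style B must have the higher production cost per part")
    return (nre_a - nre_b) / (unit_b - unit_a)


# Assumed example: custom design (A) versus semicustom design (B)
CUSTOM_NRE, CUSTOM_UNIT = 1_000_000.0, 5.0
SEMI_NRE, SEMI_UNIT = 100_000.0, 8.0

for volume in (10_000, 1_000_000):
    custom = cost_per_part(CUSTOM_NRE, volume, CUSTOM_UNIT)
    semi = cost_per_part(SEMI_NRE, volume, SEMI_UNIT)
    print(f"volume {volume}: custom {custom:.2f}, semicustom {semi:.2f}")

print(break_even_volume(CUSTOM_NRE, CUSTOM_UNIT, SEMI_NRE, SEMI_UNIT))
```

With these assumed numbers, the high-NRE style only pays off above a few hundred thousand parts, which matches the argument that custom design requires large volumes to be economical.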
`
`
`Figure 8-5 Overview of implementation approaches for digital
`integrated circuits (after [DeMicheli94]). The figure is a tree: digital circuit
`implementation approaches divide into custom and semicustom; semicustom divides into
`cell based (standard cells, compiled cells, macro cells) and array based (prediffused
`gate arrays and prewired FPGAs).
`
`These economic considerations have spurred the development of a number of distinct
`implementation approaches ranging from high-performance, handcrafted design to fully programmable,
`medium-to-low performance designs. Figure 8-5 provides an overview of the different
`methodologies. In the sections that follow, we discuss first the custom design methodology, followed by
`the semicustom and array-based approaches.
`
`8.3 Custom Circuit Design
`When performance or design density is of primary importance, handcrafting the circuit topology
`and physical design seems to be the only option. Indeed, this approach was the only option in the
`early days of digital microelectronics, as is adequately demonstrated in the design of the Intel
`4004 microprocessor (see Figure 8-6a). The labor-intensive nature of custom design translates
`into a high cost and a long time to market. Therefore, it can only be justified economically under
`the following conditions:
`
`• The custom block can be reused many times (for example, as a library cell).
`• The cost can be amortized over a large volume. Microprocessors and semiconductor
`memories are examples of applications in this class.
`• Cost is not the prime design criterion, as it is in supercomputers or hypersupercomputers.
`
`With continuous progress in the design-automation arena, the share of custom design reduces
`from year to year. Even in the most advanced high-performance microprocessors, such as the
`Intel Pentium® 4 processor (see Figure 8-6), virtually all portions are designed automatically
`using semicustom design approaches. Only the most performance-critical modules such as the
`phase-locked loops and the clock buffers are designed manually. In fact, library cell design is the
`only area where custom design still thrives today.
`The amount of design automation in the custom-design process is minimal, yet some
`design tools have proven indispensable. In concert with a wide range of verification, simulation,
`extraction and modeling tools, layout editors, design-rule and electrical-rule checkers, as
`
`8.4.3 Macrocells, Megacells and Intellectual Property
`Standardizing at the logic-gate level is attractive for random logic functions, but it turns out to be
`inefficient for more complex structures such as multipliers, data paths, memories, and embedded
`microprocessors and DSPs. By capturing the specific nature of these blocks, implementations
`can be obtained that outperform the results of the standard ASIC design process by a wide
`margin. Cells that contain a complexity that surpasses what is found in a typical standard-cell library
`are called macrocells (or, sometimes, megacells). Two types of macrocells can be identified:
`
`The Hard Macro represents a module with a given functionality and a predetermined physical
`design. The relative location of the transistors and the wiring within the module is fixed. In
`essence, a hard macro represents a custom design of the requested function. In some cases, the
`macro is parameterized, which means that versions with slightly different properties are
`available or can be generated. Multipliers and memories are examples: A hard multiplier macro may
`not only generate a 32 x 16 multiplier, but also an 8 x 8 one.
`The advantage of the hard macro is that it brings with it all the good properties of custom
`design: dense layout, and optimized and predictable performance and power dissipation. By
`encapsulating the function into a macromodule, it can be reused over and over in different
`designs. This reuse helps to offset the initial design cost. The disadvantage of the hard macro is
`that it is hard to port the design to other technologies or to other manufacturers. For every new
`technology, a major redesign of the block is necessary. For this reason, hard macros are used less
`and less, and are employed mainly when the automated generation approach is far inferior or
`even impossible. Embedded memories and microprocessors are good examples of hard macros.
`They typically are provided by the IC manufacturer (who also provides the standard cell library),
`or the semiconductor vendor who has a particularly desirable function to offer (such as a
`standard microprocessor or DSP).
`In the case of a macro that can be parameterized, a generator called the module compiler is
`used to create the actual physical layout. Regular structures such as PLAs, memories, and
`multipliers are easily constructed by abutting predesigned leaf cells in a two-dimensional array
`topology. All interconnections are made by abutment, and no or little extra routing is needed if the
`cells are designed correctly, which minimizes the parasitic capacitance. The PLA of Figure 8-11
`is an example of such a configuration. The whole array can be constructed with a minimal
`number of cells. The generator itself is a simple software program that determines the relative
`positioning of the various leaf cells in the array.
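The core task of such a generator, computing the relative position of each leaf cell so that neighbors abut, can be sketched in a few lines. This is a simplified illustration rather than an actual module compiler: it assumes a single leaf-cell type of fixed width and height, and the names `Placement`, `tile_leaf_cells`, and `bounding_box` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    row: int
    col: int
    x: float  # lower-left corner of the cell
    y: float


def tile_leaf_cells(rows: int, cols: int,
                    cell_w: float, cell_h: float) -> List[Placement]:
    """Place identical leaf cells in a rows x cols array by abutment.

    Each cell starts exactly where its left and lower neighbors end, so
    connections are made at the shared edges and no routing channels are left.
    """
    if rows < 1 or cols < 1:
        raise ValueError("array must contain at least one cell")
    return [Placement(r, c, c * cell_w, r * cell_h)
            for r in range(rows) for c in range(cols)]


def bounding_box(rows: int, cols: int,
                 cell_w: float, cell_h: float) -> Tuple[float, float]:
    """Overall width and height of the generated array."""
    return (cols * cell_w, rows * cell_h)


# Assumed example: a 2 x 3 array of cells measuring 4.0 by 5.0 layout units
placements = tile_leaf_cells(2, 3, 4.0, 5.0)
print(placements[-1], bounding_box(2, 3, 4.0, 5.0))
```

A real generator would also select among several leaf-cell types (for example, edge cells and decoder cells) and mirror cells where needed, but the positioning logic follows the same pattern.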
`
`Example 8.4 A Memory Macromodule
`Figure 8-13 shows an example of a "hard" memory macrocell. The 256 x 32 SRAM block
`is generated by a parameterizable module generator. Besides creating the layout, the
`generator also provides accurate timing and power information. Modern memory generators
`also include an amount of redundancy to deal with defects.
`
`
`tailored to the application, and that the processor itself can easily be ported to different
`technologies and fabrication processes.
`Implementing the computation-intensive parts of the protocol (MAC/PHY) on the
`microprocessor would require very high clock speeds and would unnecessarily increase the
`power dissipation of the chip. Fortunately, these functions are fixed and typically do not
`require a flexible implementation. Hence, they are implemented as an accelerator module in
`standard cells. The hard-wired implementation accomplishes the task of implementing a
`huge number of computations at a relatively low power level and clock frequency. The
`designer of a system on a chip is continuously faced with the challenge of deciding what is
`more desirable: after-the-fabrication flexibility versus higher performance at lower power
`levels. Fortunately, tools are emerging that help the designer to explore the overall design
`space and analyze the trade-offs in an informed fashion [Silva01]. Observe also that the chip
`contains a set of I/0 interfaces, as well as an embedded network module, which helps to
`orchestrate the traffic between processor and the various accelerator and I/0 modules.
`
`The generation process of a macro module depends on the hard or soft nature of the block,
`as well as the level of design entry. In the following sections, we briefly discuss some
`commonly used approaches.
`
`8.4.4 Semicustom Design Flow
`So far, we have defined the components that make up the cell-based design methodology. In this
`section, we discuss how it all comes together. Figure 8-16 details the traditional sequence of
`steps to design a semicustom circuit. The steps of what we call the design flow are enumerated
`in the figure, with a brief description of each:
`
`1. Design Capture enters the design into the ASIC design system. A variety of methods can
`be used to do so, including schematics and block diagrams; hardware description
`languages (HDLs) such as VHDL, Verilog, and, more recently, C-derivatives such as
`SystemC; behavioral description languages followed by high-level synthesis; and imported
`intellectual property modules.
`2. Logic Synthesis tools translate modules described using an HDL into a netlist.
`Netlists of reused or generated macros can then be inserted to form the complete netlist of
`the design.
`3. Prelayout Simulation and Verification. The design is checked for correctness.
`Performance analysis is performed based on estimated parasitics and layout parameters. If the
`design is found to be nonfunctional, extra iterations over the design capture or the logic
`synthesis are necessary.
`4. Floor Planning. Based on estimated module sizes, the overall outlay of the chip is
`created. The global-power and clock-distribution networks also are conceived at that time.
`
`
`ations, it throws quite a challenge at the design-tool developers. With the number of parasitic
`effects increasing with every round of technology scaling, the design optimization process that
`must take all this into account becomes exponentially complex as well. As a result, other
`approaches might be required as well. In the coming chapters, we will highlight "design
`solutions" that can help to alleviate some of these problems. An example is the use of regular and
`predictable structures, both at the logical and the physical level.
`
`8.5 Array-Based Implementation Approaches
`While design automation can help reduce the design time, it does not address the time spent in
`the manufacturing process. All of the design methodologies discussed thus far require a
`complete run through the fabrication process. This can take from three weeks to several months, and
`it can substantially delay the introduction of a product. Additionally, with ever-increasing mask
`costs, a dedicated process run is expensive, and product economics must determine if this is a
`viable route.
`Consequently, a number of alternative implementation approaches have been devised that
`do not require a complete run through the manufacturing process, or they avoid dedicated
`processing completely. These approaches have the advantage of having a lower NRE (nonrecurring
`expense) and are, therefore, more attractive for small series. This comes at the expense of lower
`performance, lower integration density, or higher power dissipation.
`
`8.5.1 Prediffused (or Mask-Programmable) Arrays
`In this approach, batches of wafers containing arrays of primitive cells or transistors are
`manufactured by the vendors and stored. All the fabrication steps needed to make transistors are
`standardized and executed without regard to the final application.
`To transform these uncommitted wafers into an actual design, only the desired
`interconnections have to be added, determining the overall function of the chip with only a few
`metallization steps. These layers