CLOCK SYSTEM DESIGN
`
`KENNETH D. WAGNER
`IBM Corp.
`
`A well-designed clock system is a fun-
`damental requirement in high-speed
`computers. In this tutorial, the author
`provides a framework for under-
`standing system timing and then de-
`scribes how the clock system executes
`the timing specifications. The tutorial
`examines clock generation and the
`construction of clock-distribution net-
`works, which are integral to any clock
`system. Examples from contemporary
`high-speed systems highlight several
`common methods of clock generation,
`distribution, and tuning. Tight control
`of system clock skew is essential to an
`effective clock system.
`
The careful design of clock systems is often neglected. Part
`of the reason is that older, slower computers had higher
`tolerances to variations in the clock signal and had less
`exacting timing requirements. Today, however, as the
`demand for high-speed computers grows, the design of their clock
`systems should become a major concern not only in achieving
`high performance, but also in reducing assembly and mainte-
`nance costs.
`A well-planned and well-built clock system is a prerequisite to
`reliable long-term computer operation. Conversely, a badly de-
`signed clock system can plague a computer throughout its life-
`time, affecting its operation at any speed. To make such systems
`function, components often have to be tuned individually at sev-
`eral stages of manufacturing.
`Despite these costs and performance penalties, timing design
`is still overlooked in many systems. Although significant deci-
`sions that must be made early in computer design include such
`issues as clocking scheme and type of memory element, design-
`ers seldom participate. Instead, system architects may simply re-
`peat a previously successful set of choices, despite significant
`changes in design specifications, technology, and environment.
`Of course, these systems will eventually be functional, but they
will require much more maintenance and tuning, costs not always
reflected back to the developers.
`These attitudes prevail in part because timing design problems
`are rarely reported in the literature. Also, design teams tend to
`be secretive about their clock systems, either because they believe
`they are doing something new or because they are doing nothing
`new and are afraid to be associated with an older technique.
`Either way, the result is a scarcity of information on how to avoid
`timing problems through proper design of the clock system.
`THE CLOCK SYSTEM
`System timing specifications are executed using a clock system.
`The clock system has two main functions, clock generation and
`clock distribution. We use clock-generation circuitry to form
`highly accurate timing signals, which we then use to synchronize
`
OCTOBER 1988

0740-7475/88/1000-09$1.00 © 1988 IEEE
`
`
`ADVANCED MICRO DEVICES, INC.
`Exh. 2008
`LG ELECTRONICS, INC. v. ADVANCED MICRO DEVICES, INC.
`IPR2015-00324
`
`
`
`changes in the system state. These pulsed, synchronizing signals
`are known as clocks. We use clock distribution to deliver the
`clocks to their destinations at precisely specified instants. A net-
`work, called the clock-distribution network, propagates clocks
formed by clock generation to clocked memory elements, which are
synchronized by the clock signals. A system oscillator is the
`source for these periodic signals. We generate and manipulate the
`clock signals and precisely place clock pulses to meet the system
`timing requirements. We may also tune the clocks to compensate
`for inaccuracies in the clock pulsewidth or pulse position.
`
`BISTABLE ELEMENTS
`The focus of this article is on the timing design of systems that
`use static bistable elements. The techniques described can also
`be used in the timing design for other types of clocked memory
`elements, such as arrays and dynamic latches, or for precharg-
`ing circuitry.
`Most logic design texts, such as that by McCluskey (see "Addi-
`tional Reading" at the end of this article), describe bistable ele-
`ments and their characteristics in great detail. Two types of
`clocked bistable elements are important in contemporary high-
`speed computers: the latch and the edge-triggered flip-flop. The
`latch is transparent while its clock (control) input is active. By
`transparent, we mean that its outputs reflect any of its data in-
`puts. Edge-triggered elements, such as the D flip-flop, respond to
`their data inputs only at either the rising or falling transition of
`their clock input. They do not have the transparency property of
`the latch.
`We can describe the time-dependent behavior of a bistable ele-
`ment using the following parameters:
`
`setup time, the minimum time that the data input of the bistable
`element must be held stable before the active edge or latching
`level of the clock pulse occurs
`hold time, the minimum time that the data input of the bistable
`element must be held stable after the active edge or latching
`level of the clock pulse disappears
`propagation delay, the time between a change on the clock or
`data input of the bistable element and the corresponding
`change on its output
`
`For system operation to be correct, the setup time, hold time,
`and minimum clock pulsewidth must be satisfied for each
`bistable element. Signals whose propagation delay is so long that
`it violates the setup time are called long-path signals. Signals
`whose propagation delay is so short that it violates the hold time
`are called short-path signals. Both conditions result in incorrect
`data being stored.
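The long-path and short-path conditions above can be sketched as a small check. This is a minimal illustration, not a timing analyzer: it assumes edge-triggered elements on a common clock with zero skew, and the `Bistable` class, the `classify_path` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bistable:
    """Timing parameters of one bistable element, in nanoseconds."""
    setup_ns: float
    hold_ns: float
    prop_delay_ns: float  # clock-to-output propagation delay


def classify_path(src: Bistable, dst: Bistable,
                  logic_delay_ns: float, period_ns: float) -> str:
    """Classify a source-to-destination path (assumes zero clock skew)."""
    # Data leaves the source after its propagation delay, then crosses the logic.
    arrival_ns = src.prop_delay_ns + logic_delay_ns
    if arrival_ns > period_ns - dst.setup_ns:
        return "long path"   # arrives too late: setup time violated
    if arrival_ns < dst.hold_ns:
        return "short path"  # arrives too early: hold time violated
    return "ok"


# Hypothetical parameter values, chosen only for illustration.
ff = Bistable(setup_ns=2.0, hold_ns=1.0, prop_delay_ns=3.0)
fast_ff = Bistable(setup_ns=2.0, hold_ns=1.0, prop_delay_ns=0.5)

print(classify_path(ff, ff, logic_delay_ns=10.0, period_ns=20.0))
print(classify_path(ff, ff, logic_delay_ns=16.0, period_ns=20.0))
print(classify_path(fast_ff, ff, logic_delay_ns=0.0, period_ns=20.0))
```

The three calls exercise a passing path, a setup violation, and a hold violation in turn.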
`
`
`IEEE DESIGN & TEST OF COMPUTERS
`
`
`
Figure 1. System clocking waveforms: single-phase (a), two-phase (b),
and edge-triggered (c). Wj = pulsewidth of phase j and gij =
interphase gap from phase i to phase j; if gij > 0, the scheme is
two-phase nonoverlapping; if gij < 0, it is two-phase overlapping.
`
`SYSTEM CLOCKING SCHEMES
`
`
`System clocking is either single-phase, multiphase (usually
`two-phase), or edge-triggered. Figure 1 illustrates. The dark rec-
`tangles in the figure represent the interval during which a bistable
`element samples its data input. Each scheme requires a min-
`imum clock pulsewidth.
The most widely used scheme is multiphase clocking. The multiphase
clocking scheme in Figure 1b is two-phase, nonoverlapping.
In this scheme, two distinct clock phases are distributed
`within the system, and each bistable element receives one of these
`two clocks. Systems that have adopted two-phase clocking in-
`clude microprocessors such as the Intel 80x86 series and Mo-
`torola MC68000 family, micro-mainframes such as the HP-9000,
`and mainframes such as the IBM 3090 and the Univac 1100/90.
`Figure 2 shows a finite-state machine, a machine that realizes
`sequential logic functions, with each clocking scheme. (For more
`on finite-state machines, see McCluskey’s text.) For simplicity,
primary I/O is not shown. The Amdahl 580 mainframe and Cray-1
vector processor are single-phase latch machines, such as that
`shown in Figure 2a. Modern high-speed microprocessors like the
`Bellmac-32A are two-phase latch machines with a single-latch
`design using nonoverlapping clock phases, such as that shown
`in Figure 2b. Figure 2c shows a two-phase latch machine with a
`double-latch design. This type of machine supports scan-path
`testing, since it can use LSSD latch pairs, which are hazard-free
`master-slave latches with a scan input port. Most contemporary
`IBM products, including IBM 3090 mainframes, incorporate de-
`sign for testability using this structure. Systems built with cata-
`log parts are usually flip-flop machines, such as that shown in
`Figure 2d, because clocked bistable elements commonly offered
`in bipolar and CMOS MSI chips are edge-triggered.
`
THE CLOCK CYCLE
`System designers characterize a computer’s functionality in
`terms of its clock cycle, also called its machine cycle. The aver-
`age number of clock cycles required per machine instruction is a
`measure of computer performance. Table 1 gives clock rates for
`some well-known systems. The designer focuses on the clock
`cycle because it determines the standard work interval for inter-
`nal machine functions. The system state is the set of values in
`system memory elements at the end of a clock cycle.
`A clock cycle has the following properties:
`
`1. It consists of a sequence of one or more clock pulses.
`2. The sequence of clocks generated in each cycle is identical to
`every other cycle.
`3. No partial clock sequences can occur: clocks can only stop and
`start at cycle boundaries.
`4. Each bistable element can be updated at most once per cycle.
`
These properties ensure that the transition to the next state of
the system is predictable and correct. This deterministic system
behavior will hold whether clock cycles occur at the system operating
rate or one at a time. We can reproduce system behavior at
the operating rate by issuing single clock cycles or bursts of clock
cycles, which makes system debugging much simpler.

Figure 2. General finite-state machine structures: one-phase latch
machine (a), two-phase latch machine with single latch (b) and
double latch (c), and flip-flop machine (d).
`
TIMING ANALYSIS
`Programs for timing analysis are used routinely to verify system
`timing. They can identify long or short paths, and the designer
`can interact with them to get estimates of signal-path delays in
`parts of the system. Designers can also run them after layout to
`get more accurate results. The delay models used for system ele-
`ments are validated by circuit simulation.
Single-phase systems and multiphase overlapping systems require
more extensive timing analysis than multiphase nonoverlapping
and edge-triggered systems. The timing constraints of
`single-phase and multiphase overlapping systems are two-sided,
`bounded by both short paths and long paths. Figure 3 illustrates
`these constraints in a simplified example, where setup time and
`hold time are set to 0. The advantage of these systems is that they
`operate more quickly than their nonoverlapping counterparts.
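The two-sided constraint can be made concrete for a single-phase latch machine. The sketch below assumes, as in the simplified example, zero setup and hold times; it further assumes that data launches at the leading edge of the clock pulse, that arriving before the trailing edge of the same pulse is a short path, and that arriving after the trailing edge of the next pulse is a long path. The function names and numbers are hypothetical.

```python
def single_phase_window(period_ns: float, pulsewidth_ns: float):
    """Return the (lower, upper) bounds on path delay, in nanoseconds.

    Assumes zero setup and hold times and data launched at the leading
    edge of the clock pulse (time 0).
    """
    lower = pulsewidth_ns               # must arrive after this pulse closes
    upper = period_ns + pulsewidth_ns   # must arrive before the next pulse closes
    return lower, upper


def check_delay(delay_ns: float, period_ns: float, pulsewidth_ns: float) -> str:
    lower, upper = single_phase_window(period_ns, pulsewidth_ns)
    if delay_ns <= lower:
        return "short path"  # captured one cycle too early
    if delay_ns >= upper:
        return "long path"   # captured one cycle too late
    return "ok"


# Hypothetical 20-ns period with a 5-ns pulsewidth.
for d in (3.0, 12.0, 26.0):
    print(d, check_delay(d, period_ns=20.0, pulsewidth_ns=5.0))
```

Under these assumptions a narrower pulse loosens the short-path bound but also tightens the long-path bound, which is why such systems need more careful analysis.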
`
CLOCK SIGNALS
`In a conventional computer system, one source generates the
`system clock signal. Multiple processors operating synchronously
`may also share one signal. We can manipulate this clock signal
`in many ways before it reaches its destinations. We can divide it,
`delay it, shape it, buffer it, and gate it. Clocked bistable elements,
`either latches or flip-flops, use the signal that results from such
`manipulations.
`
`
`
Table 1. Clock rates for some well-known systems.

System             Intro  Technology  Class             Nominal Clock  Nominal Clock
                   Date                                 Period (ns)    Frequency (MHz)
Cray X-MP          1982   MSI ECL     Vector processor  9.5            105.3
Cray-1S, -1M       1980   MSI ECL     Vector processor  12.5           80.0
CDC Cyber 180/990  1985   ECL         Mainframe         16.0           62.5
IBM 3090           1986   ECL         Mainframe         18.5           54.1
Amdahl 580         1982   LSI ECL     Mainframe         23.0           43.5
IBM 308X           1981   LSI TTL     Mainframe         24.5, 26.0     40.8, 38.5
Univac 1100/90     1984   LSI ECL     Mainframe         30.0           33.3
MIPS-X             1987   VLSI CMOS   Microprocessor    50.0           20.0
HP-9000            1982   VLSI NMOS   Micro-mainframe   55.6           18.0
Motorola 68020     1985   VLSI CMOS   Microprocessor    60.0           16.7
Bellmac-32A        1982   VLSI CMOS   Microprocessor    125.0          8.0
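The two numeric columns of Table 1 are reciprocals of each other: a period in nanoseconds converts to a frequency in megahertz as 1000 divided by the period. A short sketch of that conversion follows, using a few periods from the table; the function name `frequency_mhz` is ours.

```python
def frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in megahertz."""
    return 1000.0 / period_ns


# Nominal clock periods (ns) taken from Table 1.
periods_ns = {
    "Cray X-MP": 9.5,
    "IBM 3090": 18.5,
    "Amdahl 580": 23.0,
    "Bellmac-32A": 125.0,
}

for system, period in periods_ns.items():
    print(f"{system}: {period} ns -> {frequency_mhz(period):.1f} MHz")
```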
`
`
`
Figure 3. Path requirements in a single-phase machine (a) and in a
two-phase overlapping latch machine with a double latch (b). Each
panel shows a source latch feeding a destination latch through
combinational logic over Cycle 1 and Cycle 2, with four path
requirements:
1. Source-latch data is available at t1.
2. Source-latch data must arrive at the destination latch after t2
(or be latched up in Cycle 1: short path).
3. Data arrives at the destination latch by t3 (or reduces the path
length available in Cycle 2). In panel (b), data must arrive by t3
(or violate the system cycle-time requirement).
4. Source-latch data must arrive at the destination latch before t4
(or be latched up in Cycle 3: long path).
`
`SIGNAL CHARACTERISTICS
`Clocked sequential logic responds to several characteristics of
`the clock signal: the clock period, the pulsewidth, and the lead-
`ing-edge or trailing-edge position of the clock pulse. The clock pe-
`riod is the interval before the signal pattern repeats. The ideal
`clock signal for a bistable element is a sequence of regularly re-
`peating pulses. Ideal pulses are rectangular with sufficient dura-
`tion and amplitude to ensure the reliable operation of the bistable
element. The duration of the pulse, or pulsewidth W, can be any
`fraction of the clock period, but is usually less than or equal to
`half of it. An accurate model of a real clock pulse includes actual
`voltage levels and the shapes of the pulse edges.
`
`
`For all systems, we must correctly place the leading- or trailing-
`edge positions of the distributed clock pulses to ensure that
bistable elements switch at the correct times. Also, distributed
`clock pulses must be wide enough or they will either be filtered
`out in transmission or be unable to switch a bistable element be-
`cause they lack the energy. Clock-manipulation elements reposi-
`tion clock pulses and change their pulsewidths. They consist of
`delay elements and elements that manipulate the pulsewidth.
`Delay elements either delay a pulse, or, in a timing chain, pro-
`duce a sequence of delayed pulses in response to a single pulse
`input. Pulsewidth-manipulation elements require both delay ele-
`ments and logic gates.
`Delay elements are available as both analog and digital circuits
`and are chosen according to the accuracy, flexibility, and range
`of signal delay required. Analog delay elements vary from simple
`printed or discrete wire interconnections to delay lines. Delay
`lines, packaged in hybrid chips, consist of lumped LC elements
`or distributed printed wire, which provides more accurate con-
`trol. Digital delay elements include logic gates and counters. Logic
gates are relatively inaccurate because of their wide delay ranges,
`while the time resolution of counters depends on their operating
`frequency.
`Some delay elements are programmable, providing a range of
`delays. To select a particular delay, we can either connect to a
`particular chip output pin or tap, or control the configuration
electronically by a multiplexer. A typical integrated delay line provides
delays from 1 to 10 ns in 1-ns increments with a ±0.5-ns
tolerance.
`Pulsewidth-manipulation elements have three functions: chop,
`shrink, and stretch. Figure 4b shows the effect of a chopper,
`shrinker, and stretcher on a positive pulse. The effect of each
`manipulation element differs for positive and negative clock
`pulses. Thus, for each pulse polarity, only three of the four ele-
`ments are useful. The other element has only a delay effect. Table
`2 shows the values for the signal characteristics after chopping,
shrinking, and stretching. AND gates have delay d_a, OR gates
have delay d_o, inverters have delay d_i, delay elements have delay
D, and interconnections have no delay. The signal input is a pulse
of width W whose leading edge occurs at time t = 0. For an element
to have an effect during the pulse, the sum of d_i and D must be
less than W.
`
Table 2. Effect of elements that manipulate the clock pulsewidth.

          Positive Pulse                 Negative Pulse
Element   Leading  Pulse-  Function      Leading  Pulse-  Function
          Edge     width                 Edge     width
A         d_a      D+d_i   Chopper       -        -       -
B         -        -       -             d_o      D+d_i   Chopper
C         D+d_a    W-D     Shrinker      d_a      W+D     Stretcher
D         d_o      W+D     Stretcher     D+d_o    W-D     Shrinker
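The chop, shrink, and stretch functions can be checked with a small sampled-signal simulation of a positive pulse. Because the circuit diagrams are not reproduced here, the gate arrangements below are assumptions: the chopper is taken to be an AND of the input with a delayed, inverted copy; the shrinker an AND of the input with a delayed copy; and the stretcher an OR of the input with a delayed copy. Time is in arbitrary integer ticks and all parameter values are hypothetical.

```python
def delay(sig, d):
    """Shift a sampled signal right by d ticks, holding its initial level."""
    return [sig[0]] * d + sig[:len(sig) - d]


def invert(sig, d_i):
    """Inverter with gate delay d_i."""
    return delay([1 - v for v in sig], d_i)


def and_gate(a, b, d_a):
    """Two-input AND with gate delay d_a."""
    return delay([x & y for x, y in zip(a, b)], d_a)


def or_gate(a, b, d_o):
    """Two-input OR with gate delay d_o."""
    return delay([x | y for x, y in zip(a, b)], d_o)


def measure(sig, t0):
    """Return (leading edge relative to t0, pulsewidth) of a single pulse."""
    idle = sig[0]
    active = [t for t, v in enumerate(sig) if v != idle]
    return active[0] - t0, len(active)


# Hypothetical values in ticks: input pulse of width W starting at T0.
T0, W, D, D_A, D_O, D_I = 5, 10, 3, 1, 1, 1
x = [0] * T0 + [1] * W + [0] * 20

chopper = and_gate(x, invert(delay(x, D), D_I), D_A)
shrinker = and_gate(x, delay(x, D), D_A)
stretcher = or_gate(x, delay(x, D), D_O)

print("chopper  ", measure(chopper, T0))    # expect (d_a, D + d_i)
print("shrinker ", measure(shrinker, T0))   # expect (D + d_a, W - D)
print("stretcher", measure(stretcher, T0))  # expect (d_o, W + D)
```

With these assumed circuits, the measured leading edges and pulsewidths follow the expressions d_a and D + d_i for the chopper, D + d_a and W - D for the shrinker, and d_o and W + D for the stretcher.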
`
`
`
CLOCK GENERATION
`We can derive all clock signals in a synchronous machine from
`the system clock signal. The system clock is often a rectangular
`pulse train with a 50% duty cycle, called a square wave. The cir-
`cuit that generates the system clock is at the base of the clock-
`distribution network. Its input is from either a voltage-controlled
`oscillator (VCO), a crystal oscillator (XO), or a voltage-controlled
`crystal oscillator (VCXO). All three sources produce a sinusoidal
`(single-frequency) output, which is then clamped or divided to
`generate the rectangular system clock. Excluding the quartz crys-
`tal, the oscillator circuit is usually packaged on a single hybrid
`IC.
`A simple oscillator consists of an LC circuit, which we tune by
`carefully selecting component values that allow the circuit to res-
`onate at the desired frequency. When we need extreme frequency
`stability over a wide temperature range, we use an XO. An XO
`consists of a tuned circuit with an embedded quartz crystal in the
feedback loop. The crystal stabilizes the resonant frequency of the
`oscillator circuit.
`When we need a larger range of selectable frequencies, we use
`either a VCO or a VCXO, because the XO has a very limited
`tunable range. A DC voltage input controls both the VCO and
`VCXO. The VCO could be an emitter-coupled multivibrator that
produces a square wave that we can tune over a 10:1 frequency
range up to 20 MHz. It could also be a capacitance-controlled
oscillator that produces a sine wave tunable over a 2:1 frequency
`range up to microwave frequencies. If we modify the resonant
`frequency of an XO, we get a tuning accuracy of a few hundred
`parts per million in the VCXO. Thus, the XO has the most
`frequency stability but the least tuning flexibility, the VCXO is in
`the middle on both, and the VCO has the least frequency stabil-
`ity and the most tuning flexibility. Frequency instability in the
`oscillator can cause clock jitter, requiring us to assign a tolerance
`to the clock-edge placement in timing analysis.
`From the system clock we derive the full set of clocks and clock
`phases that the system requires. We can generate multiphase
`clocks from a square-wave input in many different ways. These
`methods include one shots, clock choppers or shrinkers, shift-
`register latches, and frequency dividers, depending on the preci-
`sion and flexibility required. To prevent the overlap of adjacent
`clock phases in a nonoverlapping clocking scheme, we use out-
`put feedback or clock choppers. If there is uneven loading on each
`clock phase, the relative pulse-edge positions may change, which
`might cause some of the clock phases to overlap. Another cause
`of overlap is the asymmetric rising and falling delays of contem-
`porary devices.
Figure 4. Elements that manipulate the clock pulsewidth (a) and their
effect on a positive pulse (b).

Figure 5. Creating a two-phase clock: selecting the pulses of a
fast-running clock (a) and decoding the primary clocks (b).

Figure 5 shows two simplified circuits that create two-phase
clocks. The techniques are applicable to general multiphase clock
generation. The first circuit is used in the Univac 1100/90 for
four-phase clock generation. It requires a fast-running square-wave
clock input and a ring counter. Each stage of the ring
counter enables one clock phase, and the single clock chopper
determines pulsewidth. The second circuit is used in the Bellmac-32A.
It generates two-phase clocks by decoding primary clock signals.
We can use a Gray-code counter to produce these primary
clocks, or we can use clock shaping. Clock shaping allows us to
generate clock phases from a system clock with fixed gaps between
phases (forcing pulsewidths to vary with frequency).
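The first technique, selecting pulses of a fast-running clock with a ring counter, can be sketched behaviorally. This is not the actual circuit of any machine named above: it simply assumes that each fast-clock period is steered to one phase in rotation and that a chopper fixes a common pulsewidth shorter than the fast-clock period. Function names and tick values are hypothetical.

```python
def multiphase_pulses(fast_period, pulsewidth, phases, cycles):
    """Return {phase: [(start, end), ...]} for a ring-counter phase generator."""
    if pulsewidth >= fast_period:
        raise ValueError("chopped pulsewidth must be less than the fast period")
    out = {p: [] for p in range(1, phases + 1)}
    for n in range(phases * cycles):
        phase = n % phases + 1       # ring-counter stage currently enabled
        start = n * fast_period      # leading edge of the selected fast pulse
        out[phase].append((start, start + pulsewidth))
    return out


def interphase_gaps(pulses):
    """Gaps between consecutive pulses across all phases, in time order."""
    flat = sorted(iv for ivs in pulses.values() for iv in ivs)
    return [b[0] - a[1] for a, b in zip(flat, flat[1:])]


# Hypothetical values: 10-tick fast clock, 6-tick chopped pulse, two phases.
pulses = multiphase_pulses(fast_period=10, pulsewidth=6, phases=2, cycles=2)
gaps = interphase_gaps(pulses)
print(pulses)
print(gaps, "nonoverlapping" if min(gaps) > 0 else "overlapping")
```

In this sketch the system clock cycle is the fast period multiplied by the number of phases, and every interphase gap is positive, so the phases do not overlap.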
`
CLOCK SEQUENCES
`
The three schemes for system clocking we have looked at (single-phase,
multiphase, and edge-triggered) determine the
basic data flow in latch and flip-flop machines during each clock
`cycle. Complicating these requirements, though, are special
`timing considerations. For example, subsystems may require
different clock-arrival times so that they can communicate with
`each other across interfaces with large delays. Also, paths within
`subsystems may be too long for normal system timing. We can
`accommodate irregular interfaces and paths without affecting the
`clock cycle, although system timing becomes more complex. To
`handle these cases, we generate a sequence of clocks during each
`clock cycle and do not use normal data-path timing.
`There are two timing design styles for handling the clock
`sequences generated during a clock cycle: multiphase design and
`multiclock design. Figure 6 illustrates. The dashed vertical lines
`represent the boundaries of the clock cycle. The solid vertical lines
`represent active clock edges. Time proceeds left to right across
each diagram and only paths originating from the earliest (leftmost)
cycle are shown. In a normal multiphase (k-phase) design
(Figure 6a), latches clocked by phase 1 feed latches clocked by
`phase 2, and so on. Only the latch clocked on the last phase feeds
`the phase-1 latch of the succeeding cycle. All data movement
proceeds from phase i to phase i+1 modulo k.
`In contrast, the multiclock design (Figure 6c) ensures that
bistable elements clocked at any time Ti during one cycle feed only
bistable elements clocked in the succeeding cycle. For instance,
the three cycle n-1 clocks are early, normal, and late, which correspond
to the times T0, T1, and T2. Each can feed any of the T0, T1,
or T2 bistable elements in cycle n.
In the Amdahl 580, early clocks prevent long paths between the
remote channel frame and I/O processor. If we clock the source
`latch earlier or destination latch later than normal on a signal
`path, the signal has a longer interval to propagate. Of course,
`other signal paths between latches using normal clocks as
`sources and early clocks as destinations will have a shorter than
`normal time to propagate. Similarly, paths with latches using late
`clocks as sources and normal clocks as destinations will also be
`shorter.
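The trade-off between early, normal, and late clocks can be tabulated directly. The sketch below assumes that each clock is placed at a fixed offset from the nominal edge and that a path from a source in one cycle to a destination in the next has an available interval equal to the cycle time plus the destination offset minus the source offset. The 3-ns offsets are hypothetical; the 23.0-ns cycle is the Amdahl 580 period from Table 1.

```python
# Hypothetical clock offsets relative to the normal clock edge, in ns.
OFFSETS_NS = {"early": -3.0, "normal": 0.0, "late": 3.0}


def available_interval(period_ns: float, src: str, dst: str) -> float:
    """Propagation time available from a source clocked in cycle n-1
    to a destination clocked in cycle n."""
    return period_ns + OFFSETS_NS[dst] - OFFSETS_NS[src]


PERIOD_NS = 23.0
for src in OFFSETS_NS:
    for dst in OFFSETS_NS:
        interval = available_interval(PERIOD_NS, src, dst)
        print(f"{src:>6} -> {dst:<6}: {interval:.1f} ns")
```

An early source or a late destination lengthens the interval, while a normal source feeding an early destination, or a late source feeding a normal destination, shortens it by the same amount.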
`Multiphase design and multiclock design can be mixed, as
`shown in Figure 6d. The two-phase, double-latch configuration
`has master latches, which feed their associated slaves in the same
cycle. Each master latch is clocked at one of three timings: T0, T1,
or T2. The slave latch of each pair communicates with any of the
`master latches in the next cycle. The IBM 3033, 308X, and 3090
`mainframes use similar techniques.
`Figures 6b, 6e, and 6f show examples of more complex paths.
`Figure 6b shows the possibility of paths that skip adjacent phases
`in a three-phase system. The Univac 1100/90 is an example of a
`design with nonadjacent phase paths. Note that any phase-i-to-
`phase-i path in the succeeding cycle would require identical an-
`alysis to a single-phase system. Figures 6e and 6f show fractional
`cycle and multicycle paths. Such paths are typical of a perfor-
`mance-oriented design that uses two-phase latch machines.
`Systems can also generate clocks that operate at several dis-
`tinct cycle times, usually integer multiples of a base cycle time.
`We can use clocks with lower rates for parts of the system that
`do not need faster clocks. All clocking between subsystems must
`be synchronous, or else we must use techniques to reduce
`metastable behavior at subsystem interfaces.
`THE SYSTEM CLOCK SOURCE
`For developing, diagnosing, and producing high-speed systems,
`we ideally want a wide-bandwidth, highly accurate oscillator
`source. Most systems have both a crystal-oscillator source input
`for production systems and a tunable source input for prototype
`development and AC diagnosis. During development of a multi-
`phase system, we may need to vary the pulsewidth of any clock
`phase as well as to vary the relative pulse positions.
Figure 6. Placing clock pulses: three-phase, adjacent paths (a);
three-phase, nonadjacent paths (b); multiclock (three clock) (c);
multiclock, two-phase (d); multiclock, two-phase with fractional
cycle paths (e); and multiclock, two-phase with multicycle paths (f).

To detect marginal path-delay problems, the Amdahl 580
selects any one of three crystal oscillators as the clock source in
production machines, lengthening or shortening its clock cycle.
Operating modes are called normal, fast margin, and slow margin.
These correspond to nominal clock frequency, 5% faster than
nominal, and 5% slower than nominal. An external oscillator
input is also available, bypassing the internal oscillators during
diagnosis and development.
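As a rough illustration of what the three operating modes mean for the clock cycle, the sketch below scales the nominal frequency by plus or minus 5% and converts back to a period. It assumes the 5% applies to frequency, as the text states, and uses the 23.0-ns nominal Amdahl 580 period from Table 1; the function name is ours.

```python
def margin_period_ns(nominal_period_ns: float, freq_change: float) -> float:
    """Clock period after scaling the nominal frequency by (1 + freq_change)."""
    return nominal_period_ns / (1.0 + freq_change)


NOMINAL_NS = 23.0  # Amdahl 580 nominal clock period from Table 1
modes = {"normal": 0.0, "fast margin": 0.05, "slow margin": -0.05}

for name, change in modes.items():
    print(f"{name:>11}: {margin_period_ns(NOMINAL_NS, change):.2f} ns")
```

A path that works in normal mode but fails in fast-margin mode has less than about 1.1 ns of slack under these assumptions, which is the kind of marginal path this technique is meant to expose.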
`To detect marginal timing problems in the IBM 3090, a two-
`phase double-latch machine, designers made it possible to
`lengthen the delay between the leading edge of the slave clock and
`the trailing edge of the master latch clock for a selected system
`region (see Figure 3b). In addition, lengthening the clock cycle al-
`lows us to verify the slave-latch-to-slave-latch path delay.
`We can choose between distributed or centralized clock sources
`to control multiprocessors synchronously. In distributed control,
`we let each processor or processor group in the complex use its
`own local oscillator, with some form of enforced synchronization
between oscillators, like a phase-locked loop. Alternatively, in
`centralized control, we designate one oscillator as the master
`oscillator and have each system select this master through a
`local/remote switch. The second method is simpler and is com-
`mon in mainframe multiprocessor models such as the Amdahl
`580, IBM 3033, and IBM 370/168. Although the IBM multi-
`processors use a master oscillator, other standby oscillators are
`phase-locked to the master oscillator and can be selected if it fails.
`CLOCK DISTRIBUTION
`The goal of clock distribution is to organize clocks so that the
`delays from the source of each clock or clock phase to its bistable
`elements (its destinations) are identical. In reality, however, no
`matter how each clock path is constructed, any two clock paths
`in the same machine or any two corresponding paths in different
`machines will always have a delay difference. Every computer
`operates in a different temperature, power supply, and radiation
`environment, and duplicate components will differ in subtle ways
`between computers. We must build in tolerance to these varia-
`tions in any system timing design.
`The most common approaches to ensure correct and reliable
`machine timing are worst-case analysis and statistical analysis.
`In worst-case analysis, we assume that all component parame-
`ters lie within some range, and the cumulative worst-case effect
`is still within the timing tolerance of the machine. In statistical
`analysis, the intent is that most machines have tolerable timing
`characteristics, and so we can rely on the cumulative statistical
`variations of component parameters to remain within the timing
`tolerance.
`
`We specify system timing such that every system memory ele-
`ment has an expected arrival time for the active edge of its clock
`signal. Clock-edge inaccuracy is the difference between the ac-
`tual and expected arrival time of this clock edge. For every pair
of system memory elements that communicate, we define path
clock skew as the sum of the clock-edge inaccuracies of the pair's
`source and destination. System clock skew is the largest path
`clock skew in the system. It is the value of the worst-case timing
`inaccuracy among all paths. We can break it into interboard skew,
`on-board interchip skew, and so on to the smallest timed compo-
`nent.
`The challenge to designers of clock-distribution networks is how
`to control system clock skew so that it becomes an acceptably
`small fraction of the system clock period. As a rule, most systems
`cannot tolerate a clock skew of more than 10% of the system clock
`period. If system clock skew goes beyond the design limit, system
`behavior can be affected. Setup and hold times are missed, which
`results in long and short paths. No scheme is immune from these
problems; even flip-flop machines can malfunction when clock
`skew is present.
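The skew definitions and the 10% rule of thumb can be sketched as follows. The sketch assumes each memory element has a single worst-case clock-edge inaccuracy and takes path clock skew as the sum of the magnitudes for source and destination. Element names and inaccuracy values are hypothetical; times are integer picoseconds, and the 18,500-ps period is the IBM 3090 figure from Table 1.

```python
# Hypothetical worst-case clock-edge inaccuracy per memory element, in ps.
inaccuracy_ps = {"A": 400, "B": 700, "C": 200}

# Hypothetical communicating (source, destination) pairs.
paths = [("A", "B"), ("B", "C"), ("A", "C")]


def path_clock_skew(src: str, dst: str) -> int:
    """Sum of the clock-edge inaccuracies of a path's source and destination."""
    return abs(inaccuracy_ps[src]) + abs(inaccuracy_ps[dst])


def system_clock_skew() -> int:
    """Largest path clock skew among all communicating pairs."""
    return max(path_clock_skew(s, d) for s, d in paths)


PERIOD_PS = 18_500            # IBM 3090 nominal clock period from Table 1
budget_ps = PERIOD_PS // 10   # rule of thumb: at most 10% of the period
within_budget = system_clock_skew() <= budget_ps

print(system_clock_skew(), budget_ps, within_budget)
```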
`Clock-powering trees, such as the one in Figure 7, are a source
`of clock skew. These trees are used to produce multiple copies of
`the clock signal for distribution. Each gate of the tree has some
uncertainty associated with its delay, which is the difference between
its best-case and worst-case delays. This difference is
called gate skew. Using a worst-case timing analysis, the clock
skew caused by a powering tree equals the arithmetic sum of the
gate (and interconnection) skews on the path from the tree root
to an output. In other words, clock skew has a cumulative effect
`by tree level. We can minimize this clock skew by placing all gates
`at a given tree level, or even the entire tree, on the same chip. In
`addition, we can realize elements at each tree level by using elec-
`trically matched devices and careful wiring.
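The cumulative effect of gate skew by tree level can be shown with a short worst-case calculation. The best-case and worst-case gate delays below are hypothetical, interconnection skew is ignored for brevity, and times are integer picoseconds.

```python
from itertools import accumulate

# Hypothetical (best-case, worst-case) delays in ps for the gates on one
# path from the tree root to an output, one entry per tree level.
path_gates_ps = [(300, 500), (300, 500), (400, 700)]


def gate_skew(best_ps: int, worst_ps: int) -> int:
    """Uncertainty in a gate's delay: worst case minus best case."""
    return worst_ps - best_ps


skews = [gate_skew(b, w) for b, w in path_gates_ps]
skew_by_level = list(accumulate(skews))  # running worst-case skew per level
tree_skew = skew_by_level[-1]            # arithmetic sum along the path

print(skews, skew_by_level, tree_skew)
```

Each added level can only increase the running total, which is why placing a whole level, or the whole tree, on one chip with matched devices helps.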
`
DISTRIBUTION TECHNIQUES
We must efficiently distribute the rectangular clock pulses produced
through the interaction of the oscillator and clock-generation
circuitry. Critical to efficient distribution is the clock-network
layout, the physical placement of the network. It must
`conform to design rules that ensure the integrity of the clock sig-
`nal by minimizing electrical coupling, switching currents, and im-
`pedance discontinuities. Other rules must prevent excessive
`clock skew by equalizing path delays and maintaining the quali-
`ty of the signal edge. Symmetry and balanced loading at many
levels of packaging, such as on the chip or on the board, are characteristic
of effective clock-network layouts. To achieve these
`qualities, we can prearrange positions of the clock pins and make
`clock paths as short as possible.
Figure 7. Clock-powering tree.

Figure 8. Logic islands.

It is sometimes difficult to coordinate the relative lengths of two
paths that originate from a common source. To match any two
paths or path segments in the clock system, we may need identical
lengths of cable, wire, and interconnections; balanced loading;
and equal numbers of buffer gates. A technique called time-domain
reflectometry helps in this process by accurately measuring the
line delays of long cables. In this process, line signals are
generated, and the signal reflections from line terminations are
detected in real time. Once we measure the delays, we
can equalize them by adjusting the lengths of the cables.
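The cable-matching step can be sketched numerically. The sketch assumes the reflectometer reports a round-trip time, so the one-way line delay is half of it, and it assumes a signal propagation velocity of 0.2 m/ns in the cable. That velocity, the measured times, and the function names are all hypothetical.

```python
VELOCITY_M_PER_NS = 0.2  # assumed propagation velocity in the cable


def one_way_delay_ns(round_trip_ns: float) -> float:
    """Line delay inferred from a reflection's round-trip time."""
    return round_trip_ns / 2.0


def trim_length_m(round_trip_a_ns: float, round_trip_b_ns: float) -> float:
    """Length to remove from the slower cable to equalize the two delays."""
    diff_ns = abs(one_way_delay_ns(round_trip_a_ns)
                  - one_way_delay_ns(round_trip_b_ns))
    return diff_ns * VELOCITY_M_PER_NS


# Hypothetical round-trip measurements for two clock cables, in ns.
cable_a_ns, cable_b_ns = 20.0, 21.0
print(one_way_delay_ns(cable_a_ns), one_way_delay_ns(cable_b_ns))
print(f"trim {trim_length_m(cable_a_ns, cable_b_ns):.2f} m from the longer cable")
```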
`Duplicating the composition of two paths is not the only meth-
`od of ensuring