CLOCK SYSTEM DESIGN
`
`KENNETH D. WAGNER
`IBM Corp.
`
`A well-designed clock system is a fun-
`damental requirement in high-speed
`computers. In this tutorial, the author
`provides a framework for under-
`standing system timing and then de-
`scribes how the clock system executes
`the timing specifications. The tutorial
`examines clock generation and the
`construction of clock-distribution net-
`works, which are integral to any clock
`system. Examples from contemporary
`high-speed systems highlight several
`common methods of clock generation,
`distribution, and tuning. Tight control
`of system clock skew is essential to an
`effective clock system.
`
The careful design of clock systems is often neglected. Part
`of the reason is that older, slower computers had higher
`tolerances to variations in the clock signal and had less
`exacting timing requirements. Today, however, as the
`demand for high-speed computers grows, the design of their clock
`systems should become a major concern not only in achieving
`high performance, but also in reducing assembly and mainte-
`nance costs.
`A well-planned and well-built clock system is a prerequisite to
`reliable long-term computer operation. Conversely, a badly de-
`signed clock system can plague a computer throughout its life-
`time, affecting its operation at any speed. To make such systems
`function, components often have to be tuned individually at sev-
`eral stages of manufacturing.
`Despite these costs and performance penalties, timing design
`is still overlooked in many systems. Although significant deci-
`sions that must be made early in computer design include such
`issues as clocking scheme and type of memory element, design-
`ers seldom participate. Instead, system architects may simply re-
`peat a previously successful set of choices, despite significant
`changes in design specifications, technology, and environment.
Of course, these systems will eventually be functional, but they
will require much more maintenance and tuning, costs not always
reflected back to the developers.
`These attitudes prevail in part because timing design problems
`are rarely reported in the literature. Also, design teams tend to
`be secretive about their clock systems, either because they believe
`they are doing something new or because they are doing nothing
`new and are afraid to be associated with an older technique.
`Either way, the result is a scarcity of information on how to avoid
`timing problems through proper design of the clock system.
`THE CLOCK SYSTEM
System timing specifications are executed using a clock system.
The clock system has two main functions, clock generation and
clock distribution. We use clock-generation circuitry to form
highly accurate timing signals, which we then use to synchronize
changes in the system state. These pulsed, synchronizing signals
are known as clocks. We use clock distribution to deliver the
clocks to their destinations at precisely specified instants. A
network, called the clock-distribution network, propagates clocks
formed by clock generation to clocked memory elements.

OCTOBER 1988

0740-7475/88/1000-09$1.00 © 1988 IEEE

ADVANCED MICRO DEVICES, INC.
Exh. 2008
LG ELECTRONICS, INC. v. ADVANCED MICRO DEVICES, INC.
IPR2015-00324

[...] synchronized by the clock signals. A system oscillator is the
`source for these periodic signals. We generate and manipulate the
`clock signals and precisely place clock pulses to meet the system
`timing requirements. We may also tune the clocks to compensate
`for inaccuracies in the clock pulsewidth or pulse position.
`
`BISTABLE ELEMENTS
`The focus of this article is on the timing design of systems that
`use static bistable elements. The techniques described can also
`be used in the timing design for other types of clocked memory
`elements, such as arrays and dynamic latches, or for precharg-
`ing circuitry.
`Most logic design texts, such as that by McCluskey (see "Addi-
`tional Reading" at the end of this article), describe bistable ele-
`ments and their characteristics in great detail. Two types of
`clocked bistable elements are important in contemporary high-
`speed computers: the latch and the edge-triggered flip-flop. The
`latch is transparent while its clock (control) input is active. By
`transparent, we mean that its outputs reflect any of its data in-
`puts. Edge-triggered elements, such as the D flip-flop, respond to
`their data inputs only at either the rising or falling transition of
`their clock input. They do not have the transparency property of
`the latch.
`We can describe the time-dependent behavior of a bistable ele-
`ment using the following parameters:
`
`setup time, the minimum time that the data input of the bistable
`element must be held stable before the active edge or latching
`level of the clock pulse occurs
`hold time, the minimum time that the data input of the bistable
`element must be held stable after the active edge or latching
`level of the clock pulse disappears
`propagation delay, the time between a change on the clock or
`data input of the bistable element and the corresponding
`change on its output
`
`For system operation to be correct, the setup time, hold time,
`and minimum clock pulsewidth must be satisfied for each
`bistable element. Signals whose propagation delay is so long that
`it violates the setup time are called long-path signals. Signals
`whose propagation delay is so short that it violates the hold time
`are called short-path signals. Both conditions result in incorrect
`data being stored.
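
The long-path and short-path conditions above can be sketched as a small check. This is a minimal illustration, not the article's method: the function name `classify_path`, the parameter names, and the example numbers are assumptions, and the model is deliberately simple (an edge-triggered destination, data launched at time 0, capture one clock period later, no clock skew).

```python
def classify_path(path_delay, period, setup, hold):
    """Classify a signal path between two edge-triggered elements.

    Hypothetical model: data is launched at t = 0 and captured at
    t = period by an ideal, skew-free clock. All times share one unit.
    """
    if path_delay > period - setup:
        return "long path"    # data arrives too late: setup time violated
    if path_delay < hold:
        return "short path"   # data changes too soon: hold time violated
    return "ok"


if __name__ == "__main__":
    for delay in (0.5, 10.0, 24.0):
        print(delay, classify_path(delay, period=25.0, setup=2.0, hold=1.0))
```

Treating the two violations as separate return values mirrors the text: both lead to incorrect data being stored, but they are fixed in opposite directions (shortening versus padding the path).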
`
`
`IEEE DESIGN & TEST OF COMPUTERS
`
`

`
Figure 1. System clocking waveforms: single-phase (a), two-phase (b),
and edge-triggered (c). W_i = pulsewidth of phase i and g_ij =
interphase gap from phase i to phase j; g_ij > 0 means two-phase
nonoverlapping, g_ij < 0 means two-phase overlapping.

`SYSTEM CLOCKING SCHEMES
`
`
`System clocking is either single-phase, multiphase (usually
`two-phase), or edge-triggered. Figure 1 illustrates. The dark rec-
`tangles in the figure represent the interval during which a bistable
`element samples its data input. Each scheme requires a min-
`imum clock pulsewidth.
The most widely used scheme is multiphase clocking. The multiphase
clocking scheme in Figure 1b is two-phase, nonoverlapping.
In this scheme, two distinct clock phases are distributed
`within the system, and each bistable element receives one of these
`two clocks. Systems that have adopted two-phase clocking in-
`clude microprocessors such as the Intel 80x86 series and Mo-
`torola MC68000 family, micro-mainframes such as the HP-9000,
`and mainframes such as the IBM 3090 and the Univac 1100/90.
`Figure 2 shows a finite-state machine, a machine that realizes
`sequential logic functions, with each clocking scheme. (For more
on finite-state machines, see McCluskey’s text.) For simplicity,
primary I/O is not shown. The Amdahl 580 mainframe and Cray-1
vector processor are single-phase latch machines, such as that
`shown in Figure 2a. Modern high-speed microprocessors like the
`Bellmac-32A are two-phase latch machines with a single-latch
`design using nonoverlapping clock phases, such as that shown
`in Figure 2b. Figure 2c shows a two-phase latch machine with a
`double-latch design. This type of machine supports scan-path
`testing, since it can use LSSD latch pairs, which are hazard-free
`master-slave latches with a scan input port. Most contemporary
`IBM products, including IBM 3090 mainframes, incorporate de-
`sign for testability using this structure. Systems built with cata-
`log parts are usually flip-flop machines, such as that shown in
`Figure 2d, because clocked bistable elements commonly offered
`in bipolar and CMOS MSI chips are edge-triggered.
`
THE CLOCK CYCLE
`System designers characterize a computer’s functionality in
`terms of its clock cycle, also called its machine cycle. The aver-
`age number of clock cycles required per machine instruction is a
`measure of computer performance. Table 1 gives clock rates for
`some well-known systems. The designer focuses on the clock
`cycle because it determines the standard work interval for inter-
`nal machine functions. The system state is the set of values in
`system memory elements at the end of a clock cycle.
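
The relation between the clock period and clock frequency columns of Table 1, and the use of cycles per instruction as a performance measure, can be sketched as follows. The function names and the cycles-per-instruction value in the example are illustrative assumptions; only the 18.5-ns and 125.0-ns periods come from Table 1.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def avg_instruction_time_ns(cycles_per_instruction, period_ns):
    """Average time per machine instruction, given the average number
    of clock cycles it takes (a hypothetical input, not article data)."""
    return cycles_per_instruction * period_ns


if __name__ == "__main__":
    # IBM 3090 and Bellmac-32A periods as listed in Table 1.
    print(round(period_ns_to_mhz(18.5), 1))
    print(period_ns_to_mhz(125.0))
    # Assumed example: 4 cycles per instruction at a 50-ns period.
    print(avg_instruction_time_ns(4.0, 50.0))
```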
`A clock cycle has the following properties:
`
`1. It consists of a sequence of one or more clock pulses.
`2. The sequence of clocks generated in each cycle is identical to
`every other cycle.
`3. No partial clock sequences can occur: clocks can only stop and
`start at cycle boundaries.
`4. Each bistable element can be updated at most once per cycle.
`
These properties ensure that the transition to the next state of
the system is predictable and correct. This deterministic system
behavior will hold whether clock cycles occur at the system
operating rate or one at a time. We can reproduce system behavior at
the operating rate by issuing single clock cycles or bursts of clock
cycles, which makes system debugging much simpler.

Figure 2. General finite-state machine structures: one-phase latch
machine (a), two-phase latch machine with single latch (b) and
double latch (c), and flip-flop machine (d).
`
TIMING ANALYSIS
`Programs for timing analysis are used routinely to verify system
`timing. They can identify long or short paths, and the designer
`can interact with them to get estimates of signal-path delays in
`parts of the system. Designers can also run them after layout to
`get more accurate results. The delay models used for system ele-
`ments are validated by circuit simulation.
Single-phase systems and multiphase overlapping systems require
more extensive timing analysis than multiphase nonoverlapping
and edge-triggered systems. The timing constraints of
`single-phase and multiphase overlapping systems are two-sided,
`bounded by both short paths and long paths. Figure 3 illustrates
`these constraints in a simplified example, where setup time and
`hold time are set to 0. The advantage of these systems is that they
`operate more quickly than their nonoverlapping counterparts.
`
CLOCK SIGNALS
`In a conventional computer system, one source generates the
`system clock signal. Multiple processors operating synchronously
`may also share one signal. We can manipulate this clock signal
`in many ways before it reaches its destinations. We can divide it,
`delay it, shape it, buffer it, and gate it. Clocked bistable elements,
`either latches or flip-flops, use the signal that results from such
`manipulations.
`
`
`
Table 1. Clock rates for some well-known systems.

System              Intro  Technology  Class             Nominal clock  Nominal clock
                    date                                 period (ns)    frequency (MHz)
Cray-X-MP           1982   MSI ECL     Vector processor  9.5            105.3
Cray-1S, -1M        1980   MSI ECL     Vector processor  12.5           80.0
CDC Cyber 180/990   1985   ECL         Mainframe         16.0           62.5
IBM 3090            1986   ECL         Mainframe         18.5           54.1
Amdahl 580          1982   LSI ECL     Mainframe         23.0           43.5
IBM 308X            1981   LSI TTL     Mainframe         24.5, 26.0     40.8, 38.5
Univac 1100/90      1984   LSI ECL     Mainframe         30.0           33.3
MIPS-X              1987   VLSI CMOS   Microprocessor    50.0           20.0
HP-9000             1982   VLSI NMOS   Micro-mainframe   55.6           18.0
Motorola 68020      1985   VLSI CMOS   Microprocessor    60.0           16.7
Bellmac-32A         1982   VLSI CMOS   Microprocessor    125.0          8.0
`
`

`
Figure 3. Path requirements in a single-phase machine (a) and in a
two-phase overlapping latch machine with a double latch (b). The
annotations in both panels state the same two-sided requirement:
data from the source latch must reach the destination latch late
enough not to be latched in Cycle 1 (a short path) and early enough
not to be latched in Cycle 3 (a long path).

`SIGNAL CHARACTERISTICS
`Clocked sequential logic responds to several characteristics of
`the clock signal: the clock period, the pulsewidth, and the lead-
`ing-edge or trailing-edge position of the clock pulse. The clock pe-
`riod is the interval before the signal pattern repeats. The ideal
`clock signal for a bistable element is a sequence of regularly re-
`peating pulses. Ideal pulses are rectangular with sufficient dura-
`tion and amplitude to ensure the reliable operation of the bistable
element. The duration of the pulse, or pulsewidth W, can be any
fraction of the clock period, but is usually less than or equal to
half of it. An accurate model of a real clock pulse includes actual
`voltage levels and the shapes of the pulse edges.
`
`For all systems, we must correctly place the leading- or trailing-
`edge positions of the distributed clock pulses to ensure that
bistable elements switch at the correct times. Also, distributed
`clock pulses must be wide enough or they will either be filtered
`out in transmission or be unable to switch a bistable element be-
`cause they lack the energy. Clock-manipulation elements reposi-
`tion clock pulses and change their pulsewidths. They consist of
`delay elements and elements that manipulate the pulsewidth.
`Delay elements either delay a pulse, or, in a timing chain, pro-
`duce a sequence of delayed pulses in response to a single pulse
`input. Pulsewidth-manipulation elements require both delay ele-
`ments and logic gates.
`Delay elements are available as both analog and digital circuits
`and are chosen according to the accuracy, flexibility, and range
`of signal delay required. Analog delay elements vary from simple
`printed or discrete wire interconnections to delay lines. Delay
`lines, packaged in hybrid chips, consist of lumped LC elements
`or distributed printed wire, which provides more accurate con-
`trol. Digital delay elements include logic gates and counters. Logic
gates are relatively inaccurate because of their wide delay ranges,
`while the time resolution of counters depends on their operating
`frequency.
`Some delay elements are programmable, providing a range of
`delays. To select a particular delay, we can either connect to a
`particular chip output pin or tap, or control the configuration
electronically by a multiplexer. A typical integrated delay line
provides delays from 1 to 10 ns in 1-ns increments with a ±0.5-ns
tolerance.
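
Selecting a tap on a programmable delay line like the one just described can be sketched as below. The function name `select_tap` and its parameters are hypothetical; the default values reflect the typical part mentioned in the text (1 to 10 ns, 1-ns steps, ±0.5-ns tolerance). The sketch returns the nominal delay of the nearest tap together with the range the actual delay may fall in.

```python
def select_tap(target_ns, min_ns=1, max_ns=10, step_ns=1, tol_ns=0.5):
    """Pick the tap whose nominal delay is closest to target_ns.

    Returns (nominal, lowest possible, highest possible) delay in ns.
    Ties between two equally close taps resolve to the shorter delay.
    """
    n_taps = (max_ns - min_ns) // step_ns + 1
    taps = [min_ns + i * step_ns for i in range(n_taps)]
    nominal = min(taps, key=lambda t: abs(t - target_ns))
    return nominal, nominal - tol_ns, nominal + tol_ns


if __name__ == "__main__":
    print(select_tap(3.7))   # nearest tap to a 3.7-ns target
    print(select_tap(20))    # target beyond the range clamps to the last tap
```

Note that the tolerance is half the step size, so the uncertainty bands of adjacent taps touch; a timing analysis has to carry the full band, not just the nominal value.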
`Pulsewidth-manipulation elements have three functions: chop,
`shrink, and stretch. Figure 4b shows the effect of a chopper,
`shrinker, and stretcher on a positive pulse. The effect of each
`manipulation element differs for positive and negative clock
`pulses. Thus, for each pulse polarity, only three of the four ele-
`ments are useful. The other element has only a delay effect. Table
`2 shows the values for the signal characteristics after chopping,
shrinking, and stretching. AND gates have delay d_a, OR gates
have delay d_o, inverters have delay d_i, delay elements have delay
D, and interconnections have no delay. The signal input is a pulse
of width W whose leading edge occurs at time t = 0. For an element
to have an effect during the pulse, the sum of d_i and D must be
less than W.
`
Table 2. Effect of elements that manipulate the clock pulsewidth.

         Positive pulse                         Negative pulse
Element  Leading edge  Pulsewidth  Function    Leading edge  Pulsewidth  Function
A        d_a           D + d_i     Chopper     -             -           -
B        D + d_a       W - D       Shrinker    d_a           W + D       Stretcher
C        -             -           -           d_o           D + d_i     Chopper
D        d_o           W + D       Stretcher   D + d_o       W - D       Shrinker

`
`

`
CLOCK GENERATION
`We can derive all clock signals in a synchronous machine from
`the system clock signal. The system clock is often a rectangular
`pulse train with a 50% duty cycle, called a square wave. The cir-
`cuit that generates the system clock is at the base of the clock-
`distribution network. Its input is from either a voltage-controlled
`oscillator (VCO), a crystal oscillator (XO), or a voltage-controlled
`crystal oscillator (VCXO). All three sources produce a sinusoidal
`(single-frequency) output, which is then clamped or divided to
`generate the rectangular system clock. Excluding the quartz crys-
`tal, the oscillator circuit is usually packaged on a single hybrid
`IC.
`A simple oscillator consists of an LC circuit, which we tune by
`carefully selecting component values that allow the circuit to res-
`onate at the desired frequency. When we need extreme frequency
`stability over a wide temperature range, we use an XO. An XO
`consists of a tuned circuit with an embedded quartz crystal in the
feedback loop. The crystal stabilizes the resonant frequency of the
`oscillator circuit.
`When we need a larger range of selectable frequencies, we use
`either a VCO or a VCXO, because the XO has a very limited
`tunable range. A DC voltage input controls both the VCO and
`VCXO. The VCO could be an emitter-coupled multivibrator that
produces a square wave that we can tune over a 10:1 frequency
range up to 20 MHz. It could also be a capacitance-controlled
oscillator that produces a sine wave tunable over a 2:1 frequency
`range up to microwave frequencies. If we modify the resonant
`frequency of an XO, we get a tuning accuracy of a few hundred
`parts per million in the VCXO. Thus, the XO has the most
`frequency stability but the least tuning flexibility, the VCXO is in
`the middle on both, and the VCO has the least frequency stabil-
`ity and the most tuning flexibility. Frequency instability in the
`oscillator can cause clock jitter, requiring us to assign a tolerance
`to the clock-edge placement in timing analysis.
`From the system clock we derive the full set of clocks and clock
`phases that the system requires. We can generate multiphase
`clocks from a square-wave input in many different ways. These
`methods include one shots, clock choppers or shrinkers, shift-
`register latches, and frequency dividers, depending on the preci-
`sion and flexibility required. To prevent the overlap of adjacent
`clock phases in a nonoverlapping clocking scheme, we use out-
`put feedback or clock choppers. If there is uneven loading on each
`clock phase, the relative pulse-edge positions may change, which
`might cause some of the clock phases to overlap. Another cause
`of overlap is the asymmetric rising and falling delays of contem-
`porary devices.
Figure 5 shows two simplified circuits that create two-phase
clocks. The techniques are applicable to general multiphase clock
generation. The first circuit is used in the Univac 1100/90 for
four-phase clock generation. It requires a fast-running square-wave
clock input and a ring counter. Each stage of the ring
counter enables one clock phase, and the single clock chopper
determines pulsewidth. The second circuit is used in the
Bellmac-32A. It generates two-phase clocks by decoding primary clock
signals. We can use a Gray-code counter to produce these primary
clocks, or we can use clock shaping. Clock shaping allows us to
generate clock phases from a system clock with fixed gaps
between phases (forcing pulsewidths to vary with frequency).

Figure 4. Elements that manipulate the clock pulsewidth (a) and
their effect on a positive pulse (b).

Figure 5. Creating a two-phase clock: selecting the pulses of a
fast-running clock (a) and decoding the primary clocks (b).
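
Decoding primary clocks into two nonoverlapping phases can be illustrated with a 2-bit Gray-code counter. This is a sketch under assumptions, not the Bellmac-32A circuit: the state assignment (phase 1 active in state 01, phase 2 active in state 10, with states 00 and 11 serving as gaps) and the function name are choices made for this example. Because only one bit changes per Gray-code step, the decoded outputs do not glitch between states.

```python
# Gray-code sequence of the two primary clock bits (a, b).
GRAY_SEQUENCE = [(0, 0), (0, 1), (1, 1), (1, 0)]


def two_phase_clocks(n_ticks):
    """Return (phi1, phi2) sample lists, one sample per fast-clock tick."""
    phi1, phi2 = [], []
    for tick in range(n_ticks):
        a, b = GRAY_SEQUENCE[tick % 4]
        phi1.append(int((not a) and b))   # active only in state 01
        phi2.append(int(a and (not b)))   # active only in state 10
    return phi1, phi2


if __name__ == "__main__":
    p1, p2 = two_phase_clocks(8)
    print("phi1", p1)
    print("phi2", p2)
```

In this arrangement the interphase gap is one tick of the fast clock, so both the gaps and the pulsewidths scale with the input frequency.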
`
CLOCK SEQUENCES

`
The three schemes for system clocking we have looked at
(single-phase, multiphase, and edge-triggered) determine the
basic data flow in latch and flip-flop machines during each clock
cycle. Complicating these requirements, though, are special
timing considerations. For example, subsystems may require
different clock-arrival times so that they can communicate with
`each other across interfaces with large delays. Also, paths within
`subsystems may be too long for normal system timing. We can
`accommodate irregular interfaces and paths without affecting the
`clock cycle, although system timing becomes more complex. To
`handle these cases, we generate a sequence of clocks during each
`clock cycle and do not use normal data-path timing.
`There are two timing design styles for handling the clock
`sequences generated during a clock cycle: multiphase design and
`multiclock design. Figure 6 illustrates. The dashed vertical lines
`represent the boundaries of the clock cycle. The solid vertical lines
`represent active clock edges. Time proceeds left to right across
each diagram and only paths originating from the earliest
(leftmost) cycle are shown. In a normal multiphase (k-phase) design
(Figure 6a), latches clocked by phase 1 feed latches clocked by
phase 2, and so on. Only the latch clocked on the last phase feeds
the phase-1 latch of the succeeding cycle. All data movement
proceeds phase i to phase i+1 modulo k.
`In contrast, the multiclock design (Figure 6c) ensures that
`bistable elements clocked at any time Ti during one cycle feed only
`bistable elements clocked in the succeeding cycle. For instance,
the three cycle n-1 clocks are early, normal, and late, which
correspond to the times T0, T1, and T2. Each can feed any of the
T0, T1, or T2 bistable elements in cycle n.
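
The difference between the two design styles can be expressed as a rule for which elements a given element may feed. The sketch below is an illustration with hypothetical function names; phases are numbered 1..k and cycles are integers.

```python
def multiphase_dest(phase, cycle, k):
    """Multiphase rule: phase i feeds phase i+1 in the same cycle;
    the last phase feeds phase 1 of the succeeding cycle."""
    if phase < k:
        return phase + 1, cycle
    return 1, cycle + 1


def multiclock_dests(cycle, timings=("T0", "T1", "T2")):
    """Multiclock rule: an element clocked at any timing in one cycle
    may feed elements at every timing, but only in the next cycle."""
    return [(t, cycle + 1) for t in timings]


if __name__ == "__main__":
    print(multiphase_dest(1, 0, k=3))
    print(multiphase_dest(3, 0, k=3))
    print(multiclock_dests(4))
```

The multiphase rule yields exactly one destination group per source, while the multiclock rule yields one per timing, which is why multiclock timing analysis must consider every source and destination timing pair.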
In the Amdahl 580, early clocks prevent long paths between the
remote channel frame and I/O processor. If we clock the source
`latch earlier or destination latch later than normal on a signal
`path, the signal has a longer interval to propagate. Of course,
`other signal paths between latches using normal clocks as
`sources and early clocks as destinations will have a shorter than
`normal time to propagate. Similarly, paths with latches using late
`clocks as sources and normal clocks as destinations will also be
`shorter.
`Multiphase design and multiclock design can be mixed, as
`shown in Figure 6d. The two-phase, double-latch configuration
`has master latches, which feed their associated slaves in the same
cycle. Each master latch is clocked at one of three timings: T0, T1,
or T2. The slave latch of each pair communicates with any of the
`master latches in the next cycle. The IBM 3033, 308X, and 3090
`mainframes use similar techniques.
`Figures 6b, 6e, and 6f show examples of more complex paths.
`Figure 6b shows the possibility of paths that skip adjacent phases
`in a three-phase system. The Univac 1100/90 is an example of a
`design with nonadjacent phase paths. Note that any phase-i-to-
`phase-i path in the succeeding cycle would require identical an-
`alysis to a single-phase system. Figures 6e and 6f show fractional
`cycle and multicycle paths. Such paths are typical of a perfor-
`mance-oriented design that uses two-phase latch machines.
`Systems can also generate clocks that operate at several dis-
`tinct cycle times, usually integer multiples of a base cycle time.
`We can use clocks with lower rates for parts of the system that
`do not need faster clocks. All clocking between subsystems must
`be synchronous, or else we must use techniques to reduce
`metastable behavior at subsystem interfaces.
`THE SYSTEM CLOCK SOURCE
`For developing, diagnosing, and producing high-speed systems,
`we ideally want a wide-bandwidth, highly accurate oscillator
`source. Most systems have both a crystal-oscillator source input
`for production systems and a tunable source input for prototype
`development and AC diagnosis. During development of a multi-
`phase system, we may need to vary the pulsewidth of any clock
`phase as well as to vary the relative pulse positions.
To detect marginal path-delay problems, the Amdahl 580
selects any one of three crystal oscillators as the clock source in
production machines, lengthening or shortening its clock cycle.
Operating modes are called normal, fast margin, and slow margin.
These correspond to nominal clock frequency, 5% faster than
nominal, and 5% slower than nominal. An external oscillator
input is also available, bypassing the internal oscillators during
diagnosis and development.

Figure 6. Placing clock pulses: three-phase, adjacent paths (a);
three-phase, nonadjacent paths (b); multiclock (three clock) (c);
multiclock, two-phase (d); multiclock, two-phase with fractional
cycle paths (e); and multiclock, two-phase with multicycle paths (f).
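
The three operating modes translate into clock periods as sketched below. The function name is hypothetical, and the example uses the 23.0-ns nominal period listed for the Amdahl 580 in Table 1; the 5% margin is applied to frequency, as the text states.

```python
def margin_periods(nominal_period_ns, margin=0.05):
    """Clock periods for normal, fast-margin, and slow-margin modes.

    The margin is a fraction of the nominal frequency, so the
    fast-margin period is shorter than nominal and the slow-margin
    period is longer.
    """
    f_nom = 1.0 / nominal_period_ns
    return {
        "normal": nominal_period_ns,
        "fast margin": 1.0 / (f_nom * (1 + margin)),
        "slow margin": 1.0 / (f_nom * (1 - margin)),
    }


if __name__ == "__main__":
    for mode, period in margin_periods(23.0).items():
        print(f"{mode}: {period:.2f} ns")
```

A machine that fails only in fast-margin mode is likely to have a marginal long path, and one that fails only in slow-margin mode may have timing that depends on the cycle length in some other way.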
`To detect marginal timing problems in the IBM 3090, a two-
`phase double-latch machine, designers made it possible to
`lengthen the delay between the leading edge of the slave clock and
`the trailing edge of the master latch clock for a selected system
`region (see Figure 3b). In addition, lengthening the clock cycle al-
`lows us to verify the slave-latch-to-slave-latch path delay.
`We can choose between distributed or centralized clock sources
`to control multiprocessors synchronously. In distributed control,
`we let each processor or processor group in the complex use its
`own local oscillator, with some form of enforced synchronization
between oscillators, like a phase-locked loop. Alternatively, in
`centralized control, we designate one oscillator as the master
`oscillator and have each system select this master through a
`local/remote switch. The second method is simpler and is com-
`mon in mainframe multiprocessor models such as the Amdahl
`580, IBM 3033, and IBM 370/168. Although the IBM multi-
`processors use a master oscillator, other standby oscillators are
`phase-locked to the master oscillator and can be selected if it fails.
`CLOCK DISTRIBUTION
`The goal of clock distribution is to organize clocks so that the
`delays from the source of each clock or clock phase to its bistable
`elements (its destinations) are identical. In reality, however, no
`matter how each clock path is constructed, any two clock paths
`in the same machine or any two corresponding paths in different
`machines will always have a delay difference. Every computer
`operates in a different temperature, power supply, and radiation
`environment, and duplicate components will differ in subtle ways
`between computers. We must build in tolerance to these varia-
`tions in any system timing design.
`The most common approaches to ensure correct and reliable
`machine timing are worst-case analysis and statistical analysis.
`In worst-case analysis, we assume that all component parame-
`ters lie within some range, and the cumulative worst-case effect
`is still within the timing tolerance of the machine. In statistical
`analysis, the intent is that most machines have tolerable timing
`characteristics, and so we can rely on the cumulative statistical
`variations of component parameters to remain within the timing
`tolerance.
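
The contrast between the two approaches can be sketched numerically. The article does not say how the statistical combination is done; the root-sum-square model below is an assumption (it presumes independent component variations), and the function names and the n_sigma parameter are hypothetical.

```python
import math


def worst_case_total(tolerances):
    """Worst-case analysis: every component sits at its limit at once,
    so the individual tolerances simply add."""
    return sum(abs(t) for t in tolerances)


def statistical_total(sigmas, n_sigma=3.0):
    """Assumed statistical model: independent variations combine as a
    root sum of squares, scaled to the chosen confidence level."""
    return n_sigma * math.sqrt(sum(s * s for s in sigmas))


if __name__ == "__main__":
    tolerances = [0.3, 0.4, 0.5]              # illustrative values, in ns
    sigmas = [t / 3.0 for t in tolerances]    # treat each tolerance as 3 sigma
    print(worst_case_total(tolerances))
    print(statistical_total(sigmas))
```

Under this assumed model the statistical total is never larger than the worst-case total, which is the source of the tighter timing budget, at the price that a small fraction of machines may fall outside it.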
`
We specify system timing such that every system memory element
has an expected arrival time for the active edge of its clock
signal. Clock-edge inaccuracy is the difference between the
actual and expected arrival time of this clock edge. For every pair
of system memory elements that communicate, we define path
clock skew as the sum of the clock-edge inaccuracies of the pair’s
source and destination. System clock skew is the largest path
clock skew in the system. It is the value of the worst-case timing
inaccuracy among all paths. We can break it into interboard skew,
on-board interchip skew, and so on to the smallest timed
component.
`The challenge to designers of clock-distribution networks is how
`to control system clock skew so that it becomes an acceptably
`small fraction of the system clock period. As a rule, most systems
`cannot tolerate a clock skew of more than 10% of the system clock
`period. If system clock skew goes beyond the design limit, system
`behavior can be affected. Setup and hold times are missed, which
results in long and short paths. No scheme is immune from these
problems; even flip-flop machines can malfunction when clock
skew is present.
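
The skew definitions and the 10% rule of thumb can be put together in a short sketch. The function names, the element labels, and the inaccuracy values are illustrative assumptions; the sum of inaccuracies is taken here as a sum of magnitudes, which is one reading of the definition.

```python
def path_clock_skew(src_inaccuracy, dst_inaccuracy):
    """Path clock skew: sum of the clock-edge inaccuracy magnitudes
    at the source and destination of one communicating pair."""
    return abs(src_inaccuracy) + abs(dst_inaccuracy)


def system_clock_skew(inaccuracy, paths):
    """System clock skew: the largest path clock skew over all paths."""
    return max(path_clock_skew(inaccuracy[s], inaccuracy[d]) for s, d in paths)


def within_budget(skew, period, fraction=0.10):
    """Rule of thumb from the text: skew at most 10% of the period."""
    return skew <= fraction * period


if __name__ == "__main__":
    # Hypothetical clock-edge inaccuracies in ns (actual minus expected).
    inaccuracy = {"A": 0.4, "B": -0.6, "C": 0.2}
    paths = [("A", "B"), ("B", "C")]
    skew = system_clock_skew(inaccuracy, paths)
    print(skew, within_budget(skew, period=18.5), within_budget(skew, period=9.5))
```

The same skew that fits comfortably in an 18.5-ns cycle exceeds the budget of a 9.5-ns cycle, which illustrates why skew control tightens as clock rates rise.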
`Clock-powering trees, such as the one in Figure 7, are a source
`of clock skew. These trees are used to produce multiple copies of
`the clock signal for distribution. Each gate of the tree has some
uncertainty associated with its delay, which is the difference
between its best-case and worst-case delays. This difference is
called gate skew. Using a worst-case timing analysis, the clock
skew caused by a powering tree equals the arithmetic sum of the
gate (and interconnection) skews on the path from the tree root
to an output. In other words, clock skew has a cumulative effect
`by tree level. We can minimize this clock skew by placing all gates
`at a given tree level, or even the entire tree, on the same chip. In
`addition, we can realize elements at each tree level by using elec-
`trically matched devices and careful wiring.
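
The cumulative effect by tree level can be sketched by summing gate skews along each root-to-output path. The function names, output labels, and delay values below are hypothetical; each list holds the gate (and interconnection) skews met on the way from the tree root to one output.

```python
def gate_skew(best_case_delay, worst_case_delay):
    """Gate skew: spread between worst-case and best-case delay."""
    return worst_case_delay - best_case_delay


def tree_path_skew(skews_on_path):
    """Worst-case clock skew at one output: arithmetic sum of the
    skews along the path from the tree root."""
    return sum(skews_on_path)


def worst_tree_skew(paths):
    """Largest accumulated skew over all outputs of the tree."""
    return max(tree_path_skew(p) for p in paths.values())


if __name__ == "__main__":
    # Illustrative three-level and two-level paths, values in ns.
    paths = {
        "out0": [gate_skew(1.0, 1.5), gate_skew(1.0, 1.5), gate_skew(2.0, 3.0)],
        "out1": [gate_skew(1.0, 1.5), gate_skew(0.75, 1.0)],
    }
    print(worst_tree_skew(paths))
```

Placing all gates of one level on the same chip, as the text suggests, amounts to shrinking the individual entries in these lists, because gates on one chip track each other far more closely than gates on different chips.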
`
DISTRIBUTION TECHNIQUES
We must efficiently distribute the rectangular clock pulses
produced through the interaction of the oscillator and
clock-generation circuitry. Critical to efficient distribution is the
clock-network layout, the physical placement of the network. It must
`conform to design rules that ensure the integrity of the clock sig-
`nal by minimizing electrical coupling, switching currents, and im-
`pedance discontinuities. Other rules must prevent excessive
`clock skew by equalizing path delays and maintaining the quali-
`ty of the signal edge. Symmetry and balanced loading at many
levels of packaging, such as on the chip or on the board, are
characteristic of effective clock-network layouts. To achieve these
`qualities, we can prearrange positions of the clock pins and make
`clock paths as short as possible.
It is sometimes difficult to coordinate the relative lengths of two
paths that originate from a common source. To match any two
paths or path segments in the clock system, we may need
identical lengths of cable, wire, and interconnections; balanced
loading; and equal numbers of buffer gates. A technique called
time-domain reflectometry helps in this process by accurately
measuring the line delays of long cables. In this process, line
signals are generated, and the signal reflections from line
terminations are detected in real time. Once we measure the delays, we
can equalize them by adjusting the lengths of the cables.

Figure 7. Clock-powering tree.

Figure 8. Logic islands.

Duplicating the composition of two paths is not the only method
of ensuring
