throbber
I 1111111111111111 11111 1111111111 111111111111111 IIIII IIIII IIIIII IIII IIII IIII
`US007546567B2
`
`c12) United States Patent
`Cheon et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,546,567 B2
`Jun.9,2009
`
`(54)
`
`METHOD AND APPARATUS FOR
`GENERATING A VARIATION-TOLERANT
`CLOCK-TREE FOR AN INTEGRATED
`CIRCUIT CHIP
`
`(75)
`
`Inventors: Yongseok Cheon, Portland, OR (US);
`Pei-Hsin Ho, Portland, OR (US)
`
`(73)
`
`Assignee: Synopsys, Inc., Mountain View, CA
`(US)
`
`( *)
`
`Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 191 days.
`
`(21)
`
`Appl. No.: 11/652,302
`
`(22) Filed:
`
`Jan.10,2007
`
`(65)
`
`(51)
`
`(52)
`(58)
`
`(56)
`
`Prior Publication Data
`
`US 2008/0168412Al
`
`Jul. 10, 2008
`
`Int. Cl.
`G06F 17150
`(2006.01)
`U.S. Cl. .................................. 716/6; 716/7; 716/10
`Field of Classification Search ....................... None
`See application file for complete search history.
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`Minami ......................... 716/6
`Cheng et al . .................. 716/10
`Srinivasan et al .
`............ 716/10
`Auracher et al.
`.............. 716/13
`
`A *
`9/1996
`Bl*
`4/2002
`Bl*
`2/2004
`B2 * 10/2006
`
`5,557,779
`6,367,060
`6,698,006
`7,117,472
`
`7,257,782 B2 *
`2008/0216040 Al*
`
`8/2007 Ho et al. ........................ 716/2
`9/2008 Furnish et al.
`................ 716/10
`
`OTHER PUBLICATIONS
`
`Cheon et al., "Power-Aware Placement," Design Automation Con(cid:173)
`ference 2005, pp. 795-800.*
`Rajaram et al., "Reducing Clock Skew Variability via Cross Links,"
`Design Automation Conference 2004, pp. 18-23.*
`Velenis et al., "A Clock Tree Topology Extraction Algorithm for
`Improving the Tolerance of Clock Distribution Networks to Delay
`Uncertainty," 2001 IEEE, pp. 422-425.*
`
`* cited by examiner
`
`Primary Examiner-Leigh Marie Garbowski
`(74) Attorney, Agent, or Firm-Park, Vaughan & Fleming
`LLP
`
`(57)
`
`ABSTRACT
`
`One embodiment of the present invention relates to a process
`that generates a clock-tree on an integrated circuit (IC) chip.
`During operation, the process starts by receiving a placement
`for a chip layout, where the placement includes a set of
`registers at fixed locations in the chip layout. The process then
`generates a timing criticality profile for the set of registers,
`wherein the timing criticality profile specifies timing criticali(cid:173)
`ties between pairs of registers in the set of registers. Next, the
`process clusters the set of registers based on the timing criti(cid:173)
`cality profile to create a clock-tree for the set of registers. By
`clustering the registers based on the timing criticality profile,
`the process facilitates using commonly-shared clock paths in
`the clock-tree to provide clock signals to timing critical reg(cid:173)
`ister pairs.
`
`25 Claims, 6 Drawing Sheets
`
`RECEIVE A PLACEMENT
`402
`
`GENERATE A TIMING
`CRITICALITY PROFILE FROM
`THE PLACEMENT
`404
`
`PERFORM A GEOMETRIC
`LOCATION-BASED
`CLUSTERING TO GENERATE A
`TEMPORARY PARTlAL-CLOCK(cid:173)
`TREE
`406
`
`EXTRACT A SET OF
`CONSTRAINTS FROM THE
`TEMPORARY PARTIAL-CLOCK(cid:173)
`TREE
`408
`
`DISCARD THE TEMPORARY
`PARTIAL-CLOCK-TREE
`410
`
`PERFORM A TIMING
`CRITICALITY-BASED
`CLUSTERING UNDER THE SET
`OF CONSTRAINTS TO
`GENERATE A PARTIAL(cid:173)
`CLOCK-TREE
`412
`
`Page 1 of 14
`
`

`

`-....l = N
`
`0--,
`tit
`0--,
`~
`tit
`-....l
`d r.,;_
`
`O'I
`0 .....
`....
`.....
`rJJ =- ('D
`
`('D
`
`\,Ci
`0
`0
`N
`~\,Ci
`
`~ = ?
`
`~ = ~
`
`~
`~
`~
`•
`00
`~
`
`1;g
`
`Prata
`
`FIG. 1
`
`128
`
`126
`
`124
`
`122
`
`120
`
`118
`
`Design
`System ) Logic Design) Synthesis & )
`
`Test 116
`Design for
`
`Verif. 114
`and Fune.
`
`112
`
`) Resolution ) ~ask )
`
`Enhanc.
`
`Verfication
`Physical
`
`Implement. & Extract.
`) Analys~ )
`Physical
`
`Planning
`) Design )
`
`Verification
`
`Netlist
`
`160
`
`Assembly
`Pad<aging ) ~
`
`170
`Chips
`
`&
`
`) Fabrication ) )
`
`150
`
`EDA ) Tape-
`
`110
`
`Software
`
`140
`out
`
`Product ))
`
`100
`Idea
`
`Page 2 of 14
`
`

`

`U.S. Patent
`US. Patent
`
`Jun. 9, 2009
`Jun.9,2009
`
`Sheet 2 of6
`Sheet 2 of 6
`
`US 7,546,567 B2
`US 7,546,567 B2
`
`
`
`208
`
`FIG. 2
`FIG. 2
`
`Page 3 of 14
`
`Page 3 of 14
`
`

`

`U.S. Patent
`
`Jun.9,2009
`
`Sheet 3 of 6
`
`US 7,546,567 B2
`
`CLOCK 302
`
`310\ _________________ _
`
`(-3t~-------.I
`
`--:--
`:
`:
`
`I
`
`I
`I
`I
`I
`I
`I
`I
`
`I
`I
`I
`I
`I
`I
`
`- - -1
`I
`I
`I
`I
`I
`I
`I
`304
`___ J
`
`c32~
`I
`I
`I
`I
`I
`I
`I
`1314
`I
`
`FIG. 3
`
`Page 4 of 14
`
`

`

`U.S. Patent
`
`Jun.9,2009
`
`Sheet 4 of 6
`
`US 7,546,567 B2
`
`START
`
`RECEIVE A PLACEMENT
`402
`
`GENERATE A TIMING
`CRITICALITY PROFILE FROM
`THE PLACEMENT
`404
`
`PERFORM A GEOMETRIC
`LOCATION-BASED
`CLUSTERING TO GENERATE A
`.-----t1.iTEMPORARY PARTIAL-CLOCK(cid:173)
`TREE
`406
`
`EXTRACT A SET OF
`CONSTRAINTS FROM THE
`TEMPORARY PARTIAL-CLOCK(cid:173)
`TREE
`408
`
`DISCARD THE TEMPORARY
`PARTIAL-CLOCK-TREE
`410
`
`PERFORM A TIMING
`CRITICALITY-BASED
`CLUSTERING UNDER THE SET
`OF CONSTRAINTS TO
`GENERATE A PARTIAL(cid:173)
`CLOCK-TREE
`412
`
`NO
`
`END
`
`FIG. 4
`
`Page 5 of 14
`
`

`

`U.S. Patent
`
`Jun.9,2009
`
`Sheet 5 of 6
`
`US 7,546,567 B2
`
`START
`
`PRIORITIZE LEAF-LEVEL
`REGISTERS BASED ON THE
`TIMING CRITICALITIES
`502
`
`START WITH THE HIGHEST PRIORITY REGISTER
`PAIR, AND FOR EACH PRIORITIZED REGISTER PAIR
`
`CLUSTER THE REGISTER
`PAIR INTO A SAME CLUSTER
`504
`
`CHECK THE CLUSTER
`CONTAINING THE REGISTER
`PAIR AGAINST THE SET OF
`CONSTRAINTS
`506
`
`YES
`
`DO NOT INCLUDE
`THE REGISTER PAIR
`IN THE SAME
`CLUSTER
`510
`
`NO
`
`INCLUDE THE REGISTER PAIR
`IN THE SAME CLUSTER
`512
`
`GO TO THE NEXT HIGHEST PRIORITY REGISTER PAIR
`
`CLUSTER THE REMAINING
`REGISTERS USING A
`LOCATION-BASED
`CLUSTERING PROCESS
`514
`
`END
`
`FIG. 5
`
`Page 6 of 14
`
`

`

`U.S. Patent
`
`Jun.9,2009
`
`Sheet 6 of 6
`
`US 7,546,567 B2
`
`CHIP 600
`
`CLUSTER
`602
`
`CLUSTER
`606
`
`CLUSTER
`604
`
`CLUSTER
`608
`
`FIG. 6A
`
`CHIP 600
`
`610
`
`624
`
`612
`
`626
`
`614
`
`y7
`
`616
`
`FIG. 6B
`
`Page 7 of 14
`
`

`

`US 7,546,567 B2
`
`1
`METHOD AND APPARATUS FOR
`GENERATING A VARIATION-TOLERANT
`CLOCK-TREE FOR AN INTEGRATED
`CIRCUIT CHIP
`
`BACKGROUND
`
`2
`tion networks to delay uncertainty," ISCAS 2001). However,
`this technique does not use any physical proximity informa(cid:173)
`tion to guide clock-tree synthesis, and therefore suffers from
`problems such as unbalanced tree topology, larger wire length
`5 overhead, and higher power consumption.
`Hence, what is needed is a method and an apparatus for
`creating an OCV-tolerant clock-tree without the problems
`described above.
`
`SUMMARY
`
`1. Field of the Invention
`The present invention relates to techniques for designing
`clock distribution networks for integrated circuit (IC) chips. 10
`More specifically, the present invention relates to a method
`and an apparatus for generating a clock-tree on an IC chip to
`facilitate reducing the effects of on-chip variation (OCV).
`2. Related Art
`Advances in semiconductor technology presently make it
`possible to integrate large-scale systems, including hundreds
`of millions of transistors, onto a single semiconductor chip.
`Integrating such large-scale systems onto a single semicon(cid:173)
`ductor chip increases the speed at which such systems can
`operate, because signals between system components do not
`have to cross chip boundaries, and are not subject to lengthy
`chip-to-chip propagation delays.
`The speed of a system on an integrated circuit (IC) chip is
`largely determined by system clock frequency. In a typical
`synchronous IC chip, a clock distribution network (referred to
`as a "clock-tree") is used to distribute a clock signal from a
`common source to various circuit components. This clock
`signal is used to coordinate data transfers between circuit
`components. However, as
`increasing clock frequencies
`reduce the clock periods to fractions of a nanosecond, design- 30
`ing clock-trees is becoming increasingly more challenging.A
`direct result of the decreasing clock period is a shrinking
`"timing budget" between logically coupled clock sinks. This
`decreasing timing budget is requiring clock-trees to have
`minimal clock skew.
`Many sources contribute to clock skew in a clock-tree.
`Among these sources, "variations" have become one of the
`more significant challenges
`in synchronous clock-tree
`design. These variations can include: manufacturing process
`variations, operational voltage variations, and ambient tem(cid:173)
`perature variations. In particular, some of these variations
`occur within a chip boundary, and are hence referred to as
`"on-chip variations" (OCV). Due to the impact of OCV, the
`timing characteristics of instances of the same component
`may vary across the chip, thereby limiting the performance of
`the chip, and even threatening the functionality of the chip.
`Furthermore, OCV causes uncertainty in clock arrival times
`at circuit components. This uncertainty can cause clock skew
`and can thereby worsen the timing performance of the data
`paths between the clock sinks.
`In order to reduce the effects of OCV, some systems insert
`shunt connections called "cross links" into a clock-tree struc(cid:173)
`ture in a post-processing step (see A. Rajaram, J. Hu and R.
`Mahapatra, "Reducing Clock Skew Variability via Cross
`Links," IEEE Trans. Computer-Aided Design, Vol. 25, No. 6,
`pp. 1176-1182, June, 2006). These cross links can increase
`the amount of clock-path sharing between registers, thereby
`improving OCV-tolerance. However, this technique requires
`significantly more routing resources (wires) than would be
`needed by a typical clock-tree. Furthermore, the timing char(cid:173)
`acteristics of the cross links are generally difficult to analyze.
`Another disadvantage of this technique is that additional
`wires also increase overall power consumption of the chip.
`Other systems address the OCV issue by sequentially
`merging timing-critical pairs ofregisters based on a priority
`ordering (see D. Velenis, et al., "A clock tree topology extrac(cid:173)
`tion algorithm for improving the tolerance of clock distribu-
`
`One embodiment of the present invention relates to a pro(cid:173)
`cess that generates a clock-tree on an integrated circuit (IC)
`chip. During operation, the process starts by receiving a
`15 placement for a chip layout, where the placement includes a
`set of registers at fixed locations in the chip layout. The
`process then generates a timing criticality profile for the set of
`registers, wherein the timing criticality profile specifies tim(cid:173)
`ing criticalities between pairs of registers in the set of regis-
`20 ters. Next, the process clusters the set of registers based on the
`timing criticality profile to create a clock-tree for the set of
`registers. By clustering the registers based on the timing
`criticality profile, the process facilitates using commonly(cid:173)
`shared clock paths in the clock-tree to provide clock signals to
`25 timing critical register pairs.
`In a variation on this embodiment, the process obtains the
`timing criticality between a pair of registers by computing a
`timing slack between the pair of registers based on the
`received placement.
`In a further variation on this embodiment, the process
`computes the timing slack between the pair of registers by
`computing a data path delay and a clock skew between the
`pair of registers based on the received placement.
`In a variation on this embodiment, the process generates
`35 the timing criticality profile for the set of registers by con(cid:173)
`structing a graph G(Y, E), wherein Vis the set of registers and
`E is the set of edges between the set of registers weighted by
`the corresponding timing criticalities.
`In a variation on this embodiment, the process clusters the
`40 set of registers based on the timing criticality profile to create
`a clock-tree by: clustering the set of registers at a leaf-level to
`generate a disjoint set ofleaf-level clusters; assigning a clock
`buffer to each of the leaf-level clusters to generate a set of
`clock-buffers; and clustering the set of clock-buffers to gen-
`45 erate a disjoint set ofnon-leaflevel clusters.
`In a further variation on this embodiment, the process
`clusters the set of registers at the leaf-level by: prioritizing the
`pairs of registers based on the associated timing criticalities,
`wherein a more timing critical pair of registers receives a
`50 higher priority; and attempting to assign a pair of registers to
`the same cluster based on the associated priority, thereby
`substantially maximizing the inclusion of the timing critical
`register pairs into the same clusters.
`In a further variation, registers within the same cluster
`55 share clock-buffers and clock-nets along associated clock
`paths in the clock-tree.
`In a further variation, prior to clustering the set of registers
`based on the timing criticality profile, the process clusters the
`set of registers using a geometric location-based clustering
`60 process to obtain a temporary partial-clock-tree, which com(cid:173)
`prises a disjoint set of clusters. The process then extracts a set
`of constraints from the set of clusters. Next, the process
`discards the temporary partial-clock-tree.
`In a further variation, the set of constraints can include: a
`65 bounding box size for a cluster; a bounding box shape for a
`cluster; a fanout within a cluster; and a capacitance within a
`cluster.
`
`Page 8 of 14
`
`

`

`US 7,546,567 B2
`
`3
`In a further variation, the process clusters the set of regis(cid:173)
`ters based on the timing criticality profile by applying the set
`of constraints to each of the leaf-level clusters and the non(cid:173)
`leaf-level clusters to ensure that the constraints are not vio(cid:173)
`lated.
`In a variation on this embodiment, using commonly-shared
`clock paths in the clock-tree to provide clock signals to the
`timing critical register pairs improves timing performance of
`the IC chip by reducing the impact of on-chip-variations in
`creating clock skew between registers in timing critical reg(cid:173)
`ister pairs.
`In a further variation on this embodiment, the on-chip(cid:173)
`variations can include: process variations; voltage variations;
`and temperature variations.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
`FIG.1 illustrates various steps in the design and fabrication
`of an integrated circuit in accordance with an embodiment of
`the present invention.
`FIG. 2 illustrates the effect ofOCV on an exemplary circuit
`in accordance with an embodiment of the present invention.
`FIG. 3 illustrates an exemplary clock-tree structure with
`leaf-level registers in accordance with an embodiment of the
`present.
`FIG. 4 presents a flowchart illustrating the process of gen(cid:173)
`erating a clock-tree to improve timing performance in accor(cid:173)
`dance with an embodiment of the present.
`FIG. 5 presents a flowchart illustrating the process of clus(cid:173)
`tering leaf-level registers based on both timing criticality
`profile and constraints from the location-based clustering in
`accordance with an embodiment of the present.
`FIG. 6A illustrates an exemplary leaf-level-clustering
`result in accordance with an embodiment of the present.
`FIG. 6B illustrates the non-leaf-level abstraction of the
`clock buffers in FIG. 6A in accordance with an embodiment
`of the present.
`
`DETAILED DESCRIPTION
`
`The following description is presented to enable any per(cid:173)
`son skilled in the art to make and use the invention, and is
`provided in the context of a particular application and its
`requirements. Various modifications to the disclosed embodi(cid:173)
`ments will be readily apparent to those skilled in the art, and
`the general principles defined herein may be applied to other
`embodiments and applications without departing from the
`spirit and scope of the present invention. Thus, the present
`invention is not limited to the embodiments shown, but is to
`be accorded the widest scope consistent with the principles
`and features disclosed herein.
`
`Integrated Circuit Design Flow
`FIG.1 illustrates various steps in the design and fabrication
`of an integrated circuit in accordance with an embodiment of
`the present invention.
`The process starts with the product idea ( step 100) which is
`realized using an EDA software design process (step 110).
`When the design is finalized, it can be taped-out ( event 140).
`After tape out, the fabrication process (step 150) and pack(cid:173)
`aging and assembly processes (step 160) are performed
`which ultimately result in finished chips (result 170).
`The EDA software design process (step 110), in tum, com(cid:173)
`prises steps 112-130, which are described below. Note that
`the design flow description is for illustration purposes only.
`This description is not meant to limit the present invention.
`For example, an actual integrated circuit design may require
`
`5
`
`20
`
`4
`the designer to perform the design steps in a different
`sequence than the sequence described below. The following
`discussion provides further details of the steps in the design
`process.
`System design (step 112): The designers describe the func-
`tionality that they want to implement. They can also perform
`what-if planning to refine functionality, check costs, etc.
`Hardware-software architecture partitioning can occur at this
`stage. Exemplary EDA software products from Synopsys,
`10 Inc. that can be used at this step include Model Architect,
`Saber, System Studio, and Design Ware® products.
`Logic design and functional verification (step 114): At this
`stage, the VHDL or Verilog code for modules in the system is
`written and the design is checked for functional accuracy.
`15 More specifically, the design is checked to ensure that it
`produces the correct outputs. Exemplary EDA software prod(cid:173)
`ucts from Synopsys, Inc. that can be used at this step include
`VCS, VERA, Design Ware®, Magellan, Formality, ESP and
`LEDA products.
`Synthesis and design for test (step 116): Here, the VHDL/
`Verilog is translated to a netlist. The netlist can be optimized
`for the target technology. Additionally, tests can be designed
`and implemented to check the finished chips. Exemplary
`EDA software products from Synopsys, Inc. that can be used
`25 at this step include Design Compiler®, Physical Compiler,
`Test Compiler, Power Compiler, FPGA Compiler, Tetramax,
`and Design Ware® products.
`Netlist verification (step 118): At this step, the netlist is
`checked for compliance with timing constraints and for cor-
`30 respondence with the VHDLNerilog source code. Exemplary
`EDA software products from Synopsys, Inc. that can be used
`at this step include Formality, Prime Time, and VCS products.
`Design planning (step 120): Here, an overall floorplan for
`the chip is constructed and analyzed for timing and top-level
`35 routing. Exemplary EDA software products from Synopsys,
`Inc. that can be used at this step include Astra and IC Com(cid:173)
`piler products.
`Physical implementation (step 122): The placement (posi-
`40 tioning of circuit elements) and routing ( connection of the
`same) occurs at this step. Exemplary EDA software products
`from Synopsys, Inc. that can be used at this step include the
`Astra and IC Compiler products.
`Analysis and extraction (step 124): At this step, the circuit
`function is verified at a transistor level, this in turn permits
`what-if refinement. Exemplary EDA software products from
`Synopsys, Inc. that can be used at this step includeAstroRail,
`PrimeRail, Primetime, and Star RC/XT products.
`Physical verification (step 126): In this step, the design is
`50 checked to ensure correctness for manufacturing, electrical
`issues, lithographic issues, and circuitry. Exemplary EDA
`software products from Synopsys, Inc. that can be used at this
`step include the Hercules product.
`Resolution enhancement (step 128): This step involves
`55 geometric manipulations of the layout to improve manufac(cid:173)
`turability of the design. Exemplary EDA software products
`from Synopsys, Inc. that can be used at this step include
`Proteus, ProteusAF, and PSMGen products.
`Mask data preparation (step 130): This step provides the
`60 "tape-out" data for production of masks to produce finished
`chips. Exemplary EDA software products from Synopsys,
`Inc. that can be used at this step include the CATS(R) family
`of products.
`Embodiments of the present invention can be used during
`65 one or more of the above described steps. Specifically, one
`embodiment of the present invention can be used during the
`physical implementation step 122.
`
`45
`
`Page 9 of 14
`
`

`

`US 7,546,567 B2
`
`5
`Overview of the Clock-Three Generation Technique
`The present invention provides a technique for improving
`overall timing performance of an IC chip in the presence of
`clock variations. Specifically, using a placed IC design as an
`input, the present invention performs a clock-tree synthesis
`(CTS) operation on the IC design to maximize commonly(cid:173)
`shared clock paths (including clock buffers and clock nets) to
`the registers in the IC design if they have tighter timing
`constraints between them.
`More specifically, the present invention first extracts tim(cid:173)
`ing information as a preprocessing step of the CTS, so that
`timing critical register pairs can be identified from the placed
`design netlist. Next, starting from leaf-level clustering, the
`present invention attempts to maximize the inclusion of the
`timing critical register pairs into the same cluster.
`Furthermore, one embodiment of the present invention
`uses a two-pass clustering procedure to minimize the negative
`effects of a pure timing-based clustering on general CTS
`qualities (such as maximum global skew, clock wire length,
`power, and insertion delay). In the first pass, a location-based 20
`general purpose clustering is performed, and a disjoint set of
`clusters is obtained. The statistics on the capacitances and
`bounding boxes of the clusters are then extracted from the set
`of clusters as guiding constraints for the second pass. In the
`second pass, the clustering is re-done based on both the tim- 25
`ing criticality profile and a set of constraints from the first
`pass, so that the set of constraints (e.g., maximum capaci(cid:173)
`tance, maximum fanout, etc.) are not violated.
`The two-pass CTS procedure is performed in a bottom-up
`manner, and can be extended to non-leaf levels. Note that 30
`each resulting cluster is driven by a single clock buffer ( or
`inverter), both at the leaf-level and at non-leaflevels.
`
`6
`Note that variations, such as processing variations (e.g.,
`width of the clock wire), voltage variations, and temperature
`variations can result in uncertainties in clock path delays C 1
`and C2 . Because these uncertainties can have both positive
`5 and negative values, a worst case scenario is typically con(cid:173)
`sidered. For example, the uncertainties can simultaneously
`cause an increase ofC 1 and a decrease ofC2 . Hence the clock
`skew ti. becomes significantly larger, and the above timing
`constraint becomes more stringent. Furthermore, if there is
`10 not a sufficient positive slack, this increase of clock skew can
`cause a violation of the timing constraint more easily which
`can lead to failures in the circuits.
`Note that if there are more commonly-shared portions in
`the clock paths (including clock buffers and clock nets) from
`15 the clock source to different clock sinks, the data path
`between the pair of clock sinks becomes less vulnerable to the
`variations, because the commonly-shared clock path up to a
`common point does not contribute to clock skew uncertain-
`ties. In one embodiment of the present invention, the effect of
`shared clock paths can be measured by using a method called
`"clock reconvergence pessimism removal" or CRPR.
`FIG. 3 illustrates an exemplary clock-tree structure 300
`with leaf-level registers in accordance with an embodiment of
`the present.
`Clock-tree structure 300 includes a main clock source 302
`at the root of clock-tree 300, two middle levels of clock(cid:173)
`buffers, and a leaf-level comprising a set of clustered regis(cid:173)
`ters. Registers 304 and 306 form a launcher/capturer pair
`which is logically coupled through a data path 308. When
`comparing the two clock paths from clock 302 to registers
`304 and 306, note that branch 310 and branch 312 are the
`non-shared "private" sections of the two clock paths, respec(cid:173)
`tively. These two private branches contain a total of four
`unshared clock buffers.
`Registers 314 and 316 form another launcher/capturer pair
`which is logically coupled through a data path 318. Note that
`the clock paths for registers 314 and 316 are mostly shared,
`and the private sections of the two clock paths are branch 320
`and branch 322, respectively. These two private branches are
`both local branches which contain no clock buffers. Note that
`register pair 314/316 achieves a significantly more clock-path
`sharing than the register pair 304/306.
`In one embodiment of the present invention, the total num(cid:173)
`ber of unshared clock-buffers in both clock paths of a
`launcher/capturer pair is used to indicate a severity of the
`effects due to OCV. In this embodiment, OCV causes a much
`more severe impact on data path 308 than on data path 318.
`Note that generally registers within a same cluster, for
`example in cluster 324, are less vulnerable to OCV than
`register pairs in different clusters, for example, between clus(cid:173)
`ter 326 and cluster 328. Based on above observation, we
`provide an OCV-aware CTS technique below.
`
`Variations Impact on Clock Skew and Timing Constraints
`Note that on-chip variation (OCV) increases the clock 35
`skew uncertainties which subsequently worsen timing con(cid:173)
`straints in the circuits. Due to these variations, a certain clock
`path (to a launcher) can be slower than expected; while
`another clock path ( to a capturer) can be faster than expected.
`A critical timing path clocked by such a clock-tree will paten- 40
`tially violate a specified timing constraint.
`FIG. 2 illustrates the effect ofOCV on an exemplary circuit
`in accordance with an embodiment of the present invention.
`FIG. 2 includes two logically coupled registers 202 and
`204, and a clock source 206. Register 202 is coupled to 45
`register 204 through a data path 208 which includes a com(cid:173)
`binational logic 210. Note that in this configuration, we refer
`to register 202 as a "launcher" and register 204 as a "cap(cid:173)
`turer", respectively. A clock signal generated by clock source
`206 is distributed to registers 202 and 204 through two dif- 50
`ferent clock paths 212 and 214, which are associated with
`clock path delays C 1 and C2 , respectively. Ideally, C 1 and C2
`are designed to be identical. In reality, there almost always
`exists a difference between them, which is referred to as the
`clock skew li.=C 1-C2 . Note that in FIG. 2, clock path (wire)
`212 is shorter than clock path (wire) 214.
`During a clock cycle, new data presented at the outputs of
`register 202 moves downstream through data path 208 and is
`latched onto the inputs of register 204. During normal circuit
`operation, the timing constraint stipulates that a delay d asso- 60
`ciated with data path 208 plus a setup time t associated with
`register 204 is less than the clock period T, i.e., d+t<T. Fur(cid:173)
`thermore, due to the non-zero clock skew resulting from C 1
`and C2 , the timing constraint becomes C 1 +d+t<T +C2 , which
`can be rewritten in terms of the clock skew as: li.+d+t<T. The 65
`difference (T-t)-(li.+d) is referred to as the "slack," and hence
`a negative slack results in a violation of the timing constraint.
`
`55
`
`Generating a OCV-Tolerant Clock-Tree
`FIG. 4 presents a flowchart illustrating the process of gen(cid:173)
`erating a clock-tree to improve timing performance in accor(cid:173)
`dance with an embodiment of the present.
`During operation, the process starts by receiving a place(cid:173)
`ment of an IC chip design (step 402). Specifically, in this IC
`chip design all the registers and the corresponding combina(cid:173)
`tional logics between registers have been placed at respective
`locations within the chip boundary, and the connectivities
`between registers are specified in a design netlist file.
`The process then generates a timing criticality profile for
`the set of registers, wherein the timing criticality profile
`specifies timing criticalities between pairs of launcher/cap-
`
`Page 10 of 14
`
`

`

`US 7,546,567 B2
`
`7
`turer in the set of registers (step 404). As a result, timing
`critical register pairs can be identified from the placed design
`netlist.
`Note that a timing criticality can be computed based on the
`slack described above, i.e., (T-t)-(ll+d), wherein T is the
`clock period, t is the setup time of the capturer, and d is the
`data path delay which can be obtained from the placed design
`netlist. However, because the clock-tree has not been estab(cid:173)
`lished at this stage, an exact clock skew is unknown. In one
`embodiment, an ideal clock skew of zero is assumed. In a
`further embodiment, a clock skew of a predetermined value is
`assumed (for example, by making a worst case estimate of the
`clock skew).
`One embodiment of the present invention represents the
`timing criticality profile as a graph G(V, E), wherein V rep(cid:173)
`resents the set of registers and Eis the set of edges between the
`set of registers weighted by their corresponding timing criti(cid:173)
`calities. Specifically, an edge between each pair oflauncher/
`capturer is assigned a weight (for example, from Oto 10)
`based on its timing criticality. For example, in FIG. 3, the edge
`between register pair 304/306 will be assigned a higher
`weight than the edge between register pair 314/316 if the
`former has a longer data path delay. In order to reduce the
`complexity of the timing graph G, only those edges associ(cid:173)
`ated with sufficiently large weights will be included in the
`graph, and considered as timing critical. Consequently, reg(cid:173)
`isters with relatively low timing criticalities are not consid(cid:173)
`ered for their timing profile during clustering.
`Note that the timing graph G can be overlaid on the placed
`chip layout using two-dimensional (2D) distance-coordi(cid:173)
`nates, and the set of edges can be displayed as straight lines
`connecting timing critical register pairs (as the end points of
`an edge).
`
`Performing a Location-Based CTS
`A conventional location-based clustering technique aims
`to minimize the clock skew, power consumption, wire length,
`etc., by clustering leaf-level registers and upper-level clock(cid:173)
`buffers based on the physical proximity of the components at
`each level in the placement configuration. For example, when
`two registers are placed far apart from each other, the tech(cid:173)
`nique tends to avoid clustering them together which would
`lead to larger clock skew. However, it is possible that these
`two physically distant registers are associated by a high tim(cid:173)
`ing criticality. This timing profile is not considered by the
`conventional location-based CTS. On the other hand, two
`physically nearby registers can be logically unrelated, and
`therefore are not required to be clustered together. However,
`this is also not considered by the conventional location-based
`CTS. In contract, the present invention takes into account
`both geometric and timing information while creating a
`clock-tree.
`Referring back to FIG. 4, the process next performs a 55
`first-pass location-based clustering on the set of registers to
`generate a temporary partial-clock-tree, which comprises a
`disjoint set the clusters (step 406). Specifically, the clustering
`is based primarily on the physical proximity of the compo(cid:173)
`nents, while also considering a set of user specified "hard"
`constraints. These user-specified constraints on a cluster can
`include but are not limited to: maximum/minimum capaci(cid:173)
`tance within a cluster and maximum/minimum fanout within
`a cluster.
`Note that the set of clusters in the temporary partial-clock- 65
`tree includes clusters from both the leaf-level and non-leaf-
`levels, and each cluster has an associated bounding box con-
`
`5
`
`8
`(registers or clock-buffers).
`taining a set of nodes
`Furthermore, each cluster is driven by a single common clock
`buffer.
`Next, the process extracts statistics from the set of clusters
`(step 408). These statistics form a new set of constraints on a
`cluster, which can include but is not limited to: a maximum/
`minimum bounding box size of a cluster, a bounding box
`shape ( e.g., a maximum aspect ratio) ofa cluster, a maximum/
`minimum capacitance within a cluster, and maximurn/mini-
`10 mum fanout

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket