`US008234614Bl
`
`c12) United States Patent
`Wen et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 8,234,614 Bl
`Jul. 31, 2012
`
`(54) MULTI-THREADED GLOBAL ROUTING
`
`OTHER PUBLICATIONS
`
`(75)
`
`Inventors: Hua Wen, Mountain View, CA (US);
`Hsiao-Ping Tseng, Fremont, CA (US);
`Zhong Wang, San Jose, CA (US)
`
`(73) Assignee: Synopsys, Inc., Mountain View, CA
`(US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1031 days.
`
`Deza et al.; "Global Routing in VLSI Design: Algorithms, Theory,
`and Computational Practice"; Oct. 6, 2008; McMaster University;
`pp. 1-25.*
`Muller et al.; "Optimizing Yield in Global Routing"; Nov. 5, 2009;
`University of Bonn; pp. 1-7.*
`Muhanunet Mustafa Ozdal, et al., "Archer: A History Driven Global
`Routing Algorithm", 2007 International Conference on Computer
`Aided Design (ICAD), Nov. 4-8, 2007, pp. 488-495.
`Xhaoyun Xing, et al., "A Parallel Algorithm for Timing-driven Glo(cid:173)
`bal Routing for Standard Cells", International Conference on Parallel
`Processing, Aug. 10-14, 1998, pp. 54-61.
`
`(21) Appl. No.: 12/156,963
`
`(22) Filed:
`
`Jun.5,2008
`
`(51)
`
`Int. Cl.
`G06F 17150
`(2006.01)
`(52) U.S. Cl. ....................................................... 716/129
`(58) Field of Classification Search ................... 716/129
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`6,269,469 Bl *
`7/2001 Pavisic et al .................. 716/129
`6,289,495 Bl *
`9/2001 Raspopovic et al.
`......... 716/129
`4/2006 He et al.
`7,036,101 B2
`7,574,686 Bl*
`8/2009 Wadland et al. .............. 716/126
`2002/0120912 Al*
`8/2002 He et al ........................... 716/12
`2006/0190897 Al
`8/2006 He et al.
`2007/0028201 Al *
`2/2007 Mehrotra et al ................ 716/12
`2009/0070726 Al *
`3/2009 Mehrotra et al ................ 716/13
`
`* cited by examiner
`
`Primary Examiner - Suresh Memula
`(74) Attorney, Agent, or Firm - Blakely Sokoloff Taylor &
`ZafmanLLP
`
`ABSTRACT
`(57)
`A method is described for routing a semiconductor chip's
`global nets. The method includes identifying a subset of the
`global nets and routing the subset of global nets using mul(cid:173)
`tiple threads, where, each of the global nets within the subset
`are routed by one of the threads in isolation of the subset's
`other global nets. The method further includes identifying a
`second subset of the global nets and routing the second subset
`of global nets using the multiple threads, where, each of the
`global nets within the second subset are routed by one of the
`threads in isolation of the second subset's other global nets
`but in respect of the routes of first subset of global nets.
`
`20 Claims, 6 Drawing Sheets
`
`101
`.... _;
`
`SORT
`NETS
`
`103
`
`104
`
`106
`--.... . ./
`
`Page 1 of 12
`
`
`
`U.S. Patent
`
`Jul. 31, 2012
`
`Sheet 1 of 6
`
`US 8,234,614 Bl
`
`SORT
`NETS
`
`RANKED LIST OF GLOBAL NETS
`,----_.____ __ _,
`102
`DETERMINE
`WINDOW
`SIZE
`
`SELECT NEXT SET
`OF NETS WITHIN
`WINDOW
`
`CONCURRENTLY
`ROUTE NETS
`WITHIN WINDOW
`
`104
`'-. ./
`
`CONFLICTS?
`
`105
`
`NO EXIT IF
`ALL GLOBAL
`NETS ARE
`ROUTED)
`
`REMOVE NETS UNTIL
`NO CONFLICTS EXIST
`
`ADD REMOVED NETS
`TO LIST OF NETS
`
`106
`"--"
`
`107
`----._/
`
`YES
`
`TO
`MANY
`CONFLICTS?
`
`NO --><
`
`NO
`
`109
`
`TOO
`FEW
`CONFLICTS?
`
`FIG. 1
`
`Page 2 of 12
`
`
`
`U.S. Patent
`
`Jul. 31, 2012
`
`Sheet 2 of 6
`
`US 8,234,614 Bl
`
`201
`\
`IDENTIFY POWER/GROUND
`NETS
`
`202
`\
`
`i
`
`IDENTIFY CLOCK
`NETS
`
`i
`
`203
`\
`IDENTIFY TIMING/SLEW
`CRITICAL NETS
`
`i
`
`20\
`IDENTIFY SHORTER/LOWER
`FAN OUT NETS BEFORE
`LONGER/HIGHER FAN OUT
`NETS
`
`i
`
`205
`\
`GEOGRAPHICALLY DISPERSE
`NETS
`
`POWER/CLOCK
`NETS
`
`POWER/
`GROUND
`
`CLOCK
`
`POWER/
`GROUND
`
`CLOCK
`
`TIMING/
`SLEW CRITICAL
`
`ms?l■I
`,oo?l■ilil
`,~1■1ii1■1
`
`SHORTEN
`LOWER FAN
`OUT NETS LONGER
`TIMING/ ~ HIGHER
`CLOCK SLEW CRITICAL FAN OUT NETS
`NETS
`NETS
`)
`
`POWER/
`GROUND
`NETS
`
`20671■1■1■1111111111111
`
`FIG.2
`
`Page 3 of 12
`
`
`
`U.S. Patent
`
`Jul. 31, 2012
`
`Sheet 3 of 6
`
`US 8,234,614 Bl
`
`307A~
`
`308 _..,.
`309 _..,.
`
`311
`
`310
`
`1
`
`2
`
`3
`
`N+1 N+2 N+3
`
`...
`
`2N+1
`
`. . .
`(N-1)N ...
`
`...
`...
`
`N
`
`N+N
`
`4
`
`N+4
`
`E.G.,
`REGION#: REGION_JD
`
`. . .
`
`N2
`
`307C
`
`~
`
`NET_ID
`
`REGION_ID
`
`NET.JD
`
`REGION.JD
`
`8
`
`17
`
`52
`
`78
`
`89
`
`95
`
`97
`
`99
`
`102
`
`1
`
`2
`
`5
`
`5
`
`6
`
`4
`
`3
`
`8
`
`7
`
`FIG. 3
`
`8
`
`17
`
`52
`
`89
`
`78
`
`95
`
`97
`
`99
`
`102
`
`1
`
`2
`
`5
`
`6
`
`5
`
`4
`
`3
`
`8
`
`7
`
`Page 4 of 12
`
`
`
`U.S. Patent
`
`Jul. 31, 2012
`
`Sheet 4 of 6
`
`US 8,234,614 Bl
`
`, ,QUEUE
`_,
`401
`
`WINDOW OF
`NETS
`403
`
`•
`•
`•
`NET ID= 89
`NET ID"" 52
`NET ID= 17
`NET ID =8
`
`•I'
`
`DISPATCH
`PROCESS
`
`I
`
`I
`
`/
`
`/
`
`/
`
`I
`
`I
`
`I
`
`" I
`
`I
`I
`I
`I
`I
`I
`I
`
`+
`
`' ' ' ' '
`. . .
`
`' ' '
`
`....
`
`' ' '4
`
`THREAD_1
`
`THREAD_2
`
`THREAD_N
`
`(
`402_1
`
`(
`402_2
`
`(
`402_N
`
`FIG. 4
`
`Page 5 of 12
`
`
`
`U.S. Patent
`U.S. Patent
`
`Jul. 31, 2012
`1021.,3mJ
`
`t
`h
`Sheet 5 of 6
`M
`
`US 8,234,614 Bl
`1B41694
`SU
`
`A
`
`501
`
`B
`
`--
`. ~--.. ,.,
`
`I
`
`I',
`
`r,
`
`I\
`
`\
`
`I
`\
`
`I
`I
`
`502
`
`503
`
`I
`I
`I
`
`'\
`
`
`
`20mo
`sllllllllllll000G
`HJIIIHIIIIIIFIllll
`
`%IIIIIIIIIIII
`
`0
`
`B45
`
`504
`
`505
`~
`
`60
`5®5
`
`~ 500
`0
`
`III-IIIIIIII5
`0®
`
`0000
`
`FIG. 5
`I
`
`Page 6 of 12
`
`Page 6 of 12
`
`
`
`
`
`U.S. Patent
`
`Jul. 31, 2012
`
`Sheet 6 of 6
`
`US 8,234,614 Bl
`
`CACHE
`filM
`Al
`
`PROCESSOR(S)
`filtl
`
`'I
`
`r
`
`MCH
`602
`
`J~
`
`·•
`
`SYSTEM
`- . MEMORY
`- .
`filU
`
`DISPLAY
`fillI
`
`GRAPHICS
`- - PROCESSOR - -
`--
`.
`§00
`
`~ ~
`
`ICH
`QQQ
`
`608N
`
`FIG. 6
`
`I l a ... \
`
`1r
`
`6082
`
`6081
`
`Page 7 of 12
`
`
`
`1
`MULTI-THREADED GLOBAL ROUTING
`
`2
`DETAILED DESCRIPTION
`
`US 8,234,614 Bl
`
`FIELD OF INVENTION
`
`The field ofinvention relates generally to the field of semi(cid:173)
`conductor chip design and more particularly to the routing of
`global nets.
`
`BACKGROUND
`
`The present discussion discloses an improved approach
`that uses multi-threading to concurrently route a plurality of
`5 global nets. As will become more clear in the description that
`follows, in order to effect an environment where multiple
`threads can operate in isolation of one another, a subset of the
`total global net pool (the size of which is referred to as a
`"window") is selected for concurrent multi-threaded routing.
`10 The window effectively "slides" over the pool of the global
`nets in order to route the entire pool in a multi-threaded
`fashion. The size of the window may also self adjust to deter(cid:173)
`mine an appropriate number of nets that can be concurrently
`routed successfully. More details are provided immediately
`15 below in reference to FIG.1 which provides an overview of an
`embodiment of such a routing process.
`According to the methodology of FIG. 1, initially, a semi(cid:173)
`conductor chip's global nets are identified and imported from
`a design file and then sorted 101 according to a strategy that
`20 ranks the nets according to a priority scheme. The list of nets
`may therefore be referred to as a sorted or ranked list of nets.
`A window size is also determined 102. The window size is a
`specific number of nets. As observed in FIG. 1, a number of
`nets equal to the window size is selected 103 from the "top of'
`the ranked net list and concurrently routed 104 by a multi(cid:173)
`threaded processing unit.
`After the window of nets is concurrently routed the con-
`stituent nets are checked for conflicts 105. As described in
`more detail below, a conflict exists if two or more routed nets
`occupy a same region of the semiconductor chip. Here, in an
`embodiment, individual nets within a window are routed in
`isolation from one another. That is, within a window of nets
`that are concurrently routed, a net's routing is undertaken
`without reference to the routing of other nets within the same
`window. As such, there exists some probability that two nets
`from the same window will be "in conflict". That is, they
`impermissibly occupy the same chip surface region.
`If such conflicts are detected 105, one or more nets are
`strategically eliminated such that the conflicts are eliminated
`106. The nets whose routes were eliminated 106 are then
`placed "back on top" of the ranked list 107 for, in one embodi-
`ment, immediate selection within the next window of concur(cid:173)
`rently routed nets. Depending on the number of conflicts that
`are detected 108, the window size may be adjusted downward
`45 102 to reduce the number of conflicts expected from the next
`window. Likewise, according to one optional approach, if too
`few conflicts are detected 109 (e.g., if no conflicts are
`detected), the window size may be expanded 102 to include
`more nets for the next window. In another alternate embodi-
`ment, a globally routed net could have a section of it that is
`in-conflict re-routed around the tile with the conflict.
`FIG. 2 shows an exemplary embodiment of a methodology
`for sorting nets 101. According to the sorting embodiment of
`FIG. 2, first, power or ground nets are selected 201. That is,
`power/ground nets are identified as the highest priority nets to
`be routed before other nets. Generally speaking, electronic
`circuits desire "clean" power supply and ground voltage rails.
`Too much resistance along a power supply node or ground
`node promotes the formation of undesired voltage level dif-
`60 ferences along the node. By choosing to route power/ground
`nets first, "shorter" power/ground route lengths will be
`designed into the semiconductor chip which, in turn, should
`lower the IR drop along these nodes.
`After the power/ground nets are identified, clock nets are
`65 identified 202. The operation of a semiconductor chip can
`often be viewed as a number of carefully synchronized sub(cid:173)
`processes. Clock signals are used to control the timing of
`
`Present day semiconductor chips can have tens of millions
`of individual wires. Sophisticated routing software programs
`are used to automatically route these wires. However, owing
`to the sheer numberof wires and the complexity of the wiring,
`a considerable amount of time may be expended even though
`the routing process is automated ( e.g., 24 hours or more).
`Parallel processing techniques such as multi-threading
`have been attempted in order to reduce the routing time. The
`term "thread" is often used to refer to a sequential stream of
`program code instructions and/or the processing resources
`used to process such a stream. Multi-threading is the concur(cid:173)
`rent execution of more than one thread. For instance, if a
`computing system has two single threaded CPUs, the com(cid:173)
`puting system can be configured as a multi-threaded machine 25
`that concurrently executes two separate threads.
`In order to enhance the processing performance of a multi(cid:173)
`threaded machine, threads should be able to perform their
`respective tasks without being dependent on one another.
`Said another way, each thread should be able to execute in 30
`isolation of the otherthread(s ). In the context of semiconduc-
`tor chip routing, multi-threading can be easily applied to
`"detail" routing but not "global" routing.
`If the surface area of a semiconductor chip is viewed as a
`grid of smaller surfaces or "tiles", multi-threaded routing is 35
`easily applied to the localized wires within tiles (detail rout(cid:173)
`ing) but not to global wires that extend across multiple tiles
`(global routing). With respect to detail routing, application of
`one thread to one tile and another thread to another tile cor-
`responds to an environment where the two threads can aper- 40
`ate in isolation of one another because the local wires of the
`first tile can not occupy the same space as the local wires of
`the second tile.
`By contrast, with respect to global routing, many global
`nets could potentially occupy the same tile(s ). As such, there
`may be interdependence between nets (because they may
`share the same tile( s)) and/or interdependence between tile(s)
`(because they may include sections of a same wiring space).
`These interdependencies impose difficulties when trying to
`impose an operating environment for multiple concurrent 50
`threads that operate in isolation of one another.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The present invention is illustrated by way of example and 55
`not limitation in the figures of the accompanying drawings, in
`which like references indicate similar elements and in which:
`FIG. 1 shows a process for routing global nets using a
`multi-threaded processing approach;
`FIG. 2 shows a process for ranking global nets;
`FIG. 3 shows a process for geographically dispersing glo(cid:173)
`bal nets;
`FIG. 4 shows an architecture for routing global nets using
`a multi-threaded processing approach;
`FIG. 5 shows a process for removing conflicts from a group
`of routed nets;
`FIG. 6 shows a computing system.
`
`Page 8 of 12
`
`
`
`US 8,234,614 Bl
`
`3
`these processes. For instance, a clock signal may be used to
`trigger when a sub process starts or stops, and/or, how fast the
`sub-process operates. A clock net is a wire that transports a
`clock signal. Generally, similar to power/ground nodes, a
`"clean" clock signal corresponds to precise synchronization/
`timing and/or fast execution of processes within the semicon(cid:173)
`ductor chip during its operation. As such, routing the clock
`signal nets directly after the power/ground signal nets pro(cid:173)
`motes the formation of clean clock signals within the semi(cid:173)
`conductor chip. This, in turn, promotes the speed and/or pre(cid:173)
`cision at which the chip's sub-processes will operate.
`After the clock nets are identified, timing/slew critical nets
`are identified 203. The motivation for prioritizing timing/slew
`critical nets is similar to that of clock nets. The performance 15
`of a semiconductor chip is often judged by "how fast" the chip
`operates. As discussed above, clock signals may control the
`speed of the chip's internal processes and, often, a number of
`clock signals are high frequency signals (some of which may
`be the highest frequency signals used by the chip). Other 20
`signals may also exist on the chip that are not clock signals but
`nevertheless should be "fast". Such nets are referred to as
`timing/slew critical nets. The speed of a signal can be mea(cid:173)
`sured, for instance, by the rise times and/or fall times of its
`logic levels and/or its propagation delay from one end of the
`net to another end of the net. Because various factors-such
`as an extensive wire length----can adversely affect the quality
`of a timing/slew critical signal, timing/slew critical nets are
`prioritized after the clock signal nets.
`After, the power/ground, clock and timing/slew critical
`nets are identified 201/202/203, the remainder of the chip's
`nets are identified 204. According to the specific embodiment
`of FIG. 2, the remaining nets are further sorted such that
`shorter and lower fan-out nets are routed before longer and
`higher fan-out nets ( on the theory that shorter and lower
`fan-out nets are more difficult to re-route if involved in a
`conflict). Here, to accomplish this sorting, a factor is calcu(cid:173)
`lated that weighs both a net's length and fan-out. For
`example, a first number is used to gauge a net's length and a
`second number is used to gauge the net's fan-out. The product
`or summation of the two numbers determines the net's factor.
`The nets can then be sorted based on the factor.
`Insets 206a thru 206d depict the construction of a priori(cid:173)
`tized list according to the criteria described just above. That
`is, inset 206a depicts the identification of the power/ground
`nets, inset 206b depicts the identification of the clock nets
`after the power/ground nets in the priority/ranking scheme,
`inset 206c depicts the identification of the timing/slew critical
`nets after the clock nets and inset 206d depicts the identifica- 50
`tion of the shorter/lower fan-out nets after the timing/slew
`critical nets. It is important to point out that, although a
`physical list of such nets ordered by priority as suggested by
`inset 206a-d can be implemented-there is no requirement to
`do so.
`That is, for instance, the ranking can be recorded in a net
`list that physically lists nets by ID number (e.g., by simply
`listing for each net which category it belongs to (power/
`ground, clock, etc.)). In this case, the sorting/ranking infor(cid:173)
`mation is embedded in the net list-but the netlist itself is not
`physically "re-arranged" according to the sorting/ranking.
`Here, for purposes of the present application the term "list"
`simply means any data formulation or structure from which
`the ranking can be deemed. This would include both a physi(cid:173)
`cally rearranged list as well as a standard net list with embed(cid:173)
`ded ranking information. It also pertinent to point that other
`ranking schemes can be employed ( e.g., that prioritize differ-
`
`4
`ently and/or use different parameters as a basis for ranking
`such as timing slack, etc.). The ranking scheme of FIG. 1 is
`just one possible approach.
`Referring back to FIG. 1, once the ranked list of nets is
`5 created 101, the window size is determined 102. In an
`embodiment, the window size is based on the number of
`threads within the processing unit that will concurrently route
`the nets. For instance, the window size may be set equal to the
`number of threads multiplied by some factor. Thus, if the
`10 factor is 10 and there are four threads in the processing unit,
`the window size is set to 40 nets.
`A number of nets equal to the window size is then selected,
`in order, from the ranked list 103. According to one embodi(cid:173)
`ment, observed in FIG. 3, a pre-processing "tweak" is applied
`to the ranked nets within the window that geographically
`diversifies the nets across the chip surface them from list entry
`to list entry. Geographic diversification of neighboring nets
`within the window helps to avoid routing conflicts that might
`otherwise result from the concurrent routing process 104.
`Insets 307a through 307c of FIG. 3 illustrate an embodiment
`of the diversification process in more detail.
`Inset 307a shows the semiconductor chip being divided
`into an NxN grid of N2 regions. In an embodiment, each
`region contains multiple "tiles" (i.e., a semiconductor chip
`25 contains more tiles than regions). Inset 307b shows a portion
`of a selected window's ranked list that includes both an iden(cid:173)
`tification of the net (NET_ID) and the identification of a
`region (REGION_ID) in which an all endpoint terminals or
`"pins" of the net (which typically corresponds to a location of
`30 a contact to a transistor that is coupled to the net) is located. In
`a further embodiment, nets that run within more than one
`region are given a special REGION_ID value (such as
`REGION_ID=0).
`Starting from the top (highest priority) of the window's
`35 ranked list 307b and scanning the entries sequentially, if
`neighboring entries are found to have the same REGION_ID,
`such as entries 308 and 309, a look-ahead scan is performed
`over the immediately following entries 310 for the first net
`311 that has a different REGION_ID. Entry 311 is then
`40 swapped with entry 309 as observed in inset 307c. By geo(cid:173)
`graphically diversifying neighboring nets within the window
`the probability is reduced that two nets that touch the same
`region will conflict with one another during the concurrent
`routing process 104. Alternatively, techniques other than pure
`45 swapping may be employed to accomplish geographic diver(cid:173)
`sification of the nets. For instance, if two neighboring global
`nets have the same REGION_ID, one of them may be pushed
`deeper into the ranked list causing those above it to be pushed
`upward one location.
`The nets within the window are then concurrently routed
`104. FIG. 4 shows an architectural depiction of the concurrent
`routing process. According to the depiction of FIG. 4 the
`window of nets 403 can be viewed as being located within a
`queue 401 in order. That is, the highest priority nets within the
`55 window are at the bottom of the queue 401 and the lowest
`priority nets within the window are at the top of the queue 401
`so that higher priority nets issue from the queue 401 before
`lower priority nets. Note that in certain embodiments nets
`within a same net category, such as "power/ground", may be
`60 viewed as being equal priority. Nevertheless, at least in cases
`where a category boundary exists within the window, nets
`from the higher priority category will be issued from the
`queue before the nets from the lower priority category. For
`instance, if the ranking scheme of FIG. 2 is employed, power/
`65 ground nets will issue from the queue before clock nets.
`Nets issue from the queue 401 and are dispatched to threads
`402_1 through 402_N for routing. The general depiction of
`
`Page 9 of 12
`
`
`
`US 8,234,614 Bl
`
`5
`FIG. 4 indicates the existence ofN threads. Hence, initially, N
`nets will issue to the N threads----one net per thread. In an
`embodiment, each thread routes its respective net in isolation
`from the other nets within the net's window but not previously
`routed windows. For instance, as a simple example, assume a
`first window of nets is concurrently routed. Then, the next(cid:173)
`second-window of nets is selected from the ranked net list.
`The nets within the second window are routed so as not to
`conflict with the routed nets from the first window. However,
`in an embodiment, the nets within the second window are
`routed in isolation of one another. As such, no conflicts should
`exist between nets within the first window and nets within the
`second window. However, conflicts may exist between nets
`within the second window.
`The different threads may complete the routing of their
`respective nets at different times (e.g., a less complex route
`may take less time to complete than a more complex route).
`As such, nets are "dispatched" from the queue 401 to the
`threads as the threads become available in lieu of the amount
`of time it takes the threads to complete their respective routes.
`For instance, in a system with four threads, the first four nets
`in the queue 401 will initially issue across the first four
`threads. The four threads then concurrently route the four
`nets. If the third thread finishes its route before the others, the
`fifth net from the queue 401 will issue to the third thread
`which concurrently routes it with the other three nets still be
`processed. If the second thread is next to finish, the sixth net
`from the queue 401 will issue to the second thread. Thus, nets
`issue to threads as the threads become available. Which nets
`are currently being routed in parallel therefore depends on the 30
`specific sequence at which the threads finish routing their
`respective nets.
`Eventually all the nets within the window will be routed.
`Referring back to FIG. 1, the results are checked for conflicts
`105. FIG. 5 provides a simple example that demonstrates how 35
`conflicts may be detected 105 and resolved 106, 107. FIG. 5
`assumes a simple example where the window size is only 5.
`As such when the concurrent routing of the window is com(cid:173)
`plete there are 5 routed nets. Inset 500 shows an exemplary
`depiction of the 5 routed nets A, B, C, D and E mapped against 40
`tiles of the semiconductor chip.
`Here, net C is observed to occupy surface area capacity
`within tiles whose surface is also occupied by other nets.
`Specifically, net C occupies: 1) tile 501 with net D; 2) tile 502
`with net E; and, 3) tile 503 with net A. According to one 45
`embodiment, if a wire shares a tile with another wire, a
`conflict is flagged between the two wires. Thus, in the exem(cid:173)
`plary depiction 500 of FIG. 5, there exist three conflicts----one
`conflict in each of tiles 501, 502 and 503. According to a
`further embodiment, a graph theory approach is utilized to 50
`strategically remove the conflicts so that particularly trouble(cid:173)
`some routes are discarded ( and their corresponding nets iden(cid:173)
`tified for selection in the next window). By discarding the
`troublesome nets, most of the effort undertaken concurrently
`routing an entire window of nets is preserved. That is, an 55
`attempt is made to preserve as many conflict free routes as
`possible.
`Insets 504 and 505 show a simplistic demonstration of a
`graph theory approach used to eliminate the conflict situation
`of inset 500. Inset 504 shows a graph constructed from the 60
`conflict situation of inset 500. A node is represented for each
`routed net. Thus, inset 504 shows separate nodes for each of
`nets A, B, C, D and E. Each conflict is represented by an
`"edge" or line that connects two routes in conflict. Thus,
`recalling that net C is deemed to be in conflict with each of 65
`nets A, D and E, note that inset 504 shows three lines that
`respectively connect node C with nodes A, D and E. Accord-
`
`6
`ing to the graph theory algorithm, nodes are ranked by the
`number of edges connected to them. Here, node C is ranked
`above the other three nodes A, B, D and E because node Chas
`three edges. Nodes A, D and E are next ranked equally
`5 because each has one edge. Node B is ranked at the bottom
`because it does not contain any edges.
`Once the nodes are ranked by edge count, the node with the
`highest number of edges is eliminated. When a node is elimi(cid:173)
`nated, each of its edges are eliminated. Inset 504 shows the
`10 result when node C is eliminated. Note that all edges have
`been eliminated-therefore all conflicts between routes have
`been eliminated. Thus, the graph theory approach of insets
`504, 505 demonstrate that, with respect to the situation of
`inset 500, elimination of route C results in all the remaining
`15 routes A, B, D and E being preserved without any conflicts
`among them. Thus, in this simple example, referring back to
`FIG. 1, execution of processes 105 and 106 results in the
`routes for nets A, B, D and E being preserved from the
`previous, concurrent window routing 104 cycle. The route for
`20 net C is eliminated resulting in net C being added to the top of
`the ranked list for selection in the next window 103.
`Note that the example of FIG. 5 simplistically assumes that
`multiple routes cannot exist within the same tile. In fact, in
`one embodiment, multiple nets within a same tile is permit-
`25 ted. Here, each tile is modeled so as to contain a "bucket" of
`track resources. Each time a net is routed through a tile a unit
`of track resources is subtracted from the bucket amount. For
`instance, ifa tile has an initial bucket ofl O track resources and
`a net is routed through the tile, the tile is flagged as having 9
`remaining track resources. If another net is routed through the
`tile the remaining track resources for the tile is set equal to 8,
`etc.
`According to this approach, when constructing the conflict
`edges in the graph, there exist "permissible" conflicts and
`"impermissible" conflicts. That is, when the resource con(cid:173)
`sumption of a permissible conflict is accounted for, the tile's
`track resources are not below O (i.e., the tile has enough
`resources to entertain the route). By contrast, when an imper(cid:173)
`missible conflict is accounted for, the remaining track
`resources are below O (i.e., the tile does not have enough
`resources to entertain the route). As such, when ranking the
`nodes in the graph for elimination, nodes having more imper(cid:173)
`missible conflicts are ranked above nodes having less imper(cid:173)
`missible conflicts. When only permissible conflicts remain in
`the graph, the node elimination process is complete. Alterna(cid:173)
`tively, only impermissible conflicts are recorded as edges in
`the graph and the process of node elimination is the same as
`originally described with respect to insets 504, 505 of FIG. 5.
`That is, nodes having more conflicts are ranked above nodes
`having less conflicts. Nodes are eliminated in sequence
`according to this ranking until all conflicts are eliminated
`from the graph.
`It is pertinent to point out that although the immediately
`preceding discussion focused on a tile as being the semicon(cid:173)
`ductor region where multiple routes trigger a conflict, con(cid:173)
`ceivably, other regions ( e.g., smaller or larger than a tile) may
`also be used.
`Referring again to FIG. 1, once the conflicts have been
`eliminated from the window of nets that were concurrently
`routed, an inquiry is made to see if the concurrent routing
`cycle produced too many conflicts 108. If so, the window size
`is reduced 102 before the next window of nets is selected 103.
`In this manner the routing process adjusts to a more efficient
`window size. That is, the time savings associated with con(cid:173)
`current routing of multiple nets is offset each time a conflict is
`detected because additional time is spent re-routing the net
`during the concurrent routing of the next window. Ideally, no
`
`Page 10 of 12
`
`
`
`US 8,234,614 Bl
`
`5
`
`10
`
`7
`conflicts are detected and therefore no additional time is
`wasted re-routing nets. Different schemes may be used to
`determine whether or not too many conflicts exist. According
`to a number of possible of embodiments, however, the for(cid:173)
`mula that is used to calculate the threshold of conflicts
`deemed "too many" includes the number of threads as an
`input parameter. In the simplest case, the threshold is set equal
`to the number of threads.
`The algorithm of FIG. 1 adapts the window size 108, 102 in
`an attempt to minimize the number of conflicts that will result
`when a window of nets is concurrently routed. Speaking
`loosely, the algorithm should settle on small window sizes for
`difficultroutingjobs, large window sizes for easy routingjobs
`and medium window sizes for moderately difficult routing
`jobs. Any number of window reduction size algorithms can be 15
`employed to reduce the window size. According to one
`approach, the window size is initially set to a factor of the
`number of threads within the system (e.g., initial window
`size=l0(# threads)) and the window size is reduced by
`lowering the factor ( e.g., the first downward window resizing
`results in a window size of 9(# threads), the second
`downward window resizing results in a window size of 8(#
`threads) ... ).
`FIG.1 also shows an optional process whereby the window
`size may be increased if the concurrent routing of a window
`does not produce any conflicts 109, 102. Here, some form of
`intelligence may be built into the window reduction/expan(cid:173)
`sion process 108, 109, 102 such that an attempt is made to
`settle the window size within a range above which too many
`conflicts are expected and below which not enough nets are
`being concurrently routed.
`Processes taught by the discussion above may be per(cid:173)
`formed with program code such as machine-executable
`instructions that cause a machine that executes these instruc(cid:173)
`tions to perform certain functions. In this context, a
`"machine" may be a machine that converts intermediate form
`( or "abstract") instructions into processor specific instruc(cid:173)
`tions ( e.g., an abstract execution environment such as a "vir(cid:173)
`tual machine" ( e.g., a Java Virtual Machine), an interpreter, a
`Common Language Runtime, a high-level language virtual
`machine, etc.)), and/or, electronic circuitry disposed on a
`semiconductor chip ( e.g., "logic circuitry" implemented with
`transistors) designed to execute instructions such as a gen(cid:173)
`eral-purpose processor and/or a special-purpose processor.
`Processes taught by the discussion above may also be per(cid:173)
`formed by (in the alternative to a machine or in combination
`with a machine) electronic circuitry designed to perform the
`processes ( or a portion thereof) without the execution of
`program code.
`An article of manufacture may be used to store program
`code. An article of manufacture that stores program code may
`be embodied as, but is not limited to, one or more memories
`(e.g., one or more flash memories, random access memories
`(static, dynamic or other)), optical disks, CD-ROMs, DVD
`ROMs, EPROMs, EEPROMs, magnetic or optical cards or
`other type of machine-readable media suitable for storing 55
`electronic instructions. Program code may also be down(cid:173)
`loaded from a remote computer ( e.g., a server) to a requesting
`computer ( e.g., a client) by way of data signals embodied in a
`propagation medium (e.g., via a communication link (e.g., a
`network connection)).
`FIG. 6 shows an embodiment of a computing system ( e.g., 60
`a computer).