Fault Diagnosis of Plasma Etch Equipment
Anna M. Ison, Wei Li and Costas J. Spanos
Department of EECS, University of California, Berkeley, CA 94720-1772
office: (510) 642-9584, fax: (510) 642-2739, email: annaison@eecs.berkeley.edu,
WWW: http://bcam.eecs.berkeley.edu
`
Abstract- The development and implementation of robust methods for fault detection promises to enhance manufacturing by improving our capability to monitor equipment and processes. In order to fully utilize this capability, it is important that the machine fault is not only detected but also diagnosed as belonging to a fault category so that appropriate corrective action can promptly be taken. In this paper we examine the diagnostic performance of two probabilistic modeling techniques in using sensor signals to classify faults. We also discuss how the strengths of these models may be combined in a hierarchical architecture giving rise to a more powerful diagnostic tool.
`
`INTRODUCTION
To guarantee continued success in the industry, semiconductor manufacturers compete in product differentiation and development, specifically by taking advantage of decreasing circuit geometries and through tighter specifications. However, accommodating larger wafer sizes and meeting more stringent design demands necessitates accurate and robust characterization of the manufacturing process as well as reliable prediction and control of its effects on the final wafer product. Process improvements translate directly to increased efficiency, decreased machine downtime, and savings, in that early detection and diagnosis of potential problems can prevent long runs of misprocessed product. Furthermore, being economically competitive involves increasing throughput, maintaining high yield, lowering the cost of machine ownership and speeding up the process development cycle.
Achieving these goals set by the industry requires better process characterization and control. This motivates the development of a manufacturing tool to monitor equipment and diagnose problems or abnormalities in machine behavior. Our approach is to use probabilistic models to characterize variability in the process and to identify modes of operation or machine states.
Although the techniques presented are general, we have chosen the plasma etch process as a test vehicle for the methodology. Much attention has focused on plasma etching because it is considered a critical manufacturing process and yield limiter. However, due to its complexity, the process is not easily represented by physically based models. Furthermore, in the current production situation, data gathering capabilities are surpassing the development of useful analytical tools. A system is needed to automatically extract inferences from the various data sources in a timely fashion, so that appropriate action can be taken. Ironically, although the total volume of data is significant, relevant representative data properly annotated with machine log information is still a rare commodity, and hence the situation lends itself well to statistical methods to draw inferences from a sample representative of a larger population.

0-7803-3752-2/97/$10 ©1997 IEEE

In this paper we examine two techniques for diagnosing faults in machine sensor data. Tree-based models are shown to be an effective method of identifying sensor signals most sensitive to changes in the input settings of the machine. This method is compared with the performance of generalized linear models built to predict levels of the input settings based on sensor signals. Finally, we discuss how the strengths of the two methods can be combined to enhance diagnostic performance.
For practical reasons, the system is constructed using data collected easily and economically from the machine without interrupting the process. A designed experiment was conducted on a Lam Rainbow 4400 plasma etcher in the U.C. Berkeley Microfabrication Laboratory, providing the multivariate real-time tool data used to build the models in this work.
`
MONITORING AND FAULT DETECTION
Through the use of real-time tool signals, in particular SECS-II machine information collected in situ, we can effectively monitor the machine state without interrupting the process. Using data collected as part of the regular production process, we have built time-series models to monitor machine behavior on different time scales, namely on a lot-to-lot, wafer-to-wafer, and real-time basis, and have used SPC techniques to detect faults [1][2]. The detection of an out-of-control condition by the fault detection mechanism indicates the possible presence of a fault. In order to confirm the hypothesis that a fault has occurred and to identify an assignable cause, a diagnostic system in a probabilistic framework is developed which will classify faults into discrete categories.
In itself, the system serves as a tool to assist engineers in identifying problems affecting machine performance which could result in damage to the product, and as an early warning system to aid in scheduling preventative maintenance events, potentially reducing machine downtime. However, in the larger framework, this classification capability identifies modes of equipment operation, allowing us to utilize the appropriate blend of models best suited for that operation mode. Accordingly, wafer characteristics can be predicted based on a better estimate of the machine state.
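As a rough illustration of how an out-of-control condition might be flagged, the sketch below applies plain Shewhart-style limits (mean plus or minus three standard deviations) to a single signal. This is a simplification and not the time-series SPC scheme of [1][2]; the function names and the wafer-to-wafer values are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits estimated from in-control data."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread

def out_of_control(values, limits):
    """Indices of observations that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical wafer-to-wafer averages of one tool signal (arbitrary units).
baseline = [100.2, 99.8, 100.1, 99.9, 100.0, 100.3, 99.7, 100.0]
new_wafers = [100.1, 99.9, 101.5, 100.0]

limits = control_limits(baseline)
alarms = out_of_control(new_wafers, limits)
print(alarms)  # indices of wafers that would be passed on to diagnosis
```

An alarm only indicates the possible presence of a fault; classifying it into a category is the job of the diagnostic models discussed in the rest of the paper.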
`
EXPERIMENTAL SETUP
Data Description and Experimental Design
The monitored signals used in this work are those suspected to be most sensitive to changes in the chamber state of the etcher [2]. These signals are known as real-time tool signals and are collected while wafers are being processed at a rate
`
`B-49
`
`Authorized licensed use limited to: Katie Hibert. Downloaded on July 28,2021 at 17:38:04 UTC from IEEE Xplore. Restrictions apply.
`
`Applied Materials, Inc. Ex. 1020
`Applied v. Ocean, IPR Patent No. 6,836,691
`Page 1 of 4
`
`

`

of 1 Hz. The changes we wish to detect and classify in this paper correspond to specific shifts in the input settings of the machine. The assumption is that abnormal machine behavior
`will manifest itself in a mann
`change in the input settings
`which are varied over three
`composite design; this is
`
`range of different faulty
`
`period for each of the 3
`
datasets. The algorithm implementing the construction of tree-based models must determine the variables on which to divide and how to split the space into partitions. It does this by partitioning the space of the predictor variables x into homogeneous regions, attempting to make the conditional distribution of the response y given x, f(y|x), independent of x within each region. The algorithm accomplishes this task by using a criterion based on a measure of deviance.
Classification trees are based on the multinomial distribution. If we consider a vector, for example y = (0,1,0), to represent the response y belonging to the second of three factor levels, then the probability corresponding to a response falling into each level would be given by $\mu = (p_1, p_2, p_3)$, with the constraint $\sum_i p_i = 1$, $i = 1, 2, 3$.
The model consists of a stochastic component given by
$y_i \sim M(\mu_i), \quad i = 1, 2, \ldots, N$
`Table 1. Input settings for the plasma etcher
`Signal Selection
In this study, the probability of a high, medium, or low value
`for an input setting to a plasma etcher is
`using
`ber as
`real-time tool signals collected from the p
predictors. To determine a preliminary set of predictor variables to be used for modeling, boxplots are used to view the distributions of the real-time tool signals as a function of each input setting. Table 2 summarizes the real-time signals identified as potential predictors for the factor responses. These signals reflect changes in the machine state which are in turn affected by changes in the input settings.
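The boxplot screening step can be approximated numerically by computing the five-number summary of a signal separately for each level of an input setting. The sketch below is only an illustration: the readings are invented, the helper names are hypothetical, and pairing RFTune with the gas ratio setting is an assumption made for the example.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, quartiles, and maximum: the quantities a boxplot displays."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def summarize_by_level(signal, levels):
    """Group signal readings by input-setting level and summarize each group."""
    groups = {}
    for value, level in zip(signal, levels):
        groups.setdefault(level, []).append(value)
    return {level: five_number_summary(vals) for level, vals in groups.items()}

# Hypothetical RFTune readings, three runs at each gas ratio level.
rf_tune = [11400, 11450, 11500, 11620, 11650, 11700, 11820, 11850, 11900]
gas_ratio_level = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

summary = summarize_by_level(rf_tune, gas_ratio_level)
for level, stats in summary.items():
    print(level, stats)
```

A signal whose per-level boxes barely overlap, as in this made-up data, would be kept as a candidate predictor for that input setting.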
`
and a structural component
$\mu_i = \tau(x_i)$
The deviance is defined as minus twice the log likelihood,
$$D(\mu_i; y_i) = -2 \sum_{k=1}^{K} y_{ik} \log(p_{ik}) \qquad (1)$$
and because the splits in a decision tree are based on maximizing the change in deviance, the mechanism determining the partitions is equivalent to maximum likelihood estimation.
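A minimal sketch of deviance-based splitting on a single predictor follows. It assumes the class probabilities in a node are estimated by the observed class proportions, so that summing equation (1) over a node gives $-2 \sum_k n_k \log(n_k/n)$. The function names and toy data are hypothetical, and this is not the S-PLUS implementation used in the paper.

```python
import math
from collections import Counter

def node_deviance(labels):
    """Multinomial deviance of a node: -2 * sum_k n_k * log(n_k / n)."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())

def best_split(x, labels):
    """Threshold on one predictor giving the largest reduction in deviance."""
    parent = node_deviance(labels)
    order = sorted(set(x))
    best = (0.0, None)
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2.0
        left = [y for xi, y in zip(x, labels) if xi < t]
        right = [y for xi, y in zip(x, labels) if xi >= t]
        gain = parent - node_deviance(left) - node_deviance(right)
        if gain > best[0]:
            best = (gain, t)
    return best

# Toy data: one signal and a two-level response.
signal = [1, 2, 3, 4, 5, 6]
response = ["L", "L", "L", "H", "H", "H"]
gain, threshold = best_split(signal, response)
print(gain, threshold)
```

A full tree builder would repeat this search over every candidate predictor and recurse on the two resulting subsets.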
Tree models are compared by how well the partition corresponds to the true decision rule for the problem. For classification trees, a count of the number of errors as a proportion of the training set provides an estimate of the misclassification rate. Similarly, a probability distribution over the classes is formed from the training set, and using a Bayes decision rule, the algorithm chooses the class with the highest probability as the prediction. Thus, the tree serves as a probability model by providing a probability distribution over each one of the classes. The reader is referred to [3] and [4] for a more
`
Input setting | Predictor signals
Gap Spacing | RFTune, RFCoil, Phase, Impedance, Volt, DCBias, EndpointC, Pressure
[remaining rows not legible in the source]

Table 2. Predictor variables for input setting responses
`The signal selection, model constructio
`-PLUS software in an S-PLUS environ-
`
`TREE-BASED MODELS
`
`Description
`Tree-based modeling is an exploratory
`be used to devise prediction rules, to select or screen vari-
`ables for prediction, and to examine complex multivariate
`
`
The data set was divided into two mutually exclusive sets by arbitrarily picking 12 runs out of the 36 to use as a validation set. Classification trees for each factor response (input setting) were then constructed from the training data of 24 runs using the preliminary set of predictors identified in Table 2.
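The hold-out scheme described above can be sketched as follows. The run identifiers, the random seed, and the helper names are assumptions for illustration; the paper states only that 12 of the 36 runs were picked arbitrarily.

```python
import random

def split_runs(run_ids, n_validation, seed=0):
    """Split runs into mutually exclusive training and validation sets."""
    rng = random.Random(seed)
    validation = set(rng.sample(run_ids, n_validation))
    training = [r for r in run_ids if r not in validation]
    return training, sorted(validation)

def misclassification_rate(predicted, actual):
    """Number of wrong predictions as a proportion of the set."""
    errors = sum(p != a for p, a in zip(predicted, actual))
    return errors / len(actual)

runs = list(range(1, 37))          # 36 runs of the designed experiment
training, validation = split_runs(runs, 12)
print(len(training), len(validation))

# Hypothetical predicted versus actual levels for four validation runs.
rate = misclassification_rate(["H", "L", "M", "H"], ["H", "L", "L", "H"])
print(rate)
```

The same rate function can be applied to the training runs to measure model fit and to the validation runs to measure generalization.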
An example of a classification tree built for the factor response gas ratio is summarized in Table 3. Although the
`predictors listed in Table 2,
`RFTune and RFCoil as the most
`lassifying the gas ratio response.
`
`runs. Following the root are
`es where a split condition is
`
`with an asterisk (*) and represent final values or predictions.
`
These nodes are also called leaves. Note that there is a distribution by class within each node giving the probability of observations belonging to each factor response level. The shaded boxes indicate the diagnosis which is based on the level with the highest probability value.
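To make the leaf-level diagnosis concrete, the sketch below forms the class distribution of a leaf from the training observations that fall into it and reports the level with the highest probability. The leaf contents are invented and the function names are hypothetical.

```python
from collections import Counter

LEVELS = ("low", "medium", "high")

def leaf_distribution(labels):
    """Proportion of training observations in the leaf at each factor level."""
    counts = Counter(labels)
    n = len(labels)
    return {level: counts[level] / n for level in LEVELS}

def diagnose(distribution):
    """Pick the level with the highest probability as the diagnosis."""
    return max(distribution, key=distribution.get)

# Hypothetical leaf holding four training runs.
leaf = ["high", "high", "high", "medium"]
dist = leaf_distribution(leaf)
print(dist, diagnose(dist))
```

Keeping the whole distribution, rather than only the winning level, is what lets the tree act as a probability model.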
`
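[Table body garbled in the source. The surviving fragments show a root node (1), nodes 2 and 3 split on RFTune, and nodes 6 and 7 split on RFCoil; nodes 2, 6, and 7 are marked with an asterisk (*) as leaves.]

Table 3. Classification tree for Gas Ratio Response.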
A partition of the predictor space for the gas ratio response is displayed in Figure 1. The partitions are based on the simplified classification trees obtained after snipping unnecessary nodes. The resulting trees for each response span a domain defined by no more than two predictor variables, thus enabling the plots of the partitioned space.
`
Figure 1. Partition for Gas Ratio Response (horizontal axis: RFTune)
`
`dation set. In general, reasonable models were obtained for
`the responses RFpower, totalflow and gap spacing, and
`these were tested by the validation set consisting of runs not
`used in building the models. However, the models for pres-
`sure and gas ratio performed rather poorly based on the val-
`idation sets.
GENERALIZED LINEAR MODELS
Description
Generalized linear models (GLM's) extend linear models to allow for nonlinearity and heterogeneous variances. For diagnosis, the factor responses can be modeled as binary response data (by grouping two factor levels together and attempting to distinguish them from the third). This is the approach taken here.
Assuming that the response y is encoded as binary data, the presence or absence of a condition, for example high pressure versus not high (medium or low) pressure, can be treated as a "success" with a value "1", or a "failure" with a value "0". This response data has a mean p, the probability of success, and a variance that depends on the mean. This leads to defining a link function relating the mean to the linear predictor, $g(p) = \beta^T x$, where the link is the logit function
$$\eta = \log\left(\frac{p}{1-p}\right)$$
so that
$$p = \frac{e^{\eta}}{1+e^{\eta}} \qquad (3)$$
and p is guaranteed to lie within the range [0,1].
The selection of the logit link is based on the binomial distribution and its corresponding log likelihood function. Thus the logistic regression model is defined by the logit link and the binomial variance function $V(p) = p(1-p)$.
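The logit link, its inverse, and the binomial variance function can be written directly as small functions. This sketch only illustrates the formulas above; the function names are hypothetical and the inverse is computed in a numerically stable form.

```python
import math

def logit(p):
    """Link function: eta = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Inverse link: p = exp(eta) / (1 + exp(eta)), always within [0, 1]."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def binomial_variance(p):
    """Variance function V(p) = p * (1 - p)."""
    return p * (1.0 - p)

for eta in (-5.0, 0.0, 5.0):
    p = inverse_logit(eta)
    print(eta, p, binomial_variance(p))
```

Whatever value the linear predictor takes, the inverse link maps it back to a valid probability, which is why the logit suits binary responses.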
Model Construction and Validation
Two sets of models were built by encoding each factor response into a binary response. The first set was based on the high level as a "success" encoded with value "1", while the medium and low levels were grouped together as a "failure" and encoded with value "0". The second set reversed the high and low roles, with low being a "success" encoded with value "1", and medium and high together encoded as "0".
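The two binary encodings of a three-level factor can be sketched as below; the list of levels is made up and `encode_binary` is a hypothetical helper name.

```python
def encode_binary(levels, success):
    """Encode a three-level factor as 1 for the chosen level, 0 otherwise."""
    return [1 if level == success else 0 for level in levels]

# Hypothetical pressure settings for five runs.
pressure = ["high", "medium", "low", "high", "low"]

high_vs_rest = encode_binary(pressure, "high")  # first set of models
low_vs_rest = encode_binary(pressure, "low")    # second set of models
print(high_vs_rest, low_vs_rest)
```

A run at the medium level is encoded as 0 in both sets, so it is identified only indirectly, as neither high nor low.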
GLM models were constructed using the training data set of 24 runs to predict the probability of success for each factor response. As in the building of classification trees, the linear predictors for the models were chosen using the same set of preliminary variables identified for each factor in Table 2. For example, the form of the model fitted for RFpower is represented symbolically as
$\mathrm{logit}(p) = \alpha + \beta^T x$
where p is the probability of high RFpower for the first set of models. Table 5 shows the results of model building based on the training data for each factor response. A measure of goodness for the models was calculated using the formula
$$D_{\text{null}} - D_{\text{res}} \sim \chi^2_{p-q} \qquad (4)$$
`
`Results
`Table 4. Summary of Classification
Table 4 lists the predictors used in the final models for each response, along with a summary of the diagnostic results. The "Misc" column contains the misclassification rate, a measure of the model fit to the training data. The "Valid" column contains the misclassified points in the vali-
In other words, the difference between the null and residual deviance is tested against the chi-squared distribution with degrees of freedom equal to the difference (p-q) in the degrees of freedom of the null and residual deviance, respectively. The interested reader
`is referred to [3] for a more thoro
`criteria. All of the models were found to be significant
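The deviance comparison can be sketched for the simplest case of one predictor, where the null and residual models differ by a single degree of freedom. The code fits a logistic regression by Newton-Raphson in plain Python as a stand-in for the S-PLUS fit used in the paper; the data are invented, the function names are hypothetical, and the tail probability formula used holds only for one degree of freedom.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit(p) = a + b*x for a single predictor."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Entries of the 2x2 information matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def binomial_deviance(y, p):
    """Minus twice the binomial log likelihood for 0/1 responses."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                      for yi, pi in zip(y, p))

def chi2_sf_1df(stat):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical signal values and binary response (1 = "high", 0 = "not high").
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logistic(x, y)
fitted = [sigmoid(a + b * xi) for xi in x]
p_null = sum(y) / len(y)                      # intercept-only model
null_dev = binomial_deviance(y, [p_null] * len(y))
resid_dev = binomial_deviance(y, fitted)
p_value = chi2_sf_1df(null_dev - resid_dev)
print(null_dev, resid_dev, p_value)
```

A small tail probability indicates that the predictor reduces the deviance by more than chance alone would explain. With several predictors the same difference would be compared against a chi-squared distribution with correspondingly more degrees of freedom.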
`
`nostic results. Again the
`technique can lead to promising
`models can only be improved with a larger data set.
`We plan to expand our study to include data taken from man-
`ufacturing machines which have been known to exhibit real
`problems. This will test our assumptions about using
`designed experiments as simulations for faul
`
[Table residue: rows for Pressure, RF Power, and Gas Ratio; legible entries include 0.0000, 0.1250, and 0.2500.]
`
`complement each others pitfalls.
`
`CONCLUSIONS
`Automated diagnosis of faults will provide a systematic
`method of drawing inferences from the availab
`while accounting for uncertainty by retaining a
likelihood for each classification decision. The architecture
`under investigation gives structure to the problem and deals
`naturally with its inherent complexities. This is accom-
`plished by successively dividing the input space into operat-
`ing regions defined by fault categories and keeping track of
`independence assumptions. Future work includes automat-
`ing the calculation and updating of the model parameters and
`incorporating the models through the use of a hierarchical
`mixture of experts (HME) architecture [5].
`A diagnostic system promises to be invaluable to th
tor, especially as a troubleshooting tool to find problems early, thus preventing the propagation of faults and further damage to the machine. When implemented and used in conjunction with good engineering practices, this tool provides a means of mastery over the increasingly overwhelming amounts of data. By looking into the future and anticipating the needs generated by the advancement of technology, we set our research goals to advance the state of the art in manufacturing tools, which brings us that much closer to automation and better control in the fab.
`ACKNOWLEDGEMENTS
`The authors are grateful to Texas Instruments, Lam
`Research, Motorola, Triant Technologies, National Semicon-
`ductor, Digital Semiconductor, and to the SRC (97-FP-700)
`for support of this research.
`REFERENCES
[1] A. M. Ison, C. J. Spanos, "Robust Fault Detection and Fault Classification of Semiconductor Manufacturing Equipment", ISSM96, Tokyo, Japan, October 2-4, 1996.
`[2] S. E Lee, E, D. Boskin, H. C. Liu, E. Wen, C. J.
`
`pp. 17-25.
`[3] W. N. Venables, B. D. Ripley,
`
`S, Chapman & Hall, London, 1993.
[5] M. I. Jordan, R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm", Neural Computation, 6, pp. 181-214, 1994.
`
`Table 5. Summary of GLM Results
Model validation was conducted on the remaining set of 12 runs not used in building the models. The results are shown in Table 5. Perfect prediction (diagnosis) was achieved for the high and low levels for RFPower and pressure, [...] gas flow, and low gap spacing. The misclassification results are summarized in Table 5 for predictions of high and low responses respectively. The misclassification headings for the training and validation are the same as those in Table 4.
`
`DECISION TREE ARCHITECTURE
The two modeling techniques for diagnosis examined in this paper can be combined in a decision tree architecture which makes use of the conditioning and partitioning of the input space in the classification trees, and the greater flexibility for modeling probabilities provided by the GLM's. Specifically, it is worth noting that the performance of the classification trees could be vastly improved if the partitions were not constrained to being constant functions in a single variable. Using the logit link function to model the probabilities alleviates this constraint. Similarly, because the GLM's are fit to a single binary response, their performance as predictors could be improved by conditional knowledge of the operating space (knowledge of the other responses). This could be achieved using the natural hierarchy provided by a tree-based model to partition the input space. For further exploration of these ideas, the interested reader is referred to [5].
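One way to picture the combined architecture is a tree split that selects an operating region, followed by a region-specific logistic model that supplies the probability. The sketch below uses a hard split, which is a simplification and not the authors' implementation. The threshold, coefficients, and region names are all hypothetical.

```python
import math

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical split point on RFTune taken from a tree partition.
RFTUNE_SPLIT = 11600.0

# Hypothetical region-specific logistic models: (intercept, coefficients).
REGION_MODELS = {
    "low_rftune": (-1.0, {"RFCoil": 0.002}),
    "high_rftune": (0.5, {"RFCoil": -0.001}),
}

def region(signals):
    """Tree stage: choose the operating region from the partition."""
    return "low_rftune" if signals["RFTune"] < RFTUNE_SPLIT else "high_rftune"

def probability_high(signals):
    """GLM stage: logistic model conditioned on the selected region."""
    intercept, coefs = REGION_MODELS[region(signals)]
    eta = intercept + sum(c * signals[name] for name, c in coefs.items())
    return inverse_logit(eta)

reading = {"RFTune": 11500.0, "RFCoil": 500.0}
print(region(reading), probability_high(reading))
```

Within each region the probability now varies smoothly with the signals instead of being constant, which is the constraint the text says the logit link alleviates.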
`
SUMMARY AND FUTURE WORK
Classification trees are shown to be effective in predicting changes in the input settings using only a small subset of the real-time tool signals. In all cases, the trees are reduced to operate on a space defined by at most two predictor variables (real-time signals), without an increase in the misclassification rate. Unfortunately the performance of these models is
`t used to train them. In our
`the results of the diagnosis
`or response to another.
`modeling binary response
`data shows that the increased flexibility in this modeling
`