Fault Diagnosis of Plasma Etch Equipment
`Anna M. Ison, Wei Li and Costas J. Spanos
`Department of EECS, University of California, Berkeley, CA 94720-1772
office: (510) 642-9584, fax: (510) 642-2739, email: annaison,
Abstract- The development and implementation of robust methods for fault detection promises to enhance manufacturing by improving our capability to monitor equipment and processes. In order to fully utilize this capability, it is important that the machine fault is not only detected but also diagnosed as belonging to a fault category so that appropriate corrective action can promptly be taken. In this paper we examine the diagnostic performance of two probabilistic modeling techniques in using sensor signals to classify faults. We also discuss how the strengths of these models may be combined in a hierarchical architecture giving rise to a more powerful diagnostic tool.
To guarantee continued success in the industry, semiconductor manufacturers compete in product differentiation and development, specifically by taking advantage of decreasing circuit geometries and through tighter specifications. However, accommodating larger wafer sizes and meeting more stringent design demands necessitates accurate and robust characterization of the manufacturing process as well as reliable prediction and control of its effects on the final wafer product. Process improvements translate directly into increased efficiency, decreased machine downtime, and savings, in that early detection and diagnosis of potential problems can prevent long runs of misprocessed product. Furthermore, being economically competitive involves increasing throughput, maintaining high yield, lowering the cost of machine ownership and speeding up the process development cycle.
`Achieving these goals set by the industry requires better
`process characterization and control. This motivates the
`development of a manufacturing tool to monitor equipment
`and diagnose problems or abnormalities in machine behav-
`ior. Our approach is to use probabilistic models to character-
`ize variability in the process and to identify modes of
operation or machine states.
Although the techniques presented are general, we have chosen the plasma etch process as a test vehicle for the methodology. Much attention has focused on plasma etching because it is considered a critical manufacturing process and yield limiter. However, due to its complexity, the process is not easily represented by physically based models. Furthermore, in the current production situation, data gathering capabilities are surpassing the development of useful analytical tools. A system is needed to automatically extract inferences from the various data sources in a timely fashion, so that appropriate action can be taken. Ironically, although the total volume of data is significant, relevant representative data properly annotated with machine log information is still a rare commodity, and hence the situation lends itself well to statistical methods to draw inferences from a sample representative of a larger population.
0-7803-3752-2/97/$10 ©1997 IEEE
In this paper we examine two techniques for diagnosing faults in machine sensor data. Tree-based models are shown to be an effective method of identifying the sensor signals most sensitive to changes in the input settings of the machine. This method is compared with the performance of generalized linear models built to predict levels of the input settings based on sensor signals. Finally, we discuss how the strengths of the two methods can be combined to enhance diagnostic performance.
For practical reasons, the system is constructed using data collected easily and economically from the machine without interrupting the process. A designed experiment was conducted on a Lam Rainbow 4400 plasma etcher in the U.C. Berkeley Microfabrication Laboratory, providing the multivariate real-time tool data used to build the models in this work.
Through the use of real-time tool signals, in particular SECS-II machine information collected in-situ, we can effectively monitor the machine state without interrupting the process. Using data collected as part of the regular production process, we have built time-series models to monitor machine behavior on different time-scales, namely on a lot-to-lot, wafer-to-wafer, and real-time basis, and have used SPC techniques to detect faults [1][2]. The detection of an out-of-control condition by the fault detection mechanism indicates the possible presence of a fault. In order to confirm the hypothesis that a fault has occurred and to identify an assignable cause, a diagnostic system in a probabilistic framework is developed which will classify faults into discrete categories.
In itself, the system serves as a tool to assist engineers in identifying problems affecting machine performance which could result in damage to the product, and as an early warning system to aid in scheduling preventative maintenance events, potentially reducing machine downtime. However, in the larger framework, this classification capability identifies modes of equipment operation, allowing us to utilize the appropriate blend of models best suited for that operation mode. Accordingly, wafer characteristics can be predicted based on a better estimate of the machine state.
`Data Description and Experimental Design
The monitored signals used in this work are those suspected
`to be most sensitive to changes in the chamber state of the
`etcher [2]. These signals are known as real-time tool signals
`and are collected while wafers are being processed at a rate
`Authorized licensed use limited to: Katie Hibert. Downloaded on July 28,2021 at 17:38:04 UTC from IEEE Xplore. Restrictions apply.
`Applied Materials, Inc. Ex. 1020
`Applied v. Ocean, IPR Patent No. 6,836,691
`Page 1 of 4


of 1 Hz. The changes we wish to detect and classify in this paper correspond to specific shifts in the input settings of the machine. The assumption is that abnormal machine behavior will manifest itself in a manner [...] change in the input settings [...] which are varied over three [...] composite design; this is [...] range of different faulty [...] period for each of the 3[...] datasets.
The algorithm implementing the construction of tree-based models must determine the variables on which to divide and how to split the space into partitions. It does this by partitioning the space of the predictor variables x into homogeneous regions, attempting to make the conditional distribution of the response y given x, f(y|x), independent of x. The algorithm accomplishes this task by using a criterion based on a measure of deviance.
Classification trees are based on the multinomial distribution. If we consider a vector, for example y = (0, 1, 0), to represent the response y belonging to the second of three factor levels, then the probability corresponding to a response falling into each level would be given by $\mu = (p_1, p_2, p_3)$, with the constraint $\sum_i p_i = 1$, $i = 1, 2, 3$.
The model consists of a stochastic component given by
$$y_i \sim M(\mu_i), \quad i = 1, 2, \ldots, N$$
`Table 1. Input settings for the plasma etcher
`Signal Selection
In this study, the probability of a high, medium, or low value for an input setting to a plasma etcher is [...] real-time tool signals collected from the [...] predictors. To determine a preliminary set of predictor variables to be used for modeling, boxplots are used to view the distributions of the real-time tool signals as a function of each input setting. Table 2 summarizes the real-time signals identified as potential predictors for the factor responses. These signals reflect changes in the machine state which are in turn affected by changes in the input settings.
and a structural component
$$\mu_i = \tau(x_i)$$
The deviance is defined as minus twice the log likelihood,
$$D(\mu_i; y_i) = -2 \sum_{k=1}^{K} y_{ik} \log(p_{ik})$$
where K is the number of response levels, and because the splits in a decision tree are based on maximizing the change in deviance, the mechanism determining the partitions is equivalent to maximum likelihood estimation.
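As an illustration of the deviance criterion, the following Python sketch computes the multinomial deviance of a node from its class counts, taking the within-node class proportions as the maximum likelihood estimates of the class probabilities, and then the reduction in deviance obtained by a candidate split. The function names and the example counts are assumptions made for illustration only; they are not taken from the S-PLUS implementation used in this work.

```python
import math

def node_deviance(counts):
    """Multinomial deviance -2 * sum_k n_k * log(p_k) of one node.

    counts holds the class counts n_k in the node; p_k is estimated
    by n_k / n. Empty classes contribute nothing to the sum.
    """
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)

def deviance_reduction(parent, left, right):
    """Change in deviance when a parent node is split into two children."""
    return node_deviance(parent) - (node_deviance(left) + node_deviance(right))

# Hypothetical example: 24 training runs, 8 at each of three levels.
parent = [8, 8, 8]
pure_split = deviance_reduction(parent, [8, 0, 0], [0, 8, 8])
poor_split = deviance_reduction(parent, [4, 4, 4], [4, 4, 4])
print(round(pure_split, 3), round(poor_split, 3))
```

A split that isolates one level in its own child gives a large reduction, while a split that leaves the class proportions unchanged gives none, so a tree-growing procedure of this kind would prefer the former.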
Tree models are compared by how well the partition corresponds to the true decision rule for the problem. For classification trees, a count of the number of errors as a proportion of the training set provides an estimate of the misclassification rate. Similarly, a probability distribution over the classes is formed from the training set, and using a Bayes decision rule, the algorithm chooses the class with the highest probability as the prediction. Thus, the tree serves as a probability model by providing a probability distribution over each one of the classes. The reader is referred to [3] and [4] for a more detailed discussion.
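The decision rule and the error estimate described above can be sketched in a few lines of Python. The leaf contents below are hypothetical labels invented for the example, and the helper names are assumptions rather than part of any tree-fitting library.

```python
def leaf_distribution(labels):
    """Estimate the class probabilities in a leaf from its training labels."""
    n = len(labels)
    return {c: labels.count(c) / n for c in sorted(set(labels))}

def predict(distribution):
    """Decision rule: choose the class with the highest estimated probability."""
    return max(distribution, key=distribution.get)

def misclassification_rate(leaves):
    """Number of training errors as a proportion of the training set."""
    errors = 0
    total = 0
    for labels in leaves:
        guess = predict(leaf_distribution(labels))
        errors += sum(1 for y in labels if y != guess)
        total += len(labels)
    return errors / total

# Hypothetical leaves of a fitted tree, holding 24 training runs in total.
leaves = [
    ["low"] * 7 + ["medium"],
    ["medium"] * 6 + ["high"] * 2,
    ["high"] * 6 + ["low", "medium"],
]
rate = misclassification_rate(leaves)
print(predict(leaf_distribution(leaves[0])), rate)
```

Because each leaf keeps its full class distribution, the prediction can be reported together with its estimated probability rather than as a bare label.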
Gap Spacing | RFTune, RFCoil, Phase, Impedance, Volt, DCBias, EndpointC, Pressure
Table 2. Predictor variables for input setting responses
`The signal selection, model constructio
`-PLUS software in an S-PLUS environ-
`Tree-based modeling is an exploratory
`be used to devise prediction rules, to select or screen vari-
`ables for prediction, and to examine complex multivariate
The data set was divided into two mutually exclusive sets by arbitrarily picking 12 runs out of the 36 to use as a validation set. Classification trees for each factor response (input setting) were then constructed from the training data of 24 runs using the preliminary set of predictors identified in Table 2.
An example of a classification tree built for the factor response gas ratio is summarized in Table 3. Although the [...] predictors listed in Table 2, [...] RFTune and RFCoil as the most [...] classifying the gas ratio response. [...] runs. Following the root are [...] nodes where a split condition is [...] with an asterisk (*) and represent final values or predictions.
These nodes are also called leaves. Note that there is a distribution by class within each node giving the probability of observations belonging to each factor response level. The shaded boxes indicate the diagnosis, which is based on the level with the highest probability value.
Node) Split
1) Root
*2) RFTune [...]
3) RFTune > [...]
*6) RFCoil [...]
*7) RFCoil [...]
Table 3. Classification tree for Gas Ratio Response.
A partition of the predictor space for the gas ratio response is displayed in Figure 1. The partitions are based on the simplified classification trees obtained after snipping unnecessary nodes. The resulting trees for each response span a domain defined by no more than two predictor variables, thus enabling the plots of the partitioned space.
Figure 1. Partition for Gas Ratio Response
`dation set. In general, reasonable models were obtained for
`the responses RFpower, totalflow and gap spacing, and
`these were tested by the validation set consisting of runs not
`used in building the models. However, the models for pres-
`sure and gas ratio performed rather poorly based on the val-
`idation sets.
Generalized linear models (GLMs) extend linear models to allow for nonlinearity and heterogeneous variances. In the case of diagnosis, the factor responses can be modeled as binary response data (by grouping two factor levels together and attempting to distinguish them from the third). This is the approach taken here.
Assuming that the response y is encoded as binary data, the presence or absence of a condition, for example high pressure versus not high (medium or low) pressure, can be treated as a "success" with a value "1", or a "failure" with a value "0". This response data has a mean p, the probability of success, and a variance that depends on the mean. This leads to defining a link function relating the mean to the linear predictor, $g(p) = \beta^T x$, where the linear predictor $\eta = \beta^T x$ is given by the logit link function
$$\eta = \log\left(\frac{p}{1 - p}\right)$$
with inverse
$$p = \frac{e^{\eta}}{1 + e^{\eta}}$$
and p is guaranteed to lie within the range [0, 1].
The selection of the logit link is based on the binomial distribution and its corresponding log likelihood function. Thus the logistic regression model is defined by the logit link and the binomial variance function V(p) = p(1 - p).
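To make the link function concrete, the following Python sketch implements the logit link, its inverse, and the binomial variance function. The function names are chosen for this illustration only and do not correspond to the S-PLUS routines used to fit the models.

```python
import math

def logit(p):
    """Link function: eta = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Inverse link: p = exp(eta) / (1 + exp(eta)).

    The two branches are algebraically identical; each one only ever
    exponentiates a non-positive number, which avoids overflow.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def binomial_variance(p):
    """Variance function V(p) = p * (1 - p)."""
    return p * (1.0 - p)

# Any real-valued linear predictor maps to a probability in [0, 1].
for eta in (-5.0, 0.0, 5.0):
    p = inverse_logit(eta)
    print(eta, round(p, 4), round(binomial_variance(p), 4))
```

The variance is largest at p = 0.5 and shrinks toward zero as p approaches either end of the range, which is the dependence of the variance on the mean mentioned above.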
`Model Construction and Validation
Two sets of models were built by encoding each factor response into a binary response. The first set was based on the high level as a "success" encoded with value "1", while the medium and low levels were grouped together as a "failure" and encoded with value "0". The second set reversed the high and low roles, with low being a "success" encoded with value "1", and medium and high together encoded as "0".
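The two encodings can be sketched as follows in Python. The helper name and the example list of levels are hypothetical and serve only to show how one three-level factor response yields two binary responses.

```python
def encode_one_vs_rest(levels, success):
    """Return 1 where the level equals `success` and 0 otherwise."""
    return [1 if level == success else 0 for level in levels]

# Hypothetical factor levels for five runs of one input setting.
levels = ["low", "medium", "high", "high", "low"]

# First set of models: high is the "success", medium and low are grouped.
y_high = encode_one_vs_rest(levels, "high")
# Second set of models: low is the "success", medium and high are grouped.
y_low = encode_one_vs_rest(levels, "low")

print(y_high)
print(y_low)
```

A run that is encoded as 0 in both responses is at the medium level, so the pair of binary models together distinguishes all three levels.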
GLM models were constructed using the training data set of 24 runs to predict the probability of success for each factor response. As in the building of classification trees, the linear predictors for the models were chosen using the same set of preliminary variables identified for each factor in Table 2. For example, the form of the model fitted for RFpower is represented symbolically as
$$\operatorname{logit}(p) = \alpha + \beta^T x$$
where p is the probability of high RFpower for the first set of models. Table 5 shows the results of model building based on the training data for each factor response. A measure of goodness of fit for the models was calculated using the formula
$$D_{\text{null}} - D_{\text{resid}} \sim \chi^2_{p-q}$$
`Table 4. Summary of Classification
Table 4 lists the predictors used in the final models for each response, along with a summary of the diagnostic results. The "Misc" column contains the misclassification rate, a measure of the model fit to the training data. The "Valid" column contains the misclassified points in the vali-
In other words, the difference between the null and residual deviance is tested against the chi-squared distribution with degrees of freedom equal to the difference (p - q) in the degrees of freedom of the null and residual deviance, respectively. The interested reader is referred to [3] for a more thorough [...] criteria. All of the models were found to be significant [...]
[...] technique can lead to promising diagnostic results. Again, the [...] models can only be improved with a larger data set.
`We plan to expand our study to include data taken from man-
`ufacturing machines which have been known to exhibit real
`problems. This will test our assumptions about using
`designed experiments as simulations for faul
`RF Power
`Gas Ratio
`I O . m I 0 I o.oo00 I O . m 1
`0 . m 0 . m 0 . m
`0.2500 0.1250 0.2500
complement each other's pitfalls.
Automated diagnosis of faults will provide a systematic method of drawing inferences from the available data while accounting for uncertainty by retaining a likelihood for each classification decision. The architecture
`under investigation gives structure to the problem and deals
`naturally with its inherent complexities. This is accom-
`plished by successively dividing the input space into operat-
`ing regions defined by fault categories and keeping track of
`independence assumptions. Future work includes automat-
`ing the calculation and updating of the model parameters and
`incorporating the models through the use of a hierarchical
`mixture of experts (HME) architecture [5].
A diagnostic system promises to be invaluable to the operator, especially as a trouble-shooting tool to find problems early, thus preventing the propagation of faults and further damage to the machine. When implemented and used in conjunction with good engineering practices, this tool provides a means of mastery over the increasingly overwhelming amounts of data. By looking into the future and anticipating the needs generated by the advancement of technology, we set our research goals to advance the state of the art in manufacturing tools, which brings us that much closer to automation and better control in the fab.
`The authors are grateful to Texas Instruments, Lam
`Research, Motorola, Triant Technologies, National Semicon-
`ductor, Digital Semiconductor, and to the SRC (97-FP-700)
`for support of this research.
`[l] A.M. Ison, C. J. Spanos, “Robust Fault Detection and
`Fault Classification of Semiconductor Manufacturing
`Equipment”, ISSM96, Tokyo, Japan, October 2-4 1996.
[2] S. F. Lee, E. D. Boskin, H. C. Liu, E. Wen, C. J. Spanos, [...] pp. 17-25.
`[3] W. N. Venables, B. D. Ripley,
`S, Chapman & Hall, London, 1993.
[5] M. I. Jordan, R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm", Neural Computation, 6, pp. 181-214, 1994.
`Table 5. Summary of GLM Results
Model validation was conducted on the remaining set of 12 runs not used in building the models. The results are shown in Table 5. Perfect prediction (diagnosis) was achieved for the high and low levels for RFPower and pressure, [...] gas flow, and low gap spacing. The misclassification results are summarized in Table 5 for predictions of high and low responses respectively. The misclassification headings for the training and validation are the same as those in Table 4.
The two modeling techniques for diagnosis examined in this paper can be combined in a decision tree architecture which makes use of the conditioning and partitioning of the input space in the classification trees, and the greater flexibility for modeling probabilities provided by the GLMs. Specifically, it is worth noting that the performance of the classification trees could be vastly improved if the partitions were not constrained to being constant functions in a single variable. Using the logit link function to model the probabilities alleviates this constraint. Similarly, because the GLMs are fit to a single binary response, their performance as predictors could be improved by conditional knowledge of the operating space (knowledge of the other responses). This could be achieved using the natural hierarchy provided by a tree-based model to partition the input space. For further exploration of these ideas, the interested reader is referred to [5].
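One way to picture such a combination is a hard tree-style split that selects which logistic model to apply. The Python sketch below is only an illustration under stated assumptions: the signal names, the split threshold and every coefficient are invented for the example, and the hard gate stands in for the soft, EM-fitted gating of a full hierarchical mixture of experts.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor to a probability without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical logistic models, one per region of the input space:
# region name -> (intercept, {signal name: coefficient}).
REGION_MODELS = {
    "rf_tune_low": (-1.0, {"rf_coil": 0.8}),
    "rf_tune_high": (0.5, {"rf_coil": -0.4}),
}

def region(signals, threshold=0.0):
    """Tree-like gate: a single split on one (hypothetical) signal."""
    return "rf_tune_low" if signals["rf_tune"] < threshold else "rf_tune_high"

def prob_success(signals):
    """Probability of the binary response from the model of the active region."""
    intercept, coefs = REGION_MODELS[region(signals)]
    eta = intercept + sum(b * signals[name] for name, b in coefs.items())
    return inverse_logit(eta)

print(prob_success({"rf_tune": -1.0, "rf_coil": 1.25}))
print(prob_success({"rf_tune": 2.0, "rf_coil": 0.0}))
```

Within each region the probability varies smoothly with the signals instead of being constant, which is the flexibility the tree alone lacks, while the gate supplies the conditional knowledge of the operating region that a single GLM lacks.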
Classification trees are shown to be effective in predicting changes in the input settings using only a small subset of the real-time tool signals. In all cases, the trees are reduced to operate on a space defined by at most two predictor variables (real-time signals), without an increase in the misclassification rate. Unfortunately the performance of these models is [...] used to train them. In our [...] the results of the diagnosis [...] factor response to another. [...] modeling binary response data shows that the increased flexibility in this modeling