`Real-Time Diagnosis of Semiconductor
`Manufacturing Equipment Using a
`Hybrid Neural Network Expert System
`
`Byungwhan Kim, Member, IEEE, and Gary S. May, Senior Member, IEEE
`
`Abstract— This paper presents a tool for the real-time diag-
`nosis of integrated circuit fabrication equipment. The approach
`focuses on integrating neural networks into an expert system.
`The system employs evidential reasoning to identify malfunctions
`by combining evidence originating from equipment maintenance
`history, on-line sensor data, and in-line post-process measure-
`ments. Neural networks are used in the maintenance phase
`of diagnosis to approximate the functional form of the failure
`history distribution of each component. Predicted failure rates
`are then converted to belief levels. For on-line diagnosis in the
`case of previously unencountered faults, a CUSUM control chart
`is implemented on real sensor data to detect very small process
`shifts and their trends. For the known fault case, continuous
`hypothesis testing on the statistical mean and variance of the
`sensor data is performed to search for similar data patterns and
`assign belief levels. Finally, neural process models of process
`figures of merit (such as etch uniformity) derived from prior
`experimentation are used to analyze the in-line measurements,
`and identify the most suitable candidate among faulty input
`parameters (such as gas flow) to explain process shifts. A working
`prototype for this hybrid diagnostic system has been implemented
`on the Plasma Therm 700 series reactive ion etcher located in the
`Georgia Tech Microelectronics Research Center.
`
`Index Terms—Diagnosis, expert systems, neural networks, re-
`active ion etching.
`
`I. INTRODUCTION
`
As the semiconductor industry moves toward submicron fabrication technology, tight control of process variability is an essential requirement. A certain amount of variability is inherent in sophisticated semiconductor equipment, and significant performance shifts may occur when this variability becomes large compared to random process noise (i.e., fluctuations resulting from small and essentially uncontrollable causes). Such shifts are often indicative of equipment malfunctions. When unreliable equipment performance causes operating conditions to vary beyond an acceptable level, overall product quality is jeopardized. Thus, timely and accurate equipment malfunction diagnosis can be a key to the success of the semiconductor manufacturing process. Diagnosis involves identifying the assignable causes for the equipment malfunctions and correcting them quickly to prevent the subsequent occurrence of expensive misprocessing. With the advent of highly proficient sensors capable of monitoring process conditions in-situ, it is now desirable to perform diagnosis on a real-time basis.

Algorithmic diagnostic systems such as HIPPOCRATES [1] have been developed to identify process faults from statistical inference procedures and electrical measurements performed on finished IC wafers. Although this system makes good use of quantitative models of process behavior, it can only arrive at useful diagnostic conclusions in the limited regions of operation over which these models are valid. Furthermore, in critical process steps such as reactive ion etching (RIE), the theoretical basis for determining causal relationships is not well understood, thereby limiting the usefulness of physical models [2]. Expert systems such as PIES [3] have been designed to draw upon experiential knowledge to develop qualitative models of process behavior. This approach has attained limited success in attempting to diagnose unstructured problems which lack a solid conceptual foundation for reasoning. However, a purely knowledge-based technique often lacks the precision inherent in deep-level physical models, and is thus incapable of deriving solutions for unanticipated situations from the underlying principles surrounding the process.

Neural networks have recently emerged as an effective tool for process modeling [4], [5] as well as fault diagnosis [6], [7]. Diagnostic problem solving using neural networks requires the association of input patterns representing quantitative and qualitative process behavior to fault identification. Robustness to noisy sensor data and high speed parallel computation make neural networks an attractive alternative for real-time diagnosis. However, the pattern recognition-based neural network approach has limitations. First, a complete set of fault signatures is hard to obtain, and the representational inadequacy of a limited number of data sets can induce network overtraining, thus increasing the misclassification or “false alarm” rate. Also, pattern matching approaches in which diagnostic actions take place following a sequence of several processing steps are sub-optimal since evidence pertaining to potential equipment malfunctions accumulates at irregular intervals throughout the process sequence. At the end of a sequence, significant misprocessing and yield loss may have already taken place, making post-process diagnosis alone economically undesirable.

Manuscript received January 31, 1996; revised March 1997. This work was supported by the National Science Foundation Grant DDM-9358163 and the IEEE/CPMT Motorola Fellowship.
B. Kim is with the Memory R&D Division, Department of Equipment Engineering, Hyundai Electronics Industries Co., Ltd., Korea.
G. S. May is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA.
Publisher Item Identifier S 1083-4400(97)04320-9.

1083–4400/97$10.00 © 1997 IEEE
`
`Authorized licensed use limited to: LEHIGH UNIVERSITY. Downloaded on July 12,2021 at 00:51:26 UTC from IEEE Xplore. Restrictions apply.
`
`Applied Materials, Inc. Ex. 1029
`Applied v. Ocean, IPR Patent No. 6,836,691
`Page 1 of 9
`
`
`
`
`IEEE TRANSACTIONS ON COMPONENTS, PACKAGING, AND MANUFACTURING TECHNOLOGY—PART C, VOL. 20, NO. 1, JANUARY 1997
`
`This paper presents a prototype tool for the automated mal-
`function diagnosis of integrated circuit fabrication equipment.
`The methodology described combines the best characteristics
`of quantitative algorithmic, qualitative experiential and pattern
`recognition-based neural network approaches. This system
`offers advantages in that it yields a stable and reliable ranked
`list of fault possibilities, even in the presence of measurement
`noise (in part due to the inherent noise resistance of neural
networks). In addition, the varying degrees of belief in each
stage of diagnosis aid in the early detection of suspicious
trends, often prior to actual failure occurrences. This working
prototype is currently being developed and implemented
`on a Plasma Therm 700 series RIE located in the Georgia Tech
`Microelectronics Research Center.
`
`II. DIAGNOSTIC INFERENCE METHOD
As a diagnostic inference method, the Dempster–Shafer theory of evidential reasoning [8] has proven to be suitable for real-time malfunction diagnosis applications [9]. This technique allows the combination of various pieces of uncertain evidence obtained at irregular intervals, and its implementation results in time-varying, nonmonotonic belief functions which reflect the current status of diagnostic conclusions at any given point in time.

One of the basic concepts in Dempster–Shafer theory is the frame of discernment (symbolized by $\Theta$), defined as an exhaustive set of mutually exclusive propositions. In diagnosis, the frame of discernment is the union of all possible fault hypotheses. Each piece of collected evidence can be mapped to a fault or group of faults within $\Theta$. The likelihood of a fault proposition $A$ is expressed as a bounded interval $[s(A), p(A)]$ which lies in [0, 1]. The parameter $s(A)$ represents the support for $A$, which measures the weight of evidence in support of $A$. The other parameter, $p(A)$, called the plausibility of $A$, is defined as the degree to which contradictory evidence is lacking. Plausibility measures the maximum amount of belief that can possibly be assigned to $A$. The quantity $u(A)$ is the uncertainty of $A$, which is the difference between the evidential plausibility and support. For example, an evidence interval of [0.3, 0.7] for proposition $A$ indicates that the probability of $A$ is between 0.3 and 0.7, with an uncertainty of 0.4.

For diagnosis, proposition $A$ represents a given fault hypothesis. An evidence interval for fault $A$ is determined from a basic probability mass distribution (BPMD). The BPM $m$ indicates the portion of the total belief in evidence assigned to a particular fault hypothesis set. Any residual belief in the frame of discernment that cannot be attributed to any subset of $\Theta$ is assigned directly to $\Theta$ itself, which in effect introduces uncertainty into the diagnosis. Using this framework, the support and plausibility of proposition $A$ are given by

$$s(A) = \sum_{A_i \subseteq A} m(A_i) \tag{1}$$
$$p(A) = 1 - \sum_{B_i \subseteq \bar{A}} m(B_i) \tag{2}$$

where $A_i \subseteq A$ and $B_i \subseteq \bar{A}$, and the summation is taken over all propositions in a given BPM. Thus the total belief in $A$ is the sum of support ascribed to $A$ and all subsets thereof.

Fig. 1. Partial schematic of RIE gas delivery system.

Dempster’s rules for evidence combination provide a deterministic and unambiguous method of combining BPMD’s from separate and distinct sources of evidence contributing varying degrees of belief to several propositions under a common frame of discernment. The rule for combining the observed BPM’s of two arbitrary and independent knowledge sources $m_1$ and $m_2$ into a third ($m_3$) is as follows:

$$m_3(Z) = (m_1 \oplus m_2)(Z) = \frac{\sum_{X_i \cap Y_j = Z} m_1(X_i)\, m_2(Y_j)}{1 - k} \tag{3}$$

where

$$k = \sum_{X_i \cap Y_j = \varnothing} m_1(X_i)\, m_2(Y_j). \tag{4}$$

Here $X_i$ and $Y_j$ represent various propositions which consist of fault hypotheses and disjunctions thereof. Thus, the BPM of the intersection of $X_i$ and $Y_j$ is the product of the individual BPM’s of $X_i$ and $Y_j$. The factor $(1 - k)$ is a normalization constant which prevents the total belief from exceeding unity due to attributing portions of belief to the empty set.

Consider the combination of $m_1$ and $m_2$ when each contains different evidence concerning the diagnosis of a malfunction in the RIE application. Such evidence could result from two different sensor readings, for example. In particular, suppose that the sensors have observed that the flow of one of the etch gases into the process chamber is too low. Let the frame of discernment $\Theta = \{A, B, C, D\}$, where $A$ through $D$ symbolically represent the following mutually exclusive equipment faults:
$A$ = mass flow controller miscalibration;
$B$ = gas line leak;
$C$ = throttle valve malfunction;
$D$ = incorrect sensor signal.
These components are illustrated graphically in the partial schematic of the etcher gas flow system shown in Fig. 1. Suppose that belief in this frame of discernment is distributed according to the BPMD’s:
`
The calculation of the combined BPMD ($m_3 = m_1 \oplus m_2$) is shown in
Table I. Each cell of the table contains the intersection of the
`
`
`
`KIM AND MAY: REAL-TIME DIAGNOSIS OF SEMICONDUCTOR MANUFACTURING EQUIPMENT
`
`
`TABLE I
`ILLUSTRATION OF BPMD COMBINATION
`
corresponding propositions from $m_1$ and $m_2$, along with the product of their individual beliefs. Note that the intersection of any proposition with $\Theta$ is the original proposition. The BPM attributed to the empty set, $k$, which originates from the presence of various propositions in $m_1$ and $m_2$ whose intersection is empty, is 0.11. By applying (3), BPM’s for the remaining propositions result in

The plausibilities for propositions in the combined BPM are calculated by applying (2). The individual evidential intervals implied by $m_3$ then follow from these supports and plausibilities. Combining the evidence available from knowledge sources $m_1$ and $m_2$ thus leads to the conclusion that the most likely cause of the insufficient gas flow malfunction is a miscalibration of the mass flow controller (proposition $A$).
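
The combination rule and the support and plausibility definitions above can be illustrated with a minimal Python sketch. The function names and the two BPMD's below are hypothetical and chosen only for illustration; they are not the values used in Table I. Propositions are represented as frozensets of fault labels, which is an assumption of this sketch rather than a detail taken from the paper.

```python
from itertools import product

THETA = frozenset("ABCD")  # frame of discernment: four mutually exclusive faults


def combine(m1, m2):
    """Dempster's rule: return the combined BPMD and the conflict mass k."""
    raw = {}
    k = 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = x & y  # intersection of the two propositions
        if z:
            raw[z] = raw.get(z, 0.0) + mass_x * mass_y
        else:
            k += mass_x * mass_y  # belief that would fall on the empty set
    if k >= 1.0:
        raise ValueError("sources are completely contradictory")
    return {z: mass / (1.0 - k) for z, mass in raw.items()}, k


def support(m, a):
    """Sum of the masses of all propositions contained in a."""
    return sum(mass for x, mass in m.items() if x <= a)


def plausibility(m, a, theta=THETA):
    """One minus the support for the complement of a."""
    return 1.0 - support(m, theta - a)


# Hypothetical evidence from two sensors (illustrative values only).
m1 = {frozenset("AB"): 0.6, THETA: 0.4}
m2 = {frozenset("A"): 0.5, frozenset("CD"): 0.2, THETA: 0.3}

m3, k = combine(m1, m2)
fault_a = frozenset("A")
interval_a = (support(m3, fault_a), plausibility(m3, fault_a))
```

With these made-up inputs the conflict mass is 0.12, and the evidential interval for fault A is roughly [0.57, 0.91], so A would rank first among the single-fault hypotheses.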
`
`III. NEURAL NETWORK-BASED RIE MODELING
`Neural networks have the capability of learning complex
`relationships between groups of related parameters. They
`consist of parallel processing units (called neurons), which
`are interconnected in such a way that knowledge is stored
in the weights of the connections between them. Each neuron
computes the weighted sum of its inputs and passes it through a sigmoidal
activation function. The nonlinear mapping capabilities of
`neural networks have recently been applied by several other
`researchers in semiconductor process modeling [10]–[13]. To
`model the RIE process, the quantitative relationships which
`relate input parameters to output responses have been encoded
`in feed-forward neural networks via the error back-propagation
`(BP) algorithm [14]. The structure of a typical BP network
`appears in Fig. 2. The specific manner in which BP neural
`nets are used in RIE diagnosis is described below.
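
As a rough illustration of the neuron model described above, the following Python sketch evaluates a small feed-forward network in which every neuron applies a sigmoid to the weighted sum of its inputs. The bias term and the nested-list weight layout are assumptions made for this sketch; training by back-propagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each neuron: sigmoid of (weighted sum of inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases) layers."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical 2-input, 2-hidden, 1-output network with arbitrary weights.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = network_forward([0.2, 0.8], layers)
```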
`
`A. Time Series Modeling
`For real-time diagnosis, it is critical to model the variation
`of in-situ sensor data and develop an efficient method for
`handling this voluminous and multidimensional data. Time-
`series modeling is a means to achieve each of these ends.
`Under malfunction conditions, sensor readings can serve as
`process “signatures” which assist in identifying the occurrence
`of a particular fault. Recently, neural networks have been
`proposed as a means to develop time series models of tool
`data [15].
`
`Fig. 2. Typical back-propagation neural network.
`
`Fig. 3. Data signatures for a malfunctioning CHF3 mass flow controller.
`
`Neural networks used to generalize the behavior of a time
`series are referred to as neural time series (NTS) models. The
`NTS model is capable of simultaneously filtering both auto-
`and cross-correlated data. That is, the NTS model can account
`for correlation among several variables being monitored simul-
`taneously. To illustrate, real-time tool data was collected via an
`equipment monitoring system designed to transfer data from an
`etcher to a remote workstation. Monitoring was accomplished
`using a Tektronix Model 2510 TestLab data acquisition system
`interfaced to the Plasma Therm RIE system via serial ports. In
`this example, an equipment alarm was signaled, and its cause
was later identified to be an insufficient gas supply from the
trifluoromethane (CHF3) mass flow controller. Fig. 3 depicts
the malfunctioning behavior of the CHF3 gas flow.
An NTS network was trained to model the CHF3 flow
pattern in the RIE process using a simple sampling technique
which involved training the network to forecast the next CHF3
value from the behavior of five past values. The training set for the
NTS network consisted of one out of every ten data samples.
As shown in Fig. 4, auto-correlation among consecutive CHF3
measurements was accounted for by simultaneously training
the network on the present value of CHF3 and five past
values. The cross-correlation between CHF3 and the other monitored variables was modeled
by including as inputs to the network the present values of
the temperature, incident and reflected RF power, oxygen and
CHF3. The accuracy of the trained network was measured by
`its root-mean-squared (RMS) error, which was 2.2%. Once
`
`
Fig. 4. Inputs (auto- and cross-correlated data) and output of a neural time-series model.
`
`trained, the NTS model provides a simple means to encode
`this fault signature for later use.
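
The sampling scheme described above (five past values plus the present values of all monitored variables as inputs, the next flow value as output, and one training sample out of every ten) might be assembled as in the following Python sketch. The function name, the argument layout, and the ordering of inputs are assumptions for illustration; the network itself is not shown.

```python
def build_nts_samples(target, others, n_lags=5, stride=10):
    """Build (inputs, next_value) training pairs for a one-step-ahead forecast.

    target: readings of the forecast variable (e.g. gas flow), oldest first
    others: dict mapping a variable name to readings of the same length
    """
    names = sorted(others)
    samples = []
    for t in range(n_lags, len(target) - 1, stride):
        past = list(target[t - n_lags:t])  # auto-correlation inputs
        present = [target[t]] + [others[name][t] for name in names]  # cross-correlation inputs
        samples.append((past + present, target[t + 1]))
    return samples


# Hypothetical readings: a ramping flow signal and a constant oxygen flow.
flow = [float(i) for i in range(30)]
samples = build_nts_samples(flow, {"o2": [0.0] * 30})
```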
`
`B. Process Modeling
`Diagnostic information can also be extracted from in-line
`measurements of post-processed wafers. To achieve this, these
`measurements must be compared to values predicted by a
`process model. Differences between model predictions and
`measured responses are indicative of potential equipment
`malfunctions. In [5], neural network models of RIE responses
`were developed from a Box–Wilson central composite circum-
`scribed design requiring 27 trials [16]. Etching was performed
on a test structure designed to facilitate the simultaneous
measurement of the etch rate, uniformity, and anisotropy of SiO2
in a CHF3 and oxygen plasma, as well as the selectivity of
the SiO2 etch with respect to photoresist. This characterization
`experiment provided neural network training data.
`A “forward” neural network-based process model defines
`a functional relationship between RIE process conditions (in-
`puts) such as RF power or gas composition and responses
`(outputs) such as etch rate or uniformity. The forward process
`model also provides a mechanism for comparing measured
`RIE output responses to predicted values. Large differences,
`which may indicate potential equipment faults, must then be
`traced back to fluctuations in model input parameters. To
`calculate the shift of the process input settings from their
`nominal values, an inverse neural process model is employed.
`This inverse model is obtained by training the network “in
`reverse” (i.e., using output/input pairs, rather than input/output
`pairs). The inverse model provides a means to identify the
`input parameter which is most likely responsible for an output
`process shift. Process shifts required for generating evidential
`support and plausibility can then be computed by utilizing the
`inverse neural process models.
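
One simple way to use the input settings estimated by an inverse model is to rank each input by how far its estimate lies from the nominal setting. The Python sketch below assumes the inverse model has already produced the estimates; scaling each shift by a per-input tolerance is an assumption of this sketch, and all names and numbers are hypothetical.

```python
def rank_suspect_inputs(nominal, estimated, tolerance):
    """Rank inputs by |estimated - nominal| / tolerance, largest shift first."""
    scores = {}
    for name, setpoint in nominal.items():
        scores[name] = abs(estimated[name] - setpoint) / tolerance[name]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical recipe setpoints, inverse-model estimates, and tolerances.
nominal = {"rf_power_w": 300.0, "chf3_sccm": 50.0, "o2_sccm": 10.0}
estimated = {"rf_power_w": 303.0, "chf3_sccm": 42.0, "o2_sccm": 10.2}
tolerance = {"rf_power_w": 10.0, "chf3_sccm": 2.0, "o2_sccm": 0.5}

ranking = rank_suspect_inputs(nominal, estimated, tolerance)
```

In this made-up case the CHF3 flow shows the largest normalized shift and would be the first candidate to explain the output deviation.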
`
`IV. GENERATION OF EVIDENTIAL SUPPORT
`The three relevant time periods for evidence collection in
`semiconductor manufacturing are:
`1) during equipment maintenance periods (before process-
`ing);
`2) during on-line equipment operation (during processing);
`3) during in-line, post-process physical and/or electrical
`inspection (after processing).
`
`Fig. 5. Chronological evidence sources for equipment malfunction diagno-
`sis.
`
`Diagnosis based on this framework for evidence collection
`takes place in three chronological stages (Fig. 5). Maintenance
`diagnosis is performed by examining the relevant historical
`records of equipment performance and building reliability
models of each equipment component. During on-line diagnosis,
both neural time-series models and CUSUM control
chart [17] techniques are employed to analyze fault patterns
available from the equipment monitoring system. For in-line
`diagnosis, measurements on processed wafers are used in
`conjunction with neural network process models. In each
`phase, evidential support and plausibility for various fault
`hypotheses are generated and mapped to particular equipment
`components. The methodology employed to do so is discussed
`below.
`
`A. Maintenance Diagnosis
`During the maintenance phase, the objective of the diag-
`nostic system is to derive evidence of potential component
`failures based on the historical performance of equipment
`components. The data available from which evidential belief
`may be generated is limited, consisting of only the number of
`failures a given component has experienced and the compo-
`nent age. In order to derive evidential support for potential
`malfunctions from this information, a reliability modeling
`technique has been developed to investigate the aging behavior
`of components. The failure probability as a function of time
`and the instantaneous failure rate (or “hazard” rate) for each
`component may be estimated from a neural network trained on
`failure history. The neural reliability model may then be used
`to generate evidential support, plausibility and uncertainty for
each fault hypothesis (i.e., each potentially faulty component)
`in the frame of discernment.
`1) The Weibull Distribution: Consider reliability model-
`ing based on the Weibull distribution. The Weibull distribution
`has been used extensively as a model of time to failure in
`electrical and mechanical components and systems. Examples
`of systems which lend themselves to the Weibull model include
`electrical components such as batteries and ceramic multilayer
`capacitors, mechanical systems such as gas turbine engines,
`semiconductor devices such as memory circuits, individual
`
`
mechanical parts such as bearings, and structural elements in aircraft and automobiles [18]. When a system is composed of a number of components and failure is due to the most serious of a large number of possible faults, the Weibull distribution seems to be a particularly accurate model [17], and this closely resembles the situation being addressed in semiconductor equipment malfunction diagnosis.

The cumulative distribution function (which represents the failure probability of a component at time $t$) for the two-parameter Weibull distribution is given by

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] \tag{5}$$

where $\alpha$ and $\beta$ are called scale and shape parameters, respectively. If a device exhibits Weibull-like reliability behavior, the appropriate selection of $\alpha$ and $\beta$ will allow this distribution function to closely approximate the observed failure behavior throughout its lifetime. The Weibull hazard rate, $\lambda(t)$, is given by

$$\lambda(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1}. \tag{6}$$

The hazard rate may be computed from the failure history of each component by plotting the number of failures versus time and finding the slope of this curve at each time point.

A scheme designed to extract the shape and scale parameters using neural networks has been developed and tested, and is outlined in [18]. This scheme is depicted schematically in Fig. 6. Here the network outputs represent the initially unknown scale and shape parameters. These outputs are iteratively adjusted to reach their optimal values as the neural network learns. The outputs are fed into the failure hazard function in (6), a predicted hazard rate ($\hat{\lambda}$) is computed, and the result is compared with the actual hazard rate ($\lambda$), which has been computed from the failure history data.

Fig. 6. Scheme to estimate Weibull function parameters.

The standard back-propagation training algorithm for feed-forward neural networks begins with a random set of weights. An input vector which has been normalized so that all input data lies in the interval between $-1$ and 1 is then presented to the network, and the output is calculated using this initial weight matrix. Next, the calculated output vector is compared to the measured output vector, and the squared difference between the two is used to determine the system error. Error minimization is accomplished via the gradient descent approach, in which the weights are adjusted in the direction of decreasing error.

The error signal for the modified back-propagation neural network in this case is

$$E = \tfrac{1}{2}\left(\lambda - \hat{\lambda}\right)^2. \tag{7}$$

Since the predicted hazard rate is differentiable, the error gradient with respect to the network weights may be computed using the chain rule as

$$\frac{\partial E}{\partial w_{ijk}} = \frac{\partial E}{\partial \hat{\lambda}} \cdot \frac{\partial \hat{\lambda}}{\partial o_{jk}} \cdot \frac{\partial o_{jk}}{\partial w_{ijk}} \tag{8}$$

where $o_{jk}$ is the calculated output of the $j$th neuron in the $k$th layer. The first partial derivative in (8) is $-(\lambda - \hat{\lambda})$, and the third is the same as in the standard implementation of the back-propagation algorithm [14]. As for the second factor, this partial derivative may be computed separately for each individual output neuron (or equivalently, for each unknown parameter to be estimated). Due to the initially random network weights, the first predicted values of the hazard rate are arbitrary. However, after several training iterations, the predicted hazard rate converges to the actual rate. At this point, the scale and shape parameters computed at the network output are the estimates which best fit the distribution indicated by the training data.

Following parameter estimation using this technique, the evidential support for each equipment component is then obtained from the Weibull distribution function in (5) with the estimated parameters. The corresponding plausibility is the confidence level associated with this probability estimate, which is defined as [19]

(9)

where $N$ denotes the total number of component failures which have been observed at time $t$.
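
The Weibull quantities used in this subsection can be sketched in Python as follows, assuming the standard two-parameter form with scale `alpha` and shape `beta`. The `empirical_hazard` helper approximates "the slope of the number-of-failures curve" by a finite difference between consecutive failure times; that discretization is an assumption of this sketch, not a detail taken from the paper, and the neural parameter estimation itself is not shown.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Failure probability by time t (alpha: scale, beta: shape)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_hazard(t, alpha, beta):
    """Instantaneous failure rate at time t."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


def empirical_hazard(failure_times):
    """Slope of the cumulative failure count between consecutive failures.

    Returns (time, slope) pairs; each step adds one failure, so the slope
    over an interval of length dt is 1 / dt.
    """
    times = sorted(failure_times)
    return [(t1, 1.0 / (t1 - t0)) for t0, t1 in zip(times, times[1:]) if t1 > t0]


# Hypothetical failure history (hours) and parameter values.
observed = empirical_hazard([10.0, 30.0, 40.0])
support_at_500h = weibull_cdf(500.0, alpha=1000.0, beta=1.5)
```

With `beta = 1` the hazard is constant and equal to `1 / alpha`, which is the exponential case treated next.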
2) The Exponential Distribution: Although the Weibull distribution provides one approach to component reliability modeling, the exponential distribution, owing to its simple functional form, is widely used to describe the time elapsing between two failures by characterizing the period during which the failure rate is constant [17]. The cumulative distribution function of this distribution is

$$F(t) = 1 - e^{-\lambda t} \tag{10}$$

where $\lambda$ is a constant equal to the reciprocal of the mean-time-to-failure. This parameter may be estimated as

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i} \tag{11}$$

where $t_i$ represents the elapsed time between the $i$th and $(i+1)$th failure of a specific component, and $n$ is the number of such intervals observed. The evidential support is obtained by inserting (11) into (10), and the corresponding Dempster–Shafer plausibility is then computed by using (9).
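
A minimal Python sketch of the exponential case follows. It assumes the failure rate is estimated as the reciprocal of the mean time between failures, as stated above; the function names and the interval data are hypothetical.

```python
import math


def estimate_failure_rate(intervals):
    """Reciprocal of the mean elapsed time between consecutive failures."""
    total = sum(intervals)
    if not intervals or total <= 0.0:
        raise ValueError("need at least one positive inter-failure interval")
    return len(intervals) / total


def exponential_cdf(t, rate):
    """Failure probability by time t for a constant failure rate."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical inter-failure times in hours.
rate = estimate_failure_rate([100.0, 200.0, 300.0])
support_at_150h = exponential_cdf(150.0, rate)
```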
`
`
`TABLE II
`SAMPLE FAULT RANKING AFTER MAINTENANCE DIAGNOSIS
`
3) The Normal Distribution: The normal distribution is a symmetrical two-parameter distribution characterized by the mean and the standard deviation. Its extensive popularity in statistical applications stems from the fact that many measurements exhibit normal behavior. The normal cumulative distribution function is

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx. \tag{12}$$

The normal parameters in (12) may be estimated by using maximum likelihood estimation. The resultant estimates of the mean and the variance, which are denoted as $\hat{\mu}$ and $\hat{\sigma}^2$, respectively, are [17]

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} t_i \tag{13}$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(t_i - \hat{\mu}\right)^2 \tag{14}$$

where the $t_i$ are the $n$ observed times between failures.

The evidential belief in this case is computed from (12). The corresponding Dempster–Shafer plausibility is then calculated from (9).
`In this diagnostic system, the user is able to select which
`probability distribution best describes the reliability behavior
`of each component. Applying this methodology to the Plasma
Therm RIE system yields a ranked list of component faults
`similar to that shown in Table II.
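
For the normal case, the maximum likelihood estimates and the cumulative distribution function can be sketched in Python using only the standard library. The variance estimate divides by `n` (the maximum likelihood form) rather than `n - 1`; the sample data and function names are hypothetical.

```python
import math


def normal_mle(samples):
    """Maximum likelihood estimates of the mean and variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


def normal_cdf(t, mean, variance):
    """Normal cumulative distribution function, evaluated via the error function."""
    return 0.5 * (1.0 + math.erf((t - mean) / math.sqrt(2.0 * variance)))


# Hypothetical times between failures (hours).
mean, variance = normal_mle([2.0, 4.0, 6.0])
support_at_mean = normal_cdf(mean, mean, variance)
```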
`
`B. On-Line Diagnosis
`1) Recognized Fault Case: In diagnosing faults previ-
`ously encountered by the system, NTS models are used
`to describe raw tool data indicating specific fault patterns.
`The resemblance between stored NTS fault models and the
`pattern currently under examination is measured to ascertain
`a measure of their similarity. The underlying assumption is
`
`that an equipment malfunction is triggered by an inadvertent
process shift in one of the process settings. This shift is assumed
`to be larger than the variability inherent in the processing
`equipment. To ascribe evidential support and plausibility
`to such a shift, statistical hypothesis tests are applied to
`sample means and variances of the time series data. This
`again requires the assumption that the notion of statistical
`confidence is analogous to the Dempster-Shafer concept of
`plausibility [9]. Equivalently, this means that the significance
`of a given statistical hypothesis test is equal to one minus the
`plausibility of a given event.
`2) Hypothesis Test on Pattern Means: To compare two
`(potentially faulty) data patterns, we first make the assumption
`that if the two patterns are the same (or similar), then their
`means and variances are both similar. Further, it is assumed
`that an equipment malfunction may cause either a shift in
`the mean or variance of a given sensor signal. To apply a
`hypothesis test to assess the similarity of two data patterns,
`the mean and variance of each pattern must first be computed.
We begin by testing the hypothesis that the mean value ($\bar{x}_1$) of the current fault pattern equals the mean ($\bar{x}_2$) of previously stored fault patterns. The quantity $\bar{x}_2$ is calculated using the predictions from the stored NTS fault models. Letting $s_1^2$ and $s_2^2$ be the sample variances of the current pattern and stored pattern, the appropriate test statistic is the $t$-statistic [17], which is given by

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{15}$$

where $n_1$ and $n_2$ are the sample sizes for the current and stored pattern, respectively. The statistical significance level for this hypothesis test ($\alpha_1$) satisfies the relationship $t_0 = t_{\alpha_1, \nu}$, where $\nu$ is the number of degrees of freedom. After the significance level has been computed, the probability that the mean values of the two data patterns are equal is equal to $1 - \alpha_1$.
3) Hypothesis Test on Pattern Variances: Under the assumption that two similar data patterns will have similar variances, we may also test the hypothesis that the variance of the fault pattern currently being analyzed ($\sigma_1^2$) equals the variance of each of the stored patterns ($\sigma_2^2$). The appropriate test statistic is now the $F$-ratio [17], defined as

$$F_0 = \frac{s_1^2}{s_2^2} \tag{16}$$

where $s_1^2$ and $s_2^2$ are the sample variances for the current fault pattern and the stored patterns, respectively. The statistical significance level for this hypothesis test ($\alpha_2$) satisfies the relationship

$$F_0 = F_{\alpha_2, \nu_1, \nu_2} \tag{17}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the degrees of freedom for $s_1^2$ and $s_2^2$, respectively. The parameters $n_1$ and $n_2$ are the numbers of samples collected for the current fault pattern and the stored patterns, respectively. The resultant probability of equal variances is $1 - \alpha_2$.
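
The two test statistics can be computed with the Python standard library as sketched below. The sketch assumes the unpooled-variance form of the $t$-statistic (each sample variance divided by its own sample size), which is one common choice and may differ from the exact form used by the authors. Converting the statistics into significance levels requires the $t$ and $F$ distributions, which are not in the standard library and are left out here. The data are hypothetical.

```python
import math
import statistics


def t_statistic(current, stored):
    """Difference of sample means scaled by its estimated standard error."""
    n1, n2 = len(current), len(stored)
    var1 = statistics.variance(current)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(stored)
    return (statistics.mean(current) - statistics.mean(stored)) / math.sqrt(
        var1 / n1 + var2 / n2
    )


def f_ratio(current, stored):
    """Ratio of the two sample variances."""
    return statistics.variance(current) / statistics.variance(stored)


# Hypothetical sensor patterns: the current one is shifted by one unit.
current_pattern = [1.0, 2.0, 3.0, 4.0, 5.0]
stored_pattern = [2.0, 3.0, 4.0, 5.0, 6.0]
t0 = t_statistic(current_pattern, stored_pattern)
f0 = f_ratio(current_pattern, stored_pattern)
```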
`
`
After the two hypothesis tests for equal mean and variance have been completed, the evidential support and plausibility that the current sampled pattern is similar to a previously stored pattern are defined as

Support
Plausibility

(18)

Using the rules of evidence combination outlined above, the support and plausibility of a particular pattern generated at each time point are integrated with their prior values for each sensor, thereby updating them continuously and providing the mechanism for real-time diagnosis.

To demonstrate this approach, sensor data corresponding to the faulty CHF3 flow shown in Fig. 3 was used to derive an NTS model for this fault pattern. The training set for the NTS models consisted of one out of every ten data samples. The NTS fault model is assumed to be stored in a database, from which it is compared to other patterns collected by sensors in real-time so that the similarity of the sensor data to this stored pattern can be evaluated. In this example, the pattern of CHF3 flow under consideration as a potential match to the stored fault pattern was sampled once for every 15 sensor data points. Using the sample mean and variance of each pattern, $t_0$ and $F_0$ were calculated using (15) and (16). After evaluating the data, the evidential support and plausibility for pattern similarity are shown in Fig. 7.

Fig. 7. Plot of real-time support and plausibility for a recognized gas flow fault.

4) Unrecognized Fault Case: In order to identify malfunctions which have not been previously encountered, May and Spanos established a technique based on the CUSUM control chart [9]. The technique allows the detection of very small process shifts, which is critical for fabrication steps such as reactive ion etching, where slight equipment miscalibrations may only have sufficient time to manifest themselves as small shifts when the total processing time is on the order of minutes. CUSUM charts monitor such shifts by comparing the cumulative sums of the deviations of the sample values from their targets. This is accomplished by means of the moving “V-mask” (see Fig. 8).

Fig. 8. Typical CUSUM control chart depicting the V-mask and scaling parameters.

Using this approach to generate evidential support requires the cumulative sums

$$S_H(i) = \max\left[0,\ \bar{x}_i - (\mu_0 + b) + S_H(i-1)\right] \tag{19}$$
$$S_L(i) = \max\left[0,\ (\mu_0 - b) - \bar{x}_i + S_L(i-1)\right] \tag{20}$$

where $S_H$ is the sum used to detect positive process shifts, $S_L$ is the sum used to detect negative shifts, $\bar{x}_i$ is the mean value of the current sample, and $\mu_0$ is the target value. The initial value of both $S_H$ and $S_L$ is set to zero. Both sums accumulate deviations from the target value greater than $b$, and both reset to zero upon becoming negative. The parameter $b$ is given by
`
$$b = 2\sigma_{\bar{x}} \tan\theta \tag{21}$$

where $\sigma_{\bar{x}}$ is the standard deviation of the sampled variable and $\theta$ is the aspect angle of the V-mask. This angle has been selected to detect one-sigma process shifts with 95% probability. The chart has an average run length of 50 wafers between alarms when the process is in control [17].
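
The two one-sided cumulative sums described above (one for upward shifts, one for downward shifts, each accumulating only deviations beyond a slack value and resetting at zero) can be sketched in Python as follows. The slack `b` and the alarm threshold `h` are passed in as plain numbers; how they are derived from the V-mask geometry is outside this sketch, and the sample data are hypothetical.

```python
def cusum(sample_means, target, b, h):
    """Run the upper and lower cumulative sums over a sequence of sample means.

    Returns a list of (s_high, s_low, alarm) tuples, one per sample, where
    alarm is True when either sum exceeds the threshold h.
    """
    s_high = 0.0
    s_low = 0.0
    trace = []
    for x in sample_means:
        s_high = max(0.0, x - (target + b) + s_high)  # positive shifts
        s_low = max(0.0, (target - b) - x + s_low)    # negative shifts
        trace.append((s_high, s_low, s_high > h or s_low > h))
    return trace


# Hypothetical data: the process mean steps from 10.0 to 11.0 at the third sample.
trace = cusum([10.0, 10.0, 11.0, 11.0, 11.0], target=10.0, b=0.5, h=1.0)
first_alarm = next((i for i, (_, _, alarm) in enumerate(trace) if alarm), None)
```

In this example the upper sum grows by 0.5 per sample after the step and crosses the threshold at the fifth sample (index 4), while the lower sum stays at zero.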
When either $S_H$ or $S_L$ exceeds the decision interval ($h$), this signals that the process has shifted out of statistical control. The decision interval is

$$h = 2 d\, \sigma_{\bar{x}} \tan\theta \tag{22}$$

where $d$ is the V-mask lead distance. The decision interval may be used as the process tolerance limit and the sums $S_H$ and $S_L$ are to be treated as measurement residuals. Thus, numerical
`support is derived from the