`
`IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 8, NO. 3, AUGUST 1995
`
`Prediction of Wafer State After Plasma
`Processing Using Real-Time Tool Data
`
`Sherry F. Lee, Member, IEEE, and Costas J. Spanos, Member, IEEE
`
`
`Abstract- Empirical models based on real-time equipment
`signals are used to predict the outcome (e.g., etch rates and
`uniformity) of each wafer during and after plasma processing.
`Three regression and one neural network modeling methods were
`investigated. The models are verified on data collected several
`weeks after the initial experiment, demonstrating that the models
`built with real-time data survive small changes in the machine due
`to normal operation and maintenance. The predictive capability
`can be used to assess the quality of the wafers after processing,
`thereby ensuring that only wafers worth processing continue
`down the fabrication line. Future applications include real-time
`evaluation of wafer features and economical run-to-run control.
`
`I. INTRODUCTION
WITH INCREASING worldwide competition and escalating factory costs, companies are continuously improving their manufacturing skills to maintain high yield, increase throughput, and reduce the cost of equipment ownership on the manufacturing line. A key element in achieving these goals is monitoring the equipment to ensure that the semiconductor wafers are processed properly at each step. However, the cost, in both dollars and throughput, of measuring every wafer after every step is prohibitive in semiconductor factories running hundreds of manufacturing steps. Current practice is to measure monitor wafers periodically, perhaps at the start of each work shift, after performing maintenance, or after changing the machine settings. Even with monitor wafers, however, subsequent production wafers may still be processed improperly. Thus, instead of equipment faults being detected early in the process flow, the wafer yield loss they cause is usually discovered very late in the processing line.
We propose to use empirical models based on real-time equipment data to predict the outcome of each wafer immediately after it is processed by each piece of equipment [1]. This reduces the need for costly and time-consuming wafer measurements. Because the quality of each wafer is known immediately after processing, important yield information becomes available in time to ensure that only wafers worth processing continue down the line. By predicting the wafer characteristics, significant cost reduction is possible, thus lowering the overall cost of equipment ownership [2].

Manuscript received August 24, 1994; revised March 31, 1995. This work was supported by the SRC under Grant 93-MP-700, 94-MY-700, Digital Equipment Corporation, and Lam Research.
S. Lee was with the Department of Electrical Engineering & Computer Sciences, University of California at Berkeley, CA 94720 USA. She is now with Motorola, Irvine, CA 92718 USA.
C. Spanos is with the Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720 USA.
IEEE Log Number 9412874.

We verify this general prediction methodology on a plasma etcher, one of the costliest pieces of equipment in the semiconductor fabrication line. Not only is the etcher usually a bottleneck piece of equipment, but the scrap it produces can also be extremely costly. Furthermore, empirical models are appropriate because the etching mechanisms are not well understood. Although there is a strong push to develop models that relate the plasma to the output characteristics of interest on the wafer from basic physical principles, first-principles models are several years away from becoming useful on the factory floor [3]-[5]. Thus, at this time empirical models are faster and more practical for wafer state prediction.
`To provide useful prediction capabilities, robust prediction
`models of the plasma etchers are required. The industry
`standard is to use response surface methodology (RSM) to
`build models relating the input settings of the etchers to
`the output wafer state (Fig. 1). Models using input settings,
`however, may become unusable with time as the machine drifts
`with regular use, rendering them ineffective for prediction.
`Recently there has been much interest in using real-time tool
data for modeling purposes. Wangmaneerat [6] used partial least squares regression to model the etch rate of silicon nitride thin-film systems with optical emission spectroscopy
`(OES) signals. More recently, Anderson et al. [7] demonstrated
`that spatially resolved OES signals are effective in modeling
`plasma etch rates, selectivities, and uniformity, also using
`partial least squares regression. Neither work, however, has
`shown prediction capabilities by testing the models on data
`not used to build the original models. Rietman and Lory [8]
`have shown that neural networks can be used to model wafer
`attributes using a combination of real-time tool data and input
`setting data. The output of the model was the final oxide
`thickness in the source and drain regions of CMOS devices.
`The inputs to the model included input settings such as applied
`RF power, chamber pressure, gas flow rates, and real-time
`data such as induced dc bias, reflected RF power, and the
`emission spectrum, as well as the etch time. The resulting
`neural network models were tested using data not used to
`build the model. This testing data, however, was not separated
`in time from the original experiment, so it did not necessarily
`test the model's ability to withstand normal equipment drifts
`due to use over time.
`This paper shows that successful wafer state prediction
`over long periods of time can be achieved by using the real-
`time data from key sensors inside the equipment. Because
`
0894-6507/95$04.00 © 1995 IEEE
`
`Authorized licensed use limited to: LEHIGH UNIVERSITY. Downloaded on July 12,2021 at 00:56:00 UTC from IEEE Xplore. Restrictions apply.
`
`Applied Materials, Inc. Ex. 1028
`Applied v. Ocean, IPR Patent No. 6,836,691
`Page 1 of 10
`
`
`
`LEE AND SPANOS: PREDICTION OF WAFER STATE AFTER PLASMA PROCESSING
`
`
TABLE I
REAL-TIME STATE SIGNALS COLLECTED FOR THE LAM RAINBOW 4400

Lamstation Software
RF Load Coil Position
RF Tune Vane Position

Comdel RPM-1
RF Power
RF Voltage

DC Bias
Endpoint
DC Bias
`
TABLE II
DESCRIPTION OF THE REAL-TIME SIGNALS

Signal | Description
RF Tune Vane Position | Position of the tune vane in the matching network of the upper electrode; acts as a variable capacitor
RF Load Coil Position | Position of the load coil in the matching network of the upper electrode; acts as a variable inductor
RF Load Impedance | Apparent input impedance of the matching network
RF Phase Error | Phase error between the current and voltage (ideally 90°) at the upper electrode
DC Bias | Measures the potential difference of the electrodes
Peak-to-Peak Voltage | Magnitude of voltage on the electrodes
End Point Data | Reads the intensity of the plasma in the chamber at a particular wavelength
RF Voltage | Root-mean-square (RMS) voltage at the upper electrode
RF Current | RMS current at the upper electrode
`
[Fig. 1 diagram; input settings shown: pressure, power, gas flow 1, gas flow 2, gap.]
Fig. 1. Wafer state prediction: This paper shows that Chamber State Based (CSB) models, which map the chamber state data to the output state, are effective for prediction of wafer state, even in the presence of equipment aging.
`
`these real-time signals provide important information about the
`chamber state, we call the signals chamber state data. Models
`built with chamber state data, called CSB models, are effective
`for prediction since the chamber state data reflects the actual
`(as opposed to the intended) state of the equipment.
`To develop the prediction models, two sets of experiments
`were conducted. During the experiments, both the input set-
`tings and the chamber state data were collected. The wafer
`states of interest are the etch rates, selectivity, and uniformity.
`The first experiment, called the training experiment, consists
`of a central composite design. The models using data from
`the training experiment relating the chamber state data to the
wafer states are called the training models. The second experiment, called the verification experiment, was conducted several
`weeks later to determine the actual prediction capability of the
`training models. Three types of regression modeling methods
`for prediction (ordinary least squares regression, principal
`component regression, and partial least squares regression)
`are explored. These regression models are also compared
`to models developed using simple neural networks. Neural
`networks are included in this study because they have emerged
as an effective modeling method for semiconductor manufacturing processes. In addition, it has been shown that neural networks yield superior prediction results compared to ordinary least squares regression using input settings [8]-[12].
`In this paper, we compare different regression techniques with
`a simple feed-forward neural network using real-time data. The
`prediction metric used to compare the models is determined
`by how well the training model predicts the wafer states of
`the verification experiment. This metric is a good measure
`of the actual predictive capability of the models because it
`is determined from runs performed much later in time which
`were not included in model generation.
`The goal of this paper, then, is to show that chamber state
`data collected while the machine is processing are well-suited
`for prediction of the wafer state. We also demonstrate the
`importance of the verification experiment and show how it
`helps determine the prediction capability of the models. The
`paper begins with a description of the chamber state signals
`used in the CSB models, followed by a discussion of the
`methodology and models used to determine the wafer state
`prediction capability of the models. Next is a description of
`the training and verification experiments. The modeling results
`are then discussed, followed by a brief discussion of future
`directions.
`
`
[Fig. 2 plots: (a) RF load coil position and (b) dc bias versus wafer number, wafers 1-12.]
Fig. 2. Real-time chamber state signals of (a) RF load coil position and (b) dc bias for different input conditions on 12 wafers. Wafers #4 and #5 have unstable real-time signals and are rejected as "bad" wafers [17]. Notice the large wafer-to-wafer variance compared to the within-wafer variance.
`
`for wafer #4 was unusually low due to the drop in RF power.
`Therefore, the run corresponding to wafer #4 was left out of
`the model training runs. As seen in Fig. 2, wafer #5 exhibited
unstable signals and was also rejected as an outlier.¹
Once the outliers have been removed, the time-series nature of the signals is not used further for prediction in this work. Instead, the wafer-to-wafer variability is mapped to the output wafer state. Excluding wafers #4 and #5, Fig. 2 shows that the wafer-to-wafer variance is much larger than the within-wafer variance. Therefore, the average value of each signal over each wafer can be used as the input to the prediction models built with the real-time signals. Each signal is averaged over the duration of the main etch step (after the native oxide breakthrough etch and before the overetch), which lasts approximately 30 s.
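The per-wafer feature extraction described above can be sketched as follows. This is a minimal illustration, assuming each signal is logged as (time, value) samples and that the start and end of the main etch step are known; the function name `main_etch_means`, the sample format, and the numbers are hypothetical and do not reflect the Lamstation or RPM-1 data formats.

```python
from statistics import mean


def main_etch_means(signals, t_start, t_end):
    """Average each real-time signal over the main-etch window [t_start, t_end].

    `signals` maps a signal name to a list of (time_s, value) samples. This
    logging format is a hypothetical stand-in, not the Lamstation or RPM-1
    file format. Samples taken during the breakthrough etch or the overetch
    fall outside the window and are ignored.
    """
    features = {}
    for name, samples in signals.items():
        window = [value for t, value in samples if t_start <= t <= t_end]
        if not window:
            raise ValueError(f"no samples for {name!r} inside the main-etch window")
        features[name] = mean(window)
    return features


# One wafer, two signals (made-up numbers); the main etch runs from t = 5 s to t = 35 s.
wafer = {
    "dc_bias": [(0, 0.0), (5, 210.0), (20, 214.0), (35, 212.0), (40, 150.0)],
    "rf_tune_vane": [(0, 10.0), (5, 48.0), (20, 50.0), (35, 52.0), (40, 30.0)],
}
features = main_etch_means(wafer, t_start=5, t_end=35)
print(features)  # {'dc_bias': 212.0, 'rf_tune_vane': 50.0}
```

Each wafer thus contributes one row of averaged signals to the model input matrix, which is the reduction the wafer-to-wafer versus within-wafer variance argument justifies.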
`Unlike the fixed input settings, the chamber state signals
`change with the state of the machine. This is illustrated in Fig.
`3, which shows the load impedance and RF tune vane position
`for the duration of six wafers processed at the same input
`settings. While the input settings are fixed for all six wafers,
`the chamber state signals vary for each etch, indicating that
`the chamber state data may give a more accurate description
`of the actual equipment state.
`Although the examples shown in this paper are based on
`data collected from the Lamstation and RPM-1 sensors, the
`methodology presented is general and can be applied to other
`types of sensor data. For example, data collected via optical
`emission spectroscopy can be used in exactly the same manner.
A current research topic is determining the sensor data set that most precisely describes the chamber state. At present, we have found the data collected from Lamstation and RPM-1 to be sufficient to demonstrate the power of this class of real-time tool data.
`
III. WAFER STATE PREDICTION METHODOLOGY
`This section outlines the basic advantages and disadvantages
`of the four modeling methods, and discusses the prediction
`metric used to compare the prediction capability of the models.
¹Although undetected by the machine, these errors were detected by the Berkeley real-time fault detection system, RTSPC [16], [17].
`
`Fig. 3. Real-time signals for six center point wafers during the duration of
`the main etch. Unlike the input settings, the real-time chamber state signals (a)
`load impedance and (b) RF tune vane position reflect changes in machine state.
`
`A. Modeling Methods
`The first method under discussion is ordinary least squares
`regression. Since this method results in poor prediction ca-
`pability when the modeling variables are correlated, other
`methods are investigated. Principal component regression and
`partial least squares regression can handle correlated data
`and have the added advantage that they can reduce the
`dimensionality of the model. Simple feed-forward error back
`propagation neural networks are also briefly discussed.
`1) Ordinary Least Squares Regression: The first regression
`method discussed is ordinary least squares regression (OLSR).
The equation for the linear regression model is²
$$\hat{y} = X\hat{\beta} \qquad (1)$$
where $\hat{y}$ ($n \times 1$) is the prediction of the response $y$, $X$ ($n \times p$) is the input matrix, and $\hat{\beta}$ is a $p \times 1$ vector of estimated model coefficients defined as
$$\hat{\beta} = (X'X)^{-1}X'y \qquad (2)$$
provided that $X'X$ is positive definite and can therefore be inverted. Throughout the paper, $n$ is the number of observations and $p$ is the number of model parameters.
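As a minimal sketch of (1) and (2), the pure-Python code below solves the normal equations $X'X\hat{\beta} = X'y$ by Gaussian elimination. The helper names and the toy data are illustrative assumptions; the intercept is handled, as an assumption, by a column of 1's in $X$.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Eq. (2): beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(X, beta):
    """Eq. (1): y_hat = X beta_hat."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # column of 1's plus one input
y = [1.0, 3.0, 5.0, 7.0]                               # exactly y = 1 + 2x
beta = ols(X, y)
print(beta, predict(X, beta))
```

Solving the normal equations mirrors (2) directly, but forming $X'X$ squares the condition number of $X$, so a QR- or SVD-based least squares solver is usually preferred in practice, especially under the multicollinearity discussed next.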
Prediction problems arise when the columns of $X$ exhibit multicollinearity, that is, when they are highly correlated. The main idea is that high correlation in $X$ leads to small eigenvalues of $X'X$,
`
²In this paper, boldface uppercase letters denote matrices. Lowercase boldface letters and Greek letters with an underscore denote column vectors. Scalars are denoted by lowercase letters. Transpose is denoted by (′).
`
`
which results in a high variance in both the estimated coefficients and the predicted responses. For example, let $\hat{y}_0 = x_0'\hat{\beta}$ be a predicted value. The variance of this predicted value can be written in terms of the eigenvalues $w_j$ and eigenvectors $v_j$ of $X'X$:
$$\operatorname{var}(\hat{y}_0) = \operatorname{var}(x_0'\hat{\beta}) = x_0'\operatorname{cov}[\hat{\beta}, \hat{\beta}]\,x_0 = \sigma^2 x_0'\Big(\sum_{j=1}^{p}\frac{v_j v_j'}{w_j}\Big)x_0 = \sigma^2\sum_{j=1}^{p}\frac{(x_0'v_j)^2}{w_j} \qquad (3)$$
where $\operatorname{cov}[y, y] = \sigma^2 I_n$. Equation (3) shows that the variance of the predicted values depends on both the values of the eigenvalues and the direction of the input $x_0$. The variance will be large for small eigenvalues and large values of $x_0'v_j$. The consequence of large variances in the predicted values is that the error in the prediction can potentially be huge. Thus, when the columns of $X$ exhibit multicollinearity, the prediction capability of the model can be very poor.
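To make (3) concrete, the sketch below evaluates $\sigma^2\sum_j (x_0'v_j)^2/w_j$ for a two-column $X$, using the closed-form eigen-decomposition of a symmetric $2\times 2$ matrix. The data are invented for illustration: one $X$ has uncorrelated columns, the other has nearly collinear columns, and $\sigma^2 = 1$ is assumed.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs (w_j, v_j) of the symmetric matrix [[a, b], [b, d]], with unit v_j."""
    if b == 0.0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    pairs = []
    for w in (mid + rad, mid - rad):
        vx, vy = b, w - a  # satisfies (A - wI)v = 0 whenever b != 0
        norm = math.hypot(vx, vy)
        pairs.append((w, (vx / norm, vy / norm)))
    return pairs


def prediction_variance(X, x0, sigma2=1.0):
    """Eq. (3): var(y0_hat) = sigma^2 * sum_j (x0' v_j)^2 / w_j for a two-column X."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    total = 0.0
    for w, (v1, v2) in eig_sym_2x2(a, b, d):
        proj = x0[0] * v1 + x0[1] * v2
        total += proj * proj / w
    return sigma2 * total


X_orth = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # uncorrelated columns
X_coll = [[1.0, 1.0], [1.0, 1.01], [1.0, 0.99], [1.0, 1.0]]  # nearly collinear columns
x0 = (1.0, -1.0)  # a new input pointing along the poorly determined direction
print(prediction_variance(X_orth, x0), prediction_variance(X_coll, x0))  # 1.0 and roughly 2.0e4
```

With the same new input $x_0$, the nearly collinear design yields a prediction variance roughly four orders of magnitude larger, because the smallest eigenvalue of its $X'X$ is about $10^{-4}$.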
`2) Principal Component Regression: Principal component
`regression (PCR) addresses the problem of multicollinearity.
`When building models with real-time data, it is common to
`have large numbers of correlated input parameters X. This
`number can easily escalate to an almost unmanageable number
`when interactions are included. For example, in this paper 13
`main signals are collected, resulting in 90 model variables
`when all the corresponding two-way interactions are included.
`Because many of the signals are correlated, not all 90 variables
`should (or can) be used independently in a model.
To eliminate the correlation among the input variables, PCR transforms the correlated input variables into a set of orthogonal variables. The transformed variables $Z$, known as the principal components (PC's), are linear combinations of the original variables. The values of these PC's are called the scores. The coefficients of the original variables, or loadings, are the eigenvectors $V$ of $X'X$. The equation for the transformed variables $Z$ is
$$Z = (X - 1\bar{x}')V \qquad (4)$$
where $\bar{x}$ is the vector of average values of each variable in $X$ and $1$ is a column vector of 1's.
All or a subset of the PC's can be used as the input matrix for regression. Because the PC's are orthogonal, there are no multicollinearity problems, and standard least squares techniques can be employed. The resulting model is
$$\hat{y} = Z\hat{\gamma} \qquad (5)$$
where $\hat{\gamma}$ is the estimate of the coefficients given by
$$\hat{\gamma} = (Z'Z)^{-1}Z'y.$$
Because much of the variability can be captured in a subset of the PC's, PCR reduces the model to its most dominant factors. The subset of statistically significant PC's in the model is determined by computing Student's t statistic for each of the coefficients. Only those PC's with statistically significant coefficients at a specified level are retained in the model (a 0.05 significance level is used in the examples of Section V).
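A compact sketch of (4) and (5) follows, written in pure Python so that it is self-contained. It assumes that $X'X$ is formed from mean-centered $X$ and that $y$ is mean-centered as well, computes the loadings $V$ with cyclic Jacobi rotations (an eigen-solver swapped in for brevity, not necessarily what the authors used), and reports a t statistic for each PC coefficient. The data, function names, and critical value are illustrative.

```python
import math


def jacobi_eigh(a, max_sweeps=100, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) sorted by decreasing eigenvalue; column k of V is
    the unit eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[r][i] for i in order] for r in range(n)]


def pcr_fit(X, y, n_components):
    """Eqs. (4)-(5): regress mean-centred y on the leading principal-component scores."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    xtx = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    _, V_all = jacobi_eigh(xtx)
    V = [row[:n_components] for row in V_all]  # loadings of the retained PCs
    Z = [[sum(r[j] * V[j][k] for j in range(p)) for k in range(n_components)] for r in Xc]  # Eq. (4)
    yc = [yi - ybar for yi in y]
    ztz = [sum(z[k] ** 2 for z in Z) for k in range(n_components)]  # Z'Z is diagonal
    gamma = [sum(z[k] * yk for z, yk in zip(Z, yc)) / ztz[k] for k in range(n_components)]
    resid = [yk - sum(g * zk for g, zk in zip(gamma, z)) for z, yk in zip(Z, yc)]
    s2 = sum(e * e for e in resid) / (n - n_components - 1)  # residual variance
    t_stats = [g / math.sqrt(s2 / zz) for g, zz in zip(gamma, ztz)]
    return {"xbar": xbar, "ybar": ybar, "V": V, "Z": Z, "gamma": gamma, "t": t_stats}


X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]  # two highly correlated inputs
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = pcr_fit(X, y, n_components=2)
T_CRIT = 4.303  # two-sided 0.05 critical value of Student's t with 5 - 2 - 1 = 2 degrees of freedom
keep = [k for k, t in enumerate(fit["t"]) if abs(t) > T_CRIT]
print(fit["t"], keep)
```

Because $Z'Z$ is diagonal, each $\hat{\gamma}_k$ reduces to $z_k'y / z_k'z_k$, so dropping a PC does not change the coefficients of the others. In this toy example only the first PC passes the 0.05 test.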
While PCR decreases the number of parameters in the model, each model parameter is still a linear combination of the input variables. Ideally, those input variables in $X$ that do not contribute significantly to the model should be left out. With such large numbers of input variables, however, it is often very difficult to determine which of them simply add noise to the model and which are significant. An empirical method we developed to obtain "streamlined" models is to transform the PCR model back to the input space of $X$. Assuming that the model is of the form in (5) and using (4) to substitute for $Z$,
$$\hat{y} = (X - 1\bar{x}')V\hat{\gamma} = XV\hat{\gamma} - 1\bar{x}'V\hat{\gamma} = X\hat{\beta} - 1\bar{x}'\hat{\beta} \qquad (6)$$
where $\hat{\beta} = V\hat{\gamma}$. The general rule of thumb we found was to eliminate those input parameters whose $\hat{\beta}$ values are at least an order of magnitude smaller than the average of the largest $\hat{\beta}$ values. The PCR model is then regenerated with the reduced set of input parameters, using Student's t test to assess the significance of the new PC's. The input parameter space is reduced further in this way until the model prediction no longer improves. (An effective metric for judging prediction is described in Section III-B.) This simple yet effective empirical method handles large numbers of input parameters very easily.
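The back-transformation (6) and the pruning rule of thumb can be sketched as below. The number of "largest" coefficients that are averaged is not specified above, so `top_k` is a hypothetical parameter here, as are the loadings, signal names, and numbers. Comparing raw $\hat{\beta}$ magnitudes is only meaningful if the inputs are on comparable scales, so this sketch assumes the inputs were scaled before fitting.

```python
def to_input_space(V, gamma, xbar):
    """Eq. (6): beta_hat = V gamma_hat, plus the constant term -xbar' beta_hat."""
    p, m = len(V), len(gamma)
    beta = [sum(V[j][k] * gamma[k] for k in range(m)) for j in range(p)]
    constant = -sum(xb * b for xb, b in zip(xbar, beta))
    return beta, constant


def streamline(names, beta, top_k=3, factor=10.0):
    """Keep inputs whose |beta| is at most `factor` times smaller than the mean |beta|
    of the top_k largest coefficients (factor=10 is one order of magnitude).
    top_k is a hypothetical choice; the rule of thumb does not fix it."""
    mags = sorted((abs(b) for b in beta), reverse=True)[:top_k]
    reference = sum(mags) / len(mags)
    return [name for name, b in zip(names, beta) if abs(b) >= reference / factor]


# Hypothetical loadings for three inputs and two retained PCs.
V = [[0.8, 0.1], [0.6, -0.2], [0.01, 0.02]]
gamma = [2.0, 1.0]
beta, constant = to_input_space(V, gamma, xbar=[1.0, 1.0, 1.0])
print(beta, constant)
print(streamline(["dc_bias", "rf_tune_vane", "rf_phase_error"], beta, top_k=2))
```

In a full loop, the surviving inputs would be refit with PCR and the cycle repeated while the prediction metric of Section III-B keeps improving.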
`3) Partial Least Squares Regression: The last regression
`modeling technique under discussion is partial least squares re-
`gression (PLSR). This method is widely used in chemometrics,
`a field of chemistry that uses statistical methods for chemical
`data analysis [18]. The general idea of the PLSR algorithm
`is similar to that of PCR. A reduced set of parameters that
`sufficiently describe the input data is found and then used as
`the regressors on Y. The notion of factor loadings and scores
`introduced in the context of PCR is also used in PLSR. Instead
`of one set of loadings as was the case in PCR, two sets are
`used in PLSR, one for the input matrix and another for the
`response. The algorithm for one response follows.
Let $A_{\max}$ be the maximum number of PLSR factors. At the start of the algorithm, $A_{\max}$ should be larger than anticipated to allow for unexpected factors. The following steps are then performed for each factor $a = 1, 2, \ldots, A_{\max}$ [18]-[20]:
1) Determine the loading weight vector $w_a$:
$$w_a = \frac{X_{a-1}'y_{a-1}}{\lVert X_{a-1}'y_{a-1} \rVert}.$$
The loadings $w_a$ are orthonormal vectors which maximize the covariance between $X_{a-1}$ and $y_{a-1}$. In other words, $W = (w_1, w_2, \ldots, w_A)$ relates the input and response, and is used to calculate the response in the model.
2) Estimate the scores $t_a$:
$$t_a = X_{a-1}w_a.$$
`
`
$t_a$ indicates how much of the response is correlated with the input data, and $T = (t_1, t_2, \ldots, t_A)$ is the reduced set of orthogonal scores that are used as regressors for $y$. Orthogonal vectors are necessary to deal with the problem of multicollinearity.
3) Estimate the input loadings $p_a$:
$$p_a = \frac{X_{a-1}'t_a}{t_a't_a}.$$
$P = (p_1, p_2, \ldots, p_A)$ is similar to the eigenvector matrix $V$ in PCR, in that it consists of the loadings for the input. Although $P$ is chosen to ensure that the $t_a$ vectors are orthogonal, the $p_a$ vectors are generally not orthogonal. Unlike the loadings in PCA, the first $p_a$ vector does not explain the maximum variance in the input matrix; rather, it explains as much variance as possible while correlating with the response.
4) Estimate the response loadings $q_a$:
$$q_a = \frac{y_{a-1}'t_a}{t_a't_a}.$$
$q = (q_1, q_2, \ldots, q_A)'$ is the additional loading term which brings the response into the model. It relates the score $t_a$ to the response, minimizing the residual sum of squares of the response. Note that the $q_a$ are scalars since this model is for one response.
5) Create the new residuals $E$ and $f$ by subtracting the estimated values found in the previous steps from the actual values:
$$E = X_{a-1} - t_a p_a', \qquad f = y_{a-1} - t_a q_a.$$
The product $t_a p_a'$ estimates the input matrix, while the product $t_a q_a$ estimates the response. Replace $X_{a-1}$ and $y_{a-1}$ by the new residuals and increment $a$: $X_a = E$, $y_a = f$, and $a = a + 1$. Go back to Step 1.
6) Once the number ($A$) of valid PLSR factors is determined, the estimates of the coefficients to be used in the prediction model $\hat{y} = 1\hat{b}_0 + X\hat{b}$ are
$$\hat{b} = W(P'W)^{-1}q \quad \text{and} \quad \hat{b}_0 = \bar{y} - \bar{x}'\hat{b}. \qquad (7)$$
Using (7) as an estimate of the coefficients, the same type of "streamlining" method described for PCR to reduce the number of input parameters can also be applied to PLSR.
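Steps 1)-6) can be sketched in pure Python as below, for a single response. As an assumption consistent with (7), $X$ and $y$ are mean-centered before the first factor, and the loop stops early if the residual response has no remaining covariance with the inputs; the function names and data are illustrative.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def pls1(X, y, a_max):
    """Single-response PLSR on mean-centred data; returns (b0, b) as in Eq. (7)."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    E = [[row[j] - xbar[j] for j in range(p)] for row in X]  # X_0
    f = [yi - ybar for yi in y]                              # y_0
    W, P, q = [], [], []
    for _ in range(a_max):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # step 1: X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain: stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # step 2: t = Xw
        tt = sum(v * v for v in t)
        pa = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # step 3
        qa = sum(f[i] * t[i] for i in range(n)) / tt                        # step 4
        E = [[E[i][j] - t[i] * pa[j] for j in range(p)] for i in range(n)]  # step 5
        f = [f[i] - t[i] * qa for i in range(n)]
        W.append(w)
        P.append(pa)
        q.append(qa)
    A = len(q)
    ptw = [[sum(P[r][j] * W[c][j] for j in range(p)) for c in range(A)] for r in range(A)]
    u = solve(ptw, q)                                              # (P'W)^{-1} q
    b = [sum(W[k][j] * u[k] for k in range(A)) for j in range(p)]  # Eq. (7): b = W (P'W)^{-1} q
    b0 = ybar - sum(xb * bj for xb, bj in zip(xbar, b))            # Eq. (7): b0 = ybar - xbar' b
    return b0, b


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [1.0 + 2.0 * x1 - x2 for x1, x2 in X]  # exact linear response
b0, b = pls1(X, y, a_max=2)
print(b0, b)  # approximately 1.0 and [2.0, -1.0]
```

With as many factors as linearly independent inputs, PLSR reproduces the ordinary least squares fit, which is why the exact linear data are recovered here; the benefit of PLSR comes from stopping at a smaller $A$.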
4) Feed-Forward Error Backward Propagation Neural Networks: The last modeling method investigated is neural networks, which are useful for modeling complex relationships,
`such as the plasma etching process. Furthermore, the form of
`the models is derived from the actual data, and not set a priori
`as is done for regression. Neural networks, however, do not
`
provide information about the physics of the processes [8], [9], [11].
`Neural network models are empirically-based models which
`train a combination of “neurons,” or nodes, to learn and
`model relationships between a set of inputs and outputs. The
`connections among the nodes are weighted. In this application,
`one hidden layer was used, making a total of three layers in
`the network. The connections are between the input nodes
`and the hidden nodes, and between the hidden nodes and the
`output nodes. No bias was applied to the first layer. The output
function for the remaining layers is the "squashing" activation function of the form $f(x) = 1/(1 + e^{-x})$, where $x$ is the sum of the weighted outputs of the nodes preceding this particular node.
The neural network algorithm selected for this analysis is the feed-forward, error backward propagation (FFEBP) method, which has been shown to be effective in modeling noisy input and output data [9]-[11]. In this algorithm, the inputs are fed forward through the layers of the network until they reach the output layer. The result at output node $j$ is compared with the desired, or training, output. The difference, called the error, is used with the output of node $i$ in the neighboring layer to calculate the new weight of the connection between node $i$ and node $j$. These errors are then used to calculate the weight changes for the connections between the input and hidden units. Because the weight corrections depend upon the corrections previously computed for the neighboring layer, the error is in effect propagated backward through the network [21]. In the FFEBP method, gradient search is used to minimize the sum of the squared errors [22]. A more thorough description of the algorithm can be found in the review paper by Widrow and Lehr [23].
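A toy version of the three-layer FFEBP network described above is sketched below, with the sigmoid activation and batch gradient descent on the sum of squared errors. It is not SNNS. Whether the hidden and output nodes carry bias terms is not stated beyond "no bias was applied to the first layer," so the biases here are an assumption, as are the class name, layer sizes, learning rate, initialization, and toy data.

```python
import math
import random


def sigmoid(x):
    """Squashing activation f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyFFEBP:
    """Inputs -> one sigmoid hidden layer -> one sigmoid output, trained by batch backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # assumed hidden biases
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_o = rng.uniform(-0.5, 0.5)                               # assumed output bias

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w_ih, self.b_h)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w_ho, h)) + self.b_o)

    def sse(self, data):
        return sum((target - self.forward(x)[1]) ** 2 for x, target in data)

    def train_epoch(self, data, lr=0.5):
        n_h, n_in = len(self.w_ho), len(self.w_ih[0])
        g_ih = [[0.0] * n_in for _ in range(n_h)]
        g_bh, g_ho, g_bo = [0.0] * n_h, [0.0] * n_h, 0.0
        for x, target in data:
            h, o = self.forward(x)
            delta_o = (target - o) * o * (1.0 - o)  # error term at the output node
            for j in range(n_h):
                delta_h = delta_o * self.w_ho[j] * h[j] * (1.0 - h[j])  # error propagated backward
                g_ho[j] += delta_o * h[j]
                g_bh[j] += delta_h
                for i in range(n_in):
                    g_ih[j][i] += delta_h * x[i]
            g_bo += delta_o
        # Step along the negative gradient of 0.5 * (sum of squared errors).
        for j in range(n_h):
            self.w_ho[j] += lr * g_ho[j]
            self.b_h[j] += lr * g_bh[j]
            for i in range(n_in):
                self.w_ih[j][i] += lr * g_ih[j][i]
        self.b_o += lr * g_bo


# Toy data with targets scaled into (0, 1): two inputs, one smooth response.
data = [((x1, x2), 0.1 + 0.4 * x1 + 0.3 * x2) for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
net = TinyFFEBP(n_in=2, n_hidden=4, seed=1)
initial_error = net.sse(data)
for _ in range(2000):
    net.train_epoch(data)
final_error = net.sse(data)
print(initial_error, final_error)
```

Each epoch accumulates the backpropagated error terms over all training patterns before updating the weights (batch mode); updating after every pattern (online mode) is an equally common variant.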
`The Stuttgart Neural Network Simulator (SNNS) was used
`to simulate and train the neural networks [21]. The network
`learns the relationship between the input and output patterns
`as it undergoes learning iterations. To determine when to stop
training, the neural network model was applied to the verification data set. Training stopped when the error on this testing set reached its minimum. This is common practice to prevent overtraining, which reduces the generalization capability of the network model.
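The stopping rule can be sketched independently of the network. Since the lowest testing-set error can only be recognized after it has passed, this sketch assumes a patience rule (stop after a fixed number of epochs without improvement) and then rolls back to the best parameters; the function name is hypothetical, and the one-parameter "model" is a stand-in chosen so that its training and held-out optima differ.

```python
def train_with_early_stopping(step, val_error, get_state, set_state,
                              max_epochs=10_000, patience=3):
    """Call step() once per epoch and keep the parameters with the lowest validation error.

    `patience` (epochs without improvement before giving up) is an assumed
    stopping rule; the text only says training stopped at the lowest error
    on the testing set.
    """
    best_err, best_state, best_epoch = val_error(), get_state(), 0
    for epoch in range(1, max_epochs + 1):
        step()
        err = val_error()
        if err < best_err:
            best_err, best_state, best_epoch = err, get_state(), epoch
        elif epoch - best_epoch >= patience:
            break
    set_state(best_state)  # roll back to the best epoch
    return best_epoch, best_err


# Toy stand-in for a network: one parameter, training optimum 2.0, held-out optimum 1.5.
state = {"theta": 0.0}


def step():
    state["theta"] -= 0.1 * 2.0 * (state["theta"] - 2.0)  # gradient step on (theta - 2)^2


best_epoch, best_err = train_with_early_stopping(
    step,
    val_error=lambda: (state["theta"] - 1.5) ** 2,
    get_state=lambda: dict(state),
    set_state=state.update,
)
print(best_epoch, state["theta"])  # 6 and about 1.4757
```

Training continues to lower the training loss after epoch 6, but the held-out error rises, so the parameters from epoch 6 are the ones kept.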
`
`B. Testing the Prediction Capability of the Models
`This section describes the methodology used to determine
the prediction capability of the models. As stated in Section I, two sets of experiments were conducted: the first for model generation and the second for model verification. It is
`important to note that the two experiments were conducted
`several weeks apart, and that between the experiments the
`equipment underwent normal use and maintenance. The veri-
`fication experiment is used to determine if the training models
`can withstand small changes in the equipment that occur with
`time.
`The often neglected verification stage is one of the most
`important in prediction model building. In many modeling
`situations, the assumption is made that if the model has a good
`
`
fit (for example, a high adjusted R² and statistically significant
`terms), the model can be used for prediction. Unfortunately,
`this is not the case for plasma etchers on a production line.
`Because the machines go through regular maintenance and
`may drift with use, the model with the best fit based on one
`experiment conducted in a short time frame may not capture
`these changes in the machine. The model may also be too
`specific for the particular runs. These combined deficiencies
`result in unsatisfactory predictive capability. The verification
`experiment is designed to determine the best predictive model
`which takes into account normal equipment changes.
The prediction metric determining the best model is based on how well the training model predicts the verification wafers. Because the verification data were collected from the machine some time after the original experiment and were not included in model generation, the true prediction capability of the models can be gauged. The metric used is the standard error of prediction (SEP), where $y_i$ is the $i$th observation, $\hat{y}_i$ is the predicted value of the $i$th point, and $n$ is the number of observations in the experiment:
$$\mathrm{SEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \qquad (8)$$
Essentially, the SEP metric measures the spread of the difference between the predicted and actual values. The examples shown in Section V rate the different models according to their normalized SEP metrics.
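Equation (8) translates directly into code. The sketch below computes the unnormalized SEP; the normalization used in Section V is not described here, so none is applied. The function name and numbers are hypothetical.

```python
import math


def sep(observed, predicted):
    """Standard error of prediction, Eq. (8): root-mean-square of y_i - y_hat_i."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


# Hypothetical verification-wafer responses and a training model's predictions.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(sep(observed, predicted))  # sqrt(4/3), about 1.1547
```

Because the observations come from the verification experiment, a single badly predicted wafer dominates the metric, which is the behavior wanted when screening models for drift robustness.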
`
`IV. DESIGNED EXPERIMENT
This section describes the experiments conducted to obtain
`the real-time data sets used to develop and verify the pre-
`dictive methodology. First, the wafer test structure is briefly
`described, followed by a discussion of both the training and
`the verification experiments. Finally, the wafer measurements
`are described.
`
`A. Test Structure
`The test structure was designed so that all processes of
`interest were simultaneously obtained in the same etch step.
`Due to complex loading effects, this method results in more
`accurate etch rates and selectivities than etching blanket wafers
`individually [24]. A simplified view of the test structure
indicating the etched surfaces is shown in Fig. 4. First, a 600 Å thermal gate oxide is grown on the 4-in wafers, followed by 5500 Å of n+ doped polysilicon deposited via low pressure chemical vapor deposition (LPCVD). After a 20 min nitrogen anneal at 950 °C, 2800 Å of undoped low temperature oxide (LTO) is deposited via LPCVD. A three-step mask process is required to build the test structure.
`
`B. Experiment
`The plasma etcher used in this experiment is the Lam
`Rainbow 4400 polysilicon plasma etcher. The main etch
`chemistry was a chlorine and helium etch. In both the training
`and verification experiments, a fixed pre-etch recipe was used
`
[Fig. 4 diagram; layers labeled 5500 Å polySi and 600 Å gate oxide.]
Fig. 4. Test structure for the experiment.
`
TABLE III
CHANGE IN % FROM NOMINAL

Parameter | Training Phase I | Training Phase II | Verification
Pressure | ±15% | [illegible] | ±10%
Power | ±15% | ±22.5% | ±10%
Gap | ±11% | ±17% | ±10%
Flow Ratio | ±19% | [illegible] | ±10%
Total Flow | ±11% | [illegible] | ±10%
`
[Fig. 5 diagrams; panels (a)-(c) show the points covered in a three-variable input space.]
Fig. 5. Depiction of three input settings for the following: (a) Training Phase I; (b) Training Phase II; (c) Verification experiments.
`
`for all runs. The main etch recipe was modified according to
`a designed experiment described below. To obtain accurate
`etch rates the main etch was a timed etch, so no overetch
was performed. The input parameters varied in the experiment are the chamber pressure, RF forward power, electrode gap spacing, the ratio of Cl2 to He, and the total gas flow rate of Cl2 and He. The output wafer parameters of interest are the etch rate of polysilicon, the selectivity of polysilicon to oxide and to I-line positive photoresist, and the polysilicon uniformity across the wafer.
The training experiment consisted of two phases. Phase I, the variable screening stage, determined the statistically significant variables in the models. Phase II assessed the quadratic nature of the system via a star design. The input values used for all experiments are listed in Table III in terms of percent offset from the nominal values. Fig. 5 illustrates the points covered by each experiment for three of the variables in the input space. The particular values were chosen to cover a wide range of operating conditions of the machine.
The actual runs conducted, in order of execution, are listed in Tables IV and V. Of the 32 runs in both phases of the training experiment, five were eliminated before modeling due to unstable real-time signals or misprocessing.
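To illustrate how a two-level screening design maps onto the percent offsets of Table III, the sketch below builds a 16-run $2^{5-1}$ half fraction plus four center points and converts coded levels to settings. The generator $E = ABCD$ (defining relation $I = ABCDE$, which gives resolution V) is a standard choice assumed here; the run order, randomization, and blocking actually used are not reproduced, the function names are hypothetical, and the nominal values of 100 simply express each setting as a percentage of nominal.

```python
from itertools import product

# Phase I half-ranges in percent of nominal, as read from Table III.
PHASE_I_PCT = {"pressure": 15.0, "power": 15.0, "gap": 11.0, "flow_ratio": 19.0, "total_flow": 11.0}


def half_fraction_2_5_1():
    """16-run two-level 2^(5-1) design in coded units, generator E = ABCD (I = ABCDE)."""
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]


def to_settings(coded_run, nominal, pct=PHASE_I_PCT):
    """Coded level (-1, 0, +1) -> nominal * (1 + level * pct / 100) for each factor."""
    return {name: nominal[name] * (1.0 + level * pct[name] / 100.0)
            for name, level in zip(pct, coded_run)}


design = half_fraction_2_5_1() + [(0, 0, 0, 0, 0)] * 4  # 16 factorial runs + 4 center points
nominal = {name: 100.0 for name in PHASE_I_PCT}          # hypothetical: settings as % of nominal
for run in design[:3]:
    print(run, to_settings(run, nominal))
```

Under $I = ABCDE$, main effects are aliased only with four-factor interactions and two-factor interactions only with three-factor interactions, which is what resolution V means for an unblocked half fraction.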
Phase I consists of a two-level, 16-run fractional $2^{5-1}$ factorial design and 4 center points. The design is resolution V with no blocking, but drops to resolution III when blocked for time and split lots. The variable screening analysis was
`
`
TABLE IV
TRAINING EXPERIMENT CONDITIONS: PHASE I

Run 17: 425, 0.42, 275, 0.9, 540
Run 18: 489, 0.50, 316, 1.0, 600
Run 19: 489, 0.34, 316, 0.8, 600
Run 20: 489, 0.50, 234, 0.8, 600

TABLE V
TRAINING EXPERIMENT CONDITIONS: PHASE II

TABLE VI
VERIFICATION EXPERIMENT CONDITIONS

[Fig. 6 diagram; measurement points arranged on an inner ring and an outer ring.]
Fig. 6. Wafer measurement points.
`
etched in the system between experiments). The input settings for this experiment, varied one at a time, are listed in Table III. In addition, five center points were run. The runs conducted are listed in Table VI.
`
`C. Wafer Measurements
`In both experiments, film thickness measurements were
`taken by a Nanometrics Nanospec AFT system on 9 die per 4-
`in wafer. As shown in F