IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 8, NO. 3, AUGUST 1995

Prediction of Wafer State After Plasma Processing Using Real-Time Tool Data

Sherry F. Lee, Member, IEEE, and Costas J. Spanos, Member, IEEE
`
Abstract- Empirical models based on real-time equipment signals are used to predict the outcome (e.g., etch rates and uniformity) of each wafer during and after plasma processing. Three regression and one neural network modeling methods were investigated. The models are verified on data collected several weeks after the initial experiment, demonstrating that the models built with real-time data survive small changes in the machine due to normal operation and maintenance. The predictive capability can be used to assess the quality of the wafers after processing, thereby ensuring that only wafers worth processing continue down the fabrication line. Future applications include real-time evaluation of wafer features and economical run-to-run control.
`
I. INTRODUCTION
WITH INCREASING world-wide competition and escalating factory costs, companies are continuously improving their manufacturing skills to maintain high yield, increase throughput, and reduce the cost of equipment ownership on the manufacturing line. A key element in achieving these goals is to monitor the equipment to ensure that the semiconductor wafers are processed properly at each step. The cost in dollars and throughput of measuring each wafer after it completes each step, however, becomes prohibitive in semiconductor factories running hundreds of manufacturing steps. Present practice is to measure monitor wafers periodically, perhaps at the start of each work shift, after performing maintenance, or after changing the machine settings. Even with the use of monitor wafers, however, subsequent production wafers may still be processed improperly. Thus, rather than being detected early in the process flow, equipment faults that cause wafer yield loss are usually found very late in the processing line.
We propose to use empirical models based on real-time equipment data to predict the outcome of each wafer immediately after processing by each piece of equipment [1]. This will reduce the need for costly and time-consuming wafer measurements. The prediction ability allows the quality of the wafer to be known immediately after processing, thereby obtaining important wafer yield information to ensure that only wafers worth processing continue down the line. By predicting the wafer characteristics, significant cost reduction is possible, thus lowering the overall cost of equipment ownership [2].

Manuscript received August 24, 1994; revised March 31, 1995. This work was supported by the SRC under Grant 93-MP-700, 94-MY-700, Digital Equipment Corporation, and Lam Research.
S. Lee was with the Department of Electrical Engineering & Computer Sciences, University of California at Berkeley, CA 94720 USA. She is now with Motorola, Irvine, CA 92718 USA.
C. Spanos is with the Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720 USA.
IEEE Log Number 9412874.

We verify this general prediction methodology on a plasma etcher, one of the costliest pieces of equipment in the semiconductor fabrication line. Not only is the etcher usually a bottleneck piece of equipment, but the scrap produced by the etcher can also be extremely costly. Furthermore, empirical models are appropriate because the etching mechanisms are not well understood. Although there is a tremendous push to develop models relating the plasma to interesting output characteristics of the wafer based on basic physical principles, first principle models are several years away from becoming useful on the factory floor [3]-[5]. Thus, at this time empirical models are faster and more practical for wafer state prediction.
To provide useful prediction capabilities, robust prediction models of the plasma etchers are required. The industry standard is to use response surface methodology (RSM) to build models relating the input settings of the etchers to the output wafer state (Fig. 1). Models using input settings, however, may become unusable with time as the machine drifts with regular use, rendering them ineffective for prediction. Recently there has been much interest in using real-time tool data for modeling purposes. Wangmaneerat [6] used partial least squares regression to model the etch rate of silicon nitride thin-film systems with optical emission spectroscopy (OES) signals. More recently, Anderson et al. [7] demonstrated that spatially resolved OES signals are effective in modeling plasma etch rates, selectivities, and uniformity, also using partial least squares regression. Neither work, however, has shown prediction capabilities by testing the models on data not used to build the original models. Rietman and Lory [8] have shown that neural networks can be used to model wafer attributes using a combination of real-time tool data and input setting data. The output of the model was the final oxide thickness in the source and drain regions of CMOS devices. The inputs to the model included input settings such as applied RF power, chamber pressure, gas flow rates, and real-time data such as induced dc bias, reflected RF power, and the emission spectrum, as well as the etch time. The resulting neural network models were tested using data not used to build the model. This testing data, however, was not separated in time from the original experiment, so it did not necessarily test the model's ability to withstand normal equipment drifts due to use over time.
This paper shows that successful wafer state prediction over long periods of time can be achieved by using the real-time data from key sensors inside the equipment. Because
`
0894-6507/95$04.00 © 1995 IEEE

Authorized licensed use limited to: LEHIGH UNIVERSITY. Downloaded on July 12,2021 at 00:56:00 UTC from IEEE Xplore. Restrictions apply.

Applied Materials, Inc. Ex. 1028
Applied v. Ocean, IPR Patent No. 6,836,691
LEE AND SPANOS: PREDICTION OF WAFER STATE AFTER PLASMA PROCESSING
`
TABLE I
REAL-TIME STATE SIGNALS COLLECTED FOR THE LAM RAINBOW 4400

Lamstation Software      Comdel RPM-1
RF Load Coil Position    RF Power
RF Tune Vane Position    RF Voltage

DC Bias
Endpoint
`
TABLE II
DESCRIPTION OF THE REAL-TIME SIGNALS

Signal                   Description
RF Tune Vane Position    Position of the tune vane in the matching network of the upper electrode; acts as a variable capacitor
RF Load Coil Position    Position of the load coil in the matching network of the upper electrode; acts as a variable inductor
RF Load Impedance        Apparent input impedance of the matching network
RF Phase Error           Phase error between the current and voltage (ideally 90°) at the upper electrode
DC Bias                  Measures the potential difference of the electrodes
Peak-to-Peak Voltage     Magnitude of voltage on the electrodes
End Point Data           Reads the intensity of the plasma in the chamber at a particular wavelength
RF Voltage               Root-mean-square (RMS) voltage at the upper electrode
RF Current               RMS current at the upper electrode
`
[Fig. 1 diagram; recoverable labels: Input Settings (Pressure, Power, Gas Flow 1, Gas Flow 2, Gap)]
`
`Fig. 1. Wafer state prediction: This paper shows that Chamber State Based
`(CSB) models, which map the chamber state data to the output state, are
`effective for prediction of wafer state, even in the presence of equipment
`aging.
`
these real-time signals provide important information about the chamber state, we call the signals chamber state data. Models built with chamber state data, called CSB models, are effective for prediction since the chamber state data reflects the actual (as opposed to the intended) state of the equipment.
To develop the prediction models, two sets of experiments were conducted. During the experiments, both the input settings and the chamber state data were collected. The wafer states of interest are the etch rates, selectivity, and uniformity. The first experiment, called the training experiment, consists of a central composite design. The models using data from the training experiment relating the chamber state data to the wafer states are called the training models. The second experiment, called the verification experiment, was conducted several weeks later to determine the actual prediction capability of the training models. Three types of regression modeling methods for prediction (ordinary least squares regression, principal component regression, and partial least squares regression) are explored. These regression models are also compared to models developed using simple neural networks. Neural networks are included in this study because they have emerged as an effective modeling method for semiconductor manufacturing processes. In addition, it has been shown that neural networks result in superior prediction results compared to ordinary least squares regression using input settings [8]-[12]. In this paper, we compare different regression techniques with a simple feed-forward neural network using real-time data. The prediction metric used to compare the models is determined by how well the training model predicts the wafer states of the verification experiment. This metric is a good measure of the actual predictive capability of the models because it is determined from runs performed much later in time which were not included in model generation.
The goal of this paper, then, is to show that chamber state data collected while the machine is processing are well-suited for prediction of the wafer state. We also demonstrate the importance of the verification experiment and show how it helps determine the prediction capability of the models. The paper begins with a description of the chamber state signals used in the CSB models, followed by a discussion of the methodology and models used to determine the wafer state prediction capability of the models. Next is a description of the training and verification experiments. The modeling results are then discussed, followed by a brief discussion of future directions.
`
[Fig. 2 plots; panels (a) and (b); x-axis: wafer number (1 to 12)]
Fig. 2. Real-time chamber state signals of (a) RF load coil position and (b) dc bias for different input conditions on 12 wafers. Wafers #4 and #5 have unstable real-time signals and are rejected as "bad" wafers [17]. Notice the large wafer-to-wafer variance compared to the within-wafer variance.
`
for wafer #4 was unusually low due to the drop in RF power. Therefore, the run corresponding to wafer #4 was left out of the model training runs. As seen in Fig. 2, wafer #5 exhibited unstable signals and was also rejected as an outlier.¹
Once the outliers have been determined, the time series nature of the signals is not used for prediction purposes in this work. Instead, the wafer-to-wafer variability is mapped to the output wafer state. Fig. 2 shows (excluding wafers #4 and #5) that the wafer-to-wafer variance is much larger than the within-wafer variance. Therefore the average values per signal across each wafer can be used as the input for the prediction models built with the real-time signals. Each signal is averaged over the duration of the main etch step (after the native oxide breakthrough etch and before the overetch), which lasts approximately 30 s.
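The per-wafer averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name `average_main_etch`, the sample trace, and the 5 s to 35 s window are hypothetical values chosen for the example, not data or code from the experiment.

```python
from statistics import mean

def average_main_etch(samples, t_start, t_end):
    """Average one real-time signal over the main etch window [t_start, t_end).

    samples is a list of (time_in_seconds, value) pairs for a single wafer.
    """
    window = [value for t, value in samples if t_start <= t < t_end]
    if not window:
        raise ValueError("no samples fall inside the main etch window")
    return mean(window)

# Hypothetical dc bias trace for one wafer: (time in s, reading).
dc_bias_trace = [(0.0, -40.0), (5.0, -120.0), (10.0, -122.0),
                 (20.0, -121.0), (30.0, -123.0), (40.0, -60.0)]

# Assumed main etch window of roughly 30 s, starting after the breakthrough step.
wafer_feature = average_main_etch(dc_bias_trace, 5.0, 35.0)
```

Each wafer then contributes one such average per signal, and these averages form one row of the input matrix used by the prediction models.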
Unlike the fixed input settings, the chamber state signals change with the state of the machine. This is illustrated in Fig. 3, which shows the load impedance and RF tune vane position for the duration of six wafers processed at the same input settings. While the input settings are fixed for all six wafers, the chamber state signals vary for each etch, indicating that the chamber state data may give a more accurate description of the actual equipment state.
Although the examples shown in this paper are based on data collected from the Lamstation and RPM-1 sensors, the methodology presented is general and can be applied to other types of sensor data. For example, data collected via optical emission spectroscopy can be used in exactly the same manner. A current research area is to determine the sensor data set which precisely describes the chamber state. At present we have found the data collected from Lamstation and RPM-1 to be sufficient to show the power of this class of real-time tool data.
`
III. WAFER STATE PREDICTION METHODOLOGY
This section outlines the basic advantages and disadvantages of the four modeling methods, and discusses the prediction metric used to compare the prediction capability of the models.

¹Although undetected by the machine, these errors were detected by the Berkeley real-time fault detection system, RTSPC [16], [17].

Fig. 3. Real-time signals for six center point wafers over the duration of the main etch. Unlike the input settings, the real-time chamber state signals (a) load impedance and (b) RF tune vane position reflect changes in machine state.
`
A. Modeling Methods
The first method under discussion is ordinary least squares regression. Since this method results in poor prediction capability when the modeling variables are correlated, other methods are investigated. Principal component regression and partial least squares regression can handle correlated data and have the added advantage that they can reduce the dimensionality of the model. Simple feed-forward error back propagation neural networks are also briefly discussed.
1) Ordinary Least Squares Regression: The first regression method discussed is ordinary least squares regression (OLSR). The equation for the linear regression model is²

$$\hat{y} = X\hat{\beta} \qquad (1)$$

where $\hat{y}$ ($n \times 1$) is the prediction of the response $y$, $X$ ($n \times p$) is the input matrix, and $\hat{\beta}$ is a $p \times 1$ vector of estimated model coefficients defined as

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (2)$$

provided that $(X'X)$ is positive definite and therefore can be inverted. Throughout the paper, $n$ is the number of observations and $p$ is the number of model parameters.

²In this paper, bold face upper case letters denote matrices. Lower case bold face letters and Greek letters with an underscore denote column vectors. Scalars are denoted by lowercase letters. Transpose is denoted by (').

Prediction problems arise when the columns of $X$ exhibit multicollinearity, or are highly correlated. The main idea is that high correlation in $X$ leads to small eigenvalues in $X'X$, which results in a high variance in both the estimate of the coefficients and the predicted responses. For example, let $\hat{y}_0 = x_0'\hat{\beta}$ be a predicted value. The variance of this predicted value can be solved in terms of the eigenvalues $\omega_j$ and eigenvectors $v_j$ of $X'X$:

$$\mathrm{var}(\hat{y}_0) = \mathrm{var}(x_0'\hat{\beta}) = x_0'\,\mathrm{cov}[\hat{\beta}, \hat{\beta}]\,x_0 = \sigma^2 \sum_{j=1}^{p} x_0' v_j\,\omega_j^{-1}\,v_j' x_0 = \sigma^2 \sum_{j=1}^{p} \frac{(x_0' v_j)^2}{\omega_j} \qquad (3)$$

where $\mathrm{cov}[y, y] = \sigma^2 I_n$. Equation (3) shows that the variance of the predicted values depends on both the value of the eigenvalues and the direction of the input $x_0$. The variance will be large for small eigenvalues and large values of $x_0'v_j$. The consequence of large variances in the predicted values is that the error in the prediction can potentially be huge. Thus, when the columns of $X$ exhibit multicollinearity, the prediction capability of the model can be very poor.
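A small numerical sketch may help make the effect of multicollinearity concrete. It evaluates the prediction variance from the eigenvalues and eigenvectors of $X'X$ for one nearly collinear and one uncorrelated input matrix. The synthetic data, the function name `prediction_variance`, and the choice $\sigma^2 = 1$ are assumptions made for illustration only.

```python
import numpy as np

def prediction_variance(X, x0, sigma2=1.0):
    """Variance of x0' beta_hat: sigma^2 * sum_j (x0' v_j)^2 / omega_j."""
    omega, V = np.linalg.eigh(X.T @ X)   # eigenvalues omega_j, eigenvectors v_j as columns
    proj = V.T @ x0                      # the projections x0' v_j
    return sigma2 * np.sum(proj ** 2 / omega)

rng = np.random.default_rng(0)
n = 30
a = rng.normal(size=n)

# Two synthetic input matrices: uncorrelated columns vs. nearly collinear columns.
X_indep = np.column_stack([a, rng.normal(size=n)])
X_corr = np.column_stack([a, a + 1e-3 * rng.normal(size=n)])

# A prediction point that lies along the small-eigenvalue direction of X_corr.
x0 = np.array([1.0, -1.0])

var_indep = prediction_variance(X_indep, x0)
var_corr = prediction_variance(X_corr, x0)

# Cross-check against the closed form sigma^2 * x0' (X'X)^{-1} x0.
direct_corr = x0 @ np.linalg.inv(X_corr.T @ X_corr) @ x0
```

In this sketch the nearly collinear case gives a prediction variance several orders of magnitude larger than the uncorrelated case, because $x_0$ has a large component along an eigenvector with a very small eigenvalue.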
2) Principal Component Regression: Principal component regression (PCR) addresses the problem of multicollinearity. When building models with real-time data, it is common to have large numbers of correlated input parameters $X$. This number can easily escalate to an almost unmanageable number when interactions are included. For example, in this paper 13 main signals are collected, resulting in 90 model variables when all the corresponding two-way interactions are included. Because many of the signals are correlated, not all 90 variables should (or can) be used independently in a model.
To eliminate the correlation among the input variables, PCR transforms the correlated input variables to a set of orthogonal variables. The transformed variables $Z$, known as the principal components (PC's), are linear combinations of the original variables. The values of these PC's are called the scores. The coefficients of the original variables, or loadings, are the eigenvectors $V$ of $X'X$. The equation for the transformed variables $Z$ is

$$Z = (X - 1\bar{x}')V \qquad (4)$$

where $\bar{x}'$ is the vector of average values of each variable in $X$ and $1$ is a column vector of 1's.
All or a subset of the PC's can be used as the input matrix for regression. Because the PC's are orthogonal, there are no multicollinearity problems, and standard least squares techniques can be employed. The resulting model is

$$\hat{y} = Z\hat{\gamma} \qquad (5)$$

where $\hat{\gamma}$ is the estimate of the coefficients using the equation

$$\hat{\gamma} = (Z'Z)^{-1}Z'y.$$
`
Because much of the variability can be captured in a subset of the PC's, PCR reduces the dimensionality of the models to its most dominant factors. The subset of statistically significant PC's in the model is determined by applying the Student-t test to each of the coefficients. Only those PC's with statistically significant coefficients at a specified level are retained in the model (0.05 significance level is used in the examples of Section V).
While PCR decreases the number of parameters in the model, each model parameter still consists of a linear combination of input variables. Ideally, those input variables in $X$ which do not significantly contribute to the model should be left out. When there are such large numbers of input variables, however, it is often very difficult to determine which of these simply add noise to the model and which are significant. An empirical method we developed to determine the "streamlined" models is to transform the PCR model back to the input space of $X$. Assuming that the model is of the form in (5) and using (4) to substitute in for $Z$

$$\hat{y} = (X - 1\bar{x}')V\hat{\gamma} = XV\hat{\gamma} - 1\bar{x}'V\hat{\gamma} = X\hat{\beta} - 1\bar{x}'\hat{\beta} \qquad (6)$$

where $\hat{\beta} = V\hat{\gamma}$. The general rule of thumb we found was to eliminate those input parameters which have $\hat{\beta}$ values at least an order of magnitude smaller than the average of the largest $\hat{\beta}$ values. Regenerate the PCR model with the reduced set of input parameters, using the Student-t test to calculate the significance of the new PC's. Continue to reduce the input parameter space as described above until the model prediction no longer improves. (An effective metric to determine prediction is described in Section III-B.) This simple, yet effective empirical method handles large numbers of input parameters very easily.
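The PCR steps above (centering, projecting onto eigenvectors, regressing on the scores, and mapping the coefficients back to the input space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the response is centered before regression, the eigenvectors are taken from the centered $X'X$, and components are kept by largest eigenvalue rather than by the Student-t test described in the text. The function name `pcr_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns (beta, intercept) in the input space."""
    x_bar = X.mean(axis=0)
    y_bar = y.mean()
    Xc = X - x_bar                                   # X - 1 x_bar'
    omega, V = np.linalg.eigh(Xc.T @ Xc)             # loadings = eigenvectors
    keep = np.argsort(omega)[::-1][:n_components]    # assumption: keep largest eigenvalues
    V = V[:, keep]
    Z = Xc @ V                                       # scores
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ (y - y_bar))
    beta = V @ gamma                                 # back-transform to the input space
    return beta, y_bar - x_bar @ beta

# Synthetic data with one nearly redundant input column.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=40)])
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)

beta_full, b0_full = pcr_fit(X, y, 4)   # all PC's: equivalent to least squares with intercept
beta_3, b0_3 = pcr_fit(X, y, 3)         # drops the smallest-variance PC

# Reference least squares fit used as a cross-check for the full model.
A = np.column_stack([np.ones(len(y)), X])
ols_coef = np.linalg.lstsq(A, y, rcond=None)[0]
```

Inspecting the magnitudes in `beta_3` is the kind of check the streamlining rule of thumb relies on: inputs whose back-transformed coefficients are much smaller than the largest ones are candidates for removal before refitting.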
3) Partial Least Squares Regression: The last regression modeling technique under discussion is partial least squares regression (PLSR). This method is widely used in chemometrics, a field of chemistry that uses statistical methods for chemical data analysis [18]. The general idea of the PLSR algorithm is similar to that of PCR. A reduced set of parameters that sufficiently describe the input data is found and then used as the regressors on $y$. The notion of factor loadings and scores introduced in the context of PCR is also used in PLSR. Instead of one set of loadings as was the case in PCR, two sets are used in PLSR, one for the input matrix and another for the response. The algorithm for one response follows.
Let $A_{\max}$ be the maximum number of PLSR factors. At the start of the algorithm, $A_{\max}$ should be larger than anticipated to allow for unexpected factors. The following steps are then performed for each factor $a = 1, 2, \ldots, A_{\max}$ [18]-[20]:
1) Determine the loading weight vector $\hat{w}_a$:

$$\hat{w}_a = \frac{X_{a-1}'\,y_{a-1}}{\lVert X_{a-1}'\,y_{a-1} \rVert}$$

The loadings $\hat{w}_a$ are orthonormal vectors which maximize the covariance between $X_{a-1}$ and $y_{a-1}$. In other words, $\hat{W} = (\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_A)$ relates the input and response, and is used to calculate the response in the model.
2) Estimate the scores $\hat{t}_a$:

$$\hat{t}_a = X_{a-1}\hat{w}_a$$
`
$\hat{t}_a$ indicates how much of the response is correlated with the input data, and $\hat{T} = (\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_A)$ is the reduced set of orthogonal scores that are used as regressors for $y$. Orthogonal vectors are necessary to deal with the problem of multicollinearity.
3) Estimate the input loadings $\hat{p}_a$:

$$\hat{p}_a = \frac{X_{a-1}'\,\hat{t}_a}{\hat{t}_a'\hat{t}_a}$$

$\hat{P} = (\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_A)$ is similar to the eigenvector matrix $V$ in PCR, in that it consists of the loadings for the input. Although $\hat{P}$ is chosen to ensure that the $\hat{t}_a$ vectors are orthogonal, the $\hat{p}_a$ vectors are generally not orthogonal. Unlike the loadings in PCA, the first $\hat{p}_a$ vector does not explain the maximum variance in the input matrix; rather, it explains as much variance as possible while correlating with the response.
4) Estimate the response loadings $\hat{q}_a$:

$$\hat{q}_a = \frac{y_{a-1}'\,\hat{t}_a}{\hat{t}_a'\hat{t}_a}$$

$\hat{q} = (\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_A)'$ is the additional loading term which brings the response into the model. It relates the score $\hat{t}_a$ to the response, minimizing the residual sum of squares of the response. Note that the $\hat{q}_a$ are scalars since this model is for one response.
5) Create the new residuals $\hat{E}$ and $\hat{f}$ by subtracting the estimated values found in the previous steps from the actual values:

$$\hat{E} = X_{a-1} - \hat{t}_a\hat{p}_a'$$
$$\hat{f} = y_{a-1} - \hat{t}_a\hat{q}_a.$$

The product $\hat{t}_a\hat{p}_a'$ estimates the input matrix, while the product $\hat{t}_a\hat{q}_a$ estimates the response. Replace $X_{a-1}$ and $y_{a-1}$ by the new residuals and increment $a$: $X_a = \hat{E}$, $y_a = \hat{f}$, and $a = a + 1$. Go back to Step 1.
6) Once the number ($A$) of valid PLSR factors is determined, the estimates of the coefficients to be used in the prediction model $\hat{y} = 1\hat{b}_0 + X\hat{b}$ are

$$\hat{b} = \hat{W}(\hat{P}'\hat{W})^{-1}\hat{q} \quad \text{and} \quad \hat{b}_0 = \bar{y} - \bar{x}'\hat{b}. \qquad (7)$$

Using (7) as an estimate of the coefficients, the same type of "streamlining" method described for PCR to reduce the number of input parameters can also be applied to PLSR.
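The six PLSR steps for a single response can be sketched compactly with NumPy. The sketch assumes that the starting matrices $X_0$ and $y_0$ are the mean-centered input and response (consistent with the intercept in (7)) and that the number of factors is chosen by the caller; the function name `pls1_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Single-response PLSR; returns (b0, b) for the model y_hat = b0 + X b."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xa, ya = X - x_bar, y - y_bar            # assumption: start from centered data
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xa.T @ ya
        w = w / np.linalg.norm(w)            # step 1: loading weights
        t = Xa @ w                           # step 2: scores
        tt = t @ t
        p = Xa.T @ t / tt                    # step 3: input loadings
        qa = ya @ t / tt                     # step 4: response loading (a scalar)
        Xa = Xa - np.outer(t, p)             # step 5: residuals replace X and y
        ya = ya - t * qa
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # step 6: coefficients
    return y_bar - x_bar @ b, b

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + 0.05 * rng.normal(size=30)

b0_one, b_one = pls1_fit(X, y, 1)     # a one-factor model
b0_full, b_full = pls1_fit(X, y, 3)   # as many factors as inputs

# With all factors retained the fit should coincide with ordinary least squares.
A = np.column_stack([np.ones(len(y)), X])
ols_coef = np.linalg.lstsq(A, y, rcond=None)[0]
```

Using fewer factors than inputs is what gives PLSR its protection against multicollinearity; the full-factor case is shown only as a consistency check.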
4) Feed-Forward Error Backward Propagation Neural Networks: The last modeling method investigated is neural networks, which are useful for modeling complex relationships, such as the plasma etching process. Furthermore, the form of the models is derived from the actual data, and not set a priori as is done for regression. Neural networks, however, do not provide information about the physics of the processes [8], [9], [11].
Neural network models are empirically-based models which train a combination of "neurons," or nodes, to learn and model relationships between a set of inputs and outputs. The connections among the nodes are weighted. In this application, one hidden layer was used, making a total of three layers in the network. The connections are between the input nodes and the hidden nodes, and between the hidden nodes and the output nodes. No bias was applied to the first layer. The output function for the remaining layers is the "squashing" activation function of the form $f(x) = 1/(1 + e^{-x})$, where $x$ is the sum of the weighted outputs of the nodes preceding this particular node.
The neural network algorithm selected for this analysis is the feed-forward, error backward propagation (FFEBP) method, which has been shown to be effective in modeling noisy input and output data [9]-[11]. In this algorithm, the inputs are fed forward through the layers of the network until reaching the output layer. The result at the output layer of node $j$ is compared with the desired, or training, output. The difference, called the error, is used with the output of node $i$ in a neighboring layer to calculate the new weighting of the connection between node $i$ and node $j$. These errors are then used to calculate the weight changes for the connection between the input and hidden units. Because the weight corrections depend upon the corrections previously computed from the neighboring layer, the error in effect is propagated backward through the network [21]. In the FFEBP method, the gradient search method is used to minimize the sum of the squared errors [22]. A more thorough description of the algorithm can be found in the review paper by Widrow and Lehr [23].
The Stuttgart Neural Network Simulator (SNNS) was used to simulate and train the neural networks [21]. The network learns the relationship between the input and output patterns as it undergoes learning iterations. To determine when to stop training, the neural network model was applied to the verification data set. Training stopped when this testing set achieved its lowest error. This is a common practice to avoid overtraining, which results in decreased generalization capability of the network model.
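For readers who want to see the mechanics, the following is a minimal NumPy sketch of a one-hidden-layer feed-forward network trained by error back propagation, with training stopped at the lowest error on a held-out set. It is not the SNNS configuration used in this work: the hidden layer size, learning rate, bias terms on the hidden and output nodes, synthetic data, and the function name `train_ffebp` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # the "squashing" function

def train_ffebp(X, y, X_val, y_val, n_hidden=4, lr=2.0, epochs=5000, seed=0):
    """Gradient descent on squared error; returns (lowest held-out error, weights)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    best_err, best_weights = np.inf, None
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                     # feed forward: hidden layer
        out = sigmoid(h @ W2 + b2)                   # feed forward: output node
        d_out = (out - y) * out * (1.0 - out)        # error term at the output
        d_hid = np.outer(d_out, W2) * h * (1.0 - h)  # error propagated backward
        W2 -= lr * h.T @ d_out / len(y)
        b2 -= lr * d_out.mean()
        W1 -= lr * X.T @ d_hid / len(y)
        b1 -= lr * d_hid.mean(axis=0)
        val_out = sigmoid(sigmoid(X_val @ W1 + b1) @ W2 + b2)
        val_err = np.mean((val_out - y_val) ** 2)
        if val_err < best_err:                       # keep the lowest held-out error
            best_err = val_err
            best_weights = (W1.copy(), b1.copy(), W2.copy(), b2)
    return best_err, best_weights

# Synthetic example with targets scaled into (0, 1) to match the sigmoid output.
rng = np.random.default_rng(3)
X_tr, X_va = rng.uniform(-1, 1, size=(60, 2)), rng.uniform(-1, 1, size=(30, 2))
y_tr = 0.5 + 0.3 * X_tr[:, 0] - 0.2 * X_tr[:, 1]
y_va = 0.5 + 0.3 * X_va[:, 0] - 0.2 * X_va[:, 1]

best_err, weights = train_ffebp(X_tr, y_tr, X_va, y_va)
first_epoch_err, _ = train_ffebp(X_tr, y_tr, X_va, y_va, epochs=1)
```

Keeping the weights from the epoch with the lowest held-out error is one simple way to implement the stopping rule described above.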
`
B. Testing the Prediction Capability of the Models
This section describes the methodology used to determine the prediction capability of the models. As stated in Section I, two sets of experiments were conducted: the first for model generation and the second for model verification. It is important to note that the two experiments were conducted several weeks apart, and that between the experiments the equipment underwent normal use and maintenance. The verification experiment is used to determine if the training models can withstand small changes in the equipment that occur with time.
The often neglected verification stage is one of the most important in prediction model building. In many modeling situations, the assumption is made that if the model has a good fit (for example, a high adjusted $R^2$ and statistically significant terms), the model can be used for prediction. Unfortunately, this is not the case for plasma etchers on a production line. Because the machines go through regular maintenance and may drift with use, the model with the best fit based on one experiment conducted in a short time frame may not capture these changes in the machine. The model may also be too specific for the particular runs. These combined deficiencies result in unsatisfactory predictive capability. The verification experiment is designed to determine the best predictive model which takes into account normal equipment changes.
The prediction metric determining the best model is based on how well the training model predicts the verification wafers. Because the verification data has been collected from the machine some time after the original experiment, and is not included during model generation, the true prediction capability of the models can be gauged. The metric used is the standard error of prediction (SEP), where $y_i$ is the $i$th observation, $\hat{y}_i$ is the predicted value of the $i$th point, and $n$ is the number of observations in the experiment:

$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{n}} \qquad (8)$$
`
Essentially, the SEP metric measures the spread of the difference between the predicted and actual value. The examples shown in Section V rate the different models according to their normalized SEP metrics.
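As a small illustration, the SEP can be computed from a list of measured and predicted values as below. The sketch assumes a root-mean-square form with $n$ in the denominator; the function name `sep` and the numbers are hypothetical and do not come from the experiments.

```python
import math

def sep(observed, predicted):
    """Standard error of prediction, assuming the root-mean-square form with divisor n."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical values for four verification wafers (arbitrary units).
measured = [3000.0, 3200.0, 2900.0, 3100.0]
model_output = [3050.0, 3150.0, 2950.0, 3050.0]
sep_value = sep(measured, model_output)
```

Because the verification wafers were not used to build the model, a value computed this way reflects prediction error rather than goodness of fit.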
`
IV. DESIGNED EXPERIMENT
This section describes the experiment conducted to obtain the real-time data sets used to develop and verify the predictive methodology. First, the wafer test structure is briefly described, followed by a discussion of both the training and the verification experiments. Finally, the wafer measurements are described.

A. Test Structure
The test structure was designed so that all processes of interest were simultaneously obtained in the same etch step. Due to complex loading effects, this method results in more accurate etch rates and selectivities than etching blanket wafers individually [24]. A simplified view of the test structure indicating the etched surfaces is shown in Fig. 4. First, a 600 Å thermal gate oxide is grown on the 4-in wafers, followed by 5500 Å n+ doped polysilicon, deposited via low pressure chemical vapor deposition (LPCVD). After a 20 min nitrogen anneal at 950°C, 2800 Å undoped low temperature oxide (LTO) is deposited via LPCVD. A three step mask process is required to build the test structure.
`
B. Experiment
The plasma etcher used in this experiment is the Lam Rainbow 4400 polysilicon plasma etcher. The main etch chemistry was a chlorine and helium etch. In both the training and verification experiments, a fixed pre-etch recipe was used
`
[Fig. 4 diagram; layer labels: 5500 Å poly-Si, 600 Å gate oxide]
Fig. 4. Test structure for the experiment.
`
TABLE III
CHANGE IN % FROM NOMINAL

Parameter     Training Experiment        Verification
              Phase I      Phase II      Experiment
Pressure      ±15%         392.5%        ±10%
Power         ±15%         ±22.5%        ±10%
Gap           ±11%         ±17%          ±10%
Flow Ratio    ±19%         392%          ±10%
Total Flow    ±11%         392%          ±10%
`
Fig. 5. Depiction of three input settings for the following: (a) Training Phase I; (b) Training Phase II; (c) Verification experiments.
`
for all runs. The main etch recipe was modified according to a designed experiment described below. To obtain accurate etch rates the main etch was a timed etch, so no overetch was performed. The input parameters varied in the experiment are the chamber pressure, RF forward power, electrode gap spacing, the ratio of Cl2 to He, and the total gas flow rate of Cl2 and He. The output wafer parameters of interest are the etch rate of polysilicon, selectivity of polysilicon to oxide and I-line positive photoresist, and polysilicon wafer uniformity.
The training experiment consisted of two phases. Phase I, the variable screening stage, determined the statistically significant variables in the models. Phase II assessed the quadratic nature of the system via a star design. The input values used for all experiments are listed in Table III, in terms of percent offset from the nominal values. Fig. 5 illustrates the different points for three variables in the input space covered by each experiment. The particular values were chosen to cover a wide range of operating conditions of the machine. The actual runs conducted, in order of execution, are listed in Tables IV and V. Of the 32 runs in both phases of the training experiment, five were eliminated before modeling due to unstable real-time signals or misprocessing.
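The Phase I screening design in this section is a two-level, 16-run half fraction in five factors plus center points. A design of that type can be generated in coded units (-1, 0, +1) as sketched below. The generator E = ABCD is an assumption (it is the usual choice that gives resolution V), as are the factor labels and function names.

```python
from itertools import combinations, product
from math import prod

FACTORS = ("pressure", "power", "gap", "flow_ratio", "total_flow")

def half_fraction():
    """16-run 2^(5-1) design in coded units, with the fifth factor set by E = ABCD."""
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]

def effects_unaliased(runs, max_order=2):
    """True if all effects up to max_order have mutually orthogonal contrast columns."""
    n_factors = len(runs[0])
    effects = [cols for k in range(1, max_order + 1)
               for cols in combinations(range(n_factors), k)]

    def column(cols):
        return [prod(run[i] for i in cols) for run in runs]

    return all(sum(x * y for x, y in zip(column(e1), column(e2))) == 0
               for e1, e2 in combinations(effects, 2))

factorial_runs = half_fraction()
center_points = [(0,) * len(FACTORS)] * 4      # four center points at nominal settings
phase_one = factorial_runs + center_points     # 20 runs before randomizing the order
```

The orthogonality check confirms the resolution V property: main effects and two-factor interactions are not aliased with one another, while some three-factor interactions are aliased with two-factor interactions.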
Phase I consists of a two-level, 16 run $2^{5-1}$ fractional factorial design and 4 center points. The design is resolution V with no blocking, but drops to resolution III when blocked for time and split lots. The variable screening analysis was
TABLE IV
TRAINING EXPERIMENT CONDITIONS: PHASE I

17    425    0.42    275    0.9    540
18    489    0.50    316    1.0    600
19    489    0.34    316    0.8    600
20    489    0.50    234    0.8    600

TABLE V
TRAINING EXPERIMENT CONDITIONS: PHASE II

TABLE VI
VERIFICATION EXPERIMENT CONDITIONS

[Fig. 6 diagram; labels: Inner Ring, Outer Ring]
Fig. 6. Wafer measurement points.
`
etched in the system between experiments). The input settings for this experiment, varied one at a time, are listed in Table III. In addition, five center points were run. The list of runs conducted is in Table VI.

C. Wafer Measurements
In both experiments, film thickness measurements were taken by a Nanometrics Nanospec AFT system on 9 die per 4-in wafer. As shown in F
