Adaptive Antenna Systems

B. WIDROW, MEMBER, IEEE, P. E. MANTEY, MEMBER, IEEE, L. J. GRIFFITHS, STUDENT MEMBER, IEEE, AND B. B. GOODE, STUDENT MEMBER, IEEE
`
Abstract—A system consisting of an antenna array and an adaptive processor can perform filtering in both the space and the frequency domains, thus reducing the sensitivity of the signal-receiving system to interfering directional noise sources.
Variable weights of a signal processor can be automatically adjusted by a simple adaptive technique based on the least-mean-squares (LMS) algorithm. During the adaptive process an injected pilot signal simulates a received signal from a desired "look" direction. This allows the array to be "trained" so that its directivity pattern has a main lobe in the previously specified look direction. At the same time, the array processing system can reject any incident noises, whose directions of propagation are different from the desired look direction, by forming appropriate nulls in the antenna directivity pattern. The array adapts itself to form a main lobe, with its direction and bandwidth determined by the pilot signal, and to reject signals or noises occurring outside the main lobe as well as possible in the minimum mean-square-error sense.
Several examples illustrate the convergence of the LMS adaptation procedure toward the corresponding Wiener optimum solutions. Rates of adaptation and misadjustments of the solutions are predicted theoretically and checked experimentally. Substantial reductions in noise reception are demonstrated in computer-simulated experiments. The techniques described are applicable to signal-receiving arrays for use over a wide range of frequencies.
`
INTRODUCTION
`
THE SENSITIVITY of a signal-receiving array to interfering noise sources can be reduced by suitable processing of the outputs of the individual array elements. The combination of array and processing acts as a filter in both space and frequency. This paper describes a method of applying the techniques of adaptive filtering[1] to the design of a receiving antenna system which can extract directional signals from the medium with minimum distortion due to noise. This system will be called an adaptive array. The adaptation process is based on minimization of mean-square error by the LMS algorithm.[2]-[4] The system operates with knowledge of the direction of arrival and spectrum of the signal, but with no knowledge of the noise field. The adaptive array promises to be useful whenever there is interference that possesses some degree of spatial correlation; such conditions manifest themselves over the entire spectrum, from seismic to radar frequencies.
`
Manuscript received May 29, 1967; revised September 5, 1967.
B. Widrow and L. J. Griffiths are with the Department of Electrical Engineering, Stanford University, Stanford, Calif.
P. E. Mantey was formerly with the Department of Electrical Engineering, Stanford University. He is presently with the Control and Dynamical Systems Group, IBM Research Laboratories, San Jose, Calif.
B. B. Goode is with the Department of Electrical Engineering, Stanford University, Stanford, Calif., and the Navy Electronics Laboratory, San Diego, Calif.
`
The term "adaptive antenna" has previously been used by Van Atta[5] and others[6] to describe a self-phasing antenna system which reradiates a signal in the direction from which it was received. This type of system is called adaptive because it performs without any prior knowledge of the direction in which it is to transmit. For clarity, such a system might be called an adaptive transmitting array; whereas the system described in this paper might be called an adaptive receiving array.
The term "adaptive filter" has been used by Jakowatz, Shuey, and White[7] to describe a system which extracts an unknown signal from noise, where the signal waveform recurs frequently at random intervals. Davisson[8] has described a method for estimating an unknown signal waveform in the presence of white noise of unknown variance. Glaser[9] has described an adaptive system suitable for the detection of a pulse signal of fixed but unknown waveform.
Previous work on array signal processing directly related to the present paper was done by Bryn, Mermoz, and Shor. The problem of detecting Gaussian signals in additive Gaussian noise fields was studied by Bryn,[10] who showed that, assuming K antenna elements in the array, the Bayes optimum detector could be implemented by either K² linear filters followed by "conventional" beam-forming for each possible signal direction, or by K linear filters for each possible signal direction. In either case, the measurement and inversion of a 2K by 2K correlation matrix was required at a large number of frequencies in the band of the signal. Mermoz[11] proposed a similar scheme for narrowband known signals, using the signal-to-noise ratio as a performance criterion. Shor[12] also used a signal-to-noise-ratio criterion to detect narrowband pulse signals. He proposed that the sensors be switched off when the signal was known to be absent, and a pilot signal injected as if it were a noise-free signal impinging on the array from a specified direction. The need for specific matrix inversion was circumvented by calculating the gradient of the ratio between the output power due to pilot signal and the output power due to noise, and using the method of steepest descent. At the same time, the number of correlation measurements required was reduced, by Shor's procedure, to 4K at each step in the adjustment of the processor. Both Mermoz and Shor have suggested the possibility of real-time adaptation.
This paper presents a potentially simpler scheme for obtaining the desired array processing improvement in real time. The performance criterion used is minimum mean-square error. The statistics of the signal are assumed
Petitioner Apple Inc.
Ex. 1016, p. 2143
PROCEEDINGS OF THE IEEE, DECEMBER 1967
to be known, but no prior knowledge or direct measurements of the noise field are required in this scheme. The adaptive array processor considered in the study may be automatically adjusted (adapted) according to a simple iterative algorithm, and the procedure does not directly involve the computation of any correlation coefficients or the inversion of matrices. The input signals are used only once, as they occur, in the adaptation process. There is no need to store past input data; but there is a need to store the processor adjustment values, i.e., the processor weighting coefficients ("weights"). Methods of adaptation are presented here which may be implemented with either analog or digital adaptive circuits, or by digital-computer realization.
`
DIRECTIONAL AND SPATIAL FILTERING

An example of a linear-array receiving antenna is shown in Fig. 1(a) and (b). The antenna of Fig. 1(a) consists of seven isotropic elements spaced λ₀/2 apart along a straight line, where λ₀ is the wavelength of the center frequency f₀ of the array. The received signals are summed to produce an array output signal. The directivity pattern, i.e., the relative sensitivity of response to signals from various directions, is plotted in this figure in a plane over an angular range of −π/2 < θ < π/2 for frequency f₀. This pattern is symmetric about the vertical line θ = 0. The main lobe is centered at θ = 0. The largest-amplitude side lobe, at θ = 24°, has a maximum sensitivity which is 12.5 dB below the maximum main-lobe sensitivity. This pattern would be different if it were plotted at frequencies other than f₀.
The same array configuration is shown in Fig. 1(b); however, in this case the output of each element is delayed in time before being summed. The resulting directivity pattern now has its main lobe at an angle of ψ radians, where
`
ψ = sin⁻¹(λ₀δf₀/d) = sin⁻¹(cδ/d)   (1)

in which

f₀ = frequency of received signal
λ₀ = wavelength at frequency f₀
δ = time-delay difference between neighboring-element outputs
d = spacing between antenna elements
c = signal propagation velocity = λ₀f₀.
The sensitivity is maximum at angle ψ because signals received from a plane wave source incident at this angle, and delayed as in Fig. 1(b), are in phase with one another and produce the maximum output signal. For the example illustrated in the figure, d = λ₀/2, δ = 0.12941/f₀, and therefore ψ = sin⁻¹(2δf₀) = 15°.
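As a present-day numerical check of the steering relation (1), the example values above can be substituted directly (a sketch in normalized units; setting f₀ = 1 and λ₀ = 1, which fixes c = λ₀f₀ = 1, is an assumption made only for convenience):

```python
import numpy as np

# Check of (1): psi = arcsin(lambda0 * delta * f0 / d) = arcsin(c * delta / d).
# Normalized units (assumed): f0 = 1, lambda0 = 1, hence c = lambda0 * f0 = 1.
f0 = 1.0
lam0 = 1.0
d = lam0 / 2                    # element spacing lambda0/2, as in the example
delta = 0.12941 / f0            # time-delay difference from the example
psi = np.degrees(np.arcsin(lam0 * f0 * delta / d))
print(round(psi, 2))            # 15.0
```

The computed main-lobe angle agrees with the 15° quoted in the text.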
There are many possible configurations for phased arrays. Fig. 2(a) shows one such configuration where each of the antenna-element outputs is weighted by two weights in parallel, one being preceded by a time delay of a quarter of a cycle at frequency f₀ (i.e., a 90° phase shift), denoted by 1/(4f₀). The output signal is the sum of all the weighted signals, and since all weights are set to unit values, the directivity pattern at frequency f₀ is by symmetry the same as that of Fig. 1(a). For purposes of illustration, an interfering directional sinusoidal "noise" of frequency f₀ incident on the array is shown in Fig. 2(a), indicated by the dotted arrow. The angle of incidence (45°) of this noise is such that it would be received on one of the side lobes of the directivity pattern with a sensitivity only 17 dB less than that of the main lobe at θ = 0°.
If the weights are now set as indicated in Fig. 2(b), the directivity pattern at frequency f₀ becomes as shown in that figure. In this case, the main lobe is almost unchanged from that shown in Figs. 1(a) and 2(a), while the particular side lobe that previously intercepted the sinusoidal noise in Fig. 2(a) has been shifted so that a null is now placed in the direction of that noise. The sensitivity in the noise direction is 77 dB below the main lobe sensitivity, improving the noise rejection by 60 dB.
A simple example follows which illustrates the existence and calculation of a set of weights which will cause a signal from a desired direction to be accepted while a "noise" from a different direction is rejected. Such an example is illustrated in Fig. 3. Let the signal arriving from the desired direction θ = 0° be called the "pilot" signal p(t) = P sin ω₀t, where ω₀ ≜ 2πf₀, and let the other signal, the noise, be chosen as n(t) = N sin ω₀t, incident to the receiving array at an angle θ = π/6 radians. Both the pilot signal and the noise signal are assumed for this example to be at exactly the same frequency f₀. At a point in space midway between the antenna array elements, the signal and the noise are assumed to be in phase. In the example shown, there are two identical omnidirectional array elements, spaced λ₀/2 apart. The signals received by each element are fed to two variable weights, one weight being preceded by a quarter-wave time delay of 1/(4f₀). The four weighted signals are then summed to form the array output.
The problem of obtaining a set of weights to accept p(t) and reject n(t) can now be studied. Note that with any set of nonzero weights, the output is of the form A sin(ω₀t + φ), and a number of solutions exist which will make the output be p(t). However, the output of the array must be independent of the amplitude and phase of the noise signal if the array is to be regarded as rejecting the noise. Satisfaction of this constraint leads to a unique set of weights determined as follows.
The array output due to the pilot signal is

P[(w₁ + w₃) sin ω₀t + (w₂ + w₄) sin(ω₀t − π/2)].   (2)

For this output to be equal to the desired output of p(t) = P sin ω₀t (which is the pilot signal itself), it is necessary that

w₁ + w₃ = 1
w₂ + w₄ = 0.   (3)
`
`
Fig. 1. Directivity pattern for a linear array. (a) Simple array. (b) Delays added.
`
Fig. 2. Directivity pattern of linear array. (a) With equal weighting. (b) With weighting for noise elimination.
`
Fig. 3. Array configuration for noise elimination example.
`
`
With respect to the midpoint between the antenna elements, the relative time delays of the noise at the two antenna elements are ±[1/(4f₀)] sin π/6 = ±1/(8f₀) = ±λ₀/(8c), which corresponds to phase shifts of ±π/4 at frequency f₀. The array output due to the incident noise at θ = π/6 is then

N[w₁ sin(ω₀t − π/4) + w₂ sin(ω₀t − 3π/4) + w₃ sin(ω₀t + π/4) + w₄ sin(ω₀t − π/4)].   (4)
For this response to equal zero, it is necessary that

w₁ − w₂ + w₃ + w₄ = 0
w₁ + w₂ − w₃ + w₄ = 0.   (5)

Fig. 4. Adaptive array configuration for receiving narrowband signals.

Thus the set of weights that satisfies the signal and noise response requirements can be found by solving (3) and (5) simultaneously. The solution is

w₁ = ½, w₂ = ½, w₃ = ½, w₄ = −½.   (6)
With these weights, the array will have the desired properties in that it will accept a signal from the desired direction, while rejecting a noise, even a noise which is at the same frequency f₀ as the signal, because the noise comes from a different direction than does the signal.
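The weight solution (6) can be verified numerically. The sketch below (a modern illustration, with unit amplitudes P = N = 1 and a normalized ω₀ = 2π assumed) forms the four weighted sinusoids with the pilot phases of (2) and the noise phases of (4), and confirms that the pilot is passed unchanged while the noise is nulled:

```python
import numpy as np

# Weights from (6): w1 = w2 = w3 = 1/2, w4 = -1/2.
w = np.array([0.5, 0.5, 0.5, -0.5])
w0 = 2 * np.pi                       # omega_0, normalized (assumed)
t = np.linspace(0.0, 2.0, 1000)

def array_output(phases):
    # Sum of the four weighted sinusoids sin(w0*t + phase_i), with P = N = 1.
    return sum(wi * np.sin(w0 * t + p) for wi, p in zip(w, phases))

pilot_phases = [0.0, -np.pi/2, 0.0, -np.pi/2]             # phases in (2)
noise_phases = [-np.pi/4, -3*np.pi/4, np.pi/4, -np.pi/4]  # phases in (4)

pilot_out = array_output(pilot_phases)
noise_out = array_output(noise_phases)
print(np.allclose(pilot_out, np.sin(w0 * t)))   # True: pilot accepted intact
print(np.allclose(noise_out, 0.0))              # True: noise rejected
```

Both conditions hold simultaneously even though pilot and noise share the same frequency, because they arrive from different directions.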
The foregoing method of calculating the weights is more illustrative than practical. This method is usable when there are only a small number of directional noise sources, when the noises are monochromatic, and when the directions of the noises are known a priori. A practical processor should not require detailed information about the number and the nature of the noises. The adaptive processor described in the following meets this requirement. It recursively solves a sequence of simultaneous equations, which are generally overspecified, and it finds solutions which minimize the mean-square error between the pilot signal and the total array output.
`
CONFIGURATIONS OF ADAPTIVE ARRAYS

Before discussing methods of adaptive filtering and signal processing to be used in the adaptive array, various spatial and electrical configurations of antenna arrays will be considered. An adaptive array configuration for processing narrowband signals is shown in Fig. 4. Each individual antenna element is shown connected to a variable weight and to a quarter-period time delay whose output is in turn connected to another variable weight. The weighted signals are summed, as shown in the figure. The signal, assumed to be either monochromatic or narrowband, is received by the antenna element and is thus weighted by a complex gain factor Ae^{jφ}. Any phase angle φ = −tan⁻¹(w₂/w₁) can be chosen by setting the two weight values, and the magnitude of this complex gain factor A = √(w₁² + w₂²) can take on a wide range of values limited only by the range limitations of the two individual weights. The latter can assume a continuum of both positive and negative values.
`
Fig. 5. Adaptive array configuration for receiving broadband signals.
`
Thus the two weights and the 1/(4f₀) time delay provide completely adjustable linear processing for narrowband signals received by each individual antenna element.
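The claim that a weight pair plus quarter-period delay realizes an arbitrary complex gain can be checked with a short sketch (the weight values and center frequency below are arbitrary illustrative assumptions):

```python
import numpy as np

# A direct weight w1 plus a quarter-period-delayed weight w2 act on a
# narrowband signal sin(w0*t) as the complex gain A*exp(j*phi):
#   w1*sin(w0*t) + w2*sin(w0*t - pi/2) = A*sin(w0*t + phi),
# with A = sqrt(w1^2 + w2^2) and phi = -arctan(w2/w1).
w1, w2 = 0.8, -0.6                   # arbitrary example weights (assumed)
A = np.hypot(w1, w2)
phi = -np.arctan2(w2, w1)

w0 = 2 * np.pi * 5.0                 # arbitrary center frequency (assumed)
t = np.linspace(0.0, 1.0, 500)
lhs = w1 * np.sin(w0 * t) + w2 * np.sin(w0 * t - np.pi / 2)
rhs = A * np.sin(w0 * t + phi)
print(np.allclose(lhs, rhs))         # True
```

Any (A, φ) pair reachable by the two weights can be produced this way, which is the sense in which the processing is "completely adjustable" for narrowband signals.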
The full array of Fig. 4 represents a completely general way of combining the antenna-element signals in an adjustable linear structure when the received signals and noises are narrowband. It should be realized that the same generality (for narrowband signals) can be achieved even when the time delays do not result in a phase shift of exactly π/2 at the center frequency f₀. Keeping the phase shifts close to π/2 is desirable for keeping required weight values small, but is not necessary in principle.
When one is interested in receiving signals over a wide band of frequencies, each of the phase shifters in Fig. 4 can be replaced by a tapped-delay-line network as shown in Fig. 5. This tapped delay line permits adjustment of gain and phase as desired at a number of frequencies over the band of interest. If the tap spacing is sufficiently close, this network approximates the ideal filter which would allow complete control of the gain and phase at each frequency in the passband.
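In discrete time, a tapped delay line with adjustable tap weights is an FIR filter, and the gain and phase it presents at any frequency follow from its frequency response. A minimal sketch (the tap weights and tap spacing below are hypothetical values chosen only for illustration, not taken from the paper):

```python
import numpy as np

# Gain and phase of a tapped delay line (FIR filter) at frequency f:
# H(f) = sum_k h[k] * exp(-j*2*pi*f*k*T), where T is the tap spacing.
h = np.array([0.2, -0.1, 0.4, -0.1, 0.2])   # hypothetical tap weights
T = 1e-3                                     # hypothetical tap spacing, seconds

def response(f):
    k = np.arange(len(h))
    return np.sum(h * np.exp(-2j * np.pi * f * k * T))

H = response(100.0)
gain, phase = np.abs(H), np.angle(H)            # adjustable per frequency
print(round(float(np.abs(response(0.0))), 3))   # 0.6, the sum of taps at DC
```

With closer tap spacing (more taps per unit time), the attainable gain/phase profile over the band approaches that of the ideal filter mentioned above.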
`
ADAPTIVE SIGNAL PROCESSORS

Once the form of network connected to each antenna element has been chosen, as shown for example in Fig. 4 or Fig. 5, the next step is to develop an adaptation procedure which can be used to adjust automatically the multiplying weights to achieve the desired spatial and frequency filtering.
`
`
The procedure should produce a given array gain in the specified look direction while simultaneously nulling out interfering noise sources.
Fig. 6 shows an adaptive signal-processing element. If this element were combined with an output-signal quantizer, it would then comprise an adaptive threshold logic unit. Such an element has been called an "Adaline"[13] or a threshold logic unit (TLU).[14] Applications of the adaptive threshold element have been made in pattern-recognition systems and in experimental adaptive control systems.[2],[3],[14]-[17]
In Fig. 6 the input signals x₁(t), ⋯, xᵢ(t), ⋯, xₙ(t) are the same signals that are applied to the multiplying weights w₁, ⋯, wᵢ, ⋯, wₙ shown in Fig. 4 or Fig. 5. The heavy lines show the paths of signal flow; the lighter lines show functions related to weight-changing or adaptation processes.
The output signal s(t) in Fig. 6 is the weighted sum

s(t) = ∑_{i=1}^{n} xᵢ(t)wᵢ   (7)

where n is the number of weights; or, using vector notation,

s(t) = WᵀX(t)   (8)

where Wᵀ is the transpose of the weight vector

W ≜ [w₁, ⋯, wᵢ, ⋯, wₙ]ᵀ

and the signal-input vector is

X(t) ≜ [x₁(t), ⋯, xᵢ(t), ⋯, xₙ(t)]ᵀ.
`
For digital systems, the input signals are in discrete-time sampled-data form and the output is written

s(j) = WᵀX(j)   (9)

where the index j indicates the jth sampling instant.
In order that adaptation take place, a "desired response" signal, d(t) when continuous or d(j) when sampled, must be supplied to the adaptive element. A method for obtaining this signal for adaptive antenna array processing will be discussed in a following section.
The difference between the desired response and the output response forms the error signal ε(j):

ε(j) = d(j) − WᵀX(j).   (10)
`
Fig. 6. Basic adaptive element.
`
This signal is used as a control signal for the "weight adjustment circuits" of Fig. 6.
`
Solving Simultaneous Equations

The purpose of the adaptation or weight-changing processes is to find a set of weights that will permit the output response of the adaptive element at each instant of time to be equal to or as close as possible to the desired response. For each input-signal vector X(j), the error ε(j) of (10) should be made as small as possible.
Consider the finite set of linear simultaneous equations

WᵀX(1) = d(1)
WᵀX(2) = d(2)
  ⋮
WᵀX(N) = d(N)   (11)

where N is the total number of input-signal vectors; each vector is a measurement of an underlying n-dimensional random process. There are N equations, corresponding to N instants of time at which the output response values are of concern; there are n "unknowns," the n weight values which form the components of W. The set of equations (11) will usually be overspecified and inconsistent, since in the present application, with an ample supply of input data, it is usual that N ≫ n. [These equations did have a solution in the simple example represented in Fig. 3. The solution is given in (6). Although the simultaneous equations (3) in that example appear to be different from (11), they are really the same, since those in (3) are in a specialized form for the case when all inputs are deterministic sinusoids which can be easily specified over all time in terms of amplitudes, phases, and frequencies.]
When N is very large compared to n, one is generally interested in obtaining a solution of a set of N equations [each equation in the form of (10)] which minimizes the sum of the squares of the errors. That is, a set of weights W is found to minimize

∑_{j=1}^{N} ε²(j).   (12)
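For illustration, the overspecified system (11) can be solved in the least-squares sense of (12) with a standard present-day solver (all numerical values below are synthetic assumptions):

```python
import numpy as np

# Least-squares solution of W^T X(j) = d(j), j = 1..N, minimizing (12).
rng = np.random.default_rng(0)
n, N = 4, 200                        # n weights, N >> n equations
X = rng.standard_normal((N, n))      # row j holds X(j)^T
w_true = np.array([0.5, 0.5, 0.5, -0.5])
d = X @ w_true + 0.01 * rng.standard_normal(N)   # desired responses (noisy)

W, *_ = np.linalg.lstsq(X, d, rcond=None)
print(np.allclose(W, w_true, atol=0.05))         # True
```

With N ≫ n no weight vector satisfies every equation exactly, but the least-squares W comes close to the weights that generated the data.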
`
`
When the input signals can be regarded as stationary stochastic variables, one is usually interested in finding a set of weights to minimize mean-square error. The quantity of interest is then the expected value of the square of the error, i.e., the mean-square error, given by

E[ε²(j)].   (13)

The set of weights that minimizes mean-square error can be calculated by squaring both sides of (10), which yields

ε²(j) = d²(j) + WᵀX(j)Xᵀ(j)W − 2d(j)WᵀX(j)   (14)

and then taking the expected value of both sides of (14)

E[ε²(j)] = E[d²] + WᵀΦ(x, x)W − 2WᵀΦ(x, d)   (15)

where

Φ(x, x) ≜ E[X(j)Xᵀ(j)] = E [ x₁x₁  x₁x₂  ⋯  x₁xₙ
                             x₂x₁  x₂x₂  ⋯  x₂xₙ
                              ⋮
                             xₙx₁  xₙx₂  ⋯  xₙxₙ ]   (16)

and

Φ(x, d) ≜ E[X(j)d(j)] = E [ x₁d
                            x₂d
                             ⋮
                            xₙd ].   (17)
`
The symmetric matrix Φ(x, x) is a matrix of cross correlations and autocorrelations of the input signals to the adaptive element, and the column matrix Φ(x, d) is the set of cross correlations between the n input signals and the desired response signal.
The mean-square error defined in (15) is a quadratic function of the weight values. The components of the gradient of the mean-square-error function are the partial derivatives of the mean-square error with respect to the weight values. Differentiating (15) with respect to W yields the gradient ∇E[ε²], a linear function of the weights,

∇E[ε²] = 2Φ(x, x)W − 2Φ(x, d).   (18)

When the choice of the weights is optimized, the gradient is zero. Then

Φ(x, x)W_LMS = Φ(x, d)
W_LMS = Φ⁻¹(x, x)Φ(x, d).   (19)

The optimum weight vector W_LMS is the one that gives the least mean-square error. Equation (19) is the Wiener-Hopf equation, and is the equation for the multichannel least-squares filter used by Burg[18] and Claerbout[19] in the processing of digital seismic array data.
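The Wiener solution (19) can be sketched with sample estimates of the correlation matrices (synthetic stationary data; the "optimum" weight vector below is an assumption used to generate the data):

```python
import numpy as np

# Estimate Phi(x, x) and Phi(x, d) from finite samples, then solve (19).
rng = np.random.default_rng(1)
n, N = 4, 5000
X = rng.standard_normal((N, n))              # rows are X(j)^T
w_opt = np.array([1.0, -0.5, 0.25, 0.0])     # assumed underlying weights
d = X @ w_opt + 0.1 * rng.standard_normal(N)

Phi_xx = (X.T @ X) / N                       # sample estimate of E[X X^T]
Phi_xd = (X.T @ d) / N                       # sample estimate of E[X d]
W_lms = np.linalg.solve(Phi_xx, Phi_xd)      # W_LMS = Phi^-1(x,x) Phi(x,d)
print(np.allclose(W_lms, w_opt, atol=0.05))  # True
```

As the text notes, only finite-sample estimates of Φ(x, x) and Φ(x, d) are ever available in practice, so this direct solution is approximate and must be recomputed as the statistics drift.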
One way of finding the optimum set of weight values is to solve (19). This solution is generally straightforward, but presents serious computational problems when the number of weights n is large and when data rates are high. In addition to the necessity of inverting an n × n matrix, this method may require as many as n(n + 1)/2 autocorrelation and cross-correlation measurements to obtain the elements of Φ(x, x). Furthermore, this process generally needs to be continually repeated in most practical situations where the input signal statistics change slowly. No perfect solution of (19) is possible in practice because an infinite statistical sample would be required to estimate perfectly the elements of the correlation matrices.
Two methods for finding approximate solutions to (19) will be presented in the following. Their accuracy is limited by statistical sample size, since they find weight values based on finite-time measurements of input-data signals. These methods do not require explicit measurements of correlation functions or matrix inversion. They are based on gradient-search techniques applied to mean-square-error functions. One of these methods, the LMS algorithm, does not even require squaring, averaging, or differentiation in order to make use of gradients of mean-square-error functions. The second method, a relaxation method, will be discussed later.
`
The LMS Algorithm

A number of weight-adjustment procedures or algorithms exist which minimize the mean-square error. Minimization is usually accomplished by gradient-search techniques. One method that has proven to be very useful is the LMS algorithm.[1]-[3],[17] This algorithm is based on the method of steepest descent. Changes in the weight vector are made along the direction of the estimated gradient vector. Accordingly,

W(j + 1) = W(j) + kₛ∇̂(j)   (20)

where

W(j) ≜ weight vector before adaptation
W(j + 1) ≜ weight vector after adaptation
kₛ ≜ scalar constant controlling rate of convergence and stability (kₛ < 0)
∇̂(j) ≜ estimated gradient vector of ε² with respect to W.
`
One method for obtaining the estimated gradient of the mean-square-error function is to take the gradient of a single time sample of the squared error

∇̂(j) = ∇[ε²(j)] = 2ε(j)∇[ε(j)].

From (10)

∇[ε(j)] = ∇[d(j) − Wᵀ(j)X(j)] = −X(j).

Thus

∇̂(j) = −2ε(j)X(j).   (21)

The gradient estimate of (21) is unbiased, as will be shown by the following argument. For a given weight vector W(j),
`
`
the expected value of the gradient estimate is

E[∇̂(j)] = −2E[{d(j) − Wᵀ(j)X(j)}X(j)]
= −2[Φ(x, d) − Φ(x, x)W(j)].   (22)

Comparing (18) and (22), we see that

E[∇̂(j)] = ∇E[ε²]

and therefore, for a given weight vector, the expected value of the estimate equals the true value.
Using the gradient estimation formula given in (21), the weight iteration rule (20) becomes

W(j + 1) = W(j) − 2kₛε(j)X(j)   (23)

and the next weight vector is obtained by adding to the present weight vector the input vector scaled by the value of the error.
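The update rule (23) is easy to simulate. The sketch below (synthetic inputs and an assumed target weight vector, chosen only for the demonstration) shows the weights approaching the optimum with no correlation measurements and no matrix inversion:

```python
import numpy as np

# LMS weight adaptation (23): W(j+1) = W(j) - 2*ks*eps(j)*X(j), with ks < 0.
rng = np.random.default_rng(2)
n = 4
w_opt = np.array([0.5, -0.25, 1.0, 0.0])     # assumed optimum for the demo
W = np.zeros(n)
ks = -0.005                                  # negative; within stability bounds

for j in range(20000):
    X = rng.standard_normal(n)               # input vector X(j), used once
    d = w_opt @ X                            # desired response d(j)
    eps = d - W @ X                          # error, as in (10)
    W = W - 2 * ks * eps * X                 # LMS update (23)

print(np.allclose(W, w_opt, atol=0.02))      # True
```

Note that each input vector is used once, as it occurs, and only the current weight values are stored, which mirrors the storage claims made in the Introduction.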
The LMS algorithm is given by (23). It is directly usable as a weight-adaptation formula for digital systems. Fig. 7(a) shows a block-diagram representation of this equation in terms of one component wᵢ of the weight vector W. An equivalent differential equation which can be used in analog implementation of continuous systems (see Fig. 7(b)) is given by

(d/dt)W(t) = −2kₛε(t)X(t).

This equation can also be written as

W(t) = −2kₛ ∫₀ᵗ ε(σ)X(σ) dσ.

Fig. 8 shows how circuitry of the type indicated in Fig. 7(a) or (b) might be incorporated into the implementation of the basic adaptive element of Fig. 6.
`
Convergence of the Mean of the Weight Vector

For the purpose of the following discussion, we assume that the time between successive iterations of the LMS algorithm is sufficiently long so that the sample input vectors X(j) and X(j + 1) are uncorrelated. This assumption is common in the field of stochastic approximation.[20]-[22] Because the weight vector W(j) is a function only of the input vectors X(j − 1), X(j − 2), ⋯, X(0) [see (23)] and because the successive input vectors are uncorrelated, W(j) is independent of X(j). For stationary input processes meeting this condition, the expected value E[W(j)] of the weight vector after a large number of iterations can then be shown to converge to the Wiener solution given by (19). Taking the expected value of both sides of (23), we obtain a difference equation in the expected value of the weight vector

E[W(j + 1)] = E[W(j)] − 2kₛE[{d(j) − Wᵀ(j)X(j)}X(j)]
= [I + 2kₛΦ(x, x)]E[W(j)] − 2kₛΦ(x, d)   (24)

where I is the identity matrix. With an initial weight vector W(0), j + 1 iterations of (24) yield
`
Fig. 7. Block diagram representation of LMS algorithm. (a) Digital realization. (b) Analog realization.

Fig. 8. Analog/digital implementation of LMS weight-adjustment algorithm.
`
E[W(j + 1)] = [I + 2kₛΦ(x, x)]^{j+1} W(0) − 2kₛ ∑_{i=0}^{j} [I + 2kₛΦ(x, x)]^i Φ(x, d).   (25)

Equation (25) may be put in diagonal form by using the appropriate similarity transformation Q for the matrix Φ(x, x), that is,

Φ(x, x) = Q⁻¹EQ
`
where

E ≜ diag(e₁, e₂, ⋯, eₙ)

is the diagonal matrix of eigenvalues. The eigenvalues are all positive, since Φ(x, x) is positive definite [see (16)]. Equation (25) may now be expressed as

E[W(j + 1)] = [I + 2kₛQ⁻¹EQ]^{j+1} W(0) − 2kₛ ∑_{i=0}^{j} [I + 2kₛQ⁻¹EQ]^i Φ(x, d)
= Q⁻¹[I + 2kₛE]^{j+1} QW(0) − 2kₛQ⁻¹ ∑_{i=0}^{j} [I + 2kₛE]^i QΦ(x, d).   (26)
`
`
Consider the diagonal matrix [I + 2kₛE]. As long as its diagonal terms are all of magnitude less than unity,

lim_{j→∞} [I + 2kₛE]^{j+1} → 0

and the first term of (26) vanishes as the number of iterations increases. The second term in (26) generally converges to a nonzero limit. The summation factor ∑_{i=0}^{j} [I + 2kₛE]^i becomes

lim_{j→∞} ∑_{i=0}^{j} [I + 2kₛE]^i = −(1/(2kₛ))E⁻¹

where the formula for the sum of a geometric series has been used, that is,

∑_{i=0}^{∞} (1 + 2kₛeₚ)^i = 1/(1 − (1 + 2kₛeₚ)) = −1/(2kₛeₚ).

Thus, in the limit, (26) becomes

lim_{j→∞} E[W(j + 1)] = Q⁻¹E⁻¹QΦ(x, d) = Φ⁻¹(x, x)Φ(x, d).   (29)
`
It is the opinion of the authors that the assumption of independent successive input samples used in the foregoing convergence proof is overly restrictive. That is, convergence of the mean of the weight vector to the LMS solution can be achieved under conditions of highly correlated input samples. In fact, the computer-simulation experiments described in this paper do not satisfy the condition of independence.
`
Time Constants and Learning Curve with LMS Adaptation

State-variable methods, which are widely used in modern control theory, have been applied by Widrow[1] and Koford and Groner[2] to the analysis of stability and time constants (related to rate of convergence) of the LMS algorithm. Considerable simplifications in the analysis have been realized by expressing transient phenomena of the system adjustments (which take place during the adaptation process) in terms of the normal coordinates of the system. As shown by Widrow,[1] the weight values undergo transients during adaptation. The transients consist of sums of exponentials with time constants given by

τₚ = 1/(2(−kₛ)eₚ),  p = 1, 2, ⋯, n

where eₚ is the pth eigenvalue of the input-signal correlation matrix Φ(x, x).
In the special case when all eigenvalues are equal, all time constants are equal. Accordingly,

τ = 1/(2(−kₛ)e).   (30)
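The time-constant formula can be evaluated per normal coordinate; a brief sketch with an assumed correlation matrix and adaptation constant:

```python
import numpy as np

# Time constants of the LMS weight transients: tau_p = 1/(2*(-ks)*e_p),
# where e_p are the eigenvalues of Phi(x, x). Example values assumed:
Phi_xx = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
ks = -0.01
e = np.linalg.eigvalsh(Phi_xx)       # positive eigenvalues (positive definite)
tau = 1.0 / (2 * (-ks) * e)          # one time constant per normal coordinate
print(np.all(e > 0), tau)
```

A large eigenvalue spread produces widely differing time constants, so the slowest normal coordinate governs the overall settling time.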
`
`
Comparison of this result with (19) shows that as the number of iterations increases without limit, the expected value of the weight vector converges to the Wiener solution.
Convergence of the mean of the weight vector to the Wiener solution is insured if and only if the proportionality constant kₛ is set within certain bounds. Since the diagonal terms of [I + 2kₛE] must all have magnitude less than unity, and since all eigenvalues in E are positive, the bounds on kₛ are given by

|1 + 2kₛe_max| < 1

or

−1/e_max < kₛ < 0   (27)

where e_max is the maximum eigenvalue of Φ(x, x). This convergence condition on kₛ can be related to the total input power as follows. Since

e_max ≤ trace [Φ(x, x)]   (28)

where

trace [Φ(x, x)] ≜ E[Xᵀ(j)X(j)] = ∑_{i=1}^{n} E[xᵢ²] ≜ total input power,

it follows that satisfactory convergence can be obtained with

−1 / ∑_{i=1}^{n} E[xᵢ²] < kₛ < 0.

In practice, when slow, precise adaptation is desired, kₛ is usually chosen such that its magnitude is much smaller than this bound.
`
One very useful way to monitor the progress of an adaptive process is to plot or display its "learning curve." When mean-square error is the performance criterion being used, one can plot the expected mean-square error at each stage of the learning process as a function of the number of adaptation cycles. Since the underlying relaxation phenomenon which takes place in the weight values is of exponential nature, and since from (15) the mean-square error is a quadratic form in the weight values, the transients in the mean-square-error function must also be exponential in nature.
When all the time constants are equal, the mean-square-error learning curve is a pure exponential with a time constant

τ_mse = τ/2 = 1/(4(−kₛ)e).

The basic reason for this is that the square of an exponential function is an exponential with half the time constant.
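A learning curve of this kind is easy to produce by simulation. The sketch below (assumed target weights; unit-power white inputs so that all eigenvalues of the correlation matrix equal one) averages the squared error over independent runs and exhibits the exponential decay described above:

```python
import numpy as np

# Learning curve of LMS: with equal eigenvalues (white inputs, e = 1),
# the ensemble-average squared error decays exponentially.
rng = np.random.default_rng(3)
n, steps, trials = 4, 300, 400
w_opt = np.array([0.5, -0.25, 1.0, 0.0])     # assumed optimum for the demo
ks = -0.01
mse = np.zeros(steps)
for _ in range(trials):                      # average over independent runs
    W = np.zeros(n)
    for j in range(steps):
        X = rng.standard_normal(n)           # unit-power white inputs
        d = w_opt @ X
        eps = d - W @ X
        mse[j] += eps**2 / trials
        W = W - 2 * ks * eps * X             # LMS update (23)
print(mse[-1] < mse[0] / 10)                 # True: error has relaxed
```

Fitting an exponential to the averaged curve would recover a time constant near 1/(4(−kₛ)e) adaptation cycles, in line with the formula above.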
`
`
`Estimation of the rate of adap