`
`Klaus Uwe Simmer
`Sven Fischer
` University of Bremen
`Department of Physics and Electrical Engineering
`P.O. Box 330 440, D–28334 Bremen, Germany
`e-mail: fischer@comm.uni-bremen.de
` Houpert Digital Audio
`Wiener Str. 5, D–28359 Bremen, Germany, Fax: +49 421 705675
`
`2. CONSTRAINED ADAPTIVE
`BEAMFORMING WITH ADAPTIVE
`LOOK–DIRECTION RESPONSE
The task of constrained minimum variance beamforming is to minimize the total output power of the array subject to the constraint of preserving an a priori specified impulse response in the look direction [4]. In beamforming for speech applications, however, the performance of the array can be improved by allowing the look direction impulse response to vary with time. As the adaptive look direction response we use the impulse response of a non–causal Wiener filter. The implementation is straightforward if we choose a generalized sidelobe cancelling structure as shown in figure 1. The constraints are included in the signal blocking matrix, so an unconstrained update algorithm can be used [5]. In our application we use the short time Fourier transform and the overlap–add method to estimate the transfer functions. The transfer functions $H_i$ of the sidelobe cancelling part (lower signal track in figure 1) are given by [8]:
`
$$H_i(z) = \frac{\Phi_{\psi_i y_w}(z)}{\Phi_{\psi_i \psi_i}(z)}, \qquad i = 1, 2, \ldots, L \qquad (1)$$

where $\psi_i$ denotes the $i$-th output signal of the blocking matrix, $y_w$ the postfiltered output of the conventional beamformer, and the number $L$ depends on the blocking matrix structure ($L$ can be $M-1$ or less, where $M$ is the number of microphones). The cross power density spectrum $\Phi_{\psi_i y_w}$ and the auto power density spectrum $\Phi_{\psi_i \psi_i}$ of equation (1) are estimated using the recursive update formulas:
`
`ABSTRACT
`In this paper we present an adaptive microphone ar-
`ray to suppress coherent as well as incoherent noise in
`disturbed speech signals. We use a generalized side-
`lobe cancelling (GSC) structure implemented in the fre-
`quency domain, because it allows a separate handling of
`determining the adaptive look direction response to sup-
`press incoherent noise and adjusting the adaptive filters
`for cancellation of coherent noise. The transfer function
`in the look direction is an adaptive Wiener–Filter which
`is estimated using the short time Fourier–Transform and
`the Nuttall/Carter method for spectrum estimation.
`
1. INTRODUCTION
`Various methods for noise reduction and speech en-
`hancement with the aid of microphone arrays have pre-
`viously been described in the literature. The differ-
`ent approaches can be classified into three main cate-
`gories:
• conventional beamforming [1], [2], [3],
• adaptive beamforming [4], [5],
• microphone arrays with adaptive postfiltering [6], [7].
`The performance of these array techniques for noise
`reduction depends on the acoustical environment in
`which they have to operate. For example, adaptive
`beamforming works well if the number of point noise
`sources is smaller than the number of sensors. How-
`ever, in closed environments, noise is influenced by
`multipath propagation and reverberation which yields
`a multi–source noise field. In such noise fields the third
`method yields a much better noise reduction perfor-
`mance, but theoretically requires a completely inco-
`herent noise field.
`Realistic noise fields are neither perfectly diffuse
`nor do they consist of direct–path noise only. The re-
`flection coefficients of the walls as well as the distance
`between the noise sources and the array determine the
`ratio of coherent and incoherent noise components re-
`ceived by a microphone array. Therefore, a practical
`system for noise reduction must operate independently
`of the correlation properties of the noise field.
`The method presented here is able to suppress co-
`herent (i.e. direct path) noise and incoherent (i.e. dif-
`fuse) noise and can be conceived as a unification of
`the above mentioned three array techniques for noise
`reduction.
`
$$\hat{\Phi}^{l}_{\psi_i y_w}(k) = \alpha\, \hat{\Phi}^{l-1}_{\psi_i y_w}(k) + \Psi_i^{*}(l,k)\, Y_w(l,k) \qquad (2)$$

$$\hat{\Phi}^{l}_{\psi_i \psi_i}(k) = \alpha\, \hat{\Phi}^{l-1}_{\psi_i \psi_i}(k) + |\Psi_i(l,k)|^2 \qquad (3)$$

where $k$ is the frequency index, $l$ is the time segment index, $\Psi_i(l,k)$ is the short time spectrum at the output of the signal blocking unit and $Y_w(l,k)$ is the postfiltered output spectrum of the conventional beamformer (see also figure 1). In equations (2) and (3), $\alpha$ is a number close to one that defines the averaging time. The GSC approach for noise reduction is closely related to the adaptive noise cancelling proposed by Widrow et al. [8]. The noise reduction which can be achieved by this type of processor is completely specified by the spatial coherence of the noise field. A reasonable noise reduction can be achieved if the noise signals at adjacent microphones are highly correlated [9]. Therefore, this part of our noise reduction system is able to suppress
`
`Proc. IWAENC-95, Røros, Norway, June 1995
adaptation of all the transfer functions takes place simultaneously.
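The frame-by-frame estimate of the look direction filter in equations (5) to (7) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `update_postfilter`, the value `alpha=0.95`, the use of the channel mean as the conventional beamformer, and taking the real part of the averaged cross spectra before forming the ratio are all choices made here. The limitation of the result to the range zero to one follows the bound on $\hat{W}$ described in the paper.

```python
import numpy as np


def update_postfilter(phi_ss, phi_xx, x, alpha=0.95, eps=1e-12):
    """One recursive update of equations (6) and (7), then the ratio of (5).

    phi_ss : complex array (K,), previous speech power spectrum estimate
    phi_xx : real array (K,), previous beamformer output power estimate
    x      : complex array (M, K), time-aligned microphone spectra of one frame
    """
    m = x.shape[0]
    # Assumption: the conventional beamformer is the mean over the channels.
    x_bar = x.mean(axis=0)

    # Sum of conj(X_i) * X_j over all sensor pairs with i < j.
    cross = np.zeros(x.shape[1], dtype=complex)
    for i in range(m - 1):
        for j in range(i + 1, m):
            cross += np.conj(x[i]) * x[j]

    phi_ss = alpha * phi_ss + 2.0 / (m * (m - 1)) * cross
    phi_xx = alpha * phi_xx + np.abs(x_bar) ** 2

    # Assumption: only the real part is used; the gain is bounded to [0, 1].
    w = np.clip(np.real(phi_ss) / np.maximum(phi_xx, eps), 0.0, 1.0)
    return phi_ss, phi_xx, w, x_bar
```

When every channel carries the same spectrum (a fully coherent field), the numerator and denominator coincide and the gain is one in every bin, which matches the statement that in this case the noise reduction is left entirely to the sidelobe cancelling path.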
2.2. Improvement of the Transfer Function Estimate $\hat{W}$
The identity in equation (4) holds only in a statistical sense. In practice, only estimates $\hat{\Phi}_{x_i x_j}$ of the spatial cross power densities are available, and the identity in equation (4) would hold exactly only for infinitely long periodogram time–averaging. Due to the nonstationary nature of speech signals, only short time intervals are available for spectrum estimation. Therefore, the transfer function $\hat{W}$ is only a rough estimate of the true Wiener filter.
To improve the estimate $\hat{W}$ we use the combined time and lag weighting technique for periodogram smoothing introduced by Nuttall and Carter [11], adapted to our application. The starting point is equations (6) and (7), which describe a short time Weighted Overlapped Segment Averaging (WOSA) method (excluding constant factors). These estimates are subjected to an inverse Fourier transform to yield the correlation function estimates $\hat{R}_{ss}$ and $\hat{R}_{xx}$, respectively. In the next step, the correlation estimates are multiplied by a symmetric real lag weighting function $w_{lag}$, which takes into account the windowing of the input data prior to the computation of the FFTs, and is calculated according to the following expression [11]:
`
$$w_{lag}(n) = \frac{w_d(n)\, R_{ww}(0)}{R_{ww}(n)} \qquad (8)$$
In equation (8), $w_d$ is the desired lag window (in our case a Hanning window of one fourth the FFT length, to perform the desired smoothing), $R_{ww}$ is the autocorrelation function of the data window, and $w_{lag}$ is the reshaped lag window. In a final step the weighted correlation estimates are transformed back into the frequency domain to yield the desired power density spectra used in determining $\hat{W}$ according to equation (5).
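The lag reshaping of equation (8) and the three-step smoothing (inverse transform, lag weighting, forward transform) can be sketched as below. This is a minimal sketch under assumptions not fixed by the text: the helper names `reshaped_lag_window` and `smooth_spectrum` are hypothetical, the data window is taken to be a Hanning window as long as the FFT, the desired lag window length is forced to be odd so that it is symmetric about lag zero, and the spectrum is treated as the full-length FFT of a real signal.

```python
import numpy as np


def reshaped_lag_window(data_window, desired_len):
    """Equation (8): w_lag(n) = w_d(n) * R_ww(0) / R_ww(n)."""
    if desired_len % 2 == 0:
        raise ValueError("desired_len must be odd to be symmetric about lag 0")
    n = len(data_window)
    # Linear autocorrelation of the data window; index n - 1 is lag 0.
    r_ww = np.correlate(data_window, data_window, mode="full")
    half = desired_len // 2
    lags = np.arange(-half, half + 1)
    w_d = np.hanning(desired_len)  # desired lag window, peak at lag 0
    w_lag = w_d * r_ww[n - 1] / r_ww[n - 1 + lags]
    return lags, w_lag


def smooth_spectrum(phi, lags, w_lag):
    """Weight the correlation estimate and return to the frequency domain."""
    nfft = len(phi)
    r = np.fft.ifft(phi)                 # correlation estimate (circular lags)
    w_circ = np.zeros(nfft)
    w_circ[lags % nfft] = w_lag          # negative lags wrap to the end
    return np.real(np.fft.fft(r * w_circ))


nfft = 256
lags, w_lag = reshaped_lag_window(np.hanning(nfft), nfft // 4 + 1)
smoothed = smooth_spectrum(np.ones(nfft), lags, w_lag)
```

Because $w_{lag}(0) = w_d(0) = 1$ for this choice, a flat spectrum passes through unchanged, while a fluctuating periodogram is smoothed across neighbouring frequency bins.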
It should be noted that the lag multiplication and Fourier transform can be replaced with a frequency domain convolution. But as stated in [11], the unusual lag weighting is important for achieving good mean behaviour and is unlikely to be apparent when considering a frequency domain convolution approach.
Finally, because a conventional beamformer perfectly cancels single frequencies in several directions, $\hat{W}$ is bounded between values of zero and one to avoid poles in the transfer function estimate.
`3. SPATIOTEMPORAL BLOCKING MATRIX
`In the Generalized Sidelobe Canceller the constraints
`are included in the signal blocking matrix. In its sim-
`plest form, the signal blocking is realized by taking the
`difference between adjacent sensor signals to yield (in
`the ideal case) noise only reference signals. The major
`drawback of this approach is that, if the desired speech
`signal is not blocked out completely at the input of the
`
Fig. 1. Block diagram of the noise reduction system. Block labels recoverable from the figure: steering delays $\tau_0, \tau_1, \ldots, \tau_{M-1}$, Window, FFT, Blocking Matrix, summation nodes, IFFT and OLA.
`
`the coherent direct path noise, but is inefficient for in-
`coherent noise.
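The sidelobe cancelling path of equations (1) to (3) can be sketched for one STFT frame as follows. This is a hedged illustration rather than the authors' code: the function name `update_sidelobe_filters` and the value `alpha=0.95` are assumptions, the complex conjugate is placed on the blocking matrix spectrum (the usual Wiener solution), and the magnitude limit reflects the paper's remark that the filter magnitudes are constrained between zero and one.

```python
import numpy as np


def update_sidelobe_filters(phi_cross, phi_auto, psi, y_w, alpha=0.95, eps=1e-12):
    """Recursive update of equations (2) and (3), then the ratio of (1).

    phi_cross : complex array (L, K), previous cross power spectrum estimates
    phi_auto  : real array (L, K), previous auto power spectrum estimates
    psi       : complex array (L, K), blocking matrix output spectra of one frame
    y_w       : complex array (K,), postfiltered beamformer spectrum of one frame
    """
    phi_cross = alpha * phi_cross + np.conj(psi) * y_w[np.newaxis, :]
    phi_auto = alpha * phi_auto + np.abs(psi) ** 2

    h = phi_cross / np.maximum(phi_auto, eps)
    # Keep the phase but limit the magnitude of each filter bin to one.
    mag = np.abs(h)
    h = h * np.minimum(1.0, 1.0 / np.maximum(mag, eps))
    return phi_cross, phi_auto, h


def gsc_output(y_w, h, psi):
    """Assumed standard GSC combination: upper path minus cancelling signal."""
    return y_w - np.sum(h * psi, axis=0)
```

If the noise in the upper path is an exact scaled copy of a single reference, for example `y_w = 0.5 * psi[0]`, one update from zero initial estimates already gives `h = 0.5` and an output of zero, which is the fully coherent case in which this path is most effective.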
`2.1. Transfer Function in Look Direction
The transfer function $W$ in look direction contains the constraint values and is designed to suppress spatially incoherent noise signals only. It is based on the assumption of a spatially white noise field, where in the ideal case of uncorrelated speech and noise the spatial cross power density spectrum $\Phi_{x_i x_j}(z)$ of the received signals equals the auto power density spectrum $\Phi_{ss}(z)$ of the desired speech signal [6]:

$$\Phi_{x_i x_j}(z) = \Phi_{ss}(z) \qquad (4)$$
This fact is utilized to estimate the Wiener filter $W$ in look direction. In our application the transfer function $\hat{W}$ is given by [10]:

$$\hat{W}(z) = \frac{\hat{\Phi}_{ss}(z)}{\Phi_{xx}(z)} = \frac{\dfrac{2}{M(M-1)} \displaystyle\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \Phi_{x_i x_j}(z)}{\Phi_{xx}(z)} \qquad (5)$$
$\Phi_{xx}(z)$ is the auto power density spectrum of the output signal $x$ of the conventional beamformer. It can be shown that this transfer function $\hat{W}$ is identical to the transfer function of a non–causal Wiener filter in the case of zero spatial correlation of the noise signals [10]. In the case of a completely coherent noise field the transfer function $\hat{W}$ equals one and the noise reduction is only due to the sidelobe cancelling path of the system shown in figure 1. The power density spectra in the numerator and denominator of equation (5) can be estimated in a manner similar to equations (2) and (3) from the short time spectra:
`
$$\hat{\Phi}^{l}_{ss}(k) = \alpha\, \hat{\Phi}^{l-1}_{ss}(k) + \frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} X_i^{*}(l,k)\, X_j(l,k) \qquad (6)$$

$$\hat{\Phi}^{l}_{xx}(k) = \alpha\, \hat{\Phi}^{l-1}_{xx}(k) + |X(l,k)|^2 \qquad (7)$$

The transfer functions $\hat{W}$ and $H_i$ are determined as the data arrives at the input microphones. Thus, the
`
`4. EXPERIMENTAL RESULTS
`4.1. Simulation Description
To test the noise reduction performance of the described system, a computer program has been developed which allows the acoustical properties of the enclosure to be changed easily. The input signals were generated by convolving one–channel anechoic recordings of speech and noise with the source–to–microphone impulse responses. These room impulse responses were simulated using the image method described by Allen and Berkley [13]. The room dimensions were 3.50 × 7.10 × 2.96 m and the wall reflection coefficients were varied to simulate different reverberation times, i.e. different ratios of direct path noise and diffuse noise. The desired speaker was positioned 50 cm in front of the array, and as noise source we used a hair–drier positioned 4.3 m away from the array center. The input SNR was 3 dB.
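The signal generation step can be sketched for a single microphone as follows. The room impulse responses are taken as given inputs (the image method itself is not reproduced here), and the function name `simulate_microphone` as well as the choice to define the SNR from the reverberated speech and noise powers at the microphone are assumptions of this sketch.

```python
import numpy as np


def simulate_microphone(speech, noise, h_speech, h_noise, snr_db):
    """Convolve anechoic signals with impulse responses and mix at a target SNR.

    Returns the mixture together with its speech and (scaled) noise components.
    """
    s = np.convolve(speech, h_speech)
    v = np.convolve(noise, h_noise)
    n = min(len(s), len(v))
    s, v = s[:n], v[:n]

    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    gain = np.sqrt(np.mean(s ** 2) / (np.mean(v ** 2) * 10.0 ** (snr_db / 10.0)))
    v = gain * v
    return s + v, s, v
```

Calling the function once per microphone with that microphone's pair of impulse responses yields the multichannel input; in that case the gain would normally be computed once, for a reference microphone, and reused for the others so that the level differences between channels are preserved.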
`4.2. Choosing the Array Aperture
An array of discrete sensors can be conceived as a sampled continuous aperture. If the sampling period is not chosen appropriately, this sampling introduces spatial aliasing in the form of grating lobes [14], [15]. On the other hand, the estimation of the transfer function $W$ in look direction assumes a spatially white noise field. In practice the noise field can at best be diffuse, with a spatial coherence function given by a sinc function. To obtain spatially uncorrelated noise signals, the continuous aperture is usually undersampled. This works well for purely diffuse noise fields and if the desired speaker is close to the array. Our proposed system for noise reduction is in principle an adaptive beamformer which includes the method proposed in [7] as a special case. An undersampled aperture yields poor system performance in the case of direct path noise.
`We used a seven element linear, equally spaced ar-
`ray with 5 cm inter–element spacing and total aperture
`length of 35 cm to avoid spatial aliasing in the fre-
`quency band below 3400 Hz. Experiments with var-
`ious sensor configurations led us to revert to the linear
`array, which yielded the best performance under the
`constraint of a maximal number of seven sensors.
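The trade-off behind the 5 cm spacing can be checked numerically. The sketch below assumes a speed of sound of 343 m/s, the half-wavelength rule $d \le \lambda/2$ for a uniformly spaced linear array, and the standard diffuse-field coherence $\sin(2\pi f d/c)/(2\pi f d/c)$ between two omnidirectional sensors; the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def alias_free_limit_hz(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency without grating lobes for spacing d: f = c / (2 d)."""
    return c / (2.0 * spacing_m)


def diffuse_coherence(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Spatial coherence of an ideal diffuse field between two sensors.

    np.sinc(x) is sin(pi x) / (pi x), so the argument is 2 f d / c.
    """
    return np.sinc(2.0 * np.asarray(freq_hz, dtype=float) * spacing_m / c)


limit = alias_free_limit_hz(0.05)             # about 3430 Hz for 5 cm spacing
coherence_1k = diffuse_coherence(1000.0, 0.05)
```

With 5 cm spacing the aliasing limit lies at about 3430 Hz, consistent with the band below 3400 Hz quoted above. The first zero of the diffuse-field coherence falls at the same frequency, so within the speech band the noise at adjacent sensors is still partly correlated, which is the price paid for not undersampling the aperture.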
`A better performance is expected by splitting the ar-
`ray in subarrays [3] and performing the noise reduction
`in each subarray with the system shown in figure 1, and
`combining the outputs. However, this will increase the
`cost.
`4.3. Performance Measure
`In speech communications, the ultimate recipient of
`information is the human being. The artefacts gen-
`erated by many speech enhancement techniques de-
`crease the user acceptance for voice communication
`systems. Therefore, for performance evaluation, sub-
`jective listening tests are absolutely necessary. But be-
`cause this is time and cost intensive, we used for per-
`formance evaluation the Log Area Ratio (LAR) Dis-
`tance (L norm without energy weighting) as objective
`measure for speech quality which is found to correlate
`well with the subjective sensation [16].
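For orientation, a log area ratio distance between two frames can be sketched as below, starting from reflection (PARCOR) coefficients that are assumed to have been obtained already by linear prediction analysis. The order of the norm is not legible in this copy, so it is left as the parameter `p`; the sign convention of the log area ratio, the averaging over coefficients and the function names are likewise assumptions of this sketch.

```python
import numpy as np


def log_area_ratios(reflection):
    """Log area ratios log((1 + k) / (1 - k)) for reflection coefficients |k| < 1."""
    k = np.asarray(reflection, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))


def lar_distance(reflection_ref, reflection_test, p=2):
    """Unweighted L_p distance between the log area ratios of two frames."""
    diff = np.abs(log_area_ratios(reflection_ref) - log_area_ratios(reflection_test))
    return np.mean(diff ** p) ** (1.0 / p)
```

A per-utterance score would then be obtained by averaging the frame distances between the clean reference and the processed signal; lower values indicate a closer spectral envelope.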
`
Fig. 2. Magnitude response of the lowpass filter for the blocking matrix output signals $\psi_i$, used to increase the noise reduction performance in the low temporal frequency region. Axes: frequency (Hertz, 0 to 4000) against magnitude response (dB, −10 to 30).
`
noise cancelling filters $H_i$, the filters will adapt to the desired speech signal and, as a consequence, the latter will be partially cancelled in the output signal $y$. In beamforming for speech applications, this signal cancellation is often observed.
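The simplest blocking matrix described above, the difference of adjacent sensor signals, can be written down directly. The sketch below uses a hypothetical helper `difference_blocking_matrix` and shows both the ideal case and the leakage that a small steering error causes; the 20 microsecond delay error per sensor is an arbitrary illustrative value.

```python
import numpy as np


def difference_blocking_matrix(m):
    """(M-1) x M matrix whose rows take the difference of adjacent sensors."""
    b = np.zeros((m - 1, m))
    idx = np.arange(m - 1)
    b[idx, idx] = 1.0
    b[idx, idx + 1] = -1.0
    return b


b = difference_blocking_matrix(7)

# Perfectly steered desired signal: identical in all channels, fully blocked.
blocked = b @ np.ones(7)

# Steering error: a residual delay of 20 microseconds per sensor at 2 kHz
# leaves a nonzero desired-signal component in every reference channel.
phase = np.exp(-2j * np.pi * 2000.0 * 20e-6 * np.arange(7))
leakage = np.abs(b @ phase)
```

Each row sums to zero, so a signal that is identical in all channels after steering is removed, while any misalignment lets part of the desired speech through to the adaptive filters.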
The rows of the blocking matrix can be interpreted as fixed beamformers, each of them forming a spatial null in the look direction [5]. Based on this interpretation, a blocking matrix using a spatial filtering technique was proposed in [12] to broaden the look direction and therefore prevent the adaptive filters $H_i$ from cancelling signals coming from an area around the focal point. This approach can reduce the signal cancellation due to steering delay errors or widespread signal sources, but in general requires many sensors.
`To increase the overall performance of the noise re-
`duction system shown in figure 1, we propose in addi-
`tion to the spatial filter approach a temporal filtering in
`the blocking matrix. The motivation behind this is as
`follows:
The low frequency components of the noise field can be suppressed neither by the conventional beamformer nor by the postfilter $\hat{W}$, because of the good spatial correlation in this frequency region. The summation of the array signals has the effect of a temporal lowpass filter on the noise signals. On the other hand, the low temporal frequencies of the noise field will be attenuated at the output of the blocking matrix. Therefore, the signal blocking has the effect of a temporal highpass filter on the array signals. The transformation filters $H_i$ have to compensate for this opposed behaviour to form a proper cancelling signal $Y_S(k)$. The transformation filters are theoretically given by equation (1) for time stationary signals. But in practice, only estimates are available and the filter order is limited. There is always a potential for mismatch in the transfer functions $H_i$, and because these filters have to operate over a large range of gain values, a mismatch can result in a very distorted output signal. Therefore, we include a fixed temporal lowpass filter in the blocking matrix, with the effect that the low frequency components in the blocking matrix output signals $\psi_i$ will be emphasized. The lowpass filter used, whose magnitude response is shown in figure 2, is a one–pole filter with transfer function $G(z) = 1/(1 - a z^{-1})$; the numerical value of $a$ is illegible in this copy. To avoid poles in the estimated transfer functions $H_i$ due to zero power densities, their magnitudes were in addition constrained between values of zero and one.
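A one–pole lowpass of the form $G(z) = 1/(1 - a z^{-1})$ and its magnitude response can be sketched as follows. The coefficient used in the paper is not legible here, so `a = 0.9` is purely an illustrative value, and the sampling rate of 8 kHz is inferred from the 0 to 4000 Hz frequency axis of figure 2; both are assumptions, as are the function names.

```python
import numpy as np


def one_pole_lowpass(x, a):
    """Filter x with G(z) = 1 / (1 - a z^-1): y[n] = x[n] + a * y[n-1]."""
    y = np.empty(len(x), dtype=float)
    state = 0.0
    for n, sample in enumerate(x):
        state = sample + a * state
        y[n] = state
    return y


def one_pole_magnitude_db(freq_hz, fs_hz, a):
    """Magnitude response of G(z) in dB at the given frequencies."""
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / fs_hz
    return -20.0 * np.log10(np.abs(1.0 - a * np.exp(-1j * omega)))


a = 0.9        # illustrative only, not the value used in the paper
fs = 8000.0    # assumed sampling rate
dc_gain_db = one_pole_magnitude_db(0.0, fs, a)         # 20 dB for a = 0.9
nyquist_gain_db = one_pole_magnitude_db(fs / 2, fs, a)
```

The filter boosts low frequencies by $1/(1-a)$ and attenuates the Nyquist frequency by $1/(1+a)$, which counteracts the highpass character of the adjacent-sensor differences and reduces the gain range the filters $H_i$ have to cover.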
`
`REFERENCES
`[1] J.L. Flanagan, J.D. Johnston, R. Zahn, and G.W. Elko,
`“Computer–steered microphone arrays for sound
`transduction in large rooms,” J. Acoust. Soc. Amer.,
`vol. 78, no. 5, pp. 1508–1518, Nov. 1985.
`[2] W. Kellermann, “A self–steering digital microphone
`array,” in Proc. of
`the Internat. Conference on
`Acoustics, Speech and Signal Processing ICASSP–91,
`pp. 3581–3584, 1991.
[3] Y. Mahieux, G. Le Tourneur, A. Gilloire, A. Saliou, and J.P. Jullien, “A microphone array for multimedia workstations,” in Proc. of the 3rd International Workshop on Acoustic Echo Control, (Plestin les Grèves, France), pp. 145–149, Sep. 1993.
`[4] O.L. Frost, “An algorithm for linearly constrained
`adaptive array processing,” Proc. IEEE, vol. 60, no. 8,
`pp. 926–935, Aug. 1972.
`[5] L.J. Griffiths and C.W. Jim, “An alternative approach
`to linearly constrained adaptive beamforming,” IEEE
`Trans. Antennas Propagat., vol. AP-30, no. 1, pp. 27–
`34, Jan. 1982.
`[6] R. Zelinski, “A microphone array with adaptive post–
`filtering for noise reduction in reverberant rooms,”
`in Proc. of the Internat. Conference on Acoustics,
`Speech and Signal Processing ICASSP–88, (New
`York), pp. 2578–2581, Apr. 1988.
`[7] K.U. Simmer and A. Wasiljeff, “Adaptive microphone
`arrays for noise suppression in the frequency domain,”
`in Second Cost 229 Workshop on Adaptive Algorithms
`in Communications, (Bordeaux, France), pp. 185–
`194, 30.9.–2.10 1992.
[8] B. Widrow, J.R. Glover, J.M. McCool, J. Kaunitz, Ch.S. Williams, R.H. Hearn, J.R. Zeidler, E. Dong, and R.C. Goodlin, “Adaptive noise cancelling: Principles and applications,” Proc. IEEE, vol. 63, no. 12, pp. 1692–1716, Dec. 1975.
[9] W. Armbrüster, R. Czarnach, and P. Vary, “Adaptive noise cancellation with reference input – possible applications and theoretical limits,” in Proc. European Signal Processing Conf. EUSIPCO–86, (The Hague), pp. 391–394, Sep. 1986.
[10] K.U. Simmer, S. Fischer, and A. Wasiljeff, “Suppression of coherent and incoherent noise using a microphone array,” Annals of Telecommunications, vol. 49, no. 7/8, pp. 439–446, 1994.
`[11] A. H. Nuttall and G.C. Carter, “Spectral estimation us-
`ing combined time and lag weighting,” Proc. IEEE,
`vol. 70, no. 9, pp. 1115–1125, Sep. 1982.
`[12] I. Claesson and S. Nordholm, “A spatial filtering ap-
`proach to robust adaptive beamforming,” IEEE Trans.
`Antennas Propagat., vol. 40, no. 9, pp. 1093–1096,
`Sep. 1992.
`[13] J.B. Allen and D.A. Berkley, “Image method for ef-
`ficiently simulating small–room acoustics,” J. Acoust.
`Soc. Amer., vol. 65, no. 4, pp. 943–950, Apr. 1979.
`[14] Don H. Johnson and Dan E. Dudgeon, Array Signal
`Processing — Concepts and Techniques. Englewood
`Cliffs: Prentice Hall, 1993.
`[15] L.J. Ziomek, Fundamentals of Acoustic Field Theory
`and Space–Time Signal Processing. Boca Raton: CRC
`Press, 1995.
[16] S.R. Quackenbush, T.P. Barnwell, and M.A. Clements, Objective Measures of Speech Quality. Englewood Cliffs: Prentice Hall, 1988.
`
Fig. 3. Log Area Ratio Distance as a function of the reverberation time of the enclosure (100 to 500 msec). Solid line: input LAR, dotted line: output LAR, dashed line: output LAR without temporal filtering in the blocking matrix.
`
`4.4. Results
Figure 3 shows the LAR improvement as a function of Sabine's reverberation time $T$ (a low LAR corresponds to high speech quality). The solid line shows the input LAR and the dotted line shows the output LAR of the proposed noise reduction system. We can deduce from figure 3 that the proposed method works well for a large range of reverberation times and is therefore able to operate independently of the acoustical properties of the enclosure. The speech quality is considerably increased at the output of the noise reduction system. The dashed line in figure 3 shows the LAR at the output of the noise reduction system without the temporal lowpass filter in the blocking matrix. The comparison shows that the overall performance can be increased by the proposed spatiotemporal signal blocking matrix, especially for reverberation times below 400 msec.
`
`5. CONCLUSION
`In this paper we proposed a noise reduction system for
`suppression of coherent and incoherent noise in dis-
`turbed speech signals which is based on a Generalized
Sidelobe Canceller with two adaptive portions. Improvements of the estimation of the adaptive transfer
`function in look direction and the design of the signal
`blocking matrix were given. The experimental results
`demonstrated that the proposed method works well for
`a large range of reverberation times and is therefore
`able to operate independently of the acoustical proper-
`ties of the enclosure.
`
`ACKNOWLEDGEMENT
`The authors would like to thank Mr. E. Ochieng–
`Ogolla of the University of Bremen for his helpful ad-
`vice and suggestions that contributed to the improve-
`ment of this paper.
`