James M. Kates a) and Mark R. Weiss
`Center for Research in Speech and Hearing Sciences, City University of New York, Graduate Center,
`Room 901, 33 West 42nd Street, New York, New York 10036
(Received 8 May 1995; revised 5 September 1995; accepted 9 January 1995)
`Microphone arrays have proven effective in improving speech intelligibility in noise for
`hearing-impaired listeners, and several array processing techniques have been proposed for hearing
`aids. Among the signal-processing approaches are classical delay-and-sum beamforming,
`superdirective arrays, and adaptive arrays. To directly compare the effectiveness of these different
`processing strategies, a 10-cm-long linear array was built using five uniformly spaced
`omnidirectional microphones. This array was used in the end-fire orientation to acquire speech and
`noise signals for a variety of array placements in two representative rooms. Both digital and
`simulated analog processing techniques were considered, with the array processing implemented in
`the frequency domain. The performance metric was the steady-state array gain weighted to represent
`the relative importance of the different frequency regions in understanding speech. The processing
`comparison indicates that digital systems are more effective than the simulated analog processing,
`and that both superdirective and adaptive digital array processing can provide more than 9 dB of
`weighted array gain. © 1996 Acoustical Society of America.
`PACS numbers: 43.66.Ts, 43.60.Gk
`
`INTRODUCTION
`
In this paper, several fixed-coefficient and adaptive processing algorithms are compared for a
short microphone array suitable for hearing-aid applications. The processing effectiveness is
evaluated using acoustic data acquired in two representative rooms, with the processing performed
off-line. End-fire array placements on the side of the head or near reflecting surfaces were used
to give conditions similar to those that could be experienced in everyday use. Data acquired in
real rooms avoid the limitations of the computer simulations that are typically used in evaluating
array-processing algorithms, and permit an accurate comparison of the different processing
strategies that have been proposed for hearing aids.
A short microphone array is attractive for hearing-aid applications since it is one of the few
approaches, among the many that have been proposed, that has actually improved speech
intelligibility in noise for the hearing impaired. The improvement in signal-to-noise ratio (SNR)
for a 10-cm-long array using five uniformly spaced cardioid microphones with delay-and-sum
beamforming is 5–12 dB (Soede et al., 1993a, 1993b), with the greatest improvement occurring at
the highest frequencies. Such an array can be hand-held or can be built into an eyeglass frame,
and the performance of the array does not appear to be affected by the head to any great extent.
The directional arrays used by Soede et al. improved the speech reception threshold (SRT) by 7 dB
in a diffuse noise field, so the improvement in SNR is directly related to a comparable
improvement in speech intelligibility in noise.
The performance offered by delay-and-sum beamforming can be bettered by using superdirective array
processing (Cheng, 1971; Cox et al., 1986), in which the array performance is optimized for noise
coming uniformly from all directions. A sensitivity constraint (Newman et al., 1978; Cox et al.,
1986) can be used in designing the superdirective array weights to reduce the effects of
microphone position errors, wavefront perturbations, and the sensor internal noise. The
constraint, however, causes a small reduction in the array gain. Simulation studies (Kates, 1993;
Stadler and Rabinowitz, 1993) have shown that a constrained superdirective array can offer
substantially more array gain than classical delay-and-sum beamforming, but the performance in a
real room has not been ascertained. A further processing option is an oversteered array, similar
to delay-and-sum beamforming except that the time delays used in combining the microphone output
signals are greater than the acoustic propagation times between the microphones. An oversteered
array can offer performance very close to that of the optimal superdirective array (Cox et al.,
1986), and can be realized with a relatively simple analog system.

a) Corresponding author. Tel: 212-642-2179; Fax: 212-642-2379; E-mail: jkates@email.gc.cuny.edu
Adaptive algorithms have also been proposed for hearing-aid arrays (Peterson et al., 1987;
Greenberg and Zurek, 1992; Link and Buckley, 1993; McKinney and DeBrunner, 1993; Hoffman et al.,
1994). Adaptive array processing offers the possibility of improved performance over arrays using
fixed coefficients, but a perturbed wavefront, as can be caused by sensor misalignment or by a
specular reflection, can result in signal cancellation (Cox, 1973). The scaled projection
algorithm (Cox et al., 1987) can be used to prevent signal cancellation, and its application to
adaptive hearing-aid arrays (Link and Buckley, 1993; Hoffman et al., 1994) has resulted in
improved performance. However, the improvement in speech SNR due to the array processing can be
substantially reduced at low ratios of direct to reverberant sound even when the scaled projection
constraint is used (Hoffman et al., 1994; Greenberg, 1994).
`
`0001-4966/96/99(5)/3138/11/$6.00
`
`J. Acoust. Soc. Am. 99 (5), May 1996
`
The desire for immunity to correlated interference such as specular reflections has led to
modifications of the basic adaptive array-processing algorithms. One technique is to force the
correlation matrix used in the array processing to have a Toeplitz structure (Godara and Gray,
1989; Godara, 1991) in which the entries of the correlation matrix are replaced by the values
averaged along the diagonals. Simulation studies have shown that the resulting structured
correlation matrix offers improved performance in the presence of correlated interference for an
array several wavelengths long (Godara, 1991). A further modification is to form a composite
correlation matrix, using the structured correlation matrix in the scaled projection algorithm at
a low estimated input SNR value and gradually changing to the correlation matrix corresponding to
an ideal isotropic noise field at high input SNR values. This approach is designed to give the
benefits of an adaptive system at low input SNR values, but to smoothly shift to a superdirective
array at high input SNR values where adaptive systems have exhibited reduced performance.
`In this paper, five frequency-domain processing algo-
`rithms are compared for the same set of microphone data.
`The algorithms are classical delay-and-sum beamforming, an
`oversteered superdirective array, an optimal superdirective
`array, an adaptive system using the scaled projection algo-
`rithm, and an adaptive system using the scaled projection
`algorithm combined with the composite structured correla-
`tion matrix. In order to directly compare the effectiveness of
`these different processing strategies in a real room, a 10-cm-
`long linear array was built using five uniformly spaced om-
`nidirectional microphones. This array was used in the end-
`fire orientation to acquire speech and noise signals for three
`array placements in two representative rooms. The methods
`used for the data acquisition, signal processing, and perfor-
`mance evaluation are described in the remainder of the pa-
`per, along with the performance results.
`
`I. METHOD
`
`A. Data acquisition
The array used for the experiments was 10 cm long and consisted of five uniformly spaced Knowles
EK-3033 omnidirectional microphones. The array was used in the end-fire orientation. The outputs
of the microphones were found to be matched to within ±1 dB, and no amplitude or phase
equalization was provided. The microphone outputs were sampled at 10 kHz using an A/D converter
with simultaneous sample-and-hold circuits having a ±25 ns aperture uncertainty.
Stimuli were presented one at a time over a loudspeaker with the microphone responses sampled and
stored on the computer for later processing. Speech stimuli were presented at an azimuth of
0 deg, and the noise stimuli were presented at azimuths of 60, 105, 180, 255, and 300 deg
counterclockwise around the array. The speech stimulus consisted of the sentence "The candy shop
was empty." spoken by a male talker. The uncorrelated noise stimuli at the other azimuths
consisted of multitalker speech babble. A combined noise source was formed by summing the babble
signals from the five noise azimuths at equal intensities; this combination produced a diffuse
noise field of the sort that would be found in a restaurant or similar environment where several
people are talking simultaneously. The test stimuli were bandlimited to 5 kHz.

FIG. 1. Floor plan of the office used for the array measurements. The array position and
orientation for the floor-standing and KEMAR measurements are indicated by the arrow within the
circle, and the loudspeaker positions for the speech and noise are indicated by the crosses.
Angles are measured counterclockwise from the array orientation, with the speech loudspeaker
position at 0 deg. The desk used for the desk-top measurements is also identified. For the desk
measurements, the array was positioned at the "U" in "USED" with the speech loudspeaker positioned
at the "Y" in "ARRAY."
`Two rooms, an office and a conference room, were used
`for the measurements. A floor plan of the office showing the
`location of the furniture, the microphone array, and the loud-
`speaker positions, is presented in Fig. 1. The office walls are
`painted plasterboard, the floor is carpeting over a concrete
`slab, and the ceiling is acoustical tile beneath a plenum. Two
`of the walls are covered with bookshelves, and the office
`contains several desks, tables, and chairs, thus providing a
`complex acoustical environment. A floor plan of the confer-
`ence room showing the location of the furniture, the micro-
`phone array, and the loudspeaker positions, is presented in
`Fig. 2. The construction of the conference room is the same
`as for the office with the exception that the floor is covered
`with cork tiles instead of carpeting.
Three array positions were used for the data acquisition in each room. A quasi-"free-field"
position was obtained by placing the array at a height of 1.4 m on a floor stand near the middle
of the room and as far as possible from any reflecting surface. A desktop position was obtained by
placing the array on a microphone stand at a height of 15 cm above the surface of a desk (office)
or group of tables (conference room), with the array at one end and the speech loudspeaker at the
opposite end of the desk or tables. Measurements were also made using the KEMAR anthropometric
manikin (Burkhard and Sachs, 1975) positioned near the center of the room with the array
positioned just above the left ear at a height of 1.2 m above the floor. The power for each speech
or noise test signal at each microphone array position was normalized by forming the rms average
across the five microphones in the array and setting this average to 1 V.

J. M. Kates and M. R. Weiss: Array processing comparison

FIG. 2. Floor plan of the conference room used for the array measurements. The array position and
orientation for the desk-top measurements is indicated by the arrow within the circle, and the
loudspeaker positions for the speech and noise are indicated by the crosses. For the
floor-standing and KEMAR measurements, the tables were moved to the periphery of the room and
approximately the same loudspeaker and array positions within the room were used.
The physical and acoustic properties of the rooms are summarized in Table I. The reverberation
time was estimated by observing the decay of a speech-shaped noise signal that was allowed to
reach steady state in the room and was then switched off. The test signal was output by the speech
loudspeaker in the room and the response was measured at the floor microphone array position.
Since the ambient noise levels in the rooms did not permit an accurate measurement of the entire
60-dB decay of the test signal, the time to reach a level 20 dB below the steady-state level was
measured and then tripled to give the indicated 60-dB reverberation time. The calculated
quantities were computed from the physical measurements and reverberation time using the room
acoustics formulae given by Beranek (1954).
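As a rough cross-check of the calculated rows of Table I, the sketch below applies one standard set of room-acoustics relations: the mean free path 4V/S, the Norris-Eyring reverberation formula solved for the average absorption coefficient, and the critical distance for an omnidirectional source. The choice of these particular relations, the empty rectangular-room surface area, and the constant 0.161 s/m are assumptions made for illustration; the paper states only that the formulae were taken from Beranek (1954). The function names are hypothetical.

```python
import math

def t60_from_t20(t20_s):
    """Extrapolate the 60-dB decay time from the time taken to fall 20 dB."""
    return 3.0 * t20_s

def room_acoustics(length, width, height, t60_s):
    """Return (mean free path in m, average absorption, critical distance in m).

    Assumes an empty rectangular room, the Norris-Eyring formula, and an
    omnidirectional source; furniture and shelving are ignored.
    """
    volume = length * width * height
    surface = 2.0 * (length * width + length * height + width * height)
    mean_free_path = 4.0 * volume / surface
    # Eyring: T60 = 0.161 V / (-S ln(1 - a))  =>  a = 1 - exp(-0.161 V / (S T60))
    absorption = 1.0 - math.exp(-0.161 * volume / (surface * t60_s))
    room_constant = surface * absorption / (1.0 - absorption)
    critical_distance = math.sqrt(room_constant / (16.0 * math.pi))
    return mean_free_path, absorption, critical_distance

# Conference-room dimensions from Table I; a 200-ms 20-dB decay gives T60 = 600 ms.
mfp, alpha, r_c = room_acoustics(10.7, 6.2, 2.8, t60_from_t20(0.2))
print(f"mean free path {mfp:.2f} m, absorption {alpha:.3f}, critical distance {r_c:.2f} m")
```

Under these assumptions the conference-room results fall close to the Table I entries (3.26 m, 0.197, 1.05 m). The office entries agree less closely when the same relations are applied to the listed dimensions.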
`
TABLE I. Acoustic properties of the rooms used for the processing evaluation.

Property                       Office   Conference room
Measured:
  Length, m                       5.1              10.7
  Width, m                        4.5               6.2
  Height, m                       2.8               2.8
  Volume, m³                       60               185
  Reverb. time T60, ms            250               600
Calculated:
  Ave. absorption coef.         0.332             0.197
  Mean free path, m              2.50              3.26
  Critical distance, m           0.97              1.05
  Speech direct/reverb., dB      −6.3             −10.5
  Noise direct/reverb., dB       −0.2              −5.0

B. Array processing
All of the array processing was implemented using a block frequency-domain approach as shown in
Fig. 3. A frequency-domain implementation of the adaptive processing generally offers faster
convergence than a time-domain version due to the reduced eigenvalue spread in the correlation
matrices (Narayan et al., 1983). A block frequency-domain implementation was chosen to reduce the
computational burden (Mansour and Gray, 1982), and a time-domain constraint was added to the
weight computation to ensure a causal adaptive filter (Clark et al., 1983). To implement the
equivalent of an L-tap time-domain filter, a 2L-sample block of data is acquired from each
microphone. A fast Fourier transform (FFT) of size 2L is performed on each 2L-sample data buffer,
after which the weights are computed independently for each positive FFT frequency bin. The
frequency-domain signal is multiplied by the weights, summed across microphones at each frequency,
and a 2L-point inverse FFT returns the weighted signal to the time domain. An overlap-save
implementation (Clark et al., 1983) was used, with the buffer contents and weights updated every
L input samples. Relatively short adaptive filters, varying in length from L = 8 to L = 32
samples, were used in the experiments since work on adaptive microphone arrays (Sondhi and Elko,
1986) has indicated that a short filter offers better immunity to deleterious reflection effects
than does a long filter.

FIG. 3. Block diagram for the frequency-domain array processing.
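The overlap-save filter-and-sum structure described above can be sketched as follows. This is a minimal illustration with fixed weights, not the authors' implementation: it uses a naive DFT from the standard library in place of an FFT, transforms all bins rather than only the positive-frequency ones, and the function names are hypothetical. Zero-padding each L-tap filter to 2L samples before the transform plays the role of the time-domain (causality) constraint on the frequency-domain weights.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for the short blocks used here."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def overlap_save(mics, taps):
    """Filter-and-sum M microphone signals with L-tap FIR filters.

    mics: list of M equal-length sample lists
    taps: list of M lists, each holding L real filter taps
    Returns the output samples produced by whole blocks of L inputs.
    """
    L = len(taps[0])
    # Frequency-domain weights: 2L-point transform of each zero-padded filter.
    W = [dft(list(h) + [0.0] * L) for h in taps]
    buffers = [[0.0] * (2 * L) for _ in mics]
    out = []
    for start in range(0, len(mics[0]) - L + 1, L):
        Y = [0j] * (2 * L)
        for m, x in enumerate(mics):
            # Slide the 2L-sample buffer forward by L new samples.
            buffers[m] = buffers[m][L:] + list(x[start:start + L])
            X = dft(buffers[m])
            for k in range(2 * L):
                Y[k] += W[m][k] * X[k]      # weight, then sum across microphones
        y = dft(Y, inverse=True)
        # Only the last L samples are free of circular-convolution aliasing.
        out.extend(v.real for v in y[L:])
    return out

def direct_filter_and_sum(mics, taps):
    """Reference time-domain filter-and-sum for comparison."""
    n = len(mics[0])
    out = [0.0] * n
    for x, h in zip(mics, taps):
        for t in range(n):
            out[t] += sum(h[j] * x[t - j] for j in range(len(h)) if t - j >= 0)
    return out
```

Comparing `overlap_save` with `direct_filter_and_sum` on the same inputs is a simple way to confirm that discarding the first L samples of each inverse transform removes the wrap-around error.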
The weight vectors for the different processing approaches can all be expressed using the same
basic equation (Cox, 1973; Monzingo and Miller, 1980). The set of microphone weights in each FFT
frequency bin is chosen to optimize the array output SNR subject to a constraint that a signal
from the end-fire direction be passed with unit gain. The processing strategies differ primarily
in the description of the noise field. The equation for the steady-state weights for all of the
processing approaches is given by

$$ w(k) = \frac{R^{-1}(k)\,d(k)}{d^{*}(k)\,R^{-1}(k)\,d(k)}, \tag{1} $$

where d(k) is the steering vector (vector giving the phase shift from one microphone to the next
as a wave arriving from 0 deg propagates across the array) for FFT frequency index k, R(k) is the
noise correlation matrix, and the asterisk denotes the conjugate transpose of the vector.
`
`
Classical delay-and-sum beamforming is based on the assumption that the dominant source of noise
is the self-noise of the microphones and not the ambient noise field. This assumption leads to the
system noise correlation matrix being the identity matrix, that is, R(k) = I(k), since the assumed
Gaussian noise has equal intensity and is completely independent at each microphone (Cox, 1973).
The solution of Eq. (1) for this form of assumed interference reduces to w(k) = d(k)/M, where M is
the number of microphones in the array. The weight vector is independent in each FFT frequency
bin. The oversteered weight vector is similar to that for delay-and-sum beamforming, but uses a
modified steering vector having time delays multiplied by a scale factor greater than one. The
oversteered delay factor was implemented by approximating the phase response of a cascade of
analog one-zero/one-pole all-pass networks, with the group delay chosen to double the normal
propagation time between the microphones at low frequencies and to reduce to the normal
propagation time at 5 kHz. This degree of oversteering gave a minimum white noise gain of 0 dB.
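A minimal sketch of the delay-and-sum and oversteered weights follows. Several values are assumptions not given in the paper: a sound speed of 343 m/s, a 2.5-cm spacing (taking the 10 cm as the distance between the end microphones), and a constant oversteering factor in place of the frequency-dependent all-pass delay described above. The function names are hypothetical.

```python
import cmath

SPEED_OF_SOUND = 343.0  # m/s; assumed, not given in the paper
SAMPLE_RATE = 10_000.0  # Hz, as stated for the A/D conversion
SPACING = 0.10 / 4      # m; assumes the 10 cm spans the two end microphones

def steering_vector(k, nfft, n_mics=5, delay_scale=1.0):
    """Phase shifts across the array for an end-fire (0 deg) wave in FFT bin k.

    delay_scale > 1 stretches the inter-microphone delay, i.e. oversteering.
    """
    freq = k * SAMPLE_RATE / nfft
    tau = delay_scale * SPACING / SPEED_OF_SOUND
    return [cmath.exp(-2j * cmath.pi * freq * tau * m) for m in range(n_mics)]

def delay_and_sum(d):
    """w = d / M, the solution of Eq. (1) when R is the identity matrix."""
    return [x / len(d) for x in d]

def response(w, d):
    """Complex array response w* d to a wave with steering vector d."""
    return sum(wi.conjugate() * di for wi, di in zip(w, d))

nfft, k = 32, 4                      # 2L = 32 (L = 16); bin centered at 1.25 kHz
d = steering_vector(k, nfft)
w_ds = delay_and_sum(d)
w_over = delay_and_sum(steering_vector(k, nfft, delay_scale=2.0))
print(abs(response(w_ds, d)))        # delay-and-sum passes the end-fire wave with unit gain
print(abs(response(w_over, d)))      # oversteering gives up some on-axis gain
```

The second printed value is below one because the oversteered weights are matched to a longer delay than the true propagation time; what oversteering buys in return is greater rejection of off-axis noise.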
The weights for the superdirective and adaptive algorithms are also similar in form, but use
correlation matrices that optimize the array performance for the ambient noise field rather than
for the sensor self-noise. The superdirective processing is based on the correlation matrix R(k)
calculated a priori for an assumed ideal spherically isotropic noise field (Cheng, 1971), while
the adaptive system uses for R(k) the signal-plus-noise correlation matrix estimated directly from
the incoming microphone signals (Cox et al., 1987). The superdirective processing therefore
determines the array weights based on assumed noise-field characteristics, while the adaptive
processing determines the array weights in response to the actual noise field found in the room.
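To illustrate the superdirective case, the sketch below evaluates Eq. (1) directly with the sin(kr)/(kr) coherence that holds between omnidirectional sensors in a spherically isotropic noise field. It is a direct solve rather than the iterative procedure used in the paper, and the sound speed, the 2.5-cm spacing, the small diagonal loading, and all function names are assumptions made for illustration.

```python
import cmath
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def steering_vector(freq, n_mics=5, spacing=0.025, c=343.0):
    """End-fire steering vector at frequency freq (Hz)."""
    return [cmath.exp(-2j * math.pi * freq * spacing * m / c) for m in range(n_mics)]

def isotropic_matrix(freq, n_mics=5, spacing=0.025, c=343.0):
    """Noise correlation matrix for a spherically isotropic field."""
    k = 2.0 * math.pi * freq / c
    R = [[1.0] * n_mics for _ in range(n_mics)]
    for m in range(n_mics):
        for n in range(n_mics):
            if m != n:
                kr = k * spacing * abs(m - n)
                R[m][n] = math.sin(kr) / kr
    return R

def mvdr_weights(R, d, loading=0.0):
    """Eq. (1): w = R^-1 d / (d* R^-1 d), with optional diagonal loading."""
    n = len(d)
    A = [[R[i][j] + (loading if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = solve(A, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [xi / denom for xi in x]

def gain(w, d, R):
    """|w* d|^2 / (w* R w) for a noise matrix R whose trace equals M."""
    n = len(w)
    num = abs(sum(wi.conjugate() * di for wi, di in zip(w, d))) ** 2
    den = sum(w[i].conjugate() * R[i][j] * w[j] for i in range(n) for j in range(n)).real
    return num / den

freq = 1000.0
d = steering_vector(freq)
R = isotropic_matrix(freq)
w_sd = mvdr_weights(R, d, loading=0.01)   # loading value is illustrative only
w_ds = [x / len(d) for x in d]
print(10 * math.log10(gain(w_ds, d, R)), 10 * math.log10(gain(w_sd, d, R)))
```

The two printed figures are the gains in dB of the delay-and-sum and loaded superdirective weights against the same ideal isotropic field. They describe the idealized model only and are not comparable with the room measurements reported in the tables.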
The adaptive processing was implemented using the scaled projection algorithm of Cox et al.
(1987). This algorithm imposes a constraint on the magnitude of the weights so that

$$ w^{*}(k)\,w(k) \le 1/\delta^{2}(k), \tag{2} $$

which has been shown to minimize the amount of signal cancellation that will occur under perturbed
wavefront conditions. The constraint is equivalent to adding a constant to the elements of the
main diagonal of the system correlation matrix R(k), with the result that the array response
approaches that of delay-and-sum processing when tightly constrained. Because of the
frequency-domain implementation, the weight constraint can easily be made frequency-dependent. At
low frequencies, where the array is shortest with respect to the acoustic wavelength and thus has
the poorest directivity, the constraint can be adjusted to allow a higher degree of directionality
in the array response. Conversely, at high frequencies, where delay-and-sum beamforming can give
adequate amounts of array gain, the constraint can be tightened to guarantee that no signal
cancellation will occur. The weight constraint was thus set to

$$ \delta^{2}(k) = \begin{cases} 0\ \text{dB re: 1}, & f < 1\ \text{kHz}, \\ f - 1\ \text{dB re: 1}, & f > 1\ \text{kHz}. \end{cases} \tag{3} $$

The correlation matrix was computed separately at each FFT analysis frequency, and each matrix was
smoothed by a low-pass filter having a time constant of 500 ms. The weight adaptation was
independent at each of the FFT bin frequencies, with convergence for the equivalent of a 16-tap
filter taking about 200 ms. The algorithm was allowed to adapt for 2 s to ensure full convergence
prior to computing the performance metrics.
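The equivalence noted above between the weight-norm constraint of Eq. (2) and diagonal loading can be illustrated with the sketch below. It is not the scaled projection iteration of Cox et al. (1987); instead it searches by bisection for the smallest loading that satisfies the constraint. Reading f in Eq. (3) as the frequency in kHz is an assumption, as are the example correlation matrix, the steering phases, and all function names.

```python
import cmath

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def norm2(w):
    """Squared weight norm w* w."""
    return sum(abs(x) ** 2 for x in w)

def loaded_weights(R, d, eps):
    """Eq. (1) with eps added to the main diagonal of R."""
    n = len(d)
    A = [[R[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = solve(A, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [xi / denom for xi in x]

def delta2_db(freq_hz):
    """Constraint of Eq. (3), reading f as the frequency in kHz (an assumption)."""
    f_khz = freq_hz / 1000.0
    return 0.0 if f_khz < 1.0 else f_khz - 1.0

def constrained_weights(R, d, freq_hz, iters=60):
    """Weights with the smallest diagonal loading for which w* w <= 1/delta^2."""
    limit = 10.0 ** (-delta2_db(freq_hz) / 10.0)
    if limit <= 1.0 / len(d):
        raise ValueError("bound is tighter than delay-and-sum weights can reach")
    lo, hi = 0.0, 1.0
    while norm2(loaded_weights(R, d, hi)) > limit:   # grow until the bound is met
        lo, hi = hi, 10.0 * hi
    for _ in range(iters):                           # then bisect on the loading
        mid = 0.5 * (lo + hi)
        if norm2(loaded_weights(R, d, mid)) > limit:
            lo = mid
        else:
            hi = mid
    return loaded_weights(R, d, hi)

# Illustrative inputs only: a strongly correlated noise matrix and end-fire phases.
R = [[0.95 ** abs(i - j) for j in range(5)] for i in range(5)]
d = [cmath.exp(-0.3j * m) for m in range(5)]
w = constrained_weights(R, d, 500.0)
print(norm2(w))
```

As the loading grows, the weights tend toward d/M and the squared norm toward 1/M, which is the delay-and-sum limit described in the text. This is why the bound must stay above 1/M for the search to terminate.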
The superdirective processing used a variant on the scaled projection algorithm to produce the
array weights. The scaled projection algorithm uses the signal-plus-noise correlation matrix in
computing the set of array weights that minimizes the array output power; the processing is
adaptive because the weights are computed iteratively using a correlation matrix that can change
over time. To produce the superdirective weights, the correlation matrix for the ideal spherically
isotropic noise field, computed a priori from the array geometry (Cheng, 1971) and unvarying in
time, was substituted for the matrix measured from the input signal. The weight calculation was
then iterated until convergence was reached, and the converged weights were used for the
superdirective array performance measurements. The superdirective weights used the same scaled
projection constraint on the magnitude of the weight vector as was used for the adaptive
processing in order to prevent any potential signal cancellation caused by system misalignment
(Cox et al., 1986).
For the composite structured correlation matrix, the scaled projection algorithm framework was
again used, but with a modified correlation matrix. The values of the measured signal correlation
matrix were first averaged along each diagonal to give a Toeplitz structure (Godara, 1991); this
matrix was then combined with the correlation matrix calculated for the spherically diffuse noise
field, with the proportion of the diffuse noise-field matrix increasing with increasing input SNR.
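The two steps just described, diagonal averaging followed by blending with the diffuse-field matrix, can be sketched as below. The paper does not give the rule that maps input SNR to the mixing proportion, so the linear ramp between -10 and +10 dB used here is purely hypothetical, as are the function names.

```python
def toeplitz_average(R):
    """Replace every entry of a square matrix by the mean of its diagonal."""
    n = len(R)
    means = {}
    for offset in range(-(n - 1), n):
        vals = [R[i][i + offset] for i in range(n) if 0 <= i + offset < n]
        means[offset] = sum(vals) / len(vals)
    return [[means[j - i] for j in range(n)] for i in range(n)]

def isotropic_fraction(input_snr_db, lo_db=-10.0, hi_db=10.0):
    """Hypothetical ramp: 0 at lo_db and below, 1 at hi_db and above."""
    x = (input_snr_db - lo_db) / (hi_db - lo_db)
    return min(1.0, max(0.0, x))

def composite_matrix(R_measured, R_isotropic, input_snr_db):
    """Blend the Toeplitz-averaged measured matrix with the isotropic one."""
    a = isotropic_fraction(input_snr_db)
    T = toeplitz_average(R_measured)
    n = len(T)
    return [[(1.0 - a) * T[i][j] + a * R_isotropic[i][j] for j in range(n)]
            for i in range(n)]

R_meas = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(toeplitz_average(R_meas))
```

Averaging along diagonals keeps a Hermitian matrix Hermitian, because the diagonals at offsets +m and -m are conjugates of one another, so the blended matrix can be used in Eq. (1) in the same way as the measured one.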
`
`C. Performance metric
The performance metric used in this paper is the articulation index (AI) weighted array gain,
which is similar to intelligibility-weighted gain (Greenberg et al., 1993). Experimental results
have shown a strong correlation between the array gain and the improvement in speech
intelligibility (Soede et al., 1993b), and an even better correlation between the weighted array
gain and intelligibility (Hoffman et al., 1994). The AI-weighted array gain is calculated from the
array gain computed at each frequency of the transformed data, and the array gains are combined
using weights for each frequency band derived from the articulation index importance function
given by Kryter (1962a).
The array gain (Cox et al., 1987) for the kth FFT bin is given by

$$ G(k) = \frac{|w^{*}(k)\,d(k)|^{2}}{w^{*}(k)\,Q(k)\,w(k)}, \tag{4} $$

where Q(k) is the noise-alone correlation matrix normalized so that Tr[Q(k)] = M, the number of
microphones in the array. The array gain depends on the array weights and on the spatial
distribution of the noise, but is independent of the actual signal and noise powers. An array
consisting of a single omnidirectional microphone has an array gain of 1.

TABLE II. AI-weighted array gain in dB for the single noise source in the office. The data are
presented as a function of the microphone array position and the filter length L, and are averaged
over source location.

Position  L Delay and  Oversteered         Optimal   Scaled projection,   Composite struct.,
                  sum               superdirective       input SNR (dB)       input SNR (dB)
                                                      −10      0    +10    −10      0    +10
Floor     8       5.5          7.6             8.9   10.6   10.3    9.7    9.0    9.5    9.9
         16       5.3          7.5             9.5   11.4   10.9    9.9    9.7    9.8   10.0
         32       5.1          7.3             9.3   11.1   10.6    9.4    9.1    9.2    9.4
Desk      8       6.0          8.2            10.1   11.8   11.5   11.1   10.5   10.5   10.2
         16       5.6          7.9            10.6   12.4   11.8   10.7   10.7   10.9   10.9
         32       5.3          7.6            10.1   11.8   11.1    9.8    9.9   10.0   10.0
KEMAR     8       5.3          7.3             8.1   10.0    9.5    8.4    8.2    8.7    8.9
         16       5.0          7.0             8.6   10.7   10.2    8.5    8.5    8.8    9.0
         32       4.8          6.8             8.4   10.6    9.9    7.8    8.1    8.3    8.4
The estimated noise-alone correlation matrix used in the array-gain calculation was smoothed using
a low-pass filter having a time constant of 500 ms. Since both the speech and noise were measured
in reverberant rooms, this metric gives the ratio of the power in the direct portion of the speech
signal to the total direct-plus-reverberant noise power at the array output, normalized by the SNR
at the array input. This measure thus represents the directional gain of the array in the noise
field. It is also possible, under conditions of a perturbed signal wavefront, for the array gain
to appear to be favorable even though signal cancellation is occurring. The output signal power in
each FFT frequency bin was monitored as a check for this condition, and no measurable signal
cancellation was observed.
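A small sketch of the array gain of Eq. (4), including the trace normalization of the noise matrix, is given below. The function name is hypothetical, and the two examples use idealized inputs rather than measured data.

```python
def array_gain(w, d, Q):
    """Eq. (4): |w* d|^2 / (w* Q w), with Q rescaled so that Tr[Q] = M."""
    M = len(w)
    trace = sum(complex(Q[i][i]).real for i in range(M))
    num = abs(sum(complex(wi).conjugate() * di for wi, di in zip(w, d))) ** 2
    den = sum(complex(w[i]).conjugate() * Q[i][j] * w[j]
              for i in range(M) for j in range(M)).real
    return num / (den * M / trace)

# A single omnidirectional microphone has a gain of 1 whatever the noise power.
print(array_gain([1.0], [1.0], [[3.7]]))

# Five microphones, uncorrelated noise, delay-and-sum weights for a broadside-free
# toy case with all steering phases equal: the gain equals the number of microphones.
M = 5
identity = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
print(array_gain([1.0 / M] * M, [1.0] * M, identity))
```

Because of the normalization, multiplying Q by any positive constant leaves the result unchanged, which is the sense in which the metric is independent of the actual noise power.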
The AI-weighted array gain is then given by

$$ G_{AI} = \sum_{k=0}^{K} a(k)\,[10 \log_{10} G(k)]\ \text{dB}, \tag{5} $$

where the set of weights {a(k)} is the AI importance function weights given by Kryter (1962a)
reinterpolated for the FFT band edges. Spread-of-masking effects are ignored in this metric. The
AI-weighted array gain G_AI is expressed in dB re: the array gain for a single omnidirectional
microphone.
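Equation (5) amounts to an importance-weighted average of the per-bin gains in dB, as the short sketch below shows. The importance weights here are a flat placeholder that sums to one; they are not the Kryter (1962a) values, which are not reproduced in this section, and the per-bin gains are made-up numbers for illustration.

```python
import math

def ai_weighted_gain_db(gains, importance):
    """Eq. (5): sum over FFT bins of a(k) * 10 log10 G(k), in dB."""
    if len(gains) != len(importance):
        raise ValueError("one importance weight is needed per frequency bin")
    return sum(a * 10.0 * math.log10(g) for a, g in zip(importance, gains))

gains = [2.0, 4.0, 8.0, 10.0]             # illustrative per-bin array gains (linear)
flat = [1.0 / len(gains)] * len(gains)    # placeholder weights, NOT Kryter's values
print(ai_weighted_gain_db(gains, flat))
```

Since the weighting is applied to the gains in dB rather than to the linear gains, a bin with poor gain lowers the result in proportion to its importance weight and is not masked by a large gain elsewhere.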
The array gain given in Eq. (4) differs from the ratio of array output SNR to input SNR used by
other authors (Greenberg and Zurek, 1992; Hoffman et al., 1994) as the basis of the performance
metric. The array output SNR is the ratio of the total speech power to the total noise power at
the output of the array. The speech and noise powers both include the reverberated as well as the
direct components. The array output SNR is given by

$$ \mathrm{SNR}(k) = \frac{w^{*}(k)\,S(k)\,w(k)}{w^{*}(k)\,N(k)\,w(k)}, \tag{6} $$

where S(k) is the speech-alone correlation matrix and N(k) is the noise-alone correlation matrix.
The processing benefit using this metric would then be calculated as the ratio of the array output
SNR to the array input SNR, converted to dB and summed over frequency using the AI weights.
The preference for array gain versus the ratio of output to input SNR as the basis of the
performance metric depends on the assumptions made about the effects of reverberation on speech
intelligibility. The SNR-based metric assumes that all speech power, reverberated as well as
direct, contributes equally to speech intelligibility. Experiments in speech intelligibility in
reverberation, however, indicate that reverberation times typical of the rooms used in this paper
lead to reduced speech intelligibility, and the longer the reverberation time the greater the
reduction in intelligibility (Moncur and Dirks, 1967; Houtgast and Steeneken, 1972). This
reduction of speech intelligibility with increasing reverberation time applies to hearing-impaired
as well as to normal-hearing subjects (Duquesnoy and Plomp, 1980; Nábělek, 1982; Nábělek, 1988).
The effects of reverberation on speech intelligibility have been accurately modeled by the speech
transmission index (STI), based on the modulation transfer function within the room for speech
envelope modulation frequencies (Houtgast and Steeneken, 1973; Steeneken and Houtgast, 1980). Even
small amounts of reverberation within the room will reduce the envelope modulation depth and will
therefore reduce the speech intelligibility predicted by the STI. These results indicate that the
effects of reverberation are similar to those of noise in reducing speech intelligibility in
rooms. Thus the array gain, by excluding the reverberated components in the estimated speech
power, may lead to a more valid estimate of the array benefit for speech intelligibility in rooms
than an estimate that assumes that all of the reverberated speech is beneficial.
`
`II. RESULTS
The data from the experiment are presented in Tables II–V. Five array processing approaches were
considered in the experiment. The three fixed-coefficient approaches of delay-and-sum beamforming,
oversteered delay-and-sum beamforming, and optimal superdirective processing were tested along
with the two adaptive approaches based on the scaled projection algorithm and the scaled
projection algorithm combined with the composite structured correlation matrix. The input SNR was
varied for the adaptive processing approaches but not for the fixed-coefficient approaches. The
difference in treating the SNR is a direct result of the performance metric based on the array
gain. The array gain, given by Eq. (4), depends only on the array weights and the normalized
noise-only correlation matrix. The performance of the fixed-coefficient systems is independent of
the array input SNR since neither the fixed coefficients nor the normalized noise-only correlation
matrix will change with changes in the incoming signal or noise levels. The SNR affects the
adaptive systems, however, through the array weights; the adaptive array weights change in
response to changes in the relative levels of the signal and noise embedded in the measured
signal-plus-noise correlation matrix used by the adaptive weight algorithm. The speech and noise
power levels used to compute the array input SNR were determined from the respective steady-state
sound fields at the microphone array, and thus include both the direct and reverberant field
contributions.

TABLE III. AI-weighted array gain in dB for the combined noise source in the office. The data are
presented as a function of the microphone array position and the filter length L.

Position  L Delay and  Oversteered         Optimal   Scaled projection,   Composite struct.,
                  sum               superdirective       input SNR (dB)       input SNR (dB)
                                                      −10      0    +10    −10      0    +10
Floor     8       5.3          7.5             8.6   10.0    9.7    9.3    8.4    9.4    9.7
         16       5.0          7.5             9.5   10.3   10.0    9.3    8.8    9.1    9.5
         32       4.6          6.9             8.5    9.8    9.4    8.5    8.3    8.5    8.7
Desk      8       5.8          8.2            10.0   11.4   11.2   10.9   10.2   10.2   10.0
         16       5.4          7.8            10.3   11.6   11.1   10.3   10.2   10.5   10.5
         32       4.9          7.4             9.6   10.6   10.0    9.1    9.2    9.4    9.4
KEMAR     8       5.1          7.1             7.9    9.5    9.0    8.2    8.3    8.8    8.9
         16       4.8          6.7             8.3    9.6    9.1    8.0    8.6    8.4    8.6
         32       4.5          6.4             7.9    9.2    8.6    7.2    7.7    7.8    8.0
`Three filter lengths were used for each processing ap-
`proach, these being equivalent to time-domain filters having
`8, 16, or 32 samples duration. The microphone position in-
`cludes the three placements of floor stand, desk stand, and
`above the left ear of KEMAR. Two rooms, the office and the
`conference room, were used for the measurements, and the
`noise source was either the average of the individual loud-
`speaker results or the result for the five-loudspeaker combi-
`nation.
TABLE IV. AI-weighted array gain in dB for the single noise source in the conference room. The
data are presented as a function of the microphone array position and the filter length L, and are
averaged over the source location.

Position  L Delay and  Oversteered         Optimal   Scaled projection,   Composite struct.,
                  sum               superdirective       input SNR (dB)       input SNR (dB)
                                                      −10      0    +10    −10      0    +10
Floor     8       5.8          7.8             9.1   11.0   10.9   10.5    9.3    9.4    9.4
         16       5.4          7.5             9.6   11.3   11.1   10.6    9.6    9.7    9.7
         32       5.0          7.1             9.1   10.7   10.5    9.8    8.9    8.9    8.9
Desk      8       6.2          8.3            10.2   12.1   11.9   11.5   10.0   10.3   10.4
         16       5.9          8.0            10.8   12.8   12.4   11.6   10.5   10.6   10.6
         32       5.5          7.6            10.3   12.0   11.7   10.7    9.7    9.9   10.2
KEMAR     8       5.0          7.0             7.2    9.4    9.3    8.7    7.5    7.7    7.7
         16       4.8          6.8             7.6   10.0    9.8    8.9    7.8    7.9    7.9
         32       4.7          6.6             7.5   10.1    9.8    8.6    7.5    7.4    7.4

TABLE V. AI-weighted array gain in dB for the combined noise source in the conference room. The
data are presented as a function of the microphone array position and the filter length L.

Position  L Delay and  Oversteered         Optimal   Scaled projection,   Composite struct.,
                  sum               superdirective       input SNR (dB)       input SNR (dB)
                                                      −10      0    +10    −10      0    +10
Floor     8       5.4          7.5             8.7   10.4   10.3   10.1    8.5    8.7    8.9
         16       5.0          7.1             9.0   10.4   10.3   10.0    9.1    9.1    9.1
         32       4.4          6.7             8.4    9.6    9.5    9.1    8.1    8.1    8.1
Desk      8       6.0          8.0             9.9   11.5   11.5   11.2    9.4    9.7   10.3
         16       5.6          7.7            10.1   11.8   11.6   11.2   10.0    9.9    9.9
         32       5.2          7.4             9.7   11.1   10.8   10.2    9.2    9.4    9.5
KEMAR     8       4.8          6.9             7.2    9.0    9.0    8.6    6.9    7.0    7.3
         16       4.6          6.8             7.6    9.2    9.2    8.7    7.6    7.7    7.8
         32       4.4          6.6             7.5    9.1    9.0    8.2    7.2    7.2    7.3

Each entry in Tables II and IV is the AI-weighted array gain in dB re: a single omnidirectional
microphone, averaged over the set of five noise loudspeaker azimuths. The data in Table II are for
the office, and the data in Table IV are for the conference room. The entries in Tables III and V
were produced by combining the babble signals from the five separate loudspeaker azimuths into a
single normalized data file representing a diffuse source of interference of the sort that could
occur in a meeting room or restau