`
ADAPTIVE BEAMFORMING MICROPHONE ARRAYS

Walter Kellermann

Fachhochschule Regensburg, Germany
`
ABSTRACT

New concepts for efficient combination of acoustic echo
cancellation (AEC) and adaptive beamforming microphone
arrays (ABMA) are presented. By decomposing common
beamforming methods into a time-invariant part, which the
AEC can integrate, and a separate time-variant part, the
number of echo cancellers is minimized without rendering
the system identification problem more difficult. Methods
for controlling the interaction of ABMA and AEC are
outlined, and implementations for typical microphone array
applications are discussed briefly.
`
1. INTRODUCTION

For acoustic echo control in conventional hands-free
communication, it is generally acknowledged that an echo
canceller (EC) is desirable, which models the impulse response
of the loudspeaker-enclosure-microphone system by an
adaptive filter in order to remove echo components from
the microphone signal. Other echo control methods, like
loss insertion or nonlinear devices, impair full-duplex
communication and thus are mostly considered as
supplementary measures only. For applications such as
teleconferencing between offices, studios, auditoria, or
car telephony, convenience or safety aspects suggest
that the personal microphone be replaced by a microphone
array (MA) directing a beam of increased sensitivity at the
active talker.
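The adaptive-filter idea behind an echo canceller can be illustrated with a minimal sketch. The paper does not prescribe an adaptation algorithm, so the normalized LMS (NLMS) update used here is an assumption, as are the function name `nlms_echo_canceller`, the filter length, the step size, and the synthetic echo path.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adapt an FIR model of the echo path with NLMS (assumed algorithm).

    Returns the echo-compensated signal and the final filter coefficients.
    """
    w = [0.0] * num_taps   # current estimate of the echo path
    x = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        x = [x_n] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d_n - echo_estimate                   # what is sent to the far end
        norm = sum(xi * xi for xi in x) + eps     # input power normalisation
        w = [wi + step * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w

# Synthetic single-talk test: white far-end signal, hypothetical 4-tap echo path.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]
residual, w = nlms_echo_canceller(far, mic)
```

In this noise-free single-talk case the coefficients settle on the echo path. A local talker would act as noise on the adaptation, which is why adaptation control matters later in the paper.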
`
1.1. Acoustic Echo Path with Microphone Arrays

In contrast to single-microphone (SM) hands-free
communication or multichannel teleconferencing, one might
hope that for an MA no echo canceller (EC) is required
because the acoustic echo path from the loudspeaker could be
sufficiently attenuated by the array directivity. Considering
the ITU-T recommendation on acoustic echo controllers
as a guideline, echo attenuation should be at least dB
during single-talk and dB during double-talk. Examining
the echo attenuation provided by known MA implementations,
we find:
- The absolute gain of the MA has to increase along with
  the distance from the local talkers in order to compensate
  for the decay of the sound level (6 dB per doubling of
  distance in the far field). This extra gain requires
  correspondingly more echo attenuation.
- The directivity index (quantifying the gain of the
  desired direction over the average of all other directions)
  of fixed beamforming arrays does not exceed dB over a wide
  frequency range and is much smaller at low frequencies.
  SNR improvement of adaptive beamforming arrays is
  limited to about dB for realistic conditions. For
  reverberant environments, both quantities approximately
  express the echo attenuation provided by the MA.
- Null-steering to the loudspeaker for maximum echo
  attenuation is only effective in non-reverberant environments.
  Even in carefully designed studios with optimized
  placement of sources, MA, and loudspeaker, unexpected
  reflections may reduce echo attenuation below dB.
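The distance argument in the first item can be made concrete with a short calculation. This sketch assumes the free-field inverse-distance law for sound pressure; the function name `level_change_db` is hypothetical.

```python
import math

def level_change_db(distance, reference_distance):
    """Level drop in dB when moving from reference_distance to distance.

    Assumes sound pressure falls off as 1/r (far field, free field).
    """
    return 20.0 * math.log10(distance / reference_distance)

# Doubling the talker distance costs about 6 dB, which the array gain
# (and hence the echo attenuation) has to make up for.
doubling = level_change_db(2.0, 1.0)
quadrupling = level_change_db(4.0, 1.0)
```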
`
For the echo path impulse response of an N-sensor MA
in a reverberant environment, a simple model is supported
by measurements: The array impulse response behaves
like the sum of the N impulse responses for the individual
microphones, with the accumulated samples being mutually
uncorrelated. This implies an increased average echo
attenuation for the MA on the order of about 10 log N dB
over an SM. This advantage of the MA must compensate
for the additional gain due to the usually increased
average talker-sensor distance and a possibly higher directivity
of the SM compared to a single array sensor, if an EC of
equal length should provide the same echo attenuation as
for an SM. Thus, although the MA echo path could be
further attenuated by loudspeaker arrays in combination with
absorbing walls, AEC will in most cases remain desirable
for full-duplex communications with MAs.
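The 10 log N figure follows from the uncorrelated-sum model: the desired signal adds coherently (amplitude gain N), while the echo paths add in power (power gain N). The sketch below checks this numerically. It assumes a sum normalised for unit gain toward the talker and models each echo path as unit-variance white Gaussian samples, which is a simplification made here and not a claim about measured responses.

```python
import math
import random

def expected_attenuation_db(num_sensors):
    """Average echo attenuation of the array over a single sensor (model value)."""
    return 10.0 * math.log10(num_sensors)

def simulated_attenuation_db(num_sensors, length=20000, seed=1):
    """Estimate the same quantity from mutually uncorrelated random 'echo paths'."""
    rng = random.Random(seed)
    paths = [[rng.gauss(0.0, 1.0) for _ in range(length)]
             for _ in range(num_sensors)]
    # Sum over sensors, divided by N so that a coherent signal keeps unit gain.
    array_path = [sum(p[n] for p in paths) / num_sensors for n in range(length)]
    single_energy = sum(v * v for v in paths[0])
    array_energy = sum(v * v for v in array_path)
    return 10.0 * math.log10(single_energy / array_energy)

model_db = expected_attenuation_db(8)      # about 9 dB for N = 8
estimate_db = simulated_attenuation_db(8)
```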
`
2. GENERIC CONCEPTS

In Fig. 1, the structure of hands-free telecommunication
using an ABMA is outlined. For the adaptive beamforming
(BF), we allow here all spatially selective algorithms
that extract the desired signal from the N microphone
signals. This notion covers classical adaptive beamforming
arrays as well as beamsteering algorithms. Only
a single far-end signal is allowed, to avoid interference with
the stereophonic AEC problem, which can be treated
separately. Two generic AEC approaches are discussed to
illustrate the AEC problem:

AEC-I operates directly on the microphone signals, i.e.,
for each of the N echo paths an acoustic echo canceller
must be implemented. The AEC feels no repercussions by
`
Note that this distinction is independent of the structure
(fullband, subband, or transform-domain structures may be used)
and of the adaptation algorithm for the AEC.
`
`RTL607_1025-0001
`
`Realtek 607 Ex. 1025
`
`
`
nal complexity of AEC-I and circumvent the time-variant
BF in AEC-II. The key to this is to decompose the ABMA
into a time-invariant stage followed by a time-variant stage.
The time-invariant BF is to produce a minimum number of
output signals which the AEC can incorporate into its echo
path model, and the time-variant part of the ABMA must
not interfere with the AEC.
`
3.1. Beamforming methods

We distinguish two classes of BF methods which are
common for MAs in telecommunications:

BF-I: For beamsteering, a set of M fixed beam signals
is computed independently of the array input data, and the
output of the beamformer is a weighted sum of these beams
with time-variant weights accounting for the active talkers
(voting).

BF-II: Classical adaptive beamforming methods aim
at minimizing a statistical error criterion and filter the
microphone signals accordingly. Characteristically, the
parameters of these systems are continually changing over
time in order to converge to optimum filter coefficients.
Note that tracking of moving or changing sources
is usually not supported.
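The two-stage structure of BF-I can be sketched as follows. The fixed beams are formed here by plain delay-and-sum with integer sample delays, which is an assumption for illustration (the paper also allows nested arrays and fractional delays); the names `delay_and_sum` and `beamsteer` are hypothetical.

```python
def delay_and_sum(mics, delays):
    """One fixed beam: average of the microphone signals after integer delays."""
    n_samples = len(mics[0])
    beam = []
    for n in range(n_samples):
        acc = 0.0
        for signal, d in zip(mics, delays):
            if 0 <= n - d < n_samples:
                acc += signal[n - d]
        beam.append(acc / len(mics))
    return beam

def beamsteer(mics, beam_delays, weights):
    """BF-I output: weighted sum of M fixed beams.

    The beams depend only on the fixed `beam_delays` (time-invariant stage);
    only `weights` would change over time (time-variant stage, voting).
    """
    beams = [delay_and_sum(mics, delays) for delays in beam_delays]
    return [sum(w * b[n] for w, b in zip(weights, beams))
            for n in range(len(mics[0]))]

# Two microphones; a pulse reaches microphone 0 one sample before microphone 1.
mics = [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
beam_delays = [[1, 0], [0, 1]]      # beam 0 is aligned with the source, beam 1 is not
on_target = beamsteer(mics, beam_delays, [1.0, 0.0])
off_target = beamsteer(mics, beam_delays, [0.0, 1.0])
```

The aligned beam adds the pulse coherently, while the misaligned beam spreads it over two samples at half amplitude.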
`
3.2. AEC with BF-I

BF-I inherently provides the desired separation into a
time-invariant and a time-variant stage. To minimize the
number of signals for the AEC, we introduce a mapping of
the M fixed-beam signals onto L talker beams whenever
L M N (Fig. 2). For maximum spatial selectivity, the
mapping should select one fixed beam or a linear
combination of two neighboring fixed beams per talker. The AEC
`
Figure 2. AEC combined with BF-I (block diagram; blocks:
Fixed BF, Mapping, Voting, Control, AEC-I; signal counts N, M, L;
input: far-end speech signal)
`
has now to identify a time-invariant BF system as long as
`
Note that all beam signals are meant to cover the entire
frequency range of interest. Accounting for the wideband nature
of speech and audio signals, nested arrays are usually employed
whose outputs may be filtered as an ensemble or as subarrays
before yielding a wideband beam signal. Fractional delay
beamforming for increased spatial resolution is also covered by
our model.
`
`
`
Figure 1. ABMA in a hands-free telecommunication system
with two alternatives for AEC (block diagram; blocks: local
talkers, Adaptive BF, AEC-I, AEC-II; N microphone signals;
input: far-end speech signal)
`
the adaptive BF, and thus the AEC problem is structurally
the same as for an SM, duplicated by the number of sensors.

AEC-II operates on the output signal of the BF,
requiring only a single EC. However, the AEC model has to
incorporate the BF in addition to the acoustic echo path.
`
A major advantage of AEC-I is given by its structural
simplicity, as it only requires duplicating the established
SM-AEC algorithms. However, for large N the computational
load is considerable and may be prohibitive for
common teleconferencing and car telephony with N
microphones.

For AEC-II, only a single AEC is required, but this has
to include the adaptive BF in its model of the echo path.
As the unknown acoustic components cannot be identified
separately from the known BF filtering system (knapsack
problem), the time-variance of the BF poses a major
problem: With the identification of the acoustic echo path
being already difficult due to its large number of degrees of
freedom and its unpredictable, potentially fast and severe
changes of the impulse response, it becomes even more
difficult if adaptive BF must be incorporated. Observing
that the BF system must change its parameters whenever
it switches to a newly active local talker, severe
fluctuations in the echo path impulse response occur at a time
when the adaptive EC is unable to track them because the
local source acts as interfering noise on the system
identification. Hence, AEC-II will in general provide no echo
attenuation until a far-end talker is in a single-talk period
again and allows convergence of the EC. Thus, the benefits
of AEC are often missing when they are desired most, i.e.,
during double-talk and at transitions from far-end activity
to local activity and vice versa (at other times, loss insertion
is less objectionable). As a result, the time-variance of the
BF discourages the use of AEC-II.
`
3. NEW EFFICIENT CONCEPTS

From the previous section we conclude that, for large N,
new efficient concepts ideally should avoid the computatio
`
`
`
3.4.2. AEC

As long as computing resources allow, all L ECs should
adapt in parallel during far-end-talk-only periods.
Alternatively, only the currently needed ECs according to the
voting could be operated while all others are kept frozen.
As in the SM case, estimating the current echo path
attenuation provided by AEC during far-end talk remains
indispensable for determining the amount of required
supplementary loss, notably during initial convergence, at changes
of the acoustic path, and when the mapping for BF-I or the
fixed BF of BF-II is updated.
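One simple way to turn this into numbers is sketched below. It assumes that the attenuation achieved by the AEC is estimated as the energy ratio between the microphone-side signal and the EC residual during far-end talk, and that the supplementary loss just fills the gap to a required total. Both function names and the example values are hypothetical.

```python
import math

def aec_attenuation_db(before_ec, after_ec, eps=1e-12):
    """Energy ratio in dB between the signal before and after the EC.

    Only meaningful during far-end-talk-only periods, when the signal
    before the EC consists of echo.
    """
    energy_before = sum(s * s for s in before_ec)
    energy_after = sum(s * s for s in after_ec)
    return 10.0 * math.log10((energy_before + eps) / (energy_after + eps))

def supplementary_loss_db(required_db, array_db, aec_db):
    """Loss still to be inserted so that the total echo attenuation is met."""
    return max(0.0, required_db - array_db - aec_db)

# Example with made-up values: the EC reduces the echo amplitude tenfold (20 dB).
aec_db = aec_attenuation_db([1.0] * 10, [0.1] * 10)
loss_db = supplementary_loss_db(required_db=40.0, array_db=10.0, aec_db=aec_db)
```

Right after a mapping update the EC attenuation estimate drops, so the computed supplementary loss rises until the EC has reconverged.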
`
3.4.3. BF during far-end talk only

Experiments confirmed that using a BF configuration
which simply minimizes echo feedback to avoid loss
insertion may give a disturbing spatial impression to the far-end
party. Instead, we propose to use the BF configurations
covering the local talkers and to insert supplementary loss.
`
3.4.4. Voting

The voting algorithm derives the array output signal from
a weighted linear combination of L beam signals.
Equally for BF-I and BF-II, the time-variant weights are
chosen to allow a fast reaction to newly active local sources
(on a millisecond time scale) while at the same time avoiding
the perception of switching noise. For maximum spatial
selectivity, for each talker only one beam signal should have
a nonzero weight in the stationary case.
When entering a far-end talk period, we propose to start out
with the weights for the most recently active local talker
and gradually change the weights to arrive at a beamforming
averaging over all L talker beams.
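The gradual weight change can be sketched with a first-order recursion toward target weights. The smoothing rule, its constant, and the names `voting_targets`, `update_weights`, and `array_output` are assumptions made for illustration and are not taken from the paper.

```python
def voting_targets(num_beams, active_talker=None):
    """Target weights: one beam for an active local talker, or the average
    over all talker beams during far-end talk only (active_talker=None)."""
    if active_talker is None:
        return [1.0 / num_beams] * num_beams
    return [1.0 if l == active_talker else 0.0 for l in range(num_beams)]

def update_weights(weights, targets, smoothing=0.9):
    """Move the weights a small step toward the targets (avoids hard switching)."""
    return [smoothing * w + (1.0 - smoothing) * t
            for w, t in zip(weights, targets)]

def array_output(beam_samples, weights):
    """Array output sample: weighted linear combination of the L beam samples."""
    return sum(w * b for w, b in zip(weights, beam_samples))

# Talker 0 was active last; a far-end talk period starts.
weights = voting_targets(3, active_talker=0)
for _ in range(200):
    weights = update_weights(weights, voting_targets(3))
```

A smaller `smoothing` value gives a faster reaction to a new talker but makes the transition more audible, which is the trade-off described above.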
`
3.4.5. Mapping for BF-I

For initialization, the results of a training procedure can
be incorporated, or the dominant fixed beams during the
first periods of local speech are used as initial talker
beams. While applying the current fixed mapping to form the
output signal, the control unit continuously monitors the
short-term energies of the fixed beams and incorporates the
beam energy patterns into a learning procedure (e.g., a
first-order recursive filtering over time) for the currently
active talker. The mapping should only be changed if a
fixed beam or a combination of two neighboring fixed beams
exhibits significantly more energy than the current
mapping. A combination of two fixed beams is considered for
the mapping only if the neighboring beams have about the
same energy and their weighted sum produces clearly more
energy than each of them. The mapping should preferably
be updated during far-end-talk-only periods, as only then
the AEC can identify the new echo path.
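The update rule described above might be sketched as follows. The thresholds `margin`, `balance`, and `pair_margin`, the forgetting factor, and the function names are hypothetical; the paper only states the rule qualitatively.

```python
def smooth_energies(smoothed, frame_energies, forgetting=0.9):
    """First-order recursive filtering of the short-term beam energies."""
    return [forgetting * s + (1.0 - forgetting) * e
            for s, e in zip(smoothed, frame_energies)]

def choose_mapping(beam_energy, pair_energy, current, current_energy,
                   margin=2.0, balance=0.8, pair_margin=1.25):
    """Return the mapping for the active talker: ("single", m) or ("pair", m).

    beam_energy[m]: smoothed energy of fixed beam m
    pair_energy[m]: smoothed energy of the weighted sum of beams m and m + 1
    """
    best, best_energy = current, current_energy
    for m, energy in enumerate(beam_energy):
        if energy > best_energy:
            best, best_energy = ("single", m), energy
    for m, energy in enumerate(pair_energy):
        low, high = sorted((beam_energy[m], beam_energy[m + 1]))
        similar = high > 0.0 and low / high >= balance      # about the same energy
        clearly_more = energy >= pair_margin * high         # more than each beam
        if similar and clearly_more and energy > best_energy:
            best, best_energy = ("pair", m), energy
    # Change only for significantly more energy than the current mapping.
    return best if best_energy >= margin * current_energy else current

energies = smooth_energies([0.0, 0.0], [1.0, 2.0])
switch = choose_mapping([1.0, 5.0, 1.2], [2.0, 2.5], ("single", 0), 1.0)
keep = choose_mapping([1.0, 1.1, 0.2], [1.5, 0.3], ("single", 0), 1.0)
pair = choose_mapping([1.0, 1.1, 0.2], [2.4, 0.3], ("single", 0), 1.0)
```

A caller would additionally restrict updates to far-end-talk-only periods, so that the AEC can identify the new echo path.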
`
3.4.6. Fixed beamforming for BF-II

As with BF-I, the fixed beamforming for each of the
L talkers must be initialized and should be updated
only when the adaptive beamforming performs significantly
better than the established fixed BF for the active talker.
The initialization usually must include the localization of
the desired sources and the convergence to an efficient BF
configuration for each talker. The control unit is
supported by an adaptive BF unit which is continually ai
the mapping does not change, and thus deals with an
L-channel AEC-I problem.
`
3.3. AEC with BF-II

Similarly to the BF-I concept, we simultaneously apply L
fixed sets of BF filters to the N microphone signals to
account for each talker (Fig. 3). Thus, again, we obtain an
L-channel AEC-I echo cancellation problem. The signal
path of this structure is essentially the same as for BF-I,
employing fixed beamforming and voting. The actual
adaptive beamforming has been moved to the control path.
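The signal path described here amounts to L parallel filter-and-sum beamformers with frozen coefficients. A minimal sketch, assuming FIR filters and using the hypothetical names `fir` and `talker_beams`:

```python
def fir(signal, taps):
    """Causal FIR filtering of one signal with the given coefficients."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]

def talker_beams(mics, filter_sets):
    """Apply L fixed filter sets to the N microphone signals.

    filter_sets[l][i] holds the FIR taps applied to microphone i for talker l.
    Returns L beam signals, one per talker; each would feed its own EC.
    """
    beams = []
    for filters in filter_sets:
        filtered = [fir(signal, taps) for signal, taps in zip(mics, filters)]
        beams.append([sum(column) for column in zip(*filtered)])
    return beams

# N = 2 microphones, L = 2 talkers, made-up coefficients.
mics = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
filter_sets = [
    [[0.0, 0.5], [0.5]],   # talker 0: delay microphone 0 by one sample
    [[0.5], [0.5]],        # talker 1: no delay
]
beams = talker_beams(mics, filter_sets)
```

Because the filter sets stay fixed between updates, each EC sees a time-invariant system, while the adaptive beamformer runs only in the control path and occasionally supplies new coefficients.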
`
Figure 3. AEC combined with BF-II (block diagram; blocks:
Fixed BF, Voting, Control, Adaptive BF, AEC-I; signal counts
N, L; input: far-end speech signal)
`
For both BF-I and BF-II, the incorporation of the fixed
beamforming into the echo path model requires a longer
EC impulse response. The extra length is determined by
the maximum delay realized in the delay-and-sum networks
plus, for BF-I, the interpolation and beam-shaping filter
order, and, for BF-II, the length of the adaptive
beamforming filter.
`
3.4. Control mechanisms

With ABMAs and AEC being intensively researched areas
on their own, we concentrate here on efficiently controlling
their interaction. Unless referenced otherwise, the methods
described below were verified and subjectively evaluated
using recorded dialogues and measured impulse responses
of MAs in cars, offices, and a videoconference studio.
`
3.4.1. Talker activity detection

The detection of talker activity is crucial for both AEC
and BF. AEC relies on it for controlling the speed of
adaptation, and BF needs it for voting and to identify periods
when the mapping for BF-I or the optimum BF for BF-II can be
learned. As in SM concepts, talker activity is classified by
primarily evaluating the energies of loudspeaker and
microphone signals, respectively. The spatial resolution of
beamforming MAs provides additional information: e.g.,
for the BF-I concept, the M beam signal energies will show
a typical pattern for each spatially fixed source, such as the
loudspeaker, which can then be distinguished from the
patterns of other sources.
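A bare-bones version of such a detector is sketched below. The thresholds, the assumed echo coupling factor, the four class labels, and the use of a normalised correlation to compare beam-energy patterns are all assumptions made for illustration.

```python
import math

def short_term_energy(frame):
    """Mean squared value of one signal frame."""
    return sum(s * s for s in frame) / len(frame)

def classify_activity(loudspeaker_frame, mic_frame, far_threshold=1e-4,
                      near_threshold=1e-4, echo_coupling=0.5):
    """Classify one frame from loudspeaker and microphone energies."""
    far_energy = short_term_energy(loudspeaker_frame)
    mic_energy = short_term_energy(mic_frame)
    far_active = far_energy > far_threshold
    # Local speech is assumed when the microphone energy exceeds what the
    # echo alone is expected to produce (echo_coupling is a guessed bound).
    near_active = mic_energy > near_threshold + echo_coupling * far_energy
    if far_active and near_active:
        return "double-talk"
    if far_active:
        return "far-end only"
    if near_active:
        return "local only"
    return "silence"

def pattern_similarity(beam_energies, reference_pattern):
    """Normalised correlation between two beam-energy patterns (0 to 1)."""
    dot = sum(a * b for a, b in zip(beam_energies, reference_pattern))
    norm = (math.sqrt(sum(a * a for a in beam_energies))
            * math.sqrt(sum(b * b for b in reference_pattern)))
    return dot / norm if norm > 0.0 else 0.0
```

For BF-I, a stored pattern of the M beam energies measured while only the loudspeaker is active could serve as `reference_pattern`; a high similarity then points to echo and not to a local talker.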
`
`
`
`
ming at optimizing BF filters for the currently active local
talker (not during double-talk or when several local talkers
are active). For all L local talkers, the BF filter outputs
must be computed, for activity detection if nothing else.
`
3.5. Examples

For illustration, the integration of our AEC concepts into
various known ABMA implementations is considered.

For car telephony, MAs using GSC with typically or
sensors have mainly been investigated for speech
recognition applications so far. When using the BF-II concept
for hands-free full-duplex telephony, the requirements for an
EC are essentially the same as for an SM as long as only a
single local talker (e.g., the driver) is considered. Although
the directivity gain of the array is not completely balanced
by the increased average microphone distance compared to
an optimally located SM, the incorporation of the
beamforming into the echo path model leads to an EC impulse
response of comparable length as for an SM.

For desktop teleconferencing, MAs compete with
multichannel systems, offering the advantage of requiring fewer
sensors when large groups communicate. The BF-II
concept could be applied, e.g., to the AMNOR beamforming
based on N sensors. Assuming seated participants,
the BF filters must be updated very infrequently, and as
the echo paths will remain relatively stable most of the
time, it will suffice to adapt one EC at a time. A realization
of AEC with BF-I for desktop teleconferencing has been
reported in the literature: combining N dipole microphones, L
beam signals are formed, and only min{L, N} ECs need
to be realized, acting directly on the microphone outputs.

For videoconferencing, MAs mounted to a wall or to the
ceiling again compete with multichannel systems.
With nested beamsteering subarrays (BF-I) using a
total of N microphones, up to M beams
are formed which cover typically L talkers. With a
distance of m between array and talker, an additional
echo gain of at least dB must be compensated by array
directivity and AEC compared to SMs located at m from
the talkers. Thus, the L ECs will in general be at least as
complex as for SMs unless the directivity of a loudspeaker
array combined with absorbing surfaces provides additional
echo attenuation.

For an auditorium using a planar array (BF-I, with
parameters N, L, M), the echo cancellation
problem is scaled up along three parameters compared to
a teleconferencing studio: increased reverberation time
demands longer EC impulse responses, increased talker-array
distance provides extra echo gain demanding even longer
EC impulse responses, and the large L requires more ECs.
Thus, loudspeaker directivity and room design will remain
of great importance for this application if loss insertion is
to be minimized.
`
4. CONCLUSIONS

Comparing the proposed concepts to AEC for an SM per
local talker, the complexity of AEC for an MA is on the same
order for car telephony and desktop teleconferencing, but
increases along with array-talker distance for
videoconferencing and auditoria. Many details of the outlined control
methods call for further investigation, and more
sophisticated approaches could be applied to key problems like
beamforming, training, and voting. For spotting the most critical
issues, however, real-life experiments using simple but
complete implementations must be evaluated first.
`
5. ACKNOWLEDGMENT

The author wishes to thank Yannick Mahieux of CNET,
Lannion, France, for inspiring discussions and for
contributing helpful measurement data, and Gary W. Elko of Bell
Labs, Murray Hill, NJ, who incited this work years ago.
`
REFERENCES

[1] M.M. Sondhi and W. Kellermann. Echo cancellation for
speech signals. In S. Furui and M.M. Sondhi (eds.), Advances
in Speech Signal Processing. Marcel Dekker.

[2] J.L. Flanagan, J.D. Johnston, R. Zahn, and G.W. Elko.
Computer-steered microphone arrays for sound transduction
in large rooms. JASA.

[3] Y. Kaneda and J. Ohga. Adaptive microphone-array
system for noise reduction. IEEE Trans. ASSP.

[4] W. Kellermann. A self-steering digital microphone array.
Proc. ICASSP, Toronto.

[5] J.L. Flanagan, D.A. Berkley, G.W. Elko, J.E. West, and
M.M. Sondhi. Autodirective microphone systems. Acustica.

[6] P. Chu. Desktop mic array for teleconferencing.
Proc. ICASSP, Detroit.

[7] C. Marro and Y. Mahieux. Analysis of dereverberation and
noise reduction techniques based on microphone arrays
with optimal filtering. IEEE Trans. SAP, submitted.

[8] S. Oh, V. Viswanathan, and P. Papamichalis. Hands-free
voice communication in an automobile with a microphone
array. Proc. ICASSP, San Francisco.

[9] S. Nordebo, S. Nordholm, B. Bengtsson, and I. Claesson.
Noise reduction using an adaptive microphone array in a
car. In Conf. Rec. of IEEE ASSP Workshop on Appl. of
DSP to Audio and Acoustics, New Paltz, USA.

[10] ITU-T Recommendation G.167. Acoustic Echo Controllers.
March 1993.

[11] K. Farrell, R.J. Mammone, and J.L. Flanagan.
Beamforming microphone arrays for speech enhancement.
Proc. ICASSP, San Francisco.

[12] W. Kellermann. Some properties of echo path impulse
responses of microphone arrays and consequences for acoustic
echo cancellation. In Conf. Rec. of the Intern. Workshop
on Acoustic Echo Control, Røros, Norway.

[13] B.D. Van Veen and K.M. Buckley. Beamforming: A versatile
approach to spatial filtering. IEEE ASSP Magazine.

[14] M.M. Sondhi, D.R. Morgan, and J.L. Hall. Stereophonic
echo cancellation: An overview of the fundamental problem.
IEEE Signal Processing Letters.

[15] N. Koizumi, S. Makino, and H. Oikawa. Acoustic echo
canceller with multiple echo paths. JASJ (E).

[16] L.M. v.d. Kerkhof and W.J.W. Kitzen. Tracking of a
time-varying acoustic impulse response by an adaptive filter.
IEEE Trans. SP.

[17] T. Chou. Frequency-independent beamformer with low
response error. Proc. ICASSP, Detroit.
`