Walter Kellermann
Fachhochschule Regensburg, Germany

New concepts for the efficient combination of acoustic echo cancellation (AEC) and adaptive beamforming microphone arrays (ABMA) are presented. By decomposing common beamforming methods into a time-invariant part, which the AEC can integrate, and a separate time-variant part, the number of echo cancellers is minimized without rendering the system identification problem more difficult. Methods for controlling the interaction of ABMA and AEC are outlined, and implementations for typical microphone array applications are discussed briefly.

For acoustic echo control in conventional hands-free communication, it is generally acknowledged that an echo canceller (EC) is desirable which models the impulse response of the loudspeaker-enclosure-microphone system by an adaptive filter in order to remove echo components from the microphone signal. Other echo control methods, like loss insertion or nonlinear devices, impair full-duplex communication and are thus mostly considered as supplementary measures only. For applications such as teleconferencing between offices, studios, or auditoria, or car telephony, convenience or safety aspects suggest that the personal microphone be replaced by a microphone array (MA) directing a beam of increased sensitivity at the active talker.
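
The adaptive-filter principle of an echo canceller can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the normalized LMS (NLMS) algorithm, which the text does not prescribe, and a short hypothetical echo path `echo_path` driven by white noise. The function name `nlms_echo_canceller` and all parameter values are illustrative assumptions.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    Assumption: NLMS adaptation (one common choice, not taken from the paper).
    Returns the residual signal and the final filter coefficients.
    """
    w = [0.0] * num_taps       # adaptive FIR model of the echo path
    x_buf = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate                      # echo-compensated sample
        norm = sum(xi * xi for xi in x_buf) + eps  # input power normalization
        gain = step * e / norm
        w = [wi + gain * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w


# Toy experiment with hypothetical data: far-end single talk, no local noise.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.15, 0.05]   # hypothetical 4-tap echo path
far_end = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual, w_hat = nlms_echo_canceller(far_end, mic, num_taps=len(echo_path))
```

In this noise-free toy case the coefficients `w_hat` approach `echo_path`; real echo paths are far longer and time-varying, which is what makes the identification problem discussed in this paper difficult.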

Acoustic Echo Path with Microphone Arrays

In contrast to single-microphone (SM) hands-free communication or multichannel teleconferencing, one might hope that for a MA no echo canceller (EC) is required, because the acoustic echo path from the loudspeaker could be sufficiently attenuated by the array directivity. Considering [...] as a guideline, echo attenuation should be at least [...] dB during single-talk and [...] dB during double-talk. Examining the echo attenuation provided by known MA implementations, we find:
- The absolute gain of the MA has to increase along with the distance from the local talkers in order to compensate for the decay of the sound level (6 dB per doubling of distance in the far field). This extra gain requires correspondingly more echo attenuation.
- The directivity index (quantifying the gain of the desired direction over the average of all other directions) of fixed beamforming arrays does not exceed [...] dB over a wide frequency range and is much smaller at low frequencies. SNR improvement of adaptive beamforming arrays is limited to about [...] dB for realistic conditions. For reverberant environments, both quantities approximately express the echo attenuation provided by the MA.
- Null-steering to the loudspeaker for maximum echo attenuation is only effective in non-reverberant environments. Even in carefully designed studios with optimized placement of sources, MA, and loudspeaker, unexpected reflections may reduce echo attenuation below [...] dB.

For the echo path impulse response of an N-sensor MA in a reverberant environment, a simple model is supported by measurements: the array impulse response behaves like the sum of the N impulse responses for the individual microphones, with the accumulated samples being mutually uncorrelated. This implies an increased average echo attenuation for the MA on the order of about 10 log N dB over a SM. This advantage of the MA must compensate for the additional gain due to the usually increased average talker-sensor distance and a possibly higher directivity of the SM compared to a single array sensor, if an EC of equal length is to provide the same echo attenuation as for a SM. Thus, although the MA echo path could be further attenuated by loudspeaker arrays in combination with absorbing walls, AEC will in most cases remain desirable for full-duplex communications with MAs.
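
The uncorrelated-sum model can be checked numerically with a small sketch. It rests on assumptions not stated in the text: the N echo path impulse responses are replaced by independent white Gaussian sequences, and the array output is normalized by 1/N so that a coherently summed desired signal keeps unit gain. Under these assumptions the echo path energy of the array is about N times smaller than that of a single sensor, i.e. roughly 10 log10(N) dB. The function name `array_echo_attenuation_db` is hypothetical.

```python
import math
import random


def array_echo_attenuation_db(num_sensors, length=4000, seed=1):
    """Echo attenuation of a 1/N-normalized sum of N uncorrelated responses,
    relative to the average single-sensor response (in dB)."""
    rng = random.Random(seed)
    # Assumption: white Gaussian stand-ins for the N echo path responses.
    responses = [
        [rng.gauss(0.0, 1.0) for _ in range(length)]
        for _ in range(num_sensors)
    ]
    single_energy = sum(sum(v * v for v in h) for h in responses) / num_sensors
    # Array echo path: sample-wise sum over sensors, normalized by N.
    array_response = [sum(column) / num_sensors for column in zip(*responses)]
    array_energy = sum(v * v for v in array_response)
    return 10.0 * math.log10(single_energy / array_energy)


attenuation_8 = array_echo_attenuation_db(8)
expected_8 = 10.0 * math.log10(8)   # about 9 dB for N = 8
```

Measured room responses decay over time and are only approximately uncorrelated, so the figure obtained this way is an idealized average, not a guaranteed attenuation.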

In Fig. 1, the structure of hands-free telecommunication using an ABMA is outlined. For the adaptive beamforming (BF), we allow here all spatially selective algorithms that extract the desired signal from the N microphone signals. This notion covers classical adaptive beamforming arrays as well as beam-steering algorithms. Only a single far-end signal is allowed, to avoid interference with the stereophonic AEC problem, which can be treated separately. Two generic AEC approaches are discussed to illustrate the AEC problem:

AEC-I operates directly on the microphone signals, i.e., for each of the N echo paths an acoustic echo canceller must be implemented. The AEC feels no repercussions from the adaptive BF, and thus the AEC problem is structurally the same as for a SM, duplicated by the number of sensors.

AEC-II operates on the output signal of the BF, requiring only a single EC. However, the AEC model has to incorporate the BF in addition to the acoustic echo path.

Note that this distinction is independent of the structure of the AEC (fullband, subband, or transform-domain structures may be used) and of its adaptation algorithm.

[Figure 1. ABMA in a hands-free telecommunication system with two alternatives for AEC.]

A major advantage of AEC-I is its structural simplicity, as it only requires duplicating the established SM-AEC algorithms. However, for large N the computational load is considerable and may be prohibitive for common teleconferencing and car telephony with N = [...] microphones.

For AEC-II, only a single AEC is required, but this has to include the adaptive BF in its model of the echo path. As the unknown acoustic components cannot be identified separately from the known BF filtering system ("knapsack problem"), the time-variance of the BF poses a major problem: with the identification of the acoustic echo path being already difficult due to its large number of degrees of freedom and its unpredictable, potentially fast and severe changes of the impulse response, it becomes even more difficult if adaptive BF must be incorporated. Since the BF system must change its parameters whenever it switches to a newly active local talker, severe fluctuations in the echo path impulse response occur at a time when the adaptive EC is unable to track them, because the local source acts as interfering noise on the system identification. Hence, AEC-II will in general provide no echo attenuation until a far-end talker is in a single-talk period again and allows convergence of the EC. Thus, the benefits of AEC are often missing when they are desired most, i.e., during double-talk and at transitions from far-end activity to local activity and vice versa (at other times, loss insertion is less objectionable). As a result, the time-variance of the BF discourages the use of AEC-II.

From the previous section we conclude that, for large N, new efficient concepts ideally should avoid the computational complexity of AEC-I and circumvent the time-variant BF in AEC-II. The key to this is to decompose the ABMA into a time-invariant stage followed by a time-variant stage. The time-invariant BF is to produce a minimum number of output signals, which the AEC can incorporate into its echo path model, and the time-variant part of the ABMA must not interfere with the AEC.

Beamforming methods

We distinguish two classes of BF methods which are common for MAs in telecommunications:

BF-I: For beam-steering, a set of M fixed beam signals is computed independently of the array input data, and the output of the beamformer is a weighted sum of these beams, with time-variant weights accounting for the active talkers ("voting").

Note that all beam signals are meant to cover the entire frequency range of interest. Accounting for the wideband nature of speech and audio signals, nested arrays are usually employed, whose outputs may be filtered as an ensemble or as subarrays before yielding a wideband beam signal. Fractional-delay beamforming for increased spatial resolution is also covered by our model.

BF-II: Classical adaptive beamforming methods aim at minimizing a statistical error criterion and filter the microphone signals accordingly. Characteristically, the parameters of these systems are continually changing over time in order to converge to optimum filter coefficients. Note that tracking of moving or changing sources is usually not supported.

AEC with BF-I

BF-I inherently provides the desired separation into a time-invariant and a time-variant stage. To minimize the number of signals for the AEC, we introduce a mapping of the M fixed-beam signals onto L talker beams whenever L [...] M [...] N (Fig. 2). For maximum spatial selectivity, the mapping should select one fixed beam or a linear combination of two neighboring fixed beams per talker. The AEC now has to identify a time-invariant BF system as long as the mapping does not change, and thus deals with an L-channel AEC-I problem.

[Figure 2. AEC combined with BF-I.]

AEC with BF-II

Similarly to the BF-I concept, we simultaneously apply L fixed sets of BF filters to the N microphone signals to account for each talker (Fig. 3). Thus, again, we obtain an L-channel AEC-I echo cancellation problem. The signal path of this structure is essentially the same as for BF-I, employing fixed beamforming and voting. The actual adaptive beamforming has been moved to the control path.

[Figure 3. AEC combined with BF-II.]

For both BF-I and BF-II, the incorporation of the fixed beamforming into the echo path model requires a longer EC impulse response. The extra length is determined by the maximum delay realized in the delay-and-sum networks plus, for BF-I, the interpolation and beam-shaping filter order and, for BF-II, the length of the adaptive beamforming filter.

Control mechanisms

With ABMAs and AEC being intensively researched areas on their own, we concentrate here on efficiently controlling their interaction. Unless referenced otherwise, the methods described below were verified and subjectively evaluated using recorded dialogues and measured impulse responses of MAs in cars, offices, and a videoconference studio.

Talker activity detection

The detection of talker activity is crucial for both AEC and BF. AEC relies on it for controlling the speed of adaptation, and BF needs it for voting and to identify periods when the mapping for BF-I or the optimum BF for BF-II can be learned. As in SM concepts, talker activity is classified by primarily evaluating the energies of the loudspeaker and microphone signals, respectively. The spatial resolution of beamforming MAs provides additional information: e.g., for the BF-I concept, the M beam signal energies will show a typical pattern for each spatially fixed source, such as the loudspeaker, which can then be distinguished from the patterns of other sources.

AEC

As long as computing resources allow, all L ECs should adapt in parallel during far-end-talk-only periods. Alternatively, only the currently needed ECs according to the voting could be operated while all others are kept frozen. As in the SM case, estimating the current echo path attenuation provided by AEC during far-end talk remains indispensable for determining the amount of required supplementary loss, notably during initial convergence, at changes of the acoustic path, and when the mapping for BF-I or the fixed BF of BF-II is updated.

BF during far-end talk only

Experiments confirmed that using a BF configuration which simply minimizes echo feedback to avoid loss insertion may give a disturbing spatial impression to the far end. Instead, we propose to use the BF configurations covering the local talkers and to insert supplementary loss.

Voting

The voting algorithm derives the array output signal from a weighted linear combination of L beam signals. Equally for BF-I and BF-II, the time-variant weights are chosen to allow a fast reaction to newly active local sources ([...] msec) while at the same time avoiding the perception of switching noise. For maximum spatial selectivity, for each talker only one beam signal should have a nonzero weight in the stationary case (for details see, e.g., [...]). When entering a far-end talk period, we propose to start out with the weights for the most recently active local talker and gradually change the weights to arrive at a beamforming averaging over all L talker beams.

Mapping for BF-I

For initialization, the results of a training procedure can be incorporated, or the dominant fixed beams during the first periods of local speech are used as initial talker beams. While applying the current fixed mapping to form the output signal, the control unit continuously monitors the short-term energies of the fixed beams and incorporates the beam energy patterns into a learning procedure (e.g., a first-order recursive filtering over time) for the currently active talker. The mapping should only be changed if a fixed beam or a combination of two neighboring fixed beams exhibits significantly more energy than the current mapping. A combination of two fixed beams is considered for the mapping only if the neighboring beams have about the same energy and their weighted sum produces clearly more energy than each of them. The mapping should preferably be updated during far-end-talk-only periods, as only then can the AEC identify the new echo path.

Fixed beamforming for BF-II

As with BF-I, the fixed beamforming for each of the L talkers must be initialized and should be updated only when the adaptive beamforming performs significantly better than the established fixed BF for the active talker. The initialization usually must include the localization of the desired sources and the convergence to an efficient BF configuration for each talker (cf. [...]). The control unit is supported by an adaptive BF unit which is continually aiming at optimizing BF filters for the currently active local talker (not during double-talk or when several local talkers are active). For all L local talkers, the BF filter outputs must be computed, for activity detection if nothing else.
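
The control rules for the BF-I mapping and for voting described above can be sketched as follows. This is a simplified illustration under several assumptions: the smoothing constant `alpha`, the switching margin `margin_db`, and the cross-fade rate `rate` are hypothetical values, all function names are invented for this example, and the case of combining two neighboring fixed beams is omitted because it needs the energy of the summed beam signal.

```python
def smooth_energies(smoothed, frame_energies, alpha=0.9):
    """First-order recursive smoothing of the M fixed-beam energies."""
    return [alpha * s + (1.0 - alpha) * e
            for s, e in zip(smoothed, frame_energies)]


def update_mapping(current_beam, smoothed, margin_db=3.0):
    """Change the talker beam only if another fixed beam is clearly stronger.

    Assumption: "significantly more energy" is modelled as a fixed dB margin.
    """
    best = max(range(len(smoothed)), key=lambda m: smoothed[m])
    factor = 10.0 ** (margin_db / 10.0)
    if smoothed[best] > factor * smoothed[current_beam]:
        return best
    return current_beam


def step_voting_weights(weights, target, rate=0.2):
    """Move the L voting weights a fraction of the way towards the target,
    which avoids abrupt switching between beams."""
    return [w + rate * (t - w) for w, t in zip(weights, target)]


# Hypothetical example with M = 3 fixed beams: the talker moves to beam 1.
smoothed = [1.0, 0.2, 0.1]
talker_beam = 0
for _ in range(30):
    smoothed = smooth_energies(smoothed, [0.1, 2.0, 0.1])
    talker_beam = update_mapping(talker_beam, smoothed)

# Entering a far-end talk period with L = 3 talker beams: start from the most
# recently active talker and fade towards an average over all talker beams.
weights = [1.0, 0.0, 0.0]
uniform = [1.0 / 3.0] * 3
for _ in range(20):
    weights = step_voting_weights(weights, uniform)
```

Because each update is a convex combination of weight vectors that sum to one, the weights keep summing to one during the cross-fade. A practical control unit would additionally restrict mapping updates to suitable talker activity states, as discussed above.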

Examples

For illustration, the integration of our AEC concepts into various known ABMA implementations is considered.

For car telephony, MAs using GSC (generalized sidelobe canceller) with typically [...] or [...] sensors have mainly been investigated for speech recognition applications so far. When using the BF-II concept for hands-free full-duplex telephony, the requirements for an EC are essentially the same as for a SM, as long as only a single local talker, e.g., the driver, is considered. Although the directivity gain of the array is not completely balanced by the increased average microphone distance compared to an optimally located SM, the incorporation of the beamforming into the echo path model leads to an EC impulse response of a length comparable to that for a SM.

For desktop teleconferencing, MAs compete with multichannel systems, offering the advantage of requiring fewer sensors when large groups communicate. The BF-II concept could be applied, e.g., to the AMNOR beamforming based on N = [...] sensors. Assuming seated participants, the BF filters need to be updated only very infrequently, and as the echo paths will remain relatively stable most of the time, it will suffice to adapt one EC at a time. A realization of AEC with BF-I for desktop teleconferencing has been reported in [...]: combining N = [...] dipole microphones, L = [...] beam signals are formed, and only min{L, N} = [...] ECs need to be realized, acting directly on the microphone outputs.
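
The number of echo cancellers required by the different approaches can be summarized in a small sketch. The function name `echo_canceller_count`, the concept labels, and the example values are assumptions for illustration only; the min(L, N) rule for the decomposed concept generalizes the desktop example above, where the ECs may act directly on the microphone outputs if that needs fewer of them.

```python
def echo_canceller_count(num_mics, num_talker_beams, concept):
    """Number of echo cancellers for an N-sensor array serving L talker beams."""
    if concept == "AEC-I":        # one EC per microphone signal
        return num_mics
    if concept == "AEC-II":       # a single EC behind the time-variant BF
        return 1
    if concept == "decomposed":   # ECs behind the time-invariant BF stage
        return min(num_talker_beams, num_mics)
    raise ValueError(f"unknown concept: {concept}")


# Hypothetical array sizes, not taken from the paper.
counts = {
    concept: echo_canceller_count(num_mics=16, num_talker_beams=3, concept=concept)
    for concept in ("AEC-I", "AEC-II", "decomposed")
}
```

The count alone does not capture the cost per EC: incorporating the fixed beamforming lengthens each modelled impulse response, and AEC-II pays for its single EC with a time-variant echo path.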

For videoconferencing, MAs mounted to a wall or to the ceiling again compete with multichannel systems (see, e.g., [...]). With nested beam-steering subarrays (BF-I) using a total of N = [...] microphones, up to M = [...] beams are formed, which cover typically L = [...] talkers. With a distance of [...] m between array and talker, an additional echo gain of at least [...] dB must be compensated by array directivity and AEC, compared to SMs located at [...] m from the talkers. Thus, the L ECs will in general be at least as complex as for SMs, unless the directivity of a loudspeaker array combined with absorbing surfaces provides additional echo attenuation.
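
The additional echo gain caused by a larger talker distance can be estimated with a short sketch. It assumes the free-field inverse-distance law (about 6 dB per doubling of distance in the far field), and the distances in the example are hypothetical, not the values of the systems discussed here. The function name `extra_echo_gain_db` is invented for this illustration.

```python
import math


def extra_echo_gain_db(array_distance_m, reference_distance_m):
    """Extra microphone gain (in dB) needed to keep the talker level constant
    when the talker distance grows from the reference to the array distance.

    Assumption: far-field 1/r decay of sound pressure.
    """
    return 20.0 * math.log10(array_distance_m / reference_distance_m)


# Hypothetical example: array at 2 m versus a single microphone at 0.5 m.
gain_db = extra_echo_gain_db(2.0, 0.5)       # two doublings, about 12 dB
per_doubling_db = extra_echo_gain_db(2.0, 1.0)
```

In a reverberant room the echo level at the array does not drop by the same amount, which is why this extra gain has to be made up by array directivity and AEC.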

For an auditorium as described in [...], using a planar array (BF-I; N = [...], L [...] M [...]), the echo cancellation problem is scaled up along three parameters compared to a teleconferencing studio: increased reverberation time demands longer EC impulse responses, increased talker-array distance provides extra echo gain, demanding even longer EC impulse responses, and the large L requires more ECs. Thus, loudspeaker directivity and room design will remain of great importance for this application if loss insertion is to be minimized.

Comparing the proposed concepts to AEC for a SM per local talker, the complexity of AEC for a MA is on the same order for car telephony and desktop teleconferencing, but increases along with array-talker distance for videoconferencing and auditoria. Many details of the outlined control methods call for further investigation, and more sophisticated approaches could be applied to key problems like beamforming training and voting. For spotting the most critical issues, however, real-life experiments using simple but complete implementations must be evaluated first.

The author wishes to thank Yannick Mahieux of CNET, Lannion, France, for inspiring discussions and for contributing helpful measurement data, and Gary W. Elko of Bell Labs, Murray Hill, NJ, who incited this work years ago.

M.M. Sondhi and W. Kellermann. Echo cancellation for speech signals. In S. Furui and M.M. Sondhi (eds.), Advances in Speech Signal Processing. Marcel Dekker.
J.L. Flanagan, J.D. Johnston, R. Zahn, and G.W. Elko. Computer-steered microphone arrays for sound transduction in large rooms. JASA.
Y. Kaneda and J. Ohga. Adaptive microphone-array system for noise reduction. IEEE Trans. ASSP.
W. Kellermann. A self-steering digital microphone array. Proc. ICASSP, Toronto.
J.L. Flanagan, D.A. Berkley, G.W. Elko, J.E. West, and M.M. Sondhi. Autodirective microphone systems. Acustica.
P. Chu. Desktop mic array for teleconferencing. Proc. ICASSP, Detroit.
C. Marro and Y. Mahieux. Analysis of dereverberation and noise reduction techniques based on microphone arrays microphone with optimal filtering. IEEE Trans. SAP, submitted.
S. Oh, V. Viswanathan, and P. Papamichalis. Hands-free voice communication in an automobile with a microphone array. Proc. ICASSP, San Francisco.
S. Nordebo, S. Nordholm, B. Bengtsson, and I. Claesson. Noise reduction using an adaptive microphone array in a car. In Conf. Rec. of IEEE ASSP Workshop on Appl. of DSP to Audio and Acoustics, New Paltz, USA.
ITU-T Recommendation G.[...]: Acoustic Echo Controllers. March [...].
K. Farrell, R.J. Mammone, and J.L. Flanagan. Beamforming microphone arrays for speech enhancement. Proc. ICASSP, San Francisco.
W. Kellermann. Some properties of echo path impulse responses of microphone arrays and consequences for acoustic echo cancellation. In Conf. Rec. of the [...]th Intern. Workshop on Acoustic Echo Control, Røros, Norway.
B.D. Van Veen and K.M. Buckley. Beamforming: A versatile approach to spatial filtering. ASSP Mag.
M.M. Sondhi, D.R. Morgan, and J.L. Hall. Stereophonic echo cancellation: An overview of the fundamental problem. IEEE Signal Processing Letters.
N. Koizumi, S. Makino, and H. Oikawa. Acoustic echo canceller with multiple echo path. JASJ (E).
L.M. v.d. Kerkhof and W.J.W. Kitzen. Tracking of a time-varying acoustic impulse response by an adaptive filter. IEEE Trans. SP.
T. Chou. Frequency-independent beamformer with low response error. Proc. ICASSP, Detroit.

