STRATEGIES FOR COMBINING ACOUSTIC ECHO CANCELLATION AND
ADAPTIVE BEAMFORMING MICROPHONE ARRAYS

Walter Kellermann

Fachhochschule Regensburg, Germany

ABSTRACT

New concepts for efficient combination of acoustic echo cancellation
(AEC) and adaptive beamforming microphone arrays (ABMA) are presented.
By decomposing common beamforming methods into a time-invariant part,
which the AEC can integrate, and a separate time-variant part, the
number of echo cancellers is minimized without rendering the system
identification problem more difficult. Methods for controlling the
interaction of ABMA and AEC are outlined, and implementations for
typical microphone array applications are discussed briefly.

INTRODUCTION
`
For acoustic echo control in conventional hands-free communication, it
is generally acknowledged that an echo canceller (EC) is desirable,
which models the impulse response of the
loudspeaker-enclosure-microphone system by an adaptive filter in order
to remove echo components from the microphone signal. Other echo
control methods, like loss insertion or nonlinear devices, impair
full-duplex communication and thus are mostly considered as
supplementary measures only. For applications such as teleconferencing
between offices, studios, and auditoria [...], or car telephony [...],
convenience or safety aspects suggest that the personal microphone be
replaced by a microphone array (MA) directing a beam of increased
sensitivity at the active talker.
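As a minimal sketch of what such an echo canceller does, the following models a toy echo path with a normalized LMS (NLMS) adaptive filter. NLMS is only one common choice and is an assumption here, as are the function name, the 4-tap echo path, the step size, and the white-noise far-end signal; real acoustic echo paths are far longer.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps, step=0.5, eps=1e-8):
    """Return (residual signal, final taps) of an NLMS echo canceller."""
    w = [0.0] * num_taps        # adaptive model of the echo path
    x_buf = [0.0] * num_taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate   # microphone signal minus estimated echo
        norm = sum(xi * xi for xi in x_buf) + eps
        # Normalized gradient step toward the true echo path.
        w = [wi + step * e * xi / norm for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w

# Toy far-end single-talk scenario (all values hypothetical).
rng = random.Random(0)
true_path = [0.5, -0.3, 0.2, 0.1]
far_end = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [sum(h * far_end[k - i] for i, h in enumerate(true_path) if k - i >= 0)
       for k in range(len(far_end))]
err, w = nlms_echo_canceller(far_end, mic, num_taps=4)
```

In this noise-free toy case the taps `w` approach `true_path` and the residual `err` decays toward zero; local speech or noise in `mic` would disturb the adaptation, which is why adaptation control matters in practice.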
`
Acoustic Echo Path with Microphone Arrays

In contrast to single-microphone (SM) hands-free communication or
multichannel teleconferencing [...], one might hope that for an MA no
echo canceller (EC) is required, because the acoustic echo path from
the loudspeaker could be sufficiently attenuated by the array
directivity. Considering [...] as a guideline, echo attenuation should
be at least [...] dB during single-talk and [...] dB during
double-talk. Examining the echo attenuation provided by known MA
implementations, we find:

- The absolute gain of the MA has to increase along with the distance
  from the local talkers in order to compensate for the decay of the
  sound level ([...] dB per doubling of distance in the far field).
  This extra gain requires correspondingly more echo attenuation.
- The directivity index (quantifying the gain of the desired direction
  over the average of all other directions) of fixed beamforming arrays
  does not exceed [...] dB over a wide frequency range and is much
  smaller at low frequencies [...]. SNR improvement of adaptive
  beamforming arrays is limited to about [...] dB for realistic
  conditions [...]. For reverberant environments, both quantities
  approximately express the echo attenuation provided by the MA.
- Null-steering to the loudspeaker for maximum echo attenuation is only
  effective in non-reverberant environments [...]. Even in carefully
  designed studios with optimized placement of sources, MA, and
  loudspeaker, unexpected reflections may reduce echo attenuation below
  [...] dB [...].
`
For the echo path impulse response of an N-sensor MA in a reverberant
environment, a simple model is supported by measurements: the array
impulse response behaves like the sum of the N impulse responses for
the individual microphones, with the accumulated samples being mutually
uncorrelated [...]. This implies an increased average echo attenuation
for the MA on the order of about log N dB over an SM. This advantage of
the MA must compensate for the additional gain due to the usually
increased average talker-sensor distance, and for a possibly higher
directivity of the SM compared to a single array sensor, if an EC of
equal length should provide the same echo attenuation as for an SM.
Thus, although the MA echo path could be further attenuated by
loudspeaker arrays in combination with absorbing walls, AEC will in
most cases remain desirable for full-duplex communications with MAs.
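The uncorrelated-sum model can be checked numerically with a small sketch. It assumes equal-gain averaging over the sensors (weights 1/N) and white Gaussian, equal-energy stand-ins for the individual echo paths; both are illustrative assumptions, as are the function names. Under these assumptions the simulated gain in echo attenuation comes out near 10 log10(N) dB.

```python
import math
import random

def energy(h):
    """Sum of squared samples of an impulse response."""
    return sum(v * v for v in h)

def array_echo_attenuation_db(num_sensors, length=2000, seed=1):
    """Echo attenuation of the averaged array path relative to one sensor."""
    rng = random.Random(seed)
    # Mutually uncorrelated stand-ins for the per-microphone echo paths.
    paths = [[rng.gauss(0.0, 1.0) for _ in range(length)]
             for _ in range(num_sensors)]
    # Array echo path: sample-wise average over all sensors.
    array_path = [sum(column) / num_sensors for column in zip(*paths)]
    mean_single = sum(energy(h) for h in paths) / num_sensors
    return 10.0 * math.log10(mean_single / energy(array_path))

attenuation_8 = array_echo_attenuation_db(8)
```

The desired talker adds coherently across the sensors while the echo tails add in power only, which is where the advantage over a single sensor comes from in this model.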
`
GENERIC CONCEPTS

In Fig. [...], the structure of hands-free telecommunication using an
ABMA is outlined. For the adaptive beamforming (BF), we allow here all
spatially selective algorithms that extract the desired signal from the
N microphone signals. This notion covers classical adaptive beamforming
arrays [...] as well as beamsteering algorithms [...]. Only a single
far-end signal is allowed, to avoid interference with the stereophonic
AEC problem, which can be treated separately [...]. Two generic AEC
approaches are discussed to illustrate the AEC problem:

AEC-I operates directly on the microphone signals, i.e., for each of
the N echo paths an acoustic echo canceller must be implemented. The
AEC feels no repercussions by
`
Note that this distinction is independent of the structure
(fullband/subband/transform-domain structures may be used) and of the
adaptation algorithm for the AEC.
`
`RTL607_1025-0001
`
`Realtek 607 Ex. 1025
`
`

`
nal complexity of AEC-I and circumvent the time-variant BF in AEC-II.
The key to this is to decompose the ABMA into a time-invariant stage
followed by a time-variant stage. The time-invariant BF is to produce a
minimum number of output signals, which the AEC can incorporate into
its echo path model, and the time-variant part of the ABMA must not
interfere with the AEC.
`
Beamforming methods

We distinguish two classes of BF methods which are common for MAs in
telecommunications:

BF-I: For beamsteering, a set of M fixed beam signals is computed
independently of the array input data, and the output of the beamformer
is a weighted sum of these beams, with time-variant weights accounting
for the active talkers (voting) [...].

BF-II: Classical adaptive beamforming methods aim at minimizing a
statistical error criterion and filter the microphone signals
accordingly [...]. Characteristically, the parameters of these systems
are continually changing over time in order to converge to optimum
filter coefficients [...]. Note that tracking of moving or changing
sources is usually not supported.
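The two-stage structure of BF-I can be sketched as follows: a fixed stage that forms beams independently of the input data, and a separate weighting stage. The sketch assumes integer-sample delay-and-sum beams and hypothetical names (`fixed_beams`, `steer`); it is an illustration, not the beamformers of the cited implementations.

```python
def fixed_beams(mic_signals, steering_delays):
    """Time-invariant stage: one delay-and-sum beam per set of delays."""
    num_mics = len(mic_signals)
    length = len(mic_signals[0])
    beams = []
    for delays in steering_delays:
        beam = []
        for k in range(length):
            acc = 0.0
            for n, d in enumerate(delays):
                if k - d >= 0:          # samples before the start count as zero
                    acc += mic_signals[n][k - d]
            beam.append(acc / num_mics)
        beams.append(beam)
    return beams

def steer(beams, weights):
    """Time-variant stage: weighted sum of the fixed beam signals."""
    return [sum(w * b[k] for w, b in zip(weights, beams))
            for k in range(len(beams[0]))]

# Toy wavefront reaching microphone 0 one sample before microphone 1.
mics = [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
beams = fixed_beams(mics, steering_delays=[(1, 0), (0, 1)])
output = steer(beams, weights=[1.0, 0.0])
```

The first beam aligns the two arrivals and sums them coherently, while the second smears them apart; only the weights passed to `steer` would change over time.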
`
AEC with BF-I

BF-I inherently provides the desired separation into a time-invariant
and a time-variant stage. To minimize the number of signals for the
AEC, we introduce a mapping of the M fixed-beam signals onto L talker
beams whenever L [...] M [...] N (Fig. [...]). For maximum spatial
selectivity, the mapping should select one fixed beam or a linear
combination of two neighboring fixed beams per talker. The AEC
`
Figure [...]. AEC combined with BF-I. (Block diagram; labeled blocks:
far-end speech signal, AEC-I, Voting, Mapping, Control, Fixed BF;
signal counts N, M, L.)
`
now has to identify a time-invariant BF system as long as
`
Note that all beam signals are meant to cover the entire frequency
range of interest. Accounting for the wideband nature of speech and
audio signals, nested arrays are usually employed, whose outputs may be
filtered as an ensemble [...] or as subarrays [...] before yielding a
wideband beam signal. Fractional delay beamforming for increased
spatial resolution is also covered by our model.
`
`
`
Figure [...]. ABMA in a hands-free telecommunication system with two
alternatives for AEC. (Block diagram; labeled blocks: local talkers,
far-end speech signal, Adaptive BF, AEC-I, AEC-II; N microphone
signals.)
`
the adaptive BF, and thus the AEC problem is structurally the same as
for an SM, duplicated by the number of sensors.

AEC-II operates on the output signal of the BF, requiring only a single
EC. However, the AEC model has to incorporate the BF in addition to the
acoustic echo path.

A major advantage of AEC-I is given by its structural simplicity, as it
only requires duplicating the established SM-AEC algorithms. However,
for large N the computational load is considerable [...] and may be
prohibitive for common teleconferencing and car telephony with
N = [...] microphones [...].

For AEC-II, only a single AEC is required, but this has to include the
adaptive BF in its model of the echo path. As the unknown acoustic
components cannot be identified separately from the known BF filtering
system (knapsack problem), the time-variance of the BF poses a major
problem: with the identification of the acoustic echo path being
already difficult due to its large number of degrees of freedom and its
unpredictable, potentially fast and severe changes of the impulse
response [...], it becomes even more difficult if adaptive BF must be
incorporated. Observing that the BF system must change its parameters
whenever it switches to a newly active local talker, severe
fluctuations in the echo path impulse response occur at a time when the
adaptive EC is unable to track them, because the local source acts as
interfering noise on the system identification. Hence, AEC-II will in
general provide no echo attenuation until a far-end talker is in a
single-talk period again and allows convergence of the EC. Thus, the
benefits of AEC are often missing when they are desired most, i.e.,
during double-talk and at transitions from far-end activity to local
activity and vice versa (at other times, loss insertion is less
objectionable). As a result, the time-variance of the BF discourages
the use of AEC-II.
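The difficulty with AEC-II can be made concrete with a small sketch: the echo path seen behind the beamformer is the sum over all microphones of each BF filter convolved with the corresponding acoustic path, so any change of the BF filters changes the system the single EC must identify, even if the room is unchanged. The function names and the toy filter values are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def effective_echo_path(bf_filters, acoustic_paths):
    """Echo path behind the beamformer: sum over mics of (BF filter * path)."""
    length = max(len(f) + len(h) - 1
                 for f, h in zip(bf_filters, acoustic_paths))
    total = [0.0] * length
    for f, h in zip(bf_filters, acoustic_paths):
        for k, v in enumerate(convolve(f, h)):
            total[k] += v
    return total

# Two microphones with fixed (toy) acoustic echo paths.
acoustic = [[1.0, 0.5], [0.0, 1.0]]
# BF filters while talker A is active, and after switching to talker B.
path_a = effective_echo_path([[0.5], [0.5]], acoustic)
path_b = effective_echo_path([[0.0, 0.5], [0.5]], acoustic)
```

Here `path_a` and `path_b` differ although `acoustic` is identical, which is the kind of abrupt change an EC placed after a time-variant BF would have to re-identify at every talker switch.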
`
NEW EFFICIENT CONCEPTS

From the previous section we conclude that, for large N, new efficient
concepts ideally should avoid the computatio
`
`
`

`
AEC

As long as computing resources allow, all L ECs should adapt in
parallel during far-end-talk-only periods. Alternatively, only the
currently needed ECs according to the voting could be operated, while
all others are kept frozen. As in the SM case, estimating the current
echo path attenuation provided by AEC during far-end talk remains
indispensable for determining the amount of required supplementary
loss, notably during initial convergence, at changes of the acoustic
path, and when the mapping for BF-I or the fixed BF of BF-II is
updated.
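A minimal sketch of this control logic follows. It assumes the attenuation is estimated as a power ratio between microphone signal and EC residual during far-end talk, and that the supplementary loss simply fills the gap to a required total; the function names and all numeric values in the usage lines are hypothetical.

```python
import math

def echo_attenuation_db(mic_power, residual_power, floor=1e-12):
    """Estimated attenuation achieved by the EC during far-end talk."""
    return 10.0 * math.log10(max(mic_power, floor) / max(residual_power, floor))

def supplementary_loss_db(required_db, achieved_db):
    """Loss still to be inserted to reach the required total attenuation."""
    return max(0.0, required_db - achieved_db)

def ecs_to_adapt(far_end_only, voting_weights, adapt_all):
    """Indices of the ECs allowed to adapt in the current frame."""
    if not far_end_only:
        return []                      # freeze all ECs outside far-end talk
    if adapt_all:
        return list(range(len(voting_weights)))
    # Otherwise adapt only the ECs of beams currently selected by the voting.
    return [l for l, w in enumerate(voting_weights) if w > 0.0]

achieved = echo_attenuation_db(mic_power=1.0, residual_power=0.01)
loss = supplementary_loss_db(required_db=45.0, achieved_db=achieved)
active = ecs_to_adapt(True, [0.0, 1.0, 0.0], adapt_all=False)
```

Right after a mapping or BF update the residual power rises, the estimated attenuation drops, and the sketch responds by asking for more inserted loss until the ECs have reconverged.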
`
BF during far-end talk only

Experiments confirmed that using a BF configuration which simply
minimizes echo feedback to avoid loss insertion may give a disturbing
spatial impression to the far-end party. Instead, we propose to use the
BF configurations covering the local talkers and to insert
supplementary loss.
`
Voting

The voting algorithm derives the array output signal from a weighted
linear combination of L beam signals. Equally for BF-I and BF-II, the
time-variant weights are chosen to allow a fast reaction to newly
active local sources ([...] msec) while at the same time avoiding the
perception of switching noise [...]. For maximum spatial selectivity,
for each talker only one beam signal should have a nonzero weight in
the stationary case (for details see, e.g., [...]). When entering a
far-end talk period, we propose to start out with the weights for the
most recently active local talker and gradually change the weights to
arrive at a beamforming averaging over all L talker beams.
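One simple way to obtain weights that react quickly but never jump is first-order smoothing toward a target weight vector. This is an assumed realization for illustration only; the per-frame `rate`, the function names, and the choice of L = 3 are hypothetical.

```python
def update_voting_weights(weights, target, rate=0.3):
    """Move the weights one smoothing step toward the target weights."""
    stepped = [w + rate * (t - w) for w, t in zip(weights, target)]
    total = sum(stepped)               # renormalize against rounding drift
    return [w / total for w in stepped]

def one_hot(index, size):
    """Stationary target: a single talker beam with nonzero weight."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Talker 0 was active; talker 1 becomes active.
weights = one_hot(0, 3)
for _ in range(20):
    weights = update_voting_weights(weights, one_hot(1, 3))

# Far-end talk period: drift toward an average over all talker beams.
far_end_weights = weights
for _ in range(50):
    far_end_weights = update_voting_weights(far_end_weights, [1.0 / 3] * 3)
```

A larger `rate` shortens the reaction time at the price of more audible switching, so the value would have to be tuned against the frame rate in a real system.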
`
Mapping for BF-I

For initialization, the results of a training procedure can be
incorporated, or the dominant fixed beams during the first periods of
local speech are used as initial talker beams. While applying the
current fixed mapping to form the output signal, the control unit
continuously monitors the short-term energies of the fixed beams and
incorporates the beam energy patterns into a learning procedure (e.g.,
a first-order recursive filtering over time) for the currently active
talker. The mapping should only be changed if a fixed beam or a
combination of two neighboring fixed beams exhibits significantly more
energy than the current mapping. A combination of two fixed beams is
considered for the mapping only if the neighboring beams have about the
same energy and their weighted sum produces clearly more energy than
each of them. The mapping should preferably be updated during
far-end-talk-only periods, as only then the AEC can identify the new
echo path.
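The learning procedure can be sketched with a first-order recursive filter on the beam energies and a margin that must be exceeded before the mapping changes. The sketch covers the single-beam case only (the two-neighbor combination is left out), and the smoothing constant, the 3 dB margin, and the function names are hypothetical.

```python
def smooth_energies(smoothed, frame_energies, alpha=0.9):
    """First-order recursive filtering of the fixed-beam energies."""
    return [alpha * s + (1.0 - alpha) * e
            for s, e in zip(smoothed, frame_energies)]

def propose_mapping(current_beam, smoothed, margin_db=3.0):
    """Switch beams only if another beam is clearly stronger."""
    best = max(range(len(smoothed)), key=lambda m: smoothed[m])
    ratio = 10.0 ** (margin_db / 10.0)
    if best != current_beam and smoothed[best] > ratio * smoothed[current_beam]:
        return best
    return current_beam

# The talker mapped to beam 0 moves so that beam 1 now dominates.
smoothed = [1.0, 0.1, 0.1]
current = 0
for _ in range(30):
    smoothed = smooth_energies(smoothed, [0.1, 2.0, 0.1])
    current = propose_mapping(current, smoothed)
```

The margin keeps the mapping from toggling on short energy fluctuations, which matters because each change forces the affected EC to re-identify its echo path.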
`
Fixed beamforming for BF-II

As with BF-I, the fixed beamforming for each of the L talkers must be
initialized and should be updated only when the adaptive beamforming
performs significantly better than the established fixed BF for the
active talker. The initialization usually must include the localization
of the desired sources and the convergence to an efficient BF
configuration for each talker (cf. [...]). The control unit is
supported by an adaptive BF unit which is continually ai
`
the mapping does not change, and thus deals with an L-channel AEC-I
problem.

AEC with BF-II

Similarly to the BF-I concept, we simultaneously apply L fixed sets of
BF filters to the N microphone signals to account for each talker
(Fig. [...]). Thus, again we obtain an L-channel AEC-I echo
cancellation problem. The signal path of this structure is essentially
the same as for BF-I employing fixed beamforming and voting. The actual
adaptive beamforming has been moved to the control path.
`
Figure [...]. AEC combined with BF-II. (Block diagram; labeled blocks:
far-end speech signal, AEC-I, Voting, Fixed BF, Control, Adaptive BF;
signal counts N, L.)
`
For both BF-I and BF-II, the incorporation of the fixed beamforming
into the echo path model requires a longer EC impulse response. The
extra length is determined by the maximum delay realized in the
delay-and-sum networks plus (for BF-I) the interpolation and beam
shaping filter order [...] and (for BF-II) the length of the adaptive
beamforming filter [...].
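The length requirement can be written out under assumed notation that does not appear in the text above: h_n(k) is the acoustic echo path to microphone n with N_h taps, and b_{l,n}(k) is the fixed BF filter (delays included) applied to microphone n for talker beam l, with N_b taps. The echo path that the l-th EC has to model, and its length, then read:

```latex
\begin{align}
  g_l(k) &= \sum_{n=1}^{N} \bigl( b_{l,n} * h_n \bigr)(k),
            \qquad l = 1, \dots, L, \\
  N_g    &= N_h + N_b - 1 .
\end{align}
```

So the EC needs N_b - 1 taps more than a single-microphone EC covering the same acoustic tail, independent of the number of microphones.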
`
Control mechanisms

With ABMAs and AEC being intensively researched areas on their own, we
concentrate here on efficiently controlling their interaction. Unless
referenced otherwise, the methods described below were verified and
subjectively evaluated using recorded dialogues and measured impulse
responses of MAs in cars, offices, and a videoconference studio.
`
Talker activity detection

The detection of talker activity is crucial for both AEC and BF. AEC
relies on it for controlling the speed of adaptation, and BF needs it
for voting and to identify periods when the mapping for BF-I or the
optimum BF for BF-II can be learned. As in SM concepts, talker activity
is classified by primarily evaluating the energies of the loudspeaker
and microphone signals, respectively [...]. The spatial resolution of
beamforming MAs provides additional information: e.g., for the BF-I
concept, the M beam signal energies will show a typical pattern for
each spatially fixed source, such as the loudspeaker, which can then be
distinguished from the patterns of other sources.
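A minimal energy-based classifier in the spirit of this description might look as follows. The thresholds, the assumed echo coupling `echo_gain`, and the function name are hypothetical, and the beam-pattern cue mentioned above is not modeled.

```python
def classify_activity(loudspeaker_energy, mic_energy,
                      echo_gain=0.5, noise_floor=1e-4, margin=4.0):
    """Classify a frame from short-term loudspeaker and microphone energies."""
    far_end_active = loudspeaker_energy > noise_floor
    # Microphone energy explained by echo alone, given the assumed coupling.
    expected_echo = echo_gain * loudspeaker_energy
    # Local speech is assumed when the microphone clearly exceeds echo + noise.
    local_active = mic_energy > margin * (expected_echo + noise_floor)
    if far_end_active and local_active:
        return "double-talk"
    if far_end_active:
        return "far-end only"
    if local_active:
        return "local only"
    return "idle"

states = [classify_activity(1.0, 0.5), classify_activity(1.0, 5.0),
          classify_activity(0.0, 0.5), classify_activity(0.0, 0.0)]
```

In a full system the "far-end only" state would enable EC adaptation and mapping updates, while "local only" would drive the voting.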
`
`
`

`
ming at optimizing BF filters for the currently active local talker
(not during double-talk or when several local talkers are active). For
all L local talkers, the BF filter outputs must be computed, for
activity detection if nothing else.
`
Examples

For illustration, the integration of our AEC concepts into various
known ABMA implementations is considered.

For car telephony, MAs using GSC with typically [...] or [...] sensors
[...] have mainly been investigated for speech recognition applications
so far. When using the BF-II concept for hands-free full-duplex
telephony, the requirements for an EC are essentially the same as for
an SM, as long as only a single local talker (e.g., the driver) is
considered. Although the directivity gain of the array is not
completely balanced by the increased average microphone distance
compared to an optimally located SM, the incorporation of the
beamforming into the echo path model leads to an EC impulse response of
comparable length as for an SM.

For desktop teleconferencing, MAs compete with multichannel systems,
offering the advantage of requiring fewer sensors when large groups
communicate. The BF-II concept could be applied, e.g., to the AMNOR
beamforming [...] based on N = [...] sensors. Assuming seated
participants, the BF filters must be updated very infrequently, and, as
the echo paths will remain relatively stable most of the time, it will
suffice to adapt one EC at a time. A realization of AEC with BF-I for
desktop teleconferencing has been reported in [...]: combining
N = [...] dipole microphones, L = [...] beam signals are formed, and
only min{L, N} = [...] ECs need to be realized, acting directly on the
microphone outputs.

For videoconferencing, MAs mounted to a wall or to the ceiling again
compete with multichannel systems (see, e.g., [...]). With nested
beamsteering subarrays (BF-I) using a total of N = [...] microphones
[...], up to M = [...] beams are formed, which cover typically
L = [...] talkers. With a distance of [...] m between array and talker,
an additional echo gain of at least [...] dB must be compensated by
array directivity and AEC, compared to SMs located at [...] m from the
talkers. Thus, the L ECs will in general be at least as complex as for
SMs, unless the directivity of a loudspeaker array combined with
absorbing surfaces provides additional echo attenuation.
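The extra gain caused by a larger talker distance can be estimated with the free-field inverse-distance law, which is an idealizing assumption here (reverberant rooms deviate from it). The function name and the distances in the usage lines are hypothetical and are not the values of the cited systems.

```python
import math

def extra_echo_gain_db(array_distance_m, reference_distance_m):
    """Gain needed to equalize the direct-sound level at a larger distance."""
    # Sound pressure of a point source falls off as 1/r in the free field.
    return 20.0 * math.log10(array_distance_m / reference_distance_m)

gain_doubling = extra_echo_gain_db(2.0, 1.0)
gain_example = extra_echo_gain_db(4.0, 0.5)
```

Because the array gain is raised by this amount to keep the talker level constant, the echo picked up from the loudspeaker is raised with it and has to be removed by directivity, AEC, or inserted loss.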
For an auditorium as described in [...], using a planar array (BF-I,
N [...], L, M [...]), the echo cancellation problem is scaled up along
three parameters compared to a teleconferencing studio: increased
reverberation time demands longer EC impulse responses, increased
talker-array distance provides extra echo gain, demanding even longer
EC impulse responses, and the large L requires more ECs. Thus,
loudspeaker directivity and room design will remain of great importance
for this application if loss insertion is to be minimized.
`
CONCLUSIONS

Comparing the proposed concepts to AEC for an SM per local talker, the
complexity of AEC for an MA is on the same order for car telephony and
desktop teleconferencing, but increases along with array-talker
distance for videoconferencing and auditoria. Many details of the
outlined control methods call for further investigation, and more
sophisticated approaches could be applied to key problems like
beamforming training and voting. For spotting the most critical issues,
however, real-life experiments using simple but complete
implementations must be evaluated first.
`
ACKNOWLEDGMENT

The author wishes to thank Yannick Mahieux of CNET, Lannion, France,
for inspiring discussions and for contributing helpful measurement
data, and Gary W. Elko of Bell Labs, Murray Hill, NJ, who incited this
work years ago.
`
REFERENCES

M.M. Sondhi and W. Kellermann. Echo cancellation for speech signals.
In S. Furui and M.M. Sondhi, eds., Advances in Speech Signal
Processing. Marcel Dekker, [...].

J.L. Flanagan, J.D. Johnston, R. Zahn, and G.W. Elko. Computer-steered
microphone arrays for sound transduction in large rooms. JASA, [...].

Y. Kaneda and J. Ohga. Adaptive microphone-array system for noise
reduction. IEEE TR-ASSP, [...].

W. Kellermann. A self-steering digital microphone array. Proc. ICASSP,
pp. [...], Toronto, [...].

J.L. Flanagan, D.A. Berkley, G.W. Elko, J.E. West, and M.M. Sondhi.
Autodirective microphone systems. Acustica, [...].

P. Chu. Desktop mic array for teleconferencing. Proc. ICASSP,
pp. [...], Detroit, [...].

C. Marro, Y. Mahieux. Analysis of dereverberation and noise reduction
techniques based on microphone arrays microphone with optimal
filtering. IEEE TR-SAP, submitted.

S. Oh, V. Viswanathan, and P. Papamichalis. Hands-free voice
communication in an automobile with a microphone array. Proc. ICASSP,
pp. I-[...], San Francisco, [...].

S. Nordebo, S. Nordholm, B. Bengtsson, and I. Claesson. Noise reduction
using an adaptive microphone array in a car. In Conf. Rec. of IEEE ASSP
Workshop on Appl. of DSP to Audio and Acoustics, New Paltz, USA, [...].

ITU-T Recommendation G.[...], Acoustic Echo Controllers, March [...].

K. Farrell, R.J. Mammone, and J.L. Flanagan. Beamforming microphone
arrays for speech enhancement. Proc. ICASSP, pp. I-[...],
San Francisco, [...].

W. Kellermann. Some properties of echo path impulse responses of
microphone arrays and consequences for acoustic echo cancellation. In
Conf. Rec. of the [...]th Intern. Workshop on Acoustic Echo Control,
Røros, Norway, [...].

B.D. Van Veen, K.M. Buckley. Beamforming: A versatile approach to
spatial filtering. ASSP Mag., [...].

M.M. Sondhi, D.R. Morgan, and J.L. Hall. Stereophonic echo
cancellation: An overview of the fundamental problem. IEEE Signal
Processing Letters, [...].

N. Koizumi, S. Makino, and H. Oikawa. Acoustic echo canceller with
multiple echo path. JASJ (E), [...].

L.M. v.d. Kerkhof and W.J.W. Kitzen. Tracking of a time-varying
acoustic impulse response by an adaptive filter. IEEE TR-SP, [...].

T. Chou. Frequency-independent beamformer with low response error.
Proc. ICASSP, pp. [...], Detroit, [...].
`