
A LONG LASTING CHALLENGE

Pia Dreiseitel, Eberhard Hänsler, Henning Puder

Signaltheorie
Darmstadt University of Technology, Darmstadt, Germany
{dreiseit, haensler, hpuder}@nesi.tu-darmstadt.de

ABSTRACT
Hands-free operation of telephones, incorporating echo cancellation and noise reduction, has been discussed for over a decade. This paper presents an overview of the wide range of algorithms which are applicable to echo cancellers and noise reduction. Practical problems associated with implementation and overall system control are also discussed.
`
INTRODUCTION

When telecommunications started about a century ago, users had both hands busy. They had to hold a microphone close to their mouth and a loudspeaker close to one ear. It did not take long to get one hand free: microphone and loudspeaker were assembled in a handset. However, the aim of hands-free operation has not yet been attained.
In the early years of telecommunication the lack of efficient electro-acoustic devices and amplifiers justified the inconvenience to the customer. At the same time two problems were solved:

- acoustic echoes transmitted back to the remote user were reduced by providing sufficient attenuation,

- operation in a noisy environment was possible because of an improved signal-to-noise ratio.

For non-experts it is still difficult to understand that it takes all the signal processing capabilities available today to achieve at least some solution of these easily explained problems of hands-free operation. A large number of papers on the topic under consideration have been published within the last few years, including bibliographies and reports on the state of the art. Adaptive algorithms for acoustic echo compensation and noise control have gained special attention.
`
BASICS

At the most general level there are two sources that make the solution of the hands-free problem difficult: first, the physical properties of loudspeaker-enclosure-microphone systems (LEMSs) and speech signals, and secondly, the fulfilment of the regulations of the International Telecommunication Union (ITU). Although the latter may seem arbitrary, it is essential for the equipment to be licensed by telecommunication authorities.
`
Physics

Audio communication systems include at least one loudspeaker and one microphone housed within the same enclosure. Consequently, the microphone picks up not only locally generated signals like speech and environmental noise but also the signal radiated by the loudspeaker, as well as its echoes caused by reflections at the boundaries of the enclosure. Assuming linearity, the audio characteristics of the LEMS may be modelled by an impulse response. The duration of the response depends on the reverberation time of the enclosure.
In the case of an office room this time is in the order of several hundred milliseconds; in the case of a passenger car it is in the order of fifty to one hundred milliseconds. Furthermore, the response of the LEMS is extremely sensitive to any movements within the enclosure. Finally, the system is driven by audio signals, typically a mixture of speech and noise, where speech itself comprises periodic and aperiodic components with highly fluctuating magnitudes and pauses. Briefly, from a signal processing point of view, the system and the signals involved are extremely unpleasant.
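The linear model described above can be illustrated with a minimal sketch. It assumes that the LEMS is approximated by a finite impulse response and that the sampling rate is 8 kHz; neither assumption is stated in the text, and the function names are purely illustrative. The sketch shows how a reverberation time translates into a filter length and how the microphone signal is composed of the echo and the locally generated signal.

```python
def lems_length(reverb_time_s, sample_rate_hz):
    """Number of FIR coefficients needed to span the reverberation time."""
    return int(round(reverb_time_s * sample_rate_hz))


def microphone_signal(loudspeaker, impulse_response, local):
    """Echo (loudspeaker signal convolved with the impulse response)
    plus the locally generated signal (speech and noise)."""
    out = []
    for k in range(len(loudspeaker)):
        echo = 0.0
        for i, g in enumerate(impulse_response):
            if k - i < 0:
                break  # no samples before the start of the signal
            echo += g * loudspeaker[k - i]
        out.append(echo + local[k])
    return out


if __name__ == "__main__":
    # Assumed sampling rate of 8 kHz (hypothetical, not taken from the text)
    print(lems_length(0.1, 8000))   # passenger car, 100 ms
    print(lems_length(0.3, 8000))   # office room, 300 ms
```

Under these assumptions a reverberation time of 100 ms already corresponds to 800 coefficients, which indicates why echo cancellers need adaptive filters of very high order.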
`
Regulations

The ITU-T Recommendations put very stringent conditions on hands-free telephone systems. For ordinary telephones a minimum echo attenuation (specified in dB) is required in the case of single talk. In double-talk situations this value can be reduced by a specified amount. Beyond that, only a negligible delay may be introduced into the signal path by the hands-free facility.
`
`WAVES607_1008-0001
`
`Petitioner Waves Audio Ltd. 607 - Ex. 1008

Figure: Adaptive system (signal labels in the diagram: x(k), g(k), y(k), c(k), e(k); input from and output to the far-end speaker).
`
$$c(k+1) = c(k) + \alpha(k)\,\frac{e(k)\,x(k)}{\|x(k)\|^{2}}$$

where $e(k)$ denotes the adaptation error, $d(k)$ the desired signal, $c(k)$ the coefficient vector of the adaptive filter, $x(k)$ the excitation vector and, finally, $\alpha(k)$ a variable step size factor.
The computational requirements of the NLMS algorithm are low. This is important since the application considered here requires a large number of coefficients. The disadvantage is its slow speed of convergence, especially in the case of correlated inputs.
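The NLMS recursion can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name, the fixed step size and the small constant `eps` that guards against division by zero are assumptions and do not come from the text.

```python
import random


def nlms_identify(x, d, num_taps, step_size=1.0, eps=1e-8):
    """Run the NLMS algorithm; return (coefficient vector, error signal)."""
    c = [0.0] * num_taps
    xvec = [0.0] * num_taps            # excitation vector [x(k), x(k-1), ...]
    errors = []
    for xk, dk in zip(x, d):
        xvec = [xk] + xvec[:-1]
        # adaptation error: e(k) = d(k) - c^T(k) x(k)
        e = dk - sum(ci * xi for ci, xi in zip(c, xvec))
        # squared norm of the excitation vector (eps is an added safeguard)
        norm = sum(xi * xi for xi in xvec) + eps
        gain = step_size * e / norm
        # coefficient update: c(k+1) = c(k) + alpha e(k) x(k) / ||x(k)||^2
        c = [ci + gain * xi for ci, xi in zip(c, xvec)]
        errors.append(e)
    return c, errors


if __name__ == "__main__":
    rng = random.Random(1)
    g = [0.8, -0.4, 0.2, 0.1]          # hypothetical, very short "LEMS"
    x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    d = [sum(g[i] * x[k - i] for i in range(len(g)) if k - i >= 0)
         for k in range(len(x))]
    c, _ = nlms_identify(x, d, num_taps=len(g))
    print([round(ci, 3) for ci in c])
```

Each iteration needs only two inner products over the coefficient vector, which is why the cost stays linear in the number of coefficients.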
`
NLMS algorithm with prewhitening filters

A simple approach to overcome this problem is prewhitening of both the input and the reference signal. This can be achieved by a linear prediction error filter. If one restricts oneself to a time-invariant filter, a filter of order one to four has proved to be sufficient. Prewhitening filters of higher order have to be adaptive. They can be designed using the Levinson-Durbin algorithm.
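A prewhitening filter of this kind can be sketched as follows. The example is a textbook form of the Levinson-Durbin recursion plus the resulting prediction error filter; the function names and the convention that the predictor is written as x_hat(k) = sum_i a[i-1] x(k-i) are assumptions made for this illustration. It takes autocorrelation values as given and does not cover how they are estimated or adapted over time.

```python
def levinson_durbin(r, order):
    """Solve the normal equations of linear prediction.

    r     : autocorrelation values r[0], ..., r[order]
    return: (a, err) with predictor coefficients a[0..order-1] and the
            remaining prediction error power err
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err                        # reflection coefficient
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err


def prediction_error_filter(x, a):
    """Whitened signal: x(k) minus its prediction from past samples."""
    out = []
    for k in range(len(x)):
        pred = sum(a[i] * x[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        out.append(x[k] - pred)
    return out


if __name__ == "__main__":
    # Autocorrelation of a first-order process with coefficient 0.9
    a, err = levinson_durbin([1.0, 0.9, 0.81], order=2)
    print(a, err)
```

For a first-order autoregressive process the second coefficient vanishes, which is in line with the observation that low filter orders are often sufficient.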
`
Step size Control

As mentioned before, the NLMS algorithm uses an adaptation factor called step size, which is responsible for both stability and speed of convergence. Controlling the step size becomes especially important in the case of noisy microphone signals, like those in car environments, or in double-talk situations.
It can be shown that an optimal step size exists for adaptation in a noisy environment, which is also suitable for non-stationary excitation signals:

$$\alpha_{\mathrm{opt}}(k) = \frac{E\{\varepsilon^{2}(k)\}}{E\{e^{2}(k)\}}$$

where $E\{\cdot\}$ denotes expectation, $e(k)$ the current error value and $\varepsilon(k)$ the residual echo signal. The application of $\alpha_{\mathrm{opt}}(k)$, however, requires a reliable double-talk detector.
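The behaviour of the optimal step size, i.e. the ratio of the residual echo power to the error power, can be made explicit under one additional assumption that is not stated in the text: the error is the sum of the residual echo $\varepsilon(k)$ and a local signal $n(k)$ (local speech and noise), and the two are uncorrelated. Then

$$E\{e^{2}(k)\} = E\{\varepsilon^{2}(k)\} + E\{n^{2}(k)\}
\quad\Longrightarrow\quad
\alpha_{\mathrm{opt}}(k) = \frac{E\{\varepsilon^{2}(k)\}}{E\{\varepsilon^{2}(k)\} + E\{n^{2}(k)\}}$$

Under this assumption the step size approaches one when no local signal is present and approaches zero when the local signal dominates, as in double talk, so that adaptation is effectively frozen. Since $\varepsilon(k)$ cannot be observed directly, its power has to be estimated in practice.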
`
Systems for Acoustic Echo and Noise Control

A number of systems have been proposed for acoustic echo and noise control. They all use three units or a subset of them. The first unit is a loss control that attenuates the incoming and/or the outgoing signal. Early hands-free communication systems used this unit only, reducing conversations to half duplex. Because of the ITU regulations, loss control still remains the most important function, since it has to guarantee the required attenuation. The second unit consists of an adaptive filter functioning as a replica of the LEMS. If perfect adaptation could be achieved, loudspeaker and microphone would be decoupled entirely without any impact on locally generated signals. The third and most modern ingredient of an echo and noise control system is a Wiener filter within the outgoing signal path. In contrast to the loss control unit, this filter provides a frequency-dependent attenuation of the outgoing signal. Its aim is twofold: to suppress residual echoes not covered by the adaptive filter and to enhance speech quality by suppressing noise components.

Figure: Basic structure for acoustic echo and noise control (loudspeaker-enclosure-microphone system, adaptive filter, loss control, post filter).

Design considerations and results achieved by these three units will be given within the following sections.
`
ADAPTIVE ALGORITHMS

Several adaptation algorithms have been applied to acoustic echo cancellers. Each of these algorithms minimises the mean square error of the signal $e(k)$ (see the figure of the adaptive system). The algorithms discussed in this section are dealt with in order of increasing complexity.

NLMS

The normalised least mean square (NLMS) algorithm is the most easily and frequently implemented algorithm. It is described by the following relations:

$$e(k) = d(k) - c^{T}(k)\,x(k)$$
`
`
`
`
`
`
Affine Projection Algorithm

Looked at closely, the affine projection algorithm (APA) can be considered as an extension of the NLMS algorithm that takes into account the P last excitation vectors:

$$e(k) = d(k) - c^{T}(k)\,x(k)$$

$$\mathbf{e}(k) = [\,e(k),\ e(k-1),\ \ldots,\ e(k-P+1)\,]^{T}$$

$$X(k) = [\,x(k),\ x(k-1),\ \ldots,\ x(k-P+1)\,]$$

$$c(k+1) = c(k) + X(k)\,\bigl(X^{T}(k)\,X(k)\bigr)^{-1}\,\mathbf{e}(k)$$

Usually P is small compared to the total number of filter coefficients. In contrast to the NLMS algorithm, the matrix $X^{T}(k)\,X(k)$ has to be inverted. This can be carried out recursively. A fast version of the APA, called FAP (fast affine projection), has been developed for an efficient implementation. This algorithm is therefore suitable for acoustic echo cancellers. However, numerical instabilities occur because of the recursively calculated correlation matrices. One can overcome these problems by regularising the correlation matrix, i.e. by adding a constant value to the elements of the main diagonal. Furthermore, the algorithm has to be reinitialised whenever divergence is detected.
If an affine projection of second order is applied, the inverse of the matrix can be calculated directly, requiring only a small computational load. Compared to the NLMS algorithm, even a second-order APA increases the speed of convergence remarkably.
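A second-order affine projection can be sketched with the closed-form inverse of a 2x2 matrix. The sketch rests on several assumptions that are not taken from the text: the function name, a step size parameter, a regularisation constant `delta` added to the main diagonal, and the common formulation in which both error values are recomputed with the current coefficient vector.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def apa2_identify(x, d, num_taps, step_size=1.0, delta=1e-6):
    """Second-order affine projection; returns the coefficient vector."""
    c = [0.0] * num_taps
    x_now = [0.0] * num_taps       # x(k)
    x_prev = [0.0] * num_taps      # x(k-1)
    d_prev = 0.0                   # d(k-1)
    for xk, dk in zip(x, d):
        x_prev = x_now
        x_now = [xk] + x_now[:-1]
        # errors for the two most recent excitation vectors
        e0 = dk - _dot(c, x_now)
        e1 = d_prev - _dot(c, x_prev)
        # 2x2 matrix X^T X, regularised on the main diagonal
        r00 = _dot(x_now, x_now) + delta
        r01 = _dot(x_now, x_prev)
        r11 = _dot(x_prev, x_prev) + delta
        det = r00 * r11 - r01 * r01
        # (X^T X)^{-1} e computed directly from the 2x2 inverse
        g0 = (r11 * e0 - r01 * e1) / det
        g1 = (r00 * e1 - r01 * e0) / det
        c = [ci + step_size * (g0 * a + g1 * b)
             for ci, a, b in zip(c, x_now, x_prev)]
        d_prev = dk
    return c
```

Compared with NLMS, the additional cost per sample is a few extra inner products and the solution of a 2x2 system, which matches the remark that the second-order case needs only a small computational load.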
`
RLS and FTF

The recursive least squares (RLS) algorithm is known as a very fast converging recursive algorithm. A straightforward notation of this algorithm is given here:

$$w(k) = R_{xx}^{-1}(k)\,x(k)$$

$$R_{xx}^{-1}(k+1) = \frac{1}{\lambda}\left[R_{xx}^{-1}(k) - \frac{w(k)\,w^{T}(k)}{\lambda + w^{T}(k)\,x(k)}\right]$$

$$e(k) = d(k) - c^{T}(k)\,x(k)$$

$$c(k+1) = c(k) + \frac{e(k)\,w(k)}{\lambda + w^{T}(k)\,x(k)}$$

where $R_{xx}(k)$ denotes an estimate of the autocorrelation matrix of the excitation signal, $\lambda$ an exponential forgetting factor and $w(k)$ the gain vector. The convergence of the RLS algorithm is superior to that of the NLMS algorithm. However, there is the problem of locking when $\lambda$ is chosen close to one. The tracking performance of the RLS algorithm is therefore not as satisfying as the initial convergence.
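For a small number of coefficients the RLS recursion can be written down directly. The following sketch uses the conventional textbook form, in which the gain is formed from the previous inverse matrix and normalised by the forgetting factor plus a quadratic form; the function name, the default forgetting factor and the initial value of the inverse matrix are assumptions made for illustration.

```python
def rls_identify(x, d, num_taps, forgetting=0.99, init=100.0):
    """Conventional RLS; returns the coefficient vector."""
    m = num_taps
    c = [0.0] * m
    # p approximates the inverse autocorrelation matrix (start: init * I)
    p = [[init if i == j else 0.0 for j in range(m)] for i in range(m)]
    xvec = [0.0] * m
    for xk, dk in zip(x, d):
        xvec = [xk] + xvec[:-1]
        # unnormalised gain: inverse matrix times excitation vector
        w = [sum(p[i][j] * xvec[j] for j in range(m)) for i in range(m)]
        denom = forgetting + sum(wi * xi for wi, xi in zip(w, xvec))
        # a priori error and coefficient update
        e = dk - sum(ci * xi for ci, xi in zip(c, xvec))
        c = [ci + e * wi / denom for ci, wi in zip(c, w)]
        # rank-one update of the inverse matrix (p is symmetric)
        p = [[(p[i][j] - w[i] * w[j] / denom) / forgetting for j in range(m)]
             for i in range(m)]
    return c
```

The update of the inverse matrix touches every one of its elements in each step, so the cost of this direct form grows with the square of the number of coefficients.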
`
If the algorithm is implemented with finite precision, it can become unstable because the numerical round-off error increases. A QR-decomposition-based inversion of the autocorrelation matrix does not show this behaviour.
If one has to deal with a large number of coefficients, the direct implementation of the RLS algorithm is not feasible, since its computational complexity is of order $M^{2}$, where M is the number of coefficients. Several approaches for a fast version of the RLS algorithm are known, principally based on prewindowing techniques, which reduce the computational load to order $M$. A fast implementation of the RLS algorithm, called the Fast Transversal Filter (FTF) algorithm, is organised in four steps:

1. Recursive forward linear prediction.

2. Recursive backward linear prediction.

3. Recursive computation of the gain vector.

4. Recursive estimation of the desired response.

Unfortunately, the FTF algorithm is numerically unstable and tends to diverge. In fact, stabilising the RLS algorithms is a topic in its own right. One basic idea is to extend the algorithm by accumulating the round-off errors and to perform corrections when the numerical error becomes significant.
`
Fast Newton

Whereas the APA may be considered as an extended version of the NLMS algorithm, the Fast Newton algorithm can be seen as a simplified version of the fast RLS algorithm. In fast implementations of the RLS algorithm, linear predictions of order M are required, where M is the size of the coefficient vector. When the order of correlation of the excitation signal is small, there is actually no need to calculate the full prediction vector of order M. Reducing the size of the prediction vector to a size P appropriate to the excitation signal leads to the Fast Newton algorithm. The convergence performance is comparable to that of the RLS algorithm, whereas the numerical complexity is of order $M + P$.
`
Fullband, Subband, Block Processing

Until now our discussion of adaptive filters has dealt only with fullband signals, since this is the most straightforward method of implementation. However, straightforward does not necessarily mean most efficient. Both subband and block processing enable implementations resulting in less computational cost.
If a signal is split up into subbands, one can subsample the resulting signals, leading to shorter adaptive echo cancellers. All of the adaptive algorithms mentioned above are suitable for subband implementation. The processing power saved may be used for more complex adaptation. However, subband realisations do have one substantial disadvantage that may prohibit their application: they introduce delay into the system. This delay is caused by the filter banks for analysis (decomposition) and synthesis of the excitation and error signals. These filter banks have to be designed with respect to the special demands of an adaptive echo canceller. The aliasing terms, for example, have to be minimised. There is a substantial body of literature concerned with the design of polyphase filter banks used for echo cancellation.
In block processing the impulse response of the adaptive filter is split up into blocks. Using fast convolution techniques, the calculation of the output signal can be carried out very efficiently. Again there is a trade-off between efficiency of processing and delay. However, block processing offers the advantage of optimising delay versus processing power. Small block sizes keep the delay low but increase the processing power required.

STEREOPHONIC ECHO CANCELLATION

Recently, stereophonic acoustic echo cancellation has become more and more important for applications such as teleconferencing or video games.

Figure: Stereophonic echo cancellation.

As the excitation signals of the two channels are correlated (see the figure), there is no unique solution for identifying the two impulse responses. Furthermore, an extended correlation matrix of the two input signals has to be inverted. In the case of high correlation this causes numerical instabilities due to ill-conditioning, which in turn leads to divergence. However, there are a number of approaches to overcome the correlation of the two excitation signals. One technique applies a nonlinear function to one of the excitation signals. In a second approach the correlation matrix is regularised by introducing leakage into the update of the coefficient vector.

NOISE REDUCTION

With the increasing number of mobile telephones, more and more people use them in cars. This generates a demand for hands-free telephone sets for cars that not only increase the comfort of the user but also allow the driver to keep both hands on the steering wheel.
To enhance the speech signal outgoing to the far-end user, noise reduction methods are desirable.
We describe one-channel methods for two reasons: first, the cost of installing a second channel may be prohibitive, and secondly, single-channel procedures can also be extended to multichannel methods.

Basic architecture

Most noise reduction procedures are based on the Wiener solution

$$G_{\mathrm{opt}}(kB,n) = \begin{cases} 1 - \beta \left(\dfrac{N_{\mathrm{PSD}}(kB,n)}{X_{\mathrm{PSD}}(kB,n)}\right)^{p} & \text{if this value exceeds } \beta_f \\[6pt] \beta_f & \text{otherwise} \end{cases}$$

where $N_{\mathrm{PSD}}(kB,n)$ and $X_{\mathrm{PSD}}(kB,n)$ denote the PSD of the noise and of the distorted input signal, respectively, and B is equal to the block size. The frequency index is given by n. Compared to the well-known Wiener filter, an overestimation factor $\beta$, a variable power $p$ and a spectral floor $\beta_f$ are introduced.
Unfortunately, there is a conflict between the amount of noise reduction and the quality of the resulting speech signal. The parameters suggested above have to be chosen such that a subjective optimum is achieved.
To preserve natural-sounding speech, the spectral floor is introduced, which in turn limits the SNR improvement to $-20 \log_{10} \beta_f$ dB. The imprecision associated with the estimation of the time-varying PSDs causes an unpleasant tonal noise. These so-called musical tones can be attenuated by tailoring the transfer function adequately with the additional parameters.
Modifications of the filter are given by the MMSE-STSA estimator (Minimum Mean Square Error Short-Time Spectral Amplitude) and its derivative, the MMSE-LSA estimator (Minimum Mean Square Error Logarithmic Spectral Amplitude). To derive the algorithms, the time-varying property of the distorted input signal has been taken into account. For these algorithms an a priori and an a posteriori signal-to-noise ratio (SNR) are estimated:

$$\mathrm{SNR}_{\mathrm{post}}(kB,n) = \frac{|X(kB,n)|^{2}}{N_{\mathrm{PSD}}(kB,n)}$$

$$\mathrm{SNR}_{\mathrm{prio}}(kB,n) = (1-\gamma)\,\max\{\mathrm{SNR}_{\mathrm{post}}(kB,n) - 1,\ 0\} + \gamma\,\frac{|G_{\mathrm{opt}}((k-1)B,n)\,X((k-1)B,n)|^{2}}{N_{\mathrm{PSD}}(kB,n)}$$

Here $X(kB,n)$ describes the STFT of the input signal $x(k)$ at block k, and $\gamma$ is a smoothing constant between zero and one. The weighting rules for the algorithms are given by

MMSE-STSA:

$$G_{\mathrm{opt}}(kB,n) = \frac{\sqrt{\pi}}{2}\,\sqrt{\frac{1}{\mathrm{SNR}_{\mathrm{post}}(kB,n)}\cdot\frac{\mathrm{SNR}_{\mathrm{prio}}(kB,n)}{1+\mathrm{SNR}_{\mathrm{prio}}(kB,n)}}\;\, M\!\left(\mathrm{SNR}_{\mathrm{post}}(kB,n)\,\frac{\mathrm{SNR}_{\mathrm{prio}}(kB,n)}{1+\mathrm{SNR}_{\mathrm{prio}}(kB,n)}\right)$$

with $M(u) = \exp\!\left(-\frac{u}{2}\right)\left[(1+u)\,I_0\!\left(\frac{u}{2}\right) + u\,I_1\!\left(\frac{u}{2}\right)\right]$ and $I_0$, $I_1$ the modified Bessel functions of order zero and one.

MMSE-LSA:

$$G_{\mathrm{opt}}(kB,n) = \frac{\mathrm{SNR}_{\mathrm{prio}}(kB,n)}{1+\mathrm{SNR}_{\mathrm{prio}}(kB,n)}\; M\!\left(\frac{\mathrm{SNR}_{\mathrm{prio}}(kB,n)}{1+\mathrm{SNR}_{\mathrm{prio}}(kB,n)}\,\mathrm{SNR}_{\mathrm{post}}(kB,n)\right)$$

where, for this estimator, $M(u) = \exp\!\left(\frac{1}{2}\int_{u}^{\infty}\frac{e^{-t}}{t}\,dt\right)$.

Frequency Decomposition

As shown above, the noise reduction filter is defined in the frequency domain. Therefore a frequency analysis of the non-stationary input signal is required. One method of achieving this is to use the STFT (Short-Time Fourier Transform), which requires the multiplication of the input signal by a time window $\omega(m)$:

$$X(kB,n) = \sum_{m=0}^{N-1} x(m)\,\omega(m-kB)\,e^{-j\frac{2\pi}{N}nm}$$

Subband decomposition provides a second class of methods. The sample values of the subband signals can produce a set of spectral coefficients for the noise reduction algorithm (see the filter bank figure).
After noise reduction the subband signals are upsampled, passed through anti-aliasing filters and synthesised to obtain the enhanced output signal. The filter banks shown in the figure split the input signal into uniformly spaced frequency bands, comparable to the STFT. Modifications include non-uniformly spaced frequency resolutions, offering the possibility of modelling the human perception system (ear and brain). Alternatively, a non-uniformly distributed resolution can be obtained by cascaded filter banks, including the special case of the discrete wavelet transform.
With these cascaded structures, different time resolutions are also obtained, as subsampling is performed after each filter stage. Fast-varying high-frequency components can be treated with a higher resolution in time, whereas low-frequency components show a more detailed frequency resolution.

Figure: Filterbank (analysis filters, noise reduction processing, synthesis filters).

Figure: Cascaded filterbank.

Figure: Wavelet filterbank (cascaded high-pass (HP) and low-pass (TP) filters with subsampling by 2 around the noise reduction processing).

Estimation of the Power Spectral Densities

The time-frequency analysed input signal can be used to estimate $X_{\mathrm{PSD}}(kB,n)$ and $N_{\mathrm{PSD}}(kB,n)$.
To determine $X_{\mathrm{PSD}}(kB,n)$, a recursively smoothed periodogram is sufficient. However, only slight smoothing is tolerable, in order to avoid echo and reverberation effects in the enhanced signal.
The estimation of $N_{\mathrm{PSD}}(kB,n)$ has to be based on $X(kB,n)$ as well. To distinguish between noise components and speech components, either voice activity detection or the so-called minimum statistics are necessary. In the latter, the minima of the smoothed input spectrum are observed for each frequency bin over a time window. The length of this time window is chosen according to the duration of speech components.
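The idea of tracking minima can be sketched as follows. This is a strongly simplified illustration: the function name, the first-order smoothing constant and the window length are assumptions, and the bias compensation used in published minimum-statistics estimators is omitted.

```python
def track_noise_floor(periodograms, smoothing=0.9, window=50):
    """Estimate the noise PSD per frequency bin from a list of frames.

    periodograms: list of frames, each a list of squared magnitudes per bin
    returns     : list of frames with the minimum of the smoothed spectrum
                  over the last `window` frames
    """
    num_bins = len(periodograms[0])
    smoothed = list(periodograms[0])
    history = []                       # most recent smoothed spectra
    estimates = []
    for frame in periodograms:
        # recursive (first-order) smoothing of the periodogram
        smoothed = [smoothing * s + (1.0 - smoothing) * p
                    for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        if len(history) > window:
            history.pop(0)
        # minimum over the time window, separately for each bin
        estimates.append([min(h[n] for h in history) for n in range(num_bins)])
    return estimates
```

As long as the window is longer than a speech burst, the minimum is taken from frames that contain noise only, so the estimate is not raised by the speech.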
`
Psycho Acoustics

Recent studies use psycho-acoustics to improve noise reduction algorithms. Two approaches are followed: the first is to adapt the signal analysis to the human ear, the second is to use the so-called masking thresholds.
It is known that the human ear performs a non-uniform frequency analysis on a logarithmic scale (Bark scale). The methods presented in the section on frequency decomposition allow a frequency analysis adapted to human perception.
Masking means that weak tones are covered by stronger neighbouring tones in time or frequency. Noise reduction filter design takes advantage of these properties.
`
Combinations

When acoustic echo control is applied in a noisy environment, such as in cars, a combination of noise reduction and echo control is desirable. As far as the order is concerned, echo control should precede noise reduction, so that parts of the echo not compensated by the adaptive filter may be considered as additional noise.
The stationarity assumption for the background noise does not hold for the residual echo. Therefore different estimation methods have to be applied. The power spectral density (PSD) of the residual echo, $E_{\mathrm{PSD}}(kB,n)$, is estimated by a power transfer factor $\kappa(kB)$,

$$E_{\mathrm{PSD}}(kB,n) = \kappa(kB)\,F_{\mathrm{PSD}}(kB,n),$$

or by a transfer function $\kappa(kB,n)$,

$$E_{\mathrm{PSD}}(kB,n) = \kappa(kB,n)\,F_{\mathrm{PSD}}(kB,n),$$

specifying the ratio between the far-end signal and the residual echo, where $F_{\mathrm{PSD}}(kB,n)$ denotes the PSD of the far-end signal.
The distortion signal is given by the sum of the estimated noise and the residual echo:

$$\tilde{N}_{\mathrm{PSD}}(kB,n) = N_{\mathrm{PSD}}(kB,n) + E_{\mathrm{PSD}}(kB,n)$$

Consequently, $\tilde{N}_{\mathrm{PSD}}(kB,n)$ replaces $N_{\mathrm{PSD}}(kB,n)$ in the noise reduction methods presented above.
Separating the two problems of cancelling the residual echo and suppressing noise by applying two filters in series offers additional degrees of freedom.
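The combined suppression of noise and residual echo can be sketched for a single block as follows. The sketch assumes a scalar power transfer factor and a Wiener-type weighting with an overestimation factor, a variable power and a spectral floor as introduced for the basic architecture; all function names, parameter names and default values are hypothetical.

```python
def residual_echo_psd(far_end_psd, transfer_factor):
    """Residual echo PSD per bin: power transfer factor times far-end PSD."""
    return [transfer_factor * f for f in far_end_psd]


def suppression_gain(x_psd, noise_psd, echo_psd,
                     overestimation=1.0, power=1.0, floor=0.1):
    """Gain per frequency bin for joint noise and residual echo suppression."""
    gains = []
    for x, n, e in zip(x_psd, noise_psd, echo_psd):
        distortion = n + e             # noise PSD plus residual echo PSD
        if x <= 0.0:
            gains.append(floor)        # no input power: fall back to the floor
            continue
        g = 1.0 - overestimation * (distortion / x) ** power
        gains.append(max(g, floor))    # spectral floor limits the attenuation
    return gains


if __name__ == "__main__":
    echo = residual_echo_psd([4.0, 0.0], transfer_factor=0.25)
    print(suppression_gain([10.0, 2.0], [1.0, 1.0], echo))
```

With a floor of 0.1 the attenuation in any bin is limited to 20 dB, which illustrates the trade-off between suppression and natural-sounding speech.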
`
Multi-Microphone Solutions

Microphone arrays offer further improvements in noise reduction.
A simple approach consists of delay-and-sum beamforming, where a control system adapts the direction of maximum sensitivity towards the near-end speaker.
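Delay-and-sum beamforming can be sketched in a few lines. The example assumes that the steering delays are already known and are whole numbers of samples; the control system that adapts them towards the near-end speaker is not modelled, and the function name is illustrative.

```python
def delay_and_sum(channels, delays):
    """Average the microphone channels after delaying channel m by
    delays[m] samples (non-negative integers)."""
    length = len(channels[0])
    out = [0.0] * length
    for signal, delay in zip(channels, delays):
        for k in range(delay, length):
            out[k] += signal[k - delay]
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    # The source reaches the second microphone one sample earlier,
    # so that channel is delayed by one sample before summation.
    mic1 = [0.0, 1.0, 2.0, 3.0, 0.0]
    mic2 = [1.0, 2.0, 3.0, 0.0, 0.0]
    print(delay_and_sum([mic1, mic2], [0, 1]))
```

Signals arriving from the steered direction add coherently, whereas uncorrelated noise is averaged, which is the source of the noise reduction.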
`
Assuming that the different microphone signals consist of correlated speech and uncorrelated noise, one obtains an improved estimate of the noise power spectral density. The following formulas illustrate the procedure:

$$X_1(kB,n) = S(kB,n) + N_1(kB,n)$$

$$X_2(kB,n) = S(kB,n) + N_2(kB,n)$$

$$C(n) = \frac{E\{X_1(kB,n)\,X_2^{*}(kB,n)\}}{\sqrt{E\{|X_1(kB,n)|^{2}\}\;E\{|X_2(kB,n)|^{2}\}}} = \frac{\Phi_{ss}(n)}{\sqrt{\bigl(\Phi_{ss}(n)+\Phi_{n_1 n_1}(n)\bigr)\bigl(\Phi_{ss}(n)+\Phi_{n_2 n_2}(n)\bigr)}}$$

with $S(kB,n)$ being the STFT of the speech signal $s(k)$, $N_1(kB,n)$ and $N_2(kB,n)$ the STFTs of the uncorrelated noise signals, and $\Phi_{ss}(n)$, $\Phi_{n_1 n_1}(n)$ and $\Phi_{n_2 n_2}(n)$ the corresponding power spectra.
The assumption that the noise signals are uncorrelated is more valid in higher frequency bands and for microphones located further apart. The correlation coefficients $C(n)$ may also influence the transfer function $G_{\mathrm{opt}}(n)$ of the noise reduction filter.
`
LOSS CONTROL

Loss control is required to guarantee a prescribed echo suppression level, e.g. that required by the ITU-T. The total attenuation introduced by the loss control is distributed between the loudspeaker and the microphone paths in such a way that the communication is disturbed as little as possible.
In combination with acoustic echo cancellation, loss control has to insert only the difference between the attenuation reached by the acoustic echo canceller and the level required by the ITU-T. This requires that means are provided to estimate the performance of the echo canceller.
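The arithmetic behind this division of labour can be sketched as follows. The required level and the attenuation reached by the echo canceller are passed in as plain numbers in dB; the values in the usage example, the function names and the simple proportional split between the two paths are hypothetical.

```python
def loss_control_attenuation(required_db, canceller_db):
    """Attenuation in dB that the loss control still has to insert."""
    return max(0.0, required_db - canceller_db)


def split_attenuation(total_db, receive_share):
    """Distribute the total attenuation between the loudspeaker (receive)
    path and the microphone (send) path; receive_share lies in [0, 1]."""
    share = min(1.0, max(0.0, receive_share))
    return total_db * share, total_db * (1.0 - share)


if __name__ == "__main__":
    # Hypothetical numbers: 40 dB required, 25 dB reached by the canceller
    remaining = loss_control_attenuation(40.0, 25.0)
    print(remaining, split_attenuation(remaining, 0.5))
```

The better the estimate of the attenuation reached by the echo canceller, the less loss has to be inserted and the closer the connection comes to full duplex.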
`
SYSTEM CONTROL

So far we have not discussed the importance of controlling hands-free telephone systems. In a realistic scenario the ERLE (echo return loss enhancement) achieved by the adaptive filter is limited, and it may be lower still if the processing power does not allow a large number of coefficients. A loss control is therefore required. Since the LEM system is time-variant, the adaptation has to be performed whenever possible in order to track system changes. Situations may occur, however (e.g. double talk, low SNR), where only small step sizes for the adaptation are permissible. An exact observation of the total system is therefore important for the overall performance.
`
`
`
`
IMPLEMENTATIONS

At this time, low-cost real-time processing means that the algorithms have to be implemented on fixed-point signal processors with only a limited word length. The step from mathematical notation to implementation is therefore not trivial. Restrictions on computational cost still limit the performance of the system. However, this problem may be resolved in the future.
`
QUALITY TESTING

Although international test standards are desirable, none has yet been defined. Besides transmission quality, in the case of echo cancellation the system should be judged with regard to conversation ability. For noise reduction the naturalness of the outgoing speech signal is most important.
As an example of objective testing of echo cancellers, the composite source signal should be mentioned. It consists of three sections with different signal characteristics (tonal, random noise, silence), which enables both signal detection and adaptation. For testing the double-talk ability, composite source signals with increasing and decreasing envelopes, respectively, are superimposed. However, it is not sufficient to judge echo cancelling and noise reduction merely on objective measures, since these cannot be brought into line with human perception. Subjective judgments by system users are therefore still necessary. These tests are based on the mean opinion score (MOS), which marks the signal on a scale of 1 (very bad) to 5 (very good).
`
OUTLOOK

Voice communication systems with hands-free facilities are on the market. Double-talk capability and noise reduction are provided, at least to a certain extent. Further improvements, however, are necessary. These may result from a better understanding of the problem and from more powerful yet affordable hardware. As far as the understanding of the problem is concerned, an improved joint control of the various algorithms comprising an echo and noise control procedure is most promising. Therefore, even after more than one decade of intensive research and development, the challenge still remains.
`
References

[1] The new Bell telephone. Scientific American.

[2] E. Hänsler: The hands-free telephone problem: An annotated bibliography update. Annales des Télécommunications.

[3] E. Hänsler: The hands-free telephone problem: A second annotated bibliography update. Proc. of the Internat. Workshop on Acoustic Echo and Noise Control, Røros, Norway.

[4] A. Gilloire, P. Scalart et al.: Innovative speech processing for mobile terminals: An annotated bibliography. COST workshop, Toulouse, France, July.

[5] A. Gilloire: State of the art in acoustic echo cancellation. In: A.R. Figueiras and D. Docampo (Eds.), Adaptive Algorithms: Applications and Non Classical Schemes. Universidad de Vigo.

[6] A. Gilloire, E. Moulines, D. Slock and P. Duhamel: State of the art in acoustic echo cancellation. In: A.R. Figueiras-Vidal (Ed.), Digital Signal Processing in Telecommunications. Springer, London, U.K.

[7] T. Pétillon, A. Gilloire and S. Théodoridis: A comparative study of efficient transversal algorithms for acoustic echo cancellation. Proc. EUSIPCO, Brussels, Belgium.

[8] R. Martin: Algorithms for hands-free voice communication in noisy environments. Proc. Aachener Kolloquium Signaltheorie, Institut für Technische Elektronik der RWTH Aachen.

[9] International Telecommunication Union: General characteristics of international telephone connections and international telephone circuits: acoustic echo controllers. ITU-T Recommendation (G series).

[10] S. Gay and S. Tavathia: The fast affine projection algorithm. Proc. ICASSP.

[11] S. Haykin: Adaptive Filter Theory, 2nd ed. Prentice-Hall International, New Jersey.

[12] J. Cioffi and T. Kailath: Fast RLS Transversal Filter for adaptive filtering. IEEE Trans. on ASSP.

[13] G.V. Moustakides and S. Theodoridis: Fast Newton Transversal Filters: A New Class of Adaptive Estimation Algorithms. IEEE Trans. on Signal Processing.