NOISE ESTIMATION TECHNIQUES FOR ROBUST SPEECH RECOGNITION

H. G. Hirsch*), C. Ehrlicher

Institute of Communication Systems and Data Processing, Aachen University of Technology, 52056 Aachen, Germany

*) This author is now with Ascom Business Systems (Solothurn, Switzerland). Email: hirsch@ens.ascom.ch

ABSTRACT

Two new techniques are presented to estimate the noise spectra, or more generally the noise characteristics, of noisy speech signals. No explicit speech pause detection is required. Past noisy segments of only about 400 ms duration are needed for the estimation, so the algorithms are able to adapt quickly to slowly varying noise levels or slowly changing noise spectra. These techniques can be combined with a nonlinear spectral subtraction scheme. Their ability to enhance noisy speech and to improve the performance of speech recognition systems is shown. Another application is the realization of a robust voice activity detection.

1. INTRODUCTION

Many proposals are known to improve speech recognition in situations with a noisy background, e.g. [1], [2], [3], [4]. In particular, the modified statistics of the spectral parameters should be considered when HMMs are used [5]. A reliable algorithm for detecting speech pauses is presumed in order to determine these modified statistics.

This contribution presents two methods to estimate the spectral parameters of noise without an explicit speech pause detection. The first algorithm calculates the noise level in each subband as a weighted average of past spectral magnitude values that lie below an adaptive threshold. The second approach evaluates the histograms of past spectral magnitude values in each subband; the maximum of the histogram is taken as the estimate of the noise level.

2. ESTIMATION OF NOISE SPECTRUM

Most noise reduction techniques based on single-channel recordings need an estimate of the noise spectrum. This is usually obtained by detecting speech pauses and evaluating the segments of pure noise. In practical situations this is a difficult task, especially if the background noise is not stationary or the signal-to-noise ratio (SNR) is low. Some approaches avoid the problem of speech pause detection and estimate the noise characteristics directly from a past segment of noisy speech [3], [6], [7]. The disadvantage of most of these approaches is that they need relatively long past segments of noisy speech.

The first method presented here calculates a weighted sum of past spectral magnitude values $X_i$ in each subband $i$. The weighting is done by a simple first-order recursive system

$$N_i(k) = (1-\alpha)\,X_i(k) + \alpha\,N_i(k-1) \qquad (1)$$

where $X_i(k)$ denotes the spectral magnitude at time $k$ in subband $i$ and $N_i(k)$ is the estimate of the noise magnitude. Some algorithms directly use an average of past spectral power values as the estimate of the noise power in the individual subband to realize a so-called continuous spectral subtraction (CSS) [1]. In contrast to these approaches, an adaptive threshold is introduced here. In segments of pure noise the magnitude values $X_i$ follow a Rayleigh distribution, while considerably higher values occur at the onset of speech. Thus a threshold $\beta \cdot N_i(k-1)$ is introduced, where $\beta$ takes a value in the range of about 1.5 to 2.5. When the current spectral component $X_i(k)$ exceeds this threshold, this is taken as a rough detection of speech, and the recursive accumulation is stopped. The accumulated value is then taken as the estimate of the noise level at this time. This simple processing is illustrated in figure 1 as part of a complete noise reduction scheme.

The noise estimate $N_i$ is calculated with the first-order recursive system and multiplied by an overestimation factor $\beta$ in the usual range of about 1.5 to 2.5. For positive values of $(X_i - \beta N_i)$, both the data input and the recursive accumulation are stopped, since this indicates an onset of speech. Negative values of $(X_i - \beta N_i)$ are set to zero to obtain an estimate $S_i$ of the clean speech. The computational complexity is low.
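
The per-subband rule described above (Eq. (1), the threshold $\beta \cdot N_i(k-1)$ and the zero-clamped over-subtraction of Fig. 1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fig1_subband_filter` is hypothetical, the smoothing constant α = 0.95 is not given in the text, the start value is the mean of the first 10 frames (assuming the signal begins with noise), and the same $N_i(k-1)$ is used both for the threshold test and for the subtraction.

```python
import numpy as np


def fig1_subband_filter(x, alpha=0.95, beta=2.0, n_init=10):
    """Threshold-gated recursive noise estimate plus over-subtraction, ONE subband.

    x      : 1-D array of spectral magnitudes X_i(k), one value per frame.
    alpha  : smoothing constant of Eq. (1) (assumed value, not from the text).
    beta   : threshold / overestimation factor, typically 1.5 to 2.5.
    n_init : frames averaged for the start value (assumes a noise-only start).
    Returns (noise, clean, speech), each with one entry per frame.
    """
    x = np.asarray(x, dtype=float)
    n_prev = x[:n_init].mean()
    noise = np.empty_like(x)
    clean = np.empty_like(x)
    speech = np.zeros(x.shape, dtype=bool)
    for k, xk in enumerate(x):
        diff = xk - beta * n_prev            # X_i(k) - beta * N_i(k-1)
        if diff > 0:
            speech[k] = True                 # rough speech onset: freeze Eq. (1)
            clean[k] = diff                  # over-subtracted magnitude S_i(k)
        else:
            n_prev = (1.0 - alpha) * xk + alpha * n_prev   # Eq. (1) update
            clean[k] = 0.0                   # negative differences are set to zero
        noise[k] = n_prev
    return noise, clean, speech


rng = np.random.default_rng(0)
x = rng.rayleigh(scale=1.0, size=400)    # toy Rayleigh noise magnitudes
x[200:250] += 10.0                       # hypothetical loud speech segment
noise, clean, speech = fig1_subband_filter(x)
print(round(noise[-1], 2), speech[200:250].all(), round(speech[:200].mean(), 3))
```

Because the estimate is only updated for frames below $\beta \cdot N_i(k-1)$, it follows gradual level changes; an abrupt jump of the noise level by much more than β leaves few frames below the threshold and slows the adaptation considerably.
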

153
0-7803-2431-5/95 $4.00 © 1995 IEEE
WAVES345_1007-0001
Petitioner Waves Audio Ltd. 345 - Ex. 1007

[Figure 1: block diagram; legible labels include "control", "yes", "stop data input" and "output"]
Fig. 1. Simple noise reduction scheme in one subband

The second approach is based on histograms of past spectral values in each subband. The threshold mentioned above is used to select the past values that enter the histograms: only values below this threshold are evaluated. This can be interpreted as a rough separation of the distributions of noise (Rayleigh distribution) and speech, since speech takes much higher values. Past values corresponding to noise segments of about 400 ms duration are evaluated to determine the distribution in about 40 bins. The noise level in each subband is estimated as the position of the maximum of this distribution. The estimated values for the noise magnitude are smoothed over time to eliminate rarely occurring spikes. This leads to a very accurate estimation of the noise spectrum.

An objective evaluation of the accuracy is illustrated in figure 2. Different stationary noise signals [8] were artificially added to clean speech at different SNRs. The average noise spectrum $\bar{N}$ is calculated from the noise itself, and an average estimated noise spectrum $\hat{N}$ is obtained with each of the two techniques. The relative error

$$E_{\mathrm{rel}} = \frac{\sum_i \left(\hat{N}_i - \bar{N}_i\right)^2}{\sum_i \bar{N}_i^{\,2}} \qquad (2)$$

is calculated as an objective measure of the accuracy. Figure 2 shows the average relative error when a car noise [8] is added to different utterances of 3 male and 3 female speakers. The average spectral components are calculated as the sum over all frames of an FFT-based spectral analysis.

[Figure 2: relative error / % (0 to 4) versus SNR / dB (−5 to 20); one curve is labeled "histogram technique"]
Fig. 2. Relative error for the estimation of the noise power spectrum with both techniques at different SNRs

A good estimation can be achieved with both techniques. As expected, the evaluation of histograms leads to better results. The increase of the error at higher SNR is caused by an inaccurate noise estimation while segments of speech are processed: at a high SNR, even these small errors affect the calculation of the relative error much more.

Both techniques are applied to a noise reduction scheme using nonlinear spectral subtraction [2]. Informal listening tests confirmed an effective noise suppression. Negative effects such as musical tones can also be reduced by optimizing parameters, e.g. the overestimation factor.

3. RECOGNITION OF NOISEX DATA

A first series of recognition experiments was carried out using the isolated words of the Noisex92 study [8]. This is a first attempt at a common database for obtaining comparable results on the recognition of noisy utterances of the ten digits at different SNRs. Different noises are artificially added to the speech. The digits were spoken 100 times, separately for training and testing. Recordings exist for a male and a female speaker at a sampling rate of 16 kHz. Both estimation techniques mentioned above are applied within the nonlinear spectral subtraction as a preprocessing step for recognition.
An HMM recognizer [9] is used for the experiments, configured as a connected word recognizer. A single-mixture continuous HMM with 8 emitting states is trained for each word. Pauses are represented by a single-state HMM. All training is done with the clean data only. A set of 15 MFCCs (mel-frequency cepstral coefficients) is calculated as acoustic parameters for the recognition. Some results are shown in figure 3, averaged over 5 different noises and over the two speakers. The male and the female utterances were recognized separately.

[Figure 3: recognition rate / % (10 to 100) versus SNR / dB (−6 to 18); legible curve labels include "histogram technique" and "... preprocessing"]
Fig. 3. Average recognition results for a speaker dependent recognition of the Noisex data

Considerable improvements can be achieved by applying the noise estimation techniques. In addition, a detection of speech pauses is implemented to obtain these results. This is necessary because no individual HMM is calculated for the pauses at each noise condition. The detection is based on the evaluation of the SNRs in all subbands. A relative measure $NX_i^{\mathrm{rel}}$ of the ratio $NX_i = N_i / X_i$ (noise to noise-plus-signal) is calculated for each subband:

$$NX_i^{\mathrm{rel}}(k) = \frac{NX_i(k) - NX_i^{\min}(k)}{NX_i^{\max}(k) - NX_i^{\min}(k)} \qquad (3)$$

where smoothed versions of $N_i$ and $X_i$ are used. $NX_i^{\min}$ and $NX_i^{\max}$ are determined from past segments of about 600 ms. The value $NX^{\mathrm{rel}}$ is already calculated to realize the nonlinear spectral subtraction. A low value of $NX^{\mathrm{rel}}$ indicates speech. Speech pauses are detected by counting the number of subbands in which $NX^{\mathrm{rel}}$ is less than a certain threshold, e.g. 0.4 in our realization. Using an FFT filter bank with 128 subbands, frames are classified as pauses if the number of "active" bands is less than 4. A robust voice activity detection can be achieved with this technique.

During segments of pauses the spectral subtraction is applied with an overestimation factor of 3. An interesting result is observed when the overestimation factor for segments of speech is decreased from the usual values in the range of 2.5 to a value of just 1: the best results are obtained for an overestimation factor in the range of 1. Using a factor of 1 reduces the noise reduction scheme to a simple subtraction in subbands. This effect can be explained by the training of the HMMs. The mean and the variance of the acoustic parameters are estimated for each state from the clean data. The modified (increased) variance of the spectral parameters is not considered in this contribution, so mainly the mean values are evaluated for the recognition. Subtracting more than the noise level lowers the spectral parameters in the individual subband on average, so that they fall below the averages estimated for the corresponding states of the HMM.

4. SPEAKER INDEPENDENT RECOGNITION

A second database is considered for another series of experiments. 13 words (the digits including "zero" and "oh", as well as "yes" and "no") were recorded from 200 speakers via telephone lines. This time an HMM recognizer is configured as an isolated word recognizer, but including a model for the pauses. A continuous HMM is trained with 8 emitting states and 4 mixtures per state. Pauses are represented by a single state with 4 mixtures. The acoustic parameters are 5 PLP cepstral coefficients [10]. For each condition the recognition rate is calculated as the average of 4 recognition experiments, each using 50 different speakers out of the 200 for training and the remaining 150 for testing. Car noise was artificially added at SNRs in the range from 5 to 20 dB.

Some recognition results are illustrated in figure 4. The experiments applying the simple noise reduction scheme shown in figure 1 were compared with experiments using PLP [10] or Rasta-PLP analysis [11]. Rasta-PLP is an effective technique to reduce the influence of different frequency responses during recording or transmission. It introduces a high-pass filtering of the logarithmic spectral envelopes in each subband, so it can be interpreted as a spectral subtraction in the logarithmic domain. The impulse response of its high-pass filter is similar to the response of the filter scheme presented in figure 1.
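
The similarity stated above can be checked on a linearized version of the Fig. 1 scheme. As simplifying assumptions, the threshold gate and the clamping to zero are ignored, $\beta = 1$, and the previous estimate is subtracted, i.e. $S_i(k) = X_i(k) - N_i(k-1)$ with $N_i$ from Eq. (1). In the z-domain:

$$N_i(z) = \frac{1-\alpha}{1-\alpha z^{-1}}\,X_i(z), \qquad H(z) = \frac{S_i(z)}{X_i(z)} = 1 - \frac{(1-\alpha)\,z^{-1}}{1-\alpha z^{-1}} = \frac{1-z^{-1}}{1-\alpha z^{-1}}$$

$$h(0) = 1, \qquad h(n) = -(1-\alpha)\,\alpha^{\,n-1} \quad (n \ge 1), \qquad H(1) = \sum_{n \ge 0} h(n) = 0$$

The zero at $z = 1$ removes the constant component of each subband magnitude track, so the linearized scheme is a first-order high-pass filter, in line with the comparison to the Rasta-PLP high-pass. The difference named in the text remains: the Fig. 1 scheme operates on spectral magnitudes, whereas Rasta-PLP filters logarithmic spectral envelopes.
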
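
Returning to the pause detector of Section 3, Eq. (3) together with the band-counting rule can be sketched as follows. The threshold 0.4, the 128 subbands and the limit of 4 active bands come from the text; the function name, the 60-frame history (about 600 ms at an assumed 10 ms frame shift), the treatment of bands whose ratio did not vary within the window, and the toy input are assumptions. As in the text, the inputs are expected to be smoothed already.

```python
import numpy as np


def detect_pauses(N, X, win=60, rel_thresh=0.4, min_active=4):
    """Classify frames as speech pauses with the subband rule around Eq. (3).

    N, X       : arrays (frames, subbands) of smoothed noise estimates and
                 smoothed noisy magnitudes.
    win        : history length in frames (assumed 60, ~600 ms at 10 ms shift).
    rel_thresh : a band is "active" if NX_rel < rel_thresh (0.4 in the text).
    min_active : a frame is a pause if fewer bands than this are active.
    """
    N = np.asarray(N, dtype=float)
    X = np.asarray(X, dtype=float)
    nx = N / np.maximum(X, 1e-12)            # noise to noise-plus-signal ratio
    pause = np.zeros(nx.shape[0], dtype=bool)
    for k in range(nx.shape[0]):
        past = nx[max(0, k - win + 1):k + 1]  # history including frame k
        lo = past.min(axis=0)
        spread = past.max(axis=0) - lo
        safe = np.where(spread > 0, spread, 1.0)
        # Eq. (3); a band with no variation in the window is treated as
        # inactive (assumption: NX_rel = 1, i.e. no evidence of speech).
        rel = np.where(spread > 0, (nx[k] - lo) / safe, 1.0)
        active = np.count_nonzero(rel < rel_thresh)
        pause[k] = active < min_active
    return pause


# Toy input (hypothetical): constant noise in 128 bands, speech in 30 bands.
frames, bands = 200, 128
N = np.ones((frames, bands))
X = np.ones((frames, bands))
X[100:140, 10:40] += 8.0
pause = detect_pauses(N, X)
print(pause[:100].all(), pause[100:140].any(), pause[140:].all())
# prints: True False True
```

Counting active bands instead of thresholding a single broadband SNR lets a few strongly voiced subbands mark a frame as speech even when the overall level change is small.
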

[Figure 4: recognition rate / % versus SNR / dB (5 to 20)]
Fig. 4. Recognition results for a speaker independent recognition (car noise)

The simple noise reduction is integrated into the PLP analysis. PLP includes a spectral analysis with an FFT and the calculation of a number of subband energies, where the definition of the subbands is derived from the frequency groups of the human auditory system. Better results are obtained by applying the noise reduction to the subband energies of the 15 nonlinearly spaced filters than to all output values of the FFT. The variance of the subband energy seems to decrease when the FFT energies are summed into 15 subbands during segments of noise.

This time, too, a speech pause detection is added when the processing scheme is applied to speech recognition. For this purpose, the 15 subband energies are filtered with the filter scheme mentioned above using an overestimation factor of 2. A robust speech detection can then be realized by summing up the output values of all filters and looking for a positive value of the sum. Again, a filtering is applied with an overestimation factor of 3 during segments of noise, while speech segments are filtered with a factor of 1.

5. CONCLUSION

Two methods are presented to estimate the noise spectra, and more generally the noise characteristics, of noisy speech without an explicit speech pause detection. They are able to adapt to varying noise levels, and one of the algorithms has a low computational complexity. The approaches can be combined with well-known spectral subtraction techniques. Reducing the overestimation factor to a value in the range of 1 leads to simple reduction schemes with low computational complexity. These approaches are a good supplement to HMM recognition schemes that consider the modified statistics of spectral parameters caused by additive noise [5].

6. ACKNOWLEDGEMENT

This work was partly carried out at the International Computer Science Institute in Berkeley, USA. The authors would like to thank the whole speech group, and especially Dr. Morgan, for fruitful discussions and a stimulating atmosphere.

7. REFERENCES

[1] J. A. Nolazco Flores, S. J. Young, "Continuous Speech Recognition in Noise Using Spectral Subtraction and HMM Adaptation", ICASSP-94, Vol. 1, pp. 409-412, 1994
[2] P. Lockwood, J. Boudy, "Experiments with a Nonlinear Spectral Subtractor, Hidden Markov Models and the Projection for Robust Speech Recognition in Cars", Speech Communication, Vol. 11, No. 2-3, pp. 215-228, 1992
[3] D. Van Compernolle, "Noise Adaptation in a Hidden Markov Model Speech Recognition System", Computer Speech and Language, Vol. 3, pp. 151-167, 1989
[4] H. G. Hirsch, P. Meyer, H. W. Ruhl, "Improved Speech Recognition Using High-Pass Filtering of Subband Envelopes", Eurospeech-91, pp. 413-416, 1991
[5] M. J. F. Gales, S. J. Young, "Cepstral Parameter Compensation for HMM Recognition in Noise", Speech Communication, Vol. 12, No. 3, pp. 231-240, 1993
[6] R. Martin, "An Efficient Algorithm to Estimate the Instantaneous SNR of Speech Signals", Eurospeech-93, pp. 1093-1096, 1993
[7] H. G. Hirsch, "Estimation of Noise Spectrum and its Application to SNR Estimation and Speech Enhancement", Technical Report TR-93-012, International Computer Science Institute, Berkeley, USA, 1993
[8] A. Varga, H. J. M. Steeneken, "Assessment for Automatic Speech Recognition: II. Noisex92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems", Speech Communication, Vol. 12, No. 3, pp. 247-252, 1993
[9] S. J. Young, "HTK Version 1.4: Reference Manual and User Manual", Cambridge University Engineering Department, Speech Group, 1992
[10] H. Hermansky, "Perceptual Linear Predictive (PLP) Analysis of Speech", JASA, pp. 1738-1752, 1990
[11] N. Morgan, H. Hermansky et al., "Compensation for the Effect of the Communication Channel in Auditory-Like Analysis of Speech (Rasta-PLP)", Eurospeech-91, pp. 1367-1370, 1991