ISCA Archive
`http://www.isca-speech.org/archive
`
`3rd European Conference on
`Speech Communication and Technology
`EUROSPEECH'93
`Berlin, Germany, September 19-23, 1993
`
`AN EFFICIENT ALGORITHM TO ESTIMATE THE
`INSTANTANEOUS SNR OF SPEECH SIGNALS
`
`Rainer Martin
`
`Institute for Communication Systems and Data Processing (IND), Aachen University of Technology,
`Templergraben 55, 52056 Aachen, Germany, Phone: +49 241 806984, Fax: +49 241 806985
`
`ABSTRACT
This contribution presents an efficient algorithm to estimate
the instantaneous signal-to-noise ratio of speech signals.
The algorithm is capable of tracking non stationary noise signals
and has a low computational complexity. It needs neither a
speech activity detector nor histograms to learn signal statistics.
The algorithm is based on the observation that a noise
power estimate can be obtained using minimum values of a
smoothed power estimate. This paper presents the algorithm,
its performance, its limits, and some applications.
Keywords: SNR, time delay estimation, speech enhancement
`
`1. INTRODUCTION
Instantaneous SNR estimation is an essential component
of speech processing algorithms which are sensitive to varying
noise levels. An instantaneous SNR estimate is based
on short time power estimates with time constants of integration
in the range of 0.02 - 0.1 s. Typical applications are
time delay estimation and speech enhancement (e.g. spectral
subtraction).
To acquire noise statistics the conventional approach to
SNR estimation employs a voice activity detector to extract
the noise only segments of the disturbed speech signal. The
identification of noise segments might be based on the signal
power, on a statistical evaluation by means of histograms,
or on combinations thereof [1]. In all cases the update of
the noise power estimate requires a signal segment where
no speech is present. Depending on the method, tracking of
varying noise levels might be slow and confined to periods
of no speech activity.
The proposed algorithm, however, does not need an
explicit speech/nospeech decision to gather noise statistics
and is capable of tracking varying noise levels during speech
activity. The algorithm is based on the observation that the
smoothed power estimate of a noisy speech signal exhibits
distinct peaks and valleys (see Figure 1). While the peaks
correspond to speech activity, the valleys of the smoothed
power estimate can be used to obtain a noise power estimate.
To estimate the noise floor our algorithm takes the minimum
of a smoothed power estimate within a window of finite
length. The SNR estimates obtained by this method agree well
with the true SNR during speech activity (see section 4).
`
In sections 2 and 3 we present the algorithm and
discuss some of its statistical properties. Section 4
presents experimental results, section 5 discusses
two applications, and section 6 concludes.
`
`2. DESCRIPTION OF ALGORITHM
In what follows we assume that the bandlimited and
sampled disturbed signal $x(i)$ is a sum of a speech signal
$s(i)$ and a noise signal $n(i)$, $x(i) = s(i) + n(i)$, where $i$
denotes the time index. We further assume that $s(i)$ and $n(i)$
are statistically independent, hence
$E\{x^2(i)\} = E\{s^2(i)\} + E\{n^2(i)\}$.
$\mathrm{SNR}_x(i)$ will denote the estimated signal-to-noise ratio
of signal $x(i)$ at time $i$. The algorithm works on a sample
basis, i.e. a new output sample $\mathrm{SNR}_x(i)$ is computed for
each input sample $x(i)$.
`
[Plot: estimated short time power and noise floor versus samples]

Figure 1: Smoothed power and estimated noise floor of noisy
speech signal ($f_s = 8$ kHz, segmental SNR ca. 5 dB, car noise)
`
The computation of $\mathrm{SNR}_x(i)$ is based on a noise power
estimate $P_n(i)$ which is obtained as the minimum of the
smoothed short time power estimate $\bar{P}_x(i)$ within a window
of $L$ samples.
`
Besides initialization the algorithm can be split into three
major parts which will be discussed below (see Figure 2):

1. Computation of a smoothed short time power estimate
   $\bar{P}_x(i)$ of signal $x(i)$
2. Computation of the noise power estimate $P_n(i)$
3. Computation of the $\mathrm{SNR}_x(i)$
`
Figure 2: Flowchart of the SNR estimation algorithm
`
Computation of a smoothed power estimate
Computation of the short time signal power $P_x(i)$ and
smoothing of the power estimate are done in two steps.
The power estimate may be obtained recursively or
non-recursively. We here use a sliding rectangular window of
length $N$ with $N = 128$. In many applications, however, a
power estimate is already available.
Let $\bar{P}_x(i)$ denote the smoothed short time power estimate
at time $i$. Smoothing of the power estimate is done
by means of a first order recursive system. The smoothing
constant $\alpha$ is typically set to values between 0.95 and 0.98.
The recursion for $i > N$ is given by equation 1:

$P_x(i) = P_x(i-1) + x(i) \cdot x(i) - x(i-N) \cdot x(i-N)$
$\bar{P}_x(i) = \alpha \cdot \bar{P}_x(i-1) + (1-\alpha) \cdot P_x(i)$    (1)
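The two steps of equation 1 can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `smoothed_power`, the zero initialization of both recursions, and the handling of the first N samples (where the window is only partly filled) are assumptions that the text does not specify.

```python
import numpy as np

def smoothed_power(x, n_window=128, alpha=0.95):
    """Return (p, p_bar): sliding-window power and its smoothed version.

    p     follows P_x(i) = P_x(i-1) + x(i)^2 - x(i-N)^2
    p_bar follows P_bar(i) = alpha * P_bar(i-1) + (1 - alpha) * P_x(i)
    Both recursions start from zero (an assumption).
    """
    x = np.asarray(x, dtype=float)
    p = np.zeros(len(x))
    p_bar = np.zeros(len(x))
    acc = 0.0      # running sum over the rectangular window
    smooth = 0.0   # state of the first order recursive smoother
    for i in range(len(x)):
        acc += x[i] * x[i]
        if i >= n_window:
            # drop the sample that just left the window
            acc -= x[i - n_window] * x[i - n_window]
        acc = max(acc, 0.0)  # guard against tiny negative rounding errors
        p[i] = acc
        smooth = alpha * smooth + (1.0 - alpha) * acc
        p_bar[i] = smooth
    return p, p_bar
```

For a constant input of ones, `p` settles at N once the window is full, and `p_bar` converges to the same value at a rate set by `alpha`.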
`
`Noise power estimation
`
The noise power estimate is based on the minimum of
signal power within a window of $L$ samples. For reasons
of computational complexity and delay the data window of
length $L$ is decomposed into $W$ windows of length $M$ such
that $M \cdot W = L$. For a sampling rate of $f_s = 8$ kHz typical
window parameters are $M = 1250$ and $W = 4$, thus $L = 5000$,
corresponding to a time window of 0.625 s.
The minimum power of the last $M$ samples is found by
a samplewise comparison of the actual minimum $P_{M\min}(i)$
and the smoothed power $\bar{P}_x(i)$.
Whenever $M$ samples have been read, i.e. $i = r \cdot M$,
we store the minimum power of the last $M$ samples
and reset $P_{M\min}$ to its maximum value:

$P_{M\min}(i = r \cdot M+) = P_{\max}$    (2)

To determine the noise power estimate we distinguish
two cases:

1. slowly varying noise power,
2. rapidly varying noise power.
`
If the minimum power of the last $W$ windows with
$M$ samples each is monotonically increasing we decide on
rapid noise power variation. In this case the noise power
estimate equals the power minimum of the last $M$ samples:
$P_n(i) = P_{M\min}(i = r \cdot M)$.
In case of non monotonic power the noise power estimate
is set to the minimum of the length $L$ window, i.e.
$P_n(i) = P_{L\min}(i)$. The minimum power of the length $L$
window is easily obtained as the minimum of the last $W$
minimum power estimates:

$P_{L\min}(i) = \min(P_{M\min}(i = r \cdot M), P_{M\min}(i = (r-1) \cdot M), \ldots, P_{M\min}(i = (r-W+1) \cdot M))$

If the actual smoothed power is smaller than the estimated
noise power $P_n(i)$ the noise power is updated immediately,
independent of window adjustment:
$P_n(i) = \min(\bar{P}_x(i), P_n(i))$.
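The minimum tracking described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the name `track_noise_floor` is hypothetical, the reset value is taken to be +infinity, the estimate is initialized with the first smoothed power value, "monotonically increasing" is read as strictly increasing, and the rapid-variation test is applied only once W sub-window minima are available.

```python
from collections import deque
import numpy as np

def track_noise_floor(p_bar, m=1250, w=4):
    """Noise power estimate P_n(i) from the smoothed power p_bar."""
    p_bar = np.asarray(p_bar, dtype=float)
    p_n = np.empty(len(p_bar))
    if len(p_bar) == 0:
        return p_n
    p_max = np.inf                # reset value (assumption)
    sub_min = p_max               # running minimum of the current M samples
    stored = deque(maxlen=w)      # minima of the last W sub-windows
    noise = p_bar[0]              # initial estimate (assumption)
    for i, p in enumerate(p_bar):
        sub_min = min(sub_min, p)
        if (i + 1) % m == 0:      # M samples have been read
            stored.append(sub_min)
            mins = list(stored)
            rising = len(mins) == w and all(
                a < b for a, b in zip(mins, mins[1:]))
            # rapid variation: newest sub-window minimum;
            # otherwise: minimum over the whole length-L window
            noise = mins[-1] if rising else min(mins)
            sub_min = p_max
        # immediate update if the smoothed power drops below the estimate
        noise = min(noise, p)
        p_n[i] = noise
    return p_n
```

With four sub-windows whose minima rise steadily, the estimate jumps to the newest minimum at the end of the fourth sub-window; with non monotonic minima it stays at the smallest of the four.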
`Computation of SNR
The estimated SNR is computed on the basis of the
estimated minimum noise power $P_n(i)$. A factor ofactor
accounts for the fact that the minimum power estimate is
smaller than the true noise power. ofactor is typically set
to values between 1.3 and 2 (see section 3):

$\mathrm{SNR}_x(i) = 10 \log_{10} \left( \frac{\bar{P}_x(i) - \min(\mathit{ofactor} \cdot P_n(i), \bar{P}_x(i))}{\mathit{ofactor} \cdot P_n(i)} \right)$    (3)
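The SNR computation can be sketched as follows. The function names and the small `floor` constant are assumptions introduced for illustration; the floor only avoids a division by zero and a logarithm of zero when the smoothed power does not exceed the scaled noise estimate.

```python
import numpy as np

def snr_ratio(p_bar, p_n, ofactor=1.5, floor=1e-12):
    """Power ratio (P_bar - min(ofactor*P_n, P_bar)) / (ofactor*P_n)."""
    p_bar = np.asarray(p_bar, dtype=float)
    noise = ofactor * np.asarray(p_n, dtype=float)
    # the min() keeps the numerator from becoming negative
    return (p_bar - np.minimum(noise, p_bar)) / np.maximum(noise, floor)

def snr_db(ratio, floor=1e-12):
    """Convert the power ratio to dB; ratios below `floor` are clipped."""
    return 10.0 * np.log10(np.maximum(ratio, floor))
```

A smoothed power of 3 with a noise estimate of 1 and ofactor = 1.5 gives a ratio of 1, i.e. 0 dB; whenever the smoothed power is at or below the scaled noise estimate the ratio is 0 and the dB value sits at the clipping floor.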
Figure 1 plots the smoothed power estimate and the estimated
noise floor for a noisy speech sample. The window
length $L = M \cdot W$ must be large enough to bridge any peak
of speech activity, but short enough to follow non stationary
noise variations. Experiments with different speakers, different
languages, and modulated noise signals have shown that
a window length of 0.625 s is a good value.
In case of slowly varying noise power the update of
noise estimates is delayed by $L + M$ samples. If a rapid
noise power increase is detected this delay is reduced to $M$
samples, thus improving the noise tracking capability of the
algorithm.
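For the typical parameters quoted in the text (M = 1250, W = 4, sampling rate 8 kHz) the window length and the two update delays can be worked out directly. The helper name `window_timing` and the dictionary keys are hypothetical; only the arithmetic comes from the text.

```python
def window_timing(m=1250, w=4, fs=8000.0):
    """Window length and update delays for sub-window length m and count w."""
    length = m * w                              # L = M * W samples
    return {
        "L_samples": length,
        "window_s": length / fs,                # duration of the length-L window
        "slow_delay_s": (length + m) / fs,      # delay L + M, slowly varying noise
        "rapid_delay_s": m / fs,                # delay M, rapid noise increase
    }
```

With these values the window spans 0.625 s, the delay for slowly varying noise is 0.78125 s, and the delay after a detected rapid increase is 0.15625 s.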
`
3. STATISTICS OF MINIMUM ESTIMATES
In this section we compute the density function of the
minimum noise power estimate and justify our choice of the
overestimation factor ofactor. To facilitate the analytical
evaluation of minimum estimates we assume that the noise
process $n$ is zero mean white Gaussian noise with variance $\sigma^2$
and that the computation of the smoothed power estimate is
entirely done by means of non recursive accumulation, i.e.:

$P_x(i) = \sum_{m=0}^{N-1} x^2(i-m)$    (4)

Then, the power estimate $P_x(i)$ is chi-square distributed
[2] with mean $N \cdot \sigma^2$ and density:

$f_{P_x}(y) = \frac{1}{(\sigma\sqrt{2})^N \, \Gamma(N/2)} \, y^{N/2-1} \, e^{-y/(2\sigma^2)} \cdot U(y)$    (5)

where $\Gamma()$ and $U()$ denote the Gamma function and the unit
step function, respectively.
The density of the minimum of $L_w$ independent power
estimates is given by [2]:

$f_{\min}(y) = L_w \cdot (1 - F_{P_x}(y))^{L_w-1} \cdot f_{P_x}(y)$    (6)

where $F_{P_x}(y)$ denotes the distribution function of the
chi-square density:

$F_{P_x}(y) = \left( 1 - e^{-y/(2\sigma^2)} \sum_{m=0}^{N/2-1} \frac{1}{m!} \left( \frac{y}{2\sigma^2} \right)^m \right) \cdot U(y)$    (7)

Clearly, successive values of $P_x(i)$ are correlated but if
we shift the sliding window of equ. 4 by $\Delta i > N/2$ we
obtain sufficiently uncorrelated power estimates.
Figure 3 plots the density functions $f_{P_x}(y)$ and $f_{\min}(y)$
and corresponding histograms of $P_x(i)$ and $P_n(i)$ for a car
noise signal.
We now choose the overestimation factor ofactor such
that the noise power estimate is approximately unbiased, i.e.
$E\{P_n\} \cdot \mathit{ofactor} \approx E\{P_x\}$. Since $f_{P_x}(y)$ and $f_{\min}(y)$ are
scaled by the noise variance $\sigma^2$, ofactor does not depend on
$\sigma^2$. Figure 4 shows the dependency of ofactor on $N$ and
$L_w$ and allows the selection of an appropriate overestimation
factor.

Figure 4: Overestimation factor ofactor versus $N$ and $L_w$

4. EXPERIMENTAL RESULTS
Figure 5 plots the true and the estimated instantaneous
SNR of the same noisy speech signal as in Figure 1. The
true SNR was computed on the basis of separate speech and
noise signals. Our SNR estimate shows good agreement with
the true SNR during speech activity. In agreement with the
statistical evaluation the estimate is biased when no speech
is present.
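The statistical evaluation of section 3 can be checked numerically. The sketch below does not evaluate the densities analytically; it swaps in a Monte Carlo simulation, drawing chi-square distributed power estimates and averaging the minimum of L_w independent values. The function name, the number of trials, and the seed are assumptions chosen for illustration.

```python
import numpy as np

def ofactor_monte_carlo(n=80, l_w=20, sigma=1.0, trials=20000, seed=0):
    """Estimate ofactor = E{P_x} / E{P_n} for white Gaussian noise."""
    rng = np.random.default_rng(seed)
    # A sum of n squared N(0, sigma^2) samples equals sigma^2 times a
    # chi-square variable with n degrees of freedom, so draw it directly.
    power = sigma**2 * rng.chisquare(df=n, size=(trials, l_w))
    mean_power = n * sigma**2                  # E{P_x}
    mean_minimum = power.min(axis=1).mean()    # sample estimate of E{P_n}
    return mean_power / mean_minimum
```

Because the noise variance scales both the mean power and the mean minimum, the returned factor should not change with `sigma`, and for `l_w=1` it should be close to 1.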
`
Figure 5: True and estimated instantaneous
SNR of noisy speech signal (ofactor = 1.5)

Figure 3: Density functions $f_{P_x}(y)$ (dotted) and $f_{\min}(y)$
(solid) for $\sigma^2 = 0.09$, $N = 80$, and $L_w = 20$ (left
graph) and corresponding histograms of $P_x(i)$ (dotted)
and $P_n(i)$ (solid) for car noise signals (right graph)

To test the algorithm with non stationary noise the noise
signal was modulated with a sine function and then added to
a speech signal: $x(i) = s(i) + n(i) \cdot (1.5 + \sin(2\pi \cdot f_m / f_s \cdot i))$.
The modulation frequency was set to $f_m = 0.33$ Hz.
`
`Figure 6 plots the corresponding short time power and
`the estimated noise floor. Note the delay of the noise power
`values in case of increasing noise power. Figure 7 shows
`the true and estimated SNR. Due to the window length of
`0.625 s rapid noise variations might result in erroneous SNR
`estimates.
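A test signal of the kind used in this experiment can be generated as sketched below. The function name is hypothetical, and the sine argument 2π·f_m·i/f_s is an assumption about how the modulation frequency maps to the sample index; the offset 1.5 and the default f_m = 0.33 Hz come from the text.

```python
import numpy as np

def modulated_noisy_signal(s, n, fs=8000.0, fm=0.33):
    """Add sine-modulated noise n to speech s (arrays of equal length)."""
    s = np.asarray(s, dtype=float)
    n = np.asarray(n, dtype=float)
    if s.shape != n.shape:
        raise ValueError("speech and noise must have the same length")
    i = np.arange(len(s))
    # noise amplitude factor swings between 0.5 and 2.5
    envelope = 1.5 + np.sin(2.0 * np.pi * fm * i / fs)
    return s + n * envelope
```

With this envelope the noise amplitude varies by a factor of five over one modulation period of about three seconds, which is long compared with the 0.625 s minimum search window.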
`
Figure 6: Short time power of modulated noisy
speech signal and noise estimate for $f_m = 0.33$ Hz

Figure 7: True and estimated SNR of
modulated noisy speech signal for $f_m = 0.33$ Hz

5. APPLICATIONS
The algorithm was tested with varying noise levels and
successfully incorporated in several speech processing systems.
In what follows we briefly discuss two applications,
namely time delay estimation and spectral subtraction.

TIME DELAY ESTIMATION
Time delayed speech signals originate e.g. from microphone
arrays where the speaker is in a non symmetric
position relative to the array and possibly moving. In-phase
summation or adaptive processing of these microphone signals
usually requires a time delay compensation.
The SNR estimator was implemented to support time
delay estimation by means of (generalized) correlation. To
determine the delay between microphone signals we compute
the maximum of a smoothed cross correlation estimate.
Whenever the SNR is below a preset threshold the update
of smoothed correlation functions is frozen. Figure 8 plots
the delay estimate without and with SNR estimation. The
enhanced algorithm clearly eliminates all large deviations of
the time delay estimate.

Figure 8: Time delay of microphone channel 1 with respect to
channel 2 of a noisy speech sample with moving speaker
without (upper graph) and with (lower graph) SNR estimation.
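The SNR-controlled delay estimation can be sketched as follows. This is a simplified illustration using a plain framewise cross correlation, not the generalized correlation of the original system; the function name, the frame-based processing, the smoothing constant `beta`, the default threshold, and the initial delay of zero are all assumptions.

```python
import numpy as np

def gated_delay_estimates(frames_1, frames_2, snr_db,
                          threshold_db=0.0, beta=0.9):
    """Per-frame delay of channel 1 relative to channel 2, in samples.

    frames_1, frames_2: arrays of shape (num_frames, frame_len)
    snr_db: one SNR estimate per frame
    """
    frames_1 = np.asarray(frames_1, dtype=float)
    frames_2 = np.asarray(frames_2, dtype=float)
    frame_len = frames_1.shape[1]
    lags = np.arange(-(frame_len - 1), frame_len)
    smoothed = np.zeros(2 * frame_len - 1)
    delay = 0                      # initial delay (assumption)
    delays = []
    for f1, f2, snr in zip(frames_1, frames_2, snr_db):
        if snr >= threshold_db:
            # update the smoothed cross correlation only at sufficient SNR
            xcorr = np.correlate(f1, f2, mode="full")
            smoothed = beta * smoothed + (1.0 - beta) * xcorr
            delay = int(lags[np.argmax(smoothed)])
        # below the threshold the correlation, and hence the delay, is frozen
        delays.append(delay)
    return delays
```

A frame with low SNR and a misleading correlation peak leaves the previous delay untouched, whereas the same frame with the gate open would pull the estimate away.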
SPECTRAL SUBTRACTION
To reduce the noise level within a disturbed speech
signal the spectral subtraction method modifies the short time
spectral magnitude of the disturbed speech signal. In our
experiments we used a filter bank with 256 channels and
estimated the minimum power in each of these channels.
Our informal listening tests reveal relatively few annoying
musical tones. However, because we subtract
slightly biased noise power estimates (ofactor = 1.5), the noise
suppression is limited. Power spectra of the disturbed and of
the improved signal show an improvement of about 10 dB.
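How a per-channel minimum power estimate might enter spectral subtraction can be sketched as follows. The text does not state which subtraction rule was used, so the power subtraction rule below, the function name, and the `gain_floor` parameter are assumptions for illustration only.

```python
import numpy as np

def spectral_subtraction_gain(p_bar, p_n, ofactor=1.5, gain_floor=0.0):
    """Amplitude gain per channel from smoothed power and noise power.

    p_bar, p_n: arrays with one value per filter bank channel.
    """
    p_bar = np.asarray(p_bar, dtype=float)
    p_n = np.asarray(p_n, dtype=float)
    # subtract the scaled noise power and clip negative results to zero
    clean = np.maximum(p_bar - ofactor * p_n, 0.0)
    gain = np.sqrt(clean / np.maximum(p_bar, 1e-12))
    return np.maximum(gain, gain_floor)
```

The gain would be multiplied onto the short time spectral magnitude of each channel; channels whose power does not exceed the scaled noise estimate are attenuated down to `gain_floor`.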
`
`6. CONCLUSION
Varying noise levels have a significant impact on the
performance of many speech processing algorithms. The
algorithm proposed in this paper provides a computationally
inexpensive and effective means to cope with this problem.
The algorithm is accurate for medium to high SNR conditions
but necessarily biased when no speech is present. A priori
knowledge of noise variation and noise correlation is helpful
to adapt window length and to control the estimation bias.
`
ACKNOWLEDGMENT
Part of this work was supported by Philips Kommunikations Industrie,
Germany. Spectral subtraction using minimum power estimates was
investigated by Peter Kocybik.

References

[1] R. McAulay and M. Malpass: "Speech Enhancement Using a Soft-Decision
    Noise Suppression Filter", IEEE Trans. ASSP, Vol. 28, No. 2,
    pp. 137-145, April 1980.
[2] A. Papoulis: "Probability, Random Variables, and Stochastic
    Processes", 2nd ed., McGraw-Hill, 1984.
`