`http://www.isca-speech.org/archive
`
`3rd European Conference on
`Speech Communication and Technology
`EUROSPEECH'93
`Berlin, Germany, September 19-23, 1993
`
`AN EFFICIENT ALGORITHM TO ESTIMATE THE
`INSTANTANEOUS SNR OF SPEECH SIGNALS
`
`Rainer Martin
`
`Institute for Communication Systems and Data Processing (IND), Aachen University of Technology,
`Templergraben 55, 52056 Aachen, Germany, Phone: +49 241 806984, Fax: +49 241 806985
`
`ABSTRACT
This contribution presents an efficient algorithm to estimate
the instantaneous signal-to-noise ratio of speech signals.
The algorithm is capable of tracking non stationary noise signals
and has a low computational complexity. It needs neither a
speech activity detector nor histograms to learn signal statistics.
The algorithm is based on the observation that a noise
power estimate can be obtained using minimum values of a
smoothed power estimate. This paper presents the algorithm,
its performance, its limits, and some applications.
`Keywords: SNR, time delay estimation, speech enhancement
`
`1. INTRODUCTION
Instantaneous SNR estimation is an essential component
of speech processing algorithms which are sensitive to varying
noise levels. An instantaneous SNR estimate is based
on short time power estimates with time constants of integration
in the range of 0.02 to 0.1 s. Typical applications are
time delay estimation and speech enhancement (e.g. spectral
subtraction).
To acquire noise statistics, the conventional approach to
SNR estimation employs a voice activity detector to extract
the noise-only segments of the disturbed speech signal. The
identification of noise segments might be based on the signal
power, on a statistical evaluation by means of histograms,
or on combinations thereof [1]. In all cases the update of
the noise power estimate requires a signal segment where
no speech is present. Depending on the method, tracking of
varying noise levels might be slow and confined to periods
of no speech activity.
The proposed algorithm, however, does not need an
explicit speech/no-speech decision to gather noise statistics
and is capable of tracking varying noise levels during speech
activity. The algorithm is based on the observation that the
smoothed power estimate of a noisy speech signal exhibits
distinct peaks and valleys (see Figure 1). While the peaks
correspond to speech activity, the valleys of the smoothed
power estimate can be used to obtain a noise power estimate.
To estimate the noise floor our algorithm takes the minimum
of a smoothed power estimate within a window of finite
length. The SNR estimates obtained by this method are fairly
accurate.
`
In sections 2 and 3 we present the algorithm and
discuss some of its statistical properties. Section 4
presents experimental results. Section 5 discusses two
applications, and section 6 concludes.
`
`2. DESCRIPTION OF ALGORITHM
In what follows we assume that the bandlimited and
sampled disturbed signal $x(i)$ is a sum of a speech signal
$s(i)$ and a noise signal $n(i)$, $x(i) = s(i) + n(i)$, where $i$
denotes the time index. We further assume that $s(i)$ and $n(i)$
are statistically independent, hence $E\{x^2(i)\} = E\{s^2(i)\} +
E\{n^2(i)\}$.
$SNR_x(i)$ will denote the estimated signal-to-noise ratio
of signal $x(i)$ at time $i$. The algorithm works on a sample
basis, i.e. a new output sample $SNR_x(i)$ is computed for
each input sample $x(i)$.
`
[Plot: estimated short time power and noise floor versus samples]

Figure 1: Smoothed power and estimated noise floor of noisy
speech signal ($f_s$ = 8 kHz, segmental SNR ca. 5 dB, car noise)
`
The computation of $SNR_x(i)$ is based on a noise power
estimate $P_n(i)$ which is obtained as the minimum of the
smoothed short time power estimate $\bar{P}_x(i)$ within a window
of $L$ samples.
`
`
`
`
Besides initialization the algorithm can be split into three
major parts which will be discussed below (see Figure 2):

1. Computation of a smoothed short time power estimate
$\bar{P}_x(i)$ of signal $x(i)$
2. Computation of the noise power estimate $P_n(i)$
3. Computation of the $SNR_x(i)$
`
Figure 2: Flowchart of the SNR estimation algorithm
`
Computation of a smoothed power estimate
Computation of the short time signal power $P_x(i)$ and
smoothing of the power estimate is done in two steps.
The power estimate may be obtained recursively or
non-recursively. We here use a sliding rectangular window of
length $N$ with $N = 128$.
In many applications, however, a
power estimate is already available.
Let $\bar{P}_x(i)$ denote the smoothed short time power
estimate at time $i$. Smoothing of the power estimate is done
by means of a first order recursive system. The smoothing
constant is typically set to values between $\alpha = 0.95 \ldots 0.98$.
The recursion for $i > N$ is given by equation 1:
$$P_x(i) = P_x(i-1) + x(i) \cdot x(i) - x(i-N) \cdot x(i-N)$$
$$\bar{P}_x(i) = \alpha \cdot \bar{P}_x(i-1) + (1-\alpha) \cdot P_x(i) \qquad (1)$$
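
The two steps of equation 1 can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the function name `smoothed_power` is hypothetical, and the start-up behaviour (both estimates start at zero and the window fills up during the first $N$ samples) is an assumption, since initialization is not detailed here.

```python
def smoothed_power(x, N=128, alpha=0.95):
    """Return (P, P_bar) for a sequence of samples x.

    P[i]     : sliding rectangular-window power (sum of the last N squares)
    P_bar[i] : first order recursive smoothing of P with constant alpha
    Assumption: both estimates are initialized to zero.
    """
    P, P_bar = [], []
    acc = 0.0      # running sum of squares, P_x(i)
    smooth = 0.0   # smoothed power
    for i, sample in enumerate(x):
        acc += sample * sample
        if i >= N:
            # drop the sample that leaves the length-N window
            acc -= x[i - N] * x[i - N]
        smooth = alpha * smooth + (1.0 - alpha) * acc
        P.append(acc)
        P_bar.append(smooth)
    return P, P_bar
```

With a constant input the sliding sum settles at $N$ times the squared amplitude, and the smoothed estimate approaches the same value at a rate set by $\alpha$.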
`
`Noise power estimation
`
The noise power estimate is based on the minimum of
signal power within a window of $L$ samples. For reasons
of computational complexity and delay the data window of
length $L$ is decomposed into $W$ windows of length $M$ such
that $M \cdot W = L$. For a sampling rate of $f_s$ = 8 kHz typical
window parameters are $M = 1250$ and $W = 4$, thus $L = 5000$,
corresponding to a time window of 0.625 s.
The minimum power of the last $M$ samples is found by
a samplewise comparison of the actual minimum $P_{Mmin}(i)$
and the smoothed power $\bar{P}_x(i)$.
Whenever $M$ samples have been read, i.e. $i = r \cdot M$,
we store the minimum power of the last $M$ samples
and reset $P_{Mmin}(i = r \cdot M)$ to its maximum value:
$$P_{Mmin}(i = r \cdot M + 1) = P_{max} \qquad (2)$$
To determine the noise power estimate we distinguish
two cases:

1. slowly varying noise power,
2. rapidly varying noise power.

`
If the minimum power of the last $W$ windows with
$M$ samples each is monotonically increasing we decide on
rapid noise power variation. In this case the noise power
estimate equals the power minimum of the last $M$ samples,
$P_n(i) = P_{Mmin}(i = r \cdot M)$.
In case of non monotonic power the noise power estimate
is set to the minimum of the length $L$ window, i.e.
$P_n(i) = P_{Lmin}(i)$. The minimum power of the length $L$
window is easily obtained as the minimum of the last $W$
minimum power estimates:
$$P_{Lmin}(i) = \min(P_{Mmin}(i = r \cdot M), P_{Mmin}(i = (r-1) \cdot M), \ldots, P_{Mmin}(i = (r-W+1) \cdot M))$$
If the actual smoothed power is smaller than the estimated
noise power $P_n(i)$ the noise power is updated immediately,
independent of window adjustment:
$P_n(i) = \min(\bar{P}_x(i), P_n(i))$.
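
The noise power estimation described above can be sketched as follows. The function name `track_noise_floor` is hypothetical, and several details are assumptions rather than statements of the original implementation: $P_{max}$ is taken as infinity, the noise estimate starts at $P_{max}$, "monotonically increasing" is tested as strictly increasing, and the rapid-variation case is only considered once $W$ sub-window minima are available.

```python
from collections import deque

def track_noise_floor(p_bar, M=1250, W=4, p_max=float("inf")):
    """Return the noise power estimate P_n(i) for each smoothed power value."""
    minima = deque(maxlen=W)   # minima of the last W sub-windows of length M
    p_mmin = p_max             # running minimum of the current sub-window
    p_n = p_max                # current noise power estimate (assumed start value)
    out = []
    for i, p in enumerate(p_bar):
        p_mmin = min(p_mmin, p)          # samplewise comparison
        if (i + 1) % M == 0:             # M samples have been read
            minima.append(p_mmin)
            vals = list(minima)
            rising = len(vals) == W and all(a < b for a, b in zip(vals, vals[1:]))
            # rapid variation: newest sub-window minimum; otherwise minimum over L = M*W
            p_n = vals[-1] if rising else min(vals)
            p_mmin = p_max               # reset as in equation 2
        p_n = min(p, p_n)                # immediate update if power drops below P_n
        out.append(p_n)
    return out
```

Only $W$ stored minima and one running minimum are kept, which is what makes the decomposition into sub-windows cheap compared with searching all $L$ samples at every step.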
`Computation of SNR
The estimated SNR is computed on the basis of the
estimated minimum noise power $P_n(i)$. A factor $ofactor$
accounts for the fact that the minimum power estimate is
smaller than the true noise power. $ofactor$ is typically set
to values between 1.3 and 2 (see section 3):
$$SNR_x(i) = 10 \log_{10}\left(\frac{\bar{P}_x(i) - \min(ofactor \cdot P_n(i), \bar{P}_x(i))}{ofactor \cdot P_n(i)}\right) \qquad (3)$$
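
Equation 3 can be evaluated per sample as in the following sketch, which assumes the SNR is expressed in dB. The name `snr_db` and the parameter `floor_db` are hypothetical: when the smoothed power does not exceed the scaled noise estimate the ratio is zero and its logarithm is undefined, so the sketch returns an arbitrary floor value instead.

```python
import math

def snr_db(p_bar, p_n, ofactor=1.5, floor_db=-100.0):
    """SNR estimate in dB from smoothed power p_bar and noise estimate p_n."""
    noise = ofactor * p_n
    if noise <= 0.0:
        raise ValueError("noise power estimate must be positive")
    # the min() keeps the numerator non-negative
    excess = p_bar - min(noise, p_bar)
    if excess <= 0.0:
        return floor_db   # assumption: clamp instead of log10(0)
    return 10.0 * math.log10(excess / noise)
```

For example, a smoothed power of 16.5 with a noise estimate of 1.0 and $ofactor = 1.5$ gives a power ratio of 10, i.e. 10 dB.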
Figure 1 plots the smoothed power estimate and the estimated
noise floor for a noisy speech sample. The window
length $L = M \cdot W$ must be large enough to bridge any peak
of speech activity, but short enough to follow non stationary
noise variations. Experiments with different speakers, different
languages, and modulated noise signals have shown that
a window length of 0.625 s is a good value.
In case of slowly varying noise power the update of
noise estimates is delayed by $L + M$ samples. If a rapid
noise power increase is detected this delay is reduced to $M$
samples, thus improving the noise tracking capability of the
algorithm.
`
`
`
`
`3. STATISTICS OF MINIMUM ESTIMATES
In this section we compute the density function of the
minimum noise power estimate and justify our choice of the
overestimation factor $ofactor$. To facilitate the analytical
evaluation of minimum estimates we assume that the noise
process $n$ is zero mean white Gaussian noise with variance $\sigma^2$
and that the computation of the smoothed power estimate is
entirely done by means of non recursive accumulation, i.e.:
$$P_x(i) = \sum_{m=0}^{N-1} x^2(i-m) \qquad (4)$$
Then, the power estimate $P_x(i)$ is chi-square distributed
[2] with mean $N \cdot \sigma^2$ and density:
$$f_{P_x}(y) = \frac{1}{(\sigma\sqrt{2})^N \, \Gamma(N/2)} \, y^{N/2-1} e^{-y/(2\sigma^2)} \cdot U(y) \qquad (5)$$
where $\Gamma()$ and $U()$ denote the Gamma function and the unit
step function, respectively.
The density of the minimum of $L_w$ independent power
estimates is given by [2]:
$$f_{min}(y) = L_w \cdot (1 - F_{P_x}(y))^{L_w-1} \cdot f_{P_x}(y) \qquad (6)$$
where $F_{P_x}(y)$ denotes the distribution function of the
chi-square density:
$$F_{P_x}(y) = \left(1 - e^{-y/(2\sigma^2)} \sum_{m=0}^{N/2-1} \frac{1}{m!} \left(\frac{y}{2\sigma^2}\right)^m\right) \cdot U(y) \qquad (7)$$
Clearly, successive values of $P_x(i)$ are correlated but if
we shift the sliding window of equ. 4 by $\Delta i > N/2$ we
obtain sufficiently uncorrelated power estimates.
Figure 3 plots the density functions $f_{P_x}(y)$ and $f_{min}(y)$
and corresponding histograms of $P_x(i)$ and $P_n(i)$ for a car
noise signal.

Figure 3: Density functions $f_{P_x}(y)$ (dotted) and $f_{min}(y)$
(solid) for $\sigma^2 = 0.09$, $N = 80$, and $L_w = 20$ (left
graph) and corresponding histograms of $P_x(i)$ (dotted)
and $P_n(i)$ (solid) for car noise signals (right graph)

We now choose the overestimation factor $ofactor$ such
that the noise power estimate is approximately unbiased, i.e.
$E\{P_n\} \cdot ofactor \approx E\{P_x\}$. Since $f_{P_x}(y)$ and $f_{min}(y)$ are
scaled by the noise variance $\sigma^2$, $ofactor$ does not depend on
$\sigma^2$. Figure 4 shows the dependency of $ofactor$ on $N$ and
$L_w$ and allows the selection of an appropriate overestimation
factor.

Figure 4: Overestimation factor $ofactor$ versus $N$ and $L_w$

4. EXPERIMENTAL RESULTS
Figure 5 plots the true and the estimated instantaneous
SNR of the same noisy speech signal as in Figure 1. The
true SNR was computed on the basis of separate speech and
noise signals. Our SNR estimate shows good agreement with
the true SNR during speech activity. In agreement with the
statistical evaluation the estimate is biased when no speech
is present.

Figure 5: True and estimated instantaneous
SNR of noisy speech signal ($ofactor$ = 1.5)

To test the algorithm with non stationary noise the noise
signal was modulated with a sine function and then added to
a speech signal: $x(i) = s(i) + n(i) \cdot (1.5 + \sin(2\pi f_m i / f_s))$.
The modulation frequency was set to $f_m$ = 0.33 Hz.
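
A test signal of this kind can be generated as in the following sketch. The function name `modulate_noise` is hypothetical, and the speech and noise sequences `s` and `n` are assumed to be available as equal-length lists of samples at $f_s$ = 8 kHz.

```python
import math

def modulate_noise(s, n, fm=0.33, fs=8000.0):
    """Return x(i) = s(i) + n(i) * (1.5 + sin(2*pi*fm*i/fs))."""
    return [
        si + ni * (1.5 + math.sin(2.0 * math.pi * fm * i / fs))
        for i, (si, ni) in enumerate(zip(s, n))
    ]
```

The modulation gain varies between 0.5 and 2.5, so the noise power swings by roughly 14 dB over one modulation period of about 3 s, which is several times the 0.625 s minimum-search window.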
`
`
`
`
Figure 6 plots the corresponding short time power and
the estimated noise floor. Note the delay of the noise power
values in case of increasing noise power. Figure 7 shows
the true and estimated SNR. Due to the window length of
0.625 s rapid noise variations might result in erroneous SNR
estimates.

Figure 6: Short time power of modulated noisy
speech signal and noise estimate for $f_m$ = 0.33 Hz

Figure 7: True and estimated SNR of
modulated noisy speech signal for $f_m$ = 0.33 Hz

5. APPLICATIONS
The algorithm was tested with varying noise levels and
successfully incorporated in several speech processing systems.
In what follows we briefly discuss two applications,
namely time delay estimation and spectral subtraction.

TIME DELAY ESTIMATION
Time delayed speech signals originate e.g. from microphone
arrays where the speaker is in a non symmetric
position relative to the array and possibly moving. In-phase
summation or adaptive processing of these microphone signals
usually requires a time delay compensation.
The SNR estimator was implemented to support time
delay estimation by means of (generalized) correlation. To
determine the delay between microphone signals we compute
the maximum of a smoothed cross correlation estimate.
Whenever the SNR is below a preset threshold the update
of smoothed correlation functions is frozen. Figure 8 plots
the delay estimate without and with SNR estimation. The
enhanced algorithm clearly eliminates all large deviations of
the time delay estimate.

Figure 8: Time delay of microphone channel 1 with respect to
channel 2 of a noisy speech sample with moving speaker
without (upper graph) and with (lower graph) SNR estimation.

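The SNR-controlled freezing of the correlation update can be sketched as follows. This is only an illustration under stated assumptions: the function name `gated_delay_update`, the first order smoothing with constant `beta`, the threshold value, and the representation of the correlation as a list over lags $-K \ldots K$ are all hypothetical and not taken from the system described here.

```python
def gated_delay_update(r_smooth, r_inst, snr_db, threshold_db=3.0, beta=0.9):
    """Update the smoothed cross correlation only if the SNR is high enough.

    r_smooth, r_inst: correlation values for lags -K..K (length 2K+1).
    Returns the (possibly unchanged) smoothed correlation and the delay in samples.
    """
    if snr_db >= threshold_db:
        # normal operation: first order recursive smoothing (assumed form)
        r_smooth = [beta * a + (1.0 - beta) * b for a, b in zip(r_smooth, r_inst)]
    # below the threshold the smoothed correlation is frozen
    K = (len(r_smooth) - 1) // 2
    best = max(range(len(r_smooth)), key=lambda k: r_smooth[k])
    return r_smooth, best - K
```

Because the delay is always read from the smoothed correlation, a frozen update simply holds the last reliable delay estimate during low-SNR periods.
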
`SPECTRAL SUBTRACTION
To reduce the noise level within a disturbed speech
signal the spectral subtraction method modifies the short time
spectral magnitude of the disturbed speech signal. In our
experiments we used a filter bank with 256 channels and
estimated the minimum power in each of these channels.
Our informal listening tests reveal relatively few annoying
musical tones. However, due to the fact that we subtract
slightly biased noise power estimates ($ofactor$ = 1.5) the noise
suppression is limited. Power spectra of the disturbed and of
the improved signal show an improvement of about 10 dB.
`
`6. CONCLUSION
Varying noise levels have a significant impact on the
performance of many speech processing algorithms. The
algorithm proposed in this paper provides a computationally
inexpensive and effective means to cope with this problem.
The algorithm is accurate for medium to high SNR conditions
but necessarily biased when no speech is present. A priori
knowledge of noise variation and noise correlation is helpful
to adapt window length and to control the estimation bias.
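
As a rough illustration of how the estimation bias depends on the window parameters, the overestimation factor of section 3 can be approximated by simulation. The sketch below assumes zero mean white Gaussian noise and independent power estimates, as in section 3; the function name `ofactor_monte_carlo` and the Monte Carlo approach itself are illustrative assumptions and do not reproduce the analytical evaluation behind Figure 4.

```python
import random

def ofactor_monte_carlo(N, L_w, trials=2000, sigma=1.0, seed=0):
    """Estimate E{P_x} / E{P_min} for the minimum of L_w independent power estimates."""
    rng = random.Random(seed)
    total_min = 0.0
    for _ in range(trials):
        # each power estimate is a sum of N squared Gaussian samples (equation 4)
        total_min += min(
            sum(rng.gauss(0.0, sigma) ** 2 for _ in range(N))
            for _ in range(L_w)
        )
    mean_min = total_min / trials
    return N * sigma * sigma / mean_min   # E{P_x} = N * sigma^2
```

The factor grows with $L_w$, since a minimum over more estimates lies further below the mean, and shrinks with $N$, since longer power windows reduce the variance of each estimate.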
`
ACKNOWLEDGMENT
Part of this work was supported by Philips Kommunikations Industrie,
Germany. Spectral subtraction using minimum power estimates was
investigated by Peter Kocybik.
`
`References
`
[1] R. McAulay and M. Malpass: "Speech Enhancement Using a Soft-Decision
Noise Suppression Filter", IEEE Trans. ASSP, Vol. 28, No. 2,
pp. 137-145, April 1980.
[2] A. Papoulis: "Probability, Random Variables, and Stochastic
Processes", 2nd ed., McGraw-Hill, 1984.
`