Estimation of Noise Spectrum and its Application to SNR-Estimation and Speech Enhancement

H. Günter Hirsch

Technical Report TR-93-012

International Computer Science Institute,
Berkeley, California, USA
`
`IPR No. 2017-00627
`Apple Inc. v. Andrea Electronics Inc. - Ex. 1032, p. 1
Contents

1. Introduction
2. Principal Idea
3. Practical Realization
4. Signal-to-Noise Ratio (SNR) Estimation
5. Speech Enhancement
6. Conclusions
7. References

1. Introduction

At the time of this writing, some experiments in robust speech recognition had already been done at ICSI. The original goal of this work was improved recognition of speech recorded with different microphones and transmitted over channels with different frequency characteristics. One practical application is the recognition of speech recorded over telephone lines, where microphones and channels have different transmission characteristics. It could be shown that recognition rates improve when the logarithmic spectral envelopes in subbands are high-pass filtered /1/.

This idea is based on the fact that a frequency characteristic corresponds to a multiplication of the speech spectrum by the frequency response of the transmission channel. The result is a constant additive component in the logarithmic spectral envelopes in subbands (assuming a nearly constant transmission characteristic). High-pass filtering therefore suppresses these constant components.
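The argument can be checked numerically. The following sketch is illustrative only: the envelope values and the channel gain are made-up numbers, and subtracting the temporal mean stands in for the high-pass filter of /1/, which is not reproduced here.

```python
import numpy as np

def remove_log_offset(envelope):
    """Subtract the temporal mean of the log envelope (a crude high-pass)."""
    log_env = np.log(envelope)
    return log_env - log_env.mean()

# Hypothetical subband envelope over time and a fixed channel gain.
speech = np.array([1.0, 4.0, 2.0, 8.0, 0.5])
channel_gain = 0.3

clean = remove_log_offset(speech)
filtered = remove_log_offset(channel_gain * speech)

# The channel only adds log(channel_gain) to every log value,
# so removing the constant component makes both results identical.
print(np.allclose(clean, filtered))
```

The same reasoning does not hold for additive noise, which is why the magnitude domain is treated separately in the text.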

Another aspect is the superposition of noise in many applications of speech recognizers in real environments, e.g. voice dialing in a car or operating machines on the street or in workshops. This noise results in a nearly constant additive component in the magnitude spectral envelopes in subbands (assuming nearly stationary noise). It could be shown that recognition rates can be improved by high-pass filtering the magnitude spectral envelopes /2/.

Additive noise as well as a certain frequency characteristic are present in many real situations. One way to handle both effects could be to combine processing in the magnitude domain with processing in the logarithmic spectral domain. Another possibility could be processing somewhere between the magnitude and the logarithmic domain, depending on the amount of noise in the specific situation. This would presuppose an estimation of the signal-to-noise ratio (SNR).

Looking at the first possibility, several well-known processing techniques reduce the noise in the magnitude spectral domain. One is the high-pass filtering already mentioned. A disadvantage of this method is the suppression of certain spectral features in speech segments. A high-pass filter that totally suppresses the DC component removes not only the constant noise components but also the constant component of the speech. As a result, the spectral features of phonemes with less energy are reduced when they follow a phoneme with higher energy and spectral components in the same subbands.

One solution could be a kind of nonlinear filtering that preserves the spectral features of the phonemes with less energy while suppressing the noise components. Another method to reduce the noise is the well-known spectral subtraction technique /3/, /4/. This technique is based on estimating the noise spectrum during speech pauses and on adaptive filtering with the estimated noise spectrum. A major disadvantage is that speech pauses must be detected to estimate the noise spectrum. This is a very difficult and ultimately unsolved problem for realistic situations with a varying noise level. Another disadvantage is that the algorithm cannot adapt to a varying noise level during segments of speech: the adaptive filtering is always based on the estimated noise spectrum of the preceding speech pause.

An improvement of the spectral subtraction technique would be an estimation of the noise spectrum that does not require speech pause detection. Such a method is presented in this report.

One application of this method presented in this report is the estimation of the actual SNR of a noisy signal. Furthermore, the technique is applied to speech enhancement based either on spectral subtraction or on a nonlinear high-pass filtering of the spectral envelopes that depends on the actual SNR.

2. Principal Idea

The principal idea for estimating the noise level in a certain subband is based on a statistical analysis of a segment of the magnitude spectral envelope.

Looking at such a spectral envelope in figure 2.1 and the corresponding distribution density function in figure 2.2, the most commonly occurring spectral magnitude value is zero. The spectral envelope was calculated for a clean speech signal with a duration of about 4.5 s in a subband at about 500 Hz. The distribution density function was calculated for the whole duration of 4.5 s with an accuracy (interval width) of about 1 percent of the maximum spectral value inside this subband. The function is shown for the range of 0 to 50 percent of the maximum. Only a few values higher than 50 percent occur.

Figure 2.1: Spectral envelope in a band with a centre frequency of 500 Hz (spectral magnitude versus time/s)

Figure 2.2: Distribution density function for the spectral envelope in figure 2.1 (distribution density versus magnitude/maximum)

Noise was added artificially to this speech signal to produce different SNRs. The results can be seen in figure 2.3. The noise was band-limited Gaussian noise with a centre frequency of 500 Hz and a bandwidth of 200 Hz.

The most frequently occurring value, i.e. the location of the maximum of the distribution function, increases as the SNR decreases. This most frequently occurring value can be taken as an estimate of the noise level inside this band. The variance of the spectral magnitude values of the noise also increases with decreasing SNR. Because of the broad distribution, the estimate is less accurate for channels with a low SNR. Reducing the accuracy for low SNRs smooths the distribution function and improves the detection of the maximum. On the other hand, the accuracy has to be high for channels with less noise to obtain a reasonable estimate of the amount of noise at all. For this reason the accuracy of the distribution density function is made dependent on the actual SNR inside a channel: it is lower for channels with a bad SNR and higher for channels with a better SNR.

Figure 2.3: Spectral envelopes and distribution functions for different SNRs (panels: SNR = 15 dB, SNR = 5 dB, SNR = -5 dB)

Some channels do not show the good behaviour seen in figure 2.3. One result can be seen in figure 2.4 for a channel with a high centre frequency of about 2950 Hz. There is nearly no speech energy but significant noise energy in this channel. The noise was Gaussian noise in this case.

Figure 2.4: Spectral envelope and the corresponding distribution density function of a frequency band with a centre frequency of 2590 Hz

In these cases the maximum, and hence the noise level, is sometimes found at an unrealistically high value. The estimated noise level is therefore limited to the average spectral value, since no noise energy can occur that is higher than the total energy inside a band.
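The limitation amounts to a single comparison. A minimal sketch follows; the function name is hypothetical and the mode estimate is assumed to come from a separate histogram analysis.

```python
import numpy as np

def limit_noise_level(mode_estimate, band_values):
    """Cap a histogram-based noise estimate at the mean magnitude of the band."""
    return min(float(mode_estimate), float(np.mean(band_values)))
```

For example, `limit_noise_level(5.0, [1.0, 2.0, 3.0])` returns the band average, while an estimate below the average is passed through unchanged.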

Another channel with nonideal behaviour is shown in figure 2.5. This is a channel with a centre frequency of about 219 Hz where the spectral magnitude of the signal often takes a very high value.

Figure 2.5: Spectral envelope in a subband with a centre frequency of 219 Hz (spectral magnitude versus time/s)

A problem occurs for this envelope when the statistical analysis uses a short time window of e.g. 500 ms around the time of 2 s. The signal takes almost exclusively high values inside this window, so that it is impossible to estimate a reasonable noise level. To avoid this problem, channels with such behaviour must be detected and the analysis window extended in these cases. Usually these are only channels with a centre frequency below 500 Hz and with high energy. Both criteria are used to detect channels whose behaviour may be similar to the spectral envelope of figure 2.5.

3. Practical Realization

In a practical application the calculation must be done online with running speech. The length of the window used for calculating the distribution density function has to be defined. On the one hand, the window should be as long as possible to increase the accuracy of the noise level estimation. On the other hand, it must not exceed a certain duration if a signal with a varying noise level is to be analyzed. The spectral analysis is done with a universal program for short-term spectral analysis.

The FFT length used was 256, so that the centre frequencies of the estimated spectrum are spaced 31.25 Hz apart at a sampling frequency of 8000 Hz. The window for weighting the speech samples was a Kaiser window multiplied by a sinc function. The influence of different window types was not examined, but it can be assumed to be small. The spectrum is calculated every 8 ms, so that the magnitude spectral values inside one band are given with a sampling frequency of 125 Hz.
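The analysis parameters above can be put together in a short sketch. It is not the analysis program used for the report: a plain Kaiser window with an assumed shape parameter `beta` replaces the Kaiser-times-sinc window, whose parameters are not given, and the function name is hypothetical.

```python
import numpy as np

FS = 8000     # sampling frequency in Hz (from the text)
NFFT = 256    # FFT length (from the text)
HOP = 64      # 8 ms frame shift at 8 kHz

def magnitude_spectrogram(signal, beta=6.0):
    """Short-term magnitude spectra, one row per 8 ms frame."""
    signal = np.asarray(signal, dtype=float)
    window = np.kaiser(NFFT, beta)            # assumed window shape
    n_frames = 1 + (len(signal) - NFFT) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + NFFT] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

delta_f = FS / NFFT       # 31.25 Hz between band centres
frame_rate = FS / HOP     # 125 spectra per second
```

Each column of the returned array is the magnitude envelope of one band, sampled at 125 Hz, which is the input to the histogram analysis described next.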

The first estimate of the noise spectrum is the average of the first ten magnitude spectral values, calculated within each band. It is assumed that the first incoming samples of a recording contain only noise. This noise estimate is used up to 250 ms after the start of the recording. Afterwards a window of increasing length is used, up to the final window length, for the calculation of the distribution density function. We finally considered window lengths of 250 ms, 500 ms, 1 s and 2 s.
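The start-up behaviour can be sketched as below. How exactly the window grows after the first 250 ms is not specified in the text; the sketch assumes it simply covers all frames seen so far until the final length is reached. Function names are hypothetical.

```python
import numpy as np

FRAME_RATE = 125    # spectra per second (8 ms frame shift)

def initial_noise_estimate(spectrogram, n_frames=10):
    """Average of the first magnitude spectra, computed per band."""
    return np.asarray(spectrogram, dtype=float)[:n_frames].mean(axis=0)

def analysis_window(frame_index, final_window_s=0.5, initial_s=0.25):
    """Number of past frames to analyse; 0 while the initial estimate is in use."""
    if frame_index < initial_s * FRAME_RATE:
        return 0
    final_frames = int(final_window_s * FRAME_RATE)
    # Assumption: the window covers every frame so far, up to the final length.
    return min(frame_index + 1, final_frames)
```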

The distribution density function is calculated for the magnitude spectral values inside the window for each band. The function is computed in a range from 0 to the maximum spectral value found in this band up to this time. The accuracy is 0.25 % of the maximum, which corresponds to dividing the whole range into 400 intervals. The search for the maximum peak value first uses an accuracy of 2 % for the distribution function. This is done by summing 8 neighboring values of the function, which corresponds to a smoothing of the distribution density function.

If the detected maximum lies above 10 % of the maximum spectral value inside this band, the estimated value is taken directly. If it lies in the range of 5 to 10 %, an accuracy of 1 % is used; in the range of 2.5 to 5 %, an accuracy of 0.5 %; and below 2.5 %, the highest available accuracy of 0.25 % is used for a more accurate noise level estimation. The reason is that a more strongly smoothed version of the distribution function should be used for the noise level estimation when the SNR inside a band is low, whereas the resolution has to be higher for high SNRs, where there is only a small amount of noise in a channel. The analysis intervals and the accuracy were fixed empirically.
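The two-stage peak search can be sketched as follows. The thresholds and interval counts are taken from the text; the function name is hypothetical, and it is an assumption of this sketch that the refined search again covers the whole range rather than only the neighbourhood of the first peak.

```python
import numpy as np

def estimate_noise_level(values, max_value, n_fine=400):
    """Return the most frequent magnitude in one band as the noise level."""
    values = np.asarray(values, dtype=float)
    hist, _ = np.histogram(values, bins=n_fine, range=(0.0, max_value))
    fine_width = max_value / n_fine            # 0.25 % of the maximum

    def peak(group):
        # Sum `group` neighbouring intervals (smoothing), then locate the mode.
        coarse = hist.reshape(-1, group).sum(axis=1)
        idx = int(np.argmax(coarse))
        return (idx + 0.5) * group * fine_width   # centre of the winning interval

    level = peak(8)                  # first pass with 2 % accuracy
    relative = level / max_value
    if relative > 0.10:
        return level                 # taken directly
    if relative > 0.05:
        return peak(4)               # 1 % accuracy
    if relative > 0.025:
        return peak(2)               # 0.5 % accuracy
    return peak(1)                   # 0.25 % accuracy
```

With a noise floor at about 31 % of the band maximum the coarse estimate is returned directly, whereas a floor near 1 % of the maximum triggers the finest search.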

The following processing is done to avoid a poor estimate of the noise level in some low-frequency bands where the spectral magnitude very often takes a high value (as mentioned in the preceding section). The five channels with the highest energy over the past 1 s are determined. An upper limit for the noise level inside each of these bands is estimated by averaging a certain number of the smallest spectral values of the last second. Then a distribution density function is calculated with a variable window length: up to 50 past values that lie between 0 and the upper limit of the noise level are considered. The maximum of the distribution density function is determined as described before.

The estimated magnitude spectral values are smoothed in all cases by computing a weighted sum of the current estimated noise level and all past estimates. The weights decay exponentially.
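A weighted sum with exponentially decaying weights is commonly computed recursively. The sketch below shows that form; the smoothing constant `alpha` is a hypothetical value, since the report does not state one.

```python
def smooth_estimates(estimates, alpha=0.9):
    """Recursive smoothing: the weight of an estimate k frames back decays as alpha**k."""
    smoothed = []
    state = None
    for value in estimates:
        # The first estimate initialises the state; later ones are blended in.
        state = value if state is None else alpha * state + (1.0 - alpha) * value
        smoothed.append(state)
    return smoothed
```

A larger `alpha` gives a smoother but slower noise estimate, which trades off against the ability to follow a changing noise level.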

A result of this kind of noise estimation can be seen in figure 3.2, which shows the average estimated spectrum of a noise signal. The estimation was done after adding the noise signal to a nearly clean speech signal. The length of the analysis window for the distribution density function was 500 ms. The average spectrum of the noise itself, averaged over the whole noise signal, is shown in figure 3.1.

The estimation of the noise spectrum appears to work well. However, some overestimation can be seen for a high-energy frequency component at about 750 Hz.

Some further estimated noise spectra are shown in figure 3.3 for additive noise resulting in different SNRs. The estimation seems to be nearly independent of the SNR. Some small differences can be seen for the high SNR of 25 dB. The reason is the influence of the “clean” speech signal itself, which was recorded at a SNR of about 35 to 40 dB. The noise of the “clean” speech can already be seen, for example, in the region of higher frequencies.

A spectrum of another noise signal and some corresponding noise spectral estimates are shown in figure 3.4.

Figure 3.1: Average magnitude spectrum of a noise signal (abscissa: frequency/Hz)

Figure 3.2: Average estimated magnitude noise spectrum of a noisy speech signal with a SNR of 5 dB (abscissa: frequency/Hz)

Figure 3.3: Estimated magnitude noise spectra for different SNRs (panels: SNR = -5 dB, SNR = 15 dB, SNR = 25 dB; abscissa: frequency/Hz)

Figure 3.4: Average noise and estimated noise spectra for different SNRs (panels: noise spectrum, SNR = -5 dB, SNR = 5 dB, SNR = 15 dB, SNR = 25 dB)

4. Signal-to-Noise Ratio (SNR) Estimation

The estimate of the noise spectrum can be used for an estimation of the SNR (signal-to-noise ratio). Despite the well-known engineering meaning of the acronym, SNR must first be defined in further detail.

In the field of speech coding the term SNR is often used in the sense of a so-called segmental SNR, which describes the ratio of speech energy to noise energy for a short time period, e.g. 20 ms. Often, however, the term is interpreted as a long-term measurement, especially for stationary noise situations. In this case the average energy of the speech is usually calculated over a longer time period of at least some seconds, including speech pauses. The noise energy is calculated for the same time period, assuming that the noise is nearly stationary during this time. This can be done ideally when a speech and a noise signal are mixed artificially at a chosen SNR.

In this application the term SNR can apply to any of a range of temporal scales. On the one hand it should be related more to the long-term SNR. On the other hand it should be possible to follow slowly changing noise situations as they occur in real situations.

Given a real noisy signal, it is difficult to estimate the SNR directly without any specific knowledge about the speech or noise energy. For this reason the noise to signal-plus-noise ratio N/(S+N) was considered instead of the signal-to-noise ratio itself.

An estimate of the short-term energy Nenergy of the noise at a specific time t is calculated with Parseval’s relation:

$$N_{\mathrm{energy}}(t) = \frac{1}{n_{\mathrm{fft}}} \sum_{i=0}^{n_{\mathrm{fft}}} N_{\mathrm{spec}}(i\Delta f, t)$$

where Nspec(iΔf, t) is the estimated spectral magnitude of the noise in a subband with a centre frequency of iΔf, Δf is the sampling frequency divided by nfft, and nfft is the FFT length.

The short-term energy of the noisy signal x is calculated in the same way:

$$X_{\mathrm{energy}}(t) = \frac{1}{n_{\mathrm{fft}}} \sum_{i=0}^{n_{\mathrm{fft}}} X_{\mathrm{spec}}(i\Delta f, t)$$

where Xspec is the spectral magnitude of the noisy speech signal.
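Both energies follow from the same summation over the bands of one spectrum. In the sketch below the function name is hypothetical. The formula as it survives in this copy shows no exponent on the magnitudes, whereas Parseval's relation is normally stated for squared magnitudes; the sketch therefore leaves the exponent as a parameter instead of deciding the question.

```python
import numpy as np

def short_term_energy(spectrum, nfft, exponent=1):
    """Sum of the spectral magnitudes raised to `exponent`, divided by nfft.

    exponent=1 follows the formula as printed in this copy;
    exponent=2 sums squared magnitudes, as in the usual form of Parseval's relation.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    return float(np.sum(spectrum ** exponent) / nfft)
```

Whichever exponent is used, it has to be the same for the noise estimate and for the noisy signal so that their ratio remains meaningful.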
The average energy of the noisy signal cannot be calculated over a longer period of speech (e.g. several seconds) in this application, because the interactive character of the task precludes a long initial delay.

The average energy X(t) is calculated for a past segment whose length corresponds to the window length used for the distribution density function in the noise level estimation.

If the current value of this average energy X(t) were always used, the ratio N(t)/X(t) would be a kind of segmental N/(S+N) ratio. Instead, the maximum of the energy X(t) up to this time is used. This maximum is slowly decreased with an exponential decay so that it adapts to an overall change of the signal level and a local peak value of X is not used over a long time.
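The decaying maximum can be sketched as a simple peak tracker. The per-frame decay factor is a hypothetical value, as the report does not give one, and the inputs are assumed to be the already computed sequences N(t) and X(t).

```python
def track_ratio(noise_energy, signal_energy, decay=0.999):
    """Return N(t) divided by a slowly decaying running maximum of X(t)."""
    ratios = []
    x_max = 0.0
    for n, x in zip(noise_energy, signal_energy):
        x_max = max(x, decay * x_max)      # hold the peak, let it leak slowly
        ratios.append(n / x_max if x_max > 0 else 0.0)
    return ratios
```

With an artificially fast decay the behaviour is easy to follow: `track_ratio([1, 1, 1], [2, 10, 2], decay=0.5)` divides by 2, then by the new peak 10, then by the decayed peak 5.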

A result for the estimation of N(t)/X(t) can be seen in figure 4.3. A sentence with a duration of about 7 s was used as the speech signal. Gaussian noise was added artificially at a SNR of 5 dB. The time signal is shown in figure 4.1. The estimate of the noise level N(t) itself can be seen in figure 4.2. The noise estimation as well as the calculation of X(t) were done within a window of 1 s.

Figure 4.1: Time signal of a noisy sentence (amplitude versus time/s)

Figure 4.2: Estimation of the noise level N(t) for the signal of figure 4.1 (N(t) versus time/s)

Figure 4.3: Estimation of the relation N(t)/X(t) for the signal of figure 4.1 (N(t)/X(t) versus time/s)

The noise level estimate seems to be slightly too high at the beginning of the speech. One reason is noise inside the speech signal itself, caused by the breathing of the speaker. The ratio N(t)/X(t) takes a high value of about 0.7 at the beginning because there is almost only noise. The curve then falls rapidly when the first high values of X(t) are calculated at the beginning of speech activity. Later on the ratio takes a nearly constant value of about 0.07.

Figure 4.4: Estimation of the relation N(t)/X(t) for different SNRs (panels: SNR = -5 dB, SNR = 15 dB, SNR = 25 dB; N/X versus time/s)

The maximum of the average energy Xenergy(t) is not the same as the long-term average of X; it is higher than a long-term average. For this reason the ratio does not take the value of 0.24 in the constant part of the curve in figure 4.3, which it ideally should take for a SNR of 5 dB. Some further results using the signal of figure 4.1 are shown in figure 4.4 for different SNRs.
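The ideal value of 0.24 follows from the definition of the SNR as the energy ratio S/N in dB, which gives N/(S+N) = 1/(1 + 10^(SNR/10)). A short check of this arithmetic, with hypothetical function names:

```python
import math

def ideal_noise_ratio(snr_db):
    """N/(S+N) for a given long-term SNR in dB."""
    return 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))

def snr_from_noise_ratio(ratio):
    """Inverse mapping: SNR in dB from N/(S+N), valid for 0 < ratio < 1."""
    return 10.0 * math.log10(1.0 / ratio - 1.0)

print(round(ideal_noise_ratio(5.0), 2))   # 0.24
```

The measured ratios are lower than this ideal curve because the decaying maximum of X is used in the denominator, so the inverse formula cannot be applied to them without a calibration.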

Figure 4.5: Average estimation of the relation N/X for different noise signals (N/X versus SNR/dB)

Figure 4.6: Average estimation of the relation N/X for different noise signals (N/X versus SNR/dB)

An average of these estimated values is calculated for the last 4 s of the signal, corresponding to the nearly constant part of the curve. The result of this averaging is shown in figures 4.5 and 4.6 for different SNRs and different noise signals, where a window length of 500 ms was used for the noise estimation.

Stationary segments of two naturally recorded signals and four artificially generated signals were used as noise signals. The results are shown for a 40 dB range in dynamics. The noise signals were

1) car noise
2) computer room noise
3) white Gaussian noise within a bandwidth of 0 to 4 kHz
4) white Gaussian noise within a bandwidth of 0 to 0.5 kHz
5) white Gaussian noise within a bandwidth of 0 to 1 kHz
6) white Gaussian noise within a bandwidth of 2 to 4 kHz

The accuracy of the estimation decreases slightly for high SNRs, but overall a high correlation can be seen for the different noise signals. The accuracy is in a range of about 1 to 2 dB. The results can be used to realize a mapping from the estimated Nenergy/Xenergy to the real SNR.

The results when using different window lengths for the noise level estimation are shown in figure 4.7.

The scaling of the ordinates differs for the different window lengths because X is calculated in the corresponding window. A much higher maximum of X occurs for the short window of 250 ms, which is nearly comparable with the estimation of the energy of a vowel. No big difference can be seen between the window lengths when comparing the correlations of SNR to the computed N/(S+N). However, these curves are the result of averaging over 4 s, so that the influence of temporal fluctuations cannot be seen.

Figure 4.7: Average estimation of the relation N/X for different window lengths (panels: window length = 250 ms, 1 s, 2 s)

Some experiments were done adding noise with a varying SNR. A modulated Gaussian noise was added to the speech signal shown in figure 4.1 with an overall SNR of 10 dB. The modulation signal itself can be seen in figure 4.8. The result for the estimation of N(t)/X(t) is shown in figure 4.9 using an analysis window of 500 ms.

Figure 4.8: Time signal for the modulation of a Gaussian noise (abscissa: time/s)

Figure 4.9: Estimated N(t)/X(t) for a signal disturbed by a modulated Gaussian noise (window length: 500 ms; N(t)/X(t) versus time/s)

The estimate of N/X follows this artificial modulation characteristic quite well. A delay of about 500 ms can be observed because the analysis window lies in the past.

The results for different window lengths are shown in figure 4.10.

Figure 4.10: Estimated N(t)/X(t) for different window lengths (panels: window length 250 ms, 1 s, 2 s; N(t)/X(t) versus time/s)

The delay is smaller for an analysis window of 250 ms, but some errors occur in the noise level estimation. With a 1 s window the curve does not fit the modulation characteristic as well as with 500 ms. A window of 2 s is too long to follow the varying noise level.

The length of the analysis window should be chosen for the particular noisy situations to which the processing is applied. A length of 500 ms seems to be a good compromise for the cases we have examined.

Experiments with naturally recorded noisy speech signals have shown a good agreement of the estimated SNRs with the expected curves.

5. Speech Enhancement

The estimation of noise spectra can also be used for speech enhancement. Two applications were considered in this study. One is the well-known spectral subtraction technique. The other is based on a modified high-pass filtering of the spectral envelopes in subbands.

Only the magnitude spectral values are processed in both cases. The phase of the noisy speech is used for the resynthesis. An existing program called “synthese” was used for the resynthesis, in which time signals are generated from processed spectra with the overlap-add method.

The noise spectrum estimation described above (section 3) is used for the spectral subtraction. The subtraction is applied in 128 bands. The estimated magnitude noise spectral value N(iΔf) is used for an adaptive weighting of X(iΔf), the spectral magnitude of the noisy speech, in each subband with a centre frequency iΔf.

An estimate of the magnitude component of the speech is calculated as

$$\hat{S}(i\Delta f) = \left(1 - \frac{N(i\Delta f)}{X(i\Delta f)}\right) X(i\Delta f)$$

The contour of X(iΔf) is usually smoothed with an exponentially decaying weighting of past values.

Various modifications of the weighting function 1 - (N(iΔf)/X(iΔf)) are possible /4/, /5/, e.g. the realization as a Wiener filter.

Negative weighting factors can occur. One solution is to set these factors to zero. The time signals of clean speech, of noisy speech to which car noise was added at a SNR of 5 dB, and of the processed noisy speech are shown in figure 5.1.
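The weighting, including the clipping of negative factors and the reuse of the noisy phase, can be sketched for a single spectrum as follows. The function name is hypothetical, the guard for empty bins is an addition of this sketch, and the smoothing of the contour of X is left out.

```python
import numpy as np

def spectral_subtraction(noisy_spectrum, noise_magnitude):
    """Weight each bin by max(1 - N/X, 0) while keeping the noisy phase."""
    noisy_spectrum = np.asarray(noisy_spectrum, dtype=complex)
    noise_magnitude = np.asarray(noise_magnitude, dtype=float)
    x_mag = np.abs(noisy_spectrum)
    # Bins with zero magnitude get a ratio of 1, i.e. a weight of 0
    # (a guard against division by zero added for this sketch).
    ratio = np.divide(noise_magnitude, x_mag,
                      out=np.ones_like(x_mag), where=x_mag > 0)
    weight = np.maximum(1.0 - ratio, 0.0)     # negative factors are set to zero
    return weight * noisy_spectrum            # same phase, reduced magnitude
```

Multiplying the complex spectrum by a real, non-negative weight changes only its magnitude, so the phase of the noisy speech is carried over to the resynthesis automatically.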

A considerable improvement of the SNR can be seen. However, when listening to the resynthesized speech one can hear a new artificial noise that is often referred to as “musical tones”. This degradation, common to enhancement techniques based on spectral subtraction, significantly disturbs the subjective impression. However, earlier experiments /6/ had shown that this artificial noise has little influence on the recognition rates of an isolated word recognizer. Thus, recognition rates could be improved by introducing the spectral subtraction technique.

Figure 5.1: Time signals of clean and noisy speech and after processing with a spectral subtraction technique (three panels; abscissa: time/s)

The second speech enhancement algorithm is based on a high-pass filtering of the spectral magnitude contours in subbands. The known filter function from /1/ is used in each subband. When this filtering is applied directly to a spectral trajectory in one subband, many negative values occur, as shown in figure 5.2.

The envelope of clean speech, the envelope of noisy speech and the filtered envelope in a subband with a centre frequency of 500 Hz are shown. Setting all negative values to zero reduces the noise considerably, but certain parts of the speech are suppressed as well.

Because of this the filter structure is modified as shown in figure 5.3.

Figure 5.3: Modified high-pass filtering (block diagram with the elements: spectral magnitude X(t,iΔf), high-pass filter, weights c(t,iΔf) and 1 - c(t,iΔf), adder)

The attenuation of the DC component can be varied in this filter scheme by means of the factor c. In principle the attenuation should be high for noise segments and should depend on the actual SNR for a speech segment. The weighting function 1 - N/X, already used for spectral subtraction, is used for the additional weighting factor c.
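One reading of this scheme is sketched below for the trajectory of a single subband. Several points are assumptions: the unfiltered branch is weighted by c = max(1 - N/X, 0) and the high-pass branch by 1 - c, which gives full DC suppression for pure noise; a first-order DC blocker stands in for the filter function of /1/; and the function names and the constant `r` are hypothetical.

```python
import numpy as np

def dc_blocker(x, r=0.98):
    """First-order high-pass y[t] = x[t] - x[t-1] + r*y[t-1] (stand-in filter)."""
    y = np.zeros(len(x))
    prev_x = x[0]      # start in steady state so a constant input maps to zero
    prev_y = 0.0
    for t, value in enumerate(x):
        prev_y = value - prev_x + r * prev_y
        prev_x = value
        y[t] = prev_y
    return y

def modified_highpass(x_mag, noise_mag):
    """Blend the unfiltered and the high-pass filtered trajectory of one subband."""
    x_mag = np.asarray(x_mag, dtype=float)
    ratio = np.divide(noise_mag, x_mag, out=np.ones_like(x_mag), where=x_mag > 0)
    c = np.clip(1.0 - ratio, 0.0, 1.0)     # assumed: c = 1 - N/X, limited to [0, 1]
    return c * x_mag + (1.0 - c) * dc_blocker(x_mag)
```

Under these assumptions a trajectory that consists of noise only (N/X = 1) is passed entirely through the high-pass branch, while a trajectory with negligible noise is left untouched.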

The spectral trajectories of figure 5.2 are shown again in figure 5.4, this time processed with the modified filter structure.

From this experiment, it appears that the suppression of speech segments is much less of a problem than in the case of a static high-pass filter. The result of processing a whole noisy speech sentence is shown in figure 5.5.

The time signals of the clean, the noisy and the processed speech are plotted. A considerable improvement of the SNR can be obtained with this processing. The generation of musical tones seems to be slightly less of a problem than in the case of spectral subtraction.

Some experiments with automatic recognition for a combination of this technique and RASTA-PLP are currently in progress and will be presented at ICASSP93 for signals corrupted by convolutional and additive noise.

Figure 5.2: Spectral envelope of clean and noisy speech and after RASTA high-pass filtering (three panels; abscissa: time/s)

Figure 5.4: Spectral envelope of clean and noisy speech and after modified high-pass filtering (three panels; abscissa: time/s)

Figure 5.5: Time signals of clean and noisy speech and after processing with the modified high-pass filtering (three panels; abscissa: time/s)

6. Conclusions

A method is presented in this report to estimate the noise spectrum of speech utterances that are disturbed by additive noise. One advantage is that no speech pause detection is required.

The processing is based on a calculation of the distribution density function of spectral magnitude values in a subband. The histogram for one subband is calculated for a past segment with a defined duration. Good results were obtained for a segment of 0.5 s. In this case the noise spectrum estimate can also follow a slowly changing noise.

Two applications of this technique are described in this report. The first is an estimation of the actual SNR of a speech segment. Good results were obtained for a wide range of SNRs and for different noise signals. The second is the enhancement of noisy speech. The enhancement techniques attempted were a new form of spectral subtraction and a modified high-pass filtering of the spectral envelopes in subbands.

7. References

/1/ H. Hermansky, N. Morgan, A. Bayya, P. Kohn: Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP), Eurospeech, 1991, pp. 1367-1370

/2/ H.G. Hirsch, P. Meyer, H.W. Rühl: Improved speech recognition using high-pass filtering of subband envelopes, Eurospeech, 1991, pp. 413-416

/3/ S.F. Boll: Suppression of acoustic noise in speech using spectral subtraction, IEEE ASSP-27, No. 2, 1979, pp. 113-120

/4/ P. Vary: Noise suppression by spectral magnitude estimation - Mechanism and theoretical limits, Signal Processing, 1985, pp. 387-400

/5/ P. Lockwood, J. Boudy: Experiments with a nonlinear spectral subtractor, Hidden Markov Models and the projection, for robust speech recognition in cars, Speech Communication 11, 1992, pp. 215-228

/6/ H.G. Hirsch, H.W. Rühl: Automatic speech recognition in a noisy environment, Eurospeech, 1989, pp. 652-655