IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 2, APRIL 1979

Suppression of Acoustic Noise in Speech Using Spectral Subtraction

STEVEN F. BOLL, MEMBER, IEEE
`
Abstract-A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a preprocessor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
`
I. INTRODUCTION

BACKGROUND noise acoustically added to speech can degrade the performance of digital voice processors used for applications such as speech compression, recognition, and authentication [1], [2]. Digital voice systems will be used in a variety of environments, and their performance must be maintained at a level near that measured using noise-free input speech. To ensure continued reliability, the effects of background noise can be reduced by using noise-cancelling microphones, internal modification of the voice processor algorithms to explicitly compensate for signal contamination, or preprocessor noise reduction.

Noise-cancelling microphones, although essential for extremely high noise environments such as the helicopter cockpit, offer little or no noise reduction above 1 kHz [3] (see Fig. 5). Techniques available for voice processor modification to account for noise contamination are being developed [4], [5]. But due to the time, effort, and money spent on the design and implementation of these voice processors [6]-[8], there is a reluctance to internally modify these systems.

Preprocessor noise reduction [12], [21] offers the advantage that noise stripping is done on the waveform itself with the output being either digital or analog speech. Thus, existing voice processors tuned to clean speech can continue to be used unmodified. Also, since the output is speech, the noise stripping becomes independent of any specific subsequent speech processor implementation (it could be connected to a CCD channel vocoder or a digital LPC vocoder).

The objectives of this effort were to develop a noise suppression technique, implement a computationally efficient algorithm, and test its performance in actual noise environments. The approach used was to estimate the magnitude frequency spectrum of the underlying clean speech by subtracting the noise magnitude spectrum from the noisy speech spectrum. This estimator requires an estimate of the current noise spectrum. Rather than obtain this noise estimate from a second microphone source [9], [10], it is approximated using the average noise magnitude measured during nonspeech activity. Using this approach, the spectral approximation error is then defined, and secondary methods for reducing it are described.

The noise suppressor is implemented using about the same amount of computation as required in a high-speed convolution. It is tested on speech recorded in a helicopter environment. Its performance is measured using the Diagnostic Rhyme Test (DRT) [11] and is demonstrated using isometric plots of short-time spectra.

The paper is divided into sections which develop the spectral estimator, describe the algorithm implementation, and demonstrate the algorithm performance.

Manuscript received June 1, 1978; revised September 12, 1978. This research was supported by the Information Processing Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-77-C-0041.

The author is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.

0096-3518/79/0400-0113$00.75 © 1979 IEEE

II. SUBTRACTIVE NOISE SUPPRESSION ANALYSIS

A. Introduction

This section describes the noise-suppressed spectral estimator. The estimator is obtained by subtracting an estimate of the noise spectrum from the noisy speech spectrum. Spectral information required to describe the noise spectrum is obtained from the signal measured during nonspeech activity. After developing the spectral estimator, the spectral error is computed and four methods for reducing it are presented.

The following assumptions were used in developing the analysis. The background noise is acoustically or digitally added to the speech. The background noise environment remains locally stationary to the degree that its spectral magnitude expected value just prior to speech activity equals its expected value during speech activity. If the environment changes to a new stationary state, there exists enough time (about 300 ms) to estimate a new background noise spectral magnitude expected value before speech activity commences. For the slowly varying nonstationary noise environment, the algorithm requires a speech activity detector to signal the
`
`
program that speech has ceased and a new noise bias can be estimated. Finally, it is assumed that significant noise reduction is possible by removing the effect of noise from the magnitude spectrum only.

Speech, suitably low-pass filtered and digitized, is analyzed by windowing data from half-overlapped input data buffers. The magnitude spectra of the windowed data are calculated and the spectral noise bias calculated during nonspeech activity is subtracted off. Resulting negative amplitudes are then zeroed out. Secondary residual noise suppression is then applied. A time waveform is recalculated from the modified magnitude. This waveform is then overlap added to the previous data to generate the output speech.
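The analysis-modify-synthesis loop just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the Hanning window, and the 256-point/half-overlap framing are assumptions, and the noise bias is taken as a given magnitude vector.

```python
import numpy as np

def spectral_subtraction(x, noise_bias, frame_len=256, hop=128):
    """Window half-overlapped frames, subtract the spectral noise bias
    from each magnitude spectrum, zero any resulting negative amplitudes,
    and overlap-add the resynthesized frames into the output waveform.
    `noise_bias` is the average noise magnitude spectrum (length frame_len)
    measured during nonspeech activity."""
    win = np.hanning(frame_len)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win
        X = np.fft.fft(frame)
        mag = np.maximum(np.abs(X) - noise_bias, 0.0)  # subtract bias, zero negatives
        X_hat = mag * np.exp(1j * np.angle(X))         # keep the noisy phase
        out[start:start + frame_len] += np.real(np.fft.ifft(X_hat))
    return out
```

With a zero bias the loop reduces to plain windowed overlap-add analysis-synthesis; secondary residual suppression (Section G) would slot in between the subtraction and the inverse transform.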
B. Additive Noise Model

Assume that a windowed noise signal n(k) has been added to a windowed speech signal s(k), with their sum denoted by x(k). Then

x(k) = s(k) + n(k).

Taking the Fourier transform gives

X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega})

where

x(k) \leftrightarrow X(e^{j\omega})

X(e^{j\omega}) = \sum_{k=0}^{L-1} x(k) e^{-j\omega k}

x(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega k} \, d\omega.
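Because the transform is linear, the spectrum of the noisy signal is exactly the sum of the clean and noise spectra. A quick numerical check of this identity (the random stand-ins for windowed speech and noise are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(256)   # stand-in for a windowed speech frame
n = rng.standard_normal(256)   # stand-in for a windowed noise frame
x = s + n                      # additive model: x(k) = s(k) + n(k)

# Linearity of the DFT: X(e^{jw}) = S(e^{jw}) + N(e^{jw})
assert np.allclose(np.fft.fft(x), np.fft.fft(s) + np.fft.fft(n))
```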
C. Spectral Subtraction Estimator

The spectral subtraction filter H(e^{j\omega}) is calculated by replacing the noise spectrum N(e^{j\omega}) with spectra which can be readily measured. The magnitude |N(e^{j\omega})| of N(e^{j\omega}) is replaced by its average value \mu(e^{j\omega}) taken during nonspeech activity, and the phase \theta_N(e^{j\omega}) of N(e^{j\omega}) is replaced by the phase \theta_x(e^{j\omega}) of X(e^{j\omega}). These substitutions result in the spectral subtraction estimator \hat{S}(e^{j\omega}):

\hat{S}(e^{j\omega}) = [|X(e^{j\omega})| - \mu(e^{j\omega})] e^{j\theta_x(e^{j\omega})}

or

\hat{S}(e^{j\omega}) = H(e^{j\omega}) X(e^{j\omega})

with

H(e^{j\omega}) = 1 - \frac{\mu(e^{j\omega})}{|X(e^{j\omega})|}

\mu(e^{j\omega}) = E\{|N(e^{j\omega})|\}.
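The two forms of the estimator are algebraically identical, since H·X = X − μ·X/|X| = (|X| − μ)·e^{jθ_x}. A small sketch (function name is an assumption) that applies the filter form and can be checked against the magnitude-and-phase form:

```python
import numpy as np

def subtract_estimator(X, mu):
    """Spectral subtraction estimator: applies H(w) = 1 - mu(w)/|X(w)| to
    the noisy spectrum X, which equals (|X| - mu) * exp(j * theta_x).
    `mu` is the expected noise magnitude E{|N|} per frequency bin."""
    H = 1.0 - mu / np.abs(X)
    return H * X
```

Note that H can go negative wherever |X| < μ; half-wave rectification (Section F) addresses exactly that case.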
D. Spectral Error

The spectral error \epsilon(e^{j\omega}) resulting from this estimator is given by

\epsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega}) e^{j\theta_x}.
`
A number of simple modifications are available to reduce the auditory effects of this spectral error. These include: 1) magnitude averaging; 2) half-wave rectification; 3) residual noise reduction; and 4) additional signal attenuation during nonspeech activity.
E. Magnitude Averaging

Since the spectral error equals the difference between the noise spectrum N and its mean \mu, local averaging of spectral magnitudes can be used to reduce the error. Replacing |X(e^{j\omega})| with \overline{|X(e^{j\omega})|}, where

\overline{|X(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |X_i(e^{j\omega})|

X_i(e^{j\omega}) = i\text{th time-windowed transform of } x(k),

gives

\hat{S}(e^{j\omega}) = [\overline{|X(e^{j\omega})|} - \mu(e^{j\omega})] e^{j\theta_x(e^{j\omega})}.

The rationale behind averaging is that the spectral error becomes approximately

\epsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) \cong \overline{|N|} - \mu

where

\overline{|N(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |N_i(e^{j\omega})|.

Thus, the sample mean of |N(e^{j\omega})| will converge to \mu(e^{j\omega}) as a longer average is taken.
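The per-bin average of magnitude spectra over M adjacent frames can be sketched as below (function name and array layout are assumptions); averaging identical frames leaves the magnitude unchanged, while averaging noise frames reduces the bin-to-bin fluctuation, which is the variance-reduction argument above.

```python
import numpy as np

def averaged_magnitude(frames):
    """|X_bar(w)| = (1/M) * sum_i |X_i(w)| over M adjacent windowed frames.
    `frames` has shape (M, frame_len); the averaged magnitude would replace
    |X| in the subtraction estimator."""
    specs = np.fft.fft(frames, axis=1)   # one spectrum per frame
    return np.mean(np.abs(specs), axis=0)
```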
`
The obvious problem with this modification is that the speech is nonstationary, and therefore only limited time averaging is allowed. DRT results show that averaging over more than three half-overlapped windows with a total time duration of 38.4 ms will decrease intelligibility. Spectral examples and DRT scores with and without averaging are given in the "Results" section. Based upon these results, it appears that averaging coupled with half rectification offers some improvement. The major disadvantage of averaging is the risk of some temporal smearing of short transitory sounds.
F. Half-Wave Rectification

For each frequency \omega where the noisy signal spectrum magnitude |X(e^{j\omega})| is less than the average noise spectrum magnitude \mu(e^{j\omega}), the output is set to zero. This modification can be simply implemented by half-wave rectifying H(e^{j\omega}). The estimator then becomes

\hat{S}(e^{j\omega}) = H_R(e^{j\omega}) X(e^{j\omega})

where

H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2}.

The input-output relationship between X(e^{j\omega}) and \hat{S}(e^{j\omega}) at each frequency \omega is shown in Fig. 1.
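Since (H + |H|)/2 equals max(H, 0), the rectified filter clamps the gain to zero in every bin where the noisy magnitude falls below the noise bias. A small sketch (function name is an assumption):

```python
import numpy as np

def rectified_filter(X, mu):
    """Half-wave rectified spectral subtraction: H_R = (H + |H|)/2 with
    H = 1 - mu/|X|. Bins where |X| < mu get zero gain instead of a
    negative one; elsewhere the ordinary subtraction gain applies."""
    H = 1.0 - mu / np.abs(X)
    H_R = (H + np.abs(H)) / 2.0   # identical to np.maximum(H, 0)
    return H_R * X
```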
Thus, the effect of half-wave rectification is to bias down the magnitude spectrum at each frequency \omega by the noise bias determined at that frequency. The bias value can, of course,
`
change from frequency to frequency as well as from analysis time window to time window. The advantage of half rectification is that the noise floor is reduced by \mu(e^{j\omega}). Also, any low variance coherent noise tones are essentially eliminated. The disadvantage of half rectification can exhibit itself in the situation where the sum of the noise plus speech at a frequency \omega is less than \mu(e^{j\omega}). Then the speech information at that frequency is incorrectly removed, implying a possible decrease in intelligibility. As discussed in the section on "Results," for the helicopter speech data base this processing did not reduce intelligibility as measured using the DRT.

Fig. 1. Input-output relation between X(e^{j\omega}) and \hat{S}(e^{j\omega}).

G. Residual Noise Reduction

After half-wave rectification, speech plus noise lying above \mu remain. In the absence of speech activity the difference N_R = N - \mu e^{j\theta_N}, which shall be called the noise residual, will for uncorrelated noise exhibit itself in the spectrum as randomly spaced narrow bands of magnitude spikes (see Fig. 7). This noise residual will have a magnitude between zero and a maximum value measured during nonspeech activity. Transformed back to the time domain, the noise residual will sound like the sum of tone generators with random fundamental frequencies which are turned on and off at a rate of about 20 ms. During speech activity the noise residual will also be perceived at those frequencies which are not masked by the speech.

The audible effects of the noise residual can be reduced by taking advantage of its frame-to-frame randomness. Specifically, at a given frequency bin, since the noise residual will randomly fluctuate in amplitude at each analysis frame, it can be suppressed by replacing its current value with its minimum value chosen from the adjacent analysis frames. Taking the minimum value is used only when the magnitude of \hat{S}(e^{j\omega}) is less than the maximum noise residual calculated during nonspeech activity. The motivation behind this replacement scheme is threefold: first, if the amplitude of \hat{S}(e^{j\omega}) lies below the maximum noise residual, and it varies radically from analysis frame to frame, then there is a high probability that the spectrum at that frequency is due to noise; therefore, suppress it by taking the minimum; second, if \hat{S}(e^{j\omega}) lies below the maximum but has a nearly constant value, there is a high probability that the spectrum at that frequency is due to low energy speech; therefore, taking the minimum will retain the information; and third, if \hat{S}(e^{j\omega}) is greater than the maximum, there is speech present at that frequency; therefore, removing the bias is sufficient. The amount of noise reduction using this replacement scheme was judged equivalent to that obtained by averaging over three frames. However, with this approach high energy frequency bins are not averaged together. The disadvantage to the scheme is that more storage is required to save the maximum noise residuals and the magnitude values for three adjacent frames.

The residual noise reduction scheme is implemented as

|\hat{S}_i(e^{j\omega})| = |\hat{S}_i(e^{j\omega})|, \quad \text{for } |\hat{S}_i(e^{j\omega})| \ge \max |N_R(e^{j\omega})|

|\hat{S}_i(e^{j\omega})| = \min\{|\hat{S}_j(e^{j\omega})| : j = i - 1, i, i + 1\}, \quad \text{for } |\hat{S}_i(e^{j\omega})| < \max |N_R(e^{j\omega})|

where

\hat{S}_i(e^{j\omega}) = H_R(e^{j\omega}) X_i(e^{j\omega})

and

\max |N_R(e^{j\omega})| = maximum value of the noise residual measured during nonspeech activity.

H. Additional Signal Attenuation During Nonspeech Activity

The energy content of \hat{S}(e^{j\omega}) relative to \mu(e^{j\omega}) provides an accurate indicator of the presence of speech activity within a given analysis frame. If speech activity is absent, then \hat{S}(e^{j\omega}) will consist of the noise residual which remains after half-wave rectification and minimum value selection. Empirically, it was determined that the average (before versus after) power ratio was down at least 12 dB. This implied a measure for detecting the absence of speech given by

T = 20 \log_{10} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})} \right| d\omega \right].

If T was less than -12 dB, the frame was classified as having no speech activity. During the absence of speech activity there are at least three options prior to resynthesis: do nothing, attenuate the output by a fixed factor, or set the output to zero. Having some signal present during nonspeech activity was judged to give the higher quality result. A possible reason for this is that noise present during speech activity is partially masked by the speech. Its perceived magnitude should be balanced by the presence of the same amount of noise during nonspeech activity. Setting the buffer to zero had the effect of amplifying the noise during speech activity. Likewise, doing nothing had the effect of amplifying the noise during nonspeech activity. A reasonable, though by no means optimum amount of attenuation was found to be -30 dB. Thus, the output spectral estimate including output attenuation during nonspeech activity is given by

\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12 \text{ dB} \\ cX(e^{j\omega}), & T < -12 \text{ dB} \end{cases}

where 20 \log_{10} c = -30 dB.
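The minimum-over-adjacent-frames replacement and the nonspeech attenuation rule can be sketched together as follows. This is an illustrative reading, not the paper's implementation: the function names and array shapes are assumptions, and the discrete per-bin mean stands in for the integral in the T measure.

```python
import numpy as np

def residual_reduction(mags, max_residual):
    """For each interior frame i, keep |S_i(w)| where it exceeds the maximum
    noise residual for that bin; otherwise replace it with the minimum over
    frames i-1, i, i+1. `mags` has shape (num_frames, num_bins);
    `max_residual` is the per-bin maximum measured during nonspeech."""
    out = mags.copy()
    for i in range(1, len(mags) - 1):
        below = mags[i] < max_residual
        out[i][below] = np.min(mags[i - 1:i + 2], axis=0)[below]
    return out

def nonspeech_attenuation(S_hat, X, mu, floor_db=-30.0, thresh_db=-12.0):
    """T = 20*log10 of the mean of |S_hat/mu|; a frame with T below -12 dB
    is classified as nonspeech and replaced by the noisy spectrum X scaled
    down by 30 dB, rather than zeroed."""
    T = 20.0 * np.log10(np.mean(np.abs(S_hat) / mu))
    if T < thresh_db:
        return 10.0 ** (floor_db / 20.0) * X
    return S_hat
```

Attenuating with a scaled copy of the noisy input, rather than silence, matches the paper's observation that some residual noise during nonspeech activity sounds more natural than a gated output.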
`
`
`
D. Magnitude Averaging

As was described in the previous section, the variance of the noise spectral estimate is reduced by averaging over as many spectral magnitude sets as possible. However, the nonstationarity of the speech limits the total time interval available for local averaging. The number of averages is limited by the number of analysis windows which can be fit into the stationary speech time interval. The choice of window length and averaging interval must compromise between conflicting requirements. For acceptable spectral resolution a window length greater than twice the expected largest pitch period is required, with a 256-point window being used. For minimum noise variance a large number of windows are required for averaging. Finally, for acceptable time resolution a narrow analysis interval is required. A reasonable compromise between variance reduction and time resolution appears to be three averages. This results in an effective analysis time interval of 38 ms.
E. Bias Estimation