IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 2, APRIL 1979

Suppression of Acoustic Noise in Speech Using Spectral Subtraction

STEVEN F. BOLL, MEMBER, IEEE
`
Abstract—A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a preprocessor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
`
Manuscript received June 1, 1978; revised September 12, 1978. This research was supported by the Information Processing Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-77-C-0041.

The author is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.

0096-3518/79/0400-0113$00.75 © 1979 IEEE

I. INTRODUCTION

BACKGROUND noise acoustically added to speech can degrade the performance of digital voice processors used for applications such as speech compression, recognition, and authentication [1], [2]. Digital voice systems will be used in a variety of environments, and their performance must be maintained at a level near that measured using noise-free input speech. To ensure continued reliability, the effects of background noise can be reduced by using noise-cancelling microphones, internal modification of the voice processor algorithms to explicitly compensate for signal contamination, or preprocessor noise reduction.

Noise-cancelling microphones, although essential for extremely high noise environments such as the helicopter cockpit, offer little or no noise reduction above 1 kHz [3] (see Fig. 5). Techniques available for voice processor modification to account for noise contamination are being developed [4], [5]. But due to the time, effort, and money spent on the design and implementation of these voice processors [6]-[8], there is a reluctance to internally modify these systems.

Preprocessor noise reduction [12], [21] offers the advantage that noise stripping is done on the waveform itself with the output being either digital or analog speech. Thus, existing voice processors tuned to clean speech can continue to be used unmodified. Also, since the output is speech, the noise stripping becomes independent of any specific subsequent speech processor implementation (it could be connected to a CCD channel vocoder or a digital LPC vocoder).

The objectives of this effort were to develop a noise suppression technique, implement a computationally efficient algorithm, and test its performance in actual noise environments. The approach used was to estimate the magnitude frequency spectrum of the underlying clean speech by subtracting the noise magnitude spectrum from the noisy speech spectrum. This estimator requires an estimate of the current noise spectrum. Rather than obtain this noise estimate from a second microphone source [9], [10], it is approximated using the average noise magnitude measured during nonspeech activity. Using this approach, the spectral approximation error is then defined, and secondary methods for reducing it are described.

The noise suppressor is implemented using about the same amount of computation as required in a high-speed convolution. It is tested on speech recorded in a helicopter environment. Its performance is measured using the Diagnostic Rhyme Test (DRT) [11] and is demonstrated using isometric plots of short-time spectra.

The paper is divided into sections which develop the spectral estimator, describe the algorithm implementation, and demonstrate the algorithm performance.

II. SUBTRACTIVE NOISE SUPPRESSION ANALYSIS

A. Introduction

This section describes the noise-suppressed spectral estimator. The estimator is obtained by subtracting an estimate of the noise spectrum from the noisy speech spectrum. Spectral information required to describe the noise spectrum is obtained from the signal measured during nonspeech activity. After developing the spectral estimator, the spectral error is computed and four methods for reducing it are presented.

The following assumptions were used in developing the analysis. The background noise is acoustically or digitally added to the speech. The background noise environment remains locally stationary to the degree that its spectral magnitude expected value just prior to speech activity equals its expected value during speech activity. If the environment changes to a new stationary state, there exists enough time (about 300 ms) to estimate a new background noise spectral magnitude expected value before speech activity commences. For the slowly varying nonstationary noise environment, the algorithm requires a speech activity detector to signal the
program that speech has ceased and a new noise bias can be estimated. Finally, it is assumed that significant noise reduction is possible by removing the effect of noise from the magnitude spectrum only.

Speech, suitably low-pass filtered and digitized, is analyzed by windowing data from half-overlapped input data buffers. The magnitude spectra of the windowed data are calculated, and the spectral noise bias calculated during nonspeech activity is subtracted off. Resulting negative amplitudes are then zeroed out. Secondary residual noise suppression is then applied. A time waveform is recalculated from the modified magnitude. This waveform is then overlap added to the previous data to generate the output speech.
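The analysis-synthesis loop just described (half-overlapped buffers, magnitude subtraction, zeroing of negative amplitudes, overlap-add resynthesis) can be sketched in Python. This is a minimal sketch, not the paper's implementation: the frame length, the Hanning window, and the name `noise_bias` (the average noise magnitude from nonspeech activity) are my assumptions.

```python
import numpy as np

def spectral_subtract(x, noise_bias, frame_len=256):
    """Suppress stationary noise by subtracting a spectral noise bias.

    x          : 1-D noisy speech signal
    noise_bias : average noise magnitude spectrum (length frame_len),
                 measured during nonspeech activity
    """
    hop = frame_len // 2                      # half-overlapped input buffers
    win = np.hanning(frame_len)               # analysis window (assumed)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win
        spec = np.fft.fft(frame)
        mag = np.abs(spec) - noise_bias       # subtract the spectral noise bias
        mag = np.maximum(mag, 0.0)            # zero out negative amplitudes
        clean = mag * np.exp(1j * np.angle(spec))  # recombine with noisy phase
        out[start:start + frame_len] += np.fft.ifft(clean).real  # overlap add
    return out
```

The secondary residual noise suppression step is omitted here; it is treated separately later in the section.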
B. Additive Noise Model

Assume that a windowed noise signal n(k) has been added to a windowed speech signal s(k), with their sum denoted by x(k). Then

x(k) = s(k) + n(k).

Taking the Fourier transform gives

X(e^{jω}) = S(e^{jω}) + N(e^{jω})

where

x(k) ↔ X(e^{jω})

X(e^{jω}) = Σ_{k=0}^{L−1} x(k) e^{−jωk}

x(k) = (1/2π) ∫_{−π}^{π} X(e^{jω}) e^{jωk} dω.
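Since the Fourier transform is linear, the additive model holds bin by bin in the frequency domain; a quick numerical check (the specific signals below are arbitrary stand-ins, not from the paper):

```python
import numpy as np

k = np.arange(256)
s = np.sin(2 * np.pi * 8 * k / 256)        # stand-in for windowed speech s(k)
n = 0.1 * np.random.randn(256)             # stand-in for windowed noise n(k)
x = s + n                                  # x(k) = s(k) + n(k)

# By linearity, X(e^{jw}) = S(e^{jw}) + N(e^{jw}) at every frequency bin.
assert np.allclose(np.fft.fft(x), np.fft.fft(s) + np.fft.fft(n))
```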
C. Spectral Subtraction Estimator

The spectral subtraction filter H(e^{jω}) is calculated by replacing the noise spectrum N(e^{jω}) with spectra which can be readily measured. The magnitude |N(e^{jω})| of N(e^{jω}) is replaced by its average value μ(e^{jω}) taken during nonspeech activity, and the phase θ_N(e^{jω}) of N(e^{jω}) is replaced by the phase θ_x(e^{jω}) of X(e^{jω}). These substitutions result in the spectral subtraction estimator Ŝ(e^{jω}):

Ŝ(e^{jω}) = [|X(e^{jω})| − μ(e^{jω})] e^{jθ_x(e^{jω})}

or

Ŝ(e^{jω}) = H(e^{jω}) X(e^{jω})

with

H(e^{jω}) = 1 − μ(e^{jω}) / |X(e^{jω})|

μ(e^{jω}) = E{|N(e^{jω})|}.
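For a single frame, the estimator can be applied as the zero-phase gain H(e^{jω}) above; a minimal sketch, with variable names and the divide-by-zero guard being my additions:

```python
import numpy as np

def subtract_filter(X, mu):
    """Spectral subtraction applied as a frequency-domain gain.

    X  : complex spectrum of one noisy frame, X(e^{jw})
    mu : average noise magnitude mu(e^{jw}) from nonspeech activity
    """
    # H = 1 - mu/|X|; the tiny floor avoids division by zero (my addition).
    H = 1.0 - mu / np.maximum(np.abs(X), 1e-12)
    return H * X    # S_hat = H * X; the noisy phase is kept automatically
```

Note that H goes negative wherever |X(e^{jω})| < μ(e^{jω}); the half-wave rectification modification described later in this section clamps those values.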
`
D. Spectral Error

The spectral error ε(e^{jω}) resulting from this estimator is given by

ε(e^{jω}) = Ŝ(e^{jω}) − S(e^{jω}) = N(e^{jω}) − μ(e^{jω}) e^{jθ_x}.

A number of simple modifications are available to reduce the auditory effects of this spectral error. These include: 1) magnitude averaging; 2) half-wave rectification; 3) residual noise reduction; and 4) additional signal attenuation during nonspeech activity.
E. Magnitude Averaging

Since the spectral error equals the difference between the noise spectrum N and its mean μ, local averaging of spectral magnitudes can be used to reduce the error. Replacing |X(e^{jω})| with |X̄(e^{jω})|, where

|X̄(e^{jω})| = (1/M) Σ_{i=0}^{M−1} |X_i(e^{jω})|

X_i(e^{jω}) = ith time-windowed transform of x(k),

gives

Ŝ_A(e^{jω}) = [|X̄(e^{jω})| − μ(e^{jω})] e^{jθ_x(e^{jω})}.

The rationale behind averaging is that the spectral error becomes approximately

ε(e^{jω}) = Ŝ_A(e^{jω}) − S(e^{jω}) ≈ |N̄| − μ

where

|N̄(e^{jω})| = (1/M) Σ_{i=0}^{M−1} |N_i(e^{jω})|.

Thus, the sample mean of |N(e^{jω})| will converge to μ(e^{jω}) as a longer average is taken.

The obvious problem with this modification is that the speech is nonstationary, and therefore only limited time averaging is allowed. DRT results show that averaging over more than three half-overlapped windows with a total time duration of 38.4 ms will decrease intelligibility. Spectral examples and DRT scores with and without averaging are given in the "Results" section. Based upon these results, it appears that averaging coupled with half rectification offers some improvement. The major disadvantage of averaging is the risk of some temporal smearing of short transitory sounds.
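Magnitude averaging over M half-overlapped frames (the DRT results above suggest M of about three, roughly 38.4 ms) can be sketched as follows. Treating the frames as rows of an array and reusing the center frame's phase for θ_x are my choices, not fixed by the text.

```python
import numpy as np

def averaged_estimate(frames_spec, mu):
    """Magnitude averaging before subtraction.

    frames_spec : (M, n_bins) complex spectra X_i(e^{jw}) of M
                  half-overlapped windows (M of about 3)
    mu          : average noise magnitude spectrum mu(e^{jw})

    Returns S_hat_A for the center frame: the averaged magnitude,
    bias-subtracted and clipped, combined with the center frame's phase.
    """
    avg_mag = np.mean(np.abs(frames_spec), axis=0)   # |X_bar(e^{jw})|
    center = frames_spec[len(frames_spec) // 2]
    mag = np.maximum(avg_mag - mu, 0.0)              # subtract and zero-clip
    return mag * np.exp(1j * np.angle(center))
```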
F. Half-Wave Rectification

For each frequency ω where the noisy signal spectrum magnitude |X(e^{jω})| is less than the average noise spectrum magnitude μ(e^{jω}), the output is set to zero. This modification can be simply implemented by half-wave rectifying H(e^{jω}). The estimator then becomes

Ŝ(e^{jω}) = H_R(e^{jω}) X(e^{jω})

where

H_R(e^{jω}) = [H(e^{jω}) + |H(e^{jω})|] / 2.

The input-output relationship between X(e^{jω}) and Ŝ(e^{jω}) at each frequency ω is shown in Fig. 1. Thus, the effect of half-wave rectification is to bias down the magnitude spectrum at each frequency ω by the noise bias determined at that frequency. The bias value can, of course,
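The rectified gain H_R = (H + |H|)/2 zeroes every bin where |X(e^{jω})| < μ(e^{jω}); a one-function sketch under the same hedges as the earlier snippets (names and the divide-by-zero guard are mine):

```python
import numpy as np

def half_wave_rectified_gain(X, mu):
    """Apply H_R(e^{jw}) = (H + |H|)/2 to one noisy frame spectrum.

    Negative gains are clamped to zero, so bins with |X| < mu
    produce zero output; bins with |X| >= mu are bias-reduced.
    """
    H = 1.0 - mu / np.maximum(np.abs(X), 1e-12)
    HR = (H + np.abs(H)) / 2.0          # half-wave rectification of the gain
    return HR * X
```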
`
`
`
`
`
probability that the spectrum at that frequency is due to low energy speech; therefore, taking the minimum will retain the information; and third, if Ŝ(e^{jω}) is greater than the maximum, there is speech present at that frequency; therefore, removing the bias is sufficient. The amount of noise reduction using this replacement scheme was judged equivalent to that obtained by averaging over three frames. However, with this approach high energy frequency bins are not averaged together. The disadvantage to the scheme is that more storage is required to save the maximum noise residuals and the magnitude values for three adjacent frames.

The residual noise reduction scheme is implemented as

|Ŝ_i(e^{jω})| = |Ŝ_i(e^{jω})|,  for |Ŝ_i(e^{jω})| ≥ max |N_R(e^{jω})|

|Ŝ_i(e^{jω})| = min {|Ŝ_j(e^{jω})| : j = i−1, i, i+1},  for |Ŝ_i(e^{jω})| < max |N_R(e^{jω})|

where

Ŝ_i(e^{jω}) = H_R(e^{jω}) X_i(e^{jω})

and

max |N_R(e^{jω})| = maximum value of noise residual me
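The frame-wise replacement rule above can be sketched as follows. Here `max_nr` stands in for max |N_R(e^{jω})|, and passing the three adjacent frame magnitudes explicitly reflects the storage requirement noted above; the names are mine, not the paper's.

```python
import numpy as np

def reduce_residual(prev_mag, cur_mag, next_mag, max_nr):
    """Residual noise reduction on the magnitude spectrum of frame i.

    Where |S_i(e^{jw})| >= max|N_R(e^{jw})|, speech is taken to be
    present and the bin is kept as-is; elsewhere the bin is replaced
    by the minimum over frames i-1, i, i+1.
    """
    min_mag = np.minimum(np.minimum(prev_mag, cur_mag), next_mag)
    return np.where(cur_mag >= max_nr, cur_mag, min_mag)
```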