United States Patent [19]
Diethorn

[11] Patent Number: 6,035,048 (US006035048A)
[45] Date of Patent: Mar. 7, 2000

[54] METHOD AND APPARATUS FOR REDUCING NOISE IN SPEECH AND AUDIO
     SIGNALS

[75] Inventor: Eric John Diethorn, Morristown, N.J.

[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.

[21] Appl. No.: 08/877,909
[22] Filed: Jun. 18, 1997

[51] Int. Cl.7: H04B 15/00
[52] U.S. Cl.: 381/94.3; 704/226
[58] Field of Search: 381/94.1, 94.2, 381/94.3, 72, 94.5, 94.7, 98,
     73.1, 71.1; 704/225, 226

[56] References Cited

U.S. PATENT DOCUMENTS

5,251,263  10/1993  Andrea et al.  381/71.6
5,550,924   8/1996  Helf et al.

OTHER PUBLICATIONS

R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal
Processing, Prentice-Hall, Englewood Cliffs, New Jersey, Jan. 1983,
Chapter 7, "Multirate Techniques in Filter Banks and Spectrum
Analyzers and Synthesizers," pp. 289-400.
W. Etter and G. S. Moschytz, "Noise Reduction by Noise-Adaptive
Spectral Magnitude Expansion," J. Audio Eng. Soc. 42 (May 1994)
341-349.
J. B. Allen, "Short Term Spectral Analysis, Synthesis, and
Modification by Discrete Fourier Transform," IEEE Transactions on
Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 3,
Jun. 1977.

Primary Examiner: Vivian Chang
Attorney, Agent, or Firm: Martin I. Finston; Ozer M.N. Teitelbaum

[57] ABSTRACT

A method and apparatus are disclosed for enhancing, within a signal
bandwidth, a corrupted audio-frequency signal. The signal which is to
be enhanced is analyzed into plural sub-band signals, each occupying a
frequency sub-band smaller than the signal bandwidth. A respective
signal gain function is applied to each sub-band signal, and the
respective sub-band signals are then synthesized into an enhanced
signal of the signal bandwidth. The signal gain function is derived,
in part, by measuring speech energy and noise energy, and from these
determining a relative amount of speech energy, within the
corresponding sub-band. In certain embodiments of the invention, the
signal gain function is also derived, in part, by determining a
relative amount of speech energy within a frequency range greater
than, but centered on, the corresponding sub-band. In other
embodiments of the invention, the sub-band noise energy is determined
from a noise estimate that is updated at periodic intervals, but is
not updated if the newest sample of the signal to be enhanced exceeds
the current noise estimate by a multiplicative threshold (i.e., a
threshold expressible in decibels). In still other embodiments of the
invention, the value of the noise estimate is limited by an upper
bound that is matched to the dynamic range of the signal to be
enhanced.

12 Claims, 4 Drawing Sheets
`
[Representative drawing (reproduces FIG. 2): x(i) (NOISY SPEECH)
enters SUBBAND ANALYSIS (1) 40, which outputs c(k,m) for SUBBAND
INDEX k=0, k=1, ..., k=M-1. ALL M SUBBANDS INDEPENDENTLY PROCESSED BY
BLOCKS (2) THROUGH (7): SIGNAL ESTIMATION (2) 50, NOISE ESTIMATION
(3) 60, NARROW-BAND DEFLECTION (4) 70, BROAD-BAND DEFLECTION (5) 80,
LUMPED DEFLECTION (6) 90, GAIN COMPUTATION (7) 100. GAIN g(k,m) 110
multiplies each subband, and SUBBAND SYNTHESIS (8) 120 outputs y(n)
(NOISE-REDUCED SPEECH).]
`
`
`
U.S. Patent    Mar. 7, 2000    Sheet 1 of 4    6,035,048

[FIG. 1 (PRIOR ART): x(i) enters ANALYZER 10, which outputs c(0,m),
c(1,m), c(2,m), ..., c(M-1,m) to SPECTRAL MODIFIER 20; the modifier
applies g(0,m), g(1,m), ..., g(M-1,m); SYNTHESIZER 30 outputs y(i).]
`
[FIG. 3: EACH PROCESSING EPOCH, m: SHIFT IN L NEW SAMPLES OF x(i) to
SHIFT REGISTER (N) 130 and DISCARD L OLDEST SAMPLES OF x(i). The
LENGTH N VECTOR passes through ANALYSIS WINDOW (N) 140, and the
resulting LENGTH N VECTOR passes through FFT (N) 150, which outputs
c(0,m), c(1,m), ..., c(M-1,m): 1 COMPLEX TIME SERIES SAMPLE, c(k,m),
FOR EACH OF M = N/2 + 1 SUBBANDS.]
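
As a rough illustration of the FIG. 3 flow, the following Python sketch shifts L new samples into a length-N register, applies an analysis window, and keeps the M = N/2 + 1 complex sub-band samples c(k,m) of an N-point transform. The Hann window, the naive DFT standing in for the FFT block, the zero-filled initial register, and every function name are assumptions made for illustration; the figure does not specify them.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed analysis window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_half(frame):
    """Naive N-point DFT returning bins k = 0 .. N/2 (M = N/2 + 1 values)."""
    n = len(frame)
    return [
        sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        for k in range(n // 2 + 1)
    ]


def analyze(x, n, l):
    """Yield the list [c(0,m), ..., c(M-1,m)] for each processing epoch m."""
    window = hann_window(n)
    register = [0.0] * n  # shift register (N), assumed to start empty
    for start in range(0, len(x) - l + 1, l):
        # Discard the L oldest samples and shift in the L newest ones.
        register = register[l:] + list(x[start:start + l])
        frame = [w * s for w, s in zip(window, register)]
        yield dft_half(frame)


# Usage: a sinusoid centered on bin 2 of an 8-point transform, hop L = 4.
N, L = 8, 4
x = [math.cos(2.0 * math.pi * 2 * i / N) for i in range(32)]
epochs = list(analyze(x, N, L))
```

Once the register has filled, the largest magnitude in each epoch falls in sub-band k = 2, with leakage into the neighboring bands caused by the window.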
`
`
`
`
U.S. Patent    Mar. 7, 2000    Sheet 2 of 4    6,035,048

[FIG. 2: x(i) (NOISY SPEECH) enters SUBBAND ANALYSIS (1) 40, which
outputs c(k,m) for SUBBAND INDEX k=0, k=1, ..., k=M-1. ALL M SUBBANDS
INDEPENDENTLY PROCESSED BY BLOCKS (2) THROUGH (7): SIGNAL ESTIMATION
(2) 50, NOISE ESTIMATION (3) 60, NARROW-BAND DEFLECTION (4) 70,
BROAD-BAND DEFLECTION (5) 80, LUMPED DEFLECTION (6) 90, GAIN
COMPUTATION (7) 100. The gain g multiplies each subband, and SUBBAND
SYNTHESIS (8) outputs y(n) (NOISE-REDUCED SPEECH).]
`
`
`
`
U.S. Patent    Mar. 7, 2000    Sheet 3 of 4    6,035,048

[FIG. 4: input c(k,m). Block 4.1 computes
    s(k,m) = A s(k,m-1) + (1-A) |c(k,m)|
with A=ALPHA_ATTACK or A=ALPHA_DECAY. Output s(k,m).]
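
A minimal sketch of the FIG. 4 recursion s(k,m) = A s(k,m-1) + (1-A)|c(k,m)| for a single sub-band follows. The rule used here to choose between ALPHA_ATTACK and ALPHA_DECAY (attack while the new magnitude exceeds the previous estimate) is an assumption, as are the function name and the numeric constants; the figure as reproduced here does not show the selection condition.

```python
def update_signal_estimate(s_prev, c, alpha_attack, alpha_decay):
    """One step of s(k,m) = A*s(k,m-1) + (1-A)*|c(k,m)| for one sub-band."""
    magnitude = abs(c)
    # Assumed selection rule: attack constant while the magnitude is rising,
    # decay constant otherwise.
    a = alpha_attack if magnitude > s_prev else alpha_decay
    return a * s_prev + (1.0 - a) * magnitude


# Usage with made-up constants: fast attack (0.5), slow decay (0.9).
s = 0.0
for c in [1 + 0j, 1 + 0j, 0j, 0j]:
    s = update_signal_estimate(s, c, alpha_attack=0.5, alpha_decay=0.9)
```

With these constants the estimate rises quickly toward the signal magnitude and falls back slowly once the signal disappears.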
`
[FIG. 5: input c(k,m). INHIBIT UPDATE ON PROBABLE SPEECH SAMPLES:
DON'T UPDATE n(k,m). Otherwise
    n(k,m) = B n(k,m-1) + (1-B) |c(k,m)|
with B=BETA_ATTACK or B=BETA_DECAY. LIMIT MAXIMUM ATTAINABLE NOISE
LEVEL:
    n(k,m) = min[n(k,m), NOISE_PROFILE(k)]
Output n(k,m).]
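
The FIG. 5 noise estimator can be sketched along the same lines. The inhibit test follows the abstract (no update when the newest sample exceeds the current estimate by a multiplicative threshold), and the cap corresponds to NOISE_PROFILE(k). The choice between BETA_ATTACK and BETA_DECAY, the parameter names, and the numeric values are assumptions for illustration only.

```python
def update_noise_estimate(n_prev, c, beta_attack, beta_decay,
                          inhibit_threshold, noise_ceiling):
    """One step of the noise estimate n(k,m) for a single sub-band."""
    magnitude = abs(c)
    # Probable speech: the sample exceeds the current estimate by a
    # multiplicative threshold, so the estimate is left unchanged.
    if magnitude > inhibit_threshold * n_prev:
        return min(n_prev, noise_ceiling)
    # Assumed selection rule between the two smoothing constants.
    b = beta_attack if magnitude > n_prev else beta_decay
    n_new = b * n_prev + (1.0 - b) * magnitude
    # Limit the maximum attainable noise level (NOISE_PROFILE(k)).
    return min(n_new, noise_ceiling)


# Usage: threshold of 2.0 (about 6 dB), made-up smoothing constants.
n = 1.0  # must start above zero, or every sample would look like speech
trace = []
for c in [1.5 + 0j, 10.0 + 0j, 0.5 + 0j]:
    n = update_noise_estimate(n, c, beta_attack=0.9, beta_decay=0.5,
                              inhibit_threshold=2.0, noise_ceiling=1.2)
    trace.append(n)
```

The loud middle sample leaves the estimate untouched, which is the behavior the figure labels "inhibit update on probable speech samples".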
`
[FIG. 6: inputs s(k,m) and n(k,m);
    d(k,m) = s(k,m) / n(k,m)
Output d(k,m).]
`
`
`
`
U.S. Patent    Mar. 7, 2000    Sheet 4 of 4    6,035,048

[FIG. 7: inputs s(k,m) and n(k,m). FOR SUBBAND INDICES
k = K, K+1, ..., M-K-1:
    D(k,m) = [s(k-K,m) + s(k-K+1,m) + ... + s(k+K,m)] / [(2K+1) * n(k,m)]
FOR k IN (0, 1, ..., K-1) AND k IN (M-K, M-K+1, ..., M-1), D(k,m) IS
NOT COMPUTED. Output D(k,m).]
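
The two deflection measures of FIG. 6 and FIG. 7 can be sketched as follows: d(k,m) is the per-band ratio s/n, and D(k,m) averages s over the 2K+1 bands centered on k before dividing by n(k,m). Function names are hypothetical, `None` is an arbitrary marker for the edge bands where D is not computed, and the noise estimates are assumed to be nonzero.

```python
def narrowband_deflection(s, n):
    """d(k,m) = s(k,m) / n(k,m) for every sub-band k."""
    return [sk / nk for sk, nk in zip(s, n)]


def broadband_deflection(s, n, K):
    """D(k,m) for k = K .. M-K-1; None marks bands where it is not computed."""
    M = len(s)
    D = [None] * M
    for k in range(K, M - K):
        neighborhood = sum(s[k - K:k + K + 1])  # 2K+1 bands centered on k
        D[k] = neighborhood / ((2 * K + 1) * n[k])
    return D


# Usage: five sub-bands, unit noise estimates, K = 1.
s = [1.0, 2.0, 3.0, 4.0, 5.0]
n = [1.0, 1.0, 1.0, 1.0, 1.0]
d = narrowband_deflection(s, n)
D = broadband_deflection(s, n, K=1)
```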
[FIG. 8A: inputs d(k,m) and D(k,m). FOR SUBBAND INDICES
k = K, K+1, ..., M-K-1:
    PHI(k,m) = {max[d(k,m)/GAMMA_NB, D(k,m)/GAMMA_BB]}**p
Output PHI(k,m).]

[FIG. 8B: input d(k,m). FOR k IN [0, 1, ..., K-1] AND
k IN [M-K, M-K+1, ..., M-1]:
    PHI(k,m) = [d(k,m)/GAMMA_NB]**p
Output PHI(k,m).]
`
[FIG. 9: input PHI(k,m);
    g(k,m) = min[1.0, PHI(k,m)]
Output g(k,m).]
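
A short sketch of the lumped deflection of FIGS. 8A and 8B followed by the gain rule of FIG. 9. It assumes the broadband deflection is supplied as a list in which `None` marks the edge bands where D(k,m) is not computed; that convention, the function names, and the numeric thresholds are illustrative choices rather than anything stated in the figures.

```python
def lumped_deflection(d, D, gamma_nb, gamma_bb, p):
    """PHI(k,m) per FIG. 8A (interior bands) and FIG. 8B (edge bands)."""
    phi = []
    for dk, Dk in zip(d, D):
        if Dk is None:
            # Edge bands: only the narrowband deflection is available.
            phi.append((dk / gamma_nb) ** p)
        else:
            # Interior bands: larger of the two normalized deflections.
            phi.append(max(dk / gamma_nb, Dk / gamma_bb) ** p)
    return phi


def gain(phi):
    """g(k,m) = min[1.0, PHI(k,m)] for every sub-band."""
    return [min(1.0, v) for v in phi]


# Usage with made-up thresholds GAMMA_NB = 4, GAMMA_BB = 2 and p = 2.
d = [1.0, 2.0, 8.0]
D = [None, 6.0, None]
phi = lumped_deflection(d, D, gamma_nb=4.0, gamma_bb=2.0, p=2)
g = gain(phi)
```

In the middle band the broadband term dominates, so the gain reaches unity even though the narrowband deflection alone would have attenuated that band.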
`
[FIG. 10: 1 COMPLEX TIME SERIES SAMPLE, g(k,m) * c(k,m), FOR EACH OF
M = N/2 + 1 SUBBANDS: g(0,m) * c(0,m), g(1,m) * c(1,m), ...,
g(M-1,m) * c(M-1,m) enter IFFT (N) 160. The LENGTH N VECTOR passes
through SYNTHESIS WINDOW (N) 170, and the resulting LENGTH N VECTOR
enters ACCUMULATE & SHIFT REGISTER (N). SHIFT IN L ZEROES; output:
L NEWEST PROCESSED SAMPLES.]
`
`
`
`
METHOD AND APPARATUS FOR REDUCING NOISE IN SPEECH AND AUDIO SIGNALS

The input data are fed into filter-bank analyzer 10. The output of
this analyzer consists of a respective sub-band signal c(0,m), c(1,m),
c(2,m), ..., c(M-1,m) at each of M respective output ports of the
analyzer, M a positive integer. (The time index is shown as changed
from i to m because the effective sampling rate may differ between the
respective processing stages.)

At short-time spectral modifier 20, each of the sub-band signals is
subjected to gain modification according to a respective signal gain
function g(k,m), k=0,1,2, ..., M-1, which may differ between
respective sub-bands. (In this context, "short-time" refers to a time
scale typical of that over which speech utterances evolve. Such a time
scale is generally on the order of 20 ms in applications for
processing human speech.)

The sub-band signals are recombined at filter-bank synthesizer 30 into
modified full-band signal y(i).
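
The three stages of FIG. 1 (analyzer, spectral modifier, synthesizer) can be sketched in a deliberately simplified form. The Python below assumes non-overlapping blocks with no window and an even block length, and uses a naive DFT pair; these simplifications, and all names, are assumptions for illustration and are not how the cited filter-bank methods are necessarily implemented.

```python
import cmath
import math


def analyzer(block):
    """Block of N real samples -> M = N/2 + 1 complex sub-band samples."""
    n = len(block)
    return [sum(block[i] * cmath.exp(-2j * math.pi * k * i / n)
                for i in range(n))
            for k in range(n // 2 + 1)]


def synthesizer(c, n):
    """Rebuild N real samples from M sub-bands via conjugate symmetry."""
    full = list(c) + [c[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)).real / n
            for i in range(n)]


def enhance(x, n, gain_fn):
    """Apply gain_fn(k, m) to sub-band k of block m and resynthesize."""
    y = []
    for m, start in enumerate(range(0, len(x) - n + 1, n)):
        c = analyzer(x[start:start + n])                         # stage 10
        modified = [gain_fn(k, m) * ck for k, ck in enumerate(c)]  # stage 20
        y.extend(synthesizer(modified, n))                       # stage 30
    return y


# Usage: unity gain in every sub-band returns the input unchanged.
x = [float(i) for i in range(16)]
y = enhance(x, 8, lambda k, m: 1.0)
```

Keeping only sub-band k = 0 (gain 1 there, 0 elsewhere) replaces each block by its mean, which shows how a per-band gain reshapes the output.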
One application of methods of this kind to the problem of noise
reduction is described in W. Etter and G. S. Moschytz, "Noise
Reduction by Noise-Adaptive Spectral Magnitude Expansion," J. Audio
Eng. Soc. 42 (May 1994) 341-349. This article discusses a signal gain
function (for each respective sub-band) that varies inversely
according to a power of the fractional contribution made by an
estimated noise level to the total signal (i.e., speech plus noise).
At relatively high signal-to-noise ratios, this signal gain function
assumes a maximum value of unity. The exponent in the power-function
relationship is referred to as an expansion factor. An expansion
factor controls the rate at which the gain decays as the
signal-to-noise ratio decreases.
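
To make the shape of such a gain function concrete, here is a hedged Python sketch. It is not the formula from the cited article: the `scale` parameter, which sets where the gain starts to fall below unity, is a hypothetical addition, and the function only reproduces the three properties stated above (inverse power of the noise fraction, a ceiling of unity, and an exponent that controls the decay).

```python
def expansion_gain(noise_level, total_level, expansion_factor, scale=4.0):
    """Gain that falls as the noise fraction of the total signal rises."""
    fraction = noise_level / total_level  # noise share of speech plus noise
    # Inverse power of the (scaled) noise fraction, capped at unity.
    return min(1.0, (scale * fraction) ** (-expansion_factor))


# Usage: the same expansion factor at three signal-to-noise conditions.
high_snr = expansion_gain(0.1, 1.0, expansion_factor=2.0)
mid_snr = expansion_gain(0.5, 1.0, expansion_factor=2.0)
noise_only = expansion_gain(1.0, 1.0, expansion_factor=2.0)
```

Raising the expansion factor makes the gain drop faster for the same noise fraction, for example from 0.25 to 0.125 at a noise fraction of 0.5 when the factor goes from 2 to 3.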
Although the article by Etter et al. provides useful insights of a
general nature, it does not teach how to estimate the noise level or
how to discriminate between incidents of speech and background noise
that is free of speech. Thus it does not suggest any practical
implementation of the ideas discussed there.
Another application of methods of this kind is described in U.S. Pat.
No. 5,550,924, "Reduction of Background Noise for Speech Enhancement,"
issued Aug. 27, 1996 to B. M. Helf and P. L. Chu. This patent
describes two methods for estimating the noise level. Both methods
involve detecting sequences of input data that satisfy some criterion
that signifies the likely presence of background noise without speech.
In one method, the processor observes the frequency spectrum of the
input data and detects data sequences for which this spectrum is
stationary for a relatively long time interval. In the other method,
the input stream is divided into ten-second intervals, and within
these intervals, the processor observes the energy content of multiple
sub-intervals. Within each interval, the processor takes as
representative of speech-free background noise that sub-interval
having the least energy.
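
The second of these noise-estimation ideas, taking the least-energy sub-interval as representative of background noise, can be illustrated with a small Python sketch. It is only a reading of the sentence above, not the procedure of the '924 patent: non-overlapping sub-intervals, the sum-of-squares energy measure, and the function name are all assumptions.

```python
def quietest_subinterval(samples, sub_len):
    """Return (start_index, energy) of the lowest-energy sub-interval."""
    best = None
    for start in range(0, len(samples) - sub_len + 1, sub_len):
        energy = sum(v * v for v in samples[start:start + sub_len])
        if best is None or energy < best[1]:
            best = (start, energy)
    return best


# Usage: one interval holding loud, quiet, and moderately loud stretches.
interval = [3.0, 3.0, 3.0, 3.0,
            0.1, -0.1, 0.1, -0.1,
            2.0, 2.0, 2.0, 2.0]
start, energy = quietest_subinterval(interval, sub_len=4)
```

Here the middle stretch is selected, so its samples would serve as the estimate of speech-free background noise for that interval.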
The method of Helf et al. further involves making a binary decision
whether speech is present, based on the ratio of input signal to noise
estimate. A confidence level is assigned to each of these decisions.
These confidence levels determine, in part, the corresponding values
of the signal gain function.

Although useful, the method of Helf et al. involves relatively complex
procedures for estimating the noise level, establishing the presence
of speech, and establishing values for the signal gain function.
Complexity is disadvantageous because it increases demands on
computational resources, and often leads to greater product costs.

Moreover, it is significant that human speech includes intervals of
narrowband, multicomponent energy, referred to

FIELD OF THE INVENTION

This invention relates to the use of digital filtering techniques to
improve the audibility or intelligibility of speech or other
audio-frequency signals that are corrupted with noise. More
particularly, the invention relates to those techniques that seek to
reduce stationary, or slowly varying, background noise.

ART BACKGROUND

It is a matter of daily experience for speech (or other audible
information) received over a communication channel to be corrupted
with background noise. Such noise may arise, e.g., from circuitry
within the communication system, or from environmental conditions at
the source of the audible signal. Environmental noise may come, for
example, from fans, automobile engines, other vibrating machines, or
nearby vehicular traffic. Although noise components that occupy
narrow, discrete frequency bands are often advantageously removed by
filtering, there are many cases in which this does not provide an
adequate solution. Instead, the background noise often exhibits a
frequency spectrum that overlaps substantially with the spectrum of
the desired signal. In such a case, a narrow frequency-rejection
filter may not reject enough of the noise, whereas a broad such filter
may unacceptably distort the desired signal.
What is needed in such a case is a filter whose frequency
characteristics strike an appropriate balance between rejecting
frequency components characteristic of unwanted noise, and preserving
the esthetic quality or intelligibility of the desired signal. Among
the various audible signals of interest, it is fortuitous that speech,
at least, is marked by frequent pauses of sufficient length to be
captured and analyzed using digital sampling techniques. Consequently,
it is possible to apply different filter characteristics depending
whether, according to some criterion, the current signal is more
probably speech or more probably noise. (Although the desired signal
will often be referred to below as speech, it should be noted that
this usage is purely for convenience. Those skilled in the art will
readily appreciate that the techniques to be described here apply more
generally to audible signals of various kinds.)
Recently, a number of investigators have described approaches to this
problem using digital filter banks for sub-band filtering. The
filter-bank methods used include, e.g., the DFT (Discrete Fourier
Transform) filter-bank method and the polyphase filter-bank method.
(As is well-known in the art, these two methods are essentially the
same, but differ in certain details of the computational
implementation.) Sub-band filtering in general, and in particular the
DFT and polyphase filter-bank methods, are described in detail in
R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal
Processing, Prentice-Hall, Englewood Cliffs, N.J., 1983, hereinafter
referred to as CROCHIERE, particularly at Chapter 7, "Multirate
Techniques in Filter Banks and Spectrum Analyzers and Synthesizers,"
pages 289-400. I hereby incorporate CROCHIERE by reference.
In a broad sense, these and similar approaches can be described in
terms of the processing stages depicted in FIG. 1. A digitally sampled
input signal is denoted in the figure by x(i). Here, x typically
represents the amplitude of an audio-frequency signal, and i is the
time variable, referred to in this digitized form as a time index.
`
FIG. 2 is a high-level, schematic diagram showing signal flow through
various processing stages of the invention in an exemplary embodiment.

FIG. 3 is a more detailed, schematic representation of the sub-band
analysis stage of FIG. 2.

FIG. 4 is a more detailed, schematic representation of the
signal-estimation stage of FIG. 2.

FIG. 5 is a more detailed, schematic representation of the
noise-estimation stage of FIG. 2.

FIG. 6 is a more detailed, schematic representation of the narrowband
deflection stage of FIG. 2.

FIG. 7 is a more detailed, schematic representation of the broadband
deflection stage of FIG. 2.

FIGS. 8A and 8B provide a more detailed, schematic representation of
the lumped deflection stage of FIG. 2.

FIG. 9 is a more detailed, schematic representation of the gain
computation stage of FIG. 2.

FIG. 10 is a more detailed, schematic representation of the sub-band
synthesis stage of FIG. 2.

as "voiced speech," and intervals of broadband energy, referred to as
"unvoiced speech." Methods of sub-band processing, such as those
described here, tend to be most effective in detecting voiced speech,
because speech detection can take place within the specific frequency
sub-bands where speech energy is concentrated. However, such methods
are generally less sensitive to incidents of unvoiced speech, because
the speech energy is distributed over relatively many frequency bands.

Thus, what has been lacking until now is a sub-band method for
enhancing speech (or other audible signals) that is computationally
relatively simple, and is at least as effective for detecting unvoiced
speech (or other incidents of broadband energy) as it is for detecting
voiced speech (or other incidents of narrowband, multicomponent
energy).

SUMMARY OF THE INVENTION

I have invented an improved sub-band method for enhancing speech or
other audible signals in the presence of background noise. My method
is computationally relatively simple, and thus can achieve economy in
the use of, and demand for, computational resources. In contrast to
methods of the prior art, my method includes separate speech-detection
stages, one directed primarily to voiced speech or the like, and the
other directed primarily to unvoiced speech or the like.
In a broad aspect, my invention involves a method for enhancing,
within a signal bandwidth, a corrupted audio-frequency signal having a
signal component and a noise component. In accordance with this
method, the corrupted signal is analyzed into plural sub-band signals,
each occupying a frequency sub-band smaller than the signal bandwidth.
A respective signal gain function is applied to the sub-band signal
corresponding to each sub-band, thereby to yield respective
gain-modified signals. The gain-modified signals are synthesized into
an enhanced signal of the signal bandwidth.
Within each frequency sub-band, the step of applying the signal gain
function to the sub-band signal includes: evaluating a function that
is preferentially sensitive to energy in
`the signal component; and applying, to the sub-band