United States Patent [19]
Diethorn

[11] Patent Number: 6,035,048 (US006035048A)
[45] Date of Patent: Mar. 7, 2000

[54] METHOD AND APPARATUS FOR REDUCING NOISE IN SPEECH AND AUDIO SIGNALS

[75] Inventor: Eric John Diethorn, Morristown, N.J.
[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.
[21] Appl. No.: 08/877,909
[22] Filed: Jun. 18, 1997
[51] Int. Cl.7: H04B 15/00
[52] U.S. Cl.: 381/94.3; 704/226
[58] Field of Search: 381/94.1, 94.2, 94.3, 72, 94.5, 94.7, 98, 73.1, 71.1; 704/225, 226

[56] References Cited

U.S. PATENT DOCUMENTS
5,251,263 10/1993 Andrea et al., 381/71.6
5,550,924 8/1996 Helf et al.

OTHER PUBLICATIONS
R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey, Jan. 1983, Chapter 7, "Multirate Techniques in Filter Banks and Spectrum Analyzers and Synthesizers," pp. 289-400.
W. Etter and G. S. Moschytz, "Noise Reduction by Noise-Adaptive Spectral Magnitude Expansion," J. Audio Eng. Soc. 42 (May 1994) 341-349.
J. B. Allen, "Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 3, Jun. 1977.

Primary Examiner: Vivian Chang
Attorney, Agent, or Firm: Martin I. Finston; Ozer M.N. Teitelbaum

[57] ABSTRACT
`
A method and apparatus are disclosed for enhancing, within a signal bandwidth, a corrupted audio-frequency signal. The signal which is to be enhanced is analyzed into plural sub-band signals, each occupying a frequency sub-band smaller than the signal bandwidth. A respective signal gain function is applied to each sub-band signal, and the respective sub-band signals are then synthesized into an enhanced signal of the signal bandwidth. The signal gain function is derived, in part, by measuring speech energy and noise energy, and from these determining a relative amount of speech energy, within the corresponding sub-band. In certain embodiments of the invention, the signal gain function is also derived, in part, by determining a relative amount of speech energy within a frequency range greater than, but centered on, the corresponding sub-band. In other embodiments of the invention, the sub-band noise energy is determined from a noise estimate that is updated at periodic intervals, but is not updated if the newest sample of the signal to be enhanced exceeds the current noise estimate by a multiplicative threshold (i.e., a threshold expressible in decibels). In still other embodiments of the invention, the value of the noise estimate is limited by an upper bound that is matched to the dynamic range of the signal to be enhanced.
`
`12 Claims, 4 Drawing Sheets
`
[Front-page drawing (signal-flow diagram of the exemplary embodiment): noisy speech x(n) enters sub-band analysis (1), block 40, which produces c(k,m) for sub-band indices k = 0, 1, ..., M-1. All M sub-bands are independently processed by blocks (2) through (7): signal estimation (2), block 50; noise estimation (3), block 60; narrow-band deflection (4), block 70; broad-band deflection (5), block 80; lumped deflection (6), block 90; gain computation (7), block 100. The sub-band samples are gain-modified at block 110 and passed to sub-band synthesis (8), block 120, which outputs y(n).]
`
`RTL345-2_1025-0001
`
`

`
U.S. Patent, Mar. 7, 2000, Sheet 1 of 4, 6,035,048

[FIG. 1 (PRIOR ART): input x(i) passes through analyzer 10, spectral modifier 20 (sub-band gains g(0,m), g(1,m), ..., g(M-1,m)), and synthesizer 30 to give output y(i).]

[FIG. 3: in each processing epoch m, the L newest samples of x(n) are shifted into shift register (N) 130 and the L oldest samples are discarded; the length-N vector passes through analysis window (N) 140 and FFT (N) 150, producing one complex time-series sample c(k,m) for each of M = N/2 + 1 sub-bands.]
`
`

`
[Sheet 2 of 4: FIG. 2, signal flow through the processing stages of the exemplary embodiment.]
`
`
`
`

`
[Sheet 3 of 4]

[FIG. 4: signal estimate s(k,m) = A s(k,m-1) + (1-A) |c(k,m)| (block 4.1), with A = ALPHA_ATTACK or A = ALPHA_DECAY selected by the test of block 4.2.]

[FIG. 5: noise estimate n(k,m) = B n(k,m-1) + (1-B) |c(k,m)| (block 5.1), with B = BETA_ATTACK or B = BETA_DECAY. Block 5.5 inhibits the update on probable speech samples (n(k,m) is not updated). Block 5.6 limits the maximum attainable noise level: n(k,m) = min[n(k,m), NOISE_PROFILE(k)].]

[FIG. 6: narrowband deflection d(k,m) = s(k,m) / n(k,m).]
`
`

`
[Sheet 4 of 4]

[FIG. 7: broadband deflection D(k,m) = [s(k-K,m) + s(k-K+1,m) + ... + s(k+K,m)] / [(2K+1) n(k,m)] for sub-band indices k = K, K+1, ..., M-K-1. For k in (0, ..., K-1) and k in (M-K, M-K+1, ..., M-1), D(k,m) is not computed.]

[FIG. 8A: for sub-band indices k = K, K+1, ..., M-K-1, PHI(k,m) = {max[d(k,m)/GAMMA_NB, D(k,m)/GAMMA_BB]}^p.]

[FIG. 8B: for k in (0, ..., K-1) and k in (M-K, M-K+1, ..., M-1), PHI(k,m) = [d(k,m)/GAMMA_NB]^p.]

[FIG. 9: gain g(k,m) = min[1, PHI(k,m)].]

[FIG. 10: one complex time-series sample g(k,m)*c(k,m) for each of M = N/2 + 1 sub-bands is input to IFFT (N) 160; the resulting length-N vector passes through synthesis window (N) 170 and is accumulated in shift register (N) 180, from which L processed samples are shifted out per epoch.]
METHOD AND APPARATUS FOR REDUCING NOISE IN SPEECH AND AUDIO SIGNALS

FIELD OF THE INVENTION
This invention relates to the use of digital filtering techniques to improve the audibility or intelligibility of speech or other audio-frequency signals that are corrupted with noise. More particularly, the invention relates to those techniques that seek to reduce stationary, or slowly varying, background noise.
ART BACKGROUND
It is a matter of daily experience for speech (or other audible information) received over a communication channel to be corrupted with background noise. Such noise may arise, e.g., from circuitry within the communication system, or from environmental conditions at the source of the audible signal. Environmental noise may come, for example, from fans, automobile engines, other vibrating machines, or nearby vehicular traffic. Although noise components that occupy narrow, discrete frequency bands are often advantageously removed by filtering, there are many cases in which this does not provide an adequate solution. Instead, the background noise often exhibits a frequency spectrum that overlaps substantially with the spectrum of the desired signal. In such a case, a narrow frequency-rejection filter may not reject enough of the noise, whereas a broad such filter may unacceptably distort the desired signal.
What is needed in such a case is a filter whose frequency characteristics strike an appropriate balance between rejecting frequency components characteristic of unwanted noise, and preserving the esthetic quality or intelligibility of the desired signal. Among the various audible signals of interest, it is fortuitous that speech, at least, is marked by frequent pauses of sufficient length to be captured and analyzed using digital sampling techniques. Consequently, it is possible to apply different filter characteristics depending on whether, according to some criterion, the current signal is more probably speech or more probably noise. (Although the desired signal will often be referred to below as speech, it should be noted that this usage is purely for convenience. Those skilled in the art will readily appreciate that the techniques to be described here apply more generally to audible signals of various kinds.)
Recently, a number of investigators have described approaches to this problem using digital filter banks for sub-band filtering. The filter-bank methods used include, e.g., the DFT (Discrete Fourier Transform) filter-bank method and the polyphase filter-bank method. (As is well known in the art, these two methods are essentially the same, but differ in certain details of the computational implementation.) Sub-band filtering in general, and in particular the DFT and polyphase filter-bank methods, are described in detail in R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1983, hereinafter referred to as CROCHIERE, particularly at Chapter 7, "Multirate Techniques in Filter Banks and Spectrum Analyzers and Synthesizers," pages 289-400. I hereby incorporate CROCHIERE by reference.
In a broad sense, these and similar approaches can be described in terms of the processing stages depicted in FIG. 1. A digitally sampled input signal is denoted in the figure by x(i). Here, x typically represents the amplitude of an audio-frequency signal, and i is the time variable, referred to in this digitized form as a time index.
The input data are fed into filter-bank analyzer 10. The output of this analyzer consists of a respective sub-band signal c(0,m), c(1,m), c(2,m), ..., c(M-1,m) at each of M respective output ports of the analyzer, M a positive integer. (The time index is shown as changed from i to m because the effective sampling rate may differ between the respective processing stages.)
At short-time spectral modifier 20, each of the sub-band signals is subjected to gain modification according to a respective signal gain function g(k,m), k=0,1,2, ..., M-1, which may differ between respective sub-bands. (In this context, "short-time" refers to a time scale typical of that over which speech utterances evolve. Such a time scale is generally on the order of 20 ms in applications for processing human speech.)
The sub-band signals are recombined at filter-bank synthesizer 30 into modified full-band signal y(i).
One application of methods of this kind to the problem of noise reduction is described in W. Etter and G. S. Moschytz, "Noise Reduction by Noise-Adaptive Spectral Magnitude Expansion," J. Audio Eng. Soc. 42 (May 1994) 341-349. This article discusses a signal gain function (for each respective sub-band) that varies inversely according to a power of the fractional contribution made by an estimated noise level to the total signal (i.e., speech plus noise). At relatively high signal-to-noise ratios, this signal gain function assumes a maximum value of unity. The exponent in the power-function relationship is referred to as an expansion factor. An expansion factor controls the rate at which the gain decays as the signal-to-noise ratio decreases.
Although the article by Etter et al. provides useful insights of a general nature, it does not teach how to estimate the noise level or how to discriminate between incidents of speech and background noise that is free of speech. Thus it does not suggest any practical implementation of the ideas discussed there.
Another application of methods of this kind is described in U.S. Pat. No. 5,550,924, "Reduction of Background Noise for Speech Enhancement," issued Aug. 27, 1996 to B. M. Helf and P. L. Chu. This patent describes two methods for estimating the noise level. Both methods involve detecting sequences of input data that satisfy some criterion that signifies the likely presence of background noise without speech. In one method, the processor observes the frequency spectrum of the input data and detects data sequences for which this spectrum is stationary for a relatively long time interval. In the other method, the input stream is divided into ten-second intervals, and within these intervals, the processor observes the energy content of multiple sub-intervals. Within each interval, the processor takes as representative of speech-free background noise that sub-interval having the least energy.
The method of Helf et al. further involves making a binary decision whether speech is present, based on the ratio of input signal to noise estimate. A confidence level is assigned to each of these decisions. These confidence levels determine, in part, the corresponding values of the signal gain function.
Although useful, the method of Helf et al. involves relatively complex procedures for estimating the noise level, establishing the presence of speech, and establishing values for the signal gain function. Complexity is disadvantageous because it increases demands on computational resources, and often leads to greater product costs.
Moreover, it is significant that human speech includes intervals of narrowband, multicomponent energy, referred to as "voiced speech," and intervals of broadband energy, referred to as "unvoiced speech." Methods of sub-band processing, such as those described here, tend to be most effective in detecting voiced speech, because speech detection can take place within the specific frequency sub-bands where speech energy is concentrated. However, such methods are generally less sensitive to incidents of unvoiced speech, because the speech energy is distributed over relatively many frequency bands.
Thus, what has been lacking until now is a sub-band method for enhancing speech (or other audible signals) that is computationally relatively simple, and is at least as effective for detecting unvoiced speech (or other incidents of broadband energy) as it is for detecting voiced speech (or other incidents of narrowband, multicomponent energy).
SUMMARY OF THE INVENTION
I have invented an improved sub-band method for enhancing speech or other audible signals in the presence of background noise. My method is computationally relatively simple, and thus can achieve economy in the use of, and demand for, computational resources. In contrast to methods of the prior art, my method includes separate speech-detection stages, one directed primarily to voiced speech or the like, and the other directed primarily to unvoiced speech or the like.
In a broad aspect, my invention involves a method for enhancing, within a signal bandwidth, a corrupted audio-frequency signal having a signal component and a noise component. In accordance with this method, the corrupted signal is analyzed into plural sub-band signals, each occupying a frequency sub-band smaller than the signal bandwidth. A respective signal gain function is applied to the sub-band signal corresponding to each sub-band, thereby to yield respective gain-modified signals. The gain-modified signals are synthesized into an enhanced signal of the signal bandwidth.
Within each frequency sub-band, the step of applying the signal gain function to the sub-band signal includes: evaluating a function that is preferentially sensitive to energy in the signal component; and applying, to the sub-band signal, gain values that are related to the preferentially sensitive function.
In contrast to methods of the prior art, the preferentially sensitive function is evaluated by, inter alia, measuring a relative amount of speech energy within the corresponding sub-band, and also measuring a relative amount of speech energy within a frequency range greater than, but centered on, the corresponding sub-band.
I believe that through the use of my invention, noise in the speech channels of various kinds of telecommunication equipment can be efficiently reduced, and improved subjective audio quality can thereby be efficiently achieved. Such equipment includes telephones such as cellular and cordless telephones, and audio and video teleconferencing systems. Further, my invention can be used to improve the quality of digitally encoded speech by reducing background noise that would otherwise perturb the speech coder. Still further, I believe that my invention can be usefully employed within the switching system of a telephone network to condition speech signals that have been degraded by noisy line conditions, or by background noise that is input at the location of one or more of the parties to a telephone call.
`
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a schematic drawing that represents, in generic fashion, sub-band methods of speech enhancement, including those of the prior art.
FIG. 2 is a high-level, schematic diagram showing signal flow through various processing stages of the invention in an exemplary embodiment.
FIG. 3 is a more detailed, schematic representation of the sub-band analysis stage of FIG. 2.
FIG. 4 is a more detailed, schematic representation of the signal-estimation stage of FIG. 2.
FIG. 5 is a more detailed, schematic representation of the noise-estimation stage of FIG. 2.
FIG. 6 is a more detailed, schematic representation of the narrowband deflection stage of FIG. 2.
FIG. 7 is a more detailed, schematic representation of the broadband deflection stage of FIG. 2.
FIGS. 8A and 8B provide a more detailed, schematic representation of the lumped deflection stage of FIG. 2.
FIG. 9 is a more detailed, schematic representation of the gain computation stage of FIG. 2.
FIG. 10 is a more detailed, schematic representation of the sub-band synthesis stage of FIG. 2.
`
`DETAILED DESCRIPTION OF A PREFERRED
`EMBODIMENT
`
In the following discussion, the signal x(n) that is to be enhanced is referred to for convenience as "noisy speech," although not only speech, but also other audible signals are advantageously enhanced according to the present invention.
As shown in FIG. 2, the noisy speech x(n) is analyzed at block 40 into M sub-band time series c(k,m), k=0,1, ..., M-1. At block 50, a signal estimate s(k,m) is calculated for each sub-band. As will be seen, this signal estimate is a short-term average of the sub-band time series. When speech is present, s(k,m) estimates the signal level corresponding to the speech.
At block 60, a noise estimate n(k,m) is calculated for each sub-band. As will be seen, this noise estimate is a long-term average of the sub-band time series. It estimates the stationary component of the corrupted input signal, which is assumed to correspond to background noise.
At block 70, a narrowband deflection d(k,m) is calculated for each sub-band. This is one of two deflections to be calculated. Each of these deflections is a time series derived from the signal and noise estimates. The narrowband deflection is derived from the sub-band signal and noise estimates, so as to be particularly sensitive to, e.g., the energy in voiced speech.
At block 80, a broadband deflection D(k,m) is calculated for each sub-band. This second deflection is derived from the sub-band noise estimate and from an average over plural sub-bands of the respective sub-band signal estimates, so as to be particularly sensitive to, e.g., the energy in unvoiced speech.
At block 90, a lumped deflection PHI(k,m) is calculated from the narrowband and broadband deflections. Roughly speaking, the lumped deflection indicates the presence of speech when speech is indicated by either the narrowband or broadband deflection. In addition, an expansion factor p is used to tailor the sensitivity of PHI to the respective deflections.
At block 100, a respective sub-band gain g(k,m) is computed for each of the sub-band time series c(k,m). Typically, this sub-band gain has an upper bound of unity. This upper bound is attained when speech is likely to be present. At other times, the gain assumes values less than one. The expansion factor p affects the rate at which this gain decays as the incidence of speech becomes less likely. Significantly, this gain is calculated as a time series, as shown in the notation used herein by the functional dependence on the time index m.
At block 110, each sub-band time series c(k,m) is modified by its corresponding sub-band gain g(k,m).
At block 120, the modified sub-band time series are synthesized to form modified, full-band output signal y(n), also referred to herein as "noise-reduced speech."
Each of the processing stages discussed above is described in greater detail below, with reference to the pertinent figure. Each of these processing stages is conveniently carried out by a general-purpose digital computer, such as a desktop personal computer, under the control of an appropriate stored program or programs. Equivalently, some or all of these stages can be carried out using special-purpose electronic signal-processing circuits.
Our currently preferred sub-band analysis technique is based on a perfect reconstruction filter bank using the discrete Fourier transform (DFT) filter bank method. This method is well-known in the art, and described in detail in, e.g., CROCHIERE. Accordingly, this method need not be described in detail here. However, referring back to FIG. 1, it should be noted that perfect reconstruction filter banks have the property that when spectral modifier 20 applies the identity function (i.e., unity gain across all sub-bands), the output of synthesizer 30 is identical to the input to analyzer 10 (within the accuracy of the digital computation).
As shown in FIG. 3, the operations of the sub-band analysis stage can be described in terms of accumulator 130, analysis window 140, and Fast Fourier Transform (FFT) 150. Time-series samples are processed in blocks of L samples, where L is an integer. The term "epoch" is used to denote the action of processing one such block. Thus, at the beginning of each processing epoch, a data block consisting of L new time-series samples x(n) is shifted into accumulator 130, which is exemplarily a shift register. The total length of this accumulator is N samples, wherein N is the size of the Fourier transform, and N>L. Those skilled in the art of digital filtering will appreciate that the number M of unique complex sub-bands is related to the size of the Fourier transform according to the formula:
$$M = \frac{N}{2} + 1$$
By way of illustration, our current implementation, sampling at a rate of 8 kHz, has 33 unique sub-bands spanning the frequency range 0-4000 Hz.
When L new samples are shifted into the accumulator, the L oldest samples are shifted out. In our current implementation, the value of L is 16 and the value of N is 64. These values are illustrative, and not essential to the practice of the invention.
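
The illustrative figures quoted here (8 kHz sampling, N = 64, L = 16, 33 sub-bands) can be checked with a few lines of arithmetic. The sketch below is only a worked example: the function name `filterbank_params` and the dictionary keys are hypothetical, and the per-bin bandwidth is taken as the sampling frequency divided by the transform length.

```python
def filterbank_params(sample_rate_hz, fft_size, block_size):
    """Derive basic DFT filter-bank quantities from the sampling rate, N, and L."""
    if fft_size % 2 != 0 or block_size >= fft_size:
        raise ValueError("expected an even transform size N with N > L")
    return {
        # Number of unique complex sub-bands, M = N/2 + 1.
        "num_subbands": fft_size // 2 + 1,
        # Width of each bin: sampling frequency divided by transform length.
        "bin_bandwidth_hz": sample_rate_hz / fft_size,
        # Time spanned by one processing epoch of L new samples.
        "epoch_ms": 1000.0 * block_size / sample_rate_hz,
    }


params = filterbank_params(sample_rate_hz=8000, fft_size=64, block_size=16)
print(params)
```

With these inputs each sub-band is 125 Hz wide and a new sub-band sample is produced every 2 ms.
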
The N-vector of accumulated samples is multiplied by analysis window 140, which is a window of length N. Analysis windows are well-known in the digital filtering arts, and discussed at length in, e.g., CROCHIERE. Thus, they need not be described here in detail. Briefly, an analysis window is a function that embodies the frequency-selective properties of a digital filter, and conditions the sampled data to avoid a by-product of digital processing known as frequency aliasing. Frequency aliasing is undesirable because it can lead to distracting audible artifacts in the reconstructed, processed signal.
The N-vector of windowed data is then subjected to N-point FFT 150. As noted, this transform is effectuated, in our current implementation, using the DFT algorithm. Each frequency bin output from the DFT represents one new complex time-series sample for the sub-band frequency range corresponding to that bin. The bandwidth of each bin, or sub-band time series, is given by the ratio of sampling frequency to transform length.
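
One analysis epoch (shift in L samples, window the N-vector, transform, keep the N/2 + 1 unique bins) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the Hann window is an assumed choice (the text does not name a window), a naive DFT stands in for the FFT, and the names `hann_window` and `analysis_epoch` are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window; an assumed choice, the text does not name a window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def analysis_epoch(register, new_block, window):
    """Shift L new samples in, window the N-vector, and return (register, c)."""
    n = len(register)
    hop = len(new_block)
    register = register[hop:] + list(new_block)   # the L oldest samples fall out
    windowed = [w * x for w, x in zip(window, register)]
    c = []
    for k in range(n // 2 + 1):                   # naive DFT, bins 0 .. N/2
        acc = 0j
        for i, v in enumerate(windowed):
            acc += v * cmath.exp(-2j * math.pi * k * i / n)
        c.append(acc)
    return register, c


N, L = 64, 16
window = hann_window(N)
register = [0.0] * N
# Test tone placed exactly on bin 4 (500 Hz at an 8 kHz sampling rate).
tone = [math.cos(2.0 * math.pi * 4 * i / N) for i in range(4 * L)]
for start in range(0, len(tone), L):
    register, c = analysis_epoch(register, tone[start:start + L], window)
peak = max(range(len(c)), key=lambda k: abs(c[k]))
print(len(c), peak)
```

After four epochs the register is full of the tone, and the largest of the 33 sub-band magnitudes sits in bin 4.
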
As shown graphically in FIG. 4, the signal estimate s(k,m) in each sub-band is computed (block 4.1) using the following non-linear single-pole recursion:
$$s(k,m) = A\,s(k,m-1) + (1-A)\,|c(k,m)|$$
The value of the coefficient A is determined by a test (block 4.2) of whether the magnitude of the new data sample c(k,m) is greater, or not greater, than the current value of the signal estimate. Depending on the outcome of this test, A assumes (blocks 4.3, 4.4) one of two alternative values, namely an "attack" value A_ATTACK and a "decay" value A_DECAY, respectively. In our current implementation, a useful range for A_ATTACK is 1-10 ms, and a useful range for A_DECAY is 20-50 ms. These specific values are illustrative and not essential to the practice of the invention.
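
A minimal sketch of the signal-estimate recursion of FIG. 4 follows. The attack/decay switching comes from the text; the mapping from a time constant in seconds to the coefficient A (`smoothing_coefficient`, an exponential mapping) is an assumption made here for illustration, since the text gives the ranges in milliseconds without stating how they become coefficients. All function names are hypothetical.

```python
import math


def smoothing_coefficient(time_constant_s, epoch_period_s):
    """Map a time constant to a one-pole coefficient (assumed exponential mapping)."""
    return math.exp(-epoch_period_s / time_constant_s)


def update_signal_estimate(s_prev, c_sample, a_attack, a_decay):
    """One step of s(k,m) = A*s(k,m-1) + (1-A)*|c(k,m)| with attack/decay switching."""
    magnitude = abs(c_sample)
    # Attack when the new magnitude exceeds the current estimate, else decay.
    a = a_attack if magnitude > s_prev else a_decay
    return a * s_prev + (1.0 - a) * magnitude


epoch = 16 / 8000.0                                # L / sampling rate = 2 ms
a_attack = smoothing_coefficient(0.005, epoch)     # 5 ms, inside the 1-10 ms range
a_decay = smoothing_coefficient(0.030, epoch)      # 30 ms, inside the 20-50 ms range
s = 0.0
for sample in [1 + 0j] * 10 + [0j] * 10:           # a burst followed by silence
    s = update_signal_estimate(s, sample, a_attack, a_decay)
print(round(s, 4))
```

Because the attack coefficient is smaller than the decay coefficient, the estimate rises quickly at the onset of the burst and falls more slowly afterwards.
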
As shown graphically in FIG. 5, the noise estimate n(k,m) in each sub-band is computed (block 5.1) using the following non-linear single-pole recursion:
$$n(k,m) = B\,n(k,m-1) + (1-B)\,|c(k,m)|$$
The value of the coefficient B is determined by a test (block 5.2) of whether the magnitude of the new data sample c(k,m) is greater, or not greater, than the current value of the noise estimate. Depending on the outcome of this test, B assumes (blocks 5.3, 5.4) one of two alternative values, namely an "attack" value B_ATTACK and a "decay" value B_DECAY, respectively. In our current implementation, a useful range for B_ATTACK is 1-10 seconds, and a useful range for B_DECAY is 1-50 ms. These values are illustrative and not essential to the practice of the invention.
As also shown in FIG. 5, the updating of the noise estimate is advantageously conditioned on a test (block 5.5) of whether the magnitude of the new data sample c(k,m) is less than the current value of the noise estimate, times a multiplier T. By way of illustration, our current implementation has T=20. This prevents an update of the noise estimate if the new data sample exceeds the current value of the noise estimate by 26 dB. This condition prevents the noise estimate from being unduly biased (upward) by samples whose magnitudes are high enough that they assuredly represent speech or other non-stationary signal energy. I have found that this condition significantly improves the stability of the noise estimate for extended speech utterances.
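
The 26 dB figure follows from the multiplier T = 20 if the threshold is read as a ratio of magnitudes (amplitudes), which is the assumption made in this short check:

$$20\log_{10} T = 20\log_{10} 20 \approx 26.02\ \text{dB}$$
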
As also shown in FIG. 5, it is advantageous, in at least some cases, to impose (block 5.6) an upper bound, denoted NOISE_PROFILE(k), on the noise estimate in each sub-band. NOISE_PROFILE(k) is advantageously matched to the dynamic range of the corrupted signal to be enhanced. The practical effect of this upper bound is to automatically inhibit the enhancement process in abnormally noisy environments. Such inhibition is useful for preventing speech-processing artifacts that often arise in such environments and that are perceived as unacceptable distortion.
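
The three elements of the noise-estimation stage of FIG. 5 (the single-pole recursion, the inhibition of updates on probable speech samples, and the NOISE_PROFILE upper bound) can be combined in one update step. The sketch below is a hedged illustration: the function name, argument names, and the order in which the cap is applied are assumptions, and the coefficients are passed in directly rather than derived from the time constants quoted in the text.

```python
def update_noise_estimate(n_prev, c_sample, b_attack, b_decay,
                          threshold=20.0, noise_cap=None):
    """One step of the noise recursion with update inhibition and an upper bound."""
    magnitude = abs(c_sample)
    if magnitude >= threshold * n_prev:
        # Probable speech (block 5.5): leave the estimate unchanged.
        n_new = n_prev
    else:
        # Attack if the sample exceeds the estimate, otherwise decay (blocks 5.2-5.4).
        b = b_attack if magnitude > n_prev else b_decay
        n_new = b * n_prev + (1.0 - b) * magnitude
    if noise_cap is not None:
        # Limit the maximum attainable noise level (block 5.6).
        n_new = min(n_new, noise_cap)
    return n_new


# The estimate must start above zero, or every sample would be treated as speech.
n = 1.0
for sample in [2 + 0j, 30 + 0j, 0.5 + 0j]:
    n = update_noise_estimate(n, sample, b_attack=0.5, b_decay=0.25, noise_cap=1.2)
    print(n)
```

In this run the first sample would raise the estimate to 1.5 but is capped at 1.2, the second is more than 20 times the estimate and is ignored, and the third pulls the estimate down quickly through the decay coefficient.
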
It should be noted that whereas other forms can be used for the signal and noise estimates, the non-linear single-pole recursion relations discussed above for the signal and noise estimates are advantageous because they are computationally simple. Moreover, they have the desirable property of adapting to changes in the character and absolute level of the noise and signal processes. Indeed, practitioners have recognized this and have widely used these relations in various voice-processing applications.
As shown in FIG. 6, the narrowband deflection is obtained as the ratio of the sub-band signal estimate to the sub-band noise estimate. That is,
$$d(k,m) = \frac{s(k,m)}{n(k,m)}$$
I have found that for detection of broadband energy, it is advantageous to combine, in a certain sense, the results of two or more narrowband deflection ratios. That is, a lumped broadband deflection coefficient is advantageously computed by taking an arithmetic average of 2K+1 narrowband deflection coefficients (K a positive integer) in a range of sub-bands centered about a given sub-band, each of these coefficients taken relative to the noise estimate in the given sub-band. Thus, as shown in FIG. 7, the broadband deflection coefficient D(k,m) is given by:
$$D(k,m) = \frac{s(k-K,m) + s(k-K+1,m) + \cdots + s(k+K,m)}{(2K+1)\,n(k,m)}$$
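
The two deflections can be sketched for one epoch as follows. The narrowband deflection is the per-band ratio of signal estimate to noise estimate; the broadband deflection averages the signal estimates of the 2K+1 bands centered on band k and divides by the noise estimate of band k alone. Function names are hypothetical, the noise estimates are assumed to be strictly positive, and returning `None` for edge bands where the averaging range would run past index 0 or M-1 is merely an illustrative convention for "not computed".

```python
def narrowband_deflection(s, n):
    """d(k,m) = s(k,m) / n(k,m) for every sub-band k."""
    return [s_k / n_k for s_k, n_k in zip(s, n)]


def broadband_deflection(s, n, half_width):
    """D(k,m) for K <= k <= M-K-1; None where the range leaves the band indices."""
    m = len(s)
    big_d = [None] * m
    for k in range(half_width, m - half_width):
        neighbourhood = s[k - half_width:k + half_width + 1]   # 2K+1 signal estimates
        big_d[k] = sum(neighbourhood) / ((2 * half_width + 1) * n[k])
    return big_d


s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # signal estimates for M = 7 bands
n = [1.0] * 7                              # noise estimates
d = narrowband_deflection(s, n)
big_d = broadband_deflection(s, n, half_width=2)
print(d)
print(big_d)
```
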
According to the first of these formulas, the narrowband and broadband deflection coefficients are each normalized to a respective threshold GAMMA_NB or GAMMA_BB. These thresholds represent the respective levels at which the deflection ratios are declared to indicate a certainty of speech energy. In a current implementation, both of these thresholds are set to 30.0.
The greater of the two normalized deflection coefficients determines the value of PHI(k,m). An expansion factor p controls the rate at which the lumped deflection ratio decays for deflection ratios less than unity. According to a current implementation, p is equal to unity, providing linear decay with the envelope of the sub-band signal energy. The first formula is expressed by:
$$\mathrm{PHI}(k,m) = \left\{\max\left[\frac{d(k,m)}{\mathrm{GAMMA\_NB}},\ \frac{D(k,m)}{\mathrm{GAMMA\_BB}}\right]\right\}^{p}$$
According to the second formula, the lumped deflection coefficient is determined by the narrowband deflection coefficient and the expansion factor. The second formula is expressed by:
$$\mathrm{PHI}(k,m) = \left[\frac{d(k,m)}{\mathrm{GAMMA\_NB}}\right]^{p}$$
As shown in FIG. 9, the signal gain function g(k,m) is determined by PHI(k,m), but has an upper bound of unity. That is,
$$g(k,m) = \min\left[1,\ \mathrm{PHI}(k,m)\right]$$
Thus, each sub-band time series having a deflection of unity or less is passed to the synthesis filter bank with gain given by PHI(k,m), but each such series having a greater deflection is passed to the synthesis bank with unity gain.
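
The lumped deflection and the gain bound can be sketched per sub-band as below. The default thresholds of 30.0 and the unity expansion factor are the illustrative values quoted in the text; the function names, and the use of `None` to signal that no broadband deflection is available for a band (in which case only the narrowband term is used), are assumptions made for this example.

```python
def lumped_deflection(d_k, big_d_k, gamma_nb=30.0, gamma_bb=30.0, p=1.0):
    """PHI(k,m): use both deflections where D is available, else narrowband only."""
    ratio = d_k / gamma_nb
    if big_d_k is not None:
        # Take the greater of the two normalized deflections.
        ratio = max(ratio, big_d_k / gamma_bb)
    return ratio ** p


def subband_gain(phi):
    """g(k,m) = min(1, PHI(k,m)): unity when speech is judged certain."""
    return min(1.0, phi)


# Broadband energy dominates: normalized deflection above 1, so the gain is unity.
print(subband_gain(lumped_deflection(15.0, 60.0)))
# Edge band with no broadband deflection: the gain follows d / GAMMA_NB.
print(subband_gain(lumped_deflection(15.0, None)))
```

Raising p above unity makes the gain fall off faster as the normalized deflection drops below one.
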
As shown in FIG. 10, the input to the sub-band synthesis stage (in each processing epoch of index m) includes one complex time-series sample g(k,m)·c(k,m) for each of the M sub-bands. These M samples are processed by inverse FFT 160 to produce an output vector of length N, as is well known in the art. This output vector is processed by synthesis window (of length N) 170, which is the counterpart, on the synthesis side, of analysis window 140. The output of synthesis window 170 is a further vector of length N. This vector is input to accumulator 180, which is the counterpart on the synthesis end of accumulator 130.
Input to accumulator 180 takes place in frames of length N. Output from accumulator 180 takes place in blocks of length L. Data are transferred to the accumulator in an overlap-and-add operation. In such an operation, the new (processed) samples are added to the previous values stored in corresponding cells of the accumulator. When L samples are shifted out of the output end of the accumulator, a sequence of L zeroes is inserted at the input end. The output of accumulator 180 corresponds to the noise-reduced speech, y(n).
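
The overlap-and-add bookkeeping of accumulator 180 can be illustrated in isolation. The sketch below covers only the accumulator step (add a length-N frame, shift out L samples, insert L zeroes); the inverse transform and synthesis window are omitted, and the function name `overlap_add_step` is hypothetical.

```python
def overlap_add_step(accumulator, frame, hop):
    """Add a length-N frame into the accumulator, then shift out L samples."""
    if len(frame) != len(accumulator):
        raise ValueError("frame and accumulator must both have length N")
    # New processed samples are added to the values already stored.
    summed = [a + f for a, f in zip(accumulator, frame)]
    # L samples leave the output end; L zeroes enter at the input end.
    output_block = summed[:hop]
    new_accumulator = summed[hop:] + [0.0] * hop
    return output_block, new_accumulator


# Toy sizes (N = 4, L = 2) so the overlap is easy to follow by hand.
acc = [0.0] * 4
out1, acc = overlap_add_step(acc, [1.0, 1.0, 1.0, 1.0], hop=2)
out2, acc = overlap_add_step(acc, [1.0, 1.0, 1.0, 1.0], hop=2)
print(out1, out2, acc)
```

The second output block is twice the first because it contains the overlapping tails of two consecutive frames.
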
It will be appreciated that the inventive method involves a modest number of adjustable parameters. Although at least some of these will typically be set in the factory, others can optionally be set in the field, either manually by the user or automatically. Exemplary field-settable parameters may include, among others, the bandwidth 2K+1 for broadband speech detection, the expansion coefficient p, and the respective speech thresholds GAMMA_NB and GAMMA_BB.
In one illustrative scenario, a user of a telephone desires to improve the intelligibility of far-end speech; that is, of speech that is received from a remote location. Manual controls are readily provided so that such a user can select those values of the field-settable parameters that afford the greatest speech intelligibility as perceived by that user.
`
It should be noted in this regard that D(k,m) cannot be evaluated for values of k less than K. It should further be noted that M-1 is the maximum sub-band index. Thus, D(k,m) cannot be evaluated for values of k greater than M-K-1.
In a current implementation, the value of K is 2. Other values of K (including the unity value as well as values greater than 2) are readily chosen to provide optimal performance in specific applications.
I have found that the expression given above for D(k,m), in which the central sub-band noise estimate appears directly in the denominator, is generally preferable to an arithmetic average of 2K+1 distinct narrowband deflection coefficients. This is because, for some classes of broadband voice utterances, the frequency band edges of the utterance that are poorly represented by the narrowband deflection coefficient are better represented by a broadband deflection coefficient that incorporates only the signal estimate from bands neighboring those edges.
Other techniques can also be used to obtain a broadband deflection coefficient. For example, an alternate embodiment is readily implemented that includes a second sub-band filter architecture having broader sub-bands than that described above. (Such sub-bands may be referred to, e.g., as "auxiliary" sub-bands.) Broadband deflection coefficients are obtained by, e.g., a procedure analogous to the computation of d(k,m), but using this second filter architecture. Th
