United States Patent [19]
Arslan et al.

US005706395A

[11] Patent Number: 5,706,395
[45] Date of Patent: Jan. 6, 1998

[54] ADAPTIVE WIENER FILTERING USING A DYNAMIC SUPPRESSION FACTOR

[75] Inventors: Levent M. Arslan, Durham, N.C.; Alan V. McCree, Dallas; Vishu R. Viswanathan, Plano, both of Tex.

[73] Assignee: Texas Instruments Incorporated, Dallas, Tex.

[21] Appl. No.: 425,125

[22] Filed: Apr. 19, 1995

[51] Int. Cl.6 ........ G10L 3/02
[52] U.S. Cl. ........ 395/2.35; 395/2.36; 395/2.37
[58] Field of Search ........ 395/2.35, 2.36, 2.37, 2.38, 2.42, 2.28; 381/34, 36

[56] References Cited

PUBLICATIONS

Deller et al., "Discrete-Time Processing of Speech Signals," Prentice-Hall, Inc., pp. 506-528, 1987.
Arslan et al., "New Methods for Adaptive Noise Suppression," ICASSP '95: Acoustics, Speech & Signal Processing Conference, pp. 812-815, May 1995.

Primary Examiner-Allen R. MacDonald
Assistant Examiner-Vijay Chawan
Attorney, Agent, or Firm-Carlton H. Hoel; W. James Brady; Richard L. Donaldson

[57] ABSTRACT

An acoustic noise suppression filter including attenuation filtering with a noise suppression factor depending upon the ratio of estimated noise energy of a frame divided by estimated signal energy.

12 Claims, 7 Drawing Sheets
[Representative drawing (FIG. 4): flow diagram of the SNR-adaptive generalized Wiener filter. The windowed input frame y(j) feeds an FFT producing Y(ω) and an LPC analysis producing coefficients a_i, a DFT giving P_Y(ω), and a frame energy E_Y; the preceding frame supplies the noise spectrum P_N'(ω), noise energy E_N, and suppression factor α', which the update-noise and update-α blocks refresh to P_N(ω) and α; the filter parameters block (clamp and smooth) produces H(ω) and the speech estimate Ŝ(ω).]
[Sheet 1 of 7: FIG. 1a, speech coding system 100 (sampling A/D convertor 102, noise suppression 104, analysis 106, transmit/storage 108, synthesis 110, DAC 112); FIG. 1b, voice recognition system 150 (sampling A/D convertor, noise suppression, recognition analysis, output); FIG. 2, noise suppression subsystem 200 (speech sample stream, frame buffer 202, FFT 204, multiplier 206, noise filter 208, IFFT 210, noise suppressed frame buffer 212).]
[Sheet 2 of 7: FIG. 3, flow diagram of the smoothed spectral subtraction filter (windowed input frame y(j), FFT, Y(ω); preceding frame noise N0(ω), update noise N(ω), increase noise 2N(ω); smooth W∗|Y|²(ω); clamp; H(ω); Ŝ(ω)); FIG. 4, flow diagram of the SNR-adaptive generalized Wiener filter (y(j), FFT, Y(ω); LPC, a_i, DFT, P_Y(ω), energy E_Y; preceding frame P_N'(ω), E_N, α'; update noise P_N(ω), update α; filter parameters with clamp and smooth, H(ω), Ŝ(ω)).]
[Sheet 3 of 7: FIG. 5, flow diagram of the codebook-based generalized Wiener filter (windowed input frame y(j), FFT, Y(ω); LPC, a_i, LSF_i, distances d_i against a codebook of noise-free LSFs, weighted LSFs, LPC, DFT, P_Ŝ(ω), iteration; preceding frame noise P_N(ω), noise update; H(ω), clamp, IFFT, output); FIG. 6, flow diagram of framewise scaling (windowed input frame y(j), energy measure, scale, c·y(j), FFT, H(ω), IFFT, inverse scale, output).]
[Sheet 4 of 7: FIG. 7, suppression factor H(ω) in dB (0 to -30 dB) versus unprocessed input signal-to-noise ratio in dB (0 to 10 dB), with curves labelled "spectral subtraction", "noise increased", and "clamped"; FIG. 8, distribution of spectral estimates for white Gaussian noise versus power spectrum in dB (-20 to 10 dB), with curves for no smoothing and for 5-, 33-, and 128-element smoothing.]
[Sheet 5 of 7: FIGS. 9a and 9b, block diagrams of spectral subtraction preferred embodiment systems 900 and 950, each with an input 902, a noise buffer 908/910 holding the noise estimate P_N', spectral estimate paths P_Y, processing elements 904-930 and 954, and an output.]
[Sheet 6 of 7: FIG. 10a, log H(ω) versus log P_Y(ω)/P_N(ω), comparing an adaptive clamp with a constant clamp; FIG. 10b, the same axes for standard spectral subtraction; FIG. 13, block diagram of internal precision control system 1300 (input, elements 1302-1308, output).]
[Sheet 7 of 7: FIG. 11, block diagram of modified Wiener filter system 1100 (input 1102, autocorrelator producing r(j), LPC coefficient analyzer 1104/1106, P_Y(ω) path 1112/1114, elements 1120-1130, output); FIG. 12, block diagram of codebook-based generalized Wiener filter system 1200 (input 1202, r(j) 1204, LPC-to-LSF 1206/1208, LSF_j 1210, codebook of LSFs 1212, elements 1220-1240 including P_N' and P_Ŝ paths, output).]
ADAPTIVE WIENER FILTERING USING A DYNAMIC SUPPRESSION FACTOR

CROSS-REFERENCE TO RELATED APPLICATIONS

Cofiled patent applications with Ser. Nos. 08/424,928, 08/426,426, 08/426,746 and 08/426,427 disclose related subject matter. These applications all have a common assignee.

BACKGROUND OF THE INVENTION

The invention relates to electronic devices, and, more particularly, to speech analysis and synthesis devices and systems.

Human speech consists of a stream of acoustic signals with frequencies ranging up to roughly 20 KHz; but the band of 100 Hz to 5 KHz contains the bulk of the acoustic energy. Telephone transmission of human speech originally consisted of conversion of the analog acoustic signal stream into an analog electrical voltage signal stream (e.g., microphone) for transmission and reconversion to an acoustic signal stream (e.g., loudspeaker) for reception.

The advantages of digital electrical signal transmission led to a conversion from analog to digital telephone transmission beginning in the 1960s. Typically, digital telephone signals arise from sampling analog signals at 8 KHz and nonlinearly quantizing the samples with 8-bit codes according to the μ-law (pulse code modulation, or PCM). A clocked digital-to-analog converter and companding amplifier reconstruct an analog electrical signal stream from the stream of 8-bit samples. Such signals require transmission rates of 64 Kbps (kilobits per second). Many communications applications, such as digital cellular telephone, cannot handle such a high transmission rate, and this has inspired various speech compression methods.

The storage of speech information in analog format (e.g., on magnetic tape in a telephone answering machine) can likewise be replaced with digital storage. However, the memory demands can become overwhelming: 10 minutes of 8-bit PCM sampled at 8 KHz would require about 5 MB (megabytes) of storage. This demands speech compression analogous to digital transmission compression.

One approach to speech compression models the physiological generation of speech and thereby reduces the necessary information transmitted or stored. In particular, the linear speech production model presumes excitation of a variable filter (which roughly represents the vocal tract) by either a pulse train for voiced sounds or white noise for unvoiced sounds, followed by amplification or gain to adjust the loudness. The model produces a stream of sounds simply by periodically making a voiced/unvoiced decision plus adjusting the filter coefficients and the gain. Generally, see Markel and Gray, Linear Prediction of Speech (Springer-Verlag 1976).

More particularly, the linear prediction method partitions a stream of speech samples s(n) into "frames" of, for example, 180 successive samples (22.5 msec intervals for an 8 KHz sampling rate); and the samples in a frame then provide the data for computing the filter coefficients for use in coding and synthesis of the sound associated with the frame. Each frame generates coded bits for the linear prediction filter coefficients (LPC), the pitch, the voiced/unvoiced decision, and the gain. This approach of encoding only the model parameters represents far fewer bits than encoding the entire frame of speech samples directly, so the transmission rate may be only 2.4 Kbps rather than the 64 Kbps of PCM. In practice, the LPC coefficients must be quantized for transmission, and the sensitivity of the filter behavior to the quantization error has led to quantization based on the Line Spectral Frequencies (LSF) representation.
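The frame-based linear prediction analysis just described can be made concrete with a few lines of Python. This is only an illustration of the autocorrelation method for one frame; the model order of 10, the synthetic test signal, and the direct solve of the normal equations are assumptions for the example, and the pitch, voicing, gain, and LSF quantization steps are omitted.

    import numpy as np

    def lpc_coefficients(frame, order=10):
        """Estimate LPC coefficients for one frame by the autocorrelation method."""
        # Autocorrelation r(0)..r(order) of the frame samples.
        r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
        # Solve the Toeplitz normal equations R a = r for the predictor coefficients.
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])
        return a  # s(n) is predicted as sum over k of a[k] * s(n - 1 - k)

    # Example: one 180-sample frame (22.5 msec at 8 KHz) of a synthetic signal.
    fs = 8000
    frame = np.sin(2 * np.pi * 440 * np.arange(180) / fs) + 0.01 * np.random.randn(180)
    a = lpc_coefficients(frame, order=10)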
To improve the sound quality, further information may be extracted from the speech, compressed, and transmitted or stored along with the LPC coefficients, pitch, voicing, and gain. For example, the codebook excitation linear prediction (CELP) method first analyzes a speech frame to find the LPC filter coefficients, and then filters the frame with the LPC filter. Next, CELP determines a pitch period from the filtered frame and removes this periodicity with a comb filter to yield a noise-looking excitation signal. Lastly, CELP encodes the excitation signals using a codebook. Thus CELP transmits the LPC filter coefficients, pitch, gain, and the codebook index of the excitation signal.

The advent of digital cellular telephones has emphasized the role of noise suppression in speech processing, both coding and recognition. Customer expectation of high performance even in extreme car noise situations plus the demand to move to progressively lower data rate speech coding in order to accommodate the ever-increasing number of cellular telephone customers have contributed to the importance of noise suppression. While higher data rate speech coding methods tend to maintain robust performance even in high noise environments, that typically is not the case with lower data rate speech coding methods. The speech quality of low data rate methods tends to degrade drastically with high additive noise. Noise suppression to prevent such speech quality losses is important, but it must be achieved without introducing any undesirable artifacts or speech distortions or any significant loss of speech intelligibility. These performance goals for noise suppression have existed for many years, and they have recently come to the forefront due to the digital cellular telephone application.

FIG. 1a schematically illustrates an overall system 100 of modules for speech acquisition, noise suppression, analysis, transmission/storage, synthesis, and playback. A microphone converts sound waves into electrical signals, and sampling analog-to-digital converter 102 typically samples at 8 KHz to cover the speech spectrum up to 4 KHz. System 100 may partition the stream of samples into frames with smooth windowing to avoid discontinuities. Noise suppression 104 filters a frame to suppress noise, and analyzer 106 extracts LPC coefficients, pitch, voicing, and gain from the noise-suppressed frame for transmission and/or storage 108. The transmission may be any type used for digital information transmission, and the storage may likewise be any type used to store digital information. Of course, types of encoding analysis other than LPC could be used. Synthesizer 110 combines the LPC coefficients, pitch, voicing, and gain information to synthesize frames of sampled speech which digital-to-analog convertor (DAC) 112 converts to analog signals to drive a loudspeaker or other playback device to regenerate sound waves.

FIG. 1b shows an analogous system 150 for voice recognition with noise suppression. The recognition analyzer may simply compare input frames with frames from a database or may analyze the input frames and compare parameters with known sets of parameters. Matches found between input frames and stored information provide recognition output.

One approach to noise suppression in speech employs spectral subtraction and appears in Boll, Suppression of
Acoustic Noise in Speech Using Spectral Subtraction, 27 IEEE Tr. ASSP 113 (1979), and Lim and Oppenheim, Enhancement and Bandwidth Compression of Noisy Speech, 67 Proc. IEEE 1586 (1979). Spectral subtraction proceeds roughly as follows. Presume a sampled speech signal s(j) with uncorrelated additive noise n(j) to yield an observed windowed noisy speech y(j) = s(j) + n(j). These are random processes over time. Noise is assumed to be a stationary process in that the process's autocorrelation depends only on the difference of the variables; that is, there is a function r_n(.) such that:

r_n(i-j) = E{n(j)n(i)}

where E is the expectation. The Fourier transform of the autocorrelation is called the power spectral density, P_n(ω). If speech were also a stationary process with autocorrelation r_s(j) and power spectral density P_s(ω), then the power spectral densities would add due to the lack of correlation:

P_y(ω) = P_s(ω) + P_n(ω)

Hence, an estimate for P_s(ω), and thus s(j), could be obtained from the observed noisy speech y(j) and the noise observed during intervals of (presumed) silence in the observed noisy speech. In particular, take P_Y(ω) as the squared magnitude of the Fourier transform of y(j) and P_N(ω) as the squared magnitude of the Fourier transform of the observed noise.

Of course, speech is not a stationary process, so Lim and Oppenheim modified the approach as follows. Take s(j) not to represent a random process but rather to represent a windowed speech signal (that is, a speech signal which has been multiplied by a window function), n(j) a windowed noise signal, and y(j) the resultant windowed observed noisy speech signal. Then Fourier transforming and multiplying by complex conjugates yields:

|Y(ω)|² = |S(ω)|² + |N(ω)|² + 2Re{S(ω)N(ω)*}

For ensemble averages the last term on the righthand side of the equation equals zero due to the lack of correlation of noise with the speech signal. This equation thus yields an estimate, S'(ω), for the speech signal Fourier transform as:

|S'(ω)|² = |Y(ω)|² - E{|N(ω)|²}

This resembles the preceding equation for the addition of power spectral densities.

An autocorrelation approach for the windowed speech and noise signals simplifies the mathematics. In particular, the autocorrelation for the speech signal is given by

r_s(j) = Σ_i s(i)s(i+j),

with similar expressions for the autocorrelation for the noisy speech and the noise. Thus the noisy speech autocorrelation is:

r_y(j) = r_s(j) + r_n(j) + c_sn(j) + c_ns(j)

where c_sn(.) is the cross correlation of s(j) and n(j). But the speech and noise signals should be uncorrelated, so the cross correlations can be approximated as 0. Hence, r_y(j) = r_s(j) + r_n(j). And the Fourier transforms of the autocorrelations are just the power spectral densities, so

P_Y(ω) = P_S(ω) + P_N(ω)

Of course, P_Y(ω) equals |Y(ω)|² with Y(ω) the Fourier transform of y(j) due to the autocorrelation being just a convolution with a time-reversed variable.

The power spectral density P_N(ω) of the noise signal can be estimated by detection during noise-only periods, so the speech power spectral estimate becomes

|S'(ω)|² ≈ |Y(ω)|² - |N(ω)|² ≈ P_Y(ω) - P_N(ω)

which is the spectral subtraction. The spectral subtraction method can be interpreted as a time-varying linear filter H(ω) so that S'(ω) = H(ω)Y(ω), which the foregoing estimate then defines as:

H(ω)² = [P_Y(ω) - P_N(ω)]/P_Y(ω)

The ultimate estimate for the frame of windowed speech, s'(j), then equals the inverse Fourier transform of S'(ω), and then combining the estimates from successive frames ("overlap add") yields the estimated speech stream.
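As an illustration of the classical spectral subtraction just described, a minimal Python sketch for one windowed frame follows; flooring the negative power differences at zero and reusing the noisy phase for the inverse transform are standard practical choices here, not steps prescribed by the text.

    import numpy as np

    def spectral_subtraction(y_frame, noise_power):
        """Estimate the clean-speech frame from one windowed noisy frame.

        y_frame     : windowed noisy speech samples y(j)
        noise_power : estimate of |N(w)|^2 from noise-only periods (same FFT length)
        """
        Y = np.fft.fft(y_frame)
        P_Y = np.abs(Y) ** 2
        # |S'(w)|^2 = |Y(w)|^2 - |N(w)|^2, floored at zero to avoid negative power.
        P_S = np.maximum(P_Y - noise_power, 0.0)
        # Equivalent filter view: H(w)^2 = (P_Y - P_N)/P_Y, applied to Y(w).
        H = np.sqrt(P_S / np.maximum(P_Y, 1e-12))
        S_hat = H * Y                       # keep the noisy phase
        return np.real(np.fft.ifft(S_hat))  # estimated windowed speech s'(j)

Successive frame estimates would then be combined by overlap-add as described above.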
This spectral subtraction can attenuate noise substantially, but it has problems including the introduction of fluctuating tonal noises commonly referred to as musical noises.

The Lim and Oppenheim article also describes an alternative noise suppression approach using noncausal Wiener filtering which minimizes the mean-square error. That is, again Ŝ(ω) = H(ω)Y(ω) but with H(ω) now given by:

H(ω) = P_S(ω)/[P_S(ω) + P_N(ω)]

This Wiener filter generalizes to:

H(ω) = [P_S(ω)/(P_S(ω) + α·P_N(ω))]^β

where constants α and β are called the noise suppression factor and the filter power, respectively. Indeed, α = 1 and β = 1/2 leads to the spectral subtraction method in the following.

A noncausal Wiener filter cannot be directly applied to provide an estimate for s(j) because speech is not stationary and the power spectral density P_S(ω) is not known. Thus approximate the noncausal Wiener filter by an adaptive generalized Wiener filter which uses the squared magnitude of the estimate Ŝ(ω) in place of P_S(ω):

H(ω) = [|Ŝ(ω)|²/(|Ŝ(ω)|² + α·E{|N(ω)|²})]^β

Recalling Ŝ(ω) = H(ω)Y(ω) and then solving for |Ŝ(ω)| in the β = 1/2 case yields:

|Ŝ(ω)| = [|Y(ω)|² - α·E{|N(ω)|²}]^(1/2)

which just replicates the spectral subtraction method when α = 1.

However, this generalized Wiener filtering has problems including how to estimate Ŝ, and estimators usually apply an iterative approach with perhaps a half dozen iterations which increases computational complexity.
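A minimal sketch of this iterative generalized Wiener filtering follows; the half dozen iterations match the remark above, while the initial speech power estimate taken from plain spectral subtraction and the example values α = 2 and β = 1 are illustrative assumptions.

    import numpy as np

    def generalized_wiener(y_frame, noise_power, alpha=2.0, beta=1.0, iterations=6):
        """Iterative generalized Wiener filtering of one windowed noisy frame.

        H(w) = [P_S(w) / (P_S(w) + alpha * P_N(w))]**beta, with P_S(w) replaced
        by the squared magnitude of the current speech estimate S_hat(w).
        """
        Y = np.fft.fft(y_frame)
        P_Y = np.abs(Y) ** 2
        # Initial speech power estimate from plain spectral subtraction.
        P_S = np.maximum(P_Y - noise_power, 0.0)
        for _ in range(iterations):
            H = (P_S / (P_S + alpha * noise_power + 1e-12)) ** beta
            S_hat = H * Y
            P_S = np.abs(S_hat) ** 2   # re-estimate the speech power for the next pass
        return np.real(np.fft.ifft(S_hat))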
Ephraim, A Minimum Mean Square Error Approach for Speech Enhancement, Conf. Proc. ICASSP 829 (1990), derived a Wiener filter by first analyzing noisy speech to find linear prediction coefficients (LPC) and then resynthesizing an estimate of the speech to use in the Wiener filter.

In contrast, O'Shaughnessy, Speech Enhancement Using Vector Quantization and a Formant Distance Measure, Conf. Proc. ICASSP 549 (1988), computed noisy speech formants and selected quantized speech codewords to represent the speech based on formant distance; the speech was resynthesized from the codewords. This has problems including degradation for high signal-to-noise signals because of the speech quality limitations of the LPC synthesis.
The Fourier transforms of the windowed sampled speech signals in systems 100 and 150 can be computed in either fixed point or floating point format. Fixed point is cheaper to implement in hardware but has less dynamic range for a comparable number of bits. Automatic gain control limits the dynamic range of the speech samples by adjusting magnitudes according to a moving average of the preceding sample magnitudes, but this also destroys the distinction between loud and quiet speech. Further, the acoustic energy may be concentrated in a narrow frequency band, and the Fourier transform will have large dynamic range even for speech samples with relatively constant magnitude. To compensate for such overflow potential in fixed point format, a few bits may be reserved for large Fourier transform dynamic range; but this implies a loss of resolution for small magnitude samples and consequent degradation of quiet speech. This is especially true for systems which follow a Fourier transform with an inverse Fourier transform.

SUMMARY OF THE INVENTION

The present invention provides speech noise suppression by spectral subtraction filtering improved with filter clamping, limiting, and/or smoothing, plus generalized Wiener filtering with a signal-to-noise ratio dependent noise suppression factor, and plus a generalized Wiener filter based on a speech estimate derived from codebook noisy speech analysis and resynthesis. And each frame of samples has a frame-energy-based scaling applied prior to and after Fourier analysis to preserve quiet speech resolution.

The invention has advantages including simple speech noise suppression.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings are schematic for clarity.

FIGS. 1a-b show speech systems with noise suppression.

FIG. 2 illustrates a preferred embodiment noise suppression subsystem.

FIGS. 3-5 are flow diagrams for preferred embodiment noise suppression.

FIG. 6 is a flow diagram for a framewise scaling preferred embodiment.

FIGS. 7-8 illustrate spectral subtraction preferred embodiment aspects.

FIGS. 9a-b show spectral subtraction preferred embodiment systems.

FIGS. 10a-b illustrate spectral subtraction preferred embodiments with adaptive minimum gain clamping.

FIG. 11 is a block diagram of a modified Wiener filter preferred embodiment system.

FIG. 12 shows a codebook based generalized Wiener filter preferred embodiment system.

FIG. 13 illustrates a preferred embodiment internal precision control system.

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Overview

FIG. 2 shows a preferred embodiment noise suppression filter system 200. In particular, frame buffer 202 partitions an incoming stream of speech samples into overlapping frames of 256-sample size and windows the frames; FFT module 204 converts the frames to the frequency domain by fast Fourier transform; multiplier 206 pointwise multiplies the frame by the filter coefficients generated in noise filter block 208; and IFFT module 210 converts back to the time domain by inverse fast Fourier transform. Noise suppressed frame buffer 212 holds the filtered output for speech analysis, such as LPC coding, recognition, or direct transmission. The filter coefficients in block 208 derive from estimates for the noise spectrum and the noisy speech spectrum of the frame, and thus adapt to the changing input. All of the noise suppression computations may be performed with a standard digital signal processor such as a TMS320C25, which can also perform the subsequent speech analysis, if any. Also, general purpose microprocessors or specialized hardware could be used.

The preferred embodiment noise suppression filters may also be realized without Fourier transforms; however, the multiplication of Fourier transforms then corresponds to convolution of functions.

The preferred embodiment noise suppression filters may each be used as the noise suppression blocks in the generic systems of FIGS. 1a-b to yield preferred embodiment systems.

The smoothed spectral subtraction preferred embodiments have a spectral subtraction filter which (1) clamps attenuation to limit suppression for inputs with small signal-to-noise ratios, (2) increases the noise estimate to avoid filter fluctuations, (3) smoothes noisy speech and noise spectra used for filter definition, and (4) updates a noise spectrum estimate from the preceding frame using the noisy speech spectrum. The attenuation clamp may depend upon speech and noise estimates in order to lessen the attenuation (and distortion) for speech; this strategy may depend upon estimates only in a relatively noise-free frequency band. FIG. 3 is a flow diagram showing all four aspects for the generation of the noise suppression filter of block 208.

The signal-to-noise ratio adaptive generalized Wiener filter preferred embodiments use H(ω) = [P_Ŝ(ω)/(P_Ŝ(ω) + α·P_N(ω))]^β where the noise suppression factor α depends on E_N/E_Y, with E_N the noise energy and E_Y the noisy speech energy for the frame. These preferred embodiments also use a scaled LPC spectral approximation of the noisy speech for a smoothed speech power spectrum estimate as illustrated in the flow diagram of FIG. 4. FIG. 4 also illustrates an optional filtered α.
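A rough Python sketch of such an SNR-adaptive generalized Wiener filter follows. The linear mapping from the frame energy ratio E_N/E_Y to α, the α range of 1 to 6, and the use of the raw FFT power spectrum in place of the scaled LPC spectral approximation are illustrative assumptions; the text specifies only that α depends on E_N/E_Y.

    import numpy as np

    def snr_adaptive_wiener(y_frame, noise_power, beta=1.0,
                            alpha_min=1.0, alpha_max=6.0):
        """Generalized Wiener filter with a frame-adaptive suppression factor.

        The suppression factor alpha grows with the frame's noise-to-signal
        energy ratio E_N/E_Y, so noisier frames are attenuated more strongly.
        """
        Y = np.fft.fft(y_frame)
        P_Y = np.abs(Y) ** 2
        E_Y = np.sum(P_Y)            # noisy speech energy for the frame
        E_N = np.sum(noise_power)    # noise energy estimate for the frame
        # Hypothetical mapping: alpha interpolated linearly in E_N/E_Y, clipped to [0, 1].
        ratio = min(E_N / max(E_Y, 1e-12), 1.0)
        alpha = alpha_min + (alpha_max - alpha_min) * ratio
        P_S = np.maximum(P_Y - noise_power, 0.0)   # crude speech power estimate
        H = (P_S / (P_S + alpha * noise_power + 1e-12)) ** beta
        return np.real(np.fft.ifft(H * Y))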
The codebook-based generalized Wiener filter noise suppression preferred embodiments use H(ω) = [P_Ŝ(ω)/(P_Ŝ(ω) + α·P_N(ω))]^β with P_Ŝ(ω) estimated from LSFs as weighted sums of LSFs in a codebook of LSFs with the weights determined by the LSFs of the input noisy speech. Then iterate: use this H(ω) to form H(ω)Y(ω), next redetermine the input LSFs from H(ω)Y(ω), and then redetermine H(ω) with these LSFs as weights for the codebook LSFs. A half dozen iterations may be used. FIG. 5 illustrates the flow.

The power estimates used in the preferred embodiment filter definitions may also be used for adaptive scaling of low power signals to avoid loss of precision during FFT or other operations. The scaling factor adapts to each frame so that with fixed-point digital computations the scale expands or contracts the samples to provide a constant overflow headroom, and after the computations the inverse scale restores the frame power level. FIG. 6 illustrates the flow. This scaling applies without regard to automatic gain control and could even be used in conjunction with an automatic gain controlled input.
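The framewise scaling idea can be sketched as follows in Python; floating point stands in for the fixed-point arithmetic, and the RMS-based scale factor and target level are illustrative choices rather than values taken from the text.

    import numpy as np

    def filter_frame_with_scaling(y_frame, H, target_rms=0.25):
        """Scale a frame before the FFT/filter/IFFT chain, then undo the scale.

        The scale factor is derived from an energy measure of the frame, as in
        FIG. 6; the target RMS level is an illustrative choice.
        """
        rms = np.sqrt(np.mean(y_frame ** 2)) + 1e-12   # frame energy measure
        c = target_rms / rms                           # frame-adaptive scale factor
        filtered = np.real(np.fft.ifft(H * np.fft.fft(c * y_frame)))
        return filtered / c                            # inverse scale restores the level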
Smoothed Spectral Subtraction Preferred Embodiments

FIG. 3 illustrates as a flow diagram the various aspects of the spectral subtraction preferred embodiments as used to generate the filter.
A preliminary consideration of the standard spectral subtraction noise suppression simplifies explanation of the preferred embodiments. Thus first consider the standard spectral subtraction filter:

H(ω)² = [|Y(ω)|² - |N(ω)|²]/|Y(ω)|² = 1 - |N(ω)|²/|Y(ω)|²

A graph of this function with logarithmic scales appears in FIG. 7 labelled "standard spectral subtraction". Indeed, spectral subtraction consists of applying a frequency-dependent attenuation to each frequency in the noisy speech power spectrum with the attenuation tracking the input signal-to-noise power ratio at each frequency. That is, H(ω) represents a linear time-varying filter. Consequently, as shown in FIG. 7, the amount of attenuation varies rapidly with input signal-to-noise power ratio, especially when the input signal and noise are nearly equal in power. When the input signal contains only noise, the filtering produces musical noise because the estimated input signal-to-noise power ratio at each frequency fluctuates due to measurement error, producing attenuation with random variation across frequencies and over time. FIG. 8 shows the probability distribution of the FFT power spectral estimate at a given frequency of white noise with unity power (labelled "no smoothing"), and illustrates the amount of variation which can be expected.

The preferred embodiments modify this standard spectral subtraction in four independent but synergistic approaches as detailed in the following.

Preliminarily, partition an input stream of noisy speech sampled at 8 KHz into 256-sample frames with a 50% overlap between successive frames; that is, each frame shares its first 128 samples with the preceding frame and shares its last 128 samples with the succeeding frame. This yields an input stream of frames with each frame having 32 msec of samples and a new frame beginning every 16 msec. Next, multiply each frame with a Hann window of width 256. (A Hann window has the form w(k) = (1 + cos(2πk/K))/2 with K+1 the window width.) Thus each frame has 256 samples y(j), and the frames add to reconstruct the input speech stream.

Fourier transform the windowed speech to find Y(ω) for the frame; the noise spectrum estimation differs from the traditional methods and appears in modification (4).
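A short Python sketch of this framing and windowing scheme follows; the Hann window is written in the equivalent 0 to K-1 index form, and the reconstruction check applies to the interior samples of the stream, since the first and last half frame are only tapered once.

    import numpy as np

    FRAME = 256          # 32 msec at 8 KHz
    HOP = 128            # 50% overlap, a new frame every 16 msec
    # Periodic Hann window; with 50% overlap the shifted windows sum to 1,
    # so the windowed frames overlap-add back to the original samples.
    WINDOW = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(FRAME) / FRAME))

    def frames_of(stream):
        """Split a sample stream into overlapping, windowed 256-sample frames."""
        starts = range(0, len(stream) - FRAME + 1, HOP)
        return [WINDOW * stream[s:s + FRAME] for s in starts]

    def overlap_add(frames, length):
        """Rebuild the stream by adding the windowed frames back at their offsets."""
        out = np.zeros(length)
        for i, frame in enumerate(frames):
            out[i * HOP:i * HOP + FRAME] += frame
        return out

    stream = np.random.randn(8192)   # roughly one second of test samples at 8 KHz
    rebuilt = overlap_add(frames_of(stream), len(stream))
    # Interior samples are reconstructed exactly; the edges are tapered by one window.
    assert np.allclose(rebuilt[HOP:-HOP], stream[HOP:-HOP])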
(1) Clamp the H(ω) attenuation curve so that the attenuation cannot go below a minimum value; FIG. 7 has this labelled as "clamped" and illustrates a 10 dB clamp. The clamping prevents the noise suppression filter H(ω) from fluctuating around very small gain values, and also reduces potential speech signal distortion. The corresponding filter would be:

H(ω)² = max[10⁻¹, 1 - |N(ω)|²/|Y(ω)|²]

Of course, the 10 dB clamp could be replaced with any other desirable clamp level, such as 5 dB or 20 dB. Also, the clamping could include a sloped clamp or stepped clamping or other more general clamping curves, but a simple clamp lessens computational complexity. The following "Adaptive filter clamp" section describes a clamp which adapts to the input signal energy level.

(2) Increase the noise power spectrum estimate by a factor such as 2 so that small errors in the spectral estimates for input (noisy) signals do not result in fluctuating attenuation filters. The corresponding filter for this factor alone would be:

H(ω)² = 1 - 4|N(ω)|²/|Y(ω)|²

For small input signal-to-noise power ratios this becomes negative, but a clamp as in (1) eliminates the problem. This noise increase factor appears as a shift in the logarithmic input signal-to-noise power ratio independent variable of FIG. 7. Of course, the 2 factor could be replaced by other factors such as 1.5 or 3; indeed, FIG. 7 shows a 5 dB noise increase factor with the resulting attenuation curve labelled "noise increased". Further, the factor could vary with frequency, such as more noise increase (i.e., more attenuation) at low frequencies.

(3) Reduce the variance of spectral estimates used in the noise suppression filter H(ω) by smoothing over neighboring frequencies. That is, for an input windowed noisy speech signal y(j) with Fourier transform Y(ω), apply a running average over frequency so that |Y(ω)|² is replaced by (W∗|Y|²)(ω) in H(ω), where W(ω) is a window about 0 and ∗ is the convolution operator. FIG. 8 shows that the spectral estimates for white noise converge more closely to the correct answer with increasing smoothing window size. That is, the curves labelled "5 element smoothing", "33 element smoothing", and "128 element smoothing" show the decreasing probabilities for large variations with increasing smoothing window sizes. More spectral smoothing reduces noise fluctuations in the filtered speech signal because it reduces the variance of spectral estimation for noisy frames; however, spectral smoothing decreases the spectral resolution so that the noise suppression attenuation filter cannot track sharp spectral characteristics. The preferred embodiment operates with sampling at 8 KHz and windows the input into frames of size 256 samples (32 milliseconds); thus an FFT on the frame generates the Fourier transform as a function on a domain of 256 frequency values. Take the smoothing window W(ω) to have a width of 32 frequencies, so convolution with W(ω) averages over 32 adjacent frequencies. W(ω) may be a simple rectangular window or any other window. The filter transfer function with such smoothing is:

H(ω)² = 1 - |N(ω)|²/(W∗|Y|²)(ω)

Thus a filter with all three of the foregoing features has transfer function:

H(ω)² = max[10⁻¹, 1 - 4|N(ω)|²/(W∗|Y|²)(ω)]

Extend the definition of H(ω) by symmetry to π < ω < 2π or -π < ω < 0.
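As a recap of modifications (1) through (3), a brief Python sketch of the combined gain computation follows; the 32-bin rectangular smoothing window, the 10 dB clamp, and the noise increase factor of 2 are the example values used above, and the noise spectrum estimate of modification (4) is assumed to be supplied separately.

    import numpy as np

    def smoothed_spectral_subtraction_gain(Y, N_mag, clamp_db=10.0,
                                           noise_factor=2.0, smooth_bins=32):
        """Per-frequency gain H(w) for one 256-point frame.

        Y     : FFT of the windowed noisy frame, Y(w)
        N_mag : noise magnitude spectrum estimate |N(w)| for the frame
        """
        n = len(Y)
        # Rectangular smoothing window about frequency 0 (wraps circularly).
        half = smooth_bins // 2
        W = np.zeros(n)
        W[:half] = 1.0 / smooth_bins
        W[-half:] = 1.0 / smooth_bins
        # Circular running average of |Y(w)|^2 over neighboring frequencies, (W * |Y|^2)(w).
        P_Y_smooth = np.real(np.fft.ifft(np.fft.fft(np.abs(Y) ** 2) * np.fft.fft(W)))
        floor = 10.0 ** (-clamp_db / 10.0)     # 10 dB clamp on H(w)^2
        H2 = 1.0 - (noise_factor * N_mag) ** 2 / np.maximum(P_Y_smooth, 1e-12)
        H2 = np.maximum(H2, floor)             # clamp the attenuation
        return np.sqrt(H2)                     # multiply Y(w) by H(w), then IFFT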
(4) Any noise suppression by sp