United States Patent [19]
Helf et al.

US005550924A
[11] Patent Number: 5,550,924
[45] Date of Patent: Aug. 27, 1996

[54] REDUCTION OF BACKGROUND NOISE FOR SPEECH ENHANCEMENT

[75] Inventors: Brant M. Helf, Melrose; Peter L. Chu, Lexington,
     both of Mass.

[73] Assignee: PictureTel Corporation, Danvers, Mass.

[21] Appl. No.: 402,550
[22] Filed: Mar. 13, 1995

Related U.S. Application Data

[63] Continuation of Ser. No. 86,707, Jul. 7, 1993, abandoned.

[51] Int. Cl.6: H04B 15/00
[52] U.S. Cl.: 381/94; 381/46; 381/47; 395/2.34; 395/2.35
[58] Field of Search: 381/94, 46, 47; 395/2.34, 2.35

[56] References Cited

U.S. PATENT DOCUMENTS

4,185,168    1/1980   Graupe et al.
4,628,529   12/1986   Borth et al.
4,630,304   12/1986   Borth et al.
4,630,305   12/1986   Borth et al.
4,653,102    3/1987   Hansen
4,658,426    4/1987   Chabries et al.
4,696,039    9/1987   Doddington          381/46
4,852,175    7/1989   Borth et al.
4,868,880    9/1989   Bennett, Jr.
4,912,767    3/1990   Chang
5,012,519    4/1991   Adlersberg et al.
5,133,013    7/1992   Munday

FOREIGN PATENT DOCUMENTS

3132221      5/1991   Japan               381/94

Primary Examiner: Forester W. Isen
Attorney, Agent, or Firm: Fish & Richardson P.C.
`
[57] ABSTRACT

Properties of human audio perception are used to perform
spectral and time masking to reduce perceived loudness of
noise added to the speech signal. A signal is divided temporally
into blocks which are then passed through notch filters to
remove narrow frequency band components of the noise. Each
block is then appended to part of the previous block in a manner
which avoids block boundary discontinuities. An FFT is then
performed on the resulting larger block, after which the spectral
components of the signal are fed to a background noise
estimator. Each frequency component of the signal is analyzed
with respect to the background noise to determine, within
various confidence levels, whether it is pure noise or a
noise-and-signal combination. The frequency band's gain
function is determined, based on the confidence levels. A
spectral valley finder detects and fills in spectral valleys in
the frequency component gain function, after which the function
is used to modify the magnitude components of the FFT. An
inverse FFT then maps the signal back from the frequency domain
to the time domain to give a frame of noise-reduced signal. This
signal is then multiplied by a temporal window and joined to the
previous frame's signal to derive the output.

21 Claims, 3 Drawing Sheets
`
`
`Petitioner Apple Inc.
`Ex. 1010, p. 1
`
`

`

[Sheet 1 of 3. FIG. 1: block diagram of the noise suppression
system. Signal path: notch filter 4, window 6, FFT 8, one frame
delay 10, attenuate 12, IFFT 14, window 16, overlap & add 18.
Background noise estimator 20: running min. estimator 22,
stationarity estimator 24, compare estimates 28. Noise
suppression spectral modifier 30: global speech vs noise
detector 32, local speech vs noise detector 34, temporal and
spectral spreading by critical bands 36, spectral valley
filler 38.]
`
`

`

[Sheet 2 of 3. FIG. 3: flow chart of the stationary estimator.
240: compute spectral shape N_i(f_c). 242: find S_i(f_c).
244, 246: is N_i(f_c) > t_l S_i(f_c) or S_i(f_c) > t_l N_i(f_c),
for i = 0, 1, ..., 7? 248, 250: is N_i(f_c) > t_h S_i(f_c) or
S_i(f_c) > t_h N_i(f_c), for i = 0, 1, ..., 7? 252: 50
consecutive noise frames? 254: develop background noise
estimate B_k.]
`
`

`

[Sheet 3 of 3. FIG. 2: notch filter 4, with the response H(z)
of equation (1), followed by window 6, with w(i) defined
piecewise for i = 0 to 191, 192 to 319, and 320 to 511 as in
equation (2). FIG. 4: notch placement; the only step legible in
this copy is 264, find frequency of largest magnitude every
100 Hz.]
`
`

`

`REDUCTION OF BACKGROUND NOISE
`FOR SPEECH ENHANCEMENT
`
`This is a continuation of application Ser. No. 08/086,707,
`filed Jul. 7, 1993, now abandoned.
`
`BACKGROUND OF THE INVENTION
This invention relates to communicating voice information
over a channel such as a telephone communication channel.
Microphones used in voice transmission systems typically
pick up ambient or background sounds, called noise, along
with the voices they are intended to pick up. In voice
transmission systems in which the microphone is at some
distance from the speaker(s), for example, systems used in
video and audio telephone conference environments, background
noises are a cause of poor audio quality since the
noise is added onto the speech picked up by a microphone.
By their nature and intended use, these systems must pick up
sounds from all locations surrounding their microphones,
and these sounds will include background noise.
Fan noise, originating from HVAC systems, computers,
and other electronic equipment, is the predominant source of
noise in most teleconferencing environments.
A good noise suppression technique will reduce the
perception of the background noise while simultaneously
not affecting the quality or intelligibility of the speech. In
general it is an object of this invention to suppress any
constant noise, narrowband or wideband, that is added onto
the speech picked up by a single microphone. It is a further
object of this invention to reduce fan noise that is added onto
the speech picked up by a single microphone.
`
SUMMARY OF THE INVENTION

In one aspect, generally, the invention relates to a device
for reducing the background noise of an input audio signal.
The device features a framer for dividing the input audio
signal into a plurality of frames of signals, and a notch filter
bank for removing components of noise from each of the
frames of signals to produce filtered frames of signals. A
multiplier multiplies a combined frame of signals to produce
a windowed frame of signals, wherein the combined frame
of signals includes all signals in one filtered frame of signals
combined with some signals in the filtered frame of signals
immediately preceding in time the one filtered frame of
signals. A transformer obtains frequency spectrum components
from the windowed frame of signals, and a background
noise estimator uses the frequency spectrum components
to produce a noise estimate of an amount of noise
in the frequency spectrum components. A noise suppression
spectral modifier produces gain multiplicative factors based
on the noise estimate and the frequency spectrum components.
A delayer delays the frequency spectrum components
to produce delayed frequency spectrum components. A
controlled attenuator attenuates the frequency spectrum
components based on the gain multiplicative factors to
produce noise-reduced frequency components, and an
inverse transformer converts the noise-reduced frequency
components to the time domain.
In preferred embodiments, the noise suppression spectral
modifier includes a global decision mechanism, a local
decision mechanism, a detector, a spreading mechanism, and
a spectral valley filler.
The global decision mechanism makes, for each frequency
component of the frequency spectrum components,
a determination as to whether that frequency component is
primarily noise. The local noise decision mechanism
derives, for each frequency component of the frequency
spectrum components, a confidence level that the frequency
component is primarily a noise component. The detector
determines, based on the confidence levels, a gain multiplicative
factor for each frequency component. The spreading
mechanism spectrally and temporally spreads the effect of
the determined gain multiplicative factors, and the spectral
valley filler detects and fills in spectral valleys in the
resulting frequency components.
In other aspects of the preferred embodiment, the background
noise estimator also produces a noise estimate for
each frequency spectrum component, and the local noise
decision mechanism derives confidence levels based on:
ratios between each frequency component and its corresponding
noise estimate, and the determinations made by the
global decision mechanism.
In another aspect, the invention further features a post-window
and an overlap-and-adder mechanism. The post-window
produces smoothed time-domain components for
minimizing discontinuities in the noise-reduced time-domain
components; and the overlap-and-adder outputs a first
portion of the smoothed time-domain components in combination
with a previously stored portion of smoothed time-domain
components, and stores a remaining portion of the
smoothed frequency components, where the remaining portion
comprises the smoothed frequency components not
included in the first portion.
In preferred embodiments of the device, the background
noise estimator includes at least two estimators, each producing
a background noise estimate, and a comparator for
comparing and selecting one of the background noise estimates.
One of the estimators is a running minimum estimator,
and the other estimator is a stationary estimator.
In preferred embodiments, the device also includes a
notch filter mechanism for determining the locations of the
notches for the notch filter bank.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 is a block diagram of a noise suppression system
`according to the invention; and
`FIGS. 2-4 are detailed block diagrams implementing
`parts of the block diagram of FIG. 1.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
The simplest noise suppression apparatus, in daily use by
millions of people around the world, is the so-called
"squelch" circuit. A squelch circuit is standard on most
Citizen Band two-way radios. It operates by simply disconnecting
the system's loudspeaker when the energy of the
received signal falls below a certain threshold. The value of
this threshold is usually fixed using a manual control knob
to a level such that the background noise never passes to the
speaker when the far end is silent. The problem with this
kind of circuit is that when the circuit turns on and off as the
far end speaker starts and then stops, the presence and then
absence of noise can be clearly heard. The noise is wideband
and covers frequencies in which there is little speech energy,
and thus the noise can be heard simultaneously as the person
is talking. The operation of the squelch unit produces a very
disconcerting effect, although it is preferable to having no
noise suppression whatsoever.
The noise suppression method of this invention improves
on the "squelch" concept considerably by reducing the
background noise in both speech and non-speech sections of
the audio.
The approach, according to the invention, is based on
human perception. Using principles of spectral and time
masking (both defined below), this invention reduces the
perceived loudness of noise that is added onto or mixed with
the speech signal.
This approach differs from other approaches, for example,
those in which the goal is to minimize the mean-squared-error
between the speech component by itself (speech-without-noise)
and the processed speech output of the suppression system.
The method used in this invention exploits the "squelch"
notion of turning up the gain on a channel when the energy
of that channel exceeds a threshold and turning down the
gain when the channel energy falls below the threshold;
however, the method performs the operation separately on
different frequency regions. The gain on a channel can be
considered to be the ratio between the volume of the input
signal and the volume of the corresponding output signal.
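
The per-band form of the squelch idea can be sketched as follows. This is only an illustration: the function name, the hard on/off decision, and the reduced gain of 0.1 are assumptions made for the example and are not taken from the patent, which goes on to derive gains from confidence levels rather than from a single hard threshold.

```python
def band_gains(band_energies, thresholds, low_gain=0.1):
    """Per-band squelch: full gain where a band's energy exceeds its
    threshold, a reduced gain (assumed value) where it does not."""
    return [1.0 if energy > threshold else low_gain
            for energy, threshold in zip(band_energies, thresholds)]


if __name__ == "__main__":
    # Three hypothetical bands; only the middle one is below its threshold.
    print(band_gains([5.0, 0.5, 2.0], [1.0, 1.0, 1.0]))
```

A conventional squelch circuit corresponds to the special case of a single band covering the whole spectrum.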
The method further exploits various psychoacoustic principles
of spectral masking, in particular the principles which
basically state that if there is a loud tone at some frequency,
then there exists a given frequency band around that frequency,
called the critical band, within which other signals
cannot be heard. In other words, other signals in the critical
band cannot be heard. The method of the invention is far
more effective than a simple "squelch" circuit in reducing
the perception of noise while speech is being received from
the far end.
The method of the invention also exploits a temporal
masking property. If a loud tone burst occurs, then for a
period of time up to 200 milliseconds after that burst the
sensitivity of the ear in the spectral region of the burst is
decreased. Another acoustic effect is that for a time of up to
20 milliseconds before the burst, the sensitivity of the ear is
decreased (thus, human hearing has a pipeline delay of about
20 milliseconds). One key element of this invention is thus
that the signal threshold below which the gain for a given
band is decreased can be lowered for a period of time both
before and after the occurrence of a sufficiently strong signal
in that band since the ear's sensitivity to noise is decreased
in that period of time.
`
System Overview

With reference now to the block diagram of FIG. 1, the
input signal 1 is first apportioned by a framer 2 into 20
millisecond frames of samples. (Because the input signal is
sampled at a rate of 16 kHz in the illustrated embodiment,
each 20 ms frame includes 320 samples.) The computational
complexity of the method is significantly reduced by operating
on groups of frames of samples at a time, rather than
on individual samples, one at a time. The framed signal is
then fed through a bank of notch filters 4, the purpose of
which is to remove narrow band components of the noise,
typically motor noise occurring at the rotational frequencies
of the motors. If the notches are narrow enough with a sparse
enough spectral density, the tonal quality of the speech will
not be adversely affected. Each frame of digital signals is
then combined with a portion from the end of the immediately
preceding frame of digital signals to produce a windowed
frame.
In preferred embodiments, each frame of digital signals
(20 ms) is combined with the last 12 ms of the preceding
frame to produce windowed frames having durations of 32
ms. In other words, each windowed frame includes three
hundred and twenty samples from a frame of digital signals
in combination with the last one hundred and ninety-two
filtered samples of the immediately preceding frame. The
512-sample segment of speech is then multiplied by a
window, at a multiplier 6, to alleviate problems arising from
discontinuities of the signal at the beginning and end of the
512 sample frame. A fast Fourier Transform (FFT) 8 is then
taken of the 512 sample windowed frame, producing a 257
component frequency spectrum.
The lowest (D.C.) and highest (sampling frequency
divided by two, or 8 kHz) frequency components of the
transformed signal have real parts only, while the other 255
components have both real and imaginary parts. The spectral
components are fed to a background noise estimator 20
whose purpose is to estimate the background noise spectral
energies and to find background noise spectral peaks at
which to place the notches of notch filter 4. A signal
magnitude spectrum estimator, a stationary estimator 24,
and background noise spectrum estimator, a running minimum
estimator 22, for each frequency component are compared
by a comparator 28 and various confidence levels are
derived by a decision mechanism 32 for each frequency
component as to whether or not the particular frequency
component is primarily from noise or from signal-plus-noise.
Based on these confidence levels, the gain for a
frequency band is determined by a gain setter 34. The gains
are then spread, by a spreading mechanism 36, in the
frequency domain in critical bands, spectrally and temporally,
exploiting psychoacoustic masking effects. A spectral
valley filler 38 is used to detect spectral valleys in the
frequency component gain function and fill in the valleys.
The final frequency component gain function from noise
compression spectral modifier 30 is used to modify the
magnitude of the spectral components of the 512-point FFT
at an attenuator 12. Note that the frame at attenuator 12 is
one time unit behind the signals which are primarily used to
generate the gains. An inverse FFT (IFFT) 14 then maps the
signal back into the time domain from the frequency
domain. The resulting 512 point frame of noise-reduced
signal is multiplied by a window at a multiplier 16. The
result is then overlapped and added, at adder 18, to the
previous frame's signal to derive 20 milliseconds or 320
samples of output signal on line 40.
A more detailed description of each block in the signal
processing chain is now provided, from input to output in the
order of their occurrence.
As described above, the framed input signal is fed through
a bank of notch filters 4.
With reference to FIGS. 1 and 2, the notch filter bank 4
consists of a cascade of Infinite Impulse Response (IIR)
digital filters, where each filter has a response of the form:

$$H(z) = \frac{1 - 2\cos\theta\, z^{-1} + z^{-2}}{1 - 2r\cos\theta\, z^{-1} + r^{2} z^{-2}} \qquad (1)$$

where $\theta = \pi/8000 \times$ (frequency of notch), and $r$ is a value less
than one which reflects the width of the notch. If the -3 dB
width of the notch is $\omega$ Hz, then $r = 1 - (\omega/2)(\pi/8000)$. The
bandwidth, $\omega$, used in the illustrated and preferred embodiment
is 20 Hz. A notch is placed approximately every 100
Hz, at the largest peak of the background noise energy near
the nominal frequency.
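
As an illustration of the notch response described above, the following sketch computes the coefficients of one second-order section from a notch frequency and a -3 dB width, and runs a cascade of such sections over a signal. It assumes the response has numerator 1 - 2cos(theta)z^-1 + z^-2 and denominator 1 - 2r cos(theta)z^-1 + r^2 z^-2, with theta and r as defined in the text. The function names and the plain direct-form loop are choices made for this example only; a real implementation would also carry the filter state from one 20 ms frame to the next.

```python
import math

FS = 16000  # sampling rate of the illustrated embodiment, in Hz


def notch_coefficients(notch_hz, width_hz=20.0, fs=FS):
    """Return (b, a) for one second-order notch section."""
    theta = math.pi / (fs / 2) * notch_hz               # pi/8000 * f at 16 kHz
    r = 1.0 - (width_hz / 2.0) * (math.pi / (fs / 2))   # pole radius, < 1
    b = [1.0, -2.0 * math.cos(theta), 1.0]              # zeros on the unit circle
    a = [1.0, -2.0 * r * math.cos(theta), r * r]        # poles just inside it
    return b, a


def apply_section(b, a, x):
    """Filter the sequence x with one section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def notch_bank(x, notch_freqs_hz, width_hz=20.0):
    """Cascade one notch section per listed frequency."""
    for f in notch_freqs_hz:
        b, a = notch_coefficients(f, width_hz)
        x = apply_section(b, a, x)
    return x
```

With a 20 Hz width the pole radius works out to about 0.9961, so each notch removes a tone at its center frequency while leaving components a few hundred hertz away nearly untouched.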
`
The notch filtering is applied to the 320 samples of the
new signal frame. The resulting 320 samples of notch
filtered output are appended to the last 192 samples of
notch-filtered output from the previous frame to produce a
total extended frame of 512 samples.
Referring to FIGS. 1 and 2, the notch-filtered 512 sample
frame derived from filter bank 4 is multiplied by a window
using the following formula:

$$
w(i) =
\begin{cases}
f(i)\,\sqrt{0.5 - 0.5\cos\left(\pi\,\dfrac{i}{191}\right)} & \text{for } i = 0, 1, \ldots, 191,\\[2ex]
f(i) & \text{for } i = 192, 193, \ldots, 319,\\[1ex]
f(i)\,\sqrt{0.5 - 0.5\cos\left(\pi\,\dfrac{511 - i}{191}\right)} & \text{for } i = 320, 321, \ldots, 511
\end{cases}
\qquad (2)
$$

where f(i) is the value of the ith notch-filtered sample of the 512
sample frame from filter bank 4 and w(i) is the resultant
value of the ith sample of the resultant 512 sample windowed
output which is next fed to the FFT 8. The purpose
of the window, effected by multiplier 6, is to minimize edge
effects and discontinuities at the beginning and end of the
extended frame.
The time-windowed 512 sample points are now fed to the
FFT 8. Because of the ubiquity of FFTs, many Digital
Signal Processing (DSP) chip manufacturers supply highly
optimized assembly language code to implement the FFT.
A one frame delay 10 is introduced so that signal frequency
components of the FFT can be amplified and processed
in attenuator 12 based upon later occurring signal
values. This does not introduce any perceptual noise
because, as noted above, a signal component will mask
frequencies in its spectral neighborhood 20 milliseconds
before it actually occurs. Also, since speech sounds gradually
increase in volume starting from zero amplitude, the one
frame delay prevents clipping the start of speech utterances.
Those components of the FFT due to noise are attenuated
by attenuator 12, while those components due to signal are
less attenuated or unattenuated or may be amplified. As
noted above, for each frequency, there is a real and an
imaginary component. Both components are multiplied by a
single factor found from the Noise Suppression Spectral
Modifier module 30, so that the phase is preserved for the
frequency component while the magnitude is altered.
The inverse FFT 14 (IFFT) is taken of the magnitude
modified FFT, producing a frequency processed extended
frame, 512 samples in length.
The windowing operation used in multiplier 16 is exactly
the same as the windowing operation defined above for
multiplier 6. Its purpose is to minimize discontinuities
introduced by the attenuation of frequency components. For
example, suppose that all frequency components have been
set to zero except for one. The result will be a sine wave
when the IFFT is taken. This sine wave may start at a large
value and end at a large value. Neighboring frames may not
have this sine wave component present. Thus, without
proper windowing, when this signal is overlap-added in the
output adder 18, a click may be heard at the start and end of
the frame. However, by properly windowing the sine wave,
using, for example, the parameters defined in Equation 2,
what will be heard is a sine wave smoothly increasing in
magnitude and then smoothly decreasing in magnitude.
Because of the pre- and post-windowing of the frame by
multipliers 6 and 16, overlap and addition of the frames is
necessary to prevent the magnitude of the output from
decreasing at the start and end of the frame. Thus, the first
192 samples of the present 512 sample extended and windowed
frame are added to the last 192 samples of the
previous extended and windowed frame. Then the next 128
samples (8 milliseconds) of the current extended frame are
output. The last 192 samples of the present extended and
windowed frame are then stored for use by the next frame's
overlap-add operation, and so on.
In a preferred embodiment, the window function, W, used
will have the property that:

$$W^2 + (W^2 \text{ shifted by amount of overlap}) = 1$$

to avoid producing modulation over time. For example, if
the amount of overlap is one half a frame, then the windowing
function, W, has the property that:

$$W^2 + (W^2 \text{ shifted by } 1/2) = 1$$
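
The framing, windowing, and overlap-add steps can be checked numerically with a short sketch. It assumes NumPy, assumes the window ramps are the square-root raised-cosine ramps over the first and last 192 samples of the 512 sample frame, and uses hypothetical names throughout (`make_window`, `process`, `gains`). The notch filters, the one frame delay, and the gain computation are left out, so this is an outline of the analysis and synthesis path only, not the patented system.

```python
import numpy as np

FRAME, OVERLAP, NFFT = 320, 192, 512   # 20 ms, 12 ms, 32 ms at 16 kHz


def make_window():
    """Window multiplier: ramp up, flat middle, ramp down."""
    w = np.ones(NFFT)
    i = np.arange(OVERLAP)                       # i = 0 .. 191
    w[:OVERLAP] = np.sqrt(0.5 - 0.5 * np.cos(np.pi * i / 191))
    j = np.arange(FRAME, NFFT)                   # i = 320 .. 511
    w[FRAME:] = np.sqrt(0.5 - 0.5 * np.cos(np.pi * (511 - j) / 191))
    return w


def process(x, gains=None):
    """Window, FFT, scale each bin, IFFT, window again, overlap-add."""
    w = make_window()
    if gains is None:
        gains = np.ones(NFFT // 2 + 1)           # 257 bins, unity gain
    history = np.zeros(OVERLAP)   # last 192 input samples of the previous frame
    tail = np.zeros(OVERLAP)      # last 192 output samples awaiting overlap-add
    out = []
    for start in range(0, len(x) - FRAME + 1, FRAME):
        extended = np.concatenate([history, x[start:start + FRAME]])
        history = extended[-OVERLAP:]
        spectrum = np.fft.rfft(extended * w)     # 257 frequency components
        spectrum = spectrum * gains              # real factor per bin keeps phase
        y = np.fft.irfft(spectrum, NFFT) * w
        out.append(np.concatenate([y[:OVERLAP] + tail, y[OVERLAP:FRAME]]))
        tail = y[FRAME:]
    return np.concatenate(out) if out else np.zeros(0)
```

With unity gains the output is the input delayed by 192 samples, which is the practical meaning of the requirement that the squared window and its shifted copy sum to one.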
`
`Background Noise Estimator 20
`
Referring to FIGS. 1 and 3, the background noise estimator
20 and the noise suppression spectral modifier module
30 operate as follows.
The purpose of the background noise estimator 20 is to
develop, for each frequency component of the FFT, an
estimate of the average energy magnitude due to the background
noise. The background noise estimator removes the need for
the user to manually adjust or train the system for each new
environment. The background noise estimator continually
monitors the signal/noise environment, updating estimates
of the background noise automatically in response to, for
example, air conditioning fans turning off and on, etc. Two
approaches are used, with the results of one or the other
approach used in a particular situation. The first approach is
more accurate, but requires one second intervals of solely
background noise. The second approach is less accurate, but
develops background noise estimates in 10 seconds under
any conditions.
`
`Stationary Estimator 24
`
`With reference to FIGS. 1 and 3, the first approach uses
`a stationary estimator 24 to look for long sequences of
`frames where the spectral shape in each frame is very similar
`to that of the other frames. Presumably, this condition can
`only arise if the human in the room is silent and the constant
`background noise due to fans and/or circuit noise is the
`primary source of the signal. When such a sequence is
`detected, the average magnitude of each frequency is taken
`from those frames in the central part of the FFT sequence
`(frames at the beginning and end of the sequence may
`contain low level speech components). This method yields a
`much more accurate measurement of the background noise
`spectrum as compared to the second approach (described
`below), but requires that the background noise is relatively
`constant and that the humans in the room are not talking for
`a certain period of time, conditions sometimes not found in
`practice.
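
A rough sketch of this first approach is given here, using the band comparison and thresholds (3 and 4.5) that the detailed steps of this section specify. The sketch assumes NumPy; that each frame is supplied as 256 values of R^2 + I^2; that the reference shape N_i averages four frames; and that a caller tracks runs of 50 noise-classified frames. All function names are hypothetical.

```python
import numpy as np

T_LOW, T_HIGH = 3.0, 4.5        # thresholds of the preferred embodiment
BANDS, BINS_PER_BAND = 8, 32    # eight 1000 Hz bands of 32 FFT bins each


def band_energy(power):
    """Sum R^2 + I^2 over each 32-bin band of a 256-bin power spectrum."""
    p = np.asarray(power, dtype=float)[:BANDS * BINS_PER_BAND]
    return p.reshape(BANDS, BINS_PER_BAND).sum(axis=1)


def is_noise_frame(reference_frames, current):
    """Compare the current band energies S_i with the reference shape N_i."""
    n = 0.25 * sum(band_energy(p) for p in reference_frames)  # four frames
    s = band_energy(current)
    low = (n > T_LOW * s) | (s > T_LOW * n)
    high = (n > T_HIGH * s) | (s > T_HIGH * n)
    # Signal if more than four bands differ mildly or any band differs strongly.
    return not (low.sum() > 4 or high.any())


def stationary_estimate(noise_run):
    """Average the 10th through 41st of 50 consecutive noise frames, per bin."""
    run = np.asarray(noise_run, dtype=float)
    return run[9:41].mean(axis=0)
```

Discarding the first nine and last nine frames of the run reflects the concern that the edges of a quiet stretch may still contain low level speech.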
`The operation of this estimator, in more detail, is as
`follows:
`1. Referring to FIG. 3, the method in the first approach
`determines if the current 20 ms frame is similar in
`spectral shape to the previous frames. First, the method
`computes, at 240, the spectral shape of the previous
`frames:
`
$$N_i(f_c) = 0.25 \sum_{f=f_c-3}^{f_c} \left( \sum_{k=k_i}^{k_i+31} \left( R^2(k,f) + I^2(k,f) \right) \right) \qquad (3)$$

where $f_c$ is the frame number for the current 20 ms frame (it
advances by one for consecutive frames), i denotes a 1000
Hz frequency band, $k_i = i * 32$, k indexes the 256 frequency
components of the 512 point FFT, and R(k, f) and I(k, f) are
the real and imaginary components of the kth frequency
component of the frame f.
2. Next, $S_i(f_c)$, the spectral shape of the current frame, is
determined at 242:

$$S_i(f_c) = \sum_{k=k_i}^{k_i+31} \left( R^2(k,f_c) + I^2(k,f_c) \right) \qquad (4)$$

where the notation has the same meaning as in equation (3)
above; and $S_i$ is the magnitude of the ith frequency component
of the current frame, $f_c$.
3. The estimator 24 then checks, at 244 and 246, to
determine whether

$$N_i(f_c) > t_l\, S_i(f_c) \qquad (5)$$

or

$$S_i(f_c) > t_l\, N_i(f_c), \quad \text{for } i = 0, 1, \ldots, 7 \qquad (6)$$

where $t_l$ is a lower threshold. In a preferred embodiment,
$t_l = 3$. If the inequality in (5) or (6) is satisfied for more than
four values of i, then the current frame $f_c$ is classified as
signal; otherwise, the estimator checks (at 248 and 250) to
determine whether

$$N_i(f_c) > t_h\, S_i(f_c) \qquad (7)$$

or

$$S_i(f_c) > t_h\, N_i(f_c), \quad \text{for } i = 0, 1, \ldots, 7 \qquad (8)$$

where $t_h$ is a higher threshold, and $N_i$ designates the magnitude
of the ith frequency component of the background
noise estimate. In a preferred embodiment, $t_h = 4.5$. If either
inequality is satisfied for one or more values of i, then the
current frame $f_c$ is also classified as a signal frame. Otherwise
the current frame is classified as noise.
4. If fifty consecutive noise-classified frames occur in a
row, at 252 (corresponding to one second of noise),
then estimator 24 develops noise background estimates
by summing frequency energies from the 10th to the
41st frame. By ignoring the beginning and ending
frames of the sequence, confidence that the signal is
absent in the remaining frames is increased. The estimator
finds, at 254,

$$B_k = \frac{1}{32} \sum_{f=f_s}^{f_s+31} \left( R^2(k,f) + I^2(k,f) \right) \qquad (9)$$

where k = 0, 1, 2, ..., 255, $f_s$ is the starting index of the 10th
noise-classified frame, and the other terms have the same
notation as in equation (3). The values, $B_k$, now represent the
average spectral magnitude of the noise component of the
signal for the kth frequency.
To determine where to place the notches of the notch filter
bank, with reference to FIGS. 1 and 4, the unwindowed 20
ms time-domain samples corresponding to the 32 noise-only
classified frames are appended together (at 260) to form a
contiguous sequence. A long FFT is taken of the sequence (at
262). The component having the largest magnitude, approximately
every 100 Hz, is found (at 264), and the frequency at
which this locally maximum magnitude occurs corresponds
to the location at which a notch center frequency will be
placed (at 266). Notches are useful in reducing fan noise
only up to 1500 Hz or so, because for higher frequencies, the
fan noise spectrum tends to be fairly even with the absence
of strong peaks.

Running Minimum Estimator 22

There will be some instances when either the speech
signal is never absent for more than a second or the
background noise itself is never constant in spectral shape,
so that the stationary estimator 24 (described above) will
never produce noise background estimates. For these cases,
the running minimum estimator 22 will produce noise
background estimates, albeit with much less accuracy.
The steps used by the running minimum estimator are:
1. Over a 10 second interval, and for each frequency
component k, find the eight consecutive frames which
minimize the energy of the eight consecutive frames for
that frequency component; that is, for every frequency
component k find the frame $f_k$ that minimizes $M_k(f_k)$
where
`
$$M_k(f_k) = \frac{1}{8} \sum_{f=f_k}^{f_k+7} \left( R^2(k,f) + I^2(k,f) \right) \qquad (10)$$

where $f_k$ is any frame number occurring within the 10 second
interval. Note that, in general, the $f_k$ that minimizes equation
(10) will take on different values for different frequency
components, k.
2. Use the minimum values $M_k$ derived in the previous
step as the background noise spectral estimate if the
following two conditions are both met:
(a) It has been more than 10 seconds since the last
update of the background noise spectral estimate due
to the Stationary Estimator.
(b) The difference, D, between the past background
noise estimate, which may have resulted from the
Stationary Estimator or the Running Minimum Estimator,
and the current Running Minimum Estimator
is great. The metric used to define the difference D is
given in Equation 11:

$$D = \sum_{k=0}^{255} \left( \max\left( \frac{M_k}{N_k}, \frac{N_k}{M_k} \right) - 1 \right)^2 \qquad (11)$$

where the max function returns the maximum of its
two arguments, and $N_k$ are the previous background
noise estimates (from either Running Minimum or
Stationary Estimators), and $M_k$ are the current background
noise estimates from the Running Minimum
Estimator.
If D is greater than some threshold, for example, 3,000 in
a preferred embodiment, and the preceding condition (a) is
satisfied, then $M_k$ is used as the new background spectral
estimate. The use of $M_k$ as the noise estimate indicates that
the notch filters should be disabled, since a good estimate of
the notch center frequencies is not possible.
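
The running minimum search and the update test can be sketched as below. The sketch assumes NumPy, strictly positive energies, a frames-by-bins array of R^2 + I^2 covering the 10 second interval (500 frames of 20 ms), an eight-frame mean for M_k, and a difference metric of the form sum over k of (max(M_k/N_k, N_k/M_k) - 1)^2. The function names and the way elapsed time is passed in are hypothetical.

```python
import numpy as np


def running_minimum(power_frames, span=8):
    """Per bin, the smallest mean energy over any `span` consecutive frames."""
    p = np.asarray(power_frames, dtype=float)
    csum = np.vstack([np.zeros(p.shape[1]), np.cumsum(p, axis=0)])
    means = (csum[span:] - csum[:-span]) / span   # one row per start frame
    return means.min(axis=0)                      # minimum taken bin by bin


def difference(m, n):
    """Difference metric D between new estimate m and previous estimate n."""
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    ratio = np.maximum(m / n, n / m)
    return float(np.sum((ratio - 1.0) ** 2))


def maybe_update(previous, power_frames, seconds_since_stationary_update,
                 threshold=3000.0):
    """Return (estimate, updated) after applying conditions (a) and (b)."""
    m = running_minimum(power_frames)
    if seconds_since_stationary_update > 10 and difference(m, previous) > threshold:
        return m, True
    return previous, False
```

Because the minimum is taken independently for each bin, the eight-frame stretch that supplies the estimate can differ from one frequency component to the next, as the text notes.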
`
`Noise Suppression Spectral Modifier 30
`
Referring to FIG. 1, once the background noise estimate
has been found, the current frame's spectra must be compared
to the background noise estimate's spectra, and on the
basis of this comparison, attenuation must be derived for
each frequency component of the current frame's FFT in an
attempt to reduce the perception of noise in the output
signal.

Global Speech versus Noise Detector 32

Any given frame will either contain speech or not. Global
Speech versus Noise Detector 32 makes a binary decision as
to whether or not the frame is noise.
In the presence of speech, thresholds can be lowered
because masking effects will tend to make incorrect signal
versus noise declarations less noticeable. However, if the
frame truly is noise only, slight errors in deciding whether or
not frequency components are due to noise or signal will
give rise to so-called "twinkling" sounds.
In accordance with the illustrated embodiment, to determine
whether speech is present in a frame, the system
compares the magnitude of the kth frequency component of
the current frame, designated $S_k$, and the magnitude of the
kth frequency component of the background noise estimate,
designated $C_k$. Then if $S_k > T \times C_k$ for more than 7 values of k
(for one frame), where T is a threshold constant (T=3, in a
preferred embodiment), the frame is declared a speech
frame. Otherwise, it is declared a noise frame.

$$
D_k =
\begin{cases}
4 & \text{if } S_k / N_k > t_4,\\
3 & \text{else if } S_k / N_k > t_3,\\
2 & \text{else if } S_k / N_k > t_2,\\
1 & \text{else if } S_k / N_k > t_1,\\
0 & \text{else}
\end{cases}
\qquad (12)
$$

where $S_k = R^2(k) + I^2(k)$ for the current frame and $N_k$ is the
background noise estimate for component k. The values
used for $t_1$, $t_2$, $t_3$, $t_4$ vary depending on whether the global
spe

Temporal and Spectral Spreading of Frequency Bin
Gains by Critical Bands 36
