Lindemann et al.

(54) NOISE REDUCTION SYSTEM FOR
BINAURAL HEARING AID

(75) Inventors: Eric Lindemann; John Laurence
Melanson, both of Boulder, Colo.
(73) Assignee: AudioLogic, Inc., Boulder, Colo.

(21) Appl. No.: 123,503
(22) Filed: Sep. 17, 1993
(51) Int. Cl. ... H04R 25/00
(52) U.S. Cl. ... 381/68.2; 381/68.4; 395/2.35
(58) Field of Search ... 381/68.2, 68, 68.4,
381/60, 26, 74, 94, 46, 47; 395/2.35, 2.12,
2.37, 2.42

(56) References Cited
U.S. PATENT DOCUMENTS
4,628,529 12/1986 Borth et al.
4,630,305 12/1986 Borth et al.
4,868,880 9/1989 Bennett, Jr.
4,887,299 12/1989 Cummins et al.
5,029,217 7/1991 Chabries et al.
5,341,452 8/1994 Hall, II et al. ... 395/2.35
`OTHER PUBLICATIONS
"Multimicrophone Signal-Processing Technique to Remove
Reverberation from Speech Signals" by J. Allen et al., vol.
62, No. 4, Oct. 1977, pp. 912-915.
"An Alternative Approach to Linearly Constrained Adaptive
Beamforming" by L. J. Griffiths et al., IEEE Transactions,
vol. AP-30, No. 1, Jan. 1982, pp. 27-34.
"Speech Enhancement Using a Minimum Mean-Square
Error Short-Time Spectral Amplitude Estimator" by Y.
Ephraim et al., IEEE Transactions, Dec. 1984, No. 6.
Article entitled "Extension of a Binaural Cross-Correlation
Model by Contralateral Inhibition" by W. Lindemann, J.
Acoust. Soc. Am. 80(6), Dec. 1986, pp. 1608-1622.
"Multimicrophone Adaptive Beamforming for Interference
Reduction in Hearing Aids" by P. Peterson et al., Journal of
Rehabilitation Research and Development, vol. 24, No. 4,
pp. 103-110.
`
US005651071A
(11) Patent Number: 5,651,071
(45) Date of Patent: Jul. 22, 1997

"Evaluation of Two Voice-Separation Algorithms Using
Normal-Hearing and Hearing-Impaired Listeners" by R.
Stubbs et al., J. Acoust. Soc. Am., Oct. 1988.
"Improvement of Speech Intelligibility in Noise: Development
and Evaluation of a New Directional Hearing Instrument
Based on Array Technology" by W. Soede, Delft
Univ. of Technology.
Article entitled "Evaluation of an Adaptive Beamforming
Method for Hearing Aids" by J. Greenberg et al., J. Acoust.
Soc. Am. 91(3), Mar. 1992, pp. 1662-1676.
"Digital Signal Processing for Binaural Hearing Aids" by
Kollmeier et al., Proceedings International Congress on
Acoustics, 1992, Beijing, China.
Article entitled "Cocktail-Party-Processing: Concept and
Results" by M. Bodden, Proceedings, 1992,
Beijing, China.
(List continued on next page.)
`
Primary Examiner: Curtis Kuntz
Assistant Examiner: Huyen D. Le
Attorney, Agent, or Firm: Homer L. Knearl; Holland & Hart
(57) ABSTRACT
In this invention noise in a binaural hearing aid is reduced
by analyzing the left and right digital audio signals to
produce left and right signal frequency domain vectors and
thereafter using digital signal encoding techniques to produce
a noise reduction gain vector. The gain vector can then
be multiplied against the left and right signal vectors to
produce a noise reduced left and right signal vector. The cues
used in the digital encoding techniques include
directionality, short term amplitude deviation from long
term average, and pitch. In addition, a multidimensional
gain function based on directionality estimate and amplitude
deviation estimate is used that is more effective in noise
reduction than simply summing the noise reduction results
of directionality alone and amplitude deviations alone. As
further features of the invention, the noise reduction is
scaled based on pitch estimates and based on voice detection.
`
`14 Claims, 5 Drawing Sheets
`
[Representative drawing, not reproduced. Legible block labels: left in, right in, pre-emphasis, steering allpass, steering gain, window, pitch gain, gain adjust.]
`
`Amazon v. Jawbone
`U.S. Patent 11,122,357
`Amazon Ex. 1007
`
`
`
OTHER PUBLICATIONS

"Microphone Array Speech Enhancement in Overdetermined
Signal Scenarios" by R. Slyh et al., Proceedings
IEEE International Conference on Acoustics, Speech and
Signal Processing, II-347-II-350.
"Separation of Speech from Interfering Speech by Means of
Harmonic Selection" by T. Parsons, J. Acoust. Soc. Am., vol.
60, No. 4, Oct. 1976, pp. 911-918.
"Suppression of Acoustic Noise in Speech Using Spectral
Subtraction" by S. Boll, IEEE Transactions on Acoustics,
Speech and Signal Processing, vol. ASSP-27, No. 2, Apr.
1979, pp. 113-120.
`
`
`
U.S. Patent    Jul. 22, 1997    Sheet 1 of 5    5,651,071

[Drawing not reproduced; the figure label is not legible in this copy. The only recoverable block label reads "emphasis".]
`
`
`
U.S. Patent    Jul. 22, 1997    Sheet 2 of 5    5,651,071

[FIG. 2: drawing not reproduced. Note on the figure: "This circuit is repeated for every frequency F of the FFT."]
[FIG. 6: drawing not reproduced. Legible block labels: integrate total power, pitch confidence, maximum dot product, harmonic grid table, select grid. Legible reference numerals: 186, 190, 192.]
`
`
`
U.S. Patent    Jul. 22, 1997    Sheet 3 of 5    5,651,071

[FIG. 3A: band smoothing filters 157 applied to the inner product 128 point vector and the magnitude squared sum 128 point vector. Drawing not reproduced; the legible filter rows are listed below. The "13-26" entry and the "0-7" range of the first row are reconstructed from garbled text.]

Input bins    Filter                                    Output bins
0-7           No smoothing                              0-7
6-17          4 point cosine kernel smoothing filter    8-15
13-26         6 point cosine kernel smoothing filter    16-23
20-35         8 point cosine kernel smoothing filter    24-31
26-53         12 point cosine kernel smoothing filter   32-47
38-82         20 point cosine kernel smoothing filter   48-72
57-127        32 point cosine kernel smoothing filter   73-127

[FIG. 3B: the same band smoothing filters 157 applied to the inner product average 128 point vector and the magnitude squared average 128 point vector. Drawing not reproduced; the filter rows match those listed for FIG. 3A.]
`
`
`
U.S. Patent    Jul. 22, 1997    Sheet 4 of 5    5,651,071

[FIG. 4: drawing not reproduced. Legible block labels: one-pole lowpass, long term average one-pole lowpass, a frequency-dependent IF/THEN/ELSE selection of d (thresholds partly illegible), 2D gain function table. Note on the figure: "This circuit is repeated for every frequency F of the FFT."]
[Second drawing on this sheet, figure label not legible: attack/release two-pole filter, adjusted noise reduction gain. Legible reference numeral: 216.]
`
`
`
U.S. Patent    Jul. 22, 1997    Sheet 5 of 5    5,651,071

[FIG. 5A: 2D gain for serial connection. Drawing not reproduced.]
[FIG. 5B: 2D generalized gain. Drawing not reproduced.]
`
`
`
NOISE REDUCTION SYSTEM FOR
BINAURAL HEARING AID

CROSS REFERENCE TO RELATED
APPLICATIONS
The present invention relates to the patent application entitled
"Binaural Hearing Aid," Ser. No. 08/123,499, filed Sep. 17,
1993, which describes the system architecture of a hearing
aid that uses the noise reduction system of the present
invention.
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention:
`This invention relates to binaural hearing aids, and more
`particularly, to a noise reduction system for use in a binaural
`hearing aid.
`2. Description of Prior Art:
`Noise reduction, as applied to hearing aids, means the
`attenuation of undesired signals and the amplification of
`desired signals. Desired signals are usually speech that the
`hearing aid user is trying to understand. Undesired signals
`can be any sounds in the environment which interfere with
`the principal speaker. These undesired sounds can be other
`speakers, restaurant clatter, music, traffic noise, etc. There
`have been three main areas of research in noise reduction as
`applied to hearing aids: directional beamforming, spectral
subtraction, and pitch-based speech enhancement.
The purpose of beamforming in a hearing aid is to create
an illusion of "tunnel hearing" in which the listener hears
what he is looking at but does not hear sounds which are
coming from other directions. If he looks in the direction of
a desired sound (e.g., someone he is speaking to), then
other distracting sounds (e.g., other speakers) will be
attenuated. A beamformer thus separates the desired "on-axis"
(line of sight) target signal from the undesired "off-axis"
jammer signals so that the target can be amplified
while the jammer is attenuated.
Researchers have attempted to use beamforming to
improve signal-to-noise ratio for hearing aids for a number
of years {References 1, 2, 3, 7, 8, 9}. Three main approaches
have been proposed. The simplest approach is to use purely
analog delay-and-sum techniques {2}. A more sophisticated
approach uses adaptive FIR filter techniques based on
algorithms such as the Griffiths-Jim beamformer {1, 3}.
These adaptive filter techniques require digital signal
processing and were originally developed in the context of
antenna array beamforming for radar applications {5}. Still
another approach is motivated by a model of the human
binaural hearing system {14, 15}. While the first two
approaches are time domain approaches, this last approach
is a frequency domain approach.
`There have been a number of problems associated with all
`of these approaches to beamforming. The delay-and-sum
`and adaptive filter approaches have tended to break down in
`non-anechoic, reverberant listening situations: any real room
`will have so many acoustic reflections coming off walls and
`ceilings that the adaptive filters will be largely unable to
`distinguish between desired sounds coming from the front
`and undesired sounds coming from other directions. The
`delay-and-sum and adaptive filter techniques have also
`required a large (>=8) number of microphone sensors to be
`effective. This has made it difficult to incorporate these
`systems into practical hearing aid packages. One package
`that has been proposed consists of a microphone array across
`the top of eyeglasses {2}.
`
The frequency domain approaches which have been proposed
{7, 8, 9} have performed better than delay-and-sum or
adaptive filter approaches in reverberant listening environments
and function with only two microphones. The problems
related to the previously published frequency domain
approaches have included unacceptably long input-to-output
time delay, distortion of the desired signal, spatial aliasing at
high frequencies, and some difficulty in reverberant environments
(although less than for the adaptive filter case).
While beamforming uses directionality to separate
desired signal from undesired signal, spectral subtraction
makes assumptions about the differences in statistics of the
undesired signal and the desired signal, and uses these
differences to separate and attenuate the undesired signal.
The undesired signal is assumed to be lower in amplitude
than the desired signal and/or to have a less time-varying
spectrum. If the spectrum is static compared to the desired
signal (speech), then a long-term estimate of the spectrum
will approximate the spectrum of the undesired signal. This
spectrum can be attenuated. If the desired speech spectrum
is most often greater in amplitude than, and/or uncorrelated with,
the undesired spectrum, then it will pass through the system
relatively undistorted despite attenuation of the undesired
spectrum. Examples of work in spectral subtraction include
references {11, 12, 13}.
Pitch-based speech enhancement algorithms use the
pitched nature of voiced speech to attempt to extract a voice
which is embedded in noise. A pitch analysis is made on the
noisy signal. If a strong pitch is detected, indicating strong
voiced speech superimposed on the noise, then the pitch can
be used to extract harmonics of the voiced speech, removing
most of the uncorrelated noise components. Examples of
work in pitch-based enhancement are references {17, 18}.
`SUMMARY OF THE INVENTION
In accordance with this invention, the above problems are
solved by analyzing the left and right digital audio signals to
produce left and right signal frequency domain vectors and,
thereafter, using digital signal encoding techniques to produce
a noise reduction gain vector. The gain vector can then
be multiplied against the left and right signal vectors to
produce a noise reduced left and right signal vector. The cues
used in the digital encoding techniques include
directionality, short-term amplitude deviation from long-term
average, and pitch. In addition, a multidimensional
gain function, based on directionality estimate and amplitude
deviation estimate, is used that is more effective in
noise reduction than simply summing the noise reduction
results of directionality alone and amplitude deviations
alone. As further features of the invention, the noise reduction
is scaled based on pitch estimates and based on voice
detection.
Other advantages and features of the invention will be
understood by those of ordinary skill in the art after referring
to the complete written description of the preferred embodiments
in conjunction with the following drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1 illustrates the preferred embodiment of the noise
`reduction system for a binaural hearing aid.
`FIG. 2 shows the details of the inner product operation
`and the sum of magnitudes squared operation referred to in
`FIG. 1.
`FIGS. 3A and 3B show the band smoothing filters 157 of
`band smoothing operation 156 in FIG. 1.
`
FIG. 4 shows the details of the beam spectral subtract gain
operation 158 in FIG. 1.
FIG. 5A is a graph of noise reduction gains as a serial
function of directionality and spectral subtraction.
FIG. 5B is a graph of the noise reduction gain as a
function of directionality estimate and spectral subtraction
excursion estimate in accordance with the process in FIG. 4.
FIG. 6 shows the details of the pitch-estimate gain operation
180 in FIG. 1.
FIG. 7 shows the details of the voice detect gain scaling
operation 208 in FIG. 1.
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`Theory of Operation:
`In the noise-reduction system described in this invention,
`all three noise reduction techniques, beamforming, spectral
`subtraction and pitch enhancement, are used. Innovations
`will be described relevant to the individual techniques,
`especially beamforming. In addition, it will be demonstrated
`that a synergy exists between these techniques such that the
`whole is greater than the sum of the parts.
`Multidimensional Noise Reduction:
We call a multidimensional noise reduction system any
system which uses two or more distinct cues generated from
signal analysis to attempt to separate desired from undesired
signal. In our case, we use three cues: directionality (D),
short term amplitude deviation from long term average
(STAD), and pitch (f0). Each of these cues has been used
separately to design noise reduction systems, but the cooperative
use of the cues taken together in a single system has
not been done.
To see the interactions between the cues, assume a system
which uses D and STAD separately, i.e., the use of D alone
as a beamformer and STAD alone as a spectral subtractor. In
the case of the beamformer, we estimate D and then specify
a gain function of D which is unity for high D and tends to
zero for low D. Similarly, for the spectral subtractor we
estimate STAD and provide a gain function of STAD which
is unity for high STAD and tends to zero for low STAD.
The two noise reduction systems can be connected back
to back in serial fashion (e.g., beamformer followed by
spectral subtractor). In this case, we can think in terms of a
two-dimensional gain function of (D, STAD) with the function
having a shape similar to that shown in FIG. 5A. With
the serial connection, the gain function in FIG. 5A is
rectangular. Values of (D, STAD) inside the rectangle generate
a gain near unity which tends toward zero near the
boundaries of the rectangle.
If we abandon the notion of a serial connection
(beamformer followed by spectral subtractor) and instead
think in terms of a general two-dimensional function of
(D, STAD), then we can define non-rectangular gain
contours, such as that shown in FIG. 5B, Generalized Gain.
Here we see that there is more interaction between the D and
STAD values. A region which may have been included in the
rectangular gain contour is now excluded because we are
better able to take into consideration both D and STAD.
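The difference between the serial and the generalized gain can be sketched in a few lines. The ramp thresholds, the averaging rule, and all function names below are assumptions chosen for illustration only; the actual gain surfaces are those of FIGS. 5A and 5B.

```python
def ramp(x, lo, hi):
    """Soft threshold: 0 below lo, 1 above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def serial_gain(d, stad):
    """Beamformer followed by spectral subtractor: product of two 1-D gains.

    Each cue is thresholded on its own, so the unity-gain region is a
    rectangle in the (D, STAD) plane.
    """
    return ramp(d, 0.3, 0.6) * ramp(stad, 0.3, 0.6)


def generalized_gain(d, stad):
    """One joint function of (D, STAD) with non-rectangular contours.

    Hypothetical rule: a marginal value of one cue must be offset by a
    strong value of the other before the gain reaches unity.
    """
    return ramp(0.5 * (d + stad), 0.5, 0.8)
```

With these illustrative numbers, a component with D = 0.6 and STAD = 0.6 sits in the corner of the serial rectangle and passes at unity gain, while the generalized function attenuates it because neither cue is strong.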
A common problem in spectral subtraction noise reduction
systems is "musical noise." This consists of isolated bits of
spectrum which manage to rise above the STAD threshold in
discrete bursts. This can turn a steady state noise, such as
fan noise, into a fluttering random musical note generator.
By using the combination of (D, STAD) we are able to make
a better decision about a spectral component by insisting that
not only must it rise above the STAD threshold, but it must
also be reasonably on-line. There is a continuous give and
take between these two parameters.
Including f0, pitch, as a third cue gives rise to a three-dimensional
noise reduction system. We found it advantageous
to estimate D and STAD in parallel and then use the
two parameters in a single two-dimensional function for
gain. We do not want to estimate f0 in parallel with D and
STAD, though, because we can do a better estimate of f0 if
we first noise reduce the signal somewhat using D and
STAD. Therefore, based on the partially noise-reduced
signal, we estimate f0 and then calculate the final gain using
D, STAD and f0 in a general three-dimensional function, or
we can use f0 to adjust the gain produced from the (D, STAD)
estimates. When f0 is included, we see that not only is the
system more efficient because we can use arbitrary gain
functions of three parameters, but also the presence of a first
stage of noise reduction makes the subsequent f0 estimation
more robust than it would be in a system based on f0 only.
The D estimate is based on values of phase angle and
magnitude for the current input segment. The STAD estimate
is based on the sum of magnitudes over many past
segments. A more general approach would make a single
unified estimate based on current and past values of both
phase angle and magnitude. More information would be
used, the function would be more general, and so a better
result would be had.
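As a rough illustration of the STAD cue, the sketch below tracks a long-term average of one bin's magnitude with a one-pole lowpass and reports how far each new magnitude deviates from it. The smoothing coefficient, the ratio form of the deviation, and the function name are assumptions made for illustration; they are not taken from the specification.

```python
def stad_track(magnitudes, alpha=0.05):
    """Short-term deviation from the long-term average for one frequency bin.

    magnitudes: per-segment magnitude of a single bin (non-empty sequence).
    alpha: one-pole smoothing coefficient (hypothetical value).
    """
    long_term = magnitudes[0]
    deviations = []
    for mag in magnitudes:
        # Compare the current segment with the average of past segments.
        deviations.append(mag / long_term if long_term > 0 else 0.0)
        # One-pole lowpass update of the long-term average.
        long_term += alpha * (mag - long_term)
    return deviations
```

For a steady noise magnitude the deviation stays at 1; a sudden louder segment produces a deviation well above 1, which a gain function can then treat as likely desired signal.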
`Frequency Domain Beamforming:
A frequency domain beamformer is a kind of analysis/synthesis
system. The incoming signals are analyzed by
transforming to the frequency (or frequency-like) domain.
Operations are carried out on the signals in the frequency
domain, and then the signals are resynthesized by transforming
them back to the time domain. In the case of two
microphone beamformers, the two signals are the left and
right ear signals. Once transformed to the frequency domain,
a directionality estimate can be made at each frequency
point by comparing left and right values at each frequency.
The directionality estimate is then used to generate a gain
which is applied to the corresponding left and right frequency
points and then the signals are resynthesized.
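The per-frequency comparison and gain step can be sketched as follows. The normalized inner product used here as the directionality measure, the gain rule, and the function names are assumptions for illustration; FIG. 2 refers to an inner product and a sum of magnitudes squared, but the exact estimate is not given in this passage.

```python
def directionality(left_bin, right_bin):
    """Normalized inner product of one left/right bin pair (assumed measure).

    Returns 1.0 when the two complex bins are identical, as for an on-axis
    source, and lower values as they differ in phase or magnitude.
    """
    inner = left_bin.real * right_bin.real + left_bin.imag * right_bin.imag
    power = (left_bin.real ** 2 + left_bin.imag ** 2
             + right_bin.real ** 2 + right_bin.imag ** 2)
    return 2.0 * inner / power if power > 0 else 0.0


def beamform_bins(left_bins, right_bins, floor=0.0):
    """Apply one gain per frequency point to both channels."""
    out_left, out_right = [], []
    for l, r in zip(left_bins, right_bins):
        gain = max(floor, directionality(l, r))  # hypothetical gain rule
        out_left.append(gain * l)
        out_right.append(gain * r)
    return out_left, out_right
```

The same gain is applied to both channels so that the left/right differences of whatever passes through are preserved.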
There are several key issues involved in the design of the
basic analysis/synthesis system. In general, the analysis/synthesis
system will treat the incoming signals as consecutive
(possibly time overlapped) time segments of N sample
points. Each N sample point segment will be transformed to
produce a fixed length block of frequency domain coefficients.
An optimum transform concentrates the most signal
power in the smallest percentage of frequency domain
coefficients. Optimum and near optimum transforms have
been widely studied in signal coding applications {19},
where the desire is to transmit a signal using the fewest
coefficients to achieve the lowest data rate. If most of the
signal power is concentrated in a few coefficients, then only
those coefficients need to be coded with high accuracy, and
the others can be crudely coded or not coded at all.
The optimum transform is also extremely important for
the beamformer. Assume that a signal consists of desired
signal plus undesired noise signal. When the signal is
transformed, some of the frequency domain coefficients will
correspond largely to desired signal, some to undesired
signal, and some to both. For the frequency coefficients with
substantial contributions from both desired signal and noise,
it is difficult to determine an appropriate gain. For frequency
coefficients corresponding largely to desired signals the gain
is near unity. For frequency coefficients corresponding
largely to noise, the gain is near zero. For dynamic signals,
such as speech, the distribution of energy across frequency
coefficients from input segment to input segment can be
regarded as random except for possibly a long-term global
spectral envelope. Two signals, desired signal and noise,
generate two random distributions across frequency coefficients.
The value of a particular frequency coefficient is the
sum of the contributions from both signals. Since the total
number of frequency coefficients is fixed, the probability of
two signals making substantial contributions to the same
frequency coefficient increases as the number of frequency
coefficients with substantial energy used to code each signal
increases. Therefore, an optimum transform, which concentrates
energy in the smallest percentage of the total
coefficients, will result in the smallest probability of overlap
between coefficients of the desired signal and noise signal.
This, in turn, results in the highest probability of correct
answers in the beamformer gain estimation.
A different view of the analysis/synthesis system is as a
multiband filter bank {20}. In this case, each frequency
coefficient, as it varies in time from input segment to input
segment, is seen as the output of a bandpass filter. There are
as many bandpass filters, adjacent in frequency, as there are
frequency coefficients. To achieve high energy concentration
in frequency coefficients we want sharp transition bands
between bandpass filters. For speech signals, optimum transforms
correspond to filter banks with relatively sharp transition
bands to minimize overlap between bands.
In general, to achieve good discrimination between
desired signal and noise, we want many frequency coefficients
(or many bands of filtering) with energy concentrated
in as few coefficients as possible (sharp transition bands
between bandpass filters). Unfortunately, this kind of high
frequency resolution implies large input sample segments
which, in turn, imply long input-to-output delays in the
system. In a hearing aid application, time delay through the
system is an important parameter to optimize. If the time
delay from input to output becomes too large (e.g., greater than
about 40 ms), the lips of speakers are no longer synchronized with
sound. It also becomes difficult to speak since the sound of
one's own voice is not synchronized with muscle movements.
The impression is unnatural and fatiguing. A compromise
must be made between input-output delay and
frequency resolution. A good choice of analysis/synthesis
architecture can ease the constraints on this compromise.
Another important consideration in the design of analysis/synthesis
systems is edge effects. These are discontinuities
that occur between adjacent output segments. These edge
effects can be due to the circular convolution nature of
Fourier transforms and inverse transforms, or they can be due
to abrupt changes in frequency domain filtering (noise
reduction gain, for example) from one segment to the next.
Edge effects can sound like fluttering at the input segment
rate. A well-designed analysis/synthesis system will eliminate
these edge effects or reduce them to the point where
they are inaudible.
The theoretical optimum transform for a signal of known
statistics is the Karhunen-Loève Transform, or KLT {19}.
The KLT does not generally lend itself to practical
implementation, but serves as a basis for measuring the
effectiveness of other transforms. It has been shown that, for
speech signals, various transforms approach the KLT in
effectiveness. These include the DCT {19} and the ELT {21}.
A large body of literature also exists for designing efficient
filter banks {22, 23}. This literature also proposes techniques
for eliminating or reducing edge effects.
One common design for analysis/synthesis systems is
based on a technique called overlap-add {16}. In the
overlap-add scheme, the incoming time domain signals are
segmented into N point non-overlapping, adjacent time
segments. Each N point segment is "padded" with an
additional L zero values. Then each N+L point "augmented"
segment is transformed using the FFT. A frequency domain
gain, which can be viewed as the FFT of another N+L point
sequence consisting of an M point time domain finite impulse
response padded with N+L-M zeros, is multiplied with the
transformed "augmented" input segment, and the product is
inverse transformed to generate an N+L point time domain
sequence. As long as M<L, the resulting N+L point time
domain sequence will have no circular convolution components.
Since an N+L point segment is generated for each
incoming N point segment, the resulting segments will
overlap in time. If the overlapping regions of consecutive
segments are summed, then the result is equivalent to a
linear convolution of the input signal with the gain impulse
response.
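The overlap-add steps above can be checked numerically with a short sketch. It uses a plain DFT from the standard library in place of an FFT so that it runs without third-party packages; the function names, the segment length, and the choice L = M + 1 (which satisfies M < L) are illustrative assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def overlap_add(signal, impulse, n):
    """Filter `signal` with the FIR `impulse` using N point segments."""
    m = len(impulse)
    pad = m + 1                       # L zeros per segment, so that M < L
    size = n + pad                    # N+L point augmented segment
    gain = dft(list(impulse) + [0.0] * (size - m))  # frequency domain gain
    out = [0.0] * (len(signal) + pad)
    for start in range(0, len(signal), n):
        seg = list(signal[start:start + n])
        seg += [0.0] * (size - len(seg))            # zero padding
        spectrum = dft(seg)
        y = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
        for i, v in enumerate(y):                   # sum overlapping regions
            if start + i < len(out):
                out[start + i] += v.real
    return out


def direct_convolution(signal, impulse):
    """Reference linear convolution in the time domain."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out
```

For a fixed impulse response, the two functions agree to within rounding error over the length of the linear convolution, which is the equivalence stated in the paragraph above.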
There are a number of problems associated with the
overlap-add scheme. Viewed from the point of view of filter
bank analysis, an overlap-add scheme uses bandpass filters
whose frequency response is the transform of a rectangular
window. This results in a poor quality bandpass response
with considerable leakage between bands, so the coefficient
energy concentration is poor. While an overlap-add scheme
will guarantee smooth reconstruction in the case of convolution
with a stationary finite impulse response of constrained
length, when the impulse response is changing
every block time, as is the case when we generate adaptive
gains for a beamformer, then discontinuities will be generated
in the output. It is as if we were to abruptly change all
the coefficients in an FIR filter every block time. In an
overlap-add system, the input to output minimum delay is:

D = (1 + Z/2) * N + (compute time for 2*N FFT)

Where:
N = input segment length,
Z = number of zeros added to each block for zero padding.
A minimum value for Z is N, but this can easily be greater
if the gain function is not sufficiently smooth over frequency.
The frequency resolution of this system is N/2 frequency
bins, given conjugate symmetry of the transforms of the real
input signal, and the fact that zero padding results in an
interpolation of the frequency points with no new information
added.
In the system design described in the preferred embodiments
section of this patent, we use a windowed analysis/synthesis
architecture. In a windowed FFT analysis/synthesis
system, the input and output time domain sample
segments are multiplied by a window function which in the
preferred embodiment is a sine window for both the input
and output segments. The frequency response of the bandpass
filters (the transform of the sine window) is more
sharply bandpass than in the case of the rectangular windows
of the overlap-add scheme, so there is better coefficient
energy concentration. The presence of the synthesis window
results in an effective interpolation of the adaptive gain
coefficients from one segment to the next and so reduces
edge effects. The input to output delay for a windowed
system is:

D = 1 * N + (compute time for N FFT)

Where:
N = input segment length.
It is clear that the sine windowed system is preferable to
the overlap-add system from the point of view of coefficient
energy concentration, output smoothness, and input-output
delay. Other analysis/synthesis architectures, such as ELT,
Paraunitary Filter Banks, QMF Filter Banks, Wavelets, and DCT,
should provide similar performance in terms of input-output
delay but can be superior to the sine window architecture in
terms of energy concentration and reduction of edge effects.
`Preferred Embodiment:
In FIG. 1, the noise reduction stage, which is implemented
as a DSP software program, is shown as an operations flow
diagram. The left and right ear microphone signals have
been digitized at the system sample rate, which is generally
adjustable in a range from Fsamp = 8-48 kHz, but has a
nominal value of Fsamp = 11.025 kHz. The left
and right audio signals have little, or no, phase or magnitude
distortion. A hearing aid system for providing such low
distortion left and right audio signals is described in the
above-identified cross-referenced patent application entitled
"Binaural Hearing Aid." The time domain digital input
signal from each ear is passed to one-zero pre-emphasis
filters 139, 141. Pre-emphasis of the left and right ear signals
using a simple one-zero high-pass differentiator pre-whitens
the signals before they are transformed to the frequency
domain. This results in reduced variance between frequency
coefficients so that there are fewer problems with numerical
error in the Fourier transformation process. The effects of
the pre-emphasis filters 139, 141 are removed after inverse
Fourier transformation by using one-pole integrator de-emphasis
filters 242 and 244 on the left and right signals at the
end of noise reduction processing. Of course, if binaural
compression follows the noise reduction stage of processing,
the inverse transformation and de-emphasis would be at the
end of binaural compression.
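A minimal sketch of a one-zero pre-emphasis filter and its matching one-pole de-emphasis filter is given below. The coefficient value 0.95 and the function names are assumptions; the specification does not state the coefficient used by filters 139, 141, 242 and 244.

```python
def pre_emphasis(x, a=0.95):
    """One-zero high-pass differentiator: y[n] = x[n] - a * x[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y


def de_emphasis(y, a=0.95):
    """One-pole integrator: z[n] = y[n] + a * z[n-1] (undoes pre_emphasis)."""
    z, prev = [], 0.0
    for sample in y:
        prev = sample + a * prev
        z.append(prev)
    return z
```

Applied back to back with the same coefficient, the two filters cancel, so any change to the signal between them comes only from the processing in the frequency domain.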
In FIG. 1, after pre-emphasis, if used, the left and right
time domain audio signals are passed through allpass filters
144, 145 to gain multipliers 146, 147. The allpass filter
serves as a variable delay. The combination of variable delay
and gain allows the direction of the beam in beamforming
to be steered to any angle if desired. Thus, the on-axis
direction of beamforming may be steered to a direction
other than straight in front of the user, or may be tuned to
compensate for microphone or other mechanical mismatches.
At times, it may be desirable to provide maximum gain
for signals appearing to be off-axis, as determined from
analysis of left and right ear signals. This may be necessary
to calibrate a system which has imbalances in the left and
right audio chain, such as imbalances between the two
microphones. It may also be desirable to focus a beam in
a direction other than straight ahead. This may be true when
a listener is riding in a car and wants to listen to someone
sitting next to him without turning in that direction. It may
also be desirable for non-hearing aid applications, such as
speaker phones or hands-free car phones. To accomplish this
beam steering, a delay and gain are inserted in one of the
time domain input signal paths. This tunes the beam for a
particular direction.
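One way to realize a variable delay followed by a gain is a first-order allpass section, sketched below. The filter order, the coefficient, and the function name are assumptions for illustration; the specification says only that an allpass filter serves as the variable delay.

```python
def steer(signal, a=0.5, gain=1.0):
    """First-order allpass (fractional delay) followed by a gain.

    Difference equation: y[n] = a*x[n] + x[n-1] - a*y[n-1].
    The magnitude response is flat; at low frequencies the delay is
    about (1 - a) / (1 + a) samples.
    """
    out, x_prev, y_prev = [], 0.0, 0.0
    for x in signal:
        y = a * x + x_prev - a * y_prev
        out.append(gain * y)
        x_prev, y_prev = x, y
    return out
```

Applying such a section to one of the two input paths shifts the inter-ear time and level relationship that the later processing treats as on-axis.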
The noise reduction operation in FIG. 1 is performed on
N point blocks. The choice of N is a trade-off between
frequency resolution and delay in the system. It is also a
function of the selected sample rate. For the nominal 11.025 kHz
sample rate, a value of N=256 has been used. Therefore, the
signal is processed in 256 point consecutive sample blocks.
After each block is processed, the block origin is advanced
by 128 points. So, if the first block spans samples 0..255 of
both the left and right channels, then the second block spans
samples 128..383, the third spans samples 256..511, etc. The
processing of each consecutive block is identical.
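The block layout and the time spanned by one block follow directly from N = 256, the 128 point advance, and the nominal 11.025 kHz sample rate. The helper names below are illustrative only.

```python
def block_spans(num_blocks, n=256, hop=128):
    """Inclusive (first, last) sample indices of consecutive blocks."""
    return [(k * hop, k * hop + n - 1) for k in range(num_blocks)]


def block_duration_ms(n=256, fsamp_hz=11025.0):
    """Time spanned by one N point block, excluding FFT compute time."""
    return 1000.0 * n / fsamp_hz
```

`block_spans(3)` gives the spans listed in the text, and `block_duration_ms()` comes to roughly 23.2 ms, which is below the delay of about 40 ms cited earlier as the point where lip synchronization suffers.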
The noise reduction processing begins by multiplying the
left and right 256 point sample blocks by a sine window in
operations 148, 149. A fast Fourier transform (FFT) operation
150, 151 is then performed on the left and right blocks.
Since the signals are real, this yields a 128 point complex
frequency vector for both the left and right audio channels.
The elements of the complex frequency vectors will be
referred to as bin values. So there are 128 frequency bins
from F=0 (DC) to F=Fsamp/2 kHz.
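The sketch below illustrates why a sine window applied at both analysis and synthesis, with blocks advanced by half their length, reconstructs the signal when the gain is unity, and how a bin index maps to a frequency. The exact window definition (with a half-sample offset) and the function names are assumptions; the FFT, gain, and inverse FFT that would sit between the two windows are omitted.

```python
import math

N = 256             # block length from the text
HOP = N // 2        # block advance of 128 points
FSAMP_HZ = 11025.0  # nominal sample rate


def sine_window(n=N):
    """Sine window; squared copies offset by n/2 sum to one."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def windowed_overlap_add(signal, n=N, hop=HOP):
    """Window each block twice (analysis and synthesis) and overlap-add.

    This is the unity-gain case: no frequency domain processing is done
    between the analysis window and the synthesis window.
    """
    w = sine_window(n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - n + 1, hop):
        for i in range(n):
            out[start + i] += signal[start + i] * w[i] * w[i]
    return out


def bin_frequency_hz(k, n=N, fsamp_hz=FSAMP_HZ):
    """Center frequency of bin k for an n point transform."""
    return k * fsamp_hz / n
```

Away from the first and last half block, the output equals the input because sin² and cos² of the same angle sum to one. At the nominal sample rate the bins are spaced about 43 Hz apart.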
`The