`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`WO 00/49602
`
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
`International Bureau
`
`(51) International Patent Classification 7 :
G10L 21/02
`
`(11) International Publication Number:
`
A1
`
`(43) International Publication Date:
`
`24 August 2000 (24.08.00)
`
`(21) International Application Number:
`
PCT/US00/03538
`
`(22) International Filing Date:
`
`11 February 2000 (11.02.00)
`
`(81) Designated States: CA, CN, IL, JP, US, European patent (AT,
`BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
`MC, NL, PT, SE).
`
(30) Priority Data:
09/252,874    18 February 1999 (18.02.99)    US
09/385,996    30 August 1999 (30.08.99)      US
`
`Published
`With international search report.
`
(71) Applicant (for all designated States except US): ANDREA
ELECTRONICS CORPORATION [US/US]; 45 Melville
Park Road, Melville, NY 11747 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): MARASH, Joseph
[IL/IL]; Shimkin Street 1A, 34750 Haifa (IL). BERDUGO,
Baruch [IL/IL]; Hanarkisim Street 6, 28000 Kiriat-Ata (IL).
`
`(74) Agents: KOWALSKI, Thomas, J. et al.; Frommer Lawrence &
`Haug LLP, 745 Fifth Avenue, New York, NY 10151 (US).
`
`(54) Title: SYSTEM, METHOD AND APPARATUS FOR CANCELLING NOISE
`
[Figure: Spectral Subtraction System (100): Input Samples (102), Collect Input Data (104),
Combine 256 New Points With 256 History (106), Multiply By Hanning Window using Shading
Coefficients (108), FFT (110), noise processing (112 (200)), IFFT (114), Overlap and Sum (116),
Output Samples (118)]
`
`(57) Abstract
`
A system for cancelling and reducing audio signal noise arising from electrical or electromagnetic
noise sources such as an AC to DC power converter used by computers. The system receives a digital
audio signal (102) sampled at a frequency which is at least twice the bandwidth of the audio signal.
The input samples are stored in a temporary buffer of 256 points (104). When the buffer is full, the
new 256 points are combined in a combiner (106) with the previous 256 points to provide 512 input
points which are multiplied by multiplier (108) with a shading window with the length of 512 points.
The shaded results are converted to the frequency domain through an FFT processor (110). The FFT
output is processed in a noise processor (112) which includes the noise magnitude estimation for each
frequency bin, the subtraction that estimates the noise free complex value for each frequency bin and
the residual noise reduction process.
`
`RTL345-1_1021-0001
`
`
`
`
`
SYSTEM, METHOD AND APPARATUS FOR CANCELING NOISE
`
`RELATED APPLICATIONS INCORPORATED BY REFERENCE.
`
This application claims priority from U.S. Patent Application Serial Nos. 09/252,874 filed
February 18, 1999 and 09/385,996 filed August 30, 1999 and reference is made to U.S. Patent
Application Serial No. 60/126,567 filed March 26, 1999, all of which are herein incorporated by
reference.

The following applications and patent(s) are cited and hereby herein incorporated by reference:
U.S. Patent Application Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Application
Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 09/055,709 filed
April 7, 1998, U.S. Patent Application Serial No. 09/059,503 filed April 13, 1998, U.S. Patent
Application Serial No. 08/840,159 filed April 14, 1997, U.S. Patent Application Serial No.
09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 08/672,899 now issued U.S.
Patent No. 5,825,898 issued October 20, 1998; and U.S. Patent Application Serial No. 09/089,710
filed June 3, 1998, U.S. Patent No. 5,825,897 issued October 20, 1998, U.S. Patent No. 5,732,143
issued March 24, 1998, U.S. Patent No. 5,673,325 issued September 30, 1997, U.S. Patent No.
5,381,473 issued January 10, 1995, U.S. Patent Application Serial No. 08/833,384 filed April 4,
1991. And, all documents cited herein are incorporated herein by reference, as are documents cited
or referenced in documents cited herein.
`
FIELD OF THE INVENTION.

The present invention relates to noise cancellation and reduction and, more specifically, to noise
cancellation and reduction using spectral subtraction.

BACKGROUND OF THE INVENTION.

Ambient noise added to speech degrades the performance of speech processing algorithms. Such
processing algorithms may include dictation, voice activation, voice compression and other systems.
In such systems, it is desired to reduce the noise and improve the signal to noise ratio (S/N ratio)
without affecting the speech and its characteristics.
`
Near field noise canceling microphones provide a satisfactory solution but require that the
microphone be in the proximity of the voice source (e.g., mouth). In many cases, this is achieved by
mounting the microphone on a boom of a headset which situates the microphone at the end of a boom
proximate the mouth of the wearer. However, the headset has proven to be either uncomfortable to
wear or too restricting for operation in, for example, an automobile.
`
Microphone array technology in general, and adaptive beamforming arrays in particular, handle
severe directional noises in the most efficient way. These systems map the noise field and create
nulls towards the noise sources. The number of nulls is limited by the number of microphone elements
and processing power. Such arrays have the benefit of hands-free operation without the necessity of
a headset.
`
However, when the noise sources are diffused, the performance of the adaptive system will be
reduced to the performance of a regular delay and sum microphone array, which is not always
satisfactory. This is the case where the environment is quite reverberant, such as when the noises
are strongly reflected from the walls of a room and reach the array from an infinite number of
directions. Such is also the case in a car environment for some of the noises radiated from the car
chassis.
`
`OBJECTS AND SUMMARY OF THE INVENTION
`
The spectral subtraction technique provides a solution to further reduce the noise by estimating
the noise magnitude spectrum of the polluted signal. The technique estimates the magnitude spectral
level of the noise by measuring it during non-speech time intervals detected by a voice switch, and
then subtracting the noise magnitude spectrum from the signal. This method, described in detail in
Suppression of Acoustic Noise in Speech Using Spectral Subtraction (Steven F. Boll, IEEE ASSP-27,
No. 2, April 1979), achieves good results for stationary diffused noises that are not correlated
with the speech signal. The spectral subtraction method, however, creates artifacts, sometimes
described as musical noise, that may reduce the performance of the speech algorithm (such as
vocoders or voice activation) if the spectral subtraction is uncontrolled. In addition, the spectral
subtraction method assumes erroneously that the voice switch accurately detects the presence of
speech and locates the non-speech time intervals. This assumption is reasonable for off-line systems
but difficult to achieve or obtain in real time systems.
`
More particularly, the noise magnitude spectrum is estimated by performing an FFT of 256 points of
the non-speech time intervals and computing the energy of each frequency bin. The FFT is performed
after the time domain signal is multiplied by a shading window (Hanning or other) with an overlap of
50%. The energy of each frequency bin is averaged with neighboring FFT time frames. The number of
frames is not determined but depends on the stability of the noise. For a stationary noise, it is
preferred that many frames are averaged to obtain better noise estimation. For a non-stationary
noise, a long averaging may be harmful. Problematically, there is no means to know a priori whether
the noise is stationary or non-stationary.
`
Assuming the noise magnitude spectrum estimation is calculated, the input signal is multiplied by a
shading window (Hanning or other), an FFT is performed (256 points or other) with an overlap of 50%
and the magnitude of each bin is averaged over 2-3 FFT frames. The noise magnitude spectrum is then
subtracted from the signal magnitude. If the result is negative, the value is replaced by a zero
(Half Wave Rectification). It is recommended, however, to further reduce the residual noise present
during non-speech intervals by replacing low values with a minimum value (or zero) or by attenuating
the residual noise by 30 dB. The resulting output is the noise free magnitude spectrum.
`
`20
`
The spectral complex data is reconstructed by applying the phase information of the relevant bin of
the signal's FFT with the noise free magnitude. An IFFT process is then performed on the complex
data to obtain the noise free time domain data. The time domain results are overlapped and summed
with the previous frame's results to compensate for the overlap process of the FFT.
`
`25
`
There are several problems associated with the system described. First, the system assumes that
there is a prior knowledge of the speech and non-speech time intervals. A voice switch is not
practical to detect those periods. Theoretically, a voice switch detects the presence of the speech
by measuring the energy level and comparing it to a threshold. If the threshold is too high, there
is a risk that some voice time intervals might be regarded as a non-speech time interval and the
system will regard voice information as noise. The result is voice distortion, especially in poor
signal to noise ratio cases. If, on the other hand, the threshold is too low, there is a risk that
the non-speech intervals will be too short especially in poor signal to noise ratio cases and in
cases where the voice is continuous with little intermission.
`
Another problem is that the magnitude calculation of the FFT result is quite complex. This involves
square and square root calculations which are very expensive in terms of computation load. Yet
another problem is the association of the phase information to the noise free magnitude spectrum in
order to obtain the information for the IFFT. This process requires the calculation of the phase,
the storage of the information, and applying the information to the magnitude data - all are
expensive in terms of computation and memory requirements. Another problem is the estimation of the
noise spectral magnitude. The FFT process is a poor and unstable estimator of energy. The
averaging-over-time of frames contributes insufficiently to the stability. Shortening the length of
the FFT results in a wider bandwidth of each bin and better stability but reduces the performance of
the system. Averaging-over-time, moreover, smears the data and, for this reason, cannot be extended
to more than a few frames. This means that the noise estimation process proposed is not sufficiently
stable.
`
It is therefore an object of this invention to provide a spectral subtraction system that has a
simple, yet efficient mechanism, to estimate the noise magnitude spectrum even in poor
signal-to-noise ratio situations and in continuous fast speech cases.

It is another object of this invention to provide an efficient mechanism that can perform the
magnitude estimation with little cost, and will overcome the problem of phase association.

It is yet another object of this invention to provide a stable mechanism to estimate the noise
spectral magnitude without the smearing of the data.
`
In accordance with the foregoing objectives, the present invention provides a system that correctly
determines the non-speech segments of the audio signal thereby preventing erroneous processing of
the noise canceling signal during the speech segments. In the preferred embodiment, the present
invention obviates the need for a voice switch by precisely determining the non-speech segments
using a separate threshold detector for each frequency bin. The threshold detector precisely detects
the positions of the noise elements, even within continuous speech segments, by determining whether
frequency spectrum elements, or bins, of the input signal are within a threshold set according to a
minimum value of the frequency spectrum elements over a preset period of time. More precisely, the
threshold is set according to current and future minimum values of the frequency spectrum elements.
Thus, for each syllable, the energy of the noise elements is determined by a separate threshold
determination without examination of the overall signal energy thereby providing good and stable
estimation of the noise. In addition, the system preferably sets the threshold continuously and
resets the threshold within a predetermined period of time of, for example, five seconds.
`
In order to reduce complex calculations, it is preferred in the present invention to obtain an
estimate of the magnitude of the input audio signal using a multiplying combination of the real and
imaginary parts of the input in accordance with, for example, the higher and the lower values of the
real and imaginary parts of the signal. In order to further reduce instability of the spectral
estimation, a two-dimensional (2D) smoothing process is applied to the signal estimation. A two-step
smoothing function, which first averages neighboring frequency bins in each time frame and then
applies an exponential time average to each frequency bin, produces excellent results.
`
In order to reduce the complexity of determining the phase of the frequency bins during subtraction
to thereby align the phases of the subtracting elements, the present invention applies a filter
multiplication to effect the subtraction. The filter function, a Wiener filter function for example,
or an approximation of the Wiener filter, is multiplied by the complex data of the frequency domain
audio signal. The filter function may effect a full-wave rectification, or a half-wave rectification
for otherwise negative results of the subtraction process, or simple subtraction. It will be
appreciated that, since the noise elements are determined within continuous speech segments, the
noise estimation is accurate and it may be canceled from the audio signal continuously, providing
excellent noise cancellation characteristics.
`
The present invention also provides a residual noise reduction process for reducing the residual
noise remaining after noise cancellation. The residual noise is reduced by zeroing the non-speech
segments, e.g., within the continuous speech, or decaying the non-speech segments. A voice switch
may be used or another threshold detector which detects the non-speech segments in the time-domain.
`
The present invention is applicable with various noise canceling systems including, but not limited
to, those systems described in the U.S. patent applications incorporated herein by reference. The
present invention, for example, is applicable with the adaptive beamforming array. In addition, the
present invention may be embodied as a computer program for driving a computer processor either
installed as application software or as hardware.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Other objects, features and advantages according to the present invention will become apparent from
the following detailed description of the illustrated embodiments when read in conjunction with the
accompanying drawings in which corresponding components are identified by the same reference
numerals.
`
Fig. 1 illustrates the present invention;
Fig. 2 illustrates the noise processing of the present invention;
Fig. 3 illustrates the noise estimation processing of the present invention;
Fig. 4 illustrates the subtraction processing of the present invention;
Fig. 5 illustrates the residual noise processing of the present invention;
Fig. 5A illustrates a variant of the residual noise processing of the present invention;
Fig. 6 illustrates a flow diagram of the present invention;
Fig. 7 illustrates a flow diagram of the present invention;
Fig. 8 illustrates a flow diagram of the present invention; and
Fig. 9 illustrates a flow diagram of the present invention.
`
`DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
`
The present invention, in one embodiment, is practicable as a spectral subtraction system, method
and apparatus for canceling and/or reducing noise arising from electrical or electromagnetic noise
sources, such as external electromagnetic noise sources including power supplies, for example an AC
source or an AC to DC power converter such as used by a computer, particularly a lap-top computer.
In particular, it was discovered that the power supply of a computer such as a lap-top device
creates an interference noise on or in relation to the Universal Serial Bus (USB) line, port or
signal thereon. Evidently, the power source or the power conversion creates an interference signal
(herein referred to as "isotropic diffused stationary noise" or "isotropic noise") which is
transposed through, for example, electromagnetic coupling to the USB signal line and interferes with
the signals thereon. This noise is audible when reproduced by a transducer, for example, as a
buzzing sound. Ironically, USB was previously thought to avoid the audio noise from such sources
which manifests on such devices as sound cards. Since USB is rapidly becoming the standard for
speech and voice communications applications, for example, for signals received from audio signal
peripherals including signals received over the internet or other remote-transmission medium, it is
a significant feat to eliminate this isotropic noise and, indeed, would have the same impact on the
market as when Dolby(TM) was invented.
`
The present invention as described herein was discovered to eliminate the "dirty noise" arising from
the power source or power converter and manifesting on the USB signal line. One skilled in the art
will appreciate how the spectral subtraction system, method and apparatus described herein is
embodied as any of the well-known computer software and/or hardware applications on a computer
including, for example, a device driver or dynamic link library as particularly set forth in related
application U.S. Patent Application Serial No. 60/126,567. In cooperation with the invention of U.S.
Patent Application Serial No. 60/126,567, the present invention includes filters selectable by pull
down menus for filtering out the isotropic noise. In one embodiment, it was discovered that the
preferred operating range of the present invention, scalable either automatically at the control of
a computer processor or manually by the user by way of, for example, a potentiometer or clickable
object presented by the tabbed pull-down menu, is between 8 dB and 14 dB, since this appeared to
provide optimal performance, although it is within the present invention to provide other dB ranges.
However, when noise reduction is above 14 dB, speech is attacked and there can be degradation of
speech recognition.
`
Thus, the present invention is applicable to inherent system noise or noise induced by a system, as
well as to acoustical noise; and the invention reduces or eliminates inherent system noise induced
by a system (e.g., noise from a power source or power converter), as well as acoustical noise.
`
Figure 1 illustrates an embodiment of the present invention 100. The system receives a digital audio
signal at input 102 sampled at a frequency which is at least twice the bandwidth of the audio
signal. In one embodiment, the signal is derived from a microphone signal that has been processed
through an analog front end, A/D converter and a decimation filter to obtain the required sampling
frequency. In another embodiment, the input is taken from the output of a beamformer or even an
adaptive beamformer. In that case the signal has been processed to eliminate noises arriving from
directions other than the desired one, leaving mainly noises originating from the same direction as
the desired one. In yet another embodiment, the input signal can be obtained from a sound board when
the processing is implemented on a PC processor or similar computer processor.
`
The input samples are stored in a temporary buffer 104 of 256 points. When the buffer is full, the
new 256 points are combined in a combiner 106 with the previous 256 points to provide 512 input
points. The 512 input points are multiplied by multiplier 108 with a shading window with the length
of 512 points. The shading window contains coefficients that are multiplied with the input data
accordingly. The shading window can be Hanning or other and it serves two goals: the first is to
smooth the transients between two processed blocks (together with the overlap process); the second
is to reduce the side lobes in the frequency domain and hence prevent the masking of low energy
tonals by high energy side lobes. The shaded results are converted to the frequency domain through
an FFT (Fast Fourier Transform) processor 110. Other lengths of the FFT samples (and accordingly
input buffers) are possible including 256 points or 1024 points.
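
The buffering, combining and shading steps can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `shaded_blocks`, the all-zero history used before the first block, and the choice of a periodic Hann window are assumptions made for the example.

```python
import math

FRAME = 256        # new samples collected per block (buffer 104)
N = 2 * FRAME      # 512-point block after combining with history

# Periodic Hann ("Hanning") window, one possible shading window (assumption).
WINDOW = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]

def shaded_blocks(samples):
    """Yield 512-point shaded blocks: 256 history points followed by 256 new points."""
    history = [0.0] * FRAME  # assumed: silence before the first block
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        new = list(samples[start:start + FRAME])
        combined = history + new                          # combiner 106
        yield [w * x for w, x in zip(WINDOW, combined)]   # multiplier 108
        history = new                                     # new points become history

blocks = list(shaded_blocks([1.0] * 1024))
```

Each yielded block would then be passed to a 512-point FFT; consecutive blocks share 256 samples, which is the 50% overlap.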
`
The FFT output is a complex vector of 256 significant points (the other 256 points are an
anti-symmetric replica of the first 256 points). The points are processed in the noise processing
block 112(200) which includes the noise magnitude estimation for each frequency bin, the subtraction
process that estimates the noise-free complex value for each frequency bin, and the residual noise
reduction process. An IFFT (Inverse Fast Fourier Transform) processor 114 performs the Inverse
Fourier Transform on the complex noise free data to provide 512 time domain points. The first 256
time domain points are summed by the summer 116 with the previous last 256 data points to compensate
for the input overlap and shading process and output at output terminal 118. The remaining 256
points are saved for the next iteration.
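
To illustrate why the overlap-and-sum step compensates for the shading, the sketch below skips the FFT, noise processing and IFFT entirely and feeds the shaded blocks straight into an overlap-add. It assumes a periodic Hann window (whose two halves sum to one) and a zero initial history; `overlap_add` is a hypothetical helper name. Under those assumptions the output equals the input delayed by one 256-point block.

```python
import math

FRAME = 256
N = 2 * FRAME
# Periodic Hann window (assumption): WINDOW[i] + WINDOW[i + FRAME] == 1
WINDOW = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]

def overlap_add(blocks):
    """Sum the first 256 points of each block with the saved last 256 of the previous block."""
    saved = [0.0] * FRAME
    out = []
    for block in blocks:
        out.extend(a + b for a, b in zip(block[:FRAME], saved))  # summer 116
        saved = block[FRAME:]                                    # kept for the next iteration
    return out

signal = [math.sin(0.05 * i) for i in range(4 * FRAME)]

# Build shaded 512-point blocks (256 history points + 256 new points).
blocks = []
history = [0.0] * FRAME
for start in range(0, len(signal), FRAME):
    new = signal[start:start + FRAME]
    blocks.append([w * x for w, x in zip(WINDOW, history + new)])
    history = new

output = overlap_add(blocks)
max_error = max(abs(output[FRAME + i] - signal[i]) for i in range(3 * FRAME))
```

With real noise processing between the window and the overlap-add, the same summation smooths the transitions between independently processed blocks.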
`
`It will be appreciated that, while specific transforms are utilized in the
`
`preferred embodiments, it is of course understood that other transforms may be
`
`applied to the present invention to obtain the spectral noise signal.
`
Figure 2 is a detailed description of the noise processing block 200(112). First, each frequency bin
(n) 202 magnitude is estimated. The straightforward approach is to estimate the magnitude by
calculating:

$$Y(n) = \left(\mathrm{Real}(n)^2 + \mathrm{Imag}(n)^2\right)^{1/2}$$

In order to save processing time and complexity the signal magnitude (Y) is estimated by an
estimator 204 using an approximation formula instead:

$$Y(n) = \max\bigl[\,|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\,\bigr] + 0.4 \cdot \min\bigl[\,|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\,\bigr]$$
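
The approximation can be compared against the exact magnitude with a short sketch. The function names are illustrative only. For this formula the ratio of approximate to exact magnitude depends only on the ratio of the smaller to the larger component, and works out to roughly 1% low (equal components) to about 7.7% high (smaller component 0.4 times the larger).

```python
import math

def approx_magnitude(re, im):
    """Max + 0.4 * Min approximation; needs no squares or square roots."""
    a, b = abs(re), abs(im)
    return max(a, b) + 0.4 * min(a, b)

def exact_magnitude(re, im):
    """Straightforward magnitude, shown for comparison."""
    return math.sqrt(re * re + im * im)

# Sweep a unit vector around the circle and record the worst-case ratios.
ratios = []
for degrees in range(360):
    angle = math.radians(degrees)
    re, im = math.cos(angle), math.sin(angle)
    ratios.append(approx_magnitude(re, im) / exact_magnitude(re, im))

lowest, highest = min(ratios), max(ratios)
```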
`
In order to reduce the instability of the spectral estimation, which typically plagues the FFT
process (ref. [2]: Digital Signal Processing, Oppenheim and Schafer, Prentice Hall, pp. 542-545),
the present invention implements a 2D smoothing process. Each bin is replaced with the average of
its value and the two neighboring bins' value (of the same time frame) by a first averager 206. In
addition, the smoothed value of each smoothed bin is further smoothed by a second averager 208 using
a time exponential average with a time constant of 0.7 (which is the equivalent of averaging over 3
time frames). The 2D-smoothed value is then used by two processes - the noise estimation process by
noise estimation processor 212(300) and the subtraction process by subtractor 210. The noise
estimation process estimates the noise at each frequency bin and the result is used by the noise
subtraction process. The output of the noise subtraction is fed into a residual noise reduction
processor 216 to further reduce the noise. In one embodiment, the time domain signal is also used by
the residual noise process 216 to determine the speech free segments. The noise free signal is moved
to the IFFT process to obtain the time domain output 218.
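
A minimal sketch of the two smoothing steps follows. Two details are assumptions not fixed by the text: edge bins are averaged over the neighbors that exist, and the exponential average is written as `alpha * previous + (1 - alpha) * current`, which gives an effective length of 1 / (1 - 0.7), about 3 frames.

```python
def smooth_over_bins(magnitudes):
    """First averager: mean of each bin and its two neighbours in the same frame."""
    n = len(magnitudes)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)   # edge bins use fewer points (assumption)
        smoothed.append(sum(magnitudes[lo:hi]) / (hi - lo))
    return smoothed

def smooth_over_time(previous, current, alpha=0.7):
    """Second averager: per-bin exponential average across time frames."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]

frame = smooth_over_bins([0.0, 3.0, 0.0, 0.0])
state = smooth_over_time([1.0, 1.0, 1.0, 1.0], frame)
```

The list returned by `smooth_over_time` would be kept as `previous` for the next frame.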
`
Figure 3 is a detailed description of the noise estimation processor 300(212). Theoretically, the
noise should be estimated by taking a long time average of the signal magnitude (Y) of non-speech
time intervals. This requires that a voice switch be used to detect the speech/non-speech intervals.
However, too sensitive a switch may result in the use of a speech signal for the noise estimation,
which will degrade the voice signal. A less sensitive switch, on the other hand, may dramatically
reduce the length of the noise time intervals (especially in continuous speech cases) and undermine
the validity of the noise estimation.
`
In the present invention, a separate adaptive threshold is implemented for each frequency bin 302.
This allows the location of noise elements for each bin separately without the examination of the
overall signal energy. The logic behind this method is that, for each syllable, the energy may
appear at different frequency bands. At the same time, other frequency bands may contain noise
elements. It is therefore possible to apply a non-sensitive threshold for the noise and yet locate
many non-speech data points for each bin, even within a continuous speech case. The advantage of
this method is that it allows the collection of many noise segments for a good and stable estimation
of the noise, even within continuous speech segments.
`
In the threshold determination process, for each frequency bin, two minimum values are calculated. A
future minimum value is initiated every 5 seconds at 304 with the value of the current magnitude
(Y(n)) and replaced with a smaller minimal value over the next 5 seconds through the following
process. The future minimum value of each bin is compared with the current magnitude value of the
signal. If the current magnitude is smaller than the future minimum, the future minimum is replaced
with the magnitude which becomes the new future minimum.
`
At the same time, a current minimum value is calculated at 306. The current minimum is initiated
every 5 seconds with the value of the future minimum that was determined over the previous 5 seconds
and follows the minimum value of the signal for the next 5 seconds by comparing its value with the
current magnitude value. The current minimum value is used by the subtraction process, while the
future minimum is used for the initiation and refreshing of the current minimum.
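
The interplay of the two minima for a single frequency bin can be sketched as below. The class name `MinimumTracker` and the parameter `frames_per_period` are hypothetical: the text fixes the period at 5 seconds but the number of frames that corresponds to depends on the sampling rate, which is not stated here. The example uses a period of 3 frames purely to keep the trace short.

```python
class MinimumTracker:
    """Tracks the current and future minimum magnitude of one frequency bin."""

    def __init__(self, first_magnitude, frames_per_period):
        self.frames_per_period = frames_per_period  # frames in one 5-second period (assumed known)
        self.count = 0
        self.current_min = first_magnitude
        self.future_min = first_magnitude

    def update(self, magnitude):
        if self.count == self.frames_per_period:
            # Period boundary: the current minimum restarts from the future minimum
            # found over the previous period; the future minimum restarts from the
            # current magnitude.
            self.current_min = self.future_min
            self.future_min = magnitude
            self.count = 0
        # Within a period both minima follow the smallest magnitude seen.
        self.future_min = min(self.future_min, magnitude)
        self.current_min = min(self.current_min, magnitude)
        self.count += 1
        return self.current_min

tracker = MinimumTracker(first_magnitude=5.0, frames_per_period=3)
trace = [tracker.update(m) for m in [5.0, 2.0, 6.0, 7.0, 8.0, 9.0, 10.0]]
```

In this trace the low value 2.0 is forgotten only after it has been absent for a full period, which is how the scheme lets the minimum rise again when the noise floor rises.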
`
`25
`
`The noise estimation mechanism of the present invention ensures a
`
`tight and quick estimation of the noise value, with limited memory of the process (5
`
`seconds), while preventing a too high an estimation of the noise.
`
Each bin's magnitude (Y(n)) is compared by comparator 308 with four times the current minimum value
of that bin, which serves as the adaptive threshold for that bin. If the magnitude is within the
range (hence below the threshold), it is allowed as noise and used by an exponential averaging unit
310 that determines the level of the noise 312 of that frequency. If the magnitude is above the
threshold it is rejected for the noise estimation. The time constant for the exponential averaging
is typically 0.95 which may be interpreted as taking the average of the last 20 frames. The
threshold of 4*minimum value may be changed for some applications.
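
The per-bin gating and averaging can be sketched as follows. The function name is illustrative, the exponential average is assumed to take the form `alpha * noise + (1 - alpha) * magnitude` (effective length 1 / (1 - 0.95) = 20 frames), and treating a magnitude exactly equal to the threshold as noise is an arbitrary choice made for the example.

```python
def update_noise_estimate(noise, magnitude, current_min, factor=4.0, alpha=0.95):
    """Update one bin's noise level only when its magnitude passes the adaptive threshold."""
    threshold = factor * current_min      # comparator 308
    if magnitude <= threshold:            # accepted as noise
        return alpha * noise + (1.0 - alpha) * magnitude   # exponential averaging unit 310
    return noise                          # rejected: likely speech, estimate unchanged

accepted = update_noise_estimate(noise=1.0, magnitude=2.0, current_min=1.0)
rejected = update_noise_estimate(noise=1.0, magnitude=5.0, current_min=1.0)
```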
`
Figure 4 is a detailed description of the subtraction processor 400(210). In a straightforward
approach, the value of the estimated bin noise magnitude is subtracted from the current bin
magnitude. The phase of the current bin is calculated and used in conjunction with the result of the
subtraction to obtain the Real and Imaginary parts of the result. This approach is very expensive in
terms of processing and memory because it requires the calculation of the Sine and Cosine arguments
of the complex vector with consideration of the 4 quarters where the complex vector may be
positioned. An alternative approach used in this present invention is to use a Filter approach. The
subtraction is interpreted as a filter multiplication performed by filter 402 where H (the filter
coefficient) is:

$$H(n) = \frac{\bigl|\,|Y(n)| - |N(n)|\,\bigr|}{|Y(n)|}$$
`
Where Y(n) is the magnitude of the current bin and N(n) is the noise estimation of that bin. The
value H of the filter coefficient (of each bin separately) is multiplied by the Real and Imaginary
parts of the current bin at 404:

E(Real) = Y(Real) * H

E(Imag) = Y(Imag) * H
`
Where E is the noise free complex value. In the straightforward approach the subtraction may result
in a negative value of magnitude. This value can be either replaced with zero (half-wave
rectification) or replaced with a positive value equal to the negative one (full-wave
rectification). The filter approach, as expressed here, results in the full-wave rectification
directly. The full-wave rectification provides a little less noise reduction but introduces far
fewer artifacts to the signal. It will be appreciated that this filter can be modified to effect a
half-wave rectification by taking the non-absolute value of the numerator and replacing negative
values with zeros.
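
The filter approach for a single bin can be sketched as below, covering both the full-wave form and the half-wave variant. The function names and the handling of a zero magnitude (returning a gain of zero to avoid dividing by zero) are assumptions for illustration. Because the gain is a real number, multiplying the complex bin by it scales the Real and Imaginary parts equally and leaves the phase untouched, so no sine or cosine is needed.

```python
def filter_gain(y_mag, n_mag, half_wave=False):
    """Filter coefficient H for one bin from its magnitude and noise estimate."""
    if y_mag == 0.0:
        return 0.0                      # assumption: empty bin stays empty
    difference = y_mag - n_mag
    if half_wave:
        difference = max(difference, 0.0)   # negative results replaced with zero
    else:
        difference = abs(difference)        # full-wave rectification
    return difference / y_mag

def apply_filter(bin_value, y_mag, n_mag, half_wave=False):
    """Scale the complex bin by H; phase is preserved."""
    return bin_value * filter_gain(y_mag, n_mag, half_wave)

cleaned = apply_filter(3.0 + 4.0j, y_mag=5.0, n_mag=1.0)
full = filter_gain(1.0, 3.0)
half = filter_gain(1.0, 3.0, half_wave=True)
```

Here `y_mag` may be the smoothed magnitude or the unsmoothed one, depending on which variant of the method is followed.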
`
`Note also that the values of Y in the figures are the smoothed values of
`
`Y after averaging over neighboring spectral bins and over time frames (2D
`
`smoothing). Another approach is to use the smoothed Y only for the noise estimation
`
`(N), and to use the unsmoothed Y for the calculation of H.
`
`Figure 5 illustrates the residual noise reduction processor 500(216).
`
`The residual noise is defined as the remaining noise during non-speech intervals. The
`
`noise in these intervals is first reduced by the subtraction process which does not
`
`differentiate between speech and non-speech