`
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
(51) International Patent Classification 7: G10L 21/02

(11) International Publication Number: WO 00/49602
(43) International Publication Date: 24 August 2000 (24.08.00)

(21) International Application Number: PCT/US00/03538

(22) International Filing Date: 11 February 2000 (11.02.00)
`
`(81) Designated States: CA, CN, IL, JP, US, European patent (AT,
`BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
MC, NL, PT, SE).
`
(30) Priority Data:
09/252,874    18 February 1999 (18.02.99)    US
09/385,996    30 August 1999 (30.08.99)    US
`
`Published
`With international search report.
`
`(71) Applicant (for all designated States except US): ANDREA
`ELECTRONICS CORPORATION [US/US]; 45 Melville
`Park Road, Melville, NY 11747 (US).
`
(72) Inventors; and
(75) Inventors/Applicants (for US only): MARASH, Joseph [IL/IL]; Shimkin Street 1A, 34750 Haifa (IL). BERDUGO, Baruch [IL/IL]; Hanarkisim Street 6, 28000 Kiriat-Ata (IL).
`
`(74) Agents: KOWALSKI, Thomas, J. et al.; Frommer Lawrence &
`Haug LLP, 745 Fifth Avenue, New York, NY 10151 (US).
`
`(54) Title: SYSTEM, METHOD AND APPARATUS FOR CANCELLING NOISE
`
[Front-page figure: Spectral Subtraction System. Labeled blocks: Input Samples (102); Combine 256 New Points with 256 History; Shading Coefficients; 112 (200); 114.]
`
`(57) Abstract
`
A system for cancelling and reducing audio signal noise arising from electrical or electromagnetic noise sources such as AC-to-DC power converters used by computers. The system receives a digital audio signal (102) sampled at a frequency which is at least twice the bandwidth of the audio signal. The input samples are stored in a temporary buffer of 256 points (104). When the buffer is full, the new 256 points are combined in a combiner (106) with the previous 256 points to provide 512 input points, which are multiplied by multiplier (108) with a shading window with the length of 512 points. The shaded results are converted to the frequency domain through an FFT processor (110). The FFT output is processed in a noise processor (112), which includes the noise magnitude estimation for each frequency bin, the subtraction that estimates the noise-free complex value for each frequency bin, and the residual noise reduction process.
`
RTL345-2_1021-0001

Realtek345-2 EX. 1021
`
`
`
`
`
`
WO 00/49602
PCT/US00/03538

SYSTEM, METHOD AND APPARATUS FOR CANCELING NOISE
`
`RELATED APPLICATIONS INCORPORATED BY REFERENCE.
`
This application claims priority from U.S. Patent Application Serial Nos. 09/252,874 filed February 18, 1999 and 09/385,996 filed August 30, 1999, and reference is made to U.S. Patent Application Serial No. 60/126,567 filed March 26, 1999, all of which are herein incorporated by reference.
`
The following applications and patent(s) are cited and hereby herein incorporated by reference: U.S. Patent Application Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Application Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 09/055,709 filed April 7, 1998, U.S. Patent Application Serial No. 09/059,503 filed April 13, 1998, U.S. Patent Application Serial No. 08/840159 filed April 14, 1997, U.S. Patent Application Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 08/672,899, now issued U.S. Patent No. 5,825,898 issued October 20, 1998; and U.S. Patent Application Serial No. 09/089,710 filed June 3, 1998, U.S. Patent No. 5,825,897 issued October 20, 1998, U.S. Patent No. 5,732,143 issued March 24, 1998, U.S. Patent No. 5,673,325 issued September 30, 1997, U.S. Patent No. 5,381,473 issued January 10, 1995, and U.S. Patent Application Serial No. 08/833,384 filed April 4, 1997. All documents cited herein are incorporated herein by reference, as are documents cited or referenced in documents cited herein.
`
FIELD OF THE INVENTION.
`
The present invention relates to noise cancellation and reduction and, more specifically, to noise cancellation and reduction using spectral subtraction.
`
`BACKGROUND OF THE INVENTION.
`
`Ambient noise added to speech degrades the performance of speech
`
`processing algorithms. Such processing algorithms may include dictation, voice
`
`activation, voice compression and other systems.
`
`In such systems, it is desired to
`
reduce the noise and improve the signal to noise ratio (S/N ratio) without affecting
`
`the speech and its characteristics.
`
Near-field noise canceling microphones provide a satisfactory solution but require that the microphone be in the proximity of the voice source (e.g., the mouth).
`
In many cases, this is achieved by mounting the microphone on a boom of a headset
`
`
`which situates the microphone at the end of a boom proximate the mouth of the
`
`wearer. However, the headset has proven to be either uncomfortable to wear or too
`
`restricting for operation in, for example, an automobile.
`
Microphone array technology in general, and adaptive beamforming
`
`arrays in particular, handle severe directional noises in the most efficient way. These
`
`systems map the noise field and create nulls towards the noise sources. The number
`
`of nulls is limited by the number of microphone elements and processing power.
`
`Such arrays have the benefit of hands-free operation without the necessity of a
`
`headset.
`
`However, when the noise sources are diffused, the performance of the
`
`adaptive system will be reduced to the performance of a regular delay and sum
`
`microphone array, which is not always satisfactory. This is the case where the
`
`environment is quite reverberant, such as when the noises are strongly reflected from
`
`the walls of a room and reach the array from an infinite number of directions. Such
`
`is also the case in a car environment for some of the noises radiated from the car
`
`chassis.
`
`OBJECTS AND SUMMARY OF THE INVENTION
`
`The spectral subtraction technique provides a solution to further reduce
`
`the noise by estimating the noise magnitude spectrum of the polluted signal. The
`
`technique estimates the magnitude spectral level of the noise by measuring it during
`
`non-speech time intervals detected by a voice switch, and then subtracting the noise
`
magnitude spectrum from the signal. This method, described in detail in "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (Steven F. Boll, IEEE ASSP-27, No. 2, April 1979), achieves good results for stationary diffused noises that are not
`
`correlated with the speech signal. The spectral subtraction method, however, creates
`
`artifacts, sometimes described as musical noise, that may reduce the performance of
`
`the speech algorithm (such as vocoders or voice activation) if the spectral subtraction
`
`is uncontrolled. In addition, the spectral subtraction method assumes erroneously that
`
`the voice switch accurately detects the presence of speech and locates the non-speech
`
time intervals. This assumption is reasonable for off-line systems but difficult to achieve in real-time systems.
`
`
`More particularly, the noise magnitude spectrum is estimated by
`
`performing an FFT of 256 points of the non-speech time intervals and computing the
`
`energy of each frequency bin. The FFT is performed after the time domain signal is
`
`multiplied by a shading window (Hanning or other) with an overlap of 50%. The
`
`energy of each frequency bin is averaged with neighboring FFT time frames. The
`
number of frames is not fixed but depends on the stability of the noise. For a
`
`stationary noise, it is preferred that many frames are averaged to obtain better noise
`
`estimation. For a non-stationary noise, a long averaging may be harmful.
`
Problematically, there is no means to know a priori whether the noise is stationary or
`
`non-stationary.
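The prior-art noise estimate described above can be sketched as follows. This is a minimal illustration, assuming the non-speech FFT frames have already been selected (for example by a voice switch) and are supplied as lists of complex bins; the function name `average_bin_energy` is hypothetical. It performs only the per-bin energy averaging and leaves the number of frames to the caller, which is exactly the choice the text identifies as unresolved.

```python
from typing import List, Sequence


def average_bin_energy(frames: Sequence[Sequence[complex]]) -> List[float]:
    """Average the energy |X_k|^2 of each FFT bin over non-speech frames."""
    if not frames:
        raise ValueError("at least one non-speech frame is required")
    num_bins = len(frames[0])
    totals = [0.0] * num_bins
    for frame in frames:
        for k, x in enumerate(frame):
            totals[k] += x.real * x.real + x.imag * x.imag  # energy of bin k
    # How many frames to average is left to the caller; the prior-art
    # method gives no rule for choosing it.
    return [t / len(frames) for t in totals]
```

For two frames `[1, 3+4j]` and `[1j, 0]`, the averaged bin energies are 1.0 and 12.5.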
`
`Assuming the noise magnitude spectrum estimation is calculated, the
`
`input signal is multiplied by a shading window (Hanning or other), an FFT is
`
`performed (256 points or other) with an overlap of 50% and the magnitude of each
`
`bin is averaged over 2-3 FFT frames. The noise magnitude spectrum is then
`
`subtracted from the signal magnitude. If the result is negative, the value is replaced
`
`by a zero (Half Wave Rectification). It is recommended, however,
`
`to further reduce
`
`the residual noise present during non-speech intervals by replacing low values with a
`
`minimum value (or zero) or by attenuating the residual noise by 30dB. The resulting
`
`output is the noise free magnitude spectrum.
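A minimal sketch of this subtraction step, assuming per-bin magnitude lists as input; `half_wave_subtract` and `attenuate_db` are hypothetical names, and the 30 dB attenuation is interpreted here as an amplitude factor of 10^(-30/20), roughly 0.032.

```python
from typing import List, Sequence


def half_wave_subtract(signal_mag: Sequence[float], noise_mag: Sequence[float]) -> List[float]:
    """Subtract the noise magnitude per bin; negative results are replaced by zero."""
    return [max(s - n, 0.0) for s, n in zip(signal_mag, noise_mag)]


def attenuate_db(values: Sequence[float], db: float) -> List[float]:
    """Scale magnitudes down by `db` decibels (amplitude convention, 20*log10)."""
    gain = 10.0 ** (-db / 20.0)
    return [v * gain for v in values]
```

For example, signal magnitudes `[5.0, 1.0, 2.5]` minus noise `[2.0, 2.0, 0.5]` give `[3.0, 0.0, 2.0]`, the middle bin being clipped to zero by the half-wave rectification.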
`
`The spectral complex data is reconstructed by applying the phase
`
`information of the relevant bin of the signal's FFT with the noise free magnitude. An
`
`IFFT process is then performed on the complex data to obtain the noise free time
`
`domain data. The time domain results are overlapped and summed with the previous
`
`frame's results to compensate for the overlap process of the FFT.
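The phase-reattachment step can be illustrated with Python's `cmath` module. This is a sketch under the assumption that each bin is a Python `complex`; `apply_clean_magnitude` is a hypothetical name. It takes the phase of the noisy bin and rebuilds a complex value with the noise-free magnitude, which costs an arctangent plus a sine and a cosine per bin.

```python
import cmath


def apply_clean_magnitude(bin_value: complex, clean_magnitude: float) -> complex:
    """Give the noise-free magnitude the phase of the noisy input bin."""
    phase = cmath.phase(bin_value)  # atan2(imag, real), valid in all four quadrants
    return cmath.rect(clean_magnitude, phase)  # magnitude * (cos(phase) + j*sin(phase))
```

For the bin 3 + 4j with a noise-free magnitude of 2.5, the result is 1.5 + 2j.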
`
`There are several problems associated with the system described.
`
`First, the system assumes that there is a prior knowledge of the speech and non-
`
`speech time intervals. A voice switch is not practical
`
`to detect those periods.
`
`Theoretically, a voice switch detects the presence of the speech by measuring the
`
`energy level and comparing it to a threshold. If the threshold is too high, there is a
`
`risk that some voice time intervals might be regarded as a non-speech time interval
`
`and the system will regard voice information as noise. The result is voice distortion,
`
`especially in poor signal to noise ratio cases. If, on the other hand, the threshold is too
`
`low, there is a risk that the non-speech intervals will be too short especially in poor
`
`
`signal to noise ratio cases and in cases where the voice is continuous with little
`
`intermission.
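To make the threshold trade-off concrete, here is a minimal energy-based voice switch. The mean-square energy measure and the name `voice_switch` are assumptions, since the text does not say how the energy level is computed.

```python
from typing import Sequence


def voice_switch(frame: Sequence[float], threshold: float) -> bool:
    """Declare speech when the frame's mean-square energy exceeds the threshold."""
    if not frame:
        return False
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold
```

With a quiet speech frame such as `[0.3, -0.3]` (energy 0.09), a threshold of 0.1 classifies the frame as non-speech, so its speech content would leak into the noise estimate, while a threshold of 0.05 classifies it as speech.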
`
`
`Another problem is that the magnitude calculation of the FFT result is
`
`quite complex. This involves square and square root calculations which are very
`
`expensive in terms of computation load. Yet another problem is the association of the
`
`phase information to the noise free magnitude spectrum in order to obtain the
`
`information for the IFFT. This process requires the calculation of the phase, the
`
`storage of the information, and applying the information to the magnitude data - all
`
`are expensive in terms of computation and memory requirements. Another problem is
`
`the estimation of the noise spectral magnitude. The FFT process is a poor and
`
`unstable estimator of energy. The averaging-over-time of frames contributes
`
`insufficiently to the stability. Shortening the length of the FFT results in a wider
`
`bandwidth of each bin and better stability but reduces the performance of the system.
`
Averaging-over-time, moreover, smears the data and, for this reason, cannot be
`
`extended to more than a few frames. This means that the noise estimation process
`
`proposed is not sufficiently stable.
`
`It is therefore an object of this invention to provide a spectral
`
subtraction system that has a simple yet efficient mechanism to estimate the noise
`
`magnitude spectrum even in poor signal-to-noise ratio situations and in continuous
`
`fast speech cases.
`
`It is another object of this invention to provide an efficient mechanism
`
`that can perform the magnitude estimation with little cost, and will overcome the
`
`problem of phase association.
`
`It is yet another object of this invention to provide a stable mechanism
`
`to estimate the noise spectral magnitude without the smearing of the data.
`
`In accordance with the foregoing objectives, the present invention
`
`provides a system that correctly determines the non-speech segments of the audio
`
`signal thereby preventing erroneous processing of the noise canceling signal during
`
`the speech segments. In the preferred embodiment, the present invention obviates the
`
`need for a voice switch by precisely determining the non-speech segments using a
`
`separate threshold detector for each frequency bin. The threshold detector precisely
`
`detects the positions of the noise elements, even within continuous speech segments,
`
`by determining whether frequency spectrum elements, or bins, of the input signal are
`
`
`within a threshold set according to a minimum value of the frequency spectrum
`
elements over a preset period of time, more precisely according to current and future minimum values of the frequency spectrum elements. Thus, for each syllable, the energy of the
`
`noise elements is determined by a separate threshold determination without
`
`examination of the overall signal energy thereby providing good and stable
`
`estimation of the noise. In addition, the system preferably sets the threshold
`
`continuously and resets the threshold within a predetermined period of time of, for
`
`example, five seconds.
`
`In order to reduce complex calculations, it is preferred in the present
`
`invention to obtain an estimate of the magnitude of the input audio signal using a
`
`multiplying combination of the real and imaginary parts of the input in accordance
`
`with, for example, the higher and the lower values of the real and imaginary parts of
`
the signal. In order to further reduce instability of the spectral estimation, a two-dimensional (2D) smoothing process is applied to the signal estimation. A two-step smoothing function, first averaging neighboring frequency bins in each time frame and then applying an exponential average over time for each frequency bin, produces excellent results.
`
`In order to reduce the complexity of determining the phase of the
`
`frequency bins during subtraction to thereby align the phases of the subtracting
`
`elements, the present invention applies a filter multiplication to effect the subtraction.
`
The filter function, for example a Wiener filter function or an approximation of the Wiener filter, is multiplied by the complex data of the frequency domain audio signal. The filter function may effect a full-wave rectification, a half-wave rectification for otherwise negative results of the subtraction process, or simple subtraction. It will be
`
`appreciated that, since the noise elements are determined within continuous speech
`
`segments, the noise estimation is accurate and it may be canceled from the audio
`
`signal continuously providing excellent noise cancellation characteristics.
`
`The present invention also provides a residual noise reduction process
`
`for reducing the residual noise remaining after noise cancellation. The residual noise
`
`is reduced by zeroing the non-speech segments, e.g., within the continuous speech, or
`
`decaying the non-speech segments. A voice switch may be used or another threshold
`
`detector which detects the non-speech segments in the time-domain.
`
`
`The present invention is applicable with various noise canceling
`
systems including, but not limited to, those systems described in the U.S. patent
`
`applications incorporated herein by reference. The present invention, for example, is
`
`applicable with the adaptive beamforming array. In addition, the present invention
`
`may be embodied as a computer program for driving a computer processor either
`
`installed as application software or as hardware.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Other objects, features and advantages according to the present
`
`invention will become apparent from the following detailed description of the
`
`illustrated embodiments when read in conjunction with the accompanying drawings in
`
which corresponding components are identified by the same reference numerals.
`
`Fig. 1 illustrates the present invention;
`
`Fig. 2 illustrates the noise processing of the present invention;
`
`Fig. 3 illustrates the noise estimation processing of the present
`
`invention;
`
`Fig. 4 illustrates the subtraction processing of the present invention;
`
`Fig. 5 illustrates the residual noise processing of the present invention;
`
`Fig. 5A illustrates a variant of the residual noise processing of the
`
`present invention;
`
`Fig. 6 illustrates a flow diagram of the present invention;
`
`Fig. 7 illustrates a flow diagram of the present invention;
`
`Fig. 8 illustrates a flow diagram of the present invention; and
`
`Fig. 9 illustrates a flow diagram of the present invention.
`
`DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
`
The present invention, in one embodiment, is practicable as a spectral subtraction system, method and apparatus for canceling and/or reducing noise arising from electrical or electromagnetic noise sources, such as external electromagnetic noise sources including power sources, e.g., power supplies such as an AC source or an AC-to-DC power converter such as that used by a computer, particularly a laptop computer. In particular, it was discovered that the power supply of a computer such as a laptop creates interference noise on or in relation to the Universal Serial Bus (USB) line, port or signal thereon. Evidently, the power source or the power conversion creates an interference signal (herein referred to as "isotropic
`
`
`diffused stationary noise" or "isotropic noise") which is transposed through, for
`
`example, electromagnetic coupling to the USB signal line which interferes with the
`
`signals thereon. This noise is audible when reproduced by a transducer, for example,
`
as a buzzing sound. Ironically, USB was previously thought to avoid the audio noise from such sources that manifests on devices such as sound cards. Since USB is rapidly becoming the standard for speech and voice communications applications, for example, signals received from audio signal peripherals, including signals received over the internet or another remote-transmission medium, eliminating this isotropic noise is a significant feat and, indeed, would have the same impact on the market as the invention of Dolby™.
`
`The present invention as described herein was discovered to eliminate
`
`the "dirty noise" arising from the power source or power converter and manifesting on
`
`the USB signal line. One skilled in the art will appreciate how the spectral subtraction
`
`system, method and apparatus described herein is embodied as any of the well-known
`
`computer software and/or hardware applications on a computer including, for
`
`example, a device driver or dynamic link library as particularly set forth in related
`
`application U.S. Patent Application Serial No. 60/126,567. In cooperation with the
`
invention of U.S. Patent Application Serial No. 60/126,567, the present invention includes filters selectable by pull-down menus for filtering out the isotropic noise. In one embodiment, it was discovered that the preferred operating range of the present invention is between 8 dB and 14 dB, scalable either automatically under the control of a computer processor or manually by the user by way of, for example, a potentiometer or a clickable object presented in a tabbed pull-down menu, since this range appeared to provide optimal performance, although it is within the present invention to provide other dB ranges. However, when noise reduction is above 14 dB, speech is attacked and speech recognition can be degraded.
`
Thus, the present invention is applicable to inherent system noise, or noise induced by a system, as well as to acoustical noise; the invention reduces or eliminates inherent system noise induced by a system (e.g., noise from a power source or power converter), as well as acoustical noise.
`
`Figure 1 illustrates an embodiment of the present invention 100. The
`
`system receives a digital audio signal at input 102 sampled at a frequency which is at
`
`least twice the bandwidth of the audio signal. In one embodiment, the signal is
`
`
`derived from a microphone signal that has been processed through an analog front
`
`end, A/D converter and a decimation filter to obtain the required sampling frequency.
`
`In another embodiment, the input is taken from the output of a beamformer or even an
`
adaptive beamformer. In that case the signal has been processed to eliminate noises arriving from directions other than the desired one, leaving mainly noises originating from the same direction as the desired one.
`
`In yet another embodiment, the input
`
`signal can be obtained from a sound board when the processing is implemented on a
`
`PC processor or similar computer processor.
`
`The input samples are stored in a temporary buffer 104 of 256 points.
`
`When the buffer is full, the new 256 points are combined in a combiner 106 with the
`
`previous 256 points to provide 512 input points. The 512 input points are multiplied
`
`by multiplier 108 with a shading window with the length of 512 points. The shading
`
`window contains coefficients that are multiplied with the input data accordingly. The
`
shading window can be Hanning or other and it serves two goals: the first is to smooth
`
`the transients between two processed blocks (together with the overlap process); the
`
`second is to reduce the side lobes in the frequency domain and hence prevent the
`
`masking of low energy tonals by high energy side lobes. The shaded results are
`
`converted to the frequency domain through an FFT (Fast Fourier Transform)
`
`processor 110. Other lengths of the FFT samples (and accordingly input buffers) are
`
`possible including 256 points or 1024 points.
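The buffering, combining and shading steps can be sketched as below. The sketch assumes a periodic Hann window (the text allows "Hanning or other") and plain Python lists; `hann_window` and `shade_frame` are hypothetical names. With 50% overlap, the periodic Hann window satisfies w[n] + w[n + 256] = 1, which is what lets the later overlap-add step restore the signal level.

```python
import math
from typing import List, Sequence

BLOCK = 256        # new samples collected per iteration (buffer 104)
FRAME = 2 * BLOCK  # points after combining with the 256-point history (combiner 106)


def hann_window(length: int) -> List[float]:
    """Periodic Hann window; shifted copies at 50% overlap sum to exactly one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def shade_frame(history: Sequence[float], new: Sequence[float],
                window: Sequence[float]) -> List[float]:
    """Concatenate previous and new blocks, then apply the shading window (multiplier 108)."""
    frame = list(history) + list(new)  # previous 256 points followed by the new 256 points
    if len(frame) != len(window):
        raise ValueError("frame and window lengths differ")
    return [x * w for x, w in zip(frame, window)]
```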
`
The FFT output is a complex vector of 256 significant points (the other 256 points are a mirror-image, complex-conjugate replica of the first 256 points). The points are
`
`processed in the noise processing block 112(200) which includes the noise magnitude
`
`estimation for each frequency bin - the subtraction process that estimates the noise-
`
`free complex value for each frequency bin and the residual noise reduction process.
`
`An IFFT (Inverse Fast Fourier Transform) processor 114 performs the Inverse Fourier
`
`Transform on the complex noise free data to provide 512 time domain points. The
`
`first 256 time domain points are summed by the summer 116 with the previous last
`
`256 data points to compensate for the input overlap and shading process and output at
`
`output terminal 118. The remaining 256 points are saved for the next iteration.
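The overlap-add bookkeeping can be checked with a sketch in which the FFT, noise processing and IFFT are skipped, so each shaded 512-point block passes through unchanged. The names `overlap_add` and `run_identity_chain`, and the periodic Hann window, are assumptions for illustration. Under these assumptions the output equals the input delayed by one 256-point block, because the two overlapping window halves sum to one.

```python
import math
from typing import List, Sequence, Tuple


def hann_window(length: int) -> List[float]:
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(time_block: Sequence[float],
                saved_tail: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sum the first half with the saved tail (summer 116); keep the second half."""
    half = len(time_block) // 2
    output = [a + b for a, b in zip(time_block[:half], saved_tail)]
    return output, list(time_block[half:])


def run_identity_chain(samples: Sequence[float], block: int = 256) -> List[float]:
    """Buffer, combine, shade and overlap-add, with no spectral processing."""
    window = hann_window(2 * block)
    history = [0.0] * block
    tail = [0.0] * block
    out: List[float] = []
    for start in range(0, len(samples) - block + 1, block):
        new = list(samples[start:start + block])
        frame = [x * w for x, w in zip(history + new, window)]
        # Without noise processing, an FFT followed by an IFFT returns `frame` unchanged.
        chunk, tail = overlap_add(frame, tail)
        out.extend(chunk)
        history = new
    return out
```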
`
`It will be appreciated that, while specific transforms are utilized in the
`
`preferred embodiments, it is of course understood that other transforms may be
`
`applied to the present invention to obtain the spectral noise signal.
`
`
Figure 2 is a detailed description of the noise processing block 200(112). First, the magnitude of each frequency bin (n) 202 is estimated. The straightforward approach is to estimate the magnitude by calculating:
`
$$Y(n) = \sqrt{\mathrm{Real}(n)^2 + \mathrm{Imag}(n)^2}$$
`
`In order to save processing time and complexity the signal magnitude
`
(Y) is estimated by an estimator 204 using an approximation formula instead:
`
$$Y(n) = \max\bigl(|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\bigr) + 0.4\,\min\bigl(|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\bigr)$$
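The approximation can be sketched and its accuracy checked numerically; `approx_magnitude` and `ratio_range` are hypothetical helper names. Scanning unit phasors shows the estimate stays within roughly 1% below and 8% above the true magnitude; the worst overestimate occurs where the smaller component is 0.4 times the larger, giving sqrt(1.16), about 1.077.

```python
import math
from typing import Tuple


def approx_magnitude(z: complex) -> float:
    """Max/min approximation of |z| from the text; no square or square root needed."""
    a, b = abs(z.real), abs(z.imag)
    return max(a, b) + 0.4 * min(a, b)


def ratio_range(steps: int = 3600) -> Tuple[float, float]:
    """Scan unit-magnitude phasors and return (min, max) of approx/exact."""
    ratios = []
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        z = complex(math.cos(theta), math.sin(theta))
        ratios.append(approx_magnitude(z) / abs(z))
    return min(ratios), max(ratios)
```

For 3 + 4j the approximation gives 4 + 0.4 * 3 = 5.2 against an exact magnitude of 5.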
`
In order to reduce the instability of the spectral estimation, which typically plagues the FFT process (ref. [2]: Oppenheim and Schafer, Digital Signal Processing, Prentice Hall, pp. 542-545), the present invention implements a 2D smoothing process. Each bin is replaced with the average of its value and the two neighboring bins' values (of the same time frame) by a first averager 206. In addition, the smoothed value of each bin is further smoothed by a second averager 208 using a time exponential average with a time constant of 0.7 (which is the equivalent of averaging over 3 time frames). The 2D-smoothed value is then used by two processes: the noise estimation process by noise estimation processor 212(300) and the subtraction
`
`process by subtractor 210. The noise estimation process estimates the noise at each
`
`frequency bin and the result is used by the noise subtraction process. The output of
`
`the noise subtraction is fed into a residual noise reduction processor 216 to further
`
`reduce the noise. In one embodiment, the time domain signal is also used by the
`
`residual noise process 216 to determine the speech free segments. The noise free
`
`signal is moved to the IFFT process to obtain the time domain output 218.
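A minimal sketch of the two-step (2D) smoothing, assuming the magnitudes of one frame arrive as a list; `smooth_frequency` and `smooth_time` are hypothetical names. The handling of the first and last bins (averaging only the neighbors that exist) and the form S = 0.7 * S_prev + 0.3 * Y of the exponential average are assumptions; with a weight of 0.7 on the past, the effective memory is about 1 / (1 - 0.7), roughly 3 frames, consistent with the text.

```python
from typing import List, Optional, Sequence


def smooth_frequency(mags: Sequence[float]) -> List[float]:
    """First averager: mean of each bin and its two neighbors in the same frame.

    Edge bins average only the neighbors that exist (an assumption).
    """
    n = len(mags)
    out: List[float] = []
    for k in range(n):
        lo, hi = max(0, k - 1), min(n, k + 2)
        window = mags[lo:hi]
        out.append(sum(window) / len(window))
    return out


def smooth_time(previous: Optional[Sequence[float]], current: Sequence[float],
                alpha: float = 0.7) -> List[float]:
    """Second averager: per-bin exponential average alpha*previous + (1-alpha)*current."""
    if previous is None:
        return list(current)  # first frame: nothing to average with yet
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]
```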
`
Figure 3 is a detailed description of the noise estimation processor 300(212). Theoretically, the noise should be estimated by taking a long time average of the signal magnitude (Y) of non-speech time intervals. This requires that a voice switch be used to detect the speech/non-speech intervals. However, too sensitive a switch may result in the use of a speech signal for the noise estimation, which will distort the voice signal. A less sensitive switch, on the other hand, may dramatically
`
`
reduce the length of the noise time intervals (especially in continuous speech cases) and impair the validity of the noise estimation.
`
`In the present invention, a separate adaptive threshold is implemented
`
`for each frequency bin 302. This allows the location of noise elements for each bin
`
`separately without the examination of the overall signal energy. The logic behind this
`
`method is that, for each syllable, the energy may appear at different frequency bands.
`
`At the same time, other frequency bands may contain noise elements. It is therefore
`
`possible to apply a non-sensitive threshold for the noise and yet locate many non-
`
`speech data points for each bin, even within a continuous speech case. The advantage
`
`of this method is that it allows the collection of many noise segments for a good and
`
`stable estimation of the noise, even within continuous speech segments.
`
`In the threshold determination process, for each frequency bin, two
`
`minimum values are calculated. A future minimum value is initiated every 5 seconds
`
`at 304 with the value of the current magnitude (Y(n)) and replaced with a smaller
`
`minimal value over the next 5 seconds through the following process. The future
`
`minimum value of each bin is compared with the current magnitude value of the
`
`signal. If the current magnitude is smaller than the future minimum, the future
`
`minimum is replaced with the magnitude which becomes the new future minimum.
`
`At the same time, a current minimum value is calculated at 306. The
`
`current minimum is initiated every 5 seconds with the value of the future minimum
`
`that was determined over the previous 5 seconds and follows the minimum value of
`
`the signal for the next 5 seconds by comparing its value with the current magnitude
`
value. The current minimum value is used by the subtraction process, while the future
`
`minimum is used for the initiation and refreshing of the current minimum.
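The current/future minimum logic can be sketched as a small per-bin tracker. The class name `MinimumTracker`, the initialization on the very first frame, and the frame count standing in for 5 seconds are assumptions; for example, with 256 new samples per frame at an assumed 16 kHz sampling rate, 5 seconds is about 312 frames.

```python
from typing import List, Optional, Sequence


class MinimumTracker:
    """Per-bin current and future minimum tracking with a periodic refresh."""

    def __init__(self, frames_per_period: int) -> None:
        if frames_per_period < 1:
            raise ValueError("frames_per_period must be positive")
        self.frames_per_period = frames_per_period
        self.frame_count = 0
        self.current_min: Optional[List[float]] = None
        self.future_min: Optional[List[float]] = None

    def update(self, mags: Sequence[float]) -> List[float]:
        if self.current_min is None or self.future_min is None:
            # First frame (assumption): both minima start at the first magnitudes.
            self.current_min = list(mags)
            self.future_min = list(mags)
        elif self.frame_count % self.frames_per_period == 0:
            # Period boundary: the current minimum restarts from the future minimum
            # found over the previous period; the future minimum restarts from Y(n).
            self.current_min = list(self.future_min)
            self.future_min = list(mags)
        # Both minima follow any smaller magnitude seen during the period.
        self.future_min = [min(f, y) for f, y in zip(self.future_min, mags)]
        self.current_min = [min(c, y) for c, y in zip(self.current_min, mags)]
        self.frame_count += 1
        return list(self.current_min)
```

With a period of 3 frames and the single-bin sequence 5, 2, 4, 6, 7, 8, 9, 9, 9, the current minimum reads 5, 2, 2, 2, 2, 2, 6, 6, 6: the low value 2 is retained through the following period and then released, which is the limited memory the text describes.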
`
`The noise estimation mechanism of the present invention ensures a
`
tight and quick estimation of the noise value, with limited memory of the process (5 seconds), while preventing too high an estimate of the noise.
`
`Each bin's magnitude (Y(n)) is compared with four times the current
`
minimum value of that bin by comparator 308, which serves as the adaptive
`
`threshold for that bin. If the magnitude is within the range (hence below the
`
`threshold), it is allowed as noise and used by an exponential averaging unit 310 that
`
`determines the level of the noise 312 of that frequency. If the magnitude is above the
`
`threshold it is rejected for the noise estimation. The time constant for the exponential
`
`
`averaging is typically 0.95 which may be interpreted as taking the average of the last
`
`20 frames. The threshold of 4*minimum value may be changed for some
`
`
`applications.
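The threshold test and the exponential averaging for one frame can be sketched as below, assuming per-bin lists of magnitudes, current minima and the running noise estimate; `update_noise_estimate` is a hypothetical name, and treating a magnitude exactly equal to the threshold as noise is an assumption. The defaults follow the text: a threshold of 4 times the current minimum and a weight of 0.95, that is, an effective memory of about 1 / (1 - 0.95) = 20 frames.

```python
from typing import List, Sequence


def update_noise_estimate(noise: Sequence[float],
                          mags: Sequence[float],
                          current_min: Sequence[float],
                          threshold_factor: float = 4.0,
                          alpha: float = 0.95) -> List[float]:
    """One frame of the per-bin adaptive-threshold noise estimator."""
    updated: List[float] = []
    for n, y, m in zip(noise, mags, current_min):
        if y <= threshold_factor * m:
            # Within the adaptive threshold: accept Y(n) as noise and average it in.
            updated.append(alpha * n + (1.0 - alpha) * y)
        else:
            # Above the threshold: likely speech, so leave the estimate unchanged.
            updated.append(n)
    return updated
```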
`
Figure 4 is a detailed description of the subtraction processor 400(210). In a straightforward approach, the value of the estimated bin noise magnitude is subtracted from the current bin magnitude. The phase of the current bin is calculated and used in conjunction with the result of the subtraction to obtain the Real and Imaginary parts of the result. This approach is very expensive in terms of processing and memory because it requires the calculation of the sine and cosine arguments of the complex vector, with consideration of the four quadrants in which the complex vector may be positioned. An alternative approach used in the present invention is a filter approach. The subtraction is interpreted as a filter multiplication performed by filter 402, where H (the filter coefficient) is:
`
$$H(n) = \frac{|Y(n) - N(n)|}{|Y(n)|}$$
`
`Where Y(n) is the magnitude of the current bin and N(n) is the noise
`
`estimation of that bin. The value H of the filter coefficient (of each bin separately) is
`
`multiplied by the Real and Imaginary parts of the current bin at 404:
`
$$E(\mathrm{Real}) = Y(\mathrm{Real}) \cdot H, \qquad E(\mathrm{Imag}) = Y(\mathrm{Imag}) \cdot H$$
`
Where E is the noise-free complex value. In the straightforward approach, the subtraction may result in a negative value of magnitude. This value can be either replaced with zero (half-wave rectification) or replaced with a positive value equal in size to the negative one (full-wave rectification). The filter approach, as expressed here, results in full-wave rectification directly. Full-wave rectification provides slightly less noise reduction but introduces far fewer artifacts into the signal. It will be appreciated that this filter can be modified to effect a half-wave rectification by taking the non-absolute value of the numerator and replacing negative values with zeros.
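The filter form of the subtraction can be sketched per bin as follows. The names `subtraction_gain` and `apply_gain` are hypothetical, and returning a gain of zero for a bin whose magnitude is zero is an assumption made to avoid division by zero. Here `y_mag` stands for Y(n), which in the text is the approximated (and possibly smoothed) magnitude rather than the exact one.

```python
def subtraction_gain(y_mag: float, noise: float, full_wave: bool = True) -> float:
    """H(n) = |Y(n) - N(n)| / |Y(n)| (full-wave) or max(Y - N, 0) / Y (half-wave)."""
    if y_mag <= 0.0:
        return 0.0  # assumption: an empty bin passes nothing
    diff = y_mag - noise
    if full_wave:
        return abs(diff) / y_mag
    return max(diff, 0.0) / y_mag


def apply_gain(bin_value: complex, gain: float) -> complex:
    """E(Real) = Y(Real) * H and E(Imag) = Y(Imag) * H; the phase is unchanged."""
    return complex(bin_value.real * gain, bin_value.imag * gain)
```

For a bin 3 + 4j with Y(n) = 5 and N(n) = 2, the gain is 0.6 and the output bin is 1.8 + 2.4j, with the input phase preserved and no trigonometric calls. With N(n) = 7 the full-wave gain is |5 - 7| / 5 = 0.4, while the half-wave variant returns 0.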
`
`
`Note also that the values of Y in the figures are the smoothed values of
`
`Y after averaging over neighboring spectral bins and over time frames (2D
`
`smoothing). Another approach is to use the smoothed Y only for the noise estimation
`
`(N), and to use the unsmoothed Y for the calculation of H.
`
`Figure 5 illustrates the residual noise reduction processor 500(216).
`
The residual noise is defined as the remaining noise during non-speech intervals. The noise in these intervals is first reduced by the subtraction process, which does not differentiate between speech and non-speech time intervals. The remaining residual
`
`noise can be reduced further by using a voice switch 502 and either multiplying the
`
`residual noise by a decaying factor or replacing it with zeros. Another alternative to
`
`the zeroing is replacing the residual noise with a minimum value of noise at 504.
`
`Yet another approach, which avoids the voice switch, is illustrated in
`
`Figure 5A. The residual noise reduction processor 506 applies a similar threshold
`
`used by the noise estimator at 508 on the noise free output bin and replaces or decays
`
`the result when it is lower than the threshold at 510.
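The threshold-based variant of Figure 5A can be sketched as follows. It assumes that the "similar threshold" is 4 times the current minimum of each bin, as in the noise estimator, and reuses the max/min magnitude approximation; `reduce_residual` is a hypothetical name. A `decay` of 0 zeroes a bin below the threshold, while a value between 0 and 1 attenuates it instead.

```python
from typing import List, Sequence


def reduce_residual(clean_bins: Sequence[complex],
                    current_min: Sequence[float],
                    threshold_factor: float = 4.0,
                    decay: float = 0.0) -> List[complex]:
    """Zero or decay noise-free bins whose magnitude falls below the threshold."""
    out: List[complex] = []
    for e, m in zip(clean_bins, current_min):
        a, b = abs(e.real), abs(e.imag)
        mag = max(a, b) + 0.4 * min(a, b)  # cheap magnitude estimate from the text
        if mag < threshold_factor * m:
            out.append(e * decay)  # below threshold: treated as residual noise
        else:
            out.append(e)          # above threshold: kept as speech
    return out
```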
`
`The result of the residual noise processing of the present invention is a
`
quieter sound in the non-speech intervals. However, artifacts such as a pumping noise, when the noise level is switched between the speech interval and the non-speech interval, may appear in some applications.
`
`The spectral subtraction technique o