`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
`International Bureau
`
`(51) International Patent Classification 7: G10L 21/02
`(11) International Publication Number: WO 00/49602 A1
`(43) International Publication Date: 24 August 2000 (24.08.00)
`(21) International Application Number: PCT/US00/03538
`(22) International Filing Date: 11 February 2000 (11.02.00)
`
`(81) Designated States: CA, CN, IL, JP, US, European patent (AT,
`BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
`MC, NL, PT, SE).
`
`(30) Priority Data:
`09/252,874    18 February 1999 (18.02.99)    US
`09/385,996    30 August 1999 (30.08.99)    US
`
`Published
`With international search report.
`
`(71) Applicant (for all designated States except US): ANDREA ELECTRONICS CORPORATION [US/US]; 45 Melville Park Road, Melville, NY 11747 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): MARASH, Joseph [IL/IL]; Shimkin Street 1A, 34750 Haifa (IL). BERDUGO, Baruch [IL/IL]; Hanarkisim Street 6, 28000 Kiriat-Ata (IL).
`
`(74) Agents: KOWALSKI, Thomas, J. et al.; Frommer Lawrence &
`Haug LLP, 745 Fifth Avenue, New York, NY 10151 (US).
`
`(54) Title: SYSTEM, METHOD AND APPARATUS FOR CANCELLING NOISE
`
`[Figure: block diagram of the Spectral Subtraction System (100). Input Samples (102), Collect Input Data (104), Combine 256 New Points With 256 History (106), Multiply By Hanning Window shading coefficients (108), FFT (110), noise processing (112 (200)), IFFT (114), Overlap and Sum (116), Output Samples (118).]
`
`(57) Abstract
`
`A system for cancelling and reducing audio signal noise arising from electrical or electromagnetic noise sources, such as an AC to DC power converter used by computers. The system receives a digital audio signal (102) sampled at a frequency which is at least twice the bandwidth of the audio signal. The input samples are stored in a temporary buffer of 256 points (104). When the buffer is full, the new 256 points are combined in a combiner (106) with the previous 256 points to provide 512 input points, which are multiplied by a multiplier (108) with a shading window with a length of 512 points. The shaded results are converted to the frequency domain through an FFT processor (110). The FFT output is processed in a noise processor (112), which includes the noise magnitude estimation for each frequency bin, the subtraction that estimates the noise-free complex value for each frequency bin, and the residual noise reduction process.
`
`RTL345-1_1021-0001
`
`

`
`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
`AL  Albania
`AM  Armenia
`AT  Austria
`AU  Australia
`AZ  Azerbaijan
`BA  Bosnia and Herzegovina
`BB  Barbados
`BE  Belgium
`BF  Burkina Faso
`BG  Bulgaria
`BJ  Benin
`BR  Brazil
`BY  Belarus
`CA  Canada
`CF  Central African Republic
`CG  Congo
`CH  Switzerland
`CI  Cote d'Ivoire
`CM  Cameroon
`CN  China
`CU  Cuba
`CZ  Czech Republic
`DE  Germany
`DK  Denmark
`EE  Estonia
`ES  Spain
`FI  Finland
`FR  France
`GA  Gabon
`GB  United Kingdom
`GE  Georgia
`GH  Ghana
`GN  Guinea
`GR  Greece
`HU  Hungary
`IE  Ireland
`IL  Israel
`IS  Iceland
`IT  Italy
`JP  Japan
`KE  Kenya
`KG  Kyrgyzstan
`KP  Democratic People's Republic of Korea
`KR  Republic of Korea
`KZ  Kazakstan
`LC  Saint Lucia
`LI  Liechtenstein
`LK  Sri Lanka
`LR  Liberia
`LS  Lesotho
`LT  Lithuania
`LU  Luxembourg
`LV  Latvia
`MC  Monaco
`MD  Republic of Moldova
`MG  Madagascar
`MK  The former Yugoslav Republic of Macedonia
`ML  Mali
`MN  Mongolia
`MR  Mauritania
`MW  Malawi
`MX  Mexico
`NE  Niger
`NL  Netherlands
`NO  Norway
`NZ  New Zealand
`PL  Poland
`PT  Portugal
`RO  Romania
`RU  Russian Federation
`SD  Sudan
`SE  Sweden
`SG  Singapore
`SI  Slovenia
`SK  Slovakia
`SN  Senegal
`SZ  Swaziland
`TD  Chad
`TG  Togo
`TJ  Tajikistan
`TM  Turkmenistan
`TR  Turkey
`TT  Trinidad and Tobago
`UA  Ukraine
`UG  Uganda
`US  United States of America
`UZ  Uzbekistan
`VN  Viet Nam
`YU  Yugoslavia
`ZW  Zimbabwe
`
`
`SYSTEM, METHOD AND APPARATUS FOR CANCELING NOISE
`
`RELATED APPLICATIONS INCORPORATED BY REFERENCE.
`
`This application claims priority from U.S. Patent Application Serial Nos. 09/252,874 filed February 18, 1999 and 09/385,996 filed August 30, 1999, and reference is made to U.S. Patent Application Serial No. 60/126,567 filed March 26, 1999, all of which are herein incorporated by reference.
`
`The following applications and patent(s) are cited and hereby incorporated herein by reference: U.S. Patent Application Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Application Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 09/055,709 filed April 7, 1998, U.S. Patent Application Serial No. 09/059,503 filed April 13, 1998, U.S. Patent Application Serial No. 08/840,159 filed April 14, 1997, U.S. Patent Application Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Application Serial No. 08/672,899 now issued U.S. Patent No. 5,825,898 issued October 20, 1998; and U.S. Patent Application Serial No. 09/089,710 filed June 3, 1998, U.S. Patent No. 5,825,897 issued October 20, 1998, U.S. Patent No. 5,732,143 issued March 24, 1998, U.S. Patent No. 5,673,325 issued September 30, 1997, U.S. Patent No. 5,381,473 issued January 10, 1995, U.S. Patent Application Serial No. 08/833,384 filed April 4, 1991. And, all documents cited herein are incorporated herein by reference, as are documents cited or referenced in documents cited herein.
`
`FIELD OF THE INVENTION.
`
`The present invention relates to noise cancellation and reduction and, more specifically, to noise cancellation and reduction using spectral subtraction.
`
`BACKGROUND OF THE INVENTION.
`
`Ambient noise added to speech degrades the performance of speech processing algorithms. Such processing algorithms may include dictation, voice activation, voice compression and other systems. In such systems, it is desired to reduce the noise and improve the signal to noise ratio (S/N ratio) without affecting the speech and its characteristics.
`
`Near field noise canceling microphones provide a satisfactory solution but require that the microphone be in the proximity of the voice source (e.g., mouth). In many cases, this is achieved by mounting the microphone on a boom of a headset which situates the microphone at the end of a boom proximate the mouth of the wearer. However, the headset has proven to be either uncomfortable to wear or too restricting for operation in, for example, an automobile.
`
`Microphone array technology in general, and adaptive beamforming arrays in particular, handle severe directional noises in the most efficient way. These systems map the noise field and create nulls towards the noise sources. The number of nulls is limited by the number of microphone elements and processing power. Such arrays have the benefit of hands-free operation without the necessity of a headset.
`
`However, when the noise sources are diffused, the performance of the adaptive system will be reduced to the performance of a regular delay and sum microphone array, which is not always satisfactory. This is the case where the environment is quite reverberant, such as when the noises are strongly reflected from the walls of a room and reach the array from an infinite number of directions. Such is also the case in a car environment for some of the noises radiated from the car chassis.
`
`OBJECTS AND SUMMARY OF THE INVENTION
`
`The spectral subtraction technique provides a solution to further reduce the noise by estimating the noise magnitude spectrum of the polluted signal. The technique estimates the magnitude spectral level of the noise by measuring it during non-speech time intervals detected by a voice switch, and then subtracting the noise magnitude spectrum from the signal. This method, described in detail in Suppression of Acoustic Noise in Speech Using Spectral Subtraction (Steven F. Boll, IEEE ASSP-27, No. 2, April 1979), achieves good results for stationary diffused noises that are not correlated with the speech signal. The spectral subtraction method, however, creates artifacts, sometimes described as musical noise, that may reduce the performance of the speech algorithm (such as vocoders or voice activation) if the spectral subtraction is uncontrolled. In addition, the spectral subtraction method assumes erroneously that the voice switch accurately detects the presence of speech and locates the non-speech time intervals. This assumption is reasonable for off-line systems but difficult to achieve or obtain in real time systems.
`
`More particularly, the noise magnitude spectrum is estimated by performing an FFT of 256 points of the non-speech time intervals and computing the energy of each frequency bin. The FFT is performed after the time domain signal is multiplied by a shading window (Hanning or other) with an overlap of 50%. The energy of each frequency bin is averaged with neighboring FFT time frames. The number of frames is not determined but depends on the stability of the noise. For a stationary noise, it is preferred that many frames are averaged to obtain better noise estimation. For a non-stationary noise, a long averaging may be harmful. Problematically, there is no means to know a priori whether the noise is stationary or non-stationary.
`
`Assuming the noise magnitude spectrum estimation is calculated, the input signal is multiplied by a shading window (Hanning or other), an FFT is performed (256 points or other) with an overlap of 50% and the magnitude of each bin is averaged over 2-3 FFT frames. The noise magnitude spectrum is then subtracted from the signal magnitude. If the result is negative, the value is replaced by a zero (Half Wave Rectification). It is recommended, however, to further reduce the residual noise present during non-speech intervals by replacing low values with a minimum value (or zero) or by attenuating the residual noise by 30 dB. The resulting output is the noise free magnitude spectrum.
`
`The spectral complex data is reconstructed by applying the phase information of the relevant bin of the signal's FFT with the noise free magnitude. An IFFT process is then performed on the complex data to obtain the noise free time domain data. The time domain results are overlapped and summed with the previous frame's results to compensate for the overlap process of the FFT.
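The per-bin step of this conventional approach (subtract the noise magnitude, half-wave rectify, then reattach the original phase) can be sketched as follows. This is only an illustration under stated assumptions: the function name `subtract_magnitude`, the `floor` parameter and the sample values are hypothetical and do not come from the application.

```python
import cmath

def subtract_magnitude(signal_bins, noise_mag, floor=0.0):
    """Subtract a noise magnitude estimate from each complex bin, keeping its phase."""
    cleaned = []
    for value, noise in zip(signal_bins, noise_mag):
        magnitude, phase = cmath.polar(value)
        # Half-wave rectification: a negative difference is replaced by the floor.
        reduced = max(magnitude - noise, floor)
        # Rebuild the complex value from the reduced magnitude and the original phase.
        cleaned.append(cmath.rect(reduced, phase))
    return cleaned

# Hypothetical bins and a flat noise estimate of 1.0 per bin.
bins = [3 + 4j, 0.5 + 0j, -2j]
noise = [1.0, 1.0, 1.0]
cleaned_bins = subtract_magnitude(bins, noise)
```

Note that each bin requires a magnitude (square root) and a phase computation, which is the cost the later sections of the text set out to avoid.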
`
`There are several problems associated with the system described. First, the system assumes that there is a prior knowledge of the speech and non-speech time intervals. A voice switch is not practical to detect those periods. Theoretically, a voice switch detects the presence of the speech by measuring the energy level and comparing it to a threshold. If the threshold is too high, there is a risk that some voice time intervals might be regarded as a non-speech time interval and the system will regard voice information as noise. The result is voice distortion, especially in poor signal to noise ratio cases. If, on the other hand, the threshold is too low, there is a risk that the non-speech intervals will be too short, especially in poor signal to noise ratio cases and in cases where the voice is continuous with little intermission.
`
`Another problem is that the magnitude calculation of the FFT result is quite complex. This involves square and square root calculations which are very expensive in terms of computation load. Yet another problem is the association of the phase information to the noise free magnitude spectrum in order to obtain the information for the IFFT. This process requires the calculation of the phase, the storage of the information, and applying the information to the magnitude data, all of which are expensive in terms of computation and memory requirements. Another problem is the estimation of the noise spectral magnitude. The FFT process is a poor and unstable estimator of energy. The averaging-over-time of frames contributes insufficiently to the stability. Shortening the length of the FFT results in a wider bandwidth of each bin and better stability but reduces the performance of the system. Averaging-over-time, moreover, smears the data and, for this reason, cannot be extended to more than a few frames. This means that the noise estimation process proposed is not sufficiently stable.
`
`It is therefore an object of this invention to provide a spectral subtraction system that has a simple, yet efficient mechanism, to estimate the noise magnitude spectrum even in poor signal-to-noise ratio situations and in continuous fast speech cases.
`
`It is another object of this invention to provide an efficient mechanism that can perform the magnitude estimation with little cost, and will overcome the problem of phase association.
`
`It is yet another object of this invention to provide a stable mechanism to estimate the noise spectral magnitude without the smearing of the data.
`
`In accordance with the foregoing objectives, the present invention provides a system that correctly determines the non-speech segments of the audio signal, thereby preventing erroneous processing of the noise canceling signal during the speech segments. In the preferred embodiment, the present invention obviates the need for a voice switch by precisely determining the non-speech segments using a separate threshold detector for each frequency bin. The threshold detector precisely detects the positions of the noise elements, even within continuous speech segments, by determining whether frequency spectrum elements, or bins, of the input signal are within a threshold set according to a minimum value of the frequency spectrum elements over a preset period of time; more precisely, according to current and future minimum values of the frequency spectrum elements. Thus, for each syllable, the energy of the noise elements is determined by a separate threshold determination without examination of the overall signal energy, thereby providing good and stable estimation of the noise. In addition, the system preferably sets the threshold continuously and resets the threshold within a predetermined period of time of, for example, five seconds.
`
`In order to reduce complex calculations, it is preferred in the present invention to obtain an estimate of the magnitude of the input audio signal using a multiplying combination of the real and imaginary parts of the input in accordance with, for example, the higher and the lower values of the real and imaginary parts of the signal. In order to further reduce instability of the spectral estimation, a two-dimensional (2D) smoothing process is applied to the signal estimation. A two-step smoothing function using first neighboring frequency bins in each time frame then applying an exponential time average effecting an average over time for each frequency bin produces excellent results.
`
`In order to reduce the complexity of determining the phase of the frequency bins during subtraction to thereby align the phases of the subtracting elements, the present invention applies a filter multiplication to effect the subtraction. The filter function, a Wiener filter function for example, or an approximation of the Wiener filter, is multiplied by the complex data of the frequency domain audio signal. The filter function may effect a full-wave rectification, or a half-wave rectification for otherwise negative results of the subtraction process or simple subtraction. It will be appreciated that, since the noise elements are determined within continuous speech segments, the noise estimation is accurate and it may be canceled from the audio signal continuously, providing excellent noise cancellation characteristics.
`
`The present invention also provides a residual noise reduction process for reducing the residual noise remaining after noise cancellation. The residual noise is reduced by zeroing the non-speech segments, e.g., within the continuous speech, or decaying the non-speech segments. A voice switch may be used or another threshold detector which detects the non-speech segments in the time-domain.
`
`The present invention is applicable with various noise canceling systems including, but not limited to, those systems described in the U.S. patent applications incorporated herein by reference. The present invention, for example, is applicable with the adaptive beamforming array. In addition, the present invention may be embodied as a computer program for driving a computer processor either installed as application software or as hardware.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Other objects, features and advantages according to the present invention will become apparent from the following detailed description of the illustrated embodiments when read in conjunction with the accompanying drawings in which corresponding components are identified by the same reference numerals.
`
`Fig. 1 illustrates the present invention;
`
`Fig. 2 illustrates the noise processing of the present invention;
`
`Fig. 3 illustrates the noise estimation processing of the present invention;
`
`Fig. 4 illustrates the subtraction processing of the present invention;
`
`Fig. 5 illustrates the residual noise processing of the present invention;
`
`Fig. 5A illustrates a variant of the residual noise processing of the present invention;
`
`Fig. 6 illustrates a flow diagram of the present invention;
`
`Fig. 7 illustrates a flow diagram of the present invention;
`
`Fig. 8 illustrates a flow diagram of the present invention; and
`
`Fig. 9 illustrates a flow diagram of the present invention.
`
`DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
`
`The present invention, in one embodiment, is practicable as a spectral subtraction system, method and apparatus for canceling and/or reducing noise arising from electrical or electromagnetic noise sources, such as external electromagnetic noise sources including power supplies such as an AC source or an AC to DC power converter such as used by a computer, particularly a lap-top computer. In particular, it was discovered that the power supply of a computer such as a lap-top device creates an interference noise on or in relation to the Universal Serial Bus (USB) line, port or signal thereon. Evidently, the power source or the power conversion creates an interference signal (herein referred to as "isotropic diffused stationary noise" or "isotropic noise") which is transposed through, for example, electromagnetic coupling to the USB signal line, where it interferes with the signals thereon. This noise is audible when reproduced by a transducer, for example, as a buzzing sound. Ironically, USB was previously thought to avoid the audio noise present from such sources which manifests on such devices as sound cards. Since USB is rapidly becoming the standard for speech and voice communications applications, for example, received from audio signal peripherals including signals received over the internet or other remote-transmission medium, it is a significant feat to eliminate this isotropic noise and, indeed, would have the same impact on the market as when Dolby(TM) was invented.
`
`The present invention as described herein was discovered to eliminate the "dirty noise" arising from the power source or power converter and manifesting on the USB signal line. One skilled in the art will appreciate how the spectral subtraction system, method and apparatus described herein is embodied as any of the well-known computer software and/or hardware applications on a computer including, for example, a device driver or dynamic link library as particularly set forth in related application U.S. Patent Application Serial No. 60/126,567. In cooperation with the invention of U.S. Patent Application Serial No. 60/126,567, the present invention includes filters selectable by pull down menus for filtering out the isotropic noise. In one embodiment, it was discovered that the preferred operating range of the present invention is scalable, either automatically at the control of a computer processor or manually by the user by way of, for example, a potentiometer or clickable object presented by the tabbed pull-down menu, between 8 dB and 14 dB, since this appeared to provide optimal performance, although it is within the present invention to provide other dB ranges. However, when noise reduction is above 14 dB, speech is attacked and there can be degradation of speech recognition.
`
`Thus, the present invention is applicable to inherent system noise or noise induced by a system, as well as to acoustical noise; and the invention reduces or eliminates inherent system noise induced by a system (e.g., noise from a power source or power converter), as well as acoustical noise.
`
`Figure 1 illustrates an embodiment of the present invention 100. The system receives a digital audio signal at input 102 sampled at a frequency which is at least twice the bandwidth of the audio signal. In one embodiment, the signal is derived from a microphone signal that has been processed through an analog front end, A/D converter and a decimation filter to obtain the required sampling frequency. In another embodiment, the input is taken from the output of a beamformer or even an adaptive beamformer. In that case the signal has been processed to eliminate noises arriving from directions other than the desired one, leaving mainly noises originating from the same direction as the desired one. In yet another embodiment, the input signal can be obtained from a sound board when the processing is implemented on a PC processor or similar computer processor.
`
`The input samples are stored in a temporary buffer 104 of 256 points. When the buffer is full, the new 256 points are combined in a combiner 106 with the previous 256 points to provide 512 input points. The 512 input points are multiplied by multiplier 108 with a shading window with the length of 512 points. The shading window contains coefficients that are multiplied with the input data accordingly. The shading window can be Hanning or other and it serves two goals: the first is to smooth the transients between two processed blocks (together with the overlap process); the second is to reduce the side lobes in the frequency domain and hence prevent the masking of low energy tonals by high energy side lobes. The shaded results are converted to the frequency domain through an FFT (Fast Fourier Transform) processor 110. Other lengths of the FFT samples (and accordingly input buffers) are possible including 256 points or 1024 points.
`
`The FFT output is a complex vector of 256 significant points (the other 256 points are an anti-symmetric replica of the first 256 points). The points are processed in the noise processing block 112(200), which includes the noise magnitude estimation for each frequency bin, the subtraction process that estimates the noise-free complex value for each frequency bin, and the residual noise reduction process. An IFFT (Inverse Fast Fourier Transform) processor 114 performs the Inverse Fourier Transform on the complex noise free data to provide 512 time domain points. The first 256 time domain points are summed by the summer 116 with the previous last 256 data points to compensate for the input overlap and shading process and output at output terminal 118. The remaining 256 points are saved for the next iteration.
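The buffering, shading and overlap-and-sum loop of Figure 1 can be sketched as below. This is a minimal illustration, not the application's implementation: the names `periodic_hanning`, `overlap_add` and `process_frame` are hypothetical, the FFT, noise processing and IFFT stages are stood in for by an optional callback, and the window is assumed to be a Hanning window in its periodic form so that two half-overlapped windows sum to one.

```python
import math

def periodic_hanning(length):
    """Hanning window in periodic form: w[n] + w[n + length/2] == 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def overlap_add(samples, frame_len=512, process_frame=None):
    """Run the collect / combine / shade / process / overlap-and-sum loop."""
    hop = frame_len // 2
    window = periodic_hanning(frame_len)
    history = [0.0] * hop   # previous block of input points
    tail = [0.0] * hop      # second half of the previous processed frame
    output = []
    for start in range(0, len(samples) - hop + 1, hop):
        new_points = samples[start:start + hop]
        # Combine history with the new points and apply the shading window.
        shaded = [w * x for w, x in zip(window, history + new_points)]
        # Stand-in for FFT, noise processing and IFFT; identity when not given.
        result = process_frame(shaded) if process_frame else shaded
        # Sum the first half with the saved second half of the previous frame.
        output.extend(a + b for a, b in zip(result[:hop], tail))
        tail = result[hop:]
        history = new_points
    return output

# Small demonstration with an 8-point frame instead of 512 points.
signal = [float(i) for i in range(1, 33)]
restored = overlap_add(signal, frame_len=8)
```

With no processing in between, the output equals the input delayed by one half-frame, which shows how the overlap-and-sum step compensates for the shading window.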
`
`It will be appreciated that, while specific transforms are utilized in the
`
`preferred embodiments, it is of course understood that other transforms may be
`
`applied to the present invention to obtain the spectral noise signal.
`
`Figure 2 is a detailed description of the noise processing block 200(112). First, the magnitude of each frequency bin (n) 202 is estimated. The straightforward approach is to estimate the magnitude by calculating:
`
`$$Y(n) = \left( \mathrm{Real}(n)^2 + \mathrm{Imag}(n)^2 \right)^{1/2}$$
`
`In order to save processing time and complexity, the signal magnitude (Y) is estimated by an estimator 204 using an approximation formula instead:
`
`$$Y(n) \approx \max\left[\,|\mathrm{Real}(n)|, |\mathrm{Imag}(n)|\,\right] + 0.4 \cdot \min\left[\,|\mathrm{Real}(n)|, |\mathrm{Imag}(n)|\,\right]$$
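To get a feel for the approximation, the sketch below compares it with the exact magnitude on points of the unit circle. The helper names (`approx_magnitude`, `exact_magnitude`, `worst_absolute_error`) and the sweep resolution are assumptions made for illustration only.

```python
import math

def approx_magnitude(real, imag, weight=0.4):
    """Max-plus-weighted-min estimate of the magnitude, with no square root."""
    larger = max(abs(real), abs(imag))
    smaller = min(abs(real), abs(imag))
    return larger + weight * smaller

def exact_magnitude(real, imag):
    """Reference value: square root of the sum of squares."""
    return math.hypot(real, imag)

def worst_absolute_error(steps=3600):
    """Largest deviation from the exact value over unit-magnitude inputs."""
    worst = 0.0
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        re, im = math.cos(angle), math.sin(angle)
        worst = max(worst, abs(approx_magnitude(re, im) - exact_magnitude(re, im)))
    return worst

example = approx_magnitude(3.0, 4.0)   # exact magnitude would be 5.0
worst = worst_absolute_error()
```

In this sweep the estimate stays within roughly 8 percent of the true magnitude, which appears to be the trade accepted for avoiding the square and square root operations.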
`
`In order to reduce the instability of the spectral estimation, which typically plagues the FFT process (ref [2]: Digital Signal Processing, Oppenheim and Schafer, Prentice Hall, pp. 542-545), the present invention implements a 2D smoothing process. Each bin is replaced with the average of its value and the two neighboring bins' values (of the same time frame) by a first averager 206. In addition, the smoothed value of each smoothed bin is further smoothed by a second averager 208 using a time exponential average with a time constant of 0.7 (which is the equivalent of averaging over 3 time frames). The 2D-smoothed value is then used by two processes: the noise estimation process by noise estimation processor 212(300) and the subtraction process by subtractor 210. The noise estimation process estimates the noise at each frequency bin and the result is used by the noise subtraction process. The output of the noise subtraction is fed into a residual noise reduction processor 216 to further reduce the noise. In one embodiment, the time domain signal is also used by the residual noise process 216 to determine the speech free segments. The noise free signal is moved to the IFFT process to obtain the time domain output 218.
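The two smoothing steps can be sketched as follows. Two points are assumptions rather than statements from the text: the "time constant of 0.7" is read as the weight given to the previous smoothed value (with 0.3 on the new frame), and edge bins are averaged only over the neighbors that exist. The function names are hypothetical.

```python
def smooth_over_frequency(magnitudes):
    """Replace each bin with the average of itself and its two neighbors."""
    last = len(magnitudes) - 1
    smoothed = []
    for n in range(len(magnitudes)):
        # Assumption: edge bins use only the neighbors that exist.
        lo, hi = max(n - 1, 0), min(n + 1, last)
        window = magnitudes[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def smooth_over_time(previous, current, alpha=0.7):
    """Exponential average per bin; alpha weights the history (assumed reading)."""
    if previous is None:
        return list(current)
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]

def smooth_2d(frames, alpha=0.7):
    """Apply frequency smoothing, then time smoothing, frame by frame."""
    state = None
    out = []
    for frame in frames:
        state = smooth_over_time(state, smooth_over_frequency(frame), alpha)
        out.append(state)
    return out

# A single spike in bin 1 followed by a silent frame.
smoothed_frames = smooth_2d([[0.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
```

Under this reading, a weight of 0.7 on the history gives an effective memory of about 1 / (1 - 0.7), or a little over three frames, which matches the "3 time frames" mentioned above.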
`
`Figure 3 is a detailed description of the noise estimation processor 300(212). Theoretically, the noise should be estimated by taking a long time average of the signal magnitude (Y) of non-speech time intervals. This requires that a voice switch be used to detect the speech/non-speech intervals. However, a too-sensitive switch may result in the use of a speech signal for the noise estimation, which will degrade the voice signal. A less sensitive switch, on the other hand, may dramatically reduce the length of the noise time intervals (especially in continuous speech cases) and impair the validity of the noise estimation.
`
`In the present invention, a separate adaptive threshold is implemented for each frequency bin 302. This allows the location of noise elements for each bin separately without the examination of the overall signal energy. The logic behind this method is that, for each syllable, the energy may appear at different frequency bands. At the same time, other frequency bands may contain noise elements. It is therefore possible to apply a non-sensitive threshold for the noise and yet locate many non-speech data points for each bin, even within a continuous speech case. The advantage of this method is that it allows the collection of many noise segments for a good and stable estimation of the noise, even within continuous speech segments.
`
`In the threshold determination process, for each frequency bin, two minimum values are calculated. A future minimum value is initiated every 5 seconds at 304 with the value of the current magnitude (Y(n)) and replaced with a smaller minimal value over the next 5 seconds through the following process. The future minimum value of each bin is compared with the current magnitude value of the signal. If the current magnitude is smaller than the future minimum, the future minimum is replaced with the magnitude which becomes the new future minimum.
`
`At the same time, a current minimum value is calculated at 306. The current minimum is initiated every 5 seconds with the value of the future minimum that was determined over the previous 5 seconds and follows the minimum value of the signal for the next 5 seconds by comparing its value with the current magnitude value. The current minimum value is used by the subtraction process, while the future minimum is used for the initiation and refreshing of the current minimum.
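The interplay of the two minima for a single frequency bin can be sketched as below. The class name `MinimumTracker` and the `frames_per_period` parameter are hypothetical; how many frames make up 5 seconds depends on the sampling rate, which the text does not fix, so the demonstration uses an arbitrary period of three frames.

```python
class MinimumTracker:
    """Tracks the current and future minimum magnitude of one frequency bin."""

    def __init__(self, frames_per_period):
        self.frames_per_period = frames_per_period  # frames in one 5-second period (assumed)
        self.future_min = float("inf")
        self.current_min = float("inf")
        self.frames_seen = 0

    def update(self, magnitude):
        if self.frames_seen == self.frames_per_period:
            # Period boundary: the current minimum restarts from the future
            # minimum gathered over the previous period, and the future
            # minimum restarts from the present magnitude.
            self.current_min = self.future_min
            self.future_min = magnitude
            self.frames_seen = 0
        # Both minima follow the signal downward within the period.
        self.future_min = min(self.future_min, magnitude)
        self.current_min = min(self.current_min, magnitude)
        self.frames_seen += 1
        return self.current_min

# The noise floor rises from about 2 to about 6 part-way through the sequence.
tracker = MinimumTracker(frames_per_period=3)
magnitudes = [5.0, 2.0, 4.0, 6.0, 7.0, 8.0, 9.0, 9.0, 9.0]
tracked = [tracker.update(m) for m in magnitudes]
```

In this example the current minimum holds the old floor for one more period and then steps up to the new one, which illustrates the limited memory of the process.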
`
`The noise estimation mechanism of the present invention ensures a tight and quick estimation of the noise value, with limited memory of the process (5 seconds), while preventing too high an estimation of the noise.
`
`Each bin's magnitude (Y(n)) is compared with four times the current minimum value of that bin by comparator 308, which serves as the adaptive threshold for that bin. If the magnitude is within the range (hence below the threshold), it is allowed as noise and used by an exponential averaging unit 310 that determines the level of the noise 312 of that frequency. If the magnitude is above the threshold it is rejected for the noise estimation. The time constant for the exponential averaging is typically 0.95, which may be interpreted as taking the average of the last 20 frames. The threshold of 4*minimum value may be changed for some applications.
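A per-bin sketch of the comparator and the exponential averaging unit follows. The function name and parameter names are hypothetical, the value 0.95 is assumed to weight the previous noise estimate, and a magnitude exactly equal to the threshold is treated here as within range.

```python
def update_noise_estimate(noise, magnitude, current_min, factor=4.0, alpha=0.95):
    """Return the updated noise level of one bin for one frame."""
    threshold = factor * current_min
    if magnitude > threshold:
        # Above the adaptive threshold: rejected for noise estimation.
        return noise
    # Within range: accepted as noise and folded into the exponential average.
    return alpha * noise + (1.0 - alpha) * magnitude

accepted = update_noise_estimate(noise=1.0, magnitude=2.0, current_min=1.0)
rejected = update_noise_estimate(noise=1.0, magnitude=10.0, current_min=1.0)
```

With a history weight of 0.95 the effective memory is about 1 / (1 - 0.95) = 20 frames, consistent with the interpretation given in the text.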
`
`Figure 4 is a detailed description of the subtraction processor 400(210). In a straightforward approach, the value of the estimated bin noise magnitude is subtracted from the current bin magnitude. The phase of the current bin is calculated and used in conjunction with the result of the subtraction to obtain the Real and Imaginary parts of the result. This approach is very expensive in terms of processing and memory because it requires the calculation of the Sine and Cosine arguments of the complex vector with consideration of the 4 quarters where the complex vector may be positioned. An alternative approach used in the present invention is to use a filter approach. The subtraction is interpreted as a filter multiplication performed by filter 402 where H (the filter coefficient) is:
`
`$$H(n) = \frac{\bigl|\,|Y(n)| - |N(n)|\,\bigr|}{|Y(n)|}$$
`
`Where Y(n) is the magnitude of the current bin and N(n) is the noise estimation of that bin. The value H of the filter coefficient (of each bin separately) is multiplied by the Real and Imaginary parts of the current bin at 404:
`
`$$E(\mathrm{Real}) = Y(\mathrm{Real}) \cdot H$$
`
`$$E(\mathrm{Imag}) = Y(\mathrm{Imag}) \cdot H$$
`
`Where E is the noise free complex value. In the straightforward approach the subtraction may result in a negative value of magnitude. This value can be either replaced with zero (half-wave rectification) or replaced with a positive value equal to the negative one (full-wave rectification). The filter approach, as expressed here, results in the full-wave rectification directly. The full-wave rectification provides a little less noise reduction but introduces far fewer artifacts to the signal. It will be appreciated that this filter can be modified to effect a half-wave rectification by taking the non-absolute value of the numerator and replacing negative values with zeros.
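The filter form of the subtraction, including both rectification variants, can be sketched per bin as follows. The function names and the `half_wave` flag are hypothetical, and returning a zero coefficient for a zero-magnitude bin is an assumption made to avoid division by zero.

```python
def filter_coefficient(signal_mag, noise_mag, half_wave=False):
    """Compute H for one bin from the signal and noise magnitudes."""
    if signal_mag == 0.0:
        return 0.0  # assumption: an empty bin passes nothing
    difference = signal_mag - noise_mag
    if half_wave:
        # Non-absolute numerator with negative values replaced by zero.
        difference = max(difference, 0.0)
    else:
        # Absolute value of the numerator gives full-wave rectification.
        difference = abs(difference)
    return difference / signal_mag

def apply_filter(bin_value, signal_mag, noise_mag, half_wave=False):
    """Scale the real and imaginary parts by H; the phase is left untouched."""
    h = filter_coefficient(signal_mag, noise_mag, half_wave)
    return complex(bin_value.real * h, bin_value.imag * h)

cleaned = apply_filter(3 + 4j, signal_mag=5.0, noise_mag=1.0)
full_wave = filter_coefficient(1.0, 3.0)
half_wave = filter_coefficient(1.0, 3.0, half_wave=True)
```

Because H is a real scalar, multiplying both parts of the bin by it changes only the magnitude, so no sine, cosine or quadrant handling is needed.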
`
`Note also that the values of Y in the figures are the smoothed values of
`
`Y after averaging over neighboring spectral bins and over time frames (2D
`
`smoothing). Another approach is to use the smoothed Y only for the noise estimation
`
`(N), and to use the unsmoothed Y for the calculation of H.
`
`Figure 5 illustrates the residual noise reduction processor 500(216). The residual noise is defined as the remaining noise during non-speech intervals. The noise in these intervals is first reduced by the subtraction process which does not differentiate between speech and non-speech
