(12) Patent Application Publication
(10) Pub. No.: US 2003/0128848 A1
(43) Pub. Date: Jul. 10, 2003
Burnett

(54) METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS
(76) Inventor: Gregory C. Burnett, Livermore, CA (US)
Correspondence Address:
Shemwell Gregory & Courtney LLP
Suite 201
4880 Stevens Creek Boulevard
San Jose, CA 95129 (US)
(21) Appl. No.: 10/301,237
(22) Filed: Nov. 21, 2002

Related U.S. Application Data
(63) Continuation-in-part of application No. 09/905,361, filed on Jul. 12, 2001.

Publication Classification
(51) Int. Cl.: A61F 11/06; G10K 11/16; H03B 29/00
(52) U.S. Cl.: 381/71.8; 381/71.14
`
(57) ABSTRACT

A method and system for removing acoustic noise from human speech are
described. Acoustic noise is removed regardless of noise type, amplitude,
or orientation. The system includes a processor coupled among microphones
and a voice activation detection ("VAD") element. The processor executes
denoising algorithms that generate transfer functions. The processor
receives acoustic data from the microphones and data from the VAD. The
processor generates various transfer functions when the VAD indicates
voicing activity and when the VAD indicates no voicing activity. The
transfer functions are used to generate a denoised data stream.
`
`LGE EXHIBIT NO. 1009
`
`
`Amazon v. Jawbone
`U.S. Patent 11,122,357
`Amazon Ex. 1009
`
`
`
`Patent Application Publication
`
`Jul. 10, 2003 Sheet 1 of 13
`
`US 2003/0128848A1

[Sheets 1 through 4 of 13: drawings for FIGS. 1 through 4 (the block diagrams listed under Brief Description of the Figures); images not reproduced in this text.]

FIG. 5 (Sheet 5 of 13), flow diagram:

502: Receive acoustic signals
504: Receive voice activity (VAD) information
506: Determine absence of voicing and generate first transfer function
508: Determine presence of voicing and generate second transfer function
510: Produce denoised acoustic data stream

[FIG. 6 (Sheet 6 of 13): audio plot not reproduced in this text; a "CLEANED AUDIO" panel label survives.]

[Sheets 7 through 13 of 13: drawings for FIGS. 7 through 11C not reproduced in this text; a "Frequency (Hz)" axis label survives on sheet 13.]

US 2003/0128848 A1
Jul. 10, 2003

METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS

RELATED APPLICATIONS
[0001] This patent application is a continuation-in-part of U.S. patent
application Ser. No. 09/905,361, filed Jul. 12, 2001, which is hereby
incorporated by reference. This patent application also claims priority
from U.S. Provisional Patent Application Serial No. 60/332,202, filed
Nov. 21, 2001.
`
FIELD OF THE INVENTION
[0002] The invention is in the field of mathematical methods and
electronic systems for removing or suppressing undesired acoustical noise
from acoustic transmissions or recordings.

BACKGROUND
[0003] In a typical acoustic application, speech from a human user is
recorded or stored and transmitted to a receiver in a different location.
In the environment of the user, there may exist one or more noise sources
that pollute the signal of interest (the user's speech) with unwanted
acoustic noise. This makes it difficult or impossible for the receiver,
whether human or machine, to understand the user's speech. This is
especially problematic now with the proliferation of portable
communication devices like cellular telephones and personal digital
assistants. There are existing methods for suppressing these noise
additions, but they have significant disadvantages. For example, existing
methods are slow because of the computing time required. Existing methods
may also require cumbersome hardware, unacceptably distort the signal of
interest, or have such poor performance that they are not useful. Many of
these existing methods are described in textbooks such as "Advanced
Digital Signal Processing and Noise Reduction" by Vaseghi,
ISBN 0-471-62692-9.
`
BRIEF DESCRIPTION OF THE FIGURES
[0004] FIG. 1 is a block diagram of a denoising system, under an
embodiment.
[0005] FIG. 2 is a block diagram illustrating a noise removal algorithm,
under an embodiment assuming a single noise source and a direct path to
the microphones.
[0006] FIG. 3 is a block diagram illustrating a front end of a noise
removal algorithm of an embodiment generalized to n distinct noise sources
(these noise sources may be reflections or echoes of one another).
[0007] FIG. 4 is a block diagram illustrating a front end of a noise
removal algorithm of an embodiment in a general case where there are n
distinct noise sources and signal reflections.
[0008] FIG. 5 is a flow diagram of a denoising method, under an
embodiment.
[0009] FIG. 6 shows results of a noise suppression algorithm of an
embodiment for an American English female speaker in the presence of
airport terminal noise that includes many other human speakers and public
announcements.
[0010] FIG. 7 is a block diagram of a physical configuration for denoising
using unidirectional and omnidirectional microphones, under the
embodiments of FIGS. 2, 3, and 4.
[0011] FIG. 8 is a denoising microphone configuration including two
omnidirectional microphones, under an embodiment.
[0012] FIG. 9 is a plot of the C required versus distance, under the
embodiment of FIG. 8.
[0013] FIG. 10 is a block diagram of a front end of a noise removal
algorithm under an embodiment in which the two microphones have different
response characteristics.
[0014] FIG. 11A is a plot of the difference in frequency response
(percent) between the microphones (at a distance of 4 centimeters) before
compensation.
[0015] FIG. 11B is a plot of the difference in frequency response
(percent) between the microphones (at a distance of 4 centimeters) after
DFT compensation, under an embodiment.
[0016] FIG. 11C is a plot of the difference in frequency response
(percent) between the microphones (at a distance of 4 centimeters) after
time-domain filter compensation, under an alternate embodiment.
`
DETAILED DESCRIPTION
[0017] The following description provides specific details for a thorough
understanding of, and enabling description for, embodiments of the
invention. However, one skilled in the art will understand that the
invention may be practiced without these details. In other instances,
well-known structures and functions have not been shown or described in
detail to avoid unnecessarily obscuring the description of the embodiments
of the invention.
[0018] Unless described otherwise below, the construction and operation of
the various blocks shown in the figures are of conventional design. As a
result, such blocks need not be described in further detail herein,
because they will be understood by those skilled in the relevant art. Such
further detail is omitted for brevity and so as not to obscure the
detailed description of the invention. Any modifications necessary to the
blocks in the Figures (or other embodiments) can be readily made by one
skilled in the relevant art based on the detailed description provided
herein.
[0019] FIG. 1 is a block diagram of a denoising system of an embodiment
that uses knowledge of when speech is occurring derived from physiological
information on voicing activity. The system includes microphones 10 and
sensors 20 that provide signals to at least one processor 30. The
processor includes a denoising subsystem or algorithm 40.
[0020] FIG. 2 is a block diagram illustrating a noise removal algorithm of
an embodiment, showing system components used. A single noise source and a
direct path to the microphones are assumed. FIG. 2 includes a graphic
description of the process of an embodiment, with a single signal source
100 and a single noise source 101. This algorithm uses two microphones: a
"signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 2"), but
is not so limited. MIC 1 is assumed to capture mostly signal with some
noise, while MIC 2 captures mostly noise with some signal. The data from
the signal source 100 to MIC 1 is denoted by $s(n)$, where $s(n)$ is a
discrete sample of the analog signal from the source 100. The data from
the signal source 100 to MIC 2 is denoted by $s_2(n)$. The data from the
noise source 101 to MIC 2 is denoted by $n(n)$. The data from the noise
source 101 to MIC 1 is denoted by $n_2(n)$. Similarly, the data from MIC 1
to noise removal element 105 is denoted by $m_1(n)$, and the data from
MIC 2 to noise removal element 105 is denoted by $m_2(n)$.
[0021] The noise removal element also receives a signal from a voice
activity detection ("VAD") element 104. The VAD 104 uses physiological
information to determine when a speaker is speaking. In various
embodiments, the VAD includes a radio frequency device, an
electroglottograph, an ultrasound device, an acoustic throat microphone,
and/or an airflow detector.
[0022] The transfer functions from the signal source 100 to MIC 1 and from
the noise source 101 to MIC 2 are assumed to be unity. The transfer
function from the signal source 100 to MIC 2 is denoted by $H_2(z)$, and
the transfer function from the noise source 101 to MIC 1 is denoted by
$H_1(z)$. The assumption of unity transfer functions does not inhibit the
generality of this algorithm, as the actual relations between the signal,
noise, and microphones are simply ratios and the ratios are redefined in
this manner for simplicity.
[0023] In conventional noise removal systems, the information from MIC 2
is used to attempt to remove noise from MIC 1. However, an unspoken
assumption is that the VAD element 104 is never perfect, and thus the
denoising must be performed cautiously, so as not to remove too much of
the signal along with the noise. If, instead, the VAD 104 is assumed to be
perfect such that it is equal to zero when there is no speech being
produced by the user, and equal to one when speech is produced, a
substantial improvement in the noise removal can be made.
[0024] In analyzing the single noise source 101 and the direct path to the
microphones, with reference to FIG. 2, the total acoustic information
coming into MIC 1 is denoted by $m_1(n)$. The total acoustic information
coming into MIC 2 is similarly labeled $m_2(n)$. In the z (digital
frequency) domain, these are represented as $M_1(z)$ and $M_2(z)$. Then

$$
\begin{aligned}
M_1(z) &= S(z) + N_2(z) \\
M_2(z) &= N(z) + S_2(z)
\end{aligned}
$$

with

$$
\begin{aligned}
N_2(z) &= N(z)H_1(z) \\
S_2(z) &= S(z)H_2(z)
\end{aligned}
$$

so that

$$
\begin{aligned}
M_1(z) &= S(z) + N(z)H_1(z) \\
M_2(z) &= N(z) + S(z)H_2(z)
\end{aligned}
\tag{Eq. 1}
$$

[0025] This is the general case for all two-microphone systems. In a
practical system there is always going to be some leakage of noise into
MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns
and only two known relationships and therefore cannot be solved
explicitly.
[0026] However, there is another way to solve for some of the unknowns in
Equation 1. The analysis starts with an examination of the case where the
signal is not being generated, that is, where a signal from the VAD
element 104 equals zero and speech is not being produced. In this case,
$s(n) = S(z) = 0$, and Equation 1 reduces to

$$
\begin{aligned}
M_{1n}(z) &= N(z)H_1(z) \\
M_{2n}(z) &= N(z)
\end{aligned}
$$

[0027] where the n subscript on the M variables indicates that only noise
is being received. This leads to

$$
\begin{aligned}
M_{1n}(z) &= M_{2n}(z)H_1(z) \\
H_1(z) &= \frac{M_{1n}(z)}{M_{2n}(z)}
\end{aligned}
\tag{Eq. 2}
$$

[0028] $H_1(z)$ can be calculated using any of the available system
identification algorithms and the microphone outputs when the system is
certain that only noise is being received. The calculation can be done
adaptively, so that the system can react to changes in the noise.
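As one illustration of the noise-only calibration step, the sketch below estimates $H_1$ at a single frequency bin from frames captured while the VAD reports no speech. The source leaves the choice of system identification algorithm open; the least-squares ratio used here, and the names `estimate_h1_bin`, `m1_frames`, and `m2_frames`, are assumptions made for this example and not the method of the application.

```python
def estimate_h1_bin(m1_frames, m2_frames):
    """Estimate H1 at one frequency bin from noise-only frames.

    m1_frames, m2_frames: complex spectral values of MIC 1 and MIC 2 at the
    same bin, one pair per frame, all taken while VAD == 0.
    Returns the least-squares fit of M1n = M2n * H1 (Eq. 2 over many frames).
    """
    num = sum(a * b.conjugate() for a, b in zip(m1_frames, m2_frames))
    den = sum(abs(b) ** 2 for b in m2_frames)
    if den == 0:
        raise ValueError("MIC 2 carries no energy in this bin")
    return num / den


# Hypothetical data: three noise-only frames generated with a known H1.
true_h1 = 0.6 - 0.2j
m2_noise = [1.0 + 0.5j, -0.3 + 0.8j, 0.7 - 0.4j]
m1_noise = [true_h1 * v for v in m2_noise]
h1_estimate = estimate_h1_bin(m1_noise, m2_noise)
```

Averaging over several frames, instead of dividing one frame by another, keeps the estimate usable when an individual MIC 2 value is close to zero.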
[0029] A solution is now available for one of the unknowns in Equation 1.
Another unknown, $H_2(z)$, can be determined by using the instances where
the VAD equals one and speech is being produced. When this is occurring,
but the recent (perhaps less than 1 second) history of the microphones
indicates low levels of noise, it can be assumed that
$n(s) = N(z) \approx 0$. Then Equation 1 reduces to

$$
\begin{aligned}
M_{1s}(z) &= S(z) \\
M_{2s}(z) &= S(z)H_2(z)
\end{aligned}
$$

[0030] which in turn leads to

$$
\begin{aligned}
M_{2s}(z) &= M_{1s}(z)H_2(z) \\
H_2(z) &= \frac{M_{2s}(z)}{M_{1s}(z)}
\end{aligned}
$$

[0031] which is the inverse of the $H_1(z)$ calculation. However, it is
noted that different inputs are being used: now only the signal is
occurring, whereas before only the noise was occurring. While calculating
$H_2(z)$, the values calculated for $H_1(z)$ are held constant, and vice
versa. Thus, it is assumed that while one of $H_1(z)$ and $H_2(z)$ is
being calculated, the one not being calculated does not change
substantially.
[0032] After calculating $H_1(z)$ and $H_2(z)$, they are used to remove
the noise from the signal. If Equation 1 is rewritten as

$$
\begin{aligned}
S(z) &= M_1(z) - N(z)H_1(z) \\
N(z) &= M_2(z) - S(z)H_2(z)
\end{aligned}
$$

[0033] then $N(z)$ may be substituted as shown to solve for $S(z)$ as:

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}
\tag{Eq. 3}
$$

[0034] If the transfer functions $H_1(z)$ and $H_2(z)$ can be described
with sufficient accuracy, then the noise can be completely removed and the
original signal recovered. This remains true without respect to the
amplitude or spectral characteristics of the noise. The only assumptions
made are a perfect VAD, sufficiently accurate $H_1(z)$ and $H_2(z)$, and
that when one of $H_1(z)$ and $H_2(z)$ is being calculated the other does
not change substantially. In practice these assumptions have proven
reasonable.
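The recovery formula can be checked numerically at a single frequency. The following minimal sketch builds the two microphone values from the model of Equation 1 and applies Equation 3; the function name `denoise_bin` and all numeric values are hypothetical, chosen only to exercise the algebra.

```python
def denoise_bin(m1, m2, h1, h2):
    """Apply Eq. 3 at one frequency bin: S = (M1 - M2*H1) / (1 - H2*H1)."""
    denom = 1 - h2 * h1
    if denom == 0:
        raise ZeroDivisionError("H1*H2 equals 1; Eq. 3 is undefined at this bin")
    return (m1 - m2 * h1) / denom


# Hypothetical single-bin values for speech, noise, and the two paths.
s = 0.8 + 0.3j
n = -1.5 + 2.0j
h1 = 0.4 - 0.1j   # noise source to MIC 1
h2 = 0.2 + 0.05j  # signal source to MIC 2

m1 = s + n * h1   # Eq. 1, MIC 1
m2 = n + s * h2   # Eq. 1, MIC 2
recovered = denoise_bin(m1, m2, h1, h2)
```

In this example the noise is larger than the speech, yet `recovered` matches `s` to rounding error, which reflects the statement that the result does not depend on noise amplitude when $H_1$ and $H_2$ are exact.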
[0035] The noise removal algorithm described herein is easily generalized
to include any number of noise sources. FIG. 3 is a block diagram of a
front end of a noise removal algorithm of an embodiment, generalized to n
distinct noise sources. These distinct noise sources may be reflections or
echoes of one another, but are not so limited. There are several noise
sources shown, each with a transfer function, or path, to each microphone.
The previously named path $H_2$ has been relabeled as $H_0$, so that
labeling noise source 2's path to MIC 1 is more convenient. The outputs of
each microphone, when transformed to the z domain, are:

$$
\begin{aligned}
M_1(z) &= S(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \cdots + N_n(z)H_n(z) \\
M_2(z) &= S(z)H_0(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \cdots + N_n(z)G_n(z)
\end{aligned}
\tag{Eq. 4}
$$

[0036] When there is no signal (VAD=0), then (suppressing the z's for
clarity)

$$
\begin{aligned}
M_{1n} &= N_1H_1 + N_2H_2 + \cdots + N_nH_n \\
M_{2n} &= N_1G_1 + N_2G_2 + \cdots + N_nG_n
\end{aligned}
\tag{Eq. 5}
$$

[0037] A new transfer function can now be defined, analogous to $H_1(z)$
above:

$$
\tilde{H}_1 = \frac{M_{1n}}{M_{2n}}
= \frac{N_1H_1 + N_2H_2 + \cdots + N_nH_n}{N_1G_1 + N_2G_2 + \cdots + N_nG_n}
\tag{Eq. 6}
$$

[0038] Thus $\tilde{H}_1$ depends only on the noise sources and their
respective transfer functions and can be calculated any time there is no
signal being transmitted. Once again, the n subscripts on the microphone
inputs denote only that noise is being detected, while an s subscript
denotes that only signal is being received by the microphones.
[0039] Examining Equation 4 while assuming that there is no noise produces

$$
\begin{aligned}
M_{1s} &= S \\
M_{2s} &= SH_0
\end{aligned}
$$

[0040] Thus $H_0$ can be solved for as before, using any available
transfer function calculating algorithm. Mathematically,

$$
H_0 = \frac{M_{2s}}{M_{1s}}
$$

[0041] Rewriting Equation 4, using $\tilde{H}_1$ defined in Equation 6,
provides,

$$
\tilde{H}_1 = \frac{M_1 - S}{M_2 - SH_0}
\tag{Eq. 7}
$$

[0042] Solving for S yields,

$$
S = \frac{M_1 - M_2\tilde{H}_1}{1 - H_0\tilde{H}_1}
\tag{Eq. 8}
$$

[0043] which is the same as Equation 3, with $H_0$ taking the place of
$H_2$, and $\tilde{H}_1$ taking the place of $H_1$. Thus the noise removal
algorithm still is mathematically valid for any number of noise sources,
including multiple echoes of noise sources. Again, if $H_0$ and
$\tilde{H}_1$ can be estimated to a high enough accuracy, and the above
assumption of only one path from the signal to the microphones holds, the
noise may be removed completely.
[0044] The most general case involves multiple noise sources and multiple
signal sources. FIG. 4 is a block diagram of a front end of a noise
removal algorithm of an embodiment in the most general case where there
are n distinct noise sources and signal reflections. Here, reflections of
the signal enter both microphones. This is the most general case, as
reflections of the noise source into the microphones can be modeled
accurately as simple additional noise sources. For clarity, the direct
path from the signal to MIC 2 has changed from $H_0(z)$ to $H_{00}(z)$,
and the reflected paths to MIC 1 and MIC 2 are denoted by $H_{01}(z)$ and
$H_{02}(z)$, respectively.
[0045] The input into the microphones now becomes

$$
\begin{aligned}
M_1(z) &= S(z) + S(z)H_{01}(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \cdots + N_n(z)H_n(z) \\
M_2(z) &= S(z)H_{00}(z) + S(z)H_{02}(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \cdots + N_n(z)G_n(z)
\end{aligned}
\tag{Eq. 9}
$$

[0046] When the VAD=0, the inputs become (suppressing the z's again)

$$
\begin{aligned}
M_{1n} &= N_1H_1 + N_2H_2 + \cdots + N_nH_n \\
M_{2n} &= N_1G_1 + N_2G_2 + \cdots + N_nG_n
\end{aligned}
$$

[0047] which is the same as Equation 5. Thus, the calculation of
$\tilde{H}_1$ in Equation 6 is unchanged, as expected. In examining the
situation where there is no noise, Equation 9 reduces to

$$
\begin{aligned}
M_{1s} &= S + SH_{01} \\
M_{2s} &= SH_{00} + SH_{02}
\end{aligned}
$$

[0048] This leads to the definition of $\tilde{H}_2$:

$$
\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}}
\tag{Eq. 10}
$$

[0049] Rewriting Equation 9 again using the definition for $\tilde{H}_1$
(as in Equation 7) provides

$$
\tilde{H}_1 = \frac{M_1 - S(1 + H_{01})}{M_2 - S(H_{00} + H_{02})}
\tag{Eq. 11}
$$

[0050] Some algebraic manipulation yields

$$
\begin{aligned}
S\bigl(1 + H_{01} - \tilde{H}_1(H_{00} + H_{02})\bigr) &= M_1 - M_2\tilde{H}_1 \\
S(1 + H_{01})\left[1 - \tilde{H}_1\frac{H_{00} + H_{02}}{1 + H_{01}}\right] &= M_1 - M_2\tilde{H}_1 \\
S(1 + H_{01})\bigl[1 - \tilde{H}_1\tilde{H}_2\bigr] &= M_1 - M_2\tilde{H}_1
\end{aligned}
$$

and finally

$$
S(1 + H_{01}) = \frac{M_1 - M_2\tilde{H}_1}{1 - \tilde{H}_1\tilde{H}_2}
\tag{Eq. 12}
$$

[0051] Equation 12 is the same as Equation 8, with the replacement of
$H_0$ by $\tilde{H}_2$, and the addition of the $(1 + H_{01})$ factor on
the left side. This extra factor means that S cannot be solved for
directly in this situation, but a solution can be generated for the signal
plus the addition of all of its echoes. This is not such a bad situation,
as there are many conventional methods for dealing with echo suppression,
and even if the echoes are not suppressed, it is unlikely that they will
affect the comprehensibility of the speech to any meaningful extent. The
more complex calculation of $\tilde{H}_2$ is needed to account for the
signal echoes in MIC 2, which act as noise sources.
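The general-case result can also be checked numerically at one frequency. This sketch assumes two noise sources and one signal reflection per microphone; the function name `denoise_general` and every numeric value are hypothetical. It confirms that the right-hand side of Equation 12 returns the speech plus its echo at MIC 1, $S(1 + H_{01})$, and not $S$ alone.

```python
def denoise_general(m1, m2, h1_tilde, h2_tilde):
    """Right-hand side of Eq. 12 at one frequency bin."""
    return (m1 - m2 * h1_tilde) / (1 - h1_tilde * h2_tilde)


# Hypothetical single-bin values.
s = 1.0 - 0.5j
h00, h01, h02 = 0.3 + 0.1j, 0.2 - 0.05j, 0.1 + 0.02j  # signal paths
noises = [0.7 + 0.2j, -0.4 + 0.9j]                     # N_1, N_2
h_paths = [0.5 + 0.1j, 0.3 - 0.2j]                     # noise paths to MIC 1
g_paths = [1.0 + 0.0j, 0.8 + 0.1j]                     # noise paths to MIC 2

noise_at_m1 = sum(n * h for n, h in zip(noises, h_paths))
noise_at_m2 = sum(n * g for n, g in zip(noises, g_paths))

m1 = s * (1 + h01) + noise_at_m1       # Eq. 9, MIC 1
m2 = s * (h00 + h02) + noise_at_m2     # Eq. 9, MIC 2
h1_tilde = noise_at_m1 / noise_at_m2   # Eq. 6
h2_tilde = (h00 + h02) / (1 + h01)     # Eq. 10

signal_plus_echo = denoise_general(m1, m2, h1_tilde, h2_tilde)
```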
[0052] FIG. 5 is a flow diagram of a denoising method of an embodiment. In
operation, the acoustic signals are received 502. Further, physiological
information associated with human voicing activity is received 504. A
first transfer function representative of the acoustic signal is
calculated upon determining that voicing information is absent from the
acoustic signal for at least one specified period of time 506. A second
transfer function representative of the acoustic signal is calculated upon
determining that voicing information is present in the acoustic signal for
at least one specified period of time 508. Noise is removed from the
acoustic signal using at least one combination of the first transfer
function and the second transfer function, producing denoised acoustic
data streams 510.
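The control flow of steps 506 through 510 can be sketched for a single frequency bin as follows. This is an illustrative arrangement only: the class name `BinDenoiser`, the forgetting factor, the `low_noise` flag, and the running least-squares estimator are all assumptions introduced for the example, since the source does not prescribe a particular estimator here.

```python
class BinDenoiser:
    """Per-frame control flow of FIG. 5 for one frequency bin (a sketch)."""

    def __init__(self, forget=0.9):
        self.forget = forget     # hypothetical forgetting factor
        self.h1 = 0j             # first transfer function (noise path)
        self.h2 = 0j             # second transfer function (speech path)
        self._acc1 = [0j, 0.0]   # running numerator and denominator for h1
        self._acc2 = [0j, 0.0]   # running numerator and denominator for h2

    def _update(self, acc, out, ref):
        # Exponentially weighted least-squares fit of out = ref * h.
        acc[0] = self.forget * acc[0] + out * ref.conjugate()
        acc[1] = self.forget * acc[1] + abs(ref) ** 2
        return acc[0] / acc[1] if acc[1] > 0 else 0j

    def process(self, m1, m2, voiced, low_noise=False):
        if not voiced:
            # Step 506: no voicing, so adapt H1 and hold H2.
            self.h1 = self._update(self._acc1, m1, m2)
        elif low_noise:
            # Step 508: voicing with little noise, so adapt H2 and hold H1.
            self.h2 = self._update(self._acc2, m2, m1)
        # Step 510: apply Eq. 3 with the current estimates.
        return (m1 - m2 * self.h1) / (1 - self.h2 * self.h1)


# Hypothetical session: noise-only frames, clean speech frames, then a mix.
H1_TRUE, H2_TRUE = 0.4 - 0.1j, 0.2 + 0.05j
denoiser = BinDenoiser()
for n in (1 + 1j, -0.5 + 2j, 0.3 - 0.7j):
    denoiser.process(n * H1_TRUE, n, voiced=False)
for s in (0.9 + 0.1j, -0.2 + 0.6j):
    denoiser.process(s, s * H2_TRUE, voiced=True, low_noise=True)
s, n = 0.8 + 0.3j, -1.5 + 2.0j
cleaned = denoiser.process(s + n * H1_TRUE, n + s * H2_TRUE, voiced=True)
```

Only one of the two estimates is updated per frame, which mirrors the stated assumption that the transfer function not being calculated is held constant.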
[0053] An algorithm for noise removal, or denoising algorithm, is
described herein, from the simplest case of a single noise source with a
direct path to multiple noise sources with reflections and echoes. The
algorithm has been shown herein to be viable under any environmental
conditions. The type and amount of noise are inconsequential if a good
estimate has been made of $\tilde{H}_1$ and $\tilde{H}_2$, and if one does
not change substantially while the other is calculated. If the user
environment is such that echoes are present, they can be compensated for
if coming from a noise source. If signal echoes are also present, they
will affect the cleaned signal, but the effect should be negligible in
most environments.
[0054] In operation, the algorithm of an embodiment has shown excellent
results in dealing with a variety of noise types, amplitudes, and
orientations. However, there are always approximations and adjustments
that have to be made when moving from mathematical concepts to engineering
applications. One assumption is made in Equation 3, where $H_2(z)$ is
assumed small and therefore $H_2(z)H_1(z) \approx 0$, so that Equation 3
reduces to

$$
S(z) \approx M_1(z) - M_2(z)H_1(z)
$$

[0055] This means that only $H_1(z)$ has to be calculated, speeding up the
process and reducing the number of computations required considerably.
With the proper selection of microphones, this approximation is easily
realized.
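The cost of this approximation follows directly from the two-microphone model of Equation 1. Writing the simplified estimate as $\hat{S}(z)$ (a symbol introduced only for this note) and substituting the model expressions for $M_1(z)$ and $M_2(z)$ gives:

$$
\begin{aligned}
\hat{S}(z) &= M_1(z) - M_2(z)H_1(z) \\
&= \bigl[S(z) + N(z)H_1(z)\bigr] - \bigl[N(z) + S(z)H_2(z)\bigr]H_1(z) \\
&= S(z)\bigl[1 - H_2(z)H_1(z)\bigr]
\end{aligned}
$$

Under these definitions the noise term still cancels exactly when $H_1(z)$ is accurate; what changes is that the recovered speech is scaled by $1 - H_2(z)H_1(z)$, so the relative error at each frequency is $|H_2(z)H_1(z)|$, which is small whenever little speech reaches MIC 2.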
[0056] Another approximation involves the filter used in an embodiment.
The actual $H_1(z)$ will undoubtedly have both poles and zeros, but for
stability and simplicity an all-zero Finite Impulse Response (FIR) filter
is used. With enough taps (around 60) the approximation to the actual
$H_1(z)$ is very good.
[0057] Regarding subband selection, the wider the range of frequencies
over which a transfer function must be calculated, the more difficult it
is to calculate it accurately. Therefore the acoustic data was divided
into 16 subbands, with the lowest frequency at 50 Hz and the highest at
3700 Hz. The denoising algorithm was then applied to each subband in turn,
and the 16 denoised data streams were recombined to yield the denoised
acoustic data. This works very well, but any combination of subbands
(e.g., 4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used
and has been found to work as well.
[0058] The amplitude of the noise was constrained in an embodiment so that
the microphones used did not saturate (that is, operate outside a linear
response region). It is important that the microphones operate linearly to
ensure the best performance. Even with this restriction, very low
signal-to-noise ratio (SNR) signals can be denoised (down to -10 dB or
less).
[0059] The calculation of $H_1(z)$ is accomplished every 10 milliseconds
using the Least-Mean Squares (LMS) method, a common adaptive method for
calculating transfer functions. An explanation may be found in "Adaptive
Signal Processing" (1985), by Widrow and Stearns, published by
Prentice-Hall, ISBN 0-13-004029-0.
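A textbook time-domain LMS update, of the kind the paragraph above refers to, can be sketched as below. The setup is an assumption made for illustration: MIC 2 is treated as the reference input and MIC 1 as the desired output during a noise-only interval, a 3-tap filter stands in for the roughly 60 taps mentioned in the text, and the step size `mu`, the function name `lms_identify`, and the synthetic data are all hypothetical.

```python
import random


def lms_identify(reference, desired, num_taps, mu):
    """Adapt FIR weights so that filtering `reference` approximates `desired`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    for x_n, d_n in zip(reference, desired):
        history = [x_n] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = d_n - estimate
        # LMS update: step along the instantaneous gradient of the squared error.
        weights = [w + mu * error * x for w, x in zip(weights, history)]
    return weights


# Synthetic noise-only data: MIC 1 equals MIC 2 filtered by a known FIR path.
random.seed(0)
true_h1 = [0.5, -0.3, 0.1]
mic2 = [random.uniform(-1.0, 1.0) for _ in range(5000)]
mic1 = [
    sum(h * mic2[i - k] for k, h in enumerate(true_h1) if i - k >= 0)
    for i in range(len(mic2))
]
estimated_h1 = lms_identify(mic2, mic1, num_taps=3, mu=0.1)
```

With noise-free synthetic data the weights settle on `true_h1`; with real recordings the step size trades convergence speed against steady-state error.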
[0060] The VAD for an embodiment is derived from a radio frequency sensor
and the two microphones, yielding very high accuracy (>99%) for both
voiced and unvoiced speech. The VAD of an embodiment uses a radio
frequency (RF) interferometer to detect tissue motion associated with
human speech production, but is not so limited. It is therefore completely
acoustic-noise free, and is able to function in any acoustic noise
environment. A simple energy measurement of the RF signal can be used to
determine if voiced speech is occurring. Unvoiced speech can be determined
using conventional acoustic-based methods, by proximity to voiced sections
determined using the RF sensor or similar voicing sensors, or through a
combination of the above. Since there is much less energy in unvoiced
speech, its activation accuracy is not as critical as that of voiced
speech.
[0061] With voiced and unvoiced speech detected reliably, the algorithm of
an embodiment can be implemented. Once again, it is useful to repeat that
the noise removal algorithm does not depend on how the VAD is obtained,
only that it is accurate, especially for voiced speech. If speech is not
detected and training occurs on the speech, the subsequent denoised
acoustic data can be distorted.
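The "simple energy measurement" for voiced speech can be illustrated with a per-frame threshold test. This is a minimal sketch: the function name `rf_energy_vad`, the threshold value, and the sample frames are hypothetical, and a practical detector would need a threshold calibrated to the sensor.

```python
def rf_energy_vad(rf_frame, threshold):
    """Return True when the mean-square energy of an RF sensor frame exceeds threshold."""
    energy = sum(x * x for x in rf_frame) / len(rf_frame)
    return energy > threshold


# Hypothetical 80-sample frames (10 ms at 8 kHz).
quiet_frame = [0.001] * 80        # sensor at rest
voiced_frame = [0.5, -0.5] * 40   # strong tissue-motion signal
THRESHOLD = 0.01                  # assumed value, for illustration only

quiet_decision = rf_energy_vad(quiet_frame, THRESHOLD)
voiced_decision = rf_energy_vad(voiced_frame, THRESHOLD)
```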
[0062] Data was collected in four channels, one for MIC 1, one for MIC 2,
and two for the radio frequency sensor that detected the tissue motions
associated with voiced speech. The data were sampled simultaneously at
40 kHz, then digitally filtered and decimated down to 8 kHz. The high
sampling rate was used to reduce any aliasing that might result from the
analog to digital process. A four-channel National Instruments A/D board
was used along with LabView to capture and store the data. The data was
then read into a C program and denoised 10 milliseconds at a time.
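The figures in the paragraph above imply a decimation factor of 5 and 80 samples per processing frame. The short sketch below works through that arithmetic and splits a sample buffer into 10-millisecond frames; the constant and function names are hypothetical, and the anti-aliasing filter that would precede decimation is omitted.

```python
RAW_RATE_HZ = 40_000     # sampling rate at capture
TARGET_RATE_HZ = 8_000   # rate after decimation
FRAME_MS = 10            # processing interval

decimation_factor = RAW_RATE_HZ // TARGET_RATE_HZ      # keep every 5th sample
samples_per_frame = TARGET_RATE_HZ * FRAME_MS // 1000  # 80 samples per frame


def split_into_frames(samples, frame_length):
    """Split a buffer into consecutive full frames, dropping any partial tail."""
    last_start = len(samples) - frame_length
    return [samples[i:i + frame_length] for i in range(0, last_start + 1, frame_length)]


frames = split_into_frames(list(range(200)), samples_per_frame)
```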
[0063] FIG. 6 shows results of a noise suppression algorithm of an
embodiment for an American English speaking female in the presence of
airport terminal noise that includes many other human speakers and public
announcements. The speaker is uttering the numbers 406-5562 in the midst
of moderate airport terminal noise. The dirty acoustic data was denoised
10 milliseconds at a time, and before denoising the 10 milliseconds of
data were prefiltered from 50 to 3700 Hz. A reduction in the noise of
approximately 17 dB is evident. No post filtering was done on this sample;
thus, all of the noise reduction realized is due to the algorithm of an
embodiment. It is clear that the algorithm adjusts to the noise instantly,
and is capable of removing the very difficult noise of other human
speakers. Many different types of noise have all been tested with similar
results, including street noise, helicopters, music, and sine waves, to
name a few. Also, the orientation of the noise can be varied substantially
without significantly changing the noise suppression performance. Finally,
the distortion of the cleaned speech is very low, ensuring good
performance for speech recognition engines and human receivers alike.
[0064] The noise removal algorithm of an embodiment has been shown to be
viable under any environmental conditions. The type and amount of noise
are inconsequential if a good estimate has been made of $\tilde{H}_1$ and
$\tilde{H}_2$. If the user environment is such that echoes are present,
they can be compensated for if coming from a noise source. If signal
echoes are also present, they will affect the cleaned signal, but the
effect should be negligible in most environments.
[0065] FIG. 7 is a block diagram of a physical configuration for denoising
using a unidirectional microphone M2 for the noise and an omnidirectional
microphone M1 for the speech, under the embodiments of FIGS. 2, 3, and 4.
As described above, the path from the speech to the noise microphone
(MIC 2) is approximated as zero, and that approximation is realized
through the careful placement of omnidirectional and unidirectional
microphones. This works quite well (20-40 dB of noise suppression) when
the noise is oriented opposite the signal location (noise source $N_1$).
However, when the noise source is oriented on the same side as the speaker
(noise source $N_2$), the performance can drop to only 10-20 dB of noise
suppression. This drop in suppression ability can be attributed to the
steps taken to ensure that $H_2$ is close to zero. These steps included
the use of a unidirectional microphone for the noise microphone (MIC 2) so
that very little signal is present in the noise data. As the
unidirectional microphone cancels out acoustic information coming from a
particular direction, it also cancels out noise that is coming from the
same direction as speech. This may limit the ability of the adaptive
algorithm to characterize and then remove noise in a location such as
$N_2$. The same effect is noted when a unidirectional microphone is used
for the speech microphone, M1.
[0066] However, if the unidirectional microphone M2 is replaced with an
omnidirectional microphone, then a significant amount of signal is
captured by M2. This runs counter to the aforementioned assumption that
$H_2$ is zero, and as a result during voicing a significant amount of
signal is removed, resulting in denoising and "de-signaling". This is not
acceptable if signal distortion is to be kept to a minimum. In order to
reduce the distortion, therefore, a value is calculated for $H_2$.
However, the value for $H_2$ cannot be calculated in the presence of
noise, or the noise will be mislabeled as speech and not removed.
[0067] Experience with acoustic-only microphone arrays suggests that a
small, two-microphone array might be a solution to the problem. FIG. 8 is
a denoising microphone configuration including two omnidirectional
microphones, under an embodiment. The same effect can be achieved through
the use of two unidirectional microphones, oriented in the same direction
(toward the signal source). Yet another embodiment uses one unidirectional
microphone and one omnidirectional microphone. The idea is to capture
similar information from acoustic sources in the direction of the signal
source. The relative locations of the signal source and the two
microphones are fixed and known. By placing the microphones a distance d
apart that corresponds with n discrete time samples and placing the
speaker on the axis of the array, $H_2$ can be fixed to be of the form
$Cz^{-n}$, where C is the difference in amplitude of the signal data at M1
and M2. For the discussion that follows, the assumption is made that n=1,
although any integer other than zero may be used. For causality, the use
of positive integers is recommended. As the amplitude of a spherical
pressure source varies as 1/r, this allows not only specification of the
direction of the source but its distance. The C required can be estimated
by

$$
C \approx \frac{|S| \text{ at } M_2}{|S| \text{ at } M_1} \approx \frac{d_s}{d_s + d}
$$

where $d_s$ is the distance from the signal source to M1 and $d$ is the
distance between the two microphones.
`
[0068] FIG. 9 is a plot of the C required versus distance, under the
embodiment of FIG. 8. It can be seen that the asymptote is at C=1.0, and C
reaches 0.9 at approximately 38 centimeters, slightly more than a foot,
and 0.94 at approximately 60 cm. At the distances normally encountered in
a handset and earpiece (4 to 12 cm), C would be between approximately 0.5
and 0.75. This is a difference of approximately 19 to 44% with the noise
source located at approximately 60 cm, and it is clear that most noise
sources would be located farther away than that. Therefore, the system
using this configuration would be able to discriminate between noise and
signal quite effectively, even when they have a similar orientation.
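The quoted values of C can be reproduced from the 1/r amplitude law. The sketch below assumes that C is the ratio $d_s/(d_s + d)$, with $d_s$ the distance from the source to the nearer microphone and $d$ the microphone spacing, and it assumes a spacing of 4 cm; that spacing is inferred from the quoted numbers and is not stated in this passage. The function name `required_c` is hypothetical.

```python
def required_c(source_distance_cm, mic_spacing_cm=4.0):
    """Amplitude ratio between the far and near microphones for a 1/r source."""
    return source_distance_cm / (source_distance_cm + mic_spacing_cm)


c_handset_near = required_c(4.0)    # mouth 4 cm from the nearer microphone
c_handset_far = required_c(12.0)    # mouth 12 cm from the nearer microphone
c_at_38_cm = required_c(38.0)
c_at_60_cm = required_c(60.0)

# Gap in C between a noise source at 60 cm and speech at 4 to 12 cm.
gap_small = c_at_60_cm - c_handset_far
gap_large = c_at_60_cm - c_handset_near
```

With these assumptions the handset range gives C between 0.5 and 0.75, and the gap to a source at 60 cm falls between roughly 0.19 and 0.44, in line with the percentages quoted above.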
[0069] To determine the effects on denoising of poor estimates of C,
assume that $C = nC_0$, where C is an estimate and $C_0$ is the actual
value of C. Using the signal definition from above,

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}
$$

[0070] it has been assumed that $H_2(z)$ was very small, so that the
signal could be approximated by

$$
S(z) \approx M_1(z) - M_2(z)H_1(z)
$$

[0071] This is true if there is no speech, because by definition
$H_2 = 0$. However, if speech is occurring, $H_2$ is nonzero, and if set
to be $Cz^{-1}$,

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - Cz^{-1}H_1(z)}
$$

[0072] which can be rewritten as

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - nC_0z^{-1}H_1(z)}
= \frac{M_1(z) - M_2(z)H_1(z)}{1 - C_0z^{-1}H_1(z) + (1 - n)C_0z^{-1}H_1(z)}
$$

[0073] The last factor in the denominator determines the error due to the
poor estimation of C. This factor is labeled E:

$$
E = (1 - n)C_0z^{-1}H_1(z)
$$

[0074] Because $z^{-1}H_1(z)$ is a filter, its magnitude will always be
positive. Therefore the change in calculated signal magnitude due to E
will depend completely on $(1 - n)$.
[0075] There are two possibilities for errors: underestimation of C
(n<1), and overestimation of C (n>1). In the first case, C is estimated to
be smaller than it actually is, or the signal is closer than estimated. In
this case $(1 - n)$ and therefore E is positive. The denominator is
therefore too large, and the magnitude of the cleaned signal is too small.
This would indicate de-signaling. In the second case, the signal is
farther away than estimated, and E is negative, making S larger than it
should be. In this case the denoising is insufficient. Because very low
signal distortion is desired, the estimations should err toward
overestimation of C.
[0076] This result also shows that noise located in the same solid angle
(direction from M1) as the signal will be substantially removed depending
on the change in C between the signal location and the noise location.
Thus, when using a handset with M1 approximately 4 cm from the mouth, the
required C is approximately 0.5, and for noise at approximately 1 meter
the C is approximately 0.96. Thus, for the noise, the estimate of C=0.5
means that for the noise C is underestimated, and the noise will be
removed. The amount of removal will depend directly on $(1 - n)$.
Therefore, this algorithm uses the direction and the range to the signal
to separate the signal from the noise.
[0078] Fortunately, the choice of $H_2$ eliminates the need for a
deconvolution. From the discussion above, the signal can be written as

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}
$$

[0079] which can be rewritten as

[equation illegible in the source text]
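The direction of the error described in paragraph [0075] can be illustrated at a single frequency. This sketch follows the source's simplification that $z^{-1}H_1(z)$ behaves as a positive quantity, represented here by a real number `q`; the function name `magnitude_ratio` and the numeric values are hypothetical.

```python
def magnitude_ratio(n, c0, q):
    """|S computed with C = n*C0| divided by |S computed with the true C0|.

    q stands for z^-1 * H1(z) evaluated at one frequency and is taken as a
    positive real number, following the simplification in the text.
    """
    return abs(1 - c0 * q) / abs(1 - n * c0 * q)


C0 = 0.5   # true amplitude ratio (handset about 4 cm from the mouth)
Q = 0.5    # assumed value of z^-1 * H1(z) at this frequency

exact = magnitude_ratio(1.0, C0, Q)            # correct estimate
underestimated = magnitude_ratio(0.8, C0, Q)   # n < 1, C too small
overestimated = magnitude_ratio(1.2, C0, Q)    # n > 1, C too large
```

With these values an underestimate shrinks the cleaned signal (de-signaling) and an overestimate enlarges it (insufficient denoising), matching the two cases in the text.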
[0077] One issue that arises involves the stability of this
technique. Specifically, the deconvolution of $(1 - H_1H_2)$
`raises the question of S