(19) United States
(12) Patent Application Publication          (10) Pub. No.: US 2003/0128848 A1
     Burnett                                 (43) Pub. Date: Jul. 10, 2003

(54) METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS

(76) Inventor: Gregory C. Burnett, Livermore, CA (US)

     Correspondence Address:
     Shemwell Gregory & Courtney LLP
     Suite 201
     4880 Stevens Creek Boulevard
     San Jose, CA 95129 (US)

(21) Appl. No.: 10/301,237

(22) Filed: Nov. 21, 2002

               Related U.S. Application Data

(63) Continuation-in-part of application No. 09/905,361, filed on Jul. 12, 2001.

               Publication Classification

(51) Int. Cl.7 .......................... A61F 11/06; G10K 11/16; H03B 29/00
(52) U.S. Cl. ............................ 381/71.8; 381/71.14
(57)                    ABSTRACT

A method and system for removing acoustic noise from human speech is described. Acoustic noise is removed regardless of noise type, amplitude, or orientation. The system includes a processor coupled among microphones and a voice activation detection ("VAD") element. The processor executes denoising algorithms that generate transfer functions. The processor receives acoustic data from the microphones and data from the VAD. The processor generates various transfer functions when the VAD indicates voicing activity and when the VAD indicates no voicing activity. The transfer functions are used to generate a denoised data stream.

[Drawing Sheets 1-13 of 13, Patent Application Publication, Jul. 10, 2003, US 2003/0128848 A1]
[Sheet 3 of 13: FIG. 2]
[Sheet 5 of 13: FIG. 5 flow diagram: RECEIVE ACOUSTIC SIGNALS (502); RECEIVE VOICE ACTIVITY (VAD) INFORMATION (504); DETERMINE ABSENCE OF VOICING AND GENERATE FIRST TRANSFER FUNCTION (506); DETERMINE PRESENCE OF VOICING AND GENERATE SECOND TRANSFER FUNCTION (508); PRODUCE DENOISED ACOUSTIC DATA STREAM (510)]
[Sheet 6 of 13: waveform plots, label "CLEANED AUDIO"]
[Sheet 13 of 13: plot with frequency axis labeled "Frequency (Hz)"]

METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS

RELATED APPLICATIONS

[0001] This patent application is a continuation-in-part of U.S. patent application Ser. No. 09/905,361, filed Jul. 12, 2001, which is hereby incorporated by reference. This patent application also claims priority from U.S. Provisional Patent Application Serial No. 60/332,202, filed Nov. 21, 2001.

FIELD OF THE INVENTION

[0002] The invention is in the field of mathematical methods and electronic systems for removing or suppressing undesired acoustical noise from acoustic transmissions or recordings.
BACKGROUND

[0003] In a typical acoustic application, speech from a human user is recorded or stored and transmitted to a receiver in a different location. In the environment of the user, there may exist one or more noise sources that pollute the signal of interest (the user's speech) with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the user's speech. This is especially problematic now with the proliferation of portable communication devices like cellular telephones and personal digital assistants. There are existing methods for suppressing these noise additions, but they have significant disadvantages. For example, existing methods are slow because of the computing time required. Existing methods may also require cumbersome hardware, unacceptably distort the signal of interest, or have such poor performance that they are not useful. Many of these existing methods are described in textbooks such as "Advanced Digital Signal Processing and Noise Reduction" by Vaseghi, ISBN 0-471-62692-9.
BRIEF DESCRIPTION OF THE FIGURES

[0004] FIG. 1 is a block diagram of a denoising system, under an embodiment.

[0005] FIG. 2 is a block diagram illustrating a noise removal algorithm, under an embodiment assuming a single noise source and a direct path to the microphones.

[0006] FIG. 3 is a block diagram illustrating a front end of a noise removal algorithm of an embodiment generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).

[0007] FIG. 4 is a block diagram illustrating a front end of a noise removal algorithm of an embodiment in a general case where there are n distinct noise sources and signal reflections.

[0008] FIG. 5 is a flow diagram of a denoising method, under an embodiment.

[0009] FIG. 6 shows results of a noise suppression algorithm of an embodiment for an American English female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements.

[0010] FIG. 7 is a block diagram of a physical configuration for denoising using unidirectional and omnidirectional microphones, under the embodiments of FIGS. 2, 3, and 4.

[0011] FIG. 8 is a denoising microphone configuration including two omnidirectional microphones, under an embodiment.

[0012] FIG. 9 is a plot of the C required versus distance, under the embodiment of FIG. 8.

[0013] FIG. 10 is a block diagram of a front end of a noise removal algorithm under an embodiment in which the two microphones have different response characteristics.

[0014] FIG. 11A is a plot of the difference in frequency response (percent) between the microphones (at a distance of 4 centimeters) before compensation.

[0015] FIG. 11B is a plot of the difference in frequency response (percent) between the microphones (at a distance of 4 centimeters) after DFT compensation, under an embodiment.

[0016] FIG. 11C is a plot of the difference in frequency response (percent) between the microphones (at a distance of 4 centimeters) after time-domain filter compensation, under an alternate embodiment.
DETAILED DESCRIPTION

[0017] The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of the invention. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.

[0018] Unless described otherwise below, the construction and operation of the various blocks shown in the figures are of conventional design. As a result, such blocks need not be described in further detail herein, because they will be understood by those skilled in the relevant art. Such further detail is omitted for brevity and so as not to obscure the detailed description of the invention. Any modifications necessary to the blocks in the figures (or other embodiments) can be readily made by one skilled in the relevant art based on the detailed description provided herein.
[0019] FIG. 1 is a block diagram of a denoising system of an embodiment that uses knowledge of when speech is occurring derived from physiological information on voicing activity. The system includes microphones 10 and sensors 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.

[0020] FIG. 2 is a block diagram illustrating a noise removal algorithm of an embodiment, showing system components used. A single noise source and a direct path to the microphones are assumed. FIG. 2 includes a graphic description of the process of an embodiment, with a single signal source 100 and a single noise source 101. This algorithm uses two microphones: a "signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 2"), but is not so limited. MIC 1 is assumed to capture mostly signal with some noise, while MIC 2 captures mostly noise with some signal. The data from the signal source 100 to MIC 1 is denoted by s(n), where s(n) is a discrete sample of the analog signal from the source 100. The data from the signal source 100 to MIC 2 is denoted by s2(n). The data from the noise source 101 to MIC 2 is denoted by n(n). The data from the noise source 101 to MIC 1 is denoted by n2(n). Similarly, the data from MIC 1 to noise removal element 105 is denoted by m1(n), and the data from MIC 2 to noise removal element 105 is denoted by m2(n).
[0021] The noise removal element also receives a signal from a voice activity detection ("VAD") element 104. The VAD 104 uses physiological information to determine when a speaker is speaking. In various embodiments, the VAD includes a radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and/or an airflow detector.
[0022] The transfer functions from the signal source 100 to MIC 1 and from the noise source 101 to MIC 2 are assumed to be unity. The transfer function from the signal source 100 to MIC 2 is denoted by H2(z), and the transfer function from the noise source 101 to MIC 1 is denoted by H1(z). The assumption of unity transfer functions does not inhibit the generality of this algorithm, as the actual relations between the signal, noise, and microphones are simply ratios and the ratios are redefined in this manner for simplicity.
[0023] In conventional noise removal systems, the information from MIC 2 is used to attempt to remove noise from MIC 1. However, an unspoken assumption is that the VAD element 104 is never perfect, and thus the denoising must be performed cautiously, so as not to remove too much of the signal along with the noise. However, if the VAD 104 is assumed to be perfect such that it is equal to zero when there is no speech being produced by the user, and equal to one when speech is produced, a substantial improvement in the noise removal can be made.
[0024] In analyzing the single noise source 101 and the direct path to the microphones, with reference to FIG. 2, the total acoustic information coming into MIC 1 is denoted by m1(n). The total acoustic information coming into MIC 2 is similarly labeled m2(n). In the z (digital frequency) domain, these are represented as M1(z) and M2(z). Then

    M1(z) = S(z) + N2(z)
    M2(z) = N(z) + S2(z)

with

    N2(z) = N(z)H1(z)
    S2(z) = S(z)H2(z)

so that

    M1(z) = S(z) + N(z)H1(z)
    M2(z) = N(z) + S(z)H2(z)                                   Eq. 1

[0025] This is the general case for all two-microphone systems. In a practical system there is always going to be some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two known relationships and therefore cannot be solved explicitly.

[0026] However, there is another way to solve for some of the unknowns in Equation 1. The analysis starts with an examination of the case where the signal is not being generated, that is, where a signal from the VAD element 104 equals zero and speech is not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

    M1n(z) = N(z)H1(z)
    M2n(z) = N(z)

[0027] where the n subscript on the M variables indicates that only noise is being received. This leads to

    M1n(z) = M2n(z)H1(z)
    H1(z) = M1n(z) / M2n(z)                                    Eq. 2
[0028] H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when the system is certain that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.
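By way of illustration only, the following sketch shows one conventional way such a calculation might look: H1 is estimated as the least-squares frequency response from MIC 2 to MIC 1, averaged over windowed FFT frames taken from noise-only data. The frame length, hop size, and regularization constant are assumptions made for this example, not values from this application.

```python
# Illustrative sketch only: estimate H1(f) = M1n(f)/M2n(f) in a least-squares sense
# from noise-only recordings, by averaging cross- and auto-spectra over windowed
# FFT frames. Frame length, hop, and eps are assumptions for this example.
import numpy as np

def estimate_h1(m1_noise, m2_noise, frame_len=256, hop=128, eps=1e-12):
    m1 = np.asarray(m1_noise, dtype=float)
    m2 = np.asarray(m2_noise, dtype=float)
    window = np.hanning(frame_len)
    cross = np.zeros(frame_len // 2 + 1, dtype=complex)  # accumulates M1n * conj(M2n)
    power = np.zeros(frame_len // 2 + 1)                  # accumulates |M2n|^2
    for start in range(0, len(m1) - frame_len + 1, hop):
        f1 = np.fft.rfft(window * m1[start:start + frame_len])
        f2 = np.fft.rfft(window * m2[start:start + frame_len])
        cross += f1 * np.conj(f2)
        power += np.abs(f2) ** 2
    return cross / (power + eps)  # H1 estimate on the rfft frequency grid
```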
[0029] A solution is now available for one of the unknowns in Equation 1. Another unknown, H2(z), can be determined by using the instances where the VAD equals one and speech is being produced. When this is occurring, but the recent (perhaps less than 1 second) history of the microphones indicates low levels of noise, it can be assumed that n(n) = N(z) = 0. Then Equation 1 reduces to

    M1s(z) = S(z)
    M2s(z) = S(z)H2(z)

[0030] which in turn leads to

    M2s(z) = M1s(z)H2(z)
    H2(z) = M2s(z) / M1s(z)

[0031] which is the inverse of the H1(z) calculation. However, it is noted that different inputs are being used: now only the signal is occurring, whereas before only the noise was occurring. While calculating H2(z), the values calculated for H1(z) are held constant, and vice versa. Thus, it is assumed that while one of H1(z) and H2(z) is being calculated, the one not being calculated does not change substantially.
[0032] After calculating H1(z) and H2(z), they are used to remove the noise from the signal. If Equation 1 is rewritten as

    S(z) = M1(z) - N(z)H1(z)
    N(z) = M2(z) - S(z)H2(z)

[0033] then N(z) may be substituted as shown to solve for S(z) as:

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)]             Eq. 3
[0034] If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy, then the noise can be completely removed and the original signal recovered. This remains true without respect to the amplitude or spectral characteristics of the noise. The only assumptions made are a perfect VAD, sufficiently accurate H1(z) and H2(z), and that when one of H1(z) and H2(z) is being calculated the other does not change substantially. In practice these assumptions have proven reasonable.
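As an illustration of how Equation 3 might be applied, the following sketch denoises one frame in the DFT domain under the stated assumptions (a perfect VAD and previously estimated H1 and H2). Treating the frame-wise DFT as the z-transform ignores windowing and circular-convolution effects, so this is a simplified sketch rather than the application's implementation.

```python
# Illustrative sketch only: apply Equation 3 to one frame in the DFT domain.
# H1 and H2 are complex frequency responses sampled on this frame's rfft grid;
# eps guards against division by values near zero.
import numpy as np

def denoise_frame(m1, m2, H1, H2, eps=1e-6):
    M1 = np.fft.rfft(m1)
    M2 = np.fft.rfft(m2)
    S = (M1 - M2 * H1) / (1.0 - H2 * H1 + eps)  # Eq. 3, frame-wise approximation
    return np.fft.irfft(S, n=len(m1))
```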
[0035] The noise removal algorithm described herein is easily generalized to include any number of noise sources. FIG. 3 is a block diagram of a front end of a noise removal algorithm of an embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. There are several noise sources shown, each with a transfer function, or path, to each microphone. The previously named path H2 has been relabeled as H0, so that labeling noise source 2's path to MIC 1 is more convenient. The outputs of each microphone, when transformed to the z domain, are:

    M1(z) = S(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
    M2(z) = S(z)H0(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z)          Eq. 4

[0036] When there is no signal (VAD = 0), then (suppressing the z's for clarity)

    M1n = N1H1 + N2H2 + ... + NnHn
    M2n = N1G1 + N2G2 + ... + NnGn                                          Eq. 5

[0037] A new transfer function can now be defined, analogous to H1(z) above:

    H̃1 = M1n / M2n = (N1H1 + N2H2 + ... + NnHn) / (N1G1 + N2G2 + ... + NnGn)          Eq. 6

[0038] Thus H̃1 depends only on the noise sources and their respective transfer functions and can be calculated any time there is no signal being transmitted. Once again, the n subscripts on the microphone inputs denote only that noise is being detected, while an s subscript denotes that only signal is being received by the microphones.
[0039] Examining Equation 4 while assuming that there is no noise produces

    M1s = S
    M2s = SH0

[0040] Thus H0 can be solved for as before, using any available transfer function calculating algorithm. Mathematically,

    H0 = M2s / M1s

[0041] Rewriting Equation 4, using H̃1 defined in Equation 6, provides

    H̃1 = (M1 - S) / (M2 - SH0)                                 Eq. 7

[0042] Solving for S yields

    S = (M1 - M2H̃1) / (1 - H0H̃1)                               Eq. 8

[0043] which is the same as Equation 3, with H0 taking the place of H2, and H̃1 taking the place of H1. Thus the noise removal algorithm still is mathematically valid for any number of noise sources, including multiple echoes of noise sources. Again, if H0 and H̃1 can be estimated to a high enough accuracy, and the above assumption of only one path from the signal to the microphones holds, the noise may be removed completely.
[0044] The most general case involves multiple noise sources and multiple signal sources. FIG. 4 is a block diagram of a front end of a noise removal algorithm of an embodiment in the most general case where there are n distinct noise sources and signal reflections. Here, reflections of the signal enter both microphones. This is the most general case, as reflections of the noise source into the microphones can be modeled accurately as simple additional noise sources. For clarity, the direct path from the signal to MIC 2 has changed from H0(z) to H00(z), and the reflected paths to MIC 1 and MIC 2 are denoted by H01(z) and H02(z), respectively.

[0045] The input into the microphones now becomes

    M1(z) = S(z) + S(z)H01(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
    M2(z) = S(z)H00(z) + S(z)H02(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z)          Eq. 9

[0046] When the VAD = 0, the inputs become (suppressing the z's again)

    M1n = N1H1 + N2H2 + ... + NnHn
    M2n = N1G1 + N2G2 + ... + NnGn

[0047] which is the same as Equation 5. Thus, the calculation of H̃1 in Equation 6 is unchanged, as expected. In examining the situation where there is no noise, Equation 9 reduces to

    M1s = S + SH01
    M2s = SH00 + SH02

[0048] This leads to the definition of H̃2:

    H̃2 = M2s / M1s = (H00 + H02) / (1 + H01)                   Eq. 10
[0049] Rewriting Equation 9 again using the definition for H̃1 (as in Equation 7) provides

    H̃1 = [M1 - S(1 + H01)] / [M2 - S(H00 + H02)]               Eq. 11
[0050] Some algebraic manipulation yields

    S[(1 + H01) - H̃1(H00 + H02)] = M1 - M2H̃1

    S(1 + H01)[1 - H̃1(H00 + H02)/(1 + H01)] = M1 - M2H̃1

    S(1 + H01)(1 - H̃1H̃2) = M1 - M2H̃1

and finally

    S(1 + H01) = (M1 - M2H̃1) / (1 - H̃1H̃2)                      Eq. 12

[0051] Equation 12 is the same as Equation 8, with the replacement of H0 by H̃2, and the addition of the (1 + H01) factor on the left side. This extra factor means that S cannot be solved for directly in this situation, but a solution can be generated for the signal plus the addition of all of its echoes. This is not such a bad situation, as there are many conventional methods for dealing with echo suppression, and even if the echoes are not suppressed, it is unlikely that they will affect the comprehensibility of the speech to any meaningful extent. The more complex calculation of H̃2 is needed to account for the signal echoes in MIC 2, which act as noise sources.
[0052] FIG. 5 is a flow diagram of a denoising method of an embodiment. In operation, the acoustic signals are received (502). Further, physiological information associated with human voicing activity is received (504). A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time (506). A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time (508). Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing denoised acoustic data streams (510).
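A minimal control-flow sketch of this method follows, assuming frame-based processing. The smoothed single-frame ratio used to update the two transfer functions is a deliberately crude stand-in for a proper adaptive estimator, and the FFT size and smoothing constant are illustrative assumptions only.

```python
# Illustrative sketch only, not the application's implementation: frame-based
# processing gated by the VAD. H1 is adapted on unvoiced (noise-only) frames
# (step 506), H2 on voiced frames (step 508), and every frame is denoised with
# the current estimates (step 510).
import numpy as np

def process(frames, vad_flags, nfft=256, mu=0.1, eps=1e-12):
    """frames: iterable of (m1, m2) arrays; vad_flags: one boolean per frame."""
    H1 = np.zeros(nfft // 2 + 1, dtype=complex)  # noise path, MIC 2 -> MIC 1
    H2 = np.zeros(nfft // 2 + 1, dtype=complex)  # signal path, MIC 1 -> MIC 2
    denoised = []
    for (m1, m2), voiced in zip(frames, vad_flags):
        M1, M2 = np.fft.rfft(m1, nfft), np.fft.rfft(m2, nfft)
        if not voiced:
            H1 = (1 - mu) * H1 + mu * M1 / (M2 + eps)   # train H1 on noise only
        else:
            H2 = (1 - mu) * H2 + mu * M2 / (M1 + eps)   # train H2 on speech only
        S = (M1 - M2 * H1) / (1.0 - H2 * H1 + eps)      # Eq. 3
        denoised.append(np.fft.irfft(S, nfft)[:len(m1)])
    return denoised
```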
[0053] An algorithm for noise removal, or denoising algorithm, is described herein, from the simplest case of a single noise source with a direct path to multiple noise sources with reflections and echoes. The algorithm has been shown herein to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of H̃1 and H̃2, and if one does not change substantially while the other is calculated. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.
[0054] In operation, the algorithm of an embodiment has shown excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, there are always approximations and adjustments that have to be made when moving from mathematical concepts to engineering applications. One assumption is made in Equation 3, where H2(z) is assumed small and therefore H2(z)H1(z) ≈ 0, so that Equation 3 reduces to

    S(z) ≈ M1(z) - M2(z)H1(z)

[0055] This means that only H1(z) has to be calculated, speeding up the process and reducing the number of computations required considerably. With the proper selection of microphones, this approximation is easily realized.
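For illustration, the simplified form above can be applied in the time domain by convolving the MIC 2 data with an FIR approximation of H1(z) and subtracting the result from MIC 1; the helper below is a sketch under that assumption, with the filter coefficients supplied by the caller.

```python
# Illustrative sketch only: the simplified relation S(z) ~ M1(z) - M2(z)H1(z)
# applied in the time domain, with H1 represented by FIR coefficients h1
# (estimated elsewhere, e.g. adaptively during noise-only periods).
import numpy as np
from scipy.signal import lfilter

def denoise_simplified(m1, m2, h1):
    noise_estimate = lfilter(h1, [1.0], m2)  # convolve MIC 2 data with h1
    return np.asarray(m1, dtype=float) - noise_estimate
```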
[0056] Another approximation involves the filter used in an embodiment. The actual H1(z) will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero Finite Impulse Response (FIR) filter is used. With enough taps (around 60) the approximation to the actual H1(z) is very good.

[0057] Regarding subband selection, the wider the range of frequencies over which a transfer function must be calculated, the more difficult it is to calculate it accurately. Therefore the acoustic data was divided into 16 subbands, with the lowest frequency at 50 Hz and the highest at 3700 Hz. The denoising algorithm was then applied to each subband in turn, and the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well, but any combination of subbands (i.e., 4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used and has been found to work as well.
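A sketch of one possible subband split is shown below, assuming 16 equally spaced bands between 50 Hz and 3700 Hz realized with Butterworth bandpass filters; the filter type and order are assumptions made for the example, not details taken from this application.

```python
# Illustrative sketch only: split 8 kHz audio into 16 equally spaced subbands
# between 50 Hz and 3700 Hz so that each band can be denoised separately and
# the results summed. Fourth-order Butterworth filters are an assumption here.
import numpy as np
from scipy.signal import butter, sosfilt

def split_subbands(x, fs=8000, n_bands=16, f_lo=50.0, f_hi=3700.0, order=4):
    edges = np.linspace(f_lo, f_hi, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo / (fs / 2), hi / (fs / 2)],
                     btype="bandpass", output="sos")
        bands.append(sosfilt(sos, x))
    return bands

def recombine(bands):
    return np.sum(bands, axis=0)  # recombined, denoised acoustic data
```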
[0058] The amplitude of the noise was constrained in an embodiment so that the microphones used did not saturate (that is, operate outside a linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised (down to -10 dB or less).
[0059] The calculation of H1(z) is accomplished every 10 milliseconds using the Least-Mean Squares (LMS) method, a common adaptive transfer function estimation technique. An explanation may be found in "Adaptive Signal Processing" (1985), by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0.
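For illustration, a textbook LMS update for a 60-tap FIR model of H1(z), run over one 10 millisecond block (80 samples at 8 kHz) of noise-only data, might look like the following sketch; the step size is an arbitrary assumption.

```python
# Illustrative sketch only: a textbook LMS update for a 60-tap FIR model of H1(z),
# applied to one 10 ms block (80 samples at 8 kHz) that the VAD has marked as
# noise-only. The step size mu is an arbitrary assumption; in practice a
# normalized step is often used to keep the update stable.
import numpy as np

def lms_update_h1(h1, m1_block, m2_block, mu=0.005):
    """Adapt h1 so that (h1 * m2) tracks m1, where '*' denotes convolution."""
    taps = len(h1)                                # e.g. 60 taps
    h1 = np.asarray(h1, dtype=float).copy()
    x = np.concatenate([np.zeros(taps - 1), np.asarray(m2_block, dtype=float)])
    for i, d in enumerate(np.asarray(m1_block, dtype=float)):
        xi = x[i:i + taps][::-1]                  # newest reference sample first
        e = d - np.dot(h1, xi)                    # prediction error
        h1 += 2.0 * mu * e * xi                   # LMS coefficient update
    return h1
```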
[0060] The VAD for an embodiment is derived from a radio frequency sensor and the two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced speech. The VAD of an embodiment uses a radio frequency (RF) interferometer to detect tissue motion associated with human speech production, but is not so limited. It is therefore completely acoustic-noise free, and is able to function in any acoustic noise environment. A simple energy measurement of the RF signal can be used to determine if voiced speech is occurring. Unvoiced speech can be determined using conventional acoustic-based methods, by proximity to voiced sections determined using the RF sensor or similar voicing sensors, or through a combination of the above. Since there is much less energy in unvoiced speech, its activation accuracy is not as critical as that for voiced speech.
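A sketch of such an energy-based voiced-speech decision is shown below; the frame length and threshold are arbitrary assumptions, and this is not presented as the application's actual VAD.

```python
# Illustrative sketch only: a voiced-speech decision from a simple energy
# measurement of the RF sensor channel. Frame length and threshold are
# arbitrary assumptions, not values from this application.
import numpy as np

def voiced_frames(rf, frame_len=80, threshold=1e-3):
    rf = np.asarray(rf, dtype=float)
    n_frames = len(rf) // frame_len
    frames = rf[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)         # short-time energy per frame
    return energy > threshold                     # True where voicing is detected
```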
[0061] With voiced and unvoiced speech detected reliably, the algorithm of an embodiment can be implemented. Once again, it is useful to repeat that the noise removal algorithm does not depend on how the VAD is obtained, only that it is accurate, especially for voiced speech. If speech is not detected and training occurs on the speech, the subsequent denoised acoustic data can be distorted.
[0062] Data was collected in four channels, one for MIC 1, one for MIC 2, and two for the radio frequency sensor that detected the tissue motions associated with voiced speech. The data were sampled simultaneously at 40 kHz, then digitally filtered and decimated down to 8 kHz. The high sampling rate was used to reduce any aliasing that might result from the analog-to-digital process. A four-channel National Instruments A/D board was used along with LabVIEW to capture and store the data. The data was then read into a C program and denoised 10 milliseconds at a time.
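For illustration, the filtering-and-decimation step from 40 kHz to 8 kHz could be sketched as follows; the particular anti-aliasing filter is an assumption, not a detail from this application.

```python
# Illustrative sketch only: low-pass filter and decimate by 5 to go from the
# 40 kHz acquisition rate down to 8 kHz. The FIR anti-aliasing filter chosen by
# scipy.signal.decimate is an assumption, not the filter used for the data above.
from scipy.signal import decimate

def to_8khz(x_40k):
    return decimate(x_40k, 5, ftype="fir", zero_phase=True)
```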
[0063] FIG. 6 shows results of a noise suppression algorithm of an embodiment for an American English speaking female in the presence of airport terminal noise that includes many other human speakers and public announcements. The speaker is uttering the numbers 406-5562 in the midst of moderate airport terminal noise. The dirty acoustic data was denoised 10 milliseconds at a time, and before denoising the 10 milliseconds of data were prefiltered from 50 to 3700 Hz. A reduction in the noise of approximately 17 dB is evident. No post filtering was done on this sample; thus, all of the noise reduction realized is due to the algorithm of an embodiment. It is clear that the algorithm adjusts to the noise instantly, and is capable of removing the very difficult noise of other human speakers. Many different types of noise have all been tested with similar results, including street noise, helicopters, music, and sine waves, to name a few. Also, the orientation of the noise can be varied substantially without significantly changing the noise suppression performance. Finally, the distortion of the cleaned speech is very low, ensuring good performance for speech recognition engines and human receivers alike.
[0064] The noise removal algorithm of an embodiment has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of H̃1 and H̃2. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.
[0065] FIG. 7 is a block diagram of a physical configuration for denoising using a unidirectional microphone M2 for the noise and an omnidirectional microphone M1 for the speech, under the embodiments of FIGS. 2, 3, and 4. As described above, the path from the speech to the noise microphone (MIC 2) is approximated as zero, and that approximation is realized through the careful placement of omnidirectional and unidirectional microphones. This works quite well (20-40 dB of noise suppression) when the noise is oriented opposite the signal location (noise source N). However, when the noise source is oriented on the same side as the speaker (noise source N), the performance can drop to only 10-20 dB of noise suppression. This drop in suppression ability can be attributed to the steps taken to ensure that H2 is close to zero. These steps included the use of a unidirectional microphone for the noise microphone (MIC 2) so that very little signal is present in the noise data. As the unidirectional microphone cancels out acoustic information coming from a particular direction, it also cancels out noise that is coming from the same direction as speech. This may limit the ability of the adaptive algorithm to characterize and then remove noise in a location such as N. The same effect is noted when a unidirectional microphone is used for the speech microphone, M1.
[0066] However, if the unidirectional microphone M2 is replaced with an omnidirectional microphone, then a significant amount of signal is captured by M2. This runs counter to the aforementioned assumption that H2 is zero, and as a result during voicing a significant amount of signal is removed, resulting in denoising and "de-signaling". This is not acceptable if signal distortion is to be kept to a minimum. In order to reduce the distortion, therefore, a value is calculated for H2. However, the value for H2 cannot be calculated in the presence of noise, or the noise will be mislabeled as speech and not removed.
[0067] Experience with acoustic-only microphone arrays suggests that a small, two-microphone array might be a solution to the problem. FIG. 8 is a denoising microphone configuration including two omnidirectional microphones, under an embodiment. The same effect can be achieved through the use of two unidirectional microphones, oriented in the same direction (toward the signal source). Yet another embodiment uses one unidirectional microphone and one omnidirectional microphone. The idea is to capture similar information from acoustic sources in the direction of the signal source. The relative locations of the signal source and the two microphones are fixed and known. By placing the microphones a distance d apart that corresponds with n discrete time samples and placing the speaker on the axis of the array, H2 can be fixed to be of the form Cz^-n, where C is the difference in amplitude of the signal data at M1 and M2. For the discussion that follows, the assumption is made that n = 1, although any integer other than zero may be used. For causality, the use of positive integers is recommended. As the amplitude of a spherical pressure source varies as 1/r, this allows not only specification of the direction of the source but also its distance. The C required can be estimated by

    C = (|S| at M2) / (|S| at M1) = d1 / (d1 + d)

where d1 is the distance from the signal source to M1 and d is the microphone spacing.
[0068] FIG. 9 is a plot of the C required versus distance, under the embodiment of FIG. 8. It can be seen that the asymptote is at C = 1.0, and C reaches 0.9 at approximately 38 centimeters, slightly more than a foot, and 0.94 at approximately 60 cm. At the distances normally encountered in a handset and earpiece (4 to 12 cm), C would be between approximately 0.5 and 0.75. This is a difference of approximately 19 to 44% with the noise source located at approximately 60 cm, and it is clear that most noise sources would be located farther away than that. Therefore, the system using this configuration would be able to discriminate between noise and signal quite effectively, even when they have a similar orientation.
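The numbers quoted above can be checked with a short calculation, assuming the microphones are one sample apart at 8 kHz (roughly 4.3 cm at a nominal 343 m/s speed of sound, both values being assumptions of this sketch) and using C = d1/(d1 + d):

```python
# Illustrative numeric check: with the microphones one sample apart at 8 kHz
# (about 34300/8000 ~ 4.3 cm, assuming a 343 m/s speed of sound), the relation
# C = d1/(d1 + d) approximately reproduces the values discussed above.
def c_required(d1_cm, spacing_cm=34300.0 / 8000.0):
    return d1_cm / (d1_cm + spacing_cm)

for d1 in (4, 12, 38, 60, 100):
    print(f"d1 = {d1:3d} cm -> C ~= {c_required(d1):.2f}")
```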
[0069] To determine the effects on denoising of poor estimates of C, assume that C = nC0, where C is an estimate and C0 is the actual value of C. Using the signal definition from above,

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)]

[0070] it has been assumed that H2(z) was very small, so that the signal could be approximated by

    S(z) ≈ M1(z) - M2(z)H1(z)

[0071] This is true if there is no speech, because by definition H2 = 0. However, if speech is occurring, H2 is nonzero, and if set to be Cz^-1,

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - Cz^-1 H1(z)]

[0072] which can be rewritten as

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - C0 z^-1 H1(z) + (1 - n)C0 z^-1 H1(z)]

[0073] The last factor in the denominator determines the error due to the poor estimation of C. This factor is labeled E:

    E = (1 - n)C0 z^-1 H1(z)

[0074] Because z^-1 H1(z) is a filter, its magnitude will always be positive. Therefore the change in calculated signal magnitude due to E will depend completely on (1 - n).

[0075] There are two possibilities for errors: underestimation of C (n < 1), and overestimation of C (n > 1). In the first case, C is estimated to be smaller than it actually is, or the signal is closer than estimated. In this case (1 - n), and therefore E, is positive. The denominator is therefore too large, and the magnitude of the cleaned signal is too small. This would indicate de-signaling. In the second case, the signal is farther away than estimated, and E is negative, making S larger than it should be. In this case the denoising is insufficient. Because very low signal distortion is desired, the estimations should err toward overestimation of C.

[0076] This result also shows that noise located in the same solid angle (direction from M1) as the signal will be substantially removed depending on the change in C between the signal location and the noise location. Thus, when using a handset with M1 approximately 4 cm from the mouth, the required C is approximately 0.5, and for noise at approximately 1 meter the C is approximately 0.96. Thus, for the noise, the estimate of C = 0.5 means that for the noise C is underestimated, and the noise will be removed. The amount of removal will depend directly on (1 - n). Therefore, this algorithm uses the direction and the range to the signal to separate the signal from the noise.

[0077] One issue that arises involves the stability of this technique. Specifically, the deconvolution of (1 - H̃1H̃2) raises the question of stability, as the need

[0078] Fortunately, the choice of H2 eliminates the need for a deconvolution. From the discussion above, the signal can be written as

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)]

[0079] which can be rewritten as

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - Cz^-1 H1(z)]
