Attorney Docket No. ALPH.P010X
`
UNITED STATES PATENT APPLICATION
`
`for
`
Voice Activity Detector (VAD)-Based Multiple-Microphone Acoustic Noise Suppression
`
`Inventors:
`
`Gregory C. Burnett
`
Eric F. Breitfeller
`
`Prepared by
`
`Shemwell Gregory & Courtney LLP
`4880 Stevens Creek Blvd., Suite 201
`San Jose, CA 95129
`408-236-6647
`
`EXPRESS MAIL CERTIFICATE OF MAILING
`
`"Express Mail" mailing label number: EV 326 938 875 US
`Date of Deposit:
`September 18, 2003
`I hereby certify that this paper is being deposited with the United States Postal
`Service "Express Mail Post Office to Addressee" service under 37 CFR § 1.10 on the date
`indicated above and is addressed to Mail Stop Patent Application, Commissioner for
`Patents, PO Box 1450, Alexandria, VA 22313-1450.
`
`Amazon v. Jawbone
`U.S. Patent 8,280,072
`Amazon Ex. 1011
Voice Activity Detector (VAD)-Based Multiple-Microphone Acoustic Noise Suppression
`
`RELATED APPLICATIONS
`
This patent application is a continuation-in-part of United States Patent Application Number 09/905,361, filed July 12, 2001, which claims priority from United States Patent Application Number 60/219,297, filed July 19, 2000. This patent application also claims priority from United States Patent Application Number 10/383,162, filed March 5, 2003.
`
`FIELD OF THE INVENTION
`
`The disclosed embodiments relate to systems and methods for detecting and
`
`processing a desired signal in the presence of acoustic noise.
`
`BACKGROUND
`
Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, United States Patent Number 5,687,243 of McLaughlin, et al., and United States Patent Number 4,811,404 of Vilmur, et al. Generally, these techniques make use of a microphone-based Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.
`
The VAD has also been used in digital cellular systems. As an example of such a use, see United States Patent Number 6,453,291 of Ashley, where a VAD configuration appropriate to the front-end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.
`
These typical microphone-based VAD systems are significantly limited in capability as a result of the addition of environmental acoustic noise to the desired speech signal received by the single microphone, wherein the analysis is performed using typical signal processing techniques. In particular, limitations in performance of these microphone-based VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these microphone-based VADs.
`
`
`BRIEF DESCRIPTION OF THE FIGURES
`
Figure 1 is a block diagram of a denoising system, under an embodiment.

Figure 2 is a block diagram including components of a noise removal algorithm, under the denoising system of an embodiment assuming a single noise source and direct paths to the microphones.

Figure 3 is a block diagram including front-end components of a noise removal algorithm of an embodiment generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).

Figure 4 is a block diagram including front-end components of a noise removal algorithm of an embodiment in a general case where there are n distinct noise sources and signal reflections.

Figure 5 is a flow diagram of a denoising method, under an embodiment.

Figure 6 shows results of a noise suppression algorithm of an embodiment for an American English female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements.

Figure 7A is a block diagram of a Voice Activity Detector (VAD) system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.

Figure 7B is a block diagram of a VAD system using hardware of a coupled noise suppression system for use in receiving VAD information, under an alternative embodiment.

Figure 8 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.

Figure 9 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.

Figure 10 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
`
`
Figure 11 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
`
`
`DETAILED DESCRIPTION
`
The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of the noise suppression system. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the noise suppression system. In the following description, "signal" represents any acoustic signal (such as human speech) that is desired, and "noise" is any acoustic signal (which may include human speech) that is not desired. An example would be a person talking on a cellular telephone with a radio in the background. The person's speech is desired and the acoustic energy from the radio is not desired. In addition, "user" describes a person who is using the device and whose speech is desired to be captured by the system.
`
Also, "acoustic" is generally defined as acoustic waves propagating in air. Propagation of acoustic waves in media other than air will be noted as such. References to "speech" or "voice" generally refer to human speech including voiced speech, unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech or voiced speech is distinguished where necessary. The term "noise suppression" generally describes any method by which noise is reduced or eliminated in an electronic signal.
`
Moreover, the term "VAD" is generally defined as a vector or array signal, data, or information that in some manner represents the occurrence of speech in the digital or analog domain. A common representation of VAD information is a one-bit digital signal sampled at the same rate as the corresponding acoustic signals, with a zero value representing that no speech has occurred during the corresponding time sample, and a unity value indicating that speech has occurred during the corresponding time sample. While the embodiments described herein are generally described in the digital domain, the descriptions are also valid for the analog domain.
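The one-bit VAD representation described above can be illustrated with a minimal sketch. The helper name `vad_from_intervals`, the 8 kHz sample rate, and the speech interval are assumptions chosen for illustration and are not taken from the application.

```python
def vad_from_intervals(num_samples, sample_rate_hz, speech_intervals_s):
    """Return a one-bit VAD list with one entry per acoustic sample.

    speech_intervals_s is a list of (start, end) times in seconds during
    which speech is assumed to occur; the end time is exclusive.
    """
    vad = [0] * num_samples
    for start_s, end_s in speech_intervals_s:
        first = max(0, int(round(start_s * sample_rate_hz)))
        last = min(num_samples, int(round(end_s * sample_rate_hz)))
        for i in range(first, last):
            vad[i] = 1  # unity value: speech occurred during this sample
    return vad


# Hypothetical example: 1 second of audio at 8 kHz, speech from 0.25 s to 0.5 s.
vad = vad_from_intervals(8000, 8000, [(0.25, 0.5)])
```

Because the list has one entry per acoustic sample, `vad[i]` can be read alongside sample `i` of either microphone signal.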
`
Figure 1 is a block diagram of a denoising system 1000 of an embodiment that uses knowledge of when speech is occurring derived from physiological information on voicing activity. The system 1000 includes microphones 10 and sensors 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.
`
Figure 2 is a block diagram including components of a noise removal algorithm 200 of an embodiment. A single noise source and a direct path to the microphones are assumed. An operational description of the noise removal algorithm 200 of an embodiment is provided using a single signal source 100 and a single noise source 101, but is not so limited. This algorithm 200 uses two microphones: a "signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 2"), but is not so limited. The signal microphone MIC 1 is assumed to capture mostly signal with some noise, while MIC 2 captures mostly noise with some signal. The data from the signal source 100 to MIC 1 is denoted by $s(n)$, where $s(n)$ is a discrete sample of the analog signal from the source 100. The data from the signal source 100 to MIC 2 is denoted by $s_2(n)$. The data from the noise source 101 to MIC 2 is denoted by $n(n)$. The data from the noise source 101 to MIC 1 is denoted by $n_2(n)$. Similarly, the data from MIC 1 to noise removal element 205 is denoted by $m_1(n)$, and the data from MIC 2 to noise removal element 205 is denoted by $m_2(n)$.
`
The noise removal element 205 also receives a signal from a voice activity detection (VAD) element 204. The VAD 204 uses physiological information to determine when a speaker is speaking. In various embodiments, the VAD can include at least one of an accelerometer, a skin surface microphone in physical contact with skin of a user, a human tissue vibration detector, a radio frequency (RF) vibration and/or motion detector/device, an electroglottograph, an ultrasound device, an acoustic microphone that is being used to detect acoustic frequency signals that correspond to the user's speech directly from the skin of the user (anywhere on the body), an airflow detector, and a laser vibration detector.
`
The transfer functions from the signal source 100 to MIC 1 and from the noise source 101 to MIC 2 are assumed to be unity. The transfer function from the signal source 100 to MIC 2 is denoted by $H_2(z)$, and the transfer function from the noise source 101 to MIC 1 is denoted by $H_1(z)$. The assumption of unity transfer functions does not inhibit the generality of this algorithm, as the actual relations between the signal, noise, and microphones are simply ratios and the ratios are redefined in this manner for simplicity.
`
`
In conventional two-microphone noise removal systems, the information from MIC 2 is used to attempt to remove noise from MIC 1. However, a (generally unspoken) assumption is that the VAD element 204 is never perfect, and thus the denoising must be performed cautiously, so as not to remove too much of the signal along with the noise. However, if the VAD 204 is assumed to be perfect such that it is equal to zero when there is no speech being produced by the user, and equal to one when speech is produced, a substantial improvement in the noise removal can be made.
`
In analyzing the single noise source 101 and the direct path to the microphones, with reference to Figure 2, the total acoustic information coming into MIC 1 is denoted by $m_1(n)$. The total acoustic information coming into MIC 2 is similarly labeled $m_2(n)$. In the z (digital frequency) domain, these are represented as $M_1(z)$ and $M_2(z)$. Then,

$$\begin{aligned} M_1(z) &= S(z) + N_2(z) \\ M_2(z) &= N(z) + S_2(z) \end{aligned}$$

with

$$\begin{aligned} N_2(z) &= N(z)H_1(z) \\ S_2(z) &= S(z)H_2(z), \end{aligned}$$

so that

$$\begin{aligned} M_1(z) &= S(z) + N(z)H_1(z) \\ M_2(z) &= N(z) + S(z)H_2(z). \end{aligned} \qquad \text{Eq. 1}$$

This is the general case for all two microphone systems. In a practical system there is always going to be some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two known relationships and therefore cannot be solved explicitly.
`
However, there is another way to solve for some of the unknowns in Equation 1. The analysis starts with an examination of the case where the signal is not being generated, that is, where a signal from the VAD element 204 equals zero and speech is not being produced. In this case, $s(n) = S(z) = 0$, and Equation 1 reduces to

$$\begin{aligned} M_{1n}(z) &= N(z)H_1(z) \\ M_{2n}(z) &= N(z), \end{aligned}$$

where the n subscript on the M variables indicates that only noise is being received. This leads to

$$\begin{aligned} M_{1n}(z) &= M_{2n}(z)H_1(z) \\ H_1(z) &= \frac{M_{1n}(z)}{M_{2n}(z)}. \end{aligned} \qquad \text{Eq. 2}$$
`
The function $H_1(z)$ can be calculated using any of the available system identification algorithms and the microphone outputs when the system is certain that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.
`
A solution is now available for one of the unknowns in Equation 1. Another unknown, $H_2(z)$, can be determined by using the instances where the VAD equals one and speech is being produced. When this is occurring, but the recent (perhaps less than 1 second) history of the microphones indicates low levels of noise, it can be assumed that $n(n) = N(z) \approx 0$. Then Equation 1 reduces to

$$\begin{aligned} M_{1s}(z) &= S(z) \\ M_{2s}(z) &= S(z)H_2(z), \end{aligned}$$

which in turn leads to

$$\begin{aligned} M_{2s}(z) &= M_{1s}(z)H_2(z) \\ H_2(z) &= \frac{M_{2s}(z)}{M_{1s}(z)}, \end{aligned}$$

which is the inverse of the $H_1(z)$ calculation. However, it is noted that different inputs are being used (now only the signal is occurring whereas before only the noise was occurring). While calculating $H_2(z)$, the values calculated for $H_1(z)$ are held constant and vice versa. Thus, it is assumed that while one of $H_1(z)$ and $H_2(z)$ is being calculated, the one not being calculated does not change substantially.
`
After calculating $H_1(z)$ and $H_2(z)$, they are used to remove the noise from the signal. If Equation 1 is rewritten as

$$\begin{aligned} S(z) &= M_1(z) - N(z)H_1(z) \\ N(z) &= M_2(z) - S(z)H_2(z) \\ S(z) &= M_1(z) - [M_2(z) - S(z)H_2(z)]H_1(z) \\ S(z)[1 - H_2(z)H_1(z)] &= M_1(z) - M_2(z)H_1(z), \end{aligned}$$

then $N(z)$ may be substituted as shown to solve for $S(z)$ as

$$S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}. \qquad \text{Eq. 3}$$

If the transfer functions $H_1(z)$ and $H_2(z)$ can be described with sufficient accuracy, then the noise can be completely removed and the original signal recovered. This remains true without respect to the amplitude or spectral characteristics of the noise. The only assumptions made include use of a perfect VAD, sufficiently accurate $H_1(z)$ and $H_2(z)$, and that when one of $H_1(z)$ and $H_2(z)$ is being calculated the other does not change substantially. In practice these assumptions have proven reasonable.
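The relationships in Equations 1 through 3 can be checked numerically at a single frequency. The following is a minimal sketch under stated assumptions: the function names and all numeric values are hypothetical, each spectrum is reduced to one complex number, and the VAD is taken to be perfect as the text assumes.

```python
import cmath


def mic_outputs(s, n, h1, h2):
    """Equation 1 at one frequency: returns (M1, M2)."""
    return s + n * h1, n + s * h2


def estimate_h1(m1n, m2n):
    """Equation 2: H1 = M1n / M2n, using noise-only data (VAD = 0)."""
    return m1n / m2n


def estimate_h2(m1s, m2s):
    """H2 = M2s / M1s, using speech-only data (VAD = 1, negligible noise)."""
    return m2s / m1s


def recover_signal(m1, m2, h1, h2):
    """Equation 3: S = (M1 - M2*H1) / (1 - H2*H1)."""
    return (m1 - m2 * h1) / (1 - h2 * h1)


# Hypothetical single-frequency values chosen only for illustration.
true_h1 = cmath.rect(0.4, 0.3)    # noise source to MIC 1
true_h2 = cmath.rect(0.2, -0.7)   # signal source to MIC 2

# Training: noise only, then speech only.
h1 = estimate_h1(*mic_outputs(0, 1.5 + 0.5j, true_h1, true_h2))
h2 = estimate_h2(*mic_outputs(2 - 1j, 0, true_h1, true_h2))

# Use: speech and noise present together.
s_true, n_true = 0.8 + 0.6j, -3 + 2j
m1, m2 = mic_outputs(s_true, n_true, true_h1, true_h2)
s_hat = recover_signal(m1, m2, h1, h2)
```

In this idealized setting `s_hat` matches `s_true` to floating-point precision even though the noise is larger than the signal, which is consistent with the statement that the result does not depend on the noise amplitude.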
`
The noise removal algorithm described herein is easily generalized to include any number of noise sources. Figure 3 is a block diagram including front-end components 300 of a noise removal algorithm of an embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. There are several noise sources shown, each with a transfer function, or path, to each microphone. The previously named path $H_2$ has been relabeled as $H_0$, so that labeling noise source 2's path to MIC 1 is more convenient. The outputs of each microphone, when transformed to the z domain, are:

$$\begin{aligned} M_1(z) &= S(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \dots + N_n(z)H_n(z) \\ M_2(z) &= S(z)H_0(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \dots + N_n(z)G_n(z). \end{aligned} \qquad \text{Eq. 4}$$
`
When there is no signal (VAD = 0), then (suppressing z for clarity)

$$\begin{aligned} M_{1n} &= N_1H_1 + N_2H_2 + \dots + N_nH_n \\ M_{2n} &= N_1G_1 + N_2G_2 + \dots + N_nG_n. \end{aligned} \qquad \text{Eq. 5}$$

A new transfer function can now be defined as

$$\tilde{H}_1 = \frac{M_{1n}}{M_{2n}} = \frac{N_1H_1 + N_2H_2 + \dots + N_nH_n}{N_1G_1 + N_2G_2 + \dots + N_nG_n}, \qquad \text{Eq. 6}$$

where $\tilde{H}_1$ is analogous to $H_1(z)$ above. Thus $\tilde{H}_1$ depends only on the noise sources and their respective transfer functions and can be calculated any time there is no signal being transmitted. Once again, the "n" subscripts on the microphone inputs denote only that noise is being detected, while an "s" subscript denotes that only signal is being received by the microphones.
`
Examining Equation 4 while assuming an absence of noise produces

$$\begin{aligned} M_{1s} &= S \\ M_{2s} &= SH_0. \end{aligned}$$

Thus, $H_0$ can be solved for as before, using any available transfer function calculating algorithm. Mathematically, then,

$$H_0 = \frac{M_{2s}}{M_{1s}}.$$

Rewriting Equation 4, using $\tilde{H}_1$ defined in Equation 6, provides,

$$\tilde{H}_1 = \frac{M_1 - S}{M_2 - SH_0}. \qquad \text{Eq. 7}$$

Solving for S yields,

$$S = \frac{M_1 - M_2\tilde{H}_1}{1 - H_0\tilde{H}_1}, \qquad \text{Eq. 8}$$

which is the same as Equation 3, with $H_0$ taking the place of $H_2$, and $\tilde{H}_1$ taking the place of $H_1$. Thus the noise removal algorithm still is mathematically valid for any number of noise sources, including multiple echoes of noise sources. Again, if $H_0$ and $\tilde{H}_1$ can be estimated to a high enough accuracy, and the above assumption of only one path from the signal to the microphones holds, the noise may be removed completely.
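Equations 4 through 8 can be checked in the same single-frequency fashion. This is a minimal sketch with hypothetical names and values. It assumes that the noise sources keep the same relative spectra between the noise-only interval and the interval in which speech is present; here they are only scaled by a common factor, which leaves the ratio in Equation 6 unchanged.

```python
def h1_tilde(noise, h_paths, g_paths):
    """Equation 6: ratio of the noise-only microphone spectra M1n / M2n."""
    m1n = sum(n * h for n, h in zip(noise, h_paths))
    m2n = sum(n * g for n, g in zip(noise, g_paths))
    return m1n / m2n


def mic_outputs_multi(s, noise, h_paths, g_paths, h0):
    """Equation 4 at one frequency (z suppressed): returns (M1, M2)."""
    m1 = s + sum(n * h for n, h in zip(noise, h_paths))
    m2 = s * h0 + sum(n * g for n, g in zip(noise, g_paths))
    return m1, m2


def recover_signal_multi(m1, m2, h1t, h0):
    """Equation 8: S = (M1 - M2*H1~) / (1 - H0*H1~)."""
    return (m1 - m2 * h1t) / (1 - h0 * h1t)


# Hypothetical values for three noise sources at one frequency.
noise = [1.0 + 0.5j, -0.7 + 0.2j, 0.3 - 0.9j]
h_paths = [0.5 + 0.1j, 0.2 - 0.3j, -0.1 + 0.4j]   # noise paths to MIC 1
g_paths = [1.0 + 0.0j, 0.8 + 0.2j, 0.9 - 0.1j]    # noise paths to MIC 2
h0 = 0.15 - 0.05j                                 # signal path to MIC 2
s_true = 0.6 - 0.4j

h1t = h1_tilde(noise, h_paths, g_paths)           # trained while VAD = 0
louder = [2.0 * n for n in noise]                 # same sources, twice as loud
m1, m2 = mic_outputs_multi(s_true, louder, h_paths, g_paths, h0)
s_hat = recover_signal_multi(m1, m2, h1t, h0)
```

The recovered `s_hat` equals `s_true` to floating-point precision without any of the individual paths being known, only their combined ratio.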
`
The most general case involves multiple noise sources and multiple signal sources. Figure 4 is a block diagram including front-end components 400 of a noise removal algorithm of an embodiment in the most general case where there are n distinct noise sources and signal reflections. Here, signal reflections enter both microphones MIC 1 and MIC 2. This is the most general case, as reflections of the noise source into the microphones MIC 1 and MIC 2 can be modeled accurately as simple additional noise sources. For clarity, the direct path from the signal to MIC 2 is changed from $H_0(z)$ to $H_{00}(z)$, and the reflected paths to MIC 1 and MIC 2 are denoted by $H_{01}(z)$ and $H_{02}(z)$, respectively.

The input into the microphones now becomes

$$\begin{aligned} M_1(z) &= S(z) + S(z)H_{01}(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \dots + N_n(z)H_n(z) \\ M_2(z) &= S(z)H_{00}(z) + S(z)H_{02}(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \dots + N_n(z)G_n(z). \end{aligned} \qquad \text{Eq. 9}$$
`
When the VAD = 0, the inputs become (suppressing z again)

$$\begin{aligned} M_{1n} &= N_1H_1 + N_2H_2 + \dots + N_nH_n \\ M_{2n} &= N_1G_1 + N_2G_2 + \dots + N_nG_n, \end{aligned}$$

which is the same as Equation 5. Thus, the calculation of $\tilde{H}_1$ in Equation 6 is unchanged, as expected. In examining the situation where there is no noise, Equation 9 reduces to

$$\begin{aligned} M_{1s} &= S + SH_{01} \\ M_{2s} &= SH_{00} + SH_{02}. \end{aligned}$$

This leads to the definition of $\tilde{H}_2$ as

$$\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}}. \qquad \text{Eq. 10}$$

Rewriting Equation 9 again using the definition for $\tilde{H}_1$ (as in Equation 7) provides

$$\tilde{H}_1 = \frac{M_1 - S(1 + H_{01})}{M_2 - S(H_{00} + H_{02})}. \qquad \text{Eq. 11}$$
`
Some algebraic manipulation yields

$$\begin{aligned} S\left(1 + H_{01} - \tilde{H}_1(H_{00} + H_{02})\right) &= M_1 - M_2\tilde{H}_1 \\ S(1 + H_{01})\left[1 - \tilde{H}_1\frac{H_{00} + H_{02}}{1 + H_{01}}\right] &= M_1 - M_2\tilde{H}_1 \\ S(1 + H_{01})\left[1 - \tilde{H}_1\tilde{H}_2\right] &= M_1 - M_2\tilde{H}_1, \end{aligned}$$

and finally

$$S(1 + H_{01}) = \frac{M_1 - M_2\tilde{H}_1}{1 - \tilde{H}_1\tilde{H}_2}. \qquad \text{Eq. 12}$$

Equation 12 is the same as Equation 8, with the replacement of $H_0$ by $\tilde{H}_2$, and the addition of the $(1 + H_{01})$ factor on the left side. This extra factor $(1 + H_{01})$ means that S cannot be solved for directly in this situation, but a solution can be generated for the signal plus the addition of all of its echoes. This is not such a bad situation, as there are many conventional methods for dealing with echo suppression, and even if the echoes are not suppressed, it is unlikely that they will affect the comprehensibility of the speech to any meaningful extent. The more complex calculation of $\tilde{H}_2$ is needed to account for the signal echoes in MIC 2, which act as noise sources.
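The statement that Equation 12 yields the signal plus its echo, rather than the signal alone, can be illustrated with a short single-frequency sketch. All names and values are hypothetical, and the combined noise terms of Equation 9 are lumped into two numbers, `noise_in_mic1` and `noise_in_mic2`, for brevity.

```python
def h2_tilde(h00, h01, h02):
    """Equation 10: H2~ = (H00 + H02) / (1 + H01)."""
    return (h00 + h02) / (1 + h01)


def recover_signal_plus_echo(m1, m2, h1t, h2t):
    """Equation 12: S*(1 + H01) = (M1 - M2*H1~) / (1 - H1~*H2~)."""
    return (m1 - m2 * h1t) / (1 - h1t * h2t)


# Hypothetical single-frequency values.
s_true = 0.6 - 0.4j
h00, h01, h02 = 0.15 - 0.05j, 0.1 + 0.05j, 0.02 - 0.03j
noise_in_mic1 = 0.7 + 0.8j   # stands for N1*H1 + ... + Nn*Hn
noise_in_mic2 = 0.6 - 0.3j   # stands for N1*G1 + ... + Nn*Gn

h1t = noise_in_mic1 / noise_in_mic2               # Equation 6
h2t = h2_tilde(h00, h01, h02)

# Equation 9 with the noise terms lumped together.
m1 = s_true * (1 + h01) + noise_in_mic1
m2 = s_true * (h00 + h02) + noise_in_mic2

result = recover_signal_plus_echo(m1, m2, h1t, h2t)
```

Here `result` equals `s_true * (1 + h01)`, that is, the signal together with its reflection into MIC 1, which matches the left side of Equation 12.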
`
Figure 5 is a flow diagram 500 of a denoising algorithm, under an embodiment. In operation, the acoustic signals are received, at block 502. Further, physiological information associated with human voicing activity is received, at block 504. A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time, at block 506. A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time, at block 508. Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing denoised acoustic data streams, at block 510.
`
An algorithm for noise removal, or denoising algorithm, is described herein, from the simplest case of a single noise source with a direct path to multiple noise sources with reflections and echoes. The algorithm has been shown herein to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of $\tilde{H}_1$ and $\tilde{H}_2$, and if one does not change substantially while the other is calculated. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.
`
In operation, the algorithm of an embodiment has shown excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, there are always approximations and adjustments that have to be made when moving from mathematical concepts to engineering applications. One assumption is made in Equation 3, where $H_2(z)$ is assumed small and therefore $H_2(z)H_1(z) \approx 0$, so that Equation 3 reduces to

$$S(z) \approx M_1(z) - M_2(z)H_1(z).$$

This means that only $H_1(z)$ has to be calculated, speeding up the process and reducing the number of computations required considerably. With the proper selection of microphones, this approximation is easily realized.
`
Another approximation involves the filter used in an embodiment. The actual $H_1(z)$ will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero Finite Impulse Response (FIR) filter is used. With enough taps the approximation to the actual $H_1(z)$ can be very good.
To further increase the performance of the noise suppression system, the spectrum of interest (generally about 125 to 3700 Hz) is divided into subbands. The wider the range of frequencies over which a transfer function must be calculated, the more difficult it is to calculate it accurately. Therefore the acoustic data was divided into 16 subbands, and the denoising algorithm was then applied to each subband in turn. Finally, the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well, but any combination of subbands (e.g., 4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used and all have been found to work better than a single subband.
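The divide, process, and recombine structure described above can be sketched as follows. This is only an illustration under stated assumptions: the application does not say how the subbands were formed, so the sketch uses equal-width bands built by masking bins of a naive discrete Fourier transform over the whole spectrum of one short frame, and the per-band "denoiser" is a placeholder that returns its input. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def split_into_subbands(x, num_bands):
    """Split a real frame into num_bands equal-width, real-valued subbands."""
    n = len(x)
    half = n // 2
    spectrum = dft(x)
    bands = []
    for b in range(num_bands):
        masked = [0j] * n
        for k in range(n):
            f = min(k, n - k)  # bin distance from DC, so mirrored bins share a band
            if f * num_bands // (half + 1) == b:
                masked[k] = spectrum[k]
        bands.append(idft_real(masked))
    return bands


def process_in_subbands(x, num_bands, denoise):
    """Apply denoise() to each subband, then sum the results sample by sample."""
    processed = [denoise(band) for band in split_into_subbands(x, num_bands)]
    return [sum(samples) for samples in zip(*processed)]


frame = [math.sin(2 * math.pi * 3 * t / 64) + 0.5 * math.sin(2 * math.pi * 20 * t / 64)
         for t in range(64)]
# With an identity "denoiser", recombining the 16 subbands returns the frame.
rebuilt = process_in_subbands(frame, 16, lambda band: band)
```

Because every frequency bin is assigned to exactly one band, summing the subbands reproduces the input when nothing is removed, so any change in the output comes from the per-band processing alone.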
`
The amplitude of the noise was constrained in an embodiment so that the microphones used did not saturate (that is, operate outside a linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised (down to -10 dB or less).
`
The calculation of $H_1(z)$ is accomplished every 10 milliseconds using the Least-Mean Squares (LMS) method, a common adaptive technique for estimating transfer functions. An explanation may be found in "Adaptive Signal Processing" (1985), by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0. The LMS was used for demonstration purposes, but many other system identification techniques can be used to identify $H_1(z)$ and $H_2(z)$ in Figure 2.
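A minimal sketch of LMS identification of an FIR approximation to $H_1(z)$, followed by the simplified subtraction $S(z) \approx M_1(z) - M_2(z)H_1(z)$, is given below. The tap count, step size, path coefficients, and test signals are hypothetical values chosen for illustration. The sketch also assumes that no speech leaks into MIC 2 and that the true path is itself a three-tap FIR filter, so the identification can be exact.

```python
import math
import random


def lms_identify(reference, desired, num_taps, step_size):
    """Adapt FIR weights w so that sum_k w[k]*reference[n-k] tracks desired[n]."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent reference samples, newest first
    for x, d in zip(reference, desired):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
    return weights


def fir_filter(weights, x):
    """Apply an all-zero (FIR) filter to the sequence x."""
    out = []
    history = [0.0] * len(weights)
    for sample in x:
        history = [sample] + history[:-1]
        out.append(sum(w * h for w, h in zip(weights, history)))
    return out


rng = random.Random(0)
true_h1 = [0.5, -0.3, 0.1]                        # hypothetical noise path to MIC 1

# Training while VAD = 0: MIC 2 hears the noise, MIC 1 hears it through H1.
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
h1_estimate = lms_identify(noise, fir_filter(true_h1, noise),
                           num_taps=3, step_size=0.05)

# Use while VAD = 1: subtract the filtered MIC 2 signal from MIC 1.
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
new_noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic1 = [s + v for s, v in zip(speech, fir_filter(true_h1, new_noise))]
mic2 = new_noise
cleaned = [a - b for a, b in zip(mic1, fir_filter(h1_estimate, mic2))]
```

In a frame-based system the weight update would run only on frames where the VAD is zero, and the weights would be held constant otherwise, in keeping with the requirement that training not occur on speech.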
`
`
The VAD for an embodiment is derived from a radio frequency sensor and the two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced speech. The VAD of an embodiment uses a radio frequency (RF) vibration detector interferometer to detect tissue motion associated with human speech production, but is not so limited. The signal from the RF device is completely acoustic-noise free, and is able to function in any acoustic noise environment. A simple energy measurement of the RF signal can be used to determine if voiced speech is occurring. Unvoiced speech can be determined using conventional acoustic-based methods, by proximity to voiced sections determined using the RF sensor or similar voicing sensors, or through a combination of the above. Since there is much less energy in unvoiced speech, its detection accuracy is not as critical to good noise suppression performance as is voiced speech.
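The "simple energy measurement" mentioned above can be sketched as a per-frame threshold test. The frame length (10 milliseconds at an assumed 8 kHz rate), the threshold, the function name, and the synthetic sensor signal are all assumptions for illustration; the application does not specify them.

```python
import math


def energy_vad(sensor_samples, frame_length, threshold):
    """Return one VAD bit per frame: 1 if the frame's mean energy exceeds threshold."""
    bits = []
    for start in range(0, len(sensor_samples) - frame_length + 1, frame_length):
        frame = sensor_samples[start:start + frame_length]
        energy = sum(v * v for v in frame) / frame_length
        bits.append(1 if energy > threshold else 0)
    return bits


# Hypothetical sensor signal: 20 ms quiet, 20 ms of 100 Hz vibration, 10 ms quiet.
quiet = [0.01] * 160
voiced = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
sensor = quiet + voiced + [0.01] * 80

vad_bits = energy_vad(sensor, frame_length=80, threshold=0.01)
```

For this synthetic input the two frames containing the vibration are marked as voiced and the quiet frames are not.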
`
With voiced and unvoiced speech detected reliably, the algorithm of an embodiment can be implemented. Once again, it is useful to repeat that the noise removal algorithm does not depend on how the VAD is obtained, only that it is accurate, especially for voiced speech. If speech is not detected and training occurs on the speech, the subsequent denoised acoustic data can be distorted.
`
Data was collected in four channels, one for MIC 1, one for MIC 2, and two for the radio frequency sensor that detected the tissue motions associated with voiced speech. The data were sampled simultaneously at 40 kHz, then digitally filtered and decimated down to 8 kHz. The high sampling rate was used to reduce any aliasing that might result from the analog to digital process. A four-channel National Instruments A/D board was used along with Labview to capture and store the data. The data was then read into a C program and denoised 10 milliseconds at a time.
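The filter-then-decimate step from 40 kHz to 8 kHz can be sketched as below. The application does not describe the digital filter that was used, so the Hamming-windowed sinc design, the 101-tap length, the 3700 Hz cutoff, and the test tones are assumptions made only for this illustration.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz            # normalized cutoff in cycles/sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]


def decimate(x, factor, taps):
    """Low-pass filter x, then keep every factor-th output sample."""
    out = []
    for n in range(0, len(x), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * x[n - k]
        out.append(acc)
    return out


fs_in = 40000
taps = lowpass_taps(101, 3700.0, fs_in)

# A 1 kHz tone plus a 15 kHz tone that would alias to 1 kHz at 8 kHz if unfiltered.
wanted = [math.sin(2 * math.pi * 1000 * n / fs_in) for n in range(2000)]
mixture = [w + math.sin(2 * math.pi * 15000 * n / fs_in) for n, w in enumerate(wanted)]

decimated = decimate(mixture, 5, taps)    # 40 kHz -> 8 kHz
reference = decimate(wanted, 5, taps)     # same processing without the 15 kHz tone
```

After the filter's start-up transient, `decimated` and `reference` agree closely, which indicates that the out-of-band tone was suppressed before the sample rate was reduced rather than folded into the speech band.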
`
Figure 6 shows a denoised audio signal 602 output upon application of the noise suppression algorithm of an embodiment to a dirty acoustic signal 604, under an embodiment. The dirty acoustic signal 604 includes speech of an American English-speaking female in the presence of airport terminal noise where the noise includes many other human speakers and public announcements. The speaker is uttering the numbers "406 5562" in the midst of moderate airport terminal noise. The dirty acoustic signal 604 was denoised 10 milliseconds at a time, and before denoising the 10 milliseconds of data were prefiltered from 50 to 3700 Hz. A reduction in the noise of approximately 17 dB is evident. No post filtering was done on this sample; thus, all of the noise reduction realized is due to the algorithm of an embodiment. It is clear that the algorithm adjusts to the noise instantly, and is capable of removing the very difficult noise of other human speakers. Many different types of noise have all been tested with similar results, including street noise, helicopters, music, and sine waves. Also, the orientation of the noise can be varied substantially without significantly changing the noise suppression performance. Finally, the distortion of the cleaned speech is very low, ensuring good performance for speech recognition engines and human receivers alike.
`
The noise removal algorithm of an embodiment has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of $\tilde{H}_1$ and $\tilde{H}_2$. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.
`
When using the VAD devices and methods described herein with a noise suppression system, the VAD signal is processed independently of the noise suppression system, so that the receipt and processing of VAD information is independent from the processing associated with the noise suppression, but the embodiments are not so limited. This independence is attained physically (i.e., different hardware for use in receiving and processing signals relating to the VAD and the noise suppression), but is not so limited.
`
The VAD devices/methods described herein generally include vibration and movement sensors, but are not so limited. In one embodiment, an accelerometer is placed on the skin for use in detecting skin surface vibrations that correlate with human speech. These recorded vibrations are then used to calculate a VAD signal for use with or by an adaptive noise suppression algorithm in suppressing environmental acoustic noise from a simultaneously (within a few milliseconds) recorded acoustic signal that includes both speech and noise.
`
Another embodiment of the VAD devices/methods described herein includes an acoustic microphone modified with a membrane so that the microphone no longer efficiently detects acoustic vibrations in air. The membrane, though, allows the microphone to detect acoustic vibrations in objects with which it is in physical contact (allowing a good mechanical impedance match), such as human skin. That is, the acoustic microphone is modified in some way such that it no longer detects acoustic vibrations in air (where it no longer has a good physical impedance match), but only in objects with which the microphone is in contact. This configures the microphone, like the accelerometer, to detect vibrations of human skin associated with the speech production of that human while not efficiently detecting acoustic environmental noise in the air. The detected vibrations are processed to form a VAD signal for use in a noise suppression system, as detailed below.
`
Yet another embodiment of the VAD described herein uses an electromagnetic vibration sensor, such as a radiofrequency (RF) vibrometer or laser vibrometer, which detects skin vibrations. Further, the RF vibrometer detects the movement of tissue within the body, such as the inner surface of the cheek or the tracheal wall. Both the exterior skin and internal tissue vibrations associated with speech production can be used to form a VAD signal for use in a noise suppression system as detailed below.
Figure 7A is a block diagram of a VAD system 702A including hardware for use in receiving and processing signals relating to VAD, under an embodiment. The VAD system 702A includes a VAD device 730 coupled to provide data to a corresponding VAD algorithm 740. Note that noise suppression systems of alternative embodiments can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art. Referring to Figure 1, the voicing sensors 20 include the VAD system 702A, for example, but are not so limited. Referring to Figure 2, the VAD includes the VAD system 702A, for example, but is not so limited.
`
Figure 7B is a block diagram of a VAD system 702B using hardware of the associated noise suppression system 701 for use in receiving VAD information 764, under an embodiment. The VAD system 702B includes a VAD algorithm 750 that receives data 764 from MIC 1 and MIC 2, or other components, of the corresponding signal processing system 700. Alternative embodiments of the noise suppression system can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art.
`
The vibration/movement-based VAD devices described herein include the physical hardware devices for use in receiving and processing signals relating to the VAD and the noise suppression. As a speaker or user produces speech, the resulting vibrations propagate through the tissue of the speak
