`61/045,377
`
`PATENT NUMBER
`
`GROUP ART UNIT
`
`FILE WRAPPER LOCATION
`
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS
P.O. Box 1450
Alexandria, Virginia 22313-1450
www.uspto.gov
`
`Correspondence Address/Fee Address Change
`
`The following fields have been set to Customer Number 98195 on 08/09/2010
`• Correspondence Address
`• Maintenance Fee Address
`• Power of Attorney Address
`
`The address of record for Customer Number 98195 is:
`
`98195
`Gregory & Martensen LLP
`2018 Bissonnet Street
`Houston, TX 77005
`
`PART 1 - ATTORNEY/APPLICANT COPY
`
`Amazon v. Jawbone
`U.S. Patent 11,122,357
`Amazon Ex. 1011
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
APPLICATION NUMBER: 61/045,377
FILING or 371(c) DATE: 04/16/2008
GRP ART UNIT:
FIL FEE REC'D: 210
ATTY. DOCKET NO.: ALPH.P035P2
TOT CLAIMS:
IND CLAIMS:

53186
COURTNEY STANIFORD & GREGORY LLP
P.O. BOX 9686
SAN JOSE, CA 95157

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS
P.O. Box 1450
Alexandria, Virginia 22313-1450
www.uspto.gov

CONFIRMATION NO. 2839
FILING RECEIPT
`
`Date Mailed: 05/12/2008
`
`Receipt is acknowledged of this provisional patent application. It will not be examined for patentability and will
`become abandoned not later than twelve months after its filing date. Any correspondence concerning the application
`must include the following identification information: the U.S. APPLICATION NUMBER, FILING DATE, NAME OF
`APPLICANT, and TITLE OF INVENTION. Fees transmitted by check or draft are subject to collection. Please verify
`the accuracy of the data presented on this receipt. If an error is noted on this Filing Receipt, please submit
`a written request for a Filing Receipt Correction. Please provide a copy of this Filing Receipt with the
`changes noted thereon. If you received a "Notice to File Missing Parts" for this application, please submit
`any corrections to this Filing Receipt with your reply to the Notice. When the USPTO processes the reply
`to the Notice, the USPTO will generate another Filing Receipt incorporating the requested corrections
`
`Applicant(s)
`
`Greg Burnett, Dodge Center, MN;
`Power of Attorney: The patent practitioners associated with Customer Number 53186
`
`If Required, Foreign Filing License Granted: 05/08/2008
The country code and number of your priority application, to be used for filing abroad under the Paris Convention,
is US 61/045,377
`Projected Publication Date: None, application is not eligible for pre-grant publication
`Non-Publication Request: No
`Early Publication Request: No
`Title
`
`Generalized Omnidirectional Implementation of an Adaptive Noise Suppression System
`
`PROTECTING YOUR INVENTION OUTSIDE THE UNITED STATES
`
`Since the rights granted by a U.S. patent extend only throughout the territory of the United States and have no
`effect in a foreign country, an inventor who wishes patent protection in another country must apply for a patent
`in a specific country or in regional patent offices. Applicants may wish to consider the filing of an international
`application under the Patent Cooperation Treaty (PCT). An international (PCT) application generally has the same
`effect as a regular national patent application in each PCT-member country. The PCT process simplifies the filing
`of patent applications on the same invention in member countries, but does not result in a grant of "an international
`patent" and does not eliminate the need of applicants to file additional documents and fees in countries where patent
`protection is desired.
`
`
`
`
`Almost every country has its own patent law, and a person desiring a patent in a particular country must make an
`application for patent in that country in accordance with its particular laws. Since the laws of many countries differ
`in various respects from the patent law of the United States, applicants are advised to seek guidance from specific
`foreign countries to ensure that patent rights are not lost prematurely.
`
Applicants also are advised that in the case of inventions made in the United States, the Director of the USPTO must
`issue a license before applicants can apply for a patent in a foreign country. The filing of a U.S. patent application
`serves as a request for a foreign filing license. The application's filing receipt contains further information and
`guidance as to the status of applicant's license for foreign filing.
`
`Applicants may wish to consult the USPTO booklet, "General Information Concerning Patents" (specifically, the
`section entitled "Treaties and Foreign Patents") for more information on timeframes and deadlines for filing foreign
`patent applications. The guide is available either by contacting the USPTO Contact Center at 800-786-9199, or it
`can be viewed on the USPTO website at http://www.uspto.gov/web/offices/pac/doc/general/index.html.
`
`For information on preventing theft of your intellectual property (patents, trademarks and copyrights), you may wish
`to consult the U.S. Government website, http://www.stopfakes.gov. Part of a Department of Commerce initiative,
`this website includes self-help "toolkits" giving innovators guidance on how to protect intellectual property in specific
`countries such as China, Korea and Mexico. For questions regarding patent enforcement issues, applicants may
call the U.S. Government hotline at 1-866-999-HALT (1-866-999-4258).
`
`LICENSE FOR FOREIGN FILING UNDER
`
`Title 35, United States Code, Section 184
`
`Title 37, Code of Federal Regulations, 5.11 & 5.15
`
`GRANTED
`
`The applicant has been granted a license under 35 U.S.C. 184, if the phrase "IF REQUIRED, FOREIGN FILING
`LICENSE GRANTED" followed by a date appears on this form. Such licenses are issued in all applications where
`the conditions for issuance of a license have been met, regardless of whether or not a license may be required as
`set forth in 37 CFR 5.15. The scope and limitations of this license are set forth in 37 CFR 5.15(a) unless an earlier
`license has been issued under 37 CFR 5.15(b). The license is subject to revocation upon written notification. The
`date indicated is the effective date of the license, unless an earlier license of similar scope has been granted under
`37 CFR 5.13 or 5.14.
`
`This license is to be retained by the licensee and may be used at any time on or after the effective date thereof unless
`it is revoked. This license is automatically transferred to any related applications(s) filed under 37 CFR 1.53(d). This
`license is not retroactive.
`
`The grant of a license does not in any way lessen the responsibility of a licensee for the security of the subject matter
`as imposed by any Government contract or the provisions of existing laws relating to espionage and the national
`security or the export of technical data. Licensees should apprise themselves of current regulations especially with
`respect to certain countries, of other agencies, particularly the Office of Defense Trade Controls, Department of
`State (with respect to Arms, Munitions and Implements of War (22 CFR 121-128)); the Bureau of Industry and
Security, Department of Commerce (15 CFR parts 730-774); the Office of Foreign Assets Control, Department of
`Treasury (31 CFR Parts 500+) and the Department of Energy.
`
`
`
`
`NOT GRANTED
`
`No license under 35 U.S.C. 184 has been granted at this time, if the phrase "IF REQUIRED, FOREIGN FILING
`LICENSE GRANTED" DOES NOT appear on this form. Applicant may still petition for a license under 37 CFR 5.12,
`if a license is desired before the expiration of 6 months from the filing date of the application. If 6 months has lapsed
`from the filing date of this application and the licensee has not received any indication of a secrecy order under 35
`U.S.C. 181, the licensee may foreign file the application pursuant to 37 CFR 5.15(b).
`
`
`
`
`Attorney Docket No. ALPH.P035P2
`
`Transmittal of Documents
`
Certification Under 37 CFR §1.8(a)

April 16, 2008
Date of Transmission

Transmitted via

USPTO EFS

I hereby certify that this document, and any other accompanying documents referred to herein are being
transmitted to the United States Patent Office via EFS in accordance with 37 C.F.R. §1.6(a)(4) on the
date indicated above.
`
`Jerry Donnard
`
`(Print Name of Person Transmitting Documents)
`
`Application Data Sheet;
`Provisional Patent Application;
`Drawings; and
`Electronic payment of filing fee.
`
`
`
`
`Attorney Docket No. ALPH.P035P2
`Filing Date April 16, 2008
`
`United States Patent Application for
`Generalized Omnidirectional Implementation of
`an Adaptive Noise Suppression System
`
`Inventor: Greg Burnett
`
`
`RELATED APPLICATION
`
`This application is related to United States (US) Patent Application
`
`Number 60/953,444, filed August 1, 2007.
`
`TECHNICAL FIELD
`
`The disclosure herein relates generally to noise suppression. In
`
`particular, this disclosure relates to noise suppression systems, devices, and
`
`methods for use in acoustic applications.
`
`INCORPORATION BY REFERENCE
`
`Each patent, patent application, and/or publication mentioned in this
`
`specification is herein incorporated by reference in its entirety to the same
`
`extent as if each individual patent, patent application, and/or publication was
`
`specifically and individually indicated to be incorporated by reference.
`
`
`Abstract
`
`Embodiments of a noise suppression system are described. A dual
`omnidirectional microphone array (DOMA) implementation of a noise
`suppression system is presented. Two virtual microphones are constructed
`using two physical omnidirectional microphones. The virtual directional
`microphones are then used along with a VAD signal and an adaptive noise
`suppression algorithm to remove a significant amount of noise without
`adversely affecting the speech. The array herein is significantly different
`from conventional arrays in that the only null in the virtual microphones of an
`embodiment is for the user's speech, not the noise. The array and
corresponding algorithm of an embodiment significantly increase the signal-to-noise
ratio (SNR) of the user's speech signal and have been demonstrated
`to be robust with respect to variations in array-to-user's-mouth distance and
`orientation as well as ambient temperature, to name a few.
`
`
`Brief Description of Figures
`
Figure 1: A two-microphone adaptive noise suppression system.

Figure 2: Array and speech source (S) configuration, under an embodiment. The
microphones are separated by a distance approximately equal to 2d_0, and the
speech source is located a distance d_s away from the midpoint of the array at
an angle θ. The system is axially symmetric so only d_s and θ need be specified.

Figure 3: Flow diagram for a first order gradient microphone using two
omnidirectional elements O1 and O2, under an embodiment. The output of each is
delayed, multiplied by a constant, and subtracted from the other to get the output.

Figure 4: Linear response of V2 to a 1 kHz speech source at a distance of 0.1 m,
under an embodiment. Note the null at 0 degrees, where the speech is normally
located.

Figure 5: Linear response of V2 to a 1 kHz noise source at a distance of 1.0 m,
under an embodiment. Note that there is no null - all noise sources are detected.

Figure 6: Linear response of V1 to a 1 kHz speech source at a distance of 0.1 m,
under an embodiment. There is no null and the response for speech is greater
than that shown in Figure 4.

Figure 7: Linear response of V1 to a 1 kHz noise source at a distance of 1.0 m,
under an embodiment. There is no null and the response is very similar to V2
shown in Figure 5.

Figure 8: Linear response of V1 to a speech source at a distance of 0.1 m for
frequencies of 100, 500, 1000, 2000, 3000, and 4000 Hz, under an embodiment.

Figure 9: Comparison of frequency responses for speech for the array of an
embodiment and for a conventional cardioid microphone.

Figure 10: Speech response for V1 (top, dashed) and V2 (bottom, solid) vs. B
with d_s assumed to be 0.1 m, under an embodiment. Note that the spatial null in
V2 is rather broad.

Figure 11: Ratio of V1 / V2 speech responses shown in Figure 10 versus B, under
an embodiment. Note that the ratio is above 10 dB for all 0.8 < B < 1.1. This
means that the physical β of the system need not be exactly modeled for good
performance.

Figure 12: Plot of B vs. actual d_s assuming that d_s = 10 cm and theta = 0,
under an embodiment.

Figure 13: Plot of B vs. theta with d_s = 10 cm and assuming d_s = 10 cm, under
an embodiment.

Figure 14: Amplitude (top) and phase (bottom) response of N(s) with B = 1 and
D = -7.2 µsec, under an embodiment. The resulting phase difference clearly
affects high frequencies more than low.

Figure 15: Amplitude (top) and phase (bottom) response of N(s) with B = 1.2 and
D = -7.2 µsec, under an embodiment. Non-unity B affects the entire frequency
range.

Figure 16: Amplitude (top) and phase (bottom) response of the effect on the
speech cancellation in V2 due to a mistake in the location of the speech source
with q1 = 0 degrees and q2 = 30 degrees, under an embodiment. Note that the
cancellation is still below -10 dB for frequencies below 6 kHz.

Figure 17: Amplitude (top) and phase (bottom) response of the effect on the
speech cancellation in V2 due to a mistake in the location of the speech source
with q1 = 0 degrees and q2 = 45 degrees, under an embodiment. Now the
cancellation is below -10 dB only for frequencies below about 2.8 kHz and a
reduction in performance is expected.

Figure 18: Experimental results for a 2d_0 = 19 mm array using a linear β of
0.83 on a Bruel and Kjaer Head and Torso Simulator (HATS) in very loud (~85 dBA)
music/speech noise environment, under an embodiment. The noise has been reduced
by about 25 dB and the speech hardly affected, with no noticeable distortion.
`
`
`Definitions
`
`Unless otherwise specified, the following terms have the corresponding meanings in
`addition to any meaning or understanding they may convey to one skilled in the
`art:
`
Bleedthrough: The undesired presence of noise during speech.
Denoising: Removing unwanted noise from Mic1.
Devoicing: Removing/distorting the desired speech from Mic1.
Directional microphone (DM): A physical directional microphone that is vented on
both sides of the sensing diaphragm.
Mic1 (M1): General designation for an adaptive noise suppression system
microphone that usually contains more speech than noise.
Mic2 (M2): General designation for an adaptive noise suppression system
microphone that usually contains more noise than speech.
Noise: Unwanted environmental acoustic noise.
Null: A zero or minimum in the spatial response of a physical or virtual directional
microphone.

O1, O2: The physical omnidirectional microphones used to form the array.
Speech: Desired speech of the user.
SSM: Skin Surface Microphone used in an earpiece (e.g., the Jawbone earpiece
available from Aliph of San Francisco, California) to detect speech vibrations on the
user's skin.

V1: The virtual directional "speech" microphone - has no nulls.
V2: The virtual directional "noise" microphone - has a null for the user's speech.
VAD: Voice Activity Detection signal - indicates when user speech is detected.
Virtual directional microphones (VM): A microphone constructed using two or more
omnidirectional microphones and signal processing.
`
`
`I. Background
`
`Adaptive noise suppression algorithms have been around for some time. The basic
`algorithmic flow is shown in Figure 1. Two or more microphones are used to
`
`sample both an (unwanted) acoustic noise field and the (desired) speech of a user.
`
`The noise relationship between the microphones is then determined using an
`adaptive filter (such as Least-Mean-Squares as described in Haykin & Widrow,
`
`ISBN# 0471215708, Wiley, 2002, but any adaptive or stationary system
`
`identification algorithm may be used) and that relationship used to filter the noise
`
`from the desired signal.
`
Most of the noise suppression systems in use today for speech communication
systems are based on a single-microphone spectral subtraction technique first
developed in the 1970s and described, for example, by S. F. Boll in "Suppression of
Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp.
113-120, 1979. These techniques have been refined over the years, but the basic
principles of operation have remained the same. See, for example, United States
Patent Number 5,687,243 of McLaughlin, et al., and United States Patent Number
4,811,404 of Vilmur, et al. There have also been several attempts at
multi-microphone noise suppression systems, such as those outlined in United States
Patent Number 5,406,622 of Silverberg et al. and United States Patent Number
5,463,694 of Bradley et al. Multi-microphone systems have not been very
successful for a variety of reasons, the most compelling being poor noise
cancellation performance and/or significant speech distortion. Primarily,
conventional multi-microphone systems attempt to increase the SNR of the user's
speech by "steering" the nulls of the system to the strongest noise sources. The
number of noise sources this approach can remove is limited by the number of
available nulls.
`
`The Jawbone earpiece, introduced by Aliphcom in December 2006, was the first
`
`known commercial product to use a pair of physical directional microphones
`
`(instead of omnidirectional microphones) to reduce environmental acoustic noise.
`
`
`The technology behind the Jawbone is currently described under one or more of US
`
`Patent Number 7,246,058 by Burnett and/or US Patent Application Numbers
`
`10/400,282, 10/667,207, and/or 10/769,302. Generally, multi-microphone
`
`techniques make use of an acoustic-based Voice Activity Detector (VAD) to
`
`determine the background noise characteristics, where "voice" is generally
`
`understood to include human voiced speech, unvoiced speech, or a combination of
`
`voiced and unvoiced speech. The Jawbone improved on this by using a
`
`microphone-based sensor to construct a VAD signal using directly detected speech
`
vibrations in the user's cheek. This allowed the Jawbone to aggressively remove
noise when the user was not producing speech. However, the current Jawbone
`
`product uses a directional microphone array which can be difficult to manufacture
`
`reliably, and an omnidirectional array replacement was desired.
`
`II. Adaptive noise suppression basics
`
In analyzing the single noise source 101 and the direct path to the microphones,
with reference to Figure 1, the total acoustic information coming into MIC 1 (102,
which can be a physical or virtual microphone) is denoted by m1(n). The total
acoustic information coming into MIC 2 (103, which can also be a physical or
virtual microphone) is similarly labeled m2(n). In the z (digital frequency) domain,
these are represented as M1(z) and M2(z). Then,

$$
\begin{aligned}
M_1(z) &= S(z) + N_2(z) \\
M_2(z) &= N(z) + S_2(z)
\end{aligned}
$$

with

$$
\begin{aligned}
N_2(z) &= N(z)H_1(z) \\
S_2(z) &= S(z)H_2(z),
\end{aligned}
$$

so that

$$
\begin{aligned}
M_1(z) &= S(z) + N(z)H_1(z) \\
M_2(z) &= N(z) + S(z)H_2(z).
\end{aligned}
\qquad \text{(Eq. 1)}
$$
`
`This is the general case for all two microphone systems. In a practical system
`
`there is always going to be some leakage of noise into MIC 1, and some leakage of
`
`
`speech into MIC 2. Equation 1 has four unknowns and only two known
`relationships and therefore cannot be solved explicitly.
`
`However, there is another way to solve for some of the unknowns in Equation 1.
`The analysis starts with an examination of the case where the speech is not being
`generated, that is, where a signal from the VAD subsystem 104 equals zero. In this
`case, s(n) = S(z) = 0, and Equation 1 reduces to
`
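$$
\begin{aligned}
M_{1N}(z) &= N(z)H_1(z) \\
M_{2N}(z) &= N(z),
\end{aligned}
$$

where the N subscript on the M variables indicates that only noise is being received.
This leads to

$$
\begin{aligned}
M_{1N}(z) &= M_{2N}(z)H_1(z) \\
H_1(z) &= \frac{M_{1N}(z)}{M_{2N}(z)}.
\end{aligned}
\qquad \text{(Eq. 2)}
$$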
`
`The function H1(z) can be calculated using any of the available system identification
`algorithms and the microphone outputs when the system is certain that only noise
`is being received. The calculation can be done adaptively, so that the system can
`react to changes in the noise.
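
As one illustration of the noise-only identification step, the sketch below adapts an FIR estimate of H1(z) with a normalized LMS update while the VAD is zero. It is a minimal example under stated assumptions, not the filter of any particular embodiment: the function name `nlms_identify`, the tap count, the step size `mu`, and the synthetic impulse response `true_h1` are all hypothetical, and any other system identification algorithm could be substituted.

```python
import random


def nlms_identify(reference, target, taps=8, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that the filtered reference tracks the target."""
    w = [0.0] * taps
    buf = [0.0] * taps                      # recent reference samples, newest first
    for x, d in zip(reference, target):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - y                           # modelling error for this sample
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w


# Hypothetical noise-only segment (VAD = 0): MIC 2 receives the noise directly,
# MIC 1 receives the same noise filtered by a made-up H1(z).
random.seed(0)
true_h1 = [0.6, -0.3, 0.1]
m2 = [random.gauss(0.0, 1.0) for _ in range(4000)]
m1 = [sum(h * m2[n - k] for k, h in enumerate(true_h1) if n - k >= 0)
      for n in range(len(m2))]

h1_est = nlms_identify(m2, m1, taps=4)      # estimate of H1 = M1N / M2N
```

With this noise-free synthetic data the estimated taps settle on `true_h1`. In a running system the update would be frozen whenever the VAD indicates speech, so that the estimate reflects noise only.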
`
A solution is now available for H1(z), one of the unknowns in Equation 1. The
remaining transfer function, H2(z), can be determined by using the instances where
speech is being produced and the VAD equals one. When this is occurring, but the
recent (perhaps less than 1 second) history of the microphones indicates low levels
of noise, it can be assumed that n(n) = N(z) ≈ 0. Then Equation 1 reduces to

$$
\begin{aligned}
M_{1S}(z) &= S(z) \\
M_{2S}(z) &= S(z)H_2(z),
\end{aligned}
$$

which in turn leads to

$$
\begin{aligned}
M_{2S}(z) &= M_{1S}(z)H_2(z) \\
H_2(z) &= \frac{M_{2S}(z)}{M_{1S}(z)},
\end{aligned}
$$

which is the inverse of the H1(z) calculation. However, it is noted that different
inputs are being used (now only the speech is occurring whereas before only the
noise was occurring). While calculating H2(z), the values calculated for H1(z) are
held constant (and vice versa) and it is assumed that the noise level is not high
enough to cause errors in the H2(z) calculation.
`
After calculating H1(z) and H2(z), they are used to remove the noise from the
signal. If Equation 1 is rewritten as

$$
\begin{aligned}
S(z) &= M_1(z) - N(z)H_1(z) \\
N(z) &= M_2(z) - S(z)H_2(z),
\end{aligned}
$$

then N(z) may be substituted as shown to solve for S(z):

$$
\begin{aligned}
S(z) &= M_1(z) - \left[M_2(z) - S(z)H_2(z)\right]H_1(z) \\
S(z)\left[1 - H_2(z)H_1(z)\right] &= M_1(z) - M_2(z)H_1(z),
\end{aligned}
$$

so that

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_1(z)H_2(z)}.
\qquad \text{(Eq. 3)}
$$
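
Equation 3 can be checked numerically at a single frequency bin. The sketch below is illustrative only: the function names and all complex values are made up, and the simplified form used for comparison is Equation 3 with H2(z) taken as zero.

```python
import cmath


def recover_speech(m1, m2, h1, h2):
    """Equation 3 evaluated at one frequency bin (all arguments complex)."""
    return (m1 - m2 * h1) / (1 - h1 * h2)


def recover_speech_simplified(m1, m2, h1):
    """Equation 3 with H2(z) taken as zero."""
    return m1 - m2 * h1


# Made-up single-bin values, for illustration only.
s = 1.0 + 0.5j                      # speech S(z)
n = 3.0 - 2.0j                      # noise N(z), deliberately louder than the speech
h1 = 0.8 * cmath.exp(-0.3j)         # noise path into MIC 1
h2 = 0.1 * cmath.exp(0.7j)          # speech leakage into MIC 2

m1 = s + n * h1                     # Equation 1
m2 = n + s * h2

s_full = recover_speech(m1, m2, h1, h2)
s_simple = recover_speech_simplified(m1, m2, h1)
```

Here `s_full` reproduces `s` to rounding error regardless of how loud the noise is, while `s_simple` equals `s * (1 - h1 * h2)`: the noise is still removed, but the speech is scaled by a factor that departs from unity as the leakage H2 grows.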
`
If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy,
then the noise can be completely removed and the original signal recovered. This
remains true without respect to the amplitude or spectral characteristics of the
noise. If there is very little or no leakage from the speech source into M2, then
H2(z) ≈ 0 and Equation 3 reduces to

$$
S(z) \approx M_1(z) - M_2(z)H_1(z).
\qquad \text{(Eq. 4)}
$$
`
`This is much simpler to implement and is very stable, assuming H1(z) is stable.
`However, if significant speech energy is in M2(z), devoicing can occur. In order to
`construct a well-performing system and use Equation 4, consideration is given to
`the following:
`
`
R1. Availability of a perfect (or at least very good) VAD in noisy conditions.
R2. Sufficiently accurate H1(z).
R3. Very small (ideally zero) H2(z).
R4. During speech production, H1(z) cannot change substantially.
R5. During noise, H2(z) cannot change substantially.
`
R1 is easy to satisfy if the SNR of the desired speech to the unwanted noise is high
enough. "Enough" means different things depending on the method of VAD
generation. If a VAD vibration sensor is used, as in Burnett 7,256,048, accurate
VAD in very low SNRs (-10 dB or less) is possible. Acoustic-only methods using
information from O1 and O2 can also return accurate VADs, but are limited to SNRs
of ~3 dB or greater for adequate performance.
`
R5 is normally simple to satisfy because for most applications the microphones will
not change position with respect to the user's mouth very often or rapidly. In those
applications where it may happen (such as hands-free conferencing systems) it can
be satisfied by configuring Mic2 so that H2(z) ≈ 0.

Satisfying R2, R3, and R4 is more difficult but is possible given the right
combination of V1 and V2. Methods are examined below that have proven to be
effective in satisfying the above, resulting in excellent noise suppression
performance and minimal speech removal and distortion in an embodiment.
`
III. A different omnidirectional array solution

A generalized two-microphone array is shown in Figure 2. Two omnidirectional
microphones 201 and 202 are placed a distance 2d_0 apart and a speech source 200
is located a distance d_s away at an angle of θ. This array is axially symmetric
(at least in free space), so no other angle is needed. The output from each
microphone can be delayed (z1 and z2), multiplied by a gain (A1 and A2), and then
summed with the other as demonstrated in Figure 3. This operation can be over any
frequency range desired. By varying the magnitude and sign of the delays and
gains, a wide variety of virtual directional microphones (VMs) can be realized.
There are other methods known to those skilled in the art for constructing VMs but
this is a common one and will be used in the enablement below.
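
The delay, gain, and combine operation can be sketched in the time domain as follows. This is a simplified illustration, not the processing chain of an embodiment: the helper names `delay` and `virtual_mic` are hypothetical, only non-negative integer delays are handled, and the subtraction is equivalent to summing with a negative gain.

```python
def delay(signal, samples):
    """Delay a sequence by a non-negative integer number of samples (zero-padded)."""
    if samples < 0:
        raise ValueError("this sketch handles non-negative integer delays only")
    n = len(signal)
    k = min(samples, n)
    return [0.0] * k + list(signal[:n - k])


def virtual_mic(o1, o2, delay1, gain1, delay2, gain2):
    """Delay and scale each omni signal, then subtract the second branch from the first."""
    a = delay(o1, delay1)
    b = delay(o2, delay2)
    return [gain1 * x - gain2 * y for x, y in zip(a, b)]


# Hypothetical source whose wavefront reaches O2 one sample after O1 at equal amplitude.
o1 = [1.0, 2.0, 3.0, 0.0]
o2 = [0.0, 1.0, 2.0, 3.0]

# Delaying O1 by the same one sample before subtracting places a null on that source.
null_output = virtual_mic(o1, o2, delay1=1, gain1=1.0, delay2=0, gain2=1.0)
```

Changing the delays or gains moves or removes the null, which is how a single pair of omnidirectional elements can yield differently shaped virtual microphones.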
`
`The construction of VMs for the adaptive noise suppression system of an
`
`embodiment includes:
`
1. Very similar noise response in V1 and V2. This means that H1(z) is simple to
model and will not change much during speech, satisfying R2 and R4 above
and allowing strong denoising and minimized bleedthrough.
2. Very little speech response for V2. This means that H2(z) ≈ 0, which will
satisfy R3 and R5 above.
3. Sufficient speech response for V1 so that the cleaned speech will have
significantly higher SNR than the original speech captured by O1.
`
It is assumed below that the responses of the omnidirectional microphones O1 and
O2 to an identical acoustic source have been normalized so that they have exactly
the same response (amplitude and phase) to that source. This can be
accomplished using standard microphone array methods (such as frequency-based
calibration) well known to those versed in the art.
`
Using the second guideline of the VMs as a starting point (e.g., very little speech
response for V2), it is seen that for discrete systems V2(z) can be represented as:

$$
V_2(z) = O_2(z) - z^{-\gamma}\beta\,O_1(z)
$$

where

$$
\begin{aligned}
\beta &= \frac{d_1}{d_2} \\
\gamma &= \frac{d_2 - d_1}{c}\cdot f_s \quad \text{(samples)} \\
d_1 &= \sqrt{d_s^2 - 2 d_s d_0 \cos(\theta) + d_0^2} \\
d_2 &= \sqrt{d_s^2 + 2 d_s d_0 \cos(\theta) + d_0^2}
\end{aligned}
$$

The distances d_1 and d_2 are the distances from O1 and O2 to the speech source,
respectively, and γ is their difference divided by c, the speed of sound, and
multiplied by the sampling frequency f_s. Thus γ is in samples, but need not be an
integer. For non-integer γ, fractional-delay filters (well known to those versed in
the art) may be used.
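
As a worked example of these definitions, the sketch below computes the two path lengths, their ratio, and the inter-microphone delay in samples. The function name `array_geometry` is hypothetical, the speed of sound of 343 m/s is an assumed room-temperature value, and the numbers plugged in (half-spacing 10.7 mm, source at 10 cm on axis, 16 kHz sampling) are the ones quoted elsewhere in this description.

```python
import math


def array_geometry(d_s, theta, d_0, f_s, c=343.0):
    """Return (d_1, d_2, ratio, delay_samples) for a source at range d_s and angle theta.

    d_0 is half the microphone spacing in metres, f_s the sampling rate in Hz,
    and c an assumed speed of sound in m/s.
    """
    d_1 = math.sqrt(d_s**2 - 2 * d_s * d_0 * math.cos(theta) + d_0**2)
    d_2 = math.sqrt(d_s**2 + 2 * d_s * d_0 * math.cos(theta) + d_0**2)
    ratio = d_1 / d_2                       # ratio of the two path lengths
    delay_samples = (d_2 - d_1) / c * f_s   # path difference expressed in samples
    return d_1, d_2, ratio, delay_samples


d_1, d_2, ratio, delay_samples = array_geometry(
    d_s=0.10, theta=0.0, d_0=0.0107, f_s=16000.0)
```

For this geometry the ratio comes out near 0.81 and the delay near one sample; because the delay depends on c, it shifts slightly with ambient temperature and is in general not an integer.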
`
It is important to note that the β above is NOT the conventional β used to denote
the mixing of VMs in adaptive beamforming; it is a physical variable of the system
that depends on the intra-microphone distance d_0 (which is fixed) and the distance
d_s and angle θ, which can vary. As shown below, for properly calibrated
microphones, it is not necessary for the system to be programmed with the exact
β of the array. Errors of approximately 10-15% in the actual β (i.e. the β used by
the algorithm is not the β of the physical array) have been used with very little
degradation in quality. The algorithmic value of β may be calculated and set for a
particular user or may be calculated adaptively during speech production when little
or no noise is present. However, adaptation during use is not required for nominal
performance.
`
The above formulation for V2(z) has a null at the speech location and will therefore
exhibit minimal response to the speech. This is shown in Figure 4 for an array
with d_0 = 10.7 mm and a speech source on the axis of the array (θ = 0) at 10 cm.
Note that the speech null at zero degrees is not present for noise in the far field, as
shown in Figure 5 with a noise source distance of approximately 1 meter. This
ensures that noise in front of the user will be detected so that it can be removed.
This differs from conventional systems that can have difficulty removing noise in
the direction of the mouth of the user.
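
The near-field null and its absence in the far field can be reproduced with a simple free-field model. The sketch below is an assumption-laden illustration rather than the method used to produce the figures: it treats the source as a point radiator with 1/r spreading, assumes a speed of sound of 343 m/s, and takes V2(z) = O2(z) - z^(-γ) β O1(z), the form implied by the noise-case expression for V2N(z), with β and γ fixed at the design speech location. All function and variable names are hypothetical.

```python
import cmath
import math

C = 343.0  # assumed speed of sound, m/s


def mic_distances(r, theta, d_0):
    """Path lengths from a point source at (r, theta) to O1 and O2."""
    d_1 = math.sqrt(r**2 - 2 * r * d_0 * math.cos(theta) + d_0**2)
    d_2 = math.sqrt(r**2 + 2 * r * d_0 * math.cos(theta) + d_0**2)
    return d_1, d_2


def v2_response(freq, r, theta, d_0, d_s_design, theta_design=0.0):
    """Magnitude of V2 for a unit point source, with the null designed for the speech location."""
    d1s, d2s = mic_distances(d_s_design, theta_design, d_0)
    ratio = d1s / d2s                          # beta at the design point
    tau = (d2s - d1s) / C                      # gamma expressed in seconds
    w = 2 * math.pi * freq
    d_1, d_2 = mic_distances(r, theta, d_0)
    o1 = cmath.exp(-1j * w * d_1 / C) / d_1    # free-field spherical spreading
    o2 = cmath.exp(-1j * w * d_2 / C) / d_2
    return abs(o2 - ratio * cmath.exp(-1j * w * tau) * o1)


speech_on_axis = v2_response(1000.0, r=0.10, theta=0.0, d_0=0.0107, d_s_design=0.10)
noise_on_axis = v2_response(1000.0, r=1.0, theta=0.0, d_0=0.0107, d_s_design=0.10)
```

Under this model `speech_on_axis` is zero to rounding error, while `noise_on_axis` is clearly nonzero: at 1 m the two path lengths are nearly equal, so the amplitude ratio at the microphones no longer matches the design β and the subtraction does not cancel.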
`
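The V1(z) can be formulated using the general form for V1(z):

$$
V_1(z) = \alpha_A O_1(z)\,z^{-d_A} - \alpha_B O_2(z)\,z^{-d_B}
$$

Since

$$
V_2(z) = O_2(z) - z^{-\gamma}\beta\,O_1(z)
$$

then for noise in the forward direction

$$
O_{2N}(z) = O_{1N}(z)\,z^{-\gamma}
$$

and

$$
\begin{aligned}
V_{2N}(z) &= O_{1N}(z)\,z^{-\gamma} - z^{-\gamma}\beta\,O_{1N}(z) \\
V_{2N}(z) &= (1-\beta)\left(O_{1N}(z)\,z^{-\gamma}\right).
\end{aligned}
$$

If this is set equal to V1(z) above, the result is

$$
V_{1N}(z) = \alpha_A O_{1N}(z)\,z^{-d_A} - \alpha_B O_{1N}(z)\,z^{-\gamma}z^{-d_B}
= (1-\beta)\left(O_{1N}(z)\,z^{-\gamma}\right),
$$

thus

$$
d_A = \gamma, \quad d_B = 0, \quad \alpha_A = 1, \quad \alpha_B = \beta
$$

and

$$
V_1(z) = O_1(z)\,z^{-\gamma} - \beta\,O_2(z).
$$

This formulation assures that the noise response will be as similar as possible and
that the speech response will be proportional to (1 - β²). Since β is the ratio of the
distances from O1 and O2 to the speech source, it is affected by the size of the
array and the distance from the array to the speech source.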
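
The stated speech and noise responses can be verified in a few lines. The check below assumes free-field 1/r spreading, so that speech from the design location arrives at O2 as a copy of the O1 signal scaled by β and delayed by γ samples, and assumes V1(z) = O1(z) z^(-γ) - β O2(z) and V2(z) = O2(z) - z^(-γ) β O1(z); the value β = 0.8 in the last line is purely illustrative.

```latex
\begin{aligned}
\text{Speech: } O_{2S}(z) &= \beta\,z^{-\gamma}\,O_{1S}(z) \\
V_{2S}(z) &= \beta\,z^{-\gamma}O_{1S}(z) - z^{-\gamma}\beta\,O_{1S}(z) = 0 \\
V_{1S}(z) &= O_{1S}(z)\,z^{-\gamma} - \beta\cdot\beta\,z^{-\gamma}O_{1S}(z)
           = (1-\beta^{2})\,z^{-\gamma}\,O_{1S}(z) \\[4pt]
\text{Noise: } O_{2N}(z) &= z^{-\gamma}\,O_{1N}(z) \\
V_{1N}(z) &= O_{1N}(z)\,z^{-\gamma} - \beta\,z^{-\gamma}O_{1N}(z)
           = (1-\beta)\,z^{-\gamma}\,O_{1N}(z) = V_{2N}(z) \\[4pt]
\text{Example: } \beta = 0.8 &\;\Rightarrow\; 1-\beta^{2} = 0.36, \qquad 1-\beta = 0.2
\end{aligned}
```

For forward noise the two virtual microphones therefore respond identically, while for speech V2 is nulled and V1 retains a response scaled by (1 - β²).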
`
`
The response of V1 to speech is shown in Figure 6, and the response to noise in
Figure 7. Note the difference in speech response compared to V2 shown in Figure
4 and the similarity of noise response shown in Figure 5. Also note that the
orientation of the speech response for V1 shown in Figure 6 is completely opposite
the orientation of conventional systems, where the main lobe of response is
normally oriented toward the speech source. The orientation of an embodiment, in
which the main lobe of the speech response of V1 is oriented away from the speech
source, means that the speech sensitivity of V1 is lower than a normal directional
microphone but is flat for all frequencies within approximately ±30 degrees of the
axis of the array, as shown in Figure 8. This flatness of response for speech
means that no shaping postfilter is needed to restore omnidirectional frequency
response. This does come at a price - as shown in Figure 9, which shows the
speech response of V1 and the speech response of a cardioid microphone. The
speech response of V1 is approximately 0 to ~13 dB less than a normal directional
microphone between approximately 500 and 7500 Hz and approximately 0 to 10+
dB greater than a directional microphone below approximately 500 Hz and above
7500 Hz for a sampling frequency of approximately 16000 Hz. However, the
superior noise suppression made possible using this system more than
compensates for the initially poorer SNR.
`
`It should be noted that Figures 4-7 assume the speech is located at approximately
`
`0 degrees and approximately 10 cm and the noise at all angles and approximately
`
`1.0 meters. Generally, the noise distance is not required to be 1 m or more, but
`
`the denoising is the best for those distances. For distances less than approximately
`
`1 m, denoising will not be as effective due to the greater dissimilarity in the noise
`
responses of V1 and V2. This has not proven to be an impediment in practical use
`
`- in fact, it can be seen as a feature. Any "noise" source that is ~10 cm away from
`
`the earpiece is likely to be desired to be captured and transmitted.
`
The definitions for V1 and V2 above mean that for noise H1(z) is:

$$
H_1(z) = \frac{V_1(z)}{V_2(z)}
= \frac{-\beta\,O_2(z) + O_1(z)\,z^{-\gamma}}{O_2(z) - z^{-\gamma}\beta\,O_1(z)}
$$

which has the form of an allpass filter. Such a filter has the advantage of being
easily and accurately modeled, especially in magnitude response, satisfying R2.
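
The allpass property can be checked numerically for noise that reaches both omnidirectional elements at equal amplitude. The sketch below is illustrative only: it assumes H1(z) = V1(z)/V2(z) with V1(z) = O1(z) z^(-γ) - β O2(z) and V2(z) = O2(z) - z^(-γ) β O1(z), models the noise at O2 as a pure delayed copy of the noise at O1, and uses made-up values for β and the delays. The function name `h1_noise` is hypothetical.

```python
import cmath
import math


def h1_noise(freq, beta, tau_design, tau_noise):
    """H1 = V1 / V2 at one frequency for equal-amplitude noise at O1 and O2.

    tau_design is gamma expressed in seconds; tau_noise is the arrival delay of
    the noise at O2 relative to O1 in seconds (negative if it reaches O2 first).
    """
    w = 2 * math.pi * freq
    o1 = 1.0 + 0.0j
    o2 = cmath.exp(-1j * w * tau_noise)     # same amplitude, delayed copy
    zg = cmath.exp(-1j * w * tau_design)    # z^(-gamma) on the unit circle
    v1 = o1 * zg - beta * o2
    v2 = o2 - beta * zg * o1
    return v1 / v2


beta = 0.8067                               # illustrative value
tau_design = 0.0214 / 343.0                 # illustrative path difference over assumed c
magnitudes = [abs(h1_noise(f, beta, tau_design, t))
              for f in (100.0, 1000.0, 4000.0)
              for t in (-tau_design, 0.0, 0.5 * tau_design)]
```

Every entry of `magnitudes` equals one to rounding error: the numerator and denominator have the same magnitude whenever the noise amplitudes at O1 and O2 match, so only the phase of H1 varies with frequency and direction of arrival.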
`
`Clearly the speech null designed into V2 means that the VAD signal is no longer a
`critical component. The VAD's purpose was to ensure that the system would not
`
`train on speech and then subsequently remove it, resulting in speech distortion. If,
`however, V2 contains no speech, the adaptive system cannot train on the speech
`and cannot remove it. As a result, the system can denoise all the time without fear
`
`of devoicing, and the resulting clean audio can then be used to generate a VAD
`
`signal for use in subsequent single-channel noise suppression algorithms such as
`
`spectral subtraction. In addition, constraints on the absolute value of H1(z) (i.e.
`restricting it to absolute values less than two) can keep the system from