`
`DUAL OMNIDIRECTIONAL MICROPHONE ARRAY (DOMA)
`
`Inventor:
`
`Gregory C. Burnett
`
`RELATED APPLICATIONS
`
`This application claims the benefit of United States (US) Patent Application
`
`Numbers 60/934,551, filed June 13, 2007, 60/953,444, filed August 1, 2007,
`
`60/954,712, filed August 8, 2007, and 61/045,377, filed April 16, 2008.
`
`TECHNICAL FIELD
`
`The disclosure herein relates generally to noise suppression.
`
`In particular,
`
`this disclosure relates to noise suppression systems, devices, and methods for use
`
`in acoustic applications.
`
`BACKGROUND
`
`
`Conventional adaptive noise suppression algorithms have been around for
`
`some time. These conventional algorithms have used two or more microphones to
`
`sample both an (unwanted) acoustic noise field and the (desired) speech of a user.
`
`
The noise relationship between the microphones is then determined using an
adaptive filter (such as Least-Mean-Squares as described in Haykin & Widrow,
ISBN# 0471215708, Wiley, 2002, but any adaptive or stationary system
identification algorithm may be used) and that relationship is used to filter the
noise from the desired signal.
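For background illustration only (this sketch is not part of the disclosure; the signal model, filter length, and step size are our assumptions), the adaptive-filter approach described above can be written as a normalized Least-Mean-Squares noise canceller, in which a reference noise microphone is adaptively filtered and subtracted from the primary microphone:

```python
import numpy as np

def nlms_cancel(primary, reference, n_taps=16, mu=0.5, eps=1e-8):
    """Estimate the noise path with NLMS and subtract it from `primary`.

    primary   -- samples containing desired signal plus filtered noise
    reference -- samples dominated by the noise source
    Returns the error signal, i.e. the denoised output.
    """
    w = np.zeros(n_taps)                          # adaptive filter taps
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # most recent samples first
        y = w @ x                                  # filtered noise estimate
        e = primary[n] - y                         # error = denoised sample
        w += mu * e * x / (x @ x + eps)            # normalized LMS update
        out[n] = e
    return out

# Noise-only demo: primary = reference passed through a short unknown FIR path.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
path = np.array([0.6, -0.3, 0.1])                  # assumed noise transfer path
primary = np.convolve(noise, path)[:4000]
denoised = nlms_cancel(primary, noise)
# After convergence the residual noise power drops sharply.
assert np.var(denoised[2000:]) < 0.05 * np.var(primary[2000:])
```

In a real system the adaptation would be gated by a VAD so the filter tracks only the noise relationship, as described below.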
`
`
`Most conventional noise suppression systems currently in use for speech
`
`communication systems are based on a single-microphone spectral subtraction
`
technique first developed in the 1970’s and described, for example, by S. F. Boll in
`
`“Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans.
`
`on ASSP, pp. 113-120, 1979. These techniques have been refined over the years,
`
`
`but the basic principles of operation have remained the same. See, for example,
`
`US Patent Number 5,687,243 of McLaughlin, et al., and US Patent Number
`
`4,811,404 of Vilmur, et al. There have also been several attempts at multi-
`
`microphone noise suppression systems, such as those outlined in US Patent
Amazon v. Jawbone
U.S. Patent 8,280,072
Amazon Ex. 1009
`
`
`
`Attorney Docket No. ALPH.P035
`
Number 5,406,622 of Silverberg et al. and US Patent Number 5,463,694 of Bradley
et al. Multi-microphone systems have not been very successful for a variety of
reasons, the most compelling being poor noise cancellation performance and/or
significant speech distortion. Primarily, conventional multi-microphone systems
attempt to increase the SNR of the user’s speech by “steering” the nulls of the
system to the strongest noise sources. This approach is limited in the number of
noise sources removed by the number of available nulls.
The Jawbone earpiece (referred to as the “Jawbone”), introduced in December
2006 by AliphCom of San Francisco, California, was the first known commercial
product to use a pair of physical directional microphones (instead of omnidirectional
microphones) to reduce environmental acoustic noise. The technology supporting
the Jawbone is currently described under one or more of US Patent Number
7,246,058 by Burnett and/or US Patent Application Numbers 10/400,282,
10/667,207, and/or 10/769,302. Generally, multi-microphone techniques make
use of an acoustic-based Voice Activity Detector (VAD) to determine the
background noise characteristics, where “voice” is generally understood to include
human voiced speech, unvoiced speech, or a combination of voiced and unvoiced
speech. The Jawbone improved on this by using a microphone-based sensor to
construct a VAD signal using directly detected speech vibrations in the user’s cheek.
This allowed the Jawbone to aggressively remove noise when the user was not
producing speech. However, the Jawbone uses a directional microphone array.
`
`INCORPORATION BY REFERENCE
`
Each patent, patent application, and/or publication mentioned in this
specification is herein incorporated by reference in its entirety to the same extent
as if each individual patent, patent application, and/or publication was specifically
and individually indicated to be incorporated by reference.
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a two-microphone adaptive noise suppression system, under an
embodiment.
`
`
`
`
Figure 2 is an array and speech source (S) configuration, under an
embodiment. The microphones are separated by a distance approximately equal to
2d0, and the speech source is located a distance ds away from the midpoint of the
array at an angle θ. The system is axially symmetric so only ds and θ need be
specified.
`
Figure 3 is a block diagram for a first order gradient microphone using two
omnidirectional elements O1 and O2, under an embodiment.
`
Figure 4 is a block diagram for a DOMA including two physical microphones
configured to form two virtual microphones V1 and V2, under an embodiment.
`
`
Figure 5 is a block diagram for a DOMA including two physical microphones
configured to form N virtual microphones V1 through VN, where N is any number
greater than one, under an embodiment.
`
`Figure 6 is an example of a headset or head-worn device that includes the
`
`DOMA, as described herein, under an embodiment.
`
`
`Figure 7 is a flow diagram for denoising acoustic signals using the DOMA,
`
`under an embodiment.
`
Figure 8 is a flow diagram for forming the DOMA, under an embodiment.
`
Figure 9 is a plot of linear response of virtual microphone V2 to a 1 kHz
speech source at a distance of 0.1 m, under an embodiment. The null is at 0
degrees, where the speech is normally located.
`
`Figure 10 is a plot of linear response of virtual microphone V2 to a 1 kHz
`noise source at a distance of 1.0 m, under an embodiment. There is no null and all
`
`noise sources are detected.
`
Figure 11 is a plot of linear response of virtual microphone V1 to a 1 kHz
speech source at a distance of 0.1 m, under an embodiment. There is no null and
the response for speech is greater than that shown in Figure 9.
Figure 12 is a plot of linear response of virtual microphone V1 to a 1 kHz
noise source at a distance of 1.0 m, under an embodiment. There is no null and
the response is very similar to V2 shown in Figure 10.
Figure 13 is a plot of linear response of virtual microphone V1 to a speech
`source at a distance of 0.1 m for frequencies of 100, 500, 1000, 2000, 3000, and
`
`4000 Hz, under an embodiment.
`
`
`
`
`Figure 14 is a plot showing comparison of frequency responses for speech
`for the array of an embodiment and for a conventional cardioid microphone.
Figure 15 is a plot showing speech response for V1 (top, dashed) and V2
(bottom, solid) versus β with ds assumed to be 0.1 m, under an embodiment. The
spatial null in V2 is relatively broad.
Figure 16 is a plot showing a ratio of V1/V2 speech responses shown in
Figure 10 versus β, under an embodiment. The ratio is above 10 dB for all 0.8 < β
< 1.1. This means that the physical β of the system need not be exactly modeled
for good performance.
Figure 17 is a plot of β versus actual ds assuming that ds = 10 cm and theta
= 0, under an embodiment.
Figure 18 is a plot of β versus theta with ds = 10 cm and assuming ds = 10
cm, under an embodiment.
Figure 19 is a plot of amplitude (top) and phase (bottom) response of N(s)
with β = 1 and D = −7.2 μsec, under an embodiment. The resulting phase
difference clearly affects high frequencies more than low.
Figure 20 is a plot of amplitude (top) and phase (bottom) response of N(s)
with β = 1.2 and D = −7.2 μsec, under an embodiment. Non-unity β affects the
entire frequency range.
Figure 21 is a plot of amplitude (top) and phase (bottom) response of the
effect on the speech cancellation in V2 due to a mistake in the location of the
speech source with q1 = 0 degrees and q2 = 30 degrees, under an embodiment.
The cancellation remains below −10 dB for frequencies below 6 kHz.
Figure 22 is a plot of amplitude (top) and phase (bottom) response of the
effect on the speech cancellation in V2 due to a mistake in the location of the
speech source with q1 = 0 degrees and q2 = 45 degrees, under an embodiment.
The cancellation is below −10 dB only for frequencies below about 2.8 kHz and a
reduction in performance is expected.
Figure 23 shows experimental results for a 2d0 = 19 mm array using a
linear β of 0.83 on a Bruel and Kjaer Head and Torso Simulator (HATS) in a very
loud (~85 dBA) music/speech noise environment, under an embodiment. The
noise has been reduced by about 25 dB and the speech hardly affected, with no
noticeable distortion.
`
`DETAILED DESCRIPTION
`
A dual omnidirectional microphone array (DOMA) that provides improved
noise suppression is described herein. Compared to conventional arrays and
algorithms, which seek to reduce noise by nulling out noise sources, the array of an
embodiment is used to form two distinct virtual directional microphones which are
configured to have very similar noise responses and very dissimilar speech
responses. The only null formed by the DOMA is one used to remove the speech of
the user from V2. The two virtual microphones of an embodiment can be paired
with an adaptive filter algorithm and/or VAD algorithm to significantly reduce the
noise without distorting the speech, significantly improving the SNR of the desired
speech over conventional noise suppression systems. The embodiments described
herein are stable in operation, flexible with respect to virtual microphone pattern
choice, and have proven to be robust with respect to speech source-to-array
distance and orientation as well as temperature and calibration techniques.
In the following description, numerous specific details are introduced to
provide a thorough understanding of, and enabling description for, embodiments of
the DOMA. One skilled in the relevant art, however, will recognize that these
embodiments can be practiced without one or more of the specific details, or with
other components, systems, etc. In other instances, well-known structures or
operations are not shown, or are not described in detail, to avoid obscuring aspects
of the disclosed embodiments.
`
`
`Unless otherwise specified, the following terms have the corresponding
`meanings in addition to any meaning or understanding they may convey to one
`
`skilled in the art.
`
The term “bleedthrough” means the undesired presence of noise during
speech.
`The term “denoising” means removing unwanted noise from Mic1, and also
`refers to the amount of reduction of noise energy in a signal in decibels (dB).
`
`
`
`
The term “devoicing” means removing/distorting the desired speech from
Mic1.
`
The term “directional microphone (DM)” means a physical directional
`
`microphone that is vented on both sides of the sensing diaphragm.
The term “Mic1 (M1)” means a general designation for an adaptive noise
`
`suppression system microphone that usually contains more speech than noise.
`The term “Mic2 (M2)” means a general designation for an adaptive noise
`suppression system microphone that usually contains more noise than speech.
`
`The term “noise” means unwanted environmental acoustic noise.
`
`
`The term “null” means a zero or minima in the spatial response of a physical
`
`or virtual directional microphone.
The term “O1” means a first physical omnidirectional microphone used to
form a microphone array.
The term “O2” means a second physical omnidirectional microphone used to
form a microphone array.
`
`The term “speech” means desired speech of the user.
`
The term “Skin Surface Microphone (SSM)” is a microphone used in an
earpiece (e.g., the Jawbone earpiece available from Aliph of San Francisco,
California) to detect speech vibrations on the user’s skin.
The term “V1” means the virtual directional “speech” microphone, which has
no nulls.
`
`The term “V2” means the virtual directional “noise” microphone, which has a
`
`null for the user’s speech.
`The term “Voice Activity Detection (VAD) signal” means a signal indicating
`
when user speech is detected.
The term “virtual microphones (VM)” or “virtual directional microphones”
`means a microphone constructed using two or more omnidirectional microphones
`
`and associated signal processing.
`Figure 1 is a two-microphone adaptive noise suppression system 100, under
`an embodiment. The two-microphone system 100 including the combination of
`physical microphones MIC 1 and MIC 2 along with the processing or circuitry
components to which the microphones couple (described in detail below, but not
`
`
`
`
`
shown in this figure) is referred to herein as the dual omnidirectional microphone
array (DOMA) 110, but the embodiment is not so limited. Referring to Figure 1, in
analyzing the single noise source 101 and the direct path to the microphones, the
total acoustic information coming into MIC 1 (102, which can be a physical or
virtual microphone) is denoted by m1(n). The total acoustic information coming
into MIC 2 (103, which can also be a physical or virtual microphone) is similarly
labeled m2(n).
In the z (digital frequency) domain, these are represented as M1(z)
and M2(z). Then

    M1(z) = S(z) + N2(z)
    M2(z) = N(z) + S2(z)

with

    N2(z) = N(z)H1(z)
    S2(z) = S(z)H2(z),

so that

    M1(z) = S(z) + N(z)H1(z)
    M2(z) = N(z) + S(z)H2(z).          Eq. 1
`
`This is the general case for all two microphone systems. Equation 1 has four
`unknowns and only two known relationships and therefore cannot be solved
`
`explicitly.
However, there is another way to solve for some of the unknowns in
`Equation 1. The analysis starts with an examination of the case where the speech
`is not being generated, that is, where a signal from the VAD subsystem 104
`(optional) equals zero.
In this case, s(n) = S(z) = 0, and Equation 1 reduces to

    M1N(z) = N(z)H1(z)
    M2N(z) = N(z),

where the N subscript on the M variables indicates that only noise is being
received. This leads to

    M1N(z) = M2N(z)H1(z)

    H1(z) = M1N(z) / M2N(z).          Eq. 2
`
`
`
`Attorney Docket No. ALPH.P035
`
The function H1(z) can be calculated using any of the available system
identification algorithms and the microphone outputs when the system is certain
that only noise is being received. The calculation can be done adaptively, so that
the system can react to changes in the noise.
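Equation 2 can be illustrated with a short sketch (ours, not the disclosure’s; the noise path, frame length, and averaging scheme are assumptions). During noise-only frames, H1 is estimated per frequency bin as the averaged cross-spectrum of Mic1 and Mic2 divided by the averaged power spectrum of Mic2:

```python
import numpy as np

rng = np.random.default_rng(1)
n_total, nfft = 16384, 256
noise = rng.standard_normal(n_total)          # N(z): noise-only condition (VAD = 0)
h1 = np.array([0.8, 0.2, -0.1])               # assumed "true" noise path H1(z)
m2 = noise                                    # Mic2 receives the noise directly
m1 = np.convolve(noise, h1)[:n_total]         # Mic1 receives the noise through H1(z)

# Average cross- and auto-spectra over frames, then take their ratio (Eq. 2).
frames = n_total // nfft
cross = np.zeros(nfft, dtype=complex)
auto = np.zeros(nfft)
for k in range(frames):
    seg1 = np.fft.fft(m1[k * nfft:(k + 1) * nfft])
    seg2 = np.fft.fft(m2[k * nfft:(k + 1) * nfft])
    cross += seg1 * np.conj(seg2)
    auto += np.abs(seg2) ** 2
h1_est = cross / auto                          # estimated H1 at each frequency bin
h1_true = np.fft.fft(h1, nfft)
assert np.max(np.abs(h1_est - h1_true)) < 0.1  # close to the true transfer function
```

Averaging over many frames is one way of performing the adaptive calculation mentioned above; a running (exponentially weighted) average reacts to changes in the noise.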
A solution is now available for H1(z), one of the unknowns in Equation 1.
The final unknown, H2(z), can be determined by using the instances where speech
is being produced and the VAD equals one. When this is occurring, but the recent
(perhaps less than 1 second) history of the microphones indicates low levels of
noise, it can be assumed that n(s) = N(z) ≈ 0. Then Equation 1 reduces to

    M1S(z) = S(z)
    M2S(z) = S(z)H2(z),

which in turn leads to

    M2S(z) = M1S(z)H2(z)

    H2(z) = M2S(z) / M1S(z),
`
which is the inverse of the H1(z) calculation. However, it is noted that different
inputs are being used (now only the speech is occurring whereas before only the
noise was occurring). While calculating H2(z), the values calculated for H1(z) are
held constant (and vice versa) and it is assumed that the noise level is not high
enough to cause errors in the H2(z) calculation.
After calculating H1(z) and H2(z), they are used to remove the noise from
the signal.
`
If Equation 1 is rewritten as

    S(z) = M1(z) − N(z)H1(z)
    N(z) = M2(z) − S(z)H2(z),

then N(z) may be substituted as shown to solve for S(z) as

    S(z) = M1(z) − [M2(z) − S(z)H2(z)]H1(z)
    S(z)[1 − H1(z)H2(z)] = M1(z) − M2(z)H1(z)

    S(z) = [M1(z) − M2(z)H1(z)] / [1 − H1(z)H2(z)].          Eq. 3
`
If the transfer functions H1(z) and H2(z) can be described with sufficient
accuracy, then the noise can be completely removed and the original signal
recovered. This remains true without respect to the amplitude or spectral
characteristics of the noise. If there is very little or no leakage from the speech
source into M2, then H2(z) ≈ 0 and Equation 3 reduces to

    S(z) ≈ M1(z) − M2(z)H1(z).          Eq. 4

Equation 4 is much simpler to implement and is very stable, assuming H1(z)
is stable. However, if significant speech energy is in M2(z), devoicing can occur.
In order to construct a well-performing system and use Equation 4, consideration
is given to the following conditions:
`
R1. Availability of a perfect (or at least very good) VAD in noisy conditions
R2. Sufficiently accurate H1(z)
R3. Very small (ideally zero) H2(z).
R4. During speech production, H1(z) cannot change substantially.
R5. During noise, H2(z) cannot change substantially.
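As an illustration only (an idealized sketch with an exactly known H1 and no speech leakage into Mic2, i.e. H2 = 0; the signal model is our assumption), Equation 4 amounts to a single spectral subtraction of the filtered Mic2 signal from Mic1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2048
t = np.arange(n)
speech = np.sin(2 * np.pi * 0.03 * t)           # stand-in "speech" in Mic1 only (H2 = 0)
noise = rng.standard_normal(n)
h1 = np.array([0.5, 0.25])                       # noise path into Mic1, known exactly here
m1 = speech + np.convolve(noise, h1)[:n]         # M1 = S + N*H1
m2 = noise                                       # M2 = N (no speech leakage)

# Equation 4: S(z) ~ M1(z) - M2(z)H1(z), implemented in the frequency domain.
H1 = np.fft.fft(h1, n)
s_hat = np.real(np.fft.ifft(np.fft.fft(m1) - np.fft.fft(m2) * H1))

assert np.var(s_hat - speech) < 1e-3 * np.var(m1 - speech)  # noise almost entirely removed
```

With an estimated (rather than exact) H1, the residual noise grows with the estimation error, which is why conditions R1 through R5 matter in practice.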
`
Condition R1 is easy to satisfy if the SNR of the desired speech to the
unwanted noise is high enough. “Enough” means different things depending on the
method of VAD generation. If a VAD vibration sensor is used, as in Burnett
7,256,048, accurate VAD in very low SNRs (−10 dB or less) is possible. Acoustic-
only methods using information from O1 and O2 can also return accurate VADs, but
are limited to SNRs of ~3 dB or greater for adequate performance.
Condition R5 is normally simple to satisfy because for most applications the
microphones will not change position with respect to the user’s mouth very often or
rapidly. In those applications where it may happen (such as hands-free
conferencing systems) it can be satisfied by configuring Mic2 so that H2(z) ≈ 0.
`
`
Satisfying conditions R2, R3, and R4 are more difficult but are possible given
the right combination of V1 and V2. Methods are examined below that have proven
to be effective in satisfying the above, resulting in excellent noise suppression
performance and minimal speech removal and distortion in an embodiment.
The DOMA, in various embodiments, can be used with the Pathfinder system
as the adaptive filter system or noise removal. The Pathfinder system, available
from AliphCom, San Francisco, CA, is described in detail in other patents and patent
applications referenced herein. Alternatively, any adaptive filter or noise removal
algorithm can be used with the DOMA in one or more various alternative
embodiments or configurations.
`
When the DOMA is used with the Pathfinder system, the Pathfinder system
generally provides adaptive noise cancellation by combining the two microphone
signals (e.g., Mic1, Mic2) by filtering and summing in the time domain. The
adaptive filter generally uses the signal received from a first microphone of the
DOMA to remove noise from the speech received from at least one other
microphone of the DOMA, which relies on a slowly varying linear transfer function
between the two microphones for sources of noise. Following processing of the two
channels of the DOMA, an output signal is generated in which the noise content is
attenuated with respect to the speech content, as described in detail below.
Figure 2 is a generalized two-microphone array (DOMA) including an array
201/202 and speech source S configuration, under an embodiment. Figure 3 is a
system 300 for generating or producing a first order gradient microphone V using
two omnidirectional elements O1 and O2, under an embodiment. The array of an
embodiment includes two physical microphones 201 and 202 (e.g., omnidirectional
microphones) placed a distance 2d0 apart and a speech source 200 is located a
distance ds away at an angle of θ. This array is axially symmetric (at least in free
space), so no other angle is needed. The output from each microphone 201 and
202 can be delayed (z1 and z2), multiplied by a gain (A1 and A2), and then summed
with the other as demonstrated in Figure 3. The output of the array is or forms at
least one virtual microphone, as described in detail below. This operation can be
over any frequency range desired. By varying the magnitude and sign of the delays
and gains, a wide variety of virtual microphones (VMs), also referred to herein as
virtual directional microphones, can be realized. There are other methods known to
those skilled in the art for constructing VMs but this is a common one and will be
used in the enablement below.
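The delay, gain, and sum operation of Figure 3 can be sketched as follows (an illustrative discrete-time version; the function name, the integer delays, and the one-sample-delay demonstration are our assumptions, and a real array would use fractional-delay filters):

```python
import numpy as np

def virtual_mic(o1, o2, n1, n2, a1, a2):
    """Delay each omni output by n1, n2 samples (the z1, z2 blocks of
    Figure 3), apply gains a1, a2, and sum to form a virtual microphone."""
    y1 = a1 * np.concatenate([np.zeros(n1), o1])[:len(o1)]
    y2 = a2 * np.concatenate([np.zeros(n2), o2])[:len(o2)]
    return y1 + y2

# Demo: a wave that reaches O2 one sample after O1 (an endfire arrival).
t = np.arange(64)
wave = np.sin(2 * np.pi * 0.05 * t)
o1 = wave
o2 = np.concatenate([[0.0], wave])[:64]        # same wave, one sample later at O2
v = virtual_mic(o1, o2, 1, 0, 1.0, -1.0)       # delay O1 to align, then subtract O2
assert np.max(np.abs(v)) < 1e-12               # null toward that arrival direction
```

Choosing different delays and gains moves the null (or removes it entirely), which is how the distinct V1 and V2 patterns described next are obtained.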
`
As an example, Figure 4 is a block diagram for a DOMA 400 including two
physical microphones configured to form two virtual microphones V1 and V2, under
an embodiment. The DOMA includes two first order gradient microphones V1 and
V2 formed using the outputs of two microphones or elements O1 and O2 (201 and
202), under an embodiment. The DOMA of an embodiment includes two physical
microphones 201 and 202 that are omnidirectional microphones, as described
above with reference to Figures 2 and 3. The output from each microphone is
coupled to a processing component 402, or circuitry, and the processing component
outputs signals representing or corresponding to the virtual microphones V1 and V2.
In this example system 400, the output of physical microphone 201 is
coupled to processing component 402 that includes a first processing path that
includes application of a first delay z11 and a first gain A11, and a second processing
path that includes application of a second delay z12 and a second gain A12. The
output of physical microphone 202 is coupled to a third processing path of the
processing component 402 that includes application of a third delay z21 and a third
gain A21, and a fourth processing path that includes application of a fourth delay z22
and a fourth gain A22. The output of the first and third processing paths is summed
to form virtual microphone V1, and the output of the second and fourth processing
paths is summed to form virtual microphone V2.
As described in detail below, varying the magnitude and sign of the delays
and gains of the processing paths leads to a wide variety of virtual microphones
(VMs), also referred to herein as virtual directional microphones, that can be
realized. While the processing component 402 described in this example includes
four processing paths generating two virtual microphones or microphone signals,
the embodiment is not so limited. For example, Figure 5 is a block diagram for a
DOMA 500 including two physical microphones configured to form N virtual
microphones V1 through VN, where N is any number greater than one, under an
embodiment. Thus, the DOMA can include a processing component 502 having any
`
`
`
`
`
number of processing paths as appropriate to form a number N of virtual
`
`microphones.
`
`The DOMA of an embodiment can be coupled or connected to one or more
`
`remote devices.
`
`In a system configuration, the DOMA outputs signals to the
`
`remote devices. The remote devices include, but are not limited to, at least one of
`
`cellular telephones, satellite telephones, portable telephones, wireline telephones,
`
`Internet telephones, wireless transceivers, wireless communication radios, personal
`
`digital assistants (PDAs), personal computers (PCs), headset devices, head-worn
`
`devices, and earpieces.
`
`
Furthermore, the DOMA of an embodiment can be a component or subsystem
integrated with a host device. In this system configuration, the DOMA outputs
signals to components or subsystems of the host device. The host device includes,
`
`but is not limited to, at least one of cellular telephones, satellite telephones,
`
`portable telephones, wireline telephones, Internet telephones, wireless
`
`
`transceivers, wireless communication radios, personal digital assistants (PDAs),
`
`personal computers (PCs), headset devices, head-worn devices, and earpieces.
`
As an example, Figure 6 is an example of a headset or head-worn device
600 that includes the DOMA, as described herein, under an embodiment. The
headset 600 of an embodiment includes a housing having two areas or receptacles
(not shown) that receive and hold two microphones (e.g., O1 and O2). The headset
600 is generally a device that can be worn by a speaker 602, for example, a
headset or earpiece that positions or holds the microphones in the vicinity of the
speaker’s mouth. The headset 600 of an embodiment places a first physical
microphone (e.g., physical microphone O1) in a vicinity of a speaker’s lips. A
second physical microphone (e.g., physical microphone O2) is placed a distance
behind the first physical microphone. The distance of an embodiment is in a range
of a few centimeters behind the first physical microphone or as described herein
(e.g., described with reference to Figures 1-5). The DOMA is symmetric and is
used in the same configuration or manner as a single close-talk microphone, but is
not so limited.
`
Figure 7 is a flow diagram for denoising 700 acoustic signals using the
DOMA, under an embodiment. The denoising 700 begins by receiving 702 acoustic
signals at a first physical microphone and a second physical microphone. In
response to the acoustic signals, a first microphone signal is output from the first
physical microphone and a second microphone signal is output from the second
physical microphone 704. A first virtual microphone is formed 706 by generating a
first combination of the first microphone signal and the second microphone signal.
A second virtual microphone is formed 708 by generating a second combination of
the first microphone signal and the second microphone signal, and the second
combination is different from the first combination. The first virtual microphone
`
`and the second virtual microphone are distinct virtual directional microphones with
`substantially similar responses to noise and substantially dissimilar responses to
`speech. The denoising 700 generates 710 output signals by combining signals from
`the first virtual microphone and the second virtual microphone, and the output
`signals include less acoustic noise than the acoustic signals.
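The flow above can be summarized as a processing skeleton (our structuring for illustration only; the function names and the placeholder combinations below stand in for the virtual microphone formation and noise removal described herein):

```python
import numpy as np

def doma_denoise(m1, m2, form_v1, form_v2, remove_noise):
    """Skeleton of the denoising flow of Figure 7.

    702/704: receive the two physical microphone signals m1, m2
    706:     first virtual microphone = one combination of m1 and m2
    708:     second virtual microphone = a different combination
    710:     combine V1 and V2 into an output with less acoustic noise
    """
    v1 = form_v1(m1, m2)          # e.g. delay/gain/sum, as in Figure 3
    v2 = form_v2(m1, m2)          # a different combination, nulled at the speech
    return remove_noise(v1, v2)   # e.g. adaptive filtering of V2 out of V1

# Trivial usage with placeholder combinations (illustration only):
m1 = np.array([1.0, 2.0, 3.0])
m2 = np.array([0.5, 1.0, 1.5])
out = doma_denoise(m1, m2,
                   form_v1=lambda a, b: a + b,
                   form_v2=lambda a, b: a - b,
                   remove_noise=lambda v1, v2: v1 - v2)
assert np.allclose(out, 2 * m2)
```

The specific combinations used for V1 and V2 in an embodiment are derived below.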
Figure 8 is a flow diagram for forming 800 the DOMA, under an
embodiment. Formation 800 of the DOMA includes forming 802 a physical
microphone array including a first physical microphone and a second physical
microphone. The first physical microphone outputs a first microphone signal and
the second physical microphone outputs a second microphone signal. A virtual
microphone array is formed 804 comprising a first virtual microphone and a second
virtual microphone. The first virtual microphone comprises a first combination of
the first microphone signal and the second microphone signal. The second virtual
microphone comprises a second combination of the first microphone signal and the
second microphone signal, and the second combination is different from the first
combination. The virtual microphone array includes a single null oriented in a
direction toward a source of speech of a human speaker.
The construction of VMs for the adaptive noise suppression system of an
embodiment includes substantially similar noise response in V1 and V2.
Substantially similar noise response as used herein means that H1(z) is simple to
model and will not change much during speech, satisfying conditions R2 and R4
described above and allowing strong denoising and minimized bleedthrough.
The construction of VMs for the adaptive noise suppression system of an
embodiment includes relatively small speech response for V2. The relatively small
speech response for V2 means that H2(z) ≈ 0, which will satisfy conditions R3 and
R5 described above.
The construction of VMs for the adaptive noise suppression system of an
embodiment further includes sufficient speech response for V1 so that the cleaned
speech will have significantly higher SNR than the original speech captured by O1.
The description that follows assumes that the responses of the
omnidirectional microphones O1 and O2 to an identical acoustic source have been
normalized so that they have exactly the same response (amplitude and phase) to
that source. This can be accomplished using standard microphone array methods
(such as frequency-based calibration) well known to those versed in the art.
`
Referring to the condition that construction of VMs for the adaptive noise
suppression system of an embodiment includes relatively small speech response
for V2, it is seen that for discrete systems V2(z) can be represented as:

    V2(z) = O2(z) − z^(−γ) β O1(z)

where

    β = d1/d2

    γ = (d2 − d1)·fs/c   (samples)

    d1 = √(ds² − 2 ds d0 cos(θ) + d0²)

    d2 = √(ds² + 2 ds d0 cos(θ) + d0²)

The distances d1 and d2 are the distance from O1 and O2 to the speech source
(see Figure 2), respectively, and γ is their difference divided by c, the speed of
sound, and multiplied by the sampling frequency fs. Thus γ is in samples, but need
not be an integer. For non-integer γ, fractional-delay filters (well known to those
versed in the art) may be used.
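For illustration only (the sample rate, speed of sound, tone frequency, and single-frequency phasor signal model below are our assumptions, not part of the disclosure), the formation of V2 and its null at the speech location can be sketched as:

```python
import numpy as np

c, fs = 343.0, 16000.0                     # assumed speed of sound and sample rate
d0, ds, theta = 0.0107, 0.10, 0.0          # half-spacing, source distance, source angle

d1 = np.sqrt(ds**2 - 2 * ds * d0 * np.cos(theta) + d0**2)   # O1-to-source distance
d2 = np.sqrt(ds**2 + 2 * ds * d0 * np.cos(theta) + d0**2)   # O2-to-source distance
beta = d1 / d2
gamma = (d2 - d1) / c * fs                  # delay in samples; generally non-integer

f = 1000.0                                  # 1 kHz test tone
n = np.arange(1024)
# Spherically spreading tone as seen by each omni (amplitude 1/d, delay d/c):
o1 = np.cos(2 * np.pi * f * (n / fs - d1 / c)) / d1
o2 = np.cos(2 * np.pi * f * (n / fs - d2 / c)) / d2

# Fractional delay of gamma samples applied as a phase shift at this frequency:
o1_delayed = np.cos(2 * np.pi * f * (n / fs - d1 / c - gamma / fs)) / d1
v2 = o2 - beta * o1_delayed                 # V2(z) = O2(z) - z^(-gamma) beta O1(z)
assert np.max(np.abs(v2)) < 1e-9 * np.max(np.abs(o2))   # null at the speech location
```

A source at a different distance or angle (e.g. far-field noise) no longer satisfies the same β and γ, so V2 responds to it, consistent with Figures 9 and 10.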
It is important to note that the β above is not the conventional β used to
denote the mixing of VMs in adaptive beamforming; it is a physical variable of the
system that depends on the intra-microphone distance d0 (which is fixed) and the
`
`
distance ds and angle θ, which can vary. As shown below, for properly calibrated
microphones, it is not necessary for the system to be programmed with the exact
β of the array. Errors of approximately 10-15% in the actual β (i.e. the β used by
the algorithm is not the β of the physical array) have been used with very little
degradation in quality. The algorithmic value of β may be calculated and set for a
particular user or may be calculated adaptively during speech production when little
or no noise is present. However, adaptation during use is not required for nominal
performance.
`
Figure 9 is a plot of linear response of virtual microphone V2 with β = 0.8 to
a 1 kHz speech source at a distance of 0.1 m, under an embodiment. The null in
the linear response of virtual microphone V2 to speech is located at 0 degrees,
where the speech is typically expected to be located. Figure 10 is a plot of linear
response of virtual microphone V2 with β = 0.8 to a 1 kHz noise source at a
distance of 1.0 m, under an embodiment. The linear response of V2 to noise is
devoid of or includes no null, meaning all noise sources are detected.
The above formulation for V2(z) has a null at the speech location and will
therefore exhibit minimal response to the speech. This is shown in Figure 9 for an
array with d0 = 10.7 mm and a speech source on the axis of the array (θ = 0) at 10
cm (β = 0.8). Note that the speech null at zero degrees is not present for noise in
the far field for the same microphone, as shown in Figure 10 with a noise source
distance of approximately 1 meter. This ensures that noise in front of the user will
be detected so that it can be removed. This differs from conventional systems that
can have difficulty removing noise in the direction of the mouth of the user.
`
The V1(z) can be formulated using the general form for V1(z):

    V1(z) = αA O1(z)·z^(−dA) − αB O2(z)·z^(−dB)

Since

    V2(z) = O2(z) − z^(−γ) β O1(z)

and, since for noise in the forward direction

    O2N(z) = O1N(z)·z^(−γ)

then

    V2N(z) = O1N(z)·z^(−γ) − z^(−γ) β O1N(z)

    V2N(z) = (1 − β)(O1N(z)·z^(−γ))

If this is then set equal to V1(z) above, the result is

    V1N(z) = αA O1N(z)·z^(−dA) − αB O1N(z)·z^(−γ)·z^(−dB) = (1 − β)(O1N(z)·z^(−γ))

thus we may set

    dA = γ
    dB = 0
    αA = 1
    αB = β

to get

    V1(z) = O1(z)·z^(−γ) − β O2(z)
`
The definitions for V1 and V2 above mean that for noise H1(z) is:

    H1(z) = V1(z)/V2(z) = [−β O2(z) + O1(z)·z^(−γ)] / [O2(z) − z^(−γ) β O1(z)]

which, if the amplitude noise responses are about the same, has the form of an
allpass filter. This has the advantage of being easily and accurately modeled,
especially in magnitude response, satisfying R2.
This formulation assures that the noise response will be as similar as possible
and that the speech response will be proportional to (1 − β²). Since β is the ratio
of the distances from O1 and O2 to the speech source, it is affected by the size of
the array and the distance from the array to the speech source.
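The behavior claimed above — a deep V2 null for the speech source, nearly identical V1 and V2 responses for far-field noise, and a V1 speech response proportional to (1 − β²) — can be checked with a single-frequency phasor sketch (ours; the geometry, sample rate, and frequency are assumptions):

```python
import numpy as np

c, fs, d0 = 343.0, 16000.0, 0.0107          # assumed constants and half-spacing

def dists(ds, theta):
    d1 = np.sqrt(ds**2 - 2 * ds * d0 * np.cos(theta) + d0**2)
    d2 = np.sqrt(ds**2 + 2 * ds * d0 * np.cos(theta) + d0**2)
    return d1, d2

# Calibrate beta and gamma for the speech position (0.1 m, on axis):
d1s, d2s = dists(0.10, 0.0)
beta = d1s / d2s
gamma = (d2s - d1s) / c * fs

def responses(ds, theta, f=1000.0):
    """|V1| and |V2| for a point source at (ds, theta), phasor model."""
    d1, d2 = dists(ds, theta)
    w = 2 * np.pi * f
    o1 = np.exp(-1j * w * d1 / c) / d1       # spherical spreading and delay
    o2 = np.exp(-1j * w * d2 / c) / d2
    z = np.exp(-1j * w * gamma / fs)         # z^(-gamma) at this frequency
    v1 = o1 * z - beta * o2                  # V1(z) = O1 z^(-gamma) - beta O2
    v2 = o2 - beta * z * o1                  # V2(z) = O2 - beta z^(-gamma) O1
    return abs(v1), abs(v2)

v1_s, v2_s = responses(0.10, 0.0)            # speech location: V2 is nulled
v1_n, v2_n = responses(1.00, np.pi / 4)      # far noise: both respond similarly
assert v2_s < 1e-9 * v1_s
assert 0.5 < v1_n / v2_n < 2.0
```

Because the noise responses track each other while only V1 retains the speech, the ratio H1 = V1/V2 stays close to allpass for noise, as stated above.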
`
`
Figure 11 is a plot of linear response of virtual microphone V1 with β = 0.8
to a 1 kHz speech source at a distance of 0.1 m, under an embodiment. The linear
response of virtual microphone V1 to speech is devoid of or includes no null and the
response for speech is greater than that shown in Figure 9.
Figure 12 is a plot of linear response of virtual microphone V1 with β = 0.8
to a 1 kHz noise source at a distance of 1.0 m, under an embodiment. The linear
response of virtual microphone V1 to noise is devoid of or includes no null and the
response is very similar to V2 shown in Figure 10.
Figure 13 is a plot of linear response of virtual microphone V1 with β = 0.8
to a speech source at a distance of 0.1 m for frequencies of 100, 500, 1000, 2000,
3000, and 4000 Hz, under an embodiment. Figure 14 is a plot showing
comparison of frequency responses for speech for the array of an embodiment and
for a conventional cardioid microphone.
The response of V1 to speech is shown in Figure 11,