`Express Mail No. EV 708 402 622 US
`
`Filing Date June 27, 2007
`
SHARED-REAR VENT METHOD FOR ADAPTIVE MULTIPLE-MICROPHONE NOISE REMOVAL
`
`Inventor: Greg Burnett
`
`RELATED APPLICATIONS
`
`This application relates to and claims the benefit of United States Patent Application Numbers
`
`10/159,770, filed May 30, 2002, 10/667,207, filed September 18, 2003, 10/400,282, filed March
`
`27, 2003, 10/301,237, filed November 21, 2002, 10/769,302, filed January 30, 2004, and
`
`11/199,856, filed August 8, 2005.
`
`INCORPORATION BY REFERENCE
`
`Each patent, patent application, and/or publication mentioned in this specification is herein
`
`incorporated by reference in its entirety to the same extent as if each individual patent, patent
`
`application, and/or publication was specifically and individually indicated to be incorporated by
`
`reference.
`
`DEFINITIONS
`
`The following terms have the following general meanings as they are used herein.
`
`Noise means unwanted environmental acoustic noise.
`
`Speech means desired speech of the user.
`
Mic 1 means the "speech" microphone, which is more sensitive to speech than other microphones.

Mic 2 means the "noise" microphone, which is less sensitive to speech than Mic 1. Mic 2 can include multiple microphones.

Denoising means removing unwanted noise from Mic 1.

Devoicing means removing/distorting desired speech from Mic 1.
`
`Directional microphone (DM) means a physical directional microphone that is vented on both
`
`sides of the sensing diaphragm.
`
`Virtual directional microphone (VDM) means a directional microphone constructed using
`
`omnidirectional microphones and digital signal processing.
`
`
`Amazon v. Jawbone
`U.S. Patent 8,280,072
`Amazon Ex. 1013
`
`
`
`Attorney Docket No. ALPH.P034P
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Figure 1 is a block diagram of a shared-vent configuration, under an embodiment.
`
`Figure 2 is a block diagram of a shared-vent configuration including omnidirectional
`
`microphones to form VDMs, under an embodiment.
`
`Figure 3 shows results obtained for a physical DM, under an embodiment.
`
`Figure 4 is a block diagram of the noise removal algorithm, assuming a single noise source and
`
`direct path to the microphones, under an embodiment.
`
`DESCRIPTION
`
`The Related Applications describe novel methods and systems of noise suppression using an
`
`adaptive filter, multiple microphones, and speech detection devices. A description is provided
`
`herein of a system and method of arranging and venting microphones so that the performance of
`
`the noise suppression system is enhanced. By making the input to the rear vents of directional
`
`microphones (actual or virtual) as similar as possible, the real-world filter to be modeled
`
`becomes much simpler to model using an adaptive filter. In some cases, the filter collapses to
`
`unity, the simplest filter of all. This method has been successfully implemented in the laboratory
`
`and in physical systems and provides improved performance over conventional methods.
`
The Pathfinder noise suppression algorithm (referred to herein as "Pathfinder") described below in Appendix A and the Related Applications uses multiple microphones and a VAD signal to remove undesired noise while preserving the intelligibility and quality of the speech of the user.
`
`Pathfinder does this using a configuration including directional microphones and overlapping the
`
`noise and speech response of the microphones - that is, one microphone will be more sensitive to
`
`speech than the other but they will both have similar noise responses. If the microphones do not
`
`have the same or similar noise responses, the denoising performance will be poor. If the
`
`microphones have similar speech responses, then devoicing will take place. The key to a
`
`successful system is ensuring that the noise response of the microphones is as similar as possible
`
`while simultaneously constructing the speech response of the microphones as dissimilar as
`
`
`possible. The technique described herein is effective at removing undesired noise while
`
`preserving the intelligibility and quality of the speech of the user.
`
`The microphone configuration of an embodiment allows the rear vents of the directional
`
`microphones to sample a common pressure source. This is accomplished differently for
`
`directional microphones and VDMs. The theory behind the microphone configuration as well as
`
`more specific configurations are described in detail below for both physical and virtual
`
`directional microphones.
`
`Theory
`
`The following description includes reference to two directional microphones; however, the
`
`description that follows can be generalized to any number of microphones.
`
`Pathfinder operates using an adaptive algorithm to continuously update the filter constructed
`
`using Mic 1 and Mic 2. In the frequency domain, each microphone's output can be represented
`
`as:
`
$$M_1(z) = F_1(z) - z^{-d_1} B_1(z)$$
$$M_2(z) = F_2(z) - z^{-d_2} B_2(z)$$

where F1(z) represents the pressure at the front port of Mic 1, B1(z) the pressure at the back (rear) port, and z^-d1 the delay instituted by the microphone. This delay can be realized through port
`
`venting and/or microphone construction and/or other ways known to those skilled in the art,
`
`including acoustic retarders which slow the acoustic pressure wave. If using omnidirectional
`
`microphones to construct virtual directional microphones, these delays can also be realized using
`
`delays in DSP. The delays are not required to be integer delays. The filter that is constructed
`
using these outputs is

$$H_1(z) = \frac{M_1(z)}{M_2(z)} = \frac{F_1(z) - z^{-d_1} B_1(z)}{F_2(z) - z^{-d_2} B_2(z)}$$

In the case where B1(z) is not equal to B2(z), this is an IIR filter. It can become quite complex when multiple microphones are employed. However, if B1(z) = B2(z) and d1 = d2, then

$$H_1(z) = \frac{F_1(z) - z^{-d_1} B_1(z)}{F_2(z) - z^{-d_1} B_1(z)}$$

`The front ports are related to each other by a simple relationship:
`
$$F_2(z) = A z^{-d_{12}} F_1(z)$$

where A is the difference in amplitude of the noise between the two microphones and d12 is the delay between the microphones. Both of these will vary depending on where the acoustic source is located with respect to the microphones. A single noise source is assumed for purposes of this description, but the analysis presented can be generalized to multiple noise sources. For noise, which is assumed to be more than a meter away (in the far field), A is approximately 1. The delay d12 will vary depending on the noise source between -d12max and +d12max, where d12max is
`
`the maximum delay possible between the two front ports. This maximum delay is a function of
`
`the distance between the front vents of the microphones and the speed of sound in air.
`
`The rear ports are related to the front port by a similar relationship:
`
$$B_1(z) = B z^{-d_{13}} F_1(z)$$

where B is the difference in amplitude of the noise between the two microphones and d13 is the delay between front port 1 and the common back port 3. Both of these will vary depending on where the acoustic source is located with respect to the microphones, as shown above with d12. The delay d13 will vary depending on the noise source between -d13max and +d13max, where d13max is the maximum delay possible between front port 1 and the common back port 3. This maximum delay is determined by the path length between front port 1 and the common back port 3. For example, if they are located 3 cm apart, d13max will be

$$d_{13\max} = \frac{d}{c} = \frac{0.03\ \text{m}}{345\ \text{m/s}} \approx 0.087\ \text{msec}$$
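As a quick numerical check of the delay relationship above, the following minimal sketch computes the maximum inter-port delay from a path length and the speed of sound, and converts it to samples. The function names and the 8 kHz sampling rate are illustrative assumptions and do not come from this description.

```python
def max_port_delay_s(path_length_m: float, speed_of_sound_m_s: float = 345.0) -> float:
    """Maximum delay in seconds between two ports separated by path_length_m."""
    return path_length_m / speed_of_sound_m_s


def delay_in_samples(delay_s: float, sample_rate_hz: float) -> float:
    """Convert a delay in seconds to a (possibly fractional) number of samples."""
    return delay_s * sample_rate_hz


if __name__ == "__main__":
    d13_max = max_port_delay_s(0.03)  # 3 cm path length, c = 345 m/s
    print(f"d13max = {d13_max * 1e3:.3f} msec")  # about 0.087 msec
    # 8 kHz is an assumed sampling rate, used only for illustration.
    print(f"d13max = {delay_in_samples(d13_max, 8000.0):.2f} samples at 8 kHz")
```

At typical speech sampling rates a 3 cm path corresponds to less than one sample, which is consistent with the statement that the delays are not required to be integer delays.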
`
`
Again, for noise, B ≈ 1 since the noise sources are assumed to be more than 1 meter away from the microphones. Thus, in general, the above equation reduces to:

$$H_{1N}(z) = \frac{F_1(z) - z^{-d_1} B z^{-d_{13}} F_1(z)}{z^{-d_{12}} F_1(z) - z^{-d_1} B z^{-d_{13}} F_1(z)} = \frac{1 - z^{-(d_1+d_{13})}}{z^{-d_{12}} - z^{-(d_1+d_{13})}}$$
`
where the "N" denotes that this response is for far-field noise. Since d1 is a characteristic of the microphone, it remains the same for all different noise orientations. Conversely, d13 and d12 are relative measurements that depend on the location of the noise source with respect to the array.

If d12 → 0, then the filter H1N(z) collapses to

$$H_{1N}(z) \rightarrow \frac{1 - z^{-(d_1+d_{13})}}{1 - z^{-(d_1+d_{13})}} = 1 \qquad (d_{12} \rightarrow 0)$$

and the resulting filter is a simple unity response filter, which is extremely simple to model with an adaptive FIR system. For noise sources perpendicular to the array axis, the distance from the noise source to the front vents will be equal and d12 will go to zero. Even for small angles from the perpendicular, d12 will be small and the response will still be close to unity. Thus, for many noise locations, the H1N(z) filter can be easily modeled using an adaptive FIR algorithm. This is not the case if the two directional microphones do not have a common rear vent. Even for noise sources away from a line perpendicular to the array axis, the H1N(z) filter is still simpler and more easily modeled using an adaptive FIR filter algorithm, and improvements in performance have been observed.
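The collapse toward unity can be illustrated numerically. The sketch below evaluates the far-field noise response (1 - z^-(d1+d13)) / (z^-d12 - z^-(d1+d13)) on the unit circle, with A = B = 1, and reports how far it departs from 1 for a few values of d12. The delay values d1 = 1.0 and d13 = 0.5 samples and the frequency grid are assumptions chosen only for illustration.

```python
import cmath
import math


def h1n(omega: float, d1: float, d12: float, d13: float) -> complex:
    """Far-field noise response H1N at z = exp(j*omega), assuming A = B = 1.

    All delays are in samples and may be fractional.
    """
    rear = cmath.exp(-1j * omega * (d1 + d13))  # z^-(d1+d13)
    front2 = cmath.exp(-1j * omega * d12)       # z^-d12
    return (1 - rear) / (front2 - rear)


def worst_deviation(d12: float, d1: float = 1.0, d13: float = 0.5) -> float:
    """Largest |H1N - 1| over a coarse grid of frequencies in (0, pi)."""
    omegas = [math.pi * k / 8 for k in range(1, 8)]  # omega = 0 is skipped (0/0 there)
    return max(abs(h1n(w, d1, d12, d13) - 1) for w in omegas)


if __name__ == "__main__":
    for d12 in (0.0, 0.1, 1.0):
        print(f"d12 = {d12:.1f} samples: max |H1N - 1| = {worst_deviation(d12):.3f}")
```

With these assumed delays, the deviation is zero for d12 = 0, stays small for d12 = 0.1, and becomes large for d12 = 1.0.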
`
`The approximations made in the description above are as follows:
`
1. B1(z) = B2(z)

a. The rear vents are exposed to, and have the same response to, the same pressure volume. This approximation can be satisfied if the common vented volume is small compared to a wavelength of the sound wave of interest.

2. d1 = d2

a. The rear port delays for each microphone are the same. This is no problem with physical directional microphones, but must be specified for VDMs. These delays are relative: the front ports can also be delayed if desired, as long as the delay is the same for both microphones.

3. $F_2(z) \approx F_1(z)\, z^{-d_{12}}$

a. The amplitude responses of the front vents are about the same and the only difference is a delay. For noise sources more than 1 meter away, this is a good approximation, as the amplitude of a sound wave varies as 1/r.
`
For speech, since it is much closer to the microphones (approximately 1 to 10 cm), A is not unity. The closer to the mouth of the user, the more different from unity A becomes. For example, if Mic 1 is located 8 cm away from the mouth and Mic 2 12 cm, then for speech A would be

$$A = \frac{F_2(z)}{F_1(z)} = \frac{1/12}{1/8} = 0.67$$

This means for speech H1(z) will be

$$H_{1S}(z) = \frac{F_1(z) - z^{-d_1} B_1(z)}{A z^{-d_{12}} F_1(z) - z^{-d_1} B_1(z)}$$

with the "S" denoting the response for near-field speech and A ≠ 1. This does not reduce to a simple FIR approximation and will be harder for the adaptive FIR algorithm to adapt to. This means that the models for the filters H1N(z) and H1S(z) will be very different, thus reducing devoicing. Of course, if a noise source is located close to the microphone, the response will be similar, which could cause more devoicing. However, unless the noise source is located very near the mouth of the user, a non-unity A and nonzero d12 should be enough to limit devoicing.
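The amplitude ratio A follows directly from the 1/r falloff assumed above. The sketch below reproduces the 8 cm and 12 cm speech example and contrasts it with a distant source. The function name and the far-field distances (1.00 m and 1.04 m) are assumptions made for illustration.

```python
def amplitude_ratio(r1_m: float, r2_m: float) -> float:
    """Ratio A = |F2| / |F1| for a point source, assuming amplitude varies as 1/r.

    r1_m is the distance from the source to the Mic 1 front port and
    r2_m is the distance from the source to the Mic 2 front port.
    """
    return (1.0 / r2_m) / (1.0 / r1_m)


if __name__ == "__main__":
    # Speech example from the text: Mic 1 at 8 cm, Mic 2 at 12 cm.
    print(f"speech A = {amplitude_ratio(0.08, 0.12):.2f}")  # 0.67
    # Assumed far-field noise source about 1 m away, ports 4 cm apart.
    print(f"noise  A = {amplitude_ratio(1.00, 1.04):.2f}")  # 0.96
```

The same port spacing that gives A of about 0.67 for speech gives A close to 1 for a source a meter away, which is the asymmetry the configuration relies on.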
`
`
`Examples
`
The difference in response is next examined for speech and noise when the noise is located behind the microphones. Let d1 = 3. For speech, let d12 = 2, A = 0.67, and B = 0.82. Then

$$H_{1S}(z) = \frac{F_1(z) - z^{-d_1} B_1(z)}{z^{-d_{12}} A F_1(z) - z^{-d_1} B_1(z)} = \frac{1 - 0.82 z^{-3}}{0.67 z^{-2} - 0.82 z^{-3}}$$

which has a very non-FIR response. For noise located directly opposite the speech, d12 = -2 and A = B = 1. Thus the phase of the noise at F2 is two samples ahead of F1. Then

$$H_{1N}(z) = \frac{1 - z^{-3}}{z^{2} - z^{-3}} = \frac{z^{-2} - z^{-5}}{1 - z^{-5}}$$

which is much simpler and more easily modeled than the speech filter.
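The two example responses can be checked numerically. The sketch below evaluates H1(z) = (1 - B z^-d1) / (A z^-d12 - B z^-d1) on the unit circle, which is the general form with B1(z) = B F1(z) substituted, as the worked example appears to do. It confirms that the noise case with d1 = 3, d12 = -2 and A = B = 1 equals (z^-2 - z^-5) / (1 - z^-5). The function names and the frequency grid are illustrative assumptions.

```python
import cmath
import math


def h1(omega: float, a: float, b: float, d1: int, d12: int) -> complex:
    """H1(z) = (1 - B z^-d1) / (A z^-d12 - B z^-d1) at z = exp(j*omega).

    Assumes B1(z) = B * F1(z), so F1(z) cancels from numerator and denominator.
    """
    z = cmath.exp(1j * omega)
    return (1 - b * z ** (-d1)) / (a * z ** (-d12) - b * z ** (-d1))


def h1n_closed_form(omega: float) -> complex:
    """Noise example written as (z^-2 - z^-5) / (1 - z^-5)."""
    z = cmath.exp(1j * omega)
    return (z ** -2 - z ** -5) / (1 - z ** -5)


if __name__ == "__main__":
    # Grid avoids omega = 2*pi*k/5, where the noise-case denominator is zero.
    for k in range(1, 8):
        w = math.pi * k / 8
        speech = h1(w, a=0.67, b=0.82, d1=3, d12=2)
        noise = h1(w, a=1.0, b=1.0, d1=3, d12=-2)
        assert abs(noise - h1n_closed_form(w)) < 1e-9
        print(f"omega = {w:.3f}: |H1S| = {abs(speech):.3f}, |H1N| = {abs(noise):.3f}")
```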
`
`Implementation
`
The microphone configuration of an embodiment implements the technique described above, using directional microphones, by including or constructing a vented volume that is small compared to the wavelength of the acoustic wave of interest, venting the front of each DM to the outside of the volume, and venting the rear of each DM to the volume itself. Figure 1 is a block diagram of a shared-vent configuration, under an embodiment. This example configuration includes
`
`electret directional microphones having a 6 millimeter (mm) diameter, but the embodiment is not
`
`so limited. Alternative embodiments can include any type of directional microphone having any
`
`number of different sizes and/or configurations. The vent openings for the front of each
`
`microphone and the common rear vent volume must be large enough to ensure adequate speech
`
`energy at the front and rear of each microphone. A vent opening of approximately 3 mm in
`
`diameter has been implemented with good results.
`
`Figure 2 is a block diagram of a shared-vent configuration including omnidirectional
`
`microphones to form VDMs, under an embodiment. Here, the common "rear vent" is a third
`
omnidirectional microphone situated between the other two microphones. Mic 1 and Mic 2 (as defined above) can be defined as:

$$M_1 = O_1 - O_3 z^{-dt}$$
$$M_2 = O_2 - O_3 z^{-dt}$$

Here the distances "d" between the microphones are equal, but the embodiment is not so limited. The delay time "dt" is the time it takes for the sound to travel the distance "d". In this embodiment (d = 2 cm in Figure 2), assuming a temperature of 20 Celsius, that time would be about 5.83 x 10^-5
`
`seconds. The above assumes that all three omnidirectional microphones have been calibrated so
`
`that their response to an identical source is the same, but this is not limiting as calibration
`
`techniques are well known to those in the art. Different combinations of two or more
`
`microphones are possible, but the virtual "rear vents" must be as similar as possible to derive full
`
`benefit from this configuration. The simplest way to ensure this is to dedicate a single
`
`microphone (in this case 03) to be the rear "vent" for both VDMs.
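A minimal sketch of the virtual directional microphone construction described above follows: a single omnidirectional signal O3 is delayed by dt and subtracted from both O1 and O2. The linear-interpolation fractional delay, the function names, and the 48 kHz sampling rate are assumptions made for illustration. The description says only that the delays can be realized in DSP and need not be integers.

```python
import math


def delay_signal(x, delay_samples):
    """Delay x by a non-negative, possibly fractional, number of samples.

    Uses linear interpolation. Samples before the start of x are taken as zero.
    """
    whole = math.floor(delay_samples)
    frac = delay_samples - whole
    out = []
    for n in range(len(x)):
        i = n - whole
        newer = x[i] if 0 <= i < len(x) else 0.0
        older = x[i - 1] if 0 <= i - 1 < len(x) else 0.0
        out.append((1.0 - frac) * newer + frac * older)
    return out


def virtual_directional_mics(o1, o2, o3, delay_samples):
    """Form M1 = O1 - O3 z^-dt and M2 = O2 - O3 z^-dt with a shared rear signal O3."""
    rear = delay_signal(o3, delay_samples)
    m1 = [front - r for front, r in zip(o1, rear)]
    m2 = [front - r for front, r in zip(o2, rear)]
    return m1, m2


if __name__ == "__main__":
    dt_s = 0.02 / 343.0       # d = 2 cm, about 5.83e-5 s at 20 Celsius
    fs_hz = 48000.0           # assumed sampling rate
    print(f"dt = {dt_s:.3e} s = {dt_s * fs_hz:.2f} samples at 48 kHz")
    pulse = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    m1, m2 = virtual_directional_mics(pulse, pulse, pulse, dt_s * fs_hz)
    print(m1)
```

Because both virtual microphones subtract the same delayed O3 signal, their virtual rear vents are identical by construction.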
`
`RESULTS
`
`Figure 3 shows results obtained for a physical DM, under an embodiment. These experimental
`
`results were obtained using the shared-rear-vent configuration described herein using a live
`
subject in a sound room in the presence of complex babble noise. The top plot ("Mic 1 no processing") is the original noisy signal in Mic 1, and the bottom plot ("Mic 1 after PF + SS") is the denoised signal (Pathfinder plus spectral subtraction), recorded under identical or nearly identical conditions, after adaptive Pathfinder denoising of approximately 8 dB and additional single-channel spectral subtraction of approximately 12 dB. Clearly the technique is adept at removing the unwanted noise from the desired signal.
`
`CONCLUSION
`
A novel system and method for improving the performance of two-microphone adaptive noise
`
`suppression algorithms has been demonstrated. While Pathfinder is used as an example for
`
`purposes of the description herein, the embodiments described herein are not limited to
`
`Pathfinder and can be used with a variety of other noise suppression algorithms. By making the
`
`
`input to the rear vents of directional microphones (actual or virtual) as similar as possible, the
`
`real-world filter to be modeled becomes much simpler to model using an adaptive filter. In some
`
`cases, the filter collapses to unity, the simplest filter of all. The teachings described herein have
`
`been successfully implemented in the laboratory and in physical configurations and perform
`
`significantly better than conventional configurations.
`
`Unless the context clearly requires otherwise, throughout the description, the words "comprise,"
`
`"comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or
`
`exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the
`
`singular or plural number also include the plural or singular number respectively. Additionally,
`
`the words "herein," "hereunder," "above," "below," and words of similar import refer to this
`
`application as a whole and not to any particular portions of this application. When the word "or"
`
`is used in reference to a list of two or more items, that word covers all of the following
`
`interpretations of the word: any of the items in the list, all of the items in the list and any
`
`combination of the items in the list.
`
`The above description of embodiments is not intended to be exhaustive or to limit the systems
`
`and methods described to the precise form disclosed. While specific embodiments of, and
`
`examples for, the embodiments are described herein for illustrative purposes, various equivalent
`
`modifications are possible within the scope of other systems and methods, as those skilled in the
`
`relevant art will recognize. The teachings provided herein can be applied to other noise
`
`suppression systems and methods, not only for the systems and methods described above.
`
`The elements and acts of the various embodiments described above can be combined to provide
`
`further embodiments. These and other changes can be made to the noise suppression systems
`
`and methods in light of the above detailed description.
`
APPENDIX A: PATHFINDER REVIEW
`
`The description that follows makes use of a configuration including two microphones, for
`
`simplicity, but the algorithm can be extended to as many microphones as desired. Also, a single
`
`
`noise source and direct path are assumed, which is easily generalized to multiple noise sources
`
`and reflective paths as shown in the Related Applications.
`
`Figure 4 is a block diagram of the noise removal algorithm, assuming a single noise source and
`
`direct path to the microphones, under an embodiment. In Figure 4, the acoustic information
`
coming into MIC 1 is denoted by m1(n). The information coming into MIC 2 is similarly labeled m2(n). In the z (digital frequency) domain, we can represent them as M1(z) and M2(z). Then

$$M_1(z) = S(z) + N_2(z)$$
$$M_2(z) = N(z) + S_2(z)$$

with

$$N_2(z) = N(z) H_1(z)$$
$$S_2(z) = S(z) H_2(z)$$

so that

$$M_1(z) = S(z) + N(z) H_1(z)$$
$$M_2(z) = N(z) + S(z) H_2(z) \qquad (1)$$

This is the general case for all two microphone systems. There is always going to be some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two known relationships and therefore cannot be solved explicitly.
`
`However, there is another way to solve for some of the unknowns in Equation 1. Assume
`
a case where the signal is not being generated, that is, where the VAD signal equals zero and speech is not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

$$M_{1n}(z) = N(z) H_1(z)$$
$$M_{2n}(z) = N(z)$$

where the n subscript on the M variables indicates that only noise is being received. This leads to

$$M_{1n}(z) = M_{2n}(z) H_1(z)$$
$$H_1(z) = \frac{M_{1n}(z)}{M_{2n}(z)} \qquad (2)$$

H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when the system is certain that only noise is being received. The calculation
`
`can be done adaptively, so that the system can react to changes in the noise.
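As one illustration of adaptive system identification during noise-only periods, the sketch below uses a normalized LMS (NLMS) update to estimate an FIR approximation of H1(z) from M2n to M1n. NLMS is chosen here only as a familiar example. The description does not name a specific algorithm, and the tap values, step size, and signal length are hypothetical.

```python
import random


def fir_filter(taps, x):
    """Apply y[n] = sum_k taps[k] * x[n - k], treating x as zero before n = 0."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def nlms_identify(reference, target, num_taps, step=0.5, eps=1e-8):
    """Estimate FIR taps h such that target[n] ~ sum_k h[k] * reference[n - k]."""
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    for x, d in zip(reference, target):
        buf = [x] + buf[:-1]
        err = d - sum(hk * bk for hk, bk in zip(h, buf))
        norm = sum(bk * bk for bk in buf) + eps
        h = [hk + step * err * bk / norm for hk, bk in zip(h, buf)]
    return h


if __name__ == "__main__":
    random.seed(0)
    true_h1 = [0.9, 0.2, -0.1]  # hypothetical noise path from Mic 2 to Mic 1
    m2n = [random.gauss(0.0, 1.0) for _ in range(2000)]  # noise-only samples (VAD = 0)
    m1n = fir_filter(true_h1, m2n)
    estimate = nlms_identify(m2n, m1n, num_taps=3)
    print([round(v, 3) for v in estimate])
```

In a running system the update would be applied only while the VAD indicates no speech, and the taps would be held constant otherwise.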
`
`
So now a solution has been found to one of the unknowns in Equation 1. Another unknown, H2(z), can be solved for by using the instances where the VAD equals one and speech is being produced. When this is occurring, but the recent (less than 1 second) history of the microphones indicates low levels of noise, it can be assumed that n(n) = N(z) ≈ 0. Then Equation 1 reduces to

$$M_{1s}(z) = S(z)$$
$$M_{2s}(z) = S(z) H_2(z)$$

which in turn leads to

$$M_{2s}(z) = M_{1s}(z) H_2(z)$$
$$H_2(z) = \frac{M_{2s}(z)}{M_{1s}(z)}$$

which is the inverse of the H1(z) calculation, but different inputs are being used: now only the signal is occurring whereas before only the noise was occurring. While calculating H2(z), the values calculated for H1(z) are held constant and vice versa. Thus one of the assumptions has to be that H1(z) and H2(z) do not change substantially while the other is being calculated.

After H1(z) and H2(z) have been calculated above, we can use them to remove the noise from the signal. If Equation 1 is rewritten as

$$S(z) = M_1(z) - N(z) H_1(z)$$
$$N(z) = M_2(z) - S(z) H_2(z)$$
$$S(z) = M_1(z) - \left[ M_2(z) - S(z) H_2(z) \right] H_1(z)$$
$$S(z) \left[ 1 - H_2(z) H_1(z) \right] = M_1(z) - M_2(z) H_1(z)$$

a solution for S(z) can be found:

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - H_2(z) H_1(z)} \qquad (3)$$
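The solution S(z) = (M1(z) - M2(z) H1(z)) / (1 - H2(z) H1(z)) can be applied independently at each frequency. The sketch below checks it on synthetic per-bin complex values built from the two-microphone model M1 = S + N H1 and M2 = N + S H2. All numeric values and the function name are hypothetical, and the transfer functions are assumed to be known exactly.

```python
def recover_speech(m1, m2, h1, h2):
    """Apply S = (M1 - M2*H1) / (1 - H2*H1) in each frequency bin.

    Each argument is a sequence of complex values, one per bin.
    """
    return [(a - b * p) / (1 - q * p) for a, b, p, q in zip(m1, m2, h1, h2)]


if __name__ == "__main__":
    # Hypothetical per-bin speech, noise, and transfer-function values.
    s = [1.0 + 2.0j, -0.5j, 0.3 + 0.0j]
    n = [0.7 - 0.1j, 2.0 + 1.0j, -1.2 + 0.4j]
    h1 = [0.9 + 0.1j, 0.8 - 0.2j, 1.0 + 0.0j]
    h2 = [0.3 + 0.0j, 0.2 + 0.1j, 0.4 - 0.1j]

    # Two-microphone model: noise leaks into Mic 1, speech leaks into Mic 2.
    m1 = [si + ni * a for si, ni, a in zip(s, n, h1)]
    m2 = [ni + si * b for si, ni, b in zip(s, n, h2)]

    recovered = recover_speech(m1, m2, h1, h2)
    print(max(abs(r - si) for r, si in zip(recovered, s)))  # close to zero
```

With exact H1 and H2 the speech is recovered to within rounding error. In practice the result depends on how accurately the two transfer functions are estimated.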
`
If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy, then the noise can be completely removed and the original signal recovered. This remains true without respect to the amplitude or spectral characteristics of the noise. The only assumptions made are a perfect VAD, sufficiently accurate H1(z) and H2(z), and that H1(z) and H2(z) do not change substantially when the other is being calculated. In practice these assumptions have proven reasonable.
`
`
`FIGURES
`
FIGURE 1: [Drawing not reproduced. Side view labeled "Common rear vent volume"; front view labeled "Common vent opening (each side)".]

FIGURE 2: [Drawing not reproduced. Microphone spacing labeled d = 2 cm.]

FIGURE 3: [Plots not reproduced. Title: "Results in cafe environment with no NS (top) and PF+SS (bottom)". Top panel: Mic 1 no processing. Bottom panel: Mic 1 after PF + SS. Horizontal axis: Time (sec), with ticks from 0 to 30; vertical axis ticks from -0.4 to 0.6.]

FIGURE 4: [Block diagram not reproduced. Labels: SIGNAL s(n), NOISE n(n), MIC 1, MIC 2, VAD, Voicing information, Noise removal, Cleaned speech. Reference numerals 100, 101, and 103 appear in the drawing.]