`Case 6:21-cv-00984-ADA Document 19-4 Filed 12/23/21 Page 1 of 57
`
`
`
`
`EXHIBIT D
`
`
`
`
`
`
`
`
(12) United States Patent
     Petit et al.

(10) Patent No.: US 8,321,213 B2
(45) Date of Patent: *Nov. 27, 2012

(54) ACOUSTIC VOICE ACTIVITY DETECTION (AVAD) FOR ELECTRONIC SYSTEMS

(75) Inventors: Nicolas Petit, San Francisco, CA (US); Gregory Burnett, Dodge Center, MN (US); Zhinian Jing, San Francisco, CA (US)
(73) Assignee: Aliphcom, Inc., San Francisco, CA

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 540 days.

This patent is subject to a terminal disclaimer.
`
(21) Appl. No.: 12/606,146

(22) Filed: Oct. 26, 2009

(65) Prior Publication Data
     US 2010/0128894 A1    May 27, 2010

Related U.S. Application Data

(63) Continuation-in-part of application No. 12/139,333, filed on Jun. 13, 2008, and a continuation-in-part of application No. 11/805,987, filed on May 25, 2007, now abandoned.

(60) Provisional application No. 61/108,426, filed on Oct. 24, 2008.
`
(51) Int. Cl.
     G10L 11/06    (2006.01)
(52) U.S. Cl. .......... 704/208; 704/214
(58) Field of Classification Search .......... 704/208, 704/210, 214, 215; 381/99, 100, 46
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,459,814 A *       10/1995   Gupta et al.     704/233
7,171,357 B2 *       1/2007   Boland           704/231
7,246,058 B2 *       7/2007   Burnett          704/296
7,464,029 B2 *      12/2008   Visser et al.    704/210
8,019,091 B2 *       9/2011   Burnett et al.   381/718
2009/0089053 A1 *    4/2009   Wang et al.      704/233

* cited by examiner
`
`Primary Examiner — Abul Azad
`(74) Attorney, Agent, or Firm — Kokka & Backus, PC
`
(57) ABSTRACT

Acoustic Voice Activity Detection (AVAD) methods and systems are described. The AVAD methods and systems, including corresponding algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either an adaptive or a fixed filter.

42 Claims, 35 Drawing Sheets
`
Forming first virtual microphone by combining first signal of first physical microphone and second signal of second physical microphone.

Forming filter that describes relationship for speech between first physical microphone and second physical microphone.

Forming second virtual microphone by applying filter to first signal to generate first intermediate signal, and summing first intermediate signal and second signal.

Generating energy ratio of energies of first virtual microphone and second virtual microphone.

Detecting acoustic voice activity of speaker when energy ratio is greater than threshold value.
`
U.S. Patent    Nov. 27, 2012    Sheet 1 of 35    US 8,321,213 B2

FIG. 2 (drawing)
`
`
`
Sheet 2 of 35: FIG. 3 (drawing)
`
`
`
`
`
Sheet 3 of 35: FIG. 5 (flow diagram 500)

502: Forming first virtual microphone by combining first signal of first physical microphone and second signal of second physical microphone.

504: Forming filter that describes relationship for speech between first physical microphone and second physical microphone.

506: Forming second virtual microphone by applying filter to first signal to generate first intermediate signal, and summing first intermediate signal and second signal.

508: Generating energy ratio of energies of first virtual microphone and second virtual microphone.

510: Detecting acoustic voice activity of speaker when energy ratio is greater than threshold value.
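The flow of FIG. 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: the calibration filter is taken as a(z) = 1, beta is a single scalar gain, the delay gamma is a whole number of samples, and the function names, the `eps` guard, and the threshold value are all hypothetical.

```python
import math

def delay(x, n):
    """Delay sequence x by n whole samples, zero-padding the start."""
    if n <= 0:
        return list(x)
    return ([0.0] * n + list(x))[:len(x)]

def virtual_mics(o1, o2, beta, gamma):
    """Form V1 and V2 from two physical microphone signals.

    Assumptions: a(z) = 1 (microphones already calibrated), beta is a
    scalar, gamma is an integer delay in samples applied to O1.
    """
    o1d = delay(o1, gamma)
    v1 = [-beta * b + a for a, b in zip(o1d, o2)]   # responds to speech
    v2 = [b - beta * a for a, b in zip(o1d, o2)]    # minimal speech response
    return v1, v2

def energy_ratio(v1, v2, window=200, eps=1e-12):
    """Square root of the energy ratio of V1 to V2 for each full window."""
    ratios = []
    for start in range(0, len(v1) - window + 1, window):
        e1 = sum(s * s for s in v1[start:start + window])
        e2 = sum(s * s for s in v2[start:start + window])
        ratios.append(math.sqrt(e1 / (e2 + eps)))   # eps avoids division by zero
    return ratios

def detect_voice(o1, o2, beta, gamma, threshold, window=200):
    """Return one True/False decision per window: ratio above threshold."""
    v1, v2 = virtual_mics(o1, o2, beta, gamma)
    return [r > threshold for r in energy_ratio(v1, v2, window)]
```

With a near-field source modeled as O2 = beta times delayed O1, V2 cancels and the ratio is large; with a far-field source modeled as O2 equal to delayed O1, V1 and V2 have equal energy and the ratio stays near 1.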
`
`
`
Sheet 4 of 35: FIG. 6. Plot titled "V1 (top) and V2 (bottom) for fixed beta in noise"; horizontal axis: time (sec).
`
`
`
Sheet 5 of 35: FIG. 7. Plot titled "V1 (top) and V2 (bottom) for fixed beta speech only"; horizontal axis: time (sec).
`
`
`
Sheet 6 of 35: FIG. 8. Plot titled "V1 (top) and V2 (bottom) for fixed beta speech in noise"; horizontal axis: time (sec).
`
`
`
Sheet 7 of 35: FIG. 9. Plot titled "V1 (top) and V2 (bottom) for adaptive beta in noise"; horizontal axis: time (sec).
`
`
`
Sheet 8 of 35: FIG. 10. Plot titled "V1 (top) and V2 (bottom) for adaptive beta speech only"; horizontal axis: time (sec).
`
`
`
Sheet 9 of 35: FIG. 11. Plot titled "V1 (top) and V2 (bottom) for adaptive beta speech in noise"; horizontal axis: time (sec).
`
`
`
Sheet 10 of 35

FIG. 12. Block diagram; block labels: Processor, Detection Subsystem, Denoising Subsystem, Voicing Sensors; reference numerals 1230, 1240, 1250.

FIG. 13. Block diagram; block labels: Processor, Detection, Denoising.
`
`
`
Sheet 11 of 35: FIG. 14. Block diagram; labels: Noise n(n), Noise Removal, Cleaned Speech.
`
`
`
`
Sheet 12 of 35: FIG. 15. Flow diagram with NAVSAD and PSAD portions.

Constants:
V = 0 if noise, 1 if UV, 2 if V
VTC = voiced threshold for corr
VTS = voiced threshold for std. dev.
ff = forgetting factor for std. dev.
num_ma = # of taps in m.a. filter
UV_ma = UV std dev m.a. thresh
UV_std = UV std dev threshold
UV = binary values denoting UV detected in each subband
num_begin = # win at "beginning"

Variables:
bh1 = LMS calc of MIC 1-2 TF
keep_old = 1 if last win V/UV, 0 ow
sd_ma_vector = last NV sd values
sd_ma = m.a. of the last NV sd

Box text (order approximate):
Read data from m1, m2, gems
Step 10 msec
Calculate XCORR of m1, gems
Calc mean(abs(XCORR)) = MC
Calc STD DEV of gems = GSD
Is MC > VTC and GSD > VTS?
V(window) = 2
bh1 = bh1_old
UV = [0,0]; filter m1 and m2 into 2 bands, 1500-2500 and 2500-3500 Hz
Calculate bh1 using Pathfinder for each subband
new_sum = sum(abs(bh1))
If not keep_old or at beginning, add new_sum to new_sum_vector (ff numbers long)
new_std = STD DEV of new_sum_vector
If not keep_old or at beginning, shift sd_ma_vector to right
Replace first value in sd_ma_vector with old_std
Filter sd_ma_vector with moving average filter to get sd_ma
Decision on new_std against sd_ma and the UV thresholds, or are we at the beginning?
UV(subband) = 2
keep_old = 1
old_std = new_std
keep_old = 0
bh1_old = bh1
After both subbands checked, is CEIL(SUM(UV)/2) = 1?
`
`
`
Sheet 13 of 35

FIG. 16A. Plot titled "Gems and Mean Correlation"; horizontal axis ticks 0 to 4.

FIG. 16B. Plot titled "Gems and Standard Deviation"; horizontal axis ticks 0 to 4.
`
`
`
Sheet 14 of 35: FIG. 17. Plot; labels: Voicing, Noise, Acoustic; reference numeral 1706.
`
`
`
Sheet 15 of 35: FIG. 18. Drawing; labels: Linear array, midline.
`
`
`
Sheet 16 of 35: FIG. 19 (1900). Plot titled "d1 versus delta M for delta d = 1, 2, 3, 4 cm"; axis label: d1 (cm).
`
`
`
Sheet 17 of 35: FIG. 20 (2000). Plot titled "Acoustic data (solid) and gain parameter (dashed)"; labels: Gain Parameter 2002, Acoustic Data; horizontal axis: time (samples).
`
`
`
Sheet 18 of 35: FIG. 21 (2100). Plot titled "Mic1 and V for "pop pan" in \headmic\micgemsp1.bin"; labels: Voicing Signal, Audio Signal 2104, Gems Signal 2106, Unvoiced Level, Not Voiced; horizontal axis: time (samples).
`
`
`
Sheet 19 of 35: FIG. 22 (2200). Block diagram; labels: SIGNAL s(n), NOISE n(n), Noise Removal, Cleaned Speech.
`
`
`
`
`
`
`
`
Sheet 23 of 35: FIG. 27 (drawing)
`
`
`
Sheet 24 of 35

FIG. 28 (flow diagram 2800)

2802: Receive acoustic signals at a first physical microphone and a second physical microphone.

2804: Output first microphone signal from first physical microphone and second microphone signal from second physical microphone.

2806: Form first virtual microphone using the first combination of first microphone signal and second microphone signal.

2808: Form second virtual microphone using second combination of first microphone signal and second microphone signal.

2810: Generate denoised output signals having less acoustic noise than received acoustic signals.

FIG. 29 (flow diagram 2900)

2902: Form physical microphone array including first physical microphone and second physical microphone.

2904: Form virtual microphone array including first virtual microphone and second virtual microphone using signals from physical microphone array.
`
`
`
Sheet 25 of 35

Polar plot titled "Linear response of V2 to a speech source at 0.10 meters".

FIG. 31. Polar plot titled "Linear response of V2 to a noise source at 1 meters".
`
`
`
Sheet 26 of 35

Polar plot titled "Linear response of V1 to a speech source at 0.10 meters".

Polar plot titled "Linear response of V1 to a noise source at 1 meters".
`
`
`
`
`
`
Sheet 27 of 35: Polar plot titled "Linear response of V1 to a speech source at 0.1 meters".
`
`
`
Sheet 28 of 35: FIG. 35. Plot titled "Frequency response at 0 degrees"; curve label: Cardioid speech response; vertical axis: Response (dB); horizontal axis: Frequency (Hz), 0 to 8000.
`
`
`
Sheet 29 of 35

FIG. 36. Plot titled "V1 (top, dashed) and V2 speech response vs. B assuming d_s = 0.1m"; vertical axis: Response (dB).

FIG. 37. Plot titled "V1/V2 for speech versus B assuming d_s = 0.1m"; vertical axis: V1/V2 for speech (dB); horizontal axis: B.
`
`
`
Sheet 30 of 35

FIG. 38. Plot titled "B factor vs. actual d_s assuming d_s = 0.1m and theta = 0"; horizontal axis: Actual d_s (meters).

FIG. 39. Plot titled "B versus theta assuming d_s = 0.1m"; horizontal axis: theta (degrees).
`
`
`
Sheet 31 of 35: FIG. 40. Two-panel plot; vertical axes: Amplitude (dB) and Phase (degrees); horizontal axis: Frequency (Hz), 0 to 8000.
`
`
`
Sheet 32 of 35: FIG. 41. Two-panel plot; vertical axes: Amplitude (dB) and Phase (degrees); horizontal axis: Frequency (Hz), 0 to 8000.
`
`
`
Sheet 33 of 35: FIG. 42. Two-panel plot titled "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 30"; vertical axes: Amplitude (dB) and Phase (degrees); horizontal axis: Frequency (Hz), 0 to 8000.
`
`
`
Sheet 34 of 35: FIG. 43. Two-panel plot titled "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 45"; vertical axes: Amplitude (dB) and Phase (degrees); horizontal axis: Frequency (Hz), 0 to 8000.
`
`
`
Sheet 35 of 35: FIG. 44. Plot titled "Original V1 (top) and cleaned V1 (bottom) with simplified VAD (dashed) in noise"; labels: Noisy, Cleaned; horizontal axis: Time (samples at 8 kHz/sec).
`
`
`
`ACOUSTIC VOICE ACTIVITY DETECTION
`
`(AVAD) FOR ELECTRONIC SYSTEMS
`
`RELATED APPLICATIONS
`
This application claims the benefit of U.S. Patent Application No. 61/108,426, filed Oct. 24, 2008.
`This application is a continuation in part of U.S. patent
`application Ser. No. 11/805,987, filed May 25, 2007.
`This application is a continuation in part of U.S. patent
`application Ser. No. 12/139,333, filed Jun. 13, 2008.
`
`TECHNICAL FIELD
`
`The disclosure herein relates generally to noise suppres-
`sion. In particular, this disclosure relates to noise suppression
`systems, devices, and methods for use in acoustic applica-
`tions.
`
`BACKGROUND
`
`The ability to correctly identify voiced and unvoiced
`speech is critical to many speech applications including
`speech recognition, speaker verification, noise suppression,
`and many others. In a typical acoustic application, speech
`from a human speaker is captured and transmitted to a
`receiver in a different location. In the speaker’s environment
`there may exist one or more noise sources that pollute the
`speech signal, the signal of interest, with unwanted acoustic
`noise. This makes it difficult or impossible for the receiver,
`whether human or machine, to understand the user’s speech.
Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of single microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic with the proliferation of portable communication devices like mobile telephones. There are methods known in the art for suppressing the noise present in the speech signals, but these normally require a robust method of determining when speech is being produced. Non-acoustic methods have been employed successfully in commercial products such as the Jawbone headset produced by Aliphcom, Inc., San Francisco, Calif. (Aliph), but an acoustic-only solution is desired in some cases (e.g., for reduced cost, as a supplement to the non-acoustic sensor, etc.).
`
`INCORPORATION BY REFERENCE
`
Each patent, patent application, and/or publication mentioned in this specification is herein incorporated by reference in its entirety to the same extent as if each individual patent, patent application, and/or publication was specifically and individually indicated to be incorporated by reference.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a configuration of a two-microphone array with speech source S, under an embodiment.
FIG. 2 is a block diagram of V2 construction using a fixed β(z), under an embodiment.
FIG. 3 is a block diagram of V2 construction using an adaptive β(z), under an embodiment.
FIG. 4 is a block diagram of V1 construction, under an embodiment.
FIG. 5 is a flow diagram of acoustic voice activity detection, under an embodiment.
FIG. 6 shows experimental results of the algorithm using a fixed beta when only noise is present, under an embodiment.
FIG. 7 shows experimental results of the algorithm using a fixed beta when only speech is present, under an embodiment.
FIG. 8 shows experimental results of the algorithm using a fixed beta when speech and noise is present, under an embodiment.
FIG. 9 shows experimental results of the algorithm using an adaptive beta when only noise is present, under an embodiment.
FIG. 10 shows experimental results of the algorithm using an adaptive beta when only speech is present, under an embodiment.
FIG. 11 shows experimental results of the algorithm using an adaptive beta when speech and noise is present, under an embodiment.
FIG. 12 is a block diagram of a NAVSAD system, under an embodiment.
FIG. 13 is a block diagram of a PSAD system, under an embodiment.
FIG. 14 is a block diagram of a denoising subsystem, referred to herein as the Pathfinder system, under an embodiment.
FIG. 15 is a flow diagram of a detection algorithm for use in detecting voiced and unvoiced speech, under an embodiment.
FIGS. 16A, 16B, and 17 show data plots for an example in which a subject twice speaks the phrase "pop pan", under an embodiment.
FIG. 16A plots the received GEMS signal for this utterance along with the mean correlation between the GEMS signal and the Mic 1 signal and the threshold T1 used for voiced speech detection, under an embodiment.
FIG. 16B plots the received GEMS signal for this utterance along with the standard deviation of the GEMS signal and the threshold T2 used for voiced speech detection, under an embodiment.
FIG. 17 plots voiced speech detected from the acoustic or audio signal, along with the GEMS signal and the acoustic noise; no unvoiced speech is detected in this example because of the heavy background babble noise, under an embodiment.
FIG. 18 is a microphone array for use under an embodiment of the PSAD system.
FIG. 19 is a plot of ΔM versus d1 for several Δd values, under an embodiment.
FIG. 20 shows a plot of the gain parameter as the sum of the absolute values of H1(z) and the acoustic data or audio from microphone 1, under an embodiment.
FIG. 21 is an alternative plot of acoustic data presented in FIG. 20, under an embodiment.
FIG. 22 is a two-microphone adaptive noise suppression system, under an embodiment.
FIG. 23 is a generalized two-microphone array (DOMA) including an array and speech source S configuration, under an embodiment.
FIG. 24 is a system for generating or producing a first order gradient microphone V using two omnidirectional elements O1 and O2, under an embodiment.
FIG. 25 is a block diagram for a DOMA including two physical microphones configured to form two virtual microphones V1 and V2, under an embodiment.
FIG. 26 is a block diagram for a DOMA including two physical microphones configured to form N virtual microphones V1 through VN, where N is any number greater than one, under an embodiment.
`
`
`
FIG. 27 is an example of a headset or head-worn device that includes the DOMA, as described herein, under an embodiment.
FIG. 28 is a flow diagram for denoising acoustic signals using the DOMA, under an embodiment.
FIG. 29 is a flow diagram for forming the DOMA, under an embodiment.
FIG. 30 is a plot of linear response of virtual microphone V2 with β=0.8 to a 1 kHz speech source at a distance of 0.1 m, under an embodiment.
FIG. 31 is a plot of linear response of virtual microphone V2 with β=0.8 to a 1 kHz noise source at a distance of 1.0 m, under an embodiment.
FIG. 32 is a plot of linear response of virtual microphone V1 with β=0.8 to a 1 kHz speech source at a distance of 0.1 m, under an embodiment.
FIG. 33 is a plot of linear response of virtual microphone V1 with β=0.8 to a 1 kHz noise source at a distance of 1.0 m, under an embodiment.
FIG. 34 is a plot of linear response of virtual microphone V1 with β=0.8 to a speech source at a distance of 0.1 m for frequencies of 100, 500, 1000, 2000, 3000, and 4000 Hz, under an embodiment.
FIG. 35 is a plot showing comparison of frequency responses for speech for the array of an embodiment and for a conventional cardioid microphone, under an embodiment.
FIG. 36 is a plot showing speech response for V1 (top, dashed) and V2 (bottom, solid) versus B with d_s assumed to be 0.1 m, under an embodiment.
FIG. 37 is a plot showing a ratio of V1/V2 speech responses shown in FIG. 31 versus B, under an embodiment.
FIG. 38 is a plot of B versus actual d_s assuming that d_s=10 cm and theta=0, under an embodiment.
FIG. 39 is a plot of B versus theta with d=10 cm and assuming d_s=10 cm, under an embodiment.
FIG. 40 is a plot of amplitude (top) and phase (bottom) response of N(s) with B=1 and D=-7.2 usec, under an embodiment.
FIG. 41 is a plot of amplitude (top) and phase (bottom) response of N(s) with B=1.2 and D=-7.2 usec, under an embodiment.
FIG. 42 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with θ1=0 degrees and θ2=30 degrees, under an embodiment.
FIG. 43 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with θ1=0 degrees and θ2=45 degrees, under an embodiment.
FIG. 44 shows experimental results for a 2d0=19 mm array using a linear B of 0.83 and B1=B2=1 on a Bruel and Kjaer Head and Torso Simulator (HATS) in very loud (~85 dBA) music/speech noise environment.

DETAILED DESCRIPTION

Acoustic Voice Activity Detection (AVAD) methods and systems are described herein. The AVAD methods and systems, which include algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either a fixed or an adaptive filter. The adaptive filter generally results in a more accurate and noise-robust VAD signal but requires training. In addition, restrictions can be placed on the filter to ensure that it is training only on speech and not on environmental noise.

In the following description, numerous specific details are introduced to provide a thorough understanding of, and enabling description for, embodiments. One skilled in the relevant art, however, will recognize that these embodiments can be practiced without one or more of the specific details, or with other components, systems, etc. In other instances, well-known structures or operations are not shown, or are not described in detail, to avoid obscuring aspects of the disclosed embodiments.

FIG. 1 is a configuration of a two-microphone array of the AVAD with speech source S, under an embodiment. The AVAD of an embodiment uses two physical microphones (O1 and O2) to form two virtual microphones (V1 and V2). The virtual microphones of an embodiment are directional microphones, but the embodiment is not so limited. The physical microphones of an embodiment include omnidirectional microphones, but the embodiments described herein are not limited to omnidirectional microphones. The virtual microphone (VM) V2 is configured in such a way that it has minimal response to the speech of the user, while V1 is configured so that it does respond to the user's speech but has a very similar noise magnitude response to V2, as described in detail herein. The PSAD VAD methods can then be used to determine when speech is taking place. A further refinement is the use of an adaptive filter to further minimize the speech response of V2, thereby increasing the speech energy ratio used in PSAD and resulting in better overall performance of the AVAD.

The PSAD algorithm as described herein calculates the ratio of the energies of two directional microphones M1 and M2:

$$R = \sqrt{\frac{\sum_i M_1(z_i)^2}{\sum_i M_2(z_i)^2}}$$

where the "z" indicates the discrete frequency domain and "i" ranges from the beginning of the window of interest to the end, but the same relationship holds in the time domain. The summation can occur over a window of any length; 200 samples at a sampling rate of 8 kHz has been used to good effect. Microphone M1 is assumed to have a greater speech response than microphone M2. The ratio R depends on the relative strength of the acoustic signal of interest as detected by the microphones.

For matched omnidirectional microphones (i.e. they have the same response to acoustic signals for all spatial orientations and frequencies), the size of R can be calculated for speech and noise by approximating the propagation of speech and noise waves as spherically symmetric sources. For these the energy of the propagating wave decreases as 1/r^2:

$$R = \sqrt{\frac{\sum_i M_1(z_i)^2}{\sum_i M_2(z_i)^2}} = \frac{d_2}{d_1} = \frac{d_1 + d}{d_1}$$

The distance d1 is the distance from the acoustic source to M1, d2 is the distance from the acoustic source to M2, and d=d2-d1 (see FIG. 1). It is assumed that O1 is closer to the speech source (the user's mouth) so that d is always positive. If the microphones and the user's mouth are all on a line, then d=2d0, the distance between the microphones. For matched omnidirectional microphones, the magnitude of R depends only on the relative distance between the microphones and the acoustic source. For noise sources, the distances are typically a meter or more, and for speech sources, the distances are on the order of 10 cm, but the distances are not so limited. Therefore for a 2-cm array typical values of R are:
`
$$R_S \approx \frac{d_2}{d_1} \approx \frac{12\ \text{cm}}{10\ \text{cm}} = 1.2 \qquad R_N \approx \frac{d_2}{d_1} \approx \frac{102\ \text{cm}}{100\ \text{cm}} = 1.02$$

where the "S" subscript denotes the ratio for speech sources and "N" the ratio for noise sources. There is not a significant amount of separation between noise and speech sources in this case, and therefore it would be difficult to implement a robust solution using simple omnidirectional microphones.

A better implementation is to use directional microphones where the second microphone has minimal speech response. As described herein, such microphones can be constructed using omnidirectional microphones O1 and O2:

$$V_1(z) = -\beta(z)\alpha(z)O_2(z) + O_1(z)z^{-\gamma}$$

$$V_2(z) = \alpha(z)O_2(z) - \beta(z)O_1(z)z^{-\gamma}$$

where α(z) is a calibration filter used to compensate O2's response so that it is the same as O1, β(z) is a filter that describes the relationship between O1 and calibrated O2 for speech, and γ is a fixed delay that depends on the size of the array. There is no loss of generality in defining α(z) as above, as either microphone may be compensated to match the other. For this configuration V1 and V2 have very similar noise response magnitudes and very dissimilar speech response magnitudes if

$$\gamma = \frac{d}{c}$$

where again d=2d0 and c is the speed of sound in air, which is temperature dependent and approximately

$$c = 331.3\sqrt{1 + \frac{T}{273.15}}\ \frac{\text{m}}{\text{sec}}$$

where T is the temperature of the air in Celsius.

The filter β(z) can be calculated using wave theory to be

$$\beta(z) = \frac{d_1}{d_2} = \frac{d_1}{d_1 + d} \qquad [2]$$

where again d_k is the distance from the user's mouth to O_k. FIG. 2 is a block diagram of V2 construction using a fixed β(z), under an embodiment. This fixed (or static) β works sufficiently well if the calibration filter α(z) is accurate and d1 and d2 are accurate for the user. This fixed-β algorithm, however, neglects important effects such as reflection, diffraction, poor array orientation (i.e. the microphones and the mouth of the user are not all on a line), and the possibility of different d1 and d2 values for different users.

The filter β(z) can also be determined experimentally using an adaptive filter. FIG. 3 is a block diagram of V2 construction using an adaptive β(z), under an embodiment, where:

$$\tilde{\beta}(z) = \frac{\alpha(z)O_2(z)}{z^{-\gamma}O_1(z)}$$
The adaptive process varies β̃(z) to minimize the output of V2 when only speech is being received by O1 and O2. A small amount of noise may be tolerated with little ill effect, but it is preferred that only speech is being received when the coefficients of β̃(z) are calculated. Any adaptive process may be used; a normalized least-mean squares (NLMS) algorithm was used in the examples below.
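A normalized least-mean squares adaptation of this kind can be sketched as follows. This is a generic time-domain NLMS loop, not the code used for the examples in this specification: the filter length, step size `mu`, regularizer `eps`, and the function name are hypothetical, the calibration filter is assumed to be unity, and the caller is assumed to supply O1 already delayed by γ and to run the loop only while speech is present.

```python
def nlms_beta(o1_delayed, o2, taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients w so that w applied to delayed O1 tracks O2.

    The residual e[n] = o2[n] - sum_k w[k] * o1_delayed[n - k] plays the
    role of the V2 output, which the adaptation drives toward zero.
    Returns the final coefficients and the residual sequence.
    """
    w = [0.0] * taps
    residual = []
    for n in range(len(o2)):
        # Most recent `taps` input samples, newest first, zero before start.
        x = [o1_delayed[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = o2[n] - y
        norm = eps + sum(xk * xk for xk in x)      # input power normalization
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return w, residual
```

In this sketch the learned coefficients stand in for the adaptive filter, and the residual is what a V2 built from them would output; freezing `w` when speech is absent is left to the caller.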
V1 can be constructed using the current value for β̃(z), or the fixed filter β(z) can be used for simplicity. FIG. 4 is a block diagram of V1 construction, under an embodiment.

Now the ratio R is

$$R = \frac{\|V_1(z)\|}{\|V_2(z)\|} = \sqrt{\frac{\left(-\beta(z)\alpha(z)O_2(z) + O_1(z)z^{-\gamma}\right)^2}{\left(\alpha(z)O_2(z) - \beta(z)O_1(z)z^{-\gamma}\right)^2}}$$

where double bar indicates norm and again any size window may be used. If β(z) has been accurately calculated, the rati