Case 6:21-cv-00984-ADA   Document 19-2   Filed 12/23/21

EXHIBIT B

(12) United States Patent
Burnett

(10) Patent No.: US 7,246,058 B2
(45) Date of Patent: Jul. 17, 2007

(54) DETECTING VOICED AND UNVOICED SPEECH USING BOTH ACOUSTIC AND NONACOUSTIC SENSORS

(75) Inventor: Gregory C. Burnett, Livermore, CA (US)

(73) Assignee: Aliph, Inc., San Francisco, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 688 days.

(21) Appl. No.: 10/159,770

(22) Filed: May 30, 2002

(65) Prior Publication Data: US 2002/0198705 A1, Dec. 26, 2002

Related U.S. Application Data
(60) Provisional application No. 60/294,383, filed on May 30, 2001; provisional application No. 60/335,100, filed on Oct. 30, 2001; provisional application No. 60/332,202, filed on Nov. 21, 2001; provisional application Nos. 60/362,162, 60/362,103, 60/362,170, 60/361,981, and 60/362,161, all filed on Mar. 5, 2002; provisional application Nos. 60/368,209, 60/368,208, and 60/368,343, all filed on Mar. 27, 2002.

(51) Int. Cl.: G10L 11/06 (2006.01)
(52) U.S. Cl.: 704/226; 704/214
(58) Field of Classification Search: None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
3,789,166 A    1/1974    Sebesta
(Continued)

FOREIGN PATENT DOCUMENTS
EP    0 637 187 A    2/1995
(Continued)

OTHER PUBLICATIONS
Gregory C. Burnett: "The Physiological Basis of Glottal Electromagnetic Micropower Sensors (GEMS) and Their Use in Defining an Excitation Function for the Human Vocal Tract", Dissertation, University of California at Davis, Jan. 1999, USA.
(Continued)

Primary Examiner—Abul K. Azad
(74) Attorney, Agent, or Firm—Courtney Staniford & Gregory LLP

(57) ABSTRACT

Systems and methods are provided for detecting voiced and unvoiced speech in acoustic signals having varying levels of background noise. The systems receive acoustic signals at two microphones, and generate difference parameters between the acoustic signals received at each of the two microphones. The difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals. The systems identify information of the acoustic signals as unvoiced speech when the difference parameters exceed a first threshold, and identify information of the acoustic signals as voiced speech when the difference parameters exceed a second threshold. Further, embodiments of the systems include non-acoustic sensors that receive physiological information to aid in identifying voiced speech.

5 Claims, 10 Drawing Sheets

U.S. PATENT DOCUMENTS (Continued)

4,006,318     2/1977    Sebesta et al.
4,591,668     5/1986    Iwata
4,653,102     3/1987    Hansen .................. 381/92
4,777,649     10/1988   Carlson et al. .......... 704/233
4,901,354     2/1990    Gollmar et al.
5,097,515     3/1992    Baba
5,212,764     5/1993    Ariyoshi
5,400,409     3/1995    Linhard
5,406,622     4/1995    Silverberg et al.
5,414,776     5/1995    Sims, Jr.
5,473,702     12/1995   Yoshida et al.
5,515,865     5/1996    Scanlon et al.
5,517,435     5/1996    Sugiyama
5,539,859     7/1996    Robbe et al.
5,590,241     12/1996   Park et al. ............. 704/227
5,633,935     5/1997    Kanamori et al.
5,649,055     7/1997    Gupta et al.
5,664,052     9/1997    Nishiguchi et al.
5,684,460     11/1997   Scanlon et al.
5,729,694     3/1998    Holzrichter et al.
5,754,665     5/1998    Hosoi et al.
5,835,608     11/1998   Warnaka et al.
5,853,005     12/1998   Scanlon
5,917,921     6/1999    Sasaki et al.
5,966,090     10/1999   McEwan
5,986,600     11/1999   McEwan
6,006,175     12/1999   Holzrichter ............. 704/208
6,009,396     12/1999   Nagata
6,069,963     5/2000    Martin et al.
6,191,724     2/2001    McEwan
6,233,551     5/2001    Cho et al. .............. 704/208
6,266,422     7/2001    Ikeda
6,430,295     8/2002    Handel et al. ........... 704/214
2002/0039425 A1    4/2002    Burnett et al.

FOREIGN PATENT DOCUMENTS
EP    0 795 851 A2    9/1997
EP    0 984 660 A2    3/2000
JP    2000 312 395    11/2000
JP    2001 189 987    7/2001
WO    WO 02 07151     1/2002

OTHER PUBLICATIONS

Todd J. Gable et al.: "Speaker Verification Using Combined Acoustic and EM Sensor Signal Processing", IEEE Intl. Conf. on Acoustics, Speech & Signal Processing (ICASSP-2001), Salt Lake City, USA, 2001.
A. Hussain: "Intelligibility Assessment of a Multi-Band Speech Enhancement Scheme", Proceedings IEEE Intl. Conf. on Acoustics, Speech & Signal Processing (ICASSP-2000), Istanbul, Turkey, Jun. 2000.
Zhao Li et al.: "Robust Speech Coding Using Microphone Arrays", Signals, Systems and Computers, 1997, Conf. Record of 31st Asilomar Conf., Nov. 2-5, 1997, IEEE Comput. Soc., Nov. 2, 1997, USA.
L. C. Ng et al.: "Denoising of Human Speech Using Combined Acoustic and EM Sensor Signal Processing", 2000 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Proceedings (Cat. No. 00CH37100), Istanbul, Turkey, Jun. 5-9, 2000, XP002186255, ISBN 0-7803-6293-4.
S. Affes et al.: "A Signal Subspace Tracking Algorithm for Microphone Array Processing of Speech", IEEE Transactions on Speech and Audio Processing, N.Y., USA, vol. 5, No. 5, Sep. 1, 1997, XP000774303, ISSN 1063-6676.

* cited by examiner

[Sheet 1 of 10, Figure 1: block diagram of the NAVSAD system 100. Microphones 10 and voicing sensors 20 feed processor 30, which contains the detection subsystem 50 and the denoising subsystem 40.]

[Sheet 2 of 10, Figure 2: block diagram of the PSAD system 200. Microphones 10 feed processor 30, which contains the detection subsystem 50 and the denoising subsystem 40.]

[Sheet 3 of 10, Figure 3: block diagram of the Pathfinder denoising system, showing the signal s(n) and noise n(n) arriving at Mic 1 and Mic 2, the voicing activity detector (VAD), the noise removal stage, and the cleaned speech output.]

[Sheet 4 of 10, Figure 4: flow diagram of detection algorithm 50. The NAVSAD branch reads 20 msec of data from m1, m2, and the GEMS sensor, calculates the cross-correlation (XCORR) of m1 and GEMS and the standard deviation of the GEMS signal (GSD), and sets V(window) = 2 (voiced) when the mean correlation exceeds the voiced correlation threshold VTC and GSD exceeds the voiced standard-deviation threshold VTS. The PSAD branch filters m1 and m2 into 1500-2500 Hz and 2500-3500 Hz subbands, calculates the Pathfinder gain bhi for each subband, tracks the standard deviation of sum(abs(bhi)) against a moving average, and flags a subband as unvoiced when the new standard deviation exceeds both the moving-average threshold (UV_ma) and the absolute threshold (UV_std); if the subband results indicate unvoiced speech, V(window) = 1.]


`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 8 of 19
`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 8 of 19
`
`U.S. Patent
`
`Jul. 17, 2007
`
`Sheet 5 of 10
`
`US 7,246,058 B2
`
`Figure 5A
`
`GEMS AND MEAN CORRELATION
`
`

`

`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 9 of 19
`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 9 of 19
`
`U.S. Patent
`
`Jul. 17, 2007
`
`Sheet 6 of 10
`
`US 7,246,058 B2
`
`
`T
`T
`T
`T
`T
`T
`T
`|
`
`600
`
`
`
`|
`
`ACOUSTIC
`NOISE
`
`|
`
`
`
`
`
`. VOICING
`602 NE — |
`bo
`|
`|
`
`|
`
`mo
`
`| f | -
`
`1
`
`|
`
`a
`
`aL
`2
`
`_t
`.
`
`_t
`3
`
`_S
`
`J
`
`

`

`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 10 of 19
`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 10 of 19
`
`U.S. Patent
`
`Jul. 17, 2007
`
`Sheet 7 of 10
`
`US 7,246,058 B2
`
` Linear array
`
`midline
`
`

`

[Sheet 8 of 10, Figure 8: plot 800 of d1 versus ΔM for Δd = 1, 2, 3, 4 cm; the horizontal axis is d1 (cm) from 0 to 30.]


`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 12 of 19
`Case 6:21-cv-00984-ADA Document 19-2 Filed 12/23/21 Page 12 of 19
`
`U.S. Patent
`
`Jul. 17, 2007
`
`Sheet 9 of 10
`
`US 7,246,058 B2
`
`1
`
`900
`
`TT
`
`
`
`
`
`|
`
`i
`Acoustic data (solid) and gain parameter (dashed)
`T
`—
`T
`T
`1
`
`|
`
`|
`+—— ACOUSTIC DATA
`904
`
`i
`
`t
`1
`L
`L
`_.
`1
`l
`|
`4
`0
`0.5
`4
`1.5
`2
`2.5
`3
`3.5
`4
`time (samples)
`x 10°
`
`Figure 9
`
`

`

[Sheet 10 of 10, Figure 10: Mic 1 and V for "pop pan" in \headmic\micgems_p1.bin, showing the voicing signal 1002 against the voiced, unvoiced, and not-voiced levels, the Mic 1 acoustic data 1004, and the GEMS signal 1006, plotted versus time in samples.]

DETECTING VOICED AND UNVOICED SPEECH USING BOTH ACOUSTIC AND NONACOUSTIC SENSORS

RELATED APPLICATIONS

This application claims the benefit of U.S. application Nos. 60/294,383 filed May 30, 2001; 09/905,361 filed Jul. 12, 2001; 60/335,100 filed Oct. 30, 2001; 60/332,202 and 09/990,847, both filed Nov. 21, 2001; 60/362,103, 60/362,161, 60/362,162, 60/362,170, and 60/361,981, all filed Mar. 5, 2002; 60/368,208, 60/368,209, and 60/368,343, all filed Mar. 27, 2002; all of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The disclosed embodiments relate to the processing of speech signals.

BACKGROUND

The ability to correctly identify voiced and unvoiced speech is critical to many speech applications including speech recognition, speaker verification, noise suppression, and many others. In a typical acoustic application, speech from a human speaker is captured and transmitted to a receiver in a different location. In the speaker's environment there may exist one or more noise sources that pollute the speech signal, or the signal of interest, with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the user's speech.

Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic now with the proliferation of portable communication devices like cellular telephones and personal digital assistants because, in many cases, the quality of service provided by the device depends on the quality of the voice services offered by the device. There are methods known in the art for suppressing the noise present in the speech signals, but these methods demonstrate performance shortcomings that include unusually long computing time, requirements for cumbersome hardware to perform the signal processing, and distortion of the signals of interest.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a NAVSAD system, under an embodiment.

FIG. 2 is a block diagram of a PSAD system, under an embodiment.

FIG. 3 is a block diagram of a denoising system, referred to herein as the Pathfinder system, under an embodiment.

FIG. 4 is a flow diagram of a detection algorithm for use in detecting voiced and unvoiced speech, under an embodiment.

FIG. 5A plots the received GEMS signal for an utterance along with the mean correlation between the GEMS signal and the Mic 1 signal and the threshold for voiced speech detection.

FIG. 5B plots the received GEMS signal for an utterance along with the standard deviation of the GEMS signal and the threshold for voiced speech detection.

FIG. 6 plots voiced speech detected from an utterance along with the GEMS signal and the acoustic noise.

FIG. 7 is a microphone array for use under an embodiment of the PSAD system.

FIG. 8 is a plot of ΔM versus d1 for several Δd values, under an embodiment.

FIG. 9 shows a plot of the gain parameter as the sum of the absolute values of H1(z) and the acoustic data or audio from microphone 1.

FIG. 10 is an alternative plot of acoustic data presented in FIG. 9.

In the figures, the same reference numbers identify identical or substantially similar elements or acts.

Any headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.

DETAILED DESCRIPTION

Systems and methods for discriminating voiced and unvoiced speech from background noise are provided below, including a Non-Acoustic Sensor Voiced Speech Activity Detection (NAVSAD) system and a Pathfinder Speech Activity Detection (PSAD) system. The noise removal and reduction methods provided herein, while allowing for the separation and classification of unvoiced and voiced human speech from background noise, address the shortcomings of typical systems known in the art by cleaning acoustic signals of interest without distortion.

FIG. 1 is a block diagram of a NAVSAD system 100, under an embodiment. The NAVSAD system couples microphones 10 and sensors 20 to at least one processor 30. The sensors 20 of an embodiment include voicing activity detectors or non-acoustic sensors. The processor 30 controls subsystems including a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. Operation of the denoising subsystem 40 is described in detail in the Related Applications. The NAVSAD system works extremely well in any background acoustic noise environment.

FIG. 2 is a block diagram of a PSAD system 200, under an embodiment. The PSAD system couples microphones 10 to at least one processor 30. The processor 30 includes a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. The PSAD system is highly sensitive in low acoustic noise environments and relatively insensitive in high acoustic noise environments. The PSAD can operate independently or as a backup to the NAVSAD, detecting voiced speech if the NAVSAD fails.

Note that the detection subsystems 50 and denoising subsystems 40 of both the NAVSAD and PSAD systems of an embodiment are algorithms controlled by the processor 30, but are not so limited. Alternative embodiments of the NAVSAD and PSAD systems can include detection subsystems 50 and/or denoising subsystems 40 that comprise additional hardware, firmware, software, and/or combinations of hardware, firmware, and software. Furthermore, functions of the detection subsystems 50 and denoising subsystems 40 may be distributed across numerous components of the NAVSAD and PSAD systems.

FIG. 3 is a block diagram of a denoising subsystem 300, referred to herein as the Pathfinder system, under an embodiment. The Pathfinder system is briefly described below, and is described in detail in the Related Applications. Two microphones Mic 1 and Mic 2 are used in the Pathfinder system, and Mic 1 is considered the "signal" microphone. With reference to FIG. 1, the Pathfinder system 300 is equivalent to the NAVSAD system 100 when the voicing activity detector (VAD) 320 is a non-acoustic voicing sensor 20 and the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40. With reference to FIG. 2, the Pathfinder system 300 is equivalent to the PSAD system 200 in the absence of the VAD 320, and when the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40.

The NAVSAD and PSAD systems support a two-level commercial approach in which (i) a relatively less expensive PSAD system supports an acoustic approach that functions in most low- to medium-noise environments, and (ii) a NAVSAD system adds a non-acoustic sensor to enable detection of voiced speech in any environment. Unvoiced speech is normally not detected using the sensor, as it normally does not sufficiently vibrate human tissue. However, in high noise situations detecting the unvoiced speech is not as important, as it is normally very low in energy and easily washed out by the noise. Therefore in high noise environments the unvoiced speech is unlikely to affect the voiced speech denoising. Unvoiced speech information is most important in the presence of little to no noise and, therefore, the unvoiced detection should be highly sensitive in low noise situations, and insensitive in high noise situations. This is not easily accomplished, and comparable acoustic unvoiced detectors known in the art are incapable of operating under these environmental constraints.

The NAVSAD and PSAD systems include an array algorithm for speech detection that uses the difference in frequency content between two microphones to calculate a relationship between the signals of the two microphones. This is in contrast to conventional arrays that attempt to use the time/phase difference of each microphone to remove the noise outside of an "area of sensitivity". The methods described herein provide a significant advantage, as they do not require a specific orientation of the array with respect to the signal.

Further, the systems described herein are sensitive to noise of every type and every orientation, unlike conventional arrays that depend on specific noise orientations. Consequently, the frequency-based arrays presented herein are unique as they depend only on the relative orientation of the two microphones themselves, with no dependence on the orientation of the noise and signal with respect to the microphones. This results in a robust signal processing system with respect to the type of noise, microphones, and orientation between the noise/signal source and the microphones.

The systems described herein use the information derived from the Pathfinder noise suppression system and/or a non-acoustic sensor described in the Related Applications to determine the voicing state of an input signal, as described in detail below. The voicing state includes silent, voiced, and unvoiced states. The NAVSAD system, for example, includes a non-acoustic sensor to detect the vibration of human tissue associated with speech. The non-acoustic sensor of an embodiment is a General Electromagnetic Movement Sensor (GEMS) as described briefly below and in detail in the Related Applications, but is not so limited. Alternative embodiments, however, may use any sensor that is able to detect human tissue motion associated with speech and is unaffected by environmental acoustic noise.

The GEMS is a radio frequency device (2.4 GHz) that allows the detection of moving human tissue dielectric interfaces. The GEMS includes an RF interferometer that uses homodyne mixing to detect small phase shifts associated with target motion. In essence, the sensor sends out weak electromagnetic waves (less than 1 milliwatt) that reflect off of whatever is around the sensor. The reflected waves are mixed with the original transmitted waves and the results analyzed for any change in position of the targets. Anything that moves near the sensor will cause a change in phase of the reflected wave that will be amplified and displayed as a change in voltage output from the sensor. A similar sensor is described by Gregory C. Burnett (1999) in "The physiological basis of glottal electromagnetic micropower sensors (GEMS) and their use in defining an excitation function for the human vocal tract"; Ph.D. Thesis, University of California at Davis.

FIG. 4 is a flow diagram of a detection algorithm 50 for use in detecting voiced and unvoiced speech, under an embodiment. With reference to FIGS. 1 and 2, both the NAVSAD and PSAD systems of an embodiment include the detection algorithm 50 as the detection subsystem 50. This detection algorithm 50 operates in real-time and, in an embodiment, operates on 20 millisecond windows and steps 10 milliseconds at a time, but is not so limited. The voice activity determination is recorded for the first 10 milliseconds, and the second 10 milliseconds functions as a "look-ahead" buffer. While an embodiment uses the 20/10 windows, alternative embodiments may use numerous other combinations of window values.

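For illustration only, the 20/10 windowing just described can be sketched in code as follows; this is not part of the patent, and the function name and the default 8000 Hz sampling rate are assumptions for the example.

```python
import numpy as np

def frames_20ms_step_10ms(signal, fs=8000):
    """Illustrative only: split a signal into 20 ms windows advanced 10 ms at
    a time. The first 10 ms of each window carries the voicing decision; the
    second 10 ms serves as the "look-ahead" buffer described above."""
    win = int(0.020 * fs)   # 20 ms window (160 samples at 8 kHz)
    step = int(0.010 * fs)  # 10 ms step (windows overlap by half)
    n_frames = 1 + max(0, (len(signal) - win) // step)
    return np.stack([signal[i * step:i * step + win] for i in range(n_frames)])
```
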
Consideration was given to a number of multi-dimensional factors in developing the detection algorithm 50. The biggest consideration was maintaining the effectiveness of the Pathfinder denoising technique, described in detail in the Related Applications and reviewed herein. Pathfinder performance can be compromised if the adaptive filter training is conducted on speech rather than on noise. It is therefore important not to exclude any significant amount of speech from the VAD to keep such disturbances to a minimum.

Consideration was also given to the accuracy of the characterization between voiced and unvoiced speech signals, and distinguishing each of these speech signals from noise signals. This type of characterization can be useful in such applications as speech recognition and speaker verification.

Furthermore, the systems using the detection algorithm of an embodiment function in environments containing varying amounts of background acoustic noise. If the non-acoustic sensor is available, this external noise is not a problem for voiced speech. However, for unvoiced speech (and voiced if the non-acoustic sensor is not available or has malfunctioned) reliance is placed on acoustic data alone to separate noise from unvoiced speech. An advantage inheres in the use of two microphones in an embodiment of the Pathfinder noise suppression system, and the spatial relationship between the microphones is exploited to assist in the detection of unvoiced speech. However, there may occasionally be noise levels high enough that the speech will be nearly undetectable and the acoustic-only method will fail. In these situations, the non-acoustic sensor (or hereafter just the sensor) will be required to ensure good performance.

In the two-microphone system, the speech source should be relatively louder in one designated microphone when compared to the other microphone. Tests have shown that this requirement is easily met with conventional microphones when the microphones are placed on the head, as any noise should result in an H1 with a gain near unity.

Regarding the NAVSAD system, and with reference to FIG. 1 and FIG. 3, the NAVSAD relies on two parameters to detect voiced speech. These two parameters include the energy of the sensor in the window of interest, determined in an embodiment by the standard deviation (SD), and optionally the cross-correlation (XCORR) between the acoustic signal from microphone 1 and the sensor data. The energy of the sensor can be determined in any one of a number of ways, and the SD is just one convenient way to determine the energy.

For the sensor, the SD is akin to the energy of the signal, which normally corresponds quite accurately to the voicing state, but may be susceptible to movement noise (relative motion of the sensor with respect to the human user) and/or electromagnetic noise. To further differentiate sensor noise from tissue motion, the XCORR can be used. The XCORR is only calculated to 15 delays, which corresponds to just under 2 milliseconds at 8000 Hz.

The XCORR can also be useful when the sensor signal is distorted or modulated in some fashion. For example, there are sensor locations (such as the jaw or back of the neck) where speech production can be detected but where the signal may have incorrect or distorted time-based information. That is, they may not have well defined features in time that will match with the acoustic waveform. However, XCORR is more susceptible to errors from acoustic noise, and in high (<0 dB SNR) environments is almost useless. Therefore it should not be the sole source of voicing information.

The sensor detects human tissue motion associated with the closure of the vocal folds, so the acoustic signal produced by the closure of the folds is highly correlated with the closures. Therefore, sensor data that correlates highly with the acoustic signal is declared as speech, and sensor data that does not correlate well is termed noise. The acoustic data is expected to lag behind the sensor data by about 0.1 to 0.8 milliseconds (or about 1-7 samples) as a result of the delay time due to the relatively slower speed of sound (around 330 m/s). However, an embodiment uses a 15-sample correlation, as the acoustic wave shape varies significantly depending on the sound produced, and a larger correlation width is needed to ensure detection.

The SD and XCORR signals are related, but are sufficiently different so that the voiced speech detection is more reliable. For simplicity, though, either parameter may be used. The values for the SD and XCORR are compared to empirical thresholds, and if both are above their threshold, voiced speech is declared. Example data is presented and described below.

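As an illustration only, this voiced-speech test can be sketched as follows. The normalization, the helper name, and the threshold handling are assumptions for the example; the patent leaves the thresholds empirical.

```python
import numpy as np

def navsad_voiced(mic1_win, gems_win, sd_thresh, corr_thresh, max_lag=15):
    """Illustrative only: declare voiced speech when both the GEMS energy
    (standard deviation) and the mean absolute cross-correlation between
    Mic 1 and the GEMS signal exceed their empirical thresholds."""
    gsd = np.std(gems_win)  # sensor energy estimated via standard deviation
    # zero-mean, unit-variance copies for the correlation (an assumption)
    m = (mic1_win - mic1_win.mean()) / (np.std(mic1_win) + 1e-12)
    g = (gems_win - gems_win.mean()) / (np.std(gems_win) + 1e-12)
    # cross-correlation over +/- max_lag delays (about 2 ms at 8 kHz)
    xcorr = [np.mean(m[max(0, -k):len(m) - max(0, k)] *
                     g[max(0, k):len(g) - max(0, -k)])
             for k in range(-max_lag, max_lag + 1)]
    mean_corr = np.mean(np.abs(xcorr))
    return (gsd > sd_thresh) and (mean_corr > corr_thresh)
```
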
FIGS. 5A, 5B, and 6 show data plots for an example in which a subject twice speaks the phrase "pop pan", under an embodiment. FIG. 5A plots the received GEMS signal 502 for this utterance along with the mean correlation 504 between the GEMS signal and the Mic 1 signal and the threshold T1 used for voiced speech detection. FIG. 5B plots the received GEMS signal 502 for this utterance along with the standard deviation 506 of the GEMS signal and the threshold T2 used for voiced speech detection. FIG. 6 plots voiced speech 602 detected from the acoustic or audio signal 608, along with the GEMS signal 604 and the acoustic noise 606; no unvoiced speech is detected in this example because of the heavy background babble noise 606. The thresholds have been set so that there are virtually no false negatives, and only occasional false positives. A voiced speech activity detection accuracy of greater than 99% has been attained under any acoustic background noise conditions.

The NAVSAD can determine when voiced speech is occurring with high degrees of accuracy due to the non-acoustic sensor data. However, the sensor offers little assistance in separating unvoiced speech from noise, as unvoiced speech normally causes no detectable signal in most non-acoustic sensors. If there is a detectable signal, the NAVSAD can be used, although use of the SD method is dictated as unvoiced speech is normally poorly correlated. In the absence of a detectable signal, use is made of the system and methods of the Pathfinder noise removal algorithm in determining when unvoiced speech is occurring. A brief review of the Pathfinder algorithm is described below, while a detailed description is provided in the Related Applications.

With reference to FIG. 3, the acoustic information coming into Microphone 1 is denoted by m1(n), the information coming into Microphone 2 is similarly labeled m2(n), and the GEMS sensor is assumed available to determine voiced speech areas. In the z (digital frequency) domain, these signals are represented as M1(z) and M2(z). Then

    M1(z) = S(z) + N2(z)
    M2(z) = N(z) + S2(z)

with

    N2(z) = N(z)H1(z)
    S2(z) = S(z)H2(z)

so that

    M1(z) = S(z) + N(z)H1(z)
    M2(z) = N(z) + S(z)H2(z)                                  (1)

This is the general case for all two microphone systems. There is always going to be some leakage of noise into Mic 1, and some leakage of signal into Mic 2. Equation 1 has four unknowns and only two relationships and cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1. Examine the case where the signal is not being generated—that is, where the GEMS signal indicates voicing is not occurring. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

    M1n(z) = N(z)H1(z)
    M2n(z) = N(z)

where the n subscript on the M variables indicates that only noise is being received. This leads to

    M1n(z) = M2n(z)H1(z)

    H1(z) = M1n(z) / M2n(z)                                   (2)

H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when only noise is being received. The calculation can be done adaptively, so that if the noise changes significantly H1(z) can be recalculated quickly.

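The patent leaves the choice of system identification algorithm open. As one concrete illustration only (an assumption, not the patent's stated method), a normalized LMS adaptive filter can estimate a time-domain H1 from noise-only data:

```python
import numpy as np

def estimate_h1_nlms(mic2_noise, mic1_noise, n_taps=20, mu=0.1, eps=1e-8):
    """Illustrative only: estimate the Mic 2 -> Mic 1 noise transfer function
    H1 with a normalized LMS adaptive filter, run over noise-only samples
    (windows where the VAD reports no voicing). Inputs are 1-D numpy arrays;
    the tap count and step size are assumptions, not values from the patent."""
    h1 = np.zeros(n_taps)
    for n in range(n_taps, len(mic2_noise)):
        x = mic2_noise[n - n_taps:n][::-1]   # most recent Mic 2 samples first
        err = mic1_noise[n] - h1 @ x         # prediction error against Mic 1
        h1 += mu * err * x / (x @ x + eps)   # NLMS coefficient update
    return h1
```
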
With a solution for one of the unknowns in Equation 1, solutions can be found for another, H2(z), by using the amplitude of the GEMS or similar device along with the amplitude of the two microphones. When the GEMS indicates voicing, but the recent (less than 1 second) history of the microphones indicates low levels of noise, assume that n(s) = N(z) ≈ 0. Then Equation 1 reduces to

    M1s(z) = S(z)
    M2s(z) = S(z)H2(z)

which in turn leads to

    M2s(z) = M1s(z)H2(z)

    H2(z) = M2s(z) / M1s(z)

which is the inverse of the H1(z) calculation, but note that different inputs are being used.

After calculating H1(z) and H2(z) above, they are used to remove the noise from the signal. Rewrite Equation 1 as

    S(z) = M1(z) - N(z)H1(z)
    N(z) = M2(z) - S(z)H2(z)
    S(z) = M1(z) - [M2(z) - S(z)H2(z)]H1(z)
    S(z)[1 - H2(z)H1(z)] = M1(z) - M2(z)H1(z)

and solve for S(z) as:

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - H1(z)H2(z)]            (3)

In practice H2(z) is usually quite small, so that H2(z)H1(z) << 1, and

    S(z) ≈ M1(z) - M2(z)H1(z),

obviating the need for the H2(z) calculation.

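A minimal sketch of this simplified cleanup step, applied in the time domain, might look like the following; the function name and the use of a direct FIR convolution are assumptions for the example, with h1 taken as the impulse response corresponding to H1(z) (for instance as estimated in the sketch above).

```python
import numpy as np

def pathfinder_cleanup(mic1, mic2, h1):
    """Illustrative only: time-domain version of S(z) ~= M1(z) - M2(z)H1(z).
    Filter Mic 2 with the noise transfer function h1 and subtract the result
    from Mic 1 to obtain the cleaned speech estimate."""
    noise_estimate = np.convolve(mic2, h1, mode="full")[:len(mic1)]
    return mic1 - noise_estimate
```
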
With reference to FIG. 2 and FIG. 3, the PSAD system is described. As sound waves propagate, they normally lose energy as they travel due to diffraction and dispersion. Assuming the sound waves originate from a point source and radiate isotropically, their amplitude will decrease as a function of 1/r, where r is the distance from the originating point. This function of 1/r proportional to amplitude is the worst case; if confined to a smaller area the reduction will be less. However it is an adequate model for the configurations of interest, specifically the propagation of noise and speech to microphones located somewhere on the user's head.

FIG. 7 is a microphone array for use under an embodiment of the PSAD system. Placing the microphones Mic 1 and Mic 2 in a linear array with the mouth on the array midline, the difference in signal strength in Mic 1 and Mic 2 (assuming the microphones have identical frequency responses) will be proportional to both d1 and Δd. Assuming a 1/r (or in this case 1/d) relationship, it is seen that

    ΔM = |Mic 1| / |Mic 2| ∝ (d1 + Δd) / d1

where ΔM is the difference in gain between Mic 1 and Mic 2 and therefore H1(z), as above in Equation 2. The variable d1 is the distance from Mic 1 to the speech or noise source.

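For illustration only, one simple proxy for this gain-difference parameter is the ratio of RMS levels in the two microphones for each window; the RMS choice and the helper name are assumptions for the example, not the patent's definition.

```python
import numpy as np

def gain_difference(mic1_win, mic2_win, eps=1e-12):
    """Illustrative only: ratio of RMS levels in the two microphones for one
    window of data. Speech close to Mic 1 drives the ratio toward
    (d1 + delta_d)/d1, well above unity, while distant noise leaves it near 1;
    comparing the ratio against thresholds then flags speech activity."""
    rms1 = np.sqrt(np.mean(np.square(mic1_win)))
    rms2 = np.sqrt(np.mean(np.square(mic2_win)))
    return rms1 / (rms2 + eps)
```
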
FIG. 8 is a plot 800 of ΔM versus d1 for several Δd values, under an embodiment. It is clear that as Δd becomes larger and the noise source is closer, Δ
