`
(12) United States Patent
Burnett et al.

(10) Patent No.: US 8,838,184 B2
(45) Date of Patent: *Sep. 16, 2014

(54) WIRELESS CONFERENCE CALL TELEPHONE

(75) Inventors: Gregory C. Burnett, Northfield, MN (US); Michael Goertz, Redwood City, CA (US); Nicolas Jean Petit, Mountain View, CA (US); Zhinian Jing, Belmont, CA (US); Steven Foster Forestieri, Santa Clara, CA (US); Thomas Alan Donaldson, London (GB)

(73) Assignee: AliphCom, San Francisco, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 132 days.

This patent is subject to a terminal disclaimer.

(21) Appl. No.: 13/184,422

(22) Filed: Jul. 15, 2011

(65) Prior Publication Data

US 2012/0288079 A1    Nov. 15, 2012

Related U.S. Application Data

(63) Continuation-in-part of application No. 12/139,333, filed on Jun. 13, 2008, now Pat. No. 8,503,691, and a continuation-in-part of application No. 10/667,207, filed on Sep. 18, 2003, now Pat. No. 8,019,091.

(60) Provisional application No. 61/364,675, filed on Jul. 15, 2010.

(51) Int. Cl.
    H04B 1/38      (2006.01)
    H04M 1/00      (2006.01)
    H04M 3/56      (2006.01)
    H04R 3/04      (2006.01)
    G10L 21/0208   (2013.01)
    H04R 3/00      (2006.01)
    H04R 1/40      (2006.01)
    H04R 1/10      (2006.01)
    G10L 21/0216   (2013.01)

(52) U.S. Cl.
    CPC: H04M 3/568 (2013.01); H04R 1/1083 (2013.01); H04M 2203/509 (2013.01); G10L 2021/02165 (2013.01); H04R 2420/07 (2013.01); H04M 3/56 (2013.01); H04R 3/04 (2013.01); G10L 21/0208 (2013.01); H04R 3/005 (2013.01); H04M 2250/62 (2013.01); H04R 1/406 (2013.01)
    USPC: 455/569.1; 455/570

(58) Field of Classification Search
    USPC: 455/569.1, 570
    See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

    6,707,910 B1*    3/2004  Valve et al. .............. 379/388.06
    2002/0039425 A1* 4/2002  Burnett et al. .............. 381/94.7

(Continued)

Primary Examiner — Howard Weiss
(74) Attorney, Agent, or Firm — Kokka & Backus, PC

(57) ABSTRACT

A wireless conference call telephone system uses body-worn wired or wireless audio endpoints comprising microphone arrays and, optionally, speakers. These audio endpoints, which include headsets, pendants, and clip-on microphones, to name a few, are used to capture the user's voice, and the resulting data may be used to remove echo and environmental acoustic noise. Each audio endpoint transmits its audio to the telephony gateway, where noise and echo suppression can take place if not already performed on the audio endpoint, and where each audio endpoint's output can be labeled, integrated with the output of other audio endpoints, and transmitted over one or more telephony channels of a telephone network. The noise and echo suppression can also be done on the audio endpoint. The labeling of each user's output can be used by the outside caller's phone to spatially locate each user in space, increasing intelligibility.

16 Claims, 33 Drawing Sheets
`
[Representative drawing (FIG. 4): block diagram of the wireless conference call telephone system showing the Parent (Network Connection, Telephony Connections, Multi-way Calling Subsystem 420, Audio Processing, Management 435, Wireless Radios) with a wired Child and wireless Children and/or Friends.]
`
(56) References Cited (continued)

U.S. PATENT DOCUMENTS

    2009/0081999 A1*  3/2009  Khasawneh et al. ............. 455/416
    2009/0264114 A1*  10/2009 Virolainen et al. ............ 455/416
    2012/0184337 A1*  7/2012  Burnett et al. ............. 455/569.1

* cited by examiner
`
`
`
[Drawing sheets 1 through 33: figure images are not reproducible in the extracted text. Recoverable captions and labels:
Sheet 1, FIG. 1: body-worn Child device as a clip-on microphone array: clip, vents, directional mic (inside), Battery/Radio/DSP (inside), multi-use button 140.
Sheet 2, FIG. 2: Child device as a pendant microphone array: multi-use button, connector behind neck of user, Battery/Radio/DSP (inside), vents.
Sheet 3, FIG. 3: Parent unit: message window, function buttons, power and telephony connections, Child recharging stations, four wireless headset Children.
Sheet 4, FIG. 4: system block diagram: Parent (Network Connection, Telephony Connections, Multi-way Calling Subsystem 420, Audio Processing, Management 435, Wireless Radios), wired Child, wireless Children and/or Friends.
Sheet 5, FIG. 5: audio streaming between far-end users (reached over PSTN and SIP) and near-end users Friend A and Friend B.
Sheet 6, FIG. 6: connection flow: user participates in conference call; leaves room; sets up telephony connection to continue call (optional); breaks connection; is removed from conference call; request audio connection; accept connection; set up audio connection 624; accept audio connection 626; add audio to conference call.
Sheet 7, FIG. 7: two-microphone adaptive noise suppression system with SIGNAL s(n) and NOISE n(n) sources.
Sheets 8-11, FIGS. 8-12: array and speech-source geometry, first-order gradient microphone, DOMA block diagrams, and a headset 1200 including the DOMA (labels largely unrecoverable).
Sheet 12, FIG. 13: flow: receive acoustic signals at a first and a second physical microphone; output first and second microphone signals; form a first virtual microphone using a first combination of the two signals; form a second virtual microphone using a second combination; generate denoised output signals having less acoustic noise than the received signals.
Sheet 12, FIG. 14: flow: form a physical microphone array including first and second physical microphones; form a virtual microphone array including first and second virtual microphones using signals from the physical array.
Sheet 13, FIG. 15: "Linear response of V2 to a speech source at 0.10 meters."
Sheets 14-15, FIGS. 16-19: linear responses of virtual microphones V1 and V2 to speech sources at 0.1 meters and noise sources at 1.0 meter (polar plots; panel titles partly garbled).
Sheet 16, FIG. 20: "Frequency response at 0 degrees": cardioid speech response versus V1 speech response; Response (dB) over 0 to 8000 Hz.
Sheet 17, FIGS. 21-22: "V1 (top, dashed) and V2 speech response vs. B assuming ds = 0.1m"; "V1/V2 for speech versus B assuming ds = 0.1m."
Sheet 18, FIGS. 23-24: "B factor vs. actual ds assuming ds = 0.1m and theta = 0" (actual ds from 0.05 to 0.5 meters); "B versus theta assuming ds = 0.1m" (theta from -80 to 80 degrees).
Sheet 19, FIG. 25: amplitude (dB) and phase response over 0 to 8000 Hz.
Sheet 20, FIG. 26: "N(s) for B = 1.2 and D = -7.2e-006 seconds": amplitude (dB) and phase (degrees) over 0 to 8000 Hz.
Sheet 21, FIG. 27: "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 30": amplitude (dB) over 0 to 8000 Hz.
Sheet 22, FIG. 28: "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 45": amplitude (dB) and phase (degrees) over 0 to 8000 Hz.
Sheet 23, FIG. 29: "Original V1 (top) and cleaned V1 (bottom) with simplified VAD (dashed) in noise": noisy and cleaned traces versus time (samples at 8 kHz/sec).
Sheet 24, FIGS. 30-31 (rotated text): denoising block diagram: microphones 3010, sensors 3020, processor 3030, denoising subsystem 3040, noise removal, cleaned speech.
Sheets 25-26, FIGS. 32-33 (rotated text): noise removal front-end diagrams: signal s(z), transfer function H1(z), noise N(z).
Sheet 27, FIG. 34: flow: receive acoustic signals; receive voice activity (VAD) information; determine absence of voicing and generate first transfer function 3406; determine presence of voicing and generate second transfer function; produce denoised acoustic data stream 3410.
Sheet 28, FIG. 35: "Noise Removal Results for American English Female Saying 406-5562": dirty audio and cleaned audio traces.
Sheet 29, FIGS. 36A-36B: VAD system block diagrams: VAD devices 3602A and 3602B, VAD 3630 and 3640, signal processing system 3600, VAD algorithm 3650, noise suppression system 3601.
Sheet 30, FIG. 37: flow diagram (labels unrecoverable).
Sheet 31, FIG. 38: accelerometer, noisy audio, and denoised traces versus time (samples at 8 kHz).
Sheet 32, FIG. 39: SSM, audio, and denoised traces versus time (samples at 8 kHz).
Sheet 33, FIG. 40: GEMS, noisy audio, and denoised traces versus time (samples at 8 kHz).]
`
`
`
`WIRELESS CONFERENCE CALL
`TELEPHONE
`
`RELATED APPLICATIONS
`
This application claims the benefit of U.S. Patent Application No. 61/364,675, filed Jul. 15, 2010.
This application is a continuation in part of U.S. patent application Ser. No. 12/139,333, filed Jun. 13, 2008.
This application is a continuation in part of U.S. patent application Ser. No. 10/667,207, filed Sep. 18, 2003.
`
`TECHNICAL FIELD
`
The disclosure herein relates generally to telephones configured for conference calling, including such implementations as personal computers or servers acting as telephony devices.
`
`BACKGROUND
`
Conventional conference call telephones use one or more microphones to sample acoustic sound in the environment of interest and one or more loudspeakers to broadcast the incoming communication. There are several difficulties involved in such communications systems, including strong echo paths between the loudspeaker(s) and the microphone(s), difficulty in clearly transmitting the speech of users in the room, and little or no environmental acoustic noise suppression. These problems result in the outside caller(s) having difficulty hearing and/or understanding all of the users, poor or impossible duplex communication, and noise (such as mobile phone ringers and typing on keyboards on the same table as the conference phone) being clearly transmitted through the conference call to the outside caller(s), sometimes at a higher level than the users' speech.
`
`INCORPORATION BY REFERENCE
`
Each patent, patent application, and/or publication mentioned in this specification is herein incorporated by reference in its entirety to the same extent as if each individual patent, patent application, and/or publication was specifically and individually indicated to be incorporated by reference.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 shows a body-worn Child device as a clip-on microphone array, under an embodiment.
FIG. 2 shows a body-worn Child device as a pendant microphone array, under an alternative embodiment.
FIG. 3 shows a wireless conference call telephone system comprising a Parent with four wireless Children and one wired Child, under an embodiment.
FIG. 4 shows a block diagram of a wireless conference call telephone system comprising a Parent and its modules and the Children/Friends (three headsets and a loudspeaker), under an embodiment.
FIG. 5 is a flow diagram showing audio streaming between two far-end users and two near-end users, under an embodiment.
FIG. 6 is a flow chart for connecting wireless Friends/Children and a Parent of the wireless conference call telephone system, under an embodiment.
FIG. 7 is a two-microphone adaptive noise suppression system, under an embodiment.
`
`2
`FIG. 8 is an array and speech source (S) configuration,
`under an embodiment. The microphones are separated by a
`distance approximately equal to 2d), and the speech sourceis
`located a distance d, away from the midpointofthe array at an
`angle 0. The system is axially symmetric so only d, and 0 need
`be specified.
`FIG. 9 is a block diagram fora first order gradient micro-
`phoneusing two omnidirectional elements O, and O., under
`an embodiment.
`
`FIG. 10 is a block diagram for a DOMA including two
`physical microphones configured to form two virtual micro-
`phones V, and V.,, under an embodiment.
`FIG. 11 is a block diagram for a DOMA including two
`physical microphones configured to form N virtual micro-
`phones V, through V,,, where N is any numbergreater than
`one, under an embodiment.
`FIG. 12 is an exampleof a headset or head-worn devicethat
`includes the DOMA,as described herein, under an embodi-
`ment.
`
`FIG. 13 is a flow diagram for denoising acoustic signals
`using the DOMA,under an embodiment.
`FIG. 14 1s a flow diagram for forming the DOMA,under an
`embodiment.
`
`FIG.15 is a plot of linear response of virtual microphone
`V, to a 1 kHz speech sourceat a distance of 0.1 m, under an
`embodiment. The null is at 0 degrees, where the speech is
`normally located.
`FIG.16 is a plot of linear response of virtual microphone
`V, to a 1 kHz noise source at a distance of 1.0 m, under an
`embodiment. There is no null and all noise sources are
`detected.
`FIG. 17 is a plot of linear response of virtual microphone
`V, toa 1 kHz speech source at a distance of 0.1 m, under an
`embodiment. There is no null and the response for speech is
`greater than that shown in FIG.9.
`FIG.18 is a plot of linear response of virtual microphone
`V, toa 1 kHz noise source at a distance of 1.0 m, under an
`embodiment. There is no null andthe responseis very similar
`to V, shown in FIG. 10.
`FIG. 19 is a plot of linear response of virtual microphone
`V,, to aspeech source at a distance of 0.1 m for frequencies of
`100, 500, 1000, 2000, 3000, and 4000 Hz, under an embodi-
`ment.
`
`FIG. 20 is a plot showing comparison of frequency
`responses for speech for the array of an embodiment andfor
`a conventional cardioid microphone.
`FIG. 21 is a plot showing speech response for V, (top,
`dashed) andV,, (bottom,solid) versus B with d, assumedto be
`0.1 m, under an embodiment. Thespatial null in V, is rela-
`tively broad.
`FIG.22 is a plot showing a ratio ofV,/V, speech responses
`shown in FIG. 10 versus B, under an embodiment. Theratio
`is above 10 dB forall 0.8<B <1.1. This meansthat the physi-
`cal B of the system need not be exactly modeled for good
`performance.
`FIG.23is a plot of B versus actual d, assumingthat d,=10
`cm and theta=0, under an embodiment.
`FIG. 24 is a plot of B versus theta with d=10 cm and
`assuming d,=10 cm, under an embodiment.
`FIG. 25 is a plot of amplitude (top) and phase (bottom)
`response of N(s) with B=1 and D=-7.2 usec, under an
`embodiment. The resulting phase difference clearly affects
`high frequencies more than low.
`FIG. 26 is a plot of amplitude (top) and phase (bottom)
`response of N(s) with B=1.2 and D=-7.2 sec, under an
`embodiment. Non-unity B affects the entire frequency range.
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`36
`
`36
`
`
`
`US 8,838,184 B2
`
`3
FIG. 27 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with theta1 = 0 degrees and theta2 = 30 degrees, under an embodiment. The cancellation remains below -10 dB for frequencies below 6 kHz.
FIG. 28 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with theta1 = 0 degrees and theta2 = 45 degrees, under an embodiment. The cancellation is below -10 dB only for frequencies below about 2.8 kHz and a reduction in performance is expected.
FIG. 29 shows experimental results for a 2d0 = 19 mm array using a linear B of 0.83 on a Bruel and Kjaer Head and Torso Simulator (HATS) in a very loud (~85 dBA) music/speech noise environment, under an embodiment. The noise has been reduced by about 25 dB and the speech hardly affected, with no noticeable distortion.
`
FIG. 30 is a block diagram of a denoising system, under an embodiment.
FIG. 31 is a block diagram including components of a noise removal algorithm, under the denoising system of an embodiment assuming a single noise source and direct paths to the microphones.
FIG. 32 is a block diagram including front-end components of a noise removal algorithm of an embodiment generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).
FIG. 33 is a block diagram including front-end components of a noise removal algorithm of an embodiment in a general case where there are n distinct noise sources and signal reflections.
FIG. 34 is a flow diagram of a denoising method, under an embodiment.
FIG. 35 shows results of a noise suppression algorithm of an embodiment for an American English female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements.
FIG. 36A is a block diagram of a Voice Activity Detector (VAD) system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.
FIG. 36B is a block diagram of a VAD system using hardware of a coupled noise suppression system for use in receiving VAD information, under an alternative embodiment.
FIG. 37 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.
FIG. 38 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
FIG. 39 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
FIG. 40 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
`
`DETAILED DESCRIPTION
`
The conference-call telephone, also referred to as a speakerphone, is a vital tool in business today. A conventional speakerphone typically uses a single loudspeaker to transmit far-end speech and one or more microphones to capture near-end speech. The proximity of the loudspeaker to the microphone(s) requires effective echo cancellation and/or half-duplex operation. Also, the intelligibility of the users on both ends is often poor, and there may be very large differences in sound levels between users, depending on their distance to the speakerphone's microphone(s). In addition, no effective noise suppression of the near-end is possible, and various noises (like mobile phones ringing) create a large nuisance during the call.
A wireless conference call telephone system is described herein that addresses many of the problems of conventional conference call telephones. Instead of using microphones on or near the conference call telephone, the embodiments described herein use body-worn wired or wireless audio endpoints (e.g., comprising microphones and, optionally, loudspeakers). These body-worn audio endpoints (for example, headsets, pendants, clip-on microphones, etc.) are used to capture the user's voice, and the resulting data may be used to remove echo and environmental acoustic noise. Each headset or pendant transmits its audio to the conference call phone, where noise and echo suppression can take place if not already performed on the body-worn unit, and where each headset or pendant's output can be labeled, integrated with the other headsets and/or pendants, and transmitted over a telephone network, over one or more telephony channels. The noise and echo suppression can also be done on the headset or pendant. The labeling of each user's output can be used by the outside caller's phone to spatially locate each user in space, increasing intelligibility.
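As a hedged illustration of this labeling idea, the following Python sketch pans each labeled user stream to a distinct stereo azimuth on the far-end device; the patent does not prescribe a particular rendering method, and the stream names, constant-power pan law, and scaling below are assumptions for illustration only.

```python
import numpy as np

def pan_stereo(mono, azimuth):
    """Constant-power pan; azimuth in [-1.0 (hard left), +1.0 (hard right)]."""
    angle = (azimuth + 1.0) * np.pi / 4.0        # map [-1, 1] to [0, pi/2]
    return np.stack([mono * np.cos(angle), mono * np.sin(angle)], axis=1)

def render_far_end(labeled_streams):
    """Spread each labeled near-end user across the stereo field."""
    names = sorted(labeled_streams)
    azimuths = np.linspace(-1.0, 1.0, len(names)) if len(names) > 1 else [0.0]
    length = min(len(labeled_streams[n]) for n in names)
    out = np.zeros((length, 2))
    for azimuth, name in zip(azimuths, names):
        out += pan_stereo(labeled_streams[name][:length], azimuth)
    return out / len(names)                      # crude headroom scaling

# Example: three labeled users arriving from the conference call phone
# (synthetic audio; real streams would carry the gateway's labels).
users = {"user_a": np.random.randn(8000) * 0.1,
         "user_b": np.random.randn(8000) * 0.1,
         "user_c": np.random.randn(8000) * 0.1}
stereo_out = render_far_end(users)               # shape (8000, 2)
```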
In the following description, numerous specific details are introduced to provide a thorough understanding of, and enabling description for, embodiments of the wireless conference call telephone system and methods. One skilled in the relevant art, however, will recognize that these embodiments can be practiced without one or more of the specific details, or with other components, systems, etc. In other instances, well-known structures or operations are not shown, or are not described in detail, to avoid obscuring aspects of the disclosed embodiments.
Unless otherwise specified, the following terms have the corresponding meanings in addition to any meaning or understanding they may convey to one skilled in the art.
The term “conference calling” is defined as the use of a telephony device that is designed to allow one or more near-end users to connect to a phone that will then connect through an analog or digital telephony network to another telephone(s).
The term “omnidirectional microphone” means a physical microphone that is equally responsive to acoustic waves originating from any direction.
The term “near-end” refers to the side of the telephone call that is in acoustic proximity to the conference calling system.
The term “far-end” refers to the side of the telephone call that is not in acoustic proximity to the conference calling system.
The term “noise” means unwanted environmental acoustic noise in the environment of the conference call phone.
The term “virtual microphone (VM)” or “virtual directional microphone” means a microphone constructed using two or more omnidirectional microphones and associated signal processing.
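To make the definition concrete, a minimal Python sketch of one common construction, a delay-and-subtract first-order differential pair, follows; the sample rate, capsule spacing, and scaling factor below are assumed values for illustration and are not parameters taken from this patent.

```python
import numpy as np

FS = 48000        # sample rate in Hz (assumed)
D0 = 0.0095       # half the capsule spacing in meters (assumed; 2*d0 = 19 mm)
C = 343.0         # speed of sound in m/s

def virtual_directional_mic(o1, o2, beta=0.83):
    """First-order differential virtual microphone: v = o1 - beta * delayed(o2).

    Delaying o2 by the acoustic travel time across the array before
    subtracting creates a directional null, even though both capsules
    are individually omnidirectional.
    """
    delay = int(round((2.0 * D0 / C) * FS))      # array travel time, ~3 samples
    o2_delayed = np.concatenate([np.zeros(delay), o2[:len(o2) - delay]])
    return o1 - beta * o2_delayed
```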
The term “Children” refers to one or more body-worn audio endpoints (for example, headsets or pendants or other body-worn devices that contain microphone arrays of at least one microphone and an optional loudspeaker). They may be wired or wireless. Children are hard-coded to the Parent so that they cannot easily be used with other devices. If needed, they may be recharged on the Parent for efficiency and convenience.
The term “Friends” refers to headsets or other similar devices that can be used with the Parent but are not restricted to the Parent. They may be wired or wireless. Examples are Bluetooth devices such as Aliph's Jawbone Icon headset (http://www.jawbone.com) and USB devices such as Logitech's ClearChat Comfort USB headset.
`
The term “Parent” refers to the main body of the conference call phone, where the different wired and/or wireless streams from each Child are received, integrated, and processed. The Parent broadcasts the incoming acoustic information to the Children and the Friends, or optionally, using a conventional loudspeaker.
The term HCI is an acronym for Host Controller Interface.
The term HFP is an acronym for the Hands-Free Profile, a wireless interface specification for Bluetooth-based communication devices.
The term PSTN is an acronym for Public Switched Telephone Network.
The term SDP is an acronym for Service Discovery Protocol.
The term SIP is an acronym for Session Initiation Protocol.
The term SPI bus is an acronym for Serial Peripheral Interface bus.
The term UART is an acronym for Universal asynchronous receiver/transmitter.
The term USART is an acronym for Universal synchronous/asynchronous receiver/transmitter.
The term USB is an acronym for Universal Serial Bus.
The term UUID is an acronym for Universally Unique Identifier.
The term VoIP is an acronym for Voice over Internet Protocol.
`
The wireless conference call telephone system described herein comprises wearable wired and/or wireless devices to transmit both incoming and outgoing speech with or without a loudspeaker to ensure that all users' speech is properly captured. Noise and/or echo suppression can take place on the wireless devices or on the Parent device. Some of the devices may be restricted to use only on the Parent to simplify operation. Other wireless devices such as microphones and loudspeakers are also supported, and any wireless transmission protocols alone or in combination can be used.
The wireless conference call telephone system of an embodiment comprises a fixed or mobile conferencing unit and a multiplicity of body-worn wireless telephony units or endpoints. The fixed or mobile conferencing unit comprises a telephony terminal that acts as an endpoint for a multiplicity of telephony calls (via PSTN, VoIP, and similar). The fixed or mobile conferencing unit comprises a wireless terminal that acts as the gateway for a multiplicity of wireless audio sessions (for example, Bluetooth HFP audio sessions). The fixed or mobile conferencing unit comprises an audio signal processing unit that, inter alia, merges and optimizes a multiplicity of telephony calls into a multiplicity of wireless audio sessions and vice versa. Optionally, the fixed or mobile conferencing unit comprises a loudspeaker.
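One way to picture this merging role is a mix-minus matrix, sketched below in Python under assumed framing (equal-length buffers, one per endpoint); this is an illustration, not AliphCom's implementation. Each telephony call and wireless session receives the sum of every other endpoint, never its own audio.

```python
import numpy as np

def mix_minus(streams):
    """Return, for each endpoint, the mix of every other endpoint.

    `streams` is a list of equal-length audio buffers, one per telephony
    call or wireless audio session; sending entry i back to endpoint i
    keeps an endpoint from hearing (and re-echoing) its own audio.
    """
    total = np.sum(streams, axis=0)
    return [total - s for s in streams]

# Example: two telephony far-ends plus three wireless near-end headsets,
# 20 ms frames at 8 kHz (160 samples); all values are illustrative.
frames = [np.random.randn(160) * 0.1 for _ in range(5)]
return_feeds = mix_minus(frames)
```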
The body-worn wireless telephony unit of an embodiment comprises a wireless communication system that maintains an audio session with the conferencing unit (such as a Bluetooth wireless system capable of enacting the HFP protocol). The body-worn wireless telephony unit comprises a user speech detection and transmission system (e.g., microphone system). The body-worn wireless telephony unit optionally comprises a means of presenting audio to the user. The body-worn wireless telephony unit optionally comprises a signal processor that optimizes the user speech for transmission to the conferencing unit (for example, by removing echo and/or environmental noise). The body-worn wireless telephony unit optionally comprises a signal processor that optimizes received audio for presentation to the user.
Moving the microphones from the proximity of the loudspeaker to the body of the user is a critical improvement. With the microphones on the body of the user, the speech to noise ratio (SNR) is significantly higher and similar for all near-end users. Using technology like the Dual Omnidirectional Microphone Array (DOMA) (described in detail herein and in U.S. patent application Ser. No. 12/139,333, filed Jun. 13, 2008) available from Aliph, Inc., San Francisco, Calif., two or more microphones can be used to capture audio that can be used to remove acoustic noise (including other users speaking) and echo (if a loudspeaker is still used to broadcast far-end speech). Under the embodiments herein, the signal processing is not required to be done on the device carried on the user, as the recorded audio from the microphones can be transmitted for processing on the Parent device. If a wireless headset device is used to house the microphones, the incoming far-end speech could also be broadcast to the headset(s) instead of using the loudspeaker. This improves echo suppression and allows true duplex, highly intelligible, private, conference conversations to take place.
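For intuition, a toy frame-based Python sketch in the spirit of this two-microphone approach follows: a noise transfer function H1 between the microphones is estimated while a VAD reports no speech, and the predicted noise is subtracted in every frame. The frame size, smoothing constant, and VAD interface are assumptions; this is not the Pathfinder algorithm itself.

```python
import numpy as np

def two_mic_denoise(m1, m2, vad, nfft=256):
    """Estimate noise path H1 = M1/M2 during noise-only frames, then
    subtract H1 * M2 from M1 in every frame (overlap-add resynthesis)."""
    hop = nfft // 2
    window = np.hanning(nfft)
    h1 = np.zeros(nfft // 2 + 1, dtype=complex)
    out = np.zeros(len(m1))
    for start in range(0, len(m1) - nfft + 1, hop):
        f1 = np.fft.rfft(window * m1[start:start + nfft])
        f2 = np.fft.rfft(window * m2[start:start + nfft])
        if not vad[start]:                    # noise-only: refine H1 estimate
            h1 = 0.9 * h1 + 0.1 * f1 / (f2 + 1e-8)
        cleaned = np.fft.irfft(f1 - h1 * f2)  # remove predicted noise
        out[start:start + nfft] += cleaned * window
    return out
```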
The components of the wireless conference call telephone system are described in detail below. Each component, while described separately for clarity, can be combined with one or more other components to form a complete conference call system.
Wearable Devices (Children)
The term “Children” refers to one or more body-worn audio endpoints (for example, headsets or pendants or other body-worn devices that contain microphone arrays of at least one microphone and an optional loudspeaker). They may be wired or wireless. Children are hard-coded to a Parent so that they cannot easily be used with other devices. If desired, they may be recharged on the Parent for efficiency and convenience.
The wearable devices of an embodiment comprise a single microphone (e.g., omnidirectional microphone, directional microphone, etc.), analog-to-digital converter (ADC), and a digital signal processor. The wearable devices also include a wireless communication component (e.g., Bluetooth, etc.) for transferring data or information to/from the wearable device. The wireless communication component enables fixed pairing between Parent and Child so that the Children don't get removed from the Parent. To assist this, the Children can be made to beep and/or flash and/or turn off when removed from the proximity of the Parent. For best effect, the Children may recharge on the Parent. Any number of Children may be used; four to eight should be sufficient for most conference calls. Optionally, wired devices such as headsets, microphones, and loudspeakers can be supported as well.
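The beep/flash-on-removal behavior could be driven by the Child polling the link quality of its fixed pairing, as in the minimal Python sketch below; read_rssi, beep, and flash are hypothetical hardware callbacks, and the threshold and polling interval are assumed values.

```python
import time

RSSI_FLOOR_DBM = -80.0   # assumed signal level marking "removed from Parent"

def monitor_parent_link(read_rssi, beep, flash, poll_seconds=1.0):
    """Beep and flash whenever the Child drifts out of the Parent's range.

    read_rssi, beep, and flash are hypothetical callbacks supplied by
    the Child's radio and user-interface hardware.
    """
    while True:
        if read_rssi() < RSSI_FLOOR_DBM:
            beep()
            flash()
        time.sleep(poll_seconds)
```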
The wearable devices of an alternative embodiment comprise two or more microphones that form a microphone array (e.g., the DOMA (described in detail herein and in U.S. patent application Ser. No. 12/139,333, filed Jun. 13, 2008) available from Aliph, Inc., San Francisco, Calif.). Using physical microphone arrays, virtual directional microphones are constructed that increase the SNR of the user's speech. The speech can be processed using an adaptive noise suppression algorithm, for example, the Pathfinder available from Aliph, Inc., San Francisco, Calif., and described in detail herein and in U.S. patent application Ser. No. 10/667,207, filed Sep. 18, 2003. The processing used in support of DOMA, Pathfinder, and echo suppression can be performed on the Child or, alternatively, on the Parent. If a Parent loudspeaker is used and echo suppression is done on the Child, the Parent can route the speaker output to the Child via wireless communications to assist in the echo suppression process.
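With the speaker output routed from the Parent as a reference, the Child can run a conventional adaptive echo canceller against it. The Python sketch below is a generic normalized-LMS skeleton under an assumed filter length and step size, not the patent's echo suppression method.

```python
import numpy as np

def cancel_echo(mic, speaker_ref, taps=128, mu=0.1):
    """NLMS echo canceller using the speaker signal routed from the Parent."""
    w = np.zeros(taps)                         # adaptive echo-path estimate
    out = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = speaker_ref[n - taps:n][::-1]      # most recent reference samples
        e = mic[n] - w @ x                     # residual after echo estimate
        out[n] = e
        w += mu * e * x / (x @ x + 1e-8)       # normalized LMS update
    return out
```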
The Child may be head-worn (like a headset), in which case a Child loudspeaker can be used to broadcast the far-end speech into the ear of the user, or body-worn, in which case the Parent will be required to use a loudspeaker to broadcast the far-end speech. The body-worn device can clip on to the clothing of the user, or be hung from the neck like a pendant. The pendant can use a hypoallergenic substance to construct the structure that goes around the neck, since it may be in contact with the user's skin. If a headset is used as a Child, an on-the-ear mount is recommended over an in-the-ear mount, due to hygienic considerations.
As an example, FIG. 1 shows a body-worn Child device as a clip-on microphone array, under an embodiment. The device attaches to a user with a gator