Transparent communication
D.W.E. Schobben and P.C.W. Sommen

Electrical Engineering Department
Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, the Netherlands

email: D.W.E.Schobben@ele.tue.nl and P.C.W.Sommen@ele.tue.nl
Abstract

Transparent communication refers to the audio signal processing which is applied in communication applications. The goal is to make the audio as transparent as possible, in the sense that the reproduced audio should ideally be free from reverberation, noise, acoustical echoes and mixed speakers. Application areas are, for example, teleconferencing and hands-free telephony. This paper presents new ideas for the implementation of such a system. In particular, the use of blind signal separation is examined and new ideas are presented for the joint implementation of the Multi-Channel Acoustical Echo Canceler (MC-AEC) and the Blind Signal Separation (BSS). In this way, acoustical quality can be improved at a reduced computational cost.

1 INTRODUCTION

In a teleconferencing setup, plain recording of near end speech can result in unintelligible reproduced speech at the far end. This reproduced speech is observed as a noisy, unnatural sounding mixture of multiple speech signals which also contains acoustical echoes. Besides this, data compression is far less efficient for such a signal than for clean speech. The degradation of the near end speech recordings is caused by the following:

- Reproduced far end speech propagates towards the microphones and generates acoustical echoes.

- Microphones pick up an acoustical mixture of several speech signals.

- Microphone signals have a reduced signal-to-noise ratio due to the pickup of unwanted surrounding noise.

- Speech signals are affected by the acoustical reverberation.

Quality can be improved if multiple microphones are used. Digital signal processing is applied to these microphone signals in order to ideally separate the speakers, cancel the acoustical echoes, eliminate the reverberation and suppress surrounding noise. Figure 1 depicts a teleconferencing setup, in which L loudspeakers reproduce the far end speech and L microphones pick up degraded near end speech. An approach is presented in this paper to combine the BSS and MC-AEC that are needed to achieve transparent communication. In this way, both performance and computational cost can be improved. The problem of dereverberation is not addressed in this paper.

[Figure 1: Teleconferencing setup — L loudspeakers reproduce the far end speech, L microphones pick up the near end signals, and an adaptive processor sends its output to the far end]

2 BLIND SIGNAL SEPARATION

The goal of blind signal separation is to recover estimates of the source signals from an observed mixture of them. Figure 2 depicts the mixing and unmixing system in this context. The mixing system H can be modeled by FIR filters that are present between every input and every output of this multichannel system. For acoustical applications these filters can have a length of several thousands of taps, depending on the sample rate and the properties of the room in which the microphones are placed. The goal of the unmixing system is to produce outputs that are linear functions of the sources,

$$y_i = f_i(s_i), \qquad 1 \le i \le J,$$

with J the number of inputs and outputs of the mixing and unmixing system. Note that the permutation of the recovered signals and the linear functions f_i are ambiguous when no properties of the sources themselves or their locations are used. In practical situations these permutations are often not important. Also, the unmixing system can be restricted so that it has amplitude responses that are relatively flat. In this way, its outputs sound just as natural as its inputs.

Petitioner Apple Inc.
Ex. 1017, p. 171
[Figure 2: Blind signal separation — sources s_1 … s_J pass through the mixing system H, giving observations x_1 … x_J; the unmixing system w produces the outputs y_1 … y_J]
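To make the convolutive mixing model of Figure 2 concrete, here is a minimal numerical sketch (all lengths, filter values, and signal contents are invented for the example, not taken from the paper): each observation is a sum of source signals convolved with FIR filters, one filter per input-output pair.

```python
import numpy as np

rng = np.random.default_rng(0)
J, N_h, T = 2, 8, 1000                      # sources/mics, filter length, samples
s = rng.standard_normal((J, T))             # source signals s_1 .. s_J
H = rng.standard_normal((J, J, N_h)) * 0.3  # FIR filter from source j to mic i

# x_i[n] = sum_j (h_ij * s_j)[n]: every input-output pair has its own filter
x = np.stack([sum(np.convolve(s[j], H[i, j]) for j in range(J))
              for i in range(J)])
print(x.shape)  # (2, 1007): full convolution adds N_h - 1 samples
```

In a real room the filters h_ij would be acoustic impulse responses thousands of taps long, as the text notes; the short random filters here only illustrate the structure.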
From the separated speech signals, the strongest one may be sent to the far end. Audio can be made even more transparent by sending more than one speech signal and playing them via several distinct loudspeakers. If only one speech signal is required, it is also possible to track only the strongest one. This can be done using array signal processing [ , , ]. Array signal processing typically assumes knowledge of the geometry of the array and tracks the strongest source. Tracking only the strongest source has the disadvantage that it gives poor performance at the time that it switches from one source to the other. Blind signal separation is usually based on the fact that the speech signals are independent of each other, and typically tries to recover all source signals. Even if only one speech signal is sent to the far end, recovering all sources has the advantage that the strongest one can be picked from the outputs at all times, without the system having to reconverge. The unmixing system w consists of a set of J² FIR filters, similar to the mixing system. For blind signal separation, these filters are controlled to minimize a cost function [ , , ]. This cost function can be based, for example, on mutual information, maximum likelihood, or second or higher order statistics. A priori knowledge of the probability density functions (pdf's) of the speech signals can be used as a tool to adaptively maximize the mutual information among the outputs of the BSS scheme [ , , ].

Another interesting approach, which will be used in this paper, is to minimize the cross-correlations among the outputs of the BSS scheme [ ]. This approach does not require any a priori knowledge other than the statistical independence of the speech signals. The objective of this approach can be given by
$$\min_{w} \sum_{l} \sum_{i} \sum_{j \neq i} \left| r_{y_i y_j}[l] \right| \qquad (1)$$

This cost function will be small when the filter coefficients w are chosen such that the outputs of the BSS become independent of each other in terms of their second order statistics. The correlation lags l which are used in this cost function must range from −N+1 to N−1, with N the length of the FIR filters in the unmixing system. This ensures that the problem is not ambiguous. Using more lags is also allowed and can further improve performance [ ]. The cross-correlation r_{y_i y_j}[l] can be expressed in the cross-correlation of the input of the BSS, r_{x_i x_j}[l],
$$r_{y_i y_j}[l] = E\{y_i[n]\, y_j[n+l]\} = \sum_{a=1}^{J} \sum_{c=1}^{J} \sum_{b=0}^{N-1} \sum_{d=0}^{N-1} w_{ia}[b]\, w_{jc}[d]\, r_{x_a x_c}[l+b-d] \qquad (2)$$

In this notation, w_{ia}[b] is the b-th tap of the FIR filter which is present between the a-th input and the i-th output of the BSS. The advantage of (2) over (1) is that it no longer explicitly contains r_{y_i y_j}[l], which changes when the filter coefficients change. Instead, r_{x_i x_j}[l] can be estimated once from a data set, and the filter coefficients can be found by minimizing the cost function, which is then expressed in r_{x_i x_j}[l] and in the filter coefficients only.
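As a sketch of how (1) and (2) fit together, the snippet below builds BSS-style outputs from invented white-noise inputs and random filters (stand-ins for real signals and a converged unmixing system), evaluates the cost (1), and checks that the output cross-correlation can equivalently be computed from the input cross-correlations via the expansion (2):

```python
import numpy as np

rng = np.random.default_rng(0)
J, N, T = 2, 3, 50000                       # inputs/outputs, filter length, samples
x = rng.standard_normal((J, T))             # stand-in BSS inputs (assumed white)
w = rng.standard_normal((J, J, N)) * 0.5    # w[i, a, b]: b-th tap, input a -> output i

# Unmixing: y_i[n] = sum_a sum_b w_ia[b] x_a[n - b]
y = np.stack([sum(np.convolve(x[a], w[i, a])[:T] for a in range(J))
              for i in range(J)])

def corr(u, v, l):
    """Biased estimate of r_{uv}[l] = E{u[n] v[n+l]}."""
    return np.dot(u[:T - l], v[l:]) / T if l >= 0 else np.dot(u[-l:], v[:T + l]) / T

# Expansion (2): r_{y_i y_j}[l] from the input correlations and the filter taps.
def r_y_via_inputs(i, j, l):
    return sum(w[i, a, b] * w[j, c, d] * corr(x[a], x[c], l + b - d)
               for a in range(J) for c in range(J)
               for b in range(N) for d in range(N))

# Cost (1): sum over lags -N+1 .. N-1 and all output pairs i != j.
cost = sum(abs(corr(y[i], y[j], l))
           for l in range((-N) + 1, N)
           for i in range(J) for j in range(J) if j != i)

# Both routes to r_{y_i y_j}[l] agree up to finite-data edge effects.
assert abs(corr(y[0], y[1], 1) - r_y_via_inputs(0, 1, 1)) < 1e-2
```

This mirrors the point made above: once r_{x_a x_c}[l] is estimated from the data, the cost can be re-evaluated for new filter coefficients without recomputing the outputs.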
3 COMBINING MC-AEC & BSS

First, two traditional approaches will be presented in this section. It will be argued that they have important drawbacks which cannot be solved by applying AEC and BSS independently.

3.1 Separate BSS & MC-AEC

The acoustical echoes (i.e. the far end speech signals) that are picked up by the microphone array can be cancelled using a MC-AEC [ ]. Next, BSS is applied to separate the near end speech signals. This is depicted in Figure 3. This approach has the following drawbacks:

- The MC-AEC is not able to work well with double talk, i.e. when there is both near end speech and far end speech at the same time. This is a problem when tracking time-varying acoustical transfer functions. The overall performance of the system will degrade, since the performance of the BSS depends on the performance of the MC-AEC.

- The separate implementation of MC-AEC and BSS can result in a considerable computational workload. This is especially true for cases with several loudspeakers and many microphones, but where only a few outputs need to be retrieved.
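For reference, the echo-cancellation stage in this cascade is conventionally an adaptive filter driven by the far end signal. The sketch below is a minimal single-channel NLMS canceller with invented step size, filter length, and signals; the paper's MC-AEC is multichannel and considerably more elaborate, so this only illustrates the principle.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 20000, 16                  # samples, adaptive-filter taps (assumed)
z = rng.standard_normal(T)        # far end speech stand-in (loudspeaker signal)
h = rng.standard_normal(N) * 0.3  # unknown acoustical echo path to the microphone
mic = np.convolve(z, h)[:T]       # microphone picks up the acoustical echo

w = np.zeros(N)                   # adaptive filter estimating the echo path
mu, eps = 0.5, 1e-8
err = np.zeros(T)
for n in range(N, T):
    zvec = z[n - N + 1:n + 1][::-1]           # most recent far end samples
    e = mic[n] - w @ zvec                     # residual echo after cancellation
    w += mu * e * zvec / (zvec @ zvec + eps)  # NLMS update
    err[n] = e

# After convergence, the residual echo power is far below the echo power.
assert np.mean(err[-1000:] ** 2) < 1e-3 * np.mean(mic ** 2)
```

The double-talk drawback noted above shows up here as well: near end speech added to `mic` would act as a strong disturbance on the update and drive `w` away from the echo path.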
[Figure 3: Adaptive echo canceling followed by blind signal separation — the far end speech (L channels) drives the AEC, whose estimates are subtracted from the L near end microphone signals; the BSS then separates the residual and its output is sent to the far end]

3.2 BSS without MC-AEC

Theoretically, the BSS can classify acoustical echoes as sources that are independent of the near end speech signals. Therefore, a possible solution could be to use L + L microphones and let the BSS retrieve the far end speech from this as outputs which are independent of the retrieved near end speech. Simulations showed, however, that the BSS is not able to do this with an accuracy comparable to that of the MC-AEC. Furthermore, the separation becomes more difficult when more sources are involved. In the following section, an approach is presented which also makes use of the far end speech itself. The performance of the system is greatly improved by this.

4 JOINT BSS & MC-AEC

In order to obtain a successful combination of BSS and MC-AEC, the objective function (1) is extended to

$$\min_{w} \sum_{l} \sum_{i} \left( \sum_{j \neq i} \left| r_{y_i y_j}[l] \right| + \sum_{m} \left( \left| r_{z_m y_i}[l] \right| + \left| r_{y_i z_m}[l] \right| \right) \right) \qquad (3)$$

with z_m the m-th far end speech signal. In this way, the BSS will produce outputs which are not only independent of each other, but are also independent of the far end speech signals. In fact, the BSS is extended by adding the far end speech as input signals. The BSS with its inputs and outputs is depicted in Fig. 4. So, when the system is controlled by optimizing (3), it can be considered as a special case of optimizing (1), with the restriction that the far end speech must also appear as outputs of the BSS. Therefore, the filters between the far end speech inputs and outputs are fixed to the unit impulse response. Furthermore, the filters from the microphone inputs to the far end speech outputs are kept identically equal to zero.

Besides the trivial far end speech outputs, the BSS produces separated near end speech signals which are independent of the far end speech. In this way acoustical echoes are suppressed. The number of microphones used in this approach must be greater than or equal to the number of local speakers. If the number of microphones is larger than the number of local speakers, the BSS will also generate noise outputs, which correspond to noisy observations of the local speakers, or to strong physical noise sources such as the fan of an overhead projector.

[Figure 4: Combined adaptive echo canceling and blind signal separation — the far end speech (L channels) and the near end signals (L channels) are both input to the BSS, which outputs the far end speech, noise, and the separated near end speech]

5 EXPERIMENTS

Experiments were carried out with audio signals recorded in a real acoustical environment. The room which was used for the recordings measured … x … x … m (height x width x depth). Two live speakers read sentences aloud. Also, far end speech was introduced by playing prerecorded French news over a small loudspeaker. The resulting sound was recorded by two microphones. The setup is depicted in Figure 5. The microphone signals and the far end speech were used to minimize the extended objective function (3) off-line. The FIR filters that were used in the BSS all have … taps. All signals were sampled at … kHz with … bit accuracy. The output of the BSS shows clear separation of the speech and good suppression of the acoustical echoes. An AEC could, however, be used to further remove the residual echoes. For this application area, quality cannot be expressed in terms of SNR in a straightforward way, because the separated speech signals do not resemble the original speech signals, but are linear functions of them instead. Both the original speech signals and the linear functions are unknown. In order to give an impression of the improvements that can be achieved using this approach, the microphone signals, the far end speech, and the output of the BSS are available for listening. The tracks can be found in WAV format at

http://www.ses.ele.tue.nl/persons/daniels/

by choosing "Transparent Communication" from the publication list. There is also a BSS page on which the latest research results will be presented.
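The extended objective (3) can be sketched numerically. The stand-in signals below are white noise rather than speech, and the lengths, lag range, and leakage factor are invented; the sketch only shows that (3) adds penalty terms coupling each BSS output to each far end signal, so residual echo raises the cost.

```python
import numpy as np

def xcorr(u, v, l):
    """Biased estimate of r_{uv}[l] = E{u[n] v[n+l]}."""
    T = len(u)
    return np.dot(u[:T - l], v[l:]) / T if l >= 0 else np.dot(u[-l:], v[:T + l]) / T

def joint_cost(y, z, N):
    """Objective (3) for outputs y (J x T) and far end signals z (M x T)."""
    J, M = y.shape[0], z.shape[0]
    lags = range((-N) + 1, N)
    # First term of (3): pairwise output decorrelation, as in (1).
    sep = sum(abs(xcorr(y[i], y[j], l))
              for l in lags for i in range(J) for j in range(J) if j != i)
    # Second term of (3): decorrelation of every output from every far end signal.
    echo = sum(abs(xcorr(z[m], y[i], l)) + abs(xcorr(y[i], z[m], l))
               for l in lags for i in range(J) for m in range(M))
    return sep + echo

rng = np.random.default_rng(3)
z = rng.standard_normal((1, 5000))     # far end speech stand-in
near = rng.standard_normal((2, 5000))  # well-separated near end outputs
leaky = near + 0.5 * z                 # outputs with residual acoustical echo
assert joint_cost(near, z, 4) < joint_cost(leaky, z, 4)
```

In the paper's setup, z_m are also fed to the BSS as inputs, with the constrained filters described in Section 4, so minimizing (3) drives the remaining outputs toward echo-free separated near end speech.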
[Figure 5: Recording setup — positions of the two speakers, the loudspeaker, and the two microphones in the room]
6 Conclusions and future work

Blind signal separation is an important tool in applications like teleconferencing. In this paper, the concept of blind signal separation is extended to incorporate acoustical echo cancellation. Experiments with real acoustical measurements show that this extended approach exhibits good performance. Subject to further study is the on-line (adaptive) implementation of the extended algorithm. Important issues to be considered are the computational workload and the convergence speed.
References

[1] B.D. van Veen and K.M. Buckley. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, Apr. 1988.

[2] S. Affes and Y. Grenier. A speaker tracking array of microphones. IEEE Trans. on Speech and Audio Proc., Sept. 1997.

[3] W. Kellermann. Strategies for combining acoustic echo cancellation and adaptive beamforming microphone arrays. In Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), Apr. 1997.

[4] P. Smaragdis. Blind separation of convolved mixtures in the frequency domain. In Int. Workshop on Independence & Artificial Neural Networks, Febr. 1998.

[5] T.-W. Lee, A.J. Bell, and R. Lambert. Blind separation of delayed and convolved sources. Advances in Neural Inf. Proc. Systems 9, 1997.

[6] R.H. Lambert and A.J. Bell. Blind separation of multiple speakers in a multipath environment. In Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), Apr. 1997.

[7] A.J. Bell and T.J. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7, 1995. MIT Press, Cambridge MA.

[8] R.H. Lambert. Multichannel Blind Deconvolution: FIR Matrix Algebra and Separation of Multipath Mixtures. Ph.D. Thesis, University of Southern California, May 1996.

[9] K. Torkkola. Blind separation of delayed sources based on information maximization. In IEEE Workshop on Neural Networks for Signal Proc., Sept. 1996.

[10] D.C.B. Chan. Blind Signal Separation. Ph.D. Thesis, University of Cambridge, 1997.

[11] F. Alberge, P. Duhamel, and Y. Grenier. A combined FDAF/WSAF algorithm for stereophonic acoustic echo cancellation. In Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), May 1998.