IPR2017-01244, No. 1030-30 Exhibit - Exhibit 1030 Burrows 1996 (P.T.A.B. Apr. 4, 2017)

Speech Processing with Linear and Neural Network Models

Tina-Louise Burrows

Cambridge University Engineering Department
Trumpington Street
Cambridge CB PZ
England

This dissertation is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge

Ex. 1030 / Page 1 of 202
Apple v. Saint Lawrence

Summary

This dissertation investigates some aspects of speech processing using linear models and single hidden layer neural networks. The study is divided into two parts which focus on speech modelling and speech classification respectively.

The first part of the dissertation examines linear and nonlinear vocal tract models for synthesising high quality speech with adjustable pitch. A source-filter framework for analysis and synthesis is used, in which the source is a representation of the glottal volume velocity waveform. Two families of linear model are considered, ARX (autoregressive with external input) and OE (output error). Their performance in estimating vocal tract transfer functions is compared on synthetic speech data, and the difference is explained in terms of the parameter estimation procedure, the frequency distribution of bias in the estimate and the assumptions about the spectrum of the noise in the vocal tract system. The noise spectrum for ARX models is shown to be perceptually significant for speech synthesis applications because it exploits auditory masking. Methods for improving poor quality syntheses from OE models are proposed. Nonlinear vocal tract models, implemented as feed-forward or recurrent neural networks, are investigated. Methods for initialising networks from linear models are developed. A modified recurrent architecture is introduced which permits initialisation from ARX models. The use of regularization, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied.

The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nodes into saturation. It is demonstrated that saturation defines limits on the position of the decision boundary resulting in context-sensitive and context-insensitive regions of the feature space. While saturation persists, it is shown that networks have reduced sensitivity to the order of presentation of feature vectors because movement of the decision boundary is inhibited. To improve this within-class sensitivity, training with ramp-like class targets is investigated. The operation of small recurrent networks is demonstrated for two tasks; classification of speech utterances into voiced and unvoiced segments, and classification of clockwise and anti-clockwise trajectories of vectors produced by two autoregressive processes.

Acknowledgements

I would like to thank everyone in the Fallside Lab for making my time in Cambridge an experience. In particular, I would like to mention Julian for his practical advice, Rob for all his help with the fiddly tape-recording, and Xtof for his patience with my faltering Spanish. Special thanks to my supervisor, Dr. Mahesan Niranjan, for his guidance, and to Dr. Ljung and Dr. Maciejowski for helpful discussions on system identification theory. The biggest thank-you of all goes to my sister, Tanya, for all her love and support, especially while writing up.

This work has been funded by the Science and Engineering Research Council with some useful top-ups from the Engineering Department and Queens' College.

Dedication

To Mum and Dad. Thank you for supporting me in all the mad things I do.

Declaration

This , word dissertation is entirely the result of my own work and includes nothing which is the outcome of work done in collaboration.

Tina-Louise Burrows
Queens' College
March,

Contents

Introduction
  The Speech Production Mechanism
  Speech Processing
    Review of Research in Modelling Speech Signals
    Review of Research in Classification with Neural Networks
  Outline of Thesis
    Part I - Vocal Tract Modelling
    Part II - Classification of Speech Patterns
  Publications

I Vocal Tract Modelling

Modelling the Speech Signal
  Introduction
  Acoustic Modelling
    Frequency Domain Acoustic Modelling
    Time Domain Acoustic Modelling
  Linear Prediction Analysis
    Linear Prediction for Speech Analysis and Synthesis
    Spectral Matching
    Predictor Order
    Pre-emphasis of Speech
    Limitations of Linear Prediction for Analysis and Synthesis
  Improvements to Linear Prediction Analysis and Synthesis
    Analysis-by-Synthesis Techniques
    Perceptual Weighting Filters
    Decoupling the Source and Vocal Tract Filter
  System Identification Approach to Vocal Tract Modelling

Linear Models of the Vocal Tract
  Linear Black-Box Models
    ARX Models
    OE Models
  Prediction versus Synthesis
  Parameter Estimation
    Frequency Domain Interpretation of Prediction-Error Method
    Perceptual Significance of the Model Noise and Transfer Function Bias
    Changing the Noise Model and Transfer Function Bias
  Model Order Selection
    A(q) and F(q)
    B(q)
  Generating an Excitation Waveform for Black-Box Models
    Inverse Filtering Techniques
    Volume Velocity Pulse Models
    The Glottal Excitation Model Used in This Work
  Comparison of Different Analysis Methods Using Synthetic Data
    Noise Model and Transfer Function Estimate
    Effect of Pre-emphasis on Estimation of Transfer Function
    Effect of Noise on Estimation of Transfer Function
    Effect of Misalignment of Excitation on Estimation of Transfer Function
  The Vocal Tract Modelling Framework
  Preprocessing of Speech and Laryngograph Data
    Sources of Speech and Laryngograph Data
    Initial Preprocessing
    Pitch and Voicing Analysis
  The Vocal Tract Filter
    Model Order
    Parameter Estimation
    Filter Implementation
  Performance on Real Speech Data at Normal Pitch
    Linear Prediction Performance
    ARX Performance
    Comparison of OE and ARX Models
    Improving the Performance of the OE model
    Effect of Misalignment Errors on Estimation of Vocal Tract Transfer Function
    Parallel Implementation
  Pitch Manipulation of Synthesised Speech
    Pitch Manipulation by PSOLA
    Pitch Manipulation Performance
  Concluding Remarks
    Evaluation of Performance
    Choosing a Suitable Pre-emphasis Filter
    Speech Coding
    Limitations of the Linear Vocal Tract System

Neural Network Models of the Vocal Tract
  Difficulties in Modelling Long Utterances with Neural Networks
  The Neural Network Model
    Operation of Hidden Nodes of Small Networks on Continuous Data
  Initialisation of Neural Network Weights
    Review of Work on Initialisation of Neural Network Weights
    Motivation for Initialisation from Linear Models
    Weight Initialisations for Feed-forward Networks
    Weight Initialisations for Single Delay Recurrent Networks
  Modified Recurrent Network Architecture
    Weight Initialisation from ARX model
    Training by back-propagation
  Nonlinear Vocal Tract Modelling Using Neural Networks
    Selecting a Network Architecture
    Size of Networks and Initial ARX models
    Issues for Back-propagation
  Performance of Network Models
  Improving The Perceptual Quality of Network Synthesis
    Back-propagation with Regularization
    Improved Performance Results
  Concluding Remarks
    Summary
    Stability of Neural Network Models
    Usefulness of Linear Initialisation
    Drawbacks of a Nonlinear Model

II Classification of Speech Patterns

Recurrent Networks for Context-Dependent Speech Classification
  Introduction
    Context-dependent Pattern Classification Tasks
    Hidden Markov Models vs Neural Networks
    Recurrent Networks for Context-Dependent Pattern Classification
  The Recurrent Network Decision Boundary
  The Effect of Saturation - Context-Sensitivity
  The Effect of Decision Boundary Movement - Trajectory-Sensitivity
  Implications for Classifier Performance
    Misclassifications
    Between-Class Context - Switching Delay
    Within-Class Context - Output Smoothing
  Larger Networks
  Classification of D Vector AR processes
  Voiced-Unvoiced Classification of Speech Utterances
  Concluding Remarks

Conclusions and Further Work
  Conclusions
    Vocal Tract Modelling
    Classification of Speech Patterns
  Further Work
    Linear Models of the Vocal Tract
    Neural Network Models of the Vocal Tract
    Classification of Speech Patterns

A Back-propagation Training For Multi-Delay Recurrent Neural Networks

B Tape Demonstration
  Introduction
  Comparison of Different Linear Models
  Pitch Manipulation using ARX and LP Models
  Neural Network Models of the Vocal Tract

List of Figures

Speech Production Mechanism.
Typical speech waveform and spectrograms.
The acoustic theory of speech production.
Cascade and Parallel Formant Synthesisers.
Source-filter arrangement for LP synthesis.
Examples of linear prediction spectra.
Amplitude spectra for typical weighting filters.
System identification approach to vocal tract modelling.
Operation of black-box models in prediction and synthesis.
Source-filter configurations for black-box models.
The Rosenberg glottal volume velocity wave pulse.
Typical speech, laryngograph, residual and glottal volume velocity waveforms.
Estimates of the vocal tract transfer function for the synthetic vowel in 'hod'.
Frequency bias functions for the synthetic vowel in 'hod'.
Noise models and spectra of synthesis errors for the synthetic vowel in 'hod'.
Estimates of the vocal tract transfer function for the synthetic nasalised vowel /ε̃/.
Frequency bias functions for /ε̃/.
Absolute (%) error in estimation of formants and bandwidths of synthetic vowel data.
Effect of noise on estimation of transfer function of the synthetic vowel in 'hawed'.
Varying stages in the estimation of OE transfer function.
Effect of misalignment of excitation on transfer function estimates.
Effect of alignment errors on prediction error, synthesis error and synthesis for the synthetic vowel in 'hud'.
The vocal tract modelling framework for speech synthesis.
Pitch-synchronous parameter update scheme.
Comparison of pitch contours at different frame rates.
Typical voicing transitions where errors in voicing decision occur.
Comparison of spectra from autocorrelation and covariance methods of linear prediction.
Comparison of ARX and LP models for the speech fragment 'in lang'.
Comparison of ARX and LP models for a segment of the phone 'ng'.
Spectrograms of synthesis of the utterance 'Germany's decision followed eight years later' by different vocal tract models.
Comparison of performance of OE and ARX models in synthesis.
Transfer function estimates for the synthetic vowel in 'hawed'.
Comparison of performance of ARX and regularized OE models.
Spectrogram of synthesis by regularized OE model.
Modifying the pitch of voiced speech using the PSOLA method.
Spectrograms of pitch manipulated synthesis from different models.
Structure of a neural network model of the vocal tract.
Regions of operation of tanh nonlinearity.
Illustration of the operation of hidden nodes of a two node network.
Comparison of initialisation techniques for feed-forward networks.
Comparison of initialisation techniques for RNN .
Comparison of initialisation techniques for RNN.
Structure of modified recurrent network.
Spectra of ARX model and $H_i(z)$ for RNN .
Effect of learning rate adaptation on back-propagation training of feed-forward and recurrent networks.
Spectrograms of syntheses by network models.
Phone recognition - a context-dependent classification task.
HMM and neural net approaches to phone recognition.
Single hidden node with recurrent connection.
Effect of a recurrent connection on the position of the decision boundary in feature space.
Effect of saturation on position of the decision boundary.
Feature space projections ($v^T x(t)$) for which classification causes movement of the decision boundary.
An example of trajectory-sensitivity.
Two-state HMM equivalent to a single node recurrent net with step nonlinearity.
Effect of gradient of nonlinear function on switching delay.
Effect of gradient of nonlinear function on switching speed.
Operation of classifier on a class -trajectory.
Effect of boundary movement on output smoothing.
Feature space for vector AR processes, showing limiting positions of the decision boundary when hidden units saturate.
Operation of the recurrent network ($n_h$ = ), trained with fixed class targets, in classifying the vector AR trajectories.
Operation of the recurrent network ($n_h$ =), trained with fixed class targets, in classifying the vector AR trajectories.
Operation of recurrent network ($n_h$ =), trained with exponential class targets, in classifying the vector AR trajectories.
Operation of recurrent network ($n_h$ =), trained with fixed class targets, in voiced-unvoiced classification of the sentence "John cleans shellfish for a living".
Operation of recurrent network ($n_h$ =), trained with fixed class targets, in voiced-unvoiced classification of the sentence "John cleans shellfish for a living".
B. Pitch contours applied to the utterance "France became the first decimal country in Europe, in ".
B. Pitch contours applied to the utterance "... joined by Belgium, Italy and Switzerland, in ".
B. Pitch contours applied to the utterance "Germany's decision followed eight years later".

List of Tables

Summary of source-filter parameters for generation of synthetic data.
Summary of model orders used in analysis of synthetic speech data.
Effect of misalignment of excitation on prediction and synthesis SNR for ARX and OE models.
Effect of variation of model order on mean prediction SNR for linear prediction models.
Effect of variation of model order on mean SNR (dB) for ARX models.
Summary of model orders used for vocal tract modelling using black-box and linear prediction models.
Improvement in prediction SNR of ARX models over LP models.
Effect of variation of input and use of pre-emphasis on OE and ARX models.
Summary of parameter values for vocal tract modelling with neural networks.
Performance of networks trained with learning rate adaptation.
Performance of regularized networks.
Performance results for networks trained to classify vector AR processes.
Mapping between TIMIT phone labels and voiced-unvoiced classes.
Comparison of performance of recurrent networks (RNN) and extracted feed-forward network (FNN) with equivalent weights U and V.
Comparison of performance of recurrent networks trained with fixed and exponential targets for voiced-unvoiced classification of speech.

List of Notation

Abbreviations

AR      autoregressive model
ARX     autoregressive (AR) model with external input (X)
OE      output error model
LP      linear prediction model
CELP    code excited linear prediction
FNN     feedforward neural network
RNN     recurrent neural network
HMM     hidden Markov model
ETFE    empirical transfer function estimate
SNR     signal-to-noise ratio
MSE     mean squared error

Symbol Definitions

$R(z)$; $R(q)$                        lip radiation characteristic
$P(z)$; $P(q)$                        pre-emphasis filter
$\hat{H}(z)$, $H(z)$, $H(q)$          vocal tract transfer function
$H(e^{j\omega}; \theta)$              vocal tract frequency response
$\hat{\hat{H}}(q)$                    Empirical Transfer Function Estimate
$Q(\omega; \theta)$                   frequency bias function for transfer function estimate
$N(q)$, $N(e^{j\omega}; \theta)$      model noise and corresponding spectrum
$\Phi_{ER}(\omega; \theta)$           spectrum of synthesis error
$L(t)$, $dL(t)$                       laryngograph signal and first difference
$x(t)$, $dx(t)$                       glottal volume velocity wave model and first difference
$X(e^{j\omega})$                      spectrum of input waveform ($x(t)$ or $dx(t)$)
$y(t)$                                speech waveform
$Y(e^{j\omega})$                      speech spectrum
$\hat{y}_s(t)$                        model synthesis
$\hat{y}_p(t)$                        model prediction
$x(t)$                                network input vector
$h(t)$                                hidden node output
$\hat{y}(t)$                          network output (prediction or synthesis)
$U$, $V$, $W$                         network weights (output, input and feedback)
$q^{-1}$                              backward shift operator, $q^{-1} x(t) = x(t-1)$
$z$, $z^{-1}$                         z-transforms
$(\cdot)^T$                           denotes matrix transpose
$\|\cdot\|$                           denotes Euclidean norm

Chapter
Introduction

"In the beginning was the Word, and the Word was with God, and the Word was God."
St. John : .

Speech is the acoustic realisation of a language. Our knowledge of how we speak, hear, recognise and understand a language can be increased by studying the speech signal and attempting to model these functions. This thesis investigates some issues for speech processing with linear and neural network models. In this chapter, the speech production mechanism is described and some of the terminology applicable to speech processing is introduced. Previous relevant research in speech processing with linear and neural network models is reviewed and the research presented in this thesis is outlined.
The Speech Production Mechanism

The mechanism for speech production, shown in Fig. . , consists of the trachea, vocal cords, tongue, vocal tract (oral and nasal cavities), lips, teeth and nostrils, in addition to the diaphragm and lungs. A speech utterance begins as an air stream or volume velocity wave from the lungs, which travels along the trachea and vocal tract to be radiated as an acoustic pressure waveform from the lips or the lips and nostrils.

Speech is classified as voiced or unvoiced, depending on the nature of the excitation of the vocal tract. For voiced phones, the excitation of the vocal tract originates at the glottis and is produced by the periodic vibration of the vocal cords. The frequency of vibration, or pitch, is controlled by the tension in the vocal cords and the air pressure from the lungs. Typical pitch values lie in the range -Hz for adults, and can rise to  Hz in children. Due to its periodic nature, the spectrum of voiced excitation contains discrete components at harmonics of the pitch frequency. For unvoiced sounds, the excitation is due to turbulence generated by airflow past a narrow constriction and tends to be random in nature, with a flat, continuous spectrum. The noise is known as aspiration if the constriction is at the glottis and frication if it occurs at some point along the vocal tract. Mixed excitation is also possible for the class of sounds known as voiced fricatives, in which turbulent excitation is amplitude modulated periodically by the vibration of the vocal cords.

Figure . : Speech Production Mechanism.

The acoustic signal can be represented by a transcription of phonemes, which are the smallest units of a language that convey linguistic meaning. The actual sounds which are produced in speaking a string of target phonemes are called phones. Each phone of an utterance corresponds to a segment of the acoustic waveform which has a characteristic time-varying vibratory pattern. Vibratory patterns are superimposed on the air stream by the vibration of the vocal cords and resonance of the vocal tract. The resonant properties of the vocal tract are modified by changing the position of the articulators (the lips, tongue, jaw and velum, shown in Fig. . ). Due to the physical constraints of the vocal tract, the positions of the articulators can only change slowly with time and individual realisations of a phone are strongly influenced by previous and future phones in an utterance. This phenomenon is known as co-articulation and is important for both accurate speech recognition and natural sounding speech synthesis.

Due to the slowly time-varying nature of the acoustic waveform for each phone, the resulting spectrum of the speech varies with time. The time variability of the spectrum is captured by calculating the spectrum of overlapping short-time segments of the acoustic waveform and is displayed using a spectrogram. A spectrogram plots successive short-time spectra against time, with frequency on the vertical axis and the intensity of the plot indicating the energy of the frequency components at a particular instant. Most of the energy in the speech spectrum is between -Hz. Intelligibility tests on band-pass filtered speech show that intelligibility is not impaired when speech is low-pass filtered to remove all frequencies above kHz (French & Steinberg , Klatt ). This permits a lower sampling rate of  kHz. Within the frequency range -kHz, the vocal tract for voiced phones typically has - resonant frequencies (Klatt ) which are called formants. Formants are visible as dark horizontal bands on a spectrogram. Examples of wideband and narrowband spectrograms for the utterance 'Belgium' are shown in Fig. .. Wideband and narrowband spectrograms represent a tradeoff between time and frequency resolution of the spectrum. Narrowband spectrograms use short-time speech segments of a couple of pitch periods in duration. The resulting spectrogram has high frequency resolution (y axis) and individual pitch harmonics appear as closely spaced horizontal bands, as illustrated in Fig. .(b). However, time resolution is poor and rapid formant transitions are averaged over time.

For the class of sounds called stops, the 'b' in 'Belgium' for example, the vocal tract becomes completely occluded by the tongue or lips for part of the utterance. Rapid movement of the articulators to release the occlusion, which may be accompanied by a burst of noise, gives rise to sounds that are short in duration and highly transient in nature. The time-variability in the spectrum of such phones may not be accurately represented by a narrowband spectrogram. Wideband spectrograms use short-time segments of roughly one pitch period in duration and give much better time resolution at the expense of frequency resolution. For voiced speech, vertical striations at the pitch period are visible, as illustrated in Fig. .(c).
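The tradeoff between time and frequency resolution described above can be illustrated with a minimal sketch. It is not taken from the thesis: the sampling rate, test tone, frame lengths, Hann window and helper names (`hann_window`, `short_time_spectra`, `peak_bin`) are all assumptions chosen for illustration. A longer frame gives finer spacing between frequency bins but fewer, longer frames in time.

```python
import cmath
import math


def hann_window(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_spectra(signal, frame_len, hop):
    """Magnitude spectra (bins 0..frame_len//2) of overlapping windowed frames."""
    window = hann_window(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame: slow, but dependency-free.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra


def peak_bin(mags):
    """Index of the largest magnitude in one short-time spectrum."""
    return max(range(len(mags)), key=lambda k: mags[k])


fs_hz = 8000       # assumed sampling rate (hypothetical)
tone_hz = 500.0    # assumed test tone standing in for one pitch harmonic
signal = [math.sin(2.0 * math.pi * tone_hz * t / fs_hz) for t in range(800)]

# Long frames (32 ms at the assumed rate): fine frequency spacing, coarse in time.
long_frames = short_time_spectra(signal, frame_len=256, hop=128)
# Short frames (8 ms at the assumed rate): coarse frequency spacing, fine in time.
short_frames = short_time_spectra(signal, frame_len=64, hop=32)

long_spacing_hz = fs_hz / 256    # spacing between DFT bins for the long frames
short_spacing_hz = fs_hz / 64    # spacing between DFT bins for the short frames
```

With these assumed values the long frames place DFT bins 31.25 Hz apart while the short frames place them 125 Hz apart, so closely spaced harmonics can only be separated by the long frames, at the cost of averaging over a longer stretch of signal.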

During nasals, such as 'm' or 'n', the airflow is diverted into the nasal cavity by the lowering of the velum. With the lips closed, the nasal cavity forms the principal resonant path which determines the formants and the vocal tract acts as a closed side-branch which introduces an anti-resonance (spectral valley) into the spectrum. In nasalised vowels, both the nasal and oral cavities are open and sound is radiated from the lips and nostrils simultaneously. The main resonances are due to the oral cavity, which determines the location of the formants, and the nasal branch is considered as the side-branch.

On reaching the lips and nostrils, the effect of directional sound propagation from these apertures is to convert the volume velocity wave into an acoustic pressure waveform which radiates away from the head. The pressure wave measured directly in front of the head is proportional to the time derivative of the resultant volume velocity wave from the lips and nostrils, and is inversely proportional to the distance from the lips (Fant ). The radiation effect can be approximated as that of radiation from a circular aperture in a sphere or infinite plane (Flanagan ) and the amplitude spectrum of the resultant acoustic waveform is approximately modified by +dB/octave when compared to that of the volume velocity wave at the end of the vocal tract.
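The link between a time derivative and a rising spectral tilt can be checked numerically. The sketch below is an illustration only: it assumes that the derivative is approximated in discrete time by a first difference, y(t) = x(t) - x(t-1), and it assumes a sampling rate; neither choice is stated in the passage above. The gain of a first difference at angular frequency w is 2 sin(w/2), which at low frequencies rises by roughly 6 dB for each doubling of frequency.

```python
import math


def first_difference(x):
    """y(t) = x(t) - x(t-1), taking the sample before the start as zero."""
    return [x[0]] + [x[t] - x[t - 1] for t in range(1, len(x))]


def first_difference_gain_db(freq_hz, fs_hz):
    """Gain in dB of the first-difference filter at freq_hz (0 < freq_hz < fs_hz / 2)."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    return 20.0 * math.log10(2.0 * math.sin(w / 2.0))


fs_hz = 8000.0  # assumed sampling rate (hypothetical)

# Change in gain over one octave (100 Hz to 200 Hz) at low frequency.
slope_db_per_octave = (first_difference_gain_db(200.0, fs_hz)
                       - first_difference_gain_db(100.0, fs_hz))

# A ramp becomes a constant after differencing.
differenced = first_difference([1.0, 3.0, 5.0, 7.0])
```

The slope is only approximately constant: it flattens as the frequency approaches half the sampling rate, where the first difference stops behaving like an ideal differentiator.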

Additional features which add intelligibility, meaning and naturalness to speech are stress and, over longer phrasal durations, prosody. In addition to pitch, duration and intensity (loudness) constitute the parameters of stress and prosody which are used to emphasise important acoustic events and break speech up into meaningful units. At a higher phrasal level, specific prosodic patterns can also convey emotion and attitude.

Speech Processing

The two areas of speech processing considered in this thesis are signal modelling (for speech synthesis) and signal classification (for speech recognition). In modelling the speech signal, the aim is to parametrize speech waveforms in such a way that they can be stored efficiently and reproduced (synthesised) at a later date. Parameters for models can be found by performing a time or frequency domain match between the original speech signal and that generated by the model.

Figure .: Typical speech waveform and spectrograms: (a) speech utterance 'Belgium', (b) narrowband spectrogram, (c) wideband spectrogram. For spectrograms, horizontal axis shows time in seconds, vertical axis shows frequency in Hz.

In classification, models are developed to assign class labels to segments of the acoustic signal based on the distinguishing features of a parametric representation of each segment. In speech recognition, for example, the class labels are linguistic units of the language such as phones, diphones or triphones. The linguistic units can form the input for higher level natural language processing, in which syntactic and semantic constraints on possible linguistic sequences are applied and the meaning of the intended utterance extracted. Lower level classes are also possible, such as classifying the speech signal into voiced and unvoiced segments.

Speech processing typically involves the use or calculation of a parametric representation of acoustic waveforms. Speech signals are non-stationary and when processing long utterances, a time-varying parametric representation is needed. A quasi-stationary approach is usually adopted, in which an utterance is divided into a sequence of overlapping segments and assumed to be stationary for the duration of each segment. Since the speech production mechanism can change only slowly with time, parametric representations of adjacent segments of speech show a high degree of correlation. For modelling acoustic waveforms, this implies continuity in the values of model parameters for adjacent segments. For speech classification, it implies that the class label assigned to a particular feature vector is dependent on the context in which that feature vector occurs in an input sequence (context-dependent classification). Exploiting the correlation between segments of speech is highly beneficial for speech processing applications and is a way of representing co-articulation effects.
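
The quasi-stationary approach described above amounts to cutting the utterance into overlapping segments before any per-segment analysis. A minimal sketch follows; the function name `split_into_frames` and the frame length and hop values are hypothetical and chosen only to keep the example small.

```python
def split_into_frames(signal, frame_len, hop):
    """Divide a sequence into overlapping segments of frame_len samples.

    Successive segments start hop samples apart, so they share
    frame_len - hop samples when hop < frame_len. Trailing samples that
    do not fill a whole segment are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Toy "utterance" of ten samples, segments of four samples with 50% overlap.
samples = list(range(10))
frames = split_into_frames(samples, frame_len=4, hop=2)

# Samples shared by the first two segments.
shared = [s for s in frames[0] if s in frames[1]]
```

Because neighbouring segments share samples, any parameters estimated from them are computed from partly the same data, which is one practical reason that adjacent parameter sets tend to be similar.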

Review of Research in Modelling Speech Signals

The most widely used technique for speech analysis is linear prediction analysis (Makhoul , Markel & Gray ), which forms the basis of most speech coding systems, such as vocoders (Markel & Gray ), CELP (code-excited linear prediction) coders (Schroeder & Atal ), multi-pulse coders (Atal & Remde ) and a host of variants which differ in the nature of the excitation of the linear prediction model at the decoder. The popularity of linear prediction is due to ease of analysis and implementation and low computational requirements. An alternative approach is to model the transfer function of the vocal tract system (vocal tract modelling). ARX (autoregressive with external input) (Lobo & Ainsworth , Fujisaki & Ljungqvist ), OE (output error) (Wang, Guan & Fujisaki ) and state-space (Morikawa & Fujisaki ) parametrizations for the vocal tract filter have been used; they differ in the underlying structure of the model and the nature of the error which is minimised in the parameter estimation procedure.
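
To make the idea of linear prediction analysis concrete, the sketch below estimates predictor coefficients for a model of the form y(t) ≈ a_1 y(t-1) + ... + a_p y(t-p) using the autocorrelation method with the Levinson-Durbin recursion. This is one standard formulation, offered as an assumption rather than as the procedure used in the thesis; the test signal, its coefficients, the random seed and all function names are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the predictor.

    Returns (a, err) where the prediction is sum(a[k] * y(t - 1 - k)) and
    err is the remaining prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for increasing the order from i to i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


def linear_prediction(x, order):
    """Predictor coefficients and residual energy by the autocorrelation method."""
    return levinson_durbin(autocorrelation(x, order), order)


# Synthetic second-order autoregressive signal with known (assumed) coefficients.
rng = random.Random(0)
true_a = [1.3, -0.6]
y = [0.0, 0.0]
for _ in range(4000):
    y.append(true_a[0] * y[-1] + true_a[1] * y[-2] + rng.gauss(0.0, 1.0))

est_a, residual_energy = linear_prediction(y, order=2)
signal_energy = autocorrelation(y, 0)[0]
```

On this synthetic signal the estimated coefficients should lie close to the assumed values, and the residual energy should be well below the signal energy; the ratio of the two is one way of expressing a prediction gain.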
`Modellingthevocaltracttransferfunctiondirectlyallowsinclusionofzerosinthe
`modelandhasbeenshowntoimprovepredictiongainevenforasimpleimpulse(Fu-
`
`Ex. 1030 / Page 18 of 202
`
`

`
` .Introduction
`jisaki&Ljungqvist ),ormulti-pulseexcitation(Singhal&Atal ).Theuseofa
`morerealisticrepresentationofthevocaltractexcitation,basedonglottalvolumevelocity
`wavepulsemodels,hasbeenshowntoimprovethepredictiongainby- dBwhencom-
`paredwithlinearpredictionanalysis(Fujisaki&Ljungqvist ,Thomson ,Hedelin
` ),andgivesimprovednaturalnessofsyntheticspeechgeneratedfrombothformant
`synthesisersandvocaltractmodels(Holmes ,Rosenberg ,Fujisaki&Ljungqvist
` ).Alternativeapproachestomodellingtheexcitationsignalincludelinearandnon-
`linearinverse(cid:12)lteringtechniques(Alku ,Milenkovic ,Denzler,Kompe,Kie(cid:25)ling,
`Niemann&N(cid:127)oth ),incorporatinganall-zeromodeldirectlyinthevocaltracttrans-
`ferfunction(Mathews,Miller&David ,Funaki&Mitome )orincorporating
`amoregeneralfunction-basedmodeloftheexcitationwithinthevocaltracttransfer
`function(Thomson ,Cheng&O'Shaughnessy ).Speechcodingsystemsusing
`ARXmodelsandapulse-basedexcitationhavebeenshowntogiveimprovednaturalness
`andpredictionperformanceoverlinearpredictionbasedcoders(Hedelin ,Cheng&
`O'Shaughnessy ).

Speech coders require a parameter estimation procedure that is robust to the effects
of noise. In the speech enhancement work by Lim & Oppenheim ( ) and Hansen &
Clements ( ), MAP estimation was used to improve the estimation of linear prediction
parameters in noisy environments. The correlation between the models of adjacent
segments was exploited by using the model parameters from previous segments as initial
estimates of the parameters for the current segment. Using a Bayesian framework to calculate
model parameters, prior assumptions about the expected values of the parameters
can be incorporated into the estimation procedure. Saleh, Niranjan & Fitzgerald ( )
have used this approach for linear prediction analysis, to obtain smoothed estimates of
the formant tracks of noisy speech utterances.
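
The idea of combining a prior on the parameters with the data from the current segment can be sketched with a toy example. The following is a minimal sketch only, not the procedure of any of the cited works: it assumes a first-order predictor x[t] = a*x[t-1] + e[t] with Gaussian noise of known variance, and a Gaussian prior on a whose mean is taken (as an assumption) to be the estimate from the previous segment. The function name `map_ar1` and all of its parameters are hypothetical.

```python
def map_ar1(x, prior_mean, prior_var, noise_var):
    """MAP estimate of the coefficient a in x[t] = a * x[t-1] + e[t].

    Assumptions (illustrative only): e[t] ~ N(0, noise_var) and a Gaussian
    prior a ~ N(prior_mean, prior_var), e.g. centred on the previous
    segment's estimate.
    """
    # Sufficient statistics of the current segment.
    sxy = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    sxx = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    # Setting the derivative of the log posterior to zero gives a
    # precision-weighted combination of the data term and the prior term.
    return (sxy / noise_var + prior_mean / prior_var) / (
        sxx / noise_var + 1.0 / prior_var
    )


# Toy segment that decays by a factor of 0.5 per sample.
segment = [1.0, 0.5, 0.25, 0.125]
# Prior centred on a hypothetical previous-segment estimate of 0.9.
a_map = map_ar1(segment, prior_mean=0.9, prior_var=1.0, noise_var=1.0)
```

With a very broad prior the estimate approaches the least squares value for the segment, and with a very narrow prior it stays close to the previous segment's value, so the prior variance controls how strongly adjacent segments are tied together.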

There is experimental and theoretical evidence that the speech production mechanism
is nonlinear (Teager & Teager ). Nonlinearities in the speech data are caused by
rapid transitions between and during phones, especially plosives where there is occlusion
of the vocal tract, and by turbulent excitation durin
