Speech Processing with Linear and Neural Network Models

Tina-Louise Burrows

Cambridge University Engineering Department
Trumpington Street
Cambridge CB PZ
England

This dissertation is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge

Ex. 1030 / Page 1 of 202
Apple v. Saint Lawrence

Summary

This dissertation investigates some aspects of speech processing using linear models and single hidden layer neural networks. The study is divided into two parts which focus on speech modelling and speech classification respectively.

The first part of the dissertation examines linear and nonlinear vocal tract models for synthesising high quality speech with adjustable pitch. A source-filter framework for analysis and synthesis is used, in which the source is a representation of the glottal volume velocity waveform. Two families of linear model are considered, ARX (autoregressive with external input) and OE (output error). Their performance in estimating vocal tract transfer functions is compared on synthetic speech data, and the difference is explained in terms of the parameter estimation procedure, the frequency distribution of bias in the estimate and the assumptions about the spectrum of the noise in the vocal tract system. The noise spectrum for ARX models is shown to be perceptually significant for speech synthesis applications because it exploits auditory masking. Methods for improving poor quality syntheses from OE models are proposed. Nonlinear vocal tract models, implemented as feed-forward or recurrent neural networks, are investigated. Methods for initialising networks from linear models are developed. A modified recurrent architecture is introduced which permits initialisation from ARX models. The use of regularization, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied.

The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nodes into saturation. It is demonstrated that saturation defines limits on the position of the decision boundary resulting in context-sensitive and context-insensitive regions of the feature space. While saturation persists, it is shown that networks have reduced sensitivity to the order of presentation of feature vectors because movement of the decision boundary is inhibited. To improve this within-class sensitivity, training with ramp-like class targets is investigated. The operation of small recurrent networks is demonstrated for two tasks; classification of speech utterances into voiced and unvoiced segments, and classification of clockwise and anti-clockwise trajectories of vectors produced by two autoregressive processes.
Acknowledgements

I would like to thank everyone in the Fallside Lab for making my time in Cambridge an experience. In particular, I would like to mention Julian for his practical advice, Rob for all his help with the fiddly tape-recording, and Xtof for his patience with my faltering Spanish. Special thanks to my supervisor, Dr. Mahesan Niranjan, for his guidance, and to Dr. Ljung and Dr. Maciejowski for helpful discussions on system identification theory. The biggest thank-you of all goes to my sister, Tanya, for all her love and support, especially while writing up.

This work has been funded by the Science and Engineering Research Council with some useful top-ups from the Engineering Department and Queens' College.

Dedication

To Mum and Dad. Thank you for supporting me in all the mad things I do.

Declaration

This , word dissertation is entirely the result of my own work and includes nothing which is the outcome of work done in collaboration.

Tina-Louise Burrows
Queens' College
March, 
Contents

Introduction
    The Speech Production Mechanism
    Speech Processing
        Review of Research in Modelling Speech Signals
        Review of Research in Classification with Neural Networks
    Outline of Thesis
        Part I - Vocal Tract Modelling
        Part II - Classification of Speech Patterns
    Publications

I  Vocal Tract Modelling

Modelling the Speech Signal
    Introduction
    Acoustic Modelling
        Frequency Domain Acoustic Modelling
        Time Domain Acoustic Modelling
    Linear Prediction Analysis
        Linear Prediction for Speech Analysis and Synthesis
        Spectral Matching
        Predictor Order
        Pre-emphasis of Speech
        Limitations of Linear Prediction for Analysis and Synthesis
    Improvements to Linear Prediction Analysis and Synthesis
        Analysis-by-Synthesis Techniques
        Perceptual Weighting Filters
        Decoupling the Source and Vocal Tract Filter
    System Identification Approach to Vocal Tract Modelling

Linear Models of the Vocal Tract
    Linear Black-Box Models
        ARX Models
        OE Models
    Prediction versus Synthesis
    Parameter Estimation
        Frequency Domain Interpretation of Prediction-Error Method
        Perceptual Significance of the Model Noise and Transfer Function Bias
        Changing the Noise Model and Transfer Function Bias
    Model Order Selection
        A(q) and F(q)
        B(q)
    Generating an Excitation Waveform for Black-Box Models
        Inverse Filtering Techniques
        Volume Velocity Pulse Models
        The Glottal Excitation Model Used in This Work
    Comparison of Different Analysis Methods Using Synthetic Data
        Noise Model and Transfer Function Estimate
        Effect of Pre-emphasis on Estimation of Transfer Function
        Effect of Noise on Estimation of Transfer Function
        Effect of Misalignment of Excitation on Estimation of Transfer Function
    The Vocal Tract Modelling Framework
    Preprocessing of Speech and Laryngograph Data
        Sources of Speech and Laryngograph Data
        Initial Preprocessing
        Pitch and Voicing Analysis
    The Vocal Tract Filter
        Model Order
        Parameter Estimation
        Filter Implementation
    Performance on Real Speech Data at Normal Pitch
        Linear Prediction Performance
        ARX Performance
        Comparison of OE and ARX Models
        Improving the Performance of the OE model
        Effect of Misalignment Errors on Estimation of Vocal Tract Transfer Function
        Parallel Implementation
    Pitch Manipulation of Synthesised Speech
        Pitch Manipulation by PSOLA
        Pitch Manipulation Performance
    Concluding Remarks
        Evaluation of Performance
        Choosing a Suitable Pre-emphasis Filter
        Speech Coding
        Limitations of the Linear Vocal Tract System

Neural Network Models of the Vocal Tract
    Difficulties in Modelling Long Utterances with Neural Networks
    The Neural Network Model
        Operation of Hidden Nodes of Small Networks on Continuous Data
    Initialisation of Neural Network Weights
        Review of Work on Initialisation of Neural Network Weights
        Motivation for Initialisation from Linear Models
        Weight Initialisations for Feed-forward Networks
        Weight Initialisations for Single Delay Recurrent Networks
    Modified Recurrent Network Architecture
        Weight Initialisation from ARX model
        Training by back-propagation
    Nonlinear Vocal Tract Modelling Using Neural Networks
        Selecting a Network Architecture
        Size of Networks and Initial ARX models
        Issues for Back-propagation
    Performance of Network Models
    Improving The Perceptual Quality of Network Synthesis
        Back-propagation with Regularization
        Improved Performance Results
    Concluding Remarks
        Summary
        Stability of Neural Network Models
        Usefulness of Linear Initialisation
        Drawbacks of a Nonlinear Model

II  Classification of Speech Patterns

Recurrent Networks for Context-Dependent Speech Classification
    Introduction
        Context-dependent Pattern Classification Tasks
        Hidden Markov Models vs Neural Networks
        Recurrent Networks for Context-Dependent Pattern Classification
    The Recurrent Network Decision Boundary
    The Effect of Saturation - Context-Sensitivity
    The Effect of Decision Boundary Movement - Trajectory-Sensitivity
    Implications for Classifier Performance
        Misclassifications
        Between-Class Context - Switching Delay
        Within-Class Context - Output Smoothing
    Larger Networks
    Classification of D Vector AR processes
    Voiced-Unvoiced Classification of Speech Utterances
    Concluding Remarks

Conclusions and Further Work
    Conclusions
        Vocal Tract Modelling
        Classification of Speech Patterns
    Further Work
        Linear Models of the Vocal Tract
        Neural Network Models of the Vocal Tract
        Classification of Speech Patterns

A  Back-propagation Training For Multi-Delay Recurrent Neural Networks

B  Tape Demonstration
    Introduction
    Comparison of Different Linear Models
    Pitch Manipulation using ARX and LP Models
    Neural Network Models of the Vocal Tract
List of Figures

Speech Production Mechanism.
Typical speech waveform and spectrograms.
The acoustic theory of speech production.
Cascade and Parallel Formant Synthesisers.
Source-filter arrangement for LP synthesis.
Examples of linear prediction spectra.
Amplitude spectra for typical weighting filters.
System identification approach to vocal tract modelling.
Operation of black-box models in prediction and synthesis.
Source-filter configurations for black-box models.
The Rosenberg glottal volume velocity wave pulse.
Typical speech, laryngograph, residual and glottal volume velocity waveforms.
Estimates of the vocal tract transfer function for the synthetic vowel in `hod'.
Frequency bias functions for the synthetic vowel in `hod'.
Noise models and spectra of synthesis errors for the synthetic vowel in `hod'.
Estimates of the vocal tract transfer function for the synthetic nasalised vowel /ɛ̃/.
Frequency bias functions for /ɛ̃/.
Absolute (%) error in estimation of formants and bandwidths of synthetic vowel data.
Effect of noise on estimation of transfer function of the synthetic vowel in `hawed'.
Varying stages in the estimation of OE transfer function.
Effect of misalignment of excitation on transfer function estimates.
Effect of alignment errors on prediction error, synthesis error and synthesis for the synthetic vowel in `hud'.
The vocal tract modelling framework for speech synthesis.
Pitch-synchronous parameter update scheme.
Comparison of pitch contours at different frame rates.
Typical voicing transitions where errors in voicing decision occur.
Comparison of spectra from autocorrelation and covariance methods of linear prediction.
Comparison of ARX and LP models for the speech fragment `inlang'.
Comparison of ARX and LP models for a segment of the phone `ng'.
Spectrograms of synthesis of the utterance `Germany's decision followed eight years later' by different vocal tract models.
Comparison of performance of OE and ARX models in synthesis.
Transfer function estimates for the synthetic vowel in `hawed'.
Comparison of performance of ARX and regularized OE models.
Spectrogram of synthesis by regularized OE model.
Modifying the pitch of voiced speech using the PSOLA method.
Spectrograms of pitch manipulated synthesis from different models.
Structure of a neural network model of the vocal tract.
Regions of operation of tanh nonlinearity.
Illustration of the operation of hidden nodes of a two node network.
Comparison of initialisation techniques for feed-forward networks.
Comparison of initialisation techniques for RNN .
Comparison of initialisation techniques for RNN.
Structure of modified recurrent network.
Spectra of ARX model and Hi(z) for RNN .
Effect of learning rate adaptation on back-propagation training of feed-forward and recurrent networks.
Spectrograms of syntheses by network models.
Phone recognition - a context-dependent classification task.
HMM and neural net approaches to phone recognition.
Single hidden node with recurrent connection.
Effect of a recurrent connection on the position of the decision boundary in feature space.
Effect of saturation on position of the decision boundary.
Feature space projections (v^T x(t)) for which classification causes movement of the decision boundary.
An example of trajectory-sensitivity.
Two-state HMM equivalent to a single node recurrent net with step nonlinearity.
Effect of gradient of nonlinear function on switching delay.
Effect of gradient of nonlinear function on switching speed.
Operation of classifier on a class -trajectory.
Effect of boundary movement on output smoothing.
Feature space for vector AR processes, showing limiting positions of the decision boundary when hidden units saturate.
Operation of the recurrent network (nh = ), trained with fixed class targets, in classifying the vector AR trajectories.
Operation of the recurrent network (nh = ), trained with fixed class targets, in classifying the vector AR trajectories.
Operation of recurrent network (nh = ), trained with exponential class targets, in classifying the vector AR trajectories.
Operation of recurrent network (nh = ), trained with fixed class targets, in voiced-unvoiced classification of the sentence "John cleans shellfish for a living".
Operation of recurrent network (nh = ), trained with fixed class targets, in voiced-unvoiced classification of the sentence "John cleans shellfish for a living".
B. Pitch contours applied to the utterance "France became the first decimal country in Europe, in ".
B. Pitch contours applied to the utterance "... joined by Belgium, Italy and Switzerland, in ".
B. Pitch contours applied to the utterance "Germany's decision followed eight years later".
List of Tables

Summary of source-filter parameters for generation of synthetic data.
Summary of model orders used in analysis of synthetic speech data.
Effect of misalignment of excitation on prediction and synthesis SNR for ARX and OE models.
Effect of variation of model order on mean prediction SNR for linear prediction models.
Effect of variation of model order on mean SNR (dB) for ARX models.
Summary of model orders used for vocal tract modelling using black-box and linear prediction models.
Improvement in prediction SNR of ARX models over LP models.
Effect of variation of input and use of pre-emphasis on OE and ARX models.
Summary of parameter values for vocal tract modelling with neural networks.
Performance of networks trained with learning rate adaptation.
Performance of regularized networks.
Performance results for networks trained to classify vector AR processes.
Mapping between TIMIT phone labels and voiced-unvoiced classes.
Comparison of performance of recurrent networks (RNN) and extracted feed-forward network (FNN) with equivalent weights U and V.
Comparison of performance of recurrent networks trained with fixed and exponential targets for voiced-unvoiced classification of speech.
List of Notation

Abbreviations

AR      autoregressive model
ARX     autoregressive (AR) model with external input (X)
OE      output error model
LP      linear prediction model
CELP    code excited linear prediction
FNN     feedforward neural network
RNN     recurrent neural network
HMM     hidden Markov model
ETFE    empirical transfer function estimate
SNR     signal-to-noise ratio
MSE     mean squared error

Symbol Definitions

R(z); R(q)              lip radiation characteristic
P(z); P(q)              pre-emphasis filter
Ĥ(z), H(z), H(q)        vocal tract transfer function
H(e^{jω}; θ)            vocal tract frequency response
Ĥ̂(q)                    Empirical Transfer Function Estimate
Q(ω; θ)                 frequency bias function for transfer function estimate
N(q), N(e^{jω}; θ)      model noise and corresponding spectrum
Φ_ER(ω; θ)              spectrum of synthesis error
L(t), dL(t)             laryngograph signal and first difference
x(t), dx(t)             glottal volume velocity wave model and first difference
X(e^{jω})               spectrum of input waveform (x(t) or dx(t))
y(t)                    speech waveform
Y(e^{jω})               speech spectrum
ŷ_s(t)                  model synthesis
ŷ_p(t)                  model prediction
x(t)                    network input vector
h(t)                    hidden node output
ŷ(t)                    network output (prediction or synthesis)
U, V, W                 network weights (output, input and feedback)
q^{-1}                  backward shift operator, q^{-1} x(t) = x(t-1)
z, z^{-1}               z-transforms
(.)^T                   denotes matrix transpose
||.||                   denotes Euclidean norm
Chapter
Introduction

"In the beginning was the Word, and the Word was with God, and the Word was God."
St. John : .

Speech is the acoustic realisation of a language. Our knowledge of how we speak, hear, recognise and understand a language can be increased by studying the speech signal and attempting to model these functions. This thesis investigates some issues for speech processing with linear and neural network models. In this chapter, the speech production mechanism is described and some of the terminology applicable to speech processing is introduced. Previous relevant research in speech processing with linear and neural network models is reviewed and the research presented in this thesis is outlined.

The Speech Production Mechanism

The mechanism for speech production, shown in Fig. . , consists of the trachea, vocal cords, tongue, vocal tract (oral and nasal cavities), lips, teeth and nostrils, in addition to the diaphragm and lungs. A speech utterance begins as an airstream or volume velocity wave from the lungs, which travels along the trachea and vocal tract to be radiated as an acoustic pressure waveform from the lips or the lips and nostrils.

Figure . : Speech Production Mechanism.

Speech is classified as voiced or unvoiced, depending on the nature of the excitation of the vocal tract. For voiced phones, the excitation of the vocal tract originates at the glottis and is produced by the periodic vibration of the vocal cords. The frequency of vibration, or pitch, is controlled by the tension in the vocal cords and the air pressure from the lungs. Typical pitch values lie in the range - Hz for adults, and can rise to  Hz in children. Due to its periodic nature, the spectrum of voiced excitation contains discrete components at harmonics of the pitch frequency. For unvoiced sounds, the excitation is due to turbulence generated by airflow past a narrow constriction and tends to be random in nature, with a flat, continuous spectrum. The noise is known as aspiration if the constriction is at the glottis and frication if it occurs at some point along the vocal tract. Mixed excitation is also possible for the class of sounds known as voiced fricatives, in which turbulent excitation is amplitude modulated periodically by the vibration of the vocal cords.
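
As a rough illustration of why periodic excitation gives a line spectrum, the sketch below computes the magnitude spectrum of an idealised pulse train. The sampling rate, the pitch period and the use of unit impulses in place of real glottal pulses are assumptions made only for this example; they are not values taken from this work.

```python
import cmath

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def impulse_train(n_samples, period):
    """Idealised voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]

fs = 8000      # assumed sampling rate in Hz (illustrative only)
period = 80    # assumed pitch period in samples, i.e. a 100 Hz pitch
n = 400        # five complete pitch periods

mag = dft_magnitude(impulse_train(n, period))

# Bins that carry energy, and the frequencies they correspond to.
harmonic_bins = [k for k in range(n // 2) if mag[k] > 1e-6]
harmonic_freqs = [k * fs / n for k in harmonic_bins]
```

Under these assumptions the only non-zero components below half the sampling rate fall at multiples of the pitch frequency (including 0 Hz), which is the discrete harmonic structure described above. Replacing the pulse train with random noise would instead spread energy over all bins.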

The acoustic signal can be represented by a transcription of phonemes, which are the smallest units which convey linguistic meaning of a language. The actual sounds which are produced in speaking a string of target phonemes are called phones. Each phone of an utterance corresponds to a segment of the acoustic waveform which has a characteristic time-varying vibratory pattern. Vibratory patterns are superimposed on the airstream by the vibration of the vocal cords and resonance of the vocal tract. The resonant properties of the vocal tract are modified by changing the position of the articulators (the lips, tongue, jaw and velum, shown in Fig. . ). Due to the physical constraints of the vocal tract, the positions of the articulators can only change slowly with time and individual realisations of a phone are strongly influenced by previous and future phones in an utterance. This phenomenon is known as co-articulation and is important for both accurate speech recognition and natural sounding speech synthesis.

Due to the slowly time-varying nature of the acoustic waveform for each phone, the resulting spectrum of the speech varies with time. The time variability of the spectrum is captured by calculating the spectrum of overlapping short-time segments of the acoustic waveform and is displayed using a spectrogram. A spectrogram plots the frequency of successive short-time spectra using the intensity of the plot to indicate the energy of the frequency components at a particular instant. Most of the energy in the speech spectrum is between - Hz. Intelligibility tests on band-pass filtered speech show that intelligibility is not impaired when speech is low-pass filtered to remove all frequencies above  kHz (French & Steinberg , Klatt ). This permits a lower sampling rate of  kHz. Within the frequency range - kHz, the vocal tract for voiced phones typically has - resonant frequencies (Klatt ) which are called formants. Formants are visible as dark horizontal bands on a spectrogram. Examples of wideband and narrowband spectrograms for the utterance `Belgium' are shown in Fig. . . Wideband and narrowband spectrograms represent a tradeoff between time and frequency resolution of the spectrum. Narrowband spectrograms use short-time speech segments of a couple of pitch periods in duration. The resulting spectrogram has high frequency resolution (y axis) and individual pitch harmonics appear as closely spaced horizontal bands, as illustrated in Fig. .(b). However, time resolution is poor and rapid formant transitions are averaged over time. For the class of sounds called stops, the `b' in `Belgium' for example, the vocal tract becomes completely occluded by the tongue or lips for part of the utterance. Rapid movement of the articulators to release the occlusion, which may be accompanied by a burst of noise, gives rise to sounds that are short in duration and highly transient in nature. The time-variability in the spectrum of such phones may not be accurately represented by a narrowband spectrogram. Wideband spectrograms use short-time segments of roughly one pitch period in duration and give much better time resolution at the expense of frequency resolution. For voiced speech, vertical striations at the pitch period are visible, as illustrated in Fig. .(c).
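
The time-frequency tradeoff described above can be sketched numerically. The example below takes one analysis frame of an idealised pulse train with two window lengths, three pitch periods (narrowband) and one pitch period (wideband), and measures how much the magnitude spectrum ripples across frequency. The pulse-train signal, the Hamming window, the frame position and the `spectral_ripple` measure are all assumptions chosen for illustration, not the analysis settings used in this work.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]

def frame_spectrum(x, start, win_len):
    """Magnitude spectrum (bins 0 .. win_len // 2) of one Hamming-windowed frame."""
    w = hamming(win_len)
    frame = [x[start + t] * w[t] for t in range(win_len)]
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / win_len)
                    for t in range(win_len)))
            for k in range(win_len // 2 + 1)]

def spectral_ripple(mag):
    """Ratio of largest to smallest spectral magnitude (hypothetical measure)."""
    return max(mag) / min(mag)

period = 80  # assumed pitch period in samples
x = [1.0 if t % period == 0 else 0.0 for t in range(800)]

# Narrowband analysis: the frame spans several pitch periods.
ripple_narrow = spectral_ripple(frame_spectrum(x, 40, 3 * period))
# Wideband analysis: the frame spans a single pitch period.
ripple_wide = spectral_ripple(frame_spectrum(x, 40, period))
```

With the longer window the spectrum peaks at the pitch harmonics and dips between them, so the ripple is well above one. With the one-period window each frame contains a single pulse, so its spectrum is flat across frequency and the pitch shows up instead as frame-to-frame variation in time.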

During nasals, such as `m' or `n', the airflow is diverted into the nasal cavity by the lowering of the velum. With the lips closed, the nasal cavity forms the principal resonant path which determines the formants and the vocal tract acts as a closed side-branch which introduces an anti-resonance (spectral valley) into the spectrum. In nasalised vowels, both the nasal and oral cavities are open and sound is radiated from the lips and nostrils simultaneously. The main resonances are due to the oral cavity, which determines the location of the formants, and the nasal branch is considered as the side-branch.

On reaching the lips and nostrils, the effect of directional sound propagation from these apertures is to convert the volume velocity wave into an acoustic pressure waveform which radiates away from the head. The pressure wave measured directly in front of the head is proportional to the time derivative of the resultant volume velocity wave from the lips and nostrils, and is inversely proportional to the distance from the lips (Fant ). The radiation effect can be approximated as that of radiation from a circular aperture in a sphere or infinite plane (Flanagan ) and the amplitude spectrum of the resultant acoustic waveform is approximately modified by + dB/octave when compared to that of the volume velocity wave at the end of the vocal tract.
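
Because the radiated pressure is proportional to the time derivative of the volume velocity, a simple discrete-time stand-in for the radiation characteristic is a first difference, y(t) = x(t) - x(t-1). The sketch below evaluates the gain of that filter one octave apart. Treating radiation as a pure first difference, and the sampling rate used, are assumptions made for illustration only.

```python
import math

def first_difference_gain_db(freq_hz, fs_hz):
    """Gain in dB of y(t) = x(t) - x(t-1) at freq_hz, i.e. |1 - exp(-j*omega)|."""
    omega = 2 * math.pi * freq_hz / fs_hz
    return 20 * math.log10(2 * math.sin(omega / 2))

fs = 8000.0  # assumed sampling rate in Hz (illustrative only)

# Gain change over one octave at low frequency.
slope_db_per_octave = (first_difference_gain_db(200.0, fs)
                       - first_difference_gain_db(100.0, fs))
```

At frequencies well below half the sampling rate the gain of a differentiator doubles with each doubling of frequency, so the computed slope is close to 20*log10(2), about 6 dB per octave.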

Additional features which add intelligibility, meaning and naturalness to speech are stress and, over longer phrasal durations, prosody. In addition to pitch, duration and intensity (loudness) constitute the parameters of stress and prosody which are used to emphasise important acoustic events and break speech up into meaningful units. At a higher phrasal level, specific prosodic patterns can also convey emotion and attitude.

Figure .: Typical speech waveform and spectrograms: (a) speech utterance `Belgium', (b) narrowband spectrogram, (c) wideband spectrogram. For spectrograms, horizontal axis shows time in seconds, vertical axis shows frequency in Hz.

Speech Processing

The two areas of speech processing considered in this thesis are signal modelling (for speech synthesis) and signal classification (for speech recognition). In modelling the speech signal, the aim is to parametrize speech waveforms in such a way that they can be stored efficiently and reproduced (synthesised) at a later date. Parameters for models can be found by performing a time or frequency domain match between the original speech signal and that generated by the model.

In classification, models are developed to assign class labels to segments of the acoustic signal based on the distinguishing features of a parametric representation of each segment. In speech recognition, for example, the class labels are linguistic units of the language such as phones, diphones or triphones. The linguistic units can form the input for higher level natural language processing, in which syntactic and semantic constraints on possible linguistic sequences are applied and the meaning of the intended utterance extracted. Lower level classes are also possible, such as classifying the speech signal into voiced and unvoiced segments.

Speech processing typically involves the use or calculation of a parametric representation of acoustic waveforms. Speech signals are non-stationary and when processing long utterances, a time-varying parametric representation is needed. A quasi-stationary approach is usually adopted, in which an utterance is divided into a sequence of overlapping segments and assumed to be stationary for the duration of each segment. Since the speech production mechanism can change only slowly with time, parametric representations of adjacent segments of speech show a high degree of correlation. For modelling acoustic waveforms, this implies continuity in the values of model parameters for adjacent segments. For speech classification, it implies that the class label assigned to a particular feature vector is dependent on the context in which that feature vector occurs in an input sequence (context-dependent classification). Exploiting the correlation between segments of speech is highly beneficial for speech processing applications and is a way of representing co-articulation effects.
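
The quasi-stationary approach amounts to cutting the signal into overlapping frames and analysing each one separately. A minimal sketch follows; the function name, the frame length and the hop size are hypothetical choices for illustration, and no windowing is applied.

```python
def split_into_frames(signal, frame_len, hop):
    """Cut `signal` into overlapping frames of `frame_len` samples.

    Successive frames start `hop` samples apart, so they overlap by
    frame_len - hop samples. Trailing samples that do not fill a whole
    frame are dropped.
    """
    if not 0 < hop <= frame_len:
        raise ValueError("hop must satisfy 0 < hop <= frame_len")
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]

# Toy signal: ten samples, frames of four samples with 50% overlap.
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

Each sample away from the ends appears in two frames here, which is one reason parameters estimated from adjacent frames tend to be strongly correlated.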

Review of Research in Modelling Speech Signals

The most widely used technique for speech analysis is linear prediction analysis (Makhoul , Markel & Gray ), which forms the basis of most speech coding systems, such as vocoders (Markel & Gray ), CELP (code-excited linear prediction) coders (Schroeder & Atal ), multi-pulse coders (Atal & Remde ) and a host of variants which differ in the nature of the excitation of the linear prediction model at the decoder. The popularity of linear prediction is due to ease of analysis and implementation and low computational requirements. An alternative approach is to model the transfer function of the vocal tract system (vocal tract modelling). ARX (autoregressive with external input) (Lobo & Ainsworth , Fujisaki & Ljungqvist ), OE (output error) (Wang, Guan & Fujisaki ) and state-space (Morikawa & Fujisaki ) parametrizations for the vocal tract filter have been used; they differ in the underlying structure of the model and in the nature of the error which is minimised in the parameter estimation procedure.
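
For readers unfamiliar with linear prediction, the sketch below shows one standard way of computing predictor coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. It is a generic textbook formulation given for orientation, with hypothetical function names and a toy test signal, and is not claimed to be the exact procedure used later in this work.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[k] = sum_t x(t) * x(t-k) for k = 0 .. max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (a, err), where the predictor is
        x_hat(t) = a[0]*x(t-1) + a[1]*x(t-2) + ... + a[order-1]*x(t-order)
    and err is the residual prediction error energy.
    """
    a = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= (1.0 - k * k)
    return a, err

# Toy signal (an assumption for illustration): impulse response of a
# one-pole filter, x(t) = 0.9 * x(t-1), so the ideal predictor is [0.9, 0].
x = [0.9 ** t for t in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(x, 2), 2)
```

The recursion needs only the first few autocorrelation values and a number of operations proportional to the square of the model order, which is part of the low computational cost referred to above.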

Modelling the vocal tract transfer function directly allows inclusion of zeros in the model and has been shown to improve prediction gain even for a simple impulse (Fujisaki & Ljungqvist ), or multi-pulse excitation (Singhal & Atal ). The use of a more realistic representation of the vocal tract excitation, based on glottal volume velocity wave pulse models, has been shown to improve the prediction gain by - dB when compared with linear prediction analysis (Fujisaki & Ljungqvist , Thomson , Hedelin ), and gives improved naturalness of synthetic speech generated from both formant synthesisers and vocal tract models (Holmes , Rosenberg , Fujisaki & Ljungqvist ). Alternative approaches to modelling the excitation signal include linear and nonlinear inverse filtering techniques (Alku , Milenkovic , Denzler, Kompe, Kießling, Niemann & Nöth ), incorporating an all-zero model directly in the vocal tract transfer function (Mathews, Miller & David , Funaki & Mitome ) or incorporating a more general function-based model of the excitation within the vocal tract transfer function (Thomson , Cheng & O'Shaughnessy ). Speech coding systems using ARX models and a pulse-based excitation have been shown to give improved naturalness and prediction performance over linear prediction based coders (Hedelin , Cheng & O'Shaughnessy ).
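
To make the ARX idea concrete, the sketch below fits the smallest possible ARX model, y(t) = -a*y(t-1) + b*x(t) + e(t), by least squares on the one-step prediction error. The first-order structure, the sign convention, the function name and the synthetic input are assumptions chosen to keep the example short; models of practical order need a general linear solver.

```python
import math

def fit_arx_first_order(x, y):
    """Least-squares fit of y(t) = -a*y(t-1) + b*x(t) + e(t); returns (a, b)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for t in range(1, len(y)):
        p1, p2 = -y[t - 1], x[t]        # regressors for sample t
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += p1 * y[t]
        t2 += p2 * y[t]
    # Solve the 2x2 normal equations in closed form.
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b

# Synthetic, noise-free data from a known system (illustrative values only):
# y(t) = 0.8*y(t-1) + 0.5*x(t), i.e. a = -0.8 and b = 0.5.
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y = [0.0] * len(x)
for t in range(1, len(x)):
    y[t] = 0.8 * y[t - 1] + 0.5 * x[t]

a_hat, b_hat = fit_arx_first_order(x, y)
```

Because the regressors use the measured past output y(t-1), the problem is linear in the parameters and has a closed-form solution. An output error model would instead use the model's own past output, which makes the fit nonlinear in the parameters.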

Speech coders require a parameter estimation procedure that is robust to the effects of noise. In the speech enhancement work by Lim & Oppenheim ( ) and Hansen & Clements ( ), MAP estimation was used to improve the estimation of linear prediction parameters in noisy environments. The correlation between the models of adjacent segments was exploited by using the model parameters from previous segments as initial estimates of the parameters for the current segment. Using a Bayesian framework to calculate model parameters, prior assumptions about the expected values of the parameters can be incorporated into the estimation procedure. Saleh, Niranjan & Fitzgerald ( ) have used this approach for linear prediction analysis, to obtain smoothed estimates of the formant tracks of noisy speech utterances.
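
One simple way to see how a prior can pull an estimate towards the previous segment's value is a first-order predictor with a Gaussian prior on its single coefficient. This is a generic textbook construction offered only as an illustration; it is not claimed to be the formulation used in the work cited above, and the function name, the prior weight and the data are hypothetical.

```python
def map_estimate_ar1(x, prior_mean, prior_weight):
    """MAP estimate of a in x(t) = a*x(t-1) + e(t) under a Gaussian prior.

    prior_mean   : expected value of a, e.g. the estimate from the previous segment
    prior_weight : noise variance divided by prior variance; 0 gives least squares
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x))) + prior_weight * prior_mean
    den = sum(x[t - 1] ** 2 for t in range(1, len(x))) + prior_weight
    return num / den

# Toy segment whose least-squares coefficient is exactly 0.5.
segment = [0.5 ** t for t in range(20)]

least_squares = map_estimate_ar1(segment, prior_mean=0.9, prior_weight=0.0)
shrunk = map_estimate_ar1(segment, prior_mean=0.9, prior_weight=1.0)
prior_dominated = map_estimate_ar1(segment, prior_mean=0.9, prior_weight=1e9)
```

As the prior weight grows, the estimate moves from the least-squares value towards the prior mean, which is one way continuity between adjacent segments can be encouraged.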

There is experimental and theoretical evidence that the speech production mechanism is nonlinear (Teager & Teager ). Nonlinearities in the speech data are caused by rapid transitions between and during phones, especially plosives where there is occlusion of the vocal tract, and by turbulent excitation durin
