`
`IPR2023-00035
`Apple EX1035 Page 1
`
`
`
`1. My name is June Ann Munford. I am over the age of 18, have personal
`
`knowledge of the facts set forth herein, and am competent to testify to the
`
`same.
`
`2. I earned a Master of Library and Information Science (MLIS) from the
`
`University of Wisconsin-Milwaukee in 2009. I have over ten years of
`
`experience in the library/information science field. Beginning in 2004, I
`
`have served in various positions in the public library sector including
`
`Assistant Librarian, Youth Services Librarian and Library Director. I have
`
`attached my Curriculum Vitae as Appendix CV.
`
`3. During my career in the library profession, I have been responsible for
`
`materials acquisition for multiple libraries. In that position, I have cataloged,
`
`purchased and processed incoming library works. That includes purchasing
`
`materials directly from vendors, recording publishing data from the material
`
`in question, creating detailed material records for library catalogs and
`
`physically preparing that material for circulation. In addition to my
`
`experience in acquisitions, I was also responsible for analyzing large
`
`collections of library materials, tailoring library records for optimal catalog
`
`search performance and creating lending agreements between libraries
`
`during my time as a Library Director.
`
`4. I am fully familiar with the catalog record creation process in the library
`
`sector. In preparing a material for public availability, a library catalog record
`
`describing that material would be created. These records are typically
`
written in Machine-Readable Cataloging (herein referred to as “MARC”) code
`
`and contain information such as a physical description of the material,
`
`metadata from the material’s publisher, and date of library acquisition. In
`
`particular, the 008 field of the MARC record is reserved for denoting the
`
`date of creation of the library record itself. As this typically occurs during
`
`the process of preparing materials for public access, it is my experience that
`
`an item’s MARC record indicates the date of an item’s public availability.
`
`5. Typically, in creating a MARC record, a librarian would gather various bits
`
`of metadata such as book title, publisher and subject headings among others
`
`and assign each value to a relevant numerical field. For example, a book’s
`
`physical description is tracked in field 300 while title/attribution is tracked in
`
`field 245. The 008 field of the MARC record is reserved for denoting the
`
`creation of the library record itself. As this is the only date reflecting the
`
`inclusion of said materials within the library’s collection, it is my experience
`
`that an item’s 008 field accurately indicates the date of an item’s public
`
`availability.
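
As an illustration only, the date convention described above can be sketched in Python. The sketch assumes the standard MARC 21 layout, in which positions 00-05 of the 008 field hold the date the record was entered on file in YYMMDD form. The function name `date_entered_on_file`, the century `pivot`, and the sample record values are hypothetical and are not taken from any record discussed in this declaration.

```python
from datetime import date


def date_entered_on_file(field_008: str, pivot: int = 50) -> date:
    """Decode positions 00-05 (YYMMDD) of a MARC 008 field into a calendar date."""
    yymmdd = field_008[:6]
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError(f"008 field does not start with six digits: {field_008!r}")
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    # Assumption: two-digit years at or above `pivot` are 19xx, otherwise 20xx.
    year = (1900 if yy >= pivot else 2000) + yy
    return date(year, mm, dd)


# Hypothetical record fragment keyed by MARC tag; all values are illustrative.
record = {
    "245": "An example title / An example author.",
    "300": "100 p. ; 28 cm.",
    "008": "050314s2005    xxu",
}

created = date_entered_on_file(record["008"])
print(created.isoformat())  # 2005-03-14
```

Only the first six characters of the 008 value are read here; the remaining positions carry other coded data that this sketch ignores.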
`
`6. I have reviewed Exhibit 1012, Efficient Algorithms for Speech Recognition
`
`by Mosur K. Ravishankar.
`
`7. Attached hereto as Appendix RAVISHANKAR01 is a true and correct copy
`
`of the MARC record for Efficient Algorithms for Speech Recognition as held
`
`by the Carnegie Mellon University library. I secured this record myself from
`
`the library’s public catalog. The MARC record contained within Appendix
`
`RAVISHANKAR01 accurately describes the title, author, publisher, and
`
`submission details of Efficient Algorithms for Speech Recognition by Mosur
`
`K. Ravishankar.
`
`8. Attached hereto as Appendix RAVISHANKAR02 is a true and correct copy
`
`of selections from Efficient Algorithms for Speech Recognition. I secured
`
`these scans myself from the Carnegie Mellon University’s holdings. In
`
`comparing Exhibit 1012 to Appendix RAVISHANKAR02, it is my
`
`determination that Exhibit 1012 is a true and correct copy of Efficient
`
`Algorithms for Speech Recognition by Mosur K. Ravishankar.
`
`9. The 008 field of the MARC record in Appendix RAVISHANKAR01
`
`indicates the date of record creation. The 008 field of Appendix
`
`RAVISHANKAR01 indicates the Carnegie Mellon University library first
`
`acquired this thesis as of June 27, 1996. Considering this information, it is
`
`my determination that Efficient Algorithms for Speech Recognition was
`
`made available to the public at least as early as December 31, 1996.
`
`
`10. I have been retained on behalf of the Petitioner to provide assistance in the
`
`above-illustrated matter in establishing the authenticity and public
`
`availability of the documents discussed in this declaration. I am being
`
`compensated for my services in this matter at the rate of $100.00 per hour
`
`plus reasonable expenses. My statements are objective, and my
`
`compensation does not depend on the outcome of this matter.
`
`
`11. I declare under penalty of perjury that the foregoing is true and correct. I
`
`hereby declare that all statements made herein of my own knowledge are
`
`true and that all statements made on information and belief are believed to
`
be true; and further that these statements were made with the knowledge that
`
`willful false statements and the like so made are punishable by fine or
`
`imprisonment, or both, under Section 1001 of Title 18 of the United States
`
`Code.
`
`
`
`Dated: 9/26/2022
`
`
`
`June Ann Munford
`
`
`
`J. Munford
`Curriculum Vitae
`
`Education
`
`University of Wisconsin-Milwaukee - MS, Library & Information Science, 2009
`Milwaukee, WI
`
`
`● Coursework included cataloging, metadata, data analysis, library systems,
`management strategies and collection development.
`● Specialized in library advocacy, cataloging and public administration.
`
`
`Grand Valley State University - BA, English Language & Literature, 2008
`Allendale, MI
`
● Coursework included linguistics, documentation and literary analysis.
● Minor in political science with a focus in local-level economics and
government.
`
`
`
`Professional Experience
`
`Researcher / Expert Witness, October 2017 – present
`Freelance ● Pittsburgh, Pennsylvania & Grand Rapids, Michigan
`
`
`● Material authentication and public accessibility determination.
`Declarations of authenticity and/or public accessibility provided upon
`research completion. Experienced with appeals and deposition process.
`
● Research provided on topics of public library operations, material
publication history, digital database services and legacy web resources.

● Past clients include Alston & Bird, Arnold & Porter, Baker Botts, Fish &
Richardson, Erise IP, Irell & Manella, O'Melveny & Myers, Perkins-Coie,
Pillsbury Winthrop Shaw Pittman and Slayden Grubert Beard.
`
`Library Director, February 2013 - March 2015
`Dowagiac District Library ● Dowagiac, Michigan
`
`
● Executive administrator of the Dowagiac District Library. Located in
Southwest Michigan, this library has a service area of 13,000, an annual
operating budget of over $400,000 and total assets of approximately
$1,300,000.

● Developed careful budgeting guidelines to produce a 15% surplus during
the 2013-2014 & 2014-2015 fiscal years while being audited.

● Using this budget surplus, oversaw significant library investments
including the purchase of property for a future building site, demolition of
existing buildings and building renovation projects on the current facility.

● Led the organization and digitization of the library's archival records.

● Served as the public representative for the library, developing business
relationships with local school, museum and tribal government entities.

● Developed an objective-based analysis system for measuring library
services - including a full collection analysis of the library's 50,000+
circulating items and their records.
`
Librarian & Branch Manager, November 2010 - January 2013
Anchorage Public Library ● Anchorage, Alaska


● Headed the 2013 Anchorage Reads community reading campaign
including event planning, staging public performances and creating
marketing materials for mass distribution.

● Co-led the social media department of the library's marketing team,
drafting social media guidelines, creating original content and instituting
long-term planning via content calendars.

● Developed business relationships with The Boys & Girls Club, Anchorage
School District and the US Army to establish summer reading programs for
children.


Library Assistant, June 2004 - September 2005, September 2006 - October 2013
Hart Area Public Library ● Hart, MI


● Responsible for verifying imported MARC records and original MARC
cataloging for the local-level collection as well as the Michigan Electronic
Library.
`
`● Handled OCLC Worldcat interlibrary loan requests & fulfillment via
`ongoing communication with lending libraries.
`
`
`
`Professional Involvement
`
`Alaska Library Association - Anchorage Chapter
`● Treasurer, 2012
`
`
`Library Of Michigan
`● Level VII Certification, 2008
`● Level II Certification, 2013
`
`
`Michigan Library Association Annual Conference 2014
`● New Directors Conference Panel Member
`
`
`Southwest Michigan Library Cooperative
`● Represented the Dowagiac District Library, 2013-2015
`
`
`
`Professional Development
`
`Library Of Michigan Beginning Workshop, May 2008
`Petoskey, MI
`● Received training in cataloging, local history, collection management,
`children’s literacy and reference service.
`
`
`Public Library Association Intensive Library Management Training, October 2011
`Nashville, TN
`● Attended a five-day workshop focused on strategic planning, staff
`management, statistical analysis, collections and cataloging theory.
`
`
`Alaska Library Association Annual Conference 2012 - Fairbanks, February 2012
`Fairbanks, AK
`● Attended seminars on EBSCO advanced search methods, budgeting,
`cataloging, database usage and marketing.
`
`
`
`Depositions
`
2019 ● Fish & Richardson
IPR Petitions of 865 Patent, Apple v. Qualcomm (IPR2018-001281 / 39521-00421IP & IPR2018-01282 / 39521-00421IP2)

2019 ● Erise IP
Implicit, LLC v. Netscout Systems, Inc (Civil Action No. 2:18-cv-53-JRG)

2019 ● Perkins-Coie
Adobe Inc. v. RAH Color Technologies LLC (Cases IPR2019-00627, IPR2019-00628, IPR2019-00629 and IPR2019-00646)

2020 ● O’Melveny & Myers
Maxell, Ltd. v. Apple Inc. (Case 5:19-cv-00036-RWS)

2021 ● Pillsbury Winthrop Shaw Pittman LLP
Intel v. SRC (Case IPR2020-1449)
`
`
`Limited Case History & Potential Conflicts
`
`Alston & Bird
`
`● Nokia (v. Neptune Subsea, Xtera)
`
`Arnold & Porter
`
`● Ivantis (v. Glaukos)
`
`Erise I.P.
`
● Apple
v. Future Link Systems (IPRs 6317804, 6622108, 6807505, and 7917680)
v. INVT
v. Navblazer LLC (Case No. IPR2020-01253)
v. Qualcomm (IPR2018-001281, 39521-00421IP, IPR2018-01282, 39521-00421IP2)
v. Quest Nettech Corp, Wynn Technologies (Case No. IPR2019-00XXX, RE. Patent Re38137)

● Fanduel (v CGT)

● Garmin (v. Phillips North America LLC, Case No. 2:19-cv-6301-AB-KS
Central District of California)

● Netscout
v. Longhorn HD LLC
v. Implicit, LLC (Civil Action No. 2:18-cv-53-JRG)

● Sony Interactive Entertainment LLC
v. Bot M8 LLC
v. Infernal Technology LLC

● Unified Patents (v GE Video Compression, Civil Action No. 2:19-cv-248)
`
`
`Fish & Richardson
`
● Apple
v. LBS Innovations
v. Masimo (IPR 50095-0012IP1, 50095-0012IP2, 50095-0013IP1, 50095-0013IP2, 50095-0006IP1)
v. Neonode
v. Qualcomm (IPR2018-001281, 39521-00421IP, IPR2018-01282, 39521-00421IP2)

● Dish Network
v. Realtime Adaptive Streaming (Case No 1:17-CV-02097-RBJ)
v. TQ Delta LLC

● Huawei (IPR 76933211)

● Kianxis

● LG Electronics (v. Bell Northern Research LLC, Case No. 3:18-cv-2864-CAB-BLM)

● Samsung (v. Bell Northern Research, Civil Action No. 2:19-cv-00286-JRG)

● Texas Instruments
`
`
`Irell & Manella
`
`● Curium
`
`O’Melveny & Myers
`
`● Apple (v. Maxell, Case 5:19-cv-00036-RWS)
`
`Perkins-Coie
`
● TCL Industries (v. Koninklijke Philips NV, PTAB Case Nos. IPR2021-00495,
IPR2021-00496, and IPR2021-00497)
`
`Pillsbury Winthrop Shaw Pittman
`
● Intel (v. FG SRC LLC, Case No. 6:20-cv-00315 W.D. Tex)

● Metaswitch

● MLC Intellectual Property (v. MicronTech, Case No. 3:14-cv-03657-SI)

● Realtek Semiconductor

● Quectel
`
`
`
screenshot-cmu.primo.exlibrisgroup.com-2022.09.02-14_19_50
https://cmu.primo.exlibrisgroup.com/discovery/sourceRecord?vid=01CMU_INST:01CMU&docid=alma991002439939704436&recordOwner=01CMU_INST
02.09.2022

leader  01768nam a22004451a 4500
001     991002439939704436
005     19960627153454.0
008     960627s1996 paua [...] eng d
035     ##$a468549-01cmu_inst
035     ##$a(OCoLC)468549 $9ExL
035     ##$a(Sirsi) 034999141
040     ##$aPMC $cPMC
049     ##$aPMCC
092     ##$a510.7808 $bC28r
100     1#$aRavishankar, Mosur.
245     10$aEfficient algorithms for speech recognition / $cMosur K. Ravishankar.
260     ##$aPittsburgh, Pa. : $bSchool of Computer Science, Carnegie Mellon University, $cc1996.
300     ##$axii, 132 p. : $bill. ; $c28 cm.
490     1#$a[Research paper] / Carnegie Mellon University. School of Computer Science, $vCMU-CS-96-143
500     ##$a"May 15, 1996."
502     ##$aThesis (Ph. D.)--Carnegie Mellon University, 1996.
504     ##$aIncludes bibliographical references.
536     ##$aSupported in part by the Department of the Navy, Naval Research Laboratory. $cN00014-93-1-2005
596     ##$a9
650     #0$aAlgorithms.
650     #0$aAutomatic speech recognition.
650     #0$aReal-time data processing.
830     #0$aResearch paper (Carnegie Mellon University. School of Computer Science) ; $vCMU-CS-96-143
901     ##$aocm34999141
902     ##$a468549
903     ##$a034999141
910     ##$ajh/mm 6-24-96
916     ##$a19960627
917     ##$a19960627
918     ##$aCATALOGER
919     ##$a20070726
920     ##$aBATCH
945     ##$v96-143 $c2 $he&s-tech rept $i38482006838482
945     ##$v96-143 $c3 $he&s-tech rept $i38482006838490
999     ##$a510.7808 C28R 96-143 $wDEWEY $c2 $i38482006838482 $d8/5/2002 $e9/25/2000 $lBY-REQUEST $mOFFSITE $n13 $q2 $rY $sY $tTECH-RPT $u6/27/1996
999     ##$a510.7808 C28R 96-143 $wDEWEY $c3 $i38482006838490 $d6/18/2002 $e6/18/2002 $lBY-REQUEST $mOFFSITE $n4 $rY $sY $tTECH-RPT $u6/27/1996 FORM=MARC

screenshot-cmu.primo.exlibrisgroup.com-2022.09.02-14_19_27
https://cmu.primo.exlibrisgroup.com/discovery/fulldisplay?context=L&vid=01CMU_INST:01CMU&search_scope=Myinst_and_Ci&tab=Everything&docid=alma991002439939704436
02.09.2022

DISSERTATION
Efficient algorithms for speech recognition
Ravishankar, Mosur.
c1996
Available at Offsite Repository BY-REQUEST (510.7808 C28R 96-143)

Find in Library
Offsite Repository
Available, BY-REQUEST; 510.7808 C28R 96-143
(2 copies, 1 available, 0 requests)
In transit until 09/03/2022
Item in place

Details
Title
Efficient algorithms for speech recognition
Creator
Ravishankar, Mosur.
Dissertation
Thesis (Ph. D.)--Carnegie Mellon University, 1996.
Subject
Algorithms
Automatic speech recognition
Real-time data processing
Series
[Research paper] / Carnegie Mellon University. School of Computer Science, CMU-CS-96-143
Research paper (Carnegie Mellon University. School of Computer Science) ; CMU-CS-96-143
Publisher
Pittsburgh, Pa. : School of Computer Science, Carnegie Mellon University
Creation Date
c1996
Format
xii, 132 p. : ill. ; 28 cm.
Includes bibliographical references.
"May 15, 1996."
Source
Library Catalog
`
`
Computer Science

Carnegie

University Libraries
Carnegie Mellon University
Pittsburgh PA 15213-389
`
`
`
`Efficient Algorithms for Speech Recognition
`
`Mosur K. Ravishankar
`
`May 15, 1996
`CMU-CS-96-143
`
`School of Computer Science
`Computer Science Division
`Carnegie Mellon University
`Pittsburgh, PA 15213
`
`Submitted in partial fulfillment of the requirements
`for the degree of Doctor of Philosophy.
`
`Thesis Committee:
`
`Roberto Bisiani, co-chair (University of Milan)
`Raj Reddy, co-chair
`Alexander Rudnicky
`Richard Stern
`Wayne Ward
`
`© 1996 Mosur K. Ravishankar
`
`This research was supported by the Department of the Navy, Naval Research Laboratory under
`Grant No. N00014-93-1-2005. The views and conclusions contained in this document are those of
`the author and should not be interpreted as representing the official policies, either expressed or
`implied, of the U.S. government.
`
`
`
School of Computer Science

DOCTORAL THESIS
in the field of
Computer Science

Efficient Algorithms for Speech Recognition

MOSUR K. RAVISHANKAR

Submitted in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy

ACCEPTED:

THESIS COMMITTEE CHAIR    DATE

THESIS COMMITTEE CHAIR    DATE

DEPARTMENT HEAD    DATE

APPROVED:

[handwritten signatures and dates not legible in this copy]
`
`Abstract
`
Advances in speech technology and computing power have created a surge of
interest in the practical application of speech recognition. However, the most accurate
speech recognition systems in the research world are still far too slow and expensive to
be used in practical, large vocabulary continuous speech applications. Their main goal
has been recognition accuracy, with emphasis on acoustic and language modelling.
But practical speech recognition also requires the computation to be carried out in
real time within the limited resources (CPU power and memory size) of commonly
available computers. There has been relatively little work in this direction while
preserving the accuracy of research systems.
`
In this thesis, we focus on efficient and accurate speech recognition. It is easy to
improve recognition speed and reduce memory requirements by trading away accuracy,
for example by greater pruning, and using simpler acoustic and language models.
It is much harder to improve both the recognition speed and reduce main memory
size while preserving the accuracy.
`
This thesis presents several techniques for improving the overall performance of
the CMU Sphinx-II system. Sphinx-II employs semi-continuous hidden Markov models
for acoustics and trigram language models, and is one of the premier research
systems of its kind. The techniques in this thesis are validated on several widely used
benchmark test sets using two vocabulary sizes of about 20K and 58K words.
`
The main contributions of this thesis are an 8-fold speedup and 4-fold memory size
reduction over the baseline Sphinx-II system. The improvement in speed is obtained
from the following techniques: lexical tree search, phonetic fast match heuristic, and
global best path search of the word lattice. The gain in speed from the tree search is
about a factor of 5. The phonetic fast match heuristic speeds up the tree search by
another factor of 2 by finding the most likely candidate phones active at any time.
Though the tree search incurs some loss of accuracy, it also produces compact word
lattices with low error rate which can be rescored for accuracy. Such a rescoring is
combined with the best path algorithm to find a globally optimum path through a
word lattice. This recovers the original accuracy of the baseline system. The total
recognition time is about 3 times real time for the 20K task on a 175MHz DEC Alpha
workstation.
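
The lexical tree named above can be illustrated with a minimal sketch. It is not taken from the thesis: it only assumes that a lexical tree is a prefix tree over the phone sequences of word pronunciations, so that words sharing initial phones share nodes. The function names, the `#words` marker key, and the three-word lexicon with its phone symbols are hypothetical.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree (nested dicts) over the phone sequences in `lexicon`."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the child for this phone if an earlier word already created it.
            node = node.setdefault(phone, {})
        # Record which words end at this node under a reserved key.
        node.setdefault("#words", []).append(word)
    return root


def count_phone_nodes(node):
    """Count the phone nodes below `node`, ignoring the reserved word lists."""
    return sum(
        1 + count_phone_nodes(child)
        for key, child in node.items()
        if key != "#words"
    )


# Hypothetical pronunciations, for illustration only.
lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "starting": ["S", "T", "AA", "R", "DX", "IX", "NG"],
    "started": ["S", "T", "AA", "R", "DX", "IX", "DD"],
}

flat_nodes = sum(len(phones) for phones in lexicon.values())
tree_nodes = count_phone_nodes(build_lexical_tree(lexicon))
print(flat_nodes, tree_nodes)  # 19 9
```

In this toy lexicon the shared prefixes reduce 19 phone models in a flat layout to 9 tree nodes. The sketch shows only the sharing of structure, not the search or the speedups reported in the abstract.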
`
The memory requirements of Sphinx-II are minimized by reducing the sizes of
the acoustic and language models. The language model is maintained on disk and
bigrams and trigrams are read in on demand. Explicit software caching mechanisms
effectively overcome the disk access latencies. The acoustic model size is reduced by
simply truncating precision of probability values to 8 bits. Several other engineering
solutions, not explored in this thesis, can be applied to reduce memory requirements
further. The memory size for the 20K task is reduced to about 30-40MB.
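
The on-demand loading with software caching mentioned above can be pictured with a short sketch. It is an assumption-laden illustration, not the mechanism used in Sphinx-II: it assumes a least-recently-used eviction policy, and the class `BigramCache`, the loader callback, the in-memory `FAKE_DISK` table and its scores are all hypothetical.

```python
from collections import OrderedDict


class BigramCache:
    """Keep the bigram lists of the most recently used words in memory."""

    def __init__(self, load_from_disk, capacity):
        self._load = load_from_disk      # callback standing in for a disk read
        self._capacity = capacity        # maximum number of cached words
        self._entries = OrderedDict()    # word -> bigram successors, oldest first
        self.disk_reads = 0

    def successors(self, word):
        if word in self._entries:
            # Cache hit: mark the word as most recently used.
            self._entries.move_to_end(word)
            return self._entries[word]
        # Cache miss: read on demand, then evict the least recently used entry.
        self.disk_reads += 1
        entry = self._load(word)
        self._entries[word] = entry
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return entry


# Hypothetical stand-in for a language model file on disk.
FAKE_DISK = {
    "take": {"fidelity's": -1.2},
    "fidelity's": {"case": -0.7},
    "case": {"as": -0.9},
}

cache = BigramCache(lambda word: FAKE_DISK.get(word, {}), capacity=2)
for word in ["take", "take", "fidelity's", "take", "case", "fidelity's"]:
    cache.successors(word)
print(cache.disk_reads)  # 4
```

Six lookups cause four simulated disk reads here because the cache holds only two words. A larger capacity trades memory for fewer reads.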
`
`
`Acknowledgements
`
I cannot overstate the debt I owe to Roberto Bisiani and Raj Reddy. They have
not only helped me and given me every opportunity to extend my professional career,
but also helped me through personal difficulties as well. It is quite remarkable that I
have landed not one but two advisors that combine integrity towards research with a
human touch that transcends the proverbial hard-headedness of science. One cannot
hope for better mentors than them. Alex Rudnicky, Rich Stern, and Wayne Ward,
all have a clarity of thinking and self-expression that simply amazes me without end.
They have given me the most insightful advice, comments, and questions that I could
have asked for. Thank you, all.

The CMU speech group has been a pleasure to work with. First of all, I would
like to thank some former and current members, Mei-Yuh Hwang, Fil Alleva, Lin
Chase, Eric Thayer, Sunil Issar, Bob Weide, and Roni Rosenfeld. They have helped
me through the early stages of my induction into the group, and later given invaluable
support in my work. I’m fortunate to have inherited the work of Mei-Yuh and Fil.
Lin Chase has been a great friend and sounding board for ideas through these years.
Eric has been all of that and a great officemate. I have learnt a lot from discussions
with Paul Placeway. The rest of the speech group and the robust gang has made it a
most lively environment to work in. I hope the charge continues through Sphinx-III
and beyond.

I have spent a good fraction of my life in the CMU-CS community so far. It has
been, and still is, the greatest intellectual environment. The spirit of cooperation, and
informality of interactions is simply unique. I would like to acknowledge the support
of everyone I have ever come to know here, too many to name, from the Warp and
Nectar days until now. The administrative folks have always succeeded in blunting
the edge off a difficult day. You never know what nickname Catherine Copetas will
christen you with next. And Sharon Burks has always put up with all my antics.

It goes without saying that I owe everything to my parents. I have had tremendous
support from my brothers, and some very special uncles and aunts. In particular, I
must mention the fun I’ve had with my brother Kuts. I would also like to acknowledge
K. Gopinath’s help during my stay in Bangalore. Finally, “BB”, who has suffered
through my tantrums on bad days, kept me in touch with the rest of the world, has a
most creative outlook on the commonplace, can drive me nuts some days, but when
all is said and done, is a most relaxed and comfortable person to have around.

Last but not least, I would like to thank Andreas Nowatzyk, Monica Lam, Duane
Northcutt and Ray Clark. It has been my good fortune to witness and participate in
some of Andreas’s creative work. This thesis owes a lot to his unending support and
encouragement.
`
`
Contents

Abstract

Acknowledgements

1 Introduction
1.1 The Modelling Problem
1.2 The Search Problem
1.3 Thesis Contributions
1.3.1 Improving Speed
1.3.2 Reducing Memory Size
1.4 Summary and Dissertation Outline

2 Background
2.1 Acoustic Modelling
2.1.1 Phones and Triphones
2.1.2 HMM modelling of Phones and Triphones
2.2 Language Modelling
2.3 Search Algorithms
2.3.1 Viterbi Beam Search
2.4 Related Work
2.4.1 Tree Structured Lexicons
2.4.2 Memory Size and Speed Improvements in Whisper
2.4.3 Search Pruning Using Posterior Phone Probabilities
2.4.4 Lower Complexity Viterbi Algorithm
2.5 Summary

3 The Sphinx-II Baseline System
3.1 Knowledge Sources
3.1.1 Acoustic Model
3.1.2 Pronunciation Lexicon
3.2 Forward Beam Search
3.2.1 Flat Lexical Structure
3.2.2 Incorporating the Language Model
3.2.3 Cross-Word Triphone Modeling
3.2.4 The Forward Search
3.3 Backward and A* Search
3.3.1 Backward Viterbi Search
3.3.2 A* Search
3.4 Baseline Sphinx-II System Performance
3.4.1 Experimentation Methodology
3.4.2 Recognition Accuracy
3.4.3 Search Speed
3.4.4 Memory Usage
3.5 Baseline System Summary

4 Search Speed Optimization
4.1 Motivation
4.2 Lexical Tree Search
4.2.1 Lexical Tree Construction
4.2.2 Incorporating Language Model Probabilities
4.2.3 Outline of Tree Search Algorithm
4.2.4 Performance of Lexical Tree Search
4.2.5 Lexical Tree Search Summary
4.3 Global Best Path Search
4.3.1 Best Path Search Algorithm
4.3.2 Performance
4.3.3 Best Path Search Summary
4.4 Rescoring Tree-Search Word Lattice
4.4.1 Motivation
4.4.2 Performance
4.4.3 Summary
4.5 Phonetic Fast Match
4.5.1 Motivation
4.5.2 Details of Phonetic Fast Match
4.5.3 Performance of Fast Match Using All Senones
4.5.4 Performance of Fast Match Using CI Senones
4.5.5 Phonetic Fast Match Summary
4.6 Exploiting Concurrency
4.6.1 Multiple Levels of Concurrency
4.6.2 Parallelization Summary
4.7 Summary of Search Speed Optimization

5 Memory Size Reduction
5.1 Senone Mixture Weights Compression
5.2 Disk-Based Language Models
5.3 Summary of Experiments on Memory Size

6 Small Vocabulary Systems
6.1 General Issues
6.2 Performance on ATIS
6.2.1 Baseline System Performance
6.2.2 Performance of Lexical Tree Based System
6.3 Small Vocabulary Systems Summary

7 Conclusion
7.1 Summary of Results
7.2 Contributions
7.3 Future Work on Efficient Speech Recognition

Appendices

A The Sphinx-II Phone Set

B Statistical Significance Tests
B.1 20K Task
B.2 58K Task

Bibliography
`
List of Figures

2.1 Viterbi Search as Dynamic Programming
3.1 Sphinx-II Signal Processing Front End.
3.2 Sphinx-II HMM Topology: 5-State Bakis Model.
3.3 Cross-word Triphone Modelling at Word Ends in Sphinx-II.
3.4 Word Initial Triphone HMM Modelling in Sphinx-II.
3.5 One Frame of Forward Viterbi Beam Search in the Baseline System.
3.6 Word Transitions in Sphinx-II Baseline System.
3.7 Outline of A* Algorithm in Baseline System
3.8 Language Model Structure in Baseline Sphinx-II System.
4.1 Basephone Lexical Tree Example.
4.2 Triphone Lexical Tree Example.
4.3 Cross-Word Transitions With Flat and Tree Lexicons.
4.4 Auxiliary Flat Lexical Structure for Bigram Transitions.
4.5 Path Score Adjustment Factor f for Word wj Upon Its Exit.
4.6 One Frame of Forward Viterbi Beam Search in Tree Search Algorithm.
4.7 Word Lattice for Utterance: Take Fidelity’s case as an example.
4.8 Word Lattice Example Represented as a DAG.
4.9 Word Lattice DAG Example Using a Trigram Grammar.
4.10 Suboptimal Usage of Trigrams in Sphinx-II Viterbi Search.
4.11 Base Phones Predicted by Top Scoring Senones in Each Frame; Speech
Fragment for Phrase THIS TREND, Pronounced DH-IX-S T-R-EH-...
4.12 Position of Correct Phone in Ranking Created by Phonetic Fast Match.
4.13 Lookahead Window for Smoothing the Active Phone List.
4.14 Phonetic Fast Match Performance Using All Senones (20K Task).
4.15 Word Error Rate vs Recognition Speed of Various Systems.
4.16 Configuration of a Practical Speech Recognition System.
`
`List of Tables
`
3.1 No. of Words and Sentences in Each Test Set

3.2 Percentage Word Error Rate of Baseline Sphinx-II System.

3.3 Overall Execution Times of Baseline Sphinx-II System (xRealTime).
`
`3.4 Baseline Sphinx-II System Forward Viterbi Search Execution Times
`CROR