`
`
`DECLARATION OF JUNE ANN MUNFORD
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`IPR2023-00037
`Apple EX1033 Page 1
`
`
`
`1. My name is June Ann Munford. I am over the age of 18, have personal
`
`knowledge of the facts set forth herein, and am competent to testify to the
`
`same.
`
`2. I earned a Master of Library and Information Science (MLIS) from the
`
`University of Wisconsin-Milwaukee in 2009. I have over ten years of
`
`experience in the library/information science field. Beginning in 2004, I
`
`have served in various positions in the public library sector including
`
`Assistant Librarian, Youth Services Librarian and Library Director. I have
`
`attached my Curriculum Vitae as Appendix CV.
`
`
`
`
`
`3. During my career in the library profession, I have been responsible for
`
`materials acquisition for multiple libraries. In that position, I have cataloged,
`
`purchased and processed incoming library works. That includes purchasing
`
`materials directly from vendors, recording publishing data from the material
`
`in question, creating detailed material records for library catalogs and
`
`physically preparing that material for circulation. In addition to my
`
`experience in acquisitions, I was also responsible for analyzing large
`
`collections of library materials, tailoring library records for optimal catalog
`
`search performance and creating lending agreements between libraries
`
`during my time as a Library Director.
`
`
`4. I am fully familiar with the catalog record creation process in the library
`
`sector. In preparing a material for public availability, a library catalog record
`
`describing that material would be created. These records are typically
`
`written in Machine-Readable Cataloging (herein referred to as “MARC”) code
`
`and contain information such as a physical description of the material,
`
`metadata from the material’s publisher, and date of library acquisition. In
`
`particular, the 008 field of the MARC record is reserved for denoting the
`
`date of creation of the library record itself. As this typically occurs during
`
`the process of preparing materials for public access, it is my experience that
`
`an item’s MARC record indicates the date of an item’s public availability.
`
`
`5. Typically, in creating a MARC record, a librarian would gather various bits
`
`of metadata such as book title, publisher and subject headings among others
`
`and assign each value to a relevant numerical field. For example, a book’s
`
`physical description is tracked in field 300 while title/attribution is tracked in
`
`field 245. The 008 field of the MARC record is reserved for denoting the
`
`date of creation of the library record itself. As this is the only date reflecting the
`
`inclusion of said materials within the library’s collection, it is my experience
`
`that an item’s 008 field accurately indicates the date of an item’s public
`
`availability.
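
As an illustration of how a date can be read from the 008 field described above, the following minimal Python sketch assumes the MARC 21 convention that character positions 00-05 of field 008 hold the date the record was entered on file, in yymmdd form. The function name `parse_008_date_entered`, the century pivot of 50, and the sample value are hypothetical choices made for this illustration; they are not taken from any library system or from the record in the appendix.

```python
from datetime import date


def parse_008_date_entered(field_008: str, pivot: int = 50) -> date:
    """Return the date held in positions 00-05 (yymmdd) of a MARC 008 field.

    Assumption for this sketch: two-digit years at or above `pivot`
    belong to the 1900s and all others to the 2000s.
    """
    raw = field_008[:6]
    if len(raw) != 6 or not raw.isdigit():
        raise ValueError(f"008/00-05 is not a yymmdd date: {raw!r}")
    yy, mm, dd = int(raw[:2]), int(raw[2:4]), int(raw[4:6])
    century = 1900 if yy >= pivot else 2000
    # date() itself rejects impossible values such as month 13.
    return date(century + yy, mm, dd)


# Illustrative value only: a record entered on June 27, 1996.
print(parse_008_date_entered("960627s1996    pau"))
```

Because the year is stored with two digits, any reader of this field has to choose a century rule; the pivot used here is one possible choice and would need to match the cataloging system in question.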
`
`
`6. I have reviewed Exhibit ####, Efficient Algorithms for Speech Recognition
`
`by Mosur K. Ravishankar.
`
`
`7. Attached hereto as Appendix RAVISHANKAR01 is a true and correct copy
`
`of the MARC record for Efficient Algorithms for Speech Recognition as held
`
`by the Carnegie Mellon University library. I secured this record myself from
`
`the library’s public catalog. The MARC record contained within Appendix
`
`RAVISHANKAR01 accurately describes the title, author, publisher, and
`
`submission details of Efficient Algorithms for Speech Recognition by Mosur
`
`K. Ravishankar.
`
`
`8. Attached hereto as Appendix RAVISHANKAR02 is a true and correct copy
`
`of selections from Efficient Algorithms for Speech Recognition. I secured
`
`these scans myself from the Carnegie Mellon University’s holdings. In
`
`comparing Exhibit #### to Appendix RAVISHANKAR02, it is my
`
`determination that Exhibit #### is a true and correct copy of Efficient
`
`Algorithms for Speech Recognition by Mosur K. Ravishankar.
`
`
`
`9. The 008 field of the MARC record in Appendix RAVISHANKAR01
`
`indicates the date of record creation. The 008 field of Appendix
`
`RAVISHANKAR01 indicates the Carnegie Mellon University library first
`
`acquired this thesis as of June 27, 1996. Considering this information, it is
`
`my determination that Efficient Algorithms for Speech Recognition was
`
`made available to the public at least as early as December 31, 1996.
`
`
`10. I have been retained on behalf of the Petitioner to provide assistance in the
`
`above-captioned matter in establishing the authenticity and public
`
`availability of the documents discussed in this declaration. I am being
`
`compensated for my services in this matter at the rate of $100.00 per hour
`
`plus reasonable expenses. My statements are objective, and my
`
`compensation does not depend on the outcome of this matter.
`
`
`11. I declare under penalty of perjury that the foregoing is true and correct. I
`
`hereby declare that all statements made herein of my own knowledge are
`
`true and that all statements made on information and belief are believed to
`
`be true; and further that these statements were made with the knowledge that
`
`willful false statements and the like so made are punishable by fine or
`
`imprisonment, or both, under Section 1001 of Title 18 of the United States
`
`Code.
`
`
`Dated: 9/26/2022
`
`
`
`June Ann Munford
`
`
`
`J. Munford
`Curriculum Vitae
`
`Education
`
`University of Wisconsin-Milwaukee - MS, Library & Information Science, 2009
`Milwaukee, WI
`
`
`● Coursework included cataloging, metadata, data analysis, library systems,
`management strategies and collection development.
`● Specialized in library advocacy, cataloging and public administration.
`
`
`Grand Valley State University - BA, English Language & Literature, 2008
`Allendale, MI
`
`● Coursework included linguistics, documentation and literary analysis.
`● Minor in political science with a focus in local-level economics and
`government.
`
`
`
`Professional Experience
`
`Researcher / Expert Witness, October 2017 – present
`Freelance ● Pittsburgh, Pennsylvania & Grand Rapids, Michigan
`
`
`● Material authentication and public accessibility determination.
`Declarations of authenticity and/or public accessibility provided upon
`research completion. Experienced with appeals and deposition process.
`
`● Research provided on topics of public library operations, material
`publication history, digital database services and legacy web resources.
`
`● Past clients include Alston & Bird, Arnold & Porter, Baker Botts, Fish &
`Richardson, Erise IP, Irell & Manella, O'Melveny & Myers, Perkins-Coie,
`Pillsbury Winthrop Shaw Pittman and Slayden Grubert Beard.
`
`Library Director, February 2013 - March 2015
`Dowagiac District Library ● Dowagiac, Michigan
`
`
`● Executive administrator of the Dowagiac District Library. Located in
`Southwest Michigan, this library has a service area of 13,000, an annual
`operating budget of over $400,000 and total assets of approximately
`$1,300,000.
`
`● Developed careful budgeting guidelines to produce a 15% surplus during
`the 2013-2014 & 2014-2015 fiscal years while being audited.
`
`
`
`● Using this budget surplus, oversaw significant library investments
`including the purchase of property for a future building site, demolition of
`existing buildings and building renovation projects on the current facility.
`
`● Led the organization and digitization of the library's archival records.
`
`● Served as the public representative for the library, developing business
`relationships with local school, museum and tribal government entities.
`
`● Developed an objective-based analysis system for measuring library
`services - including a full collection analysis of the library's 50,000+
`circulating items and their records.
`
`Librarian & Branch Manager, November 2010 - January 2013
`Anchorage Public Library ● Anchorage, Alaska
`
`
`● Headed the 2013 Anchorage Reads community reading campaign
`including event planning, staging public performances and creating
`marketing materials for mass distribution.
`
`● Co-led the social media department of the library's marketing team,
`drafting social media guidelines, creating original content and instituting
`long-term planning via content calendars.
`
`● Developed business relationships with The Boys & Girls Club, Anchorage
`School District and the US Army to establish summer reading programs for
`children.
`
`
`Library Assistant, June 2004 - September 2005, September 2006 - October 2013
`Hart Area Public Library ● Hart, MI
`
`
`● Responsible for verifying imported MARC records and original MARC
`cataloging for the local-level collection as well as the Michigan Electronic
`Library.
`
`● Handled OCLC Worldcat interlibrary loan requests & fulfillment via
`ongoing communication with lending libraries.
`
`
`
`Professional Involvement
`
`Alaska Library Association - Anchorage Chapter
`● Treasurer, 2012
`
`
`Library of Michigan
`● Level VII Certification, 2008
`● Level II Certification, 2013
`
`
`Michigan Library Association Annual Conference 2014
`● New Directors Conference Panel Member
`
`
`Southwest Michigan Library Cooperative
`● Represented the Dowagiac District Library, 2013-2015
`
`
`
`Professional Development
`
`Library of Michigan Beginning Workshop, May 2008
`Petoskey, MI
`● Received training in cataloging, local history, collection management,
`children’s literacy and reference service.
`
`
`Public Library Association Intensive Library Management Training, October 2011
`Nashville, TN
`● Attended a five-day workshop focused on strategic planning, staff
`management, statistical analysis, collections and cataloging theory.
`
`
`Alaska Library Association Annual Conference 2012 - Fairbanks, February 2012
`Fairbanks, AK
`● Attended seminars on EBSCO advanced search methods, budgeting,
`cataloging, database usage and marketing.
`
`
`
`Depositions
`
`2019 ● Fish & Richardson
`
`IPR Petitions of 865 Patent, Apple v. Qualcomm (IPR2018-001281 /
`
`39521-00421IP & IPR2018-01282 / 39521-00421IP2)
`
`2019 ● Erise IP
`
`Implicit, LLC v. Netscout Systems, Inc (Civil Action No. 2:18-cv-53-JRG)
`
`2019 ● Perkins-Coie
`
`Adobe Inc. v. RAH Color Technologies LLC (Cases IPR2019-00627,
`
`IPR2019-00628, IPR2019-00629 and IPR2019-00646)
`
`2020 ● O’Melveny & Myers
`
`Maxell, Ltd. v. Apple Inc. (Case 5:19-cv-00036-RWS)
`
`2021 ● Pillsbury Winthrop Shaw Pittman LLP
`
`Intel v. SRC (Case IPR2020-1449)
`
`
`Limited Case History & Potential Conflicts
`
`Alston & Bird
`
`● Nokia (v. Neptune Subsea, Xtera)
`
`Arnold & Porter
`
`● Ivantis (v. Glaukos)
`
`Erise I.P.
`
`● Apple
`v. Future Link Systems (IPRs 6317804, 6622108, 6807505, and 7917680)
`v. INVT
`v. Navblazer LLC (Case No. IPR2020-01253)
`v. Qualcomm (IPR2018-001281, 39521-00421IP, IPR2018-01282,
`39521-00421IP2)
`v. Quest Nettech Corp, Wynn Technologies (Case No. IPR2019-00XXX,
`RE. Patent Re38137)
`
`● Fanduel (v. CGT)
`
`● Garmin (v. Phillips North America LLC, Case No. 2:19-cv-6301-AB-KS
`Central District of California)
`
`● Netscout
`v. Longhorn HD LLC
`v. Implicit, LLC (Civil Action No. 2:18-cv-53-JRG)
`
`● Sony Interactive Entertainment LLC
`v. Bot M8 LLC
`v. Infernal Technology LLC
`
`● Unified Patents (v. GE Video Compression, Civil Action No. 2:19-cv-248)
`
`
`Fish & Richardson
`
`● Apple
`v. LBS Innovations
`v. Masimo (IPR 50095-0012IP1, 50095-0012IP2, 50095-0013IP1,
`50095-0013IP2, 50095-0006IP1)
`v. Neonode
`v. Qualcomm (IPR2018-001281, 39521-00421IP, IPR2018-01282,
`39521-00421IP2)
`
`● Dish Network
`v. Realtime Adaptive Streaming (Case No 1:17-CV-02097-RBJ)
`v. TQ Delta LLC
`
`● Huawei (IPR 76933211)
`
`● Kianxis
`
`● LG Electronics (v. Bell Northern Research LLC, Case No. 3:18-cv-2864-CAB-BLM)
`
`● Samsung (v. Bell Northern Research, Civil Action No. 2:19-cv-00286-JRG)
`
`● Texas Instruments
`
`
`Irell & Manella
`
`● Curium
`
`O’Melveny & Myers
`
`● Apple (v. Maxell, Case 5:19-cv-00036-RWS)
`
`Perkins-Coie
`
`● TCL Industries (v. Koninklijke Philips NV, PTAB Case Nos. IPR2021-00495,
`
`IPR2021-00496, and IPR2021-00497)
`
`Pillsbury Winthrop Shaw Pittman
`
`● Intel (v. FG SRC LLC, Case No. 6:20-cv-00315 W.D. Tex)
`
`● Metaswitch
`
`● MLC Intellectual Property (v. MicronTech, Case No. 3:14-cv-03657-SI)
`
`● Realtek Semiconductor
`
`● Quectel
`
`
`screenshot-cmu.primo.exlibrisgroup.com-2022.09.02-14_19_50
`https://cmu.primo.exlibrisgroup.com/discovery/sourceRecord?vid=01CMU_INST:01CMU&docid=alma991002439939704436&recordOwner=01CMU_INST
`02.09.2022
`
`leader
`001
`005
`008
`035
`035
`035
`040
`049
`092
`100
`
`01768nam a2200445Ia 4500
`991002439939704436
`19960627153454.0
`960627s1996 paua
`##$a468549-01cmu_inst
`##$a(OCoLC)468549 $9ExL
`##$a(Sirsi) 034999141
`##$aPMC $cPMC
`##$aPMCC
`##$a510.7808 $bC28r
`1#$aRavishankar, Mosur.
`10$aEfficient algorithms for speech recognition / $cMosur K. Ravishankar.
`##$aPittsburgh, Pa. : $bSchool of Computer Science, Carnegie Mellon University, $cc1996.
`##$axii, 132 p. : $bill. ; $c28 cm.
`1#$a[Research paper] / Carnegie Mellon University, School of Computer Science, $vCMU-CS-96-143
`##$a"May 15, 1996."
`##$aThesis (Ph. D.)--Carnegie Mellon University, 1996.
`##$aIncludes bibliographical references.
`##$aSupported in part by the Department of the Navy, Naval Research Laboratory. $cN00014-93-1-2005
`##$a9
`#0$aAlgorithms.
`#0$aAutomatic speech recognition.
`#0$aReal-time data processing.
`#0$aResearch paper (Carnegie Mellon University. School of Computer Science) ; $vCMU-CS-96-143
`##$aocm34999141
`##$a468549
`##$a034999141
`##$ajh/mm 6-24-96
`##$a19960627
`##$a19960627
`##$aCATALOGER
`##$a20070726
`##$aBATCH
`##$v96-143 $c2 $he&s-tech rept $i38482006838482
`##$v96-143 $c3 $he&s-tech rept $i38482006838490
`##$a510.7808 C28R 96-143 $wDEWEY $c2 $i38482006838482 $d8/5/2002 $e9/25/2000 $lBY-REQUEST $mOFFSITE $n13 $q2 $rY $sY $tTECH-RPT $u6/27/1996
`##$a510.7808 C28R 96-143 $wDEWEY $c3 $i38482006838490 $d6/18/2002 $e6/18/2002 $lBY-REQUEST $mOFFSITE $n4 $rY $sY $tTECH-RPT $u6/27/1996 FORM=MARC
`
`
`
`screenshot-cmu.primo.exlibrisgroup.com-2022.09.02-14_19_27
`https://cmu.primo.exlibrisgroup.com/discovery/fulldisplay?context=L&vid=01CMU_INST:01CMU&search_scope=MyInst_and_CI&tab=Everything&docid=alma991002439939704436
`02.09.2022
`
`Carnegie Mellon University
`
`DISSERTATION
`Efficient algorithms for speech recognition
`Ravishankar, Mosur.
`1996
`Available at Offsite Repository BY-REQUEST (510.7808 C28R 96-143)
`
`Find in Library
`LOCATION ITEMS
`Offsite Repository
`Available, BY-REQUEST; 510.7808 C28R 96-143
`(2 copies, 1 available, 0 requests)
`In transit until 09/03/2022
`Item in place
`
`Details
`Title
`Efficient algorithms for speech recognition
`Creator
`Ravishankar, Mosur.
`Dissertation
`Thesis (Ph. D.)--Carnegie Mellon University, 1996.
`Subject
`Algorithms
`Automatic speech recognition
`Real-time data processing
`Series
`[Research paper] / Carnegie Mellon University, School of Computer Science, CMU-CS-96-143
`Research paper (Carnegie Mellon University. School of Computer Science) ; CMU-CS-96-143.
`Publisher
`Pittsburgh, Pa. : School of Computer Science, Carnegie Mellon University
`Creation Date
`1996
`Format
`xii, 132 p. : ill. ; 28 cm.
`"May 15, 1996."
`Source
`Library Catalog
`
`
`
`
`
`
`
`Computer Science
`
`
`
`
`
`Carnegie Mellon
`
`
`
`
`
`
`University Libraries
`Carnegie Mellon University
`Pittsburgh PA 15213-389
`
`
`
`
`Efficient Algorithms for Speech Recognition
`
`Mosur K. Ravishankar
`
`May 15, 1996
`CMU-CS-96-143
`
`School of Computer Science
`Computer Science Division
`Carnegie Mellon University
`Pittsburgh, PA 15213
`
`Submitted in partial fulfillment of the requirements
`for the degree of Doctor of Philosophy.
`
`Thesis Committee:
`
`Roberto Bisiani, co-chair (University of Milan)
`Raj Reddy, co-chair
`Alexander Rudnicky
`Richard Stern
`Wayne Ward
`
`© 1996 Mosur K. Ravishankar
`
`This research was supported by the Department of the Navy, Naval Research Laboratory under
`Grant No. N00014-93-1-2005. The views and conclusions contained in this document are those of
`the author and should not be interpreted as representing the official policies, either expressed or
`implied, of the U.S. government.
`
`
`
`
`
`
`School of Computer Science
`
`DOCTORAL THESIS
`in the field of
`Computer Science
`
`Efficient Algorithms for Speech Recognition
`
`MOSUR K. RAVISHANKAR
`
`Submitted in Partial Fulfillment of the Requirements
`for the Degree of Doctor of Philosophy
`
`ACCEPTED:
`
`[signature] [date partly illegible]-30-96
`THESIS COMMITTEE CHAIR                DATE
`
`[signature]
`DEPARTMENT HEAD                DATE
`
`APPROVED:
`
`[signature] [date illegible]
`
`
`
`
`Abstract
`
`Advances in speech technology and computing power have created a surge of
`interest in the practical application of speech recognition. However, the most accurate
`speech recognition systems in the research world are still far too slow and expensive to
`be used in practical, large vocabulary continuous speech applications. Their main goal
`has been recognition accuracy, with emphasis on acoustic and language modelling.
`But practical speech recognition also requires the computation to be carried out in
`real time within the limited resources—CPU power and memory size—of commonly
`available computers. There has been relatively little work in this direction while
`preserving the accuracy of research systems.
`
`In this thesis, we focus on efficient and accurate speech recognition. It is easy to
`improve recognition speed and reduce memory requirements by trading away accuracy,
`for example by greater pruning, and using simpler acoustic and language models.
`It is much harder to improve both the recognition speed and reduce main memory
`size while preserving the accuracy.
`
`This thesis presents several techniques for improving the overall performance of
`the CMU Sphinx-II system. Sphinx-II employs semi-continuous hidden Markov models
`for acoustics and trigram language models, and is one of the premier research
`systems of its kind. The techniques in this thesis are validated on several widely used
`benchmark test sets using two vocabulary sizes of about 20K and 58K words.
`
`The main contributions of this thesis are an 8-fold speedup and 4-fold memory size
`reduction over the baseline Sphinx-II system. The improvement in speed is obtained
`from the following techniques: lexical tree search, phonetic fast match heuristic, and
`global best path search of the word lattice. The gain in speed from the tree search is
`about a factor of 5. The phonetic fast match heuristic speeds up the tree search by
`another factor of 2 by finding the most likely candidate phones active at any time.
`Though the tree search incurs some loss of accuracy, it also produces compact word
`lattices with low error rate which can be rescored for accuracy. Such a rescoring is
`combined with the best path algorithm to find a globally optimum path through a
`word lattice. This recovers the original accuracy of the baseline system. The total
`recognition time is about 3 times real time for the 20K task on a 175MHz DEC Alpha
`workstation.
`
`The memory requirements of Sphinx-II are minimized by reducing the sizes of
`the acoustic and language models. The language model is maintained on disk and
`bigrams and trigrams are read in on demand. Explicit software caching mechanisms
`effectively overcome the disk access latencies. The acoustic model size is reduced by
`simply truncating precision of probability values to 8 bits. Several other engineering
`solutions, not explored in this thesis, can be applied to reduce memory requirements
`further. The memory size for the 20K task is reduced to about 30-40MB.
`
`
`
`Acknowledgements
`
`I cannot overstate the debt I owe to Roberto Bisiani and Raj Reddy. They have
`not only helped me and given me every opportunity to extend my professional career,
`but also helped me through personal difficulties as well. It is quite remarkable that I
`have landed not one but two advisors that combine integrity towards research with a
`human touch that transcends the proverbial hard-headedness of science. One cannot
`hope for better mentors than them. Alex Rudnicky, Rich Stern, and Wayne Ward,
`all have a clarity of thinking and self-expression that simply amazes me without end.
`They have given me the most insightful advice, comments, and questions that I could
`have asked for. Thank you, all.
`The CMU speech group has been a pleasure to work with. First of all, I would
`like to thank some former and current members, Mei-Yuh Hwang, Fil Alleva, Lin
`Chase, Eric Thayer, Sunil Issar, Bob Weide, and Roni Rosenfeld. They have helped
`me through the early stages of my induction into the group, and later given invaluable
`support in my work.
`I’m fortunate to have inherited the work of Mei-Yuh and Fil.
`Lin Chase has been a great friend and sounding board for ideas through these years.
`Eric has been all of that and a great officemate. I have learnt a lot from discussions
`with Paul Placeway. The rest of the speech group and the robust gang has made it a
`most lively environment to work in.
`I hope the charge continues through Sphinx-III
`and beyond.
`
`I have spent a good fraction of my life in the CMU-CS community so far. It has
`been, and still is, the greatest intellectual environment. The spirit of cooperation, and
`informality of interactions is simply unique. I would like to acknowledge the support
`of everyone I have ever come to know here, too many to name, from the Warp and
`Nectar days until now. The administrative folks have always succeeded in blunting
`the edge off a difficult day. You never know what nickname Catherine Copetas will
`christen you with next. And Sharon Burks has always put up with all my antics.
`It goes without saying that I owe everything to my parents. I have had tremendous
`support from my brothers, and some very special uncles and aunts. In particular, I
`must mention the fun I’ve had with my brother Kuts. I would also like to acknowledge
`K. Gopinath’s help during my stay in Bangalore. Finally, “BB”, who has suffered
`through my tantrums on bad days, kept me in touch with the rest of the world, has a
`most creative outlook on the commonplace, can drive me nuts some days, but when
`all is said and done, is a most relaxed and comfortable person to have around.
`Last but not least, I would like to thank Andreas Nowatzyk, Monica Lam, Duane
`Northcutt and Ray Clark. It has been my good fortune to witness and participate in
`some of Andreas’s creative work. This thesis owes a lot to his unending support and
`encouragement.
`
`
`
`Contents
`
`Abstract
`
`Acknowledgements
`
`1 Introduction
`1.1 The Modelling Problem
`1.2 The Search Problem
`1.3 Thesis Contributions
`1.3.1 Improving Speed
`1.3.2 Reducing Memory Size
`1.4 Summary and Dissertation Outline
`
`2 Background
`2.1 Acoustic Modelling
`2.1.1 Phones and Triphones
`2.1.2 HMM modelling of Phones and Triphones
`2.2 Language Modelling
`2.3 Search Algorithms
`2.3.1 Viterbi Beam Search
`2.4 Related Work
`2.4.1 Tree Structured Lexicons
`2.4.2 Memory Size and Speed Improvements in Whisper
`2.4.3 Search Pruning Using Posterior Phone Probabilities
`2.4.4 Lower Complexity Viterbi Algorithm
`2.5 Summary
`
`3 The Sphinx-II Baseline System
`3.1 Knowledge Sources
`3.1.1 Acoustic Model
`3.1.2 Pronunciation Lexicon
`3.2 Forward Beam Search
`3.2.1 Flat Lexical Structure
`3.2.2 Incorporating the Language Model
`3.2.3 Cross-Word Triphone Modeling
`3.2.4 The Forward Search
`3.3 Backward and A* Search
`3.3.1 Backward Viterbi Search
`3.3.2 A* Search
`3.4 Baseline Sphinx-II System Performance
`3.4.1 Experimentation Methodology
`3.4.2 Recognition Accuracy
`3.4.3 Search Speed
`3.4.4 Memory Usage
`3.5 Baseline System Summary
`
`4 Search Speed Optimization
`4.1 Motivation
`4.2 Lexical Tree Search
`4.2.1 Lexical Tree Construction
`4.2.2 Incorporating Language Model Probabilities
`4.2.3 Outline of Tree Search Algorithm
`4.2.4 Performance of Lexical Tree Search
`4.2.5 Lexical Tree Search Summary
`4.3 Global Best Path Search
`4.3.1 Best Path Search Algorithm
`4.3.2 Performance
`4.3.3 Best Path Search Summary
`4.4 Rescoring Tree-Search Word Lattice
`4.4.1 Motivation
`4.4.2 [illegible]
`4.4.3 Summary
`4.5 Phonetic Fast Match
`4.5.1 Motivation
`4.5.2 Details of Phonetic Fast Match
`4.5.3 Performance of Fast Match Using All Senones
`4.5.4 Performance of Fast Match Using CI Senones
`4.5.5 Phonetic Fast Match Summary
`4.6 Exploiting Concurrency
`4.6.1 Multiple Levels of Concurrency
`4.6.2 Parallelization Summary
`4.7 Summary of Search Speed Optimization
`
`5 Memory Size Reduction
`5.1 Senone Mixture Weights Compression
`5.2 Disk-Based Language Models
`5.3 Summary of Experiments on Memory Size
`
`6 Small Vocabulary Systems
`6.1 General Issues
`6.2 Performance on ATIS
`6.2.1 Baseline System Performance
`6.2.2 Performance of Lexical Tree Based System
`6.3 Small Vocabulary Systems Summary
`
`7 Conclusion
`7.1 Summary of Results
`7.2 Contributions
`7.3 Future Work on Efficient Speech Recognition
`
`Appendices
`
`A The Sphinx-II Phone Set
`
`B Statistical Significance Tests
`B.1 20K Task
`B.2 58K Task
`
`Bibliography
`
`
`
`List of Figures
`
`2.1 Viterbi Search as Dynamic Programming
`
`3.1 Sphinx-II Signal Processing Front End.
`3.2 Sphinx-II HMM Topology: 5-State Bakis Model.
`3.3 Cross-word Triphone Modelling at Word Ends in Sphinx-II.
`3.4 Word Initial Triphone HMM Modelling in Sphinx-II.
`3.5 One Frame of Forward Viterbi Beam Search in the Baseline System.
`3.6 Word Transitions in Sphinx-II Baseline System.
`3.7 Outline of A* Algorithm in Baseline System
`3.8 Language Model Structure in Baseline Sphinx-II System.
`
`4.1 Basephone Lexical Tree Example.
`4.2 Triphone Lexical Tree Example.
`4.3 Cross-Word Transitions With Flat and Tree Lexicons.
`4.4 Auxiliary Flat Lexical Structure for Bigram Transitions.
`4.5 Path Score Adjustment Factor f for Word w; Upon Its Exit.
`4.6 One Frame of Forward Viterbi Beam Search in Tree Search Algorithm.
`4.7 Word Lattice for Utterance: Take Fidelity’s case as an example.
`4.8 Word Lattice Example Represented as a DAG.
`4.9 Word Lattice DAG Example Using a Trigram Grammar.
`4.10 Suboptimal Usage of Trigrams in Sphinx-II Viterbi Search.
`4.11 Base Phones Predicted by Top Scoring Senones in Each Frame; Speech
`Fragment for Phrase THIS TREND, Pronounced DH-IX-S T-R-EH-[illegible]
`4.12 Position of Correct Phone in Ranking Created by Phonetic Fast Match.
`4.13 Lookahead Window for Smoothing the Active Phone List.
`4.14 Phonetic Fast Match Performance Using All Senones (20K Task).
`4.15 Word Error Rate vs Recognition Speed of Various Systems.
`4.16 Configuration of a Practical Speech Recognition System.
`
`
`
`
`
`
`
`List of Tables
`
`3.1 No. of Words and Sentences in Each Test Set
`
`3.2 Percentage Word Error Rate of Baseline Sphinx-II System.
`
`3.3 Overall Execution Times of Baseline Sphinx-II System (xRealTime).
`
`3.4 Baseline Sphinx-II System Forward Viterbi Search Execution Times [illegible]
`
`3.5 HMMs Evaluated Per Frame in Baseline Sphinx-II System.
`
`3.6 N-gram Transitions Per Frame in Baseline Sphinx-II System.
`
`4.1 No. of Nodes at Each Level in Tree and Flat Lexicons.
`
`4.2 Executi