`Young et al.
`
(10) Patent No.: US 7,120,582 B1
(45) Date of Patent: Oct. 10, 2006
`
`(54) EXPANDING AN EFFECTIVE VOCABULARY
`OF A SPEECH RECOGNITION SYSTEM
`
(75) Inventors: Jonathan H. Young, Newton, MA (US); Haakon L. Chevalier, Cambridge, MA (US); Laurence S. Gillick, Newton, MA (US); Toffee A. Albina, Cambridge, MA (US); Marlboro B. Moore, III, Jamaica Plain, MA (US); Paul E. Rensing, W. Newton, MA (US); Jonathan P. Yamron, Sudbury, MA (US)
(73) Assignee: Dragon Systems, Inc., Newtonville, MA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/390,370
(22) Filed: Sep. 7, 1999
(51) Int. Cl.
G10L 15/00 (2006.01)
G10L 15/06 (2006.01)
(52) U.S. Cl. 704/255; 704/243
(58) Field of Classification Search 704/243-245, 704/255-257, 4, 9-10
`See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

4,181,821 A 1/1980 Pirz et al.
4,227,176 A 10/1980 Moshier
4,481,593 A 11/1984 Bahler
4,489,435 A 12/1984 Moshier
4,783,803 A 11/1988 Baker et al.
4,805,218 A 2/1989 Bamberg et al.
4,805,219 A 2/1989 Baker et al.
4,829,576 A 5/1989 Porter
4,837,831 A 6/1989 Gillick et al.
`
`(Continued)
FOREIGN PATENT DOCUMENTS

DE 195 10 083 A 9/1996

(Continued)

OTHER PUBLICATIONS

SYSTRAN® Personal for Windows 95 or NT (Version 1.0.2), http://www.systransoft.com/personal.html, pp. 1-2, May 6, 1998.
(Continued)
Primary Examiner: Angela Armstrong
(74) Attorney, Agent, or Firm: Fish & Richardson P.C.
`(57)
`ABSTRACT
`
`The invention provides techniques for creating and using
`fragmented word models to increase the effective size of an
`active vocabulary of a speech recognition system. The active
`Vocabulary represents all words and word fragments that the
`speech recognition system is able to recognize. Each word
`may be represented by a combination of acoustic models. As
`Such, the active vocabulary represents the combinations of
`acoustic models that the speech recognition system may
`compare to a user's speech to identify acoustic models that
`best match the user's speech. The effective size of the active
`Vocabulary may be increased by dividing words into con
`stituent components or fragments (for example, prefixes,
`Suffixes, separators, infixes, and roots) and including each
`component as a separate entry in the active vocabulary.
`Thus, for example, a list of words and their plural forms (for
`example, "book, books, cook, cooks, hook, hooks, look and
`looks') may be represented in the active vocabulary using
`the words (for example, “book, cook, hook and look”) and
`an entry representing the Suffix that makes the words plural
`(for example, "+s', where the "+” preceding the “s' indi
`cates that “+s' is a suffix). For a large list of words, and
`ignoring the entry associated with the Suffix, this technique
`may reduce the number of vocabulary entries needed to
`represent the list of words considerably.
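The arithmetic behind this saving can be checked with a short Python sketch. It is illustrative only: it assumes that a plural is formed by appending "s" with no spelling change and that a suffix entry carries a leading "+", as in the example above. The helper name `fragment_vocabulary` is hypothetical.

```python
def fragment_vocabulary(words):
    """Hypothetical helper: represent each listed word that is another listed
    word plus a final "s" by that root and one shared "+s" suffix entry."""
    listed = set(words)
    entries = set()
    for word in words:
        root = word[:-1]
        if word.endswith("s") and root in listed:
            entries.update([root, "+s"])  # plural: root plus shared suffix
        else:
            entries.add(word)             # kept as a whole-word entry
    return sorted(entries)


words = ["book", "books", "cook", "cooks", "hook", "hooks", "look", "looks"]
entries = fragment_vocabulary(words)
print(entries)                         # ['+s', 'book', 'cook', 'hook', 'look']
print(len(words), "->", len(entries))  # 8 -> 5
```

In general, N word/plural pairs (2N words) need N + 1 entries under this scheme, so the saving approaches one half as the list grows.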
`
`43 Claims, 31 Drawing Sheets
`
`1605
`
`1700
`Postulate fragments
`
`1705
`
`Split Words
`
`1707
`ldentify and Keep
`Best Fragments
`
`1710
`Postulate Fragments
`for Umsplit Words
`1715
`
`Split Words
`
`720
`
`Keep fragments
`up to threshold
`
`1750
`Split Backup Dictionary Words
`
`1745
`
`Create Stop list
`
`1740
`Using Short roots, Split insplit
`Backup Dictionary Words
`
`1730
`Make Threshold
`1725
`33d More Difficult
`to Satisfy
`Yes
`1735
`Excluding Short Roots, Split
`Backup Dictionary Words
`
`IPR2023-00037
`Apple EX1015 Page 1
`
`
`
`US 7,120,582 B1
`Page 2
`
`U.S. PATENT DOCUMENTS
`
4,903,305 A 2/1990 Gillick et al.
5,027,406 A 6/1991 Roberts et al.
5,202,952 A 4/1993 Gillick et al.
5,233,681 A 8/1993 Bahl et al.
5,267,345 A 11/1993 Brown et al.
5,428,707 A 6/1995 Gould et al.
5,526,463 A 6/1996 Gillick et al.
5,680,511 A 10/1997 Baker et al.
5,754,972 A 5/1998 Baker et al.
5,765,132 A * 6/1998 Roberts ........ 704/254
5,797,122 A 8/1998 Spies
5,835,888 A * 11/1998 Kanevsky et al. ........ 704/9
6,092,044 A * 7/2000 Baker et al. ........ 704/254
6,212,498 B1 * 4/2001 Sherwood et al. ........ 704/244
FOREIGN PATENT DOCUMENTS

EP 0 982 712 A2 3/2000
EP 0 992 979 A2 4/2000
`
OTHER PUBLICATIONS
SYSTRAN® PROfessional for Windows (Version 2.0), http://www.systransoft.com/pro.html, pp. 1-4, Nov. 13, 1997.
SYSTRAN® Classic for Windows (Version 1.6.2), http://www.systransoft.com/clas.html, pp. 1-3, Nov. 12, 1997.
SYSTRAN® PROfessional Client/Server (Version 1.6.2), http://www.systransoft.com/cliser.html, pp. 1-3, Nov. 12, 1997.
`SYSTRAN's MT Architecture, http://www.systransoft.com/how
`works.html, pp. 1-3, Nov. 19, 1997.
Langenscheidt's T1, "The translator for translators", http://www.gmsmuc.de/english/tl.html, pp. 1-2, 1997.
Langenscheidt's T1 Plus, http://www.gmsmuc.de/english/tlplus.html, p. 1, 1997.
Langenscheidt's T1, "Professional - Setting New Standards in Machine Translation", http://www.gmsmuc.de/english/tlprofi.html, pp. 1-2, 1997.
Langenscheidt's T1 Translation Memory, http://www.gmsmuc.de/english/memory.html, p. 1, 1997.
Langenscheidt's T1 Hotline Support, http://www.gmsmuc.de/english/hotline.html, pp. 1-2, 1997.
GLOBALINK® Power Translator 6.0, http://www.globalink.com/pages/product-pwtrans6.html, p. 1.
Comprende - Real Time Internet Translation Services, http://www.globalink.com/pages/product-comprende.html, p. 1.
Intranet Translator RealTime Intranet Translation Services, http://www.globalink.com/pages/product-intranet-translator.html, p. 1.
`Web Translator, http://www.globalink.com/pages/product-web
`translator.html, pp. 1-2.
`
Talk to Me, http://www.globalink.com/pages/product-talktome.html, p. 1.
Language Assistance Series, http://www.globalink.com/pages/product-language-assistant.html, p. 1.
Subject Dictionaries, http://www.globalink.com/pages/product-Subject-dictionaries.html, p. 1.
E-mail Translator Plug-In for Eudora, http://www.globalink.com/pages/product-plugins.html, p. 1.
Frisch et al., "Spelling Assistance for Compound Words", IBM Journal of Research & Development, vol. 32, No. 2, Mar. 1, 1988, pp. 197-198.
Marcus Spies, "A Language Model for Compound Words in Speech Recognition", European Conference on Speech Communication and Technology, Sep. 1995, pp. 1767-1770.
Bandara et al., "Handling German Compound Words in an Isolated Word Speech Recognizer", IEEE Workshop on Speech Recognition, Harriman, NY, Dec. 15-18, 1992, pp. 1-3.
Steeneken et al., "Multi-Lingual Assessment of Independent Large Vocabulary Speech-Recognition Systems: The Sqale-Project", European Conference on Speech Communication and Technology, Sep. 1995, pp. 1271-1274.
Dugast et al., "The Philips Large-Vocabulary Recognition System for American English, French and German", European Conference on Speech Communication and Technology, Sep. 1995, pp. 197-200.
Pye et al., "Large Vocabulary Multilingual Speech Recognition Using HTK", European Conference on Speech Communication and Technology, Sep. 1995, pp. 181-184.
Lamel et al., "Issues in Large Vocabulary, Multilingual Speech Recognition", European Conference on Speech Communication and Technology, Sep. 1995, pp. 185-188.
Barnett et al., "Comparative Performance in Large-Vocabulary Isolated-Word Recognition in Five European Languages", European Conference on Speech Communication and Technology, Sep. 1995, pp. 189-192.
Geutner, P., "Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems", Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445-448, May 9, 1995, XP 000658026.
Hwang, "Vocabulary Optimization Based on Perplexity", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1419-1422, Apr. 21, 1997, XP000822723.
Berton et al., "Compound Words in Large-Vocabulary German Speech Recognition Systems", Proceedings of the International Conference on Spoken Language Processing, vol. 2, pp. 1165-1168, Oct. 3, 1996, XP002142831.
Wothke, K., "Morphologically based automatic phonetic transcription", IBM Systems Journal, vol. 32(3), pp. 486-511, 1993.
`* cited by examiner
`
`
`
`
`U.S. Patent
`
`Oct. 10, 2006
`
`Sheet 1 of 31
`
`US 7,120,582 B1
`
[FIG. 1 (Prior Art): block diagram of speech recognition system 100, including a computer, a display, and a memory containing an operating system and speech recognition software.]
`
`
[FIG. 2: block diagram of the speech recognition software; the drawing text is printed inverted and largely illegible in the source scan. Recoverable labels include Vocabulary, Acoustic Models, Constraint Grammars, Backup Dictionary, Recognizer, Pre-Filtering Procedure, Control/Interface Module, Recognition Candidates, Digital Frames, and Samples/Front-End Parameters.]
`
`
`300
`
`305
`
`31 O
`
`Produce X(f)
`
`Determine
`log (X(f))2
`
`315
`Frequency Warping
`
`320
`Filter Bank Analysis
`
`3 25
`Cepstral Analysis
`
`3 30
`Channel Normalization
`
`3 35
`Produce Cepstral
`Differences
`
`3 40
`Produce Cepstral
`Second Differences
`
`3 45
`
`MELDA
`
`FIG. 3
`(Prior Art)
`
`
`
`
`
[Sheet 4: drawing largely illegible in the source scan; the only legible label is the word "select" (405).]
`
`
`
`
`FIG. 5
`(Prior Art)
`
`
[FIG. 6 (Prior Art): lexical tree 600 for words including hear, heal, heel, hum, hug, heals, heels, healing, and humming.]
`
`
`700
`
`Get Next Frame
`
`Find an Active Node with no
`Active, Unprocessed Subnodes
`
`710
`
`
`
`715
`
`
`
`At
`Highest Node
`2
`
`
`
`Yes
`
`725
`
`730
`
`Go to Next Node with no
`Unprocessed Subnodes
`
`735
`More
`Y
`Words Fossible)
`
`
`
`
`
`740
`
`No
`
`Return List
`
`FIG. 7
`(Prior Art)
`
`
`800
`
`"
`
`80s.'
`
`830-
`
`512-N
`845
`()
`so.5 865
`
`FIG. 8B
`(Prior Art)
`
`"
`
`a15 825
`
`"
`
`815
`
`825-
`
`FIG. 8A
`(Prior Art)
`
`505
`860,
`
`as
`
`-
`
`FIG. 8C
`(Prior Art)
`
`
`1100
`
`1105
`
`1110
`
`1115
`
`
`
`1125
`
`Update
`SCOres/Times
`
`Score(s)
`D
`Threshold
`
`No
`
`Deactivate
`States/Node
`
`WOrd
`to be Added
`?
`
`
`
`Add Words to List
`
`1130
`
`Save SCOre for
`Reseeding
`
`FIG. 11
`(Prior Art)
`
`
`1200-
`
`1205
`
`
`
`1210
`
`Initialize Lexical Tree
`
`Retrieve Frame
`
`1215
`No
`Hypothesiso Consider
`
`Y eS
`Go to First Hypothesis
`
`Compare Frame to Hypothesis
`
`Update Score
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`1235
`
`1245
`
`Delete Hypothesis
`
`Set Word Ending Flag
`
`1255
`Additional ilypothesis
`
`1260
`
`Next Hypothesis
`
`1265
`
`1270
`
`
`
`
`
`
`
`
`
`1275
`1280
`
`Request Pre-filtering List
`
`Create/Expand Hypotheses
`
`1285
`
`Return Recognition Candidates
`
`
`
`No
`
`
`
`
`
`FIG. 12
`(Prior Art)
`
`
[Sheet 13: plots comparing French vocabularies before and after fragmentation (words, fragments, roots, prefixes, suffixes; active vocabulary and backup dictionary), on a word axis running from 0 to 250,000; figure text otherwise illegible in the source scan.]
`
`
[Sheet 14: corresponding plots for U.S. English vocabularies before and after fragmentation, on a word axis running from 0 to 250,000; figure text otherwise illegible in the source scan.]
`
`
[FIG. 16B: flowchart: generate new active vocabulary and backup dictionary; perform speech recognition on an utterance; perform post-recognition processing on the utterance.]
`
`
`1605
`
`1700
`Postulate Fragments
`
`1750
`Split Backup Dictionary Words
`
`11 77 OO
`
`Split Words
`
`Identify and Keep
`Best Fragments
`
`1 7 1 O
`Postulate Fragments
`for Unsplit Words
`
`1715
`
`1720
`
`Split Words
`
`Keep Fragments
`up to Threshold
`
`
`
`1745
`
`Create Stop List
`
`1740
`Using Short Roots, Split Unsplit
`Backup Dictionary Words
`
`1725
`Make Threshold
`Syd More Difficult
`to Satisfy
`
`Yes
`1735
`Excluding Short Roots, Split
`Backup Dictionary Words
`
`FIG. 17
`
`
`
`
`1800
`
`Postulate Affixes
`
`1805
`Find Word-ROOtS and
`True-ROOtS
`
`
`
`1810
`POStulate Affixes With ACtive
`Vocabulary Supplemented
`by Roots
`
`
`
`Find Useful Spelling Rules
`
`
`
`
`
`
`
`Group Spelling Rules
`to Form Affixes
`1910 N.
`Keep Most Useful Spelling
`Rules for Each Affix
`
`Keep Most Useful Affixes
`
`FIG. 18
`
`FIG. 19
`
`
`1900
`
`2000
`Read Backup Dictionary
`into Data Structure
`
`2005
`Index Words in Active
`Vocabulary Using j
`and Set j=0
`
`2010
`
`End
`
`2040
`
`Retrieve Active Wordj
`
`Output Spelling Rules
`
`
`
`2015
`For Active Word j,
`Search for Similar Words
`in Backup Dictionary
`
`
`
`Increment j
`
`Yes
`
`2030
`
`No
`
`More
`Active
`Words
`2
`
`2025
`Store All Spelling
`Rules
`
`FIG. 20
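The loop in FIG. 20 can be sketched in Python as follows. The figure does not define "similar words" or the form of a spelling rule, so this sketch assumes that a backup dictionary word is similar to an active word when the two share at least `min_stem` leading letters and at most `max_strip` trailing letters of the active word differ, and it records each difference as a hypothetical (strip, add) pair such as ("y", "ies"). Only suffix-type rules are handled, and all names are illustrative.

```python
from collections import Counter


def common_prefix_length(a, b):
    """Number of leading letters shared by strings a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def find_spelling_rules(active_words, backup_words, max_strip=2, min_stem=2):
    """Hypothetical sketch of FIG. 20: compare each active word j with the
    backup dictionary and count (strip, add) suffix spelling rules."""
    rules = Counter()
    for word in active_words:                  # retrieve active word j
        for other in backup_words:             # search for similar words
            if other == word:
                continue
            n = common_prefix_length(word, other)
            if n >= min_stem and len(word) - n <= max_strip:
                rules[(word[n:], other[n:])] += 1  # store the spelling rule
    return rules                               # output spelling rules


rules = find_spelling_rules(
    ["book", "try", "look"],
    ["books", "tries", "looks", "booking", "tried"],
)
print(rules.most_common())
```

Rules observed for many word pairs, such as ("", "s") here, are natural candidates to keep when rules are later grouped into affixes.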
`
`
`2100
`Find POSSible ROOtS Based On
`Junction with Affixes to Form
`Backup Dictionary Words
`
`2105
`
`Retrieve ROOt
`
`
`
`2115
`Call Root
`WOrd ROOt
`
`
`
`2110
`Yes
`
`ROOt
`Close to Existing
`Word
`
`Call ROOt
`True ROOt
`
`2125
`
`2135
`
`
`
`More
`Ropts
`
`NO
`
`Keep Most Useful
`New ROOtS
`
`End
`
`Retrieve
`Next ROOt
`
`
`2200
`Read Backup Dictionary and
`Affixes into Data Structure
`
`
`
`2205
`Retrieve Word in Backup
`Dictionary
`
`2210
`
`Retrieve Affix j
`
`2215
`Root ij= Word i-Affix.j
`
`2220
`
`Store Root ij
`
`
`
`
`
`Retrieve Next
`Affix j
`
`
`
`2240
`Retrieve Next
`Word
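FIGS. 21 and 22 describe forming candidate roots by removing an affix from each backup dictionary word and then labeling each root. A minimal Python sketch follows. It handles suffixes only, strips them by spelling without applying spelling rules, and replaces the "close to an existing word" test of FIG. 21 with an exact-match test; all names are hypothetical.

```python
def candidate_roots(backup_words, suffixes):
    """Hypothetical sketch of FIG. 22: for each backup word i and affix j,
    form root ij = word i - affix j and store where it came from."""
    roots = {}
    for word in backup_words:                 # retrieve word i
        for suffix in suffixes:               # retrieve affix j
            bare = suffix.lstrip("+")         # "+ing" -> "ing"
            if bare and len(word) > len(bare) and word.endswith(bare):
                root = word[: -len(bare)]     # root ij = word i - affix j
                roots.setdefault(root, []).append((word, suffix))
    return roots


def classify_roots(roots, vocabulary):
    """Label each root as a word-root if it is already a word (an exact-match
    stand-in for the "close to existing word" test), else a true-root."""
    return {root: "word-root" if root in vocabulary else "true-root"
            for root in roots}


roots = candidate_roots(["walking", "walked", "running"], ["+ing", "+ed"])
print(classify_roots(roots, {"walk", "run"}))
# {'walk': 'word-root', 'runn': 'true-root'}
```

The non-word root "runn" illustrates the kind of fragment that, once placed in the active vocabulary, lets "running" be recognized without a separate entry.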
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
[FIG. 23: flowchart: find word-roots and true-roots; postulate affixes for unsplit words, for all words in the active vocabulary plus the latest set of roots (2305).]
[FIG. 24: flowchart: do splitting; count uses of each fragment (2410); keep most useful fragments (2415).]
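Steps 2410 and 2415 can be sketched as follows. The sketch assumes that a fragment's usefulness is simply the number of words whose split uses it; the flowchart names the steps but not the measure, so this is a simplification. Names are hypothetical.

```python
from collections import Counter


def keep_most_useful(splits, limit):
    """Hypothetical sketch of FIG. 24: count how many word splits use each
    fragment (2410) and keep the `limit` most used fragments (2415)."""
    uses = Counter()
    for fragments in splits.values():          # splits: word -> fragments
        uses.update(list(dict.fromkeys(fragments)))  # count once per word
    return [fragment for fragment, _ in uses.most_common(limit)]


splits = {
    "books": ["book", "+s"],
    "cooks": ["cook", "+s"],
    "booking": ["book", "+ing"],
    "booked": ["book", "+ed"],
}
print(keep_most_useful(splits, 2))  # ['book', '+s']
```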
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`2500
`
`Determine Possible Splits
`
`2505
`Load All Words and Word
`Fragments into Data Structure
`2510
`
`Set Word Index KEO
`
`2515
`
`For Wordk, Find
`Partial PrOn Matches
`
`Split Cover
`Entire Word
`p
`
`
`
`Split
`Valid
`
`
`
`
`
`2535
`Increment Yes
`K
`
`
`
`
`
`S
`Matching
`Complete
`?
`
`
`
`Output
`Splits
`
`FIG. 25
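The coverage test in FIG. 25 can be illustrated with a small recursive search. The flowchart matches pronunciations; this sketch substitutes spellings so that it stays self-contained, treats a leading or trailing "+" as notation only, and leaves the separate validity test of each split to a later step. `possible_splits` is a hypothetical name.

```python
def possible_splits(word, fragments):
    """Hypothetical sketch of FIG. 25 using spellings in place of
    pronunciations: every sequence of inventory fragments whose spellings,
    joined in order, cover the entire word."""
    spelled = {fragment: fragment.strip("+") for fragment in fragments}
    results = []

    def extend(position, chosen):
        if position == len(word):            # split covers the entire word
            results.append(chosen)
            return
        for fragment, text in spelled.items():
            if text and word.startswith(text, position):  # partial match
                extend(position + len(text), chosen + [fragment])

    extend(0, [])
    return results


print(possible_splits("books", ["book", "books", "+s", "boo", "+ks"]))
# [['book', '+s'], ['books'], ['boo', '+ks']]
```

The search is exhaustive, which is adequate for single words; a pronunciation-based version would compare phoneme sequences instead of letters.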
`
`
[FIG. 27: flowchart: begin; steps 2705 and 2710, whose labels are partly illegible, concern a limit used to build the active vocabulary; split words in active vocabulary (2715); build new active vocabulary (2720); not using short roots, split words in backup vocabulary (2725); using short roots, split words in backup dictionary (2730); make list of unused fragments (2735); end.]
`
`
[FIG. 28A: generate language model scores for words and word fragments in active vocabulary.]
[FIG. 28B: flowchart: retrieve training collection of text; build n-gram language model (2830); for each n-gram sequence, replace splittable backup words with corresponding words and word fragments (2835); generate n-gram language model for words and word fragments (2840).]
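Steps 2835 and 2840 can be sketched for bigrams as follows. As a simplification, the sketch splits the training text before counting instead of rewriting an already-built n-gram model, and it returns raw counts rather than probabilities. The `splits` table and the function name are hypothetical.

```python
from collections import Counter


def fragment_bigrams(sentences, splits):
    """Hypothetical sketch of FIG. 28B (2835-2840) for bigrams: replace each
    splittable backup word by its words and word fragments, then count
    adjacent token pairs."""
    counts = Counter()
    for sentence in sentences:
        tokens = []
        for word in sentence.split():
            tokens.extend(splits.get(word, [word]))  # unsplittable words stay whole
        counts.update(zip(tokens, tokens[1:]))
    return counts


splits = {"books": ["book", "+s"], "cooks": ["cook", "+s"]}
counts = fragment_bigrams(["she books a cook", "he cooks books"], splits)
print(counts[("book", "+s")])  # 2
```

Counts such as ("book", "+s") pool evidence from every plural of "book", which is what lets a fragment-based model assign scores to backup dictionary words that never appear in the active vocabulary.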
`
`
`
`
`
`
[FIG. 28C: example of n-gram counts derived from a training collection of text for a language model; the figure text is printed inverted and largely illegible in the source scan.]
`
`
`
[Sheet 27 (caption not legible): bigram examples built from phrases such as "with urgently go where" and "to quickly go over", shown both with whole words and with the fragments "urgent +ly" and "quick +ly"; counts illegible in the source scan.]
`
`
`2880
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`5
`
`;
`
`;
`
`FIG. 28F
`
`
`1660
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Receive Recognition
`Candidates
`2900
`Select Next
`Candidate
`
`includes
`Fragments
`
`Yes
`
`No
`2910-
`Add Candidate
`to Revised List
`
`
`
`2915
`Process Word
`Fragments
`
`NeW
`Candidate(s)
`
`2925
`
`ResCOre
`Candidate(s)
`
`Add ResCored
`Candidate(s) to
`Revised List
`
`More
`Candidates
`
`
`
`FIG. 29
`
`
`
`
`
`
`2915
`
`-3000
`Retrieve Next Sequence
`of Fragments
`
`3010
`Get First Spelling
`Rule Set
`
`Z-3015
`Form Prospective Word
`
`
`
`invalid
`Sequence
`
`Yes
`
`Backup
`Dictionary
`2
`
`NO
`
`Yes
`Generate Candidate
`Using Prospective Word
`
`Active
`Vocabulary
`
`
`
`
`
`
`
`
`
`
`
`
`
`Process Word Yes
`Fragments
`
`
`
`More
`Fragments in
`Generated
`Candidate
`
`
`
`
`
`
`
`
`
`
`
`
`
`Spelling
`Rules
`
`No
`
`Yes
`3040
`Get Next Spelling
`Rule Set
`
`FIG. 30
`
`3045
`
`
`
`
`
`More
`Sequences
`p
`
`Yes
`
`No End
`
`EXPANDING AN EFFECTIVE VOCABULARY
`OF A SPEECH RECOGNITION SYSTEM
`
`BACKGROUND
`
`2
`an available amount of memory is limited. Since the recog
`nizer does not recognize words that are not included in the
`active vocabulary, the ability of the recognizer to recognize
`less-frequently-used words may be improved by increasing
`the size of the active vocabulary.
`The effective size of the active vocabulary may be
`increased by dividing words into constituent components or
`fragments (for example, prefixes, Suffixes, separators,
`infixes, and roots) and including each component as a
`separate entry in the active vocabulary. Thus, for example,
`a list of words and their plural forms (for example, “book,
`books, cook, cooks, hook, hooks, look and looks') may be
`represented in the active vocabulary using the words (for
`example, “book, cook, hook and look”) and an entry repre
`senting the Suffix that makes the words plural (for example,
`+s', where the "+
`& G
`“+” preceding the
`indicates that "+s” is
`a Suffix). For a large list of words, and ignoring the entry
`associated with the Suffix, this technique may reduce the
`number of vocabulary entries needed to represent the list of
`words considerably.
`The invention provides a method of expanding an effec
`tive active vocabulary of a speech recognition system that
`uses a speech recognizer. The speech recognizer perform
`speech recognition on a user utterance to produce one or
`more recognition candidates. Speech recognition includes
`comparing digital values representative of the user utterance
`to a set of acoustic models representative of an active
`vocabulary of the system. The set of acoustic models
`includes models of words and models of word fragments.
`The method further includes receiving the recognition
`candidates from the speech recognizer. When a received
`recognition candidate includes a word fragment, the method
`includes determining whether the word fragment may be
`combined with one or more adjacent word fragments or
`words to form a proposed word included in a backup
`dictionary of the speech recognition system.
`Furthermore, if the word fragment may be combined with
`one or more adjacent word fragments or words to form a
`proposed word included in a backup dictionary of the speech
`recognition system, the method includes modifying the
`recognition candidate to Substitute the proposed word for the
`word fragment and the one or more adjacent word fragments
`or words used to form the proposed word.
`Moreover, if the word fragment may not be combined
`with one or more adjacent word fragments or words to form
`a proposed word included in a backup dictionary of the
`speech recognition system, the method includes discarding
`the recognition candidate.
`Embodiments may include one or more of the following
`features. For example, the expanded effective vocabulary
`may include words from the backup dictionary that are
`formed from a combination of words and word fragments or
`word fragments and word fragments from an active Vocabu
`lary that includes words and word fragments, and words
`from the active vocabulary.
`The word fragments may include Suffixes, prefixes, and
`roots that are not words. Additionally, one or more spelling
`rules may be associated with each prefix and each suffix.
`Determining whether the word fragment may be combined
`with one or more adjacent word fragments or words to form
`a proposed word may include using a prefix or Suffix as the
`particular word fragment and using an associated spelling
`rule in forming the proposed word. As a result of using the
`associated spelling rule, a spelling of the proposed word may
`differ from a spelling that would result from merely con
`catenating the particular word fragment with the one or more
`adjacent word fragments or words.
`
`10
`
`15
`
`The invention relates to expanding an effective Vocabu
`lary of a speech recognition system.
`A speech recognition system analyzes a user's speech to
`determine what the user said. Most speech recognition
`systems are frame-based. In a frame-based system, a pro
`cessor divides a signal descriptive of the speech to be
`recognized into a series of digital frames, each of which
`corresponds to a small time increment of the speech.
`A speech recognition system may be a "discrete” system
`that recognizes discrete words or phrases but which requires
`the user to pause briefly between each discrete word or
`phrase. Alternatively, a speech recognition system may be a
`“continuous system that can recognize spoken words or
`phrases regardless of whether the user pauses between them.
`Continuous speech recognition systems typically have a
`higher incidence of recognition errors in comparison to
`discrete recognition systems due to complexities of recog
`nizing continuous speech. A detailed description of continu
`ous speech recognition is provided in U.S. Pat. No. 5,202,
`952, entitled “LARGE-VOCABULARY CONTINUOUS
`25
`SPEECH PREFILTERING AND PROCESSING SYS
`TEM, which is incorporated by reference.
`In general, the processor of a continuous speech recog
`nition system analyzes “utterances” of speech. An utterance
`includes a variable number of frames and corresponds, for
`example, to a period of speech followed by a pause of at
`least a predetermined duration.
`The processor determines what the user said by finding
`acoustic models that best match the digital frames of an
`utterance, and by identifying text that corresponds to those
`acoustic models. An acoustic model may correspond to a
`word, phrase or command from a vocabulary. An acoustic
`model also may represent a sound, or phoneme, that corre
`sponds to a portion of a word. Collectively, the constituent
`phonemes for a word represent the phonetic spelling of the
`word. Acoustic models also may represent silence and
`various types of environmental noise. In general, the pro
`cessor may identify text that corresponds to the best-match
`ing acoustic models by reference to phonetic word models in
`an active Vocabulary of words and phrases.
`The words or phrases corresponding to the best matching
`acoustic models are referred to as recognition candidates.
`The processor may produce a single recognition candidate
`for an utterance, or may produce a list of recognition
`candidates.
`
`30
`
`35
`
`40
`
`45
`
`50
`
`SUMMARY
`
`The invention provides techniques for creating and using
`fragmented word models to increase the effective size of an
`active vocabulary of a speech recognition system. The active
`Vocabulary represents all words and word fragments that the
`speech recognition system is able to recognize. Each word
`may be represented by a combination of acoustic models. As
`Such, the active vocabulary represents the combinations of
`acoustic models that the speech recognition system may
`compare to a user's speech to identify acoustic models that
`best match the user's speech.
`Memory and processing speed requirements tend to
`increase with the number of entries in the active vocabulary.
`As such, the size of the active vocabulary that may be
`processed in an allotted time by a particular processor using
`
`55
`
`60
`
`65
`
Determining whether the word fragment may be combined with one or more adjacent word fragments or words may include retrieving from the received recognition candidate a sequence that includes the particular word fragment and adjacent word fragments or words. Determining may further include determining whether the sequence is a valid sequence.
A valid sequence may include only one or more allowed adjacent combinations of word fragments and words. Moreover, allowed adjacent combinations may include one or more prefixes, followed by a root or a word, followed by one or more suffixes. Other allowed adjacent combinations may include a root or a word followed by one or more suffixes, and one or more prefixes followed by a root or a word.
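The allowed combinations listed above amount to a small regular language over token types. A minimal sketch follows, assuming suffixes are written with a leading "+" (as in "+s"), prefixes with a trailing "+" (an assumed notation; the document shows only the suffix form), and non-word roots supplied as a set. The function names are hypothetical.

```python
import re

# P = prefix, R = non-word root, W = word, S = suffix.
# Allowed: prefixes + (root or word) + optional suffixes,
#          or (root or word) + one or more suffixes.
VALID = re.compile(r"P+[RW]S*|[RW]S+")


def token_type(token, roots):
    """Classify a recognized token (the prefix notation is an assumption)."""
    if token.startswith("+"):
        return "S"          # suffix, e.g. "+s"
    if token.endswith("+"):
        return "P"          # prefix, e.g. "re+" (assumed notation)
    if token in roots:
        return "R"          # root that is not itself a word
    return "W"              # ordinary word


def is_valid_sequence(tokens, roots):
    """True if the tokens form one of the allowed adjacent combinations."""
    pattern = "".join(token_type(t, roots) for t in tokens)
    return VALID.fullmatch(pattern) is not None


print(is_valid_sequence(["re+", "book", "+s"], set()))  # True
print(is_valid_sequence(["+s", "book"], set()))         # False
```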
The method may further include combining the particular word fragment with the one or more adjacent word fragments or words to form a second proposed word that differs from the first proposed word by using a second associated spelling rule in forming the second proposed word.

One or more spelling rules may be associated with a particular word fragment. Combining the particular word fragment with one or more adjacent word fragments or words to form a proposed word may then include using an associated spelling rule in forming the proposed word. As a result of using the associated spelling rule, a spelling of the proposed word may differ from a spelling that would result from merely concatenating the particular word fragment with the one or more adjacent word fragments or words.
Determining whether the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system may include searching the backup dictionary for the proposed word.
Modifying the recognition candidate may include forming a prospective recognition candidate and, if the prospective recognition candidate includes an additional word fragment, forming a final recognition candidate. The prospective recognition candidate may be formed by modifying the recognition candidate to substitute the proposed word for the word fragment and the one or more adjacent word fragments or words used to form the proposed word. The prospective recognition candidate may then be further processed to generate an additional word using the additional word fragment and one or more adjacent words or word fragments. The final recognition candidate may be formed by replacing the additional word fragment and the one or more adjacent words with the additional word.
A score may be associated with the received recognition candidate, such that the method further includes producing a score associated with the modified recognition candidate by rescoring the modified recognition candidate.
The score associated with the received recognition candidate may include an acoustic component and a language model component. Rescoring the modified recognition candidate may therefore include generating a language model score for the modified recognition candidate.
Producing the score associated with the modified recognition candidate may include combining the acoustic component of the score for the received recognition candidate with the language model score generated for the modified recognition candidate.
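The document does not say how the acoustic component and the new language model score are combined. One common convention, assumed here, treats both as costs (negative log probabilities, lower is better) and adds them with a language model weight. The bigram table, the unseen-bigram penalty, the weight, and the function names below are illustrative, not values or APIs from the document.

```python
def lm_cost(words, bigram_logprob, unseen_logprob=-20.0):
    """Language model cost of a word sequence: the negated sum of bigram
    log probabilities, starting from a sentence-start token "<s>"."""
    cost = 0.0
    for pair in zip(["<s>"] + words, words):
        cost -= bigram_logprob.get(pair, unseen_logprob)
    return cost


def rescore(acoustic_cost, modified_words, bigram_logprob, lm_weight=1.0):
    """Keep the acoustic component of the received candidate's score and
    add a newly generated language model cost for the modified candidate."""
    return acoustic_cost + lm_weight * lm_cost(modified_words, bigram_logprob)


bigram_logprob = {("<s>", "tries"): -3.0, ("tries", "hard"): -1.5}
print(rescore(100.0, ["tries", "hard"], bigram_logprob))  # 104.5
```

Reusing the received candidate's acoustic component avoids repeating acoustic matching; the embodiments also allow generating a fresh acoustic score for the modified candidate instead.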
Rescoring the modified recognition candidate may include generating an acoustic model score for the modified recognition candidate. Furthermore, producing the score associated with the modified recognition candidate may include combining the acoustic model score generated for the modified recognition candidate with the language model score generated for the modified recognition candidate.
The score associated with the received recognition candidate may include an acoustic component and a language model component. Rescoring the modified recognition candidate may therefore include generating an acoustic score for the modified recognition candidate.
Producing the score associated with the modified recognition candidate may include combining the language model component of the score for the received recognition candidate with the