United States Patent [19]
Robinson

US005983180A
[11] Patent Number: 5,983,180
[45] Date of Patent: Nov. 9, 1999
[54] RECOGNITION OF SEQUENTIAL DATA USING FINITE STATE SEQUENCE MODELS ORGANIZED IN A TREE STRUCTURE

[75] Inventor: Anthony John Robinson, Cambridge, United Kingdom

[73] Assignee: SoftSound Limited, Hertfordshire, United Kingdom
Tony Robinson and James Christie, "Time-First Search for Large Vocabulary Speech Recognition," Proc. 1998 IEEE Int. Conf. Acoust., Speech, and Sig. Proc., May 12-15, 1998, vol. 2, pp. 829-832.
Aubert, X., et al., "Large Vocabulary Continuous Speech Recognition of Wall Street Journal Data", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 129-132, (1994).
[21] Appl. No.: 09/031,881
[22] Filed: Feb. 27, 1998
[30] Foreign Application Priority Data
Oct. 31, 1997 [GB] United Kingdom ................... 9723092
Jan. 13, 1998 [GB] United Kingdom ................... 9800658
[51] Int. Cl. .................................................. G10L 5/06
[52] U.S. Cl. ............................................. 704/254; 704/256
[58] Field of Search ..................................... 704/241-245, 704/254-257
[56] References Cited

U.S. PATENT DOCUMENTS

4,748,670  5/1988  Bahl et al. ..............................  704/256
5,033,087  7/1991  Bahl et al. ..............................  704/256
5,349,645  9/1994  Zhao .....................................  704/243
5,621,859  4/1997  Schwartz et al. .......................  704/256

FOREIGN PATENT DOCUMENTS

0627726   12/1994  European Pat. Off. ..........  G10L 3/00
97/08686   3/1997  WIPO ..............................  G10L 5/06
OTHER PUBLICATIONS

Douglas B. Paul, "An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model," Int. Conf. on Acoust., Speech, and Sig. Proc., 1992, ICASSP-92, 23-25 Mar. 1992, vol. 1, pp. 125-128.
Patrick Kenny, Rene Hollan, Vishwa N. Gupta, Matthew Lennig, P. Mermelstein, and Douglas O'Shaughnessy, "A*-admissible Heuristics for Rapid Lexical Access," IEEE Trans. Speech and Audio Proc., Jan. 1993, vol. 1, iss. 1, pp. 49-58.
(List continued on next page.)
Primary Examiner: David R. Hudspeth
Assistant Examiner: Donald L. Storm
Attorney, Agent, or Firm: Schwegman, Lundberg, Woessner & Kluth, P.A.

[57] ABSTRACT
In a method of automatically recognizing data which comprises sequential data units represented as sequential tokens grouped into one or more items, known items are stored as respective finite state sequence models. Each state corresponds to a token, and the models which have common prefix states are organized in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state. Each sequential data unit is compared with stored reference data units identified by reference tokens to generate scores indicating the similarity of the data units to reference data units. An accumulated score for the final state in the models is determined by steps of (a) sequentially calculating the accumulated score for a model to reach the final state comprising a leaf in the tree, (b) identifying the closest branch to the leaf corresponding to a next model for which an accumulated score for the final state has not yet been calculated, and (c) accumulating the score from the identified closest branch for the next model to the final state. These steps are repeated for the branches of the trees. The item corresponding to a model having the highest accumulated score is recognized as the model best matching the data.
48 Claims, 24 Drawing Sheets
OTHER PUBLICATIONS
Deller, Jr., J.R., et al., Discrete-Time Processing of Speech Signals, J. Griffin, ed., Macmillan Publishing Company, New York, table of contents, (1993).
Jelinek, F., "Up From Trigrams! The Struggle for Improved Language Models", Proceedings of the European Conference on Speech Technology, 1037-1040, (1991).
Katz, S.M., "Estimation of Probabilities From Sparse Data for the Language Model Component of a Speech Recognizer", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, 400-401, (Mar. 1987).
Moyano, J.H., "Memory Efficient Search for Large Vocabulary Recognition", Thesis for MPhil in Computer Speech and Language Processing, Cambridge University, (Aug. 1997).
Rabiner, L., et al., Fundamentals of Speech Recognition, PTR Prentice-Hall, Inc., publisher, Englewood Cliffs, New Jersey, table of contents, (1993).
Rabiner, L.R., "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, vol. 77, 257-286, (Feb. 1989).
Renals, S., et al., "Decoder Technology for Connectionist Large Vocabulary Speech Recognition", Cambridge University Engineering Department Technical Report CS-95-17, 1-22 and appdx., (Sep. 1995).
Robinson, T., et al., "The Use of Recurrent Neural Networks in Continuous Speech Recognition", Advanced Topics in Automatic Speech and Speaker Recognition, C-H. Lee and F. K. Soong, eds., Kluwer Academic Publishers, 159-184 (Chap. 7), (1996).
Sedgewick, R., Algorithms, Second Edition, Addison-Wesley Publishing Company, Inc., table of contents, (1988).
Woodland, P., et al., "MPhil in Computer Speech and Language Processing, Introduction to Speech Recognition", Course Notes, Cambridge University, (1995).
[Drawing Sheets 1 to 24 (U.S. Patent 5,983,180, Nov. 9, 1999). The drawings are images and do not survive this text extraction; the legible figure labels and captions are listed below.]

Sheet 2: FIG. 2, HMM states against observations O(1) to O(6); FIG. 3, HMM state against time.
Sheet 4: FIG. 5, tree of word models (legible word labels include ABBEY, ABBOT and ABANDON).
Sheets 5, 16 and 17: HMM state against time diagrams (figure numbers not legible).
Sheet 7: FIG. 8; FIG. 9, HMM state diagram.
Sheet 8: FIG. 10.
Sheet 9: phone models with self and exit probabilities (legible phone labels include sil, ax, b, ih and t).
Sheets 10 and 11: FIG. 12, "THE UK PHONE TABLE", listing each phone with its index, number of states, self probability and exit probability.
Sheet 12: FIG. 13, flowchart: initialization; point to first phoneme; process node routine; all nodes processed?; output word(s) with highest probability; point to next phoneme.
Sheets 13 and 14: FIG. 14, "THE PROCESS NODE SUBROUTINE" flowchart: point to first word; store ID of best word; pass the accumulated scores to the process node subroutine for each child node.
Sheet 15: flowchart for accumulating scores between a start time t0 and an end time t1, with local pruning and global pruning thresholds (figure number not legible).
Sheet 18: FIG. 18, flowchart: initialize heap with <START>, time = 0 and probability = 0; pop heap item from top of heap; call process tree routine; for each output word push the new language model state, probabilities, histories and time range onto the heap.
Sheet 19: FIG. 19, flowchart for merging hypotheses: where time ranges overlap, replace the probability and history for each time if the new one is better; where they are disjoint, pad the gap with a dummy and concatenate the two ranges.
Sheet 20: FIG. 20, word hypotheses from <START> against time/frames 0 to 90.
Sheet 22: FIG. 22, word hypothesis tree against time/frames 0 to 140; FIG. 23, time/frames 70 to 140.
Sheet 23: FIG. 25, context-dependent phone models (legible labels include (ae-t+k), (dh-ax+k), CAT, THE and SAT).
Sheet 24: FIG. 26A, codon (C1 to C5); FIG. 26B, gene (G1, G2); FIG. 26C, phoneme (Ph1 to Ph4); FIG. 26D, word (W1, W2, Ph1 Ph2 ... Phn).
RECOGNITION OF SEQUENTIAL DATA USING FINITE STATE SEQUENCE MODELS ORGANIZED IN A TREE STRUCTURE
The present invention generally relates to a recognition method and apparatus for recognising data comprising sequential data units represented by sequential tokens grouped into one or more items. In an exemplary embodiment the present invention is applied to speech recognition.
Systems for recognising sequences of data units represented as sequential tokens which can be grouped into one or more items attempt to recognise items within the data by matching groups of tokens to a pre-stored dictionary of sequences of tokens representing known items. Once a match has been found and thus the input data recognised, an appropriate output can be generated. Typically for speech recognition this output can take the form of a command to instruct apparatus to carry out a function, or it can be translated into text for input into an application such as a word processor.
FIG. 1 illustrates a typical apparatus used for speech recognition. A computer 1 has a keyboard 3 and is interfaced via an input/output device 11 to a microphone 7 and a loudspeaker 5. A speech signal is detected by the microphone 7 and input to the computer 1, whereupon a program operates to recognise the speech in order to at least one of control the computer and generate an acoustic output at the loudspeaker 5. The loudspeaker 5 can also be used to play back the speech signal if this is digitally recorded. The computer 1 is operated under the control of a program which can be loaded using a computer storage medium such as a floppy disc 13, or any other type of computer storage medium such as an optical disc, magnetic tape, or programmable memory devices.
There are many different methods for carrying out speech recognition, and many techniques are based on probabilistic finite state sequence models known as hidden Markov models (HMMs). A Markov chain comprises a plurality of states wherein a transition probability is defined for each transition from a state to every other state, and a self transition probability is defined for transitions from a state to itself. FIG. 2 illustrates a hidden Markov model for the phonetic sequence ae, b, ax and t for the pronunciation of ABBOT. As can be seen there are four sequential states which form the hidden Markov model for the word, and there are six observations O(1) to O(6) which correspond to six sequential data units representing extracted portions of speech. Since the hidden Markov model can undergo self transitions, the transition along the hidden Markov model need not be, and indeed is rarely, synchronous with the observed portions of speech. As can be seen in FIG. 2, self transitions occur in the ae state and the ax state, corresponding to the longer duration of the pronunciation of these phonemes.
FIG. 3 illustrates more clearly the relationship between the HMM states and time. Each of the transitions from a state to another state, and the self transition within a state, has associated with it a probability. Also, there is a probability that the observation corresponds with an HMM state. Thus in order to determine the most likely match between the sequence of observations and an HMM model for a word it is necessary to calculate an accumulated probability in a final state of the HMM model. The accumulated probability in a final state can be calculated by summing log probabilities or by multiplying the probabilities. In a system with more than one word model, the word having the highest accumulated probability can then be identified as the most likely spoken word corresponding to the speech data.
A prior art method for calculating the accumulated probability for each HMM model is the Viterbi method, which is described in detail, together with an application of HMMs to speech recognition, in a paper entitled "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" by L. R. Rabiner (Proceedings of the IEEE, Vol. 77, No. 2, February 1989, pages 257-286). In implementing the Viterbi algorithm, all of the prior art methods calculate the accumulated probabilities for all of the HMM states for all of the models time sequentially. Thus, as can be seen in FIG. 3, where the numbers under each state indicate the order of calculation of the accumulated probability for each state, for each observation (O(1) to O(6)), i.e. for each input portion of speech, a probability of the portion of speech matching each phoneme (HMM state) is generated and used to sequentially determine the probability for each HMM state of each model. This is then repeated for each observation until an accumulated probability is determined for the final HMM state in step 24. Thus, as can be seen in FIG. 4, which illustrates a set of HMM models for a part of a dictionary, using the Viterbi method it is necessary for the accumulated probability for the first temporal state for each of the HMM states of all of the models to be stored. In fact, two temporal levels of HMM states are stored, since the accumulated probabilities of the HMM states in the current time period must be calculated based on the probabilities of the HMM states in the previous time period. For a small dictionary this does not present a problem. However, for large dictionary speech recognition, e.g. 65,000 words, a large memory is required for the calculation of the probabilities.
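By way of a minimal sketch (not code from the patent), the time-synchronous Viterbi recursion described above can be written as follows in Python; the left-to-right word model, the transition probabilities and the per-frame phone scores are invented placeholders.

import math

NEG_INF = float("-inf")

def viterbi_word_score(obs_loglik, log_self, log_next):
    # obs_loglik[t][j] : log likelihood of frame t under state j (the phone score)
    # log_self[j]      : log probability of the self transition of state j
    # log_next[j]      : log probability of moving from state j to state j + 1
    # Returns the accumulated log score of the final state after the last frame.
    n_states = len(log_self)
    prev = [NEG_INF] * n_states
    prev[0] = obs_loglik[0][0]            # the model must start in its first state
    for t in range(1, len(obs_loglik)):
        cur = [NEG_INF] * n_states
        for j in range(n_states):
            stay = prev[j] + log_self[j]
            move = prev[j - 1] + log_next[j - 1] if j > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                cur[j] = best + obs_loglik[t][j]
        prev = cur                        # only two temporal levels are kept, as noted above
    return prev[-1]

# Invented three-state model scored over four frames.
obs = [[-1.0, -5.0, -9.0],
       [-2.0, -1.5, -6.0],
       [-4.0, -1.0, -2.0],
       [-7.0, -3.0, -0.5]]
print(viterbi_word_score(obs, [math.log(0.6)] * 3, [math.log(0.4)] * 3))

Run over a whole dictionary, this recursion keeps two score levels per model, which is the memory cost the text describes for large vocabularies.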
As can be seen in FIG. 4, the calculation of the accumulated probability at many of the HMM states represents a duplication. This has been recognised in the prior art, and in order to avoid this repetition the plurality of HMM models has been organised into a tree structure (U.S. Pat. No. 5,621,859). FIG. 5 illustrates how the HMM models of FIG. 4 can be rearranged in such a tree structure. As can be seen, this reduces the memory requirement for the search space in order to accumulate the probabilities.
The prior art methods utilising the tree structure of FIG. 5 however still require all of the probabilities for the tree structure to be stored in memory. Thus, whilst there is a memory saving compared to the use of a complete set of HMM models arranged linearly, it is desirable to still further reduce the memory requirements for the search space to calculate the accumulated probabilities, and it is to the solution of this problem that one aspect of the present invention is directed.
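A tree-structured lexicon of the kind shown in FIG. 5 shares the states of common prefix phonemes between words. The sketch below is one possible illustration in Python, assuming a plain dictionary of phoneme strings; the pronunciations are invented examples.

def build_lexical_trees(dictionary):
    # dictionary: mapping word -> list of phoneme tokens.
    # Returns a dict of root phoneme -> nested node dicts; each node holds its
    # children and, where a word ends at that node, the word itself.
    roots = {}
    for word, phones in dictionary.items():
        node = roots.setdefault(phones[0], {"children": {}, "word": None})
        for phone in phones[1:]:
            node = node["children"].setdefault(phone, {"children": {}, "word": None})
        node["word"] = word
    return roots

# Invented example pronunciations (cf. ABBOT and ABBEY sharing the prefix ae-b).
lexicon = {"ABBOT": ["ae", "b", "ax", "t"],
           "ABBEY": ["ae", "b", "iy"],
           "ABOUT": ["ax", "b", "aw", "t"]}
trees = build_lexical_trees(lexicon)
print(sorted(trees))                                       # one tree per distinct first phoneme
print(sorted(trees["ae"]["children"]["b"]["children"]))    # the shared ae-b prefix branches here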
There are two types of speech recognition. Discrete speech recognition constrains the speaker to pause between each word, to thereby provide discretely spoken words for recognition. Such systems are now being superseded, however, by systems capable of recognising natural speech, i.e. continuous speech, wherein a speaker is not constrained to pause between each spoken word. The problem with such a system is that spoken words tend to merge together and the boundaries between words are often indefinite. Thus, the transition between HMM models is not defined by a single transition from a final state but instead can comprise multiple transitions, as illustrated in FIG. 6. As can be seen in FIG. 6, there are four possible transitions from a previous word ending in t to a succeeding word ABBEY comprising the phonemes ae, b and iy. Thus, for continuous speech recognition it is necessary for such transitions to be taken into account.
In speech recognition, in order to improve the accuracy of the recognition, not only are models for the words used,
but also a language model is used to provide models indicating the probability of the occurrence of sequences of words. Typically an N-gram is used, wherein the accumulated probability to the final state is modified in dependence upon one or more previous words. Thus, for instance, in a bigram model for general English there is a higher probability of the word CAT following the word THE than of the word THE following the word THE. These probabilities can be used to modify the accumulated probability in the final HMM state. In order to do this, however, the accumulated probability for each HMM model is used as an initial probability for one or more initial HMM states in a succeeding word. This may be achieved using stack decoding as for example disclosed in U.S. Pat. No. 4,748,670.
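As a hedged illustration of the language model step, the following Python fragment adds a bigram log probability to a word-end score and carries the new language model state forward; the probabilities, the word pairs and the floor used for unseen pairs are invented, and a real system would use a proper back-off model rather than this flat constant.

import math

# Invented bigram log probabilities: log P(next_word | previous_word).
BIGRAM = {("THE", "CAT"): math.log(0.020),
          ("THE", "THE"): math.log(0.0001),
          ("CAT", "SAT"): math.log(0.015)}
UNSEEN = math.log(1e-6)   # crude floor for unseen pairs (placeholder for back-off)

def extend_hypothesis(acc_score, history, next_word):
    # acc_score : accumulated acoustic log score at the final state of next_word
    # history   : the previous word, acting as the bigram language model state
    # Returns the language-model-adjusted score and the new language model state.
    lm = BIGRAM.get((history, next_word), UNSEEN)
    return acc_score + lm, next_word

score, state = extend_hypothesis(-42.0, "THE", "CAT")
print(round(score, 3), state)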
It is an object of a second aspect of the present invention to provide a more efficient recognition method, for use in, for example, continuous speech recognition, to reduce the computational requirements.
In accordance with the first aspect the present invention provides a method of automatically recognising data comprising sequential data units represented as sequential tokens grouped into one or more items, the method comprising the steps of:
storing data representing known items as respective finite state sequence models, where each state corresponds to a token and said models having common prefix states are organised in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state;
comparing each sequential data unit with stored reference data units identified by respective reference tokens to generate scores for each data unit indicating the similarity of the data unit to respective said reference data units;
determining an accumulated score for the final state in the models by
a) sequentially calculating the accumulated score for a model to reach the final state comprising a leaf in the tree,
b) identifying the closest branch to the leaf corresponding to a next model for which an accumulated score for the final state has not yet been calculated,
c) accumulating the score from the identified closest branch for the next model to the final state,
d) repeating steps (b) and (c) for the branches of the tree, and
e) repeating steps (a) to (d) for the plurality of trees; and
identifying at least the item corresponding to the model having the highest accumulated score.
Also, in accordance with this aspect of the present invention there is provided recognition apparatus for recognising sequential tokens grouped into one or more items, the apparatus comprising:
storage means for storing data representing known items as respective finite state sequence models, where each state corresponds to a token and said models having common prefix states are organised in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state;
comparing means for comparing each sequential data unit with stored reference data units identified by respective reference tokens to generate scores for each data unit indicating the similarity of the data unit to respective said reference data units;
determining means for determining an accumulated score for the final state in the models, comprising
a) means for sequentially calculating the accumulated score for a model to reach the final state comprising a leaf in the tree,
b) means for identifying the closest branch to the leaf corresponding to a next model for which an accumulated score for the final state has not yet been calculated, and
c) means for accumulating the score from the identified closest branch for the next model to the final state,
wherein the scores are accumulated for the branches of the tree and accumulated for the plurality of trees; and
means for identifying at least the item corresponding to the model having the highest accumulated score.
The present invention can be implemented using dynamic programming to determine the most likely state sequences or the most likely word sequences, i.e. the maximum accumulated score is calculated, or within a word the score for all possible state sequences can be summed to calculate what is termed the forward probability (see page 335 of "Fundamentals of Speech Recognition" by L. Rabiner and B. Juang, Prentice Hall, 1993).
In a preferred embodiment of the present invention each state of the hidden Markov model corresponds to a token. The present invention is however not limited to such a direct comparison, and more than one HMM state may correspond to a token.
Thus, instead of calculating the accumulated probabilities for all of the states in a time instance, i.e. time synchronously in accordance with prior art methods of implementing the Viterbi algorithm, the accumulated probabilities for each state are calculated for nodes of a tree until a final state of a model is reached, which may or may not be a leaf node of the tree, i.e. state sequentially down branches of the trees. Once the accumulated probability at the final state has been determined, the memory space storing the accumulated probabilities for the nodes of the tree back to the nearest branch for which an accumulated probability for a model has still to be calculated is freed for use in the calculation of the accumulated probabilities down the next branch. In this way the amount of memory required is reduced. The memory requirement depends on the number of states in the longest hidden Markov model and the number of observations. In contrast, the time synchronous implementation of the Viterbi algorithm requires a memory capacity equivalent to the average number of states in the hidden Markov models times the number of hidden Markov models. The comparing step can comprise any method of generating probabilities for each data unit indicating the likelihood of the data unit being represented by a particular token.
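One way to picture this state-sequential search is as a depth-first walk of the lexical tree in which each node's scores are discarded once all of its branches have been processed, so that only the path from the root to the current branch point is held in memory. The sketch below is schematic Python under that reading, not the patent's implementation; score_state stands in for the real per-state accumulation and is an invented placeholder.

def search_tree(node, parent_scores, score_state, results):
    # node          : {"token": str, "word": word or None, "children": [nodes]}
    # parent_scores : score column handed down from the parent node
    # score_state   : callable(parent_scores, token) -> score column for this node
    # results       : dict collecting word -> final-state scores
    scores = score_state(parent_scores, node["token"])
    if node["word"] is not None:             # a final state: record the word's scores
        results[node["word"]] = scores
    for child in node["children"]:           # descend one branch at a time
        search_tree(child, scores, score_state, results)
    # On return, `scores` goes out of scope: only the columns on the path from the
    # root to the current branch point stay alive, unlike the time-synchronous
    # search, which keeps the states of every model.

# Toy tree (ae -> b -> {ax -> t = ABBOT, iy = ABBEY}) and a fake scorer.
tree = {"token": "ae", "word": None, "children": [
          {"token": "b", "word": None, "children": [
              {"token": "ax", "word": None, "children": [
                  {"token": "t", "word": "ABBOT", "children": []}]},
              {"token": "iy", "word": "ABBEY", "children": []}]}]}
fake = lambda parent, token: parent + [token]
out = {}
search_tree(tree, [], fake, out)
print(out)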
In an embodiment each token is modelled as an HMM, where each state of the HMM corresponds to a subdata unit. The comparison step then comprises determining accumulated probabilities for each data unit using the respective HMM.
In order to reduce the calculations, a pruning step is used to halt the accumulation of scores for any branches where the accumulated score falls below a threshold. Such a threshold can be an absolute threshold, although in a preferred embodiment the threshold is relative to the highest accumulated score. Such a technique is termed "beam pruning".
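Beam pruning relative to the best accumulated score can be sketched as follows; the beam width and the hypothesis scores are invented values.

def beam_prune(scores, beam=10.0):
    # Keep only hypotheses whose log score is within `beam` of the best;
    # everything else is halted and not extended further.
    best = max(scores.values())
    return {h: s for h, s in scores.items() if s >= best - beam}

print(beam_prune({"ABBOT": -40.0, "ABBEY": -44.5, "ABOUT": -80.0}, beam=10.0))
# ABBOT and ABBEY survive; ABOUT falls outside the beam.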
In accordance with the present invention the tokens can represent any form of sequential data units which occur in one or more groups corresponding to items. The tokens comprise a finite set of tokens where each token appears in
more than one item. An example of such is speech recognition, wherein speech data comprises parts of speech, e.g. phonemes or syllables, which occur sequentially and are grouped into one or more words.
Each token can be independent of preceding and succeeding tokens. For example, in speech recognition it can be assumed that there are a set of parts of speech, e.g. phonemes, which are not modified by the preceding or succeeding part of speech. Alternatively, the tokens can depend on neighbouring tokens in the finite state sequence, e.g. in speech recognition context dependent phone models can be used to generate phoneme probabilities.
Since the probabilities of the states of the hidden Markov model are calculated for each state sequentially, the HMM states can be considered to have temporal states corresponding to each data unit, e.g. each time frame. However, the present invention is not limited to the recognition of time sequential data and can recognise any data once it is presented to the system sequentially. The accumulation of the score for the final state in the models is calculated by calculating the accumulated score for the temporal states for a state in a timewise manner before calculating the temporal states for the next state in a timewise manner. This makes it possible for each branch of the lexical tree in a speech recognition system to be calculated one at a time, reusing the common suffix phonemes by releasing the memory for HMM states of the previous model.
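The accumulation described here is time-first: all temporal states of one HMM state are computed before the next state is touched, so that each completed column becomes the entry column of the following state. A minimal Python sketch under that reading is given below; the numbers are placeholders and the recursion is the usual left-to-right choice between entering the state and staying in it.

import math

NEG_INF = float("-inf")

def time_first_state_column(entry, obs_loglik, log_self, log_move):
    # entry[t]      : accumulated score of the preceding state at frame t - 1,
    #                 i.e. the score available for a move into this state at frame t
    # obs_loglik[t] : log likelihood of frame t under this state's model
    # Returns out[t], the accumulated score of this state at frame t; the whole
    # column is finished before the next state is processed.
    out = [NEG_INF] * len(obs_loglik)
    prev = NEG_INF
    for t in range(len(obs_loglik)):
        best = max(entry[t] + log_move, prev + log_self)
        if best > NEG_INF:
            out[t] = best + obs_loglik[t]
        prev = out[t]
    return out

# Toy run over five frames with invented numbers.
entry = [0.0, -2.0, -5.0, NEG_INF, NEG_INF]
obs = [-1.0, -1.2, -0.8, -1.5, -2.0]
print(time_first_state_column(entry, obs, math.log(0.6), math.log(0.4)))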
In an embodiment of the present invention the accumulated scores for the temporal states for a final state of a model of an item are stored as a group in a memory, together with information identifying a grouping model state used to arrive at the final state, information identifying positions in the tree structure and information identifying the temporal position of the accumulated scores. A previously stored group of accumulated scores is then read from memory and the accumulated scores for the plurality of temporal states are used as a plurality of temporally different initial scores for the calculation of the accumulated scores for the plurality of temporal states in the final state of a model of a subsequent item. The grouping model state used to arrive at the final state of the model of the subsequent item is determined from the identification of the subsequent item and the grouping model state of the read group of accumulated scores.
This embodiment of the present invention reduces the calculations required for continuous speech recognition when there is an overlap between the final states and initial states of neighbouring words, i.e. when the boundary between two spoken words is not sharply defined. In this embodiment for speech recognition the grouping model state comprises a language model state, which for an N-gram model comprises N-1 words.
A score for each of the grouping model states can be stored to indicate the likelihood of items being sequentially grouped, based on a grouping model of likely item sequences. In order to take into account the likelihood of sequences of items, the accumulated scores for the plurality of temporal states for the final state of a model of an item can be modified by the score for the grouping model state before being stored. Thus for the speech recognition system an N-gram language model can be used to give the probability of word sequences.
The memory means in which the groups of accumulated scores are stored comprises a stack in which the groups are ordered temporally or by their scores. In this way groups of accumulated scores for grouping model states can be pushed onto and popped off a stack as groups.
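A stack ordered by score can be illustrated with Python's heapq, pushing one record per grouping (language) model state that holds the word-end scores over a range of frames together with the word history; the field layout here is an invented example, not the patent's record format.

import heapq

def make_group(lm_state, start_frame, scores, history):
    # One heap record: the word-end scores over a time range for one language model state.
    best = max(scores)
    # heapq is a min-heap, so the negated best score makes the best group pop first.
    return (-best, lm_state, start_frame, scores, history)

heap = []
heapq.heappush(heap, make_group(("THE",), 30, [-41.0, -40.2, -42.7], ["THE"]))
heapq.heappush(heap, make_group(("A",), 28, [-45.0, -44.1], ["A"]))

neg_best, lm_state, start, scores, hist = heapq.heappop(heap)
print(lm_state, start, scores, hist)     # the ("THE",) group pops first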
In an embodiment of the present invention it can be determined if a grouping model state is already stored in the
memory means and, if so, the calculated accumulated scores for a final state of a model of an item and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment, whereby if there is temporal overlap between accumulated scores the highest accumulated score is stored in the memory means together with the respective associated information.
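Merging a newly calculated group with a stored group for the same grouping model state, keeping the higher score wherever the time ranges overlap, might look like the following sketch; the frame-indexed record layout is an invented simplification.

def merge_groups(stored, new):
    # Merge two {frame: (score, history)} maps for the same language model state.
    # Where frames overlap, the entry with the higher accumulated score (and its
    # associated history) is kept; elsewhere the entries are simply combined.
    merged = dict(stored)
    for frame, (score, history) in new.items():
        if frame not in merged or score > merged[frame][0]:
            merged[frame] = (score, history)
    return merged

stored = {70: (-50.0, ["THE", "CAT"]), 71: (-49.0, ["THE", "CAT"])}
new = {71: (-48.5, ["A", "CAT"]), 72: (-51.0, ["A", "CAT"])}
print(merge_groups(stored, new))
# frame 71 keeps the better "A CAT" entry; frames 70 and 72 each come from one group.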
In accordance with a second aspect of the present invention there is provided a method of automatically recognising data comprising sequential data units represented as sequential tokens grouped into a plurality of items, the method comprising the steps of:
storing data representing known items as respective finite state sequence models, where each state corresponds to a token and each state comprises a plurality of temporal states corresponding in time to the sequence of tokens;
comparing each sequential data unit with stored reference data units identified by respective reference tokens to generate scores for each data unit indicating the similarity of the data unit to respective said reference data units;
determining acc
