`
`111 1111111111 1111111111 11111 11111 1111111111 111111111111111111
`
`US010971140B2
`
`(IO) Patent No.: US 10,971,140 B2
`
`c12) United States Patent
`
`(45)Date of Patent:
`Apr. 6, 2021
`Catchpole
`
`(54)SPEECH RECOGNITION CIRCUIT USING
`
`PARALLEL PROCESSORS
`
`(56)
`
`
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`
`
`
`
`
`
`(71)Applicant: Zentian Limited, Cambridge (GB)
`
`
`
`
`
`
`
`4,977,599 A * 12/1990 Bahl ....................... Gl0L 15/02
`
`704/256.4
`
`4,980,918 A * 12/1990 Bahl ....................... Gl0L 15/08
`704/240
`(Continued)
`
`
`
`
`
`(72)Inventor: Mark Catchpole, Prickwillow (GB)
`
`
`
`
`
`
`
`
`
`
`
`(73)Assignee: Zentian Limited, Cambridge (GB)
`
`the term ofthis ( *) Notice: Subject to any disclaimer,
`
`
`GB
`under 35 patent is extended or adjusted
`
`
`
`GB
`U.S.C. 154(b) by 0 days.
`
`(21)Appl. No.: 16/266,265
`
`
`
`(22) Filed: Feb. 4, 2019
`
`(65)
`
`
`
`Prior Publication Data
`
`
`
`US 2019/0172447 Al Jun. 6, 2019
`
`
`
`FOREIGN PATENT DOCUMENTS
`
`2 112 194 A 7 /1983
`
`2 331 392 A
`5/1999
`(Continued)
`
`OTHER PUBLICATIONS
`
`R.M. Woodward et al., "Tissue Classification Using Terahertz
`
`
`
`
`
`
`
`
`
`Pulsed Imaging", Proceedings of the SPIE-The International Soci
`
`
`
`
`ety for Optical Engineering SPIE-INT. Soc. Opt. Eng. USA, vol.
`
`
`
`5318, No. 1, 2004, pp. 23-33. Abstract also attached.
`(Continued)
`
`Primary Examiner - Daniel Abebe
`
`
`Related U.S. Application Data
`
`
`
`(74)Attorney, Agent, or Firm - Finnegan Henderson
`
`Farabow Garrett & Dunner LLP
`
`
`
`(63)Continuation of application No. 15/392,396, filed on
`
`Dec. 28, 2016, now Pat. No. 10,217,460, which is a
`(57)
`(Continued)
`
`
`
`ABSTRACT
`
`A speech recognition circuit comprises an input buffer for
`
`
`
`
`
`
`
`
`receiving processed speech parameters. A lexical memory
`
`
`
`
`
`contains lexical data for word recognition. The lexical data
`
`
`
`
`comprises a plurality of lexical tree data structures. Each
`
`Feb. 4, 2002 (GB) ...................................... 0202546
`
`
`
`lexical tree data structure comprises a model of words
`
`
`
`
`having common prefix components. An initial component of
`
`
`
`
`each lexical tree structure is unique. A plurality of lexical
`
`
`
`
`tree processors are connected in parallel to the input buffer
`(2013.01)
`
`
`
`for processing the speech parameters in parallel to perform
`(2013.01)
`GlOL 151187
`
`
`
`
`
`parallel lexical tree processing for word recognition by
`(Continued)
`
`
`
`
`
`accessing the lexical data in the lexical memory. A results
`(52)U.S. Cl.
`
`
`
`
`memory is connected to the lexical tree processors for
`CPC ............ GlOL 151187 (2013.01); GlOL 15105
`
`
`
`
`
`
`
`
`storing processing results from the lexical tree processors
`
`(2013.01); GlOL 15134 (2013.01)
`
`
`
`
`and lexical tree identifiers to identify lexical trees to be
`
`
`
`
`processed by the lexical tree processors. A controller con
`( 58)Field of Classification Search
`
`
`
`
`
`
`trols the lexical tree processors to process lexical trees
`
`
`CPC ................................ Gl0L 15/32; Gl0L 15/34
`
`
`
`See application file for complete search history.
`(Continued)
`
`
`
`
`
`(30) Foreign Application Priority Data
`
`
`
`(51)Int. Cl.
`GlOL 15132
`
`d-r+ao
`
`LEXICAL TREE 3
`ao-t+sii
`
`HARD
`
`ROCK
`
`LEXICAL TREE 4
`
`LOT
`
`LOCK
`
`..
`
`,.
`
`....
`
`-► ©
`
`r-d+r
`r-d+I
`
`
`
`US 10,971,140 B2
`
`Page 2
`
`identified in the results memory by performing parallel
`
`
`2003/0178584 Al 9/2003 Arnone et al.
`
`
`
`
`2005/0119883 Al* 6/2005 Miyazaki .............. GlOL 15/142
`
`
`
`processing on a plurality of said lexical tree data structures.
`704/231
`
`
`8 Claims, 6 Drawing Sheets
`
`
`
`
`
`2008/0255839 Al 10/2008 Larri et al.
`
`FOREIGN PATENT DOCUMENTS
`
`OTHER PUBLICATIONS
`
`(51)
`Int. Cl.
`GlOL 15134
`
`GlOL 15105
`
`(2013.01)
`(2013.01)
`
`(56)
`
`
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`7,120,582 Bl 10/2006 Young et al.
`
`
`
`7,174,037 B2 2/2007 Arnone et al.
`
`
`7,899,669 B2 3/2011 Gadbois
`
`
`2001/0011218 Al 8/2001 Phillips et al.
`
`
`2002/0143531 Al 10/2002 Kahn
`
`
`
`* cited by examiner
`
`
`
`
`
`Related U.S. Application Data
`
`9/2000
`2 347 835 A
`
`GB
`4/2003
`
`2 380 920 A
`GB
`8/2000
`WO 00/50859
`WO
`
`
`
`continuation of application No. 14/309,476, filed on
`8/2003
`
`WO-03/067572 A2
`WO
`
`
`Jun. 19, 2014, now Pat. No. 9,536,516, which is a
`
`
`
`continuation of application No. 13/253,223, filed on
`
`
`Oct. 5, 2011, now Pat. No. 8,768,696, which is a
`V.P. Wallace et al., "Biomedical Applications of THz Imaging",
`
`
`
`
`
`continuation of application No. 12/554,607, filed on
`
`
`
`Microwave Symposium Digest, 2004, IEEE MTT-S International
`
`Sep. 4, 2009, now Pat. No. 8,036,890, which is a
`
`
`Fort Worth, TX, USA, Jun. 6-11, 2004, Piscataway, NJ, USA, IEEE,
`
`
`
`continuation of application No. 10/503,463, filed as
`
`vol. 3, Jun. 6, 2004 (Jun. 6, 2004), pp. 1579-1581.
`
`
`application No. PCT/GB03/00459 on Feb. 4, 2003,
`
`
`
`V.P. Wallace et al., "Biomedical Applications of Terahertz Pulse
`now Pat. No. 7,587,319.
`
`
`Imaging", Second Joint EMBS-BMES Conference 2002. Confer
`
`
`
`ence Proceedings. 24th Annual International Conference of the
`
`
`
`
`Engineering in Medicinal and Biology Society. Annual Fall Meeting
`
`
`
`
`
`of the Biomedical Engineering Society, Houston, TX, Oct. 23-26,
`
`
`
`2002, Annual vol. 1 of 3. Conf. 24, Oct. 23, 2002 (Oct. 23, 2002),
`pp. 2333-2334.
`A.J. Fitzgerald et al., "Terahertz Imaging of Breast Cancer, a
`
`
`
`
`
`
`
`Feasibility Study", Conference Digest of the 2004 Joint 29th
`
`
`
`
`International Conference on Infrared and Millimeter Waves and
`
`
`
`
`12th International Conference on Terahertz Electronics IEEE Piscataway,
`Klein et al.
`
`5,293,213 A 3/1994
`
`
`NJ, USA, 2004, pp. 823-824.
`
`5,349,645 A 9/1994 Zhao
`
`B.E. Cole et al., "Terahertz Imaging and Spectroscopy of Human
`
`
`
`
`5,457,768 A 10/1995 Tsuboi
`
`Skin, In-vivo", Proceedings of SPIE, vol. 4276, pp. 1-10.
`
`
`
`5,579,436 A * 11/1996 Chou .................... G lOL 15/063
`
`
`
`
`S.Glinski et al., "Spoken Language Recognition on a DSP Array
`704/236
`
`
`
`
`Processor", IEEE Transactions on Parallel and Distributed Systems,
`5,621,859 A 4/1997 Schwartz
`
`
`Jul. 5, 1994, No. 7, New York, USA, pp. 697-703.
`
`
`5,710,866 A * 1/1998 Alleva .................... Gl0L 15/10
`
`
`
`S.Chatterjee et al., "Connected Speech Recognition on a Multiple
`704/246
`
`
`
`
`Processor Pipeline", ICASSP 89, May 23, 1989, Glasgow, UK, pp.
`5,832,428 A 11/1998 Chow
`
`
`774-777.
`
`5,881,312 A 3/1999
`
`Dulong
`S. H. Chung et al. "A Parallel Phoneme Recognition Algorithm
`
`
`
`
`5,960,395 A 9/1999 Tzirkel-Hanock
`
`
`
`Based on Continuous Hidden Markov Model", Proceedings 13th
`
`
`5,983,180 A * 11/1999 Robinson .............. GlOL 15/142
`
`
`
`
`International Parallel Processing Symposium and 10th Symposium
`704/254
`
`
`
`on Parallel and Distributed Processing, IPPS/SPDP 1999, Proceed
`
`5,995,930 A 11/1999 Rab-Umbach et al.
`
`
`
`6,047,283 A 4/2000 Braun
`
`
`
`
`ings of 13th International Parallel Processing Symposium and 10th
`
`
`6,374,222 Bl 4/2002 Kao
`
`
`
`
`
`Symposium on Parallel and Distributed Pro, IEEE Comput. Soc. pp.
`6,526,380 Bl 2/2003 Thelen
`
`453-457 (1999).
`
`
`
`7,024,359 B2 * 4/2006 Chang ................... G lOL 15/065
`S.H. Chung et al. "A Parallel Computation Model for Integrated
`
`
`704/251
`
`
`
`Speech and Natural Language Understanding", IEEE Transactions
`7,035,802 Bl 4/2006 Rigazio
`
`
`
`
`on Computers, vol. 42, No. 10, Oct. 1, 1993.
`
`
`7,065,488 B2 * 6/2006 Yajima .................... Gl0L 15/20
`
`
`N. Deshmukn et al. "Hierarchical Search for Large-Vocabul
`ary
`704/226
`
`
`
`
`
`Conversational Speech Recognition: Working Toward a Solution to
`
`
`
`
`
`the Decoding Problem", IEEE Signal Processing Magazine, vol. 16,
`
`
`No. 5, pp. 84-107 (Sep. 1999).
`
`
`
`�
`
`3
`
`ADC
`fs = 48 kHz
`20 bit
`
`4
`
`normalise
`
`9
`
`t
`
`� i .-10
`.-:,( '�,�;��;�
`
`Energy I
`
`..,
`
`Fig 1
`
`14
`
`�
`
`Vt::LlUlti� /
`
`rinn1 ,_
`
`e •
`00
`
`• � � ��
`�
`
`=
`
`> "e
`:-:
`'"Cl's
`N
`0
`N
`
`....
`
`rJJ
`
`=-('D
`('D
`.....
`
`....
`0
`
`....
`Cl's
`
`d
`
`r.,;_ "'""'
`-....l "'""''" "'""' �
`
`'"= \0
`
`=
`=
`N
`
`
`
`e •
`00 •
`
`�
`�
`�
`�
`
`�
`=
`
`I
`
`N bej
`sentem
`
`--
`
`t :-:
`
`'"Cl's
`N
`0
`N
`
`....
`
`rJJ
`
`=('D
`('D .....
`N
`
`0
`
`....
`Cl's
`
`d
`r.,;_
`"'""'
`
`-..= \0
`-....l
`"'""''"
`"'""'
`�
`
`=
`=
`N
`
`.,--- 28
`
`I
`
`program & data bus
`memory -
`
`/ 27
`
`,,,- 2s
`
`Search
`Controller
`
`-
`
`path scores
`
`1ex1ca1 tree
`idtmtifiers
`
`,,...- 2s
`
`'l
`
`Lexical Tree
`Processor
`
`Control bus
`
`Language
`- Model
`Processor ,,
`To all Lexical
`Tree Processors
`
`-
`
`-
`-
`
`
`
`Results Memory
`
`/ 31
`
`Language
`
`Model
`Memory
`
`word
`constralnt
`K
`I
`r
`
`r
`
`feature
`vector
`
`c)
`
`buffer
`,_ ----,.._
`____... --
`I 21
`/21
`21 1•
`I
`I 1.
`tree lexical tree lexical tree lexical
`
`
`
`I
`I
`processor I
`processor processor
`I
`I 1
`I
`I
`I
`I
`
`I Acoustic Model Memory
`l
`Lexical Tree Processor Cluster 1
`
`
`I
`______________
`I
`
`model bus ,,,- 30
`I language
`bus ,,,- 25
`oath score and hlstorv
`
`
`feature vector bus
`
`,,
`
`I
`
`2
`
`k
`••
`
`,
`
`1
`
`2
`
`
`
`I
`I
`
`I
`I
`I
`______________ J
`
`
`
`Lexical Tree Processor Cluster N
`
`Fig 2
`
`/ 24
`,_ .
`--------_,..._
`I 21
`21
`I 1,,. /21
`r-
`, ' I
`tree lexical tree lextca!
`
`I
`I
`lexical
`tree
`processor processor
`processor I
`I
`k
`••
`I
`I
`I 22
`I
`I Search Engine
`I
`...
`/23
`/2.3
`I
`r
`�22
`I
`I
`I Acoustic Model_ Memory
`
`I
`I
`I
`J
`
`
`
`LEXICAL rREE 1 r-d+r
`
`YOU
`
`�--··
`
`-.e y-uw+k
`
`y-uw+h
`
`Fig 3a
`
`LEXICAL TREE 2
`
`_J
`HART I
`
`ao-t+silLEXICAL TREE 3
`ROT
`
`I
`I
`
`��--
`
`r-d+r
`r-d+I
`
`/I
`
`\ d:_ao
`
`\ l\. I -• ROCK
`
`I
`1-ao+t ao-t+s11LEXICAL TREE 4
`1-ao+k f+\__.
`I
`
`LOT
`
`Fig 3b
`
`I
`
`- �LOCK
`
`I
`
`e •
`00 •
`�
`
`�
`�
`
`�
`=
`
`t :-:
`N ....
`
`'"Cl's
`N
`0
`
`('D
`
`rJJ=-('D
`0 ....
`
`�
`
`Cl's
`
`d
`r.,;_
`"'""'
`
`'"= \0
`-....l
`"'""''"
`"'""'
`�
`
`=
`
`= N
`
`
`
`U.S. Patent Apr. 6, 2021 Sheet 4
`of 6
`
`
`
`US 10,971,140 B2
`
`$1 (INITIALISE)
`
`I
`
`N0�
`
`YES
`READ FEATURE VECTOR FROM
`
`S5
`
`BUFFER
`
`EVALUATE STATE TRANSITIONS FOR
`S6 FIRST LEXICAL TREE NODE USING
`ACOUSTIC MODEL DATA
`
`DETERMINE SCORE FOR FIRST
`LEXICAL TREE NODE
`
`S7
`
`REQUEST LANGUAGE MODELSCORE
`S8 USING PREVIOUS N-1 WORDS iN
`RECEIVED DATA AND LEXICAL TREE
`OATA IN ACOUSTIC MODa MEMORY
`
`RECEIVE LANGUAGE MODa SCORES
`S9 FOR WORDS IN THE LEXICAL TREE
`AND PICK HIGHEST SCORE
`
`GENERATE TEMPORARY LEXICAL TREE
`S10 SCORE USING THE SCORE FOR FIRST
`LEXICAL TREE NODE AND THE
`HIGHEST LANGUAGE Mooa SCORE
`
`S11 SEND SCORE TO RESULTS MEMORY
`'-----1
`Af!, TEMPORARY LEXICAL TREE SCORE
`
`Fig 4
`
`
`
`U.S. Patent
`Apr. 6, 2021 Sheet 5 of 6
`
`
`
`US 10,971,140 B2
`
`S20 STNlT
`
`S23
`
`No�
`
`YES
`
`S24
`
`S25
`
`NO
`
`S26
`
`EVALUATE-STATE TRANSITIONS FOR
`
`EACH PAlli USIHG ACOUSTIC.MODEL
`DA.TA
`DETERMINE SCORES FOR PATHS, SEND
`
`BEST SCORE TO RESULTS MEMORY AND
`STORE PATH HISTORIES LOCALLY
`PRUNING APPLIED TO LEXICAL TREE
`TO DELETE PATHS
`
`S27
`
`s
`
`C
`
`S29
`
`S30
`
`APPLY LANGUAGE MODEL SCORE
`SEND SCORE AND HISTORYTO
`RESULTS MEMORY
`AND
`DELETE PATH(S)
`HISTORY DATA
`
`S31
`
`Fig 5
`
`S33 MESSAGE SENT TO SEARCH COtv'TROUER TO
`
`INDICATE THAT LEXICAL TREE HAS BEEN
`PROCESSED
`
`NO
`
`
`
`U.S. Patent Apr. 6, 2021
`Sheet 6 of 6
`
`
`US 10,971,140 B2
`
`S40 INITIALISATION
`
`READ INITIAL LEXICAL TREE DATA IN
`RESULTS MEMORY
`S41
`
`INITIAL LEXICAL TREE DATA
`DIS1RIBUTED AMONGST LEXICAL TREE
`S42 PROCESSORS FOR TEMPORARY
`LEXICAL TREE SCORE DETERMINATION
`
`TEMPORARY LEXICAL TREE SCORES
`S43 RETURNED TO RESULTS MEMORY
`
`Fig 6
`
`PRUNE THE LEXICAL TREES IN THE
`S44 RESULTS MEMORY ON BASIS OF
`TEMPORARY LEXICAL TREE �RES
`
`LEXICAL TREE PROCESSING
`S45 DISTRIBUTED AMONGST LEXICAL TREE�--------,
`
`PROCESSORS
`
`NO
`
`YES
`
`DETERMINE NEXT POSSIBLE LEXICAL
`
`S47 TREES USING CROSS WORD
`TRIPHONES
`
`LEXICAL TREE DATA DISTRIBUTED
`AMONGST LEXICAL TREE
`PRQCESSORSFORTEMPORARY
`S48
`LEXICAL TREE SCORE DETERMINATION
`
`S49 TEMPORARY LEXICAL TREE SCORES
`RETURNED TO RESULTS MEMORY
`
`PRUNE THE LEXICAL TREES IN THE
`
`S50 RESULTS MEMORY ON BASIS OF
`TEMPORARY LEXICAL SCORES
`
`YES
`
`t
`S521 RESULTS
`OUTPUT
`
`
`
`US 10,971,140 B2
`
`PARALLEL PROCESSORS
`
`1
`
`SPEECH RECOGNITION CIRCUIT USING
`
`
`2
`data structures representing a plurality of lexical trees. Each
`
`
`
`
`
`
`lexical tree data structure comprises a model of words
`
`
`
`having common prefix components and an initial component
`This is a continuation of application Ser. No. 15/392,396,
`
`
`
`
`
`
`which is unique as an initial component for lexical trees. A
`
`filed Dec. 28, 2016, which is a continuation of application
`
`
`
`
`5 plurality of lexical tree processors are connected in parallel
`
`
`Ser. No. 14/309,476, filed Jun. 19, 2014, now U.S. Pat. No.
`
`
`to the input port and perform parallel lexical tree processing
`
`
`
`9,536,516, which is a continuation of application Ser. No.
`
`
`
`for word recognition by accessing the lexical data in the
`
`
`13/253,223, now U.S. Pat. No. 8,768,696, filed Oct. 5, 2011,
`
`
`
`lexical memory arrangement. A results memory arrangement
`
`
`which is a continuation of application Ser. No. 12/554,607,
`
`
`
`
`is connected to the lexical tree processors for storing pro-
`
`now U.S. Pat. No. 8,036,890, filed Sep. 4, 2009, which is
`
`
`
`
`10 cessing results from the lexical tree processors and lexical
`
`
`
`continuation of application Ser. No. 10/503,463, now U.S.
`
`
`
`
`tree identifiers to identify lexical trees to be processed by the
`
`
`Pat. No. 7,587,319, filed May 24, 2005, which is a 371 of
`
`
`
`
`lexical tree processors. A controller controls the lexical tree
`
`
`
`International Application No. PCT/GB2003/000459, filed
`
`
`
`
`processors to process lexical trees identified in the results
`
`
`
`Feb. 4, 2003, which claims foreign priority to United King
`
`
`
`
`memory arrangement by performing parallel processing of a
`
`
`dom Application No. 0202546.8 filed Feb. 4, 2002, the
`
`15 plurality of lexical tree data structures.
`
`
`
`
`disclosures of which are incorporated herein by reference in
`
`
`
`Thus in accordance with this embodiment of the present
`their entireties.
`
`
`
`invention, the processing in order to perform word recog
`
`
`
`
`The present invention generally relates to a speech rec
`
`
`
`
`
`nition is distributed across the processors by controlling the
`
`
`
`
`ognition circuit which uses parallel processors for process
`
`
`
`
`processors to perform processing on different lexical trees.
`
`
`ing the input speech data in parallel.
`
`
`
`
`20 The controller controls the processor by the processes to
`
`
`Conventional large vocabulary speech recognition can be
`
`
`
`provide for efficient process management by distributing
`
`
`
`divided into two processes: front end processing to generate
`
`
`
`lexical processing to appropriate processors.
`
`
`
`
`processed speech parameters such as feature vectors, fol
`
`
`
`The lexical tree data structure can comprise a phone
`
`
`lowed by a search process which attempts to find the most
`
`
`
`model of words, wherein the components comprise phones.
`
`likely set of words spoken from a given vocabulary (lexi
`
`
`
`
`25 For reduced storage, the lexical tree data structure can
`con).
`
`
`comprise a mono phone lexical tree. The mono phone lexical
`
`
`
`The front end processing generally represents no problem
`
`
`
`tree can be used to generate context dependent phone
`
`
`
`
`for current processing systems. However, for large vocabu
`
`
`
`models dynamically. This enables the use of context depen
`
`
`
`lary, speaker independent speech recognition, it is the search
`
`
`dent phone models for matching and hence increased accu-
`
`
`
`
`
`process that presents the biggest challenge. An article by
`
`
`30 racy whilst not increasing memory requirements. Alterna
`
`
`Deshmukh et al entitled "Hierarchical Search for Large
`
`
`
`
`tively, the lexical tree data structure can comprise context
`
`
`Vocabulary Conversational Speech Recognition" (IEEE Sig
`
`dependent phone models.
`
`
`
`nal Processing Magazine, September 1999, pages 84 to
`The processing performed by each processor in one
`
`
`
`
`107), the content of which is hereby incorporated by refer
`
`
`
`embodiment comprises the comparison of the speech param-
`
`
`
`ence, discusses the general concepts of large vocabulary
`
`35 eters with the lexical data, e.g. phone models or data derived
`
`
`
`speech recognition. As discussed in this paper, one algorithm
`
`
`
`from the lexical data ( e.g. dynamically generated context
`
`
`
`
`for performing the search is the Viterbi algorithm. The
`
`
`
`dependent phone models) to identify words as a word
`
`
`
`
`
`
`Viterbi algorithm is a parallel or breadth first search through
`
`
`
`recognition event and to send information identifying the
`
`
`
`
`a transition network of states of Hidden Markov Models. An
`
`
`
`identified words to the results memory as the processing
`
`
`
`acoustic model for words in a lexicon are represented as
`
`
`
`40 results. In this embodiment a language model processor
`
`
`
`states of Hidden Markov Models. These states represent
`
`
`
`arrangement can be provided for providing a language
`phones or n phones in a phone model of the words. The
`
`
`
`
`model output for modifying the processing results at a word
`
`
`
`
`search requires the evaluation of possible word matches. It
`
`
`
`recognition event by a lexical tree processor. The modifica
`
`
`is known that such a search is computationally intensive.
`
`
`
`tion can either take place at each lexical tree processor, or at
`
`
`In order to speed up the processing,
`
`
`
`45 the performed during language model processing arrangement.
`
`
`
`
`
`such a search in a speech recognition system, parallel In one embodiment each lexical tree processor determines
`
`
`
`
`processing has been explored. In an article by M K Rav
`
`
`an output score for words in the processing results at word
`
`
`
`
`
`
`
`
`ishankar entitled "Parallel Implementation of Fast Beam recognition events. Thus in this embodiment the language
`
`
`
`
`
`Search for Speaker-Independent Continuous Speech Recogmodel processing arrangement can modify the score using a
`
`
`
`
`
`
`
`nition" (Indian Institute of Science, Bangalor, India, Jul. 16, 50 score for a language model for n preceding words, where n
`
`
`
`
`1993) a multi-threaded implementation of a fast beam search is an integer.
`algorithm is disclosed. The multi-threading implementation In one embodiment the controller instructs a lexical tree
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`a lexical tree tree by passing to process a lexical and synprocessor amount of communication requires a sign ificant
`
`
`
`
`chronization among threads. In an MSC project report by R.
`
`
`
`identifier for the lexical tree and history data for a recogni
`
`
`
`
`
`
`
`Dujari entitled "Parallel Viterbi Search Algorithm for 55 tion path associated with the lexical tree from the results
`
`
`
`
`
`
`
`Speech Recognition" (MIT, February 1992) the parallel memory. The history data preferably includes an accumu
`
`
`
`
`
`
`processing of input speech parameters is disclosed in which lated score for the recognition path. This enables a score to
`
`
`
`a lexical network is split statically
`
`
`be determined among processors. based on the score for the recogn ition path to
`
`
`
`
`
`accumulate to provide a new score during recognition carried out using
`
`
`It is an object of the present invention an
`
`
`improved circuit
`
`
`
`the lexical which can perform parallel processing of 60 tree data structure. The scores can be output in the
`
`
`
`
`
`processing results to the results memory during the process
`speech parameters.
`
`
`In accordance with a first embodiment of the
`
`ing of the present speech parameters so that the scores can be used
`
`
`
`invention, a speech recognition circuit comprises for pruning. an input
`
`
`
`
`port such as input buffer for receiving parameterized speech In one embodiment of the present invention, each lexical
`
`
`
`
`
`
`
`
`
`
`
`
`
`data such as feature vectors. A lexical memory arrangement 65 tree processor operates on more than one lexical tree at the
`
`
`
`
`
`
`is provided which contains lexicon data for word recognisame time, e.g. two lexical trees represented by two different
`
`
`
`lexical tree data structures, or two lexical trees represented
`
`
`
`
`tion. The lexical data comprises a plurality of lexical tree
`
`
`
`3
`
`US 10,971,140 B2
`
`
`4
`
`by the same data structure but displaced in time (which can
`
`
`benefits from a limited segmentation of the lexical data. By
`
`
`
`
`
`be termed to instances of the same lexical tree).
`
`
`
`providing a plurality of processors in a group with a com
`
`
`
`
`At word recognition events, the controller determines new
`
`
`
`mon memory, flexibility in the processing is provided with
`lexical tree identifiers for storing in the results memory for
`
`
`
`
`
`
`
`
`
`out being bandwidth limited by the interface to the memory
`
`5 that would occur if only a single memory were used for all
`
`words identified in the results memory for respective word
`
`
`
`
`
`events. In order to reduce the processing, the controller can
`
`
`
`
`processors. The arrangement is more flexible than the par
`
`
`
`prune the new lexical tree identifiers to reduce the number
`
`
`allel processing arrangement in which each processor only
`
`
`
`of lexical trees which are required to be processed. This
`
`has access to its own local memory and requires fewer
`
`
`
`
`pruning can be achieved using context dependant n phones
`
`
`
`
`memory interfaces (i.e. chip pins). Each processor within a
`10
`
`to reduce the number of possible next phones. The number
`
`group can access the same lexical data as any other proces
`
`
`
`can be further reduced by using a language model look
`
`
`
`sor in the group. The controller can thus control the parallel
`ahead technique.
`
`
`
`processing of input speech parameters in a more flexible
`
`
`
`In one embodiment of the present invention, the lexical
`
`
`
`manner. For example, it allows more than one processor to
`tree processors are arranged in groups or clusters. The
`
`
`
`
`datausing the same lexical input speech parameters lexical memory arrangement comprises a plurality of partial 15 process
`
`
`
`
`
`
`
`in a lexical memory. This is because the lexical data is
`
`
`
`
`lexical memories. Each partial lexical memory is connected
`
`
`segmented into domains which are accessible by multiple
`
`
`
`to one of the groups of lexical tree processors and contains
`processors.
`part of the lexical data. Thus a group of lexical tree proces
`
`
`In a preferred embodiment this aspect of the present
`
`
`sors and a partial lexical memory form a cluster. Each lexical
`
`
`
`
`
`invention is used in combination with the first aspect of the
`
`
`
`tree processor is operative to process the speech parameters 20
`
`
`
`present invention. In such an arrangement each processor
`
`
`
`using a partial lexical memory and the controller controls
`
`
`
`each lexical tree processor to process a lexical tree corre
`
`
`
`performs lexical tree processing and the lexical data stored
`
`
`sponding to partial lexical data in a corresponding partial
`
`
`
`in each lexical memory comprises lexical tree data structures
`lexical memory.
`
`
`which each comprise a model of words having common
`
`
`
`
`
`In another embodiment of the present invention
`
`
`
`
`prefix the lexi- 25 components and an initial component that is unique.
`cal memory arrangement comprises
`
`
`In preferred a plurality of partial embodiments of the second aspect of the
`
`
`
`
`
`
`present invention, the preferred embodiments of the first
`
`
`
`lexical memories. Each partial lexical memory being con
`
`
`processors
`
`aspect of the present nected to one of the lexical tree and containing invention are incorporated.
`
`part of the lexical data. Each lexical tree
`
`
`
`Embodiments processor processes of the present invention will now be
`
`
`
`the speech parameters using a corresponding
`
`
`
`described partial lexical 30 with reference to the accompanying drawings in
`
`
`
`
`
`memory and the controller is operative which: to control each
`
`
`circuit for of a speech data processing lexical tree processor to process a lexical tree corresponding FIG. 1 is a diagram
`
`
`
`
`
`
`
`
`
`
`to partial lexical data in a corresponding partial
`
`
`
`generating lexical parameterized speech data (feature vectors);
`
`
`
`circuit in FIG. 2 is a diagram of a speech recognition
`memory.
`
`
`
`In one embodiment of the present
`
`35 accordance invention the lexical with an embodiment of the present invention;
`
`memory arrangement stores the lexical
`
`FIGS. 3a and 3b are schematic diagrams tree data structures illustrating
`
`
`
`
`as Hidden Markov Models and the lexical lexical tree processors tree structures;
`
`
`process perdiagram illustrating the are operative to perform the Viterbi search algorithm using FIG. 4 is a flow
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`each respective lexical tree data structure. Thus in this way, formed by a lexical tree processor to determine a temporary
`
`
`
`
`
`
`
`
`this embodiment of the present invention provides a parallel 40 lexical tree score in accordance with an embodiment of the
`
`present invention;
`
`
`
`Viterbi lexical tree search process for speech recognition.
`process perFIG. 5 is a flow diagram illustrating the
`
`
`
`
`
`
`The first aspect of the present invention is a special
`
`
`
`formed by the lexical tree processor for processing the input
`
`
`purpose circuit built for performing the speech recognition
`
`
`
`feature vectors in accordance with an embodiment of the
`
`
`search process in which there are a plurality of processors
`
`
`present invention; and
`
`
`
`
`
`for performing parallel lexical tree processing on individual 45
`process perFIG. 6 is a flow diagram illustrating the
`
`
`
`
`
`lexical tree processors.
`
`
`formed by the controller in accordance with an embodiment
`
`
`
`In another aspect of the present invention a speech
`
`of the present invention.
`
`
`
`recognition circuit comprises an input port such as an input
`
`
`FIG. 1 illustrates a typical circuit for the parameterization
`
`
`buffer for receiving parameterized speech data such as
`
`of input speech data. In this embodiment the parameters
`
`
`
`
`
`feature vectors. A plurality oflexical memories are provided 50
`
`generated are speech vectors.
`
`
`
`
`which contain in combination complete lexical data for word
`form and A microphone speech in an analogue 1 records
`
`
`
`
`
`
`recognition. Each lexical memory contains part of the com
`
`
`this is input through an anti-aliasing filter 2 to an analogue
`
`
`
`plete lexical data. A plurality of processors are provided
`
`
`
`to-digital converter 3 which samples the speech at 48 kHz at
`
`
`
`
`
`connected in parallel to the input port for processing the
`
`
`20 bits per sample. The digitized output signal is normalized
`
`
`
`
`speech parameters in parallel. The processors are arranged in 55
`
`
`(4)to generated a 10 millisecond data frame every 5
`
`
`groups in which each group is connected to a corresponding
`
`milliseconds with 5 milliseconds overlap (5). A pre-empha
`
`
`
`
`
`lexical memory to form a cluster. A controller controls each
`
`sis operation to the data followed by a hamming6 is applied
`
`
`
`
`processor to process the speech parameters using partial
`
`
`(FFT)window 7. The data is then fast Fourier transformed
`
`
`
`lexical data read from a corresponding lexical memory. The
`
`using a 512 point fast Fourier transform (8) before being
`
`
`
`
`results of processing the speech parameters are output from 60
`
`
`
`energy infiltered by filter bank 9 into 12 frequencies. The
`
`
`the processors as recognition data.
`
`featurethe data frame 5 is also recorded (13) as an additional
`
`
`
`
`
`Thus this aspect of the present invention provides a circuit
`
`
`
`
`filter bankoutputs of the and together with the 12 frequency
`
`
`in which speech recognition processing is performed in
`
`
`
`
`produced and these are9, 13 feature vectors (10) are thus
`
`
`
`
`parallel by groups of processors operating in parallel in
`
`
`
`vectors output as part of the 39 feature 14. First and second
`
`which each group accesses a common memory of lexical 65
`
`
`
`
`feature vectors (11 and derivatives 12) are taken of the 13 10
`
`
`
`data. This aspect of the present invention provides the
`
`
`
`
`vectors to complete the generation of the 39 feature 14.
`
`
`
`advantage of parallel processing of speech parameters and
`
`
`
`5
`
`US 10,971,140 B2
`
`
`6
`
`The arrangement illustrated in FIG. 1 is purely given for
`
`
`tree processor can modify the score in accordance with the
`
`
`
`
`
`
`
`illustration. The present invention encompasses any means
`
`
`
`language model and output a score to the results memory
`25
`
`
`
`
`
`for a word at the by which speech and data can be parameterized to a suitable end of a branch of the lexical tree
`
`
`form for input to the search process
`
`
`
`processing. as will be described in Thus the results memory stores the results as an
`
`
`
`
`
`
`5 ordered list of scores for words together with their histories.
`
`more detail hereinafter.
`
`
`FIG. 2 is a schematic diagram of a speech recognition
`
`
`
`following data: The results memory 25 stores the
`
`
`
`circuit in accordance with an embodiment of the present
`
`
`
`
`1.Initial lexical tree data. This comprises pointers to an
`
`
`
`
`invention for performing the search process. The parameter
`
`
`
`
`initial set of lexical trees. No history data is associated with
`
`
`ized speech data, which in this embodiment comprise fea
`
`
`
`
`
`the initial set of lexical trees. The initial set of lexical trees
`
`
`ture vectors, are input to a feature vector buffer 20. The 10
`
`
`
`is predetermined and stored in the results memory 25 based
`
`
`to buffer the incoming feature vector buffer 20 is provided
`
`
`
`on the most likely initial phones of as utterance. This initial
`
`
`
`
`feature vectors to allow lexical tree processors 21 to read and
`
`
`
`lexical tree data is required to initialize the search process.
`
`
`
`process the feature vectors in the buffer 20 via a feature
`
`
`
`2.History data for search results. This comprises a record
`
`
`
`
`k of lexical tree processors vector bus 24. A plurality 21 are
`
`
`
`of a recognition path through the lexical tree recognition
`
`
`
`
`arranged in a respective lexical tree processor cluster 22. 15
`
`
`
`
`process performed by the lexical tree processors 21. The
`
`
`
`
`model Each lexical tree processor cluster 22 has an acoustic
`
`
`
`
`
`history data includes the current word, the previous N-1
`
`memory 23 in which is stored lexical data for use by the
`
`
`
`words, the current accumulated score, the phone history (for
`
`
`tree processor lexical tree processors 21 within the lexical
`
`
`
`
`
`use in the determination of likely next lexical trees using
`
`
`
`
`tree lexical processor cluster 22. Each lexical tree 21 in the
`
`
`
`
`cross word context dependent tri-phones), and an identifier
`
`
`
`
`
`20 to the acoustic model processor cluster 22 is connected
`
`
`
`or pointer to the lexical tree used for identifying the word.
`
`
`
`tree processor memory 23 within the lexical 22. There are N
`
`3. Best scores for best paths being processed by each
`
`
`
`lexical tree processor clusters and thus there are Nk lexical
`
`
`
`enables the lexical tree processor 21. This information
`
`
`
`
`tree processors by the feature vector 21 connected bus 24 to
`
`
`
`being per-the processing search controller 27 to monitor
`
`
`
`
`tree processor the feature vector buffer 20. Each lexical 21
`
`
`
`
`whether a formed by lexical tree processors 21 to determine
`
`
`
`
`is capable of processing a different lexical tree and thus Nk 25
`
`
`
`global pruning strategy should be applied in order to reas
`
`
`
`
`lexical trees can be processed in parallel. The acoustic model
`
`
`
`
`sign processing performed by a lexical tree processor if its
`
`
`
`
`set of lexical data, whole a complete memories 23 store as a
`
`
`
`
`best score for its best path is below a threshold or well below
`
`
`
`i.e. lexical tree data structures for use in the lexical tree
`
`
`
`the best scores for the paths being processed by other lexical
`
`
`
`processing by the lexical tree processors 21. Each acoustic
`tree processors
`21.
`
`
`
`30of the lexical model memory 23 contains part or a segment
`4.Temporary lexical tree scores. These comprise tree
`
`
`
`
`
`treetree data. Since lexical tree processors 21 in a lexical
`
`
`
`scores which are determined as temporary scores to prune
`
`
`modelprocessor cluster 22 access the same acoustic
`
`
`
`the next lexical trees to be processed at word ends. The
`
`
`
`
`one lexical treememory 23, it is possible for more than
`
`
`
`
`temporary lexical tree scores include lexical tree identifiers
`
`
`
`
`data. This providesthe same lexical processor 21 to process
`
`
`
`or pointers to identify the next lexical trees to be processed.
`
`
`
`for some degree of flexibility in the controlling of the 35
`
`
`The scores enable the pruning of this list.
`
`
`
`
`theprocessing by the lexical tree processors 21. Further,
`5.Pruning threshold. This can be a global threshold value
`
`
`
`
`
`
`
`only one copyacoustic model memories 23 need not contain
`for use in the pruning of the lexical trees globally, or a local
`
`
`
`of the lexical data. It is-possible to build in a redundancy in
`
`
`
`threshold value for use by a lexical processor for locally
`
`
`the data to further enhance the flexibility. This avoids any
`
`
`
`
`pruning the processing performed by the lexical processor
`
`
`
`
`bottleneck in the processing due to the search processing 40
`21.
`
`
`focusing on a small number of lexical trees.
`Hidden Markov The acoustic model memory 23 stores a
`
`
`
`
`
`processing for storing A results memory 25 is provided
`
`
`Model for acoustically modelling w