`
US007587319B2

(12) United States Patent            (10) Patent No.:     US 7,587,319 B2
     Catchpole                       (45) Date of Patent:  Sep. 8, 2009

(54) SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

(75) Inventor:  Mark Catchpole, Cambridgeshire (GB)

(73) Assignee:  Zentian Limited, Cambridge (GB)

( * ) Notice:   Subject to any disclaimer, the term of this patent is extended or
                adjusted under 35 U.S.C. 154(b) by 823 days.

(21) Appl. No.: 10/503,463

(22) PCT Filed: Feb. 4, 2003

(86) PCT No.:   PCT/GB03/00459
     § 371 (c)(1), (2), (4) Date: May 24, 2005

(87) PCT Pub. No.: WO03/067572
     PCT Pub. Date: Aug. 14, 2003

(65) Prior Publication Data
     US 2005/0251390 A1    Nov. 10, 2005

(30) Foreign Application Priority Data
     Feb. 4, 2002  (GB) ................................. 0202546.8

(51) Int. Cl.
     G10L 15/00 (2006.01)

(52) U.S. Cl. ....................... 704/235; 704/242; 704/243; 704/258

(58) Field of Classification Search ................. 704/235, 242, 243, 256, 258
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS
     5,293,213 A      3/1994   Klein et al.
     5,832,428 A *   11/1998   Chow et al. ................. 704/254
     (Continued)

     FOREIGN PATENT DOCUMENTS
     GB    2 112 194 A    7/1983
     (Continued)

     OTHER PUBLICATIONS
     S. Glinski et al., "Spoken Language Recognition on a DSP Array Processor", IEEE
     Transactions on Parallel and Distributed Systems, Jul. 5, 1994, No. 7, New York,
     USA, pp. 697-703.
     (Continued)

Primary Examiner-Daniel D Abebe
(74) Attorney, Agent, or Firm-Dickstein Shapiro LLP

(57) ABSTRACT

A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory by performing parallel processing on a plurality of said lexical tree data structures.

69 Claims, 6 Drawing Sheets

[Representative drawing: a search engine 22 comprising lexical tree processor clusters 1 to N connected to a results memory.]
`
`
`
US 7,587,319 B2
Page 2

U.S. PATENT DOCUMENTS

5,881,312 A        3/1999   Dulong
5,995,930 A *     11/1999   Häb-Umbach et al. ...... 704/254
6,047,283 A *      4/2000   Braun .......................... 707/3
7,035,802 B1 *     4/2006   Rigazio et al. .............. 704/256
7,120,582 B1 *    10/2006   Young et al. ................ 704/255
7,174,037 B2       2/2007   Arnone et al.
2001/0011218 A1 *  8/2001   Phillips et al. ............. 704/256
2003/0178584 A1    9/2003   Arnone et al.
2008/0255839 A1   10/2008   Larri et al.

FOREIGN PATENT DOCUMENTS

GB    2 331 392 A       5/1999
GB    2 347 835 A       9/2000
GB    2 380 920 A       4/2003
WO    WO 00/50859       8/2000
WO    WO 03/067572 A2   8/2003

OTHER PUBLICATIONS

S-H. Chung et al., "A Parallel Computational Model for Integrated Speech and Natural Language Understanding", IEEE Transactions on Computers, vol. 42, No. 10, Oct. 1, 1993, New York, USA, pp. 1171-1183.
N. Deshmukh et al., "Hierarchical Search for Large-Vocabulary Conversational Speech Recognition: Working Toward a Solution for the Decoding Problem", IEEE Signal Processing Magazine, vol. 16, No. 5, Sep. 1999, pp. 84-107.
R.M. Woodward et al., "Tissue Classification Using Terahertz Pulsed Imaging", Proceedings of the SPIE - The International Society for Optical Engineering, vol. 5318, No. 1, 2004, pp. 23-33. Abstract also attached.
V.P. Wallace et al., "Biomedical Applications of THz Imaging", Microwave Symposium Digest, 2004 IEEE MTT-S International, Fort Worth, TX, USA, Jun. 6-11, 2004, Piscataway, NJ, USA, IEEE, vol. 3, Jun. 6, 2004, pp. 1579-1581.
V.P. Wallace et al., "Biomedical Applications of Terahertz Pulse Imaging", Second Joint EMBS-BMES Conference 2002, Conference Proceedings, 24th Annual International Conference of the Engineering in Medicine and Biology Society / Annual Fall Meeting of the Biomedical Engineering Society, Houston, TX, Oct. 23-26, 2002, vol. 1 of 3, Conf. 24, Oct. 23, 2002, pp. 2333-2334.
A.J. Fitzgerald et al., "Terahertz Imaging of Breast Cancer, a Feasibility Study", Conference Digest of the 2004 Joint 29th International Conference on Infrared and Millimeter Waves and 12th International Conference on Terahertz Electronics, IEEE, Piscataway, NJ, USA, 2004, pp. 823-824.
B.E. Cole et al., "Terahertz Imaging and Spectroscopy of Human Skin, In-vivo", Proceedings of SPIE, vol. 4276, pp. 1-10.
S. Chatterjee et al., "Connected Speech Recognition on a Multiple Processor Pipeline", ICASSP 89, May 23, 1989, Glasgow, UK, pp. 774-777.
S-H. Chung et al., "A Parallel Phoneme Recognition Algorithm Based on Continuous Hidden Markov Model", Proceedings of 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, 1999, Los Alamitos, CA, USA, pp. 453-457.

* cited by examiner
`
`
`
`
[FIG. 1 (Sheet 1 of 6): speech parameterization circuit, showing the anti-aliasing filter 2, the 20-bit ADC 3 sampling at fs = 48 kHz, and the Hamming window 7 among the front-end stages.]

[FIG. 2 (Sheet 2 of 6): speech recognition circuit, showing the feature vector buffer 20, lexical tree processors 21 arranged in lexical tree processor clusters 1 to N (22) each with an acoustic model memory 23, the feature vector bus 24, results memory 25, path score and history bus 26, search controller 27, program and data bus and memory 28, language model processor 29, language model bus 30, and language model memory 31 with a word constraint input.]
`
`
`
`\0
`"'""'
`
`--...l
`00
`
`
` UI
`'-"--...l
`
`LOCK
`
`LOT
`
`LEXICAL TREE 4
`
`ROCK
`
`Fig 3b
`
`--•---
`
`HARD
`
`r-d+I
`
`r-d+r
`
`►0
`
`.... O'I
`
`ROT
`
`0
`
`�
`
`('D
`('D
`
`rJJ
`
`0
`0
`N
`
`
`"'CIO
`
`'?
`('D
`rJJ
`
`
`00•
`
`�
`�
`�
`
`�
`
`LEXICAL TREE 3
`
`ao-t+sil
`
`Fig 3a LEXICAL TREE 2
`
`HART
`
`aa-r+t
`
`YOU ...-e---
`
`y-uw+h
`y-uw+k
`►9
`
`LEXICAL TREE 1
`
`r-d+r
`
`IPR2023-00033
`Apple EX1001 Page 5
`
`
`
[FIG. 4 (Sheet 4 of 6): flow diagram for temporary lexical tree score determination by a lexical tree processor.
S1  Initialise.
S5  Read feature vector from buffer.
S6  Evaluate state transitions for first lexical tree node using acoustic model data.
S7  Determine score for first lexical tree node.
S8  Request language model score using previous N-1 words in received data and lexical tree data in acoustic model memory.
S9  Receive language model scores for words in the lexical tree and pick highest score.
S10 Generate temporary lexical tree score using the score for first lexical tree node and the highest language model score.
S11 Send score to results memory as temporary lexical tree score.]
`
`
`
[FIG. 5 (Sheet 5 of 6): flow diagram for processing of the input feature vectors by a lexical tree processor.
S20 Start.
S24 Read feature vector from buffer.
S25 Evaluate state transitions for each path using acoustic model data.
S26 Determine scores for paths, send best score to results memory and store path histories locally.
S27 Pruning applied to lexical tree to delete paths.
S29 Apply language model score.
S30 Send score and history to results memory.
S31 Delete path(s) and history data.
S33 Message sent to search controller to indicate that lexical tree has been processed.]
`
`
`
[FIG. 6 (Sheet 6 of 6): flow diagram for the process performed by the controller.
S40 Initialisation.
S41 Read initial lexical tree data in results memory.
S42 Initial lexical tree data distributed amongst lexical tree processors for temporary lexical tree score determination.
S43 Temporary lexical tree scores returned to results memory.
S44 Prune the lexical trees in the results memory on basis of temporary lexical tree scores.
S45 Lexical tree processing distributed amongst lexical tree processors.
S47 Determine next possible lexical trees using cross word triphones.
S48 Lexical tree data distributed amongst lexical tree processors for temporary lexical tree score determination.
S49 Temporary lexical tree scores returned to results memory.
S50 Prune the lexical trees in the results memory on basis of temporary lexical tree scores.
S52 Results output.]
`
`
`
`
`
SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

The present invention generally relates to a speech recognition circuit which uses parallel processors for processing the input speech data in parallel.

Conventional large vocabulary speech recognition can be divided into two processes: front end processing to generate processed speech parameters such as feature vectors, followed by a search process which attempts to find the most likely set of words spoken from a given vocabulary (lexicon). The front end processing generally represents no problem for current processing systems. However, for large vocabulary, speaker independent speech recognition, it is the search process that presents the biggest challenge. An article by Deshmukh et al entitled "Hierarchical Search for Large-Vocabulary Conversational Speech Recognition" (IEEE Signal Processing Magazine, September 1999, pages 84 to 107), the content of which is hereby incorporated by reference, discusses the general concepts of large vocabulary speech recognition. As discussed in this paper, one algorithm for performing the search is the Viterbi algorithm. The Viterbi algorithm is a parallel or breadth first search through a transition network of states of Hidden Markov Models. An acoustic model for words in a lexicon is represented as states of Hidden Markov Models. These states represent phones or n phones in a phone model of the words. The search requires the evaluation of possible word matches. It is known that such a search is computationally intensive.
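To make the search concrete, the following is a minimal Python sketch of a time-synchronous (breadth first) Viterbi pass with beam pruning over a toy Hidden Markov Model state network; the state topology, the observation scorer and the beam width are invented for illustration and are not taken from the patent.

    import math

    # A toy HMM state network: each state has a list of (next_state, log transition prob).
    # The topology and probabilities here are illustrative only.
    TRANSITIONS = {
        "s0": [("s0", math.log(0.6)), ("s1", math.log(0.4))],
        "s1": [("s1", math.log(0.5)), ("s2", math.log(0.5))],
        "s2": [("s2", math.log(0.7))],
    }

    def log_obs(state, frame):
        """Hypothetical acoustic log-likelihood of a feature frame given a state."""
        # A real recogniser would evaluate a Gaussian mixture or similar model here.
        return -abs(hash((state, tuple(frame))) % 7) / 7.0

    def viterbi_beam(frames, beam=10.0):
        """Time-synchronous Viterbi search with simple beam pruning."""
        tokens = {"s0": 0.0}                      # state -> best accumulated log score
        for frame in frames:
            new_tokens = {}
            for state, score in tokens.items():
                for nxt, log_p in TRANSITIONS[state]:
                    cand = score + log_p + log_obs(nxt, frame)
                    if cand > new_tokens.get(nxt, float("-inf")):
                        new_tokens[nxt] = cand    # keep only the best path into each state
            best = max(new_tokens.values())
            tokens = {s: v for s, v in new_tokens.items() if v > best - beam}
        return tokens

    if __name__ == "__main__":
        dummy_frames = [[0.1 * i, 0.2 * i] for i in range(5)]
        print(viterbi_beam(dummy_frames))

Even this toy version shows why the full search is expensive: every frame touches every surviving state, which is the cost the parallel circuit described below is designed to spread across processors.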
`
`
`
`
In order to speed up the processing performed during such a search in a speech recognition system, parallel processing has been explored. In an article by M. K. Ravishankar entitled "Parallel Implementation of Fast Beam Search for Speaker Independent Continuous Speech Recognition" (Indian Institute of Science, Bangalore, India, Jul. 16, 1993) a multi-threaded implementation of a fast beam search algorithm is disclosed. The multi-threading implementation requires a significant amount of communication and synchronization among threads. In an MSc project report by R. Dujari entitled "Parallel Viterbi Search Algorithm for Speech Recognition" (MIT, February 1992) the parallel processing of input speech parameters is disclosed in which a lexical network is split statically among processors.

It is an object of the present invention to provide an improved circuit which can perform parallel processing of speech parameters.

In accordance with a first embodiment of the present invention, a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A lexical memory arrangement is provided which contains lexicon data for word recognition. The lexical data comprises a plurality of lexical tree data structures representing a plurality of lexical trees. Each lexical tree data structure comprises a model of words having common prefix components and an initial component which is unique as an initial component for lexical trees. A plurality of lexical tree processors are connected in parallel to the input port and perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory arrangement. A results memory arrangement is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory arrangement by performing parallel processing of a plurality of lexical tree data structures.

Thus in accordance with this embodiment of the present invention, the processing in order to perform word recognition is distributed across the processors by controlling the processors to perform processing on different lexical trees. The controller controls the processing by the processors to provide for efficient process management by distributing lexical processing to appropriate processors.

The lexical tree data structure can comprise a phone model of words, wherein the components comprise phones. For reduced storage, the lexical tree data structure can comprise a mono phone lexical tree. The mono phone lexical tree can be used to generate context dependent phone models dynamically. This enables the use of context dependent phone models for matching and hence increased accuracy whilst not increasing memory requirements. Alternatively, the lexical tree data structure can comprise context dependent phone models.
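As an illustration of the kind of structure described above, the following Python sketch builds mono phone lexical trees from a toy pronunciation dictionary, so that words sharing prefix phones share nodes and each tree is identified by its unique initial phone; the class, the dictionary entries and the phone symbols are invented for illustration and are not taken from the patent.

    class LexicalTreeNode:
        """One phone in a lexical tree; words sharing prefix phones share nodes."""
        def __init__(self, phone):
            self.phone = phone
            self.children = {}      # next phone -> LexicalTreeNode
            self.word = None        # set at a word end

    def build_lexical_trees(pronunciations):
        """Group words by their initial phone: one lexical tree per unique first phone."""
        trees = {}
        for word, phones in pronunciations.items():
            root = trees.setdefault(phones[0], LexicalTreeNode(phones[0]))
            node = root
            for phone in phones[1:]:
                node = node.children.setdefault(phone, LexicalTreeNode(phone))
            node.word = word        # mark the word end at the last phone
        return trees

    # Toy mono phone pronunciations (illustrative only).
    LEXICON = {
        "HARD": ["hh", "aa", "r", "d"],
        "HART": ["hh", "aa", "r", "t"],
        "ROCK": ["r", "aa", "k"],
        "ROT":  ["r", "aa", "t"],
    }

    trees = build_lexical_trees(LEXICON)
    # Two lexical trees, rooted at the unique initial phones "hh" and "r".
    print(sorted(trees))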
`
`
`
`
The processing performed by each processor in one embodiment comprises the comparison of the speech parameters with the lexical data, e.g. phone models or data derived from the lexical data (e.g. dynamically generated context dependent phone models) to identify words as a word recognition event and to send information identifying the identified words to the results memory as the processing results. In this embodiment a language model processor arrangement can be provided for providing a language model output for modifying the processing results at a word recognition event by a lexical tree processor. The modification can either take place at each lexical tree processor, or at the language model processing arrangement.

In one embodiment each lexical tree processor determines an output score for words in the processing results at word recognition events. Thus in this embodiment the language model processing arrangement can modify the score using a score for a language model for n preceding words, where n is an integer.

In one embodiment the controller instructs a lexical tree processor to process a lexical tree by passing a lexical tree identifier for the lexical tree and history data for a recognition path associated with the lexical tree from the results memory. The history data preferably includes an accumulated score for the recognition path. This enables a score to be determined based on the score for the recognition path to accumulate a new score during recognition carried out using the lexical tree data structure. The scores can be output in the processing results to the results memory during the processing of the speech parameters so that the scores can be used for pruning.
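A minimal sketch of the records such a hand-over could use, assuming Python dataclasses; the type and field names below are hypothetical and are introduced only to illustrate the lexical tree identifier, path history and accumulated score described above.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class LexicalTreeTask:
        """What the controller hands to a lexical tree processor."""
        tree_id: int                 # identifies which lexical tree data structure to process
        history_words: List[str]     # recognition path history (previous words)
        accumulated_score: float     # score accumulated along the recognition path so far

    @dataclass
    class ResultsEntry:
        """What a lexical tree processor writes back to the results memory."""
        tree_id: int
        word: str                    # word recognised at a word end
        score: float                 # accumulated path score including this word
        history_words: List[str] = field(default_factory=list)

    # Example: the controller dispatches tree 7 with an existing path history.
    task = LexicalTreeTask(tree_id=7, history_words=["YOU"], accumulated_score=-41.2)
    result = ResultsEntry(tree_id=7, word="HARD", score=task.accumulated_score - 12.5,
                          history_words=task.history_words + ["HARD"])
    print(result)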
`
`
`
In one embodiment of the present invention, each lexical tree processor operates on more than one lexical tree at the same time, e.g. two lexical trees represented by two different lexical tree data structures, or two lexical trees represented by the same data structure but displaced in time (which can be termed two instances of the same lexical tree).

At word recognition events, the controller determines new lexical tree identifiers for storing in the results memory for words identified in the results memory for respective word events. In order to reduce the processing, the controller can prune the new lexical tree identifiers to reduce the number of lexical trees which are required to be processed. This pruning can be achieved using context dependent n phones to reduce the number of possible next phones. The number can be further reduced by using a language model look ahead technique.

In one embodiment of the present invention, the lexical tree processors are arranged in groups or clusters.
The lexical memory arrangement comprises a plurality of partial lexical memories. Each partial lexical memory is connected to one of the groups of lexical tree processors and contains part of the lexical data. Thus a group of lexical tree processors and a partial lexical memory form a cluster. Each lexical tree processor is operative to process the speech parameters using a partial lexical memory and the controller controls each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In another embodiment of the present invention the lexical memory arrangement comprises a plurality of partial lexical memories, each partial lexical memory being connected to one of the lexical tree processors and containing part of the lexical data. Each lexical tree processor processes the speech parameters using a corresponding partial lexical memory and the controller is operative to control each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In one embodiment of the present invention the lexical memory arrangement stores the lexical tree data structures as Hidden Markov Models and the lexical tree processors are operative to perform the Viterbi search algorithm using each respective lexical tree data structure. Thus in this way, this embodiment of the present invention provides a parallel Viterbi lexical tree search process for speech recognition.

The first aspect of the present invention is a special purpose circuit built for performing the speech recognition search process in which there are a plurality of processors for performing parallel lexical tree processing on individual lexical tree processors.

In another aspect of the present invention a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A plurality of lexical memories are provided which contain in combination complete lexical data for word recognition. Each lexical memory contains part of the complete lexical data. A plurality of processors are provided connected in parallel to the input port for processing the speech parameters in parallel. The processors are arranged in groups in which each group is connected to a corresponding lexical memory to form a cluster. A controller controls each processor to process the speech parameters using partial lexical data read from a corresponding lexical memory. The results of processing the speech parameters are output from the processors as recognition data.

Thus this aspect of the present invention provides a circuit in which speech recognition processing is performed in parallel by groups of processors operating in parallel in which each group accesses a common memory of lexical data. This aspect of the present invention provides the advantage of parallel processing of speech parameters and benefits from a limited segmentation of the lexical data. By providing a plurality of processors in a group with a common memory, flexibility in the processing is provided without being bandwidth limited by the interface to the memory that would occur if only a single memory were used for all processors. The arrangement is more flexible than the parallel processing arrangement in which each processor only has access to its own local memory, and requires fewer memory interfaces (i.e. chip pins). Each processor within a group can access the same lexical data as any other processor in the group. The controller can thus control the parallel processing of input speech parameters in a more flexible manner. For example, it allows more than one processor to process input speech parameters using the same lexical data in a lexical memory. This is because the lexical data is segmented into domains which are accessible by multiple processors.
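The following Python sketch illustrates this segmentation idea: lexical tree identifiers are partitioned across a few cluster memories, and each processor in a cluster can be assigned any tree held in its cluster's shared partition. The cluster count, processor count and the modulo partitioning rule are assumptions made purely for illustration.

    from collections import defaultdict

    def partition_trees(tree_ids, num_clusters):
        """Segment the lexicon: each cluster memory holds part of the lexical tree data."""
        partitions = defaultdict(list)
        for tid in tree_ids:
            partitions[tid % num_clusters].append(tid)
        return partitions

    def assign_to_processors(partition, processors_per_cluster):
        """Within a cluster, any processor can work on any tree held in the shared memory."""
        assignment = defaultdict(list)
        for i, tid in enumerate(partition):
            assignment[i % processors_per_cluster].append(tid)
        return assignment

    if __name__ == "__main__":
        tree_ids = list(range(20))          # 20 lexical trees, identified 0..19
        clusters = partition_trees(tree_ids, num_clusters=4)
        for cluster_id, held_trees in sorted(clusters.items()):
            work = assign_to_processors(held_trees, processors_per_cluster=2)
            print(f"cluster {cluster_id}: memory holds {held_trees}, "
                  f"processor assignments {dict(work)}")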
`
`
`
In a preferred embodiment of the present invention this aspect is used in combination with the first aspect of the present invention. In such an arrangement each processor performs lexical tree processing and the lexical data stored in each lexical memory comprises lexical tree data structures which each comprise a model of words having common prefix components and an initial component that is unique.

In preferred embodiments of the second aspect of the present invention, the preferred embodiments of the first aspect of the present invention are incorporated.

Embodiments of the present invention will now be described with reference to the accompanying drawings in which:

FIG. 1 is a diagram of a speech data processing circuit for generating parameterized speech data (feature vectors);

FIG. 2 is a diagram of a speech recognition circuit in accordance with an embodiment of the present invention;

FIGS. 3a and 3b are schematic diagrams illustrating lexical tree structures;

FIG. 4 is a flow diagram illustrating the process performed by a lexical tree processor to determine a temporary lexical tree score in accordance with an embodiment of the present invention;

FIG. 5 is a flow diagram illustrating the process performed by the lexical tree processor for processing the input feature vectors in accordance with an embodiment of the present invention; and

FIG. 6 is a flow diagram illustrating the process performed by the controller in accordance with an embodiment of the present invention.

FIG. 1 illustrates a typical circuit for the parameterization of input speech data. In this embodiment the parameters generated are speech vectors.

A microphone 1 records speech in an analogue form and this is input through an anti-aliasing filter 2 to an analogue-to-digital converter 3 which samples the speech at 48 kHz at 20 bits per sample. The digitized output signal is normalized (4) to generate a 10 millisecond data frame every 5 milliseconds with 5 milliseconds overlap (5). A pre-emphasis operation 6 is applied to the data followed by a Hamming window 7. The data is then fast Fourier transformed (FFT) using a 512 point fast Fourier transform (8) before being filtered by filter bank 9 into 12 frequencies. The energy in the data frame 5 is also recorded (13) as an additional feature and, together with the 12 frequency outputs of the filter bank 9, 13 feature vectors (10) are thus produced and these are output as part of the 39 feature vectors 14. First and second derivatives (11 and 12) are taken of the 13 feature vectors 10 to complete the generation of the 39 feature vectors 14.
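As a rough illustration of this front end, and assuming NumPy is available, the sketch below follows the same outline (pre-emphasis, Hamming window, 512 point FFT, a 12 band filter bank plus frame energy, then first and second differences to give 39 values per frame); the uniform filter bank, the constants and the function name are simplifications invented for illustration rather than a description of the actual circuit.

    import numpy as np

    def frontend(samples, fs=48000, frame_ms=10, hop_ms=5, n_fft=512, n_filters=12):
        """Toy front end: 13 base features per frame (12 band energies + frame energy),
        plus first and second differences, giving 39 features per frame."""
        frame_len = int(fs * frame_ms / 1000)
        hop = int(fs * hop_ms / 1000)
        pre = np.append(samples[0], samples[1:] - 0.97 * samples[:-1])  # pre-emphasis
        window = np.hamming(frame_len)
        feats = []
        for start in range(0, len(pre) - frame_len + 1, hop):
            frame = pre[start:start + frame_len] * window
            spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
            # Crude uniform filter bank (a real front end would use shaped filters).
            bands = np.array_split(spectrum, n_filters)
            band_energies = np.log(np.array([b.sum() for b in bands]) + 1e-10)
            energy = np.log(np.sum(frame ** 2) + 1e-10)
            feats.append(np.append(band_energies, energy))      # 13 base features
        feats = np.array(feats)
        delta = np.gradient(feats, axis=0)                       # first derivatives
        delta2 = np.gradient(delta, axis=0)                      # second derivatives
        return np.hstack([feats, delta, delta2])                 # 39 features per frame

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        vectors = frontend(rng.standard_normal(48000))           # one second of noise
        print(vectors.shape)                                     # (frames, 39)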
`
`
The arrangement illustrated in FIG. 1 is purely given for illustration. The present invention encompasses any means by which speech and data can be parameterized to a suitable form for input to the search process, as will be described in more detail hereinafter.

FIG. 2 is a schematic diagram of a speech recognition circuit in accordance with an embodiment of the present invention for performing the search process. The parameterized speech data, which in this embodiment comprise feature vectors, are input to a feature vector buffer 20. The feature vector buffer 20 is provided to buffer the incoming feature vectors to allow lexical tree processors 21 to read and process the feature vectors in the buffer 20 via a feature vector bus 24. A plurality k of lexical tree processors 21 are arranged in a respective lexical tree processor cluster 22. Each lexical tree processor cluster 22 has an acoustic model memory 23 in which is stored lexical data for use by the lexical tree processors 21 within the lexical tree processor cluster 22. Each lexical tree processor 21 in the lexical tree processor cluster 22 is connected to the acoustic model memory 23 within the lexical tree processor cluster 22.
`
`
`
`
There are N lexical tree processor clusters and thus there are Nk lexical tree processors 21 connected by the feature vector bus 24 to the feature vector buffer 20. Each lexical tree processor 21 is capable of processing a different lexical tree and thus Nk lexical trees can be processed in parallel. The acoustic model memories 23 store as a whole a complete set of lexical data, i.e. lexical tree data structures for use in the lexical tree processing by the lexical tree processors 21. Each acoustic model memory 23 contains part or a segment of the lexical tree data. Since lexical tree processors 21 in a lexical tree processor cluster 22 access the same acoustic model memory 23, it is possible for more than one lexical tree processor 21 to process the same lexical data. This provides for some degree of flexibility in the controlling of the processing by the lexical tree processors 21. Further, the acoustic model memories 23 need not contain only one copy of the lexical data. It is possible to build in a redundancy in the data to further enhance the flexibility. This avoids any bottleneck in the processing due to the search processing focusing on a small number of lexical trees.

A results memory 25 is provided for storing processing results from the lexical tree processors 21 which are received over the path score and history bus 26. The results memory 25 also stores information on lexical trees to identify which lexical trees are to be processed. A search controller 27 is provided to control the processing performed by the lexical tree processors 21 in dependence upon a program and data stored in program and data memory 28. The search controller reads the path scores and lexical tree identifiers from the results memory and controls the lexical tree processors accordingly. A language model processor 29 is provided which is connected to each lexical tree processor 21 by a language model bus 30. The language model processor 29 accesses a language model memory 31 to read language model data for provision to lexical tree processors 21 in response to language model data requests. External control of the language model memory 31 is provided by a word constraints input. The language model processor 29 determines a score for a word occurring following N previous words using N grams. When a lexical tree processor requires a language model score a request is sent to the language model processor 29 over the language model bus 30 identifying the current word and the N-1 previous words. A language model score for the N gram can be returned to the lexical tree processor 21 for the modification of the score at the end of a branch of lexical tree processing. The lexical tree processor can modify the score in accordance with the language model and output a score to the results memory 25 for a word at the end of a branch of the lexical tree proces

3. Best scores for best paths being processed by each lexical tree processor 21. This information enables the search controller 27 to monitor the processing being performed by lexical tree processors 21 to determine whether a global pruning strategy should be applied in order to reassign processing performed by a lexical tree processor if its best score for its best path is below a threshold or well below the best scores for the paths being processed by other lexical tree processors 21.

4. Temporary lexical tree scores. These comprise tree scores which are determined as temporary scores to prune the next lexical trees to be processed at word ends. The temporary lexical tree scores include lexical tree identifiers or pointers to identify the next lexical trees to be processed. The scores enable the pruning of this list.

5. Pruning threshold. This can be a global threshold value for use in the pruning of the lexical trees globally, or a local threshold value for use by a lexical processor for locally pruning the processing performed by the lexical processor 21.
The acoustic model memory 23 stores a Hidden Markov Model for acoustically modelling words as lexical trees. The acoustic model memory 23 stores a plurality of lexical tree data structures. Each lexical tree data structure comprises an n phone model of a number of words having common prefix phones. The first node of the lexical tree (the root) comprises an n phone common to all words in the lexical tree and uniquely identifies the lexical tree.

Each lexical tree processor 21 includes on-board memory (or local memory) to be used during the lexical tree processing. This working memory has to store all of the parameters for the lexical trees it is currently working on, including current scores for all paths being processed within the lexical tree, and the previous N-1 words for the lexical tree. The local storage of the previous N-1 words enables the lexical tree processor 21, when a word end is reached along a branch of the lexical tree, to send a request for the language model score for an N gram by sending the identity of the N-1 previous words together with the word identified at the end of the branch.
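A toy Python sketch of this word-end exchange is shown below; the trigram table, class name and back-off value are hypothetical stand-ins used only to illustrate sending the N-1 previous words with the word identified at the branch end and receiving a language model score in return.

    import math

    class LanguageModelProcessor:
        """Toy N-gram scorer standing in for a language model processor."""
        def __init__(self, ngram_logprobs, backoff=-10.0):
            self.ngram_logprobs = ngram_logprobs
            self.backoff = backoff

        def score(self, previous_words, word):
            """Return a log probability for `word` given the N-1 previous words."""
            return self.ngram_logprobs.get(tuple(previous_words) + (word,), self.backoff)

    # Illustrative trigram table (N = 3, so requests carry the 2 previous words).
    LM = LanguageModelProcessor({
        ("YOU", "HARD", "ROCK"): math.log(0.2),
        ("YOU", "HARD", "ROT"):  math.log(0.01),
    })

    # A lexical tree processor reaching a word end sends its N-1 word history plus the word:
    acoustic_score = -55.0
    lm_score = LM.score(["YOU", "HARD"], "ROCK")
    print("combined path score:", acoustic_score + lm_score)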
`
`
`
FIGS. 3a and 3b schematically illustrate lexical trees which can be processed by the lexical tree processor 21 during the recognition of the two words HARD ROCK. In FIG. 3a a previous lexical tree terminated at a branch recognizing the word YOU and terminating with the mono phone uw, which is associated with two context dependent tri-phones y-uw+k and y-uw+h. Thus the context dependent tri-phone associated with the last phone in the lexical tree word model for YOU indicates two possible next lexical trees, i.e. the lexical trees beginning with the mono phones k and h and having a left
`branch of the lexical tree proces