Catchpole

(10) Patent No.: US 10,971,140 B2
(45) Date of Patent: Apr. 6, 2021

(54) SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

(71) Applicant: Zentian Limited, Cambridge (GB)
(72) Inventor: Mark Catchpole, Prickwillow (GB)
(73) Assignee: Zentian Limited, Cambridge (GB)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 16/266,265
(22) Filed: Feb. 4, 2019

(65) Prior Publication Data
US 2019/0172447 A1, Jun. 6, 2019

(63) Related U.S. Application Data
Continuation of application No. 15/392,396, filed on Dec. 28, 2016, now Pat. No. 10,217,460, which is a
(Continued)

(30) Foreign Application Priority Data
Feb. 4, 2002 (GB) 0202546

(51) Int. Cl.
G10L 15/32 (2013.01)
G10L 15/187 (2013.01)
(Continued)

(52) U.S. Cl.
CPC: G10L 15/187 (2013.01); G10L 15/05 (2013.01); G10L 15/34 (2013.01)

(58) Field of Classification Search
CPC: G10L 15/32; G10L 15/34
See application file for complete search history.

(56) References Cited
U.S. PATENT DOCUMENTS
4,977,599 A * 12/1990 Bahl; G10L 15/02; 704/256.4
4,980,918 A * 12/1990 Bahl; G10L 15/08; 704/240
(Continued)
FOREIGN PATENT DOCUMENTS
GB 2 112 194 A 7/1983
GB 2 331 392 A 5/1999
(Continued)
OTHER PUBLICATIONS
R.M. Woodward et al., “Tissue Classification Using Terahertz Pulsed Imaging”, Proceedings of the SPIE, The International Society for Optical Engineering, SPIE-Int. Soc. Opt. Eng., USA, vol. 5318, No. 1, 2004, pp. 23-33. Abstract also attached.
(Continued)

Primary Examiner: Daniel Abebe
(74) Attorney, Agent, or Firm: Finnegan Henderson Farabow Garrett & Dunner LLP
`( 57 )
`ABSTRACT
`A speech recognition circuit comprises an input buffer for
`receiving processed speech parameters . A lexical memory
`contains lexical data for word recognition . The lexical data
`comprises a plurality of lexical tree data structures . Each
`lexical tree data structure comprises a model of words
`having common prefix components . An initial component of
`each lexical tree structure is unique . A plurality of lexical
`tree processors are connected in parallel to the input buffer
`for processing the speech parameters in parallel to perform
`parallel lexical tree processing for word recognition by
`accessing the lexical data in the lexical memory . A results
`memory is connected to the lexical tree processors for
`storing processing results from the lexical tree processors
`and lexical tree identifiers to identify lexical trees to be
`processed by the lexical tree processors . A controller con
`trols the lexical tree processors to process lexical trees
`( Continued )
`
`HARD
`
`reder
`r - d +
`
`d - stao
`
`d - tao
`
`r - ao + t
`r - ao + k
`ao
`
`l - a0 ++
`l - ao + k
`ao ,
`
`ao - t + sil
`
`ao - k + sil
`k
`
`ao - t + sil
`
`t
`
`ao - k + sil
`K
`
`LEXICAL TREE 3
`
`ROT
`
`ROCK
`
`LEXICAL TREE 4
`LOT
`
`LOCK
`
`IPR2023-00037
`Apple EX1001 Page 1
`
`
`
`US 10,971,140 B2
`Page 2
`
`identified in the results memory by performing parallel
`processing on a plurality of said lexical tree data structures .
`8 Claims , 6 Drawing Sheets
`
Related U.S. Application Data
continuation of application No. 14/309,476, filed on Jun. 19, 2014, now Pat. No. 9,536,516, which is a continuation of application No. 13/253,223, filed on Oct. 5, 2011, now Pat. No. 8,768,696, which is a continuation of application No. 12/554,607, filed on Sep. 4, 2009, now Pat. No. 8,036,890, which is a continuation of application No. 10/503,463, filed as application No. PCT/GB03/00459 on Feb. 4, 2003, now Pat. No. 7,587,319.

(51) Int. Cl.
G10L 15/34 (2013.01)
G10L 15/05 (2013.01)

(56) References Cited
U.S. PATENT DOCUMENTS
5,293,213 A 3/1994 Klein et al.
5,349,645 A 9/1994 Zhao
5,457,768 A 10/1995 Tsuboi
5,579,436 A * 11/1996 Chou; G10L 15/063; 704/236
5,621,859 A 4/1997 Schwartz
5,710,866 A * 1/1998 Alleva; G10L 15/10; 704/246
5,832,428 A 11/1998 Chow
5,881,312 A 3/1999 Dulong
5,960,395 A 9/1999 Tzirkel-Hancock
5,983,180 A * 11/1999 Robinson; G10L 15/142; 704/254
5,995,930 A 11/1999 Hab-Umbach et al.
6,047,283 A 4/2000 Braun
6,374,222 B1 4/2002 Kao
6,526,380 B1 2/2003 Thelen
7,024,359 B2 * 4/2006 Chang; G10L 15/065; 704/251
7,035,802 B1 4/2006 Rigazio
7,065,488 B2 * 6/2006 Yajima; G10L 15/20; 704/226
7,120,582 B1 10/2006 Young et al.
7,174,037 B2 2/2007 Arnone et al.
7,899,669 B2 3/2011 Gadbois
2001/0011218 A1 8/2001 Phillips et al.
2002/0143531 A1 10/2002 Kahn
2003/0178584 A1 9/2003 Arnone et al.
2005/0119883 A1 * 6/2005 Miyazaki; G10L 15/142; 704/231
2008/0255839 A1 10/2008 Larri et al.

FOREIGN PATENT DOCUMENTS
GB 2 347 835 A 9/2000
GB 2 380 920 A 4/2003
WO WO 00/50859 8/2000
WO WO-03/067572 A2 8/2003

OTHER PUBLICATIONS
V.P. Wallace et al., “Biomedical Applications of THz Imaging”, Microwave Symposium Digest, 2004 IEEE MTT-S International, Fort Worth, TX, USA, Jun. 6-11, 2004, Piscataway, NJ, USA, IEEE, vol. 3, Jun. 6, 2004, pp. 1579-1581.
V.P. Wallace et al., “Biomedical Applications of Terahertz Pulse Imaging”, Second Joint EMBS-BMES Conference 2002, Conference Proceedings, 24th Annual International Conference of the Engineering in Medicine and Biology Society, Annual Fall Meeting of the Biomedical Engineering Society, Houston, TX, Oct. 23-26, 2002, Annual vol. 1 of 3, Conf. 24, Oct. 23, 2002, pp. 2333-2334.
A.J. Fitzgerald et al., “Terahertz Imaging of Breast Cancer, a Feasibility Study”, Conference Digest of the 2004 Joint 29th International Conference on Infrared and Millimeter Waves and 12th International Conference on Terahertz Electronics, IEEE, Piscataway, NJ, USA, 2004, pp. 823-824.
B.E. Cole et al., “Terahertz Imaging and Spectroscopy of Human Skin, In-vivo”, Proceedings of SPIE, vol. 4276, pp. 1-10.
S. Glinski et al., “Spoken Language Recognition on a DSP Array Processor”, IEEE Transactions on Parallel and Distributed Systems, Jul. 5, 1994, No. 7, New York, USA, pp. 697-703.
S. Chatterjee et al., “Connected Speech Recognition on a Multiple Processor Pipeline”, ICASSP 89, May 23, 1989, Glasgow, UK, pp. 774-777.
S.H. Chung et al., “A Parallel Phoneme Recognition Algorithm Based on Continuous Hidden Markov Model”, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, IPPS/SPDP 1999, IEEE Comput. Soc., pp. 453-457 (1999).
S.H. Chung et al., “A Parallel Computation Model for Integrated Speech and Natural Language Understanding”, IEEE Transactions on Computers, vol. 42, No. 10, Oct. 1, 1993.
N. Deshmukh et al., “Hierarchical Search for Large-Vocabulary Conversational Speech Recognition: Working Toward a Solution to the Decoding Problem”, IEEE Signal Processing Magazine, vol. 16, No. 5, pp. 84-107 (Sep. 1999).
* cited by examiner
`
`
`
`
[Fig 1, Sheet 1 of 6: front-end block diagram. Anti-aliasing filter; ADC (fs = 48 kHz, 20 bit); normalise; 10 ms data frame (48000 × 10 × 10⁻³ = 480 samples); pre-emphasis; (Hamming) window; FFT 512 pt.; filterbank (12 freqs); energy; 13 feature vectors; d/dt (first and second derivatives); 39 feature vectors @ 200 Hz.]
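
The frame arithmetic printed in Fig 1 can be carried through to the figure's other quantities. The 5 ms frame step comes from the description of FIG. 1 (a 10 ms frame every 5 ms), and the 39-element vector from its 12 filterbank outputs plus energy with first and second derivatives; the symbols below are introduced only for illustration:

```latex
\begin{aligned}
N_{\text{frame}} &= f_s\,T_{\text{frame}} = 48\,000 \times 10 \times 10^{-3} = 480\ \text{samples} \\
N_{\text{hop}}   &= f_s\,T_{\text{hop}}   = 48\,000 \times 5 \times 10^{-3} = 240\ \text{samples} \\
f_{\text{out}}   &= 1 / T_{\text{hop}}    = 1 / (5 \times 10^{-3}\ \text{s}) = 200\ \text{Hz} \\
D &= (\underbrace{12}_{\text{filterbank}} + \underbrace{1}_{\text{energy}}) \times \underbrace{3}_{\text{static},\ \Delta,\ \Delta^2} = 39
\end{aligned}
```

Because the 512-point FFT is longer than the 480-sample frame, the frame is presumably zero-padded before the transform; the text does not state this explicitly.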
`
`
`
`
[Fig 2, Sheet 2 of 6: block diagram of the search engine. Inputs labelled "feature vector" and "word constraints"; a feature vector buffer feeds a feature vector bus to Lexical Tree Processor Clusters 1 to N, each containing lexical tree processors 1, 2, ... and an Acoustic Model Memory. Also labelled: path score and history bus; Results Memory; Search Controller with program & data memory; Lexical Tree Processor control bus ("to all Lexical Tree Processors"); tree identifiers; path scores; Language Model Processor; Language Model Memory; language model bus; output "N best sentences".]
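
FIG. 2 groups k lexical tree processors 21 into each of N clusters 22, and every cluster shares one acoustic model memory 23 holding part of the lexical data, so a tree can only run in the cluster whose memory holds it but on any processor within that cluster. The following is a minimal sketch of how a controller might route trees under that constraint; the `cluster_of_tree` lookup and the round-robin policy inside a cluster are illustrative assumptions, not the patent's scheduling method.

```python
from collections import defaultdict
from itertools import cycle


def assign_trees(tree_ids, cluster_of_tree, processors_per_cluster):
    """Route each lexical tree to a (cluster, processor) slot.

    cluster_of_tree is a hypothetical lookup from tree id to the cluster whose
    acoustic model memory holds that tree's data. Because every processor in a
    cluster reads the same shared memory, any of them can take the tree; here
    trees are simply dealt round-robin (an illustrative policy only).
    """
    per_cluster = defaultdict(list)            # cluster -> trees, in arrival order
    for tree in tree_ids:
        per_cluster[cluster_of_tree[tree]].append(tree)

    assignment = {}
    for cluster, trees in per_cluster.items():
        slots = cycle(range(processors_per_cluster))   # processors 0..k-1
        for tree in trees:
            assignment[tree] = (cluster, next(slots))
    return assignment


# N = 2 clusters with k = 2 processors each.
print(assign_trees([1, 2, 3, 4, 5], {1: 0, 2: 0, 3: 1, 4: 0, 5: 1}, 2))
# {1: (0, 0), 2: (0, 1), 4: (0, 0), 3: (1, 0), 5: (1, 1)}
```

Tree 4 lands on the same processor as tree 1, which mirrors the text's remark that one lexical tree processor may operate on more than one lexical tree at the same time.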
`
`
`
`
[Figs 3a and 3b, Sheet 3 of 6: lexical tree structures labelled with triphones in l-c+r notation. Fig 3a: lexical trees 1 and 2 following the word YOU (y-uw+k, y-uw+h), covering CARD, HEART, HART and HARD (e.g. uw-k+aa, k-aa+r, h-aa+r, aa-r+t, aa-r+d). Fig 3b: lexical trees 3 and 4 following the word HARD (e.g. r-d+r), covering ROT, ROCK, LOT and LOCK (e.g. d-r+ao, r-ao+t, r-ao+k, l-ao+t, l-ao+k, ao-t+sil, ao-k+sil).]
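
In Figs 3a and 3b, words that share leading phones (ROCK/ROT, LOT/LOCK) share tree nodes, and the first node of each tree is expanded into a cross-word triphone that depends on the last phone of the previous word (for example d-r+ao after HARD). The sketch below assumes a toy monophone lexicon read off the figure and nested dicts as the tree representation, in line with the text's remark that a mono phone lexical tree can be used to generate context dependent phone models dynamically; the function names are hypothetical.

```python
def build_lexical_trees(lexicon):
    """Group words into prefix trees keyed by their initial (mono)phone.

    lexicon maps word -> list of monophones (toy pronunciations taken from the
    figure). Each tree is a nested dict: words sharing leading phones share
    nodes, and the key "#words" lists the words that end at a node.
    """
    trees = {}
    for word, phones in lexicon.items():
        node = trees.setdefault(phones[0], {})      # one tree per initial phone
        for phone in phones[1:]:
            node = node.setdefault(phone, {})       # shared prefix -> shared node
        node.setdefault("#words", []).append(word)
    return trees


def triphone(left, centre, right):
    """Context-dependent phone label in the figure's l-c+r notation."""
    return f"{left}-{centre}+{right}"


def entry_triphones(trees, initial, left_context):
    """Expand a tree's first node into cross-word triphones on demand,
    given the last phone of the preceding word."""
    return sorted(triphone(left_context, initial, nxt)
                  for nxt in trees[initial] if nxt != "#words")


lexicon = {"ROCK": ["r", "ao", "k"], "ROT": ["r", "ao", "t"],
           "LOT": ["l", "ao", "t"], "LOCK": ["l", "ao", "k"]}
trees = build_lexical_trees(lexicon)
print(sorted(trees))                        # ['l', 'r']: two trees, as in Fig 3b
print(entry_triphones(trees, "r", "d"))     # ['d-r+ao'] after HARD, which ends in d
```

Storing only monophones and building triphone labels at lookup time is the storage saving the text describes; a context-dependent tree would instead store labels such as d-r+ao directly.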
`
`
`
`
[Fig 4, Sheet 4 of 6: flow diagram. S1 Initialise; S2 Data for a new lexical tree received from search controller?; S3 Feature vector available in buffer? (No: S4 Error); S5 Read feature vector from buffer; S6 Evaluate state transitions for first lexical tree node using acoustic model data; S7 Determine score for first lexical tree node; S8 Request language model score using previous N-1 words in received data and lexical tree data in acoustic model memory; S9 Receive language model scores for words in the lexical tree and pick highest score; S10 Generate temporary lexical tree score using the score for first lexical tree node and the highest language model score; S11 Send score to results memory as temporary lexical tree score.]
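
Steps S8 to S10 of Fig 4 combine the acoustic score of a tree's first node with the best language model score among the words the tree can still produce, a language model look-ahead. A minimal sketch, assuming log-domain scores so that combining means adding, and a hypothetical `lm_logprob(history, word)` n-gram lookup on the previous N-1 words:

```python
import math


def temporary_tree_score(first_node_score, tree_words, history, lm_logprob):
    """Combine the first-node acoustic score with the best LM score (S8-S10).

    Scores are treated as log probabilities, so combining is addition; that
    choice is an assumption of this sketch. lm_logprob(history, word) is a
    hypothetical n-gram lookup on the previous N-1 words.
    """
    best_lm = max((lm_logprob(history, w) for w in tree_words),
                  default=-math.inf)                  # S9: pick highest LM score
    return first_node_score + best_lm                 # S10: temporary tree score


BIGRAM = {("HARD", "ROCK"): -1.0, ("HARD", "ROT"): -2.5}   # toy values


def lookup(history, word):
    """Toy bigram lookup: only the last word of the history is used."""
    return BIGRAM.get((history[-1], word), -10.0)


print(temporary_tree_score(-3.0, ["ROCK", "ROT"], ("YOU", "HARD"), lookup))  # -4.0
```

Because the maximum is taken over all words in the tree, the temporary score is an optimistic bound, which is what makes it usable for pruning candidate trees before full processing.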
`
`
`
`
[Fig 5, Sheet 5 of 6: flow diagram. S20 Start; S21 Data for a lexical tree received from search controller?; S22 Feature vector available in buffer? (No: S23 Error); S24 Read feature vector from buffer; S25 Evaluate state transitions for each path using acoustic model data; S26 Determine scores for paths, send best score to results memory and store path histories locally; S27 Pruning applied to lexical tree to delete paths; S28 Path(s) reached word end?; S29 Apply language model score; S30 Send score and history to results memory; S31 Delete path(s) and history data; S32 Any paths left to process?; S33 Message sent to search controller to indicate that lexical tree has been processed.]
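
The loop in Fig 5 is essentially token-passing Viterbi over one lexical tree with beam pruning. The sketch below follows its step order under simplifying assumptions that are not in the source: each tree node is a single HMM state, transition log scores are fixed constants, a word end is reported as soon as a surviving path reaches a node carrying a word, and `emission` and `lm_logprob` are hypothetical callables.

```python
import math


def process_tree(nodes, frames, emission, beam, lm_logprob, history,
                 self_loop=-0.5, advance=-0.7):
    """Token-passing Viterbi over one lexical tree, following the Fig 5 outline.

    nodes: node id -> (parent id or None, word or None). One HMM state per node
    and the fixed log transition scores are simplifying assumptions.
    emission(node, frame) -> log acoustic score; lm_logprob(history, word) ->
    log language-model score. Returns (frame index, word, score) word ends.
    """
    children = {}
    for node, (parent, _) in nodes.items():
        children.setdefault(parent, []).append(node)

    paths, results = {}, []
    for t, frame in enumerate(frames):                      # S24: next feature vector
        if t == 0:
            new = {root: 0.0 for root in children.get(None, [])}  # enter the tree
        else:
            new = {}
            for node, score in paths.items():               # S25: state transitions
                moves = [(node, self_loop)] + [(c, advance) for c in children.get(node, [])]
                for nxt, cost in moves:
                    new[nxt] = max(new.get(nxt, -math.inf), score + cost)  # Viterbi max
        for node in new:                                    # S26: add acoustic scores
            new[node] += emission(node, frame)
        best = max(new.values(), default=-math.inf)
        paths = {n: s for n, s in new.items() if s >= best - beam}  # S27: beam pruning
        for node, score in list(paths.items()):             # S28: word end reached?
            word = nodes[node][1]
            if word is not None:
                results.append((t, word, score + lm_logprob(history, word)))  # S29-S30
                if node not in children:                    # S31: leaf path is finished
                    del paths[node]
        if not paths:                                       # S32: no paths left
            break
    return results
```

For example, with the tree r → ao → {k: ROCK, t: ROT}, frames ["r", "ao", "k"], an emission score of 0 for a matching phone and -5 otherwise, a beam of 3 and a flat language model score of -1, only the ROCK path survives pruning and a single word end (2, "ROCK", -2.4) is returned.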
`
`
`
`
[Fig 6, Sheet 6 of 6: flow diagram. S40 Initialisation; S41 Read initial lexical tree data in results memory; S42 Initial lexical tree data distributed amongst lexical tree processors for temporary lexical tree score determination; S43 Temporary lexical tree scores returned to results memory; S44 Prune the lexical trees in the results memory on basis of temporary lexical tree scores; S45 Lexical tree processing distributed amongst lexical tree processors; S46 History and score entered in results memory for a word?; S47 Determine next possible lexical trees using cross word triphones; S48 Lexical tree data distributed amongst lexical tree processors for temporary lexical tree score determination; S49 Temporary lexical tree scores returned to results memory; S50 Prune the lexical trees in the results memory on basis of temporary lexical scores; S51 All lexical tree processors finished processing and no lexical trees in results memory?; S52 Results output.]
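
The controller loop of Fig 6 can be sketched sequentially: score candidate trees with temporary scores, prune them, hand the survivors to processors, and at each word end look up the next possible trees. In this sketch the queue stands in for the parallel processors, `temporary_score`, `process_tree` and `successors` are hypothetical hooks for the work described in Figs 4 and 5 and step S47, and ending a hypothesis when no successor tree survives pruning is an assumption made only so the toy search terminates.

```python
from collections import deque


def prune(temp_scores, beam):
    """S44/S50: keep trees whose temporary score is within `beam` of the best."""
    if not temp_scores:
        return []
    best = max(temp_scores.values())
    kept = [tree for tree, score in temp_scores.items() if score >= best - beam]
    return sorted(kept, key=temp_scores.get, reverse=True)   # best first


def run_search(initial_trees, temporary_score, process_tree, successors, beam):
    """Sequential stand-in for the Fig 6 control loop (no real parallelism).

    Hypothetical hooks standing in for the lexical tree processors:
      temporary_score(tree, history) -> float               (Fig 4)
      process_tree(tree, history) -> [(word, score), ...]   (Fig 5 word ends)
      successors(word) -> candidate next trees              (S47)
    A history is a tuple of (word, score) pairs.
    """
    initial = {t: temporary_score(t, ()) for t in initial_trees}   # S41-S43
    pending = deque((t, ()) for t in prune(initial, beam))         # S44
    finished = []
    while pending:                                   # S51: trees still queued
        tree, history = pending.popleft()            # S45: hand tree to a processor
        for word, score in process_tree(tree, history):            # S46
            new_history = history + ((word, score),)
            cands = {t: temporary_score(t, new_history)
                     for t in successors(word)}                     # S47-S49
            kept = prune(cands, beam)                               # S50
            if kept:
                pending.extend((t, new_history) for t in kept)
            else:
                finished.append(new_history)  # assumption: no successor ends a hypothesis
    return finished                                                 # S52
```

With a beam of 3, a word HARD whose candidate successor trees score -2 and -9 keeps only the first; the second is discarded before any processor spends time on it, which is the point of the temporary lexical tree scores.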
`
SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

This is a continuation of application Ser. No. 15/392,396, filed Dec. 28, 2016, which is a continuation of application Ser. No. 14/309,476, filed Jun. 19, 2014, now U.S. Pat. No. 9,536,516, which is a continuation of application Ser. No. 13/253,223, now U.S. Pat. No. 8,768,696, filed Oct. 5, 2011, which is a continuation of application Ser. No. 12/554,607, now U.S. Pat. No. 8,036,890, filed Sep. 4, 2009, which is a continuation of application Ser. No. 10/503,463, now U.S. Pat. No. 7,587,319, filed May 24, 2005, which is a 371 of International Application No. PCT/GB2003/000459, filed Feb. 4, 2003, which claims foreign priority to United Kingdom Application No. 0202546.8 filed Feb. 4, 2002, the disclosures of which are incorporated herein by reference in their entireties.

The present invention generally relates to a speech recognition circuit which uses parallel processors for processing the input speech data in parallel.

Conventional large vocabulary speech recognition can be divided into two processes: front end processing to generate processed speech parameters such as feature vectors, followed by a search process which attempts to find the most likely set of words spoken from a given vocabulary (lexicon).

The front end processing generally represents no problem for current processing systems. However, for large vocabulary, speaker independent speech recognition, it is the search process that presents the biggest challenge. An article by Deshmukh et al entitled “Hierarchical Search for Large-Vocabulary Conversational Speech Recognition” (IEEE Signal Processing Magazine, September 1999, pages 84 to 107), the content of which is hereby incorporated by reference, discusses the general concepts of large vocabulary speech recognition. As discussed in this paper, one algorithm for performing the search is the Viterbi algorithm. The Viterbi algorithm is a parallel or breadth first search through a transition network of states of Hidden Markov Models. Acoustic models for words in a lexicon are represented as states of Hidden Markov Models. These states represent phones or n phones in a phone model of the words. The search requires the evaluation of possible word matches. It is known that such a search is computationally intensive.

In order to speed up the processing performed during such a search in a speech recognition system, parallel processing has been explored. In an article by M K Ravishankar entitled “Parallel Implementation of Fast Beam Search for Speaker-Independent Continuous Speech Recognition” (Indian Institute of Science, Bangalore, India, Jul. 16, 1993) a multi-threaded implementation of a fast beam search algorithm is disclosed. The multi-threading implementation requires a significant amount of communication and synchronization among threads. In an MSc project report by R. Dujari entitled “Parallel Viterbi Search Algorithm for Speech Recognition” (MIT, February 1992) the parallel processing of input speech parameters is disclosed in which a lexical network is split statically among processors.

It is an object of the present invention to provide an improved circuit which can perform parallel processing of speech parameters.

In accordance with a first embodiment of the present invention, a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A lexical memory arrangement is provided which contains lexicon data for word recognition. The lexical data comprises a plurality of lexical tree data structures representing a plurality of lexical trees. Each lexical tree data structure comprises a model of words having common prefix components and an initial component which is unique as an initial component for lexical trees. A plurality of lexical tree processors are connected in parallel to the input port and perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory arrangement. A results memory arrangement is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory arrangement by performing parallel processing of a plurality of lexical tree data structures.

Thus in accordance with this embodiment of the present invention, the processing in order to perform word recognition is distributed across the processors by controlling the processors to perform processing on different lexical trees. The controller controls the processors so as to provide for efficient process management by distributing lexical processing to appropriate processors.

The lexical tree data structure can comprise a phone model of words, wherein the components comprise phones. For reduced storage, the lexical tree data structure can comprise a mono phone lexical tree. The mono phone lexical tree can be used to generate context dependent phone models dynamically. This enables the use of context dependent phone models for matching and hence increased accuracy whilst not increasing memory requirements. Alternatively, the lexical tree data structure can comprise context dependent phone models.

The processing performed by each processor in one embodiment comprises the comparison of the speech parameters with the lexical data, e.g. phone models or data derived from the lexical data (e.g. dynamically generated context dependent phone models), to identify words as a word recognition event and to send information identifying the identified words to the results memory as the processing results. In this embodiment a language model processor arrangement can be provided for providing a language model output for modifying the processing results at a word recognition event by a lexical tree processor. The modification can either take place at each lexical tree processor, or at the language model processing arrangement.

In one embodiment each lexical tree processor determines an output score for words in the processing results at word recognition events. Thus in this embodiment the language model processing arrangement can modify the score using a score for a language model for n preceding words, where n is an integer.

In one embodiment the controller instructs a lexical tree processor to process a lexical tree by passing a lexical tree identifier for the lexical tree and history data for a recognition path associated with the lexical tree from the results memory. The history data preferably includes an accumulated score for the recognition path. This enables a score to be determined based on the score for the recognition path to accumulate a new score during recognition carried out using the lexical tree data structure. The scores can be output in the processing results to the results memory during the processing of the speech parameters so that the scores can be used for pruning.

In one embodiment of the present invention, each lexical tree processor operates on more than one lexical tree at the same time, e.g. two lexical trees represented by two different lexical tree data structures, or two lexical trees represented
by the same data structure but displaced in time (which can be termed two instances of the same lexical tree).

At word recognition events, the controller determines new lexical tree identifiers for storing in the results memory for words identified in the results memory for respective word events. In order to reduce the processing, the controller can prune the new lexical tree identifiers to reduce the number of lexical trees which are required to be processed. This pruning can be achieved using context dependent n phones to reduce the number of possible next phones. The number can be further reduced by using a language model look ahead technique.

In one embodiment of the present invention, the lexical tree processors are arranged in groups or clusters. The lexical memory arrangement comprises a plurality of partial lexical memories. Each partial lexical memory is connected to one of the groups of lexical tree processors and contains part of the lexical data. Thus a group of lexical tree processors and a partial lexical memory form a cluster. Each lexical tree processor is operative to process the speech parameters using a partial lexical memory, and the controller controls each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In another embodiment of the present invention the lexical memory arrangement comprises a plurality of partial lexical memories. Each partial lexical memory is connected to one of the lexical tree processors and contains part of the lexical data. Each lexical tree processor processes the speech parameters using a corresponding partial lexical memory, and the controller is operative to control each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In one embodiment of the present invention the lexical memory arrangement stores the lexical tree data structures as Hidden Markov Models and the lexical tree processors are operative to perform the Viterbi search algorithm using each respective lexical tree data structure. Thus in this way, this embodiment of the present invention provides a parallel Viterbi lexical tree search process for speech recognition.

The first aspect of the present invention is a special purpose circuit built for performing the speech recognition search process in which there are a plurality of processors for performing parallel lexical tree processing on individual lexical tree processors.

In another aspect of the present invention a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A plurality of lexical memories are provided which contain in combination complete lexical data for word recognition. Each lexical memory contains part of the complete lexical data. A plurality of processors are provided connected in parallel to the input port for processing the speech parameters in parallel. The processors are arranged in groups in which each group is connected to a corresponding lexical memory to form a cluster. A controller controls each processor to process the speech parameters using partial lexical data read from a corresponding lexical memory. The results of processing the speech parameters are output from the processors as recognition data.

Thus this aspect of the present invention provides a circuit in which speech recognition processing is performed in parallel by groups of processors operating in parallel in which each group accesses a common memory of lexical data. This aspect of the present invention provides the advantage of parallel processing of speech parameters and benefits from a limited segmentation of the lexical data. By providing a plurality of processors in a group with a common memory, flexibility in the processing is provided without being bandwidth limited by the interface to the memory that would occur if only a single memory were used for all processors. The arrangement is more flexible than the parallel processing arrangement in which each processor only has access to its own local memory, and requires fewer memory interfaces (i.e. chip pins). Each processor within a group can access the same lexical data as any other processor in the group. The controller can thus control the parallel processing of input speech parameters in a more flexible manner. For example, it allows more than one processor to process input speech parameters using the same lexical data in a lexical memory. This is because the lexical data is segmented into domains which are accessible by multiple processors.

In a preferred embodiment this aspect of the present invention is used in combination with the first aspect of the present invention. In such an arrangement each processor performs lexical tree processing and the lexical data stored in each lexical memory comprises lexical tree data structures which each comprise a model of words having common prefix components and an initial component that is unique.

In preferred embodiments of the second aspect of the present invention, the preferred embodiments of the first aspect of the present invention are incorporated.

Embodiments of the present invention will now be described with reference to the accompanying drawings in which:

FIG. 1 is a diagram of a speech data processing circuit for generating parameterized speech data (feature vectors);

FIG. 2 is a diagram of a speech recognition circuit in accordance with an embodiment of the present invention;

FIGS. 3a and 3b are schematic diagrams illustrating lexical tree structures;

FIG. 4 is a flow diagram illustrating the process performed by a lexical tree processor to determine a temporary lexical tree score in accordance with an embodiment of the present invention;

FIG. 5 is a flow diagram illustrating the process performed by the lexical tree processor for processing the input feature vectors in accordance with an embodiment of the present invention; and

FIG. 6 is a flow diagram illustrating the process performed by the controller in accordance with an embodiment of the present invention.

FIG. 1 illustrates a typical circuit for the parameterization of input speech data. In this embodiment the parameters generated are speech vectors.

A microphone 1 records speech in an analogue form and this is input through an anti-aliasing filter 2 to an analogue-to-digital converter 3 which samples the speech at 48 kHz at 20 bits per sample. The digitized output signal is normalized (4) to generate a 10 millisecond data frame every 5 milliseconds with 5 milliseconds overlap (5). A pre-emphasis operation 6 is applied to the data followed by a Hamming window 7. The data is then fast Fourier transformed (FFT) using a 512 point fast Fourier transform (8) before being filtered by filter bank 9 into 12 frequencies. The energy in the data frame 5 is also recorded (13) as an additional feature and, together with the 12 frequency outputs of the filter bank 9, 13 feature vectors (10) are thus produced and these are output as part of the 39 feature vectors 14. First and second derivatives (11 and 12) are taken of the 13 feature vectors 10 to complete the generation of the 39 feature vectors 14.
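
A minimal pure-Python sketch of the framing, pre-emphasis, windowing and derivative stages of FIG. 1 described above. The FFT, 12-band filterbank and energy stages are left out for brevity; the pre-emphasis coefficient 0.97 is a commonly used value that the text does not give, and the central-difference derivative is one simple choice among several. All function names are hypothetical.

```python
import math

FS = 48_000                  # ADC sample rate (FIG. 1)
FRAME = FS * 10 // 1000      # 10 ms frame -> 480 samples
HOP = FS * 5 // 1000         # new frame every 5 ms -> 240 samples (200 frames/s)


def frames(signal, frame=FRAME, hop=HOP):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, hop)]


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(x):
    """Multiply by a Hamming window w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    last = len(x) - 1
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / last))
            for n, v in enumerate(x)]


def deltas(vectors):
    """Central-difference derivative across frames, repeating the edge frames."""
    padded = [vectors[0]] + vectors + [vectors[-1]]
    return [[(b - a) / 2 for a, b in zip(padded[i - 1], padded[i + 1])]
            for i in range(1, len(padded) - 1)]


def stack_39(static):
    """Append first and second derivatives to 13-element static vectors."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


print(len(frames([0.0] * FS)))   # 199 frames for one second of audio
```

One second of audio yields 199 full frames rather than 200 only because the last partial frame is dropped; the steady-state rate is one frame per 240-sample step, i.e. 200 Hz.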
`
The arrangement illustrated in FIG. 1 is purely given for illustration. The present invention encompasses any means by which speech and data can be parameterized to a suitable form for input to the search process as will be described in more detail hereinafter.

FIG. 2 is a schematic diagram of a speech recognition circuit in accordance with an embodiment of the present invention for performing the search process. The parameterized speech data, which in this embodiment comprise feature vectors, are input to a feature vector buffer 20. The feature vector buffer 20 is provided to buffer the incoming feature vectors to allow lexical tree processors 21 to read and process the feature vectors in the buffer 20 via a feature vector bus 24. A plurality k of lexical tree processors 21 are arranged in a respective lexical tree processor cluster 22. Each lexical tree processor cluster 22 has an acoustic model memory 23 in which is stored lexical data for use by the lexical tree processors 21 within the lexical tree processor cluster 22. Each lexical tree processor 21 in the lexical tree processor cluster 22 is connected to the acoustic model memory 23 within the lexical tree processor cluster 22. There are N lexical tree processor clusters and thus there are Nk lexical tree processors 21 connected by the feature vector bus 24 to the feature vector buffer 20. Each lexical tree processor 21 is capable of processing a different lexical tree and thus Nk lexical trees can be processed in parallel. The acoustic model memories 23 store as a whole a complete set of lexical data, i.e. lexical tree data structures for use in the lexical tree processing by the lexical tree processors 21. Each acoustic model memory 23 contains part or a segment of the lexical tree data. Since lexical tree processors 21 in a lexical tree processor cluster 22 access the same acoustic model memory 23, it is possible for more than one lexical tree processor 21 to process the same lexical data. This provides for some degree of flexibility in the controlling of the processing by the lexical tree processors 21. Further, the acoustic model memories 23 need not contain only one copy of the lexical data. It is possible to build in a redundancy in the data to further enhance the flexibility. This avoids any bottleneck in the processing due to the search processing focusing on a small number of lexical trees.

A results memory 25 is provided for storing processing results from the lexical tree proce

tree processor can modify the score in accordance with the language model and output a score to the results memory 25 for a word at the end of a branch of the lexical tree processing. Thus the results memory stores the results as an ordered list of scores for words together with their histories.

The results memory 25 stores the following data:

1. Initial lexical tree data. This comprises pointers to an initial set of lexical trees. No history data is associated with the initial set of lexical trees. The initial set of lexical trees is predetermined and stored in the results memory 25 based on the most likely initial phones of an utterance. This initial lexical tree data is required to initialize the search process.

2. History data for search results. This comprises a record of a recognition path through the lexical tree recognition process performed by the lexical tree processors 21. The history data includes the current word, the previous N-1 words, the current accumulated score, the phone history (for use in the determination of likely next lexical trees using cross word context dependent tri-phones), and an identifier or pointer to the lexical tree used for identifying the word.

3. Best scores for best paths being processed by each lexical tree processor 21. This information enables the search controller 27 to monitor the processing being performed by lexical tree processors 21 to determine whether a global pruning strategy should be applied in order to reassign processing performed by a lexical tree processor if its best score for its best path is below a threshold or well below the best scores for the paths being processed by other lexical tree processors 21.

4. Temporary lexical tree scores. These comprise tree scores which are determined as temporary scores to prune the next lexical trees to be processed at word ends. The temporary lexical tree scores include lexical tree identifiers or pointers to identify the next lexical trees to be processed. The scores enable the pruning of this list.

5. Pruning threshold. This can be a global threshold value for use in the pruning of the lexical trees globally, or a local threshold value for use by a lexical processor for locally pruning the processing performed by the lexical processor 21.

The acoustic model memory 23 stores a Hidden Markov Model for acoustically modelling words as lexical trees. The