(12) United States Patent
Catchpole

(10) Patent No.: US 10,971,140 B2
(45) Date of Patent: Apr. 6, 2021

(54) SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

(71) Applicant: Zentian Limited, Cambridge (GB)
(72) Inventor: Mark Catchpole, Prickwillow (GB)
(73) Assignee: Zentian Limited, Cambridge (GB)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 16/266,265
(22) Filed: Feb. 4, 2019

(65) Prior Publication Data
US 2019/0172447 A1, Jun. 6, 2019

(63) Related U.S. Application Data
Continuation of application No. 15/392,396, filed on Dec. 28, 2016, now Pat. No. 10,217,460, which is a continuation of application No. 14/309,476, filed on Jun. 19, 2014, now Pat. No. 9,536,516, which is a continuation of application No. 13/253,223, filed on Oct. 5, 2011, now Pat. No. 8,768,696, which is a continuation of application No. 12/554,607, filed on Sep. 4, 2009, now Pat. No. 8,036,890, which is a continuation of application No. 10/503,463, filed as application No. PCT/GB03/00459 on Feb. 4, 2003, now Pat. No. 7,587,319.

(30) Foreign Application Priority Data
Feb. 4, 2002 (GB) ........ 0202546

(51) Int. Cl.
G10L 15/32 (2013.01)
G10L 15/187 (2013.01)
G10L 15/34 (2013.01)
G10L 15/05 (2013.01)

(52) U.S. Cl.
CPC ........ G10L 15/187 (2013.01); G10L 15/05 (2013.01); G10L 15/34 (2013.01)

(58) Field of Classification Search
CPC ........ G10L 15/32; G10L 15/34
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
4,977,599 A * 12/1990 Bahl ........ G10L 15/02; 704/256.4
4,980,918 A * 12/1990 Bahl ........ G10L 15/08; 704/240
5,293,213 A 3/1994 Klein et al.
5,349,645 A 9/1994 Zhao
5,457,768 A 10/1995 Tsuboi
5,579,436 A * 11/1996 Chou ........ G10L 15/063; 704/236
5,621,859 A 4/1997 Schwartz
5,710,866 A * 1/1998 Alleva ........ G10L 15/10; 704/246
5,832,428 A 11/1998 Chow
5,881,312 A 3/1999 Dulong
5,960,395 A 9/1999 Tzirkel-Hancock
5,983,180 A * 11/1999 Robinson ........ G10L 15/142; 704/254
5,995,930 A 11/1999 Hab-Umbach et al.
6,047,283 A 4/2000 Braun
6,374,222 B1 4/2002 Kao
6,526,380 B1 2/2003 Thelen
7,024,359 B2 * 4/2006 Chang ........ G10L 15/065; 704/251
7,035,802 B1 4/2006 Rigazio
7,065,488 B2 * 6/2006 Yajima ........ G10L 15/20; 704/226
7,120,582 B1 10/2006 Young et al.
7,174,037 B2 2/2007 Arnone et al.
7,899,669 B2 3/2011 Gadbois
2001/0011218 A1 8/2001 Phillips et al.
2002/0143531 A1 10/2002 Kahn
2003/0178584 A1 9/2003 Arnone et al.
2005/0119883 A1 * 6/2005 Miyazaki ........ G10L 15/142; 704/231
2008/0255839 A1 10/2008 Larri et al.

FOREIGN PATENT DOCUMENTS
GB 2 112 194 A 7/1983
GB 2 331 392 A 5/1999
GB 2 347 835 A 9/2000
GB 2 380 920 A 4/2003
WO WO 00/50859 8/2000
WO WO-03/067572 A2 8/2003

OTHER PUBLICATIONS
R.M. Woodward et al., "Tissue Classification Using Terahertz Pulsed Imaging", Proceedings of the SPIE - The International Society for Optical Engineering, SPIE-Int. Soc. Opt. Eng., USA, vol. 5318, No. 1, 2004, pp. 23-33. Abstract also attached.
V.P. Wallace et al., "Biomedical Applications of THz Imaging", Microwave Symposium Digest, 2004 IEEE MTT-S International, Fort Worth, TX, USA, Jun. 6-11, 2004, Piscataway, NJ, USA, IEEE, vol. 3, Jun. 6, 2004, pp. 1579-1581.
V.P. Wallace et al., "Biomedical Applications of Terahertz Pulse Imaging", Second Joint EMBS-BMES Conference 2002. Conference Proceedings. 24th Annual International Conference of the Engineering in Medicine and Biology Society. Annual Fall Meeting of the Biomedical Engineering Society, Houston, TX, Oct. 23-26, 2002, Annual vol. 1 of 3. Conf. 24, Oct. 23, 2002, pp. 2333-2334.
A.J. Fitzgerald et al., "Terahertz Imaging of Breast Cancer, a Feasibility Study", Conference Digest of the 2004 Joint 29th International Conference on Infrared and Millimeter Waves and 12th International Conference on Terahertz Electronics, IEEE, Piscataway, NJ, USA, 2004, pp. 823-824.
B.E. Cole et al., "Terahertz Imaging and Spectroscopy of Human Skin, In-vivo", Proceedings of SPIE, vol. 4276, pp. 1-10.
S. Glinski et al., "Spoken Language Recognition on a DSP Array Processor", IEEE Transactions on Parallel and Distributed Systems, Jul. 5, 1994, No. 7, New York, USA, pp. 697-703.
S. Chatterjee et al., "Connected Speech Recognition on a Multiple Processor Pipeline", ICASSP 89, May 23, 1989, Glasgow, UK, pp. 774-777.
S.H. Chung et al., "A Parallel Phoneme Recognition Algorithm Based on Continuous Hidden Markov Model", Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, IPPS/SPDP 1999, IEEE Comput. Soc., pp. 453-457 (1999).
S.H. Chung et al., "A Parallel Computation Model for Integrated Speech and Natural Language Understanding", IEEE Transactions on Computers, vol. 42, No. 10, Oct. 1, 1993.
N. Deshmukh et al., "Hierarchical Search for Large-Vocabulary Conversational Speech Recognition: Working Toward a Solution to the Decoding Problem", IEEE Signal Processing Magazine, vol. 16, No. 5, pp. 84-107 (Sep. 1999).

* cited by examiner

Primary Examiner: Daniel Abebe
(74) Attorney, Agent, or Firm: Finnegan, Henderson, Farabow, Garrett & Dunner LLP

(57) ABSTRACT
A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory by performing parallel processing on a plurality of said lexical tree data structures.

8 Claims, 6 Drawing Sheets

[Representative drawing: lexical tree structures, including LEXICAL TREE 3 (ROT, ROCK) and LEXICAL TREE 4 (LOT, LOCK), with triphone nodes such as r-ao+t, l-ao+k and ao-t+sil]

IPR2023-00037
Apple EX1001 Page 1
U.S. Patent    Apr. 6, 2021    Sheet 1 of 6    US 10,971,140 B2

[Fig. 1: front-end parameterization block diagram. Anti aliasing filter 2 → ADC, fs = 48 kHz, 20 bit → normalise → 10 ms data frame 5 (48000 × 10 × 10^-3 = 480 samples) → pre-emphasis → (Hamming) window 7 → FFT 512 pt. 8 → filterbank, 12 freqs 9, plus energy 13 → 13 feature vectors 10 → d/dt 11, d/dt 12 → 39 feature vectors @ 200 Hz 14]
[Fig. 2: search engine block diagram. A feature vector buffer 20 feeds lexical tree processor clusters 1 to N (22) over a feature vector bus 24; each cluster contains lexical tree processors 1 to k (21) sharing an acoustic model memory 23. Also shown: results memory 25, search controller 27 with program & data memory, language model processor, language model memory, lexical tree processor control bus, path score and history bus and language model bus, with signals labelled tree identifiers, path scores, feature vector, word constraints and N best sentences.]
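To make the cluster arrangement of FIG. 2 more concrete, here is a minimal Python sketch of how a controller might hand lexical trees to idle processors, under the constraint that a processor can only work on trees held in its own cluster's acoustic model memory 23. The class and function names, the idle-queue bookkeeping and the "most idle processors first" policy are hypothetical; the patent does not specify a scheduling policy. Tree 3 is stored in both clusters to mimic the optional redundancy of lexical data mentioned in the description.

```python
from collections import deque


class Cluster:
    """A lexical tree processor cluster: k processors sharing one acoustic model memory."""

    def __init__(self, cluster_id, k, tree_ids):
        self.cluster_id = cluster_id
        self.idle = deque(range(k))    # indices of idle lexical tree processors
        self.trees = set(tree_ids)     # lexical trees held in this cluster's memory segment


def dispatch(clusters, pending_trees):
    """Assign each pending lexical tree to an idle processor in a cluster whose
    acoustic model memory holds that tree. Returns (assignments, deferred)."""
    assignments, deferred = [], []
    for tree in pending_trees:
        holders = [c for c in clusters if tree in c.trees and c.idle]
        if not holders:
            deferred.append(tree)      # no idle processor can reach this tree's data yet
            continue
        # Assumed policy: use the holding cluster with the most idle processors.
        cluster = max(holders, key=lambda c: len(c.idle))
        assignments.append((tree, cluster.cluster_id, cluster.idle.popleft()))
    return assignments, deferred


clusters = [Cluster(0, k=2, tree_ids={1, 2, 3}), Cluster(1, k=2, tree_ids={3, 4})]
assignments, deferred = dispatch(clusters, [1, 3, 3, 4, 1])
# Tree 3 runs on two processors at once (one per cluster); the second request
# for tree 1 is deferred because cluster 0 has no idle processor left.
```

Each tuple in `assignments` is (tree, cluster, processor). Keeping the memory-locality check explicit shows why redundancy helps: a popular tree stored in several clusters can be served by more processors.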
[Figs. 3a and 3b: schematic lexical tree structures. Lexical trees 1 to 4 cover the words CARD, HEART, HART, HARD, ROT, ROCK, LOT and LOCK; Fig. 3b also shows the word YOU and context-dependent triphone nodes such as y-uw+k, uw-k+aa, k-aa+r, aa-r+d, d-r+ao, r-ao+t and ao-t+sil.]
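The lexical tree idea in FIGS. 3a and 3b can be sketched in Python as prefix trees keyed by initial phone, plus a helper that derives context-dependent triphone labels from a monophone sequence at run time, in the left-centre+right notation the figures use (e.g. r-ao+t, ao-t+sil). The pronunciations of ROT, ROCK, LOT and LOCK are read off the figure labels; the nested-dict representation and the function names are assumptions for illustration only.

```python
def new_node():
    return {"children": {}, "words": []}


def build_lexical_trees(lexicon):
    """Group words into prefix trees keyed by initial phone: each tree has a
    unique initial component, and words sharing a prefix share the same nodes."""
    trees = {}
    for word, phones in lexicon.items():
        node = trees.setdefault(phones[0], new_node())
        for phone in phones[1:]:
            node = node["children"].setdefault(phone, new_node())
        node["words"].append(word)   # the word ends at this node
    return trees


def expand_triphones(phones, left="sil", right="sil"):
    """Context-dependent labels (left-centre+right) generated on the fly from a
    monophone sequence; `left`/`right` carry cross-word context ("sil" = silence)."""
    ctx = [left, *phones, right]
    return [f"{ctx[i - 1]}-{ctx[i]}+{ctx[i + 1]}" for i in range(1, len(ctx) - 1)]


LEXICON = {"ROT": ["r", "ao", "t"], "ROCK": ["r", "ao", "k"],
           "LOT": ["l", "ao", "t"], "LOCK": ["l", "ao", "k"]}
trees = build_lexical_trees(LEXICON)
# Two trees, rooted at "r" and "l"; in each, two words share the "ao" node.
word_after_hard = expand_triphones(["r", "ao", "t"], left="d")  # ROT following HARD
```

Storing only monophone trees and expanding triphones when a path is extended is one way to get context-dependent matching without storing every triphone variant, which is the storage trade-off the description attributes to the mono phone lexical tree.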
Fig. 4 (flow diagram of the process performed by a lexical tree processor to determine a temporary lexical tree score):
S1 INITIALISE
S2 DATA FOR A NEW LEXICAL TREE RECEIVED FROM SEARCH CONTROLLER? (YES: S3)
S3 FEATURE VECTOR AVAILABLE IN BUFFER? (NO: S4 ERROR; YES: S5)
S5 READ FEATURE VECTOR FROM BUFFER
S6 EVALUATE STATE TRANSITIONS FOR FIRST LEXICAL TREE NODE USING ACOUSTIC MODEL DATA
S7 DETERMINE SCORE FOR FIRST LEXICAL TREE NODE
S8 REQUEST LANGUAGE MODEL SCORE USING PREVIOUS N-1 WORDS IN RECEIVED DATA AND LEXICAL TREE DATA IN ACOUSTIC MODEL MEMORY
S9 RECEIVE LANGUAGE MODEL SCORES FOR WORDS IN THE LEXICAL TREE AND PICK HIGHEST SCORE
S10 GENERATE TEMPORARY LEXICAL TREE SCORE USING THE SCORE FOR FIRST LEXICAL TREE NODE AND THE HIGHEST LANGUAGE MODEL SCORE
S11 SEND SCORE TO RESULTS MEMORY AS TEMPORARY LEXICAL TREE SCORE
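Steps S7 to S10 of FIG. 4 amount to an optimistic look-ahead score for a candidate tree: the score of its first node combined with the best language model score among the words the tree can produce. A minimal sketch, assuming scores are log probabilities that add, and using a toy bigram table (N = 2) with invented values in place of the language model processor; all names are hypothetical.

```python
import math


def temporary_tree_score(first_node_score, history, tree_words, lm_logprob):
    """Optimistic score for a candidate lexical tree (cf. FIG. 4, S7 to S10):
    acoustic score of the tree's first node plus the best language model score
    of any word in the tree. Log-domain scores are assumed."""
    best_lm = max(lm_logprob(history, word) for word in tree_words)   # cf. S9
    return first_node_score + best_lm                                  # cf. S10


# Hypothetical bigram table standing in for the language model processor (N = 2).
BIGRAM = {("YOU", "ROT"): math.log(0.2), ("YOU", "ROCK"): math.log(0.5),
          ("YOU", "LOT"): math.log(0.1), ("YOU", "LOCK"): math.log(0.05)}


def lm_logprob(history, word, floor=math.log(1e-6)):
    # With N = 2 only the last of the previous N-1 words matters.
    return BIGRAM.get((history[-1], word), floor)


tree3 = temporary_tree_score(-10.0, ("YOU",), ["ROT", "ROCK"], lm_logprob)
tree4 = temporary_tree_score(-9.5, ("YOU",), ["LOT", "LOCK"], lm_logprob)
# tree3 ranks above tree4 despite its worse first-node score.
```

Taking the maximum over the tree's words keeps the estimate optimistic, so pruning on it should not discard a tree whose best word would have scored well.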
Fig. 5 (flow diagram of the process performed by the lexical tree processor for processing the input feature vectors):
S20 START
S21 DATA FOR A LEXICAL TREE RECEIVED FROM SEARCH CONTROLLER? (YES: S22)
S22 FEATURE VECTOR AVAILABLE IN BUFFER? (NO: S23 ERROR; YES: S24)
S24 READ FEATURE VECTOR FROM BUFFER
S25 EVALUATE STATE TRANSITIONS FOR EACH PATH USING ACOUSTIC MODEL DATA
S26 DETERMINE SCORES FOR PATHS, SEND BEST SCORE TO RESULTS MEMORY AND STORE PATH HISTORIES LOCALLY
S27 PRUNING APPLIED TO LEXICAL TREE TO DELETE PATHS
S28 PATH(S) REACHED WORD END? (YES: S29)
S29 APPLY LANGUAGE MODEL SCORE
S30 SEND SCORE AND HISTORY TO RESULTS MEMORY
S31 DELETE PATH(S) AND HISTORY DATA
S32 ANY PATHS LEFT TO PROCESS? (NO: S33)
S33 MESSAGE SENT TO SEARCH CONTROLLER TO INDICATE THAT LEXICAL TREE HAS BEEN PROCESSED
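The per-feature-vector loop of FIG. 5 (S25 to S28) resembles a token-passing Viterbi step with beam pruning. The sketch below simplifies heavily: one HMM state per tree node, fixed stay/advance log transition probabilities, a toy emission function, and no path histories or language model scoring at word ends. These choices and all names are assumptions for illustration, not the circuit's actual implementation.

```python
import math


def process_frame(tokens, successors, emission, frame, beam,
                  stay=math.log(0.5), advance=math.log(0.5)):
    """Advance every active path in one lexical tree by one feature vector.

    tokens     -- {node: accumulated log score} for the active paths
    successors -- {node: [child nodes]} describing the lexical tree
    emission   -- emission(node, frame) -> log-likelihood of the frame in that node
    Returns the surviving tokens and the best score for this frame.
    """
    new = {}
    for node, score in tokens.items():
        # Cf. S25: each path may stay in its node or advance to a child node.
        moves = [(node, stay)] + [(child, advance) for child in successors.get(node, [])]
        for dst, trans in moves:
            cand = score + trans + emission(dst, frame)
            if cand > new.get(dst, -math.inf):
                new[dst] = cand            # Viterbi: keep only the best path into each node
    if not new:
        return {}, -math.inf
    best = max(new.values())               # cf. S26: best score reported to the results memory
    kept = {n: s for n, s in new.items() if s >= best - beam}   # cf. S27: beam pruning
    return kept, best


TREE = {"r": ["ao"], "ao": ["t", "k"]}     # ROT and ROCK share the r-ao prefix
WORD_ENDS = {"t": "ROT", "k": "ROCK"}


def toy_emission(node, frame):
    # Hypothetical acoustic model: a frame "matches" the node with the same label.
    return 0.0 if node == frame else -5.0


tokens = {"r": 0.0}                        # a path that entered the tree on an "r" frame
for frame in ["ao", "t"]:
    tokens, best = process_frame(tokens, TREE, toy_emission, frame, beam=3.0)
ended = [WORD_ENDS[n] for n in tokens if n in WORD_ENDS]   # cf. S28: word ends reached
```

With a beam of 3.0 the mismatched paths are dropped after each frame, so only the ROT path survives to its word end; a real recognizer would then apply the language model score and report the history (S29, S30).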
Fig. 6 (flow diagram of the process performed by the controller):
S40 INITIALISATION
S41 READ INITIAL LEXICAL TREE DATA IN RESULTS MEMORY
S42 INITIAL LEXICAL TREE DATA DISTRIBUTED AMONGST LEXICAL TREE PROCESSORS FOR TEMPORARY LEXICAL TREE SCORE DETERMINATION
S43 TEMPORARY LEXICAL TREE SCORES RETURNED TO RESULTS MEMORY
S44 PRUNE THE LEXICAL TREES IN THE RESULTS MEMORY ON BASIS OF TEMPORARY LEXICAL TREE SCORES
S45 LEXICAL TREE PROCESSING DISTRIBUTED AMONGST LEXICAL TREE PROCESSORS
S46 HISTORY AND SCORE ENTERED IN RESULTS MEMORY FOR A WORD? (YES: S47)
S47 DETERMINE NEXT POSSIBLE LEXICAL TREES USING CROSS WORD TRIPHONES
S48 LEXICAL TREE DATA DISTRIBUTED AMONGST LEXICAL TREE PROCESSORS FOR TEMPORARY LEXICAL TREE SCORE DETERMINATION
S49 TEMPORARY LEXICAL TREE SCORES RETURNED TO RESULTS MEMORY
S50 PRUNE THE LEXICAL TREES IN THE RESULTS MEMORY ON BASIS OF TEMPORARY LEXICAL SCORES
S51 ALL LEXICAL TREE PROCESSORS FINISHED PROCESSING AND NO LEXICAL TREES IN RESULTS MEMORY? (YES: S52)
S52 RESULTS OUTPUT
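The controller's pruning and hand-out steps (S44 and S45 of FIG. 6) might look roughly like the following, assuming a beam relative to the best temporary score and a simple round-robin distribution over the processors. The patent does not fix either rule, and the function name is hypothetical; which cluster holds each tree is ignored here.

```python
def prune_and_distribute(temp_scores, beam, n_processors):
    """Prune candidate lexical trees on their temporary scores, then hand the
    survivors out to the lexical tree processors (cf. FIG. 6, S44 and S45).

    temp_scores -- {tree_id: temporary lexical tree score}, higher is better
    Returns {processor index: [tree ids]}.
    """
    plan = {p: [] for p in range(n_processors)}
    if not temp_scores:
        return plan
    best = max(temp_scores.values())
    # Assumed pruning rule: keep trees within `beam` of the best score, best first.
    kept = sorted((t for t, s in temp_scores.items() if s >= best - beam),
                  key=temp_scores.get, reverse=True)
    for i, tree in enumerate(kept):
        plan[i % n_processors].append(tree)   # assumed round-robin distribution
    return plan


plan = prune_and_distribute({"T1": -10.0, "T2": -12.0, "T3": -25.0, "T4": -11.0},
                            beam=5.0, n_processors=2)
# T3 falls outside the beam; T1 and T2 go to processor 0, T4 to processor 1.
```

Sorting best-first before the round robin spreads the most promising trees across different processors rather than stacking them on one.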
SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS

This is a continuation of application Ser. No. 15/392,396, filed Dec. 28, 2016, which is a continuation of application Ser. No. 14/309,476, filed Jun. 19, 2014, now U.S. Pat. No. 9,536,516, which is a continuation of application Ser. No. 13/253,223, now U.S. Pat. No. 8,768,696, filed Oct. 5, 2011, which is a continuation of application Ser. No. 12/554,607, now U.S. Pat. No. 8,036,890, filed Sep. 4, 2009, which is a continuation of application Ser. No. 10/503,463, now U.S. Pat. No. 7,587,319, filed May 24, 2005, which is a 371 of International Application No. PCT/GB2003/000459, filed Feb. 4, 2003, which claims foreign priority to United Kingdom Application No. 0202546.8 filed Feb. 4, 2002, the disclosures of which are incorporated herein by reference in their entireties.

The present invention generally relates to a speech recognition circuit which uses parallel processors for processing the input speech data in parallel.

Conventional large vocabulary speech recognition can be divided into two processes: front end processing to generate processed speech parameters such as feature vectors, followed by a search process which attempts to find the most likely set of words spoken from a given vocabulary (lexicon).

The front end processing generally represents no problem for current processing systems. However, for large vocabulary, speaker independent speech recognition, it is the search process that presents the biggest challenge. An article by Deshmukh et al entitled "Hierarchical Search for Large-Vocabulary Conversational Speech Recognition" (IEEE Signal Processing Magazine, September 1999, pages 84 to 107), the content of which is hereby incorporated by reference, discusses the general concepts of large vocabulary speech recognition. As discussed in this paper, one algorithm for performing the search is the Viterbi algorithm. The Viterbi algorithm is a parallel or breadth first search through a transition network of states of Hidden Markov Models. The acoustic models for words in a lexicon are represented as states of Hidden Markov Models. These states represent phones or n phones in a phone model of the words. The search requires the evaluation of possible word matches. It is known that such a search is computationally intensive.

In order to speed up the processing performed during such a search in a speech recognition system, parallel processing has been explored. In an article by M K Ravishankar entitled "Parallel Implementation of Fast Beam Search for Speaker-Independent Continuous Speech Recognition" (Indian Institute of Science, Bangalore, India, Jul. 16, 1993) a multi-threaded implementation of a fast beam search algorithm is disclosed. The multi-threading implementation requires a significant amount of communication and synchronization among threads. In an MSc project report by R. Dujari entitled "Parallel Viterbi Search Algorithm for Speech Recognition" (MIT, February 1992) the parallel processing of input speech parameters is disclosed in which a lexical network is split statically among processors.

It is an object of the present invention to provide an improved circuit which can perform parallel processing of speech parameters.

In accordance with a first embodiment of the present invention, a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A lexical memory arrangement is provided which contains lexicon data for word recognition. The lexical data comprises a plurality of lexical tree data structures representing a plurality of lexical trees. Each lexical tree data structure comprises a model of words having common prefix components and an initial component which is unique as an initial component for lexical trees. A plurality of lexical tree processors are connected in parallel to the input port and perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory arrangement. A results memory arrangement is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory arrangement by performing parallel processing of a plurality of lexical tree data structures.

Thus in accordance with this embodiment of the present invention, the processing in order to perform word recognition is distributed across the processors by controlling the processors to perform processing on different lexical trees. The controller thus controls the processors so as to provide efficient process management by distributing lexical processing to appropriate processors.

The lexical tree data structure can comprise a phone model of words, wherein the components comprise phones. For reduced storage, the lexical tree data structure can comprise a monophone lexical tree. The monophone lexical tree can be used to generate context dependent phone models dynamically. This enables the use of context dependent phone models for matching and hence increased accuracy whilst not increasing memory requirements. Alternatively, the lexical tree data structure can comprise context dependent phone models.

The processing performed by each processor in one embodiment comprises the comparison of the speech parameters with the lexical data, e.g. phone models or data derived from the lexical data (e.g. dynamically generated context dependent phone models), to identify words as a word recognition event and to send information identifying the identified words to the results memory as the processing results. In this embodiment a language model processor arrangement can be provided for providing a language model output for modifying the processing results at a word recognition event by a lexical tree processor. The modification can take place either at each lexical tree processor or at the language model processing arrangement.

In one embodiment each lexical tree processor determines an output score for words in the processing results at word recognition events. Thus in this embodiment the language model processing arrangement can modify the score using a score for a language model for n preceding words, where n is an integer.

In one embodiment the controller instructs a lexical tree processor to process a lexical tree by passing a lexical tree identifier for the lexical tree and history data for a recognition path associated with the lexical tree from the results memory. The history data preferably includes an accumulated score for the recognition path. This enables a score to be determined based on the score for the recognition path to accumulate a new score during recognition carried out using the lexical tree data structure. The scores can be output in the processing results to the results memory during the processing of the speech parameters so that the scores can be used for pruning.

In one embodiment of the present invention, each lexical tree processor operates on more than one lexical tree at the same time, e.g. two lexical trees represented by two different lexical tree data structures, or two lexical trees represented
by the same data structure but displaced in time (which can be termed two instances of the same lexical tree).

At word recognition events, the controller determines new lexical tree identifiers for storing in the results memory for words identified in the results memory for respective word events. In order to reduce the processing, the controller can prune the new lexical tree identifiers to reduce the number of lexical trees which are required to be processed. This pruning can be achieved using context dependent n phones to reduce the number of possible next phones. The number can be further reduced by using a language model look ahead technique.

In one embodiment of the present invention, the lexical tree processors are arranged in groups or clusters. The lexical memory arrangement comprises a plurality of partial lexical memories. Each partial lexical memory is connected to one of the groups of lexical tree processors and contains part of the lexical data. Thus a group of lexical tree processors and a partial lexical memory form a cluster. Each lexical tree processor is operative to process the speech parameters using a partial lexical memory, and the controller controls each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In another embodiment of the present invention the lexical memory arrangement comprises a plurality of partial lexical memories. Each partial lexical memory is connected to one of the lexical tree processors and contains part of the lexical data. Each lexical tree processor processes the speech parameters using a corresponding partial lexical memory, and the controller is operative to control each lexical tree processor to process a lexical tree corresponding to partial lexical data in a corresponding partial lexical memory.

In one embodiment of the present invention the lexical memory arrangement stores the lexical tree data structures as Hidden Markov Models and the lexical tree processors are operative to perform the Viterbi search algorithm using each respective lexical tree data structure. In this way, this embodiment of the present invention provides a parallel Viterbi lexical tree search process for speech recognition.

The first aspect of the present invention is a special purpose circuit built for performing the speech recognition search process in which there are a plurality of processors for performing parallel lexical tree processing on individual lexical tree processors.

In another aspect of the present invention a speech recognition circuit comprises an input port such as an input buffer for receiving parameterized speech data such as feature vectors. A plurality of lexical memories are provided which contain in combination complete lexical data for word recognition. Each lexical memory contains part of the complete lexical data. A plurality of processors are provided connected in parallel to the input port for processing the speech parameters in parallel. The processors are arranged in groups in which each group is connected to a corresponding lexical memory to form a cluster. A controller controls each processor to process the speech parameters using partial lexical data read from a corresponding lexical memory. The results of processing the speech parameters are output from the processors as recognition data.

Thus this aspect of the present invention provides a circuit in which speech recognition processing is performed in parallel by groups of processors operating in parallel in which each group accesses a common memory of lexical data. This aspect of the present invention provides the advantage of parallel processing of speech parameters and benefits from a limited segmentation of the lexical data. By providing a plurality of processors in a group with a common memory, flexibility in the processing is provided without being bandwidth limited by the interface to the memory that would occur if only a single memory were used for all processors. The arrangement is more flexible than the parallel processing arrangement in which each processor only has access to its own local memory, and it requires fewer memory interfaces (i.e. chip pins). Each processor within a group can access the same lexical data as any other processor in the group. The controller can thus control the parallel processing of input speech parameters in a more flexible manner. For example, it allows more than one processor to process input speech parameters using the same lexical data in a lexical memory. This is because the lexical data is segmented into domains which are accessible by multiple processors.

In a preferred embodiment this aspect of the present invention is used in combination with the first aspect of the present invention. In such an arrangement each processor performs lexical tree processing and the lexical data stored in each lexical memory comprises lexical tree data structures which each comprise a model of words having common prefix components and an initial component that is unique.

In preferred embodiments of the second aspect of the present invention, the preferred embodiments of the first aspect of the present invention are incorporated.

Embodiments of the present invention will now be described with reference to the accompanying drawings in which:

FIG. 1 is a diagram of a speech data processing circuit for generating parameterized speech data (feature vectors);
FIG. 2 is a diagram of a speech recognition circuit in accordance with an embodiment of the present invention;
FIGS. 3a and 3b are schematic diagrams illustrating lexical tree structures;
FIG. 4 is a flow diagram illustrating the process performed by a lexical tree processor to determine a temporary lexical tree score in accordance with an embodiment of the present invention;
FIG. 5 is a flow diagram illustrating the process performed by the lexical tree processor for processing the input feature vectors in accordance with an embodiment of the present invention; and
FIG. 6 is a flow diagram illustrating the process performed by the controller in accordance with an embodiment of the present invention.

FIG. 1 illustrates a typical circuit for the parameterization of input speech data. In this embodiment the parameters generated are speech vectors.

A microphone 1 records speech in an analogue form and this is input through an anti-aliasing filter 2 to an analogue-to-digital converter 3 which samples the speech at 48 kHz at 20 bits per sample. The digitized output signal is normalized (4) to generate a 10 millisecond data frame every 5 milliseconds with 5 milliseconds overlap (5). A pre-emphasis operation 6 is applied to the data followed by a Hamming window 7. The data is then fast Fourier transformed (FFT) using a 512 point fast Fourier transform (8) before being filtered by filter bank 9 into 12 frequencies. The energy in the data frame 5 is also recorded (13) as an additional feature and, together with the 12 frequency outputs of the filter bank 9, 13 feature vectors (10) are thus produced and these are output as part of the 39 feature vectors 14. First and second derivatives (11 and 12) are taken of the 13 feature vectors 10 to complete the generation of the 39 feature vectors 14.
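As a rough illustration of the arithmetic in the FIG. 1 front end (480-sample frames, a 240-sample step giving 200 vectors per second, zero-padding to a 512-point FFT, 12 bands plus energy, then first and second differences to reach 39 values), the following pure-Python sketch may help. It is not the circuit of FIG. 1: the pre-emphasis coefficient (0.97), the equal-width band edges, the log compression and the simple first-difference deltas are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

FS = 48_000               # ADC sample rate (Hz), as in FIG. 1
FRAME = FS * 10 // 1000   # 10 ms frame -> 480 samples
HOP = FS * 5 // 1000      # new frame every 5 ms -> 240 samples, i.e. 200 frames/s
NFFT = 512                # 512-point FFT; each 480-sample frame is zero-padded
NBANDS = 12               # filter-bank outputs


def frame_features(frame, alpha=0.97):
    """13 static values for one frame: 12 log band energies plus log frame energy."""
    energy = math.log(sum(x * x for x in frame) + 1e-10)
    # Pre-emphasis (alpha = 0.97 is an assumed, commonly used value).
    emph = [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]
    # Hamming window.
    n_len = len(emph)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)))
                for n, x in enumerate(emph)]
    padded = windowed + [0.0] * (NFFT - n_len)
    # Naive DFT power spectrum for bins 0..NFFT/2 (chosen for clarity, not speed).
    power = [abs(sum(padded[n] * cmath.exp(-2j * math.pi * k * n / NFFT)
                     for n in range(NFFT))) ** 2
             for k in range(NFFT // 2 + 1)]
    # 12 equal-width bands (assumed); top bins that do not fill a band are ignored.
    width = len(power) // NBANDS
    bands = [math.log(sum(power[b * width:(b + 1) * width]) + 1e-10) for b in range(NBANDS)]
    return bands + [energy]


def deltas(seq):
    """First difference over time; the first frame gets a zero delta."""
    if not seq:
        return []
    prev = [seq[0]] + seq[:-1]
    return [[c - p for c, p in zip(cur, pre)] for pre, cur in zip(prev, seq)]


def features(samples):
    """Split samples into overlapping frames and build 39-value vectors (13 + 13 + 13)."""
    frames = [samples[i:i + FRAME] for i in range(0, len(samples) - FRAME + 1, HOP)]
    static = [frame_features(f) for f in frames]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FRAME + 2 * HOP)]
vectors = features(tone)   # 3 overlapping frames, 39 values each
```

A hardware front end would use a fast FFT and a designed filter bank; the naive DFT here only keeps the sketch dependency-free.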
The arrangement illustrated in FIG. 1 is purely given for illustration. The present invention encompasses any means by which speech data can be parameterized to a suitable form for input to the search process as will be described in more detail hereinafter.

FIG. 2 is a schematic diagram of a speech recognition circuit in accordance with an embodiment of the present invention for performing the search process. The parameterized speech data, which in this embodiment comprise feature vectors, are input to a feature vector buffer 20. The feature vector buffer 20 is provided to buffer the incoming feature vectors to allow lexical tree processors 21 to read and process the feature vectors in the buffer 20 via a feature vector bus 24. A plurality k of lexical tree processors 21 are arranged in a respective lexical tree processor cluster 22. Each lexical tree processor cluster 22 has an acoustic model memory 23 in which is stored lexical data for use by the lexical tree processors 21 within the lexical tree processor cluster 22. Each lexical tree processor 21 in the lexical tree processor cluster 22 is connected to the acoustic model memory 23 within the lexical tree processor cluster 22. There are N lexical tree processor clusters and thus there are Nk lexical tree processors 21 connected by the feature vector bus 24 to the feature vector buffer 20. Each lexical tree processor 21 is capable of processing a different lexical tree and thus Nk lexical trees can be processed in parallel. The acoustic model memories 23 store as a whole a complete set of lexical data, i.e. lexical tree data structures for use in the lexical tree processing by the lexical tree processors 21. Each acoustic model memory 23 contains part or a segment of the lexical tree data. Since lexical tree processors 21 in a lexical tree processor cluster 22 access the same acoustic model memory 23, it is possible for more than one lexical tree processor 21 to process the same lexical data. This provides for some degree of flexibility in the controlling of the processing by the lexical tree processors 21. Further, the acoustic model memories 23 need not contain only one copy of the lexical data. It is possible to build in a redundancy in the data to further enhance the flexibility. This avoids any bottleneck in the processing due to the search processing focusing on a small number of lexical trees.

A results memory 25 is provided for storing processing results from the lexical tree proce

[...] tree processor can modify the score in accordance with the language model and output a score to the results memory 25 for a word at the end of a branch of the lexical tree processing. Thus the results memory stores the results as an ordered list of scores for words together with their histories.

The results memory 25 stores the following data:

1. Initial lexical tree data. This comprises pointers to an initial set of lexical trees. No history data is associated with the initial set of lexical trees. The initial set of lexical trees is predetermined and stored in the results memory 25 based on the most likely initial phones of an utterance. This initial lexical tree data is required to initialize the search process.

2. History data for search results. This comprises a record of a recognition path through the lexical tree recognition process performed by the lexical tree processors 21. The history data includes the current word, the previous N-1 words, the current accumulated score, the phone history (for use in the determination of likely next lexical trees using cross word context dependent tri-phones), and an identifier or pointer to the lexical tree used for identifying the word.

3. Best scores for best paths being processed by each lexical tree processor 21. This information enables the search controller 27 to monitor the processing being performed by lexical tree processors 21 to determine whether a global pruning strategy should be applied in order to reassign processing performed by a lexical tree processor if its best score for its best path is below a threshold or well below the best scores for the paths being processed by other lexical tree processors 21.

4. Temporary lexical tree scores. These comprise tree scores which are determined as temporary scores to prune the next lexical trees to be processed at word ends. The temporary lexical tree scores include lexical tree identifiers or pointers to identify the next lexical trees to be processed. The scores enable the pruning of this list.

5. Pruning threshold. This can be a global threshold value for use in the pruning of the lexical trees globally, or a local threshold value for use by a lexical processor for locally pruning the processing performed by the lexical processor 21.

The acoustic model memory 23 stores a Hidden Markov Model for acoustically modelling words as lexical trees. The [...]