IAP8 Rec'd PCT/PTO 14 MAR 2007
`
`PTO-1390 (Rev. 07-2005)
Approved for use through 03/31/2007. OMB 0651-0021
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.
`
INTERNATIONAL APPLICATION NO.                ATTORNEY'S DOCKET NUMBER
PCT/GB2005/003554                            M0025.0369/P369

20. [X] Other: WO 2006/030214, Notification Concerning Submission of Priority
        Document, Int'l. Search Report

The following fees have been submitted                    CALCULATIONS (PTO USE ONLY)

21. Basic national fee (37 CFR 1.492(a)) .......................... $300    $   300.00

22. Examination fee (37 CFR 1.492(c))
    If the written opinion prepared by ISA/US or the international
    preliminary examination report prepared by IPEA/US indicates all
    claims satisfy provisions of PCT Article 33(1)-(4) .............. $0
    All other situations ............................................ $200  $   200.00

23. Search fee (37 CFR 1.492(b))
    If the written opinion of the ISA/US or the international preliminary
    examination report prepared by IPEA/US indicates all claims satisfy
    provisions of PCT Article 33(1)-(4) ............................. $0
    Search fee (37 CFR 1.445(a)(2)) has been paid on the international
    application to the USPTO as an International Searching Authority . $100
    International Search Report prepared by an ISA other than the US and
    provided to the Office or previously communicated to the US
    by the IB ....................................................... $400  $   400.00
    All other situations ............................................ $500

TOTAL OF 21, 22 and 23 =                                                    $   900.00

[X] Additional fee for specification and drawings filed in paper over 100 sheets
    (excluding sequence listing in compliance with 37 CFR 1.821(c) or (e) or
    computer program listing in an electronic medium) (37 CFR 1.492(j)).
    The fee is $250 for each additional 50 sheets of paper or fraction thereof.

    Total Sheets   Extra Sheets   Number of each additional 50 or
                                  fraction thereof (round up to a
                                  whole number)                  RATE
    104            - 100 = 4      / 50 = 1                       x $250.00  $   250.00

[ ] Surcharge of $130 for furnishing any of the search fee, examination fee, or
    the oath or declaration after the date of commencement of the national
    stage (37 CFR 1.492(h)).                                                $

CLAIMS                 NUMBER FILED     NUMBER EXTRA     RATE
Total claims           39 - 20 =        19               x $50.00           $   950.00
Independent claims     6 - 3 =          3                x $200.00          $   600.00
MULTIPLE DEPENDENT CLAIM(S) (if applicable)              +                  $

TOTAL OF ABOVE CALCULATIONS =                                               $ 2,700.00

[X] Applicant claims small entity status. See 37 CFR 1.27. Fees above are
    reduced by 1/2.

SUBTOTAL =                                                                  $ 1,350.00

[ ] Processing fee of $130.00 for furnishing the English translation later than
    30 months from the earliest claimed priority date (37 CFR 1.492(i)).    $

TOTAL NATIONAL FEE =                                                        $ 1,350.00

[X] Fee for recording the enclosed assignment (37 CFR 1.21(h)). The assignment
    must be accompanied by an appropriate cover sheet (37 CFR 3.28, 3.31).
    $40.00 per property                                                   + $    40.00

TOTAL FEES ENCLOSED =                                                       $ 1,390.00

Amount to be refunded:                                                      $
Amount to be charged:                                                       $

Page 2 of 3
`
`DSMDB-2229206v0
`
`
`IPR2023-00034
`Apple EX1002 Page 3
`
`
`
`
`
11/662704
IAP8 Rec'd PCT/PTO 14 MAR 2007
`
`
`A Speech Recognition Circuit and Method
`
`The present invention relates to speech recognition circuits and methods. These circuits
`
`and methods have wide applicability, particularly for devices such as mobile electronic
`
`devices.
`
`There is growing consumer demand for embedded speech recognition in mobile
`
`electronic devices, such as mobile phones, dictation machines, PDAs (personal digital
`
`assistants), mobile games consoles, etc. For example, email and text message dictation,
`
`note taking, form filling, and command and control applications are all potential
`
`applications of embedded speech recognition.
`
`However, when a medium to large vocabulary is required, effective speech recognition
`
`for mobile electronic devices has many difficulties not associated with speech
`
`recognition systems in hardware systems such as personal computers or workstations.
`
`Firstly, the available power in mobile systems is often supplied by battery, and may be
`
`severely limited. Secondly, mobile electronic devices are frequently designed to be as
small as practically possible. Thus, the memory and resources of such mobile embedded
systems tend to be very limited, due to power and space restrictions. The cost of
`
`providing extra memory and resources in a mobile electronic device is typically much
`
`higher than that for a less portable device without this space restriction. Thirdly, the
`
`mobile hardware may be typically used in a noisier environment than that of a fixed
`
`computer, e.g. on public transport, near a busy road, etc. Thus, a more complex speech
`
`model and more intensive computation may be required to obtain adequate speech
`
`recognition results.
`
`These restrictions have made it difficult to implement effective speech recognition in
`mobile devices, other than with very limited vocabularies.
`
`Some prior art schemes have been proposed to increase the efficiency of speech
`
`recognition systems, in an attempt to make them more suitable for use in mobile
`technology.
`
`
`In an article entitled "A low-power accelerator for the SPHINX 3 speech recognition
`
`system", in University of Utah, International conference on Compilers, Architectures
`
`and Synthesis for Embedded Systems, Nov 2003, Davis et al have proposed the idea of
`
`using a special purpose co-processor for up-front calculation of the computationally
`
`expensive Gaussian output probabilities of audio frames corresponding to particular
`
`states in the acoustic model.
`
`In an article entitled "Hardware Speech Recognition in Low Cost, Low Power Devices",
`
`University of California, Berkeley, CS252 Class Project, Spring 2003, Sukun Kim et al
`
`describe using special purpose processing elements for each of the nodes in the network
`
`to be searched. This effectively implies having a single processing element for each
`
`phone in the network. An alternative suggested by Sukun Kim et al is to provide a
`
`processor for each state in the network.
`
`In an article entitled "Dynamic Programming Search for Continuous Speech
`
`Recognition" in IEEE Signal Processing Magazine, Sept 1999, Ney et al discuss
`
`language model lookahead. Language model lookahead involves computation of a
`
`language model factor for each node (i.e. phone) in the lexical tree. This technique is
`
`also known as smearing. Each phone instance in the search network can be given a
`
`language model factor when it is used in the lexical tree search. Ney et al show that for
`
`an example bigram language model, the average number of states per 10 ms frame can
`
`be reduced from around 168,000 states with no language model lookahead to around
`
`8,000 states when language model lookahead is used. They also show that bigram
`
`language model lookahead requires about a quarter of the states compared with unigram
`
`language model lookahead.
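The lookahead factor described by Ney et al can be sketched in a few lines: each lexical-tree node is "smeared" with the best language-model probability among the words reachable below it. The sketch below is illustrative only, with made-up phones and unigram probabilities, and is not code from any cited system.

```python
# Unigram language-model lookahead ("smearing") over a toy lexical tree.
# Tree layout and probabilities are hypothetical, for illustration only.

def smear(tree, lm_probs):
    """Assign each node the best LM probability among words reachable below it."""
    word = tree.get("word")
    best = lm_probs[word] if word else 0.0
    for child in tree.get("children", []):
        best = max(best, smear(child, lm_probs))
    tree["factor"] = best
    return best

lm = {"cat": 0.4, "cab": 0.1, "car": 0.5}
tree = {"phone": "k", "children": [
    {"phone": "ae", "children": [
        {"phone": "t", "word": "cat", "children": []},
        {"phone": "b", "word": "cab", "children": []},
    ]},
    {"phone": "aa", "children": [
        {"phone": "r", "word": "car", "children": []},
    ]},
]}
smear(tree, lm)
print(tree["factor"])                 # 0.5: best word anywhere below the root
print(tree["children"][0]["factor"])  # 0.4: best of cat/cab in that branch
```

During the search, each phone instance would apply its node's factor to its score, letting unpromising branches be pruned early.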
`
`Although these prior art documents provide improvements to speech recognition in
`
`embedded mobile technology, further improvement is still needed to provide a larger
`
`vocabulary and better accuracy.
`
`One aspect of the present invention provides a speech recognition circuit including a
`
`circuit for providing state identifiers which identify states corresponding to nodes or
`
`groups of adjacent nodes in a lexical tree, and for providing scores corresponding to
`
`
`said state identifiers. The lexical tree includes a model of words. The speech recognition
`
`circuit also has a memory structure for receiving and storing state identifiers identified
`
`by a node identifier identifying nodes or groups of adjacent nodes, the memory structure
`
`being adapted to allow lookup to identify particular state identifiers, reading of the
`
`scores corresponding to the state identifiers, and writing back of the scores to the
`
`memory structure after modification of the scores. An accumulator is provided for
`
`receiving score updates corresponding to particular state identifiers from a score update
`
`generating circuit which generates the score updates using audio input, for receiving
`
`scores from the memory structure, and for modifying said scores by adding said score
`
`updates to said scores. A selector circuit is used for selecting at least one node or group
`
`of nodes of the lexical tree according to said scores.
`
`One suitable type of hardware for the memory structure includes a content addressable
`
memory (CAM). A CAM is a memory unit which stores a series of data items using a
`
`series of addresses. However, the memory is accessed by specifying a data item, such
`
`that the CAM returns the corresponding address. This contrasts with a random access
`memory (RAM) in which the memory is accessed by specifying an address, such that
`
`the RAM returns the corresponding data item.
`
`However, the memory structure is not limited to including a CAM. Other types of
`hardware are also possible, to provide this functionality. For example, a single chip
`
`which operates in the same way as a CAM and RAM may be used instead.
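The two access patterns contrasted above can be modelled in software. The sketch below models only the interfaces (data-to-address lookup for the CAM, address-to-data lookup for the RAM), not the parallel hardware search a real CAM performs; all names are illustrative.

```python
# Software model of CAM-vs-RAM access, for illustration only: a real CAM
# compares all stored entries in parallel in hardware.

class ContentAddressableMemory:
    def __init__(self):
        self._by_addr = {}   # RAM-style view: address -> data
        self._by_data = {}   # CAM-style view: data -> address

    def write(self, addr, data):
        self._by_addr[addr] = data
        self._by_data[data] = addr

    def ram_read(self, addr):
        """RAM access: supply an address, get the data item back."""
        return self._by_addr[addr]

    def cam_lookup(self, data):
        """CAM access: supply a data item, get its address (None on miss)."""
        return self._by_data.get(data)

cam = ContentAddressableMemory()
cam.write(0x10, "state_42")
print(cam.ram_read(0x10))          # state_42
print(cam.cam_lookup("state_42"))  # 16 (i.e. address 0x10)
print(cam.cam_lookup("state_99"))  # None (miss)
```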
`
`Embodiments of the present invention provide a solution to the problem of how to map
`
`a lexical tree search to a CAM system architecture. The realisation by the present
`
`inventors that certain speech recognition data structures can be mapped into the CAMs
`
`allows a lexical tree search to be performed using a CAM system architecture.
`
`Further embodiments of the invention include a counter for sequentially generating state
`
`identifiers, and using said generated state identifiers to sequentially lookup said states in
`
`the memory structure.
`
`
`The node identifier may comprise a direct reference to the lexical tree. However, in
`
`some embodiments, the node identifier for at least some of the states includes a pointer
`
`to a node identifier for another state. For example, a state corresponding to the furthest
`
`part of the search path in the lexical tree may be referenced by a node identifier which
`
`directly links to a particular node or group of nodes in the lexical tree. In a lexical tree
`
`comprising phones, using a state model of triphones, the node identifier may indicate
`
`the position of a triphone in the lexical tree.
`
However, in this example, for states occurring further back in the search path, instead of
supplying a node identifier linking directly to the lexical tree, a pointer to a node
identifier of another state may be supplied. E.g. a triphone instance may have a pointer
`to another triphone instance, which has a pointer to another triphone instance, which has
`
`a pointer to a node or group of nodes in the lexical tree. Chains of reference may be set
`
`up in this way, where only the last state in the chain has a direct pointer to the lexical
`
`tree.
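The chains of reference described above can be illustrated with a toy record layout. The field names (`node`, `next`) and identifiers are assumptions for illustration, not the actual record format.

```python
# Toy model of pointer chains between phone instances: only the frontier
# instance carries a direct reference into the lexical tree; older instances
# point at a newer instance. Field names are hypothetical.

instances = {
    3: {"node": None, "next": 2},    # oldest instance: relative reference
    2: {"node": None, "next": 1},    # middle of the chain
    1: {"node": 57,   "next": None}, # frontier instance: direct tree reference
}

def resolve_node(instances, inst_id):
    """Follow the pointer chain until an instance with a direct node reference."""
    while instances[inst_id]["node"] is None:
        inst_id = instances[inst_id]["next"]
    return instances[inst_id]["node"]

print(resolve_node(instances, 3))  # 57: reached via two hops
```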
`
`There may not be a one-to-one correspondence between the nodes of the lexical tree and
`
`the node identifiers. This will occur for a branched lexical tree, where the nodes
`
`represent monophones, but the acoustic model states represent triphones, i.e. groups of
`
`three adjacent monophones. Then, paths of three monophones will have unique
`
`identifiers to be stored in the memory structure, rather than single monophones having
`
`unique identifiers.
`
`Phone instance numbers may be generated, and used to uniquely label each phone
`
`instance. They can be generated sequentially, using a counter. The phone instance
`
`numbers may be used as pointers between phone instances to assist in node
`
`identification. It is thus not essential to provide a direct node identifier for each phone
`instance to directly indicate a location in the lexical tree. The dynamic network of phone
`
`instances provided in the memory structure may thus include both direct and relative
`references to the lexical tree.
`
`The memory structure may be divided into one part which stores phone instance
`
`identifiers and direct references to the lexical tree, and a second part which stores phone
`
`
`instance identifiers and corresponding states. This can speed up the processing, by only
`
`storing the phone instances which are furthest on in the lexical tree in the first part of
`
`the memory structure.
`
The memory structure may also be divided into separately accessible units, to reduce
the amount of data in each unit, thereby decreasing the chance of finding the same two
state identifiers in different phone instances in any single memory unit, and increasing
`
`the chance of some state identifiers being completely absent from any single memory
`
`unit. This makes it easier to deal with the situation when the same two state identifiers
`
`are found, because a spare time slot is available for processing when a state identifier is
`
`not present.
`
`A further aspect of the invention provides a distance calculation engine within a speech
`
`recognition system. The distance calculation engine may be included within an
`accelerator. The accelerator may include logic to interface with other parts of a speech
`recognition circuit, in addition to the distance engine, although this is not essential. For
`
`example, the accelerator may include one or more results memories for storing distances
`
`calculated by the distance calculation engine. The accelerator may also include at least
`
`one of a memory for storing one or more acoustic models, a decompressor for
`
`decompressing acoustic data that has been stored in a compressed format, a memory for
`
`storing feature vectors, a checksum or data signature calculation means, buffers for data
`
`storage, and data registers. The accelerator may be implemented in software or in
`
hardware, or in a combination. It may be physically separate from the rest of the speech
`
`recognition circuit, although this is not essential.
`
`The distance calculation engine may calculate one or more of a wide range of distance
`
`metrics and probability distributions. The distances may represent the likely
`
correspondence of feature vectors to states in an acoustic model. In other words, the
distances can indicate the similarity of an audio data frame to each possible state in an
acoustic model.
`
`There are a wide variety of probability distributions that can be used for the distance
`
`calculation stage of a speech recogniser, and a wide variety of distance metrics used.
`
`
`
`
`These are widely documented in the literature. A point is a simple example of a
`
`probability distribution.
`
`
A common choice is to use Gaussian distributions and, correspondingly, the
Mahalanobis distance metric. The Gaussian probability distribution is then defined by a
mean vector, which defines the centre point in the N-dimensional space, and a
covariance matrix, which defines the shape of the probability distribution. It is common
to restrict the covariance matrix to be a diagonal matrix (only N non-zero values along
the diagonal of the NxN matrix), which significantly lowers the implementation cost by
reducing the number of arithmetic operations.
`
`In particular embodiments, the distance calculated is a Mahalanobis distance. Particular
`
`examples of this are described later in the specification.
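With a diagonal covariance matrix, the squared Mahalanobis distance reduces to a per-dimension sum, which is the cost saving noted above: O(N) operations per distance rather than the O(N^2) a full NxN covariance would require. A minimal numeric sketch, with made-up values:

```python
# Squared Mahalanobis distance under a diagonal covariance: only the N
# diagonal variances are needed, so the quadratic form collapses to a sum
# of per-dimension terms. Values below are illustrative.

def mahalanobis_sq_diag(x, mean, var):
    """sum_i (x_i - mean_i)^2 / var_i"""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

x    = [1.0, 2.0, 3.0]
mean = [0.0, 2.0, 5.0]
var  = [1.0, 4.0, 2.0]
print(mahalanobis_sq_diag(x, mean, var))  # 1.0 + 0.0 + 2.0 = 3.0
```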
`
In one embodiment, the distance engine autonomously computes all of the distances
`
`associated with a given feature vector. This may comprise computing distances for
`
`every state in the lexicon. The distance engine may operate in a pipelined manner with
`other stages of the recognition process. In this context a distance is an indication of the
`
`probability or likelihood that a feature vector corresponds to a particular state. An
`
`important class of distance computation in speech recognition is the calculation of
`
`output state probabilities in recognisers using Hidden Markov Models. Another
`
`possible use is in recognisers using Neural Networks.
`
`The distance engine reads data from the acoustic models to use as parameters in the
`
`calculation. The acoustic models may be optionally stored in a compressed format. The
`
distance engine may read and de-compress the acoustic models one (or more) times for
each feature vector processed. Each reading of the acoustic models may require reading
the entire acoustic model, or various optimisations may be implemented to avoid
reading parts of the acoustic model that are not required to be used for calculations with
the current feature vector. The distance engine may use a de-compression method where
the de-compression is sign or zero extension or may otherwise convert data of narrow or
variable width to a wider data format. The distance engine may use a de-compression
method where the de-compression is sign or zero extension or may otherwise convert
`
`
`data of narrow or variable width to IEEE standard single or double precision floating
`
`point format. The distance engine may use a decompression method where
`
`decompression is a codebook decompression of a binary bitstream, where the codebook
`
`is stored as part of the acoustic model data. The distance engine may use a
`
`decompression method where the decompression is decompression of a Huffman or
`
`Lempel-Ziv compressed stream. The distance engine may use a decompression method
`
`where decompression is decompression of run length encoded data. The distance engine
`may use a decompression method where decompression is decompression of difference
`
`encoded data. The distance engine may use a decompression method using any well
`
`known lossy or lossless compression scheme. The distance engine may use a
`
`decompression method using subspace distribution clustering. The distance engine may
`use a decompression method comprising any combination of the above described
`
`decompression types.
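As an illustration of the simplest scheme listed above, the sketch below widens packed 8-bit two's-complement model parameters by sign extension; the packing format is an assumption made purely for the example.

```python
# Sign-extension "decompression": interpret each packed byte as signed
# two's complement, standing in for widening to a larger hardware word.
# The 8-bit packing is a hypothetical format for illustration.

def sign_extend_8(byte):
    """Interpret an 8-bit value as a signed two's-complement integer."""
    return byte - 256 if byte & 0x80 else byte

packed = bytes([0x7F, 0x80, 0xFF, 0x01])
print([sign_extend_8(b) for b in packed])  # [127, -128, -1, 1]
```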
`
`The distance engine may read the acoustic models from a dedicated on-chip memory.
`
`The distance engine may read the acoustic models from a dedicated off-chip memory.
`
`The distance engine may read the acoustic models from a shared on-chip memory. The
`
`distance engine may read the acoustic models from a shared off-chip memory. Any of
`these acoustic models may be compressed.
`
The distance engine may compute a CRC or checksum or similar signature as it reads in
the acoustic model, compare this to a stored CRC, checksum, or signature in order to
check that the acoustic model has not been corrupted, and signal an error condition if
such corruption is detected. The stored CRC, checksum, or signature may have been
pre-computed and stored in the model data, or it may be computed at the time the model
data is loaded into the Acoustic Model Memory. It may be held in Acoustic Model
Memory, or it may be loaded into a register or another memory from where it can be
accessed and compared when the CRC/checksum/signature is computed each time the
Acoustic Model is loaded.
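The integrity check described above can be sketched as follows, using zlib's CRC-32 purely as a stand-in for whichever CRC, checksum, or signature an implementation actually uses:

```python
# Model-integrity check: recompute the CRC while reading the model and
# compare against the stored value, signalling an error on mismatch.
# zlib.crc32 stands in for the hardware's actual CRC/checksum/signature.
import zlib

def load_model(model_bytes, stored_crc):
    """Return the model if its CRC matches; otherwise signal corruption."""
    if zlib.crc32(model_bytes) != stored_crc:
        raise IOError("acoustic model corrupted: CRC mismatch, reload required")
    return model_bytes

model = b"\x01\x02\x03\x04"
good_crc = zlib.crc32(model)
load_model(model, good_crc)               # intact model: passes silently
try:
    load_model(model + b"\x00", good_crc)  # corrupted model: error raised
except IOError as e:
    print(e)
```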
`
`The distance engine may support the pass-through of data from the front-end to the
`search-stage. The data to be passed through will be supplied to the distance engine as an
`
`adjunct to the feature vector, and the distance engine will pass it to the search stage as
`
`
an adjunct to the distance results. This provides a simple mechanism for passing frame-specific
data that is not involved in the distance calculation through to the search stage,
`
`and keeping it associated with the correct frame, which may otherwise be complex in
`
`pipelined systems with multiple processors. The data passed through may be for any
`
`purpose. Examples might include silence detected, end-of-audio-stream detected, a
`
`frame number, information that some intervening frames have been dropped, or data
`
`from another input device such as a button or keyboard in a multi-modal interface.
`
The distance engine may be implemented in hardware, software, or a combination.
Other stages may be implemented in hardware, software, or a combination. The
distance engine may be implemented with any number representation
`
`format including fixed point or floating point arithmetic, or any mixture of number
`
`representation formats.
`
`In particular, the other stages may be implemented on a CPU, or on a DSP and CPU.
`
`The "DSP" and "CPU" may each be implemented as software programmable devices.
`
`The distance engine may implement one or more additional pipeline stages to overcome
`
`delays introduced by low bandwidth, high latency, or conflicted bus interfaces. The
`
`distance engine may also implement additional pipeline stages to maintain the same
`throughput while allowing more time for each distance calculation. Particular
`
`embodiments of the invention may include one or more of the above aspects.
`
`A further aspect of the invention comprises a speech recognition circuit, comprising: an
`
`audio front end for calculating a feature vector from an audio signal, wherein the feature
`
`vector comprises a plurality of extracted and/or derived quantities from said audio
`
`signal during a defined audio time frame; calculating circuit for calculating a distance
`indicating the similarity between a feature vector and a predetermined acoustic state of
`
`an acoustic model; and a search stage for using said calculated distances to identify
`
`words within a lexical tree, the lexical tree comprising a model of words; a buffer
`
`memory between the calculating circuit and the search stage, for receiving data passing
`from the calculating circuit to the search stage, wherein a processor in the search stage
`has higher bandwidth and/or lower latency access to the buffer compared to the
`
`
`bandwidth and/or latency of direct transfer between the calculating circuit and the
`
`search stage. The data transfer from the calculating circuit to the buffer memory and/or
`
`from the buffer memory to the search stage may be performed as one or more sequential
`
`bursts. The data transfer to the buffer memory may be performed in parallel with data
`
`transfer to the calculating circuit and/or in parallel with data transfer to the search stage.
`
`A second buffer memory may be provided between the audio front end and the
`
`calculating circuit.
`
`A further aspect of the invention comprises a speech recognition circuit, comprising: an
`
`audio front end for calculating a feature vector from an audio signal, wherein the feature
`
`vector comprises a plurality of extracted and/or derived quantities from said audio
`
`signal during a defined audio time frame; calculating circuit for calculating a distance
`
`indicating the similarity between a feature vector and a predetermined acoustic state of
`
`an acoustic model; and a search stage for using said calculated distances to identify
`
`words within a lexical tree, the lexical tree comprising a model of words; comprising an
`
`elastic buffer between at least one of the front end and calculating circuit, or the
`
`calculating circuit and search stage, and/or for buffering said audio signal.
`
`A further aspect of the invention comprises an accelerator for a speech recognition
`
`circuit, the accelerator comprising: calculating means for calculating a distance
`
`indicating the similarity between a feature vector and a predetermined acoustic state of
`
`an acoustic model, wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from an audio signal during a defined audio time frame; means for
`
`comparing a first version of a stored checksum of data representing said acoustic model
`
`and a second version of said stored checksum, wherein the second version is obtained
`
`from an updated measurement and calculation of the checksum; and means for
`
`indicating an error status if the checksums do not match.
`
`A further aspect of the invention comprises an accelerator for a speech recognition
`
`circuit, the accelerator comprising: calculating means for calculating a distance
`
`indicating the similarity between a feature vector and a predetermined acoustic state of
`an acoustic model, wherein the feature vector comprises a plurality of extracted and/or
`
`derived quantities from an audio signal during a defined audio time frame; wherein said
`
`
`accelerator is configured to autonomously compute distances for every acoustic state
`
`defined by the acoustic model.
`
`A further aspect of the invention comprises an accelerator for calculating distances for
`
`a speech recognition circuit, the accelerator comprising: calculating circuit for
`calculating distances indicating the similarity between a feature vector and a plurality of
`
`predetermined acoustic states of an acoustic model, wherein the feature vector
`comprises a plurality of extracted and/or derived quantities from an audio signal during
`
`a defined audio time frame; first and second storage circuit, which may be referred to as
`
`result memories, each for storing calculated distances for at least one said audio time
`
`frame, and for making said stored distances available for use by another part of the
`
`speech recognition circuit; control circuit for controlling read and write access to the
`
`first and second storage circuit, said control means being configured to allow writing to
`
`one said storage means while the other said storage means is available for reading, to
`
`allow first calculated distances for one audio time frame to be written to one said
`
`storage means while second calculated distances for an earlier audio time frame are
`
`made available for reading from the other said storage means.
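The two result memories described above operate as a ping-pong (double) buffer: distances for one frame are written into one memory while the previous frame's distances are read from the other, after which the roles swap. A toy model, with illustrative names:

```python
# Ping-pong result memories: the distance engine writes frame t into one
# buffer while the search stage reads frame t-1 from the other, then swap.
# Class and method names are hypothetical.

class PingPongResults:
    def __init__(self):
        self.buffers = [[], []]
        self.write_idx = 0                # the engine writes here

    def write_frame(self, distances):
        self.buffers[self.write_idx] = list(distances)

    def swap(self):
        self.write_idx ^= 1               # writer and reader exchange roles

    def read_frame(self):
        return self.buffers[self.write_idx ^ 1]  # search reads the other one

res = PingPongResults()
res.write_frame([0.1, 0.2])   # frame 0 written
res.swap()
res.write_frame([0.3, 0.4])   # frame 1 written while...
print(res.read_frame())       # ...frame 0 is readable: [0.1, 0.2]
```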
`
`Embodiments of the invention may comprise means for generating a checksum or
`
`computed signature for the acoustic model data stored in the memory, and means for
`
`comparing checksums or computed signatures that have been calculated at different
`
`times, to indicate an error status if the checksums do not match, one possible cause of
`
`such mismatch being that the acoustic model data has been overwritten by said other
`
data and said error status being used to indicate that the acoustic model should be
re-loaded into the said memory.
`
`A further aspect of the invention comprises a speech recognition circuit comprising:
`lexical memory containing lexical data for word recognition, said lexical data
`
`comprising a lexical tree data structure comprising a model of words; means for
`
`accessing a state model corresponding to each phone or each group of phones in the
`
`lexical tree; a content addressable memory for storing content addressable data for each
`
`phone or group of phones, including states corresponding to said phone or group of
`
`phones, and for storing an address value for each said phone or group of phones; a
`RAM configured to store accumulated scores for each said phone or group of phones,
`
`
`the accumulated scores being addressable by said address value for each said phone or
`
`group of phones; means to obtain scores that each of a plurality of frames of an audio
`
signal corresponds to each of a plurality of said states; a counter to sequentially search
`
`for each said state in the content addressable memory, to obtain the corresponding
`
`address value if the state is found in the content addressable memory; means to use said
`
`address value to access an accumulated likelihood and an accumulator to add said
`
`likelihood to the accumulated likelihood; means to use the phones or groups of phones
`
`with the highest accumulated scores to obtain a plurality of next phones from the lexical
`
`tree which correspond to the next phone; and output means for outputting a lexical tree
`
`path of highest likelihood.
`
`A further aspect of the invention comprises speech recognition apparatus comprising: a
`
`lexical tree having a corresponding state model; means for obtaining scores of an audio
`
`input corresponding to each of a plurality of states in said state model; a content
`
`addressable memory for storing a marker indicating a part of the lexical tree, and one or
`
`more states associated with said part of the lexical tree; a random access memory
`
`addressable by the CAM output, to output accumulated scores for states corresponding
`
`to said parts of the lexical tree; adder means for adding likelihood to said accumulated
`
`likelihood, to be stored back in the RAM.
`
`A further aspect of the invention comprises speech recognition apparatus, comprising: a
`
`CAM-RAM arrangement for storing records including pointers to a lexical tree, and
`
`accumulative scores for states within the lexical tree; input means for obtaining scores
`
`that an audio frame corresponds to a particular state in the lexical tree; an accumulator
`
`for calculating the updated scores and modifying the records in the CAM-RAM
`
`accordingly; output means for outputting a path of highest likelihood in the lexical tree.
`
`A further aspect of the invention comprises a speech recognition method comprising:
`
`storing state identifiers which identify states corresponding to nodes or groups of
`
`adjacent nodes in a lexical tree, and scores corresponding to said state identifiers in a
`
`memory structure, the lexical tree comprising a model of words and the memory
`
`structure being adapted to allow lookup to identify particular state identifiers, reading of
`
`the scores corresponding to the state identifiers, and writing back of the scores to the
`
`
`memory structure after modification of the scores; repeating the following sequence of
`
`steps for each of a plurality of incoming frames of an audio signal; obtaining score
`
`updates corresponding to the likelihoods that said frame of the audio signal corresponds
`
`to each of a plurality of said states; accessing said memory structure to obtain scores,
`
`updating the scores by adding score updates to the scores, and writing back the updated
`
`scores to the memory structure; determining if scores for states furthest on in the lexical
`
`tree correspond to a significant likelihood, and if so, then accessing the lexical tree to
`
`determine the next set of possible states; and storing the next set of possible states and
`
`said scores of significant likelihood in the memory structure.
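The per-frame sequence above can be condensed into a toy software loop. Real systems would perform the lookup in the memory structure and extend the search into the lexical tree; here that step is reduced to a simple threshold test, and all names and values are illustrative.

```python
# Toy per-frame update: add each state's score update to its accumulated
# score, write it back, and report which states retain significant
# likelihood (log-domain scores; threshold and values are made up).

def process_frame(scores, updates, threshold):
    for state, upd in updates.items():
        if state in scores:
            scores[state] += upd   # read, modify, write back
    # states "furthest on" with significant likelihood would trigger a
    # lexical-tree access here; we just report which states survive
    return [s for s, v in scores.items() if v >= threshold]

scores = {"s1": -10.0, "s2": -4.0}
survivors = process_frame(scores, {"s1": -1.0, "s2": -0.5}, threshold=-6.0)
print(scores)     # {'s1': -11.0, 's2': -4.5}
print(survivors)  # ['s2']
```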
`
`A further aspect of the invention comprises speech recognition circuit comprising: a
`
`circuit for providing state identifiers which identify states corresponding to phones or
`
`groups of adjacent phones in a lexical tree, and for providing scores corresponding to
`
`said state identifiers, the lexical tree comprising a model of words; a memory structure
`
`for receiving and storing state identifiers and phone instance identifiers uniquely
`
`identifying instances of phones or groups of phones in the lexical tree; said memory
`
`structure being adapted to allow lookup to identify particular state identifiers, reading of
`
`the scores corresponding to the state identifiers, and writing back of the scores to the
`
`memory structure after modification of the scores; an accumulator for receiving score
`
`updates corresponding to particular state identifiers from a score update generating
`
`circuit which generates the score updates using audio input, for receiving scores from
`
`the memory structure, and for modifying said scores by adding said score updates to
`
`said scores; and a selector circuit for selecting at least one phone instance identifier
`
`according to said scores.
`
`A speech recognition apparatus according to the invention may be embedded in or
`
`included in a mobile electronic device such as a mobile telephone, PDA (personal
`
`digital assistant), etc.
`
`Embodiments of the present invention will now be described, by way of example only,
`with reference to the accompanying drawings, in which:
`
`Figure 1 is a block diagram of the system architecture for a speech recognition
`
`apparatus according to an embodiment of the invention;
`
`Figure 2 is a block diagram showing the main data structures used in the speech
`
`recognitio