PTO-1390 (Rev. 07-2005)
Approved for use through 03/31/2007. OMB 0651-0021
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless it displays a valid OMB control number.

IAP8 Rec'd PCT/PTO 14 MAR 2007

INTERNATIONAL APPLICATION NO.: PCT/GB2005/003554
ATTORNEY'S DOCKET NUMBER: M0025.0369/P369

20. [X] Other: WO 2006/030214, Notification Concerning Submission of Priority Document, Intl. Search Report

The following fees have been submitted                                 CALCULATIONS

21. Basic national fee (37 CFR 1.492(a)) ........................ $300      300.00

22. Examination fee (37 CFR 1.492(c))
    If the written opinion prepared by ISA/US or the international preliminary
    examination report prepared by IPEA/US indicates all claims satisfy
    provisions of PCT Article 33(1)-(4) ........................... $0
    All other situations ........................................ $200      200.00

23. Search fee (37 CFR 1.492(b))
    If the written opinion of the ISA/US or the international preliminary
    examination report prepared by IPEA/US indicates all claims satisfy
    provisions of PCT Article 33(1)-(4) ........................... $0
    Search fee (37 CFR 1.445(a)(2)) has been paid on the international
    application to the USPTO as an International Searching Authority $100
    International Search Report prepared by an ISA other than the US and
    provided to the Office or previously communicated to the US
    by the IB .................................................... $400      400.00
    All other situations ........................................ $500

TOTAL OF 21, 22 and 23 =                                                    900.00

[X] Additional fee for specification and drawings filed in paper over 100 sheets
(excluding sequence listing in compliance with 37 CFR 1.821(c) or (e) or
computer program listing in an electronic medium) (37 CFR 1.492(j)).
The fee is $250 for each additional 50 sheets of paper or fraction thereof.

    Total Sheets    Extra Sheets    Number of each additional 50 or fraction
                                    thereof (round up to a whole number)    RATE
    104             -100 = 4        /50 = 1                            x $250.00    250.00

Surcharge of $130 for furnishing any of the search fee, examination fee, or the
oath or declaration after the date of commencement of the national stage
(37 CFR 1.492(h)).

CLAIMS                  NUMBER FILED    NUMBER EXTRA    RATE
Total claims            39 - 20 =       19              x $50.00            950.00
Independent claims      6 - 3 =         3               x $200.00           600.00
MULTIPLE DEPENDENT CLAIM(S) (if applicable)             +

TOTAL OF ABOVE CALCULATIONS =                                             2,700.00

[X] Applicant claims small entity status. See 37 CFR 1.27. Fees above are
reduced by 1/2.

SUBTOTAL =                                                                1,350.00

Processing fee of $130.00 for furnishing the English translation later than
30 months from the earliest claimed priority date (37 CFR 1.492(i)).

TOTAL NATIONAL FEE =                                                      1,350.00

Fee for recording the enclosed assignment (37 CFR 1.21(h)). The assignment must
be accompanied by an appropriate cover sheet (37 CFR 3.28, 3.31).
$40.00 per property                                                     +    40.00

TOTAL FEES ENCLOSED =                                                     1,390.00

Amount to be refunded: $
Amount to be charged:  $

Page 2 of 3
DSMDB-2229206v0
`
`IPR2023-00034
`Apple EX1002 Page 3
11/662704
`
A Speech Recognition Circuit and Method
`
The present invention relates to speech recognition circuits and methods. These circuits and methods have wide applicability, particularly for devices such as mobile electronic devices.

There is growing consumer demand for embedded speech recognition in mobile electronic devices, such as mobile phones, dictation machines, PDAs (personal digital assistants), mobile games consoles, etc. For example, email and text message dictation, note taking, form filling, and command and control applications are all potential applications of embedded speech recognition.

However, when a medium to large vocabulary is required, effective speech recognition for mobile electronic devices has many difficulties not associated with speech recognition systems in hardware systems such as personal computers or workstations. Firstly, the available power in mobile systems is often supplied by battery, and may be severely limited. Secondly, mobile electronic devices are frequently designed to be as small as practically possible. Thus, the memory and resources of such mobile embedded systems tend to be very limited, due to power and space restrictions. The cost of providing extra memory and resources in a mobile electronic device is typically much higher than that for a less portable device without this space restriction. Thirdly, the mobile hardware may typically be used in a noisier environment than that of a fixed computer, e.g. on public transport, near a busy road, etc. Thus, a more complex speech model and more intensive computation may be required to obtain adequate speech recognition results.

These restrictions have made it difficult to implement effective speech recognition in mobile devices, other than with very limited vocabularies.

Some prior art schemes have been proposed to increase the efficiency of speech recognition systems, in an attempt to make them more suitable for use in mobile technology.
`
`
In an article entitled "A low-power accelerator for the SPHINX 3 speech recognition system", in University of Utah, International conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov 2003, Davis et al have proposed the idea of using a special purpose co-processor for up-front calculation of the computationally expensive Gaussian output probabilities of audio frames corresponding to particular states in the acoustic model.

In an article entitled "Hardware Speech Recognition in Low Cost, Low Power Devices", University of California, Berkeley, CS252 Class Project, Spring 2003, Sukun Kim et al describe using special purpose processing elements for each of the nodes in the network to be searched. This effectively implies having a single processing element for each phone in the network. An alternative suggested by Sukun Kim et al is to provide a processor for each state in the network.
`
In an article entitled "Dynamic Programming Search for Continuous Speech Recognition" in IEEE Signal Processing Magazine, Sept 1999, Ney et al discuss language model lookahead. Language model lookahead involves computation of a language model factor for each node (i.e. phone) in the lexical tree. This technique is also known as smearing. Each phone instance in the search network can be given a language model factor when it is used in the lexical tree search. Ney et al show that for an example bigram language model, the average number of states per 10 ms frame can be reduced from around 168,000 states with no language model lookahead to around 8,000 states when language model lookahead is used. They also show that bigram language model lookahead requires about a quarter of the states compared with unigram language model lookahead.
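The lookahead computation described by Ney et al can be sketched in a few lines of software. The sketch below uses a unigram lookahead for simplicity; the tree layout, node names and probabilities are illustrative assumptions, not details taken from the article or from this specification.

```python
# A minimal sketch of unigram language-model lookahead ("smearing"):
# each lexical-tree node is assigned the best language-model probability
# of any word reachable through it, so pruning can use language-model
# information before a word end is reached.

def smear(children, word_prob, root):
    """children: node -> list of child nodes; word_prob: leaf -> unigram
    probability. Returns a dict mapping every node to its lookahead factor."""
    factors = {}

    def visit(node):
        if node in word_prob:                       # leaf: a complete word
            factors[node] = word_prob[node]
        else:                                       # interior: best reachable word
            factors[node] = max(visit(c) for c in children[node])
        return factors[node]

    visit(root)
    return factors

# Tiny illustrative tree: root -> 'k' -> two complete words
children = {"root": ["k"], "k": ["kat", "kar"]}
word_prob = {"kat": 0.02, "kar": 0.05}
factors = smear(children, word_prob, "root")
# Interior nodes inherit the best reachable word probability,
# e.g. factors["k"] == 0.05
```

In a real recogniser the factor would be applied per phone instance during the lexical tree search, which is what allows the large reduction in active states that Ney et al report.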
`
Although these prior art documents provide improvements to speech recognition in embedded mobile technology, further improvement is still needed to provide a larger vocabulary and better accuracy.

One aspect of the present invention provides a speech recognition circuit including a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers. The lexical tree includes a model of words. The speech recognition circuit also has a memory structure for receiving and storing state identifiers identified by a node identifier identifying nodes or groups of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores. An accumulator is provided for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores. A selector circuit is used for selecting at least one node or group of nodes of the lexical tree according to said scores.
`
One suitable type of hardware for the memory structure includes a content addressable memory (CAM). A CAM is a memory unit which stores a series of data items using a series of addresses. However, the memory is accessed by specifying a data item, such that the CAM returns the corresponding address. This contrasts with a random access memory (RAM) in which the memory is accessed by specifying an address, such that the RAM returns the corresponding data item.

However, the memory structure is not limited to including a CAM. Other types of hardware are also possible, to provide this functionality. For example, a single chip which operates in the same way as a CAM and RAM may be used instead.
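The CAM/RAM distinction above can be illustrated with a toy software model; this is a behavioural sketch for illustration only, not the patent's hardware, and the class and state names are assumptions.

```python
# Toy model of a content addressable memory: a RAM maps address -> data,
# while a CAM is queried with a data item and returns the address at which
# that item is stored.

class ToyCAM:
    def __init__(self):
        self._cells = {}              # address -> stored data item

    def write(self, address, data):
        self._cells[address] = data

    def lookup(self, data):
        """CAM-style access: return the address holding `data`,
        or None on a CAM miss."""
        for address, stored in self._cells.items():
            if stored == data:
                return address
        return None

cam = ToyCAM()
cam.write(0, "state_17")
cam.write(1, "state_42")
# RAM-style access would index by address; CAM-style access inverts it:
addr = cam.lookup("state_42")         # -> 1
miss = cam.lookup("state_99")         # -> None (not stored)
```

A hardware CAM performs this comparison against all cells in parallel in a single cycle, which is what makes the lookup-by-content pattern attractive for the score lookups described here.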
`
Embodiments of the present invention provide a solution to the problem of how to map a lexical tree search to a CAM system architecture. The realisation by the present inventors that certain speech recognition data structures can be mapped into the CAMs allows a lexical tree search to be performed using a CAM system architecture.

Further embodiments of the invention include a counter for sequentially generating state identifiers, and using said generated state identifiers to sequentially look up said states in the memory structure.
`
The node identifier may comprise a direct reference to the lexical tree. However, in some embodiments, the node identifier for at least some of the states includes a pointer to a node identifier for another state. For example, a state corresponding to the furthest part of the search path in the lexical tree may be referenced by a node identifier which directly links to a particular node or group of nodes in the lexical tree. In a lexical tree comprising phones, using a state model of triphones, the node identifier may indicate the position of a triphone in the lexical tree.

However, in this example, for states occurring further back in the search path, instead of supplying a node identifier linking directly to the lexical tree, a pointer to a node identifier of another state may be supplied. E.g. a triphone instance may have a pointer to another triphone instance, which has a pointer to another triphone instance, which has a pointer to a node or group of nodes in the lexical tree. Chains of reference may be set up in this way, where only the last state in the chain has a direct pointer to the lexical tree.
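The chain-of-reference idea can be sketched as follows; the record layout, instance numbers and node names are hypothetical, chosen only to illustrate how a pointer chain resolves to a direct lexical tree reference.

```python
# Each phone-instance record either points directly at a lexical-tree node
# or at another phone instance; only the last record in a chain holds the
# direct link to the tree.

records = {
    # instance_id: ("node", tree_node) for a direct reference,
    #              ("inst", other_id)  for a pointer to another instance
    7: ("node", "tree_node_B3"),   # furthest-on instance: direct reference
    6: ("inst", 7),                # earlier instances point along the chain
    5: ("inst", 6),
}

def resolve(records, instance_id):
    """Follow instance pointers until a direct lexical-tree
    reference is reached, and return that tree node."""
    kind, value = records[instance_id]
    while kind == "inst":
        kind, value = records[value]
    return value

node = resolve(records, 5)         # all three instances resolve to the
                                   # same lexical-tree node
```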
`
There may not be a one-to-one correspondence between the nodes of the lexical tree and the node identifiers. This will occur for a branched lexical tree, where the nodes represent monophones, but the acoustic model states represent triphones, i.e. groups of three adjacent monophones. Then, paths of three monophones will have unique identifiers to be stored in the memory structure, rather than single monophones having unique identifiers.

Phone instance numbers may be generated, and used to uniquely label each phone instance. They can be generated sequentially, using a counter. The phone instance numbers may be used as pointers between phone instances to assist in node identification. It is thus not essential to provide a direct node identifier for each phone instance to directly indicate a location in the lexical tree. The dynamic network of phone instances provided in the memory structure may thus include both direct and relative references to the lexical tree.
`
The memory structure may be divided into one part which stores phone instance identifiers and direct references to the lexical tree, and a second part which stores phone instance identifiers and corresponding states. This can speed up the processing, by only storing the phone instances which are furthest on in the lexical tree in the first part of the memory structure.

The memory structure may also be divided into separately accessible units, to reduce the amount of data in each unit, thereby decreasing the chance of finding the same two state identifiers in different phone instances in any single memory unit, and increasing the chance of some state identifiers being completely absent from any single memory unit. This makes it easier to deal with the situation when the same two state identifiers are found, because a spare time slot is available for processing when a state identifier is not present.
`
A further aspect of the invention provides a distance calculation engine within a speech recognition system. The distance calculation engine may be included within an accelerator. The accelerator may include logic to interface with other parts of a speech recognition circuit, in addition to the distance engine, although this is not essential. For example, the accelerator may include one or more results memories for storing distances calculated by the distance calculation engine. The accelerator may also include at least one of a memory for storing one or more acoustic models, a decompressor for decompressing acoustic data that has been stored in a compressed format, a memory for storing feature vectors, a checksum or data signature calculation means, buffers for data storage, and data registers. The accelerator may be implemented in software or in hardware, or in a combination. It may be physically separate from the rest of the speech recognition circuit, although this is not essential.
`
The distance calculation engine may calculate one or more of a wide range of distance metrics and probability distributions. The distances may represent the likely correspondence of feature vectors to states in an acoustic model. In other words, the distances can indicate the similarity of an audio data frame to each possible state in an acoustic model.

There are a wide variety of probability distributions that can be used for the distance calculation stage of a speech recogniser, and a wide variety of distance metrics used. These are widely documented in the literature. A point is a simple example of a probability distribution.
`
A common choice is to use Gaussian distributions and correspondingly the Mahalanobis distance metric. The Gaussian probability distribution is then defined by a mean vector, which defines the centre point in the N-dimensional space, and a covariance matrix which defines the shape of the probability distribution. It is common to restrict the covariance matrix to be a diagonal matrix (only N non-zero values along the diagonal of the NxN matrix), which significantly lowers the implementation cost by reducing the number of arithmetic operations.

In particular embodiments, the distance calculated is a Mahalanobis distance. Particular examples of this are described later in the specification.
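The cost saving from a diagonal covariance can be seen directly in a sketch of the calculation: the distance reduces to a sum of independent per-dimension terms, with no matrix inversion or full matrix-vector products. The values below are illustrative, not taken from the specification.

```python
# Squared Mahalanobis distance with a diagonal covariance matrix:
# sum over dimensions of (x_i - mu_i)^2 / sigma_i^2.
# With only N diagonal variances, the cost is N multiply-accumulate-style
# operations instead of the O(N^2) work a full covariance would need.

def mahalanobis_diag_sq(x, mean, var):
    """x: feature vector; mean: Gaussian mean vector;
    var: the N diagonal entries of the covariance matrix."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

# Example in 2 dimensions:
d2 = mahalanobis_diag_sq(x=[1.0, 2.0], mean=[0.0, 0.0], var=[1.0, 4.0])
# (1-0)^2/1 + (2-0)^2/4 = 1 + 1 = 2
```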
`
In one embodiment, the distance engine autonomously computes all of the distances associated with a given feature vector. This may comprise computing distances for every state in the lexicon. The distance engine may operate in a pipelined manner with other stages of the recognition process. In this context a distance is an indication of the probability or likelihood that a feature vector corresponds to a particular state. An important class of distance computation in speech recognition is the calculation of output state probabilities in recognisers using Hidden Markov Models. Another possible use is in recognisers using Neural Networks.
`
The distance engine reads data from the acoustic models to use as parameters in the calculation. The acoustic models may be optionally stored in a compressed format. The distance engine may read and de-compress the acoustic models one (or more) times for each feature vector processed. Each reading of the acoustic models may require reading the entire acoustic model, or various optimisations may be implemented to avoid reading parts of the acoustic model that are not required to be used for calculations with the current feature vector. The distance engine may use a de-compression method where the de-compression is sign or zero extension or may otherwise convert data of narrow or variable width to a wider data format. The distance engine may use a de-compression method where the de-compression is sign or zero extension or may otherwise convert data of narrow or variable width to IEEE standard single or double precision floating point format. The distance engine may use a decompression method where decompression is a codebook decompression of a binary bitstream, where the codebook is stored as part of the acoustic model data. The distance engine may use a decompression method where the decompression is decompression of a Huffman or Lempel-Ziv compressed stream. The distance engine may use a decompression method where decompression is decompression of run length encoded data. The distance engine may use a decompression method where decompression is decompression of difference encoded data. The distance engine may use a decompression method using any well known lossy or lossless compression scheme. The distance engine may use a decompression method using subspace distribution clustering. The distance engine may use a decompression method comprising any combination of the above described decompression types.
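The simplest of the de-compression options named above, sign extension of narrow stored values to a wider format, can be sketched as follows; the 8-bit layout and the example parameter bytes are assumptions for illustration only.

```python
# Sign-extending narrow (here 8-bit) stored model parameters to a wider
# integer format, as one of the de-compression options described above.

def sign_extend_8(byte):
    """Interpret an 8-bit value as two's-complement signed and widen it."""
    return byte - 256 if byte & 0x80 else byte

compressed = bytes([0x05, 0xFB, 0x7F, 0x80])      # narrow stored parameters
widened = [sign_extend_8(b) for b in compressed]  # -> [5, -5, 127, -128]
```

In hardware this widening is essentially free (wiring the top bit to the extra lanes), which is why it is attractive as a low-cost compression format for acoustic model parameters.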
`
The distance engine may read the acoustic models from a dedicated on-chip memory. The distance engine may read the acoustic models from a dedicated off-chip memory. The distance engine may read the acoustic models from a shared on-chip memory. The distance engine may read the acoustic models from a shared off-chip memory. Any of these acoustic models may be compressed.
`
The distance engine may compute a CRC or checksum or similar signature as it reads in the acoustic model and compare this to a stored CRC, checksum, or signature, in order to check that the acoustic model has not been corrupted, and signal an error condition if such corruption is detected. The stored CRC, checksum, or signature may have been pre-computed and stored in the model data, or it may be computed at the time the model data is loaded into the Acoustic Model Memory. It may be held in Acoustic Model Memory, or it may be loaded into a register or another memory from where it can be accessed and compared when the CRC/checksum/signature is computed each time the Acoustic Model is loaded.
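The integrity check described above can be sketched in software. CRC-32 is used here only as a concrete choice; the specification leaves the exact CRC, checksum, or signature open, and the function and error type below are illustrative assumptions.

```python
# Sketch of the acoustic-model integrity check: recompute a signature over
# the model data as it is read and compare it with the stored value,
# signalling an error condition on mismatch.

import zlib

def load_model(model_bytes, stored_crc):
    """Return the model data if its CRC-32 matches the stored CRC,
    otherwise raise an error condition."""
    if zlib.crc32(model_bytes) != stored_crc:
        raise IOError("acoustic model corrupted: CRC mismatch")
    return model_bytes

model = b"\x01\x02\x03acoustic-model-data"
crc = zlib.crc32(model)        # pre-computed when the model was stored
load_model(model, crc)         # passes: data is intact
# load_model(model + b"!", crc) would raise IOError
```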
`
The distance engine may support the pass-through of data from the front-end to the search-stage. The data to be passed through will be supplied to the distance engine as an adjunct to the feature vector, and the distance engine will pass it to the search stage as an adjunct to the distance results. This provides a simple mechanism for passing frame-specific data that is not involved in the distance calculation through to the search stage, and keeping it associated with the correct frame, which may otherwise be complex in pipelined systems with multiple processors. The data passed through may be for any purpose. Examples might include silence detected, end-of-audio-stream detected, a frame number, information that some intervening frames have been dropped, or data from another input device such as a button or keyboard in a multi-modal interface.
`
The distance engine may be implemented in hardware, software, or a combination. Other stages may be implemented in hardware, software, or a combination. The distance engine may be implemented with any number representation format including fixed point or floating point arithmetic, or any mixture of number representation formats.

In particular, the other stages may be implemented on a CPU, or on a DSP and CPU. The "DSP" and "CPU" may each be implemented as software programmable devices.

The distance engine may implement one or more additional pipeline stages to overcome delays introduced by low bandwidth, high latency, or conflicted bus interfaces. The distance engine may also implement additional pipeline stages to maintain the same throughput while allowing more time for each distance calculation. Particular embodiments of the invention may include one or more of the above aspects.
`
A further aspect of the invention comprises a speech recognition circuit, comprising: an audio front end for calculating a feature vector from an audio signal, wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame; calculating circuit for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words; a buffer memory between the calculating circuit and the search stage, for receiving data passing from the calculating circuit to the search stage, wherein a processor in the search stage has higher bandwidth and/or lower latency access to the buffer compared to the bandwidth and/or latency of direct transfer between the calculating circuit and the search stage. The data transfer from the calculating circuit to the buffer memory and/or from the buffer memory to the search stage may be performed as one or more sequential bursts. The data transfer to the buffer memory may be performed in parallel with data transfer to the calculating circuit and/or in parallel with data transfer to the search stage. A second buffer memory may be provided between the audio front end and the calculating circuit.
`
A further aspect of the invention comprises a speech recognition circuit, comprising: an audio front end for calculating a feature vector from an audio signal, wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame; calculating circuit for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model; and a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words; comprising an elastic buffer between at least one of the front end and calculating circuit, or the calculating circuit and search stage, and/or for buffering said audio signal.
`
A further aspect of the invention comprises an accelerator for a speech recognition circuit, the accelerator comprising: calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model, wherein the feature vector comprises a plurality of extracted and/or derived quantities from an audio signal during a defined audio time frame; means for comparing a first version of a stored checksum of data representing said acoustic model and a second version of said stored checksum, wherein the second version is obtained from an updated measurement and calculation of the checksum; and means for indicating an error status if the checksums do not match.
`
A further aspect of the invention comprises an accelerator for a speech recognition circuit, the accelerator comprising: calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model, wherein the feature vector comprises a plurality of extracted and/or derived quantities from an audio signal during a defined audio time frame; wherein said accelerator is configured to autonomously compute distances for every acoustic state defined by the acoustic model.
`
A further aspect of the invention comprises an accelerator for calculating distances for a speech recognition circuit, the accelerator comprising: calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model, wherein the feature vector comprises a plurality of extracted and/or derived quantities from an audio signal during a defined audio time frame; first and second storage circuit, which may be referred to as result memories, each for storing calculated distances for at least one said audio time frame, and for making said stored distances available for use by another part of the speech recognition circuit; and control circuit for controlling read and write access to the first and second storage circuit, said control circuit being configured to allow writing to one said storage circuit while the other said storage circuit is available for reading, to allow first calculated distances for one audio time frame to be written to one said storage circuit while second calculated distances for an earlier audio time frame are made available for reading from the other said storage circuit.
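The two result memories behave as a double ("ping-pong") buffer, which can be modelled behaviourally as below; the class name, method names and distance values are illustrative assumptions, not the patent's hardware.

```python
# Behavioural model of the two result memories: distances for the current
# frame are written to one bank while the search stage reads the previous
# frame's distances from the other bank, the roles swapping each frame.

class PingPongResults:
    def __init__(self):
        self._banks = [[], []]
        self._write_bank = 0           # the other bank is readable

    def write_frame(self, distances):
        self._banks[self._write_bank] = list(distances)

    def swap(self):
        """Make the just-written frame readable and free the other bank."""
        self._write_bank ^= 1

    def read_frame(self):
        return self._banks[self._write_bank ^ 1]

buf = PingPongResults()
buf.write_frame([0.3, 1.7])            # frame 0 distances
buf.swap()
buf.write_frame([0.9, 0.2])            # frame 1 written while frame 0
frame0 = buf.read_frame()              # is read: -> [0.3, 1.7]
```

This is what lets the distance calculation for one audio time frame overlap with the search over the preceding frame, keeping both stages busy.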
`
Embodiments of the invention may comprise means for generating a checksum or computed signature for the acoustic model data stored in the memory, and means for comparing checksums or computed signatures that have been calculated at different times, to indicate an error status if the checksums do not match, one possible cause of such mismatch being that the acoustic model data has been overwritten by said other data and said error status being used to indicate that the acoustic model should be re-loaded into the said memory.
`
A further aspect of the invention comprises a speech recognition circuit comprising: lexical memory containing lexical data for word recognition, said lexical data comprising a lexical tree data structure comprising a model of words; means for accessing a state model corresponding to each phone or each group of phones in the lexical tree; a content addressable memory for storing content addressable data for each phone or group of phones, including states corresponding to said phone or group of phones, and for storing an address value for each said phone or group of phones; a RAM configured to store accumulated scores for each said phone or group of phones, the accumulated scores being addressable by said address value for each said phone or group of phones; means to obtain scores that each of a plurality of frames of an audio signal corresponds to each of a plurality of said states; a counter to sequentially search for each said state in the content addressable memory, to obtain the corresponding address value if the state is found in the content addressable memory; means to use said address value to access an accumulated likelihood and an accumulator to add said likelihood to the accumulated likelihood; means to use the phones or groups of phones with the highest accumulated scores to obtain a plurality of next phones from the lexical tree which correspond to the next phone; and output means for outputting a lexical tree path of highest likelihood.
`
A further aspect of the invention comprises speech recognition apparatus comprising: a lexical tree having a corresponding state model; means for obtaining scores of an audio input corresponding to each of a plurality of states in said state model; a content addressable memory for storing a marker indicating a part of the lexical tree, and one or more states associated with said part of the lexical tree; a random access memory addressable by the CAM output, to output accumulated scores for states corresponding to said parts of the lexical tree; and adder means for adding likelihood to said accumulated likelihood, to be stored back in the RAM.
`
A further aspect of the invention comprises speech recognition apparatus, comprising: a CAM-RAM arrangement for storing records including pointers to a lexical tree, and accumulative scores for states within the lexical tree; input means for obtaining scores that an audio frame corresponds to a particular state in the lexical tree; an accumulator for calculating the updated scores and modifying the records in the CAM-RAM accordingly; and output means for outputting a path of highest likelihood in the lexical tree.
`
A further aspect of the invention comprises a speech recognition method comprising: storing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and scores corresponding to said state identifiers, in a memory structure, the lexical tree comprising a model of words and the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; and repeating the following sequence of steps for each of a plurality of incoming frames of an audio signal: obtaining score updates corresponding to the likelihoods that said frame of the audio signal corresponds to each of a plurality of said states; accessing said memory structure to obtain scores, updating the scores by adding score updates to the scores, and writing back the updated scores to the memory structure; determining if scores for states furthest on in the lexical tree correspond to a significant likelihood, and if so, then accessing the lexical tree to determine the next set of possible states; and storing the next set of possible states and said scores of significant likelihood in the memory structure.
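The per-frame sequence above can be modelled in a few lines of software. The dictionary, threshold and state names below are illustrative stand-ins for the CAM/RAM memory structure and the lexical tree, not the claimed hardware.

```python
# Simplified software model of the per-frame loop: look up scores held
# against state identifiers, add the frame's score updates and write the
# scores back, then extend the search network from states whose scores
# indicate a significant likelihood.

def process_frame(store, score_updates, next_states, threshold):
    """store: {state_id: score}, standing in for the CAM/RAM structure.
    next_states: {state_id: [following state_ids]}, from the lexical tree."""
    for state_id, update in score_updates.items():
        if state_id in store:                      # lookup
            store[state_id] += update              # read, modify, write back
    for state_id, score in list(store.items()):
        if score > threshold:                      # significant likelihood
            for nxt in next_states.get(state_id, []):
                store.setdefault(nxt, score)       # store next possible states
    return store

store = {"s1": 0.0}
next_states = {"s1": ["s2"]}                       # from the lexical tree
process_frame(store, {"s1": 2.0}, next_states, threshold=1.0)
# store now holds s1 (score updated to 2.0) and the newly activated s2
```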
`
A further aspect of the invention comprises a speech recognition circuit comprising: a circuit for providing state identifiers which identify states corresponding to phones or groups of adjacent phones in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers and phone instance identifiers uniquely identifying instances of phones or groups of phones in the lexical tree, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one phone instance identifier according to said scores.
`
A speech recognition apparatus according to the invention may be embedded in or included in a mobile electronic device such as a mobile telephone, PDA (personal digital assistant), etc.

Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
`
Figure 1 is a block diagram of the system architecture for a speech recognition apparatus according to an embodiment of the invention;

Figure 2 is a block diagram showing the main data structures used in the speech recognitio
