Trials@uspto.gov
571-272-7822

Paper 10
Entered: June 12, 2023
UNITED STATES PATENT AND TRADEMARK OFFICE
____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

APPLE INC.,
Petitioner,

v.

ZENTIAN LIMITED,
Patent Owner.
____________

IPR2023-00035
Patent 10,062,377
____________

Before KEVIN F. TURNER, JEFFREY S. SMITH, and
CHRISTOPHER L. OGDEN, Administrative Patent Judges.

TURNER, Administrative Patent Judge.

DECISION
Granting Institution of Inter Partes Review
35 U.S.C. § 314
I. INTRODUCTION

A. Background

Apple Inc. (“Petitioner”) filed a Petition (Paper 1, “Pet.”) requesting institution of inter partes review of claims 1–6 of U.S. Patent No. 10,062,377 B2 (Ex. 1001, “the ’377 Patent”). Zentian Limited (“Patent Owner”) filed a Preliminary Response (Paper 6, “Prelim. Resp.”).

An inter partes review may be instituted only if “the information presented in the petition . . . and any [preliminary] response . . . shows that there is a reasonable likelihood that the petitioner would prevail with respect to at least 1 of the claims challenged in the petition.” 35 U.S.C. § 314(a) (2018). For the reasons given below, Petitioner has established a reasonable likelihood that it would prevail in showing the unpatentability of at least one of the challenged claims of the ’377 Patent. Accordingly, we institute an inter partes review of claims 1–6 of the ’377 Patent on the ground of unpatentability raised in the Petition.
B. Related Proceedings

Both parties identify the following judicial or administrative matters that would affect, or be affected by, a decision in this proceeding: Zentian Ltd. v. Apple Inc., Case No. 6:22-cv-00122 (W.D. Tex.); Zentian Ltd. v. Amazon.com, Inc., Case No. 6:22-cv-00123 (W.D. Tex.). Pet. 75; Paper 3, 1. Both parties also identify related inter partes reviews: IPR2023-00033, IPR2023-00034, IPR2023-00036, and IPR2023-00037. Id.
C. The ’377 Patent

The ’377 Patent is titled “Distributed Pipelined Parallel Speech Recognition System,” and is directed to a speech recognition system using multiple programmable devices to perform different steps of the recognition process. Ex. 1001, code (54), 5:63–6:6, 6:20–33. Figure 16, reproduced below, illustrates a speech recognition system comprising a front end 103, distance calculation engine 104, and search stage 106:

[Figure 16 of the ’377 Patent (not reproduced)]

Figure 16 illustrates a block diagram showing an embodiment of a speech recognition system, illustrating data flow between parts thereof.
The ’377 patent teaches that an “audio input for speech recognition” may be input to the front end (for example at Front End 103) in the form of digital audio or analog audio that is converted to digital audio using an analog to digital converter. Ex. 1001, 12:50–53. “The audio input is divided into time frames, each time frame typically being on the order of 10 ms.” Id. at 12:53–55. “For each audio input time frame, the audio signal is converted into a feature vector. This may be done by splitting the audio signal into spectral components,” such as, for instance, 13 components plus their first and second derivatives, creating a total of 39 components. Id. at 12:56–58. The feature vector thus “represents a point in an N-dimensional space,” where N is generally in the range of 20 to 39. Id. at 13:19–23.
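For orientation, the framing-and-featurization flow described above can be sketched in code. This is a minimal illustration, not the ’377 patent’s implementation: the spectral_components placeholder, the 16 kHz sample rate, and the simple delta computations are assumptions; only the roughly 10 ms frames and the 13-components-plus-derivatives layout (39 total) come from the passage.

```python
import numpy as np

def spectral_components(frame: np.ndarray, n: int = 13) -> np.ndarray:
    """Hypothetical stand-in for a real spectral analysis; any function
    returning n spectral components per frame would serve here."""
    spectrum = np.abs(np.fft.rfft(frame))
    return np.log(spectrum[:n] + 1e-10)

def feature_vectors(audio: np.ndarray, sample_rate: int = 16000,
                    frame_ms: int = 10) -> list:
    """Divide digital audio into ~10 ms frames and convert each frame into
    a 39-component feature vector: 13 spectral components plus their first
    and second derivatives (cf. Ex. 1001, 12:50-58)."""
    frame_len = sample_rate * frame_ms // 1000
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, frame_len)]
    base = [spectral_components(f) for f in frames]   # 13 components each
    vectors = []
    for t in range(len(base)):
        prev = base[max(t - 1, 0)]
        nxt = base[min(t + 1, len(base) - 1)]
        delta = (nxt - prev) / 2.0           # first derivative estimate
        delta2 = nxt - 2 * base[t] + prev    # second derivative estimate
        vectors.append(np.concatenate([base[t], delta, delta2]))  # 39 total
    return vectors
```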
Each feature vector is then passed to the calculating circuit, or distance calculation engine (for example element 104), which calculates a distance indicating the similarity between a feature vector and one or more predetermined acoustic states of an acoustic model. Ex. 1001, 5:63–6:2, 25:33–35 (“Each feature vector is transferred to a distance calculation engine circuit 204, to obtain distances for each state of the acoustic model.”). “The distance calculator stage of the recognition process computes a probability or likelihood that a feature vector corresponds to a particular state.” Id. at 13:24–26. “The likelihood of each state is determined by the distance between the feature vector and each state.” Id. at 13:1–2. The distance calculation may be a Mahalanobis distance using Gaussian distributions. Id. at 4:20–21. “The MHD (Mahalanobis Distance) is a distance between two N-dimensional points, scaled by the statistical variation in each component.” Id. at 13:13–15. The distance calculation engine or calculating circuit “may be included within an accelerator” (Ex. 1001, 3:59–61), which may be a “loosely bound co-processor for a CPU running speech recognition software,” and which “has the advantage of reducing computational load on the CPU, and reducing memory bandwidth load for the CPU.” Id. at 24:17–20; see Figs. 17–23.
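As a worked illustration of the quoted MHD definition, the sketch below scores a feature vector against each acoustic state, treating each state as a Gaussian with a diagonal covariance so that the distance is scaled by the statistical variation in each component. The diagonal-covariance simplification and the (means, variances) representation of states are assumptions of the sketch, not a form prescribed by the ’377 patent.

```python
import numpy as np

def distances_for_all_states(feature: np.ndarray, means: np.ndarray,
                             variances: np.ndarray) -> np.ndarray:
    """Mahalanobis-style distance from one feature vector to every state.

    means, variances: (S, N) per-component Gaussian parameters for S states
    (an assumed representation). A smaller distance indicates a higher
    likelihood that the feature vector corresponds to that state."""
    return np.sqrt(np.sum((feature - means) ** 2 / variances, axis=1))
```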
The distances calculated by the distance calculation engine are then transferred to search stage 106 of the speech processing circuit, which uses models, such as one or more word models and/or language models, to generate and output recognized text. Ex. 1001, 24:5–11. Search stage 106 may process the distance calculations using a Hidden Markov Model (HMM) or a neural network. Id. at 4:40–44.
Thus, “the DSP [element 103] provides a feature vector to the Accelerator, [element 104] and the accelerator provides a set of distance results to the CPU [element 105].” Ex. 1001, 27:45–50. The system thereby provides a speech recognition circuit with three programmable devices, per independent claim 1.
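Tying the stages together, here is a minimal sketch of that three-device flow, reusing the feature_vectors and distances_for_all_states helpers sketched above. The SearchStage stub is hypothetical (a real search stage would apply word and language models over a lexical tree); only the DSP-to-accelerator-to-CPU hand-off itself comes from the quoted passage.

```python
import numpy as np

class SearchStage:
    """Toy stand-in for the third device: records the nearest acoustic
    state per frame. Illustrates data flow only, not a real word search."""
    def __init__(self) -> None:
        self.state_sequence: list = []

    def consume(self, distances: np.ndarray) -> None:
        self.state_sequence.append(int(np.argmin(distances)))

def recognize(audio: np.ndarray, means: np.ndarray,
              variances: np.ndarray) -> list:
    """First device computes feature vectors, second computes per-state
    distances, third consumes the distances (cf. Ex. 1001, 27:45-50)."""
    search = SearchStage()
    for vector in feature_vectors(audio):                     # device 1
        search.consume(distances_for_all_states(vector, means, variances))
    return search.state_sequence                              # device 3 output
```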
D. Challenged Claims

Claim 1 is the sole independent claim challenged in this proceeding, with each of challenged claims 2–6 dependent on claim 1, directly or indirectly. Independent claim 1 is considered to be representative and is reproduced below:
    1. [1(Pre)] A speech recognition system comprising:
    [1(a)] a first programmable device programmed to calculate a feature vector from a digital audio stream, [1(b)] wherein the feature vector comprises a plurality of extracted and/or derived quantities from said digital audio stream during a defined audio time frame;
    [1(c)] a second programmable device programmed to calculate distances indicating the similarity between a feature vector and a plurality of acoustic states of an acoustic model [1(d)] wherein said feature vector is received by the second programmable device after it is calculated by the first programmable device; and
    [1(e)] a third programmable device programmed to identify spoken words in said digital audio stream using Hidden Markov Models and/or Neural Networks [1(f)] wherein said word identification uses one or more distances that were calculated by the second programmable device, [1(g)] wherein said identification of spoken words uses one or more distances calculated from a first feature vector; and
    [1(h)] a search stage for using the calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words.
Ex. 1001, 38:53–39:30 (with annotations provided by Petitioner, Pet. 77–78).
E. Asserted Ground of Unpatentability

Petitioner asserts the following ground of unpatentability (Pet. 5), supported by the declaration of Mr. Christopher Schmandt (Ex. 1003):

Claims Challenged    35 U.S.C. §    Reference(s)/Basis
1                    103(a)¹        Jiang,² Smyth³
2, 3                 103(a)         Jiang, Smyth, Nguyen⁴
4                    103(a)         Jiang, Smyth, Nguyen, Boike⁵
5                    103(a)         Jiang, Smyth, Nguyen, Baumgartner⁶
6                    103(a)         Jiang, Smyth, Nguyen, Boike, Baumgartner
____________
¹ The Leahy-Smith America Invents Act (“AIA”), Pub. L. No. 112-29, 125 Stat. 284, 285–88 (2011), revised 35 U.S.C. § 103 effective March 16, 2013. Because the challenged patent claims priority to an application filed before March 16, 2013, we refer to the pre-AIA version of § 103.
² U.S. Patent No. 6,374,219 B1, filed Feb. 20, 1998, issued Apr. 16, 2002 (Ex. 1004, “Jiang”).
³ U.S. Patent No. 5,819,222, issued Oct. 6, 1998 (Ex. 1005, “Smyth”).
⁴ U.S. Patent No. 6,879,954 B2, filed Apr. 22, 2002, issued Apr. 12, 2005 (Ex. 1047, “Nguyen”).
⁵ U.S. Patent No. 6,959,376 B1, filed Oct. 11, 2001, issued Oct. 25, 2005 (Ex. 1006, “Boike”).
⁶ U.S. Patent Publication 2002/0049582 A1, published Apr. 25, 2002 (Ex. 1007, “Baumgartner”).

II. ANALYSIS

A. Level of Ordinary Skill in the Art
Petitioner, supported by Mr. Schmandt’s testimony, proposes that a person of ordinary skill in the art at the time of the invention “would have had a master’s degree in computer engineering, computer science, electrical engineering, or a related field, with at least two years of experience in the field of speech recognition, or a bachelor’s degree in the same fields with at least four years of experience in the field of speech recognition.” Pet. 4 (citing Ex. 1003 ¶ 24). Patent Owner indicates that it does not challenge the qualifications proposed by Petitioner for a person of ordinary skill in the art. Prelim. Resp. 5.

At this stage of the proceeding, we find Petitioner’s proposal consistent with the level of ordinary skill in the art reflected by the prior art of record, see Okajima v. Bourdeau, 261 F.3d 1350, 1355 (Fed. Cir. 2001); In re GPAC Inc., 57 F.3d 1573, 1579 (Fed. Cir. 1995); In re Oelrich, 579 F.2d 86, 91 (CCPA 1978), and, therefore, we adopt Petitioner’s unopposed position as to the level of ordinary skill in the art for purposes of this decision.
B. Claim Construction

In this inter partes review, claims are construed using the same claim construction standard that would be used to construe the claims in a civil action under 35 U.S.C. § 282(b). See 37 C.F.R. § 42.100(b) (2020). The claim construction standard includes construing claims in accordance with the ordinary and customary meaning of such claims as understood by one of ordinary skill in the art at the time of the invention. See id.; Phillips v. AWH Corp., 415 F.3d 1303, 1312–14 (Fed. Cir. 2005) (en banc). In construing claims in accordance with their ordinary and customary meaning, we consider the specification and prosecution history. Phillips, 415 F.3d at 1315–17.

Neither party provides explicit claim constructions for claim features. Pet. 5–6; Prelim. Resp. 5. Therefore, we determine that it is not necessary to provide an express interpretation of any claim terms. See Nidec Motor Corp. v. Zhongshan Broad Ocean Motor Co. Matal, 868 F.3d 1013, 1017 (Fed. Cir. 2017); Vivid Techs., Inc. v. Am. Sci. & Eng’g, Inc., 200 F.3d 795, 803 (Fed. Cir. 1999) (“[O]nly those terms need be construed that are in controversy, and only to the extent necessary to resolve the controversy.”).
C. Legal Standards – Obviousness

The U.S. Supreme Court sets forth the framework for applying the statutory language of 35 U.S.C. § 103 in Graham v. John Deere Co. of Kansas City, 383 U.S. 1, 17–18 (1966):

    Under § 103, the scope and content of the prior art are to be determined; differences between the prior art and the claims at issue are to be ascertained; and the level of ordinary skill in the pertinent art resolved. Against this background, the obviousness or nonobviousness of the subject matter is determined. Such secondary considerations as commercial success, long felt but unsolved needs, failure of others, etc., might be utilized to give light to the circumstances surrounding the origin of the subject matter sought to be patented.

As explained by the Supreme Court in KSR Int’l Co. v. Teleflex Inc.,

    Often, it will be necessary for a court to look to interrelated teachings of multiple patents; the effects of demands known to the design community or present in the marketplace; and the background knowledge possessed by a person having ordinary skill in the art, all in order to determine whether there was an apparent reason to combine the known elements in the fashion claimed by the patent at issue. To facilitate review, this analysis should be made explicit.

550 U.S. 398, 418 (2007) (citing In re Kahn, 441 F.3d 977, 988 (Fed. Cir. 2006) (“[R]ejections on obviousness grounds cannot be sustained by mere conclusory statements; instead, there must be some articulated reasoning with some rational underpinning to support the legal conclusion of obviousness.”)).

“Whether an ordinarily skilled artisan would have been motivated to modify the teachings of a reference is a question of fact.” WBIP, LLC v. Kohler Co., 829 F.3d 1317, 1327 (Fed. Cir. 2016) (citations omitted). “[W]here a party argues a skilled artisan would have been motivated to combine references, it must show the artisan ‘would have had a reasonable expectation of success from doing so.’” Arctic Cat Inc. v. Bombardier Recreational Prods. Inc., 876 F.3d 1350, 1360–61 (Fed. Cir. 2017) (quoting In re Cyclobenzaprine Hydrochloride Extended-Release Capsule Patent Litig., 676 F.3d 1063, 1068–69 (Fed. Cir. 2012)).
D. Obviousness of Claim 1 over Jiang in view of Smyth

Petitioner asserts that the combination of Jiang and Smyth would have rendered the subject matter of claim 1 obvious to one of ordinary skill in the art at the time of the invention. Pet. 5–59. Patent Owner argues that the cited references do not teach or suggest at least limitations [1(c)] and [1(g)] and that the Petition has failed to establish the requisite motivation to combine the cited references. Prelim. Resp. 9–24. We begin with brief discussions of the cited references, and then consider Petitioner’s arguments with respect to the references’ teachings applied to the instant claim as well as Patent Owner’s arguments asserting deficiencies in this ground of unpatentability.
1. Jiang
Jiang is directed to “computer speech recognition performed by conducting a prefix tree search of a silence bracketed lexicon.” Ex. 1004, 1:15–18. Possible words represented by the input data stream are provided as a prefix tree including a plurality of phoneme branches connected at nodes. Id. at 4:8–12. Speech is input into system 60 in the form of an audible voice signal provided by the user to microphone 62, which converts the audible speech into an analog electric signal, which is converted by A/D converter 64 into a sequence of digital signals that are then provided to feature extraction module 66. Id. at 6:45–52. Feature extraction module 66 divides the digital signal into frames, each approximately 10 ms in duration. Id. at 6:62–65.

The frames are then preferably encoded by feature extraction module 66 into a feature vector reflecting the spectral characteristics for a plurality of frequency bands. Ex. 1004, 6:65–7:1. “In the case of discrete and semi-continuous hidden Markov modeling, feature extraction module 66 also preferably encodes the feature vectors into one or more codewords using vector quantization techniques and a codebook derived from training data.” Id. at 7:1–5.
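For illustration, vector quantization of the kind Jiang describes maps each feature vector to the index of its nearest entry in a codebook derived from training data, so that downstream HMM scoring operates on discrete codewords rather than raw vectors. The nearest-neighbor rule and Euclidean metric are assumptions of this sketch; the quoted passage does not specify them.

```python
import numpy as np

def quantize_to_codeword(feature: np.ndarray, codebook: np.ndarray) -> int:
    """Encode a feature vector as the index of its closest codebook entry.

    codebook: (K, N) array of K codewords derived from training data.
    The returned index is the discrete symbol applied to the HMMs."""
    return int(np.argmin(np.linalg.norm(codebook - feature, axis=1)))
```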
Upon receiving the codewords from feature extraction module 66, and the boundary detection signal provided by silence detection module 68, tree search engine 74 accesses information stored in the phonetic speech unit model memory 72. Ex. 1004, 7:30–34. Based upon the HMMs stored in memory 72, tree search engine 74 determines a most likely phoneme represented by the codeword received from feature extraction module 66, and hence representative of the utterance received from the user of the system. Id. at 7:37–42.
An example system of Jiang with feature extraction module 66 functionally in pipeline with tree search engine 74 is illustrated in Figure 2, reproduced below:

[Figure 2 of Jiang (not reproduced)]

Figure 2 of Jiang illustrates a detailed block diagram of a portion of its system.

“In a preferred embodiment, feature extraction module 66 is a conventional array processor which performs spectral analysis on the digital signals.” Ex. 1004, 6:53–55. “Further, tree search engine 74 is preferably implemented in CPU 21 (which may include one or more processors) or may be performed by a dedicated speech recognition processor employed by personal computer 20.” Id. at 6:39–42.
2. Smyth

Smyth is directed to “[a] speech recognition system [which] recognizes connected speech using a plurality of vocabulary nodes.” Ex. 1005, Abs. Smyth is of particular interest in the area of task-constrained connected word recognition where the task, for example, might be to recognize one of a set of account numbers or product codes. Id. at 1:18–21.
Smyth teaches a system including feature extractor 33, for generating from a frame of samples a corresponding feature vector, and classifier 34, receiving the succession of feature vectors and operating on each with a plurality of model states, to generate recognition results. Ex. 1005, 5:5–9. Smyth further includes parsing processor 351, part of sequencer 35, arranged to read, at each frame, the state probabilities output by classifier processor 341, which is part of classifier 34. Id. at 6:35–39. Parsing processor 351 is specifically configured to recognize certain phrases or words. Id. at 6:62. This system is illustrated in Figure 2, reproduced below:

[Figure 2 of Smyth (not reproduced)]

Figure 2 of Smyth illustrates a block diagram showing the functional elements of the recognition processor.

The frame generator 32 and feature extractor 33 are, in this embodiment, provided by a single suitably programmed digital signal processor (DSP) device “(such as the Motorola DSP 56000, or the Texas Instruments TMC C 320) or similar device.” Ex. 1005, 5:49–53. The classifying processor 341 may be a suitably programmed digital signal processing (DSP) device, and may in particular be the same digital signal processing device as the feature extractor 33. Ex. 1005, 6:5–9. The parsing processor 351 may, for example, be a microprocessor “such as the Intel(TM) i-486(TM) microprocessor or the Motorola(TM) 68000 microprocessor, or may alternatively be a DSP device (for example, the same DSP device as is employed for any of the preceding processors).” Id. at 6:49–51.

Frame generator 32 is arranged to receive speech samples, and feature extractor 33 receives frames from the frame generator 32 and generates, in each case, a set or vector of features, where each is a corresponding feature vector. Ex. 1005, 5:7, 20, 31–33.
Classification processor 34 is arranged to read each state field within the memory 342 in turn, and calculate for each, using the current input feature coefficient set, the probability that the input feature set or vector corresponds to the corresponding state. Ex. 1005, 5:63–67. “Accordingly, the output of the classification processor is a plurality of state probabilities P, one for each state in the state memory 342, indicating the likelihood that the input feature vector corresponds to each state.” Id. at 6:1–4.

Parsing processor 351 is arranged to read, at each frame, the state probabilities output by the classifier processor 341, and the previous stored state probabilities in the state probability memory 353, and to calculate the most likely path of states to date over time, and to compare this with each of the state sequences stored in the state sequence memory 352. Ex. 1005, 6:36–43. “The calculation employs the well known Hidden Markov Model method.” Id. at 6:43–45. Accordingly, for each state sequence (corresponding to a word, phrase or other speech sequence to be recognized), a probability score is output by the parser processor 351 at each frame of input speech. For example, the state sequences may comprise the names in a telephone directory. Id. at 6:52–56.
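For context, the “well known Hidden Markov Model method” that parsing processor 351 applies can be sketched as the textbook Viterbi recursion over the classifier’s per-frame state probabilities. The log-space formulation and the array shapes below are assumptions of this sketch, not Smyth’s implementation.

```python
import numpy as np

def viterbi_score(log_state_probs: np.ndarray, log_trans: np.ndarray,
                  log_init: np.ndarray) -> float:
    """Score of the most likely path of states to date over time.

    log_state_probs: (T, S) per-frame log probabilities of S states (the
                     classifier's output at each of T frames).
    log_trans:       (S, S) log transition probabilities between states.
    log_init:        (S,) log initial state probabilities."""
    path_score = log_init + log_state_probs[0]
    for frame in log_state_probs[1:]:
        # Best predecessor for each state, plus this frame's evidence.
        path_score = np.max(path_score[:, None] + log_trans, axis=0) + frame
    return float(np.max(path_score))
```

Comparing such scores across the stored state sequences (for example, the names in a telephone directory) yields the per-frame probability score for each candidate word or phrase that Smyth describes.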
3. Claim 1

a. Element 1(Pre): Preamble, “A speech recognition system”

With respect to the preamble of claim 1, Petitioner asserts that “Jiang teaches a speech recognition system: ‘A speech recognition system recognizes speech based on an input data stream indicative of the speech.’” Pet. 10 (citing Ex. 1004, 4:8–9, 4:30–32, 1:15–19, 6:18–20, code (57), Figs. 1–2).

Patent Owner does not contest that Jiang is directed to “a speech recognition system.” Rather, Patent Owner states that Jiang is directed to “speech recognition techniques” in a “speech recognition system” (Prelim. Resp. 21), thereby indicating that Jiang discloses or teaches the preamble.

We find that at this stage of the proceeding, on the present record, Petitioner sufficiently establishes that Jiang meets the limitations of the preamble of claim 1 for the reasons explained by Petitioner.
b. Element 1(a): “a first programmable device programmed to calculate a feature vector from a digital audio stream”

With respect to a first programmable device calculating a feature vector from audio data, Petitioner asserts that Jiang teaches a feature extraction module 66 for calculating a feature vector, as claimed. Jiang’s feature extraction module 66 is a programmable device, as the module 66 may be a hardware or software module in computer 20 having CPU 21. Pet. 10 (citing Ex. 1004, 6:31–36, 6:46–7:10).
Petitioner further states that, to the extent Patent Owner argues that Jiang fails to teach the element, “Smyth teaches a ‘first programmable device’ programmed to calculate a feature vector, as claimed, namely ‘a single suitably programmed digital signal processor (DSP) device . . . or similar device’ providing frame generator 32 and feature extractor 33.” Pet. 14 (citing Ex. 1005, 5:49–53). Petitioner continues with the assertion that Smyth also teaches that its programmable device is programmed to calculate a feature vector of a digital audio stream. Pet. 14–15 (citing Ex. 1005, 5:2–3).

Patent Owner does not offer counterarguments with respect to this element of claim 1 in the Preliminary Response. We find that at this stage of the proceeding, on the present record, Petitioner sufficiently establishes that both Jiang and Smyth meet this limitation of claim 1 for the reasons explained by Petitioner.
c. Element 1(b): “wherein the feature vector comprises a plurality of extracted and/or derived quantities from said digital audio stream during a defined audio time frame”

Petitioner asserts that Jiang teaches that the calculated feature vector comprises extracted and/or derived quantities in the form of “spectral characteristics”: the frames are preferably encoded by feature extraction module 66 into a feature vector reflecting the spectral characteristics for a plurality of frequency bands. Pet. 17 (citing Ex. 1004, 6:62–7:10, 6:53–57).

Petitioner further asserts that Jiang teaches the feature vector includes extracted and/or derived quantities of the digital audio stream “during a defined audio time frame,” as claimed. Jiang teaches that the digital signal is divided into frames of approximately 10 milliseconds in duration, where the frames are then encoded by the feature extraction module 66 into a feature vector, and where the feature vector “reflect[s] the spectral characteristics.” Pet. 18 (citing Ex. 1004, 6:62–7:1).

Patent Owner does not offer counterarguments with respect to this element of claim 1 in the Preliminary Response. We find that at this stage of the proceeding, on the present record, Petitioner sufficiently establishes that Jiang meets this limitation of claim 1 for the reasons explained by Petitioner.
d. Element 1(c): “a second programmable device programmed to calculate distances indicating the similarity between a feature vector and a plurality of acoustic states of an acoustic model”

Petitioner asserts that Jiang in combination with Smyth teaches this limitation, with Smyth relied on in the combination for teaching the “second programmable device.” Pet. 19. Petitioner asserts that both Jiang and Smyth teach the claimed functionality of calculating distances. Id. at 18.

In regard to Jiang, Petitioner demonstrates that Jiang uses HMMs to identify speech units, thereby teaching an acoustic model with a plurality of acoustic states. Pet. 21. Petitioner further maps the HMM states to the recited distance calculations, asserting that a distance is a probability or likelihood of a feature vector compared to a particular state of an HMM, thereby equating distances with probabilities or likelihoods. Id. at 20–21.

Petitioner notes that Jiang teaches encoding feature vectors into codewords using vector quantization techniques, which will be lossy. Pet. 22. Jiang teaches that these codewords are applied to the HMMs to identify utterances. Id. at 23. Petitioner argues that since a probability is associated with a corresponding state of an HMM, this represents a likelihood that a codeword corresponds to a phoneme utterance, which Petitioner further argues is the recited “distance” of this limitation. Id. at 23–25.
In regard to Smyth, Petitioner asserts Smyth teaches “three distinct DSPs for performing the three general speech recognition steps” of feature vector calculation, distance calculation, and word identification. Pet. 28. Petitioner further asserts that Smyth discloses the second programmable device is programmed to calculate distances as recited. Id. at 34. In support of this proposition, Petitioner reasons Smyth teaches calculating a plurality of probabilities, and that this operation is calculating distances as recited because processor 341 receives the feature vector and calculates the probability of correspondence. Id. at 37.
Patent Owner disputes Petitioner’s assertions with respect to this feature. Prelim. Resp. 10–17. In regard to Jiang, Patent Owner argues that the features of claim 1 require the distance to be calculated from a first feature vector, and that Jiang fails to disclose or teach this requirement. Id. at 10. Further, Patent Owner argues that the distances must be calculated from the feature vectors, and not from a derived value. Id. at 11. Patent Owner contends that, in contrast, Jiang encodes the feature vectors into [concordant] codewords, and then uses the codewords for word recognition. Id. at 12–13. Patent Owner argues that determining likelihood scores based on codewords is a different technique from calculating distances based on feature vectors. Id. Patent Owner reiterates that the distance is a vector in a space, and buttresses the asserted distinction between the probability calculations of Jiang and the distance calculations as recited by arguing that codewords are used to obviate the need for distance calculations. Id. at 4, 16.

In regard to Smyth, Patent Owner does not offer counterarguments with regard to the asserted teaching of Smyth by Petitioner in respect to this element of claim 1 in the Preliminary Response. Patent Owner instead argues that there is no motivation to combine the teaching of Smyth with Jiang and that to do so would not have been obvious. Prelim. Resp. 17. This argument against applying the teaching of Smyth to Jiang is addressed separately below. We are not persuaded by Patent Owner’s arguments with respect to Jiang, which we discuss below.
The preliminary evidence suggests that codewords are representations of feature vectors, or of groups of similarly clustered feature vectors, such that a distance between a codeword and an acoustic state indicates the similarity between that acoustic state and any feature vector that the codeword represents. See Ex. 1003 ¶ 141. Claim 1 does not require a direct comparison between a feature vector and an acoustic state, or proscribe the use of codewords to represent feature vectors. That is to say, so long as the calculated distance indicates the similarity between a feature vector and an acoustic state, this limitation does not proscribe using codewords or other intermediate representations of a feature vector in the distance calculation. Because Jiang discloses using codewords as representations of feature vectors, and the calculated distance using codewords indicates the similarity between a feature vector and acoustic states within an acoustic model, we are persuaded that Jiang teaches “indicating the similarity between a feature vector and a plurality of acoustic states.”

We find that at this stage of the proceeding, on the present record, Petitioner sufficiently establishes that the combination of Jiang and Smyth meets this limitation of claim 1 for the reasons explained by Petitioner.
e. Element 1(d): “wherein said feature vector is received by the second programmable device after it is calculated by the first programmable device”
Petitioner asserts that Jiang teaches the recited element where it discloses that tree search engine 74 receives codewords from feature extraction module 66, where the codeword is a representation of a feature vector; based upon HMMs stored in memory, tree search engine 74 determines a most likely phoneme represented by the codeword received from feature extraction module 66. Pet. 41–42 (citing Ex. 1004, 7:30–42). Therefore, Petitioner seems to assert that because codewords represent feature vectors, receiving the codewords at tree search engine 74 after the feature vectors are calculated by feature extraction module 66 and converted to codewords teaches the recited element.

Petitioner further asserts that Smyth teaches the recited element because Smyth teaches feature extractor 33, which calculates feature vectors, and classifier 34, which uses the feature vectors calculated by feature extractor 33 to determine the likelihood that a feature vector corresponds to a state. Pet. 42–43 (citing Ex. 1005, 5:7–9, 63–67).

Patent Owner does not offer counterarguments with respect to this element of claim 1 in the Preliminary Response. At this stage of the proceeding, on the present record, we determine that Petitioner has sufficiently established that an ordinarily skilled artisan would have understood that both Jiang and Smyth teach calculating feature vectors at a device, and then passing the feature vectors, or a representation thereof, to a further device for further computation.
f. Element 1(e): “a third programmable device programmed to identify spoken words in said digital audio stream using Hidden Markov Models and/or Neural Networks”

Petitioner asserts that Jiang, alone or in combination with Smyth, teaches this element of claim 1. Pet. 44.
First, in regard to Jiang, Petitioner asserts that Jiang teaches the recited third programmable device. Pet. 44. Petitioner also argues it would have been obvious to use a third programmable device because it would have been obvious to use single-function devices for each functional stage of the three-stage pipeline of feature vector calculation, similarity determination, and word identification. Id. at 44–45 (citing Ex. 1003 ¶¶ 71–72, 111–120, 156, 187–188). Petitioner additionally cites to Smyth as teaching three programmable devices when applied to Jiang. Id.

Further in regard to Jiang, Petitioner asserts that Jiang teaches the recited features of identifying spoken words using HMMs. Pet. 45. Petitioner states that the HMMs are used to determine a score for a node and associated phoneme branch of the prefix tree, such that the score indicates the likelihood that the phoneme being examined for the phoneme branch is the actual phoneme indicated by the codeword. Id. (citing Ex. 1004, 8:16–24, 7:30–43).

Second, in regard to Smyth, Petitioner asserts that Smyth teaches the claimed third programmable device which identifies words, citing to the parsing processor 351 of Smyth. Pet. 47. Petitioner states that parsing processor 351 calculates the most likely path of states, compares this path with stored state sequences, and identifies words and phrases, where parsing processor 351 may be a microprocessor, or may alternatively be a DSP device. Id. (citing Ex. 1005, 6:36–60).
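To make the prefix-tree scoring concrete, the sketch below walks a phoneme prefix tree, accumulating a score per phoneme branch, and returns the best-scoring word. The tree structure, the frame_scores callback, and every identifier here are hypothetical illustrations, not Jiang’s implementation; only the idea of scoring nodes and associated phoneme branches comes from the cited passages.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

@dataclass
class PrefixNode:
    """Node in a phoneme prefix tree; words sharing a prefix share scoring."""
    phoneme: str
    children: dict = field(default_factory=dict)   # phoneme -> PrefixNode
    words: list = field(default_factory=list)      # words ending at this node

def best_word(root: PrefixNode,
              frame_scores: Callable[[str], float]) -> Tuple[float, Optional[str]]:
    """Depth-first walk summing a branch score per phoneme.

    frame_scores(phoneme) is assumed to return the HMM log-likelihood that
    the observed codewords match that phoneme branch."""
    best_score, best = float("-inf"), None
    stack = [(root, 0.0)]
    while stack:
        node, score = stack.pop()
        for word in node.words:
            if score > best_score:
                best_score, best = score, word
        for child in node.children.values():
            stack.append((child, score + frame_scores(child.phoneme)))
    return best_score, best
```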