`571-272-7822
`
`
`
`
`
` Paper 10
`Entered: June 12, 2023
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`____________
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`____________
`APPLE INC.,
`Petitioner,
`v.
`ZENTIAN LIMITED,
`Patent Owner.
`____________
`
`IPR2023-00035
`Patent 10,062,377
`____________
`
`
`Before KEVIN F. TURNER, JEFFREY S. SMITH, and
`CHRISTOPHER L. OGDEN, Administrative Patent Judges.
`
`TURNER, Administrative Patent Judge.
`
`
`DECISION
`Granting Institution of Inter Partes Review
`35 U.S.C. § 314
`
`
`
`
`
`
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
I. INTRODUCTION

A. Background
`Apple Inc. (“Petitioner”) filed a Petition (Paper 1, “Pet.”) requesting
`institution of inter partes review of claims 1–6 of U.S. Patent
`No. 10,062,377 B2 (Ex. 1001, “the ’377 Patent”). Zentian Limited (“Patent
`Owner”) filed a Preliminary Response (Paper 6, “Prelim. Resp.”).
`An inter partes review may be instituted only if “the information
`presented in the petition . . . and any [preliminary] response . . . shows that
`there is a reasonable likelihood that the petitioner would prevail with respect
`to at least 1 of the claims challenged in the petition.” 35 U.S.C. § 314(a)
`(2018). For the reasons given below, Petitioner has established a reasonable
`likelihood that it would prevail in showing the unpatentability of at least one
`of the challenged claims of the ’377 Patent. Accordingly, we institute an
`inter partes review of claims 1–6 of the ’377 Patent on the ground of
`unpatentability raised in the Petition.
`
B. Related Proceedings
`Both parties identify the following judicial or administrative matters
`
`that would affect, or be affected by, a decision in this proceeding: Zentian
`Ltd. v. Apple Inc., Case No. 6:22-cv-00122 (W.D. Tex.); Zentian Ltd. v.
`Amazon.com, Inc., Case No. 6:22-cv-00123 (W.D. Tex.). Pet. 75; Paper 3,
`1. Both parties also identify related inter partes reviews: IPR2023-00033,
`IPR2023-00034, IPR2023-00036, and IPR2023-00037. Id.
`
C. The ’377 Patent
`The ’377 Patent is titled “Distributed Pipelined Parallel Speech
`Recognition System,” and is directed to a speech recognition system using
`multiple programmable devices to perform different steps of the recognition
`
`2
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`process. Ex. 1001, code (54), 5:63–6:6, 6:20–33. Figure 16, reproduced
`below, illustrates a speech recognition system comprising a front end 103,
`distance calculation engine 104, and search stage 106:
`
`
`Figure 16 illustrates a block diagram showing an embodiment of a speech
`recognition system, illustrating data flow between parts thereof
The ’377 Patent teaches that an “audio input for speech recognition”
`may be input to the front end (for example at Front End 103) in the form of
`digital audio or analog audio that is converted to digital audio using an
`analog to digital converter. Ex. 1001, 12:50–53. “The audio input is divided
`into time frames, each time frame typically being on the order of 10 ms.” Id.
`at 12:53–55. “For each audio input time frame, the audio signal is converted
`into a feature vector. This may be done by splitting the audio signal into
spectral components,” such as 13 components plus their first and second
derivatives, for a total of 39 components. Id. at 12:56–58.
`The feature vector thus “represents a point in an N-dimensional space,”
`where N is generally in the range of 20 to 39. Id. at 13:19–23.
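The framing and feature-vector construction described above can be sketched as follows. This is only an illustrative approximation, not the ’377 Patent’s implementation: the function name, the use of FFT-bin magnitudes as the 13 spectral components, and finite differences for the derivatives are assumptions.

```python
import numpy as np

def feature_vectors(audio, sample_rate=16000, frame_ms=10, n_components=13):
    """Divide a digital audio stream into ~10 ms time frames and build a
    39-dimensional feature vector per frame: 13 spectral components plus
    their first and second derivatives."""
    frame_len = int(sample_rate * frame_ms / 1000)        # samples per frame
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    # 13 coarse spectral components per frame (magnitudes of low FFT bins)
    spectra = np.abs(np.fft.rfft(frames, axis=1))[:, :n_components]
    d1 = np.gradient(spectra, axis=0)                     # first derivative
    d2 = np.gradient(d1, axis=0)                          # second derivative
    return np.hstack([spectra, d1, d2])                   # (n_frames, 39)

vecs = feature_vectors(np.random.randn(16000))  # one second of audio
print(vecs.shape)  # (100, 39): each row is a point in 39-dimensional space
```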
`
`3
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
`Each feature vector is then passed to the calculating circuit, or
`distance calculation engine (for example element 104), which calculates a
`distance indicating the similarity between a feature vector and one or more
`predetermined acoustic states of an acoustic model. Ex. 1001, 5:63–6:2,
`25:33–35 (“Each feature vector is transferred to a distance calculation
`engine circuit 204, to obtain distances for each state of the acoustic model.”).
`“The distance calculator stage of the recognition process computes a
`probability or likelihood that a feature vector corresponds to a particular
`state.” Id. at 13:24–26. “The likelihood of each state is determined by the
`distance between the feature vector and each state.” Id. at 13:1–2. The
`distance calculation may be a Mahalanobis distance using Gaussian
`distributions. Id. at 4:20–21. “The MHD (Mahalanobis Distance) is a
`distance between two N-dimensional points, scaled by the statistical
`variation in each component.” Id. at 13:13–15. The distance calculation
`engine or calculating circuit “may be included within an accelerator” (Ex.
`1001, 3:59–61), which may be a “loosely bound co-processor for a CPU
`running speech recognition software,” and which “has the advantage of
`reducing computational load on the CPU, and reducing memory bandwidth
`load for the CPU.” Id. at 24:17–20; see Figs. 17–23.
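As an illustration of the scaled distance the ’377 Patent describes, a Mahalanobis-style distance with per-component (diagonal) variances can be sketched as follows; the function name and the toy acoustic-state values are hypothetical:

```python
import numpy as np

def mahalanobis(x, mean, var):
    """Distance between two N-dimensional points, with each component's
    squared difference scaled by that component's statistical variation."""
    return np.sqrt(np.sum((x - mean) ** 2 / var))

# score one feature vector against each state of a toy acoustic model
x = np.array([1.0, 2.0, 3.0])
states = [
    {"mean": np.array([1.0, 2.0, 3.0]), "var": np.array([1.0, 1.0, 1.0])},
    {"mean": np.array([0.0, 0.0, 0.0]), "var": np.array([4.0, 4.0, 4.0])},
]
distances = [mahalanobis(x, s["mean"], s["var"]) for s in states]
# smaller distance = higher likelihood the vector corresponds to that state
```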
`The distances calculated by the distance calculation engine are then
`transferred to search stage 106 of the speech processing circuit, which uses
`models, such as one or more word models and/or language models, to
`generate and output recognized text. Ex. 1001, 24:5–11. Search stage 106
may process the distance calculations using a Hidden Markov Model (HMM)
or a neural network. Id. at 4:40–44.
`
`4
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
`Thus, “the DSP [element 103] provides a feature vector to the
`Accelerator, [element 104] and the accelerator provides a set of distance
results to the CPU [element 105].” Ex. 1001, 27:45–50. The system thereby
provides a speech recognition circuit with three programmable devices, as
recited in independent claim 1.
`
`D. Challenged Claims
`Claim 1 is the sole independent claim challenged in this proceeding,
`with each of challenged claims 2–6 dependent on claim 1, directly or
`indirectly. Independent claim 1 is considered to be representative and is
`reproduced below:
`1. [1(Pre)] A speech recognition system comprising:
`[1(a)] a first programmable device programmed to
`calculate a feature vector from a digital audio stream, [1(b)]
`wherein the feature vector comprises a plurality of extracted
`and/or derived quantities from said digital audio stream during a
`defined audio time frame;
`[1(c)] a second programmable device programmed to
`calculate distances indicating the similarity between a feature
`vector and a plurality of acoustic states of an acoustic model
`[1(d)] wherein said feature vector is received by the second
`programmable device after it
`is calculated by the first
`programmable device; and
`[1(e)] a third programmable device programmed to
`identify spoken words in said digital audio stream using Hidden
`Markov Models and/or Neural Networks [1(f)] wherein said
`word identification uses one or more distances that were
`calculated by the second programmable device, [1(g)] wherein
`said identification of spoken words uses one or more distances
`calculated from a first feature vector; and
`[1(h)] a search stage for using the calculated distances to
`identify words within a lexical tree, the lexical tree comprising a
`model of words.
`
`5
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`Ex. 1001, 38:53–39:30 (with annotations provided by Petitioner, Pet. 77–
`78).
`
E. Asserted Ground of Unpatentability
Petitioner asserts the following ground of unpatentability (Pet. 5),
`supported by the declaration of Mr. Christopher Schmandt (Ex. 1003):
`
Claims Challenged    35 U.S.C. §    Reference(s)/Basis
1                    103(a)1        Jiang,2 Smyth3
2, 3                 103(a)         Jiang, Smyth, Nguyen4
4                    103(a)         Jiang, Smyth, Nguyen, Boike5
5                    103(a)         Jiang, Smyth, Nguyen, Baumgartner6
6                    103(a)         Jiang, Smyth, Nguyen, Boike, Baumgartner
`
`
`1 The Leahy-Smith America Invents Act (“AIA”), Pub. L. No. 112-29, 125
`Stat. 284, 285–88 (2011), revised 35 U.S.C. § 103 effective March 16, 2013.
`Because the challenged patent claims priority to an application filed before
`March 16, 2013, we refer to the pre-AIA version of § 103.
`2 U.S. Patent No. 6,374,219 B1, filed Feb. 20, 1998, issued Apr. 16, 2002
`(Ex. 1004, “Jiang”).
`3 U.S. Patent No. 5,819,222, issued Oct. 6, 1998 (Ex. 1005, “Smyth”).
`4 U.S. Patent No. 6,879,954 B2, filed Apr. 22, 2002, issued Apr. 12, 2005
`(Ex. 1047, “Nguyen”).
`5 U.S. Patent No. 6,959,376 B1, filed Oct. 11, 2001, issued Oct. 25, 2005
`(Ex. 1006, “Boike”).
`6 U.S. Patent Publication 2002/0049582 A1, published Apr. 25, 2002 (Ex.
`1007, “Baumgartner”).
`
`6
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
`II. ANALYSIS
`A. Level of Ordinary Skill in the Art
`Petitioner, supported by Mr. Schmandt’s testimony, proposes that a
`person of ordinary skill in the art at the time of the invention “would have
`had a master’s degree in computer engineering, computer science, electrical
`engineering, or a related field, with at least two years of experience in the
`field of speech recognition, or a bachelor’s degree in the same fields with at
`least four years of experience in the field of speech recognition.” Pet. 4
`(citing Ex. 1003 ¶ 24). Patent Owner indicates that it does not challenge the
`qualifications proposed by Petitioner for a person of ordinary skill in the art.
`Prelim. Resp. 5.
`At this stage of the proceeding, we find Petitioner’s proposal
`consistent with the level of ordinary skill in the art reflected by the prior art
`of record, see Okajima v. Bourdeau, 261 F.3d 1350, 1355 (Fed. Cir. 2001);
`In re GPAC Inc., 57 F.3d 1573, 1579 (Fed. Cir. 1995); In re Oelrich, 579
`F.2d 86, 91 (CCPA 1978), and, therefore, we adopt Petitioner’s unopposed
`position as to the level of ordinary skill in the art for purposes of this
`decision.
`
`B. Claim Construction
`In this inter partes review, claims are construed using the same claim
`construction standard that would be used to construe the claims in a civil
`action under 35 U.S.C. § 282(b). See 37 C.F.R. § 42.100(b) (2020). The
`claim construction standard includes construing claims in accordance with
`the ordinary and customary meaning of such claims as understood by one of
`ordinary skill in the art at the time of the invention. See id.; Phillips v. AWH
`Corp., 415 F.3d 1303, 1312–14 (Fed. Cir. 2005) (en banc). In construing
`
`7
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`claims in accordance with their ordinary and customary meaning, we
`consider the specification and prosecution history. Phillips, 415 F.3d at
`1315–17.
`Neither party provides explicit claim constructions for claim features.
`Pet. 5–6; Prelim. Resp. 5. Therefore, we determine that it is not necessary to
`provide an express interpretation of any claim terms. See Nidec Motor
`Corp. v. Zhongshan Broad Ocean Motor Co. Matal, 868 F.3d 1013, 1017
`(Fed. Cir. 2017); Vivid Techs., Inc. v. Am. Sci. & Eng’g, Inc., 200 F.3d 795,
`803 (Fed. Cir. 1999) (“[O]nly those terms need be construed that are in
`controversy, and only to the extent necessary to resolve the controversy.”).
`
`C. Legal Standards – Obviousness
`The U.S. Supreme Court sets forth the framework for applying the
`statutory language of 35 U.S.C. § 103 in Graham v. John Deere Co. of
`Kansas City, 383 U.S. 1, 17–18 (1966):
`Under § 103, the scope and content of the prior art are to be
`determined; differences between the prior art and the claims at
`issue are to be ascertained; and the level of ordinary skill in the
`pertinent art resolved. Against this background, the obviousness
`or nonobviousness of the subject matter is determined. Such
`secondary considerations as commercial success, long felt but
`unsolved needs, failure of others, etc., might be utilized to give
`light to the circumstances surrounding the origin of the subject
`matter sought to be patented.
`As explained by the Supreme Court in KSR Int’l Co. v. Teleflex Inc.,
`Often, it will be necessary for a court to look to interrelated
`teachings of multiple patents; the effects of demands known to
`the design community or present in the marketplace; and the
`background knowledge possessed by a person having ordinary
`skill in the art, all in order to determine whether there was an
`apparent reason to combine the known elements in the fashion
`
`8
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
`claimed by the patent at issue. To facilitate review, this analysis
`should be made explicit.
`550 U.S. 398, 418 (2007) (citing In re Kahn, 441 F.3d 977, 988 (Fed. Cir.
`2006) (“[R]ejections on obviousness grounds cannot be sustained by mere
`conclusory statements; instead, there must be some articulated reasoning
`with some rational underpinning to support the legal conclusion of
`obviousness.”)).
`“Whether an ordinarily skilled artisan would have been motivated to
`modify the teachings of a reference is a question of fact.” WBIP, LLC v.
`Kohler Co., 829 F.3d 1317, 1327 (Fed. Cir. 2016) (citations omitted).
`“[W]here a party argues a skilled artisan would have been motivated to
`combine references, it must show the artisan ‘would have had a reasonable
`expectation of success from doing so.’” Arctic Cat Inc. v. Bombardier
`Recreational Prods. Inc., 876 F.3d 1350, 1360–61 (Fed. Cir. 2017) (quoting
`In re Cyclobenzaprine Hydrochloride Extended-Release Capsule Patent
`Litig., 676 F.3d 1063, 1068–69 (Fed. Cir. 2012)).
`
`D. Obviousness of Claim 1 over Jiang in view of Smyth
`Petitioner asserts that the combination of Jiang and Smyth would have
`rendered the subject matter of claim 1 obvious to one of ordinary skill in the
`art at the time of the invention. Pet. 5–59. Patent Owner argues that the
`cited references do not teach or suggest at least limitations [1(c)] and [1(g)]
`and that the Petition has failed to establish the requisite motivation to
`combine the cited references. Prelim. Resp. 9–24. We begin with brief
`discussions of the cited references, and then consider Petitioner’s arguments
`with respect to the references’ teachings applied to the instant claim as well
`as Patent Owner’s arguments asserting deficiencies in this ground of
`unpatentability.
`
`9
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
1. Jiang
`Jiang is directed to “computer speech recognition performed by
`conducting a prefix tree search of a silence bracketed lexicon.” Ex. 1004,
`1:15–18. Possible words represented by the input data stream are provided
`as a prefix tree including a plurality of phoneme branches connected at
nodes. Id. at 4:8–12. Speech is input into system 60 in the form of an audible
`voice signal provided by the user to microphone 62, which converts the
`audible speech into an analog electric signal, which is converted by A/D
`converter 64 into a sequence of digital signals that are then provided to
`feature extraction module 66. Id. at 6:45–52. Feature extraction module 66
`divides the digital signal into frames, each approximately 10 ms in duration.
`Id. at 6:62–65.
`The frames are then preferably encoded by feature extraction module
`66 into a feature vector reflecting the spectral characteristics for a plurality
`of frequency bands. Ex. 1004, 6:65–7:1. “In the case of discrete and semi-
`continuous hidden Markov modeling, feature extraction module 66 also
`preferably encodes the feature vectors into one or more codewords using
`vector quantization techniques and a codebook derived from training data.”
`Id. at 7:1–5.
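The vector-quantization step Jiang describes, mapping a feature vector to the nearest codebook entry, can be sketched as follows. The `quantize` name and the toy codebook are hypothetical; as Jiang notes, a real codebook would be derived from training data.

```python
import numpy as np

def quantize(feature_vector, codebook):
    """Vector quantization: encode a feature vector as the index (codeword)
    of the nearest codebook entry."""
    dists = np.linalg.norm(codebook - feature_vector, axis=1)
    return int(np.argmin(dists))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # toy codebook
print(quantize(np.array([0.9, 1.2]), codebook))  # 1: nearest entry
```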
`Upon receiving the codewords from feature extraction module 66, and
`the boundary detection signal provided by silence detection module 68, tree
`search engine 74 accesses information stored in the phonetic speech unit
`model memory 72. Ex. 1004, 7:30–34. Based upon the HMMs stored in
`memory 72, tree search engine 74 determines a most likely phoneme
`represented by the codeword received from feature extraction module 66,
`
`10
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`and hence representative of the utterance received by the user of the system.
`Id. at 7:37–42.
`An example system of Jiang with feature extraction module 66
`functionally in pipeline with tree search engine 74 is illustrated in Figure 2,
`reproduced below:
`
`
`
`Figure 2 of Jiang illustrates a detailed block diagram of a portion of its
`system
`“In a preferred embodiment, feature extraction module 66 is a conventional
`array processor which performs spectral analysis on the digital signals.” Ex.
`1004, 6:53–55. “Further, tree search engine 74 is preferably implemented in
`CPU 21 (which may include one or more processors) or may be performed
`by a dedicated speech recognition processor employed by personal computer
`20.” Id. at 6:39–42.
`
2. Smyth
`Smyth is directed to “[a] speech recognition system [which]
`recognizes connected speech using a plurality of vocabulary nodes.” Ex.
`
`11
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`1005, Abs. Smyth is of particular interest in the area of task-constrained
`connected word recognition where the task, for example, might be to
`recognize one of a set of account numbers or product codes. Id. at 1:18–21.
`Smyth teaches a system including feature extractor 33 for generating
`from a frame of samples a corresponding feature vector and classifier 34
`receiving the succession of feature vectors and operating on each with a
`plurality of model states, to generate recognition results. Ex. 1005, 5:5–9.
`Smyth further includes parsing processor 351, part of sequencer 35, arranged
`to read, at each frame, the state probabilities output by classifier processor
`341, which is part of classifier 34. Id. at 6:35–39. Parsing processor 351 is
`specifically configured to recognize certain phrases or words. Id. at 6:62.
`This system is illustrated in Figure 2, reproduced below:
`
`
`
`Figure 2 of Smyth illustrates a block diagram showing the functional
`elements of the recognition processor
`The frame generator 32 and feature extractor 33 are, in this
`embodiment, provided by a single suitably programmed digital signal
`processor (DSP) device “(such as the Motorola DSP 56000, or the Texas
`Instruments TMC C 320) or similar device.” Ex. 1005, 5:49–53. The
`
`12
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
classifying processor 341 may be a suitably programmed digital signal
processing (DSP) device, and may in particular be the same digital signal
`processing device as the feature extractor 33. Ex. 1005, 6:5–9. The parsing
`processor 351 may, for example, be a microprocessor “such as the Intel(TM) i-
`486(TM) microprocessor or the Motorola(TM) 68000 microprocessor, or may
`alternatively be a DSP device (for example, the same DSP device as is
`employed for any of the preceding processors).” Id. at 6:49–51.
`Frame generator 32 is arranged to receive speech samples, and feature
extractor 33 receives frames from the frame generator 32 and generates, for
each frame, a set or vector of features, i.e., a corresponding feature vector.
Ex. 1005, 5:7, 20, 31–33.
`
`Classification processor 34 is arranged to read each state field within
`the memory 342 in turn, and calculate for each, using the current input
`feature coefficient set, the probability that the input feature set or vector
`corresponds to the corresponding state. Ex. 1005, 5:63–67. “Accordingly,
`the output of the classification processor is a plurality of state probabilities
`P, one for each state in the state memory 342, indicating the likelihood that
`the input feature vector corresponds to each state.” Id. at 6:1–4.
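The classifier’s per-state probability output can be illustrated with a simple Gaussian likelihood. This is only a sketch under an assumption of independent feature components; the function name and toy state values are not from Smyth.

```python
import math

def state_probability(x, mean, var):
    """Likelihood that feature vector x corresponds to a Gaussian acoustic
    state with the given per-component mean and variance."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p

# one probability P per state, as in the classification processor's output
states = [([0.0], [1.0]), ([3.0], [1.0])]
probs = [state_probability([0.0], m, v) for m, v in states]
# the first (matching) state scores higher than the distant one
```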
`Parsing processor 351 is arranged to read, at each frame, the state
`probabilities output by the classifier processor 341, and the previous stored
`state probabilities in the state probability memory 353, and to calculate the
`most likely path of states to date over time, and to compare this with each of
`the state sequences stored in the state sequence memory 352. Ex. 1005,
`6:36–43. “The calculation employs the well known Hidden Markov Model
`method.” Id. at 6:43–45. Accordingly, for each state sequence
`(corresponding to a word, phrase or other speech sequence to be recognized)
`
`13
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`a probability score is output by the parser processor 351 at each frame of
`input speech. For example, the state sequences may comprise the names in a
`telephone directory. Id. at 6:52–56.
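The parsing step, finding the most likely path of states over time from per-frame state probabilities, is classically computed with the Viterbi algorithm for Hidden Markov Models. The sketch below is an illustrative implementation under that assumption, not Smyth’s code, and the toy model values are hypothetical.

```python
import numpy as np

def viterbi(state_probs, transitions, initial):
    """Most likely path of states over time, given per-frame state
    probabilities (the classifier's output) and HMM transition
    probabilities, via the standard Viterbi algorithm."""
    n_frames, n_states = state_probs.shape
    score = initial * state_probs[0]           # best score ending in each state
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] * transitions    # extend every path by one step
        back[t] = np.argmax(cand, axis=0)      # best predecessor per state
        score = cand.max(axis=0) * state_probs[t]
    path = [int(np.argmax(score))]             # best final state
    for t in range(n_frames - 1, 0, -1):       # backtrack through predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy left-to-right model: two states, frames favoring state 0 then state 1
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
trans = np.array([[0.5, 0.5], [0.0, 1.0]])
init = np.array([1.0, 0.0])
print(viterbi(probs, trans, init))  # [0, 1, 1]
```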
`
3. Claim 1

a. Element 1(Pre): Preamble, “A speech recognition system”
`With respect to the preamble of claim 1, Petitioner asserts that “Jiang
`teaches a speech recognition system: ‘A speech recognition system
`recognizes speech based on an input data stream indicative of the speech.’”
`Pet. 10 (citing Ex. 1004, 4:8–9, 4:30-32, 1:15–19, 6:18–20, code (57), Figs.
`1–2).
`
`Patent Owner does not contest that Jiang is directed to “a speech
`recognition system.” Rather, Patent Owner states that Jiang is directed to
“speech recognition techniques” in a “speech recognition system” (Prelim.
Resp. 21), thereby indicating that Jiang discloses or teaches the preamble.
`We find that at this stage of the proceeding, on the present record,
`Petitioner sufficiently establishes that Jiang meets the limitations of the
`preamble of claim 1 for the reasons explained by Petitioner.
`
b. Element 1(a): “a first programmable device programmed to
calculate a feature vector from a digital audio stream”
`With respect to a first programmable device calculating a feature
`vector from audio data, Petitioner asserts that Jiang teaches a feature
`extraction module 66 for calculating a feature vector, as claimed. Jiang’s
`feature extraction module 66 is a programmable device, as the module 66
`may be a hardware or software module in computer 20 having CPU 21. Pet.
`10 (citing Ex. 1004, 6:31-36, 6:46–7:10).
`
`14
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
Petitioner further states that, to the extent Patent Owner argues that
`Jiang fails to teach the element, “Smyth teaches a ‘first programmable
`device’ programmed to calculate a feature vector, as claimed, namely ‘a
`single suitably programmed digital signal processor (DSP) device . . . or
`similar device’ providing frame generator 32 and feature extractor 33.”
`Pet. 14 (citing Ex. 1005, 5:49–53). Petitioner continues with the assertion
`that Smyth also teaches that its programmable device is programmed to
`calculate a feature vector of a digital audio stream. Pet. 14–15 (citing Ex.
`1005, 5:2–3).
`Patent Owner does not offer counterarguments with respect to this
`element of claim 1 in the Preliminary Response. We find that at this stage of
`the proceeding, on the present record, Petitioner sufficiently establishes that
`both Jiang and Smyth meet this limitation of claim 1 for the reasons
`explained by Petitioner.
`
c. Element 1(b): “wherein the feature vector comprises a plurality
of extracted and/or derived quantities from said digital audio
stream during a defined audio time frame”
`Petitioner asserts that Jiang teaches the calculated feature vector
`comprises extracted and/or derived quantities in the form of “spectral
`characteristics.” The frames are preferably encoded by feature extraction
`module 66 into a feature vector reflecting the spectral characteristics for a
`plurality of frequency bands. Pet. 17 (citing Ex. 1004, 6:62–7:10, 6:53–57).
`Petitioner further asserts that Jiang teaches the feature vector includes
`extracted and/or derived quantities of the digital audio stream “during a
`defined audio time frame,” as claimed. Jiang teaches the digital signal is
`divided into frames of approximately 10 milliseconds in duration, where the
`frames are then encoded by the feature extraction module 66 into a feature
`
`15
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`vector, where the feature vector “reflect[s] the spectral characteristics.” Pet.
`18 (citing Ex. 1004, 6:62–7:1).
`Patent Owner does not offer counterarguments with respect to this
`element of claim 1 in the Preliminary Response. We find that at this stage of
`the proceeding, on the present record, Petitioner sufficiently establishes that
`Jiang meets this limitation of claim 1 for the reasons explained by Petitioner.
`
d. Element 1(c): “a second programmable device programmed to
calculate distances indicating the similarity between a feature
vector and a plurality of acoustic states of an acoustic model”
`Petitioner asserts that Jiang in combination with Smyth teaches this
`limitation, with Smyth relied on in the combination for teaching the “second
`programmable device.” Pet. 19. Petitioner asserts that both Jiang and
`Smyth teach the claimed functionality of calculating distances. Id. at 18.
`In regard to Jiang, Petitioner demonstrates that Jiang uses HMMs to
`identify speech units, thereby teaching an acoustic model with a plurality of
`acoustic states. Pet. 21. Petitioner further maps the HMM states to the
`recited distance calculations, asserting a distance is a probability or
`likelihood of a feature vector compared to a particular state of an HMM,
`thereby equating distances with probabilities or likelihoods. Id. at 20–21.
`Petitioner notes that Jiang teaches encoding feature vectors into
codewords using vector quantization techniques, which will be lossy. Pet.
`22. Jiang teaches these codewords are applied to the HMMs to identify
`utterances. Id. at 23. Petitioner argues that since a probability is associated
`with a corresponding state of an HMM, this represents a likelihood that a
`codeword corresponds to a phoneme utterance, which Petitioner further
`argues is the recited “distance” of this limitation. Id. at 23–25.
`
`16
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
`In regard to Smyth, Petitioner asserts Smyth teaches “three distinct
`DSPs for performing the three general speech recognition steps” of feature
`vector calculation, distance calculation, and word identification. Pet. 28.
`Petitioner further asserts that Smyth discloses the second programmable
`device is programmed to calculate distances as recited. Id. at 34. In support
`of this proposition, Petitioner reasons Smyth teaches calculating a plurality
`of probabilities, and that this operation is calculating distances as recited
`because processor 341 receives the feature vector and calculates the
`probability of correspondence. Id. at 37.
`Patent Owner disputes Petitioner’s assertions with respect to this
feature. Prelim. Resp. 10–17. In regard to Jiang, Patent Owner argues that
`the features of claim 1 require the distance be calculated from a first feature
`vector, and Jiang fails to disclose or teach this requirement. Id. at 10.
`Further, Patent Owner argues that the distances must be calculated from the
`feature vectors, and not from a derived value. Id. at 11. Patent Owner
`contends that in contrast, Jiang encodes the feature vectors into [concordant]
`codewords, and then uses the codewords for word recognition. Id. at 12–13.
`Patent Owner argues that determining likelihood scores based on codewords
`is a different technique from calculating distances based on feature vectors.
`Id. Patent Owner reiterates that the distance is a vector in a space, and
`buttresses the asserted distinction between the probability calculations of
`Jiang and the distance calculations as recited by arguing that codewords are
`used to obviate the need for distance calculations. Id. at 4, 16.
`In regard to Smyth, Patent Owner does not offer counterarguments
`with regard to the asserted teaching of Smyth by Petitioner in respect to this
`element of claim 1 in the Preliminary Response. Patent Owner instead
`
`17
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`argues there is no motivation to combine the teaching of Smyth with Jiang
and to do so would not have been obvious. Prelim. Resp. 17. This argument
`against applying the teaching of Smyth to Jiang is addressed separately
`below. We are not persuaded by Patent Owner’s arguments with respect to
`Jiang, which we discuss below.
`The preliminary evidence suggests that codewords are representations
`of feature vectors or groups of similarly clustered feature vectors, such that a
`distance between a codeword and an acoustic state indicates the similarity
`between that acoustic state and any feature vector that the codeword
`represents. See Ex. 1003 ¶ 141. Claim 1 does not require a direct
`comparison between a feature vector and an acoustic state, or proscribe use
`of codewords to represent feature vectors. That is to say, so long as the
`calculated distance indicates the similarity between a feature vector and an
`acoustic state, this limitation does not proscribe using codewords or other
`intermediate representations of a feature vector in the distance calculation.
`Because Jiang discloses using codewords as representations of feature
`vectors, and the calculated distance using codewords indicates the similarity
`between a feature vector and acoustic states within an acoustic model, we
`are persuaded that Jiang teaches “indicating the similarity between a feature
`vector and a plurality of acoustic states.”
`We find that at this stage of the proceeding, on the present record,
`Petitioner sufficiently establishes that the combination of Jiang and Smyth
`meets this limitation of claim 1 for the reasons explained by Petitioner.
`
`18
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
e. Element 1(d): “wherein said feature vector is received by the
second programmable device after it is calculated by the first
programmable device”
`Petitioner asserts that Jiang teaches the recited element where it
`discloses that tree search engine 74 receives codewords from feature
`extraction module 66, where the codeword is a representation of a feature
`vector; based upon HMMs stored in memory, tree search engine 74
`determines a most likely phoneme represented by the codeword received
`from feature extraction module 66. Pet. 41–42 (citing Ex. 1004, 7:30–42).
`Therefore, Petitioner seems to assert that because codewords represent
`feature vectors, receiving the codewords at tree search engine 74 after the
`feature vectors are calculated by feature extraction module 66 and converted
to codewords teaches the recited element.
`Petitioner further asserts that Smyth teaches the recited element
`because Smyth teaches feature extractor 33 which calculates feature vectors,
`and classifier 34 which uses the feature vectors calculated by feature
extractor 33 to determine the probability or likelihood that a feature vector
`corresponds to a state. Pet. 42–43 (citing Ex. 1005, 5:7–9, 63–67).
`Patent Owner does not offer counterarguments with respect to this
`element of claim 1 in the Preliminary Response. At this stage of the
`proceeding, on the present record, we determine that Petitioner has
`sufficiently established that an ordinarily skilled artisan would have
`understood that both Jiang and Smyth teach to calculate feature vectors at a
`device, and then pass the feature vectors or a representation thereof to a
`further device for further computation.
`
`19
`
`
`
`IPR2023-00035
`Patent 10,062,377 B2
`
`
f. Element 1(e): “a third programmable device programmed to
identify spoken words in said digital audio stream using Hidden
Markov Models and/or Neural Networks”
`Petitioner asserts that Jiang alone or in combination with Smyth
teaches this element of claim 1. Pet. 44.
`First, in regard to Jiang, Petitioner asserts that Jiang teaches the
`recited third programmable device. Pet. 44. Petitioner also argues it would
`have been obvious to use a third programmable device because it would
`have been obvious to use single function devices for each functional stage of
`the three-stage pipeline of feature vector calculation, similarity
`determination, and word identification. Id. at 44–45 (citing Ex. 1003 ¶¶ 71–
`72, 111–120, 156, 187–188). Petitioner additionally cites to Smyth as
`teaching three programmable devices when applied to Jiang. Id.
`Further in regard to Jiang, Petitioner asserts Jiang teaches the recited
`features of identifying spoken words using HMM. Pet. 45. Petitioner states
`that the HMMs are used to determine a score for a node and associated
`phoneme branch of the prefix tree, such that the score indicates the
`likelihood the phoneme being examined for the phoneme branch is the actual
`phoneme indicated by the codeword. Id. (citing Ex. 1004, 8:16–24, 7:30–
`43).
`
`Second, in regard to Smyth, Petitioner asserts Smyth teaches the
`claimed third programmable device which identifies words, citing to the
`parsing processor 351 of Smyth. Pet. 47. Petitioner states that parsing
`processor 351 calculates the most likely path of states, compares this path
`with stored state sequences, and identifies words and phrases, where parsing
processor 351 may be a microprocessor, or may alternatively be a DSP
`device. Id. (citing Ex. 1005, 6:36–60).
`
`20
`
`
`
`IPR2023-00035