`
` UNITED STATES PATENT AND TRADEMARK OFFICE
`
`
`
`
`
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`
`
`
`
`
`APPLE INC.,
`Petitioner,
`
`v.
`
`Zentian Limited,
`Patent Owner.
`____________________
`
`Case IPR2023-00035
`Patent No. 10,062,377
`____________________
`
`
`
`
`DECLARATION OF DAVID ANDERSON, Ph.D. IN SUPPORT OF
`PATENT OWNER’S PRELIMINARY RESPONSE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
I, David Anderson, Ph.D., do hereby declare as follows:

I. Introduction

A. Engagement

1. I have been retained by Patent Owner Zentian Limited (“Zentian” or
`
`“Patent Owner”) to provide my opinions with respect to Zentian’s Preliminary
`
`Response to the Petition in Inter Partes Review proceeding IPR2023-00035, with
`
`respect to U.S. Pat. 10,062,377. I am being compensated for my time spent on this
`
`matter. I have no interest in the outcome of this proceeding and the payment of my
`
`fees is in no way contingent on my providing any particular opinions.
`
2. As part of this engagement, I have also been asked to provide my
`
`technical review, analysis, insights, and opinions regarding the materials cited and
`
`relied upon by the Petition, including the prior art references and the supporting
`
`Declaration of Mr. Schmandt.
`
3. The statements made herein are based on my own knowledge and
`
`opinions.
`
B. Background and qualifications

4. My full qualifications, including my professional experience and
`
`education, can be found in my Curriculum Vitae, which includes a complete list of
`
`my publications, and is attached as Ex. A to this declaration.
`
`
`
`IPR2023-00035
`
`DECLARATION OF DAVID ANDERSON, PH.D. ISO
` PATENT OWNER’S PRELIMINARY RESPONSE
`
5. I am a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology (“Georgia Tech”) in Atlanta, Georgia. I
`
`have been a professor at Georgia Tech since 1999. In 2009 I served as a visiting
`
`professor in the Department of Computer Science at Korea University in Seoul,
`
`South Korea.
`
6. I received my Ph.D. in Electrical and Computer Engineering from
`
`Georgia Tech in 1999. I received my B.S. and M.S. in Electrical Engineering from
`
`Brigham Young University in 1993 and 1994, respectively.
`
7. In my employment prior to Georgia Tech as well as in my subsequent
`
`studies and research, I have worked extensively in areas related to the research,
`
`design, and implementation of speech and audio processing systems. I have also
`
`taught graduate and undergraduate level courses at Georgia Tech on the
`
`implementation of signal processing and embedded systems. For example, I have
`
`taught courses on statistical machine learning, machine learning for speech, pattern
`
`recognition, multimedia processing and systems, software design, computer
`
`architecture, real-time signal processing systems, and applications of signal
`
`processing (covering topics in audio processing and speech recognition). I have
`
`also designed and taught a course on signal processing in the context of human
`
`
`
`
`
`perception. These courses and my research have covered many topics relevant to
`
`the subject matter of the ’377 patent and the prior art cited therein.
`
8. I have served as principal investigator or co-principal investigator in
`
`numerous multi-disciplinary research projects including “Blind Source Separation
`
`for Audio,” “Audio Classification,” “Auditory Scene Analysis,” “Hearing Aid
`
`Audio Processing,” “Speaker Driver Sound Enhancement,” “I-Vector Based Voice
`
`Quality,” “Analysis of Voice Exercise Using Signal Processing,” and “Smart
`
`Homes for Effective and Safe Remote Work During a Pandemic and Beyond.”
`
`9.
`
`I also have extensive experience with the practical implementation of
`
`signal processing algorithms, information theory, signal detection, and related
`
`topics through my research and consulting. I have published over 200 book
`
`chapters and papers in reviewed journals and conferences. Topics include those
`
`such as “Speech recognition using filter bank features,” “Speaker adaptation using
`
speaker similarity score on DNN features,” “Segmentation based speech
`
`enhancement using auxiliary sensors,” “A framework for estimation of clean
`
`speech by fusion of outputs from multiple speech enhancement systems,”
`
`“Distributed acquisition and processing systems for speech and audio,” “A missing
`
`data-based feature fusion strategy for noise-robust automatic speech recognition
`
`using noisy sensors,” “Learning distances to improve phoneme classification,”
`
`
`
`
`
`“Identification of voice quality variation using i-vectors,” “Varying time-constants
`
`and gain adaptation in feature extraction for speech processing,” “Low bit-rate
`
`coding of speech in harsh conditions using non-acoustic auxiliary devices,”
`
`“Speech analysis and coding using a multi-resolution sinusoidal transform,”
`
`“Biologically inspired auditory sensing system interfaces on a chip,” “Cascade
`
`classifiers for audio classification,” and “Single acoustic channel speech
`
`enhancement based on glottal correlation using non-acoustic sensors.” I have also
`
`contributed book chapters for treatises such as “Independent Component Analysis
`
`for Audio and Biosignal Applications,” and written a book on Fixed-Point Signal
`
`Processing which is related to the practical implementation of systems for
`
`processing sound and other signals.
`
10. I am a named inventor on eight patents, including “Speech activity
`
`detector for use in noise reduction system, and methods therefor” (U.S. Patent No.
`
`6,351,731), and “Analog audio signal enhancement system using a noise
`
`suppression algorithm” (U.S. Patent No. 7,590,250).
`
11. I am a Senior Member of the Institute of Electrical and Electronics
`
`Engineers (“IEEE”) and have been a Member since 1991. I am also a Member of
`
`the IEEE Signal Processing Society. From 1994 to 2016, I was also a member of
`
`the Acoustical Society of America. In 2003, I served as the Co-Chair for the NSF
`
`
`
`
`
`Symposium on Next Generation Automatic Speech Recognition. In 2004, I
`
`received the Presidential Early Career Award for Scientists and Engineers,
`
`presented by then-President George W. Bush, for my work on ultra-low-power
`
`signal processing system design.
`
`C. Materials considered
`
12. In the course of preparing my opinions, I have reviewed and am familiar
`
`with the ’377 patent, including its written description, figures, and claims. I have
`
`also reviewed and am familiar with the Petition in this proceeding, the supporting
`
`Declaration of Mr. Schmandt, and the relied upon prior art, including Jiang and
`
`Smyth. I have also reviewed the materials cited in this declaration. My opinions are
`
`based on my review of these materials as well as my more than 30 years of
`
`experience, research, and education in the field of art.
`
`II. Relevant legal standards
`
13. I am not an attorney. I offer no opinions on the law. But counsel has
`
`informed me of the following legal standards relevant to my analysis here. I have
`
`applied these standards in arriving at my conclusions.
`
A. Person of ordinary skill in the art

14. I understand that an analysis of the claims of a patent in view of prior
`
`art has to be provided from the perspective of a person having ordinary skill in the
`
`
`
`
`
`art at the time of invention of the ’377 patent. I understand that I should consider
`
`factors such as the educational level and years of experience of those working in the
`
`pertinent art; the types of problems encountered in the art; the teachings of the prior
`
`art; patents and publications of other persons or companies; and the sophistication
`
`of the technology. I understand that the person of ordinary skill in the art is not a
`
`specific real individual, but rather a hypothetical individual having the qualities
`
`reflected by the factors discussed above.
`
15. I understand that the Petition applies a priority date of September 14,
`
`2004 for the challenged claims, Pet. 4, and I apply the same date.
`
16. I further understand that the Petition defines the person of ordinary skill
`
`in the art at the time of the invention as having had a master’s degree in computer
`
`engineering, computer science, electrical engineering, or a related field, with at least
`
`two years of experience in the field of speech recognition, or a bachelor’s degree in
`
`the same fields with at least four years of experience in the field of speech
`
`recognition. The Petition adds that further education or experience might substitute
`
`for the above requirements. I do not dispute the Petition’s assumptions at this time,
`
`and my opinions are rendered on the basis of the same definition of the ordinary
`
`artisan set forth in the Petition.
`
`
`
`
17. I also note, however, that an ordinarily skilled engineer at the time of
the invention would have been trained in evaluating both the costs and benefits of a
`
`particular design choice. Engineers are trained (both in school and through general
`
`experience in the workforce) to recognize that design choices can have complex
`
`consequences that need to be evaluated before forming a motivation to pursue a
`
`particular design choice, and before forming an expectation of success as to that
`
`design choice. In my opinion, anyone who did not recognize these realities would
`
`not be a person of ordinary skill in the art. Thus, a person who would have simply
`
`formed design motivations based only on the premise that a particular combination
`
`of known elements would be possible would not be a person of ordinary skill
`
`regardless of their education, experience, or technical knowledge. Likewise, a person
`
`who would have formed design motivations as to a particular combination of known
`
`elements based only on the premise that the combination may provide some benefit,
`
`with no consideration of the relevance of the benefit in the specific context and in
`
relation to the costs or disadvantages of that combination, would also not have been a
`
`person of ordinary skill in the art, regardless of their education, experience, or
`
`technical knowledge. In my opinion, a person of ordinary skill in the art would have
`
`been deliberative and considered, rather than impulsive.
`
`
`
`
18. Throughout my declaration, even if I discuss my analysis in the present
tense, I am always making my determinations based on what a person of ordinary
`
`skill in the art (“POSA”) would have known at the time of the invention. Based on
`
`my background and qualifications, I have experience and knowledge exceeding the
`
`level of a POSA, and am qualified to offer the testimony set forth in this declaration.
`
B. Burden of proof

19. I understand that in an inter partes review the petitioner has the burden
`
`of proving a proposition of unpatentability by a preponderance of the evidence.
`
`C. Claim construction
`
20. I understand that in an inter partes review, claims are interpreted based
`
`on the same standard applied by Article III courts, i.e., based on their ordinary and
`
`customary meaning as understood in view of the claim language, the patent’s
`
`description, and the prosecution history viewed from the perspective of the ordinary
`
`artisan. I further understand that where a patent defines claim language, the
`
`definition in the patent controls, regardless of whether those working in the art may
`
`have understood the claim language differently based on ordinary meaning.
`
`D. Obviousness
`
21. I understand that a patent may not be valid even though the invention
`
`is not identically disclosed or described in the prior art if the differences between the
`
`
`
`
`
`subject matter sought to be patented and the prior art are such that the subject matter
`
`as a whole would have been obvious to a person having ordinary skill in the art in
`
`the relevant subject matter at the time the invention was made.
`
22. I understand that, to demonstrate obviousness, it is not sufficient for a
`
`petition to merely show that all of the elements of the claims at issue are found in
`
`separate prior art references or even scattered across different embodiments and
`
`teachings of a single reference. The petition must thus go further, to explain how a
`
`person of ordinary skill would combine specific prior art references or teachings,
`
`which combinations of elements in specific references would yield a predictable
`
`result, and how any specific combination would operate or read on the claims.
`
`Similarly, it is not sufficient to allege that the prior art could be combined, but rather,
`
`the petition must show why and how a person of ordinary skill would have combined
`
`them.
`
23. I understand that where an alleged motivation to combine relies on a
`
`particular factual premise, the petitioner bears the burden of providing specific
`
`support for that premise. I understand that obviousness cannot be shown by
`
`conclusory statements, and that the petition must provide articulated reasoning with
`
`some rational underpinning to support its conclusion of obviousness. I also
`
`understand that skill in the art and “common sense” rarely operate to supply missing
`
`
`
`
`
`knowledge to show obviousness, nor does skill in the art or “common sense” act as
`
`a bridge over gaps in substantive presentation of an obviousness case.
`
`III. Overview of the ’377 Patent
`
`24. U.S. Patent 10,062,377, titled “Distributed pipelined parallel speech
`
`recognition system,” is directed to an improved speech recognition circuit and
`
`associated methods. Ex. 1001, 1:18-20.
`
`25. The ’377 patent teaches that an “audio input for speech recognition”
`
`may be input to the front end in the form of digital audio or analog audio that is
`
`converted to digital audio using an analog to digital converter. Ex. 1001, 12:51-53.
`
`“The audio input is divided into time frames, each time frame typically being on the
`
`order of 10 ms.” Ex. 1001, 12:53-55. “For each audio input time frame, the audio
`
`signal is converted into a feature vector. This may be done by splitting the audio
`
`signal into spectral components,” such as, for instance, 13 components plus their
`
`first and second derivatives, creating a total of 39 components. Ex. 1001, 12:56-58.
`
`The feature vector thus “represents a point in an N-dimensional space,” where N is
`
`generally in the range of 20 to 39. Ex. 1001, 13:19-23.
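The frame-and-derivative arithmetic described above (13 spectral components plus their first and second derivatives, for 39 total) can be sketched in a few lines. This sketch is illustrative only; the array sizes and the simple frame-difference derivative are assumptions offered for illustration, not disclosures of the ’377 patent:

```python
import numpy as np

def add_derivatives(base):
    """Append first and second time derivatives (deltas) to per-frame
    spectral components, turning frames x 13 into frames x 39 vectors."""
    d1 = np.gradient(base, axis=0)   # first derivative across frames
    d2 = np.gradient(d1, axis=0)     # second derivative across frames
    return np.concatenate([base, d1, d2], axis=1)

frames = np.zeros((100, 13))         # 100 hypothetical 10 ms frames
fv = add_derivatives(frames)
assert fv.shape == (100, 39)         # 13 + 13 + 13 = 39 components
```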
`
`26. Each feature vector is then passed to the calculating circuit, or distance
`
`calculation engine, which calculates a distance indicating the similarity between a
`
`feature vector and one or more predetermined acoustic states of an acoustic model.
`
`
`
`
`
`Ex. 1001, 5:63-6:2, 25:33-35 (“Each feature vector is transferred to a distance
`
`calculation engine circuit 204, to obtain distances for each state of the acoustic
`
`model.”). “The distance calculator stage of the recognition process computes a
`
`probability or likelihood that a feature vector corresponds to a particular state.” Ex.
`
`1001, 13:24-26. “The likelihood of each state is determined by the distance between
`
`the feature vector and each state.” Ex. 1001, 13:1-2. The distance calculation may
`
`be a Mahalanobis distance using Gaussian distributions. Ex. 1001, 4:20-33. “The
`
`MHD (Mahalanobis Distance) is a distance between two N-dimensional points,
`
`scaled by the statistical variation in each component.” Ex. 1001, 13:13-15. The ’377
`
`patent teaches calculating the distance between a feature vector and 8,000 states,
`
`“i.e. one distance for each of the 8,000 states,” Ex. 1001, 13:59-61, which it teaches
`
`“gives the best recognition results when used with a language model.” Id. at 13:11-
`
`12, 14:2-4 (“Each state is also a 39 dimensional vector, having the same spectral
`
`components as the feature vector.”). “Due to the 10 ms frame length, a feature vector
`
`arrives at the MHD engine,” i.e., the distance calculation engine or calculating
`
`circuit, “every 10 ms.” Ex. 1001, 13:65-66.
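For illustration, the per-frame distance computation described above can be sketched as follows, assuming Gaussian states with diagonal covariance (a common simplification). The dimensions mirror the patent’s examples (39 components, 8,000 states), but the numeric values are hypothetical:

```python
import numpy as np

def mahalanobis_distance(fv, mean, var):
    """Distance from one feature vector to one acoustic state, where the
    state is a Gaussian with per-component mean and variance; each squared
    difference is scaled by the statistical variation in that component."""
    diff = fv - mean
    return float(np.sqrt(np.sum(diff * diff / var)))

rng = np.random.default_rng(0)
fv = rng.standard_normal(39)                 # one 39-dimensional feature vector
means = rng.standard_normal((8000, 39))      # 8,000 acoustic states
variances = rng.uniform(0.5, 2.0, (8000, 39))

# One distance per state, computed directly from the feature vector itself.
distances = np.sqrt((((fv - means) ** 2) / variances).sum(axis=1))
assert distances.shape == (8000,)
assert np.isclose(distances[0], mahalanobis_distance(fv, means[0], variances[0]))
```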
`
`27. The distance calculation engine or calculating circuit “may be included
`
`within an accelerator,” Ex. 1001, 3:59-61, which may be a “loosely bound co-
`
`processor for a CPU running speech recognition software,” and which “has the
`
`
`
`
`
`advantage of reducing computational load on the CPU, and reducing memory
`
`bandwidth load for the CPU.” Id. at 24:17-20; see Figs. 17-23. “Each time a feature
`
`vector is loaded into the accelerator, the accelerator computes the distances for all
`
`states for that feature vector[.]” Ex. 1001, 26:6-10.
`
28. “The distances calculated by the distance calculation engine are then
`
`transferred to the search stage 106 of the speech recognition circuit, which uses
`
`models such as one or more word models and/or language models to generate and
`
`output recognised text.” Ex. 1001, 24:5-11.
`
`IV. Jiang
`
`29. U.S. Pat. 6,374,219, titled “System for using silence in speech
`
`recognition,” (“Jiang”), is directed to “computer speech recognition performed by
`
`conducting a prefix tree search of a silence bracketed lexicon.” Ex. 1004, 1:15-18.
`
`“Possible words represented by the input data stream are provided as a prefix tree
`
`including a plurality of phoneme branches connected at nodes.” Ex. 1004, 4:8-12.
`
Speech is input into system 60 in the form of an audible voice signal provided by the
`
`user to a microphone 62, which converts the audible speech into an analog electric
`
`signal, which is converted by A/D converter 64 into a sequence of digital signals that
`
`are then provided to feature extraction module 66. Ex. 1004, 6:45-52. Feature
`
`
`
`
`
`extraction module 66 divides the digital signal into frames, each approximately 10
`
`ms in duration. Id. at 6:62-65.
`
30. “The frames are then preferably encoded by feature extraction module
`
`66 into a feature vector reflecting the spectral characteristics for a plurality of
`
`frequency bands.” Ex. 1004, 6:65-7:1. “In the case of discrete and semi-continuous
`
`hidden Markov modeling, feature extraction module 66 also preferably encodes the
`
`feature vectors into one or more codewords using vector quantization techniques and
`
`a codebook derived from training data.” Id. at 7:1-5.
`
31. “Upon receiving the codewords from feature extraction module 66, and
`
`the boundary detection signal provided by silence detection module 68, tree search
`
`engine 74 accesses information stored in the phonetic speech unit model memory
`
`72.” Ex. 1004, 7:30-34. “Based upon the HMMs stored in memory 72, tree search
`
`engine 74 determines a most likely phoneme represented by the codeword received
`
`from feature extraction module 66, and hence representative of the utterance
`
`received by the user of the system.” Ex. 1004, 7:37-42.
`
V. Smyth
`
`32. U.S. Pat. 5,819,222, titled “Task-constrained connected speech
`
`recognition of propagation of tokens only if valid propagation path is present,”
`
`(“Smyth”), is directed to “task-constrained connected word recognition where the
`
`
`
`
`
`task, for example, might be to recognise one of a set of account numbers or product
`
`codes.” Ex. 1005, 1:15-21.
`
`VI. The ’377 Patent’s claimed distance calculations
`
`
`33. All challenged claims of the ’377 patent recite “a second programmable
`
`device programmed to calculate distances indicating the similarity between a feature
`
`vector and a plurality of acoustic states of an acoustic model,” see Pet. 77, limitation
`
`1(c) (emphasis added), and “wherein said identification of spoken words uses one or
`
`more distances calculated from a first feature vector.” Pet. 77, limitation 1(g)
`
`(emphasis added).
`
34. In my opinion, an ordinary artisan would understand the above claim
`
`language to require making distance calculations using the feature vectors
`
`themselves, and this is particularly clear in view of the language of limitation 1(g).
`
35. I also note that each embodiment of the ’377 patent is consistent with
`
`that understanding. For instance, Fig. 20 of the patent, included below, shows feature
`
`vectors (FVs) extracted at the front end, and then passed to the distance calculation
`
`Accelerator, which “calc[ulates] all dist[ances] for” each depicted feature vector. Ex.
`
`1001, FIG 20.
`
`
`
`
36. Figures 18, 19, 21, 22, and 23 likewise contain the same teaching. Ex. 1001.
`
`37. The ’377 patent’s written description likewise teaches that “[t]he
`
`distance calculator computes the distance in the N-dimensional space from the
`
`Feature Vector to the probability distribution for each state.” Ex. 1001, 13:27-30.
`
`The patent likewise states: “The FV registers 209 hold the feature vector whose
`
`distances are currently bring [sic] computed by the distance calculation engine 204.”
`
`
`
`
`
`Ex. 1001, 25:9-11 (emphasis added). Moreover, “[e]ach feature vector is transferred
`
`to distance calculation circuit 204, to obtain distances for each state of the acoustic
`
`model[,]” Ex. 1001 at 25:33-35 (emphasis added), and “[e]ach time a feature vector
`
`is loaded into the accelerator, the accelerator computes distances for all states for
`
`that feature vector, and stores the results alternately in the A or B Results Memory.”
`
Id. at 26:6-10 (emphasis added); see also id. at 35:5-8. As these passages illustrate,
`
`in each instance the patent teaches the direct use of the feature vectors themselves
`
`for performing distance calculations. Moreover, although the ’377 patent
`
`contemplates the use of certain different types of distance calculations, including
`
`Mahalanobis Distances, it clearly teaches that any distance calculation must use the
`
`feature vectors themselves, not other values. For instance, the patent states: “The
`
`feature vector is used to calculate 8,000 MHD [Mahalanobis Distances] for each
`
`time frame, i.e., one distance for each of the 8,000 states.” Ex. 1001, 13:59-61.
`
`VII. Jiang does not teach the ’377 Patent’s claimed distance calculations
`
38. I understand that the Petition and Mr. Schmandt present two theories
`
`with respect to the distance calculation requirement of the challenged claims. Pet. at
`
`18-19. The Petition alleges that Jiang alone teaches the limitation, Pet. 23-27, and
`
`alternatively alleges that the limitation would have been obvious by modifying Jiang
`
`
`
`
`
`in view of Smyth. Pet. 40-41. I address Jiang below and the modification of Jiang in
`
`view of Smyth in the following section.
`
`39. U.S. Patent No. 6,374,219 to Jiang (“Jiang”), is directed to a “System
`
`for Using Silence in Speech Recognition,” Ex. 1004, Title, and particularly to using
`
`silence detection to indicate a word boundary to a tree search engine as part of a
`
`speech recognition system. Ex. 1004, 7:17-28.
`
40. Jiang teaches that “speech is input into system 60 in the form of an
`
`audible voice signal provided by the user to microphone 62,” which “converts the
`
`audible speech signal into an analog electronic signal which is provided to A/D
`
`converter 64. A/D converter 64 converts the analog speech signal into a sequence of
`
`digital signals which is provided to feature extraction module 66.” Ex. 1004, 6:46-
`
`53. “Feature extraction module 66 divides the digital signal received from A/D
`
`converter 64 into frames which include a plurality of digital samples. Each frame is
`
`approximately 10 milliseconds in duration. The frames are then preferably encoded
`
`by feature extraction module 66 into a feature vector reflecting the spectral
`
`characteristics for a plurality of frequency bands. In the case of discrete and semi-
`
`continuous hidden Markov modeling, feature extraction module 66 also preferably
`
`encodes the feature vectors into one or more codewords using vector quantization
`
`techniques and a codebook derived from training data. Thus, feature extraction
`
`
`
`
`
`module 66 provides, as its output the feature vectors (or codewords) for each spoken
`
`utterance.” Ex. 1004, 6:62-7:7. “Upon receiving the codewords from feature
`
`extraction module 66, and the boundary detection signal provided by silence
`
`detection module 68, tree search engine 74 accesses information stored in the
`
`phonetic speech unit models, such as hidden Markov models, which represent
`
`speech units to be detected by system 60.” Ex. 1004, 7:29-35. “Based upon the
`
`HMMs stored in memory 72, tree search engine 74 determines a most likely
`
`phoneme represented by the codeword received from feature extraction module 66,
`
`and hence representative of the utterance received by the user of the system.” Ex.
`
`1004, 7:37-42.
`
41. Jiang further teaches that “[a]s the tree search engine traverses tree 77,
`
`it preferably computes a score, for each phoneme branch considered in tree 77,
`
`wherein the score represents the likelihood that the particular phoneme encoded by
`
`the codeword corresponds to the phoneme for the branch under consideration.” Ex.
`
`1004, 8:28-33. “As tree search engine 74 traverses tree 77, it preferably assigns a
`
`score to each node in tree 77 which is based on the likelihood that the present
`
`codeword (output probability distributions) under analysis is represented by the
`
`phoneme corresponding to the branch in tree 77 then being considered, and based
`
`on the score assigned to nodes further up the tree which are connected by phoneme
`
`
`
`
`
branches to the present node. This is all done in a known manner.” Ex. 1004, 8:43-51.
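The tree traversal and score propagation that Jiang describes can be illustrated with a toy example. The tree, the codeword sequence, and the log-probability table below are entirely hypothetical and are offered only to show the mechanics of scoring branches from codewords and parent-node scores:

```python
import math

# Toy prefix tree: root -> 'k' -> 'ae' -> 't' ("cat"), root -> 'k' -> 'aa' -> 'r' ("car")
tree = {"k": {"ae": {"t": {}}, "aa": {"r": {}}}}

# Hypothetical log-probabilities that a codeword was produced by a phoneme.
log_prob = {
    ("cw1", "k"): -0.1, ("cw2", "ae"): -0.3, ("cw2", "aa"): -1.2,
    ("cw3", "t"): -0.2, ("cw3", "r"): -1.5,
}

def best_path(node, codewords, score=0.0, path=()):
    """Traverse the tree, adding each branch's codeword log-likelihood to
    the score inherited from nodes further up the tree."""
    if not codewords:
        return score, path
    cw, rest = codewords[0], codewords[1:]
    best = (-math.inf, path)
    for phoneme, child in node.items():
        branch = log_prob.get((cw, phoneme), -math.inf)
        s, p = best_path(child, rest, score + branch, path + (phoneme,))
        if s > best[0]:
            best = (s, p)
    return best

score, phonemes = best_path(tree, ["cw1", "cw2", "cw3"])
assert phonemes == ("k", "ae", "t")   # "cat" scores higher than "car"
```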
`
42. In view of Jiang’s teachings above, an ordinary artisan would have
`
`understood that Jiang teaches clustering feature vectors into similar codewords,
`
where each feature vector is encoded into a codeword using vector quantization, and then
`
`using the codewords to compute a score that represents the likelihood that the
`
`particular phoneme encoded by the codeword corresponds to the phoneme for the
`
`branch under consideration. Jiang then determines a most likely phoneme
`
`represented by the codeword received from feature extraction module 66. I
`
`understand the Petition provides the same understanding of Jiang. Pet. 21, 23.
`
`43. An ordinary artisan would have known that determining likelihood
`
`scores based on codewords and vector quantization, as taught in Jiang, is an entirely
`
`different technique in the art of speech recognition from calculating distances based
`
`on the feature vectors themselves, as recited in the ’377 patent’s challenged claims.
`
`44. Vector quantization is a lossy data compression technique that is “used
`
`to code a spectral vector into one of a fixed number of discrete symbols in order to
`
`reduce the computation required in a practical system.” Ex. 1015, Part 1, at 28.
`
`“[T]he basic idea of VQ is to reduce the information rate of the speech signal to a
`
`low rate through the use of a codebook with a relatively small number of code
`
`
`
`
`
`words.” Ex. 1015, Part 3 at 162. Stated otherwise, vector quantization converts the
`
`actual feature vectors themselves into “a much smaller set of vector quantized (VQ)
`
`feature signals.” Ex. 2005 at 16:1-7; Ex. 1015, Part 3, at 155. Using vector
`
`quantization comes “at the cost of increased error in signal representation but with
`
`the benefit of significantly reduced computation in the recognition process.” Ex.
`
1015, Part 1, at 34. Notably, when a vector quantization and codeword approach is
`
`used, the need to calculate actual distances between the feature vectors and acoustic
`
`states of an acoustic model is eliminated and “this spectral similarity computation is
`
`often reduced to a table lookup of similarities between pairs of codebook vectors.”
`
`Ex. 1015, Part 3, at 154.
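A minimal sketch of this codeword approach follows; the codebook size, state count, and precomputed score table are hypothetical. The key point is that once the feature vector is quantized, scoring reduces to a table lookup indexed by the codeword, with no distance computed from the feature vector itself:

```python
import numpy as np

def quantize(fv, codebook):
    """Map a feature vector to the index of its nearest codebook entry.
    After this step the recognizer works only with the codeword index;
    the feature vector itself is discarded (lossy compression)."""
    return int(np.argmin(np.linalg.norm(codebook - fv, axis=1)))

rng = np.random.default_rng(1)
codebook = rng.standard_normal((256, 39))  # 256-entry codebook (illustrative size)
score_table = rng.random((256, 8000))      # precomputed score per (codeword, state)

fv = rng.standard_normal(39)
codeword = quantize(fv, codebook)
scores = score_table[codeword]             # table lookup, no per-frame distance math
assert scores.shape == (8000,)
```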
`
`45. By contrast, the ’377 patent teaches that “[e]ach feature vector is
`
`transferred to distance calculation circuit 204, to obtain distances for each state of
`
`the acoustic model[,]” Ex. 1001 at 25:33-35 (emphasis added), and “[e]ach time a
`
`feature vector is loaded into the accelerator, the accelerator computes distances for
`
`all states for that feature vector, and stores the results alternately in the A or B Results
`
Memory.” Id. at 26:6-10 (emphasis added); see also id. at 35:5-8.
`
46. Jiang’s codeword-based teachings are thus significantly different from
`
`the ’377 patent’s claimed distance calculations, and an ordinary artisan would have
`
`readily recognized that fact. Simply put, the ordinary artisan would have understood
`
`
`
`
`
`that Jiang’s codewords are not the multi-dimensional real “feature vector” that the
`
`challenged claims require for distance calculations.
`
47. I understand that the Petition and Mr. Schmandt argue that the
`
`challenged claims are met so long as a distance calculation “indicates a similarity
`
`between a feature vector and” one or more acoustic states of an acoustic model, and
`
`that Jiang teaches the claimed distance calculation because its codeword-based
`
`likelihood scores allegedly indicate such a similarity. Pet. 24-25; Ex. 1003, at
`
`paragraphs 142-45.
`
48. I disagree with that assessment for three reasons.
`
`49. First, limitation 1(g) expressly requires “distances calculated from a
`
`first feature vector.” Thus, regardless of the “indicating a similarity” language of
`
`limitation 1(c), limitation 1(g) precludes satisfying the challenged claims based on
`
`codeword-based likelihood scores, which are not “distances calculated from a first
`
`feature vector.”
`
`50. Second, even where it discusses the use of distance calculations to
`
`determine the “similarity” between a feature vector and the acoustic states, the ’377
`
`patent specifically teaches using the feature vectors themselves to make the distance
`
`calculation. For instance, the patent expressly teaches that “the Mahalanobis distance
`
`between the feature vector and each state is calculated, to determine similarity of the
`
`
`
`
`
`feature vector to each state.” Ex. 1001, 13:15-18 (emphasis added). Indeed, in every
`
`disclosed embodiment of the distance calculation step, the ’377 patent teaches
`
`calculating distances by directly using the feature vectors, not a different value such
`
`as a codeword derived from vector quantization. Accordingly, an ordinary artisan
`
`would have known that the phrase “indicating a similarity” as recited in the claims
`
`permits a choice as to the method of distance calculation that can be used (so long
`
`as that method uses the feature vectors themselves and “indicates a similarity
`
`between a feature vector and” the acoustic states), but does not permit so-called
`
`distance calculations that do not use the feature vectors themselves.
`
`51. Third, an ordinary artisan would have known that vector quantization
`
`and codewords are not used to “calculate distances” as that term is used in the field
`
`of art. Rather, it was well known prior at