`
` UNITED STATES PATENT AND TRADEMARK OFFICE
`
`
`
`
`
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`
`
`
`
`
`APPLE INC.,
`Petitioner,
`
`v.
`
`Zentian Limited,
`Patent Owner.
`____________________
`
`Case IPR2023-00034
`Patent No. 7,979,277
`____________________
`
`
`
`
`DECLARATION OF DAVID ANDERSON, Ph.D. IN SUPPORT OF
`PATENT OWNER’S PRELIMINARY RESPONSE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
I, David Anderson, Ph.D., do hereby declare as follows:

I. Introduction

A. Engagement

1. I have been retained by Patent Owner Zentian Limited (“Zentian” or
`
`“Patent Owner”) to provide my opinions with respect to Zentian’s Preliminary
`
`Response to the Petition in Inter Partes Review proceeding IPR2023-00034, with
`
`respect to U.S. Pat. 7,979,277. I am being compensated for my time spent on this
`
`matter. I have no interest in the outcome of this proceeding and the payment of my
`
`fees is in no way contingent on my providing any particular opinions.
`
`2.
`
`As part of this engagement, I have also been asked to provide my
`
`technical review, analysis, insights, and opinions regarding the materials cited and
`
`relied upon by the Petition, including the prior art references and the supporting
`
`Declaration of Mr. Schmandt.
`
`3.
`
`The statements made herein are based on my own knowledge and
`
`opinions.
`
B. Background and qualifications
`
`4. My full qualifications, including my professional experience and
`
`education, can be found in my Curriculum Vitae, which includes a complete list of
`
`my publications, and is attached as Ex. A to this declaration.
`
`
`
`2
`
`
`
`5.
`
`I am a professor in the School of Electrical and Computer Engineering
`
`at the Georgia Institute of Technology (“Georgia Tech”) in Atlanta, Georgia. I
`
`have been a professor at Georgia Tech since 1999. In 2009 I served as a visiting
`
`professor in the Department of Computer Science at Korea University in Seoul,
`
`South Korea.
`
`6.
`
`I received my Ph.D. in Electrical and Computer Engineering from
`
`Georgia Tech in 1999. I received my B.S. and M.S. in Electrical Engineering from
`
`Brigham Young University in 1993 and 1994, respectively.
`
`7.
`
`In my employment prior to Georgia Tech as well as in my subsequent
`
`studies and research, I have worked extensively in areas related to the research,
`
`design, and implementation of speech and audio processing systems. I have also
`
`taught graduate and undergraduate level courses at Georgia Tech on the
`
`implementation of signal processing and embedded systems. For example, I have
`
`taught courses on statistical machine learning, machine learning for speech, pattern
`
`recognition, multimedia processing and systems, software design, computer
`
`architecture, real-time signal processing systems, and applications of signal
`
`processing (covering topics in audio processing and speech recognition). I have
`
`also designed and taught a course on signal processing in the context of human
`
`perception. These courses and my research have covered many topics relevant to
`
`the subject matter of the ’277 patent and the prior art cited therein.
`
`
`
`3
`
`
`
`8.
`
`I have served as principal investigator or co-principal investigator in
`
`numerous multi-disciplinary research projects including “Blind Source Separation
`
`for Audio,” “Audio Classification,” “Auditory Scene Analysis,” “Hearing Aid
`
`Audio Processing,” “Speaker Driver Sound Enhancement,” “I-Vector Based Voice
`
`Quality,” “Analysis of Voice Exercise Using Signal Processing,” and “Smart
`
`Homes for Effective and Safe Remote Work During a Pandemic and Beyond.”
`
`9.
`
`I also have extensive experience with the practical implementation of
`
`signal processing algorithms, information theory, signal detection, and related
`
`topics through my research and consulting. I have published over 200 book
`
`chapters and papers in reviewed journals and conferences. Topics include those
`
`such as “Speech recognition using filter bank features,” “Speaker adaptation using
`
speaker similarity score on DNN features,” “Segmentation based speech
`
`enhancement using auxiliary sensors,” “A framework for estimation of clean
`
`speech by fusion of outputs from multiple speech enhancement systems,”
`
`“Distributed acquisition and processing systems for speech and audio,” “A missing
`
`data-based feature fusion strategy for noise-robust automatic speech recognition
`
`using noisy sensors,” “Learning distances to improve phoneme classification,”
`
`“Identification of voice quality variation using i-vectors,” “Varying time-constants
`
`and gain adaptation in feature extraction for speech processing,” “Low bit-rate
`
`coding of speech in harsh conditions using non-acoustic auxiliary devices,”
`
`
`
`4
`
`
`
`“Speech analysis and coding using a multi-resolution sinusoidal transform,”
`
`“Biologically inspired auditory sensing system interfaces on a chip,” “Cascade
`
`classifiers for audio classification,” and “Single acoustic channel speech
`
`enhancement based on glottal correlation using non-acoustic sensors.” I have also
`
`contributed book chapters for treatises such as “Independent Component Analysis
`
`for Audio and Biosignal Applications,” and written a book on Fixed-Point Signal
`
`Processing which is related to the practical implementation of systems for
`
`processing sound and other signals.
`
`10.
`
`I am a named inventor on eight patents, including “Speech activity
`
`detector for use in noise reduction system, and methods therefor” (U.S. Patent No.
`
`6,351,731), and “Analog audio signal enhancement system using a noise
`
`suppression algorithm” (U.S. Patent No. 7,590,250).
`
`11.
`
`I am a Senior Member of the Institute of Electrical and Electronics
`
`Engineers (“IEEE”) and have been a Member since 1991. I am also a Member of
`
`the IEEE Signal Processing Society. From 1994 to 2016, I was also a member of
`
`the Acoustical Society of America. In 2003, I served as the Co-Chair for the NSF
`
`Symposium on Next Generation Automatic Speech Recognition. In 2004, I
`
`received the Presidential Early Career Award for Scientists and Engineers,
`
`presented by then-President George W. Bush, for my work on ultra-low-power
`
`signal processing system design.
`
`
`
`5
`
`
`
`C. Materials considered
`
`12.
`
`In the course of preparing my opinions, I have reviewed and am familiar
`
`with the ’277 patent, including its written description, figures, and claims. I have
`
`also reviewed and am familiar with the Petition in this proceeding, the supporting
`
`Declaration of Mr. Schmandt, and the relied upon prior art, including Jiang,
`
`Baumgartner, Brown, and Smyth. I have also reviewed the materials cited in this
`
`declaration. My opinions are based on my review of these materials as well as my
`
`more than 30 years of experience, research, and education in the field of art.
`
`II. Relevant legal standards
`
`13.
`
`I am not an attorney. I offer no opinions on the law. But counsel has
`
`informed me of the following legal standards relevant to my analysis here. I have
`
`applied these standards in arriving at my conclusions.
`
A. Person of ordinary skill in the art

14. I understand that an analysis of the claims of a patent in view of prior
`
`art has to be provided from the perspective of a person having ordinary skill in the
`
`art at the time of invention of the ’277 patent. I understand that I should consider
`
`factors such as the educational level and years of experience of those working in the
`
`pertinent art; the types of problems encountered in the art; the teachings of the prior
`
`art; patents and publications of other persons or companies; and the sophistication
`
`of the technology. I understand that the person of ordinary skill in the art is not a
`
`
`
`6
`
`
`
`specific real individual, but rather a hypothetical individual having the qualities
`
`reflected by the factors discussed above.
`
`15.
`
`I understand that the Petition applies a priority date of September 14,
`
`2004 for the challenged claims, Pet. 3, and I apply the same date.
`
`16.
`
`I further understand that the Petition defines the person of ordinary skill
`
`in the art at the time of the invention as having had a master’s degree in computer
`
`engineering, computer science, electrical engineering, or a related field, with at least
`
`two years of experience in the field of speech recognition, or a bachelor’s degree in
`
`the same fields with at least four years of experience in the field of speech
`
`recognition. The Petition adds that further education or experience might substitute
`
`for the above requirements. I do not dispute the Petition’s assumptions at this time,
`
`and my opinions are rendered on the basis of the same definition of the ordinary
`
`artisan set forth in the Petition.
`
`17.
`
`I also note, however, that an ordinarily skilled engineer at the time of
`
`the invention would have been trained in evaluating both the costs and benefits of a
`
`particular design choice. Engineers are trained (both in school and through general
`
`experience in the workforce) to recognize that design choices can have complex
`
`consequences that need to be evaluated before forming a motivation to pursue a
`
`particular design choice, and before forming an expectation of success as to that
`
`design choice. In my opinion, anyone who did not recognize these realities would
`
`
`
`7
`
`
`
`not be a person of ordinary skill in the art. Thus, a person who would have simply
`
`formed design motivations based only on the premise that a particular combination
`
`of known elements would be possible would not be a person of ordinary skill
`
`regardless of their education, experience, or technical knowledge. Likewise, a person
`
`who would have formed design motivations as to a particular combination of known
`
`elements based only on the premise that the combination may provide some benefit,
`
`with no consideration of the relevance of the benefit in the specific context and in
`
relation to the costs or disadvantages of that combination, would also not have been a
`
`person of ordinary skill in the art, regardless of their education, experience, or
`
`technical knowledge. In my opinion, a person of ordinary skill in the art would have
`
`been deliberative and considered, rather than impulsive.
`
`18. Throughout my declaration, even if I discuss my analysis in the present
`
`tense, I am always making my determinations based on what a person of ordinary
`
`skill in the art (“POSA”) would have known at the time of the invention. Based on
`
`my background and qualifications, I have experience and knowledge exceeding the
`
`level of a POSA, and am qualified to offer the testimony set forth in this declaration.
`
B. Burden of proof
`
`19.
`
`I understand that in an inter partes review the petitioner has the burden
`
`of proving a proposition of unpatentability by a preponderance of the evidence.
`
`
`
`8
`
`
`
`C. Claim construction
`
`20.
`
`I understand that in an inter partes review, claims are interpreted based
`
`on the same standard applied by Article III courts, i.e., based on their ordinary and
`
`customary meaning as understood in view of the claim language, the patent’s
`
`description, and the prosecution history viewed from the perspective of the ordinary
`
`artisan. I further understand that where a patent defines claim language, the
`
`definition in the patent controls, regardless of whether those working in the art may
`
`have understood the claim language differently based on ordinary meaning.
`
`D. Obviousness
`
`21.
`
`I understand that a patent may not be valid even though the invention
`
`is not identically disclosed or described in the prior art if the differences between the
`
`subject matter sought to be patented and the prior art are such that the subject matter
`
`as a whole would have been obvious to a person having ordinary skill in the art in
`
`the relevant subject matter at the time the invention was made.
`
`22.
`
`I understand that, to demonstrate obviousness, it is not sufficient for a
`
`petition to merely show that all of the elements of the claims at issue are found in
`
`separate prior art references or even scattered across different embodiments and
`
`teachings of a single reference. The petition must thus go further, to explain how a
`
`person of ordinary skill would combine specific prior art references or teachings,
`
`which combinations of elements in specific references would yield a predictable
`
`
`
`9
`
`
`
`result, and how any specific combination would operate or read on the claims.
`
`Similarly, it is not sufficient to allege that the prior art could be combined, but rather,
`
`the petition must show why and how a person of ordinary skill would have combined
`
`them.
`
`23.
`
`I understand that where an alleged motivation to combine relies on a
`
`particular factual premise, the petitioner bears the burden of providing specific
`
`support for that premise. I understand that obviousness cannot be shown by
`
`conclusory statements, and that the petition must provide articulated reasoning with
`
`some rational underpinning to support its conclusion of obviousness. I also
`
`understand that skill in the art and “common sense” rarely operate to supply missing
`
`knowledge to show obviousness, nor does skill in the art or “common sense” act as
`
`a bridge over gaps in substantive presentation of an obviousness case.
`
`III. Overview of the ’277 Patent
`
`24. U.S. Patent 7,979,277, titled “Speech Recognition Circuit and
`
`Method,” is directed to an improved speech recognition circuit and associated
`
`methods. Ex. 1001, 1:4-5. The ’277 patent teaches and claims a speech recognition
`
`circuit in which a front end, a calculating circuit, and a search stage are operated in
`
`a sequentially pipelined manner, with the front end and search stage being
`
`implemented on a first processor and the calculating circuit implemented on a second
`
`processor. See, e.g., Ex. 1001, Fig. 21, Claim 1.
`
`
`
`10
`
`
`
`25. The ’277 patent teaches that an “audio input for speech recognition”
`
`may be input to the front end in the form of digital audio or analog audio that is
`
`converted to digital audio using an analog to digital converter. Ex. 1001, 11:65-12:1.
`
`“The audio input is divided into time frames, each time frame typically being on the
`
`order of 10 ms.” Ex. 1001, 12:1-3. “For each audio input time frame, the audio signal
`
`is converted into a feature vector. This may be done by splitting the audio signal into
`
`spectral components,” such as, for instance, 13 components plus their first and
`
`second derivatives, creating a total of 39 components. Ex. 1001, 12:4-10. The feature
`
`vector thus “represents a point in an N-dimensional space,” where N is generally in
`
`the range of 20 to 39. Ex. 1001, 12:33-37.
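
To illustrate this conventional arrangement, I provide below a short Python sketch of my own; it is not code from the ’277 patent, and the 16 kHz sampling rate, the function names, and the use of simple difference formulas for the derivative components are my illustrative assumptions.

```python
import numpy as np

def frame_audio(samples: np.ndarray, rate: int = 16000, frame_ms: int = 10) -> np.ndarray:
    """Split a digital audio signal into non-overlapping ~10 ms frames."""
    frame_len = rate * frame_ms // 1000            # 160 samples at 16 kHz
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)

def feature_vector(prev: np.ndarray, base: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    """Stack 13 spectral components with first- and second-derivative
    estimates to form a 39-component feature vector."""
    delta = (nxt - prev) / 2.0                     # first derivative estimate
    delta2 = nxt - 2.0 * base + prev               # second derivative estimate
    return np.concatenate([base, delta, delta2])   # shape (39,)

# Example: 1 s of (random) audio -> 100 frames; three consecutive
# 13-component spectral vectors -> one 39-dimensional feature vector.
rng = np.random.default_rng(0)
frames = frame_audio(rng.normal(size=16000))
s_prev, s_base, s_next = rng.normal(size=(3, 13))
fv = feature_vector(s_prev, s_base, s_next)        # fv.shape == (39,)
```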
`
`26. Each feature vector is then passed to the calculating circuit, or distance
`
`calculation engine, which calculates a distance indicating the similarity between a
`
`feature vector and one or more predetermined acoustic states of an acoustic model.
`
`Ex. 1001, 6:63-65, 25:18-20 (“Each feature vector is transferred to a distance
`
`calculation engine circuit 204, to obtain distances for each state of the acoustic
`
`model.”). “The distance calculator stage of the recognition process computes a
`
`probability or likelihood that a feature vector corresponds to a particular state.” Ex.
`
`1001, 12:38-40. “The likelihood of each state is determined by the distance between
`
`the feature vector and each state.” Ex. 1001, 12:15-16. The distance calculation may
`
`be a Mahalanobis distance using Gaussian distributions. Ex. 1001, 3:62-4:6, 12:51-
`
`
`
`11
`
`
`
`63. “The MHD (Mahalanobis Distance) is a distance between two N-dimensional
`
`points, scaled by the statistical variation in each component.” Ex. 1001, 12:27-29.
`
`The ’277 patent teaches calculating the distance between a feature vector and 8,000
`
`states, “i.e. one distance for each of the 8,000 states,” Ex. 1001, 13:5-7, which it
`
`teaches “gives the best recognition results when used with a language model.” Id. at
`
`12:21-26, 13:15-16 (“Each state is also a 39 dimensional vector, having the same
`
`spectral components as the feature vector.”). “Due to the 10 ms frame length, a
`
`feature vector arrives at the MHD engine,” i.e., the distance calculation engine or
`
`calculating circuit, “every 10 ms.” Ex. 1001, 13:11-12.
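
The following Python sketch, again my own illustration rather than the patent’s implementation, shows a variance-scaled squared distance of this kind computed between a single feature vector and the mean of each of 8,000 hypothetical acoustic states, assuming diagonal-covariance Gaussian distributions; the randomly generated model parameters are placeholders.

```python
import numpy as np

N_DIM, N_STATES = 39, 8000                       # mirrors the patent's example
rng = np.random.default_rng(0)                   # placeholder model values
state_means = rng.normal(size=(N_STATES, N_DIM))
state_vars = rng.uniform(0.5, 2.0, size=(N_STATES, N_DIM))

def distances_to_all_states(fv: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis-style distance from one feature vector to the
    mean of every state, scaled by the per-component variance."""
    diff = fv - state_means                      # shape (8000, 39)
    return np.sum(diff * diff / state_vars, axis=1)

fv = rng.normal(size=N_DIM)                      # one FV arrives every 10 ms
dists = distances_to_all_states(fv)              # one distance per state
```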
`
`27. The distance calculation engine or calculating circuit “may be included
`
`within an accelerator,” Ex. 1001, 3:35-37, which may be a “loosely bound co-
`
`processor for a CPU running speech recognition software,” and which “has the
`
`advantage of reducing computational load on the CPU, and reducing memory
`
`bandwidth load for the CPU.” Id. at 24:6-11; see Figs. 17-23. “Each time a feature
`
`vector is loaded into the accelerator, the accelerator computes the distances for all
`
`states for that feature vector[.]” Ex. 1001, 25:57-60.
`
`28.
`
`“The distances calculated by the distance calculation engine are then
`
`transferred to the search stage 106 of the speech recognition circuit, which uses
`
`models such as one or more word models and/or language models to generate and
`
`output recognised text.” Ex. 1001, 23:61-65.
`
`
`
`12
`
`
`
`29.
`
`“In some embodiments of the invention the search stage 106 provides
`
`the distance calculation engine 104 with a state list 1XX that indicates the subset of
`
`states for which the search stage requires distances to be calculated by the distance
`
`calculation engine 104. This is an optimization that may reduce computation time
`
`and/or power consumption.” Ex. 1001, 24:25-30.
`
`30. However, the ’277 patent teaches that “[i]n preferred embodiments of
`
`the invention, the front end 103, distance calculation engine 104 and search stage
`
`106 operate in a pipelined manner. When operating in a pipelined manner it is
`
`unlikely that the search stage 106 will be able to provide the active state list 1XX
`
`early enough for the distance calculation engine 104 to implement the optimization
`
`of computing only the distances that will be required by the search stage 106. The
`
`distance calculation circuit 104 may calculate the MHD values for every state in the
`
`lexicon, per frame, whether it is subsequently required by the search stage or not.
`
`This allows the accelerator and software system to operate in a concurrent pipelined
`
`manner, which maximizes the system throughput.” Ex. 1001, 24:31-43.
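
A minimal sketch of such a pipeline, under my own simplifying assumptions (toy data, a Euclidean-style distance, and an argmin standing in for the search), is set out below; it is my illustration of the concurrency described above, not code from the patent. Each stage works on a different frame at the same time, and the calculating circuit computes distances for every state on every frame.

```python
import threading
import queue
import numpy as np

fv_q = queue.Queue(maxsize=2)    # front end -> calculating circuit
dist_q = queue.Queue(maxsize=2)  # calculating circuit -> search stage

def front_end(frames):
    """First processor: emit one feature vector per 10 ms frame."""
    for frame in frames:
        fv_q.put(np.asarray(frame, dtype=float))   # stand-in feature extraction
    fv_q.put(None)                                 # end-of-stream marker

def calculating_circuit(means):
    """Second processor: distances for ALL states, every frame, since the
    search stage cannot supply an active state list early enough."""
    while (fv := fv_q.get()) is not None:
        dist_q.put(np.sum((means - fv) ** 2, axis=1))
    dist_q.put(None)

def search_stage(results):
    """Back on the first processor: consume distances while later frames
    are still being processed upstream."""
    while (d := dist_q.get()) is not None:
        results.append(int(np.argmin(d)))          # stand-in for the search

means = np.zeros((8, 3))                           # 8 toy states, 3-dim FVs
frames, results = [[0.1, 0.2, 0.3]] * 4, []
stages = [threading.Thread(target=front_end, args=(frames,)),
          threading.Thread(target=calculating_circuit, args=(means,)),
          threading.Thread(target=search_stage, args=(results,))]
for t in stages: t.start()
for t in stages: t.join()
```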
`
`IV. Jiang
`
`31. U.S. Pat. 6,374,219, titled “System for using silence in speech
`
`recognition,” (“Jiang”), is directed to “computer speech recognition performed by
`
`conducting a prefix tree search of a silence bracketed lexicon.” Ex. 1004, 1:15-18.
`
`“Possible words represented by the input data stream are provided as a prefix tree
`
`
`
`13
`
`
`
`including a plurality of phoneme branches connected at nodes.” Ex. 1004, 4:8-12.
`
`Speech is input into system 60 in the form of audible voice signal provided by the
`
`user to a microphone 62, which converts the audible speech into an analog electric
`
`signal, which is converted by A/D converter 64 into a sequence of digital signals that
`
`are then provided to feature extraction module 66. Ex. 1004, 6:45-52. Feature
`
`extraction module 66 divides the digital signal into frames, each approximately 10
`
`ms in duration. Id. at 6:62-65.
`
`32.
`
`“The frames are then preferably encoded by feature extraction module
`
`66 into a feature vector reflecting the spectral characteristics for a plurality of
`
`frequency bands.” Ex. 1004, 6:65-7:1. “In the case of discrete and semi-continuous
`
`hidden Markov modeling, feature extraction module 66 also preferably encodes the
`
`feature vectors into one or more codewords using vector quantization techniques and
`
`a codebook derived from training data.” Id. at 7:1-5.
`
`33.
`
`“Upon receiving the codewords from feature extraction module 66, and
`
`the boundary detection signal provided by silence detection module 68, tree search
`
`engine 74 accesses information stored in the phonetic speech unit model memory
`
`72.” Ex. 1004, 7:30-34. “Based upon the HMMs stored in memory 72, tree search
`
`engine 74 determines a most likely phoneme represented by the codeword received
`
`from feature extraction module 66, and hence representative of the utterance
`
`received by the user of the system.” Ex. 1004, 7:37-42.
`
`
`
`14
`
`
`
V. Baumgartner
`
`34. U.S. Pat. App. 2002/0049582, titled “Speech label accelerators and
`
`techniques for using same,” (“Baumgartner”), is directed to the use of Speech Label
`
`Accelerators (SLAs) in speech recognition systems. Ex. 1007, at 1 ¶ 2.
`
`VI. Brown
`
`35. U.S. Pat. 5,699,456, titled “Large vocabulary connected speech
`
`recognition system and method of language representation using evolutional
`
`grammar to represent context free grammars,” (“Brown”), “is directed to a grammar-
`
`based connected recognition system that recognizes connected input by instantiating
`
`a grammar in real time.” Ex. 1036, 3:20-22. “With the exception of the manner in
`
`which grammar processor 130 controls and interacts with word probability processor
`
`125, the structure and operation of the system are conventional.” Id. at 3:48-52.
`
`VII. Smyth
`
`36. U.S. Pat. 5,819,222, titled “Task-constrained connected speech
`
`recognition of propagation of tokens only if valid propagation path is present,”
`
`(“Smyth”), is directed to “task-constrained connected word recognition where the
`
`task, for example, might be to recognise one of a set of account numbers or product
`
`codes.” Ex. 1005, 1:15-21.
`
`
`
`15
`
`
`
`VIII. The ’277 Patent’s claimed distance calculations
`
`
`37. All challenged claims of the ’277 patent recite “a calculating circuit for
`
`calculating distances indicating the similarity between a feature vector and a
`
`plurality of predetermined acoustic states of an acoustic model.” See Pet. 77,
`
`limitation 1(c) (emphasis added). In my opinion, an ordinary artisan would
`
`understand the above claim language to require making distance calculations using
`
`the feature vectors themselves.
`
`38.
`
`I also note that each embodiment of the ’277 patent is consistent with
`
`that understanding. For instance, Fig. 20 of the patent, included below, shows feature
`
`vectors (FVs) extracted at the front end, and then passed to the distance calculation
`
`Accelerator, which “calc[ulates] all dist[ances] for” each depicted feature vector. Ex.
`
1001, Fig. 20.

[Fig. 20 of the ’277 patent]

39. Figures 18, 19, 21, 22, and 23 likewise contain the same teaching. Ex. 1001.
`
`40. The ’277 patent’s written description likewise teaches that “[t]he
`
`distance calculator computes the distance in the N-dimensional space from the
`
`Feature Vector to the probability distribution for each state.” Ex. 1001, 12:41-44.
`
`The patent likewise states: “The FV registers 209 hold the feature vector whose
`
`distances are currently bring [sic] computed by the distance calculation engine 204.”
`
`Ex. 1001, 24:62-64 (emphasis added). Moreover, “[e]ach feature vector is
`
`
`
`17
`
`
`
`transferred to distance calculation circuit 204, to obtain distances for each state of
`
`the acoustic model[,]” Ex. 1001 at 25:18-20 (emphasis added), and “[e]ach time a
`
`feature vector is loaded into the accelerator, the accelerator computes distances for
`
`all states for that feature vector, and stores the results alternately in the A or B
`
`Results Memory.” Id. at 25:57-60 (emphasis added), see also id. at 34:48-50. As
`
`these passages illustrate, in each instance the patent teaches the direct use of the
`
`feature vectors themselves for performing distance calculations. Moreover, although
`
`the ’277 patent contemplates the use of certain different types of distance
`
`calculations, including Mahalanobis Distances, it clearly teaches that any distance
`
`calculation must use the feature vectors themselves, not other values. For instance,
`
`the patent states: “The feature vector is used to calculate 8,000 MHD [Mahalanobis
`
`Distances] for each time frame, i.e., one distance for each of the 8,000 states.” Ex.
`
`1001, 13:5-7.
`
A. Jiang does not teach the ’277 Patent’s claimed distance calculations

41. I understand that the Petition and Mr. Schmandt present two theories
`
`with respect to the distance calculation requirement of the challenged claims. Pet. at
`
`15-16. In Ground 1, the Petition alleges that Jiang alone teaches the limitation, Pet.
`
`21-25, and alternatively, in Ground 4, alleges that the limitation would have been
`
`obvious by modifying Jiang in view of Smyth. Pet. 65-69. I address Jiang below and
`
`the modification of Jiang in view of Smyth in the following section.
`
`
`
`18
`
`
`
`42. U.S. Patent No. 6,374,219 to Jiang (“Jiang”), is directed to a “System
`
`for Using Silence in Speech Recognition,” Ex. 1004, Title, and particularly to using
`
`silence detection to indicate a word boundary to a tree search engine as part of a
`
`speech recognition system. Ex. 1004, 7:17-28.
`
`43.
`
`Jiang teaches that “speech is input into system 60 in the form of an
`
`audible voice signal provided by the user to microphone 62,” which “converts the
`
`audible speech signal into an analog electronic signal which is provided to A/D
`
`converter 64. A/D converter 64 converts the analog speech signal into a sequence of
`
`digital signals which is provided to feature extraction module 66.” Ex. 1004, 6:46-
`
`53. “Feature extraction module 66 divides the digital signal received from A/D
`
`converter 64 into frames which include a plurality of digital samples. Each frame is
`
`approximately 10 milliseconds in duration. The frames are then preferably encoded
`
`by feature extraction module 66 into a feature vector reflecting the spectral
`
`characteristics for a plurality of frequency bands. In the case of discrete and semi-
`
`continuous hidden Markov modeling, feature extraction module 66 also preferably
`
`encodes the feature vectors into one or more codewords using vector quantization
`
`techniques and a codebook derived from training data. Thus, feature extraction
`
`module 66 provides, as its output the feature vectors (or codewords) for each spoken
`
`utterance.” Ex. 1004, 6:62-7:7. “Upon receiving the codewords from feature
`
`extraction module 66, and the boundary detection signal provided by silence
`
`
`
`19
`
`
`
`detection module 68, tree search engine 74 accesses information stored in the
`
`phonetic speech unit models, such as hidden Markov models, which represent
`
`speech units to be detected by system 60.” Ex. 1004, 7:29-35. “Based upon the
`
`HMMs stored in memory 72, tree search engine 74 determines a most likely
`
`phoneme represented by the codeword received from feature extraction module 66,
`
`and hence representative of the utterance received by the user of the system.” Ex.
`
`1004, 7:37-42.
`
`44.
`
`Jiang further teaches that “[a]s the tree search engine traverses tree 77,
`
`it preferably computes a score, for each phoneme branch considered in tree 77,
`
`wherein the score represents the likelihood that the particular phoneme encoded by
`
`the codeword corresponds to the phoneme for the branch under consideration.” Ex.
`
`1004, 8:28-33. “As tree search engine 74 traverses tree 77, it preferably assigns a
`
`score to each node in tree 77 which is based on the likelihood that the present
`
`codeword (output probability distributions) under analysis is represented by the
`
`phoneme corresponding to the branch in tree 77 then being considered, and based
`
`on the score assigned to nodes further up the tree which are connected by phoneme
`
`branches to the present node. This is all done in a known manner.” Ex. 1004, 8:43-
`
`51.
`
`45.
`
`In view of Jiang’s teachings above, an ordinary artisan would have
`
`understood that Jiang teaches using vector quantization to represent all possible
`
`
`
`20
`
`
`
`feature vectors using a small number of representative codevectors. Each feature
`
`vector is associated with the most similar codevector, which is in turn represented
`
`using an index number or “codeword.” The codewords are then used to generate a
`
`score that represents the likelihood that the particular phoneme encoded by the
`
`codeword corresponds to the phoneme for the branch under consideration. Jiang then
`
`determines a most likely phoneme represented by the codeword received from
`
`feature extraction module 66. I understand the Petition provides the same
`
`understanding of Jiang. Pet. 20-22.
`
`46. An ordinary artisan would have known that determining likelihood
`
`scores based on codewords and vector quantization, as taught in Jiang, is an entirely
`
`different technique in the art of speech recognition from calculating distances based
`
`on the feature vectors themselves, as recited in the ’277 patent’s challenged claims.
`
`47. Vector quantization is a lossy data compression technique that is “used
`
`to code a spectral vector into one of a fixed number of discrete symbols in order to
`
`reduce the computation required in a practical system.” Ex. 1015, Part 1, at 28.
`
`“[T]he basic idea of VQ is to reduce the information rate of the speech signal to a
`
`low rate through the use of a codebook with a relatively small number of code
`
`words.” Ex. 1015, Part 3 at 162. Stated otherwise, vector quantization converts the
`
`actual feature vectors themselves into “a much smaller set of vector quantized (VQ)
`
`feature signals.” Ex. 2005 at 16:1-7; Ex. 1015, Part 3, at 155. Using vector
`
`
`
`21
`
`
`
`quantization comes “at the cost of increased error in signal representation but with
`
`the benefit of significantly reduced computation in the recognition process.” Ex.
`
`1015, Part 1, at 34. Notably, when a vector quantization and codeword approach is
`
`used, the need to calculate actual distances between the feature vectors and acoustic
`
`states of an acoustic model is eliminated and “this spectral similarity computation is
`
`often reduced to a table lookup of similarities between pairs of codebook vectors.”
`
`Ex. 1015, Part 3, at 154.
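
The following Python sketch, which is my own illustration and not code from Jiang or the cited treatises, shows the general form of this codeword approach: the only distance computed is from the feature vector to the codebook vectors, after which per-state scoring is a table lookup; the codebook size, state count, and probability table are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)                    # placeholder values
codebook = rng.normal(size=(256, 39))             # 256 representative codevectors
codeword_probs = rng.uniform(size=(8000, 256))    # precomputed (state, codeword) table

def to_codeword(fv: np.ndarray) -> int:
    """Lossy step: replace the feature vector with the index of its
    nearest codevector. The only distance computed here is to the
    codebook vectors, not to any acoustic state."""
    return int(np.argmin(np.sum((codebook - fv) ** 2, axis=1)))

def vq_state_scores(fv: np.ndarray) -> np.ndarray:
    """Per-state scoring reduces to a table lookup once the feature
    vector has been discarded in favor of its codeword."""
    return codeword_probs[:, to_codeword(fv)]     # shape (8000,)

scores = vq_state_scores(rng.normal(size=39))
```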
`
`48. By contrast, the ’277 patent teaches that “[e]ach feature vector is
`
`transferred to a distance calculation circuit 204, to obtain distances for each state of
`
`the acoustic model[,]” Ex. 1001 at 25:18-20 (emphasis added), and “[e]ach time a
`
`feature vector is loaded into the accelerator, the accelerator computes distances for
`
`all states for that feature vector, and stores the results alternately in the A or B
`
`Results Memory.” Id. at 25:57-60 (emphasis added), see also id. at 34:48-50.
`
`49.
`
`Jiang’s codeword-based teachings are thus significantly different than
`
`the ’277 patent’s claimed distance calculations, and an ordinary artisan would have
`
`readily recognized that fact. Simply put, the ordinary artisan would have understood
`
`that Jiang’s codewords are not the multi-dimensional real “feature vector” that the
`
`challenged claims require for distance calculations, but rather an index number that
`
`specifies the codebook element selected.
`
`
`
`22
`
`
`
`50.
`
`I understand that the Petition and Mr. Schmandt argue that the
`
`challenged claims are met so long as a distance calculation “indicates a similarity
`
`between a feature vector and” one or more acoustic states of an acoustic model, and
`
`that Jiang teaches the claimed distance calculation because its codeword-based
`
likelihood scores allegedly indicate such a similarity. Pet. 22-24; Ex. 1003, 149-50.

51. I disagree with that assessment.
`
`52. First, while Limitation 1(c) discusses the use of distance calculations to
`
`determine the “similarity” between a feature vector and the acoustic states, the ’277
`
`patent specifically teaches using the feature vectors themselves to make the distance
`
`calculation. For instance, the patent expressly teaches that “the Mahalanobis distance
`
`between the feature vector and each state is calculated, to determine similarity of
`
`the feature vector to each state.” Ex. 1001, 12:29-32 (emphasis added). Indeed, in
`
`every disclosed embodiment of the distance calculation step, the ’277 patent teaches
`
`calculating distances by directly using the feature vectors, not a different value such
`
`as a codeword derived from vector quantization. Accordingly, an ordinary artisan
`
`would have known that the phrase “indicating a similarity” as recited in the claims
`
`permits a choice as to the method of distance calculation that can be used (so long
`
`as that method uses the feature vectors themselves and “indicates a similarity
`
`between a feature vector and” the acoustic states), but does not permit so-called
`
`distance calculations that do not use the feature vectors themselves.
`
`
`
`23
`
`
`
`53. Second, an ordinary artisan would have known that vector quantization
`
`and codewords are not used to “calculate distances” as that term is used in the field
`
of art. Rather, it was well known at the time of the ’277 patent that VQ and
`
`codewords are typically used to