UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

APPLE INC.,
Petitioner,

v.

Zentian Limited,
Patent Owner.
____________________

Case IPR2023-00034
Patent No. 7,979,277
____________________

DECLARATION OF DAVID ANDERSON, Ph.D. IN SUPPORT OF
PATENT OWNER’S PRELIMINARY RESPONSE

I, David Anderson, Ph.D., do hereby declare as follows:

I. Introduction

A. Engagement

1. I have been retained by Patent Owner Zentian Limited (“Zentian” or “Patent Owner”) to provide my opinions with respect to Zentian’s Preliminary Response to the Petition in Inter Partes Review proceeding IPR2023-00034, with respect to U.S. Patent No. 7,979,277. I am being compensated for my time spent on this matter. I have no interest in the outcome of this proceeding, and the payment of my fees is in no way contingent on my providing any particular opinions.

2. As part of this engagement, I have also been asked to provide my technical review, analysis, insights, and opinions regarding the materials cited and relied upon by the Petition, including the prior art references and the supporting Declaration of Mr. Schmandt.

3. The statements made herein are based on my own knowledge and opinions.

B. Background and qualifications

4. My full qualifications, including my professional experience and education, can be found in my Curriculum Vitae, which includes a complete list of my publications, and is attached as Ex. A to this declaration.

5. I am a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology (“Georgia Tech”) in Atlanta, Georgia. I have been a professor at Georgia Tech since 1999. In 2009, I served as a visiting professor in the Department of Computer Science at Korea University in Seoul, South Korea.

6. I received my Ph.D. in Electrical and Computer Engineering from Georgia Tech in 1999. I received my B.S. and M.S. in Electrical Engineering from Brigham Young University in 1993 and 1994, respectively.

7. In my employment prior to Georgia Tech, as well as in my subsequent studies and research, I have worked extensively in areas related to the research, design, and implementation of speech and audio processing systems. I have also taught graduate and undergraduate level courses at Georgia Tech on the implementation of signal processing and embedded systems. For example, I have taught courses on statistical machine learning, machine learning for speech, pattern recognition, multimedia processing and systems, software design, computer architecture, real-time signal processing systems, and applications of signal processing (covering topics in audio processing and speech recognition). I have also designed and taught a course on signal processing in the context of human perception. These courses and my research have covered many topics relevant to the subject matter of the ’277 patent and the prior art cited therein.

8. I have served as principal investigator or co-principal investigator in numerous multi-disciplinary research projects including “Blind Source Separation for Audio,” “Audio Classification,” “Auditory Scene Analysis,” “Hearing Aid Audio Processing,” “Speaker Driver Sound Enhancement,” “I-Vector Based Voice Quality,” “Analysis of Voice Exercise Using Signal Processing,” and “Smart Homes for Effective and Safe Remote Work During a Pandemic and Beyond.”

9. I also have extensive experience with the practical implementation of signal processing algorithms, information theory, signal detection, and related topics through my research and consulting. I have published over 200 book chapters and papers in reviewed journals and conferences. Topics include “Speech recognition using filter bank features,” “Speaker adaptation using speaker similarity score on DNN features,” “Segmentation based speech enhancement using auxiliary sensors,” “A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems,” “Distributed acquisition and processing systems for speech and audio,” “A missing data-based feature fusion strategy for noise-robust automatic speech recognition using noisy sensors,” “Learning distances to improve phoneme classification,” “Identification of voice quality variation using i-vectors,” “Varying time-constants and gain adaptation in feature extraction for speech processing,” “Low bit-rate coding of speech in harsh conditions using non-acoustic auxiliary devices,” “Speech analysis and coding using a multi-resolution sinusoidal transform,” “Biologically inspired auditory sensing system interfaces on a chip,” “Cascade classifiers for audio classification,” and “Single acoustic channel speech enhancement based on glottal correlation using non-acoustic sensors.” I have also contributed book chapters for treatises such as “Independent Component Analysis for Audio and Biosignal Applications,” and written a book on Fixed-Point Signal Processing, which is related to the practical implementation of systems for processing sound and other signals.

10. I am a named inventor on eight patents, including “Speech activity detector for use in noise reduction system, and methods therefor” (U.S. Patent No. 6,351,731), and “Analog audio signal enhancement system using a noise suppression algorithm” (U.S. Patent No. 7,590,250).

11. I am a Senior Member of the Institute of Electrical and Electronics Engineers (“IEEE”) and have been a Member since 1991. I am also a Member of the IEEE Signal Processing Society. From 1994 to 2016, I was also a member of the Acoustical Society of America. In 2003, I served as the Co-Chair for the NSF Symposium on Next Generation Automatic Speech Recognition. In 2004, I received the Presidential Early Career Award for Scientists and Engineers, presented by then-President George W. Bush, for my work on ultra-low-power signal processing system design.

`

C. Materials considered

12. In the course of preparing my opinions, I have reviewed and am familiar with the ’277 patent, including its written description, figures, and claims. I have also reviewed and am familiar with the Petition in this proceeding, the supporting Declaration of Mr. Schmandt, and the relied upon prior art, including Jiang, Baumgartner, Brown, and Smyth. I have also reviewed the materials cited in this declaration. My opinions are based on my review of these materials as well as my more than 30 years of experience, research, and education in the field of art.

II. Relevant legal standards

13. I am not an attorney. I offer no opinions on the law. But counsel has informed me of the following legal standards relevant to my analysis here. I have applied these standards in arriving at my conclusions.

A. Person of ordinary skill in the art

14. I understand that an analysis of the claims of a patent in view of prior art has to be provided from the perspective of a person having ordinary skill in the art at the time of invention of the ’277 patent. I understand that I should consider factors such as the educational level and years of experience of those working in the pertinent art; the types of problems encountered in the art; the teachings of the prior art; patents and publications of other persons or companies; and the sophistication of the technology. I understand that the person of ordinary skill in the art is not a specific real individual, but rather a hypothetical individual having the qualities reflected by the factors discussed above.
15. I understand that the Petition applies a priority date of September 14, 2004 for the challenged claims, Pet. 3, and I apply the same date.

16. I further understand that the Petition defines the person of ordinary skill in the art at the time of the invention as having had a master’s degree in computer engineering, computer science, electrical engineering, or a related field, with at least two years of experience in the field of speech recognition, or a bachelor’s degree in the same fields with at least four years of experience in the field of speech recognition. The Petition adds that further education or experience might substitute for the above requirements. I do not dispute the Petition’s assumptions at this time, and my opinions are rendered on the basis of the same definition of the ordinary artisan set forth in the Petition.

17. I also note, however, that an ordinarily skilled engineer at the time of the invention would have been trained in evaluating both the costs and benefits of a particular design choice. Engineers are trained (both in school and through general experience in the workforce) to recognize that design choices can have complex consequences that need to be evaluated before forming a motivation to pursue a particular design choice, and before forming an expectation of success as to that design choice. In my opinion, anyone who did not recognize these realities would not be a person of ordinary skill in the art. Thus, a person who would have simply formed design motivations based only on the premise that a particular combination of known elements would be possible would not be a person of ordinary skill, regardless of their education, experience, or technical knowledge. Likewise, a person who would have formed design motivations as to a particular combination of known elements based only on the premise that the combination may provide some benefit, with no consideration of the relevance of the benefit in the specific context and in relation to the costs or disadvantages of that combination, would also not have been a person of ordinary skill in the art, regardless of their education, experience, or technical knowledge. In my opinion, a person of ordinary skill in the art would have been deliberative and considered, rather than impulsive.

18. Throughout my declaration, even if I discuss my analysis in the present tense, I am always making my determinations based on what a person of ordinary skill in the art (“POSA”) would have known at the time of the invention. Based on my background and qualifications, I have experience and knowledge exceeding the level of a POSA, and am qualified to offer the testimony set forth in this declaration.

B. Burden of proof

19. I understand that in an inter partes review the petitioner has the burden of proving a proposition of unpatentability by a preponderance of the evidence.

C. Claim construction

20. I understand that in an inter partes review, claims are interpreted based on the same standard applied by Article III courts, i.e., based on their ordinary and customary meaning as understood in view of the claim language, the patent’s description, and the prosecution history, viewed from the perspective of the ordinary artisan. I further understand that where a patent defines claim language, the definition in the patent controls, regardless of whether those working in the art may have understood the claim language differently based on ordinary meaning.
D. Obviousness

21. I understand that a patent claim may be invalid even though the invention is not identically disclosed or described in the prior art, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious to a person having ordinary skill in the relevant art at the time the invention was made.

22. I understand that, to demonstrate obviousness, it is not sufficient for a petition to merely show that all of the elements of the claims at issue are found in separate prior art references or even scattered across different embodiments and teachings of a single reference. The petition must thus go further, to explain how a person of ordinary skill would combine specific prior art references or teachings, which combinations of elements in specific references would yield a predictable result, and how any specific combination would operate or read on the claims. Similarly, it is not sufficient to allege that the prior art could be combined; rather, the petition must show why and how a person of ordinary skill would have combined them.

23. I understand that where an alleged motivation to combine relies on a particular factual premise, the petitioner bears the burden of providing specific support for that premise. I understand that obviousness cannot be shown by conclusory statements, and that the petition must provide articulated reasoning with some rational underpinning to support its conclusion of obviousness. I also understand that skill in the art and “common sense” rarely operate to supply missing knowledge to show obviousness, nor does skill in the art or “common sense” act as a bridge over gaps in the substantive presentation of an obviousness case.
III. Overview of the ’277 Patent

24. U.S. Patent 7,979,277, titled “Speech Recognition Circuit and Method,” is directed to an improved speech recognition circuit and associated methods. Ex. 1001, 1:4-5. The ’277 patent teaches and claims a speech recognition circuit in which a front end, a calculating circuit, and a search stage are operated in a sequentially pipelined manner, with the front end and search stage being implemented on a first processor and the calculating circuit implemented on a second processor. See, e.g., Ex. 1001, Fig. 21, Claim 1.

25. The ’277 patent teaches that an “audio input for speech recognition” may be input to the front end in the form of digital audio or analog audio that is converted to digital audio using an analog to digital converter. Ex. 1001, 11:65-12:1. “The audio input is divided into time frames, each time frame typically being on the order of 10 ms.” Ex. 1001, 12:1-3. “For each audio input time frame, the audio signal is converted into a feature vector. This may be done by splitting the audio signal into spectral components,” such as, for instance, 13 components plus their first and second derivatives, creating a total of 39 components. Ex. 1001, 12:4-10. The feature vector thus “represents a point in an N-dimensional space,” where N is generally in the range of 20 to 39. Ex. 1001, 12:33-37.
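For illustration only, the front-end processing described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions of my own: the 13 “spectral components” are stand-ins taken from an FFT magnitude, and the first and second derivatives are computed by simple frame differencing. The ’277 patent does not prescribe any particular implementation, and none of the function or variable names below come from the patent.

```python
import numpy as np

def feature_vectors(frames, n_components=13):
    """Illustrative front end: one 39-dimensional feature vector per
    10 ms frame, formed from 13 spectral components plus their first
    and second derivatives (cf. Ex. 1001, 12:4-10). The spectral
    analysis here is a placeholder, not the patent's method."""
    # Placeholder "spectral components": first 13 FFT magnitude bins.
    base = np.abs(np.fft.rfft(frames, axis=1))[:, :n_components]
    # First derivative: frame-to-frame difference of each component.
    d1 = np.diff(base, axis=0, prepend=base[:1])
    # Second derivative: difference of the first derivatives.
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.hstack([base, d1, d2])  # shape: (num_frames, 39)

# Example: 100 frames of 10 ms audio sampled at 16 kHz (160 samples each).
frames = np.random.randn(100, 160)
assert feature_vectors(frames).shape == (100, 39)
```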
26. Each feature vector is then passed to the calculating circuit, or distance calculation engine, which calculates a distance indicating the similarity between a feature vector and one or more predetermined acoustic states of an acoustic model. Ex. 1001, 6:63-65, 25:18-20 (“Each feature vector is transferred to a distance calculation engine circuit 204, to obtain distances for each state of the acoustic model.”). “The distance calculator stage of the recognition process computes a probability or likelihood that a feature vector corresponds to a particular state.” Ex. 1001, 12:38-40. “The likelihood of each state is determined by the distance between the feature vector and each state.” Ex. 1001, 12:15-16. The distance calculation may be a Mahalanobis distance using Gaussian distributions. Ex. 1001, 3:62-4:6, 12:51-63. “The MHD (Mahalanobis Distance) is a distance between two N-dimensional points, scaled by the statistical variation in each component.” Ex. 1001, 12:27-29. The ’277 patent teaches calculating the distance between a feature vector and 8,000 states, “i.e. one distance for each of the 8,000 states,” Ex. 1001, 13:5-7, which it teaches “gives the best recognition results when used with a language model.” Id. at 12:21-26, 13:15-16 (“Each state is also a 39 dimensional vector, having the same spectral components as the feature vector.”). “Due to the 10 ms frame length, a feature vector arrives at the MHD engine,” i.e., the distance calculation engine or calculating circuit, “every 10 ms.” Ex. 1001, 13:11-12.
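To make the computation described above concrete, the following is a minimal Python sketch of a squared Mahalanobis distance from one feature vector to every state of a Gaussian acoustic model. The diagonal-covariance simplification and all names are my own illustrative assumptions; the patent describes the MHD only as a distance “scaled by the statistical variation in each component.”

```python
import numpy as np

def mahalanobis_distances(fv, means, variances):
    """Squared Mahalanobis distance from one feature vector to every
    acoustic state: the difference in each of the 39 dimensions is
    scaled by that component's variance (cf. Ex. 1001, 12:27-29).
    Shapes: fv is (39,); means and variances are (num_states, 39).
    Diagonal covariances are an assumption of this sketch."""
    diff = fv - means                               # broadcast over all states
    return np.sum(diff * diff / variances, axis=1)  # one distance per state

# Example at the scale the patent discusses: 8,000 states, one new
# feature vector arriving every 10 ms frame (Ex. 1001, 13:5-12).
rng = np.random.default_rng(0)
fv = rng.standard_normal(39)
means = rng.standard_normal((8000, 39))
variances = np.full((8000, 39), 1.0)
distances = mahalanobis_distances(fv, means, variances)  # 8,000 values
```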
27. The distance calculation engine or calculating circuit “may be included within an accelerator,” Ex. 1001, 3:35-37, which may be a “loosely bound co-processor for a CPU running speech recognition software,” and which “has the advantage of reducing computational load on the CPU, and reducing memory bandwidth load for the CPU.” Id. at 24:6-11; see Figs. 17-23. “Each time a feature vector is loaded into the accelerator, the accelerator computes the distances for all states for that feature vector[.]” Ex. 1001, 25:57-60.

28. “The distances calculated by the distance calculation engine are then transferred to the search stage 106 of the speech recognition circuit, which uses models such as one or more word models and/or language models to generate and output recognised text.” Ex. 1001, 23:61-65.

29. “In some embodiments of the invention the search stage 106 provides the distance calculation engine 104 with a state list 1XX that indicates the subset of states for which the search stage requires distances to be calculated by the distance calculation engine 104. This is an optimization that may reduce computation time and/or power consumption.” Ex. 1001, 24:25-30.
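Purely by way of illustration, the accelerator interaction described in ¶¶ 27-28 might be modeled as follows: the CPU loads one feature vector per frame, and the co-processor computes distances for all states and writes them to alternating result buffers (the patent describes alternating “A or B Results Memory,” Ex. 1001, 25:57-60, quoted in ¶ 40 below). Every class and method name in this sketch is hypothetical, not taken from the patent.

```python
import numpy as np

class DistanceAccelerator:
    """Hypothetical model of a loosely bound co-processor (cf. Ex. 1001,
    24:6-11): each loaded feature vector yields distances for all states,
    written alternately to buffer A or B so the CPU can consume one
    buffer while the accelerator fills the other."""

    def __init__(self, means, variances):
        self.means, self.variances = means, variances
        self.results = [None, None]   # the two results memories, A and B
        self.bank = 0                 # which buffer the next frame fills

    def load_feature_vector(self, fv):
        diff = fv - self.means
        self.results[self.bank] = np.sum(diff * diff / self.variances, axis=1)
        ready = self.bank             # this buffer now holds a full frame
        self.bank ^= 1                # next frame fills the other buffer
        return ready                  # CPU reads self.results[ready]
```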
`30. However, the ’277 patent teaches that “[i]n preferred embodiments of
`
`the invention, the front end 103, distance calculation engine 104 and search stage
`
`106 operate in a pipelined manner. When operating in a pipelined manner it is
`
`unlikely that the search stage 106 will be able to provide the active state list 1XX
`
`early enough for the distance calculation engine 104 to implement the optimization
`
`of computing only the distances that will be required by the search stage 106. The
`
`distance calculation circuit 104 may calculate the MHD values for every state in the
`
`lexicon, per frame, whether it is subsequently required by the search stage or not.
`
`This allows the accelerator and software system to operate in a concurrent pipelined
`
`manner, which maximizes the system throughput.” Ex. 1001, 24:31-43.
`
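As a sketch only, the pipelining described in ¶ 30 means that while the front end processes frame n, the distance engine processes frame n-1 and the search stage processes frame n-2. The stage functions below are placeholders, and the concurrency is simulated sequentially; nothing here comes from the patent itself.

```python
def run_pipelined(frames, front_end, distance_calc, search):
    """Toy three-stage pipeline schedule (cf. Ex. 1001, 24:31-43): on
    each tick, every stage works on the output its predecessor produced
    on the previous tick, so all three stages run concurrently."""
    fv = dists = None
    for frame in list(frames) + [None, None]:   # two extra ticks to drain
        if dists is not None:
            search(dists)                                      # frame n-2
        dists = distance_calc(fv) if fv is not None else None  # frame n-1
        fv = front_end(frame) if frame is not None else None   # frame n

# Example with stand-in stages: recognized output appears two ticks
# after each frame enters the front end.
out = []
run_pipelined([1, 2, 3],
              front_end=lambda x: x,
              distance_calc=lambda v: v * 10,
              search=out.append)
assert out == [10, 20, 30]
```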
IV. Jiang

31. U.S. Patent No. 6,374,219, titled “System for using silence in speech recognition” (“Jiang”), is directed to “computer speech recognition performed by conducting a prefix tree search of a silence bracketed lexicon.” Ex. 1004, 1:15-18. “Possible words represented by the input data stream are provided as a prefix tree including a plurality of phoneme branches connected at nodes.” Ex. 1004, 4:8-12. Speech is input into system 60 in the form of an audible voice signal provided by the user to a microphone 62, which converts the audible speech into an analog electric signal, which is converted by A/D converter 64 into a sequence of digital signals that are then provided to feature extraction module 66. Ex. 1004, 6:45-52. Feature extraction module 66 divides the digital signal into frames, each approximately 10 ms in duration. Id. at 6:62-65.

32. “The frames are then preferably encoded by feature extraction module 66 into a feature vector reflecting the spectral characteristics for a plurality of frequency bands.” Ex. 1004, 6:65-7:1. “In the case of discrete and semi-continuous hidden Markov modeling, feature extraction module 66 also preferably encodes the feature vectors into one or more codewords using vector quantization techniques and a codebook derived from training data.” Id. at 7:1-5.

33. “Upon receiving the codewords from feature extraction module 66, and the boundary detection signal provided by silence detection module 68, tree search engine 74 accesses information stored in the phonetic speech unit model memory 72.” Ex. 1004, 7:30-34. “Based upon the HMMs stored in memory 72, tree search engine 74 determines a most likely phoneme represented by the codeword received from feature extraction module 66, and hence representative of the utterance received by the user of the system.” Ex. 1004, 7:37-42.

V. Baumgartner

34. U.S. Patent Application Publication No. 2002/0049582, titled “Speech label accelerators and techniques for using same” (“Baumgartner”), is directed to the use of Speech Label Accelerators (SLAs) in speech recognition systems. Ex. 1007, at 1 ¶ 2.

VI. Brown

35. U.S. Patent No. 5,699,456, titled “Large vocabulary connected speech recognition system and method of language representation using evolutional grammar to represent context free grammars” (“Brown”), “is directed to a grammar-based connected recognition system that recognizes connected input by instantiating a grammar in real time.” Ex. 1036, 3:20-22. “With the exception of the manner in which grammar processor 130 controls and interacts with word probability processor 125, the structure and operation of the system are conventional.” Id. at 3:48-52.

VII. Smyth

36. U.S. Patent No. 5,819,222, titled “Task-constrained connected speech recognition of propagation of tokens only if valid propagation path is present” (“Smyth”), is directed to “task-constrained connected word recognition where the task, for example, might be to recognise one of a set of account numbers or product codes.” Ex. 1005, 1:15-21.

VIII. The ’277 Patent’s claimed distance calculations

37. All challenged claims of the ’277 patent recite “a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model.” See Pet. 77, limitation 1(c) (emphasis added). In my opinion, an ordinary artisan would understand the above claim language to require making distance calculations using the feature vectors themselves.

38. I also note that each embodiment of the ’277 patent is consistent with that understanding. For instance, Fig. 20 of the patent, included below, shows feature vectors (FVs) extracted at the front end, and then passed to the distance calculation Accelerator, which “calc[ulates] all dist[ances] for” each depicted feature vector. Ex. 1001, Fig. 20.

[Fig. 20 of Ex. 1001]

39. Figures 18, 19, 21, 22, and 23 likewise contain the same teaching. Ex. 1001.

40. The ’277 patent’s written description likewise teaches that “[t]he distance calculator computes the distance in the N-dimensional space from the Feature Vector to the probability distribution for each state.” Ex. 1001, 12:41-44. The patent likewise states: “The FV registers 209 hold the feature vector whose distances are currently bring [sic] computed by the distance calculation engine 204.” Ex. 1001, 24:62-64 (emphasis added). Moreover, “[e]ach feature vector is transferred to distance calculation circuit 204, to obtain distances for each state of the acoustic model[,]” Ex. 1001 at 25:18-20 (emphasis added), and “[e]ach time a feature vector is loaded into the accelerator, the accelerator computes distances for all states for that feature vector, and stores the results alternately in the A or B Results Memory.” Id. at 25:57-60 (emphasis added); see also id. at 34:48-50. As these passages illustrate, in each instance the patent teaches the direct use of the feature vectors themselves for performing distance calculations. Moreover, although the ’277 patent contemplates the use of certain different types of distance calculations, including Mahalanobis Distances, it clearly teaches that any distance calculation must use the feature vectors themselves, not other values. For instance, the patent states: “The feature vector is used to calculate 8,000 MHD [Mahalanobis Distances] for each time frame, i.e., one distance for each of the 8,000 states.” Ex. 1001, 13:5-7.
A. Jiang does not teach the ’277 Patent’s claimed distance calculations

41. I understand that the Petition and Mr. Schmandt present two theories with respect to the distance calculation requirement of the challenged claims. Pet. at 15-16. In Ground 1, the Petition alleges that Jiang alone teaches the limitation, Pet. 21-25, and alternatively, in Ground 4, alleges that the limitation would have been obvious by modifying Jiang in view of Smyth. Pet. 65-69. I address Jiang below and the modification of Jiang in view of Smyth in the following section.

42. U.S. Patent No. 6,374,219 to Jiang (“Jiang”) is directed to a “System for Using Silence in Speech Recognition,” Ex. 1004, Title, and particularly to using silence detection to indicate a word boundary to a tree search engine as part of a speech recognition system. Ex. 1004, 7:17-28.

43. Jiang teaches that “speech is input into system 60 in the form of an audible voice signal provided by the user to microphone 62,” which “converts the audible speech signal into an analog electronic signal which is provided to A/D converter 64. A/D converter 64 converts the analog speech signal into a sequence of digital signals which is provided to feature extraction module 66.” Ex. 1004, 6:46-53. “Feature extraction module 66 divides the digital signal received from A/D converter 64 into frames which include a plurality of digital samples. Each frame is approximately 10 milliseconds in duration. The frames are then preferably encoded by feature extraction module 66 into a feature vector reflecting the spectral characteristics for a plurality of frequency bands. In the case of discrete and semi-continuous hidden Markov modeling, feature extraction module 66 also preferably encodes the feature vectors into one or more codewords using vector quantization techniques and a codebook derived from training data. Thus, feature extraction module 66 provides, as its output the feature vectors (or codewords) for each spoken utterance.” Ex. 1004, 6:62-7:7. “Upon receiving the codewords from feature extraction module 66, and the boundary detection signal provided by silence detection module 68, tree search engine 74 accesses information stored in the phonetic speech unit models, such as hidden Markov models, which represent speech units to be detected by system 60.” Ex. 1004, 7:29-35. “Based upon the HMMs stored in memory 72, tree search engine 74 determines a most likely phoneme represented by the codeword received from feature extraction module 66, and hence representative of the utterance received by the user of the system.” Ex. 1004, 7:37-42.
44. Jiang further teaches that “[a]s the tree search engine traverses tree 77, it preferably computes a score, for each phoneme branch considered in tree 77, wherein the score represents the likelihood that the particular phoneme encoded by the codeword corresponds to the phoneme for the branch under consideration.” Ex. 1004, 8:28-33. “As tree search engine 74 traverses tree 77, it preferably assigns a score to each node in tree 77 which is based on the likelihood that the present codeword (output probability distributions) under analysis is represented by the phoneme corresponding to the branch in tree 77 then being considered, and based on the score assigned to nodes further up the tree which are connected by phoneme branches to the present node. This is all done in a known manner.” Ex. 1004, 8:43-51.
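The scoring recurrence Jiang describes in the passage above can be sketched schematically as follows. I emphasize that the tree structure, the probability function, and all names below are placeholders of my own; the point is only that each node’s score combines the codeword-to-phoneme likelihood on its branch with the scores of its ancestors, per Ex. 1004, 8:43-51.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    phoneme: str
    children: list = field(default_factory=list)

def score_tree(node, codeword, branch_logprob, parent_score=0.0, scores=None):
    """Schematic of Jiang's node scoring (Ex. 1004, 8:28-51): each node
    gets the likelihood that the current codeword corresponds to the
    phoneme on its branch, plus the score of its ancestors. Scores are
    assumed to be log-probabilities so that they add along a path."""
    if scores is None:
        scores = {}
    score = parent_score + branch_logprob(node.phoneme, codeword)
    scores[id(node)] = score
    for child in node.children:
        score_tree(child, codeword, branch_logprob, score, scores)
    return scores
```

Notably, the score at each branch in this sketch is a function of the integer codeword, not of the feature vector itself, which is the distinction I draw in ¶¶ 45-49 below.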
45. In view of Jiang’s teachings above, an ordinary artisan would have understood that Jiang teaches using vector quantization to represent all possible feature vectors using a small number of representative codevectors. Each feature vector is associated with the most similar codevector, which is in turn represented using an index number or “codeword.” The codewords are then used to generate a score that represents the likelihood that the particular phoneme encoded by the codeword corresponds to the phoneme for the branch under consideration. Jiang then determines a most likely phoneme represented by the codeword received from feature extraction module 66. I understand the Petition provides the same understanding of Jiang. Pet. 20-22.

46. An ordinary artisan would have known that determining likelihood scores based on codewords and vector quantization, as taught in Jiang, is an entirely different technique in the art of speech recognition from calculating distances based on the feature vectors themselves, as recited in the ’277 patent’s challenged claims.

47. Vector quantization is a lossy data compression technique that is “used to code a spectral vector into one of a fixed number of discrete symbols in order to reduce the computation required in a practical system.” Ex. 1015, Part 1, at 28. “[T]he basic idea of VQ is to reduce the information rate of the speech signal to a low rate through the use of a codebook with a relatively small number of code words.” Ex. 1015, Part 3, at 162. Stated otherwise, vector quantization converts the actual feature vectors themselves into “a much smaller set of vector quantized (VQ) feature signals.” Ex. 2005 at 16:1-7; Ex. 1015, Part 3, at 155. Using vector quantization comes “at the cost of increased error in signal representation but with the benefit of significantly reduced computation in the recognition process.” Ex. 1015, Part 1, at 34. Notably, when a vector quantization and codeword approach is used, the need to calculate actual distances between the feature vectors and acoustic states of an acoustic model is eliminated and “this spectral similarity computation is often reduced to a table lookup of similarities between pairs of codebook vectors.” Ex. 1015, Part 3, at 154.
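To make the contrast concrete, the following minimal sketch (with invented placeholder data) illustrates the VQ/codeword approach described above: the feature vector is used once, to select the nearest codevector, and everything downstream operates on the resulting integer codeword through a precomputed lookup table rather than on the feature vector itself.

```python
import numpy as np

def quantize(fv, codebook):
    """Vector quantization: replace the feature vector with the index
    ("codeword") of its nearest codevector (cf. Ex. 1015, Part 1, at 28).
    This is the only point at which the feature vector itself is used."""
    return int(np.argmin(np.sum((codebook - fv) ** 2, axis=1)))

rng = np.random.default_rng(1)
codebook = rng.standard_normal((256, 39))  # 256 representative codevectors
table = rng.random((8000, 256))            # precomputed state-by-codeword scores
fv = rng.standard_normal(39)

codeword = quantize(fv, codebook)
# Per-state likelihoods reduce to a table lookup (cf. Ex. 1015, Part 3,
# at 154): no distance between the feature vector and any acoustic
# state of the model is ever computed.
state_scores = table[:, codeword]
```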
48. By contrast, the ’277 patent teaches that “[e]ach feature vector is transferred to a distance calculation circuit 204, to obtain distances for each state of the acoustic model[,]” Ex. 1001 at 25:18-20 (emphasis added), and “[e]ach time a feature vector is loaded into the accelerator, the accelerator computes distances for all states for that feature vector, and stores the results alternately in the A or B Results Memory.” Id. at 25:57-60 (emphasis added); see also id. at 34:48-50.

49. Jiang’s codeword-based teachings are thus significantly different from the ’277 patent’s claimed distance calculations, and an ordinary artisan would have readily recognized that fact. Simply put, the ordinary artisan would have understood that Jiang’s codewords are not the multi-dimensional real “feature vector” that the challenged claims require for distance calculations, but rather an index number that specifies the codebook element selected.

50. I understand that the Petition and Mr. Schmandt argue that the challenged claims are met so long as a distance calculation “indicates a similarity between a feature vector and” one or more acoustic states of an acoustic model, and that Jiang teaches the claimed distance calculation because its codeword-based likelihood scores allegedly indicate such a similarity. Pet. 22-24; Ex. 1003, 149-50.

51. I disagree with that assessment.

52. First, while Limitation 1(c) discusses the use of distance calculations to determine the “similarity” between a feature vector and the acoustic states, the ’277 patent specifically teaches using the feature vectors themselves to make the distance calculation. For instance, the patent expressly teaches that “the Mahalanobis distance between the feature vector and each state is calculated, to determine similarity of the feature vector to each state.” Ex. 1001, 12:29-32 (emphasis added). Indeed, in every disclosed embodiment of the distance calculation step, the ’277 patent teaches calculating distances by directly using the feature vectors, not a different value such as a codeword derived from vector quantization. Accordingly, an ordinary artisan would have known that the phrase “indicating a similarity” as recited in the claims permits a choice as to the method of distance calculation that can be used (so long as that method uses the feature vectors themselves and “indicates a similarity between a feature vector and” the acoustic states), but does not permit so-called distance calculations that do not use the feature vectors themselves.

53. Second, an ordinary artisan would have known that vector quantization and codewords are not used to “calculate distances” as that term is used in the field of art. Rather, it was well known at the time of the ’277 patent that VQ and codewords are typically used to
