`Declaration of Oded Gottesman, Ph.D. (Exhibit 2004)
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`
`APPLE, INC.
`Petitioner
`
`v.
`
`SAINT LAWRENCE COMMUNICATIONS LLC
`Patent Owner
`
`Case: IPR2016-01075
`Patent No. 7,151,802
`
`DECLARATION OF ODED GOTTESMAN, PH.D.
`Exhibit 2004
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Mail Stop “PATENT BOARD”
`Patent Trial and Appeal Board
`U.S. Patent and Trademark Office
`P.O. Box 1450
`Alexandria, VA 22313-1450
`
`
`
`SLC 2004
`
`
`
`Inter Partes Review of USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
TABLE OF CONTENTS

I.    INTRODUCTION .............................................................................................. 1
      A.  Background ............................................................................................. 1
      B.  Qualifications ......................................................................................... 2
II.   LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY OPINIONS ........... 6
III.  TECHNICAL BACKGROUND AND STATE OF THE ART AT THE TIME OF THE ALLEGED INVENTION .......... 11
      A.  Speech coding and Linear Predictive Coding (LPC) analysis ............. 14
      B.  Long Term Prediction (Pitch Prediction) ............................................ 18
      C.  Quantization ......................................................................................... 20
      D.  Code Excited Linear Prediction (CELP) ............................................. 21
      E.  Perceptual Weighting ........................................................................... 28
      F.  Long term pitch prediction and using adaptive codebook ................... 29
      G.  CELP Decoding .................................................................................... 29
      H.  Speech Bandwidth extension ............................................................... 30
      I.  Speech Quality ...................................................................................... 31
      J.  Finite precision Considerations ........................................................... 31
IV.   PERSON OF ORDINARY SKILL IN THE ART ................................................ 34
V.    OVERVIEW OF THE ‘802 PATENT ............................................................... 35
VI.   THE CLAIMS OF THE ‘802 PATENT ............................................................ 40
VII.  LEGAL STANDARDS .................................................................................... 41
      A.  Requirements of a Method and System Patent .................................... 41
      B.  Obviousness ......................................................................................... 42
VIII. CLAIM CONSTRUCTION .............................................................................. 47
IX.   SUMMARY OF PRIOR ART TO THE ’802 PATENT ALLEGED IN THIS PETITION (CASE IPR2016-00704) .......... 47
      A.  A 13.0 kbit/s wideband speech codec based on SB-ACELP (“Schnitzler”) .......... 47
      B.  Pyke Master’s Thesis Exhibit 2010 (“Pyke”) ...................................... 51
X.    PATENTABILITY OF THE CHALLENGED CLAIMS OF THE ‘802 PATENT ........ 78
XI.   CONCLUSION ............................................................................................ 115
BIBLIOGRAPHY .................................................................................................. 117
DR. ODED GOTTESMAN – CURRICULUM VITAE ................................................ 119
`
`
`
`
`
`
`I, Oded Gottesman, hereby declare as follows:
`
`I.
`
`INTRODUCTION
`A. Background
`1. My name is Oded Gottesman. I am a researcher and consultant
`
`working in areas related to speech and audio coding and enhancement, digital
`
`signal processing, telecommunications, networks, and location and positioning
`
`systems.
`
`2.
`
`I have been retained to act as an expert witness on behalf of SAINT
`
`LAWRENCE COMMUNICATIONS Inc. (“Patent Owner”) in connection with the
`
`above captioned Petition for Method and System Review of U.S. Patent No.
`
`7,151,802 (“Petition”) submitted by Apple, Inc. (“Petitioner”). I understand that
`
`this proceeding involves U.S. Patent No. 7,151,802 (“the ‘802 Patent”), titled
`
`“High frequency Content Recovering Method and Device for Over-Samples
`
`Synthesized Wideband Signal.” The ‘802 Patent is provided as Exhibit 1001.
`
`3.
`
`I understand that Petitioner challenges the validity of Claims 1-3, 8-
`
`11, 16, 25-27, 32-35, 40, 49, 50, 52, and 53 of the ‘802 Patent (the “challenged
`
`claims”).
`
`4.
`
`I have reviewed and am familiar with the ‘802 Patent as well as its
`
`prosecution history. The ‘802 prosecution history is provided as Exhibit 1003.
`
`Additionally, I have reviewed materials identified in Section III.
`
`
`
`1
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
5. As set forth below, I am familiar with the technology at issue as of the effective filing date of the ‘802 patent. I have been asked to provide my technical
`
`review, analysis, insights, and opinions regarding the prior art references that form
`
`the basis for the Petition. In forming my opinions, I have relied on my own
`
`experience and knowledge, my review of the ‘802 Patent and its file history, and of
`
`the prior art references cited in the Petition.
`
`6. My opinions expressed in this Declaration rely to a great extent on my
`
`own personal knowledge and recollection. However, to the extent I considered
`
`specific documents or data in formulating the opinions expressed in this
`
`Declaration, such items are expressly referred to in this Declaration.
`
`7.
`
`I am being compensated for my time in connection with this covered
`
`patent review at my standard consulting rate, which is $525 per hour. My
`
`compensation is not contingent upon and in no way affects the substance of my
`
`testimony.
`
`B. Qualifications
8. I am a citizen of the United States, and I am currently employed as the
`
`Chief Technology Officer (“CTO”) of Compandent, Inc.
`
`9. My curriculum vitae, including my qualifications, a list of the
`
`publications that I have authored during my career, and a list of the cases in which,
`
`during the previous four years, I have testified as an expert at trial or by deposition,
`
`
`
`2
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`is attached to this report as Exhibit A. I expect to testify regarding my background,
`
`qualifications, and experience relevant to the issues in this investigation.
`
`10.
`
`I earned my Bachelor of Science degree in Electrical Engineering
`
`from Ben-Gurion University in 1988.
`
`11.
`
`In 1992, I earned my Master of Science degree in Electrical and
`
Computer Engineering from Drexel University, which included performing research at AT&T Bell Labs, Murray Hill, considered at the time the “holy grail” of speech processing research worldwide. My research was in the area of wideband speech coding, and was titled “Algorithm Development and Real-Time Implementation of High-Quality 32kbps Wideband Speech Low-Delay Code-Excited Linear Predictive (LD-CELP) Coder”. The work continued prior research by E. Ordentlich and Y. Shoham, who was also my M.Sc. research advisor. As part of my work, I also implemented that algorithm in DSP assembly language on two DSPs running in parallel. I subsequently co-authored and published two articles about this work.
`
`12.
`
`I earned my Doctorate of Philosophy in Electrical and Computer
`
`Engineering from the University of California at Santa Barbara in 2000.
`
`13.
`
`I have worked in the field of digital signal processing (“DSP”) for
`
`over 25 years, and have extensive experience in DSP research, design, and
`
`development, as well as the design and development of DSP-related software and
`
`
`
`3
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`hardware. Presently, I am the CTO of Compandent, a technology company that
`
`develops and provides telecommunication and DSP-related algorithms, software
`
`and hardware products, real-time DSP systems, speech coding and speech
`
`enhancement-related projects, and DSP, software, and hardware-related services.
`
While at Compandent, I have contributed to a speech coding algorithm and a noise canceller algorithm that have been adopted for secure voice communication by the
`
`U.S. Department of Defense & NATO. Currently, I am supporting the DoD’s and
`
`NATO’s use of these algorithms, and am performing real-time implementation
`
`projects for DoD and NATO vendors, as well as the Defense Advanced Research
`
`Projects Agency (DARPA).
`
`14.
`
`I have worked for numerous different companies in the field of digital
`
`signal processing during my career. I am very familiar with most, if not all, speech
`
`coding, speech enhancement, audio coding, and video coding techniques for
`
`various applications. As part of my work, I have developed real-time DSP
`
`systems, DSP software for telephony applications, various serial communication
`
`software and hardware, and Internet communication software. I have led real-time
`
`DSP and speech coding engineering groups in two high-tech companies before my
`
`present company (Comverse Technology, Inc. and Optibase Ltd.), and, at DSP
`
`Communications, Inc., I was involved with echo cancellation, noise cancellation,
`
`and the creation of state-of-the-art chipsets for cellular telephones.
`
`
`
`4
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
15. I have been working with, and have written programs for, personal
`
computers since around 1986: initially on DOS, and later on Windows 3.1, Windows 95, Windows 98, Windows 2000, Windows NT, Windows XP, Windows 7, and Windows 10, as well as on Apple computers, iOS, Unix, Linux, and Android operating systems, and on numerous digital signal processors (DSPs). Much of my programming concerned digital signal processing, particularly speech, audio, and image coding, and communications.
`
16. Since 2001, I have also been providing expert technology services in patent disputes. My biography and experience relevant to my work in these matters are more fully detailed in Exhibit A.
`
`17.
`
`I have been the co-recipient, with Dr. Allen Gersho, of the Ericsson-
`
`Nokia Best Paper Award for the paper: “Enhanced Waveform Interpolative Coding
`
`at 4 kbps,” IEEE Workshop on Speech Coding, Finland, 1999. The IEEE
`
`Workshop on Speech Coding is an exclusive workshop for speech-coding
`
`researchers from around the world.
`
`18.
`
`I have authored and co-authored approximately eight journal
`
`publications, in addition to conference proceedings, technical articles, technical
`
`papers, book chapters, and technical presentations concerning a broad array of
`
`signal processing technology. I have also developed and taught many courses
`
`
`
`5
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`related to digital signal processing and signal processing systems. These courses
`
`have included introductory level and advanced courses.
`
`19.
`
`I have several international patents related to the field of audio signal
`
`enhancement, including U.S. Patent Nos. 6,614,370; 7,643,996; 7,010,482; and
`
`7,584,095.
`
`20.
`
`I am being compensated at the rate of $525 per hour for my work in
`
`connection with this matter. The compensation is not dependent in any way on the
`
`contents of this report, the substance of any further opinions or testimony that I
`
`may provide, or the ultimate outcome of this matter.
`
`II. LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY
`OPINIONS
21. In formulating my opinions, I have reviewed and considered all of the
`
`following documents:
`
EXHIBIT NO.   DESCRIPTION

1001   U.S. Patent No. 7,151,802
1002   File history of U.S. Patent No. 7,151,802
1003   Declaration of Jordan Cohen, Ph.D., Under 37 C.F.R. § 1.68
1004   ITU G.722 (1988)
1005   “A 13.0 kbit/s Wideband Speech Codec Based on SB-ACELP,” ICASSP ’98, PROC. 1998 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (1998) to J. Schnitzler (“Schnitzler”)
1006   “ITU-T G.729 Annex A: Reduced Complexity 8 kb/s CS-ACELP Codec for Digital Simultaneous Voice and Data,” IEEE COMM. MAG., 57-63 (Sept. 1997) to Salami et al. (“Salami 1997”)
1007   “A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates,” ICASSP ’82, PROC. 1982 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (1982) to Atal et al. (“Atal 1982”)
1008   “Waveform Interpolation Speech Coder at 4 kb/s,” M.S. Thesis, McGill University Department of Electrical and Computer Engineering (Aug. 1998) to E. Choy (“Choy”)
1009   GSM 06.60, v5.0.0 (1996)
1010   “Extrapolation of Wideband Speech From the Telephone Band,” University of Toronto Department of Electrical and Computer Engineering Master’s Thesis (1997) to A. A. Pyke (“Pyke”)
1011   ITU G.728 (1992)
1012   ITU G.729 (1996)
1013   “16 kbps Wideband Speech Coding Technique Based on Algebraic CELP,” ICASSP ’91, PROC. 1991 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (1991) to Laflamme (“Laflamme”)
1014   “High Quality Coding of Wideband Audio Signals Using Transform Coded Excitation (TCX),” ICASSP ’94, PROC. 1994 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (1994) to Lefebvre et al. (“Lefebvre”)
1015   “Low Delay Code Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32kbits/sec,” M.S. Thesis, Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science (April 1, 1990) to E. Ordentlich (“Ordentlich Thesis”)
1016   “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates,” ICASSP ’85, PROC. 1985 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (1985) to Schroeder et al. (“Schroeder”)
1017   “Speech Coding: A Tutorial Review,” PROC. IEEE, vol. 82, no. 10 (Oct. 1994) to Spanias (“Spanias”)
1021   Japanese Patent Application Publication No. JH08-123495 (May 17, 1996) to Tasaki et al.
1022   Japanese Patent Application Publication No. JH08-123495, Machine Translation (May 17, 1996) to Tasaki et al. (“Tasaki ’495”)
1023   “16 kbit/s Wideband Speech Coding Based on Unequal Subbands,” ICASSP ’96, PROC. 1996 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 57-63 (May 7-10, 1996) to Paulus et al. (“Paulus”)
1024   “Reconstruction of Wideband Audio from Narrowband CELP Code,” ACOUSTICAL SOC. JPN., Lecture Paper Collection, 249 (1994) to Tasaki et al. (“Tasaki”)
1029   “Analog to Digital Conversion of Voice by 2,400 Bit/Second Linear Predictive Coding,” Federal Standard 1015, December 17, 1996
1030   Claim Construction Order in Saint Lawrence Communications LLC v. ZTE Corporation, et al., 2-15-cv-00349 (E.D. Tex. 2016)
1031   U.S. Patent No. 5,966,689 to McCree (“McCree”)
1032   “MELP: The New Federal Standard at 2400 BPS,” IEEE ICASSP (1997) to Supplee, Lynn M. et al. (“Supplee”)
1033   “High-Frequency Regeneration in Speech Coding Systems,” ICASSP ’79, PROC. 1979 IEEE INTL. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 428-31 (April 2-4, 1979) to Makhoul et al. (“Makhoul”)
1034   Joint Claim Construction Chart in Saint Lawrence Communications LLC v. Apple Inc., et al., 2:16-cv-00082 (E.D. Tex. 2017)
1035   “Real-time Implementation of a 9.6 kbit/s ACELP Wideband Speech Coder,” Conf. Rec. IEEE GLOBECOM, 447-451 (Dec. 1992) to Salami et al. (“Salami”)
2001   P. Mermelstein, “G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals,” IEEE Comm. Mag., Vol. 26, No. 1, pp. 8-15, Jan. 1988
2002   Fuemmeler et al., “Techniques for the Regeneration of Wideband Speech from Narrowband Speech,” EURASIP Journal on Applied Signal Processing 2001:0, 1-9 (Sep. 2001)
2003   C.H. Ritz et al., “Lossless Wideband Speech Coding,” 10th Australian Int’l. Conference on Speech Science & Technology, p. 249 (Dec. 2004)
2005   “Discrete-Time Signal Processing,” by Alan V. Oppenheim and Ronald W. Schafer
2006   https://www.mathworks.com/help/matlab/math/random-numbers-with-specific-mean-and-variance.html
2007   Transcript of Deposition of Dr. Johnson
2008   O. Gottesman and A. Gersho, “Enhanced Waveform Interpolative Coding at Low Bit Rate,” IEEE Transactions on Speech and Audio Processing, vol. 9, November 2001, pp. 786-798
2009   O. Gottesman and A. Gersho, “Enhancing Waveform Interpolative Coding with Weighted REW Parametric Quantization,” IEEE Workshop on Speech Coding Proceedings, pp. 50-52, September 2000, Wisconsin, USA
2010   O. Gottesman and A. Gersho, “High Quality Enhanced Waveform Interpolative Coding at 2.8 kbps,” Proc. IEEE ICASSP’2000, vol. III, pp. 1363-1366, June 5-9, 2000, Istanbul, Turkey
2011   O. Gottesman and A. Gersho, “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps,” EUROSPEECH’99, pp. 1443-1446, 1999, Hungary
2012   O. Gottesman and A. Gersho, “Enhanced Waveform Interpolative Coding at 4 kbps,” IEEE Workshop on Speech Coding Proceedings, pp. 90-92, 1999, Finland
2013   O. Gottesman, “Dispersion Phase Vector Quantization For Enhancement of Waveform Interpolative Coder,” IEEE ICASSP’99, vol. 1, pp. 269-272, 1999
2014   O. Gottesman and Y. Shoham, “Real-Time Implementation of High Quality 32 kbps Wideband Speech LD-CELP Coder,” EUROSPEECH’93, 1993
2015   Oded Gottesman, “Redundant compression techniques for transmitting data over degraded communication links and/or storing data on media subject to degradation,” U.S. Patent 6,614,370
2016   Oded Gottesman, “Enhanced waveform interpolative coder,” U.S. Patent 7,643,996
2017   Oded Gottesman and Allen Gersho, “REW parametric vector quantization and dual-predictive SEW vector quantization for waveform interpolative coding,” U.S. Patent 7,584,095
2018   Oded Gottesman and Allen Gersho, “REW parametric vector quantization and dual-predictive SEW vector quantization for waveform interpolative coding,” U.S. Patent 7,010,482
2019   Rabiner and Schafer, “Digital Processing Of Speech Signals,” Prentice Hall Inc., 1978
`
`22.
`
`I have reviewed and am familiar with the response to Petition
`
`submitted on behalf of Patent Owner for covered patent review submitted with this
`
`Declaration and I agree with the technical analysis that underlies the positions set
`
`forth in the response to Petition.
`
`23.
`
`I have reviewed and am familiar with the Petition for covered patent
`
`submitted by Petitioner, and I disagree with some of it, and with its conclusions. I
`
`have reviewed and considered Dr. Cohen’s Report submitted on behalf of
`
`Petitioner, and I disagree with some of it, and with its conclusions.
`
`24.
`
`I may consider additional documents as they become available or
`
`other that are necessary to form my opinions. I reserve the right to revise,
`
`supplement, or amend my opinions based on new information and on my
`
`continuing analysis.
`
`
`
`10
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`III. TECHNICAL BACKGROUND AND STATE OF THE ART AT THE
`TIME OF THE ALLEGED INVENTION
25. A speech signal is generated by air flow emanating from the lungs, passed through the vocal cords, which may vibrate to create quasi-periodic air pulses, then passed through the vocal tract (throat cavity, mouth cavity, and nasal cavity), and finally radiated out through the lips and/or the nose. When the vocal cords vibrate, the generated quasi-periodic speech is called “voiced”; when they are at rest, the generated speech is more noise-like and is called “unvoiced.” A classical model for speech production (“Digital Processing of Speech Signals,” L. R. Rabiner and R. W. Schafer, Prentice Hall, 1978) is illustrated below. This early model includes a switch for selecting between the “voiced” component generated by the periodic source and the “unvoiced” component generated by the noise source, each multiplied by an appropriate gain, to form the excitation signal; the excitation is then passed through a system that models the vocal tract using time-varying parameters, and the resulting signal finally passes through a lip-radiation model to form the generated speech. The fundamental period of the vocal cord vibration is known as the pitch period. The vocal tract is typically modeled by a time-varying filter implemented with a set of parameters (or coefficients).
`
`
`
`11
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
`
`
26. A speech signal is much richer than simply voiced or unvoiced; for example, buzzing sounds like “z” and “v” include both periodic and noise components. Therefore, in later models, such as the one below, the switch was replaced by a mixer that combines a time-varying mixture of the two sources. For simplicity, let us consider the following simplified diagram. To produce good-quality speech, the vocal tract parameters, representing the vocal tract resonances, are sufficiently updated every 20-30 ms, while the excitation components are sufficiently updated every 5-7 ms. It can be shown that the vocal tract effect adds correlation among neighboring samples, which is often treated as short-term prediction or redundancy. The quasi-periodicity of the vocal cords is often treated as long-term prediction or redundancy.
`
`
`
`Figure 1. Block diagram of simplified speech production model
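As a concrete illustration of the simplified production model in Figure 1, the following Python sketch is my own toy example (not code from the record; the sampling rate, pitch, gains, and filter coefficients are all hypothetical). It forms the excitation as a gain-scaled mix of a quasi-periodic pulse train and noise, then passes it through a simple all-pole vocal tract filter:

```python
import numpy as np

fs = 8000                       # sampling rate in Hz (illustrative)
pitch_hz = 100                  # fundamental (pitch) frequency of the voiced source
n = fs // 10                    # 100 ms of signal (800 samples)

# Excitation: quasi-periodic pulse train (voiced source) and white noise
# (unvoiced source), each scaled by its own gain, as in the classical model.
voiced = np.zeros(n)
voiced[:: fs // pitch_hz] = 1.0              # one impulse per pitch period
unvoiced = np.random.default_rng(0).standard_normal(n)

g_voiced, g_unvoiced = 1.0, 0.0              # "switch" set to fully voiced here
excitation = g_voiced * voiced + g_unvoiced * unvoiced

# Vocal tract modeled as an all-pole filter (two hypothetical poles here):
# speech[t] = excitation[t] + a1*speech[t-1] + a2*speech[t-2]
a1, a2 = 1.3, -0.6
speech = np.zeros(n)
for t in range(n):
    speech[t] = excitation[t]
    if t >= 1:
        speech[t] += a1 * speech[t - 1]
    if t >= 2:
        speech[t] += a2 * speech[t - 2]
```

In a real coder the gains and filter coefficients would be time-varying (updated every 20-30 ms for the vocal tract, every 5-7 ms for the excitation) rather than held fixed as in this sketch.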
`
27. Speech coding is used to represent speech signals in a compact form for transmission or storage. The compression is typically achieved by first capturing and removing the redundancies, in the forms of short-term prediction and long-term prediction, that exist in the speech signal, and forming a residual signal having a much smaller dynamic range and energy. The residual signal is then quantized, a process of representing it by a finite number of bits at limited resolution. The difference between the reduced-resolution signal and the original signal is considered quantization noise (or quantization error). The number of bits used for the quantization governs the amount of quantization noise, or alternatively the overall quality of the coded speech.
`
A. Speech coding and Linear Predictive Coding (LPC) analysis
28. Linear Predictive Coding (LPC) is a well-known analysis technique for calculating the short-term prediction coefficients that are used to model the vocal tract. It is typically applied directly to the input speech signal, and the LPC coefficients are encoded (or quantized) once per frame of 20-30 ms, which is an adequate interval given the speed at which the vocal tract changes.
`
29. The LPC analysis is used to capture and remove the short-term prediction from the input speech: the LPC coefficients are used to form a weighted sum of the most recent (typically 10) samples to predict the present sample, which is essentially a filtering operation. The short-term predicted sample is subtracted from the input sample to generate the prediction error, also known as the residual signal, which has a much smaller dynamic range and energy than the input speech. For this reason the residual signal is more desirable for quantization than the speech itself: thanks to its smaller energy and dynamic range, its quantization error (or noise) is also smaller.
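The short-term prediction removal just described can be sketched numerically. The following is my own illustrative example of the textbook autocorrelation method for LPC (not code from any cited codec); the strongly self-correlated two-pole test signal is made up, and the point is that the residual carries only a small fraction of the signal's energy:

```python
import numpy as np

def lpc_coefficients(x, order=10):
    # Autocorrelation method: solve the normal equations R a = r,
    # where R is the Toeplitz autocorrelation matrix of the signal.
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def residual(x, a):
    # Prediction error: e[t] = x[t] - sum_k a[k] * x[t-1-k]
    order = len(a)
    e = x.copy()
    for t in range(order, len(x)):
        e[t] = x[t] - np.dot(a, x[t - order : t][::-1])
    return e

# Synthetic strongly self-correlated signal (a stable two-pole process);
# its residual should carry only a small fraction of the signal energy.
rng = np.random.default_rng(1)
drive = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(2000):
    x[t] = drive[t]
    if t >= 1:
        x[t] += 1.7 * x[t - 1]
    if t >= 2:
        x[t] -= 0.8 * x[t - 2]

a = lpc_coefficients(x, order=10)
e = residual(x, a)
```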
`
`
`
`14
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
Figure 2. Block diagram of open-loop short-term prediction removal example using LPC analysis
`
`
`
30. The removal of the short-term prediction from the input speech to generate the residual signal is essentially a decomposition of the speech signal into its slowly varying vocal tract characteristics (i.e., the LPC coefficients) and its rapidly changing excitation characteristics (i.e., the residual signal). Such a decomposition allows for applying slower LPC quantization at frame intervals of typically 20-30 ms, and faster residual encoding at a subframe rate of typically 5-7 ms. Examples of speech signals and their corresponding residual signals (scaled up in the figure for better viewing) are illustrated at Id. Fig. 8.5. As can be seen, the residual signal exhibits less short-term correlation among neighboring samples, and looks more like noise and comb pulses, but still exhibits long-term correlation between neighboring pitch periods.
`
`
`
`15
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
`31.
`
`In the frequency domain, the vocal tract shapes the speech spectral
`
`envelope, while the excitation component shapes the fine spectral structure about
`
`that envelope. The figure below illustrates from top to bottom time domain
`
`sampled speech segment, the corresponding residual signal that contains much less
`
`short-term prediction and energy, the speech signal’s (log) spectrum and the
`
`corresponding spectral envelope represented by the LPC, and the residual signal’s
`
`(log) spectrum which is flat since the spectral envelope captured by the LPC was
`
`removed from the speech. As illustrated, the speech was decomposed to the
`
`
`
`16
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`spectral envelope - represented by the LPC, and the fine spectral variations around
`
`the spectral envelope - represented by the residual signal. As illustrated, during
`
`voiced speech segments, the residual signal exhibits comb-like harmonic structure
`
`at the fundamental frequency’s (pitch) multiple frequencies (harmonics). During
`
`buzzy sounds like “v” and “z”, and during transitions, the harmonic structure
`
`appears more prominent in the lower frequency range, and the higher frequency
`
`range (e.g. 4-5 kHz in the figure below) becomes less structured and more noise
`
`like.
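The flatness of the residual spectrum relative to the shaped speech spectrum can also be checked numerically. This is my own sketch, with a synthetic two-pole process standing in for real speech (all signal parameters are hypothetical); a standard spectral-flatness measure is near 1 for a flat spectrum and near 0 for a strongly shaped one:

```python
import numpy as np

# Synthetic "speech-like" signal: white noise shaped by a two-pole filter.
# The driving noise plays the role of the residual.
rng = np.random.default_rng(5)
drive = rng.standard_normal(4096)
x = np.zeros(4096)
for t in range(4096):
    x[t] = drive[t]
    if t >= 1:
        x[t] += 1.7 * x[t - 1]
    if t >= 2:
        x[t] -= 0.8 * x[t - 2]

spec_x = np.abs(np.fft.rfft(x)) ** 2        # shaped "speech" power spectrum
spec_e = np.abs(np.fft.rfft(drive)) ** 2    # residual power spectrum (flat-ish)

def flatness(p):
    # Spectral flatness: geometric mean over arithmetic mean of the
    # power-spectrum bins (<= 1, with equality for a perfectly flat spectrum).
    return np.exp(np.mean(np.log(p))) / np.mean(p)

f_speech, f_residual = flatness(spec_x), flatness(spec_e)
```

The residual's flatness comes out much higher than the shaped signal's, mirroring the flat residual spectrum in the figure.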
`
`
`
`17
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
B. Long Term Prediction (Pitch Prediction)
32. As explained above, the “pitch” periodicity of the vocal cords generates long-term correlation between neighboring speech periods. For encoding purposes, such long-term correlation can be modeled and captured by means of long-term prediction. The parameters typically used to encode the long-term prediction are the pitch period, which may be integer or fractional, and the gain used to predict the present period from the past period. Such parameters may be viewed as describing a comb filter, where the comb pulses are spaced by the pitch period and their exponential progression is given by the gain. A simple scheme is illustrated below. In this scheme the past residual signal is stored and used to generate a predicted residual sample, which is then subtracted from the input residual signal to form a long-term prediction error, often referred to as the remnant signal.
`
`
`
Figure 3. Block diagram of simplified open-loop long-term prediction removal example
`
33. The remnant signal is a noise-like signal that exhibits almost no short-term and no long-term correlation, and its energy and dynamic range are much smaller than those of the input speech. It is very hard to encode, since it has no particular structure, and it is hard to tell how much its quantization error would affect the overall quality of the speech generated at the decoder. In other words, the simple encoding scheme described so far operates in open loop and does not really consider the speech signal generated by the decoder.
`
`C. Quantization
34. Quantization is the process of reducing a signal’s resolution and representing the reduced-resolution signal by some code to be stored or transmitted. It can be done on a sample-by-sample basis, using a table of quantization levels, where the quantization level is selected by the encoder (a scalar quantizer) using, for example, a nearest-neighbor criterion. Quantization can also be performed on a set of values, often called a vector (e.g., a set of LPC parameters or a set of consecutive signal samples), where quantized vectors are selected from multi-dimensional tables also referred to as codebooks; again, the quantized vector is selected by the encoder (a vector quantizer, VQ) using, for example, a nearest-neighbor criterion. For a given number of bits, a vector quantizer produces much better quality than a scalar quantizer, at the cost of increased computational complexity.
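The nearest-neighbor selection for both quantizer types can be sketched as follows. This is my own minimal illustration; the levels and codebook entries are made-up values, not trained tables:

```python
import numpy as np

def scalar_quantize(x, levels):
    # Each sample is mapped to the index of its nearest quantization level.
    levels = np.asarray(levels)
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    return idx, levels[idx]

def vector_quantize(vecs, codebook):
    # Each vector is mapped to the nearest codevector (squared Euclidean distance).
    d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = np.argmin(d, axis=1)
    return idx, codebook[idx]

# Scalar example: a 2-bit (4-level) quantizer applied sample by sample.
x = np.array([-0.9, -0.1, 0.3, 0.8])
sq_idx, sq_out = scalar_quantize(x, levels=[-0.75, -0.25, 0.25, 0.75])

# Vector example: the same 2 bits per index now select one of 4 codevectors,
# quantizing two samples jointly.
vecs = np.array([[0.9, 1.1], [-1.0, -0.8]])
codebook = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
vq_idx, vq_out = vector_quantize(vecs, codebook)
```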
`
35. Codebooks (and quantizers) are usually computed by training on data gathered during system operation, such that the trained codebook minimizes some distortion measure over all the training data generated by the system. Once the system is changed (e.g., its order of operations is changed, or computation elements are added, removed, or modified), the codebook would no longer operate optimally, and a new training process would need to be performed to achieve adequate performance. In other words, systems having quantizers and/or codebooks may operate unpredictably once they are changed or combined with other systems.
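The training process described above can be sketched with plain k-means, which is the core iteration of LBG-style codebook training. This is my own toy example (real training uses codebook splitting and large speech databases; the clustered 2-D training data here is synthetic):

```python
import numpy as np

def train_codebook(training_vecs, bits=2, iters=20):
    # Plain k-means: alternate nearest-neighbor assignment and centroid
    # update so the codebook minimizes average squared distortion over
    # the training data. Initialized with evenly spaced training vectors
    # (real LBG training initializes by splitting instead).
    k = 2 ** bits
    codebook = training_vecs[:: len(training_vecs) // k][:k].copy()
    for _ in range(iters):
        d = ((training_vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d, axis=1)
        for j in range(k):
            members = training_vecs[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

# Training data drawn from four well-separated clusters: a 2-bit codebook
# trained on it should land one codevector near each cluster center.
rng = np.random.default_rng(4)
centers = np.array([[3.0, 3.0], [3.0, -3.0], [-3.0, 3.0], [-3.0, -3.0]])
train = np.concatenate([c + 0.1 * rng.standard_normal((100, 2)) for c in centers])
cb = train_codebook(train, bits=2)
```

Retraining this codebook on data from a changed system is exactly the step the paragraph says cannot be skipped when systems are modified or combined.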
`
36. Similarly, a person of ordinary skill in the art knows that one cannot simply combine systems that were designed to operate at one sampling rate with systems that were designed to operate at a different sampling rate. Combining such systems typically yields unpredictable results.
`
`D. Code Excited Linear Prediction (CELP)
37. Code Excited Linear Prediction (CELP) has become the most widely used coding scheme of the past 30 years. Its core idea is closed-loop quantization of the residual and remnant signals, also known as analysis-by-synthesis (AbS). In this paradigm, the excitation and remnant signals are selected such that the resulting output speech best matches the input speech, as explained below.
`
38. A simplified block diagram of the encoder of such a system is illustrated in Figure 4. In the encoder, the LPC coefficients for the short-term correlation filter are computed and quantized in an open-loop manner once every 20-30 ms frame. This is done using the standard LPC analysis, which aims to maximize the speech’s short-term self-prediction, as explained above.
`
`
`
`21
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
39. Given the LPC coefficients precomputed in open loop for the whole frame, the encoder then starts closed-loop processing of 3-5 subframes, each of 5-7 ms, where the distortion criterion is waveform matching between the input speech and the synthesized speech, or (theoretically) equivalently, minimizing the energy of the weighted error between the input speech and the synthesized speech. For convenience, let us consider the theoretically equivalent simplified diagram illustrated in Figure 4, although an actual CELP system is implemented differently and is more complicated.
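The closed-loop (analysis-by-synthesis) selection can be sketched as an exhaustive search: synthesize every candidate excitation through the LP synthesis filter and keep the index and gain that minimize the squared error against the target speech. This is my own toy illustration, not the structure of any actual CELP standard; the one-tap filter and random codebook are hypothetical:

```python
import numpy as np

def synthesize(excitation, a):
    # All-pole LP synthesis: out[t] = excitation[t] + sum_k a[k] * out[t-1-k]
    out = np.zeros(len(excitation))
    for t in range(len(excitation)):
        out[t] = excitation[t]
        for k, ak in enumerate(a):
            if t - 1 - k >= 0:
                out[t] += ak * out[t - 1 - k]
    return out

def celp_search(target, codebook, a):
    # Analysis-by-synthesis: try every codevector, compute its optimal gain,
    # and keep the candidate whose synthesized output best matches the target.
    best_idx, best_gain, best_err = 0, 0.0, np.inf
    for i, cv in enumerate(codebook):
        y = synthesize(cv, a)
        denom = np.dot(y, y)
        if denom == 0.0:
            continue
        g = np.dot(target, y) / denom
        err = np.dot(target - g * y, target - g * y)
        if err < best_err:
            best_idx, best_gain, best_err = i, g, err
    return best_idx, best_gain, best_err   # the index and gain are what get coded

# Toy check: build the target from codevector 2 itself, so the closed-loop
# search should select index 2 with unit gain and (near-)zero error.
rng = np.random.default_rng(3)
a = [0.9]                                  # one-tap hypothetical synthesis filter
codebook = rng.standard_normal((8, 40))    # 8 candidate excitation vectors
target = synthesize(codebook[2], a)
idx, gain, err = celp_search(target, codebook, a)
```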
`
40. The encoder consists of two components, namely the analyzer and the synthesizer. Since the synthesizer is actually a replica of the decoder, it is also known as the local decoder. In the synthesizer, the excitation is generated as the output of a long-term correlation filter, which generates the pitch periodicity. The excitation is then passed through the short-term correlation filter, also known as the LP synthesis filter, to produce the output speech. This filter models the vocal tract transfer function. It emphasizes certain frequency regions