`Declaration of Oded Gottesman, Ph.D. (Exhibit 2004)
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`
`ZTE USA, INC.
`Petitioner
`
`v.
`
`SAINT LAWRENCE COMMUNICATIONS LLC
`Patent Owner
`
`Case: IPR2016-00704
`Patent No. 7,151,802
`
`DECLARATION OF ODED GOTTESMAN, PH.D.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Mail Stop “PATENT BOARD”
`Patent Trial and Appeal Board
`U.S. Patent and Trademark Office
`P.O. Box 1450
`Alexandria, VA 22313-1450
`
`
`
`SLC 2004
`
`
`
`
`
`
`
`
Inter Partes Review of USPN 7,151,802
Declaration of Oded Gottesman, Ph.D.

TABLE OF CONTENTS

I.    INTRODUCTION ................................................................ 1
      A.  Background .............................................................. 1
      B.  Qualifications .......................................................... 2
II.   LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY OPINIONS .................... 6
III.  TECHNICAL BACKGROUND AND STATE OF THE ART AT THE TIME OF THE
      ALLEGED INVENTION ........................................................... 9
      A.  Speech coding and Linear Prediction Coding (LPC) analysis ............. 13
      B.  Long Term Prediction (Pitch Prediction) ............................... 18
      C.  Quantization ........................................................... 20
      D.  Code Excited Linear Prediction (CELP) ................................. 21
      E.  Perceptual Weighting ................................................... 28
      F.  Long term pitch prediction and using adaptive codebook ................ 29
      G.  CELP Decoding .......................................................... 29
      H.  Speech Bandwidth extension ............................................. 30
      I.  Speech Quality ......................................................... 31
      J.  Finite precision Considerations ....................................... 31
IV.   PERSON OF ORDINARY SKILL IN THE ART ....................................... 34
V.    OVERVIEW OF THE ‘802 PATENT ............................................... 35
VI.   THE CLAIMS OF THE ‘802 PATENT ............................................. 39
VII.  LEGAL STANDARDS ............................................................ 40
      A.  Requirements of a Method and System Patent ............................ 40
      B.  Obviousness ............................................................ 41
VIII. CLAIM CONSTRUCTION ......................................................... 46
IX.   SUMMARY OF PRIOR ART TO THE ’802 PATENT ALLEGED IN THIS
      PETITION (CASE IPR2016-00704) ............................................. 46
      A.  A 13.0 kbit/s wideband speech codec based on SB-ACELP
          (“Schnitzler”) ......................................................... 46
      B.  Reconstruction of Wideband Audio from Narrowband CELP Code
          (“Tasaki”) ............................................................. 50
      C.  16 kbit/s Wideband Speech Coding Based On Unequal Subbands
          (“Paulus”) ............................................................. 51
X.    GROUNDS FOR PATENTABILITY FOR EACH CLAIM OF THE ‘802 PATENT ............... 52
      A.  Ground 1: Schnitzler in View of Tasaki Does Not Render Obvious
          Claims 1-3, 8-11, 16, 25-27, 32-35, 40, 49, 50, 52, And 53 ............ 52
          1.  Schnitzler in view of Tasaki does not render obvious claims
              1, 9, 25, 33, 49, 50, 52, and 53 .................................. 52
          2.  Schnitzler In View Of Tasaki Also Does Not Render Obvious
              Claims 2, 3, 8, 10, 11, 16, 26, 27, 32, 34, 35, And 40 ........... 55
          3.  No Motivation to Combine Schnitzler with Tasaki ................... 59
      B.  Ground 2: Schnitzler In View Of the Knowledge of a Person of
          Skill in the Art Does Not Render Obvious Claims 1, 9, 25, 33,
          49, 50, 52, And 53 ..................................................... 59
      C.  Ground 3: Schnitzler In View Of the Knowledge of a Person of
          Skill in the Art Further In View Of Paulus Does Not Render
          Obvious Claims 2, 3, 8, 10, 11, 16, 26, 27, 32, 34, 35, And 40 ....... 66
          1.  No Motivation To Combine Schnitzler With Paulus ................... 66
      D.  Summary of Invalidity Analysis ........................................ 67
XI.   CONCLUSION ................................................................. 68
BIBLIOGRAPHY ..................................................................... 70
DR. ODED GOTTESMAN – CURRICULUM VITAE ........................................... 72
`
`
`
`
`
I, Oded Gottesman, hereby declare as follows:

I. INTRODUCTION

A. Background

1. My name is Oded Gottesman. I am a researcher and consultant working in areas related to speech and audio coding and enhancement, digital signal processing, telecommunications, networks, and location and positioning systems.

2. I have been retained to act as an expert witness on behalf of SAINT LAWRENCE COMMUNICATIONS LLC (“Patent Owner”) in connection with the above-captioned Petition for Inter Partes Review of U.S. Patent No. 7,151,802 (“Petition”) submitted by ZTE USA, INC. (“Petitioner”). I understand that this proceeding involves U.S. Patent No. 7,151,802 (“the ‘802 Patent”), titled “High Frequency Content Recovering Method and Device for Over-Sampled Synthesized Wideband Signal.” The ‘802 Patent is provided as Exhibit 1001.

3. I understand that Petitioner challenges the validity of Claims 1-3, 8-11, 16, 25-27, 32-35, 40, 49, 50, 52, and 53 of the ‘802 Patent (the “challenged claims”).

4. I have reviewed and am familiar with the ‘802 Patent as well as its prosecution history. The ‘802 prosecution history is provided as Exhibit 1002. Additionally, I have reviewed the materials identified in Section II.
`
`5.
`
`
`As set forth below, I am familiar with the technology at issue as of the
`
`effective filing date of the ‘802 patent. I have been asked to provide my technical
`
`review, analysis, insights, and opinions regarding the prior art references that form
`
`the basis for the Petition. In forming my opinions, I have relied on my own
`
`experience and knowledge, my review of the ‘802 Patent and its file history, and of
`
`the prior art references cited in the Petition.
`
`6. My opinions expressed in this Declaration rely to a great extent on my
`
`own personal knowledge and recollection. However, to the extent I considered
`
`specific documents or data in formulating the opinions expressed in this
`
`Declaration, such items are expressly referred to in this Declaration.
`
`7.
`
`I am being compensated for my time in connection with this covered
`
`patent review at my standard consulting rate, which is $525 per hour. My
`
`compensation is not contingent upon and in no way affects the substance of my
`
`testimony.
`
`B. Qualifications
`
`8.
`
`I am a citizen of the United States, and I am currently employed as the
`
`Chief Technology Officer (“CTO”) of Compandent, Inc.
`
`9. My curriculum vitae, including my qualifications, a list of the
`
`publications that I have authored during my career, and a list of the cases in which,
`
`during the previous four years, I have testified as an expert at trial or by deposition,
`
`
`
`2
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`is attached to this report as Exhibit A. I expect to testify regarding my background,
`
`qualifications, and experience relevant to the issues in this investigation.
`
`10.
`
`I earned my Bachelor of Science degree in Electrical Engineering
`
`from Ben-Gurion University in 1988.
`
`11.
`
`In 1992, I earned my Master of Science degree in Electrical and
`
`Computer Engineering from Drexel University, which included performing
`
`research at AT&T Bell Labs, Murray Hill, at the time considered the world “holy
`
`grail” of speech processing research. My research was in the area of wideband
`
`speech coding, and titled “Algorithm Development and Real-Time Implementation
`
`of High-Quality 32kbps Wideband Speech Low-Delay Code-Excited Linear
`
`Predictive (LD-CELP) Coder”. The work continued a prior research by E.
`
`Ordentlich, and Y. Shoham who was also my M.Sc. research advisor. As a part of
`
`my work, I have also implemented that algorithm in DSP Assembly Language on
`
`two DSPs running in parallel. I subsequently co-authored and published two
`
`articled about this work.
`
`12.
`
`I earned my Doctorate of Philosophy in Electrical and Computer
`
`Engineering from the University of California at Santa Barbara in 2000.
`
`13.
`
`I have worked in the field of digital signal processing (“DSP”) for
`
`over 25 years, and have extensive experience in DSP research, design, and
`
`development, as well as the design and development of DSP-related software and
`
`
`
`3
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`hardware. Presently, I am the CTO of Compandent, a technology company that
`
`develops and provides telecommunication and DSP-related algorithms, software
`
`and hardware products, real-time DSP systems, speech coding and speech
`
`enhancement-related projects, and DSP, software, and hardware-related services.
`
`While at Compandent, I have contributed to a speech coding algorithm and noise
`
`canceller algorithm that has been adopted for secure voice communication by the
`
`U.S. Department of Defense & NATO. Currently, I am supporting the DoD’s and
`
`NATO’s use of these algorithms, and am performing real-time implementation
`
`projects for DoD and NATO vendors, as well as the Defense Advanced Research
`
`Projects Agency (DARPA).
`
`14.
`
`I have worked for numerous different companies in the field of digital
`
`signal processing during my career. I am very familiar with most, if not all, speech
`
`coding, speech enhancement, audio coding, and video coding techniques for
`
`various applications. As part of my work, I have developed real-time DSP
`
`systems, DSP software for telephony applications, various serial communication
`
`software and hardware, and Internet communication software. I have led real-time
`
`DSP and speech coding engineering groups in two high-tech companies before my
`
`present company (Comverse Technology, Inc. and Optibase Ltd.), and, at DSP
`
`Communications, Inc., I was involved with echo cancellation, noise cancellation,
`
`and the creation of state-of-the-art chipsets for cellular telephones.
`
`
`
`4
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`15.
`
`
`I have been working with, and have written programs for, personal
`
`computers since around 1986. Initially on DOS, and later on Windows 3.1,
`
`Windows 96, Windows 98, and Windows 2000, as well as Apple computers.
`
`Much of my programming concerned digital signal processing, particularly speech,
`
`audio and image coding, and communications.
`
16. Since 2001, I have also been providing expert technology services in patent disputes. My biography and experience relevant to my work in these matters are more fully detailed in Exhibit A.

17. I was the co-recipient, with Dr. Allen Gersho, of the Ericsson-Nokia Best Paper Award for the paper “Enhanced Waveform Interpolative Coding at 4 kbps,” IEEE Workshop on Speech Coding, Finland, 1999. The IEEE Workshop on Speech Coding is an exclusive workshop for speech-coding researchers from around the world.

18. I have authored and co-authored approximately eight journal publications, in addition to conference proceedings, technical articles, technical papers, book chapters, and technical presentations concerning a broad array of signal processing technology. I have also developed and taught many courses related to digital signal processing and signal processing systems. These courses have included introductory-level and advanced courses.

19. I hold several U.S. and international patents related to the field of audio signal enhancement, including U.S. Patent Nos. 6,614,370; 7,643,996; 7,010,482; and 7,584,095.

20. I am being compensated at the rate of $525 per hour for my work in connection with this matter. The compensation is not dependent in any way on the contents of this report, the substance of any further opinions or testimony that I may provide, or the ultimate outcome of this matter.
`
`II. LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY
`OPINIONS
`
`21.
`
`In formulating my opinions, I have reviewed and considered all of the
`
`following documents:
`
EXHIBIT NO.  DESCRIPTION

1001  U.S. Patent No. 7,151,802
1002  File history of U.S. Patent No. 7,151,802
1003  Petition For Inter Partes Review Of U.S. Patent No. 7,151,802
1004  J. Schnitzler, “A 13.0 kbit/s Wideband Speech Codec Based on SB-ACELP,” IEEE, pp. 157-160 (1998)
1005  J. Paulus and J. Schnitzler, “16 kbit/s Wideband Speech Coding Based on Unequal Subbands,” IEEE, pp. 255-258 (1996)
1006  J. Paulus and J. Schnitzler, “Wideband Speech Coding for the GSM Fullrate Channel,” ITG-Fachtagung Sprachkommunikation, pp. 11-14, September 1996
1007  ITU-T Recommendation G.729, “Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP),” March 1996
1008  GSM Enhanced Full Rate (EFR) Speech Transcoding (GSM 06.60) (1996)
1009  Honkanen, T., et al., “Enhanced Full Rate Speech Codec for IS-136 Digital Cellular System,” IEEE, pp. 731-734 (1997)
1010  U.S. Pat. No. 5,455,888 (“Iyengar”)
1011  U.S. Pat. No. 5,797,120 (“Ireton”)
1012  Atal, B. and Remde, J., “A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates,” IEEE, pp. 614-617 (1982)
1013  Spanias, A., “Speech Coding: A Tutorial Review,” Proceedings of the IEEE, 82(10):1539-1582 (1994)
1014  Federal Standard 1016 (February 14, 1991)
1015  ITU-T Recommendation G.728, “Coding of Speech at 16 kbit/s Using Low-Delay Code Excited Linear Prediction,” September 1992
1016  Schroeder, M. and Atal, B., “Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” IEEE, pp. 937-940 (1985)
1017  Tasaki, H., et al., “Reconstruction of Wideband Audio from Narrowband CELP Code,” Acoustical Society of Japan, pp. 249-252 (1994)
1018  ITU G.722.2, Series G: Transmission Systems and Media, Digital Systems and Networks (2002)
1019  Oppenheim and Schafer, Discrete-Time Signal Processing, pp. 17-33 (1989)
1020  Kroon, P., “Regular-Pulse Excitation – A Novel Approach to Effective and Efficient Multipulse Coding of Speech,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(5):1054-1063 (1986)
1021  Chan, W., et al., “Enhanced Multistage Vector Quantization by Joint Codebook Design,” IEEE Transactions on Communications, 40(11):1693-1697 (1992)
1022  Singhal, S. and Atal, B., “Improving Performance of Multi-Pulse LPC Coders at Low Bit Rates,” IEEE, pp. 1.3.1-1.3.4 (1984)
1023  Ramachandran, R. and Kabal, P., “Pitch Prediction Filters in Speech Coding,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(4):467-478 (1989)
1024  Atal, B. and Schroeder, M., “Predictive Coding of Speech Signals and Subjective Error Criteria,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(3):247-254 (1979)
1025  Rabiner, L., “On the Use of Autocorrelation Analysis for Pitch Detection,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1):24-33 (1977)
1026  Kleijn, W. and Ketchum, R., “Improved Speech Quality and Efficient Vector Quantization in SELP,” IEEE, pp. 155-158 (1988)
1027  Marques, J., et al., “Improved Pitch Prediction with Fractional Delays in CELP Coding,” IEEE, pp. 665-668 (1990)
1028  Chen, J. and Gersho, A., “Adaptive Postfiltering for Quality Enhancement of Coded Speech,” IEEE Transactions on Speech and Audio Processing, 3(1):59-71 (1995)
1029  ITU G.722 – General Aspects of Digital Transmission Systems, Terminal Equipments (1988)
1030  Cheng, Y., et al., “Statistical Recovery of Wideband Speech From Narrowband Speech,” IEEE Transactions on Speech and Audio Processing, 2(4):544-548 (1994) (“Cheng”)
1031  Ordentlich, E. and Shoham, Y., “Low-Delay Code-Excited Linear-Predictive Coding of Wideband Speech at 32 kbps,” IEEE, pp. 9-12 (1991)
1032  Avendano, C., et al., “Beyond Nyquist: Towards the Recovery of Broad-Bandwidth Speech from Narrow-Bandwidth Speech,” Eurospeech ’95 (1995)
1033  Foodeei, M., “Low-Delay Speech Coding at 16 kb/s and Below” (May 1991) (“Foodeei”)
1034  Ordentlich, E., “Low Delay Code-Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32 kbits/sec,” Massachusetts Institute of Technology, pp. 1-132 (1990)
1035  Declaration of Michael T. Johnson, Ph.D., Case IPR2016-00704, Patent 7,151,802
2001  P. Mermelstein, “G.722, a New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals,” IEEE Comm. Mag., Vol. 26, No. 1, pp. 8-15, Jan. 1988
2002  Fuemmeler et al., “Techniques for the Regeneration of Wideband Speech from Narrowband Speech,” EURASIP Journal on Applied Signal Processing 2001:0, 1-9 (Sep. 2001)
2003  C.H. Ritz et al., “Lossless Wideband Speech Coding,” 10th Australian Int’l Conference on Speech Science & Technology, p. 249 (Dec. 2004)
2005  Alan V. Oppenheim and Ronald W. Schafer, Discrete-Time Signal Processing
`
`
`
`
`
`22.
`
`I have reviewed and am familiar with the response to Petition
`
`submitted on behalf pf Patent Owner for covered patent review submitted with this
`
`Declaration and I agree with the technical analysis that underlies the positions set
`
`forth in the response to Petition.
`
`23.
`
`I have reviewed and am familiar with the Petition for covered patent
`
`submitted by Petitioner, and I disagree with some of it, and with its conclusions. I
`
`have reviewed and considered Dr. Johnson Report submitted on behalf of
`
`Petitioner, and I disagree with some of it, and with its conclusions.
`
`24.
`
`I may consider additional documents as they become available or
`
`other that are necessary to form my opinions. I reserve the right to revise,
`
`supplement, or amend my opinions based on new information and on my
`
`continuing analysis.
`
`III. TECHNICAL BACKGROUND AND STATE OF THE ART AT THE
`TIME OF THE ALLEGED INVENTION
`
25. A speech signal is generated by air flow emanating from the lungs, passed through the vocal cords, which may vibrate to create quasi-periodic air pulses; the pulses are then passed through the vocal tract (throat cavity, mouth cavity, and nasal cavity) and radiated outward through the lips and/or the nose. When the vocal cords vibrate, the generated quasi-periodic speech is called “voiced”; when they are at rest, the generated speech is more noise-like and is called “unvoiced”. A classical model for speech production (Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer, Prentice Hall, 1978) is illustrated below. This early model includes a switch for selecting between the “voiced” component generated by the periodic source and the “unvoiced” component generated by the noise source, each multiplied by an appropriate gain, to form the excitation signal; the excitation is then passed through a system that models the vocal tract using time-varying parameters, and the resulting signal is finally passed through the lip-radiation model to form the generated speech. The fundamental period of the vocal cords’ vibration is known as the pitch period. The vocal tract is typically modeled by a time-varying filter implemented with a set of parameters (or coefficients).
`
`
`
`10
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
`
`
26. A speech signal is much richer than simply voiced and unvoiced; for example, buzzing sounds like “z” and “v” include both periodic and noise components. Therefore, in later models, such as the one below, the switch was replaced by a mixer that combines a time-varying mixture of the two components. For simplicity, let us consider the following simplified diagram. For producing good-quality speech, the vocal-tract parameters, representing the vocal-tract resonances, are sufficiently updated every 20-30 ms, while the excitation components are sufficiently updated every 5-7 ms. It can be shown that the vocal-tract effect adds correlation among neighboring samples, which is often treated as short-term prediction or redundancy. The vocal cords’ quasi-periodicity is often treated as long-term prediction or redundancy.
`
`
`
`Figure 1. Block diagram of simplified speech production model
`
`27. Speech coding is used for representing speech signals in a compact
`
`form for transmission or storage. The compression is typically achieved by first
`
`capturing and removing the redundancies in forms of short-term prediction and
`
`long-term prediction that exist in the speech signal, and forming a residual signal
`
`having a much smaller dynamic range and energy. Then the residual signal is
`
`quantized, a process of representing it by a finite number of bits and limited
`
`resolution. The difference between the reduced resolution signal and the original
`
`signal is considered quantization noise (or quantization error). The number of bits
`
`
`
`12
`
`Periodic
`component’s gain
`
`Vocal chords
`periodic component
`(pitch)
`
`Aspiration
`air flow
`(Noise)
`
`x
`
`x
`
`Noisy component’s
`gain
`
`+
`
`Excitation
`Signal
`
`(5-7 ms
`intervals)
`
`Vocal Tract model
`(20-30ms intervals)
`
`Speech
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`used for the quantization governs the amount of quantization noise, or alternatively
`
`the overall quality of the coded speech.
`
A. Speech coding and Linear Prediction Coding (LPC) analysis
`
28. Linear Predictive Coding (LPC) is a well-known analysis for calculating the short-term prediction coefficients that are used to model the vocal tract. It is typically applied directly to the input speech signal, and the LPC coefficients are encoded (or quantized) once per frame of 20-30 ms, which is an adequate interval given the speed at which the vocal tract changes.
`
29. The LPC coefficients are used to capture and remove the short-term prediction from the input speech: the LPC coefficients weight a sum of the most recent (typically 10) samples to predict the present sample, which is essentially a filtering operation. The short-term predicted sample is subtracted from the input sample to generate the prediction error, also known as the residual signal, which has a much smaller dynamic range and energy than the input speech. For this reason the residual signal is more desirable for quantization than the speech itself, since, thanks to its smaller energy and dynamic range, its quantization error (or noise) is also smaller.
`
`
`
`13
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
Figure 2. Block diagram of open-loop short-term prediction removal example using LPC analysis
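The open-loop short-term prediction removal of Figure 2 can be sketched in a few lines. This is a minimal numerical illustration only, not the implementation of any codec discussed in this Declaration; the 10th-order analysis and the synthetic two-pole "vocal tract" signal are assumptions of the example.

```python
import numpy as np

def lpc_coefficients(x, order=10):
    """Autocorrelation-method LPC: solve the normal equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])  # a[k] weights x[n-1-k]

def short_term_residual(x, a):
    """Subtract the LPC-predicted sample from each input sample."""
    order = len(a)
    e = x.copy()
    for n in range(order, len(x)):
        e[n] = x[n] - np.dot(a, x[n - order:n][::-1])
    return e

# Synthetic voiced-like signal: 100 Hz pulse train (8 kHz sampling) shaped
# by a two-pole "vocal tract" resonance, plus a little noise.
rng = np.random.default_rng(0)
excitation = np.zeros(800)
excitation[::80] = 1.0
speech = np.zeros(800)
speech[0] = excitation[0]
speech[1] = excitation[1] + 1.6 * speech[0]
for n in range(2, 800):
    speech[n] = excitation[n] + 1.6 * speech[n - 1] - 0.81 * speech[n - 2]
speech += 0.01 * rng.standard_normal(800)

a = lpc_coefficients(speech)
e = short_term_residual(speech, a)
ratio = np.sum(e ** 2) / np.sum(speech ** 2)
print(ratio < 0.2)  # the residual carries far less energy than the speech
```

As in the figure, the residual that remains after prediction removal is the smaller-energy signal that would actually be quantized.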
`
`
`
30. The removal of the short-term prediction from the input speech to generate the residual signal is essentially a decomposition of the speech signal into its slowly varying vocal-tract characteristics (i.e., the LPC coefficients) and its rapidly changing excitation characteristic (i.e., the residual signal). Such a decomposition allows for applying slower LPC quantization at frame intervals of typically 20-30 ms, and faster residual encoding at a subframe rate of typically 5-7 ms. Examples of speech signals and their corresponding residual signals (scaled up in the figure for better viewing) are illustrated at Id. Fig. 8.5. As can be seen, the residual signals exhibit less short-term correlation among neighboring samples, and look more like noise and comb pulses, but still exhibit long-term correlation between neighboring pitch periods.
`
`
`
`14
`
`Input Speech
`sample
`
`Residual
`signal sample
`
`-
`
`Stored recent
`speech samples
`
`LPC Prediction
`(Short-Term
`Prediction)
`
`Predicted
`speech sample
`
`LPC Analysis
`
`LPC Quantization
`(every 20-30 ms)
`
`LPC
`coefficients
`
`Quantized LPC
`coefficients
`
`Quantized LPC code
`transmitted every
`20-30 ms
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
`31.
`
`In the frequency domain, the vocal tract shapes the speech spectral
`
`envelope, while the excitation component shapes the fine spectral structure about
`
`that envelope. The figure below illustrates from top to bottom time domain
`
`sampled speech segment, the corresponding residual signal that contains much less
`
`short-term prediction and energy, the speech signal’s (log) spectrum and the
`
`corresponding spectral envelope represented by the LPC, and the residual signal’s
`
`(log) spectrum which is flat since the spectral envelope captured by the LPC was
`
`removed from the speech. As illustrated, the speech was decomposed to the
`
`
`
`15
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`spectral envelope - represented by the LPC, and the fine spectral variations around
`
`the spectral envelope - represented by the residual signal. As illustrated, during
`
`voiced speech segments, the residual signal exhibits flat comb-like harmonic
`
`structure at the fundamental frequency’s (pitch) multiple frequencies (harmonics).
`
`During buzzy sounds like “v” and “z”, and during transitions, the harmonic
`
`structure appears more prominent in the lower frequency range, and the higher
`
`frequency range (e.g. 4-5kHz in the figure below) becomes less structured and
`
`more noise like. The LPC only captures the spectral envelope, and does not
`
`capture the harmonic structure. As illustrated, by removing the LPC spectral
`
`envelope from the speech in (c), the residual signal (d) has a flat spectrum that
`
`exhibits harmonic structure (during voiced speech).
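The distinction drawn above, that the LPC captures the smooth spectral envelope while the harmonic comb lives in the excitation, can be illustrated numerically. The synthetic 100 Hz voiced-like signal, the 10th-order analysis, and the chosen comparison frequencies are all assumptions of this sketch.

```python
import numpy as np

fs, n, order = 8000, 2400, 10
excitation = np.zeros(n)
excitation[::80] = 1.0                       # 100 Hz pitch harmonics
signal = np.zeros(n)
for i in range(2, n):                        # two-pole spectral envelope
    signal[i] = excitation[i] + 1.6 * signal[i - 1] - 0.81 * signal[i - 2]
signal += 1e-3 * np.random.default_rng(0).standard_normal(n)

# 10th-order LPC via the autocorrelation method.
r = np.correlate(signal, signal, mode="full")[n - 1:n + order]
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a = np.linalg.solve(R, r[1:order + 1])

freqs = np.fft.rfftfreq(n, 1 / fs)
sig_spec = np.abs(np.fft.rfft(signal))
lpc_env = 1.0 / np.abs(np.fft.rfft(np.concatenate(([1.0], -a)), n))

h = np.argmin(np.abs(freqs - 100.0))         # on the first harmonic
d = np.argmin(np.abs(freqs - 150.0))         # midway between harmonics
print(sig_spec[h] / sig_spec[d] > 3)         # deep harmonic comb in the speech
print(lpc_env[h] / lpc_env[d] < 2)           # but the LPC envelope is smooth
```

The signal spectrum has a large peak-to-dip ratio between a harmonic and the gap beside it, while the LPC envelope 1/|A(f)| varies smoothly through the same region, consistent with the point that the LPC does not represent the harmonic structure.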
`
`
`
`16
`
`
`
`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
32. This also illustrates why some statements in the ‘802 petition are completely wrong, for example at p. 7: “using harmonic structure represented the encoded LPC analysis…” Dr. Johnson is also flatly wrong about that, and the following statements in his reports reflect much confusion (Ex. 1035 regarding the ‘802 patent and Ex. 1036 regarding the ‘521 patent, p. 23, § 55): “such frequency extension can be accomplished in a simple manner by simply using the harmonic structure represented by the encoded LPC analysis…” A person of ordinary skill in the art knows that the LPC analysis does not represent the harmonic structure and cannot be used for that; rather, it is the residual signal that does so. The LPC represents merely the spectral envelope, with its resonances (formants) and valleys.
`
`33.
`
`Interestingly, the Ozawa is directed toward a heuristic preselection
`
`election of the long term prediction path for modeling the harmonic structure for
`
`the frame, based on the calculated LPC, and given that preselected single path the
`
`long-term prediction is determined for each subframe within the frame.
`
B. Long Term Prediction (Pitch Prediction)
`
34. As explained above, the vocal cords’ “pitch” periodicity generates long-term correlation between neighboring speech periods. For encoding purposes, such long-term correlation can be modeled and captured by means of long-term prediction. The parameters that are typically used to encode the long-term prediction are the pitch period, which may be integer or fractional, and the gain used to predict the present period from the past period. Such parameters may be viewed as describing a comb filter, where the comb pulses are spaced by the pitch period and their exponential progression is given by the gain. A simple scheme is illustrated below. In this scheme the past residual signal is stored and used to generate a predicted residual sample, which is then subtracted from the input residual signal to form a long-term prediction error, often referred to as the remnant signal.
`
`
`
Figure 3. Block diagram of simplified open-loop long-term prediction removal example
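The open-loop long-term prediction removal of Figure 3 can be sketched in the same spirit. Again, this is a toy illustration; the one-tap predictor, the lag search range, and the synthetic comb-pulse residual are assumptions of the example, not the scheme of any reference at issue.

```python
import numpy as np

def pitch_predict(residual, min_lag=40, max_lag=160):
    """One-tap long-term (pitch) predictor over a residual frame."""
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        past = residual[:-lag]
        cur = residual[lag:]
        denom = np.dot(past, past)
        if denom == 0:
            continue
        score = np.dot(cur, past) ** 2 / denom   # normalized correlation
        if score > best_score:
            best_lag, best_score = lag, score
    past = residual[:-best_lag]
    gain = np.dot(residual[best_lag:], past) / np.dot(past, past)
    remnant = residual.copy()
    remnant[best_lag:] -= gain * residual[:-best_lag]
    return best_lag, gain, remnant

# Synthetic residual: comb pulses every 80 samples plus a little noise.
rng = np.random.default_rng(1)
res = 0.05 * rng.standard_normal(640)
res[::80] += 1.0
lag, gain, remnant = pitch_predict(res)
print(lag)                                  # the 80-sample pitch period is recovered
print(np.sum(remnant**2) < np.sum(res**2))  # long-term redundancy is removed
```

After subtracting the gain-scaled past period, the remnant retains far less energy than the residual, which is what the encoder would then quantize.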
`
35. The remnant signal is a noise-like signal that exhibits almost no short-term and no long-term correlation, and its energy and dynamic range are much smaller than those of the input speech. It is very hard to encode, since it has no particular structure, and it is hard to tell how much its quantization error would affect the overall quality of the speech generated at the decoder. In other words, the simple encoding scheme described so far operates in open loop and does not really consider the speech signal generated by the decoder. We will discuss below better coding schemes that operate in closed loop and consider the synthesized speech.
`
`
`
`19
`
`Residual signal
`
`Residual sample
`
`Remnant
`signal sample
`
`-
`
`Stored previous
`period residual
`samples
`
`pitch Prediction
`(long-Term
`Prediction)
`
`Predicted
`residual sample
`
`Pitch (long-term)
`Analysis
`
`Pitch Quantization
`(evvery 5-7 ms)
`
`pitch
`coefficients
`
`Quantized pitch
`coefficients
`
`Quantized pitch
`code transmitted
`every 5-7 ms
`
`
`
`
`
`
`C. Quantization
`
`
36. Quantization is the process of reducing signal resolution, and representing the reduced-resolution signal by some code to be stored or transmitted. It can be done on a sample-by-sample basis using a table of quantization levels, where the quantization level is selected by the encoder (a scalar quantizer) using, for example, a nearest-neighbor criterion. Quantization can also be performed on a set of values, often called a vector (e.g., a set of LPC parameters or a set of consecutive signal samples), where quantized vectors are selected from multi-dimensional tables, also referred to as codebooks; again, the quantized vector is selected by the encoder (a vector quantizer, or VQ) using, for example, a nearest-neighbor criterion. For a given number of bits, a vector quantizer produces much better quality than a scalar quantizer, at the cost of increased computational complexity.
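The nearest-neighbor selection described above can be illustrated with a toy two-dimensional, 2-bit codebook. The codebook values and input vectors are arbitrary, chosen only for the example.

```python
import numpy as np

def nearest_neighbor_quantize(vectors, codebook):
    """Map each input vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance from every vector to every codeword.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# A 2-bit, 2-dimensional codebook: 4 codewords, so each vector costs 2 bits.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vectors = np.array([[0.1, -0.2], [0.9, 0.1], [0.4, 0.8]])
idx = nearest_neighbor_quantize(vectors, codebook)
print(idx)               # codeword indices transmitted: [0 1 2]
decoded = codebook[idx]  # what the decoder reconstructs from the indices
```

Only the indices are transmitted; the decoder looks the codewords back up in its copy of the codebook, which is why both ends must share the same trained table.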
`
37. Codebooks (and quantizers) are usually computed from training data gathered during system operation, such that the trained codebook minimizes some distortion measure over all the training data generated by the system. Once the system is changed, e.g., its order of operations is changed, or computational elements are added, removed, or modified, the codebook would no longer operate optimally, and a new training process needs to be performed to achieve adequate performance. In other words, systems having quantizers and/or codebooks may operate unpredictably once they are changed or combined with other systems.
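The training process described above can be sketched as a generalized Lloyd (k-means style) iteration. The synthetic clustered training set and codebook size are assumptions of the sketch, and the final comparison illustrates the point that a codebook trained on one system's data performs poorly once the data statistics change.

```python
import numpy as np

def distortion(data, codebook):
    """Mean squared error of nearest-neighbor quantization."""
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def train_codebook(training, codebook, iters=20):
    """Generalized Lloyd training: alternate nearest-neighbor
    assignment and centroid update until the codebook settles."""
    codebook = codebook.copy()
    for _ in range(iters):
        d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        for k in range(len(codebook)):
            members = training[idx == k]
            if len(members):
                codebook[k] = members.mean(axis=0)   # centroid update
    return codebook

rng = np.random.default_rng(2)
# Training vectors from four clusters, standing in for data the system produces.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
training = np.vstack([c + 0.2 * rng.standard_normal((200, 2)) for c in centers])

initial = training[::200].copy()          # one seed vector per cluster
trained = train_codebook(training, initial)
print(distortion(training, trained) <= distortion(training, initial))

# If the data statistics change (the "system" changes), the old codebook
# performs poorly until it is retrained.
shifted = training + 1.5
print(distortion(shifted, trained) > 5 * distortion(training, trained))
```

Each Lloyd iteration can only lower (never raise) the distortion on the training data, but that guarantee says nothing about data drawn from a changed system, which is the mismatch the paragraph warns about.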
`
38. Similarly, a person of ordinary skill in the art knows that one cannot com