Inter Partes Review of USPN 7,151,802
Declaration of Oded Gottesman, Ph.D. (Exhibit 2004)

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

ZTE USA, INC.
Petitioner

v.

SAINT LAWRENCE COMMUNICATIONS LLC
Patent Owner

Case: IPR2016-00704
Patent No. 7,151,802

DECLARATION OF ODED GOTTESMAN, PH.D.

Mail Stop "PATENT BOARD"
Patent Trial and Appeal Board
U.S. Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450

SLC 2004
TABLE OF CONTENTS

I.    INTRODUCTION ............................................................................ 1
      A.  Background ........................................................................... 1
      B.  Qualifications ........................................................................ 2
II.   LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY OPINIONS ............ 6
III.  TECHNICAL BACKGROUND AND STATE OF THE ART AT THE TIME OF THE ALLEGED INVENTION ............ 9
      A.  Speech coding and Linear Prediction Coding (LPC) analysis ........... 13
      B.  Long Term Prediction (Pitch Prediction) ................................... 18
      C.  Quantization ........................................................................ 20
      D.  Code Excited Linear Prediction (CELP) .................................... 21
      E.  Perceptual Weighting ............................................................. 28
      F.  Long term pitch prediction and using adaptive codebook .............. 29
      G.  CELP Decoding ..................................................................... 29
      H.  Speech Bandwidth extension ................................................... 30
      I.  Speech Quality ...................................................................... 31
      J.  Finite precision Considerations ................................................ 31
IV.   PERSON OF ORDINARY SKILL IN THE ART ........................................ 34
V.    OVERVIEW OF THE '802 PATENT ..................................................... 35
VI.   THE CLAIMS OF THE '802 PATENT ................................................... 39
VII.  LEGAL STANDARDS ....................................................................... 40
      A.  Requirements of a Method and System Patent ............................ 40
      B.  Obviousness ......................................................................... 41
VIII. CLAIM CONSTRUCTION ................................................................. 46
IX.   SUMMARY OF PRIOR ART TO THE '802 PATENT ALLEGED IN THIS PETITION (CASE IPR2016-00704) ............ 46
      A.  A 13.0 kbit/s wideband speech codec based on SB-ACELP ("Schnitzler") ............ 46
      B.  Reconstruction of Wideband Audio from Narrowband CELP Code ("Tasaki") ............ 50
      C.  16 kbit/s Wideband Speech Coding Based On Unequal Subbands ("Paulus") ............ 51
X.    GROUNDS FOR PATENTABILITY FOR EACH CLAIM OF THE '802 PATENT ............ 52
      A.  Ground 1: Schnitzler in View of Tasaki Does Not Render Obvious Claims 1-3, 8-11, 16, 25-27, 32-35, 40, 49, 50, 52, and 53 ............ 52
          1.  Schnitzler in view of Tasaki does not render obvious claims 1, 9, 25, 33, 49, 50, 52, and 53 ............ 52
          2.  Schnitzler in view of Tasaki also does not render obvious claims 2, 3, 8, 10, 11, 16, 26, 27, 32, 34, 35, and 40 ............ 55
          3.  No motivation to combine Schnitzler with Tasaki ............ 59
      B.  Ground 2: Schnitzler in View of the Knowledge of a Person of Skill in the Art Does Not Render Obvious Claims 1, 9, 25, 33, 49, 50, 52, and 53 ............ 59
      C.  Ground 3: Schnitzler in View of the Knowledge of a Person of Skill in the Art Further in View of Paulus Does Not Render Obvious Claims 2, 3, 8, 10, 11, 16, 26, 27, 32, 34, 35, and 40 ............ 66
          1.  No Motivation to Combine Schnitzler with Paulus ............ 66
      D.  Summary of Invalidity Analysis ............ 67
XI.   CONCLUSION ............ 68
BIBLIOGRAPHY ............ 70
DR. ODED GOTTESMAN – CURRICULUM VITAE ............ 72
I, Oded Gottesman, hereby declare as follows:

I.   INTRODUCTION

A.   Background

1.   My name is Oded Gottesman. I am a researcher and consultant working in areas related to speech and audio coding and enhancement, digital signal processing, telecommunications, networks, and location and positioning systems.

2.   I have been retained to act as an expert witness on behalf of SAINT LAWRENCE COMMUNICATIONS LLC ("Patent Owner") in connection with the above-captioned Petition for Inter Partes Review of U.S. Patent No. 7,151,802 ("Petition") submitted by ZTE USA, INC. ("Petitioner"). I understand that this proceeding involves U.S. Patent No. 7,151,802 ("the '802 Patent"), titled "High Frequency Content Recovering Method and Device for Over-Sampled Synthesized Wideband Signal." The '802 Patent is provided as Exhibit 1001.

3.   I understand that Petitioner challenges the validity of Claims 1-3, 8-11, 16, 25-27, 32-35, 40, 49, 50, 52, and 53 of the '802 Patent (the "challenged claims").

4.   I have reviewed and am familiar with the '802 Patent as well as its prosecution history. The '802 prosecution history is provided as Exhibit 1003. Additionally, I have reviewed the materials identified in Section III.
5.   As set forth below, I am familiar with the technology at issue as of the effective filing date of the '802 patent. I have been asked to provide my technical review, analysis, insights, and opinions regarding the prior art references that form the basis for the Petition. In forming my opinions, I have relied on my own experience and knowledge, my review of the '802 Patent and its file history, and the prior art references cited in the Petition.

6.   My opinions expressed in this Declaration rely to a great extent on my own personal knowledge and recollection. However, to the extent I considered specific documents or data in formulating the opinions expressed in this Declaration, such items are expressly referred to in this Declaration.

7.   I am being compensated for my time in connection with this inter partes review at my standard consulting rate, which is $525 per hour. My compensation is not contingent upon and in no way affects the substance of my testimony.

B.   Qualifications

8.   I am a citizen of the United States, and I am currently employed as the Chief Technology Officer ("CTO") of Compandent, Inc.

9.   My curriculum vitae, including my qualifications, a list of the publications that I have authored during my career, and a list of the cases in which, during the previous four years, I have testified as an expert at trial or by deposition, is attached to this report as Exhibit A. I expect to testify regarding my background, qualifications, and experience relevant to the issues in this investigation.

10.  I earned my Bachelor of Science degree in Electrical Engineering from Ben-Gurion University in 1988.

11.  In 1992, I earned my Master of Science degree in Electrical and Computer Engineering from Drexel University, which included performing research at AT&T Bell Labs, Murray Hill, at the time considered the "holy grail" of speech processing research worldwide. My research was in the area of wideband speech coding and was titled "Algorithm Development and Real-Time Implementation of High-Quality 32kbps Wideband Speech Low-Delay Code-Excited Linear Predictive (LD-CELP) Coder." The work continued prior research by E. Ordentlich and Y. Shoham, who was also my M.Sc. research advisor. As part of my work, I also implemented that algorithm in DSP assembly language on two DSPs running in parallel. I subsequently co-authored and published two articles about this work.

12.  I earned my Doctor of Philosophy degree in Electrical and Computer Engineering from the University of California at Santa Barbara in 2000.

13.  I have worked in the field of digital signal processing ("DSP") for over 25 years, and have extensive experience in DSP research, design, and development, as well as the design and development of DSP-related software and hardware. Presently, I am the CTO of Compandent, a technology company that develops and provides telecommunication and DSP-related algorithms, software and hardware products, real-time DSP systems, speech coding and speech enhancement-related projects, and DSP, software, and hardware-related services. While at Compandent, I have contributed to a speech coding algorithm and a noise canceller algorithm that have been adopted for secure voice communication by the U.S. Department of Defense and NATO. Currently, I am supporting the DoD's and NATO's use of these algorithms, and am performing real-time implementation projects for DoD and NATO vendors, as well as the Defense Advanced Research Projects Agency (DARPA).

14.  I have worked for numerous companies in the field of digital signal processing during my career. I am very familiar with most, if not all, speech coding, speech enhancement, audio coding, and video coding techniques for various applications. As part of my work, I have developed real-time DSP systems, DSP software for telephony applications, various serial communication software and hardware, and Internet communication software. I led real-time DSP and speech coding engineering groups in two high-tech companies before my present company (Comverse Technology, Inc. and Optibase Ltd.), and, at DSP Communications, Inc., I was involved with echo cancellation, noise cancellation, and the creation of state-of-the-art chipsets for cellular telephones.
15.  I have been working with, and have written programs for, personal computers since around 1986, initially on DOS and later on Windows 3.1, Windows 95, Windows 98, and Windows 2000, as well as Apple computers. Much of my programming concerned digital signal processing, particularly speech, audio, and image coding, and communications.

16.  Since 2001, I have also been providing expert technology services in patent disputes. My biography and experience relevant to my work in these matters are more fully detailed in Exhibit A.

17.  I have been the co-recipient, with Dr. Allen Gersho, of the Ericsson-Nokia Best Paper Award for the paper "Enhanced Waveform Interpolative Coding at 4 kbps," IEEE Workshop on Speech Coding, Finland, 1999. The IEEE Workshop on Speech Coding is an exclusive workshop for speech-coding researchers from around the world.

18.  I have authored and co-authored approximately eight journal publications, in addition to conference proceedings, technical articles, technical papers, book chapters, and technical presentations concerning a broad array of signal processing technology. I have also developed and taught many courses related to digital signal processing and signal processing systems. These courses have included introductory-level and advanced courses.
19.  I have several international patents related to the field of audio signal enhancement, including U.S. Patent Nos. 6,614,370; 7,643,996; 7,010,482; and 7,584,095.

20.  I am being compensated at the rate of $525 per hour for my work in connection with this matter. The compensation is not dependent in any way on the contents of this report, the substance of any further opinions or testimony that I may provide, or the ultimate outcome of this matter.

II.  LIST OF DOCUMENTS CONSIDERED IN FORMULATING MY OPINIONS

21.  In formulating my opinions, I have reviewed and considered all of the following documents:

EXHIBIT NO.   DESCRIPTION

1001   U.S. Patent No. 7,151,802
1002   File history of U.S. Patent No. 7,151,802
1003   Petition For Inter Partes Review Of U.S. Patent No. 7,151,802
1004   J. Schnitzler, "A 13.0 kbit/s Wideband Speech Codec Based on SB-ACELP," IEEE, pp. 157-160 (1998)
1005   J. Paulus and J. Schnitzler, "16 kbit/s Wideband Speech Coding Based on Unequal Subbands," IEEE, pp. 255-258 (1996)
1006   J. Paulus and J. Schnitzler, "Wideband Speech Coding for the GSM Fullrate Channel," ITG-Fachtagung Sprachkommunikation, pp. 11-14, September 1996
1007   ITU-T Recommendation G.729, "Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," March 1996
1008   GSM Enhanced Full Rate (EFR) Speech Transcoding (GSM 06.60) (1996)
1009   Honkanen, T., et al., "Enhanced Full Rate Speech Codec for IS-136 Digital Cellular System," IEEE, pp. 731-734 (1997)
1010   U.S. Pat. No. 5,455,888 ("Iyengar")
1011   U.S. Pat. No. 5,797,120 ("Ireton")
1012   Atal, B. and Remde, J., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," IEEE, pp. 614-617 (1982)
1013   Spanias, A., "Speech Coding: A Tutorial Review," Proceedings of the IEEE, 82(10):1539-1582 (1994)
1014   Federal Standard 1016 (February 14, 1991)
1015   ITU-T Recommendation G.728, "Coding of Speech at 16 kbit/s Using Low-Delay Code Excited Linear Prediction," September 1992
1016   Schroeder, M. and Atal, B., "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," IEEE, pp. 937-940 (1985)
1017   Tasaki, H., et al., "Reconstruction of Wideband Audio from Narrowband CELP Code," Acoustical Society of Japan, pp. 249-252 (1994)
1018   ITU G.722.2, Series G: Transmission Systems and Media, Digital Systems and Networks (2002)
1019   Oppenheim and Schafer, Discrete-Time Signal Processing, pp. 17-33 (1989)
1020   Kroon, P., "Regular-Pulse Excitation – A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(5):1054-1063 (1986)
1021   Chan, W., et al., "Enhanced Multistage Vector Quantization by Joint Codebook Design," IEEE Transactions on Acoustics, Speech, and Signal Processing, 40(11):1693-1697 (1992)
1022   Singhal, S. and Atal, B., "Improving Performance of Multi-Pulse LPC Coders at Low Bit Rates," IEEE, pp. 1.3.1-1.3.4 (1984)
1023   Ramachandran, R. and Kabal, P., "Pitch Prediction Filters in Speech Coding," IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(4):467-478 (1989)
1024   Atal, B. and Schroeder, M., "Predictive Coding of Speech Signals and Subjective Error Criteria," IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(3):247-254 (1979)
1025   Rabiner, L., "On the Use of Autocorrelation Analysis for Pitch Detection," IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1):24-33 (1977)
1026   Kleijn, W. and Ketchum, R., "Improved Speech Quality and Efficient Vector Quantization in SELP," IEEE, pp. 155-158 (1988)
1027   Marques, J., et al., "Improved Pitch Prediction With Fractional Delays in CELP Coding," IEEE, pp. 665-668 (1990)
1028   Chen, J. and Gersho, A., "Adaptive Postfiltering for Quality Enhancement of Coded Speech," IEEE Transactions on Speech and Audio Processing, 3(1):59-71 (1995)
1029   ITU G.722 – General Aspects of Digital Transmission Systems, Terminal Equipments (1988)
1030   Cheng, Y., et al., "Statistical Recovery of Wideband Speech From Narrowband Speech," IEEE Transactions on Speech and Audio Processing, 2(4):544-548 (1994) ("Cheng")
1031   Ordentlich, E. and Shoham, Y., "Low-Delay Code-Excited Linear-Predictive Coding of Wideband Speech at 32 kbps," IEEE, pp. 9-12 (1991)
1032   Avendano, C., et al., "Beyond Nyquist: Towards the Recovery of Broad-Bandwidth Speech from Narrow-Bandwidth Speech," Eurospeech '95 (1995)
1033   Foodeei, M., "Low-Delay Speech Coding at 16 kb/s and Below" (May 1991) ("Foodeei")
1034   Ordentlich, E., "Low-Delay Code-Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32 kbits/sec," Massachusetts Institute of Technology, pp. 1-132 (1990)
1035   Declaration of Michael T. Johnson, Ph.D., Case IPR2016-00704, Patent 7,151,802
2001   P. Mermelstein, "G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals," IEEE Comm. Mag., Vol. 26, No. 1, pp. 8-15, Jan. 1988
2002   Fuemmeler et al., "Techniques for the Regeneration of Wideband Speech from Narrowband Speech," EURASIP Journal on Applied Signal Processing 2001:0, 1-9 (Sep. 2001)
2003   C.H. Ritz et al., "Lossless Wideband Speech Coding," 10th Australian Int'l Conference on Speech Science & Technology, p. 249 (Dec. 2004)
2005   Alan V. Oppenheim and Ronald W. Schafer, Discrete-Time Signal Processing
22.  I have reviewed and am familiar with the response to the Petition submitted on behalf of Patent Owner with this Declaration, and I agree with the technical analysis that underlies the positions set forth in the response to the Petition.

23.  I have reviewed and am familiar with the Petition for inter partes review submitted by Petitioner, and I disagree with portions of it and with its conclusions. I have also reviewed and considered Dr. Johnson's report submitted on behalf of Petitioner, and I disagree with portions of it and with its conclusions.

24.  I may consider additional documents as they become available or others that are necessary to form my opinions. I reserve the right to revise, supplement, or amend my opinions based on new information and on my continuing analysis.

III. TECHNICAL BACKGROUND AND STATE OF THE ART AT THE TIME OF THE ALLEGED INVENTION

25.  A speech signal is generated by air flow emanating from the lungs. The air flow passes through the vocal cords, which may vibrate to create quasi-periodic air pulses, then passes through the vocal tract (throat cavity, mouth cavity, and nasal cavity), and radiates outward through the lips and/or the nose. When the vocal cords vibrate, the generated quasi-periodic speech is called "voiced"; when they are at rest, the generated speech is more noise-like and is called "unvoiced." A classical model for speech production ("Digital Processing of Speech Signals," L. R. Rabiner and R. W. Schafer, Prentice Hall, 1978) is illustrated below. This early model includes a switch for selecting between the "voiced" component generated by a periodic source and the "unvoiced" component generated by a noise source, each multiplied by an appropriate gain, to form the excitation signal. The excitation signal is then passed through a system that models the vocal tract using time-varying parameters, and the resulting signal is finally passed through a lip radiation model to form the generated speech. The fundamental period of the vocal cord vibration is known as the pitch period. The vocal tract is typically modeled by a time-varying filter implemented with a set of parameters (or coefficients).
26.  A speech signal is much richer than simply voiced and unvoiced. For example, buzzing sounds like "z" and "v" include both periodic and noise components, and therefore in later models, such as the one below, the switch was replaced by a mixer that combines a time-varying mixture of the two components. For simplicity, let us consider the following simplified diagram. For producing good quality speech, the vocal tract parameters, representing the vocal tract resonances, are sufficiently updated every 20-30 ms, while the excitation components are sufficiently updated every 5-7 ms. It can be shown that the vocal tract effect adds correlation among neighboring samples, which is often treated as short-term prediction or redundancy. The quasi-periodicity of the vocal cords is often treated as long-term prediction or redundancy.

Figure 1. Block diagram of simplified speech production model (a periodic vocal-cord component (pitch) and an aspiration/noise component, each scaled by its own gain, are summed to form the excitation signal at 5-7 ms intervals, which drives the vocal tract model, updated at 20-30 ms intervals, to produce speech).
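For illustration only, the following short sketch shows the kind of source-filter synthesis that Figure 1 describes. It is my own simplified example and is not drawn from the '802 patent or any cited reference; the sampling rate, pitch period, gains, and vocal tract coefficients are assumed values chosen only to make the block diagram concrete.

```python
# Illustrative sketch of the Figure 1 source-filter model (assumed parameter values).
import numpy as np
from scipy.signal import lfilter

fs = 8000                      # sampling rate in Hz (assumed)
frame_len = int(0.020 * fs)    # vocal tract parameters held for ~20 ms
pitch_period = fs // 100       # assumed 100 Hz pitch -> 80-sample period

# Excitation sources: quasi-periodic pulse train (vocal cords) and noise (aspiration).
n = np.arange(frame_len)
periodic = (n % pitch_period == 0).astype(float)
noise = np.random.randn(frame_len)

# Time-varying mixture of the two sources, each scaled by its own gain
# (in a real coder these gains would change every 5-7 ms subframe).
g_periodic, g_noise = 1.0, 0.1
excitation = g_periodic * periodic + g_noise * noise

# Vocal tract modeled as an all-pole filter 1/A(z); coefficients are illustrative only.
a = np.array([1.0, -1.3, 0.9, -0.2])
speech = lfilter([1.0], a, excitation)   # synthesized "speech" for this frame
```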
27.  Speech coding is used for representing speech signals in a compact form for transmission or storage. The compression is typically achieved by first capturing and removing the redundancies, in the form of short-term prediction and long-term prediction, that exist in the speech signal, and forming a residual signal having a much smaller dynamic range and energy. The residual signal is then quantized, a process of representing it by a finite number of bits and limited resolution. The difference between the reduced-resolution signal and the original signal is considered quantization noise (or quantization error). The number of bits used for the quantization governs the amount of quantization noise, or alternatively the overall quality of the coded speech.
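The dependence of quantization noise on the number of bits can be seen with a small illustrative experiment of my own (not drawn from the '802 patent or the cited references; the test signal and bit depths are arbitrary assumptions): uniformly quantizing a signal at a few bit depths shows the signal-to-quantization-noise ratio improving by roughly 6 dB for each added bit.

```python
# Illustrative sketch: uniform scalar quantization at several bit depths,
# showing that more bits means less quantization noise (~6 dB per bit).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
x /= np.max(np.abs(x))           # normalize to the quantizer's full scale [-1, 1]

for bits in (4, 6, 8):
    step = 2.0 / (2 ** bits)     # uniform step size over [-1, 1]
    xq = np.clip(np.round(x / step) * step, -1.0, 1.0)
    noise = x - xq               # quantization error
    snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))
    print(f"{bits} bits -> SNR ~ {snr_db:.1f} dB")
```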
A.   Speech coding and Linear Prediction Coding (LPC) analysis

28.  Linear Predictive Coding (LPC) is a well-known analysis for calculating the short-term prediction coefficients that are used to model the vocal tract. It is typically applied directly to the input speech signal, and the LPC coefficients are encoded (or quantized) once per frame of 20-30 ms, which is an adequate interval based on the speed at which the vocal tract changes.

29.  The LPC coefficients are used to capture and remove the short-term prediction from the input speech: the LPC coefficients are used to form a weighted sum of the most recent 10 samples to predict the present sample, which is essentially a filtering operation. The short-term predicted sample is subtracted from the input sample to generate the prediction error, also known as the residual signal, which has a much smaller dynamic range and energy than the input speech. For this reason the residual signal is more desirable for quantization than the speech, since, thanks to its smaller energy and dynamic range, its quantization error (or noise) is also smaller.

Figure 2. Block diagram of open-loop short-term prediction removal example using LPC analysis (stored recent speech samples feed an LPC (short-term) prediction of the present sample, which is subtracted from the input speech sample to form the residual signal sample; the LPC analysis and LPC quantization run in parallel, and the quantized LPC code is transmitted every 20-30 ms).

30.  The removal of the short-term prediction from the input speech to generate the residual signal is essentially a decomposition of the speech signal into its slowly varying vocal tract characteristics (i.e., the LPC coefficients) and its rapidly changing excitation characteristic (i.e., the residual signal). Such a decomposition allows for applying slower LPC quantization at frame intervals of typically 20-30 ms, and faster residual encoding at a subframe rate of typically 5-7 ms. Examples of speech signals and their corresponding residual signals (scaled up in the figure for better viewing) are illustrated at Id., Fig. 8.5. As can be seen, the residual signals exhibit less short-term correlation among neighboring samples and look more like noise and comb pulses, but still exhibit long-term correlation between neighboring pitch periods.
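To make the short-term prediction removal of Figure 2 concrete, the following is a minimal sketch of my own (not code from the '802 patent or the cited references), using the standard autocorrelation method with the Levinson-Durbin recursion for an order-10 LPC analysis and then filtering the frame with the prediction-error filter A(z) to obtain the residual. The test frame and sampling rate are assumed values; the point is simply that the residual energy comes out much smaller than the frame energy, as described above.

```python
# Illustrative sketch: order-10 LPC by the autocorrelation method (Levinson-Durbin),
# followed by prediction-error filtering to obtain the residual signal.
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order=10):
    """Return the prediction-error filter A(z) = [1, a1, ..., aP] via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                  # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)                                 # remaining prediction error
    return a

# Synthetic 20 ms "speech" frame at an assumed 8 kHz rate, just to exercise the analysis.
fs = 8000
t = np.arange(int(0.020 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
frame += 0.01 * np.random.randn(len(t))

a = lpc(frame, order=10)
residual = lfilter(a, [1.0], frame)      # A(z) removes the short-term prediction
print("frame energy   :", np.sum(frame ** 2))
print("residual energy:", np.sum(residual ** 2))   # much smaller, as described above
```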
31.  In the frequency domain, the vocal tract shapes the speech spectral envelope, while the excitation component shapes the fine spectral structure about that envelope. The figure below illustrates, from top to bottom: a time-domain sampled speech segment; the corresponding residual signal, which contains much less short-term prediction and energy; the speech signal's (log) spectrum together with the corresponding spectral envelope represented by the LPC; and the residual signal's (log) spectrum, which is flat since the spectral envelope captured by the LPC was removed from the speech. As illustrated, the speech was decomposed into the spectral envelope, represented by the LPC, and the fine spectral variations around the spectral envelope, represented by the residual signal. As illustrated, during voiced speech segments the residual signal exhibits a flat, comb-like harmonic structure at multiples (harmonics) of the fundamental (pitch) frequency. During buzzy sounds like "v" and "z", and during transitions, the harmonic structure appears more prominent in the lower frequency range, while the higher frequency range (e.g., 4-5 kHz in the figure below) becomes less structured and more noise-like. The LPC only captures the spectral envelope and does not capture the harmonic structure. As illustrated, by removing the LPC spectral envelope from the speech in (c), the residual signal in (d) has a flat spectrum that exhibits harmonic structure (during voiced speech).
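The distinction drawn above, namely that the LPC represents only the spectral envelope while the harmonic structure remains in the residual, can be illustrated with another short sketch of my own (not from the patent or the prior art). Here the autocorrelation normal equations are solved directly, an equivalent alternative to the Levinson-Durbin recursion in the previous sketch, and the smooth LPC envelope is compared against the comb-like spectra of a synthetic voiced frame and its residual; the pitch, number of harmonics, and frame length are assumed values.

```python
# Illustrative sketch: LPC envelope (smooth) versus the harmonic structure that
# stays in the residual spectrum for a synthetic voiced frame.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter, freqz

def lpc_autocorr(frame, order=10):
    """Autocorrelation-method LPC, solved directly from the normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    coeffs = solve_toeplitz(r[:order], r[1:order + 1])   # predictor coefficients a_k
    return np.concatenate(([1.0], -coeffs))              # A(z) = 1 - sum a_k z^-k

fs = 8000
t = np.arange(int(0.032 * fs)) / fs
# Synthetic "voiced" frame: harmonics of an assumed 125 Hz pitch plus a little noise.
frame = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in range(1, 16))
frame += 0.01 * np.random.randn(len(t))

a = lpc_autocorr(frame, order=10)              # models the spectral envelope only
residual = lfilter(a, [1.0], frame)            # fine (harmonic) structure stays here

w, h_env = freqz([1.0], a, worN=512)           # smooth LPC envelope |1/A(e^jw)|
spec_speech = np.abs(np.fft.rfft(frame))       # envelope plus harmonic peaks
spec_resid = np.abs(np.fft.rfft(residual))     # roughly flat, comb-like pitch harmonics
```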
`

`

`
`
`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`
`32. This also illustrates why some statements in the ‘802 petition are
`
`completely wrong, for example in p.7 “using harmonic structure represented the
`
`encoded LPC analysis…” Dr. Johnson is also flatly wrong about that, and the
`
`following statements in his reports reflect much confusion (Ex. 1035 regarding
`
`‘802 patent and Ex. 1036 regarding ‘521 patent, p.23 section.55) “such frequency
`
`
`
`17
`
`

`

`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`extension can be accomplished in a simple manner by simply using the harmonic
`
`structure represented by the encoded LPC analysis…” Person of ordinary skills in
`
`the art knows that the LPC analysis does not represent the harmonic structure and
`
`cannot be used for that, rather than the residual signal does so. The LPC represents
`
`merely the spectral envelope, its resonances (formants) and valleys.
`
`33.
`
`Interestingly, the Ozawa is directed toward a heuristic preselection
`
`election of the long term prediction path for modeling the harmonic structure for
`
`the frame, based on the calculated LPC, and given that preselected single path the
`
`long-term prediction is determined for each subframe within the frame.
`
`B.
`
`Long Term Prediction (Pitch Prediction)
`
`34. As explained above, the vocal chords “pitch” periodicity generates
`
`long term correlation between neighboring speech periods. For encoding purposes,
`
`such long-term correlation can be modeled and captured by means of long-term
`
`prediction. The parameters that are typically used to encode the long-term
`
`prediction are the pitch period which may be integer or fractional, and the gain
`
`used to predict the present period from the past period. Such parameters may be
`
`viewed as describing a comb filter, where the comb pulses are spaced by the pitch
`
`period, and their exponential progression is given by the gain. A simple scheme is
`
`illustrated below. In this scheme the past residual signal is stored and used to
`
`generate a predicted residual sample which is then subtracted from the input
`
`
`
`18
`
`

`

`
`
`Inter Partes Review USPN 7,151,802
`Declaration of Oded Gottesman, Ph.D.
`
`
`residual signal to form a long-term prediction error often referred to as the remnant
`
`signal.
`
`
`
`Figure 3. Block diagram of simplified open-loop long-term prediction
`
`removal example
`
`35. The remnant signal is noise like signal that exhibits almost no short
`
`term and no long-term correlation, and its energy and dynamic range are much
`
`smaller than the input speech. It is very hard to encode since it has no particular
`
`structure, and it is hard to tell how much its quantization error would affect the
`
`overall generated speech quality at the decoder. In other words, the simple
`
`encoding scheme described so far was performed in open-loop and did not really
`
`consider the speech signal generated by the decoder. We will discuss below better
`
`coding scheme that operate in closed-loop and consider the synthesized speech.
`
`
`
`19
`
`Residual signal
`
`Residual sample
`
`Remnant
`signal sample
`
`-
`
`Stored previous
`period residual
`samples
`
`pitch Prediction
`(long-Term
`Prediction)
`
`Predicted
`residual sample
`
`Pitch (long-term)
`Analysis
`
`Pitch Quantization
`(evvery 5-7 ms)
`
`pitch
`coefficients
`
`Quantized pitch
`coefficients
`
`Quantized pitch
`code transmitted
`every 5-7 ms
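The open-loop scheme of Figure 3 can be sketched as follows. This is my own illustration on a synthetic residual, not code from the '802 patent or the cited references, and the subframe length, lag search range, and pitch period are assumed values: the encoder searches for the pitch lag that maximizes the normalized correlation with the past residual, computes the corresponding long-term prediction gain, and subtracts the scaled past period to form the remnant.

```python
# Illustrative sketch of open-loop long-term (pitch) prediction on a residual signal.
import numpy as np

fs = 8000
subframe = int(0.005 * fs)                     # 5 ms subframe (40 samples)
true_period = 57                               # assumed pitch period in samples

# Synthetic "residual": pulses repeating every pitch period plus a little noise.
n = np.arange(4 * true_period + subframe)
residual = (n % true_period == 0).astype(float) + 0.05 * np.random.randn(len(n))

cur = residual[-subframe:]                     # current subframe to be predicted
best_lag, best_score = None, -np.inf
for lag in range(20, 147):                     # assumed (typical narrowband) lag range
    past = residual[-subframe - lag:-lag]      # same subframe, one candidate period back
    score = np.dot(cur, past) ** 2 / (np.dot(past, past) + 1e-12)
    if score > best_score:
        best_lag, best_score = lag, score

past = residual[-subframe - best_lag:-best_lag]
gain = np.dot(cur, past) / (np.dot(past, past) + 1e-12)   # long-term prediction gain
remnant = cur - gain * past                    # long-term prediction error (remnant)
print("estimated pitch lag:", best_lag, "gain:", round(gain, 2))
```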
C.   Quantization

36.  Quantization is the process of reducing signal resolution and representing the reduced-resolution signal by some code to be stored or transmitted. It can be done on a sample-by-sample basis, using a table of quantization levels, where the quantization level is selected by the encoder (scalar quantizer) using, for example, a nearest-neighbor criterion. Quantization can also be performed on a set of values, often called a vector (e.g., a set of LPC parameters or a set of consecutive signal samples), where quantized vectors are selected from multi-dimensional tables also referred to as codebooks; again, the quantized vector is selected by the encoder (vector quantizer, VQ) using, for example, a nearest-neighbor criterion. For a given number of bits, a vector quantizer produces much better quality than a scalar quantizer, at the cost of increased computational complexity.
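The nearest-neighbor selection described above can be sketched in a few lines. The table and codebook below are made-up values of my own for illustration and are not taken from the '802 patent or the cited references; only the selected index would be stored or transmitted.

```python
# Illustrative sketch of nearest-neighbor scalar and vector quantization.
import numpy as np

# Scalar quantizer: table of levels, pick the nearest one for each sample.
levels = np.array([-0.75, -0.25, 0.25, 0.75])        # assumed 2-bit scalar table
x = np.array([0.1, -0.6, 0.9])
scalar_idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
x_hat = levels[scalar_idx]                            # quantized samples

# Vector quantizer: codebook of codevectors, pick the nearest for the whole vector.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, -1.0], [-1.0, 0.5]])  # assumed 2-bit VQ
v = np.array([0.9, -0.8])
vq_idx = int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))  # nearest-neighbor search
v_hat = codebook[vq_idx]                              # only vq_idx needs to be transmitted
print(scalar_idx, x_hat, vq_idx, v_hat)
```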
37.  Codebooks (and quantizers) are usually computed by training on data gathered during system operation, such that the trained codebook minimizes some distortion measure over all the training data generated by the system. Once the system is changed, e.g., its order of operations is changed, or computation elements are added, removed, or modified, the codebook would no longer operate optimally, and a new training process would need to be performed to achieve adequate performance. In other words, systems having quantizers and/or codebooks may operate unpredictably once they are changed or combined with other systems.
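Codebook training of the kind described above is commonly performed with an iterative nearest-neighbor assignment and centroid update (a generalized Lloyd, or k-means style, procedure). The sketch below is my own simplified illustration on synthetic training vectors with assumed sizes; it is not the training procedure of any particular coder, but it shows a codebook being fitted to minimize average distortion over its training data.

```python
# Illustrative sketch of codebook training (generalized Lloyd / k-means style):
# alternate nearest-neighbor assignment and centroid update over the training data.
import numpy as np

rng = np.random.default_rng(1)
training = rng.standard_normal((2000, 2))             # training vectors from the "system"
codebook = training[rng.choice(len(training), 8, replace=False)].copy()  # 3-bit codebook

for _ in range(20):                                    # a few training iterations
    # Assignment step: nearest codevector for every training vector.
    d = np.sum((training[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    nearest = np.argmin(d, axis=1)
    # Centroid step: move each codevector to the mean of the vectors assigned to it.
    for j in range(len(codebook)):
        members = training[nearest == j]
        if len(members):
            codebook[j] = members.mean(axis=0)

d = np.sum((training[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
avg_distortion = float(np.mean(np.min(d, axis=1)))     # distortion the training minimizes
print("average training distortion:", round(avg_distortion, 4))
```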
38.  Similarly, a person of ordinary skill in the art knows that one cannot com
