IPR2017-01244
`Saint Lawrence Communications
`Exhibit 2009
`
`ENHANCING WAVEFORM INTERPOLATIVE CODING WITH
`WEIGHTED REW PARAMETRIC QUANTIZATION
`Oded Gottesman and Allen Gersho
`Signal Compression Laboratory, Department of Electrical and Computer Engineering
`University of California, Santa Barbara, California 93106, USA
`E-mail: [oded, gersho]@scl.ece.ucsb.edu, Web: http://scl.ece.ucsb.edu
`
functions, such as orthonormal polynomials [4]-[6]. Such a
representation usually produces a smoother REW magnitude, and
improves the perceptual quality. Suppose the REW magnitude, $R(\omega)$, is
represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i\, \psi_i(\omega), \qquad 0 \le \omega \le \pi \tag{1}$$

where $\omega$ is the angular frequency, and $I$ is the representation order. The
`REW magnitude is typically an increasing function of the frequency,
`which can be coarsely quantized with a small number of bits per
`waveform without significant perceptual degradation. Therefore, it may
`be advantageous to represent the REW magnitude in a simple, but
perceptually relevant manner. Suppose the REW is modeled by the
following parametric representation, $R(\omega, \xi)$:

$$R(\omega, \xi) = \sum_{i=0}^{I-1} \gamma_i(\xi)\, \psi_i(\omega), \qquad 0 \le \omega \le \pi;\quad 0 \le \xi \le 1 \tag{2}$$

where $\gamma(\xi) = [\gamma_0(\xi), \ldots, \gamma_{I-1}(\xi)]^T$ is a parametric vector of coefficients
within the representation model subspace, and $\xi$ is the "unvoicing"
parameter, which is zero for a fully voiced spectrum and one for a fully
unvoiced spectrum.
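As an illustration of the fixed-dimension representation in (1), the sketch below projects a sampled magnitude onto a small orthonormal basis. It is only a sketch under stated assumptions: the cosine basis, the midpoint-rule integration, the toy spectrum, and all function names are hypothetical stand-ins, not the orthonormal polynomials or any routine used in the paper.

```python
import math

def cosine_basis(i, w):
    """Hypothetical orthonormal basis on [0, pi] (an assumption, not the paper's polynomials)."""
    if i == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(i * w)

def project(spectrum, order, num_points=512):
    """Approximate gamma_i = integral_0^pi R(w) psi_i(w) dw with the midpoint rule."""
    step = math.pi / num_points
    grid = [(k + 0.5) * step for k in range(num_points)]
    return [sum(spectrum(w) * cosine_basis(i, w) for w in grid) * step
            for i in range(order)]

def reconstruct(gamma, w):
    """Evaluate R(w) = sum_i gamma_i psi_i(w), as in (1)."""
    return sum(g * cosine_basis(i, w) for i, g in enumerate(gamma))

def toy_spectrum(w):
    """Toy magnitude that increases with frequency (illustrative only)."""
    return 1.0 - 0.5 * math.cos(w)

# Order-4 representation: a fixed-length vector regardless of the pitch period.
gamma = project(toy_spectrum, order=4)
```

Because the toy spectrum lies in the span of the first two basis functions, only the first two coefficients are non-negligible, and `reconstruct(gamma, w)` reproduces it closely.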
`
`2.2 Piecewise Linear REW Representation
For practical considerations we may assume that the parametric
representation is piecewise linear, and may be represented by a set of $N$
uniformly spaced spectra, $\{R(\omega, \xi_n)\}_{n=0}^{N-1}$, as illustrated in Figure 1.
This representation is similar to the hand-tuned REW codebook in [5],
[6]. The parametric surface is linearly interpolated by:

$$R(\omega, \xi) = (1-\alpha)\, R(\omega, \xi_{n-1}) + \alpha\, R(\omega, \xi_n); \qquad \xi_{n-1} \le \xi \le \xi_n; \quad \alpha = \frac{\xi - \xi_{n-1}}{\xi_n - \xi_{n-1}} \tag{3}$$

From the linearity of the representation:

$$\gamma(\xi) = (1-\alpha)\, \gamma_{n-1} + \alpha\, \gamma_n \tag{4}$$

where $\gamma_n$ is the coefficient vector of the $n$-th REW magnitude
representation, i.e. $\gamma_n = \gamma(\xi_n)$.
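The interpolation in (3) and (4) can be sketched in a few lines. This is a minimal illustration, assuming the uniformly spaced nodes run from 0 to 1 inclusive (the paper only states that the spectra are uniformly spaced and that the parameter lies between zero and one); the function name and the list-of-lists layout are hypothetical.

```python
def interpolate_coefficients(xi, gammas):
    """Return gamma(xi) per (3)-(4) for N uniformly spaced node vectors on [0, 1]."""
    n_spectra = len(gammas)
    if n_spectra < 2:
        raise ValueError("need at least two representation vectors")
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    spacing = 1.0 / (n_spectra - 1)                # assumed grid: xi_n = n * spacing
    n = min(int(xi / spacing) + 1, n_spectra - 1)  # index of the upper node
    alpha = (xi - (n - 1) * spacing) / spacing     # interpolation factor in (3)
    lower, upper = gammas[n - 1], gammas[n]
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(lower, upper)]
```

For example, with three node vectors the call `interpolate_coefficients(0.25, nodes)` returns the midpoint of the first two vectors, since 0.25 is halfway between the first two nodes.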
`2.3 REW Modeling
`2.3.1 Non-Weighted Distortion
Suppose for a REW magnitude, $R(\omega)$, represented by some coefficient
vector, $\gamma$, we search for the parameter value, $\xi(\gamma)$, in
$\xi_{n-1} \le \xi \le \xi_n$, whose respective representation vector, $\gamma(\xi)$, minimizes the MSE
distortion between the two spectra:

$$D(R, R(\xi)) = \int_0^{\pi} \left| R(\omega) - (1-\alpha)\, R(\omega, \xi_{n-1}) - \alpha\, R(\omega, \xi_n) \right|^2 d\omega \tag{5}$$

From orthonormality, the distortion is equal to:

$$D(R, R(\xi)) = \left\| \gamma - \gamma(\xi) \right\|^2 = \left\| \gamma - (1-\alpha)\, \gamma_{n-1} - \alpha\, \gamma_n \right\|^2 \tag{6}$$
`
`ABSTRACT
`This paper presents an efficient quantization technique for the rapidly-
`evolving waveforms in waveform interpolative (WI) coders. The
`scheme, based on a parametrization of the rapidly-evolving waveform
`(REW) magnitude, and analysis-by-synthesis (AbS) vector quantization
`(VQ) of the REW parameters, allows both higher temporal and spectral
`resolution of the REW. A perceptually weighted distortion measure
`takes advantage of spectral and temporal masking and leads to improved
`reconstructed speech quality, most notably in mixed voiced and
`unvoiced speech segments. The technique is an important component of
`the Enhanced Waveform Interpolative (EWI) speech coder at 2.8 kbps
`that achieves a subjective quality slightly better than that of G.723.1 at
`6.3 kbps.
`
`1. INTRODUCTION
`In WI coding [1]-[10], the similarity between successive REW
`magnitudes is generally exploited by downsampling and interpolation
`and by constrained bit allocation [1]. In our earlier EWI coder [7]-[9],
`the REW magnitude was quantized on a waveform by waveform basis,
`and with an excessive number of bits – more than is perceptually
`required. Here we propose a novel parametric representation of the REW
`magnitude and an efficient paradigm for AbS predictive vector
`quantization of the REW parameter sequence. The proposed scheme is
`discussed, and a simplified version is derived. The quantization scheme
`employs a perceptually weighted distortion measure, which takes
`advantage of spectral and temporal masking. The new method achieves
`a substantial reduction in the REW bit rate.
`2. REW QUANTIZATION
`Efficient REW quantization can benefit from two observations: (a) the
`REW magnitude is typically an increasing function of the frequency,
`which suggests that an efficient parametric representation may be used;
`(b) one can observe similarity between successive REW magnitude
`spectra, which suggests that employing predictive VQ on a group of
`adjacent REWs may yield useful coding gains. The next four sections
`introduce the REW parametric representation and the associated VQ
`technique.
`
`2.1 REW Parameterization
`Direct quantization of the REW magnitude is a variable dimension
`quantization problem, which may result
`in spending bits and
`computational effort on perceptually irrelevant information. A simple
`and practical way
`to obtain a reduced, and fixed dimension
`representation of the REW is with a linear combination of basis
`
`This work was supported in part by the University of California MICRO program,
`Cisco Systems, Inc., Conexant Systems, Inc., Dialogic Corp., Fujitsu Laboratories of
`America, Inc., General Electric Corp., Hughes Network Systems, Lernout & Hauspie
`Speech Products NV, Lucent Technologies, Inc., Nokia Mobile Phones, Panasonic
`Speech Technology Laboratory, Qualcomm, Inc. and Texas Instruments, Inc.
`
`IEEE Workshop on Speech Coding, 2000 ©
`
`IPR2017-01077
`Saint Lawrence Communications
`Exhibit 2009
`
`

`

all the vectors, $\hat{c}_{ij}(m)$, of the codebook. The quantized REW function,
$\hat{R}(m)$, is a function of the quantized parameter
coefficients vector, $\hat{\gamma}(m)$, which is obtained by passing the quantized vector,
$\hat{c}_{ij}(m)$,
through the synthesis filter. The weighted distortion between each pair
`of input and quantized REW spectra is calculated. The total distortion is
`a temporally-weighted sum of the M spectrally weighted distortions.
`Since the predictor coefficients are known, direct VQ can be used to
`simplify the computations. For a piecewise linear parametric REW
`representation, a substantial simplification of the search computations
`may be obtained by
`interpolating
`the distortion between
`the
`representation spectra set.
[Figure 2: block diagram with inputs gain g(m), LPC a(m), and the REW representation coefficient vector; a predictor codebook (P_i), a vector quantizer codebook, a synthesis filter 1/(1 - P_i z^-1), spectral weighting, temporal weighting, squared-error and minimization blocks.]
Fig. 2. REW Parametric Representation AbS VQ
`2.4.2 Simplified Parametric Quantization Scheme
`The above scheme maps each quantized parameter to a coefficient
`vector, which is used to compute the spectral distortion. To reduce
`complexity, such a mapping, and spectral distortion computation may be
`eliminated by using the simplified scheme described below. For high
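rate, and smooth representation function, the total distortion is equal to the
sum of modeling distortion and quantization distortion:

$$\sum_{m=1}^{M} D_w\!\left(R(m), \hat{R}(\hat{\xi}(m))\right) = \sum_{m=1}^{M} D_w\!\left(R(m), R(\xi(m))\right) + \sum_{m=1}^{M} D_w\!\left(R(\xi(m)), \hat{R}(\hat{\xi}(m))\right) \tag{12}$$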
`
The optimal interpolation factor that minimizes the MSE is:

$$\alpha_{opt} = \frac{(\gamma_n - \gamma_{n-1})^T (\gamma - \gamma_{n-1})}{\left\| \gamma_n - \gamma_{n-1} \right\|^2} \tag{7}$$

and the respective optimal parameter value, which is a continuous
variable between zero and one, is given by:

$$\xi(\gamma) = (1 - \alpha_{opt})\, \xi_{n-1} + \alpha_{opt}\, \xi_n \tag{8}$$
`This result allows a rapid search for the best unvoicing parameter value
`needed to transform the coefficient vector to a scalar parameter, for
`encoding or for VQ design.
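A minimal sketch of this search follows, evaluating (7), (6) and (8) on every interval and keeping the best one. Several details are assumptions rather than the paper's procedure: the nodes are taken as uniformly spaced from 0 to 1, the interpolation factor is clamped to stay inside each interval, and the function and argument names are hypothetical.

```python
def map_to_parameter(gamma, grid_gammas):
    """Find the xi in [0, 1] whose piecewise-linear gamma(xi) is closest to gamma.

    Assumes uniformly spaced nodes xi_n = n / (N - 1). Returns (xi, squared_error).
    """
    n_nodes = len(grid_gammas)
    spacing = 1.0 / (n_nodes - 1)
    best_xi, best_err = 0.0, float("inf")
    for n in range(1, n_nodes):
        lo, hi = grid_gammas[n - 1], grid_gammas[n]
        delta = [h - l for h, l in zip(hi, lo)]       # gamma_n - gamma_{n-1}
        resid = [g - l for g, l in zip(gamma, lo)]    # gamma - gamma_{n-1}
        energy = sum(d * d for d in delta)
        # Eq. (7); the clamp to [0, 1] is an added safeguard, not from the paper.
        alpha = 0.0 if energy == 0.0 else sum(d * r for d, r in zip(delta, resid)) / energy
        alpha = min(1.0, max(0.0, alpha))
        err = sum((r - alpha * d) ** 2 for r, d in zip(resid, delta))  # Eq. (6)
        if err < best_err:
            # Eq. (8) on the assumed uniform grid
            best_xi = (1.0 - alpha) * (n - 1) * spacing + alpha * n * spacing
            best_err = err
    return best_xi, best_err
```

Each interval costs two inner products, so the whole search is linear in the number of nodes and the representation order.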
`
[Figure 1: surface plot of the REW magnitude versus frequency [radians] and REW parameter.]
Fig. 1. REW Parametric Representation, $R(\omega, \xi)$
`2.3.2 Weighted Distortion
`Commonly in speech coding, quantization is performed with a
`perceptually weighted distortion measure. In this case, the weighted
`distortion between the input and the parametric representation modeled
spectra is equal to:

$$D_w(R, R(\xi)) = \int_0^{\pi} \left| R(\omega) - R(\omega, \xi) \right|^2 W(\omega)\, d\omega = \left(\gamma(\xi) - \gamma\right)^T \Psi(W) \left(\gamma(\xi) - \gamma\right) \tag{9}$$

where $\Psi(W)$ is the weighted correlation matrix of the orthonormal
functions, whose elements are:

$$\Psi_{i,j}(W) = \int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega \tag{10}$$

$\gamma$ is the input coefficient vector, and $\gamma(\xi)$ is the modeled parametric
coefficient vector. The optimal parameter that minimizes (9) is given by:

$$\alpha_{opt} = \frac{(\gamma_n - \gamma_{n-1})^T\, \Psi(W)\, (\gamma - \gamma_{n-1})}{(\gamma_n - \gamma_{n-1})^T\, \Psi(W)\, (\gamma_n - \gamma_{n-1})} \tag{11}$$

and the respective optimal parameter value is computed using (8).
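The weighted case can be sketched the same way: build the weighted correlation matrix of (10) numerically, then evaluate (11). This is an illustration only. The cosine basis, the midpoint-rule integration, and every name below are assumptions made for the example; the paper does not specify how the matrix is computed.

```python
import math

def correlation_matrix(weight, basis, order, num_points=512):
    """Midpoint-rule estimate of the matrix in Eq. (10) for a weighting function W(w)."""
    step = math.pi / num_points
    grid = [(k + 0.5) * step for k in range(num_points)]
    return [[sum(weight(w) * basis(i, w) * basis(j, w) for w in grid) * step
             for j in range(order)] for i in range(order)]

def quad_form(u, mat, v):
    """Return u^T mat v for plain Python lists."""
    return sum(ui * sum(mij * vj for mij, vj in zip(row, v)) for ui, row in zip(u, mat))

def weighted_alpha(gamma, gamma_lo, gamma_hi, psi_w):
    """Eq. (11): weighted optimal interpolation factor between two node vectors."""
    delta = [h - l for h, l in zip(gamma_hi, gamma_lo)]
    resid = [g - l for g, l in zip(gamma, gamma_lo)]
    return quad_form(delta, psi_w, resid) / quad_form(delta, psi_w, delta)

def cosine_basis(i, w):
    """Hypothetical orthonormal basis on [0, pi]; stands in for the paper's polynomials."""
    return 1.0 / math.sqrt(math.pi) if i == 0 else math.sqrt(2.0 / math.pi) * math.cos(i * w)
```

With a flat weighting the matrix reduces to the identity and `weighted_alpha` agrees with the unweighted factor in (7), which is a convenient sanity check.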
`
`2.4 REW Quantization
`2.4.1 Full Complexity Spectral Quantization Scheme
`A novel switched-predictive AbS REW parameter VQ paradigm is
`illustrated in Fig. 2. Switched-prediction is introduced to allow for
`different levels of REW parameter correlation. The scheme incorporates
`both spectral weighting and temporal weighting. The spectral weighting
`is used for the distortion between each pair of input and quantized
`spectra. In order to improve SEW/REW mixing, particularly in mixed
`voiced and unvoiced speech segments, and to increase speech crispness,
`especially for plosives and onsets, temporal weighting is incorporated in
`the AbS REW VQ. The temporal weighting is a monotonic function of
`the temporal gain. A codebook with two partitions is used. Each
partition has a particular predictor coefficient value, $P_i$, $i$ = 1 or 2. The
quantization target is an M-dimensional vector of REW spectra. Each
REW spectrum is represented by a vector of basis function coefficients
denoted by $\gamma(m)$. The search for the minimal WMSE is performed over
The quantization distortion is related to the quantized parameter by:

$$\sum_{m=1}^{M} D_w\!\left(R(\xi(m)), \hat{R}(\hat{\xi}(m))\right) = \sum_{m=1}^{M} \left(\hat{\gamma}(\hat{\xi}(m)) - \gamma(\xi(m))\right)^T \Psi(W(m)) \left(\hat{\gamma}(\hat{\xi}(m)) - \gamma(\xi(m))\right) \tag{13}$$

which, for the piecewise linear representation case, is equal to:

$$\sum_{m=1}^{M} D_w\!\left(R(\xi(m)), \hat{R}(\hat{\xi}(m))\right) = \frac{1}{\Delta^2} \sum_{m=1}^{M} \Delta\gamma_n(\xi(m))^T\, \Psi(W(m))\, \Delta\gamma_n(\xi(m)) \left(\xi(m) - \hat{\xi}(m)\right)^2 \tag{14}$$

where $\Delta = \xi_n - \xi_{n-1}$ is the node spacing and $\Delta\gamma_n(\xi(m)) = \gamma_n - \gamma_{n-1}$ for $\xi_{n-1} \le \xi(m) \le \xi_n$. The quantization
distortion is linearly related to the REW parameter squared quantization
error, $(\xi(m) - \hat{\xi}(m))^2$, and therefore justifies direct VQ of the REW
parameter.
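The step from (13) to (14) can be spelled out as follows. This is a sketch under two assumptions: the unquantized and quantized parameters fall in the same interval between adjacent nodes, and the symbol names ($\xi$ for the REW parameter, $\gamma(\xi)$ for the modeled coefficient vector, $\Psi(W)$ for the weighted correlation matrix, $\Delta$ for the node spacing) are chosen here for illustration.

```latex
\begin{align*}
\gamma(\xi) &= \gamma_{n-1} + \frac{\xi - \xi_{n-1}}{\Delta}\,(\gamma_n - \gamma_{n-1}),
  \qquad \Delta = \xi_n - \xi_{n-1}, \\
\gamma(\hat{\xi}) - \gamma(\xi) &= \frac{\hat{\xi} - \xi}{\Delta}\,(\gamma_n - \gamma_{n-1}), \\
\big(\gamma(\hat{\xi}) - \gamma(\xi)\big)^T \Psi(W) \big(\gamma(\hat{\xi}) - \gamma(\xi)\big)
  &= \frac{(\xi - \hat{\xi})^2}{\Delta^2}\,
     (\gamma_n - \gamma_{n-1})^T\, \Psi(W)\, (\gamma_n - \gamma_{n-1}).
\end{align*}
```

The quadratic form on the right depends only on the interval and the weighting, not on the quantizer output, which is why the distortion is proportional to the squared parameter error.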
`2.4.2.1 Simplified Scheme, Non-Weighted Distortion
`The encoder maps the REW magnitude to an unvoicing parameter, and
`then quantizes the parameter by AbS VQ, as illustrated in Fig. 3. This
`scheme allows for higher temporal as well as spectral REW resolution,
`since no downsampling is performed, and the continuous parameter is
`vector quantized in AbS [10].
`
`
Parameter        Bits / Frame    Bits / second
LPC              20              1000
Pitch            2x6 = 12        600
Gain             9               450
SEW magnitude    8               400
REW magnitude    7               350
Total            56              2800

Table 1. Bit allocation for 2.8 kbps EWI coder
`4. SUBJECTIVE RESULTS
`We have conducted a subjective A/B test to compare our 2.8 kbps EWI
`coder to the G.723.1. The test data included 24 modified intermediate
`reference system (M-IRS) [11] filtered speech sentences, 12 of which
`are of female speakers, and 12 of male speakers. Twelve listeners
`participated in the test. The test results, listed in Table 2, indicate that
`the subjective quality of the 2.8 kbps EWI is slightly better than that of
`G.723.1 at 6.3 kbps.
`
Test      2.8 kbps WI    6.3 kbps G.723.1    No Preference
Female    38.19%         36.81%              25.00%
Male      43.06%         31.94%              25.00%
Total     40.63%         34.38%              25.00%

Table 2. Results of subjective A/B test for comparison between
the 2.8 kbps EWI coder and 6.3 kbps G.723.1. With 95%
certainty the result lies within +/-5.59%.
`5. SUMMARY
We have found a new technique that enhances the performance of the
WI coder and allows for better coding efficiency. It offers an efficient
parametrization of the REW magnitude, and AbS VQ of the REW
parameter. This scheme allows for higher temporal as well as spectral
REW resolution. The weighted distortion scheme improves the
reconstructed speech quality, most notably in mixed voiced and
unvoiced speech segments. Subjective test results indicate that the
performance of the 2.8 kbps EWI coder slightly exceeds that of G.723.1
at 6.3 kbps, and therefore EWI achieves quality very close to toll quality, at least
under clean speech conditions.
`
6. REFERENCES
[1] W. B. Kleijn and J. Haagen, "A Speech Coder Based on Decomposition of
Characteristic Waveforms," IEEE ICASSP'95, pp. 508-511, 1995.
[2] W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and
Synthesis," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds.,
Elsevier Science B. V., Chapter 5, pp. 175-207, 1995.
[3] I. S. Burnett and D. H. Pham, "Multi-Prototype Waveform Coding using
Frame-by-Frame Analysis-by-Synthesis," IEEE ICASSP'97, pp. 1567-1570,
1997.
[4] W. B. Kleijn, Y. Shoham, D. Sen, and R. Hagen, "A Low-Complexity
Waveform Interpolation Coder," IEEE ICASSP'96, pp. 212-215, 1996.
[5] Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4
kbps," IEEE ICASSP'97, pp. 1599-1602, 1997.
[6] Y. Shoham, "Low-Complexity Speech Coding at 1.2 to 2.4 kbps Based on
Waveform Interpolation," International Journal of Speech Technology,
Kluwer Academic Publishers, pp. 329-341, May 1999.
[7] O. Gottesman, "Dispersion Phase Vector Quantization for Enhancement of
Waveform Interpolative Coder," IEEE ICASSP'99, vol. 1, pp. 269-272, 1999.
[8] O. Gottesman and A. Gersho, "Enhanced Waveform Interpolative Coding at 4
kbps," IEEE Speech Coding Workshop, pp. 90-92, 1999, Finland.
[9] O. Gottesman and A. Gersho, "Enhanced Analysis-by-Synthesis Waveform
Interpolative Coding at 4 kbps," EUROSPEECH'99, pp. 1443-1446, 1999,
Hungary.
[10] O. Gottesman and A. Gersho, "High Quality Enhanced Waveform
Interpolative Coding at 2.8 kbps," IEEE ICASSP'00, Turkey, 2000.
[11] ITU-T, "Recommendation P.830, Subjective Performance Assessment of
Telephone Band and Wideband Digital Codecs," Annex D, ITU, Geneva,
February 1996.
`
[Figure 3: block diagram in which the REW polynomial coefficients are mapped to the REW parameter, and a vector quantizer codebook drives a synthesis filter 1/(1 - P z^-1); the squared error between the parameter and its quantized version is minimized.]
Fig. 3. REW Parametric Representation AbS VQ.
`2.4.2.2 Simplified Scheme, Weighted Distortion
`We may improve the quantization scheme to incorporate spectral and
`temporal weightings, as illustrated in Fig. 4. The REW parameter vector
is first mapped to a REW parameter by minimizing a distortion, which is
weighted by the coefficient spectral weighting matrix, $\Psi(W)$, as described
in section 2.3.2. Then, the resulting REW parameter is used to compute a
weighting, $w_s(\xi(m))$, in the form of the spectral sensitivity to the REW
parameter squared quantization error, $(\xi(m) - \hat{\xi}(m))^2$, given by:

$$w_s(\xi(m)) = \left. \left( \frac{\partial \gamma(\xi)}{\partial \xi} \right)^T \Psi(W(m)) \left( \frac{\partial \gamma(\xi)}{\partial \xi} \right) \right|_{\xi = \xi(m)} \tag{15}$$

For the piecewise linear representation case it is equal to:

$$w_s(\xi(m)) = \frac{1}{\Delta^2}\, \Delta\gamma_n(\xi(m))^T\, \Psi(W(m))\, \Delta\gamma_n(\xi(m)) \tag{16}$$

with $\Delta = \xi_n - \xi_{n-1}$ and $\Delta\gamma_n(\xi(m)) = \gamma_n - \gamma_{n-1}$ taken on the interval that contains $\xi(m)$.
The above derivative can be computed off line. Additionally, a temporal
weighting, in the form of a monotonic function of the gain, may be used,
and denoted by $w_t(g(m))$. The AbS REW parameter quantization is
computed by minimizing the combined spectrally and temporally
weighted distortion:

$$D_w(\hat{\xi}, \xi) = \sum_{m=1}^{M} w_t(g(m))\, w_s(\xi(m)) \left(\xi(m) - \hat{\xi}(m)\right)^2 \tag{17}$$
`The weighted distortion scheme improves the reconstructed speech
`quality, most notably in mixed voiced and unvoiced speech segments.
`This may be explained by an improvement in REW/SEW mixing, and a
`less destructive REW contribution.
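The codebook search implied by (17) might look like the sketch below. It is a hedged illustration, not the coder's implementation: the first-order recursion is inferred from the 1/(1 - P z^-1) synthesis filter block in the figures, the carried-over filter state `prev_xi_hat`, the early-exit test, and all names are assumptions, and the weights are taken as precomputed lists.

```python
def search_codebook(xi, w_s, w_t, codebook, predictor, prev_xi_hat=0.0):
    """Pick the codevector minimizing sum_m w_t[m] * w_s[m] * (xi[m] - xi_hat[m])**2.

    xi_hat is synthesized as xi_hat[m] = c[m] + predictor * xi_hat[m - 1]
    (assumed first-order recursion). Returns (best_index, best_distortion, best_xi_hat).
    """
    best = (-1, float("inf"), [])
    for index, code in enumerate(codebook):
        state, xi_hat, dist = prev_xi_hat, [], 0.0
        for m, c in enumerate(code):
            state = c + predictor * state          # synthesis filter output
            xi_hat.append(state)
            dist += w_t[m] * w_s[m] * (xi[m] - state) ** 2
            if dist >= best[1]:
                break                              # this codevector cannot win
        else:
            best = (index, dist, xi_hat)           # loop finished with lower distortion
    return best
```

Because the distortion is a weighted squared error on a scalar per waveform, each candidate costs only M multiply-adds, with no mapping back to coefficient vectors inside the loop.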
[Figure 4: block diagram as in Fig. 3, with the gain g(m) and LPC a(m) feeding spectral weighting and a combined temporal/spectral weighting of the squared parameter error before minimization.]
Fig. 4. REW Parametric Representation Simplified Weighted AbS VQ
`3. BIT ALLOCATION
`The bit allocation for the 2.8 kbps EWI coder is given in Table 1. The
`frame length is 20 ms, and ten waveforms are extracted per frame. The
`line spectral frequencies (LSFs) are coded using predictive MSVQ,
having two stages of 10 bits each, a 2-bit increase compared to the past
`version of our coder [8], [9]. The pitch is coded twice per frame.
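The arithmetic behind Table 1 can be checked with a short script. The per-frame bit counts and the 20 ms frame length are taken from the text; the variable names are illustrative only.

```python
FRAME_MS = 20  # frame length stated in the text

# Bits per frame, taken from Table 1.
bits_per_frame = {
    "LPC": 20,
    "Pitch": 2 * 6,
    "Gain": 9,
    "SEW magnitude": 8,
    "REW magnitude": 7,
}

frames_per_second = 1000 // FRAME_MS
bits_per_second = {name: bits * frames_per_second
                   for name, bits in bits_per_frame.items()}
total_bits = sum(bits_per_frame.values())
total_rate = total_bits * frames_per_second
```

At 50 frames per second, the 56 bits per frame give the 2800 bits per second of the coder, of which the REW magnitude accounts for 350.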
`