ENHANCED WAVEFORM INTERPOLATIVE CODING AT 4 KBPS

Oded Gottesman and Allen Gersho

Signal Compression Laboratory
Department of Electrical and Computer Engineering
University of California
Santa Barbara, California 93106, USA
E-mail: [oded, gersho]@scl.ece.ucsb.edu

This work was supported in part by the University of California MICRO program, ACT Networks, Inc., Cisco Systems, Inc., Conexant Systems, Inc., Dialogic Corp., DSP Group, Inc., Fujitsu Laboratories of America, Inc., General Electric Corp., Hughes Network Systems, Intel Corp., Lernout & Hauspie Speech Products NV, Lucent Technologies, Inc., Nokia Mobile Phones, Panasonic Speech Technology Laboratory, Qualcomm, Inc., Sun Microsystems, Inc., and Texas Instruments, Inc.

ABSTRACT
This paper presents an Enhanced Waveform Interpolative (EWI) speech coder at 4 kbps. The system incorporates novel features such as analysis-by-synthesis (AbS) vector quantization (VQ) of the dispersion phase, AbS optimization of the slowly evolving waveform (SEW), a special pitch search for transitions, and switched-predictive AbS gain VQ. Subjective quality tests indicate that its quality exceeds that of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and is slightly better than that of G.723.1 at 6.3 kbps.

1. INTRODUCTION

Recently, there has been growing interest in developing toll-quality speech coders at rates of 4 kbps and below. The speech quality produced by waveform coders such as the code-excited linear predictive (CELP) coder [1] degrades rapidly at rates below 5 kbps. On the other hand, parametric coders such as the waveform interpolative (WI) coder [4]-[6], the sinusoidal transform coder (STC) [2], and the multiband excitation (MBE) coder [3] produce good quality at low rates, but they do not achieve toll quality. This is mainly due to the lack of robustness of the parameter estimation, which is commonly done in open loop, and to inadequate modeling of non-stationary speech segments. In this work we propose a paradigm that incorporates an AbS approach in the parameter estimation, and a special pitch search for non-stationary segments.

In parametric coders the phase information is commonly not transmitted, for two reasons: first, the phase is of secondary perceptual significance; second, no efficient phase quantization scheme is known. WI coders [4]-[6] typically use a fixed phase vector for the SEW; for example, in [5] a fixed phase extracted from a male speaker was used. On the other hand, waveform coders such as CELP [1], by directly quantizing the waveform, implicitly allocate an excessive number of bits to the phase information, more than is perceptually required. Recently [7], we proposed a novel, efficient AbS VQ of the dispersion phase of the excitation signal to enhance the performance of the WI coder at a very low bit rate; it can be used for parametric coders as well as for waveform coders. The EWI coder employs this scheme, which incorporates perceptual weighting and does not require any phase unwrapping.

WI coders use non-ideal low-pass filters for downsampling and upsampling of the SEW. We describe a novel AbS SEW quantization scheme that takes the non-ideal filters into consideration. An improved match between the reconstructed and the original SEW is obtained, most notably in the transitions.

Pitch accuracy is crucial for high-quality reproduced speech in WI coders. We introduce a novel pitch search technique based on varying segment boundaries; it allows locking onto the most probable pitch period during transitions or other segments with rapidly varying pitch.

Commonly in speech coding the gain sequence is downsampled and interpolated; as a result, it is often smeared during plosives and onsets. To alleviate this problem, a novel switched-predictive AbS gain VQ scheme based on temporal weighting is introduced.

This paper is organized as follows. In Section 2 we explain the AbS SEW optimization. The dispersion phase quantizer is discussed in Section 3. Section 4 describes the pitch search. In Section 5 we present the switched-predictive AbS gain VQ. The bit allocation is given in Section 6. Subjective results are reported in Section 7. Finally, we summarize our work in Section 8.

2. AbS SEW OPTIMIZATION

Commonly in WI coders the SEW is distorted by downsampling and upsampling with non-ideal low-pass filters. To reduce this distortion, an optimal SEW vector is calculated and quantized. Consider the accumulated weighted distortion, $D_{wI}$, between the input SEW vectors, $\mathbf{r}_m$, and the interpolated vectors, $\tilde{\mathbf{r}}_m$, given by:

$$D_{wI} = \sum_{m=1}^{M} [\mathbf{r}_m - \tilde{\mathbf{r}}_m]^H \mathbf{W}_m [\mathbf{r}_m - \tilde{\mathbf{r}}_m] + \sum_{m=M+1}^{M+L} [1-\alpha(t_m)]^2 [\mathbf{r}_m - \tilde{\mathbf{r}}_M]^H \mathbf{W}_m [\mathbf{r}_m - \tilde{\mathbf{r}}_M] \tag{1}$$
where $M$ is the number of waveforms per frame, $L$ is the lookahead number of waveforms, $\alpha(t)$ is some increasing interpolation function in the range $0 \le \alpha(t) \le 1$, and $\mathbf{W}_m$ is a diagonal matrix whose elements, $w_{kk}$, are the combined spectral weighting and synthesis of the $k$-th harmonic, given by:

$$w_{kk} = \left| \frac{g\, A(z/\gamma_1)}{\hat{A}(z)\, A(z/\gamma_2)} \right|^2_{z = e^{j 2\pi k/P}}, \quad k = 1, \dots, K \tag{2}$$

where $P$ is the pitch period, $K$ is the number of harmonics, $g$ is the gain, $A(z)$ and $\hat{A}(z)$ are the input and the quantized LPC polynomials, respectively, and the spectral weighting parameters satisfy $0 \le \gamma_2 < \gamma_1 \le 1$. The interpolated SEW vectors are given by:

$$\tilde{\mathbf{r}}_m = [1-\alpha(t_m)]\, \hat{\mathbf{r}}_0 + \alpha(t_m)\, \mathbf{r}_M, \quad m = 1, \dots, M \tag{3}$$

where $\hat{\mathbf{r}}_0$ is the quantized SEW at the previous frame. The optimal vector, $\mathbf{r}_{M,\mathrm{opt}}$, which minimizes $D_{wI}$, is given by:

$$\mathbf{r}_{M,\mathrm{opt}} = \mathbf{W}_{M,\mathrm{opt}}^{-1} \left[ \sum_{m=1}^{M} \alpha(t_m)\, \mathbf{W}_m \big(\mathbf{r}_m - [1-\alpha(t_m)]\, \hat{\mathbf{r}}_0\big) + \sum_{m=M+1}^{M+L} [1-\alpha(t_m)]^2\, \mathbf{W}_m \mathbf{r}_m \right] \tag{4}$$

where

$$\mathbf{W}_{M,\mathrm{opt}} = \sum_{m=1}^{M} \alpha^2(t_m)\, \mathbf{W}_m + \sum_{m=M+1}^{M+L} [1-\alpha(t_m)]^2\, \mathbf{W}_m \tag{5}$$

This optimized vector is then quantized using the WMSE weighted by $\mathbf{W}_{M,\mathrm{opt}}$. An improved match between the reconstructed and the original SEW is obtained, most notably in the transitions.

3. AbS PHASE QUANTIZATION

The dispersion-phase quantization scheme [7] is illustrated in Figure 1. Consider a pitch cycle that is extracted from the residual signal and cyclically shifted such that its pulse is located at position zero. Let its DFT be denoted by $\mathbf{r}$; the resulting DFT phase is the dispersion phase, $\boldsymbol{\varphi}$, which determines, along with the magnitude $|\mathbf{r}|$, the waveform's pulse shape. After quantization, the components of the quantized magnitude vector, $|\hat{\mathbf{r}}|$, are multiplied by the exponential of the quantized phases, $e^{j\hat{\varphi}(k)}$, to yield the quantized waveform DFT, $\hat{\mathbf{r}}$, which is subtracted from the input DFT to produce the error DFT. The error DFT is then transformed to the perceptual domain by weighting it with the combined synthesis and weighting filter $W(z)$. The encoder searches for the phase that minimizes the energy of the perceptual-domain error, allowing a refining cyclic shift of the input waveform during the search to eliminate any residual phase shift between the input waveform and the quantized waveform. Phase dispersion quantization aims to improve waveform matching. Efficient AbS quantization can be obtained by using the perceptually weighted distortion measure:

$$D_w(\mathbf{r}, \hat{\mathbf{r}}) = (\mathbf{r} - \hat{\mathbf{r}})^H \mathbf{W} (\mathbf{r} - \hat{\mathbf{r}}) / K \tag{6}$$

The magnitude is perceptually more significant than the phase and should therefore be quantized first. Furthermore, if the phase were quantized first, the very limited bit allocation available for the phase would lead to an excessively degraded spectral matching of the magnitude in favor of a somewhat improved, but less important, matching of the waveform. For the above distortion measure, the quantized phase vector is given by [7]:

$$\hat{\boldsymbol{\varphi}} = \arg\min_{\boldsymbol{\varphi}_i} \left\{ \big(\mathbf{r} - e^{j\boldsymbol{\varphi}_i} |\hat{\mathbf{r}}|\big)^H \mathbf{W} \big(\mathbf{r} - e^{j\boldsymbol{\varphi}_i} |\hat{\mathbf{r}}|\big) \right\} \tag{7}$$

where $i$ is the running phase codebook index, and $e^{j\boldsymbol{\varphi}_i}$ is the respective diagonal phase exponent matrix. The AbS search for phase quantization is based on evaluating (7) for each candidate phase codevector. Since only trigonometric functions of the phase candidates are used, phase unwrapping is avoided.

[Figure 1: block diagram with Pitch-Cycle Waveform's DFT, Crude Linear-Phase Alignment, Refined Linear-Phase Alignment, Magnitude Codebook ($|\hat{\mathbf{r}}|$), Phase Codebook ($e^{j\hat{\boldsymbol{\varphi}}}$), error $\mathbf{r} - \hat{\mathbf{r}}$, weighting filter $W(z)$, and $\min\|\cdot\|^2$.]

Figure 1. Block diagram of the AbS dispersion phase's vector quantization.
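Because every $\mathbf{W}_m$ in Section 2 is diagonal, the SEW optimization of (2) through (5) reduces to one scalar division per harmonic. The following is a minimal sketch under stated assumptions, not the coder's implementation: the function names are hypothetical, the values `g1 = 0.9` and `g2 = 0.6` are placeholders (the paper only requires $0 \le \gamma_2 < \gamma_1 \le 1$), LPC polynomials are passed as coefficient lists `[1, a_1, ..., a_p]` meaning $A(z) = \sum_i a_i z^{-i}$, and $\alpha(t_M) = 1$ is assumed so that $\tilde{\mathbf{r}}_M = \mathbf{r}_M$ in the lookahead term of (1), which is what makes (4) its exact minimizer.

```python
import cmath
import math
from typing import List, Sequence

Vector = List[complex]


def lpc_response(coeffs: Sequence[float], gamma: float, z: complex) -> complex:
    """Evaluate A(z/gamma) = sum_i a_i * gamma**i * z**(-i), with coeffs = [a_0, a_1, ...]."""
    return sum(a * gamma ** i * z ** (-i) for i, a in enumerate(coeffs))


def harmonic_weights(a: Sequence[float], a_hat: Sequence[float], pitch: float,
                     num_harmonics: int, gain: float = 1.0,
                     g1: float = 0.9, g2: float = 0.6) -> List[float]:
    """Diagonal of W_m as in eq. (2): |g A(z/g1) / (A_hat(z) A(z/g2))|^2 at z = e^{j2pi k/P}.

    g1 and g2 are placeholder values (assumption), chosen so that 0 <= g2 < g1 <= 1.
    """
    weights = []
    for k in range(1, num_harmonics + 1):
        z = cmath.exp(2j * math.pi * k / pitch)
        h = gain * lpc_response(a, g1, z) / (lpc_response(a_hat, 1.0, z) * lpc_response(a, g2, z))
        weights.append(abs(h) ** 2)
    return weights


def optimal_sew(r: Sequence[Vector], w: Sequence[Sequence[float]],
                alpha: Sequence[float], r0_hat: Vector, M: int) -> Vector:
    """r_{M,opt} of eqs. (4)-(5), solved per harmonic because every W_m is diagonal.

    r, w and alpha hold waveforms m = 1..M+L at list positions 0..M+L-1;
    r0_hat is the quantized SEW of the previous frame.
    """
    K = len(r0_hat)
    num = [0j] * K      # bracketed sum of eq. (4)
    den = [0.0] * K     # diagonal of W_{M,opt}, eq. (5)
    for pos, (r_m, w_m, a_m) in enumerate(zip(r, w, alpha)):
        for k in range(K):
            if pos < M:   # current frame, interpolated with eq. (3)
                num[k] += a_m * w_m[k] * (r_m[k] - (1 - a_m) * r0_hat[k])
                den[k] += a_m ** 2 * w_m[k]
            else:         # lookahead waveforms, weighted by [1 - alpha(t_m)]^2
                c = (1 - a_m) ** 2
                num[k] += c * w_m[k] * r_m[k]
                den[k] += c * w_m[k]
    return [n / d for n, d in zip(num, den)]


def accumulated_distortion(r_M: Vector, r: Sequence[Vector], w: Sequence[Sequence[float]],
                           alpha: Sequence[float], r0_hat: Vector, M: int) -> float:
    """D_wI of eq. (1) for a candidate end-of-frame vector r_M (assumes alpha(t_M) = 1)."""
    total = 0.0
    for pos, (r_m, w_m, a_m) in enumerate(zip(r, w, alpha)):
        for k in range(len(r_M)):
            if pos < M:
                err = r_m[k] - ((1 - a_m) * r0_hat[k] + a_m * r_M[k])
                total += w_m[k] * abs(err) ** 2
            else:
                total += (1 - a_m) ** 2 * w_m[k] * abs(r_m[k] - r_M[k]) ** 2
    return total
```

Since (1) is a convex quadratic in $\mathbf{r}_M$, a quick sanity check of the closed form is that any small perturbation of the vector returned by `optimal_sew` increases `accumulated_distortion`.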
`
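The AbS phase search of (6) and (7) can be sketched as an exhaustive loop over a phase codebook. This is a minimal sketch, not the reference implementation: the function names, the codebook contents, and the grid of refining cyclic shifts (`shifts`, in samples, applied as the linear phase $e^{-j2\pi k s/P}$ on harmonic $k$) are illustrative assumptions; the magnitude $|\hat{\mathbf{r}}|$ is taken as already quantized, and $\mathbf{W}$ is passed as its diagonal.

```python
import cmath
import math
from typing import Optional, Sequence, Tuple


def weighted_distortion(r: Sequence[complex], r_hat: Sequence[complex],
                        w: Sequence[float]) -> float:
    """D_w(r, r_hat) of eq. (6) for a diagonal W: (1/K) * sum_k w_kk |r_k - r_hat_k|^2."""
    return sum(wk * abs(x - y) ** 2 for x, y, wk in zip(r, r_hat, w)) / len(r)


def search_phase(r: Sequence[complex], mag_hat: Sequence[float], w: Sequence[float],
                 phase_codebook: Sequence[Sequence[float]], pitch: float,
                 shifts: Sequence[float] = (0.0,)) -> Tuple[Optional[int], Optional[float], float]:
    """Exhaustive AbS search of eq. (7); returns (codebook index, shift, distortion).

    Each candidate is r_hat_k = |r_hat_k| * exp(j * phi_ik). Only cos/sin of the
    candidate phases enter the computation, so no phase unwrapping is needed.
    """
    best: Tuple[Optional[int], Optional[float], float] = (None, None, math.inf)
    for s in shifts:
        # Hypothetical refining cyclic shift of the input waveform by s samples.
        r_s = [x * cmath.exp(-2j * math.pi * k * s / pitch) for k, x in enumerate(r, start=1)]
        for i, phi in enumerate(phase_codebook):
            r_hat = [m * cmath.exp(1j * p) for m, p in zip(mag_hat, phi)]
            d = weighted_distortion(r_s, r_hat, w)
            if d < best[2]:
                best = (i, s, d)
    return best
```

Searching the shift jointly with the codeword mirrors the refining cyclic shift described above; with a single zero shift the loop reduces to plain evaluation of (7) for each codevector.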
4. PITCH SEARCH

The pitch search is based on varying segment boundaries. It allows locking onto the most probable pitch period even during transitions or other segments with rapidly varying pitch. Initially, pitch periods, $P(n_i)$, are searched every 2 ms at instants $n_i$ by maximizing the normalized correlation of the weighted speech, $s_w(n)$, jointly over the lag and the segment boundaries, that is:

$$P(n_i) = \arg\max_{\tau, N_1, N_2} \{\rho(n_i, \tau, N_1, N_2)\} = \arg\max_{\tau, N_1, N_2} \left\{ \frac{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta+\tau} s_w(n)\, s_w(n-\tau)}{\sqrt{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta+\tau} s_w^2(n) \sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta+\tau} s_w^2(n-\tau)}} \right\} \tag{8}$$

where $\Delta$ is some incremental segment used in the summations for computational simplicity, and $0 \le N_j \le \lfloor 160/\Delta \rfloor$. Then, every 10 ms a weighted-mean pitch value is calculated by:

$$P_{\mathrm{mean}} = \sum_{i=1}^{5} \rho(n_i)\, P(n_i) \Big/ \sum_{i=1}^{5} \rho(n_i) \tag{9}$$

where $\rho(n_i)$ is the normalized correlation for $P(n_i)$.
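The varying-boundary search of (8) and the weighted mean of (9) can be sketched by brute force. This is a minimal sketch under assumptions: the summation range $n_i - N_1\Delta \le n \le n_i + N_2\Delta + \tau$ follows (8), the lag range `taus` is supplied by the caller, a small boundary limit `n_max` stands in for $\lfloor 160/\Delta \rfloor$, and the function names are hypothetical. A practical coder would accumulate the sums in steps of $\Delta$ instead of recomputing them for every boundary pair.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def normalized_corr(s: Sequence[float], n_i: int, tau: int,
                    n1: int, n2: int, delta: int) -> float:
    """rho(n_i, tau, N1, N2) of eq. (8) over n = n_i - N1*delta .. n_i + N2*delta + tau."""
    lo, hi = n_i - n1 * delta, n_i + n2 * delta + tau
    if lo - tau < 0 or hi >= len(s):
        return -math.inf                        # segment falls outside the buffer
    xy = sum(s[n] * s[n - tau] for n in range(lo, hi + 1))
    xx = sum(s[n] ** 2 for n in range(lo, hi + 1))
    yy = sum(s[n - tau] ** 2 for n in range(lo, hi + 1))
    return xy / math.sqrt(xx * yy) if xx > 0 and yy > 0 else 0.0


def pitch_at(s: Sequence[float], n_i: int, taus: Iterable[int],
             delta: int, n_max: int) -> Tuple[Optional[int], float]:
    """Eq. (8): maximize rho jointly over the lag and the boundaries 0 <= N1, N2 <= n_max."""
    best_tau, best_rho = None, -math.inf
    for tau in taus:
        for n1 in range(n_max + 1):
            for n2 in range(n_max + 1):
                rho = normalized_corr(s, n_i, tau, n1, n2, delta)
                if rho > best_rho:
                    best_tau, best_rho = tau, rho
    return best_tau, best_rho


def weighted_mean_pitch(estimates: Sequence[Tuple[float, float]]) -> float:
    """Eq. (9): correlation-weighted mean of the (P(n_i), rho(n_i)) pairs of one 10 ms block."""
    num = sum(rho * p for p, rho in estimates)
    den = sum(rho for _, rho in estimates)
    return num / den
```

Because the segment always spans at least $\tau + 1$ samples, even $N_1 = N_2 = 0$ covers one full candidate period, so short boundary choices cannot trivially produce a correlation of one.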

5. GAIN QUANTIZATION

The gain trajectory is commonly smeared during plosives and onsets by downsampling and interpolation. We address this problem, and improve speech crispness, with a novel switched-predictive AbS gain VQ technique, illustrated in Figure 2.

Switched prediction is introduced to allow for different levels of gain correlation and to reduce the occurrence of gain outliers. To improve speech crispness, especially for plosives and onsets, temporal weighting is incorporated in the AbS gain VQ. The weighting is a monotonic function of the temporal gain. Two codebooks of 32 vectors each are used. Each codebook has an associated predictor coefficient, $P_i$, and a DC offset, $D_i$. The quantization target vector is the DC-removed log-gain vector, denoted by $t(m)$. The search for the minimal WMSE is performed over all the vectors, $c_{ij}(m)$, of the codebooks. The quantized target, $\hat{t}(m)$, is obtained by passing the quantized vector, $c_{ij}(m)$, through the synthesis filter $1/(1 - P_i z^{-1})$. Since each quantized target vector may have a different value of the removed DC, the quantized DC is added temporarily to the filter memory after the state update, and the next quantized vector's DC is subtracted from it before filtering is performed. Since the predictor coefficients are known, direct VQ can be used to simplify the computations.

[Figure 2: block diagram with Log-Gain $g(m)$, DC Codebook ($D_i$), Predictor Codebook ($P_i$), Vector Quantizer Codebook ($c_{ij}(m)$), Synthesis Filter $1/(1 - P_i z^{-1})$, target $t(m)$, quantized target $\hat{t}(m)$, Temporal Weighting, and $\min\|\cdot\|^2$.]

Figure 2. Switched-predictive analysis-by-synthesis gain VQ using temporal weighting.
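The switched-predictive AbS gain search can be sketched as follows. This is a minimal sketch under explicit assumptions: the codebook contents, predictor values, and DC offsets in any example are made up; `temporal_weights` is a hypothetical increasing function of the log-gain, since the text specifies only that the weighting is monotonic; the synthesis filter $1/(1 - P_i z^{-1})$ runs sample by sample across the gain vector; and the filter memory is carried as the last quantized target together with its DC, re-referenced to each candidate codebook's DC before filtering, following the DC handling described above. The search is exhaustive rather than the simplified direct VQ mentioned in the text.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # (last quantized target sample, DC offset it was coded with)


def synthesize(code: Sequence[float], p: float, memory: float) -> List[float]:
    """Filter a codevector c_ij(m) through 1 / (1 - p z^-1), starting from 'memory'."""
    out, prev = [], memory
    for c in code:
        prev = c + p * prev
        out.append(prev)
    return out


def temporal_weights(log_gain: Sequence[float]) -> List[float]:
    """Hypothetical increasing weighting of the log-gain (assumption), scaled to max 1."""
    peak = max(log_gain)
    return [math.exp(0.5 * (g - peak)) for g in log_gain]


def gain_vq(log_gain: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]],
            predictors: Sequence[float], dcs: Sequence[float],
            state: State) -> Tuple[int, int, List[float], State]:
    """Search both codebooks; return (codebook i, codevector j, quantized log-gain, new state)."""
    last_t_hat, last_dc = state
    w = temporal_weights(log_gain)
    best = None
    for i, (codebook, p, dc) in enumerate(zip(codebooks, predictors, dcs)):
        target = [g - dc for g in log_gain]     # DC-removed log-gain t(m)
        memory = last_t_hat + last_dc - dc      # add the previous DC back, remove this one
        for j, code in enumerate(codebook):
            t_hat = synthesize(code, p, memory)
            err = sum(wm * (t - th) ** 2 for wm, t, th in zip(w, target, t_hat))
            if best is None or err < best[0]:
                best = (err, i, j, t_hat)
    _, i, j, t_hat = best
    g_hat = [t + dcs[i] for t in t_hat]         # quantized log-gain
    return i, j, g_hat, (t_hat[-1], dcs[i])
```

Returning the DC alongside the last filter output keeps the state self-describing, so the next call can switch to either codebook without knowing which one was used before.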

6. BIT ALLOCATION

The bit allocation of the coder is given in Table 1. The frame length is 20 ms, and ten waveforms are extracted per frame. The pitch and the gain are coded twice per frame.

Parameter        Bits/frame    Bits/second
LPC              18            900
Pitch            2x6 = 12      600
Gain             2x6 = 12      600
REW              20            1000
SEW magnitude    14            700
SEW phase        4             200
Total            80            4000

Table 1. Bit allocation for the EWI coder.
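The rates in Table 1 follow from the 20 ms frame, that is, 50 frames per second. The short check below recomputes them; it also shows that a 6-bit gain index is consistent with Section 5, since two codebooks of 32 vectors give 2 × 32 = 64 = 2^6 candidates. The dictionary layout and names are illustrative only.

```python
import math

FRAME_MS = 20
GAIN_INDEX_BITS = int(math.log2(2 * 32))   # two gain codebooks of 32 vectors each

bits_per_frame = {
    "LPC": 18,
    "Pitch": 2 * 6,
    "Gain": 2 * GAIN_INDEX_BITS,
    "REW": 20,
    "SEW magnitude": 14,
    "SEW phase": 4,
}
# bits/frame times frames/second, with 1000 / FRAME_MS = 50 frames per second
bits_per_second = {name: bits * 1000 // FRAME_MS for name, bits in bits_per_frame.items()}

print(sum(bits_per_frame.values()), sum(bits_per_second.values()))  # 80 4000
```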

7. SUBJECTIVE RESULTS

We conducted a subjective A/B test to compare our 4 kbps EWI coder with MPEG-4 at 4 kbps and with G.723.1. The test data included 24 MIRS speech sentences, 12 from female speakers and 12 from male speakers. Fourteen listeners participated in the test. The test results, listed in Tables 2 to 4, indicate that the subjective quality of EWI exceeds that of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and is slightly better than that of G.723.1 at 6.3 kbps.

Coder               Female    Male      Total
4 kbps WI           65.48%    61.90%    63.69%
4 kbps MPEG-4       34.52%    38.10%    36.31%

Table 2. Results of the subjective A/B test comparing the 4 kbps WI coder with 4 kbps MPEG-4. With 95% certainty, the WI preference lies in [58.63%, 68.75%].

Coder               Female    Male      Total
4 kbps WI           57.74%    61.31%    59.52%
5.3 kbps G.723.1    42.26%    38.69%    40.48%

Table 3. Results of the subjective A/B test comparing the 4 kbps WI coder with 5.3 kbps G.723.1. With 95% certainty, the WI preference lies in [54.17%, 64.88%].

Coder               Female    Male      Total
4 kbps WI           54.76%    52.98%    53.87%
6.3 kbps G.723.1    45.24%    47.02%    46.13%

Table 4. Results of the subjective A/B test comparing the 4 kbps WI coder with 6.3 kbps G.723.1. With 95% certainty, the WI preference lies in [48.51%, 59.23%].

8. SUMMARY

We have found that the performance of the WI coder can be enhanced by adding several new techniques. The most significant of these, reported here, are AbS VQ of the dispersion phase, AbS optimization of the SEW, a special pitch search for transitions, and switched-predictive AbS gain VQ. These features improve the algorithm and its robustness. The test results indicate that the EWI coder slightly exceeds the performance of G.723.1 at 6.3 kbps and is therefore very close to toll quality, at least under clean speech conditions.

9. REFERENCES
[1] B. S. Atal and M. R. Schroeder, "Stochastic Coding of Speech at Very Low Bit Rate," Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613, 1984.
[2] R. J. McAulay and T. F. Quatieri, "Speech Analysis-Synthesis Based on a Sinusoidal Representation," IEEE Trans. ASSP, vol. 34, no. 4, pp. 744-754, 1986.
[3] D. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, vol. 36, no. 8, pp. 1223-1235, August 1988.
[4] Y. Shoham, "High Quality Speech Coding at 2.4 to 4.0 kbps Based on Time-Frequency-Interpolation," IEEE ICASSP'93, vol. II, pp. 167-170, 1993.
[5] W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds., Elsevier Science B.V., Chapter 5, pp. 175-207, 1995.
[6] I. S. Burnett and D. H. Pham, "Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis," IEEE ICASSP'97, pp. 1567-1570, 1997.
[7] O. Gottesman, "Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder," IEEE ICASSP'99, vol. 1, pp. 269-272, 1999.