CODE-EXCITED LINEAR PREDICTION (CELP):
HIGH-QUALITY SPEECH AT VERY LOW BIT RATES

Manfred R. Schroeder
Drittes Physikalisches Institut
University of Goettingen, F. R. Germany
and AT&T Bell Laboratories
Murray Hill, New Jersey 07974

Bishnu S. Atal
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT

We describe in this paper a code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion. Each sample of the innovation sequence is filtered sequentially through two time-varying linear recursive filters, one with a long-delay (related to pitch period) predictor in the feedback loop and the other with a short-delay predictor (related to spectral envelope) in the feedback loop. We code speech, sampled at 8 kHz, in blocks of 5-msec duration. Each block consisting of 40 samples is produced from one of 1024 possible innovation sequences. The bit rate for the innovation sequence is thus 1/4 bit per sample. We compare in this paper several different random and deterministic code books for their effectiveness in providing the optimum innovation sequence in each block. Our results indicate that a random code book has a slight speech quality advantage at low bit rates. Examples of speech produced by the above method will be played at the conference.
INTRODUCTION

The performance of adaptive predictive coders for speech signals using instantaneous quantizers deteriorates rapidly at bit rates below about 10 kbits/sec. Our past work has shown that high speech quality can be maintained in predictive coders at lower bit rates by using non-instantaneous stochastic quantizers which minimize a subjective error criterion based on properties of human auditory perception [1]. We have used tree search procedures to encode the innovation signal and have found the tree codes to perform very well at 1 bit/sample (8 kbits/sec). The speech quality is maintained even at 1/2 bit/sample when the tree has 4 branches at every node and 4 white Gaussian random numbers on each branch [2].

The tree search procedures are suboptimal, and the performance of tree codes deteriorates significantly when the innovation signal is coded at only 1/4 bit/sample (2 kbits/sec). Such low bit rates for the innovation signal are necessary to bring the total bit rate for coding the speech signal down to 4.8 kbits/sec, a rate that offers the possibility of carrying digital speech over a single analog voice channel.

Fehn and Noll [3] have discussed the merits of various multipath search coding procedures: code-book coding, tree coding, and trellis coding. Code-book coding is of particular interest at very low bit rates. In code-book coding, the set of possible sequences for a block of innovation signal is stored in a code book. For a given speech segment, the optimum innovation sequence is selected to optimize a given fidelity criterion by exhaustive search of the code book, and an index specifying the optimum sequence is transmitted to the receiver. In general, code-book coding is impractical due to the large size of the code books. However, at the very low bit rates we are aiming for, exhaustive search of the code book to find the best innovation sequence for encoding short segments of the speech signal becomes possible [4].
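The rate arithmetic implied above can be checked directly. This small sketch (the variable names are ours, not the paper's) reproduces the 1/4 bit-per-sample and 2 kbits/sec figures quoted in the text:

```python
# Innovation-signal rate for code-book coding, per the figures in the text.
fs = 8000                    # sampling frequency, Hz
block_samples = 40           # one 5-msec block at 8 kHz
codebook_size = 1024         # number of stored innovation sequences

bits_per_block = codebook_size.bit_length() - 1        # 10 bits index 1024 entries
bits_per_sample = bits_per_block / block_samples       # 0.25 bit per sample
innovation_rate = bits_per_block * fs / block_samples  # 2000 bits/sec

print(bits_per_sample, innovation_rate)  # 0.25 2000.0
```

The remaining 2.8 kbits/sec of the 4.8 kbits/sec total must carry the predictor and gain side information; the paper does not itemize that split here.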
SPEECH SYNTHESIS MODEL

The speech synthesizer in a code-excited linear predictive coder is identical to the one used in adaptive predictive coders [1]. It consists of two time-varying linear recursive filters, each with a predictor in its feedback loop, as shown in Fig. 1. The first feedback loop includes a long-delay (pitch) predictor which generates the pitch periodicity of voiced speech. The second feedback loop includes a short-delay predictor to restore the spectral envelope.
Fig. 1. Speech synthesis model with short- and long-delay predictors.
The two predictors are determined using procedures outlined in References 1 and 5. The short-delay predictor has 16 coefficients, and these are determined using the weighted stabilized covariance method of LPC analysis [1,5] once every 10 msec. In this method of LPC analysis, the instantaneous prediction error is weighted by a Hamming window 20 msec in duration, and the predictor coefficients are determined by minimizing the energy of the weighted error. The long-delay (pitch) predictor has 3 coefficients which are determined by minimizing the mean-squared prediction error after pitch prediction over a time interval of 5 msec [2].
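A minimal sketch of this two-filter cascade may help fix the structure; this is an illustration under assumed direct-form recursions, and the function name, pitch lag, and coefficient values below are hypothetical, not taken from the paper:

```python
import numpy as np

def synthesize(innovation, pitch_taps, pitch_lag, lpc_coeffs):
    """Cascade of the long-delay (pitch) and short-delay (LPC) recursive filters.

    First loop:  d[n] = innovation[n] + sum_j pitch_taps[j] * d[n - pitch_lag + 1 - j]
    Second loop: s[n] = d[n] + sum_k lpc_coeffs[k] * s[n - 1 - k]
    """
    d = np.zeros(len(innovation))
    for n in range(len(innovation)):
        acc = innovation[n]
        for j, b in enumerate(pitch_taps):      # 3 taps around the pitch lag
            m = n - pitch_lag + 1 - j
            if m >= 0:
                acc += b * d[m]
        d[n] = acc

    s = np.zeros(len(d))
    for n in range(len(d)):
        acc = d[n]
        for k, a in enumerate(lpc_coeffs):      # 16 coefficients in the paper
            if n - 1 - k >= 0:
                acc += a * s[n - 1 - k]
        s[n] = acc
    return s
```

With zero pitch taps and a single LPC coefficient of 0.5, an impulse input decays geometrically (1, 0.5, 0.25, ...), which is a quick sanity check on the recursion.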
SELECTION OF OPTIMUM INNOVATION SEQUENCE

Let us consider the coding of a short block of speech signal 5 msec in duration. Each such block consists of 40 speech samples at a sampling frequency of 8 kHz. A bit rate of 1/4 bit per sample corresponds to 1024 possible sequences (10 bits) of length 40 for each block. The procedure for selecting the optimum sequence is illustrated in Fig. 2. Each member of the code book provides 40 samples of the innovation signal. Each sample of the innovation signal is scaled by an amplitude factor that is constant for the 5-msec block and is reset to a new value once every 5 msec. The scaled samples are filtered sequentially through two recursive filters, one for introducing the voice periodicity and the other for the spectral envelope. The regenerated speech samples at the output of the second filter are compared with the corresponding samples of the original speech signal to form a difference signal. The difference signal, representing the objective error, is further processed through a linear filter to attenuate those frequencies where the error is perceptually less important and to amplify those frequencies where the error is perceptually more important. The transfer function of the weighting filter is given by
25.1.1

CH2118-8/85/0000-0937 $1.00 © 1985 IEEE

937

Ex. 1046 / Page 1 of 4
Apple v. Saint Lawrence
W(z) = \frac{1 - \sum_{k=1}^{p} a_k z^{-k}}{1 - \sum_{k=1}^{p} a_k \alpha^k z^{-k}}    (1)
where a_k are the short-delay predictor coefficients, p = 16, and \alpha is a parameter for controlling the weighting of the error as a function of frequency. A suitable value of \alpha is given by

\alpha = e^{-2\pi \cdot 100/f}    (2)

where f is the sampling frequency. The weighted mean-squared error is determined by squaring and averaging the error samples at the output of the weighting filter for each 5-msec block. The optimum innovation sequence for each block is selected by exhaustive search to minimize the weighted error. As mentioned earlier, prior to filtering, each sample of the innovation sequence is scaled by an amplitude factor that is constant for the 5-msec block. This amplitude factor is determined for each code word by minimizing the weighted mean-squared error for the block.
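The selection loop described above can be sketched as follows. This is our reading of the procedure, not the authors' code: the weighting filter of Eq. (1) is built by bandwidth-expanding the LPC polynomial (denominator coefficients a_k·α^k), and the per-block amplitude factor for each code word is the least-squares optimum. All function and variable names are hypothetical.

```python
import numpy as np

def weighting_coeffs(a, alpha):
    """Numerator/denominator of W(z): (1 - sum a_k z^-k) / (1 - sum a_k alpha^k z^-k)."""
    num = np.concatenate(([1.0], -a))
    den = np.concatenate(([1.0], -a * alpha ** np.arange(1, len(a) + 1)))
    return num, den

def iir(num, den, x):
    """Direct-form recursive filter; den[0] is assumed to be 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[j] * y[n - j] for j in range(1, len(den)) if n - j >= 0)
        y[n] = acc
    return y

def search_codebook(codebook, weighted_target, num, den):
    """Exhaustive search: filter each code word, fit the optimal block gain,
    and return (index, gain, weighted squared error) of the best entry."""
    best = (-1, 0.0, np.inf)
    for idx, cw in enumerate(codebook):
        y = iir(num, den, cw)
        g = (y @ weighted_target) / (y @ y)     # least-squares amplitude factor
        err = np.sum((weighted_target - g * y) ** 2)
        if err < best[2]:
            best = (idx, g, err)
    return best
```

If Eq. (2) is read as \alpha = exp(-2\pi \cdot 100/f), a common bandwidth-expansion choice, then at f = 8000 Hz it gives \alpha \approx 0.92.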
Fig. 2. Block diagram illustrating the procedure for selecting the optimum innovation sequence.
CONSTRUCTION OF OPTIMUM CODE BOOKS

A code book, within the limitation of its size, should provide as dense a sampling as possible of the space of innovation sequences. In principle, the code words could be block codes that are optimally placed on a hypersphere in the 40-dimensional space (representing 40 samples in each 5-msec block). Fehn and Noll [3] have argued that random code books (code books with randomly selected code words) are less restrictive than deterministic code books. Random code books, in some sense, provide a lower bound for the performance at any given bit rate. A deterministic code book, if properly constructed, should provide a performance that is at least equal to, if not better than, that of the random code books, and the deterministic nature of the code book should make it easier to find the optimum innovation sequence for each block of speech. However, it is generally very difficult to design an optimum deterministic code book.

As a start, we have chosen a random code book in which each possible code word is constructed of white Gaussian random numbers with unit variance. We have chosen the Gaussian distribution since our earlier work has shown that the probability density function of the prediction error samples (after both short-delay and long-delay predictions) is nearly Gaussian [1]. Figure 3 shows a plot of the first-order cumulative amplitude distribution function for the prediction residual samples and compares it with the corresponding Gaussian distribution function with the same mean and variance. A closer examination of the prediction error shows that the Gaussian assumption is valid almost everywhere except for stop bursts of unvoiced stop consonants and for a few pitch periods during the transition from unvoiced or silence regions to voiced speech.
Fig. 3. First-order cumulative probability distribution function for the prediction residual samples (solid curve). The corresponding Gaussian distribution function with the same mean and variance is shown by the dashed curve.
Each sample v_n of the innovation sequence in a Gaussian code book can be expressed as a Fourier series of N cosine functions (N = 20):

v_n = \sum_{k=0}^{N-1} c_k \cos(\pi k n / N + \phi_k),    n = 0, 1, \ldots, 2N-1,    (3)

where c_k and \phi_k are independent random variables; \phi_k is uniformly distributed between 0 and 2\pi, and c_k is Rayleigh distributed with probability density function

p(c_k) = c_k \exp(-c_k^2/2),    c_k > 0.    (4)
The function of the innovation sequence in the synthesis model of Fig. 1 is to provide a correction to the filter output in reproducing the speech waveform within the limitation of the size of the code book. Using the Fourier series model of Eq. (3), the correction can be considered separately for the amplitude and phase of each Fourier component. Do we need both amplitude and phase corrections for high-quality speech synthesis? Are the two types of corrections equally important? These questions can be answered by restricting the variations in the amplitudes and phases of the various Fourier components in Eq. (3). For example, a code book can be formed by setting the amplitudes c_k to a constant value and keeping the phases \phi_k uniformly distributed between 0 and 2\pi. Another code book is formed by setting the phases to some constant set of values and keeping the amplitudes Rayleigh distributed in accordance with Eq. (4).
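The three code-book constructions just described can be sketched as follows. This is a hedged illustration: the constant amplitude level and the fixed phase set below are arbitrary placeholders, since the paper does not specify them, and the function name is ours.

```python
import numpy as np

def codeword(rng, mode, N=20):
    """One innovation sequence per Eq. (3): v_n = sum_k c_k cos(pi*k*n/N + phi_k),
    n = 0..2N-1 (40 samples), normalized to unit variance."""
    if mode == "gaussian":            # Rayleigh amplitudes, uniform phases (Eq. 4)
        c = rng.rayleigh(scale=1.0, size=N)
        phi = rng.uniform(0.0, 2 * np.pi, size=N)
    elif mode == "const_amplitude":   # constant amplitudes, uniform phases
        c = np.ones(N)                # placeholder constant, not from the paper
        phi = rng.uniform(0.0, 2 * np.pi, size=N)
    elif mode == "const_phase":       # Rayleigh amplitudes, fixed phases
        c = rng.rayleigh(scale=1.0, size=N)
        phi = np.zeros(N)             # placeholder fixed phase set
    else:
        raise ValueError(mode)
    n = np.arange(2 * N)
    k = np.arange(N)
    v = (c * np.cos(np.pi * np.outer(n, k) / N + phi)).sum(axis=1)
    return v / np.sqrt(np.mean(v ** 2))   # unit-variance code word
```

The Rayleigh density of Eq. (4) corresponds to `rayleigh(scale=1.0)`; summing N = 20 such cosines with independent uniform phases yields the nearly Gaussian samples the text motivates.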
We have also used a code book in which the different innovation sequences are obtained directly from the prediction error (after normalizing to unit variance) of speech signals. The amplitudes and phases are no longer distributed according to Rayleigh and uniform density functions, respectively, but reflect the distributions represented in the actual prediction error.
RESULTS

As we mentioned earlier, the random code book provides a base line against which we can compare other code books. We have synthesized several speech utterances spoken by both male and female
speakers (pitch frequencies ranging from 80 Hz to 400 Hz) using the different code books discussed in the previous section. The random code book (with 1024 code words) provided unexpectedly good performance. Even in close pair-wise comparisons over headphones, only occasional small differences were noticeable between the original and synthetic speech utterances. These results suggest that a 10-bit random code book has sufficient flexibility to produce high-quality speech from the synthesis model shown in Fig. 1.

The waveforms of the original and synthetic speech signals were found to match closely for voiced speech and reasonably well for unvoiced speech. The signal-to-noise ratio averaged over several seconds of speech was found to be approximately 15 dB. Examples of speech waveforms are shown in Fig. 4. The figure shows (a) original speech, (b) synthetic speech, (c) the LPC prediction residual, (d) the reconstructed LPC residual, (e) the prediction residual after pitch prediction, and (f) the coded residual from a 10-bit random code book. As expected, the Gaussian code book is not able to reproduce a sharp impulse in the coded residual waveform. The absence of the sharp impulse produces appreciable phase distortion in the reconstructed LPC prediction residual. However, this phase distortion is mostly limited to frequency regions outside the formants.

Fig. 4. Waveforms of different signals in the coder: (a) the original speech, (b) the synthetic speech, (c) the LPC prediction residual, (d) the reconstructed LPC residual, (e) the prediction residual after pitch prediction, and (f) the coded residual from a 10-bit random code book. Waveforms (c) and (d) are amplified 5 times relative to the speech signal. Waveforms (e) and (f) are amplified by an additional factor of 2.

We have also examined the distribution of the reconstruction error amongst various code words. Figure 5 shows a plot of the number of code words which produced a given amount of rms error in a particular 5-msec block of speech. The behavior shown is typical of what we observed in several blocks. The minimum rms error for this block was 30, and only 5 code words (out of a total of 1024) produced an rms error less than 33. This indicates that the size of the code book cannot be reduced significantly without producing a substantial increase in the error.

Fig. 5. Distribution of error amongst the various code words in a Gaussian code book.

Due to the random nature of code books, different Gaussian code books produced different innovation sequences. However, we did not hear any audible difference between the speech signals reconstructed from these different code books. Figure 6 shows several examples of the innovation sequences selected from several different Gaussian code books for one 5-msec block. The innovation sequences for other previous blocks were kept the same; thus, the filter coefficients and the filter memories were identical at the beginning of the block. The coded innovation sequences show very little similarity to each other. The amplitude spectrum for the different sequences is shown in Fig. 6(b). Again, there is no obvious common pattern amongst the different amplitude spectra. The corresponding phase responses are shown in Fig. 6(c).

Fig. 6. (a) Waveforms of different innovation sequences for a particular 5-msec block, (b) amplitude spectra of innovation sequences, and (c) phase responses of innovation sequences.

The code book with constant amplitude but uniformly distributed phases performed nearly as well as the Gaussian code book. The signal-to-noise ratio decreased by about 1.5 dB and there was an audible difference between the two code books. The code book with constant phases but Rayleigh-distributed amplitudes performed very poorly, both in the signal-to-noise ratio and in listening to synthetic speech. The code book based on the prediction residual signals derived from speech performed as well as the Gaussian code book.
CONCLUDING REMARKS

Our present work with the code-excited linear predictive coder has demonstrated that such coders offer considerable promise for producing high-quality synthetic speech at bit rates as low as 4.8 kbits/sec. The random code book we have used so far obviously does not provide the best choice. The proper design of the code book is the key to success for achieving even lower bit rates than we realized in this study. We have so far employed a fixed code book for all speech data. A fixed code book is somewhat wasteful. Further efficiency could be gained by making the code book adaptive to the time-varying linear filters used to synthesize speech and to weight the error. The coding procedure is computationally very expensive; it took 125 sec of Cray-1 CPU time to process 1 sec of the speech signal. The program was, however, not optimized to run on the Cray. Most of the time was taken up by the search for the optimum innovation sequence. A code book with sufficient structure amenable to fast search algorithms could lead to real-time implementation of code-excited coders.
REFERENCES

[1] B. S. Atal, "Predictive coding of speech at low bit rates," IEEE Trans. Commun., vol. COM-30, pp. 600-614, April 1982.
[2] M. R. Schroeder and B. S. Atal, "Speech coding using efficient block codes," in Proc. Int. Conf. on Acoustics, Speech and Signal Proc., vol. 3, pp. 1668-1671, May 1982.
[3] H. G. Fehn and P. Noll, "Multipath search coding of stationary signals with applications to speech," IEEE Trans. Commun., vol. COM-30, pp. 687-701, April 1982.
[4] B. S. Atal and M. R. Schroeder, "Stochastic coding of speech signals at very low bit rates," in Proc. Int. Conf. Commun. (ICC '84), part 2, pp. 1610-1613, May 1984.
[5] S. Singhal and B. S. Atal, "Improving performance of multi-pulse LPC coders at low bit rates," in Proc. Int. Conf. on Acoustics, Speech and Signal Proc., vol. 1, paper no. 1.3, March 1984.