`Saint Lawrence Communications
`Exhibit 2014
`
`REAL-TIME IMPLEMENTATION OF HIGH-QUALITY 32 KBPS WIDEBAND LD-CELP CODER
`
`Oded Gottesman
`Yair Shoham
`
`Speech Coding Research Department
`AT&T Bell Laboratories
`Murray Hill, New Jersey 07974
`
`ABSTRACT
The Wideband-Audio Low-Delay CELP (LD-CELP) coder produces speech
with quality as high as the CCITT 64 kb/s standard (G.722) at half
the bitrate. The computational load of the encoder is almost 900% of
the processor time of the 12.5 MIPS DSP32C, which makes a real-time
implementation impractical. We investigated Gain-Shape Vector
Quantization (GSVQ) in order to reduce the computational load of the
encoder. This paper describes a real-time implementation of the
LD-CELP encoder based on the AT&T SURFboard using two DSP32C
processors operating in parallel. A computational load of 180%
processor time has been achieved. The respective decoder requires
42% processor time. The implementation of a full-duplex coding
system requires three 12.5 MIPS Digital Signal Processors (DSPs) and
has a one-way coding delay of less than 1 ms. The coder also
performs well for non-speech wideband audio signals such as music.
`
`Keywords: Wideband, LD-CELP.
`
`1. INTRODUCTION
`The growing pool of ISDN applications intensifies the
`interest in new and more advanced coding algorithms for
`wideband speech [6, 7]. The major requirements expected from
`such coders are: a) the coded speech quality should be comparable
`to that of the G.722; b) the bitrate should be at least halved; and c)
the one-way coding delay should be minimal. The 32 kb/s
wideband-speech LD-CELP coder was shown to potentially satisfy
these requirements [10, 11]. However, the high computational load
of the encoder, which is approximately 900% of the processor time
of a 12.5 MIPS DSP [5], makes the
implementability of this algorithm questionable. With the present
`DSP technology, the use of several DSPs operating in parallel
`seems to be unavoidable even if the algorithm is greatly
`simplified. Therefore, we were challenged to implement the
`encoder using only two DSPs.
`
`In this paper we present a real-time DSP implementation of
`this coder and describe its performance. Section 2 gives a brief
`overview of the initial 32 kb/s wideband-speech LD-CELP
algorithm and analyzes its computational load. Section 3 shows
how we reduced the computational load. Section 4 describes the
development of the parallel-processing DSP software and reports its
processor-time and memory usage. Section 5 discusses the results of
the subjective performance test.
`
[Figure 1 block diagram. Encoder: excitation codebook (c1, c2, ..., cN),
selected vector cj, gain adaptor, synthesis filter 1/A(z), comparison
with the input sn, and noise-weighting filter W(z). Decoder: codebook,
gain adaptor, and synthesis filter 1/A(z).]

Figure 1. Fully backward adaptive LD-CELP coder
`
2. OVERVIEW OF THE 32 KB/S WIDEBAND-SPEECH LD-CELP ALGORITHM
The LD-CELP is essentially a backward-adaptive version of
the conventional CELP coder [1, 8, 12]. The basic structure of the
LD-CELP [3-5, 10, 11] is illustrated in Fig. 1. The LD-CELP
encoder implements a closed-loop (analysis-by-synthesis) search
procedure for finding the best excitation $c_j$ drawn from an
excitation codebook. Each candidate excitation vector is scaled by
the adaptive gain $\sigma$ [2, 5] and passed through the LPC filter
1/A(z), producing a synthetic output vector. The encoder selects the
excitation whose synthetic output vector $\hat{s}_n$ is the best match
to the input speech $s_n$, usually in a Weighted-Mean-Squared Error
`
`The European Conference on Speech Communication and Technology, EUROSPEECH, 1993 ©
`
`
Element                         Value
Sampling rate                   16 kHz
Coded data                      7 kHz audio
Bitrate                         32 kb/s (2 bit/sample)
Vector length                   5 samples (0.3125 ms)
LPC analysis                    Backward mode
LPC synthesis filter order      32
Noise-weighting filter order    16
Spectral-tilt filter order      2
Noise-weighting filter          $\gamma_z = 0.95$, $\gamma_p = 0.8$, $\gamma_t = 0.7$
Quantization                    Excitation signal, 10 bit
Pitch-synthesis filter          Not used
Adaptive predictive gain        Backward mode
Table 1. The Wideband LD-CELP configuration
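
The vector duration and the bitrate in Table 1 follow directly from the sampling rate, the vector length, and the 10-bit excitation index. The short sketch below reproduces that arithmetic; the constant and function names are illustrative and do not come from the original implementation.

```python
# Values taken from Table 1; all names here are illustrative assumptions.
SAMPLE_RATE_HZ = 16_000   # sampling rate
VECTOR_LENGTH = 5         # samples per excitation vector
INDEX_BITS = 10           # bits spent on each excitation vector


def vector_period_ms(sample_rate_hz, vector_length):
    """Duration of one excitation vector in milliseconds."""
    return 1000.0 * vector_length / sample_rate_hz


def bitrate_bps(sample_rate_hz, vector_length, index_bits):
    """Bitrate when index_bits are sent once per vector."""
    return sample_rate_hz * index_bits / vector_length


period_ms = vector_period_ms(SAMPLE_RATE_HZ, VECTOR_LENGTH)
rate_bps = bitrate_bps(SAMPLE_RATE_HZ, VECTOR_LENGTH, INDEX_BITS)
bits_per_sample = INDEX_BITS / VECTOR_LENGTH

print(period_ms, rate_bps, bits_per_sample)
```

With the Table 1 values this gives a 0.3125 ms vector, 32 kb/s, and 2 bits per sample.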
`
Process                        Encoder          Decoder
                               Real-Time (%)    Real-Time (%)
Convolution plus energy        429.36           0
VQ search                      341.56           0
LPC of order 32                61.67            61.67
Recursive autocorrelation      19.62            19.62
Impulse response               10.62            0
Zero input response            10.62            0
Update filters states          10.62            5.31
Pre-filter the input           5.57             0
Autocorrelation of order 3     2.37             0
LPC of order 2                 2.25             0
Weight filters                 2.05             2.05
Predictive gain update         1.66             1.66
Compute the VQ's target        0.32             0
Total Real-Time %              898.30           90.32
MIPS                           112.29           11.29

* All real-time % computations are with respect to DSP32C having 12.5 MIPS
Table 2. The computational complexity of the initial LD-CELP
coder [5].
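
The MIPS row of Table 2 is the real-time percentage scaled by the 12.5 MIPS rating of the DSP32C. As a quick check, the sketch below sums the encoder column and converts the total; the function name is hypothetical, and the small difference from the printed total is attributed to rounding of the individual rows.

```python
DSP_MIPS = 12.5  # DSP32C rating quoted in the table footnote


def percent_to_mips(real_time_percent, dsp_mips=DSP_MIPS):
    """Convert a real-time percentage into millions of instructions per second."""
    return real_time_percent / 100.0 * dsp_mips


# Encoder column of Table 2, in table order.
encoder_rows = [429.36, 341.56, 61.67, 19.62, 10.62, 10.62, 10.62,
                5.57, 2.37, 2.25, 2.05, 1.66, 0.32]

encoder_total = sum(encoder_rows)               # close to the printed 898.30
encoder_mips = percent_to_mips(898.30)          # close to the printed 112.29
decoder_mips = percent_to_mips(90.32)           # close to the printed 11.29

print(round(encoder_total, 2), round(encoder_mips, 2), round(decoder_mips, 2))
```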
`
Fig. 2 illustrates the complexity of the two investigated LD-CELP
encoders [5] as a function of the LPC update period. The initial
system performs an exhaustive search in a 10-bit codebook for the
best-matching shape vector $c_j(n)$, and is therefore denoted
Shape-VQ (SVQ). The quantized excitation vector r(n) for the SVQ
system is given by:

$$r(n) = \sigma\, c_j(n); \quad 0 \le n \le N-1 \tag{4}$$

where N is the vector length and $\sigma$ is the adaptive predictive
gain [2, 5] illustrated in Fig. 1. The second system performs a
Gain-Shape VQ (GSVQ) [3-5], where 7 bits are allocated to represent a
shape vector $q_j(n)$ and 3 bits are used for the gain factor $g_k$.
The quantized excitation vector r(n) for the GSVQ system is given by:

$$r(n) = \sigma\, g_k\, q_j(n); \quad 0 \le n \le N-1 \tag{5}$$
`
`(WMSE) sense. The WMSE matching is accomplished via the use
`of a noise-weighting filter W(z). The parameter j that describes
`the selected excitation vector is then transmitted to the decoder
`where the synthesis process is duplicated.
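
The closed-loop search described above can be sketched in a few lines. This is a simplified illustration rather than the coder's actual routine: it assumes zero filter memory for every vector (the real coder carries filter state between vectors), assumes the sign convention A(z) = 1 + a_1 z^-1 + ..., and uses hypothetical function and parameter names.

```python
def iir(b, a, x):
    """Direct-form filter with zero initial state; a[0] is assumed to be 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def search_codebook(target, codebook, gain, lpc, w_num, w_den):
    """Return (index, weighted error) of the best excitation vector.

    lpc holds the coefficients of A(z); w_num and w_den hold the
    numerator and denominator of the noise-weighting filter W(z).
    """
    best_j, best_err = None, float("inf")
    for j, c in enumerate(codebook):
        # Scale by the adaptive gain and synthesize through 1/A(z).
        synth = iir([1.0], lpc, [gain * v for v in c])
        # Weight the error with W(z) and measure its energy (WMSE).
        err = iir(w_num, w_den, [t - s for t, s in zip(target, synth)])
        energy = sum(v * v for v in err)
        if energy < best_err:
            best_j, best_err = j, energy
    return best_j, best_err


# Toy example: the target is exactly the synthesis of codebook entry 1.
codebook = [[1.0, 0, 0, 0, 0], [0, 1.0, 0, 0, 0], [-1.0, 0, 0, 0, 0]]
lpc = [1.0, -0.5]
target = iir([1.0], lpc, [2.0 * v for v in codebook[1]])
best_j, best_err = search_codebook(target, codebook, 2.0, lpc,
                                   [1.0, -0.9], [1.0, -0.6])
print(best_j, best_err)
```

Only the index of the winning vector needs to be transmitted, since the decoder holds the same codebook and derives the gain and filters from past output.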
`
`The parameters of the filters 1/A(z) and W(z) are determined
`via the LPC analysis applied to the recent past output speech in a
`backward-adaptive mode. The filter W(z) is important for
`achieving a high perceptual quality in CELP systems. The
conventional form of the noise-weighting filter, $W_c(z)$, is given
by [1, 3-5, 8, 10, 11]:

$$W_c(z) = \frac{1 + \sum_{k=1}^{P} a_k \gamma_z^k z^{-k}}{1 + \sum_{k=1}^{P} a_k \gamma_p^k z^{-k}}; \quad 0 \le \gamma_p \le \gamma_z \le 1 \tag{1}$$
`
where A(z) is the LPC polynomial. Such a filter has an inherent
limitation in concurrently modeling the formant structure and the
spectral tilt. Since at high frequencies the data is highly
unstructured and the initial unweighted SNR tends to be highly
negative [11], the noise-weighting filter is more critical in
wideband speech coding. Therefore, an enhanced form of the
noise-weighting filter [5, 10, 11] is used for the wideband-speech
LD-CELP coder, given by:
`
$$W(z) = W_c(z)\, T(z) \tag{2}$$

where T(z) is a tilt-controlling second-order section given by:

$$T(z) = \frac{1}{1 + \sum_{k=1}^{2} \delta_k \gamma_t^k z^{-k}} \tag{3}$$
`
where the coefficients $\{\delta_k\}_{k=1}^{2}$ are computed by
applying the standard LPC procedure to the first three correlation
coefficients of the current-frame LPC coefficients
$\{a_k\}_{k=0}^{P}$ [5, 10, 11]. The parameter $\gamma_t$ is used to
adjust the spectral tilt of T(z). Table 1 shows the configuration of
the wideband LD-CELP coder investigated and implemented [5].
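
The coefficient computations behind the weighting filter can be sketched as follows. This is a minimal illustration under stated assumptions: the LPC polynomial is taken as A(z) = 1 + a_1 z^-1 + ... with a[0] = 1, the "standard LPC procedure" is taken to be the Levinson-Durbin recursion, and the function names and the default parameter values (copied from Table 1) are choices made here, not the authors' code.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [ak * gamma ** k for k, ak in enumerate(a)]


def autocorrelation(x, max_lag):
    """Correlation coefficients of the sequence x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, d_1, ..., d_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def weighting_coefficients(a, gamma_z=0.95, gamma_p=0.8, gamma_t=0.7):
    """Return numerator and denominator of Wc(z) and the tilt polynomial."""
    numerator = bandwidth_expand(a, gamma_z)
    denominator = bandwidth_expand(a, gamma_p)
    # Order-2 LPC fit to the first three correlations of the a_k sequence.
    d = levinson_durbin(autocorrelation(a, 2), 2)
    tilt = bandwidth_expand(d, gamma_t)
    return numerator, denominator, tilt


num, den, tilt = weighting_coefficients([1.0, -0.9])
print(num, den, tilt)
```

Only three correlation values and an order-2 recursion are needed for the tilt section, which is consistent with the small cost of the "Autocorrelation of order 3" and "LPC of order 2" rows in the complexity tables.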
`
`3. COMPUTATIONAL LOAD REDUCTION
The computational complexity of our initial LD-CELP coder
is given in Table 2 [5]. It is measured as a percentage of 12.5
MIPS processor time. The overall complexity of the encoder is
approximately 900% real-time. The most intensive task (429.36%) is
the convolution of the synthesis filters with the entire set of
excitation vectors and the computation of the energy of the
resulting vectors [5]. The second most intensive task is the VQ
search (341.56%). We chose to reduce the complexity of these two
tasks by using Gain-Shape VQ [3-5]. An additional reduction of the
algorithm complexity may be obtained by performing the LPC
analysis once in every given number of vectors rather than every
vector.
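
One common way a gain-shape structure saves work is linearity: each shape vector is filtered once, and the gain candidates are then evaluated from a correlation and an energy term without further filtering. The sketch below illustrates that idea for 7 shape bits and 3 gain bits; it is an assumption about how such a search can be organized, not a description of the authors' routine, and all names are hypothetical.

```python
SHAPE_BITS = 7
GAIN_BITS = 3


def gsvq_search(target, filtered_shapes, gains, sigma):
    """Pick the (shape, gain) pair minimizing ||target - sigma*g*y||^2.

    filtered_shapes[j] is assumed to be shape vector j already passed
    through the weighted synthesis filter with zero state.
    """
    best_j, best_k, best_cost = None, None, float("inf")
    for j, y in enumerate(filtered_shapes):
        corr = sum(t * v for t, v in zip(target, y))   # once per shape
        energy = sum(v * v for v in y)                 # once per shape
        for k, g in enumerate(gains):
            s = sigma * g
            # Error energy minus the constant ||target||^2 term.
            cost = s * s * energy - 2.0 * s * corr
            if cost < best_cost:
                best_j, best_k, best_cost = j, k, cost
    return best_j, best_k, best_cost


# Filtering is needed for 2**7 shapes instead of 2**10 full vectors.
shapes_to_filter = 2 ** SHAPE_BITS
candidates = 2 ** SHAPE_BITS * 2 ** GAIN_BITS

best_j, best_k, best_cost = gsvq_search(
    [0.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 1.0, 2.0], 1.0)
print(shapes_to_filter, candidates, best_j, best_k)
```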
`
`
the processing stream performed by each DSP in the real-time
implementation of the selected GS LD-CELP encoder. The master
DSP handles the DMA with the analog interface. As soon as a
new vector of samples is filled, the first (master) DSP starts
the VQ processes. In the background, the LPC processes are
handled by the second (slave) DSP. The slave DSP is synchronized
to the master DSP so that the two share the VQ search. The arrows
denote parameter transfers between the two DSPs. The illustrated
process repeats every 4 vectors (the LPC update period); the 4
vectors in this period are denoted vector #1 to vector #4. Table 3
summarizes the complexity of our real-time GSVQ LD-CELP encoder
[5]. Table 4 summarizes the memory usage of the implementation.
`
[Figure 4 diagram: Processing Stream of 32 kb/s GSVQ LD-CELP encoder.
Rows: DMA, DSP #1 (Master), DSP #2 (Slave); segments: Vector #1 to
Vector #4; time axis ticks from 0 to 200.]
`
`Figure 2. The computational complexity vs LPC update rate for
`the two LD-CELP encoders [5].
`
Fig. 3 illustrates the output SNR obtained [5] for the respective
systems. The GSVQ encoder with an LPC update once every 4 vectors,
which requires 180% real-time, was selected for real-time
implementation on two-DSP hardware. The computational load of the
respective decoder is 42% processor time, so it is implemented on a
third DSP [5].
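
The processor count follows from rounding each load up to a whole number of DSPs. The sketch below shows that arithmetic with the loads quoted above; it assumes a load can be split evenly across processors and ignores how the tasks are actually partitioned, and the function name is hypothetical.

```python
import math


def dsps_needed(load_percent):
    """Smallest number of DSPs whose combined capacity covers the load."""
    return math.ceil(load_percent / 100.0)


encoder_dsps = dsps_needed(180.0)   # encoder load, % of one DSP
decoder_dsps = dsps_needed(42.0)    # decoder load, % of one DSP
full_duplex_dsps = encoder_dsps + decoder_dsps

print(encoder_dsps, decoder_dsps, full_duplex_dsps)
```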
`
`
`Figure 3. Output SNR vs LPC update rate for the two LD-CELP
`coders [5].
`
[Figure 4 diagram, continued: rows DSP #1 (Master) and DSP #2 (Slave);
time axis ticks from 200 to 400; axis label: Time (100% = 312.5 µs).]
`
`4. SURFBOARD IMPLEMENTATION
The original LD-CELP algorithm was written in the C language.
First we compiled and simulated the algorithm on a general-purpose
computer. Later we used the AT&T DSP32 C Language Compiler
to compile the entire C code to DSP32 assembly code [13]. We
ran this code on the AT&T DSP32 SURFboard. The encoding
algorithm was then divided into two parts, to distribute its
processing over two DSPs [5]. The first part includes the
processes that are directly related to the VQ search and are
performed every vector. We denote this class of processes VQ
processes. The second part includes the processes that are directly
related to the LPC analysis and are performed once in every given
number of vectors. We denote this class of processes LPC
processes. We ran these two parts of the algorithm on two DSPs,
each running a different part of the algorithm in a
master-slave manner. During this phase, the interface between the
two DSPs was developed. We were greatly helped at this phase by
a locally developed program called "dspx", which handled the
downloading and the I/O between the Unix environment and the
SURFboard. At this point we had completed the allocation and
scheduling of the tasks and interfaces performed by each of
the DSPs, but we still processed data files rather than real-time
sampled data. The next step was to take a conservative approach
and convert the C subroutines step by step into DSP32 assembly
code [13]. After all the C subroutines had been converted to
hand-optimized DSP32 subroutines, we wrote DSP32 assembly code
to handle the DMA for the real-time processing. Fig. 4 illustrates
`
`Figure 4. Processing Stream of 32 kb/s Wideband GSVQ LD-
`CELP encoder [5].
`
`5. RESULTS
The performance of the 32 kb/s wideband LD-CELP was
evaluated by comparing it to the 64 kb/s G.722 CCITT standard
wideband coder [9]. The test material included four male and four
female utterances. Each utterance was coded by the G.722 and by
the real-time LD-CELP to form a pair of utterances. Twenty-four
listeners took part in the test. Twelve of the listeners work in
speech processing and are well acquainted with this kind of test,
and were therefore denoted "trained" listeners. The other twelve
listeners, who are not experienced with this kind of test, were
denoted "naive" listeners. Each listener was asked to vote for the
utterance that sounded better in his or her judgment, or to split
the vote equally if no preference could be made. The final score of
each system was defined as its percentage of the total votes.
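
The scoring rule, including split votes, can be written down in a few lines. The sketch below is only an illustration of the rule as described; the vote labels and the sample ballots are made up for the example and are not data from the test.

```python
def ab_scores(votes, system_a="ld_celp", system_b="g722"):
    """Percentage of votes per system; a 'split' gives half a vote to each."""
    totals = {system_a: 0.0, system_b: 0.0}
    for vote in votes:
        if vote == "split":
            totals[system_a] += 0.5
            totals[system_b] += 0.5
        else:
            totals[vote] += 1.0
    count = len(votes)
    return {name: 100.0 * value / count for name, value in totals.items()}


# Hypothetical ballots, for illustration only.
scores = ab_scores(["ld_celp", "g722", "split", "g722"])
print(scores)
```

Because a split vote adds half a vote to each side, the two scores always sum to 100%.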
`
`
`
computational complexity at a reasonable level. The results of the
subjective A/B comparison tests indicate that the reproduced-speech
quality of the 32 kb/s GS LD-CELP is comparable to that of the
64 kb/s ADPCM standard. The two major advantages of our real-time
GS LD-CELP coder over the ADPCM standard are: a) it operates at
half the bitrate of the ADPCM standard; and b) it has an extremely
low one-way delay of less than 0.94 ms, compared to about 1.5 ms
for the ADPCM standard. This work presents a real-time coder that
can be an excellent candidate for wideband audio coding in
high-quality communication networks.
`
`REFERENCES
[1] B. S. Atal, M. R. Schroeder, "Code Excited Linear
Predictive (CELP): High Quality Speech at Very Low Bit Rates",
Proc. IEEE Int. Conf. ASSP, 1985, pp. 937-940.
[2] J. H. Chen and Allen Gersho, "Gain-Adaptive Vector
Quantization with Application to Speech Coding", IEEE Trans.
Comm., vol. 35 no. 9, September 1987, pp. 918-930.
[3] J. H. Chen, "A Robust Low-Delay CELP Speech Coder at
16 kb/s", Proc. GLOBECOM-89, vol. 2, November 1989, pp.
1237-1240.
[4] J. H. Chen et al., "Low-delay CELP coder for the CCITT
16 kb/s speech coding standard", IEEE SAC, vol. 10 no. 5, June
1992, pp. 830-849.
[5] O. Gottesman, "Algorithm Development and Real-Time
Implementation of High-Quality 32kbps Wideband Speech LD-CELP
Coder", MS Thesis, ECE Dept., Drexel University, January 1993.
[6] N. S. Jayant et al., "Coding of Wideband Speech", Proc.
2nd Europ. Conf. Speech. Commun. Technol., Sept 1991.
[7] N. S. Jayant, "Signal Compression: Technology Targets
and Research Directions", IEEE SAC, vol. 10 no. 5, June 1992,
pp. 796-818.
[8] Peter Kroon and Ed. F. Deprettere, "A Class of
Analysis-by-Synthesis Predictive Coder for High Quality Speech
Coding at Rates Between 4.8 and 16 kbits/s", IEEE J. SAC, vol. 6
no. 2, February 1988, pp. 353-363.
[9] P. Mermelstein, "G.722, A New CCITT Coding Standard
for Digital Transmission of Wideband Audio Signals", IEEE
Communications Magazine, January 1988, pp. 8-15.
[10] E. Ordentlich, "Low Delay Code Excited Linear Predictive
(LD-CELP) Coding of Wide Band Speech at 32Kbit/sec", MS
Thesis, EE Dept., MIT, March 1990.
[11] E. Ordentlich, Y. Shoham, "Low-delay code-excited
linear-predictive coding of wideband speech at 32 kbps", Proc.
ICASSP-91, pp. 9-12.
[12] Y. Shoham, "Constrained-Stochastic-Excitation Coding of
Speech at 4.8 Kb/s", in B. S. Atal et al., editor, Advances in
Speech Coding, Kluwer Academic Publishers, 1990, pp. 339-348.
[13] "WE® DSP32C Digital Signal Processor - Information
Manual", AT&T, January 1990.
`
Process                        Encoder          Decoder
                               Real-Time (%)    Real-Time (%)
Convolution plus energy        13.42            0
VQ search                      88.70            0
LPC of order 32                15.42            15.42
Recursive autocorrelation      18.83            18.83
Impulse response               2.66             0
Zero input response            10.62            0
Update filters states          10.62            5.31
Pre-filter the input           5.57             0
Autocorrelation of order 3     0.59             0
LPC of order 2                 0.56             0
Weight filters                 1.02             0.51
Predictive gain update         1.66             1.66
Compute the VQ's target        0.32             0
DSP interface                  10.03            0
Total Real-Time %              180.03           41.73
MIPS                           22.50            5.22

* All real-time % computations are with respect to DSP32C having 12.5 MIPS
Table 3. The computational complexity of the real-time GSVQ LD-CELP
coder [5].
`
Block \ System    Encoder DSP#1    Encoder DSP#2    Decoder
Program           4476             3460             2402
Data              7556             7172             6940
Total             12032            10632            9342
Table 4. Memory usage of the Wideband LD-CELP (in bytes) [5]
`
Type of input               32 kb/s GSVQ     64 kb/s ADPCM
                            LD-CELP (%)      (G.722) (%)
Total Score                 45.57            54.43
Trained Listeners Score     41.15            58.85
Naive Listeners Score       50.00            50.00
Male utterances only        46.61            53.39
Female utterances only      44.53            55.47
Table 5. A/B-test results for 32 kb/s GS LD-CELP vs 64 kb/s
ADPCM (G.722) [5].
`
The experimental results of the subjective test are
summarized in Table 5 [5]. The total results indicate that, on
average, our real-time 32 kb/s coder and the 64 kb/s ADPCM
standard, which operates at twice the bit rate, provide comparable
speech quality (45.57% versus 54.43% of the votes). Among naive
listeners, the two systems performed alike on average. We may,
therefore, be able to halve the bitrate while preserving the high
quality of the reproduced speech. Another observation is that
LD-CELP does better on male utterances (46.61%) than on female
utterances (44.53%).
`
6. CONCLUSIONS
`The main conclusion of this work is that high-quality coding
`of wideband audio at 32 kb/s is feasible while keeping the
`
`
`