`IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 6, NO. 2, MARCH 1998
`
`Design and Description of CS-ACELP:
`A Toll Quality 8 kb/s Speech Coder
`
`Redwan Salami, Member, IEEE, Claude Laflamme, Jean-Pierre Adoul, Fellow, IEEE,
`Akitoshi Kataoka, Member, IEEE, Shinji Hayashi, Takehiro Moriya, Member, IEEE, Claude Lamblin,
Dominique Massaloux, Stéphane Proust, Peter Kroon, Fellow, IEEE, and Yair Shoham, Member, IEEE
`
Abstract—This paper describes the 8 kb/s speech coding algorithm G.729, which has recently been standardized by ITU-T. The algorithm is based on a conjugate-structure algebraic CELP (CS-ACELP) coding technique and uses 10 ms speech frames. The codec delivers toll-quality speech (equivalent to 32 kb/s ADPCM) for most operating conditions. This paper describes the coder structure in detail and discusses the reasons behind certain design choices. A 16-b fixed-point version has been developed as part of Recommendation G.729, and a summary of the subjective test results based on a real-time implementation of this version is presented.
`
`Index Terms—Analysis-by-synthesis, speech coding.
`
`I. INTRODUCTION
`
SINCE 1990, Study Group 15 (SG15) of the ITU-T has
been involved in a standardization process for a speech
`coding algorithm at 8 kb/s. The main applications for this
`coder are 1) personal communication systems (PCS), 2) digital
`satellite systems, and 3) other applications such as packetized
`speech and circuit multiplexing equipment. The speech quality
`produced by this coder should be equivalent to that of 32
`kb/s ADPCM (G.726) for most operating conditions. These
conditions include clean and noisy speech, multiple encodings,
level variations, and nonspeech inputs. The intended wireless
applications require that the coder be robust against channel
`errors. These errors could be either random or bursty, and the
`coder should be able to withstand them without introducing
`major annoying effects. Moreover, if the radio channels suffer
`from long fades, and complete frames are lost, the decoder
`should be able to conceal these missing frames with a minimal
`loss in speech quality.
`Two candidate algorithms were submitted: one from NTT
`[1]–[3] and the other from France Telecom CNET/University
`of Sherbrooke [4]. Both candidates were equivalent to (or
`better than) 32 kb/s ADPCM in most test conditions; however,
they failed some conditions. At the March 1994 meeting
`of SG15, both proponents agreed to join their efforts to
`
`produce a coder that combines the best features of both
`algorithms, and to undertake further research to meet all
`performance requirements. At this time, AT&T joined these
`algorithmic optimization efforts. A floating-point version of
`the resulting coder was tested in January 1995, and it was
`accepted at the ITU-T meeting in February 1995. In the final
`recommendation the algorithm is specified in terms of 16-
`b fixed-point arithmetic. This version was tested in October
`1995, and the recommendation was accepted for ratification
`in November 1995 [5].
`In this paper, we describe the important aspects of the
`algorithm, which is referred to as conjugate-structure algebraic
`CELP (CS-ACELP). Additional information can be found in
`[6]–[10]. The complete algorithm, including ANSI-C source
`code, can be found in [11].
`This paper is organized as follows. In Section II we describe
`the coding algorithm in detail. In Section III we describe
`features of this coder that were included to increase the
`robustness against transmission errors. Section IV reports on
`the performance, and Section V discusses implementation
`aspects. Finally, the conclusions are given in Section VI.
`
`II. DESCRIPTION OF THE CS-ACELP SPEECH CODER
`The coder is based on a code-excited linear prediction
`(CELP) coding model [12]. In this model the locally decoded
`signal is compared against the original signal and the coder
`parameters are selected such that the mean-squared weighted
`error between the original and reconstructed signal is mini-
`mized.
`The CS-ACELP coder is designed to operate with an
`appropriately bandlimited signal sampled at 8000 Hz. The
`input and output samples are represented using 16-b linear
`PCM. The coder operates on frames of 10 ms, using a 5 ms
`look-ahead for linear prediction (LP) analysis. This results in
`an overall algorithmic delay of 15 ms. The encoding principle
`is shown in Fig. 1. After processing the 16-b input samples
`through a 140 Hz highpass filter,
`tenth-order LP analysis
`is performed, and the LP parameters are quantized in the
line spectral frequency (LSF) domain [13] with 18 b [7]. The
`input frame is divided into two subframes of 5 ms each.
`The use of subframes allows better tracking of the pitch and
`gain parameters and reduces the complexity of the codebook
`searches. The quantized and unquantized LP filter coefficients
`are used for the second subframe while in the first subframe
`interpolated LP filter coefficients are used. For each subframe
`
`Manuscript received March 21, 1996; revised March 26, 1997. The associate
`editor coordinating the review of this manuscript and approving it for
`publication was Dr. W. Bastiaan Kleijn.
`R. Salami, C. Laflamme, and J.-P. Adoul are with the Department of
`Electrical Engineering, University of Sherbrooke, P.Q., Canada J1K 2R1.
`A. Kataoka, S. Hayashi, and T. Moriya are with NTT, Tokyo, Japan.
`C. Lamblin, D. Massaloux, and S. Proust are with France Telecom CNET,
`Lannion, France.
`P. Kroon and Y. Shoham are with Bell Laboratories, Lucent Technologies,
`Murray Hill, NJ 07974 USA (e-mail: kroon@research.bell-labs.com).
`Publisher Item Identifier S 1063-6676(98)01691-5.
`
`
`
`
`
`
`
`Fig. 1. Block diagram of the CS-ACELP encoder.
`
`the excitation is represented by an adaptive-codebook and a
`fixed-codebook contribution. The adaptive and fixed-codebook
`parameters are transmitted every subframe.
The adaptive-codebook component represents the periodicity in the excitation signal using a fractional pitch lag [14] with 1/3 sample resolution. The adaptive codebook is searched using a two-step procedure. An open-loop pitch lag is estimated once per frame based on the perceptually weighted speech signal. The adaptive-codebook index and gain are found by a closed-loop search around the open-loop pitch lag. The signal to be matched, referred to as the target signal, is computed by filtering the LP residual through the weighted synthesis filter.
`The adaptive-codebook index is encoded with 8 b in the
`first subframe and differentially encoded with 5 b in the
`second subframe. The target signal is updated by removing the
`adaptive-codebook contribution, and this new target is used
`in the fixed-codebook search. The fixed codebook is a 17-b
`algebraic codebook [10]. The gains of the adaptive and fixed
`codebook are vector quantized with 7 b using a conjugate-
`structure codebook [7] (with moving-average (MA) prediction
`applied to the fixed-codebook gain as in [4] and [15]). The bit
`allocation for a 10 ms frame is shown in Table I.
`The function of the decoder (see Fig. 2) consists of decoding
`the transmitted parameters (LP parameters, adaptive-codebook
`vector, fixed-codebook vector, and gains) and performing
`synthesis to obtain the reconstructed speech, followed by a
`postprocessing stage [8], consisting of an adaptive postfilter
`and a fixed highpass filter.
`
A. Preprocessing

The 16-b PCM input samples to the speech encoder are filtered with a second-order pole/zero highpass filter with a cutoff frequency of 140 Hz. This highpass filter prevents undesired low-frequency or DC components. To prevent overflow in the fixed-point implementation, the input values are divided by two. The filtered and scaled signal is referred to as s(n), and will be used in all subsequent encoder operations.

B. LP Analysis and Quantization

LP analysis is performed once per speech frame using the autocorrelation method [16] with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients of the windowed speech are computed and converted to LP coefficients using the Levinson–Durbin algorithm [16]. Then the LP coefficients are transformed to line spectral frequencies (LSF's) [13] for quantization and interpolation purposes. The interpolated quantized and unquantized LSF coefficients are converted back to LP coefficients to construct the synthesis and weighting filters for each subframe. The short-term analysis and synthesis filters are based on tenth-order LP filters. The LP synthesis filter is defined as

  1/Â(z) = 1 / (1 + Σ_{i=1}^{10} â_i z^{-i})    (1)

where â_i, i = 1, ..., 10, are the (quantized) LP coefficients.

1) Windowing and Autocorrelation Computation: The LP analysis window consists of two parts: the first part is half a Hamming window and the second part is a quarter of a cosine function cycle. The window is given by

  w_lp(n) = 0.54 - 0.46 cos(2πn/399),   n = 0, ..., 199
  w_lp(n) = cos(2π(n - 200)/159),       n = 200, ..., 239.    (2)

There is a 5 ms look-ahead in the LP analysis, which means that 40 samples are needed from the future speech frame. This translates into an extra algorithmic delay of 5 ms at the encoder stage. The use of an asymmetric window allows a reduction in the look-ahead without compromising quality [17]. The LP
`
`
`
`
`
`
`Fig. 2. Block diagram of the CS-ACELP decoder.
`
`TABLE I
`BIT ALLOCATION OF G.729 CS-ACELP FOR A 10 ms FRAME. FOR SOME
`PARAMETERS, THE NUMBER OF BITS FOR EACH 5 ms SUBFRAME IS
`IDENTIFIED. THE TOTAL NUMBER OF BITS FOR A 10 ms FRAME = 80
`
`analysis window is applied to 120 samples from past speech
`frames, 80 samples from the present speech frame, and 40
`samples from the future frame. The use of a 30 ms window
`was found to provide a smoother evolution of the LP filter,
`thereby providing better speech quality.
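As an illustration, the preprocessing of Section II-A can be sketched in floating point. The biquad coefficients below are an illustrative Butterworth design for a 140 Hz cutoff at 8 kHz sampling, not the fixed-point coefficients tabulated in the recommendation:

```python
# Sketch of the preprocessing stage: divide the input by two, then apply a
# second-order pole/zero highpass filter with a cutoff near 140 Hz.
# B and A are illustrative Butterworth-style values, NOT the G.729 tables.

B = (0.92519, -1.85038, 0.92519)   # numerator (zeros)
A = (1.0, -1.84478, 0.85600)       # denominator (poles)

def preprocess(x):
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        v *= 0.5                   # scale to prevent fixed-point overflow
        out = B[0] * v + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        x2, x1, y2, y1 = x1, v, y1, out
        y.append(out)
    return y
```

Because the numerator coefficients sum to zero, a constant (DC) input is rejected exactly once the transient has decayed.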
The autocorrelation coefficients of the windowed speech s'(n) = w_lp(n) s(n) are computed from

  r(k) = Σ_{n=k}^{239} s'(n) s'(n - k),  k = 0, ..., 10.    (3)

To avoid arithmetic problems for low-level input signals, the value of r(0) has a lower boundary of r(0) = 1.0. A 60 Hz bandwidth expansion [18] is applied by multiplying the autocorrelation coefficients with

  w_lag(k) = exp[ -(1/2) (2π f0 k / fs)² ],  k = 1, ..., 10    (4)

where f0 = 60 Hz is the bandwidth expansion and fs = 8000 Hz is the sampling frequency. The bandwidth expansion of the autocorrelation coefficients reduces the possibility of ill-conditioning in the Levinson algorithm (especially in fixed point). In addition, it reduces underestimation of the formant bandwidths, which could create undesirably sharp resonances. To further reduce the possibility of ill-conditioning due to the bandpass filtering of the input, the value of r(0) is multiplied by a white-noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB [19]. The modified autocorrelation coefficients are used to obtain the LP filter coefficients a_i, i = 1, ..., 10, by using the Levinson–Durbin algorithm [16].
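The windowing, autocorrelation, bandwidth expansion, and Levinson–Durbin steps above can be sketched as follows (floating point and unoptimized; the recommendation itself specifies bit-exact fixed-point code):

```python
# Illustrative sketch of the G.729-style LP analysis front end.
import math

def lp_window():
    # Eq (2): half Hamming (200 samples) followed by a quarter cosine cycle (40).
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / 399.0) for n in range(200)]
    w += [math.cos(2.0 * math.pi * (n - 200) / 159.0) for n in range(200, 240)]
    return w

def autocorr(s, order=10):
    # Eq (3) with the r(0) floor, white-noise correction, and lag window of eq (4).
    r = [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(order + 1)]
    r[0] = max(r[0], 1.0) * 1.0001
    f0, fs = 60.0, 8000.0
    for k in range(1, order + 1):
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
    return r

def levinson_durbin(r, order=10):
    # Returns [1, a_1, ..., a_order] for the filter 1/(1 + sum a_i z^-i) of eq (1),
    # together with the final prediction error energy.
    a, e = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / e
        a = [a[j] + k * a[i - j] for j in range(i + 1)] + [0.0] * (order - i)
        e *= 1.0 - k * k
    return a, e
```

In use, the window is applied sample by sample (windowed = [wi * si for wi, si in zip(lp_window(), s)]) before calling autocorr.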
2) Quantization of the LSF Coefficients: The LP filter coefficients a_i, i = 1, ..., 10, are converted to line spectral frequencies (LSF's) using Chebyshev polynomials [20]. In this procedure the roots are found in the cosine domain. Since the quantizer is vector quantization (VQ) based, it is more convenient to represent the LSF's as normalized radian frequencies. The relation between the two representations is given by

  q_i = cos(ω_i),  i = 1, ..., 10    (5)

where q_i are the LSF coefficients in the cosine domain, and ω_i are the LSF coefficients in the frequency domain.
`To keep the algorithmic delay as low as possible, the update
`of the LP coefficients is done every 10 ms. However, most
`speech spectra vary slowly in time, and a slower update (e.g.,
`20 ms) would provide a better tradeoff between spectral repre-
`sentation and bit rate. Since the higher update rate introduces
`a strong correlation between coefficients from frame to frame,
`a good compromise is to use predictive VQ. During onsets
`this correlation is not very strong. To accommodate both
`types of correlation the predictor switches between two modes,
`one representing a strong correlation and one representing a
`
`
`
`
`
`mild correlation. Another advantage of using a separate bit
`for the switch is that it effectively reduces the size of the
`codebook, thereby reducing the storage requirements. To limit
`propagation of channel errors, the predictor is based on an
`MA filter. The length of this filter was determined empirically
`using a large data base [2], and it was found that a fourth-order
`MA predictor forms a good compromise between performance
`and error propagation. The quantizer is organized as follows:
`a switched fourth-order MA prediction is used to predict the
`LSF coefficients of the current frame. The difference between
`the computed and predicted coefficients is quantized using a
two-stage vector quantizer. The first stage is a 10-D VQ using codebook L1 with 128 entries (7 b). The second stage is a 10-b VQ implemented as a split VQ using two 5-D codebooks, L2 and L3, containing 32 entries (5 b) each. The reason for using a nonsplit first stage is that it allows for the exploitation of the correlations between the first five and last five LSF coefficients. At the second stage these correlations are less strong, and the split reduces search time and storage requirements.
To explain the quantization process, it is convenient to first describe the decoding process. Each quantized value is obtained from the sum of two codewords, as follows:

  l̂_i = L1_i(k1) + L2_i(k2),        i = 1, ..., 5
  l̂_i = L1_i(k1) + L3_{i-5}(k3),    i = 6, ..., 10    (6)

where k1, k2, and k3 are the codebook indices. To guarantee that the reconstructed filters are stable, the vector l̂ is arranged such that adjacent elements have a minimum distance of J (see [11]). This rearrangement process is done twice: first with a value of J = 0.0012, then with a value of J = 0.0006. The incorporation of this process into the quantization procedure assures that each of the possible reconstructed vectors produces a stable filter. After this rearrangement process, the quantized LSF coefficients ω̂_i^(m) for the current frame m are obtained from the weighted sum of previous quantizer outputs l̂^(m-k) and the current quantizer output l̂^(m):

  ω̂_i^(m) = (1 - Σ_{k=1}^{4} p_{i,k}) l̂_i^(m) + Σ_{k=1}^{4} p_{i,k} l̂_i^(m-k)    (7)

where p_{i,k} are the coefficients of the switched MA predictor, as defined by the one-bit codebook index L0. At startup the initial values of l̂_i are given by l̂_i = iπ/11 for all k < 0.
After computing ω̂_i, the corresponding filter is checked for stability and unnatural sharp resonances by checking the ordering property (i.e., 0 < ω̂_1 < ω̂_2 < ... < ω̂_10 < π). If this condition is not met, the frequencies are moved using a heuristic process, which enforces a minimum spacing of 50 Hz between the coefficients [11].
The procedure for encoding the LSF parameters can be outlined as follows. For each of the two MA predictors the best approximation to the current LSF coefficients has to be found. The best approximation is defined as the one that minimizes the weighted mean-squared error

  E_lsf = Σ_{i=1}^{10} w_i (ω_i - ω̂_i)².    (8)

The weights w_i emphasize the relative importance of each LSF. Spectral resonances (closely spaced LSF's) are perceptually more important, and several (heuristic) procedures to derive these coefficients can be found in the literature (cf. [21]).
The following heuristic procedure was found to improve performance and to be computationally efficient. The weights are made adaptive as a function of the unquantized LSF coefficients:

  w_1  = 10(ω_2 - 0.04π - 1)² + 1        if ω_2 - 0.04π - 1 < 0, and 1.0 otherwise
  w_i  = 10(ω_{i+1} - ω_{i-1} - 1)² + 1  if ω_{i+1} - ω_{i-1} - 1 < 0, and 1.0 otherwise, i = 2, ..., 9
  w_10 = 10(-ω_9 + 0.92π - 1)² + 1       if -ω_9 + 0.92π - 1 < 0, and 1.0 otherwise.    (9)

In addition, the weights w_5 and w_6 are each multiplied by 1.2. The vector to be quantized for the current frame is obtained from

  l_i = [ ω_i - Σ_{k=1}^{4} p_{i,k} l̂_i^(m-k) ] / (1 - Σ_{k=1}^{4} p_{i,k}),  i = 1, ..., 10.    (10)
`
The first codebook L1 is searched, and the entry L1(k1) that minimizes the (unweighted) mean-squared error (MSE) is selected. This is followed by a search of the second codebook L2, which defines the lower part of the second stage. The weighted MSE of (8) is computed, and the vector L2(k2) which results in the lowest error is selected. Using the selected first-stage vector L1(k1) and the lower part of the second stage L2(k2), the higher part of the second stage is searched from codebook L3. The vector L3(k3) that minimizes the weighted MSE is selected. The resulting vector l̂_i, i = 1, ..., 10, is rearranged twice using the procedure outlined earlier. This process is done for each of the two MA predictors defined by L0, and the MA predictor that produces the lowest weighted MSE is selected.
`3) Interpolation of the LSF Coefficients: The quantized
`(and unquantized) LP coefficients are used for the second
`subframe. For the first subframe, the quantized (and unquan-
`tized) LP coefficients are obtained by linear interpolation of
`the corresponding parameters in the adjacent subframes. The
`interpolation is done on the LSF coefficients in the cosine
`domain rather than the frequency domain. Interpolating in
`either domain did not produce noticeable audible differences,
`and the cosine domain was selected because of ease of
`implementation. Once the LSF coefficients are quantized and
interpolated, they are converted back to the LP coefficients a_i.
`
C. Perceptual Weighting

The weighted speech signal sw(n) in a subframe is obtained by filtering the speech through a perceptual weighting filter W(z)
`
`
`
`
`
`
. This perceptual weighting filter [22] is based on the unquantized LP filter coefficients a_i and is given by

  W(z) = A(z/γ1) / A(z/γ2).    (11)

The use of the unquantized coefficients gives a weighting filter that better matches the original spectrum. The values of γ1 and γ2 modify the frequency response of the filter W(z), and thereby the amount of noise weighting. It is difficult to find fixed values of γ1 and γ2 that provide good performance for different input signal characteristics. For example, differences in the low-frequency cutoff would lead to different choices for these coefficients. Hence, the values of γ1 and γ2 are made a function of the spectral shape of the input signal. For signals with a lot of low-frequency energy, the amount of weighting is increased.
This adaptation is done once per 10 ms frame, but an interpolation procedure for each first subframe is used to smooth the adaptation process. The spectral shape is obtained from a second-order linear prediction filter, obtained as a by-product of the Levinson–Durbin recursion. The reflection coefficients k_i are converted to log area ratio (LAR) coefficients o_i by

  o_i = log [ (1 + k_i) / (1 - k_i) ].    (12)

The LAR coefficients are used because they have better interpolation properties than reflection coefficients [23]. The LAR coefficients corresponding to the current 10 ms frame are used for the second subframe. The LAR coefficients for the first subframe are obtained through linear interpolation with the LAR parameters from the previous frame. The spectral envelope is characterized as being either flat (F = 1) or tilted (F = 0). For each subframe this characterization is obtained by applying a threshold function to the LAR coefficients:

  F = 1 (flat) or 0 (tilted), decided by thresholds on o_1 and o_2 together with the value of F in the previous subframe.    (13)

The hysteresis introduced by taking into account the value of F in the previous subframe avoids rapid changes of this classification.
`
If the interpolated spectrum for a subframe is classified as flat (F = 1), the weight factors are set to γ1 = 0.94 and γ2 = 0.6. If the spectrum is classified as tilted (F = 0), the value of γ1 is set to 0.98, and the value of γ2 is adapted to the strength of the resonances in the LP synthesis filter, but is bounded between 0.4 and 0.7. If a strong resonance is present, the value of γ2 is set closer to the upper bound. This adaptation is done to reduce the amount of unmasked noise at the formant frequencies. The adaptation of γ2 is done by using a heuristic criterion based on the minimum distance between two successive LSF coefficients for the current subframe. The minimum distance is given by

  d_min = min [ω_{i+1} - ω_i],  i = 1, ..., 9.    (14)

The value of γ2 is computed using

  γ2 = -6.0 d_min + 1.0,  bounded by 0.4 ≤ γ2 ≤ 0.7.    (15)
`
The values of γ1 and γ2 for the different conditions were obtained through many informal listening experiments using expert listeners. The process was done in stages by selecting speech material that could be characterized as falling into one of three categories: 1) flat spectrum (F = 1); 2) tilted spectrum (F = 0) and no strong resonances; and 3) tilted spectrum (F = 0) with strong resonances. This allowed independent optimization of the weight factors. In all experiments, both single and double encodings were considered. In general, the improvements due to this adaptation of the weights are most noticeable for double encodings.
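The γ2 adaptation of (14) and (15) reduces to a few lines:

```python
# Sketch of the gamma2 adaptation for a tilted spectrum:
# gamma2 = -6.0 * d_min + 1.0, clipped to [0.4, 0.7].

def gamma2_from_lsf(omega):
    d_min = min(b - a for a, b in zip(omega, omega[1:]))   # eq (14)
    return min(0.7, max(0.4, -6.0 * d_min + 1.0))          # eq (15)
```

Closely spaced LSF's (a strong resonance) give a small d_min and hence a γ2 near the 0.7 upper bound; widely spaced LSF's drive γ2 toward 0.4.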
`
`D. Pitch Analysis
The pitch analysis technique described in [4] is used. An open-loop pitch lag T_op is estimated once per 10 ms frame using the weighted speech signal sw(n). The adaptive-codebook approach is used to represent the periodic component in the excitation signal. The selected adaptive-codebook vector is represented by an index, which corresponds to a certain fractional lag value.

For each subframe the target signal x(n) and the impulse response h(n) of the weighted synthesis filter are computed. A closed-loop adaptive-codebook search is performed in the first subframe around the index corresponding to the open-loop pitch lag estimate T_op. A 1/3 fractional sample resolution is used in the range [19 1/3, 84 2/3], and integers only are used in the range 85–143. It was found that this choice of resolution provides a good tradeoff between performance and bit rate. The adaptive-codebook index in the first subframe is encoded with 8 b. In the second subframe, a 1/3 fractional sample resolution is used in the range [int(T1) - 5 2/3, int(T1) + 4 2/3], where int(T1) is the integer part of the adaptive-codebook lag T1 in the first subframe. This range is adapted for the cases where T1 straddles the boundaries of the lag range. The lag in the second subframe is differentially encoded with 5 b. Since the open-loop pitch estimate provides a form of pitch tracking, the differential coding does not introduce noticeable degradations in the speech quality.
1) Open-Loop Pitch Lag Estimation: The open-loop pitch lag estimation uses the weighted speech signal sw(n) and is done as follows. In the first step, three maxima of the correlation

  R(k) = Σ_{n=0}^{79} sw(n) sw(n - k)    (16)

are found in the following three ranges: 1) k = 20, ..., 39; 2) k = 40, ..., 79; and 3) k = 80, ..., 143. Note that for n < k, signal values from the previous frame are used. The retained maxima R(t_i), where t_i, i = 1, 2, 3, are the lag values corresponding to the maxima in the three lag regions, are normalized through

  R'(t_i) = R(t_i) / sqrt( Σ_n sw²(n - t_i) ),  i = 1, 2, 3.    (17)
`
`
`
`
`
`
`The winner among the three normalized correlations is selected
`by favoring the lags with the values in the lower range. This is
`done by weighting the normalized correlations corresponding
`to the longer lag values. This procedure of dividing the lag
`range into three sections and favoring the smaller values is
`used to discourage choosing pitch multiples.
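The open-loop search can be sketched as follows: the best normalized correlation is found in each of the three lag ranges, and the shorter-lag winners are favored. The range weighting factors (1.0, 0.92, 0.85) are illustrative, as the text does not give the exact values:

```python
# Sketch of the three-range open-loop pitch search of (16)-(17).
import math

def open_loop_pitch(buf, frame=80):
    # buf holds past samples followed by the current frame; buf[-frame:] is "now".
    sw = buf[-frame:]
    best_rn, best_lag = None, None
    for lo, hi, w in ((20, 39, 1.0), (40, 79, 0.92), (80, 143, 0.85)):
        r_best, t_best = None, lo
        for t in range(lo, hi + 1):
            past = buf[-frame - t:-t]
            num = sum(a * b for a, b in zip(sw, past))        # eq (16)
            den = math.sqrt(sum(b * b for b in past)) or 1.0
            rn = num / den                                    # eq (17)
            if r_best is None or rn > r_best:
                r_best, t_best = rn, t
        r_best *= w                 # weight down the longer-lag ranges
        if best_rn is None or r_best > best_rn:
            best_rn, best_lag = r_best, t_best
    return best_lag
```

For a periodic input, the weighting makes the true lag win over its double, which sits in a longer-lag range.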
2) Computation of the Target Signal: The LP residual r(n) is given by

  r(n) = s(n) + Σ_{i=1}^{10} â_i s(n - i).    (18)

The target signal x(n) for the adaptive-codebook search is computed by filtering the LP residual r(n) through the combination of the synthesis filter 1/Â(z) and the weighting filter A(z/γ1)/A(z/γ2). After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the residual and excitation signals.
`3) Adaptive-Codebook Search: The adaptive-codebook pa-
`rameters (or pitch parameters) are the indices corresponding to
`a certain lag and gain. In the adaptive-codebook approach for
`implementing the pitch filter [24], the excitation is repeated
`for lags less than the subframe length. The use of fractional
`lags makes this process computationally expensive during the
`search stage. Hence, during the search the excitation beyond
`the duration of the pitch is extended by the LP residual.
`This procedure is simpler, and it was found that it produces
`similar results compared to using the adaptive-codebook for
`the complete subframe. Note that once the lag has been
`determined, the conventional adaptive-codebook approach is
`used to generate the adaptive-codebook vector.
For each 5 ms subframe, the lag is determined using a closed-loop analysis that minimizes the weighted MSE. In the first subframe the lag T1 is found by searching a small range (six samples) of values around the open-loop lag T_op (see Section II-D1). For the second subframe, the closed-loop adaptive-codebook search is done around the lag selected in the first subframe to find the optimal lag T2.
The closed-loop search minimizes the mean-squared weighted error between the original and reconstructed speech. This is achieved by maximizing the term

  R(k) = Σ_{n=0}^{39} x(n) y_k(n) / sqrt( Σ_{n=0}^{39} y_k(n) y_k(n) )    (19)

where x(n) is the target signal and y_k(n) is the past filtered excitation at delay k (past excitation convolved with h(n), where h(n) is the impulse response of the weighted synthesis filter W(z)/Â(z)).
For the determination of T2, and also for T1 if the optimum integer closed-loop lag is less than 85, the fractions around the optimum integer lag have to be tested. Instead of computing the correlation for each fractional lag value, only the normalized correlations (19) corresponding to integer lags are computed. The correlation values corresponding to the fractional lags are obtained through interpolation of the normalized correlation function in (19) using a finite impulse response (FIR) filter based on a Hamming windowed sin(x)/x function truncated at ±11 and padded with zeros at ±12. The lag value corresponding to the (interpolated) maximum correlation is selected, and the adaptive-codebook vector v(n) is computed by interpolating the past excitation signal u(n) at the given integer lag and fraction [14]. This interpolation filter is based on a Hamming windowed sin(x)/x function truncated at ±29 and padded with zeros at ±30. The filter has a cutoff frequency (-3 dB) at 3600 Hz in the oversampled domain. Although the lower cutoff frequency allows fewer taps for the realization of the filter, it makes it necessary to also filter the integer lags, thereby increasing complexity. However, it was found that the lowpass action of the filter resulted in a smoother quality of the reconstructed signal, which justified this tradeoff.
Once the adaptive-codebook index has been determined, the adaptive-codebook gain g_p is computed as

  g_p = Σ_{n=0}^{39} x(n) y(n) / Σ_{n=0}^{39} y(n) y(n),  bounded by 0 ≤ g_p ≤ 1.2    (20)

where x(n) is the target signal and y(n) is the filtered adaptive-codebook vector, obtained by convolving v(n) with h(n). The scaled and filtered adaptive-codebook vector is subtracted from x(n) to produce a new target vector x2(n).
`
`E. Fixed (Algebraic) Codebook
`A 17-b algebraic codebook is used for the fixed-codebook.
`Algebraic codebooks are deterministic codebooks in which the
`codebook vectors are determined from the transmitted index
`using simple algebra rather than lookup tables. This structure
`has advantages in terms of storage, search complexity, and
`robustness [25]–[27]. Each fixed-codebook vector contains
`four nonzero pulses. These pulses can assume the amplitudes
`and positions given in Table II, and are encoded separately
`using the bit allocation given in this table.
Let c_k be the codebook vector at index k. The optimum codeword is the one that maximizes the term

  T_k = (d^t c_k)² / (c_k^t Φ c_k)    (21)

where d = H^t x2 is the vector of correlations between the target signal x2(n) and the impulse response h(n) of the weighted synthesis filter, and Φ = H^t H is the matrix of correlations of h(n).
The structure of the codebook allows for a fast-search procedure, since the fixed-codebook vector c_k contains only four nonzero pulses whose amplitudes are ±1. The search is performed in four nested loops, where in each loop the contribution of a new pulse is added. The correlation in the numerator of (21) is given by

  C = Σ_{i=0}^{3} s_i d(m_i)    (22)

where m_i is the position of the ith pulse and s_i is its sign. The energy in the denominator of (21) is given by

  E = Σ_{i=0}^{3} φ(m_i, m_i) + 2 Σ_{i=0}^{2} Σ_{j=i+1}^{3} s_i s_j φ(m_i, m_j).    (23)
`
TABLE II
STRUCTURE OF THE 17-b FIXED CODEBOOK

  Pulse  Amplitude  Positions
  i0     ±1         0, 5, 10, 15, 20, 25, 30, 35            (3 + 1 b)
  i1     ±1         1, 6, 11, 16, 21, 26, 31, 36            (3 + 1 b)
  i2     ±1         2, 7, 12, 17, 22, 27, 32, 37            (3 + 1 b)
  i3     ±1         3, 8, 13, 18, 23, 28, 33, 38,
                    4, 9, 14, 19, 24, 29, 34, 39            (4 + 1 b)

The search complexity is greatly reduced by using the following procedure. The most likely amplitude of a pulse occurring at a certain position is estimated using d(n). More precisely, the amplitude of a pulse at a certain position is set a priori to the sign of d(n) at that position. This choice of signs for a given combination of pulses maximizes the correlation term in (21). Therefore, before entering the codebook search, the following steps are taken. First, the signal d(n) is decomposed into its absolute value d'(n) = |d(n)| and its sign, which characterizes the preselected pulse amplitudes at each of the 40 possible pulse positions. Second, the matrix Φ is modified in order to include the preset pulse amplitudes; that is, φ'(i, j) = sign[d(i)] sign[d(j)] φ(i, j). The correlation in (22) is now given by

  C = Σ_{i=0}^{3} d'(m_i)    (24)

and the energy in (23) is given by

  E = Σ_{i=0}^{3} φ'(m_i, m_i) + 2 Σ_{i=0}^{2} Σ_{j=i+1}^{3} φ'(m_i, m_j).    (25)
`
To evaluate all pulse positions, a total of 2^13 = 8192 combinations would need to be examined. To reduce the complexity, the search is focused on those combinations that are likely to provide a good match. Since the last loop has 16 possible pulse positions, it was decided to limit the number of times this loop is entered. In an exhaustive search this loop is entered 8 x 8 x 8 = 512 times to find the position of the fourth pulse for all possible combinations of three pulse positions. By keeping track of the relative contribution of the first three pulses toward the objective of maximizing (24), it is possible to make the (heuristic) decision not to enter the search loop for the fourth pulse.
This is done by setting a threshold based on the correlation C. The threshold is a function of the maximum absolute correlation and the average correlation due to the contribution of the first three pulses. The maximum correlation due to the contribution of the first three pulses is given by

  C_max = max d'(m0) + max d'(m1) + max d'(m2)    (26)

where the maxima are taken over the first three position tracks given in Table II. The average correlation due to the contribution of the first three pulses is given by

  C_av = (1/8) [ Σ d'(m0) + Σ d'(m1) + Σ d'(m2) ]    (27)

where the sums run over the positions in the first three tracks. The search threshold is given by

  thr = C_av + K (C_max - C_av).    (28)

This threshold value is preset before starting the codebook search. The last loop is searched only if the correlation due to the contribution of the first three pulses exceeds this threshold. Note that the value of K controls the number of times the last loop is entered. It was found that a value of K = 0.4 provides performance equivalent to an exhaustive search, in which the last loop is entered 512 times. At the threshold factor K = 0.4, the average number of times N that the last loop is entered is 60, and only 5% of the time does it exceed 90. The search complexity is determined by the number of codewords searched in a 10 ms frame (two subframes). To limit the worst-case complexity, the value of N in the two subframes is limited to a maximum value N_max = 180. The first subframe is allowed a maximum value of 105, and the second subframe is left with N_max - N1, where N1 is the value of N in the first subframe. Using this approach, the search is forced to stop only 1% of the time. The average worst case of N is 90 times per subframe. That is, the worst-case number of tested positions is 90 x 16 = 1440 out of 8192 combinations.
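The focused search described above can be sketched as follows. The track layout is the 17-b structure of Table II; the correlation matrix phi is assumed to be precomputed, and the pulse signs are preset from d(n) as in (24) and (25):

```python
# Sketch of the focused algebraic-codebook search: nested loops over four pulse
# tracks, with the threshold of (28) gating the 16-position last loop.
import itertools

TRACKS = [
    list(range(0, 40, 5)),                                  # i0
    list(range(1, 40, 5)),                                  # i1
    list(range(2, 40, 5)),                                  # i2
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),  # i3: 16 positions
]

def search(d, phi, k=0.4):
    dp = [abs(v) for v in d]                  # d'(n): magnitudes, signs preset
    sgn = [1 if v >= 0 else -1 for v in d]
    # Threshold (26)-(28) from the first three tracks.
    cmax = sum(max(dp[p] for p in t) for t in TRACKS[:3])
    cav = sum(sum(dp[p] for p in t) / len(t) for t in TRACKS[:3])
    thr = cav + k * (cmax - cav)
    best, best_q = None, -1.0
    for m0 in TRACKS[0]:
        for m1 in TRACKS[1]:
            for m2 in TRACKS[2]:
                c3 = dp[m0] + dp[m1] + dp[m2]
                if c3 < thr:
                    continue                  # skip the 16-position last loop
                for m3 in TRACKS[3]:
                    pos = (m0, m1, m2, m3)
                    c = c3 + dp[m3]           # correlation, eq (24)
                    e = sum(phi[i][i] for i in pos) + 2.0 * sum(
                        sgn[i] * sgn[j] * phi[i][j]
                        for i, j in itertools.combinations(pos, 2))  # eq (25)
                    q = c * c / max(e, 1e-12)
                    if q > best_q:
                        best_q, best = q, (pos, tuple(sgn[p] for p in pos))
    return best
```

With an identity phi (equal pulse energies), the search simply picks the largest |d(n)| in each track, with the matching signs.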
The periodicity in the excitation is provided by the adaptive-codebook contribution only. High-pitched voices will have more than one pitch pulse in a subframe. In that situation the fixed-codebook contribution could potentially reduce the amount of periodicity, leading to degraded speech quality. This effect is avoided by introducing periodicity in the fixed-codebook excitation by filtering the fixed-codebook vector through the filter 1/(1 - β z^{-T}), where T is the integer part of the pitch lag for the current subframe, and β is the adaptive-codebook gain. This modification of the fixed-codebook vectors is integrated in the search by filtering the impulse response h(n) with 1/(1 - β z^{-T}) before the codebook search (for lag values less than the subframe length). Since at that time the quantized adaptive-codebook gain is not known, it was found that the past quantized pitch gain, bounded by [0.2, 0.8], provided a good alternative.
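The harmonic enhancement amounts to one recursive line per sample, assuming T is smaller than the codevector length:

```python
# Sketch of filtering a codevector through 1/(1 - beta * z^-T):
# c(n) += beta * c(n - T) for n >= T, applied in place (recursively).

def sharpen(c, t, beta):
    c = list(c)
    for n in range(t, len(c)):
        c[n] += beta * c[n - t]
    return c
```

Because the filter is recursive, a single pulse spawns a decaying train of copies spaced T samples apart, injecting pitch periodicity into the fixed-codebook contribution.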
`
`F. Quantization of the Gains
`The adaptive-codebook gain and the fixed codebook gain are
`vector quantized using 7 b. This joint quantization provides a
`saving of about 2 b compared to scalar quantization. Based
`on informal comparisons, it was found that this quantization
`did not introduce any noticeable degradations to the speech
`quality compared to unquantized gains.
The gain codebook search is done by minimizing the mean-squared weighted error between the original and reconstructed speech, which is given by

  E = x^t x + g_p² y^t y + g_c² z^t z - 2 g_p x^t y - 2 g_c x^t z + 2 g_p g_c y^t z    (29)

where x is the target vector (see Section II-D2), and y and z are the filtered adaptive-codebook and fixed-codebook vectors, respectively.
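A sketch of the criterion (29), evaluated over a list of candidate gain pairs (the conjugate-structure search described below restricts which pairs are actually tried):

```python
# Sketch of gain selection by minimizing the weighted error of eq (29).

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def best_gains(x, y, z, candidates):
    # Precompute the six correlations once; each candidate costs a few multiplies.
    xx, yy, zz = dot(x, x), dot(y, y), dot(z, z)
    xy, xz, yz = dot(x, y), dot(x, z), dot(y, z)
    def err(gp, gc):
        return (xx + gp * gp * yy + gc * gc * zz
                - 2.0 * gp * xy - 2.0 * gc * xz + 2.0 * gp * gc * yz)
    return min(candidates, key=lambda g: err(*g))
```

Precomputing the correlations is what makes a full pass over the 7-b gain codebook affordable.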
1) Prediction of the Fixed-Codebook Gain: The fixed-codebook gains in adjacent frames are correlated. An efficient
`
`
`
`
`
`way to exploit this redundancy is to use a log-energy gain
`predictor [15]. This gain-predictor not only helps in reducing
`the dynamic range of the fixed-codebook gain, it also makes
`this range less dependent on the input level variations. The
`use of a moving-average filter reduces the propagation of
`channel errors.
The fixed-codebook gain g_c can be expressed as

  g_c = γ g_c'    (30)

where g_c' is a predicted gain based on previous fixed-codebook energies, and γ is a correction factor, which is coded for transmission. The mean energy of the fixed-codebook contribution is given by

  E = 10 log [ (1/40) Σ_{n=0}^{39} c²(n) ].
2) Codebook Search for Gain Quantization: The adaptive-codebook gain g_p and the factor γ are vector quantized using a two-stage conjugate-structured codebook [2]. The term conjugate refers to the fact that each input vector is quantized as a linear combination of both codebooks. Such a structure reduces both computational and memory requirements [28]. The first stage consists of a 3