`TELECOMMUNICATION
`STANDARD
`
`DRAFT
`pr ETS 300 726
`
`March 1996
`
`Source: ETSI TC-SMG
`
`Reference: DE/SMG-020660
`
`ICS: 33.060.50
`
`Key words: EFR, digital cellular telecommunications system, Global System for Mobile communications
`(GSM), speech
`
`Digital cellular telecommunications system;
`Enhanced Full Rate (EFR) speech transcoding
`(GSM 06.60)
`
`ETSI
`
`European Telecommunications Standards Institute
`
`ETSI Secretariat
`
`Postal address: F-06921 Sophia Antipolis CEDEX - FRANCE
`Office address: 650 Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
`X.400: c=fr, a=atlas, p=etsi, s=secretariat - Internet: secretariat@etsi.fr
`
`Tel.: +33 92 94 42 00 - Fax: +33 93 65 47 16
`
`Copyright Notification: No part may be reproduced except as authorized by written permission. The copyright and the
`foregoing restriction extend to reproduction in all media.
`
`© European Telecommunications Standards Institute 1996. All rights reserved.
`
`Draft prETS 300 726: March 1996 (GSM 06.60 version 5.0.0)
`
`Whilst every care has been taken in the preparation and publication of this document, errors in content,
`typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to
`"ETSI Editing and Committee Support Dept." at the address shown on the title page.
`
`
Contents

Foreword
1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
  3.1 Definitions
  3.2 Symbols
  3.3 Abbreviations
4 Outline description
  4.1 Functional description of audio parts
  4.2 Preparation of speech samples
    4.2.1 PCM format conversion
  4.3 Principles of the GSM enhanced full rate speech encoder
  4.4 Principles of the GSM enhanced full rate speech decoder
  4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
  5.1 Pre-processing
  5.2 Linear prediction analysis and quantisation
    5.2.1 Windowing and autocorrelation computation
    5.2.2 Levinson-Durbin algorithm
    5.2.3 LP to LSP conversion
    5.2.4 LSP to LP conversion
    5.2.5 Quantisation of the LSP coefficients
    5.2.6 Interpolation of the LSPs
  5.3 Open-loop pitch analysis
  5.4 Impulse response computation
  5.5 Target signal computation
  5.6 Adaptive codebook search
  5.7 Algebraic codebook structure and search
  5.8 Quantisation of the fixed codebook gain
  5.9 Memory update
  5.10 CRC-calculation
6 Functional description of the decoder
  6.1 Decoding and speech synthesis
  6.2 Post-processing
    6.2.1 Adaptive postfiltering
    6.2.2 Up-scaling
7 Variables, constants and tables in the C-code of the GSM EFR codec
  7.1 Description of the constants and variables used in the C code
8 Homing sequences
  8.1 Functional description
  8.2 Definitions
  8.3 Encoder homing
  8.4 Decoder homing
  8.5 Encoder home state
  8.6 Decoder home state
9 Bibliography
History
`
`Foreword
`
`This draft European Telecommunication Standard (ETS) has been produced by the Special Mobile Group
`(SMG) Technical Committee of the European Telecommunications Standards Institute (ETSI) and is now
`submitted for the Public Enquiry phase of the ETSI standards approval procedure.
`
`This draft ETS describes the detailed mapping between input blocks of 160 speech samples in 13-bit
`uniform PCM format to encoded blocks of 260 bits and from encoded blocks of 260 bits to output blocks
`of 160 reconstructed speech samples within the digital cellular telecommunications system.
`
This draft ETS corresponds to GSM technical specification GSM 06.60, version 5.0.0.
`
Proposed transposition dates

Date of latest announcement of this ETS (doa):                    3 months after ETSI publication
Date of latest publication of new National Standard
or endorsement of this ETS (dop/e):                               6 months after doa
Date of withdrawal of any conflicting National Standard (dow):    6 months after doa
`
`
1 Scope
`
`This Draft European Telecommunication Standard (ETS) describes the detailed mapping between input
`blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 260 bits and from
`encoded blocks of 260 bits to output blocks of 160 reconstructed speech samples. The sampling rate is
`8000 sample/s leading to a bit rate for the encoded bit stream of 13 kbit/s. The coding scheme is the
`so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP.
`
`This ETS also specifies the conversion between A-law PCM and 13-bit uniform PCM. Performance
`requirements for the audio input and output parts are included only to the extent that they affect the
`transcoder performance. This part also describes the codec down to the bit level, thus enabling the
`verification of compliance to the part to a high degree of confidence by use of a set of digital test
`sequences. These test sequences are described in GSM 06.54 [7] and are available on disks.
`
`In case of discrepancy between the requirements described in this ETS and the fixed point computational
`description (ANSI-C code) of these requirements contained in GSM 06.53 [6], the description in
`GSM 06.53 [6] will prevail.
`
`The transcoding procedure specified in this ETS is applicable for the enhanced full rate speech traffic
`channel (TCH) in the GSM system.
`
`In GSM 06.51 [5], a reference configuration for the speech transmission chain of the GSM enhanced full
`rate (EFR) system is shown. According to this reference configuration, the speech encoder takes its input
`as a 13-bit uniform PCM signal either from the audio part of the Mobile Station or on the network side,
`from the PSTN via an 8-bit/A-law to 13-bit uniform PCM conversion. The encoded speech at the output of
`the speech encoder is delivered to a channel encoder unit which is specified in GSM 05.03 [3]. In the
`receive direction, the inverse operations take place.
`
2 Normative references
`
`This ETS incorporates by dated and undated reference, provisions from other publications. These
`normative references are cited at the appropriate places in the text and the publications are listed
`hereafter. For dated references, subsequent amendments to or revisions of any of these publications
`apply to this ETS only when incorporated in it by amendment or revision. For undated references, the
`latest edition of the publication referred to applies.
`
[1] GSM 01.04 (ETR 100): "Digital cellular telecommunication system (Phase 2);
    Abbreviations and acronyms".

[2] GSM 03.50 (ETS 300 540): "Digital cellular telecommunication system (Phase 2);
    Transmission planning aspects of the speech service in the GSM Public
    Land Mobile Network (PLMN) system".

[3] GSM 05.03 (ETS 300 575): "Digital cellular telecommunication system
    (Phase 2); Channel coding".

[4] GSM 06.32 (ETS 300 580-6): "Digital cellular telecommunication system (Phase 2);
    Voice Activity Detection (VAD)".

[5] GSM 06.51 (prETS 300 723): "Digital cellular telecommunications system;
    Enhanced Full Rate (EFR) speech processing functions; General description".

[6] GSM 06.53 (prETS 300 724): "Digital cellular telecommunications system;
    ANSI-C code for the GSM Enhanced Full Rate (EFR) speech codec".

[7] GSM 06.54 (Work item DE/SMG-020654 prETS 300 725): "Digital cellular
    telecommunications system; Test vectors for the GSM Enhanced Full Rate
    (EFR) speech codec".

[8] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse
    code modulation; Pulse code modulation (PCM) of voice frequencies".

[9] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s adaptive differential pulse
    code modulation (ADPCM)".

3 Definitions, symbols and abbreviations

3.1 Definitions
`
`For the purpose of this ETS the following definitions apply.
`
`adaptive codebook:
`
`The adaptive codebook contains excitation vectors that are adapted for every
`subframe. The adaptive codebook is derived from the long term filter state. The
`lag value can be viewed as an index into the adaptive codebook.
`
`adaptive postfilter:
`
`This filter is applied to the output of the short term synthesis filter to enhance the
`perceptual quality of the reconstructed speech. In the GSM enhanced full rate
`codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and
a tilt compensation filter.
`
`algebraic codebook:
`
A fixed codebook where an algebraic code is used to populate the excitation
vectors (innovation vectors). The excitation contains a small number of nonzero
pulses with predefined interlaced sets of positions.
`
closed-loop pitch analysis: This is the adaptive codebook search, i.e., a process of estimating the pitch
(lag) value from the weighted input speech and the long term filter state. In the
closed-loop search, the lag is searched using an error minimisation loop
(analysis-by-synthesis). In the GSM enhanced full rate codec, closed-loop pitch search is
performed for every subframe.
`
`direct form coefficients: One of the formats for storing the short term filter parameters. In the GSM
`enhanced full rate codec, all filters which are used to modify speech samples
`use direct form coefficients.
`
`fixed codebook:
`
`The fixed codebook contains excitation vectors for speech synthesis filters. The
`contents of the codebook are non-adaptive (i.e., fixed). In the GSM enhanced
`full rate codec, the fixed codebook is implemented using an algebraic codebook.
`
`fractional lags:
`
`A set of lag values having sub-sample resolution. In the GSM enhanced full rate
`codec a sub-sample resolution of 1/6th of a sample is used.
`
`frame:
`
`A time interval equal to 20 ms (160 samples at an 8 kHz sampling rate).
`
`integer lags:
`
`A set of lag values having whole sample resolution.
`
`interpolating filter:
`
`An FIR filter used to produce an estimate of sub-sample resolution samples,
`given an input sampled with integer sample resolution.
`
inverse filter:

This filter removes the short term correlation from the speech signal. The filter
models an inverse frequency response of the vocal tract.

lag:

The long term filter delay. This is typically the true pitch period, or a multiple or
sub-multiple of it.

Line Spectral Frequencies:

(see Line Spectral Pair)

Line Spectral Pair:

Transformation of LPC parameters. Line Spectral Pairs are obtained by
decomposing the inverse filter transfer function A(z) into a set of two transfer
functions, one having even symmetry and the other having odd symmetry. The
Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of
these polynomials on the z-unit circle.

LP analysis window:

For each frame, the short term filter coefficients are computed using the high
pass filtered speech samples within the analysis window. In the GSM enhanced
full rate codec, the length of the analysis window is 240 samples. For each
frame, two asymmetric windows are used to generate two sets of LP
coefficients. No samples of the future frames are used (no lookahead).
`
`LP coefficients:
`
Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding
(LPC) coefficients) is a generic descriptive term for describing the short term
filter coefficients.
`
open-loop pitch search: A process of estimating the near optimal lag directly from the weighted speech
`input. This is done to simplify the pitch analysis and confine the closed-loop
`pitch search to a small number of lags around the open-loop estimated lags. In
`the GSM enhanced full rate codec, open-loop pitch search is performed every
`10 ms.
`
`residual:
`
`The output signal resulting from an inverse filtering operation.
`
`short term synthesis filter: This filter introduces, into the excitation signal, short term correlation which
`models the impulse response of the vocal tract.
`
`perceptual weighting filter: This filter is employed in the analysis-by-synthesis search of the codebooks.
`The filter exploits the noise masking properties of the formants (vocal tract
`resonances) by weighting the error less in regions near the formant frequencies
`and more in regions away from them.
`
`subframe:
`
`A time interval equal to 5 ms (40 samples at an 8 kHz sampling rate).
`
`vector quantisation:
`
`A method of grouping several parameters into a vector and quantising them
`simultaneously.
`
`zero input response:
`
`The output of a filter due to past inputs, i.e. due to the present state of the filter,
`given that an input of zeros is applied.
`
`zero state response:
`
The output of a filter due to the present input, given that no past inputs have
been applied, i.e., given the state information in the filter is all zeroes.
`
3.2 Symbols

For the purpose of this ETS the following symbols apply.

$A(z)$                                  The inverse filter with unquantised coefficients
$\hat{A}(z)$                            The inverse filter with quantised coefficients
$H(z) = 1/\hat{A}(z)$                   The speech synthesis filter with quantised coefficients
$a_i$                                   The unquantised linear prediction parameters (direct form coefficients)
$\hat{a}_i$                             The quantised linear prediction parameters
$W(z)$                                  The perceptual weighting filter (unquantised coefficients)
$\gamma_1, \gamma_2$                    The perceptual weighting factors
$F(z)$                                  Adaptive prefilter
$\beta$                                 The adaptive prefilter coefficient
$H_f(z)$                                The formant postfilter
$\gamma_n, \gamma_d$                    Control coefficient for the amount of the formant postfiltering
$H_t(z)$                                Tilt compensation filter
$\gamma_t$                              Control coefficient for the amount of the tilt compensation filtering
$\mu = \gamma_t k_1$                    A tilt factor, with $k_1$ being the first reflection coefficient
$H_{h1}(z)$                             Pre-processing high-pass filter
$w_1(n), w_2(n)$                        LP analysis windows
$w_{lag}(i)$                            Lag window for the autocorrelations (60 Hz bandwidth expansion)
$f_s$                                   The sampling frequency
$F_1(z)$                                Symmetric LSF polynomial
$F_2(z)$                                Antisymmetric LSF polynomial
$T_m(x)$                                An $m$th order Chebyshev polynomial
$f(i)$                                  The coefficients of either $F_1(z)$ or $F_2(z)$
$\mathbf{z}^{(1)}(n), \mathbf{z}^{(2)}(n)$   The mean-removed LSF vectors
$\mathbf{r}^{(1)}(n), \mathbf{r}^{(2)}(n)$   The LSF prediction residual vectors
$\mathbf{p}(n)$                         The predicted LSF vector
$w_i, i = 1, \ldots, 10$                LSP-quantisation weighting factors
$h(n)$                                  The impulse response of the weighted synthesis filter
$s'(n)$                                 The windowed speech signal
$s_w(n)$                                The weighted speech signal
$\hat{s}(n)$                            Reconstructed speech signal
$\hat{s}'(n)$                           The gain-scaled postfiltered signal
$\hat{s}_f(n)$                          Postfiltered speech signal (before scaling)
$x(n)$                                  The target signal for adaptive codebook search
$x_2(n)$                                The target signal for algebraic codebook search
$r(n)$                                  The LP residual signal
$c(n)$                                  The fixed codebook vector
$v(n)$                                  The adaptive codebook vector
$y(n) = v(n) * h(n)$                    The filtered adaptive codebook vector
$y_k(n)$                                The past filtered excitation
$u(n)$                                  The excitation signal
$\hat{u}'(n)$                           The gain-scaled emphasised excitation signal
$T_{op}$                                Open-loop lag
$t_{min}$                               Minimum lag search value
$t_{max}$                               Maximum lag search value
$R(k)$                                  Correlation term to be maximised in the adaptive codebook search
$T_k$                                   Correlation term to be maximised in the algebraic codebook search
$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$     The correlation between the target signal $x_2(n)$ and the impulse response
                                        $h(n)$, i.e., backward filtered target
$\mathbf{H}$                            The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower
                                        diagonals $h(1), \ldots, h(39)$
$\Phi = \mathbf{H}^t \mathbf{H}$        The matrix of correlations of $h(n)$
$d(n)$                                  The elements of the vector $\mathbf{d}$
$\phi(i, j)$                            The elements of the symmetric matrix $\Phi$
$\mathbf{c}_k$                          The innovation vector
$m_i$                                   The position of the $i$th pulse
$N_p$                                   The number of pulses
$b(n)$                                  The weighted sum of the normalised $d(n)$ vector and normalised long-term
                                        prediction residual
$d'(n)$                                 Sign extended backward filtered target
$E(n)$                                  The mean-removed innovation energy (in dB)
$\bar{E}$                               The mean of the innovation energy
$\tilde{E}(n)$                          The predicted energy
$[b_1\ b_2\ b_3\ b_4]$                  The MA prediction coefficients
$g_c$                                   The fixed-codebook gain
$g'_c$                                  The predicted fixed-codebook gain
$\hat{g}_c$                             The quantised fixed codebook gain
$g_p$                                   The adaptive codebook gain
$\hat{g}_p$                             The quantised adaptive codebook gain
$\gamma_{gc} = g_c / g'_c$              A correction factor between the gain $g_c$ and the estimated one $g'_c$
$\hat{\gamma}_{gc}$                     The optimum value for $\gamma_{gc}$
$g(D)$                                  A cyclic generator polynomial
$\gamma_{sc}$                           Gain scaling factor
$GF(2)$                                 Galois field of 2 elements

3.3 Abbreviations
`
`For the purposes of this ETS the following abbreviations apply. Further GSM related abbreviations may be
`found in GSM 01.04 [1].
`
ACELP   Algebraic Code Excited Linear Prediction
AGC     Adaptive Gain Control
CELP    Code Excited Linear Prediction
CRC     Cyclic Redundancy Check
FIR     Finite Impulse Response
ISPP    Interleaved Single-Pulse Permutation
LP      Linear Prediction
LPC     Linear Predictive Coding
LSF     Line Spectral Frequency
LSP     Line Spectral Pair
LTP     Long Term Predictor (or Long Term Prediction)
MA      Moving Average

4 Outline description
`
`This ETS is structured as follows:
`
`Section 4.1 contains a functional description of the audio parts including the A/D and D/A functions.
`Section 4.2 describes the conversion between 13-bit uniform and 8-bit A-law samples. Sections 4.3 and
`4.4 present a simplified description of the principles of the GSM EFR encoding and decoding process
`respectively. In section 4.5, the sequence and subjective importance of encoded parameters are given.
`
`Section 5 presents the functional description of the GSM EFR encoding, whereas section 6 describes the
`decoding procedures. Section 7 describes variables, constants and tables of the C-code of the GSM EFR
`codec.
`
4.1 Functional description of audio parts
`
The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following
elements:

1) Analogue to uniform digital PCM
   - microphone;
   - input level adjustment device;
   - input anti-aliasing filter;
   - sample-hold device sampling at 8 kHz;
   - analogue-to-uniform digital conversion to 13-bit representation.

The uniform format shall be represented in two's complement.

2) Uniform digital PCM to analogue
   - conversion from 13-bit/8 kHz uniform PCM to analogue;
   - a hold device;
   - reconstruction filter including x/sin(x) correction;
   - output level adjustment device;
   - earphone or loudspeaker.

In the terminal equipment, the A/D function may be achieved either

- by direct conversion to 13-bit uniform PCM format;
- or by conversion to 8-bit/A-law companded format, based on a standard A-law codec/filter
  according to ITU-T Recommendations G.711 [8] and G.714, followed by the 8-bit to 13-bit
  conversion as specified in section 4.2.1.
`
`For the D/A operation, the inverse operations take place.
`
`In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are
`concerned with PCM equipment located in the central parts of the network. When used in the terminal
`equipment, this ETS does not on its own ensure sufficient out-of-band attenuation. The specification of
`out-of-band signals is defined in GSM 03.50 [2] in section 2.
`
4.2 Preparation of speech samples
`
The encoder is fed with data comprising samples with a resolution of 13 bits left justified in a 16-bit
`word. The three least significant bits are set to '0'. The decoder outputs data in the same format. Outside
`the speech codec further processing must be applied if the traffic data occurs in a different representation.
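
The sample format described above can be illustrated with a minimal sketch. The function names are hypothetical and do not come from GSM 06.53 [6]; the sketch only shows how a signed 13-bit sample may be left justified in a 16-bit word, and how the three least significant bits of an existing 16-bit word may be cleared.

```python
def pack_13bit_sample(sample_13bit: int) -> int:
    """Left-justify a signed 13-bit sample in a 16-bit word (3 LSBs = 0)."""
    if not -4096 <= sample_13bit <= 4095:
        raise ValueError("sample does not fit in 13 bits (two's complement)")
    return sample_13bit << 3


def force_13bit_resolution(word_16bit: int) -> int:
    """Clear the three least significant bits of a signed 16-bit word."""
    if not -32768 <= word_16bit <= 32767:
        raise ValueError("value does not fit in 16 bits (two's complement)")
    return word_16bit & ~0x7
```

Data arriving in any other representation would need a conversion of this kind outside the speech codec before being passed to the encoder.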
`
`
4.2.1 PCM format conversion
`
`The conversion between 8-bit A-Law compressed data and linear data with 13-bit resolution at the speech
`encoder input shall be as defined in ITU-T Rec. G.711 [8].
`
`ITU-T Rec. G.711 [8] specifies the A-Law to linear conversion and vice versa by providing table entries.
`Examples on how to perform the conversion by fixed-point arithmetic can be found in ITU-T Rec. G.726
`[9]. Section 4.2.1 of G.726 [9] describes A-Law to linear expansion and section 4.2.7 of G.726 [9] provides
`a solution for linear to A-Law compression.
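
As an informal illustration of the A-law to linear expansion, the following sketch uses a widely circulated table-free formulation rather than the tables of ITU-T Rec. G.711 [8], which remain authoritative. The function name is an assumption made for this example. The result is scaled so that the 13-bit value is left justified in a 16-bit word, i.e. the three least significant bits are zero.

```python
def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit A-law code word to a linear sample.

    The returned value is a 13-bit sample left justified in 16 bits.
    """
    if not 0 <= a_val <= 0xFF:
        raise ValueError("A-law code word must be an 8-bit value")
    a_val ^= 0x55                      # undo the even-bit inversion of A-law
    t = (a_val & 0x0F) << 4            # 4-bit quantisation step within the segment
    seg = (a_val & 0x70) >> 4          # 3-bit segment number
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    # In A-law, a set sign bit denotes a positive sample.
    return t if a_val & 0x80 else -t
```

For example, the code words 0xD5 and 0x55 expand to the smallest positive and negative levels (+8 and -8 in this scaling), and 0xAA expands to the largest positive level (32256).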
`
4.3 Principles of the GSM enhanced full rate speech encoder
`
The codec is based on the code-excited linear predictive (CELP) coding model (see Bibliography). A 10th
order linear prediction (LP), or short-term, synthesis filter is used which is given by

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \qquad (1)$$

where $\hat{a}_i$, $i = 1, \ldots, m$, are the (quantised) linear prediction (LP) parameters, and $m = 10$ is the predictor
order. The long-term, or pitch, synthesis filter is given by

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}, \qquad (2)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the
so-called adaptive codebook approach.
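
The two synthesis filters of equations (1) and (2) can be sketched as direct-form recursions. This is a floating-point illustration only, with hypothetical function names; it assumes zero initial filter state unless a memory is supplied, handles integer pitch delays only, and is not the bit-exact fixed-point code of GSM 06.53 [6].

```python
def lp_synthesis(excitation, a_hat, memory=None):
    """All-pole filter 1/A^(z): s(n) = u(n) - sum_{i=1..m} a_hat[i-1] * s(n-i)."""
    m = len(a_hat)
    # past[0] holds s(n-1), past[1] holds s(n-2), and so on.
    past = list(memory) if memory is not None else [0.0] * m
    out = []
    for u in excitation:
        s = u - sum(a_hat[i] * past[i] for i in range(m))
        out.append(s)
        past = [s] + past[:-1]
    return out


def pitch_synthesis(innovation, g_p, T):
    """Filter 1/B(z) = 1/(1 - g_p z^-T): u(n) = c(n) + g_p * u(n - T)."""
    out = []
    for n, c in enumerate(innovation):
        delayed = out[n - T] if n >= T else 0.0   # zero initial state assumed
        out.append(c + g_p * delayed)
    return out
```

With a single coefficient `a_hat = [-0.5]`, a unit impulse produces the decaying sequence 1, 0.5, 0.25, ..., which is the expected impulse response of a first-order all-pole filter.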
`
`The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of
`the short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed
`(innovative) codebooks. The speech is synthesised by feeding the two properly chosen vectors from these
`codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is
`chosen using an analysis-by-synthesis search procedure in which the error between the original and
`synthesised speech is minimised according to a perceptually weighted distortion measure.
`
The perceptual weighting filter used in the analysis-by-synthesis search technique is given by

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad (3)$$

where $A(z)$ is the unquantised LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The
values $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ are used. The weighting filter uses the unquantised LP parameters while
the formant synthesis filter uses the quantised ones.
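
Since $A(z/\gamma) = 1 + \sum_{i=1}^{m} a_i \gamma^i z^{-i}$, the weighting filter of equation (3) only requires scaling each LP coefficient by a power of $\gamma$. The sketch below illustrates this in floating point; the function names are assumptions for the example, zero initial filter state is assumed, and the coefficient list excludes the leading 1 of $A(z)$.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i * gamma**i for i = 1..m."""
    return [a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def perceptual_weighting(speech, a, gamma1=0.9, gamma2=0.6):
    """Filter speech through W(z) = A(z/gamma1) / A(z/gamma2)."""
    num = bandwidth_expand(a, gamma1)   # FIR part, A(z/gamma1)
    den = bandwidth_expand(a, gamma2)   # recursive part, 1/A(z/gamma2)
    m = len(a)
    x_past = [0.0] * m
    y_past = [0.0] * m
    out = []
    for x in speech:
        fir = sum(num[i] * x_past[i] for i in range(m))
        iir = sum(den[i] * y_past[i] for i in range(m))
        y = x + fir - iir
        out.append(y)
        x_past = [x] + x_past[:-1]
        y_past = [y] + y_past[:-1]
    return out
```

Setting both factors to the same value makes numerator and denominator cancel, so the filter reduces to the identity, which is a convenient sanity check.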
`
`The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency
`of 8000 sample/s. At each 160 speech samples, the speech signal is analysed to extract the parameters
`of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These
`parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is
`synthesised by filtering the reconstructed excitation signal through the LP synthesis filter.
`
`
`The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame. The two
`sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantised using split matrix
`quantisation (SMQ) with 38 bits. The speech frame is divided into 4 subframes of 5 ms each (40
`samples). The adaptive and fixed codebook parameters are transmitted every subframe. The two sets of
`quantised and unquantised LP filters are used for the second and fourth subframes while in the first and
`third subframes interpolated LP filters are used (both quantised and unquantised). An open-loop pitch lag
`is estimated twice per frame (every 10 ms) based on the perceptually weighted speech signal.
`
Then the following operations are repeated for each subframe:

- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter
  $W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between
  LP residual and excitation (this is equivalent to the common approach of subtracting the zero input
  response of the weighted synthesis filter from the weighted speech signal).

- The impulse response, $h(n)$, of the weighted synthesis filter is computed.

- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$
  and impulse response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with
  1/6th of a sample resolution is used. The pitch lag is encoded with 9 bits in the first and third
  subframes and relatively encoded with 6 bits in the second and fourth subframes.

- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered
  adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search
  (to find the optimum innovation). An algebraic codebook with 37 bits is used for the innovative
  excitation.

- The gains of the adaptive and fixed codebook are scalar quantised with 4 and 5 bits respectively
  (with moving average (MA) prediction applied to the fixed codebook gain).

- Finally, the filter memories are updated (using the determined excitation signal) for finding the target
  signal in the next subframe.
`
`The bit allocation of the codec is shown in table 1. In each 20 ms speech frame, 260 bits are produced,
`corresponding to a bit rate of 13 kbit/s. Within these 260 bits, 8 bits are used for CRC error checking.
`More detailed bit allocation is presented in table 6. Note that the most significant bits (MSB) are always
`sent first.
`
Table 1: Bit allocation of the 13 kbit/s coding algorithm for 20 ms frame.

Parameter        1st & 3rd subframes   2nd & 4th subframes   Total per frame
2 LSP sets                                                   38
Parity bits                                                  8
Pitch delay      9                     6                     30
Pitch gain       4                     4                     16
Algebraic code   37                    37                    148
Codebook gain    5                     5                     20
Total                                                        260
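
The totals in table 1 can be checked with a few lines of arithmetic. The sketch below simply re-adds the per-subframe and per-frame entries of the table and derives the bit rate from the 20 ms frame length; the variable and function names are chosen for this example only.

```python
FRAMES_PER_SECOND = 50          # one frame every 20 ms

# Bits per subframe: (1st & 3rd subframes, 2nd & 4th subframes), from table 1.
PER_SUBFRAME_BITS = {
    "Pitch delay": (9, 6),
    "Pitch gain": (4, 4),
    "Algebraic code": (37, 37),
    "Codebook gain": (5, 5),
}

# Bits sent once per frame, from table 1.
PER_FRAME_BITS = {"2 LSP sets": 38, "Parity bits": 8}


def bits_per_frame() -> int:
    """Sum the table entries over the four subframes of one frame."""
    subframe_total = sum(2 * (odd + even) for odd, even in PER_SUBFRAME_BITS.values())
    return subframe_total + sum(PER_FRAME_BITS.values())


def bit_rate() -> int:
    """Bit rate in bit/s for the encoded stream."""
    return bits_per_frame() * FRAMES_PER_SECOND
```

The sum is 260 bits per frame, which at 50 frames per second gives the stated 13 kbit/s.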
`
4.4 Principles of the GSM enhanced full rate speech decoder
`
`The signal flow at the decoder is shown in figure 4. At the decoder, the transmitted indices are extracted
`from the received bitstream. The indices are decoded to obtain the coder parameters at each
`transmission frame. These parameters are the two LSP vectors, the 4 fractional pitch lags, the 4
innovative codevectors, and the 4 sets of pitch and innovative gains. The LSP vectors are converted to the
LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 40-sample
subframe:

- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their
  respective gains.

- the speech is reconstructed by filtering the excitation through the LP synthesis filter.
`
`Finally, the reconstructed speech signal is passed through an adaptive postfilter.
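
The two per-subframe decoder steps can be sketched as follows. This is a simplified floating-point illustration with hypothetical function names: parameter decoding, LSP interpolation and the adaptive postfilter are omitted, and the synthesis filter follows the sign convention of equation (1).

```python
def build_excitation(v, c, g_p, g_c):
    """Excitation u(n) = g_p * v(n) + g_c * c(n) for one subframe."""
    return [g_p * vn + g_c * cn for vn, cn in zip(v, c)]


def synthesise_subframe(v, c, g_p, g_c, a_hat, memory):
    """Reconstruct one subframe; returns (speech, updated filter memory).

    memory[0] holds the most recent past output sample s(n-1).
    """
    past = list(memory)
    speech = []
    for u in build_excitation(v, c, g_p, g_c):
        s = u - sum(a_hat[i] * past[i] for i in range(len(a_hat)))
        speech.append(s)
        past = [s] + past[:-1]
    return speech, past
```

Returning the updated memory lets the caller carry the filter state from one 40-sample subframe to the next, so that consecutive subframes join without discontinuities.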
`
4.5 Sequence and subjective importance of encoded parameters
`
`The encoder will produce the output information in a unique sequence and format, and the decoder must
`receive the same information in the same way. In table 6, the sequence of output bits s1 to s260 and the
`bit allocation for each parameter is shown.
`
`The different parameters of the encoded speech and their individual bits have unequal importance with
`respect to subjective quality. Before being submitted to the channel encoding function the bits have to be
`rearranged in the sequence of importance as given in table 7.
`
5 Functional description of the encoder
`
`In this section, the different functions of the encoder represented in figure 3 are described.
`
5.1 Pre-processing
`
`Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal
`down-scaling.
`
`Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the
`fixed-point implementation.
`
The high-pass filter serves as a precaution against undesired low frequency components. A filter with a
cut-off frequency of 80 Hz is used, and it is given by

$$H_{h1}(z) = \frac{0.92727435 - 1.8544941 z^{-1} + 0.92727435 z^{-2}}{1 - 1.9059465 z^{-1} + 0.9114024 z^{-2}}. \qquad (4)$$

Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of
$H_{h1}(z)$ by 2.
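
The combined down-scaling and high-pass filtering can be sketched as a second-order recursive filter whose numerator coefficients are halved. This is a floating-point illustration with zero initial state and a hypothetical function name; the normative behaviour is the fixed-point code of GSM 06.53 [6].

```python
# Coefficients of equation (4): numerator B and denominator A of H_h1(z).
B = (0.92727435, -1.8544941, 0.92727435)
A = (1.0, -1.9059465, 0.9114024)


def preprocess(samples):
    """Apply H_h1(z) with the numerator divided by 2 (down-scaling included)."""
    b0, b1, b2 = (b / 2.0 for b in B)
    x1 = x2 = y1 = y2 = 0.0            # filter state: past inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - A[1] * y1 - A[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

A constant (DC) input is almost entirely removed once the transient has decayed, whereas a component at half the sampling frequency passes with a gain close to 0.5, reflecting the factor of 2 down-scaling.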
`
5.2 Linear prediction analysis and quantisation
`
`Short-term prediction, o