`
(12) United States Patent
Lin et al.

(10) Patent No.: US 7,599,832 B2
(45) Date of Patent: *Oct. 6, 2009
`
`(54) METHOD AND DEVICE FOR ENCODING
`SPEECH USING OPEN-LOOP PITCH
`ANALYSIS
`
(75) Inventors: Daniel Lin, Montville, NJ (US); Brian M. McCarthy, Lafayette Hill, PA (US)
`(73) Assignee: InterDigital Technology Corporation,
`Wilmington, DE (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 22 days.
This patent is subject to a terminal disclaimer.
`
(51) Int. Cl.
     G10L 19/12 (2006.01)
(52) U.S. Cl. ...................................................... 704/219
(58) Field of Classification Search .................. 704/219
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
3,617,636 A 11/1971 Ogihara
(Continued)

FOREIGN PATENT DOCUMENTS
WO 86/08726 6/1986
WO 86 08 72 6 6/1986

(21) Appl. No.: 11/363,807
(22) Filed: Feb. 28, 2006
(65) Prior Publication Data
     US 2006/0143003 A1 Jun. 29, 2006
`
`Related U.S. Application Data
`(63) Continuation of application No. 10/924,398, filed on
`Aug. 23, 2004, now Pat. No. 7,013,270, which is a
`continuation of application No. 10/446,314, filed on
`May 28, 2003, now Pat. No. 6,782,359, which is a
`continuation of application No. 10/083,237, filed on
`Feb. 26, 2002, now Pat. No. 6,611,799, which is a
`continuation of application No. 09/805,634, filed on
`Mar. 14, 2001, now Pat. No. 6,385,577, which is a
`continuation of application No. 09/441,743, filed on
`Nov. 16, 1999, now Pat. No. 6,223,152, which is a
`continuation of application No. 08/950,658, filed on
`Oct. 15, 1997, now Pat. No. 6,006,174, which is a
`continuation of application No. 08/670,986, filed on
`Jun. 28, 1996, now abandoned, which is a continuation
`of application No. 08/104,174, filed on Aug. 9, 1993,
`now abandoned, which is a continuation of application
`No. 07/592,330, filed on Oct. 3, 1990, now Pat. No.
`5,235,670.
`
`OTHER PUBLICATIONS
`Proc. ICASSP '82, A New Model of LPC Excitation for Producing
`Natural-Sounding Speech at Low Bit Rates, B.S. Atal and J.R.
`Remde, pp. 614-617, Apr. 1982.
`(Continued)
`Primary Examiner–Susan McFadden
`(74) Attorney, Agent, or Firm—Volpe and Koenig, P.C.
`
`(57)
`
`ABSTRACT
`
`The present invention is a synthetic speech encoding device
`that produces a synthetic speech signal which closely
`matches an actual speech signal. The actual speech signal is
`digitized, and excitation pulses are selected by minimizing
`the error between the actual and synthetic speech signals. The
`preferred pattern of excitation pulses needed to produce the
`synthetic speech signal is obtained by using an excitation
`pattern containing a multiplicity of weighted pulses at timed
`positions. The selection of the location and amplitude of each
`excitation pulse is obtained by minimizing an error criterion
`between the synthetic speech signal and the actual speech
`signal. The error criterion function incorporates a perceptual
`weighting filter which shapes the error spectrum.
`
`16 Claims, 12 Drawing Sheets
`
[Representative drawing: 8 kbps multipulse speech coder, transmitter]
`
`Ex. 1023 / Page 1 of 19
`Apple v. Saint Lawrence
`
`
`
U.S. PATENT DOCUMENTS
4,058,676         11/1977   Wilkes et al.
4,618,982         10/1986   Horvath et al.
4,669,120         5/1987    Ono
4,731,846         3/1988    Secrest et al.
4,776,015         10/1988   Takeda et al.
4,797,925         1/1989    Lin
4,815,134         3/1989    Picone et al.
4,845,753         7/1989    Yasunaga
4,868,867         9/1989    Davidson et al.
4,890,327         12/1989   Bertrand et al.
4,980,916         12/1990   Zinser
4,991,213         2/1991    Wilson
5,001,759         3/1991    Fukui
5,027,405         6/1991    Ozawa
5,127,053         6/1992    Koch
5,235,670         8/1993    Lin et al.
5,265,167         11/1993   Akamine et al.
5,307,441         4/1994    Tzeng
5,327,520         7/1994    Chen
5,568,512         10/1996   McCree et al.
5,675,702         10/1997   Gerson et al.
5,999,899         12/1999   Robinson
6,014,622         1/2000    Su et al. ............... 704/219
6,148,282         11/2000   Paksoy et al. ........... 704/223
6,243,672 B1 *    6/2001    Iijima et al. ........... 704/207
6,246,979 B1      6/2001    Carl
6,345,248 B1 *    2/2002    Su et al. ............... 704/223
6,591,234 B1      7/2003    Chandran et al.
6,633,839 B2      10/2003   Kushner et al.
7,254,533 B1 *    8/2007    Jabri et al. ............ 704/219
`OTHER PUBLICATIONS
`
Proc. ICASSP '84, Improving Performance of Multi-Pulse Coders at Low Bit Rates, S. Singhal and B.S. Atal, paper 13, Mar. 1984.
Proc. ICASSP '84, Efficient Computation and Encoding of the Multiple Excitation for LPC, M. Berouti et al., paper 10.1, Mar. 1984.
Proc. ICASSP '86, Implementation of Multi-Pulse Coder on a Single Chip Floating-Point Signal Processor, H. Alrutz, paper 44.3, Apr. 1986.
Digital Telephony, John Bellamy, pp. 153-154, 1991.
Veeneman et al., "Computationally Efficient Stochastic Coding of Speech," 1990, IEEE 40th Vehicular Technology Conference, May 1990, pp. 331-335.
Chazan et al., "Speech Reconstruction from Mel Frequency Cepstral Coefficients and Pitch Frequency," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1299-1302 (Jun. 2000).
`
`* cited by examiner
`
U.S. Patent    Oct. 6, 2009    Sheets 1 to 12 of 12    US 7,599,832 B2

[Sheet 1 of 12: drawing text not recoverable.]
[Sheet 2 of 12: FIG. 2. Legible labels: block 48; "SAMPLING RATE = 8000 Hz"; "T sec."]
[Sheet 3 of 12: legible labels: "1st sub-frame", "2nd sub-frame".]
[Sheets 4 to 6 of 12: drawing text not recoverable.]
[Sheet 7 of 12: FIG. 7. Legible steps: "CALCULATE ..."; "SOLVE MATRIX ... USING CHOLESKI DECOMPOSITION".]
[Sheet 8 of 12: FIG. 8. Legible labels: inputs h(n) and x(n); "COMPUTE C(i) = x(n)h(n-i)"; "B(i) = B(i)/G"; "MULTIPULSE ANALYSIS".]
[Sheets 9 to 11 of 12: drawing text not recoverable.]
[Sheet 12 of 12: FIG. 12. Drawing text not recoverable.]
`
`
`
METHOD AND DEVICE FOR ENCODING SPEECH USING OPEN-LOOP PITCH ANALYSIS

This application is a continuation of U.S. patent application Ser. No. 10/924,398, filed Aug. 23, 2004, which is a continuation of U.S. patent application Ser. No. 10/446,314, filed May 28, 2003, now U.S. Pat. No. 6,782,359, which is a continuation of U.S. patent application Ser. No. 10/083,237, filed Feb. 26, 2002, now U.S. Pat. No. 6,611,799, which is a continuation of U.S. patent application Ser. No. 09/805,634, filed Mar. 14, 2001, now U.S. Pat. No. 6,385,577, which is a continuation of U.S. patent application Ser. No. 09/441,743, filed Nov. 16, 1999, now U.S. Pat. No. 6,223,152, which is a continuation of U.S. patent application Ser. No. 08/950,658, filed Oct. 15, 1997, now U.S. Pat. No. 6,006,174, which is a continuation of U.S. patent application Ser. No. 08/670,986, filed Jun. 28, 1996, which is a continuation of U.S. patent application Ser. No. 08/104,174, filed Aug. 9, 1993, which is a continuation of U.S. patent application Ser. No. 07/592,330, filed Oct. 3, 1990, now U.S. Pat. No. 5,235,670, which applications are incorporated herein by reference.

BACKGROUND

This invention relates to digital voice coders performing at relatively low voice rates but maintaining high voice quality. In particular, it relates to improved multipulse linear predictive voice coders.

The multipulse coder incorporates the linear predictive all-pole filter (LPC filter). The basic function of a multipulse coder is finding a suitable excitation pattern for the LPC all-pole filter which produces an output that closely matches the original speech waveform. The excitation signal is a series of weighted impulses. The weight values and impulse locations are found in a systematic manner. The selection of a weight and location of an excitation impulse is obtained by minimizing an error criterion between the all-pole filter output and the original speech signal. Some multipulse coders incorporate a perceptual weighting filter in the error criterion function. This filter serves to frequency weight the error, which in essence allows more error in the formant regions of the speech signal and less in low energy portions of the spectrum. Incorporation of pitch filters improves the performance of multipulse speech coders. This is done by modeling the long term redundancy of the speech signal, thereby allowing the excitation signal to account for the pitch related properties of the signal.

SUMMARY

The present invention is a synthetic speech encoding device that produces a synthetic speech signal which closely matches an actual speech signal. The actual speech signal is digitized, and excitation pulses are selected by minimizing the error between the actual and synthetic speech signals. The preferred pattern of excitation pulses needed to produce the synthetic speech signal is obtained by using an excitation pattern containing a multiplicity of weighted pulses at timed positions. The selection of the location and amplitude of each excitation pulse is obtained by minimizing an error criterion between the synthetic speech signal and the actual speech signal. The error criterion function incorporates a perceptual weighting filter which shapes the error spectrum.

BRIEF DESCRIPTION OF THE DRAWING(S)

FIG. 1 is a block diagram of an 8 kbps multipulse LPC speech coder.
FIG. 2 is a block diagram of a sample/hold and A/D circuit used in the system of FIG. 1.
FIG. 3 is a block diagram of the spectral whitening circuit of FIG. 1.
FIG. 4 is a block diagram of the perceptual speech weighting circuit of FIG. 1.
FIG. 5 is a block diagram of the reflection coefficient quantization circuit of FIG. 1.
FIG. 6 is a block diagram of the LPC interpolation/weighting circuit of FIG. 1.
FIG. 7 is a flow chart diagram of the pitch analysis block of FIG. 1.
FIG. 8 is a flow chart diagram of the multipulse analysis block of FIG. 1.
FIG. 9 is a block diagram of the impulse response generator of FIG. 1.
FIG. 10 is a block diagram of the perceptual synthesizer circuit of FIG. 1.
FIG. 11 is a block diagram of the ringdown generator circuit of FIG. 1.
FIG. 12 is a diagrammatic view of the factorial tables address storage used in the system of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention incorporates improvements to the prior art of multipulse coders, specifically, a new type LPC spectral quantization, pitch filter implementation, incorporation of pitch synthesis filter in the multipulse analysis, and excitation encoding/decoding.

Shown in FIG. 1 is a block diagram of an 8 kbps multipulse LPC speech coder, generally designated 10.

It comprises a pre-emphasis block 12 to receive the speech signals s(n). The pre-emphasized signals are applied to an LPC analysis block 14 as well as to a spectral whitening block 16 and to a perceptually weighted speech block 18.

The output of the block 14 is applied to a reflection coefficient quantization and LPC conversion block 20, whose output is applied both to the bit packing block 22 and to an LPC interpolation/weighting block 24.

The output from block 20 to block 24 is indicated at α and the outputs from block 24 are indicated at α, α¹ and at α_ρ, α¹_ρ.

The signal α, α¹ is applied to the spectral whitening block 16 and the signal α_ρ, α¹_ρ is applied to the impulse generation block 26.

The output of spectral whitening block 16 is applied to the pitch analysis block 28 whose output is applied to quantizer block 30. The quantized output P from quantizer 30 is applied to the bit packer 22 and also as a second input to the impulse response generation block 26. The output of block 26, indicated at h(n), is applied to the multipulse analysis block 32.

The perceptual weighting block 18 receives both outputs from block 24 and its output, indicated at s_p(n), is applied to an adder 34 which also receives the output r(n) from a ringdown generator 36. The ringdown component r(n) is a fixed signal due to the contributions of the previous frames. The output x(n) of the adder 34 is applied as a second input to the multipulse analysis block 32. The two outputs E and G of the multipulse analysis block 32 are fed to the bit packing block 22.

The signals α, α¹, P and E, G are fed to the perceptual synthesizer block 38 whose output y(n), comprising the combined weighted reflection coefficients, quantized spectral coefficients and multipulse analysis signals of previous frames, is applied to the block delay N/2 40. The output of block 40 is applied to the ringdown generator 36.
`
`
`
The output of the block 22 is fed to the synthesizer/postfilter 42.

The operation of the aforesaid system is described as follows: The original speech is digitized using sample/hold and A/D circuitry 44 comprising a sample and hold block 46 and an analog to digital block 48 (FIG. 2). The sampling rate is 8 kHz. The digitized speech signal, s(n), is analyzed on a block basis, meaning that before analysis can begin, N samples of s(n) must be acquired. Once a block of speech samples s(n) is acquired, it is passed to the preemphasis filter 12 which has a Z-transform function

[Equation (1): not recoverable from this copy.]

It is then passed to the LPC analysis block 14 from which the signal K is fed to the reflection coefficient quantizer and LPC converter whitening block 20, (shown in detail in FIG. 3). The LPC analysis block 14 produces LPC reflection coefficients which are related to the all-pole filter coefficients. The reflection coefficients are then quantized in block 20 in the manner shown in detail in FIG. 5 wherein two sets of quantizer tables are previously stored. One set has been designed using training databases based on voiced speech, while the other has been designed using unvoiced speech. The reflection coefficients are quantized twice; once using the voiced quantizer 48 and once using the unvoiced quantizer 50. Each quantized set of reflection coefficients is converted to its respective spectral coefficients, as at 52 and 54, which, in turn, enables the computation of the log-spectral distance between the unquantized spectrum and the quantized spectrum. The set of quantized reflection coefficients which produces the smaller log-spectral distance, shown at 56, is then retained. The retained reflection coefficient parameters are encoded for transmission and also converted to the corresponding all-pole LPC filter coefficients in block 58.

Following the reflection quantization and LPC coefficient conversion, the LPC filter parameters are interpolated using the scheme described herein. As previously discussed, LPC analysis is performed on speech of block length N which corresponds to N/8000 seconds (sampling rate = 8000 Hz). Therefore, a set of filter coefficients is generated for every N samples of speech or every N/8000 sec.

In order to enhance spectral trajectory tracking, the LPC filter parameters are interpolated on a sub-frame basis at block 24 where the sub-frame rate is twice the frame rate. The interpolation scheme is implemented (as shown in detail in FIG. 6) as follows: let the LPC filter coefficients for frame k-1 be α⁰ and for frame k be α¹. The filter coefficients for the first sub-frame of frame k are then

$$\alpha = (\alpha^{0} + \alpha^{1})/2 \qquad (2)$$

and the α¹ parameters are applied to the second sub-frame. Therefore a different set of LPC filter parameters is available every 0.5*(N/8000) sec.

Pitch Analysis

Prior methods of pitch filter implementation for multipulse LPC coders have focused on closed loop pitch analysis methods (U.S. Pat. No. 4,701,954). However, such closed loop methods are computationally expensive. In the present invention the pitch analysis procedure, indicated by block 28, is performed in an open loop manner on the speech spectral residual signal. Open loop methods have reduced computational requirements. The spectral residual signal is generated using the inverse LPC filter which can be represented in the Z-transform domain as A(z); A(z)=1/H(z) where H(z) is the LPC all-pole filter. This is known as spectral whitening and is represented by block 16. This block 16 is shown in detail in FIG. 3. The spectral whitening process removes the short-time sample correlation which in turn enhances pitch analysis.

A flow chart diagram of the pitch analysis block 28 of FIG. 1 is shown in FIG. 7. The first step in the pitch analysis process is the collection of N samples of the spectral residual signal. This spectral residual signal is obtained from the pre-emphasized speech signal by the method illustrated in FIG. 3. These residual samples are appended to the prior K retained residual samples to form a segment, r(n), where -K ≤ n ≤ N.

The autocorrelation Q(i) is performed for τ_l ≤ i ≤ τ_h, or

$$Q(i) = \sum_{n} r(n)\, r(n-i), \qquad \tau_l \le i \le \tau_h \qquad (3)$$

The limits of i are arbitrary but for speech sounds a typical range is between 20 and 147 (assuming 8 kHz sampling). The next step is to search Q(i) for the max value, M_1, where

$$M_1 = \max_i Q(i) = Q(k_1) \qquad (4)$$

The value k_1 is stored and Q(k_1-1), Q(k_1) and Q(k_1+1) are set to a large negative value.

We next find a second value M_2 where

$$M_2 = \max_i Q(i) = Q(k_2) \qquad (5)$$

The values k_1 and k_2 correspond to delay values that produce the two largest correlation values. The values k_1 and k_2 are used to check for pitch period doubling. The following algorithm is employed: If ABS(k_2 - 2*k_1) < C, where C can be chosen to be equal to the number of taps (3 in this invention), then the delay value, D, is equal to k_2; otherwise D = k_1. Once the frame delay value, D, is chosen the 3-tap gain terms are solved by first computing the matrix and vector values in eq. (6).

$$\begin{bmatrix}
\sum r(n-i-1)\,r(n-i-1) & \sum r(n-i)\,r(n-i-1) & \sum r(n-i+1)\,r(n-i-1) \\
\sum r(n-i-1)\,r(n-i) & \sum r(n-i)\,r(n-i) & \sum r(n-i+1)\,r(n-i) \\
\sum r(n-i-1)\,r(n-i+1) & \sum r(n-i)\,r(n-i+1) & \sum r(n-i+1)\,r(n-i+1)
\end{bmatrix} \qquad (6)$$
`
The matrix is solved using the Cholesky matrix decomposition. Once the gain values are calculated, they are quantized using a 32 word vector codebook. The codebook index along with the frame delay parameter are transmitted. The P signifies the quantized delay value and index of the gain codebook.
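The open-loop delay search and the 3-tap gain solution described under Pitch Analysis can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `open_loop_pitch` and `three_tap_gains` are hypothetical, the sums run over the N new residual samples only, and the right-hand-side vector is assumed to be the cross-correlation of r(n) with its three delayed copies, a common choice for a 3-tap pitch predictor. Gain quantization with the 32 word codebook is not shown.

```python
import numpy as np

def open_loop_pitch(r, K, lag_min=20, lag_max=147, taps=3):
    """Pick the frame delay D from residual r (K retained samples, then N new ones).

    Assumes K >= lag_max so that every delayed sample exists.
    """
    r = np.asarray(r, dtype=float)
    n = np.arange(K, len(r))                 # indices of the N new samples
    # Autocorrelation Q(i) over the candidate lags, as in eq. (3).
    Q = {i: float(np.dot(r[n], r[n - i])) for i in range(lag_min, lag_max + 1)}
    k1 = max(Q, key=Q.get)                   # lag of the largest value M1
    for i in (k1 - 1, k1, k1 + 1):           # mask the first peak and its neighbours
        if i in Q:
            Q[i] = -np.inf
    k2 = max(Q, key=Q.get)                   # lag of the second value M2
    # Pitch-doubling check: choose k2 when it is about twice k1.
    D = k2 if abs(k2 - 2 * k1) < taps else k1
    return D, k1, k2

def three_tap_gains(r, K, D):
    """Solve the 3-tap gains for delay D by Cholesky decomposition.

    Assumes K >= D + 1 and a positive-definite matrix.
    """
    r = np.asarray(r, dtype=float)
    n = np.arange(K, len(r))
    # Delayed copies r(n-D-1), r(n-D), r(n-D+1), in the order used by eq. (6).
    X = np.stack([r[n - D - 1], r[n - D], r[n - D + 1]])
    A = X @ X.T                              # 3x3 matrix of eq. (6)
    v = X @ r[n]                             # assumed right-hand side
    L = np.linalg.cholesky(A)                # A = L L^T
    y = np.linalg.solve(L, v)                # solve L y = v
    return np.linalg.solve(L.T, y)           # solve L^T b = y
```

Calling `open_loop_pitch(r, K)` and then `three_tap_gains(r, K, D)` on a buffer of retained and new residual samples gives the delay and the three unquantized gains. Note that the doubling check as stated selects the longer delay k_2 whenever it lies within C of 2*k_1.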
Excitation Analysis

Multipulse's name stems from the operation of exciting a vocal tract model with multiple impulses. A location and amplitude of an excitation pulse is chosen by minimizing the mean-squared error between the real and synthetic speech signals. This system incorporates the perceptual weighting filter 18. A detailed flow chart of the multipulse analysis is shown in FIG. 8. The method of determining a pulse location and amplitude is accomplished in a systematic manner. The basic algorithm can be described as follows: let h(n) be the system impulse response of the pitch analysis filter and the LPC analysis filter in cascade; the synthetic speech is the system's response to the multipulse excitation. This is indicated as the excitation convolved with the system response or

$$\hat{s}(n) = \sum_{k=1}^{n} ex(k)\, h(n-k) \qquad (7)$$

where ex(n) is a set of weighted impulses located at positions n_1, n_2, . . . n_j or

$$ex(n) = \beta_1 \delta(n-n_1) + \beta_2 \delta(n-n_2) + \cdots + \beta_j \delta(n-n_j) \qquad (8)$$

The synthetic speech can be re-written as

$$\hat{s}(n) = \sum_{i=1}^{j} \beta_i\, h(n-n_i) \qquad (9)$$

In the present invention, the excitation pulse search is performed one pulse at a time, therefore j=1. The error between the real and synthetic speech is

$$e(n) = s_p(n) - \hat{s}(n) - r(n) \qquad (10)$$

The squared error

$$E = \sum_{n=1}^{N} e^2(n) \qquad (11)$$

or

$$E = \sum_{n=1}^{N} \left(s_p(n) - \hat{s}(n) - r(n)\right)^2 \qquad (12)$$

where s_p(n) is the original speech after pre-emphasis and perceptual weighting (FIG. 4) and r(n) is a fixed signal component due to the previous frames' contributions and is referred to as the ringdown component. FIGS. 10 and 11 show the manner in which this signal is generated, FIG. 10 illustrating the perceptual synthesizer 38 and FIG. 11 illustrating the ringdown generator 36. The squared error is now written as

$$E = \sum_{n=1}^{N} \left(x(n) - \beta_1 h(n-n_1)\right)^2 \qquad (13)$$

where x(n) is the speech signal s_p(n) - r(n) as shown in FIG. 1.

$$E = S - 2\beta C + \beta^2 H \qquad (14)$$

where

$$C = \sum_{n=1}^{N} x(n)\, h(n-n_1) \qquad (15)$$

and

$$S = \sum_{n=1}^{N} x^2(n) \qquad (16)$$

and

$$H = \sum_{n=1}^{N} h(n-n_1)\, h(n-n_1) \qquad (17)$$

The error, E, is minimized by setting dE/dβ = 0 or

$$\frac{dE}{d\beta} = -2C + 2\beta H = 0 \qquad (18)$$

or

$$\beta = C/H \qquad (19)$$

The error, E, can then be written as

$$E = S - C^2/H$$

From the above equations it is evident that two signals are required for multipulse analysis, namely h(n) and x(n). These two signals are input to the multipulse analysis block 32.

The first step in excitation analysis is to generate the system impulse response. The system impulse response is the concatenation of the 3-tap pitch synthesis filter and the LPC weighted filter. The impulse response filter has the Z-transform:

$$H_p(z) = \frac{1}{1 - \sum_{i=1}^{3} b_i z^{-\tau-i}} \cdot \frac{1}{1 - \sum_{i} \alpha_i \mu^{i} z^{-i}} \qquad (20)$$

The b values are the pitch gain coefficients, the α values are the spectral filter coefficients, and μ is a filter weighting coefficient. The error signal, e(n), can be written in the Z-transform domain as

$$E(z) = X(z) - \beta H_p(z)\, z^{-n_1} \qquad (21)$$

where X(z) is the Z-transform of x(n) previously defined.

The impulse response weight β, and impulse response time shift location n_1, are computed by minimizing the energy of the error signal, e(n). The time shift variable n_l (l=1 for first pulse) is now varied from 1 to N. The value of n_l is chosen such that it produces the smallest energy error E. Once n_1 is found β_1 can be calculated. Once the first location, n_1, and impulse weight, β_1, are determined the synthetic signal is written as

$$\hat{s}(n) = \beta_1 h(n-n_1) \qquad (22)$$

When two weighted impulses are considered in the excitation sequence, the error energy can be written as

$$E = \sum_{n=1}^{N} \left(x(n) - \beta_1 h(n-n_1) - \beta_2 h(n-n_2)\right)^2$$

Since the first pulse weight and location are known, the equation is rewritten as

$$E = \sum_{n=1}^{N} \left(x'(n) - \beta_2 h(n-n_2)\right)^2, \qquad x'(n) = x(n) - \beta_1 h(n-n_1)$$

The procedure for determining β_2 and n_2 is identical to that of determining β_1 and n_1. This procedure can be repeated p times. In the present instance p=5. The excitation pulse locations are encoded using an enumerative encoding scheme.
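The one-pulse-at-a-time search described in this section can be sketched as follows. For a single pulse, the squared error is E = S - 2βC + β²H; setting the derivative with respect to β to zero gives β = C/H, and substituting back gives E = S - C²/H, so the best location is the one that maximizes C²/H. This is a minimal sketch, assuming a target x(n) and impulse response h(n) are already available as arrays; the name `multipulse_search` is hypothetical, locations are 0-based rather than 1 to N, and h(n) is simply truncated at the frame boundary.

```python
import numpy as np

def multipulse_search(x, h, num_pulses=5):
    """Greedy one-pulse-at-a-time search.

    Returns a list of (location, weight) pairs and the remaining target signal.
    """
    x = np.asarray(x, dtype=float).copy()
    h = np.asarray(h, dtype=float)
    N = len(x)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for loc in range(N):
            hs = h[: N - loc]                    # h(n - loc), truncated at the frame end
            seg = x[loc : loc + len(hs)]
            H = float(np.dot(hs, hs))
            if H == 0.0:
                continue
            C = float(np.dot(seg, hs))
            reduction = C * C / H                # E = S - C^2/H, so maximise C^2/H
            if best is None or reduction > best[0]:
                best = (reduction, loc, C / H)   # pulse weight beta = C/H
        if best is None:
            break
        _, loc, beta = best
        hs = h[: N - loc]
        x[loc : loc + len(hs)] -= beta * hs      # x'(n) = x(n) - beta * h(n - loc)
        pulses.append((loc, beta))
    return pulses, x
```

Each pass removes the chosen pulse's contribution from the target, which mirrors the rewriting of the two-pulse error in terms of x'(n). With `num_pulses=5` the loop corresponds to p=5 in the text.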
`
`
`
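The enumerative encoding of pulse locations mentioned above can be illustrated with the combinatorial number system, in which a sorted set of locations maps to a sum of binomial coefficients. This is a sketch under assumptions rather than the coder's table-driven routine: the names `encode_locations` and `decode_locations` are hypothetical, locations are taken as distinct integers in 0..N-1, and the binomials are computed on the fly with `math.comb` instead of being read from stored tables.

```python
from math import ceil, comb, log2

def encode_locations(locations):
    """Map distinct pulse locations to a single integer B."""
    ordered = sorted(locations)
    if len(set(ordered)) != len(ordered):
        raise ValueError("locations must be distinct in this sketch")
    # B = C(L1,1) + C(L2,2) + ... with L1 < L2 < ...
    return sum(comb(L, k) for k, L in enumerate(ordered, start=1))

def decode_locations(B, num_pulses=5):
    """Recover the sorted locations from B by repeated subtraction."""
    locations = []
    for k in range(num_pulses, 0, -1):
        X = 0
        while comb(X, k) <= B:       # smallest X whose binomial exceeds B
            X += 1
        L = X - 1                    # location of the k-th pulse
        locations.append(L)
        B -= comb(L, k)
    return locations[::-1]

# Bit counts for 5 pulses among 80 positions.
fixed_bits = 5 * ceil(log2(80))                  # 7 bits per location
enumerative_bits = (comb(80, 5) - 1).bit_length()
```

For 5 pulses among 80 positions there are comb(80, 5) = 24,040,016 possible sets, so the index fits in 25 bits, against 35 bits when each location is sent separately with 7 bits.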
`
Excitation Encoding

A normal encoding scheme for 5 pulse locations would take 5*Int(log2 N+0.5) bits, where N is the number of possible locations. For p=5 and N=80, 35 bits are required. The approach taken here is to employ an enumerative encoding scheme. For the same conditions, the number of bits required is 25 bits. The first step is to order the pulse locations (i.e. 0 ≤ L1 ≤ L2 ≤ L3 ≤ L4 ≤ L5 ≤ N-1 where L1 = min(n1, n2, n3, n4, n5) etc.). The 25 bit number, B, is:

$$B = \binom{L1}{1} + \binom{L2}{2} + \binom{L3}{3} + \binom{L4}{4} + \binom{L5}{5}$$

Computing the 5 sets of factorials is prohibitive on a DSP device, therefore the approach taken here is to pre-compute the values and store them on a DSP ROM. This is shown in FIG. 12. Many of the numbers require double precision (32 bits). A quick calculation yields a required storage (for N=80) of 790 words ((N-1)*2*5). This amount of storage can be reduced by first realizing

$$\binom{L1}{1}$$

is simply L1; therefore no storage is required. Secondly,

$$\binom{L2}{2}$$

contains only single precision numbers; therefore storage can be reduced to 553 words. The code is written such that the five addresses are computed from the pulse locations starting with the 5th location (assumes pulse location range from 1 to 80). The address of the 5th pulse is 2*L5+393. The factor of 2 is due to double precision storage of L5's elements. The address of L4 is 2*L4+235, for L3, 2*L3+77, for L2, L2-1. The numbers stored at these locations are added and a 25-bit number representing the unique set of locations is produced. A block diagram of the enumerative encoding schemes is listed.

Excitation Decoding

Decoding the 25-bit word at the receiver involves repeated subtractions. For example, given B is the 25-bit word, the 5th location is found by finding the value X such that

$$\binom{X-1}{5} \le B < \binom{X}{5}$$

then L5 = X-1. Next let

$$B = B - \binom{L5}{5}$$

The fourth pulse location is found by finding a value X such that

$$\binom{X-1}{4} \le B < \binom{X}{4}$$

then L4 = X-1. This is repeated for L3 and L2. The remaining number is L1.

What is claimed is:
1. A speech encoder comprising:
a sampler to generate samples from a speech signal;
a linear predictive coding (LPC) device to produce a first set of linear predication (LP) coefficients based on the samples, and to produce spectral representations from the first set of LP coefficients;
an interpolator to interpolate the spectral representations to generate interpolated spectral representations;
a spectral device to convert the interpolated spectral representations to a second set of LP coefficients;
a pitch analyzer to perform open-loop pitch analysis with the second set of LP coefficients; and
a bit packing device to transmit encoded speech comprising a codebook index.
2. The speech encoder of claim 1, wherein a residual signal is associated with the pitch analyzer.
3. The speech encoder of claim 2, wherein the codebook index is based on the residual signal.
4. The speech encoder of claim 1, wherein the sampler is samples the speech signal at a sampling rate of 8 kHz.
5. A method for encoding speech, the method comprising:
sampling a speech signal to generate samples;
producing spectral representations from the samples;
interpolating the spectral representations to generate interpolated spectral representations;
performing open-loop pitch analysis based on the interpolated spectral representation; and
transmitting encoded speech comprising a codebook index.
6. The method of claim 5, wherein a residual signal is associated with the open-loop pitch analysis.
7. The method of claim 6, wherein the codebook index is based on the residual signal.
8. The method of claim 5, wherein a sampling rate of the speech signal is 8 kHz.
9. A method for encoding speech, the method comprising:
sampling a speech signal to generate samples;
producing a first set of linear predication (LP) coefficients based on the samples;
`
`
`
producing spectral representations from the first set of LP coefficients;
interpolating the spectral representations to generate interpolated spectral representations;
converting the interpolated spectral representations to a second set of LP coefficients;
performing open-loop pitch analysis with the second set of LP coefficients; and
transmitting encoded speech comprising a codebook index.
10. The method of claim 9, wherein a sampling rate of the speech signal is 8 kHz.
11. The method of claim 9, wherein a residual signal is associated with the open-loop pitch analysis.
12. The method of claim 11, wherein the codebook index is based on the residual signal.
13. A speech encoder comprising:
a sampler to generate samples from a speech signal;
a linear predictive coding (LPC) device to produce spectral representations from the samples;
an interpolator to interpolate the spectral representations to generate interpolated spectral representations;
a pitch analyzer to perform open-loop pitch analysis based on the interpolated spectral representations; and
a bit packing device to transmit encoded speech comprising a codebook index.
14. The speech encoder of claim 13, wherein a residual signal is associated with the pitch analyzer.
15. The speech encoder of claim 14, wherein the codebook index is based on the residual signal.
16. The speech encoder of claim 13, wherein the sampler is samples the speech signal at a sampling rate of 8 kHz.
`
`