US007599832B2

(12) United States Patent
Lin et al.

(10) Patent No.: US 7,599,832 B2
(45) Date of Patent: *Oct. 6, 2009

(54) METHOD AND DEVICE FOR ENCODING SPEECH USING OPEN-LOOP PITCH ANALYSIS

(75) Inventors: Daniel Lin, Montville, NJ (US); Brian M. McCarthy, Lafayette Hill, PA (US)

(73) Assignee: InterDigital Technology Corporation, Wilmington, DE (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 22 days. This patent is subject to a terminal disclaimer.

(51) Int. Cl. G10L 19/12 (2006.01)
(52) U.S. Cl. 704/219
(58) Field of Classification Search 704/219
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
3,617,636 A 11/1971 Ogihara
(Continued)

FOREIGN PATENT DOCUMENTS
WO 86/08726 6/1986
WO 86 08 72 6 6/1986

(21) Appl. No.: 11/363,807
(22) Filed: Feb. 28, 2006

(65) Prior Publication Data
US 2006/0143003 A1 Jun. 29, 2006
`
`Related U.S. Application Data
`(63) Continuation of application No. 10/924,398, filed on
`Aug. 23, 2004, now Pat. No. 7,013,270, which is a
`continuation of application No. 10/446,314, filed on
`May 28, 2003, now Pat. No. 6,782,359, which is a
`continuation of application No. 10/083,237, filed on
`Feb. 26, 2002, now Pat. No. 6,611,799, which is a
`continuation of application No. 09/805,634, filed on
`Mar. 14, 2001, now Pat. No. 6,385,577, which is a
`continuation of application No. 09/441,743, filed on
`Nov. 16, 1999, now Pat. No. 6,223,152, which is a
`continuation of application No. 08/950,658, filed on
`Oct. 15, 1997, now Pat. No. 6,006,174, which is a
`continuation of application No. 08/670,986, filed on
`Jun. 28, 1996, now abandoned, which is a continuation
`of application No. 08/104,174, filed on Aug. 9, 1993,
`now abandoned, which is a continuation of application
`No. 07/592,330, filed on Oct. 3, 1990, now Pat. No.
`5,235,670.
`
`OTHER PUBLICATIONS
`Proc. ICASSP '82, A New Model of LPC Excitation for Producing
`Natural-Sounding Speech at Low Bit Rates, B.S. Atal and J.R.
`Remde, pp. 614-617, Apr. 1982.
`(Continued)
`Primary Examiner–Susan McFadden
`(74) Attorney, Agent, or Firm—Volpe and Koenig, P.C.
`
`(57)
`
`ABSTRACT
`
`The present invention is a synthetic speech encoding device
`that produces a synthetic speech signal which closely
`matches an actual speech signal. The actual speech signal is
`digitized, and excitation pulses are selected by minimizing
`the error between the actual and synthetic speech signals. The
`preferred pattern of excitation pulses needed to produce the
`synthetic speech signal is obtained by using an excitation
`pattern containing a multiplicity of weighted pulses at timed
`positions. The selection of the location and amplitude of each
`excitation pulse is obtained by minimizing an error criterion
`between the synthetic speech signal and the actual speech
`signal. The error criterion function incorporates a perceptual
`weighting filter which shapes the error spectrum.
`
`16 Claims, 12 Drawing Sheets

[Representative drawing: 8 kbps multipulse speech coder, transmitter]

`
`Ex. 1023 / Page 1 of 19
`Apple v. Saint Lawrence
`
`

`

(56) References Cited (continued)

U.S. PATENT DOCUMENTS
4,058,676 11/1977 Wilkes et al.
4,618,982 10/1986 Horvath et al.
4,669,120 5/1987 Ono
4,731,846 3/1988 Secrest et al.
4,776,015 10/1988 Takeda et al.
4,797,925 1/1989 Lin
4,815,134 3/1989 Picone et al.
4,845,753 7/1989 Yasunaga
4,868,867 9/1989 Davidson et al.
4,890,327 12/1989 Bertrand et al.
4,980,916 12/1990 Zinser
4,991,213 2/1991 Wilson
5,001,759 3/1991 Fukui
5,027,405 6/1991 Ozawa
5,127,053 6/1992 Koch
5,235,670 8/1993 Lin et al.
5,265,167 11/1993 Akamine et al.
5,307,441 4/1994 Tzeng
5,327,520 7/1994 Chen
5,568,512 10/1996 McCree et al.
5,675,702 10/1997 Gerson et al.
5,999,899 12/1999 Robinson
6,014,622 1/2000 Su et al. 704/219
6,148,282 11/2000 Paksoy et al. 704/223
6,243,672 B1* 6/2001 Iijima et al. 704/207
6,246,979 B1 6/2001 Carl
6,345,248 B1* 2/2002 Su et al. 704/223
6,591,234 B1 7/2003 Chandran et al.
6,633,839 B2 10/2003 Kushner et al.
7,254,533 B1* 8/2007 Jabri et al. 704/219

OTHER PUBLICATIONS
Proc. ICASSP '84, Improving Performance of Multi-Pulse Coders at Low Bit Rates, S. Singhal and B.S. Atal, paper 1.3, Mar. 1984.
Proc. ICASSP '84, Efficient Computation and Encoding of the Multiple Excitation for LPC, M. Berouti et al., paper 10.1, Mar. 1984.
Proc. ICASSP '86, Implementation of Multi-Pulse Coder on a Single Chip Floating-Point Signal Processor, H. Alrutz, paper 44.3, Apr. 1986.
Digital Telephony, John Bellamy, pp. 153-154, 1991.
Veeneman et al., "Computationally Efficient Stochastic Coding of Speech," 1990, IEEE 40th Vehicular Technology Conference, May 1990, pp. 331-335.
Chazan et al., "Speech Reconstruction from Mel Frequency Cepstral Coefficients and Pitch Frequency," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1299-1302 (Jun. 2000).

* cited by examiner
`
`

`

U.S. Patent, Oct. 6, 2009, Sheets 1 to 12 of 12, US 7,599,832 B2

[Sheet 1 of 12: FIG. 1, block diagram of the 8 kbps multipulse LPC speech coder]
[Sheet 2 of 12: FIG. 2, sample/hold and A/D circuit; block 48; sampling rate = 8000 Hz]
[Sheet 3 of 12: FIG. 3, spectral whitening circuit]
[Sheet 4 of 12: FIG. 4, perceptual speech weighting circuit]
[Sheet 5 of 12: FIG. 5, reflection coefficient quantization circuit]
[Sheet 6 of 12: FIG. 6, LPC interpolation/weighting circuit]
[Sheet 7 of 12: FIG. 7, pitch analysis flow chart; final step: solve the matrix equation using Cholesky decomposition]
[Sheet 8 of 12: FIG. 8, multipulse analysis flow chart; inputs h(n) and x(n)]
[Sheet 9 of 12: FIG. 9, impulse response generator]
[Sheet 10 of 12: FIG. 10, perceptual synthesizer circuit]
[Sheet 11 of 12: FIG. 11, ringdown generator circuit]
[Sheet 12 of 12: FIG. 12, factorial tables address storage]
`
`

`

METHOD AND DEVICE FOR ENCODING SPEECH USING OPEN-LOOP PITCH ANALYSIS

This application is a continuation of U.S. patent application Ser. No. 10/924,398, filed Aug. 23, 2004, which is a continuation of U.S. patent application Ser. No. 10/446,314, filed May 28, 2003, now U.S. Pat. No. 6,782,359, which is a continuation of U.S. patent application Ser. No. 10/083,237, filed Feb. 26, 2002, now U.S. Pat. No. 6,611,799, which is a continuation of U.S. patent application Ser. No. 09/805,634, filed Mar. 14, 2001, now U.S. Pat. No. 6,385,577, which is a continuation of U.S. patent application Ser. No. 09/441,743, filed Nov. 16, 1999, now U.S. Pat. No. 6,223,152, which is a continuation of U.S. patent application Ser. No. 08/950,658, filed Oct. 15, 1997, now U.S. Pat. No. 6,006,174, which is a continuation of U.S. patent application Ser. No. 08/670,986, filed Jun. 28, 1996, which is a continuation of U.S. patent application Ser. No. 08/104,174, filed Aug. 9, 1993, which is a continuation of U.S. patent application Ser. No. 07/592,330, filed Oct. 3, 1990, now U.S. Pat. No. 5,235,670, which applications are incorporated herein by reference.
`
BACKGROUND

This invention relates to digital voice coders performing at relatively low voice rates but maintaining high voice quality. In particular, it relates to improved multipulse linear predictive voice coders.

The multipulse coder incorporates the linear predictive all-pole filter (LPC filter). The basic function of a multipulse coder is finding a suitable excitation pattern for the LPC all-pole filter which produces an output that closely matches the original speech waveform. The excitation signal is a series of weighted impulses. The weight values and impulse locations are found in a systematic manner. The selection of a weight and location of an excitation impulse is obtained by minimizing an error criterion between the all-pole filter output and the original speech signal. Some multipulse coders incorporate a perceptual weighting filter in the error criterion function. This filter serves to frequency weight the error, which in essence allows more error in the formant regions of the speech signal and less in low energy portions of the spectrum. Incorporation of pitch filters improves the performance of multipulse speech coders. This is done by modeling the long term redundancy of the speech signal, thereby allowing the excitation signal to account for the pitch related properties of the signal.

SUMMARY

The present invention is a synthetic speech encoding device that produces a synthetic speech signal which closely matches an actual speech signal. The actual speech signal is digitized, and excitation pulses are selected by minimizing the error between the actual and synthetic speech signals. The preferred pattern of excitation pulses needed to produce the synthetic speech signal is obtained by using an excitation pattern containing a multiplicity of weighted pulses at timed positions. The selection of the location and amplitude of each excitation pulse is obtained by minimizing an error criterion between the synthetic speech signal and the actual speech signal. The error criterion function incorporates a perceptual weighting filter which shapes the error spectrum.

BRIEF DESCRIPTION OF THE DRAWING(S)

FIG. 1 is a block diagram of an 8 kbps multipulse LPC speech coder.
FIG. 2 is a block diagram of a sample/hold and A/D circuit used in the system of FIG. 1.
FIG. 3 is a block diagram of the spectral whitening circuit of FIG. 1.
FIG. 4 is a block diagram of the perceptual speech weighting circuit of FIG. 1.
FIG. 5 is a block diagram of the reflection coefficient quantization circuit of FIG. 1.
FIG. 6 is a block diagram of the LPC interpolation/weighting circuit of FIG. 1.
FIG. 7 is a flow chart diagram of the pitch analysis block of FIG. 1.
FIG. 8 is a flow chart diagram of the multipulse analysis block of FIG. 1.
FIG. 9 is a block diagram of the impulse response generator of FIG. 1.
FIG. 10 is a block diagram of the perceptual synthesizer circuit of FIG. 1.
FIG. 11 is a block diagram of the ringdown generator circuit of FIG. 1.
FIG. 12 is a diagrammatic view of the factorial tables address storage used in the system of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention incorporates improvements to the prior art of multipulse coders, specifically, a new type of LPC spectral quantization, pitch filter implementation, incorporation of the pitch synthesis filter in the multipulse analysis, and excitation encoding/decoding.

Shown in FIG. 1 is a block diagram of an 8 kbps multipulse LPC speech coder, generally designated 10.

It comprises a pre-emphasis block 12 to receive the speech signals s(n). The pre-emphasized signals are applied to an LPC analysis block 14 as well as to a spectral whitening block 16 and to a perceptually weighted speech block 18.

The output of the block 14 is applied to a reflection coefficient quantization and LPC conversion block 20, whose output is applied both to the bit packing block 22 and to an LPC interpolation/weighting block 24.

The output from block 20 to block 24 is indicated at α and the outputs from block 24 are indicated at α, α¹ and at αρ, α¹ρ.

The signal α, α¹ is applied to the spectral whitening block 16 and the signal αρ, α¹ρ is applied to the impulse generation block 26.

The output of spectral whitening block 16 is applied to the pitch analysis block 28 whose output is applied to quantizer block 30. The quantized output P from quantizer 30 is applied to the bit packer 22 and also as a second input to the impulse response generation block 26. The output of block 26, indicated at h(n), is applied to the multipulse analysis block 32.

The perceptual weighting block 18 receives both outputs from block 24 and its output, indicated at s_p(n), is applied to an adder 34 which also receives the output r(n) from a ringdown generator 36. The ringdown component r(n) is a fixed signal due to the contributions of the previous frames. The output x(n) of the adder 34 is applied as a second input to the multipulse analysis block 32. The two outputs E and G of the multipulse analysis block 32 are fed to the bit packing block 22.

The signals α, α¹, P and E, G are fed to the perceptual synthesizer block 38 whose output y(n), comprising the combined weighted reflection coefficients, quantized spectral coefficients and multipulse analysis signals of previous frames, is applied to the block delay N/2 40. The output of block 40 is applied to the ringdown generator 36.
The output of the block 22 is fed to the synthesizer/postfilter 42.

The operation of the aforesaid system is described as follows: The original speech is digitized using sample/hold and A/D circuitry 44 comprising a sample and hold block 46 and an analog to digital block 48 (FIG. 2). The sampling rate is 8 kHz. The digitized speech signal, s(n), is analyzed on a block basis, meaning that before analysis can begin, N samples of s(n) must be acquired. Once a block of speech samples s(n) is acquired, it is passed to the preemphasis filter 12 which has a Z-transform function

[equation (1)]

It is then passed to the LPC analysis block 14 from which the signal K is fed to the reflection coefficient quantizer and LPC converter whitening block 20 (shown in detail in FIG. 3). The LPC analysis block 14 produces LPC reflection coefficients which are related to the all-pole filter coefficients. The reflection coefficients are then quantized in block 20 in the manner shown in detail in FIG. 5, wherein two sets of quantizer tables are previously stored. One set has been designed using training databases based on voiced speech, while the other has been designed using unvoiced speech. The reflection coefficients are quantized twice: once using the voiced quantizer 48 and once using the unvoiced quantizer 50. Each quantized set of reflection coefficients is converted to its respective spectral coefficients, as at 52 and 54, which, in turn, enables the computation of the log-spectral distance between the unquantized spectrum and the quantized spectrum. The set of quantized reflection coefficients which produces the smaller log-spectral distance, shown at 56, is then retained. The retained reflection coefficient parameters are encoded for transmission and also converted to the corresponding all-pole LPC filter coefficients in block 58.

Following the reflection quantization and LPC coefficient conversion, the LPC filter parameters are interpolated using the scheme described herein. As previously discussed, LPC analysis is performed on speech of block length N which corresponds to N/8000 seconds (sampling rate = 8000 Hz). Therefore, a set of filter coefficients is generated for every N samples of speech or every N/8000 sec.

In order to enhance spectral trajectory tracking, the LPC filter parameters are interpolated on a sub-frame basis at block 24 where the sub-frame rate is twice the frame rate. The interpolation scheme is implemented (as shown in detail in FIG. 6) as follows: let the LPC filter coefficients for frame k−1 be α⁰ and for frame k be α¹. The filter coefficients for the first sub-frame of frame k are then

$$\alpha = (\alpha^{0} + \alpha^{1})/2 \tag{2}$$

and the α¹ parameters are applied to the second sub-frame. Therefore a different set of LPC filter parameters is available every 0.5*(N/8000) sec.

Pitch Analysis

Prior methods of pitch filter implementation for multipulse LPC coders have focused on closed loop pitch analysis methods (U.S. Pat. No. 4,701,954). However, such closed loop methods are computationally expensive. In the present invention the pitch analysis procedure, indicated by block 28, is performed in an open loop manner on the speech spectral residual signal. Open loop methods have reduced computational requirements. The spectral residual signal is generated using the inverse LPC filter which can be represented in the Z-transform domain as A(z); A(z) = 1/H(z) where H(z) is the LPC all-pole filter. This is known as spectral whitening and is represented by block 16. This block 16 is shown in detail in FIG. 3. The spectral whitening process removes the short-time sample correlation which in turn enhances pitch analysis.

A flow chart diagram of the pitch analysis block 28 of FIG. 1 is shown in FIG. 7. The first step in the pitch analysis process is the collection of N samples of the spectral residual signal. This spectral residual signal is obtained from the pre-emphasized speech signal by the method illustrated in FIG. 3. These residual samples are appended to the prior K retained residual samples to form a segment, r(n), where −K ≤ n ≤ N. The autocorrelation Q(i) is performed for τ_l ≤ i ≤ τ_u, or

$$Q(i) = \sum_{n=1}^{N} r(n)\, r(n-i), \qquad \tau_l \le i \le \tau_u \tag{3}$$

The limits of i are arbitrary but for speech sounds a typical range is between 20 and 147 (assuming 8 kHz sampling). The next step is to search Q(i) for the max value, M1, where

$$M_1 = \max_i Q(i) = Q(k_1) \tag{4}$$

The value k1 is stored and Q(k1−1), Q(k1) and Q(k1+1) are set to a large negative value. We next find a second value M2 where

$$M_2 = \max_i Q(i) = Q(k_2) \tag{5}$$

The values k1 and k2 correspond to delay values that produce the two largest correlation values. The values k1 and k2 are used to check for pitch period doubling. The following algorithm is employed: if ABS(k2 − 2*k1) < C, where C can be chosen to be equal to the number of taps (3 in this invention), then the delay value, D, is equal to k2; otherwise D = k1. Once the frame delay value, D, is chosen the 3-tap gain terms are solved by first computing the matrix and vector values in eq. (6), evaluated at i = D:

$$\begin{bmatrix}
\sum r(n-i-1)\,r(n-i-1) & \sum r(n-i)\,r(n-i-1) & \sum r(n-i+1)\,r(n-i-1) \\
\sum r(n-i-1)\,r(n-i) & \sum r(n-i)\,r(n-i) & \sum r(n-i+1)\,r(n-i) \\
\sum r(n-i-1)\,r(n-i+1) & \sum r(n-i)\,r(n-i+1) & \sum r(n-i+1)\,r(n-i+1)
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix}
\sum r(n)\,r(n-i-1) \\ \sum r(n)\,r(n-i) \\ \sum r(n)\,r(n-i+1)
\end{bmatrix} \tag{6}$$
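The lag search and the period-doubling check described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the list layout (K retained residual samples followed by the N new ones), summing the autocorrelation over the N new samples only, and the tie-breaking rule (the first maximum wins) are all choices made for the sketch.

```python
def autocorrelation(seg, K, lag):
    """Q(lag): correlate the current frame with the segment delayed by `lag`."""
    return sum(seg[n] * seg[n - lag] for n in range(K, len(seg)))


def open_loop_delay(seg, K, lo=20, hi=147, taps=3):
    """Pick the frame delay D from the two largest autocorrelation values.

    seg    : K retained residual samples followed by the N new samples
    lo, hi : lag search range (20..147 is the typical range quoted in the text)
    taps   : threshold C of the doubling check (number of pitch taps)
    """
    if hi > K:
        raise ValueError("need at least `hi` retained samples")
    q = {lag: autocorrelation(seg, K, lag) for lag in range(lo, hi + 1)}

    k1 = max(q, key=q.get)                # lag of the largest value M1
    for lag in (k1 - 1, k1, k1 + 1):      # mask the first peak and its neighbours
        if lag in q:
            q[lag] = float("-inf")
    k2 = max(q, key=q.get)                # lag of the second value M2

    # Doubling check as stated in the text: D = k2 if |k2 - 2*k1| < C, else D = k1.
    delay = k2 if abs(k2 - 2 * k1) < taps else k1
    return delay, k1, k2
```

Masking the three lags around k1 before the second search keeps the second maximum from being a shoulder of the first peak, so k2 reflects a genuinely different delay candidate.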
`
The matrix is solved using the Cholesky matrix decomposition. Once the gain values are calculated, they are quantized using a 32 word vector codebook. The codebook index and the frame delay parameter are transmitted. The P signifies the quantized delay value and index of the gain codebook.
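As a rough illustration of the gain computation, the sketch below builds the 3x3 correlation matrix and the correlation vector for a chosen delay and solves them with a hand-written Cholesky factorization. The function names, the data layout (K retained samples followed by the N new ones, with delay + 1 not exceeding K) and the tap order (delay+1, delay, delay−1) are assumptions of this sketch; codebook quantization of the gains is not shown.

```python
import math


def cholesky_solve(a, y):
    """Solve a @ b = y for a symmetric positive-definite matrix `a`."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):                        # factor a = low * low^T
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = [0.0] * n
    for i in range(n):                        # forward substitution: low z = y
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    b = [0.0] * n
    for i in reversed(range(n)):              # back substitution: low^T b = z
        b[i] = (z[i] - sum(low[k][i] * b[k] for k in range(i + 1, n))) / low[i][i]
    return b


def pitch_gains(seg, K, delay):
    """Least-squares 3-tap gains for taps at delay+1, delay and delay-1."""
    frame = range(K, len(seg))
    past = [[seg[n - delay - off] for n in frame] for off in (1, 0, -1)]
    now = [seg[n] for n in frame]
    matrix = [[sum(u * v for u, v in zip(p, q)) for q in past] for p in past]
    vector = [sum(u * v for u, v in zip(now, p)) for p in past]
    return cholesky_solve(matrix, vector)
```

Cholesky factorization suits this system because the correlation matrix is symmetric and, for a non-degenerate residual, positive definite, so no pivoting is needed.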
Excitation Analysis

Multipulse's name stems from the operation of exciting a vocal tract model with multiple impulses. A location and amplitude of an excitation pulse is chosen by minimizing the mean-squared error between the real and synthetic speech signals. This system incorporates the perceptual weighting filter 18. A detailed flow chart of the multipulse analysis is shown in FIG. 8. The method of determining a pulse location and amplitude is accomplished in a systematic manner. The basic algorithm can be described as follows: let h(n) be the system impulse response of the pitch analysis filter and the LPC analysis filter in cascade; the synthetic speech is the system's response to the multipulse excitation. This is indicated as the excitation convolved with the system response or

$$\hat{s}(n) = \sum_{k} ex(k)\, h(n-k) \tag{7}$$

where ex(n) is a set of weighted impulses located at positions n1, n2, . . . nj or

$$ex(n) = \beta_1\,\delta(n-n_1) + \beta_2\,\delta(n-n_2) + \cdots + \beta_j\,\delta(n-n_j) \tag{8}$$

The synthetic speech can be re-written as

$$\hat{s}(n) = \sum_{i=1}^{j} \beta_i\, h(n-n_i) \tag{9}$$

In the present invention, the excitation pulse search is performed one pulse at a time, therefore j=1. The error between the real and synthetic speech is

$$e(n) = s_p(n) - \hat{s}(n) - r(n) \tag{10}$$

The squared error

$$E = \sum_{n=1}^{N} e^{2}(n) \tag{11}$$

or

$$E = \sum_{n=1}^{N} \bigl(s_p(n) - \hat{s}(n) - r(n)\bigr)^{2} \tag{12}$$

where s_p(n) is the original speech after pre-emphasis and perceptual weighting (FIG. 4) and r(n) is a fixed signal component due to the previous frames' contributions and is referred to as the ringdown component.

FIGS. 10 and 11 show the manner in which this signal is generated, FIG. 10 illustrating the perceptual synthesizer 38 and FIG. 11 illustrating the ringdown generator 36. The squared error is now written as

$$E = \sum_{n=1}^{N} \bigl(x(n) - \beta_1\, h(n-n_1)\bigr)^{2} \tag{13}$$

where x(n) is the speech signal s_p(n) − r(n) as shown in FIG. 1.

$$E = S - 2\beta_1 C + \beta_1^{2} H \tag{14}$$

where

$$C = \sum_{n=1}^{N} x(n)\, h(n-n_1) \tag{15}$$

and

$$S = \sum_{n=1}^{N} x^{2}(n) \tag{16}$$

and

$$H = \sum_{n=1}^{N} h(n-n_1)\, h(n-n_1) \tag{17}$$

The error, E, is minimized by setting dE/dβ1 = 0 or

$$\frac{dE}{d\beta_1} = -2C + 2H\beta_1 = 0 \tag{18}$$

or β1 = C/H.

The error, E, can then be written as

$$E = S - \frac{C^{2}}{H} \tag{19}$$

From the above equations it is evident that two signals are required for multipulse analysis, namely h(n) and x(n). These two signals are input to the multipulse analysis block 32.

The first step in excitation analysis is to generate the system impulse response. The system impulse response is the concatenation of the 3-tap pitch synthesis filter and the LPC weighted filter. The impulse response filter has the Z-transform:

$$H_p(z) = \frac{1}{1 - \sum_{i=1}^{3} b_i\, z^{-\tau-i}} \cdot \frac{1}{1 - \sum_{i} \alpha_i\, \mu^{i} z^{-i}} \tag{20}$$

The b values are the pitch gain coefficients, the α values are the spectral filter coefficients, and μ is a filter weighting coefficient. The error signal, e(n), can be written in the Z-transform domain as

$$E(z) = X(z) - \beta_1 H_p(z)\, z^{-n_1} \tag{21}$$

where X(z) is the Z-transform of x(n) previously defined.

The impulse response weight β1 and impulse response time shift location n1 are computed by minimizing the energy of the error signal, e(n). The time shift variable n_l (l=1 for first pulse) is now varied from 1 to N. The value of n1 is chosen such that it produces the smallest energy error E. Once n1 is found β1 can be calculated. Once the first location, n1, and impulse weight, β1, are determined the synthetic signal is written as

$$\hat{s}(n) = \beta_1\, h(n-n_1) \tag{22}$$

When two weighted impulses are considered in the excitation sequence, the error energy can be written as

$$E = \sum_{n=1}^{N} \bigl(x(n) - \beta_1 h(n-n_1) - \beta_2 h(n-n_2)\bigr)^{2} \tag{23}$$

Since the first pulse weight and location are known, the equation is rewritten as

$$E = \sum_{n=1}^{N} \bigl(x'(n) - \beta_2\, h(n-n_2)\bigr)^{2} \tag{24}$$

where x'(n) = x(n) − β1 h(n−n1).
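The one-pulse-at-a-time search can be sketched as below. This is a minimal illustration, assuming that x and h are plain Python lists, that locations are 0-based, and that each pulse's contribution is truncated at the end of the frame; the function names are hypothetical. For every candidate location it forms the cross term C and the energy term H, keeps the location with the largest error reduction C²/H (equivalently the smallest E = S − C²/H), sets the weight to C/H, and subtracts the pulse's contribution before searching for the next one.

```python
def best_pulse(x, h):
    """Return (location, weight) of the single pulse that minimizes the error."""
    best_loc, best_weight, best_drop = 0, 0.0, -1.0
    for loc in range(len(x)):
        span = range(loc, min(len(x), loc + len(h)))
        c = sum(x[n] * h[n - loc] for n in span)   # cross-correlation term C
        e = sum(h[n - loc] ** 2 for n in span)     # energy term H
        if e > 0.0 and c * c / e > best_drop:      # E = S - C^2/H is smallest here
            best_loc, best_weight, best_drop = loc, c / e, c * c / e
    return best_loc, best_weight


def multipulse_search(x, h, pulses=5):
    """Greedy search: find one pulse, remove its contribution, repeat."""
    residual = list(x)
    found = []
    for _ in range(pulses):
        loc, weight = best_pulse(residual, h)
        found.append((loc, weight))
        for n in range(loc, min(len(residual), loc + len(h))):
            residual[n] -= weight * h[n - loc]     # updated target for next pulse
    return found, residual
```

For a target built from two non-overlapping copies of h, calling `multipulse_search(x, h, pulses=2)` recovers both locations and weights and leaves a zero residual.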
`
The procedure for determining β2 and n2 is identical to that of determining β1 and n1. This procedure can be repeated p times. In the present instance p=5. The excitation pulse locations are encoded using an enumerative encoding scheme.
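One common way to realize an enumerative code for a set of pulse locations is the combinatorial number system, sketched below. The sketch assumes five distinct 0-based locations in the range 0 to N−1 and computes the binomial coefficients on the fly with Python's `math.comb` instead of reading them from stored tables; the function names are hypothetical, and the patent's own table layout is described in the Excitation Encoding section. With N = 80 the largest code is C(80,5) − 1 = 24,040,015, which fits in 25 bits.

```python
from math import comb


def encode_locations(locs):
    """Map distinct pulse locations to one integer (sum of binomial terms)."""
    ordered = sorted(locs)
    if len(set(ordered)) != len(ordered):
        raise ValueError("locations must be distinct")
    # k-th smallest location L_k contributes C(L_k, k), k = 1..len(locs)
    return sum(comb(loc, k) for k, loc in enumerate(ordered, start=1))


def decode_locations(code, count=5):
    """Recover the ordered locations by repeated search and subtraction."""
    locs = []
    for k in range(count, 0, -1):
        x = k - 1                          # C(k-1, k) = 0 never exceeds the code
        while comb(x + 1, k) <= code:      # largest x with C(x, k) <= code
            x += 1
        locs.append(x)
        code -= comb(x, k)
    return locs[::-1]
```

For example, `encode_locations([0, 7, 19, 42, 79])` returns 22650435, and `decode_locations(22650435)` returns `[0, 7, 19, 42, 79]`.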
`
Excitation Encoding

A normal encoding scheme for 5 pulse locations would take 5*Int(log2 N+0.5) bits, where N is the number of possible locations. For p=5 and N=80, 35 bits are required. The approach taken here is to employ an enumerative encoding scheme. For the same conditions, the number of bits required is 25 bits. The first step is to order the pulse locations (i.e. 0≤L1≤L2≤L3≤L4≤L5≤N−1 where L1=min(n1, n2, n3, n4, n5) etc.). The 25 bit number, B, is:

$$B = \binom{L1}{1} + \binom{L2}{2} + \binom{L3}{3} + \binom{L4}{4} + \binom{L5}{5}$$

Computing the 5 sets of factorials is prohibitive on a DSP device, therefore the approach taken here is to pre-compute the values and store them on a DSP ROM. This is shown in FIG. 12. Many of the numbers require double precision (32 bits). A quick calculation yields a required storage (for N=80) of 790 words ((N−1)*2*5). This amount of storage can be reduced by first realizing

$$\binom{L1}{1}$$

is simply L1; therefore no storage is required. Secondly,

$$\binom{L2}{2}$$

contains only single precision numbers; therefore storage can be reduced to 553 words. The code is written such that the five addresses are computed from the pulse locations starting with the 5th location (assumes pulse location range from 1 to 80). The address of the 5th pulse is 2*L5+393. The factor of 2 is due to double precision storage of L5's elements. The address of L4 is 2*L4+235, for L3, 2*L3+77, for L2, L2−1. The numbers stored at these locations are added and a 25-bit number representing the unique set of locations is produced. A block diagram of the enumerative encoding schemes is listed.

Excitation Decoding

Decoding the 25-bit word at the receiver involves repeated subtractions. For example, given B is the 25-bit word, the 5th location is found by finding the value X such that

$$B - \binom{X}{5} < 0 \quad\text{and}\quad B - \binom{X-1}{5} \ge 0$$

then L5=X−1. Next let

$$B = B - \binom{L5}{5}$$

The fourth pulse location is found by finding a value X such that

$$B - \binom{X}{4} < 0 \quad\text{and}\quad B - \binom{X-1}{4} \ge 0$$

then L4=X−1. This is repeated for L3 and L2. The remaining number is L1.

What is claimed is:

1. A speech encoder comprising:
a sampler to generate samples from a speech signal;
a linear predictive coding (LPC) device to produce a first set of linear prediction (LP) coefficients based on the samples, and to produce spectral representations from the first set of LP coefficients;
an interpolator to interpolate the spectral representations to generate interpolated spectral representations;
a spectral device to convert the interpolated spectral representations to a second set of LP coefficients;
a pitch analyzer to perform open-loop pitch analysis with the second set of LP coefficients; and
a bit packing device to transmit encoded speech comprising a codebook index.

2. The speech encoder of claim 1, wherein a residual signal is associated with the pitch analyzer.

3. The speech encoder of claim 2, wherein the codebook index is based on the residual signal.

4. The speech encoder of claim 1, wherein the sampler samples the speech signal at a sampling rate of 8 kHz.

5. A method for encoding speech, the method comprising:
sampling a speech signal to generate samples;
producing spectral representations from the samples;
interpolating the spectral representations to generate interpolated spectral representations;
performing open-loop pitch analysis based on the interpolated spectral representation; and
transmitting encoded speech comprising a codebook index.

6. The method of claim 5, wherein a residual signal is associated with the open-loop pitch analysis.

7. The method of claim 6, wherein the codebook index is based on the residual signal.

8. The method of claim 5, wherein a sampling rate of the speech signal is 8 kHz.

9. A method for encoding speech, the method comprising:
sampling a speech signal to generate samples;
producing a first set of linear prediction (LP) coefficients based on the samples;
`
producing spectral representations from the first set of LP coefficients;
interpolating the spectral representations to generate interpolated spectral representations;
converting the interpolated spectral representations to a second set of LP coefficients;
performing open-loop pitch analysis with the second set of LP coefficients; and
transmitting encoded speech comprising a codebook index.

10. The method of claim 9, wherein a sampling rate of the speech signal is 8 kHz.

11. The method of claim 9, wherein a residual signal is associated with the open-loop pitch analysis.

12. The method of claim 11, wherein the codebook index is based on the residual signal.

13. A speech encoder comprising:
a sampler to generate samples from a speech signal;
a linear predictive coding (LPC) device to produce spectral representations from the samples;
an interpolator to interpolate the spectral representations to generate interpolated spectral representations;
a pitch analyzer to perform open-loop pitch analysis based on the interpolated spectral representations; and
a bit packing device to transmit encoded speech comprising a codebook index.

14. The speech encoder of claim 13, wherein a residual signal is associated with the pitch analyzer.

15. The speech encoder of claim 14, wherein the codebook index is based on the residual signal.

16. The speech encoder of claim 13, wherein the sampler samples the speech signal at a sampling rate of 8 kHz.
`
`
