United States Patent [19]
Taniguchi et al.

US005115469A
[11] Patent Number: 5,115,469
[45] Date of Patent: May 19, 1992
[54] SPEECH ENCODING/DECODING APPARATUS HAVING SELECTED ENCODERS

[75] Inventors: Tomohiko Taniguchi, Yokohama; Kohei Iseda and Koji Okazaki, both of Kawasaki; Fumio Amano, Tokyo; Shigeyuki Unagami, Atsugi; Yoshinori Tanaka, Yokohama; Yasuji Ohta, Kawasaki, all of Japan

[73] Assignee: Fujitsu Limited, Kawasaki, Japan

[21] Appl. No.: 460,099

[22] PCT Filed: Jun. 7, 1989

[86] PCT No.: PCT/JP89/00580
     § 371 Date: Feb. 8, 1990
     § 102(e) Date: Feb. 8, 1990

[87] PCT Pub. No.: WO89/12292
     PCT Pub. Date: Dec. 14, 1989

[30] Foreign Application Priority Data
     Jun. 8, 1988  [JP] Japan .................. 63-141343
     Mar. 14, 1989 [JP] Japan .................. 1-61533

[51] Int. Cl.5 ........................... G10L 5/00
[52] U.S. Cl. ............................ 381/36; 381/34
[58] Field of Search .................... 381/29-40; 364/513.5; 375/22-24, 34, 122
[56] References Cited

U.S. PATENT DOCUMENTS

3,067,291 12/1962 Lewinter ................ 381/31
3,903,366  9/1975 Coulter ................. 381/38
4,005,274  1/1977 Vagliani et al. ......... 381/32
4,303,803 12/1981 Yatsuzuka ............... 381/31
4,546,342 10/1985 Weaver et al. ........... 375/122
4,622,680 11/1986 Zinser .................. 381/31

Primary Examiner-Emanuel S. Kemeny
Assistant Examiner-Michelle Doerrler
Attorney, Agent, or Firm-Staas & Halsey
[57] ABSTRACT

Several encoders perform a local decoding of a speech signal and extract excitation information and vocal tract information from the speech signal for an encoding operation. The transmission rate ratio between the excitation information and the vocal tract information is different for each encoder. An evaluation/selection unit evaluates the quality of the decoded signals subjected to a local decoding in each of the encoders, determines the most suitable encoder from among the several encoders based on the result of the evaluation, and selects the most suitable encoder, thereby outputting the selection result as selection information. The decoder decodes a speech signal based on the selection information, the vocal tract information and the excitation information. The evaluation/selection unit selects the output from the encoder in which the quality of a locally decoded signal is the most preferable. When the vocal tract information changes little, the vocal tract information is not output, thereby causing a surplus of information. As much of the surplus of unused vocal tract information as possible is assigned to a residual signal. Thus, the quality of a decoded speech signal is improved.

12 Claims, 7 Drawing Sheets
[Representative drawing (FIG. 3): INPUT 303 feeds ENCODER #1 through ENCODER #m (301-1 to 301-m); their DECODED SIGNALS 307-1 to 307-m feed the EVALUATION & DECISION OF OPTIMUM ENCODER unit 302-1 and SELECTOR 302-2, which output SELECT INFORMATION 310; the selected parameters pass over line 308 to DECODER #n 309.]
[Sheet 1 of 7 - FIG. 1: block diagram of the encoder of the first prior art structure (adaptive prediction encoder); the visible labels correspond to the linear prediction analysis unit, predictor, quantizer, adders and multiplexing unit described in the text.]

[Sheet 2 of 7 - FIG. 2 (PRIOR ART): block diagram of the second prior art structure (CELP encoder); labels include INPUT SIGNAL, LPC ANALYSIS 201, QUANTIZER 202, PREDICTOR 203, RESIDUAL PATTERN code book 204, gain multiplier 205 (G), delay 206 (z^-m), pitch multiplier 207 (Cp), adders 208-210, WEIGHT FUNCTION 211 and ERROR POWER EVALUATION 212.]

[Sheet 3 of 7 - FIG. 3: principle of the invention; INPUT 303 feeds ENCODER #1 to ENCODER #m (301-1 to 301-m), which output VOCAL TRACT PARAMETERS 304, EXCITATION PARAMETERS 305 and DECODED SIGNALS 307-1 to 307-m; EVALUATION & DECISION OF OPTIMUM ENCODER 302-1 and SELECTOR 302-2 produce SELECT INFORMATION 310; line 308 carries the selected data to DECODER #n 309, which produces the OUTPUT 311.]

[Sheet 4 of 7 - FIG. 4: first embodiment; INPUT SIGNAL 401, LPC ANALYSIS 402, quantizers 403-1/403-2, predictors 404-1/404-2, adders 405-1/405-2 and 406-1/406-2, evaluation units 407-1 and 407-2 producing SN(A) and SN(B), residual signal outputs 408 (6 bits/frame in the illustrated example) and 412 (8 bits/frame), LPC parameter output 409 (2 bits/frame), switch 410, FRAME DELAY 411 and MODE DETERMINATION 413.]

[Sheet 5 of 7 - FIG. 5: second embodiment; INPUT SIGNAL (SPEECH), encoders 501-1 and 501-2 with LINEAR PREDICTION ANALYSIS 506, WHITE NOISE CODE BOOKs 507-1/507-2, LONG-TERM and SHORT-TERM PREDICTORs, ERROR EVALUATION 512-1/512-2 (S/N A, S/N B), coefficient memory 513, switch 514, CEPSTRUM DISTANCE (CD) computing unit 515, operation mode judgement unit 516, MUX 504 carrying ENCODER NUMBER INFORMATION, LPC PREDICTION PARAMETERS and RESIDUAL SIGNAL INFORMATION, DMUX 505 and the REPRODUCED SIGNAL on the decoding side.]

[Sheet 6 of 7 - FIG. 6: operation flow chart of the second embodiment (START, decision steps S1 to S4, branches to A-MODE and B-MODE).]

Sheet 7 of 7 - FIG. 7A (PRIOR ART), assignment of bits transmitted in the second prior art:

    CODE NUMBER          1600 bps
    GAIN                 1000 bps
    PITCH FREQUENCY       600 bps
    PITCH COEFFICIENT     600 bps
    LPC PARAMETERS       1000 bps
    TOTAL                4800 bps

FIG. 7B, assignment of bits transmitted in the second embodiment:

                         A-MODE       B-MODE
    CODE NUMBER          1600 bps     2200 bps
    GAIN                 1000 bps     1350 bps
    PITCH FREQUENCY       600 bps      600 bps
    PITCH COEFFICIENT     600 bps      600 bps
    LPC PARAMETERS        950 bps     not transmitted
    MODE SIGNAL            50 bps       50 bps
    TOTAL                4800 bps     4800 bps
SPEECH ENCODING/DECODING APPARATUS HAVING SELECTED ENCODERS
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech encoding and decoding apparatus for transmitting a speech signal after information compression processing has been applied.

Recently, a speech encoding and decoding apparatus for compressing speech information to data of about 4 to 16 kbps at a high efficiency has been demanded for in-house communication systems, digital mobile radio systems and speech storing systems.

2. Description of Related Art

As the first prior art structure of a speech prediction encoding apparatus, there is provided an adaptive prediction encoding apparatus for multiplexing the prediction parameters (vocal tract information) of a predictor and a residual signal (excitation information) for transmission to the receiving station.

FIG. 1 is a block diagram of an encoder used in the speech encoding apparatus of the first prior art structure. Encoder 100 comprises linear prediction analysis unit 101, predictor 102, quantizer 103, multiplexing unit 104 and adders 105 and 106.
Linear prediction analysis unit 101 analyzes input speech signals and outputs prediction parameters, and predictor 102 predicts input signals using an output from adder 106 (described below) and the prediction parameters from linear prediction analysis unit 101. Adder 105 outputs error data by computing the difference between an input speech signal and the predicted signal, quantizer 103 obtains a residual signal by quantizing the error data, and adder 106 adds the output from predictor 102 to that of quantizer 103, thereby enabling the output to be fed back to predictor 102. Multiplexing unit 104 multiplexes the prediction parameters from linear prediction analysis unit 101 and the residual signal from quantizer 103 for transmission to a receiving station.

With such a structure, linear prediction analysis unit 101 performs a linear prediction analysis of an input signal at every predetermined frame period, thereby extracting prediction parameters as vocal tract information to which appropriate bits are assigned by an encoder (not shown). The prediction parameters are thus encoded and output to predictor 102 and multiplexing unit 104. Predictor 102 predicts an input signal based on the prediction parameters and an output from adder 106. Adder 105 computes the error data (the difference between the predicted information and the input signal), and quantizer 103 quantizes the error data, thereby assigning appropriate bits to the error data to provide a residual signal. This residual signal is output to multiplexing unit 104 as excitation information.

After that, the encoded prediction parameters and residual signal are multiplexed by multiplexing unit 104 and transmitted to a receiving station.

Adder 106 adds the input signal predicted by predictor 102 and the residual signal quantized by quantizer 103. The addition output is again input to predictor 102 and is used to predict the input signal together with the prediction parameters.

In this case, the number of bits assigned to the prediction parameters for each frame is fixed at α bits per frame and the number of bits assigned to the residual signal is fixed at β bits per frame. Therefore, the (α+β) bits for each frame are transmitted to the receiving station. In this case, the transmission rate is, for example, 8 kbps.
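For orientation, the fixed-rate predictive loop of FIG. 1 can be condensed into a few lines of Python. This is a minimal sketch, not the patented circuit: the autocorrelation-based coefficient fit, the predictor order and the uniform quantizer step are placeholder choices, and carry-over of the predictor state between frames is omitted.

    import numpy as np

    def encode_frame_apc(frame, lpc_order=8, q_step=0.05):
        """One frame of a FIG. 1 style adaptive predictive loop (sketch)."""
        frame = np.asarray(frame, dtype=float)
        # "Linear prediction analysis unit 101": fit prediction parameters to the frame.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:lpc_order + 1]
        R = np.array([[r[abs(i - j)] for j in range(lpc_order)] for i in range(lpc_order)])
        a = np.linalg.solve(R, r[1:lpc_order + 1])   # a real coder would use Levinson-Durbin

        history = np.zeros(lpc_order)   # locally decoded samples fed back to the predictor
        residual = np.zeros_like(frame)
        for n, x in enumerate(frame):
            predicted = float(a @ history)              # predictor 102
            error = x - predicted                       # adder 105
            q = q_step * round(error / q_step)          # quantizer 103 -> residual sample
            residual[n] = q
            history = np.roll(history, 1)
            history[0] = predicted + q                  # adder 106 (local decoding)
        return a, residual   # prediction parameters + residual, multiplexed by unit 104

Both outputs would then be assigned their fixed α and β bits per frame before multiplexing, which is exactly the fixed ratio that the invention later makes switchable.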
FIG. 2 is a block diagram showing a second prior art structure of the speech encoding apparatus. This prior art structure is a Code Excited Linear Prediction (CELP) encoder, which is known as a low bit rate speech encoder.

Principally, a CELP encoder, like the first prior art structure shown in FIG. 1, is an apparatus for encoding and transmitting linear prediction code parameters (LPC or prediction parameters) obtained from an LPC analysis and a residual signal. However, this CELP encoder represents the residual signal by using one of the residual patterns within a code book, thereby obtaining high efficiency encoding.

Details of CELP are disclosed in Atal, B. S., and Schroeder, M. R., "Stochastic Coding of Speech at Very Low Bit Rate", Proc. ICASSP 84, pp. 1610 to 1613, 1984, and a summary of the CELP encoder will be explained as follows by referring to FIG. 2.

LPC analysis unit 201 performs an LPC analysis of an input signal, and quantizer 202 quantizes the analyzed LPC parameters to be supplied to predictor 203. Pitch period m, pitch coefficient Cp and gain G, which are not shown, are extracted from the input signal.

A residual waveform pattern (code vector) is sequentially read out from the code book 204, and each pattern is, at first, input to multiplier 205 and multiplied by gain G. Then, the output is input to a feed-back loop, namely, a long-term predictor comprising delay circuit 206, multiplier 207 and adder 208, to synthesize a residual signal. The delay value of delay circuit 206 is set at the same value as the pitch period. Multiplier 207 multiplies the output from delay circuit 206 by pitch coefficient Cp.

The synthesized residual signal output from adder 208 is input to a feed-back loop, namely, a short-term prediction unit comprising predictor 203 and adder 209, and the predicted input signal is synthesized. The prediction parameters are the LPC parameters from quantizer unit 202. The predicted input signal is subtracted from the input signal at subtracter 210 to provide an error signal. Weight function unit 211 applies a weight to the error signal, taking into consideration the acoustic characteristics of humans. This is a correcting process to make the error to a human ear uniform, as the influence of the error on the human ear is different depending on the frequency band.

The output of weight function unit 211 is input to error power evaluation unit 212, and an error power is evaluated in respective frames.

A white noise code book 204 has a plurality of samples of residual waveform patterns (code vectors), and the above series of processes is repeated with regard to all the samples. The residual waveform pattern whose error power within a frame is minimum is selected as the residual waveform pattern of the frame.

As described above, the index of the residual waveform pattern obtained for every frame, as well as the LPC parameters from quantizer 202, pitch period m, pitch coefficient Cp and gain G, are transmitted to a receiving station (not shown). The receiving station forms a long-term predictor with the transmitted pitch period m and pitch coefficient Cp, as in the above case, and the residual waveform pattern corresponding to a transmitted index is input to the long-term predictor, thereby reproducing a residual signal. Further, the transmitted LPC parameters form a short-term predictor, as in the above case, and the reproduced residual signal is input to the short-term predictor, thereby reproducing an input signal.
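The code book search that distinguishes the CELP structure of FIG. 2 can likewise be sketched as follows. It is a simplified illustration under stated assumptions: the perceptual weighting of unit 211 is replaced by a plain squared error, and the gain G, pitch lag m and pitch coefficient Cp are taken as given rather than extracted from the input.

    import numpy as np

    def celp_codebook_search(target, codebook, lpc, gain, pitch_lag, pitch_coef):
        """Pick the FIG. 2 code vector whose synthesized frame has minimum error power."""
        target = np.asarray(target, dtype=float)
        lpc = np.asarray(lpc, dtype=float)
        best_index, best_power = -1, np.inf
        for index, code_vector in enumerate(np.asarray(codebook, dtype=float)):
            excitation = gain * code_vector[:len(target)]      # multiplier 205 (gain G)
            residual = np.zeros_like(target)
            for n in range(len(target)):                        # long-term predictor 206/207/208
                pitch_fb = pitch_coef * residual[n - pitch_lag] if n >= pitch_lag else 0.0
                residual[n] = excitation[n] + pitch_fb
            synthesized = np.zeros_like(target)
            for n in range(len(target)):                        # short-term predictor 203/209
                past = synthesized[max(0, n - len(lpc)):n][::-1]
                synthesized[n] = residual[n] + lpc[:len(past)] @ past
            power = float(np.sum((target - synthesized) ** 2))  # error power evaluation 212
            if power < best_power:                              # (weight function 211 omitted)
                best_index, best_power = index, power
        return best_index, best_power

Only the winning index, together with the LPC parameters, pitch data and gain, needs to be transmitted, which is what makes the scheme efficient at low bit rates.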
The respective dynamic characteristics of the excitation unit and the vocal tract unit in the sound producing structure of a human are different, and the respective data quantities to be transmitted at arbitrary points for the excitation unit and the vocal tract unit are therefore also different.

However, with a conventional speech encoding apparatus as shown in FIGS. 1 or 2, excitation information and vocal tract information are transmitted at a fixed ratio of data quantity. The above speech characteristics are not utilized. Therefore, when the transmission rate is low, quantization becomes coarse, thereby increasing noise and making it difficult to maintain satisfactory speech quality.

The above problem is explained as follows with regard to the conventional examples shown in FIGS. 1 or 2.

In a speech signal there exists a period in which characteristics change abruptly and a period in which the state is constant, and in the latter the values of the prediction parameters do not change much. Namely, there are cases where the correlation between the prediction parameters (LPC parameters) in continuous frames is strong, and cases where it is not strong. Conventionally, prediction parameters (LPC parameters) are transmitted at a constant rate with regard to each frame. Consequently, the characteristics of the speech signals are not fully utilized. Therefore, the transmission data contains redundancies and the quality of the reproduced speech in the receiving station is not sufficient for the amount of transmission data.
SUMMARY OF THE INVENTION

An object of the present invention is to provide a mode-switching-type speech encoding/decoding apparatus for providing a plurality of modes which depend on the transmission ratio between excitation information and vocal tract information, and, upon encoding, switching to the mode in which the best reproduction of speech quality can be obtained.

Another object of the present invention is to suppress redundancy of transmission information by preventing relatively stable vocal tract information from being transmitted and instead assigning many bits to excitation information, which is useful for an improvement of quality, thereby increasing the quality of the reproduced speech. In order to achieve the above objects, the present invention has adopted the following structure.

The present invention relates to a speech encoding apparatus for encoding a speech signal by separating the characteristics of said speech signal into articulation information (generally called vocal tract information) representing articulation characteristics of said speech signal, and excitation information representing excitation characteristics of said speech signal. Articulation characteristics are frequency characteristics of a voice formed by the human vocal tract and nasal cavity, and sometimes refer to only vocal tract characteristics. Vocal tract information representing vocal tract characteristics comprises LPC parameters obtained by performing a linear prediction analysis of a speech signal. Excitation information comprises, for example, a residual signal. The present invention is also based on a speech decoding apparatus. The present invention based on the above speech encoding/decoding apparatus has the structure shown in FIG. 3.
A plurality of encoding units (or "ENCODERS #1 to #m") 301-1 to 301-m locally decode a speech signal (or "INPUT") 303 by extracting vocal tract information (or "VOCAL TRACT PARAMETERS") 304 and excitation information (or "EXCITATION PARAMETERS") 305 from the speech signal 303 and by performing a local decoding on it. The vocal tract information and excitation information are generally in the form of parameters. The transmission ratios of the respective encoded information are different, as shown by the reference numbers 306-1 to 306-m in FIG. 3. The above encoding units comprise a first encoding unit for encoding a speech signal by locally decoding it and extracting LPC parameters and a residual signal from it at every frame, and a second encoding unit for encoding a speech signal by performing a local decoding on it and extracting a residual signal from it using the LPC parameters from the frame several frames before the current one, the LPC parameters being obtained by the first encoding unit.

Next, evaluation/selection units (or "EVALUATION AND DECISION OF OPTIMUM ENCODER") 302-1/302-2 evaluate the quality of the respective decoded signals 307-1 to 307-m subjected to local decoding by the respective encoding units 301-1 to 301-m, thereby providing the evaluation result. Then they decide and select the most appropriate encoding unit from among the encoding units 301-1 to 301-m, based on the evaluation result, and output a result of the selection (or "SELECT") as selection information 310. The evaluation/selection units comprise evaluation decision unit 302-1 and selection unit 302-2, respectively, as shown in FIG. 3.
The speech encoding apparatus of the above structure outputs the vocal tract information 304 and excitation information 305 encoded by the encoding unit selected by evaluation/selection units 302-1/302-2, and outputs the selection information 310 from evaluation/selection units 302-1/302-2, to, for example, line 308.

Decoding unit (or "DECODER #n") 309 decodes speech signal 311 from the selection information 310, vocal tract information 304 and excitation information 305, which are transmitted from the speech encoding apparatus.

With such a structure, evaluation/selection units 302-1/302-2 select the encoding outputs 304 and 305 of the encoding unit which is evaluated to be of good quality from the decoded signals 307-1 to 307-m subjected to local decoding.

In the portions of the speech signal in which the vocal tract information does not change, the LPC parameters are not output, thereby causing a surplus of information. As much of the surplus as possible is assigned to the residual signal, thereby improving the quality of the decoded signal (or "OUTPUT") 311 obtained in the speech decoding apparatus.

In the block diagram shown in FIG. 3, the speech encoding apparatus is combined with the speech decoding apparatus through a line 308, but it is clear that only the speech encoding apparatus or only the speech decoding apparatus may be used at one time. In such a case, the output from the speech encoding apparatus is stored in a memory, and the input to the speech decoding apparatus is obtained from the memory.
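A compact sketch may make the data flow of FIG. 3 concrete. The encoder interface used here, in which each encoder returns its vocal tract parameters, excitation parameters and a locally decoded frame, and the frame SNR used as the quality measure are illustrative assumptions, not limitations of the apparatus.

    import numpy as np

    def snr_db(original, decoded):
        """Frame signal-to-noise ratio used as a simple quality measure."""
        original = np.asarray(original, dtype=float)
        noise = original - np.asarray(decoded, dtype=float)
        return 10.0 * np.log10(np.sum(original ** 2) / max(float(np.sum(noise ** 2)), 1e-12))

    def select_optimum_encoder(frame, encoders):
        """Sketch of the FIG. 3 principle: every encoder processes the frame and
        also produces a locally decoded signal; the encoder whose local decoding
        is closest to the input wins, and its index is selection information 310."""
        results = [enc(frame) for enc in encoders]                 # encoders 301-1 .. 301-m
        scores = [snr_db(frame, decoded) for (_, _, decoded) in results]
        best = int(np.argmax(scores))                              # evaluation/decision 302-1
        vocal_tract, excitation, _ = results[best]                 # selector 302-2
        return {"select_information": best,                        # sent over line 308
                "vocal_tract": vocal_tract,
                "excitation": excitation}

The decoder only needs the selection information to know which transmission-ratio mode the received parameters follow before reconstructing the speech signal.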
Vocal tract information is not limited to LPC parameters based on linear prediction analysis, but may be cepstrum parameters based, for example, on cepstrum analysis. A method of encoding the residual signal by dividing it into pitch information and noise information by a CELP encoding method or a RELP (Residual Excited Linear Prediction) method, for example, may be employed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a first prior art structure,
FIG. 2 shows a block diagram of a second prior art structure,
FIG. 3 depicts a block diagram for explaining the principle of the present invention,
FIG. 4 shows a block diagram of the first embodiment of the present invention,
FIG. 5 represents a block diagram of the second embodiment of the present invention,
FIG. 6 depicts an operation flow chart of the second embodiment,
FIG. 7A shows a table of an assignment of bits to be transmitted in the second prior art, and
FIG. 7B depicts a table of an assignment of bits to be transmitted in the second embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The embodiments of the present invention will be explained by referring to the drawings.

FIG. 4 shows a structural view of the first embodiment of the present invention, and this embodiment corresponds to the first prior art structure shown in FIG. 1.

The first quantizer 403-1, predictor 404-1, adders 405-1 and 406-1, and LPC analysis unit 402 correspond to the portions designated by 103, 102, 105, 106, and 101, respectively, in FIG. 1, thereby providing an adaptive prediction speech encoder. In this embodiment, a second quantizer 403-2, a second predictor 404-2, and additional adders 405-2 and 406-2 are further provided. The LPC parameters applied to predictor 404-2 are provided by delaying the output from LPC analysis unit 402 in frame delay circuit 411 through terminal A of switch 410. The portions in the upper stage of FIG. 4, which correspond to those in FIG. 1, cause output terminals 408 and 409 to transmit LPC parameters and a residual signal, respectively. This is defined as A-mode. The signal transmitted from output terminal 412 in the lower stage of FIG. 4 is only the residual signal, which is defined as B-mode. Evaluation units 407-1 and 407-2 evaluate the S/N of the encoder of the A- or B-mode. Mode determining (or "MODE DETERMINATION") portion 413 produces a signal A/B for determining which mode should be used (A-mode or B-mode) to transmit the output to an opposite station (i.e., receiving station) (not shown), based on the evaluation. Switch (SW) unit 410 selects the A side when the A-mode was selected in the previous frame. Then, as LPC parameters of the B-mode for the current frame, the values of the A-mode of the previous frame are used. When the B-mode was selected in the previous frame, the B side is selected and the values of the B-mode in the previous frame, namely, the values of the A-mode in the frame which is several frames before the current frame, are used.

In this circuit structure, the encoders of the A- and B-modes operate in parallel with regard to every frame.

The A-mode encoder produces current frame prediction parameters (LPC parameters) as vocal tract information from output terminal 409, and a residual signal as excitation information through output terminal 408. In this case, the transmission rate of the LPC parameters is β bits/frame and that of the residual signal is α bits/frame. The B-mode encoder outputs a residual signal from output terminal 412 by using LPC parameters of the previous frame or of a frame which is several frames before the current frame. In this case, the transmission rate of the residual signal is (α+β) bits/frame, so the number of bits for the residual signal can be increased by the number of bits that are not being used for the LPC parameters, as the LPC parameters vary little.

Input signals to predictors 404-1 and 404-2 are locally decoded outputs from adders 406-1 and 406-2. They are equal to the signals that are decoded in the receiving station. Evaluation units 407-1 and 407-2 compare these locally decoded signals with their input signals from input terminal 401 to evaluate the quality of the decoded speech. The signal to quantization noise ratio SNR within a frame, for example, is used for this evaluation, enabling evaluation units 407-1 and 407-2 to output SN(A) and SN(B). The mode determination unit 413 compares these signals, and if SN(A)>SN(B), a signal designating A-mode is output, and if SN(A)<SN(B), a signal designating B-mode is output.

A signal designating A-mode or B-mode is transmitted from mode determination unit 413 to a selector (not shown). Signals from output terminals 408, 409, and 412 are input to the selector. When the selector designates A-mode, the encoded residual signal and LPC parameters from output terminals 408 and 409 are selected and output to the opposite station. When the selector designates B-mode, the encoded residual signal from output terminal 412 is selected and output to the opposite station.

Selection of A- or B-mode is conducted in every frame. The transmission rate is (α+β) bits per frame as described above and is not changed in either mode. The data of (α+β) bits per frame is transmitted to a receiving station after a bit per frame, representing an A/B signal designating whether the data is in A-mode or B-mode, is added to the data of (α+β) bits per frame. The data obtained in B-mode is transmitted if B-mode provides better quality. Therefore, the quality of reproduced speech in the present invention is better than in the prior art shown in FIG. 1, and the quality of the reproduced speech in the present invention can never be worse than in the prior art.

FIG. 5 is a structural view of the second embodiment of this invention. This embodiment corresponds to the second prior art structure shown in FIG. 2. In FIG. 5, 501-1 and 501-2 depict encoders. These encoders are both CELP encoders, as shown in FIG. 2. One of them, 501-1, performs linear prediction analysis on every frame by slicing speech into 10 to 30 ms portions, and outputs prediction parameters, a residual waveform pattern, pitch frequency, pitch coefficient, and gain. The other encoder, 501-2, does not perform linear prediction analysis, but outputs only a residual waveform pattern. Therefore, as described later, encoder 501-2 can assign more quantization bits to the residual waveform pattern than encoder 501-1 can.

The operation mode using encoder 501-1 is called A-mode and the operation mode using encoder 501-2 is called B-mode.
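Before turning to the internal blocks of encoders 501-1 and 501-2, the per-frame A/B switching just described for the first embodiment can be condensed into the following sketch. The callables encode_a_mode, encode_b_mode and snr are hypothetical stand-ins for the FIG. 4 blocks, and bit packing is reduced to list concatenation; none of these names come from the patent.

    def choose_mode_per_frame(frame, encode_a_mode, encode_b_mode, snr):
        """Per-frame A/B decision of the first embodiment (sketch).

        encode_a_mode is assumed to return (lpc_bits, residual_bits, decoded_a),
        spending its payload on LPC parameters plus a residual; encode_b_mode is
        assumed to return (residual_bits, decoded_b), reusing stored LPC parameters
        and spending the whole payload on the residual; snr scores a locally
        decoded frame against the input.
        """
        lpc_bits, residual_a, decoded_a = encode_a_mode(frame)    # terminals 409 / 408
        residual_b, decoded_b = encode_b_mode(frame)              # terminal 412
        if snr(frame, decoded_a) > snr(frame, decoded_b):         # mode determination 413
            mode_bit, payload = [0], lpc_bits + residual_a        # A-mode
        else:
            mode_bit, payload = [1], residual_b                   # B-mode: all bits on the residual
        # Either way the frame carries one mode bit plus the same payload size,
        # so the overall transmission rate stays constant.
        return mode_bit + payload

Because the better-scoring mode is chosen every frame, the sketch also reflects why the reproduced quality cannot fall below that of the fixed A-mode-only prior art.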
In encoder 501-1, linear prediction analysis unit 506 performs the same function as both LPC analysis unit 201 and quantizing unit 202. White noise code book 507-1, gain controller 508-1, and error computing unit 511-1, respectively, correspond to those features designated by the reference numbers 204, 205, and 210 in FIG. 2. Long-term prediction (or "LONG-TERM PREDICTOR") unit 509-1 corresponds to those features designated by the reference numbers 206 to 208 in FIG. 2. It performs an excitation operation by receiving pitch data as described in conjunction with the second prior art structure. Short-term prediction (or "SHORT-TERM PREDICTOR") unit 510-1 corresponds to those features represented by the reference numbers 203 and 209 in FIG. 2, and functions as a vocal tract by receiving prediction parameters as described in the second prior art. In addition, error evaluation unit 512-1 corresponds to those features designated by the reference numbers 211 and 212 in FIG. 2, and performs an evaluation of error power as described in conjunction with the second prior art structure. In this case, error evaluation unit 512-1 sequentially designates addresses (phases) in white noise code book 507-1, and performs evaluations of the error power of all the code vectors (residual patterns) as described in the second prior art structure. Then it selects the code vector that has the lowest error power, thereby producing, as the residual signal information, the number of the selected code vector in white noise code book 507-1.

Error evaluation unit 512-1 also outputs a segmental S/N (S/N A) that carries waveform distortion data within a frame.

Encoder 501-1, described in reference to FIG. 2, produces encoded prediction (or "PREDICTION") parameters (LPC parameters) from linear prediction analysis unit 506. It also produces an encoded pitch period, pitch coefficient and gain (not shown).

In encoder 501-2, the portions designated by the reference numbers 507-2 to 512-2 are the same as the respective portions designated by reference numbers 507-1 to 512-1 in encoder 501-1. Encoder 501-2 does not have linear prediction analysis unit 506; instead, it has coefficient memory 513. Coefficient memory 513 holds pre- [...]

[...] waveform distortion and spectral distortion of reproduced speech signals A and B to evaluate the quality of speech reproduced by encoders 501-1 or 501-2. In other words, unit 502 uses the segmental S/N and the LPC cepstrum distance (CD) of the respective frames in parallel to evaluate the quality of reproduced speech.

Therefore, quality evaluation/encoder selection unit 502 is provided with cepstrum distance (or "CD") computing unit 515, operation mode judgement unit 516, and switch 514.

Cepstrum distance computing unit 515 obtains the first LPC cepstrum coefficients from the LPC parameters that correspond to the present frame and that have been obtained from linear prediction analysis unit 506. Cepstrum distance computing unit 515 also obtains the second LPC cepstrum coefficients from the LPC parameters that are obtained from coefficient memory 513 and are currently used in the B-mode. Then it computes the LPC cepstrum distance CD in the current frame from the first and second LPC cepstrum coefficients. It is generally accepted that the LPC cepstrum distance thus obtained clearly expresses the difference between the above two sets of vocal tract spectral characteristics determined by the respective LPC parameters (spectral distortion).

Operation mode judgement unit 516 receives the segmental S/N A and S/N B from encoders 501-1 and 501-2, and receives the LPC cepstrum distance (CD) from cepstrum distance computing unit 515, to perform the process shown in the operation flow chart of FIG. 6. This process will be described later.

Where operation mode judgement unit 516 selects the A-mode (encoder 501-1), switch 514 is switched to the A-mode terminal side. Where operation mode judgement unit 516 selects the B-mode (encoder 501-2), switch 514 is switched to the B-mode terminal side. Every time the A-mode is produced (the output from encoder 501-1 is selected) by a switching operation of switch 514, coefficient memory 513 is renewed. When the B-mode is produced (so that the output from encoder 501-2 is selected), coefficient memory 513 is not renewed and maintains the current values. Multiplexing (or "MUX") unit 504 multiplexes residual [...]
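The cepstrum distance evaluated by unit 515 can be sketched with the standard LPC-to-cepstrum recursion. The patent does not reproduce the formula, so the sign convention of the prediction coefficients and the plain Euclidean distance used below are assumptions rather than quotations from the specification.

    import numpy as np

    def lpc_to_cepstrum(a, n_ceps=None):
        """Standard LPC-to-cepstrum recursion for H(z) = 1 / (1 - sum_k a_k z^-k).

        The positive-sign predictor convention is an assumption made for this sketch.
        """
        a = np.asarray(a, dtype=float)
        p = len(a)
        n_ceps = n_ceps or p
        c = np.zeros(n_ceps)
        for n in range(1, n_ceps + 1):
            acc = a[n - 1] if n <= p else 0.0
            for k in range(1, n):
                if n - k <= p:
                    acc += (k / n) * c[k - 1] * a[n - k - 1]
            c[n - 1] = acc
        return c

    def cepstrum_distance(lpc_current, lpc_stored):
        """Spectral distortion between the current frame's LPC parameters and the
        parameters held in coefficient memory 513, as computed by unit 515 (sketch)."""
        c1 = lpc_to_cepstrum(lpc_current)
        c2 = lpc_to_cepstrum(lpc_stored)
        m = min(len(c1), len(c2))
        return float(np.sqrt(np.sum((c1[:m] - c2[:m]) ** 2)))

A small distance indicates that the stored B-mode parameters still describe the current vocal tract shape well, which is the condition under which withholding the LPC parameters and spending the freed bits on the residual is attractive.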

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket