US005353374A

United States Patent [19]
Wilson et al.

[11] Patent Number: 5,353,374
[45] Date of Patent: Oct. 4, 1994

[54] LOW BIT RATE VOICE TRANSMISSION
FOR USE IN A NOISY ENVIRONMENT

[75] Inventors: Dennis L. Wilson, Palo Alto; James
L. Wayman, Pebble Beach, both of Calif.

[73] Assignee: Loral Aerospace Corporation, New
York, N.Y.

[21] Appl. No.: 963,101

[22] Filed: Oct. 19, 1992

[51] Int. Cl.5: G10L 9/14; G10L 3/02
[52] U.S. Cl.: 395/2.35; 395/2.28
[58] Field of Search: 381/29-40;
395/2.35, 2.28, 2.25-2.27, 2.36, 2.37

[56] References Cited

U.S. PATENT DOCUMENTS

4,718,087  1/1988  Papamichalis
4,720,861  1/1988  Bertrand
4,724,535  2/1988  Ono
4,752,956  6/1988  Sluijter
4,797,925  1/1989  Lin
4,815,134  3/1989  Picone et al.
4,864,620  9/1989  Bialick
4,903,303  2/1990  Taguchi
4,932,061  6/1990  Kroon et al.  395/2.32

OTHER PUBLICATIONS

Wayman et al., “Some Improvements on the
Synchronized-Overlap-Add Method of Time Scale
Modification for Use in Real-Time Speech Compression and
Noise Filtering”, IEEE/ASSP Journal, Jan. 1988, pp.
139-140.

Primary Examiner: David D. Knepper
Attorney, Agent, or Firm: Perman & Green

[57] ABSTRACT

A method for compressing a voice signal by the steps of
(a) digitizing an input signal that includes a voice signal,
the input signal including a coherent noise component; and
(b) compressing the digitized voice signal with a
synchronized overlap add processor (20). So as to prevent the
synchronized overlap add processor from locking to the
coherent noise component, the step of compressing
includes an initial step of applying the digitized input
signal to a linear predictor (16), the linear predictor
having time constants selected for attenuating the
coherent noise component of the input signal. The residual
signal output of the linear predictor includes the voice
signal, and an uncorrelated noise component if one is
present in the input signal. The operation of the
synchronized overlap add processor also functions to
attenuate the incoherent noise component. Further
compression of the compressed voice signal is accomplished by
Huffman coding, arithmetic coding, or transform coding
so as to provide a greatly compressed voice signal
that, when subsequently expanded, is found to exhibit
excellent voice quality.

17 Claims, 3 Drawing Sheets
[Front-page drawing: block diagram of the encoder. Legible
labels: VOICE SIGNAL 11, FILTER, LINEAR PREDICTOR,
DIFFERENTIAL PROCESSING, SYNCHRONIZED OVERLAP ADD, ENCODER,
LOW BIT-RATE VOICE SIGNAL 24a.]
`
`
`Apple 1044
`
`U.S. Pat. 9,189,437
`
[Drawing Sheet 1 of 3: the labels are rotated in the scan and
are not recoverable.]

[Drawing Sheet 2 of 3: FIG. 3, linear predictor 16. Legible
labels: DIGITIZED VOICE SIGNAL FROM A/D 14, COEFFICIENT
ADJUSTMENT.]

[Drawing Sheet 3 of 3: only a repeated label read as "200"
survives, possibly the block label 20a.]

`LOW BIT RATE VOICE TRANSMISSION FOR USE
`IN A NOISY ENVIRONMENT
`
`FIELD OF THE INVENTION
`
`This invention relates generally to voice communica-
`tion methods and apparatus and, in particular, to meth-
`ods and apparatus for transmitting a compressed digi-
`tized voice signal in the presence of noise.
`
`BACKGROUND OF THE INVENTION
`
`There are many applications where a very low bit
`rate digitized voice signal is useful. For example, any
`communication system having a limited bandwidth can
`implement more voice channels within the bandwidth if
`the voice data rate is reduced. Examples of such com-
`munication systems include, but are not limited to, cel-
`lular telephone systems and satellite communications
`systems, such as those that employ L band communica-
`tions. In general, any satellite communication scheme
`can employ bit reduction techniques to simplify the
`processing of the signals.
`A primary example of the use of low bit rate voice
`signals is the enciphered telephone system used by the
military and intelligence communities. One conventional
approach for maintaining privacy on the telephone
uses a 16 kbit/s continuously variable slope delta modulation
scheme (CVSD) in the transmission of the voice
`signals. However, the quality of the voice is notoriously
`poor, and would most likely not be used were it not for
`the sensitive nature of the conversations. When the bit
`rate is expanded to 32 kbits per second, the quality of
`the CVSD voice is quite good, but the data rate is large
`enough to consume considerably more communication
`bandwidth than the usual telephone channel. By com-
`parison, a standard digital telephone channel uses 64
`kbits per second.
`Another known technique that is used to achieve low
`bit rates is linear predictive coding (LPC). Linear pre-
`dictive coding achieves bit rates of 2.4 kbits per second
`for poor quality, but intelligible speech. However, it is
`often impossible to recognize the speaker when using
`the LPC speech.
`Furthermore, LPC exhibits a problem when a noise
`signal coexists with the desired voice signal, in that the
`prediction algorithm adapts to the noise as well as to the
`speech. The result is that, for low signal-to-noise ratios,
`the speech signal may nearly disappear. This is because
`the noise signal “captures” the Linear Predictive Coder,
`and any residual of the voice signal is greatly reduced in
amplitude and quality. LPC furthermore has difficulty
with both white noise and with coherent noise. Examples
of coherent noise are 60 Hz hum and the hum of
machinery.
`The following U.S. Patents all disclose various as-
`pects of Linear Predictive Coding (LPC) as applied to
`speech: U.S. Pat. No. 4,718,087, entitled “Method and
`System Encoding Digital Speech Information”, by
`Panagiotis E. Papamichalis; U.S. Pat. No. 4,720,861,
`entitled “Digital Speech Coding Circuit”, by John P.
`Bertrand; U.S. Pat. No. 4,724,535, entitled “Low Bit-
`Rate Pattern Coding with Recursive Orthogonal Deci-
`sion of Parameters”, by Shigeru Ono; U.S. Pat. No.
`4,797,925, entitled “Method for Coding Speech at Low
`Bit Rates”, by Daniel Lin; U.S. Pat. No. 4,815,134,
entitled “Very Low Rate Speech Encoder and Decoder”,
by Joseph W. Picone et al.; U.S. Pat. No.
4,903,303, entitled “Multi-Pulse Type Encoder Having
A Low Transmission Rate”, by Tetsu Taguchi; and
U.S. Pat. No. 4,932,061, entitled “Multi-Pulse Excitation
Linear-Predictive Speech Coder”, by Peter Kroon
et al.
`
Other known voice encoding techniques are not degraded
by white noise, but do have difficulty with coherent
noise. One example of such a technique is known
as Synchronized-Overlap-Add (SOLA). By example,
U.S. Pat. No. 4,864,620, entitled “Method for Performing
Time-Scale Modification of Speech Information or
Speech Signals”, by L. Bialick, discloses a method for
determining a value of an overlap and a windowing of
the speech signal. However, it is believed that the presence
of correlated noise will capture the overlap calculation
and degrade the speech quality.
The present inventors describe an improved SOLA
technique in an article entitled “Some improvements
on the synchronized-overlap-add method of time-domain
modification for real-time speech compression
and noise filtering”, IEEE Journal on Acoust. Speech
and Signal Proc., Vol. 36, 1988, pp. 139-140.
`One of the most severe environments for voice com-
`pression is in a vehicle where there exists both white
`noise, due to, for example, the wind, and coherent road
`noise and motor noise. Achievement of low bit rate
`voice encoding in these circumstances is difficult.
`It is thus one object of this invention to provide a low
`bit rate voice encoding technique that provides intelligi-
`ble speech at low signal-to-noise ratios.
`It is a further object of this invention to improve the
`signal-to-noise ratio for low bit rate encoded speech,
`and to suppress both white noise and coherent noise
`when digitally encoding speech.
`
`SUMMARY OF THE INVENTION
`
`The foregoing and other problems are overcome and
`the objects of the invention are realized by a low bit rate
`method for the transmission of voice signals, and by
`apparatus for accomplishing the method. Bit rates of
`one to two kbits per second, and below, are achieved
for very good quality voice, where the speech is intelligible
and the speaker is easily recognizable. The low bit
rates are readily accomplished with the disclosed
method when the voice signal has little interfering noise
and, also, in the presence of both white noise and correlated
noise. The method of the invention thus finds
`application in noisy environments, such as in vehicles or
`areas where machinery is in use.
`In accordance with the invention, there is provided a
`method for compressing a voice signal by the steps of
`(a) digitizing an input signal that includes a voice signal,
`the input signal including a coherent noise component;
and (b) compressing the voice signal with a synchronized
overlap add processor. So as to prevent the synchronized
overlap add processor from locking to the coherent noise
component, the step of compressing includes an initial
step of applying the digitized input
`signal to a linear predictor, the linear predictor having
`time constants selected for attenuating the coherent
`noise component of the input signal, but not signifi-
`cantly attenuating the voice signal. The residual signal
`output of the linear predictor thus includes the voice
`signal and also an uncorrelated noise component, if one
`is present in the input signal. The operation of the syn-
`chronized overlap add processor also functions to atten-
`uate the incoherent noise component.
`
Further compression of the compressed voice signal
is accomplished by Huffman coding, arithmetic coding,
or transform coding so as to provide a greatly compressed
voice signal that, when subsequently expanded,
is found to exhibit excellent voice quality.

BRIEF DESCRIPTION OF THE DRAWING

The above set forth and other features of the invention
are made more apparent in the ensuing Detailed
Description of the Invention when read in conjunction
with the attached Drawing, wherein:
FIG. 1 is a circuit block diagram illustrating a voice
compressor that is constructed and operated in accordance
with the invention;
FIG. 2 is a circuit block diagram illustrating a voice
decompressor that is constructed and operated in accordance
with the invention;
FIG. 3 is a circuit block diagram that illustrates in
greater detail the linear predictor of FIG. 1;
FIG. 4 is a circuit block diagram that illustrates in
greater detail the differential processor of FIG. 1;
FIG. 5 is a circuit block diagram that illustrates in
greater detail the differential processor of FIG. 2; and
FIG. 6 is a waveform diagram that illustrates the
operation of the Synchronized-Overlap-Add processor
of FIG. 1.

DETAILED DESCRIPTION OF THE
INVENTION

A block diagram of a voice encoder 10 is shown in
FIG. 1, and a corresponding voice decoder 30 is shown
in FIG. 2. In FIG. 1 a voice signal 11 is filtered by a
block 12 and digitized by an analog-to-digital (A/D)
converter 14 at a convenient sample rate, such as the
industry standard rate of 8000 samples per second, using
12 bit conversion. The signal 11 is filtered at block 12 to
prevent aliasing by removing frequencies higher than
4000 Hz. The resulting high quality signal at the output
of the A/D 14 has a bit rate of 96 kbits per second. In a
telephone application the 12 bits is reduced to 8 bits by
A law or Mu-law companding, which encodes the
voice signal by using a simple non-linearity.
In accordance with an aspect of the invention, the
converted voice signal is passed through a linear predictor
16 to remove the coherent noise. The linear predictor
16 differs from a conventional LPC filter in two
important respects. First, the adaptation rate is set to be
significantly slower than the adaptation rate for a conventional
LPC. Second, the output 16’ is not expressive
of the value of the coder coefficients, but instead is the
residual signal after the prediction. The significance of
these distinctions is described below with respect to
FIG. 3.

The voice signal is next passed through a differential
processor 18 which operates by taking successive differences
between samples to generate a continuous signal
in the reconstruction. This technique eliminates one
source of distortion in the signal.
The voice signal is next passed through a synchronized
overlap and add (SOLA) processor 20. In accordance
with an aspect of the invention, the SOLA processor
20 suppresses the white noise while also reducing
the effective sample rate by an amount that is adjustable
so as to achieve a desired quality in the reproduced
signal. By example, when the signal is compressed by a
factor of four the result is essentially transparent for the
voice signal, and the noise is suppressed somewhat. At
a compression ratio of 8 to 1, the result is nearly transparent.
When the compression is 16 to 1, the reproduced
voice signal is intelligible, but has begun to degrade.
The encoding process is completed by the coding of
the voice signal by Blocks 22 and 24. The application of
A law or Mu law companding by quantization block 22
reduces the signal, that is still basically a 12-bit signal, to
an 8-bit signal. Any of several known techniques for
information coding may then be applied by Block 24.
Huffman coding is a well known technique for information
coding, and is operable to reduce the signal to an
average of two to four bits per sample. With this technique,
and the good quality time compression of the
signal provided by the SOLA processor 20, the resulting
bit rate of the encoded voice is 2 kbits to 4 kbits per
second.
`Another suitable coding technique employs an arith-
`metic coder to achieve an encoding efficiency that is
`similar to that of the Huffman coder.
`Yet another suitable coding technique is the use of a
`transform coder, or an adaptive transform coder. For
`this approach, the signal is transformed using a fast
`Fourier transform or other transform, typically a trans-
`form that can be executed using a fast algorithm. The
`transform coefficients are quantized, establishing the
`quality of the information coding process. The trans-
`form coefficients are then encoded using the Huffman
`or arithmetic coding techniques. In general, transform
`coding produces a 4:1 to 8:1 compression of the voice
`signal.
The resulting encoder output 24a, when using the
transform coder, is 1 kbit per second to 2 kbits per
second of high quality voice signal.
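By way of illustration only, the data rates quoted for the successive stages of the encoder 10 follow from simple arithmetic, which the Python sketch below tabulates. The helper name `rate_bps` and the choice of an 8 to 1 SOLA compression ratio are assumptions made for the example; the other figures are those given in the description.

```python
def rate_bps(samples_per_second, bits_per_sample):
    """Bit rate in bits per second for a fixed-rate sample stream."""
    return samples_per_second * bits_per_sample

SAMPLE_RATE = 8000   # A/D converter 14, samples per second
SOLA_RATIO = 8       # assumed example setting of SOLA processor 20

adc = rate_bps(SAMPLE_RATE, 12)                  # 12-bit conversion
sola_rate = SAMPLE_RATE // SOLA_RATIO            # effective samples per second
companded = rate_bps(sola_rate, 8)               # after A law or Mu law companding
huffman = (rate_bps(sola_rate, 2), rate_bps(sola_rate, 4))   # 2 to 4 bits per sample
transform = (companded // 8, companded // 4)     # 8:1 and 4:1 transform coding

print("A/D output:      ", adc, "bit/s")
print("SOLA + compander:", companded, "bit/s")
print("Huffman coded:   ", huffman, "bit/s")
print("Transform coded: ", transform, "bit/s")
```

Under these assumptions the stages come out at 96 kbits, 8 kbits, 2 to 4 kbits, and 1 to 2 kbits per second respectively, in agreement with the rates stated above.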
`The decoder 30 of FIG. 2, for the low bit rate voice
`signal, follows the path of the encoder 10 in reverse.
`The signal is first passed through a decoder 32 to re-
`move the Huffman or arithmetic information coding,
`and then through a reverse compander to remove the
`non-linearity of the companding. The signal is passed
`through a SOLA expander 34 to recover the original
`time scale of the signal. Finally the differential process-
`ing is removed by an inverse processing step performed
`by Block 36. It is noted that no attempt is made to re-
`verse the linear prediction processing that was applied
`by Block 16 of FIG. 1, since to do so would add the
`coherent noise back to the original signal. The signal is
`then converted from digital to analog by a D/A block
38, and the analog signal is filtered by Block 40 to provide
a high quality voice signal.
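The companding non-linearity and its removal at the decoder can be sketched as follows. This is a minimal illustration, not the circuit of Blocks 22 and 32: it uses the continuous Mu law curve with Mu = 255 (an assumption, being the value commonly used in 8-bit telephony), and the function names `mu_compress` and `mu_expand` are hypothetical.

```python
import math

MU = 255.0  # assumed value; the description does not specify it

def mu_compress(sample, bits_in=12, bits_out=8):
    """Map a signed linear sample to a signed companded code."""
    full_in = 2 ** (bits_in - 1) - 1      # 2047 for 12 bits
    full_out = 2 ** (bits_out - 1) - 1    # 127 for 8 bits
    x = max(-1.0, min(1.0, sample / full_in))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round(y * full_out))

def mu_expand(code, bits_in=12, bits_out=8):
    """Reverse compander: recover an approximate linear sample."""
    full_in = 2 ** (bits_in - 1) - 1
    full_out = 2 ** (bits_out - 1) - 1
    y = code / full_out
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * full_in))

for s in (0, 10, 100, 1000, 2047):
    print(s, mu_compress(s), mu_expand(mu_compress(s)))
```

Small samples are reproduced almost exactly, while large samples carry an error of a few percent, which is the trade that allows 12 bits to be carried in 8.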
`Based on the foregoing, it can be seen that a voice
`signal encoding method of the invention includes the
`steps of employing a linear predictor to suppress a co-
`herent noise component of a digitized voice signal,
`differentially encoding the voice signal, performing a
`synchronized overlap add process to compress the
`voice signal, and coding the resultant compressed voice
`signal to further compress the voice signal to a desired
`low bit-rate.
`The operation of the encoder 10 of FIG. 1 is now
`described in further detail with respect to FIGS. 3
`through 6.
A presently preferred embodiment of the linear predictor
16 is shown in FIG. 3. The digitized, sampled
voice signal, including both coherent and white noise, is
successively delayed by a plurality of serially coupled
delay elements 16a. The collection of delayed samples
are weighted by Blocks 16c and summed by Block 16d.
The output is fed back to a coefficient adjustment block
16b and is used as a predictor of the incoming digitized
voice signal sample. An error signal is generated by
Block 16e by taking a difference between the incoming
sample and the prediction output from the summation
block 16d. As can be seen, the error is correlated with
the digitized voice signal sample at each delay time, and
is used to correct the coefficients used in the prediction.
The error signal output by Block 16e is the residual
signal after the predicted signal is removed from the
incoming signal. The signals that are removed from the
input are those that can be predicted. In accordance
with an aspect of the invention, and in that the voice
signals rapidly change over a given period of, by example,
one second, the time constants of the coefficient
changes are set to be long with respect to one second.
As a result, the voice signal is not predicted, and appears
as the linear predictor 16 residual signal output
16'.
`
`However, the more slowly varying coherent signals,
`such as 60 cycle hum, motor noise, and road noise, are
`predicted and are thus strongly attenuated in the resid-
`ual signal output from the predictor 16.
`It is noted that the attenuation of the frequencies of
`the longer term coherent noise has some effect on the
`voice signal. That is, the frequencies that are prominent
`in the coherent noise are attenuated for all of the signals,
`including the voice signal. It has been found that the
`attenuation of some of the frequencies below 300 Hz
`occurs, in that this is where most of the coherent noise
`is concentrated.
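The predictor structure described above can be sketched in Python as follows. The description does not name the coefficient adjustment rule; the sketch assumes a least-mean-squares (LMS) style update, which is one common way of correcting each coefficient by the product of the error and the corresponding delayed sample. The function name `predictor_residual`, the number of taps, and the step size are likewise assumptions; a smaller step gives the longer time constants called for in the text.

```python
import math

def predictor_residual(signal, taps=8, step=0.005):
    """Return the prediction error (residual) of a slowly adapting predictor."""
    weights = [0.0] * taps
    delayed = [0.0] * taps            # delay elements 16a, most recent first
    residual = []
    for sample in signal:
        # Weighting blocks 16c and summation block 16d form the prediction.
        prediction = sum(w * d for w, d in zip(weights, delayed))
        error = sample - prediction   # difference block 16e
        # Coefficient adjustment 16b (assumed LMS rule): correlate the
        # error with the sample at each delay time.
        weights = [w + step * error * d for w, d in zip(weights, delayed)]
        delayed = [sample] + delayed[:-1]
        residual.append(error)
    return residual

# Example: two seconds of 60 Hz hum sampled at 8000 samples per second.
hum = [math.sin(2 * math.pi * 60 * n / 8000) for n in range(16000)]
out = predictor_residual(hum)
power_in = sum(v * v for v in hum[-2000:]) / 2000
power_out = sum(v * v for v in out[-2000:]) / 2000
print("hum attenuation (dB):", 10 * math.log10(power_in / power_out))
```

A steady hum is predictable from its own past and is therefore removed from the residual, whereas a signal that cannot be predicted from its past samples passes through largely unchanged.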
`
`A presently preferred embodiment of the differential
`processor 18 is shown in FIG. 4. The differential pro-
`cessor 18 includes a one sample delay element 18a that
delays the linear predictor residual signal 16’ by one
`sample time. A Block 18b takes the difference between
`the signal 16', and the delayed signal output by the
`delay block 18a, and provides an output signal 18’.
`As seen in FIG. 5, for the receiver (decoder 30) the
`differential processing is reversed. That is, the delayed
`signal is added back to the original signal. This process
`makes the output signal at the receiver continuous, with
`no large discontinuities in the waveform at lower fre-
`quencies. The result is that a transient in the signal, at
`the frequency of the blocking of the signal during subse-
`quent processing, is prevented.
It is noted that the differential processors 18 and 36
are not required to implement the teaching of the invention.
However, the inclusion of the differential processors
18 and 36 is advantageous in that the reconstructed
speech is made smoother, and also the higher frequencies
are emphasized, which is a desirable feature when
processing voice signals.
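As a minimal sketch of the differential processing of FIG. 4 and its reversal at the receiver, the following Python functions take the first difference of a sample stream and then restore it with a running sum. The function names `differentiate` and `integrate` are hypothetical, and the delay elements are assumed to start at zero.

```python
def differentiate(samples):
    """Encoder side: each sample minus the one-sample-delayed sample."""
    previous = 0                      # assumed initial state of delay element 18a
    out = []
    for s in samples:
        out.append(s - previous)
        previous = s
    return out

def integrate(differences):
    """Receiver side: add the delayed output back to each incoming value."""
    previous = 0
    out = []
    for d in differences:
        previous = d + previous
        out.append(previous)
    return out

x = [3, 7, 7, -2, 10]
print(differentiate(x))
print(integrate(differentiate(x)))   # reproduces x
```

With integer samples the round trip is exact, so the pair adds no distortion of its own.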
The operation of the SOLA processor 20 is shown
diagrammatically in FIG. 6. The incoming signal is
segmented into blocks 20a of from 10 milliseconds to
100 milliseconds in length. A new signal comprised of
significantly fewer samples is then generated by overlapping
a new block 20b with the existing output signal
20c, by an amount proportional to the desired sample
rate reduction. The new block 20b is then added back to
the output signal 20c. The new block 20b is moved in
time, compared to the output signal 20c, so that the new
block is “synchronized” with the output signal. The
amount that the new block is moved from the nominal
overlap is determined by performing a correlation of
the new block 20b, shifted from the nominal overlap,
with the output signal 20c. The amount of shift that is
found to exhibit the greatest correlation is the position
for adding back the new signal block.
`In the reconstruction process performed by the
`SOLA processor 34 of FIG. 2, a collection of segments
`equal to the length of the original segments are formed.
`A segment from the compressed signal is shifted and
`added back to the output signal. The amount of the shift
`is equal to the shift of the input, but in the opposite
`direction, thereby re-expanding the signal. By marking
`the point in the output signal where each of the input
`segments begins, the correlation process is not required
`to be performed on the output cycle. Alternatively, the
`correlation process can be performed in the output
`cycle, eliminating the overhead of adding in marks from
`the data chain.
`
In greater detail, and as described by the present
inventors in the above referenced article entitled “Some
improvements on the synchronized-overlap-add
method of time-domain modification for real-time
speech compression and noise filtering”, IEEE Journal
on Acoust. Speech and Signal Proc., Vol. 36, 1988, pp.
139-140, a presently preferred SOLA method begins
with overlapping frames of time domain data. Frame
length is defined as $N$, the amount of new data per frame
is $S_a$, and the amount of overlap is thus $N - S_a$.
Compression is achieved by “accordioning” these frames
(shifting to increase the amount of overlap) and averaging,
such that the amount of new data per frame is $S_s$
(where $S_s < S_a$). For expansion, the shifting is accomplished
to decrease the amount of overlap ($S_s > S_a$). The
ratio $S_s/S_a$ is referred to as the modification factor, $\alpha$.
For compression, $\alpha < 1$, while for expansion $\alpha > 1$.
If the accordioning is of the same amount for all
frames, the resulting speech is of poor quality. Thus, the
SOLA processor 20 varies the degree of accordioning
such that the amount of new data per frame averages $S_s$,
but is allowed to vary by an amount $k_m$ at any particular
overlapping, where $m$ is the frame number. The amount
$k_m$, also referred to as $\Delta S_s$, may take on any value
between $\pm N/2$. In general, $k_m$ is chosen such that the
cross-correlation between the new frame and the averaged
values of the previous frames is maximized. That
is, $k_m$ is the value of $k$ that maximizes
`
$$R_m(k) = \frac{\sum_{j=0}^{L-1} y(mS_s + k + j)\, x(mS_a + j)}{\left[\sum_{j=0}^{L-1} y^2(mS_s + k + j) \sum_{j=0}^{L-1} x^2(mS_a + j)\right]^{1/2}}, \qquad -N/2 \le k \le N/2$$
`
as indicated by S. Roucos and A. M. Wilgus in “High
Quality Time Scale Modification for Speech”, Proc.
ICASSP ’85, pp. 493-496 (1985), where $L$ is the length
of overlap between the new signal samples $x(mS_a + j)$
and the composite vector $y$ formed by averaging previous
overlapped vectors.
The vector $y$ is updated by each new vector $x$, once
$k_m$ is found, by the formula

$$y(mS_s + k_m + j) = (1 - f(j))\, y(mS_s + k_m + j) + f(j)\, x(mS_a + j), \qquad 0 \le j \le L_m - 1,$$

and

$$y(mS_s + k_m + j) = x(mS_a + j), \qquad L_m \le j \le N - 1,$$

where $f(j)$ is a weighting function and $L_m$ is the length
of the overlap of the two vectors $x$ and $y$ for the particular
$k_m$ involved. The method is initialized by setting the
first $y(j) = x(j)$, $j = 1 \ldots N$. This implies that $k_1$ will
always be $-S_s$ (for non-trivial signals) because the maximum
cross-correlation will occur when each
$y(mS_s + k + j) = x(mS_a + j)$, for $j = 1 \ldots N$. This occurs for
the first iteration of the method when $k = -S_s$.
The compression ratio is asymptotically $S_s/S_a$, that is,
`the amount of new data in each output frame divided by
`the amount of new data in each overlapped input frame.
`The SOLA processor 20 is responsible for the large
`compression achieved with this technique, relative to
`conventional techniques. The voice signal may be com-
`pressed by more than 8:1 before the voice signal quality
`significantly degrades.
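The compression procedure set out above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the SOLA processor 20: the function name `sola_compress`, the default frame sizes, the restriction of the search to a range `k_max` smaller than N/2, and the linear weighting function f(j) are all chosen for the example. Indexing is zero-based, and frame 0 initializes the output.

```python
import math

def sola_compress(x, frame_len=160, s_a=80, s_s=40, k_max=20):
    """Time-compress x by roughly s_s / s_a using synchronized overlap-add."""
    y = list(x[:frame_len])                  # initialise y with the first frame
    m = 1
    while m * s_a + frame_len <= len(x):
        frame = x[m * s_a : m * s_a + frame_len]
        nominal = m * s_s
        best_k, best_r = 0, -2.0
        for k in range(-k_max, k_max + 1):   # search for the best shift k_m
            start = nominal + k
            if start < 0 or start >= len(y):
                continue
            overlap = min(len(y) - start, frame_len)
            num = sum(y[start + j] * frame[j] for j in range(overlap))
            den = math.sqrt(sum(y[start + j] ** 2 for j in range(overlap))
                            * sum(frame[j] ** 2 for j in range(overlap)))
            r = num / den if den > 0 else 0.0   # normalized cross-correlation
            if r > best_r:
                best_k, best_r = k, r
        start = nominal + best_k
        overlap = min(len(y) - start, frame_len)
        for j in range(overlap):             # weighted average over the overlap
            f = (j + 1) / (overlap + 1)      # assumed linear weighting f(j)
            y[start + j] = (1 - f) * y[start + j] + f * frame[j]
        y.extend(frame[overlap:])            # copy the non-overlapping remainder
        m += 1
    return y

tone = [math.sin(2 * math.pi * n / 32) for n in range(4000)]
compressed = sola_compress(tone)
print(len(tone), "->", len(compressed))
```

Setting `k_max=0` disables the synchronization and overlaps every frame at its nominal position; for a periodic input this mixes frames of differing phase and visibly reduces the output amplitude, which is the degradation the variable shift is intended to avoid.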
`The SOLA processor 20 is also responsible for the
`suppression of the uncorrelated noise in that, as each
`segment is overlapped and added, there is a coherent
adding of the voice signal and an incoherent adding of
`the noise. The result is an increase in the signal-to-noise
`ratio that is proportional to the square root of the num-
`ber of segments that overlap. For an 8:1 overlap the
`signal-to-noise ratio improves by a factor of 2.83, or
`approximately 9 dB, a significant improvement.
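The figures quoted for the 8:1 overlap can be checked with a short calculation. In the Python sketch below the function name `snr_gain` is hypothetical, and the factor is treated as an amplitude ratio, so that the decibel value is 20 times the base-10 logarithm.

```python
import math

def snr_gain(overlapping_segments):
    """SNR gain as an amplitude ratio and in dB for N overlapped segments."""
    ratio = math.sqrt(overlapping_segments)
    return ratio, 20 * math.log10(ratio)

ratio, decibels = snr_gain(8)
print(f"{ratio:.2f}x, {decibels:.1f} dB")   # 2.83x, 9.0 dB
```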
`It has been found that the SOLA processor 20 does
`not operate as well in the presence of coherent noise, in
`that the correlation process that establishes the overlap
`is distorted by the presence of the underlying coherent
noise. When the noise is uncorrelated there is no particular
bias in the correlation and, consequently, none in the
overlap due to the noise. However, when the noise is correlated,
and is also strong compared to the voice signal, it
`can overwhelm the voice signal, resulting in a correla-
`tion that locks to the underlying coherent noise signal.
`The result is a capture of the SOLA processor 20 by the
`coherent noise and a suppression and tearing of the
`voice signal, resulting in a poor quality signal.
`In accordance with the teaching of the invention, the
`linear predictor 16 of FIG. 3 overcomes this difficulty
`by strongly attenuating the coherent noise component
`of the incoming signal. As a result, the input to the
`SOLA processor 20 is substantially free of coherent
`noise, which enables the SOLA processor 20 to operate,
`as described above, in an optimal fashion on only the
`voice signal and the incoherent “white” noise.
`As was described above, one known technique for
`coding of the signal is Huffman coding. The signal from
`the SOLA processor 20 is passed through an A law or
`Mu law compander to reduce each signal sample from
12 bits to 8 bits. The sample is further reduced to the
`output code word by using a short code word for the
`more probable values, and longer code words for less
`probable values. A typical compression ratio for this
`encoding technique is approximately 2.5 to 1. The re-
`sulting output voice signal data stream contains approx-
`imately 3.2 bits per sample on the average. The average
`bit rate for good quality speech is approximately 3.5
`kbits per second. At 6 kbits to 7 kbits per second the
quality is very good. At 1.5 kbits per second the voice
`is intelligible, but of poor quality.
`Details of Huffman coding are found in “Information
`Theory and Coding”, by N. Abramson, McGraw-Hill
`Book Company, New York, (1963).
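The assignment of short code words to the more probable values can be illustrated with the following Python sketch of a Huffman code builder. The function names `huffman_code` and `average_bits`, and the small sample set used in the example, are assumptions made for illustration; an actual coder would be built from the statistics of the companded voice samples.

```python
import heapq
from collections import Counter

def huffman_code(samples):
    """Build a prefix code: frequent values get short words, rare ones longer."""
    counts = Counter(samples)
    if len(counts) == 1:                       # degenerate single-value input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {value: code_so_far})
    heap = [(n, i, {value: ""}) for i, (value, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)       # two least probable groups
        w1, _, high = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in low.items()}
        merged.update({v: "1" + c for v, c in high.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

def average_bits(samples, code):
    """Average code word length, in bits per sample."""
    return sum(len(code[s]) for s in samples) / len(samples)

samples = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [4]
code = huffman_code(samples)
print(code, average_bits(samples, code))
```

For this example set, five distinct values that would need 3 bits each at a fixed length are coded with fewer than 2 bits per sample on average.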
The above described arithmetic coding technique
achieves a level of coding that is comparable to the
Huffman coding. The coding begins from the same 8-bit
companded signal as the Huffman coder discussed
above. The arithmetic interval represented by the 8-bits
is divided into two equally probable intervals, and the
sample is tested to determine which of the two intervals
contains the sample. It should be noted that the intervals
are not equal in width for most signal types, including the
compressed, companded voice signal being considered. The
choice of the interval containing the sample establishes
the first bit. The selected interval is then subdivided into
equally probable segments, and the subinterval containing
the sample is selected, establishing the next bit in the
code sequence. When the subinterval is sufficiently
small, the coding terminates and the code word is added
to the output stream.
`Details of arithmetic coding are found in “Adaptive
`Data Compression” by R. Williams, Kluwer Academic
`Publishers, Boston (1991).
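The successive subdivision into equally probable intervals can be sketched as below. The sketch codes one sample at a time and stops when the interval holds a single value, which makes it behave like a Shannon-Fano style prefix code; a full arithmetic coder, which carries the interval across several samples, is more involved and is not shown. The function names and the example probability table are assumptions.

```python
def split_point(probs, lo, hi):
    """Index dividing probs[lo:hi] into two parts of nearly equal probability."""
    half = sum(probs[lo:hi]) / 2
    running = 0.0
    best, best_gap = lo + 1, float("inf")
    for i in range(lo + 1, hi):
        running += probs[i - 1]          # probability of values lo .. i-1
        gap = abs(running - half)
        if gap < best_gap:
            best, best_gap = i, gap
    return best

def bisect_encode(value, probs):
    """Emit one bit per subdivision until the interval holds only `value`."""
    lo, hi = 0, len(probs)
    bits = ""
    while hi - lo > 1:
        split = split_point(probs, lo, hi)
        if value < split:
            bits, hi = bits + "0", split
        else:
            bits, lo = bits + "1", split
    return bits

def bisect_decode(bits, probs):
    """Follow the same subdivisions to recover the value."""
    lo, hi = 0, len(probs)
    for b in bits:
        split = split_point(probs, lo, hi)
        if b == "0":
            hi = split
        else:
            lo = split
    return lo

probs = [0.5, 0.25, 0.125, 0.125]        # assumed example distribution
for v in range(len(probs)):
    print(v, bisect_encode(v, probs))
```

Because the intervals are split by probability rather than by width, the most probable value receives the shortest code word.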
`The above mentioned transform coding technique
`achieves a greater compression ratio by better exploit-
`ing the structure of the voice signal. In operation, a
`block of data is transformed using one of several possi-
`ble transforms. Suitable transforms include the Fourier
`transform and the related cosine or Hadamard trans-
`forms. The result is a representation of the signal block
`by a set of transform coefficients. If the transform is
`properly selected, only a few of the transform coeffici-
`ents are large, while the remainder are near zero. The
`coefficients may then be quantized to achieve a selected
`level of distortion, and then encoded for transmission.
`The information coding may be the Huffman coder, the
`arithmetic coder, or some other information coder.
The transform coding achieves an 8:1 compression
with good voice quality, as compared to the 3:1 compression
achieved with the use of the Huffman coder or
the arithmetic coder alone.
`
Details of transform coding for voice signals are
found in section 7.8 of “Speech Communication”, D.
O’Shaughnessy, Addison-Wesley, New York, 1990.
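A minimal sketch of the transform coding steps, using the cosine transform named above as one suitable choice, is given below in Python. The block length of 32, the quantizer step of 0.1, and the function names are assumptions; the direct summation is used for clarity, where a practical coder would use a fast algorithm.

```python
import math

def dct(block):
    """Orthonormal cosine transform (DCT-II) of one block of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild the block from its coefficients."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def quantize(coeffs, step):
    """Uniform quantizer; the step size sets the level of distortion."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [q * step for q in levels]

# A block well matched to the transform: only one coefficient is large.
block = [math.cos(math.pi * (i + 0.5) * 3 / 32) for i in range(32)]
levels = quantize(dct(block), 0.1)
print([q for q in levels if q != 0])
rebuilt = idct(dequantize(levels, 0.1))
```

The quantized levels, most of which are zero, are what would then be passed to the Huffman or arithmetic coder.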
The compressed signal that is processed by the transform
coder is the companded 8-bit per sample, 1000
samples per second signal. The transform coding reduces
the 8 kbits per second data stream to on the order of
1 kbit per second. When the voice signal is decoded,
`reverse transformed, and expanded at the receiver, very
`good quality speech is reproduced. When the speech is
`noise-free (at the source), the quality of the speech is
`very good indeed. When there is noise, the quality of
`the speech degrades as the signal-to-noise ratio de-
`creases. The noise is not coded to be reproduced, with
`the result that the noise distortion does not resemble the
`noise at the input, but appears instead as a distortion of
`the speech signal. However, the process of the compres-
`sion and encoding operates to suppress the noise, as
`discussed above. As a result, the resulting signal sounds
`superior to the original signal in many high noise envi-
`ronments.
`
`One technique for reducing the data rate for voice
signals is to send the voice signal through the channel
`only when there is a voice signal. Typically, a person in
`an exchange of conversation speaks less than half the
`time. In an unbiased conversation, each of the partici-
`pants is speaking one half the time. Even when a person
`is speaking, there are pauses in the speech between
`phrases and between thoughts.
`Thus, a well-known voice detector added to the en-
`coder 10 of FIG. 1 enables the average data rate to be
`reduced to below 500 bits per second, by disabling the
`transmission of the voice signal 24a in the absence of a
`voice input signal.
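The effect of gating the transmission on the average data rate can be illustrated as follows. The description does not specify how the voice detector operates; the Python sketch assumes a simple per-frame energy threshold, and the function names, frame length, and threshold value are hypothetical.

```python
def active_frames(samples, frame_len=160, threshold=1e-3):
    """Flag each frame whose mean-square energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        flags.append(energy > threshold)
    return flags

def average_rate(flags, active_rate_bps):
    """Average bit rate when nothing is sent during inactive frames."""
    return active_rate_bps * sum(flags) / len(flags)

# Ten frames: five silent, three with signal, two silent.
samples = [0.0] * 800 + [0.5] * 480 + [0.0] * 320
flags = active_frames(samples)
print(flags, average_rate(flags, 1000))
```

With a coded rate of 1 kbit per second during speech and a talker active less than half of the time, the average falls below 500 bits per second, as stated above.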
`
`A presently preferred method for achieving re-expan-
`sion of the compressed voice signal by the SOLA pro-
`cessor 34 of FIG. 2 is also described in the above men-
`tioned article entitled “Some improvements on the syn-
`chronized-overlap-add method of time-domain modifi-
`cation for real-time speech compression and noise filter-
`ing”, IEEE Journal on Acoust. Speech and Signal
`Proc., Vol. 36, 1988, pp. 139-140.
`In summary, the teaching of the invention provides a
`method for transmitting a voice signal with a low bit
rate. The method of the invention reduces the coherent
noise in one step and reduces the white, or non-coherent,
noise in another step. Voice compression is accomplished
in accordance with the SOLA technique, and
also during a subsequent information coding step. The
teaching of the invention significantly enhances the
operation of the SOLA process in a noisy environment
by suppressing or removing coherent noise with the
linear predictor 16. The differential processor
`reduces an annoying repetitive “whine” component in
`the signal. The SOLA processor 20 also reduces the
`white noise.
`
`The result is a voice processing system capable of
`producing high quality voice signals. When there is
`little noise, the voice signal is of very good quality at
data rates as low as approximately 1 kbit per second.
When there is either coherent noise or white noise, the
data rates may still be as low as approximately 1 kbit per
`second, but the noise causes some degradation in the
`voice signal. If the transform coding process is em-
`ployed, the perception is not that there is added noise,
`but that the voice signal is somewhat distorted.
`The various processing blocks shown in FIGS. 1-5
`may be implemented with discrete circuit components,
`or with a combination of discrete circuit components
`and a suitably programmed digital signal processor
`(DSP). Furthermore, other voice processing functions,
`other than those described above, may also be accom-
`plished while processing the voice signal. One example
`is the use of the above mentioned voice detector. Fur-
`thermore, the method and apparatus taught by the in-
`vention may be employed to process audio signals in
`general, and are not limited for use solely with voice