US006298322B1
`(12) United States Patent
`Lindemann
`
`(10) Patent No.: US 6,298,322 B1
`(45) Date of Patent: Oct. 2, 2001
`
`
`(54) ENCODING AND SYNTHESIS OF TONAL
`AUDIO SIGNALS USING DOMINANT
`SINUSOIDS AND A VECTOR-QUANTIZED
`RESIDUAL TONAL SIGNAL
`(75) Inventor: Eric Lindemann, 2975 18th St., Boulder, CO (US) 80304
`
`(73) Assignee: Eric Lindemann, Boulder, CO (US)
`
`(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/306,256
`(22) Filed: May 6, 1999
`
`(51) Int. Cl. ... G10L 19/02
`(52) U.S. Cl. ... 704/222; 704/200.1; 704/209; 704/220
`(58) Field of Search ... 704/200.1, 206, 207, 209, 220, 222
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,816,664    6/1974    Koch
`4,348,929    9/1982    Gallitzendorfer
`4,461,199    7/1984    Hiyoshi
`4,611,522    9/1986    Hideo
`4,856,068    8/1989    Quatieri, Jr.
`4,885,790    12/1989   McAulay
`4,937,873    6/1990    McAulay
`5,029,509    7/1991    Serra
`
`FOREIGN PATENT DOCUMENTS
`
`0363233 A1   4/1990    (EP)
`0363233 B1   11/1994   (EP)
`0813184 A1   12/1997   (EP)
`
`OTHER PUBLICATIONS
`
`Scott Levine et al., A Switched Parametric & Transform
`Audio Coder, Proceedings of the IEEE ICASSP, May 15-19,
`1999, Phoenix, Arizona, Section 2: System Overview.
`
`Jean LaRoche, HNS: Speech Modification Based on a
`Harmonic + Noise Model, Proceedings of IEEE ICASSP,
`Apr. 1993, Minneapolis, Minnesota, vol. I, p. 550-553,
`Section 2: Description of the Model.
`
`Primary Examiner: Tilivaldis Ivars Smits
`
`(57) ABSTRACT
`
`Tonal audio signals can be modeled as a sum of sinusoids
`with time-varying frequencies, amplitudes, and phases. An
`efficient encoder and synthesizer of tonal audio signals is
`disclosed. The encoder determines time-varying
`frequencies, amplitudes, and, optionally, phases for a
`restricted number of dominant sinusoid components of the
`tonal audio signal to form a dominant sinusoid parameter
`sequence. These components are removed from the tonal
`audio signal to form a residual tonal signal. The residual
`tonal signal is encoded using a residual tonal signal encoder
`(RTSE). In one embodiment, the RTSE generates a vector
`quantization codebook (VQC) and residual codebook
`sequence (RCS). The VQC may contain time-domain
`residual waveforms selected from the residual tonal signal,
`synthetic time-domain residual waveforms with magnitude
`spectra related to the residual tonal signal, magnitude
`spectrum encoding vectors, or a combination of time-domain
`waveforms and magnitude spectrum encoding vectors. The
`tonal audio signal synthesizer uses a sinusoidal oscillator
`bank to synthesize a set of dominant sinusoid components
`from the dominant sinusoid parameter sequence generated
`during encoding. In one embodiment, a residual tonal signal
`is synthesized using a VQC and RCS generated by the RTSE
`during encoding. If the VQC includes time-domain
`waveforms, an interpolating residual waveform oscillator
`may be used to synthesize the residual tonal signal. The
`synthesized dominant sinusoids and synthesized residual
`tonal signal are summed to form the synthesized tonal audio
`signal.
`
`42 Claims, 26 Drawing Sheets
`
`
`
`
`
`
`[Representative drawing on the front page (see Figure 1). Legible labels: residual signal; mass storage or communications channel; sinusoidal oscillator bank; synthesizer.]
`
`Sony Exhibit 1043
`Sony v. MZ Audio
`
`

`

`US 6,298,322 B1
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`5,195,166      3/1993    Hardwick
`5,226,108      7/1993    Hardwick
`5,327,518      7/1994    George
`5,369,730      11/1994   Yajima
`5,401,897      3/1995    Depalle
`5,479,564      12/1995   Vogten
`5,581,656      12/1996   Hardwick
`5,686,683      11/1997   Freed
`5,717,821 *    2/1998    Tsutsui et al. ... 704/200.1
`5,744,742      4/1998    Lindemann
`5,765,126 *    6/1998    Tsutsui et al. ... 704/200.1
`5,774,837      6/1998    Yeldener
`5,787,387      7/1998    Aguilar
`5,806,024      9/1998    Ozawa
`5,848,387 *    12/1998   Nishiguchi et al. ... 704/214
`
`
`* cited by examiner
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 1 of 26
`
`US 6,298,322 B1
`
`[Figure 1: block diagram of the encoder and synthesizer. Legible labels, in signal-flow order: tonal audio signal; dominant sinusoid encoder; pitch sequence; dominant sinusoid parameter sequence; residual tonal signal; residual tonal signal encoder; residual codebook sequence; residual vector quantization codebook; mass storage or communications channel; sinusoidal oscillator bank; residual tonal signal synthesizer; resynthesized residual tonal signal; resynthesized tonal audio signal. Legible reference numerals: 101, 103, 104.]
`
`Figure 1
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 2 of 26
`
`US 6,298,322 B1
`
`[Figure 2 flowchart, transcribed in reading order; reference numerals 201 to 212 appear beside the boxes.]
`
`input: tonal audio signal
`n = 0; offset = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * low_frequency_signal(offset to (offset+frame_length-1))
`    frame = zeropad(frame, frame_length * 50)
`    frame_fft = real_fft(frame)
`    pitch_sequence[n] = find_pitch(frame_fft .* conj(frame_fft))
`    indices = maxima(abs(frame_fft), number_of_sinusoids)
`    frequencies = indices*(fs/2)/length(frame_fft);
`    dominant_sinusoid_parameter_sequence[n].frequencies = frequencies;
`    dominant_sinusoid_parameter_sequence[n].amplitudes = abs(frame_fft[indices]);
`    dominant_sinusoid_parameter_sequence[n].phases = angle(frame_fft[indices]);
`    frame_fft = frame_fft .* real_fft(zero_pad(zeros_filter(frequencies, frame_length), length(frame_fft)))
`    residual_tonal_signal = overlap_add(residual_tonal_signal, real_ifft(frame_fft))
`    offset += frame_length/2;
`    n += 1;
`(NO branch) return dominant_sinusoid_parameter_sequence, residual_tonal_signal, pitch_sequence
`
`Figure 2
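The peak-picking step of the Figure 2 flowchart (window, zero-pad, FFT, keep the largest maxima) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the Hann window, the local-maximum test, and the function name `dominant_sinusoids` are all choices made here, and the bin-to-frequency mapping follows NumPy's `rfft` layout rather than the flowchart's `length(frame_fft)` expression.

```python
import numpy as np

def dominant_sinusoids(frame, fs, number_of_sinusoids, pad_factor=50):
    """Return (frequencies, amplitudes, phases) of the strongest spectral peaks."""
    window = np.hanning(len(frame))              # assumed window shape
    padded_length = len(frame) * pad_factor      # zero-padding refines the frequency grid
    spectrum = np.fft.rfft(frame * window, n=padded_length)
    magnitude = np.abs(spectrum)
    # A bin counts as a peak when it exceeds both of its neighbours.
    is_peak = (magnitude[1:-1] > magnitude[:-2]) & (magnitude[1:-1] > magnitude[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    # Keep the largest peaks, then report them in ascending frequency order.
    order = np.argsort(magnitude[peak_bins])[::-1][:number_of_sinusoids]
    strongest = np.sort(peak_bins[order])
    frequencies = strongest * fs / padded_length
    return frequencies, magnitude[strongest], np.angle(spectrum[strongest])

# Example: a two-component test tone (hypothetical values).
fs = 8000.0
t = np.arange(512) / fs
test_frame = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1000.0 * t)
freqs, amps, phases = dominant_sinusoids(test_frame, fs, number_of_sinusoids=2)
```

The amplitudes returned here are raw windowed FFT magnitudes; a practical encoder would also normalize them by the window gain.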
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 3 of 26
`
`US 6,298,322 B1
`
`[Figure 3 flowchart, transcribed in reading order; reference numerals 301 to 312 appear beside the boxes.]
`
`input: tonal audio signal
`n = 0; offset = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * low_frequency_signal(offset to (offset+frame_length-1))
`    frame = zeropad(frame, frame_length * 50)
`    frame_fft = real_fft(frame)
`    pitch_sequence[n] = find_pitch(frame_fft .* conj(frame_fft))
`    low_frequency_fft = frame_fft .* low_pass_fft(pitch_sequence[n]);
`    high_frequency_fft = frame_fft .* high_pass_fft(pitch_sequence[n]);
`    indices = find_maxima(abs(low_frequency_fft), number_of_sinusoids)
`    frequencies = indices*(fs/2)/length(frame_fft);
`    dominant_sinusoid_parameter_sequence[n].frequencies = frequencies;
`    dominant_sinusoid_parameter_sequence[n].amplitudes = abs(frame_fft[indices]);
`    dominant_sinusoid_parameter_sequence[n].phases = angle(frame_fft[indices]);
`    residual_tonal_signal = overlap_add(residual_tonal_signal, real_ifft(high_frequency_fft))
`    offset += frame_length/2; n += 1;
`(NO branch) return dominant_sinusoid_parameter_sequence, residual_tonal_signal, pitch_sequence
`
`Figure 3
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 4 of 26
`
`US 6,298,322 B1
`
`[Figure 4 flowchart, transcribed in reading order; reference numerals 401 to 413 appear beside the boxes.]
`
`input: tonal audio signal
`n = 0; offset = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * low_frequency_signal(offset to (offset+frame_length-1))
`    frame = zeropad(frame, frame_length * 40)
`    frame_fft = real_fft(frame)
`    pitch_sequence[n] = find_pitch(frame_fft .* conj(frame_fft));
`    f0 = pitch_to_frequency(pitch_sequence[n]);
`    harmonic_bins = round((f0 to fs/2 by f0) / (fs/2) * length(frame_fft))
`    indices = find_largest(abs(frame_fft[harmonic_bins]), number_of_sinusoids)
`    frequencies = indices*(fs/2)/length(frame_fft);
`    dominant_sinusoid_parameter_sequence[n].frequencies = frequencies;
`    dominant_sinusoid_parameter_sequence[n].amplitudes = abs(frame_fft[indices]);
`    dominant_sinusoid_parameter_sequence[n].phases = angle(frame_fft[indices]);
`    frame_fft = frame_fft .* real_fft(zero_pad(zeros_filter(frequencies, frame_length), length(frame_fft)))
`    residual_tonal_signal = overlap_add(residual_signal, real_ifft(frame_fft))
`    offset += frame_length/2;
`    n += 1;
`(NO branch) return dominant_sinusoid_parameter_sequence, residual_tonal_signal, pitch_sequence
`
`Figure 4
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 5 of 26
`
`US 6,298,322 B1
`
`[Figure 5 flowchart, transcribed in reading order; reference numerals 501 to 512 appear beside the boxes.]
`
`input: tonal audio signal
`n = 0; offset = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * low_frequency_signal(offset to (offset+frame_length-1))
`    frame = zeropad(frame, frame_length * 50)
`    frame_fft = real_fft(frame)
`    pitch_sequence[n] = find_pitch(frame_fft .* conj(frame_fft));
`    f0 = pitch_to_frequency(pitch_sequence[n]);
`    indices = round((f0 to f0*number_of_sinusoids by f0) / (fs/2) * length(frame_fft))
`    low_frequency_fft = frame_fft .* low_pass_fft(pitch_sequence[n]);
`    high_frequency_fft = frame_fft .* high_pass_fft(pitch_sequence[n]);
`    frequencies = indices*(fs/2)/length(frame_fft);
`    dominant_sinusoid_parameter_sequence[n].frequencies = frequencies;
`    dominant_sinusoid_parameter_sequence[n].amplitudes = abs(low_frequency_fft[indices]);
`    dominant_sinusoid_parameter_sequence[n].phases = angle(low_frequency_fft[indices]);
`    residual_tonal_signal = overlap_add(residual_tonal_signal, real_ifft(high_frequency_fft))
`    offset += frame_length/2;
`    n += 1;
`(NO branch) return dominant_sinusoid_parameter_sequence, residual_tonal_signal, pitch_sequence
`
`Figure 5
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 6 of 26
`
`US 6,298,322 B1
`
`[Figure 6: block diagram. Legible labels: magnitude spectrum sequence; magnitude spectrum codebook; residual codebook sequence; residual amplitude sequence; residual codebook pitch; residual codebook amplitude; residual waveform codebook.]
`
`Figure 6
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 7 of 26
`
`US 6,298,322 B1
`
`[Figure 7 flowchart, transcribed in reading order; reference numerals 700 to 710 appear beside the boxes.]
`
`input: residual_tonal_signal
`n = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * residual_tonal_signal(offset to (offset+frame_length-1))
`    frame_fft = real_fft(frame)
`    magnitude_spectrum = abs(frame_fft)
`    residual_amplitude_sequence[n] = sqrt(sum(magnitude_spectrum.^2))
`    magnitude_spectrum = smooth_spectrum(magnitude_spectrum)
`    magnitude_spectrum_sequence[n] = magnitude_spectrum / residual_amplitude_sequence[n]
`    offset += frame_length/2;
`    n += 1;
`(NO branch) return
`
`Figure 7
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 8 of 26
`
`US 6,298,322 B1
`
`[Figure 8 flowchart, transcribed in reading order; reference numerals 800 to 817 appear beside the boxes.]
`
`input: pitch_sequence
`last_total_distance = LARGE_NUMBER;
`total_distance = LARGE_NUMBER/2;
`indices = ceil(rand(number_of_codebook_vectors)*length(magnitude_spectrum_sequence));
`magnitude_spectrum_codebook = magnitude_spectrum_sequence[indices];
`residual_codebook_pitch = pitch_sequence[indices];
`residual_codebook_amplitude = residual_amplitude_sequence[indices];
`while last_total_distance - total_distance > PROGRESS_THRESHOLD (YES branch):
`    for m = 1 to number_of_codebook_vectors
`        for n = 1 to length(magnitude_spectrum_sequence)
`            distance[m][n] = (residual_amplitude_sequence[n].^2 + residual_codebook_amplitude[m].^2)
`                - 2*(magnitude_spectrum_sequence[n]'*magnitude_spectrum_codebook[m])
`                + pitch_weight*round(abs(pitch_sequence[n] - codebook_pitch[m])/pitch_sz)
`    for n = 1 to length(magnitude_spectrum_sequence)
`        residual_codebook_sequence[n] = closest_vector(distance[all][n])
`    for m = 1 to number_of_codebook_vectors
`        [min_distances, indexes] = find(residual_codebook_sequence == m)
`        new_magnitude_spectrum = sum(magnitude_spectrum_sequence[indexes])/length(indexes);
`        new_pitch = sum(pitch_sequence[indexes])/length(indexes);
`        new_amplitude = sqrt(sum(new_magnitude_spectrum.^2));
`        magnitude_spectrum_codebook[m] = new_magnitude_spectrum/new_amplitude;
`        residual_codebook_pitch[m] = new_pitch;
`        residual_codebook_amplitude[m] = new_amplitude;
`    last_total_distance = total_distance;
`    total_distance = sum(min_distances);
`(NO branch) return
`
`Figure 8
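The iterate-until-no-progress loop in Figure 8 has the shape of a k-means style codebook training. The sketch below illustrates that loop in Python with NumPy under simplifying assumptions: it uses plain squared Euclidean distance only, omitting the amplitude normalization and the pitch-weighted term of the flowchart, and the function name `train_codebook` and its parameters are hypothetical.

```python
import numpy as np

def train_codebook(vectors, number_of_codebook_vectors, progress_threshold=1e-9, seed=0):
    """Return (codebook, sequence) where sequence[n] indexes the nearest codebook vector."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen training vectors.
    start = rng.choice(len(vectors), size=number_of_codebook_vectors, replace=False)
    codebook = vectors[start].astype(float)
    last_total, total = np.inf, np.finfo(float).max
    while last_total - total > progress_threshold:
        # Squared Euclidean distance from every training vector to every codebook vector.
        distance = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        sequence = distance.argmin(axis=1)          # nearest codebook index per frame
        for m in range(number_of_codebook_vectors):
            members = vectors[sequence == m]
            if len(members):                        # leave an empty cell unchanged
                codebook[m] = members.mean(axis=0)
        last_total, total = total, distance.min(axis=1).sum()
    return codebook, sequence

# Example: two well-separated groups of 3-dimensional vectors (synthetic data).
data = np.vstack([np.zeros((5, 3)), np.full((5, 3), 10.0)])
data = data + np.linspace(-0.1, 0.1, 10)[:, None]
codebook, sequence = train_codebook(data, 2)
```

The returned `sequence` plays the role of the residual codebook sequence: one codebook index per analysis frame.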
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 9 of 26
`
`US 6,298,322 B1
`
`[Figure 9 flowchart, transcribed in reading order; reference numerals 900 to 912 appear beside the boxes.]
`
`input: residual_tonal_signal and pitch_sequence
`for m = 1 to number_of_codebook_vectors
`    for n = 1 to length(magnitude_spectrum_sequence)
`        distance[m][n] = (residual_amplitude_sequence[n].^2 + residual_codebook_amplitude[m]).^2
`            - 2*(magnitude_spectrum_sequence[n]'*magnitude_spectrum_codebook[m])
`            + pitch_weight*round(abs(pitch_sequence[n] - codebook_pitch[m])/pitch_sz)
`for m = 1 to number_of_code_book_vectors
`    closest_frame_index = closest_vector(distance[m][all])
`    wave_start = (closest_frame_index-1)*frame_length/2
`    residual_waveform_codebook[m] = residual_tonal_signal[wave_start to wave_start+frame_length]
`    residual_codebook_pitch[m] = pitch_sequence[closest_frame_index]
`    residual_codebook_amplitude[m] = residual_amplitude_sequence[closest_frame_index]
`return
`
`Figure 9
`
`

`

`U.S. Patent
`
`Sheet 10 of 26
`
`US 6,298,322 B1
`
`[Figure 10: block diagram. Legible labels: harmonic spectrum sequence; residual amplitude sequence; harmonic spectrum codebook; residual codebook amplitude; residual codebook sequence; residual waveform codebook.]
`
`Figure 10
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 11 of 26
`
`US 6,298,322 B1
`
`[Figure 11 flowchart, transcribed in reading order; reference numerals 1101 to 1116 appear beside the boxes.]
`
`input: residual_tonal_signal and pitch sequence
`offset = 0; n = 0; harmonic_spectrum_sequence[all][all] = 0;
`while offset + frame_length < signal_length (YES branch):
`    frame = window .* residual_tonal_signal(offset to (offset+frame_length-1))
`    frame_fft = real_fft(frame)
`    f0 = pitch_to_frequency(pitch_sequence[n])
`    highest_harmonic = floor((fs/2) / f0);
`    for k = 1 to highest_harmonic
`        harmonic_freq = (k*f0)
`        harmonic_bin = round(harmonic_freq/(fs/2) * length(frame_fft))
`        harmonic_spectrum_sequence[n][k] = abs(frame_fft[harmonic_bin])
`    residual_amplitude_sequence[n] = sqrt(sum(harmonic_spectrum_sequence[n].^2))
`    harmonic_spectrum_sequence[n] /= residual_amplitude_sequence[n]
`    offset += frame_length; n += 1;
`
`Figure 11
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 12 of 26
`
`US 6,298,322 B1
`
`[Figure 12 flowchart, transcribed in reading order; reference numerals 1200 to 1217 appear beside the boxes.]
`
`input: pitch_sequence
`last_total_distance = LARGE_NUMBER;
`total_distance = LARGE_NUMBER / 2;
`indices = ceil(rand(number_of_codebook_vectors)*length(harmonic_spectrum_sequence));
`harmonic_spectrum_codebook = harmonic_spectrum_sequence[indices];
`residual_codebook_amplitude = sqrt(sum(harmonic_spectrum_codebook.^2));
`while last_total_distance - total_distance > PROGRESS_THRESHOLD (NO branch exits):
`    for m = 1 to number_of_codebook_vectors
`        for n = 1 to length(harmonic_spectrum_sequence)
`            distance[m][n] = (residual_amplitude_sequence[n].^2 + residual_codebook_amplitude[m]).^2
`                - 2*harmonic_spectrum_sequence[n]'*harmonic_spectrum_codebook[m];
`    for n = 1 to length(harmonic_spectrum_sequence)
`        residual_codebook_sequence[n] = closest_vector(distance[all][n])
`    for m = 1 to number_of_codebook_vectors
`        [min_distances, indexes] = find(residual_codebook_sequence == m)
`        new_harmonic_spectrum = sum(harmonic_spectrum_sequence[indexes])/length(indexes);
`        residual_codebook_amplitude[m] = sqrt(sum(new_harmonic_spectrum.^2));
`        harmonic_spectrum_codebook[m] = new_codebook_vector / residual_codebook_amplitude[m];
`    total_distance = sum(min_distances);
`    last_total_distance = total_distance;
`
`Figure 12
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 13 of 26
`
`US 6,298,322 B1
`
`
`
`[Figure 13 flowchart, transcribed in reading order.]
`
`phases = rand(frame_length/2) .* 2*PI
`for m = 1 to number_of_codebook_vectors
`    frame_fft = harmonic_spectrum_codebook[m] .* exp(j.*phases)
`    residual_waveform_codebook[m] = real_ifft(frame_fft)
`
`Figure 13
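The Figure 13 step, which turns a stored magnitude spectrum into a time-domain waveform by attaching random phases and taking an inverse real FFT, can be sketched as follows. This is an illustration under assumptions made here: the DC bin is set to zero, an odd output length is chosen so that every supplied bin may carry an arbitrary phase, and the name `synthetic_residual_waveform` is hypothetical.

```python
import numpy as np

def synthetic_residual_waveform(harmonic_magnitudes, seed=0):
    """Build one period of a waveform whose harmonic magnitudes match the input."""
    rng = np.random.default_rng(seed)
    magnitudes = np.asarray(harmonic_magnitudes, dtype=float)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(magnitudes))
    # Bin 0 is DC (set to zero); bins 1..K hold the harmonics with random phases.
    half_spectrum = np.concatenate(([0.0], magnitudes * np.exp(1j * phases)))
    # An odd length 2K+1 has no Nyquist bin, so all K harmonics keep their phase.
    return np.fft.irfft(half_spectrum, n=2 * len(magnitudes) + 1)

# Example with three harmonics (hypothetical magnitudes).
wave = synthetic_residual_waveform([1.0, 0.5, 0.25])
recovered = np.abs(np.fft.rfft(wave))[1:]
```

Because only the phases are random, re-analysing `wave` gives back the original magnitudes, which is the property the codebook relies on.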
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 14 of 26
`
`US 6,298,322 B1
`
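`[Figure 14: block diagram. Legible labels: LPC sequence; LPC codebook; LPC variance sequence; LPC codebook variance; excitation amplitude sequence; excitation signal. The two variance labels are reassembled from scattered fragments in the scan. Legible reference numerals: 1400, 1403, 1404, 1405.]
`
`Figure 14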
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 15 of 26
`
`US 6,298,322 B1
`
`[Figure 15 flowchart, transcribed in reading order; reference numerals 1500 to 1508 appear beside the boxes.]
`
`input: residual_tonal_signal
`offset = 0
`while offset + frame_length < signal_length (YES branch):
`    frame = window * residual_tonal_signal(offset to (offset+frame_length-1))
`    (LPC_sequence[n], residual_amplitude_sequence[n]) = generate_LPC_coefficients_and_amplitude(frame);
`    LPC_variance_sequence[n] = sum(LPC_sequence[n].^2)
`    excitation_segment = inverse_filter(frame, LPC_sequence[n])
`    excitation_signal = overlap_add(excitation_signal, excitation_segment)
`    offset += frame_length;
`    n += 1;
`
`Figure 15
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 16 of 26
`
`US 6,298,322 B1
`
`[Figure 16 flowchart, transcribed in reading order; reference numerals 1600 to 1617 appear beside the boxes.]
`
`input: pitch_sequence
`last_total_distance = LARGE_NUMBER;
`total_distance = LARGE_NUMBER / 2;
`indices = ceil(rand(number_of_codebook_vectors)*length(LPC_sequence));
`LPC_codebook = LPC_sequence[indices];
`LPC_codebook_variance = sum(LPC_codebook.^2);
`while last_total_distance - total_distance > PROGRESS_THRESHOLD (YES branch):
`    for m = 1 to number_of_codebook_vectors
`        for n = 1 to length(LPC_sequence)
`            distance[m][n] = (LPC_variance_sequence[n] + LPC_codebook_variance[m])
`                - 2*(LPC_sequence[n]'*LPC_codebook[m]);
`    for n = 1 to length(LPC_sequence)
`        LPC_codebook_sequence[n] = closest_vector(distance[all][n])
`    for m = 1 to number_of_codebook_vectors
`        [min_distances, indexes] = find(LPC_codebook_sequence == m)
`        new_LPC_vector = sum(LPC_sequence[indexes])/length(indexes);
`        LPC_codebook_variance[m] = sum(new_LPC_vector.^2);
`        LPC_codebook[m] = new_LPC_vector
`    total_distance = sum(min_distances);
`    last_total_distance = total_distance;
`
`Figure 16
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 17 of 26
`
`US 6,298,322 B1
`
`[Figure 17 flowchart, transcribed in reading order; reference numerals 1700 to 1705 appear beside the boxes.]
`
`input: magnitude_spectrum
`log_spectrum = log(magnitude_spectrum.^2)
`cepstrum = ifft(log_spectrum)
`windowed_cepstrum = cepstrum .* smoothing_window
`smoothed_log_spectrum = fft(windowed_cepstrum)
`smoothed_magnitude_spectrum = sqrt(exp(smoothed_log_spectrum))
`return smoothed_magnitude_spectrum
`
`Figure 17
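The cepstral smoothing chain of Figure 17 (log power spectrum, inverse transform, low-quefrency window, forward transform, exponentiate) can be written compactly with NumPy. The sketch assumes the input is a half spectrum as returned by `rfft`, uses a rectangular smoothing window of a hypothetical length `number_of_cepstral_coefficients`, and adds a small floor before the logarithm; none of these details are stated in the source.

```python
import numpy as np

def smooth_spectrum(magnitude_spectrum, number_of_cepstral_coefficients=20):
    """Smooth a half-spectrum magnitude envelope by truncating its real cepstrum."""
    magnitude_spectrum = np.asarray(magnitude_spectrum, dtype=float)
    log_spectrum = np.log(magnitude_spectrum ** 2 + 1e-12)   # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)                    # real, even-symmetric cepstrum
    keep = number_of_cepstral_coefficients
    smoothing_window = np.zeros(len(cepstrum))
    smoothing_window[:keep] = 1.0                            # low quefrencies
    smoothing_window[len(cepstrum) - keep + 1:] = 1.0        # their mirror images
    smoothed_log_spectrum = np.fft.rfft(cepstrum * smoothing_window).real
    return np.sqrt(np.exp(smoothed_log_spectrum))

# Example: a rapidly alternating spectrum is flattened, a flat one is unchanged.
rough = np.where(np.arange(129) % 2 == 0, 1.5, 0.5)
smoothed_rough = smooth_spectrum(rough)
smoothed_flat = smooth_spectrum(np.ones(129))
```

A shorter window gives a smoother envelope; the window length is the main tuning choice in this kind of smoothing.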
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 18 of 26
`
`US 6,298,322 B1
`
`[Figure 18 flowchart, transcribed in reading order; only reference numeral 1802 is legible.]
`
`input: magnitude_squared_spectrum
`max_pitch_power = 0;
`best_pitch = 0;
`for pitch = pitch_min to pitch_max by 1/20
`    f0 = pitch_to_frequency(pitch);
`    harmonic_grid = round(((1 to floor((fs/2)/f0))*f0)/(fs/2) * length(magnitude_squared_spectrum));
`    pitch_power = sum(magnitude_squared_spectrum(harmonic_grid))
`    if (pitch_power > max_pitch_power) {
`        max_pitch_power = pitch_power;
`        best_pitch = pitch;
`    }
`return best_pitch
`
`Figure 18
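The harmonic-grid pitch search of Figure 18 can be sketched in Python as below. Two points are assumptions rather than facts from the source: pitch is treated as a MIDI-style note number (69 maps to 440 Hz), and the bin mapping uses `number_of_bins - 1` to match NumPy's `rfft` layout. Note that a harmonic-sum criterion of this kind also scores sub-multiples of the true pitch highly, so the example restricts the search range to exclude them.

```python
import numpy as np

def pitch_to_frequency(pitch):
    # Assumption: pitch is a MIDI-style note number (69 -> 440 Hz).
    return 440.0 * 2.0 ** ((pitch - 69.0) / 12.0)

def find_pitch(magnitude_squared_spectrum, fs, pitch_min, pitch_max, step=1.0 / 20.0):
    """Return the candidate pitch whose harmonic grid collects the most power."""
    number_of_bins = len(magnitude_squared_spectrum)
    best_pitch, max_pitch_power = pitch_min, -1.0
    for pitch in np.arange(pitch_min, pitch_max + step / 2, step):
        f0 = pitch_to_frequency(pitch)
        harmonics = np.arange(1, int((fs / 2) // f0) + 1) * f0
        # Map each harmonic frequency to the nearest rfft bin (0 .. number_of_bins - 1).
        grid = np.round(harmonics / (fs / 2) * (number_of_bins - 1)).astype(int)
        pitch_power = magnitude_squared_spectrum[grid].sum()
        if pitch_power > max_pitch_power:
            max_pitch_power, best_pitch = pitch_power, pitch
    return best_pitch

# Example: five harmonics of 220 Hz (note number 57), searched between notes 50 and 70.
fs = 8000.0
t = np.arange(2048) / fs
tone = sum(np.sin(2 * np.pi * 220.0 * k * t) / k for k in range(1, 6))
power_spectrum = np.abs(np.fft.rfft(tone * np.hanning(len(tone)))) ** 2
estimated_pitch = find_pitch(power_spectrum, fs, 50.0, 70.0)
```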
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 19 of 26
`
`US 6,298,322 B1
`
`[Figure 19: sinusoidal oscillator bank. Oscillator k (k = 1, 2, ..., N) takes frequency[n][k], phase[n][k], and amp[n][k] as inputs and produces output[k].]
`
`Figure 19
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 20 of 26
`
`US 6,298,322 B1
`
`[Figure 20: block diagram of a single oscillator. Legible labels: amp; amplitude register; phase; phase to table offset conversion; initial offset register; frequency; frequency to phase increment conversion; phase increment register; sine wave table; frame sample counter; window table. Legible reference numerals: 2000, 2001, 2005, 2006, 2008, 2009, 2010, 2011, 2012, 2014.]
`
`Figure 20
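The labels in Figure 20 (phase increment register, initial offset register, sine wave table, window table) suggest a table-lookup oscillator that produces one windowed frame at a time. The following Python sketch shows that general technique; the table size, the nearest-entry lookup, the Hann window, and the function name `oscillator_frame` are assumptions made for illustration and are not taken from the figure.

```python
import numpy as np

TABLE_SIZE = 4096  # assumed sine table length
SINE_TABLE = np.sin(2.0 * np.pi * np.arange(TABLE_SIZE) / TABLE_SIZE)

def oscillator_frame(frequency, phase, amp, fs, frame_length):
    """Generate one windowed frame of a sinusoid by stepping through a sine table."""
    phase_increment = frequency / fs * TABLE_SIZE          # table positions advanced per sample
    initial_offset = phase / (2.0 * np.pi) * TABLE_SIZE    # starting table position
    positions = initial_offset + phase_increment * np.arange(frame_length)
    indices = np.round(positions).astype(int) % TABLE_SIZE # nearest entry, wrapped around
    return amp * SINE_TABLE[indices] * np.hanning(frame_length)

# Example: one 256-sample frame at 440 Hz, compared with a directly computed sinusoid.
fs = 44100.0
frame = oscillator_frame(440.0, 0.0, 1.0, fs, 256)
reference = np.sin(2.0 * np.pi * 440.0 * np.arange(256) / fs) * np.hanning(256)
max_error = np.max(np.abs(frame - reference))
```

With nearest-entry lookup the phase error is at most half a table step, so a larger table or interpolation between entries trades memory or computation for accuracy.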
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 21 of 26
`
`US 6,298,322 B1
`
`
`
`
`[Figure 21: block diagram of an oscillator with amplitude interpolation. Legible labels: amp; previous amp; phase; phase to table offset conversion; frequency; frequency to phase increment conversion; divide by (operand illegible); phase increment register; phase accumulator register; sine wave table; amp accumulator register; sinusoid output. Legible reference numerals: 2100, 2102, 2103, 2106, 2107, 2108, 2109, 2110, 2111, 2112, 2114.]
`
`Figure 21
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 22 of 26
`
`US 6,298,322 B1
`
`[Figure 22: block diagram of an interpolating oscillator. Legible labels: amp; previous amp; phase; phase to table offset conversion; frequency; pitch to phase increment conversion; phase increment register; (FIR length-1) register; FIR index counter; memory (label partly illegible); sine wave (table label partly illegible); amp accumulator register; sinusoid output. Legible reference numerals: 2200, 2202, 2203, 2206, 2212, 2213, 2214, 2216.]
`
`Figure 22
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 23 of 26
`
`US 6,298,322 B1
`
`[Figure 23: block diagram of the interpolating residual waveform oscillator. Legible labels: residual amplitude sequence; residual codebook sequence; waveform select; initial_phase; pitch_sequence; phase to table offset conversion; amplitude register; phase accumulator register (integer part, fractional part); pitch to phase increment conversion; phase increment; (FIR length-1) register; FIR index counter; residual waveform codebook; FIR coefficient memory; waveform length; accumulator register; frame sample counter; window table; last half frame table; synthesized residual tonal signal. Reference numerals in the range 2300 to 2326 are partly legible.]
`
`Figure 23
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 24 of 26
`
`US 6,298,322 B1
`
`[Figure 24: block diagram of the LPC-based residual synthesizer. Legible labels: LPC codebook sequence; excitation amplitude sequence; pitch_sequence; coefficients select register; gain register; excitation synthesizer; coefficient vector length; LPC codebook; all-pole filter; frame sample counter; last half frame table; synthesized residual tonal signal. Legible reference numerals: 2400, 2401, 2402, 2404, 2405, 2406, 2407, 2408, 2409, 2410, 2411, 2412.]
`
`Figure 24
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 25 of 26
`
`US 6,298,322 B1
`
`[Figure 25: block diagram. The residual tonal signal feeds a vector quantizer, which outputs a residual vector quantization codebook and a residual codebook sequence.]
`
`Figure 25
`
`

`

`U.S. Patent
`
`Oct. 2, 2001
`
`Sheet 26 of 26
`
`US 6,298,322 B1
`
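`[Figure 26: block diagram. Legible labels: residual tonal signal; quantizer; residual waveform codebook; residual codebook sequence. The label fragments are scattered in the scan, so the grouping of the last two labels is a best reading.]
`
`Figure 26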
`
`

`

`ENCODING AND SYNTHESIS OF TONAL
`AUDIO SIGNALS USING DOMINANT
`SINUSOIDS AND A VECTOR-QUANTIZED
`RESIDUAL TONAL SIGNAL
`
`FIELD OF THE INVENTION
`
`This invention relates to encoding and synthesizing tonal
`audio signals, especially voiced speech and music signals.
`BACKGROUND OF THE INVENTION
`
`Tonal sounds can be effectively modeled as a sum of
`sinusoids with time-varying parameters consisting of
`frequency, amplitude, and phase. The key word here is
`“effectively” because, in fact, all sounds can be modeled as
`sums of sinusoids, but the number of sinusoids may be
`extremely large, and the time-varying sinusoidal parameters
`may not have intuitive significance. Colored noise signals
`like breath noise, ocean waves, and snare drums are
`examples of sounds that are not effectively modeled by sums
`of sinusoids. Pitched musical instruments such as clarinet,
`trumpet, gongs, and certain cymbals, as well as ensembles of
`these instruments are examples of tonal sounds that are
`effectively modeled as sums of sinusoids.
`Many sounds are modeled as a combination of tonal and
`non-tonal, or colored noise, sounds. Flute and violin both
`have tonal and colored noise components. Human speech is
`often modeled as a mixture of tonal or “voiced” speech, and
`colored noise or “unvoiced” speech. The present invention is
`concerned with encoding and synthesizing tonal audio sig-
`nals. This invention can be used in conjunction with systems
`for encoding and synthesizing non-tonal or colored noise
`signals.
`Pitched signals are a special class of tonal audio signals in
`which the sinusoidal frequencies are harmonically related.
`The present invention can be used for encoding and synthe-
`sizing both pitched and unpitched tonal audio signals. Spe-
`cifically optimized embodiments are proposed for encoding
`and synthesizing pitched tonal audio signals.
`In this specification we use the term “tonal audio signal”
`to refer to all audio signals that can be effectively modeled
`as a sum of sinusoids with time-varying parameters consisting
`of frequency, amplitude, and phase. These are all signals
`that are not noise-like in character. We use the term “pitched
`tonal audio signal” or simply “pitched signal” to refer to
`tonal audio signals whose sinusoidal frequencies are harmonically
`related. The term “voiced signal” is a common term of art that
`refers to the pitched tonal audio signal component of a speech
`signal. The term “unvoiced signal” is a term of art that refers
`to the noise-like component of a
`speech signal. This is the non-tonal part of the signal that
`cannot be effectively modeled as a sum of sinusoids with
`time-varying parameters consisting of frequency, amplitude,
`and phase.
`One method of encoding and synthesizing tonal audio
`signals is additive sinusoidal encoding and synthesis. This
`method provides excellent results since the encoding and
`synthesis model is the same model as the signal: a sum of
`sinusoids with time-varying parameters. U.S. Pat. Nos.
`4,885,790 and 4,937,873, both to McAulay et al., and U.S.
`Pat. No. 4,856,068, to Quatieri, Jr. et al., teach systems for
`encoding and synthesizing sound waveforms as sums of
`sinusoids with time-varying amplitude, frequency, and
`phase. While sinusoidal encoding and synthesis provides
`excellent results for tonal audio signals, the synthesis
`requires large computational resources because many tonal
`audio signals may involve one hundred or more individual
`sinusoids.
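As a rough illustration of why full additive synthesis is costly, the sketch below sums a bank of oscillators whose frequencies and amplitudes are given once per frame and interpolated per sample. It is a generic sketch of the technique, not the method of any cited patent; the function name `additive_synthesis`, the linear interpolation, and the `hop` parameter are assumptions. The work per output sample grows linearly with the number of sinusoids.

```python
import numpy as np

def additive_synthesis(frequencies, amplitudes, fs, hop):
    """Sum sinusoids whose per-frame parameters have shape (frames, sinusoids)."""
    frames, number_of_sinusoids = frequencies.shape
    frame_times = np.arange(frames) * hop
    sample_times = np.arange((frames - 1) * hop + 1)
    output = np.zeros(len(sample_times))
    for k in range(number_of_sinusoids):               # one oscillator per sinusoid
        freq = np.interp(sample_times, frame_times, frequencies[:, k])
        amp = np.interp(sample_times, frame_times, amplitudes[:, k])
        phase = 2.0 * np.pi * np.cumsum(freq) / fs     # running phase = integrated frequency
        output += amp * np.sin(phase)
    return output

# Example: a single steady 440 Hz partial described by five frames.
fs = 8000.0
freq_track = np.full((5, 1), 440.0)
amp_track = np.full((5, 1), 0.5)
signal = additive_synthesis(freq_track, amp_track, fs, hop=100)
expected = 0.5 * np.sin(2.0 * np.pi * 440.0 * np.arange(1, 402) / fs)
```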
`
`To reduce the computational requirement of sinusoidal
`synthesis, U.S. Pat. Nos. 5,401,897 to Depalle et al.,
`5,686,683 to Freed, and 5,327,518 teach systems for sinusoidal
`synthesis using Inverse Fast Fourier Transform (IFFT)
`techniques. While this approach somewhat reduces the
`computation requirements for synthesis of a large number of
`parameters, the computation is still expensive and new
`problems are introduced. Many synthesis environments, for
`example musical synthesizers, require multi-channel output.
`Using IFFT approaches, a separate IFFT system must be
`used for every channel. In addition, IFFT systems limit
`sinusoidal parameter updates to once per frame, where the
`frame length must be at least as long as the lowest-frequency
`period. This parameter update rate may be insufficient
`at higher frequencies.
`U.S. Pat. Nos. 5,581,656, 5,195,166, and 5,226,108, all to
`Hardwick et al., teach a system where a certain number of
`sinusoids, the dominant or low-frequency sinusoids, are
`synthesized using traditional time-domain sinusoidal additive
`synthesis, while the remaining sinusoids are synthesized
`using an IFFT approach. This permits a higher update rate for
`the dominant sinusoid components while taking advantage
`of the lower IFFT computation rate for the bulk of the
`sinusoids. This approach has the disadvantages of IFFT
`computation cost, especially with multi-channel synthesis. In
`addition, the dominant sinusoid components are usually at
`lower frequencies, and it is the higher frequencies that often
`require an increased parameter update rate.
`A number of less compute-intensive systems have been
`proposed for encoding and synthesizing tonal audio signals.
`Linear Predictive Coding (LPC) is well known in the art of
`speech coding and synthesis. Methods for using LPC for
`synthesizing tonal or voiced speech concentrate on methods
`for generating the tonal excitation signal. The numerous
`approaches include generating a pulse train at the desired
`pitch, generating a multi-pulse excitation signal at the
`desired pitch, vector quantizing (VQ) the excitation signal,
`and simply transmitting the excitation signal with fewer bits.
`U.S. Pat. No. 5,744,742, to Lindemann et al., teaches a
`system for encoding excitation signals as single pitch period
`loops. To synthesize excitation signals at different pitches or
`amplitudes, weighted sums of pitch period excitation signal
`loops are created. The excitation signal pitch periods are
`stored in single pitch period waveform memory tables. The
`phase response of all excitation signal waveforms is forced
`to be the same so that weighted sums of the waveforms do
`not cause phase cancellation. All of these techniques, with
`the exception of simply transmitting the excitation signal,
`give poorer results than full additive sinusoidal encoding
`and synthesis. The pulse-based techniques in particular
`sound “buzzy” and unnatural.
`U.S. Pat. Nos. 5,369,730 to Yajima and 5,479,564 to Vogten
`et al., European Patent 813,184 A1 to Dutoit et al., and European
`Patents 0,363,233 A1 and 0,363,233 B1, both to Hamon,
`teach methods of pitch synchronous concatenated waveform
`encoding and synthesis. With this method a number of single
`pitch period waveforms are stored in memory. To synthesize
`a time-varying signal, a sequence of single pitch period
`waveforms is selected from waveform memory and concatenated
`over time. The waveforms are usually overlap-added
`for continuity. To shift the pitch of the synthesized signal the
`overlap rate is modulated. While relatively inexpensive in
`terms of compute resources, this approach suffers from
`distortions especially associated with the pitch shifting
`mechanism. It is audibly inferior to full additive synthesis for
`most tonal audio signals.
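The overlap-add step described above can be illustrated with a short Python sketch. It is a generic overlap-add of windowed waveforms and not the method of any of the cited patents; the periodic Hann window, the function name `overlap_add_waveforms`, and the fixed `hop` are assumptions. Changing `hop` changes how closely successive waveforms are spaced, which is the mechanism the text refers to as modulating the overlap rate.

```python
import numpy as np

def overlap_add_waveforms(waveforms, hop):
    """Window each waveform and add it into the output at multiples of hop samples."""
    length = len(waveforms[0])
    n = np.arange(length)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / length)   # periodic Hann window
    output = np.zeros(hop * (len(waveforms) - 1) + length)
    for i, waveform in enumerate(waveforms):
        start = i * hop
        output[start:start + length] += window * np.asarray(waveform, dtype=float)
    return output

# Example: four constant waveforms at 50% overlap sum to a constant in the interior.
segments = [np.ones(64) for _ in range(4)]
joined = overlap_add_waveforms(segments, hop=32)
interior = joined[32:128]
```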
`In the music synthesizer field, an approach similar to
`concatenated waveform synthesis is referred to as waveform
`sequencing. With waveform sequencing, each single pitch
`period waveform is pitch shifted using sample rate conversion
`techniques and looped for a specified time to generate
`a stable magnitude spectrum. To generate time-varying
`magnitude spectra, the waveforms are generally cross-faded
`over time. U.S. Pat. Nos. 3,816,664, to Koch, 4,348,929, to
`Gallitzendorfer, 4,461,199 and Reissue 34,913, to Hiyoshi et
`al., and U.S. Pat. No. 4,611,522 to Hideo teach systems of
`waveform sequencing relative to music synthesis. Waveform
`sequencing can be economical in computation
`resources, but much of the complex time-varying character
`of the magnitude spectra is lost due to reduction to a limited
`number of waveforms.
`
`A number of hybrid systems have been proposed that use
`additive sinusoidal encoding and synthesis for one part of a
`signal (usually the tonal part) and some other technique
`for another part of the signal (usually the colored noise
`part). U.S. Pat. No. 5,029,509 to Serra et al. teaches a system
`for full sinusoidal encoding and synthesis of the tonal part of
`a signal and LPC coding of the non-tonal part of the signal.
`This approach has the computational expense of full sinusoidal
`additive encoding and synthesis plus the expense of
`LPC coding and synthesis. A similar approach is applied to
`speech signals in U.S. Pat. No. 5,774,837, to Yeldener et al.,
`and U.S. Pat. No. 5,787,387 to Aguilar.
`In “A Switched Parametric & Transform Audio Coder”,
`Scott Levine et al., Proceedings of the IEEE ICASSP, May
`15-19, 1999 Phoenix, Ariz., a system is taught wherein low
`frequencies are encoded and synthesized using full sinusoi-
`dal additive synthesis, and high frequencies are encoded
`using LPC with a white noise excitation signal. This is
`economical in terms of computation, but the high-frequency
`synthesized signal sounds excessively noise-like for tonal
`audio signals. A similar approach is applied to voiced speech
`signals in “HNS: Speech Modification Based on a
`Harmonic+Noise Model,” J. Laroche et al., Proceedings of
`IEEE ICASSP, April 1993, Minneapolis, Minn. The use of
`colored noise to model the high frequencies of tonal audio
`signals is less objectionable when applied to speech signals,
`but still results in some “buzzyness” at high frequencies.
`U.S. Pat. No. 5,806,024, to Ozawa, teaches a system
`wherein the short-time magnitude spectrum of the tonal
`audio signal is determined in frames. The tonal audio signal
`is assumed to have a harmonic component with time-varying
`pitch. The pitch varies slowly enough that it can be
`considered constant over each frame. For each frame, a pitch is
`determined. A harmonic spectrum is determined for each
`frame as the values of the magnitude spectrum at multiples
`of the pitch frequency. A residual spectrum is determined for
`each frame as the magnitude spectrum minus the harmonic
`spectrum. The harmonic spectrum frames and residual
`spectrum frames are vector quantized (VQ) to form a harmonic
`spectrum codebook, a residual spectrum codebook, and a gain
`codebook. The signal is encoded as a sequence of unique
`coding vector numbers identifying coding vectors in these
`codebooks. Thus the harmonic spectrum codebook sequence
`codes the pitched part of the signal, and the residual
`codebook sequence codes the non-tonal and non-pitched-but-tonal
`part of the signal. This approach can be economical, but
`with VQ, much of the richness in time-varying behavior is
`lost. This is especially true for complex tonal audio signals
`such as high-fidelity music signals.
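The per-frame harmonic and residual split described above, followed by a codebook lookup, might be sketched as follows. This is an illustrative reading of the description, not the method of the cited patent: it assumes the pitch falls exactly on an integer FFT bin, represents the harmonic spectrum as a full-length vector that is zero away from the harmonics, and uses a plain squared-error nearest-neighbor search. The names split_frame and nearest_codeword are hypothetical.

```python
def split_frame(magnitudes, pitch_bin):
    """Split one frame's magnitude spectrum into harmonic and residual parts.

    magnitudes: magnitude per FFT bin.
    pitch_bin: pitch expressed as an integer bin index (simplifying assumption).
    """
    if pitch_bin < 1:
        raise ValueError("pitch_bin must be a positive bin index")
    harmonic = [0.0] * len(magnitudes)
    # Keep the spectrum values at multiples of the pitch frequency.
    for k in range(pitch_bin, len(magnitudes), pitch_bin):
        harmonic[k] = magnitudes[k]
    # Residual spectrum = magnitude spectrum minus harmonic spectrum.
    residual = [m - h for m, h in zip(magnitudes, harmonic)]
    return harmonic, residual

def nearest_codeword(vector, codebook):
    """Return the index of the codebook vector with the smallest squared error."""
    def squared_error(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

# Hypothetical usage: one 8-bin frame with a pitch at bin 2.
frame = [0.1, 0.2, 1.0, 0.3, 0.8, 0.1, 0.5, 0.2]
harmonic, residual = split_frame(frame, 2)
toy_codebook = [[0.0] * 8, [0.0, 0.0, 1.0, 0.0, 0.8, 0.0, 0.5, 0.0]]
code_index = nearest_codeword(harmonic, toy_codebook)
```

Once each frame is replaced by a codebook index, every frame that maps to the same index is reproduced identically, which is one way to see how fine frame-to-frame variation can be lost with VQ.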
`
`BRIEF SUMMARY OF THE INVENTION
`
`Accordingly, one object of the present invention is to
`synthesize tonal sounds, especially voiced speech or musical
`sound, of high quality equivalent to full sinusoidal additive
`synthesis or IFFT sinusoidal synthesis, but with fewer
`encoding parameters and greatly reduced computational
`requirements.
`Another object of the present invention is to synthesize
`tonal sounds without the artificial “buzzyness” associated
`with pulse-based LPC techniques.
`Another object of the present invention is to synthesize
`high quality tonal sounds without audible loss of complex
`time-varying behavior associated with harmonic VQ or
`waveform sequencing tec
