3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
`
A block diagram of the canonic structure of a complete filter-bank front-end analyzer is
given in Figure 3.4. The sampled speech signal, $s(n)$, is passed through a bank of $Q$
bandpass filters, giving the signals

$$s_i(n) = s(n) * h_i(n) \tag{3.1a}$$

$$\phantom{s_i(n)} = \sum_{m=0}^{M_i-1} h_i(m)\, s(n-m), \tag{3.1b}$$
`
where we have assumed that the impulse response of the $i$th bandpass filter is $h_i(m)$ with
a duration of $M_i$ samples; hence, we use the convolution representation of the filtering
operation to give an explicit expression for $s_i(n)$, the bandpass-filtered speech signal. Since
the purpose of the filter-bank analyzer is to give a measurement of the energy of the speech
signal in a given frequency band, each of the bandpass signals, $s_i(n)$, is passed through a
nonlinearity, such as a full-wave or half-wave rectifier. The nonlinearity shifts the bandpass
signal spectrum to the low-frequency band and also creates high-frequency images. A
lowpass filter is used to eliminate the high-frequency images, giving a set of signals, $u_i(n)$,
$1 \le i \le Q$, which represent an estimate of the speech signal energy in each of the $Q$
frequency bands.
To more fully understand the effects of the nonlinearity and the lowpass filter, let us
assume that the output of the $i$th bandpass filter is a pure sinusoid at frequency $\omega_i$, i.e.,

$$s_i(n) = a_i \sin(\omega_i n). \tag{3.2}$$

This assumption is valid for speech in the case of steady state voiced sounds when the
bandwidth of the filter is sufficiently narrow so that only a single speech harmonic is passed
by the bandpass filter. If we use a full-wave rectifier as the nonlinearity, that is,

$$f(s_i(n)) = \begin{cases} s_i(n) & \text{for } s_i(n) \ge 0 \\ -s_i(n) & \text{for } s_i(n) < 0, \end{cases} \tag{3.3}$$

then we can represent the nonlinearity output as

$$v_i(n) = f(s_i(n)) = s_i(n) \cdot w(n), \tag{3.4}$$

where

$$w(n) = \begin{cases} +1 & \text{if } s_i(n) \ge 0 \\ -1 & \text{if } s_i(n) < 0 \end{cases} \tag{3.5}$$

as illustrated in Figure 3.5(a)-(c). Since the nonlinearity output can be viewed as a
modulation in time, as shown in Eq. (3.4), then in the frequency domain we get the result
that

$$V_i(e^{j\omega}) = S_i(e^{j\omega}) \circledast W(e^{j\omega}), \tag{3.6}$$

where $V_i(e^{j\omega})$, $S_i(e^{j\omega})$, and $W(e^{j\omega})$ are the Fourier transforms of the signals $v_i(n)$,
$s_i(n)$, and $w(n)$, respectively, and $\circledast$ is circular convolution. The spectrum $S_i(e^{j\omega})$ is a single
impulse at $\omega = \omega_i$, whereas the spectrum $W(e^{j\omega})$ is a set of impulses at the odd-harmonic
`IPR2023-00037
`Apple EX1013 Page 104
`
`

`

[Figure 3.4 Complete bank-of-filters analysis model. Each channel $i = 1, \ldots, Q$ passes $s(n)$
through a bandpass filter (output $s_i(n)$), a nonlinearity (output $v_i(n)$), a lowpass filter
(output $t_i(n)$), a sampling rate reduction (output $u_i(m)$), and an amplitude compression
(output $x_i(m)$).]
[Figure 3.5 Typical waveforms and spectra for analysis of a pure sinusoid in the filter-bank
model. Panels (a)-(c): the waveforms $s_i(n)$, $w(n)$, and $v_i(n)$; panels (d)-(f): the
corresponding spectra $S_i(e^{j\omega})$, $W(e^{j\omega})$, and $V_i(e^{j\omega})$.]
`
frequencies $\omega_q = \omega_i q$, $q = 1, 3, \ldots, q_{\max}$. Hence the spectrum $V_i(e^{j\omega})$ is an impulse at
$\omega = 0$ and a set of smaller amplitude impulses at $\omega_q = \omega_i q$, $q = 2, 4, 6, \ldots$, as shown in
Figure 3.5(d)-(f). The effect of the lowpass filter is to retain the DC component of $V_i(e^{j\omega})$
and to filter out the higher frequency components due to the nonlinearity.
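The sinusoid analysis above can be checked numerically. The following is a minimal sketch,
assuming NumPy; the function name `rectified_spectrum` and the parameter values (a 500 Hz
tone sampled at 10 kHz) are illustrative choices, not part of the text. It full-wave
rectifies a sinusoid using the switching signal of Eq. (3.5) and inspects where the
spectral lines of the result fall.

```python
import numpy as np

def rectified_spectrum(a=1.0, f0=500.0, fs=10_000.0, n_samples=2000):
    """Full-wave rectify a*sin(w0*n); return (freqs_hz, magnitude, v)."""
    n = np.arange(n_samples)
    s = a * np.sin(2 * np.pi * f0 * n / fs)      # s_i(n), Eq. (3.2)
    w = np.where(s >= 0, 1.0, -1.0)              # switching signal w(n), Eq. (3.5)
    v = s * w                                    # v_i(n) = s_i(n) w(n), Eq. (3.4)
    magnitude = np.abs(np.fft.rfft(v)) / n_samples
    freqs_hz = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    return freqs_hz, magnitude, v

freqs_hz, magnitude, v = rectified_spectrum()
dc_level = magnitude[0]                           # near 2a/pi for a rectified sine
strongest = np.sort(freqs_hz[np.argsort(magnitude)[-3:]])  # three largest lines
```

With these assumed values the largest lines sit at DC and at the even harmonics (1000 Hz,
2000 Hz), and essentially nothing remains at 500 Hz itself, which is the behaviour the
lowpass filter relies on.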
`The above analysis, although only strictly correct for a pure sinusoid, is a reasonably
`good model for voiced, quasiperiodic speech sounds so long as the bandpass filter is not so
`wide that it has two or more strong signal harmonics. Because of the time-varying nature of
`the speech signal (i.e., the quasiperiodicity), the spectrum of the lowpass signal is not a pure
`
`Chap. 3
`
`Signal Processing and Analysis Methods
`
[Figure 3.6 Typical waveforms and spectra of a voiced speech signal in the bank-of-filters
analysis model. Waveforms are plotted against sample index; spectra are plotted against
frequency from 0 to 5000 Hz.]
`
DC impulse, but instead the information in the signal is contained in a low-frequency band
around DC. Figure 3.6 illustrates typical waveforms of $s(n)$, $s_i(n)$, $w(n)$, and $v_i(n)$ for a brief
(20 msec) section of voiced speech processed by a narrow bandwidth channel with center
frequency of 500 Hz (sampling frequency for this example is 10,000 Hz). Also shown
are the resulting spectral magnitudes for the four signals. It can be seen that $|S_i(e^{j\omega})|$ has
most of its energy around 500 Hz ($\omega = 1000\pi$), whereas $|W(e^{j\omega})|$ (which is quasiperiodic)
approximates an odd harmonic signal with peaks at 500, 1500, 2500 Hz. The resulting
signal spectrum, $|V_i(e^{j\omega})|$, shows the desired low-frequency concentration of energy as
well as the undesired spectral peaks at 1000 Hz, 2000 Hz, etc. The role of the final lowpass
filter is to eliminate the undesired spectral peaks.
The bandwidth of the signal, $v_i(n)$, is related to the fastest rate of motion of speech
harmonics in a narrow band and is generally acknowledged to be on the order of 20-30 Hz.
Hence, the final two blocks of the canonic bank-of-filters model of Figure 3.4 are a sampling
rate reduction box in which the lowpass-filtered signals, $t_i(n)$, are resampled at a rate on
the order of 40-60 Hz (for economy of representation), and a box in which the signal
dynamic range is compressed using an amplitude compression scheme (e.g., logarithmic
encoding, µ-law encoding).
Consider the design of a $Q = 16$ channel filter bank for a wideband speech signal
where the highest frequency of interest is 8 kHz. Assume we use a sampling rate of
$F_s = 20$ kHz on the speech data to minimize the effects of aliasing in the analog-to-digital
conversion. The information rate (bit rate) of the raw speech signal is on the order of
240 kbits per second (20 k samples per second times 12 bits per sample). At the output of
`
[Figure 3.7 Ideal (a) and realistic (b) set of filter responses of a Q-channel filter bank
covering the frequency range $F_s/N$ to $(Q + \tfrac{1}{2})F_s/N$. Channels 1, 2, 3, ..., Q are marked at
$F_s/N$, $2F_s/N$, $3F_s/N$, ..., $QF_s/N$ on the frequency axis.]
`
the analyzer, if we use a sampling rate of 50 Hz and a 7-bit logarithmic amplitude
compressor, we get an information rate of 16 channels times 50 samples per second per
channel times 7 bits per sample, or 5600 bits per second. Thus, for this simple example
we have achieved about a 40-to-1 reduction in bit rate, and hopefully such a data reduction
would result in an improved representation of the significant information in the speech
signal.
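The bit-rate arithmetic of this example can be written out as a short sketch. The helper
name `bit_rate` is hypothetical and is used only for illustration; the numbers are those
quoted in the text.

```python
def bit_rate(samples_per_sec, bits_per_sample, channels=1):
    """Information rate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

raw_rate = bit_rate(20_000, 12)                 # raw speech: 20 kHz, 12 bits/sample
analyzer_rate = bit_rate(50, 7, channels=16)    # 16 channels, 50 Hz, 7 bits/sample
reduction = raw_rate / analyzer_rate            # roughly 43, i.e. "about 40-to-1"
```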
`
`3.2.1 Types of Filter Bank Used for Speech Recognition
`
The most common type of filter bank used for speech recognition is the uniform filter bank
for which the center frequency, $f_i$, of the $i$th bandpass filter is defined as

$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q, \tag{3.7}$$

where $F_s$ is the sampling rate of the speech signal, and $N$ is the number of uniformly spaced
filters required to span the frequency range of the speech. The actual number of filters used
in the filter bank, $Q$, satisfies the relation

$$Q \le N/2 \tag{3.8}$$

with equality when the entire frequency range of the speech signal is used in the analysis.
The bandwidth, $b_i$, of the $i$th filter generally satisfies the property

$$b_i \ge \frac{F_s}{N} \tag{3.9}$$

with equality meaning that there is no frequency overlap between adjacent filter channels,
and with inequality meaning that adjacent filter channels overlap. (If $b_i < F_s/N$, then
certain portions of the speech spectrum would be missing from the analysis and the resulting
speech spectrum would not be considered very meaningful.) Figure 3.7a shows a set of $Q$
ideal, non-overlapping bandpass filters covering the range from $(F_s/N)(\tfrac{1}{2})$ to
$(F_s/N)(Q + \tfrac{1}{2})$. Similarly, Figure 3.7b shows a more realistic set of $Q$ overlapping filters
covering approximately the same range.
`
The alternative to uniform filter banks is nonuniform filter banks designed according
to some criterion for how the individual filters should be spaced in frequency. One
commonly used criterion is to space the filters uniformly along a logarithmic frequency
scale. (A logarithmic frequency scale is often justified from a human auditory perception
point of view, as will be discussed in Chapter 4.) Thus for a set of $Q$ bandpass filters with
center frequencies, $f_i$, and bandwidths, $b_i$, $1 \le i \le Q$, we set

$$b_1 = C \tag{3.10a}$$

$$b_i = \alpha\, b_{i-1}, \qquad 2 \le i \le Q, \tag{3.10b}$$

$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{(b_i - b_1)}{2}, \tag{3.11}$$

where $C$ and $f_1$ are the arbitrary bandwidth and center frequency of the first filter, and $\alpha$ is
the logarithmic growth factor.
The most commonly used values of $\alpha$ are $\alpha = 2$, which gives an octave band spacing
of adjacent filters, and $\alpha = 4/3$, which gives approximately a 1/3-octave filter spacing (an
exact 1/3-octave ratio is $2^{1/3} \approx 1.26$). Consider the design of a four-band, octave-spaced,
non-overlapping filter bank covering the frequency band from 200 to 3200 Hz (with a
sampling rate of 6.67 kHz). Figure 3.8a shows the ideal filters for this filter bank. Values
for $f_1$ and $C$ of 300 Hz and 200 Hz are used, giving the following filter specifications:

Filter 1:   f_1 = 300 Hz,    b_1 = 200 Hz
Filter 2:   f_2 = 600 Hz,    b_2 = 400 Hz
Filter 3:   f_3 = 1200 Hz,   b_3 = 800 Hz
Filter 4:   f_4 = 2400 Hz,   b_4 = 1600 Hz

An example of a 12-band, 1/3-octave, ideal filter-bank specification, covering the band
from about 200 to 3200 Hz, is given in Figure 3.8b. For this example, $C = 50$ Hz and
$f_1 = 225$ Hz.
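The recursion of Eqs. (3.10) and (3.11) is easy to evaluate directly. The sketch below is
one possible transcription in plain Python; the function name `log_spaced_bank` and its
argument names are illustrative assumptions. It reproduces the four-band octave example
given above.

```python
def log_spaced_bank(f1, c, alpha, q):
    """Center frequencies and bandwidths from Eqs. (3.10a), (3.10b), (3.11)."""
    bandwidths = [c]                                   # b_1 = C
    for _ in range(q - 1):
        bandwidths.append(alpha * bandwidths[-1])      # b_i = alpha * b_(i-1)
    centers = []
    for i in range(q):
        # f_i = f_1 + sum of earlier bandwidths + (b_i - b_1) / 2
        centers.append(f1 + sum(bandwidths[:i]) + (bandwidths[i] - bandwidths[0]) / 2)
    return centers, bandwidths

centers, bandwidths = log_spaced_bank(f1=300, c=200, alpha=2, q=4)
lower_edges = [f - b / 2 for f, b in zip(centers, bandwidths)]
upper_edge = centers[-1] + bandwidths[-1] / 2
```

For the octave example the band edges come out contiguous, running from 200 Hz up to
3200 Hz, so the four ideal filters tile the band without gaps or overlap.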
An alternative criterion for designing a nonuniform filter bank is to use the critical
band scale directly. The spacing of filters along the critical band is based on perceptual
studies and is intended to choose bands that give equal contribution to speech articulation.
The general shape of the critical band scale is given in Figure 3.9. The scale is close to
linear for frequencies below about 1000 Hz (i.e., the bandwidth is essentially constant as a
function of $f$), and is close to logarithmic for frequencies above 1000 Hz (i.e., the bandwidth
is essentially exponential as a function of $f$). Several variants on the critical band scale
have been used, including the mel scale and the bark scale. The differences between
these variants are small and are, for the most part, insignificant with regard to design of
filter banks for speech-recognition purposes. For example, Figure 3.8c shows a 7-band
critical-band filter-bank specification.
Other criteria for designing nonuniform filter banks have been proposed in the
literature. For the most part, the uniform and nonuniform designs based on critical band scales
have been the most widely used and studied filter-bank methods.
`
[Figure 3.8 Ideal specifications of a 4-channel octave band-filter bank (a), a 12-channel
third-octave band filter bank (b), and a 7-channel critical band scale filter bank (c) covering
the telephone bandwidth range (200-3200 Hz).
(a) Band edges at 200, 400, 800, 1600, 3200 Hz; bandwidths 200, 400, 800, 1600 Hz.
(b) Band edges at 200, 250, 315, 400, 500, 630, 800, 1000, 1260, 1600, 2000, 2520, 3200 Hz;
bandwidths 50, 65, 85, 100, 130, 170, 200, 260, 340, 400, 520, 680 Hz.
(c) Band edges at 200, 400, 630, 920, 1270, 1720, 2320, 3200 Hz; bandwidths 200, 230, 290,
350, 450, 600, 880 Hz.]

[Figure 3.9 The variation of bandwidth with frequency for the perceptually based critical
band scale.]
`
3.2.2 Implementations of Filter Banks

A filter bank can be implemented in several ways, depending on the method used to design
the individual filters. Design methods for digital filters fall into two broad classes: (1)
infinite impulse response (IIR) and (2) finite impulse response (FIR) methods. For IIR
filters (also commonly called recursive filters in the literature), the most straightforward,
and generally the most efficient, implementation is to realize each individual bandpass filter
as a cascade or parallel structure. (See Reference [1], pp. 40-46, for a discussion of such
structures.)
For FIR filters there are several possible methods of implementing the bandpass filters
in the filter bank. The most straightforward and the simplest implementation is the direct
form structure. In this case, if we denote the impulse response for the $i$th channel as $h_i(n)$,
$0 \le n \le L - 1$, then the output of the $i$th channel, $x_i(n)$, can be expressed as the discrete,
finite convolution of the input signal, $s(n)$, with the impulse response, $h_i(n)$, i.e.,

$$x_i(n) = s(n) * h_i(n) \tag{3.12a}$$

$$\phantom{x_i(n)} = \sum_{m=0}^{L-1} h_i(m)\, s(n-m). \tag{3.12b}$$

The computation of Eq. (3.12) is iterated on each channel $i$, for $i = 1, 2, \ldots, Q$. The
advantages of the convolutional, direct form structure are its simplicity and that it works
for arbitrary $h_i(n)$. The disadvantage of this implementation is the high computational
requirement. Thus for a $Q$-channel FIR filter bank, where each bandpass FIR filter has an
impulse response of $L$ samples duration, we require

$$C_{\mathrm{DFFIR}} = LQ \quad (\text{multiplications, additions}) \tag{3.13}$$

for a complete evaluation of $x_i(n)$, $i = 1, 2, \ldots, Q$, at a single value of $n$.
An alternative, less-expensive implementation can be derived for the case in which
each bandpass filter impulse response can be represented as a fixed lowpass window, $w(n)$,
modulated by the complex exponential, $e^{j\omega_i n}$; that is,

$$h_i(n) = w(n)\, e^{j\omega_i n}. \tag{3.14}$$

In this case Eq. (3.12b) becomes

$$x_i(n) = \sum_m w(m)\, e^{j\omega_i m}\, s(n-m)$$

$$\phantom{x_i(n)} = \sum_m s(m)\, w(n-m)\, e^{j\omega_i (n-m)}$$

$$\phantom{x_i(n)} = e^{j\omega_i n} \sum_m s(m)\, w(n-m)\, e^{-j\omega_i m} \tag{3.15a}$$

$$\phantom{x_i(n)} = e^{j\omega_i n}\, S_n(e^{j\omega_i}), \tag{3.15b}$$
where $S_n(e^{j\omega_i})$ is the short-time Fourier transform of $s(n)$ at frequency $\omega_i = 2\pi f_i$. The
importance of Eq. (3.15) is that efficient procedures often exist for evaluating the short-time
Fourier transform using FFT methods. We will discuss such procedures shortly; first,
however, we briefly review the interpretations of the short-time Fourier transform (see
Ref. [2] for a more complete discussion of this fascinating branch of signal processing).

[Figure 3.10 The signals $s(m)$ and $w(n - m)$ used in evaluation of the short-time Fourier
transform, with the window shown at positions $w(50 - m)$, $w(100 - m)$, and $w(200 - m)$.]
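The equivalence between the direct form convolution of Eq. (3.12b) and the modulated
short-time transform of Eq. (3.15b) can be verified numerically. The sketch below assumes
NumPy, a causal window of length L, and a signal that is zero for negative time; the
function names `direct_form_channel` and `stft_channel` are hypothetical.

```python
import numpy as np

def direct_form_channel(s, w, omega_i):
    """x_i(n) = sum_m h_i(m) s(n - m), with h_i(m) = w(m) exp(j*omega_i*m)."""
    m = np.arange(len(w))
    h_i = w * np.exp(1j * omega_i * m)            # Eq. (3.14)
    return np.convolve(s, h_i)[: len(s)]          # outputs for n = 0 .. len(s)-1

def stft_channel(s, w, omega_i):
    """x_i(n) = exp(j*omega_i*n) * S_n(exp(j*omega_i)), Eq. (3.15b)."""
    L = len(w)
    x = np.zeros(len(s), dtype=complex)
    for n in range(len(s)):
        m = np.arange(max(0, n - L + 1), n + 1)   # samples where w(n - m) is nonzero
        S_n = np.sum(s[m] * w[n - m] * np.exp(-1j * omega_i * m))
        x[n] = np.exp(1j * omega_i * n) * S_n
    return x

rng = np.random.default_rng(0)
s_demo = rng.standard_normal(200)                 # stand-in for a speech signal
w_demo = np.hamming(32)                           # assumed lowpass window
omega_demo = 2 * np.pi * 500 / 10_000             # 500 Hz channel at 10 kHz sampling
channels_agree = np.allclose(direct_form_channel(s_demo, w_demo, omega_demo),
                             stft_channel(s_demo, w_demo, omega_demo))
```

Both routines cost on the order of L operations per output sample per channel; the point
of the rewriting is that the second form exposes a transform that an FFT can evaluate for
all channels at once.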
`
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

The short-time Fourier transform of the sequence $s(m)$ is defined as

$$S_n(e^{j\omega_i}) = \sum_m s(m)\, w(n-m)\, e^{-j\omega_i m}. \tag{3.16}$$

If we take the point of view that we are evaluating $S_n(e^{j\omega_i})$ for a fixed $n = n_0$, then we can
interpret Eq. (3.16) as

$$S_{n_0}(e^{j\omega_i}) = \mathrm{FT}\left[s(m)\, w(n_0 - m)\right]\Big|_{\omega = \omega_i}, \tag{3.17}$$

where FT[·] denotes the Fourier transform. Thus $S_{n_0}(e^{j\omega_i})$ is the conventional Fourier
transform of the windowed signal, $s(m)w(n_0 - m)$, evaluated at the frequency $\omega = \omega_i$.
Figure 3.10 illustrates the signals $s(m)$ and $w(n - m)$ at times $n = n_0 = 50$, 100, and 200 to
show which parts of $s(m)$ are used in the computation of the short-time Fourier transform.
Since $w(m)$ is an FIR filter (i.e., of finite size), if we denote that size by $L$, then using the
conventional Fourier transform interpretation of $S_n(e^{j\omega_i})$, we can state the following:

1. If $L$ is large relative to the signal periodicity (pitch), then $S_n(e^{j\omega_i})$ gives good
frequency resolution. That is, we can resolve individual pitch harmonics but only
roughly see the overall spectral envelope of the section of speech within the window.
2. If $L$ is small relative to the signal periodicity, then $S_n(e^{j\omega_i})$ gives poor frequency
resolution (i.e., no pitch harmonics are resolved), but a good estimate of the gross
spectral shape is obtained.
`
To illustrate these points, Figures 3.11-3.14 show examples of windowed signals,
$s(m)w(n - m)$ (part a of each figure), and the resulting log magnitude short-time spectra,
$20 \log_{10} |S_n(e^{j\omega})|$ (part b of each figure). Figure 3.11 shows results for an $L = 500$-point
Hamming window applied to a section of voiced speech. The periodicity of the signal is
clearly seen in the windowed time waveform, as well as in the short-time spectrum in which
the fundamental frequency and its harmonics show up as narrow peaks at equally spaced
frequencies. Figure 3.12 shows a similar set of comparisons for an $L = 50$-point Hamming
window. For such short windows, the time sequence $s(m)w(n - m)$ does not show the signal
periodicity, nor does the signal spectrum. In fact, what we see in the short-time Fourier
transform log magnitude is a few rather broad peaks in frequency corresponding roughly
to the speech formants.

[Figure 3.11 Short-time Fourier transform using a long (500 points or 50 msec) Hamming
window on a section of voiced speech.]

[Figure 3.12 Short-time Fourier transform using a short (50 points or 5 msec) Hamming
window on a section of voiced speech.]

[Figure 3.13 Short-time Fourier transform using a long (500 points or 50 msec) Hamming
window on a section of unvoiced speech.]

[Figure 3.14 Short-time Fourier transform using a short (50 points or 5 msec) Hamming
window on a section of unvoiced speech.]

[Figure 3.15 Linear filter interpretation of the short-time Fourier transform: $s(n)$ is
multiplied by $e^{-j\omega_i n}$ to give $\tilde{s}(n)$, which is then filtered by $w(n)$.]

Figures 3.13 and 3.14 show the effects of using windows on a section of unvoiced
speech (corresponding to the fricative /sh/) for an $L = 500$ sample window (Figure 3.13)
and an $L = 50$ sample window (Figure 3.14). Since there is no periodicity in the signal, the
resulting short-time spectral magnitude of Figure 3.13, for the $L = 500$ sample window,
shows a ragged series of local peaks and valleys due to the random nature of the unvoiced
speech. Using the shorter window smooths out the random fluctuations in the short-time
spectral magnitude and again shows the broad spectral envelope very well.
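The effect of window length on harmonic resolution can be reproduced with a synthetic
signal. The sketch below is only an illustration under stated assumptions: NumPy, a 10 kHz
sampling rate, and a made-up periodic test signal (ten equal-amplitude harmonics of 100 Hz)
standing in for voiced speech. The helper names `log_spectrum_db` and `dip_db` are
hypothetical. It compares the log magnitude at a harmonic (200 Hz) with the value midway
between harmonics (250 Hz) for a 500-point and a 50-point Hamming window.

```python
import numpy as np

fs = 10_000                                       # assumed sampling rate, Hz
f0 = 100                                          # assumed pitch, Hz (period = 100 samples)
m = np.arange(1000)
s = sum(np.cos(2 * np.pi * f0 * k * m / fs) for k in range(1, 11))

def log_spectrum_db(segment, nfft=4096):
    """20*log10 |S_n| of a Hamming-windowed segment (zero-padded FFT)."""
    windowed = segment * np.hamming(len(segment))
    return 20 * np.log10(np.abs(np.fft.rfft(windowed, nfft)) + 1e-12)

def dip_db(segment, nfft=4096):
    """Level at the 200 Hz harmonic minus the level midway to the next harmonic."""
    db = log_spectrum_db(segment, nfft)
    on_harmonic = int(round(200 * nfft / fs))
    between = int(round(250 * nfft / fs))
    return db[on_harmonic] - db[between]

dip_long = dip_db(s[0:500])      # L = 500: several pitch periods inside the window
dip_short = dip_db(s[75:125])    # L = 50: half a pitch period, centered on one pulse
```

With the long window the spectrum drops sharply between harmonics, so the pitch structure
is resolved; with the short window the two levels are nearly equal and only the smooth
envelope remains.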
`
3.2.2.2 Linear Filtering Interpretation of the Short-Time Fourier Transform

The linear filtering interpretation of the short-time Fourier transform is derived by
considering $S_n(e^{j\omega_i})$, of Eq. (3.16), for fixed values of $\omega_i$, in which case we have

$$S_n(e^{j\omega_i}) = \left[s(n)\, e^{-j\omega_i n}\right] * w(n). \tag{3.18}$$

That is, $S_n(e^{j\omega_i})$ is a convolution of the lowpass window, $w(n)$, with the speech signal, $s(n)$,
modulated to center frequency $\omega_i$. This linear filtering interpretation of $S_n(e^{j\omega_i})$ is illustrated
in Figure 3.15.
If we denote the conventional Fourier transforms of $s(n)$ and $w(n)$ by $S(e^{j\omega})$ and
$W(e^{j\omega})$, then we see that the Fourier transform of $\tilde{s}(n) = s(n)e^{-j\omega_i n}$ of Figure 3.15 is just

$$\tilde{S}(e^{j\omega}) = S(e^{j(\omega + \omega_i)}) \tag{3.19}$$

and thus we get

$$\mathrm{FT}\left[S_n(e^{j\omega_i})\right] = S(e^{j(\omega + \omega_i)})\, W(e^{j\omega}). \tag{3.20}$$

Since $W(e^{j\omega})$ approximates 1 over a narrow band, and is 0 everywhere else, we see that, for
fixed values of $\omega_i$, the short-time Fourier transform gives a signal representative of the signal
spectrum in a band around $\omega_i$. Thus the short-time Fourier transform, $S_n(e^{j\omega_i})$, represents
the signal spectral analysis at frequency $\omega_i$ by a filter whose bandwidth is that of $W(e^{j\omega})$.
`
`3.2.2.3 Review Exercises
`
Exercise 3.1
A speech signal is sampled at a rate of 20,000 samples per second ($F_s = 20$ kHz). A 20-msec
window is used for short-time spectral analysis, and the window is moved by 10 msec in
consecutive analysis frames. Assume that a radix-2 FFT is used to compute DFTs.
1. How many speech samples are used in each segment?
2. What is the frame rate of the short-time spectral analysis?
3. What size DFT and FFT are required to guarantee that no time-aliasing will occur?
4. What is the resulting frequency resolution (spacing in Hz) between adjacent spectral
samples?

Solution 3.1
1. Twenty msec of speech at the rate of 20,000 samples per second gives

$$20 \times 10^{-3}\ \text{sec} \times 20{,}000\ \text{samples/sec} = 400\ \text{samples}.$$

Each section of speech is 400 samples in duration.
2. Since the shift between consecutive speech frames is 10 msec (i.e., 200 samples at a
20,000 samples/sec rate), the frame rate is

$$\text{frame rate} = \frac{1}{\text{frame shift}} = \frac{1}{10 \times 10^{-3}\ \text{sec}} = 100/\text{sec}.$$

That is, 100 spectral analyses are performed per second of speech.
3. To avoid time aliasing in using the DFT to evaluate the short-time Fourier transform,
we require the DFT size to be at least as large as the frame size of the analysis frame.
Hence, from part 1, we require at least a 400-point DFT. Since we are using a radix-2
FFT, we require, in theory, a 512-point FFT (the smallest power of 2 greater than 400) to
compute the DFT without time aliasing. (We would use the 400 speech samples as the
first 400 points of the 512-point array; we pad 112 zero-valued samples to the end of the
array to fill in and give a 512-point array.) Since the speech signal is real (as opposed
to complex), we can use an FFT size of 256 by appropriate signal preprocessing and
postprocessing with a complex FFT algorithm.
4. The frequency resolution of the analysis is defined as

$$\text{frequency resolution} = \frac{\text{sampling rate}}{\text{DFT size}} = \frac{20{,}000\ \text{Hz}}{512} \approx 39\ \text{Hz}.$$
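The four answers of Solution 3.1 follow from a few lines of arithmetic, sketched here in
plain Python. The variable names are illustrative; the power-of-2 rounding uses the integer
`bit_length` method as one convenient way to find the next radix-2 size.

```python
fs = 20_000                  # sampling rate, samples/sec
window_sec = 0.020           # 20-msec analysis window
shift_sec = 0.010            # 10-msec frame shift

samples_per_frame = round(fs * window_sec)                 # part 1
frame_rate = 1 / shift_sec                                 # part 2, frames/sec
fft_size = 1 << (samples_per_frame - 1).bit_length()       # part 3, next power of 2
zero_padding = fft_size - samples_per_frame                # zeros appended to each frame
resolution_hz = fs / fft_size                              # part 4
```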
`
Exercise 3.2
If the sequences $s(n)$ and $w(n)$ have normal (long-time) Fourier transforms $S(e^{j\omega})$ and $W(e^{j\omega})$,
then show that the short-time Fourier transform

$$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}$$

can be expressed in the form

$$S_n(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, S(e^{j(\omega+\theta)})\, d\theta.$$

That is, $S_n(e^{j\omega})$ is a smoothed (by the window spectrum) spectral estimate of $S(e^{j\omega})$ at
frequency $\omega$.
`
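Solution 3.2
The long-time Fourier transforms of $s(n)$ and $w(n)$ can be expressed as

$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, e^{-j\omega n}$$

$$W(e^{j\omega}) = \sum_{n=-\infty}^{\infty} w(n)\, e^{-j\omega n}.$$

The window sequence, $w(n)$, can be recovered from its long-time Fourier transform via the
integration

$$w(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\omega})\, e^{j\omega n}\, d\omega.$$

Hence, the short-time Fourier transform

$$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}$$

can be put in the form (by substituting for $w(n - m)$):

$$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s(m) \left[\frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta(n-m)}\, d\theta\right] e^{-j\omega m}$$

$$\phantom{S_n(e^{j\omega})} = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n} \left[\sum_{m=-\infty}^{\infty} s(m)\, e^{-j(\omega+\theta)m}\right] d\theta$$

$$\phantom{S_n(e^{j\omega})} = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, S(e^{j(\omega+\theta)})\, d\theta.$$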
`
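Exercise 3.3
If we define the short-time spectrum of a signal in terms of its short-time Fourier transform as

$$X_n(e^{j\omega}) = |S_n(e^{j\omega})|^2$$

and we define the short-time autocorrelation of the signal as

$$R_n(k) = \sum_{m=-\infty}^{\infty} w(n-m)\, s(m)\, w(n-k-m)\, s(m+k),$$

then show that for

$$S_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m},$$

$R_n(k)$ and $X_n(e^{j\omega})$ are related as a normal (long-time) Fourier transform pair. In other words,
show that $X_n(e^{j\omega})$ is the (long-time) Fourier transform of $R_n(k)$, and vice versa.

Solution 3.3
Given the definition of $S_n(e^{j\omega})$ we have

$$X_n(e^{j\omega}) = |S_n(e^{j\omega})|^2 = \left[S_n(e^{j\omega})\right]\left[S_n(e^{j\omega})\right]^*$$

$$\phantom{X_n(e^{j\omega})} = \left[\sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega m}\right] \left[\sum_{r=-\infty}^{\infty} s(r)\, w(n-r)\, e^{j\omega r}\right]$$

$$\phantom{X_n(e^{j\omega})} = \sum_{r=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} w(n-m)\, s(m)\, w(n-r)\, s(r)\, e^{-j\omega(m-r)}.$$

Let $r = k + m$; then

$$X_n(e^{j\omega}) = \sum_{k=-\infty}^{\infty} \left[\sum_{m=-\infty}^{\infty} w(n-m)\, s(m)\, w(n-k-m)\, s(m+k)\right] e^{j\omega k}$$

$$\phantom{X_n(e^{j\omega})} = \sum_{k=-\infty}^{\infty} R_n(k)\, e^{j\omega k} = \sum_{k=-\infty}^{\infty} R_n(k)\, e^{-j\omega k}$$

(since $R_n(k) = R_n(-k)$); therefore

$$X_n(e^{j\omega}) \longleftrightarrow R_n(k).$$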
`
3.2.2.4 FFT Implementation of Uniform Filter Bank Based on the Short-Time
Fourier Transform

We now return to the question of how to efficiently implement the computation of the set
of filter-bank outputs (Eq. (3.15)) for the uniform filter bank. If we assume, reasonably,
that we are interested in a uniform frequency spacing, that is, if

$$f_i = i\,(F_s/N), \qquad i = 0, 1, \ldots, N-1, \tag{3.21}$$

then Eq. (3.15a) can be written as

$$x_i(n) = e^{j(2\pi/N) i n} \sum_m s(m)\, w(n-m)\, e^{-j(2\pi/N) i m}. \tag{3.22}$$

Now consider breaking up the summation over $m$ into a double summation over $r$ and $k$, in
which

$$m = Nr + k, \qquad 0 \le k \le N-1, \qquad -\infty < r < \infty. \tag{3.23}$$

In other words, we break up the computation over $m$ into pieces of size $N$. If we let

$$s_n(m) = s(m)\, w(n-m), \tag{3.24}$$

then Eq. (3.22) can be written as

$$x_i(n) = e^{j(2\pi/N) i n} \sum_r \sum_{k=0}^{N-1} s_n(Nr+k)\, e^{-j(2\pi/N) i (Nr+k)}. \tag{3.25}$$

Since $e^{-j 2\pi i r} = 1$ for all $i$, $r$, then

$$x_i(n) = e^{j(2\pi/N) i n} \sum_{k=0}^{N-1} \left[\sum_r s_n(Nr+k)\right] e^{-j(2\pi/N) i k}. \tag{3.26}$$

If we define

$$u_n(k) = \sum_r s_n(Nr+k), \qquad 0 \le k \le N-1, \tag{3.27}$$

we wind up with

$$x_i(n) = e^{j(2\pi/N) i n} \left[\sum_{k=0}^{N-1} u_n(k)\, e^{-j(2\pi/N) i k}\right], \tag{3.28}$$

which is the desired result; that is, $x_i(n)$ is a modulated $N$-point DFT of the sequence $u_n(k)$.
Thus the basic steps in the computation of a uniform filter bank via FFT methods are
as follows:

1. Form the windowed signal $s_n(m) = s(m)w(n - m)$, $m = n - L + 1, \ldots, n$, where
$w(n)$ is a causal, finite window of duration $L$ samples. Figure 3.16a illustrates this
step.
2. Form $u_n(k) = \sum_r s_n(Nr + k)$, $0 \le k \le N - 1$. That is, break the signal $s_n(m)$ into
pieces of size $N$ samples and add up the pieces (alias them back onto each other) to give a
signal of size $N$ samples. Figures 3.16b and c illustrate this step for the case in which
$L \gg N$.
3. Take the $N$-point DFT of $u_n(k)$.
4. Modulate the DFT by the sequence $e^{j(2\pi/N) i n}$.
`
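The modulation step 4 can be avoided by circularly shifting the sequence, $u_n(k)$, by $n \oplus N$
samples (where $\oplus$ is the modulo operation) prior to the DFT computation. By the DFT shift
theorem, the shift that produces the factor $e^{j(2\pi/N) i n}$ of Eq. (3.28) is $u_n((k + n))_N$,
$0 \le k \le N - 1$.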
The computation to implement the uniform filter bank via Eq. (3.28) is essentially

$$C_{\mathrm{FBFFT}} \approx 2N \log_2 N \quad (\text{multiplications, additions}). \tag{3.29}$$

Consider now the ratio, $R$, between the computation for the direct form implementation of
a uniform filter bank (Eq. (3.13)) and the FFT implementation (Eq. (3.29)), such that

$$R = \frac{C_{\mathrm{DFFIR}}}{C_{\mathrm{FBFFT}}} = \frac{LQ}{2N \log_2 N}. \tag{3.30}$$

If we assume $N = 32$ (i.e., a 16-channel filter bank), with $L = 128$ (i.e., a 12.8 msec impulse
response filter at a 10-kHz sampling rate), and $Q = 16$ channels, we get

$$R = \frac{128 \cdot 16}{2 \cdot 32 \cdot 5} = 6.4.$$

The FFT implementation is about 6.4 times more efficient than the direct form structure.
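The four-step FFT procedure can be sketched and checked against the direct form sum. This
is a minimal illustration, assuming NumPy, a causal window of length L, and a time index
n of at least L - 1 so that the whole window lies inside the signal; the function names
`filter_bank_fft` and `filter_bank_direct` are hypothetical.

```python
import numpy as np

def filter_bank_fft(s, w, n, N):
    """All N channel outputs x_i(n) at one time index n, following Eq. (3.28)."""
    L = len(w)
    m = np.arange(n - L + 1, n + 1)               # support of w(n - m); needs n >= L - 1
    s_n = s[m] * w[n - m]                         # step 1: windowed signal s_n(m)
    u = np.zeros(N, dtype=complex)
    np.add.at(u, m % N, s_n)                      # step 2: alias into N samples, u_n(k)
    U = np.fft.fft(u)                             # step 3: N-point DFT
    i = np.arange(N)
    return np.exp(2j * np.pi * i * n / N) * U     # step 4: modulation

def filter_bank_direct(s, w, n, N):
    """Same outputs from the direct form sum with h_i(m) = w(m) exp(j*2*pi*i*m/N)."""
    mp = np.arange(len(w))
    return np.array([np.sum(w * np.exp(2j * np.pi * i * mp / N) * s[n - mp])
                     for i in range(N)])

rng = np.random.default_rng(0)
s_demo = rng.standard_normal(400)                 # stand-in for a speech signal
w_demo = np.hamming(128)                          # L = 128
N_demo, n_demo = 32, 300

x_fft = filter_bank_fft(s_demo, w_demo, n_demo, N_demo)
x_direct = filter_bank_direct(s_demo, w_demo, n_demo, N_demo)

# Operation-count ratio of Eq. (3.30) for L = 128, Q = 16, N = 32
ratio = (128 * 16) / (2 * 32 * np.log2(32))
```

In this sketch the explicit modulation of step 4 could equally be replaced by a circular
shift of the aliased sequence before the DFT, for example `np.fft.fft(np.roll(u, -n))`,
which indexes the sequence at (k + n) modulo N.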
`
[Figure 3.16 FFT implementation of a uniform filter bank: (a) the windowed signal $s_n(m)$
for $m = n - L + 1, \ldots, n$; (b) and (c) the aliasing of $s_n(m)$ into the sequence $u_n(k)$,
$k = 0, \ldots, N - 1$.]

[Figure 3.17 Direct form implementation of an arbitrary nonuniform filter bank: $s(n)$ is
applied in parallel to the filters $h_1(n), h_2(n), \ldots, h_Q(n)$.]
`
3.2.2.5 Nonuniform FIR Filter Bank Implementations

The most general form of a nonuniform FIR filter bank is shown in Figure 3.17, where the
$k$th bandpass filter impulse response, $h_k(n)$, represents a filter with center frequency $\omega_k$ and
bandwidth $\Delta\omega_k$. The set of $Q$ bandpass filters is intended to cover the frequency range of
interest for the intended speech-processing application.
In its most general form, each bandpass filter is implemented via a direct convolution;
that is, no efficient FFT structure can be used. In the case where each bandpass filter is
designed via the windowing design method (Ref. [1]), using the same lowpass window, we
can show that the composite frequency response of the $Q$-channel filter bank is independent
of the number and distribution of the individual filters. Thus a filter bank with the three
filters shown in Figure 3.18a has the exact same composite frequency response as the filter
bank with the seven filters shown in Figure 3.18b.

[Figure 3.18 Two arbitrary nonuniform filter-bank ideal filter specifications consisting of
either 3 bands (part a) or 7 bands (part b).]

To show this we denote the impulse response of the $k$th bandpass filter as

$$h_k(n) = w(n)\, \hat{h}_k(n), \tag{3.31}$$

where $w(n)$ is the FIR window, and $\hat{h}_k(n)$ is the impulse response of the ideal bandpass filter
being designed. The frequency response of the $k$th bandpass filter, $H_k(e^{j\omega})$, can be written
as

$$H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}_k(e^{j\omega}). \tag{3.32}$$

Thus the frequency response of the composite filter bank, $H(e^{j\omega})$, can be written as

$$H(e^{j\omega}) = \sum_{k=1}^{Q} H_k(e^{j\omega}) = \sum_{k=1}^{Q} W(e^{j\omega}) \circledast \hat{H}_k(e^{j\omega}). \tag{3.33}$$

By interchanging the summation and the convolution we get

$$H(e^{j\omega}) = W(e^{j\omega}) \circledast \sum_{k=1}^{Q} \hat{H}_k(e^{j\omega}). \tag{3.34}$$

By realizing that the summation of Eq. (3.34) is the summation of ideal frequency responses,
we see that it is independent of the number and distribution of the individual filters. Thus
we can write the summation as

$$\hat{H}(e^{j\omega}) = \sum_{k=1}^{Q} \hat{H}_k(e^{j\omega}) = \begin{cases} 1, & \omega_{\min} \le \omega \le \omega_{\max} \\ 0, & \text{otherwise,} \end{cases} \tag{3.35}$$

where $\omega_{\min}$ is the lowest frequency in the filter bank, and $\omega_{\max}$ is the highest frequency.
Then Eq. (3.34) can be expressed as

$$H(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}(e^{j\omega}), \tag{3.36}$$

independent of the number of ideal filters, $Q$, and their distribution in frequency, which is
the desired result.
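The independence result can be illustrated in the time domain, where the composite impulse
response is the window times the sum of the ideal bandpass responses. The sketch below
assumes NumPy, a 65-point Hamming window, and two made-up partitions of the band from
0.1π to 0.9π into 3 and 7 contiguous ideal filters; the function names are hypothetical.

```python
import numpy as np

def ideal_bandpass(n, w_lo, w_hi):
    """Ideal bandpass impulse response for w_lo <= |w| <= w_hi (rad/sample)."""
    n = np.asarray(n, dtype=float)
    # sin(w n) / (pi n), written with np.sinc so that n = 0 is handled safely
    return ((w_hi / np.pi) * np.sinc(w_hi * n / np.pi)
            - (w_lo / np.pi) * np.sinc(w_lo * n / np.pi))

def composite_impulse_response(edges, window):
    """Sum over bands of window(n) * ideal bandpass response, as in Eq. (3.31)."""
    L = len(window)
    n = np.arange(L) - (L - 1) / 2                # center the ideal responses
    return sum(window * ideal_bandpass(n, lo, hi)
               for lo, hi in zip(edges[:-1], edges[1:]))

window = np.hamming(65)
edges_3 = np.array([0.1, 0.3, 0.6, 0.9]) * np.pi  # three unequal bands
edges_7 = np.linspace(0.1, 0.9, 8) * np.pi        # seven equal bands, same span

h_3 = composite_impulse_response(edges_3, window)
h_7 = composite_impulse_response(edges_7, window)
same_composite = np.allclose(h_3, h_7)
```

Because the ideal responses of adjacent bands telescope, both partitions reduce to the
window applied to a single ideal filter spanning the lowest to the highest band edge.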
`
3.2.2.6 FFT-Based Nonuniform Filter Banks

One possible way to exploit the FFT structure for implementing uniform filter banks
discussed earlier is to design a large uniform filter bank (e.g., $N = 128$ or 256 channels)
and then create the nonuniformity by combining two or more uniform channels. This
technique of combining channels is readily shown to be equivalent to applying a modified
analysis window to the sequence prior to the FFT. To see this, consider taking an $N$-point
DFT of the sequence $x(n)$ (derived from the speech signal, $s(n)$, by windowing by $w(n)$).
Thus we get

$$X_k = \sum_{n=0}^{N-1} x(n)\, e^{-j(2\pi/N) n k}, \qquad 0 \le k \le N-1, \tag{3.37}$$

as the set of DFT values. If we consider adding DFT outputs $X_k$ and $X_{k+1}$, we get

$$X_k + X_{k+1} = \sum_{n=0}^{N-1} x(n) \left(e^{-j(2\pi/N) n k} + e^{-j(2\pi/N) n (k+1)}\right), \tag{3.38}$$

which can be written as

$$X'_k = X_k + X_{k+1} = \sum_{n=0}^{N-1} \left[x(n)\, 2 e^{-j\pi n/N} \cos\left(\frac{\pi n}{N}\right)\right] e^{-j(2\pi/N) n k}, \tag{3.39}$$

i.e., the equivalent $k$th channel value, $X'_k$, could have been obtained by weighting the
sequence, $x(n)$, in time, by the complex sequence $2 e^{-j\pi n/N} \cos(\pi n/N)$. If more than two
channels are combined, then a different equivalent weighting sequence results. Thus FFT
channel combining is essentially a "quick and dirty" method of designing broader bandpass
filters and is a simple and effective way of realizing certain types of nonuniform filter bank
analysis structures.
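The channel-combining identity of Eq. (3.39) can be confirmed with a few lines of NumPy.
This is a sketch with hypothetical function names and an arbitrary random test sequence;
it compares the sum of two adjacent DFT outputs with a single DFT coefficient of the
reweighted sequence.

```python
import numpy as np

def combine_adjacent(x, k):
    """X_k + X_(k+1) computed from the plain N-point DFT, Eq. (3.38)."""
    X = np.fft.fft(x)
    return X[k] + X[(k + 1) % len(x)]

def weighted_channel(x, k):
    """Same value from x(n) weighted by 2 exp(-j*pi*n/N) cos(pi*n/N), Eq. (3.39)."""
    N = len(x)
    n = np.arange(N)
    weight = 2 * np.exp(-1j * np.pi * n / N) * np.cos(np.pi * n / N)
    return np.sum(x * weight * np.exp(-2j * np.pi * n * k / N))

rng = np.random.default_rng(1)
x_demo = rng.standard_normal(128) * np.hamming(128)   # stand-in windowed frame
pairs_match = all(np.isclose(combine_adjacent(x_demo, k), weighted_channel(x_demo, k))
                  for k in range(127))
```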
`
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks

A third method used to implement certain types of nonuniform filter banks is the tree
structure in which the speech signal is filtered in stages, and the sampling rate is successively
reduced at each stage for efficiency of implementation. An example of such a realization
is given in Figure 3.19a for the 4-band, octave-spaced filter bank shown (ideally) in
Figure 3.19b. The original speech signal, $s(n)$, is filtered initially into two bands, a low
band and a high band, using quadrature mirror filters (QMFs), i.e., filters whose frequency
responses are complementary. The high band, which covers half the spectrum, is reduced
in sampling rate by a factor of 2, and represents the highest octave band ($\pi/2 \le \omega \le \pi$) of
`
[Figure 3.19 Tree structure implementati... Panel (a): $s(n)$ is split by lowpass (LP) and
highpass (HP) filter stages, each followed by a 2-to-1 sampling rate reduction, giving the
outputs $x_1(m)$ through $x_4(m)$. Panel (b): bands 1 to 4 on a frequency axis marked at
0, $\pi/8$, $\pi/4$, $\pi/2$, and $\pi$.]
