`
`Petitioner Apple Inc.
`Petitioner Apple Inc.
`Ex. 1012, Cover
`
`
`
`Discrete-Time
`Processing of
`Speech Signals_
`
`John R. Deller, Jr. Michigan State University
`John G. Proakis Northeastern University
`John H. L. Hansen Duke university
`
`Macmillan Publishing Company
`New York
`
`Maxwell Macmillan Canada
`Toronto
`
`Maxwell Macmillan International
`New York Oxford Singapore Sydney
`
`Petitioner Apple Inc.
`Ex. 1012, Cover-2
`
`
`
`Editor: John Griffin
`Production Supervisor: Elaine W. Wetterau
`Production Manager: Roger Vergnes
`Text Designer: Natasha Sylvester
`Cover Designer: Cathleen Norz
`Illustrations: Academy ArtWorks, Inc.
`This book was set in Times Roman by Graphic Sciences Corporation,
`printed and bound by Book Press.
`
`Copyright © 1993 by Macmillan Publishing Company, a division of Macmillan, Inc.
`
`Printed in the United States of America
`All rights reserved. No part of this book may be reproduced or
`transmitted in any form or by any means, electronic or mechanical,
`including photocopying, recording, or any information storage and
`retrieval system, without permission in writing from the publisher.
`
`Macmillan Publishing Company
`113 Sylvan Avenue, Englewood Cliffs, NJ 07632
`
`Library of Congress Cataloging in Publication Data
`
`Deller, John R.
`Discrete-time processing of speech signals I John R. Deller, John
`G. Proakis, John H. L. Hansen.
`p. em.
`Includes index.
`ISBN 0-02-328301-7
`1. Speech processing systems. 2. Signal processing-Digital
`techniques. 3. Discrete-time systems.
`I. Proakis, John G.
`II. Hansen, John H. L.
`III. Title.
`TK7882.S65D44 1993
`621.382'2-dc20
`
`Printing:
`
`2 3 4 5 6 7 8
`
`Year:
`
`5 6 7 8 9 0 1 2
`
`92-38644
`CIP
`
`Petitioner Apple Inc.
`Ex. 1012, Cover-3
`
`
`
`~ER ]1_--------------------------------------~
`
`Propaedeutic
`
`Read.Me: If you are someone who never reads Chapter 1, please at least
`read Sections 1.0.2 and 1.0.3 before proceeding!
`
`1.0 Preamble
`
`1.0.1 The Purpose of Chapter 1
`If the reader learns nothing more from this book, it is a safe bet that
`he or she will learn a new word. A propaedeutic1 is a "preliminary body
`of knowledge and rules necessary for the study of some art or science"
`(Barnhart, 1964). This chapter is just that-a propaedeutic for the study
`of speech processing focusing primarily on two broad areas, digital signal
`processing (DSP) and stochastic processes, and also on some necessary
`topics from the fields of statistical pattern recognition and information
`theory.
`The reader of this book is assumed to have a sound background in the
`first two of these areas, typical of an entry level graduate course in each
`field. It is not our purpose to comprehensively teach DSP and random
`processes, and the brief presentation here is not intended to provide an
`adequate background. There are many fine textbooks to which the reader
`might refer to review and reinforce prerequisite topics for these subjects.
`We list a considerable number of widely used books in Appendices 1.A
`and 1.B.
`What, then, is the point of our propaedeutic? The remainder of this
`chapter is divided into four main sections plus one small section, and the
`tutorial goals are somewhat different in each. Let us first consider the
`two main sections on DSP and stochastic processes. In the authors' expe(cid:173)
`rience, the speech processing student is somewhat more comfortable with
`"deterministic" DSP topics than with random processes. What we will do
`in Section 1.1, which focuses on DSP, therefore, is highlight some of the
`key concepts which will play central roles in our speech processing work.
`Where the material seems unfamiliar, the reader is urged to seek help in
`
`1 Pronounced "pro' -pa-doo' -tic."
`
`3
`
`Petitioner Apple Inc.
`Ex. 1012, p. 3
`
`
`
`4 Ch. 1 I Propaedeutic
`
`one or more of the DSP textbooks cited in Appendix l.A. Our main ob(cid:173)
`jective is to briefly outline the essential DSP topics with a particular in(cid:173)
`terest defining notation that will be used consistently throughout the
`book. A second objective is to cover a few subtler concepts that will be
`important in this book, and that might have been missed in the reader's
`first exposure to DSP.
`The goals of Section 1.2 on random processes are somewhat differ(cid:173)
`ent. We will introduce some fundamental concepts with a bit more for(cid:173)
`mality, uniformity, and detail than the DSP material. This treatment
`might at first seem unnecessarily detailed for a textbook on speech pro(cid:173)
`cessing. We do so, however, for several reasons. First, a clear understand(cid:173)
`ing of stochastic process concepts, which are so essential in speech
`processing, depends strongly on an understanding of the basic probability
`formalisms. Second, many engineering courses rely heavily on stochastic
`processes and not so much on the underlying probability concepts, so
`that the probability concepts become "rusty." Emerging technologies in
`speech processing depend on the basic probability theory and some re(cid:173)
`view of these ideas could prove useful. Third, it is true that the mastery
`of any subject requires several "passes" through the material, but engi(cid:173)
`neers often find this especially true of the field of probability and ran(cid:173)
`dom processes.
`The third and fourth major divisions of this chapter, Sections 1.3 and
`1.4, treat a few topics which are used in the vast fields of statistical pat(cid:173)
`tern recognition and information theory. In fact, we have included some
`topics in Section 1.3 which are perhaps more general than "pattern rec(cid:173)
`ognition" methods, but the rubric will suffice. These sections are con(cid:173)
`cerned with basic mathematical tools which will be used frequently, and
`in diverse ways in our study, beginning in Part IV of the book. There is
`no assumption that the reader has formal coursework in these topics be(cid:173)
`yond the normal acquaintance with them that would ordinarily be de(cid:173)
`rived from an engineering education. Therefore, the goal of these sections
`is to give an adequate description of a few important topics which will be
`critical to our speech work.
`Finally, Section 1.5 briefly reviews the essence and notation of phasors
`and steady-state analysis of systems described by differential equations.
`A firm grasp of this material will be necessary in our early work on ana(cid:173)
`log acoustic modeling of the speech production system in Chapter 3.
`As indicated above, the need for the subjects in Sections 1.3-1.5 is not
`immediate, so the reader might wish to scan over these sections, then re(cid:173)
`turn to them as needed. More guidance on reading strategy follows.
`
`1.0.2 Please Read This Note on Notation
`The principal tool of engineering is applied mathematics. The lan(cid:173)
`guage of mathematics is abstract symbolism. This book is written with a
`conviction that careful and consistent notation is a sign of clear under-
`
`Petitioner Apple Inc.
`Ex. 1012, p. 4
`
`
`
`1.0 1 Preamble 5
`
`standing, and clear understanding is derived by forcing oneself to com(cid:173)
`prehend and use such notation. Painstaking care has been taken in this
`book to use information-laden and consistent notation in keeping with
`this philosophy. When we err with notation, we err on the side of exces(cid:173)
`sive notation which is not always conventional, and not always necessary
`once the topic has been mastered. Therefore, the reader is invited (with
`your instructor's permission if you are taking a course!) to shorten or
`simplify the notation as the need for the "tutorial" notation subsides.
`Let us give some examples. We will later use an argument m to keep
`track of the point in time at which certain features are extracted from a
`speech signal. This argument is key to understanding the "short-term"
`nature of the processing of speech. The ith "linear prediction" coefficient
`computed on a "frame" of speech ending at time m will be denoted
`ii(i; m). In the development of an algorithm for computing the coeffi(cid:173)
`cients, forexample, the index m will not be very germane to the develop(cid:173)
`ment and the reader might wish to omit it once its significance is clear.
`Another example comes from the random process theory. Numerous ex(cid:173)
`amples of sloppy notation abound in probability theory, a likely reason
`why many engineers find this subject intractable. For example, some(cid:173)
`thing like "f(x)" is frequently used to denote the probability density func(cid:173)
`tion (pdf) for the random variable ~· There are numerous ways in which
`this notation can cause misunderstandings and even subtle mathematical
`traps which can lead to incorrect results. We will be careful in this text to
`delineate random processes, random variables, and values that may be
`assumed by a random variable. We will denote a random variable, for
`example, by underscoring the variable name, for example, x. The pdf for
`~ will be denoted fx(x), for example. The reader who has a clear under(cid:173)
`standing of the underlying concepts might choose to resort to some slop(cid:173)
`pier form of notation, but the reader who does not will benefit greatly by
`working to understanding the details of the notation.
`
`1.0.3 For People Who Never Read Chapter 1 (and Those Who Do)
`To be entitled to use the word "propaedeutic" at your next social en(cid:173)
`gagement, you must read at least some of this chapter. 2 If for no other
`reason than to become familiar with the notation, we urge you to at least
`generally review the topics here before proceeding. However, there is a
`large amount of material in this chapter, and some people will naturally
`prefer to review these topics on an "as needed" basis. For that reason, we
`provide the following guide to the use of Chapter 1.
`With a few exceptions, most of the topics in Sections 1.1 and 1.2 will
`be widely used throughout the book and we recommend their review be(cid:173)
`fore proceeding. The one exception is the subsection on "State Space Re-
`
`21f you have skipped the first part of this chapter, you will be using the word without
`even knowing what it means.
`
`Petitioner Apple Inc.
`Ex. 1012, p. 5
`
`
`
`6 Ch. 1 1 Propaedeutic
`
`alizations" in Section 1.1.6, which will be used in a limited way in
`Chapters 5 and 12. The topics in Sections 1.3 and 1.4, however, are
`mostly specialized subjects which will be used in particular aspects of our
`study, beginning in Part IV of the book. Likewise the topic in Section 1.5
`is used in one isolated, but important, body of material in Chapter 3.
`These latter topics and the "state space" topic in the earlier section
`will be "flagged" in Reading Notes at the beginning of relevant chapters,
`and in other appropriate places in the book.
`
`1.1 Review of DSP Concepts and Notation
`
`1.1.1 "Normalized Time and Frequency"
`Throughout the book, we will implicitly use what we might call nor(cid:173)
`malized time and frequency variables. By this we mean that a discrete
`time signal (usually speech), say s(n), will be indexed by integers only. 3
`Whereas s(n) invariably represents samples of an analog waveform, say
`sa<t), at some sample period, T,
`
`( 1.1)
`
`n = ... , -1, 0, 1, 2, ... ,
`the integer n indexes the sample number, but we have lost the absolute
`time orientation in the argument. To recover the times at which the sam(cid:173)
`ples are taken, we simply need to know T.
`To understand the "physical" significance of this mathematical con(cid:173)
`vention, it is sometimes convenient to imagine that we have scaled the
`real-world time axis by a factor of T prior to taking the samples, as illus(cid:173)
`trated in Fig. 1.1. "Normalized time," say t', is related to real time as
`t' =.!... T
`and the samples of speech are taken at intervals which are exactly4 "nor(cid:173)
`malized seconds (norm-sec)." In most cases it is perfectly sufficient to
`refer to the interval between samples as the "sample period," where the
`conversion to the real-world interval is obvious. However, on a few occa(cid:173)
`sions we will have more than one sampling process occurring in the same
`problem (i.e., a resampling of the speech sequence), and in these in(cid:173)
`stances the concept of a "normalized second" is useful to refer to the
`basic sampling interval on the data.
`Of course, the normalization of time renders certain frequency quanti(cid:173)
`ties invariant. The sample period in normalized time is always unity, and
`therefore the sample frequency is always unity [dimensionless, but some-
`
`( 1.2)
`
`3Note that we have referred to s(n) as "discrete time" rather than "digital." Throughout
`most of this book, we will ignore any quantization of amplitude.
`4The reader should note that the normalized time axis is actually dimensionless.
`
`Petitioner Apple Inc.
`Ex. 1012, p. 6
`
`
`
`1.1 f Review of DSP Concepts and Notation 7
`
`Speech waveform
`
`01 2 3
`
`9
`8
`4 56 7
`"Normalized" time, t' (norm-sec)
`
`18
`
`0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
`Real time, t (msec)
`
`FIGURE 1.1. Segment of a speech waveform used to illustrate the concept
`of "normalized time." Suppose that samples are to be taken at a rate F. =
`1 0 kHz so that the sample period is T = 0.1 msec. The lower time axis
`represents real time measured in milliseconds, while the upper represents a
`normalization of the time axis such that the sample times fall at integers.
`Normalized time, t', is related to real time, t, as t' = tfT. We will on a few
`occasions refer to the sample period in the scaled case as a "normalized
`second (norm-sec)."
`
`times "normalized Hertz (norm-Hz)"], and the sample radian frequency
`is always 2n [dimensionless or "normalized radians per second (norm(cid:173)
`rps)"]. Accordingly, the Nyquist frequency is always 0.5 norm-Hz, or n
`norm-rps. In general, the conversions between "real" frequencies, say F
`(Hz) and n (rps) and their normalized counterparts, say f and w, are
`given by
`
`f=FT
`w=QT.
`
`( 1.3)
`
`( 1.4)
`
`We can easily verify this by examining a single sinusoid at real frequency
`n,
`
`xa(t) =A sin(Ot + q7) =A sin(QT; + q7).
`
`( 1.5)
`
`The rightmost term can be regarded as a sinusoid at a different fre(cid:173)
`quency, w = QT, on a different time axis t' = t /T,
`x~(t') =A sin(wt' + q7).
`
`(1.6)
`
`Petitioner Apple Inc.
`Ex. 1012, p. 7
`
`
`
`8 Ch. 1 1 Propaedeutic
`
`Clearly, we obtain the same samples if we sample xa(t) at t = nT, or
`x~(t') at t' = n. A magnitude spectrum is plotted against real and nor(cid:173)
`malized frequencies in Fig. 1.2 to illustrate this concept.
`In spite of this rather lengthy explanation of "normalized time and
`frequency," we do not want to overemphasize this issue. Simply stated,
`we will find it convenient to index speech waveforms by integers (espe(cid:173)
`cially in theoretical developments). This is an accepted convention in
`. DSP. The point of the above is to remind the reader that the resulting
`"normalized" time and frequency domains are simply related to the
`"real" time and frequency. When the "normalized" quantities need to be
`converted to "real" quantities, this is very easily accomplished if the
`sample frequency or period is known. While using the DSP convention
`for convenience in many discussions, we will always keep the sampling
`information close at hand so the "real" quantities are known. To do oth(cid:173)
`erwise would be to deny the physical nature of the process with which we
`are working. In many instances in which practical systems and applica(cid:173)
`tions are being discussed, it will make perfect sense· to simply work in
`terms of "real" quantities.
`
`60
`
`a:) .,
`"""
`"--,
`~
`v "0
`
`3 ·= OJJ "' E
`
`-;;;
`b
`u
`<()
`0..
`CI"J
`
`00
`
`0.1
`
`0.3
`0.2
`0.4
`"Normalized" frequency, f(norm-Hz)
`
`0.5
`
`4
`
`5
`
`0
`
`2
`3
`Frequency, F (kHz)
`FIGURE 1.2. Magnitude spectrum of a typical speech waveform. This
`spectrum is based on the DFT of samples of the waveform taken at
`1 0 kHz. The lower frequency axis represents real frequencies measured in
`kHz, while the upper represents a normalization of the frequency axis
`concomitant to the time normalization. Normalized frequency, f, is related to
`real frequency, F, as f= FT, where Tis the nominal sample period in a given
`analysis. Accordingly, the "normalized" sampling, say t •• and Nyquist, fN,
`frequencies are invariant with the sample rate, with t. = 1 and fN = 1/2. We
`will sometimes refer to the units of normalized frequencies as "normalized
`Hertz (norm-Hz)" or "normalized radians per second (norm-rps)."
`
`Petitioner Apple Inc.
`Ex. 1012, p. 8
`
`
`
`1.1 1 Review of DSP Concepts and Notation 9
`
`1.1.2 Singularity Signals
`In the continuous time domain, a singularity signal is one for which
`one or more derivatives do not exist at one or more time points. Al(cid:173)
`though the concept of a derivative is no longer meaningful in discrete
`time, we often borrow the term singularity to describe analogous se(cid:173)
`quences in sampled time. The two for which we will have the most use
`are
`
`• The unit sample sequence or discrete-time impulse, defined by
`if n = 0
`
`J(n) d,;,f
`
`1'
`
`{
`
`0,
`
`otherwise
`
`( 1. 7)
`
`• The unit step sequence, defined by 5
`
`u(n) d~r
`
`1'
`
`0,
`
`{
`
`if n > 0
`
`otherwise
`
`(1.8)
`
`The unit step sequence is much more analogous to its analog counter(cid:173)
`part than the discrete-time impulse in the sense that u(n) simply repre(cid:173)
`sents samples of the analog unit step function (discounting problems that
`may arise due to different definitions at time zero). On the other hand,
`recall that the analog (Dirac) impulse function, say Ja(t), is defined such
`that it apparently has infinite height, zero width, and unity area. Al(cid:173)
`though the discrete-time impulse plays an analogous role to that played
`by the analog impulse, it may not be interpreted as its samples.
`
`1.1.3 Energy and Power Signals
`There are many ways in which a discrete time signal can be classified.
`One useful grouping is into the categories energy signal, power signal, or
`neither. Recall that the energy of a discrete time signal is defined as 6
`Ex d~r L jx(n)j2.
`
`cc
`
`n=-co
`
`( 1.9)
`
`A signal x(n) IS called an energy signal if
`
`O<Ex<oo.
`
`( 1.1 0)
`
`The power in a discrete-time sequence is
`
`5The notation u(n) is widely used to indicate the unit step sequence, but u(n) will also
`refer to a very important waveform, the ~glottal volume velocity," throughout the book. Be(cid:173)
`cause of the context, there will be no risk of confusion.
`6The absolute value signs are included because, in general, x(n) is a complex sequence.
`
`Petitioner Apple Inc.
`Ex. 1012, p. 9
`
`
`
`10 Ch. 1 1 Propaedeutic
`
`P
`x
`
`ct~r lim
`N-co
`
`1
`
`N
`
`2N + 1 L lx(n)l2·
`
`n=-N
`
`(1.11)
`
`A power signal has finite but nonzero power,
`0 < Px <OJ.
`A signal cannot be both a power signal and an energy signal simultane(cid:173)
`ously, since if Ex < ro, then Px = 0. A signal can, however, be neither
`when Px = ro or Ex= 0.
`For our purposes in speech processing, it is sufficient to associate the
`energy category with two broad classes of signals. These are
`
`(1.12)
`
`• Transients, those which decay (usually exponentially) with time. Ex(cid:173)
`amples are
`
`lal< 1
`
`( 1.13)
`
`(1.14)
`/a/< 1.
`• Finite sequences, those which are zero outside a finite time duration.
`An example is
`IPI < ro.
`x 3(n) = ePn[ u(n + 3)- u(n- 246)],
`Whereas the energy signals either decay out sufficiently fast or "stop"
`completely, the power signals neither decay nor increase in their enve(cid:173)
`lopes. The power signals can be associated with three broad classes of
`signals. These are
`
`(1.15)
`
`• Constant signals. An example is
`
`-ro < a < ro.
`( 1.16)
`• Periodic signals, those for which x(n) = x(n + N) for some finite N
`and for all n. Examples are
`
`-ro <a< ro
`x6(n) = [x3(n)Jmodulo 512 = I x3<n + i512).
`
`00
`
`i=-co
`
`( 1.17)
`
`( 1.18)
`
`• Realizations of stationary, ergodic stochastic processes (see Section
`1.2.3).
`
`The signals which fall into neither category are the trivial zero signal
`and those which "blow up" with time. Examples of the latter are x 1(n)
`and x 2(n) above with the magnitude of a taken to be greater than unity.
`
`1.1.4 Transforms and a Few Related Concepts
`At the heart of much of engineering analysis are various frequency do(cid:173)
`main transforms. Three transforms on discrete-time data will be used ex-
`
`Petitioner Apple Inc.
`Ex. 1012, p. 10
`
`
`
`1.1 1 Review of DSP Concepts and Notation 11
`
`tensively throughout this book, and it will be assumed that the reader is
`familiar with their properties and usage.
`The first is the discrete-time Fourier transform (DTFT), which, for the
`sequence x(n), is defined by
`X(w) = I x(n)e-jwn.
`
`( 1.19)
`
`00
`
`The inverse DTFT (IDTFT) is given by
`
`n=-oo
`
`fTC
`X(w)ejwn dw.
`-n
`
`( 1.20)
`
`1
`x(n) = -
`2
`rr
`The DTFT bears a useful relationship to the continuous-time Fourier
`transform in the case in which x(n) represents samples of the analog sig(cid:173)
`naF xa(t'). In this case X(w) will be a periodic (with period 2rr), poten(cid:173)
`tially aliased version of Xa(w),
`X(w) = I Xa(w- 2rri).
`
`00
`
`i=-co
`
`( 1.21)
`
`The existence of the DTFT is not a trivial subject, and we will review
`only a few important details. A sufficient condition for the DTFT of a
`sequence x(n) to exist is that the sequence be absolutely summable,
`00 I \x(n)\ <co.
`
`( 1.22)
`
`n=-co
`
`This follows immediately from ( 1.19). Moreover, absolute summability of
`x(n) is tantamount to absolute convergence of the series I:-oo x(n)e-jwn
`implying that this series converges uniformly to a continuous function of
`w (Churchill, 1960, Sees. 59 and 60). A sequence that is absolutely sum(cid:173)
`mabie will necessarily be an energy signal, since
`
`( 1.23)
`
`There are, however, energy signals that are not absolutely summable (see
`Problem 1.2). These energy signals will still have DTFTs, but ones whose
`series converge in a weaker (mean square) sense. This can be seen by
`viewing ( 1.19) as a conventional Fourier series for the periodic function
`X(w) whose coefficients are x(n). One of the properties of Fourier series
`is that if the energy in a single period of the fur..ction is finite, then the
`
`7Note the use of "normalized time" here. If "real" time is used, (1.21) becomes
`
`Petitioner Apple Inc.
`Ex. 1012, p. 11
`
`
`
`12 Ch. 1 1 Propaedeutic
`
`series will converge in mean square (Churchill, 1963). In the present case
`(using the Parseval relation),
`jX(w)j 2 dw = 2rc I x 2(n) = 2rcEx < ro,
`
`f 1t
`
`-1t
`
`00
`
`n=-oo
`
`(1.24)
`
`so the OTFT will converge in the mean square sense. Practically, this
`means that the OTFT sum will converge to X(w) at all points of continu(cid:173)
`ity, and at points of discontinuity it will converge to the "average" value
`("halfway" between the values on either side of the discontinuity).
`Properties of the OTFT are detailed in the textbooks cited in Appen(cid:173)
`dix l.A, and some are reviewed in Problem 1.3. The reader should also
`recall the numerous symmetry properties of the transform relation which
`can be useful in simplifying various computations and algorithms.
`Whereas the OTFT is most useful for theoretical spectral analysis, it is
`not computable on a digital computer because it is a function of a con(cid:173)
`tinuous argument. In principle, it also works with a sequence of doubly
`infinite length, which also precludes any practical computation. If we re(cid:173)
`strict ourselves to the practical situation in which a sequence of finite
`length is being studied, then the discrete Fourier transform (OFT) pro(cid:173)
`vides a mapping between the sequence, say
`
`x(n), n=O,l,2, ... ,N-1
`
`( 1.25)
`
`and a discrete set of frequency domain samples, given by
`
`X(k) =
`
`{
`
`N-1
`'\:"' X(n)e-;(2rr/N)kn
`L
`'
`
`n~o
`
`k = 0, I, ... , N- l
`
`0,
`
`other k
`
`The Inverse DFT (IDFT) is given by
`_!_ I X(k)e)(2n/N)kn
`
`N-1
`
`x(n) = N k~o
`{
`0,
`
`'
`
`n == 0, l, ... , N- l
`
`other n
`
`( 1.26)
`
`( 1.27)
`
`The OFT represents exact samples of the OTFT of the finite sequence
`x(n) at N equally spaced frequencies, wk = (2rc/N)k, for k E [O,N- 1].
`The discrete Fourier series (OFS) is closely related to the OFT compu(cid:173)
`tationally, but is quite different philosophically. The OFS is used to rep(cid:173)
`resent a periodic sequence (hence a power signal) with period, say N,
`using the set of basis functions eJ<Zn/N)kn for k = 0, ... , N- l. These rep(cid:173)
`resent the N harmonic frequencies that may be present in the signal. For
`a periodic sequence y(n), the expansion is
`y(n) = I C(k)eJ(2n/N)kn,
`
`( 1.28)
`
`N-1
`
`k~O
`
`Petitioner Apple Inc.
`Ex. 1012, p. 12
`
`
`
`1.1 1 Review of DSP Concepts and Notation 13
`
`where the coefficients are computed as
`
`C(k) = k LY(n)e-J(27r/N)kn.
`
`N-I
`
`n~o
`
`( 1.29)
`
`[In principle, the C(k)'s may be computed over any period of y(n).]
`It is occasionally convenient to use an "engineering DTFT" for a peri(cid:173)
`odic signal that technically has no DTFT. The contrived DTFT com(cid:173)
`posed of analog impulse functions at the harmonic frequencies weighted
`by the DFS coefficients is
`
`(1.30)
`
`Such a construction is not always palatable to a mathematician, but it
`works for most engineering purposes in the sense that it can be used any(cid:173)
`where that a DTFI is needed for y(n), as long as the rules for continuous(cid:173)
`time impulse functions are carefully followed. Note that this DTFT cor(cid:173)
`rectly asserts, for example, that y(n) has infinite energy at the harmonic
`frequencies. Consistency with conventional Fourier transform computa(cid:173)
`tions is obtained by defining the magnitude spectrum of such a DTFT by
`
`One more contrived quantity is sometimes used. The power density spec(cid:173)
`trum (PDS) for a periodic signal y(n), say ~,(w), is a real-valued function
`of frequency such that the average power in y(n) on the frequency range
`WI to w 2 with 0:::; wi < w 2 < 2n is given by
`
`(1.31)
`
`1 fw2
`average power in y(n) on wE [wl' w2 ] =-
`
`7{
`
`Wi
`
`~.(w) dw
`
`.
`
`(1.32)
`
`k2
`
`= 2 L I C(k)l2,
`
`k~kl
`
`where ki and k 2 represent the integer indices of the lowest and highest
`harmonic of y(n) in the specified range. 8 It is not difficult to show that a
`suitable definition is
`
`f- I k 12
`(
`2n)
`T;,(w)=2nk~oo C() oaw-kN.
`def
`
`(1.33)
`
`"If w 1 and hence k 1 are zero, then the lower expression in ( 1.32) should read
`k,
`C(O) + 2 .2:1C(k)l 2
`
`•
`
`k~l
`
`Petitioner Apple Inc.
`Ex. 1012, p. 13
`
`
`
`14 Ch. 1 J Propaedeutic
`
`[The reader can confirm that this is consistent with ( 1.32).] By comparing
`with ( 1.30), it is clear why some authors choose to write I Y(wW as a no(cid:173)
`tation for the PDS. The advantage of doing so is that it gives the con(cid:173)
`trived DTFT of (1.30) yet another "DTFT-like" property in the following
`sense: IX(w)j 2 is properly called the energy density spectrum for an energy
`signal x(n) and can be integrated over a specified frequency range to find
`total energy in that range. I Y(w)j 2 is thus an analogous notation for an
`analogous function for a power sequence. The disadvantage is that it in(cid:173)
`troduces more notation which can be easily confused with a more
`"valid" spectral quantity. We will therefore use only 1(w) to indicate the
`.
`y
`PDS of a periodic signal y(n).
`The similarity of the DFT to the DFS is apparent, and this similarity
`is consistent with our understanding that the IDFT, if used outside the
`range n E [0, N- 1], will produce a periodic replication of the finite
`sequence x(n). Related to this periodic nature of the DFT are the proper(cid:173)
`ties of "circular shift" and "circular convolution" of which the reader
`must beware in any application of this transform. A few of these notions
`are reviewed in Problem 1.4.
`For interpretive purposes, it will be useful for us to note the following.
`Although the DTFT does not exist for a periodic signal, we might con(cid:173)
`sider taking the limit9
`
`1
`.
`-
`y ( w) = J~ 2N + 1
`
`N L y(n)e-Jwn
`
`n~-N
`
`( 1.34)
`
`in the hope of making the transform converge. A moment's thought will
`reveal that this computation is equivalent to the same sum taken over a
`single period, say
`
`( 1.35)
`
`We shall refer to Y(w), 0 < w < 2n, as the complex envelope spectrum for
`a periodic signal y(n). The theoretical significance of the complex enve(cid:173)
`lope is that it can be sampled at the harmonic frequencies to obtain the
`DFS coefficients for the sequence. The reader will recall that a similar
`phenomenon occurs in the analog domain where the FT of one period of
`a periodic signal can be sampled at the harmonics to obtain the FS
`coefficients.
`Finally, with regard to Fourier techniques, recall that the fast Fourier
`transform (FFT) is a name collectively given to several classes of fast al(cid:173)
`gorithms for computing the DFT. The literature on this subject is vast,
`
`9The operator notation .L'{ · ) will be used consistently in the text to denote a time aver(cid:173)
`age of this form. We will formally define the operator when we discuss averages in Section
`1.2.3.
`
`Petitioner Apple Inc.
`Ex. 1012, p. 14
`
`
`
`1 .1 1 Review of DSP Concepts and Notation 15
`
`but the textbooks cited above provide a general overview of most of the
`fundamental treatments of the FFT. Some advanced topics are found in
`(Burris, 1988).
`The final transform that will be used extensively in this book is the
`(two-sided) z-transform (ZT), defined by
`X(z) = I x(n)z-n,
`
`00
`
`n=-co
`
`(1.36)
`
`where z is any complex number for which the sum exists, that is, for
`which
`
`00 I ix(n)zl-n <co.
`
`n=-co
`
`( 1.37)
`
`x(n) =
`
`2!1
`
`The values of z for which the series converges comprise the region of
`convergence (ROC) for the ZT. When the series converges, it converges
`absolutely (Churchill, 1960, Sec. 59), implying that the ZT converges uni(cid:173)
`formly as a function of z everywhere in the ROC. Depending on the time
`sequence, the ROC may be the interior of a circle, the exterior of a cir(cid:173)
`cle, or an annulus of the form rin < lzl <rout' where rin may be zero and
`rout may be infinite. The ROC is often critical in uniquely associating a
`time sequence with a ZT. For details see the textbooks in Appendix l.A.
`The ZT is formally inverted by contour integration,
`J. X(z)zn-l dz,
`J::
`where e is a counterclockwise contour through the ROC and encircling
`the origin in the z-plane, but several useful computational methods are
`well known, notably the partial fraction expansion method, and the resi(cid:173)
`due method.
`The ZT plays a similar role in DSP to that which the Laplace trans(cid:173)
`form does in continuous processing. A good speech processing engineer
`will learn to "read the z-plane" much the same as the analog systems en(cid:173)
`gineer uses the s-plane. In particular, the reader should be familiar with
`the correspondence between angles in the z-plane and frequencies, and
`between z-plane magnitudes and "damping." The interpretation of pole(cid:173)
`zero plots in the z-plane is also an essential tool for the speech processing
`engineer.
`Finally, we recall the relationships among the two Fourier transforms
`and the ZT. From the definitions, it is clear that
`DTFT X(w) = ZT X(eJw)
`
`( 1.38)
`
`( 1.39)
`
`for any w, so that the DTFT at frequency w is obtained by evaluating
`the ZT at angle w on the unit circle in the z-plane. This is only valid, of
`
`Petitioner Apple Inc.
`Ex. 1012, p. 15
`
`
`
`16 Ch. 1 1 Propaedeutic
`
`course, when the ROC of the ZT includes the unit circle of the z-plane. 10
`The periodicity with period 2n of the DTFf is consistent in this regard.
`Since the DFT represents samples of the DTFT at frequencies wk,
`k = 0, l, ... , N- 1, it can be obtained by evaluating the ZT at equally
`spaced angles around the unit circle in the z-plane. Therefore,
`
`DIT X(k) = DTIT x(wk =;; k) = ZT X(ej(2n/N)k).
`
`(1.40)
`
`Since we use the same "uppercase," for example, X, notation to indicate
`all three transforms, it is occasionally necessary in DSP work to explic(cid:173)
`itly denote the particular transform with, for example, a presuperscript as
`in (1.40).
`
`1.1.5 Windows and Frames
`In all practical signal processing applications, it is necessary to work
`with short terms or frames of the signal, unless the signal is of short du(cid:173)
`ration.11 This is especially true if we are to use conventional analysis
`techniques on signals (such as speech) with nonstationary dynamics. In
`this case it is necessary to select a portion of the signal that can reason(cid:173)
`ably be assumed to be stationary.
`Recall that a (time domain) window, say w(n), is a real, finite length
`sequence used to select a desired frame of the original signal, say x(n), by
`a simple multiplication process. Some of th