`Ex. 1014/Page1 of 33
`Apple v. Saint Lawrence
VOLUME 67, NUMBER 12

December 1979

published monthly by
The Institute of Electrical and Electronics Engineers, Inc.
`
Thermal and Current Tuning Effects in GaAs High Power IMPATT Diodes, Y. Bellemare and W. J. Chudobiak
Optics, Quantum Electronics, and Holography
Two-Dimensional Imaging by Means of Multifrequency Hologram Matrix: An Ultrasound Experiment, J. Nakayama, H. Ogura, T. Miyashita, and T. Shibayama
Finestructure in the Spectra of GaAlAs Heterostructure Lasers, J. M. Osterwalder and B. J. Rickett

Modern apparatus for computerized tomographic analysis typically produces projection data from ray sums using fan beams, as shown in this design. See the paper beginning on page 1616 for a discussion of fan-beam reconstruction schemes.
`
SCANNING THE ISSUE

PAPERS

Enhancement and Bandwidth Compression of Noisy Speech (Invited Paper), J. S. Lim and A. V. Oppenheim
Computer Hardware Description Languages: A Tutorial, S. G. Shiva
Fan-Beam Reconstruction Methods, B. K. P. Horn
Partial-Match Retrieval via the Method of Superimposed Codes (Invited Paper), C. S. Roberts
Intermodulation Generation by Electron Tunneling Through Aluminum-Oxide Films, C. D. Bond, C. S. Guenzer, and C. A. Carosella
`
CONTRIBUTORS

PROCEEDINGS LETTERS

Electromagnetics and Plasmas
Rotating Dielectric Sphere in a Low-Frequency Field, J. Van Bladel

Circuit and System Theory
Simple Statistical Assessment of System-Frequency-Response Variance from the Time-Domain Data, J. O. Flower and S. C. Forge
On the Extent of Asymptotic Stability for Output Variables by Liapunov's Direct Method, H. K. Khalil
Analysis of the Frequency Domain Adaptive Filter, N. J. Bershad and P. L. Feintuch

Electronic Circuits and Design
A New Active-RC Circuit Realization of Floating Inductance, V. Singh
Realization of Stable Current-Controlled Frequency-Dependent Positive Resistance, S. Pookaiyaudom and W. Srisarakham
Comments on "A Grounded Inductance Simulation Using the DVCCS/DVCVS," L. R. Odess and A. M. Soliman
A Maximally Flat Group Delay Recursive Digital Filter with Improved Passband Magnitude Response, P. Thajchayapong and F. Cheevasuvit
Some Observations Concerning the Methods of Filter/Oscillator Realization Using the Concept of FDNR, R. Senani
Compact Bandpass Filters for 800-MHz Band Land Mobile Radio Equipments, S. Yamashita, K. Tanaka, and H. Mishima

Electronic Devices
`
`
Communication Theory
Comments on "Spectral Estimation: An Impossibility?" B. H. Soffer, R. Kikuchi, and R. Nitzberg
Numerical Evaluation of the First-Order Characteristic Function of a Filtered Impulsive Noise Sample, J. C. Vanelli and N. M. Shehadeh
Restrictions on the Effective Bandwidth of Sampled Signals, B. Philobos and P. M. Chirlian

Control Systems and Cybernetics
Walsh Function Approach for Simplification of Linear Systems, R. Subbayyan and M. C. Vaithilingam

Computers
t-Fault LIZ-Step Sequentially Diagnosable Systems, K. K. Saluja and B. D. O. Anderson

Miscellaneous
Correction to "Computerized Geophysical Tomography," K. A. Dines and R. J. Lytle
On True 3-D Object Reconstruction from Line Integrals, E. Levitan
Spectra of Pulse Rate Frequency Synthesizers, V. F. Kroupa

BOOK REVIEWS
Modern Control System Theory and Application, by S. M. Shinners, reviewed by S. Adeinion
Fiber Optics in Communications Systems, by G. R. Elion and H. A. Elion, reviewed by C. R. Paris-out
Book Alert
`
1979 INDEX

PROCEEDINGS OF THE IEEE
1979 EDITORIAL BOARD
`
Glen Wade, Editor

J. B. Gunn
H. H. Happ
A. R. Howland
Hiroshi Inose
Akira Ishimaru
Tingye Li
R. W. Lucky
J. J. G. McCue

W. R. Crone, Managing Editor
1979 IEEE PUBLICATIONS BOARD

R. W. Lucky, Chairman
Thelma Estrin, Vice Chairman
E. K. Gannett, Staff Secretary
R. J. Joenk
Charles Lynk, Jr.
W. H. Peace, III
I. (1 Phillips
N. D. Pundit
`
J. S. Meditch
Alex Metherell
Sanjit Mitra
G. S. Moschytz
A. V. Oppenheim
I. C. Peden
L. R. Rabiner
David Slepian
Charles Susskind
M. E. Van Valkenburg

A. C. Schell
Daniel Sheingold
Jack Sipress
T. E. Stern
Glen Wade
R. P. Wellinger
`
HEADQUARTERS STAFF
Eric Herz, Executive Director and General Manager

PUBLISHING SERVICES
Elwood K. Gannett, Staff Director, Publishing Services
H. James Carter, Associate Staff Director
Kearin Kolbre, Manager, Information Services
Patricia H. Penick, Manager, Publication Administration Services
Otto W. Vathke, Publication Business Manager
Ann H. Burgmeyer, Carolyne Elenowitz, Gail S. Ferenc*, Isabel Narea, Production Managers
* Responsible for this publication
Joseph Morsicato, Supervisor, Special Publications
Prijono Hardjowirogo, Patricia H. Nolan, Kalltc Zapili, Associate Editors

ADVERTISING
William R. Saunders, Advertising Director
`
and of his relation to the subject.
`
`
U. C. Andrews
H. G. Booker
Hsu Chang
D. G. Childers
B. M. Chu
James Early
Donald Fink
Harlow Freitag
J. W. Goodman
A. B. Grebene

C. J. Baldwin, Jr.
T. H. Bonn
Donald Christiansen
D. B. Dobson
G. D. Forney, Jr.
E. I. Gordon
`
Advertising correspondence should be addressed to the Advertising Department at IEEE Headquarters.
Copyright: It is the policy of the IEEE to own the copyright to the technical contributions it publishes on behalf of the interests of the IEEE, its authors and their employers, and to facilitate the appropriate reuse of this material by others. To comply with the U.S. Copyright Law, authors are required to sign an IEEE copyright transfer form before publication. This form, a copy of which appears in the January 1979 issue of this journal, returns to authors and their employers full rights to reuse their material for their own purposes. Authors must submit a signed copy of this form with their manuscripts.
`
The PROCEEDINGS OF THE IEEE welcomes for consideration 1) contributed tutorial-review papers in all areas of electrical engineering, and 2) contributed research papers on subjects of broad interest to IEEE members. The prospective author of a tutorial-review paper is encouraged to submit an advance proposal giving an outline of the proposed coverage and a brief explanation of why the subject is of current importance

Contributed Papers
`
Enhancement and Bandwidth Compression of Noisy Speech

PROCEEDINGS OF THE IEEE, VOL. 67, NO. 12, DECEMBER 1979

JAE S. LIM, MEMBER, IEEE, AND ALAN V. OPPENHEIM, FELLOW, IEEE

Invited Paper
`
Abstract: Over the past several years there has been considerable attention focused on the problem of enhancement and bandwidth compression of speech degraded by additive background noise. This interest is motivated by several factors including a broad set of important applications, the apparent lack of robustness in current speech-compression systems, and the development of several potentially promising and practical solutions. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise. A second objective is to suggest a unifying framework in terms of which the relationships between these systems are more visible and which hopefully provides a structure which will suggest fruitful directions for further research.
`
Manuscript received June 22, 1979; revised August 28, 1979. This work was supported in part by the Defense Advanced Research Projects Agency monitored by the Office of Naval Research under Contract N00014-75-C-0951-NR049-328 at M.I.T. Research Laboratory of Electronics and in part by the Department of the Air Force under Contract F19628-78-C-0002 at M.I.T. Lincoln Laboratory.
The authors are with M.I.T. Research Laboratory of Electronics and M.I.T. Lincoln Laboratory, Cambridge, MA 02139.
0018-9219/79/1200-1586$00.75 © 1979 IEEE

I. INTRODUCTION

THERE ARE a wide variety of contexts in which it is desired to enhance speech. The objective of enhancement may perhaps be to improve the overall quality, to increase intelligibility, to reduce listener fatigue, etc. Depending on the specific application, the enhancement system may be directed at only one of these objectives or several. For example, a speech communication system may introduce a low-amplitude long-time delay echo or a narrow-band additive disturbance. While these degradations may not by themselves reduce intelligibility for the purposes for which the channel is used, they are generally objectionable and an improvement in quality perhaps even at the expense of some intelligibility may be desirable. Another example is the communication between a pilot and an air traffic control tower. In this environment, the speech is typically degraded by background noise. Of central importance is the intelligibility of the speech and it would generally be acceptable to sacrifice quality if the intelligibility could be improved. Even with normal undegraded speech, it is sometimes useful or desirable to provide enhancement. As a simple example, high-pass filtering of normal speech is often used to introduce a "crispness" which is generally perceived as an improvement in quality.

The speech-enhancement problem covers a broad spectrum of constraints, applications, and issues. Environments in which an additive background signal has been introduced are common. The background may be noise-like such as in aircraft, street noise, etc., or may be speech-like such as an environment with competing speakers. Other examples in which the need for speech enhancement arises include correcting for reverberation, correcting for the distortion of the speech of underwater divers breathing a helium-oxygen mixture, and correcting the distortion of speech due to pathological difficulties of the speaker or introduced due to an attempt to speak too rapidly. Even for these examples, the problem and techniques vary, depending on the availability of other signals or information. For example, for enhancement of speech in an aircraft a separate microphone can be used to monitor the background noise so that the characteristics of the noise can be used to adjust or adapt the enhancement system. At the air-traffic control tower, however, the only signal available for enhancement is the degraded speech.

Another very important application for speech enhancement is in conjunction with speech bandwidth compression systems. Because of the increasing role of digital communication channels coupled with the need for encrypting of speech and increased emphasis on integrated voice-data networks, speech-bandwidth-compression systems are destined to play an increasingly important role in speech-communication systems. The conceptual basis for narrow-band speech-compression systems stems from a model for the speech signal based on what is known about the physics and physiology of speech production. Because of this reliance on a model for the signal it is not unreasonable to expect that as the signal deviates from the model due to distortion such as additive noise, the performance of the speech compression system with regard to factors such as quality, intelligibility, etc., will degrade. It is generally agreed that the performance of current speech-compression systems degrades rapidly in the presence of additive noise and other distortions and there is currently considerable interest and attention being directed at the development of more robust speech compression systems. There are two basic approaches which are typically considered, either of which may be preferable in a given situation. One approach is to base the bandwidth compression on the assumption of undistorted speech and develop a preprocessor to enhance the degraded speech in preparation for further processing by the bandwidth compression system. It is important to recognize that in enhancing speech in preparation for bandwidth compression the effectiveness of the preprocessor is judged on the basis of the output of the bandwidth-compression system in comparison with the output if no preprocessor is used. Thus, for example, it is possible that the output of the preprocessor would be judged by a listener to be inferior (by some measure) to the input but that the output of the bandwidth-compression system with the preprocessor is preferred to the output without it. In this case, the preprocessor would clearly be considered to be effective
`
`whether quality, intelligibility, or some other attribute is the
`
`
in enhancing the speech in preparation for bandwidth compression. Another approach to bandwidth compression of degraded speech is to incorporate into the model for the signal information about the degradation. A number of systems based on such an approach have recently been proposed and will be discussed in detail in this paper.
As is evident from the above discussion, the general problem of enhancing speech is broad and the constraints, information, and objectives are heavily dependent on the specific context and applications. In this paper, we consider only a small subset of possible topics, specifically the enhancement and bandwidth compression of speech degraded by additive noise. Furthermore, we assume that the only signal available is the degraded speech and that the noise does not depend on the original speech. Many practical problems, some of which have already been discussed, fall into this framework and some problems that do not can be transformed so that they do. For example, multiplicative noise or convolutional noise degradation can be converted to an additive noise degradation by a homomorphic transformation [1], [2]. As another example, signal-dependent quantization noise in pulse-code modulation (PCM) signal coding can be converted to a signal-independent additive noise by a pseudo-noise technique [3]-[5].
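The multiplicative case mentioned above can be illustrated with a small sketch. It assumes strictly positive samples (for example, an envelope or a magnitude spectrum) so that a real logarithm can serve as the homomorphic transformation; the helper names `to_log_domain` and `from_log_domain` and the test signal are hypothetical and are not taken from [1], [2].

```python
import math
import random


def to_log_domain(samples):
    """Map strictly positive samples to the log domain (products become sums)."""
    return [math.log(x) for x in samples]


def from_log_domain(log_samples):
    """Inverse mapping back to the original domain."""
    return [math.exp(v) for v in log_samples]


random.seed(0)

# Assumed positive signal and positive multiplicative noise (illustrative only).
signal = [1.0 + 0.5 * math.sin(2 * math.pi * k / 16) for k in range(64)]
noise = [math.exp(random.gauss(0.0, 0.1)) for _ in signal]
degraded = [s * n for s, n in zip(signal, noise)]

# In the log domain the degradation is additive: log(s * n) = log(s) + log(n).
log_degraded = to_log_domain(degraded)
log_sum = [a + b for a, b in zip(to_log_domain(signal), to_log_domain(noise))]
max_gap = max(abs(a - b) for a, b in zip(log_degraded, log_sum))
```

Any additive-noise technique could then, in principle, be applied to `log_degraded` before mapping back with `from_log_domain`. Signed waveforms and convolutional degradations need a more general transformation than the real logarithm used here.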
Even within the limited framework outlined above, there is a diversity of approaches and systems. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement of speech degraded by additive background noise both for direct listening and as a preprocessor for subsequent bandwidth compression. Many of these systems were developed independently of each other and on the surface often appear to be unrelated. Thus another objective of the paper is to provide a unifying framework in terms of which the relationship between these systems is more visible, and which hopefully will provide a structure which will suggest further fruitful directions for research.
In Section II, we present an overview of the general topic. In this overview we classify the various enhancement systems based on the information assumed about the speech and the noise. Some systems based on time-invariant Wiener filtering, for example, rely only on an assumed noise power spectrum and on long-time average characteristics of speech, such as the fact that the average speech spectrum decays with frequency at approximately 6 dB/octave. Other systems rely on aspects of speech perception or speech production in general or on a detailed model of speech.
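As a rough illustration of the time-invariant Wiener filtering mentioned above, the sketch below computes the standard Wiener gain, speech power divided by speech-plus-noise power, at a few frequencies. The speech model is an assumption made for the example: flat up to a hypothetical 500-Hz corner and then falling 6 dB/octave, with white noise at an arbitrary level. None of these numbers come from the paper.

```python
import math


def speech_power(freq_hz, corner_hz=500.0):
    """Assumed long-time speech power spectrum: flat below corner_hz,
    then falling 6 dB/octave (power proportional to 1/f^2)."""
    if freq_hz <= corner_hz:
        return 1.0
    return (corner_hz / freq_hz) ** 2


def wiener_gain(freq_hz, noise_power):
    """Time-invariant Wiener gain Ps / (Ps + Pn) at one frequency."""
    ps = speech_power(freq_hz)
    return ps / (ps + noise_power)


noise_power = 0.05  # assumed white-noise level, relative to the speech passband
freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
gains = [wiener_gain(f, noise_power) for f in freqs]

# Check of the assumed slope: one octave (1 kHz to 2 kHz) in decibels.
drop_db = 10 * math.log10(speech_power(1000.0) / speech_power(2000.0))
```

Under these assumptions the gain stays near unity where speech dominates and falls toward zero at high frequencies, where the assumed speech spectrum has decayed below the noise.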
Sections III-V present a more detailed discussion of several of these categories of speech-enhancement systems. In particular, Section III is concerned with the general principle of speech enhancement based on estimation of the short-time spectral amplitude of the speech. This basic principle encompasses a variety of techniques and systems including the specific methods of spectral subtraction, parametric Wiener filtering, etc. In Section IV, speech enhancement techniques which rely principally on the concept of the short-time periodicity of voiced speech are reviewed, including comb-filtering and related systems. Section V discusses a variety of systems that rely on more specific modeling of the speech waveform. As we will discuss in detail, in some cases, parameters of the model are obtained from an analysis of the degraded speech and used to synthesize the enhanced speech. In other cases, the results of an analysis based on a model for speech are used to control an enhancement filter, perhaps with the procedure being iterative so that the output of an enhancement filter is then subjected to further analysis, etc. Many of these systems also incorporate a number of the techniques introduced in Section III, including Wiener filtering and spectral subtraction.
In Sections III-V, the focus is entirely on systems for enhancement with the evaluation of the systems being based on listening without further processing. In Section VI, we consider the related but separate problem of bandwidth compression of speech degraded by additive noise.
In Section VII, we discuss in some detail the evaluation of the performance of the various systems presented in the earlier sections. In general, the performance evaluation of a speech-enhancement system is extremely difficult, in large measure because the appropriate criteria for evaluation are heavily dependent on the specific application of the system. Relative importance of such factors as quality, intelligibility, listener fatigue, etc., may vary considerably with the application. In Section VII, we summarize the performance evaluations that have been reported for the various systems presented in this paper. Since the evaluation of different systems has generally been based on different procedures, environments, etc., no attempt is made in the section to compare individual systems. In general, however, we will see that while many of the enhancement systems reduce the apparent background noise and thus perhaps increase quality, many of them, to varying degrees, reduce intelligibility. In the context of bandwidth compression, however, various systems provide an increase in intelligibility over that obtained without the incorporation of speech enhancement.
`
II. OVERVIEW OF SYSTEMS FOR ENHANCEMENT AND BANDWIDTH COMPRESSION OF NOISY SPEECH
As indicated in the previous section, our focus in this paper is on degradation due to the presence of additive noise. Even within this limited context there are a wide variety of approaches which have been proposed and explored. Conceptually any approach should attempt to capitalize on available information about the signal, i.e., the speech, and the background noise. Speech is a special subclass of audio signals and there are reasonable models in terms of which the speech waveform can be described and categorized. The more specifically we attempt to model the speech signal, the more potential for separating it from the background noise. On the other hand, the more we assume about the speech the more sensitive the enhancement system will be to inaccuracies or deviations from these assumptions. Thus incorporating assumptions and information about the speech signal represents tradeoffs which are reflected in the various systems. In a similar manner systems can attempt to incorporate detailed information about the background noise. For example, the type of processing suggested if the background noise is a competing speaker is different than if it is wide-band random noise. Thus enhancement systems also tend to differ in terms of the assumptions made regarding the background noise. As with assumptions related to the signal, the more an enhancement system attempts to capitalize on assumed characteristics of the noise the more susceptible it is likely to be to deviations from these assumptions.
Another important consideration in speech enhancement stems from the fact that the criteria for enhancement ultimately relate to an evaluation by a human listener. In different contexts the criteria for evaluation may differ depending on whether quality, intelligibility, or some other attribute is the most important. Thus speech enhancement must inevitably take into account aspects of human perception. As we will indicate shortly, some systems are heavily motivated by perceptual considerations, others rely more on mathematical criteria. In such cases, of course, the mathematical criteria must in some way be consistent with human perception, and, while an optimum mathematical criterion is not known, some mathematical error criteria are understood to be a better match than others to aspects of human perception.

In the following discussion we briefly describe some aspects of speech production and speech perception that in varying degrees play a role in speech-enhancement systems. Following that we present a brief overview of a representative collection of speech-enhancement systems, with the intent of categorizing these systems in terms of the various aspects of speech production and perception on which they attempt to capitalize.

Speech is generated by exciting an acoustic cavity, the vocal tract, by pulses of air released through the vocal cords for voiced sounds, or by turbulence for unvoiced sounds. Thus a simple but useful model for speech production consists of a linear system, representing the vocal tract, driven by an excitation function which is a periodic pulse train for voiced sounds and wide-band noise for unvoiced sounds, as illustrated in Fig. 1. Furthermore, since the linear system represents an acoustic cavity, its response is of a resonant nature, so that its transfer function is characterized by a set of resonant frequencies, referred to as formants, as illustrated in Fig. 2(a). Thus, if the excitation and vocal-tract parameters are fixed, then as indicated in Fig. 2(b), the speech spectrum has an envelope representing the vocal-tract transfer function of Fig. 2(a) and a fine structure representing the excitation.

[Figure 1: block diagram. Labels: pitch period, impulse train, random noise, amplitude, digital filter coefficients, time-varying digital filter V(z), speech samples {s(n)}.]
Fig. 1. A speech production model.

[Figure 2: two spectral plots. Vertical axes: log |V(ω)| in (a), log |S(ω)| in (b).]
Fig. 2. An example of resonant frequencies of an acoustic cavity. (a) Vocal-tract transfer function. (b) Magnitude spectrum of a speech sound with the resonant frequencies shown in (a).

Many of the techniques for speech enhancement, particularly those in Sections III and V, are conceptually based on the representation of the speech signal as a stochastic process. This characterization of speech is clearly more appropriate in the case of unvoiced sounds for which the vocal tract is driven by wide-band noise. The vocal tract of course changes shape as different sounds are generated and this is reflected in a time-varying transfer function for the linear system in Fig. 1. However, because of the mechanical and physiological constraints on the motion of the vocal tract and articulators such as the tongue and lips, it is reasonable to represent the linear system in Fig. 1 as a slowly varying linear system so that on a short-time basis it is approximated as stationary. Thus some specific attributes of the speech signal, which can be capitalized on in an enhancement system, are that it is the response of a slowly varying linear system, that on a short-time basis its spectral envelope is characterized by a set of resonances, and that for voiced sounds, on a short-time basis it has a harmonic structure. This simplified model for speech production has generally been very successful in a variety of engineering contexts including speech enhancement, synthesis, and bandwidth compression. A more detailed discussion of models for speech production can be found in [6]-[8].

The perceptual aspects of speech are considerably more complicated and less well understood. However, there are a number of commonly accepted aspects of speech perception which play an important role in speech-enhancement systems. For example, consonants are known to be important in the intelligibility of speech even though they represent a relatively small fraction of the signal energy. Furthermore, it is generally understood that the short-time spectrum is of central importance in the perception of speech and that, specifically, the formants in the short-time spectrum are more important than other details of the spectral envelope. It appears also that the first formant, typically in the range of 250 to 800 Hz, is less important perceptually than the second formant [9], [10]. Thus it is possible to apply a certain degree of high-pass filtering [11], [12] to speech which may perhaps affect the first formant without introducing serious degradation in intelligibility. Similarly, low-pass filtering with a cutoff frequency above 4 kHz, while perhaps affecting crispness and quality, will in general not seriously affect intelligibility. A good representation of the magnitude of the short-time spectrum is also generally considered to be important whereas the phase is relatively unimportant. Another perceptual aspect of the auditory system that plays a role in speech enhancement is the ability to mask one signal with another. Thus, for example, narrow-band noise and many forms of artificial noise or degradation such as might be produced by a vocoder are more unpleasant to listen to than broad-band noise and a speech-enhancement system might include the introduction of broad-band noise to mask the narrow-band or artificial noise.

All speech-enhancement systems rely to varying degrees on the aspects of speech production and perception outlined above. One of the simplest approaches to enhancement is the use of low-pass or bandpass filtering to attenuate the noise outside the band of perceptual importance for speech. More generally, when the power spectrum of the noise is known, one can consider the use of Wiener filtering, based on the long-time power spectrum of speech. While in some cases, such as the presence of narrow-band background noise, this is reasonably successful, Wiener filtering based on the long-time power spectrum of the speech and noise is limited because speech is not stationary. Even if speech were truly stationary, mean-square error, which is the error criterion on which Wiener filtering is based, is not strongly correlated with perception and thus is not a particularly effective error criterion to apply to speech processing systems. This is evidenced, for example, in the use of masking for enhancement. By adding broad-band
`
`
`
`
`Extensions to pole-zero modeling have also been proposed
`
noise to mask other degradation, we are, in effect, increasing the mean-square error. Another example that suggests that mean-square error is not well matched to the perceptually important attributes in speech is the fact that distortion of the speech waveform by processing with an all-pass filter results in essentially no audible difference if the impulse response of the all-pass filter is reasonably short but can result in a substantial mean-square error between the original and filtered speech. In other words, mean-square error is sensitive to phase of the spectrum whereas perception tends not to be.
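The all-pass observation can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a first-order all-pass section with an arbitrary coefficient of 0.5 and a white-noise input rather than speech, and the function names are hypothetical. It shows a magnitude response of unity at every tested frequency together with a mean-square error larger than the signal power itself.

```python
import cmath
import random


def allpass(x, a):
    """First-order all-pass filter: y[n] = -a*x[n] + x[n-1] + a*y[n-1]."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = -a * sample + x_prev + a * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def allpass_magnitude(a, omega):
    """|H(e^{j omega})| for H(z) = (-a + z^-1) / (1 - a z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    return abs((-a + z_inv) / (1 - a * z_inv))


random.seed(1)
a = 0.5  # assumed coefficient; gives a short impulse response
x = [random.gauss(0.0, 1.0) for _ in range(4000)]  # stand-in for a speech segment
y = allpass(x, a)

signal_power = sum(v * v for v in x) / len(x)
output_power = sum(v * v for v in y) / len(y)
mse = sum((u - v) ** 2 for u, v in zip(x, y)) / len(x)
relative_mse = mse / signal_power

magnitudes = [allpass_magnitude(a, w) for w in (0.1, 1.0, 2.0, 3.0)]
```

Here `magnitudes` are all one to within rounding and `output_power` stays close to `signal_power`, yet `relative_mse` exceeds one. The error arises entirely from the change in phase.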
Masking and bandpass filtering represent two simple ways in which perceptual aspects of the auditory system can be exploited in speech enhancement. Another system whose motivation depends heavily on aspects of speech perception was proposed by Thomas and Niederjohn [12] as a preprocessor prior to the introduction of noise in those applications where noise-free speech is available for processing. In essence, their system applies high-pass filtering to reduce or remove the first formant followed by infinite clipping. The motivation for the system lies in the observation that at a given signal-to-noise ratio infinite clipping will increase, relative to the vowels, the amplitude of the perceptually important low-amplitude events such as consonants, thus making them less susceptible to masking by noise. In addition, for vowels the filtering will increase the amplitude of higher formants relative to the first formant, thus making the perceptually more important higher formants less susceptible to degradation.
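The effect of high-pass filtering followed by infinite clipping can be sketched on synthetic tones. This is a toy illustration and not the system of [12]: the first-difference filter, its coefficient, the 8-kHz sampling rate, and the two sinusoids standing in for a strong vowel and a weak consonant are all assumptions chosen for the example.

```python
import cmath
import math


def high_pass(x, alpha=0.95):
    """Crude high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - alpha * prev)
        prev = sample
    return y


def high_pass_gain(freq_hz, rate_hz, alpha=0.95):
    """Magnitude response of the filter above at freq_hz."""
    omega = 2 * math.pi * freq_hz / rate_hz
    return abs(1 - alpha * cmath.exp(-1j * omega))


def infinite_clip(x):
    """Infinite clipping: keep only the sign of each sample."""
    return [1.0 if sample >= 0 else -1.0 for sample in x]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


rate = 8000  # assumed sampling rate in Hz
# Stand-ins: a strong low-frequency "vowel" and a weak high-frequency "consonant".
vowel = [1.0 * math.sin(2 * math.pi * 300 * n / rate) for n in range(800)]
consonant = [0.05 * math.sin(2 * math.pi * 3000 * n / rate) for n in range(800)]

ratio_before = rms(consonant) / rms(vowel)
ratio_after = rms(infinite_clip(high_pass(consonant))) / rms(infinite_clip(high_pass(vowel)))

gain_low = high_pass_gain(300, rate)
gain_high = high_pass_gain(3000, rate)
```

After clipping, every sample has unit magnitude, so the weak segment is raised to the same level as the strong one (`ratio_after` is one, against `ratio_before` of about 0.05). The filter gain is also much smaller at 300 Hz than at 3000 Hz, which corresponds to de-emphasizing the first-formant region.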
In the speech enhancement problem considered in this paper, noise-free speech is not available for processing as required in the above system. Thomas and Ravindran [13], however, applied high-pass filtering followed by infinite clipping to noisy speech as an experiment. While quality may be degraded by the process of filtering and clipping, they claim a noticeable improvement in intelligibility when applied to enhance speech degraded by wide-band random noise. One possible explanation may be that the high-pass filtering operation reduces the masking of perceptually important higher formants by the relatively unimportant low-fr