`
`Audio Signal
`Processing and Coding
`
`Andreas Spanias, Ted Painter, and Venkatraman Atti
`
`HULU LLC
`Ex
`IPR2
`
`Comcast - Exhibit 1008, page 1
`
`
`
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.
`
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives or written sales materials. The advice and strategies contained
herein may not be suitable for your situation. You should consult with a professional where
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
damages.
`
For general information on our other products and services or for technical support, please contact
our Customer Care Department within the United States at (800) 762-2974, outside the United
States at (317) 572-3993 or fax (317) 572-4002.
`
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our
web site at www.wiley.com.
`
`Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
Spanias, Andreas.
Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
p. cm.
"Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN: 978-0-471-79147-8
1. Coding theory. 2. Signal processing—Digital techniques. 3. Sound—Recording and
reproducing—Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III. Title.

TK5102.92.S73 2006
621.382'8—dc22
`
`Printed in the United States of America.
`
`10 9 8 7 6 5 4 3 2 1
`
`2006040507
`
`
`PREFACE
`
`
Audio processing and recording have been part of telecommunication and entertainment systems
for more than a century. Moreover, bandwidth issues associated with audio recording,
transmission, and storage have occupied engineers from the very early stages of this field.
A series of important technological developments paved the way from early phonographs to
magnetic tape recording, and lately to the compact disk (CD) and super storage devices. In the
following, we capture some of the main events and milestones that mark the history of audio
recording and storage.¹
Figure: Apple iPod®. (Courtesy of Apple Computer, Inc.) Apple iPod® is a registered trademark
of Apple Computer, Inc.

Prototypes of phonographs appeared around 1877, and the first attempt to market cylinder-based
gramophones was by the Columbia Phonograph Co. in 1889. Five years later, Marconi demonstrated
the first radio transmission that marked the beginning of audio broadcasting. The Victor Talking
Machine Company, with the little nipper dog as its trademark, was formed in 1901. The
"telegraphone", a magnetic recorder for voice that used steel wire, was patented in Denmark
around the end of the nineteenth century. The Odeon and His Master's Voice (HMV) labels
produced and marketed music recordings in the early 1900s. The cabinet phonograph with a horn
called "Victrola" appeared at about the same time. Diamond disk players were marketed in 1913,
followed by efforts to produce sound-on-film for motion pictures. Other milestones include the
first commercial transmission in Pittsburgh and the emergence of public address amplifiers.
Electrically recorded material appeared in the 1920s, and the first sound-on-film was
demonstrated in the mid 1920s by Warner Brothers. Cinema applications in the 1930s promoted
advances in loudspeaker technologies, leading to the development of woofer, tweeter, and
crossover network concepts. Juke boxes for music also appeared in the 1930s. Magnetic tape
recording was demonstrated in Germany in the 1930s by BASF and AEG/Telefunken. The Ampex tape
recorders appeared in the US in the late 1940s. The demonstration of stereo high-fidelity
(Hi-Fi) sound in the late 1940s spurred the development of amplifiers, speakers, and
reel-to-reel tape recorders for home use in the 1950s, both in Europe and the US. Meanwhile,
Columbia produced the 33-rpm long play (LP) vinyl record, while its rival RCA Victor produced
the compact 45-rpm format, whose sales took off with the emergence of rock and roll music.
Technological developments in the mid 1950s resulted in the emergence of compact
transistor-based radios and, soon after, small tape players. In 1963, Philips introduced the
compact cassette tape format with its EL3300 series portable players (marketed in the US as
Norelco), which became an instant success with accessories for home, portable, and car use.
Eight-track cassettes became popular in the late 1960s, mainly for car use. The Dolby system
for compact cassette noise reduction was also a landmark in the audio signal processing field.
Meanwhile, FM broadcasting, which had been invented earlier, took off in the 1960s and 1970s
with stereo transmissions. Helical tape-head technologies invented in Japan in the 1960s
provided high-bandwidth recording capabilities, which enabled video tape recorders for home
use in the 1970s (e.g., VHS and Beta formats). This technology was also used in the 1980s for
audio PCM stereo recording. Laser compact disk technology was introduced in 1982 and by the
late 1980s became the preferred format for Hi-Fi stereo recording. Analog compact cassette
players, high-quality reel-to-reel recorders, expensive turntables, and virtually all analog
recording devices started fading away by the late 1980s. The launch of the digital CD audio
format in
the 1980s coincided with the advent of personal computers, and took over in all aspects of
music recording and distribution. CD playback soon dominated broadcasting, automobile, home
stereo, and analog vinyl LP. The compact cassette formats became relics of an old era and
eventually disappeared from music stores. Digital audio tape (DAT) systems enabled by helical
tape head technology were also introduced in the 1980s but were commercially unsuccessful
because of strict copyright laws and unusually large taxes.

Parallel developments in digital video formats for laser disk technologies included work in
audio compression systems. Audio compression research papers started appearing mostly in the
1980s at IEEE ICASSP and Audio Engineering Society conferences by authors from several
research and development labs including Erlangen-Nuremberg University and Fraunhofer IIS,
AT&T Bell Laboratories, and Dolby Laboratories. Audio compression or audio coding research,
the art of representing an audio signal with the least number of information bits while
maintaining its fidelity, went through quantum leaps in the late 1980s and 1990s. Although
originally most audio compression algorithms were developed as part of the digital motion
video compression standards, e.g., the MPEG series, these algorithms eventually became
important as stand-alone technologies for audio recording and playback. Progress in VLSI
technologies, psychoacoustics, and efficient time-frequency signal representations made
possible a series of scalable real-time compression algorithms for use in audio and cinema
applications. In the 1990s, we witnessed the emergence of the first products that used
compressed audio formats such as the MiniDisc (MD) and the Digital Compact Cassette (DCC).
The sound and video playing capabilities of the PC and the proliferation of multimedia
content through the Internet had a profound impact on audio compression technologies. The
MPEG-1/-2 layer III (MP3) algorithm became a de facto standard for Internet music downloads.
Specialized websites that feature music content changed the ways people buy and share music.
Compact MP3 players appeared in the late 1990s. In the early 2000s, we had the emergence of
the Apple iPod® player with a hard drive that supports MP3 and MPEG advanced audio coding
(AAC) algorithms.

In order to enhance cinematic and home theater listening experiences and deliver greater
realism than ever before, audio codec designers pursued sophisticated multichannel audio
coding techniques. In the mid 1990s, techniques for encoding 5.1 separate channels of audio
were standardized in MPEG-2 BC and later MPEG-2 AAC audio. Proprietary multichannel
algorithms were also developed and commercialized by Dolby Laboratories (AC-3), Digital
Theater System (DTS), Lucent (EPAC), Sony (SDDS), and Microsoft (WMA). Dolby Labs, DTS,
Lexicon, and other companies also introduced 2:N channel upmix algorithms capable of
synthesizing multichannel surround presentation from conventional stereo content (e.g., Dolby
ProLogic II, DTS Neo6). The human auditory system is capable of localizing sound with greater
spatial resolution than current multichannel audio systems offer, and as a result the quest
continues to achieve the ultimate spatial fidelity in sound reproduction. Research involving
spatial audio, real-time acoustic source localization, binaural cue coding, and application of
head-related transfer functions (HRTF) towards rendering immersive audio has gained interest.
Audiophiles appeared skeptical of the 44.1-kHz 16-bit CD stereo format, and some were critical
of the sound quality of compression formats. These ideas, along with the need for copyright
protection, eventually gained momentum, and new standards and formats appeared in the early
2000s. In particular, multichannel lossless coding formats such as the DVD-Audio (DVD-A) and
the Super-Audio-CD (SACD) appeared. The standardization of these storage formats provided the
audio codec designers with enormous storage capacity. This motivated lossless coding of
digital audio.

The purpose of this book is to provide an in-depth treatment of audio compression algorithms
and standards. The topic is currently occupying several communities in signal processing,
multimedia, and audio engineering. The intended readership for this book includes at least
three groups. At the highest level, any reader with a general scientific background will be
able to gain an appreciation for the heuristics of perceptual coding. Secondly, readers with a
general electrical and computer engineering background will become familiar with the essential
signal processing techniques and perceptual models embedded in most audio coders. Finally,
undergraduate and graduate students with focuses in multimedia, DSP, and computer music will
gain important knowledge in signal analysis and audio coding algorithms. The vast body of
literature provided and the tutorial aspects of the book make it an asset for audiophiles as
well.
`
Organization
This book is in part the outcome of many years of research and teaching at Arizona State
University. We opted to include exercises and computer problems and hence enable instructors
to either use the content in existing DSP and multimedia courses, or to promote the creation
of new courses with focus in audio and speech processing and coding. The book has twelve
chapters and each chapter contains problems, proofs, and computer exercises. Chapter 1
introduces the readers to the field of audio signal processing and coding. In Chapter 2, we
review the basic signal processing theory and emphasize concepts relevant to audio coding.
Chapter 3 describes waveform quantization and entropy coding schemes. Chapter 4 covers linear
predictive coding and its utility in speech and audio coding. Chapter 5 covers psychoacoustics
and Chapter 6 explores filter bank design. Chapter 7 describes transform coding methodologies.
Subband and sinusoidal coding algorithms are addressed in Chapters 8 and 9, respectively.
Chapter 10 reviews several audio coding standards including the ISO/IEC MPEG family, the
cinematic Sony SDDS, the Dolby AC-3, and the DTS-coherent acoustics (DTS-CA). Chapter 11
focuses on lossless audio coding and digital audio watermarking techniques. Chapter 12
provides information on subjective quality measures.
`
Use in Courses
For an undergraduate elective course with little or no background in DSP, the instructor can
cover in detail Chapters 1, 2, 3, 4, and 5, then present select
sections of Chapter 6, and describe in an expository and qualitative manner certain basic
algorithms and standards from Chapters 7-11. A graduate class in audio coding with students
that have background in DSP can start from Chapter 5 and cover in detail Chapters 6 through
11. Audio coding practitioners and researchers that are interested mostly in qualitative
descriptions of the standards and information on bibliography can start at Chapter 5 and
proceed reading through Chapter 11.
`
`Trademarks and Copyrights
`
Sony Dynamic Digital Sound, SDDS, ATRAC, and MiniDisc are trademarks of Sony Corporation.
Dolby, Dolby Digital, AC-2, AC-3, DolbyFAX, and Dolby ProLogic are trademarks of Dolby
Laboratories. The perceptual audio coder (PAC), EPAC, and MPAC are trademarks of AT&T and
Lucent Technologies. The APT-x100 is a trademark of Audio Processing Technology Inc. The
DTS-CA is a trademark of Digital Theater Systems Inc. Apple iPod® is a registered trademark
of Apple Computer, Inc.
`
`Acknowledgments
`
The authors have all spent time at Arizona State University (ASU) and Prof. Spanias is in
fact still teaching and directing research in this area at ASU. The group of authors has
worked on grants with Intel Corporation and would like to thank this organization for
providing grants in scalable speech and audio coding that created opportunities for in-depth
studies in these areas. Special thanks to our colleagues in Intel Corporation at that time
including Brian Mears, Gopal Nair, Hedayat Daie, Mark Walker, Michael Deisher, and Tom
Gardos. We also wish to acknowledge the support of current Intel colleagues Gang Liang, Mike
Rosenzweig, and Jim Zhou, as well as Scott Peirce for proofreading some of the material.
Thanks also to former doctoral students at ASU including Philip Loizou and Sassan Ahmadi for
many useful discussions in speech and audio processing. We appreciate also discussions on
narrowband vocoders with Bruce Fette in the late 1990s, then with Motorola GEG and now with
General Dynamics.

The authors also acknowledge the National Science Foundation (NSF) CCLI for grants in
education that supported in part the preparation of several computer examples and paradigms
in psychoacoustics and signal coding. Also, some of the early work in coding of Dr. Spanias
was supported by the Naval Research Laboratories (NRL) and we would like to thank that
organization for providing ideas for projects that inspired future work in this area. We also
wish to thank ASU and some of the faculty and administrators that provided moral and material
support for work in this area. Thanks are extended to current ASU students Shibani Misra,
Visar Berisha, and Mahesh Banavar for proofreading some of the material. We thank the Wiley
Interscience production team George Telecki, Melissa Yanuzzi, and Rachel Witmer for their
diligent efforts in copyediting, cover design, and typesetting. We also thank all the
anonymous reviewers for
their useful comments. Finally, we all wish to express our thanks to our families for their
support.

The book content is used frequently in ASU online courses and industry short courses offered
by Andreas Spanias. Contact Andreas Spanias (spanias@asu.edu /
http://www.fulton.asu.edu/~spanias/) for details.
`
`
¹ Resources used for obtaining important dates in recording history include websites at the
University of San Diego, Arizona State University, and Wikipedia.
`
`CHAPTER 1
`
`INTRODUCTION
`
Audio coding or audio compression algorithms are used to obtain compact digital
representations of high-fidelity (wideband) audio signals for the purpose of efficient
transmission or storage. The central objective in audio coding is to represent the signal
with a minimum number of bits while achieving transparent signal reproduction, i.e.,
generating output audio that cannot be distinguished from the original input, even by a
sensitive listener ("golden ears"). This text gives an in-depth treatment of algorithms and
standards for transparent coding of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
The introduction of the compact disc (CD) in the early 1980s brought to the fore all of the
advantages of digital audio representation, including true high fidelity, dynamic range, and
robustness. These advantages, however, came at the expense of high data rates. Conventional
CD and digital audio tape (DAT) systems are typically sampled at either 44.1 or 48 kHz using
pulse code modulation (PCM) with a 16-bit sample resolution. This results in uncompressed
data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a stereo pair.
Although these data rates were accommodated successfully in first-generation CD and DAT
players, second-generation audio players and wirelessly connected systems are often subject
to bandwidth constraints that are incompatible with high data rates. Because of the success
enjoyed by the first-generation
systems, however, end users have come to expect "CD-quality" audio reproduction from any
digital system. Therefore, new network and wireless multimedia digital audio systems must
reduce data rates without compromising reproduction quality. Motivated by the need for
compression algorithms that can satisfy simultaneously the conflicting demands of high
compression ratios and transparent quality for high-fidelity audio signals, several coding
methodologies have been established over the last two decades. Audio compression schemes, in
general, employ design techniques that exploit both perceptual irrelevancies and statistical
redundancies.
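
The uncompressed PCM data rates quoted above follow from multiplying the sampling rate, the
sample resolution, and the number of channels. The short Python sketch below reproduces that
arithmetic; the function name pcm_bit_rate is our own choice for illustration and does not
come from any standard.

```python
def pcm_bit_rate(sample_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Return the uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    # CD (44.1 kHz) and DAT (48 kHz) sampling rates at 16 bits per sample.
    for fs in (44_100, 48_000):
        mono = pcm_bit_rate(fs, 16, channels=1)
        stereo = pcm_bit_rate(fs, 16, channels=2)
        print(f"{fs / 1000:g} kHz: mono {mono / 1e3:.1f} kb/s, stereo {stereo / 1e6:.2f} Mb/s")
```

Any compression ratio reported for a coder is measured against these reference rates.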
PCM was the primary audio encoding scheme employed until the early 1980s. PCM does not
provide any mechanisms for redundancy removal. Quantization methods that exploit the signal
correlation, such as differential PCM (DPCM), delta modulation [Jaya76] [Jaya84], and
adaptive DPCM (ADPCM) were applied to audio compression later (e.g., PC audio cards). Owing
to the need for drastic reduction in bit rates, researchers began to pursue new approaches
for audio coding based on the principles of psychoacoustics [Zwic90] [Moor03]. Psychoacoustic
notions in conjunction with the basic properties of signal quantization have led to the
theory of perceptual entropy [John88a] [John88b]. Perceptual entropy is a quantitative
estimate of the fundamental limit of transparent audio signal compression. Another key
contribution to the field was the characterization of the auditory filter bank and
particularly the time-frequency analysis capabilities of the inner ear [Moor83]. Over the
years, several filter-bank structures that mimic the critical band structure of the auditory
filter bank have been proposed. A filter bank is a parallel bank of bandpass filters covering
the audio spectrum, which, when used in conjunction with a perceptual model, can play an
important role in the identification of perceptual irrelevancies.
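
To make the idea of exploiting signal correlation concrete, the following is a minimal sketch
of first-order DPCM. It assumes the simplest possible predictor (the previously reconstructed
sample) and a uniform quantizer with a fixed step size; both choices, and the function names,
are ours for illustration and are not taken from any particular coder.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the previous reconstruction."""
    indices = []
    prediction = 0.0  # encoder and decoder both start from this state
    for x in samples:
        index = round((x - prediction) / step)  # uniform quantizer index
        indices.append(index)
        prediction += index * step  # track what the decoder will reconstruct
    return indices


def dpcm_decode(indices, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    output = []
    prediction = 0.0
    for index in indices:
        prediction += index * step
        output.append(prediction)
    return output


if __name__ == "__main__":
    signal = [0.0, 0.9, 2.1, 2.9, 4.2]
    codes = dpcm_encode(signal, step=0.5)
    print(codes, dpcm_decode(codes, step=0.5))
```

Because the encoder predicts from its own reconstruction, the quantization error does not
accumulate: each output sample differs from the input by at most half a step. For a slowly
varying input the difference indices stay small, which is what makes them cheaper to code
than the samples themselves.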
During the early 1990s, several workgroups and organizations such as the International
Organization for Standardization/International Electro-technical Commission (ISO/IEC), the
International Telecommunications Union (ITU), AT&T, Dolby Laboratories, Digital Theatre
Systems (DTS), Lucent Technologies, Philips, and Sony were actively involved in developing
perceptual audio coding algorithms and standards. Some of the popular commercial standards
published in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent Acoustics
(DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC), Philips' Precision Adaptive
Subband Coding (PASC), and Sony's Adaptive Transform Acoustic Coding (ATRAC). Table 1.1 lists
chronologically some of the prominent audio coding standards. The commercial success enjoyed
by these audio coding standards triggered the launch of several multimedia storage formats.
Table 1.2 lists some of the popular multimedia storage formats since the beginning of the CD
era. High-performance stereo systems became quite common with the advent of CDs in the early
1980s. A compact-disc read-only memory (CD-ROM) can store data up to 700-800 MB in digital
form as "microscopic pits" that can be read by a laser beam off of a reflective surface or a
medium. Three competing storage media, namely DAT, the digital compact cassette (DCC), and
the MiniDisc (MD), entered the commercial market during 1987-1992. Intended mainly for
back-up high-density storage (~1.3 GB), the DAT became the primary source of mass data
storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed a storage medium called the
MiniDisc, primarily for audio storage. MD employs the ATRAC algorithm for compression. In
1991, Philips introduced the DCC, a successor of the analog compact cassette. Philips DCC
employs a compression scheme called the PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began as a
potential competitor for DATs but was discontinued in 1996. The introduction of the digital
versatile disc (DVD) in 1996 enabled both video and audio recording/storage as well as
text-message programming. The DVD became one of the most successful storage media. With the
improvements in the audio compression and DVD storage technologies, multichannel surround
sound encoding formats gained interest [Bosi93] [Holm99] [Bosi00].

Table 1.1. List of perceptual and lossless audio coding standards/algorithms.

Standard/algorithm                                Related references
1. ISO/IEC MPEG-1 audio                           [ISOI92]
2. Philips' PASC (for DCC applications)           [Lokh92]
3. AT&T/Lucent PAC/EPAC                           [John96c] [Sinh96]
4. Dolby AC-2                                     [Davi92] [Fiel91]
5. AC-3/Dolby Digital                             [Davis93] [Fiel96]
6. ISO/IEC MPEG-2 (BC/LSF) audio                  [ISOI94a]
7. Sony's ATRAC (MiniDisc and SDDS)               [Yosh94] [Tsut96]
8. SHORTEN                                        [Robi94]
9. Audio processing technology - APT-x100         [Wyli96b]
10. ISO/IEC MPEG-2 AAC                            [ISOI96]
11. DTS coherent acoustics                        [Smyt96] [Smyt99]
12. The DVD Algorithm                             [Crav96] [Crav97]
13. MUSICompress                                  [Wege97]
14. Lossless transform coding of audio (LTAC)     [Pura97]
15. AudioPaK                                      [Hans98b] [Hans01]
16. ISO/IEC MPEG-4 audio version 1                [ISOI99]
17. Meridian lossless packing (MLP)               [Gerz99]
18. ISO/IEC MPEG-4 audio version 2                [ISOI00]
19. Audio coding based on integer transforms      [Geig01] [Geig02]
20. Direct-stream digital (DSD) technology        [Reef01a] [Jans03]

Table 1.2. Some of the popular audio storage formats.

Audio storage format                  Related references
1. Compact disc                       [CD82] [IECA87]
2. Digital audio tape (DAT)           [Watk88] [Tan89]
3. Digital compact cassette (DCC)     [Lokh91] [Lokh92]
4. MiniDisc                           [Yosh94] [Tsut96]
5. Digital versatile disc (DVD)       [DVD96]
6. DVD-audio (DVD-A)                  [DVD01]
7. Super audio CD (SACD)              [SACD02]
With the emergence of streaming audio applications during the late 1990s, researchers pursued
techniques such as combined speech and audio architectures, as well as joint source-channel
coding algorithms that are optimized for the packet-switched Internet. The advent of the
ISO/IEC MPEG-4 standard (1996-2000) [ISOI99] [ISOI00] established new research goals for
high-quality coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of algorithms
with provisions for scalable, object-based speech and audio coding at bit rates from as low
as 200 b/s up to 64 kb/s per channel.

The emergence of the DVD-audio and the super audio CD (SACD) provided designers with
additional storage capacity, which motivated research in lossless audio coding [Crav96]
[Gerz99] [Reef01a]. A lossless audio coding system is able to reconstruct perfectly a
bit-for-bit representation of the original input audio. In contrast, a coding scheme
incapable of perfect reconstruction is called lossy. For most audio program material, lossy
schemes offer the advantage of lower bit rates (e.g., less than 1 bit per sample) relative to
lossless schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content to
the network browser at low bit rates is the next grand challenge for codec designers.
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
Over the last few years, researchers have proposed several efficient signal models (e.g.,
transform-based, subband-filter structures, wavelet-packet) and compression standards
(Table 1.1) for high-quality digital audio reproduction. Most of these algorithms are based
on the generic architecture shown in Figure 1.1.

[Figure 1.1. A generic perceptual audio encoder. Block diagram with the blocks time-frequency
analysis, psychoacoustic analysis, bit-allocation, quantization and encoding, entropy
(lossless) coding, and MUX. Labeled signals: input audio, parameters, masking thresholds,
side information, and the output to the channel.]

The coders typically segment input signals into quasi-stationary frames ranging from 2 to
50 ms. Then, a time-frequency analysis section estimates the temporal and spectral components
of each frame. The time-frequency mapping is usually matched to the analysis properties of
the human auditory system. In any case, the ultimate objective is to extract from the input
audio a set of time-frequency parameters that is amenable to quantization according to a
perceptual distortion metric. Depending on the overall design objectives, the time-frequency
analysis section usually contains one of the following:

• Unitary transform
• Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass
  filters
• Harmonic/sinusoidal analyzer
• Source-system analysis (LPC and multipulse excitation)
• Hybrid versions of the above.
`
The choice of time-frequency analysis methodology always involves a fundamental tradeoff
between time and frequency resolution requirements. Perceptual distortion control is achieved
by a psychoacoustic signal analysis section that estimates signal masking power based on
psychoacoustic principles. The psychoacoustic model delivers masking thresholds that quantify
the maximum amount of distortion at each point in the time-frequency plane such that
quantization of the time-frequency parameters does not introduce audible artifacts. The
psychoacoustic model therefore allows the quantization section to exploit perceptual
irrelevancies. This section can also exploit statistical redundancies through classical
techniques such as DPCM or ADPCM. Once a quantized compact parametric set has been formed,
the remaining redundancies are typically removed through noiseless run-length (RL) and
entropy coding techniques, e.g., Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch
(LZW) [Ziv77] [Welc84]. Since the output of the psychoacoustic distortion control model is
signal-dependent, most algorithms are inherently variable rate. Fixed channel rate
requirements are usually satisfied through buffer feedback schemes, which often introduce
encoding delays.
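
As an illustration of the entropy coding stage, the sketch below builds a Huffman code for a
short sequence of quantizer indices and shows that frequent indices receive short codewords.
It is a toy example under our own assumptions: the code table is derived from the data
itself, the input sequence is invented, and the function names are ours rather than those of
any standard.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol sequence."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)  # the two least frequent subtrees
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the codewords of all symbols."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Invert encode(); the first matching prefix is always a complete codeword."""
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    indices = [0, 0, 0, 0, 1, 1, -1, 2]  # hypothetical quantizer output
    table = huffman_code(indices)
    bits = encode(indices, table)
    print(table, len(bits), decode(bits, table) == indices)
```

The number of bits produced depends on the symbol statistics of each frame, which is one
reason why the output of such a coder is inherently variable rate.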
`
`1.3 AUDIO CODER ATTRIBUTES
`
Perceptual audio coders are typically evaluated based on the following attributes: audio
reproduction quality, operating bit rates, computational complexity, codec delay, and channel
error robustness. The objective is to attain a high-quality (transparent) audio output at low
bit rates (<32 kb/s), with an acceptable algorithmic delay (~5 to 20 ms), and with low
computational complexity (~1 to 10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
Audio quality is of paramount importance when designing an audio coding algorithm. Successful
strides have been made since the development of simple near-transparent perceptual coders.
Typically, classical objective measures of signal fidelity such as the signal to noise ratio
(SNR) and the total harmonic distortion (THD) are inadequate [Ryde96]. As the field of
perceptual audio coding matured rapidly and created greater demand for listening tests, there
was a corresponding growth of interest in perceptual measurement schemes. Several subjective
and objective quality measures have been proposed and standardized during the last decade.
Some of these schemes include the noise-to-mask ratio (NMR, 1987) [Bran87a], the perceptual
audio quality measure (PAQM, 1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992)
[Pail92], the perceptual objective measure (POM, 1995) [Colo95], and the objective audio
signal evaluation (OASE, 1997) [Spor97]. We will address these and several other quality
assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
From a codec designer's point of view, one of the key challenges is to represent
high-fidelity audio with a minimum number of bits. For instance, if a 5-ms audio frame
sampled at 48 kHz (240 samples per frame) is represented using 80 bits, then the encoding bit
rate would be 80 bits/5 ms = 16 kb/s. Low bit rates imply high compression ratios and
generally low reproduction quality. Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s),
the Dolby AC-3 (32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
employ high bit rates for obtaining transparent audio reproduction. However, the development
of several sophisticated audio coding tools (e.g., MPEG-4 audio tools) created ways for
efficient transmission or storage of audio at rates between 8 and 32 kb/s. Future audio
coding algorithms promise to offer reasonable quality at low rates along with the ability to
scale both rate and quality to match different requirements such as time-varying channel
capacity.
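
The frame-based bit rate example above can be checked with a few lines of Python. The sketch
also derives two related quantities that the example implies but does not state: the average
number of bits per sample, and the compression ratio relative to 16-bit PCM at the same
sampling rate. The 16-bit reference and the function names are our assumptions for
illustration.

```python
def frame_bit_rate(bits_per_frame: int, frame_ms: float) -> float:
    """Encoding bit rate in bits per second for a fixed bit budget per frame."""
    return bits_per_frame * 1000.0 / frame_ms


def bits_per_sample(bit_rate_bps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


def compression_ratio(bit_rate_bps: float, sample_rate_hz: float, pcm_bits: int = 16) -> float:
    """Ratio of the single-channel PCM rate (assumed 16-bit) to the coded rate."""
    return sample_rate_hz * pcm_bits / bit_rate_bps


if __name__ == "__main__":
    rate = frame_bit_rate(bits_per_frame=80, frame_ms=5.0)
    print(f"bit rate: {rate / 1e3:g} kb/s")
    print(f"bits per sample: {bits_per_sample(rate, 48_000):.3f}")
    print(f"compression ratio vs 16-bit PCM: {compression_ratio(rate, 48_000):g}:1")
```

With 240 samples in each 5-ms frame, an 80-bit budget corresponds to one third of a bit per
sample on average.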
`
`1.3.3 Complexity
`Reduced computational complexity not only enables real-time implementation
`but may also decrease the power consumption and extend battery life. Com-
`putational complexity is usually measured in terms of millions of instructions
`per second (MIPS). Complexity estimates are proces