`
`Audio Signal
`Processing and Coding
`
`Andreas Spanias, Ted Painter, and Venkatraman Atti
`
`HULU LLC
`Exhibit 109
`IPR2018-01090
`
`Page 1
`
`HULU LLC
`Exhibit 1008
`IPR2018-01170
`
`Page 1
`
`

`

`Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
`
`Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
`Published simultaneously in Canada.
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
`form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
`except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
`either the prior written permission of the Publisher, or authorization through payment of the
`appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
`MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
`to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
`Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
`http://www.wiley.com/go/permission.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
`efforts in preparing this book, they make no representations or warranties with respect to the
`accuracy or completeness of the contents of this book and specifically disclaim any implied
`warranties of merchantability or fitness for a particular purpose. No warranty may be created or
`extended by sales representatives or written sales materials. The advice and strategies contained
`herein may not be suitable for your situation. You should consult with a professional where
`appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
`commercial damages, including but not limited to special, incidental, consequential, or other
`damages.
`
`For general information on our other products and services or for technical support, please contact
`our Customer Care Department within the United States at (800) 762-2974, outside the United
`States at (317) 572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
`may not be available in electronic formats. For more information about Wiley products, visit our
`web site at www.wiley.com.
`
`Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
`Spanias, Andreas.
`Audio signal processing and coding/by Andreas Spanias, Ted Painter, Venkatraman Atti.
`p. cm.
`"Wiley-Interscience publication."
`Includes bibliographical references and index.
`ISBN: 978-0-471-79147-8
`1. Coding theory. 2. Signal processing--Digital techniques. 3. Sound--Recording and
`reproducing--Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III.
`Title.
`
`TK5102.92.S73 2006
`621.382'8—dc22
`
`Printed in the United States of America.
`
`10 9 8 7 6 5 4 3 2 1
`
`2006040507
`
`Page 2
`
`
`PREFACE
`
`Audio processing and recording has been part of telecommunication and entertainment
`systems for more than a century. Moreover, bandwidth issues associated
`with audio recording, transmission, and storage occupied engineers from the very
`early stages in this field. A series of important technological developments paved
`the way from early phonographs to magnetic tape recording, and lately compact
`disk (CD), and super storage devices. In the following, we capture some of the
`main events and milestones that mark the history in audio recording and storage.¹
`Prototypes of phonographs appeared around 1877, and the first attempt to market
`cylinder-based gramophones was by the Columbia Phonograph Co. in 1889.
`Five years later, Marconi demonstrated the first radio transmission that marked
`the beginning of audio broadcasting. The Victor Talking Machine Company, with
`the little Nipper dog as its trademark, was formed in 1901. The “telegraphone”, a
`magnetic recorder for voice that used steel wire, was patented in Denmark around
`the end of the nineteenth century. The Odeon and His Master's Voice (HMV)
`label produced and marketed music recordings in the early nineteen hundreds.
`The cabinet phonograph with a horn called “Victrola” appeared at about the same
`time. Diamond disk players were marketed in 1913 followed by efforts to produce
`sound-on-film for motion pictures. Other milestones include the first commercial
`transmission in Pittsburgh and the emergence of public address amplifiers. Electrically
`recorded material appeared in the 1920s and the first sound-on-film was
`demonstrated in the mid 1920s by Warner Brothers. Cinema applications in the
`1930s promoted advances in loudspeaker technologies leading to the development
`of woofer, tweeter, and crossover network concepts. Juke boxes for music
`also appeared in the 1930s. Magnetic tape recording was demonstrated in Germany
`in the 1930s by BASF and AEG/Telefunken. The Ampex tape recorders
`appeared in the US in the late 1940s. The demonstration of stereo high-fidelity
`(Hi-Fi) sound in the late 1940s spurred the development of amplifiers, speakers,
`and reel-to-reel tape recorders for home use in the 1950s both in Europe and
`Page 3
`
`[Figure: Apple iPod®. (Courtesy of Apple Computer, Inc.) Apple iPod® is a registered trademark
`of Apple Computer, Inc.]
`
`the US. Meanwhile, Columbia produced the 33-rpm long play (LP) vinyl record,
`while its rival RCA Victor produced the compact 45-rpm format whose sales
`took off with the emergence of rock and roll music. Technological developments
`in the mid 1950s resulted in the emergence of compact transistor-based radios
`and soon after small tape players. In 1963, Philips introduced the compact cassette
`tape format with its EL3300 series portable players (marketed in the US as
`Norelco) which became an instant success with accessories for home, portable,
`and car use. Eight track cassettes became popular in the late 1960s mainly for car
`use. The Dolby system for compact cassette noise reduction was also a landmark
`in the audio signal processing field. Meanwhile, FM broadcasting, which had
`been invented earlier, took off in the 1960s and 1970s with stereo transmissions.
`Helical tape-head technologies invented in Japan in the 1960s provided high
`bandwidth recording capabilities which enabled video tape recorders for home
`use in the 1970s (e.g., VHS and Beta formats). This technology was also used
`in the 1980s for audio PCM stereo recording. Laser compact disk technology
`was introduced in 1982 and by the late 1980s became the preferred format for
`Hi-Fi stereo recording. Analog compact cassette players, high-quality reel-to-reel
`recorders, expensive turntables, and virtually all analog recording devices started
`fading away by the late 1980s. The launch of the digital CD audio format in
`
`
`
`
`
`Page 4
`
`the 1980s coincided with the advent of personal computers, and took over in
`all aspects of music recording and distribution. CD playback soon dominated
`broadcasting, automobile, home stereo, and analog vinyl LP. The compact cassette
`formats became relics of an old era and eventually disappeared from music
`stores. Digital audio tape (DAT) systems enabled by helical tape head technology
`were also introduced in the 1980s but were commercially unsuccessful because
`of strict copyright laws and unusually large taxes.
`Parallel developments in digital video formats for laser disk technologies
`included work in audio compression systems. Audio compression research papers
`started appearing mostly in the 1980s at IEEE ICASSP and Audio Engineering
`Society conferences by authors from several research and development labs
`including Erlangen-Nuremberg University and Fraunhofer IIS, AT&T Bell Laboratories,
`and Dolby Laboratories. Audio compression or audio coding research,
`the art of representing an audio signal with the least number of information
`bits while maintaining its fidelity, went through quantum leaps in the late 1980s
`and 1990s. Although originally most audio compression algorithms were developed
`as part of the digital motion video compression standards, e.g., the MPEG
`series, these algorithms eventually became important as stand-alone technologies
`for audio recording and playback. Progress in VLSI technologies, psychoacoustics,
`and efficient time-frequency signal representations made possible a series of
`scalable real-time compression algorithms for use in audio and cinema applications.
`In the 1990s, we witnessed the emergence of the first products that used
`compressed audio formats such as the MiniDisc (MD) and the Digital Compact
`Cassette (DCC). The sound and video playing capabilities of the PC and the
`proliferation of multimedia content through the Internet had a profound impact
`on audio compression technologies. The MPEG-1/-2 layer III (MP3) algorithm
`became a de facto standard for Internet music downloads. Specialized websites
`that feature music content changed the ways people buy and share music. Compact
`MP3 players appeared in the late 1990s. In the early 2000s, we had the
`emergence of the Apple iPod® player with a hard drive that supports MP3 and
`MPEG advanced audio coding (AAC) algorithms.
`In order to enhance cinematic and home theater listening experiences and
`deliver greater realism than ever before, audio codec designers pursued sophisticated
`multichannel audio coding techniques. In the mid 1990s, techniques for
`encoding 5.1 separate channels of audio were standardized in MPEG-2 BC and
`later MPEG-2 AAC audio. Proprietary multichannel algorithms were also developed
`and commercialized by Dolby Laboratories (AC-3), Digital Theater System
`(DTS), Lucent (EPAC), Sony (SDDS), and Microsoft (WMA). Dolby Labs, DTS,
`Lexicon, and other companies also introduced 2:N channel upmix algorithms
`capable of synthesizing multichannel surround presentation from conventional
`stereo content (e.g., Dolby ProLogic II, DTS Neo6). The human auditory system
`is capable of localizing sound with greater spatial resolution than current multichannel
`audio systems offer, and as a result the quest continues to achieve the
`ultimate spatial fidelity in sound reproduction. Research involving spatial audio,
`real-time acoustic source localization, binaural cue coding, and application of
`
`Page 5
`
`head-related transfer functions (HRTF) towards rendering immersive audio has
`gained interest. Audiophiles appeared skeptical of the 44.1-kHz 16-bit CD
`stereo format and some were critical of the sound quality of compression formats.
`These ideas along with the need for copyright protection eventually gained
`momentum and new standards and formats appeared in the early 2000s. In particular,
`multichannel lossless coding such as the DVD-Audio (DVD-A) and the
`Super-Audio-CD (SACD) appeared. The standardization of these storage formats
`provided the audio codec designers with enormous storage capacity. This
`motivated lossless coding of digital audio.
`The purpose of this book is to provide an in-depth treatment of audio compression
`algorithms and standards. The topic is currently occupying several communities
`in signal processing, multimedia, and audio engineering. The intended
`readership for this book includes at least three groups. At the highest level, any
`reader with a general scientific background will be able to gain an appreciation for
`the heuristics of perceptual coding. Secondly, readers with a general electrical and
`computer engineering background will become familiar with the essential signal
`processing techniques and perceptual models embedded in most audio coders.
`Finally, undergraduate and graduate students with focuses in multimedia, DSP,
`and computer music will gain important knowledge in signal analysis and audio
`coding algorithms. The vast body of literature provided and the tutorial aspects
`of the book make it an asset for audiophiles as well.
`
`Organization
`This book is in part the outcome of many years of research and teaching at Arizona
`State University. We opted to include exercises and computer problems and
`hence enable instructors to either use the content in existing DSP and multimedia
`courses, or to promote the creation of new courses with focus in audio and speech
`processing and coding. The book has twelve chapters and each chapter contains
`problems, proofs, and computer exercises. Chapter 1 introduces the readers to
`the field of audio signal processing and coding. In Chapter 2, we review the
`basic signal processing theory and emphasize concepts relevant to audio coding.
`Chapter 3 describes waveform quantization and entropy coding schemes.
`Chapter 4 covers linear predictive coding and its utility in speech and audio coding.
`Chapter 5 covers psychoacoustics and Chapter 6 explores filter bank design.
`Chapter 7 describes transform coding methodologies. Subband and sinusoidal
`coding algorithms are addressed in Chapters 8 and 9, respectively. Chapter 10
`reviews several audio coding standards including the ISO/IEC MPEG family, the
`cinematic Sony SDDS, the Dolby AC-3, and the DTS-coherent acoustics (DTS-CA).
`Chapter 11 focuses on lossless audio coding and digital audio watermarking
`techniques. Chapter 12 provides information on subjective quality measures.
`
`Use in Courses
`For an undergraduate elective course with little or no background in DSP, the
`instructor can cover in detail Chapters 1, 2, 3, 4, and 5, then present select
`
`
`
`
`Page 6
`
`sections of Chapter 6, and describe in an expository and qualitative manner
`certain basic algorithms and standards from Chapters 7-11. A graduate class in
`audio coding with students that have background in DSP can start from Chapter 5
`and cover in detail Chapters 6 through 11. Audio coding practitioners and
`researchers that are interested mostly in qualitative descriptions of the standards
`and information on bibliography can start at Chapter 5 and proceed reading
`through Chapter 11.
`
`Trademarks and Copyrights
`
`Sony Dynamic Digital Sound, SDDS, ATRAC, and MiniDisc are trademarks of
`Sony Corporation. Dolby, Dolby Digital, AC-2, AC-3, DolbyFAX, Dolby ProLogic
`are trademarks of Dolby Laboratories. The perceptual audio coder (PAC),
`EPAC, and MPAC are trademarks of AT&T and Lucent Technologies. The
`APT-x100 is trademark of Audio Processing Technology Inc. The DTS-CA is
`trademark of Digital Theater Systems Inc. Apple iPod® is a registered trademark
`of Apple Computer, Inc.
`
`Acknowledgments
`
`The authors have all spent time at Arizona State University (ASU) and Prof.
`Spanias is in fact still teaching and directing research in this area at ASU. The
`group of authors has worked on grants with Intel Corporation and would like to
`thank this organization for providing grants in scalable speech and audio coding
`that created opportunities for in-depth studies in these areas. Special thanks to
`our colleagues in Intel Corporation at that time including Brian Mears, Gopal
`Nair, Hedayat Daie, Mark Walker, Michael Deisher, and Tom Gardos. We also
`wish to acknowledge the support of current Intel colleagues Gang Liang, Mike
`Rosenzweig, and Jim Zhou, as well as Scott Peirce for proof reading some of the
`material. Thanks also to former doctoral students at ASU including Philip Loizou
`and Sassan Ahmadi for many useful discussions in speech and audio processing.
`We appreciate also discussions on narrowband vocoders with Bruce Fette in the
`late 1990s then with Motorola GEG and now with General Dynamics.
`The authors also acknowledge the National Science Foundation (NSF) CCLI
`for grants in education that supported in part the preparation of several computer
`examples and paradigms in psychoacoustics and signal coding. Also some of
`the early work in coding of Dr. Spanias was supported by the Naval Research
`Laboratories (NRL) and we would like to thank that organization for providing
`ideas for projects that inspired future work in this area. We also wish to thank
`ASU and some of the faculty and administrators that provided moral and material
`support for work in this area. Thanks are extended to current ASU students
`Shibani Misra, Visar Berisha, and Mahesh Banavar for proofreading some of
`the material. We thank the Wiley Interscience production team George Telecki,
`Melissa Yanuzzi, and Rachel Witmer for their diligent efforts in copyediting,
`cover design, and typesetting. We also thank all the anonymous reviewers for
`
`Page 7
`
`
`
`their useful comments. Finally, we all wish to express our thanks to our families
`for their support.
`The book content is used frequently in ASU online courses and industry short
`courses offered by Andreas Spanias. Contact Andreas Spanias (spanias@asu.edu /
`http://www.fulton.asu.edu/~spanias/) for details.
`
`
`
`¹ Resources used for obtaining important dates in recording history include websites at the University
`of San Diego, Arizona State University, and Wikipedia.
`
`Page 8
`
`CHAPTER 1
`
`INTRODUCTION
`
`Audio coding or audio compression algorithms are used to obtain compact digital
`representations of high-fidelity (wideband) audio signals for the purpose of
`efficient transmission or storage. The central objective in audio coding is to represent
`the signal with a minimum number of bits while achieving transparent
`signal reproduction, i.e., generating output audio that cannot be distinguished
`from the original input, even by a sensitive listener ("golden ears"). This text
`gives an in-depth treatment of algorithms and standards for transparent coding
`of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
`The introduction of the compact disc (CD) in the early 1980s brought to the
`fore all of the advantages of digital audio representation, including true high-fidelity,
`dynamic range, and robustness. These advantages, however, came at
`the expense of high data rates. Conventional CD and digital audio tape (DAT)
`systems are typically sampled at either 44.1 or 48 kHz using pulse code modulation
`(PCM) with a 16-bit sample resolution. This results in uncompressed
`data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
`stereo-pair. Although these data rates were accommodated successfully in first-generation
`CD and DAT players, second-generation audio players and wirelessly
`connected systems are often subject to bandwidth constraints that are incompatible
`with high data rates. Because of the success enjoyed by the first-generation
`
`Page 23
`
`systems, however, end users have come to expect "CD-quality" audio reproduction
`from any digital system. Therefore, new network and wireless multimedia
`digital audio systems must reduce data rates without compromising reproduction
`quality. Motivated by the need for compression algorithms that can satisfy
`simultaneously the conflicting demands of high compression ratios and transparent
`quality for high-fidelity audio signals, several coding methodologies have
`been established over the last two decades. Audio compression schemes, in general,
`employ design techniques that exploit both perceptual irrelevancies and
`statistical redundancies.
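The uncompressed PCM data rates quoted in this section follow directly from the sampling rate, the sample resolution, and the channel count. The short Python sketch below reproduces that arithmetic; the function name `pcm_bit_rate` is a hypothetical helper introduced here for illustration and does not come from the text.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD (44.1 kHz) and DAT (48 kHz) sampling rates with 16-bit samples.
for fs in (44_100, 48_000):
    mono = pcm_bit_rate(fs, 16)
    stereo = pcm_bit_rate(fs, 16, channels=2)
    print(f"{fs / 1000:g} kHz: mono {mono / 1e3:.1f} kb/s, stereo {stereo / 1e6:.2f} Mb/s")
```

The printed values should match the 705.6/768 kb/s monaural and 1.41/1.54 Mb/s stereo figures given above.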
`PCM was the primary audio encoding scheme employed until the early 1980s.
`PCM does not provide any mechanisms for redundancy removal. Quantization
`methods that exploit the signal correlation, such as differential PCM (DPCM),
`delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM) were applied
`to audio compression later (e.g., PC audio cards). Owing to the need for drastic
`reduction in bit rates, researchers began to pursue new approaches for audio
`coding based on the principles of psychoacoustics [Zwic90] [Moor03]. Psychoacoustic
`notions in conjunction with the basic properties of signal quantization
`have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
`entropy is a quantitative estimate of the fundamental limit of transparent audio
`signal compression. Another key contribution to the field was the characterization
`of the auditory filter bank and particularly the time-frequency analysis capabilities
`of the inner ear [Moor83]. Over the years, several filter-bank structures that
`mimic the critical band structure of the auditory filter bank have been proposed.
`A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
`which, when used in conjunction with a perceptual model, can play an important
`role in the identification of perceptual irrelevancies.
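As a rough illustration of how a correlation-exploiting quantizer such as DPCM differs from plain PCM, the sketch below codes the difference between each sample and a prediction instead of the sample itself. Everything specific here is an assumption made for illustration, not something the text specifies: the predictor is simply the previous reconstructed sample, the quantizer is uniform with a fixed `step`, and the function names are hypothetical.

```python
def dpcm_encode(samples, step):
    """First-order DPCM: quantize the error between each sample and the
    previous reconstructed sample (assumed predictor, for illustration)."""
    codes = []
    prediction = 0
    for x in samples:
        code = round((x - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step              # mirror the decoder's state
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized errors."""
    out = []
    prediction = 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


samples = [0, 3, 7, 12, 14, 13, 9, 4]   # slowly varying, hence correlated
step = 2
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print(codes)
print(decoded)
```

Because the encoder tracks the decoder's reconstruction, the quantization error does not accumulate: each decoded sample stays within half a step of the input. For a correlated signal the codes also span a smaller range than the samples, which is where the bit savings would come from.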
`During the early 1990s, several workgroups and organizations such as
`the International Organization for Standardization/International Electro-technical
`Commission (ISO/IEC), the International Telecommunications Union (ITU),
`AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
`Philips, and Sony were actively involved in developing perceptual audio coding
`algorithms and standards. Some of the popular commercial standards published
`in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent
`Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC),
`Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive
`Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
`the prominent audio coding standards. The commercial success enjoyed by
`these audio coding standards triggered the launch of several multimedia storage
`formats.
`Table 1.2 lists some of the popular multimedia storage formats since the begin-
`ning of the CD era. High-performance stereo systems became quite common with
`the advent of CDs in the early 1980s. A compact-disc—read only memory (CD-
`ROM) can store data up to 700-800 MB in digital form as "microscopic-pits"
`that can be read by a laser beam off of a reflective surface or a medium. Three
`competing storage media — DAT, the digital compact cassette (DCC), and the
`
`Page 24
`
`Table 1.1. List of perceptual and lossless audio coding standards/algorithms.
`
`Standard/algorithm                                    Related references
`1. ISO/IEC MPEG-1 audio                               [ISOI92]
`2. Philips' PASC (for DCC applications)               [Lokh92]
`3. AT&T/Lucent PAC/EPAC                               [John96c] [Sinh96]
`4. Dolby AC-2                                         [Davi92] [Fiel91]
`5. AC-3/Dolby Digital                                 [Davis93] [Fiel96]
`6. ISO/IEC MPEG-2 (BC/LSF) audio                      [ISOI94a]
`7. Sony's ATRAC (MiniDisc and SDDS)                   [Yosh94] [Tsut96]
`8. SHORTEN                                            [Robi94]
`9. Audio processing technology - APT-x100             [Wyli96b]
`10. ISO/IEC MPEG-2 AAC                                [ISOI96]
`11. DTS coherent acoustics                            [Smyt96] [Smyt99]
`12. The DVD Algorithm                                 [Crav96] [Crav97]
`13. MUSICompress                                      [Wege97]
`14. Lossless transform coding of audio (LTAC)         [Pura97]
`15. AudioPaK                                          [Hans98b] [Hans01]
`16. ISO/IEC MPEG-4 audio version 1                    [ISOI99]
`17. Meridian lossless packing (MLP)                   [Gerz99]
`18. ISO/IEC MPEG-4 audio version 2                    [ISOI00]
`19. Audio coding based on integer transforms          [Geig01] [Geig02]
`20. Direct-stream digital (DSD) technology            [Reef01a] [Jans03]
`
`Table 1.2. Some of the popular audio storage formats.
`
`Audio storage format                  Related references
`1. Compact disc                       [CD82] [IECA87]
`2. Digital audio tape (DAT)           [Watk88] [Tan89]
`3. Digital compact cassette (DCC)     [Lokh91] [Lokh92]
`4. MiniDisc                           [Yosh94] [Tsut96]
`5. Digital versatile disc (DVD)       [DVD96]
`6. DVD-audio (DVD-A)                  [DVD01]
`7. Super audio CD (SACD)              [SACD02]
`
`MiniDisc (MD) - entered the commercial market during 1987-1992. Intended
`mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
`source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed
`a storage medium called the MiniDisc, primarily for audio storage. MD
`employs the ATRAC algorithm for compression. In 1991, Philips introduced the
`DCC, a successor of the analog compact cassette. Philips DCC employs a compression
`scheme called the PASC [Lokh91] [Lokh92] [Hoog92]. The DCC began
`
`Page 25
`
`as a potential competitor for DATs but was discontinued in 1996. The introduction
`of the digital versatile disc (DVD) in 1996 enabled both video and audio
`recording/storage as well as text-message programming. The DVD became one
`of the most successful storage media. With the improvements in the audio compression
`and DVD storage technologies, multichannel surround sound encoding
`formats gained interest [Bosi93] [Holm99] [Bosi00].
`With the emergence of streaming audio applications during the late
`1990s, researchers pursued techniques such as combined speech and audio
`architectures, as well as joint source-channel coding algorithms that are optimized
`for the packet-switched Internet. The advent of the ISO/IEC MPEG-4 standard
`(1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality
`coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
`than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
`algorithms with provisions for scalable, object-based speech and audio coding at
`bit rates from as low as 200 b/s up to 64 kb/s per channel.
`The emergence of the DVD-audio and the super audio CD (SACD) provided
`designers with additional storage capacity, which motivated research in
`lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
`is able to reconstruct perfectly a bit-for-bit representation of the original
`input audio. In contrast, a coding scheme incapable of perfect reconstruction is
`called lossy. For most audio program material, lossy schemes offer the advantage
`of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
`schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
`to the network browser at low bit rates is the next grand challenge for codec
`designers.
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
`Over the last few years, researchers have proposed several efficient signal models
`(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
`standards (Table 1.1) for high-quality digital audio reproduction. Most of these
`algorithms are based on the generic architecture shown in Figure 1.1.
`The coders typically segment input signals into quasi-stationary frames ranging
`from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
`and spectral components of each frame. The time-frequency mapping is usually
`matched to the analysis properties of the human auditory system. Either way,
`the ultimate objective is to extract from the input audio a set of time-frequency
`parameters that is amenable to quantization according to a perceptual distortion
`metric. Depending on the overall design objectives, the time-frequency analysis
`section usually contains one of the following:
`
`• Unitary transform
`• Time-invariant bank of critically sampled, uniform/nonuniform bandpass
`filters
`
`Page 26
`
`[Figure 1.1. A generic perceptual audio encoder. Blocks: input audio; time-frequency analysis;
`psychoacoustic analysis; bit allocation; quantization and encoding; entropy (lossless) coding;
`MUX to channel. Labeled signals: parameters, masking thresholds, side information.]
`
`• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform
`bandpass filters
`• Harmonic/sinusoidal analyzer
`• Source-system analysis (LPC and multipulse excitation)
`• Hybrid versions of the above.
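The first two stages described in this section, segmentation into short frames followed by a time-frequency mapping, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular coder: frames are non-overlapping and unwindowed (practical coders typically overlap and window them), and an orthonormal DCT-II stands in for the "unitary transform" option in the list above. All function names are hypothetical.

```python
import math


def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_frames(signal, n):
    """Non-overlapping frames of n samples; a trailing partial frame is dropped."""
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def dct_orthonormal(frame):
    """Orthonormal DCT-II, chosen here as one example of a unitary transform."""
    n = len(frame)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(frame))
        out.append(scale * acc)
    return out


fs = 48_000
n = frame_length(fs, 5)                  # 240 samples for a 5-ms frame
# 100 ms of a 1-kHz test tone (an arbitrary illustrative input).
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 10)]
frames = split_frames(signal, n)
coeffs = dct_orthonormal(frames[0])

# A unitary transform preserves energy, so the two sums should agree
# up to floating-point rounding.
energy_time = sum(x * x for x in frames[0])
energy_freq = sum(c * c for c in coeffs)
print(n, len(frames), math.isclose(energy_time, energy_freq, rel_tol=1e-9))
```

Energy preservation is one reason unitary transforms are convenient in this role: distortion added to the coefficients by quantization carries over to the reconstructed frame with the same energy.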
`
`The choice of time-frequency analysis methodology always involves a fundamental
`tradeoff between time and frequency resolution requirements. Perceptual
`distortion control is achieved by a psychoacoustic signal analysis section
`that estimates signal masking power based on psychoacoustic principles. The
`psychoacoustic model delivers masking thresholds that quantify the maximum
`amount of distortion at each point in the time-frequency plane such that quantization
`of the time-frequency parameters does not introduce audible artifacts.
`The psychoacoustic model therefore allows the quantization section to exploit
`perceptual irrelevancies. This section can also exploit statistical redundancies
`through classical techniques such as DPCM or ADPCM. Once a quantized compact
`parametric set has been formed, the remaining redundancies are typically
`removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
`Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
`[Welc84]. Since the output of the psychoacoustic distortion control model is
`signal-dependent, most algorithms are inherently variable rate. Fixed channel
`rate requirements are usually satisfied through buffer feedback schemes, which
`often introduce encoding delays.
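To give a feel for how much redundancy an entropy coder can remove from a set of quantized parameters, the sketch below compares the empirical first-order entropy of a symbol sequence with the cost of a fixed-length code. The symbol values are invented for illustration and the helper names are hypothetical; the sketch estimates a bound and does not implement Huffman, arithmetic, or LZW coding.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical first-order entropy of the sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fixed_length_bits(symbols):
    """Bits per symbol for a fixed-length code over the distinct values seen."""
    return math.ceil(math.log2(len(set(symbols))))


# Made-up quantizer output: small values dominate, as is common after
# prediction or transform coding.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2

print(entropy_bits_per_symbol(quantized))
print(fixed_length_bits(quantized))
```

Because the symbol statistics depend on the signal, the number of bits an entropy coder produces per frame varies, which is consistent with the variable-rate behavior noted above.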
`
`1.3 AUDIO CODER ATTRIBUTES
`
`Perceptual audio coders are typically evaluated based on the following attributes:
`audio reproduction quality, operating bit rates, computational complexity, codec
`delay, and channel error robustness. The objective is to attain a high-quality
`(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
`
`Page 27
`
`algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
`10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`Audio quality is of paramount importance when designing an audio coding
`algorithm. Successful strides have been made since the development of simple
`near-transparent perceptual coders. Typically, classical objective measures of
`signal fidelity such as the signal to noise ratio (SNR) and the total harmonic
`distortion (THD) are inadequate [Ryde96]. As the field of perceptual audio coding
`matured rapidly and created greater demand for listening tests, there was a
`corresponding growth of interest in perceptual measurement schemes. Several
`subjective and objective quality measures have been proposed and standardized
`during the last decade. Some of these schemes include the noise-to-mask
`ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
`1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the perceptual
`objective measure (POM, 1995) [Colo95], and the objective audio signal
`evaluation (OASE, 1997) [Spor97]. We will address these and several other quality
`assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`From a codec designer's point of view, one of the key challenges is to represent
`high-fidelity audio with a minimum number of bits. For instance, if a
`5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
`using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
`bit rates imply high compression ratios and generally low reproduction quality.
`Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3
`(32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
`employ high bit rates for obtaining transparent audio reproduction. However, the
`development of several sophisticated audio coding tools (e.g., MPEG-4 audio
`tools) created ways for efficient transmission or storage of audio at rates between
`8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality
`at low rates along with the ability to scale both rate and quality to match
`different requirements such as time-varying channel capacity.
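The worked example in this subsection (80 bits for a 5-ms frame at 48 kHz) can be checked with a few lines of Python, which also derive two quantities the text implies but does not state: the average bits per sample and the compression ratio. The ratio assumes a 16-bit monaural PCM reference, which is an assumption made here for illustration; the function name is hypothetical.

```python
def frame_bit_rate(bits_per_frame, frame_ms):
    """Bit rate in bits per second for a fixed bit budget per frame."""
    return bits_per_frame * 1000 / frame_ms


fs = 48_000            # sampling rate in Hz
frame_ms = 5           # frame duration in milliseconds
bits_per_frame = 80    # bit budget from the example in the text

samples_per_frame = fs * frame_ms // 1000
rate = frame_bit_rate(bits_per_frame, frame_ms)
bits_per_sample = bits_per_frame / samples_per_frame
# Reference: uncompressed 16-bit mono PCM at the same sampling rate (assumed).
compression_ratio = fs * 16 / rate

print(samples_per_frame, rate, round(bits_per_sample, 3), compression_ratio)
```

At one third of a bit per sample on average, such a coder sits well inside the "less than 1 bit per sample" regime associated with lossy schemes.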
`
`1.3.3 Complexity
`Reduced computational complexity not only enables real-time implementation
`but may also decrease the power consumption and extend battery life. Computational
`complexity is usually measured in terms of millions of instructions
`per second (MIPS). Complexity estimates are processor-dependent. For example,
`the complexity associated with Dolby's AC-3 decoder was estimated at approximately
`27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95];
`for the Motorola DSP56002 processor, the complexity was estimated at 45
`MIPS [Vern95]. Usually, most of the audio codecs rely on the so-called asymmetric
`encoding principle. This means that the codec complexity is not evenly
`
`Page 28
