`
`Audio Signal
`Processing and Coding
`
`Andreas Spanias, Ted Painter, and Venkatraman Atti
`HULU LLC
`Exhibit 109
`IPR2018-01090
`
`Page 1
`
`HULU LLC
`Exhibit 1008
`IPR2018-01170
`
`Page 1
`
`

`

`Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
`
`Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
`Published simultaneously in Canada.
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
`form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
`except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
`either the prior written permission of the Publisher, or authorization through payment of the
`appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
`MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
`to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
`Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
`http://www.wiley.com/go/permission.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
`efforts in preparing this book, they make no representations or warranties with respect to the
`accuracy or completeness of the contents of this book and specifically disclaim any implied
`warranties of merchantability or fitness for a particular purpose. No warranty may be created or
`extended by sales representatives or written sales materials. The advice and strategies contained
`herein may not be suitable for your situation. You should consult with a professional where
`appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
`commercial damages, including but not limited to special, incidental, consequential, or other
`damages.
`
`For general information on our other products and services or for technical support, please contact
`our Customer Care Department within the United States at (800) 762-2974, outside the United
`States at (317) 572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
`may not be available in electronic formats. For more information about Wiley products, visit our
`web site at www.wiley.com.
`
`Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
`Spanias, Andreas.
`Audio signal processing and coding/by Andreas Spanias, Ted Painter, Venkatraman Atti.
`p. cm.
`"Wiley-Interscience publication."
`Includes bibliographical references and index.
`ISBN: 978-0-471-79147-8
`1. Coding theory. 2. Signal processing—Digital techniques. 3. Sound—Recording and
`reproducing—Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III.
`Title.
`
`TK5102.92.S73 2006
`621.382'8—dc22
`
`Printed in the United States of America.
`
`10 9 8 7 6 5 4 3 2 1
`
`2006040507
`
`PREFACE
`Audio processing and recording has been part of telecommunication and entertainment
`systems for more than a century. Moreover, bandwidth issues associated
`with audio recording, transmission, and storage occupied engineers from the very
`early stages in this field. A set of important technological developments paved
`the way from early phonographs to magnetic tape recording, and lately compact
`disk (CD), and super storage devices. In the following, we capture some of the
`main events and milestones that mark the history in audio recording and storage.¹
`Prototypes of phonographs appeared around 1877, and the first attempt to market
`cylinder-based gramophones was by the Columbia Phonograph Co. in 1889.
`Five years later, Marconi demonstrated the first radio transmission that marked
`the beginning of audio broadcasting. The Victor Talking Machine Company, with
`the little nipper dog as its trademark, was formed in 1901. The "telegraphone", a
`magnetic recorder for voice that used steel wire, was patented in Denmark around
`the end of the nineteenth century. The Odeon and His Masters Voice (HMV)
`label produced and marketed music recordings in the early nineteen hundreds.
`The cabinet phonograph with a horn called "Victrola" appeared at about the same
`time. Diamond disk players were marketed in 1913 followed by efforts to produce
`sound-on-film for motion pictures. Other milestones include the first commercial
`transmission in Pittsburgh and the emergence of public address amplifiers. Electrically
`recorded material appeared in the 1920s and the first sound-on-film was
`demonstrated in the mid 1920s by Warner Brothers. Cinema applications in the
`1930s promoted advances in loudspeaker technologies leading to the development
`of woofer, tweeter, and crossover network concepts. Juke boxes for music
`also appeared in the 1930s. Magnetic tape recording was demonstrated in Germany
`in the 1930s by BASF and AEG/Telefunken. The Ampex tape recorders
`appeared in the US in the late 1940s. The demonstration of stereo high-fidelity
`(Hi-Fi) sound in the late 1940s spurred the development of amplifiers, speakers,
`and reel-to-reel tape recorders for home use in the 1950s both in Europe and
`
`Apple iPod®. (Courtesy of Apple Computer, Inc.) Apple iPod® is a registered trademark
`of Apple Computer, Inc.
`
`the US. Meanwhile, Columbia produced the 33-rpm long play (LP) vinyl record,
`while its rival RCA Victor produced the compact 45-rpm format whose sales
`took off with the emergence of rock and roll music. Technological developments
`in the mid 1950s resulted in the emergence of compact transistor-based radios
`and soon after small tape players. In 1963, Philips introduced the compact cassette
`tape format with its EL3300 series portable players (marketed in the US as
`Norelco) which became an instant success with accessories for home, portable,
`and car use. Eight track cassettes became popular in the late 1960s mainly for car
`use. The Dolby system for compact cassette noise reduction was also a landmark
`in the audio signal processing field. Meanwhile, FM broadcasting, which had
`been invented earlier, took off in the 1960s and 1970s with stereo transmissions.
`Helical tape-head technologies invented in Japan in the 1960s provided high
`bandwidth recording capabilities which enabled video tape recorders for home
`use in the 1970s (e.g., VHS and Beta formats). This technology was also used
`in the 1980s for audio PCM stereo recording. Laser compact disk technology
`was introduced in 1982 and by the late 1980s became the preferred format for
`Hi-Fi stereo recording. Analog compact cassette players, high-quality reel-to-reel
`recorders, expensive turntables, and virtually all analog recording devices started
`fading away by the late 1980s. The launch of the digital CD audio format in
`the 1980s coincided with the advent of personal computers, and took over in
`all aspects of music recording and distribution. CD playback soon dominated
`broadcasting, automobile, home stereo, and analog vinyl LP. The compact cassette
`formats became relics of an old era and eventually disappeared from music
`stores. Digital audio tape (DAT) systems enabled by helical tape head technology
`were also introduced in the 1980s but were commercially unsuccessful because
`of strict copyright laws and unusually large taxes.
`Parallel developments in digital video formats for laser disk technologies
`included work in audio compression systems. Audio compression research papers
`started appearing mostly in the 1980s at IEEE ICASSP and Audio Engineering
`Society conferences by authors from several research and development labs
`including Erlangen-Nuremberg University and Fraunhofer IIS, AT&T Bell
`Laboratories, and Dolby Laboratories. Audio compression or audio coding research,
`the art of representing an audio signal with the least number of information
`bits while maintaining its fidelity, went through quantum leaps in the late 1980s
`and 1990s. Although originally most audio compression algorithms were developed
`as part of the digital motion video compression standards, e.g., the MPEG
`series, these algorithms eventually became important as stand-alone technologies
`for audio recording and playback. Progress in VLSI technologies, psychoacoustics,
`and efficient time-frequency signal representations made possible a series of
`scalable real-time compression algorithms for use in audio and cinema applications.
`In the 1990s, we witnessed the emergence of the first products that used
`compressed audio formats such as the MiniDisc (MD) and the Digital Compact
`Cassette (DCC). The sound and video playing capabilities of the PC and the
`proliferation of multimedia content through the Internet had a profound impact
`on audio compression technologies. The MPEG-1/-2 layer III (MP3) algorithm
`became a de facto standard for Internet music downloads. Specialized web sites
`that feature music content changed the ways people buy and share music. Compact
`MP3 players appeared in the late 1990s. In the early 2000s, we had the
`emergence of the Apple iPod® player with a hard drive that supports MP3 and
`MPEG advanced audio coding (AAC) algorithms.
`In order to enhance cinematic and home theater listening experiences and
`deliver greater realism than ever before, audio codec designers pursued sophisticated
`multichannel audio coding techniques. In the mid 1990s, techniques for
`encoding 5.1 separate channels of audio were standardized in MPEG-2 BC and
`later MPEG-2 AAC audio. Proprietary multichannel algorithms were also developed
`and commercialized by Dolby Laboratories (AC-3), Digital Theater System
`(DTS), Lucent (EPAC), Sony (SDDS), and Microsoft (WMA). Dolby Labs, DTS,
`Lexicon, and other companies also introduced 2:N channel upmix algorithms
`capable of synthesizing multichannel surround presentation from conventional
`stereo content (e.g., Dolby ProLogic II, DTS Neo6). The human auditory system
`is capable of localizing sound with greater spatial resolution than current multichannel
`audio systems offer, and as a result the quest continues to achieve the
`ultimate spatial fidelity in sound reproduction. Research involving spatial audio,
`real-time acoustic source localization, binaural cue coding, and application of
`head-related transfer functions (HRTF) towards rendering immersive audio has
`gained interest. Audiophiles appeared skeptical with the 44.1-kHz 16-bit CD
`stereo format and some were critical of the sound quality of compression formats.
`These ideas along with the need for copyright protection eventually gained
`momentum and new standards and formats appeared in the early 2000s. In particular,
`multichannel lossless coding such as the DVD-Audio (DVD-A) and the
`Super-Audio-CD (SACD) appeared. The standardization of these storage formats
`provided the audio codec designers with enormous storage capacity. This
`motivated lossless coding of digital audio.
`The purpose of this book is to provide an in-depth treatment of audio compression
`algorithms and standards. The topic is currently occupying several communities
`in signal processing, multimedia, and audio engineering. The intended
`readership for this book includes at least three groups. At the highest level, any
`reader with a general scientific background will be able to gain an appreciation for
`the heuristics of perceptual coding. Secondly, readers with a general electrical and
`computer engineering background will become familiar with the essential signal
`processing techniques and perceptual models embedded in most audio coders.
`Finally, undergraduate and graduate students with focuses in multimedia, DSP,
`audio, and computer music will gain important knowledge in signal analysis and
`coding algorithms. The vast body of literature provided and the tutorial aspects
`of the book make it an asset for audiophiles as well.
`
`Organization
`
`This book is in part the outcome of many years of research and teaching at Arizona
`State University. We opted to include exercises and computer problems and
`hence enable instructors to either use the content in existing DSP and multimedia
`courses, or to promote the creation of new courses with focus in audio and speech
`processing and coding. The book has twelve chapters and each chapter contains
`problems, proofs, and computer exercises. Chapter 1 introduces the readers to
`the field of audio signal processing and coding. In Chapter 2, we review the
`basic signal processing theory and emphasize concepts relevant to audio coding.
`Chapter 3 describes waveform quantization and entropy coding schemes.
`Chapter 4 covers linear predictive coding and its utility in speech and audio coding.
`Chapter 5 covers psychoacoustics and Chapter 6 explores filter bank design.
`Chapter 7 describes transform coding methodologies. Subband and sinusoidal
`coding algorithms are addressed in Chapters 8 and 9, respectively. Chapter 10
`reviews several audio coding standards including the ISO/IEC MPEG family, the
`cinematic Sony SDDS, the Dolby AC-3, and the DTS-coherent acoustics (DTS-CA).
`Chapter 11 focuses on lossless audio coding and digital audio watermarking
`techniques. Chapter 12 provides information on subjective quality measures.
`
`Use in Courses
`
`For an undergraduate elective course with little or no background in DSP, the
`instructor can cover in detail Chapters 1, 2, 3, 4, and 5, then present select
`sections of Chapter 6, and describe in an expository and qualitative manner
`certain basic algorithms and standards from Chapters 7—11. A graduate class in
`audio coding with students that have background in DSP, can start from Chapter 5
`and cover in detail Chapters 6 through Chapter 11. Audio coding practitioners and
`researchers that are interested mostly in qualitative descriptions of the standards
`and information on bibliography can start at Chapter 5 and proceed reading
`through Chapter 11.
`
`Trademarks and Copyrights
`
`Sony Dynamic Digital Sound, SDDS, ATRAC, and MiniDisc are trademarks of
`Sony Corporation. Dolby, Dolby Digital, AC-2, AC-3, DolbyFAX, and Dolby
`ProLogic are trademarks of Dolby Laboratories. The perceptual audio coder (PAC),
`EPAC, and MPAC are trademarks of AT&T and Lucent Technologies. The
`APT-x100 is a trademark of Audio Processing Technology Inc. The DTS-CA is
`a trademark of Digital Theater Systems Inc. Apple iPod® is a registered trademark
`of Apple Computer, Inc.
`
`Acknowledgments
`
`The authors have all spent time at Arizona State University (ASU) and Prof.
`Spanias is in fact still teaching and directing research in this area at ASU. The
`group of authors has worked on grants with Intel Corporation and would like to
`thank this organization for providing grants in scalable speech and audio coding
`that created opportunities for in-depth studies in these areas. Special thanks to
`our colleagues in Intel Corporation at that time including Brian Mears, Gopal
`Nair, Hedayat Daie, Mark Walker, Michael Deisher, and Tom Gardos. We also
`wish to acknowledge the support of current Intel colleagues Gang Liang, Mike
`Rosenzweig, and Jim Zhou, as well as Scott Peirce for proof reading some of the
`material. Thanks also to former doctoral students at ASU including Philip Loizou
`and Sassan Ahmadi for many useful discussions in speech and audio processing.
`We appreciate also discussions on narrowband vocoders with Bruce Fette in the
`late 1990s then with Motorola GEG and now with General Dynamics.
`The authors also acknowledge the National Science Foundation (NSF) CCLI
`for grants in education that supported in part the preparation of several computer
`examples and paradigms in psychoacoustics and signal coding. Also some of
`the early work in coding of Dr. Spanias was supported by the Naval Research
`Laboratories (NRL) and we would like to thank that organization for providing
`ideas for projects that inspired future work in this area. We also wish to thank
`ASU and some of the faculty and administrators that provided moral and material
`support for work in this area. Thanks are extended to current ASU students
`Shibani Misra, Visar Berisha, and Mahesh Banavar for proofreading some of
`the material. We thank the Wiley Interscience production team George Telecki,
`Melissa Yanuzzi, and Rachel Witmer for their diligent efforts in copyediting,
`cover design, and typesetting. We also thank all the anonymous reviewers for
`their useful comments. Finally, we all wish to express our thanks to our families
`for their support.
`The book content is used frequently in ASU online courses and industry short
`courses offered by Andreas Spanias. Contact Andreas Spanias (spanias@asu.edu /
`http://www.fulton.asu.edu/~spanias/) for details.
`
`
`
`¹ Resources used for obtaining important dates in recording history include web sites at the University
`of San Diego, Arizona State University, and Wikipedia.
`CHAPTER 1
`
`INTRODUCTION
`
`Audio coding or audio compression algorithms are used to obtain compact digital
`representations of high-fidelity (wideband) audio signals for the purpose of
`efficient transmission or storage. The central objective in audio coding is to rep-
`resent the signal with a minimum number of bits while achieving transparent
`signal reproduction, i.e., generating output audio that cannot be distinguished
`from the original input, even by a sensitive listener ("golden ears"). This text
`gives an in-depth treatment of algorithms and standards for transparent coding
`of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
`The introduction of the compact disc (CD) in the early 1980s brought to the
`fore all of the advantages of digital audio representation, including true high-
`fidelity, dynamic range, and robustness. These advantages, however, came at
`the expense of high data rates. Conventional CD and digital audio tape (DAT)
`systems are typically sampled at either 44.1 or 48 kHz using pulse code mod-
`ulation (PCM) with a 16-bit sample resolution. This results in uncompressed
`data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
`stereo-pair. Although these data rates were accommodated successfully in first-
`generation CD and DAT players, second-generation audio players and wirelessly
`connected systems are often subject to bandwidth constraints that are incompatible
`with high data rates. Because of the success enjoyed by the first-generation
`systems, however, end users have come to expect "CD-quality" audio reproduc-
`tion from any digital system. Therefore, new network and wireless multimedia
`digital audio systems must reduce data rates without compromising reproduc-
`tion quality. Motivated by the need for compression algorithms that can satisfy
`simultaneously the conflicting demands of high compression ratios and trans-
`parent quality for high-fidelity audio signals, several coding methodologies have
`been established over the last two decades. Audio compression schemes, in general,
`employ design techniques that exploit both perceptual irrelevancies and
`statistical redundancies.
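
The uncompressed data rates quoted above follow directly from the sampling rate, the sample resolution, and the channel count. The short Python sketch below reproduces that arithmetic; the function name `pcm_rate_kbps` is illustrative only, and 1 kb is taken to mean 1000 bits.

```python
def pcm_rate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM data rate in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# CD (44.1 kHz) and DAT (48 kHz) with 16-bit samples, mono and stereo.
for fs in (44100, 48000):
    mono = pcm_rate_kbps(fs)
    stereo = pcm_rate_kbps(fs, channels=2)
    print(f"{fs / 1000:g} kHz: mono {mono:g} kb/s, stereo {stereo / 1000:.2f} Mb/s")
```

Any coder that targets a few tens of kb/s per channel therefore has to remove well over 90% of the bits in the PCM representation.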
`PCM was the primary audio encoding scheme employed until the early 1980s.
`PCM does not provide any mechanisms for redundancy removal. Quantization
`methods that exploit the signal correlation, such as differential PCM (DPCM),
`delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM) were applied
`to audio compression later (e.g., PC audio cards). Owing to the need for drastic
`reduction in bit rates, researchers began to pursue new approaches for audio
`coding based on the principles of psychoacoustics [Zwic90] [Moor03]. Psychoacoustic
`notions in conjunction with the basic properties of signal quantization
`have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
`entropy is a quantitative estimate of the fundamental limit of transparent audio
`signal compression. Another key contribution to the field was the characterization
`of the auditory filter bank and particularly the time-frequency analysis capabilities
`of the inner ear [Moor83]. Over the years, several filter-bank structures that
`mimic the critical band structure of the auditory filter bank have been proposed.
`A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
`which, when used in conjunction with a perceptual model, can play an important
`role in the identification of perceptual irrelevancies.
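
To illustrate how DPCM exploits signal correlation, the following is a minimal sketch of a closed-loop, first-order DPCM codec with a uniform quantizer. It is a simplified illustration and does not reproduce any particular standard: the predictor (the previous reconstructed sample), the step size, and the test tone are all assumptions chosen for the demo.

```python
import math


def dpcm_encode(samples, step):
    """Closed-loop first-order DPCM: quantize the difference between each
    sample and the previous *reconstructed* sample."""
    codes, prediction = [], 0.0
    for x in samples:
        code = round((x - prediction) / step)   # quantized prediction error
        codes.append(code)
        prediction += code * step               # mirror the decoder's state
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


# A slowly varying (highly correlated) test tone with amplitude 1000.
signal = [1000.0 * math.sin(2 * math.pi * 0.01 * n) for n in range(200)]
step = 4.0
codes = dpcm_encode(signal, step)
decoded = dpcm_decode(codes, step)

max_error = max(abs(a - b) for a, b in zip(signal, decoded))
pcm_codes = [round(x / step) for x in signal]   # plain PCM at the same step size
print("largest |PCM code| :", max(abs(c) for c in pcm_codes))
print("largest |DPCM code|:", max(abs(c) for c in codes))
print("max reconstruction error:", max_error)
```

Because neighboring samples are similar, the DPCM codes span a much smaller range than the PCM codes at the same step size, so they need fewer bits. The reconstruction error stays within half a quantizer step.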
`During the early 1990s, several workgroups and organizations such as
`the International Organization for Standardization/International Electro-technical
`Commission (ISO/IEC), the International Telecommunications Union (ITU),
`AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
`Philips, and Sony were actively involved in developing perceptual audio coding
`algorithms and standards. Some of the popular commercial standards published
`in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent
`Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC),
`Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive
`Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
`the prominent audio coding standards. The commercial success enjoyed by
`these audio coding standards triggered the launch of several multimedia storage
`formats.
`Table 1.2 lists some of the popular multimedia storage formats since the begin-
`ning of the CD era. High-performance stereo systems became quite common with
`the advent of CDs in the early 1980s. A compact disc read-only memory (CD-ROM)
`can store data up to 700-800 MB in digital form as "microscopic pits"
`that can be read by a laser beam off of a reflective surface or a medium. Three
`competing storage media — DAT, the digital compact cassette (DCC), and the
`
`Table 1.1. List of perceptual and lossless audio coding standards/algorithms.
`
`Standard/algorithm                                   Related references
`
`1. ISO/IEC MPEG-1 audio                              [ISOI92]
`2. Philips' PASC (for DCC applications)              [Lokh92]
`3. AT&T/Lucent PAC/EPAC                              [John96c] [Sinh96]
`4. Dolby AC-2                                        [Davi92] [Fiel91]
`5. AC-3/Dolby Digital                                [Davis93] [Fiel96]
`6. ISO/IEC MPEG-2 (BC/LSF) audio                     [ISOI94a]
`7. Sony's ATRAC (MiniDisc and SDDS)                  [Yosh94] [Tsut96]
`8. SHORTEN                                           [Robi94]
`9. Audio processing technology - APT-x100            [Wyli96b]
`10. ISO/IEC MPEG-2 AAC                               [ISOI96]
`11. DTS coherent acoustics                           [Smyt96] [Smyt99]
`12. The DVD Algorithm                                [Crav96] [Crav97]
`13. MUSICompress                                     [Wege97]
`14. Lossless transform coding of audio (LTAC)        [Pura97]
`15. AudioPaK                                         [Hans98b] [Hans01]
`16. ISO/IEC MPEG-4 audio version 1                   [ISOI99]
`17. Meridian lossless packing (MLP)                  [Gerz99]
`18. ISO/IEC MPEG-4 audio version 2                   [ISOI00]
`19. Audio coding based on integer transforms         [Geig01] [Geig02]
`20. Direct-stream digital (DSD) technology           [Reef01a] [Jans03]
`
`Table 1.2. Some of the popular audio storage formats.
`
`Audio storage format                                 Related references
`
`1. Compact disc                                      [CD82] [IECA87]
`2. Digital audio tape (DAT)                          [Watk88] [Tan89]
`3. Digital compact cassette (DCC)                    [Lokh91] [Lokh92]
`4. MiniDisc                                          [Yosh94] [Tsut96]
`5. Digital versatile disc (DVD)                      [DVD96]
`6. DVD-audio (DVD-A)                                 [DVD01]
`7. Super audio CD (SACD)                             [SACD02]
`
`MiniDisc (MD) - entered the commercial market during 1987-1992. Intended
`mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
`source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed
`a storage medium called the MiniDisc, primarily for audio storage. MD
`employs the ATRAC algorithm for compression. In 1991, Philips introduced the
`DCC, a successor of the analog compact cassette. Philips DCC employs a compression
`scheme called the PASC [Lokh91] [Lokh92] [Hoog92]. The DCC began
`as a potential competitor for DATs but was discontinued in 1996. The introduction
`of the digital versatile disc (DVD) in 1996 enabled both video and audio
`recording/storage as well as text-message programming. The DVD became one
`of the most successful storage media. With the improvements in the audio compression
`and DVD storage technologies, multichannel surround sound encoding
`formats gained interest [Bosi93] [Holm99] [Bosi00].
`With the emergence of streaming audio applications, during the late
`1990s, researchers pursued techniques such as combined speech and audio
`architectures, as well as joint source-channel coding algorithms that are optimized
`for the packet-switched Internet. The advent of ISO/IEC MPEG-4 standard
`(1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality
`coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
`than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
`algorithms with provisions for scalable, object-based speech and audio coding at
`bit rates from as low as 200 b/s up to 64 kb/s per channel.
`The emergence of the DVD-audio and the super audio CD (SACD) provided
`designers with additional storage capacity, which motivated research in
`lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
`is able to reconstruct perfectly a bit-for-bit representation of the original
`input audio. In contrast, a coding scheme incapable of perfect reconstruction is
`called lossy. For most audio program material, lossy schemes offer the advantage
`of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
`schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
`to the network browser at low bit rates is the next grand challenge for codec
`designers.
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
`Over the last few years, researchers have proposed several efficient signal models
`(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
`standards (Table 1.1) for high-quality digital audio reproduction. Most of these
`algorithms are based on the generic architecture shown in Figure 1.1.
`The coders typically segment input signals into quasi-stationary frames ranging
`from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
`and spectral components of each frame. The time-frequency mapping is usually
`matched to the analysis properties of the human auditory system. Either way,
`the ultimate objective is to extract from the input audio a set of time-frequency
`parameters that is amenable to quantization according to a perceptual distortion
`metric. Depending on the overall design objectives, the time-frequency analysis
`section usually contains one of the following:
`
`• Unitary transform
`• Time-invariant bank of critically sampled, uniform/nonuniform bandpass
`filters
`• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform
`bandpass filters
`• Harmonic/sinusoidal analyzer
`• Source-system analysis (LPC and multipulse excitation)
`• Hybrid versions of the above.
`
`Figure 1.1. A generic perceptual audio encoder. [Block diagram. Blocks: time-frequency
`analysis, psychoacoustic analysis, bit allocation, quantization and encoding, entropy
`(lossless) coding, MUX. Signal labels: input audio, parameters, masking thresholds,
`side information, to channel.]
`
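
As a rough illustration of the first two stages (frame segmentation followed by a unitary transform), the sketch below cuts a test tone into 5-ms frames and applies a DFT scaled by 1/sqrt(N). It is deliberately simplified: practical coders use overlapping windows and transforms such as the MDCT, and the 8-kHz sampling rate, the helper names, and the test tone are assumptions made to keep the demo small.

```python
import cmath
import math


def split_into_frames(samples, sample_rate_hz, frame_ms):
    """Cut a signal into consecutive, non-overlapping frames of frame_ms."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len        # drop the incomplete tail
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def unitary_dft(frame):
    """DFT scaled by 1/sqrt(N), so that the frame energy is preserved."""
    n = len(frame)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            for k in range(n)]


fs = 8000                                        # assumed rate, for a small demo
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 10)]  # 100 ms
frames = split_into_frames(tone, fs, frame_ms=5)
spectrum = unitary_dft(frames[0])

# Locate the strongest bin in the lower half of the spectrum.
peak_bin = max(range(len(spectrum) // 2), key=lambda k: abs(spectrum[k]))
print(len(frames), "frames of", len(frames[0]), "samples")
print("peak at", peak_bin * fs / len(frames[0]), "Hz")
```

The frame length sets the tradeoff directly: a 5-ms frame at this rate yields only 40 frequency bins spaced 200 Hz apart, whereas a longer frame would give finer frequency resolution at the cost of time resolution.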
`
`The choice of time-frequency analysis methodology always involves a fundamental
`tradeoff between time and frequency resolution requirements. Perceptual
`tual distortion control is achieved by a psychoacoustic signal analysis section
`that estimates signal masking power based on psychoacoustic principles. The
`psychoacoustic model delivers masking thresholds that quantify the maximum
`amount of distortion at each point in the time-frequency plane such that quan-
`tization of the time-frequency parameters does not introduce audible artifacts.
`The psychoacoustic model therefore allows the quantization section to exploit
`perceptual irrelevancies. This section can also exploit statistical redundancies
`through classical techniques such as DPCM or ADPCM. Once a quantized com-
`pact parametric set has been formed, the remaining redundancies are typically
`removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
`Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
`[Welc84]. Since the output of the psychoacoustic distortion control model is
`signal-dependent, most algorithms are inherently variable rate. Fixed channel
`rate requirements are usually satisfied through buffer feedback schemes, which
`often introduce encoding delays.
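
The entropy coding stage can be illustrated with a small Huffman coder. The sketch below is a textbook construction using Python's `heapq`, not the codebook design of any specific standard; the list `quantized` stands in for a hypothetical set of quantized parameters in which small values dominate.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)        # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantizer output: zeros are far more frequent than large values.
quantized = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 2, 3]
code = huffman_code(quantized)
bitstream = "".join(code[s] for s in quantized)
print(code)
print(len(bitstream), "bits versus", 3 * len(quantized), "bits at a fixed 3 bits/symbol")
```

Because the number of bits produced depends on the symbol statistics of each frame, the output rate varies from frame to frame, which is one reason fixed-rate channels need the buffer feedback mentioned above.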
`
`1.3 AUDIO CODER ATTRIBUTES
`
`Perceptual audio coders are typically evaluated based on the following attributes:
`audio reproduction quality, operating bit rates, computational complexity, codec
`delay, and channel error robustness. The objective is to attain a high-quality
`(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
`
`algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
`10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`Audio quality is of paramount importance when designing an audio coding
`algorithm. Successful strides have been made since the development of simple
`near-transparent perceptual coders. Typically, classical objective measures of
`signal fidelity such as the signal to noise ratio (SNR) and the total harmonic
`distortion (THD) are inadequate [Ryde96]. As the field of perceptual audio coding
`matured rapidly and created greater demand for listening tests, there was a
`corresponding growth of interest in perceptual measurement schemes. Several
`subjective and objective quality measures have been proposed and standardized
`during the last decade. Some of these schemes include the noise-to-mask
`ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
`1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the perceptual
`objective measure (POM, 1995) [Colo95], and the objective audio signal
`evaluation (OASE, 1997) [Spor97]. We will address these and several other quality
`assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`From a codec designer's point of view, one of the key challenges is to represent
`high-fidelity audio with a minimum number of bits. For instance, if a
`5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
`using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
`bit rates imply high compression ratios and generally low reproduction quality.
`Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3
`(32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
`employ high bit rates for obtaining transparent audio reproduction. However, the
`development of several sophisticated audio coding tools (e.g., MPEG-4 audio
`tools) created ways for efficient transmission or storage of audio at rates between
`8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality
`at low rates along with the ability to scale both rate and quality to match
`different requirements such as time-varying channel capacity.
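
The frame-based bit-rate example above can be checked with a few lines of Python. The helper names are illustrative, and the compression ratio is computed against 16-bit monaural PCM at the same sampling rate, which is an assumption borrowed from the PCM figures in Section 1.1 rather than something stated in this paragraph.

```python
def frame_bit_rate_kbps(bits_per_frame, frame_ms):
    """Bit rate in kb/s: bits per millisecond equals kilobits per second."""
    return bits_per_frame / frame_ms


def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame of the given duration."""
    return int(round(sample_rate_hz * frame_ms / 1000.0))


fs, frame_ms, bits = 48000, 5, 80
n = samples_per_frame(fs, frame_ms)
rate = frame_bit_rate_kbps(bits, frame_ms)
pcm_rate = fs * 16 / 1000.0                      # assumed 16-bit mono PCM reference
print(n, "samples/frame,", rate, "kb/s,",
      f"{bits / n:.3f} bits/sample, compression ratio {pcm_rate / rate:g}:1")
```

At this operating point the coder spends on average only a third of a bit per sample, which is why such rates are attainable only with lossy, perceptually driven quantization.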
`
`1.3.3 Complexity
`Reduced computational complexity not only enables real-time implementation
`but may also decrease the power consumption and extend battery life. Com-
`putational complexity is usually measured in terms of millions of instructions
`per second (MIPS). Complexity estimates are processor-dependent. For example,
`the complexity a
