`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK
5102.92.S73 2007, Copy 1. The attached pages (Cover Page, Title Page, Copyright Page, Table of Contents Pages, Chapter 1, and Chapter 10) are a true and complete representation from that work.
`
`THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
`Cataloging-in-Publication stamp dated March 5, 2007.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`Deirdre Scott
`Business Enterprises Officer
`Office of Business Enterprises
`Library of Congress
`
101 Independence Avenue, SE, Washington, DC 20540-4917; Tel 202.707.5650; www.loc.gov; duplicationservices@loc.gov

HULU LLC
Exhibit 1009
IPR2018-01187
Page 1
`
`Comcast - Exhibit 1009, page 1
`
`
`
`Andreas Spanias, Ted Painter, and Venkatraman Atti
`
`
`
`
`AUDIO SIGNAL
`PROCESSING
`AND CODING
`
Andreas Spanias
`Ted Painter
`Venkatraman Atti
`
[Wiley Bicentennial logo: 1807 WILEY 2007]
`
`WILEY-INTERSCIENCE
`A John Wiley & Sons, Inc., Publication
`
`
`
`
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
`efforts in preparing this book, they make no representations or warranties with respect to the
`accuracy or completeness of the contents of this book and specifically disclaim any implied
`warranties of merchantability or fitness for a particular purpose. No warranty may be created or
`extended by sales representatives or written sales materials. The advice and strategies contained
`herein may not be suitable for your situation. You should consult with a professional where
`appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
`damages.
`
`For general information on our other products and services or for technical support, please contact
`our Customer Care Department within the United States at (800) 762-2974, outside the United
`States at (317) 572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our
`web site at www.wiley.com.
Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
Spanias, Andreas.
Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
p. cm.
"Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN: 978-0-471-79147-8
1. Coding theory. 2. Signal processing--Digital techniques. 3. Sound--Recording and reproducing--Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III. Title.

TK5102.92.S73 2006
621.382'8--dc22

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
`
`2006040507
`
`
`
`
CONTENTS

PREFACE  xv

1 INTRODUCTION  1
  1.1 Historical Perspective  1
  1.2 A General Perceptual Audio Coding Architecture  4
  1.3 Audio Coder Attributes  5
    1.3.1 Audio Quality  6
    1.3.2 Bit Rates  6
    1.3.3 Complexity  6
    1.3.4 Codec Delay  7
    1.3.5 Error Robustness  7
  1.4 Types of Audio Coders - An Overview  7
  1.5 Organization of the Book  8
  1.6 Notational Conventions  9
  Problems  11
  Computer Exercises  11

2 SIGNAL PROCESSING ESSENTIALS  13
  2.1 Introduction  13
  2.2 Spectra of Analog Signals  13
  2.3 Review of Convolution and Filtering  16
  2.4 Uniform Sampling  17
  2.5 Discrete-Time Signal Processing  20
`
`
`
    2.5.1 Transforms for Discrete-Time Signals  20
    2.5.2 The Discrete and the Fast Fourier Transform  22
    2.5.3 The Discrete Cosine Transform  23
    2.5.4 The Short-Time Fourier Transform  23
  2.6 Difference Equations and Digital Filters  25
  2.7 The Transfer and the Frequency Response Functions  27
    2.7.1 Poles, Zeros, and Frequency Response  29
    2.7.2 Examples of Digital Filters for Audio Applications  30
  2.8 Review of Multirate Signal Processing  33
    2.8.1 Down-sampling by an Integer  33
    2.8.2 Up-sampling by an Integer  35
    2.8.3 Sampling Rate Changes by Noninteger Factors  36
    2.8.4 Quadrature Mirror Filter Banks  36
  2.9 Discrete-Time Random Signals  39
    2.9.1 Random Signals Processed by LTI Digital Filters  42
    2.9.2 Autocorrelation Estimation from Finite-Length Data  44
  2.10 Summary  44
  Problems  45
  Computer Exercises  47

3 QUANTIZATION AND ENTROPY CODING  51
  3.1 Introduction  51
    3.1.1 The Quantization-Bit Allocation-Entropy Coding Module  52
  3.2 Density Functions and Quantization  53
  3.3 Scalar Quantization  54
    3.3.1 Uniform Quantization  54
    3.3.2 Nonuniform Quantization  57
    3.3.3 Differential PCM  59
  3.4 Vector Quantization  62
    3.4.1 Structured VQ  64
    3.4.2 Split-VQ  67
    3.4.3 Conjugate-Structure VQ  69
  3.5 Bit-Allocation Algorithms  70
  3.6 Entropy Coding  74
    3.6.1 Huffman Coding  77
    3.6.2 Rice Coding  81
    3.6.3 Golomb Coding  82
`
`
`
    3.6.4 Arithmetic Coding  83
  3.7 Summary  85
  Problems
  Computer Exercises

4 LINEAR PREDICTION IN NARROWBAND AND WIDEBAND CODING  91
  4.1 Introduction  91
  4.2 LP-Based Source-System Modeling for Speech  92
  4.3 Short-Term Linear Prediction  94
    4.3.1 Long-Term Prediction  95
    4.3.2 ADPCM Using Linear Prediction  96
  4.4 Open-Loop Analysis-Synthesis Linear Prediction  96
  4.5 Analysis-by-Synthesis Linear Prediction  97
    4.5.1 Code-Excited Linear Prediction Algorithms  100
  4.6 Linear Prediction in Wideband Coding  102
    4.6.1 Wideband Speech Coding  102
    4.6.2 Wideband Audio Coding  104
  4.7 Summary  106
  Problems  107
  Computer Exercises  108

5 PSYCHOACOUSTIC PRINCIPLES  113
  5.1 Introduction  113
  5.2 Absolute Threshold of Hearing  114
  5.3 Critical Bands  115
  5.4 Simultaneous Masking, Masking Asymmetry, and the Spread of Masking  120
    5.4.1 Noise-Masking-Tone  123
    5.4.2 Tone-Masking-Noise  124
    5.4.3 Noise-Masking-Noise  124
    5.4.4 Asymmetry of Masking  124
    5.4.5 The Spread of Masking  125
  5.5 Nonsimultaneous Masking  127
  5.6 Perceptual Entropy  128
  5.7 Example Codec Perceptual Model: ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model 1  130
    5.7.1 Step 1: Spectral Analysis and SPL Normalization  131
`
`
`
    5.7.2 Step 2: Identification of Tonal and Noise Maskers  131
    5.7.3 Step 3: Decimation and Reorganization of Maskers  135
    5.7.4 Step 4: Calculation of Individual Masking Thresholds  136
    5.7.5 Step 5: Calculation of Global Masking Thresholds  138
  5.8 Perceptual Bit Allocation  138
  5.9 Summary  140
  Problems  140
  Computer Exercises  141

6 TIME-FREQUENCY ANALYSIS: FILTER BANKS AND TRANSFORMS  145
  6.1 Introduction  145
  6.2 Analysis-Synthesis Framework for M-band Filter Banks  146
  6.3 Filter Banks for Audio Coding: Design Considerations  148
    6.3.1 The Role of Time-Frequency Resolution in Masking Power Estimation  149
    6.3.2 The Role of Frequency Resolution in Perceptual Bit Allocation  149
    6.3.3 The Role of Time Resolution in Perceptual Bit Allocation  150
  6.4 Quadrature Mirror and Conjugate Quadrature Filters  155
  6.5 Tree-Structured QMF and CQF M-band Banks  156
  6.6 Cosine Modulated "Pseudo QMF" M-band Banks  160
  6.7 Cosine Modulated Perfect Reconstruction (PR) M-band Banks and the Modified Discrete Cosine Transform (MDCT)  163
    6.7.1 Forward and Inverse MDCT  165
    6.7.2 MDCT Window Design  165
    6.7.3 Example MDCT Windows (Prototype FIR Filters)  167
  6.8 Discrete Fourier and Discrete Cosine Transform  178
  6.9 Pre-echo Distortion  180
  6.10 Pre-echo Control Strategies  182
    6.10.1 Bit Reservoir  182
    6.10.2 Window Switching  182
    6.10.3 Hybrid, Switched Filter Banks  184
    6.10.4 Gain Modification  185
    6.10.5 Temporal Noise Shaping
  6.11 Summary  188
  Problems
  Computer Exercises  191
`
`
`
7 TRANSFORM CODERS  195
  7.1 Introduction  195
  7.2 Optimum Coding in the Frequency Domain  196
  7.3 Perceptual Transform Coder  197
    7.3.1 PXFM  198
    7.3.2 SEPXFM  199
  7.4 Brandenburg-Johnston Hybrid Coder  200
  7.5 CNET Coders  201
    7.5.1 CNET DFT Coder  201
    7.5.2 CNET MDCT Coder 1  201
    7.5.3 CNET MDCT Coder 2  202
  7.6 Adaptive Spectral Entropy Coding  203
  7.7 Differential Perceptual Audio Coder  204
  7.8 DFT Noise Substitution  205
  7.9 DCT with Vector Quantization  206
  7.10 MDCT with Vector Quantization  207
  7.11 Summary  208
  Problems  208
  Computer Exercises  210

8 SUBBAND CODERS  211
  8.1 Introduction  211
    8.1.1 Subband Algorithms  212
  8.2 DWT and Discrete Wavelet Packet Transform (DWPT)  214
  8.3 Adapted WP Algorithms  218
    8.3.1 DWPT Coder with Globally Adapted Daubechies Analysis Wavelet  218
    8.3.2 Scalable DWPT Coder with Adaptive Tree Structure  220
    8.3.3 DWPT Coder with Globally Adapted General Analysis Wavelet  223
    8.3.4 DWPT Coder with Adaptive Tree Structure and Locally Adapted Analysis Wavelet  223
    8.3.5 DWPT Coder with Perceptually Optimized Synthesis Wavelets  224
  8.4 Adapted Nonuniform Filter Banks  226
    8.4.1 Switched Nonuniform Filter Bank Cascade  226
    8.4.2 Frequency-Varying Modulated Lapped Transforms  227
  8.5 Hybrid WP and Adapted WP/Sinusoidal Algorithms  227
`
`
`
    8.5.1 Hybrid Sinusoidal/Classical DWPT Coder  228
    8.5.2 Hybrid Sinusoidal/M-band DWPT Coder  229
    8.5.3 Hybrid Sinusoidal/DWPT Coder with WP Tree Structure Adaptation (ARCO)  230
  8.6 Subband Coding with Hybrid Filter Bank/CELP Algorithms  233
    8.6.1 Hybrid Subband/CELP Algorithm for Low-Delay Applications  234
    8.6.2 Hybrid Subband/CELP Algorithm for Low-Complexity Applications  235
  8.7 Subband Coding with IIR Filter Banks  237
  Problems  237
  Computer Exercise  240

9 SINUSOIDAL CODERS  241
  9.1 Introduction  241
  9.2 The Sinusoidal Model  242
    9.2.1 Sinusoidal Analysis and Parameter Tracking  242
    9.2.2 Sinusoidal Synthesis and Parameter Interpolation  245
  9.3 Analysis/Synthesis Audio Codec (ASAC)  247
    9.3.1 ASAC Segmentation  248
    9.3.2 ASAC Sinusoidal Analysis-by-Synthesis  248
    9.3.3 ASAC Bit Allocation, Quantization, Encoding, and Scalability  248
  9.4 Harmonic and Individual Lines Plus Noise Coder (HILN)  249
    9.4.1 HILN Sinusoidal Analysis-by-Synthesis  250
    9.4.2 HILN Bit Allocation, Quantization, Encoding, and Decoding  251
  9.5 FM Synthesis  251
    9.5.1 Principles of FM Synthesis  252
    9.5.2 Perceptual Audio Coding Using an FM Synthesis Model  252
  9.6 The Sines + Transients + Noise (STN) Model  254
  9.7 Hybrid Sinusoidal Coders  255
    9.7.1 Hybrid Sinusoidal-MDCT Algorithm  256
    9.7.2 Hybrid Sinusoidal-Vocoder Algorithm  257
  9.8 Summary  258
  Problems  258
  Computer Exercises  259
`
`
`
10 AUDIO CODING STANDARDS AND ALGORITHMS  263
  10.1 Introduction  263
  10.2 MIDI Versus Digital Audio  264
    10.2.1 MIDI Synthesizer  264
    10.2.2 General MIDI (GM)  266
    10.2.3 MIDI Applications  266
  10.3 Multichannel Surround Sound  267
    10.3.1 The Evolution of Surround Sound  267
    10.3.2 The Mono, the Stereo, and the Surround Sound Formats  268
    10.3.3 The ITU-R BS.775 5.1-Channel Configuration  268
  10.4 MPEG Audio Standards  270
    10.4.1 MPEG-1 Audio (ISO/IEC 11172-3)  275
    10.4.2 MPEG-2 BC/LSF (ISO/IEC-13818-3)  279
    10.4.3 MPEG-2 NBC/AAC (ISO/IEC-13818-7)  283
    10.4.4 MPEG-4 Audio (ISO/IEC 14496-3)  289
    10.4.5 MPEG-7 Audio (ISO/IEC 15938-4)  309
    10.4.6 MPEG-21 Framework (ISO/IEC-21000)  317
    10.4.7 MPEG Surround and Spatial Audio Coding  319
  10.5 Adaptive Transform Acoustic Coding (ATRAC)  319
  10.6 Lucent Technologies PAC, EPAC, and MPAC  321
    10.6.1 Perceptual Audio Coder (PAC)  321
    10.6.2 Enhanced PAC (EPAC)  323
    10.6.3 Multichannel PAC (MPAC)  323
  10.7 Dolby Audio Coding Standards  325
    10.7.1 Dolby AC-2, AC-2A  325
    10.7.2 Dolby AC-3/Dolby Digital/Dolby SR-D  327
  10.8 Audio Processing Technology APT-x100  335
  10.9 DTS - Coherent Acoustics  338
    10.9.1 Framing and Subband Analysis  338
    10.9.2 Psychoacoustic Analysis  339
    10.9.3 ADPCM - Differential Subband Coding  339
    10.9.4 Bit Allocation, Quantization, and Multiplexing  341
    10.9.5 DTS-CA Versus Dolby Digital  342
  Problems  342
  Computer Exercise  342

11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING  343
  11.1 Introduction  343
`
`
`
  11.2 Lossless Audio Coding (L2AC)  344
    11.2.1 L2AC Principles  345
    11.2.2 L2AC Algorithms  346
  11.3 DVD-Audio  356
    11.3.1 Meridian Lossless Packing (MLP)  358
  11.4 Super-Audio CD (SACD)  358
    11.4.1 SACD Storage Format  362
    11.4.2 Sigma-Delta Modulators (SDM)  362
    11.4.3 Direct Stream Digital (DSD) Encoding  364
  11.5 Digital Audio Watermarking  368
    11.5.1 Background  370
    11.5.2 A Generic Architecture for DAW  374
    11.5.3 DAW Schemes - Attributes  377
  11.6 Summary of Commercial Applications  378
  Problems  382
  Computer Exercise  382

12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING  383
  12.1 Introduction  383
  12.2 Subjective Quality Measures  384
  12.3 Confounding Factors in Subjective Evaluations  386
  12.4 Subjective Evaluations of Two-Channel Standardized Coders  387
  12.5 Subjective Evaluations of 5.1-Channel Standardized Coders  388
  12.6 Subjective Evaluations Using Perceptual Measurement Systems  389
    12.6.1 CIR Perceptual Measurement Schemes  390
    12.6.2 NSE Perceptual Measurement Schemes  390
  12.7 Algorithms for Perceptual Measurement  391
    12.7.1 Example: Perceptual Audio Quality Measure (PAQM)  392
    12.7.2 Example: Noise-to-Mask Ratio (NMR)  396
    12.7.3 Example: Objective Audio Signal Evaluation (OASE)  399
  12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement  401
  12.9 Research Directions for Perceptual Codec Quality Measures  402

REFERENCES  405

INDEX  459
`
`
`
`CHAPTER 1
`
`INTRODUCTION
`
Audio coding or audio compression algorithms are used to obtain compact digital representations of high-fidelity (wideband) audio signals for the purpose of efficient transmission or storage. The central objective in audio coding is to represent the signal with a minimum number of bits while achieving transparent signal reproduction, i.e., generating output audio that cannot be distinguished from the original input, even by a sensitive listener ("golden ears"). This text gives an in-depth treatment of algorithms and standards for transparent coding of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
The introduction of the compact disc (CD) in the early 1980s brought to the fore all of the advantages of digital audio representation, including true high fidelity, dynamic range, and robustness. These advantages, however, came at the expense of high data rates. Conventional CD and digital audio tape (DAT) systems are typically sampled at either 44.1 or 48 kHz using pulse code modulation (PCM) with a 16-bit sample resolution. This results in uncompressed data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a stereo pair. Although these data rates were accommodated successfully in first-generation CD and DAT players, second-generation audio players and wirelessly connected systems are often subject to bandwidth constraints that are incompatible with high data rates. Because of the success enjoyed by the first-generation systems, however, end users have come to expect "CD-quality" audio reproduction from any digital system. Therefore, new network and wireless multimedia digital audio systems must reduce data rates without compromising reproduction quality. Motivated by the need for compression algorithms that can simultaneously satisfy the conflicting demands of high compression ratios and transparent quality for high-fidelity audio signals, several coding methodologies have been established over the last two decades. Audio compression schemes, in general, employ design techniques that exploit both perceptual irrelevancies and statistical redundancies.

Audio Signal Processing and Coding, by Andreas Spanias, Ted Painter, and Venkatraman Atti
Copyright © 2007 by John Wiley & Sons, Inc.
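The uncompressed rates quoted above follow directly from the product of sampling rate, bits per sample, and number of channels. The short Python sketch below reproduces them. The helper name `pcm_bit_rate` is ours and is used for illustration only.

```python
def pcm_bit_rate(fs_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in bits per second: fs * bits per sample * channels."""
    return fs_hz * bits_per_sample * channels


for fs in (44_100, 48_000):
    mono = pcm_bit_rate(fs, 16)
    stereo = pcm_bit_rate(fs, 16, channels=2)
    print(f"{fs / 1000:.1f} kHz: mono {mono / 1e3:.1f} kb/s, stereo {stereo / 1e6:.2f} Mb/s")
# 44.1 kHz: mono 705.6 kb/s, stereo 1.41 Mb/s
# 48.0 kHz: mono 768.0 kb/s, stereo 1.54 Mb/s
```

Every compression ratio discussed later in this chapter is measured against these 16-bit PCM figures.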
PCM was the primary audio encoding scheme employed until the early 1980s. PCM does not provide any mechanisms for redundancy removal. Quantization methods that exploit the signal correlation, such as differential PCM (DPCM), delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM), were applied to audio compression later (e.g., in PC audio cards). Owing to the need for drastic reductions in bit rate, researchers began to pursue new approaches for audio coding based on the principles of psychoacoustics [Zwie90] [Moor03]. Psychoacoustic notions in conjunction with the basic properties of signal quantization have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual entropy is a quantitative estimate of the fundamental limit of transparent audio signal compression. Another key contribution to the field was the characterization of the auditory filter bank and particularly the time-frequency analysis capabilities of the inner ear [Moor83]. Over the years, several filter-bank structures that mimic the critical band structure of the auditory filter bank have been proposed. A filter bank is a parallel bank of bandpass filters covering the audio spectrum, which, when used in conjunction with a perceptual model, can play an important role in the identification of perceptual irrelevancies.
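To make the idea behind DPCM concrete, here is a minimal first-order, closed-loop DPCM sketch. It assumes a fixed predictor coefficient `a` and a fixed uniform step, both hypothetical choices; ADPCM would additionally adapt the step and/or the predictor. The encoder predicts each sample from the previously *reconstructed* sample and quantizes only the prediction error. For a correlated signal, that error is much smaller than the sample itself.

```python
import math


def dpcm_encode(x, step, a=1.0):
    """First-order closed-loop DPCM: quantize e[n] = x[n] - a * xr[n-1].

    xr is the decoder's reconstruction, which the encoder tracks so that
    quantization errors do not accumulate. Returns integer codes.
    """
    codes, prev = [], 0.0
    for sample in x:
        pred = a * prev
        q = round((sample - pred) / step)  # index of the quantized prediction error
        codes.append(q)
        prev = pred + q * step              # reconstruction the decoder will form
    return codes


def dpcm_decode(codes, step, a=1.0):
    """Rebuild xr[n] = a * xr[n-1] + q[n] * step from the integer codes."""
    out, prev = [], 0.0
    for q in codes:
        prev = a * prev + q * step
        out.append(prev)
    return out


if __name__ == "__main__":
    step = 0.01
    x = [math.sin(2 * math.pi * n / 64) for n in range(256)]  # slowly varying test tone
    codes = dpcm_encode(x, step)
    xr = dpcm_decode(codes, step)
    pcm_codes = [round(s / step) for s in x]                  # plain PCM with the same step
    print("max |DPCM code|:", max(map(abs, codes)))
    print("max |PCM code|: ", max(map(abs, pcm_codes)))
    print("max error:", max(abs(s - r) for s, r in zip(x, xr)))
```

Both schemes keep the reconstruction error within half a step. The DPCM indices span a much smaller range than the PCM indices, which is exactly the redundancy removal that plain PCM lacks.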
During the early 1990s, several workgroups and organizations such as the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), the International Telecommunication Union (ITU), AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies, Philips, and Sony were actively involved in developing perceptual audio coding algorithms and standards. Some of the popular commercial standards published in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC), Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of the prominent audio coding standards. The commercial success enjoyed by these audio coding standards triggered the launch of several multimedia storage formats.
Table 1.2 lists some of the popular multimedia storage formats since the beginning of the CD era. High-performance stereo systems became quite common with the advent of CDs in the early 1980s. A compact disc read-only memory (CD-ROM) can store up to 700-800 MB of data in digital form as microscopic pits that are read by a laser beam off a reflective surface. Three competing storage media entered the commercial market during 1987-1992: DAT, the digital compact cassette (DCC), and the MiniDisc (MD). Intended mainly for back-up high-density storage (~1.3 GB), the DAT became the primary source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed a storage medium called the MiniDisc, primarily for audio storage. MD employs the ATRAC algorithm for compression. In 1991, Philips introduced the DCC, a successor of the analog compact cassette. Philips DCC employs a compression scheme called PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began as a potential competitor for DAT but was discontinued in 1996. The introduction of the digital versatile disc (DVD) in 1996 enabled both video and audio recording/storage as well as text-message programming. The DVD became one of the most successful storage media. With the improvements in audio compression and DVD storage technologies, multichannel surround sound encoding formats gained interest [Bosi93] [Holm99] [Bosi00].

Table 1.1. List of perceptual and lossless audio coding standards/algorithms.

Standard/algorithm                             Related references
1. ISO/IEC MPEG-1 audio                        [ISOI92]
2. Philips' PASC (for DCC applications)        [Lokh92]
3. AT&T/Lucent PAC/EPAC                        [John96c] [Sinh96]
4. Dolby AC-2                                  [Davi92] [Fiel91]
5. AC-3/Dolby Digital                          [Davis93] [Fiel96]
6. ISO/IEC MPEG-2 (BC/LSF) audio               [ISOI94a]
7. Sony's ATRAC (MiniDisc and SDDS)            [Yosh94] [Tsut96]
8. SHORTEN                                     [Robi94]
9. Audio processing technology - APT-x100      [Wyli96b]
10. ISO/IEC MPEG-2 AAC                         [ISOI96]
11. DTS coherent acoustics                     [Smyt96] [Smyt99]
12. The DVD Algorithm                          [Crav96] [Crav97]
13. MUSICompress                               [Wege97]
14. Lossless transform coding of audio (LTAC)  [Pura97]
15. AudioPaK                                   [Hans98b] [Hans01]
16. ISO/IEC MPEG-4 audio version 1             [ISOI99]
17. Meridian lossless packing (MLP)            [Gerz99]
18. ISO/IEC MPEG-4 audio version 2             [ISOI00]
19. Audio coding based on integer transforms   [Geig01] [Geig02]
20. Direct-stream digital (DSD) technology     [Reef01a] [Jans03]

Table 1.2. Some of the popular audio storage formats.

Audio storage format               Related references
1. Compact disc                    [CD82] [IECA87]
2. Digital audio tape (DAT)        [Watk88] [Tan89]
3. Digital compact cassette (DCC)  [Lokh91] [Lokh92]
4. MiniDisc                        [Yosh94] [Tsut96]
5. Digital versatile disc (DVD)    [DVD96]
6. DVD-audio (DVD-A)               [DVD01]
7. Super audio CD (SACD)           [SACD02]
With the emergence of streaming audio applications during the late 1990s, researchers pursued techniques such as combined speech and audio architectures, as well as joint source-channel coding algorithms that are optimized for the packet-switched Internet. The advent of the ISO/IEC MPEG-4 standard (1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality coding of audio at low bit rates. MPEG-4 audio encompasses more functionality than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of algorithms with provisions for scalable, object-based speech and audio coding at bit rates from as low as 200 b/s up to 64 kb/s per channel.
The emergence of the DVD-audio and the super audio CD (SACD) provided designers with additional storage capacity, which motivated research in lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system is able to reconstruct perfectly a bit-for-bit representation of the original input audio. In contrast, a coding scheme incapable of perfect reconstruction is called lossy. For most audio program material, lossy schemes offer the advantage of lower bit rates (e.g., less than 1 bit per sample) relative to lossless schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content to the network browser at low bit rates is the next grand challenge for codec designers.
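The bit-for-bit requirement means that every operation must be exactly invertible. A minimal way to see this is a fixed first-order integer predictor applied to integer PCM samples. Because only integer arithmetic is involved, the residual can be inverted exactly, and it is usually smaller in magnitude than the samples, so it needs fewer bits after entropy coding. This is a simplified sketch in the spirit of predictive lossless coders. It is not the algorithm of any coder listed in Table 1.1, and the function names are hypothetical.

```python
import math


def lossless_encode(samples):
    """Residuals of a fixed first-order integer predictor: e[n] = x[n] - x[n-1]."""
    prev, residuals = 0, []
    for x in samples:
        residuals.append(x - prev)
        prev = x
    return residuals


def lossless_decode(residuals):
    """Invert the predictor exactly: x[n] = e[n] + x[n-1] (integer arithmetic only)."""
    prev, samples = 0, []
    for e in residuals:
        prev += e
        samples.append(prev)
    return samples


if __name__ == "__main__":
    pcm = [round(8000 * math.sin(2 * math.pi * n / 100)) for n in range(400)]
    res = lossless_encode(pcm)
    print("bit-exact:", lossless_decode(res) == pcm)
    print("peak |sample|:", max(map(abs, pcm)), " peak |residual|:", max(map(abs, res)))
```

A lossy coder would instead quantize the residual, as in the DPCM idea discussed earlier, and give up exact reconstruction in exchange for far fewer bits.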
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
Over the last few years, researchers have proposed several efficient signal models (e.g., transform-based, subband-filter structures, wavelet-packet) and compression standards (Table 1.1) for high-quality digital audio reproduction. Most of these algorithms are based on the generic architecture shown in Figure 1.1.
The coders typically segment input signals into quasi-stationary frames ranging from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal and spectral components of each frame. The time-frequency mapping is usually, though not always, matched to the analysis properties of the human auditory system. Either way, the ultimate objective is to extract from the input audio a set of time-frequency parameters that is amenable to quantization according to a perceptual distortion metric. Depending on the overall design objectives, the time-frequency analysis section usually contains one of the following:
`
• Unitary transform
• Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass filters
• Harmonic/sinusoidal analyzer
• Source-system analysis (LPC and multipulse excitation)
• Hybrid versions of the above.

[Figure 1.1. A generic perceptual audio encoder. Blocks: time-frequency analysis, psychoacoustic analysis, bit allocation, quantization and encoding, entropy (lossless) coding, and MUX. Signal labels: input audio, parameters, masking thresholds, side information, and "to channel".]
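The first step of the encoder, segmenting the input into quasi-stationary frames of 2 to 50 ms, can be sketched as follows. This is a simplified illustration. The function name and the zero-padding of the last frame are our assumptions, and the windowing and overlap that most real coders apply before the time-frequency analysis are deliberately omitted.

```python
def split_into_frames(x, fs_hz, frame_ms, hop_ms=None):
    """Cut signal x into frames of frame_ms milliseconds.

    hop_ms defaults to frame_ms (no overlap); the final partial frame is
    zero-padded. Windowing and overlap-add, used by most real coders, are
    deliberately left out of this sketch.
    """
    frame_len = round(fs_hz * frame_ms / 1000)
    hop = frame_len if hop_ms is None else round(fs_hz * hop_ms / 1000)
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    frames = []
    for start in range(0, len(x), hop):
        frame = list(x[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))   # zero-pad the tail
        frames.append(frame)
    return frames


one_second = [0.0] * 48_000
for ms in (2, 20, 50):
    frames = split_into_frames(one_second, 48_000, ms)
    print(f"{ms:>2} ms -> {len(frames[0])} samples/frame, {len(frames)} frames")
#  2 ms -> 96 samples/frame, 500 frames
# 20 ms -> 960 samples/frame, 50 frames
# 50 ms -> 2400 samples/frame, 20 frames
```

Each frame would then be passed to one of the time-frequency analysis options listed above.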
`
The choice of time-frequency analysis methodology always involves a fundamental tradeoff between time and frequency resolution requirements. Perceptual distortion control is achieved by a psychoacoustic signal analysis section that estimates signal masking power based on psychoacoustic principles. The psychoacoustic model delivers masking thresholds that quantify the maximum amount of distortion at each point in the time-frequency plane such that quantization of the time-frequency parameters does not introduce audible artifacts. The psychoacoustic model therefore allows the quantization section to exploit perceptual irrelevancies. This section can also exploit statistical redundancies through classical techniques such as DPCM or ADPCM. Once a quantized compact parametric set has been formed, the remaining redundancies are typically removed through noiseless run-length and entropy coding techniques, e.g., Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77] [Welc84]. Since the output of the psychoacoustic distortion control model is signal-dependent, most algorithms are inherently variable rate. Fixed channel rate requirements are usually satisfied through buffer feedback schemes, which often introduce encoding delays.
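The link between a masking threshold, the quantizer, and the entropy coder can be sketched in a few lines. This is a minimal sketch under two stated assumptions. First, a single masking threshold is given as an allowed noise power for one band. Second, we use the high-resolution approximation that a uniform quantizer with step Δ adds noise power Δ²/12, so Δ = sqrt(12 · threshold). The quantized indices are then Huffman coded. The function names, coefficient values, and threshold below are hypothetical, and this is not the quantizer or Huffman table of any particular standard.

```python
import heapq
import math
from collections import Counter


def step_from_threshold(noise_power: float) -> float:
    """Uniform-quantizer step whose noise power, step**2 / 12, equals the allowed
    distortion (high-resolution approximation, assumed here)."""
    return math.sqrt(12.0 * noise_power)


def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest reconstruction level."""
    return [round(c / step) for c in coeffs]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} built by repeatedly merging the two
    least frequent nodes (classic Huffman construction)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                      # a lone symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tie-breaker, {symbol: depth in the subtree}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    coeffs = [0.31, -0.02, 0.05, 0.0, -0.12, 0.01, 0.0, 0.02, -0.01, 0.0]  # hypothetical
    step = step_from_threshold(1e-3)       # hypothetical masking threshold (noise power)
    q = quantize(coeffs, step)
    lengths = huffman_code_lengths(q)
    print("indices:", q)
    print("Huffman bits:", sum(lengths[s] for s in q))
```

A higher masking threshold allows a coarser step. A coarser step produces more zero and near-zero indices, which the entropy coder then represents with short codewords. Because the step depends on the signal, so does the bit count, which is why such coders are inherently variable rate.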
`
`1.3 AUDIO CODER ATTRIBUTES
`
Perceptual audio coders are typically evaluated based on the following attributes: audio reproduction quality, operating bit rates, computational complexity, codec delay, and channel error robustness. The objective is to attain a high-quality (transparent) audio output at low bit rates (<32 kb/s), with an acceptable algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to 10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`
Audio quality is of paramount importance when designing an audio coding algorithm. Successful strides have been made since the development of simple near-transparent perceptual coders. Typically, classical objective measures of signal fidelity such as the signal-to-noise ratio (SNR) and the total harmonic distortion (THD) are inadequate for assessing perceptual coders [Ryde96]. As the field of perceptual audio coding matured rapidly and created greater demand for listening tests, there was a corresponding growth of interest in perceptual measurement schemes. Several subjective and objective quality measures have been proposed and standardized during the last decade. Some of these schemes include the noise-to-mask ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM, 1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the perceptual objective measure (POM, 1995) [Colo95], and the objective audio signal evaluation (OASE, 1997) [Spor97]. We will address these and several other quality assessment schemes in detail in Chapter 12.
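For reference, the classical SNR mentioned above can be computed as follows. This is a minimal sketch with a function name of our choosing. The formula weights every error sample equally, whether or not the psychoacoustic model would consider it masked. That is one reason SNR is a poor predictor of how a perceptual coder actually sounds.

```python
import math


def snr_db(reference, decoded):
    """Classical SNR in dB: 10*log10(sum x^2 / sum (x - y)^2).

    Every error sample is weighted equally, regardless of masking.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


print(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))  # about 20 dB
```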
`
`1.3.2 Bit Rates
`
From a codec designer's point of view, one of the key challenges is to represent high-fidelity audio with a minimum number of bits. For instance, if a 5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low bit rates imply high compression ratios and generally low reproduction quality. Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3 (32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s) employ high bit rates for obtaining transparent audio reproduction. However, the development of several sophisticated audio coding tools (e.g., MPEG-4 audio tools) created ways for efficient transmission or storage of audio at rates between 8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality at low rates along with the ability to scale both rate and quality to match different requirements such as time-varying channel capacity.
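The arithmetic in the 5-ms frame example can be checked directly. The sketch below also derives two related figures: bits per sample, and the compression ratio relative to 16-bit monaural PCM at 48 kHz (768 kb/s). The helper name is hypothetical.

```python
def frame_bit_rate(bits_per_frame: int, frame_samples: int, fs_hz: int) -> float:
    """Bit rate (b/s) = bits per frame / frame duration = bits * fs / samples."""
    return bits_per_frame * fs_hz / frame_samples


rate = frame_bit_rate(80, 240, 48_000)          # 5-ms frame at 48 kHz
bits_per_sample = 80 / 240
ratio = (48_000 * 16) / rate                    # versus 16-bit mono PCM (768 kb/s)
print(f"{rate / 1e3:.1f} kb/s, {bits_per_sample:.3f} bits/sample, {ratio:.0f}:1")
# 16.0 kb/s, 0.333 bits/sample, 48:1
```

At one third of a bit per sample, this example falls well inside the "less than 1 bit per sample" regime that distinguishes lossy coders from lossless ones.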
`
`1.3.3 Complexity
`
Reduced computational complexity not only enables real-time implementation but may also decrease the power consumption and extend battery life. Computational complexity is usually measured in terms of millions of instructions per second (MIPS). Complexity estimates are processor-dependent. For example, the complexity associated with Dolby's AC-3 decoder was estimated at approximately 27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95]; for the Motorola DSP56002 processor, the complexity was estimated at 45 MIPS [Vern95]. Most audio coders rely on the so-called asymmetric encoding principle. This means that the codec complexity is not evenly shared between the encoder and the decoder (typically, encoder 80% and decoder 20% complexity), with more emphasis on reducing the decoder complexity.
`
`1.3.4 Codec Delay
Many of the network applications for high-fidelity audio (streaming audio, audio-on-demand) are delay tolerant (up to 100-200 ms), providing the opportunity to exploit long-term signal properties in order to achieve high coding gain. However, in two-way real-time communication and voice over Internet protocol (VoIP) applications, low-delay encoding (10-20 ms) is important. Consider the example described before, i.e., an audio coder operating on frames of 5 ms at a 48 kHz sampling frequency. In an ideal encoding scenario, the minimum amount of delay should be 5 ms at the encoder and 5 ms at the decoder (same as the frame length). However, other factors such as the analysis-synthesis filter bank window, the look-ahead, the bit reservoir, and the channel delay contribute additional delay. Employing shorter analysis-synthesis windows, avoiding look-ahead, and restructuring the bit-reservoir functions could result in low-delay encoding, albeit with reduced coding efficiency.
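The delay reasoning above can be written as a simple additive budget. This is a simplified model that we assume for illustration; real codecs combine these contributions in codec-specific ways. In the model, the encoder buffers one frame plus any look-ahead, and the decoder adds one frame. A single `extra_ms` term (hypothetical) stands in for filter-bank, bit-reservoir, and channel contributions.

```python
def one_way_delay_ms(frame_samples: int, fs_hz: int,
                     lookahead_samples: int = 0, extra_ms: float = 0.0) -> float:
    """Simplified delay budget (assumed model): encoder buffers one frame plus
    look-ahead, decoder adds one frame, extra_ms covers everything else."""
    frame_ms = 1000.0 * frame_samples / fs_hz
    lookahead_ms = 1000.0 * lookahead_samples / fs_hz
    return (frame_ms + lookahead_ms) + frame_ms + extra_ms


print(one_way_delay_ms(240, 48_000))                         # ideal case from the text: 10.0
print(one_way_delay_ms(240, 48_000, lookahead_samples=96))   # hypothetical 2-ms look-ahead: 12.0
```

Removing the look-ahead or shortening the frame lowers the total delay directly. The price is the reduced coding efficiency noted above.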
`
`1.3.5 Error Robustness
`
The increasing popularity of streaming audio over packet-switched and wireless networks such as the Internet implies that any algorithm intended for such applications must be able to deal with a noisy time-varying channel. In particular, provisions for error robustness and error protection must be incorporated at the encoder in order to achieve reliable transmission of digital audio over error-prone channels. One simple idea could be to provide better protection to the error-sensitive and priority (important) bits. For instance, the audio frame
`header requires the maximum error rob