`
`Office of Business Enterprises
`Duplication Services Section
`
THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK
5102.92.S73 2007, Copy 1. The attached - Cover Page, Title Page, Copyright Page, Table of
Contents Pages, Chapter 1 and Chapter 10 - are a true and complete representation from that
work.
`
THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
`Cataloging-in-Publication stamp dated March 5, 2007.
`
IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
May 18, 2018.
`
`
`Deirdre Scott
`Business Enterprises Officer
`Office of Business Enterprises
`
`Library of Congress
`
101 Independence Avenue, SE Washington, DC 20540-4917 Tel 202.707.5650 www.loc.gov; duplicationservices@loc.gov

NETFLIX, INC
Exhibit 1009
IPR2018-01630
`
`
`
`
`
`
`AUDIO SIGNAL
`PROCESSING
`AND CODING
`
`
`
`Andreas Spanias
`Ted Painter
`Venkatraman Atti
`
`WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
`
`
`
`
`
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
`
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.
`
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives or written sales materials. The advice and strategies contained
herein may not be suitable for your situation. You should consult with a professional where
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
damages.
`
For general information on our other products and services or for technical support, please contact
our Customer Care Department within the United States at (800) 762-2974, outside the United
States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our
web site at www.wiley.com.
`
Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
Spanias, Andreas.
Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
p. cm.
"Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN: 978-0-471-79147-8
1. Coding theory. 2. Signal processing-Digital techniques. 3. Sound-Recording and
reproducing-Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III.
Title.

TK5102.92.S73 2006
621.382'8-dc22
`
`2000040507
`
`Printed in the United States of America.
`
10 9 8 7 6 5 4 3 2
`
`
`
`
`
`
`
`
CONTENTS

PREFACE

1 INTRODUCTION
1.1 Historical Perspective
1.2 A General Perceptual Audio Coding Architecture
1.3 Audio Coder Attributes
1.3.1 Audio Quality
1.3.2 Bit Rates
1.3.3 Complexity
1.3.4 Codec Delay
1.3.5 Error Robustness
1.4 Types of Audio Coders - An Overview
1.5 Organization of the Book
1.6 Notational Conventions
Problems
Computer Exercises
2 SIGNAL PROCESSING ESSENTIALS
2.1 Introduction
2.2 Spectra of Analog Signals
2.3 Review of Convolution and Filtering
2.4 Uniform Sampling
2.5 Discrete-Time Signal Processing
2.5.1 Transforms for Discrete-Time Signals
2.5.2 The Discrete and the Fast Fourier Transform
2.5.3 The Discrete Cosine Transform
2.5.4 The Short-Time Fourier Transform
2.6 Difference Equations and Digital Filters
2.7 The Transfer and the Frequency Response Functions
2.7.1 Poles, Zeros, and Frequency Response
2.7.2 Examples of Digital Filters for Audio Applications
2.8 Review of Multirate Signal Processing
2.8.1 Down-sampling by an Integer
2.8.2 Up-sampling by an Integer
2.8.3 Sampling Rate Changes by Noninteger Factors
2.8.4 Quadrature Mirror Filter Banks
2.9 Discrete-Time Random Signals
2.9.1 Random Signals Processed by LTI Digital Filters
2.9.2 Autocorrelation Estimation from Finite-Length Data
2.10 Summary
Problems
Computer Exercises
`
3 QUANTIZATION AND ENTROPY CODING
3.1 Introduction
3.1.1 The Quantization-Bit Allocation-Entropy Coding Module
3.2 Density Functions and Quantization
3.3 Scalar Quantization
3.3.1 Uniform Quantization
3.3.2 Nonuniform Quantization
3.3.3 Differential PCM
3.4 Vector Quantization
3.4.1 Structured VQ
3.4.2 Split-VQ
3.4.3 Conjugate-Structure VQ
3.5 Bit-Allocation Algorithms
3.6 Entropy Coding
3.6.1 Huffman Coding
3.6.2 Rice Coding
3.6.3 Golomb Coding
3.6.4 Arithmetic Coding
3.7 Summary
Problems
Computer Exercises
`
4 LINEAR PREDICTION IN NARROWBAND AND WIDEBAND CODING
4.1 Introduction
4.2 LP-Based Source-System Modeling for Speech
4.3 Short-Term Linear Prediction
4.3.1 Long-Term Prediction
4.3.2 ADPCM Using Linear Prediction
4.4 Open-Loop Analysis-Synthesis Linear Prediction
4.5 Analysis-by-Synthesis Linear Prediction
4.5.1 Code-Excited Linear Prediction Algorithms
4.6 Linear Prediction in Wideband Coding
4.6.1 Wideband Speech Coding
4.6.2 Wideband Audio Coding
4.7 Summary
Problems
Computer Exercises
`
5 PSYCHOACOUSTIC PRINCIPLES
5.1 Introduction
5.2 Absolute Threshold of Hearing
5.3 Critical Bands
5.4 Simultaneous Masking, Masking Asymmetry, and the Spread of Masking
5.4.1 Noise-Masking-Tone
5.4.2 Tone-Masking-Noise
5.4.3 Noise-Masking-Noise
5.4.4 Asymmetry of Masking
5.4.5 The Spread of Masking
5.5 Nonsimultaneous Masking
5.6 Perceptual Entropy
5.7 Example Codec Perceptual Model: ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model 1
5.7.1 Step 1: Spectral Analysis and SPL Normalization
5.7.2 Step 2: Identification of Tonal and Noise Maskers
5.7.3 Step 3: Decimation and Reorganization of Maskers
5.7.4 Step 4: Calculation of Individual Masking Thresholds
5.7.5 Step 5: Calculation of Global Masking Thresholds
5.8 Perceptual Bit Allocation
5.9 Summary
Problems
Computer Exercises
`
6 TIME-FREQUENCY ANALYSIS: FILTER BANKS AND TRANSFORMS
6.1 Introduction
6.2 Analysis-Synthesis Framework for M-band Filter Banks
6.3 Filter Banks for Audio Coding: Design Considerations
6.3.1 The Role of Time-Frequency Resolution in Masking Power Estimation
6.3.2 The Role of Frequency Resolution in Perceptual Bit Allocation
6.3.3 The Role of Time Resolution in Perceptual Bit Allocation
6.4 Quadrature Mirror and Conjugate Quadrature Filters
6.5 Tree-Structured QMF and CQF M-band Banks
6.6 Cosine Modulated "Pseudo QMF" M-band Banks
6.7 Cosine Modulated Perfect Reconstruction (PR) M-band Banks and the Modified Discrete Cosine Transform (MDCT)
6.7.1 Forward and Inverse MDCT
6.7.2 MDCT Window Design
6.7.3 Example MDCT Windows (Prototype FIR Filters)
6.8 Discrete Fourier and Discrete Cosine Transform
6.9 Pre-echo Distortion
6.10 Pre-echo Control Strategies
6.10.1 Bit Reservoir
6.10.2 Window Switching
6.10.3 Hybrid, Switched Filter Banks
6.10.4 Gain Modification
6.10.5 Temporal Noise Shaping
6.11 Summary
Problems
Computer Exercises
`
7 TRANSFORM CODERS
7.1 Introduction
7.2 Optimum Coding in the Frequency Domain
7.3 Perceptual Transform Coder
7.3.1 PXFM
7.3.2 SEPXFM
7.4 Brandenburg-Johnston Hybrid Coder
7.5 CNET Coders
7.5.1 CNET DFT Coder
7.5.2 CNET MDCT Coder 1
7.5.3 CNET MDCT Coder 2
7.6 Adaptive Spectral Entropy Coding
7.7 Differential Perceptual Audio Coder
7.8 DFT Noise Substitution
7.9 DCT with Vector Quantization
7.10 MDCT with Vector Quantization
7.11 Summary
Problems
Computer Exercises

8 SUBBAND CODERS
8.1 Introduction
8.1.1 Subband Algorithms
8.2 DWT and Discrete Wavelet Packet Transform (DWPT)
8.3 Adapted WP Algorithms
8.3.1 DWPT Coder with Globally Adapted Daubechies Analysis Wavelet
8.3.2 Scalable DWPT Coder with Adaptive Tree Structure
8.3.3 DWPT Coder with Globally Adapted General Analysis Wavelet
8.3.4 DWPT Coder with Adaptive Tree Structure and Locally Adapted Analysis Wavelet
8.3.5 DWPT Coder with Perceptually Optimized Synthesis Wavelets
8.4 Adapted Nonuniform Filter Banks
8.4.1 Switched Nonuniform Filter Bank Cascade
8.4.2 Frequency-Varying Modulated Lapped Transforms
8.5 Hybrid WP and Adapted WP/Sinusoidal Algorithms
8.5.1 Hybrid Sinusoidal/Classical DWPT Coder
8.5.2 Hybrid Sinusoidal/M-band DWPT Coder
8.5.3 Hybrid Sinusoidal/DWPT Coder with WP Tree Structure Adaptation (ARCO)
8.6 Subband Coding with Hybrid Filter Bank/CELP Algorithms
8.6.1 Hybrid Subband/CELP Algorithm for Low-Delay Applications
8.6.2 Hybrid Subband/CELP Algorithm for Low-Complexity Applications
8.7 Subband Coding with IIR Filter Banks
Problems
Computer Exercise
`
9 SINUSOIDAL CODERS
9.1 Introduction
9.2 The Sinusoidal Model
9.2.1 Sinusoidal Analysis and Parameter Tracking
9.2.2 Sinusoidal Synthesis and Parameter Interpolation
9.3 Analysis/Synthesis Audio Codec (ASAC)
9.3.1 ASAC Segmentation
9.3.2 ASAC Sinusoidal Analysis-by-Synthesis
9.3.3 ASAC Bit Allocation, Quantization, Encoding, and Scalability
9.4 Harmonic and Individual Lines Plus Noise Coder (HILN)
9.4.1 HILN Sinusoidal Analysis-by-Synthesis
9.4.2 HILN Bit Allocation, Quantization, Encoding, and Decoding
9.5 FM Synthesis
9.5.1 Principles of FM Synthesis
9.5.2 Perceptual Audio Coding Using an FM Synthesis Model
9.6 The Sines + Transients + Noise (STN) Model
9.7 Hybrid Sinusoidal Coders
9.7.1 Hybrid Sinusoidal-MDCT Algorithm
9.7.2 Hybrid Sinusoidal-Vocoder Algorithm
9.8 Summary
Problems
Computer Exercises
`
10 AUDIO CODING STANDARDS AND ALGORITHMS
10.1 Introduction
10.2 MIDI Versus Digital Audio
10.2.1 MIDI Synthesizer
10.2.2 General MIDI (GM)
10.2.3 MIDI Applications
10.3 Multichannel Surround Sound
10.3.1 The Evolution of Surround Sound
10.3.2 The Mono, the Stereo, and the Surround Sound Formats
10.3.3 The ITU-R BS.775 5.1-Channel Configuration
10.4 MPEG Audio Standards
10.4.1 MPEG-1 Audio (ISO/IEC 11172-3)
10.4.2 MPEG-2 BC/LSF (ISO/IEC-13818-3)
10.4.3 MPEG-2 NBC/AAC (ISO/IEC-13818-7)
10.4.4 MPEG-4 Audio (ISO/IEC 14496-3)
10.4.5 MPEG-7 Audio (ISO/IEC 15938-4)
10.4.6 MPEG-21 Framework (ISO/IEC-21000)
10.4.7 MPEG Surround and Spatial Audio Coding
10.5 Adaptive Transform Acoustic Coding (ATRAC)
10.6 Lucent Technologies PAC, EPAC, and MPAC
10.6.1 Perceptual Audio Coder (PAC)
10.6.2 Enhanced PAC (EPAC)
10.6.3 Multichannel PAC (MPAC)
10.7 Dolby Audio Coding Standards
10.7.1 Dolby AC-2, AC-2A
10.7.2 Dolby AC-3/Dolby Digital/Dolby SR - D
10.8 Audio Processing Technology APT-x100
10.9 DTS - Coherent Acoustics
10.9.1 Framing and Subband Analysis
10.9.2 Psychoacoustic Analysis
10.9.3 ADPCM - Differential Subband Coding
10.9.4 Bit Allocation, Quantization, and Multiplexing
10.9.5 DTS-CA Versus Dolby Digital
Problems
Computer Exercise
`
11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING
11.1 Introduction
11.2 Lossless Audio Coding (L²AC)
11.2.1 L²AC Principles
11.2.2 L²AC Algorithms
11.3 DVD-Audio
11.3.1 Meridian Lossless Packing (MLP)
11.4 Super-Audio CD (SACD)
11.4.1 SACD Storage Format
11.4.2 Sigma-Delta Modulators (SDM)
11.4.3 Direct Stream Digital (DSD) Encoding
11.5 Digital Audio Watermarking
11.5.1 Background
11.5.2 A Generic Architecture for DAW
11.5.3 DAW Schemes - Attributes
11.6 Summary of Commercial Applications
Problems
Computer Exercise
`
12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING
12.1 Introduction
12.2 Subjective Quality Measures
12.3 Confounding Factors in Subjective Evaluations
12.4 Subjective Evaluations of Two-Channel Standardized Codecs
12.5 Subjective Evaluations of 5.1-Channel Standardized Codecs
12.6 Subjective Evaluations Using Perceptual Measurement Systems
12.6.1 CIR Perceptual Measurement Schemes
12.6.2 NSE Perceptual Measurement Schemes
12.7 Algorithms for Perceptual Measurement
12.7.1 Example: Perceptual Audio Quality Measure (PAQM)
12.7.2 Example: Noise-to-Mask Ratio (NMR)
12.7.3 Example: Objective Audio Signal Evaluation (OASE)
12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement
12.9 Research Directions for Perceptual Codec Quality Measures

REFERENCES

INDEX
`
CHAPTER 1
`
`
`INTRODUCTION
`
`
Audio coding or audio compression algorithms are used to obtain compact digital
representations of high-fidelity (wideband) audio signals for the purpose of
efficient transmission or storage. The central objective in audio coding is to represent
the signal with a minimum number of bits while achieving transparent
signal reproduction, i.e., generating output audio that cannot be distinguished
from the original input, even by a sensitive listener ("golden ears"). This text
gives an in-depth treatment of algorithms and standards for transparent coding
of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
The introduction of the compact disc (CD) in the early 1980s brought to the
fore all of the advantages of digital audio representation, including true high-fidelity,
dynamic range, and robustness. These advantages, however, came at
the expense of high data rates. Conventional CD and digital audio tape (DAT)
systems are typically sampled at either 44.1 or 48 kHz using pulse code modulation
(PCM) with a 16-bit sample resolution. This results in uncompressed
data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
stereo pair. Although these data rates were accommodated successfully in first-generation
CD and DAT players, second-generation audio players and wirelessly
connected systems are often subject to bandwidth constraints that are incompatible
with high data rates. Because of the success enjoyed by the first-generation
systems, however, end users have come to expect "CD-quality" audio reproduction
from any digital system. Therefore, new network and wireless multimedia
digital audio systems must reduce data rates without compromising reproduction
quality. Motivated by the need for compression algorithms that can satisfy
simultaneously the conflicting demands of high compression ratios and transparent
quality for high-fidelity audio signals, several coding methodologies have
been established over the last two decades. Audio compression schemes, in general,
employ design techniques that exploit both perceptual irrelevancies and
statistical redundancies.
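
The uncompressed PCM data rates quoted above follow from multiplying the sampling rate, the sample resolution, and the number of channels. The short sketch below reproduces that arithmetic; the function name `pcm_bit_rate` is an illustrative choice and does not come from the text.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# 16-bit PCM at the two sampling rates mentioned in the text.
for fs in (44_100, 48_000):
    mono = pcm_bit_rate(fs, 16)
    stereo = pcm_bit_rate(fs, 16, channels=2)
    print(f"{fs / 1000:g} kHz: mono {mono / 1e3:g} kb/s, stereo {stereo / 1e6:.4g} Mb/s")
```

The stereo figures come out as 1.4112 and 1.536 Mb/s, which the text rounds to 1.41 and 1.54 Mb/s.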
PCM was the primary audio encoding scheme employed until the early 1980s.
PCM does not provide any mechanisms for redundancy removal. Quantization
methods that exploit the signal correlation, such as differential PCM (DPCM),
delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM) were applied
to audio compression later (e.g., PC audio cards). Owing to the need for drastic
reduction in bit rates, researchers began to pursue new approaches for audio
coding based on the principles of psychoacoustics [Zwic90] [Moor03]. Psychoacoustic
notions in conjunction with the basic properties of signal quantization
have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
entropy is a quantitative estimate of the fundamental limit of transparent audio
signal compression. Another key contribution to the field was the characterization
of the auditory filter bank and particularly the time-frequency analysis capabilities
of the inner ear [Moor83]. Over the years, several filter-bank structures that
mimic the critical band structure of the auditory filter bank have been proposed.
A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
which, when used in conjunction with a perceptual model, can play an important
role in the identification of perceptual irrelevancies.
During the early 1990s, several workgroups and organizations such as
the International Organization for Standardization/International Electro-technical
Commission (ISO/IEC), the International Telecommunications Union (ITU),
AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
Philips, and Sony were actively involved in developing perceptual audio coding
algorithms and standards. Some of the popular commercial standards published
in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent
Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC),
Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive
Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
the prominent audio coding standards. The commercial success enjoyed by
these audio coding standards triggered the launch of several multimedia storage
formats.

Table 1.1. List of perceptual and lossless audio coding standards/algorithms.

Standard/algorithm                                  Related references
 1. ISO/IEC MPEG-1 audio                            [ISOI92]
 2. Philips' PASC (for DCC applications)            [Lokh92]
 3. AT&T/Lucent PAC/EPAC                            [John96c] [Sinh96]
 4. Dolby AC-2                                      [Davi92] [Fiel91]
 5. AC-3/Dolby Digital                              [Davis93] [Fiel96]
 6. ISO/IEC MPEG-2 (BC/LSF) audio                   [ISOI94a]
 7. Sony's ATRAC (MiniDisc and SDDS)                [Yosh94] [Tsut96]
 8. SHORTEN                                         [Robi94]
 9. Audio processing technology - APT-x100          [Wyli96b]
10. ISO/IEC MPEG-2 AAC                              [ISOI96]
11. DTS coherent acoustics                          [Smyt96] [Smyt99]
12. The DVD Algorithm                               [Crav96] [Crav97]
13. MUSICompress                                    [Wege97]
14. Lossless transform coding of audio (LTAC)       [Pura97]
15. AudioPak                                        [Hans98b] [Hans01]
16. ISO/IEC MPEG-4 audio version 1                  [ISOI99]
17. Meridian lossless packing (MLP)                 [Gerz99]
18. ISO/IEC MPEG-4 audio version 2                  [ISOI00]
19. Audio coding based on integer transforms        [Geig01] [Geig02]
20. Direct-stream digital (DSD) technology          [Reef01a] [Jans03]

Table 1.2 lists some of the popular multimedia storage formats since the beginning
of the CD era. High-performance stereo systems became quite common with
the advent of CDs in the early 1980s. A compact disc read-only memory (CD-ROM)
can store data up to 700-800 MB in digital form as "microscopic-pits"
that can be read by a laser beam off of a reflective surface or a medium. Three
competing storage media - DAT, the digital compact cassette (DCC), and the
MiniDisc (MD) - entered the commercial market during 1987-1992. Intended
mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed
a storage medium called the MiniDisc, primarily for audio storage. MD
employs the ATRAC algorithm for compression. In 1991, Philips introduced the
DCC, a successor of the analog compact cassette. Philips DCC employs a compression
scheme called the PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began
as a potential competitor for DATs but was discontinued in 1996. The introduction
of the digital versatile disc (DVD) in 1996 enabled both video and audio
recording/storage as well as text-message programming. The DVD became one
of the most successful storage media. With the improvements in the audio compression
and DVD storage technologies, multichannel surround sound encoding
formats gained interest [Bosi93] [Holm99] [Bosi00].

Table 1.2. Some of the popular audio storage formats.

Audio storage format                                Related references
1. Compact disc                                     [CD82] [IECA87]
2. Digital audio tape (DAT)                         [Watk88] [Tan89]
3. Digital compact cassette (DCC)                   [Lokh91] [Lokh92]
4. MiniDisc                                         [Yosh94] [Tsut96]
5. Digital versatile disc (DVD)                     [DVD96]
6. DVD-audio (DVD-A)                                [DVD01]
7. Super audio CD (SACD)                            [SACD02]

With the emergence of streaming audio applications during the late
1990s, researchers pursued techniques such as combined speech and audio
architectures, as well as joint source-channel coding algorithms that are optimized
for the packet-switched Internet. The advent of the ISO/IEC MPEG-4 standard
(1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality
coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
algorithms with provisions for scalable, object-based speech and audio coding at
bit rates from as low as 200 b/s up to 64 kb/s per channel.
The emergence of the DVD-audio and the super audio CD (SACD) provided
designers with additional storage capacity, which motivated research in
lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
is able to reconstruct perfectly a bit-for-bit representation of the original
input audio. In contrast, a coding scheme incapable of perfect reconstruction is
called lossy. For most audio program material, lossy schemes offer the advantage
of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
to the network browser at low bit rates is the next grand challenge for codec
designers.
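
To make the lossless/lossy distinction concrete, the sketch below checks bit-for-bit reconstruction on a short test tone. It rests on several assumptions that are not in the text: `zlib` is used as a general-purpose stand-in for a lossless coder (it is not one of the audio coders cited above), the lossy step simply discards low-order bits, and the tone parameters and helper names are illustrative only.

```python
import math
import struct
import zlib


def make_tone(num_samples=4410, fs_hz=44_100, freq_hz=440.0, amplitude=10_000):
    """Return integer samples of a sine tone that fit in 16 bits (test signal)."""
    return [int(round(amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)))
            for n in range(num_samples)]


def to_pcm_bytes(samples):
    """Pack integer samples as little-endian 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


def lossless_round_trip(pcm):
    """Compress with zlib (a stand-in, not an audio codec) and decompress again."""
    packed = zlib.compress(pcm, 9)
    return packed, zlib.decompress(packed)


def drop_low_bits(samples, bits=8):
    """Crude lossy step: discard the `bits` least significant bits of each sample."""
    return [(s >> bits) << bits for s in samples]


samples = make_tone()
pcm = to_pcm_bytes(samples)
packed, restored = lossless_round_trip(pcm)

print("lossless output is bit-exact:", restored == pcm)
print("lossless bits per sample:", 8 * len(packed) / len(samples))
print("lossy output is bit-exact:", drop_low_bits(samples) == samples)
```

The bits-per-sample figure printed here depends entirely on the test signal and on zlib, so it should not be compared with the figures quoted for real audio coders.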
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
Over the last few years, researchers have proposed several efficient signal models
(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
standards (Table 1.1) for high-quality digital audio reproduction. Most of these
algorithms are based on the generic architecture shown in Figure 1.1.
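
As a rough illustration of how the blocks of such a generic encoder fit together (framing, time-frequency analysis, a perceptual threshold, quantization, and an entropy estimate), here is a toy sketch. Every detail of it is an assumption made for illustration: the naive DFT stands in for whichever analysis a real coder uses, `toy_threshold` is a flat placeholder and not a psychoacoustic model, and the first-order entropy only estimates what an entropy coder might achieve.

```python
import cmath
import math
from collections import Counter


def split_into_frames(samples, fs_hz, frame_ms):
    """Segment a signal into non-overlapping frames of frame_ms milliseconds."""
    n = int(round(fs_hz * frame_ms / 1000))
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def dft(frame):
    """Naive DFT (bins 0..N/2) standing in for the time-frequency analysis block."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def toy_threshold(spectrum, offset_db=20.0):
    """Placeholder for the psychoacoustic model (assumption): a flat level
    offset_db below the spectral peak. Real masking models are far richer."""
    peak = max(abs(c) for c in spectrum)
    return peak * 10 ** (-offset_db / 20) if peak > 0 else 1.0


def quantize(spectrum, step):
    """Uniform quantization of real and imaginary parts with the given step."""
    return [(round(c.real / step), round(c.imag / step)) for c in spectrum]


def entropy_bits(symbols):
    """First-order Shannon entropy (bits per symbol) of the quantized symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def encode(samples, fs_hz=48_000, frame_ms=5):
    """Run each frame through analysis, threshold, quantization, entropy estimate."""
    encoded = []
    for frame in split_into_frames(samples, fs_hz, frame_ms):
        spectrum = dft(frame)
        step = toy_threshold(spectrum)
        symbols = quantize(spectrum, step)
        encoded.append((symbols, entropy_bits(symbols)))
    return encoded


# 20 ms of a 1 kHz tone at 48 kHz gives four 5-ms frames of 240 samples each.
tone = [math.sin(2 * math.pi * 1000 * n / 48_000) for n in range(960)]
for index, (symbols, bits) in enumerate(encode(tone)):
    print(f"frame {index}: {len(symbols)} bins, about {bits:.3f} bits/symbol")
```

Tying the quantizer step to the threshold is the point of the sketch: coarser steps are allowed wherever the (here, toy) perceptual model says distortion would not be heard, and the entropy figure then drops accordingly.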
The coders typically segment input signals into quasi-stationary frames ranging
from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
and spectral components of each frame. The time-frequency mapping is usually
matched to the analysis properties of the human auditory system. Either way,
the ultimate objective is to extract from the input audio a set of time-frequency
parameters that is amenable to quantization according to a perceptual distortion
metric. Depending on the overall design objectives, the time-frequency analysis
section usually contains one of the following:

• Unitary transform
• Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass filters
• Harmonic/sinusoidal analyzer
• Source-system analysis (LPC and multipulse excitation)
• Hybrid versions of the above.

[Figure 1.1: block diagram. Blocks: input audio; time-frequency analysis; psychoacoustic analysis; bit allocation; quantization and encoding; entropy (lossless) coding; MUX; to channel. Signal labels: parameters; masking thresholds; side information.]
Figure 1.1. A generic perceptual audio encoder.

The choice of time-frequency analysis methodology always involves a fundamental
tradeoff between time and frequency resolution requirements. Perceptual
distortion control is achieved by a psychoacoustic signal analysis section
that estimates signal masking power based on psychoacoustic principles. The
psychoacoustic model delivers masking thresholds that quantify the maximum
amount of distortion at each point in the time-frequency plane such that quantization
of the time-frequency parameters does not introduce audible artifacts.
The psychoacoustic model therefore allows the quantization section to exploit
perceptual irrelevancies. This section can also exploit statistical redundancies
through classical techniques such as DPCM or ADPCM. Once a quantized compact
parametric set has been formed, the remaining redundancies are typically
removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
[Welc84]. Since the output of the psychoacoustic distortion control model is
signal-dependent, most algorithms are inherently variable rate. Fixed channel
rate requirements are usually satisfied through buffer feedback schemes, which
often introduce encoding delays.

1.3 AUDIO CODER ATTRIBUTES

Perceptual audio coders are typically evaluated based on the following attributes:
audio reproduction quality, operating bit rates, computational complexity, codec
delay, and channel error robustness. The objective is to attain a high-quality
(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`
Audio quality is of paramount importance when designing an audio coding
algorithm. Successful strides have been made since the development of simple
near-transparent perceptual coders. Typically, classical objective measures of
signal fidelity such as the signal-to-noise ratio (SNR) and the total harmonic
distortion (THD) are inadequate [Ryde96]. As the field of perceptual audio coding
matured rapidly and created greater demand for listening tests, there was a
corresponding growth of interest in perceptual measurement schemes. Several
subjective and objective quality measures have been proposed and standardized
during the last decade. Some of these schemes include the noise-to-mask
ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the perceptual
objective measure (POM, 1995) [Colo95], and the objective audio signal
evaluation (OASE, 1997) [Spor97]. We will address these and several other quality
assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`
From a codec designer's point of view, one of the key challenges is to represent
high-fidelity audio with a minimum number of bits. For instance, if a
5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
bit rates imply high compression ratios and generally low reproduction quality.
Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3
(32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
employ high bit rates for obtaining transparent audio reproduction. However, the
development of several sophisticated audio coding tools (e.g., MPEG-4 audio
tools) created ways for efficient transmission or storage of audio at rates between
8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality
at low rates along with the ability to scale both rate and quality to match
different requirements such as time-varying channel capacity.
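
The frame-based bit-rate example in this section can be checked with a few lines of arithmetic. In the sketch below the helper names are illustrative, and the comparison against 16-bit mono PCM is an added assumption used only to show what compression ratio the example implies.

```python
def samples_per_frame(fs_hz: int, frame_ms: float) -> int:
    """Number of samples in one frame of frame_ms milliseconds."""
    return round(fs_hz * frame_ms / 1000)


def frame_bit_rate(bits_per_frame: int, frame_ms: float) -> float:
    """Encoding bit rate in bits per second for a fixed per-frame bit budget."""
    return bits_per_frame * 1000 / frame_ms


def compression_ratio(fs_hz: int, pcm_bits: int, coded_bits_per_s: float) -> float:
    """Ratio of the uncompressed mono PCM rate to the coded rate."""
    return fs_hz * pcm_bits / coded_bits_per_s


n = samples_per_frame(48_000, 5)         # 240 samples
rate = frame_bit_rate(80, 5)             # 16000 b/s, i.e. 16 kb/s
print(n, rate, 80 / n)                   # 80 / 240 is one third of a bit per sample
print(compression_ratio(48_000, 16, rate))  # relative to 16-bit mono PCM (assumption)
```

Under that 16-bit PCM assumption the example corresponds to a 48:1 compression ratio.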
`
`1.3.3 Complexity
`
Reduced computational complexity not only enables real-time implementation
but may also decrease the power consumption and extend battery life. Computational
complexity is usually measured in terms of millions of instructions
per second (MIPS). Complexity estimates are processor-dependent. For example,
the complexity associated with Dolby's AC-3 decoder was estimated at approximately
27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95];
for the Motorola DSP56002 processor, the complexity was estimated at 45
MIPS [Vern95]. Usually, most of the audio codecs rely on the so-called asymmetric
encoding principle. This means that the codec complexity is not evenly
shared between the encoder and the decoder (typically, encoder 80% and decoder
20% complexity), with more emphasis on reducing the decoder complexity.
`
`1.3.4 Codec Delay
Many of the network applications for high-fidelity audio (streaming audio, audio-on-demand)
are delay tolerant (up to 100-200 ms), providing the opportunity
to exploit long-term signal properties in order to achieve high coding gain.
However, in two-way real-time communication and voice-over Internet protocol
(VoIP) applications, low-delay encoding (10-20 ms) is important. Consider
the example described before, i.e., an audio coder operating on frames of 5 ms
at a 48 kHz sampling frequency. In an ideal encoding scenario, the minimum
amount of delay should be 5 ms at the encoder and 5 ms at the decoder (same as
the frame length). However, other factors such as the analysis-synthesis filter bank
window, the look-ahead, the bit-reservoir, and the channel delay contribute to
additional delays. Employing shorter analysis-synthesis windows, avoiding look-ahead,
and restructuring the bit-reservoir functions could result in low-delay
encoding, nonetheless, with reduced coding efficiencies.
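
A simple way to reason about the delay contributions listed above is to add them up. The sketch below assumes a purely additive model, which is a simplification introduced here and not a claim from the text; the parameter names and the example values for the extra terms are hypothetical.

```python
def codec_delay_ms(frame_ms: float,
                   window_ms: float = 0.0,
                   look_ahead_ms: float = 0.0,
                   bit_reservoir_ms: float = 0.0,
                   channel_ms: float = 0.0) -> float:
    """Total delay under an additive model: one frame at the encoder, one frame
    at the decoder, plus any extra contributions (all in milliseconds)."""
    return 2 * frame_ms + window_ms + look_ahead_ms + bit_reservoir_ms + channel_ms


# Ideal case from the text: 5-ms frames, no other contributions.
print(codec_delay_ms(5))

# Hypothetical extras, chosen only to show how quickly the budget grows.
print(codec_delay_ms(5, window_ms=5, look_ahead_ms=2.5, channel_ms=20))
```

With the hypothetical extras the total already exceeds the 10-20 ms range mentioned for two-way communication, which is why low-delay designs trim windows and look-ahead at some cost in coding efficiency.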
`
`1.3.5 Error Robustness
The increasing popularity of streaming audio over packet-switched and wireless
networks such as the Internet implies that any algorithm intended for such
applications must be able to deal with a noisy time-varying channel. In particular,
provisions for error robustness and error protection must be incorporated
at the encoder in order to achieve reliable transmission of digital audio over
error-prone channels. One simple idea could be to provide better protection to
the error-sensitive and priority (important) bits. For instance, the audio frame
header requires the maximum error robustness; otherwise, transmission errors
in the header will seriously impair the entire audio frame. Several error detecting/correcting
codes [Lin82] [Wick95] [Bayl97] [Swee02] [Zara02] can also be
employed. Inclusion of error correcting codes in the bitstream might help to obtain
error-free reproduction of the input audio, however, with increased complexity
and bit rates.
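
The idea of giving the frame header stronger protection than the payload can be sketched as follows. This is an illustrative scheme assumed for the example, not one taken from the text or from any standard: the header is sent three times and recovered by a bitwise majority vote, while the payload only carries a CRC-32 (via Python's `zlib.crc32`) for error detection. The frame layout and function names are hypothetical.

```python
import zlib


def protect_frame(header: bytes, payload: bytes) -> bytes:
    """Layout (hypothetical): header x3, payload, 4-byte CRC-32 of the payload."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return header * 3 + payload + crc


def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise majority of three equally long byte strings."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def recover_frame(frame: bytes, header_len: int):
    """Return (header, payload, payload_crc_ok) from a possibly corrupted frame."""
    copies = [frame[i * header_len:(i + 1) * header_len] for i in range(3)]
    header = majority_vote(*copies)
    payload = frame[3 * header_len:-4]
    crc_ok = zlib.crc32(payload) == int.from_bytes(frame[-4:], "big")
    return header, payload, crc_ok


frame = protect_frame(b"\xa5\x01", b"audio-bits")

damaged = bytearray(frame)
damaged[0] ^= 0x10          # flip one bit in the first header copy
damaged[7] ^= 0x01          # flip one bit in the payload
header, payload, crc_ok = recover_frame(bytes(damaged), header_len=2)
print(header == b"\xa5\x01", crc_ok)
```

The header survives the bit flip thanks to the repetition, while the payload error is only detected, not corrected. The cost is visible in the frame size: the two extra header copies and the CRC add eight bytes, which mirrors the tradeoff against bit rate noted above.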
From the discussion in the previous sections, it is evident that several tradeoffs
must be considered in designing an algorithm for a particular application. For this
reason, audio coding standards consist of several tools that enable the design of
scalable algorithms. For example, MPEG-4 provides tools to design algorithms
that satisfy a