`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK
5102.92.S73 2007, Copy 1. The attached (Cover Page, Title Page, Copyright Page, Table of
Contents Pages, Chapter 1 and Chapter 10) are a true and complete representation from that
`work.
`
`THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
Cataloging-in-Publication stamp dated March 5, 2007.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`Deirdre Scott
`
`Library of Congress
`
`Business Enterprises Officer
`Office of Business Enterprises
`
101 Independence Avenue, SE, Washington, DC 20540-4917, Tel 202.707.5650, www.loc.gov; duplicationservices@loc.gov

NETFLIX, INC
Exhibit 1009
IPR2018-01630
`
`Page 1
`
`
`
`
`
`
`
`
`
`
`
`
`Page 2
`
`
`
`
`
`AUDIO SIGNAL
`PROCESSING
`AND CODING
`
`
`
Andreas Spanias
`Ted Painter
`
`Venkatraman Atti
`
`
`
`
`
`Page 3
`
`
`
`
WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
`
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.
`
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives or written sales materials. The advice and strategies contained
herein may not be suitable for your situation. You should consult with a professional where
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
damages.
`
For general information on our other products and services or for technical support, please contact
our Customer Care Department within the United States at (800) 762-2974, outside the United
States at (317) 572-3993 or fax (317) 572-4002.
`
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our
web site at www.wiley.com.
`
Wiley Bicentennial Logo: Richard J. Pacifico
`
Library of Congress Cataloging-in-Publication Data:
`
`Spanias, Andreas.
`
Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
`p. cm.
`
“Wiley-Interscience publication.”
`Includes bibliographical references and index.
ISBN: 978-0-471-79147-8
`
1. Coding theory. 2. Signal processing--Digital techniques. 3. Sound--Recording and
reproducing--Digital techniques. I. Painter, Ted. II. Atti, Venkatraman.
III. Title.
`
TK5102.92.S73 2006
621.382'8--dc22
`
`Printed in the United States of America.
`
`2006040313....
`.uw‘rw”
`
`1098765432]
`
`
`
`
`.
`..
`.
`..
`.
`....
`.
`.— a;- _.-_..
`and“ lust
`l
`. u I
`|.II. klllllr.‘
`
`- , Ml.
`.- u - Niacin—n“
`
`
`Page 4
`
`
`
`
`
`CONTENTS
`
`
`
`
`PREFACE
`
`xv
`
`1
`
`INTRODUCTION
`
`1.1
`1.2
`1.3
`
`Historical Perxpccrive
`A General Perceptual Audio Coding Architecture
`Audio Coder Attributes
`
`1.31
`1.3.2
`
`1.3.3
`1.3.4
`1.3.5
`
`Audio Quality
`Bit Rates
`
`Complexity
`Codec Delay
`Error Robustness
`
`1.4
`1.5
`1.6
`
`Types of Audio Coders ~ An Overview
`Organization 01' 11113 Book
`Notationnl Conventions
`Problems
`Computer Exercises
`
`2
`
`SIGNAL PROCESSING ESSENTIALS
`
`2.1
`2.2
`2.3
`2.4
`2.5
`
`Introduction
`Spectra of Analog Signals
`Review of Convolution and Filtering
`Uniform Sampling
`Discrete—lime Signal Processuig
`
`1
`
`l
`4
`5
`
`()
`6
`
`6
`7
`
`7
`8
`9
`|
`|
`
`1
`1
`
`'13
`
`13
`13
`1(3
`'7
`211
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 5
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
  3.4.1 Structured VQ
  3.4.2 Split-VQ
  3.4.3 Conjugate-Structure VQ
3.5 Bit-Allocation Algorithms
3.6 Entropy Coding
  3.6.1 Huffman Coding
  3.6.2 Rice Coding
  3.6.3 Golomb Coding
`
`
`
`Page 6
`
  2.5.1 Transforms for Discrete-Time Signals
  2.5.2 The Discrete and the Fast Fourier Transform
  2.5.3 The Discrete Cosine Transform
  2.5.4 The Short-Time Fourier Transform
2.6 Difference Equations and Digital Filters
2.7 The Transfer and the Frequency Response Functions
  2.7.1 Poles, Zeros, and Frequency Response
`
`22
`
`23
`q
`23
`
`25
`
`27
`
`30’3
`
`J 3
`
`3
`
`35
`’1
`.
`36
`
`36
`39
`
`42
`
`44
`
`44
`
`45
`
`47
`
`(i9
`
`74
`
`77
`
`8 r
`
`82
`
  2.7.2 Examples of Digital Filters for Audio Applications
2.8 Review of Multirate Signal Processing
  2.8.1 Down-sampling by an Integer
  2.8.2 Up-sampling by an Integer
  2.8.3 Sampling Rate Changes by Noninteger Factors
  2.8.4 Quadrature Mirror Filter Banks
2.9 Discrete-Time Random Signals
  2.9.1 Random Signals Processed by LTI Digital Filters
  2.9.2 Autocorrelation Estimation from Finite-Length Data
2.10 Summary
Problems
Computer Exercises

3 QUANTIZATION AND ENTROPY CODING
3.1 Introduction
  3.1.1 The Quantization–Bit Allocation–Entropy Coding Module
3.2 Density Functions and Quantization
3.3 Scalar Quantization
  3.3.1 Uniform Quantization
  3.3.2 Nonuniform Quantization
  3.3.3 Differential PCM
3.4 Vector Quantization
`
`
`3.7
`
`Arithmetic Coding
`3.6.4
`Summary
`Problems:
`
`(.30mputer Exercises
`
`4
`
`LINEAR PREDICTION IN NARROWBAND AND WIDEBAND
`CODING
`
`4.]
`
`4.2
`4.3
`
`4.4
`4.5
`
`Introduction
`
`LPBased Source-System Modeling I‘or Speech
`Short—Term Linear Prediction
`
`4.3.1
`4.3.2
`
`Long-Term Prediction
`ADPCM Using Linear Prediction
`
`(')pen—l..oop Analysis-Synthesis Linear Prediction
`Aiialysis—by—Synt'liesis Linear Prediction
`4.5.1
`Code-Excited Linear Prediction Algorithmx
`
`4.6
`
`Linear Prediction in Wideband Coding
`
`4.7
`
`4.6.1 Wideband Speech Coding
`4.6.2 Wideband Audio Coding
`Sn Inmary
`Problems
`Computer Exercises
`
`5
`
`PSYCHOACOUSTIC PRINCIPLES
`
`5.1
`5.2
`5.3
`5.4
`
`5.5
`3.
`5.7
`
`Introduction
`Absolute ’I‘Ilrcsliold of Hearing
`Critical Bands
`Simultancoux Masking, Masking Asymmetry. and the Spread
`ol‘ Masking
`5.4.1
`Noise-Masking—Tone
`5.4.2
`Tone—Mashrig-Noise
`5.4.3
`NoiserMasking-Noise
`5.4.4
`Asymmetry 01. Masking
`5.4.5
`The Spread 01‘ Atlasking
`Nonsimultaneous IVIleklnil
`Perceptual Entropy
`[Example (‘nilt‘t' Perceptual l‘vhnl-cl'. [St HIEC 11172-3
`tMl’l‘iG .
`ll l’xyclioacottnin‘ Model
`I
`5.7.1
`Step 1: Spectral Allah-xix and SPL Normali/ation
`
`
`
`53
`55
`35
`
`8!»
`
`91
`
`9|
`
`92
`()4
`
`9.5
`96
`
`90
`97
`1110
`
`102
`
`102
`104
`100
`1(17
`1011
`
`113
`
`113
`1 14
`”5
`
`120
`123
`12-1
`l24
`12-1
`125
`127
`128
`
`130
`131
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 7
`
`
`5.7.2
`
`5.7.3
`
`5.7.4
`
`Step 2: Identification of Tonal and Noise Maskers
`
`Step 3: Declination and Reorganization of Maskers
`
`Step 4: Calculation ol‘ Individual Masking "Thresholds
`
`Step 5: Calculation of Global Masking Thresholds
`5.7.5
`Perceptual Bit Allocation
`
`Summary
`Problen‘is
`
`Computer Exercises
`
`5.8
`
`5.9
`
`6
`
`TIME-FREQUENCY ANALYSIS: FILTER BANKS AND
`TRANSFORMS
`
`6.1
`
`6.2
`
`6.3
`
`6.4
`
`6.5
`
`6.6
`6.7
`
`6.8
`
`6.9
`
`6.10
`
`Introduction
`
`Analysis-Synthesis Framework for M—band Filter Banks
`
`Filter Banks for Audio ('lotling21)esign Considerations
`
`6.3.1
`
`6.3.2
`
`6.3.3
`
`The Role of Time—Frequency Resolution in Masking
`Power 1556111216011
`
`The Role of Frequency Resolution in Perceptual Bit
`Allocation
`
`The Role ol~ Time. Resolution in Perceptual Bit
`Allocation
`
`Quadrature Mirror and Conjugate Quadrature Filters
`
`Tree—Structured QMF and CQF M»band Banks
`
`Cosine Modulated “Pseudo QMF" M —band Banks
`Cosine i'\/I(,)t:lulated Perfect Reconstruction (PR) M—band Banks
`and the Modified Discrete Cosine Transform (MDCT)
`6.7.1
`Forward and Inverse MDCT
`
`6.7.2 MDCT Window Design
`
`Example MDCT Windows (Prototype FIR Filters)
`6.7.3
`Discrete Fourier and Discrete Cosine 'l‘ransl‘orm
`
`Pre-echo Distortion
`
`Pre—echo Control Strategies
`6.10.1
`Bit Reservoir
`
`6.10.2 Window Switching
`6.10.3 Hybrid. Switched Filter Banks
`6.10.4 Gain Modification
`
`6.1 1
`
`6.10.5 Temporal Noise. Shaping
`Summary
`Problems
`
`Computer Exercises
`
`131
`
`135
`
`136
`
`138
`138
`
`140
`140
`
`141
`
`145
`
`145
`
`146
`
`148
`
`149
`
`149
`
`150
`
`155
`
`156
`
`160
`
`163
`165
`
`165
`
`167
`178
`
`180
`
`182
`182
`
`182
`134
`185
`
`185
`186
`188
`
`191
`
`
`Page 8
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
7 TRANSFORM CODERS
7.1 Introduction
7.2 Optimum Coding in the Frequency Domain
7.3 Perceptual Transform Coder
  7.3.1 PXFM
  7.3.2 SEPXFM
7.4 Brandenburg-Johnston Hybrid Coder
7.5 CNET Coders
  7.5.1 CNET DFT Coder
  7.5.2 CNET MDCT Coder 1
  7.5.3 CNET MDCT Coder 2
7.6 Adaptive Spectral Entropy Coding
7.7 Differential Perceptual Audio Coder
7.8 DFT Noise Substitution
7.9 DCT with Vector Quantization
7.10 MDCT with Vector Quantization
7.11 Summary
Problems
Computer Exercises

8 SUBBAND CODERS
`
`195
`
`195
`
`1%
`197
`198
`
`199
`
`200
`201
`
`20]
`
`201
`
`202
`
`203
`20-1
`205
`
`206
`207
`208
`208
`210
`
`211
`
`21 t
`2|2
`214
`318
`
`218
`220
`
`223
`
`221
`
`224
`226
`226
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 9
`
`81
`
`8.2
`8.3
`
`8.4
`
`Introduction
`8.1.l
`Subhand Algorithms
`DWT and Discrete, Wavelet Packet 'l‘ranslorln (DWP’l'I
`Adapted WP Algoritlnm
`8.3.1
`DWPT Coder with Globally Adapted Daubcchics
`Analysis Wavelet
`Scalable DWPT Coder with Adaptive Trev. Structurc
`
`8.3.2
`
`8.3.3
`
`8.3.4
`
`8.3.5.
`
`DWPT Coder with Globally Adapted General
`Analysis Wavelet
`DWPT Coder with Adaptivc 'l'rce Structurc and
`Locally Adapted Analysis Wavelet
`DWPT (.‘odcr with Perceptually Optimized Synthesis
`Wavelets
`Adapted Nonunit'orm Filter Banks
`8.4.1
`Switched Nonunit‘orm Filter Bank Cascade
`8.4.2-
`Frcqucncy—Varying Modulated Lappcd Transforms
`Hybrid WP and Adapted WP/Sinusoidal Algorithn‘ix
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`228
`
`229
`
`230
`
`233
`
`234
`
`235
`
`237
`
`237
`
`240
`
`241
`
`241
`
`242
`
`242
`
`245
`247
`
`248
`
`248
`
`248
`
`249
`250
`
`8.5.1
`
`8.5.2
`
`8.5.3
`
`Hybrid Sinusoidal/Classicrtl DWPT Coder
`
`Hybrid Sinusoidal/M-bzmd DWPT Coder
`
`Hybrid Sinusoidal/DWPT Coder with WP Tree
`Structure Adaptation (ARCO)
`
`Subbund Coding with Hybrid Filter Bank/CELP Algorithms
`
`8.6.1
`
`8.6.2
`
`Hybrid Subband/CELP Algorithm for Low-Delay
`Applications
`
`Hybrid Subband/CELP Algorithm for
`Low—Complexity Applications
`
`Subbzind Coding with HR Filter Banks
`Problems
`
`8.6
`
`8.7
`
`Computer Exercise
`
`SINUSOIDAL CODERS
`
`9.1
`
`9.2
`
`9.3
`
`9.4
`
`9.6
`
`9.7
`
`Introduction
`
`The Sinusoidal Model
`
`9.2.1
`
`Sinusoidal Analysis and Parameter Tracking
`
`Sinusoidal Synthesis and Parameter Interpolation
`9.2.2
`Analysis/Synthesis Audio Codec (ASAC)
`
`9.3.1
`
`9.3.2
`
`9.3.3
`
`ASAC Segmentation
`
`ASAC Sinusoidal Analysisiby—Synthesis
`
`ASAC Bit Allocation. Quantization, Encoding, and
`Scalability
`
`Harmonic and Individual Lines Plus Noise Coder (HUN)
`
`9.4.1
`
`9.4.2
`
`MIL-N Sinusoidal Analysis-by—Synthesis
`
`HILN Bit Allocation, Quantization. Encoding. and
`Decoding
`
`FM Synthesis
`
`9.5.1
`
`9.5.2
`
`Principles 0113M Synthesis
`
`Perceptual Audio Coding Using an FM Synthesis
`Model
`
`The Sines + Transients + Noise (S’I‘N) Model
`
`Hybrid Sinusoidal Coders
`
`9.7.1
`
`Hybrid Sinusoidal—MDC'I‘ Algorithm
`
`9.8
`
`Hybrid Sinusoidal-Vocodcr Algorithm
`9.7.2
`S u mm ary
`Problems
`
`
`
`
`
`Computer Exercises
`
`
`
`Page 10
`
`
`263
`
`203
`
`20—1
`
`264
`266
`
`200
`
`267
`
`207
`
`268
`208
`270
`
`275
`
`279
`283
`
`289
`309
`
`317
`
`319
`319
`
`321
`
`321
`
`323
`
`10 AUDIO CODING STANDARDS AND ALGORITHMS
`
`10.1
`
`Introduction
`
`10.2 MIDI I‘larxux Digital Audio
`
`10.2.1 MIDI Synthesizer
`10.2.2 General MIDI (GM)
`
`10.2.3 MIDI Applications
`10.3 Multichannel Surround Sound
`
`10.3.1 The Evolution of Surround Sound
`
`10.3.2 The Mono. the Stereo. and the Surround Sound
`FOI‘III'AIS
`
`10.3.3 The ITIJ—R 138.775 5.1-Cliannel Configuration
`104 AMWIEAumoSumdmds
`
`10.4.1 MPEG—1 Audio (ISO/IEC .11 172—3)
`10.4.2 MPEG—2 BC/I..SF (lSO/IEC—13818-3)
`
`10.4.3 MPEG-2 NBC/AAC (ISO/11301381 8-7)
`
`10.4.4 MPEG-4 Audio (ISO/IEC 14496—3)
`
`10.4.5 MPEG—7 Audio (lSO/IEC 15938-4)
`
`10.4.6 M PEG-21 Framework (ISO/113021000)
`
`10.4.7 MPEG Surround and Spatial Audio Coding
`10.5 Adaptive 'I‘ransl’orm Acoustic Coding (A'I‘RAC)
`10.6 Lucent Technologies PAC, EPAC. and M PAC
`
`Perceptual Audio Coder (PAC)
`10.6.1
`10.6.2 Enhanced PAC (EI’AC)
`
`10.6.3 Multichannel PAC (MPAC)
`
`10.7 Dolby Audio Coding Standards
`10.7.1 Dolby ACT—2, ACE—2A
`10.7.2 Dolby AC-3/Dolby Digital/Dolby SR - D
`10.8 Audio Processing Technology APT—x 100
`10.9 D'I‘S — Coherent Acoustics
`
`Framing and Subband Analysis
`10.9.1
`Psychoacoustic Analysis
`10.9.2
`10.9.3 ADPCM — Differential Suhband Coding
`10.9.4 Bit Allocation. Quantization. and Multiplexing
`10.9.5 DTS—CA Versus Dolby Digital
`Problems
`
`Computer Exercise.
`
`11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`11.1
`
`Introduction
`
`
`
`
`
`
`Page 11
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
11.2 Lossless Audio Coding (L2AC)  344
  11.2.1 L2AC Principles  345
  11.2.2 L2AC Algorithms  346
11.3 DVD-Audio  356
  11.3.1 Meridian Lossless Packing (MLP)  358
11.4 Super-Audio CD (SACD)  358
  11.4.1 SACD Storage Format  362
  11.4.2 Sigma-Delta Modulators (SDM)  362
  11.4.3 Direct Stream Digital (DSD) Encoding  364
11.5 Digital Audio Watermarking  368
  11.5.1 Background  370
  11.5.2 A Generic Architecture for DAW  374
  11.5.3 DAW Schemes – Attributes  377
11.6 Summary of Commercial Applications  378
Problems  382
Computer Exercise  382

12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING  383
12.1 Introduction  383
12.2 Subjective Quality Measures  384
12.3 Confounding Factors in Subjective Evaluations  386
12.4 Subjective Evaluations of Two-Channel Standardized Codecs  387
12.5 Subjective Evaluations of 5.1-Channel Standardized Codecs  388
12.6 Subjective Evaluations Using Perceptual Measurement Systems  389
  12.6.1 CIR Perceptual Measurement Schemes  390
  12.6.2 NSE Perceptual Measurement Schemes  391
12.7 Algorithms for Perceptual Measurement  391
  12.7.1 Example: Perceptual Audio Quality Measure (PAQM)  392
  12.7.2 Example: Noise-to-Mask Ratio (NMR)  396
  12.7.3 Example: Objective Audio Signal Evaluation (OASE)  399
12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement  401
12.9 Research Directions for Perceptual Codec Quality Measures  402

REFERENCES  405

INDEX  459
`
`
`
`
`Page 12
`
`
`
`
`CHAPTER 1
`
`
`
`
`
`INTRODUCTION
`
`
Audio coding or audio compression algorithms are used to obtain compact digital
representations of high-fidelity (wideband) audio signals for the purpose of
efficient transmission or storage. The central objective in audio coding is to
represent the signal with a minimum number of bits while achieving transparent
signal reproduction, i.e., generating output audio that cannot be distinguished
from the original input, even by a sensitive listener (“golden ears”). This text
gives an in-depth treatment of algorithms and standards for transparent coding
of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
The introduction of the compact disc (CD) in the early 1980s brought to the
fore all of the advantages of digital audio representation, including true
high-fidelity, dynamic range, and robustness. These advantages, however, came at
the expense of high data rates. Conventional CD and digital audio tape (DAT)
systems are typically sampled at either 44.1 or 48 kHz using pulse code
modulation (PCM) with a 16-bit sample resolution. This results in uncompressed
data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
stereo pair. Although these data rates were accommodated successfully in
first-generation CD and DAT players, second-generation audio players and wirelessly
connected systems are often subject to bandwidth constraints that are
incompatible with high data rates. Because of the success enjoyed by the first-generation
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Audio Signal Processing and Coding, by Andreas Spanias, Ted Painter, and Venkatraman Atti
Copyright © 2007 by John Wiley & Sons, Inc.
`
`
`
`Page 13
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
systems, however, end users have come to expect “CD-quality” audio
reproduction from any digital system. Therefore, new network and wireless multimedia
digital audio systems must reduce data rates without compromising reproduction
quality. Motivated by the need for compression algorithms that can satisfy
simultaneously the conflicting demands of high compression ratios and
transparent quality for high-fidelity audio signals, several coding methodologies have
been established over the last two decades. Audio compression schemes, in
general, employ design techniques that exploit both perceptual irrelevancies and
statistical redundancies.
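
As a quick check on the uncompressed rates quoted above, the PCM bit rate is simply the product of the sampling rate, the bits per sample, and the number of channels. The minimal Python sketch below reproduces the 705.6/768 kb/s monaural and 1.41/1.54 Mb/s stereo figures for 16-bit PCM; the function name `pcm_bit_rate` is a hypothetical helper introduced only for illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels


# 16-bit PCM at the CD (44.1 kHz) and DAT (48 kHz) sampling rates.
for fs in (44_100, 48_000):
    mono_kbps = pcm_bit_rate(fs, 16) / 1e3
    stereo_mbps = pcm_bit_rate(fs, 16, channels=2) / 1e6
    print(f"{fs / 1e3:.1f} kHz: {mono_kbps:.1f} kb/s mono, {stereo_mbps:.2f} Mb/s stereo")
```

Any rate below these figures has to come from removing statistical redundancy, perceptual irrelevancy, or both.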
`
PCM was the primary audio encoding scheme employed until the early 1980s.
PCM does not provide any mechanisms for redundancy removal. Quantization
methods that exploit the signal correlation, such as differential PCM (DPCM),
delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM), were applied
to audio compression later (e.g., PC audio cards). Owing to the need for
drastic reduction in bit rates, researchers began to pursue new approaches for audio
coding based on the principles of psychoacoustics [Zwic90] [Moor03].
Psychoacoustic notions in conjunction with the basic properties of signal quantization
have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
entropy is a quantitative estimate of the fundamental limit of transparent audio
signal compression. Another key contribution to the field was the characterization
of the auditory filter bank and particularly the time-frequency analysis
capabilities of the inner ear [Moor83]. Over the years, several filter-bank structures that
mimic the critical band structure of the auditory filter bank have been proposed.
A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
which, when used in conjunction with a perceptual model, can play an important
role in the identification of perceptual irrelevancies.
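
The critical-band structure that such filter banks try to mimic is nonuniform: bands are narrow at low frequencies and wide at high frequencies. As an illustration only, the Python sketch below assumes the Zwicker–Terhardt analytic approximation of the critical-band rate, z(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2) in Bark, and estimates the bandwidth spanned by one Bark around a few frequencies. The helper names `hz_to_bark` and `bark_to_hz` are hypothetical; critical bands themselves are treated in Chapter 5.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt approximation, assumed here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_to_hz(z, lo=0.0, hi=30_000.0):
    """Invert hz_to_bark by bisection; valid because hz_to_bark is increasing for f >= 0."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for f in (100, 1_000, 4_000, 10_000):
    z = hz_to_bark(f)
    # Width in Hz of a 1-Bark interval centered (on the Bark scale) at f.
    width = bark_to_hz(z + 0.5) - bark_to_hz(z - 0.5)
    print(f"{f:>6} Hz -> {z:5.2f} Bark, about {width:6.0f} Hz per Bark")
```

Under this approximation the width grows from roughly 100 Hz around 100 Hz to about 2 kHz around 10 kHz.
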
During the early 1990s, several workgroups and organizations such as
the International Organization for Standardization/International Electrotechnical
Commission (ISO/IEC), the International Telecommunications Union (ITU),
AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
Philips, and Sony were actively involved in developing perceptual audio coding
algorithms and standards. Some of the popular commercial standards published
in the early 1990s include Dolby's Audio Coder-3 (AC3), the DTS Coherent
Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC),
Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive
Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
the prominent audio coding standards. The commercial success enjoyed by
these audio coding standards triggered the launch of several multimedia storage
formats.
`
Table 1.2 lists some of the popular multimedia storage formats since the
beginning of the CD era. High-performance stereo systems became quite common with
the advent of CDs in the early 1980s. A compact disc read-only memory
(CD-ROM) can store data up to 700–800 MB in digital form as “microscopic pits”
that can be read by a laser beam off of a reflective surface or a medium. Three
competing storage media, namely DAT, the digital compact cassette (DCC), and the
`
`
`
`
`
`Page 14
`
`
Table 1.1. List of perceptual and lossless audio coding standards/algorithms.

Standard/algorithm                                   Related references
1. ISO/IEC MPEG-1 audio                              [ISOI92]
2. Philips' PASC (for DCC applications)              [Lokh92]
3. AT&T/Lucent PAC/EPAC                              [John96c] [Sinh96]
4. Dolby AC-2                                        [Davi92] [Fiel91]
5. AC-3/Dolby Digital                                [Davi93] [Fiel96]
6. ISO/IEC MPEG-2 (BC/LSF) audio                     [ISOI94a]
7. Sony's ATRAC (MiniDisc and SDDS)                  [Yosh94] [Tsut96]
8. SHORTEN                                           [Robi94]
9. Audio processing technology – APT-x100            [Wyli96b]
10. ISO/IEC MPEG-2 AAC                               [ISOI96]
11. DTS coherent acoustics                           [Smyt96] [Smyt99]
12. The DVD Algorithm                                [Crav96] [Crav97]
13. MUSICompress                                     [Wege97]
14. Lossless transform coding of audio (LTAC)        [Pura97]
15. AudioPaK                                         [Hans98b] [Hans01]
16. ISO/IEC MPEG-4 audio version 1                   [ISOI99]
17. Meridian lossless packing (MLP)                  [Gerz99]
18. ISO/IEC MPEG-4 audio version 2                   [ISOI00]
19. Audio coding based on integer transforms         [Geig01] [Geig02]
20. Direct-stream digital (DSD) technology           [Reef01a] [Jans03]
`
`
Table 1.2. Some of the popular audio storage formats.

Audio storage format                      Related references
1. Compact disc                           [CD82] [IECA87]
2. Digital audio tape (DAT)               [Watk88] [Tan89]
3. Digital compact cassette (DCC)         [Lokh91] [Lokh92]
4. MiniDisc                               [Yosh94] [Tsut96]
5. Digital versatile disc (DVD)           [DVD96]
6. DVD-audio (DVD-A)                      [DVD01]
7. Super audio CD (SACD)                  [SACD02]
`[sacrum
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
MiniDisc (MD), entered the commercial market during 1987–1992. Intended
mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
source of mass data storage/transfer [Watk88] [Tan89]. In 1991–1992, Sony
proposed a storage medium called the MiniDisc, primarily for audio storage. MD
employs the ATRAC algorithm for compression. In 1991, Philips introduced the
DCC, a successor of the analog compact cassette. Philips DCC employs a
compression scheme called the PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began
`
`
`
`Page 15
`
`
`
`
`
`4
`
`INTRODUCTION
`
as a potential competitor for DATs but was discontinued in 1996. The
introduction of the digital versatile disc (DVD) in 1996 enabled both video and audio
recording/storage as well as text-message programming. The DVD became one
of the most successful storage media. With the improvements in the audio
compression and DVD storage technologies, multichannel surround sound encoding
formats gained interest [Bosi93] [Holm99] [Bosi00].

With the emergence of streaming audio applications during the late 1990s,
researchers pursued techniques such as combined speech and audio
architectures, as well as joint source-channel coding algorithms that are optimized
for the packet-switched Internet. The advent of the ISO/IEC MPEG-4 standard
(1996–2000) [ISOI99] [ISOI00] established new research goals for high-quality
coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
algorithms with provisions for scalable, object-based speech and audio coding at
bit rates from as low as 200 b/s up to 64 kb/s per channel.

The emergence of the DVD-audio and the super audio CD (SACD) provided
designers with additional storage capacity, which motivated research in
lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
is able to reconstruct perfectly a bit-for-bit representation of the original
input audio. In contrast, a coding scheme incapable of perfect reconstruction is
called lossy. For most audio program material, lossy schemes offer the advantage
of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
to the network browser at low bit rates is the next grand challenge for codec
designers.
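
The per-sample figures above translate directly into bit rates. The minimal sketch below assumes a 44.1 kHz sampling rate and a 16-bit PCM reference (matching the CD format described earlier) and converts an average number of bits per sample into kb/s per channel and into a compression ratio; the helper names are hypothetical.

```python
def coded_bit_rate_kbps(bits_per_sample, sample_rate_hz=44_100.0, channels=1):
    """Average coded bit rate in kb/s for a given number of bits per sample."""
    return bits_per_sample * sample_rate_hz * channels / 1_000.0


def compression_ratio(bits_per_sample, pcm_bits=16):
    """Compression ratio relative to uncompressed PCM with pcm_bits per sample."""
    return pcm_bits / bits_per_sample


for label, bps in (("lossy (~1 bit/sample)", 1.0), ("lossless (~10 bits/sample)", 10.0)):
    print(f"{label}: {coded_bit_rate_kbps(bps):.1f} kb/s per channel, "
          f"{compression_ratio(bps):.1f}:1 vs. 16-bit PCM")
```

At these example operating points the lossy coder needs about one tenth of the lossless rate (44.1 versus 441 kb/s per channel).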
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
Over the last few years, researchers have proposed several efficient signal models
(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
standards (Table 1.1) for high-quality digital audio reproduction. Most of these
algorithms are based on the generic architecture shown in Figure 1.1.

The coders typically segment input signals into quasi-stationary frames ranging
from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
and spectral components of each frame. The time-frequency mapping is usually
matched to the analysis properties of the human auditory system. In any case,
the ultimate objective is to extract from the input audio a set of time-frequency
parameters that is amenable to quantization according to a perceptual distortion
metric. Depending on the overall design objectives, the time-frequency analysis
section usually contains one of the following:
`
- Unitary transform
- Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
- Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass filters
- Harmonic/sinusoidal analyzer
- Source-system analysis (LPC and multipulse excitation)
- Hybrid versions of the above.

Page 16

[Figure 1.1 block diagram; recoverable block labels: Input audio, Time-frequency analysis, Parameters, Quantization and encoding, Entropy (lossless) coding, MUX, To channel, Psychoacoustic analysis, Masking thresholds, Bit allocation, Side information]

Figure 1.1. A generic perceptual audio encoder.
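
The first step of this architecture, segmenting the input into quasi-stationary frames of 2 to 50 ms, can be sketched as follows. This is a minimal illustration assuming non-overlapping frames and zero-padding of the last frame; practical coders generally use overlapping, windowed frames whose layout depends on the chosen filter bank, and `frame_signal` is a hypothetical helper.

```python
from typing import List, Sequence


def frame_signal(x: Sequence[float], sample_rate_hz: int, frame_ms: float) -> List[List[float]]:
    """Split x into non-overlapping frames of frame_ms milliseconds (last frame zero-padded)."""
    # Restrict the frame length to the 2-50 ms range mentioned in the text.
    if not 2.0 <= frame_ms <= 50.0:
        raise ValueError("frame length outside the 2-50 ms range")
    n = round(sample_rate_hz * frame_ms / 1000.0)  # samples per frame
    frames = []
    for start in range(0, len(x), n):
        frame = list(x[start:start + n])
        frame.extend([0.0] * (n - len(frame)))  # zero-pad a short final frame
        frames.append(frame)
    return frames


# 5-ms frames at 48 kHz hold 240 samples each; 1000 samples need 5 frames.
frames = frame_signal([0.0] * 1000, 48_000, 5.0)
print(len(frames), len(frames[0]))
```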
`
The choice of time-frequency analysis methodology always involves a
fundamental tradeoff between time and frequency resolution requirements. Perceptual
distortion control is achieved by a psychoacoustic signal analysis section
that estimates signal masking power based on psychoacoustic principles. The
psychoacoustic model delivers masking thresholds that quantify the maximum
amount of distortion at each point in the time-frequency plane such that
quantization of the time-frequency parameters does not introduce audible artifacts.
The psychoacoustic model therefore allows the quantization section to exploit
perceptual irrelevancies. This section can also exploit statistical redundancies
through classical techniques such as DPCM or ADPCM. Once a quantized compact
parametric set has been formed, the remaining redundancies are typically
removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
[Welc84]. Since the output of the psychoacoustic distortion control model is
signal-dependent, most algorithms are inherently variable rate. Fixed channel
rate requirements are usually satisfied through buffer feedback schemes, which
often introduce encoding delays.
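
One simple way a quantization section can use the masking thresholds is to give each band just enough bits to push its quantization noise below the mask. The sketch below assumes the common rule of thumb that each additional bit of a uniform quantizer lowers the noise by about 6.02 dB, and takes per-band signal levels and masking thresholds (in dB) as given inputs; it is an illustration, not the allocation procedure of any particular standard, and `allocate_bits` is a hypothetical name.

```python
import math
from typing import List, Sequence

DB_PER_BIT = 6.02  # approximate noise reduction per bit of a uniform quantizer


def allocate_bits(signal_db: Sequence[float], mask_db: Sequence[float],
                  max_bits: int = 16) -> List[int]:
    """Bits per band so that noise (signal - 6.02*bits) sits at or below the mask, capped at max_bits."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m  # signal-to-mask ratio in dB
        b = 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)
        bits.append(min(b, max_bits))
    return bits


# Hypothetical levels for four bands; the last band lies below its mask and gets no bits.
print(allocate_bits([70.0, 60.0, 45.0, 20.0], [40.0, 42.0, 40.0, 25.0]))  # [5, 3, 1, 0]
```

Because the allocation follows the signal-dependent masking thresholds, the total number of bits per frame varies, which is the source of the variable-rate behavior noted above.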
`
`1.3 AUDIO CODER ATTRIBUTES
`
Perceptual audio coders are typically evaluated based on the following attributes:
audio reproduction quality, operating bit rates, computational complexity, codec
delay, and channel error robustness. The objective is to attain a high-quality
(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
10 million instructions per second, or MIPS).

Page 17
`
1.3.1 Audio Quality

Audio quality is of paramount importance when designing an audio coding
algorithm. Successful strides have been made since the development of simple
near-transparent perceptual coders. Typically, classical objective measures of
signal fidelity such as the signal-to-noise ratio (SNR) and the total harmonic
distortion (THD) are inadequate for perceptual coders because they do not
account for auditory masking [Ryde96]. As the field of perceptual audio coding
matured rapidly and created greater demand for listening tests, there was a
corresponding growth of interest in perceptual measurement schemes. Several
subjective and objective quality measures have been proposed and standardized
during the last decade. Some of these schemes include the noise-to-mask
ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the
perceptual objective measure (POM, 1995) [Colo95], and the objective audio signal
evaluation (OASE, 1997) [Spor97]. We will address these and several other
quality assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`
From a codec designer's point of view, one of the key challenges is to
represent high-fidelity audio with a minimum number of bits. For instance, if a
5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
bit rates imply high compression ratios and generally low reproduction
quality. Early coders such as the ISO/IEC MPEG-1 (32–448 kb/s), the Dolby AC-3
(32–384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
employ high bit rates for obtaining transparent audio reproduction. However, the
development of several sophisticated audio coding tools (e.g., MPEG-4 audio
tools) created ways for efficient transmission or storage of audio at rates between
8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable
quality at low rates along with the ability to scale both rate and quality to match
different requirements such as time-varying channel capacity.
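
The bit-rate arithmetic in this example generalizes directly: rate = bits per frame ÷ frame duration. The minimal sketch below (helper names are hypothetical) reproduces the 240-sample frame and the 16 kb/s rate for 80 bits per 5-ms frame at 48 kHz. As an added comparison not in the original example, it also reports the average bits per sample and the implied compression ratio against 16-bit mono PCM at the same sampling rate.

```python
def frame_samples(sample_rate_hz, frame_ms):
    """Number of samples in a frame of frame_ms milliseconds."""
    return round(sample_rate_hz * frame_ms / 1000.0)


def bit_rate_bps(bits_per_frame, frame_ms):
    """Bit rate in b/s when bits_per_frame bits are spent on each frame_ms-long frame."""
    return bits_per_frame / (frame_ms / 1000.0)


fs, frame_ms, bits = 48_000, 5.0, 80
n = frame_samples(fs, frame_ms)       # 240 samples per frame
rate = bit_rate_bps(bits, frame_ms)   # 16 kb/s
bits_per_sample = bits / n            # about 0.33 bit per sample
ratio = (fs * 16) / rate              # relative to 16-bit mono PCM at 768 kb/s
print(n, rate / 1000.0, round(bits_per_sample, 3), ratio)
```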
`
`1.3.3 Complexity
`
Reduced computational complexity not only enables real-time implementation
but may also decrease the power consumption and extend battery life.
Computational complexity is usually measured in terms of millions of instructions
per second (MIPS). Complexity estimates are processor-dependent. For example,
the complexity associated with Dolby's AC-3 decoder was estimated at
approximately 27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95];
for the Motorola DSP56002 processor, the complexity was estimated at 45
MIPS [Vern95]. Usually, most of the audio codecs rely on the so-called
asymmetric encoding principle. This means that the codec complexity is not evenly
shared between the encoder and the decoder (typically, encoder 80% and decoder
20% complexity), with more emphasis on reducing the decoder complexity.

Page 18
`
`1.3.4 Codec Delay
`
Many of the network applications for high-fidelity audio (streaming audio,
audio-on-demand) are delay tolerant (up to 100–200 ms), providing the opportunity
to exploit long-term signal properties in order to achieve high coding gain.
However, in two-way real-time communication and voice-over-Internet
protocol (VoIP) applications, low-delay encoding (10–20 ms) is important. Consider
the example described before, i.e., an audio coder operating on frames of 5 ms
at a 48 kHz sampling frequency. In an ideal encoding scenario, the minimum
amount of delay should be 5 ms at the encoder and 5 ms at the decoder (same as
the frame length). However, other factors such as the analysis-synthesis filter bank
window, the look-ahead, the bit reservoir, and the channel delay contribute
additional delays. Employing shorter analysis-synthesis windows, avoiding
look-ahead, and restructuring the bit-reservoir functions could result in low-delay
encoding, albeit with reduced coding efficiency.
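
The delay contributions listed above simply add up. A rough sketch of a one-way encoder-plus-decoder delay budget follows; the component names and the example look-ahead and overlap values are hypothetical, chosen only to show how quickly additions beyond the ideal 5 ms + 5 ms frame delay accumulate.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """Hypothetical one-way codec delay components, all in milliseconds."""
    frame_ms: float               # frame length (buffered at encoder and decoder)
    lookahead_ms: float = 0.0     # extra future samples needed by the analysis
    overlap_ms: float = 0.0       # analysis-synthesis window overlap
    bit_reservoir_ms: float = 0.0
    channel_ms: float = 0.0

    def total_ms(self) -> float:
        # Ideal case from the text: one frame at the encoder plus one at the decoder.
        return (2 * self.frame_ms + self.lookahead_ms + self.overlap_ms
                + self.bit_reservoir_ms + self.channel_ms)


ideal = DelayBudget(frame_ms=5.0)
with_extras = DelayBudget(frame_ms=5.0, lookahead_ms=2.5, overlap_ms=5.0)
print(ideal.total_ms(), with_extras.total_ms())
```

Comparing such a total against the 10–20 ms target for two-way communication shows which components must be shortened or removed.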
`
`
`1.3.5 Error Robustness
`
The increasing popularity of streaming audio over packet-switched and
wireless networks such as the Internet implies that any algorithm intended for such
applications must be able to deal with a noisy time-varying channel. In
particular, provisions for error robustness and error protection must be incorporated
at the encoder in