LIBRARY OF CONGRESS
`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK
`5102.92.S73 2007, Copy 1. The attached - Cover Page, Title Page, Copyright Page, Table of
`Contents Pages, Chapter 1 and Chapter 10 - are a true and complete representation from that
`work.
`
`THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
`Cataloging-in-Publication stamp dated March 5, 2007.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`Deirdre Scott
`
`Library of Congress
`
`Business Enterprises Officer
`Office of Business Enterprises
`
`101 Independence Avenue, SE Washington, DC 20540-4917 Tel 202.707.5650 www.loc.gov; duplicationservices@loc.gov
`
`NETFLIX, INC
`Exhibit 1009
`IPR2018-01630
`
`Page 1
`
`
`
`NETFLIX, INC
`Exhibit 1009
`IPR2018-01630
`
`

`

`
`
`
`
`
`
`
`
`Page 2
`
`

`

`
`
`AUDIO SIGNAL
`PROCESSING
`AND CODING
`
`
`
`Andreas Spanias
`Ted Painter
`
`Venkatraman Atti
`
`
`
`
`
`WILEY-INTERSCIENCE
`A John Wiley & Sons, Inc., Publication
`
`Page 3
`
`

`

`
`
`Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
`Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
`Published simultaneously in Canada.
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
`form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
`except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
`either the prior written permission of the Publisher, or authorization through payment of the
`appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
`MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
`to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
`Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
`http://www.wiley.com/go/permission.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
`efforts in preparing this book, they make no representations or warranties with respect to the
`accuracy or completeness of the contents of this book and specifically disclaim any implied
`warranties of merchantability or fitness for a particular purpose. No warranty may be created or
`extended by sales representatives or written sales materials. The advice and strategies contained
`herein may not be suitable for your situation. You should consult with a professional where
`appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
`commercial damages, including but not limited to special, incidental, consequential, or other
`damages.
`
`For general information on our other products and services or for technical support, please contact
`our Customer Care Department within the United States at (800) 762-2974, outside the United
`States at (317) 572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
`may not be available in electronic formats. For more information about Wiley products, visit our
`web site at www.wiley.com.
`
`Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
`Spanias, Andreas.
`  Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
`    p. cm.
`  "Wiley-Interscience publication."
`  Includes bibliographical references and index.
`  ISBN: 978-0-471-79147-8
`  1. Coding theory. 2. Signal processing--Digital techniques. 3. Sound--Recording and
`  reproducing--Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III. Title.
`
`TK5102.92.S73 2006
`621.382'8--dc22
`2006040313
`
`Printed in the United States of America.
`
`10 9 8 7 6 5 4 3 2 1
`
`
`
`Page 4
`
`
`

`

`
`CONTENTS
`
`PREFACE
`
`1 INTRODUCTION
`  1.1 Historical Perspective
`  1.2 A General Perceptual Audio Coding Architecture
`  1.3 Audio Coder Attributes
`    1.3.1 Audio Quality
`    1.3.2 Bit Rates
`    1.3.3 Complexity
`    1.3.4 Codec Delay
`    1.3.5 Error Robustness
`  1.4 Types of Audio Coders - An Overview
`  1.5 Organization of the Book
`  1.6 Notational Conventions
`  Problems
`  Computer Exercises
`
`2 SIGNAL PROCESSING ESSENTIALS
`  2.1 Introduction
`  2.2 Spectra of Analog Signals
`  2.3 Review of Convolution and Filtering
`  2.4 Uniform Sampling
`  2.5 Discrete-Time Signal Processing
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 5
`
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`    2.5.1 Transforms for Discrete-Time Signals
`    2.5.2 The Discrete and the Fast Fourier Transform
`    2.5.3 The Discrete Cosine Transform
`    2.5.4 The Short-Time Fourier Transform
`  2.6 Difference Equations and Digital Filters
`  2.7 The Transfer and the Frequency Response Functions
`    2.7.1 Poles, Zeros, and Frequency Response
`    2.7.2 Examples of Digital Filters for Audio Applications
`  2.8 Review of Multirate Signal Processing
`    2.8.1 Down-sampling by an Integer
`    2.8.2 Up-sampling by an Integer
`    2.8.3 Sampling Rate Changes by Noninteger Factors
`    2.8.4 Quadrature Mirror Filter Banks
`  2.9 Discrete-Time Random Signals
`    2.9.1 Random Signals Processed by LTI Digital Filters
`    2.9.2 Autocorrelation Estimation from Finite-Length Data
`  2.10 Summary
`  Problems
`  Computer Exercises
`
`3 QUANTIZATION AND ENTROPY CODING
`  3.1 Introduction
`    3.1.1 The Quantization-Bit Allocation-Entropy Coding Module
`  3.2 Density Functions and Quantization
`  3.3 Scalar Quantization
`    3.3.1 Uniform Quantization
`    3.3.2 Nonuniform Quantization
`    3.3.3 Differential PCM
`  3.4 Vector Quantization
`    3.4.1 Structured VQ
`    3.4.2 Split-VQ
`    3.4.3 Conjugate-Structure VQ
`  3.5 Bit-Allocation Algorithms
`  3.6 Entropy Coding
`    3.6.1 Huffman Coding
`    3.6.2 Rice Coding
`    3.6.3 Golomb Coding
`
`Page 6
`
`

`

`
`
`    3.6.4 Arithmetic Coding
`  3.7 Summary
`  Problems
`  Computer Exercises
`
`4 LINEAR PREDICTION IN NARROWBAND AND WIDEBAND CODING
`  4.1 Introduction
`  4.2 LP-Based Source-System Modeling for Speech
`  4.3 Short-Term Linear Prediction
`    4.3.1 Long-Term Prediction
`    4.3.2 ADPCM Using Linear Prediction
`  4.4 Open-Loop Analysis-Synthesis Linear Prediction
`  4.5 Analysis-by-Synthesis Linear Prediction
`    4.5.1 Code-Excited Linear Prediction Algorithms
`  4.6 Linear Prediction in Wideband Coding
`    4.6.1 Wideband Speech Coding
`    4.6.2 Wideband Audio Coding
`  4.7 Summary
`  Problems
`  Computer Exercises
`
`5 PSYCHOACOUSTIC PRINCIPLES
`  5.1 Introduction
`  5.2 Absolute Threshold of Hearing
`  5.3 Critical Bands
`  5.4 Simultaneous Masking, Masking Asymmetry, and the Spread of Masking
`    5.4.1 Noise-Masking-Tone
`    5.4.2 Tone-Masking-Noise
`    5.4.3 Noise-Masking-Noise
`    5.4.4 Asymmetry of Masking
`    5.4.5 The Spread of Masking
`  5.5 Nonsimultaneous Masking
`  5.6 Perceptual Entropy
`  5.7 Example Codec Perceptual Model: ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model 1
`    5.7.1 Step 1: Spectral Analysis and SPL Normalization
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 7
`
`
`

`

`
`
`    5.7.2 Step 2: Identification of Tonal and Noise Maskers
`    5.7.3 Step 3: Decimation and Reorganization of Maskers
`    5.7.4 Step 4: Calculation of Individual Masking Thresholds
`    5.7.5 Step 5: Calculation of Global Masking Thresholds
`  5.8 Perceptual Bit Allocation
`  5.9 Summary
`  Problems
`  Computer Exercises
`
`6 TIME-FREQUENCY ANALYSIS: FILTER BANKS AND TRANSFORMS
`  6.1 Introduction
`  6.2 Analysis-Synthesis Framework for M-band Filter Banks
`  6.3 Filter Banks for Audio Coding: Design Considerations
`    6.3.1 The Role of Time-Frequency Resolution in Masking Power Estimation
`    6.3.2 The Role of Frequency Resolution in Perceptual Bit Allocation
`    6.3.3 The Role of Time Resolution in Perceptual Bit Allocation
`  6.4 Quadrature Mirror and Conjugate Quadrature Filters
`  6.5 Tree-Structured QMF and CQF M-band Banks
`  6.6 Cosine Modulated "Pseudo QMF" M-band Banks
`  6.7 Cosine Modulated Perfect Reconstruction (PR) M-band Banks
`      and the Modified Discrete Cosine Transform (MDCT)
`    6.7.1 Forward and Inverse MDCT
`    6.7.2 MDCT Window Design
`    6.7.3 Example MDCT Windows (Prototype FIR Filters)
`  6.8 Discrete Fourier and Discrete Cosine Transform
`  6.9 Pre-echo Distortion
`  6.10 Pre-echo Control Strategies
`    6.10.1 Bit Reservoir
`    6.10.2 Window Switching
`    6.10.3 Hybrid, Switched Filter Banks
`    6.10.4 Gain Modification
`    6.10.5 Temporal Noise Shaping
`  6.11 Summary
`  Problems
`  Computer Exercises
`
`Page 8
`
`
`

`

7 TRANSFORM CODERS
`  7.1 Introduction
`  7.2 Optimum Coding in the Frequency Domain
`  7.3 Perceptual Transform Coder
`    7.3.1 PXFM
`    7.3.2 SEPXFM
`  7.4 Brandenburg-Johnston Hybrid Coder
`  7.5 CNET Coders
`    7.5.1 CNET DFT Coder
`    7.5.2 CNET MDCT Coder 1
`    7.5.3 CNET MDCT Coder 2
`  7.6 Adaptive Spectral Entropy Coding
`  7.7 Differential Perceptual Audio Coder
`  7.8 DFT Noise Substitution
`  7.9 DCT with Vector Quantization
`  7.10 MDCT with Vector Quantization
`  7.11 Summary
`  Problems
`  Computer Exercises
`
`8 SUBBAND CODERS
`  8.1 Introduction
`    8.1.1 Subband Algorithms
`  8.2 DWT and Discrete Wavelet Packet Transform (DWPT)
`  8.3 Adapted WP Algorithms
`    8.3.1 DWPT Coder with Globally Adapted Daubechies Analysis Wavelet
`    8.3.2 Scalable DWPT Coder with Adaptive Tree Structure
`    8.3.3 DWPT Coder with Globally Adapted General Analysis Wavelet
`    8.3.4 DWPT Coder with Adaptive Tree Structure and Locally Adapted Analysis Wavelet
`    8.3.5 DWPT Coder with Perceptually Optimized Synthesis Wavelets
`  8.4 Adapted Nonuniform Filter Banks
`    8.4.1 Switched Nonuniform Filter Bank Cascade
`    8.4.2 Frequency-Varying Modulated Lapped Transforms
`  8.5 Hybrid WP and Adapted WP/Sinusoidal Algorithms
`
`
`
`Page 9
`
`

`

`
`
`
`
`    8.5.1 Hybrid Sinusoidal/Classical DWPT Coder
`    8.5.2 Hybrid Sinusoidal/M-band DWPT Coder
`    8.5.3 Hybrid Sinusoidal/DWPT Coder with WP Tree Structure Adaptation (ARCO)
`  8.6 Subband Coding with Hybrid Filter Bank/CELP Algorithms
`    8.6.1 Hybrid Subband/CELP Algorithm for Low-Delay Applications
`    8.6.2 Hybrid Subband/CELP Algorithm for Low-Complexity Applications
`  8.7 Subband Coding with IIR Filter Banks
`  Problems
`  Computer Exercise
`
`9 SINUSOIDAL CODERS
`  9.1 Introduction
`  9.2 The Sinusoidal Model
`    9.2.1 Sinusoidal Analysis and Parameter Tracking
`    9.2.2 Sinusoidal Synthesis and Parameter Interpolation
`  9.3 Analysis/Synthesis Audio Codec (ASAC)
`    9.3.1 ASAC Segmentation
`    9.3.2 ASAC Sinusoidal Analysis-by-Synthesis
`    9.3.3 ASAC Bit Allocation, Quantization, Encoding, and Scalability
`  9.4 Harmonic and Individual Lines Plus Noise Coder (HILN)
`    9.4.1 HILN Sinusoidal Analysis-by-Synthesis
`    9.4.2 HILN Bit Allocation, Quantization, Encoding, and Decoding
`  9.5 FM Synthesis
`    9.5.1 Principles of FM Synthesis
`    9.5.2 Perceptual Audio Coding Using an FM Synthesis Model
`  9.6 The Sines + Transients + Noise (STN) Model
`  9.7 Hybrid Sinusoidal Coders
`    9.7.1 Hybrid Sinusoidal-MDCT Algorithm
`    9.7.2 Hybrid Sinusoidal-Vocoder Algorithm
`  9.8 Summary
`  Problems
`  Computer Exercises
`
`
`
`Page 10
`
`
`

`

10 AUDIO CODING STANDARDS AND ALGORITHMS
`  10.1 Introduction
`  10.2 MIDI Versus Digital Audio
`    10.2.1 MIDI Synthesizer
`    10.2.2 General MIDI (GM)
`    10.2.3 MIDI Applications
`  10.3 Multichannel Surround Sound
`    10.3.1 The Evolution of Surround Sound
`    10.3.2 The Mono, the Stereo, and the Surround Sound Formats
`    10.3.3 The ITU-R BS.775 5.1-Channel Configuration
`  10.4 MPEG Audio Standards
`    10.4.1 MPEG-1 Audio (ISO/IEC 11172-3)
`    10.4.2 MPEG-2 BC/LSF (ISO/IEC-13818-3)
`    10.4.3 MPEG-2 NBC/AAC (ISO/IEC-13818-7)
`    10.4.4 MPEG-4 Audio (ISO/IEC 14496-3)
`    10.4.5 MPEG-7 Audio (ISO/IEC 15938-4)
`    10.4.6 MPEG-21 Framework (ISO/IEC-21000)
`    10.4.7 MPEG Surround and Spatial Audio Coding
`  10.5 Adaptive Transform Acoustic Coding (ATRAC)
`  10.6 Lucent Technologies PAC, EPAC, and MPAC
`    10.6.1 Perceptual Audio Coder (PAC)
`    10.6.2 Enhanced PAC (EPAC)
`    10.6.3 Multichannel PAC (MPAC)
`  10.7 Dolby Audio Coding Standards
`    10.7.1 Dolby AC-2, AC-2A
`    10.7.2 Dolby AC-3/Dolby Digital/Dolby SR-D
`  10.8 Audio Processing Technology APT-x100
`  10.9 DTS - Coherent Acoustics
`    10.9.1 Framing and Subband Analysis
`    10.9.2 Psychoacoustic Analysis
`    10.9.3 ADPCM - Differential Subband Coding
`    10.9.4 Bit Allocation, Quantization, and Multiplexing
`    10.9.5 DTS-CA Versus Dolby Digital
`  Problems
`  Computer Exercise
`
`11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING
`  11.1 Introduction
`
`
`
`
`
`
`Page 11
`
`
`

`

`
`
`  11.2 Lossless Audio Coding (L2AC)
`    11.2.1 L2AC Principles
`    11.2.2 L2AC Algorithms
`  11.3 DVD-Audio
`    11.3.1 Meridian Lossless Packing (MLP)
`  11.4 Super-Audio CD (SACD)
`    11.4.1 SACD Storage Format
`    11.4.2 Sigma-Delta Modulators (SDM)
`    11.4.3 Direct Stream Digital (DSD) Encoding
`  11.5 Digital Audio Watermarking
`    11.5.1 Background
`    11.5.2 A Generic Architecture for DAW
`    11.5.3 DAW Schemes - Attributes
`  11.6 Summary of Commercial Applications
`  Problems
`  Computer Exercise
`
`12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING
`  12.1 Introduction
`  12.2 Subjective Quality Measures
`  12.3 Confounding Factors in Subjective Evaluations
`  12.4 Subjective Evaluations of Two-Channel Standardized Codecs
`  12.5 Subjective Evaluations of 5.1-Channel Standardized Codecs
`  12.6 Subjective Evaluations Using Perceptual Measurement Systems
`    12.6.1 CIR Perceptual Measurement Schemes
`    12.6.2 NSE Perceptual Measurement Schemes
`  12.7 Algorithms for Perceptual Measurement
`    12.7.1 Example: Perceptual Audio Quality Measure (PAQM)
`    12.7.2 Example: Noise-to-Mask Ratio (NMR)
`    12.7.3 Example: Objective Audio Signal Evaluation (OASE)
`  12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement
`  12.9 Research Directions for Perceptual Codec Quality Measures
`
`REFERENCES
`
`INDEX
`
`Page 12
`
`
`

`

`CHAPTER 1
`
`
`
`
`
`INTRODUCTION
`
`
`Audio coding or audio compression algorithms are used to obtain compact digital
`representations of high-fidelity (wideband) audio signals for the purpose of
`efficient transmission or storage. The central objective in audio coding is to
`represent the signal with a minimum number of bits while achieving transparent
`signal reproduction, i.e., generating output audio that cannot be distinguished
`from the original input, even by a sensitive listener ("golden ears"). This text
`gives an in-depth treatment of algorithms and standards for transparent coding
`of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
`The introduction of the compact disc (CD) in the early 1980s brought to the
`fore all of the advantages of digital audio representation, including true
`high-fidelity, dynamic range, and robustness. These advantages, however, came at
`the expense of high data rates. Conventional CD and digital audio tape (DAT)
`systems are typically sampled at either 44.1 or 48 kHz using pulse code
`modulation (PCM) with a 16-bit sample resolution. This results in uncompressed
`data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
`stereo-pair. Although these data rates were accommodated successfully in
`first-generation CD and DAT players, second-generation audio players and wirelessly
`connected systems are often subject to bandwidth constraints that are
`incompatible with high data rates. Because of the success enjoyed by the first-generation
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Audio Signal Processing and Coding, by Andreas Spanias, Ted Painter, and Venkatraman Atti
`Copyright © 2007 by John Wiley & Sons, Inc.
`
`
`
`Page 13
`
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`systems, however, end users have come to expect "CD-quality" audio reproduction
`from any digital system. Therefore, new network and wireless multimedia
`digital audio systems must reduce data rates without compromising reproduction
`quality. Motivated by the need for compression algorithms that can satisfy
`simultaneously the conflicting demands of high compression ratios and transparent
`quality for high-fidelity audio signals, several coding methodologies have
`been established over the last two decades. Audio compression schemes, in general,
`employ design techniques that exploit both perceptual irrelevancies and
`statistical redundancies.
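The uncompressed PCM data rates quoted in this section follow directly from sampling rate, sample resolution, and channel count. The short Python sketch below reproduces that arithmetic; the function name `pcm_bit_rate` is illustrative and does not come from the book.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# 16-bit PCM at the CD (44.1 kHz) and DAT (48 kHz) sampling rates.
for fs in (44_100, 48_000):
    mono = pcm_bit_rate(fs, 16)
    stereo = pcm_bit_rate(fs, 16, channels=2)
    print(f"{fs / 1000:g} kHz: mono {mono / 1e3:.1f} kb/s, "
          f"stereo {stereo / 1e6:.2f} Mb/s")
```

The printed values match the 705.6/768 kb/s (monaural) and 1.41/1.54 Mb/s (stereo) figures given in the text.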
`
`PCM was the primary audio encoding scheme employed until the early 1980s.
`PCM does not provide any mechanisms for redundancy removal. Quantization
`methods that exploit the signal correlation, such as differential PCM (DPCM),
`delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM) were applied
`to audio compression later (e.g., PC audio cards). Owing to the need for drastic
`reduction in bit rates, researchers began to pursue new approaches for audio
`coding based on the principles of psychoacoustics [Zwic90] [Moor03]. Psychoacoustic
`notions in conjunction with the basic properties of signal quantization
`have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
`entropy is a quantitative estimate of the fundamental limit of transparent audio
`signal compression. Another key contribution to the field was the characterization
`of the auditory filter bank and particularly the time-frequency analysis capabilities
`of the inner ear [Moor83]. Over the years, several filter-bank structures that
`mimic the critical band structure of the auditory filter bank have been proposed.
`A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
`which, when used in conjunction with a perceptual model, can play an important
`role in the identification of perceptual irrelevancies.
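As a rough illustration of how DPCM exploits signal correlation where plain PCM does not, the Python sketch below quantizes the difference between each sample and the previous reconstructed sample. It is a minimal sketch under assumptions of our own: the first-order predictor, the 100 Hz test tone at an 8 kHz sampling rate, and the step size of 0.01 are illustrative choices, not values taken from the book.

```python
import math


def dpcm_encode(samples, step):
    """First-order DPCM: quantize the difference from the previous reconstruction."""
    codes, prediction = [], 0.0
    for x in samples:
        code = round((x - prediction) / step)  # uniform quantizer index
        codes.append(code)
        prediction += code * step              # mirror the decoder's state
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


# Slowly varying test tone: neighboring samples are strongly correlated.
signal = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
step = 0.01
codes = dpcm_encode(signal, step)
decoded = dpcm_decode(codes, step)

max_error = max(abs(a - b) for a, b in zip(signal, decoded))
pcm_range = max(abs(round(x / step)) for x in signal)  # index range, plain PCM
dpcm_range = max(abs(c) for c in codes)                # index range, DPCM
print(f"max error {max_error:.4f}, PCM index range {pcm_range}, "
      f"DPCM index range {dpcm_range}")
```

With the same step size, the difference indices span a much smaller range than the direct PCM indices, so fewer bits per sample are needed for the same reconstruction error. Keeping the quantizer inside the prediction loop stops the decoder from accumulating error.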
`During the early 1990s, several workgroups and organizations such as
`the International Organization for Standardization/International Electro-technical
`Commission (ISO/IEC), the International Telecommunications Union (ITU),
`AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
`Philips, and Sony were actively involved in developing perceptual audio coding
`algorithms and standards. Some of the popular commercial standards published
`in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent
`Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC),
`Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive
`Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
`the prominent audio coding standards. The commercial success enjoyed by
`these audio coding standards triggered the launch of several multimedia storage
`formats.
`
`Table 1.2 lists some of the popular multimedia storage formats since the beginning
`of the CD era. High-performance stereo systems became quite common with
`the advent of CDs in the early 1980s. A compact-disc-read only memory (CD-ROM)
`can store data up to 700-800 MB in digital form as "microscopic-pits"
`that can be read by a laser beam off of a reflective surface or a medium. Three
`competing storage media - DAT, the digital compact cassette (DCC), and the
`
`
`
`Page 14
`
`
`

`

`Table 1.1. List of perceptual and lossless audio coding standards/algorithms.
`
`Standard/algorithm                                   Related references
`
`1. ISO/IEC MPEG-1 audio                              [ISOI92]
`2. Philips' PASC (for DCC applications)              [Lokh92]
`3. AT&T/Lucent PAC/EPAC                              [John96c] [Sinh96]
`4. Dolby AC-2                                        [Davi92] [Fiel91]
`5. AC-3/Dolby Digital                                [Davis93] [Fiel96]
`6. ISO/IEC MPEG-2 (BC/LSF) audio                     [ISOI94a]
`7. Sony's ATRAC (MiniDisc and SDDS)                  [Yosh94] [Tsut96]
`8. SHORTEN                                           [Robi94]
`9. Audio processing technology - APT-x100            [Wyli96b]
`10. ISO/IEC MPEG-2 AAC                               [ISOI96]
`11. DTS coherent acoustics                           [Smyt96] [Smyt99]
`12. The DVD Algorithm                                [Crav96] [Crav97]
`13. MUSICompress                                     [Wege97]
`14. Lossless transform coding of audio (LTAC)        [Pura97]
`15. AudioPaK                                         [Hans98b] [Hans01]
`16. ISO/IEC MPEG-4 audio version 1                   [ISOI99]
`17. Meridian lossless packing (MLP)                  [Gerz99]
`18. ISO/IEC MPEG-4 audio version 2                   [ISOI00]
`19. Audio coding based on integer transforms         [Geig01] [Geig02]
`20. Direct-stream digital (DSD) technology           [Reef01a] [Jans03]
`
`
`Table 1.2. Some of the popular audio storage formats.
`
`Audio storage format                                 Related references
`
`1. Compact disc                                      [CD82] [IECA87]
`2. Digital audio tape (DAT)                          [Watk88] [Tan89]
`3. Digital compact cassette (DCC)                    [Lokh91] [Lokh92]
`4. MiniDisc                                          [Yosh94] [Tsut96]
`5. Digital versatile disc (DVD)                      [DVD96]
`6. DVD-audio (DVD-A)                                 [DVD01]
`7. Super audio CD (SACD)                             [SACD02]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`MiniDisc (MD) - entered the commercial market during 1987-1992. Intended
`mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
`source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony
`proposed a storage medium called the MiniDisc, primarily for audio storage. MD
`employs the ATRAC algorithm for compression. In 1991, Philips introduced the
`DCC, a successor of the analog compact cassette. Philips DCC employs a
`compression scheme called the PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began
`
`
`
`Page 15
`
`

`

`
`
`
`as a potential competitor for DATs but was discontinued in 1996. The introduction
`of the digital versatile disc (DVD) in 1996 enabled both video and audio
`recording/storage as well as text-message programming. The DVD became one
`of the most successful storage media. With the improvements in the audio
`compression and DVD storage technologies, multichannel surround sound encoding
`formats gained interest [Bosi93] [Holm99] [Bosi00].
`With the emergence of streaming audio applications, during the late
`1990s, researchers pursued techniques such as combined speech and audio
`architectures, as well as joint source-channel coding algorithms that are optimized
`for the packet-switched Internet. The advent of ISO/IEC MPEG-4 standard
`(1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality
`coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
`than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
`algorithms with provisions for scalable, object-based speech and audio coding at
`bit rates from as low as 200 b/s up to 64 kb/s per channel.
`The emergence of the DVD-audio and the super audio CD (SACD) provided
`designers with additional storage capacity, which motivated research in
`lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
`is able to reconstruct perfectly a bit-for-bit representation of the original
`input audio. In contrast, a coding scheme incapable of perfect reconstruction is
`called lossy. For most audio program material, lossy schemes offer the advantage
`of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
`schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
`to the network browser at low bit rates is the next grand challenge for codec
`designers.
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
`Over the last few years, researchers have proposed several efficient signal models
`(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
`standards (Table 1.1) for high-quality digital audio reproduction. Most of these
`algorithms are based on the generic architecture shown in Figure 1.1.
`The coders typically segment input signals into quasi-stationary frames ranging
`from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
`and spectral components of each frame. The time-frequency mapping is usually
`matched to the analysis properties of the human auditory system. Either way,
`the ultimate objective is to extract from the input audio a set of time-frequency
`parameters that is amenable to quantization according to a perceptual distortion
`metric. Depending on the overall design objectives, the time-frequency analysis
`section usually contains one of the following:
`
`- Unitary transform
`
`- Time-invariant bank of critically sampled, uniform/nonuniform bandpass
`  filters
`
`
`Page 16
`
`
`

`

[Figure 1.1: block diagram. Blocks: time-frequency analysis, quantization and encoding,
`entropy (lossless) coding, MUX, psychoacoustic analysis, bit allocation. Signal labels:
`input audio, parameters, masking thresholds, side information, to channel.]
`
`Figure 1.1. A generic perceptual audio encoder.
`
`- Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform
`  bandpass filters
`
`- Harmonic/sinusoidal analyzer
`
`- Source-system analysis (LPC and multipulse excitation)
`
`- Hybrid versions of the above.
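The framing step that precedes any of these analysis options (segmenting the input into quasi-stationary frames of 2 to 50 ms) can be sketched as follows. This is a simplified illustration: the helper names are hypothetical, and the frames here do not overlap, whereas practical coders usually apply overlapping analysis windows.

```python
def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in a frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_into_frames(samples, frame_len):
    """Split into non-overlapping frames; a trailing partial frame is dropped."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Frame sizes at the two extremes of the 2-50 ms range, plus the 5 ms case.
for fs in (44_100, 48_000):
    sizes = [frame_length(fs, ms) for ms in (2, 5, 50)]
    print(f"{fs} Hz: 2/5/50 ms frames = {sizes} samples")

# One second of placeholder audio (sample indices stand in for real samples).
audio = list(range(48_000))
frames = split_into_frames(audio, frame_length(48_000, 5))
print(len(frames), "frames of", len(frames[0]), "samples")
```

At 48 kHz a 5 ms frame holds 240 samples, so one second of audio yields 200 such frames.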
`
`The choice of time-frequency analysis methodology always involves a fundamental
`tradeoff between time and frequency resolution requirements. Perceptual
`distortion control is achieved by a psychoacoustic signal analysis section
`that estimates signal masking power based on psychoacoustic principles. The
`psychoacoustic model delivers masking thresholds that quantify the maximum
`amount of distortion at each point in the time-frequency plane such that
`quantization of the time-frequency parameters does not introduce audible artifacts.
`The psychoacoustic model therefore allows the quantization section to exploit
`perceptual irrelevancies. This section can also exploit statistical redundancies
`through classical techniques such as DPCM or ADPCM. Once a quantized compact
`parametric set has been formed, the remaining redundancies are typically
`removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
`Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
`[Welc84]. Since the output of the psychoacoustic distortion control model is
`signal-dependent, most algorithms are inherently variable rate. Fixed channel
`rate requirements are usually satisfied through buffer feedback schemes, which
`often introduce encoding delays.
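To make the entropy coding stage more concrete, the Python sketch below builds a Huffman code for a small set of quantizer indices and compares the coded length with a fixed-length code. It is an illustrative sketch only: the index values are made-up sample data, and the function `huffman_code` is not taken from any standard or from the book.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from observed symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantizer indices: small magnitudes dominate.
indices = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
code = huffman_code(indices)
coded_bits = sum(len(code[s]) for s in indices)
fixed_bits = len(indices) * 3  # 3 bits suffice for 5 distinct symbols
print(f"Huffman: {coded_bits} bits, fixed-length: {fixed_bits} bits")
```

Because frequent indices receive short codewords, the number of coded bits depends on the signal statistics, which is one reason the text describes such coders as inherently variable rate.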
`
`1.3 AUDIO CODER ATTRIBUTES
`
`Perceptual audio coders are typically evaluated based on the following attributes:
`audio reproduction quality, operating bit rates, computational complexity, codec
`delay, and channel error robustness. The objective is to attain a high-quality
`(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
`
`Page 17
`
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
`10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`
`Audio quality is of paramount importance when designing an audio coding
`algorithm. Successful strides have been made since the development of simple
`near-transparent perceptual coders. Typically, classical objective measures of
`signal fidelity such as the signal to noise ratio (SNR) and the total harmonic
`distortion (THD) are inadequate [Ryde96]. As the field of perceptual audio coding
`matured rapidly and created greater demand for listening tests, there was a
`corresponding growth of interest in perceptual measurement schemes. Several
`subjective and objective quality measures have been proposed and standardized
`during the last decade. Some of these schemes include the noise-to-mask
`ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
`1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the
`perceptual objective measure (POM, 1995) [Colo95], and the objective audio signal
`evaluation (OASE, 1997) [Spor97]. We will address these and several other
`quality assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`
`From a codec designer's point of view, one of the key challenges is to
`represent high-fidelity audio with a minimum number of bits. For instance, if a
`5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
`using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
`bit rates imply high compression ratios and generally low reproduction
`quality. Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3
`(32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
`employ high bit rates for obtaining transparent audio reproduction. However, the
`development of several sophisticated audio coding tools (e.g., MPEG-4 audio
`tools) created ways for efficient transmission or storage of audio at rates between
`8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable
`quality at low rates along with the ability to scale both rate and quality to match
`different requirements such as time-varying channel capacity.
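
The frame arithmetic above can be checked with a short Python sketch. The function names are illustrative only, and the 16-bit mono PCM reference used for the compression ratio is an assumption that the text does not state.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def bit_rate_kbps(bits_per_frame, frame_ms):
    """Encoding bit rate in kb/s (bits per millisecond equals kb/s)."""
    return bits_per_frame / frame_ms


def compression_ratio(sample_rate_hz, pcm_bits_per_sample, coded_kbps):
    """Ratio of the uncompressed PCM rate to the coded rate.

    The PCM word length is an assumed reference, not a value from the text.
    """
    pcm_kbps = sample_rate_hz * pcm_bits_per_sample / 1000
    return pcm_kbps / coded_kbps


frame_samples = samples_per_frame(48000, 5.0)   # 5-ms frame at 48 kHz
rate = bit_rate_kbps(80, 5.0)                   # 80 bits per 5-ms frame
ratio = compression_ratio(48000, 16, rate)      # assumes 16-bit mono PCM

print(frame_samples, rate, ratio)
```

Under the assumed 16-bit mono reference (768 kb/s), the 16 kb/s example corresponds to a compression ratio of 48, which illustrates why low bit rates imply high compression ratios.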
`
`1.3.3 Complexity
`
`Reduced computational complexity not only enables real-time implementation
`but may also decrease the power consumption and extend battery life.
`Computational complexity is usually measured in terms of millions of instructions
`per second (MIPS). Complexity estimates are processor-dependent. For example,
`the complexity associated with Dolby's AC-3 decoder was estimated at
`approximately 27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95];
`for the Motorola DSP56002 processor, the complexity was estimated at 45
`MIPS [Vern95]. Usually, most of the audio codecs rely on the so-called
`asymmetric encoding principle. This means that the codec complexity is not evenly
`shared between the encoder and the decoder (typically, encoder 80% and decoder
`20% complexity), with more emphasis on reducing the decoder complexity.
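
As a rough illustration of these figures, the Python sketch below compares the two quoted AC-3 decoder estimates and applies the typical 80%/20% split. The 100-MIPS total is a hypothetical number chosen only to make the split easy to read, and the helper name is not from the text.

```python
def split_complexity(total_mips, encoder_share=0.8):
    """Split a total codec complexity between encoder and decoder.

    encoder_share=0.8 mirrors the typical 80%/20% asymmetry quoted above.
    """
    encoder_mips = total_mips * encoder_share
    decoder_mips = total_mips - encoder_mips
    return encoder_mips, decoder_mips


# Processor dependence: the same AC-3 decoder, estimated on two DSPs.
zoran_mips = 27.0      # Zoran ZR38001 estimate from the text
motorola_mips = 45.0   # Motorola DSP56002 estimate from the text
processor_factor = motorola_mips / zoran_mips

# Hypothetical total budget, used only to show the asymmetric split.
encoder_mips, decoder_mips = split_complexity(100.0)

print(round(processor_factor, 2), encoder_mips, decoder_mips)
```

The two estimates differ by a factor of about 1.67, so a MIPS figure is only meaningful alongside the processor it was measured on.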
`
`1.3.4 Codec Delay
`
`Many of the network applications for high-fidelity audio (streaming audio,
`audio-on-demand) are delay tolerant (up to 100-200 ms), providing the opportunity
`to exploit long-term signal properties in order to achieve high coding gain.
`However, in two-way real-time communication and voice-over Internet protocol
`(VoIP) applications, low-delay encoding (10-20 ms) is important. Consider
`the example described before, i.e., an audio coder operating on frames of 5 ms
`at a 48 kHz sampling frequency. In an ideal encoding scenario, the minimum
`amount of delay should be 5 ms at the encoder and 5 ms at the decoder (same as
`the frame length). However, other factors such as the analysis-synthesis filter
`bank window, the look-ahead, the bit-reservoir, and the channel delay contribute
`to additional delays. Employing shorter analysis-synthesis windows, avoiding
`look-ahead, and re-structuring the bit-reservoir functions could result in
`low-delay encoding, nonetheless, with reduced coding efficiencies.
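
A simple delay budget makes this trade-off concrete. In the Python sketch below, only the 5-ms frame length and the 10-20 ms and 100-200 ms targets come from the text. The class, its field names, and the look-ahead, filter bank, and channel values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """Additive model of end-to-end codec delay, all values in milliseconds."""
    frame_ms: float
    filterbank_ms: float = 0.0     # extra analysis-synthesis window delay
    lookahead_ms: float = 0.0
    bit_reservoir_ms: float = 0.0
    channel_ms: float = 0.0

    def ideal_ms(self):
        # One frame at the encoder plus one frame at the decoder.
        return 2 * self.frame_ms

    def total_ms(self):
        return (self.ideal_ms() + self.filterbank_ms + self.lookahead_ms
                + self.bit_reservoir_ms + self.channel_ms)


LOW_DELAY_LIMIT_MS = 20.0        # upper end of the 10-20 ms target
DELAY_TOLERANT_LIMIT_MS = 200.0  # upper end of the 100-200 ms tolerance

# Hypothetical contributions, chosen only for illustration.
budget = DelayBudget(frame_ms=5.0, filterbank_ms=5.0,
                     lookahead_ms=2.5, channel_ms=20.0)

ideal = budget.ideal_ms()
total = budget.total_ms()
fits_low_delay = total <= LOW_DELAY_LIMIT_MS
fits_streaming = total <= DELAY_TOLERANT_LIMIT_MS

print(ideal, total, fits_low_delay, fits_streaming)
```

With these placeholder values the ideal 10 ms fits the low-delay target, but the full budget of 37.5 ms does not, although it remains well within the tolerance of streaming applications.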
`
`1.3.5 Error Robustness
`
`The increasing popularity of streaming audio over packet-switched and
`wireless networks such as the Internet implies that any algorithm intended for such
`applications must be able to deal with a noisy time-varying channel. In
`particular, provisions for error robustness and error protection must be incorporated
`at the encoder in
