LIBRARY OF CONGRESS
`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK
`5102.92.S73 2007, Copy 1. The attached - Cover Page, Title Page, Copyright Page, Table of
`Contents Pages, Chapter 1 and Chapter 10 - are a true and complete representation from that
`work.
`
`THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
`Cataloging-in-Publication stamp dated March 5, 2007.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`
`Deirdre Scott
`Business Enterprises Officer
`Office of Business Enterprises
`
`Library of Congress
`
`101 Independence Avenue, SE Washington, DC 20540-4917 Tel 202.707.5650 www.loc.gov; duplicationservices@loc.gov
`
`NETFLIX, INC
`Exhibit 1009
`IPR2018-01630
`Page 1
`
`
`

`

`
`
`
`Page 2
`
`

`

`
`
`AUDIO SIGNAL
`PROCESSING
`AND CODING
`
`
`
`Andreas Spanias
`Ted Painter
`Venkatraman Atti
`
`WILEY-INTERSCIENCE
`A John Wiley & Sons, Inc., Publication
`
`
`
`
`
`Page 3
`
`
`
`
`
`
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`\
`
`Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
`Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
`Published simultaneously in Canada.
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
`form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
`except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
`either the prior written permission of the Publisher, or authorization through payment of the
`appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
`MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests
`to the Publisher for permission should be addressed to the Permissions Department, John Wiley &
`Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
`http://www.wiley.com/go/permission.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
`efforts in preparing this book, they make no representations or warranties with respect to the
`accuracy or completeness of the contents of this book and specifically disclaim any implied
`warranties of merchantability or fitness for a particular purpose. No warranty may be created or
`extended by sales representatives or written sales materials. The advice and strategies contained
`herein may not be suitable for your situation. You should consult with a professional where
`appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
`commercial damages, including but not limited to special, incidental, consequential, or other
`damages.
`
`For general information on our other products and services or for technical support, please contact
`our Customer Care Department within the United States at (800) 762-2974, outside the United
`States at (317) 572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
`may not be available in electronic formats. For more information about Wiley products, visit our
`web site at www.wiley.com.
`
`Wiley Bicentennial Logo: Richard J. Pacifico
`
`Library of Congress Cataloging-in-Publication Data:
`
`Spanias, Andreas.
`Audio signal processing and coding/by Andreas Spanias, Ted Painter, Venkatraman Atti.
`p. cm.
`“Wiley-Interscience publication.”
`Includes bibliographical references and index.
`ISBN: 978-0-471-79147-8
`1. Coding theory. 2. Signal processing-Digital techniques. 3. Sound-Recording and
`reproducing-Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III.
`Title.
`
`TK5102.92.S73 2006
`621.382'8-dc22
`
`2000040507
`
`Printed in the United States of America.
`
`1098765432
`
`
`
`
`
`
`
`Page 4
`
`
`

`

`
`CONTENTS
`
`PREFACE
`
`1 INTRODUCTION
`
`1.1 Historical Perspective
`1.2 A General Perceptual Audio Coding Architecture
`1.3 Audio Coder Attributes
`1.3.1 Audio Quality
`1.3.2 Bit Rates
`1.3.3 Complexity
`1.3.4 Codec Delay
`1.3.5 Error Robustness
`1.4 Types of Audio Coders - An Overview
`1.5 Organization of the Book
`1.6 Notational Conventions
`Problems
`Computer Exercises
`
`2 SIGNAL PROCESSING ESSENTIALS
`
`2.1 Introduction
`2.2 Spectra of Analog Signals
`2.3 Review of Convolution and Filtering
`2.4 Uniform Sampling
`2.5 Discrete-Time Signal Processing
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 5
`
`
`

`

`2.5.1 Transforms for Discrete-Time Signals
`2.5.2 The Discrete and the Fast Fourier Transform
`2.5.3 The Discrete Cosine Transform
`2.5.4 The Short-Time Fourier Transform
`2.6 Difference Equations and Digital Filters
`2.7 The Transfer and the Frequency Response Functions
`2.7.1 Poles, Zeros, and Frequency Response
`2.7.2 Examples of Digital Filters for Audio Applications
`2.8 Review of Multirate Signal Processing
`2.8.1 Down-sampling by an Integer
`2.8.2 Up-sampling by an Integer
`2.8.3 Sampling Rate Changes by Noninteger Factors
`2.8.4 Quadrature Mirror Filter Banks
`2.9 Discrete-Time Random Signals
`2.9.1 Random Signals Processed by LTI Digital Filters
`2.9.2 Autocorrelation Estimation from Finite-Length Data
`2.10 Summary
`Problems
`Computer Exercises
`
`3 QUANTIZATION AND ENTROPY CODING
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`3.1 Introduction
`3.1.1 The Quantization-Bit Allocation-Entropy Coding Module
`3.2 Density Functions and Quantization
`3.3 Scalar Quantization
`3.3.1 Uniform Quantization
`3.3.2 Nonuniform Quantization
`3.3.3 Differential PCM
`3.4 Vector Quantization
`3.4.1 Structured VQ
`3.4.2 Split-VQ
`3.4.3 Conjugate-Structure VQ
`3.5 Bit-Allocation Algorithms
`3.6 Entropy Coding
`3.6.1 Huffman Coding
`3.6.2 Rice Coding
`3.6.3 Golomb Coding
`
`Page 6
`
`

`

`
`
`3.6.4 Arithmetic Coding
`3.7 Summary
`Problems
`Computer Exercises
`
`4 LINEAR PREDICTION IN NARROWBAND AND WIDEBAND CODING
`
`4.1 Introduction
`4.2 LP-Based Source-System Modeling for Speech
`4.3 Short-Term Linear Prediction
`4.3.1 Long-Term Prediction
`4.3.2 ADPCM Using Linear Prediction
`4.4 Open-Loop Analysis-Synthesis Linear Prediction
`4.5 Analysis-by-Synthesis Linear Prediction
`4.5.1 Code-Excited Linear Prediction Algorithms
`4.6 Linear Prediction in Wideband Coding
`4.6.1 Wideband Speech Coding
`4.6.2 Wideband Audio Coding
`4.7 Summary
`Problems
`Computer Exercises
`
`5 PSYCHOACOUSTIC PRINCIPLES
`
`5.1 Introduction
`5.2 Absolute Threshold of Hearing
`5.3 Critical Bands
`5.4 Simultaneous Masking, Masking Asymmetry, and the Spread of Masking
`5.4.1 Noise-Masking-Tone
`5.4.2 Tone-Masking-Noise
`5.4.3 Noise-Masking-Noise
`5.4.4 Asymmetry of Masking
`5.4.5 The Spread of Masking
`5.5 Nonsimultaneous Masking
`5.6 Perceptual Entropy
`5.7 Example Codec Perceptual Model: ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model 1
`5.7.1 Step 1: Spectral Analysis and SPL Normalization
`
`
`
`Page 7
`
`
`

`

`
`
`5.7.2 Step 2: Identification of Tonal and Noise Maskers
`5.7.3 Step 3: Decimation and Reorganization of Maskers
`5.7.4 Step 4: Calculation of Individual Masking Thresholds
`5.7.5 Step 5: Calculation of Global Masking Thresholds
`5.8 Perceptual Bit Allocation
`5.9 Summary
`Problems
`Computer Exercises
`
`6 TIME-FREQUENCY ANALYSIS: FILTER BANKS AND TRANSFORMS
`
`6.1 Introduction
`6.2 Analysis-Synthesis Framework for M-band Filter Banks
`6.3 Filter Banks for Audio Coding: Design Considerations
`6.3.1 The Role of Time-Frequency Resolution in Masking Power Estimation
`6.3.2 The Role of Frequency Resolution in Perceptual Bit Allocation
`6.3.3 The Role of Time Resolution in Perceptual Bit Allocation
`6.4 Quadrature Mirror and Conjugate Quadrature Filters
`6.5 Tree-Structured QMF and CQF M-band Banks
`6.6 Cosine Modulated “Pseudo QMF” M-band Banks
`6.7 Cosine Modulated Perfect Reconstruction (PR) M-band Banks and the Modified Discrete Cosine Transform (MDCT)
`6.7.1 Forward and Inverse MDCT
`6.7.2 MDCT Window Design
`6.7.3 Example MDCT Windows (Prototype FIR Filters)
`6.8 Discrete Fourier and Discrete Cosine Transform
`6.9 Pre-echo Distortion
`6.10 Pre-echo Control Strategies
`6.10.1 Bit Reservoir
`6.10.2 Window Switching
`6.10.3 Hybrid, Switched Filter Banks
`6.10.4 Gain Modification
`6.10.5 Temporal Noise Shaping
`6.11 Summary
`Problems
`Computer Exercises
`
`Page 8
`
`
`

`

`
`
`7 TRANSFORM CODERS
`
`7.1 Introduction
`7.2 Optimum Coding in the Frequency Domain
`7.3 Perceptual Transform Coder
`7.3.1 PXFM
`7.3.2 SEPXFM
`7.4 Brandenburg-Johnston Hybrid Coder
`7.5 CNET Coders
`7.5.1 CNET DFT Coder
`7.5.2 CNET MDCT Coder 1
`7.5.3 CNET MDCT Coder 2
`7.6 Adaptive Spectral Entropy Coding
`7.7 Differential Perceptual Audio Coder
`7.8 DFT Noise Substitution
`7.9 DCT with Vector Quantization
`7.10 MDCT with Vector Quantization
`7.11 Summary
`Problems
`Computer Exercises
`
`8 SUBBAND CODERS
`
`8.1 Introduction
`8.1.1 Subband Algorithms
`8.2 DWT and Discrete Wavelet Packet Transform (DWPT)
`8.3 Adapted WP Algorithms
`8.3.1 DWPT Coder with Globally Adapted Daubechies Analysis Wavelet
`8.3.2 Scalable DWPT Coder with Adaptive Tree Structure
`8.3.3 DWPT Coder with Globally Adapted General Analysis Wavelet
`8.3.4 DWPT Coder with Adaptive Tree Structure and Locally Adapted Analysis Wavelet
`8.3.5 DWPT Coder with Perceptually Optimized Synthesis Wavelets
`8.4 Adapted Nonuniform Filter Banks
`8.4.1 Switched Nonuniform Filter Bank Cascade
`8.4.2 Frequency-Varying Modulated Lapped Transforms
`8.5 Hybrid WP and Adapted WP/Sinusoidal Algorithms
`
`Page 9
`
`
`

`

`
`
`8.5.1 Hybrid Sinusoidal/Classical DWPT Coder
`8.5.2 Hybrid Sinusoidal/M-band DWPT Coder
`8.5.3 Hybrid Sinusoidal/DWPT Coder with WP Tree Structure Adaptation (ARCO)
`8.6 Subband Coding with Hybrid Filter Bank/CELP Algorithms
`8.6.1 Hybrid Subband/CELP Algorithm for Low-Delay Applications
`8.6.2 Hybrid Subband/CELP Algorithm for Low-Complexity Applications
`8.7 Subband Coding with IIR Filter Banks
`Problems
`Computer Exercise
`
`9 SINUSOIDAL CODERS
`
`9.1 Introduction
`9.2 The Sinusoidal Model
`9.2.1 Sinusoidal Analysis and Parameter Tracking
`9.2.2 Sinusoidal Synthesis and Parameter Interpolation
`9.3 Analysis/Synthesis Audio Codec (ASAC)
`9.3.1 ASAC Segmentation
`9.3.2 ASAC Sinusoidal Analysis-by-Synthesis
`9.3.3 ASAC Bit Allocation, Quantization, Encoding, and Scalability
`9.4 Harmonic and Individual Lines Plus Noise Coder (HILN)
`9.4.1 HILN Sinusoidal Analysis-by-Synthesis
`9.4.2 HILN Bit Allocation, Quantization, Encoding, and Decoding
`9.5 FM Synthesis
`9.5.1 Principles of FM Synthesis
`9.5.2 Perceptual Audio Coding Using an FM Synthesis Model
`9.6 The Sines + Transients + Noise (STN) Model
`9.7 Hybrid Sinusoidal Coders
`9.7.1 Hybrid Sinusoidal-MDCT Algorithm
`9.7.2 Hybrid Sinusoidal-Vocoder Algorithm
`9.8 Summary
`Problems
`Computer Exercises
`
`
`
`Page 10
`
`
`

`

`10 AUDIO CODING STANDARDS AND ALGORITHMS
`
`10.1 Introduction
`10.2 MIDI Versus Digital Audio
`10.2.1 MIDI Synthesizer
`10.2.2 General MIDI (GM)
`10.2.3 MIDI Applications
`10.3 Multichannel Surround Sound
`10.3.1 The Evolution of Surround Sound
`10.3.2 The Mono, the Stereo, and the Surround Sound Formats
`10.3.3 The ITU-R BS.775 5.1-Channel Configuration
`10.4 MPEG Audio Standards
`10.4.1 MPEG-1 Audio (ISO/IEC 11172-3)
`10.4.2 MPEG-2 BC/LSF (ISO/IEC-13818-3)
`10.4.3 MPEG-2 NBC/AAC (ISO/IEC-13818-7)
`10.4.4 MPEG-4 Audio (ISO/IEC 14496-3)
`10.4.5 MPEG-7 Audio (ISO/IEC 15938-4)
`10.4.6 MPEG-21 Framework (ISO/IEC-21000)
`10.4.7 MPEG Surround and Spatial Audio Coding
`10.5 Adaptive Transform Acoustic Coding (ATRAC)
`10.6 Lucent Technologies PAC, EPAC, and MPAC
`10.6.1 Perceptual Audio Coder (PAC)
`10.6.2 Enhanced PAC (EPAC)
`10.6.3 Multichannel PAC (MPAC)
`10.7 Dolby Audio Coding Standards
`10.7.1 Dolby AC-2, AC-2A
`10.7.2 Dolby AC-3/Dolby Digital/Dolby SR-D
`10.8 Audio Processing Technology APT-x100
`10.9 DTS - Coherent Acoustics
`10.9.1 Framing and Subband Analysis
`10.9.2 Psychoacoustic Analysis
`10.9.3 ADPCM - Differential Subband Coding
`10.9.4 Bit Allocation, Quantization, and Multiplexing
`10.9.5 DTS-CA Versus Dolby Digital
`Problems
`Computer Exercise
`
`11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING
`
`11.1 Introduction
`
`
`
`
`
`Page 11
`
`
`

`

`11.2 Lossless Audio Coding (L²AC)
`11.2.1 L²AC Principles
`11.2.2 L²AC Algorithms
`11.3 DVD-Audio
`11.3.1 Meridian Lossless Packing (MLP)
`11.4 Super-Audio CD (SACD)
`11.4.1 SACD Storage Format
`11.4.2 Sigma-Delta Modulators (SDM)
`11.4.3 Direct Stream Digital (DSD) Encoding
`11.5 Digital Audio Watermarking
`11.5.1 Background
`11.5.2 A Generic Architecture for DAW
`11.5.3 DAW Schemes - Attributes
`11.6 Summary of Commercial Applications
`Problems
`Computer Exercise
`
`12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING
`
`12.1 Introduction
`12.2 Subjective Quality Measures
`12.3 Confounding Factors in Subjective Evaluations
`12.4 Subjective Evaluations of Two-Channel Standardized Codecs
`12.5 Subjective Evaluations of 5.1-Channel Standardized Codecs
`12.6 Subjective Evaluations Using Perceptual Measurement Systems
`12.6.1 CIR Perceptual Measurement Schemes
`12.6.2 NSE Perceptual Measurement Schemes
`12.7 Algorithms for Perceptual Measurement
`12.7.1 Example: Perceptual Audio Quality Measure (PAQM)
`12.7.2 Example: Noise-to-Mask Ratio (NMR)
`12.7.3 Example: Objective Audio Signal Evaluation (OASE)
`12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement
`12.9 Research Directions for Perceptual Codec Quality Measures
`
`REFERENCES
`
`INDEX
`Page 12
`
`
`

`

`
`
`
`CHAPTER 1
`
`INTRODUCTION
`
`Audio coding or audio compression algorithms are used to obtain compact digital
`representations of high-fidelity (wideband) audio signals for the purpose of
`efficient transmission or storage. The central objective in audio coding is to
`represent the signal with a minimum number of bits while achieving transparent
`signal reproduction, i.e., generating output audio that cannot be distinguished
`from the original input, even by a sensitive listener (“golden ears”). This text
`gives an in-depth treatment of algorithms and standards for transparent coding
`of high-fidelity audio.
`
`1.1 HISTORICAL PERSPECTIVE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`The introduction of the compact disc (CD) in the early 1980s brought to the
`fore all of the advantages of digital audio representation, including true
`high-fidelity, dynamic range, and robustness. These advantages, however, came at
`the expense of high data rates. Conventional CD and digital audio tape (DAT)
`systems are typically sampled at either 44.1 or 48 kHz using pulse code
`modulation (PCM) with a 16-bit sample resolution. This results in uncompressed
`data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a
`stereo-pair. Although these data rates were accommodated successfully in
`first-generation CD and DAT players, second-generation audio players and wirelessly
`connected systems are often subject to bandwidth constraints that are
`incompatible with high data rates. Because of the success enjoyed by the first-generation
`
`Audio Signal Processing and Coding, by Andreas Spanias, Ted Painter, and Venkatraman Atti
`Copyright © 2007 by John Wiley & Sons, Inc.
`
`
`
`Page 13
`
`
`systems, however, end users have come to expect “CD-quality” audio reproduction
`from any digital system. Therefore, new network and wireless multimedia
`digital audio systems must reduce data rates without compromising reproduction
`quality. Motivated by the need for compression algorithms that can satisfy
`simultaneously the conflicting demands of high compression ratios and transparent
`quality for high-fidelity audio signals, several coding methodologies have
`been established over the last two decades. Audio compression schemes, in general,
`employ design techniques that exploit both perceptual irrelevancies and
`statistical redundancies.
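As a quick check of the uncompressed PCM data rates quoted earlier in this section (705.6/768 kb/s mono, 1.41/1.54 Mb/s stereo), the following Python sketch multiplies sampling rate, sample resolution, and channel count. The function name `pcm_bit_rate` is illustrative only and does not appear in the text.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    # 16-bit PCM at the two sampling rates named in the text.
    for fs in (44_100, 48_000):
        mono = pcm_bit_rate(fs, 16)
        stereo = pcm_bit_rate(fs, 16, channels=2)
        print(f"{fs} Hz: mono {mono / 1e3:.1f} kb/s, stereo {stereo / 1e6:.3f} Mb/s")
```

The stereo figures come out as 1.411 and 1.536 Mb/s, which the text rounds to 1.41 and 1.54 Mb/s.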
`PCM was the primary audio encoding scheme employed until the early 1980s.
`PCM does not provide any mechanisms for redundancy removal. Quantization
`methods that exploit the signal correlation, such as differential PCM (DPCM),
`delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM) were applied
`to audio compression later (e.g., PC audio cards). Owing to the need for drastic
`reduction in bit rates, researchers began to pursue new approaches for audio
`coding based on the principles of psychoacoustics [Zwic90] [Moor03].
`Psychoacoustic notions in conjunction with the basic properties of signal quantization
`have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual
`entropy is a quantitative estimate of the fundamental limit of transparent audio
`signal compression. Another key contribution to the field was the characterization
`of the auditory filter bank and particularly the time-frequency analysis capabilities
`of the inner ear [Moor83]. Over the years, several filter-bank structures that
`mimic the critical band structure of the auditory filter bank have been proposed.
`A filter bank is a parallel bank of bandpass filters covering the audio spectrum,
`which, when used in conjunction with a perceptual model, can play an important
`role in the identification of perceptual irrelevancies.
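To illustrate how a differential scheme such as DPCM exploits sample-to-sample correlation, here is a minimal Python sketch. It assumes the simplest possible predictor (the previous reconstructed sample) and a uniform quantizer with a fixed step size. Both choices, and the function names, are assumptions made for illustration and are not details given in the text.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the running prediction."""
    codes, prediction = [], 0.0
    for x in samples:
        code = round((x - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step              # mirror the decoder's reconstruction
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized prediction errors."""
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 2.5]
    codes = dpcm_encode(signal, step=0.5)
    print(codes, dpcm_decode(codes, step=0.5))
```

Because the encoder tracks the decoder's reconstruction rather than the original samples, the quantization error does not accumulate: each reconstructed sample stays within half a step of the input.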
`During the early 1990s, several workgroups and organizations such as
`the International Organization for Standardization/International Electro-technical
`Commission (ISO/IEC), the International Telecommunications Union (ITU),
`AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies,
`Philips, and Sony were actively involved in developing perceptual audio coding
`algorithms and standards. Some of the popular commercial standards published
`in the early 1990s include Dolby’s Audio Coder-3 (AC-3), the DTS Coherent
`Acoustics (DTS-CA), Lucent Technologies’ Perceptual Audio Coder (PAC),
`Philips’ Precision Adaptive Subband Coding (PASC), and Sony’s Adaptive
`Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of
`the prominent audio coding standards. The commercial success enjoyed by
`these audio coding standards triggered the launch of several multimedia storage
`formats.
`Table 1.2 lists some of the popular multimedia storage formats since the beginning
`of the CD era. High-performance stereo systems became quite common with
`the advent of CDs in the early 1980s. A compact-disc-read only memory (CD-ROM)
`can store data up to 700-800 MB in digital form as “microscopic-pits”
`that can be read by a laser beam off of a reflective surface or a medium. Three
`competing storage media - DAT, the digital compact cassette (DCC), and the
`
`
`
`
`Page 14
`
`

`

`Table 1.1. List of perceptual and lossless audio coding standards/algorithms.
`
`Standard/algorithm                                  Related references
` 1. ISO/IEC MPEG-1 audio                            [ISOI92]
` 2. Philips' PASC (for DCC applications)            [Lokh92]
` 3. AT&T/Lucent PAC/EPAC                            [John96c] [Sinh96]
` 4. Dolby AC-2                                      [Davi92] [Fiel91]
` 5. AC-3/Dolby Digital                              [Davis93] [Fiel96]
` 6. ISO/IEC MPEG-2 (BC/LSF) audio                   [ISOI94a]
` 7. Sony's ATRAC (MiniDisc and SDDS)                [Yosh94] [Tsut96]
` 8. SHORTEN                                         [Robi94]
` 9. Audio processing technology - APT-x100          [Wyli96b]
`10. ISO/IEC MPEG-2 AAC                              [ISOI96]
`11. DTS coherent acoustics                          [Smyt96] [Smyt99]
`12. The DVD Algorithm                               [Crav96] [Crav97]
`13. MUSICompress                                    [Wege97]
`14. Lossless transform coding of audio (LTAC)       [Pura97]
`15. AudioPak                                        [Hans98b] [Hans01]
`16. ISO/IEC MPEG-4 audio version 1                  [ISOI99]
`17. Meridian lossless packing (MLP)                 [Gerz99]
`18. ISO/IEC MPEG-4 audio version 2                  [ISOI00]
`19. Audio coding based on integer transforms        [Geig01] [Geig02]
`20. Direct-stream digital (DSD) technology          [Reef01a] [Jans03]
`
`Table 1.2. Some of the popular audio storage formats.
`
`Audio storage format                 Related references
`1. Compact disc                      [CD82] [IECA87]
`2. Digital audio tape (DAT)          [Watk88] [Tan89]
`3. Digital compact cassette (DCC)    [Lokh91] [Lokh92]
`4. MiniDisc                          [Yosh94] [Tsut96]
`5. Digital versatile disc (DVD)      [DVD96]
`6. DVD-audio (DVD-A)                 [DVD01]
`7. Super audio CD (SACD)             [SACD02]
`
`
`MiniDisc (MD) - entered the commercial market during 1987-1992. Intended
`mainly for back-up high-density storage (~1.3 GB), the DAT became the primary
`source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony
`proposed a storage medium called the MiniDisc, primarily for audio storage. MD
`employs the ATRAC algorithm for compression. In 1991, Philips introduced the
`DCC, a successor of the analog compact cassette. Philips DCC employs a
`compression scheme called the PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 15
`
`
`as a potential competitor for DATs but was discontinued in 1996. The introduction
`of the digital versatile disc (DVD) in 1996 enabled both video and audio
`recording/storage as well as text-message programming. The DVD became one
`of the most successful storage media. With the improvements in the audio
`compression and DVD storage technologies, multichannel surround sound encoding
`formats gained interest [Bosi93] [Holm99] [Bosi00].
`With the emergence of streaming audio applications, during the late
`1990s, researchers pursued techniques such as combined speech and audio
`architectures, as well as joint source-channel coding algorithms that are optimized
`for the packet-switched Internet. The advent of ISO/IEC MPEG-4 standard
`(1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality
`coding of audio at low bit rates. MPEG-4 audio encompasses more functionality
`than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of
`algorithms with provisions for scalable, object-based speech and audio coding at
`bit rates from as low as 200 b/s up to 64 kb/s per channel.
`The emergence of the DVD-audio and the super audio CD (SACD) provided
`designers with additional storage capacity, which motivated research in
`lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system
`is able to reconstruct perfectly a bit-for-bit representation of the original
`input audio. In contrast, a coding scheme incapable of perfect reconstruction is
`called lossy. For most audio program material, lossy schemes offer the advantage
`of lower bit rates (e.g., less than 1 bit per sample) relative to lossless
`schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content
`to the network browser at low bit rates is the next grand challenge for codec
`designers.
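The bits-per-sample figures above can be turned into bit rates and compression ratios with a short Python sketch. It assumes a 16-bit PCM reference and a 44.1 kHz sampling rate for the example. The function names are illustrative and are not taken from the text.

```python
def coded_bit_rate(sample_rate_hz, coded_bits_per_sample):
    """Bit rate in bits per second for a given average number of bits per sample."""
    return sample_rate_hz * coded_bits_per_sample


def compression_ratio(coded_bits_per_sample, pcm_bits_per_sample=16):
    """Ratio of the uncompressed PCM word length to the coded bits per sample."""
    return pcm_bits_per_sample / coded_bits_per_sample


if __name__ == "__main__":
    for label, bits in (("lossless, 10 bits/sample", 10), ("lossy, 1 bit/sample", 1)):
        rate = coded_bit_rate(44_100, bits)
        print(f"{label}: {rate / 1e3:.1f} kb/s, ratio {compression_ratio(bits):.1f}:1")
```

Under these assumptions, a lossless coder at 10 bits per sample gives a ratio of only 1.6:1, while a lossy coder at 1 bit per sample reaches 16:1.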
`
`1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
`
`Over the last few years, researchers have proposed several efficient signal models
`(e.g., transform-based, subband-filter structures, wavelet-packet) and compression
`standards (Table 1.1) for high-quality digital audio reproduction. Most of these
`algorithms are based on the generic architecture shown in Figure 1.1.
`The coders typically segment input signals into quasi-stationary frames ranging
`from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal
`and spectral components of each frame. The time-frequency mapping is usually
`matched to the analysis properties of the human auditory system. Either way,
`the ultimate objective is to extract from the input audio a set of time-frequency
`parameters that is amenable to quantization according to a perceptual distortion
`metric. Depending on the overall design objectives, the time-frequency analysis
`section usually contains one of the following:
`
`• Unitary transform
`• Time-invariant bank of critically sampled, uniform/nonuniform bandpass
`filters
`
`
`Page 16
`
`
`

`

`[Figure 1.1 block diagram. Blocks: time-frequency analysis; psychoacoustic analysis; bit allocation; quantization and encoding; entropy (lossless) coding; MUX. Labeled signals: input audio, parameters, masking thresholds, side information, output to channel.]
`
`rates (<32 kb/s), with an acceptable
`
`Figure 1.1. A generic perceptual audio encoder.
`
`• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform
`bandpass filters
`• Harmonic/sinusoidal analyzer
`• Source-system analysis (LPC and multipulse excitation)
`• Hybrid versions of the above.
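The framing step that precedes the time-frequency analysis can be sketched in Python as follows. This minimal version assumes non-overlapping frames and drops any incomplete final frame. Practical coders commonly use overlapping windows, so this is a simplification, and the function name is hypothetical.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms):
    """Split a sample sequence into consecutive, non-overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)  # samples per frame
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


if __name__ == "__main__":
    audio = list(range(1000))  # stand-in for 1000 PCM samples
    frames = split_into_frames(audio, sample_rate_hz=48_000, frame_ms=5)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

At 48 kHz, the 2 to 50 ms range quoted above corresponds to frames of 96 to 2400 samples.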
`
`The choice of time-frequency analysis methodology always involves a fundamental
`tradeoff between time and frequency resolution requirements. Perceptual
`distortion control is achieved by a psychoacoustic signal analysis section
`that estimates signal masking power based on psychoacoustic principles. The
`psychoacoustic model delivers masking thresholds that quantify the maximum
`amount of distortion at each point in the time-frequency plane such that
`quantization of the time-frequency parameters does not introduce audible artifacts.
`The psychoacoustic model therefore allows the quantization section to exploit
`perceptual irrelevancies. This section can also exploit statistical redundancies
`through classical techniques such as DPCM or ADPCM. Once a quantized compact
`parametric set has been formed, the remaining redundancies are typically
`removed through noiseless run-length (RL) and entropy coding techniques, e.g.,
`Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77]
`[Welc84]. Since the output of the psychoacoustic distortion control model is
`signal-dependent, most algorithms are inherently variable rate. Fixed channel
`rate requirements are usually satisfied through buffer feedback schemes, which
`often introduce encoding delays.
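The noiseless stage can be illustrated with a small Python sketch of run-length coding together with a first-order entropy estimate, which indicates the average number of bits per symbol an ideal entropy coder would need. This is a generic illustration under simple assumptions (symbol-by-symbol runs, empirical probabilities), not the scheme of any particular coder in Table 1.1. The function names are hypothetical.

```python
import math
from collections import Counter


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


def empirical_entropy(values):
    """First-order entropy in bits per symbol, from observed symbol frequencies."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


if __name__ == "__main__":
    quantized = [3, 0, 0, 0, 0, 1, 1, 0]  # stand-in for quantized parameters
    runs = run_length_encode(quantized)
    print(runs, f"{empirical_entropy(quantized):.2f} bits/symbol")
```

Both steps are lossless: decoding the runs returns the quantized values exactly, so any distortion in the coder comes from the quantization stage alone.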
`
`1.3 AUDIO CODER ATTRIBUTES
`
`Perceptual audio coders are typically evaluated based on the following attributes:
`audio reproduction quality, operating bit rates, computational complexity, codec
`delay, and channel error robustness. The objective is to attain a high-quality
`(transparent) audio output at low bit rates (<32 kb/s), with an acceptable
`
`Page 17
`
`
`algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to
`10 million instructions per second, or MIPS).
`
`1.3.1 Audio Quality
`
`Audio quality is of paramount importance when designing an audio coding
`algorithm. Successful strides have been made since the development of simple
`near-transparent perceptual coders. Typically, classical objective measures of
`signal fidelity such as the signal to noise ratio (SNR) and the total harmonic
`distortion (THD) are inadequate [Ryde96]. As the field of perceptual audio coding
`matured rapidly and created greater demand for listening tests, there was a
`corresponding growth of interest in perceptual measurement schemes. Several
`subjective and objective quality measures have been proposed and standardized
`during the last decade. Some of these schemes include the noise-to-mask
`ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM,
`1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the
`perceptual objective measure (POM, 1995) [Colo95], and the objective audio signal
`evaluation (OASE, 1997) [Spor97]. We will address these and several other quality
`assessment schemes in detail in Chapter 12.
`
`1.3.2 Bit Rates
`
`From a codec designer’s point of view, one of the key challenges is to represent
`high-fidelity audio with a minimum number of bits. For instance, if a
`5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented
`using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low
`bit rates imply high compression ratios and generally low reproduction quality.
`Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3
`(32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s)
`employ high bit rates for obtaining transparent audio reproduction. However, the
`development of several sophisticated audio coding tools (e.g., MPEG-4 audio
`tools) created ways for efficient transmission or storage of audio at rates between
`8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality
`at low rates along with the ability to scale both rate and quality to match
`different requirements such as time-varying channel capacity.
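The frame-based bit-rate example in this section (80 bits for a 5-ms frame at 48 kHz) can be reproduced with the following Python sketch. The function names are illustrative. The compression ratio relative to 16-bit PCM is an added calculation, not a figure stated in the text.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def frame_bit_rate(bits_per_frame, frame_ms):
    """Encoding bit rate in bits per second."""
    return bits_per_frame * 1000 / frame_ms


if __name__ == "__main__":
    n = samples_per_frame(48_000, 5)
    rate = frame_bit_rate(80, 5)
    pcm_rate = 48_000 * 16  # 16-bit mono PCM reference
    print(f"{n} samples/frame, {rate / 1e3:.0f} kb/s, ratio {pcm_rate / rate:.0f}:1")
```

For these numbers the coder spends one third of a bit per sample on average, a 48:1 reduction relative to 16-bit PCM at the same sampling rate.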
`
`1.3.3 Complexity
`
`Reduced computational complexity not only enables real-time implementation
`but may also decrease the power consumption and extend battery life.
`Computational complexity is usually measured in terms of millions of instructions
`per second (MIPS). Complexity estimates are processor-dependent. For example,
`the complexity associated with Dolby’s AC-3 decoder was estimated at approximately
`27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95];
`for the Motorola DSP56002 processor, the complexity was estimated at 45
`MIPS [Vern95]. Usually, most of the audio codecs rely on the so-called asymmetric
`encoding principle. This means that the codec complexity is not evenly
`
`
`Page 18
`
`

`

`shared between the encoder and the decoder (typically, encoder 80% and decoder
`20% complexity), with more emphasis on reducing the decoder complexity.
`
`1.3.4 Codec Delay
`Many of the network applications for high-fidelity audio (streaming audio,
`audio-on-demand) are delay tolerant (up to 100-200 ms), providing the opportunity
`to exploit long-term signal properties in order to achieve high coding gain.
`However, in two-way real-time communication and voice-over Internet protocol
`(VoIP) applications, low-delay encoding (10-20 ms) is important. Consider
`the example described before, i.e., an audio coder operating on frames of 5 ms
`at a 48 kHz sampling frequency. In an ideal encoding scenario, the minimum
`amount of delay should be 5 ms at the encoder and 5 ms at the decoder (same as
`the frame length). However, other factors such as the analysis-synthesis filter bank
`window, the look-ahead, the bit-reservoir, and the channel delay contribute to
`additional delays. Employing shorter analysis-synthesis windows, avoiding look-ahead,
`and re-structuring the bit-reservoir functions could result in low-delay
`encoding, although at the cost of reduced coding efficiency.
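
The delay contributions listed above can be tallied in a rough sketch. The class and field names are hypothetical, and the non-zero values in the usage example are illustrative placeholders rather than measurements from any codec. Only the 5-ms frame length and the 10-20 ms low-delay target come from the text.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """Additive delay model for a frame-based codec; all values in ms."""
    frame_ms: float
    window_ms: float = 0.0         # extra analysis-synthesis window delay
    look_ahead_ms: float = 0.0     # encoder look-ahead
    bit_reservoir_ms: float = 0.0  # buffering caused by the bit reservoir
    channel_ms: float = 0.0        # transmission delay

    def ideal_ms(self) -> float:
        """One frame at the encoder plus one frame at the decoder."""
        return 2.0 * self.frame_ms

    def total_ms(self) -> float:
        """Ideal delay plus every additional contribution."""
        return (self.ideal_ms() + self.window_ms + self.look_ahead_ms
                + self.bit_reservoir_ms + self.channel_ms)

    def meets_target(self, limit_ms: float) -> bool:
        """True if the total delay stays within the given limit."""
        return self.total_ms() <= limit_ms


if __name__ == "__main__":
    ideal = DelayBudget(frame_ms=5.0)
    print(ideal.total_ms(), ideal.meets_target(20.0))

    # Illustrative (made-up) extra contributions.
    loaded = DelayBudget(5.0, window_ms=5.0, look_ahead_ms=2.5,
                         bit_reservoir_ms=10.0, channel_ms=20.0)
    print(loaded.total_ms(), loaded.meets_target(20.0))
```

In this simple additive model, removing the look-ahead or shrinking the window lowers the total directly, which mirrors the low-delay design options described in the text.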
`
`1.3.5 Error Robustness
`The increasing popularity of streaming audio over packet-switched and wireless
`networks such as the Internet implies that any algorithm intended for such
`applications must be able to deal with a noisy time-varying channel. In particular,
`provisions for error robustness and error protection must be incorporated
`at the encoder in order to achieve reliable transmission of digital audio over
`error-prone channels. One simple idea could be to provide better protection to
`the error-sensitive and priority (important) bits. For instance, the audio frame
`header requires the maximum error robustness; otherwise, transmission errors
`in the header will seriously impair the entire audio frame. Several error
`detecting/correcting codes [Lin82] [Wick95] [Bayl97] [Swee02] [Zara02] can also be
`employed. Inclusion of error correcting codes in the bitstream might help to obtain
`error-free reproduction of the input audio, but at the cost of increased complexity
`and bit rates.
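
The idea of giving the header stronger protection than the rest of the frame can be illustrated with a toy sketch. It uses a 3x repetition code with majority voting, chosen here only for brevity; it is not the scheme of any particular standard, and the function names and frame layout are hypothetical.

```python
from typing import List, Tuple


def protect(header: List[int], payload: List[int], repeat: int = 3) -> List[int]:
    """Repeat each header bit `repeat` times; send the payload unprotected."""
    coded_header = [bit for bit in header for _ in range(repeat)]
    return coded_header + list(payload)


def recover(received: List[int], header_len: int,
            repeat: int = 3) -> Tuple[List[int], List[int]]:
    """Majority-vote each group of header bits; pass the payload through."""
    coded_len = header_len * repeat
    header = []
    for start in range(0, coded_len, repeat):
        group = received[start:start + repeat]
        header.append(1 if 2 * sum(group) > repeat else 0)
    return header, list(received[coded_len:])


if __name__ == "__main__":
    header, payload = [1, 0, 1, 1], [0, 1, 1, 0, 0, 1]
    frame = protect(header, payload)

    frame[1] ^= 1    # channel error inside the protected header
    frame[13] ^= 1   # channel error inside the unprotected payload

    rx_header, rx_payload = recover(frame, header_len=len(header))
    print(rx_header == header)    # header error is corrected
    print(rx_payload == payload)  # payload error is not
    print(len(frame), len(header) + len(payload))  # protection costs bits
```

The header survives a single bit error per group, while the frame grows from 10 to 18 bits, a small-scale instance of the complexity and bit-rate cost noted above.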
`From the discussion in the previous sections, it is evident that several tradeoffs
`must be considered in designing an algorithm for a particular application. For this
`reason, audio coding standards consist of several tools that enable the design of
`scalable algorithms. For example, MPEG-4 provides tools to design algorithms
`that satisfy a
