LIBRARY OF CONGRESS

Office of Business Enterprises
Duplication Services Section

THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound volume entitled AUDIO SIGNAL PROCESSING AND CODING, call number TK 5102.92.S73 2007, Copy 1. The attached (Cover Page, Title Page, Copyright Page, Table of Contents Pages, Chapter 1 and Chapter 10) are a true and complete representation from that work.

THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress Cataloging-in-Publication stamp dated March 5, 2007.

IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on May 18, 2018.

Deirdre Scott
Business Enterprises Officer
Office of Business Enterprises
Library of Congress

101 Independence Avenue, SE, Washington, DC 20540-4917; Tel 202.707.5650; www.loc.gov; duplicationservices@loc.gov

HULU LLC
Exhibit 1009
IPR2018-01187
Page 1

Andreas Spanias, Ted Painter, and Venkatraman Atti

AUDIO SIGNAL PROCESSING AND CODING

Andreas Spanias
Ted Painter
Venkatraman Atti

[Logo: 1807 WILEY 2007, BICENTENNIAL]

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication

Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:

Spanias, Andreas.
Audio signal processing and coding / by Andreas Spanias, Ted Painter, Venkatraman Atti.
p. cm.
"Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN: 978-0-471-79147-8
1. Coding theory. 2. Signal processing--Digital techniques. 3. Sound--Recording and reproducing--Digital techniques. I. Painter, Ted, 1967- II. Atti, Venkatraman, 1978- III. Title.

TK5102.92.S73 2006
621.382'8--dc22

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

2006040507

CONTENTS

PREFACE ... xv

1 INTRODUCTION ... 1
1.1 Historical Perspective
1.2 A General Perceptual Audio Coding Architecture
1.3 Audio Coder Attributes
1.3.1 Audio Quality
1.3.2 Bit Rates
1.3.3 Complexity
1.3.4 Codec Delay
1.3.5 Error Robustness
1.4 Types of Audio Coders - An Overview
1.5 Organization of the Book
1.6 Notational Conventions
Problems
Computer Exercises

2 SIGNAL PROCESSING ESSENTIALS ... 13
2.1 Introduction
2.2 Spectra of Analog Signals
2.3 Review of Convolution and Filtering
2.4 Uniform Sampling
2.5 Discrete-Time Signal Processing
2.5.1 Transforms for Discrete-Time Signals
2.5.2 The Discrete and the Fast Fourier Transform
2.5.3 The Discrete Cosine Transform
2.5.4 The Short-Time Fourier Transform
2.6 Difference Equations and Digital Filters
2.7 The Transfer and the Frequency Response Functions
2.7.1 Poles, Zeros, and Frequency Response
2.7.2 Examples of Digital Filters for Audio Applications
2.8 Review of Multirate Signal Processing
2.8.1 Down-sampling by an Integer
2.8.2 Up-sampling by an Integer
2.8.3 Sampling Rate Changes by Noninteger Factors
2.8.4 Quadrature Mirror Filter Banks
2.9 Discrete-Time Random Signals
2.9.1 Random Signals Processed by LTI Digital Filters
2.9.2 Autocorrelation Estimation from Finite-Length Data
2.10 Summary
Problems
Computer Exercises

3 QUANTIZATION AND ENTROPY CODING ... 51
3.1 Introduction
3.1.1 The Quantization-Bit Allocation-Entropy Coding Module
3.2 Density Functions and Quantization
3.3 Scalar Quantization
3.3.1 Uniform Quantization
3.3.2 Nonuniform Quantization
3.3.3 Differential PCM
3.4 Vector Quantization
3.4.1 Structured VQ
3.4.2 Split-VQ
3.4.3 Conjugate-Structure VQ
3.5 Bit-Allocation Algorithms
3.6 Entropy Coding
3.6.1 Huffman Coding
3.6.2 Rice Coding
3.6.3 Golomb Coding
3.6.4 Arithmetic Coding
3.7 Summary
Problems
Computer Exercises

4 LINEAR PREDICTION IN NARROWBAND AND WIDEBAND CODING ... 91
4.1 Introduction
4.2 LP-Based Source-System Modeling for Speech
4.3 Short-Term Linear Prediction
4.3.1 Long-Term Prediction
4.3.2 ADPCM Using Linear Prediction
4.4 Open-Loop Analysis-Synthesis Linear Prediction
4.5 Analysis-by-Synthesis Linear Prediction
4.5.1 Code-Excited Linear Prediction Algorithms
4.6 Linear Prediction in Wideband Coding
4.6.1 Wideband Speech Coding
4.6.2 Wideband Audio Coding
4.7 Summary
Problems
Computer Exercises

5 PSYCHOACOUSTIC PRINCIPLES ... 113
5.1 Introduction
5.2 Absolute Threshold of Hearing
5.3 Critical Bands
5.4 Simultaneous Masking, Masking Asymmetry, and the Spread of Masking
5.4.1 Noise-Masking-Tone
5.4.2 Tone-Masking-Noise
5.4.3 Noise-Masking-Noise
5.4.4 Asymmetry of Masking
5.4.5 The Spread of Masking
5.5 Nonsimultaneous Masking
5.6 Perceptual Entropy
5.7 Example Codec Perceptual Model: ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model 1
5.7.1 Step 1: Spectral Analysis and SPL Normalization
5.7.2 Step 2: Identification of Tonal and Noise Maskers
5.7.3 Step 3: Decimation and Reorganization of Maskers
5.7.4 Step 4: Calculation of Individual Masking Thresholds
5.7.5 Step 5: Calculation of Global Masking Thresholds
5.8 Perceptual Bit Allocation
5.9 Summary
Problems
Computer Exercises

6 TIME-FREQUENCY ANALYSIS: FILTER BANKS AND TRANSFORMS ... 145
6.1 Introduction
6.2 Analysis-Synthesis Framework for M-band Filter Banks
6.3 Filter Banks for Audio Coding: Design Considerations
6.3.1 The Role of Time-Frequency Resolution in Masking Power Estimation
6.3.2 The Role of Frequency Resolution in Perceptual Bit Allocation
6.3.3 The Role of Time Resolution in Perceptual Bit Allocation
6.4 Quadrature Mirror and Conjugate Quadrature Filters
6.5 Tree-Structured QMF and CQF M-band Banks
6.6 Cosine Modulated "Pseudo QMF" M-band Banks
6.7 Cosine Modulated Perfect Reconstruction (PR) M-band Banks and the Modified Discrete Cosine Transform (MDCT)
6.7.1 Forward and Inverse MDCT
6.7.2 MDCT Window Design
6.7.3 Example MDCT Windows (Prototype FIR Filters)
6.8 Discrete Fourier and Discrete Cosine Transform
6.9 Pre-echo Distortion
6.10 Pre-echo Control Strategies
6.10.1 Bit Reservoir
6.10.2 Window Switching
6.10.3 Hybrid, Switched Filter Banks
6.10.4 Gain Modification
6.10.5 Temporal Noise Shaping
6.11 Summary
Problems
Computer Exercises

7 TRANSFORM CODERS ... 195
7.1 Introduction
7.2 Optimum Coding in the Frequency Domain
7.3 Perceptual Transform Coder
7.3.1 PXFM
7.3.2 SEPXFM
7.4 Brandenburg-Johnston Hybrid Coder
7.5 CNET Coders
7.5.1 CNET DFT Coder
7.5.2 CNET MDCT Coder 1
7.5.3 CNET MDCT Coder 2
7.6 Adaptive Spectral Entropy Coding
7.7 Differential Perceptual Audio Coder
7.8 DFT Noise Substitution
7.9 DCT with Vector Quantization
7.10 MDCT with Vector Quantization
7.11 Summary
Problems
Computer Exercises

8 SUBBAND CODERS ... 211
8.1 Introduction
8.1.1 Subband Algorithms
8.2 DWT and Discrete Wavelet Packet Transform (DWPT)
8.3 Adapted WP Algorithms
8.3.1 DWPT Coder with Globally Adapted Daubechies Analysis Wavelet
8.3.2 Scalable DWPT Coder with Adaptive Tree Structure
8.3.3 DWPT Coder with Globally Adapted General Analysis Wavelet
8.3.4 DWPT Coder with Adaptive Tree Structure and Locally Adapted Analysis Wavelet
8.3.5 DWPT Coder with Perceptually Optimized Synthesis Wavelets
8.4 Adapted Nonuniform Filter Banks
8.4.1 Switched Nonuniform Filter Bank Cascade
8.4.2 Frequency-Varying Modulated Lapped Transforms
8.5 Hybrid WP and Adapted WP/Sinusoidal Algorithms
8.5.1 Hybrid Sinusoidal/Classical DWPT Coder
8.5.2 Hybrid Sinusoidal/M-band DWPT Coder
8.5.3 Hybrid Sinusoidal/DWPT Coder with WP Tree Structure Adaptation (ARCO)
8.6 Subband Coding with Hybrid Filter Bank/CELP Algorithms
8.6.1 Hybrid Subband/CELP Algorithm for Low-Delay Applications
8.6.2 Hybrid Subband/CELP Algorithm for Low-Complexity Applications
8.7 Subband Coding with IIR Filter Banks
Problems
Computer Exercise

9 SINUSOIDAL CODERS ... 241
9.1 Introduction
9.2 The Sinusoidal Model
9.2.1 Sinusoidal Analysis and Parameter Tracking
9.2.2 Sinusoidal Synthesis and Parameter Interpolation
9.3 Analysis/Synthesis Audio Codec (ASAC)
9.3.1 ASAC Segmentation
9.3.2 ASAC Sinusoidal Analysis-by-Synthesis
9.3.3 ASAC Bit Allocation, Quantization, Encoding, and Scalability
9.4 Harmonic and Individual Lines Plus Noise Coder (HILN)
9.4.1 HILN Sinusoidal Analysis-by-Synthesis
9.4.2 HILN Bit Allocation, Quantization, Encoding, and Decoding
9.5 FM Synthesis
9.5.1 Principles of FM Synthesis
9.5.2 Perceptual Audio Coding Using an FM Synthesis Model
9.6 The Sines + Transients + Noise (STN) Model
9.7 Hybrid Sinusoidal Coders
9.7.1 Hybrid Sinusoidal-MDCT Algorithm
9.7.2 Hybrid Sinusoidal-Vocoder Algorithm
9.8 Summary
Problems
Computer Exercises

10 AUDIO CODING STANDARDS AND ALGORITHMS ... 263
10.1 Introduction
10.2 MIDI Versus Digital Audio
10.2.1 MIDI Synthesizer
10.2.2 General MIDI (GM)
10.2.3 MIDI Applications
10.3 Multichannel Surround Sound
10.3.1 The Evolution of Surround Sound
10.3.2 The Mono, the Stereo, and the Surround Sound Formats
10.3.3 The ITU-R BS.775 5.1-Channel Configuration
10.4 MPEG Audio Standards
10.4.1 MPEG-1 Audio (ISO/IEC 11172-3)
10.4.2 MPEG-2 BC/LSF (ISO/IEC-13818-3)
10.4.3 MPEG-2 NBC/AAC (ISO/IEC-13818-7)
10.4.4 MPEG-4 Audio (ISO/IEC 14496-3)
10.4.5 MPEG-7 Audio (ISO/IEC 15938-4)
10.4.6 MPEG-21 Framework (ISO/IEC-21000)
10.4.7 MPEG Surround and Spatial Audio Coding
10.5 Adaptive Transform Acoustic Coding (ATRAC)
10.6 Lucent Technologies PAC, EPAC, and MPAC
10.6.1 Perceptual Audio Coder (PAC)
10.6.2 Enhanced PAC (EPAC)
10.6.3 Multichannel PAC (MPAC)
10.7 Dolby Audio Coding Standards
10.7.1 Dolby AC-2, AC-2A
10.7.2 Dolby AC-3/Dolby Digital/Dolby SR·D
10.8 Audio Processing Technology APT-x100
10.9 DTS - Coherent Acoustics
10.9.1 Framing and Subband Analysis
10.9.2 Psychoacoustic Analysis
10.9.3 ADPCM - Differential Subband Coding
10.9.4 Bit Allocation, Quantization, and Multiplexing
10.9.5 DTS-CA Versus Dolby Digital
Problems
Computer Exercise

11 LOSSLESS AUDIO CODING AND DIGITAL WATERMARKING ... 343
11.1 Introduction
11.2 Lossless Audio Coding (L2AC)
11.2.1 L2AC Principles
11.2.2 L2AC Algorithms
11.3 DVD-Audio
11.3.1 Meridian Lossless Packing (MLP)
11.4 Super-Audio CD (SACD)
11.4.1 SACD Storage Format
11.4.2 Sigma-Delta Modulators (SDM)
11.4.3 Direct Stream Digital (DSD) Encoding
11.5 Digital Audio Watermarking
11.5.1 Background
11.5.2 A Generic Architecture for DAW
11.5.3 DAW Schemes - Attributes
11.6 Summary of Commercial Applications
Problems
Computer Exercise

12 QUALITY MEASURES FOR PERCEPTUAL AUDIO CODING ... 383
12.1 Introduction
12.2 Subjective Quality Measures
12.3 Confounding Factors in Subjective Evaluations
12.4 Subjective Evaluations of Two-Channel Standardized Coders
12.5 Subjective Evaluations of 5.1-Channel Standardized Coders
12.6 Subjective Evaluations Using Perceptual Measurement Systems
12.6.1 CIR Perceptual Measurement Schemes
12.6.2 NSE Perceptual Measurement Schemes
12.7 Algorithms for Perceptual Measurement
12.7.1 Example: Perceptual Audio Quality Measure (PAQM)
12.7.2 Example: Noise-to-Mask Ratio (NMR)
12.7.3 Example: Objective Audio Signal Evaluation (OASE)
12.8 ITU-R BS.1387 and ITU-T P.861: Standards for Perceptual Quality Measurement
12.9 Research Directions for Perceptual Codec Quality Measures

REFERENCES ... 405

INDEX ... 459

CHAPTER 1

INTRODUCTION

Audio coding or audio compression algorithms are used to obtain compact digital representations of high-fidelity (wideband) audio signals for the purpose of efficient transmission or storage. The central objective in audio coding is to represent the signal with a minimum number of bits while achieving transparent signal reproduction, i.e., generating output audio that cannot be distinguished from the original input, even by a sensitive listener ("golden ears"). This text gives an in-depth treatment of algorithms and standards for transparent coding of high-fidelity audio.

1.1 HISTORICAL PERSPECTIVE

The introduction of the compact disc (CD) in the early 1980s brought to the fore all of the advantages of digital audio representation, including true high fidelity, dynamic range, and robustness. These advantages, however, came at the expense of high data rates. Conventional CD and digital audio tape (DAT) systems are typically sampled at either 44.1 or 48 kHz using pulse code modulation (PCM) with a 16-bit sample resolution. This results in uncompressed data rates of 705.6/768 kb/s for a monaural channel, or 1.41/1.54 Mb/s for a stereo pair. Although these data rates were accommodated successfully in first-generation CD and DAT players, second-generation audio players and wirelessly connected systems are often subject to bandwidth constraints that are incompatible with high data rates. Because of the success enjoyed by the first-generation systems, however, end users have come to expect "CD-quality" audio reproduction from any digital system. Therefore, new network and wireless multimedia digital audio systems must reduce data rates without compromising reproduction quality. Motivated by the need for compression algorithms that can simultaneously satisfy the conflicting demands of high compression ratios and transparent quality for high-fidelity audio signals, several coding methodologies have been established over the last two decades. Audio compression schemes, in general, employ design techniques that exploit both perceptual irrelevancies and statistical redundancies.

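The uncompressed rates quoted above are simply the sampling rate times the bits per sample times the number of channels. A minimal Python sketch of that arithmetic (the helper name `pcm_bit_rate` is ours, for illustration only):

```python
def pcm_bit_rate(fs_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels

for fs in (44_100, 48_000):
    mono = pcm_bit_rate(fs, 16)
    stereo = pcm_bit_rate(fs, 16, channels=2)
    print(f"{fs / 1000:g} kHz: {mono / 1e3:.1f} kb/s mono, {stereo / 1e6:.4f} Mb/s stereo")
# 44.1 kHz: 705.6 kb/s mono, 1.4112 Mb/s stereo
# 48 kHz: 768.0 kb/s mono, 1.5360 Mb/s stereo
```

The 1.41/1.54 Mb/s stereo figures in the text are these values rounded.
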
PCM was the primary audio encoding scheme employed until the early 1980s. PCM does not provide any mechanisms for redundancy removal. Quantization methods that exploit the signal correlation, such as differential PCM (DPCM), delta modulation [Jaya76] [Jaya84], and adaptive DPCM (ADPCM), were applied to audio compression later (e.g., PC audio cards). Owing to the need for drastic reduction in bit rates, researchers began to pursue new approaches for audio coding based on the principles of psychoacoustics [Zwie90] [Moor03]. Psychoacoustic notions in conjunction with the basic properties of signal quantization have led to the theory of perceptual entropy [John88a] [John88b]. Perceptual entropy is a quantitative estimate of the fundamental limit of transparent audio signal compression. Another key contribution to the field was the characterization of the auditory filter bank and particularly the time-frequency analysis capabilities of the inner ear [Moor83]. Over the years, several filter-bank structures that mimic the critical band structure of the auditory filter bank have been proposed. A filter bank is a parallel bank of bandpass filters covering the audio spectrum, which, when used in conjunction with a perceptual model, can play an important role in the identification of perceptual irrelevancies.

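To make "exploiting signal correlation" concrete, here is a minimal closed-loop DPCM sketch. It assumes a fixed first-order predictor (coefficient `a = 0.9`, chosen only for illustration) and a fixed uniform quantizer with step `step`; ADPCM coders adapt the predictor and/or the step size, which this sketch does not attempt.

```python
import numpy as np

def dpcm_encode(x: np.ndarray, a: float = 0.9, step: float = 0.05) -> np.ndarray:
    """Closed-loop DPCM: quantize the prediction residual and return integer indices."""
    indices = np.empty(len(x), dtype=np.int64)
    recon_prev = 0.0  # encoder's copy of the decoder's previous output sample
    for n, sample in enumerate(x):
        prediction = a * recon_prev
        indices[n] = int(np.round((sample - prediction) / step))
        # Update with the quantized value so encoder and decoder stay in sync.
        recon_prev = prediction + indices[n] * step
    return indices

def dpcm_decode(indices: np.ndarray, a: float = 0.9, step: float = 0.05) -> np.ndarray:
    """Rebuild the signal by adding each dequantized residual to the prediction."""
    y = np.empty(len(indices))
    recon_prev = 0.0
    for n, q in enumerate(indices):
        recon_prev = a * recon_prev + q * step
        y[n] = recon_prev
    return y

# Usage: a 440-Hz tone sampled at 48 kHz.
fs = 48_000
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(480) / fs)
idx = dpcm_encode(x)
y = dpcm_decode(idx)
print("max reconstruction error:", np.max(np.abs(x - y)))  # never exceeds step / 2
print("largest |index|:", np.max(np.abs(idx)))
```

Feeding the predictor with the reconstructed (rather than the original) previous sample is what keeps quantization errors from accumulating at the decoder: each output sample is off by at most half a quantizer step.
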
During the early 1990s, several workgroups and organizations such as the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), the International Telecommunications Union (ITU), AT&T, Dolby Laboratories, Digital Theatre Systems (DTS), Lucent Technologies, Philips, and Sony were actively involved in developing perceptual audio coding algorithms and standards. Some of the popular commercial standards published in the early 1990s include Dolby's Audio Coder-3 (AC-3), the DTS Coherent Acoustics (DTS-CA), Lucent Technologies' Perceptual Audio Coder (PAC), Philips' Precision Adaptive Subband Coding (PASC), and Sony's Adaptive Transform Acoustic Coding (ATRAC). Table 1.1 lists chronologically some of the prominent audio coding standards. The commercial success enjoyed by these audio coding standards triggered the launch of several multimedia storage formats.

Table 1.2 lists some of the popular multimedia storage formats since the beginning of the CD era. High-performance stereo systems became quite common with the advent of CDs in the early 1980s. A compact disc read-only memory (CD-ROM) can store up to 700-800 MB of data in digital form as "microscopic pits" that are read by a laser beam reflected off the disc surface. Three competing storage media entered the commercial market during 1987-1992: DAT, the digital compact cassette (DCC), and the MiniDisc (MD). Intended mainly for back-up high-density storage (~1.3 GB), the DAT became the primary source of mass data storage/transfer [Watk88] [Tan89]. In 1991-1992, Sony proposed a storage medium called the MiniDisc, primarily for audio storage. MD employs the ATRAC algorithm for compression. In 1991, Philips introduced the DCC, a successor of the analog compact cassette. Philips DCC employs a compression scheme called PASC [Lokh91] [Lokh92] [Hoog94]. The DCC began as a potential competitor for DAT but was discontinued in 1996. The introduction of the digital versatile disc (DVD) in 1996 enabled both video and audio recording/storage as well as text-message programming. The DVD became one of the most successful storage media. With the improvements in audio compression and DVD storage technologies, multichannel surround sound encoding formats gained interest [Bosi93] [Holm99] [Bosi00].

Table 1.1. List of perceptual and lossless audio coding standards/algorithms.

Standard/algorithm                                  Related references
1. ISO/IEC MPEG-1 audio                             [ISOI92]
2. Philips' PASC (for DCC applications)             [Lokh92]
3. AT&T/Lucent PAC/EPAC                             [John96c] [Sinh96]
4. Dolby AC-2                                       [Davi92] [Fiel91]
5. AC-3/Dolby Digital                               [Davis93] [Fiel96]
6. ISO/IEC MPEG-2 (BC/LSF) audio                    [ISOI94a]
7. Sony's ATRAC (MiniDisc and SDDS)                 [Yosh94] [Tsut96]
8. SHORTEN                                          [Robi94]
9. Audio processing technology - APT-x100           [Wyli96b]
10. ISO/IEC MPEG-2 AAC                              [ISOI96]
11. DTS coherent acoustics                          [Smyt96] [Smyt99]
12. The DVD Algorithm                               [Crav96] [Crav97]
13. MUSICompress                                    [Wege97]
14. Lossless transform coding of audio (LTAC)       [Pura97]
15. AudioPaK                                        [Hans98b] [Hans01]
16. ISO/IEC MPEG-4 audio version 1                  [ISOI99]
17. Meridian lossless packing (MLP)                 [Gerz99]
18. ISO/IEC MPEG-4 audio version 2                  [ISOI00]
19. Audio coding based on integer transforms        [Geig01] [Geig02]
20. Direct-stream digital (DSD) technology          [Reef01a] [Jans03]

Table 1.2. Some of the popular audio storage formats.

Audio storage format                  Related references
1. Compact disc                       [CD82] [IECA87]
2. Digital audio tape (DAT)           [Watk88] [Tan89]
3. Digital compact cassette (DCC)     [Lokh91] [Lokh92]
4. MiniDisc                           [Yosh94] [Tsut96]
5. Digital versatile disc (DVD)       [DVD96]
6. DVD-audio (DVD-A)                  [DVD01]
7. Super audio CD (SACD)              [SACD02]

With the emergence of streaming audio applications during the late 1990s, researchers pursued techniques such as combined speech and audio architectures, as well as joint source-channel coding algorithms that are optimized for the packet-switched Internet. The advent of the ISO/IEC MPEG-4 standard (1996-2000) [ISOI99] [ISOI00] established new research goals for high-quality coding of audio at low bit rates. MPEG-4 audio encompasses more functionality than perceptual coding [Koen98] [Koen99]. It comprises an integrated family of algorithms with provisions for scalable, object-based speech and audio coding at bit rates from as low as 200 b/s up to 64 kb/s per channel.

The emergence of the DVD-audio and the super audio CD (SACD) provided designers with additional storage capacity, which motivated research in lossless audio coding [Crav96] [Gerz99] [Reef01a]. A lossless audio coding system is able to reconstruct perfectly a bit-for-bit representation of the original input audio. In contrast, a coding scheme incapable of perfect reconstruction is called lossy. For most audio program material, lossy schemes offer the advantage of lower bit rates (e.g., less than 1 bit per sample) relative to lossless schemes (e.g., 10 bits per sample). Delivering real-time lossless audio content to the network browser at low bit rates is the next grand challenge for codec designers.

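Bit-exact reconstruction is what separates lossless from lossy coding. The sketch below illustrates it with two ingredients of the kind used in lossless coders such as SHORTEN (Table 1.1): a fixed predictor (here simply first order, so the residual is the sample-to-sample difference) and Rice codes for the residuals. The Rice parameter `k = 5`, the "zigzag" mapping of signed residuals to non-negative integers, and the bit-string representation are simplifying assumptions for readability, not any standard's bitstream format.

```python
import math

def zigzag(v: int) -> int:
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1

def unzigzag(u: int) -> int:
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(values: list[int], k: int) -> str:
    """Rice code with parameter k >= 1: unary quotient (q ones, then a zero), then k remainder bits."""
    out = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        out.append("1" * q + "0" + format(r, f"0{k}b"))
    return "".join(out)

def rice_decode(bits: str, k: int, count: int) -> list[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # read the unary quotient
            q += 1
            pos += 1
        pos += 1                 # skip the terminating zero
        r = int(bits[pos:pos + k], 2)
        pos += k
        values.append(unzigzag((q << k) | r))
    return values

def lossless_encode(samples: list[int], k: int = 5) -> str:
    # Fixed first-order predictor: residual e[n] = x[n] - x[n-1], with x[-1] taken as 0.
    residuals = [s - p for s, p in zip(samples, [0] + samples[:-1])]
    return rice_encode(residuals, k)

def lossless_decode(bits: str, count: int, k: int = 5) -> list[int]:
    samples, prev = [], 0
    for e in rice_decode(bits, k, count):
        prev += e  # undo the prediction: x[n] = x[n-1] + e[n]
        samples.append(prev)
    return samples

# Usage: 256 samples of an integer-valued sine wave (amplitude 1000).
x = [round(1000 * math.sin(2 * math.pi * n / 64)) for n in range(256)]
bits = lossless_encode(x)
assert lossless_decode(bits, len(x)) == x  # bit-for-bit reconstruction
print(f"{len(bits) / len(x):.2f} bits/sample vs 16 for 16-bit PCM")
```

Smooth signals produce small residuals, so the unary part stays short; noisy material yields larger residuals and correspondingly higher lossless rates, which is why lossless coders typically stay well above the rates achievable by lossy coders.
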
1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE

Over the last few years, researchers have proposed several efficient signal models (e.g., transform-based, subband-filter structures, wavelet-packet) and compression standards (Table 1.1) for high-quality digital audio reproduction. Most of these algorithms are based on the generic architecture shown in Figure 1.1.

Figure 1.1. A generic perceptual audio encoder. [Block diagram; recoverable labels: input audio, time-frequency analysis, psychoacoustic analysis, masking thresholds, bit-allocation, quantization and encoding, parameters, side information, entropy (lossless) coding, MUX, to channel.]

The coders typically segment input signals into quasi-stationary frames ranging from 2 to 50 ms. Then, a time-frequency analysis section estimates the temporal and spectral components of each frame. The time-frequency mapping is usually matched to the analysis properties of the human auditory system. The ultimate objective is to extract from the input audio a set of time-frequency parameters that is amenable to quantization according to a perceptual distortion metric. Depending on the overall design objectives, the time-frequency analysis section usually contains one of the following:

• Unitary transform
• Time-invariant bank of critically sampled, uniform/nonuniform bandpass filters
• Time-varying (signal-adaptive) bank of critically sampled, uniform/nonuniform bandpass filters
• Harmonic/sinusoidal analyzer
• Source-system analysis (LPC and multipulse excitation)
• Hybrid versions of the above

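The segmentation and time-frequency analysis stages can be sketched as follows. This is an illustration only: the 50% frame overlap, the sine window, and the plain FFT are our assumptions; actual coders use the filter banks and transforms listed above (for example, the MDCT). Running the sketch for several frame lengths also exposes the time-frequency tradeoff: a longer frame of N samples gives finer frequency spacing fs/N but spans a longer time N/fs.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split x into overlapping frames (one per row); leftover tail samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def analyze(x: np.ndarray, fs: float, frame_ms: float = 20.0):
    """Windowed FFT of 50%-overlapped frames; returns (spectra, frame length N)."""
    frame_len = int(round(fs * frame_ms / 1000))
    hop = frame_len // 2
    window = np.sin(np.pi * (np.arange(frame_len) + 0.5) / frame_len)  # sine window
    frames = frame_signal(x, frame_len, hop) * window
    spectra = np.fft.rfft(frames, axis=1)  # one row of DFT coefficients per frame
    return spectra, frame_len

fs = 48_000
t = np.arange(fs) / fs                    # 1-s test signal
x = np.sin(2 * np.pi * 1000 * t)          # 1-kHz tone
for frame_ms in (2.0, 20.0, 50.0):
    spectra, N = analyze(x, fs, frame_ms)
    print(f"{frame_ms:4.0f} ms frame: N={N}, bin spacing {fs / N:.1f} Hz, {spectra.shape[0]} frames")
```

At fs = 48 kHz, a 2-ms frame (96 samples) yields 500-Hz bin spacing, while a 50-ms frame (2400 samples) yields 20-Hz spacing, at the cost of smearing any event within the frame over 50 ms.
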
The choice of time-frequency analysis methodology always involves a fundamental tradeoff between time and frequency resolution requirements. Perceptual distortion control is achieved by a psychoacoustic signal analysis section that estimates signal masking power based on psychoacoustic principles. The psychoacoustic model delivers masking thresholds that quantify the maximum amount of distortion at each point in the time-frequency plane such that quantization of the time-frequency parameters does not introduce audible artifacts. The psychoacoustic model therefore allows the quantization section to exploit perceptual irrelevancies. This section can also exploit statistical redundancies through classical techniques such as DPCM or ADPCM. Once a quantized compact parametric set has been formed, the remaining redundancies are typically removed through noiseless run-length (RL) and entropy coding techniques, e.g., Huffman [Cove91], arithmetic [Witt87], or Lempel-Ziv-Welch (LZW) [Ziv77] [Welc84]. Since the output of the psychoacoustic distortion control model is signal-dependent, most algorithms are inherently variable rate. Fixed channel rate requirements are usually satisfied through buffer feedback schemes, which often introduce encoding delays.

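One simple way to turn a masking threshold into a quantizer setting relies on the standard noise model for a uniform quantizer with step Δ: if the quantization error is uniformly distributed over [-Δ/2, Δ/2], its power is Δ²/12. Requiring that power to equal a band's masking threshold T gives Δ = sqrt(12T). The sketch applies that rule per band; the band edges and threshold values are hypothetical placeholders for what a psychoacoustic model would deliver, and the rule is one illustrative choice, not the method of any particular coder.

```python
import numpy as np

def quantize_band(coeffs: np.ndarray, threshold_power: float):
    """Uniform quantizer whose step makes the nominal noise power step**2 / 12 equal the threshold."""
    step = np.sqrt(12.0 * threshold_power)
    indices = np.round(coeffs / step).astype(int)
    return indices, step

rng = np.random.default_rng(0)
coeffs = rng.normal(scale=1.0, size=32)  # stand-in for one frame of spectral coefficients
bands = [(0, 8), (8, 16), (16, 32)]      # hypothetical band edges
thresholds = [1e-3, 1e-2, 1e-1]          # hypothetical masking thresholds (power per coefficient)

for (lo, hi), T in zip(bands, thresholds):
    idx, step = quantize_band(coeffs[lo:hi], T)
    noise = np.mean((coeffs[lo:hi] - idx * step) ** 2)
    print(f"band {lo:2d}-{hi:2d}: step {step:.3f}, max |index| {np.abs(idx).max()}, "
          f"noise {noise:.1e} vs threshold {T:.0e}")
```

Bands with higher thresholds get coarser steps and smaller indices, and therefore need fewer bits; those indices are what the entropy coder then compresses. Because Δ²/12 is only an average, the measured noise in a short band can land somewhat above or below T, so practical bit-allocation loops typically leave some margin.
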
1.3 AUDIO CODER ATTRIBUTES

Perceptual audio coders are typically evaluated based on the following attributes: audio reproduction quality, operating bit rates, computational complexity, codec delay, and channel error robustness. The objective is to attain a high-quality (transparent) audio output at low bit rates (<32 kb/s), with an acceptable algorithmic delay (~5 to 20 ms), and with low computational complexity (~1 to 10 million instructions per second, or MIPS).

1.3.1 Audio Quality

Audio quality is of paramount importance when designing an audio coding algorithm. Successful strides have been made since the development of simple near-transparent perceptual coders. Classical objective measures of signal fidelity, such as the signal-to-noise ratio (SNR) and the total harmonic distortion (THD), are typically inadequate for assessing perceptual coders [Ryde96]. As the field of perceptual audio coding matured rapidly and created greater demand for listening tests, there was a corresponding growth of interest in perceptual measurement schemes. Several subjective and objective quality measures have been proposed and standardized during the last decade. Some of these schemes include the noise-to-mask ratio (NMR, 1987) [Bran87a], the perceptual audio quality measure (PAQM, 1991) [Beer91], the perceptual evaluation (PERCEVAL, 1992) [Pail92], the perceptual objective measure (POM, 1995) [Colo95], and the objective audio signal evaluation (OASE, 1997) [Spor97]. We will address these and several other quality assessment schemes in detail in Chapter 12.

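Why SNR can mislead: two coding-noise spectra with the same total power produce the same SNR, yet their audibility depends on where the noise sits relative to the masking threshold in each band. The sketch uses hypothetical per-band signal, threshold, and noise powers, and a deliberately simplified noise-to-mask ratio (mean over bands of noise power divided by masking threshold, in dB); the actual NMR measure [Bran87a] and the other schemes treated in Chapter 12 are more elaborate.

```python
import numpy as np

signal_power = np.array([10.0, 5.0, 1.0, 0.2])    # hypothetical per-band signal power
mask = np.array([0.1, 0.05, 0.01, 0.002])         # hypothetical masking thresholds
noise_a = np.array([0.09, 0.04, 0.008, 0.002])    # noise shaped to sit at or under the mask
noise_b = np.array([0.001, 0.001, 0.001, 0.137])  # same total noise, piled into the last band

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    return 10 * np.log10(signal.sum() / noise.sum())

def nmr_db(noise: np.ndarray, mask: np.ndarray) -> float:
    """Simplified NMR: mean noise-to-mask ratio over bands, in dB (> 0 dB: noise exceeds the mask on average)."""
    return 10 * np.log10(np.mean(noise / mask))

for name, noise in (("A", noise_a), ("B", noise_b)):
    print(name, f"SNR {snr_db(signal_power, noise):.1f} dB, NMR {nmr_db(noise, mask):+.1f} dB")
```

Both cases report an SNR of about 20.6 dB, but case A stays below the mask (simplified NMR about -0.6 dB) while case B exceeds it by roughly 12 dB, concentrated in the band where the signal is weakest.
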
1.3.2 Bit Rates

From a codec designer's point of view, one of the key challenges is to represent high-fidelity audio with a minimum number of bits. For instance, if a 5-ms audio frame sampled at 48 kHz (240 samples per frame) is represented using 80 bits, then the encoding bit rate would be 80 bits/5 ms = 16 kb/s. Low bit rates imply high compression ratios and generally low reproduction quality. Early coders such as the ISO/IEC MPEG-1 (32-448 kb/s), the Dolby AC-3 (32-384 kb/s), the Sony ATRAC (256 kb/s), and the Philips PASC (192 kb/s) employ high bit rates for obtaining transparent audio reproduction. However, the development of several sophisticated audio coding tools (e.g., MPEG-4 audio tools) created ways for efficient transmission or storage of audio at rates between 8 and 32 kb/s. Future audio coding algorithms promise to offer reasonable quality at low rates along with the ability to scale both rate and quality to match different requirements such as time-varying channel capacity.

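The frame arithmetic in the example above generalizes directly: bit rate = bits per frame / frame duration. A small sketch (the helper `coding_stats` is ours) that also reports bits per sample and the compression ratio against mono 16-bit PCM at the same sampling rate; the 16-bit reference is our assumption, matching the CD/DAT format of Section 1.1.

```python
def coding_stats(bits_per_frame: int, frame_ms: float, fs_hz: int, pcm_bits: int = 16) -> dict:
    samples = round(fs_hz * frame_ms / 1000)         # samples per frame
    bit_rate = bits_per_frame * fs_hz / samples       # bits per second
    return {
        "samples_per_frame": samples,
        "bit_rate_kbps": bit_rate / 1000,
        "bits_per_sample": bits_per_frame / samples,
        "compression_ratio": fs_hz * pcm_bits / bit_rate,  # versus mono PCM at pcm_bits
    }

print(coding_stats(80, 5, 48_000))
# {'samples_per_frame': 240, 'bit_rate_kbps': 16.0, 'bits_per_sample': 0.3333333333333333, 'compression_ratio': 48.0}
```

So the 16-kb/s example spends one third of a bit per sample, a 48:1 reduction relative to 768-kb/s mono PCM.
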
1.3.3 Complexity

Reduced computational complexity not only enables real-time implementation but may also decrease the power consumption and extend battery life. Computational complexity is usually measured in terms of millions of instructions per second (MIPS). Complexity estimates are processor-dependent. For example, the complexity associated with Dolby's AC-3 decoder was estimated at approximately 27 MIPS using the Zoran ZR38001 general-purpose DSP core [Vern95]; for the Motorola DSP56002 processor, the complexity was estimated at 45 MIPS [Vern95]. Most audio coders rely on the so-called asymmetric encoding principle: the codec complexity is not evenly shared between the encoder and the decoder (typically, 80% in the encoder and 20% in the decoder), with more emphasis on reducing the decoder complexity.

1.3.4 Codec Delay

Many of the network applications for high-fidelity audio (streaming audio, audio-on-demand) are delay tolerant (up to 100-200 ms), providing the opportunity to exploit long-term signal properties in order to achieve high coding gain. However, in two-way real-time communication and voice-over-Internet-protocol (VoIP) applications, low-delay encoding (10-20 ms) is important. Consider the example described before, i.e., an audio coder operating on 5-ms frames at a 48-kHz sampling frequency. In an ideal encoding scenario, the minimum delay would be 5 ms at the encoder and 5 ms at the decoder (the frame length). However, other factors such as the analysis-synthesis filter-bank window, the look-ahead, the bit reservoir, and the channel delay contribute additional delay. Employing shorter analysis-synthesis windows, avoiding look-ahead, and restructuring the bit-reservoir functions could result in low-delay encoding, albeit with reduced coding efficiency.

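A back-of-the-envelope delay budget for the 5-ms, 48-kHz example: the encoder must buffer a full frame before processing it and the decoder adds a comparable amount, while look-ahead and window overlap add further samples. The look-ahead and overlap values below are hypothetical placeholders rather than figures from any specific codec, and bit-reservoir and channel delays are ignored.

```python
def samples_to_ms(n_samples: int, fs_hz: float) -> float:
    return 1000.0 * n_samples / fs_hz

def algorithmic_delay_ms(frame: int, fs_hz: float, lookahead: int = 0, overlap: int = 0) -> float:
    """Encoder frame buffering + decoder frame + extra look-ahead/overlap samples (channel delay excluded)."""
    return samples_to_ms(2 * frame + lookahead + overlap, fs_hz)

fs = 48_000
print(algorithmic_delay_ms(240, fs))                              # 10.0: the ideal 5 ms + 5 ms case
print(algorithmic_delay_ms(240, fs, lookahead=120, overlap=240))  # 17.5 with hypothetical extras
```

Even modest look-ahead and overlap push the total toward the upper end of the 10-20 ms range quoted for two-way communication, which is why low-delay designs shorten or remove them.
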
1.3.5 Error Robustness

The increasing popularity of streaming audio over packet-switched and wireless networks such as the Internet implies that any algorithm intended for such applications must be able to deal with a noisy time-varying channel. In particular, provisions for error robustness and error protection must be incorporated at the encoder in order to achieve reliable transmission of digital audio over error-prone channels. One simple idea is to provide better protection to the error-sensitive and priority (important) bits. For instance, the audio frame header requires the maximum error robustness; otherwise, transmission errors in the header will seriously impair the entire audio frame. Several error detecting/correcting codes [Lin82] [Wick95] [Bayl97] [Swee02] [Zara02] can also be employed. Including error-correcting codes in the bitstream might help to obtain error-free reproduction of the input audio, albeit with increased complexity and bit rates.

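As an illustration of protecting error-sensitive header bits, the sketch computes a 16-bit cyclic redundancy check (CRC) over a frame header so the decoder can detect corruption. We assume the generator polynomial x^16 + x^15 + x^2 + 1 (0x8005) with the register initialized to 0xFFFF, a common CRC-16 variant; the 4-byte header contents are made up for the example. A CRC only detects errors; correcting them requires the error-correcting codes cited above.

```python
def crc16(data: bytes, poly: int = 0x8005, init: int = 0xFFFF) -> int:
    """Bitwise, MSB-first CRC-16 (no bit reflection, no final XOR)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

header = bytes([0xFF, 0xFB, 0x90, 0x64])  # hypothetical 4-byte frame header
check = crc16(header)                      # transmitted alongside the header

received = bytearray(header)
received[2] ^= 0x04                        # one bit flipped by the channel
print(crc16(bytes(received)) == check)     # False: the corruption is detected
```

Because the generator polynomial has more than one term, every single-bit error in the protected bits changes the checksum, so the decoder can discard or conceal a frame whose header fails the check.
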
From the di
