Digital Audio Compression

By Davis Yen Pan

Abstract
Compared to most digital data types, with the exception of digital video, the data rates associated with uncompressed digital audio are substantial. Digital audio compression enables more efficient storage and transmission of audio data. The many forms of audio compression techniques offer a range of encoder and decoder complexity, compressed audio quality, and differing amounts of data compression. The μ-law transformation and the ADPCM coder are simple approaches with low-complexity, low-compression, and medium audio quality algorithms. The MPEG/audio standard is a high-complexity, high-compression, and high audio quality algorithm. These techniques apply to general audio signals and are not specifically tuned for speech signals.

Introduction

Digital audio compression allows the efficient storage and transmission of audio data. The various audio compression techniques offer different levels of complexity, compressed audio quality, and amount of data compression.

This paper is a survey of techniques used to compress digital audio signals. Its intent is to provide useful information for readers of all levels of experience with digital audio processing. The paper begins with a summary of the basic audio digitization process. The next two sections present detailed descriptions of two relatively simple approaches to audio compression: μ-law and adaptive differential pulse code modulation. In the following section, the paper gives an overview of a third, much more sophisticated, audio compression algorithm from the Motion Picture Experts Group. The topics covered in this section are quite complex and are intended for the reader who is familiar with digital signal processing. The paper concludes with a discussion of software-only real-time implementations.

Digital Audio Data

The digital representation of audio data offers many advantages: high noise immunity, stability, and reproducibility. Audio in digital form also allows the efficient implementation of many audio processing functions (e.g., mixing, filtering, and equalization) through the digital computer.

The conversion from the analog to the digital domain begins by sampling the audio input in regular, discrete intervals of time and quantizing the sampled values into a discrete number of evenly spaced levels. The digital audio data consists of a sequence of binary values representing the number of quantizer levels for each audio sample. The method of representing each sample with an independent code word is called pulse code modulation (PCM). Figure 1 shows the digital audio process.

[Figure 1 is a block diagram: analog audio input → analog-to-digital conversion → PCM values → digital signal processing → PCM values → digital-to-analog conversion → analog audio output.]

Figure 1 Digital Audio Process
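
As an illustrative sketch (not part of the original paper), the sampling and uniform quantization steps of the digitization process described above can be written as:

```python
import math

def quantize(sample: float, bits: int = 8) -> int:
    """Uniformly quantize a sample in [-1.0, 1.0] to a signed PCM code word."""
    levels = 2 ** (bits - 1)                   # e.g., 128 levels per polarity for 8 bits
    code = int(round(sample * (levels - 1)))   # scale to the quantizer range
    return max(-levels + 1, min(levels - 1, code))

# One cycle of a 1-kHz tone sampled at 8 kHz, coded as 8-bit PCM
pcm = [quantize(math.sin(2 * math.pi * 1000 * n / 8000)) for n in range(8)]
```

Each code word here stands on its own, which is exactly the PCM property the differential coders later in the paper give up in exchange for compression.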

Digital Technical Journal Vol. 5 No. 2, Spring 1993

According to the Nyquist theory, a time-sampled signal can faithfully represent signals up to half the sampling rate.[1] Typical sampling rates range from 8 kilohertz (kHz) to 48 kHz. The 8-kHz rate covers a frequency range up to 4 kHz and so covers most of the frequencies produced by the human voice. The 48-kHz rate covers a frequency range up to 24 kHz and more than adequately covers the entire audible frequency range, which for humans typically extends to only 20 kHz. In practice, the usable frequency range is somewhat less than half the sampling rate because of practical system limitations.

The number of quantizer levels is typically a power of 2 to make full use of a fixed number of bits per audio sample to represent the quantized values. With uniform quantizer step spacing, each additional bit has the potential of increasing the signal-to-noise ratio, or equivalently the dynamic range, of the quantized amplitude by roughly 6 decibels (dB). The typical number of bits per sample used for digital audio ranges from 8 to 16. The dynamic range capability of these representations thus ranges from 48 to 96 dB, respectively. To put these ranges into perspective, if 0 dB represents the weakest audible sound pressure level, then 25 dB is the minimum noise level in a typical recording studio, 35 dB is the noise level inside a quiet home, and 120 dB is the loudest level before discomfort begins.[2] In terms of audio perception, 1 dB is the minimum audible change in sound pressure level under the best conditions, and doubling the sound pressure level amounts to one perceptual step in loudness.
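
The roughly 6 dB-per-bit figure comes from 20·log10(2) ≈ 6.02 dB; a quick check (an illustrative sketch, not from the paper):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Dynamic range of a uniform quantizer: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

# 8 bits gives about 48 dB and 16 bits about 96 dB, matching the ranges quoted above
```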

Compared to most digital data types (digital video excluded), the data rates associated with uncompressed digital audio are substantial. For example, the audio data on a compact disc (2 channels of audio sampled at 44.1 kHz with 16 bits per sample) requires a data rate of about 1.4 megabits per second. There is a clear need for some form of compression to enable the more efficient storage and transmission of this data.
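
The compact disc figure quoted above is straightforward to verify (illustrative sketch):

```python
channels, sample_rate_hz, bits_per_sample = 2, 44_100, 16
bits_per_second = channels * sample_rate_hz * bits_per_sample
# 1,411,200 bits per second, i.e., about 1.4 megabits per second
```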

The many forms of audio compression techniques differ in the trade-offs between encoder and decoder complexity, the compressed audio quality, and the amount of data compression. The techniques presented in the following sections of this paper cover the full range from the μ-law, a low-complexity, low-compression, and medium audio quality algorithm, to MPEG/audio, a high-complexity, high-compression, and high audio quality algorithm. These techniques apply to general audio signals and are not specifically tuned for speech signals. This paper does not cover audio compression algorithms designed specifically for speech signals. These algorithms are generally based on a modeling of the vocal tract and do not work well for nonspeech audio signals.[3,4] The federal standards 1015 LPC (linear predictive coding) and 1016 CELP (code-excited linear prediction) fall into this category of audio compression.

μ-law Audio Compression

The μ-law transformation is a basic audio compression technique specified by the Comité Consultatif International Télégraphique et Téléphonique (CCITT) Recommendation G.711.[5] The transformation is essentially logarithmic in nature and allows the 8 bits per sample output codes to cover a dynamic range equivalent to 14 bits of linearly quantized values. This transformation offers a compression ratio of (number of bits per source sample)/8 to 1. Unlike linear quantization, the logarithmic step spacings represent low-amplitude audio samples with greater accuracy than higher-amplitude values. Thus the signal-to-noise ratio of the transformed output is more uniform over the range of amplitudes of the input signal. The μ-law transformation is

    y = −127 × ln(1 + μ|x|) / ln(1 + μ)   for x < 0
    y =  127 × ln(1 + μ|x|) / ln(1 + μ)   for x ≥ 0

where μ = 255, and x is the value of the input signal normalized to have a maximum value of 1. The CCITT Recommendation G.711 also specifies a similar A-law transformation. The μ-law transformation is in common use in North America and Japan for the Integrated Services Digital Network (ISDN) 8-kHz-sampled, voice-grade, digital telephony service, and the A-law transformation is used elsewhere for the ISDN telephony.
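
A minimal sketch of the transformation above (illustrative; the bit-exact G.711 encoder also specifies the 8-bit code layout, which is omitted here):

```python
import math

MU = 255.0

def mu_law(x: float) -> float:
    """Map a normalized sample x in [-1, 1] through the logarithmic mu-law curve."""
    magnitude = 127.0 * math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU)
    return math.copysign(magnitude, x)

# Low-amplitude inputs get most of the output range: an input of 0.01 maps to
# roughly 29, far more than the 1.27 a linear 127-level scaling would give.
```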

Adaptive Differential Pulse Code Modulation

Figure 2 shows a simplified block diagram of an adaptive differential pulse code modulation (ADPCM) coder.[6] For the sake of clarity, the figure omits details such as bit-stream formatting, the possible use of side information, and the adaptation blocks. The ADPCM coder takes advantage of the fact that neighboring audio samples are generally similar to each other. Instead of representing each audio sample independently as in PCM, an ADPCM encoder computes the difference between each audio sample and its predicted value and outputs the PCM value of the differential. Note that the ADPCM encoder (Figure 2a) uses most of the components of the ADPCM decoder (Figure 2b) to compute the predicted values.

[Figure 2 is a pair of block diagrams. (a) ADPCM Encoder: the input X[n] minus the predicted value Xp[n − 1] gives the difference D[n]; an (adaptive) quantizer converts D[n] to the code word C[n]; an (adaptive) dequantizer recovers Dq[n], which is added to Xp[n − 1] to form Xp[n], the input to the (adaptive) predictor. (b) ADPCM Decoder: C[n] passes through the (adaptive) dequantizer to give Dq[n], which is added to Xp[n − 1] from the (adaptive) predictor to reconstruct Xp[n].]

Figure 2 ADPCM Compression and Decompression

The quantizer output is generally only a (signed) representation of the number of quantizer levels. The requantizer reconstructs the value of the quantized sample by multiplying the number of quantizer levels by the quantizer step size and possibly adding an offset of half a step size. Depending on the quantizer implementation, this offset may be necessary to center the requantized value between the quantization thresholds.
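
In code, the requantization step might look like this (an illustrative sketch; real coders differ in rounding and offset details):

```python
def requantize(levels: int, step_size: int) -> int:
    """Reconstruct a sample value from a signed count of quantizer levels."""
    offset = step_size // 2        # half-step offset to center the value
    if levels < 0:
        return levels * step_size - offset
    return levels * step_size + offset
```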

The ADPCM coder can adapt to the characteristics of the audio signal by changing the step size of either the quantizer or the predictor, or by changing both. The method of computing the predicted value and the way the predictor and the quantizer adapt to the audio signal vary among different ADPCM coding systems.

Some ADPCM systems require the encoder to provide side information with the differential PCM values. This side information can serve two purposes. First, in some ADPCM schemes the decoder needs the additional information to determine either the predictor or the quantizer step size, or both. Second, the data can provide redundant contextual information to the decoder to enable recovery from errors in the bit stream or to allow random access entry into the coded bit stream.

The following section describes the ADPCM algorithm proposed by the Interactive Multimedia Association (IMA). This algorithm offers a compression factor of (number of bits per source sample)/4 to 1. Other ADPCM audio compression schemes include the CCITT Recommendation G.721 (32 kilobits per second compressed data rate) and Recommendation G.723 (24 kilobits per second compressed data rate) standards and the compact disc interactive audio compression algorithm.[7,8]

The IMA ADPCM Algorithm. The IMA is a consortium of computer hardware and software vendors cooperating to develop a de facto standard for computer multimedia data. The IMA's goal for its audio compression proposal was to select a public-domain audio compression algorithm able to provide good compressed audio quality with good data compression performance. In addition, the algorithm had to be simple enough to enable software-only, real-time decompression of stereo, 44.1-kHz-sampled audio signals on a 20-megahertz (MHz) 386-class computer. The selected ADPCM algorithm not only meets these goals, but is also simple enough to enable software-only, real-time encoding on the same computer.

The simplicity of the IMA ADPCM proposal lies in the crudity of its predictor. The predicted value of the audio sample is simply the decoded value of the immediately previous audio sample. Thus the predictor block in Figure 2 is merely a time-delay element whose output is the input delayed by one audio sample interval. Since this predictor is not adaptive, side information is not necessary for the reconstruction of the predictor.

Figure 3 shows a block diagram of the quantization process used by the IMA algorithm. The quantizer outputs four bits representing the signed magnitude of the number of quantizer levels for each input sample.

Adaptation to the audio signal takes place only in the quantizer block. The quantizer adapts the step size based on the current step size and the quantizer output of the immediately previous input. This adaptation can be done as a sequence of two table lookups. The three bits representing the number of quantizer levels serve as an index into the first table lookup whose output is an index adjustment for the second table lookup. This adjustment is added to a stored index value, and the range-limited result is used as the index to the second table lookup. The summed index value is stored for use in the next iteration of the step-size adaptation. The output of the second table lookup is the new quantizer step size. Note that given a starting value for the index into the second table lookup, the data used for adaptation is completely deducible from the quantizer outputs; side information is not required for the quantizer adaptation. Figure 4 illustrates a block diagram of the step-size adaptation process, and Tables 1 and 2 provide the table lookup contents.
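
The two-table adaptation described above can be sketched as follows, using the contents of Tables 1 and 2 (an illustrative sketch of the scheme, not a complete IMA codec):

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]    # Table 1, indexed by the 3-bit magnitude

STEP_SIZES = [                                  # Table 2, indices 0 through 88
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818,
    18500, 20350, 22358, 24623, 27086, 29794, 32767,
]

def adapt_step_size(index: int, code: int) -> tuple[int, int]:
    """Return (new_index, new_step_size) from the stored index and a 4-bit code word."""
    index += INDEX_ADJUST[code & 0x7]    # lower three bits select the index adjustment
    index = max(0, min(88, index))       # range-limit the result between 0 and 88
    return index, STEP_SIZES[index]
```

Because the new index depends only on the stored index and the code word, the decoder can track the encoder's step size with no side information, just as the text states.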

Table 1
First Table Lookup for the IMA ADPCM Quantizer Adaptation

Three Bits Quantized Magnitude    Index Adjustment
000                               -1
001                               -1
010                               -1
011                               -1
100                                2
101                                4
110                                6
111                                8

[Figure 3 is a flowchart of the quantization. Start: if the sample is negative, set bit 3 and negate the sample; otherwise clear bit 3. If the sample is greater than or equal to the step size, set bit 2 and subtract the step size; otherwise clear bit 2. If the remaining sample is greater than or equal to half the step size, set bit 1 and subtract half the step size; otherwise clear bit 1. If the remaining sample is greater than or equal to one-quarter of the step size, set bit 0; otherwise clear bit 0. Done.]

Figure 3 IMA ADPCM Quantization
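
The flowchart of Figure 3 translates directly into code (an illustrative sketch using integer arithmetic; a production coder would typically use shifts):

```python
def quantize_difference(diff: int, step_size: int) -> int:
    """Compute the 4-bit signed-magnitude code word for one difference sample."""
    code = 0
    if diff < 0:
        code |= 8                     # bit 3: sign
        diff = -diff
    if diff >= step_size:
        code |= 4                     # bit 2
        diff -= step_size
    if diff >= step_size // 2:
        code |= 2                     # bit 1
        diff -= step_size // 2
    if diff >= step_size // 4:
        code |= 1                     # bit 0
    return code
```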

[Figure 4 is a block diagram of the step-size adaptation. The lower three bits of the quantizer output feed the first table lookup, which produces an index adjustment. The adjustment is added to the stored index, the sum is limited to the range 0 through 88, and the limited value indexes the second table lookup, whose output is the new step size. The limited index is delayed for the next iteration of the step-size adaptation.]

Figure 4 IMA ADPCM Step-size Adaptation

Table 2
Second Table Lookup for the IMA ADPCM Quantizer Adaptation

Index  Step Size    Index  Step Size    Index  Step Size    Index  Step Size
  0        7          22       60         44      494         66     4,026
  1        8          23       66         45      544         67     4,428
  2        9          24       73         46      598         68     4,871
  3       10          25       80         47      658         69     5,358
  4       11          26       88         48      724         70     5,894
  5       12          27       97         49      796         71     6,484
  6       13          28      107         50      876         72     7,132
  7       14          29      118         51      963         73     7,845
  8       16          30      130         52    1,060         74     8,630
  9       17          31      143         53    1,166         75     9,493
 10       19          32      157         54    1,282         76    10,442
 11       21          33      173         55    1,411         77    11,487
 12       23          34      190         56    1,552         78    12,635
 13       25          35      209         57    1,707         79    13,899
 14       28          36      230         58    1,878         80    15,289
 15       31          37      253         59    2,066         81    16,818
 16       34          38      279         60    2,272         82    18,500
 17       37          39      307         61    2,499         83    20,350
 18       41          40      337         62    2,749         84    22,358
 19       45          41      371         63    3,024         85    24,623
 20       50          42      408         64    3,327         86    27,086
 21       55          43      449         65    3,660         87    29,794
                                                              88    32,767

IMA ADPCM: Error Recovery. A fortunate side effect of the design of this ADPCM scheme is that decoder errors caused by isolated code word errors or edits, splices, or random access of the compressed bit stream generally do not have a disastrous impact on decoder output. This is usually not true for compression schemes that use prediction. Since prediction relies on the correct decoding of previous audio samples, errors in the decoder tend to propagate. The following explains why the error propagation is generally limited and not disastrous for the IMA algorithm. The decoder reconstructs the audio sample, Xp[n], by adding the previously decoded audio sample, Xp[n − 1], to the result of a signed-magnitude product of the code word, C[n], and the quantizer step size plus an offset of one-half step size:

    Xp[n] = Xp[n − 1] + step_size[n] × C′[n]

where C′[n] = one-half plus a suitable numeric conversion of C[n].

An analysis of the second step-size table lookup reveals that each successive entry is about 1.1 times the previous entry. As long as range limiting of the second table index does not take place, the value for step_size[n] is approximately the product of the previous value, step_size[n − 1], and a function of the code word, F(C[n − 1]):

    step_size[n] = step_size[n − 1] × F(C[n − 1])

The above two equations can be manipulated to express the decoded audio sample, Xp[n], as a function of the step size and the decoded sample value at time m and the set of code words between times m and n:

    Xp[n] = Xp[m] + step_size[m] × Σ(i = m+1 to n) { Π(j = m+1 to i) F(C[j]) } × C′[i]

Note that the terms in the summation are only a function of the code words from time m+1 onward. An error in the code word, C[q], or a random access entry into the bit stream at time q can result in an error in the decoded output, Xp[q], and the quantizer step size, step_size[q+1]. The above equation shows that an error in Xp[m] amounts to a constant offset to future values of Xp[n]. This offset is inaudible unless the decoded output exceeds its permissible range and is clipped. Clipping results in a momentary audible distortion but also serves to correct partially or fully the offset term. Furthermore, digital high-pass filtering of the decoder output can remove this constant offset term. The above equation also shows that an error in step_size[m+1] amounts to an unwanted gain or attenuation of future values of the decoded output Xp[n]. The shape of the output wave form is unchanged unless the index to the second step-size table lookup is range limited. Range limiting results in a partial or full correction to the value of the step size.

The nature of the step-size adaptation limits the impact of an error in the step size. Note that an error in step_size[m+1] caused by an error in a single code word can be at most a change of (1.1)^9, or 7.45 dB, in the value of the step size. Note also that any sequence of 88 code words that all have magnitude 3 or less (refer to Table 1) completely corrects the step size to its minimum value. Even at the lowest audio sampling rate typically used, 8 kHz, 88 samples correspond to 11 milliseconds of audio. Thus random access entry or edit points exist whenever 11 milliseconds of low-level signal occur in the audio stream.

MPEG/Audio Compression

The Motion Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression. It is one part of a three-part compression standard. With the other two parts, video and systems, the composite standard addresses the compression of synchronized video and audio at a total bit rate of roughly 1.5 megabits per second.

Like μ-law and ADPCM, the MPEG/audio compression is lossy; however, the MPEG algorithm can achieve transparent, perceptually lossless compression. The MPEG/audio committee conducted extensive subjective listening tests during the development of the standard. The tests showed that even with a 6-to-1 compression ratio (stereo, 16-bit-per-sample audio sampled at 48 kHz compressed to 256 kilobits per second) and under optimal listening conditions, expert listeners were unable to distinguish between coded and original audio clips with statistical significance. Furthermore, these clips were specially chosen because they are difficult to compress. Grewin and Ryden give the details of the setup, procedures, and results of these tests.[9]

The high performance of this compression algorithm is due to the exploitation of auditory masking. This masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighborhood of weaker audio signals imperceptible. This noise-masking phenomenon has been observed and corroborated through a variety of psychoacoustic experiments.[10]

Empirical results also show that the ear has a limited frequency selectivity that varies in acuity from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest. Thus the audible spectrum can be partitioned into critical bands that reflect the resolving power of the ear as a function of frequency. Table 3 gives a listing of critical bandwidths.

Because of the ear's limited frequency resolving power, the threshold for noise masking at any given frequency is solely dependent on the signal activity within a critical band of that frequency. Figure 5 illustrates this property. For audio compression, this property can be capitalized on by transforming the audio signal into the frequency domain, then dividing the resulting spectrum into subbands that approximate critical bands, and finally quantizing each subband according to the audibility of quantization noise within that band. For optimal compression, each band should be quantized with no more levels than necessary to make the quantization noise inaudible. The following sections present a more detailed description of the MPEG/audio algorithm.

MPEG/Audio Encoding and Decoding

Figure 6 shows block diagrams of the MPEG/audio encoder and decoder.[11,12] In this high-level representation, encoding closely parallels the process described above. The input audio stream passes through a filter bank that divides the input into multiple subbands. The input audio stream simultaneously passes through a psychoacoustic model that determines the signal-to-mask ratio of each subband. The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals to minimize the audibility of the quantization noise. Finally, the last block takes the representation of the quantized audio samples and formats the data into a decodable bit stream. The decoder simply reverses the formatting, then reconstructs the quantized subband values, and finally transforms the set of subband values into a time-domain audio signal. As specified by the MPEG requirements, ancillary data not necessarily related to the audio stream can be fitted within the coded bit stream.

The MPEG/audio standard has three distinct layers for compression. Layer I forms the most basic algorithm, and Layers II and III are enhancements that use some elements found in Layer I. Each successive layer improves the compression performance but at the cost of greater encoder and decoder complexity.

Layer I. The Layer I algorithm uses the basic filter bank found in all layers. This filter bank divides the audio signal into 32 constant-width frequency bands. The filters are relatively simple and provide good time resolution with reasonable frequency resolution relative to the perceptual properties of the human ear. The design is a compromise with three notable concessions. First, the 32 constant-width bands do not accurately reflect the ear's critical bands. Figure 7 illustrates this discrepancy. The bandwidth is too wide for the lower frequencies, so the number of quantizer bits cannot be specifically tuned for the noise sensitivity within each critical band. Instead, the included critical band with the greatest noise sensitivity dictates the number of quantization bits required for the entire filter band. Second, the filter bank and its inverse are not lossless transformations. Even without quantization, the inverse transformation would not perfectly recover the original input signal. Fortunately, the error introduced by the filter bank is small and inaudible. Finally, adjacent filter bands have a significant frequency overlap. A signal at a single frequency can affect two adjacent filter bank outputs.

Table 3
Approximate Critical Band Boundaries

Band Number  Frequency (Hz)¹    Band Number  Frequency (Hz)¹
  0              50               14            1,970
  1              95               15            2,340
  2             140               16            2,720
  3             235               17            3,280
  4             330               18            3,840
  5             420               19            4,690
  6             560               20            5,440
  7             660               21            6,375
  8             800               22            7,690
  9             940               23            9,375
 10           1,125               24           11,625
 11           1,265               25           15,375
 12           1,500               26           20,250
 13           1,735

¹ Frequencies are at the upper end of the band.

[Figure 5 sketches amplitude versus frequency: a strong tonal signal raises the masking threshold in a region around it, where weaker signals are masked.]

Figure 5 Audio Noise Masking

[Figure 6 is a pair of block diagrams. (a) MPEG/Audio Encoder: PCM audio input feeds a time-to-frequency mapping filter bank and, in parallel, a psychoacoustic model; a bit/noise allocation, quantizer, and coding block followed by bit-stream formatting produces the encoded bit stream, with optional ancillary data. (b) MPEG/Audio Decoder: the encoded bit stream is unpacked, the frequency samples are reconstructed, and a frequency-to-time mapping produces the decoded PCM audio, along with any ancillary data that was encoded.]

Figure 6 MPEG/Audio Compression and Decompression

[Figure 7 compares the 32 equal-width MPEG/audio filter bank bands (numbered 0 through 31) against the critical band boundaries; the constant-width bands are much wider than the lower critical bands.]

Figure 7 MPEG/Audio Filter Bandwidths versus Critical Bandwidths

The filter bank provides 32 frequency samples, one sample per band, for every 32 input audio samples. The Layer I algorithm groups together 12 samples from each of the 32 bands. Each group of 12 samples receives a bit allocation and, if the bit allocation is not zero, a scale factor. Coding for stereo redundancy compression is slightly different and is discussed later in this paper. The bit allocation determines the number of bits used to represent each sample. The scale factor is a multiplier that sizes the samples to maximize the resolution of the quantizer. The Layer I encoder formats the 32 groups of 12 samples (i.e., 384 samples) into a frame. Besides the audio data, each frame contains a header, an optional cyclic redundancy code (CRC) check word, and possibly ancillary data.
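
The frame arithmetic above is easy to check (illustrative sketch):

```python
bands, samples_per_band = 32, 12
samples_per_frame = bands * samples_per_band    # 384 subband samples per Layer I frame
frame_duration_ms = samples_per_frame / 44.1    # about 8.7 ms at a 44.1-kHz sampling rate
```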

Layer II. The Layer II algorithm is a simple enhancement of Layer I. It improves compression performance by coding data in larger groups. The Layer II encoder forms frames of 3 × 12 × 32 = 1,152 samples per audio channel. Whereas Layer I codes data in single groups of 12 samples for each subband, Layer II codes data in 3 groups of 12 samples for each subband. Again discounting stereo redundancy coding, there is one bit allocation and up to three scale factors for each trio of 12 samples. The encoder encodes with a unique scale factor for each group of 12 samples only if necessary to avoid audible distortion. The encoder shares scale factor values between two or all three groups in two other cases: (1) when the values of the scale factors are sufficiently close and (2) when the encoder anticipates that temporal noise masking by the ear will hide the consequent distortion. The Layer II algorithm also improves performance over Layer I by representing the bit allocation, the scale factor values, and the quantized samples with a more efficient code.

Layer III. The Layer III algorithm is a much more refined approach.[13,14] Although based on the same filter bank found in Layers I and II, Layer III compensates for some filter bank deficiencies by processing the filter outputs with a modified discrete cosine transform (MDCT). Figure 8 shows a block diagram of the process.

The MDCTs further subdivide the filter bank outputs in frequency to provide better spectral resolution. Because of the inevitable trade-off between time and frequency resolution, Layer III specifies two different MDCT block lengths: a long block of 36 samples or a short block of 12. The short block length improves the time resolution to cope with transients. Note that the short block length is one-third that of a long block; when used, three short blocks replace a single long block. The switch between long and short blocks is not instantaneous. A long block with a specialized long-to-short or short-to-long data window provides the transition mechanism from a long to a short block. Layer III has three blocking modes: two modes where the outputs of the 32 filter banks can all pass through MDCTs with the same block length and a mixed block mode where the 2 lower-frequency bands use long blocks and the 30 upper bands use short blocks.

[Figure 8 shows the encoder-side Layer III filter bank processing: PCM audio input passes through the Layer I and Layer II filter bank; each of subbands 0 through 31 is windowed (with a long, long-to-short, short, or short-to-long window selected by the psychoacoustic model's long or short block control) and transformed by an MDCT; alias reduction is applied only for long blocks.]

Figure 8 MPEG/Audio Layer III Filter Bank Processing, Encoder Side

Other major enhancements over the Layer I and Layer II algorithms include:

• Alias reduction — Layer III specifies a method of processing the MDCT values to remove some redundancy caused by the overlapping bands of the Layer I and Layer II filter bank.

• Nonuniform quantization — The Layer III quantizer raises its input to the 3/4 power before quantization to provide a more consistent signal-to-noise ratio over the range of quantizer values. The requantizer in the MPEG/audio decoder relinearizes the values by raising its output to the 4/3 power.

• Entropy coding of data values — Layer III uses Huffman codes to encode the quantized samples for better data compression.[15]

• Use of a bit reservoir — The design of the Layer III bit stream better fits the variable-length nature of the compressed data. As with Layer II, Layer III processes the audio data in frames of 1,152 samples. Unlike Layer II, the coded data representing these samples does not necessarily fit into a fixed-length frame in the code bit stream. The encoder can donate bits to or borrow bits from the reservoir when appropriate.

• Noise allocation instead of bit allocation — The bit allocation process used by Layers I and II only approximates the amount of noise caused by quantization to a given number of bits. The Layer III encoder uses a noise allocation iteration loop. In this loop, the quantizers are varied in an orderly way, and the resulting quantization noise is actually calculated and specifically allocated to each subband.
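
The 3/4-power companding and its inverse described in the list above can be sketched as follows (illustrative; the standard also specifies quantization step and scaling details omitted here):

```python
import math

def compress(x: float) -> float:
    """Raise the magnitude to the 3/4 power before quantization (Layer III style)."""
    return math.copysign(abs(x) ** 0.75, x)

def relinearize(y: float) -> float:
    """Decoder side: raise the magnitude to the 4/3 power to undo the companding."""
    return math.copysign(abs(y) ** (4.0 / 3.0), y)
```

Small magnitudes are boosted before quantization and restored on decode, which is what evens out the signal-to-noise ratio across the range of quantizer values.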

The Psychoacoustic Model

The psychoacoustic model is the key component of the MPEG encoder that enables its high performance.[16,17,18,19] The job of the psychoacoustic model is to analyze the input audio signal and determine where in the spectrum quantization noise will be masked and to what extent. The encoder uses this information to decide how best to represent the input audio signal with its limited number of code bits. The MPEG/audio standard provides two example implementations of the psychoacoustic model. Below is a general outline of the basic steps involved in the psychoacoustic calculations for either model.
`
• Time align audio data — The psychoacoustic model must account for both the delay of the audio data through the filter bank and a data offset so that the relevant data is centered within its analysis window. For example, when using psychoacoustic model two for Layer I, the delay through the filter bank is 256 samples, and the offset required to center the 384 samples of a Layer I frame in the 512-point psychoacoustic analysis window is (512 - 384)/2 = 64 points. The net offset is 320 points to time align the psychoacoustic model data with the filter bank outputs.
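The time-alignment arithmetic in the Layer I example above can be written out directly; the constant names are illustrative, not taken from the standard.

```python
FILTER_BANK_DELAY = 256   # samples of delay through the analysis filter bank
FRAME_SIZE = 384          # samples in a Layer I frame
ANALYSIS_WINDOW = 512     # points in the psychoacoustic analysis window

# Offset needed to center the frame within the analysis window.
centering_offset = (ANALYSIS_WINDOW - FRAME_SIZE) // 2   # 64 points

# Net offset that time aligns the psychoacoustic model data
# with the filter bank outputs.
net_offset = FILTER_BANK_DELAY + centering_offset        # 320 points
```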
• Convert audio to spectral domain — The psychoacoustic model uses a time-to-frequency mapping such as a 512- or 1,024-point Fourier transform. A standard Hann weighting, applied to the audio data before Fourier transformation, conditions the data to reduce the edge effects of the transform window. The model uses this separate and independent mapping instead of the filter bank outputs because it needs finer frequency resolution to calculate the masking thresholds.

Digital Technical Journal Vol. 5 No. 2, Spring 1993

• Partition spectral values into critical bands — To simplify the psychoacoustic calculations, the model groups the frequency values into perceptual quanta.

• Incorporate threshold in quiet — The model includes an empirically determined absolute masking threshold. This threshold is the lower bound for noise masking and is determined in the absence of masking signals.

• Separate into tonal and nontonal components — The model must identify and separate the tonal and noiselike components of the audio signal because the noise-masking characteristics of the two types of signal are different.

• Apply spreading function — The model determines the noise-masking thresholds by applying an empirically determined masking or spreading function to the signal components.

• Find the minimum masking threshold for each subband — The psychoacoustic model calculates the masking thresholds with a higher frequency resolution than provided by the filter banks. Where the filter band is wide relative to the crit-
