Copyright (c) 1996 Institute of Electrical and Electronics Engineers. Reprinted, with permission, from the IEEE Multimedia Journal, Summer 1995 issue.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Motorola's or Digital's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by sending a blank email message to info.pub.permissions@ieee.org.

By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This article was published when the author was with Motorola, Inc. As of October 7, 1996, Davis Pan will be working at the Cambridge Research Laboratory of Digital Equipment Corporation in Cambridge, Massachusetts.

A Tutorial on MPEG/Audio Compression

Davis Pan

Motorola Inc.
1301 East Algonquin Road
Schaumburg, IL 60196

ABSTRACT

This tutorial covers the theory behind MPEG/audio compression. The algorithm was developed by the Moving Picture Experts Group (MPEG) as an International Organization for Standardization (ISO) standard for the high-fidelity compression of digital audio. The MPEG/audio compression standard is one part of a multiple-part standard that addresses the compression of video (11172-2), the compression of audio (11172-3), and the synchronization of the audio, video, and related data streams (11172-1) to an aggregate bit rate of about 1.5 Mbits/sec. The MPEG/audio standard also can be used for audio-only applications to compress high-fidelity audio data at much lower bit rates.

While the MPEG/audio compression algorithm is lossy, it can often provide "transparent", perceptually lossless, compression even at compression factors of 6-to-1 or more. The algorithm works by exploiting the perceptual properties of the human auditory system. This paper also covers the basics of psychoacoustic modeling and the methods the MPEG/audio algorithm uses to compress audio data with the least perceptible degradation.

1. INTRODUCTION

This tutorial covers the theory behind MPEG/audio compression. It is written for people with a modest background in digital signal processing and does not assume prior experience in audio compression or psychoacoustics. The goal here is to give a broad, preliminary understanding of MPEG/audio compression; many details have been omitted. Wherever possible, this tutorial uses figures and illustrative examples to present the intricacies of the algorithm.

The MPEG/audio compression algorithm is the first international standard[1,2] for the digital compression of high-fidelity audio. Other audio compression algorithms address speech-only applications or provide only medium-fidelity audio compression performance. For example, Code Excited Linear Prediction (CELP)[3] is a speech coding algorithm, while μ-law and Adaptive Differential Pulse Code Modulation (ADPCM) are relatively simple compression algorithms that can provide medium-fidelity audio compression. To contrast the complexity of the MPEG/audio algorithm with that of some simpler, generic audio compression algorithms, the annex of this paper presents the details of μ-law and the ADPCM algorithm adopted by the Interactive Multimedia Association.

The MPEG/audio standard is the result of over three years of collaborative work by an international committee of high-fidelity audio compression experts known as the Moving Picture Experts Group (MPEG/audio). The International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) adopted this standard at the end of 1992.

Although MPEG/audio compression is perfectly suitable for audio-only applications, it is actually one part of a three-part compression standard. Combined with the other two parts, video and systems, the MPEG standard addresses the compression of synchronized video and audio at a total bit rate of about 1.5 Megabits/sec.

The MPEG standard is rigid only where necessary to ensure interoperability. It mandates the syntax of the coded bitstream, defines the decoding process, and provides compliance tests for assessing the accuracy of the decoder[4]. This guarantees that, regardless of origin, any fully compliant MPEG/audio decoder will be able to decode any MPEG/audio bitstream with a predictable result. Wide acceptance of this standard will permit manufacturers to produce and sell, at reasonable cost, large numbers of MPEG/audio codecs.

Where possible, the standard is open to future innovative improvements. Designers are free to try new and different implementations of the encoder or decoder within the bounds of the standard. There is especially good potential for diversity in the encoder.

1.1 MPEG/audio Features and Applications

MPEG/audio is a generic audio compression standard. Unlike vocal-tract-model coders specially tuned for speech signals, the MPEG/audio coder gets its compression without making assumptions about the nature of the audio source. Instead, the coder exploits the perceptual limitations of the human auditory system. Much of the compression results from the removal of perceptually irrelevant parts of the audio signal. Removal of such parts results in inaudible distortions, so MPEG/audio can compress any signal meant to be heard by the human ear. In keeping with its generic nature, MPEG/audio offers a diverse assortment of compression modes:

• The audio sampling rate can be 32, 44.1, or 48 kHz.

• The compressed bitstream can support one or two audio channels in one of four possible modes:

1. a monophonic mode for a single audio channel,

2. a dual-monophonic mode for two independent audio channels (functionally identical to the stereo mode),

3. a stereo mode for stereo channels with a sharing of bits between the channels, but no joint-stereo coding, and

4. a joint-stereo mode that takes advantage of the correlations between the stereo channels, of the irrelevancy of the phase difference between channels, or of both.

• The compressed bitstream can have one of several predefined fixed bit rates ranging from 32 to 224 kbits/sec per channel. Depending on the audio sampling rate, this translates to compression factors ranging from 2.7 to 24. In addition, the standard provides a "free" bit rate mode to support fixed bit rates other than the predefined rates.

• MPEG/audio offers a choice of three independent layers of compression. This provides a wide range of tradeoffs between codec complexity and compressed audio quality:

Layer I is the simplest and is best suited for bit rates above 128 kbits/sec per channel. For example, Philips' Digital Compact Cassette (DCC)[5] uses Layer I compression at 192 kbits/s per channel.

Layer II has an intermediate complexity and is targeted for bit rates around 128 kbits/s per channel. Possible applications for this layer include the coding of audio for Digital Audio Broadcasting (DAB®)[6], for the storage of synchronized video-and-audio sequences on CD-ROM, and for Video CD, the full-motion extension of CD-interactive.

Layer III is the most complex but offers the best audio quality, particularly for bit rates around 64 kbits/s per channel. This layer is well suited for audio transmission over ISDN.

All three layers are simple enough to allow single-chip, real-time decoder implementations.

• The coded bitstream supports an optional Cyclic Redundancy Check (CRC) error detection code.

• MPEG/audio provides a means of including ancillary data within the bitstream.

In addition, the MPEG/audio bitstream makes features such as random access, audio fast-forwarding, and audio reverse possible.

2. OVERVIEW

The key to MPEG/audio compression is quantization. Although quantization is lossy, the algorithm can give "transparent", perceptually lossless, compression. The MPEG/audio committee conducted extensive subjective listening tests during the development of the standard. The tests showed that even with a 6-to-1 compression ratio (stereo, 16 bits/sample, audio sampled at 48 kHz compressed to 256 kbits/sec) and under optimal listening conditions, expert listeners were unable to distinguish between coded and original audio clips with statistical significance. Furthermore, these clips were specially chosen because they are difficult to compress. Reference 7 gives the details of the setup, procedures, and results of these tests.

Figure 1 shows block diagrams of the MPEG/audio encoder and decoder. The input audio stream passes through a filter bank that divides the input into multiple subbands of frequency. The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband. The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals to minimize the audibility of the quantization noise. Finally, the last block takes the representation of the quantized subband samples and formats this data and side information into a coded bitstream. Ancillary data not necessarily related to the audio stream can be inserted within the coded bitstream. The decoder deciphers this bitstream, restores the quantized subband values, and reconstructs the audio signal from the subband values.

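To make the flow of figure 1 concrete, the following Python skeleton shows how the encoder blocks fit together. It is only a sketch of the data flow; every function name here is illustrative rather than taken from the standard:

    def encode_frame(pcm_frame, filter_bank, psycho_model, allocate, quantize, pack):
        # Time-to-frequency mapping: split the frame into 32 subband signals.
        subbands = filter_bank(pcm_frame)
        # Psychoacoustic model: one signal-to-mask ratio per subband.
        smr = psycho_model(pcm_frame)
        # Bit (or noise) allocation: apportion the available code bits to
        # minimize the audibility of the quantization noise.
        allocation = allocate(smr)
        # Quantize the subband samples, then format the codes and the side
        # information into the coded bitstream.
        codes, scale_factors = quantize(subbands, allocation)
        return pack(allocation, scale_factors, codes)
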
The following sections explore various aspects of MPEG/audio compression in more detail. The first section covers the time-to-frequency mapping of the polyphase filter bank. The next section covers implementations of the psychoacoustic model, followed by more detailed descriptions of the three layers of MPEG/audio compression. This gives enough background to cover a brief summary of the different bit (or noise) allocation processes used by the three layers and the joint stereo coding methods. The paper finishes with a short description of current MPEG/audio standards work.

2.1 The Polyphase Filter Bank

This section gives some insight into the behavior of the MPEG/audio polyphase filter bank by presenting a detailed examination of the encoder's analysis filter bank. A similar analysis applies to the decoder's synthesis filter bank.

The polyphase filter bank is the key component common to all layers of MPEG/audio compression. This filter bank divides the audio signal into 32 equal-width frequency subbands. The filters are relatively simple and provide good time resolution with reasonable frequency resolution. The design is a good compromise with three notable concessions. First, the equal widths of the subbands do not accurately reflect the human auditory system's frequency-dependent behavior. The width of a "critical band" as a function of frequency is a good indicator of this behavior. Many psychoacoustic effects are consistent with a critical-band frequency scaling. For example, both the perceived loudness of a signal and its audibility in the presence of a masking signal are different for signals within one critical band than for signals that extend over more than one critical band. Figure 2 compares the polyphase filter bandwidths with the width of these critical bands. At lower frequencies a single subband covers several critical bands. In this circumstance the number of quantizer bits cannot be specifically tuned for the noise masking available for the individual critical bands. Instead, the critical band with the least noise masking dictates the number of quantization bits needed for the entire subband. Second, the filter bank and its inverse are not lossless transformations. Even without quantization, the inverse transformation cannot perfectly recover the original signal. However, by design the error introduced by the filter bank is small and inaudible. Finally, adjacent filter bands have a major frequency overlap. A signal at a single frequency can affect two adjacent filter bank outputs. Other parts of this paper cover these issues in more detail.

To understand the polyphase filter bank, it is useful to examine its origin. The ISO MPEG/audio standard describes a procedure for computing the analysis polyphase filter outputs that is very similar to a method described by Rothweiler[8]. Figure 3 shows a structure for an MPEG encoder filter bank based on Rothweiler's proposal. For comparison, figure 4 shows the flow diagram from the ISO MPEG/audio standard for the same filter bank.

By combining the equations and steps shown by the flow diagram, one can derive the following equation for the filter bank outputs:

            63    7
    st[i] = Σ     Σ   M[i][k] * ( C[k+64j] * x[k+64j] )        (1)
            k=0   j=0

where:

i is the subband index and ranges from 0 to 31,
st[i] is the filter output sample for subband i at time t, where t is an integer multiple of 32 audio sample intervals,
C[n] is one of the 512 coefficients of the analysis window defined in the standard,
x[n] is an audio input sample read from a 512-sample buffer, and
M[i][k] = cos[ (2*i+1)*(k-16)*π / 64 ] are the analysis matrix coefficients.

The above equation is partially optimized to reduce the number of computations. Because the function within the parentheses is independent of the value of i, and M[i][k] is independent of j, the 32 filter outputs need only 512 + 32*64 = 2,560 multiplies and 64*7 + 32*63 = 2,464 additions, or roughly 80 multiplies and additions per output. Substantially further reductions in multiplies and adds are possible with, for example, a fast Discrete Cosine Transform[9,10] or a fast Fourier Transform implementation[11].

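As a concrete illustration, here is a minimal NumPy sketch of the partially optimized computation of equation (1). It assumes the 512 analysis window coefficients C (tabulated in the standard) are available as an array and that x holds the 512 most recent input samples arranged as the standard's input buffer expects:

    import numpy as np

    def analysis_filter_outputs(x, C):
        # Window the 512-sample buffer: 512 multiplies.
        z = C * x
        # Fold into 64 partial sums over j = 0..7:
        # y[k] = sum_j C[k+64j]*x[k+64j]  (64*7 = 448 adds).
        y = z.reshape(8, 64).sum(axis=0)
        # Analysis matrix M[i][k] = cos((2i+1)(k-16)*pi/64); in practice M is
        # precomputed once. The product needs 32*64 multiplies and 32*63 adds.
        i = np.arange(32).reshape(32, 1)
        k = np.arange(64).reshape(1, 64)
        M = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)
        return M @ y    # the 32 subband output samples st[i]
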
Note this filter bank implementation is critically sampled: for every 32 input samples, the filter bank produces 32 output samples. In effect, each of the 32 subband filters subsamples its output by 32 to produce only one output sample for every 32 new audio samples.

One can manipulate equation (1) into a familiar filter convolution equation:

            511
    st[i] = Σ    x[t-n] * Hi[n]        (2)
            n=0

where:

x[t] is an audio sample at time t, and
Hi[n] = h[n] * cos[ (2*i+1)*(n-16)*π / 64 ] with
h[n] = -C[n], if the integer part of (n/64) is odd,
     =  C[n] otherwise, for n = 0 to 511.

In this form, each subband of the filter bank has its own band-pass filter response, Hi[n]. Although this form is more convenient for analysis, it is clearly not an efficient solution: a direct implementation of this equation requires 32*512 = 16,384 multiplies and 32*511 = 16,352 additions to compute the 32 filter outputs.

The coefficients, h[n], correspond to the prototype low-pass filter response for the polyphase filter bank. Figure 5 compares a plot of h[n] with C[n]. The C[n] used in the partially optimized equation (1) has every odd-numbered group of 64 coefficients of h[n] negated to compensate for M[i][k]. The cosine term of M[i][k] only ranges from k = 0 to 63 and covers an odd number of half cycles, whereas the cosine terms of Hi[n] range from n = 0 to 511 and cover 8 times the number of half cycles.

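The relationship between C[n], h[n], and Hi[n] can be stated compactly in code. Continuing with NumPy and the same array C as in the earlier sketch:

    n = np.arange(512)
    # Negate every odd-numbered group of 64 coefficients to recover the
    # prototype low-pass response h[n] from the tabulated window C[n].
    h = np.where((n // 64) % 2 == 1, -C, C)

    def bandpass_response(i):
        # Cosine modulation shifts the prototype low-pass response up to the
        # band centered on the odd multiple (2i+1) of pi/(64T).
        return h * np.cos((2 * i + 1) * (n - 16) * np.pi / 64)
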
The equation for Hi[n] clearly shows that each is a modulation of the prototype response with a cosine term to shift the low-pass response to the appropriate frequency band; hence these are called polyphase filters. These filters have center frequencies at odd multiples of π/(64T), where T is the audio sampling period, and each has a nominal bandwidth of π/(32T). As figure 6 shows, the prototype filter response does not have a sharp cutoff at its nominal bandwidth. So when the filter outputs are subsampled by 32, there is a considerable amount of aliasing. The design of the prototype filter, and the inclusion of appropriate phase shifts in the cosine terms, results in a complete alias cancellation at the output of the decoder's synthesis filter bank[8,12]. Another consequence of using a filter with a wider-than-nominal bandwidth is an overlap in the frequency coverage of adjacent polyphase filters. This effect can be detrimental to efficient audio compression because signal energy near nominal subband edges will appear in two adjacent polyphase filter outputs. Figure 7 shows how a pure sinusoid tone, which has energy at only one frequency, appears at the output of two polyphase filters.

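In concrete terms, for one of the allowed sampling rates the subband geometry works out as follows (a simple numerical check, not part of the standard's computation):

    fs = 48000.0                     # sampling rate in Hz, so T = 1/fs
    width = fs / 64                  # nominal bandwidth pi/(32T): 750 Hz at 48 kHz
    centers = [(2 * i + 1) * fs / 128 for i in range(32)]  # odd multiples of pi/(64T)
    # centers[0] = 375 Hz and centers[31] = 23,625 Hz; each band is 750 Hz wide,
    # far wider than the sub-100 Hz critical bands at low frequencies.
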
Although the polyphase filter bank is not lossless, any consequent errors are small. Figures 8 and 9 show the composite frequency response combining the response of the encoder's analysis filter bank with that of the decoder's synthesis filter bank. Without quantization of the subband samples, the composite response has a ripple of less than 0.07 dB.

2.2 Psychoacoustics

The MPEG/audio algorithm compresses the audio data in large part by removing the acoustically irrelevant parts of the audio signal. That is, it takes advantage of the human auditory system's inability to hear quantization noise under conditions of auditory masking. This masking is a perceptual property of the human auditory system that occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible. A variety of psychoacoustic experiments corroborate this masking phenomenon[13].

Empirical results also show that the human auditory system has a limited, frequency-dependent resolution. This frequency dependency can be expressed in terms of critical band widths, which are less than 100 Hz for the lowest audible frequencies and more than 4 kHz at the highest. The human auditory system blurs the various signal components within a critical band, although this system's frequency selectivity is much finer than a critical band.

Because of the human auditory system's frequency-dependent resolving power, the noise masking threshold at any given frequency is solely dependent on the signal energy within a limited bandwidth neighborhood of that frequency. Figure 10 illustrates this property. MPEG/audio works by dividing the audio signal into frequency subbands that approximate critical bands, then quantizing each subband according to the audibility of quantization noise within that band. For the most efficient compression, each band should be quantized with no more levels than necessary to make the quantization noise inaudible.

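A rough way to see what "no more levels than necessary" means: each additional quantizer bit lowers the quantization noise by about 6 dB, so the number of bits a subband needs grows with its signal-to-mask ratio. The sketch below is only this rule of thumb, not the allocation procedure the standard defines:

    import math

    def bits_for_inaudible_noise(smr_db):
        # Keep adding quantizer bits (roughly 6.02 dB of noise reduction each)
        # until the quantization noise falls below the masking threshold.
        return max(0, math.ceil(smr_db / 6.02))
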
2.2.1 The Psychoacoustic Model

The psychoacoustic model analyzes the audio signal and computes the amount of noise masking available as a function of frequency[14,15,16,17]. The masking ability of a given signal component depends on its frequency position and its loudness. The encoder uses this information to decide how best to represent the input audio signal with its limited number of code bits. The MPEG/audio standard provides two example implementations of the psychoacoustic model. Psychoacoustic model 1 is less complex than psychoacoustic model 2 and makes more compromises to simplify the calculations. Either model works for any of the layers of compression. However, only model 2 includes specific modifications to accommodate Layer III.

There is considerable freedom in the implementation of the psychoacoustic model. The required accuracy of the model depends on the target compression factor and the intended application. For low levels of compression, where there is a generous supply of code bits, a complete bypass of the psychoacoustic model may be adequate for consumer use. In this case, the bit allocation process can iteratively assign bits to the subband with the lowest signal-to-noise ratio. For the archiving of music, the psychoacoustic model can be made much more stringent[18].

Below is a general outline of the basic steps involved in the psychoacoustic calculations for either model. Differences between the two models are highlighted.

• Time align audio data. There is one psychoacoustic evaluation per frame. The audio data sent to the psychoacoustic model must be concurrent with the audio data to be coded. The psychoacoustic model must account for both the delay of the audio data through the filter bank and a data offset so that the relevant data is centered within the psychoacoustic analysis window. For example, when using psychoacoustic model 1 for Layer I, the delay through the filter bank is 256 samples and the offset required to center the 384 samples of a Layer I frame in the 512-point analysis window is (512-384)/2 = 64 points. The net offset is 320 points to time align the psychoacoustic model data with the filter bank outputs.

• Convert audio to a frequency domain representation. The psychoacoustic model should use a separate, independent time-to-frequency mapping instead of the polyphase filter bank because it needs finer frequency resolution for an accurate calculation of the masking thresholds. Both psychoacoustic models use a Fourier transform for this mapping. A standard Hann weighting, applied to the audio data before Fourier transformation, conditions the data to reduce the edge effects of the transform window. (A short numerical sketch of this windowed transform appears after this list.)

Psychoacoustic model 1 uses a 512-sample analysis window for Layer I and a 1,024-sample window for Layers II and III. Because there are only 384 samples in a Layer I frame, a 512-sample window provides adequate coverage. Here the smaller window size reduces the computational load. Layers II and III use a 1,152-sample frame size, so the 1,024-sample window does not provide complete coverage. While ideally the analysis window should completely cover the samples to be coded, a 1,024-sample window is a reasonable compromise. Samples falling outside the analysis window generally will not have a major impact on the psychoacoustic evaluation.

Psychoacoustic model 2 uses a 1,024-sample window for all layers. For Layer I, the model centers a frame's 384 audio samples in the psychoacoustic window as previously discussed. For Layers II and III, the model computes two 1,024-point psychoacoustic calculations for each frame. The first calculation centers the first half of the 1,152 samples in the analysis window and the second calculation centers the second half. The model combines the results of the two calculations by using the higher of the two signal-to-mask ratios for each subband. This in effect selects the lower of the two noise masking thresholds for each subband.

• Process spectral values in groupings related to critical band widths. To simplify the psychoacoustic calculations, both models process the frequency values in perceptual quanta.

• Separate spectral values into tonal and non-tonal components. Both models identify and separate the tonal and noise-like components of the audio signal because the masking abilities of the two types of signal differ.

Psychoacoustic model 1 identifies tonal components based on the local peaks of the audio power spectrum. After processing the tonal components, model 1 sums the remaining spectral values into a single non-tonal component per critical band. The frequency index of each of these concentrated non-tonal components is the value closest to the geometric mean of the enclosing critical band.

Psychoacoustic model 2 never actually separates tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This index gives a measure of whether the component is more tone-like or noise-like. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. The tonality index is based on a measure of predictability: model 2 uses data from the previous two analysis windows to predict, via linear extrapolation, the component values for the current window. Tonal components are more predictable and thus will have higher tonality indices. Because this process relies on more data, it is more likely to discriminate better between tonal and non-tonal components than the model 1 method.

• Apply a spreading function. The masking ability of a given signal spreads across its surrounding critical band. The model determines the noise masking thresholds by first applying an empirically determined masking function (model 1) or spreading function (model 2) to the signal components.

• Set a lower bound for the threshold values. Both models include an empirically determined absolute masking threshold, the threshold in quiet. This threshold is the lower bound on the audibility of sound.

• Find the masking threshold for each subband. Both psychoacoustic models calculate the masking thresholds with a higher frequency resolution than that provided by the polyphase filter bank. Both models must derive a subband threshold value from possibly a multitude of masking thresholds computed for frequencies within that subband.

Model 1 selects the minimum masking threshold within each subband. While this approach is good for the lower-frequency subbands, where the subband is narrow relative to a critical band, it may be inaccurate for the higher-frequency subbands because critical bands in that frequency range span several subbands. These inaccuracies arise because model 1 concentrates all the non-tonal components within each critical band into a single value at a single frequency. In effect, model 1 converts non-tonal components into a form of tonal component. A subband within a wide critical band but far from the concentrated non-tonal component will not get an accurate non-tonal masking assessment. This approach is a compromise to reduce the computational load.

Model 2 selects the minimum of the masking thresholds covered by the subband only where the subband is wide relative to the critical band in that frequency region. It uses the average of the masking thresholds covered by the subband where the subband is narrow relative to the critical band. Model 2 is not less accurate for the higher-frequency subbands because it does not concentrate the non-tonal components.

• Calculate the signal-to-mask ratio. The psychoacoustic model computes the signal-to-mask ratio as the ratio of the signal energy within the subband (or, for Layer III, a group of bands) to the minimum masking threshold for that subband. The model passes this value to the bit (or noise) allocation section of the encoder.

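The sketch below (NumPy, with illustrative names) shows the windowed transform step referred to earlier, plus model 2's combination of its two per-frame calculations for Layers II and III. The spreading and threshold steps are abstracted behind a placeholder function, smr_from_spectrum, since their details are model-specific and not reproduced here:

    import numpy as np

    def windowed_power_spectrum(samples):
        # Hann weighting before the Fourier transform reduces the edge
        # effects of the analysis window (512 or 1,024 samples).
        N = len(samples)
        w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)
        return np.abs(np.fft.rfft(samples * w)) ** 2

    def smr_layer2_frame(frame_1152, smr_from_spectrum):
        # Two 1,024-point analyses per 1,152-sample frame, one for each half
        # (the centering described above is approximated here by taking the
        # leading and trailing 1,024 samples). Taking the higher SMR per
        # subband selects the lower, more conservative, masking threshold.
        first = smr_from_spectrum(windowed_power_spectrum(frame_1152[:1024]))
        second = smr_from_spectrum(windowed_power_spectrum(frame_1152[-1024:]))
        return np.maximum(first, second)
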
2.2.1.1 An Example of Psychoacoustic Model Analysis

This section gives an illustrative example of the analysis used by psychoacoustic models 1 and 2. Figure 11 is a spectral plot of the example audio signal to be psychoacoustically analyzed and compressed. This signal consists of a combination of a strong, 11,250 Hz sinusoidal tone with lowpass noise.

2.2.1.1.1 Example for Psychoacoustic Model 2

The processes used by psychoacoustic model 2 are somewhat easier to visualize, so this model will be covered first. Figure 12a shows the result, according to psychoacoustic model 2, of transforming the audio signal to the perceptual domain (63 one-third-critical-band partitions) and then applying the spreading function. Note the shift of the sinusoid peak and the expansion of the lowpass noise distribution. The perceptual transformation expands the low-frequency region and compresses the higher-frequency region. Because the spreading function is applied in a perceptual domain, the shape of the spreading function is relatively uniform as a function of partition. Figure 13 shows a plot of the spreading functions. Figure 12b shows the tonality index for the audio signal as computed by psychoacoustic model 2. Figure 14a shows a plot of the masking threshold as computed by the model based on the spread energy and the tonality index. This figure has plots of the masking threshold both before and after the incorporation of the threshold in quiet to illustrate its impact. Note the threshold in quiet significantly increases the noise masking threshold for the higher frequencies; the human auditory system is much less sensitive in this region. Also note how the sinusoid signal increases the masking threshold for the neighboring frequencies. The masking threshold is computed in the uniform frequency domain instead of the perceptual domain in preparation for the final step of the psychoacoustic model, the calculation of the signal-to-mask ratios (SMRs) for each subband. Figure 14b is a plot of these results and figure 14c is a frequency plot of a processed audio signal using these SMRs. In this example the audio compression was severe (768 to 64 kbits/sec), so the coder may not necessarily be able to mask all the quantization noise.

2.2.1.1.2 Example for Psychoacoustic Model 1

This example uses the same example audio signal as above. Figure 15a shows how psychoacoustic model 1 identifies the local spectral peaks as tonal and non-tonal components. Figure 15b shows the remaining tonal and non-tonal components after the decimation process. This process both removes components that would be below the threshold in quiet and removes the weaker tonal components within roughly half a critical band width (0.5 Bark) of a stronger tonal component. Psychoacoustic model 1 uses the decimated tonal and non-tonal components to determine the global masking threshold in a subsampled frequency domain. This subsampled domain corresponds approximately to a perceptual domain. Figure 15c shows the global masking threshold calculated for the example audio signal. Psychoacoustic model 1 selects the minimum global masking threshold within each subband to compute the SMRs. Figure 16a shows the resulting signal-to-mask ratios and figure 16b is a frequency plot of the processed audio signal using these SMRs.

2.3 Layer Coding Options

The MPEG/audio standard has three distinct layers for compression. Layer I forms the most basic algorithm, while Layer II and Layer III are enhancements that use some elements found in Layer I. Each successive layer improves the compression performance but at the cost of greater encoder and decoder complexity. Every MPEG/audio bitstream contains periodically spaced frame headers to identify the bitstream. Figure 17 gives a pictorial representation of the header syntax. A 2-bit field in the MPEG header identifies the layer in use.

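For instance, assuming the conventional 32-bit header layout (a 12-bit syncword, one ID bit, then the 2-bit layer field; the exact layout is defined in the standard, not in this paper), a decoder can identify the layer like this:

    def layer_from_header(header):
        # header: the 32-bit frame header as a big-endian integer.
        layer_bits = (header >> 17) & 0b11
        return {0b11: "Layer I", 0b10: "Layer II",
                0b01: "Layer III"}.get(layer_bits, "reserved")
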
2.3.1 Layer I

The Layer I algorithm codes audio in frames of 384 audio samples. It does so by grouping together 12 samples from each of the 32 subbands, as shown in figure 18. Besides the code for audio data, each frame contains a header, an optional Cyclic Redundancy Code (CRC) error check word, and possibly ancillary data. Figure 19a shows the arrangement of this data in a Layer I bitstream. The numbers within parentheses give the possible number of bits that can be used to encode each field. Each group of 12 samples gets a bit allocation and, if the bit allocation is not zero, a scale factor. The bit allocation tells the decoder the number of bits used to represent each sample. For Layer I this allocation can be 0 to 15 bits per subband. The scale factor is a multiplier that sizes the samples to make full use of the range of the quantizer. Each scale factor has a 6-bit representation. The decoder multiplies the decoded quantizer output by the scale factor to recover the quantized subband value. The dynamic range of the scale factors alone is over 120 dB. The combination of the bit allocation and the scale factor provides the potential for representing the samples with a dynamic range well over 120 dB. Joint stereo coding slightly alters the representation of left and right channel audio samples and will be covered later.

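A simplified decoder-side sketch shows the respective roles of the bit allocation and the scale factor. The integer-to-fraction mapping below is hypothetical; the standard's exact requantization formula includes correction terms omitted here:

    def reconstruct_sample(code, n_bits, scale_factor):
        # Map the n_bits integer code back to a fraction in (-1, 1)
        # (hypothetical midrise mapping), then restore the original scale.
        fraction = (2.0 * code + 1.0) / (1 << n_bits) - 1.0
        return scale_factor * fraction
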
2.3.2 Layer II

The Layer II algorithm is a straightforward enhancement of Layer I. It codes the audio data in larger groups and imposes some restrictions on the possible bit allocations for values from the middle and higher subbands. It also represents the bit allocation, the scale factor values, and the quantized samples with a more compact code. Layer II gets better audio quality by saving bits in these areas so that more code bits are available to represent the quantized subband values.

The Layer II encoder forms frames of 1,152 samples per audio channel. Whereas Layer I codes data in single groups of 12 samples for each subband, Layer II codes data in 3 groups of 12 samples for each subband. Figure 18 shows this grouping as well. Again discounting stereo redundancy coding, there is one bit allocation and up to three scale factors for each trio of 12 samples. The encoder uses a different scale factor for each group of 12 samples only if necessary to avoid audible distortion. The encoder shares scale factor values among two or all three groups when the values of the scale factors are sufficiently close, or when it anticipates that temporal noise masking by the human auditory system will hide the consequent distortion.

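The grouping can be sketched as follows, assuming the frame's subband samples for one channel are held in a 32 x 36 array (one row per subband; the layout is illustrative):

    import numpy as np

    def candidate_scale_factors(subband_samples):   # shape (32, 36)
        groups = subband_samples.reshape(32, 3, 12)  # 3 groups of 12 per subband
        # One candidate scale factor per group, sized to the group's peak value;
        # the encoder may then share one value across two or all three groups
        # when the candidates are close enough.
        return np.abs(groups).max(axis=2)            # shape (32, 3)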
