Samsung Exhibit 1032
Samsung v. Affinity
IPR2014-01181
Principles of Digital Audio

Ken C. Pohlmann

Fourth Edition

New York San Francisco Washington, D.C. Auckland Bogotá
Caracas Lisbon London Madrid Mexico City Milan
Montreal New Delhi San Juan Singapore
Sydney Tokyo Toronto

McGraw-Hill
Library of Congress Cataloging-in-Publication Data

Pohlmann, Ken C.
    Principles of digital audio / Ken C. Pohlmann.—4th ed.
        p. cm.
    Includes bibliographical references and index.
    ISBN 0-07-134819-0
    1. Sound—Recording and reproducing—Digital techniques.  I. Title.
    TK7881.4 P63 2000
    621.389'3—dc21    99-054165

McGraw-Hill
A Division of The McGraw-Hill Companies
`
Copyright © 2000 by The McGraw-Hill Companies, Inc. All rights reserved. Printed in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the publisher.

1234567890   AGM/AGM   90543210

ISBN 0-07-134819-0

The sponsoring editor for this book was Stephen S. Chapman and the production supervisor was Maureen Harper. It was set in Century Schoolbook by Pro-Image Corporation.

Printed and bound by Quebecor/Martinsburg.

This book was printed on recycled, acid-free paper containing a minimum of 50% recycled, de-inked fiber.

McGraw-Hill books are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please write to the Director of Special Sales, Professional Publishing, McGraw-Hill, Two Penn Plaza, New York, NY 10121-2298. Or contact your local bookstore.

Information contained in this work has been obtained by The McGraw-Hill Companies, Inc. ("McGraw-Hill") from sources believed to be reliable. However, neither McGraw-Hill nor its authors guarantee the accuracy or completeness of any information published herein, and neither McGraw-Hill nor its authors shall be responsible for any errors, omissions, or damages arising out of use of this information. This work is published with the understanding that McGraw-Hill and its authors are supplying information but are not attempting to render engineering or other professional services. If such services are required, the assistance of an appropriate professional should be sought.
Perceptual Coding
the modulated lapped transform (MLT). In the MDCT, the length of the overlapping windows is twice that of the block time (shift length of the transform). Frequency-domain subsampling is performed; the number of time and frequency components equals the shift length of the input time-domain sampled signal. MDCT also lends itself to adaptive window switching approaches with different window functions for the first and second half of the window; the time-domain aliasing property must be independently valid for each window half. Many bands are possible with the MDCT with good efficiency, on the order of an FFT computation. Many codecs apply a window function to blocks prior to transformation; this helps minimize spectral leakage of spectral coefficients. A window is a time function that is multiplied by an audio block to provide a windowed audio block; the window shape governs the frequency selectivity of the filter bank. The overlap/add characteristic minimizes blocking artifacts and leakage of spectral coefficients. Digital filters and windows are discussed in chapter 17.
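To make the windowing and 50-percent overlap concrete, here is a small Python sketch (mine, not from the text) that applies a sine window to overlapped blocks and computes a direct-form MDCT; the block length, window choice, and function names are illustrative assumptions, and real codecs use fast FFT-based forms rather than this O(N^2) version.

```python
import numpy as np

def mdct(block, N):
    """Direct-form MDCT of a 2N-sample windowed block, yielding N coefficients."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ block

def analyze(x, N=576):
    """Window and transform a signal in 50%-overlapped blocks (hop size = N)."""
    w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine window
    frames = []
    for start in range(0, len(x) - 2 * N + 1, N):
        frames.append(mdct(w * x[start:start + 2 * N], N))
    return np.array(frames)                                   # one row of N coefficients per block

# Example: transform one second of a 1-kHz tone sampled at 48 kHz
fs = 48_000
t = np.arange(fs) / fs
coeffs = analyze(np.sin(2 * np.pi * 1000 * t))
print(coeffs.shape)    # (number of blocks, 576)
```

Because consecutive blocks share half their samples, the inverse transform's overlap/add step cancels the time-domain aliasing and avoids blocking artifacts, which is the property the text describes.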
Hybrid filter banks use a cascade of different filter types (such as polyphase and MDCT) to provide different frequency resolutions at different frequencies with moderate complexity; for example, MPEG-1 Layer III encoders use a hybrid filter with a polyphase filter bank and MDCT. The ATRAC algorithm used in the MiniDisc, examined in chapter 12, is a hybrid coder that uses QMF to divide the signal into three subbands, then each subband is transformed into the frequency domain using the MDCT. Table 10.3 compares the properties of filter banks used in several low-bit-rate coders.
MPEG-1 Audio Standard

The International Standards Organization and the International Electrotechnical Commission formed the Moving Pictures Expert Group (MPEG) in 1988 to devise compression techniques for audio and video.
TABLE 10.3 Comparison of filter-bank properties.

Feature                           Layer 1   Layer 2   Layer 3            AC-2        AC-3      ATRAC*            PAC/MPAC
Filterbank type                   PQMF      PQMF      Hybrid PQMF/MDCT   MDCT/MDST   MDCT      Hybrid QMF/MDCT   MDCT
Frequency resolution at 48 kHz    750 Hz    750 Hz    41.66 Hz           93.75 Hz    93.75 Hz  46.87 Hz          23.44 Hz
Time resolution at 48 kHz         0.66 ms   0.66 ms   4 ms               1.3 ms      2.66 ms   1.3 ms            2.66 ms
Impulse response (LW)             512       512       1664               512         512       1024              2048
Impulse response (SW)             —         —         896                128         256       128               256
Frame length at 48 kHz            8 ms      24 ms     24 ms              32 ms       32 ms     10.66 ms          23 ms

*ATRAC operates at a sampling frequency of 44.1 kHz. For comparison, the frame length and impulse response figures are given for an ATRAC system working at 48 kHz.
(Brandenburg and Bosi)
This group has developed several highly successful standards. It first devised the ISO/IEC International Standard 11172, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s," for reduced data rate coding of digital video and audio signals; the standard was finalized in November 1992. It is commonly known as MPEG-1. (The acronym is pronounced "m-peg.") The standard has three major parts: system (multiplexed video and audio), video, and audio; a fourth part defines conformance testing. The maximum audio bit rate is set at 1.856 Mbps. The audio portion of the standard (11172-3) has found many applications such as Video CD, CD-ROM, ISDN, video games, and digital audio broadcasting. It supports coding of 32, 44.1, and 48 kHz PCM data at bit rates of approximately 32 to 224 kbps/channel (64 to 448 kbps for stereo). (Because data networks use data rates of 64 kbps (8 bits sampled at 8 kHz), most coders output a data channel rate that is a multiple of 64.)
The ISO/MPEG-1 standard was specifically developed to support audio and video coding for CD playback within the CD's bandwidth of 1.41 Mbps. However, the standard supports stereo bit rates ranging from 64 kbps to 448 kbps, as well as mono audio coding at 32 kbps. In addition, in the stereo modes, stereophonic irrelevance and redundancy can be optionally exploited to reduce the bit rate. Stereo audio bit rates below 256 kbps are useful for applications requiring more than two audio channels while maintaining full-screen motion video. Rates above 256 kbps are useful for applications requiring higher audio quality and partial-screen video images. In either case, the bit allocation is dynamically adaptable according to need. The MPEG-1 standard is based on a history of research and development of data reduction algorithms.
MUSICAM (Masking-pattern Universal Subband Integrated Coding And Multiplexing) was an early and successful perceptual coding algorithm. Derived from MASCAM (Masking-pattern Adapted Subband Coding And Multiplexing), MUSICAM divides the input audio signal into 32 subbands and uses perceptual coding models of minimum hearing threshold and masking to achieve data reduction. With a sampling frequency of 48 kHz, the subbands are each 750 Hz wide. Each subband is given a 6-bit scale factor according to the peak value in the subband's 12 samples and quantized with a variable word ranging from 0 to 15 bits. Scale factors are calculated over a 24-ms interval, corresponding to 36 samples. A subband is quantized only if it contains audible signals above the masking threshold. Subbands with signals well above the threshold are coded with more bits, yielding a higher S/N ratio. In other words, within a given bit rate, bits are assigned where they are most needed. In addition, a side-chain Fourier spectral analysis is performed on the input signal to assist in the masking threshold calculations. In this way, the data rate is reduced, to perhaps 128 kbps per mono channel (256 kbps for stereo). Extensive tests of 128-kbps MUSICAM showed that the coder achieves fidelity that is indistinguishable from a CD source, that it is monophonically compatible, that at least two cascaded codec stages produce no audible degradation, and that it is preferred to very high quality FM signals.
The audio portion of the ISO/MPEG-1 standard can trace its origins to tests conducted by Swedish Radio in July 1990. MUSICAM coding was judged superior in complexity and coding delay; however, the ASPEC (Adaptive Spectral Perceptual Entropy Coding) transform coder provided superior sound quality at very low data rates. The architectures of these two coding methods form the basis for the ISO/MPEG-1 audio standard. The 11172-3 standard describes three layers of coding, each with different applications. Specifically, Layer I describes the least sophisticated method that requires relatively high data rates (approximately 192 kbps/channel). Layer II is based on Layer I but is more complex and operates at somewhat lower data rates (approximately 96 to 128 kbps/channel). Layer IIA is a joint stereo version operating at 128 and 192 kbps per stereo pair. Layer III is somewhat conceptually different from I and II, is the most sophisticated, and operates at the lowest data rate (approximately 64 kbps/channel). The increased complexity from Layer I to III is reflected in the fact that at low data rates, Layer III will perform best for audio fidelity. Generally, Layers II, IIA, and III have been judged to be acceptable for broadcast applications; in other words, the 128-kbps/channel data reduction does not impair the quality of the original audio signal.
In very general terms, all three coders operate similarly. The audio signal passes through a filter bank and is analyzed in the frequency domain. The subsampled components are regarded as subband values, or spectral coefficients. The output of a side-chain transform, or the filter bank itself, is used to estimate masking thresholds. The subband values or spectral coefficients are quantized according to the psychoacoustic model. Coded mapped samples and bit allocation information are packed into frames prior to transmission. In each case, the encoders are not defined by the ISO/MPEG-1 standard; only the decoders are specified. This forward adaptive bit allocation permits improvements in encoding methods, particularly in the psychoacoustic modeling, provided the data output from the encoder can be decoded according to the standard. In other words, existing decoders will play data from improved encoders.
The MPEG-1 layers support stereo joint coding using intensity coding. Left/right high-frequency subband samples are summed into one channel, but scale factors remain left/right independent. The decoder forms the envelopes of the original left and right channels using the scale factors. The spectral shape of the left and right channels is the same in these upper subbands, but their amplitudes differ. The bound for joint coding is selectable at four frequencies: 3, 6, 9, and 12 kHz at a 48-kHz sampling frequency; the bound can be changed from one frame to another. Care must be taken to avoid aliasing between subbands and negative correlation between channels when joint coding. Layer III also supports MS (sum and difference) coding between channels, as described below. Joint stereo coding increases coder complexity only slightly.
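As an informal illustration of the intensity-coding idea (my own sketch, not the standard's bit-stream syntax), the following Python fragment sums the left and right subband samples above a chosen bound into one channel while keeping per-channel scale factors, and the decoder rescales the shared spectrum back to each channel's envelope. The array shapes, function names, and peak-based scale factors are assumptions made for clarity.

```python
import numpy as np

def intensity_encode(left, right, bound):
    """left, right: (32, 12) blocks of subband samples; bound: first jointly coded subband."""
    joint_high = left[bound:] + right[bound:]              # one shared channel above the bound
    sf_l = np.abs(left[bound:]).max(axis=1)                # per-channel scale factors are kept
    sf_r = np.abs(right[bound:]).max(axis=1)
    return left[:bound], right[:bound], joint_high, sf_l, sf_r

def intensity_decode(low_l, low_r, joint_high, sf_l, sf_r):
    sf_j = np.abs(joint_high).max(axis=1)
    sf_j[sf_j == 0] = 1.0                                  # guard for silent subbands
    high_l = joint_high * (sf_l / sf_j)[:, None]           # restore each channel's envelope
    high_r = joint_high * (sf_r / sf_j)[:, None]
    return np.vstack([low_l, high_l]), np.vstack([low_r, high_r])
```

The reconstructed channels share one spectral shape above the bound but recover their individual amplitudes, which is exactly the trade the text describes.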
MPEG data is transmitted in frames, as shown in Fig. 10.16, with each frame being individually decodable. The length of a frame depends on the layer and MPEG algorithm used. In MPEG-1, Layers II and III have the same frame length, representing 1152 audio samples. In Layer II, the audio data of a frame is located in the audio frame to which it corresponds. Unlike the other layers, in Layer III the number of bits per frame can vary; this allocation provides flexibility according to the coding demands of the audio signal.
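As a quick check of these numbers (my own illustration, not part of the standard's text): at a fixed bit rate, the frame size in bits follows directly from the samples-per-frame figures quoted above. Because of the Layer III bit reservoir, for Layer III this value is an average rather than the size of any one frame.

```python
def frame_bits(layer: int, bit_rate: int, sample_rate: int = 48_000) -> float:
    """Bits in one MPEG-1 audio frame: samples per frame scaled by bit rate over sampling rate."""
    samples_per_frame = 384 if layer == 1 else 1152   # Layer I: 384; Layers II and III: 1152
    return bit_rate * samples_per_frame / sample_rate

print(frame_bits(2, 192_000))   # 4608.0 bits per 24-ms Layer II frame at 192 kbps
print(frame_bits(1, 384_000))   # 3072.0 bits per 8-ms Layer I frame at 384 kbps
```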
Figure 10.16 Structure of the ISO/MPEG-1 audio Layer I, II, and III bit streams. The header and some other fields are common, but other fields differ. Higher-level coders might transcode lower-level bit streams. A. Layer I bit stream format: valid for 384 PCM audio input samples (8-ms duration at a 48-kHz sampling rate); the header (12-bit sync, 20-bit system information) is followed by the CRC, 4-bit linear bit allocation fields, 6-bit linear scale factors, the subband samples (one subband sample corresponds to 32 PCM audio input samples), and an auxiliary data field of unspecified length. B. Layer II bit stream format: valid for 1152 PCM audio input samples (24-ms duration at a 48-kHz sampling rate); the header and CRC are followed by bit allocation fields (4-bit for low subbands, 3-bit for mid subbands, 2-bit for high subbands), scale factor select information (SCFSI), scale factors, the subband samples arranged in 12 granules of 3 subband samples each (3 subband samples correspond to 96 PCM audio input samples), and an auxiliary data field of unspecified length. C. Layer III bit stream format: a 32-bit header (syncword, layer, bit rate, sampling frequency, mode and mode extension, copyright, original/home, emphasis) and CRC are followed by 256 bits (stereo) of side information (pointer to the beginning of this frame's main data, private bits, scale factor select information, and side information for granules 0 and 1) and by the 24-ms main data (scale factors, coded subband samples, and auxiliary data).
A frame begins with a 32-bit ISO header with a 12-bit synchronizing pattern and 20 bits of general data on layer, bit rate index, sampling frequency, type of emphasis, etc. This is followed by an optional 16-bit CRCC check word with generation polynomial x^16 + x^15 + x^2 + 1. Subsequent fields describe bit allocation data (number of bits used to code subband samples), scale factor selection data, and the scale factors themselves. This varies from layer to layer. For example, Layer I sends a fixed 6-bit scale factor for each coded subband. Layer II examines scale factors and uses dynamic scale-factor selection information (SCFSI) to avoid redundancy; this reduces the scale factor bit rate by a factor of two.
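The generator quoted above is the familiar CRC-16 polynomial (0x8005, with the x^16 term implied). The bit-serial Python sketch below is mine and only shows how such a check word is formed; the exact header and side-information bits that the standard protects, and the register's starting value, are assumptions not taken from the text.

```python
def crc16(data: bytes, crc: int = 0xFFFF) -> int:
    """Bit-serial CRC with generator x^16 + x^15 + x^2 + 1 (0x8005), MSB first.
    The 0xFFFF starting value is an assumption for this sketch."""
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((crc >> 15) ^ (byte >> bit)) & 1   # top register bit XOR next message bit
            crc = (crc << 1) & 0xFFFF
            if feedback:
                crc ^= 0x8005                              # polynomial without the x^16 term
    return crc

# Example: a 16-bit check word over four arbitrary header-like bytes
print(hex(crc16(bytes([0xFF, 0xFD, 0x90, 0x64]))))
```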
The largest part of the frame is occupied by subband samples. Again, this varies among layers. In Layer II, for example, samples are grouped in granules. The length of the field is determined by a bit rate index, but the bit allocation determines the actual number of bits used to code the signal; if the frame length exceeds the number of bits allocated, the remainder of the frame can be occupied by ancillary data (this feature is used by MPEG-2, for example). Ancillary data is coded similarly to primary frame data. Frames contain 384 samples in Layer I and 1152 samples in Layers II and III (or 8 and 24 ms, respectively, at 48 kHz).
The similarity between the layers promotes tandem operation; for example, Layer III data can be transcoded to Layer II without returning to the analog domain (other digital processing is required, however). A full MPEG-1 decoder must be able to decode its layer and all layers below it. There are also layer X coders that only code one layer. Layer I preserves highest fidelity for acquisition and production work at high bit rates where six or more codings can take place; Layer II distributes programs efficiently where two codings can occur; Layer III is most efficient, with lowest rates, with somewhat lower fidelity, and a single coding.
Extensive tests have demonstrated that either Layer II or III at 2 × 128 kbps or 192 kbps joint stereo can convey a stereo audio program with no audible degradation compared to a 16-bit linear system. If a higher data rate of 384 kbps is allowed, Layer I also achieves transparency compared to 16-bit linear PCM. At rates as low as 128 kbps, Layers II and III can convey stereo material that is subjectively very close to 16-bit fidelity. Tests also have studied the effects of cascading MPEG codecs. For example, in one experiment, critical audio material was passed through four Layer II codec stages at 192 kbps and two stages at 128 kbps, and they were found to be transparent. On the other hand, a cascade of five codec stages at 128 kbps was not transparent for all music programs. More specifically, a source reduced to 384 kbps with MPEG-1 Layer II sustained about 15 code/decodes before noise became significant; however, at 192 kbps, only two codings were possible. These particular tests did not enjoy the benefit of joint stereo coding, and as with any MPEG perceptual coder, overall performance can be improved by substituting new psychoacoustic models in the encoder. In addition, transcoding produces no appreciable noise after multiple MPEG code/decodes.
MPEG-2, discussed below, incorporates the three audio layers of MPEG-1 and adds additional features, principally surround sound. However, MPEG-2 decoders can play MPEG-1 audio files, and MPEG-1 two-channel decoders can decode stereo information from surround sound MPEG-2 files.
Psychoacoustic models

The MPEG-1 standard suggests two psychoacoustic models which determine the minimum masking threshold for inaudibility. The models are needed only in the encoder. Simple encoders do not employ a psychoacoustic model. The difference between the maximum signal level and the masking threshold is used by the bit allocator to set the quantization levels. Generally, model 1 is applied to Layers I and II and model 2 is applied to Layer III. In both cases, the models follow an algorithm to output signal-to-mask ratios for each subband or group of subbands. For example, model 1 performs these nine steps (a simplified sketch of a few of these steps follows the list):
1. Perform time to frequency mapping: A 512- or 1024-point fast Fourier transform is used, with a Hann window to reduce edge effects, to transform time-domain data to the frequency domain; in this way, precise masking thresholds can be calculated.

2. Determine maximum SPL levels: This calculation is performed for each subband using spectral data and scale factors. Maxima are considered to be potential maskers, used in forming the masking threshold.

3. Determine threshold in quiet: An absolute hearing threshold is determined in the absence of any signal; this forms the lower masking bound.

4. Identify tonal and nontonal components: Tonal (sinusoidal) and nontonal (noiselike) components in the signal are identified and processed separately because they provide different masking thresholds.

5. Decimation of maskers: The number of maskers is reduced to obtain only the relevant maskers; their magnitude and distance in Bark must be appropriate.

6. Calculate masking thresholds: Noise masking thresholds for each subband are determined by applying a masking function to the signal. When the subband is wide compared to the critical band, the spectral model selects the minimum threshold; when it is narrow, the model averages the thresholds covering the subband.

7. Determine global masking threshold: This is the summation of the upper and lower slopes of individual subband masking curves, as well as the threshold in quiet, to form a composite contour.

8. Determine minimum masking threshold: These values are determined for each subband, based on the global masking threshold.

9. Calculate signal-to-mask ratios: The difference between the maximum SPL levels and the minimum masking threshold values determines the SMR in each subband; this value is supplied to the bit allocator.
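The fragment below is a deliberately crude sketch (mine, not the standard's model 1) of steps 1 through 3 and step 9: a Hann-windowed FFT, a per-subband peak level, a textbook approximation of the threshold in quiet, and the resulting signal-to-mask ratios. The tonality analysis, masker decimation, and spreading of masking across bands (steps 4 through 8) are omitted, and the constants, offsets, and function names are assumptions.

```python
import numpy as np

FS, N_FFT, N_BANDS = 48_000, 512, 32

def smr_per_subband(block):
    """Rough SMR estimate per subband from one 512-sample block (illustrative only)."""
    w = np.hanning(N_FFT)                                   # step 1: Hann window + FFT
    spectrum = np.fft.rfft(block[:N_FFT] * w)
    power_db = 10 * np.log10(np.abs(spectrum) ** 2 + 1e-12) + 96.0  # arbitrary SPL offset

    freqs = np.fft.rfftfreq(N_FFT, 1 / FS)
    f_khz = np.maximum(freqs, 20.0) / 1000.0
    # step 3: simplified absolute threshold in quiet (Terhardt-style approximation)
    quiet = 3.64 * f_khz**-0.8 - 6.5 * np.exp(-0.6 * (f_khz - 3.3)**2) + 1e-3 * f_khz**4

    band_edges = np.linspace(0, FS / 2, N_BANDS + 1)        # 32 equal-width subbands
    smr = np.zeros(N_BANDS)
    for b in range(N_BANDS):
        in_band = (freqs >= band_edges[b]) & (freqs < band_edges[b + 1])
        max_spl = power_db[in_band].max()                   # step 2: peak level as the masker
        threshold = quiet[in_band].min()                    # stand-in for the global threshold
        smr[b] = max_spl - threshold                        # step 9: SMR handed to the allocator
    return smr
```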
Although the validity of the psychoacoustic model is crucial to the success of any perceptual coder, it is the actual employment of the model in the quantization process that ultimately determines the audibility of noise. In that respect, the interrelationship of the model and the quantizer is the most proprietary part of any codec.
Layer I is a simplified version of the MUSICAM standard; block diagrams of a single-channel Layer I encoder and decoder (which also applies to Layer II) are shown in Fig. 10.17. Its aim is to provide high fidelity at low cost, at a somewhat high data rate. A polyphase filter is used to split the wideband signal into 32 subbands of equal width. The filter is critically sampled; there is the same number of samples in the analyzed domain as in the time domain. Adjacent subbands overlap; a single frequency can affect two subbands. The filter and its inverse are not lossless; however, the error is small. The filter bank's bands are all equal width, but the ear's critical bands are not; this is compensated for in the bit allocation algorithm; for example, lower bands are
usually assigned more bits, increasing their resolution over higher bands. This polyphase filter bank with 32 subbands is used in all three layers; Layer III adds additional hybrid processing.

Figure 10.17 ISO/MPEG-1 Layer I or II audio encoder and decoder. The 32-subband filter bank is common to all three layers. A. Layer I or II encoder (single-channel mode): the digital audio signal (2 × 768 kbps) passes through a 32-subband filter bank to a linear quantizer controlled by the psychoacoustic model; side information is coded, auxiliary data is added, and bitstream formatting with a CRC check produces the coded audio bitstream (2 × 32 kbps to 2 × 192 kbps). B. Layer I or II two-channel decoder: the encoded audio bitstream (2 × 32 kbps to 2 × 192 kbps) is demultiplexed and error checked, the side information is decoded, the subband samples are dequantized, and an inverse 32-subband filter bank reconstructs the stereophonic audio signal (2 × 768 kbps).
The filter outputs 32 samples, one sample per band, for every 32 input samples. In Layer I, 12 subband samples from each of the 32 subbands are grouped to form a frame; this represents 384 wideband samples. Each subband group of 12 samples is given a bit allocation; subbands judged inaudible are given a zero allocation. Based on the calculated masking threshold (just audible noise), the bit allocation determines the number of bits used to quantize those samples. A floating-point notation is used to code samples; the mantissa determines resolution and the exponent determines dynamic range. A fixed scale factor exponent is computed for each subband with a nonzero allocation; it is based on the largest sample value in the subband. Each of the 12 subband samples in a block is normalized by dividing it by the same scale factor; this optimizes quantizer resolution.
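A toy Python rendering of this grouping and normalization step (the shapes and names are my own; the standard's quantized scale-factor table is described later in this section):

```python
import numpy as np

def normalize_block(subband_samples):
    """subband_samples: array of shape (32, 12), i.e., 12 samples from each of 32 subbands.
    Returns the per-subband scale factors (block exponents) and the normalized mantissa values."""
    scale_factors = np.abs(subband_samples).max(axis=1)      # largest sample in each subband
    scale_factors[scale_factors == 0] = 1.0                  # silent subbands: avoid divide-by-zero
    normalized = subband_samples / scale_factors[:, None]    # every value now lies within [-1, +1]
    return scale_factors, normalized
```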
Using the scale factor information and spectral analysis from a 512-sample FFT wideband transform, a psychoacoustic model compares the data to the minimum threshold curve; the normalized samples are quantized by the bit allocator to achieve data reduction. The subband data is coded, not the FFT spectra. Dynamic bit allocation assigns mantissa bits to the samples in each coded subband, or omits coding for inaudible subbands. Each sample is coded with one PCM codeword; the quantizer provides 2^n − 1 steps, where 2 ≤ n ≤ 15. Subbands with a high signal-to-mask ratio are given a long word; subbands with a low SMR are given fewer bits; in other words, the SMR determines the minimum signal-to-noise ratio that has to be met by the quantization of the subband samples. However, quantization is performed iteratively; when available, additional bits are added to codewords to increase the S/N ratio above the minimum. The block scale factor exponent and sample mantissas are output. Error correction and other information is added to the signal at the output of the coder.
Decoding is performed by decoding the bit allocation information and decoding the scale factors. Samples are requantized by multiplying them with the correct scale factor. The scale factors provide all the information needed to recalculate the masking thresholds; in other words, the decoder does not need a psychoacoustic model. Samples are applied to an inverse synthesis filter to output the waveform.
Example of Layer I algorithm

As with other perceptual coding methods, Layer I uses the ear's audiology performance as its guide for audio encoding, relying on principles such as amplitude masking to encode a signal that is perceptually identical. Generally, Layer I operating at 384 kbps achieves the same quality as a Layer II coder operating at 256 kbps. Also, Layer I can be transcoded to Layer II. The following describes a typical Layer I implementation.
Signals input to an encoder can be analog, or PCM digital with 32-, 44.1-, or 48-kHz sampling frequencies. At these three sampling frequencies, the subband width is 500, 689, and 750 Hz, and the frame period is 12, 8.7, and 8 ms, respectively. The following description assumes a 48-kHz sampling frequency. The stereo audio signal is passed to the first stage in a Layer I encoder, as shown in Fig. 10.18. A 24-bit FIR filter with the equivalent of 512 taps divides the audio band into 32 subbands of equal 750-Hz width. The filter window is shifted by 32 samples each time (12 shifts) so all the 384 samples in the 8-ms frame are analyzed. The filter bank outputs 32 subbands. With this filter, the effective sampling rate of a subband is reduced by 32 to 1, for example, from a frequency of 48 kHz to 1.5 kHz. Although the channels are bandlimited, they are still in PCM representation at this point in the algorithm. The subbands are equal width, whereas the ear's critical bands are not. With critical bands, the number of bits allocated may be equal in each band. This can be compensated for in equal subbands by unequally allocating bits to the subbands; more bits are given to code signals in lower-frequency subbands.
The encoder analyzes the energy in each subband to determine which subbands contain audible information. This example of a Layer I encoder does not use an FFT side chain. The algorithm calculates average power levels in each subband over the 8-ms (12-sample) period. Masking levels in subbands and adjacent subbands are estimated. Minimum threshold levels are applied. Peak power levels in each subband are calculated and compared to masking levels. The SMR (difference between the maximum signal and the masking threshold) is calculated for each subband and is used to determine the number of bits N_i assigned to a subband i such that N_i ≥ (SMR_i - 1.76)/6.02. A bit pool approach is taken to optimally code signals within the given bit rate. Quantized values form a mantissa, with a possible
range of 2 to 15 bits; thus a maximum resolution of 92 dB is available from this part of the coding word. In practice, in addition to signal strength, mantissa values also are affected by the rate of change of the waveform pattern and the available data capacity. In any event, new mantissa values are calculated for every sample period.

Figure 10.18 Example of an ISO/MPEG-1 Layer I encoder; the FFT side chain is omitted. Broadband audio is split by the filter into subbands; an allocation calculation produces the allocation information, a scale factor generator produces the scale factors and scale factor indexes, and the normalized subband samples are quantized to form the coding information output. (Philips)
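Tying the SMR rule and the bit-pool idea together, here is a greedy toy allocator (my own construction, not the standard's iterative procedure). The 6.02-dB-per-bit approximation and the rule N_i ≥ (SMR_i - 1.76)/6.02 come from the text above; the function name, the refill loop, and the interpretation of bit_pool as bits-per-sample across subbands are assumptions.

```python
import numpy as np

def allocate_bits(smr_db, bit_pool, n_min=2, n_max=15):
    """Greedy sketch of SMR-driven allocation over 32 subbands."""
    needed = np.ceil((np.asarray(smr_db, dtype=float) - 1.76) / 6.02)
    bits = np.clip(needed, 0, n_max).astype(int)           # subbands below the mask get 0 bits
    bits[(bits > 0) & (bits < n_min)] = n_min              # smallest nonzero codeword is 2 bits
    # Spend any leftover bits from the pool where the noise margin is currently worst.
    while bits.sum() < bit_pool:
        margin = 6.02 * bits - np.asarray(smr_db)          # approximate SNR minus SMR per subband
        candidates = np.where(bits < n_max)[0]
        if len(candidates) == 0:
            break
        bits[candidates[np.argmin(margin[candidates])]] += 1
    return bits                                            # (a real allocator would also trim
                                                           #  when the pool is exceeded; omitted)
```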
Quantized values are normalized (scaled) to optimally use the dynamic range of the processor. Specifically, six exponent bits form a scale factor, which is determined by the signal's absolute amplitude. This scale factor covers the range from -118 dB to +6 dB in 2-dB steps. Because the audio signal varies slowly in relation to the sampling frequency, the masking threshold and scale factors are calculated only once for every group of 12 samples, forming a frame (12 samples/subband × 32 subbands = 384 samples). For every subband, the absolute peak value of the 12 samples is compared to a table of scale factors, and the closest (next highest) constant is applied; the other sample values are normalized to that factor, and during decoding will be used as multipliers to compute the correct subband signal level.
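A stand-in for that table lookup is sketched below. The 62-entry, 2-dB-step table is a hypothetical construction consistent with the description here and in the next paragraph, not the standard's actual constants, and the index convention is assumed.

```python
import numpy as np

# Hypothetical scale-factor table: 62 linear amplitudes in 2-dB steps, largest first.
SCALE_TABLE = 10.0 ** (np.arange(6, -118, -2) / 20.0)

def pick_scale_factor(subband_block):
    """Return the index and value of the smallest table entry >= the block's peak amplitude."""
    peak = np.abs(subband_block).max()
    candidates = np.where(SCALE_TABLE >= peak)[0]
    idx = candidates[-1] if len(candidates) else 0     # table is descending; last match is
    return idx, SCALE_TABLE[idx]                       # the closest "next highest" constant
```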
A floating-point representation is used; one field contains a fixed-length 6-bit exponent, and another field contains a variable-length 2- to 15-bit mantissa. Every block of 12 subband samples may have different mantissa lengths and values, but would share the same exponent. Allocation information detailing the length of a mantissa is placed in a 4-bit field in each frame. Because the total number of bits representing each sample within a subband is constant, this allocation information (like the exponent) needs to be transmitted only once every 12 samples. A null allocation value is conveyed when a subband is not encoded; in this case neither exponent nor mantissa values within that subband are transmitted. The 15-bit mantissa yields a maximum signal-to-noise ratio of 92 dB. The 6-bit exponent can convey 64 values; however, a pattern of all 1's is not used, and another value is used as a reference. There are thus 62 values, each representing 2-dB steps for an ideal total of 124 dB. The reference is used to divide this into two ranges, one from 0 to -118 dB, and the other from 0 to +6 dB. The 6 dB of headroom is needed because a component in a single subband might have a peak amplitude 6 dB higher than the broadband composite audio signal. In this example, the broadband dynamic range is thus equivalent to 19 bits of linear coding.
A complete frame contains synchronization information, sample bits, scale factors, bit allocation information, and control bits for sampling frequency information, emphasis, etc. The total number of bits in a frame (with 2 channels, with 384 samples, over 8 ms, sampled at 48 kHz) is 3072. This in turn yields a 384-kbps transmission rate. With the addition of error detection and correction code, and modulation, the final bit rate to a storage medium might be 768 kbps. The first set of subband samples in a frame is calculated from 512 samples by the 512-tap filter, and the filter window is shifted by 32 samples each time into 11 more positions during a frame period; thus each frame incorporates information from 864 broadband audio samples per channel.
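To make the arithmetic explicit (my own check, using only figures quoted in this paragraph):

```python
frame_bits = 3072                 # two channels, 384 samples each, in one 8-ms frame
frame_period = 0.008              # seconds, at a 48-kHz sampling frequency
print(frame_bits / frame_period)  # 384000.0 -> the 384-kbps transmission rate

# The first subband samples come from a 512-sample window; eleven further 32-sample
# shifts cover the rest of the frame, so each frame sees 512 + 11 * 32 samples per channel.
print(512 + 11 * 32)              # 864
```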
Sampling frequencies of 32 and 44.1 kHz also are supported, and because the number of bands remains fixed at 32, the subband width becomes 689.06 Hz with a 44.1-kHz sampling frequency. Because the output bit rate is fixed at 384 kbps, and 384 samples/channel per frame is fixed, there is a reduction in frame rate at sampling frequencies of 32 and 44.1 kHz, and thus an increase in the number of bits per frame. These additional bits per frame are used by the algorithm to further increase audio quality.
Layer I decoding proceeds frame by frame, using the processing shown in Fig. 10.19. Data is reformatted to linear PCM by a subband decoder, using allocation information and scale factors. Received scale factors are placed in an array with two columns of 32 rows, each six bits wide. Each column represents an output channel, and each row represents one subband. The subband samples are multiplied by the scale factors to restore them to their quantized values; empty subbands are automatically assigned a zero value. A synthesis reconstruction filter recombines the 32 subbands into one broadband audio signal. This subband filter operates identically (but inversely) to the input filter. As in the encoder, 384 samples/channel represent 8 ms of audio signal (at a sampling frequency of 48 kHz). Following this subband filtering, the signal is ready for reproduction through D/A converters.
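A minimal sketch of that dequantization step, mirroring the two-column-by-32-row scale-factor array described above (the shapes, names, and linear scale table are assumptions, and the synthesis filter bank itself is omitted):

```python
import numpy as np

def dequantize_frame(mantissas, scale_factor_indexes, scale_table):
    """mantissas: (2, 32, 12) normalized samples for two channels, 32 subbands, 12 samples;
    scale_factor_indexes: (2, 32) received 6-bit indexes; scale_table: linear scale-factor values.
    Empty subbands (all-zero mantissas) simply remain at zero."""
    scale = scale_table[scale_factor_indexes]      # (2, 32) linear multipliers
    return mantissas * scale[:, :, None]           # restore the subband sample amplitudes

# The 32 dequantized subbands per channel would then feed the inverse (synthesis) filter bank.
```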
Because psychoacoustic processing, bit allocation, and other operations are not used in the decoder, its cost is quite low. More importantly, the decoder is transparent to improvements in encoder technology. If the psychoacoustic models used in encoders are improved, the resulting fidelity would improve as well. Because the encoding algorithm is a function of digital signal processing, more sophisticated coding is possible. For example, because the number of bits per frame varies according to sample rate, it might be expedient to c
