RPX Exhibit 1144
RPX v. DAE

REGIONAL OFFICES

Japan
Japan Section
c/o Japan Audio Society, Mori Building
14-34, 1-chome, Jingumae,
Shibuya-ku,
Tokyo 158, Japan
Tel: 81-3-3403-6649
Fax: 81-3-3404-6549

Europe
AES Europe Region
Zevenbunderslaan 142/9
B-1190 Brussels, Belgium
Tel: 32-2-345-7971
Fax: 32-2-345-3419

United Kingdom
British Section
Audio Engineering Society Ltd.
P. O. Box 645
Slough, SL1 8BJ
United Kingdom
Tel: Burnham 44 (0) 628 663725
Fax: 44 (0) 628 667002

AES REGIONS AND SECTIONS

Eastern Region
Sections: Atlanta, Boston, District of Columbia, New York, Philadelphia, Toronto.
Student Sections: Duquesne University, Fredonia, Full Sail Center for Recording Arts, McGill University, Pennsylvania State University, University of Massachusetts-Lowell, University of Miami.

Central Region
Sections: Chicago, Detroit, Indianapolis, Kansas City, Nashville, St. Louis, Upper Midwest, West Michigan.
Student Sections: Belmont University, Hutchinson Technical College, Lawrence Technological University, Michigan Technological University, Middle Tennessee State University, Ohio University.

Western Region
Sections: Albuquerque/Santa Fe, Los Angeles, Pacific Northwest, Portland, San Diego, San Francisco, Southern Arizona, Utah, Vancouver.
Student Sections: American River, California State University-Chico, Cogswell Polytechnical College, Denver, Long Beach City College, Loyola Marymount University, San Francisco State University, University of Southern California.

Europe Region
Sections: Austrian, Belgian, Belgrade, British, Bulgarian, Central German, Croatian, Czech, Danish, Finnish, French, Greek, Hungarian, Iberian, Israel, Italian, Netherlands, North German, Norwegian, Polish, Romanian, Russian, South German, Slovakian Republic, Slovenian, Swedish, Swiss.

International Region
Sections: Adelaide, Japan, Melbourne, Mexican, Philippine, Sydney, Venezuelan.

PURPOSE

The Audio Engineering Society is organized for the purpose of: uniting persons performing professional services in the audio engineering field and its allied arts; collecting, collating, and disseminating scientific knowledge in the field of audio engineering and its allied arts; advancing such science in both theoretical and practical applications; and preparing, publishing, and distributing literature and periodicals relative to the foregoing purposes and policies.

MEMBERSHIP
Individuals who are interested in audio engineering may become members of the Society. Applications are considered by the Admissions Committee.

Grades and annual dues are as follows: Full member, $65.00; Associate member, $65.00; Student member, $35.00. A membership application form may be obtained from headquarters.

Sustaining memberships are available to persons, corporations, or organizations who wish to support the Society. A subscription to the Journal is included with all memberships.

(ISSN 0004-7554), Volume 42, Number 10, 1994 October
Published monthly, except January/February and July/August when published bimonthly, by the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA, Telephone 212-661-2355. Fax 212-682-0477. Second-class postage paid at New York, New York, and at an additional mailing office. Postmaster: Send address corrections to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520.

The Audio Engineering Society is not responsible for statements made by its contributors.

EDITORIAL STAFF

Daniel R. von Recklinghausen, Editor
Patricia M. Macdonald, Executive Editor
Ingeborg M. Stochmal, Copy Editor
Abbie J. Cohen, Senior Editor
Patricia L. Sarch, Art Director
G. Franklin Montgomery, Consulting Technical Editor
Flavia Elzinga, Advertising
Gerri M. Calamusa, Associate Editor
Stephanie Paynes, Writer
Robert O. Fehr, Editor Emeritus

REVIEW BOARD
George L. Augspurger
Jeffrey Barish
Søren Bech
Durand Begault
Piet J. Berkhout
Barry A. Blesser
John Bradley
Mahlon D. Burkhard
Marvin Camras
Duane H. Cooper
Robert R. Cordell
Andrew Duncan
John M. Eargle
Edward J. Foster
Mark B. Gardner
Earl R. Geddes
Richard A. Greiner
David Griesinger
Malcolm O. J. Hawksford
John M. Hollywood
Kees Schouhamer Immink
James M. Kates
D. B. Keele, Jr.
Paul W. Klipsch
W. Marshall Leach, Jr.
Stanley P. Lipshitz
J. G. McKnight
Robert A. Moog
James A. Moorer
John T. Mullin
Martin Polon
Herman J. M. Steeneken
John Strawn
Floyd E. Toole
Emil L. Torick
John Vanderkooy
Daniel R. von Recklinghausen
Rhonda Wilson
Wieslaw V. Woszczyk
[several further names illegible in the scan]

COPYRIGHT
Copyright © 1994 by the Audio Engineering Society, Inc. It is permitted to quote from this Journal with customary credit to the source.

COPIES
Individual readers are permitted to photocopy isolated articles for research or other noncommercial use. Permission to photocopy for internal or personal use of specific clients is granted by the Audio Engineering Society to libraries and other users registered with the Copyright Clearance Center (CCC), provided that the base fee of $1.00 per copy plus $0.50 per page is paid directly to CCC, 21 Congress Street, Salem, MA 01970, USA. 0004-7554/93. Photocopies of individual articles may be ordered from the AES Headquarters office at $5.00 per article.

REPRINTS AND REPUBLICATION
Multiple reproduction or republication of any material in this Journal requires the permission of the Audio Engineering Society. Permission may also be required from the author(s). Inquiries should be sent to the AES Editorial office.

SUBSCRIPTIONS
The Journal is available by subscription. Annual rates are $125.00 surface mail, $170.00 air mail. For further information, contact AES Headquarters.

BACK ISSUES
Selected back issues are available: From Vol. 1 (1953) through Vol. 12 (1964), $10.00 per issue (members), $15.00 (nonmembers); Vol. 13 (1965) to present, $6.00 per issue (members), $11.00 (nonmembers). For information, contact AES Headquarters office.

MICROFILM
Copies of Vol. 19, No. 1 (1971 January) to the present edition are available on microfilm from University Microfilms International, 300 North Zeeb Rd., Ann Arbor, MI 48106, USA.

ADVERTISING
Contact the AES Editorial office.

MANUSCRIPTS
For information on the presentation and processing of manuscripts, see Information for Authors.

This material may be protected by Copyright law (Title 17 U.S. Code)

PAPERS

ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio*

KARLHEINZ BRANDENBURG, AES Fellow

FhG-IIS, Erlangen, Germany

AND

GERHARD STOLL

Institut für Rundfunktechnik, Munich, Germany

Coauthors:

Yves-François Dehéry, CCETT, France
James D. Johnston, AES Member, AT&T Bell Laboratories, USA
Leon v.d. Kerkhof, Philips, Netherlands
Ernst F. Schröder, AES Member, Thomson Consumer Electronics, Germany

The standardization body ISO/IEC JTC1/SC29/WG11 (Moving Pictures Experts Group, MPEG) was drafting a standard for compressing the high bit rate of moving pictures and associated audio down to 1.5 Mbit/s. The audio part of the proposed standard is described. Three layers of the audio coding scheme with increasing complexity and performance were defined. These layers were developed mainly in a collaboration of AT&T, CCETT, FhG/University of Erlangen, Philips, IRT, and Thomson Consumer Electronics. The generic coding system is suitable for different applications, such as storage on inexpensive storage media or transmission over channels with limited capacity (such as digital audio broadcasting or ISDN audio transmission).

0 INTRODUCTION

The need to specify a generic video and audio coding scheme for the many applications that deal with digitally coded video and audio and require low data rates has led the ISO/IEC standardization body to establish ISO/IEC JTC 1/SC29/WG11, called MPEG (Moving Pictures Experts Group). This group had the task of comparing and assessing several low-bit-rate digital audio coding techniques in order to develop an international standard for the coded representation of moving pictures, associated audio, and their combination when used for storage and retrieval on digital storage media (DSM). The DSM targeted by MPEG include CD-ROM, DAT, magneto-optical disks, and computer disks, and MPEG-based bit-rate reduction techniques are expected to be used in a variety of communication channels, such as ISDN and local area networks, and in broadcasting applications. The international standard ISO/IEC 11172, “Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s,” was finalized in November 1992 and consists of three parts: system, video, and audio [1]. The system part (11172-1) deals with synchronization and multiplexing of audio-visual information, whereas the video (11172-2) and audio (11172-3) parts address the video and audio bit-rate reduction techniques, respectively. This standard is also known as the MPEG-1 standard.

* Presented at the 92nd Convention of the Audio Engineering Society, Vienna, Austria, 1992 March 24-27; revised 1994 July 15.

J. Audio Eng. Soc., Vol. 42, No. 10, 1994 October
CODING OF HIGH-QUALITY DIGITAL AUDIO

MPEG-2 Audio is the logical extension from two to five audio channels, providing backward compatibility with MPEG-1. Its main aspects are high quality for five (+1) audio channels, low bit rate, and backward compatibility, which is the key to ensuring that existing 2-channel decoders will still be able to decode compatible stereo information from five (+1) multichannel signals. This standard, which is expected in November 1994, is based on standards and recommendations from international organizations such as ITU-R, SMPTE, and EBU. International standardization bodies will ensure the highest audio signal quality by extensive testing. For audio reproduction, the loudspeaker positions left, center, right, left surround, and right surround are used, according to the 3/2 standard.

1 STANDARDIZATION AND QUALITY ASSESSMENTS WITHIN MPEG-1 AUDIO

Since 1988 ISO/MPEG has been undertaking the standardization of compression techniques for video and associated audio. The main topic for standardization in MPEG was video coding together with audio coding for DSM. Nevertheless, the audio coding standard developed by this group was the first international standard in the field of digital audio compression and is expected to be adopted in many different applications. Besides several subgroups such as video, system, test, implementation, requirements, and DSM, MPEG had an audio subgroup responsible for developing a standard for coding PCM audio signals with sampling rates of 32, 44.1, and 48 kHz at bit rates in the range of 32-192 kbit/s for a mono channel and 64-384 kbit/s for a stereo signal. The operating modes are

- Single channel
- Dual channel (for example, bilingual)
- Stereo
- Joint stereo (combined coding of the left and right channels of a stereophonic audio program)

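To make the quoted operating ranges concrete, the short sketch below computes the compression ratios they imply. It assumes, which the text does not state, that the uncompressed reference is 16-bit linear PCM as on CD and DAT; the function names are illustrative only.

```python
# Compression ratios implied by the MPEG-1 Audio bit-rate ranges quoted above.
# Assumption: the uncompressed reference is 16-bit linear PCM (as on CD/DAT).
PCM_BITS = 16


def pcm_bit_rate_kbps(sample_rate_hz: float, channels: int, bits: int = PCM_BITS) -> float:
    """Bit rate of uncompressed linear PCM, in kbit/s."""
    return sample_rate_hz * channels * bits / 1000.0


def compression_ratio(sample_rate_hz: float, channels: int, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the PCM input."""
    return pcm_bit_rate_kbps(sample_rate_hz, channels) / coded_kbps


if __name__ == "__main__":
    for fs in (32_000, 44_100, 48_000):
        pcm = pcm_bit_rate_kbps(fs, channels=2)
        # Stereo operating range from the text: 64 to 384 kbit/s.
        low_ratio = compression_ratio(fs, 2, 384)
        high_ratio = compression_ratio(fs, 2, 64)
        print(f"{fs / 1000:g} kHz stereo: PCM {pcm:g} kbit/s, "
              f"ratio {low_ratio:.1f}:1 (384 kbit/s) to {high_ratio:.1f}:1 (64 kbit/s)")
```

At 48 kHz, for example, 16-bit stereo PCM runs at 1536 kbit/s, so the stereo range corresponds to ratios between 4:1 and 24:1.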
Table 1 gives a brief overview of the milestones of the MPEG-AUDIO group. This group called for proposals for the audio coding standard in mid-1989, and 14 proposals were submitted. The original proposals were grouped into four clusters according to algorithmic similarities. The clustered candidate algorithms were called ASPEC, ATAC, MUSICAM, and SB/ADPCM.

A number of subjective tests [2]-[4] have been performed since mid-1990 to assess the audio quality of the ISO/MPEG/Audio coding standard. During this period several improvements were made to reach the present audio quality. The important milestones in the development of the standard were the official tests organized by the Swedish Broadcasting Corporation in Stockholm under the auspices of ISO and EBU. In July 1990 extensive listening tests and objective evaluations, covering basic audio quality at different bit rates, sensitivity to transmission bit errors, encoder and decoder complexity, and coding delay, were performed on prototype real-time implementations of the four clustered algorithms. Both the ASPEC and MUSICAM proposals showed a very high subjective quality at bit rates of about 100 kbit/s per channel.

Because the proposals of the ASPEC and MUSICAM groups were rated nearly equally in subjective terms and were judged relatively close in their overall performance, the official decision was as follows [5]:

The MPEG standardization committee decided to approve a collaborative development of the draft audio coding standard between the ASPEC and MUSICAM groups, because the ASPEC codec was slightly superior with respect to audio quality, especially at lower bit rates (64 kbit/s/channel), and the MUSICAM codec was slightly superior with respect to implementation complexity and decoding delay. The decision was that MUSICAM should be the basis for the low-complexity first layer, and that algorithmic refinements, including contributions of ASPEC, should be used in the subsequent layers.

Table 1. Milestones of the ISO/MPEG-Audio group during the development of the audio part of International Standard 11172.

1988 December: First audio meeting in Hanover
1989 January to 1990 March: Preparation of tests
1989 May: Determining requirements and weighting procedure
1989 June: Proposal of 14 algorithms to be tested
1989 October: Clustering of proponents into four groups
1989 December: Detailed description of four clustered proposals: ASPEC, ATAC, MUSICAM, and SB-ADPCM
1990 May: Exchange of tapes with coded audio sequences between four clusters
1990 June: Subjective and objective tests at SR, Stockholm
1990 August: Presentation of results and decision to follow a layer concept
1990 December: First draft of part 3, “Audio Coding,” of International Standard ISO 11172 was prepared
1991 May: Verification of three layers by subjective testing, again at SR in Stockholm
1991 June: Layers I and II are frozen; Layer III and ‘joint stereo coding’ are still under discussion
1991 November: Second verification of Layer III and first checking of Joint Stereo Coding by subjective testing at NDR in Hanover
1991 December: Draft of International Standard (DIS) ready for balloting at national standardization bodies
1992 November: International Standard ISO/IEC 11172-3 accepted by national standardization bodies

BRANDENBURG AND STOLL

A three-layer coding algorithm has been defined. These three layers were tested again in April 1991 by the Swedish Broadcasting Corporation [3], and a last verification test for the very low bit rate of 64 kbit/s/channel and “joint stereo coding” was carried out by the University of Hanover under the auspices of NDR in November 1991 [4]. In November 1991 the final proposal, consisting of three modes of operation called “Layers,” was adopted by ISO/MPEG [6].

2 BASIC STRUCTURE OF A GENERIC AUDIO CODING SCHEME USING PERCEPTUAL CRITERIA

The basic structure of a perceptual audio coding scheme is shown in Fig. 1.
1) A time-frequency mapping (filter bank) is used to decompose the input signal into subsampled spectral components. Depending on the filter bank used, these are called subband values or frequency lines.
2) The output of this filter bank, or the output of a parallel transform, is used to calculate an estimate of the actual (time-dependent) masking threshold using rules known from psychoacoustics.
3) The subband samples or frequency lines are quantized and coded with the aim of keeping the noise introduced by quantization below the masking threshold. Depending on the algorithm, this step is done in very different ways; the complexity ranges from block companding to analysis-by-synthesis systems using additional noiseless compression.
4) Frame packing is used to assemble the bit stream, which typically consists of the quantized and coded mapped samples and some side information, such as bit allocation information.
Depending on whether the emphasis is on low frequency resolution together with high time resolution, or on high frequency resolution, which permits only limited time resolution, the systems are usually called subband coders or transform coders, respectively.

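The four steps above can be illustrated with a deliberately simplified single-block sketch. Every component is a stand-in chosen for brevity, not what MPEG uses: an orthonormal DCT-II replaces the filter bank, the "masking threshold" is just each band's mean power lowered by a fixed, hypothetical `offset_db`, and the "frame" is a Python dict. What it does show is the principle of step 3: a uniform quantizer with step size `step` adds noise of roughly `step**2 / 12` per coefficient, so choosing `step = sqrt(12 * threshold)` places the expected quantization noise at the threshold.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; stand-in for the time-frequency mapping."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi / n * (t + 0.5) * k)
    m[0, :] /= np.sqrt(2.0)
    return m


def encode_block(block: np.ndarray, n_bands: int = 8, offset_db: float = 20.0) -> dict:
    coeffs = dct_matrix(len(block)) @ block                       # 1) mapping
    bands = np.array_split(coeffs, n_bands)
    # 2) stand-in threshold: band mean power lowered by a fixed (hypothetical) offset
    thresholds = [np.mean(b ** 2) * 10 ** (-offset_db / 10) for b in bands]
    # 3) uniform quantizer whose noise power (step**2 / 12) matches the threshold
    steps = [np.sqrt(12.0 * thr) if thr > 0 else 1.0 for thr in thresholds]
    values = [np.round(b / s).astype(int) for b, s in zip(bands, steps)]
    # 4) frame packing: side information (step sizes) plus quantized values
    return {"steps": steps, "values": values}


def decode_block(frame: dict, n: int) -> np.ndarray:
    coeffs = np.concatenate([v * s for v, s in zip(frame["values"], frame["steps"])])
    return dct_matrix(n).T @ coeffs                               # inverse mapping


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(256)
    y = decode_block(encode_block(x), len(x))
    print("SNR (dB):", 10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2)))
```

Because the transform is orthonormal, the quantization error in each band is bounded by half that band's step size, which is what a real psychoacoustic model exploits with a far better threshold estimate.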
2.1 Filter Banks

The following list provides a short overview of the most common filter banks used for coding high-quality audio signals:
1) QMF-Tree Filter Banks: Different frequency resolutions at different frequencies are possible. Typical QMF-tree filter banks use from 4 to 24 bands. The computational complexity is high.
2) Polyphase Filter Banks: These are equally spaced filter banks that combine the filter design flexibility of generalized QMF banks with low computational complexity [7]. It is possible to design the prototype filter so that it achieves both good frequency resolution (stop-band attenuation better than 96 dB) and good control of possible time-domain artifacts. A polyphase filter bank using 32 bands is used for Layers I and II of the ISO/MPEG audio coder.
3) DFT, DCT with Sine-Taper Window: These were the first transforms used in transform coding of audio signals. They implement equally spaced filter banks with 128-512 bands at low computational complexity. They do not provide critical sampling; that is, the number of time-frequency components is greater than the number of time samples represented by one block length. Another disadvantage of these transforms is possible blocking artifacts.
4) Modified Discrete Cosine Transform (MDCT, using time-domain aliasing cancellation as proposed in [8]): This transform combines critical sampling with good frequency resolution, provided by a sine window (compared with a sine-taper window), and the computational efficiency of a fast FFT-like algorithm. Typically 128-512 equally spaced bands are used.
5) Hybrid Structures (such as polyphase and MDCT): Using hybrid structures, as first proposed in [9], it is possible to combine different frequency resolutions at different frequencies with moderate implementation complexity. A hybrid system consisting of a polyphase filter bank and an MDCT is used in Layer III.
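The two properties claimed for the MDCT in item 4, critical sampling and time-domain aliasing cancellation, can be checked numerically. The sketch below uses a direct O(N²) matrix form instead of a fast FFT-based algorithm, the common MDCT definition with 2N inputs and N outputs, and the sine window; the hop size N, the zero padding, and the 2/N scaling of the inverse are choices of this sketch. Each frame of 2N samples yields only N coefficients, yet overlap-adding the windowed inverse transforms cancels the aliasing and returns the input.

```python
import numpy as np


def mdct_basis(N: int) -> np.ndarray:
    """Cosine kernel of the MDCT, shape (N, 2N)."""
    n = np.arange(2 * N)[None, :]
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def sine_window(N: int) -> np.ndarray:
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))


def mdct(frame: np.ndarray, N: int) -> np.ndarray:
    """2N windowed samples -> N coefficients (critical sampling)."""
    return mdct_basis(N) @ (sine_window(N) * frame)


def imdct(coeffs: np.ndarray, N: int) -> np.ndarray:
    """N coefficients -> 2N windowed samples that still contain time-domain aliasing.

    The 2/N factor makes overlap-add exact for a window with w[n]**2 + w[n+N]**2 == 1.
    """
    return sine_window(N) * (mdct_basis(N).T @ coeffs) * (2.0 / N)


def analysis_synthesis(x: np.ndarray, N: int) -> np.ndarray:
    """MDCT every N samples, then overlap-add the inverses; the aliasing cancels."""
    tail = N + (-len(x)) % N                       # pad to a whole number of hops
    padded = np.concatenate([np.zeros(N), x, np.zeros(tail)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N, N):
        out[start:start + 2 * N] += imdct(mdct(padded[start:start + 2 * N], N), N)
    return out[N:N + len(x)]


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(1000)
    y = analysis_synthesis(x, N=64)
    print("max reconstruction error:", np.max(np.abs(x - y)))
```

The window must satisfy the power-complementary condition noted in `sine_window`; with it, the alias term of one frame is cancelled exactly by the mirrored alias term of the next.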
[Fig. 1. Perceptual audio coding scheme (block diagram). (a) Encoder: Digital Audio Signal (PCM), Time/Frequency Mapping, Frame Packing, ISO/MPEG/AUDIO Bitstream. Decoder: ISO/MPEG/AUDIO Bitstream, Frequency/Time Mapping, Digital Audio Signal (PCM).]

Theoretically, MDCT and polyphase filter banks belong to the same class of time-frequency mappings, called lapped orthogonal transforms.

3 GENERIC CODING CONCEPT

In view of the number of quite different applications, a generic coding system was envisioned. Depending on the application, three layers of the coding system with increasing complexity and performance can be used. A standard ISO decoder is able to decode bit-stream data encoded in any of the layers. There will also be standard ISO Layer X decoders, which are able to decode Layer X and all lower layers (X - n). Owing to the scaling technique used, the ISO/MPEG/Audio coding technique can handle a much higher dynamic range than Compact Disc or DAT, that is, conventional 16-bit PCM.

In all three layers the input PCM audio signal is converted from the time domain into the frequency domain by a polyphase filter bank consisting of 32 subbands [7]. In Layers I and II this filter bank creates 32 subband representations of the input audio stream, which are then quantized and coded under the control of a psychoacoustic model, from which a blockwise adaptive bit allocation is derived.

Layer I is a simplified version of the MUSICAM coding scheme, most appropriate for consumer applications such as digital home recording on tapes, Winchester disks, or magneto-optical disks, that is, for applications in which very low data rates are not mandatory.

Layer II achieves further compression with respect to Layer I by redundancy and irrelevance removal on the scale factors, and it uses more precise quantization. Layer II is nearly identical to the MUSICAM scheme [10], [11], except for the frame header, which was added to the MUSICAM frame during the ISO/MPEG/Audio development work. Layer II has numerous applications in both consumer and professional audio, such as audio broadcasting, television, recording, telecommunication, and multimedia [12].

Layer III combines the most effective modules of the ASPEC [13] and MUSICAM coding schemes. Additional frequency resolution is provided by a hybrid filter bank: every subband is further split into higher-resolution frequency lines by a linear transform that operates on 18 subband samples in each subband. In Layer III, nonuniform quantization, adaptive segmentation, and entropy coding of the quantized values are employed for better coding efficiency. This layer is most appropriate for telecommunication, in particular narrow-band ISDN, and for professional audio applications in which very low bit rates are weighted heavily.
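The gain in frequency resolution from the Layer III hybrid filter bank follows from simple arithmetic: 32 subbands, each split into 18 lines, give 576 lines, which is also the hop size of psychoacoustic model 2 described in Section 3.1.2. The sketch computes nominal, ideal bandwidths only; real filter responses overlap, so the figures are indicative.

```python
SUBBANDS = 32           # polyphase filter bank (from the text)
LINES_PER_SUBBAND = 18  # Layer III transform length per subband (from the text)


def layer3_resolution(sample_rate_hz: float):
    """Nominal bandwidths of one polyphase subband and one Layer III frequency line."""
    nyquist = sample_rate_hz / 2.0
    n_lines = SUBBANDS * LINES_PER_SUBBAND
    return n_lines, nyquist / SUBBANDS, nyquist / n_lines


if __name__ == "__main__":
    for fs in (32_000, 44_100, 48_000):
        n, sub_hz, line_hz = layer3_resolution(fs)
        print(f"{fs} Hz: {n} lines, subband {sub_hz:.1f} Hz, line {line_hz:.1f} Hz")
```

At 48 kHz a subband is 750 Hz wide, while each of the 576 lines covers about 41.7 Hz.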
Joint stereo coding can be added as an additional feature to any of the layers. This technique exploits the redundancy and irrelevance of typical stereophonic program material and can be used to increase the audio quality at low bit rates or to reduce the bit rate for stereophonic signals [14], [15]. The increase in encoder complexity is small, and negligible additional decoder complexity is required. Joint stereo coding does not increase the overall coding delay.

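The text does not say which joint stereo technique it has in mind. As one common illustration only, a sum/difference (mid/side) transform shows how the redundancy between highly correlated channels can be exploited: most of the energy moves into the mid signal, leaving a low-energy side signal that needs few bits. The 1/√2 scaling, an assumption of this sketch, keeps the transform orthonormal and exactly invertible.

```python
import numpy as np

SCALE = 1.0 / np.sqrt(2.0)


def ms_encode(left: np.ndarray, right: np.ndarray):
    """Orthonormal mid/side transform."""
    return SCALE * (left + right), SCALE * (left - right)


def ms_decode(mid: np.ndarray, side: np.ndarray):
    """Inverse of ms_encode."""
    return SCALE * (mid + side), SCALE * (mid - side)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(10_000)
    left = source
    right = 0.9 * source + 0.05 * rng.standard_normal(10_000)   # highly correlated pair
    mid, side = ms_encode(left, right)
    print("side/mid energy ratio:", np.sum(side ** 2) / np.sum(mid ** 2))
```

For uncorrelated channels the side signal is as strong as the mid signal, so an encoder would switch such a transform on only where it pays off.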
3.1 Psychoacoustic Models

The psychoacoustic model calculates the minimum masking threshold, which is needed to determine the just-noticeable noise level for each band in the filter bank. The difference between the maximum signal level and the minimum masking threshold is used in the bit or noise allocation to determine the actual quantizer level in each subband for each block. Two psychoacoustic models are given in the informative part of the standard. While both can be applied to any layer of the MPEG/Audio algorithm, in practice model 1 will be used for Layers I and II, and model 2 for Layer III. In both psychoacoustic models the final output is a signal-to-mask ratio for each subband (Layers I and II) or group of bands (Layer III). The psychoacoustic models are only necessary in the encoder, which allows decoders of significantly lower complexity. It is therefore possible to improve the performance of the encoder later, in terms of the ratio of bit rate to subjective quality. For some applications that do not demand a very low bit rate, it is even possible to use a very simple encoder without any psychoacoustic model.

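How a signal-to-mask ratio can drive the allocation is sketched below as a greedy loop, a common textbook approach rather than the normative procedure: each pass gives one more bit per sample to the subband whose noise-to-mask ratio (NMR = SMR - SNR) is currently worst, until all noise lies below the mask or the hypothetical `total_bits` budget runs out. The 6.02 dB-per-bit rule of thumb and the `max_bits` cap are assumptions; a real encoder would use the actual SNR of each available quantizer. The default of 12 samples per subband matches the Layer I block length.

```python
import numpy as np

DB_PER_BIT = 6.02  # rule-of-thumb SNR gain per quantizer bit (assumption)


def allocate_bits(smr_db, total_bits: int, samples_per_block: int = 12,
                  max_bits: int = 15) -> np.ndarray:
    """Greedy allocation of bits per sample to subbands, driven by SMR (dB)."""
    smr = np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(smr), dtype=int)
    remaining = total_bits
    while True:
        nmr = smr - DB_PER_BIT * bits          # noise-to-mask ratio per subband
        nmr[bits >= max_bits] = -np.inf        # subbands at the cap get no more bits
        band = int(np.argmax(nmr))
        if nmr[band] <= 0 or remaining < samples_per_block:
            break                              # noise masked everywhere, or budget spent
        bits[band] += 1                        # one more bit for each sample of the block
        remaining -= samples_per_block
    return bits


if __name__ == "__main__":
    print(allocate_bits([30.0, 10.0, -5.0], total_bits=10_000))
    print(allocate_bits([30.0, 10.0, -5.0], total_bits=24))
```

With an ample budget, the allocation stops as soon as every subband's noise is masked; with a tight one, the bits go to the subbands with the largest SMR first.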
3.1.1 Psychoacoustic Model 1

A high frequency resolution, that is, narrow subbands, in the lower frequency region and a lower resolution with wide subbands in the higher frequency region should be the basis for an adequate calculation of the masking thresholds in the frequency domain. This would lead to a tree structure for the filter bank. The polyphase filter network used for the subband filtering has a parallel structure, which does not provide subbands of different widths. Nevertheless, one major advantage of this filter bank is that it adapts the audio blocks optimally to the requirements of temporal masking effects and inaudible preechoes. The second major advantage is its small delay and complexity. To compensate for the limited accuracy of the filter bank's spectrum analysis, a 512-point fast Fourier transform (FFT) for Layer I and a 1024-point FFT for Layer II are used in parallel with the process of filtering the audio signal into 32 subbands [16]. The output of the FFT is used to determine the relevant tonal (that is, sinusoidal) and nontonal (that is, noise-like) maskers of the actual audio signal. It is well known from psychoacoustic research that the tonality of a masking component influences the masking threshold. For this reason it is worthwhile to discriminate between tonal and nontonal components. The individual masking threshold for each masker above the absolute masking threshold is calculated depending on frequency position, loudness level, and tonality. All the individual masking thresholds, including the absolute threshold, are added to form the so-called global masking threshold. For each subband the minimum value of this masking curve is determined. Finally the difference between the maximum signal level, calculated from both the scale factors and the power density spectrum of the FFT, and the minimum masking threshold is calculated for each subband and each block. The block length for Layer I is 12 subband samples, corresponding to 384 input audio PCM samples, and for Layer II it is 36 subband samples, corresponding to 1152 input audio PCM samples. This difference between maximum signal level and minimum masking threshold is called the signal-to-mask ratio (SMR) and is the relevant input function for the bit allocation.

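The final steps of model 1, combining the thresholds and forming the SMR, can be sketched as follows. The individual masking thresholds are taken as given rather than computed, the thresholds are assumed to add in the power (intensity) domain, and the example values and function names are hypothetical. The block lengths reproduce the 384 and 1152 PCM samples quoted above.

```python
import numpy as np

SUBBAND_SAMPLES = {"Layer I": 12, "Layer II": 36}   # block lengths from the text


def pcm_samples_per_block(layer: str, n_subbands: int = 32) -> int:
    """Each subband sample stands for n_subbands input PCM samples."""
    return SUBBAND_SAMPLES[layer] * n_subbands


def global_threshold_db(individual_db, absolute_db) -> np.ndarray:
    """Add individual thresholds (one row per masker) and the absolute threshold in power."""
    power = 10.0 ** (np.asarray(absolute_db) / 10.0)
    power = power + np.sum(10.0 ** (np.asarray(individual_db) / 10.0), axis=0)
    return 10.0 * np.log10(power)


def smr_db(level_db, threshold_db, band_of_line) -> np.ndarray:
    """SMR per subband: max signal level minus the minimum global threshold in that subband."""
    threshold_db = np.asarray(threshold_db)
    band_of_line = np.asarray(band_of_line)
    min_thr = np.array([threshold_db[band_of_line == b].min() for b in range(len(level_db))])
    return np.asarray(level_db) - min_thr


if __name__ == "__main__":
    thr = global_threshold_db([[60.0, 40.0, 20.0], [60.0, 20.0, 10.0]], [0.0, 0.0, 0.0])
    print(np.round(thr, 2))                      # global threshold per frequency line
    print(smr_db([80.0], thr, [0, 0, 0]))        # one subband covering all three lines
    print(pcm_samples_per_block("Layer I"), pcm_samples_per_block("Layer II"))
```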
3.1.2 Psychoacoustic Model 2

The frequency-domain representation of the data is calculated via an FFT with a window length of 1024 samples. The calculation is done every 576 samples, that is, synchronously with the hybrid filter bank. The separate calculation of the frequency-domain representation is necessary because the hybrid filter bank values cannot easily be used to obtain a magnitude-phase representation of the input sequence. The magnitude-phase representation is necessary to calculate the tonality of the current input block for every frequency component.

The tonality estimation uses a simple polynomial predictor, as described in [9]. The basic idea is to use the predictability of the signal as an indicator of tonality. The prediction is done in the magnitude-phase domain: the values stored from the last two blocks are used to predict the magnitude and phase of each frequency line for the current block. The Euclidean distance between estimated and actual values in the magnitude-phase domain is normalized to the maximum possible distance. The normalized value is called the “chaos measure” and can assume values between 0 (the rotating-phasor prediction had zero distance from the actual value) and 1 (the predicted value has the maximum distance from the actual value). A logarithmic mapping maps chaos measures from 0.5 down to 0.05 onto tonality values from 0 to 1.
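The chaos measure and the logarithmic mapping can be written down directly. This sketch assumes the simplest polynomial predictor, linear extrapolation of magnitude and phase from the two previous blocks, and takes the maximum possible distance between the actual and predicted phasors to be the sum of their magnitudes (the triangle-inequality bound). The mapping is derived here from the two endpoints named in the text, so a chaos measure of 0.5 gives tonality 0 and 0.05 gives tonality 1, with values outside that range clipped; the constants of the standard's own formulation may differ.

```python
import numpy as np


def chaos_measure(x_prev2: np.ndarray, x_prev1: np.ndarray, x_curr: np.ndarray) -> np.ndarray:
    """Normalized prediction error of each complex frequency line, in [0, 1]."""
    r_hat = 2.0 * np.abs(x_prev1) - np.abs(x_prev2)              # predicted magnitude
    phi_hat = 2.0 * np.angle(x_prev1) - np.angle(x_prev2)        # predicted phase (rotating phasor)
    x_hat = r_hat * np.exp(1j * phi_hat)
    distance = np.abs(x_curr - x_hat)
    max_distance = np.abs(x_curr) + np.abs(x_hat)                # triangle-inequality bound
    return np.divide(distance, max_distance,
                     out=np.zeros_like(distance), where=max_distance > 0)


def tonality(chaos) -> np.ndarray:
    """Log mapping: chaos 0.5 -> tonality 0, chaos 0.05 -> tonality 1, clipped to [0, 1]."""
    c = np.maximum(np.asarray(chaos, dtype=float), 1e-12)        # avoid log(0)
    return np.clip(np.log(c / 0.5) / np.log(0.05 / 0.5), 0.0, 1.0)


if __name__ == "__main__":
    t = np.arange(3)
    steady = 5.0 * np.exp(1j * 0.7 * t)    # constant magnitude, constant phase advance
    print(tonality(chaos_measure(steady[0:1], steady[1:2], steady[2:3])))
```

A stationary sinusoid is perfectly predictable, so its chaos measure is essentially zero and it is classified as fully tonal.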
The magnitude values of the frequency-domain representation are converted to a one-third-critical-band energy representation. A convolution of these values with the cochlea spreading function follows. The next step in the threshold estimation is the calculation of the just-masked noise level in the cochlea domain using the tonality index and the convolved spectrum. A correction for the dc gain of the convolution has to be applied. The last step to obtain the preliminary estimated threshold is the adjustment for the absolute threshold. As the sound pressure level of the final audio output is not known in advance, the absolute threshold is assumed to lie somewhat below the LSB level for frequencies around 4 kHz. A more detailed description of the estimation of the masking threshold using spreading convolution can be found in [17].
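The spreading, tonality-dependent offset, and dc-gain correction can be sketched on partition energies spaced 1/3 Bark apart. The specific formulas are assumptions taken from earlier literature, not from this text: the spreading function of Schroeder, Atal, and Hall (in dB versus Bark distance) and the tonality-weighted offset used in Johnston's perceptual-entropy work (14.5 + z dB for tonal maskers, 5.5 dB for noise-like ones). Dividing by the row sums of the spreading matrix is one simple way to remove the convolution's dc gain; the absolute-threshold step is omitted.

```python
import numpy as np


def spreading_db(dz) -> np.ndarray:
    """Schroeder-style spreading function; dz = maskee minus masker position in Bark."""
    u = np.asarray(dz, dtype=float) + 0.474
    return 15.81 + 7.5 * u - 17.5 * np.sqrt(1.0 + u ** 2)


def masked_threshold(energy, tonality, bark_step: float = 1.0 / 3.0) -> np.ndarray:
    """Preliminary threshold (without absolute threshold) for partitions bark_step apart."""
    energy = np.asarray(energy, dtype=float)
    z = np.arange(len(energy)) * bark_step
    sf = 10.0 ** (spreading_db(z[:, None] - z[None, :]) / 10.0)   # rows: maskee, cols: masker
    spread = (sf @ energy) / sf.sum(axis=1)                       # convolution, dc gain removed
    offset_db = tonality * (14.5 + z) + (1.0 - tonality) * 5.5    # tonal maskers mask less
    return spread * 10.0 ** (-offset_db / 10.0)


if __name__ == "__main__":
    energy = np.zeros(30)
    energy[10] = 1.0                                   # a single masker at about 3.3 Bark
    print(10 * np.log10(masked_threshold(energy, 0.0) + 1e-30)[5:16].round(1))
```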
The final step in the calculation of the threshold is preecho control. Preechoes are audible if the backward masking of the signal is not sufficient to mask the error signal, which was spread in time owing to the limited time resolution of the synthesis filter bank. This is only possible if there is a sudden increase in signal energy, at least in part of the signal bandwidth. From this a sufficient (but not necessary) condition for the absence of audible preechoes can be derived: the estimated masking threshold is restricted so that it does not exceed the preliminary estimated threshold of the last block. This condition on the final estimated threshold may reduce the estimated threshold by a large amount. To keep the actual quantization noise below this modified threshold, additional bits need to be available to the quantization and coding loop. Layer III contains an intelligent buffer management scheme (called bit reservoir) to make the additional bits available when needed. This technique was taken from OCF (see [18]).

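Both mechanisms in this paragraph reduce to a few lines. The threshold clamp follows the sufficient condition stated above. The reservoir is a toy model with a hypothetical fixed mean budget per frame and a hypothetical cap `max_level`; it ignores the bit-stream syntax a real Layer III stream uses to signal borrowed bits.

```python
import numpy as np


def preecho_control(prelim_thr: np.ndarray, prev_prelim_thr: np.ndarray) -> np.ndarray:
    """Final threshold may not exceed the previous block's preliminary threshold."""
    return np.minimum(prelim_thr, prev_prelim_thr)


class BitReservoir:
    """Toy bit reservoir: unused bits of easy frames are saved for demanding frames."""

    def __init__(self, mean_bits_per_frame: int, max_level: int):
        self.mean = mean_bits_per_frame
        self.max_level = max_level
        self.level = 0                                     # bits currently saved

    def grant(self, requested: int) -> int:
        available = self.mean + self.level
        used = min(requested, available)
        self.level = min(self.max_level, available - used)  # keep leftovers up to the cap
        return used


if __name__ == "__main__":
    # A quiet block followed by an attack: the clamp lowers the attack block's threshold.
    print(preecho_control(np.array([1e-4, 1e-2]), np.array([1e-3, 1e-3])))
    reservoir = BitReservoir(mean_bits_per_frame=100, max_level=150)
    print([reservoir.grant(r) for r in (40, 200, 10, 10)])
```

The frame requesting 200 bits receives 160: its own 100 plus the 60 left over by the preceding easy frame.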
`4 LAYER I AND LAYER ll CODiNG SCHEME
`
`Block diagrams of the Layer 1 and Layer II encoders
`are given in Fig. 2. The coding technique for these layers
`is based on a subband splitting of the input PCM audio
`signal by a polyphase analysis filter bank into 32 equally
`spaced subbands, a dynamic bit allocation derived from
`a psychoacoustic model, block companding of the sub-
`band samples, and the bit-stream formatting [10], [l 1],
`[19]. The individual steps of the encoding and decoding
`process are explained in detailed form in the following
`sections.
`
`4.1 Filter Bank
`
`The prototype QMF filter is of order 511. It is optimized in terms of spectral
`resolution and side-lobe rejection, which is better than 96 dB. This rejection is
`necessary for sufficient cancellation of aliasing distortions. The filter bank
`provides a reasonable tradeoff between temporal behavior on one side and spectral
`accuracy on the other. A time-frequency mapping providing a high number of subbands
`facilitates bit-rate reduction because the human ear perceives the audio
`information in the spectral domain with a resolution corresponding to the critical
`bands of the ear, or even lower. These critical bands have a width of about 100 Hz
`in the low-frequency region, that is, below 500 Hz, and a width of about 20% of the
`center frequency at higher frequencies. Unfortunately, the requirement of good
`spectral resolution conflicts with the necessity of keeping the transient impulse
`response, the so-called pre- and postecho, within certain limits in terms of
`temporal position and amplitude relative to the attack of a percussive sound.
`Knowledge of temporal masking behavior [20] indicates the temporal position and
`amplitude that the preecho generated by a time-frequency mapping may have if it is
`to be masked by the original attack; the preecho is normally much more critical
`than the postecho. In conjunction with the dual-synthesis filter bank located in
`the decoder, this filter technique provides a global transfer function optimized
`with respect to the perception of its impulse response.
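To relate the uniform subband layout to the critical-band widths quoted above, the following sketch encodes the rough rule stated in the text and compares it with the width of one of 32 equally spaced subbands. That rule gives about 100 Hz below 500 Hz and about 20% of the center frequency above. The piecewise rule is only the approximation given in this section, not a standardized critical-band formula, and the function names are hypothetical.

```python
def approx_critical_bandwidth(fc_hz: float) -> float:
    """Rule of thumb from the text: about 100 Hz below 500 Hz,
    about 20% of the center frequency above."""
    return 100.0 if fc_hz < 500.0 else 0.2 * fc_hz


def subband_width(fs_hz: float, n_bands: int = 32) -> float:
    """Width of each of n_bands equally spaced subbands covering 0..fs/2."""
    return fs_hz / 2.0 / n_bands


def critical_bands_per_subband(k: int, fs_hz: float, n_bands: int = 32) -> float:
    """Roughly how many critical bands subband k spans, evaluated at its center."""
    width = subband_width(fs_hz, n_bands)
    center = (k + 0.5) * width
    return width / approx_critical_bandwidth(center)
```

At a 48-kHz sampling rate each subband is 750 Hz wide. The lowest subband (center 375 Hz) therefore spans about 7.5 of these approximate critical bands. Subband 20 (center 15 375 Hz, approximate critical band 3075 Hz) is only about 0.24 of a critical band wide. So the uniform split is much coarser than the ear's resolution at low frequencies and finer than needed at high frequencies.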
`In the decoder, the dual-synthesis filter bank reconstructs a block of 32 output
`samples. The filter structure is extremely efficient for implementation in a
`low-complexity, non-DSP-based decoder and generally requires fewer than 80 integer
`multiplications or additions per PCM output sample. Moreover, the complete analysis
`and synthesis filter bank gives an overall time delay of only 10.5 ms at a 48-kHz
`sampling rate.
`
`J. Audio Eng. Soc., Vol. 42, No. 10, 1994 October
`
`PAPERS
`
`CODING OF HIGH-QUALITY DIGITAL AUDIO
`
`4.2 Determination and Coding of Scale Factors
`
`The calculation of the scale factor for each subband
`is performed for a block of 12 subband samples. The
`maximum of the absolute value of these 12 samples is
`determined and quantized with a word length of 6 bits,
`covering an overall dynamic range of 120 dB per subband
`with a resolution of 2 dB per scale factor class. In
`Layer I a scale factor is transmitted for each block and
`each subband that does not have a 0-bit allocation.
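The scale factor determination can be sketched as follows. The sketch assumes a table of 63 scale factor classes with entry i equal to 2^(1 - i/3), so that neighboring classes differ by about 2 dB, consistent with the resolution given above. The table, the selection rule (smallest tabulated value not below the block's peak magnitude), and the function names are assumptions for illustration rather than a quotation of the standard.

```python
import math
from typing import List, Sequence

# Assumed scale factor table: 63 classes, entry i = 2 ** (1 - i / 3).
# Neighboring entries differ by a factor 2 ** (1/3), i.e. about 2 dB.
SCALE_FACTORS: List[float] = [2.0 ** (1 - i / 3) for i in range(63)]

BLOCK_LEN = 12  # subband samples per scale factor block (from the text)


def scale_factor_index(block: Sequence[float]) -> int:
    """Index of the smallest table entry >= the block's peak magnitude."""
    if len(block) != BLOCK_LEN:
        raise ValueError("expected a block of 12 subband samples")
    peak = max(abs(x) for x in block)
    # The table decreases with i, so search from the smallest entry upward.
    for i in range(len(SCALE_FACTORS) - 1, -1, -1):
        if SCALE_FACTORS[i] >= peak:
            return i
    return 0  # peak above the largest entry: clip to the largest class


def normalize(block: Sequence[float], index: int) -> List[float]:
    """Block companding: divide every sample by the chosen scale factor."""
    sf = SCALE_FACTORS[index]
    return [x / sf for x in block]


def class_spacing_db() -> float:
    """Spacing between neighboring scale factor classes in dB."""
    return 20.0 * math.log10(SCALE_FACTORS[0] / SCALE_FACTORS[1])
```

A block whose largest magnitude is 0.9 is assigned class 3 (scale factor 1.0). After the block is divided by its scale factor, every normalized sample lies within plus or minus 1, which is the range the subsequent quantizer works on in this sketch.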
`
`Layer II uses additional coding to reduce the transmission
`rate for the scale factors. Because in Layer II a frame
`corresponds to 36 subband samples, that is, three times
`the length of a Layer I frame (see Fig. 3), three scale f
