IEEE JOURNAL ON
SELECTED AREAS IN
COMMUNICATIONS

JUNE 1992
VOLUME 10
NUMBER 5
ISACEM
(ISSN 0733-8716)

A PUBLICATION OF THE IEEE COMMUNICATIONS SOCIETY

SPEECH AND IMAGE CODING

Guest Editorial . . . N. Hubing
PAPERS

Signal Compression: Technology Targets and Research Directions (Invited Paper) . . . N. Jayant
An Objective Measure for Predicting Subjective Quality of Speech Coders (Invited Paper) . . . S. Wang, A. Sekey, and A. Gersho
A Low-Delay CELP Coder for the CCITT 16 kb/s Speech Coding Standard (Invited Paper) . . . J.-H. Chen, R. V. Cox, Y.-C. Lin, N. Jayant, and M. J. Melchner
A High-Quality Multirate Real-Time CELP Coder (Invited Paper) . . . P. Kroon and K. Swaminathan
Techniques for Improving the Performance of CELP-Type Speech Coders (Invited Paper) . . . I. A. Gerson and M. A. Jasiuk
Two-Channel Conjugate Vector Quantizer for Noisy Channel Speech Coding . . . T. Moriya
Weighted Optimum Bit Allocations to Orthogonal Transforms for Picture Coding . . . B. Macq
Image Coding with the Discrete Cosine-III Transform . . . O. K. Ersoy and A. Nouira
Unified Variable-Length Transform Coding and Image-Adaptive Vector Quantization . . . L. Wang, M. Goldberg, and S. Shlien
Adaptive Transform Tree Coding of Images . . . W. A. Pearlman, P. Jakatdar, and M. M. Leung
Spectral Entropy-Activity Classification in Adaptive Transform Coding . . . R. Mester and U. Franke
Shape-Gain Vector Quantization for Noisy Channels with Applications to Image Coding . . . J. Rosebrock and P. W. Besslich
Subband Image Coding Using Entropy-Coded Quantization over Noisy Channels . . . N. Tanabe and N. Farvardin
A Progressive Scheme for Digital Image Halftoning, Coding of Halftones, and Reconstruction . . . S. Kollias and D. Anastassiou
Single Bit-Map Block Truncation Coding of Color Images . . . Y. Wu and D. C. Coll
Interframe Hierarchical Address-Vector Quantization . . . N. M. Nasrabadi, C. Y. Chan, and J. U. Roy
A Fast Feature-Based Block Matching Algorithm Using Integral Projections . . . J.-S. Kim and R.-H. Park
A Transformation for the Calculation of Filter Pairs for Perfect Reconstruction in Subband Coding with Linequincunx Subsampling . . . J. De Lameillieure and G. Schamel
`RPX Exhibit 1134
`RPX v. DAE
CALL FOR PAPERS

Integrity of Public Telecommunication Networks

IEEE COMMUNICATIONS SOCIETY

`S. B. WEINSTEIN, Director of Publications
Bellcore, Rm. 2L-287
`445 South Street
`Morristown, NJ 07960-1910
`
IEEE COMMUNICATIONS SOCIETY
JOURNAL Editorial Board 1992

W. H. TRANTER, Editor-in-Chief
Dep. Elec. Eng.
Univ. Missouri-Rolla
Rolla, MO 65401-0249
tranter@ee.umr.edu
`
`EU; L. Mli:DO;:)A:li;:t, Associate Editor
Bellcore
445 South Street
Morristown, NJ 07960-1910
sue@thumper.bellcore.com
`
A. M. BUSH
National Science Foundation
1800 G Street, NW
Washington, DC 20550
`
`N. K. CHEUNG
`Bellcore, Rm. Nvc 32-219
`331 Newman Springs Rd.
`Red Bank, NJ 07701
`iijf.-_ HIAYE
Dep. Elec. Eng.
Concordia Univ.
1455 De Maisonneuve
Montreal, Quebec
Canada H3G 1M8
`
M. KAWASHIMA
`Fujitsu Inst. Comp. Sci.
`l-l't'-25 Shinkamata
Ota-ku
`Tokyo 144-00. Japan
`%eM’§“§,"°£"C°m M E,
Carleton Univ.
Colonel By Drive
Ottawa
Canada K1S 5B6
`NI.rt;LI%xBi1:icHUK
AT&T Bell Labs.
`600 Mountain Ave.
`Murray Hill, NJ 07974
`

Senior Editors
T. S. RAPPAPORT
The Bradley Dep. Elec. Eng.
615 Whittemore Hall
Virginia Polytech. Inst. & State Univ.
Blacksburg, VA 24061
`
D. L. TENNENHOUSE
M.I.T., Rm. NE43-538
545 Technology Square
Cambridge, MA 02139
`
`A. A. ROBIIOCR
Italtel
Via A. di Tocqueville 13
`20154 Milan, Italy
`W} D. SIlRCOS£t(3E28
Bellcore
445 South Street
Morristown, NJ 07960-1910
`
`IR. Yitriugosiii
Fujitsu Labs.
1015 Kamikodanaka, Nakahara-ku
Kawasaki 211, Japan
`
THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.
Officers

MERRILL W. BUCKLEY, JR., President
MARTHA SLOAN, President-Elect
KARSTEN E. DRANGEID, Secretary
THEODORE W. HISSEY, JR., Treasurer
ARVID G. LARSON, Vice President, Professional Activities
J. T. CAIN, Vice President, Publication Activities
LUIS T. GANDIA, Vice President, Regional Activities
MARCO W. MIGLIARO, Vice President, Standards Activities
FERNANDO ALDANA, Vice President, Technical Activities
EDWARD A. PARRISH, Vice President, Educational Activities
FREDERICK T. ANDREWS, JR., Director, Division III, Communications Technology Division
`
Headquarters Staff
ERIC HERZ, Executive Director and General Manager
THOMAS W. BARTLETT, Associate General Manager, Finance and Administration
WILLIAM D. CRAWLEY, Associate General Manager, Programs
JOHN H. POWERS, Associate General Manager, Volunteer Services

DONALD CHRISTIANSEN, Editor, IEEE Spectrum
IRVING ENGELSON, Staff Director, Technical Activities
LEO FANNING, Staff Director, Professional Activities
W. R. HABINGREITHER, Staff Director, Customer Service Center
PHYLLIS HALL, Staff Director, Publishing Services
MELVIN I. OLKEN, Staff Director, Field Services
EDWARD ROSENBERG, Controller
ANDREW G. SALEM, Staff Director, Standards
RUDOLF A. STAMPFL, Staff Director, Educational Activities
`
Publications Department
Publications Managers: ANN H. BURGMEYER, GAIL S. FERENC
Managing Editor: VALERIE CAMMARATA
Associate Editor: SHARYN L. PERRY
`
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS is published nine times a year in January, February, April, May, June, August, September, October, and December by The Institute of Electrical and Electronics Engineers, Inc. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society/Council, or its members. IEEE Headquarters: 345 East 47 Street, New York, NY 10017. NY Telephone: 212-705 + extension: Information -7900; General Manager -7910; Public Information -7867; Publishing Services -7560; Spectrum -7556. Telecopiers: NY (Headquarters) 212-752-4929; NY (Publications) 212-705-7632; NY Telex: 236-411 (international messages only). IEEE Service Center (for orders, subscriptions, address changes, Educational Activities, Region/Section/Student Services, Standards): 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. NJ Telephone: Information: 908-981-0060; 908-562 + extension: Controller -5365; Technical Activities -3900. IEEE Washington Office (for U.S. professional activities): 1828 L Street, NW, Suite 1202, Washington, DC 20036-5104. Washington Telephone: 202-785-0017. Price/Publication Information: Individual copies: IEEE members $10.00 (first copy only), nonmembers $20.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Member and nonmember subscription prices available on request. Available in microfiche and microfilm. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 29 Congress Street, Salem, MA 01970; 2) pre-1978 articles without fee. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publishing Services, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. Copyright © 1992 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Second-class postage paid at New York, NY, and at additional mailing offices. Postmaster: Send address changes to IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. GST Registration No. 125634188.
`
`
`
`
`
The field of interest of the IEEE Communications Society consists of all telecommunications including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; and communication theory.

In addition to the above, this JOURNAL or the IEEE TRANSACTIONS ON COMMUNICATIONS contains papers pertaining to analog and digital signal processing and modulation, audio and video encoding techniques, the theory and design of transmitters, receivers, and repeaters for communications via optical and sonic media, the design and analysis of computer communication systems, and the development of communication software. Contributions of theory enhancing the understanding of communication systems and techniques are included, as are discussions of the social implications of the development of communication technology. All members of the IEEE are eligible for membership in the Society upon payment of the annual Society membership fee of $15.00. Members may receive this JOURNAL or the IEEE TRANSACTIONS ON COMMUNICATIONS upon payment of an additional $15.00 ($30.00 total), or both publications upon payment of an additional $30.00 ($45.00 total). For information on joining, write to the IEEE at the address below. Member copies of Transactions/Journals are for personal use only.
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`

`
CHEN et al.: A LOW-DELAY CELP CODER    831
`
speech coding standards now already exist for specific applications. To avoid proliferation of different 16 kb/s standards and the potential difficulty of interworking between different standards, in June 1988 the CCITT decided to investigate the possibility of establishing a single 16 kb/s speech coding standard for universal applications. The CCITT's intended applications include videophone, cordless telephone, digital satellite systems, Digital Circuit Multiplication Equipment (DCME), the Public Switched Telephone Network (PSTN), the Integrated Services Digital Network (ISDN), digital leased lines, voice store-and-forward systems, voice messages for recorded announcements, land digital mobile radio, packetized speech, etc. [12].
Because of the variety of applications this 16 kb/s standard has to serve, the CCITT determined a stringent set of performance requirements and objectives for the candidate coding algorithms. Not every application would need every requirement to be met. Yet, to be accepted by the CCITT as a universal 16 kb/s standard, a candidate algorithm must meet all of the requirements. The major requirement was that the speech quality should be roughly comparable to that of G.721, while the one-way encoder/decoder delay should not exceed 5 ms (the objective was 2 ms) [12]. In other words, the CCITT was looking for a toll-quality, low-delay speech coder at 16 kb/s.
The CCITT specifies the speech quality requirements in terms of qdu, or quantization distortion units. By definition, one encoding with a 64 kb/s G.711 PCM codec introduces 1 qdu of distortion (CCITT Recommendation G.113). For asynchronous tandem connections, the qdu of individual speech coders are assumed to be additive. For example, N asynchronous tandeming stages of G.711 codecs should result in N qdu. Similarly, a single encoding of a 32 kb/s G.721 ADPCM codec is rated at 3.5 qdu; therefore, two ADPCM encodings should be rated at 7 qdu and four encodings at 14 qdu.
The CCITT performance requirements for the 16 kb/s standard specify that, for a clear channel (i.e., no bit errors), a candidate coder should produce 4 qdu or less for a single encoding and 14 qdu or less for three asynchronous tandeming stages. In effect, a candidate coder may be slightly worse than G.721 ADPCM for a single encoding, but three asynchronous encodings of the candidate coder should match the speech quality of four asynchronous encodings of G.721 ADPCM. For noisy channels, the CCITT required that, for random bit errors at a bit error rate (BER) of $10^{-3}$ or $10^{-4}$, a candidate coder should produce decoded speech quality no worse than that of G.721 ADPCM under the same conditions.
In addition, the coder should pass network signaling tones such as DTMF and CCITT Signaling Systems No. 5, 6, and 7. It was not so difficult to meet each of these quality requirements individually. However, in 1988, it was a major challenge to create a 16 kb/s coder that would meet all of these requirements simultaneously. Furthermore, the addition of the 5 ms low-delay requirement made such an attempt even more difficult.
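The qdu bookkeeping above is simple enough to state as code. The sketch below assumes, as the CCITT convention does, that qdu add linearly across asynchronous tandems; the function names are hypothetical. The requirement check deliberately takes the single-encoding score and the three-tandem score as two separately measured values, because a real coder need not follow the additive rule exactly.

```python
G711_QDU = 1.0  # one 64 kb/s G.711 PCM encoding (1 qdu by definition, G.113)
G721_QDU = 3.5  # one 32 kb/s G.721 ADPCM encoding


def tandem_qdu(qdu_per_stage: float, stages: int) -> float:
    """Predicted qdu for `stages` asynchronous tandem encodings,
    assuming the additive qdu rule."""
    return qdu_per_stage * stages


def meets_clear_channel_targets(qdu_single: float, qdu_three_tandem: float) -> bool:
    """Clear-channel 16 kb/s targets quoted in the text: <= 4 qdu for one
    encoding and <= 14 qdu for three asynchronous tandem encodings."""
    return qdu_single <= 4.0 and qdu_three_tandem <= 14.0


if __name__ == "__main__":
    print(tandem_qdu(G721_QDU, 2))  # 7.0
    print(tandem_qdu(G721_QDU, 4))  # 14.0, the three-tandem budget
    # An ideally additive 4-qdu coder would score 12 qdu after three tandems.
    print(meets_clear_channel_targets(4.0, tandem_qdu(4.0, 3)))  # True
```

A coder can satisfy the first condition and still fail the second; the Phase 1 LD-CELP coder discussed below was in exactly that position.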
`
Because of the low-delay requirement, none of the 16 kb/s coders mentioned above (CELP, MPLPC, APC, ATC, and SB-ADPCM) could be used in their current form. With all these well-established coders ruled out, the only hope seemed to be backward-adaptive predictive coders, which derive their predictor coefficients from previously quantized speech and thus do not need to buffer a large frame of speech samples. (The G.721 ADPCM coder belongs to this category.)
Prior to the CCITT's recent standardization effort at 16 kb/s, several researchers had already reported work on low-delay speech coding at 16 kb/s. Jayant and Ramamoorthy [13], [14] used an adaptive postfilter to enhance 16 kb/s ADPCM speech and achieved a mean opinion score (MOS) [15] of 3.5 at nearly zero coding delay. Cox et al. [16] combined SBC and vector quantization (VQ) and achieved an MOS of roughly 3.5 to 3.7 at a coding delay of 15 ms. Berouti et al. [17] reduced the coding delay of an MPLPC coder to 2 ms by reducing the frame size to 1 ms. However, the speech quality was equivalent to 5.5-bit log PCM, a significant degradation. Taniguchi et al. [18] developed an ADPCM coder with multiquantizers, where the best quantizer was selected once every 2.5 ms. The coding delay of their real-time prototype codec was 8.3 ms. With the help of postfiltering, the coder produced speech quality "nearly equivalent to a 7-bit μ-law PCM" [18], but this was achieved with a nonstandard 6.4 kHz sampling rate and a resulting nonstandard bandwidth of the speech signal. Gibson et al. [19], [20] studied backward-adaptive predictive tree coders and predictive trellis coders, which should have low coding delays. Unfortunately, they did not report the exact coding delay or the subjective speech quality. Iyengar and Kabal [21] also developed a backward-adaptive predictive tree coder with a 1 ms coding delay and a level of speech quality equivalent to 7-bit log PCM. Watts and Cuperman [22] proposed a vector generalization of ADPCM with a delay between 1 and 1.5 ms. They did not report the subjective speech quality of the coder.
Since 1988, when the CCITT announced its intention to standardize a 16 kb/s low-delay speech coder, there has been a great deal of research activity in the area of low-delay speech coding at 16 kb/s [23]-[38]. In response to the CCITT's standardization effort, we have created a 16 kb/s coder called low-delay CELP, or LD-CELP, which achieves high speech quality with a one-way coding delay of less than 2 ms [24], [29], [32], [33], [35], [38].
`
The LD-CELP coder is a predictive coder that combines: 1) high-order backward-adaptive linear prediction; 2) backward gain-adaptive vector quantization [39], [40] for the excitation; 3) the analysis-by-synthesis excitation codebook search of CELP; and 4) adaptive postfiltering [14], [41]. The low coding delay is achieved by using backward-adaptive prediction to avoid the long speech buffer required by forward-adaptive prediction, and by using a small excitation vector size of only five samples, or 0.625 ms (assuming the standard 8 kHz sampling rate).
`
`

`
832    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 10, NO. 5, JUNE 1992
`
With the processing delay and transmission delay also included, the total one-way coding delay can be less than 2 ms. This not only surpasses the CCITT delay requirement of 5 ms but actually meets the objective of 2 ms.
This LD-CELP coder was submitted by AT&T to the CCITT and has been the only candidate coder since 1989. It has been implemented in real-time hardware using the AT&T WE® DSP32C floating-point digital signal processor, and the resulting hardware prototype LD-CELP coder has been used in the official CCITT laboratory tests.
`
In the standardization process, there were two phases of laboratory testing. The first phase was conducted in late 1989 and early 1990, and the second in early 1991. The LD-CELP coder submitted for the first phase of testing (called the Phase 1 coder from here on) met all of the CCITT's performance requirements except the requirement for three asynchronous tandems. Based on the Phase 1 test results, the Speech Quality Experts Group (SQEG) of the CCITT indicated that the LD-CELP coder could be standardized for point-to-point applications, but not for networking applications where tandeming may occur, unless the coder could be improved to meet the tandeming performance requirement in the Phase 2 test.
`
From late 1990 to early 1991, we significantly improved the LD-CELP coder's tandeming performance and produced what we call the Phase 2 coder. The hardware prototype of the Phase 2 coder was then tested in the second phase of laboratory testing in 1991. From the Phase 2 test results, the SQEG concluded that the 16 kb/s LD-CELP coder "has a performance equivalent to or better than G.721" and "meets all the speech quality requirements set by Study Group XV and tested by Study Group XII" [42]. Therefore, "the SQEG recommends that the 16 kb/s LD-CELP codec can be standardized as a CCITT G Series Recommendation as regards to its speech quality" [42]. According to the current standardization schedule, this 16 kb/s LD-CELP coder is expected to be standardized by the first half of 1992.
In this paper, we describe the 16 kb/s LD-CELP coding algorithm, its implementation, and its performance. Section II introduces system concepts and provides an overview of the LD-CELP coder. Section III describes the LD-CELP coding algorithm. Section IV discusses implementation issues. Section V describes the subjective and objective performance, and Section VI gives some concluding remarks.
`
II. SYSTEM CONCEPTS AND OVERVIEW
`
In this section, we review the conventional CELP algorithm [1], then give an overview of the LD-CELP algorithm and point out the differences between conventional CELP and LD-CELP. Along the way, we also discuss the issue of coding delay.
`
A. Review of Conventional CELP
`
A typical example of a conventional CELP speech coder is shown in Fig. 1. The CELP coder is based on the "source-filter" speech production model [43], with the short-term synthesis filter modeling the vocal tract, and the excitation VQ, together with the long-term synthesis filter, modeling the glottal excitation. The CELP coder synthesizes speech by passing a gain-scaled excitation sequence through long-term and short-term synthesis filters. Both synthesis filters are all-pole filters containing either a long-term or a short-term predictor in a feedback loop. Basically, the CELP coder encodes speech frame-by-frame, and within each frame it attempts to find the predictors, gain, and excitation that minimize a perceptually weighted mean-squared error (MSE) between the input speech and the synthesized speech.
The long-term predictor is often referred to as the pitch predictor, because its main function is to exploit the pitch periodicity in voiced speech. Typically, a one-tap pitch predictor is used, in which case the predictor transfer function is

$$P_1(z) = \beta z^{-p} \tag{1}$$
`
where $p$ is the bulk delay or pitch period, and $\beta$ is the predictor tap. The short-term predictor is sometimes referred to as the LPC predictor, because it is also used in the well-known LPC (linear predictive coding) vocoders, which operate at 2.4 kb/s or below. The LPC predictor is typically a 10th-order predictor with a transfer function of

$$P_2(z) = \sum_{i=1}^{10} a_i z^{-i} \tag{2}$$

where $a_1$ through $a_{10}$ are the predictor coefficients. The excitation VQ codebook contains a table of codebook vectors (or codevectors) of equal length. The codevectors are typically populated by Gaussian random numbers, possibly with center clipping.
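To make (1) and (2) concrete, the sketch below passes a gain-scaled codevector through the long-term synthesis filter $1/(1 - P_1(z))$ and then the short-term synthesis filter $1/(1 - P_2(z))$, i.e., each predictor sits in a feedback loop as described above. It assumes zero initial filter memory and a small hypothetical Gaussian codebook with optional center clipping; a real coder carries filter state from vector to vector and uses trained or specified parameters. All function names and parameter values here are illustrative.

```python
import random


def make_codebook(size, dim, clip=0.0, seed=0):
    """Hypothetical Gaussian random codebook; entries with magnitude below
    `clip` are zeroed (center clipping)."""
    rng = random.Random(seed)
    book = []
    for _ in range(size):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        book.append([x if abs(x) >= clip else 0.0 for x in vec])
    return book


def celp_synthesize(excitation, gain, beta, pitch, lpc):
    """Filter gain * excitation through 1/(1 - beta z^-p), then through
    1/(1 - sum_i a_i z^-i). Zero initial filter states are assumed."""
    n = len(excitation)
    u = [gain * e for e in excitation]
    # Long-term (pitch) synthesis: v[k] = u[k] + beta * v[k - p]
    v = [0.0] * n
    for k in range(n):
        v[k] = u[k] + (beta * v[k - pitch] if k >= pitch else 0.0)
    # Short-term (LPC) synthesis: s[k] = v[k] + sum_i a_i * s[k - i]
    s = [0.0] * n
    for k in range(n):
        acc = v[k]
        for i, a in enumerate(lpc, start=1):
            if k >= i:
                acc += a * s[k - i]
        s[k] = acc
    return s


if __name__ == "__main__":
    book = make_codebook(size=128, dim=40, clip=0.5)
    # lpc=[1.2, -0.5] gives a stable 2nd-order filter (pole radius ~0.71).
    speech = celp_synthesize(book[17], gain=0.8, beta=0.6, pitch=20, lpc=[1.2, -0.5])
    print(len(speech))
```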
`_
In the actual encoding process, the encoder first buffers an input speech frame of about 20 ms and then performs linear predictive analysis [43] (or LPC analysis) on the buffered speech. The resulting LPC parameters are then quantized. The pitch predictor parameters, including the pitch period and the predictor tap, are then determined either in an open-loop fashion [1] or in a closed-loop fashion [44]. The quantized LPC parameters and pitch predictor parameters are both sent as side information to the decoder. This scheme is called forward-adaptive prediction.
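The text does not spell out how the LPC analysis is carried out. A common choice, assumed here purely for illustration, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch returns $a_1, \dots, a_M$ in the sign convention of (2), so the predicted sample is $\sum_i a_i s(n-i)$. Windowing (e.g., a Hamming window) and quantization of the coefficients are omitted, and the function names are hypothetical.

```python
def autocorrelation(frame, order):
    """r[lag] = sum_n frame[n] * frame[n - lag] for lag = 0..order."""
    n = len(frame)
    return [sum(frame[k] * frame[k - lag] for k in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_M of P(z) = sum_i a_i z^-i.

    Returns (coefficients, final prediction error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g., all-zero) input: stop early
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err  # reflection coefficient for stage m
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(frame, order=10):
    """Forward-adaptive LPC analysis of one buffered frame."""
    return levinson_durbin(autocorrelation(frame, order), order)[0]
```

For a first-order decaying exponential $0.9^n$, `lpc_analysis(frame, order=1)` recovers a coefficient very close to 0.9.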
`
The input speech frame is further subdivided into several equal-length subframes, or vectors, typically 4 to 8 ms long. Then, for each vector, the encoder passes each candidate codevector in the excitation VQ codebook through the gain scaling unit and the two synthesis filters, compares the corresponding filtered output vector with the input speech vector, and computes the associated perceptually weighted MSE distortion. The encoder repeats this process for all candidate excitation codevectors and then identifies the codevector that minimizes the perceptually weighted MSE distortion. This process is called a closed-loop search (it is sometimes also called an analysis-by-synthesis procedure). Vector quantization of the excitation using a closed-loop search is the main feature of CELP; it is also the main reason why CELP coders outperform other linear predictive coders such as APC and MPLPC.

Fig. 1. (a) Typical conventional CELP encoder. (b) Corresponding CELP decoder.
`
Theoretically, it is possible to jointly optimize the excitation codevector and the gain. In practice, however, almost all conventional CELP coders quantize the gain separately, after the closed-loop excitation VQ codebook search. There are at least two reasons for this. First, such sequential optimization has a much lower computational complexity than a joint optimization approach. Second, with the gain typically quantized to 5 b or so, the resolution in gain quantization is high enough that the performance difference between the two approaches is essentially negligible.
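The closed-loop search and the sequential gain quantization described above can be sketched as follows. For each codevector $c$, the search filters $c$ through a zero-state all-pole filter $H$ standing in for the synthesis filters, computes the optimal unquantized gain $g = \langle x, Hc\rangle / \langle Hc, Hc\rangle$ for the target vector $x$, and keeps the codevector with the smallest resulting error; only afterward is the gain quantized against a scalar table. The sketch assumes the target has already been perceptually weighted and had the filter-memory contribution removed, and the 5-bit gain table is hypothetical.

```python
def zero_state_filter(vec, lpc):
    """All-pole filter 1/(1 - sum_i a_i z^-i) with zero initial memory; a
    hypothetical stand-in for the cascaded (weighted) synthesis filters."""
    out = []
    for k, x in enumerate(vec):
        acc = x
        for i, a in enumerate(lpc, start=1):
            if k >= i:
                acc += a * out[k - i]
        out.append(acc)
    return out


def closed_loop_search(target, codebook, lpc):
    """Analysis-by-synthesis search: return (best index, optimal gain)."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_dist = None, 0.0, float("inf")
    for index, codevector in enumerate(codebook):
        y = zero_state_filter(codevector, lpc)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codevector cannot reduce the error
        # With the optimal gain g = corr / energy, ||x - g*y||^2 reduces to:
        dist = target_energy - corr * corr / energy
        if dist < best_dist:
            best_index, best_gain, best_dist = index, corr / energy, dist
    return best_index, best_gain


def quantize_gain(gain, table):
    """Sequential step: quantize the gain after the shape search."""
    return min(table, key=lambda q: abs(q - gain))


if __name__ == "__main__":
    # Hypothetical 5-bit table: 16 magnitudes, each with either sign.
    gain_table = [s * 0.05 * 1.25 ** i for s in (1.0, -1.0) for i in range(16)]
    codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    index, g = closed_loop_search([0.2, 1.4, 0.1], codebook, lpc=[0.9])
    print(index, g, quantize_gain(g, gain_table))
```

A joint search would instead evaluate every (codevector, quantized gain) pair, multiplying the number of distortion evaluations by the table size; the sequential version needs one pass over the codebook plus one table lookup.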
`
In conventional CELP, five different kinds of information are encoded and sent to the decoder: 1) the LPC parameters; 2) the pitch period; 3) the pitch predictor tap; 4) the excitation gain; and 5) the excitation "shape" codevectors. The decoder decodes this information and reproduces speech frame-by-frame by exciting the two cascaded synthesis filters with the scaled excitation vector sequence. Usually, a postfilter of the type proposed in [41] is used in a CELP decoder to enhance the perceptual quality of the decoded speech.
`
B. Coding Delay
`
In the Introduction, we mentioned that a conventional CELP coder typically has a one-way coding delay of 50 to 60 ms. The one-way coding delay is defined as the elapsed time from the instant a speech sample arrives at the encoder input to the instant when that same speech sample appears at the decoder output, excluding any delay added by other communication equipment (such as modems) between the encoder-decoder pair, as well as the signal propagation delay, which depends on distance. In other words, it is as if the encoder and the decoder were connected back-to-back by wires at the same physical location, without any equipment in between. This definition makes the coding delay dependent only on the coding algorithm, not on other equipment or communication distance.
By this definition, the coding delay of CELP coders can be roughly determined from the speech frame size used. The coding delay consists of three components: 1) algorithmic buffering delay; 2) processing delay; and 3) bit transmission delay. These delays are explained below. First, because of the forward-adaptive LPC analysis, the CELP encoder has to buffer one frame of speech samples before it can start encoding the first sample in that frame. Such buffering introduces at least one frame of buffering delay. Second, assume the real-time hardware is just fast enough to run the coder in real time (which is usually the case). Then it will take almost one frame of processing delay to encode and decode the buffered speech frame. Third, suppose the encoder does not start sending the bits corresponding to a given speech frame until the encoding of the entire frame is completed, and the decoder does not start decoding a speech frame until all of the bits of that frame have been received. Then one additional frame of bit transmission delay is introduced, because one frame of time elapses from the instant the first bit of the frame is sent to the instant the last bit of the frame is received, provided that the bit rate of the communication channel equals the bit rate of the speech coder. Hence, the total one-way coding delay of a CELP coder is roughly three frames.
Of course, the above delay analysis is oversimplified. In practice, it is possible to reduce the processing delay without using a faster processor. This can be achieved by sending bits out "on the fly" at the encoder as soon as certain bits become available, and by decoding bits "on the fly" at the decoder as soon as the received bits are sufficient to start decoding the first speech sample in the frame. (Some constraints on bit timing have to be carefully observed, of course.) A faster processor can further cut down the processing delay, and a faster communication channel (e.g., in multiplexed communication systems) can also reduce the bit transmission delay. On the other hand, for ease of implementation, a designer may sometimes choose a coding delay longer than three frames. Therefore, the actual coding delay is likely to vary between two and four frames, depending on the implementation and application. If we take 2.5 to 3 frames as the average, then a CELP coder with a 20 ms frame size is likely to have a coding delay of 50 to 60 ms. This long coding delay is mainly due to the large 20 ms frame buffer required by the forward adaptation of the LPC predictor coefficients.
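The simplified delay model can be written as one buffer of algorithmic delay plus processing and transmission delays measured in buffer lengths. In the sketch below, the function names are hypothetical, and splitting the "on-the-fly" savings as 0.75 buffer each for processing and transmission is an arbitrary illustrative choice that reproduces the 2.5-frame figure; the text itself only gives a range of two to four frames. The last line applies the same three-buffer rule to a five-sample vector at 8 kHz, which is the basis of the LD-CELP design discussed next.

```python
def one_way_delay_ms(buffer_ms, processing_units=1.0, transmission_units=1.0):
    """Simplified model from the text: one buffer of algorithmic delay, plus
    processing and bit-transmission delays, each given in buffer lengths."""
    return buffer_ms * (1.0 + processing_units + transmission_units)


def vector_ms(samples, fs_hz=8000):
    """Duration of a block of `samples` at sampling rate `fs_hz`, in ms."""
    return 1000.0 * samples / fs_hz


if __name__ == "__main__":
    print(one_way_delay_ms(20.0))              # 60.0: three 20 ms frames
    print(one_way_delay_ms(20.0, 0.75, 0.75))  # 50.0: 2.5 frames
    print(one_way_delay_ms(vector_ms(5)))      # 1.875: three 5-sample vectors
```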
`
C. Overview of LD-CELP
`
In LD-CELP, we reduce the coding delay by making the LPC predictor "backward-adaptive." (See [15, ch. 4, 6] for a comprehensive discussion of forward and backward adaptation.) This means that the predictor coefficients are derived not from the input speech samples yet to be coded, but from previously quantized speech samples. Since previously quantized speech samples are also available at the decoder, the decoder can derive the predictor coefficients using the same procedure as the encoder. Thus, there is no need to transmit any side information bits to specify the predictor coefficients. More importantly, with backward adaptation, there is no need to buffer about 20 ms of input speech for forward-adaptive LPC analysis. Therefore, the excitation vector (subframe) becomes the basic buffer unit, and the one-way coding delay becomes roughly three times the vector size (or vector dimension). Since the vector dimension is much smaller than 20 ms, the coding delay is greatly reduced.
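The key property of backward adaptation, namely that the decoder can rebuild the predictor from data it already has, can be demonstrated with a toy predictive coder. The sketch below is hypothetical and far simpler than LD-CELP: a first-order predictor recomputed every sample from the last few reconstructed samples, and a fixed-step scalar quantizer for the prediction residual (the `WINDOW` and `STEP` values are arbitrary). Only quantizer indices are transmitted, yet the decoder reproduces the encoder's reconstruction exactly, because both run the same adaptation rule on the same past data.

```python
WINDOW = 32   # hypothetical number of past reconstructed samples analyzed
STEP = 0.05   # hypothetical fixed step size of the residual quantizer


def backward_coefficient(history):
    """First-order predictor a = r(1)/r(0), computed only from past
    reconstructed samples (never from the input yet to be coded)."""
    h = history[-WINDOW:]
    r0 = sum(x * x for x in h)
    r1 = sum(h[k] * h[k - 1] for k in range(1, len(h)))
    return r1 / r0 if r0 > 0.0 else 0.0


def encode(signal):
    """Return the transmitted indices and the encoder's local reconstruction."""
    indices, recon = [], []
    for x in signal:
        a = backward_coefficient(recon)
        pred = a * recon[-1] if recon else 0.0
        idx = round((x - pred) / STEP)   # the only data sent to the decoder
        indices.append(idx)
        recon.append(pred + idx * STEP)  # same value the decoder will compute
    return indices, recon


def decode(indices):
    """Rebuild the signal from indices alone, re-deriving the predictor."""
    recon = []
    for idx in indices:
        a = backward_coefficient(recon)  # identical rule on identical data
        pred = a * recon[-1] if recon else 0.0
        recon.append(pred + idx * STEP)
    return recon


if __name__ == "__main__":
    sig = [((n % 20) - 10) / 10.0 for n in range(200)]
    idx, enc_out = encode(sig)
    print(decode(idx) == enc_out)  # True: no predictor side information needed
```

The encoder must track the decoder's reconstruction rather than the clean input; adapting on the clean input would let the two predictors drift apart.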
From the very beginning, our goal was to surpass the 5 ms delay requirement and to achieve the delay objective of 2 ms or less. The following simple analysis shows that we also needed to make the excitation gain backward-adaptive in order to achieve this goal. With the standard 8 kHz sampling rate, a delay of 2 ms corresponds to 16 samples. Since the coding delay is roughly three times the vector dimension, to achieve a one-way delay of 16 samples or less, the largest vector dimension we can use is five samples (0.625 ms). With a vector dimension of five samples and a bit rate of two bits/sample, we have only 10 bits to encode both the excitation gain and the excitation shape codevector. The excitation gain typically requires four to five bits of resolution in conventional CELP. In our case, however, it would be highly inefficient to spend 40-50% of the total bit rate encoding the slowly varying and somewhat redundant gain term. To achieve better coding efficiency and speech quality under the 2 ms delay constraint, we use backward gain-adaptive vector quantization [40] for the excitation. By making the excitation gain backward-adaptive, we derive the excitation gain from the gain information embedded in previously quantized excitation; no bits need to be sent to specify the excitation gain, since the decoder can derive the same gain in the same manner.
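The delay and bit-budget arithmetic above, together with the idea of backward gain adaptation, can be sketched as follows. The budget figures (8 kHz sampling, a 2 ms target, a delay of about three vectors, 2 bits/sample) come from the text; everything else is a hypothetical simplification: the log-gain predictor uses fixed coefficients (LOG_GAIN_COEFFS) in place of an adaptive predictor, the shape search compares the scaled codevector with a target vector directly instead of going through synthesis and perceptual weighting filters, and all 10 bits index one random shape codebook. The point is only that the gain is predicted from the log-gains of previously quantized, scaled excitation vectors, so the encoder and decoder derive the same gain without any gain bits.

```python
import math
import random

FS_HZ = 8000          # standard sampling rate from the text
BITS_PER_SAMPLE = 2   # 16 kb/s at 8 kHz


def max_vector_dim(delay_ms, buffer_factor=3):
    """Largest dimension whose ~buffer_factor-vector delay fits within delay_ms."""
    samples = delay_ms * FS_HZ / 1000      # 2 ms -> 16 samples
    return int(samples // buffer_factor)   # 16 // 3 -> 5


DIM = max_vector_dim(2.0)                  # 5 samples = 0.625 ms
BITS_PER_VECTOR = DIM * BITS_PER_SAMPLE    # 10 bits per vector
ONE_WAY_DELAY_MS = 3 * DIM * 1000 / FS_HZ  # roughly three vectors of buffering

# Hypothetical fixed log-gain predictor (a stand-in for an adaptive one).
LOG_GAIN_COEFFS = [0.6, 0.25]


def predicted_gain(log_gains):
    """Backward gain: predict the next log-gain (dB) from past scaled excitation."""
    pred_db = sum(c * log_gains[-1 - i]
                  for i, c in enumerate(LOG_GAIN_COEFFS) if i < len(log_gains))
    return 10.0 ** (pred_db / 20.0)


def rms_db(v):
    return 20.0 * math.log10(max(math.sqrt(sum(s * s for s in v) / len(v)), 1e-6))


def encode_vector(target, codebook, log_gains):
    g = predicted_gain(log_gains)  # no gain bits are transmitted
    index = min(range(len(codebook)),
                key=lambda j: sum((t - g * c) ** 2 for t, c in zip(target, codebook[j])))
    excitation = [g * c for c in codebook[index]]
    log_gains.append(rms_db(excitation))  # memory holds quantized data only
    return index, excitation


def decode_vector(index, codebook, log_gains):
    g = predicted_gain(log_gains)  # same rule, same history as the encoder
    excitation = [g * c for c in codebook[index]]
    log_gains.append(rms_db(excitation))
    return excitation


rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(DIM)]
            for _ in range(2 ** BITS_PER_VECTOR)]
enc_hist, dec_hist = [], []
for v in range(20):
    target = [3.0 * math.sin(0.2 * (DIM * v + n)) for n in range(DIM)]
    index, enc_exc = encode_vector(target, codebook, enc_hist)
    assert decode_vector(index, codebook, dec_hist) == enc_exc
```

Only the 10-bit index per 5-sample vector is sent; the gain memory on each side is rebuilt from the same quantized excitation, which is what removes the 4 to 5 gain bits from the budget.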
CHEN et al.: A LOW-DELAY CELP CODER

[Fig. 2 block diagram; recoverable labels: excitation VQ codebook, perceptual weighting filter, backward gain adaptation, VQ index (decoder input), adaptive postfilter, postfiltered output.]
Fig. 2. (a) 16 kb/s low-delay CELP encoder. (b) 16 kb/s low-delay CELP decoder.

A simplified block diagram of the 16 kb/s LD-CELP coder is shown in Fig. 2. In this coder, the excitation vector has a dimension of five samples, so the one-way coding delay is less than 2 ms. The pitch predictor of conventional CELP is eliminated, and a 50th-order LPC predictor is used instead. The LPC predictor coefficients are updated once every four speech vectors (20 samples, or 2.5 ms) by performing LPC analysis on previously quantized speech. The excitation gain is updated once every vector by using a 10th-order adaptive linear predictor in the logarithmic gain domain. The coefficients of this log-gain predictor are updated once every four vectors by performing linear predictive analysis on the logarithmic gains of previously quantized and scaled excitation vectors. The 10th-order perceptual weighting filter is also updated once every four vectors, but it is updated by performing an LPC analysis on the input speech rather than on quantized speech. To reduce the codebook search complexity, the 10-bit excitation VQ codebook is a product code
