IEEE JOURNAL ON
SELECTED AREAS IN
COMMUNICATIONS
(ISSN 0733-8716)

JUNE 1992    VOLUME 10    NUMBER 5    ISACEM

A PUBLICATION OF THE IEEE COMMUNICATIONS SOCIETY

SPEECH AND IMAGE CODING
Guest Editor: N. Hubing

`
`PAPERS
`
`.
`
`‘
`
`. .N. Joyrmr
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`_
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`' Signal Compression: Technology Targets and Research Directions [Invited Paper) .
`.5’. Wang, A. Sekey. and A. Gersho
`.
`.
`.
`.
`Objective Measure for Predicting Subjective Quality of Speech Coders [hioirerl Paper)
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`_
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`—-A Low-Delay CELP Coder for the CCITT I6 khfs Speech Coding Standard (hmfted Paper) .
`,
`.
`. . .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. .J’.-H. Chen, R. V. Cox, }’.~C. Lin, N. Jayanr, and M. J. Mefdiiter
`A High-Quality Muitirate Real-Time CELP Coder (Invited Paper) .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. P. Kroc.-n and K. Slt’tImiJ'lt'i'f.hflH
`Techniques for Improving the Performance of CELP—Typc Speech Coders [.-’noi'red Paper) .
`.
`.
`.
`.
`. I. A. Gerson arm‘ M. A. Ja.rimt'
`1 Two-Channel Conjugate Vector Quantizer for Noisy Channel Speech Coding .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. T. Mariya
`‘Weighted Optimum Bit Allocations to Orthogonal Transforms for Picture Coding .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. .8. Macq
`_ Image Coding with the Discrete Cosine-III Transform .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. 0. K. Ersoy and A. Notziro
`-.;Unified Variable-Length Transform Coding and Image-Adaptit-e Vector Quantization .
`.
`. L. Wang, M. Goldberg, rm.1’.S‘. S.-‘Him:
`I Adaptive Transform Tree Coding of images .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. W. /‘t. Pearlmon, P. Jakotdnr, and M. M. Lezmg
`I Spectral Entropy-Activity Classification in Adaptive Transform Coding .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. R. Master and U. F:-amlce
`[Shape-Gain Vector Quantization for Noisy Channels with Applications to Image Coding .
`.
`.
`.
`. J. Rosebrork and P. W. Be.r.r.-‘fair
`_ Stlbband Image Coding Using Entropy-Coded Quantization over Noisy Channels .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. N. Tmiabc and N. Fnruardfn
`"__;A Progressive Scheme for Digital Image Halfloning, Coding of Halftoncs. and Reconstruction .
`.
`. S. Koh’i‘a.c and D. Ana.rra.r.r:‘on
`‘Single Bit-Map Block Truncation Coding of Color Images .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. Y. Wu and D. C‘. Col!
`" ;I,I_1terframe Hierarchical Address-Vector Quantization .
`.
`.
`. ., .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. N. M. .Nnr.rrnbadr'. C‘. Y. Choc. and J. U. Roy
`-A Fast Feature-Based Block Matching Algorithm Using Integral Projections . .' .
`_
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. J.-.S'. Kim and R.-H. Park
`PL Transformation for the Calculation of Filter Pairs for Perfect Reconstruction in Subbancl Coding with Linequincumt
`Subsampiing .
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`. J. De Lrm-:ei'Hicm'e and G. Schamel
`
`,
`.
[illegible] PAPER

Integrity of Public Telecommunication Networks

RPX Exhibit 1034
RPX v. DAE

IEEE COMMUNICATIONS SOCIETY

The field of interest of the IEEE Communications Society consists of all telecommunications including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; and communication theory.

In addition to the above, this JOURNAL or the IEEE TRANSACTIONS ON COMMUNICATIONS contains papers pertaining to analog and digital signal processing and modulation, audio and video encoding techniques, the theory and design of transmitters, receivers, and repeaters for communications via optical and sonic media, the design and analysis of computer communication systems, and the development of communication software. Contributions of theory enhancing the understanding of communication systems and techniques are included, as are discussions of the social implications of the development of communication technology. All members of the IEEE are eligible for membership in the Society upon payment of the annual Society membership fee of $15.00. Members may receive this JOURNAL or the IEEE TRANSACTIONS ON COMMUNICATIONS upon payment of an additional $15.00 ($30.00 total) or both publications upon payment of an additional $30.00 ($45.00 total). For information on joining, write to the IEEE at the address below. Member copies of Transactions/Journals are for personal use only.

IEEE COMMUNICATIONS SOCIETY
JOURNAL Editorial Board 1992

S. B. WEINSTEIN, Director of Publications
Bellcore, Rm. 2L-287
445 South Street
Morristown, NJ 07960-1910

W. H. TRANTER, Editor-in-Chief
Dep. Elec. Eng.
Univ. Missouri-Rolla
Rolla, MO 65401-0249
tranter@ee.umr.edu

SUE L. MCDONALD, Associate Editor
Bellcore, Rm. 2P-291
445 South Street
Morristown, NJ 07960-1910
sue@thumper.bellcore.com

Senior Editors

A. M. BUSH
National Science Foundation
1800 G Street, NW
Washington, DC 20550

N. K. CHEUNG
Bellcore, Rm. NVC [illegible]
331 Newman Springs Rd.
[city illegible]

J. F. HAYES
Dep. Elec. Eng.
Concordia Univ.
1455 De Maisonneuve
Montreal, Quebec
Canada H3G 1M8

M. KAWASHIMA
Fujitsu Inst. Comp. Sci.
1-17-25 Shinkamata
Ota-ku
Tokyo 144, Japan

[name illegible]
Dep. Syst. Comput. Eng.
[address illegible]

N. MAXEMCHUK
AT&T Bell Labs.
600 Mountain Ave.
Murray Hill, NJ 07974

T. S. RAPPAPORT
The Bradley Dep. Elec. Eng.
615 Whittemore Hall
Virginia Polytech. Inst. & State Univ.
Blacksburg, VA

A. A. ROBROCK
[address partly illegible]
20154 Milan, Italy

W. D. SINCOSKIE
Bellcore, Rm. 2Q-286
445 South Street
Morristown, NJ 07960-1910

D. L. TENNENHOUSE
M.I.T., Rm. NE43-533
545 Technology Square
Cambridge, MA 02139

R. YATSUBOSHI
Fujitsu Labs.
1015 Kamikodanaka, Nakahara-ku
Kawasaki 211, Japan

THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.
Officers

MERRILL W. BUCKLEY, JR., President
MARTHA SLOAN, President-Elect
KARSTEN E. DRANGEID, Secretary
THEODORE W. HISSEY, JR., Treasurer
ARVID G. LARSON, Vice President, Professional Activities
J. T. CAIN, Vice President, Publication Activities
LUIS T. GANDIA, Vice President, Regional Activities
MARCO W. MIGLIARO, Vice President, Standards Activities
FERNANDO ALDANA, Vice President, Technical Activities
EDWARD A. PARRISH, Vice President, Educational Activities
FREDERICK T. ANDREWS, JR., Director, Division III, Communications Technology Division

Headquarters Staff
ERIC HERZ, Executive Director and General Manager
THOMAS W. BARTLETT, Associate General Manager, Finance and Administration
WILLIAM D. CRAWLEY, Associate General Manager, Programs
JOHN H. POWERS, Associate General Manager, Volunteer Services

DONALD CHRISTIANSEN, Editor, IEEE Spectrum
IRVING ENGELSON, Staff Director, Technical Activities
LEO FANNING, Staff Director, Professional Activities
W. R. HABINGREITHER, Staff Director, Customer Service Center
PHYLLIS HALL, Staff Director, Publishing Services
MELVIN I. OLKEN, Staff Director, Field Services
EDWARD ROSENBERG, Controller
ANDREW G. SALEM, Staff Director, Standards
RUDOLF A. STAMPFL, Staff Director, Educational Activities

Publications Department
Publications Managers: ANN H. BURGMEYER, GAIL S. FERENC
Managing Editor: VALERIE CAMMARATA
Associate Editor: SHARYN L. PERRY

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS is published nine times a year in January, February, April, May, June, August, September, October, and December by The Institute of Electrical and Electronics Engineers, Inc. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society/Council, or its members. IEEE Headquarters: 345 East 47 Street, New York, NY 10017. NY Telephone: 212-705 + extension: Information -7900; General Manager -7910; Public Information -7867; Publishing Services -7560; Spectrum -7556. Telecopiers: NY (Headquarters) 212-752-4929; NY (Publications) 212-705-76[illegible]; NY Telex: 236-411 (international messages only). IEEE Service Center (for orders, subscriptions, address changes, Educational Activities, Region/Section/Student Services, Standards): 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. NJ Telephone: Information: 908-981-0060; 908-562 + extension: Controller -5365; Technical Activities -3900. IEEE Washington Office (for U.S. professional activities): 1828 L Street, NW, Suite 1202, Washington, DC 20036-5104. Washington Telephone: 202-785-0017. Price/Publication Information: Individual copies: IEEE members $10.00 (first copy only), nonmembers $20.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Member and nonmember subscription prices available on request. Available in microfiche and microfilm. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of the U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 29 Congress Street, Salem, MA 01970; 2) pre-1978 articles without fee. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publishing Services, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. Copyright © 1992 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Second-class postage paid at New York, NY, and at additional mailing offices. Postmaster: Send address changes to IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. GST Registration No. 125634188.

This material may be protected by Copyright law (Title 17 U.S. Code)

CHEN et al.: A LOW-DELAY CELP CODER

831

speech coding standards already exist for specific applications. To avoid a proliferation of different 16 kb/s standards and the potential difficulty of interworking between them, in June 1988 the CCITT decided to investigate the possibility of establishing a single 16 kb/s speech coding standard for universal applications. The CCITT's intended applications include videophone, cordless telephone, digital satellite systems, Digital Circuit Multiplication Equipment (DCME), the Public Switched Telephone Network (PSTN), the Integrated Services Digital Network (ISDN), digital leased lines, voice store-and-forward systems, voice messages for recorded announcements, land digital mobile radio, packetized speech, etc. [12].
Because of the variety of applications this 16 kb/s standard has to serve, the CCITT determined a stringent set of performance requirements and objectives for the candidate coding algorithms. Not every application needs every requirement to be met. Yet, to be accepted by the CCITT as a universal 16 kb/s standard, a candidate algorithm must meet all of the requirements. The major requirement was that the speech quality should be roughly comparable to that of G.721, while the one-way encoder/decoder delay should not exceed 5 ms (the objective was 2 ms or less) [12]. In other words, the CCITT was looking for a toll-quality, low-delay speech coder at 16 kb/s.
The CCITT specifies the speech quality requirements in terms of qdu, or quantization distortion units. By definition, one encoding with a 64 kb/s G.711 PCM codec introduces 1 qdu of distortion (CCITT Recommendation G.113). For asynchronous tandem connections, the qdu of individual speech coders are assumed to be additive. For example, N asynchronous tandeming stages of G.711 codecs should result in N qdu. As another example, a single encoding with a 32 kb/s G.721 ADPCM codec is rated at 3.5 qdu; therefore, two ADPCM encodings should be rated at 7 qdu and four encodings at 14 qdu.

The CCITT performance requirements for the 16 kb/s standard specify that, for a clear channel (i.e., no bit errors), a candidate coder should produce 4 qdu or less for a single encoding and 14 qdu or less for three asynchronous tandeming stages. In effect, a candidate coder may be slightly worse than G.721 ADPCM for a single encoding, but three asynchronous encodings of the candidate coder should match the speech quality of four asynchronous encodings of G.721 ADPCM. For noisy channels, the CCITT required that, for random bit errors at a bit error rate (BER) of $10^{-4}$ or $10^{-2}$, a candidate coder should produce decoded speech quality no worse than that of G.721 ADPCM under the same conditions. In addition, the coder should pass network signaling tones such as DTMF and CCITT Signaling Systems No. 5, 6, and 7.

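The qdu bookkeeping above lends itself to a short worked check. The Python sketch below is only an illustration: it assumes, as stated above, that qdu add linearly across asynchronous tandem stages, and it hard-codes the two clear-channel thresholds from the text (4 qdu for one encoding, 14 qdu for three tandems). The function names are hypothetical.

```python
G711_QDU = 1.0  # one 64 kb/s G.711 PCM encoding (1 qdu by definition)
G721_QDU = 3.5  # one 32 kb/s G.721 ADPCM encoding


def tandem_qdu(per_stage_qdu: float, stages: int) -> float:
    """Total qdu for `stages` asynchronous tandem encodings, assuming additivity."""
    return per_stage_qdu * stages


def meets_clear_channel_targets(single_qdu: float, three_tandem_qdu: float) -> bool:
    """16 kb/s clear-channel targets: <= 4 qdu for one encoding, <= 14 qdu for three tandems."""
    return single_qdu <= 4.0 and three_tandem_qdu <= 14.0


if __name__ == "__main__":
    print(tandem_qdu(G721_QDU, 2))  # 7.0: two G.721 encodings
    print(tandem_qdu(G721_QDU, 4))  # 14.0: four G.721 encodings, the three-tandem target
    # Per-encoding budget implied by the tandem target if qdu were strictly additive:
    print(14.0 / 3)                 # about 4.67 qdu per stage
```

Under the additivity assumption, the three-tandem target therefore allows roughly 4.67 qdu per encoding, slightly more than the 4 qdu single-encoding limit.
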
It was not so difficult to meet each of these quality requirements individually. However, in 1988 it was a major challenge to create a 16 kb/s coder that would meet all of these requirements simultaneously. Furthermore, the addition of the 5 ms low-delay requirement made such an attempt even more difficult.

Because of the low-delay requirement, none of the 16 kb/s coders mentioned above (CELP, MPLPC, APC, ATC, and SBC-ADPCM) could be used in their current form. With all these well-established coders ruled out, the only hope seemed to be backward-adaptive predictive coders, which derive their predictor coefficients from previously quantized speech and thus do not need to buffer a large frame of speech samples. (The G.721 ADPCM coder belongs to this category.)
Prior to the CCITT's recent standardization effort at 16 kb/s, several researchers had reported work on low-delay speech coding at 16 kb/s. Jayant and Ramamoorthy [13], [14] used an adaptive postfilter to enhance 16 kb/s ADPCM speech and achieved a mean opinion score (MOS) [15] of 3.5 at nearly zero coding delay. Cox et al. [16] combined SBC and vector quantization (VQ) and achieved an MOS of roughly 3.5-3.3 at a coding delay of 15 ms. Berouti et al. [17] reduced the coding delay of an MPLPC coder to 2 ms by reducing the frame size to 1 ms; however, the speech quality was equivalent to 5.5-bit log PCM, a significant degradation. Taniguchi et al. [18] developed an ADPCM coder with multiquantizers in which the best quantizer was selected once every 2.5 ms. The coding delay of their real-time prototype codec was 8.3 ms. With the help of postfiltering, the coder produced speech quality "nearly equivalent to a 7-bit µ-law PCM" [18], but this was achieved with a nonstandard 6.4 kHz sampling rate and a resulting nonstandard speech bandwidth. Gibson et al. [19], [20] studied backward-adaptive predictive tree coders and predictive trellis coders, which should have low coding delays. Unfortunately, they did not report the exact coding delay or the subjective speech quality. Iyengar and Kabal [21] also developed a backward-adaptive predictive tree coder with a 1 ms coding delay and speech quality equivalent to 7-bit log PCM. Watts and Cuperman [22] proposed a vector generalization of ADPCM with a delay between 1 and 1.5 ms, but did not report the subjective speech quality of the coder.
Since 1988, when the CCITT announced its intention to standardize a 16 kb/s low-delay speech coder, there has been a great deal of research activity in low-delay speech coding at 16 kb/s [23]-[38]. In response to the CCITT's standardization effort, we have created a 16 kb/s coder called low-delay CELP, or LD-CELP, which achieves high speech quality with a one-way coding delay of less than 2 ms [24], [29], [32], [33], [35], [38].

The LD-CELP coder is a predictive coder that combines: 1) high-order backward-adaptive linear prediction; 2) backward gain-adaptive vector quantization [39], [40] for the excitation; 3) the analysis-by-synthesis excitation codebook search of CELP; and 4) adaptive postfiltering [14], [41]. The low coding delay is achieved by using backward-adaptive prediction, which avoids the long speech buffer required by forward-adaptive prediction, and by using a small excitation vector of only five samples, or 0.625 ms (assuming the standard 8 kHz sampling rate).

832

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 10, NO. 5, JUNE 1992

With the processing delay and transmission delay also included, the total one-way coding delay can be less than 2 ms. This not only surpasses the CCITT delay requirement of 5 ms but actually meets the objective of 2 ms. This LD-CELP coder was submitted by AT&T to the CCITT and has been the only candidate coder since 1989. It has been implemented in real-time hardware using the AT&T WE® DSP32C floating-point digital signal processor, and the resulting hardware prototype LD-CELP coder has been used in the official CCITT laboratory tests.
In the standardization process, there were two phases of laboratory testing. The first phase was conducted in late 1989 and early 1990, and the second phase in early 1991. The LD-CELP coder submitted for the first phase of testing (called the Phase 1 coder from here on) met all of the CCITT's performance requirements except the requirement for three asynchronous tandems. Based on the Phase 1 test results, the Speech Quality Experts Group (SQEG) of the CCITT indicated that the LD-CELP coder could be standardized for point-to-point applications, but not for networking applications where tandeming may occur, unless the coder could be improved to meet the tandeming performance requirement in the Phase 2 test.

From late 1990 to early 1991, we improved the LD-CELP coder's tandeming performance significantly and produced what we called the Phase 2 coder. The hardware prototype Phase 2 coder was then tested in the second phase of laboratory testing in 1991. From the Phase 2 test results, the SQEG concluded that the 16 kb/s LD-CELP coder "has a performance equivalent to or better than G.721" and "meets all the speech quality requirements set by Study Group XV and tested by Study Group XII" [42]. Therefore, "the SQEG recommends that the 16 kb/s LD-CELP codec can be standardized as a CCITT G Series Recommendation as regards to its speech quality" [42]. According to the current standardization schedule, this 16 kb/s LD-CELP coder is expected to be standardized by the first half of 1992.
In this paper, we describe the 16 kb/s LD-CELP coding algorithm, its implementation, and its performance. Section II introduces system concepts and provides an overview of the LD-CELP coder. Section III describes the LD-CELP coding algorithm. Section IV discusses implementation issues. Section V describes the subjective and objective performance, and Section VI gives some concluding remarks.

II. SYSTEM CONCEPTS AND OVERVIEW

In this section, we review the conventional CELP algorithm [1], give an overview of the LD-CELP algorithm, and point out the differences between conventional CELP and LD-CELP. Along the way, we also discuss the issue of coding delay.
A. Review of Conventional CELP

A typical example of a conventional CELP speech coder is shown in Fig. 1. The CELP coder is based on the "source-filter" speech production model [43], with the short-term synthesis filter modeling the vocal tract, and the excitation VQ, together with the long-term synthesis filter, modeling the glottal excitation. The CELP coder synthesizes speech by passing a gain-scaled excitation sequence through long-term and short-term synthesis filters. Both synthesis filters are all-pole filters containing either a long-term or a short-term predictor in a feedback loop. Basically, the CELP coder encodes speech frame-by-frame, and within each frame it attempts to find the best predictors, gain, and excitation such that a perceptually weighted mean-squared error (MSE) between the input speech and the synthesized speech is minimized.
The long-term predictor is often referred to as the pitch predictor, because its main function is to exploit the pitch periodicity in voiced speech. Typically, a one-tap pitch predictor is used, in which case the predictor transfer function is

$$P_1(z) = \beta z^{-p} \qquad (1)$$

where $p$ is the bulk delay or pitch period, and $\beta$ is the predictor tap. The short-term predictor is sometimes referred to as the LPC predictor, because it is also used in the well-known LPC (linear predictive coding) vocoders, which operate at 2.4 kb/s or below. The LPC predictor is typically a 10th-order predictor with a transfer function of

$$P_2(z) = \sum_{i=1}^{10} a_i z^{-i} \qquad (2)$$

where $a_1$ through $a_{10}$ are the predictor coefficients. The excitation VQ codebook contains a table of codebook vectors (or codevectors) of equal length. The codevectors are typically populated by Gaussian random numbers, possibly with center clipping.

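A minimal Python sketch of the synthesis path implied by (1) and (2): a gain-scaled codevector drives the long-term synthesis filter $1/[1 - P_1(z)]$ and then the short-term synthesis filter $1/[1 - P_2(z)]$, each an all-pole filter with its predictor in the feedback loop. The center-clipping threshold, codebook size, subframe length, pitch period, and coefficient values are all assumptions chosen for illustration, not values from this paper.

```python
import random


def make_codebook(num_vectors, dim, clip=0.3, seed=0):
    """Gaussian random codevectors; entries with |x| < clip are zeroed (center clipping).
    The clipping threshold and seed are illustrative assumptions."""
    rng = random.Random(seed)
    codebook = []
    for _ in range(num_vectors):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        codebook.append([x if abs(x) >= clip else 0.0 for x in vec])
    return codebook


def long_term_synthesis(excitation, beta, pitch, past):
    """1/(1 - P1(z)) with P1(z) = beta * z^-pitch, i.e. y[n] = x[n] + beta * y[n - pitch].
    `past` holds previous outputs (at least `pitch` of them) and is extended in place."""
    out = []
    for x in excitation:
        y = x + beta * past[-pitch]
        past.append(y)
        out.append(y)
    return out


def short_term_synthesis(signal, a, past):
    """1/(1 - P2(z)) with P2(z) = sum_i a_i z^-i, i.e. y[n] = x[n] + sum_i a_i * y[n - i].
    `a` lists a_1..a_M; `past` holds previous outputs (at least M) and is extended in place."""
    out = []
    for x in signal:
        y = x + sum(a_i * past[-i] for i, a_i in enumerate(a, start=1))
        past.append(y)
        out.append(y)
    return out


if __name__ == "__main__":
    codebook = make_codebook(num_vectors=16, dim=40)  # 40 samples = 5 ms subframe at 8 kHz
    gain = 0.8
    excitation = [gain * s for s in codebook[3]]
    pitch_memory = [0.0] * 60   # hypothetical pitch period p = 60 samples
    lpc_memory = [0.0] * 10
    a = [0.9] + [0.0] * 9       # hypothetical 10th-order coefficients (one stable pole)
    voiced = long_term_synthesis(excitation, beta=0.5, pitch=60, past=pitch_memory)
    speech = short_term_synthesis(voiced, a, lpc_memory)
    print(len(speech))  # 40
```

The filter memories are passed in explicitly so that consecutive subframes continue from where the previous one stopped, as they would in a running coder.
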
In the actual encoding process, the encoder first buffers an input speech frame of about 20 ms and then performs linear predictive analysis [43] (or LPC analysis) on the buffered speech. The resulting LPC parameters are then quantized. The pitch predictor parameters, namely the pitch period and the predictor tap, are then determined either in an open-loop fashion [1] or in a closed-loop fashion [44]. The quantized LPC parameters and pitch predictor parameters are both sent as side information to the decoder. This scheme is called forward-adaptive prediction.
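To illustrate the forward-adaptive step, here is a minimal Python sketch of 10th-order LPC analysis on one buffered 20 ms frame (160 samples at 8 kHz). It assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion; the paper does not specify these details here, the function names are hypothetical, and the quantization of the resulting parameters is omitted.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a_1..a_order so that s[n] is predicted by sum_i a_i * s[n - i].
    Returns (coefficients, final prediction error energy)."""
    if r[0] == 0.0:
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
        if err <= 0.0:  # perfectly predictable input: stop before dividing by zero
            break
    return a[1:], err


def lpc_analysis(frame, order=10):
    """Forward-adaptive LPC on one buffered frame (Hamming window assumed)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)) for t in range(n)]
    r = autocorrelation([s * w for s, w in zip(frame, window)], order)
    coeffs, _ = levinson_durbin(r, order)
    return coeffs


if __name__ == "__main__":
    FS_HZ, FRAME_MS = 8000, 20
    frame = [math.sin(2 * math.pi * 440 * t / FS_HZ) for t in range(FS_HZ * FRAME_MS // 1000)]
    print(lpc_analysis(frame))  # a_1..a_10 for this frame
```

The whole frame must be present before `lpc_analysis` can run, which is exactly the buffering that drives the coding delay discussed in Section II-B.
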

The input speech frame is further subdivided into several equal-length subframes, or vectors, typically 4 to 8 ms long. Then, for each vector, the encoder passes each candidate codevector in the excitation VQ codebook through the gain scaling unit and the two synthesis filters, compares the corresponding filtered output vector with the input speech vector, and computes the associated perceptually weighted MSE distortion. The encoder repeats this process for all candidate excitation codevectors and then identifies the codevector that minimizes the perceptually weighted MSE distortion. This process is called a closed-loop search (it is sometimes called an analysis-by-synthesis procedure). Vector quantization of the excitation using a closed-loop search is the main feature of CELP; it is also the main reason why CELP coders outperform other linear predictive coders such as APC and MPLPC.

Fig. 1. (a) Typical conventional CELP encoder. (b) Corresponding CELP decoder. (Figure blocks: LPC analysis & quantization, excitation VQ codebook, long-term synthesis filter, short-term synthesis filter, encode and multiplex.)
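The closed-loop search described above can be sketched in a few lines of Python. For brevity this hypothetical version omits the perceptual weighting filter and the long-term synthesis filter, passes each candidate through a zero-state short-term synthesis filter, and assumes the caller has already subtracted the filter's zero-input (memory) response from the input vector. For each codevector it uses the gain that minimizes the squared error for that codevector.

```python
def zero_state_synthesis(c, a):
    """Filter codevector c through 1/(1 - sum_i a_i z^-i) starting from zero memory."""
    y = []
    for n, x in enumerate(c):
        acc = x
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * y[n - i]
        y.append(acc)
    return y


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def closed_loop_search(target, codebook, a):
    """Analysis-by-synthesis: minimize ||target - g * H c||^2 over codevectors c.
    Returns (best_index, optimal_gain, squared_error), or None for an empty codebook."""
    target_energy = dot(target, target)
    best = None
    for index, c in enumerate(codebook):
        y = zero_state_synthesis(c, a)
        energy = dot(y, y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = dot(target, y)
        gain = corr / energy                          # optimal unquantized gain for this c
        error = target_energy - corr * corr / energy  # error remaining at that gain
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    a = [0.5]
    codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    target = [0.0, 2.0, 1.0]  # equals 2 times the filtered second codevector
    print(closed_loop_search(target, codebook, a))  # (1, 2.0, 0.0)
```

Every candidate is synthesized before it is judged, which is what distinguishes this closed-loop search from simply matching codevectors to a residual.
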

Theoretically, it is possible to jointly optimize the excitation codevector and the gain. In practice, however, almost all conventional CELP coders quantize the gain separately, after the closed-loop excitation VQ codebook search. There are at least two reasons for this. First, such sequential optimization has a much lower computational complexity than a joint optimization approach. Second, with the gain typically quantized to 5 b or so, the resolution in gain quantization is high enough that the performance difference between the two approaches is essentially negligible.
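To make this trade-off concrete, the Python sketch below compares a joint search over every (codevector, quantized gain) pair with the sequential approach: pick the shape using its optimal unquantized gain, then quantize that gain. It works directly on already-filtered codevectors, and the 32-level (5-bit) gain table, with log-spaced magnitudes and a sign, is a hypothetical choice. Because the error is quadratic in the gain for a fixed shape, the nearest quantized gain is the best one for that shape.

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def squared_error(target, y, g):
    return sum((t - g * v) ** 2 for t, v in zip(target, y))


# Hypothetical 5-bit gain table: 16 log-spaced magnitudes times a sign.
GAINS = [sign * 0.05 * 1.25 ** k for sign in (1.0, -1.0) for k in range(16)]


def joint_search(target, filtered, gains=GAINS):
    """Evaluate every (shape, gain) pair: len(filtered) * len(gains) error computations."""
    return min(
        ((k, j, squared_error(target, y, g)) for k, y in enumerate(filtered) for j, g in enumerate(gains)),
        key=lambda item: item[2],
    )


def sequential_search(target, filtered, gains=GAINS):
    """Pick the shape with its optimal unquantized gain, then quantize that gain."""
    best_k, best_score, best_gain = 0, -1.0, 0.0
    for k, y in enumerate(filtered):
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        if corr * corr / energy > best_score:
            best_k, best_score, best_gain = k, corr * corr / energy, corr / energy
    j = min(range(len(gains)), key=lambda idx: abs(gains[idx] - best_gain))
    return best_k, j, squared_error(target, filtered[best_k], gains[j])


if __name__ == "__main__":
    rng = random.Random(7)
    filtered = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(64)]
    target = [rng.gauss(0.0, 1.0) for _ in range(5)]
    print("joint:", joint_search(target, filtered))
    print("sequential:", sequential_search(target, filtered))
```

The joint search here costs 64 x 32 error evaluations against roughly 64 + 32 for the sequential one, and its error can only be equal or lower; with a fine gain table the two usually land on the same or a nearly equivalent choice.
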

In conventional CELP, five different kinds of information are encoded and sent to the decoder: 1) the LPC parameters; 2) the pitch period; 3) the pitch predictor tap; 4) the excitation gain; and 5) the excitation "shape" codevectors. The decoder decodes this information and reproduces speech frame-by-frame by exciting the two cascaded synthesis filters with the scaled excitation vector sequence. Usually, a postfilter of the type proposed in [41] is used in a CELP decoder to enhance the perceptual quality of the decoded speech.

B. Coding Delay

In the Introduction, we mentioned that a conventional CELP coder typically has a one-way coding delay of 50 to 60 ms. The one-way coding delay is defined as the elapsed time from the instant a speech sample arrives at the encoder input to the instant when that same speech sample appears at the decoder output, excluding any delay added by other communication equipment (such as modems) between the encoder-decoder pair and the signal propagation delay, which depends on distance. In other words, it is as if the encoder and the decoder were connected back-to-back by wires at the same physical location without any equipment in between. This definition makes the coding delay dependent only on the coding algorithm, not on other equipment or on communication distance.
By this definition, the coding delay of CELP coders can be roughly determined from the speech frame size used. The coding delay consists of three kinds of delay: 1) algorithmic buffering delay; 2) processing delay; and 3) bit transmission delay. First, because of the forward-adaptive LPC analysis, the CELP encoder has to buffer one frame of speech samples before the encoding of the first sample in that frame can start. Such buffering introduces at least one frame of buffering delay. Second, assume the real-time hardware is just fast enough to run the coder in real time (which is usually the case). Then it will take almost one frame of processing delay to encode and decode the buffered speech frame. Third, suppose the encoder does not start sending the bits for a given speech frame until the encoding of the entire frame is completed, and the decoder does not start decoding a speech frame until all of the bits of that frame have been received. Then one additional frame of bit transmission delay is introduced, because, provided the bit rate of the communication channel equals the bit rate of the speech coder, one frame of time elapses between the instant the first bit of the frame is sent and the instant the last bit of the frame is received. Hence, the total one-way coding delay of a CELP coder is roughly three frames.
Of course, the above delay analysis is oversimplified. In practice, it is possible to reduce the processing delay without using a faster processor. This can be achieved by sending bits out "on-the-fly" at the encoder as soon as certain bits become available, and by decoding bits "on-the-fly" at the decoder as soon as the received bits are sufficient to start decoding the first speech sample in the frame. (Some constraints on bit timing must be carefully observed, of course.) A faster processor can further cut down the processing delay. A faster communication channel (e.g., in multiplexed communication systems) can also reduce the bit transmission delay. On the other hand, for ease of implementation, a designer may sometimes choose a coding delay longer than three frames. Therefore, the actual coding delay is likely to vary between two and four frames, depending on the implementation and application. If we take 2.5 to 3 frames as the average, then a CELP coder with a 20 ms frame size is likely to have a coding delay of 50 to 60 ms. This long coding delay is mainly due to the large 20 ms frame buffer required by the forward adaptation of the LPC predictor coefficients.
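The frame-counting argument above can be written as a tiny delay model. This Python sketch is a hypothetical restatement of the text's arithmetic, not a timing analysis of any real implementation: each delay component is expressed as a fraction of one frame (or, for a vector-based coder, of one buffered vector), and the defaults reproduce the "three frames" case.

```python
from dataclasses import dataclass


@dataclass
class CodingDelay:
    """Rough one-way coding delay, counted in frames as in the analysis above.

    buffering_frames:    buffer that must fill before encoding can start
    processing_frames:   time to encode and decode one frame (1.0 = hardware just fast enough)
    transmission_frames: time to send one frame's bits (1.0 = channel rate equals coder rate)
    """
    frame_ms: float
    buffering_frames: float = 1.0
    processing_frames: float = 1.0
    transmission_frames: float = 1.0

    def total_frames(self) -> float:
        return self.buffering_frames + self.processing_frames + self.transmission_frames

    def total_ms(self) -> float:
        return self.frame_ms * self.total_frames()


if __name__ == "__main__":
    print(CodingDelay(frame_ms=20.0).total_ms())                         # 60.0: the "three frames" case
    print(CodingDelay(frame_ms=20.0, processing_frames=0.5).total_ms())  # 50.0: about 2.5 frames
    print(CodingDelay(frame_ms=0.625).total_ms())                        # 1.875: a 5-sample buffer at 8 kHz
```

The last line applies the same three-unit model to the 0.625 ms excitation vector mentioned in the Introduction, which is how a sub-2 ms delay becomes possible once the 20 ms frame buffer is removed.
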

C. Overview of LD-CELP

In LD-CELP, we reduce the coding delay by making the LPC predictor "backward-adaptive." (See [15, ch. 4, 6] for a comprehensive discussion of forward and backward adaptation.) This means that the predictor coefficients are derived not from the input speech samples yet to be coded, but from previously quantized speech samples. Since previously quantized speech samples are also available at the decoder, the decoder can derive the predictor coefficients using the same procedure as the encoder. Thus, there is no need to transmit any side information bits to specify the predictor coefficients. More importantly, with backward adaptation, there is no need to buffer about 20 ms of input speech for forward-adaptive LPC analysis. Therefore, the excitation vector (subframe) becomes the basic buffer unit, and the one-way coding delay becomes roughly three times the vector size (or vector dimension). Since the vector dimension is much smaller than 20 ms, the coding delay is greatly reduced.
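The key property of backward adaptation, that the decoder re-derives the same predictor without any side information, can be demonstrated with a toy coder. The Python sketch below is deliberately not LD-CELP: it is a scalar backward-adaptive predictive coder with a hypothetical 2nd-order predictor (LD-CELP uses 50th order), a 40-sample analysis window, and a uniform residual quantizer, all chosen for illustration. Encoder and decoder both recompute the coefficients once per 5-sample vector from past reconstructed samples only.

```python
import math

ORDER = 2        # hypothetical predictor order (LD-CELP itself uses 50)
WINDOW = 40      # hypothetical number of past reconstructed samples analyzed
VECTOR = 5       # coefficients are recomputed once per 5-sample vector
STEP = 0.05      # hypothetical uniform quantizer step for the prediction residual


def backward_lpc(history, order=ORDER):
    """Autocorrelation + Levinson-Durbin on past *reconstructed* samples only."""
    n = len(history)
    r = [sum(history[t] * history[t - k] for t in range(k, n)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:
        return a[1:]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a[1:]


def predict(a, recon):
    return sum(a_i * recon[-i] for i, a_i in enumerate(a, start=1))


def encode(signal):
    """Return the transmitted residual indices and the encoder's own reconstruction."""
    recon, indices = [0.0] * WINDOW, []
    for start in range(0, len(signal), VECTOR):
        a = backward_lpc(recon[-WINDOW:])   # nothing about `a` is transmitted
        for x in signal[start:start + VECTOR]:
            pred = predict(a, recon)
            idx = round((x - pred) / STEP)
            indices.append(idx)
            recon.append(pred + idx * STEP)
    return indices, recon[WINDOW:]


def decode(indices):
    recon = [0.0] * WINDOW
    for start in range(0, len(indices), VECTOR):
        a = backward_lpc(recon[-WINDOW:])   # same procedure on the same samples
        for idx in indices[start:start + VECTOR]:
            recon.append(predict(a, recon) + idx * STEP)
    return recon[WINDOW:]


if __name__ == "__main__":
    speech = [math.sin(0.3 * n) + 0.3 * math.sin(1.1 * n) for n in range(200)]
    sent, encoder_view = encode(speech)
    decoder_view = decode(sent)
    print(decoder_view == encoder_view)  # True: identical coefficients, no side information
    print(max(abs(x - y) for x, y in zip(speech, decoder_view)) <= STEP / 2 + 1e-12)  # True
```

Only the residual indices cross the channel; because both sides run identical arithmetic on identical reconstructed samples, their predictors, and hence their outputs, match exactly.
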
From the very beginning, our goal was to do better than the 5 ms delay requirement and to achieve the delay objective of 2 ms or less. The following simple analysis shows that we also needed to make the excitation gain backward-adaptive in order to achieve this goal. With the standard 8 kHz sampling rate, a delay of 2 ms corresponds to 16 samples. Since the coding delay is roughly three times the vector dimension, to achieve a one-way delay of 16 samples or less, the largest vector dimension we can use is five samples (0.625 ms). With a vector dimension of five samples and a bit rate of two bits/sample, we have only 10 bits to encode both the excitation gain and the excitation shape codevector. The excitation gain typically requires four to five bits of resolution in conventional CELP. In our case, however, spending 40-50% of the total bit rate (four to five of the 10 bits per vector) on the slowly varying and somewhat redundant gain term would be highly inefficient. To achieve better coding efficiency and speech quality under the 2 ms delay constraint, we use backward gain-adaptive vector quantization [40] for the excitation. Because the excitation gain is backward-adaptive, we derive it from the gain information embedded in the previously quantized excitation. No bits need to be sent to specify the excitation gain, since the decoder can derive the same gain in the same manner.
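The delay and bit-budget arithmetic in the preceding paragraph can be checked in a few lines of Python. The constant names are ours. The factor of three is the rough delay rule stated in the text, not an exact model of the coder's delay.

```python
FS_HZ = 8000               # standard telephone sampling rate
BIT_RATE = 16000           # 16 kb/s
DELAY_OBJECTIVE_MS = 2.0   # target one-way coding delay
DELAY_FACTOR = 3           # delay ~ 3x vector dimension (rule of thumb from the text)
GAIN_BITS_CELP = (4, 5)    # typical gain resolution in conventional CELP

bits_per_sample = BIT_RATE // FS_HZ                          # 2 bits/sample
delay_budget = int(FS_HZ * DELAY_OBJECTIVE_MS / 1000)        # 16 samples
max_dim = delay_budget // DELAY_FACTOR                       # 5 samples
vector_ms = 1000 * max_dim / FS_HZ                           # 0.625 ms
est_delay_ms = 1000 * DELAY_FACTOR * max_dim / FS_HZ         # 1.875 ms
bits_per_vector = max_dim * bits_per_sample                  # 10 bits
gain_share = [g / bits_per_vector for g in GAIN_BITS_CELP]   # 40% and 50%

print(f"budget={delay_budget} samples, dim={max_dim} ({vector_ms} ms), "
      f"est. delay={est_delay_ms} ms, {bits_per_vector} bits/vector, "
      f"gain share={gain_share[0]:.0%}-{gain_share[1]:.0%}")
```

A dimension of six would already give an estimated delay of 18 samples (2.25 ms), which exceeds the 16-sample budget.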
A simplified block diagram of the 16 kb/s LD-CELP coder is shown in Fig. 2. In this coder, the excitation vector has a dimension of five samples, so the one-way coding delay is less than 2 ms. The pitch predictor of conventional CELP is eliminated, and a 50th-order LPC predictor is used instead. The LPC predictor coefficients are updated once every four speech vectors (2.5 ms) by performing LPC analysis on previously quantized speech. The excitation gain is updated once every vector by a 10th-order adaptive linear predictor in the logarithmic gain domain. The coefficients of this log-gain predictor are updated once every four vectors by performing linear predictive analysis on the logarithmic gains of previously quantized and scaled excitation vectors. The 10th-order perceptual weighting filter is also updated once every four vectors, but it is updated by performing an LPC analysis on the input speech. To reduce the codebook search complexity, the 10-bit excitation VQ codebook is a product code of a three-bit gain codebook and a seven-bit shape codebook.

CHEN et al.: A LOW-DELAY CELP CODER

[Fig. 2. (a) 16 kb/s low-delay CELP encoder. (b) 16 kb/s low-delay CELP decoder. Recoverable block labels: excitation VQ codebook, gain adaptation.]

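The following sketch shows, under heavy simplification, how backward log-gain prediction and the gain-shape product codebook fit together for one excitation vector at a time. Every specific in it is hypothetical:

- the random shape codebook and the 8-entry gain table;
- the fixed log-gain predictor coefficients, where the coder described above instead updates them every four vectors by linear predictive analysis of past log-gains;
- the distortion measure, which is taken directly in the excitation domain rather than after the synthesis and perceptual weighting filters.

The sketch demonstrates that only the 10-bit index is transmitted, yet the decoder reproduces exactly the excitation the encoder selected.

```python
import numpy as np

rng = np.random.default_rng(1)

DIM = 5          # excitation vector dimension (from the text)
N_SHAPES = 128   # 7-bit shape codebook size
# Hypothetical codebooks: random shapes and an ad hoc 3-bit gain table
SHAPES = rng.standard_normal((N_SHAPES, DIM))
GAINS = np.array([-2.0, -1.2, -0.7, -0.3, 0.3, 0.7, 1.2, 2.0])


class BackwardLogGain:
    """Predicts the next excitation gain from the log-RMS (in dB) of past
    quantized, scaled excitation vectors, so no gain bits are sent."""

    def __init__(self, coeffs, offset_db=0.0):
        self.b = np.asarray(coeffs, dtype=float)      # assumed predictor coefficients
        self.offset = offset_db                       # assumed log-gain offset
        self.hist = np.full(len(self.b), offset_db)   # past log-gains, newest first

    def predict(self):
        log_g = self.offset + np.dot(self.b, self.hist - self.offset)
        return 10.0 ** (log_g / 20.0)                 # dB -> linear gain

    def update(self, scaled_vec):
        rms = np.sqrt(np.mean(scaled_vec ** 2))
        self.hist = np.roll(self.hist, 1)
        self.hist[0] = 20.0 * np.log10(max(rms, 1e-6))


def search(target, sigma):
    """Exhaustive search of the 8 x 128 product codebook, minimising
    ||target - sigma * g_i * y_j||^2 directly in the excitation domain."""
    corr = SHAPES @ target                   # y_j . x for every shape
    energy = np.sum(SHAPES ** 2, axis=1)     # ||y_j||^2 for every shape
    g = sigma * GAINS[:, None]               # (8, 1) effective gains
    # ||x||^2 is identical for every candidate, so it is dropped
    dist = g ** 2 * energy[None, :] - 2.0 * g * corr[None, :]
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


coeffs = np.full(10, 0.08)                   # hypothetical 10th-order predictor
enc_gain, dec_gain = BackwardLogGain(coeffs), BackwardLogGain(coeffs)
targets = rng.standard_normal((20, DIM)) * np.linspace(0.5, 3.0, 20)[:, None]

enc_out, dec_out, indices = [], [], []
for x in targets:
    # Encoder: predict the gain, search, then update its own gain history
    sigma = enc_gain.predict()
    i, j = search(x, sigma)
    y = sigma * GAINS[i] * SHAPES[j]
    enc_gain.update(y)
    index = i * N_SHAPES + j                 # 10-bit index: 3-bit gain, 7-bit shape
    indices.append(index)
    enc_out.append(y)

    # Decoder: receives only the 10-bit index and re-derives sigma itself
    gi, sj = divmod(index, N_SHAPES)
    y_dec = dec_gain.predict() * GAINS[gi] * SHAPES[sj]
    dec_gain.update(y_dec)
    dec_out.append(y_dec)

print(np.array_equal(np.array(enc_out), np.array(dec_out)))  # True
```

In this sketch, the gain-shape factorization lets all 1024 candidates share the 128 inner products computed by `SHAPES @ target`, plus the precomputed shape energies.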
Ideally, we should be able to achieve better coder performance if we update all predictors and filters once every vector. However, more frequent updates result in higher computational complexity. To make real-time implementation possible on the DSP32C, we found it necessary to reduce the update rate to once every four vectors. Fortunately, the spectral envelope of speech does not change very rapidly with time, so the less frequent predictor updates do not cause noticeable degradation in speech quality.
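A hypothetical scheduling loop separates the two rates involved here. Input is consumed and encoded one five-sample vector at a time, while predictor and filter updates (stubbed out) happen only on every fourth vector. The function name, and the choice to mark an update at the start of every fourth vector, are assumptions for illustration. The text does not specify where in the four-vector cycle the new coefficients take effect.

```python
FS_HZ = 8000
VECTOR_DIM = 5      # samples per excitation vector (from the text)
UPDATE_EVERY = 4    # vectors between predictor/filter updates (from the text)


def run_schedule(num_samples):
    """Consume a sample stream one vector at a time and record where the
    backward predictors would be re-derived (hypothetical scheduling).
    Returns (update_points, max_buffered)."""
    update_points, max_buffered = [], 0
    buffer = []
    for n in range(num_samples):
        buffer.append(n)                   # one new input sample arrives
        if len(buffer) < VECTOR_DIM:
            continue                       # wait until a full vector is buffered
        max_buffered = max(max_buffered, len(buffer))
        vec_index = n // VECTOR_DIM
        if vec_index % UPDATE_EVERY == 0:
            # Stub: re-run the backward LPC / log-gain / weighting analyses here
            update_points.append(n + 1 - VECTOR_DIM)  # first sample of this vector
        # Stub: encode this vector with the current predictors
        buffer.clear()
    return update_points, max_buffered


update_points, max_buffered = run_schedule(80)       # 16 vectors
interval_ms = 1000 * VECTOR_DIM * UPDATE_EVERY / FS_HZ
print(update_points, max_buffered, interval_ms)
```

With four vectors between updates, the analyses fall 20 samples (2.5 ms) apart, while the buffer never holds more than one vector of input.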
It should be emphasized that, although all predictors and filters are updated once every four vectors (20 samples), in order to achieve a low coding delay the basic buffer size of the LD-CELP coder is still limited to only one vector
