(12) United States Patent
Katseff et al.

(10) Patent No.: US 6,301,258 B1
(45) Date of Patent: *Oct. 9, 2001

(54) LOW-LATENCY BUFFERING FOR PACKET TELEPHONY

(75) Inventors: Howard Paul Katseff, Englishtown; Robert Patrick Lyons, Jackson; Bethany Scott Robinson, Lebanon, all of NJ (US)

(73) Assignee: AT&T Corp., New York, NY (US)

(*) Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 08/985,229
(22) Filed: Dec. 4, 1997

(51) Int. Cl.: H04L 12/56; H04J 3/06
(52) U.S. Cl.: 370/412; 370/508; 370/517
(58) Field of Search: 370/465, 468, 477, 508, 516-517, 519; 375/371-372

Primary Examiner: Huy D. Vu
Assistant Examiner: Yoan Nguyen
(74) Attorney, Agent, or Firm: Kenyon & Kenyon

(57) ABSTRACT

In a method for reducing latency in packet telephony caused by anti-jitter buffering, audio data elements are received and placed in a telephony input buffer used for anti-jitter buffering. Rather than wait until the buffer is full, the audio data elements are clocked, or played, out of the buffer at a rate slower than the normal play rate. In this way, latency due to the initial buffer fill period is reduced or eliminated. Audio data elements continue to be played out at a slower than normal rate until the buffer fill level reaches a threshold. At that time, the play rate for sending data elements out of the telephony input buffer is adjusted to the normal play rate. In an alternative embodiment of the present invention, the fill level of the telephony input buffer is controlled within a desired range by speeding up or slowing down the rate at which audio data elements are played out of the telephony input buffer. In yet another alternative embodiment, the amount of latency jitter in the packet network is measured and the size of the telephony input buffer is adjusted based upon the relative amount of jitter, such that the relative size of the buffer is reduced when the packet network is quiet, and the size of the buffer is increased when the network is relatively jittery.

50 Claims, 6 Drawing Sheets
`
[Drawing Sheet 1 of 6. FIG. 1: functional diagram of the PC-based packet phone. Legible labels: microphone 100; audio in 105; sound card with A/D and D/A; buffer; buffer manager 150; audio out 160; speaker 170; telephone line 145.]

[Drawing Sheet 2 of 6.
FIG. 2: flow diagram of the startup process. Block 201: receive audio data element and store in telephony input buffer. Block 202: number of elements < T? Block 203: set elements to play at slower than normal rate. Block 204: set elements to play at normal rate. Block 205: play elements without waiting for buffer to fill.
FIG. 3: flow diagram for controlled buffering. Block 301: receive audio data element and store in telephony input buffer. Block 303: play elements out at slower than normal rate. Block 305: play elements out at faster than normal rate. Block 306: play elements out at normal rate. The labels of decision blocks 302 and 304 are not legible in this copy.]

[Drawing Sheet 3 of 6.
FIG. 4A: flow diagram for controlled buffering with a single threshold. Block 401: receive audio data element and store in telephony input buffer. Block 402: number of elements < T? Block 403: play elements out at slower than normal rate. Block 404: play elements out at faster than normal rate.
FIG. 4B: alternative single-threshold flow diagram. Block 411: receive audio data element and store in telephony input buffer. Block 413: play elements out at slower than normal rate. Block 414: play elements out at faster than normal rate. Block 415: play elements out at normal rate. The label of decision block 412 is not legible in this copy.]

[Drawing Sheet 4 of 6. FIG. 5: data structure for the jitter array. The figure is rotated on the sheet and its labels are not legible in this copy.]

[Drawing Sheet 5 of 6.
FIG. 6A (reference numerals 601, 602, 603): receive audio data element into buffer; determine number of data elements in buffer; enter current time in array at index given by current number of elements.
FIG. 6B (reference numerals 611, 612, 613, 615): scan array for most recent entry Tr; scan each other element Tn of array for comparison with Tr; reduce buffer size; increase buffer size. The decision labels are not legible in this copy.]

[Drawing Sheet 6 of 6.
FIG. 6C (reference numerals 621, 622, 624, 626): scan array for most recent entry Tr; scan each other element Tn of array for comparison with Tr; reduce buffer size; increase buffer size; maintain current buffer size. The decision labels are not legible in this copy.]

LOW-LATENCY BUFFERING FOR PACKET TELEPHONY

The present application is related to U.S. application entitled “Low-Latency Audio Interface for Packet Telephony,” which is filed on even date herewith. These two applications are co-pending and commonly assigned.

TECHNICAL FIELD

This invention relates to packet telephony in general and, more particularly, provides a way of reducing latency in packet telephony communications.

BACKGROUND OF THE INVENTION

Packet telephony involves the use of a packet network, such as the Internet or an “intranet” (modeled in functionality based upon the Internet and used by companies locally or internally) for telecommunicating voice, pictures, moving images and multimedia (e.g., voice and pictures) content. Instead of a pair of telephones connected by switched telephone lines, however, packet telephony typically involves the use of a “packet phone” or “Internet phone” at one or both ends of the telephony link, with the information transferred over a packet network using packet switching techniques. A “packet phone” or “Internet phone” typically includes a personal computer (PC) running application software for implementing packetized transmission of audio signals over a packet network (such as the Internet); in addition, the PC-based configuration of a packet or Internet phone typically includes additional hardware devices, such as a microphone, speakers and a sound card, which are plugged or incorporated into the PC.

The amount of time it takes for a communication to travel through a communications network is referred to as latency. The amount of latency can impact the quality of the communication; the higher the latency, the lesser the quality of the communication. Latency of about 150 milliseconds (ms) or more produces a noticeable effect upon conversations that, for some people, can render a conversation next to impossible. The Plain Old Telephone Service (POTS) network controls latency to an acceptable degree, which is one of the ways in which the POTS network is deemed a reliable and quality communications service.

However, latency is a significant problem in packet telephony. Latency problems may be caused by factors such as traffic congestion or bottlenecks in the packet network, which can delay delivery of packets to the destination. Another problem is caused by packet network “jitter.” “Jitter” is the variance in latency from packet-to-packet or between groups of packets, such that packets (or packet groups) are not received at the destination at regular intervals. In packet telephony, packets are clocked into the packet network from the sending station at a regular rate; thus, network characteristics are responsible for deviation from regularity in the rate of receiving data packets at the receiving station.

Packet telephony programs use an input buffer at the receiving station to compensate for network jitter. Anti-jitter buffering is used to allow data to be clocked out of the buffer and into the telephony section at a regular rate. Each time voice input from the network starts at the receiving station, the packet telephony program directs the incoming data into the telephony input buffer, and does not start clocking the data out of the buffer (clocking audio data out of a memory is often called “playing” the data) and along to the speaker output until the telephony input buffer is full. For example, such input buffers may run several packets deep with an equivalent length (in terms of time) ranging from ½ to a full second of audio data. Thus, each time a speaker starts talking, perceptible latency is introduced as a result of anti-jitter buffering, making interactive conversations difficult or unnatural.

What is desired is a way of reducing the latency in packet telephony communications caused by buffering used to reduce or eliminate network jitter.

SUMMARY OF THE INVENTION

The present invention is directed to a method for reducing latency in packet telephony caused by anti-jitter buffering. When a second, or remote, user begins to speak, the telephony input buffer, which is used for anti-jitter buffering, is initially empty. As audio data are received, the data are placed in the telephony input buffer. However, rather than wait until the buffer is full, the audio data are clocked, or played, out of the buffer as soon as the first data element arrives and at a rate slower than the normal play rate. In this way, latency due to the initial buffer fill period is reduced or eliminated. Audio data continue to be played out at a slower than normal rate until the buffer fill level reaches a threshold. At that time, the play rate for sending data out of the telephony input buffer is adjusted to the normal play rate. This technique for starting playback at a slower rate before the buffer is filled may be employed whenever the buffer empties, e.g. either as the result of the startup of a conversation, or of network delays or of loss of a burst of packets.

In an alternative embodiment of the present invention, the fill level of the telephony input buffer is controlled within a desired range by speeding up or slowing down the rate at which audio data are played out of the telephony input buffer. In yet another alternative embodiment, the amount of latency jitter in the packet network is measured and the size of the telephony input buffer is adjusted based upon the relative amount of jitter, such that the relative size of the buffer is reduced when the packet network is quiet, and the size of the buffer is increased when the network is relatively jittery.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a functional diagram for a PC-based packet phone utilizing the buffering management of the present invention.
FIG. 2 shows a flow diagram of the startup process of the present invention.
FIG. 3 shows a flow diagram for controlled buffering in an alternative embodiment of the present invention.
FIG. 4A shows a flow diagram for controlled buffering in another alternative embodiment of the present invention.
FIG. 4B shows a flow diagram for controlled buffering in another alternative embodiment of the present invention.
FIG. 5 shows the data structure for a jitter array used in accordance with the present invention.
FIG. 6A shows a flow diagram for updating a jitter array in accordance with the present invention.
FIG. 6B shows a flow diagram for measuring jitter in accordance with the present invention.
FIG. 6C shows a flow diagram for measuring jitter in accordance with an alternative embodiment of the present invention.

DETAILED DESCRIPTION

The present invention is directed to a method for reducing latency in packet telephony caused by anti-jitter buffering. In
accordance with the present invention, data are played out of the receiver's input buffer at variable rates, such that latency is reduced, while controlling the size of the buffer to reduce the effects of network jitter.

FIG. 1 shows a functional diagram for a PC-based packet phone utilizing the buffering management of the present invention; the functionality shown in FIG. 1 is based upon the hardware/software functionality typically found in a PC-based packet phone. When a first user begins to speak into microphone 100 (which serves as the audio input device for the packet phone), an analog audio signal from microphone 100 is received into the PC-based packet phone via audio input port 105. Audio input port 105 is connected to sound card 110. The analog audio signal is delivered to sound card 110, where it is then digitized using an analog-to-digital (A/D) converter 112. Sound card 110 may be any one of a number of standard PC sound cards, such as the SoundBlaster™ 16 from Creative Labs. Sound card 110 also typically contains a pair of data buffers 114 and 116. Data buffer 114 buffers the audio data received from audio input port 105 and digitized by A/D converter 112 before being sent to CODEC 120. Typically, this data buffering is performed in accordance with an established protocol, such as that provided by a standard Microsoft audio driver supplied with the Microsoft Windows™ operating system.

CODEC 120 compresses the audio data for efficient transmission over the packet network. CODEC 120 may, typically, be either a hardware or software component that is well-known in communications and telephony applications to those skilled in the art.

The packet telephony product is a telephony program 125 having a telephony application 127 and data buffers 128 and 129. Telephony application 127 implements the functionality needed to prepare the data for transmission over a packet network. For example, telephony application 127 places the data into a form compatible with a data communications protocol used for transmitting data over a packet network. Telephony output buffer 128 buffers the data output by telephony application 127. Telephony output buffer 128 is kept as short as possible and is used to buffer data going out to the packet network in the event the network becomes temporarily busy at a particular instant, so that outgoing data are not lost.

The audio data from telephony output buffer 128 is then processed by network layer 130. Telephony application 127 requests that network layer 130 play data out of telephony buffer 128 as soon as placed in the buffer. Network layer 130 is a software communications application which adds one or more layers of data protocol, such as the well-known Transmission Control Protocol and Internet Protocol (TCP/IP), or the known User Datagram Protocol and Internet Protocol (UDP/IP), and/or the well-established point-to-point protocol (PPP) used for communicating over a packet network. TCP/IP is typically used for control and setup, while UDP/IP is often used for transmitting audio data because UDP/IP does not cause lost packets of audio data to be retransmitted. UDP/IP may be preferred for transmitting audio data because, for packet telephony, retransmitting lost audio data will degrade a conversation. PPP is typically employed when a modem is used to permit the PC to connect to a packet network, such as the Internet, using a standard dial-up telephone line. Network layer 130 is typically included as the network stack in the Microsoft Windows™ operating system. Packets are then sent to input/output (I/O) port 135, which is a standard serial port used for establishing a serial data connection between a PC and a peripheral device, such as a modem.

From I/O port 135, the data proceeds to modem 140, which converts the data to tones suitable for transmission over a standard POTS telephone line 145 to a connecting service used to connect to a packet network, such as the Internet.

It should be noted that connecting a PC to a packet network, such as the Internet, may be accomplished by any number of known techniques, such as through the use of a modem over a telephone line described above. Access to a packet network, such as the Internet, may also be accomplished through, e.g., use of an ISDN line, a cable television line, or a local area network using techniques known to those skilled in the art.

Once the data is sent to the packet network, the packet network transmits the data which is ultimately directed to a second user having a receiving terminal (e.g., another PC-based packet phone) at the other end of a TCP/IP-compatible connection established between the two users.

The second user may transmit audio or speech data back to the first user. The process of receiving external audio data from the second user over the packet network into the first user’s PC-based packet phone is, in many respects, a reversal of the steps described above in connection with sending audio data from the first user’s packet phone to the second user over the packet network. The external audio data packets are received from a packet network connecting service (i.e., Internet service provider) over POTS telephone line 145 into modem 140, which converts the data from tones into digital data packets. From modem 140 the data proceeds to I/O port 135 and then to network layer 130, which removes one or more protocol layers (such as TCP/IP, UDP/IP and/or PPP).

After network layer 130, the data is sent to telephony application 125 which directs the data into telephony input buffer 129. As mentioned above, telephony input buffer 129 is typically several data elements deep and, in an attempt to compensate for jitter, telephony program 125 delays playing the data out to speaker 170 from buffer 129, through telephony application 127, until telephony input buffer 129 is full or has reached a given threshold (typically, such a threshold would be one-half full); this introduces latency whenever the buffer empties, e.g., when the second user begins to speak.

In accordance with the present invention, buffer manager 150 operates to control telephony application 127 and telephony input buffer 129, such that data is played out of telephony input buffer 129 before the buffer fills up. Buffer manager 150 clocks the audio data out at a rate less than the normal rate (i.e., at less than the real-time rate), which allows telephony input buffer 129 to fill, thus utilizing the effectiveness of the buffer in reducing jitter. In this way, latency normally introduced by virtue of the delay in the start of playing data from telephony input buffer 129 is eliminated while, at the same time, taking advantage of the benefits afforded by buffering in the reduction of network jitter. This technique for starting play of data from the buffer at a slower rate before the buffer is filled may be employed whenever the buffer empties, e.g. either as the result of the startup of a conversation, or of network delays or of loss of a burst of packets.

Once the data is played from telephony input buffer 129, it proceeds through the rest of the telephony and audio processing. Upon leaving input buffer 129, the data is processed by telephony application 127 and is sent to CODEC 120. CODEC 120 decompresses the audio data that was compressed (by the transmitting packet phone) for transmission over the packet network. From CODEC 120 the audio data is then sent to data buffer 116 of sound card 110. Data buffer 116 buffers the audio data, which is then converted into analog form by D/A converter 112 and sent in analog through audio output port 160 to speaker 170.

For purposes of the various aspects and embodiments of the invention described below, it is assumed that the PC-based packet telephone operates as discussed above with reference to the functional diagram of FIG. 1.

1. Startup.

The operation of the present invention at startup of a conversation by the second user will now be described in further detail with reference to FIG. 2, which shows a flow diagram of the startup process of the present invention. Those skilled in the art will recognize that the startup process may be employed whenever the buffer empties (and not just when a conversation begins). When the second user (who is remote from the first user) initially begins to speak, telephony input buffer 129 is typically empty. Buffer 129 will also typically be empty after a period of silence from the remote terminal as the result of silence suppression (in which the transmission of audio data packets from the remote terminal of the second user is temporarily halted during the period of silence by the second user), or as the result of network delays or loss of a burst of packets. As the first audio data begin to arrive after the second user begins to speak, they are placed in telephony input buffer 129 as shown in block 201 of FIG. 2. Rather than waiting for the buffer to fill (either fully or to some predetermined level) with the data, as soon as the first data element(s) are received they are played out of the buffer. Initially, the number of data elements in telephony input buffer 129 is small, so that the data cannot be played out at the normal rate; otherwise, the buffer would not fill.

The initiation of playing out data from the buffer may begin as soon as the first data element is received. Alternatively, the playing of data out of the buffer may await the receipt of a small number of data elements, such number being less than one-half full (e.g., 2 elements), before the data are played out of the buffer.

Thus, in accordance with the present invention, the number of data elements in buffer 129 is compared against a threshold value, T, as shown in block 202. The threshold value represents a number of data elements or, alternatively, the threshold T could be provided as a unit of time (as the data elements are expected to be received at regular intervals, there is a direct correlation between expected duration and the number of elements of audio data) or the desired relative fill percentage for the buffer (i.e., the ratio of the number of data elements in the buffer to the total size or capacity of the full buffer). Illustratively, the threshold value may be set to represent approximately 50% of the size of the buffer (i.e., the desired fill percentage would be 50%).
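
The equivalence between the three ways of expressing the threshold T can be sketched as follows. This is only an illustration and is not taken from the specification: the function names, the 20 ms element duration and the 10-element buffer capacity are assumptions chosen for the example.

```python
def threshold_from_time(threshold_ms: float, element_ms: float) -> int:
    """Convert a threshold given as a duration into a number of data elements.

    Assumes every element carries the same duration of audio (element_ms),
    which is what makes duration and element count interchangeable.
    """
    return round(threshold_ms / element_ms)


def threshold_from_fill(fill_fraction: float, capacity: int) -> int:
    """Convert a desired fill fraction (0.0 to 1.0) into a number of elements."""
    return round(fill_fraction * capacity)


# Hypothetical figures: 20 ms of audio per element, capacity of 10 elements.
ELEMENT_MS = 20.0
CAPACITY = 10

t_from_fill = threshold_from_fill(0.5, CAPACITY)      # 50% fill percentage
t_from_time = threshold_from_time(100.0, ELEMENT_MS)  # 100 ms of audio
```

Under these assumed figures, a 50% fill percentage and a 100 ms duration both correspond to the same threshold of 5 elements.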

If the number of data elements in buffer 129 is less than the threshold value T, the elements are set to play out at a slower than normal rate as shown in block 203, and the elements are played as shown in block 205 without waiting for the buffer to fill. Illustratively, elements may be played out at a rate equal to approximately 90% of the normal or expected rate at which audio data elements arrive. In this way, the audio data will be processed and speech heard at speaker 170 at a time before buffer 129 would have filled without playing any data. At the same time, because the data are being played out at a rate slower than the rate at which they arrive in telephony input buffer 129, buffer 129 slowly begins to fill with data elements.
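
How quickly the buffer fills during slowed playback follows from simple arithmetic. The worked example below is illustrative only: the symbols r, α and t_fill are introduced here, the 20 ms element duration and the threshold of 5 elements are assumptions rather than values from the specification, and jitter is ignored.

```latex
% r: arrival rate (elements per second); \alpha r: play rate, with \alpha = 0.9
% N: number of elements in the buffer; T: threshold in elements
\[
\frac{dN}{dt} = r - \alpha r = (1-\alpha)\,r,
\qquad
t_{\text{fill}} = \frac{T}{(1-\alpha)\,r}
\]
% Assumed example: 20 ms elements give r = 50 elements/s; with T = 5,
\[
t_{\text{fill}} = \frac{5}{(1-0.9)\times 50\ \text{s}^{-1}} = 1\ \text{s}
\]
```

Under these assumptions the buffer takes about one second to reach the threshold, and audio is audible throughout that second. A buffer that waited for the same 5 elements before playing would instead stay silent for about 100 ms.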

The slower rate at which the data is played out should be a rate at which the speech nonetheless is intelligible. A 90% play rate is barely noticeable and not objectionable; other rates may, similarly, prove acceptable.

At some point during the continuing receipt of audio data, the number of data elements in the buffer will be equal to or exceed the threshold value, T, where T may represent an optimal or desired number of elements to be contained in telephony input buffer 129. When that is the case, comparison of the number of elements in the buffer with the threshold value T at block 202 will produce a response that the number of data elements is not less than the threshold T and, as shown in block 204, the rate of playing the audio data out of buffer 129 is then set to the normal rate. In this way, it is expected that the number of data elements arriving at buffer 129 will be roughly equal to the number of elements played out of buffer 129 and, thus, the relative buffer “fill” percentage would be expected to remain approximately the same over time. Of course, given the nature of the network jitter problem, there will be short-term fluctuations in the telephony input buffer fill percentage.
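
The startup flow of FIG. 2 (blocks 201 to 205) can be sketched as a small simulation. This is a hedged illustration, not the patented implementation: the function names, the one-element-per-tick arrival model without jitter, and the threshold of 3 elements are all assumptions. Only the 90% slow rate comes from the text.

```python
from collections import deque

NORMAL_RATE_PCT = 100  # normal play rate, as a percentage of the arrival rate
SLOW_RATE_PCT = 90     # illustrative slower rate mentioned in the text


def startup_play_rate(num_elements: int, threshold: int) -> int:
    """Blocks 202 to 204 of FIG. 2: slow below T, normal at or above T."""
    if num_elements < threshold:
        return SLOW_RATE_PCT
    return NORMAL_RATE_PCT


def simulate_startup(ticks: int, threshold: int) -> list:
    """Return the buffer fill level seen at each tick, just after an arrival.

    Assumes one element arrives per tick with no jitter. Playback progress is
    tracked in integer percent of an element so 90% steps accumulate exactly.
    """
    buffer = deque()
    played_pct = 0  # playback progress not yet converted into whole elements
    levels = []
    for tick in range(ticks):
        buffer.append(tick)                      # block 201: store the element
        levels.append(len(buffer))
        played_pct += startup_play_rate(len(buffer), threshold)
        while played_pct >= 100 and buffer:      # block 205: play without waiting
            buffer.popleft()
            played_pct -= 100
    return levels


levels = simulate_startup(ticks=20, threshold=3)
```

With these assumed numbers the fill level starts at 1, climbs slowly while the slow rate is in force, and holds at the threshold of 3 once the normal rate is selected.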

2. Controlled Buffering.

Turning now to FIG. 3, another embodiment of the present invention will now be described in which the number of elements in telephony buffer 129 is controlled over time to be constrained within a desired range (this process is not limited to the startup of a conversation). This provides the benefit of not permitting the buffer to overfill or empty out, such that the conversation will appear more natural. As before, in block 301 audio data arriving from the remote terminal of the second user are placed in telephony input buffer 129. As shown in block 302, the number of data elements in buffer 129 is compared against a first threshold value, T1. The threshold value T1 represents a number of data elements or, alternatively, could be provided as a unit of time (as the data elements are expected to be received at regular intervals, there is a direct correlation between expected duration and the number of elements of audio data). If the number of elements in buffer 129 is less than the first threshold value T1, the elements are played out at a slower than normal rate as shown in block 303. As discussed above, the play rate should be one that provides acceptable audio speech quality at the speaker output. Because the rate is lower than the normal rate, it is expected that the buffer would slowly begin to fill to a greater percentage.

If the number of elements is not less than the first threshold, T1, the number is then compared at block 304 against a second threshold value, T2, which is higher than the first threshold value, T1. If the number of elements is greater than the second threshold value, T2, the rate at which audio data elements are played out of the buffer is set at block 305 to a rate faster than the normal rate. When the play rate is faster than normal, it is then expected that the buffer would slowly begin to empty.

If comparison at block 304 of the number of elements filling input buffer 129 with the second threshold, T2, reveals that the number of elements is less than (or equal to) T2, then, as shown at block 306, the rate of playing the audio data out of buffer 129 is set to the normal play rate. In this way, the number of data elements filling telephony input buffer 129 is controlled such that the number of elements is steered to the range between T1 and T2. As described above, the threshold values T1 and T2 may be expressed as a number of elements, a unit of time, or the relative fill percentage for the buffer. Illustratively, the threshold values T1 and T2 may be set to represent approximately 25% and 75% of the size of the buffer, respectively.
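
The two-threshold decision of FIG. 3 (blocks 302 to 306) can be sketched as a single function. This is a minimal illustration under stated assumptions: the function name, the string labels for the three rates and the 20-element buffer used in the example values are hypothetical, while the 25% and 75% thresholds are the illustrative figures from the text.

```python
SLOW, NORMAL, FAST = "slow", "normal", "fast"


def controlled_play_rate(num_elements: int, t1: int, t2: int) -> str:
    """Pick a play rate from the buffer fill level, following FIG. 3."""
    if t1 >= t2:
        raise ValueError("T2 must be higher than T1")
    if num_elements < t1:   # block 302 leads to block 303
        return SLOW
    if num_elements > t2:   # block 304 leads to block 305
        return FAST
    return NORMAL           # block 306: T1 <= number of elements <= T2


# Hypothetical 20-element buffer: T1 = 25% and T2 = 75% of its size.
T1, T2 = 5, 15
rates = [controlled_play_rate(n, T1, T2) for n in (4, 5, 15, 16)]
```

In this sketch a fill level exactly equal to either threshold selects the normal rate, which matches the "not less than T1" and "less than (or equal to) T2" wording of the description.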

In an alternative embodiment of the present invention, controlled buffering may be performed in essentially a continuous manner by collapsing the two thresholds, T1 and T2, into a single threshold value T, as shown in FIG. 4A. At block 401 audio data elements arriving from the remote terminal of the second user are placed in telephony input buffer 129. As shown in block 402, the number of elements in buffer 129 is compared against a threshold value, T. If the number of elements is less than the threshold value, T, the audio data are played out at a slower than normal rate as shown in block 403. If the number of elements is greater than or equal to the threshold value, the elements are played out at a faster than normal rate as shown in block 404.

An alternative to this embodiment is shown in FIG. 4B. At block 411 audio data arriving from the remote terminal of the second user are placed in telephony input buffer 129. As shown in block 412, the number of data elements in buffer 129 is compared against a threshold value, T. If the number of elements is less than the threshold value, T, the elements are played out at a slower than normal rate as shown in block 413. If the number of data elements is greater than the threshold value, the elements are played out at a faster than normal rate as shown in block 414. If the number of elements is equal to the threshold, T, the rate is set to the normal rate at block 415.
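
The difference between the two single-threshold variants comes down to what happens when the fill level equals T. The sketch below is illustrative only; the function names and the string labels for the rates are assumptions, not part of the specification.

```python
def play_rate_4a(num_elements: int, threshold: int) -> str:
    """FIG. 4A: slower below T (block 403), faster at or above T (block 404)."""
    return "slow" if num_elements < threshold else "fast"


def play_rate_4b(num_elements: int, threshold: int) -> str:
    """FIG. 4B: as FIG. 4A, except that exactly T selects the normal rate."""
    if num_elements < threshold:
        return "slow"    # block 413
    if num_elements > threshold:
        return "fast"    # block 414
    return "normal"      # block 415
```

In the FIG. 4A variant the play rate is never normal, so the fill level keeps being steered toward T from either side. The FIG. 4B variant adds a resting state at exactly T.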
3. Dynamic Buffer Sizing.

In addition to controlling the rate at which audio data elements are played out of the telephony input buffer, the size of the buffer itself may be controlled. The longer the buffer, the greater the potential latency in the audio that is eventually heard at the speaker output. The length of the buffer required to be effective in reducing jitter is dependent upon the relative amount of jitter presented by the packet network. If the network