`
`Study on Appropriate Voice Data Length of IP Packets
`for VoIP Network Adjustment
`
OOUCHI, Tsuyoshi TAKENAGA*, SUGAWARA, and Masao MASUGI
NTT Network Service Systems Laboratories
9-11, Midori-Cho 3-Chome, Musashino-Shi, Tokyo 180-8585, Japan
* Nippon Telegraph and Telephone West Corp., 245, Nakanoshima 6-Chome, Kita-Ku, Osaka-Shi, Osaka 540-8511, Japan
`
Abstract-This paper evaluates suitable voice-data length in
IP packets for the adjustment of VoIP network systems. Based
on measurements in a real environment, we examined the
voice-quality level while varying the voice-data length of IP
packets under various network conditions. We found that a
VoIP system with long voice data has high transmission
efficiency, but its voice-quality level deteriorates sharply
in an inferior network. We also found that a VoIP system with
short voice data is tolerant to packet losses and preserves
voice quality. Based on these results, we propose a VoIP
system that sets the voice-data length of IP packets according
to dynamically changing network QoS conditions to achieve both
high transmission efficiency and stable voice quality.
`
`I. INTRODUCTION
`
The range of computer network technology has expanded in size
and diversity, and a wide variety of value-added applications
for use on the Internet have appeared in recent years. Networks
(e.g., the Internet) have become broadband, and multimedia
communication environments where data, voice and images can be
exchanged have been rapidly improving. The IETF is studying
communication services that connect existing telephone networks
with IP networks, and ITU-T is deciding on a VoIP (Voice over
Internet Protocol) protocol and studying the technologies for
communication services where INs (Intelligent Networks) can
collaborate with IP networks. There has been remarkable
progress in the IP-based technology for computer-telephony
integration, and this has resulted in lowering communication
costs. In particular, packet voice applications such as
NetMeeting, CU-SeeMe and other conferencing multimedia
applications have become widespread.

Due to the shared nature of current network structures,
however, guaranteeing the quality of service (QoS) of Internet
applications from end to end is sometimes difficult. Because
voice transmission on the Internet is unreliable, the current
best-effort technology cannot guarantee the QoS or reliability
of VoIP services. Various studies on VoIP technology have tried
to establish reliable services and evaluate QoS properties
[1-4]. However, the QoS level of VoIP systems depends on many
parameters including the end-to-end delay, jitter, packet loss
in the network, type of codec used, length of voice data, and
the size of the jitter-absorbing buffer [5], so that further
investigations are needed to clarify the factors affecting QoS.
For instance, it has been pointed out that the transmission
efficiency of voice data carried by IP packets is not
sufficient because of the high ratio of header length to
voice-data length in VoIP packets. Hence, to provide efficient
and reliable VoIP services, it is important to clarify the
effect that changing the system's voice-data transmission rate
has, among others. Also, there are some other issues such as
less degraded voice quality due to VoIP packet losses,
decreased number of UDP (User Datagram Protocol) connections
for a voice gateway and decreased processing load for routing.

The transmission efficiency of VoIP can be enhanced by omitting
redundancy in the IP/UDP/RTP (Real-time Transport Protocol)
header information, compressing it by not resending header
information that does not change after call connection setup,
and multiplexing the voice data of two or more call channels in
one IP packet with a sub-header identifying call channels. In
ITU-T and IETF, IP-based multiplexing methods have been studied
to enhance the transmission efficiency of VoIP [6-9].

We evaluated a VoIP system with the intention of designing
optimal network services for various network conditions. Based
on measurements in a real network environment, we examined the
voice-quality level (PSQM: Perceptual Speech Quality Measure
[10]) while changing the voice-data length of IP packets for
various packet loss and jitter conditions. PSQM is widely used;
it provides an objective quality level for voice over existing
telephone voice bandwidths (300 Hz-3.4 kHz), and it is
recognized as an appropriate method with relatively little
error to determine the subjective quality level (MOS: Mean
Opinion Score [11, 12]). The smaller the PSQM value, the better
the voice quality. Using the results obtained from our
experiment, we created a new VoIP mechanism that enhances
transmission efficiency and provides stable voice quality. It
sets the appropriate voice-data length for IP packets based on
the dynamically changing network QoS conditions.
`
II. TRANSMISSION INEFFICIENCY OF VoIP
`
When an analog voice signal is transmitted in packets, the
VoIP-GW (VoIP-Gateway) digitizes the voice signal through a
codec, such as G.711 (64 kbps) or G.729 (8 kbps), and it
transmits voice data of a fixed length, where UDP and RTP are
used to transmit voice data in real time. The structure of an
IP packet over an Ethernet is shown in Fig. 1. Here the
voice-data length can be changed according to the VoIP
transmission efficiency.

0-7803-7632-3/02/$17.00 ©2002 IEEE

The MAC (Media Access Control) header, IP header, UDP header,
RTP header and FCS (Frame Check Sequence) are necessary for
transmitting voice data over the Ethernet, and the preamble and
IFG (Inter Packet Gap) should be considered as occupying
bandwidth on the transmission line. For instance, the total
occupied bandwidth is 98 bytes including IFG, preamble, MAC
header, IP header, UDP header, RTP header and FCS when
transmitting 20-byte voice data. The 78 bytes thus correspond
to the overhead of IP transmission, so the ratio of voice data
to the total is less than 25%.
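The 78-byte overhead can be checked by summing the per-field sizes. The sketch below assumes the usual Ethernet/IPv4/UDP/RTP field sizes (12-byte IFG, 8-byte preamble, 14-byte MAC header, 20-byte IP header without options, 8-byte UDP header, 12-byte RTP header, 4-byte FCS). These sizes and the helper names are assumptions made for illustration; they were chosen because they add up to the overhead quoted in the text.

```python
# Per-packet overhead of a VoIP frame on Ethernet, in bytes.
# ASSUMPTION: standard field sizes, not values read from Fig. 1.
OVERHEAD_FIELDS = {
    "ifg": 12,         # inter packet gap
    "preamble": 8,
    "mac_header": 14,
    "ip_header": 20,   # IPv4 header without options
    "udp_header": 8,
    "rtp_header": 12,
    "fcs": 4,          # frame check sequence
}


def overhead_bytes() -> int:
    """Total bytes on the line that do not carry voice data."""
    return sum(OVERHEAD_FIELDS.values())


def occupied_bytes(voice_bytes: int) -> int:
    """Bytes occupied on the line by one packet carrying voice_bytes."""
    return voice_bytes + overhead_bytes()


def voice_ratio(voice_bytes: int) -> float:
    """Fraction of the occupied bytes that is voice data."""
    return voice_bytes / occupied_bytes(voice_bytes)


if __name__ == "__main__":
    print("overhead:", overhead_bytes(), "bytes")
    print("20-byte voice data occupies", occupied_bytes(20), "bytes")
    print("voice ratio: {:.1%}".format(voice_ratio(20)))
```

With 20-byte voice data the ratio comes out at roughly one fifth, which is consistent with the "less than 25%" statement.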
`
Fig. 1. VoIP packet structure for Ethernet.

The voice-data length of an IP packet usually depends on the
method used for the VoIP-GW. Eighty-byte voice data is often
used for G.711, whereas 20-byte voice data is used for G.729 in
conventional VoIP communication. However, it is also
permissible to change the voice-data length per packet by
changing the VoIP-GW setup.

Table I shows the relationship between the packet transmission
cycle and voice-data length, and Fig. 2 shows the relationship
between the packet transmission cycle and the bandwidth
occupied by the VoIP frames of an Ethernet. The longer the
transmission cycle becomes, the longer the voice-data length.
Moreover, the longer the voice data in an IP packet becomes,
the more the transmission efficiency increases, because the
fixed overheads of the VoIP packet (the MAC header in the case
of the Ethernet, IP header, UDP header, and RTP header) are
shared by more voice data (Fig. 3). However, the longer one
packet becomes, the more packet errors are likely to occur, so
it is important to evaluate how the network traffic conditions
affect the packet behavior and QoS in VoIP systems.

TABLE I
Relationship between packet transmission cycle
and voice data length

A high traffic load can cause packet loss and jitter, and a
poor transmission line can cause bit errors. Jitter larger than
the jitter-absorbing buffer size may cause packet loss, and bit
errors may also result in packet loss if the data link layer
(such as HDLC (High-level Data Link Control procedure)) has a
function for dropping irregular frames. For example, given 40
bytes of voice data in an IP packet and a bit error rate of
0.01%, the reproduction rate of voice data in the worst case
may be 96.8%. Moreover, given 800 bytes of voice data in an IP
packet, the reproduction rate of voice data in the worst case
may be 36%. This explains why the VoIP quality decreases
drastically with bit error rate. The above relationship is
expressed by the following formula,

$$Err\_f = L \times 8 \times e \qquad (1)$$

where Err_f (%) is the errored frame rate, L (bytes) is the
voice data length per IP packet, and e (%) is the bit error
rate. Note that this does not take into consideration the
possibility of bit errors in the packet header.
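Equation (1) can be checked against the two worked figures in the text (40 and 800 bytes at a bit error rate of 0.01%). The following is a minimal sketch; the function names are illustrative, and capping the result at 100% is an added assumption so that very long packets or high error rates do not yield a rate above 100%.

```python
def errored_frame_rate(voice_bytes: float, bit_error_rate_pct: float) -> float:
    """Eq. (1): Err_f (%) = L (bytes) * 8 * e (%).

    Worst case: every bit error is assumed to hit a different packet.
    ASSUMPTION: the result is capped at 100%, which Eq. (1) itself
    does not state.
    """
    return min(100.0, voice_bytes * 8 * bit_error_rate_pct)


def reproduction_rate(voice_bytes: float, bit_error_rate_pct: float) -> float:
    """Worst-case share (%) of voice data that is still reproduced."""
    return 100.0 - errored_frame_rate(voice_bytes, bit_error_rate_pct)


if __name__ == "__main__":
    for length in (40, 800):
        rate = reproduction_rate(length, 0.01)
        print(f"{length:3d} bytes: reproduction rate {rate:.1f}%")
```

As in Eq. (1), bit errors in the packet headers are ignored.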
`
[Plot: occupied bandwidth (kbps) versus transmission cycle (ms) for G.711 and G.729]
Fig. 2. Bandwidth occupied by VoIP frames.

[Plot: transmission efficiency versus transmission cycle (ms) for G.711 and G.729]
Fig. 3. Transmission efficiency.
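The relationships behind Table I and Figs. 2 and 3 follow from the codec bit rate and the fixed per-packet overhead. The sketch below is an illustration under stated assumptions: the 78-byte overhead is taken from the text, efficiency is defined here as voice bytes divided by all bytes occupied on the line (the exact definition used for Fig. 3 is not given), and the function names are hypothetical.

```python
# Codec bit rates quoted in the text (kbps).
CODEC_RATE_KBPS = {"G.711": 64, "G.729": 8}

# Per-packet overhead on Ethernet quoted in the text (bytes):
# IFG, preamble, MAC, IP, UDP, RTP headers and FCS.
OVERHEAD_BYTES = 78


def voice_data_length(codec: str, cycle_ms: float) -> float:
    """Voice bytes carried by one packet sent every cycle_ms."""
    # kbps * ms = bits; divide by 8 to get bytes.
    return CODEC_RATE_KBPS[codec] * cycle_ms / 8


def occupied_bandwidth_kbps(codec: str, cycle_ms: float) -> float:
    """Line bandwidth used by one voice stream, overhead included."""
    frame_bytes = voice_data_length(codec, cycle_ms) + OVERHEAD_BYTES
    return frame_bytes * 8 / cycle_ms  # bits per ms equals kbps


def transmission_efficiency(codec: str, cycle_ms: float) -> float:
    """ASSUMED definition: voice bytes / total occupied bytes."""
    voice = voice_data_length(codec, cycle_ms)
    return voice / (voice + OVERHEAD_BYTES)


if __name__ == "__main__":
    for codec in CODEC_RATE_KBPS:
        for cycle in (5, 20, 40, 60, 80, 100):
            print(
                f"{codec} {cycle:3d} ms: "
                f"{voice_data_length(codec, cycle):6.1f} bytes, "
                f"{occupied_bandwidth_kbps(codec, cycle):6.1f} kbps, "
                f"efficiency {transmission_efficiency(codec, cycle):.2f}"
            )
```

A longer cycle spreads the same 78 bytes over more voice data, so the occupied bandwidth falls toward the codec rate and the efficiency rises toward 1.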
`
III. EXPERIMENT
`
`We evaluated the effect of changing the packet loss rate
`and the voice data length of IP packets on the setup
`described below. The background traffic generator fixed the
`frame length and the amount of background traffic.
`
`A. Test-bed network
`
Fig. 4 shows the measurement setup we used to evaluate the
voice quality of a VoIP system.
`
`
`
Fig. 4. Test-bed network.
`
`
`
`
A voice quality tester measured the PSQM+ between two VoIP
gateways, and a network emulator varied the packet loss rate
within a range from 0 to 5% and the average jitter time within
a range from 0 to 50 ms. Background traffic was generated in
the same direction as the voice data packet flow.
`
`B. Voice samples
`
The speech database [13] contained eight voice sample files
(four samples from two men speaking in Japanese, four samples
from two women speaking in Japanese) and was based on ITU-T
Recommendation P.80. The PSQM+ value in the results is the
average over these eight samples. We also calculated the
standard deviation over them.
`
`C. Results
`
Fig. 5 plots the measured PSQM+ for the G.711 and G.729 codecs
versus packet loss rate, where we chose the maximum and minimum
that the VoIP-GW could assign for each codec as the voice-data
length of the VoIP packets. We concluded the following from
these results:
• Short-voice data and long-voice data have the same voice
  quality when the packet loss rate equals zero.
• The higher the packet loss rate, the lower the voice quality
  (= higher PSQM+), and the longer the voice-data packet
  length, the lower the tolerance to voice-data packet loss.
• Considering that the PSQM+ value for actual communications
  should be less than about 2.5, the G.711 codec can tolerate
  packet loss rates of up to 5%, and the G.729 codec can
  tolerate losses up to 2%.
`
[Plot: PSQM+ versus packet loss rate (%) for 160-byte and 800-byte voice data]
(a) 160 and 800 bytes (G.711)
`
Fig. 6 shows the measured PSQM+ and standard deviation for five
voice-data lengths (160, 320, 480, 640, 800 bytes) given a
constant packet loss rate of 3%. Here, the G.711 codec was
selected and the eight voice-sample files were transmitted. The
160-byte voice-data packets had the highest tolerance to packet
loss and the 800-byte voice-data packets had the highest
fluctuation in PSQM+, indicating that the longest packet was
affected most.
`
[Plot: PSQM+ versus voice-data length (bytes)]
(a) Measured PSQM+

[Plot: standard deviation versus voice-data length (bytes)]
(b) Standard deviation

Fig. 6. Examples of measured results for several voice-data lengths
(G.711, packet loss rate: 3%).
`
Fig. 7 shows the measured PSQM+ and standard deviation
influenced by jitter. Here, we also selected the G.711 codec.
We concluded the following from these results:
• When jitter occurred in the network, there was no difference
  between long-voice data and short-voice data, though the more
  the jitter, the lower the voice quality (= higher PSQM+).
• Longer voice-data packets had higher fluctuations in PSQM+,
  showing that the longer packet was affected more.
• Considering that the PSQM+ value for actual communications
  should be less than about 2.5, the G.711 codec can tolerate
  jitter up to about 20 ms.

Additionally, when we compared the PSQM+ values of
jitter-absorbing buffers A (large) and B (small), we found that
the jitter tolerance of the former was higher than that of the
latter.
`
[Plot: PSQM+ versus packet loss rate (%) for 20-byte and 200-byte voice data]
(b) 20 and 200 bytes (G.729)
Fig. 5. Examples of measured PSQM+ for two voice-data lengths.
`
`
`
`
`
`
long as little delay is experienced in telephone
communications, because longer voice data increases end-to-end
delay. If the response time exceeds a certain threshold, the
VoIP-GW reduces the voice-data length. If the response time is
less than the threshold, the VoIP-GW increases the voice-data
length.
• If the IP network is stable (packet loss rate of nearly 0%),
  the VoIP-GW assigns long voice data. If the packet loss rate
  exceeds a certain threshold, the VoIP-GW reduces the
  voice-data length to preserve communication quality. The
  VoIP-GW then increases the voice-data length when network
  conditions return to a stable state.
• If the jitter time is less than a certain threshold, the
  VoIP-GW assigns long voice data. If the jitter time exceeds a
  certain threshold, the VoIP-GW reduces the voice-data length
  to preserve communication quality. The VoIP-GW then increases
  the voice-data length when the jitter time returns to a low
  level.
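The adjustment rules for response time, packet loss rate and jitter time can be sketched as a small step controller. This is an illustration under assumptions, not the gateway implementation from the paper: the 200 ms response-time threshold follows the suggestion in Section IV.A, the packet loss and jitter thresholds are hypothetical placeholders, the candidate lengths are the five G.711 lengths evaluated in Fig. 6, and combining the three rules as "reduce if any threshold is exceeded, otherwise increase" is an assumed policy. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NetworkCondition:
    """One measurement obtained by pinging the destination VoIP-GW."""
    response_time_ms: float
    packet_loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class Thresholds:
    response_time_ms: float = 200.0  # suggested in Section IV.A
    packet_loss_pct: float = 1.0     # HYPOTHETICAL placeholder
    jitter_ms: float = 20.0          # HYPOTHETICAL placeholder


# G.711 voice-data lengths evaluated in Fig. 6 (bytes).
LENGTH_STEPS = (160, 320, 480, 640, 800)


def is_inferior(cond: NetworkCondition, th: Thresholds) -> bool:
    """True if any monitored value exceeds its threshold."""
    return (
        cond.response_time_ms > th.response_time_ms
        or cond.packet_loss_pct > th.packet_loss_pct
        or cond.jitter_ms > th.jitter_ms
    )


def next_voice_length(
    current: int,
    cond: NetworkCondition,
    th: Optional[Thresholds] = None,
    steps: Sequence[int] = LENGTH_STEPS,
) -> int:
    """Step down one length in an inferior network, otherwise step up."""
    th = th or Thresholds()
    i = steps.index(current)
    if is_inferior(cond, th):
        return steps[max(i - 1, 0)]
    return steps[min(i + 1, len(steps) - 1)]


if __name__ == "__main__":
    length = 480
    for cond in (
        NetworkCondition(50, 0.0, 5),   # stable network
        NetworkCondition(60, 3.0, 5),   # packet loss above threshold
        NetworkCondition(250, 0.0, 5),  # slow response
    ):
        length = next_voice_length(length, cond)
        print(cond, "->", length, "bytes")
```

Changing one step per measurement is a design choice of this sketch; it avoids large swings in packet length when a single ping result is an outlier.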
[Diagram: two VoIP-GWs, each repeatedly acquiring the network condition and changing the voice-data length]
Fig. 8. Variable voice data length VoIP system.
`
`Next, we provide the threshold for network condition
`values (response time, packet loss rate and jitter time) to
`change the voice-data length. When network condition
`values exceed the threshold,
`the VoIP-GW varies the
`voice-data length and jitter absorbing buffer size.
`
`A. Response time
`
ITU-T defines the guidelines for one-way transmission time in
G.114 [14] as follows.
• 0 to 150 ms: Acceptable for most user applications.
• 150 to 400 ms: Acceptable provided that Administrations are
  aware of the transmission time impact on the transmission
  quality of user applications.
• Above 400 ms: Unacceptable for general network planning
  purposes; however, it is recognized that in some exceptional
  cases this limit will be exceeded.
`
The delay time for the source and destination VoIP-GW used in
our evaluation ranged from about 90 ms (voice data length: 160
bytes) to about 160 ms (voice data length: 800 bytes) under
G.711. Considering the delay time in the VoIP-GW, the threshold
for the delay time in an IP network should be less than 200 ms.
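The reasoning behind the 200 ms figure can be illustrated by adding the gateway delay to the network delay and classifying the sum against the G.114 ranges quoted above. The sketch assumes that the one-way delay is simply the sum of the two components and that the boundary values 150 ms and 400 ms fall into the lower range; both are assumptions, and the names are illustrative.

```python
# Gateway delay measured in the evaluation (ms), keyed by the
# G.711 voice-data length in bytes.
GATEWAY_DELAY_MS = {160: 90.0, 800: 160.0}

# Proposed upper bound for the delay inside the IP network (ms).
NETWORK_DELAY_THRESHOLD_MS = 200.0


def classify_one_way_delay(delay_ms: float) -> str:
    """Map a one-way delay to the G.114 ranges quoted in the text.

    ASSUMPTION: boundary values belong to the lower range.
    """
    if delay_ms <= 150:
        return "acceptable"
    if delay_ms <= 400:
        return "acceptable with awareness"
    return "unacceptable"


def one_way_delay_ms(voice_bytes: int, network_delay_ms: float) -> float:
    """ASSUMPTION: one-way delay = gateway delay + network delay."""
    return GATEWAY_DELAY_MS[voice_bytes] + network_delay_ms


if __name__ == "__main__":
    for voice_bytes in GATEWAY_DELAY_MS:
        total = one_way_delay_ms(voice_bytes, NETWORK_DELAY_THRESHOLD_MS)
        print(f"{voice_bytes} bytes: {total:.0f} ms ->",
              classify_one_way_delay(total))
```

Under these assumptions, a network delay at the 200 ms threshold keeps the total below 400 ms even for the longest voice data.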
`
`B. Packet loss rate
`
From the results in Section III.C, the threshold for packet
`
[Plot: PSQM+ versus jitter average time (ms)]
(a) Measured PSQM+

[Plot: standard deviation versus jitter average time (ms)]
(b) Standard deviation

Fig. 7. Examples of measured influence by jitter (G.711).
`
`From these results, we found there was a relationship
`between voice quality and voice-data length in VoIP
`systems as follows.
`
TABLE II
Characteristics of voice-data length

                            Short-voice data           Long-voice data
Influence of packet loss    Tolerant to packet loss;   Tend to degrade;
                            less fluctuation in        more fluctuation in
                            voice quality              voice quality
Influence of jitter         Less fluctuation in        [not recovered]
                            voice quality

[Other surviving fragments of the table: "Wide" (under short-voice data), "... efficien[cy]", "Stability of voice quality"; their cell values are not recoverable.]
`
IV. VARIABLE VOICE DATA LENGTH VoIP SYSTEM

In the previous section, we obtained the characteristics of a
VoIP system operating under inferior network conditions. They
can be used to create a VoIP system that offers optimal voice
quality and transmission efficiency by varying the voice-data
length in IP packets based on network QoS conditions. Fig. 8
shows the architecture of the variable voice-data-length VoIP
system we propose. The VoIP-GW of the system works as follows:
`
• The VoIP system monitors network conditions (response time,
  packet loss rate, jitter time) by periodically pinging the
  destination VoIP-GW after a call connection is set up. Here,
  we regard a late response time, high packet loss rate and a
  large jitter time as an inferior network.
• The VoIP-GW assigns long voice data to IP packets as
`
`
`
`
`