`for VoIP Network Adjustment
`
OOUCHI, Tsuyoshi TAKENAGA,
SUGAWARA, and Masao MASUGI
`NTT Network Service Systems Laboratories
9-11, Midori-Cho 3-Chome, Musashino-Shi, Tokyo 180-8585, Japan
`' Nippon Telegraph and Telephone West Corp.
2~2C-', Nakanoshima 6-Chome, Kita-Ku, Osaka-Shi, Osaka 540-8511, Japan
`
Abstract: This paper evaluates suitable voice-data length in IP packets for the adjustment of VoIP network systems. Based on measurements in a real environment, we examined the voice-quality level while varying the voice-data length of IP packets under various network conditions. We found that a VoIP system with long voice data has high transmission efficiency, but its voice-quality level deteriorates sharply in an inferior network. We also found that a VoIP system with short voice data is tolerant to packet losses and preserves voice quality. Based on these results, we propose a VoIP system that sets the voice-data length of IP packets according to dynamically changing network QoS conditions to achieve both high transmission efficiency and stable voice quality.
`
`I. INTRODUCTION
`
The range of computer network technology has expanded in size and diversity, and a wide variety of value-added applications for use on the Internet have appeared in recent years. Networks (e.g., the Internet) have become broadband, and multimedia communication environments where data, voice and images can be exchanged have been rapidly improving. IETF is studying communication services that connect existing telephone networks with IP networks, and ITU-T is deciding on a VoIP (Voice over Internet Protocol) protocol and studying the technologies for communication services where INs (Intelligent Networks) can collaborate with IP networks. There has been remarkable progress in the IP-based technology for computer-telephony integration, and this has resulted in lower communication costs. In particular, packet voice applications such as NetMeeting, CU-SeeMe and other multimedia conferencing applications have become widespread.
Due to the shared nature of current network structures, however, guaranteeing the end-to-end quality of service (QoS) of Internet applications is sometimes difficult. Because voice transmission on the Internet is unreliable, the current best-effort technology cannot guarantee the QoS or reliability of VoIP services. Various studies on VoIP technology have tried to establish reliable services and evaluate QoS properties [1-4]. However, the QoS level of VoIP systems depends on many parameters, including the end-to-end delay, jitter, packet loss in the network, type of codec used, length of voice data, and the size of the jitter-absorbing buffer [5], so further investigations are needed to clarify the factors affecting QoS. For instance, it has been pointed out that the transmission efficiency of voice data carried by IP packets is not sufficient because of the high ratio of header length to voice-data length in VoIP packets. Hence, to provide efficient and reliable VoIP services, it is important to clarify, among other things, the effect of changing the system's voice-data transmission rate. Other issues include reducing the voice-quality degradation caused by VoIP packet losses, decreasing the number of UDP (User Datagram Protocol) connections for a voice gateway, and decreasing the processing load for routing.
The transmission efficiency of VoIP can be enhanced by omitting redundancy in the IP/UDP/RTP (Real-time Transport Protocol) header information, compressing it by not resending header information that does not change after call connection setup, and multiplexing the voice data of two or more call channels in one IP packet with a sub-header identifying the call channels. In ITU-T and IETF, IP-based multiplexing methods have been studied to enhance the transmission efficiency of VoIP [6-9].
We evaluated a VoIP system with the intention of designing optimal network services for various network conditions. Based on measurements in a real network environment, we examined the voice-quality level (PSQM: Perceptual Speech Quality Measure [10]) while changing the voice-data length of IP packets for various packet loss and jitter conditions. This measure is widely used; it provides an objective quality level for voice in the existing telephone voice bandwidth (300 Hz to 3.4 kHz), and it is recognized as an appropriate method, with relatively little error, for estimating the subjective quality level (MOS: Mean Opinion Score [11, 12]). The smaller the PSQM value, the better the voice quality. Using the results obtained from our experiment, we created a new VoIP mechanism that enhances transmission efficiency and provides stable voice quality. It sets the appropriate voice-data length for IP packets based on the dynamically changing network QoS conditions.
`
II. TRANSMISSION INEFFICIENCY OF VoIP
`
When packets transmit an analog voice signal, the VoIP-GW (VoIP gateway) digitizes the voice signal through a codec, such as G.711 (64 kbps) or G.729 (8 kbps), and transmits voice data of a fixed length, where UDP and RTP are used to transmit the voice data in real time. The structure of an IP packet over Ethernet is shown in Fig. 1. Here the voice-data length can be changed according to the VoIP transmission efficiency.
The MAC (Media Access Control) header, IP header, UDP header, RTP header and FCS (Frame Check Sequence) are necessary for transmitting voice data over the Ethernet, and the preamble and IPG (Inter Packet Gap) should be considered as occupying bandwidth on the transmission line. For instance, the total occupied bandwidth is 98 bytes including the IPG, preamble, MAC header, IP header, UDP header, RTP header and FCS when transmitting 20-byte voice data. The 78 bytes thus correspond to the overhead of IP transmission, so the ratio of voice data to the total is less than 25%.

0-7803-7632-3/02/$17.00 ©2002 IEEE
`
[Figure: Ethernet frame layout. Preamble (8 bytes), MAC header (14 bytes), data (46 to 1500 bytes), FCS (4 bytes); the data field holds the IP header (20 bytes), UDP header (8 bytes), RTP header (12 bytes) and voice data (6 bytes or more).]
Fig. 1. VoIP packet structure for Ethernet.
`
The voice-data length of an IP packet usually depends on the method used for the VoIP-GW. In conventional VoIP communication, 80-byte voice data is often used for G.711, whereas 20-byte voice data is used for G.729. However, it is also permissible to change the voice-data length per packet by changing the VoIP-GW setup.
Table I shows the relationship between the packet transmission cycle and voice-data length, and Fig. 2 shows the relationship between the packet transmission cycle and the bandwidth occupied by the VoIP frames on an Ethernet. The longer the transmission cycle becomes, the longer the voice-data length. Moreover, the longer the voice data in an IP packet becomes, the more the transmission efficiency increases, because each VoIP packet carries a fixed overhead for the MAC header (in the case of the Ethernet), IP header, UDP header, and RTP header (Fig. 3). However, the longer one packet becomes, the more likely packet errors are to occur, so it is important to evaluate how the network traffic conditions affect the packet behavior and QoS in VoIP systems.
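The relationships among transmission cycle, voice-data length, transmission efficiency and occupied bandwidth can be sketched numerically as follows. This is a minimal illustration, not the authors' tool: the individual field sizes are the usual Ethernet/IPv4/UDP/RTP values, which add up to the 78-byte overhead quoted above, and the function names are hypothetical. Padding of very short Ethernet payloads is ignored.

```python
# Per-packet overhead on Ethernet in bytes (standard field sizes; they sum
# to the 78 bytes of overhead quoted in Section II).
OVERHEAD_BYTES = {
    "preamble": 8,
    "mac_header": 14,
    "ip_header": 20,
    "udp_header": 8,
    "rtp_header": 12,
    "fcs": 4,
    "ipg": 12,
}
TOTAL_OVERHEAD = sum(OVERHEAD_BYTES.values())

# Codec bit rates named in the text.
CODEC_RATE_BPS = {"G.711": 64_000, "G.729": 8_000}


def voice_data_length(codec: str, cycle_ms: int) -> int:
    """Voice bytes the codec produces during one transmission cycle."""
    return CODEC_RATE_BPS[codec] * cycle_ms // 8000


def transmission_efficiency(voice_bytes: int) -> float:
    """Ratio of voice data to the total bytes occupied on the line."""
    return voice_bytes / (voice_bytes + TOTAL_OVERHEAD)


def occupied_bandwidth_kbps(codec: str, cycle_ms: int) -> float:
    """Line bandwidth occupied by one voice stream, overhead included."""
    total_bytes = voice_data_length(codec, cycle_ms) + TOTAL_OVERHEAD
    packets_per_second = 1000 / cycle_ms
    return total_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for codec in CODEC_RATE_BPS:
        for cycle_ms in (10, 20, 40, 100):
            length = voice_data_length(codec, cycle_ms)
            print(codec, cycle_ms, length,
                  round(transmission_efficiency(length), 3),
                  round(occupied_bandwidth_kbps(codec, cycle_ms), 1))
```

Under these assumptions, a longer cycle spreads the fixed 78 bytes over more voice bytes, so efficiency rises and occupied bandwidth falls toward the codec rate.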
`
TABLE I
Relationship between packet transmission cycle and voice-data length
[Table body not recoverable from the source; only the column header fragment "Transmission" survives.]
`
`
`
A high traffic load can cause packet loss and jitter, and a poor transmission line can cause bit errors. Jitter larger than the jitter-absorbing buffer size may cause packet loss, and bit errors may also result in packet loss if the data link layer (such as HDLC (High-level Data Link Control procedure)) has a function for dropping irregular frames. For example, given 40 bytes of voice data in an IP packet and a bit error rate of 0.01%, the reproduction rate of voice data in the worst case may be 96.8%. Given 800 bytes of voice data in an IP packet, the reproduction rate of voice data in the worst case may be 36%. This explains why the VoIP quality decreases drastically with bit error rate. The above relationship is expressed by the following formula,
`
$$Err\_f = L \times 8 \times e \qquad (1)$$

where Err_f (%) is the errored frame rate, L (bytes) is the voice-data length per IP packet, and e (%) is the bit error rate. Note that this does not take into consideration the possibility of bit errors in the packet header.
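The two worked figures in the text (96.8% and 36%) follow directly from Eq. (1). The sketch below reproduces them; the function names are hypothetical, the cap at 100% is an added safeguard that the paper does not state, and the last function is an extra comparison that assumes independent bit errors, which is not part of the original analysis.

```python
def errored_frame_rate_percent(voice_bytes: int, ber_percent: float) -> float:
    """Eq. (1): worst-case errored frame rate, Err_f = L * 8 * e.

    The result is capped at 100% (an assumption added here).
    """
    return min(100.0, voice_bytes * 8 * ber_percent)


def reproduction_rate_percent(voice_bytes: int, ber_percent: float) -> float:
    """Share of voice data reproduced when every errored frame is dropped."""
    return 100.0 - errored_frame_rate_percent(voice_bytes, ber_percent)


def independent_errored_frame_rate_percent(voice_bytes: int,
                                           ber_percent: float) -> float:
    """Errored frame rate if bit errors are independent (added assumption)."""
    p = ber_percent / 100.0
    return 100.0 * (1.0 - (1.0 - p) ** (8 * voice_bytes))


if __name__ == "__main__":
    for length in (40, 800):
        print(length, reproduction_rate_percent(length, 0.01))
```

Eq. (1) is a worst-case bound because it treats every bit error as hitting a different frame; with independent errors several errors can fall in the same frame, so the errored frame rate is never higher than the bound.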
`
[Figure: occupied bandwidth (kbps) versus transmission cycle (ms) for G.711 and G.729.]
Fig. 2. Bandwidth occupied by VoIP frames.

[Figure: transmission efficiency versus transmission cycle (ms) for G.711 and G.729.]
Fig. 3. Transmission efficiency.
`
III. EXPERIMENT
`
`We evaluated the effect of changing the packet loss rate
`and the voice data length of IP packets on the setup
`described below. The background traffic generator fixed the
`frame length and the amount of background traffic.
`
`A. Test-bed network
`
Fig. 4 shows the measurement setup we used to evaluate the voice quality of a VoIP system.

Fig. 4. Test-bed network.
`
`
`
`
Fig. 6 shows the measured PSQM+ and standard deviation for five voice-data lengths (160, 320, 480, 640, 800 bytes) given a constant packet loss rate of 3%. Here, the G.711 codec was selected and the eight voice-sample files were transmitted. The 160-byte voice-data packets had the highest tolerance to packet loss and the 800-byte voice-data packets had the highest fluctuation in PSQM+, indicating that the longest packet was affected most.
`
[Figure: (a) measured PSQM+ and (b) standard deviation versus voice-data length (160, 320, 480, 640, 800 bytes).]
Fig. 6. Examples of measured results for several voice-data lengths (G.711, packet loss rate: 3%).
`
Fig. 7 shows the measured PSQM+ and standard deviation influenced by jitter. Here, we also selected the G.711 codec. We concluded the following from these results:

• When jitter occurred in the network, there was no difference between long voice data and short voice data, though the more jitter there was, the lower the voice quality (= higher PSQM+).
• Longer voice-data packets had higher fluctuations in PSQM+, showing that the longer packet was affected more.
• Considering that the PSQM+ value for actual communications should be less than about 2.5, the G.711 codec can tolerate jitter up to about 20 ms.
`
`Additionally, when we compared the PSQM+ values of
`jitter-absorbing buffers A (large) and B (small), we found
`that the jitter tolerance of the former was higher than that
`of the latter.
`
[Figure panel (b): PSQM+ versus packet loss rate (%) for 20 and 200 bytes (G.729).]
Fig. 5. Examples of measured PSQM+ for two voice-data lengths.
`
A voice quality tester measured the PSQM+ between two VoIP gateways, and a network emulator varied the packet loss rate within a range from 0 to 5% and the average jitter time within a range from 0 to 50 ms. Background traffic was generated in the same direction as the voice-data packet flow.
`
`B. Voice samples
`
The speech database [13], which was based on ITU-T recommendation P.80, contained eight voice sample files (four samples from two men speaking in Japanese and four samples from two women speaking in Japanese). The PSQM+ value in the results is the average of these eight samples. We also calculated the standard deviation from these.
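The per-condition summary described above (an average and a standard deviation over the eight samples) can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Iterable, Tuple


def summarize_psqm(sample_values: Iterable[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-sample PSQM+ values.

    Assumes the sample (n - 1) estimator; the source does not specify it.
    """
    values = list(sample_values)
    if len(values) < 2:
        raise ValueError("at least two PSQM+ values are required")
    return mean(values), stdev(values)
```

In the setup described here, the function would be called once per network condition with the eight PSQM+ readings for that condition.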
`
`C. Results
`
Fig. 5 plots the measured PSQM+ for the G.711 and G.729 codecs versus packet loss rate, where we chose the maximum and minimum voice-data lengths that the VoIP-GW could assign for each codec. We concluded the following from these results:

• Short voice data and long voice data have the same voice quality when the packet loss rate equals zero.
• The higher the packet loss rate, the lower the voice quality (= higher PSQM+), and the longer the voice-data packet length, the lower the tolerance to voice-data packet loss.
• Considering that the PSQM+ value for actual communications should be less than about 2.5, the G.711 codec can tolerate packet loss rates of up to 5%, and the G.729 codec can tolerate losses up to 2%.
`
[Figure panel (a): PSQM+ versus packet loss rate (%) for 160 and 800 bytes (G.711).]
`
`
`
long as little delay is experienced in telephone communications, because longer voice data increase end-to-end delay. If the response time exceeds a certain threshold, the VoIP-GW reduces the voice-data length. If the response time is less than the threshold, the VoIP-GW increases the voice-data length.
• If the IP network is stable (packet loss rate of nearly 0%), the VoIP-GW assigns long voice data. If the packet loss rate exceeds a certain threshold, the VoIP-GW reduces the voice-data length to preserve communication quality. The VoIP-GW then increases the voice-data length when network conditions return to a stable state.
• If the jitter time is less than a certain threshold, the VoIP-GW assigns long voice data. If the jitter time exceeds a certain threshold, the VoIP-GW reduces the voice-data length to preserve communication quality. The VoIP-GW then increases the voice-data length when the jitter time returns to a low level.
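The gateway behavior listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class and function names are hypothetical, the default threshold values are placeholders rather than values fixed by the paper, and moving one step at a time through the G.711 lengths used in the experiment is an assumed policy.

```python
from dataclasses import dataclass

# Voice-data lengths (bytes) evaluated for G.711 in the experiment.
G711_LENGTHS = (160, 320, 480, 640, 800)


@dataclass(frozen=True)
class NetworkCondition:
    response_time_ms: float
    packet_loss_percent: float
    jitter_ms: float


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only (assumptions).
    response_time_ms: float = 200.0
    packet_loss_percent: float = 1.0
    jitter_ms: float = 20.0


def is_inferior(cond: NetworkCondition, thr: Thresholds) -> bool:
    """True when any monitored value exceeds its threshold."""
    return (cond.response_time_ms > thr.response_time_ms
            or cond.packet_loss_percent > thr.packet_loss_percent
            or cond.jitter_ms > thr.jitter_ms)


def next_voice_data_length(current: int,
                           cond: NetworkCondition,
                           thr: Thresholds = Thresholds(),
                           lengths: tuple = G711_LENGTHS) -> int:
    """Shorten the voice data in an inferior network, lengthen it otherwise.

    Moves one step per measurement (assumed policy) and stays within the
    configured range of lengths.
    """
    index = lengths.index(current)
    if is_inferior(cond, thr):
        index = max(index - 1, 0)
    else:
        index = min(index + 1, len(lengths) - 1)
    return lengths[index]
```

A gateway would call `next_voice_data_length` after each periodic measurement of the network condition; a real implementation would probably add hysteresis so that the length does not oscillate around a threshold.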
[Figure: two VoIP-GWs, each repeatedly acquiring the network condition and changing the voice-data length.]
Fig. 8. Variable voice-data-length VoIP system.
`
`Next, we provide the threshold for network condition
`values (response time, packet loss rate and jitter time) to
`change the voice-data length. When network condition
`values exceed the threshold,
`the VoIP-GW varies the
`voice-data length and jitter absorbing buffer size.
`
`A. Response time
`
ITU-T defines the guidelines for one-way transmission time in G.114 [14] as follows.
• 0 to 150 ms: Acceptable for most user applications.
• 150 to 400 ms: Acceptable provided that Administrations are aware of the transmission time impact on the transmission quality of user applications.
• Above 400 ms: Unacceptable for general network planning purposes; however, it is recognized that in some exceptional cases this limit will be exceeded.
`
The delay time for the source and destination VoIP-GWs used in our evaluation ranged from about 90 ms (voice-data length: 160 bytes) to about 160 ms (voice-data length: 800 bytes) under G.711. Considering the delay time in the VoIP-GW, the threshold for the delay time in an IP network should be less than 200 ms.
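One way to read the 200-ms figure is as a delay budget against the G.114 bands: gateway delay plus network delay should stay within the 400-ms limit. The sketch below checks that reading. The function names and band labels are hypothetical, and treating the total one-way delay as a simple sum of gateway and network delay is an assumption.

```python
def classify_one_way_delay(delay_ms: float) -> str:
    """Map a one-way transmission time onto the G.114 bands quoted above."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms <= 150:
        return "acceptable"
    if delay_ms <= 400:
        return "acceptable with awareness"
    return "unacceptable"


def total_one_way_delay_ms(gateway_delay_ms: float,
                           network_delay_ms: float) -> float:
    """Assumed budget: gateway delay plus IP network delay."""
    return gateway_delay_ms + network_delay_ms


if __name__ == "__main__":
    # Gateway delays measured in the paper for 160-byte and 800-byte data.
    for gateway_delay in (90, 160):
        total = total_one_way_delay_ms(gateway_delay, 200)
        print(gateway_delay, total, classify_one_way_delay(total))
```

With the measured gateway delays and a network delay at the 200-ms threshold, the totals remain below 400 ms, which is consistent with the threshold chosen in the text.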
`
`B. Packet loss rate
`
`From the results in Section III.C, the threshold for packet
`
[Figure: (a) measured PSQM+ and (b) standard deviation versus jitter average time (ms).]
Fig. 7. Examples of measured influence by jitter (G.711).
`
`From these results, we found there was a relationship
`between voice quality and voice-data length in VoIP
`systems as follows.
`
TABLE II
Characteristics of voice-data length
[Table flattened in extraction. Recoverable row labels: "Influence of packet loss", "Stability of voice quality", "Influence of jitter", and a transmission-efficiency row. Recoverable cell text: "Tolerant to packet loss", "Tend to degrade voice quality", "Less fluctuation in voice quality", "More fluctuation in voice quality".]
`
IV. VARIABLE VOICE DATA LENGTH VoIP SYSTEM
`
In the previous section, we obtained the characteristics of a VoIP system operating under inferior network conditions. They can be used to create a VoIP system that offers optimal voice quality and transmission efficiency by varying the voice-data length in IP packets based on network QoS conditions. Fig. 8 shows the architecture of the variable voice-data-length VoIP system we propose. The VoIP-GW of the system works as follows:
`
• The VoIP system monitors network conditions (response time, packet loss rate, jitter time) by periodically pinging the destination VoIP-GW after a call connection is set up. Here, we regard a late response time, a high packet loss rate and a large jitter time as an inferior network.
• The VoIP-GW assigns long voice data to IP packets as
`