VOLUME I
`
`INET’95 Conference
`Proceedings
`
`Editor: Kilnam Chon
`
`International Networking Conference
`Honolulu, Hawaii, USA
`27-30 June 1995
`
`The Annual Conference of the Internet Society
`
`To order copies, contact the ISOC Secretariat at:
`Internet Society
`12020 Sunrise Valley Dr., Suite 270
`Reston, VA 22091
`USA
isoc@isoc.org
`
`RPX Exhibit 1222
`RPX v. DAE
`
`

`
`Reliable Audio for Use over the Internet
`
`Proc. INET ’95
`
V.Hardman, A.Sasse, M.Handley, A.Watson
`
`Reliable Audio for Use over the Internet
Vicky Hardman <v.hardman@cs.ucl.ac.uk>, Martina Angela Sasse <a.sasse@cs.ucl.ac.uk>,
`Mark Handley <m.handley@cs.ucl.ac.uk>, Anna Watson <a.watson@cs.ucl.ac.uk>
`
`Abstract
`This paper describes current problems found with
audio applications over the MBONE (Multicast Backbone),
and investigates possible solutions to the most
common one - packet loss. The principles of packet
`speech systems are discussed, and how the structure
`allows the use of redundancy to design viable solu-
`tions to the problem. The paper proposes the use of
`synthetic speech coding algorithms (vocoders) to pro-
`vide redundancy, since the algorithms produce a very
`low bit-rate stream, which only adds a small over-
`head to a packet. Preliminary experiments show that
`normal speech repaired with synthetic quality speech
`is intelligible, even at very high loss rates.
`
`Introduction
`The application of this work is multimedia confer-
`encing over the MBONE (Multicast Backbone), an
`experimental overlay network of the Internet. The
`work has arisen from experiences in multi-way multi-
`media conferencing in Project MICE (Multimedia In-
`tegrated Conferencing for Europe) [1], is currently
`applied in Project ReLaTe (Remote Language Teach-
`ing over SuperJANET) [2], and includes formal ex-
`periments into the human perception of packet speech
`systems degraded by packet loss.
`If multimedia conferencing is to become widely
used in the Internet community, users must perceive
`the quality to be sufficiently good for most applica-
`tions. Experience has shown that audio is almost al-
`ways the most important component of multimedia
`conferencing. Whilst we have identified a number of
`problems which impair the quality of audio, the ma-
`jor one with audio over the MBONE is packet loss
[3]. This paper addresses the problem of
`packet loss over the MBONE.
`Packet loss can occur for a number of reasons:
- congestion of routers and gateways, which leads
to packets being discarded;
- delays in packet transmission, with packets arriving
too late at the receiver to be played back;
- heavy loading of the workstations, leading to
scheduling difficulties in multi-tasking operating
systems.
Packet loss is a persistent problem, particularly
given the increasing popularity, and therefore increasing
load, of the Internet. Possible ways of combatting
congestion include bandwidth reservation and
moves toward an integrated service management on
the Internet. These would require wide-scale changes
to be agreed and implemented, so these solutions will
not be available in the short to medium term. Yet, the
disruption of speech intelligibility even at low loss
rates which we currently experience may convince a
`whole generation of users that multimedia conferenc-
`ing over the Internet is not viable. We therefore pro-
`pose a solution which renders the speech intelligible
`under current network conditions, and can be de-
`ployed in the short term. Such a solution will have to
`be at the application level, i.e. the multicast audio
`tools.
`Current audio applications repair lost packets with
`silence, which leads to the speech clipping effects
currently experienced by many users. Since comparatively
large packets are used, even the loss of an individual
packet has a serious impact on the
intelligibility of speech.
`We propose a method of repairing damaged
`speech using cheap redundancy within the packets
`sent from the transmitter. The redundancy is synthetic
`speech, which, when split into packets, only adds a
`very small amount of overhead, and therefore does
`not add to the congestion at the network level. The re-
`dundancy for any given packet of speech is piggy-
`backed onto a later packet. This mechanism means
`that when the receiver suffers the loss of the primary
`speech information, it still has the possibility of sub-
`stituting something sensible in the output stream of
`speech, provided that the redundancy can be received.
In order to establish the effectiveness of this solution,
we have performed experiments into user perception
of speech repaired with a synthetic substitute.
The experiments subjectively measured speech intelligibility,
and the results show that this technique is
`very successful at repairing speech with large packet
`sizes and for very high loss rates (results were taken
`up to 40%). The paper also describes how the pro-
`posed solution scales in the multicast environment.
`
`Background
`Speech Coding
Speech coding schemes have been standardised
for use over telephone networks; a variety of speech
coding algorithms exist for a single target quality of
service (QoS), and at a very few discrete bit-rates:
Pulse Code Modulation (PCM) operates at 64 kbps,
Adaptive Differential Pulse Code Modulation (ADPCM)
operates at 32 kbps, and Low-Delay Code Excited Linear
Prediction (LD-CELP) operates at 16 kbps. The target
`QoS is ’toll’ (or telephone) quality, and each algo-
`rithm available at this QoS produces a different bit-
`rate: the improvement in bit-rate being obtained for
`increasing complexity in the coding algorithms. An-
`other standard speech coding algorithm is Groupe
`Speciale Mobile (GSM), which was designed for use
`over cellular telephone networks. The target QoS is
`consequently slightly less than toll, but the algorithm
`is popular, since it operates at the same bit-rate as
`CELP, but is much less complex. A fuller discussion
`of toll quality speech coding algorithms can be found
`in [4].
A second class of coding algorithms exists, which
`operate at the ’communications’ or synthetic QoS.
`These algorithms operate at very low bit-rates (ap-
`prox. 4.8kbps and below), and produce very mechan-
`ical sounding speech. Perhaps the most important
`method of this class is Linear Predictive Coding
`(LPC), since the principle is also an integral part of
both the CELP and GSM coders. A fuller description
of LPC can be found in [5].
`Packet Speech Systems
`Packet speech systems usually employ the stand-
`ard speech coding algorithms, and group the emerg-
`ing stream of codewords into packets for transmission
`over the network. At the receiver, the packets may be
`delivered: out of order, not at all, or at non-uniform
`intervals. Consequently, a reconstruction delay must
`be used at the receiver to repair the network effects;
`this enables sample play-out to be smoothed.
`In a packet speech system, the end-to-end delay is
`always a critical factor in the usability of a real-time
`voice system, and should be kept below 600ms in the
absence of echoes (the figure may in fact be less
than this: 400ms) [6], if conversation patterns are not
to break down. The size of the packets (in ms) chosen
for a packet speech system directly impacts the end-to-end
delay. A delay equal to the size of one packet
`is incurred at the transmitter, since the samples in the
`packet have to be collected before a packet can be
`sent. At the receiver, a rough estimate of the recon-
`struction delay required to smooth out packet arrival
`times is two packets worth in ms [7] [8], although the
`true value may be substantially in excess of this rule
`of thumb. Consequently, a minimum of three packets
`worth of delay is incurred on an end-to-end basis, be-
`fore the network propagation delay has been taken
`into account.
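The delay budget described above reduces to simple arithmetic. The following is a minimal sketch in Python and is not part of the original work; the function name and the `propagation_ms` parameter are illustrative assumptions, and the two-packet reconstruction delay is only the rule of thumb quoted from [7] [8].

```python
def min_end_to_end_delay_ms(packet_ms, reconstruction_packets=2, propagation_ms=0.0):
    """Lower bound on end-to-end delay for a given packet size in ms.

    One packet of delay is incurred at the transmitter (samples must be
    collected before sending), plus a reconstruction delay at the receiver
    (assumed here to be two packets' worth), plus network propagation.
    """
    packetisation_ms = packet_ms
    reconstruction_ms = reconstruction_packets * packet_ms
    return packetisation_ms + reconstruction_ms + propagation_ms


if __name__ == "__main__":
    # Compare the three packet sizes discussed in this paper against the
    # 400-600ms conversational limit quoted from [6].
    for size in (20, 40, 80):
        print(size, "ms packets ->", min_end_to_end_delay_ms(size), "ms before propagation")
```

With 80ms packets the fixed cost alone is 240ms, which leaves comparatively little of a 400ms budget for the network itself.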
`The delay introduced will be enough to receive
`most of the packets, but some will always arrive too
`late to be played back, and can be considered ’lost’.
`Furthermore, the network itself may lose packets. In
`such situations, the speech ’stream’ must be repaired,
`and a dummy packet inserted in place of the lost one,
`so that the correct timing relationship is maintained
`between the transmitter and receiver. The presence of
`the dummy packet is usually discernible to the listen-
`er, and unfortunately, the perceptibility of the loss in-
`creases with increasing packet size, as well as with
`increasing loss rate.
`The impact of the two factors identified above,
`(delay and loss), is such that small packet sizes are re-
`quired for real-time voice links. However, the use of
`
small packets increases the overhead of packet headers,
and any processing incurred at network nodes,
`and therefore increases the likelihood of congestion
`and loss. Consequently, a trade-off exists between the
`requirements of the network, and the requirements of
`real-time voice connections.
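The trade-off can be made concrete with a rough calculation. This is a sketch under stated assumptions rather than a measurement: the 40-byte header figure (20 bytes IPv4, 8 bytes UDP, 12 bytes RTP fixed header) is an assumption introduced here, and the function names are hypothetical.

```python
HEADER_BYTES = 40  # assumption: 20 (IPv4) + 8 (UDP) + 12 (RTP fixed header)


def payload_bytes(packet_ms, bitrate_bps=64_000):
    """Bytes of coded speech in one packet (default: 64 kbps PCM)."""
    return bitrate_bps * packet_ms // 8000


def header_overhead(packet_ms, bitrate_bps=64_000, header_bytes=HEADER_BYTES):
    """Fraction of each transmitted packet that is header rather than speech."""
    payload = payload_bytes(packet_ms, bitrate_bps)
    return header_bytes / (header_bytes + payload)


def packets_per_second(packet_ms):
    """Per-packet processing load placed on network nodes."""
    return 1000 / packet_ms


if __name__ == "__main__":
    for size in (20, 40, 80):
        print(size, payload_bytes(size), round(header_overhead(size), 3), packets_per_second(size))
```

Smaller packets lower the delay and the perceptibility of each loss, but both the header fraction and the packet rate rise as the packet size falls.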
`Voice Reconstruction Techniques
`Repair methods for packet loss are known as voice
`reconstruction mechanisms. The aim is to construct a
`suitable dummy packet at the receiver, so that the loss
`is as imperceptible as possible. With compressed
`speech, voice reconstruction mechanisms not only
`have to produce a suitable fill-in packet, but also have
`to maintain the decoder tracking, since the algorithms
`transmit difference information. Voice reconstruction
`techniques can be split into two categories; receiver
`only, and combined source and channel techniques.
`
Table 1: Voice Reconstruction Techniques

Receiver-Only             Combined Source and Channel
------------------------  ---------------------------
Silence                   Embedded Coding
White Noise               Redundancy
Waveform Substitution
Sample Interpolation
`
`Receiver-only techniques are those that try to re-
`construct the missing segment of speech solely at the
`receiver, possibly from correctly received packets
`preceding that which was lost. Combined source and
`channel techniques are those that try to make the sys-
`tem robust to loss by either arranging for the transmit-
`ter to code the speech in such a way as to be robust to
`packet loss, or by transmitting extra information to
`help with reconstruction.
Receiver-Only Techniques
`The original voice reconstruction techniques were
`receiver-only, and used either silence, white noise,
`repetition of part of the last correctly received speech
`waveform, or sample interpolation as the substitute.
`Silence substitution is favoured because it is sim-
`ple to implement, and it gives adequate performance
`for small packet sizes (<16ms), and up to 1% loss [9]
`[10].
`It is well known that other methods give substan-
`tially better results than those obtained from silence
`substitution. Warren [11] investigated the human per-
`ception of speech interrupted by silence compared to
`noises, such as coughs. The results show that phone-
`mic restoration (the ability of the human brain to sub-
`consciously repair the missing segment of speech
with the correct sound) occurs for the noise situation,
`and does not occur for silence substitution.
`Experience from the MICE project has shown that
`listeners can become very frustrated with MBONE
speech. Their frustration stems from a variety of audio
problems:
`- interrupted speech due to packet loss - the speech
`sounds at first bubbly, and then individual in-
`terruptions are apparent;
`- talking into a ’dead’ channel, and the lack of au-
`tomatic gain control for listeners;
`- high levels of background noise cutting in and
`out;
`- distortion due to overloading the microphone
`when speakers shout, or mis-matched levels.
Packet loss is the most frustrating problem, and
one users cannot cure of their own accord. The frustration
with interrupted speech can be explained by
considering the linguistic construct of a sentence,
which includes a pause (of duration > a phoneme) at
the end of the sentence. Since the size of packets used
over the MBONE is often comparable to the length
of a phoneme, the interruptions in the speech flow
sometimes occur at inappropriate points, which sends
ambiguous signals to the brain, as to whether speech is
continuing or not [11].
`White noise was shown to give a subjective per-
`formance improvement over silence by Miller [12]
`when contextual information in speech was removed,
`and an intelligibility improvement by Warren [11]
`when the contextual information was present. Conse-
`quently, silence substitution is not a suitable means of
`voice reconstruction, since white noise is known to
`give improvements, and is as easy to generate as si-
`lence.
Other receiver-only voice reconstruction techniques
rely on the assumption that the speech characteristics
have not changed from a preceding segment
of speech, and use this preceding segment information
to reconstruct the missing part; a simple example
of this sort of voice reconstruction would be to repeat
the last correctly received packet. The mechanisms
fail when the packet sizes are large, and the loss rate
is high (packets are more likely to be lost in twos or
threes, than singly). A fuller explanation of existing
receiver-only techniques can be found in [13].
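The three simplest receiver-only repairs discussed in this section (silence, white noise, and repetition of the last correctly received packet) can be sketched as follows. This is an illustrative sketch only: the function name, the use of `None` to mark a lost packet, and the noise amplitude are assumptions made here, not details of any existing audio tool.

```python
import random

NOISE_AMPLITUDE = 1000  # assumed peak amplitude for the white-noise fill-in


def repair_receiver_only(packets, method="silence", samples_per_packet=160, seed=0):
    """Fill each lost packet (None) with a dummy packet so timing is preserved.

    packets: list of sample lists in play-out order, None marking a loss.
    method:  "silence", "noise" or "repetition".
    """
    if method not in ("silence", "noise", "repetition"):
        raise ValueError("unknown repair method: " + method)
    rng = random.Random(seed)
    out = []
    last_good = None
    for pkt in packets:
        if pkt is not None:
            last_good = pkt
            out.append(list(pkt))
        elif method == "repetition" and last_good is not None:
            # Assumes speech characteristics have not changed since last_good.
            out.append(list(last_good))
        elif method == "noise":
            out.append([rng.randint(-NOISE_AMPLITUDE, NOISE_AMPLITUDE)
                        for _ in range(samples_per_packet)])
        else:
            # Silence, also used when there is nothing to repeat yet.
            out.append([0] * samples_per_packet)
    return out
```

In every case one dummy packet is emitted per lost packet, so the timing relationship between transmitter and receiver is maintained; only the content of the fill-in differs.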
Combined Source and Channel Techniques
`Combined source and channel techniques gener-
`ally show significant improvement over receiver only
`techniques. The techniques either transmit extra in-
`formation within the speech packets (to help with re-
`construction at the receiver), or alter the speech
coding algorithm and network operation (to make
the system as a whole more robust to packet loss).
`Embedded speech coding techniques used with
`adaptive differential pulse code modulation (ADPCM
`[14]), such as those by [15], [16], and code excited
`linear prediction (CELP) [17], have shown significant
performance improvements during packet loss. Embedded
speech coding techniques allow the bit-rate
to be adjusted from 40 to 32 or 24 kbps, without the
introduction of large amounts of noise; essentially the
feed-back loops in the encoder and decoder operate at
a lower resolution than usual. The standard was designed
to ease the problem of packet loss in packet
networks; the codewords are segmented into high and
low priority bits, and then placed in different packets.
The mechanism relies on arranging for the network to
drop packets containing LSBs only, which means that
the mechanism is not applicable to networks which do
not provide this support, such as today’s Internet.
`Lara-Barron [15] investigated embedded AD-
`PCM coding techniques at 16-32 ms, and reported
`success for up to 40% loss (no reduction in speech
`quality for up to 6% loss).
`The significant improvement resulting from the
`use of this mechanism is mostly due to the preserva-
`tion of the decoder adaption logic [16].
`Speech Quality
`Speech quality may be assessed by either subjec-
`tive or objective means, although it is well known that
`subjective assessment methods provide more accu-
`rate results.
Subjective assessment is usually made by performing
listening tests using a large number of subjects.
The material used, and the measurements made,
`depend upon the likely degree of distortion expected.
`Toll quality speech coding algorithms are usually
`assessed by mean opinion scores (MOS) [18], where
`encoding distortion and noise are the likely type of
`degradation suffered. The technique involves the lis-
`tener making a category rating after listening to a pas-
`sage of speech.
`Synthetic quality speech coding algorithms result
`in speech that has far greater degradation than found
in toll quality systems; intelligibility is usually only
`adequate at best. Consequently, the MOS method is
`not suitable, and communications quality systems are
`assessed using comprehension or intelligibility tests.
`There is a wide range of speech material available,
ranging from a sequence of syllables (the listeners
`transcribe what they hear) to passages (the compre-
`hension of which is ascertained by asking a series of
`questions) [19].
`The speech material is chosen based on the re-
`quired sensitivity of the results, desired experimental
`control, and range of human faculties included in the
`test.
`Reliable Audio for Use over the Internet
`We have developed a new voice reconstruction
`scheme that uses redundancy to improve voice recon-
`struction at the receiver.
`The redundant information is the output of a syn-
`thetic quality speech coding algorithm (LPC), which
`is very low bit-rate (4.8kbps). LPC is generally con-
`sidered to contain about 60% of the information con-
`tent of the speech signal, as the overall shape of the
`frequency spectrum is preserved at the expense of
`short-term amplitude and pitch variations. This tech-
`nique is exactly what is required for successful voice
`reconstruction; the gap will be filled with a sound that
`is expected, and phonemic restoration should im-
`prove the situation further.
`The Characteristics of the MBONE / Internet
The Internet, and its multicast overlay (MBONE),
is a unique ’shared’ packet network, that offers scalable
multi-way communication. Such a network has
traditionally not been considered suitable for speech
applications, because of the large end-to-end delay
that is commonly experienced over the network, and
the potentially high probability of packet loss (with
relatively large segments of speech being lost). Current
audio tools such as vat [20], and nevot [21] have,
however, demonstrated that these problems are not
prohibitive to successful voice communications,
since use of these tools is very widespread.
The Internet provides variable length packets, a
feature which has the potential for fine-grained control
over the trade-off between network and speech
performance requirements. The ’per-packet’, rather
than ’size-of-packet’ network penalty for small packets
coupled with the ability to have variable length
packets also means that the state information from the
speech coding algorithms can be transmitted in the
packet, which substantially improves the perception
of packet loss. Current audio tools transmit coding algorithm
state information in each packet, but replace
lost packets with silence.
The Loss Characteristics of the MBONE
`Current research by a MICE partner, Bolot [22], is
investigating the number of consecutive losses found
`over the MBONE. The results show that for light and
`intermediate loads, losses are essentially non-consec-
`utive for an audio stream, and for heavy loads, the be-
`haviour is similar, but consecutive losses are more
`prevalent.
`These results suggest that a model where the re-
`dundancy is positioned in the packet after speech
`from the primary coding algorithm is suitable for light
`and medium network loads, and a model with the re-
`dundancy positioned a number of packets later is suit-
`able for heavy loads.
`Voice Reconstruction for the MBONE
LPC as the redundant information adds only a
small amount of overhead to an RTP [23] packet (12
bytes per 160 bytes of PCM, or per 80 bytes of ADPCM).
The information is piggy-backed to the packet
following that containing the primary speech codewords,
so that the loss of an individual packet can be repaired
using the redundant information in the
following packet. This mechanism is unique to packet
networks, and is only feasible because of the reconstruction
delay introduced at the receiver.
`The use of this redundancy technique means an in-
`crease in the reconstruction delay by the time equiva-
`lent of the distance of the redundancy component
`after the primary component; this implies an extra de-
`lay of one packet for light and medium loading con-
`ditions.
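The piggy-backing scheme can be sketched as follows. This is a minimal illustration and not the MICE implementation: the `AudioPacket` layout, the function names and the `offset` parameter are assumptions introduced here, and the speech payloads are treated as opaque bytes, with the actual PCM/ADPCM and LPC coding left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AudioPacket:
    seq: int
    primary: bytes                 # e.g. 160 bytes of PCM for 20ms of speech
    redundant_seq: Optional[int]   # which earlier packet the redundancy describes
    redundant: Optional[bytes]     # e.g. 12 bytes of LPC for that earlier packet


def packetise(primary_frames: List[bytes], lpc_frames: List[bytes],
              offset: int = 1) -> List[AudioPacket]:
    """Attach the LPC frame for packet (seq - offset) to packet seq."""
    packets = []
    for seq, primary in enumerate(primary_frames):
        src = seq - offset
        if src >= 0:
            packets.append(AudioPacket(seq, primary, src, lpc_frames[src]))
        else:
            packets.append(AudioPacket(seq, primary, None, None))
    return packets


def reconstruct(received: Dict[int, AudioPacket], total: int) -> List[Tuple[str, bytes]]:
    """Choose, per play-out slot, primary speech, LPC redundancy, or nothing."""
    redundancy = {p.redundant_seq: p.redundant
                  for p in received.values() if p.redundant_seq is not None}
    out = []
    for seq in range(total):
        if seq in received:
            out.append(("primary", received[seq].primary))
        elif seq in redundancy:
            out.append(("lpc", redundancy[seq]))
        else:
            out.append(("missing", b""))  # fall back to a receiver-only repair
    return out


def bandwidth_overhead(redundant_bytes: int = 12, primary_bytes: int = 160) -> float:
    """Redundancy as a fraction of the primary payload."""
    return redundant_bytes / primary_bytes
```

With `offset=1` a single isolated loss is repaired from the following packet, at the cost of one extra packet of reconstruction delay; a larger `offset` tolerates short bursts of consecutive loss at the cost of more delay. The overhead works out at 7.5% of a 160-byte PCM payload and 15% of an 80-byte ADPCM payload.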
`Multiple multicast receivers in a single conference
`may experience a variety of the characteristics report-
`ed in [22]. Consequently, the reconstruction mecha-
`nism may occasionally have more than one instance
`of the redundancy after the primary coding scheme
`packet. In this way, the heavy loading characteristics
`seen by one site do not affect the performance of the
`majority.
`
Figure 1: The Positioning of Redundancy
Relative to the Primary Coding for
a) low and b) higher loss rates
(legend: Redundancy; Primary Speech Coding)
`
`The provision of LPC redundant information for
`use in voice reconstruction is intended to be used with
`per-packet state information; this prevents decoder
`mistracking in the case of loss. When a packet has
`been lost, the receiver decodes the redundant infor-
`mation, and feeds the samples to the audio hardware.
`Consequently, the output speech waveform consists
`of periods of toll quality speech, interspersed with pe-
`riods of synthetic quality speech.
`While LPC is a fairly complex speech coding al-
`gorithm, it should be noted that linear predictive anal-
`ysis and synthesis are an essential part of all new
`higher compression schemes: GSM and CELP both
`use these techniques as a first step in their algorithms.
`LPC also has the potential to be used elsewhere in the
`system; as an improvement to the silence detection
`function, which is an integral part of most packet
`speech systems.
`Experimental Design
Voice reconstruction experiments to date have
usually been performed with packet sizes of 16-32ms,
or less. The packet sizes used over the MBONE are
usually greater than these values (40ms is commonly
used). Consequently, little information exists about
the degradation commonly experienced in voice
connections over the MBONE/Internet. This experiment
was therefore designed to compare LPC redundancy
with waveform substitution and silence substitution;
receiver-only techniques, such as waveform substitution,
are cheap to implement, and could potentially be
used instead of LPC under certain circumstances. The
experiment was also designed to try and give a broad
outlook on the question of the degradation experienced
as a result of packet loss over the Internet, by
considering a wide range of loss rates (0-40%).
Waveform substitution was chosen as a representative
receiver-only method, and the simplest of these,
packet repetition, was considered suitable for comparison.
It was assumed that LPC redundancy could always
be received in the event of loss - this assumption
does not hold true in the real world, but provides a
valid basis for the experiment design.
Waveform substitution repeats the last correctly
received packet until the period of loss ends. This
mechanism has an inherent draw-back when the lost
packet is the last in a talk-spurt; the last packet will be
repeated until a new talk-spurt starts.
The solution to this problem lies in realising that
there should be a limit on the length of ’waveform
substituted’ speech, before the underlying assumption
of no change in the speech characteristics breaks
down; a suitable figure for this is 80ms, or the average
length of a phoneme. Consequently, when the packet
size is 20ms, the last correctly received packet may be
repeated three times. When the packet size is 40ms,
the packet may be repeated twice (120ms). When the
packet size is 80ms, the packet may be repeated once
(160ms).
The speech was recorded and manipulated using a
Sun SPARCstation 10, the OGI speech tools software [24],
which is a system that was developed for
speech recognition experiments, and software written
by the author (to generate the loss, code and decode
the speech using different algorithms etc.). The codecs
used in these experiments are publicly available
versions [ADPCM][LPC], with ADPCM conforming
to the CCITT standard [25]. The loss was generated
randomly.
The subjective quality was assessed using phonetically
balanced (PB) words, which proportionally
represent the sounds found in every-day English
speech.
There were three groups of seven subjects: those
who heard the PB word lists reconstructed by silence
substitution, those who heard waveform substitution,
and those who heard LPC reconstruction. Each subject
in each group had ten different lists, nine of which
were test conditions, and the first of which was a no
loss control condition. The lists were 25 words long,
and after recording answers for each list, the subjects
completed a MOS rating scale, rating the quality of
the speech just heard on a five-point scale, from bad
to excellent.
A detailed description of the experimental details
can be found in [13] and [26].

Results
The interaction between reconstruction scheme
and packet size is difficult to analyse, since other factors,
such as the temporal characteristics of speech
sounds, and the masking and temporal order perception
capabilities of the ear also affect human perception.

Silence Substitution
The first point to consider is whether the results
from these experiments give an insight into the
speech perception from current audio tools.

Figure 2: Voice Reconstruction for 20/40ms Packets
(% Articulation against % Loss; legend: Silence,
Repetition, LPC Repair, ADPCM No Loss)

Referring to Figure 2 for 20 and 40ms packet sizes,
it can be seen that using silence substitution as the voice
reconstruction scheme fails between 15 and 20% loss.
For 80ms packets (see figure 3), the results suggest
that intelligibility is inadequate at a 15% loss rate.
This observation is consistent with experiences
obtained from Project MICE. The observation is also
in keeping with findings reported by [12] and [27]:
’speech degradation as high as 50% can be tolerated
(intelligibility of 80%) if the packet size is small
(0.019s); on the other hand, if packets are long
(0.25s), and the loss probability is high, intelligibility
decreases to very low values (10%)’.
`Waveform Substitution
`The results for waveform substitution indicate
`that, for a packet size of 20ms, intelligibility does not
`decrease significantly with loss rate over the range of
measurements taken. When the packet size is 40 ms,
`however, a significant difference is found between
`30% loss and 40% loss, whilst with 80 ms packets sig-
`nificant differences can be found at much lower levels
`of loss: performance drops between 15 and 20%, and
`between 20 and 30%.
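The bounded packet repetition used for the waveform substitution condition in the experimental design can be sketched as follows. This is an illustrative sketch, not the software used in the experiments: the function name and the use of `None` for a lost packet are assumptions, and the repeat limits are simply the values stated for each packet size (three repeats at 20ms, two at 40ms, one at 80ms).

```python
# Repeats permitted per packet size (ms), taken from the experimental design.
MAX_REPEATS = {20: 3, 40: 2, 80: 1}


def repair_by_repetition(packets, packet_ms):
    """Repeat the last good packet over a loss, up to the limit for packet_ms.

    packets: play-out order list, None marking a lost packet.
    Slots left as None in the result are beyond the repeat limit (or precede
    any good packet); the caller would fill those with silence.
    """
    limit = MAX_REPEATS[packet_ms]
    out = []
    last_good = None
    repeats = 0
    for pkt in packets:
        if pkt is not None:
            last_good, repeats = pkt, 0
            out.append(pkt)
        elif last_good is not None and repeats < limit:
            repeats += 1
            out.append(last_good)
        else:
            out.append(None)
    return out
```

Bounding the repeats prevents the last packet of a talk-spurt from being replayed until the next talk-spurt starts.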
Comparison of Silence Substitution and
Waveform Substitution
A comparison between the two receiver-only
voice reconstruction schemes was made for different
packet sizes. The results show that waveform substitution
is better than silence substitution for packet sizes
of 20 and 40ms, but that the advantage is not
present for packet sizes of 80ms. The highest advantage
was obtained for packet sizes of 20ms.
The reduction in advantage of the voice reconstruction
scheme for 40ms (compared to 20ms) might be
due to the practice within the voice reconstruction
scheme of allowing a 40ms packet to be repeated up
to twice, which implies an assumption that speech
characteristics have not changed for 80ms. This assumption
is unlikely to be valid all of the time, since
the average length of a phoneme is 80ms. Better results
might be obtained by restricting repetition to one
40ms interval only.
LPC Redundancy
Referring to Figure 2, it can be seen that for packet
sizes of 20 and 40ms, as loss rate increases, intelligibility
does not deteriorate much. Slight deterioration
in intelligibility is only present for packet sizes of
80ms, and then only at high loss rates.
Comparison of LPC and Waveform Substitution
For 20ms and 40ms packets, the results show that
there is little advantage in using LPC instead of waveform
substitution.
For 80ms packets, LPC should always be used.
MOS Results
The MOS results show the same trends as the intelligibility
results. However, the listeners’ preference
for LPC redundancy instead of waveform substitution
for 40ms packets was more marked than is shown in
the intelligibility graph (Figure 2).

Conclusions and Further Work
This paper has described a voice reconstruction
method for use in packet networks. The mechanism is
suitable for both unicast and multicast connections
under all types of network conditions, although the
main intelligibility advantage is to be found with
large packet sizes, and medium to high loss rates; a
receiver-only voice reconstruction mechanism is suitable
for light loss rates and small packet sizes.
Listeners, however, prefer LPC reconstruction for
packet sizes of 40 and 80ms.
The mechanism may have minimal overhead in
processor utilisation, since LPC analysis is part of the
more complex coding algorithms, and may be used in
the future to enhance other aspects of an audio tool.
The method also only adds a small amount of overhead
to the bandwidth used in a voice conference.
Since the overhead in terms of bandwidth is small,
it may be desirable to have multiple copies of the
redundancy present. This rationale would enable an audio
tool to provide multi-cast international
conferences with a voice reconstruction mechanism
that scales; receivers at the end of ’good’ network
branches would have good performance with imperceptible
loss, while receivers at the end of ’poor’ network
branches would experience a small reduction in
speech quality, but would have a larger end-to-end
delay.
The paper describes formal speech quality tests on
the human perception of speech at different packet
loss rates. The results show that the perceptibility of
packet loss need no longer be regarded as one of the
main constraints on the packet size if LPC redundancy
is used to aid voice reconstruction.
In particular, the voice reconstruction should take
the following form:
At low loss rates (20% and below) and small
packet sizes (20 and 40ms) the results show that a
receiver-only technique is suitable, although LPC redundancy
should also be used for 40ms packets, since
listeners preferred it to waveform substitution.
At higher loss rates and for all conditions when
packet size is 80ms, LPC redundancy should be used.
The MICE project is currently implementing a
prototype audio tool, which will include voice reconstruction
using LPC redundancy. Techniques to provide
controlled feed-back from multiple receivers in
multi-cast conferences are also being developed,
which will enable the use of redundancy to be adaptive
to receivers’ needs. Further studies on human
perception of audio and network performance are
planned, which will investigate the limits on packet
size; the study will include an analysis of the impact
of delay, as well as the human perception of large
packet loss. The human perception studies will also
address the relationship between objective measures
(such as network performance results), and subjective
perception and performance.
`Acknowledgements
`The ideas relating to using LPC for redundancy
`have been discussed at length in the technical meet-
`ings of the MICE (Multimedia Integrated Conferenc-
`ing for Europe - ESPRIT 7606 Project). In particular,
`we would like to acknowledge the collaboration with
`Christian Huitema and Jean Bolot from INRIA
`Sophia Antipolis, France, on this issue. Very special
thanks are due to Van Jacobson (Lawrence Berkeley
`Labs), the creator of vat, for many fruitful discus-
`sions and email exchanges on audio problems and
`how to cure them.
`
`References
[1] Kirstein P.T., Sasse M.A., Handley M.J., ’Recent
Activities in the MICE Conferencing Project’, Paper
No. 166, INET 95.
[2] Buckett J., Campbell I., Watson T.J., Sasse M.A.,
Hardman V.J., Watson A., ’ReLaTe: Remote Language
Teaching over SuperJANET’, Proceedings
of UKERNA 95, Networkshop, March 1995.
[3] Sasse M.A. et al., ’Remote Seminars Through
Multimedia Conferencing; Experiences from the
MICE Project’, Proceedings of INET94/JENC5,
pp. 251-258.
[4] Papamichalis P.E., ’Practical Approaches to
Speech Coding’, Publ. Prentice-Hall 1987.
[5] Gold B., ’Digital Speech Networks’, Proceedings
of the IEEE, Vol. 65, No. 12,
