`Z.-Y. Shae
`
`M. H. Willebeek-LeMair
`
`IBM Research Division
`T. J. Watson Research Center
`Yorktown Heights, NY 10598, USA.
`mwlm, zshae@watson.ibm.com
`
`Abstract
Videoconferencing has become one of the major applications driving new computing and communication technologies. Solutions for the circuit-switched and packet-based networks differ considerably. The H.320 ITU Standard has defined a solution for videoconferencing over N-ISDN. This standard advocates a centralized approach to multiparty conferencing based on a multipoint control unit (MCU). Various aspects of the H.320 standard are clearly reusable in packet-switched environments while others are not. In this paper we examine the differences between a centralized MCU multiparty conference approach and a distributed multiconference approach which leverages some of the capabilities of packet-based networks. Each approach has distinct advantages and disadvantages. We discuss these and present a distributed Multimedia Multiparty Teleconferencing (MMT) system we have implemented.
1 Introduction
It is evident that videoconferencing has become one of the major applications driving new computing and communication technologies. The videoconferencing arena, however, is extremely diverse, and interoperability issues will play a major role in determining the rate at which this application evolves as a business. Interoperability issues are driven by the range of quality/performance requirements placed on videoconferencing systems based on different application scenarios as well as communication network connectivity. Application quality requirements range from talking-head video on one- to two-inch screens for personal communication devices, to full-motion video on full screens for telemedicine- and telebanking-type applications. However, it is desirable that wireless hand-held videoconferencing devices interoperate with sophisticated room-based systems.
Videoconferencing solutions are currently evolving from several directions. On one side are the circuit-switched solutions (e.g., Narrowband ISDN or Switched-56 kbps phone lines), which, being motivated by the telephony industry, can be likened to it. On the other side are the packet-based network solutions (e.g., Ethernet and Token Ring legacy LANs), which endeavor to carry real-time traffic over existing computer communications networks. The advent of ATM [11] (Asynchronous Transfer Mode) might eventually allow these two approaches to converge.
Solutions for the circuit-switched and packet-based networks differ considerably. These network technologies differ in terms of bandwidth, connection-oriented vs. connectionless communication paradigms, jitter, latency, packet-loss (error) characteristics, etc. As a consequence, the solutions for videoconferencing are adapted to exploit the capabilities and characteristics of each network. These differences include the encoder/decoder (CODEC) technology for video compression and decompression, the methods used to guarantee network performance, and the provisions within the end-stations to handle the real-time traffic.
The only true videoconferencing standard which exists today is the International Telecommunications Union H.320. The H.320 Standard consists of several components:

• H.261 [8], the video codec for audiovisual services at p×64 kbit/s.
• G.711, G.722, G.723 (optional), the audio codecs for audiovisual services at p×64 kbit/s.
• H.221 [7], the frame structure for a 64 to 1920 kbit/s channel in audiovisual teleservices.
• H.242 [3], the system for establishing communication between audiovisual terminals using digital channels up to 2 Mbit/s.
• H.230 [4], the frame-synchronous control and indication signals for audiovisual systems.
• H.231 [5], multipoint control units for audiovisual systems using digital channels up to 2 Mbit/s.
• T.120 [6], data protocols for multimedia conferencing.

The H.320 videoconferencing standard was specifically tailored to the N-ISDN circuit-switched environment. The majority of installed room-based systems conform to this standard.
`
`0-8186-7125-4/95 $04.00 0 1995 IEEE
`
`85
`
`CISCO Exhibit 1004, pg. 1
`
`
`
The ITU Standards have now begun addressing videoconferencing in different network environments, such as H.324 for POTS, and the H.310, H.32Y, and H.32Z suite for LANs and ATM. H.310 is a multimedia terminal recommendation for use over ATM networks and uses the equivalent ISO MPEG-2 standards for video (ITU-T H.262) and systems (ITU-T H.222.0). H.32Y is the terminal recommendation for the adaptation of H.320 for use over ATM networks using AAL 1, the circuit emulation adaptation layer, and H.32Z is the recommendation for H.320 over LANs.
`In all of these emerging standards, interoperability
`with the H.320 standard is mandatory. This inter-
`operability, however, may be achieved in many ways.
`The various alternatives differ in the degree to which
`they embrace the H.320 model. Various aspects of
`the H.320 standard are clearly reusable in these other
environments while others are not. For example, the audio and video compression algorithms, or variations of them, are being adopted by most of the evolving standards; however, different multiplexing schemes are being developed to better suit each network.
One aspect of the H.320 standard whose applicability to these other network environments is less clear is its centralized approach to multiparty conferencing. The H.320 standard defines a central con-
`ference server called a Multipoint Control Unit (MCU)
`to enable multiparty calls. Each participant in the call
`connects directly to the MCU, which then controls
`the conference. This paradigm is clearly suited to the
`point-to-point connectivity nature of ISDN networks.
`However, it is not the obvious choice for shared LANs
`or packet switched networks which support multicast
`capabilities.
`In this paper we examine the differences between a
`centralized MCU multiparty conference approach and
`a distributed multiconference approach which lever-
`ages some of the capabilities of the packet-based net-
`works. Each approach has distinct advantages and
`disadvantages. We describe these and draw some con-
`clusions as to the relative merits of each in the respec-
`tive network environments.
`
2 The Centralized Multiparty Conferencing Approach
`In a centralized approach, all conference streams
`are transmitted to a central server which then distrib-
`utes composed and/or selected streams to the vari-
`ous participants. In the ITU H.320 terminology, this
`central server is known as a Multipoint Control Unit
(MCU) [5]. The functions performed by an MCU are
`many and range in complexity. Basic MCU designs do
`little more than select a video stream from multiple
`incoming streams to be distributed to all conference
`participants. More complex designs perform a com-
`position of multiple video streams before distributing
`the video stream to various participants.
`To illustrate the differences between various MCU
`capabilities along with the tradeoffs between the com-
`plexity of these functions, the four MCU configura-
`tions shown in Fig. 1 are discussed.
`
Select and Broadcast. The configuration in Fig. 1(a) is a simple design which selects one of multiple incoming streams to be broadcast to all participants. The criteria used to select the video stream to be distributed may vary. In a user controlled system, a designated user (the conference manager) may send control information to the MCU to select the video source. Automatic systems may select the video source which corresponds to the active voice signal. The sensitivity of the selector must be adjusted to prevent unintelligible thrashing between different sources. Synchronization is required when switching between sources to ensure that the switching occurs at frame boundaries and, in some cases, that a constant bit rate is maintained.
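
To make the selection rule concrete, the following Python sketch shows one way such an automatic, voice-activated selector might operate; the hold time and silence threshold are illustrative assumptions, not values defined by the standard.

    import time

    class VoiceActivatedSelector:
        """Sketch of an automatic video-source selector for a
        select-and-broadcast MCU. All thresholds are illustrative."""

        def __init__(self, hold_time_s=2.0, level_threshold=0.1):
            self.hold_time_s = hold_time_s          # minimum dwell on one source
            self.level_threshold = level_threshold  # below this, treat as silence
            self.current = None
            self.last_switch = 0.0

        def update(self, audio_levels, now=None):
            """audio_levels: dict mapping participant id -> short-term
            audio energy. Returns the participant whose video to broadcast."""
            now = time.monotonic() if now is None else now
            speaker, level = max(audio_levels.items(), key=lambda kv: kv[1])
            if level < self.level_threshold:
                return self.current              # everyone silent: keep source
            if speaker != self.current and now - self.last_switch >= self.hold_time_s:
                self.current = speaker           # switch at most once per hold time
                self.last_switch = now
            return self.current

A real MCU would additionally align each switch to a frame boundary, as noted above.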
`
Destination Select. The configuration in Fig. 1(b)
`is slightly more sophisticated in that each recipient
`may independently select which video stream to re-
`ceive. This requires a switch within the MCU with
`multicast capability. Signaling between each destina-
`tion and the MCU is required to implement the se-
`lection. As with the Select and Broadcast configura-
`tion, switching between video streams must be syn-
`chronized to frame boundaries.
`
Compressed Composition. The configuration in Fig. 1(c) performs a composition of the incoming video streams in the compressed domain. Techniques for performing this composition of compressed video are presented in [12]. The resulting video stream bandwidth may be a multiple of the incoming bandwidth. For example, if three incoming streams of bandwidth p are composed into a single stream, where there is no overlap of the various sources in the composite, the resulting stream will have an average bandwidth of 3p. Hence, each station's incoming and outgoing link bandwidths are asymmetric.

Some video encoding algorithms consist of different types of frames (e.g., MPEG), where some are reference frames, which are encoded independently of any other frames, and others are predicted frames, which are encoded based on reference frames. The decoding engine treats these frame types differently, and cannot decode a predicted frame without the reference frames to which it applies. For these video streams, in order for the receiving decoder to be able to handle the composite stream as a single stream, the reference frames of each source need to be synchronized so that they are mixed into the same composite frame, and switching should occur at reference frame boundaries. The H.261 video coding algorithm does not have this complication since the decoding dependency is reduced to the macroblock level (different macroblocks in a frame may be either of a predicted type or a reference type).
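
The bandwidth asymmetry is simple arithmetic; the short calculation below restates the 3p example (the 128 kbit/s per-stream rate is an arbitrary illustrative value):

    def composite_bandwidth(per_stream_kbps, num_sources):
        # Compressed-domain composition with no overlap: the composite
        # carries every source's coded bits, so the rates simply add.
        return per_stream_kbps * num_sources

    p = 128                               # illustrative per-source rate (kbit/s)
    print(composite_bandwidth(p, 3))      # 384 kbit/s: the 3p example above
    # A per-recipient customized composite in an N-party conference
    # would further require up to N distinct composite streams.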
`
Uncompressed Composition. The final configuration, shown in Fig. 1(d), also performs a composition of the incoming video streams, but does so in the pixel domain. This requires that each incoming stream be decoded and that each outgoing stream be encoded. Separate encoders for each outgoing stream are depicted in the figure due to clock synchronization requirements between the sender and receiver. If the synchronization requirement can be solved otherwise, a single encoder might suffice. By composing the incoming streams in the pixel domain, the resulting stream bandwidth can be adjusted to match different link speeds.

Figure 1: Multipoint Control Unit Configurations. (a) Single video select and broadcast. (b) Destination video select. (c) Video composition in the compressed domain. (d) Video composition in the pixel domain.
`Another level of complexity may be added to the
`composition configurations by allowing each destina-
`tion to select its own composition.
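
As an illustration of the pixel-domain composition step itself, the NumPy sketch below tiles four decoded QCIF luminance frames (144x176 pixels) into one CIF frame (288x352); the decode stage before it and the per-link re-encode stage after it are omitted.

    import numpy as np

    QCIF = (144, 176)   # luminance rows x cols
    CIF = (288, 352)

    def compose_2x2(frames):
        """Tile four decoded QCIF luminance frames into one CIF frame.
        frames: list of four uint8 arrays of shape (144, 176)."""
        assert len(frames) == 4 and all(f.shape == QCIF for f in frames)
        top = np.hstack(frames[0:2])
        bottom = np.hstack(frames[2:4])
        return np.vstack([top, bottom])        # shape (288, 352)

    # Usage: decode each incoming stream, compose, then re-encode.
    frames = [np.zeros(QCIF, dtype=np.uint8) for _ in range(4)]
    composite = compose_2x2(frames)
    assert composite.shape == CIF

Because the composition operates on decoded pixels, the re-encode step can target any output rate, which is the bandwidth-matching property noted above.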
`The centralized approach is advantageous since the
`complexity of handling multiple streams can be con-
`fined within this single shared device and the network
`bandwidth requirements can potentially be reduced.
`However, it reduces the flexibility of the end-station
`control, and may increase the latency of the real-time
`traffic which is forced to traverse an intermediate hop
`between source and destination. Furthermore, a pri-
`mary shortcoming of the centralized scheme is that it
`does not scale well. If one envisions large conferences
`(e.g., lectures) with thousands of users, it is not feasi-
`ble for each participant to connect into a central server
`or MCU.
3 The Distributed Multiparty Conferencing Approach
`The centralized MCU approach is clearly suited to
`the N-ISDN environment for which it was proposed.
`In that environment, all connections are dedicated
`point-to-point connections where a single video and
`audio stream are carried over each connection. In a
`packet or cell-based network multiple virtual connec-
`tions may be multiplexed over a given physical link.
`Furthermore, routers and switches in the network are
`capable of multicasting a packet or cell to multiple
`outgoing links. These two features, namely the multi-
`plexing and the multicasting, may be leveraged to of-
`fer a distributed multipoint conferencing solution with
advantages over a centralized approach. Hence, in a distributed multiway conference approach some of the MCU functions may be performed by the network while others can be performed within the end-stations.

The MCU component in the centralized approach performs several key functions to enable a multipoint conference. These functions can be categorized as control, multimedia distribution, and audio/video data manipulation. Each of these functions is needed in a distributed environment as well, but where and how each of them is performed can vary. To better understand how some of the MCU functions can be performed in a distributed fashion within the network itself, we will examine the functions in more detail. We will then explore how the multicast capability in the Internet and ATM environments can be exploited to support a multipoint conference.
`
`Control. The control function includes such things
`as i) call setup which involves participant identifica-
`tion, authorization and capability exchange, ii) adding
`and dropping participants, and iii) call monitoring for
`floor control and to select video and audio for distrib-
`ution.
`
`Distribution. This function involves the distribu-
`tion of audio, video, and data from each participant
`to possibly no, some, or all other participants in the
`conference.
`
`
`Stream Manipulation. In general, this involves
`such things as audio mixing, video composition, and
`reformatting (transcoding) of any of these streams
`to allow interoperability between incompatible end-
`stations. Some of these functions are beyond the scope
`of the H.231 standard, but can easily be envisioned as
`MCU extensions for heterogeneous environments.
3.1 Multicast Support in the Internet and ATM Environments
`Network multicast capability is essential for effec-
`tive distributed implementations of the MCU func-
`tions described above. Hence, in order to explore how
`these MCU functions can be implemented in a distrib-
`uted manner we first examine the multicast capabili-
`ties of the Internet and ATM network environments.
`Approaches for multicast support in the Internet
`and ATM environments, although originally quite dif-
`ferent, seem to be converging. Two very distinct
multicast paradigms are sender versus receiver initiated tree constructions. In general, the receiver initiated approaches, first advocated by the IETF (e.g., DVMRP [13], MOSPF [14], CBT [15]), are better suited for densely populated groups, while sender initiated approaches, first adopted by the ATM Forum [17], are better suited to smaller, sparse groups.
In the receiver initiated approach described in [13, 14, 15], a conference is established by obtaining¹ a multicast address and advertising the time of the call, along with the address, to all participants. Connectivity is established amongst participants in the conference by creating a source-specific bi-directional multicast tree between all multicast routers using a broadcast. End-stations send connect messages to the nearest multicast router to subscribe to a particular multicast address. A router is pruned from the tree only if no end-stations have registered with the router to connect to the conference multicast address. Every so often (after a predefined timeout period) all routers are reconnected via a broadcast and then pruned again depending on each router's end-station registration.
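
The following toy Python model sketches this periodic broadcast-and-prune cycle. It simplifies by evaluating prunes per router rather than propagating them hop by hop toward the source, and its names and structure are illustrative rather than actual DVMRP messages.

    class MulticastRouter:
        """Toy model of flood-and-prune tree maintenance."""

        def __init__(self, name):
            self.name = name
            self.subscribers = set()   # end-stations registered for the group
            self.on_tree = False

        def register(self, station):   # end-station subscribes to the address
            self.subscribers.add(station)

        def unregister(self, station):
            self.subscribers.discard(station)

    def rebroadcast_and_prune(routers):
        """Periodic cycle: reconnect every router, then prune those
        with no registered end-stations for the conference address."""
        for r in routers:
            r.on_tree = True           # the broadcast reattaches everyone
        for r in routers:
            if not r.subscribers:
                r.on_tree = False      # prune routers with no listeners

    routers = [MulticastRouter(n) for n in "ABC"]
    routers[0].register("station-1")
    rebroadcast_and_prune(routers)
    print([r.name for r in routers if r.on_tree])   # ['A']
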
`This receiver initiated multicast solution is ideal
`for large and densely connected multiparty conference
`models. It scales well to support many users, incurs
`no additional overhead for adding or deleting partic-
`ipants, and is robust in the event of failed routers or
`participating end-stations. It is not well suited, how-
`ever, to sparsely connected end-stations and offers lit-
`tle, if any, means of controlling participation.
The PIM architecture has been proposed [16] to address the case of sparse groups as well as densely populated groups. It supports two modes of operation, PIM-SM and PIM-DM, for sparse mode and dense mode, respectively.
`
¹No formal mechanism is currently defined for obtaining a unique multicast address.
`
`88
`
`CISCO Exhibit 1004, pg. 4
`
`
`
A sender initiated approach, first adopted by the ATM Forum [17], defines both shared (bi-directional) and dedicated (unidirectional) multicast trees, with the sender at the root and the recipients at the leaves. In this model a station must sequentially add each station it wishes to communicate with to its multicast tree. In the case of dedicated trees, each station wishing to send information must establish a multicast tree to all other participants in the conference. Here, in order for the Nth station to leave or join a conference, N trees must be reconstructed.

This sender initiated approach is well suited to small group conferences, where participants each know about each other. It does not scale well for larger conferences since the overhead of adding and deleting participants is incurred by each end-station in the conference.
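
The membership-change cost is easy to see in a sketch. The toy model below (illustrative bookkeeping, not ATM signaling) keeps one sender-rooted dedicated tree per participant, so a single join or leave touches every tree in the conference.

    class Conference:
        """Dedicated-tree model: one sender-rooted tree per participant."""

        def __init__(self):
            self.trees = {}   # sender -> set of leaf (receiver) stations

        def join(self, station):
            for leaves in self.trees.values():
                leaves.add(station)                # added to every existing tree...
            self.trees[station] = set(self.trees)  # ...plus one new tree of its own

        def leave(self, station):
            self.trees.pop(station, None)          # tear down own tree
            for leaves in self.trees.values():
                leaves.discard(station)            # and leave every other tree

    conf = Conference()
    for s in ("a", "b", "c"):
        conf.join(s)
    # Joining as the Nth station touched all N trees, as noted above.
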
`The shortcomings of the sender initiated approach
`have motivated a recent contribution to the ATM Fo-
`rum describing a variety of alternative means for the
support of ATM multicast services [18], some of which are closely modeled after the IETF schemes.
`3.2 Distributed Solution Issues
`The ideal multicast model for a distributed multi-
`party conference would allow each receiver to dynam-
`ically select which of the other participants it chooses
`to view. This requires a flexible receiver controlled
`multicast tree formulation. A separate shared multi-
`cast tree would carry audio in order for all participants
`to be heard. Finally, another multicast tree would dis-
`tribute the shared data.
Distributed multiway conferencing schemes vary depending on the types of functionality they are designed to support. Let us consider how distributed versions of the various multiway conference capabilities enabled by the four different MCU configurations described in Fig. 1 might be implemented. Configuration (a) requires that all participants be heard, but only that a single participant's video be viewed at a time. To achieve this, a single shared multicast tree can be used to carry audio, as well as a single multicast tree for video. However, a special arbitration protocol is required to select which participant's video to distribute at a given instant. A voice driven activation scheme could be implemented for participants to dynamically activate or halt their video transmission. Additional arbitration mechanisms may be required to prevent two or more participants from transmitting video simultaneously in the event that they begin speaking simultaneously.

Configurations (b) and (c) are similar in terms of network requirements, but differ in terms of end-station complexity. In both schemes, separate multicast trees are required for each participant's video. To receive a particular video source a participant must subscribe to the appropriate multicast tree. In order to be able to dynamically switch between sources, the process of joining and leaving multicast trees needs to be efficient and fast. As far as the end-station complexity is concerned, in (b) each station only needs to be able to handle a single incoming video stream. However, in (c) each end-station must be capable of accepting multiple incoming streams and processing them accordingly to achieve a composite on the monitor.

We have separated the elements of the distributed multiway conference into the same components that form an MCU, namely, control, distribution, and stream manipulation. Each of these is discussed in detail below.
`
`3.2.1 Control
The first aspect of a distributed multiway conference model that may differ from the centralized MCU approach is the separation of control streams from multimedia streams (e.g., video, audio, and data). In fact, one solution might maintain a centralized approach for control via a designated server node in the network while implementing a distributed approach for the multimedia streams. A centralized control approach would differ little from the MCU-based control, but would require that the MCU establish appropriate multicast connections for the multimedia streams. The centralized control approach would suffer from the same scalability shortcomings as the MCU solution.

In a centralized control structure, operations such as video selection or caller acceptance can easily be handled. Performing these functions in a distributed way requires more sophisticated algorithms, particularly if they are to be done in a timely and robust manner.

An interesting topic is that of secure communication. Further exploration is required to investigate how multimedia streams may be protected and encrypted in a multiway conference. If the multiway scheme is centralized and the MCU needs to decompress the audio or video stream, then the MCU needs to be part of the secure system. If, on the other hand, the video and audio streams are passed directly between senders and receivers, then they can be sent using conventional encryption techniques.
`
`3.2.2 Distribution
There is no provision of a global clock in today's packet-switched networks. This introduces a potential clock mismatch problem between the source and destination stations of a real-time video conferencing system. When dealing with constant bit-rate streams, as the H.320 standard defines, a minor clock mismatch will eventually cause a buffer overflow or underflow.

Another consequence of packet-based networks is jitter. Playout buffers may be incorporated at the receiver to absorb jitter and adapt to clock mismatches, but these may also contribute considerably to the end-to-end delay. For example, suppose a system has a 1 Kbyte playout buffer and generates audio data at an 8 kHz sampling rate with 1 byte per sample at the source. Although the buffer size is relatively small, it already corresponds to 128 ms of delay when it builds up from empty to full.

A typical crystal oscillator has a 10^-4 frequency tolerance. Assume that the receiver has a crystal oscillator with a clock rate which is 10^-4 slower than
`
`89
`
`CISCO Exhibit 1004, pg. 5
`
`
`
the transmitter. In this case, for every second there will be 0.8 bytes more audio data generated at the transmitter than consumed at the receiver. Starting from empty, the buffer will overflow within 1280 seconds (about 22 minutes), and the end-to-end delay also increases as the buffer fills. This example indicates that in 22 minutes, the end-to-end delay will increase by 128 ms due to the clock mismatch alone. Given that the target end-to-end delay for real-time video conferencing is on the order of 300 ms, this added delay is significant. Furthermore, since a major part of the 300 ms delay budget is expected to be spent in the packet network, the end-station delay should be minimized.
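
The figures in this example follow directly from the stated parameters; the short calculation below reproduces them.

    sample_rate = 8_000        # audio samples per second
    bytes_per_sample = 1
    buffer_bytes = 1024        # the 1 Kbyte playout buffer from the example
    tolerance = 1e-4           # typical crystal oscillator frequency tolerance

    # A full buffer corresponds to 128 ms of audio, hence 128 ms of delay:
    print(buffer_bytes / (sample_rate * bytes_per_sample))    # 0.128 s

    # The receiver consumes 0.8 bytes/s less than the transmitter produces:
    drift = sample_rate * bytes_per_sample * tolerance        # 0.8 bytes/s

    # Starting empty, the buffer overflows in 1280 s, about 22 minutes:
    print(buffer_bytes / drift / 60)                          # 21.3 minutes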
`
There are various network or higher layer protocols for dealing with real-time clock issues. For example, the Internet draft Real-time Transport Protocol (RTP [20]) is proposed to provide end-to-end delivery services for data with real-time characteristics. RTP is a higher layer protocol which specifies that each packet should carry certain information, such as time stamps and sequence numbers, for use in network nodes (e.g., routers, bridges, or translators) to provide real-time services. However, RTP itself does not provide any mechanism to guarantee quality of service or real-time delivery. Another such proposal was the Internet Stream Protocol (ST-II [21]) at the same layer as IP. ST-II is designed to use pre-allocation of resources to allow packets to be forwarded with low delay and low loss due to congestion. However, these schemes do not consider the effect of clock mismatch between the source and playback.
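
For concreteness, the fixed header that RTP attaches to every packet carries exactly the kinds of fields mentioned here. The Python sketch below packs a minimal 12-byte header (payload type, sequence number, media timestamp, and a synchronization source identifier); it is a simplified illustration, not a complete RTP implementation.

    import struct

    RTP_VERSION = 2

    def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
        """Pack the 12-byte fixed RTP header (no CSRC list, no extension).
        The timestamp is in media clock units; the sequence number lets
        the receiver detect loss and reordering, as described above."""
        byte0 = RTP_VERSION << 6                 # version=2, padding=0, ext=0, CC=0
        byte1 = (marker << 7) | (payload_type & 0x7F)
        return struct.pack("!BBHII", byte0, byte1,
                           seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

    hdr = pack_rtp_header(payload_type=0,      # PCMU audio
                          seq=42,
                          timestamp=42 * 160,  # e.g., 20 ms of 8 kHz audio/packet
                          ssrc=0x1234ABCD)
    assert len(hdr) == 12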
`
The clock mismatch issue was most recently addressed in the MPEG-2 transport layer [22]. The MPEG-2 transport layer periodically inserts a Program Clock Reference (PCR) signal within the MPEG-2 stream. The decoder has to slave the local clock to the source PCR in order to achieve source-destination clock synchronization. It is noted, however, that this scheme will not work properly in a packet network environment subject to jitter, nor in a variable bit rate (VBR) encoding environment. The low end-to-end delay requirement also puts an upper bound on the size of the receiver buffer. Unfortunately, this master/slave scheme fails completely in the multiparty environment.
`
`Although not all video signals need be viewed si-
`multaneously for a videoconference to be effective, all
audio inputs must be heard by each participant. For a large or even moderate-size conference it is not practical to mix all the audio signals at each end-station.
`Hence, for a multiway conference an effective way of
`supporting a larger number of participants is through
`the use of silence detection. By using silence detec-
`tion mechanisms at each source and only transmitting
`audio during active periods, only a finite number of
`audio signals need to be mixed at any given instant.
`When more than this finite number are active simul-
`taneously, the mixed audio signal is not intelligible
`anyway.
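
A sketch of this mechanism is shown below, assuming simple energy-based silence detection at each source and an illustrative cap on how many active streams the receiver mixes; the threshold and cap values are assumptions, not MMT parameters.

    import numpy as np

    SILENCE_THRESHOLD = 500.0   # illustrative short-term energy threshold
    MAX_MIXED = 3               # mix at most this many active speakers

    def is_active(frame):
        # Source side: transmit the frame only if it is not silence.
        return float(np.mean(frame.astype(np.float64) ** 2)) > SILENCE_THRESHOLD

    def mix(active_frames, frame_len=160):
        # Receiver side: sum the loudest few streams that passed silence
        # detection, clipping back to the 16-bit sample range.
        if not active_frames:
            return np.zeros(frame_len, dtype=np.int16)
        loudest = sorted(active_frames,
                         key=lambda f: -np.mean(f.astype(np.float64) ** 2))
        mixed = np.sum([f.astype(np.int32) for f in loudest[:MAX_MIXED]], axis=0)
        return np.clip(mixed, -32768, 32767).astype(np.int16)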
`
`90
`
`3.2.3 Stream Manipulation
When composing multiple video streams in an MCU there are two options: i) Decode each incoming stream, compose the selected streams, and recompress the composite. This approach is necessary if the bandwidth of the composed stream is to equal the bandwidth of a single incoming stream. The decompression and recompression add delay to the end-to-end latency, and allowing a customized composition for each recipient is very expensive. ii) Create a composite of multiple streams in the compressed domain. This is advantageous in terms of delay, but by combining, say, 4 QCIF streams to create a CIF stream, the outgoing stream is 4 times the bandwidth of each incoming stream. Note that if these composites are customized for each recipient then there could potentially be N composite streams.

One of the attractive features of ATM is its multicast capability. Multiway conferencing is an ideal application to exploit this. Assuming N participants in the conference, we can also assume that each video recipient views only a limited number of participants at a given time. So, based on whom each participant chooses to view, multicast trees are set up between each source and those recipients which select it. The trees change whenever a recipient changes a selection. Now, the number of streams is greatly reduced and each recipient has full control over the selection of whom to view.
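
On an IP multicast network, the same receiver-driven selection maps directly onto joining and leaving group addresses. The sketch below switches a receiver's video subscription between two placeholder group addresses using standard socket options; the requirement for fast join/leave noted earlier applies directly.

    import socket
    import struct

    def membership(group_ip, join=True, iface_ip="0.0.0.0", sock=None):
        """Join or leave an IP multicast group on a UDP socket."""
        sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        mreq = struct.pack("4s4s", socket.inet_aton(group_ip),
                           socket.inet_aton(iface_ip))
        opt = socket.IP_ADD_MEMBERSHIP if join else socket.IP_DROP_MEMBERSHIP
        sock.setsockopt(socket.IPPROTO_IP, opt, mreq)
        return sock

    # Switch the video view from participant A's tree to participant B's:
    sock = membership("239.1.1.1", join=True)          # view A
    membership("239.1.1.1", join=False, sock=sock)     # stop viewing A
    membership("239.1.1.2", join=True, sock=sock)      # view B
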
4 Multimedia Multiparty Teleconferencing - a Distributed Solution
`Due to its multi-disciplinary nature, validation and
`evaluation based on a working test-bed are very impor-
`tant aspects in the research and development of a mul-
`timedia desktop conferencing system. Consequently,
`the Multimedia Networking Group at IBM T. J. Wat-
`son Research Center is engaged in the development
`of a comprehensive multimedia multiparty teleconfer-
ence (MMT [10]) prototype. One of the objectives of
`the MMT project is to investigate various aspects of
`the distributed multiparty conference model.
`4.1 MMT structure
`The current MMT prototype is a JPEG-based sys-
`tem designed for distributed multiparty conferenc-
`ing over a packet switched network. The hardware
`configuration of the MMT system (see Fig. 2) con-
`sists of: a video/audio capture and display card
`(VAC), a CODEC and DSP card (MMTCNTL), and
`an optional network communication adapter. The
VAC card is responsible for digitizing the incoming video/audio signal and for the reverse process of video/audio playback. The VAC card accepts NTSC as an input signal and generates an NTSC signal as the output.

`In the MMT system audio, video, and data packets
`are sent as separate streams. This allows receivers to
`elect to receive only an audio or data stream from a
`particular sender rather than be forced to consume all
`three. This is particularly useful for stations with no
video processing capability. Silence detection is implemented such that there is no audio packet transmission during silence periods. The VAC card is also capable of removing the annoying frame segmentation effect, known as the shearing effect, during playback.

Figure 2: MMT system block diagram with alternative communication modes: (a) Via the on-board Ethernet. (b) Via the microchannel bus through a separate network adapter.

The MMTCNTL card implements the video compression and decompression. The DSP algorithm implements audio mixing and video composition such that the end user can choose to hear or view any arbitrary conference participant(s). The DSP code also implements an algorithm for concealing the effects of clock mismatch, lip synchronization, and packet loss. There is a direct interface between the VAC and MMTCNTL adapters. This enables the video and audio data to bypass the microchannel bus and frees CPU power for data sharing or other applications. There is also an on-board 10 base-T Ethernet communication system on the MMTCNTL. This on-board 10 base-T interface enables MMT to connect directly to the LAN without the intervention of the CPU. In addition to the on-board 10 base-T interface, the compressed data can be sent up to the PC system memory via the microchannel bus. This enables the system to be configured with any IP/UDP supported communication adapter.
`4.2 Network Multicast Support
Although multicast is a well-known concept, it has not been widely implemented in existing networks. Local area networks provide a hardware multicast capability, but it is not available at the higher protocol layers. Consequently, several approaches to achieving the desired multicast capability were experimented with using the MMT system:
1. A direct interface to the ORBIT gigabit network. ORBIT is an experimental gigabit network and is one of the five national gigabit testbeds coordinated by CNRI. ORBIT implements a proprietary network multicast function. As such, each MMT station sends one multicast-type packet and receives all appropriate multicast packets from other participants.
`2. Connection to an Ethernet switch hub. This hub
`implements the Distance Vector Multicast Rout-
`ing Protocol (DVMRP). By establishing a tunnel
`between two hubs, the switch hubs are able to
`provide a multicast IP network superimposed on
`top of the Internet.
3. Packet replication over the traditional Internet. Since there is no multicast support in this environment, the MMT stations were made to simulate multicast by sending duplicate packets over point-to-point sessions to each individual destination, as sketched below.
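
A minimal sketch of this replication scheme follows, with placeholder addresses standing in for the other conference participants.

    import socket

    def send_replicated(payload, destinations, sock=None):
        """Simulate multicast on a network without multicast support by
        sending one unicast copy of the packet to each destination."""
        sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for host, port in destinations:
            sock.sendto(payload, (host, port))   # one point-to-point copy each

    # Placeholder destinations for the other conference participants:
    peers = [("192.0.2.10", 5004), ("192.0.2.11", 5004)]
    send_replicated(b"\x00" * 12 + b"media", peers)
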
`The current networking trend suggests that router
`vendors will soon be offering multicast support. This
will make general network multicast support more widely available in the near future.
4.3 Video Composition in the Compressed Domain

The network multicast capability allows each receiving station to receive video and audio packets from multiple conference participants simultaneously. The
`MMT system allows each user to individually select
`which participants to view or hear. As stated ear-
`lier, multiple audio streams are audible simultane-
`ously through the audio mixing subsystem. MMT
`also implements a technique for viewing multiple video
`streams simultaneously. This capability is typically
`implemented in the MCU. However, within a distrib-
`uted environme