THE UNIVERSITY OF TEXAS AT AUSTIN
THE GENERAL LIBRARIES
PROCEEDINGS

ACM Multimedia '95

San Francisco, California, November 5-9, 1995

Sponsored by the ACM SIG MULTIMEDIA, SIGCHI, SIGGRAPH, SIGMIS, SIGBIO, SIGCOMM, SIGIR and SIGOIS, in cooperation with ACM SIGAPP, SIGCAPH, SIGMOD and SIGOPS
RPX Exhibit 1224
RPX v. DAE

Australia/New Zealand

Addison-Wesley Publishers Pty. Ltd.
6 Byfield Street
North Ryde, N.S.W. 2113
Australia
Tel: +61 2 878 5411
Fax: +61 2 878 5830

Latin America

Addison-Wesley Iberoamericana S.A.
Boulevard de las Cataratas #3
Colonia Jardines del Pedregal
Delegacion Alvaro Obregon
01900 Mexico D.F.
Tel: +52 5 660 2695
Fax: +52 5 660 4930
Canada

Addison-Wesley Publishing (Canada) Ltd.
26 Prince Andrew Place
Don Mills, Ontario M3C 2T8
Canada
Tel: +416 447 5101
Fax: +416 443 0948
The Association for Computing Machinery, Inc.
1515 Broadway
New York, NY 10036

Nonmember orders from outside the U.S. should be addressed as noted below:
Copyright © 1995 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications Dept. ACM, Inc. Fax +1 (212) 869-0481 or <permissions@acm.org>. For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
...In Proceedings of the ACM Multimedia 1995 (San Francisco, CA, USA, November 5-9, 1995), ACM, New York, 1995, pp. 23-35.
Ordering Information

Nonmembers

Nonmember orders placed within the U.S. should be directed to:

Addison-Wesley Publishing Company
Order Department
Jacob Way
Reading, MA 01867
Tel: +1 800 447 2226

Addison-Wesley will pay postage and handling on orders accompanied by check. Credit card orders may be placed by mail or by calling the Addison-Wesley Order Department at the number above. Follow-up inquiries should be directed to the Customer Service Department at the same number. Please include the Addison-Wesley ISBN number with your order:

A-W ISBN 0-201-87774-0
Europe/Middle East

Addison-Wesley Publishing Group
Concertgebouwplein 25
1071 LM Amsterdam
The Netherlands
Tel: +31 20 6717296
Fax: +31 20 6645334
Germany/Austria/Switzerland

Addison-Wesley Verlag Deutschland GmbH
Hildachstrasse 15d
Wachsbleiche 7-12
53111 Bonn
Germany
Tel: +49 228 98 515 0
Fax: +49 228 98 515 99
United Kingdom/Africa

Addison-Wesley Publishers Ltd.
Finchampstead Road
Wokingham, Berkshire RG11 2NZ
United Kingdom
Tel: +44 734 794000
Fax: +44 734 794035

After January 1, 1996:
Addison-Wesley Longman Publishers, Ltd.
Longman House
Burnt Mill
Harlow, Essex CM20 2JE
United Kingdom
Tel: +44 1279 623 623
Fax: +44 1279 431 059
Asia

Addison-Wesley Singapore Pte. Ltd.
15 Beach Road
#05-02/09-10 Beach Centre
Singapore 0718
Tel: +65 339 7503
Fax: +65 339 9709

Japan

Addison-Wesley Publishers Japan Ltd.
Nichibo Building
1-2-2 Sarugakucho
Chiyoda-ku, Tokyo 101
Japan
Tel: +81 33 291 4581
Fax: +81 33 291 4592
ACM Members

A limited number of copies are available at the ACM member discount. Send order with payment in US dollars to:

ACM Order Department
P.O. Box 12114
Church Street Station
New York, N.Y. 10257

OR

ACM European Service Center
Avenue Marcel Thiry 204
1200 Brussels
Belgium

Credit card orders from U.S.A. and Canada:
+1 800 342 6626

New York Metropolitan Area and outside of the U.S.:
+1 212 626 0500
Fax: +1 212 944 1318
Email: acmpubs@acm.org

Please include your ACM member number and the ACM Order number with your order.

ACM Order Number: 433952
ACM ISBN: 0-89791-751-0
128 kb/s H.261 [18] streams for the MBone.
2 BACKGROUND AND MOTIVATION

In this section, we motivate the need for our video gateway architecture by giving several example scenarios.

2.1 BAGNet and MBone
One of our goals is to integrate the MBone applications into the BAGNet environment. We briefly describe the BAGNet and MBone environments, then motivate the need for the video gateway by discussing issues of running existing MBone applications across both BAGNet and the general Internet.

BAGNet is an OC-3c (155 Mb/s) ATM network that connects 15 San Francisco Bay Area sites including universities, government and industrial research labs. One goal in the BAGNet project is to develop and deploy teleseminar applications with high quality video and audio. Currently, each site has only several workstations connected directly to the ATM network, and one of these machines is used as a gateway to connect BAGNet with other machines in the organization. Typically, ethernets are used within each organization. In a teleseminar scenario, one of the machines directly attached to BAGNet will multicast high-quality, and thus high-rate, video across BAGNet. A stream of full motion JPEG
compressed NTSC quality video will consume about 6 Mb/s bandwidth. While multiple 6 Mb/s video streams can be easily supported within BAGNet, transmitting even one of these streams onto an ethernet is not practical as the ethernets are shared. In addition, it may be desirable to transmit the video across the entire MBone, where the aggregate bandwidth is only about 500 kb/s for all sessions. Therefore, we would like to accommodate three types of receivers in such an environment: hosts connecting directly to BAGNet, hosts connecting to BAGNet via ethernets and routers, and hosts on the global MBone.
Figure 1(a) shows such a scenario. Assume that host H0 is transmitting high quality JPEG video at 6 Mb/s. The ATM switch S forwards the transmission to BAGNet and to a router R0 connected to an ethernet, which contains among its hosts an MBone router R1. In order not to flood the ethernet, we run a video gateway on R0 that transcodes the 6 Mb/s JPEG down to 1 Mb/s. Thus, ethernet hosts such as H1 still receive the video transmission at a reasonably high quality. We place an additional video gateway between R1 and the MBone. This gateway further reduces the bandwidth requirements by transcoding the JPEG stream to H.261 and limits its output rate to 128 kb/s.
2.2 Linking Remote MBone Sites

Another application of our video gateway architecture is to link several remotely located MBone sites together. Consider the following example: a seminar at Berkeley is multicast locally to the campus, and a group of researchers at Carnegie Mellon University wants to tune in and participate. Unfortunately, a scenario like this, where small groups communicate via long haul networks, is not effectively supported by the current IP Multicast infrastructure.
The major mechanism for limiting the scope of multicast traffic over today's MBone is the use of TTL. The mechanism works as follows. Each multicast packet is assigned a Time-To-Live number in the IP header, and each link is assigned a static threshold value. If the packet's time-to-live value is less than the threshold, the packet is not forwarded. By placing larger thresholds on the links that connect remote sites and using smaller TTL numbers for each multicast packet, the multicast traffic can be contained in a local region.
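As a toy illustration of the threshold rule just described (the threshold values here are illustrative, not actual MBone defaults):

```python
# Sketch of TTL-based multicast scoping: a packet crosses a link only if
# its TTL meets the link's statically configured threshold.

def forward(packet_ttl: int, link_threshold: int) -> bool:
    """Return True if a multicast packet may be forwarded over a link."""
    return packet_ttl >= link_threshold

# A site-internal link with a low threshold passes local traffic, while a
# cross-country link with a high threshold blocks packets sent with a
# small TTL, containing the session in a local region.
site_link, cross_country_link = 16, 128
assert forward(packet_ttl=32, link_threshold=site_link)
assert not forward(packet_ttl=32, link_threshold=cross_country_link)
```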
In the example mentioned above, in order for researchers at CMU to receive the multicast video, the video has to be sent out from Berkeley using a TTL value that is higher than the threshold value of cross-country links, which means that the video will be distributed across the entire country. This is not satisfactory.

With our video gateway, this problem can be easily solved by placing one video gateway at each site and linking the two video gateways by unicast connections. Figure 1(b) illustrates this scenario. Multicast video from CMU is forwarded by the local video gateway via a unicast connection to the remote video gateway, which in turn multicasts the video to the remote site at UCB.
2.3 Multicast Video across ISDN Links

Most MBone transmissions use a target rate of 128 kb/s for video and 64 kb/s for audio, precluding simple bridging of MBone sessions across a 128 kb/s ISDN line. By transcoding a 128 kb/s H.261 coded video stream into a 64 kb/s H.261 stream (and by similarly adapting the audio bit rate), an MBone session can be bridged over an ISDN link, as depicted in Figure 1(c).
2.4 Multicast Video over Wireless Links

Wireless links are characterized by relatively low bandwidth and high transmission error rates. Figure 1(d) illustrates the role of the video gateway in a topology containing mobile wireless hosts. By placing a video gateway at the basestation router (BS), we can transcode the incoming video stream to a lower bandwidth stream and control the rate of output transmission over the wireless link.
3 THE VIDEO GATEWAY

We now address the design of a video gateway architecture that will flexibly support the configurations discussed in the previous section. In order to interoperate with the existing MBone video tools, the video gateway must be RTP compatible. In this section, we give a brief overview of RTP, discuss the ramifications of transcoding an RTP data stream, and propose an architecture that can perform this transcoding in a protocol-consistent fashion.

3.1 Real-time Transport Protocol

The Real-time Transport Protocol [14] is an application-level protocol that is designed to satisfy the needs of multi-party multimedia applications. In the IP protocol stack, RTP lies
just above UDP. The protocol consists of two parts: the data transfer protocol RTP and the control protocol RTCP.

Figure 1: Four example scenarios for a video gateway.
Each RTP data packet consists of an RTP packet header and the RTP payload. The packet header includes a sequence number, a media-specific timestamp, and a synchronization source (SSRC) identifier. The SSRC provides a mechanism for identifying media sources in a fashion independent of the underlying transport or network layers. End-hosts allocate their SSRC randomly and, because the SSRCs must be globally unique within an RTP session, a collision detection algorithm is employed to avoid conflicts. All packets from an SSRC form part of the same timing and sequence number space. Receivers group packets by SSRC identifiers for playback.
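The header fields named above can be pulled out of a packet with a few lines of code. This sketch follows the fixed RTP header layout from the spec (RFC 1889); the example payload type 26 is the value assigned to JPEG in the RTP audio/video profile.

```python
# Minimal RTP fixed-header parse: the gateway needs only the sequence
# number, timestamp, and SSRC to regroup and re-time packets.
import struct

def parse_rtp_header(packet: bytes):
    """Return (version, payload_type, seq, timestamp, ssrc) of an RTP packet."""
    flags, marker_pt, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = flags >> 6          # top two bits of the first octet
    payload_type = marker_pt & 0x7F
    return version, payload_type, seq, ts, ssrc

# Example: a hand-built header (version 2, PT 26 = JPEG, seq 7, ts 1000).
pkt = struct.pack("!BBHII", 0x80, 26, 7, 1000, 0xCAFE)
print(parse_rtp_header(pkt))  # (2, 26, 7, 1000, 51966)
```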
The RTP control protocol (RTCP) provides mechanisms for data distribution monitoring, cross-media synchronization, and sender identification. This control information is disseminated by periodically transmitting control packets to all participants in the session, using the same distribution mechanism as for data packets. The transmission interval is randomized and adjusted according to the session size to maintain the RTCP bandwidth below some configurable limit.
Distribution Monitoring. The primary function of RTCP is to provide feedback to the session on the quality of the data distribution. This information is critical in diagnosing failures and monitoring performance, and can be used by applications to dynamically adapt to network congestion [3]. Monitoring statistics are disseminated from active sources via RTCP sender reports (SR) and from receivers back to the entire session via receiver reports (RR). The SR statistics include, among other things, the sender's cumulative packet count and the sender's cumulative byte count. Each receiver generates a separate reception report for each active source. The RR statistics include a cumulative count of lost packets, a short-term loss indicator, an estimate of the jitter in data packets, and timestamps for round-trip time estimation.
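The jitter estimate carried in receiver reports is a simple running average. The sketch below follows the formula in the RTP spec (RFC 1889): J += (|D| - J)/16, where D is the difference in relative transit times of consecutive packets, all in timestamp units.

```python
# RTCP interarrival jitter estimate, per the RTP spec's running average.

def update_jitter(jitter: float, transit_prev: int, transit_now: int) -> float:
    """Fold one packet's transit-time variation into the jitter estimate."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Steady arrivals (constant transit time) decay the estimate toward zero;
# a transit-time jump of 160 units nudges it up by 160/16 = 10.
j = update_jitter(0.0, transit_prev=500, transit_now=660)
print(j)  # 10.0
```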
Synchronization. Since media streams are distributed on distinct RTP sessions (with distinct SSRCs), the protocol provides a mechanism for synchronizing media across sessions, which relies on canonical-name (CNAME) identifiers and SRs. Each SR contains that source's CNAME and the correspondence between local real-time and the media-specific time units. By matching sender reports across different media according to their CNAME, a receiver can align the time offsets of different media streams to reconstruct the original synchronization.

Sender Identification. Each session participant identifies itself by binding text descriptions to its SSRC using "source description" (SDES) messages.
In addition to the base protocol semantics, the RTP specification defines two types of "intermediate systems": mixers and translators. Mixers concentrate multiple incoming streams into a single outgoing stream. In order to combine streams, a mixer might inject buffering delay to counteract network jitter, and thus must resynchronize the streams. The mixer therefore has its own SSRC identifier and appears explicitly in the session. Translators, on the other hand, do not resynchronize the data stream and do not necessarily appear in the session.
3.2 The Gateway Architecture

In theory, the basic gateway architecture is straightforward: we merely establish an application-level path that joins two separate RTP sessions and properly transforms the RTP data and control streams. RTP data packets are especially easy to handle. We can simply forward a data stream from one session to the other, either unmodified or transcoded to adapt the transmission bandwidth. The bandwidth can be further regulated by applying a rate-limiter to the output. Since RTP data streams are self-describing and "stateless", there is no special processing or external signaling to be carried out in order to instantiate the transformed stream. By directly bridging the two sessions, we implicitly merge the SSRC identifier space and the collision resolution algorithm penetrates the gateway.

However, a problem surfaces when we consider RTCP packets. While SDES messages can flow through the gateway unmodified, synchronization information and SR/RR distribution statistics are problematic. When a gateway transcodes a stream, the packet counts, byte counts, and timing information will all be affected. Hence, RTCP packets cannot be simply forwarded across the gateway; they must be modified in a self-consistent fashion.

When we first began the design of our video gateway, we encountered an inadequate treatment of this problem in the protocol definition. Revision 6 of the RTP draft specification accounted only for mixers that resynchronized streams and translators that did not modify the data packets. Because we wanted our architecture to admit data transcoding, we were required to use a mixer design. However, because a mixer becomes the new point of synchronization, a video gateway based on this model could not preserve the semantics of the RTP cross-media synchronization information. That is, the newly generated video stream could no longer be synchronized with an audio stream generated at the original source. One immediate solution to this problem is to implement a combined audio/video gateway that preserves synchronization across the system, but this complicates the design and introduces unnecessary resynchronization delay.

As a result of our interactions with the RTP designers, new definitions of translators and mixers appeared in revision 7 of the protocol draft. The modified specification relaxes the constraint on translators, leaving more freedom to the application-gateway designer.

The revised RTP specification allows us to build a transcoding gateway that:

o preserves cross-media synchronization information without the need to re-time the media;

o offers minimal delay, since data packets can often be forwarded as soon as they arrive; and

o operates in a session-transparent fashion.

We achieve these goals with an architecture that effectively passes all RTCP source identification information directly through the gateway, but decouples the flow of sender and receiver reports. Instead, sender and receiver reports are modified to reflect the distribution quality as seen by the gateway.

Figure 2: A video gateway between two RTP multicast groups. Group M1 is transmitting JPEG video whereas group M2 has the capability to only receive or transmit H.261.
More specifically, consider the scenario in Figure 2, in which two multicast groups M1 and M2 are joined by the video gateway, VGW. Assume that the M1 session contains high-rate JPEG sources as well as low-rate H.261 sources, while the M2 session contains only H.261 sources. Further assume that the gateway is configured to transcode JPEG to H.261 in the M1/M2 direction. In our architecture, the gateway simply forwards H.261 packets to the alternate session (in either direction), but transcodes JPEG packets to H.261. While data packets are transcoded in a straightforward manner, RTCP sender and receiver reports must be handled as follows:
o On receiving an SR from a JPEG source in M1, VGW modifies the sender statistics to reflect the H.261 statistics of the transcoded stream originating from VGW.

o On receiving an SR from an H.261 source in either M1 or M2, VGW forwards the packet unmodified.

o On receiving an RR from a receiver in M2, which is reporting on a transcoded source from M1, VGW modifies the reception statistics to reflect the JPEG reception statistics seen by VGW.

o On receiving an RR from a receiver in M1 or M2, which is reporting on a non-transcoded source, VGW forwards the packet unmodified.

In all cases, the modified (or unmodified) RTCP packet is immediately forwarded to the other session.
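The four rules collapse into a small dispatch. The sketch below is illustrative (the record fields and statistics are ours, not from the vgw implementation): reports about transcoded streams are rewritten with the gateway's own statistics, and everything else passes through.

```python
# Sketch of the RTCP forwarding rules: rewrite SRs from transcoded sources
# and RRs about transcoded sources; pass all other reports through.

def handle_rtcp(report: dict, transcoded_ssrcs: set, gateway_stats: dict) -> dict:
    """Rewrite or pass through an SR/RR before forwarding to the other session."""
    out = dict(report)
    if report["type"] == "SR" and report["ssrc"] in transcoded_ssrcs:
        # SR from a JPEG source: substitute the H.261 statistics of the
        # transcoded stream originating from the gateway.
        out["packets"], out["bytes"] = gateway_stats["sent"]
    elif report["type"] == "RR" and report.get("about") in transcoded_ssrcs:
        # RR about a transcoded source: substitute the JPEG reception
        # statistics seen by the gateway itself.
        out["lost"], out["jitter"] = gateway_stats["recv"]
    return out  # forwarded immediately to the other session in all cases

stats = {"sent": (120, 48000), "recv": (0, 2.5)}
rr = {"type": "RR", "ssrc": 7, "about": 42, "lost": 9, "jitter": 80.0}
print(handle_rtcp(rr, {42}, stats))  # lost/jitter replaced by gateway's view
```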
By decoupling the flow of distribution reports across the gateway, we prevent a source in one half of the session from learning about distribution problems in the other half of the session. This might raise concerns over the impact on congestion avoidance algorithms, where a source relies on such information to adjust its transmission rate. However, our explicit decoupling is in fact a viable configuration even in the presence of adaptive rate control. Recall that the original motivation of the transcoding operation is to provide bandwidth adaptation. Rather than adjust the rate at the source, it would be better to adjust the rate at a point closer to the congested receiver, thereby preventing unnecessary rate backoff for well-connected receivers. Thus, it makes more sense for the gateway source to run its own congestion control loop.
3.3 Gateway Control

The gateway architecture described in the previous section provides a general mechanism for bandwidth adaptation and session aggregation. This mechanism must somehow be controlled and configured. Hence, we export an interface that allows remote agents to dynamically configure the gateway. The control interface allows agents to:

o establish rate controls for each stream as well as the global session,

o define rules for selecting output formats from input formats,

o define compression parameters for each transcoded stream (e.g., quantizers), and

o install filters (e.g., to forward only the "lecturer's video").
This decomposition separates mechanism from policy. By defining a control interface in this fashion, we displace the policy-level functionality associated with conference control to external agents. The design of such external control agents is beyond the scope of this paper.
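The four control operations might be exposed to an external agent roughly as follows. This is a hypothetical sketch only; the actual vgw control protocol is not specified in this excerpt, so the class, method names, and arguments are ours.

```python
# Hypothetical gateway control surface mirroring the four operations above.
class GatewayControl:
    def __init__(self):
        self.config = {"rates": {}, "rules": [], "params": {}, "filters": []}

    def set_rate_limit(self, stream, kbps):            # per-stream or "session"
        self.config["rates"][stream] = kbps

    def add_format_rule(self, input_fmt, output_fmt):  # e.g. JPEG -> H.261
        self.config["rules"].append((input_fmt, output_fmt))

    def set_compression(self, stream, **params):       # e.g. quantizer settings
        self.config["params"][stream] = params

    def install_filter(self, predicate):               # e.g. lecturer-only
        self.config["filters"].append(predicate)

ctl = GatewayControl()
ctl.set_rate_limit("session", 128)
ctl.add_format_rule("JPEG", "H.261")
ctl.set_compression("ssrc:42", quantizer=8)
```

The point of the shape, as the text notes, is that the gateway holds only mechanism; which rates, rules, and filters to install is decided by an external policy agent.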
4 TRANSCODING

We now discuss the transcoding process by which the video gateway maps a compressed video bit stream into a different (lower-rate) bit stream, either by employing the same compression format with alternate parameters or by employing another format altogether. We call the agent that performs this transcoding algorithm a transcoder.
4.1 Design

A high-level depiction of our transcoder design is given in Figure 3. The basic model involves transcoding from some set of supported input formats to a (possibly different) set of output formats. Each input format is handled by a module that decodes the incoming bit stream into an intermediate representation. This representation is transformed and delivered to an encoder, which produces a new bit stream in a new (or possibly the same) format.
If we restrict our design to rely on a single format for the intermediate representation, we would be forced to adopt a pixel representation since this is the least common denominator for all compression schemes. But requiring every transcoder configuration to decompress all the way to the pixel domain, and then re-encode the stream from scratch, would impose a substantial performance penalty. Instead, we allow for multiple intermediate formats. In this way, encoder/decoder pairs can optimize their interaction by choosing an appropriate intermediate format. For example, DCT-based coding schemes like JPEG and H.261 could be more efficiently transcoded using DCT coefficients, bypassing the forward and reverse transform of each block.
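The decoder-transform-encoder pipeline with pluggable intermediate formats can be sketched as below. The module names and string placeholders are illustrative; the point is that a decoder/encoder pair sharing an intermediate format (here, DCT coefficients) skips the generic conversion step entirely.

```python
# Pipeline sketch: decode to an intermediate form, convert only when the
# encoder does not accept that form, then re-encode.

converters = {("dct", "pixels"): lambda frame: f"idct({frame})"}

def transcode(frame, decoder, encoder):
    """Run one frame through the decoder -> transform -> encoder path."""
    kind, data = decoder(frame)
    if kind != encoder.accepts:
        data = converters[(kind, encoder.accepts)](data)
    return encoder(data)

class H261Encoder:
    accepts = "dct"                      # consumes DCT coefficients directly
    def __call__(self, data):
        return f"h261({data})"

jpeg_decoder = lambda frame: ("dct", f"huffdec({frame})")  # stops at coefficients
print(transcode("frame0", jpeg_decoder, H261Encoder()))  # h261(huffdec(frame0))
```

A wavelet encoder that accepted only pixels would instead pull the stream through the generic DCT-to-pixel converter, exactly the fallback path described in the next paragraph.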
Because we have decomposed the transcoder into separate encoder and decoder stages, the system is easily extensible. To support a new input format, for example, only a single decoder module needs to be implemented. Additionally, to support multiple intermediate formats in a flexible way, we define format translators that map between intermediate formats. For example, we might have a JPEG decoder that produced only a DCT output representation, which needed to be encoded by a wavelet-based coding scheme that accepted only pixel input. Rather than implement a separate JPEG decoder that produces pixels directly, we implement a generic DCT-to-pixel converter. If the performance of this generic approach is unsatisfactory, it can be optimized by implementing matched encoder/decoder pairs.
By decoupling the encoder and decoder through an intermediate format, we can additionally perform generic transformations on the canonical image data. Such transformations are vital to the video gateway because large reductions in bandwidth between the input format and output format cannot be attained exclusively with encoder parameter and format choices. For example, bit rates below 64 kb/s are more or less infeasible for full-sized NTSC streams at 30 f/s. Instead, using our transformation model, we can apply aggressive temporal and spatial decimation on top of a format conversion. We also might need to perform a frame geometry conversion, since not all compression schemes support the same resolutions (for example, H.261 supports only the "Common Interchange Format"), or a color decimation conversion, since different formats downsample the chrominance planes differently (e.g., 4:1:1 versus 4:2:2 YUV).
4.2 A JPEG/H.261 Transcoder

Two principal factors determined the focus of our initial transcoder design: (1) the widespread deployment of Motion-JPEG hardware and the decision to utilize JPEG as the primary compression scheme for BAGNet, and (2) the large installed base of MBone applications that can decode H.261 (vic and ivs) as well as the low bit-rate requirements of H.261. In the rest of this section, we describe the design of an efficient JPEG to H.261 transcoder.
The H.261 specification defines two types of video blocks: intra-mode blocks, which are independent of past blocks, and inter-mode blocks, which are predicted from previous blocks. Because inter-mode blocks are coded as differences, a lost update will cause a reconstruction error that persists until an intra-mode block refresh. At low bit rates, this interval can be substantial.
Intra-H.261. A solution to this problem is to avoid inter-mode blocks altogether and instead code every block in intra-mode. This approach, called Intra-H.261 [9], relies on "conditional replenishment" instead of motion compensation to reduce the bit rate. In conditional replenishment, the video image is partitioned into small blocks and only the blocks
that change are transmitted. Intra-H.261 carries out conditional replenishment by using H.261 macroblock addressing to skip over unreplenished blocks.

Figure 3: The basic design of the transcoder. Any of several supported input formats can be converted into any supported output format. The intermediate stage denoted by T performs a transformation on the internal representation, when necessary, to match the conventions of the decoder and encoder. In addition to any bandwidth reduction inherent in format conversion, the output can be rate-controlled by decoupling the generation of output frames from the arrival of input frames.
Since an Intra-H.261 coder never utilizes inter-mode blocks, it need not perform motion compensation, which substantially reduces the run-time complexity. Moreover, we can easily build a coder that takes DCTs instead of pixels as input. In this way, the encoder complexity is further reduced because it need not compute DCTs. This property is exploited in the data-flow path optimization which is detailed in the next section.
4.2.1 Data Flow Optimizations
Figure 4 shows the basic data flow of a generic implementation of a JPEG/H.261 transcoder. The input to the transcoder is a stream of JPEG-encoded frames. Each frame is decoded into pixels, which involves entropy decoding, reverse quantization, and reverse DCT. Once in the pixel domain, the frames are input to an H.261 encoder, which performs motion compensation on each block and codes the motion residual using an approach similar to JPEG.

This decomposition is straightforward to implement since it can be built simply by interfacing existing coders. However, the resulting performance, for a software implementation, is unsatisfactory. We experimented with exactly this configuration using the fairly well-tuned JPEG decoder and (Intra-)H.261 encoder from vic. Even on a fast workstation (SGI Indy, 133 MHz MIPS R4600SC), the naive transcoder configuration operated only in the 6 to 13 f/s range, well below real-time performance.

Figure 4: A generic JPEG/H.261 transcoder.

Figure 5: An optimized JPEG/H.261 transcoder.
The two primary bottlenecks in the transcoder are the DCT computations and the memory traffic that results from manipulating uncompressed images. Reducing or eliminating
these bottlenecks would dramatically improve the transcoding performance. By carrying out conditional replenishment in the DCT domain and by utilizing an Intra-H.261 coder that takes DCTs as input, we can do just this. The modified data flow path is shown in Figure 5. Here the output of the JPEG inverse quantizer is redirected to the input of the H.261 quantizer, thereby bypassing the two transform computations. Furthermore, since conditional replenishment is carried out in the DCT domain, we can prune much of the memory traffic by deciding early that a block is to be skipped.

There are two complications with our modified data path: resolution conversion and chrominance subsampling.

Resolution Conversion. JPEG is typically transmitted in either PAL or NTSC format, while H.261 is CIF (or 1/4-CIF) resolution. Instead of employing an expensive scale operation, we simply downsample the input (if necessary) to approximately match CIF or 1/4-CIF resolution. Then, for NTSC, the frame is embedded in a CIF geometry using a gray border, and for PAL, a few pixels at the vertical edges of the frame are cropped away.
Chrominance Subsampling. Because most hardware-based JPEG codecs support only 4:2:2 YUV decimated video while H.261 requires 4:1:1 decimated video, we must subsample the JPEG chrominance planes vertically by two. This is trivial in the pixel domain: simply drop every other pixel (after optionally low-pass filtering to avoid aliasing). A brute force approach is to compute the reverse 8x8 2D DCT of two vertically adjacent blocks, downsample the resulting 16x8 block to an 8x8 block, and forward transform this 8x8 block. However, this defeats our fast-path optimization, which relies on avoiding DCT computations. It turns out that pixel subsampling can be carried out by operating on the DCT coefficients directly.

Since the 2D DCT is a separable transform, we need only consider the one-dimensional downsampling problem; that is, we can operate on each column of the 2D DCT independently. The appendix contains a derivation of an (approximate) algorithm for computing the result of downsampling a 16-point signal using operations performed directly on the two 8-point DCT coefficients of the two halves of the 16-point signal. The output is an 8-point DCT of the subsampled signal. By running this algorithm over each of the columns of two vertically adjacent 8x8 DCTs, we effectively subsample the underlying signal. Though our algorithm is only approximate, it is fast and produces good qualitative results.
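The paper's DCT-domain algorithm is derived in the appendix (not reproduced here); the brute-force operation that it approximates can serve as a reference. The sketch below inverse-transforms two adjacent 8-point DCT columns, drops every other sample of the resulting 16-point signal, and re-transforms; a DCT-domain implementation can be validated against it.

```python
# Pixel-domain reference for the 2:1 column downsampling, using an
# orthonormal 8-point DCT-II and its inverse (DCT-III).
import math

def dct8(x):
    N = len(x)
    s = [math.sqrt(1.0 / N)] + [math.sqrt(2.0 / N)] * (N - 1)
    return [s[k] * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for n in range(N)) for k in range(N)]

def idct8(X):
    N = len(X)
    s = [math.sqrt(1.0 / N)] + [math.sqrt(2.0 / N)] * (N - 1)
    return [sum(s[k] * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]

def downsample_column(dct_top, dct_bottom):
    """8-point DCT of the 2:1 downsampled 16-point column."""
    pixels = idct8(dct_top) + idct8(dct_bottom)   # reconstruct 16-point signal
    return dct8(pixels[::2])                      # keep every other sample

# A constant column stays constant: only the DC coefficient is nonzero.
out = downsample_column(dct8([10.0] * 8), dct8([10.0] * 8))
```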
4.3 Addressing Some Weaknesses

Two commonly cited weaknesses of transcoding systems are increased end-to-end delay and an incompatibility with encryption. Increased delay is incurred, for example, when encoding JPEG to MPEG (with B frames) because the encoder must collect several frames into a "group of pictures" before generating new output. However, for the compression formats commonly used in the Internet, this is not a problem. The packet formats of the schemes based on conditional replenishment are intentionally designed so that each packet may be processed independently of the rest. Hence, the theoretical transcoding delay is zero (modulo processing and store-and-forward delays) because we can transcode each packet as soon as it arrives. In practice, our JPEG/H.261 transcoder waits until an entire JPEG frame arrives. Since traffic is usually smoothed at the source, we incur one frame interval of delay. On the other hand, if the JPEG packets arrive in order, they could be decoded incrementally and output packets generated "on the fly".
`The other complaint against transcodin g systems is their
`incompatibility with encryption. This argument is based on
`the view that the transcoding agent would be integrated into
`the network architecture and that privacy would be compro-
`mised by entrusting the network with the encryption key.
`However, in our model, the transcoder is an application level
`agent that is deployed by the user, within that user's admin-
`istrative domain. Thus, the user will be able to configure the
transcoder with the session key(s) in a secure fashion.
`
5 IMPLEMENTATION
`
We have implemented our gateway and transcoder architectures
in a prototype application called vgw. We leveraged off the
flexible code base in vic by incorporating vic's H.261
encoder, JPEG decoder, and networking implementation into
vgw. While our eventual goal is to support several
`combinations of efficient transcoding operations, the current
`implementation supports only the JPEG/H.261 transcoding
`model described in the previous section. Streams that are
`not JPEG (or not intended to be transcoded) are simply for-
`warded across the gateway.
JPEG/H.261 Transcoder. By modifying vic's H.261 en-
`coder and JPEG decoder to support a DCT based call inter-
`face, we were able to easily implement the transcoder archi-
tecture described in the previous section. The JPEG decoder
`performs table-driven Huffman decoding of the DCT trans-
`form coefficients in parallel with conditional replenishment.
`We optimize the case of an unreplenished block by Huffman
`decoding the first six DCT coefficients and comparing them
to what has already been transmitted, as described in [9]. If
they are "similar enough", we skip the current block by
parsing the rest of the coefficients in the block without any
additional processing. Hence, in the case where there is
little motion in the scene, the system runs at the Huffman
decoding speed (which just uses table lookups).
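The skip decision can be sketched as follows. The distance metric, threshold value, and function name are assumptions for illustration; they are not vic's actual parameters:

```python
# Block-skip test for unreplenished blocks: only the first six DCT
# coefficients are Huffman-decoded and compared against the reference
# block that was last transmitted. If they are close enough, the rest
# of the block is parsed but otherwise ignored.
def should_skip_block(first_six, reference, threshold=64):
    """first_six: first 6 DCT coefficients of the incoming block;
    reference: the corresponding coefficients already transmitted.
    threshold: an assumed similarity bound (hypothetical value)."""
    distance = sum(abs(a - b) for a, b in zip(first_six, reference))
    return distance < threshold
```

Because low-frequency coefficients carry most of a block's energy, comparing only the first six is a cheap proxy for full-block similarity.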
A single frame buffer of DCT coefficients is maintained
by the decoder, which is used as a refere
