`Design and Implementation
`
`George Fankhauser, Marcel Dasen, Nathalie Weiler, Bernhard Plattner, Burkhard Stiller
`
`Computer Engineering and Networks Laboratory (TIK), ETH Zürich
`Gloriastrasse 35, CH – 8092 Zürich, Switzerland, Phone: +41 1 632 7017
`E-Mail: [gfa, dasen, weiler, plattner, stiller] @ tik.ee.ethz.ch
`
`TIK Technical Report No. 44
`
`June 1998
`
`Abstract
`
`Wireless video in acceptable quality is only possible by following an end-to-end approach. WaveVideo is an
`integrated, adaptive video coding architecture designed for heterogeneous wireless networks. It includes basic
`video compression algorithms based on wavelet transformations, an efficient channel coding, a filter architec-
`ture for receiver-based media scaling, and error-control methods to adapt video transmissions to the wireless
`environment. Using a joint source/channel coding approach, WaveVideo offers a high degree of error tolerance
`on noisy channels, still being competitive in terms of compression.
`
`Adaptation to channel conditions and user requirements is implemented on three levels. The coding itself
`features spatial and temporal measures to conceal transmission errors. Additionally, the amount of introduced
`error-control information is controlled by feedback. The video stream coding, applied to multicast capable net-
`works, can serve different user needs efficiently at the same time by scaling the video stream in the network
`according to receivers’ quality requirements.
`
`The WaveVideo architecture is unique in terms of its capability to use QoS mapping and adaptation func-
`tions across all network nodes providing the same uniform interface.
`
`Keywords: Video compression, joint source/channel coding, wireless video, mobile and wireless networks,
`error-control, QoS mapping, QoS adaptation, media scaling, filtering, multicast.
`
`A short version of this technical report is published in ACM Monet, Mobile Networks and Applications, Special Issue on
`Adaptive Mobile Networking and Computing.
`
`Dish
`Exhibit 1045, Page 1
`
`
`
`2
`
`Dish
`
`Exhibit 1045, Page 2
`
`Dish
`Exhibit 1045, Page 2
`
`
`
`1
`
`Introduction
`
`Wireless and mobile networks have to be considered
`hostile environments for applications. Especially multi-
`media applications, such as voice and video, which
`depend on the timely delivery of data, suffer from the
`high variance of bandwidth and bit error rate (BER).
`Unlike tethered networks which support constant trans-
`mission rates and typical, predictable values for BER,
`wireless networks must cope with physical layer errors
`induced by path-loss, fading, channel interference, and
`shadowing. They can correct these errors only to a
`minor extent.
Using traditional video compression methods which were designed for reliable media or reliable network transmission leads to several problems in wireless environments. Firstly, the channel coding has to provide reliability for the full bandwidth of the video stream, which is very expensive. Separating compression
`(source coding) and channel coding means wasting
`resources, because the channel coder cannot exploit the
`fact that not all of the stream data is of equal impor-
`tance. For most lossy video coding algorithms, a com-
`bined source and channel coding offers a higher
`performance than a separated approach [7], [12], [18],
`[37]. Secondly, these traditional video coding methods
`do not offer the elasticity to adapt to frequent changes of
`channel conditions.
`For interactive video transmissions over networks
`with wireless links, not only a certain minimum band-
`width of correctly transmitted data has to be provided,
`but also a delay limit has to be met in order to provide
`the applications’ requested Quality of Service (QoS).
`Unfortunately, wireless and mobile networks cannot
`provide strict QoS guarantees [48]. QoS provided by
`such networks should rather be divided into different
`mobility QoS-levels like wired-, wireless-, and han-
`dover-QoS [57] or controlled-QoS (for wireless and
`mobile operations in general) [9]. These large and fre-
`quent fluctuations in QoS have to be partly absorbed by
`the application. In order to do so, wireless applications
`in general, and video coding and transmission systems
`in particular must be elastic, i.e. they must be able to
`scale quality over a wide range when the available
`resources (bandwidth, reliable transmission) are chang-
`ing. This elasticity (scalability) property can be offered
`in a discrete or continuous form. It is also desirable that
`video coders are able to conceal bursty errors and fast
`changes in QoS (e.g., fast moving mobile terminals
`(MT)) without further changes of sender parameters.
`For large scale changes that can be expected to last over
`a reasonable period of time (e.g., after a handover),
`feedback and QoS adaptation mechanisms must be pro-
`vided to find a new working point for the system.
`
`QoS adaptation needs to cooperate with users, too.
`QoS adaptation mechanisms operating only on network
feedback are not good enough. Users must be able to specify their quality requirements through simple, exemplary QoS parameters, which are translated into the video coder's detailed parameters. If the video coding
`provides high elasticity, a dynamic QoS mapping mech-
`anism can react to resource availability and hide even
`major changes in wireless network QoS from the user.
`
`1.1 Design Goals
`
`Designing a video transmission system for wireless
`networks requires the coding method to be robust.
`Although the physical and data link layer of wireless
`networks already feature channel codes that can correct
`typical errors, it is not acceptable that bursty errors,
`which cause loss of packets, are immediately visible.
Fortunately, images and video signals offer many places to conceal signals that are not part of the original source, because of their inherent high redundancy and the fact that the human visual system (HVS) is more sensitive to features, like sharp edges or changes in brightness, than to gradients or colors. For WaveVideo, it was a clear design goal to conceal such errors and to distribute them over large areas of the video to make these changes almost invisible.
Another goal is efficient channel coding. It is simple to add channel codes that help to improve transmission quality, but it is also costly. WaveVideo adds redundancy to the coded signal only where absolutely necessary, in order to keep a good compression ratio and an acceptable video quality. At high compression ratios similar requirements hold. The compression must not pro-
`duce any disturbing artifacts, like blocks or wrongly
`reproduced brightness values. It is also desirable that
`the decoder does not need to post-process a signal to get
`rid of artifacts.
`If transmission conditions change dramatically in
`networks (e.g., handover in wireless networks or con-
`gestion situations in fixed networks), and error conceal-
`ment is no longer a solution, control mechanisms
`should be applicable to react to the new situation.
`Besides providing the control mechanism and a feed-
`back protocol the coder must be able to fulfill the
`requested adaptations by scaling the video stream
`accordingly. For wireless high-speed networks (20
`Mbit/s and faster) this means that scalability over
`almost three orders of magnitude should be achieved.
Heterogeneous networks with different wireless cells (e.g., different radio systems and geographical scope), featuring different bandwidths, different typical BERs and error distributions, require a wireless video network architecture to operate error-control and QoS adaptation as close to the wireless link as possible. Furthermore, when many receivers subscribe to transmissions from the same sender (e.g., live video), all receivers in such a multicast group should be able to get the QoS they request. To implement this goal efficiently, each link of a distribution tree in the network should never transport a better quality than the maximum required in the following sub-tree. To adapt the video stream at switches to these particular needs of subbranches, it has to be filtered or down-scaled. For WaveVideo, this procedure should be as efficient as possible in order to be able to build filters into access points (AP) or base transmission stations without a huge computational complexity.

The considered application scenarios focus on video conferencing. This translates into the requirement of producing a symmetric codec (coder/decoder), i.e., coder and decoder should run at about the same speed. Furthermore, targeting mobile devices can restrict the capabilities, which favors software-only solutions. Power and performance limitations require a low-complexity codec.

1.2 Architecture and Environment

The WaveVideo network and application architecture assumes a large, heterogeneous network with wireless and wired subnetworks, involving different technologies and protocols. The transport interfaces chosen to develop and test the system on are ATM (Asynchronous Transfer Mode) with AAL 5 (ATM Adaptation Layer 5) on one hand, and IP (Internet Protocol) with UDP (User Datagram Protocol) on the other hand. Wireless links are built around the wireless ATM network developed within the European ACTS project Magic WAND [39].

Figure 1: Architectural components of WaveVideo (coder, filter modules, error control, and video decoder, connected by a forward data path and a reverse feedback path).

As shown in Figure 1, the four software components used in the WaveVideo architecture are coders, decoders, error-control modules, and filters. Coder and decoder modules are located in end-systems, which comprise terminals and MTs. Error-control modules are placed directly in APs and MTs and receive feedback on channel conditions from their peers. Filters are used to scale video streams to links with different bandwidths and can be placed wherever diverse qualities must be served. This holds for switches in networks that are connected to more than one subnetwork. But most importantly, filters are used on APs where MTs with different channel conditions must be served. In fact, switches and APs use instances of filters and error-control modules for each connection to a receiver. Scalability is ensured by using efficient filters and error-control modules that perform most operations by selection and replication of tagged network frames (packets).

Figure 2: Typical wireless network setup with WaveVideo codecs, error-control modules and filters (C: video coder, D: video decoder, E: error-control module, F: video filter module; terminals and MTs attached to switches and APs in several radio cells).

As an example (cf. Figure 2), consider Terminal 1 (video coder), which sends a high-quality video stream to the switch it is attached to. Other terminals in the local subnetwork (e.g., Terminal 2) may receive the same high-quality video stream via multicast transmission. Other terminals and MTs in the network do not request as much quality (e.g., for access-link bandwidth or cost reasons). Therefore, a filter is located on the outgoing link of this switch towards the network cloud. The same filter mechanism is applied in the other two switches and in the APs for each connected MT. These filters are controlled by receivers depending on current channel conditions and user settings. There are crowded radio cells with many MTs (e.g., MT 7) offering less bandwidth to each receiver, while there are MTs being offered the full capacity (e.g., MT 4). MT 5, for example, suffers from bad radio conditions at the border of the cell, which requires filter settings that produce very low bandwidth. In addition, all APs host error-control modules that can insert redundant information about the most critical parts of the video stream. This information is generated locally on demand by exploiting the fact that these critical parts can be identified in the video stream.

By using network filtering to reduce bandwidth and error-control modules to add robustness at every node in the network, it is possible to adapt to various channel conditions and receiver requirements without creating different connections with different layers for quality and error-protection levels. While the initiation of error-control feedback is provided automatically by the decoder, the specification of the desired quality for each receiver is made by the user. This interaction with users is performed by mapping simple and understandable user-level QoS definitions to codec- and transport-level QoS parameters.
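The selection-and-replication mechanism of the network filters can be sketched in a few lines. The tag fields (color channel, decomposition level, quantizer range) and the filter policy below are illustrative assumptions for this sketch, not the actual WaveVideo channel format:

```python
# Illustrative sketch (not the WaveVideo wire format): each packet carries a
# tag naming its color channel, decomposition level, and quantizer range, so
# a network filter can scale a stream by inspecting tags only -- no decoding.

from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    channel: str      # "Y", "Cb" or "Cr"
    level: int        # wavelet decomposition level (0 = coarsest, LL root)
    quantizer: int    # quantizer range index (0 = most significant)

def make_filter(max_level, max_quantizer, drop_color=False):
    """Build a per-receiver filter: keep only packets the receiver needs."""
    def keep(tag):
        if drop_color and tag.channel in ("Cb", "Cr"):
            return False
        return tag.level <= max_level and tag.quantizer <= max_quantizer
    return keep

# A stream of (tag, payload) packets entering a switch or AP.
stream = [
    (Tag("Y", 0, 0), b"ll-root"),
    (Tag("Y", 3, 1), b"detail"),
    (Tag("Cb", 1, 0), b"color"),
]

# MT on a weak link: luminance only, coarse levels, significant coefficients.
weak_link = make_filter(max_level=2, max_quantizer=0, drop_color=True)
forwarded = [(t, p) for t, p in stream if weak_link(t)]
```

Because the decision is a pure predicate over the tag, a switch or AP can run one such closure per attached receiver and forward (replicate) each packet to exactly the subset of receivers whose predicate accepts it.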
`This report is organized as follows. Section 2 dis-
`cusses related work in wireless video coding, media
`scaling including layered and stream based methods,
`and QoS mapping and adaptation for wireless video.
Sections 3 and 4 describe the wavelet-based compression algorithms and the channel coding, respectively. Filtering, QoS mapping and QoS adaptation are discussed in Sections 5 and 6. Quantitative results for the presented methods are given in Section 7, and the final section draws conclusions and discusses future work.
`
`2 Related Work
`
Research in the wireless network area typically focuses either on physical-layer issues, such as CDMA versus TDMA and multiple-access protocols, or on higher-layer control issues, such as call handoff, hierarchical cell design, and dynamic channel allocation. New wireless networks must cope with the need to carry multimedia traffic, such as video data. Several control issues must be faced, e.g., admission control, dynamic bandwidth control, flow control, and resource allocation.
A variety of compression algorithms and transmission approaches have been tuned to scale video to the different requirements (bandwidth, error characteristics) of networks not traditionally tailored to this purpose, such as ATM networks, the Internet, and wireless channels. Because of the susceptibility of such channels to errors, methods for error resilience are needed.
For video streams, a variety of different coding techniques exist. The most popular standardized methods encompass ISO's Motion Picture Experts Group Version 2 (MPEG-2) [25] for video transmission, and ITU H.261 [26] and H.263 [27] for low-bit-rate video telephony.
Aravind et al. [4] analyze the behavior of MPEG-2 in the presence of packet loss. They conclude that non-layered MPEG-2 provides an unacceptable quality in wireless networks. Among layered versions of MPEG-2 with dual-priority transmission, the spatially scalable encoding provides the best video quality, but the SNR (Signal-to-Noise Ratio) scalable encoding is preferable despite its worse quality, because of its implementation simplicity.
On one hand, several 3-D subband coding algorithms have been proposed which focus on the reduction of temporal redundancy [55]. On the other hand, standards such as MPEG and H.261 try to reduce spatial redundancy and address temporal redundancy by motion compensation, which exploits the temporal redundancy in video frame sequences. Block matching algorithms (BMA) are the most popular in forward motion-compensated video coding because of their compact motion-field representation. However, they produce severe blocking artifacts which give an unnatural impression to the viewer. The goal of any video coding algorithm should be to reduce both spatial and temporal redundancy; in this case the video sequence can be represented with fewer bits and transmitted in an acceptable quality over wireless channels.
`Newly emerging coding algorithms deal with the
`reduction of spatial and temporal redundancy without
`using any motion compensation, because of its annoy-
`ing blocking artifacts. Therefore, the WaveVideo
`approach uses a wavelet-based image compression with
`extension to the temporal axis (cf. Section 3).
Horn and Girod [22] use spatio-temporal resolution pyramids to transmit video robustly and scalably over wireless channels. They combine pyramid coders with multiscale motion compensation, i.e., with an independent motion compensation in each resolution layer. Therefore, they are able to decode partial bit-streams and to compress efficiently at different bit rates. Moreover, they achieve decoding performance similar to H.263 for their lowpass and bandpass implementations. A similar
`approach based on the wavelet representation and
`multiresolution motion compensation can be found in
`[51] and in [64].
`Krishnamurthy et al. [32] developed an advanced
`motion estimation algorithm which compactly encodes
`motion fields and recovers from errors even at coarse
`resolution levels. In contrast to the popular BMA, it
`does not produce any blocking artifacts.
`A hybrid motion-compensated wavelet transform
`coder for very low bit rates has been proposed by Mar-
`tucci et al. [35] for the MPEG-4 standard. It uses an
`overlapping block motion compensation combined with
`a discrete wavelet transform and the zerotree concept
`for encoding the wavelet coefficients. Its main advan-
`tages are its scalability and the good performance, espe-
`cially on I-frames. Another example of such an hybrid
`strategy can be found in [53].
`The coding algorithm of Belzer et al. [7] is based on
`the subband decomposition using integer coefficient fil-
ters. They adaptively deliver video at rates between 60 and 600 kbit/s, depending on the available bandwidth. The compression is performed on a frame-by-frame basis without any motion compensation.
`Cheung and Zakhor [12] suggest a bit allocation
`approach for a joint source/channel video codec specifi-
`cally tailored to noisy channels. The source and channel
`bits are partitioned in such a way that the expected dis-
`tortion is minimized. The source coding algorithm is
`
`5
`
`Dish
`Exhibit 1045, Page 5
`
`
`
`based on 3-D subband coding with multirate quantiza-
`tion. The compression efficiency is claimed to be com-
`parable to standards such as MPEG.
`Wireless links are prone to errors. Unlike wired
`transport media, they suffer from limited and changing
`bandwidth. Different error-control mechanisms deal
`with frame losses and delay jitter due to channel degra-
`dation. In a one-way communication the only possibil-
`ity is forward error-control (FEC). In many wireless
`channels a feedback channel is provided. Several
`enhancements to FEC have been proposed [42]. Auto-
`matic Repeat reQuest (ARQ) is the mechanism of
`choice [23], [30]. Different hybrid combinations of
`ARQ and FEC have been proposed, especially for wire-
`less channels [13], [19].
Source-based adaptation schemes suffer from inherent drawbacks: network-assisted bandwidth adaptation burdens the network with supplementary computations (cf. Section 5). Therefore, many researchers have proposed receiver-based rate adaptation. Layered compression and transmission allows the receiver to filter out the appropriate layers to meet the local capacity of the network. The Heidelberg Transport System (HeiTS) [16] uses a discrete scaling mechanism to allow the receiver to adapt to the delivered bandwidth.
Hoffman and Speer [21] describe a video distribution system based on a layered multicast infrastructure. Their temporal hierarchy is created by using multiple rates of an MJPEG (Motion Joint Photographic Experts Group) video. On one hand, the receiver can negotiate with the network, using, e.g., RSVP (Resource ReSerVation Protocol) [63], to obtain the highest level of available video quality. On the other hand, it can apply an aggressive adaptation strategy by subscribing to all layers and dropping layers afterwards. This system's drawbacks are its lack of scalability to larger groups and its inability to deal with bandwidth fluctuations.
A similar approach was taken by McCanne et al. [37] in the receiver-driven layered multicast (RLM) scheme. RLM uses IP multicast and an RTP implementation. It handles subscriptions by performing a set of test joins and by measuring the resulting workload in the network. This system's asset is that no changes to the network infrastructure are needed.
`Quality of Service support is a well studied field in
`wired networks. Delivering QoS guarantees in multime-
`dia systems is an end-to-end issue, concerning both
`sender and receiver applications. By admitting a con-
`nection, the network must meet the contract it agreed on
`with the application in terms of service quality. To
`enable the network to deal with network congestion,
`different solutions have been proposed. Fixed network
`based schemes try to provide absolute guarantees on
`resource availability [2], [31], whereas other designs
`
`rely on a combination of application and network
`issues. Yeadon et al. [62] propose to filter hierarchically
`encoded streams to ease network fluctuations.
`In wireless and mobile networks, QoS guarantees
`are even harder to comply with. Wireless channel fad-
`ing and mobility have not been considered in traditional
`QoS architectures for wired networks. Lately, several
`frameworks have been developed to meet these require-
`ments in wireless and/or mobile environments.
`Campbell’s and Coulson’s receiver-oriented video
`delivery system relies on layered compression [10].
`Receivers signal reservation messages. The network
`returns the level of congestion. Then the receiver reacts
`to congestion by reducing the resource reservation
`request. The assembly and distribution of reservation
`messages assumes an ATM network and is based on
`network filters. A similar approach for wireless net-
`works called Mobiware is described in [9].
`Lee [33] proposes two new models: an application
`model, which couples adaptive applications to the adap-
`tive network resource control, and a service model
`which introduces a new network service class. This
`adaptive reserved service provides performance guaran-
`tees to adaptive applications on a best effort basis in
`mobile networks. The framework is still under develop-
`ment.
`A mobile QoS framework considering both network
`and application adaptation is investigated in [57]. The
`application specifies a desired and a minimum needed
`QoS, which allows QoS degradation and upgrade to
`meet congestion fluctuations in mobile networks. Fur-
`thermore, the delivered QoS moves with the user in the
`mobile network.
The shadow cluster concept [34] uses a predictive resource estimation scheme to cope with QoS reservations on the network layer. A similar approach is taken by Oliveira et al. [43]. Neither considers any application adaptation possibilities.
`Naghshineh and Willebeek-LeMair [41] propose a
`framework that bridges multimedia application and net-
`work needs. The multimedia stream is divided in sev-
`eral substreams with different QoS requirements. All
`components of the framework, i.e. network switches,
`access points, services and signaling, routing and con-
`trol protocols, try to monitor the required QoS.
Only a few researchers have proposed a truly integrated, adaptive video coding architecture. Campbell is working towards such an integrated approach. In this report, an end-to-end architecture for this purpose is presented.
`The WaveVideo prototype includes joint source/channel
`video coding algorithms, a filter architecture for
`receiver-based media scaling and error-control methods
`to cope with a wireless environment.
`
`6
`
`Dish
`Exhibit 1045, Page 6
`
`
`
3 WaveVideo Compression Algorithms

The developed WaveVideo coding method is a lossy compression based on the wavelet transformation (WT) and temporal redundancy elimination. It was first described in [14] and has undergone a major revision and many enhancements since. The outline of both the coding and the decoding method is presented in Figure 3. Video is fed into the coder either in RGB or YCbCr color space. The color-separated signal is transformed into frequency space, yielding a multiresolution representation or wavelet tree of the video image with different levels of detail (cf. Figure 4). This transformed image is compared to a previously recorded one, and it is decided whether a new intra-frame (I-frame) or difference-frame (D-frame) is sent to the quantizer and RLZ-coder (Run-Length Zero coder). The RLZ-coder is a simple and lossless run-length coder which is optimized for the coefficient distribution of the wavelet transforms. The last two stages of the coder produce the channel format which is sent over the network. Based on feedback from the receiver, the placement and degree of redundancy of low-frequency (LL) subbands and the segment size are selected. Finally, tags and headers describing the semantics of each packet are added. These packets contain a compressed coefficient representation separated by subband, by quantizer range and by color channel.

Figure 3: Video coder and decoder connected by a wireless network (coder pipeline: RGB/YCbCr video input, buffer allocation, color conversion and sub-sampling, wavelet transformation, I/D-frame decision with delta-buffer and delta-quantizer, quantizer, RLZ-coder, LL-interleaving, tagging/headers/segmentation, with compression-ratio feedback; decoder pipeline: tag/header decoder, RLZ-decoder and re-assembly, error estimation, LL-cache, inverse wavelet transformation, color conversion and up-sampling to RGB video output).

On the receiving side, tagged packets are directly decoded into an initially empty wavelet tree, thereby adding significant coefficients to it. Whenever the tree is completed or an external event occurs, such as a deadline to play a video frame being reached, the wavelet tree is processed by the inverse WT (IWT) and a YCbCr image is produced, which may be converted to RGB, depending on the video hardware used. An additional step is the estimation of the quality, which is approximated by the completeness of the reconstructed wavelet tree (cf. Sections 4.2 and 6.5).

3.1 Spatial Compression

Spatial compression attempts to eliminate as much redundancy from single video frames as possible without introducing a perceptible degradation of quality. This is done by firstly transforming the image from the spatial to the frequency domain and secondly by quantizing and compressing the decorrelated output. A two-dimensional discrete wavelet transformation (DWT) is applied to the image. The DWT is implemented and approximated using iterated discrete-time filters [58] which are applied recursively as shown in Figure 4. For the luminance (Y) and color difference (Cb, Cr) channels a similar tree of low- and high-frequency subbands is generated. More precisely, a 2-D transformation step from one level to the next is assembled from a horizontal and a vertical 1-D transformation performed in series. Implemented as low- and high-pass filters with subsampling, this procedure yields four subbands of a quarter of the original size. LL-subbands are recursively passed to the next lower level for further transformation, while subbands containing high-frequency signals (HL, LH, HH) are subject to quantization and compression. The LL-subband of the lowest level is also needed for reconstruction, since it is the root of the reconstruction tree.

Figure 4: Schematic of wavelet decomposition of the luminance channel and the subsampled color channels (LL: low-frequency subbands, of which only the smallest size is encoded; HL, LH, HH: quantized and encoded high-frequency subbands).

All non-redundant subbands needed for decoding are grouped by shadowed boxes in Figure 4. The maximum number of transformation levels depends on the size of the original video frames and is limited to 8 levels for large formats, such as High Definition TV. If a color subsampling scheme is used, as shown in the example, the luminance channel can be transformed one level deeper than the color channels.
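The 2-D transformation step can be sketched as follows. For brevity the sketch uses the simple Haar filter pair as a stand-in for the wavelet filters WaveVideo actually employs (Daubechies or CF53); the point is the horizontal-then-vertical structure producing four quarter-size subbands:

```python
# One 2-D decomposition step: a horizontal and a vertical 1-D transform in
# series yield the four subbands LL, HL, LH, HH, each a quarter of the
# original size. The Haar pair below is a stand-in for the real filters.

def haar_1d(row):
    """Low-/high-pass filtering with subsampling by 2 (Haar pair)."""
    low = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
    high = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
    return low, high

def dwt2d_step(img):
    """One 2-D step: horizontal then vertical 1-D transforms."""
    # Horizontal pass on every row -> low and high half-width images.
    lo = [haar_1d(r)[0] for r in img]
    hi = [haar_1d(r)[1] for r in img]
    # Vertical pass on the columns of each half.
    def vertical(half):
        cols = list(zip(*half))
        l_cols, h_cols = zip(*(haar_1d(list(c)) for c in cols))
        return [list(r) for r in zip(*l_cols)], [list(r) for r in zip(*h_cols)]
    ll, lh = vertical(lo)
    hl, hh = vertical(hi)
    return ll, hl, lh, hh   # LL recurses; HL, LH, HH go to the quantizer

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
ll, hl, lh, hh = dwt2d_step(img)
```

On this piecewise-constant test image all detail subbands come out zero, illustrating why smooth image areas compress so well: almost all energy concentrates in the recursively transformed LL subband.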
Transforming full images instead of small blocks has several advantages. Larger images can be transformed more often, which increases the number of high-frequency coefficients, and thus more redundancy can be removed. Compared to standard DCT blocks of 8 by 8 pixels, which correspond to a three-level transformation, the common video formats can be transformed with 4 (QCIF) to 6 (PAL) levels using the WT. Generating these multiresolution representations and coding them grouped by level-of-detail rather than by locality allows decoders and network stream filters to extract only the desired information. For random loss occurring in wireless networks, the multiresolution representation can produce partial reconstructions that spread the loss of information over the whole image. Since the HL and LH subbands contain information about edges, such incomplete images lack sharpness, but never miss whole image areas, such as blocks.
`The choice of a wavelet basis for video compression
`was based on reconstruction properties and runtime
`complexity. Since many suitable candidates were found,
`the wavelet basis has been included as a coder and
`stream parameter with a preset default value. Generally,
`complexity for wavelet filters is O(n), where n is the
`number of filter taps. The one-dimensional n-tap filter-
pair is applied as follows:

    l_k = Σ_{i=0}^{n−1} L̃_i · x_{2k − i0 + i}   and   h_k = Σ_{i=0}^{n−1} H̃_i · x_{2k − i0 + i} .

L̃ and H̃ are the low- and high-pass filters, x the pixel values with row- or column-index (i0 is a fixed alignment offset), and k is the index of the filter output. Iterating with steps of 2k automatically introduces the desired downsampling by 2. Filter coefficients are real numbers in the range [−1, 1]. Using a fixed-point format for these filter coefficients, the algorithm can be implemented for arbitrary wavelets by multiplication-free table lookups. Depending on the number of taps and filter coefficients, this method offers only a limited number of transformation levels within a given precision. For common video formats and some short wavelet filters a precision of 16 bits has proven to be sufficient. Using this approach, Daubechies 4- and 6-tap filters were employed [15].
`Having 2 and 3 zero moments, respectively, they are
`suitable to reconstruct natural images and offer a good
`compromise between image quality and filter length.
`
`8
`
`As variants of Daubechies’ filters, psycho-visually
`tuned versions were integrated, too [40].
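A minimal sketch of the filter-pair formula above, using the standard Daubechies 4-tap coefficients. The alignment offset i0 = 0 and the modulo border wrapping are implementation assumptions for this sketch, not WaveVideo's exact fixed-point table-lookup scheme:

```python
# Direct sketch of the filter-pair formula: an n-tap low-/high-pass
# convolution with implicit downsampling by 2 (iterating in steps of 2k).

import math

S2 = math.sqrt(2.0)
# Daubechies-4 analysis filters (orthonormal form).
L = [(1 + math.sqrt(3)) / (4 * S2), (3 + math.sqrt(3)) / (4 * S2),
     (3 - math.sqrt(3)) / (4 * S2), (1 - math.sqrt(3)) / (4 * S2)]
H = [L[3], -L[2], L[1], -L[0]]   # quadrature mirror of the low-pass

def analyze(x, low=L, high=H):
    """l_k = sum_i low[i]*x[2k+i], h_k = sum_i high[i]*x[2k+i] (i0 = 0)."""
    n = len(low)
    out_l, out_h = [], []
    for k in range(len(x) // 2):
        # Borders are wrapped around (modulo), as for the WT in the text.
        l = sum(low[i] * x[(2 * k + i) % len(x)] for i in range(n))
        h = sum(high[i] * x[(2 * k + i) % len(x)] for i in range(n))
        out_l.append(l)
        out_h.append(h)
    return out_l, out_h

# A constant signal: all energy stays in the low band, the high band vanishes.
lo, hi = analyze([5.0] * 8)
```

The constant-input check reflects the zero moments mentioned below: a filter with at least one vanishing moment maps constant signals to an exactly zero high band.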
An interesting class of integer-based reversible wavelet filters, and a general method to develop them, was described by Chao and Fisher in [11]. Filter coefficients are represented by multiples of powers of 2, which eliminates rounding and loss of arithmetic precision completely. The CF53 type, using a 5-tap low-pass and a 3-tap high-pass filter, was found to have both required capabilities: good image reconstruction properties comparable to Daubechies' 6-tap filter, and very efficient implementations. Due to its shortness it cannot produce results as good as the typical filters used for image processing, which are usually longer (7 to 9 taps). Such wavelet filters, as discussed in [1] and [3], try to achieve optimal reconstruction results. The CF53 filter is thus a compromise between reconstruction image quality and low runtime cost. The algorithm to calculate l_k and h_k needs only 4 add and 2 shift operations, and by limiting the transformation depth to 5 levels (equivalent to 10 1-D wavelet filter iterations), coefficients can be represented as 16-bit integers without loss of precision. Furthermore, this compact integer representation allows the use of modern SIMD-type (Single Instruction Multiple Data) instructions, which have recently surfaced on many processor architectures (they are also known as "multimedia" instructions). Such instructions execute the same arithmetic operation in parallel on small integer values in large registers, i.e., on four 16-bit values in a 64-bit register. For the wavelet filters used in WaveVideo this means that, in theory, four pixels can be read and processed in parallel and four coefficients can be written back. In practice, however, several issues had to be handled. The iterative subsampling leads to small pixel rows and columns which are no longer divisible by four, requiring special border treatment. This happens in addition to the well-known border treatment for the WT itself, which was solved by wrapping the images around both vertically and horizontally (modulo). The 1-D filtering also works differently in the horizontal and vertical directions: while horizontal filters parallelize the algorithm within each filter step, vertical filters can process four columns at a time and achieve near-optimal speedup (see Section 7.3 for a specific processor example and speedup results).
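A lifting-style sketch of an adds-and-shifts integer filter in the spirit of CF53: the code below implements the well-known reversible 5/3 filter pair (the variant standardized in JPEG 2000), used here as a hedged stand-in since the exact CF53 coefficients are not reproduced in this report. Like CF53 it needs only additions and shifts, with no multiplications and no loss of arithmetic precision:

```python
# Reversible 5/3 integer lifting (JPEG 2000 variant) as a stand-in for CF53:
# a predict step produces the high band d[], an update step the low band s[].
# Only adds and shifts are used; borders are wrapped around (modulo).

def lift_53(x):
    """Forward transform: low band s[], high band d[], integers in/out."""
    n = len(x)
    # Predict step: high-pass samples from odd positions.
    d = [x[2*i + 1] - ((x[2*i] + x[(2*i + 2) % n]) >> 1) for i in range(n // 2)]
    # Update step: low-pass samples from even positions (d[-1] wraps around).
    s = [x[2*i] + ((d[i - 1] + d[i] + 2) >> 2) for i in range(n // 2)]
    return s, d

def unlift_53(s, d):
    """Exact inverse: undo the update step, then the predict step."""
    n = 2 * len(s)
    x = [0] * n
    for i in range(len(s)):
        x[2*i] = s[i] - ((d[i - 1] + d[i] + 2) >> 2)
    for i in range(len(d)):
        x[2*i + 1] = d[i] + ((x[2*i] + x[(2*i + 2) % n]) >> 1)
    return x

x = [10, 12, 14, 13, 11, 9, 8, 10]
s, d = lift_53(x)
assert unlift_53(s, d) == x   # perfect reconstruction with integers only
```

Because the inverse applies the identical shift expressions with the signs reversed, reconstruction is bit-exact regardless of the truncation performed by the shifts, which is what makes such filters attractive for 16-bit SIMD implementations.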
`Once