`
`Real-Time Video Compression
`
`Techniques and Algorithms
`
`by
`
`Raymond Westwater
`Borko Furht
`
`Florida Atlantic University
`
`1 of 169
`
`GOOGLE EXHIBIT 1013
`
`
`
`Page iv
`
`Distributors for North America:
`Kluwer Academic Publishers
`101 Philip Drive
`Assinippi Park
`Norwell, Massachusetts 02061 USA
`
`Distributors for all other countries:
`Kluwer Academic Publishers Group
`Distribution Centre
`Post Office Box 322
`3300 AH Dordrecht, THE NETHERLANDS
`
`Library of Congress Cataloging-in-Publication Data
`
`A C.I.P. Catalogue record for this book is available
`from the Library of Congress.
`
`Copyright © 1997 by Kluwer Academic Publishers
`
`All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or
`transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without
`the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive,
`Assinippi Park, Norwell, Massachusetts 02061
`
`Printed on acid-free paper.
`
`Printed in the United States of America
`
`
`
`
Contents

Preface vii

1. The Problem of Video Compression 1

1.1 Overview of Video Compression Techniques 3

1.2 Applications of Compressed Video 6

1.3 Image and Video Formats 8

1.4 Overview of the Book 12

2. The MPEG Video Compression Standard 15

2.1 MPEG Encoder and Decoder 15

2.2 MPEG Data Stream 18

3. The H.261/H.263 Compression Standard for Video Telecommunications 23

3.1 Picture Formats for H.261/H.263 Video Codecs 24

3.2 H.261/H.263 Video Encoder 25

3.3 H.261/H.263 Video Decoder 28

4. The XYZ Video Compression Algorithm 29

4.1 XYZ Compression Algorithm 29

4.2 XYZ Decompression Algorithm 32

5. The Discrete Cosine Transform 37

5.1 Behavior of the DCT 37

5.2 Fast One-dimensional DCT Algorithms 40

5.3 Two-dimensional DCT Algorithms 47

5.4 Inverse DCT Algorithms 50

5.5 Three-dimensional DCT Algorithms 51

6. Quantization 57

6.1 Defining an Invariant Measure of Error 58

6.2 Calculation of Transform Variances 62

6.3 Generating Quantizer Factors 65

6.4 Adding Human Visual Factors 67

7. Entropy Coding 73

7.1 Huffman Coding 73

7.2 Use of Entropy Coding in JPEG and MPEG 76

7.3 Adaptive Huffman Coding 78

8. VLSI Architectures of the XYZ Video Codec 83

8.1 Complexity of the Video Compression Algorithms 83

8.2 From Algorithms to VLSI Architectures 86

8.3 Classification of Video Codec VLSI Architectures 87

8.4 Implementation of the XYZ Video Compression Algorithm 90

8.5 Adaptive XYZ Codec Using Mesh Architecture 103

8.6 XYZ Codec Based on Fast 3D DCT Coprocessor 111

9. Experimental Results Using XYZ Compression 123

9.1 PC Implementation 124

9.2 MasPar Implementation 138

9.3 Non-adaptive XYZ Compression 144

10. Conclusion 151

Bibliography 155

Index 163
`
`
`Preface
`
This book is about real-time video compression. Specifically, it introduces the XYZ video
compression technique, which operates in three dimensions, eliminating the overhead of motion
estimation. First, the video compression standards MPEG and H.261/H.263 are described. Both
use asymmetric compression algorithms based on motion estimation: their encoders are much more
complex than their decoders. The XYZ technique uses a symmetric algorithm, based on the Three-
Dimensional Discrete Cosine Transform (3D-DCT). The 3D-DCT was originally suggested for
compression about twenty years ago; however, at that time the computational complexity of the
algorithm was too high, it required large buffer memory, and it was not as effective as motion
estimation. We have resurrected the 3D-DCT-based video compression algorithm by developing
several enhancements to the original algorithm. These enhancements make the algorithm feasible for
real-time video compression in applications such as video-on-demand, interactive multimedia, and
videoconferencing. The results presented in the book suggest that the XYZ video
compression technique is not only a fast algorithm, but also provides superior compression ratios and
higher video quality than existing standard techniques, such as MPEG and H.261/H.263.
The elegance of the XYZ technique lies in its simplicity, which leads to inexpensive VLSI
implementation of an XYZ codec.
`
We would like to thank Jim Prince for conducting experiments in developing visually weighted
quantizers for the XYZ algorithm, as well as a number of students from Florida Atlantic University
who participated in these experiments. We also want to thank Drs. Roy Levow, K. Genesan, and
Matthew Evett, professors at Florida Atlantic University, Dr. Steve Rosenbaum from Cylex
Systems, and Joshua Greenberg for constructive discussions during this project.
`
`RAYMOND WESTWATER AND BORKO FURHT
`BOCA RATON, JULY 1996.
`
`
`1—
`The Problem of Video Compression
`
The problem of real-time video compression is a difficult and important one, and has inspired a great
deal of research activity. This body of knowledge has been, to a substantial degree, embodied in the
MPEG and H.261/H.263 motion video standards. However, some important questions remain
unexplored. This book describes one possible alternative to these standards that has superior
compression characteristics while requiring far less computational power for its full implementation.
`
Since about 1989, moving digital video has been integrated into computer programs. The difficulty in
implementing moving digital video is the tremendous bandwidth required for the encoding of video
data. For example, a quarter-screen image (320 x 240 pixels) playing on an RGB video screen at the full
speed of 30 frames per second (fps) requires storage and transmission of 6.9 million bytes per second.
This data rate is simply prohibitive, so means of compressing digital video suitable for real-time
playback are a necessary step for the widespread introduction of digital motion video applications.
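The arithmetic behind this figure is easy to verify, assuming 3 bytes (24 bits) per RGB pixel:

```python
# Uncompressed bandwidth of quarter-screen RGB video at 30 fps,
# assuming 24-bit (3-byte) RGB pixels.
width, height, bytes_per_pixel, fps = 320, 240, 3, 30

bytes_per_frame = width * height * bytes_per_pixel   # 230,400 bytes
bytes_per_second = bytes_per_frame * fps             # 6,912,000 bytes

print(f"{bytes_per_second / 1e6:.1f} million bytes per second")
```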
`
Many digital video compression algorithms have been developed and implemented. The compression
ratios of these algorithms vary according to the subjectively acceptable level of error, the definition of
the word compression, and who is making the claim. Table 1.1 summarizes video compression
algorithms, their typical compression ratios reported in the literature, and their characteristics.
`
`
`
`
Table 1.1 Overview of video compression algorithms (algorithm; typical compression ratio; characteristics)

Intel RTV/Indeo (3:1): A 128x240 data stream is interpolated to 256x240. Color is subsampled 4:1. A simple 16-bit codebook is used without error correction. Frame differencing is used.

Intel PLV (12:1): A native 256x240 stream is encoded using vector quantization and motion compensation. Compression requires specialized equipment.

IBM Photomotion (3:1): An optimal 8-bit color palette is determined, and run-length encoding and frame differencing are used.

Motion JPEG (10:1): Uses 2-D DCT to encode individual frames. Gives good real-time results with inexpensive but special-purpose equipment. This technique supports random access since no frame differencing is used.

Fractals (10:1): Fractals compress natural scenes well, but require tremendous computing power.

Wavelets (20:1): 2-D and 3-D wavelets have been used in the compression of motion video. Wavelet compression is low enough in complexity to compress entire images, and therefore does not suffer from the boundary artifacts seen in DCT-based techniques.

H.261/H.263 (50:1): Real-time compression and decompression algorithm for video telecommunications. It is based on 2-D DCT with simple motion estimation between frames.

MPEG (30:1): Uses 2-D DCT with motion estimation and interpolation between frames. The MPEG standard is difficult and expensive to compress, but plays back in real time with inexpensive equipment.
`
An ideal video compression technique should have the following characteristics:

• Will produce levels of compression rivaling MPEG without objectionable artifacts.

• Can be played back in real time with inexpensive hardware support.

• Can degrade gracefully under network overload or on a slow platform.

• Can be compressed in real time with inexpensive hardware support.
`
`1.1—
`Overview of Video Compression Techniques
`
The JPEG still picture compression standard has been extremely successful, having been
implemented on virtually all platforms. The standard is fairly simple to implement, is not
computationally complex, and achieves 10:1 to 15:1 compression ratios without significant visual
artifacts. The standard is based upon entropy encoding of quantized coefficients of the discrete
cosine transformation of 8x8 blocks of pixel data.
`
`Figure 1.1 shows the block diagram of both the JPEG compression and decompression algorithms. A
`single frame is subdivided into 8x8 blocks, each of which is independently processed. Each block is
`transformed into DCT space, resulting in an 8x8 block of DCT coefficients. These coefficients are
`then quantized by integer division by constants. The quantizing constant for each DCT coefficient is
`chosen to produce minimal visual artifacts, while maximally reducing the representational entropy of
`the coefficients. The quantized coefficients are then entropy coded into a compressed data stream.
`The reduced entropy of the quantized coefficients is reflected in the higher compression ratio of the
`data.
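The quantization step described above can be sketched as integer division by a table of constants. The flat table below is a placeholder only; real JPEG tables assign a separate, visually tuned constant to each of the 64 coefficient positions:

```python
# Sketch of JPEG-style quantization/dequantization of one 8x8 block of
# DCT coefficients. The flat table here is illustrative; real JPEG
# tables hold a distinct, perceptually chosen constant per coefficient.

def quantize(coeffs, table):
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

table = [[16] * 8 for _ in range(8)]                  # hypothetical flat table
coeffs = [[(r * 8 + c) % 37 for c in range(8)] for r in range(8)]

levels = quantize(coeffs, table)                      # small integers: low entropy
restored = dequantize(levels, table)

# Quantization is lossy: each restored value lies within table/2 of the original.
assert all(abs(c - d) <= 8 for crow, drow in zip(coeffs, restored)
           for c, d in zip(crow, drow))
```

The small integer levels carry far less entropy than the raw coefficients, which is where the compression gain of the entropy-coding stage comes from.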
`
Motion JPEG (M-JPEG) applies JPEG compression to each frame. It provides random access
to individual frames; however, the compression ratios are too low (the same as in JPEG), because the
technique does not take advantage of the similarities between adjacent frames.
`
The MPEG motion video compression standard is an attempt to extend DCT-based compression to
moving pictures. MPEG encodes frames by estimating the motion difference between frames, and
encoding the differences into roughly JPEG format. Unfortunately, motion estimation is
computationally complex, requires specialized equipment to encode, and adds considerable
complexity to the algorithm. Figure 1.2 illustrates the MPEG compression algorithm for predictive
frames.
`
`
`Figure 1.1
`JPEG compression and decompression algorithms.
`
`One of the most promising new technologies is wavelet-based compression [VK95]. Figure 1.3
`illustrates a simple wavelet transform: subband decomposition. The image as a whole is subdivided
`into frequency subbands, which are then individually quantized. One of the most attractive features of
`this system is that it is applied to the image as a whole, thereby avoiding the edge artifacts associated
`with the block-based DCT compression schemes.
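The subband split described above can be illustrated with the simplest wavelet, the Haar filter pair (chosen here purely for illustration; the text does not commit to a particular filter). One analysis step splits a signal into a low-pass average band and a high-pass difference band; repeating the split on the low band yields the octave-band structure of Figure 1.3:

```python
# One level of 1-D Haar analysis/synthesis. The low band is the pairwise
# average, the high band the pairwise half-difference; recursing on the
# low band produces an octave-band (wavelet) decomposition.

def haar_analysis(signal):
    low = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    high = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return low, high

def haar_synthesis(low, high):
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

x = [4, 2, 5, 5, 7, 1, 2, 0]
low, high = haar_analysis(x)
assert haar_synthesis(low, high) == x   # perfect reconstruction
```

In a codec, it is the subbands (mostly the small high-band values) that are quantized, not the original samples.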
`
The wavelet transform can be applied to the time dimension as well. Experience has shown that this
decomposition does not give as good compression results as motion compensation. As there are no
other compression algorithms capable of such high compression ratios, MPEG is considered the
existing "state-of-the-art".
`
`The XYZ algorithm is a natural extension of the research that has been done in video compression.
`Much work has been done in the development of transform-based motion video compression
`algorithms, and in the development of quantizing factors based on the sensitivity of the human eye to
`various artifacts of compression.
`
`
`Figure 1.2
`MPEG compression algorithm for predictive frames. MPEG
`adds motion estimation to the JPEG model.
`
`Figure 1.3
`Octave-band or wavelet decomposition of a still
`image into unequal subbands.
`
XYZ compression is an alternative extension of DCT encoding to moving pictures. Sequences of
eight frames are collected into a three-dimensional block to which a three-dimensional DCT is
applied. The transformed data is then quantized with quantizing constants demonstrated to cause
artifacts which are minimally visible. The resulting data stream is then entropy coded. This process
strongly resembles the JPEG encoding process, as illustrated in Figure 1.4.
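Because the 3-D DCT is separable, it can be sketched as a 1-D DCT applied along each axis in turn. The sketch below uses the direct orthonormal DCT-II definition on a 4x4x4 block to keep the example small (the XYZ algorithm uses 8x8x8 blocks, and Chapter 5 develops a fast algorithm rather than this direct form):

```python
import math

# Separable 3-D DCT sketch: an orthonormal 1-D DCT-II applied in turn
# along the x, y, and t axes of a cubic block.

def dct1d(v):
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct3d(block):  # block[t][y][x], cube of side n
    n = len(block)
    b = [[row[:] for row in plane] for plane in block]
    for t in range(n):                       # transform along x
        for y in range(n):
            b[t][y] = dct1d(b[t][y])
    for t in range(n):                       # transform along y
        for x in range(n):
            col = dct1d([b[t][y][x] for y in range(n)])
            for y in range(n):
                b[t][y][x] = col[y]
    for y in range(n):                       # transform along t
        for x in range(n):
            tube = dct1d([b[t][y][x] for t in range(n)])
            for t in range(n):
                b[t][y][x] = tube[t]
    return b

# A constant block concentrates all of its energy in the DC coefficient.
n = 4
out = dct3d([[[10] * n for _ in range(n)] for _ in range(n)])
assert abs(out[0][0][0] - 10 * n ** 1.5) < 1e-9
```

A static scene behaves like the constant block along the time axis: the temporal AC coefficients are near zero and quantize away cheaply, which is how the transform captures frame-to-frame similarity without motion estimation.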
`
`
`Figure 1.4
`XYZ compression algorithm.
`
This algorithm is built upon a considerable body of published work. The three-dimensional DCT has been
used to encode errors after motion estimation has been performed [RP77], and true three-dimensional
DCT-based compression algorithms have been developed where the quantizers were based upon
minimization of the introduced mean square error [NA77]. These algorithms fell into disfavor because
they were considered to require excessive computation and too much buffer memory, and to be less
effective than motion estimation. This book refutes these arguments.
`
Work on the visibility of artifacts produced by quantization has also been done [CR90]. The visibility of two-
dimensional quantization artifacts has been thoroughly explored for the DCT transform space. The XYZ
algorithm extends this work to the quantization of three-dimensional DCT coefficients.
`
`1.2—
`Applications of Compressed Video
`
Video compression techniques have made a number of applications feasible. Four distinct applications of
compressed video can be summarized as: (a) consumer broadcast television, (b) consumer playback, (c)
desktop video, and (d) videoconferencing.
`
`
`
`
Consumer broadcast television, which includes digital video delivery to homes, typically requires a small number of high-quality compressors and a
large number of low-cost decompressors. The expected compression ratio is about 50:1.

Consumer playback applications, such as CD-ROM libraries and interactive games, also require a small number of compressors and a large number
of low-cost decompressors. The required compression ratio is about 100:1.

Desktop video, which includes systems for authoring and editing video presentations, is a symmetrical application requiring the same number of
encoders and decoders. The expected compression ratio is in the range from 5:1 to 50:1.

Videoconferencing applications also require the same number of encoders and decoders, and the expected compression ratio is about 100:1.
`
Table 1.2 Applications of the compressed video and current video compression standards (application; bandwidth; standard; frame size; frame rate [frames/sec])

Analog Videophone: 5-10 Kbps; no standard; 170x128; 2-5

Low Bitrate Video Conferencing: 26-64 Kbps; H.263; 128x96 or 176x144; 15-30

Basic Video Telephony: 64-128 Kbps; H.261; 176x144 or 352x288; 10-20

Video Conferencing: >= 384 Kbps; H.261; 352x288; 15-30

Interactive Multimedia: 1-2 Mbps; MPEG-1; 352x240; 15-30

Digital TV - NTSC: 3-10 Mbps; MPEG-2; 720x480; 30

High Definition Television: 15-80 Mbps; MPEG-2; 1200x800; 30-60

Table 1.2 summarizes applications of the compressed video, specifying the current standards used in various applications, the required bandwidth,
and typical frame sizes and frame rates.
`
`
`1.3—
`Image and Video Formats
`
A digital image represents a two-dimensional array of samples, where each sample is called a pixel.
Precision determines how many levels of intensity can be represented, and is expressed as the number
of bits/sample. According to precision, images can be classified into: (a) binary images, represented
by 1 bit/sample, (b) computer graphics, represented by 4 bits/sample, (c) grayscale images,
represented by 8 bits/sample, and (d) color images, represented by 16, 24, or more bits/sample.
`
According to the trichromatic theory, the sensation of color is produced by selectively exciting three
classes of receptors in the eye. In an RGB color representation system, shown in Figure 1.5, a color is
produced by adding three primary colors: red, green, and blue (RGB). The straight line where
R=G=B specifies the gray values ranging from black to white.
`
`Figure 1.5
`The RGB representation of color images.
`
Another representation of color images, the YUV representation, describes the luminance and chrominance
components of an image. The luminance component provides a grayscale version of the image, while the
two chrominance components give additional information that converts the grayscale image to a color
image. The YUV representation is more natural for image and video compression. The exact
transformation from RGB to YUV representation, specified by the CCIR 601 standard, is given by
the following equations:

Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)
`
`
`where Y is the luminance component, and U and V are two chrominance components.
`
An approximate RGB to YUV transformation is given as:

This transformation has the nice feature that when R=G=B, then Y=R=G=B and U=V=0. In this case,
the image is a grayscale image.
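The grayscale property is easy to check in code. The sketch below uses the standard CCIR 601 luminance weights with the conventional scale factors for the two color differences:

```python
# RGB -> YUV using the CCIR 601 luminance weights; U and V are the
# scaled color differences B-Y and R-Y (0.492 and 0.877 are the
# conventional scale factors).

def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# For a gray pixel (R = G = B) the chrominance vanishes: Y = R, U = V = 0.
y, u, v = rgb_to_yuv(128, 128, 128)
assert abs(y - 128) < 1e-6 and abs(u) < 1e-6 and abs(v) < 1e-6
```

The property follows directly from the weights summing to one: 0.299 + 0.587 + 0.114 = 1, so Y equals the common value and both color differences are zero.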
`
Color conversion from RGB to YUV requires several multiplications, which can be computationally
expensive. An approximation, proposed in [W+94], can be calculated by performing bit shifts and
adds instead of multiplication operations. This approximation is defined as:
`
`
`
`
`This approximation also gives a simplified YUV to RGB transformation, expressed by:
`
`
Another color format, referred to as the YCbCr format, is used extensively for image compression. In the
YCbCr format, Y is the same as in a YUV system; however, the U and V components are scaled and zero-
shifted to produce Cb and Cr, respectively, as follows:
`
`In this way, chrominance components Cb and Cr are always in the range [0,1].
`
`Computer Video Formats
`
The resolution of an imaging system refers to its capability to reproduce fine detail. Higher resolution
requires a more complex imaging system to represent these images in real time. In computer systems,
resolution is characterized by the number of pixels. Table 1.3 summarizes popular computer video
formats and related storage requirements.
`
`Television Formats
`
`In television systems, resolution refers to the number of line pairs resolved on the face of the display
`screen, expressed in cycles per picture height, or cycles per picture width. For example, the NTSC
`broadcast system in North America and Japan, denoted as 525/59.94, has about 483 picture lines.
`
The HDTV system will approximately double the number of lines of current broadcast television at
approximately the same field rate. For example, a 1050x960 HDTV system will have 960 active lines.
Spatial and temporal characteristics of conventional television systems (such as NTSC, SECAM, and
PAL) and high-definition TV systems (HDTV) are presented in Tables 1.4 and 1.5, respectively [BF91].
`
Table 1.3 Characteristics of various computer video formats (format; resolution in pixels; colors; storage capacity per image)

CGA - Color Graphics Adapter: 320x200; 4 colors (2 bits); 128,000 bits = 16 KB

EGA - Enhanced Graphics Adapter: 640x350; 16 colors (4 bits); 896,000 bits = 112 KB

VGA - Video Graphics Adapter: 640x480; 256 colors (8 bits); 2,457,600 bits = 307.2 KB

8514/A Display Adapter Mode: 1024x768; 256 colors (8 bits); 6,291,456 bits = 786.432 KB

XGA - Extended Graphics Array (a): 640x480; 65,000 colors (16 bits); 6,291,456 bits = 786.432 KB

XGA - Extended Graphics Array (b): 1024x768; 256 colors (8 bits); 6,291,456 bits = 786.432 KB

SVGA - Super VGA: 1024x768; 65,000 colors (24 bits); 2.36 MB
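The storage figures in Table 1.3 follow directly from resolution times bits per pixel (the table counts 1 KB as 1000 bytes):

```python
# Storage per image = width * height * bits-per-pixel, as in Table 1.3.

def storage_bits(width, height, bits_per_pixel):
    return width * height * bits_per_pixel

assert storage_bits(320, 200, 2) == 128_000        # CGA: 16 KB
assert storage_bits(640, 350, 4) == 896_000        # EGA: 112 KB
assert storage_bits(640, 480, 8) == 2_457_600      # VGA: 307.2 KB
assert storage_bits(1024, 768, 8) == 6_291_456     # 8514/A: 786.432 KB
```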
`
Table 1.4 Spatial characteristics of television systems [BF91] (system; total lines; active lines; vertical resolution; optimal viewing distance [m]; aspect ratio; horizontal resolution; total picture elements)

HDTV USA: 1050; 960; 675; 2.5; 16/9; 600; 720,000

HDTV Europe: 1250; 1000; 700; 2.4; 16/9; 700; 870,000

NTSC: 525; 484; 242; 7.0; 4/3; 330; 106,000

PAL: 625; 575; 290; 6.0; 4/3; 425; 165,000

SECAM: 625; 575; 290; 6.0; 4/3; 465; 180,000
`
`
Table 1.5 Temporal characteristics of television systems [BF91] (system; total channel width [MHz]; video baseband Y [MHz]; video baseband R-Y [MHz]; video baseband B-Y [MHz]; camera scanning rate [Hz]; HDTV display scanning rate [Hz]; conventional display scanning rate [Hz])

HDTV USA: 9.0; 10.0; 5.0; 5.0; 59.94; 59.94; 59.94

HDTV Europe: 12.0; 14.0; 7.0; 7.0; 50; 100; 50

NTSC: 6.0; 4.2; 1.0; 0.6; 59.94; NA; 59.94

PAL: 8.0; 5.5; 1.8; 1.8; 50; NA; 50

SECAM: 8.0; 6.0; 2.0; 2.0; 50; NA; 50
`
`1.4—
`Overview of the Book
`
`This book is divided into ten chapters:
`
`1. Video compression. This current chapter introduces the problem of compressing motion video, illustrates the motivation for the 3-D
`solution chosen in the book, and briefly describes the proposed solution. Image and video formats are introduced as well.
`
`2. MPEG. This chapter describes the MPEG compression standard. Important contributions in the field and related work are emphasized.
`
3. H.261/H.263. This chapter describes the compression standard for video telecommunications.
`
4. XYZ compression. The XYZ video compression algorithm is described in detail in this chapter. Both the encoder and decoder are presented, as
well as an example of compressing an 8x8x8 video block.
`
`5. 3-D DCT. The theory of the Discrete Cosine Transform is developed and extended to three dimensions. A fast 3-D algorithm is developed.
`
6. Quantization. A discussion is presented on the issues of determining optimal quantizers using various error criteria. A model of the Human
Visual System is used to develop factors that weight the DCT coefficients according to their relative visibility.
`
`7. Entropy coding. A method for encoding the quantized coefficients is developed based on the stochastic behavior of the pixel data.
`
`
8. VLSI architectures for the XYZ codec. Issues concerning real-time implementation of the XYZ
compression algorithm are analyzed, including the complexity of the algorithm and the mapping of the
algorithm onto various VLSI architectures.
`
9. Results. Results obtained from an implementation of the XYZ compression algorithm are presented.
`
10. Conclusion. A summary of contributions is outlined, emphasizing the real-time features of the
compression algorithm, visual quality, and compression ratio. Directions for future research are given
as well.
`
`
`2—
`The MPEG Video Compression Standard
`
The Moving Picture Experts Group (MPEG) was assembled by the International Standards Organization (ISO)
to establish standards for the compression, encoding, and decompression of motion video. MPEG-1
[IS92b] is a standard supporting compression of image resolutions of approximately 352x288 at 30
fps into a data stream of 1.5 Mbps. This data rate is suitable for pressing onto CD-ROM. The MPEG-
2 standard [IS93b] supports compression of broadcast television (704x576 at 30 fps) and HDTV
(1920x1152 at 60 fps) of up to 60 Mpixels/sec (approx. 700 Mbps) at compression ratios of roughly three
times those expected of motion JPEG [IS92a] (playback rates of up to 80 Mbps).
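The compression ratio implied by the MPEG-1 figures above is easy to estimate, assuming a 4:2:0 source (12 bits per pixel on average), which is the chroma format MPEG-1 codes:

```python
# Rough compression ratio implied by the MPEG-1 numbers quoted above.
width, height, fps = 352, 288, 30
bits_per_pixel = 12                    # 4:2:0 chroma subsampling
target_rate = 1.5e6                    # 1.5 Mbps

raw_rate = width * height * bits_per_pixel * fps   # about 36.5 Mbps
ratio = raw_rate / target_rate                     # roughly 24:1
```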
`
The MPEG standard specifies the functional organization of a decoder. The data stream is cached in a
buffer to reduce the effect of jitter in delivery and decode, and is demultiplexed into a video stream,
an audio stream, and additional user-defined streams. The video stream is decoded into a "video
sequence" composed of the sequence header and groups of pictures.
`
`2.1—
`MPEG Encoder and Decoder
`
`The specification of the MPEG encoder defines many compression options. While all of these options
`must be supported by the decoder, the selection of which options to support in compression is left to
`the discretion of the implementer. An MPEG encoder may choose compression options balancing the
`need for high compression
`
`
`ratios against the complexity of motion compensation or adaptive quantization calculations.
`Decisions will be affected by such factors as:
`
`• A need for real-time compression. MPEG algorithms are complex, and there may not be sufficient
`time to implement exotic options on a particular platform.
`
`• A need for high compression ratios. For highest possible compression ratios at highest possible
`quality, every available option must be exercised.
`
`• A need for insensitivity to transmission error. MPEG-2 supports recovery from transmission errors.
`Some error recovery mechanisms are implemented by the encoder.
`
`• Fast algorithms. Development of fast algorithms may make compression options available that
`would otherwise be impractical.
`
`• Availability of specialized hardware. Dedicated hardware may increase the performance of the
`encoder to the point that additional compression options can be considered.
`
`In the MPEG standard, frames in a sequence are coded using three different algorithms, as illustrated
`in Figure 2.1.
`
`Figure 2.1
`Types of frames in the MPEG standard.
`
`
`I frames (intra frames) are self-contained and coded using a DCT-based technique similar to JPEG. I
`frames are used as random access points in MPEG streams, and they give the lowest compression
`ratios within MPEG.
`
P frames (predicted frames) are coded using forward predictive coding, where the actual frame is
coded with reference to a previous frame (I or P). This process is similar to H.261/H.263 predictive
coding, except that the reference frame is not always the closest previous frame, as it is in H.261/H.263
coding. The compression ratio of P frames is significantly higher than that of I frames.
`
`B frames (bidirectional or interpolated frames) are coded using two reference frames, a past and a
`future frame (which can be I or P frames). Bidirectional, or interpolated coding provides the highest
`amount of compression [Fur95b].
`
I, P, and B frames are described in more detail in Section 8.2. Note that in Figure 2.1, the first three B
frames (2, 3, and 4) are bidirectionally coded using the past I frame (frame 1) and the future P frame
(frame 5). Therefore, the decoding order differs from the encoding order: P frame 5 must be
decoded before B frames 2, 3, and 4, and I frame 9 before B frames 6, 7, and 8. If the MPEG sequence
is transmitted over the network, the actual transmission order should be {1, 5, 2, 3, 4, 9, 6, 7, 8}.
`
`The MPEG application determines a sequence of I, P, and B frames. If there is a need for fast random
`access, the best resolution would be achieved by coding the whole sequence as I frames (MPEG
`becomes identical to Motion JPEG). However, the highest compression ratio can be achieved by
`incorporating a large number of B frames.
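The reordering rule described above (each B frame must follow the later reference frame it depends on) can be expressed mechanically:

```python
# Reorder a display-order GOP pattern into transmission (decode) order:
# every I or P frame is emitted before the run of B frames that precedes
# it in display order, since those B frames reference it.

def transmission_order(display):   # e.g. "IBBBPBBBI"
    out, pending_b = [], []
    for i, frame_type in enumerate(display, start=1):
        if frame_type == 'B':
            pending_b.append(i)
        else:                      # I or P: emit it, then the waiting Bs
            out.append(i)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

# The 9-frame pattern of Figure 2.1:
assert transmission_order("IBBBPBBBI") == [1, 5, 2, 3, 4, 9, 6, 7, 8]
```

An all-I sequence passes through unchanged, which matches the observation that such a stream degenerates to Motion JPEG.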
`
`The block diagram of the MPEG encoder is given in Figure 2.2, while the MPEG decoder is shown in
`Figure 2.3.
`
I frames are created similarly to JPEG encoded pictures, while P and B frames are encoded in terms
of previous and future frames. The motion vector is estimated, and the differences between the
predicted and actual blocks (error terms) are calculated. The error terms are then DCT encoded and
the entropy encoder is used to produce the compact code.
`
`
`Figure 2.2
`The block diagram of the MPEG encoder.
`
`2.2—
`MPEG Data Stream
`
The MPEG specification defines a "video sequence" composed of a video sequence header and many Groups
of Pictures (GOPs), as illustrated in Figure 2.4. The video sequence header defines the video format, picture
dimensions, aspect ratio, frame rate, and delivered data rate. Supported video formats include CCIR 601,
HDTV (16:9), and VGA. Supported chroma formats include "4:2:0" (YUV) and "4:4:4" (RGB). A suggested
buffer size for the video sequence is also specified, a number intended to buffer jitter caused by differences
in decode time.
`
`
`Figure 2.3
`The block diagram of the MPEG decoder.
`
`A GOP contains pictures that may be encoded into one of three supported compression formats. The GOP
`header contains a starting time for the group, and can therefore be used as a point of random access. Each frame
`within the GOP is numbered, and its number coupled with the GOP start time and the playback frame rate
`determines its playback time. Each picture is subdivided into "slices" and then into "macroblocks". A
`macroblock is composed of four 8x8 blocks of luminance data, and typically two 8x8 blocks of chrominance
`data, one Cr and one Cb.
`
`
`Figure 2.4
`MPEG data stream.
`
`I Picture Format
`
`The I (Intraframe) picture format substantially corresponds to the JPEG format. These pictures are
`encoded by transformation into DCT space, quantization of the resultant coefficients, and entropy
`coding of the result. Transformation into DCT space is performed by an 8x8 DCT. Quantization is
performed by reference to a user-loadable quantization table modified by a scale factor. This
mechanism supports adaptive quantization at the cost of additional complexity, although a 30%
improvement in compression is claimed [PM93].
`
`After quantization, the resulting coefficients are reordered in zig-zag order, run-length coded,
`variable-length coded, and entropy coded. The resulting data stream should show roughly JPEG
`levels of compression.
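The reordering and run-length steps can be sketched as follows; the (0, 0) end-of-block marker below is a simplified stand-in for the variable-length codes actually used in the bitstream:

```python
# Zig-zag scan of an 8x8 coefficient block followed by (zero-run, value)
# run-length coding, as applied after quantization.

def zigzag_order(n=8):
    # Traverse anti-diagonals, alternating direction, starting at DC.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_length(levels):
    # levels: 64 values in zig-zag order -> (zero_run, value) pairs.
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))   # simplified end-of-block marker
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 5, -3
scan = [block[r][c] for r, c in zigzag_order()]
assert run_length(scan) == [(0, 12), (0, 5), (1, -3), (0, 0)]
```

The zig-zag order groups the low-frequency coefficients first, so the long tail of quantized-to-zero high frequencies collapses into the single end-of-block code.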
`
`P Picture Format
`
`The P (Predicted) picture format introduces the concept of motion compensation. Each macroblock is
`coded with a vector that predicts its value from an earlier I or P
`
`
`frame. The decoding process copies the contents of the macroblock-sized data at the address
`referenced by the vector into the macroblock of the P frame currently being decoded. Five bits of
`resolution are reserved for the magnitude of the vector in each of the x and y directions, meaning that
`1024 possible data blocks may be referenced by the predicted macroblock. However, eight possible
`magnitude ranges may be assigned to those five bits, meaning as many as 8192 macroblocks might
`have to be evaluated to exhaustively determine the best vector. Each evaluation might require testing
`as many as 384 pixels, and a further complexity is seen in performing fractional interpolation of
`pixels (vector motions as small as 1/2 pixel are supported). Finally, the difference between the
`prediction and the macroblock to be compressed may be encoded in like fashion to I frame encoding
`above.
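The cost of exhaustive search becomes clear in code. The sketch below matches a tiny single-component block by minimizing the sum of absolute differences (SAD), a common matching criterion; a real MPEG encoder searches 16x16 macroblocks over a far larger range, with fractional-pixel refinement on top:

```python
# Exhaustive block matching: find the motion vector minimizing the sum
# of absolute differences (SAD) between a target block and candidate
# positions in a reference frame.

def sad(ref, block, bx, by, n):
    return sum(abs(ref[by + y][bx + x] - block[y][x])
               for y in range(n) for x in range(n))

def best_vector(ref, block, cx, cy, search, n):
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= len(ref[0]) - n and 0 <= y <= len(ref) - n:
                cost = sad(ref, block, x, y, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

# Reference frame with a bright 2x2 patch at (3, 2); the block to code
# contains the same patch, so the best vector points one pixel right.
ref = [[0] * 8 for _ in range(8)]
ref[2][3] = ref[2][4] = ref[3][3] = ref[3][4] = 99
block = [[99, 99], [99, 99]]
cost, dx, dy = best_vector(ref, block, 2, 2, 2, 2)
assert (cost, dx, dy) == (0, 1, 0)
```

Every candidate vector costs a full block comparison, which is why the macroblock counts quoted above translate into so much arithmetic per frame.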
`
`B Picture Format
`
The B (Bidirectional prediction) picture format is calculated with two vectors. A backwards vector
references a macroblock-sized region in the previous I or P frame, and the forward vector references a
macroblock-sized region in the next I or P frame. For this reason, I and P frames are placed in the
coded stream before any B frames that reference them.
`
`The macroblock-sized regions referenced by the motion compensation vectors are averaged to
`produce the motion estimate for the macroblock being decoded. As with P frames, the error between
`the prediction and the frame being encoded is compressed and placed in the bitstream. The error
`factor is decompressed and added to the prediction to form the B frame macroblock.
`
`Many demanding technical issues are raised by the MPEG specification. These include fast
`algorithms for the DCT, fast algorithms for motion vector estimation, algorithms for adaptive
`quantization, and decompression in environments that allow some errors.
`
`
`3—
`The H.261/H.263 Compression Standard for Video Telecommunications
`
The ITU has developed the H.324 video conferencing standard for very low bitrates over the General Switched
Telephone Network (GSTN) and mobile radio [IT95a, IT95b, IT93]. H.324 is a
recommendation for real-time voice, data, and video over V.34 modems on the GSTN telephone
network. It consists of five documents: (1) H.324 systems, (2) H.223 multiplex, (3) H.245 control, (4)
H.263 video codec, and (5) G.723 speech codec. The H.261 coding standard provides coded video at
bit rates of 64 Kbits/s and above, whereas the H.263 video coding standard, proposed for H.324,
provides coded video at around 16 Kbits/s.
`
Figure 3.1 shows a block diagram of a generic multimedia system compliant with the H.324 standard.
The system consists of terminal equipment, a GSTN modem, the GSTN network, a multipoint control unit
(MCU), and other system operation entities.
`
Video equipment includes cameras, monitors, and video processing units to improve compression.
Audio equipment includes a microphone, speakers, a telephone instrument, and attached audio devices.
Data application equipment includes computers, non-standardized data application protocols, and
telematic visual aids such as electronic whiteboards.
`
The GSTN network interface supports appropriate signaling, ringing functions, and voltage levels in
accordance with national standards.
`
`
`Figure 3.1
`Block diagram of a generic H.324-compliant multimedia system.
`
`3.1—
`Picture Formats for H.261/H.263 Video Codecs
`
`All H.324 terminals support both the H.263 and H.261 video codecs. For the H.261 algorithm two
`formats are defined: CIF and QCIF, while for the H.263 algorithm three a